Introduction

In the previous article (What is an MP3 file and where did it come from?), we discussed the origin of MP3 files, their advantages, and capabilities. In this article, we intend to explain the mechanism and the process of decoding and playing MP3 files.

As mentioned in the previous article, an MP3 file, unlike a WAV file, stores compressed and encoded audio to reduce the file size, and divides that audio into small sections called frames.

MP3 waveform with fixed 1152-sample frames (1 to 1000) — **MP3 Audio Frame**

To produce clean audio output, we must decompress each frame in time and send it to the digital-to-analog converter for playback, with no interruptions that would cause gaps or skips. Older microcontrollers could not do this: because of their memory and processor limitations, they converted frames to raw samples more slowly than the audio had to be played back, so they could not play such files. To address this deficiency, chip manufacturers introduced hardware decoders, and most users turned to these chips to play audio files in MP3 format.

VLSI Solution VS1003 audio decoder IC in LQFP-48 package — **VS1003: all-in-one MP3/WMA decoder + DAC in a single chip.**

One of the well-known producers of decoder chips is VLSI Solution, which gained significant popularity through its VS1003 chip. Many portable and non-portable players were designed around this chip.
As an example, the photo below shows a portable MP3 player built with this chip and an AVR microcontroller.

DIY portable MP3 player built with VS1003 audio decoder chip — **DIY portable MP3 player**

However, this type of design came with its own problems, including a bulkier circuit, higher energy consumption, and increased final cost. At the time there was no other solution, and using these chips was unavoidable.

That changed when ARM microcontrollers were introduced, which combined low power consumption with significantly higher processing speeds. The memory and processing speed of this family made it possible to decode and play MP3 files directly: no external components are needed for decoding and playback, and all processing can be handled within the microcontroller itself.

Extracting Frames

Up to this point, we have become familiar with the general structure of the MP3 file. To start the decoding and playback process, the first step is to find the frames and extract them from the file. As you know, an MP3 file consists of a collection of audio frames.

Simplified diagram of MP3 file structure — **MP3 layout**

As you can see in the image above, each data frame has a header that, in addition to specifying the start of a frame, also contains a set of information about that frame. For clearer understanding, look at the table below:

Field	Description
Header	always 32 bits
CRC (optional)	16 bits (present only if the protection bit in the header is 0)
Side Information	136 bits (single channel) or 256 bits (dual/stereo channel)
Main Data	variable length – contains the actual encoded audio data
Ancillary Data	optional user-defined data (often empty)

Each frame contains the fields above. The first 32 bits (4 bytes) hold the sync and header information. The next 16 bits (2 bytes) hold the CRC, which is optional and may be absent. The next 136 or 256 bits hold the side information, whose size depends on whether the frame is mono or stereo, and the remaining bytes hold the audio data.

To extract the audio data from its compressed and encoded state, it is necessary to first extract the frame information from the header and, based on the obtained data, begin decoding the audio data.
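As a minimal sketch of the first step, finding a frame, the function below scans a byte buffer for the 11 set bits that open every frame header (the sync pattern described in the header section of this article). The name `mp3_find_sync` is hypothetical and not taken from the project's source code. The same bit pattern can also occur by chance inside audio data, so a real decoder would normally validate the rest of the header before trusting a match.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first candidate frame header in buf, or -1.
 * A candidate starts with 11 set bits: byte 0 is 0xFF and the top
 * three bits of byte 1 are set. Hypothetical helper for illustration. */
long mp3_find_sync(const uint8_t *buf, size_t len)
{
    /* i + 3 < len: a full 4-byte header must fit in the buffer */
    for (size_t i = 0; i + 3 < len; i++) {
        if (buf[i] == 0xFF && (buf[i + 1] & 0xE0) == 0xE0) {
            return (long)i;
        }
    }
    return -1;
}
```

In a player, the returned offset would be the position from which the 4 header bytes are read and decoded.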

MP3 Decoding – Parsing the MP3 Header

As we mentioned earlier, the header contains 32 bits, and each bit has its own specific meaning. If we group the bits by meaning and label each group with a letter, the result is as follows:

AAAAAAAAAAABBCCDEEEEFFGHIIJJKLMM

11 bits (A) (bits 21 to 31): Identification bits (all 1s) used to detect the start of a frame.
2 bits (B) (bits 20 and 19): Header version containing 4 states:

00 – MPEG Version 2.5
01 – reserved
10 – MPEG Version 2
11 – MPEG Version 1

2 bits (C) (bits 18 and 17): Layer descriptor:

00 – reserved
01 – Layer III
10 – Layer II
11 – Layer I

1 bit (D) (bit 16): Protection bit

0 – CRC section exists (16 bits follow the header)
1 – CRC section does not exist

4 bits (E) (bits 12 to 15): Output bitrate (according to the table below)

Bits	V1,L1	V1,L2	V1,L3	V2,L1	V2,L2	V2,L3
0000	free	free	free	free	free	free
0001	32	32	32	32	8	8
0010	64	48	40	48	16	16
0011	96	56	48	56	24	24
0100	128	64	56	64	32	32
0101	160	80	64	80	40	40
0110	192	96	80	96	48	48
0111	224	112	96	112	56	56
1000	256	128	112	128	64	64
1001	288	160	128	144	80	80
1010	320	192	160	160	96	96
1011	352	224	192	176	112	112
1100	384	256	224	192	128	128
1101	416	320	256	224	144	144
1110	448	384	320	256	160	160
1111	bad	bad	bad	bad	bad	bad

NOTES

All values are in kbps

V1 – MPEG Version 1
V2 – MPEG Version 2 and Version 2.5
L1 – Layer I
L2 – Layer II
L3 – Layer III

2 bits (F) (bits 10 and 11): Sample rate according to the table below

Bits	MPEG1	MPEG2	MPEG2.5
00	44100 Hz	22050 Hz	11025 Hz
01	48000 Hz	24000 Hz	12000 Hz
10	32000 Hz	16000 Hz	8000 Hz
11	reserved	reserved	reserved

Bit (G) (bit 9): Padding bit
- 0 – frame is not padded
- 1 – frame is padded with one extra slot
Bit (H) (bit 8): Private bit. This one is only informative.
2 bits (I) (bits 6 and 7): Channel Mode
- 00 – Stereo
- 01 – Joint stereo (Stereo)
- 10 – Dual channel (2 mono channels)
- 11 – Single channel (Mono)
2 bits (J) (bits 4 and 5): Mode extension (Only used in Joint stereo)
Bit (K) (bit 3): Copyright
- 0 – Audio is not copyrighted
- 1 – Audio is copyrighted
Bit (L) (bit 2): Original
- 0 – Copy of original media
- 1 – Original media
2 bits (M) (bits 0 and 1): Emphasis
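The field layout above can be sketched in C as a handful of shifts and masks. This is an illustrative sketch, not the project's actual code: the `Mp3Header` struct and the functions `mp3_header_word` and `mp3_parse_header` are hypothetical names. It assumes the 4 header bytes are combined most significant byte first, so that bit 31 is the first bit in the file.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned version_bits;     /* B: 3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5 */
    unsigned layer_bits;       /* C: 1 = Layer III, 2 = Layer II, 3 = Layer I */
    bool     has_crc;          /* D: bit value 0 means a CRC follows */
    unsigned bitrate_index;    /* E: row of the bitrate table */
    unsigned samplerate_index; /* F: row of the sample rate table */
    bool     padding;          /* G */
    bool     private_bit;      /* H */
    unsigned channel_mode;     /* I: 0 stereo, 1 joint, 2 dual, 3 mono */
    unsigned mode_extension;   /* J */
    bool     copyright;        /* K */
    bool     original;         /* L */
    unsigned emphasis;         /* M */
} Mp3Header;

/* Combine 4 bytes from the file into one 32-bit word, first byte highest. */
uint32_t mp3_header_word(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Split a header word into its fields. Returns false if the sync bits
 * are missing or a reserved/bad value is used. */
bool mp3_parse_header(uint32_t h, Mp3Header *out)
{
    if ((h >> 21) != 0x7FFu) {
        return false;                      /* A: 11 sync bits must be 1 */
    }
    out->version_bits     = (h >> 19) & 0x3u;
    out->layer_bits       = (h >> 17) & 0x3u;
    out->has_crc          = ((h >> 16) & 0x1u) == 0;
    out->bitrate_index    = (h >> 12) & 0xFu;
    out->samplerate_index = (h >> 10) & 0x3u;
    out->padding          = ((h >> 9) & 0x1u) != 0;
    out->private_bit      = ((h >> 8) & 0x1u) != 0;
    out->channel_mode     = (h >> 6) & 0x3u;
    out->mode_extension   = (h >> 4) & 0x3u;
    out->copyright        = ((h >> 3) & 0x1u) != 0;
    out->original         = ((h >> 2) & 0x1u) != 0;
    out->emphasis         = h & 0x3u;

    /* Reject values the tables mark as reserved or bad. */
    if (out->version_bits == 1 || out->layer_bits == 0 ||
        out->bitrate_index == 15 || out->samplerate_index == 3) {
        return false;
    }
    return true;
}
```

For example, the header bytes FF FB 90 64 decode to MPEG Version 1, Layer III, no CRC, bitrate index 1001 (128 kbps in the V1,L3 column), sample rate index 00 (44100 Hz), and joint stereo.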

After the four bytes describing the header, depending on the status of bit D in this descriptor, there may or may not be 2 bytes of CRC. After these optional two bytes, depending on whether the frame is single channel or dual channel (bits 6 and 7), there will be 17 or 32 bytes of channel information, followed by the audio data. Note that in very rare cases, a frame may not contain audio data. In the image below, we have marked one frame within an MP3 file where the 4 red bytes are the header and the 32 blue bytes are the channel information. The remaining bytes are the audio data. Note that there are no error-checking bytes.

Hex dump of an MP3 file showing one complete MPEG audio frame highlighted — **Real MP3 frame dissected: 4-byte header (red) → 32-byte channel info (purple) → audio data (no CRC).**
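The byte counts above can be turned into a small calculation. The sketch below covers only MPEG Version 1, Layer III, and its function names are hypothetical. The frame length formula is not stated in this article; it follows from the 1152 samples that such a frame carries (1152 / 8 = 144 bytes per unit of bitrate over sample rate), plus one byte when the padding bit is set.

```c
#include <stdint.h>

/* MPEG-1 Layer III bitrates in kbps, indexed by the 4-bit field E
 * (column "V1,L3" of the bitrate table); 0 marks free/bad entries. */
static const uint16_t kbps_v1_l3[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};

/* MPEG-1 sample rates in Hz, indexed by the 2-bit field F; 0 = reserved. */
static const uint32_t hz_v1[4] = { 44100, 48000, 32000, 0 };

/* Total frame size in bytes, header included; 0 if the indices are unusable. */
uint32_t mp3_frame_bytes_v1_l3(unsigned bitrate_index,
                               unsigned samplerate_index,
                               unsigned padding)
{
    uint32_t kbps = kbps_v1_l3[bitrate_index & 0xFu];
    uint32_t hz   = hz_v1[samplerate_index & 0x3u];
    if (kbps == 0 || hz == 0) {
        return 0;
    }
    return (144u * kbps * 1000u) / hz + (padding ? 1u : 0u);
}

/* Offset, from the start of the frame, of the first byte after the
 * header, the optional CRC, and the channel (side) information. */
uint32_t mp3_main_data_offset_v1(int has_crc, int is_mono)
{
    return 4u + (has_crc ? 2u : 0u) + (is_mono ? 17u : 32u);
}
```

A 128 kbps, 44100 Hz frame without padding comes out at 417 bytes, and a stereo frame without CRC starts its audio data region at byte 36, which matches the 4 + 32 bytes marked in the hex dump.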

MP3 Decoding – Decoding the Audio Data

To decode the data in this section, you first need to be familiar with Huffman coding, a lossless data compression method devised by David Huffman. Depending on the characteristics of the data, Huffman coding typically compresses it by 20 to 90 percent. Building a Huffman code uses a priority queue data structure. Explaining how the algorithm works is beyond the scope of this article, and those interested in learning can refer to relevant websites.

For decoding this section, as shown in the image, you must use the Huffman tree.

MP3 main data section structure — **MP3 main data layout**
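To give a feel for what walking a Huffman tree means, here is a minimal sketch with a toy three-symbol code. It is not one of the MP3 tables: a real decoder uses the fixed tables from the MPEG standard, and all names here (`HuffNode`, `toy_tree`, `huff_decode`) are hypothetical. The decoder reads one bit at a time, most significant bit first, follows the matching branch, and emits a symbol whenever it reaches a leaf.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int child[2];   /* next node index for bit 0 / bit 1, -1 at a leaf */
    int symbol;     /* >= 0 at a leaf, -1 at an internal node */
} HuffNode;

/* Toy prefix code (NOT an MP3 table): 'A' = 0, 'B' = 10, 'C' = 11 */
static const HuffNode toy_tree[] = {
    { {  1,  2 }, -1  },   /* 0: root */
    { { -1, -1 }, 'A' },   /* 1: leaf, reached by bit 0 */
    { {  3,  4 }, -1  },   /* 2: internal, reached by bit 1 */
    { { -1, -1 }, 'B' },   /* 3: leaf, reached by bits 1 0 */
    { { -1, -1 }, 'C' },   /* 4: leaf, reached by bits 1 1 */
};

/* Decode up to nbits bits from data into out; returns symbols written. */
size_t huff_decode(const HuffNode *tree, const uint8_t *data, size_t nbits,
                   char *out, size_t out_cap)
{
    size_t n = 0;
    int node = 0;                               /* start at the root */
    for (size_t bit = 0; bit < nbits && n < out_cap; bit++) {
        int b = (data[bit / 8] >> (7 - bit % 8)) & 1;
        node = tree[node].child[b];
        if (tree[node].symbol >= 0) {           /* leaf: emit, restart */
            out[n++] = (char)tree[node].symbol;
            node = 0;
        }
    }
    return n;
}
```

With this toy code, the six bits 0 10 11 0 (the byte 0x58 with two padding bits) decode to the four symbols A, B, C, A.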

How to Build an MP3 Player Module

Hardware Explanation

As shown in the image below, the hardware used for MP3 decoding is very simple, and its main component is an STM32F103RET6 microcontroller. All processing occurs within the microcontroller.

Complete schematic of MP3 decoder based on STM32F103RET6 — **MP3 decoder schematic with STM32F103**

To convert the decoded digital samples into an analog signal, the microcontroller's internal digital-to-analog converters (DAC) are used, which provide 12-bit resolution at up to 1 megasample per second. The memory card is connected to the microcontroller over the SPI interface. The circuit has been designed to be as simple and understandable as possible.
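Because the DAC has 12-bit precision, decoded samples have to be reduced to that range before they are written to it. The sketch below assumes the decoder produces signed 16-bit PCM and that the DAC expects a right-aligned unsigned value from 0 to 4095; both are assumptions, and `pcm16_to_dac12` is a hypothetical name rather than a function from the project's source code.

```c
#include <stdint.h>

/* Map a signed 16-bit PCM sample to an unsigned 12-bit DAC code. */
uint16_t pcm16_to_dac12(int16_t sample)
{
    /* Shift the range -32768..32767 up to 0..65535 (mid-scale = silence). */
    uint16_t offset = (uint16_t)((int32_t)sample + 32768);
    /* Keep the 12 most significant bits: 0..4095. */
    return (uint16_t)(offset >> 4);
}
```

Silence (sample value 0) maps to the mid-scale code 2048, and the extremes map to 0 and 4095.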

Software

The software is written using the Keil compiler and is capable of playing MP3 files up to 320 kbps and WAV files up to the highest possible bitrate. For faster MP3 decoding, part of the code is written in assembly language. Maximum use has been made of available resources. For transferring audio data to the DAC converters, the DMA unit is used so that the CPU has minimal involvement in this transfer.

int main(void){
    /* System initialization */
    RCC_Configuration();
    GPIO_Configuration();
    NVIC_Configuration();
    buffer_Init();                 /* Uart init RX / TX buffers */
    USART1_Init();

    /* Initialize SD Card with LED feedback */
    while(disk_initialize(0)==STA_NOINIT)   /* Init SdCard Disk */
    {
        SET_LED();
    }
    RESET_LED();

    /* Mount FAT filesystem with LED feedback */
    while(f_mount(0,&fs)!=FR_OK)            /* Mount Fat File System */
    {
        SET_LED();
    }
    RESET_LED();

    printf("Eicut.com\r\n");

    MP3_PlayAudioFile("start.mp3");         /* Play Mp3 File */
    play_wave("start.wav");                 /* Play Wave File */

    while(1)
    {
        /* Main loop - add your application code here */
    }
}

According to the program above, after initializing the memory card and mounting the file system, the program first looks for the start.mp3 file on the memory card. If the file exists, it is played, and after playback completes the program looks for the start.wav file and plays it.

You can easily turn this code into a portable MP3 Player by adding the functions and sections you need.

Download Source Code for MP3 Decoder

Source Code for MP3 Decoder Download

FAQ – MP3 Software Decoder on Microcontroller

Why could old microcontrollers not play MP3 directly?

Because decoding MP3 frames in real time requires high CPU speed and sufficient RAM. Older 8-bit AVRs (e.g., ATmega328) were too slow: the decoder could not keep up with the playback rate, causing gaps or stuttering. That is why dedicated decoder chips such as the VS1003 and VS1053 were used.

What changed with ARM Cortex-M microcontrollers?

STM32F1/F4/H7 series brought 72–400 MHz clock, large Flash/RAM, hardware DMA, and built-in 12-bit DACs. This made pure-software MP3 decoding fast enough for real-time playback up to 320 kbps without any external decoder IC.

What is an MP3 “frame” and why is it important?

MP3 files are divided into frames that each hold a fixed number of samples (1152 for MPEG Version 1, Layer III, which is about 26 ms at 44.1 kHz). Each frame has:
• 4-byte header (sync + bitrate + samplerate + channel mode)
• Optional CRC
• Side information
• Huffman-coded main audio data
The player must decode and play each frame exactly on time — no gaps allowed.

What is Huffman coding in MP3?

After psychoacoustic masking, the remaining frequency coefficients are compressed using Huffman coding — a variable-length prefix code that assigns shorter codes to more frequent values. Decoding requires Huffman tables (32 different tables defined in the MPEG standard).

Why is DMA critical for smooth MP3 playback?

The DAC needs a constant stream of samples (e.g., 44.1 kHz × 16-bit ≈ 88 KB/s per channel). If the CPU feeds the DAC manually, it wastes cycles and risks underruns. DMA automatically transfers decoded PCM data from RAM to the DAC, freeing the CPU to decode the next frame.
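A common way to combine DMA with decoding is a two-half (ping-pong) buffer: the DMA plays one half while the CPU refills the other. The sketch below shows only the bookkeeping and is hardware-independent; the names `PingPong`, `pingpong_on_dma_done`, `pingpong_service`, and `fill_silence` are hypothetical and not from the project's source code. It assumes the DMA reads the buffer circularly and raises an interrupt after finishing each half.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_SAMPLES 1152u   /* one MPEG-1 Layer III frame per channel */

typedef struct {
    uint16_t samples[2 * HALF_SAMPLES];  /* DMA reads this circularly */
    volatile bool half_free[2];          /* set from the DMA interrupt */
} PingPong;

/* Call from the DMA half-transfer (half = 0) and transfer-complete
 * (half = 1) interrupts: the DMA has finished reading that half. */
void pingpong_on_dma_done(PingPong *pp, unsigned half)
{
    pp->half_free[half & 1u] = true;
}

/* Example fill callback: writes mid-scale 12-bit codes (silence).
 * A player would decode the next frame into dst instead. */
void fill_silence(uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = 2048;
    }
}

/* Call from the main loop: refill every half the DMA has released.
 * Returns the number of halves refilled. */
unsigned pingpong_service(PingPong *pp,
                          void (*fill)(uint16_t *dst, size_t count))
{
    unsigned refilled = 0;
    for (unsigned half = 0; half < 2; half++) {
        if (pp->half_free[half]) {
            pp->half_free[half] = false;
            fill(&pp->samples[half * HALF_SAMPLES], HALF_SAMPLES);
            refilled++;
        }
    }
    return refilled;
}
```

As long as each refill finishes before the DMA reaches the end of the other half, playback continues without gaps.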

Can I play 320 kbps MP3 on STM32F103?

Yes — with optimized code (some parts in assembly) and DMA + double buffering, the STM32F103RET6 at 72 MHz can decode and play 320 kbps stereo MP3 smoothly.

Do I still need an external audio DAC?

No. Many STM32F1/F4 parts, including the STM32F103RET6 used here, have built-in 12-bit DACs with 1 MSPS. Just add a simple RC low-pass filter and an amplifier (e.g., LM386 or PAM8403) and you have line-level or speaker output.

Which libraries are most commonly used for MP3 decoding on STM32?

• Helix MP3 Decoder – fixed-point, very popular (used in many DIY projects)
• MAD (libmad) – high quality, fixed-point
• Minimad / STM32-audio – lightweight ports specifically for Cortex-M
