
MP3 Software Decoder on Microcontroller | Source Code + Schematic

Introduction

In the previous article (What is an MP3 file and where did it come from?), we discussed the origin of MP3 files, their advantages, and their capabilities. In this article, we explain how MP3 files are decoded and played.

As mentioned in the previous article, WAV files usually store uncompressed samples. An MP3 file instead compresses and encodes the audio to reduce the file size, and splits the encoded stream into small sections called frames.

 

MP3 Audio Frame: waveform divided into fixed 1152-sample frames (1 to 1000)

Clean audio output requires two things. Each frame must be decompressed in time, and it must be handed to the digital-to-analog converter (DAC) without interruption. Otherwise the listener hears gaps or skips. Older microcontrollers could not do this. With their limited memory and processing power, they decoded frames into raw PCM samples more slowly than those samples had to be played back, so they were unable to play MP3 files. To fill this gap, chip manufacturers introduced dedicated hardware decoders, and most designers turned to these chips for playing MP3 audio.
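To make the real-time requirement concrete, here is the arithmetic behind it. An MPEG-1 Layer III frame carries 1152 samples per channel, so at 44.1 kHz each frame lasts about 26.1 ms. On average, the decoder must finish each frame faster than that, or the output runs dry. The sketch below computes that deadline and the matching CPU cycle budget. The helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Samples per channel in one MPEG-1 Layer III frame. */
#define MP3_SAMPLES_PER_FRAME 1152u

/* Playback time of one frame, in microseconds (hypothetical helper). */
static uint32_t frame_duration_us(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)((uint64_t)MP3_SAMPLES_PER_FRAME * 1000000u / sample_rate_hz);
}

/* CPU cycles that elapse while one frame plays, at a given core clock. */
static uint32_t cycle_budget_per_frame(uint32_t core_clock_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)((uint64_t)core_clock_hz * MP3_SAMPLES_PER_FRAME / sample_rate_hz);
}

/* Nonzero if a decoder needing `cycles_per_frame` keeps up in real time. */
static int decoder_keeps_up(uint32_t cycles_per_frame, uint32_t core_clock_hz,
                            uint32_t sample_rate_hz)
{
    return cycles_per_frame < cycle_budget_per_frame(core_clock_hz, sample_rate_hz);
}
```

For example, at a 72 MHz core clock and 44.1 kHz, the budget is about 1.88 million cycles per frame. That budget must also cover memory-card reads and buffer management, not just decoding.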

VS1003 (VLSI Solution, LQFP-48 package): all-in-one MP3/WMA decoder + DAC in a single chip.

One of the best-known makers of decoder chips is VLSI Solution, which became widely popular with its VS1003 chip. Many portable and stationary players were designed around it.
As an example, the photo below shows a portable MP3 player built with this chip and an AVR microcontroller.

DIY portable MP3 player built with the VS1003 audio decoder chip

However, this approach had its own drawbacks: a bulkier circuit, higher power consumption, and a higher final cost. But there was no alternative, so using these chips was unavoidable.

This changed with the introduction of ARM microcontrollers, which combine low power consumption with much higher processing speed. Their larger memory and faster cores made it possible to decode and play MP3 files directly. No external components are needed for decoding and playback, because the entire process is handled inside the microcontroller itself.

Extracting Frames

So far, we have covered the general structure of an MP3 file, which is a collection of audio frames. The first step in decoding and playback is therefore to find the frames and extract them from the file.
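As the header description further down explains, every frame begins with 11 set bits (the frame sync). A minimal way to locate frames in a buffer read from the file works in two steps. First, scan for a 0xFF byte followed by a byte whose top three bits are also set. Then reject candidates whose version, layer, bitrate or sample-rate fields hold reserved values.

The sketch below makes a few assumptions:
• The buffer is already in RAM.
• `mp3_find_sync` is a hypothetical name.
• A real player would also confirm that another valid header appears exactly one frame length later, because 0xFF bytes also occur inside audio data and tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first plausible frame header in buf, or -1. */
static long mp3_find_sync(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] != 0xFF || (buf[i + 1] & 0xE0) != 0xE0)
            continue;                                 /* 11 sync bits not all set */
        uint8_t version = (buf[i + 1] >> 3) & 0x03;   /* B: 01 is reserved */
        uint8_t layer   = (buf[i + 1] >> 1) & 0x03;   /* C: 00 is reserved */
        uint8_t bitrate = (buf[i + 2] >> 4) & 0x0F;   /* E: 1111 is "bad" */
        uint8_t srate   = (buf[i + 2] >> 2) & 0x03;   /* F: 11 is reserved */
        if (version == 1 || layer == 0 || bitrate == 0x0F || srate == 3)
            continue;
        return (long)i;
    }
    return -1;
}
```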

MP3 layout: simplified diagram of the MP3 file structure

As the image above shows, each frame starts with a header that marks the beginning of the frame and also carries information about it. The table below lists the parts of a frame:

Field | Size and presence
Header | always 32 bits
CRC | 16 bits, optional: present only when the protection bit (D) is 0
Side Information | 136 bits (single channel) or 256 bits (dual channel/stereo), for MPEG-1 Layer III
Main Data | variable length; contains the actual encoded audio data
Ancillary Data | optional user-defined data (often empty)

 

Each frame follows the layout in the table. The first 32 bits (4 bytes) hold the sync word and header information. The next 16 bits (2 bytes) are an optional CRC used for error checking, so they may be absent. These are followed by 136 or 256 bits (17 or 32 bytes) of side information in MPEG-1 Layer III, depending on whether the audio is mono or stereo. The rest of the frame holds the audio data.
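In numbers, the audio (main) data starts after three things: the 4-byte header, the optional 2-byte CRC, and the side information. The sketch below computes that offset for MPEG-1 Layer III only. The function name is hypothetical. MPEG-2 and 2.5 use smaller side information (9 or 17 bytes), which this sketch does not handle.

Because of the Layer III bit reservoir, this offset is where the current frame's own bytes continue. It is not necessarily where its audio data begins.

```c
#include <assert.h>
#include <stdint.h>

/* Offset, from the start of an MPEG-1 Layer III frame, of the bytes that
 * follow the side information. `header` is the 32-bit header word read
 * big-endian from the first four bytes of the frame. */
static unsigned mp3_main_data_offset(uint32_t header)
{
    /* This sketch only covers MPEG-1 (B = 11), Layer III (C = 01). */
    assert(((header >> 19) & 0x3) == 3 && ((header >> 17) & 0x3) == 1);
    unsigned protection_bit = (header >> 16) & 0x1;  /* D: 0 means a CRC follows */
    unsigned channel_mode   = (header >> 6) & 0x3;   /* I: 11 means single channel */
    unsigned offset = 4;                              /* header */
    if (protection_bit == 0)
        offset += 2;                                  /* 16-bit CRC */
    offset += (channel_mode == 3) ? 17 : 32;          /* side information */
    return offset;
}
```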

To recover the audio from its compressed, encoded form, you first read the frame information from the header and then use those values to decode the audio data.

MP3 Decoding – Decoding the MP3 Header

As mentioned earlier, the header is 32 bits long, and each bit has a specific meaning. If we group the bits by field and label each field with a letter, the result is as follows:

AAAAAAAAAAABBCCDEEEEFFGHIIJJKLMM

  • 11 bits (A) (bits 21 to 31): Frame sync bits (all 1s), used to detect the start of a frame.
  • 2 bits (B) (bits 19 and 20): MPEG audio version, with 4 possible values:
    • 00 – MPEG Version 2.5
    • 01 – reserved
    • 10 – MPEG Version 2
    • 11 – MPEG Version 1
  • 2 bits (C) (bits 17 and 18): Layer description:
    • 00 – reserved
    • 01 – Layer III
    • 10 – Layer II
    • 11 – Layer I
  • 1 bit (D) (bit 16): Protection bit
    • 0 – a 16-bit CRC follows the header.
    • 1 – no CRC is present.
  • 4 bits (E) (bits 12 to 15): Bitrate index (according to the table below)
Bits V1,L1 V1,L2 V1,L3 V2,L1 V2,L2 V2,L3
0000 free free free free free free
0001 32 32 32 32 8 8
0010 64 48 40 48 16 16
0011 96 56 48 56 24 24
0100 128 64 56 64 32 32
0101 160 80 64 80 40 40
0110 192 96 80 96 48 48
0111 224 112 96 112 56 56
1000 256 128 112 128 64 64
1001 288 160 128 144 80 80
1010 320 192 160 160 96 96
1011 352 224 192 176 112 112
1100 384 256 224 192 128 128
1101 416 320 256 224 144 144
1110 448 384 320 256 160 160
1111 bad bad bad bad bad bad
NOTES

All values are in kbps

  • V1 – MPEG Version 1
  • V2 – MPEG Version 2 and Version 2.5
  • L1 – Layer I
  • L2 – Layer II
  • L3 – Layer III
  • 2 bits (F) (bits 10 and 11): Sample rate according to the table below
Bits MPEG1 MPEG2 MPEG2.5
00 44100 Hz 22050 Hz 11025 Hz
01 48000 Hz 24000 Hz 12000 Hz
10 32000 Hz 16000 Hz 8000 Hz
11 reserved reserved reserved

 

  • Bit (G) (bit 9): Padding bit
    • 0 – frame is not padded
    • 1 – frame is padded with one extra slot
  • Bit (H) (bit 8): Private bit. This one is only informative.
  • 2 bits (I) (bits 6 and 7): Channel Mode
    • 00 – Stereo
    • 01 – Joint stereo (Stereo)
    • 10 – Dual channel (2 mono channels)
    • 11 – Single channel (Mono)
  • 2 bits (J) (bits 4 and 5): Mode extension (Only used in Joint stereo)
  • Bit (K) (bit 3): Copyright
    • 0 – Audio is not copyrighted
    • 1 – Audio is copyrighted
  • Bit (L) (bit 2): Original
    • 0 – Copy of original media
    • 1 – Original media
  • 2 bits (M) (bits 0 and 1): Emphasis
    • 00 – none
    • 01 – 50/15 ms
    • 10 – reserved
    • 11 – CCITT J.17
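The bit layout above maps directly onto shifts and masks. The sketch below decodes the fields of a 32-bit header and looks up the bitrate and sample rate from the two tables. The struct and function names are hypothetical. Only the Layer III bitrate columns (V1,L3 and V2,L3) are included, and free-format streams (bitrate index 0000) are rejected rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned version;      /* B: 0 = MPEG 2.5, 2 = MPEG 2, 3 = MPEG 1 */
    unsigned layer;        /* C: 1 = Layer III */
    unsigned has_crc;      /* 1 when protection bit D is 0 */
    unsigned bitrate_kbps; /* from the bitrate table (Layer III columns) */
    unsigned sample_rate;  /* Hz, from the sample-rate table */
    unsigned padding;      /* G */
    unsigned channel_mode; /* I: 3 = single channel (mono) */
} mp3_header;

/* Layer III bitrates in kbps, indexed by field E; 0 = free, 15 = bad. */
static const uint16_t BITRATE_V1_L3[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                           112, 128, 160, 192, 224, 256, 320, 0};
static const uint16_t BITRATE_V2_L3[16] = {0, 8, 16, 24, 32, 40, 48, 56,
                                           64, 80, 96, 112, 128, 144, 160, 0};
/* Rows: sample-rate index F; columns: MPEG1, MPEG2, MPEG2.5. */
static const uint32_t SAMPLE_RATES[3][3] = {
    {44100, 22050, 11025}, {48000, 24000, 12000}, {32000, 16000, 8000}};

/* Parses a big-endian header word. Returns 0 on success, -1 if it is not a
 * usable Layer III header. */
static int mp3_parse_header(uint32_t h, mp3_header *out)
{
    assert(out != NULL);
    if ((h >> 21) != 0x7FF)
        return -1;                                /* A: 11 sync bits */
    unsigned br_idx = (h >> 12) & 0xF;            /* E */
    unsigned sr_idx = (h >> 10) & 0x3;            /* F */
    out->version      = (h >> 19) & 0x3;          /* B */
    out->layer        = (h >> 17) & 0x3;          /* C */
    out->has_crc      = ((h >> 16) & 0x1) == 0;   /* D */
    out->padding      = (h >> 9) & 0x1;           /* G */
    out->channel_mode = (h >> 6) & 0x3;           /* I */
    if (out->version == 1 || out->layer != 1 || br_idx == 0 || br_idx == 15 ||
        sr_idx == 3)
        return -1;  /* reserved, not Layer III, free format, or bad bitrate */
    out->bitrate_kbps = (out->version == 3) ? BITRATE_V1_L3[br_idx]
                                            : BITRATE_V2_L3[br_idx];
    unsigned col = (out->version == 3) ? 0 : (out->version == 2) ? 1 : 2;
    out->sample_rate = SAMPLE_RATES[sr_idx][col];
    return 0;
}
```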

The 4 header bytes may be followed by 2 bytes of CRC, depending on bit D. After these optional two bytes come 17 or 32 bytes of side information (in MPEG-1), depending on whether the frame is single-channel or not (bits 6 and 7). The audio data follows. In very rare cases, a frame may contain no audio data.

The image below marks one frame within an MP3 file. The first 4 bytes are the header, and the next 32 bytes are the side information. The remaining bytes are the audio data. Note that this frame has no CRC bytes.

Real MP3 frame dissected (hex dump): 4-byte header (red) → 32-byte channel info (purple) → audio data (no CRC).
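Once the header is decoded, the position of the next frame follows from the bitrate, sample rate and padding bit. For MPEG-1 Layer III, the frame length in bytes, including the header, is floor(144 × bitrate / sample rate) + padding, with the bitrate in bits per second. MPEG-2 and 2.5 Layer III frames hold 576 samples, so the factor becomes 72.

The sketch below assumes the header has already been validated. The function name is hypothetical, and free-format streams are not handled.

```c
#include <assert.h>

/* Length in bytes of a Layer III frame, including its 4-byte header.
 * version: 3 = MPEG 1, 2 = MPEG 2, 0 = MPEG 2.5 (field B of the header). */
static unsigned mp3_frame_length(unsigned version, unsigned bitrate_kbps,
                                 unsigned sample_rate_hz, unsigned padding)
{
    assert(sample_rate_hz > 0 && bitrate_kbps > 0);
    unsigned coeff = (version == 3) ? 144u : 72u;  /* 1152 or 576 samples / 8 */
    return coeff * bitrate_kbps * 1000u / sample_rate_hz + padding;
}
```

For example, a 128 kbps, 44.1 kHz frame is 417 bytes, or 418 when the padding bit is set. Occasional padding keeps the average bitrate exact.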

MP3 Decoding – Decoding the Audio Data

To decode the data in this section, you first need to be familiar with Huffman coding, a lossless data compression method devised by David A. Huffman. Depending on the data, Huffman coding typically reduces size by 20 to 90 percent. Building a Huffman code uses a priority queue data structure. A full explanation of the algorithm is beyond the scope of this article; interested readers can consult dedicated references.

As the image below shows, this section is decoded using Huffman trees.

MP3 main data layout: structure of the main data section
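Huffman decoding reads the bitstream one bit at a time and walks a code tree until it reaches a leaf. The sketch below shows that mechanism with a toy, hypothetical four-symbol code (A = 0, B = 10, C = 110, D = 111). It does not use the MP3 tables: Layer III tables decode pairs or quadruples of values followed by extra linbits and sign bits, as defined in the MPEG standard. The names `bit_reader`, `huff_node` and `huff_decode` are illustrative only. Production decoders usually replace the bit-at-a-time walk with lookup tables for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t len_bits;
    size_t pos;                       /* index of the next bit to read */
} bit_reader;

static int read_bit(bit_reader *br)
{
    if (br->pos >= br->len_bits)
        return -1;                    /* out of data */
    int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Tree node: inner nodes hold child indices, leaves hold a symbol. */
typedef struct {
    int16_t child[2];                 /* {-1, -1} for leaves */
    int16_t symbol;                   /* valid only for leaves */
} huff_node;

/* Toy code A=0, B=10, C=110, D=111 mapped to symbols 0..3 (hypothetical). */
static const huff_node TOY_TREE[] = {
    {{1, 2}, -1},   /* 0: root */
    {{-1, -1}, 0},  /* 1: leaf A, code 0 */
    {{3, 4}, -1},   /* 2: prefix 1 */
    {{-1, -1}, 1},  /* 3: leaf B, code 10 */
    {{5, 6}, -1},   /* 4: prefix 11 */
    {{-1, -1}, 2},  /* 5: leaf C, code 110 */
    {{-1, -1}, 3},  /* 6: leaf D, code 111 */
};

/* Decodes one symbol, or returns -1 if the bits run out mid-code. */
static int huff_decode(const huff_node *tree, bit_reader *br)
{
    assert(tree != NULL && br != NULL);
    int node = 0;
    while (tree[node].child[0] >= 0) {  /* still at an inner node */
        int bit = read_bit(br);
        if (bit < 0)
            return -1;
        node = tree[node].child[bit];
    }
    return tree[node].symbol;
}
```

For example, the bytes 0x5B 0x80 have the first 10 bits 0 10 110 111 0, which decode to A, B, C, D, A.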

How to Build an MP3 Player Module

Hardware Explanation

As shown in the image below, the hardware for MP3 decoding is very simple. Its main component is an STM32F103RET6 microcontroller, and all processing takes place inside it.

MP3 decoder schematic: complete schematic based on the STM32F103RET6

Digital-to-analog conversion uses the microcontroller's internal 12-bit DACs, which support rates of up to 1 megasample per second. The memory card is connected to the microcontroller over SPI. The circuit was deliberately kept as simple and easy to follow as possible.
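A decoder typically produces signed 16-bit PCM, while a 12-bit DAC takes an unsigned code from 0 to 4095 with silence at mid-scale (2048). Assuming that input format, a minimal conversion adds an offset of 32768 and drops the four least significant bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a signed 16-bit PCM sample (-32768..32767) to an unsigned
 * 12-bit DAC code (0..4095), with silence at mid-scale (2048). */
static uint16_t pcm16_to_dac12(int16_t sample)
{
    uint16_t unsigned_sample = (uint16_t)((int32_t)sample + 32768); /* 0..65535 */
    uint16_t code = (uint16_t)(unsigned_sample >> 4);               /* keep top 12 bits */
    assert(code <= 4095);  /* always holds: 16 bits shifted right by 4 fit in 12 */
    return code;
}
```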

Software

The software is written with the Keil compiler. It can play MP3 files at up to 320 kbps, and WAV files at the highest bitrate the hardware supports. To speed up MP3 decoding, parts of the code are written in assembly language, and the available resources are used as fully as possible. Audio data is transferred to the DAC by the DMA unit, so the CPU is barely involved in the transfer.

On startup, the program initializes the memory card and then looks for a file named start.mp3 on it. If the file exists, the program plays it. When playback finishes, it looks for start.wav and plays that.

You can easily turn this code into a portable MP3 Player by adding the functions and sections you need.
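DMA output is often organized as double (ping-pong) buffering. The DMA channel plays a circular buffer in a loop and raises an interrupt at the halfway point and again at the end. The CPU then refills whichever half has just finished playing while the other half is being output.

The portable sketch below models only that bookkeeping. The buffer size and the callback names are hypothetical and not taken from the project's source. On an STM32, these callbacks would be driven by the DMA half-transfer and transfer-complete interrupts. In practice, those handlers often just set a flag so the main loop can do the decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_SAMPLES 1152u   /* one frame's worth per half (assumption) */
#define DAC_MIDSCALE 2048u   /* 12-bit mid-scale = silence */

/* Circular buffer that the DMA channel reads continuously. */
static uint16_t dac_buffer[2 * HALF_SAMPLES];

/* Copies up to HALF_SAMPLES decoded DAC codes into one half of the buffer,
 * padding with silence if the decoder delivered fewer samples. */
static void refill_half(unsigned half, const uint16_t *decoded, size_t count)
{
    assert(half < 2);
    uint16_t *dst = &dac_buffer[half * HALF_SAMPLES];
    if (count > HALF_SAMPLES)
        count = HALF_SAMPLES;
    for (size_t i = 0; i < count; i++)
        dst[i] = decoded[i];
    for (size_t i = count; i < HALF_SAMPLES; i++)
        dst[i] = DAC_MIDSCALE;
}

/* DMA finished reading the first half and is now playing the second,
 * so the first half may be overwritten. */
static void on_dma_half_transfer(const uint16_t *decoded, size_t count)
{
    refill_half(0, decoded, count);
}

/* DMA finished the second half and wrapped around to the first. */
static void on_dma_transfer_complete(const uint16_t *decoded, size_t count)
{
    refill_half(1, decoded, count);
}
```

Padding a short refill with mid-scale values turns a decoder underrun into brief silence instead of replaying stale samples.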

Download Source Code for MP3 Decoder

FAQ – MP3 Software Decoder on Microcontroller

Why could old microcontrollers not play MP3 directly?

Because decoding MP3 frames in real time requires a fast CPU and enough RAM. Older 8-bit AVRs (e.g., ATmega328) were too slow: the decoder could not keep up with the audio bitrate, causing gaps or stuttering. That is why dedicated decoder chips such as the VS1003 and VS1053 were required.

What changed with ARM Cortex-M microcontrollers?

The STM32F1/F4/H7 series brought clock speeds from 72 MHz (F1) to 400 MHz and beyond (H7), large Flash and RAM, hardware DMA, and, on many parts, built-in 12-bit DACs. This made pure-software MP3 decoding fast enough for real-time playback up to 320 kbps without any external decoder IC.

What is an MP3 “frame” and why is it important?

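MP3 files are divided into frames. In MPEG-1 Layer III, each frame carries 1152 samples per channel, which is 24 to 36 ms of audio depending on the sample rate. Each frame has:
• 4-byte header (sync + bitrate + sample rate + channel mode)
• Optional CRC
• Side information
• Huffman-coded main audio data
The player must decode and play each frame exactly on time, with no gaps. Frames are not fully independent, because the Layer III bit reservoir lets a frame's audio data begin in earlier frames.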

What is Huffman coding in MP3?

After psychoacoustic masking, the remaining quantized frequency coefficients are compressed with Huffman coding, a variable-length prefix code that assigns shorter codes to more frequent values. Decoding requires the Huffman tables defined in the MPEG standard: 32 table indices for the main ("big values") region, plus two tables for the "count1" region.

Why is DMA critical for smooth MP3 playback?

The DAC needs a constant stream of samples. For example, 44,100 samples/s × 2 bytes = 88.2 KB/s per channel, or twice that for stereo. If the CPU feeds the DAC manually, it wastes cycles and risks underruns. DMA automatically transfers decoded PCM data from RAM to the DAC, freeing the CPU to decode the next frame.

Can I play 320 kbps MP3 on STM32F103?

Yes — with optimized code (some parts in assembly) and DMA + double buffering, the STM32F103RET6 at 72 MHz can decode and play 320 kbps stereo MP3 smoothly.

Do I still need an external audio DAC?

No. Many STM32F1/F4 parts, including the STM32F103RET6 used here, have built-in 12-bit DACs rated at up to 1 MSPS. Just add a simple RC low-pass filter and an amplifier (e.g., LM386 or PAM8403), and you have line-level or speaker output.
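For a first-order RC low-pass filter, the -3 dB cutoff frequency is f_c = 1 / (2πRC). The small helper below evaluates it. The component values in the example (1 kΩ and 10 nF, giving about 15.9 kHz) are illustrative only and are not taken from the project's schematic.

```c
#include <assert.h>

#define PI_D 3.14159265358979323846

/* -3 dB cutoff frequency (Hz) of a first-order RC low-pass filter:
 * f_c = 1 / (2 * pi * R * C). */
static double rc_cutoff_hz(double r_ohms, double c_farads)
{
    assert(r_ohms > 0.0 && c_farads > 0.0);
    return 1.0 / (2.0 * PI_D * r_ohms * c_farads);
}
```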

Which libraries are most commonly used for MP3 decoding on STM32?

  • Helix MP3 Decoder – fixed-point, very popular (used in many DIY projects)
  • MAD (libmad) – high quality, fixed-point
  • Minimad / STM32-audio – lightweight ports specifically for Cortex-M
