In many microcontroller applications, you need to read data from and write data to external devices through peripherals such as I2C, SPI, USART or ADC. If the processor handles every transfer itself, it wastes a significant amount of processing time, especially in applications that move large amounts of data. To avoid tying up the CPU, most advanced microcontrollers now include a Direct Memory Access (DMA) unit, which transfers data between memory locations and peripherals without CPU involvement. In this post, I will use the STM32 DMA as an example to show the advantages of DMA over a CPU-driven transfer.

Low- and medium-density (LD and MD) STM32 microcontrollers have a single 7-channel DMA unit, while high-density (HD) devices have two DMA controllers with 12 independent channels in total (7 on DMA1 and 5 on DMA2).

The DMA performs automated memory-to-memory transfers as well as peripheral-to-memory, memory-to-peripheral and peripheral-to-peripheral transfers. Each DMA channel can be assigned one of four priority levels: very high, high, medium or low. If two channels with the same priority request a transfer at the same time, the lower-numbered channel goes first. A DMA channel can also be configured to transfer data into a circular buffer, which makes DMA a good fit for peripheral data streams.
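The arbitration rule can be pictured with a small host-side model. This is only a sketch of the rule as described here, not the hardware arbiter; the type dma_request_t and the function dma_arbitrate are hypothetical names, and the priority encoding (0 = low up to 3 = very high) is my assumption.

```c
#include <stdint.h>

/* Hypothetical host-side model of one DMA request line. */
typedef struct {
    int     pending;   /* non-zero when the channel has an active request */
    uint8_t priority;  /* assumed encoding: 0 = low, 1 = medium, 2 = high, 3 = very high */
} dma_request_t;

/* Returns the index of the channel that wins arbitration, or -1 if no
 * channel is requesting. Index 0 stands for Channel1, index 1 for Channel2, ... */
int dma_arbitrate(const dma_request_t *req, int channel_count)
{
    int winner = -1;
    for (int i = 0; i < channel_count; ++i) {
        if (!req[i].pending) {
            continue;
        }
        /* Strictly-greater comparison keeps the lower-numbered channel on a tie. */
        if (winner < 0 || req[i].priority > req[winner].priority) {
            winner = i;
        }
    }
    return winner;
}
```

With Channel2 and Channel3 both pending at medium priority, the model picks Channel2; raising Channel3 to high makes it win instead.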

Regarding physical bus access, it is important to note that the DMA only occupies the bus for the actual data transfer, because the request phase, the address computation and the acknowledge pulse are performed while another DMA channel is using the bus. So when one channel finishes its bus transfer, the next channel is ready to start immediately. This keeps bus occupation low and transfers fast. Another useful property is that the DMA never takes 100% of the bus time. A single word transfer between memory locations takes 5 AHB bus cycles, and three of them remain free for CPU access, so the DMA takes at most 40% of the bus time. Even during an intense DMA transfer, the CPU can still access any memory area or peripheral. If you look at the block diagram, you will see that the CPU has a separate I-bus for Flash access, so instruction fetches are not affected by the DMA.
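The 40% figure follows directly from the cycle counts: 5 cycles per word with 3 left free means the DMA holds the bus for 2 cycles out of 5. A tiny helper, with a name I made up for illustration, makes the arithmetic explicit:

```c
/* Percentage of AHB cycles during which the DMA holds the bus for one transfer.
 * total_cycles: cycles per transfer; free_cycles: cycles left for the CPU. */
unsigned dma_bus_occupation_percent(unsigned total_cycles, unsigned free_cycles)
{
    unsigned used_cycles = total_cycles - free_cycles;
    return (used_cycles * 100u) / total_cycles;
}
```

Calling dma_bus_occupation_percent(5, 3) gives the 40% quoted for memory-to-memory word transfers.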

Programming with DMA

Programming the DMA is easy once you understand how it works. Each channel is controlled by four registers: memory address, peripheral address, number of data items and configuration. All channels share two dedicated registers: the DMA interrupt status register and the interrupt flag clear register. Once configured, the DMA handles address increments without disturbing the CPU. Each DMA channel can generate three interrupts: transfer complete, half transfer and transfer error.
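To make the four per-channel registers more concrete, here is a host-side model that fills them for a 32-bit memory-to-memory copy. It is a sketch, not vendor code: the struct and macro names are illustrative, the bit positions are as I read them in the STM32F1 reference manual, and the addresses are passed as plain numbers so the example builds on a PC. On the target, the same fields live in the channel's register block from the vendor header.

```c
#include <stdint.h>

/* Host-side stand-in for one channel's register block (illustrative type name). */
typedef struct {
    uint32_t CCR;    /* configuration */
    uint32_t CNDTR;  /* number of data items left to transfer */
    uint32_t CPAR;   /* peripheral address */
    uint32_t CMAR;   /* memory address */
} dma_channel_regs_t;

/* Bit positions in the configuration register (illustrative macro names). */
#define CCR_TCIE       (1u << 1)   /* transfer complete interrupt enable */
#define CCR_PINC       (1u << 6)   /* increment peripheral address */
#define CCR_MINC       (1u << 7)   /* increment memory address */
#define CCR_PSIZE_32   (2u << 8)   /* peripheral-side data size: 32 bits */
#define CCR_MSIZE_32   (2u << 10)  /* memory-side data size: 32 bits */
#define CCR_PL_MEDIUM  (1u << 12)  /* priority level: medium */
#define CCR_MEM2MEM    (1u << 14)  /* memory-to-memory mode */

/* Fills the four per-channel registers for a 32-bit memory-to-memory copy.
 * The enable bit (bit 0) stays clear, so the modelled channel is not started. */
void dma_model_setup(dma_channel_regs_t *ch, uint32_t src_addr,
                     uint32_t dst_addr, uint16_t count)
{
    ch->CPAR  = src_addr;  /* direction bit clear: the "peripheral" side is the source */
    ch->CMAR  = dst_addr;
    ch->CNDTR = count;
    ch->CCR   = CCR_MEM2MEM | CCR_PL_MEDIUM | CCR_MSIZE_32 | CCR_PSIZE_32 |
                CCR_MINC | CCR_PINC | CCR_TCIE;
}
```

The point of the model is that one configuration word plus two addresses and a count fully describe a transfer; everything after that is done by the hardware.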

Let’s write a simple program that transfers data between two arrays. To make it more interesting, let’s do the same task with and without DMA, then compare the time taken in both cases.

Here is the code for an STM32 DMA memory-to-memory transfer, written for Keil ARM with the STM32F1 library:

First, we create two arrays: source and destination. The size of both arrays is set by ARRAYSIZE, which is 800 in our example.

We use one output pin to mark the start and stop of the transfer in both modes, DMA and CPU. First, we must turn on the DMA1 clock to make the controller functional. Then we load the settings into DMA_InitStructure. For this example we chose DMA1 Channel1, so we first call DMA_DeInit(DMA1_Channel1), which resets the channel to its default values. Next, we enable memory-to-memory mode, select normal (non-circular) DMA mode and assign Medium priority. After that, we select the size of the data to be transferred (32-bit word). This needs to be done for both the peripheral and the memory side.

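NOTE! If the two data sizes differ, say a 32-bit source and an 8-bit destination, the STM32F1 DMA does not split each word into four byte transfers. It still performs one read and one write per data item, so only the least significant byte of each source word is written to the destination.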
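As I read the data-width table in the STM32F1 reference manual, a 32-bit source paired with an 8-bit destination is handled as one word read followed by one byte write per data item. The following host-side model shows that behaviour; the function name is hypothetical and the model assumes both addresses increment.

```c
#include <stddef.h>
#include <stdint.h>

/* Models `count` DMA data items with a 32-bit source and an 8-bit destination,
 * both addresses incrementing: each item is one word read and one byte write. */
void dma_model_word_to_byte(uint8_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = (uint8_t)(src[i] & 0xFFu);  /* upper three bytes are dropped */
    }
}
```

For a source word 0x11223344 the destination receives only 0x44, so mismatched sizes lose data unless that truncation is what you want.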

Then we load the destination and source start addresses and the number of data items to transfer, and apply the settings with DMA_Init(DMA1_Channel1, &DMA_InitStructure). After this call the DMA is ready to transfer. The transfer can be started at any time with DMA_Cmd(DMA1_Channel1, ENABLE).
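The original listing is not reproduced in this copy of the post, so here is a minimal sketch of the setup steps just described, using the STM32F10x Standard Peripheral Library. The function name dma_mem2mem_init, the choice of source as the "peripheral" side, and the address-increment settings are my assumptions. It builds only for the target with the vendor header, so treat it as a configuration fragment rather than a tested program.

```c
#include "stm32f10x.h"   /* STM32F10x Standard Peripheral Library header */

#define ARRAYSIZE 800

volatile uint32_t source[ARRAYSIZE];
volatile uint32_t destination[ARRAYSIZE];

/* Hypothetical helper: prepares DMA1 Channel1 for a 32-bit memory-to-memory copy. */
void dma_mem2mem_init(void)
{
    DMA_InitTypeDef DMA_InitStructure;

    /* The DMA1 controller sits on the AHB bus; its clock must be enabled first. */
    RCC_AHBPeriphClockCmd(RCC_AHBPeriph_DMA1, ENABLE);

    /* Reset the channel registers to their default values. */
    DMA_DeInit(DMA1_Channel1);

    DMA_InitStructure.DMA_M2M = DMA_M2M_Enable;            /* memory to memory */
    DMA_InitStructure.DMA_Mode = DMA_Mode_Normal;          /* one shot, not circular */
    DMA_InitStructure.DMA_Priority = DMA_Priority_Medium;
    DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;
    DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Word;

    /* Assumption: both addresses increment, and the "peripheral" side is the source. */
    DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Enable;
    DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
    DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)source;
    DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t)destination;
    DMA_InitStructure.DMA_BufferSize = ARRAYSIZE;          /* number of data items */

    DMA_Init(DMA1_Channel1, &DMA_InitStructure);
}
```

After dma_mem2mem_init() returns, calling DMA_Cmd(DMA1_Channel1, ENABLE) starts the copy.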

To catch the end of the DMA transfer, we enable the Transfer Complete (TC) interrupt on Channel1.

In the interrupt handler, we can toggle the LED and set a status flag that signals the CPU transfer test to start.
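A sketch of the interrupt side, again with the STM32F10x Standard Peripheral Library and again not the original listing. The flag dma_done, the helper name dma_tc_interrupt_init, the priority values and the choice of GPIOC pin 8 as the LED to switch are assumptions for illustration. It is a target-only fragment.

```c
#include "stm32f10x.h"   /* STM32F10x Standard Peripheral Library header */

volatile uint8_t dma_done = 0;   /* hypothetical flag polled by the main loop */

/* Hypothetical helper: enables the Transfer Complete interrupt for DMA1 Channel1. */
void dma_tc_interrupt_init(void)
{
    NVIC_InitTypeDef NVIC_InitStructure;

    /* Let the channel raise an interrupt when the last data item is transferred. */
    DMA_ITConfig(DMA1_Channel1, DMA_IT_TC, ENABLE);

    /* Route the channel interrupt through the NVIC (priority values are arbitrary). */
    NVIC_InitStructure.NVIC_IRQChannel = DMA1_Channel1_IRQn;
    NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
    NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
    NVIC_Init(&NVIC_InitStructure);
}

/* Handler name as defined in the STM32F10x startup file. */
void DMA1_Channel1_IRQHandler(void)
{
    if (DMA_GetITStatus(DMA1_IT_TC1)) {
        /* Clear all Channel1 interrupt flags so the handler is not re-entered. */
        DMA_ClearITPendingBit(DMA1_IT_GL1);
        GPIO_ResetBits(GPIOC, GPIO_Pin_8);  /* assumed: marks the end of the DMA run */
        dma_done = 1;                       /* tell the main loop to start the CPU test */
    }
}
```

Keeping the handler this short means the measured pulse reflects the transfer itself rather than interrupt processing.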

This is the CPU-based memory copy routine:
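The original routine is not reproduced in this copy of the post. A minimal sketch of such a routine, assuming a plain word-by-word loop and a function name of my choosing, looks like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Copies `count` 32-bit words one at a time; the CPU is busy for the whole loop. */
void cpu_copy_words(volatile uint32_t *dst, const volatile uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

The volatile qualifiers are there so the compiler keeps every load and store, which is what you want when timing the loop against the DMA.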

Since the two LEDs, LEDG on GPIOC pin 9 and LEDB on GPIOC pin 8, are switched at the start and end of each transfer, we can track the start and stop pulses with a scope:

So transferring 800 32-bit words using DMA took 214 μs:

While the CPU memory copy algorithm took 544 μs:

This is a significant improvement in transfer speed: the DMA is about 2.5 times faster (214 μs versus 544 μs). The biggest benefit of DMA, though, is that the CPU is completely free during the transfer and can do other tasks or simply go into sleep mode to save power.

With DMA we can do a lot of work purely in hardware. We will come back to it when we cover other STM32 features such as I2C, SPI and ADC.

The next post will show an example of using DMA to transfer MPU6050 data directly from I2C to STM32 memory.

Modified from source:



6 Thoughts on “How to use STM32 DMA”

  1. Gregory Smith on 27/10/2014 at 10:04 PM said:

    I love your blog
    I have read this article and enjoyed it

  2. Le Tan Phuc on 27/10/2014 at 10:14 PM said:

    Thanks 🙂

  3. Iran Espinoza on 15/07/2015 at 5:19 AM said:

    Hello Le Tan Phuc, I have read a little more about what I am doing. Basically it uses the SPI, the DMA and 2 timers: one that generates the clock signal at the beginning (HSYNC and VSYNC), and a second one as a phase mode counter. I read your DMA tutorial, but I think you did not use CubeMX. Can it be done with CubeMX?

    • Le Tan Phuc on 18/07/2015 at 1:11 AM said:

      Hi Iran, this post and the MPU6050 i2c dma post I did last year were not using cubemx. I used stm32f1 and the standard library to test it. Basically, your application requires both Spi and timer running with dma to ensure the timing sequence. Hopefully, I will be free in the following months then I can help you go through it 🙂

      • Iran Espinoza on 18/07/2015 at 1:48 AM said:

        hi lethanpuc, I like those words! what I’m doing right now, is to generate the synchronization signals, chose 640×480 @ 85Hz, by having a 36 MHz clock pixel, I configured the STM32 to 36 MHz, to the frequency square!

  4. fatih kurnaz on 23/06/2016 at 8:28 PM said:

    hello Le Tan Phuc,

    I am attempting SPI communication (STM32 as master) using DMA in CubeMX. Although the first byte is sent, the code then waits and the MISO pin keeps outputting the first byte repeatedly. Do you have any suggestions for solving this problem?
