How to use STM32 DMA

In many microcontroller applications, you need to read and write data from and to external devices through peripherals such as I2C, SPI, USART or the ADC. If the processor does this work, it wastes a significant amount of processing time, especially in applications that transfer large amounts of data. To avoid tying up the CPU, most advanced microcontrollers nowadays have a Direct Memory Access (DMA) unit. This unit transfers data between memory locations (including memory-mapped peripheral registers) without CPU processing. In this post, I will take the STM32 DMA as an example to show the advantages of using DMA over the normal transfer method.

Low- and medium-density (LD and MD) STM32 microcontrollers have a single 7-channel DMA controller, while high-density (HD) devices have two DMA controllers with 12 independent channels in total (7 on DMA1 and 5 on DMA2).

The DMA performs memory-to-memory transfers as well as peripheral-to-memory, memory-to-peripheral and peripheral-to-peripheral transfers. Each DMA channel can be assigned one of four priority levels: very high, high, medium and low. If two channels with the same priority request at the same time, the channel with the lower number wins (channel 2 gets priority over channel 4, for example). A DMA channel can also be configured in circular mode, transferring data into a circular buffer. This makes DMA an ideal solution for any peripheral data stream application.
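To make the tie-break rule concrete, here is a small host-side C model of the arbiter. It is only an illustration, not ST code: `struct dma_request` and `dma_arbitrate()` are hypothetical names, and the priority values are assumed to follow the encoding of the two PL bits in the channel configuration register (00 = low, 01 = medium, 10 = high, 11 = very high).

```c
#include <stddef.h>

/* Priority levels, encoded like the two PL bits of DMA_CCRx */
enum dma_priority {
    DMA_PL_LOW = 0,
    DMA_PL_MEDIUM = 1,
    DMA_PL_HIGH = 2,
    DMA_PL_VERY_HIGH = 3
};

/* One pending request: which channel asked, and at what priority level */
struct dma_request {
    int channel;
    enum dma_priority priority;
};

/* Hypothetical arbiter model: the highest priority level wins; on a tie,
   the lower channel number wins. Returns the winning channel number,
   or -1 when there are no pending requests. */
int dma_arbitrate(const struct dma_request *req, size_t count)
{
    int best = -1;   /* index of the current winner in req[] */
    size_t k;

    for (k = 0; k < count; k++) {
        if (best < 0 ||
            req[k].priority > req[best].priority ||
            (req[k].priority == req[best].priority &&
             req[k].channel < req[best].channel)) {
            best = (int)k;
        }
    }
    return best < 0 ? -1 : req[best].channel;
}
```

For example, with channels 4 and 2 both requesting at medium priority and channel 5 at low priority, `dma_arbitrate()` returns 2; a single high-priority request on channel 6 would beat a low-priority one on channel 3.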

Speaking of physical DMA bus access, it is important to note that the DMA occupies the bus only for the actual data transfer: the request phase, address computation and acknowledge pulse of one channel are performed while another DMA channel is transferring on the bus. So when one DMA channel finishes its bus transfer, another channel is already ready to transfer immediately. This ensures minimal bus occupation and fast transfers. Another interesting feature of DMA bus access is that it does not occupy 100% of the bus time. A single word transfer between memory locations takes 5 AHB cycles, and three of them are still left for CPU access, so the DMA uses at most 2 of every 5 cycles, i.e. 40% of the bus time. Even during intense DMA transfers, the CPU can therefore still access any memory area or peripheral. The block diagram also shows that the CPU has a separate Ibus for Flash access, so program fetches are not affected by DMA.

Programming with DMA

Simply put, programming the DMA is easy once you understand how it works. Each channel is controlled by four registers: memory address (DMA_CMARx), peripheral address (DMA_CPARx), number of data (DMA_CNDTRx) and configuration (DMA_CCRx). In addition, all channels share two registers: the interrupt status register (DMA_ISR) and the interrupt flag clear register (DMA_IFCR). Once configured, the DMA increments the memory and peripheral addresses by itself (when increment is enabled), without involving the CPU. Each DMA channel can generate three interrupts: transfer complete, half transfer and transfer error.
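How these registers divide the work can be sketched as a tiny host-side model of one channel, assuming equal source and destination item sizes. This is not how you program the hardware (there you write the real registers or call DMA_Init()); `struct dma_model`, `dma_model_start()` and `dma_model_step()` are hypothetical names. The model shows that the programmed addresses stay unchanged while internal pointers advance, that the counter runs down to zero with the half-transfer and transfer-complete flags raised on the way, and that circular mode reloads the counter and pointers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of one DMA channel (illustration only).
   cpar, cmar and ndt stand for the values programmed into DMA_CPARx,
   DMA_CMARx and DMA_CNDTRx; cur_p and cur_m stand for the internal
   current-address registers, which software cannot read on the chip.
   Source and destination item sizes are assumed to be equal. */
struct dma_model {
    uint32_t cpar, cmar;   /* programmed start addresses, never modified */
    uint32_t cur_p, cur_m; /* internal current addresses */
    uint32_t ndt;          /* programmed number of items */
    uint32_t cndtr;        /* items still to transfer (counts down) */
    uint32_t item_size;    /* 1, 2 or 4 bytes */
    bool pinc, minc, circular;
    bool htif, tcif;       /* half transfer / transfer complete flags */
};

/* Enabling the channel: load the counter and the internal pointers */
void dma_model_start(struct dma_model *ch)
{
    ch->cur_p = ch->cpar;
    ch->cur_m = ch->cmar;
    ch->cndtr = ch->ndt;
    ch->htif = false;
    ch->tcif = false;
}

/* One item transfer (the data move itself is not modelled).
   Returns false if a normal-mode channel has already finished. */
bool dma_model_step(struct dma_model *ch)
{
    if (ch->cndtr == 0)
        return false;
    if (ch->pinc)
        ch->cur_p += ch->item_size;
    if (ch->minc)
        ch->cur_m += ch->item_size;
    ch->cndtr--;
    if (ch->cndtr == ch->ndt / 2)
        ch->htif = true;   /* half of the items done (rounded up for odd counts here) */
    if (ch->cndtr == 0) {
        ch->tcif = true;   /* all items done */
        if (ch->circular) {
            /* circular mode: reload the counter and restart at the base addresses */
            ch->cndtr = ch->ndt;
            ch->cur_p = ch->cpar;
            ch->cur_m = ch->cmar;
        }
    }
    return true;
}
```

With ndt = 4, the half-transfer flag is set after the second step and the transfer-complete flag after the fourth; in circular mode the counter is then back at 4 and the pointers are back at the programmed addresses.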

Let's write a simple program that transfers data between two arrays. To make it more interesting, let's do the same task with and without DMA. Then we can compare the time taken in both cases.

Here is the code for an STM32 DMA memory-to-memory transfer, written for Keil ARM with the STM32F10x Standard Peripheral Library:

#include "stm32f10x.h"

#define ARRAYSIZE 800

volatile uint32_t status = 0;   //set to 1 by DMA1_Channel1_IRQHandler
volatile uint32_t i;

//source and destination arrays (static: 2 x 3200 bytes could overflow a small default stack)
static uint32_t source[ARRAYSIZE];
static uint32_t destination[ARRAYSIZE];

int main(void)
{
  DMA_InitTypeDef  DMA_InitStructure;
  NVIC_InitTypeDef NVIC_InitStructure;
  GPIO_InitTypeDef GPIO_InitStructure;

  //fill source array with test data
  for (i=0; i<ARRAYSIZE;i++)
    source[i] = i;

  //initialize LEDs: LEDG (PC9) and LEDB (PC8) as push-pull outputs
  RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOC, ENABLE);
  GPIO_InitStructure.GPIO_Pin = GPIO_Pin_8 | GPIO_Pin_9;
  GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP;
  GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
  GPIO_Init(GPIOC, &GPIO_InitStructure);

  //enable DMA1 clock
  RCC_AHBPeriphClockCmd(RCC_AHBPeriph_DMA1, ENABLE);
  //reset DMA1 channel1 to default values
  DMA_DeInit(DMA1_Channel1);
  //channel will be used for memory to memory transfer
  DMA_InitStructure.DMA_M2M = DMA_M2M_Enable;
  //setting normal mode (non circular)
  DMA_InitStructure.DMA_Mode = DMA_Mode_Normal;
  //medium priority
  DMA_InitStructure.DMA_Priority = DMA_Priority_Medium;
  //source and destination data size word=32bit
  DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Word;
  DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Word;
  //automatic increment of destination (memory) and source (peripheral) addresses
  DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
  DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Enable;
  //location assigned to peripheral register will be source
  DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
  //number of data items (32-bit words) to be transferred
  DMA_InitStructure.DMA_BufferSize = ARRAYSIZE;
  //source and destination start addresses
  DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)source;
  DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t)destination;
  //send values to DMA registers
  DMA_Init(DMA1_Channel1, &DMA_InitStructure);
  //enable DMA1 Channel1 Transfer Complete interrupt
  DMA_ITConfig(DMA1_Channel1, DMA_IT_TC, ENABLE);

  //enable DMA1 Channel1 IRQ channel in the NVIC
  NVIC_InitStructure.NVIC_IRQChannel = DMA1_Channel1_IRQn;
  NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
  NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
  NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
  NVIC_Init(&NVIC_InitStructure);

  //LEDG on before transfer (switched off in the interrupt handler)
  GPIO_SetBits(GPIOC, GPIO_Pin_9);
  //enable DMA1 Channel1 transfer
  DMA_Cmd(DMA1_Channel1, ENABLE);
  //wait for DMA transfer to be finished
  while(status==0) {};

  //CPU copy: LEDB on for the duration of the loop
  GPIO_SetBits(GPIOC, GPIO_Pin_8);
  for (i=0; i<ARRAYSIZE;i++)
    destination[i] = source[i];
  GPIO_ResetBits(GPIOC, GPIO_Pin_8);

  while (1)
  {
    //interrupts do the job
  }
}

First, we create two arrays: source and destination. The size of both arrays is set by ARRAYSIZE, which is 800 in our example.

We use one LED output per mode to mark the start and end of each transfer: one for DMA and one for CPU. First of all, we must turn on the DMA1 clock to make it functional. Then we load the settings into DMA_InitStructure. For this example, we chose DMA1 Channel1, so we first call DMA_DeInit(DMA1_Channel1), which simply resets the channel to its default values. Then we set memory-to-memory mode and select normal (non-circular) DMA mode. As priority we assign Medium. After that, we select the size of the data to be transferred (32-bit word). This needs to be done for both the peripheral and the memory side. Both addresses are set to increment after each item, and DMA_DIR_PeripheralSRC makes the "peripheral" address the source of the copy.

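NOTE! If the peripheral and memory data sizes differ, the DMA does not split items into several transfers: each of the DMA_BufferSize transfers reads one source item and writes one destination item. With a 32-bit source and an 8-bit destination, only the least significant byte of each word is written and the upper three bytes are discarded; in the opposite case (8-bit source, 32-bit destination), each byte is zero-extended to a word.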
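A host-side model of this behaviour makes the difference easy to check. `dma_width_model()` is a hypothetical helper, not part of any library; it assumes both addresses increment and little-endian byte order (as on STM32), and simulates n transfers from src_size-byte items to dst_size-byte items.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what a DMA channel writes when source and
   destination data sizes differ (both pointers incrementing, little-endian).
   Each of the n transfers reads one src_size-byte item and writes one
   dst_size-byte item: narrowing keeps the low bytes, widening pads with
   zeros. No item is ever split into several transfers. */
void dma_width_model(const uint8_t *src, size_t src_size,
                     uint8_t *dst, size_t dst_size, size_t n)
{
    size_t k, b;

    for (k = 0; k < n; k++) {
        for (b = 0; b < dst_size; b++) {
            /* byte b of item k: copy it if the source item has it, else 0 */
            dst[k * dst_size + b] = (b < src_size) ? src[k * src_size + b] : 0;
        }
    }
}
```

Copying the two words 0x11223344 and 0x55667788 to an 8-bit destination this way produces just the two bytes 0x44 and 0x88, not eight bytes.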

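Then we load the source and destination start addresses and the amount of data to be sent (DMA_BufferSize counts items of the configured size, here 32-bit words, not bytes). These values are then written to the channel registers with DMA_Init(DMA1_Channel1, &DMA_InitStructure). After this operation, the DMA is ready to do transfers, and it can be started at any time with the DMA_Cmd(DMA1_Channel1, ENABLE) command.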
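For readers who prefer the register view, the settings above can be folded into a single value for the channel configuration register (DMA_CCRx). The sketch below is host-side C and not from the original post: the bit masks are written out locally following the CCR layout in the STM32F10x reference manual (RM0008), whereas on a target you would use the DMA_CCR1_* macros from stm32f10x.h; `dma_ccr_mem2mem_word()` is a hypothetical helper.

```c
#include <stdint.h>

/* DMA_CCRx bit layout (STM32F10x, RM0008), defined locally for the sketch */
#define CCR_EN       (1u << 0)   /* channel enable */
#define CCR_TCIE     (1u << 1)   /* transfer complete interrupt enable */
#define CCR_HTIE     (1u << 2)   /* half transfer interrupt enable */
#define CCR_TEIE     (1u << 3)   /* transfer error interrupt enable */
#define CCR_DIR      (1u << 4)   /* 1: read from memory address, 0: read from peripheral address */
#define CCR_CIRC     (1u << 5)   /* circular mode */
#define CCR_PINC     (1u << 6)   /* peripheral address increment */
#define CCR_MINC     (1u << 7)   /* memory address increment */
#define CCR_PSIZE_32 (2u << 8)   /* PSIZE[9:8]: 00 = 8, 01 = 16, 10 = 32 bits */
#define CCR_MSIZE_32 (2u << 10)  /* MSIZE[11:10]: 00 = 8, 01 = 16, 10 = 32 bits */
#define CCR_PL_MED   (1u << 12)  /* PL[13:12]: 00 low, 01 medium, 10 high, 11 very high */
#define CCR_MEM2MEM  (1u << 14)  /* memory-to-memory mode */

/* CCR value for the tutorial's configuration (DMA_Init plus DMA_ITConfig):
   memory to memory, normal (non-circular) mode, medium priority, 32-bit
   items, both addresses incremented, peripheral address as source (DIR = 0),
   transfer complete interrupt enabled. EN is left clear; setting it is
   what DMA_Cmd(..., ENABLE) does. */
uint32_t dma_ccr_mem2mem_word(void)
{
    return CCR_MEM2MEM | CCR_PL_MED | CCR_MSIZE_32 | CCR_PSIZE_32 |
           CCR_MINC | CCR_PINC | CCR_TCIE;
}
```

The result is 0x5AC2, or 0x5AC3 once the enable bit is set. Writing DMA1_Channel1->CPAR, ->CMAR and ->CNDTR and then ->CCR directly would be a register-level alternative to the library calls; the post itself uses the library.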

To catch the end of the DMA transfer, we enable the Transfer Complete (TC) interrupt of channel 1 with DMA_ITConfig() and enable the DMA1_Channel1_IRQn interrupt in the NVIC:

NVIC_InitTypeDef NVIC_InitStructure;
//Enable DMA1 channel1 IRQ channel
NVIC_InitStructure.NVIC_IRQChannel = DMA1_Channel1_IRQn;
NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
NVIC_Init(&NVIC_InitStructure);

In the interrupt handler, we switch the LED off (it was switched on just before the transfer started) and set the status flag, which signals main() to start the CPU transfer test.

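void DMA1_Channel1_IRQHandler(void)
{
  //Test on DMA1 Channel1 Transfer Complete interrupt
  if (DMA_GetITStatus(DMA1_IT_TC1) != RESET)
  {
    //DMA copy finished: LEDG (PC9) off, then let main() start the CPU copy
    GPIO_ResetBits(GPIOC, GPIO_Pin_9);
    status = 1;
    //Clear DMA1 Channel1 Half Transfer, Transfer Complete and Global interrupt pending bits
    DMA_ClearITPendingBit(DMA1_IT_GL1);
  }
}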
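The DMA1_IT_TC1 and DMA1_IT_GL1 constants in the handler are bit masks for the shared DMA_ISR and DMA_IFCR registers, in which every channel owns a group of four bits. The helper below is a host-side sketch (`enum dma_flag` and `dma_flag_mask()` are hypothetical names) that computes such masks from the RM0008 layout: channel n uses bits 4(n-1) to 4(n-1)+3 for GIF, TCIF, HTIF and TEIF.

```c
#include <stdint.h>

/* Order of the four per-channel flags inside DMA_ISR / DMA_IFCR */
enum dma_flag {
    FLAG_GIF  = 0,  /* global interrupt (set if any of the three below is set) */
    FLAG_TCIF = 1,  /* transfer complete */
    FLAG_HTIF = 2,  /* half transfer */
    FLAG_TEIF = 3   /* transfer error */
};

/* Mask for one flag of one channel (1..7 on DMA1). Writing the mask to
   DMA_IFCR clears that flag; writing the GIF mask clears all four flags
   of the channel, which is why the handler clears DMA1_IT_GL1. */
uint32_t dma_flag_mask(unsigned channel, enum dma_flag flag)
{
    return 1u << (4u * (channel - 1u) + (unsigned)flag);
}
```

For channel 1, the transfer-complete mask is 0x00000002 and the global mask is 0x00000001, which should correspond to DMA1_IT_TC1 and DMA1_IT_GL1 in the Standard Peripheral Library.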

This is the CPU-based memory copy routine:

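//wait for DMA transfer to be finished
while(status==0) {};
//LEDB (PC8) on for the duration of the CPU copy
GPIO_SetBits(GPIOC, GPIO_Pin_8);
for (i=0; i<ARRAYSIZE;i++)
    destination[i] = source[i];
GPIO_ResetBits(GPIOC, GPIO_Pin_8);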

With two LEDs, LEDG on GPIOC pin 9 and LEDB on GPIOC pin 8, we can track the start and stop pulses of each transfer with a scope.

Transferring 800 32-bit words using DMA took 214 μs.

The CPU memory copy loop took 544 μs.

This is a significant improvement in transfer speed: 544 μs / 214 μs ≈ 2.5, so the DMA copy was about two and a half times faster. With DMA, the biggest benefit is that the CPU is free during the transfer (apart from sharing the bus) and can do other tasks or simply go into sleep mode to save power.

With DMA, a lot of work can be done entirely in hardware. We will come back to it when we cover other STM32 peripherals such as I2C, SPI and the ADC.

The next post will show an example of using DMA to transfer MPU6050 data directly from I2C to STM32 memory.

Modified from source: