by jar » Tue Jan 31, 2017 5:46 pm
Hi Nick,
2D DMAs are a challenge to comprehend and the API isn't obvious.
The 4s are there because that is the size of a word (4 bytes) in my example. An 8-byte double-word is the preferred transfer size since the network is 64-bit. However, sometimes that won't work out because the usable size is data-dependent (alignment and sizes). Notzed's code assumes 4 bytes per word and aligned memory (he increments pointers to int, which implicitly advances them 4 bytes at a time). The outer stride is the big N (for the source) minus the little n (for the destination).
Most of the difficulty I had was in understanding the outer stride for the source and destination. The inner stride concept is easy. Because the inner stride of the destination was increasing by 4 and I wanted the block to copy to contiguous (linear) memory, the outer stride of the destination is also 4. This results in a dsta += 4 - 4 (nop) operation at the end of each row. Because the source pointer was already incremented n times in the inner loop, we have to subtract that off: src += 4*(N-n+1) - 4, which is a net advance of 4*(N-n) bytes to the start of the next row of the block.
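Here is a minimal sketch of that pointer arithmetic in plain C, in case it helps to see it written out. It is only a software model that runs on a host, not the e-lib API and not Notzed's code: copy2d_model and copy_block_words are names made up for illustration, and the "add outer stride minus inner stride at the end of each row" rule is taken from the description above rather than from the hardware manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of a 2D strided copy (not the e-lib API).
 * All strides are in bytes. After the inner loop has applied the inner
 * stride inner_count times, the row ends with "+= outer - inner",
 * as described in the post. */
static void copy2d_model(uint8_t *dst, const uint8_t *src,
                         size_t inner_count, size_t outer_count,
                         ptrdiff_t src_inner, ptrdiff_t src_outer,
                         ptrdiff_t dst_inner, ptrdiff_t dst_outer,
                         size_t elem_size)
{
    ptrdiff_t s = 0; /* byte offset into src */
    ptrdiff_t d = 0; /* byte offset into dst */

    for (size_t row = 0; row < outer_count; row++) {
        for (size_t col = 0; col < inner_count; col++) {
            memcpy(dst + d, src + s, elem_size);
            s += src_inner;
            d += dst_inner;
        }
        /* End of row: undo one inner step, then apply the outer stride. */
        s += src_outer - src_inner;
        d += dst_outer - dst_inner;
    }
}

/* Copy an n x n block of 4-byte words out of a row-major matrix that is
 * N words wide into contiguous memory, using the stride values from the
 * post: source outer stride 4*(N-n+1), destination outer stride 4. */
static void copy_block_words(uint32_t *dst, const uint32_t *src_block,
                             size_t N, size_t n)
{
    assert(n <= N);
    copy2d_model((uint8_t *)dst, (const uint8_t *)src_block,
                 n, n,
                 4, 4 * (ptrdiff_t)(N - n + 1),
                 4, 4,
                 sizeof(uint32_t));
}
```

For a 4x4 matrix holding 0..15, calling copy_block_words(out, &m[5], 4, 2) picks the 2x2 block whose top-left element is at row 1, column 1, and leaves out holding 5, 6, 9, 10.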
I wanted to use this to simply copy rectangular (or square) blocks. But you could probably use this generic interface to copy "parallelogram shapes of data", transpose data using a combination of negative strides, or copy just even columns. I would have to think about those some more.
The "dma.inner_stride = 0x00010001 << shift;" is just for configuring the register that kicks off the DMA. 0x00010001 is byte alignment/transfers. 0x00080008 is double-word alignment/transfers. Using a shift operator makes this easy. You could also use other weird combinations like 0x00020004, but that gets harder to imagine.
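To make the shift trick concrete, here is a small sketch of how those register values come out. The helper names are hypothetical (they are not part of e-lib), and the field layout, with the destination stride in the upper 16 bits and the source stride in the lower 16 bits, is my assumption; check it against the Epiphany architecture reference before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same expression as in the post.
 * shift = 0 -> 1-byte, 1 -> 2-byte, 2 -> 4-byte, 3 -> 8-byte strides
 * for both source and destination. */
static uint32_t stride_from_shift(unsigned shift)
{
    assert(shift <= 3);
    return UINT32_C(0x00010001) << shift;
}

/* Hypothetical helper for mixed strides such as 0x00020004.
 * ASSUMPTION: upper 16 bits = destination stride in bytes,
 *             lower 16 bits = source stride in bytes. */
static uint32_t pack_stride(uint16_t src_bytes, uint16_t dst_bytes)
{
    return ((uint32_t)dst_bytes << 16) | (uint32_t)src_bytes;
}
```

With these, stride_from_shift(2) gives 0x00040004 for word transfers, stride_from_shift(3) gives 0x00080008 for double-word transfers, and under the layout assumed here pack_stride(4, 2) reproduces the 0x00020004 example.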
Hope that helps.