by mikebell » Thu Feb 26, 2015 9:33 pm
Beware that the implementation of memcpy is located in shared memory by default, rather than in local core memory (unless you modify the linker definition file).
I'd generally try to avoid copying memory on the core, but if you do need to, just use a for loop. Make sure you pass -fno-tree-loop-distribute-patterns to the compiler, though; otherwise it will replace the loop with a call to memcpy, which will be slower because memcpy executes from shared memory.
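As a rough sketch of the for-loop approach (the function name copy_words and the choice of unsigned int elements are just my assumptions for illustration, not anything from the SDK):

```c
#include <stddef.h>

/* Hypothetical helper: copies n unsigned int elements from src to dst
   using a plain loop. Build with -fno-tree-loop-distribute-patterns so
   GCC keeps this as a loop instead of replacing it with a memcpy call. */
void copy_words(unsigned int *dst, const unsigned int *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

The flag goes on the compile line next to your optimization level, for example -O2 -fno-tree-loop-distribute-patterns. If in doubt, check the generated assembly for a call to memcpy to confirm the loop survived.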