by MiguelTasende » Mon Aug 22, 2016 1:33 pm
Hi Peter,
Thank you for your interest in the work, and for your comments.
No, I did not use any interrupts from the Epiphany to the host. The host code polls a register to detect the end of the calculations. If it were possible "not to accumulate", I think the performance could grow a lot, but I don't think the obstacle is how the Epiphany signals the end of the calculations. The problem is that after the Epiphany signals completion, the host has to read the resulting data from RAM. It turns out (I still don't have a really good, complete explanation) that reading the "shared zone" of the RAM is very slow for the host. Reading other areas of the RAM is fast, but the shared area is slow. That seems to be the limiting factor that keeps me from abandoning the accumulator. In any case, I like the idea of using interrupts to signal the host, and it could improve the software in other, more creative ways, but that is not the limitation I see right now.
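To make the polling concrete, here is a minimal sketch in plain C, not the actual host code. It assumes the "done" flag can be read through a pointer; on the real board the read would go through the e-hal library (for example `e_read`) instead. The function name `poll_until_done`, the poll limit, and the flag value are all hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll a status word until it equals done_value,
 * giving up after max_polls reads so the host cannot hang forever.
 * polls_used (optional) receives the number of reads performed. */
bool poll_until_done(const volatile uint32_t *status, uint32_t done_value,
                     unsigned long max_polls, unsigned long *polls_used)
{
    for (unsigned long i = 0; i < max_polls; ++i) {
        /* volatile forces a fresh read of the status word on each pass */
        if (*status == done_value) {
            if (polls_used) {
                *polls_used = i + 1;
            }
            return true;
        }
    }
    if (polls_used) {
        *polls_used = max_polls;
    }
    return false; /* the device never signaled completion */
}
```

The poll itself is cheap. The slow part described above is the read of the results from the shared area that follows, which an interrupt would not remove.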
I am using DMA in the code (on the device side; I've read about host DMA in the FPGA but haven't tried that at all). For now I am using it only because it is faster than fetching the data one element at a time; I am not running it concurrently with the rest of the device code. That could be done, but for now the device code that is not transferring data takes too little time to make it relevant. In other words, I could parallelize the data transfer on the device with the actual multiplication, but at the moment the multiplication takes too little time for that to gain much, and to make it worse the host data movement is still a limitation. With other improvements made first, that could change.
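For reference, overlapping the transfer with the computation would usually take the form of double buffering. The sketch below only shows that structure and is not what my code does today. It is plain C with a blocking `memcpy` standing in for the DMA engine and a simple sum standing in for the multiplication, so nothing actually runs concurrently here. `CHUNK`, `fetch_chunk`, `process_chunk` and `sum_double_buffered` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* hypothetical chunk size, in floats */

/* Stand-in for starting a DMA transfer; here it is a blocking copy. */
static void fetch_chunk(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof(float));
}

/* Stand-in for the compute step; here it just sums the chunk. */
static float process_chunk(const float *buf, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        s += buf[i];
    }
    return s;
}

/* Ping-pong between two local buffers: while one is being processed,
 * the next chunk is fetched into the other. */
float sum_double_buffered(const float *src, size_t total)
{
    float buf[2][CHUNK];
    float acc = 0.0f;
    size_t nchunks = (total + CHUNK - 1) / CHUNK;
    int cur = 0;

    if (nchunks == 0) {
        return 0.0f;
    }
    /* Prime the pipeline with the first chunk. */
    fetch_chunk(buf[0], src, total < CHUNK ? total : CHUNK);

    for (size_t c = 0; c < nchunks; ++c) {
        size_t off = c * CHUNK;
        size_t n = (total - off < CHUNK) ? total - off : CHUNK;

        if (c + 1 < nchunks) {
            size_t noff = off + CHUNK;
            size_t nn = (total - noff < CHUNK) ? total - noff : CHUNK;
            /* With real DMA this call would return immediately. */
            fetch_chunk(buf[cur ^ 1], src + noff, nn);
        }
        acc += process_chunk(buf[cur], n);
        /* With real DMA, wait here for the pending transfer to finish. */
        cur ^= 1;
    }
    return acc;
}
```

This only pays off when the compute step takes about as long as the transfer. As said above, that is not the case yet, since the multiplication is short compared with the data movement.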
Last edited by MiguelTasende on Tue Aug 23, 2016 8:11 pm, edited 1 time in total.