Parallella for bio-inspired neural networks
Posted: Wed Apr 01, 2015 7:27 pm
Hello,
Currently a PhD student in neuroscience, I received a Parallella from one of my co-workers (woohoo, it's Christmas!) and would like to use it to accelerate one of my "large scale" neural simulations. Before allocating any time to the task, I would like to know a few things about the feasibility of the project.
I have read the board specifications (architecture, core frequency and memory) but still cannot get a feel for what the main restrictions will be.
The current simulation is built in C++ using Boost (only for its queue implementation, which can be rewritten if needed) and, soon enough, OpenMP for multi-threading (and quite possibly MPI to connect several boards/computers together).
The simulation currently counts about a million neurons. A typical neuron holds about 15 float variables and a handful of ODEs. Each neuron inherits from an abstract class and contains a C array of pointers to Synapse objects (up to a thousand). Here are my questions:
- Would using C++ classes rather than C function/structure pointers generate a memory overhead, given the 32 KB of local memory each Epiphany core possesses? I have no clear idea what using C++ instead of C implies in terms of memory management and overhead. As far as I understand, C++ is mostly a syntactic abstraction over pointer acrobatics that remain possible, just much more painful, in pure C.
- Is using that small part of the Boost library a major hindrance with respect to core memory (yes, memory again)?
- When using OpenMP to dispatch a queue of threads to the cores, does a core's local memory hold only the function/variables currently being executed (analogous to the low-level cache of an Intel processor, for instance), or is all the memory needed for the complete thread execution imported at once?
- How expensive, time-wise, is copying from global memory to core-local memory?
The code involves many other aspects to manage space, create the neurons and wire them together (about 300 MB of RAM on my machine), but this takes only a few seconds on my desktop computer and is executed once, before the actual neural simulation begins, which is where the acceleration is needed.
Ultimately, I would like to let the simulation run on the board in a robotic environment as a substitute for traditional vision systems (I am working on a retina model).
Thank you in advance for your time and advice.