Getting Epiphany AND the FPGA running at full throttle is one of my fantasies.
Alas, I tried FPGA a few years ago and you must be skilled to succeed... I gave up. I'm not excluding a 2nd try, but Epiphany is enough for me for the moment.
The Parallella's Epiphany is great! The Parallella's Epiphany is famous!
I'll just share 3 thoughts.
1°) Is it even technically possible?
A few years ago, I was told:
the more you put inside the FPGA, the lower the achievable clock. Even for basic FPGA projects, you cannot expect the highest clock rates.
Let me explain for beginners: once you use a few thousand logic cells, even if a simple adder on its own could be clocked at, say, the 600 MHz base clock,
you certainly cannot go beyond 200 MHz with your chess project.
Placement and routing constraints keep you far from that ideal situation.
Have things evolved in recent years? I'm not up to date. I fear Epiphany, or part of it, would have to be underclocked if the FPGA is too heavily used... but again, I'm not an FPGA expert and I haven't studied this point deeply.
The fact is, they wanted a 1000 MHz Epiphany first and had to downgrade it to 600 MHz... certainly not with pleasure. Things are rarely easier than hoped.
And why do you believe we get full speed only from the 16- or 64-core local memory... 2 MB at best, out of the 1 GB of RAM available... none of this is easy at all.
2°) FPGA... where Epiphany is not so good: 1-bit and > 32-bit
The best of FPGA? Reconfigurable instructions. You have a program, say a bitcoin miner... reconfigure your FPGA, run your Epiphany program, which calls sophisticated and greedy subroutines...
And then your face-detection problem, with other custom FPGA routines... a dream, probably.
But... Epiphany, even as a near-perfect CPU, cannot be good at everything; it's a RISC architecture after all, fine with 32-bit but poor at single-bit support (except the beloved BITREV).
I've already talked about the slow bit routines in the current C compiler... I had to rewrite __builtin_popcount and __builtin_ctz to optimize my little Epiphany project.
Divisions are a problem too... even a simple "+1" on a 64-bit integer takes about 6 instructions... or show me your code: how do you handle the carry?
A lot of bit operations are not well supported on Epiphany, AND BIT OPERATIONS ARE PERFECT ON FPGA.
On FPGA every single bit is put to use; that's what fascinated me a few years ago. Tools, IP, P&R, money, ease of development... another story.
Alas, I don't think we can call the FPGA and get the answer back on the next clock tick... so not for routines shorter than 10 instructions... rather 1000 or 10000, I guess.
Conclusion: if possible, use the FPGA for long 1-bit or >32-bit routines.
3°) neural nets... or SPMD
I'm not against neural nets - I just never coded anything like that.
Personally I use Epiphany the simplest way first: SPMD - Single Program, Multiple Data.
Some algorithms fit this model well, like my current Eternity II project - a toy project, I admit.
You assign each core one single task with one chunk of the input data... the easiest way to begin multi-core development.
Well... I vote "YES"! Build the best hybrid accelerator architecture you can, and I'll follow; I'll just improve my Epiphany assembly first.
Long live the Parallella's Epiphany!