Parallella Community

by **dobkeratops** » Fri Mar 18, 2016 11:59 pm

http://www.kalrayinc.com

seems to be a very similar idea, including the ability to extend the grid with multiple chips on a card.

they seem to have more structure (a grid of clusters); does that mean more yield problems? (i.e. if you have a defect, are you more likely to lose a whole cluster?)

I guess Epiphany's ability to load/store across the network could give the same benefit as having clusters (i.e. you could treat a 2x2 block of cores as one larger scratchpad with slightly longer latency, like an OpenCL workgroup's local memory), which seems more versatile to me.

they have 2MB per 16-core cluster(?), which I guess is 128KB per core, and they have to cache that; does that mean Epiphany's compute density is higher? sounds like a significantly more complex chip all round.

maybe it maps more closely to OpenCL (but I wondered if making 2x2 tiles equivalent to workgroups and using a fraction of each scratchpad as 'local memory' would help Epiphany map better)

I guess it should be possible to have a programming model that works 'very well' on both (MPI?)

by **jar** » Sat Mar 19, 2016 7:33 pm

Sorry for any typos. I am on my phone.

Kalray takes NUMA to the next level. I have not used it but the architecture diagram looks more complex than Epiphany. Epiphany addresses the "memory wall problem" by moving the core into memory. Other architectures build increasingly complex memory interfaces -- wider, deeper, higher latency, chip stacking, silicon interposers, or more "sockets" are their answers.

I've seen your comment on the Epiphany 2x2 workgroup scratchpad a number of times. You've also mentioned you have a C++ background. If you feel strongly about this idea, please consider writing a proof of concept demonstration. It could be done with C++ templates and overloading the square bracket operator. If you maintain the per-core array size as a power of two, you could use a logical shift and a bitwise mask instead of division and modulus operations for your array address computation.

What I believe you will find is that the additional address computation will cause your performance to tank. You may have other ideas and I encourage you to explore them with actual software proofs of concept rather than continuing to make claims without evidence.
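For readers following along, here is a minimal host-side sketch of the kind of proof of concept described above. It is not code from either poster: the class name `GroupScratchpad`, its members, and the helper `demo_sum` are all hypothetical, and four ordinary arrays stand in for the four cores' scratchpads. On real hardware the base pointers would presumably be the global addresses of each core's local buffer; that part is an assumption and is not shown. The point is only to make the per-access shift and mask visible, since that is the extra address computation whose cost is in question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of four per-core buffers as one logical array.
// SliceLog2 is log2 of the number of elements each core contributes,
// so the slice size is always a power of two.
template <typename T, unsigned SliceLog2>
class GroupScratchpad {
public:
    static constexpr std::size_t kCores = 4;  // one 2x2 block of cores
    static constexpr std::size_t kSlice = std::size_t{1} << SliceLog2;
    static constexpr std::size_t kMask = kSlice - 1;
    static constexpr std::size_t kSize = kCores * kSlice;

    explicit GroupScratchpad(const std::array<T*, kCores>& bases)
        : bases_(bases) {}

    // Which core's buffer holds logical element i (a shift, not a divide).
    static std::size_t owner(std::size_t i) { return i >> SliceLog2; }

    // Every access pays for one shift, one mask and one extra pointer load.
    T& operator[](std::size_t i) { return bases_[owner(i)][i & kMask]; }
    const T& operator[](std::size_t i) const {
        return bases_[owner(i)][i & kMask];
    }

    std::size_t size() const { return kSize; }

private:
    std::array<T*, kCores> bases_;  // assumed: one base pointer per core
};

// Host-only simulation: plain arrays stand in for the four scratchpads.
inline std::uint32_t demo_sum() {
    constexpr unsigned kLog2 = 3;  // 8 elements per core, 32 in total
    std::array<std::array<std::uint32_t, 8>, 4> core_mem{};
    std::array<std::uint32_t*, 4> bases = {
        core_mem[0].data(), core_mem[1].data(),
        core_mem[2].data(), core_mem[3].data()};
    GroupScratchpad<std::uint32_t, kLog2> pad(bases);

    for (std::size_t i = 0; i < pad.size(); ++i) {
        pad[i] = static_cast<std::uint32_t>(i);
    }
    // Logical element 13 lands on core 1 (13 >> 3) at offset 5 (13 & 7).
    assert(core_mem[1][5] == 13);

    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < pad.size(); ++i) {
        sum += pad[i];
    }
    return sum;
}
```

Calling `demo_sum()` from a small `main` exercises the view end to end. Measuring whether the indexing overhead actually hurts would need a build for the Epiphany cores themselves, with the same loop timed against direct local-array access.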

by **dobkeratops** » Sun Mar 20, 2016 2:24 am

by **jar** » Sun Mar 20, 2016 3:39 pm

OpenCL is a programming model for SMP architectures. Epiphany is a PGAS architecture. Sure, you can beat a nail into wood with a screwdriver, but it would make a lot more sense to use a hammer. I know the OpenCL developer and I can tell you that inappropriate tools will receive little attention without a financial incentive. Time is limited and it's better to focus on things that make sense.

I like your thoughts on reduced precision NNs. I encourage you to explore them with software. You may look at the IBM TrueNorth architecture as what is possible with an ASIC with many similarities to Epiphany.

by **dobkeratops** » Mon Mar 21, 2016 1:20 am

how do kalray's chips compare to epiphany
