by piotr5 » Thu Dec 10, 2015 11:03 am
What you describe doesn't sound like concurrency is actually the desired solution; it's merely one possibility among many. What we have here is a cache-size problem: the cache is too small to hold all the tasks a single core is performing. The solution I suggested for that is to get better pre-caching behaviour by letting an FPGA, or some other programmable microcode, handle the caching. We need an evolutionary development of cache techniques built into the compiler. The idea is that, without any added instructions in the actual threads or in the threading infrastructure, some other piece of hardware monitors the execution path and restores the cache whenever the same address gets visited again. So before the program spawns a thread, the hardware is told about that thread's cache requirements. (Also, let's not forget Apple's approach of shifting cache contents around between the levels instead of having the same data occupy multiple levels.) In a way you're right that each thread requires a fully unique environment: not just the registers but also the cache should be stored on a task switch. In fact, pre-caching and pre-sorting were the major applications I had in mind when I talked of optional parts of an algorithm.
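To make that concrete, here is a minimal sketch in C of what "telling the hardware about a thread's cache requirements before spawning it" could look like with today's tools, using GCC's __builtin_prefetch as a stand-in for the programmable pre-caching hardware; the cache_hint_t descriptor and spawn_with_hints() are purely hypothetical names, not any existing API.

[code]
#include <pthread.h>
#include <stddef.h>

/* hypothetical descriptor: the working set the new thread will touch */
typedef struct {
    const void *base;   /* start of the region the thread will read */
    size_t      bytes;  /* size of that region                      */
} cache_hint_t;

/* stand-in for "tell the hardware": touch the region with prefetch
 * instructions so it is (hopefully) resident before the thread runs */
static void preload(const cache_hint_t *h)
{
    const char *p   = (const char *)h->base;
    const char *end = p + h->bytes;
    for (; p < end; p += 64)           /* assume 64-byte cache lines   */
        __builtin_prefetch(p, 0, 3);   /* read, high temporal locality */
}

/* spawn a worker only after its declared working set has been pre-cached */
static int spawn_with_hints(pthread_t *t, void *(*fn)(void *), void *arg,
                            const cache_hint_t *hints, size_t nhints)
{
    for (size_t i = 0; i < nhints; ++i)
        preload(&hints[i]);
    return pthread_create(t, NULL, fn, arg);
}
[/code]

A real implementation would of course hand the descriptor to the FPGA or microcode instead of issuing prefetches from the spawning core, and would re-issue it on every task switch.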
As I told you, I am against the one-entity-per-core use case for parallel architectures, because it runs the same code on multiple cores and thereby wastes the code cache. A neural network sounds better. Somewhere I saw a neural network defined as many independent computation units connected through some kind of weight function. In that sense Epiphany is a neural network with weight-1.0 connections to its four neighbours by default, and no way to change that. For that reason, programming Epiphany focuses more on the processing power of the individual cores than on their connections. At the other end of this scale I've seen neural networks whose neurons simply average their weighted inputs and, based on a threshold, decide what to do next. But what if each neuron really has a different program running?
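For contrast, here is roughly what that "weighted average plus threshold" kind of neuron looks like in C; the names are mine, and the point is only how little per-neuron code there is compared to giving every neuron its own program.

[code]
#include <stddef.h>

/* one classic threshold neuron: average the weighted inputs,
 * fire (1) if the result exceeds the threshold, stay quiet (0) otherwise */
static int neuron_step(const float *inputs, const float *weights,
                       size_t n, float threshold)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += inputs[i] * weights[i];
    return (sum / (float)n) > threshold;
}
[/code]

On Epiphany the fixed weight-1.0 links would mean whatever a core writes over the mesh arrives unscaled, so all the variety would have to live in the per-core program instead.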
I said threading should be our major goal, but not just for portability (I wouldn't want a game to complain that I don't have enough cores for all of its actors). If you put each actor on a different core, then most of the time all your cores will sit idle. Sebraa said a low computation frequency is sufficient for robotics. The same goes for the "robotics" of virtual objects on the screen; more precisely, this only affects software-emulated robotics and not real physical robots, because the user wants to actually see what the on-screen robots are doing, so they cannot move too fast. In real robots the limited processing speed is a matter of how fast a movement change the hardware is capable of and how much energy you put into the movements; it's independent of an arbitrary factor like the capabilities of human eyesight. So if you update an on-screen character's movements only 50 times per second, what do you do the rest of the time? The same goes for animated icons.
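Here is a sketch of the situation I mean: a 50 Hz actor update loop in C where the core spends most of each 20 ms tick with nothing to do. do_other_work() is a placeholder of mine for whatever the scheduler could fill that idle time with (pre-caching, pre-sorting, other actors), not part of any real framework.

[code]
#define _POSIX_C_SOURCE 200112L
#include <time.h>

#define TICK_NS (1000000000L / 50)   /* 50 updates per second = 20 ms per tick */

static void update_actor(void)  { /* a few hundred microseconds of real work   */ }
static void do_other_work(void) { /* placeholder: pre-cache, pre-sort, etc.    */ }

int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        update_actor();

        /* the rest of the 20 ms tick is idle unless we hand it to someone */
        do_other_work();

        next.tv_nsec += TICK_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec  += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
[/code]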
As you mention, some programs might use multiple cores working together on a single actor, maybe making use of the network topology for more efficient data flow. Now suppose such a task becomes idle: how should all of those cores get their new jobs assigned? This deserves its own task management! Just because cores are cheap doesn't mean you're supposed to waste them merely for storing cache contents. What you need is thread management that is aware of bandwidth issues, takes into account what all the other cores are doing, and uses idle time to plan ahead and move cached data to and from the Epiphany cores. Whenever a program becomes uninterruptible, the threading layer should know how long that will last; maybe it's better to wait until a thread switch becomes possible than to set up another core which might have less data pre-cached. At a high level you might have a flat 64-bit memory, but at the assembler level you must work with what you have: a software cache and a context-dependent address space. And the software cache needs to make sure the right data is in the right place whenever the context demands it.
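A minimal sketch of the decision I have in mind, with made-up names and a crude cost model: the thread manager knows how long the running task stays uninterruptible and how much of the candidate task's data each core already has cached, and from that it decides whether waiting for a switch beats migrating to a colder core.

[code]
#include <stdint.h>
#include <stdbool.h>

/* hypothetical bookkeeping the thread manager would keep per core */
typedef struct {
    uint32_t uninterruptible_us;  /* how long the running task can't be preempted */
    uint32_t cached_bytes;        /* bytes of the candidate task already resident  */
} core_state_t;

/* cost of refilling the missing data over the mesh, given a bandwidth guess;
 * assumes cached_bytes <= task_bytes and bytes_per_us > 0 */
static uint32_t refill_us(uint32_t task_bytes, uint32_t cached_bytes,
                          uint32_t bytes_per_us)
{
    return (task_bytes - cached_bytes) / bytes_per_us;
}

/* true: wait for the busy-but-warm core; false: start the idle-but-cold one */
static bool prefer_waiting(const core_state_t *warm, const core_state_t *cold,
                           uint32_t task_bytes, uint32_t bytes_per_us)
{
    uint32_t wait_cost = warm->uninterruptible_us
                       + refill_us(task_bytes, warm->cached_bytes, bytes_per_us);
    uint32_t cold_cost = refill_us(task_bytes, cold->cached_bytes, bytes_per_us);
    return wait_cost <= cold_cost;
}
[/code]

A real scheduler would of course also use the idle time itself to push data toward whichever core it expects to pick.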
You want to change the world, but that takes time. In the past, when we were young, there were no multicore chips. Later, GPGPU added many SIMD cores, and the hardware developers noticed that this way most cores sit idle. So now we have lots of wavefronts per GPU, each of which could be used as its own MIMD core with some sort of SSE-like capabilities. Teenagers today are in a position to see the discrepancy between what the hardware is capable of and what the software offers. These teenagers are aware of an unfulfilled demand, and they'll likely devote their lives to fulfilling it. But that takes time: they need to grow up and get an education, then they need to do slave work in some big company, and eventually they'll arrive at a position where they can make their dreams come true. Older generations cannot do that, since they have goals of their own. An individual like you, who happens to be interested in this area, won't succeed either, since many people with differing goals are blocking the way through their unwillingness to cooperate (as they have their own goals). These people first need to get off the chair; the next generation must step in. For this reason, no matter what you do, don't expect financial success if it isn't mainstream yet. For now Parallella is hardware for schools, not for big companies. Since things are moving slowly, take your time and think about the technology of the future you want to have...