integer division

Re: integer division

Postby tnt » Fri Jan 25, 2013 3:40 pm

Reading from a neighbor core is a massively slow operation compared to a simple instruction. I mean, it's like tens of CPU cycles of pipeline stall ...
tnt
 
Posts: 408
Joined: Mon Dec 17, 2012 3:21 am

Re: integer division

Postby piotr5 » Fri Jan 25, 2013 8:31 pm

I'm aware of that, though the manual isn't specific about it (10+ cycles for an external read). The point is that no actual stall happens during the read when it's done right: you just need to keep the core busy for those 10+ cycles with other work. As I said, preparing for the actual division loop takes around 50 cycles, which is plenty of time for an external lookup. The goal is to reduce the 50-170 cycles needed for divsi3 without loss of accuracy.

btw, I have an idea for a radix-256 algorithm without lookup: shift the numbers until you only need to divide by 1. Lookup would be interesting for other stuff though...
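For reference, here is a minimal sketch of the kind of plain shift-and-subtract loop that any faster scheme would have to beat. It is not the actual divsi3 source and not the radix-256 idea above; the names `udiv_result` and `udiv32_restoring` are made up for illustration, and it handles unsigned operands only.

```c
#include <stdint.h>

/* Hypothetical illustration, NOT the libgcc divsi3 code:
   restoring division, producing one quotient bit per iteration. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv_result;

udiv_result udiv32_restoring(uint32_t n, uint32_t d)
{
    udiv_result r = {0u, 0u};

    if (d == 0u) {
        /* Division by zero: this sketch arbitrarily returns an
           all-ones quotient and the dividend as remainder. */
        r.quot = UINT32_MAX;
        r.rem = n;
        return r;
    }

    for (int i = 31; i >= 0; --i) {
        /* Shift the next dividend bit into the partial remainder. */
        r.rem = (r.rem << 1) | ((n >> i) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (r.rem >= d) {
            r.rem -= d;
            r.quot |= (uint32_t)1 << i;
        }
    }
    return r;
}
```

Calling `udiv32_restoring(100u, 7u)` yields quotient 14 and remainder 2. The loop always runs 32 iterations, which is the fixed cost that a higher radix or a lookup would try to cut down.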
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: integer division

Postby aolofsson » Fri Jan 25, 2013 9:27 pm

Sorry to disappoint you :( but the core stalls until the data comes back in the case of a load transaction. It would have been very expensive to allow the core to continue operating while there are external loads pending.
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: integer division

Postby piotr5 » Sat Jan 26, 2013 1:23 am

You emphasize "load transaction": is that because a store from an external core won't block the receiving core? Also, what about DMA transfer: in the mode where an interrupt is requested, does the core go to sleep until the transfer is complete?

I found in the manual that moving data from a core to its neighbour takes 1.5 cycles of latency plus 1 cycle for the cMesh transaction. Is that correct? (This statement in 5.1, "Transactions move through the network, with a latency of 1.5 clock cycles per routing hop.", contradicts the example given there: "A transaction traversing from the left edge to right edge of a 64-core chip would thus take 12 clock cycles." Or have I misunderstood something when I associate "latency" with the delay caused by the actual data transfer, as opposed to the delay added later on by the protocol?)

Why is a cycle count of 10+ given in another chapter then? Does that figure refer to the rMesh transaction time of 8 plus these 1.5 cycles of latency? To clarify: I suspect that for the rMesh (i.e. reads) you need 8 cycles plus 3 cycles per hop, while for the cMesh (i.e. writes) you need 1 cycle plus 1.5 cycles per hop. (xMesh is for off-chip traffic, probably including every memory location outside the 32 KB stored in each core. Read or write makes no difference there; both are slow, allegedly about 60 cycles.)

That's why I think one can program around the blocking slowness and do a non-blocking table lookup in about 10-20 cycles. The question is just how much time is lost on the other core doing the write...
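To make the guesses above easier to check, here is a small sketch that turns them into arithmetic. Every constant is a suspicion from this post, not a confirmed figure from the manual, and the function names are made up. Results are in half-cycles so that the 1.5 cycles per hop stays an integer.

```c
/* Hypothetical cost model. All constants are guesses from the post
   above, not confirmed Epiphany figures. Results are in HALF-cycles. */

/* Per-hop routing latency quoted from the manual: 1.5 cycles per hop. */
unsigned mesh_traversal_half_cycles(unsigned hops)
{
    return 3u * hops;
}

/* Guessed cMesh (write) cost: 1 cycle + 1.5 cycles per hop. */
unsigned cmesh_write_half_cycles(unsigned hops)
{
    return 2u * 1u + 3u * hops;
}

/* Guessed rMesh (read) cost: 8 cycles + 3 cycles per hop. */
unsigned rmesh_read_half_cycles(unsigned hops)
{
    return 2u * 8u + 6u * hops;
}
```

With `hops = 8` the per-hop figure gives 24 half-cycles, i.e. the 12 cycles of the manual's example, assuming an edge-to-edge crossing of a 64-core (8x8) chip counts as 8 hops. With `hops = 1` the guessed read cost is 22 half-cycles (11 cycles), which would fit the "10+" figure, and the guessed write cost is 5 half-cycles (2.5 cycles).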
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: integer division

Postby tnt » Sat Jan 26, 2013 8:08 am

Well, IMHO:
- It's not acceptable to use DMA from a 'gcc library' function like idiv ... the user might be using the DMA for something really useful.
- You can't have the other core executing code to write the results ... again, that would imply that a library function on one core can trigger code on another core? That's just asking for trouble ...
tnt
 
Posts: 408
Joined: Mon Dec 17, 2012 3:21 am

Re: integer division

Postby mrgs » Mon Jan 28, 2013 10:33 am

Well, IMHO: we have to divide (copy?!) this 'issue' into two parts. On one hand, everybody likes 'a nice math library'; on the other hand, nobody?! likes to suffer from HW 'weaknesses'. From my point of view, I focus on the advantages of HW + SW and try to solve the 'problematic' points. So, (#1) we need 'standard' math libraries, and (#2) we need a mechanism which delivers data and 'code' to the cores 'just in time'. I think both are possible and independent of each other. Besides, there is no other way. I mean, WE have to figure these out and solve them! Am I right? --- (!) FIXME --- Regards, Gabor
| OS4E : A preemptive, multiprocessing, microkernel based OS for Epiphany ARCH |
mrgs
 
Posts: 63
Joined: Mon Dec 17, 2012 3:22 am
Location: Hungary
