by amylaar » Sun Jan 20, 2013 3:10 pm
What algorithm fits an architecture depends on things like instruction set, available registers, and the costs of
code / data storage and access.
For sh4, I've used floating point division.
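The idea can be sketched in portable C (this is just my illustration, not the actual sh4 libgcc code): every 32-bit operand is exactly representable in a double, and one can check that truncating the rounded double quotient always yields the exact integer quotient.

```c
#include <stdint.h>

/* Illustrative sketch of FPU-based integer division (not the sh4
   libgcc routine).  A uint32_t fits exactly in a double's 53-bit
   mantissa, and the rounding error of the double divide is smaller
   than the distance from the true quotient to the next integer,
   so truncation gives the exact result.  d must be nonzero. */
uint32_t udiv_via_fpu(uint32_t n, uint32_t d)
{
    return (uint32_t)((double)n / (double)d);
}
```

On a target with a fast double-precision divider, this replaces a long integer-divide loop with a couple of conversions and one FDIV.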
For the SH64, I've used inline code that computes an approximate inverse and a correction factor to multiply the dividend by (there's sort of a Newton-Raphson step in there, but it's been arithmetically transformed for scheduling purposes). That is only a sensible space/time trade-off because the SH64 has a 32*32->64 bit multiply, plus 64-bit adds and shifts.
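The Newton-Raphson idea behind it can be sketched like this (my sketch, not the actual SH64 sequence): refine an approximation x of 2^32/d, then multiply the dividend by it and fix up the estimate.

```c
#include <stdint.h>

/* Sketch of reciprocal-based division (not the SH64 libgcc code).
   Assumes a cheap wide multiply; uses GCC/Clang's unsigned __int128
   for the one intermediate that needs more than 64 bits.
   x approximates 2^32/d, and a Newton-Raphson step
       x' = x * (2^33 - d*x) / 2^32
   squares the relative error.  A real implementation seeds x well
   enough (e.g. from a table) that two or three steps suffice; this
   crude power-of-two seed needs more iterations.  d must be nonzero. */
uint32_t udiv_nr(uint32_t n, uint32_t d)
{
    int k = 31 - __builtin_clz(d);       /* 2^k <= d < 2^(k+1) */
    uint64_t x = 1ull << (32 - k);       /* first guess at 2^32/d */

    for (int i = 0; i < 40; i++) {
        uint64_t dx = (uint64_t)d * x;   /* stays below 2^33 */
        x = (uint64_t)(((unsigned __int128)x * ((1ull << 33) - dx)) >> 32);
    }
    /* After the first step x never overestimates 2^32/d, so the
       quotient estimate never overshoots; correct it upward. */
    uint64_t q = ((uint64_t)n * x) >> 32;
    while ((q + 1) * d <= n)
        q++;
    return (uint32_t)q;
}
```

The final fix-up loop is what makes the transformed inline sequence safe: the reciprocal only has to be close, not exact.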
For sh4-nofpu and ARCompact ARC700, I've used lookup tables.
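To illustrate the table approach (again my sketch, not the actual sh4-nofpu/ARC700 tables): store approximate reciprocals, multiply, and correct. Here the table covers byte-sized divisors and is filled at startup just to keep the example self-contained; a library table would be precomputed constants.

```c
#include <stdint.h>

/* Illustrative table-driven division for divisors 1..255 (not the
   sh4-nofpu or ARC700 code).  inv[d] = floor(2^32/d); the product
   estimate (n * inv[d]) >> 32 is at most 1 below the true quotient,
   so a single conditional correction makes it exact. */
static uint64_t inv[256];

static void init_inv(void)
{
    for (uint32_t d = 1; d < 256; d++)
        inv[d] = (1ull << 32) / d;
}

uint32_t udiv_tab(uint32_t n, uint32_t d)   /* 1 <= d <= 255 */
{
    uint64_t q = ((uint64_t)n * inv[d]) >> 32;
    if ((q + 1) * d <= n)                    /* estimate at most 1 low */
        q++;
    return (uint32_t)q;
}
```

The trade-off is obvious: a few hundred bytes of table buy a division that costs one multiply, one shift, and one compare.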
For Epiphany, space is at a premium, so big inline code, tables, or complex algorithms are out.
Yes, you could distribute the code across multiple cores, but there is also a need for a basic implementation that runs on a single core and leaves the other cores free for other work.
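For reference, the classic minimal single-core fallback is the shift-and-subtract loop, which needs no tables and no wide multiply (a sketch, not the actual Epiphany libgcc routine):

```c
#include <stdint.h>

/* Smallest reasonable implementation: restoring division, one
   quotient bit per iteration.  Illustrative C, not the Epiphany
   libgcc code.  d must be nonzero. */
uint32_t udiv_small(uint32_t n, uint32_t d)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* shift in next dividend bit */
        if (r >= d) {                   /* divisor fits: quotient bit 1 */
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}
```

It is slow (32 iterations), but it compiles to a handful of instructions, which is what matters when code space is the scarce resource.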
For a specific use case and Parallella grid setup, you might find a better optimization by using a different algorithm and a specific link strategy.