I don't have any source code yet, but I thought I'd put the idea out there.
I think parallel processing could handle this pretty well (and it would be interesting to see how the implementation scales when additional cores are added):
Given some composite number C
HOST
do
set x = floor(sqrt(C)) //every factor above sqrt(C) pairs with one at or below it, so searching up to x is enough
send x and C to all e-cores.
wait for factors to be written back to memory
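Here is a minimal sketch of the HOST step in plain Python (not e-core code; the name `search_limit` is made up for illustration). It uses `math.isqrt`, which returns the floor of the square root exactly for integers, so there is no float rounding to worry about when C gets large.

```python
import math

def search_limit(c: int) -> int:
    """Largest divisor candidate worth testing: floor(sqrt(c))."""
    return math.isqrt(c)

# If c = a * b with a <= b, then a * a <= c, so a <= floor(sqrt(c)).
x = search_limit(2323)
print(x)  # 48
```

Floor is enough because the smaller member of any factor pair can never exceed sqrt(C).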
ECORES
get x and C
(I'm not sure yet how this part would happen)
Split up the number line from 2->x (integers only) across n cores
each core checks whether C is divisible by each 6K+/-1 value in its range //every prime above 3 has the form 6K+/-1; 2 and 3 have to be checked separately
if a hit is found, write the value to memory for HOST to pick up.
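A rough Python sketch of the candidate selection, just to make the 6K+/-1 idea concrete. The function names are hypothetical. Note that the form 6K+/-1 also includes some composites (25, 35, ...), so it is a cheap filter rather than a list of primes, and 2 and 3 fall outside it.

```python
def six_k_candidates(lo: int, hi: int) -> list:
    """Integers n in lo..hi (inclusive), n >= 5, with n % 6 equal to 1 or 5."""
    return [n for n in range(max(lo, 5), hi + 1) if n % 6 in (1, 5)]

def small_prime_hits(c: int) -> list:
    """2 and 3 are not of the form 6k+/-1, so test them on their own."""
    return [p for p in (2, 3) if c % p == 0]

print(six_k_candidates(2, 48))
print(small_prime_hits(2323))
```

For the range 2->48 this leaves 15 candidates instead of 47 integers, roughly a third of the work.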
Example: C = 101*23 = 2323 (a small composite number)
sqrt(C) = 48.19....
x = 48 (floor result)
wait for ECORE results (via an interrupt, or a loop that polls memory).
ECORES
core0 get x, C
core0 tell core1-coreM value of C
core0 tell core1-coreM here is your range to look at: //(assume M=15 for 16 core board)
core0 range 3->5
core1 range 6->9
core2 range 10->12
...
core15 range 45->48
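One possible way for core0 to hand out ranges, sketched in Python. `split_range` is a hypothetical helper, and it produces near-equal chunks, so the exact boundaries differ from the hand-written ranges listed here.

```python
def split_range(lo: int, hi: int, n_cores: int) -> list:
    """Split the integers lo..hi (inclusive) into n_cores contiguous chunks.

    Chunk sizes differ by at most one. A chunk comes out empty
    (start > end) when there are more cores than integers.
    """
    total = hi - lo + 1
    base, extra = divmod(total, n_cores)
    chunks = []
    start = lo
    for core in range(n_cores):
        # The first `extra` cores take one additional integer each.
        size = base + (1 if core < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks

for core, (start, end) in enumerate(split_range(2, 48, 16)):
    print(f"core{core} range {start}->{end}")
```

With 47 integers and 16 cores, the first 15 cores get three integers each and the last core gets two.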
Each ECORE (0->M, since core0 works on its own range too):
accept range from core0
create an array of the values of the form 6*k+/-1 in that range
compute C mod array(i) // index i iterates through the array
if C mod array(i) == 0, you have a hit.. pass array(i) to HOST.
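The per-core loop could look something like this in Python (a sketch only; `core_hits` is a made-up name, and a real e-core version would write hits to shared memory instead of returning a list).

```python
def core_hits(c: int, lo: int, hi: int) -> list:
    """Return every 6k+/-1 value in lo..hi (inclusive) that divides c."""
    # Build the candidate array for this core's range.
    candidates = [n for n in range(max(lo, 5), hi + 1) if n % 6 in (1, 5)]
    # A hit is any candidate that leaves remainder 0.
    return [n for n in candidates if c % n == 0]

print(core_hits(2323, 20, 25))  # [23]
print(core_hits(625, 2, 25))    # [5, 25]
```

The second call shows that a hit is a divisor but not always a prime: 25 divides 625 and has the form 6K+1. The smallest hit overall is always prime.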
Could do some other stuff too: if you know C is a semiprime, once you get a hit just kill all the other processes (you've found the ONLY lower prime factor, and the other factor is C divided by the hit)
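A small Python sketch of that early exit, using threads and a shared stop flag in place of e-cores. Everything here is an assumption for illustration: `factor_semiprime` is a made-up name, it tests every integer with a strided split rather than contiguous 6K+/-1 ranges to keep it short, and Python threads won't give a real speedup. It only shows the control flow.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

def factor_semiprime(c: int, n_workers: int = 4):
    """Return (p, q) with p <= q and p * q == c, or None if no divisor is found.

    Assumes c is a semiprime, so at most one divisor lies in 2..floor(sqrt(c)).
    """
    stop = threading.Event()  # set by whichever worker finds the factor
    limit = math.isqrt(c)

    def worker(index: int):
        # Worker `index` tests 2 + index, 2 + index + n_workers, ...
        for n in range(2 + index, limit + 1, n_workers):
            if stop.is_set():
                return None  # another worker already found the factor
            if c % n == 0:
                stop.set()  # tell the other workers to quit
                return n
        return None

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = [h for h in pool.map(worker, range(n_workers)) if h is not None]
    if not hits:
        return None
    p = min(hits)
    return p, c // p

print(factor_semiprime(2323))  # (23, 101)
```

On real hardware the stop flag would be a word in shared memory that each core polls between candidates.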
I dunno, poke fun if you wish
edit1 6/11/2014: it's def floor not ceiling