Bitonic sort, mpi4py - python

I am attempting to implement the Bitonic-Sort algorithm.
Parallel Bitonic Sort Algorithm for processor Pk (for k := 0 ... P-1)
d := log P /* cube dimension */
sort(local data_k) /* sequential sort */
/* Bitonic Sort follows */
for i := 1 to d do
    window-id = Most Significant (d-i) bits of Pk
    for j := (i-1) down to 0 do
        if ((window-id is even AND jth bit of Pk = 0) OR
            (window-id is odd AND jth bit of Pk = 1))
        then call CompareLow(j)
        else call CompareHigh(j)
        endif
    endfor
endfor
Source: http://www.cs.rutgers.edu/~venugopa/parallel_summer2012/mpi_bitonic.html#expl
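To make the loop structure concrete, here is a minimal single-process sketch that simulates the P processes with a list of equally sized blocks. It is not MPI code: the name bitonic_sort_blocks is made up for illustration, the "exchange" is just a list lookup, and CompareLow/CompareHigh are assumed to mean "keep the lower/upper half of the merged pair".

```python
def bitonic_sort_blocks(blocks):
    """Simulate the per-process loop for P = len(blocks) equally sized blocks."""
    p = len(blocks)
    d = p.bit_length() - 1                      # cube dimension, d = log2(P)
    if p < 1 or (1 << d) != p:
        raise ValueError("number of blocks must be a power of two")
    blocks = [sorted(block) for block in blocks]  # local sequential sort
    for i in range(1, d + 1):
        for j in reversed(range(i)):
            updated = []
            for rank in range(p):
                partner = rank ^ (1 << j)       # flip the j-th bit of the rank
                window_id = rank >> i           # most significant (d - i) bits
                bit_j = (rank >> j) & 1
                merged = sorted(blocks[rank] + blocks[partner])
                size = len(blocks[rank])
                if window_id % 2 == bit_j:
                    updated.append(merged[:size])                 # CompareLow
                else:
                    updated.append(merged[len(merged) - size:])   # CompareHigh
            blocks = updated                    # all "processes" step together
    return blocks
```

Every simulated rank reads the blocks from before the step and writes into a fresh list, which mimics both partners exchanging their old data before either one overwrites it.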
Unfortunately the descriptions of CompareHigh and CompareLow are shaky at best.
From my understanding, CompareHigh will take the data from the calling process, and its partner process, merge the two, sorted, and store the upper half in the calling process' data. CompareLow will do the same, and take the lower half.
I've verified that my implementation is selecting the correct partners and calling the correct CompareHigh/Low method during each iteration for each process, but my output is still only partially sorted. I'm assuming that my implementation of CompareHigh/Low is incorrect.
Here is a sample of my current output:
[0] [17 24 30 37]
[1] [ 92 114 147 212]
[2] [ 12 89 92 102]
[3] [172 185 202 248]
[4] [ 30 51 111 148]
[5] [148 149 158 172]
[6] [ 17 24 59 149]
[7] [160 230 247 250]
And here are my CompareHigh, CompareLow, and merge functions:
def CompareHigh(self, j):
    partner = self.getPartner(self.rank, j)
    print "[%d] initiating HIGH with %d" % (self.rank, partner)
    new_data = np.empty(self.data.shape, dtype='i')
    self.comm.Send(self.data, dest = partner, tag=55)
    self.comm.Recv(new_data, source = partner, tag=55)
    assert(self.data.shape == new_data.shape)
    self.data = np.split(self.merge(data, new_data), 2)[1]

def CompareLow(self, j):
    partner = self.getPartner(self.rank, j)
    print "[%d] initiating LOW with %d" % (self.rank, partner)
    new_data = np.empty(self.data.shape, dtype='i')
    self.comm.Recv(new_data, source = partner, tag=55)
    self.comm.Send(self.data, dest = partner, tag=55)
    assert(self.data.shape == new_data.shape)
    self.data = np.split(self.merge(data, new_data), 2)[0]
def merge(self, a, b):
    merged = []
    i = 0
    j = 0
    while i < a.shape[0] and j < b.shape[0]:
        if a[i] < b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    while i < a.shape[0]:
        merged.append(a[i])
        i += 1
    # Drain the rest of b (bounded by b's length, not a's)
    while j < b.shape[0]:
        merged.append(b[j])
        j += 1
    return np.array(merged)

def getPartner(self, rank, j):
    # Partner process is process with j_th bit of rank flipped
    j_mask = 1 << j
    partner = rank ^ j_mask
    return partner
Finally, here is the actual algorithm loop:
# Generating map of bit_j for each process.
bit_j = [0 for i in range(d)]
for i in range(d):
    bit_j[i] = (rank >> i) & 1
bs = BitonicSorter(data)
for i in range(1, d+1):
    window_id = rank >> i
    for j in reversed(range(0, i)):
        if rank == 0: print "[%d] iteration %d, %d" %(rank, i, j)
        comm.Barrier()
        if (window_id%2 == 0 and bit_j[j] == 0) \
        or (window_id%2 == 1 and bit_j[j] == 1):
            bs.CompareLow(j)
        else:
            bs.CompareHigh(j)
        if rank == 0: print ""
        comm.Barrier()
if rank != 0:
    comm.Send(bs.data, dest = 0, tag=55)
    comm.Barrier()
else:
    dataset[0] = bs.data
    for i in range(1, size) :
        comm.Recv(dataset[i], source = i, tag=55)
    comm.Barrier()
    for i, datai in enumerate(dataset):
        print "[%d]\t%s" % (i, str(datai))
    dataset = np.array(dataset).reshape(data_size)

Well, bugger me:
self.data = np.split(self.merge(data, new_data), 2)
These were the problematic lines. They merge data rather than self.data. I'm not sure what the variable data was bound to, but that was the problem right there.
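For reference, a minimal sketch of the merge-and-split step with the process's own block passed in explicitly. It uses plain lists instead of NumPy arrays and leaves out the MPI exchange; the function names are illustrative and not taken from the original class.

```python
def merge(a, b):
    """Merge two ascending lists into one ascending list."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])   # at most one of these two tails is non-empty
    merged.extend(b[j:])
    return merged


def compare_low(own, received):
    """Keep the lower part of the merged data, same size as the own block."""
    return merge(own, received)[:len(own)]


def compare_high(own, received):
    """Keep the upper part of the merged data, same size as the own block."""
    merged = merge(own, received)
    return merged[len(merged) - len(own):]
```

In the class from the question, the equivalent change would be to call self.merge(self.data, new_data) in both CompareHigh and CompareLow.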

Related

Looking for a compressing algorithm that match this code

I am trying to write a tool to modify a game save. To do so, I need to decompress and compress the save. I found this decompressing algorithm for the save (that works), but I don't understand the code, so I don't know how to compress the data.
Does this look familiar to anyone from a known compression/decompression algorithm?
Thanks!
def get_bit(buffer, ref_pointer, ref_filter, length):
    result = 0
    current = buffer[ref_pointer.value]
    print(current)
    for i in range(length):
        result <<= 1
        if current & ref_filter.value:
            result |= 0x1
        ref_filter.value >>= 1
        if ref_filter.value == 0:
            ref_pointer.value += 1
            current = buffer[ref_pointer.value]
            ref_filter.value = 0x80
    return result

def decompress(buffer, decode, length):
    ref_pointer = Ref(0)
    ref_filter = Ref(0x80)
    dest = 0
    dic = [0] * 0x2010
    while ref_pointer.value < length:
        print(ref_pointer.value, ref_filter.value, dest)
        bits = get_bit(buffer, ref_pointer, ref_filter, 1)
        if ref_pointer.value >= length:
            return dest
        if bits:
            bits = get_bit(buffer, ref_pointer, ref_filter, 8)
            if ref_pointer.value >= length:
                print(dic)
                return dest
            decode[dest] = bits
            dic[dest & 0x1fff] = bits
            dest += 1
        else:
            bits = get_bit(buffer, ref_pointer, ref_filter, 13)
            if ref_pointer.value >= length:
                print(dic)
                return dest
            index = bits - 1
            bits = get_bit(buffer, ref_pointer, ref_filter, 4)
            if ref_pointer.value >= length:
                print(dic)
                return dest
            bits += 3
            for i in range(bits):
                dic[dest & 0x1fff] = dic[index + i]
                decode[dest] = dic[index + i]
                dest += 1
    print(dic)
    return dest
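For what it's worth, the bit layout that decompress reads (a 1-bit flag, then either an 8-bit literal, or a 13-bit position followed by a 4-bit length to which 3 is added) resembles an LZSS-style scheme. Under that reading, the simplest valid encoder emits every byte as a literal and does no actual compression. The sketch below shows that idea with its own small decoder for the same token layout. All function names are hypothetical. The decoder here takes the expected output length instead of reproducing the end-of-buffer checks above, so it is an assumption (not tested here) that the original decompress accepts this output, possibly only with extra padding bytes appended.

```python
def bits_to_bytes(bits):
    """Pack a list of 0/1 values into bytes, MSB first, zero-padded at the end."""
    bits = list(bits)
    while len(bits) % 8:
        bits.append(0)
    out = bytearray()
    for k in range(0, len(bits), 8):
        value = 0
        for b in bits[k:k + 8]:
            value = (value << 1) | b
        out.append(value)
    return bytes(out)


def literal_bits(byte):
    """One literal token: flag bit 1 followed by the 8 data bits."""
    return [1] + [(byte >> s) & 1 for s in range(7, -1, -1)]


def pack_literals(data):
    """Encode every byte as a literal token (valid stream, no compression)."""
    bits = []
    for byte in data:
        bits.extend(literal_bits(byte))
    return bits_to_bytes(bits)


def read_bits(stream, pos, count):
    """Read `count` bits starting at bit offset `pos`; return (value, new_pos)."""
    value = 0
    for _ in range(count):
        bit = (stream[pos // 8] >> (7 - pos % 8)) & 1
        value = (value << 1) | bit
        pos += 1
    return value, pos


def unpack(stream, out_len):
    """Decode literal and (position, length) tokens until out_len bytes exist."""
    window = [0] * 0x2010           # same size as `dic` in the question
    out = bytearray()
    pos = 0
    while len(out) < out_len:
        flag, pos = read_bits(stream, pos, 1)
        if flag:
            byte, pos = read_bits(stream, pos, 8)
            window[len(out) & 0x1fff] = byte
            out.append(byte)
        else:
            index, pos = read_bits(stream, pos, 13)
            index -= 1              # stored position is index + 1
            run, pos = read_bits(stream, pos, 4)
            for k in range(run + 3):
                byte = window[index + k]
                window[len(out) & 0x1fff] = byte
                out.append(byte)
    return bytes(out[:out_len])
```

A real compressor would additionally search the last 0x2000 bytes for repeated runs of 3 to 18 bytes and emit a flag bit 0 with the position and length instead of literals.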

Are there faster techniques or shortcuts for MOD'ing large numbers using pow(x,y,n)?

I created the following to find prime factors of a number. It raises the number to successive powers to make it larger, reduces that larger number modulo a power of two, and then keeps taking remainders until the gcd of a remainder and the original number is greater than 1. What I'm looking for is whether a mathematician knows of any tricks to make this technique faster, as it could be quite useful for finding prime factors. Any indication as to why a shortcut does or doesn't exist would be helpful.
from math import gcd

def search_for_prime(hm):
    for x in range(2,500):
        answer = gcd_mod_find(hm**x, hm)
        if answer != 1:
            break
    return answer

def gcd_mod_find(hm, find):
    prevcr = hm
    getcr = hm%(1<<(hm).bit_length()-1)
    bitlength = hm.bit_length()//2
    count=0
    found = False
    while getcr != 0 and getcr != 1 and bitlength > count:
        count+=1
        temp = hm%getcr
        prevcr = getcr
        getcr = temp
        answer = gcd(prevcr, find)
        if answer != 1:
            #print(f"{answer} found at {prevcr}")
            found = True
            break
    if found == True:
        return answer
    else:
        return 1
An example of how this works:
# Takes a little under a minute to find:
# In [1677]: search_for_prime(1009732533765211)
# Out[1677]: 11344301
# Finds rather quickly that the number is prime:
# In [1679]: search_for_prime(10099)
# Out[1679]: 10099
# Rather fast find of one of the primes
# In [1680]: search_for_prime(7919 * 7883)
# Out[1680]: 7919
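One place where three-argument pow fits the remainder chain directly: when the large number is base**exp, each step "big % r" is equal to pow(base, exp, r), so the individual reductions do not need the explicit power. The sketch below compares the two ways of building the chain. The function names are made up for illustration, the second version still builds the power once to get its bit length, and whether it is faster in practice has not been measured here.

```python
def chain_with_big_power(base, exp, steps):
    """Remainder chain on the explicit power base**exp, as in gcd_mod_find."""
    big = base ** exp
    r = big % (1 << (big.bit_length() - 1))
    chain = []
    for _ in range(steps):
        if r in (0, 1):
            break
        chain.append(r)
        r = big % r
    return chain


def chain_with_modpow(base, exp, steps):
    """Same chain, but every reduction uses pow(base, exp, modulus)."""
    big_bits = (base ** exp).bit_length()   # assumption: still computed once
    r = pow(base, exp, 1 << (big_bits - 1))
    chain = []
    for _ in range(steps):
        if r in (0, 1):
            break
        chain.append(r)
        r = pow(base, exp, r)               # equals (base ** exp) % r
    return chain
```

Both functions return the same list of remainders for the same arguments; the gcd test against the original number can then be applied to each element exactly as in gcd_mod_find.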
My goal is for a mathematician to let me know whether there are any optimizations that would make this faster. I'm working on learning more about mod techniques, which led me to a faster pollard_brent for large numbers that uses fixed values for y, c, m instead of random ints. I'll share it here for those who are interested; it shows that I was able to optimize pollard_brent, so hopefully a mathematician will know of tricks to optimize my method above.
The following is a digression from the above question, but it may be useful for those interested in speed optimizations for finding factors with pollard_brent.
Currently my method above is slower than my optimized pollard_brent, which in turn is faster than the version with random y, c, m. The optimization gives pollard_brent consistently lower run times, which some might find interesting:
from sympy import isprime
from math import gcd
def pollard_brent_lars_opt(n):
    if n % 2 == 0: return 2
    if n % 3 == 0: return 3
    # This optimization is contributed by Lars Rocha. I use an equation instead of random numbers for significant
    # speed increases on larger numbers and consistent low speed results
    y,c,m = (1<<((n**2).bit_length()+1)), (1<<((n**2).bit_length())) , (1<<((n**2).bit_length()+1))
    #print(y,c,m)
    g, r, q = 1, 1, 1
    while g == 1:
        x = y
        for i in range(r):
            y = (pow(y, 2, n) + c) % n
        k = 0
        while k < r and g==1:
            ys = y
            for i in range(min(m, r-k)):
                y = (pow(y, 2, n) + c) % n
                q = q * abs(x-y) % n
            g = gcd(q, n)
            k += m
        r *= 2
    if g == n:
        while True:
            ys = (pow(ys, 2, n) + c) % n
            g = gcd(abs(x - ys), n)
            if g > 1:
                break
    return g
def get_factors_lars_opt(hm, offset=-2):
num = hm
vv = []
while isprime(num // 1) != True:
print(vv, num)
a = find_prime_evens_lars_opt(num, offset)
#print(a)
if a[5] == 0:
vv.append(a[1])
elif a[5] == 2:
if num % 2 == 0:
vv.append(a[5])
else:
vv.append(a[3])
elif a[5] == 3:
if num % 2 == 0:
vv.append(2)
elif num % 3 == 0:
vv.append(3)
elif num % 5 == 0:
vv.append(5)
elif num % a[3] == 0:
vv.append(a[3])
else:
vv.append(a[5])
#print(a)
num = num // vv[-1]
if isprime(num):
vv.append(num)
print(vv)
return vv
def find_prime_evens_lars_opt(hm, offset=-2):
y = 3
prevtemp = 3
if isprime(hm):
print(f"{hm} is already prime")
return hm
while True:
if y.bit_length() > hm.bit_length() -1:
prevtemp = pollard_brent_lars_opt(hm)
break
j = powers(hm, y)
temp = j
if j != 1 and j != 0:
temp = j
temp = hm % temp
if temp == 0:
prevtemp = temp
print("c: break")
break
while temp != 1 and temp != 0:
prevtemp = temp
temp = hm % temp
if temp == 0:
print("h: break")
break
y = Xploder(y) -offset
return hm, j, y.bit_length(), y, temp, prevtemp
def build_prime_number(hm):
si = 1
for x in range(len(hm)):
si = si * hm[x]
return si
def Xploder(s, iter=1):
return ((s+1) << (iter))-1
def powers(hm, y):
return hm%y
Here are some speed test results comparing the original pollard_brent with my optimization. For smaller numbers the random technique can sometimes be faster, but my optimization is consistently at the lower bound of the timings, and I found much better results with larger factorizations:
# Here get_factors_lars_opt calls my optimized version of y,c,m of pollard brent, and the get_factors_brent uses
# the original form with y,c,m as random
sin = 777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
import time
start = time.time()
print(build_prime_number(get_factors_lars_opt(sin)))
end = time.time()
print("Time: ", end-start)
[3, 7, 7, 13, 11, 37, 103, 613, 4013, 210631, 2071723, 52986961, 21993833369, 5363222357, 291078844423, 13168164561429877, 377526955309799110357]
777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
440.6862680912018
import time
start = time.time()
print(build_prime_number(get_factors_brent(sin)))
end = time.time()
print("Time: ", end-start)
[3, 7, 7, 13, 11, 37, 103, 613, 4013, 210631, 2071723, 52986961, 21993833369, 5363222357, 291078844423, 13168164561429877, 377526955309799110357]
777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777
Time: 814.3913190364838
sin = 272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727
import time
start = time.time()
print(build_prime_number(get_factors_lars_opt(sin)))
end = time.time()
print("Time: ", end-start)
[3, 3, 3, 3, 7, 13, 37, 103, 613, 4013, 210631, 2071723, 52986961, 5363222357, 21993833369, 291078844423, 13168164561429877, 377526955309799110357]
272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727
Time: 286.54595589637756
import time
start = time.time()
print(build_prime_number(get_factors_brent(sin)))
end = time.time()
print("Time: ", end-start)
[3, 3, 3, 3, 7, 13, 37, 103, 613, 4013, 210631, 2071723, 5363222357, 52986961, 21993833369, 291078844423, 13168164561429877, 377526955309799110357]
272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727272727
Time: 833.4450578689575
sin = 6324591032675721961071009838204690217216135
import time
start = time.time()
print(build_prime_number(get_factors_lars_opt(sin,-2)))
end = time.time()
print(end-start)
[5, 2267, 1031, 808309, 1034200481927, 647396143454755757]
6324591032675721961071009838204690217216135
3.2956199645996094
import time
start = time.time()
print(get_factors_brent(sin))
end = time.time()
print(end-start)
[5, 2267, 1031, 808309, 1034200481927, 647396143454755757]
6.273381233215332
sin = 632459103267572196107100983820469021721613
import time
start = time.time()
print(build_prime_number(get_factors_lars_opt(sin,-2)))
end = time.time()
[3, 11, 11, 938861870237, 1855769878917004820751635923]
632459103267572196107100983820469021721613
0.9035940170288086
import time
start = time.time()
print(build_prime_number(get_factors_brent(sin,-2)))
end = time.time()
[3, 11, 11, 938861870237, 1855769878917004820751635923]
[3, 11, 11, 938861870237, 1855769878917004820751635923]
632459103267572196107100983820469021721613
6.698847055435181
import time
start = time.time()
print(get_factors_brent(sin))
end = time.time()
print(end-start)
632459103267572196107100983820469021721613
[3, 11, 11, 938861870237, 1855769878917004820751635923]
1.687075138092041
# The point is that my optimization always times at the lower end of the spectrum, whereas the random version can be as fast, faster, or slower, so my optimization gives consistently lower timings. Anyway, this is to show that I'm interested in helping others optimize their techniques, so hopefully a mathematician will help me optimize mine. Thx.
For questions about how this method works, you can use the code below, which I follow with examples of it finding the primes. The issue with this pure method is that the squaring becomes slow as the numbers get larger. Each prime is found by taking the number modulo values built from powers of 2 of increasing bit length, offset by 2, and then repeatedly taking remainders until a prime is found. You then divide by that prime and continue the process. Once it has found all the primes it can that way, it squares the number and continues looking for the bigger primes. Here is the code:
def SieveOfEratosthenes(n):
# Create a boolean array "prime[0..n]" and initialize
# all entries it as true. A value in prime[i] will
# finally be false if i is Not a prime, else true.
prime = [True for i in range(n+1)]
p = 2
while (p * p <= n):
# If prime[p] is not changed, then it is a prime
if (prime[p] == True):
# Update all multiples of p
for i in range(p * p, n+1, p):
prime[i] = False
p += 1
size = 0
for p in range(2, n+1):
if prime[p]:
size+=1
vv = []
count = 0
# Print all prime numbers
for p in range(2, n+1):
if prime[p]:
vv.append(p)
count+=1
return vv
def get_factor_lars_prime(hm, offset=-2):
num = hm
vv = []
a = lars_find_prime(num, offset)
if a[5] == 0:
vv.append(a[1])
elif a[5] != 0 and a[3] == hm:
vv.append(a[3])
elif a[5] == 2:
if num % 2 == 0:
vv.append(a[5])
else:
vv.append(a[3])
elif a[5] == 3:
if num % 2 == 0:
vv.append(2)
elif num % 3 == 0:
vv.append(3)
elif num % 5 == 0:
vv.append(5)
elif num % a[3] == 0:
vv.append(a[3])
else:
vv.append(a[5])
num = num // vv[-1]
return vv
def larsprimetest(hm):
larstest = [-2, -1, 0, 1, 2]
primereducer = SieveOfEratosthenes(hm.bit_length())
primetest = 0
for x in primereducer:
if hm == x:
return True
for x in primereducer:
if pow(int(x),hm-1,hm)%hm != 1:
return False
for x in larstest:
if get_factor_lars_prime(hm, x)[0] == hm:
primetest += 1
if primetest == 5:
return True
else:
return False
return True
def get_factors(hm, offset=-2):
num = hm
vv = []
while larsprimetest(num // 1) != True:
print(vv, num)
a = find_prime_evens(num, offset)
print(a)
if a[5] == 0:
vv.append(a[1])
elif a[5] != 0 and a[3] == hm:
vv.append(a[3])
elif a[5] == 2:
if num % 2 == 0:
vv.append(a[5])
else:
vv.append(a[3])
elif a[5] == 3:
if num % 2 == 0:
vv.append(2)
elif num % 3 == 0:
vv.append(3)
elif num % 5 == 0:
vv.append(5)
elif num % a[3] == 0:
vv.append(a[3])
else:
vv.append(a[5])
print(a)
num = num // vv[-1]
if larsprimetest(num):
vv.append(num)
print(vv)
return vv
def Xploder(s, iter=1):
return ((s+1) << (iter))-1
def powers(hm, y):
return hm%y
def find_prime_evens(hm, offset=-2):
y = 3
prevtemp = 3
if larsprimetest(hm):
print(f"{hm} is already prime")
return hm
count = 0
while True:
count += 1
if y.bit_length() > hm.bit_length() -1:
hm = hm**2
j = powers(hm, y)
temp = j
if j != 1 and j != 0:
temp = j
temp = hm % temp
if temp == 0:
prevtemp = temp
break
while temp != 1 and temp != 0:
prevtemp = temp
temp = hm % temp
if temp == 0:
print("h: break")
break
y = Xploder(y) -offset
return hm, j, y.bit_length(), y, temp, prevtemp, count
def lars_find_prime(hm, offset=-2):
y = 3
prevtemp = 3
count = 0
while True:
count += 1
j = powers(hm, y)
temp = j
if j != 1 and j != 0:
temp = j
temp = hm % temp
if temp == 0:
prevtemp = temp
break
while temp != 1 and temp != 0:
prevtemp = temp
temp = hm % temp
if temp == 0:
break
y = Xploder(y) -offset
return hm, j, y.bit_length(), y, temp, prevtemp, count
def build_prime_number(hm):
si = 1
for x in range(len(hm)):
si = si * hm[x]
return si
def lars_next_prime(hm):
if hm == 1:
return 2
if hm == 2:
return 3
if hm % 2 == 0:
hm = hm + 1
hm += 2
while larsprimetest(hm) == False:
hm += 2
return hm
Here are examples of it in action. I included extra prints so you can see it getting each prime in succession, to help you better understand how it works:
In [34]: build_prime_number(get_factors(1004))
[] 1004
(1004, 2, 2, 3, 0, 0, 1)
(1004, 2, 2, 3, 0, 0, 1)
[2] 502
h: break
(502, 7, 4, 9, 0, 2, 2)
(502, 7, 4, 9, 0, 2, 2)
[2, 2, 251]
Out[34]: 1004
In [27]: get_factors(1009732533765202)
[] 1009732533765202
h: break
(1009732533765202, 16, 5, 21, 0, 2, 3)
(1009732533765202, 16, 5, 21, 0, 2, 3)
[2] 504866266882601
h: break
(504866266882601, 314, 9, 381, 0, 13, 7)
(504866266882601, 314, 9, 381, 0, 13, 7)
[2, 13] 38835866683277
h: break
(38835866683277, 22047719, 25, 25165821, 0, 373, 23)
(38835866683277, 22047719, 25, 25165821, 0, 373, 23)
[2, 13, 373] 104117605049
h: break
(10840475681139550292401, 66734543691418, 51, 1688849860263933, 0, 1097, 49)
(10840475681139550292401, 66734543691418, 51, 1688849860263933, 0, 1097, 49)
[2, 13, 373, 1097, 94911217]
Out[27]: [2, 13, 373, 1097, 94911217]
In [26]: get_factors(1009732533765203)
[] 1009732533765203
h: break
(1009732533765203, 98408, 19, 393213, 0, 1823, 17)
(1009732533765203, 98408, 19, 393213, 0, 1823, 17)
[1823] 553885098061
h: break
(8858444060334285873499462093711776032199292767341903635560423388560603860385161070220665461281, 1837905802481463305694040153443073056866698693347558247643656, 202, 4820814132776970826625886277023487807566608981348378505904125, 0, 3163, 200)
(8858444060334285873499462093711776032199292767341903635560423388560603860385161070220665461281, 1837905802481463305694040153443073056866698693347558247643656, 202, 4820814132776970826625886277023487807566608981348378505904125, 0, 3163, 200)
[1823, 3163, 175113847]
Out[26]: [1823, 3163, 175113847]
In this example you can see what slows it down: the squaring results in a very large number, which makes the number of % mods needed to find the resulting prime numbers very high:
In [35]: build_prime_number(get_factors(1009732533765211))
[] 1009732533765211
h: break
(1858687464945830613276207519418830078256252925527494817670836452584925872151997050342726690319815855916489163187476779328795581561584506918102376218558870556528131449536050119335272183001175087460816706604467590422729460305771026510195258807326149782536916059371245684962927048486207145468445983070111263743601945442437754692285582860831087579582865338897577976561107423651769237534542867346635933728009644016505419393349047377174870035612601951635799954836993221938911759004398692526370763278333555537149891691148929012230048164898839856174708241523484635142476852085507872602461333140057099374031231234692883772515864168322858763614749354444867119570513756048715083028301447984153569256687855399132439900591865090139158984073970060632974132395463497538996079103082850269708894436847549189253692504734313896341688057677128152397506215184070793063758319912367444161873329646621333822069513797521961617016648431755202336310221214445951592592865449312913586743041, 867432054263957195312394763826128523180839572794694625925893326576585472718884079259994297952068169500409139996557409526226130721935083932434155974396191513044409897151702362686457004179428822761911329822994774194886447878375861414059761388814088489690684480586657294595997373539633554633322628683970338029751517448126242789268931868877409296113419871613500698447737529174670292711688792459352329908009558736654555272729173099792228946045373872886607941401263166526198601410313477275934200380505058563197802581624185674924445556439949854181335354955193238910211915519483035004071046631189912140798111185601326587350356747504076298870161020037239314590660157491873623891581213042851417951748697766497038253143302123500968830751639348475147422594917068852946804447678119131167696912276272728600619979363485134551679606705258889428555149333805796401891036221772343868264049459632937898670649848299312252309046124292715358546940281583005313710132082, 3143, 
10288158236705112447422412676734607870053121808427215392372356413249991433231817946259019960285051290857995822720743146217778185868842788207857017093380941762603675478415275610927176437914700447587292303826017744988440230377899747493402777461598453741291504123172281193487476681009866573741818827221076867067306582888054118228758213163000348022165104779211278807599015736046339741076053225413368146423933780513391217497933933021472551628227200656036586380577217835068596992642302384015703275756549655632282308707585597225290584296817091365296639633296475008816750266955889613824279377943214662329512838089193850285724935535928720351801544368420366013001495177810046104674236151184011707700353556483120087118410679668080153968024939929060725862159219354678155246920045922185882168329581212366120003846850432289454131135080189140814703595459255280289663654678483280428926392800140450176592552239268189492436400835156463000753657089320650379054022653, 0, 11344301, 3141)
(1858687464945830613276207519418830078256252925527494817670836452584925872151997050342726690319815855916489163187476779328795581561584506918102376218558870556528131449536050119335272183001175087460816706604467590422729460305771026510195258807326149782536916059371245684962927048486207145468445983070111263743601945442437754692285582860831087579582865338897577976561107423651769237534542867346635933728009644016505419393349047377174870035612601951635799954836993221938911759004398692526370763278333555537149891691148929012230048164898839856174708241523484635142476852085507872602461333140057099374031231234692883772515864168322858763614749354444867119570513756048715083028301447984153569256687855399132439900591865090139158984073970060632974132395463497538996079103082850269708894436847549189253692504734313896341688057677128152397506215184070793063758319912367444161873329646621333822069513797521961617016648431755202336310221214445951592592865449312913586743041, 867432054263957195312394763826128523180839572794694625925893326576585472718884079259994297952068169500409139996557409526226130721935083932434155974396191513044409897151702362686457004179428822761911329822994774194886447878375861414059761388814088489690684480586657294595997373539633554633322628683970338029751517448126242789268931868877409296113419871613500698447737529174670292711688792459352329908009558736654555272729173099792228946045373872886607941401263166526198601410313477275934200380505058563197802581624185674924445556439949854181335354955193238910211915519483035004071046631189912140798111185601326587350356747504076298870161020037239314590660157491873623891581213042851417951748697766497038253143302123500968830751639348475147422594917068852946804447678119131167696912276272728600619979363485134551679606705258889428555149333805796401891036221772343868264049459632937898670649848299312252309046124292715358546940281583005313710132082, 3143, 
10288158236705112447422412676734607870053121808427215392372356413249991433231817946259019960285051290857995822720743146217778185868842788207857017093380941762603675478415275610927176437914700447587292303826017744988440230377899747493402777461598453741291504123172281193487476681009866573741818827221076867067306582888054118228758213163000348022165104779211278807599015736046339741076053225413368146423933780513391217497933933021472551628227200656036586380577217835068596992642302384015703275756549655632282308707585597225290584296817091365296639633296475008816750266955889613824279377943214662329512838089193850285724935535928720351801544368420366013001495177810046104674236151184011707700353556483120087118410679668080153968024939929060725862159219354678155246920045922185882168329581212366120003846850432289454131135080189140814703595459255280289663654678483280428926392800140450176592552239268189492436400835156463000753657089320650379054022653, 0, 11344301, 3141)
[11344301, 89007911]
Out[35]: 1009732533765211
You can use build_prime_number to prove the results
In [36]: build_prime_number([11344301, 89007911])
Out[36]: 1009732533765211
You can also find the next prime with this:
In [2075]: lars_next_prime(1009732533765201)
Out[2075]: 1009732533765251
The reason I'm interested in this method, and in finding a shortcut to speed it up, is that the squaring can extract primes like this:
In [2076]: build_prime_number(get_factors(1009*1013))
[] 1022117
h: break
(1044723161689, 1046109004, 32, 4294967295, 0, 1013, 31)
(1044723161689, 1046109004, 32, 4294967295, 0, 1013, 31)
[1013, 1009]
Out[2076]: 1022117
In [2077]: build_prime_number(get_factors(100999*7187))
[] 725879813
h: break
(77075748221559656466333073070674906020465405143124137442807471435863521, 4486117980173565510874636581654887178239539699915726, 174, 23945242826029513411849172299223580994042798784118783, 0, 7187, 173)
(77075748221559656466333073070674906020465405143124137442807471435863521, 4486117980173565510874636581654887178239539699915726, 174, 23945242826029513411849172299223580994042798784118783, 0, 7187, 173)
[7187, 100999]
Out[2077]: 725879813
So a faster shortcut would be very beneficial.

No of Pairs of consecutive prime numbers having difference of 6 like (23,29) from 1 to 2 billion

How can I find the number of pairs of consecutive prime numbers with a difference of 6, like (23, 29), from 1 to 2 billion (using any programming language and without using any external libraries), while keeping time complexity in mind?
I tried the sieve of Eratosthenes, but getting consecutive primes is a challenge.
I used generators, but the time complexity is very high.
The code is:
def gen_numbers(n):
    for ele in range(1,n+1):
        for i in range(2,ele//2):
            if ele%i==0:
                break
        else:
            yield ele

prev=0
count=0
for i in gen_numbers(2000000000):
    if i-prev==6:
        count+=1
    prev = i
Interesting question! I have recently been working on Sieve of Eratosthenes prime generators. @Hans Olsson says:
You should use segmented sieve to avoid memory issue:
en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Segmented_sieve
I agree, and happen to have one which I hacked to solve this question. Apologies in advance for the length and non Pythonic-ness. Sample output:
$ ./primes_diff6.py 100
7 prime pairs found with a difference of 6.
( 23 , 29 ) ( 31 , 37 ) ( 47 , 53 ) ( 53 , 59 ) ( 61 , 67 ) ( 73 , 79 ) ( 83 , 89 )
25 primes found.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79,
83, 89, 97]
$ ./primes_diff6.py 1e5
1940 prime pairs found with a difference of 6.
9592 primes found.
The code:
#!/usr/bin/python -Wall
# program to find all primes smaller than n, using segmented sieve
# see https://github.com/kimwalisch/primesieve/wiki/Segmented-sieve-of-Eratosthenes
import sys
def segmentedSieve(limit):
    sqrt = int(limit ** 0.5)
    segment_size = sqrt
    prev = 0
    count = 0
    # we sieve primes >= 3
    i = 3
    n = 3
    sieve = []
    is_prime = [True] * (sqrt + 1)
    primes = []
    multiples = []
    out_primes = []
    diff6 = []
    for low in xrange(0, limit+1, segment_size):
        sieve = [True] * segment_size
        # current segment = [low, high]
        high = min(low + segment_size -1, limit)
        # add sieving primes needed for the current segment
        # using a simple sieve of Eratosthenes, starting where we left off
        while i * i <= high:
            if is_prime[i]:
                primes.append(i)
                multiples.append(i * i - low)
                two_i = i + i
                for j in xrange(i * i, sqrt, two_i):
                    is_prime[j] = False
            i += 2
        # sieve the current segment
        for x in xrange(len(primes)):
            k = primes[x] * 2
            j = multiples[x]
            while j < segment_size: # NB: "for j in range()" doesn't work here.
                sieve[j] = False
                j += k
            multiples[x] = j - segment_size
        # collect results from this segment
        while n <= high:
            if sieve[n - low]:
                out_primes.append(n)
                if n - 6 == prev:
                    count += 1
                    diff6.append(n)
                prev = n
            n += 2
    print count, "prime pairs found with a difference of 6."
    if limit < 1000:
        for x in diff6:
            print "(", x-6, ",", x, ")",
        print
    return out_primes

# Driver Code
if len(sys.argv) < 2:
    n = 500
else:
    n = int(float(sys.argv[1]))
primes = [2] + segmentedSieve(n)
print len(primes), "primes found."
if n < 1000:
    print primes
This might work as-is if you run it for size 2e9 (2 billion) and subtract the result of size 1e9 (1 billion).
EDIT
Performance info, requested by @ValentinB.
$ time ./primes_diff6.py 2e9
11407651 prime pairs found with a difference of 6.
98222287 primes found.
real 3m1.089s
user 2m56.328s
sys 0m4.656s
... on my newish laptop, 1.6 GHz i5-8265U, 8G RAM, Ubuntu on WSL, Win10
I found a mod 30 prime wheel here in a comment by Willy Good that is about 3x faster than this code at 1e9, about 2.2x faster at 2e9. Not segmented, the guts is a Python generator. I'm wondering if it could be segmented or changed to use a bit array to help its memory footprint without otherwise destroying its performance.
END EDIT
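For readers on Python 3, here is a compact sketch of the same idea: a segmented sieve that only keeps the previous prime and a counter, rather than a list of all primes. This is not the script above but a simplified rewrite. The name count_gap6_pairs and the default segment size are assumptions for illustration, and a pure-Python loop like this will still take a long time at 2 billion.

```python
from math import isqrt


def count_gap6_pairs(limit, segment_size=1 << 16):
    """Count consecutive primes (p, q) with q - p == 6 and q <= limit."""
    if limit < 2:
        return 0
    root = isqrt(limit)
    # Plain sieve for the base primes up to sqrt(limit).
    is_small_prime = [True] * (root + 1)
    is_small_prime[0] = False
    if root >= 1:
        is_small_prime[1] = False
    for p in range(2, isqrt(root) + 1):
        if is_small_prime[p]:
            for multiple in range(p * p, root + 1, p):
                is_small_prime[multiple] = False
    base_primes = [p for p in range(2, root + 1) if is_small_prime[p]]

    count = 0
    prev = 0                                  # previous prime seen so far
    for low in range(2, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        segment = [True] * (high - low + 1)   # segment[k] stands for low + k
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the segment, but never below p * p.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = False
        for offset, flag in enumerate(segment):
            if flag:
                n = low + offset
                if n - prev == 6:
                    count += 1
                prev = n
    return count
```

Because prev is carried across segments, pairs that straddle a segment boundary are counted as well.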
This requires storing all primes from 0 to sqrt(2000000000), so memory-wise it's not optimal, but maybe this will do for you? You're going to have to go for a more complex sieve if you want to be more efficient, though.
n = 2000000000
last_prime = 3
prime_numbers_to_test = [2, 3]
result = 0
for i in range(5, n, 2):
    for prime in prime_numbers_to_test:
        # Not prime -> next
        if i % prime == 0:
            break
    else:
        # Prime, test our condition
        if i - last_prime == 6:
            result += 1
        last_prime = i
        if i**2 < n:
            prime_numbers_to_test.append(i)
print(result)
EDIT This code yielded a result of 11,407,651 pairs of consecutive primes with a difference of 6 for n = 2,000,000,000
Here's a demonstration along the lines of what I understood as user448810's intention in their comments. We use the primes to mark off, meaning sieve, only relevant numbers in the range. Those are numbers of the form 6k + 1 and 6k - 1.
Python 2.7 code:
# https://rosettacode.org/wiki/Modular_inverse
def extended_gcd(aa, bb):
    lastremainder, remainder = abs(aa), abs(bb)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
    g, x, y = extended_gcd(a, m)
    if g != 1:
        raise ValueError
    return x % m

from math import sqrt, ceil
n = 2000000000
sqrt_n = int(sqrt(n))
A = [True] * (sqrt_n + 1)
for i in xrange(2, sqrt_n // 2):
    A[2*i] = False
primes = [2]
for i in xrange(3, sqrt_n, 2):
    if A[i]:
        primes.append(i)
        c = i * i
        while c <= sqrt_n:
            A[c] = False
            c = c + i
print "Primes with a Difference of 6"
print "\n%s initial primes to work from." % len(primes)
lower_bound = 1000000000
upper_bound = 1000001000
range = upper_bound - lower_bound
print "Range: %s to %s" % (lower_bound, upper_bound)

# Primes of the form 6k + 1
A = [True] * (range // 6 + 1)
least_6k_plus_1 = int(ceil((lower_bound - 1) / float(6)))
most_6k_plus_1 = (upper_bound - 1) // 6
for p in primes[2:]:
    least = modinv(-6, p)
    least_pth_over = int(least + p * ceil((least_6k_plus_1 - least) / float(p)))
    c = int(least_pth_over - least_6k_plus_1)
    while c < len(A):
        A[c] = False
        c = c + p
print "\nPrimes of the form 6k + 1:"
for i in xrange(1, len(A)):
    if A[i] and A[i - 1]:
        p1 = (i - 1 + least_6k_plus_1) * 6 + 1
        p2 = (i + least_6k_plus_1) * 6 + 1
        print p1, p2

# Primes of the form 6k - 1
A = [True] * (range // 6 + 1)
least_6k_minus_1 = int(ceil((lower_bound + 1) / float(6)))
most_6k_minus_1 = (upper_bound + 1) // 6
for p in primes[2:]:
    least = modinv(6, p)
    least_pth_over = int(least + p * ceil((least_6k_minus_1 - least) / float(p)))
    c = int(least_pth_over - least_6k_minus_1)
    while c < len(A):
        A[c] = False
        c = c + p
print "\nPrimes of the form 6k - 1:"
for i in xrange(1, len(A)):
    if A[i] and A[i - 1]:
        p1 = (i - 1 + least_6k_minus_1) * 6 - 1
        p2 = (i + least_6k_minus_1) * 6 - 1
        print p1, p2
Output:
Primes with a Difference of 6
4648 initial primes to work from.
Range: 1000000000 to 1000001000
Primes of the form 6k + 1:
1000000087 1000000093
1000000447 1000000453
1000000453 1000000459
Primes of the form 6k - 1:
1000000097 1000000103
1000000403 1000000409
1000000427 1000000433
1000000433 1000000439
1000000607 1000000613
In order to count consecutive primes, we have to take into account the interleaving lists of primes 6k + 1 and 6k - 1. Here's the count:
# https://rosettacode.org/wiki/Modular_inverse
def extended_gcd(aa, bb):
lastremainder, remainder = abs(aa), abs(bb)
x, lastx, y, lasty = 0, 1, 1, 0
while remainder:
lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
x, lastx = lastx - quotient*x, x
y, lasty = lasty - quotient*y, y
return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)
def modinv(a, m):
g, x, y = extended_gcd(a, m)
if g != 1:
raise ValueError
return x % m
from math import sqrt, ceil
import time
start = time.time()
n = 2000000000
sqrt_n = int(sqrt(n))
A = [True] * (sqrt_n + 1)
for i in xrange(2, sqrt_n // 2):
A[2*i] = False
primes = [2]
for i in xrange(3, sqrt_n, 2):
if A[i]:
primes.append(i)
c = i * i
while c <= sqrt_n:
A[c] = False
c = c + i
lower_bound = 1000000000
upper_bound = 2000000000
range = upper_bound - lower_bound
A = [True] * (range // 6 + 1)
least_6k_plus_1 = int(ceil((lower_bound - 1) / float(6)))
most_6k_plus_1 = (upper_bound - 1) // 6
for p in primes[2:]:
least = modinv(-6, p)
least_pth_over = int(least + p * ceil((least_6k_plus_1 - least) / float(p)))
c = int(least_pth_over - least_6k_plus_1)
while c < len(A):
A[c] = False
c = c + p
B = [True] * (range // 6 + 1)
least_6k_minus_1 = int(ceil((lower_bound + 1) / float(6)))
most_6k_minus_1 = (upper_bound + 1) // 6
for p in primes[2:]:
least = modinv(6, p)
least_pth_over = int(least + p * ceil((least_6k_minus_1 - least) / float(p)))
c = int(least_pth_over - least_6k_minus_1)
while c < len(B):
B[c] = False
c = c + p
total = 0
for i in xrange(1, max(len(A), len(B))):
if A[i] and A[i - 1] and not B[i]:
total = total + 1
if B[i] and B[i - 1] and not A[i - 1]:
total = total + 1
# 47374753 primes in range 1,000,000,000 to 2,000,000,000
print "%s consecutive primes with a difference of 6 in range %s to %s." % (total, lower_bound, upper_bound)
print "--%s seconds" % (time.time() - start)
Output:
5317860 consecutive primes with a difference of 6 in range 1000000000 to 2000000000.
--193.314619064 seconds
Python isn't the best language to write this in, but since that's what we're all doing...
This little segmented sieve finds the answer 5317860 in 3:24
import math
# Find the odd primes up to sqrt(2000000000); these are the sieving primes
sieve = [True]*(int(math.sqrt(2000000000))+1)
for i in range(3,len(sieve)):
    if (sieve[i]):
        for j in range(2*i, len(sieve), i):
            sieve[j] = False
smallOddPrimes = [i for i in range(3,len(sieve),2) if sieve[i]]
# Check primes in target segments (each segment holds 5000000 odd numbers)
total=0
lastPrime=0
for base in range(1000000000, 2000000000, 10000000):
    sieve = [True]*5000000
    for p in smallOddPrimes:
        st=p-(base%p)
        if st%2==0: #first odd multiple of p
            st+=p
        for i in range(st//2,len(sieve),p):
            sieve[i]=False
    for prime in [i*2+base+1 for i in range(0,len(sieve)) if sieve[i]]:
        if prime == lastPrime+6:
            total+=1
        lastPrime = prime
print(total)
There are a number of ways to compute the sexy primes between one billion and two billion. Here are four.
Our first solution identifies sexy primes p by using a primality test to check both p and p + 6 for primality:
def isSexy(n):
    return isPrime(n) and isPrime(n+6)
Then we check each odd number from one billion to two billion:
counter = 0
for n in xrange(1000000001, 2000000000, 2):
    if isSexy(n): counter += 1
print counter
That takes an estimated two hours on my machine, determined by running it from 1 billion to 1.1 billion and multiplying by 10. We need something better. Here's the complete code:
Python 2.7.9 (default, Jun 21 2019, 00:38:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def isqrt(n): # newton
... x = n; y = (x + 1) // 2
... while y < x:
... x = y
... y = (x + n // x) // 2
... return x
...
>>> def isSquare(n):
... # def q(n):
... # from sets import Set
... # s, sum = Set(), 0
... # for x in xrange(0,n):
... # t = pow(x,2,n)
... # if t not in s:
... # s.add(t)
... # sum += pow(2,t)
... # return sum
... # q(32) => 33751571
... # q(27) => 38348435
... # q(25) => 19483219
... # q(19) => 199411
... # q(17) => 107287
... # q(13) => 5659
... # q(11) => 571
... # q(7) => 23
... # 99.82% of non-squares
... # caught by filters before
... # square root calculation
... if 33751571>>(n%32)&1==0:
... return False
... if 38348435>>(n%27)&1==0:
... return False
... if 19483219>>(n%25)&1==0:
... return False
... if 199411>>(n%19)&1==0:
... return False
... if 107287>>(n%17)&1==0:
... return False
... if 5659>>(n%13)&1==0:
... return False
... if 571>>(n%11)&1==0:
... return False
... if 23>>(n% 7)&1==0:
... return False
... s = isqrt(n)
... if s * s == n: return s
... return False
...
>>> def primes(n): # sieve of eratosthenes
... i, p, ps, m = 0, 3, [2], n // 2
... sieve = [True] * m
... while p <= n:
... if sieve[i]:
... ps.append(p)
... for j in range((p*p-3)/2, m, p):
... sieve[j] = False
... i, p = i+1, p+2
... return ps
...
>>> pLimit,pList = 0,[]
>>> pLen,pMax = 0,0
>>>
>>> def storePrimes(n):
... # call with n=0 to clear
... global pLimit, pList
... global pLen, pMax
... if n == 0:
... pLimit,pList = 0,[]
... pLen,pMax = 0,0
... elif pLimit < n:
... pLimit = n
... pList = primes(n)
... # x=primesRange(pLimit,n)
... # pList += x
... pLen = len(pList)
... pMax = pList[-1]
...
>>> storePrimes(1000)
>>> def gcd(a, b): # euclid
... if b == 0: return a
... return gcd(b, a%b)
...
>>> def kronecker(a,b):
... # for any integers a and b
... # cohen 1.4.10
... if b == 0:
... if abs(a) == 1: return 1
... else: return 0
... if a%2==0 and b%2==0:
... return 0
... tab2=[0,1,0,-1,0,-1,0,1]
... v = 0
... while b%2==0: v,b=v+1,b/2
... if v%2==0: k = 1
... else: k = tab2[a%8]
... if b < 0:
... b = -b
... if a < 0: k = -k
... while True:
... if a == 0:
... if b > 1: return 0
... else: return k
... v = 0
... while a%2==0: v,a=v+1,a/2
... if v%2==1: k*=tab2[b%8]
... if (a&b&2): k = -k
... r=abs(a); a=b%r; b=r
...
>>> def jacobi(a,b):
... # for integers a and odd b
... if b%2 == 0:
... m="modulus must be odd"
... raise ValueError(m)
... return kronecker(a,b)
...
>>> def isSpsp(n,a,r=-1,s=-1):
... # strong pseudoprime
... if r < 0:
... r, s = 0, n - 1
... while s % 2 == 0:
... r, s = r + 1, s / 2
... if pow(a,s,n) == 1:
... return True
... for i in range(0,r):
... if pow(a,s,n) == n-1:
... return True
... s = s * 2
... return False
...
>>> def lucasPQ(p, q, m, n):
... # nth element of lucas
... # sequence with parameters
... # p and q (mod m); ignore
... # modulus operation when
... # m is zero
... def mod(x):
... if m == 0: return x
... return x % m
... def half(x):
... if x%2 == 1: x=x+m
... return mod(x / 2)
... un, vn, qn = 1, p, q
... u=0 if n%2==0 else 1
... v=2 if n%2==0 else p
... k=1 if n%2==0 else q
... n,d = n//2, p*p-4*q
... while n > 0:
... u2 = mod(un*vn)
... v2 = mod(vn*vn-2*qn)
... q2 = mod(qn*qn)
... n2 = n // 2
... if n % 2 == 1:
... uu = half(u*v2+u2*v)
... vv = half(v*v2+d*u*u2)
... u,v,k = uu,vv,k*q2
... un,vn,qn,n = u2,v2,q2,n2
... return u, v, k
...
>>> def isSlpsp(n):
... # strong lucas pseudoprime
... def selfridge(n):
... d,s = 5,1
... while True:
... ds = d * s
... if gcd(ds, n) > 1:
... return ds, 0, 0
... if jacobi(ds,n) == -1:
... return ds,1,(1-ds)/4
... d,s = d+2, -s
... d, p, q = selfridge(n)
... if p == 0: return n == d
... s, t = 0, n + 1
... while t % 2 == 0:
... s, t = s + 1, t / 2
... u,v,k = lucasPQ(p,q,n,t)
... if u == 0 or v == 0:
... return True
... for r in range(1, s):
... v = (v*v-2*k) % n
... if v == 0: return True
... k = (k * k) % n
... return False
...
>>> def isPrime(n):
... # mathematica method
... if n < 2: return False
... for p in pList[:25]:
... if n%p == 0: return n==p
... if isSquare(n):
... return False
... r, s = 0, n - 1
... while s % 2 == 0:
... r, s = r + 1, s / 2
... if not isSpsp(n,2,r,s):
... return False
... if not isSpsp(n,3,r,s):
... return False
... if not isSlpsp(n):
... return False
... return True
...
>>> def isSexy(n):
... return isPrime(n) and isPrime(n+6)
...
>>> counter = 0
>>> for n in xrange(1000000001, 2000000000, 2):
... if isSexy(n): counter += 1
...
>>> print counter
5924680
By the way, if you want to see how slow Python is for things like this, here is the equivalent program in Pari/GP, a programming environment designed for number-theoretic calculations, which finishes in 70253 milliseconds, just over a minute:
gettime(); c = 0;
forprime(p = 1000000000, 2000000000, if (isprime(p+6), c++));
print (c); print (gettime());
5924680
70253
Our second solution uses our standard prime generator to generate the primes from one billion to two billion, checking for each prime p if p - 6 is on the list:
counter = 0
ps = primeGen(1000000000)
p2 = next(ps); p1 = next(ps); p = next(ps)
while p < 2000000000:
    if p - p2 == 6 or p - p1 == 6:
        counter += 1
    p2 = p1; p1 = p; p = next(ps)
print counter
That took about 8.5 minutes and produced the correct result. Here's the complete code:
>>> def primeGen(start=0):
... if start <= 2: yield 2
... if start <= 3: yield 3
... ps = primeGen()
... p=next(ps); p=next(ps)
... q = p*p; D = {}
... def add(m,s):
... while m in D: m += s
... D[m] = s
... while q <= start:
... x = (start // p) * p
... if x < start: x += p
... if x%2 == 0: x += p
... add(x, p+p)
... p=next(ps); q=p*p
... c = max(start-2, 3)
... if c%2 == 0: c += 1
... while True:
... c += 2
... if c in D:
... s = D.pop(c)
... add(c+s, s)
... elif c < q: yield c
... else: # c == q
... add(c+p+p, p+p)
... p=next(ps); q=p*p
...
>>> counter = 0
>>> ps = primeGen(1000000000)
>>> p2 = next(ps); p1 = next(ps); p = next(ps)
>>> while p < 2000000000:
... if p - p2 == 6 or p - p1 == 6:
... counter += 1
... p2 = p1; p1 = p; p = next(ps)
...
>>> print counter
5924680
If you will permit me another digression on Pari/GP, here is the equivalent program using a prime generator, computing the solution in just 37 seconds:
gettime(); p2 = nextprime(1000000000); p1=nextprime(p2+1); c = 0;
forprime(p = nextprime(p1+1), 2000000000,
if (p-p2==6 || p-p1==6, c++); p2=p1; p1=p);
print (c); print (gettime());
5924680
37273
Our third solution uses a segmented sieve to make a list of primes from one billion to two billion, then scans the list counting the sexy primes:
counter = 0
ps = primes(1000000000, 2000000000)
if ps[1] - ps[0] == 6: counter += 1
for i in xrange(2,len(ps)):
    if ps[i] - ps[i-2] == 6 or ps[i] - ps[i-1] == 6:
        counter += 1
print counter
That takes about four minutes to run, and produces the correct result. Here's the complete code:
>>> def isqrt(n): # newton
... x = n; y = (x + 1) // 2
... while y < x:
... x = y
... y = (x + n // x) // 2
... return x
...
>>> def primes(lo, hi=False):
... if not hi: lo,hi = 0,lo
...
... if hi-lo <= 50:
... xs = range(lo,hi)
... return filter(isPrime,xs)
...
... # sieve of eratosthenes
... if lo<=2 and hi<=1000000:
... i,p,ps,m = 0,3,[2],hi//2
... sieve = [True] * m
... while p <= hi:
... if sieve[i]:
... ps.append(p)
... s = (p*p-3)/2
... for j in xrange(s,m,p):
... sieve[j] = False
... i,p = i+1, p+2
... return ps
...
... if lo < isqrt(hi):
... r = isqrt(hi) + 1
... loPs=primes(lo,r)
... hiPs=primes(r+1,hi)
... return loPs + hiPs
...
... # segmented sieve
... if lo%2==1: lo-=1
... if hi%2==1: hi+=1
... r = isqrt(hi) + 1
... b = r//2; bs = [True] * b
... ps = primes(r)[1:]
... qs = [0] * len(ps); zs = []
... for i in xrange(len(ps)):
... q = (lo+1+ps[i]) / -2
... qs[i]= q % ps[i]
... for t in xrange(lo,hi,b+b):
... if hi<(t+b+b): b=(hi-t)/2
... for j in xrange(b):
... bs[j] = True
... for k in xrange(len(ps)):
... q,p = qs[k], ps[k]
... for j in xrange(q,b,p):
... bs[j] = False
... qs[k] = (qs[k]-b)%ps[k]
... for j in xrange(b):
... if bs[j]:
... zs.append(t+j+j+1)
... return zs
...
>>> counter = 0
>>> ps = primes(1000000000, 2000000000)
>>> for i in xrange(2, len(ps)):
... if ps[i] - ps[i-2] == 6 or ps[i] - ps[i-1] == 6:
... counter += 1
...
>>> print counter
5924680
Here is our fourth and final program to count the sexy primes, suggested in the comments above. The idea is to sieve each of the polynomials 6 n + 1 and 6 n − 1 separately, scan for adjacent pairs, and combine the counts.
This is a little bit tricky, so let's look at an example: sieve 6 n + 1 on the range 100 to 200 using the primes 5, 7, 11 and 13, which are the sieving primes up to the square root of 200 (excluding 2 and 3, which divide 6). The sieve has 17 elements 103, 109, 115, 121, 127, 133, 139, 145, 151, 157, 163, 169, 175, 181, 187, 193, 199. The least multiple of 5 in the list is 115, so we strike 115, 145 and 175 (every 5th item) from the sieve. The least multiple of 7 in the list is 133, so we strike 133 and 175 (every 7th item) from the sieve. The least multiple of 11 in the list is 121, so we strike 121 and 187 (every 11th item) from the list. And the least multiple of 13 in the list is 169, so we strike it from the list (it's in the middle of a 17-item list, and has no other multiples in the list). The primes that remain in the list are 103, 109, 127, 139, 151, 157, 163, 181, 193, and 199; of those, 103, 151, 157 and 193 are sexy.
The trick is finding the offset in the sieve of the first multiple of the prime. The formula is (lo + p) / -6 (mod p), where lo is the first element in the sieve (103 in the example above) and p is the prime; -6 comes from the gap between successive elements of the sieve. Ordinary division is not available in modular arithmetic, so we can't just divide by -6; instead, we multiply by the modular inverse of -6. And since the inverse function used here expects a non-negative argument, we first convert -6 to its equivalent mod p. Thus, for our four sieving primes, the offsets into the sieve are:
((103 + 5) * inverse(-6 % 5, 5)) % 5 = 2 ==> points to 115
((103 + 7) * inverse(-6 % 7, 7)) % 7 = 5 ==> points to 133
((103 + 11) * inverse(-6 % 11, 11)) % 11 = 3 ==> points to 121
((103 + 13) * inverse(-6 % 13, 13)) % 13 = 11 ==> points to 169
Sieving 6 n - 1 works the same way, except that lo is 101 instead of 103; the sieve contains 101, 107, 113, 119, 125, 131, 137, 143, 149, 155, 161, 167, 173, 179, 185, 191, 197:
((101 + 5) * inverse(-6 % 5, 5)) % 5 = 4 ==> points to 125
((101 + 7) * inverse(-6 % 7, 7)) % 7 = 3 ==> points to 119
((101 + 11) * inverse(-6 % 11, 11)) % 11 = 7 ==> points to 143
((101 + 13) * inverse(-6 % 13, 13)) % 13 = 7 ==> points to 143
After sieving, the numbers that remain in the sieve are 101, 107, 113, 131, 137, 149, 167, 173, 179, 191, 197, of which 101, 107, 131, 167, 173 and 191 are sexy, so there are 10 sexy primes between 100 and 200.
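The 100-to-200 walkthrough can be reproduced with a short sketch. It is written for Python 3 and is only an illustration of the offset formula; the helper names first_offset, sieve_progression and count_sexy are made up for this example, and the inverse routine raises an error instead of returning 0 when the arguments are not coprime.

```python
def inverse(x, m):
    """Modular inverse of x modulo m (extended Euclid); x and m must be coprime."""
    a, b, u = 0, m, 1
    while x > 0:
        x, a, b, u = b % x, u, x, a - b // x * u
    if b != 1:
        raise ValueError("x and m are not coprime")
    return a % m

def first_offset(lo, p):
    """Index of the first multiple of p in the list lo, lo+6, lo+12, ..."""
    return ((lo + p) * inverse(-6 % p, p)) % p

def sieve_progression(lo, size, sieving_primes):
    """Sieve the size numbers lo, lo+6, ... and return the survivors."""
    flags = [True] * size
    for p in sieving_primes:
        for i in range(first_offset(lo, p), size, p):
            flags[i] = False
    return [lo + 6 * i for i in range(size) if flags[i]]

def count_sexy(survivors):
    """Count p in the list for which p + 6 is also in the list."""
    members = set(survivors)
    return sum(1 for p in survivors if p + 6 in members)

sieving_primes = [5, 7, 11, 13]
plus = sieve_progression(103, 17, sieving_primes)   # the 6n + 1 list
minus = sieve_progression(101, 17, sieving_primes)  # the 6n - 1 list
print([first_offset(103, p) for p in sieving_primes])  # [2, 5, 3, 11]
print(count_sexy(plus) + count_sexy(minus))            # 10
```

The two progressions never mix, because p and p + 6 always fall in the same residue class mod 6, which is why the counts can simply be added.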
Here's the code:
>>> def isqrt(n): # newton
... x = n; y = (x + 1) // 2
... while y < x:
... x = y
... y = (x + n // x) // 2
... return x
...
>>> def inverse(x, m): # euclid
... a, b, u = 0, m, 1
... while x > 0:
... x,a,b,u=b%x,u,x,a-b//x*u
... if b == 1: return a % m
... return 0 # must be coprime
...
>>> def primes(n): # sieve of eratosthenes
... i, p, ps, m = 0, 3, [2], n // 2
... sieve = [True] * m
... while p <= n:
... if sieve[i]:
... ps.append(p)
... for j in range((p*p-3)/2, m, p):
... sieve[j] = False
... i, p = i+1, p+2
... return ps
...
>>> counter = 0
>>> ps = primes(isqrt(2000000000))[2:]
>>> size = (2000000000-1000000000)/6+1
>>>
>>> # sieve on 6n-1
... lo = 1000000000/6*6+5
>>> sieve = [True] * size
>>> for p in ps:
... q = ((lo+p)*inverse(-6%p,p))%p
... for i in xrange(q,size,p):
... sieve[i] = False
...
>>> for i in xrange(1,size):
... if sieve[i] and sieve[i-1]:
... counter += 1
...
>>> # sieve on 6n+1
... lo += 2
>>> sieve = [True] * size
>>> for i in xrange(0,size):
... sieve[i] = True
...
>>> for p in ps:
... q = ((lo+p)*inverse(-6%p,p))%p
... for i in xrange(q,size,p):
... sieve[i] = False
...
>>> for i in xrange(1,size):
... if sieve[i] and sieve[i-1]:
... counter += 1
...
>>> print counter
5924680
That took about three minutes to run and produced the correct result.
If you are determined to count only consecutive primes that differ by 6, instead of counting all the sexy primes, the easiest way is to use the segmented sieve of the third method and change the predicate in the counting test to look only at ps[i] - ps[i-1] == 6. Or you can do this in just 22 seconds in Pari/GP:
gettime(); prev = nextprime(1000000000); c = 0;
forprime(p = nextprime(prev+1), 2000000000, if (p-prev==6, c++); prev=p);
print (c); print (gettime());
5317860
22212
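The gap between the two totals (5924680 sexy pairs versus 5317860 consecutive primes six apart) is easy to see on a small range. The sketch below is an illustration only, written for Python 3 with a plain sieve of Eratosthenes rather than the segmented sieve; the function names are made up for this example.

```python
def primes_upto(n):
    """Plain sieve of Eratosthenes; returns the sorted primes <= n."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_p in enumerate(flags) if is_p]

def count_sexy(ps):
    """Pairs (p, p + 6) with both prime, whether or not a prime lies between."""
    members = set(ps)
    return sum(1 for p in ps if p + 6 in members)

def count_consecutive_gap6(ps):
    """Pairs of neighbouring primes in the list that differ by exactly 6."""
    return sum(1 for a, b in zip(ps, ps[1:]) if b - a == 6)

ps = primes_upto(100)
print(count_sexy(ps))              # 15
print(count_consecutive_gap6(ps))  # 7
```

For instance, 5 and 11 form a sexy pair, but they are not consecutive primes because 7 lies between them, so only the first count includes them.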
First of all, I build a sieve; you need to check primes only through sqrt(limit). This takes less than 7 minutes on my aging desktop (Intel Haswell ... yes, that out of date).
With this, finding the pairs is trivial. Check each odd number and its desired partner. I've also printed the time used at 1000-pair intervals.
NOTE: if the problem is, indeed, to count only consecutive primes, then remove the check against prime_list[idx+2].
from time import time
start = time()
limit = 10**9 * 2
prime_list = build_sieve(limit)
print("sieve built in", round(time()-start, 2), "sec")
count = 0
for idx, p in enumerate(prime_list[:-2]):
    if prime_list[idx+1] == p+6 or \
       prime_list[idx+2] == p+6:
        count += 1
print(count, "pairs found in", round(time()-start, 2), "sec")
Output:
sieve built in 417.01 sec
12773727 pairs found in 481.23 sec
That's about 7 minutes to build the sieve, another minute to count the pairs. This is with base Python. If you use NumPy, you can shift the sieve by one and two positions, do the vectorized subtractions, and count how many times 6 appears in the results.
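The answer does not show build_sieve. The sketch below is one assumed stand-in, not the author's implementation: it supposes that build_sieve(limit) returns a sorted list of all primes up to limit (with limit at least 2), and it is run here on a small limit with the same counting loop. It uses one byte per candidate, so limit = 2 * 10**9 would need roughly 2 GB for the flags alone, before the prime list is built.

```python
def build_sieve(limit):
    """Assumed stand-in: return a sorted list of all primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"          # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Strike i*i, i*i + i, ... with a single slice assignment
            strikes = len(range(i * i, limit + 1, i))
            is_prime[i * i::i] = bytes(strikes)
    return [i for i in range(limit + 1) if is_prime[i]]

prime_list = build_sieve(100)
count = 0
for idx, p in enumerate(prime_list[:-2]):
    # p + 6 can only be the next prime or the one after it
    if prime_list[idx + 1] == p + 6 or prime_list[idx + 2] == p + 6:
        count += 1
print(count)  # 15
```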

Python Last 6 Results, removing the last

I just can't get it done. Therefore I'll post the full code.
The .csv used is from http://www.football-data.co.uk/mmz4281/1415/E0.csv
Now when run, the variables home_team_a, home_team_d, away_team_a and away_team_d are based on all of the previous matches but I want them to be based always on the last 6 matches.
import csv, math, ast, numpy as np
def poisson(actual, mean):
return math.pow(mean, actual) * math.exp(-mean) / math.factorial(actual)
csvFile = '20152016.csv'
team_list = []
k = open('team_list.txt', 'w')
k.write("""{
""")
csvRead = csv.reader(open(csvFile))
next(csvRead)
for row in csvRead:
if row[2] not in team_list:
team_list.append(row[2])
if row[3] not in team_list:
team_list.append(row[3])
team_list.sort()
for team in team_list:
k.write(""" '%s': {'home_goals': 0, 'away_goals': 0, 'home_conceded': 0, 'away_conceded': 0, 'home_games': 0, 'away_games': 0, 'alpha_h': 0, 'beta_h': 0, 'alpha_a': 0, 'beta_a': 0},
""" % (team))
k.write("}")
k.close()
s = open('team_list.txt', 'r').read()
dict = ast.literal_eval(s)
GAMES_PLAYED = 0
WEEKS_WAIT = 4
TOTAL_VALUE = 0
csvRead = csv.reader(open(csvFile))
next(csvRead)
for game in csvRead:
home_team = game[2]
away_team = game[3]
home_goals = int(game[4])
away_goals = int(game[5])
home_win_prob = 0
draw_win_prob = 0
away_win_prob = 0
curr_home_goals = 0
curr_away_goals = 0
avg_home_goals = 1
avg_away_goals = 1
team_bet = ''
ev_bet = ''
# GETTING UPDATED VARIABLES
for key, value in dict.items():
curr_home_goals += dict[key]['home_goals']
curr_away_goals += dict[key]['away_goals']
if GAMES_PLAYED > (WEEKS_WAIT * 10):
avg_home_goals = curr_home_goals / (GAMES_PLAYED)
avg_away_goals = curr_away_goals / (GAMES_PLAYED)
# CALCULATING FACTORS
if GAMES_PLAYED > (WEEKS_WAIT * 10):
home_team_a = (dict[home_team]['alpha_h'] + dict[home_team]['alpha_a']) / 2
away_team_a = (dict[away_team]['alpha_h'] + dict[away_team]['alpha_a']) / 2
home_team_d = (dict[home_team]['beta_h'] + dict[home_team]['beta_a']) / 2
away_team_d = (dict[away_team]['beta_h'] + dict[away_team]['beta_a']) / 2
home_team_exp = avg_home_goals * home_team_a * away_team_d
away_team_exp = avg_away_goals * away_team_a * home_team_d
# RUNNING POISSON
l = open('poisson.txt', 'w')
for i in range(10):
for j in range(10):
prob = poisson(i, home_team_exp) * poisson(j, away_team_exp)
l.write("Prob%s%s = %s\n" % (i, j, prob))
l.close()
with open('poisson.txt') as f:
for line in f:
home_goals_m = int(line.split(' = ')[0][4])
away_goals_m = int(line.split(' = ')[0][5])
prob = float(line.split(' = ')[1])
if home_goals_m > away_goals_m:
home_win_prob += prob
elif home_goals_m == away_goals_m:
draw_win_prob += prob
elif home_goals_m < away_goals_m:
away_win_prob += prob
#CALCULATE VALUE
bet365odds_h, bet365odds_d, bet365odds_a = float(game[23]), float(game[24]), float(game[25])
ev_h = (home_win_prob * (bet365odds_h - 1)) - (1 - home_win_prob)
ev_d = (draw_win_prob * (bet365odds_d - 1)) - (1 - draw_win_prob)
ev_a = (away_win_prob * (bet365odds_a - 1)) - (1 - away_win_prob)
highestEV = max(ev_h, ev_d, ev_a)
if (ev_h == highestEV) and (ev_h > 0):
team_bet = home_team
ev_bet = ev_h
if home_goals > away_goals:
TOTAL_VALUE += (bet365odds_h - 1)
else:
TOTAL_VALUE -= 1
elif (ev_d == highestEV) and (ev_d > 0):
team_bet = 'Draw'
ev_bet = ev_d
if home_goals == away_goals:
TOTAL_VALUE += (bet365odds_d - 1)
else:
TOTAL_VALUE -= 1
elif (ev_a == highestEV) and (ev_a > 0):
team_bet = away_team
ev_bet = ev_a
if home_goals < away_goals:
TOTAL_VALUE += (bet365odds_a - 1)
else:
TOTAL_VALUE -= 1
if (team_bet != '') and (ev_bet != ''):
print ("Bet on '%s' (EV = %s)" % (team_bet, ev_bet))
print (TOTAL_VALUE)
# UPDATE VARIABLES AFTER MATCH HAS BEEN PLAYED
dict[home_team]['home_goals'] += home_goals
dict[home_team]['home_conceded'] += away_goals
dict[home_team]['home_games'] += 1
dict[away_team]['away_goals'] += away_goals
dict[away_team]['away_conceded'] += home_goals
dict[away_team]['away_games'] += 1
GAMES_PLAYED += 1
# CREATE FACTORS
if GAMES_PLAYED > (WEEKS_WAIT * 10):
for key, value in dict.items():
alpha_h = (dict[key]['home_goals'] / dict[key]['home_games']) / avg_home_goals
beta_h = (dict[key]['home_conceded'] / dict[key]['home_games']) / avg_away_goals
alpha_a = (dict[key]['away_goals'] / dict[key]['away_games']) / avg_away_goals
beta_a = (dict[key]['away_conceded'] / dict[key]['away_games']) / avg_home_goals
dict[key]['alpha_h'] = alpha_h
dict[key]['beta_h'] = beta_h
dict[key]['alpha_a'] = alpha_a
dict[key]['beta_a'] = beta_a
Use a deque to keep the 6 most recent items in memory; adding a new record will "push out" the oldest one.
import collections
from itertools import islice
import csv
with open("foo.csv") as fh:
    # Skip the first 44 rows
    csv_read = islice(csv.reader(fh), 44, None)
    # Initialize the deque with the next 6 rows
    d = collections.deque(islice(csv_read, 6), 6)
    for record in csv_read:
        d.append(record)
        print(list(d)) # Rows 46-51, then 47-52, then 48-53, etc
Because you set the maximum length of the deque to 6, each append to a "full" deque pushes out the oldest item. On the first iteration, d.append pushes out row 45 and adds row 51. On the next iteration, adding row 52 pushes out row 46, etc.
In general, a deque is a data structure that is like a combination of a queue and a stack; you can add or remove items to either end efficiently, but accessing an arbitrary item or modifying the "middle" is slow. Here, we're taking advantage of the fact that appending to a full deque causes an implicit removal from the opposite end.
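Applied to the question, one way to keep "the last 6 matches" per team is a deque per team. This is a minimal Python 3 sketch under assumptions: the names recent_goals, record_match and recent_average are hypothetical, and it tracks a single statistic (goals scored) rather than the full set of home and away figures in the question.

```python
from collections import defaultdict, deque

WINDOW = 6  # number of most recent matches to keep per team

# Hypothetical structure: team name -> goals scored in its last WINDOW matches
recent_goals = defaultdict(lambda: deque(maxlen=WINDOW))

def record_match(team, goals):
    """Add one result; the oldest result drops out once WINDOW is exceeded."""
    recent_goals[team].append(goals)

def recent_average(team):
    """Mean goals over the stored window, or 0.0 if nothing is stored yet."""
    window = recent_goals[team]
    return sum(window) / len(window) if window else 0.0

for goals in [1, 0, 2, 3, 1, 1, 4, 2]:
    record_match("Arsenal", goals)

print(list(recent_goals["Arsenal"]))  # [2, 3, 1, 1, 4, 2]
print(recent_average("Arsenal"))      # 13 / 6
```

The same pattern would extend to conceded goals or to separate home and away windows by keeping one deque per statistic.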
How about:
if seen_records == 200:
recs = list(csvRead)[seen_records - 6:seen_records + 1]
You can do something like this....
import csv
previous_index = 0
previous_max = 6 # max number of previous records to remember
previous = [None for _ in range(previous_max)]
csvFile = 'X.csv'
seen_records = 0
csvRead = csv.reader(open(csvFile))
# Enumerate over the records to keep track of the index of each one
for i, record in enumerate(csvRead):
    if (i > 50):
        seen_records += 1
        if previous_index == previous_max:
            previous_index = 0 # Reset to the beginning when we reach the end
        # Store the record and increment the index to the next location
        previous[previous_index] = record
        previous_index += 1
This creates a very basic array of length previous_max that is used as a circular buffer: the first previous_max records fill indexes 0 through previous_max - 1, and after that each new record overwrites the oldest one.
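Once the writes have wrapped around, the slots are no longer in chronological order. A minimal Python 3 sketch of reading such a buffer back oldest-first follows; the helper names remember and oldest_first are hypothetical, the write position wraps with a modulo instead of the explicit reset, and None is assumed never to be a real record.

```python
def remember(previous, next_index, record):
    """Store record in the ring buffer and return the next write position."""
    previous[next_index] = record
    return (next_index + 1) % len(previous)

def oldest_first(previous, next_index):
    """Return the stored records in arrival order, skipping unused slots."""
    ordered = previous[next_index:] + previous[:next_index]
    return [r for r in ordered if r is not None]

previous = [None] * 6
idx = 0
for record in range(1, 9):   # eight records into six slots
    idx = remember(previous, idx, record)

print(previous)                     # [7, 8, 3, 4, 5, 6]
print(oldest_first(previous, idx))  # [3, 4, 5, 6, 7, 8]
```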

BitString with python

I am trying to use bitstring for Python to interpret an incoming data packet and break it up into readable sections. The packet will consist of a header (Source (8 bits), Destination (8 bits), NS (3 bits), NR (3 bits), RSV (1 bit), LST (1 bit), OPcode (8 bits), LEN (8 bits)),
the payload, which is somewhere between 0 and 128 bytes (determined by the LEN in the header), and a CRC of 16 bits.
The data will be arriving in a large packet over the COM port. The data originates from a microcontroller that packetizes it and sends it to the user, which is where the Python comes into play.
Since I am unsure of how to store it before parsing, I do not have any code for this.
I am new to Python and need a little help getting this off the ground.
Thanks,
Erik
EDIT
I currently have a section of code up and running, but it is not producing exactly what I need. Here is that section of code:
def packet_make(ser):
src = 10
p = 0
lst = 0
payload_make = 0
crc = '0x0031'
ns = 0
nr = 0
rsv = 0
packet_timeout = 0
top = 256
topm = 255
#os.system(['clear','cls'][os.name == 'nt'])
print("\tBatts: 1 \t| Berry: 2 \t| Bessler: 3")
print("\tCordell: 4 \t| Dave: 5 \t| Gold: 6")
print("\tYen: 7 \t| Erik: 8 \t| Tommy: 9")
print("\tParsons: 10 \t| JP: 11 \t| Sucess: 12")
dst = raw_input("Please select a destination Adderss: ")
message = raw_input("Please type a message: ")
#################### Start Making packet#################
p_msg = message
message = message.encode("hex")
ln = (len(message)/2)
#print (ln)
ln_hex = (ln * 2)
message = list(message)
num_of_packets = ((ln/128) + 1)
#print (num_of_packets)
message = "".join(message)
src = hex(src)
dst = hex(int(dst))
#print (message)
print("\n########Number of packets = "+str(num_of_packets) + " ############\n\n")
for p in range (num_of_packets):
Ack_rx = 0
if( (p + 1) == (num_of_packets)):
lst = 1
else:
lst = 0
header_info = 0b00000000
if ((p % 2) > 0):
ns = 1
else:
ns = 0
header_info = (header_info | (ns << 5))
header_info = (header_info | (nr << 2))
header_info = (header_info | (rsv << 1))
header_info = (header_info | (lst))
header_info = hex(header_info)
#print (header_info)
op_code = '0x44'
if (lst == 1):
ln_packet = ((ln_hex - (p * 256)) % 256)
if (p > 0):
ln_packet = (ln_packet + 2)
else:
ln_packet = ln_packet
ln_packet = (ln_packet / 2)
# print (ln_packet)
# print()
else:
ln_packet = 128
# print(ln_packet)
# print()
#ll = (p * 128)
#print(ll)
#ul = ((ln - ll) % 128)
#print(ul)
#print (message[ll:ul])
if ((p == 0)&(ln_hex > 256)):
ll = (p * 255)
# print(ll)
payload_make = (message[ll:256])
# print (payload_make)
elif ((p > 0) & ((ln_hex - (p*256)) > 256)):
ll = (p * 256)
# print(ll)
ll = (ll - 2)
ul = (ll + 256)
# print (ul)
payload_make = (message[ll:ul])
# print(payload_make)
elif ((p > 0) & ((ln_hex - (p*256)) < 257)):
ll = (p * 256)
# print(ll)
ll = (ll - 2)
ul = ((ln_hex - ll) % 256)
ul = (ll + (ul))
ul = ul + 2
print()
print(ul)
print(ln_hex)
print(ln_packet)
print()
# print(ul)
payload_make = (message[ll:ul])
# print(payload)
elif ((p == 0) & (ln_hex < 257)):
ll = (p * 255)
ul = ln_hex
payload_make = (message[ll:ul])
print(payload_make)
packet_m = BitStream()
########################HEADER#########################
packet_m.append('0x0')
packet_m.append(src) #src
packet_m.append('0x0')
packet_m.append(dst) #dst
if(int(header_info,16) < 16):
packet_m.append('0x0')
packet_m.append(header_info) # Ns, Nr, RSV, Lst
packet_m.append(op_code) #op Code
#if(ln_packet < 16):
#packet_m.append('0x0')
packet_m.append((hex(ln_packet))) #Length
###################END OF HEADER#######################
packet_m.append(("0x"+payload_make)) #Payload
#packet_m.append(BitArray(p_msg)) #Payload
packet_m.append(crc) #CRC
#print()
#print(packet)
temp_ack = '0x00'
print(packet_m)
print(ln_packet)
while((Ack_rx == 0) & (packet_timeout <= 5)):
try:
###### Send the packet
#ser.write(chr(0x31))
str_pack = list(str(packet_m)[2:])
"".join(str_pack)
ser.write(chr(0x02))
#ser.write((str(packet_m)[2:]))
for i in range (len(str_pack)):
t = ord(str_pack[i])
ser.write(chr(t))
print(chr(t))
ser.write(chr(0x04))
ser.write(chr(0x10))
ack_packet = BitStream(ser.read())
if((len(ack_packet) > 3)):
temp_ack = ACK_parse(ack_packet)
else:
packet_timeout = (packet_timeout + 1)
print "why so serious\n\n"
if(temp_ack == '0x41'):
Ack_rx = 1
elif (temp_ack == '0x4E'):
Ack_rx = 0
else:
Ack_rx = 0
except serial.SerialTimeoutException: #if timeout occurs increment counter and resend last packet
Ack_rx = 0
packet_timeout = (packet_timeout + 1)
except serial.SerialException:
print "Error ... is not Active!!!", port
The output that is printed to the terminal is as follows when the destination and the message are both 1:
#######Number of packets = 1 #######
31
0x0a0101441310031
1
0
.... etc..
The micro on the other end of the serial link reads: 0a0101441310031
when it should read: a 1 1 44 1 31 0031
Python is sending each hex digit as a separate character rather than sending each value as one byte. When the values were appended to the packet, instead of being stored with the proper length and data type, each hex value seems to have been split across two 8-bit locations rather than one.
The section of Python code where I am reading from the micro works flawlessly when reading an acknowledgement packet. I have not tried it with data, but I don't think that will be an issue. The C side cannot read the ACK from the Python side, since it is separating the hex values into 2 chars rather than transmitting just the 8-bit value.
Any ideas? Thanks
Your exact problem is a bit vague, but I should be able to help with the bitstring portion of it.
You've probably got your payload to analyse as a str (or possibly bytes if you're using Python 3 but don't worry - it works the same way). If you haven't got that far then you're going to have to ask a more basic question. I'm going to make up some data to analyse (all this is being done with an interactive Python session):
>>> from bitstring import BitStream
>>> packet_data = '(2\x06D\x03\x124V\x03\xe8'
>>> b = BitStream(bytes=packet_data)
Now you can unpack or use reads on your BitStream to extract the things you need. For example:
>>> b.read('uint:8')
40
>>> b.read('uint:8')
50
>>> b.readlist('uint:3, uint:3')
[0, 1]
>>> b.readlist('2*bool')
[True, False]
>>> b.readlist('2*uint:8')
[68, 3]
>>> b.read('bytes:3')
'\x124V'
This is just parsing the bytes and interpreting chunks as unsigned integers, bools or bytes. Take a look at the manual for more details.
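For comparison, the same reads can be written with plain integer operations, which may help show what each format string is doing. This is a Python 3 sketch using only built-in types, not bitstring; parse_packet is a hypothetical name, the field layout is taken from the question, and the big-endian CRC byte order is an assumption.

```python
def parse_packet(data):
    """Split one raw packet (bytes) into its fields.

    Layout assumed from the question: source, destination, a flags byte
    holding NS(3) NR(3) RSV(1) LST(1), opcode, length, payload, 16-bit CRC.
    """
    src, dst, flags, opcode, length = data[:5]
    payload = data[5:5 + length]
    # CRC byte order is an assumption (most significant byte first)
    crc = int.from_bytes(data[5 + length:7 + length], "big")
    return {
        "src": src,
        "dst": dst,
        "ns": (flags >> 5) & 0x7,   # top three bits
        "nr": (flags >> 2) & 0x7,   # next three bits
        "rsv": bool(flags & 0x2),
        "lst": bool(flags & 0x1),
        "opcode": opcode,
        "length": length,
        "payload": payload,
        "crc": crc,
    }

fields = parse_packet(b"(2\x06D\x03\x124V\x03\xe8")
print(fields["src"], fields["dst"], fields["ns"], fields["nr"])  # 40 50 0 1
print(fields["payload"], hex(fields["crc"]))
```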
If you just want the payload, then you can just extract the length then grab it by slicing:
>>> length = b[32:40].uint
>>> b[40:40 + length*8]
BitStream('0x123456')
and if you want it back as a Python str, then use the bytes interpretation:
>>> b[40:40 + 3*8].bytes
'\x124V'
There are more advanced things you can do too, but a good way to get going in Python is often to open an interactive session and try some things out.
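On the sending side, the symptom in the question's EDIT (hex digits arriving as separate characters) is what happens when the text form of the packet is written to the port one character at a time. A minimal Python 3 sketch of assembling the raw bytes directly is shown below. It uses only built-in types; build_packet is a hypothetical name, the CRC is passed in rather than computed, and its big-endian byte order is an assumption.

```python
def build_packet(src, dst, ns, nr, rsv, lst, opcode, payload, crc):
    """Assemble one packet as raw bytes, one byte per 8-bit field."""
    if len(payload) > 128:
        raise ValueError("payload is limited to 128 bytes")
    # Pack NS(3) NR(3) RSV(1) LST(1) into a single flags byte
    flags = ((ns & 0x7) << 5) | ((nr & 0x7) << 2) | ((rsv & 0x1) << 1) | (lst & 0x1)
    header = bytes([src, dst, flags, opcode, len(payload)])
    # CRC byte order is an assumption (most significant byte first)
    return header + payload + crc.to_bytes(2, "big")

# Values from the question's example: source 10, destination 1, message "1"
packet = build_packet(10, 1, 0, 0, 0, 1, 0x44, b"1", 0x0031)
print(packet.hex())  # 0a01014401310031
```

Every field now occupies exactly one byte (two for the CRC), so the length byte is 01 rather than a lone hex digit. The resulting bytes object is the kind of value a serial write call expects, as opposed to the characters of a hex string.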
