Calculating the gap between pairs of twin primes in Python

I am making a program that generates a list of prime numbers, sorts them into twin prime pairs, and then works out which two neighbouring pairs of twin primes have the largest gap between them. My code already produces the list of twin prime pairs, but I am having a hard time figuring out the next part: how to calculate the largest gap between the pairs. Here is what I have so far:
def is_factor(n,f):
    '''
    Returns True if f is a factor of n,
    OTW returns False.
    Both n and f are ints.
    '''
    TV = (n % f == 0)
    return TV

def properFactorsOf(n):
    '''
    Returns a list of the proper factors
    of n. n is an int.
    f is a proper factor of n if:
    f is a factor of n
    f > 1 and f < n.
    '''
    L = []
    upper = n//2 # largest f to test
    for f in range(2,upper + 1):
        if is_factor(n,f):
            L.append(f)
    return L

def is_prime(n):
    '''
    Returns True if n is a prime,
    OTW returns False.
    n is an int.
    Use properFactorsOf(n) to check whether n is prime.
    '''
    TV = len(properFactorsOf(n)) == 0
    return TV

def LofPrimes(n):
    '''
    Returns a list of the first n primes.
    Uses is_prime(n).
    '''
    primes = [2,3]
    p = 5
    while len(primes) < n:
        if is_prime(p):
            primes.append(p)
        p += 2
    return primes
def twins():
    n = int(input("How many primes? ")) # int() is safer than eval() for user input
    L = LofPrimes(n) # list of the first n primes
    m = 1 # start at 1 so that L[m-1] is the zeroth position
    L3 = [] # This is the list of twins
    for i in range(len(L)-1): # keeps m in range
        tp = L[m]-L[m-1] # difference between neighbouring primes
        if tp == 2: # If the difference of the pair is 2
            L3.append([L[m-1],L[m]]) # add the twins to the list L3
        m += 1 # move on to the next neighbouring pair
    print(L3)
I feel like I am on the right path. Here is some pseudocode to show where my thinking is going:
assign a temp variable to m-1 on the first run,
assign a temp variable to m on the second run,
make a loop to go through the list of twins
take the difference of m-1 from the first set and m from the second set
in this loop calculate the max gap
return the greatest difference

Suppose we have a list of the pairs of primes, what you call L3 which could be like this:
L3 = [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61),
(71, 73), (101, 103), (107, 109), (137, 139)]
Then what we want to do is take the first element of a pair minus the last element of the previous pair.
We also want to accumulate a list of these values so later we can see at which index the maximum happens. The reduce function is good for this.
from functools import reduce  # reduce is not a builtin in Python 3

def helper_to_subtract_pairs(acc, x):
    return acc[:-1] + [x[0] - acc[-1]] + [x[1]]
Then printing
reduce(helper_to_subtract_pairs, L3, [0])
gives
[3, 0, 4, 4, 10, 10, 16, 10, 28, 4, 28, 139]
The first element happens because inside the call to reduce we use a starting value of [0] (so the first prime 3 leads to 3 - 0). We can ignore that. The final item, 139, represents the number that would be part of the subtraction if there was one more pair on the end. But since there is not, we can ignore that too:
In [336]: reduce(helper_to_subtract_pairs, L3, [0])[1:-1]
Out[336]: [0, 4, 4, 10, 10, 16, 10, 28, 4, 28]
Now, we want the index(es) where the max occurs. For this let's use the Python recipe for argmax:
def argmax(some_list):
    return max(enumerate(some_list), key=lambda x: x[1])[0]
Then we get:
In [338]: argmax(reduce(helper_to_subtract_pairs, L3, [0])[1:-1])
Out[338]: 7
telling us that the gap at index 7 is the biggest (or at least tied for the biggest). The 7th gap in this example was between (101, 103) and (71, 73) (remember Python is 0-based, so the 7th gap is really between pair 7 and pair 8).
So as a function of just the list L3, we can write:
def max_gap(prime_list):
    gaps = reduce(helper_to_subtract_pairs, prime_list, [0])[1:-1]
    max_idx = argmax(gaps)
    return gaps[max_idx], prime_list[max_idx:max_idx + 2]
The whole thing could look like:
from functools import reduce  # needed in Python 3

def argmax(some_list):
    return max(enumerate(some_list), key=lambda x: x[1])[0]

def helper_to_subtract_pairs(acc, x):
    return acc[:-1] + [x[0] - acc[-1]] + [x[1]]

def max_gap(prime_list):
    gaps = reduce(helper_to_subtract_pairs, prime_list, [0])[1:-1]
    max_idx = argmax(gaps)
    return gaps[max_idx], prime_list[max_idx:max_idx + 2]

L3 = [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61),
      (71, 73), (101, 103), (107, 109), (137, 139)]
print(max_gap(L3))
which prints
In [342]: print(max_gap(L3))
(28, [(71, 73), (101, 103)])
It's going to be more effort to modify argmax to return all occurrences of the max. If this is for investigating the asymptotic growth of the gap size, you won't need to, since any max will do to summarize a particular collection of primes.
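If every occurrence of the maximum is wanted after all, one way to sketch it is to skip reduce and zip the list of pairs against itself shifted by one. This is only an illustration: the names twin_gaps and max_gaps are made up here, and the gap is measured as in the answer above, from the larger prime of one pair to the smaller prime of the next.

```python
def twin_gaps(pairs):
    """Gap from the larger prime of each pair to the smaller prime of the next."""
    return [nxt[0] - prev[1] for prev, nxt in zip(pairs, pairs[1:])]

def max_gaps(pairs):
    """Return (largest gap, every pair of neighbours that reaches it)."""
    gaps = twin_gaps(pairs)
    if not gaps:
        raise ValueError("need at least two twin-prime pairs")
    biggest = max(gaps)
    # Collect all ties, not just the first index where the maximum occurs.
    ties = [(pairs[i], pairs[i + 1]) for i, g in enumerate(gaps) if g == biggest]
    return biggest, ties

L3 = [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61),
      (71, 73), (101, 103), (107, 109), (137, 139)]
biggest, ties = max_gaps(L3)
print(biggest, ties)
```

For the sample list this reports a gap of 28 twice: between (71, 73) and (101, 103), and between (107, 109) and (137, 139).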

Related

How to compare a pair of values in python, to see if the next value in pair is greater than the previous?

I have the following list:
Each pair of values gives me information about a specific row. I want to compare the value in the first position with the next one and see if the next value is less than the current value. If it is, I keep the pair; if not, I delete that pair. For example, for indices 0 and 1, comparing 29 to 25, I see that 25 is less than 29, so I keep the pair. I then add two to the current index, taking me to 16; here I see that 19 is not less than 16, so I delete the pair (16, 19). I have the following code:
curr = 0
skip = 0
finapS = []
while curr < len(apS):
    if distance1[apS[skip+1]] < distance1[apS[skip]]:
        print("its less than prev")
        print(curr,skip)
        finapS.append(distance1[apS[skip]])
        finapS.append(distance1[apS[skip+1]])
        skip = skip + 2
        curr = curr + 1
        print("itterated,", skip, curr)
distance1 is a list of data points. apS is a list that contains the indices of the important values in distance1. distance1 has all the values, but I need only the ones at the indices in apS, and I need to check whether each pair of those values is in descending order. The code I tried gives me an infinite loop and I can't understand why. Here I am adding the values to a new list, but if possible I would like to just delete the offending pairs and keep the original list.
I think this kind of logic is more easily done using a generator. You can loop through the data and only yield values if they meet your condition. e.g.
def filter_pairs(data):
    try:
        it = iter(data)
        while True:
            a, b = next(it), next(it)
            if b < a:
                yield from (a, b)
    except StopIteration:
        pass
Example usage:
>>> aps = [1, 2, 3, 1, 2, 4, 6, 5]
>>> finaps = list(filter_pairs(aps))
>>> finaps
[3, 1, 6, 5]
So it looks like you want a new list. Therefore:
apS = [29.12, 25.01, 16.39, 19.49, 14.24, 12.06]
apS_new = []
for x, y in zip(apS[::2], apS[1::2]):
    if x > y:
        apS_new.extend([x, y])
print(apS_new)
Output:
[29.12, 25.01, 14.24, 12.06]
Pure Python speaking, I think zip is the elegant way of doing this, combined with slice steps.
Assuming your list is defined as::
>>> a = [29, 25, 16, 19, 14, 12, 22, 8, 26, 25, 26]
You can zip the list into itself with a shift of one, and a slice step of two::
>>> list(zip(a[:-1:2], a[1::2]))
[(29, 25), (16, 19), (14, 12), (22, 8), (26, 25)]
Once you have that, you can then filter the sequence down to the items you want, your full solution will be::
>>> list((x, y) for (x, y) in zip(a[:-1:2], a[1::2]) if x > y)
[(29, 25), (14, 12), (22, 8), (26, 25)]
If you prefer to go the numpy path, then read about the np.roll function (NumPy has no np.shift).
If your test is false, you loop without incrementing the counter curr.
You need an
else:
    curr+=1
(or +=2 according to the logic)
to progress through the list.
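A minimal sketch of the loop with that fix applied might look like the following. The contents of distance1 and apS are invented here for illustration (the question does not show them), and the single counter curr replaces the separate skip variable.

```python
# Hypothetical data: distance1 holds every value, apS the indices of interest.
distance1 = [29.12, 0.0, 25.01, 16.39, 0.0, 19.49, 14.24, 12.06]
apS = [0, 2, 3, 5, 6, 7]

finapS = []
curr = 0
# Stop one short of the end so that apS[curr + 1] always exists.
while curr + 1 < len(apS):
    first = distance1[apS[curr]]
    second = distance1[apS[curr + 1]]
    if second < first:
        finapS.extend([first, second])
    curr += 2  # advance on every iteration, whether or not the pair was kept

print(finapS)
```

Because the counter moves forward on every pass, the loop terminates even when a pair fails the test.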

Iterate Over Multiple Start-Stop Values in Python

Let's say I have a number, l, and I'd like to split it up into roughly equal n chunks. For example:
l = 11
n = 3
step = 1 + l // n
for start in range(0, l, step):
    stop = min(l, start+step)
    print(start, stop)
In this case, the first chunk (chunk 0) goes from 0 to 4 (4 elements: 0 through 3), the next chunk (chunk 1) goes from 4 to 8 (4 elements), and the last chunk (chunk 2) is slightly smaller and goes from 8 to 11 (3 elements). Of course, the values of l and n may vary but both values will always be positive integers and n will always be smaller than l.
What I need to do is to generate a list that will iterate through each chunk in a round-robin fashion and append some chunk information to a list. The list should contain a tuple of the chunk number (i.e., 0, 1, or 2) and the next available start value in that chunk (until that chunk is exhausted as controlled by the stop value). So, the output list would be:
[(0,0), (1,4), (2,8), (0,1), (1,5), (2,9), (0,2), (1,6), (2,10), (0,3), (1,7)]
Note that the last chunk has one less element than the first two chunks. Whatever the solution is, it needs to work for any l and n (as long as both values are positive integers and n is always smaller than l). For simplicity, you can assume that l will be less than 100,000,000.
What is the best way to generate this list?
Use two loops for the two levels of your problem. The outer loop runs the starting point through all numbers in range(step). From there, use that value as the starting point for the inner loop you already wrote. Note that you have to adjust your output: you're printing (start, stop) values, when your requested output has (chunk#, start) values.
Can you take it from there?
One possible solution, using generators:
from itertools import islice, zip_longest, cycle

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

def generate(l, n):
    c, step = cycle(range(n)), l // n + (l % n != 0)
    yield from ((next(c), v) for vals in zip_longest(*chunk(range(l), step)) for v in vals if v is not None)

l = 11
n = 3
out = [*generate(l, n)]
print(out)
Prints:
[(0, 0), (1, 4), (2, 8), (0, 1), (1, 5), (2, 9), (0, 2), (1, 6), (2, 10), (0, 3), (1, 7)]
For:
l = 9
n = 3
The output is:
[(0, 0), (1, 3), (2, 6), (0, 1), (1, 4), (2, 7), (0, 2), (1, 5), (2, 8)]
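One caveat with the generator above: the cycle only advances for values that are not None, so the chunk numbers can drift when the last chunk is short by more than one element (l = 10, n = 3 appears to be such a case). The two-loop idea suggested earlier avoids this by taking the chunk number from the position of the chunk itself. The sketch below is one possible reading of that suggestion; the name round_robin_chunks is made up, and step uses the same ceiling division as the generator answer.

```python
def round_robin_chunks(l, n):
    """List (chunk number, value) pairs, visiting the chunks in round-robin order."""
    step = l // n + (l % n != 0)   # ceiling division, as in the generator answer
    starts = range(0, l, step)     # first value of each chunk
    out = []
    for offset in range(step):                      # position inside a chunk
        for chunk_no, start in enumerate(starts):   # which chunk
            value = start + offset
            if value < min(l, start + step):        # skip exhausted chunks
                out.append((chunk_no, value))
    return out

print(round_robin_chunks(11, 3))
print(round_robin_chunks(10, 3))
```

For l = 11 and n = 3 this gives the list requested in the question. It builds the whole list in memory, which may matter for l close to the stated upper bound.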

Group continuous numbers in a tuple with tolerance range

If I have a list of number tuples:
locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
I want to group them with a tolerance range of +/- 3.
aFunc(locSet)
which returns
[(62.5, 121.0), (144.0, 41.5)]
I have seen "Identify groups of continuous numbers in a list", but that is for continuous integers.
If I have understood correctly, you are looking for tuples whose values differ by an absolute amount that is in the tolerance range: [0, 1, 2, 3]
Assuming this, my solution returns a list of lists, where every internal list contains tuples that satisfy the condition.
def aFunc(locSet):
    # Sort the list.
    locSet = sorted(locSet,key=lambda x: x[0]+x[1])
    toleranceRange = 3
    resultLst = []
    for i in range(len(locSet)):
        sum1 = locSet[i][0] + locSet[i][1]
        tempLst = [locSet[i]]
        for j in range(i+1,len(locSet)):
            sum2 = locSet[j][0] + locSet[j][1]
            if (abs(sum1-sum2) in range(toleranceRange+1)):
                tempLst.append(locSet[j])
        if (len(tempLst) > 1):
            for lst in resultLst:
                if (list(set(tempLst) - set(lst)) == []):
                    # This solution is part of a previous solution.
                    # Doesn't include it.
                    break
            else:
                # Valid solution.
                resultLst.append(tempLst)
    return resultLst
Here are two usage examples:
locSet1 = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
locSet2 = [(10, 20), (12, 20), (13, 20), (14, 20)]
print(aFunc(locSet1))
[[(62.5, 121.0), (144.0, 41.5)]]
print(aFunc(locSet2))
[[(10, 20), (12, 20), (13, 20)], [(12, 20), (13, 20), (14, 20)]]
I hope to have been of help.
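Note that abs(sum1-sum2) in range(toleranceRange+1) is only true when the difference is exactly 0, 1, 2 or 3, so fractional differences such as 0.5 never match. If the goal is instead the output shown in the question, one representative per cluster of nearby points, a sketch under a different assumption could look like this: a point joins a group when both of its coordinates are within the tolerance of that group's first point. The name group_representatives and that reading of "+/- 3" are assumptions, not something the question states.

```python
def group_representatives(points, tolerance=3):
    """Keep the first point of each group of nearby points.

    Assumption: a point belongs to a group when both coordinates are
    within `tolerance` of the group's first point.
    """
    representatives = []
    for x, y in points:
        for rx, ry in representatives:
            if abs(x - rx) <= tolerance and abs(y - ry) <= tolerance:
                break  # close enough to an existing group
        else:
            representatives.append((x, y))  # starts a new group
    return representatives

locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),
          (63.000003814697266, 121.00001525878906), (144.0, 41.5)]
print(group_representatives(locSet))
```

With the sample data the first four points fall into one group and (144.0, 41.5) starts a second one.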

Out of range index

I am trying to make a program that takes the digits of number and searches for a run of sequence_len digits that sums to 10.
As soon as it finds a 10, it should stop.
1. With this code I get an error. What should I do?
total=total+(list_n[i+n])
IndexError: list index out of range
2. I want the first for loop to stop once I find a sum of ten. Is it right to "break" at the end as I did, or should I write i=len(list_n)?
number = 1234
sequence_len = 2
list_n=[]
total=0
b="false"
list_t=[]
for j in str(number):
    list_n.append(int(j))
c=len(list_n)
for i in list_n:
    n=0
    while n<sequence_len:
        total=total+(list_n[i+n])
        n=n+1
    if total==10:
        b=true
        seq=0
        while seq>sequence_len:
            list_t.append(list_t[i+seq])
            seq=seq+1
        break
    else:
        total=0
    if b=="true":
        break
if b=="false":
    print "Didn't find any sequence of size", sequence_len
else:
    print "Found a sequence of size", sequence_len ,":", list_t
You have a couple of errors. First with the basic:
b=true
This needs to be True, otherwise Python will look for a variable named true.
Secondly, i actually contains the value of the list element for that iteration of the loop, not its position. For example:
>>> l = ['a', 'b', 'c']
>>> for i in l: print(i)
a
b
c
Because of this, you cannot use it as an index: here the elements happen to be integers, but they are the digits themselves rather than positions, so i + n can run past the end of the list. What you need is enumerate, which generates a tuple of both the index and the value, so something like:
for i, var in enumerate(list_n):
n = 0
An example of enumerate in action:
>>> var = enumerate([1,6,5,32,1])
>>> for x in var: print(x)
(0, 1)
(1, 6)
(2, 5)
(3, 32)
(4, 1)
I believe this statement also has a logical problem:
total = total + (list_n[i + n - 1])
If you want to get a sum of 10 from a list of numbers, you can use this brute-force technique:
>>> list_of_n = [1,0,5,4,2,1,2,3,4,5,6,8,2,7]
>>> from itertools import combinations
>>> [var for var in combinations(list_of_n, 2) if sum(var) == 10]
[(5, 5), (4, 6), (2, 8), (2, 8), (3, 7), (4, 6), (8, 2)]
So, if you want a 10 from 3 numbers in the list, you would put combinations(list_of_n, 3) instead of combinations(list_of_n, 2).
When you say
for i in list_n:
i will not refer to the indices, but to the list elements themselves. If you want just the indices,
for i in range(len(list_n)):
len(list_n) will give you the size of the list and range(len(list_n)) will give you a range of numbers starting from 0 and ending with len(list_n) - 1
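Putting the index-based loop together with a slice gives a short way to look for a run of neighbouring digits with the wanted sum. This is a sketch under the assumption that the digits must be consecutive, as the original loop suggests; the function name find_run and the target parameter are invented for the example.

```python
def find_run(number, sequence_len, target=10):
    """Return the first run of sequence_len consecutive digits that sums to target."""
    digits = [int(ch) for ch in str(number)]
    # The last valid start index leaves room for a full window, so
    # digits[start:start + sequence_len] never runs past the end.
    for start in range(len(digits) - sequence_len + 1):
        window = digits[start:start + sequence_len]
        if sum(window) == target:
            return window  # returning ends the search, no flag variable needed
    return None

result = find_run(12374, 2)
if result is None:
    print("Didn't find any sequence of size", 2)
else:
    print("Found a sequence of size", 2, ":", result)
```

Returning from the function as soon as a match is found replaces both the break and the "true"/"false" string flag.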

finding n largest differences between two lists

I have two lists old and new, with the same number of elements.
I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.
I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?
Whenever you think "n largest", think heapq.
>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]
This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.
It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:
>>> heapq.nlargest(10, range(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]
Assuming the number of elements in the lists aren't huge, you could just difference all of them, sort, and pick the first n:
print(sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n])
This would be O(k log k) where k is the length of your original lists.
If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module:
import heapq
print(heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))))
This will be O(k log n) instead of O(k log k) which can be significant for k >> n.
Also, if your lists are really big and you are on Python 2, you'd probably be better off using itertools.izip instead of the regular zip function; in Python 3, zip is already lazy.
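Since the question asks for indices rather than the differences themselves, the heapq idea can be wrapped in a small function. This is a sketch for Python 3; the name n_largest_diff_indices is made up, and it assumes absolute differences are what is wanted.

```python
import heapq

def n_largest_diff_indices(old, new, n):
    """Indices of the n positions where old and new differ the most."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    # Rank the indices themselves, using the absolute difference as the key.
    return heapq.nlargest(n, range(len(old)), key=lambda i: abs(old[i] - new[i]))

old = [15, 2, 123, 4, 50]
new = [9, 8, 7, 6, 5]
print(n_largest_diff_indices(old, new, 2))
```

Here the differences are 6, 6, 116, 2 and 45, so the two largest sit at indices 2 and 4.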
From your question i think this is what you want:
In difference.py
l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]
l3 = list(zip(l1, l2))  # list() so it can be sliced and searched in Python 3

def f(n):
    diff_val = 0
    index_val = 0
    l4 = l3[:n]
    for x,y in l4:
        if diff_val < abs(x-y):
            diff_val = abs(x-y)
            elem = (x, y)
            index_val = l3.index(elem)
    print("largest diff: ", diff_val)
    print("index of values:", index_val)

n = int(input("Enter value of n:"))
f(n)
Execution:
[avasal#avasal ]# python difference.py
Enter value of n:4
largest diff: 116
index of values: 2
[avasal#avasal]#
If this is not what you want, consider elaborating on the question a little more.
>>> import itertools
>>> l = []
>>> for i in itertools.starmap(lambda x, y: abs(x-y), zip([1,2,3], [100,102,330])):
...     l.append(i)
...
>>> l
[99, 100, 327]
itertools comes in handy for repetitive tasks. starmap unpacks each tuple into the function's arguments (*args). With the max function you can get the largest difference, and the list's index method will find its position:
>>> l.index(max(l))
2
Here's a solution hacked together in numpy (disclaimer: I'm a novice in numpy, so there may be even slicker ways to do this). I didn't combine any of the steps, so it is very clear what each step is doing. The final value is an array of the indexes of the original lists in order of the highest delta. Picking the top n is simply sorted_ind[:n], and retrieving the values from each list or from the delta list is trivial.
I don't know how it compares in performance to the other solutions and it's obviously not going to show up with such a small data set, but it might be worth testing with your real data set as my understanding is that numpy is very very fast for numerical array operations.
Code
import numpy
list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])
#Calculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))
#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))
#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))
Output
Delta: [8 6 4 2 0 2 4 6 8]
Sorted indexes: [4 3 5 2 6 1 7 0 8]
Reverse sort: [8 0 7 1 6 2 5 3 4]
