Eratosthenes sieve optimization [duplicate] - python

This question already has answers here:
A Fast Prime Number Sieve in Python
(2 answers)
Closed 9 years ago.
I wrote the Eratosthenes algorithm in Python a few weeks ago, and it looked like the following:
import math

def erathostenes(n):
    A = range(2, n + 1)
    B = []
    i = 0
    while A[i] < math.sqrt(n):
        B.append(A[i])
        j = i
        aux = A[i]
        while j < len(A):
            if A[j] % aux == 0:
                A[j] = 0
            j += aux
        i += 1
        while A[i] == 0:
            i += 1
    for i in range(len(A)):
        if A[i] != 0:
            B.append(A[i])
        i += 1
    return B
After thinking a little (I'm a noob in programming), I made some modifications to my algorithm, and right now it looks like this:
def erathostenes(n):
    A = range(2, n + 1)
    B = []
    i = 0
    raiz = math.sqrt(n)
    lenA = len(A)
    rangeLenA = range(lenA)
    while A[i] < raiz:
        B.append(A[i])
        j = i
        aux = A[i]
        while j < lenA:
            A[j] = 0
            j += aux
        i += 1
        while A[i] == 0:
            i += 1
    for i in rangeLenA:
        if A[i] != 0:
            B.append(A[i])
        i += 1
    return B
If I execute the algorithm with n = 10,000,000, the execution time of the first version is approximately 7 seconds, and the second version takes about 4 seconds.
Any ideas for further optimizations to my algorithm? Thanks!

The i += 1 in the last loop is funny; the for loop already advances i on every iteration, so that line does nothing.
Consider replacing
for i in rangeLenA:
with
for i in xrange(lenA):
so you avoid generating a huge list you don't need.
EDIT:
Also consider this:
for j in xrange(i, lenA, aux):
instead of:
while j < lenA:
And fix the bug by using
while A[i] <= raiz:
as suggested by fryday.
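Putting those suggestions together, the whole function might look like this sketch (Python 2, as in the question; not benchmarked against the original timings):
import math

def erathostenes(n):
    A = range(2, n + 1)
    B = []
    i = 0
    raiz = math.sqrt(n)
    lenA = len(A)
    while A[i] <= raiz:
        B.append(A[i])
        aux = A[i]
        for j in xrange(i, lenA, aux):  # cross out every multiple of aux (including aux itself)
            A[j] = 0
        i += 1
        while A[i] == 0:                # skip to the next number not yet crossed out
            i += 1
    for i in xrange(lenA):              # collect the survivors above sqrt(n)
        if A[i] != 0:
            B.append(A[i])
    return B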

There is an error in your code. Change
while A[i] < raiz:
to
while A[i] <= raiz:
You can see the error when n is a perfect square.
For optimization, use xrange instead of range for rangeLenA.
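For example, with the function from the question:
print erathostenes(49)
The output wrongly contains 49: because 7 < sqrt(49) is False, the multiples of 7 are never crossed out; with <= the loop also sieves 7, and 49 is removed.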

Tried to make a non-loop version just for fun. It came out like this:
def erathostenes(n):
    def helper_function(num_lst, acc):
        if not num_lst:
            return acc
        if len(num_lst) == 1:
            acc.append(num_lst[0])
            return acc
        num = num_lst.pop(0)
        multiples = [x for x in range(num + 1, num_lst[-1] + 1)
                     if x % num == 0]
        remains = [x for x in num_lst if x not in multiples]
        acc.append(num)
        return helper_function(remains, acc)
    return helper_function(range(2, n + 1), [])
When I ran the timing, I got 826 us for the posted erathostenes(1000) and 26 ms for my version (!!). It surprised me that it was so slow.
Functional programming is more fun, but it looks like it isn't the right fit for this problem in Python (my guess is that it would be faster in a more functional language).
So I tried an imperative version. It looks like this:
import math

def erathostenes_imperative(n):
    limit = int(math.sqrt(n))
    def helper_function(flags, size):
        for i in range(2, limit):
            if flags[i] == True:
                j = 2*i
                while j < size:
                    if j % i == 0:
                        flags[j] = False
                    j = j + i
        return [x for x in range(2, n + 1) if flags[x]]
    return helper_function([True]*(n + 1), n)
What I did was change the list of integers into a list of True/False flags. Intuitively, that seems like it should be faster to iterate over, right?
My results were 831 ms for erathostenes_imperative(100000), vs. 1.45 s for your version.
It's a shame that the imperative version is faster. The code looks so messy with all the fors, whiles, i's and j's.

Try the Sieve of Atkin. It's a modification of the Sieve of Eratosthenes that filters out all multiples of 2, 3, and 5 right off, along with a few other optimizations. You might also want to find a tool that tells you the run time of each operation and modify the operations with a larger run time.
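Python's built-in cProfile module is one such tool; a minimal sketch, assuming the erathostenes function from the question is in scope and using a smaller example size (it reports per-function rather than per-operation timings, but it shows where the time goes):
import cProfile

# Profile one call and sort the report by cumulative time.
cProfile.run('erathostenes(1000000)', sort='cumulative')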
However, since you're new to programming, you might be best served by either implementing other algorithms or doing other programming exercises to improve.

Related

Looking for ways to fix algorithm. Implementing combination sum II using dynamic programming

Basically, I am trying to implement combination sum II in Python using dynamic programming. My goal was to create a program that does not have a time complexity of O(2^n), but I have been having a lot of issues and am unable to find solutions anywhere. The following is the code I have gotten so far, but it does not seem to give any output.
Expected output: [1,2,3], [1,5], [2,4]
Actual output: literally nothing
arr = [1,2,3,4,5]

def combinationSum(candidates, target):
    counts = [0] * (target + 1)
    for elem in candidates:
        if elem <= target:
            counts[elem] += 1
    numbers = []
    a = 1
    while a <= target:
        if counts[a] != 0:
            numbers.append(a)
        a += 1
    subsets = [[]] * (target + 1)
    smallTarget = numbers[0]
    while smallTarget <= target:
        subset = []
        for i in numbers:
            if i > smallTarget:
                break
            if (((i == smallTarget) or (i <= smallTarget/2)) != True):
                continue
            mList = subsets[smallTarget - i]
            for j in mList:
                if len(j) == 0 or j[0] >= i:
                    count = counts[number]
                    for k in j:
                        if k == i:
                            count -= 1
                    if count != 0:
                        tList = []
                        tList.append(i)
                        for l in j:
                            tList.append(l)
                        subset.add(tList)
        subsets[smallTarget] = subset
        smallTarget += 1
    return subsets[target]

for i in combinationSum(arr, 6):
    print(i)
To fix your code not printing answers, add an empty subset summing to 0 before your loop:
subsets[0].append([])
after subsets is created.
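Concretely, the seeding goes right after the table is built; a sketch (the list-comprehension initialisation and the notes are my own additions, not part of the question's code):
subsets = [[] for _ in range(target + 1)]  # a fresh list per entry; [[]] * (target + 1) makes every slot alias one list
subsets[0].append([])                      # the empty subset sums to 0 and seeds the DP
# Note: counts[number] further down uses an undefined name (probably counts[i]),
# and subset.add(tList) should be subset.append(tList); those also need fixing
# before the function can print the expected combinations.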
There are, however, several problems with this code, from variable names to repeated work. Take a look at several other approaches to the subset sum problem or just google "Subset sum" to see many existing solutions to your problem.

Time complexity of merge sort: function appears to be called 2*n-1 times rather than O(log n) times

I'm teaching a coding class and need an intuitive and obvious way to explain the time complexity of merge sort. I tried including a print statement at the start of my merge_sort() function, anticipating that the print statement would execute O(log n) times. However, as best as I can tell, it executes 2*n-1 times instead (Python code below):
merge_sort() function:
def merge_sort(my_list):
    print("hi") # prints 2*n-1 times??
    if(len(my_list) <= 1):
        return
    mid = len(my_list)//2
    l = my_list[:mid]
    r = my_list[mid:]
    merge_sort(l)
    merge_sort(r)
    i = 0
    j = 0
    k = 0
    while(i < len(l) or j < len(r)):
        #print("hey") # prints nlogn times as expected
        if(i >= len(l)):
            my_list[k] = r[j]
            j += 1
        elif(j >= len(r)):
            my_list[k] = l[i]
            i += 1
        elif(l[i] < r[j]):
            my_list[k] = l[i]
            i += 1
        elif(l[i] > r[j]):
            my_list[k] = r[j]
            j += 1
        k += 1
Driver code:
#print("Enter a list")
my_list = list(map(int, input().split()))
#print("Sorted list:")
merge_sort(my_list)
print(my_list)
Input:
1 2 3 4 5 6 7 8
Expected output:
hi
hi
hi
or some variation thereof which varies proportional to log n.
Actual output:
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi
hi #15 times, i.e. 2*n-1
A few more iterations of this with different input sizes have given me the impression that this is 2*n-1, which makes no sense to me. Does anyone have an explanation for this?
It is not true that there are only O(log n) recursive calls. The thing that is O(log n) is the depth of the recursion tree, not the number of nodes in the recursion tree. For n elements (with n a power of two), the tree has n leaf calls (the single-element sublists) and n-1 internal calls, i.e. 2*n-1 calls in total, which is exactly the count you observed.
When we look at one level of the recursion tree, we can note that each call in that level deals with a distinct partition of the array. Together, the "nodes" in that level deal with all elements of the array, which gives that level a O(n) time complexity. This is true for each level.
As there are O(log n) levels, the total complexity comes down to O(n log n).
Here is a suggestion on how to illustrate this:
statistics = []

def merge_sort(my_list, depth=0):
    if len(my_list) <= 1:
        return
    # manage statistics
    if depth >= len(statistics):
        statistics.append(0)  # for each depth we count operations
    mid = len(my_list)//2
    l = my_list[:mid]
    r = my_list[mid:]
    merge_sort(l, depth+1)
    merge_sort(r, depth+1)
    i = 0
    j = 0
    k = 0
    while i < len(l) or j < len(r):
        statistics[depth] += 1  # count this as a O(1) unit of work
        if i >= len(l):
            my_list[k] = r[j]
            j += 1
        elif j >= len(r):
            my_list[k] = l[i]
            i += 1
        elif l[i] < r[j]:
            my_list[k] = l[i]
            i += 1
        elif l[i] > r[j]:
            my_list[k] = r[j]
            j += 1
        k += 1

import random

my_list = list(range(32))
random.shuffle(my_list)
merge_sort(my_list)
print(my_list)
print(statistics)
The statistics list will show the number of units of work done at each level. For an input of size 32 you'll get a list with 5 such numbers, 32 at each level, i.e. n·log2(n) = 160 units in total.
NB: In Python, if conditions don't need parentheses

ZigZag Quadruples

I've seen this interesting question, and wonder if there are more ways to approach it:
Given a permutation of the numbers from 1 to n, count the number of quadruples of indices (i,j,k,l) such that i<j<k<l and A[i]<A[k]<A[j]<A[l]
e.g.
Input : [1,3,2,6,5,4]
Output : 1 (1,3,2,6)
Desired algorithm is O(n^2)
Approach:
I've tried to solve it using a stack, in a similar manner to Leetcode 132 Pattern, but it seems to fail.
def get_smaller_before(A):
    smaller_before = [0] * len(A)
    for i in range(len(A)):
        for j in range(i):
            if A[j] < A[i]:
                smaller_before[i] += 1
    return smaller_before

def get_larger_after(A):
    larger_after = [0] * len(A)
    for i in range(len(A)):
        for j in range(i+1, len(A)):
            if A[i] < A[j]:
                larger_after[i] += 1
    return larger_after

def countQuadrples(nums):
    if not nums:
        return False
    smaller_before = get_smaller_before(nums)
    larger_after = get_larger_after(nums)
    counter = 0
    stack = []
    for j in reversed(range(1, len(nums))):
        # i < j < k < l
        # smaller_before < nums[k] < nums[j] < larger_after
        while stack and nums[stack[-1]] < nums[j]:
            counter += smaller_before[j] * larger_after[stack[-1]]
            stack.pop()
        stack.append(j)
    return counter
Does anyone have a better idea?
What you need is some sort of 2-dimensional tree that allows you to quickly answer the questions "How many points after k have a value bigger than A[j]?" and "How many points before j have a value less than A[k]?" Such a structure will usually take O(n log(n)) time to build, and those queries should run in something like O(log(n)^2) time.
A number of such data structures exist. One option is a variant on a quadtree. You turn each array element into a point whose x-coordinate is its position in the array and whose y-coordinate is its value. Your queries are then just counting how many elements fall in a box.
And now you can do a double loop over all j, k and count how many zig-zag quadruples have those as the inner pair.
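Building on that double loop, here is a rough O(n^2) sketch that swaps the quadtree for plain 2-D prefix-count tables (my own substitution; it trades O(n^2) memory for simpler code, and count_zigzag_quadruples is a made-up name):
def count_zigzag_quadruples(A):
    # Count (i, j, k, l) with i < j < k < l and A[i] < A[k] < A[j] < A[l],
    # where A is a permutation of 1..n.
    n = len(A)
    # less_before[j][v]: number of indices i < j with A[i] < v
    less_before = [[0] * (n + 2) for _ in range(n)]
    for j in range(1, n):
        for v in range(n + 2):
            less_before[j][v] = less_before[j - 1][v] + (1 if A[j - 1] < v else 0)
    # greater_after[k][v]: number of indices l > k with A[l] > v
    greater_after = [[0] * (n + 2) for _ in range(n)]
    for k in range(n - 2, -1, -1):
        for v in range(n + 2):
            greater_after[k][v] = greater_after[k + 1][v] + (1 if A[k + 1] > v else 0)
    # For every inner pair (j, k) with A[k] < A[j], multiply the number of
    # valid i's on the left by the number of valid l's on the right.
    total = 0
    for j in range(n):
        for k in range(j + 1, n):
            if A[k] < A[j]:
                total += less_before[j][A[k]] * greater_after[k][A[j]]
    return total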

Is this most efficient to bubble sort a list in python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm trying to see if this is the most efficient way to write a bubble sort in Python, or if there are better ways. Some people tell me to use two loops; what are the benefits of doing it like that vs. the code below?
def sort_bubble(blist):
    n = 0
    while n < len(blist) - 1:
        if blist[n] > blist[n + 1]:
            n1 = blist[n]
            n2 = blist[n + 1]
            blist[n] = n2
            blist[n + 1] = n1
            n = 0
        else:
            n = n + 1
    print blist
Your algorithm is technically a bubble sort in that it does exactly the swaps that it should. However, it's a very inefficient bubble sort, in that it does a lot more compares than are necessary.
How can you know that? It's pretty easy to instrument your code to count the number of compares and swaps. And meanwhile, Wikipedia gives implementations of a simple bubble sort, and one with the skip-sorted-tail optimization, in a pseudocode language that's pretty easy to port to Python and similarly instrument. I'll show the code at the bottom.
For a perfect bubble sort, given a random list of length 100, you should expect a bit under 10000 compares (100 * 100), and a bit under 2500 swaps. And the Wikipedia implementation does exactly that. The "skip-sorted-tail" version should have just over half as many compares, and it does.
Yours, however, has 10x as many compares as it should. The reason your code is inefficient is that it starts over at the beginning over and over, instead of starting where it swapped whenever possible. This causes an extra factor of O(sqrt(N)).
Meanwhile, almost any sort algorithm is better than bubble sort for almost any input, so even an efficient bubble sort is not an efficient sort.
I've made one minor change to your code: replacing the four-line swap with a more idiomatic single-line swap. Otherwise, nothing is changed but adding the cmpcount and swapcount variables, and returning the result instead of printing it.
def bogo_bubble(blist):
    cmpcount, swapcount = 0, 0
    n = 0
    while n < len(blist) - 1:
        cmpcount += 1
        if blist[n] > blist[n + 1]:
            swapcount += 1
            blist[n], blist[n+1] = blist[n+1], blist[n]
            n = 0
        else:
            n = n + 1
    return blist, cmpcount, swapcount
This is the pseudocode implementation from Wikipedia, translated to Python. I had to replace the repeat… until with a while True… if not …: break, but everything else is trivial.
def wp1_bubble(blist):
    cmpcount, swapcount = 0, 0
    while True:
        swapped = False
        for i in range(1, len(blist)):
            cmpcount += 1
            if blist[i-1] > blist[i]:
                swapcount += 1
                blist[i-1], blist[i] = blist[i], blist[i-1]
                swapped = True
        if not swapped:
            break
    return blist, cmpcount, swapcount
This is the Optimizing bubble sort, which does the simple version of the skip-sorted-tail optimization, but not the more elaborate version (which comes right after it).
def wp2_bubble(blist):
    cmpcount, swapcount = 0, 0
    n = len(blist)
    while True:
        swapped = False
        for i in range(1, n):
            cmpcount += 1
            if blist[i-1] > blist[i]:
                swapcount += 1
                blist[i-1], blist[i] = blist[i], blist[i-1]
                swapped = True
        n -= 1
        if not swapped:
            break
    return blist, cmpcount, swapcount

import random
alist = [random.randrange(100) for _ in range(100)]

bb, cb, sb = bogo_bubble(alist[:])
b1, c1, s1 = wp1_bubble(alist[:])
b2, c2, s2 = wp2_bubble(alist[:])
assert bb == b1 == b2

print('bogo_bubble: {} cmp, {} swap'.format(cb, sb))
print('wp1_bubble : {} cmp, {} swap'.format(c1, s1))
print('wp2_bubble : {} cmp, {} swap'.format(c2, s2))
Typical output:
bogo_bubble: 100619 cmp, 2250 swap
wp1_bubble : 8811 cmp, 2250 swap
wp2_bubble : 4895 cmp, 2250 swap
This is how I would do it if I were forced to use bubble sort. You should probably always just use the default sort() function in Python; it's very fast.
def BubbleSort(A):
    end = len(A) - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(0, end):
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i]
                swapped = True
        end -= 1
It's basically regular bubblesort but instead of traversing the entire list every time it only traverses up to the last swapped value, by definition any value past that is already in place.
Also, you do not need to use temp values in Python to swap; the pythonic way to do this is:
a, b = b, a
You could test it out yourself. Other things remaining the same, just counting the number of iterations will give you an idea of what is faster. Here is what I wrote:
def sort_bubble(blist):
    ops = 0
    n = 0
    while n < len(blist) - 1:
        if blist[n] > blist[n + 1]:
            n1 = blist[n]
            n2 = blist[n + 1]
            blist[n] = n2
            blist[n + 1] = n1
            n = 0
        else:
            n = n + 1
        ops += 1
    print ops
    print blist

def bubbleSort(list):
    ops = 0
    for i in range(len(list)):
        for j in range(i):
            if list[i] < list[j]:
                list[i], list[j] = list[j], list[i]
            ops += 1
    print ops
    return list

sort_bubble([6, 5, 3, 1, 8, 7, 2, 4])
print bubbleSort([6, 5, 3, 1, 8, 7, 2, 4])

How can I get this Python code to run more quickly? [Project Euler Problem #7]

I'm trying to complete this Project Euler challenge:
By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can
see that the 6th prime is 13.
What is the 10 001st prime number?
My code seems to be right because it works with small numbers, e.g. the 6th prime is 13.
How can I improve it so that the code runs much more quickly for larger numbers such as the 10 001st prime?
Code is below:
#Checks if a number is a prime
def is_prime(n):
    count = 0
    for i in range(2, n):
        if n%i == 0:
            return False
            break
        else:
            count += 1
    if count == n-2:
        return True
#Finds the value for the given nth term
def term(n):
    x = 0
    count = 0
    while count != n:
        x += 1
        if is_prime(x) == True:
            count += 1
    print x

term(10001)
UPDATE:
Thanks for your responses. I should have been clearer: I am not looking to speed up the interpreter or to find a faster interpreter, because I know my code isn't great, so I was looking for ways to make my code more efficient.
A few questions to ponder (a small sketch applying these ideas follows the list):
Do you really need to check the division up to n-1? How much earlier can you stop?
Apart from 2, do you really need to check division by all the multiples of two?
What about the multiples of 3? 5? Is there a way to extend this idea to all the multiples of previously tested primes?
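Here is a minimal sketch of where those hints lead: test divisibility only by the primes found so far, and stop once the candidate's square root is passed (Python 2 to match the question; nth_prime is a name I made up):
def nth_prime(n):
    primes = [2]
    candidate = 3
    while len(primes) < n:
        limit = candidate ** 0.5
        for p in primes:
            if p > limit:           # no divisor up to sqrt(candidate): it's prime
                primes.append(candidate)
                break
            if candidate % p == 0:  # composite, stop trying
                break
        candidate += 2              # even numbers (other than 2) are never prime
    return primes[-1]

print nth_prime(10001)  # 104743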
The purpose of Project Euler is not really to learn programming, but to think about algorithms. On problem #10, your algorithm will need to be even faster than on #7, etc. So you need to come up with a better way to find prime numbers, not a faster way to run Python code. People solve these problems under the time limit with far slower computers than you're using now by thinking about the math.
On that note, maybe ask about your prime number algorithm on https://math.stackexchange.com/ if you really need help thinking about the problem.
A faster interpreter won't cut it. Even an implementation written in C or assembly language won't be fast enough (to be in the "about one second" timeframe of project Euler). To put it bluntly, your algorithm is pathetic. Some research and thinking will help you write an algorithm that runs faster in a dog-slow interpreter than your current algorithm implemented in native code (I won't name any specifics, partly because that's your job and partly because I can't tell offhand how much optimization will be needed).
Many of the Euler problems (including this one) are designed to have a solution that computes in acceptable time on pretty much any given hardware and compiler (well, not INTERCAL on a PDP-11 maybe).
Your algorithm works, but it has quadratic complexity. Using a faster interpreter will give you a linear performance boost, but the quadratic complexity will dwarf it long before you calculate 10,000 primes. There are algorithms with much lower complexity; find them (or google them, no shame in that and you'll still learn a lot) and implement them.
Without discussing your algorithm, the PyPy interpreter can be ridiculously faster than the normal CPython one for tight numerical computation like this. You might want to try it out.
To check whether a number is prime, you don't have to test divisors up to n-1 or n/2.
To make it faster, you only need to check up to the square root of n.
And this is the fastest algorithm I know:
from math import sqrt

def isprime(number):
    if number <= 1:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for i in range(3, int(sqrt(number)) + 1):
        if number % i == 0:
            return False
    return True
As most people have said, it's all about coming up with the correct algorithm. Have you considered looking at a Sieve of Eratosthenes?
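As a rough sketch of that approach (the 120000 sieve bound and the function name are my own choices; the bound works because the 10001st prime, 104743, lies below it):
def nth_prime_by_sieve(n, limit=120000):
    flags = [True] * limit
    flags[0] = flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, limit, p):  # cross out multiples starting at p*p
                flags[multiple] = False
    primes = [i for i, is_p in enumerate(flags) if is_p]
    return primes[n - 1]

print nth_prime_by_sieve(10001)  # 104743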
import time
t = time.time()

def n_th_prime(n):
    b = []
    b.append(2)
    while len(b) < n:
        for num in range(3, n*11, 2):
            if all(num % i != 0 for i in range(2, int((num)**0.5) + 1)):
                b.append(num)
    print list(sorted(b))[n-1]

n_th_prime(10001)
print time.time() - t
prints
104743
0.569000005722 second
A pythonic Answer
import time
t = time.time()

def prime_bellow(n):
    b = []
    num = 2
    j = 0
    b.append(2)
    while len(b) - 1 < n:
        if all(num % i != 0 for i in range(2, int((num)**0.5) + 1)):
            b.append(num)
        num += 1
    print b[n]

prime_bellow(10001)
print time.time() - t
Prints
104743
0.702000141144 second
import math

count = 0

def is_prime(n):
    if n % 2 == 0 and n > 2:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

for i in range(2, 2000000):
    if is_prime(i):
        count += 1
    if count == 10001:
        print i
        break
I approached it a different way. We know that all multiples of 2 are not going to be prime (except 2); we also know that all non-prime numbers can be broken down into prime constituents.
i.e.
12 = 3 x 4 = 3 x 2 x 2
30 = 5 x 6 = 5 x 3 x 2
Therefore I iterated through a list of odd numbers, accumulating a list of primes, and only attempting to find the modulus of the odd numbers with primes in this list.
#First I create a helper method to determine if it's a prime that
#iterates through the list of primes I already have
def is_prime(number, list):
    for prime in list:
        if number % prime == 0:
            return False
    return True
EDIT: Originally I wrote this recursively, but I think the iterative case is much simpler
def find_10001st_iteratively():
    number_of_primes = 0
    current_number = 3
    list_of_primes = [2]
    while number_of_primes <= 10001:
        if is_prime(current_number, list_of_primes):
            list_of_primes.append(current_number)
            number_of_primes += 1
        current_number += 2
    return current_number
A different quick Python solution:
import math

prime_number = 4 # Because 2 and 3 are already prime numbers
k = 3 # It is the 3rd try after the primes 2 and 3
milestone = 10001
while k <= milestone:
    divisible = 0
    for i in range(2, int(math.sqrt(prime_number)) + 1):
        remainder = prime_number % i
        if remainder == 0: # The number is evenly divisible by i, hence not prime
            divisible += 1
    if divisible == 0:
        k += 1
    prime_number += 1
print(prime_number - 1)
import time
t = time.time()

def is_prime(n): # check primes
    prime = True
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            prime = False
            break
    return prime

def number_of_primes(n):
    prime_list = []
    counter = 0
    num = 2
    prime_list.append(2)
    while counter != n:
        if is_prime(num):
            prime_list.append(num)
            counter += 1
        num += 1
    return prime_list[n]

print(number_of_primes(10001))
print(time.time() - t)
104743
0.6159017086029053
Based on the Haskell code in the paper The Genuine Sieve of Eratosthenes by Melissa E. O'Neill:
from itertools import cycle, chain, tee, islice

wheel2357 = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]

def spin(wheel, n):
    for x in wheel:
        yield n
        n = n + x

import heapq

def insertprime(p, xs, t):
    heapq.heappush(t, (p*p, (p*v for v in xs)))

def adjust(t, x):
    while True:
        n, ns = t[0]
        if n <= x:
            n, ns = heapq.heappop(t)
            heapq.heappush(t, (ns.next(), ns))
        else:
            break

def sieve(it):
    t = []
    x = it.next()
    yield x
    xs0, xs1 = tee(it)
    insertprime(x, xs1, t)
    it = xs0
    while True:
        x = it.next()
        if t[0][0] <= x:
            adjust(t, x)
            continue
        yield x
        xs0, xs1 = tee(it)
        insertprime(x, xs1, t)
        it = xs0

primes = chain([2,3,5,7], sieve(spin(cycle(wheel2357), 11)))

from time import time
s = time()
print list(islice(primes, 10000, 10001))
e = time()
print "%.8f seconds" % (e-s)
prints:
[104743]
0.18839407 seconds
from itertools import islice
from heapq import heappush, heappop

wheel2357 = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,
             4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]

class spin(object):
    __slots__ = ('wheel','o','n','m')

    def __init__(self, wheel, n, o=0, m=1):
        self.wheel = wheel
        self.o = o
        self.n = n
        self.m = m

    def __iter__(self):
        return self

    def next(self):
        v = self.m*self.n
        self.n += self.wheel[self.o]
        self.o = (self.o + 1) % len(self.wheel)
        return v

    def copy(self):
        return spin(self.wheel, self.n, self.o, self.m)

    def times(self, x):
        return spin(self.wheel, self.n, self.o, self.m*x)

def adjust(t, x):
    while t[0][0] <= x:
        n, ns = heappop(t)
        heappush(t, (ns.next(), ns))

def sieve_primes():
    for p in [2,3,5,7]:
        yield p
    it = spin(wheel2357, 11)
    t = []
    p = it.next()
    yield p
    heappush(t, (p*p, it.times(p)))
    while True:
        p = it.next()
        if t[0][0] <= p:
            adjust(t, p)
            continue
        yield p
        heappush(t, (p*p, it.times(p)))

from time import time
s = time()
print list(islice(sieve_primes(), 10000, 10001))[-1]
e = time()
print "%.8f seconds" % (e-s)
prints:
104743
0.22022200 seconds
import time
from math import sqrt

wheel2357 = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
list_prime = [2,3,5,7]

def isprime(num):
    limit = sqrt(num)
    for prime in list_prime:
        if num % prime == 0: return 0
        if prime > limit: break
    return 1

def generate_primes(no_of_primes):
    o = 0
    n = 11
    w = wheel2357
    l = len(w)
    while len(list_prime) < no_of_primes:
        i = n
        n = n + w[o]
        o = (o + 1) % l
        if isprime(i):
            list_prime.append(i)

t0 = time.time()
generate_primes(10001)
print list_prime[-1] # 104743
t1 = time.time()
print t1-t0 # 0.18 seconds
prints:
104743
0.307313919067
