I wanted to code a prime number generator in Python; I've only done this in C and Java before. I did the following: I used an integer as a bitmap, treating its bits as an array. The algorithm is a sieve, so its running time should grow as n log(log(n)), but I am seeing the cost/time grow exponentially as the problem size n increases. Is there something obvious I am not seeing, or something I don't know about Python integers once they grow larger than practical sizes? I am using Python 3.8.3.
def countPrimes(n):
    if n < 3:
        return []
    # Bit (i - 1) of arr stands for the number i; bits for 2 .. n-1 start set
    arr = (1 << (n-1)) - 2
    for i in range(2, n):
        selector = 1 << (i - 1)
        if (selector & arr) == 0:
            continue
        # We have a prime
        composite = selector
        # Shifting by i moves from the bit for k*i to the bit for (k+1)*i
        while (composite := composite << i) < arr:
            arr = arr & (~composite)
    primes = []
    for i in range(n):
        if (arr >> i) & 1 == 1:
            primes.append(i+1)
    return primes
Some analysis of my runtime: [image not reproduced]
A plot of y = n log(log(n)) (red, the steeper line) and y = x (blue, the less steep line): [image not reproduced]
I wouldn't normally use integers wider than uint64, but because Python allows arbitrarily large integers and I'm just testing, I used the approach above. As I said, I am trying to understand why the running time grows exponentially with the problem size n.
I used an integer bitmap as an array
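That's extremely expensive. Python ints are immutable, so every time you want to toggle a bit, you're building a whole new gigantic int. Each of those operations takes time proportional to the size of the int (about n bits), so the sieve as a whole ends up roughly quadratic in n instead of n log(log(n)).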
You also need to build other giant ints just to access single bits you're interested in - for example, composite and ~composite are huge in arr = arr & (~composite), even though you're only interested in 1 bit.
Use an actual mutable sequence type. Maybe a list, maybe a NumPy array, maybe some bitvector type off of PyPI, but don't use an int.
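As a sketch of the mutable-sequence approach, here is a sieve of Eratosthenes over a bytearray (my choice of container; a list or NumPy array would work similarly). The name primes_below is mine; it is meant to return the same primes below n as countPrimes in the question, but it clears each prime's multiples with one in-place slice assignment instead of building new ints:

```python
import math

def primes_below(n):
    """Return the primes p with 2 <= p < n, using a mutable bytearray sieve."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n      # sieve[k] == 1 means "k may still be prime"
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Zero out i*i, i*i + i, ... in place: one slice assignment per prime,
            # rather than a brand-new n-bit int for every cleared bit
            sieve[i * i::i] = bytes(len(range(i * i, n, i)))
    return [k for k in range(n) if sieve[k]]
```

Starting at i * i is the usual sieve refinement: smaller multiples of i were already cleared by smaller primes.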
The code below runs very slowly. I tried using numpy.argwhere instead of an if statement to speed it up, and that helped considerably, but it's still very slow. I also tried numpy.frompyfunc and numpy.vectorize, without success. What would you suggest to speed up the code below?
import numpy as np
import time
time1 = time.time()
n = 1000000
k = 10000
velos = np.linspace(-1000, 1000, n)
line_centers = np.linspace(-1000, 1000, k)
weights = np.random.random_sample(k)
rvs = np.arange(-60, 60, 2)
m = len(rvs)
w = np.arange(10)
M = np.zeros((n, m))
for l, lc in enumerate(line_centers):
    vi = velos - lc
    for j in range(m - 1):
        w = np.argwhere((vi < rvs[j + 1]) & (vi > rvs[j])).T[0]
        M[w, j] = weights[l] * (rvs[j + 1] - vi[w]) / (rvs[j + 1] - rvs[j])
        M[w, j + 1] = weights[l] * (vi[w] - rvs[j]) / (rvs[j + 1] - rvs[j])
time2 = time.time()
print(time2 - time1)
EDIT:
The size of the array M was incorrect. I fixed it.
This seems like a situation where a C++ extension could come in handy. With pybind11 you can write C++ functions that take NumPy arrays as arguments, manipulate them, and return them to Python. That would speed up your loops. Take a look at it!
Of course it is slow: you have two nested loops! You need to rethink your algorithm in terms of vector operations, that is, no iteration over indices, but index or boolean arrays and index shifts instead.
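A minimal sketch of that idea applied to the question's code, assuming the intent is exactly what the nested loop computes (including the fact that M[w, j] is overwritten, not accumulated, for each line center). For a fixed line center, each vi falls in at most one open interval (rvs[j], rvs[j + 1]), so np.searchsorted can find every bin index at once and the inner loop over j disappears. The outer loop over line_centers is kept; the function name build_M is my own.

```python
import numpy as np

def build_M(velos, line_centers, weights, rvs):
    """Same M as the question's nested loops, with the loop over j vectorized."""
    n, m = len(velos), len(rvs)
    M = np.zeros((n, m))
    widths = np.diff(rvs)  # widths[j] == rvs[j + 1] - rvs[j]
    for l, lc in enumerate(line_centers):
        vi = velos - lc
        # side="left" gives rvs[idx - 1] < vi <= rvs[idx]
        idx = np.searchsorted(rvs, vi, side="left")
        valid = (idx >= 1) & (idx <= m - 1)
        safe = np.clip(idx, 0, m - 1)          # keeps rvs[safe] in bounds
        # the original mask is strict on both sides, so drop vi == rvs[idx]
        inside = valid & (vi < rvs[safe])
        rows = np.nonzero(inside)[0]
        j = idx[rows] - 1                      # bin index for each selected row
        M[rows, j] = weights[l] * (rvs[j + 1] - vi[rows]) / widths[j]
        M[rows, j + 1] = weights[l] * (vi[rows] - rvs[j]) / widths[j]
    return M
```

Each row lands in at most one bin per line center, so the fancy-index assignments never write the same cell twice within one iteration, which is what makes this equivalent to the sequential j loop.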
You have not given any background information, so it is incredibly hard for anyone to suggest something meaningful (given the soup of indices in the example). Here are a few quick suggestions from glancing over your example.
An expression like this (rvs[j + 1] - rvs[j]) is easily replaced with numpy.ediff1d.
You seem to be iterating through n in blocks of m, maybe numpy.nditer will be of use.
I have a hunch that your inner loop has an error: are you sure you really mean to iterate over range(m - 1)? That iterates from 0 to m-2 (inclusive), and I doubt you meant that.
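For instance, with the question's rvs, the bin widths used in the denominators can be computed once with numpy.ediff1d, outside both loops, instead of evaluating rvs[j + 1] - rvs[j] on every iteration (the name widths is mine):

```python
import numpy as np

rvs = np.arange(-60, 60, 2)
widths = np.ediff1d(rvs)   # widths[j] == rvs[j + 1] - rvs[j]; length len(rvs) - 1
# Inside the loops, (rvs[j + 1] - rvs[j]) can then be read as widths[j]
```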
We can help with more concrete answers if you provide more background information.
Implementing a Local Search algorithm for the Quadratic Assignment Problem in Python with NumPy, I've found for-loops to be problematic at best, given that CPython is very slow at heavy mathematical computation.
My code has three nested for-loops: one over the solution (a NumPy ndarray), one over a mask (another ndarray), and one over the solution again. The innermost loop then does some computations that, and this is the troublesome bit, may affect later iterations.
My question is: is it even possible to precompute this somehow? Given the nature of the problem, I'm not really sure. Because the problem is quadratic, the running time is acceptable for small sizes, but beyond about 100 elements it results in a lot of iterations.
The code is as follows:
while has_improved and iterations < 100:
    for r in range(n):
        for b in dlb:
            if b: continue
            has_improved = False
            for s in range(n):
                move_cost = cost_fn_delta((flow, distance), solution, r, s)
                if move_cost < 0:
                    # Swap the indices
                    solution[r], solution[s] = solution[s], solution[r]
                    cost += move_cost
                    dlb[r], dlb[s] = False, False
                    has_improved = True
                    iterations += 1
            if not has_improved:
                dlb[r] = True
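One partial answer to "can this be precomputed": the innermost loop over s can be evaluated for all s at once. cost_fn_delta is not shown, so this sketch assumes the usual Koopmans-Beckmann cost, sum over i, j of flow[i, j] * distance[solution[i], solution[j]], and derives the swap delta directly from it. The names qap_cost, swap_deltas and local_search are hypothetical. Because every accepted swap changes later deltas, the loop over r stays sequential; this sketch also takes the best s for each r (best improvement) and ignores the don't-look bits, so it explores moves in a different order than the question's first-improvement loop.

```python
import numpy as np

def qap_cost(flow, distance, solution):
    """Assumed cost: sum over i, j of flow[i, j] * distance[solution[i], solution[j]]."""
    return (flow * distance[np.ix_(solution, solution)]).sum()

def swap_deltas(flow, distance, solution, r):
    """Change in qap_cost from swapping solution[r] and solution[s], for every s.

    O(n^2) NumPy work per r instead of n Python-level delta calls; delta[r] is 0.
    """
    P = distance[np.ix_(solution, solution)]  # P[i, j] = distance[sol[i], sol[j]]

    def one_sided(X, Y):
        # A[s, k] = (X[r, k] - X[s, k]) * (Y[s, k] - Y[r, k]), summed over k not in {r, s}
        A = (X[r] - X) * (Y - Y[r])
        return A.sum(axis=1) - A[:, r] - np.diagonal(A)

    # Pairs (i, j) with exactly one index in {r, s}: rows first, then columns
    delta = one_sided(flow, P) + one_sided(flow.T, P.T)
    # Pairs (r, r) and (s, s)
    delta += (flow[r, r] - np.diagonal(flow)) * (np.diagonal(P) - P[r, r])
    # Pairs (r, s) and (s, r)
    delta += (flow[r, :] - flow[:, r]) * (P[:, r] - P[r, :])
    return delta

def local_search(flow, distance, solution, max_passes=100):
    """Best-improvement swaps; swaps stay sequential because each one changes P."""
    solution = solution.copy()
    for _ in range(max_passes):
        improved = False
        for r in range(len(solution)):
            delta = swap_deltas(flow, distance, solution, r)
            s = int(np.argmin(delta))
            if delta[s] < 0:
                solution[r], solution[s] = solution[s], solution[r]
                improved = True
        if not improved:
            break
    return solution
```

With floating-point matrices, a small negative tolerance instead of `< 0` may avoid swaps driven by rounding noise.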
I am trying to create a generator that returns numbers in a given range that pass a particular test given by a function foo. However I would like the numbers to be tested in a random order. The following code will achieve this:
from random import shuffle
def MyGenerator(foo, num):
    order = list(range(num))
    shuffle(order)
    for i in order:
        if foo(i):
            yield i
The Problem
The problem with this solution is that sometimes the range will be quite large (num might be of the order of 10**8 and upwards). This function can then become slow, because it holds such a large list in memory. I have tried to avoid this problem with the following code:
from random import randint
def MyGenerator(foo, num):
    tried = set()
    while len(tried) <= num - 1:
        i = randint(0, num-1)
        if i in tried:
            continue
        tried.add(i)
        if foo(i):
            yield i
This works well most of the time, since in most cases num will be quite large, foo will pass a reasonable number of numbers, and the total number of times the __next__ method is called will be relatively small (say, a maximum of 200, often much smaller). Therefore it's reasonably likely that we stumble upon a value that passes the foo test, and the size of tried never gets large. (Even if foo only passes 10% of the time, we wouldn't expect tried to grow beyond roughly 2000.)
However, when num is small (close to the number of times the __next__ method is called) or foo fails most of the time, the above solution becomes very inefficient: it keeps randomly guessing numbers until it guesses one that isn't in tried.
My attempted solution...
I was hoping to use some kind of function that maps the numbers 0, 1, 2, ..., n onto themselves in a roughly random way. (This isn't being used for any security purposes, so it doesn't matter if it isn't the most 'random' function in the world.) The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given num, I don't even need a bijection on exactly 0, 1, ..., num, just one on 0, 1, ..., n for some n larger than and 'close' to num (using whatever definition of close you see fit). Then I can do the following:
def mix_function_factory(num):
    # something here???
    def foo(index):
        # something else here??
        ...
    return foo
def MyGenerator(foo, num):
    mix_function = mix_function_factory(num)
    for i in range(num):
        index = mix_function(i)
        if index < num:
            if foo(index):
                yield index
(So long as the bijection isn't on a set of numbers massively larger than num, the number of times index < num isn't True will be small.)
My Question
Can you think of one of the following:
A potential solution for mix_function_factory or even a few other potential functions for mix_function that I could attempt to generalise for different values of num?
A better way of solving the original problem?
Many thanks in advance....
The problem is basically generating a random permutation of the integers in the range 0..n-1.
Luckily for us, these numbers have a very useful property: they all have a distinct value modulo n. If we apply some mathematical operations to these numbers while taking care to keep each number distinct modulo n, it's easy to generate a permutation that appears random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.
Examples of operations we can perform on every number x in the range include:
Addition: We can add any integer c to x.
Multiplication: We can multiply x with any number m that shares no prime factors with n.
Applying just these two operations on the range 0..n-1 already gives quite satisfactory results:
>>> n = 7
>>> c = 1
>>> m = 3
>>> [((x+c) * m) % n for x in range(n)]
[3, 6, 2, 5, 1, 4, 0]
Looks random, doesn't it?
If we generate c and m from a random number, it'll actually be random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.
Implementation
The difficult part about the implementation is really just generating a suitable random m. I used the prime factorization code from this answer to do so.
import random
# credit for prime factorization code goes
# to https://stackoverflow.com/a/17000452/1222951
def prime_factors(n):
    gaps = [1,2,2,4,2,4,2,4,6,2,6]
    length, cycle = 11, 3
    f, fs, next_ = 2, [], 0
    while f * f <= n:
        while n % f == 0:
            fs.append(f)
            n //= f  # integer division keeps n (and the factors) as ints
        f += gaps[next_]
        next_ += 1
        if next_ == length:
            next_ = cycle
    if n > 1: fs.append(n)
    return fs
def generate_c_and_m(n, seed=None):
    # we need to know n's prime factors to find a suitable multiplier m
    p_factors = set(prime_factors(n))
    def is_valid_multiplier(m):
        # m must not share any prime factors with n
        factors = prime_factors(m)
        return not p_factors.intersection(factors)
    # if no seed was given, generate random values for c and m
    if seed is None:
        c = random.randrange(n)  # randint needs two bounds; randrange(n) gives 0..n-1
        m = random.randint(1, 2*n)
    else:
        c = seed
        m = max(seed, 1)  # m = 0 (or negative) would pass the check but break the permutation
    # make sure m is valid
    while not is_valid_multiplier(m):
        m += 1
    return c, m
Now that we can generate suitable values for c and m, creating the permutation is trivial:
def random_range(n, seed=None):
    c, m = generate_c_and_m(n, seed)
    for x in range(n):
        yield ((x + c) * m) % n
And your generator function can be implemented as
def MyGenerator(foo, num):
    for x in random_range(num):
        if foo(x):
            yield x
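Since "m shares no prime factors with n" is the same condition as gcd(m, n) == 1, the prime factorization can be replaced by math.gcd. A sketch of that variant (the function names and the rng parameter are my own, not part of the answer above); because only m modulo n affects the result, m is drawn from 1..n:

```python
import math
import random

def generate_c_and_m_gcd(n, rng=random):
    """Pick c and m for the affine permutation x -> ((x + c) * m) % n."""
    c = rng.randrange(n)
    m = rng.randrange(1, n + 1)  # only m % n matters, so 1..n is enough
    while math.gcd(m, n) != 1:   # coprime to n <=> no shared prime factor
        m += 1
    return c, m

def random_range_gcd(n, rng=random):
    c, m = generate_c_and_m_gcd(n, rng)
    for x in range(n):
        yield ((x + c) * m) % n
```

Passing a seeded random.Random instance as rng makes the order reproducible, for example random_range_gcd(10, random.Random(0)).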
That may be a case where the best algorithm depends on the value of num, so why not use two selectable algorithms wrapped in one generator?
You could combine your shuffle and set solutions with a threshold on the value of num. That basically assembles your first two solutions into one generator:
from random import shuffle,randint
def MyGenerator(foo, num):
    if num < 100000:  # has to be adjusted by experiments
        order = list(range(num))
        shuffle(order)
        for i in order:
            if foo(i):
                yield i
    else:  # big values, few collisions with random generator
        tried = set()
        while len(tried) < num:
            i = randint(0, num-1)
            if i in tried:
                continue
            tried.add(i)
            if foo(i):
                yield i
The randint solution (for big values of num) works well because there aren't so many repeats in the random generator.
Getting the best performance in Python is much trickier than in lower-level languages. For example, in C you can often save a little in hot inner loops by replacing a multiplication with a shift; in Python, the overhead of interpreting bytecode erases this. Of course, this changes again depending on which variant of "Python" you're targeting (PyPy? NumPy? Cython?), so you really have to write your code for the one you're using.
But even more important is arranging operations to avoid serialized dependencies, since modern CPUs are superscalar. Real compilers know about this, of course, but it still matters when choosing an algorithm.
One of the easiest ways to gain a little over the existing answers would be to generate numbers in chunks using numpy.arange() and apply ((x + c) * m) % n to the NumPy ndarray directly. Every Python-level loop that can be avoided helps.
If foo can be applied directly to NumPy ndarrays, that might be even better. Of course, a sufficiently small function in Python will be dominated by function-call overhead anyway.
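A sketch of that chunked evaluation, assuming c and m have already been chosen as in the earlier answer (gcd(m, n) == 1) and that n is below 2**31, so that (x + c) * m still fits in int64 after reducing c and m modulo n. The name permuted_chunks and the default chunk size are my own. It yields whole NumPy arrays, so a vectorized foo could filter each chunk at once:

```python
import numpy as np

def permuted_chunks(n, c, m, chunk=1 << 16):
    """Yield ((x + c) * m) % n for x in range(n), as int64 arrays of up to `chunk` values."""
    c %= n
    m %= n
    for start in range(0, n, chunk):
        x = np.arange(start, min(start + chunk, n), dtype=np.int64)
        yield ((x + c) * m) % n
```

If foo itself stays a plain Python function, iterating over each returned array element by element brings most of the per-item overhead back, so the gain depends on how foo is written.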
The best fast random-number-generator today is PCG. I wrote a pure-python port here but concentrated on flexibility and ease-of-understanding rather than speed.
Xoroshiro128+ is second-best-quality and faster, but less informative to study.
Python's (and many others') default choice of Mersenne Twister is among the worst.
(there's also something called splitmix64 which I don't know enough about to place - some people say it's better than xoroshiro128+, but it has a period problem - of course, you might want that here)
Both default-PCG and xoroshiro128+ use a 2N-bit state to generate N-bit numbers. This is generally desirable, but means numbers will be repeated. PCG has alternate modes that avoid this, however.
Of course, much of this depends on whether num is (close to) a power of 2. In theory, PCG variants can be created for any bit width, but currently only various word sizes are implemented since you'd need explicit masking. I'm not sure exactly how to generate the parameters for new bit sizes (perhaps it's in the paper?), but they can be tested simply by doing a period/2 jump and verifying that the value is different.
Of course, if you're only making 200 calls to the RNG, you probably don't actually need to avoid duplicates on the math side.
Alternatively, you could use an LFSR, which exists for every bit size (though note that it never generates the all-zeros value, or equivalently, the all-ones value). LFSRs are serial and (AFAIK) not jumpable, and thus can't be easily split across multiple tasks. Edit: I figured out that this is untrue: simply represent the advance step as a matrix over GF(2) and exponentiate it to jump.
Note that LFSRs do have the same obvious biases as simply generating numbers in sequential order based on a random start point - for example, if rng_outputs[a:b] all fail your foo function, then rng_outputs[b] will be much more likely as a first output regardless of starting point. PCG's "stream" parameter avoids this by not generating numbers in the same order.
Edit2: I have completed what I thought was a "brief project" implementing LFSRs in python, including jumping, fully tested.
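For a concrete picture of the LFSR route, here is a 16-bit Galois LFSR with the tap mask 0xB400 (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, a standard maximal-length choice with period 65535). This is not the implementation mentioned above; the function name, the default seed and the restriction to num <= 65535 are my own assumptions, and other widths would need their own maximal-length taps. Because one period visits every nonzero 16-bit state exactly once, subtracting 1 and skipping values >= num produces each of 0..num-1 exactly once:

```python
def lfsr16_order(num, seed=0xACE1):
    """Yield each integer in range(num) exactly once, in 16-bit Galois LFSR order.

    Assumes 1 <= num <= 65535 and a nonzero 16-bit seed.
    """
    state = seed & 0xFFFF
    if not 1 <= num <= 0xFFFF or state == 0:
        raise ValueError("need 1 <= num <= 65535 and a nonzero 16-bit seed")
    for _ in range(0xFFFF):      # one full period visits every nonzero state once
        value = state - 1        # map states 1..65535 onto 0..65534
        if value < num:          # skip values outside the requested range
            yield value
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400      # taps for x^16 + x^14 + x^13 + x^11 + 1
```

When num is much smaller than 65535 most steps are skipped, which is why picking a register width just above num matters.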
My problem has me cycling through a long number in order to find the largest product of 5 consecutive digits within it. I have a solution, but it hard-codes the elements' positions, feels/looks hideous, and is not scalable (what if I wanted the sum of 10 consecutive terms?). Is there a way to make this solution more Pythonic, nest it, or optimise it in some way?
n = 82166370484403199890008895243450658541227588666881
N = str(n)
Pro = 0
for i in range(0, len(N) - 4):
    TemPro = int(N[i])*int(N[i+1])*int(N[i+2])*int(N[i+3])*int(N[i+4])
    if TemPro > Pro:
        Pro = TemPro
print(Pro)
OS: Windows 7
Language: Python 3
This is a perfect case for using reduce on a slice of N:
from functools import reduce  # python 3
nb_terms = 5
Pro = 0
# there are len(N) - nb_terms + 1 windows of nb_terms digits
for i in range(0, len(N) - nb_terms + 1):
    TemPro = reduce(lambda x, y: int(x) * int(y), N[i:i + nb_terms])
    if TemPro > Pro:
        Pro = TemPro
print(Pro)
reduce will multiply all the items together, without visible loop, and without hardcoding the number of terms.
You can do this quite concisely, by first converting the whole integer to a series of digits, and then calculating the products of a sliding window using reduce, mul and a slice.
from functools import reduce
from operator import mul
n = 82166370484403199890008895243450658541227588666881
def largest_prod(n, length=5):
    digits = [int(d) for d in str(n)]
    return max(reduce(mul, digits[i:i + length]) for i in range(len(digits) - length + 1))
print(largest_prod(n))
Note that this method of finding the digits is theoretically kind of slow - for all intents and purposes it's fast enough, but it involves some unnecessary object creation. If you really care about performance you can use an arithmetic approach similar to what I discussed in my answer here.
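One possible arithmetic approach (not necessarily the one in the linked answer) is to peel digits off with divmod instead of going through str, then reuse the same sliding window; math.prod requires Python 3.8+. The function names are my own. For very long numbers, repeated divmod on a big int is itself quadratic, so it is worth timing before relying on it:

```python
from math import prod

def digits_of(n):
    """Digits of a non-negative int, most significant first, without str()."""
    if n == 0:
        return [0]
    digits = []
    while n:
        n, d = divmod(n, 10)   # d is the current last digit
        digits.append(d)
    digits.reverse()
    return digits

def largest_prod_arith(n, length=5):
    digits = digits_of(n)
    return max(prod(digits[i:i + length]) for i in range(len(digits) - length + 1))
```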