I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.
Here are the details: I am given a number num and an order matrix order.
order contains up to ~6000 rows, each a permutation of the numbers 0..31.
These are the positions to which the bits are moved.
Simplified example: binary(num) = 1001, order[1] = [0, 1, 3, 2]; the reordered number for order[1] would be 1010 (binary).
Now I want to know whether my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-bit numbers which fulfill this criterion.
My current approach is too slow, so I'm looking for a speedup.
Minimal reproducible example:
num = 1753251840
order = [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]]
patterns = set()
bits = format(num, '032b')
for perm in order:
    bitsn = [bits[perm[i]] for i in range(32)]
    patterns.add(int(''.join(bitsn), 2))
print(min(patterns) == num)
Where can I start to improve this?
Extracting bits via strings is generally very inefficient (whatever the language), and the same applies to parsing them back. Moreover, for such a fast low-level operation you need a JIT or a compiled language, as the comments already pointed out.
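For example, the permutation can be applied with integer shifts and masks instead of strings; here is a pure-Python sketch equivalent to the loop body of the question (permute is an illustrative name):
def permute(num, perm):
    out = 0
    for p in perm:
        # bit at string position p is bit 31-p of the integer
        out = (out << 1) | ((num >> (31 - p)) & 1)
    return out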
Here is a prototype using Numba's JIT (assuming all numbers are unsigned):
import numpy as np
from numba import njit
npOrder = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]], dtype=np.uint32)
@njit
def extractBits(num):
    bits = np.empty(32, dtype=np.int32)
    for i in range(32):
        bits[i] = (num >> i) & 0x01
    return bits

@njit
def permuteAndMerge(bits, perm):
    bitsnFinal = 0
    for i in range(32):
        # bits is indexed from the least significant bit, while the
        # permutation positions count from the most significant side,
        # hence the 31 - perm[i]
        bitsnFinal |= bits[31 - perm[i]] << i
    return bitsnFinal

@njit
def computeOptimized(num):
    bits = extractBits(num)
    permCount = npOrder.shape[0]
    patterns = np.empty(permCount, dtype=np.uint32)
    for i in range(permCount):
        patterns[i] = permuteAndMerge(bits, npOrder[i])
    # The array can be converted to a set here if needed, with set(patterns)
    return patterns.min() == num
This code is about 25 times faster than the original one on my machine (run 5,000,000 times).
You can also use Numba to accelerate and parallelize the outer loop that runs computeOptimized, resulting in a significant additional speed-up.
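A sketch of that idea, assuming the outer loop scans a range of candidate numbers (the function name countMatches and the counting behaviour are illustrative, not from the original code):
from numba import njit, prange

@njit(parallel=True)
def countMatches(start, stop):
    hits = 0
    for n in prange(start, stop):
        # computeOptimized is the JIT-compiled function from above
        if computeOptimized(np.uint32(n)):
            hits += 1
    return hits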
Note that this code can be made much faster still in C or C++ using low-level processor instructions (available, for example, on many x86_64 processors). With that and parallelism, the execution speed should be on the order of a billion permutations per second.
Couple of possible speed-ups, staying with Python and the current algorithm:
Bail out as soon as you find a pattern less than num; once such a pattern is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag for whether an equal one was found, if that's not guaranteed by the problem.) See the sketch after this list.
bitsn could be a generator expression and doesn't need to be in a variable; you'll have to measure whether that's faster.
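A minimal sketch combining both points (this assumes, as in the example, that the identity permutation is among the rows, so a pattern equal to num always exists):
bits = format(num, '032b')
is_smallest = True
for perm in order:
    # generator expression instead of a list; bail out on the first smaller pattern
    if int(''.join(bits[perm[i]] for i in range(32)), 2) < num:
        is_smallest = False
        break
print(is_smallest)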
More fundamental improvements:
If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm by considering what the bits mean. A couple of hours thinking could potentially let you process just the 6000 lists, rather than all 2³² integers.
As others have written, if you're after pure speed, Python is not the ideal language. It's a trade-off between how much time you want to spend on programming and how much time the program spends running.
Side note:
Are the 32-bit integers signed or unsigned?
I have the following code, which at first glance should produce 10 Jobs with 3 Tasks each.
class Job:
    id = None
    tasks = {}

class Task:
    id = None

cnt = 0
jobs = []
for i in range(0, 10):
    job = Job()
    job.id = i
    for ii in range(0, 3):
        task = Task()
        task.id = cnt
        job.tasks[task.id] = task
        cnt += 1
    jobs.append(job)

for job in jobs:
    print("job {}, tasks: {}".format(job.id, job.tasks.keys()))
The result is somewhat surprising: we end up with 30 Tasks, all shared by every Job:
job 0, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 1, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 2, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 3, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 4, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 5, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 6, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 7, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 8, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
job 9, tasks: dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
Can someone explain what is going on in here?
UPDATE
tasks is a class variable shared by all the instances.
In your Job class, you need to do this:
class Job:
    id = None
    def __init__(self):
        self.tasks = {}
tasks is defined on the class itself, so each time you add a task you are modifying the one class-level dict that is shared by all the instances.
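A quick standalone sketch (not part of the original code) that shows the difference:
class Shared:
    items = {}            # class attribute: one dict for the whole class

class PerInstance:
    def __init__(self):
        self.items = {}   # instance attribute: a fresh dict per object

a, b = Shared(), Shared()
a.items['x'] = 1
print(b.items)            # {'x': 1} -- b sees a's change

c, d = PerInstance(), PerInstance()
c.items['x'] = 1
print(d.items)            # {} -- d has its own dict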
I have a graph g and I want to find the clusters in this graph using igraph. Here's my code:
g = Graph.Read_Ncol('karate.txt', directed=False)
p = g.community_label_propagation()
I tried to print the clusters in two ways. First:
print(p)
Second:
for idx, cluster in enumerate(p):
    print(cluster)
Here's the output of the first one:
[0] 0, 1, 3, 4, 6, 7, 10, 11, 12, 13, 17, 19, 21
[1] 2, 8, 31, 30, 9, 27, 28, 32, 33, 14, 15, 18, 20, 22, 23, 25, 29, 24, 26
[2] 5, 16
and the output of the second one is:
[0, 1, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15]
[2, 8, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]
[5, 22]
I was wondering why the clusters are different in these two outputs.
You are printing different data structures.
In the first, you are printing the entire clustering. Apparently, one of the igraph authors decided that printing the cluster number is a good idea, too.
In the second case, that would be your responsibility.
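For example, to get output similar to the first form, you could format each cluster yourself in the second loop (a sketch):
for idx, cluster in enumerate(p):
    print('[{}] {}'.format(idx, ', '.join(map(str, cluster))))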
Note that the output of
a = [[1, 2], [3, 4]]
print(a)
for row in a:
    print(row)
is not the same either.
Y, X = np.mgrid[-3:3:10j, -3:3:10j]
I've noticed that when applying certain operations to meshgrids like the one above, I get an error because the operations may not be compatible with numpy. Sometimes there is a numpy alternative for functions such as sin and cos, but not for every function, e.g. quad from scipy.integrate.
How do I get around this problem? I need to apply operations to entire meshgrids.
Your question (with the follow-on comment) can be taken at least two different ways:
You have a function of multiple arguments, and you would like to be able to call that function in a manner that is syntactically similar to the broadcasted calls supported natively by numpy. Performance is not the issue, just the calling syntax of the function.
You have a function of multiple arguments that is to be evaluated on a sequence of numpy arrays, but the function is not implemented in such a manner that it can exploit the contiguous memory layout of numpy arrays. Performance is the issue; you would be happy to loop over the numpy arrays and call your function in a boring, plain old for-loop style, except that doing so is too slow.
For item 1, numpy provides a convenience function called vectorize, which takes a regular callable and returns a callable that can be invoked with numpy arrays as arguments and will obey numpy's broadcasting rules.
Consider this contrived example:
def my_func(x, y):
    return x + 2*y
Now suppose I need to evaluate this function everywhere in a 2-D grid. Here is the plain old boring way:
Y, X = np.mgrid[0:10:1, 0:10:1]
Z = np.zeros_like(Y)
for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        Z[i, j] = my_func(X[i, j], Y[i, j])
If we had a few different functions like my_func, it might be nice to generalize this process into a function that "mapped" a given function over the 2-D arrays.
import itertools

def array_map(some_func, *arg_arrays):
    output = np.zeros_like(arg_arrays[0])
    coordinates = map(range, output.shape)
    for coord in itertools.product(*coordinates):
        args = [arg_array[coord] for arg_array in arg_arrays]
        output[coord] = some_func(*args)
    return output
Now we can see that array_map(my_func, X, Y) acts just like the nested for-loop:
In [451]: array_map(my_func, X, Y)
Out[451]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Now, wouldn't it be nice if we could call array_map(my_func), leave off the extra array arguments, and just get back a new function that was waiting to do the required for-loops?
We can do this with functools.partial -- so we can write a handy little vectorizer like this:
import functools

def vectorizer(regular_function):
    awesome_function = functools.partial(array_map, regular_function)
    return awesome_function
and testing it out:
In [453]: my_awesome_func = vectorizer(my_func)
In [454]: my_awesome_func(X, Y)
Out[454]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Now my_awesome_func behaves as if you are able to call it directly on top of ndarrays!
I've overlooked many extra little performance details, bounds checking, etc., while making this toy version called vectorizer ... but luckily in numpy there is vectorize which already does just this!
In [455]: my_vectorize_func = np.vectorize(my_func)
In [456]: my_vectorize_func(X, Y)
Out[456]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Once again, as stressed in my earlier comments to the OP and in the documentation for vectorize, this is not a speed optimization. In fact, the extra function-call overhead can make it slower in some cases than just writing a for-loop directly. But for cases when speed is not a problem, this method does allow you to make your custom functions adhere to the same calling conventions as numpy, which can improve the uniformity of your library's interface and make the code more consistent and readable.
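If you want to verify that on your machine, a rough comparison might look like this (a sketch; exact numbers will vary):
import numpy as np
import timeit

def my_func(x, y):
    return x + 2*y

vec_func = np.vectorize(my_func)
x = np.arange(10000.0)
y = x.copy()

print(timeit.timeit(lambda: vec_func(x, y), number=100))  # per-element Python calls
print(timeit.timeit(lambda: my_func(x, y), number=100))   # one broadcasted numpy expression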
A whole lot of other stuff has already been written about item 2. If your problem is that you need to optimize your functions to leverage contiguous blocks of memory and bypass repeated dynamic type checking (the main features that numpy arrays add over Python lists), then here are a few links you may find helpful:
http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html
http://csl.name/C-functions-from-Python/
https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow
https://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/NumbaCython.ipynb
I've been trying to wrap my head around the best way to split up a list of numbers that is ordered but broken into sections. Example:
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
I'd like the output to be this..
sliced_data = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],[29, 30, 31, 32, 33],[35, 36],[44, 45, 46, 47]]
I've been trying a while loop until the list is empty, but that isn't working too well.
Edit:
for each_half_hour in half_hour_blocks:
    if next_number != each_half_hour:
        skippers.append(half_hour_blocks[:next_number])
        del half_hour_blocks[:next_number]
    next_number = each_half_hour + 1
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
>>> from itertools import groupby, count
>>> [list(g) for k,g in groupby(data, key=lambda i, c=count():i-next(c))]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [29, 30, 31, 32, 33], [35, 36], [44, 45, 46, 47]]
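To see what the key is doing, here are the i - next(c) values for each element; the difference is constant within a consecutive run and jumps at every gap, so groupby starts a new group there:
>>> c = count()
>>> [i - next(c) for i in data]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 9, 9, 9, 10, 10, 17, 17, 17, 17]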
I don't see why a while-loop wouldn't work here, unless you're going for something more efficient or succinct.
Something like:
run = [data.pop(0)]          # 'run' rather than 'slice', to avoid shadowing the built-in
sliced_data = []
while data:
    if data[0] == run[-1] + 1:
        run.append(data.pop(0))    # still consecutive: extend the current run
    else:
        sliced_data.append(run)    # gap: close this run and start a new one
        run = [data.pop(0)]
sliced_data.append(run)            # append the final run
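Wrapped into a non-destructive function (a sketch; the name split_runs is mine, and it assumes a non-empty input list):
def split_runs(seq):
    runs = [[seq[0]]]
    for value in seq[1:]:
        if value == runs[-1][-1] + 1:
            runs[-1].append(value)   # still consecutive
        else:
            runs.append([value])     # gap: start a new run
    return runs

print(split_runs([0, 1, 2, 5, 6, 8]))  # [[0, 1, 2], [5, 6], [8]]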
I am pretty new to Python, so thank you in advance for your help with this noob question.
>>> import numpy as np
>>> a = np.arange(10, 25)
>>> a[5] = a[10] = 0
>>> a
array([10, 11, 12, 13, 14, 0, 16, 17, 18, 19, 0, 21, 22, 23, 24])
How could I change the zeros to be the preceding value? I know I can easily change the zeros to a constant by:
>>> a[a == 0] = 3
>>> a
array([10, 11, 12, 13, 14, 3, 16, 17, 18, 19, 3, 21, 22, 23, 24])
but I am looking for something that will return me:
array([10, 11, 12, 13, 14, 14, 16, 17, 18, 19, 19, 21, 22, 23, 24])
I've a feeling it has something to do with masked arrays, but I can't find any examples of the sort by googling. Thank you.
In answer to @DSM's question: the first value will never be a zero, but there may be a string of contiguous zeros.
So I'd like to be able to transform:
array([10, 11, 12, 13, 14, 0, 0, 0, 18, 19, 0, 21, 22, 23, 24])
into
array([10, 11, 12, 13, 14, 14, 14, 14, 18, 19, 19, 21, 22, 23, 24])
Handling contiguity was tricky. How about:
def fill_from_left(a, x=0):
    to_fill = (a == x)
    if a[0] == x:
        raise ValueError("cannot have {} as first element".format(x))
    if to_fill.any():
        lefts = ~to_fill & (np.roll(a, -1) == x)
        fill_from = lefts.cumsum()
        fill_with = a[np.where(lefts)[0]][fill_from - 1]
        a[to_fill] = fill_with[to_fill]
which gives
>>> a = np.array([1,2,3,0,4,0,0,5])
>>> fill_from_left(a)
>>> a
array([1, 2, 3, 3, 4, 4, 4, 5])
To be honest, though, most of the time when I have "missing values" it's because I'm working with real data, in which case I tend to use pandas instead of bare numpy. And Series are much easier to fill than ndarrays: this would simply be s.replace(0, np.nan).ffill().
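For reference, that pandas approach might look like this on the example from the update (the astype(int) at the end is my addition, undoing the float upcast that the NaNs cause):
import numpy as np
import pandas as pd

s = pd.Series([10, 11, 12, 13, 14, 0, 0, 0, 18, 19, 0, 21, 22, 23, 24])
print(s.replace(0, np.nan).ffill().astype(int).tolist())
# [10, 11, 12, 13, 14, 14, 14, 14, 18, 19, 19, 21, 22, 23, 24]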
How about using numpy.roll to shift all the values of the array right by one (so each position sees its predecessor), and then using numpy.where? Note that this handles the isolated zeros of the original example, but not the contiguous runs from the update:
>>> a=np.arange(10, 25)
>>> a[5]=a[10]=0
>>> a
array([10, 11, 12, 13, 14, 0, 16, 17, 18, 19, 0, 21, 22, 23, 24])
>>> np.where( a == 0, np.roll( a, 1 ), a )
array([10, 11, 12, 13, 14, 14, 16, 17, 18, 19, 19, 21, 22, 23, 24])