I have an ordered Python list of the form:
[1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
How can I group together consecutive numbers in a list, so that I get groups like these?
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]
I tried using groupby from here but was not able to tailor it to my needs.
Thanks,
You could use negative indexing:
def group_by_missing(seq):
    if not seq:
        return seq
    grouped = [[seq[0]]]
    for x in seq[1:]:
        if x == grouped[-1][-1] + 1:
            grouped[-1].append(x)
        else:
            grouped.append([x])
    return grouped
Example Usage:
>>> lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
>>> group_by_missing(lst)
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]
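Since the question mentions groupby, here is a sketch of the common itertools.groupby idiom for this task. It assumes the input is sorted and has no duplicates; group_consecutive is just an illustrative name. The trick is that inside a run of consecutive integers, value minus index stays constant, so that difference works as the grouping key.

```python
from itertools import groupby

def group_consecutive(seq):
    """Group runs of consecutive integers (seq assumed sorted, no duplicates)."""
    groups = []
    # Within a run, value - index is constant, so it serves as the key.
    for _, run in groupby(enumerate(seq), key=lambda pair: pair[1] - pair[0]):
        groups.append([value for _, value in run])
    return groups

lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
print(group_consecutive(lst))
# [[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]
```

An empty input simply returns an empty list.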
A more compact way to do it, in fewer lines, is to use the reduce function from functools with a lambda function that uses an inline if (a conditional expression) as the criterion for the reduce:
import functools
lis = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
result = functools.reduce(lambda x,y : x[:-1]+[x[-1]+[y]] if (x[-1][-1]+1==y) else [*x,[y]], lis[1:] , [[lis[0]]] )
print(result)
Is it possible to apply several conditions to a numpy.array when indexing it? In my case I want the first 10 elements and then pairs of neighbouring elements with a step of 5:
numpy.arange(40)
#Output is:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39])
Applying my conditions to this array I want to get this:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 20, 21, 26, 27, 32,
33, 38, 39])
I haven't found any solution. I thought it should look something like this:
np.arange(40)[0:10, 10:len(np.arange(40)):5]
But it's not working for me.
You can try custom indexing on a reshaped array:
import numpy as np
n = 40
idx = np.zeros(n//2, dtype=bool)
idx[:5] = True
idx[4:None:3] = True
>>> np.arange(n).reshape(-1,2)[idx]
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[14, 15],
[20, 21],
[26, 27],
[32, 33],
[38, 39]])
>>> np.arange(n).reshape(-1,2)[idx].ravel()
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 20, 21, 26, 27, 32,
33, 38, 39])
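If the pattern is strictly periodic, a boolean mask built with modulo arithmetic is another option, and it does not require the array length to be even (as reshape(-1, 2) does). This is a sketch under the assumption that, after the first 10 elements, the pattern is "keep 2, skip 4" starting at 8; the offset 8 and period 6 are read off the desired output, not taken from the question.

```python
import numpy as np

a = np.arange(40)
# Keep the first 10 elements, then, counting from 8, keep 2 and skip 4
# (period 6): this selects 14, 15, 20, 21, 26, 27, 32, 33, 38, 39.
mask = (a < 10) | ((a - 8) % 6 < 2)
print(a[mask])
```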
I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.
Here are the details: I am given a number num and an order matrix order.
order contains up to ~6000 lines of permutations of the numbers 0..31.
These are the positions to which the bits change.
Simplified example: binary(num) = 1001, order[1]=[0,1,3,2], reordered number for order[1] would be 1010 (binary).
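The simplified example can be checked directly with the same indexing the question's code uses: position i of the result takes the bit found at position perm[i] of the original bit string.

```python
bits = "1001"          # binary(num) from the simplified example
perm = [0, 1, 3, 2]    # order[1]
# Result position i takes the bit at position perm[i] of the original.
reordered = "".join(bits[p] for p in perm)
print(reordered)       # 1010
```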
Now I want to know if my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-bit numbers that fulfil this criterion.
My current approach is too slow, so I'm looking for a speedup.
Minimal reproducible example:
num = 1753251840
order = [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]]
patterns=set()
bits = format(num, '032b')
for perm in order:
    bitsn = [bits[perm[i]] for i in range(32)]
    patterns.add(int(''.join(bitsn),2))
print( min(patterns)==num)
Where can I start to improve this?
Extracting bits using strings is generally very inefficient (whatever the language), and the same applies to parsing. Moreover, for such a fast low-level operation, you need a JIT or a compiled language, as the comments already pointed out.
Here is a prototype using Numba's JIT (assuming all numbers are unsigned):
import numpy as np
from numba import njit
npOrder = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]], dtype=np.uint32)
@njit
def extractBits(num):
    # bits[i] holds the bit of weight 2**i (least significant first)
    bits = np.empty(32, dtype=np.int64)
    for i in range(32):
        bits[i] = (num >> i) & 0x01
    return bits

@njit
def permuteAndMerge(bits, perm):
    # Same convention as the string version: string index k is bit 31 - k,
    # so output string position i (bit 31 - i) takes input string index perm[i].
    bitsnFinal = 0
    for i in range(32):
        bitsnFinal |= bits[31 - perm[i]] << (31 - i)
    return bitsnFinal

@njit
def computeOptimized(num):
    bits = extractBits(num)
    permCount = npOrder.shape[0]
    patterns = np.empty(permCount, dtype=np.uint32)
    for i in range(permCount):
        patterns[i] = permuteAndMerge(bits, npOrder[i])
    # The array can be converted to a set if needed here with: set(patterns)
    return patterns.min() == num
This code is about 25 times faster than the original one on my machine (each version run 5,000,000 times).
You can also use Numba to accelerate and parallelize the loop that runs the function computeOptimized, resulting in a significant additional speed-up.
Note that this code could be made much faster still in C or C++ using low-level processor instructions (available, for example, on many x86_64 processors). With these and parallelism, the execution speed should be on the order of a billion permutations per second.
A couple of possible speed-ups, staying with Python and the current algorithm:
Bail out as soon as you find a pattern less than num; once one like that is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag whether an equal one was found, if that's not guaranteed by the problem.)
bitsn could be a generator expression, and doesn't need to be in a variable; you'll have to measure whether that's faster.
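A minimal sketch of both suggestions (early bail-out and an equality flag instead of a stored set), which also replaces the string formatting and parsing with integer shifts. It assumes 0 <= num < 2**32 and that every row of order lists the 32 positions; is_min_pattern is a hypothetical name, and whether it actually beats the original has to be measured.

```python
def is_min_pattern(num, order):
    """Return True if num equals the smallest of its bit-reordered variants.

    Same convention as the string version: character k of
    format(num, '032b') is the bit of weight 2**(31 - k).
    """
    # Extract each bit once, indexed like the characters of the bit string.
    bits = [(num >> (31 - k)) & 1 for k in range(32)]
    found_equal = False
    for perm in order:
        # Build the reordered value with integer shifts instead of strings.
        value = 0
        for p in perm:
            value = (value << 1) | bits[p]
        if value < num:
            return False  # bail out: num cannot be the minimum any more
        if value == num:
            found_equal = True
    return found_equal
```

Call it as is_min_pattern(num, order) with the question's num and order. The found_equal flag keeps the result identical to min(patterns) == num even when no permutation maps num to itself.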
More fundamental improvements:
If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm based on what the bits mean. A couple of hours of thought could potentially let you process just the 6000 lists, rather than all 2³² integers.
As others have written, if you're after pure speed, python is not the ideal language. That depends on the balance of how much time you want to spend on programming vs on running the program.
Side note:
Are the 32-bit integers signed or unsigned?
I have written this code:
rand_map, lst = [2, 2, 6, 6, 8, 11, 4], []
for i in range(len(rand_map)):
    num = rand_map[i]
    lst.append(num)
    for j in range(i+1, len(rand_map)):
        assembly = num + rand_map[j]
        num += rand_map[j]
        lst.append(assembly)
print(sorted(lst))
Which gives this output:
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
I've been trying to rewrite this code using list comprehensions, but I don't know how. I have tried multiple ways (standard and itertools) but I just can't get it right. I'll be very grateful for your help!
I came up with a couple of approaches for this problem:
Approach 1 - Vanilla list comprehension
In this approach, we iterate over two variables, i and j, and calculate the sum of the elements between these two indexes.
Code:
>>> rand_map = [2, 2, 6, 6, 8, 11, 4]
>>> sorted([sum(rand_map[i:i+j+1]) for i in range(len(rand_map)) for j in range(len(rand_map)-i)])
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
Approach 2 - Itertools
In this approach, we use the itertools recipe from here to iterate n-wise through the rand_map list, and calculate the sums accordingly. This works in approximately the same way as the first approach, but is a bit tidier.
Code:
from itertools import islice
def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))
print(sorted([sum(x) for n in range(len(rand_map)) for x in n_wise(rand_map, n+1)]))
Output:
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
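Another option that stays closer to the original loop is itertools.accumulate: for each starting index i, accumulate(rand_map[i:]) yields exactly the running sums that the inner loop appends, so a single generator expression reproduces the original output without re-summing each slice.

```python
from itertools import accumulate

rand_map = [2, 2, 6, 6, 8, 11, 4]
# accumulate(rand_map[i:]) yields rand_map[i], rand_map[i] + rand_map[i+1], ...
result = sorted(s for i in range(len(rand_map)) for s in accumulate(rand_map[i:]))
print(result)
# [2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
```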
For example, I want to split range(37) into n = 5 chunks, with each chunk having len(chunk) >= 4.
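>>> import random
>>> from itertools import islice
>>> def divide(lst, min_size, split_size):
...     it = iter(lst)
...     size = len(lst)
...     for i in range(split_size - 1, 0, -1):
...         # leave at least min_size items for each of the i remaining chunks
...         s = random.randint(min_size, size - min_size * i)
...         yield list(islice(it, 0, s))
...         size -= s
...     yield list(it)
...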
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3], [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], [23, 24, 25, 26, 27], [28, 29, 30, 31], [32, 33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22], [23, 24, 25, 26], [27, 28, 29, 30, 31], [32, 33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28], [29, 30, 31, 32], [33, 34, 35, 36]]
>>> list(divide(range(37), 4, 5))
[[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [17, 18, 19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36]]
For example, you could initially set the size of each of the n chunks to 4, which is possible if m >= 4n (here m = 37 >= 20), and compute the remainder r = m - 4n. Then add 1 to the first chunk and decrease r, add 1 to the second chunk and decrease r, and so on, cycling through the chunks until r = 0. Then you have your chunk sizes and you can fill the chunks.
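A minimal sketch of this "start at the minimum, then hand out the rest one by one" idea. It reads the remainder as r = m - n * min_size (the items left once every chunk has its minimum); chunk_sizes and fill_chunks are hypothetical names. Unlike the random approaches, the resulting sizes differ by at most one.

```python
def chunk_sizes(m, n, min_size=4):
    """Hypothetical helper: split m items into n sizes, each >= min_size."""
    if m < n * min_size:
        raise ValueError("m is too small for n chunks of min_size items")
    sizes = [min_size] * n      # start every chunk at the minimum size
    r = m - n * min_size        # items still to hand out
    i = 0
    while r > 0:                # give one extra item to each chunk in turn
        sizes[i % n] += 1
        r -= 1
        i += 1
    return sizes

def fill_chunks(items, sizes):
    """Hypothetical helper: cut items into consecutive slices of the given sizes."""
    items = list(items)
    chunks, start = [], 0
    for size in sizes:
        chunks.append(items[start:start + size])
        start += size
    return chunks

sizes = chunk_sizes(37, 5)      # [8, 8, 7, 7, 7]
chunks = fill_chunks(range(37), sizes)
```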
import math
import random

def divide(val, num=5, minSize=4):
    ''' Divides val into num chunks, each of size at least minSize.
    It limits the max size of a chunk using math.ceil(val/(num-len(chunks)))'''
    chunks = []
    for i in range(num-1):
        maxSize = math.ceil(val/(num-len(chunks)))
        newSize = random.randint(minSize, maxSize)
        val = val - newSize
        chunks.append(newSize)
    chunks.append(val)
    return chunks
Calling divide with different parameters:
>>> divide(37,5,4)
[7, 5, 4, 10, 11]
>>> divide(37,5,4)
[4, 5, 4, 10, 14]
>>> divide(50,6,5)
[6, 8, 8, 5, 9, 14]