What is the most Pythonic way to create a list of a given size N = l*k, where l is the number of different symbols (integers, for simplicity) and k is the length of each run of identical symbols, like this:
N=12, l=4, k=3
[ 0,0,0, 1,1,1, 2,2,2, 3,3,3 ]
or this, for example, with N=15, l=3, k=5:
[ 0,0,0,0,0, 1,1,1,1,1, 2,2,2,2,2 ]
This function will be called very often, so speed is desirable.
Using numpy you can do this:
In [23]: import numpy as np
In [26]: a=np.arange(3).repeat(5)
In [27]: a
Out[27]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
or with plain Python builtins:
In [29]: [l for l in range(3) for k in range(5)]
Out[29]: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
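The NumPy variant above can be wrapped in a small helper when an array (rather than a list) is acceptable. This is only a sketch: `repeated_labels` is a hypothetical name used for illustration, and `np.repeat(np.arange(l), k)` is the function form of the `arange(...).repeat(...)` call shown above.

```python
import numpy as np

def repeated_labels(l, k):
    """Return an array of length l*k: each of 0..l-1 repeated k times in a row."""
    return np.repeat(np.arange(l), k)

a = repeated_labels(4, 3)
print(a.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Call `.tolist()` on the result only if a real list is needed, since the conversion adds work on every call.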
>>> l=3
>>> k=5
>>> [e for i in range(l) for e in [i]*k]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
I like this lazy version (it returns an iterator, not a list, and you can generate values out of it when you want).
import itertools
l, k = 3, 5
itertools.chain.from_iterable(itertools.repeat(i, k) for i in range(l))
Wrapping it in list() shows the values it produces:
list(itertools.chain.from_iterable(itertools.repeat(i, k) for i in range(l)))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
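To illustrate what "lazy" means here, the sketch below pulls only part of the sequence and then resumes later. The helper name `lazy_blocks` is hypothetical, and `itertools.islice` is used only to take a prefix of the iterator.

```python
from itertools import chain, islice, repeat

def lazy_blocks(l, k):
    """Iterate over 0 k times, then 1 k times, ... up to l-1, without building a list."""
    return chain.from_iterable(repeat(i, k) for i in range(l))

it = lazy_blocks(3, 5)
first_seven = list(islice(it, 7))   # consume only the first seven values
rest = list(it)                     # the iterator resumes where it stopped
print(first_seven)  # [0, 0, 0, 0, 0, 1, 1]
print(rest)         # [1, 1, 1, 2, 2, 2, 2, 2]
```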
Perhaps with itertools.repeat and itertools.chain.from_iterable?
>>> from itertools import repeat, chain
>>> k = 3
>>> l = 4
>>> list(chain.from_iterable(repeat(x, k) for x in range(l)))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
>>> sum([[x]*3 for x in range(4)], [])
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
Putting it into a function:
def combine(l, k):
    return sum([[x]*k for x in range(l)], [])
import itertools
l = 3
k = 5
print(list(itertools.chain(*[[i] * k for i in range(l)])))
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
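Since the question asks about speed, one way to compare the approaches from this thread is a rough `timeit` harness like the one below. The sizes and repetition count are arbitrary assumptions, and results will vary by machine and Python version, so no timings are quoted here. Note that `sum(..., [])` builds a new list at every step, so it tends to scale poorly as l grows.

```python
import timeit
from itertools import chain, repeat

import numpy as np

l, k = 100, 50  # arbitrary sizes chosen for illustration

candidates = {
    "comprehension": lambda: [i for i in range(l) for _ in range(k)],
    "chain+repeat": lambda: list(chain.from_iterable(repeat(i, k) for i in range(l))),
    "sum of lists": lambda: sum([[i] * k for i in range(l)], []),
    "numpy repeat": lambda: np.arange(l).repeat(k),
}

# All candidates should agree on the values they produce.
expected = candidates["comprehension"]()
for name, fn in candidates.items():
    assert list(fn()) == expected, name

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=200)
    print(f"{name:>14}: {seconds:.4f} s for 200 calls")
```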
I have a very large numpy.array of integers, where each integer is in the range [0, 31].
I would like to count, for every pair of integers (a, b) in the range [0, 31] (e.g. [0, 1], [7, 9], [18, 0]) how often b occurs right after a.
This would give me a (32, 32) matrix of counts.
I'm looking for an efficient way to do this with numpy. Raw python loops would be too slow.
Here's one way...
To make the example easier to read, I'll use a maximum value of 9 instead of 31:
In [178]: maxval = 9
Make a random input for the example:
In [179]: np.random.seed(123)
In [180]: x = np.random.randint(0, maxval+1, size=100)
Create the result, initially all 0:
In [181]: counts = np.zeros((maxval+1, maxval+1), dtype=int)
Now add 1 to each coordinate pair, using numpy.add.at to ensure that duplicates are counted properly:
In [182]: np.add.at(counts, (x[:-1], x[1:]), 1)
In [183]: counts
Out[183]:
array([[2, 1, 1, 0, 1, 0, 1, 1, 1, 1],
       [2, 1, 1, 3, 0, 2, 1, 1, 1, 1],
       [0, 2, 1, 1, 4, 0, 2, 0, 0, 0],
       [1, 1, 1, 3, 3, 3, 0, 0, 1, 2],
       [1, 1, 0, 1, 1, 0, 2, 2, 2, 0],
       [1, 0, 0, 0, 0, 0, 1, 1, 0, 2],
       [0, 4, 2, 3, 1, 0, 2, 1, 0, 1],
       [0, 1, 1, 1, 0, 0, 2, 0, 0, 3],
       [1, 2, 0, 1, 0, 0, 1, 0, 0, 0],
       [2, 0, 2, 2, 0, 0, 2, 2, 0, 0]])
For example, the number of times 6 is followed by 1 is
In [188]: counts[6, 1]
Out[188]: 4
We can verify that with the following expression:
In [189]: ((x[:-1] == 6) & (x[1:] == 1)).sum()
Out[189]: 4
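As an alternative to `np.add.at`, the same matrix can be built by encoding each consecutive pair as a single integer and counting with `np.bincount`. This is a different technique from the one above, offered as a sketch; `pair_counts` and `n_values` are hypothetical names. The cast to a wide integer type is there so that small dtypes such as uint8 do not overflow in the multiplication.

```python
import numpy as np

def pair_counts(x, n_values):
    """counts[a, b] = number of positions i with x[i] == a and x[i+1] == b."""
    x = np.asarray(x).astype(np.intp)      # avoid overflow for narrow integer dtypes
    flat = x[:-1] * n_values + x[1:]       # encode each (a, b) pair as one integer
    counts = np.bincount(flat, minlength=n_values * n_values)
    return counts.reshape(n_values, n_values)

x = np.array([0, 1, 1, 2, 0, 1])
counts = pair_counts(x, 3)
print(counts[0, 1])  # 2: the pair (0, 1) occurs at positions 0 and 4
```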
You can use numpy's built-in diff routine together with boolean arrays.
import numpy as np
test_array = np.array([1, 2, 3, 1, 2, 4, 5, 1, 2, 6, 7])
a, b = (1, 2)
np.sum(np.bitwise_and(test_array[:-1] == a, np.diff(test_array) == b - a))
# 3
If your array is multi-dimensional, you will need to flatten it first or make some small modifications to the code above.
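A sketch of the flattening step mentioned above, for a single pair (a, b). It compares shifted slices directly instead of using `np.diff`, which gives the same count for integer data; `count_pair` is a hypothetical helper name. Be aware that flattening also creates pairs that straddle row boundaries.

```python
import numpy as np

def count_pair(arr, a, b):
    """Count how often b directly follows a in the flattened (row-major) array."""
    flat = np.ravel(arr)
    return int(np.count_nonzero((flat[:-1] == a) & (flat[1:] == b)))

grid = np.array([[1, 2, 3, 1],
                 [2, 4, 5, 1]])
print(count_pair(grid, 1, 2))  # 2: once inside row 0, once across the row boundary
```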
I have a numpy array [0, 1, 1, 2, 2, 0, 1, ...] that contains only the numbers 0 to k. I would like to create a new array containing the n arrays obtained by permuting the labels 0 to k. A small example with k=2 and n=6:
a = [0, 1, 0, 2]
permute(a)
result = [[0, 1, 0, 2]
[0, 2, 0, 1]
[1, 0, 1, 2]
[2, 1, 2, 0]
[1, 2, 1, 0]
[2, 0, 2, 1]]
Does anyone have any ideas/solutions as to how one could achieve this?
Your a is what combinatorists call a multiset. The sympy library has various routines for working with them.
>>> from sympy.utilities.iterables import multiset_permutations
>>> import numpy as np
>>> a = np.array([0, 1, 0, 2])
>>> for p in multiset_permutations(a):
...     p
...
[0, 0, 1, 2]
[0, 0, 2, 1]
[0, 1, 0, 2]
[0, 1, 2, 0]
[0, 2, 0, 1]
[0, 2, 1, 0]
[1, 0, 0, 2]
[1, 0, 2, 0]
[1, 2, 0, 0]
[2, 0, 0, 1]
[2, 0, 1, 0]
[2, 1, 0, 0]
If your permutations fit in memory, you could store them in a set and thus keep only the distinguishable permutations.
from itertools import permutations
a = [0, 1, 0, 2]
perms = set(permutations(a))
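The example in the question appears to permute the labels 0..k rather than the positions of the elements. If that reading is right (it is an assumption), one possible sketch is to enumerate every relabeling with `itertools.permutations` and apply them all at once with NumPy indexing. The name `relabelings` is made up for illustration.

```python
from itertools import permutations

import numpy as np

a = np.array([0, 1, 0, 2])
k = 2  # labels run from 0 to k

# Each row of `relabelings` is one permutation of the labels 0..k.
relabelings = np.array(list(permutations(range(k + 1))))

# Integer-array indexing applies every relabeling to `a` at once:
# result[j, i] == relabelings[j, a[i]]
result = relabelings[:, a]
print(result.shape)  # (6, 4)
print(result)
```

The number of rows is (k+1)!, so this is only practical for small k.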
I have an MxN array. I want to zero out all the values after an element in a row is zero or less.
For example the 2x12 array
111110011111
112321341411
should turn into
111110000000
112321341411
Thanks!
It may not be the most efficient method, but I've used np.cumsum for these types of things.
>>> import numpy as np
>>> dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
...                 [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
>>> dat[np.cumsum(dat <= 0, 1, dtype='bool')] = 0
>>> dat
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])
Jaime just pointed out that np.logical_or.accumulate(dat <= 0, axis=1) is probably better than np.cumsum.
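A minimal sketch of the `np.logical_or.accumulate` variant, using the same data. Here the result is written to a new array with `np.where` rather than assigned in place; that choice is an assumption about what is convenient, not part of the original suggestion.

```python
import numpy as np

dat = np.array([[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1],
                [1, 1, 2, 3, 2, 1, 3, 4, 1, 4, 1, 1]])

# True from the first non-positive entry of each row onward.
mask = np.logical_or.accumulate(dat <= 0, axis=1)

# Build a new array instead of modifying `dat` in place.
out = np.where(mask, 0, dat)
print(out[0].tolist())  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```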
Maybe you or someone else needs an alternative solution that doesn't use numpy.
>>> dat = ['111110011111','112321341411','000000000000', '123456789120']
>>> def zero(dat):
...     result = []
...     for row in dat:
...         pos = row.find('0')
...         if pos != -1:  # find() returns -1 when there is no '0'; position 0 is valid
...             result.append(row[:pos] + '0' * (len(row) - pos))
...         else:
...             result.append(row)
...     return result
...
>>> res = zero(dat)
>>> res
['111110000000', '112321341411', '000000000000', '123456789120']
>>> dat
['111110011111', '112321341411', '000000000000', '123456789120']
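The string version above only looks for the character '0'. For rows stored as lists of integers, where "zero or less" also covers negative values, a pure-Python sketch might look like this; `zero_after_nonpositive` is a hypothetical name.

```python
def zero_after_nonpositive(rows):
    """Return new rows where everything from the first value <= 0 onward is 0."""
    result = []
    for row in rows:
        # Index of the first non-positive value, or len(row) if there is none.
        cut = next((i for i, v in enumerate(row) if v <= 0), len(row))
        result.append(list(row[:cut]) + [0] * (len(row) - cut))
    return result

rows = [[1, 1, 0, 1], [1, 2, 3, 4], [-1, 5, 5, 5]]
print(zero_after_nonpositive(rows))
# [[1, 1, 0, 0], [1, 2, 3, 4], [0, 0, 0, 0]]
```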
I know if I want to create a list like this:
[0 1 2 0 1 2 0 1 2 0 1 2]
I can use this command:
range(3) * 4
Is there a similar way to create a list like this:
[0 0 0 0 1 1 1 1 2 2 2 2]
I mean a way without loops
Integer division can help:
[x/4 for x in range(12)]
Same thing through map:
map(lambda x: x/4, range(12))
In Python 3, integer division is done with //; a plain / returns a float.
Beware that multiplication of a list will likely lead to a result you probably don't expect.
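One common reading of this warning (an assumption about what was meant) is that list multiplication copies references, not objects. That is harmless for integers but surprising for nested mutable lists, as this short sketch shows:

```python
flat = [0] * 4          # fine: ints are immutable
rows = [[]] * 3         # three references to the SAME inner list
rows[0].append("x")
print(rows)             # [['x'], ['x'], ['x']]

independent = [[] for _ in range(3)]   # three distinct inner lists
independent[0].append("x")
print(independent)      # [['x'], [], []]
```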
Yes, you can.
>>> [e for e in range(3) for _ in [0]*4]
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
The itertools module is always an option:
>>> from itertools import chain, repeat
>>> list(chain(repeat(0, 4), repeat(1, 4), repeat(2, 4)))
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
A more general way is:
def done(group_count, repeat_count):
    return list(chain(*map(lambda i: repeat(i, repeat_count),
                           range(group_count))))
>>> done(3, 4)
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
Without any explicit "for" :)
>>> list(chain(*zip(*([range(5)] * 5))))
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
What about this:
>>> sum([ [x]*4 for x in range(5)],[])
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
>>>
or
>>> reduce(lambda x,y: x+y, [ [x]*4 for x in range(5)])
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
>>>
If you can't use a loop in your current method, why not create one in another?
range(0,1)*4 + range(1,2)*4 + range(2,3)*4
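Several snippets in this thread rely on Python 2 behaviour. On Python 3, `range` returns a lazy object that supports neither `*` nor `+`, `/` produces floats, and `reduce` lives in `functools`. A sketch of the adjusted forms (the variable names are made up for illustration):

```python
from functools import reduce  # reduce is no longer a builtin on Python 3

cycled = list(range(3)) * 4                 # wrap range in list() before multiplying
blocked = [x // 4 for x in range(12)]       # floor division keeps the values as ints
via_reduce = reduce(lambda x, y: x + y, [[x] * 4 for x in range(3)])
pieces = list(range(0, 1)) * 4 + list(range(1, 2)) * 4 + list(range(2, 3)) * 4

print(cycled)   # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
print(blocked)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```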
Let's say I have a list of 5 elements x=[a,b,c,d,e] and I want to run a for loop that prints all lists where two of the entries are 1 less than the corresponding entries in the original list.
What is a simple way to do this in Python? Thanks in advance.
Edit: if x=[4,5,6,7,8] I want:
[3,4,6,7,8], [3,5,5,7,8], [3,5,6,6,8] etc.
Something like this:
>>> from itertools import combinations
>>> lis = [0,1,2,3,4]
>>> for x,y in combinations(range(len(lis)),2):
...     l = lis[:]
...     l[x] -= 1
...     l[y] -= 1
...     print(l)
...
[-1, 0, 2, 3, 4]
[-1, 1, 1, 3, 4]
[-1, 1, 2, 2, 4]
[-1, 1, 2, 3, 3]
[0, 0, 1, 3, 4]
[0, 0, 2, 2, 4]
[0, 0, 2, 3, 3]
[0, 1, 1, 2, 4]
[0, 1, 1, 3, 3]
[0, 1, 2, 2, 3]
Shorter version:
>>> for x,y in combinations(range(len(lis)),2):
...     print([item - 1 if i in (x,y) else item for i,item in enumerate(lis)])
...
[-1, 0, 2, 3, 4]
[-1, 1, 1, 3, 4]
[-1, 1, 2, 2, 4]
[-1, 1, 2, 3, 3]
[0, 0, 1, 3, 4]
[0, 0, 2, 2, 4]
[0, 0, 2, 3, 3]
[0, 1, 1, 2, 4]
[0, 1, 1, 3, 3]
[0, 1, 2, 2, 3]
from itertools import combinations
a = [1,2,3,4]
for combination in combinations(range(len(a)), r=2):
    print([c - (1 if i in combination else 0) for i, c in enumerate(a)])
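The same idea can be packaged as a generator that yields the lists instead of printing them, with the number of decremented entries as a parameter. This is a sketch; `decrement_choices` and its `r` parameter are hypothetical names, shown here with the list from the question's edit.

```python
from itertools import combinations

def decrement_choices(values, r=2):
    """Yield copies of `values` with r of the entries lowered by 1."""
    for chosen in combinations(range(len(values)), r):
        yield [v - 1 if i in chosen else v for i, v in enumerate(values)]

for variant in decrement_choices([4, 5, 6, 7, 8]):
    print(variant)
# the first three lines printed are:
# [3, 4, 6, 7, 8]
# [3, 5, 5, 7, 8]
# [3, 5, 6, 6, 8]
```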