Generate arrays with fixed number of non-zero elements [duplicate] - python

This question already has answers here:
Generate all binary strings of length n with k bits set
(14 answers)
Closed 15 days ago.
I have a question about how to generate all possible combinations of an array that satisfy the following conditions:
it has a fixed length N
exactly M of the N elements are 1, and the rest are 0.
For example, N = 4 and M = 2, we shall have
1: '0011'
2: '0101'
3: '0110'
4: '1001'
5: '1010'
6: '1100'
The basic idea is to pick M indices from range(N) and set the corresponding entries of np.zeros(N) to 1. However, I can only realize this idea with several nested for-loops in Python, which is inefficient. I'd like to ask whether there is any more straightforward solution. Thanks in advance.

One way is to use itertools to get all possible combinations of the positions of the ones, then fill arrays of N zeros at those positions.
import itertools
import numpy as np

N = 4
M = 2
combs = list(itertools.combinations(range(N), M))
result = [np.zeros(N) for _ in range(len(combs))]
for i, comb in enumerate(combs):
    for j in comb:
        result[i][j] = 1
print(result)
[array([1., 1., 0., 0.]), array([1., 0., 1., 0.]), array([1., 0., 0., 1.]), array([0., 1., 1., 0.]), array([0., 1., 0., 1.]), array([0., 0., 1., 1.])]
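If you'd rather end up with a single 2-D array than a list of 1-D arrays, here is a sketch of the same idea that places all the ones with one fancy-indexing assignment (same N and M as above):
import itertools
import numpy as np

N = 4
M = 2
combs = np.array(list(itertools.combinations(range(N), M)))
result = np.zeros((len(combs), N), dtype=int)
# row k gets ones at the positions listed in combs[k]
result[np.arange(len(combs))[:, None], combs] = 1
print(result)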

Generate a list with as many 1s and 0s as you need, then take its unique permutations:
from itertools import permutations

n = 4
m = 2
item = ["1"] * m + ["0"] * (n - m)
# permutations() treats repeated items as distinct, so deduplicate with set()
for x in sorted(set(permutations(item))):
    print("".join(x))

Related

How to efficiently filter maximum elements of a matrix per row

Given a 2D array, I'm looking for a pythonic way to get an array of the same shape, but with only the maximum element of each row kept. See the max_row_filter function below:
def max_row_filter(mat2d):
    m = np.zeros(mat2d.shape)
    for r in range(mat2d.shape[0]):
        c = np.argmax(mat2d[r])
        m[r, c] = mat2d[r, c]
    return m

p = np.array([[1, 2, 3], [5, 4, 3], [9, 10, 3]])
max_row_filter(p)
Out: array([[ 0.,  0.,  3.],
            [ 5.,  0.,  0.],
            [ 0., 10.,  0.]])
I'm looking for an efficient way to do this, suitable to be done on big arrays.
Alternative answer (this one keeps duplicate maxima within a row):
p * (p==p.max(axis=1, keepdims=True))
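For example, with the matrix from the question:
import numpy as np

p = np.array([[1, 2, 3], [5, 4, 3], [9, 10, 3]])
mask = p == p.max(axis=1, keepdims=True)  # True at every row maximum
print(p * mask)                           # everything else becomes 0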
If there are no duplicates, you could use numpy.argmax:
import numpy as np

p = np.array([[1, 2, 3],
              [5, 4, 3],
              [9, 10, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
Output
[[ 0  0  3]
 [ 5  0  0]
 [ 0 10  0]]
Note that, for multiple occurrences, argmax returns the first occurrence.
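If you prefer to avoid the zip/enumerate dance, here is a sketch of the same indexing that builds the row indices with np.arange:
import numpy as np

p = np.array([[1, 2, 3], [5, 4, 3], [9, 10, 3]])
result = np.zeros_like(p)
rows = np.arange(p.shape[0])  # one row index per row
cols = np.argmax(p, axis=1)   # column of each row's maximum
result[rows, cols] = p[rows, cols]
print(result)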

Assigning values to overwrite python array

I am trying to assign new values to an array based on whether or not the stored value is < 3. Coming from an R background, this is how I would do it, but it gives me a syntax error in Python. What am I doing wrong, and what is the Python approach?
eurx=[1,2,3,4,5,6,7,'a',8]
sma50=3
tw=eurx
tw[eurx<sma50]=-1
tw[eurx>=sma50]=1
tw[(tw!=1)||(tw!=-1)]=0
print(tw)
GOAL:
-1
-1
1
1
1
1
1
0
1
This is "too much R". A pythonic way would be to use functional filtering:
>>> list(map(lambda i: -2*int(i<sma50)+1 if type(i) == int else 0, eurx))
[-1, -1, 1, 1, 1, 1, 1, 0, 1]
Or just a simple for-loop with a few ifs:
>>> for i in eurx:
...     if type(i) != int:
...         print(0)
...     else:
...         print(-2*int(i<sma50)+1)
...
-1
-1
1
1
1
1
1
0
1
In general: don't try to guess the syntax. It's very simple, just read through some tutorials (e.g. https://docs.python.org/3/tutorial/introduction.html#first-steps-towards-programming)
Edit: the int conversion hack works as follows: you know you can convert bool to int, right?
>>> int(True)
1
>>> int(False)
0
If i < sma50 evaluates to True, int(i < sma50) will be 1. So your numbers are now converted to ones if i is smaller than sma50 and to zeros otherwise. But apparently you want the values (-1, 1) instead of (1, 0). Just apply the transform -2x + 1 and you're done!
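A tiny demonstration of the transform:
sma50 = 3
for i in (1, 5):
    x = int(i < sma50)    # 1 if below the threshold, else 0
    print(i, -2 * x + 1)  # maps 1 -> -1 and 0 -> 1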
Your desired syntax is pretty close to what you'd write in numpy.
The heterogeneous list doesn't make it easy, but here's an example:
>>> import numpy as np
>>> eurx=[1,2,3,4,5,6,7,'a',8]
>>> sma50 = 3
>>> tw = np.array([i if isinstance(i, int) else np.nan for i in eurx])
>>> tw
array([ 1., 2., 3., 4., 5., 6., 7., nan, 8.])
>>> tw[tw < sma50] = -1
__main__:1: RuntimeWarning: invalid value encountered in less
>>> tw[tw >= sma50] = 1
__main__:1: RuntimeWarning: invalid value encountered in greater_equal
>>> tw
array([ -1., -1., 1., 1., 1., 1., 1., nan, 1.])
>>> tw[np.isnan(tw)] = 0
>>> tw
array([-1., -1., 1., 1., 1., 1., 1., 0., 1.])

Change multiple items in a list [duplicate]

This question already has answers here:
How does assignment work with list slices?
(5 answers)
Closed 5 years ago.
I want to change multiple values in a list at once using slicing, for example every second element.
my logic is:
list = [0] * 10
list[::2] = 1
However, I get an error message:
"must assign iterable to extended slice"
Can someone explain the error, and also the correct way to perform something like this? Thanks.
When you assign to an extended slice of a list, the assigned value must be an iterable with the same length as the slice. For your example, assign a list of 5 ones:
l = [0] * 10
l[::2] = [1] * 5
It isn't obvious why in this example, but if you think about it, you were effectively doing:
l[3:6] = 2
Obviously that doesn't make sense: you are trying to assign an int where a list is expected. l[::2] is just another way to slice a list, so you must assign a list to it.
In the future, don't name your lists list, because doing so shadows the built-in list type.
my_list[::2] has 10//2 (=5) elements, so the right part of the assignment should have 10//2 elements as well:
>>> my_list = [0] * 10
>>> my_list[::2] = [1]*(10//2)
>>> my_list
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
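If you don't want to compute the length by hand, the slice itself can supply it:
>>> my_list = [0] * 10
>>> my_list[::2] = [1] * len(my_list[::2])
>>> my_list
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]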
Or you could use numpy with broadcasting:
>>> import numpy as np
>>> a = np.zeros(10)
>>> a
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> a[::2] = 1
>>> a
array([ 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])

How to get unique rows and their occurrences for 2D array?

I have a 2D array, and it has some duplicate columns. I would like to be able to see which unique columns there are, and where the duplicates are.
My own array is too large to put here, but here is an example:
a = np.array([[1., 0., 0., 0., 0.], [2., 0., 4., 3., 0.]])
This has the unique column vectors [1.,2.], [0.,0.], [0.,4.] and [0.,3.]. There is one duplicate: [0.,0.] appears twice.
Now I found a way to get the unique vectors and their indices here, but it is not clear to me how I would get the occurrences of the duplicates as well. I have tried several naive approaches (with np.where and list comprehensions), but those are all very slow. Surely there has to be a numpythonic way?
In MATLAB it's just the unique function, but np.unique flattens arrays.
Here's a vectorized approach to give us a list of arrays as output -
# encode each column of a as a single integer
ids = np.ravel_multi_index(a.astype(int), a.max(1).astype(int) + 1)
sidx = ids.argsort()
sorted_ids = ids[sidx]
# split the sorted column indices wherever the encoded value changes
out = np.split(sidx, np.nonzero(sorted_ids[1:] > sorted_ids[:-1])[0] + 1)
Sample run -
In [62]: a
Out[62]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 2.,  0.,  4.,  3.,  0.]])

In [63]: out
Out[63]: [array([1, 4]), array([3]), array([2]), array([0])]
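As a side note, on newer NumPy versions (1.13+) np.unique accepts an axis argument, so a simpler sketch gives the unique columns and their multiplicities directly (though not the indices of the duplicates):
import numpy as np

a = np.array([[1., 0., 0., 0., 0.],
              [2., 0., 4., 3., 0.]])
uniq_cols, counts = np.unique(a, axis=1, return_counts=True)
print(uniq_cols)  # unique columns, sorted
print(counts)     # how many times each unique column occurs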
The numpy_indexed package (disclaimer: I am its author) contains efficient functionality for computing these kind of things:
import numpy_indexed as npi
unique_columns = npi.unique(a, axis=1)
non_unique_column_idx = npi.multiplicity(a, axis=1) > 1
Or alternatively:
unique_columns, column_count = npi.count(a, axis=1)
duplicate_columns = unique_columns[:, column_count > 1]
For small arrays:
from collections import defaultdict

indices = defaultdict(list)
for index, column in enumerate(a.transpose()):
    indices[tuple(column)].append(index)
unique = [kk for kk, vv in indices.items() if len(vv) == 1]
non_unique = {kk: vv for kk, vv in indices.items() if len(vv) != 1}

Python: convert numpy array of signs to int and back

I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
def sign2int(s):
    s[s == -1.] = 0.
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))
    return int(bstr, 2)

def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s == 0.] = -1.
    return s
Then
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print(i)
11
>>> s = int2sign(i, m)
>>> print(s)
[ 1. -1.  1.  1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array, too---e.g.,
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print(sign2int(s))
[5, 7]
For 1d arrays you can use this one-line Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for 2d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), (8-y, 0), 'constant')[-x:])
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1.,  1.],
       [ 1.,  1.,  1.]])
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
>>>
Then you can simply create an array of powers of two, multiply it by the binary and sum.
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
>>>
Then make it operate on rows by adding axis information and rely on broadcasting.
def sign2int(a):
    # powers of two
    powers = np.arange(a.shape[-1])[::-1]
    np.power(2, powers, powers)
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis=-1)
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
>>>
I tried it on a 4 by 100 bit array and it seemed fast
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.]])
>>> sign2int(b)
array([  2.58224988e+120,   2.58224988e+120,   2.58224988e+120,
         2.58224988e+120,   2.58224988e+120])
>>>
I'll add the reverse if I can figure it out. The best I could do relies on some plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time - but the time still seems acceptable.
def foo(n):
    '''Yield bits in increasing powers of two
    (bit sequence from lsb --> msb).'''
    while n > 0:
        n, r = divmod(n, 2)
        yield r

def int2sign(n):
    n = int(n)
    a = np.fromiter(foo(n), dtype=np.int8, count=n.bit_length())
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
1014
>>> a = int2sign(n)
>>> a.shape
(1014,)
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
True
>>>
Here are some vectorized versions of your functions:
def sign2int(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def int2sign(i, m):
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)

s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1,  1, -1,  1,  1])
To use your functions on 2-d arrays, you can use the map function (wrapped in list() for Python 3):
s = np.array([[1., -1., 1.], [1., 1., 1.]])
list(map(sign2int, s))
# [5, 7]
list(map(lambda x: int2sign(x, 4), [5, 7]))
# [array([-1,  1, -1,  1]), array([-1,  1,  1,  1])]
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for-loop over the exponents with a standard conversion algorithm, which runs at most 64 iterations for 64-bit integers. Numpy's broadcasting then works across all the integers very efficiently.
packbits and unpackbits are restricted to 8-bit integers; otherwise, I suspect that would've been the best (though I didn't try).
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)

def sign2int_str(s):
    return np.array(list(map(_sign2int_str, s)))

def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0),
                  "constant", constant_values=-1)

def int2sign_str(i, m):
    return np.array(list(map(lambda x: _int2sign_str(x, m),
                             i.astype(int).tolist()))).transpose()

def sign2int_np(s):
    p = np.power(2, np.arange(s.shape[-1])[::-1])  # powers of two, msb first
    b = (s + 1) / 2                                # map -1 -> 0 and 1 -> 1
    return np.sum(b * p, axis=-1).astype(int)

def int2sign_np(i, m):
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = 2 ** (m - 1 - k)
        S[k, :] = (i.astype(int) // b).astype(float)  # k-th bit of each integer
        i = np.mod(i, b)
    S[S == 0.] = -1.
    return S
And here is my test:
import time
import numpy as np

X = np.sign(np.random.normal(size=(5000, 20)))
N = 100

t = time.time()
for i in range(N):
    S = sign2int_np(X)
print('sign2int_np: \t{:10.8f} sec'.format((time.time() - t) / N))

t = time.time()
for i in range(N):
    S = sign2int_str(X)
print('sign2int_str: \t{:10.8f} sec'.format((time.time() - t) / N))

m = 20
S = np.random.randint(0, high=np.power(2, m), size=(5000,))

t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print('int2sign_np: \t{:10.8f} sec'.format((time.time() - t) / N))

t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print('int2sign_str: \t{:10.8f} sec'.format((time.time() - t) / N))
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays so you'll need to reshape after unpackbits if you have a multi-dimensional array.
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
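A minimal round-trip sketch of that suggestion for a 2-d sign array, assuming you keep the original shape around for the reshape:
import numpy as np

s = np.array([[1., -1., 1.], [1., 1., 1.]])
packed = np.packbits(s > 0)            # flattens and zero-pads to whole bytes
bits = np.unpackbits(packed)[:s.size]  # drop the padding bits
restored = bits.reshape(s.shape).astype(float) * 2 - 1
print(restored)  # [[ 1. -1.  1.], [ 1.  1.  1.]]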
