Ignore dimension when using np.einsum - python

I use np.einsum to calculate the flow of material in a graph (1 node to 4 nodes in this example). The amount of flow is given by amount, with amount.shape == (1, 1, 2); the dimensions correspond to certain criteria, let's call them a, b, c.
The boolean matrix route determines the permissible flow into y based on the a, b, c criteria (route.shape == (4, 1, 1, 2); I label the dimensions y, a, b, c). The a, b, c dimensions are equivalent to amount's dimensions a, b, c, and y is the direction of the flow (0, 1, 2 or 3). To determine the amount of material in y, I calculate np.einsum('abc,yabc->y', amount, route) and get a y-dim vector with the flows into y. There's also an implicit prioritisation of the routes: wherever route[0, ...] is True, the corresponding entries for y=1..3 are False; wherever route[1, ...] is True, the entries for the higher y-indices are False, and so on. route[3, ...] (last y-index) defines the catch-all route, that is, its values are True where all previous y-index values were False ((route[0] ^ route[1] ^ route[2] ^ route[3]).all() == True).
This works fine. However, when I introduce another criterion (dimension) x which only exists in route, but not in amount, this logic seems to break. The code below demonstrates the problem:
>>> import numpy as np
>>> amount = np.asarray([[[5000.0, 0.0]]])
>>> route = np.asarray([[[[[False, True]]], [[[False, True]]], [[[False, True]]]], [[[[True, False]]], [[[False, False]]], [[[False, False]]]], [[[[False, False]]], [[[True, False]]], [[[False, False]]]], [[[[False, False]]], [[[False, False]]], [[[True, False]]]]], dtype=bool)
>>> amount.shape
(1, 1, 2)
>>> # Added dimension `x`
>>> # y,x,a,b,c
>>> route.shape
(4, 3, 1, 1, 2)
>>> # Attempt 1: `5000` can flow into y=1, 2 or 3. I expect
>>> # `flows1.sum() == amount.sum()` as it would be without `x`.
>>> # Correct solution would be `[0, 5000, 0, 0]` because material is routed
>>> # to y=1, and is not available for y=2 and y=3 as they are lower
>>> # priority (higher index)
>>> flows1 = np.einsum('abc,yxabc->y', amount, route)
>>> flows1
array([ 0., 5000., 5000., 5000.])
>>> # Attempt 2: try to collapse `x` => not much different, duplication
>>> np.einsum('abc,yabc->y', amount, route.any(1))
array([ 0., 5000., 5000., 5000.])
>>> # This is the flow by `y` and `x`. I'd only expect a `5000` in the
>>> # 2nd row (`[5000., 0., 0.]`) not the others.
>>> np.einsum('abc,yxabc->yx', amount, route)
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
Is there any feasible operation which I can apply to route (.all(1) doesn't work either) to ignore the x-dimension?
Another example:
>>> amount2 = np.asarray([[[5000.0, 1000.0]]])
>>> np.einsum('abc,yabc->y', amount2, route.any(1))
array([1000., 5000., 5000., 5000.])
This can be interpreted as 1000.0 being routed to y=0 (and none of the other y-destinations) and 5000.0 being compatible with destinations y=1, y=2 and y=3. Ideally, though, I'd like 5000.0 to show up only in y=1 (as that's the lowest index and therefore the highest-priority destination).
Solution attempt
The code below works, but is not very numpy-ish. It would be great if the loop could be eliminated.
# Initialise destination
result = np.zeros((route.shape[0]))
# Calculate flow by maintaining all dimensions (this will cause
# double ups because `x` is not part of `amount2`)
temp = np.einsum('abc,yxabc->yxabc', amount2, route)
temp_ixs = np.asarray(np.where(temp))
# For each original amount, find the destination (`y`)
for a, b, c in zip(*np.where(amount2)):
    # Find where dimensions `abc` are equal in the destination.
    # Take the first vector which contains `yxabc` (we get `yx` as result)
    ix = np.where((temp_ixs[2:].T == [a, b, c]).all(axis=1))[0][0]
    y_ix = temp_ixs.T[ix][0]
    # ignored
    x_ix = temp_ixs.T[ix][1]
    v = amount2[a, b, c]
    # build resulting destination
    result[y_ix] += v
# result == array([1000., 5000., 0., 0.])
In other words, for each value in amount2, I am looking for the lowest indices yx in temp so that the value can be written to result[y] = value (x is ignored).
>>> temp = np.einsum('abc,yxabc->yx', amount2, route)
>>> temp
# +--- value=1000 at y=0 => result[0] += 1000
# /
array([[1000., 1000., 1000.],
# +--- value=5000 at y=1 => result[1] += 5000
# /
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
>>> result
array([1000., 5000., 0., 0.])
>>> amount2
array([[[5000., 1000.]]])
Another attempt to reduce the dimensionality of route is:
>>> r = route.any(1)
>>> for x in xrange(1, route.shape[0]):
...     r[x] = r[x] & (r[:x] == False).all(axis=0)
...
>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000., 0., 0.])
This essentially preserves the above-mentioned priority given by the first dimension of route. Any lower-priority (higher index) array cannot contain a True value when a higher-priority array already has a value of True at that sub-index. While this is a lot better than my explicit approach, it would be great if the for x in xrange... loop could be expressed as a numpy vector operation.
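One way the priority loop might be vectorised is with a cumulative sum along y: an entry is the first permitted destination exactly when it is True and the running count of True values up to and including it equals 1. This is only a sketch, assuming (as stated above) that a lower y index always has priority; the name first is introduced here for illustration.

```python
import numpy as np

amount2 = np.asarray([[[5000.0, 1000.0]]])

# Same route as in the question, built by assignment for readability
route = np.zeros((4, 3, 1, 1, 2), dtype=bool)   # y, x, a, b, c
route[0, :, 0, 0, 1] = True
route[1, 0, 0, 0, 0] = True
route[2, 1, 0, 0, 0] = True
route[3, 2, 0, 0, 0] = True

# Collapse x, then keep only the first True along y for every (a, b, c)
r = route.any(axis=1)
first = r & (np.cumsum(r, axis=0) == 1)

flows = np.einsum('abc,yabc->y', amount2, first)
print(flows)
```

With the data above this should reproduce the loop's result, with 1000 routed to y=0 and 5000 routed to y=1.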

I haven't tried to follow your 'flow' interpretation of the multiplication problem. I'm just focusing on the calculation options.
Stripped of unnecessary dimensions, your arrays are:
In [194]: amount
Out[194]: array([5000., 0.])
In [195]: route
Out[195]:
array([[[0, 1],
[0, 1],
[0, 1]],
[[1, 0],
[0, 0],
[0, 0]],
[[0, 0],
[1, 0],
[0, 0]],
[[0, 0],
[0, 0],
[1, 0]]])
And the yx calculation is:
In [197]: np.einsum('a,yxa->yx',amount, route)
Out[197]:
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
which is just this slice of route times 5000.
In [198]: route[:,:,0]
Out[198]:
array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
Omitting the x on the RHS of the einsum results in summation across that dimension.
Equivalently we can multiply (with broadcasting):
In [200]: (amount*route).sum(axis=2)
Out[200]:
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
In [201]: (amount*route).sum(axis=(1,2))
Out[201]: array([ 0., 5000., 5000., 5000.])
Maybe looking at amount*route will help visualize the problem. You can also use max, min, argmax etc instead of sum, or along with it on one or more of the axes.
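As a sketch of the argmax suggestion, assuming every a has at least one permitted y (the question's catch-all route would guarantee this): argmax on a boolean array returns the index of the first True, so it can pick the highest-priority destination directly, and np.bincount with weights then accumulates the amounts per destination. The names allowed and dest are illustrative only.

```python
import numpy as np

# Stripped-down arrays as above: amount has shape (a,), route has shape (y, x, a)
amount = np.array([5000.0, 0.0])
route = np.array([[[0, 1], [0, 1], [0, 1]],
                  [[1, 0], [0, 0], [0, 0]],
                  [[0, 0], [1, 0], [0, 0]],
                  [[0, 0], [0, 0], [1, 0]]], dtype=bool)

allowed = route.any(axis=1)       # shape (y, a): is y permitted for a, for any x?
dest = allowed.argmax(axis=0)     # first (highest-priority) permitted y per a
flows = np.bincount(dest, weights=amount, minlength=route.shape[0])
print(flows)
```

Note that argmax also returns 0 when no entry is True, so this relies on the catch-all assumption.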

Related

How to efficiently filter maximum elements of a matrix per row

Given a 2D array, I'm looking for a pythonic way to get an array of same shape, with only the maximum element per each row.
See max_row_filter function below
def max_row_filter(mat2d):
    m = np.zeros(mat2d.shape)
    for r in range(mat2d.shape[0]):
        c = np.argmax(mat2d[r])
        m[r, c] = mat2d[r, c]
    return m
p = np.array([[1,2,3],[5,4,3,],[9,10,3]])
max_row_filter(p)
Out: array([[ 0., 0., 3.],
[ 5., 0., 0.],
[ 0., 10., 0.]])
I'm looking for an efficient way to do this, suitable to be done on big arrays.
Alternative answer (this will keep duplicates):
p * (p==p.max(axis=1, keepdims=True))
If there are no duplicates, you could use numpy.argmax:
import numpy as np
p = np.array([[1, 2, 3],
[5, 4, 3, ],
[9, 10, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
Output
[[ 0 0 3]
[ 5 0 0]
[ 0 10 0]]
Note that, for multiple occurrences, argmax returns the first occurrence.
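A slightly shorter variant of the argmax approach, as a sketch: the row indices can come from np.arange instead of zip and enumerate, which keeps the whole operation inside NumPy.

```python
import numpy as np

p = np.array([[1, 2, 3],
              [5, 4, 3],
              [9, 10, 3]])

rows = np.arange(p.shape[0])      # one row index per row
cols = p.argmax(axis=1)           # column of the (first) maximum in each row
result = np.zeros_like(p)
result[rows, cols] = p[rows, cols]
print(result)
```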

Python : Mapping values to other values without gap

I have the following question. Is there some kind of method in numpy or scipy which I can use to turn a given unsorted array like this
a = np.array([0,0,1,1,4,4,4,4,5,1891,7]) #could be any number here
into something where the numbers are mapped so that there is no gap between the values and they keep the same order as before?:
[0,0,1,1,2,2,2,2,3,5,4]
EDIT
Is it furthermore possible to swap/shuffle the numbers after the mapping, so that
[0,0,1,1,2,2,2,2,3,5,4]
become something like:
[0,0,3,3,5,5,5,5,4,1,2]
Edit: I'm not sure what the etiquette is here (should this be a separate answer?), but this is actually directly obtainable from np.unique.
>>> u, indices = np.unique(a, return_inverse=True)
>>> indices
array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])
Original answer: This isn't too hard to do in plain python by building a dictionary of what index each value of the array would map to:
x = np.sort(np.unique(a))
index_dict = {j: i for i, j in enumerate(x)}
[index_dict[i] for i in a]
Seems you need to rank (dense) your array, in which case use scipy.stats.rankdata:
from scipy.stats import rankdata
rankdata(a, 'dense')-1
# array([ 0., 0., 1., 1., 2., 2., 2., 2., 3., 5., 4.])
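For the follow-up about swapping the numbers after the mapping, one possible sketch is to draw a random permutation of the dense labels and use it as a lookup table. The seed and the names perm and shuffled are assumptions made for illustration; the particular shuffled values depend on the random draw.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 4, 4, 4, 4, 5, 1891, 7])
u, indices = np.unique(a, return_inverse=True)   # dense labels 0..len(u)-1

# Hypothetical shuffle step: permute the labels, then look each one up
rng = np.random.default_rng(0)                   # the seed is an arbitrary choice
perm = rng.permutation(len(u))
shuffled = perm[indices]
print(indices)
print(shuffled)
```

Equal inputs still receive equal outputs, and the outputs still cover 0..len(u)-1 without gaps; only the assignment of labels changes.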

Python: convert numpy array of signs to int and back

I'm trying to convert from a numpy array of signs (i.e., a numpy array whose entries are either 1. or -1.) to an integer and back through a binary representation. I have something that works, but it's not Pythonic, and I expect it'll be slow.
def sign2int(s):
    s[s==-1.] = 0.
    bstr = ''
    for i in range(len(s)):
        bstr = bstr + str(int(s[i]))
    return int(bstr, 2)
def int2sign(i, m):
    bstr = bin(i)[2:].zfill(m)
    s = []
    for d in bstr:
        s.append(float(d))
    s = np.array(s)
    s[s==0.] = -1.
    return s
Then
>>> m = 4
>>> s0 = np.array([1., -1., 1., 1.])
>>> i = sign2int(s0)
>>> print i
11
>>> s = int2sign(i, m)
>>> print s
[ 1. -1. 1. 1.]
I'm concerned about (1) the for loops in each and (2) having to build an intermediate representation as a string.
Ultimately, I will want something that works with a 2-d numpy array, too---e.g.,
>>> s = np.array([[1., -1., 1.], [1., 1., 1.]])
>>> print sign2int(s)
[5, 7]
For 1d arrays you can use this one-liner Numpythonic approach, using np.packbits:
>>> np.packbits(np.pad((s0+1).astype(bool).astype(int), (8-s0.size, 0), 'constant'))
array([11], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([11], dtype=np.uint8))[-4:]).astype(float)
>>> unpack[unpack==0] = -1
>>> unpack
array([ 1., -1., 1., 1.])
And for 2d array:
>>> x, y = s.shape
>>> np.packbits(np.pad((s+1).astype(bool).astype(int), (8-y, 0), 'constant')[-2:])
array([5, 7], dtype=uint8)
And for reversing:
>>> unpack = (np.unpackbits(np.array([5, 7], dtype='uint8'))).astype(float).reshape(x, 8)[:,-y:]
>>> unpack[unpack==0] = -1
>>> unpack
array([[ 1., -1., 1.],
[ 1., 1., 1.]])
I'll start with sign2int. Convert from a sign representation to binary:
>>> a
array([ 1., -1., 1., -1.])
>>> (a + 1) / 2
array([ 1., 0., 1., 0.])
>>>
Then you can simply create an array of powers of two, multiply it by the binary and sum.
>>> powers = np.arange(a.shape[-1])[::-1]
>>> np.power(2, powers)
array([8, 4, 2, 1])
>>> a = (a + 1) / 2
>>> powers = np.power(2, powers)
>>> a * powers
array([ 8., 0., 2., 0.])
>>> np.sum(a * powers)
10.0
>>>
Then make it operate on rows by adding axis information and rely on broadcasting.
def sign2int(a):
    # powers of two
    powers = np.arange(a.shape[-1])[::-1]
    np.power(2, powers, powers)
    # sign to "binary" - add one and divide by two
    np.add(a, 1, a)
    np.divide(a, 2, a)
    # scale by powers of two and sum
    np.multiply(a, powers, a)
    return np.sum(a, axis = -1)
>>> b = np.array([a, a, a, a, a])
>>> sign2int(b)
array([ 11., 11., 11., 11., 11.])
>>>
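Because the function above writes into its argument (np.add, np.divide and np.multiply all use a as the output array), calling it twice on the same array gives different results. A non-mutating sketch of the same idea is shown below; the name sign2int_dot is made up for illustration, and with int64 weights it is only meant for rows of fewer than 63 signs.

```python
import numpy as np

def sign2int_dot(s):
    """Interpret each row of +1/-1 signs as a big-endian binary number."""
    s = np.asarray(s)
    bits = (s > 0).astype(np.int64)                      # -1 -> 0, +1 -> 1
    weights = 2 ** np.arange(s.shape[-1] - 1, -1, -1, dtype=np.int64)
    return bits @ weights                                # sum over the last axis

s0 = np.array([1., -1., 1., 1.])
s2 = np.array([[1., -1., 1.], [1., 1., 1.]])
print(sign2int_dot(s0))
print(sign2int_dot(s2))
```

The inputs are left untouched, so the function can be called repeatedly on the same array.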
I tried it on a 4 by 100 bit array and it seemed fast
>>> a = a.repeat(100)
>>> b = np.array([a, a, a, a, a])
>>> b
array([[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 1., 1., 1., ..., 1., 1., 1.]])
>>> sign2int(b)
array([ 2.58224988e+120, 2.58224988e+120, 2.58224988e+120,
2.58224988e+120, 2.58224988e+120])
>>>
I'll add the reverse if I can figure it out. The best I could do relies on some plain Python without any numpy vectorization magic, and I haven't figured out how to make it work with a sequence of ints other than to iterate over them and convert them one at a time, but the time still seems acceptable.
def foo(n):
    '''yields bits in increasing powers of two
    bit sequence from lsb --> msb
    '''
    while n > 0:
        n, r = divmod(n, 2)
        yield r
def int2sign(n):
    n = int(n)
    a = np.fromiter(foo(n), dtype = np.int8, count = n.bit_length())
    np.multiply(a, 2, a)
    np.subtract(a, 1, a)
    return a[::-1]
Works on 1324:
>>> bin(1324)
'0b10100101100'
>>> a = int2sign(1324)
>>> a
array([ 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1], dtype=int8)
Seems to work with 1.2e305:
>>> n = int(1.2e305)
>>> n.bit_length()
1014
>>> a = int2sign(n)
>>> a.shape
(1014,)
>>> s = bin(n)
>>> s = s[2:]
>>> all(2 * int(x) -1 == y for x, y in zip(s, a))
True
>>>
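For integers that fit into 64 bits, the reverse direction can probably be handled for a whole sequence at once by broadcasting a right shift against all bit positions. This is a sketch under that assumption (it does not cover arbitrarily large Python ints such as the 1014-bit example above); int2sign_shift and its parameter m (the number of signs per integer) are names chosen here for illustration.

```python
import numpy as np

def int2sign_shift(n, m):
    """Convert integers to rows of m signs (+1 for a set bit, -1 otherwise)."""
    n = np.asarray(n, dtype=np.int64)
    shifts = np.arange(m - 1, -1, -1)          # most significant bit first
    bits = (n[..., None] >> shifts) & 1        # broadcast over all bit positions
    return 2 * bits - 1

signs = int2sign_shift([5, 7, 11], 4)
print(signs)
```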
Here are some vectorized versions of your functions:
def sign2int(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def int2sign(i, m):
    tmp = np.array(list(bin(i)[2:].zfill(m)))
    return np.where(tmp == "0", "-1", tmp).astype(int)
s0 = np.array([1., -1., 1., 1.])
sign2int(s0)
# 11
int2sign(11, 5)
# array([-1, 1, -1, 1, 1])
To use your functions on 2-d arrays, you can use map function:
s = np.array([[1., -1., 1.], [1., 1., 1.]])
map(sign2int, s)
# [5, 7]
map(lambda x: int2sign(x, 4), [5, 7])
# [array([-1, 1, -1, 1]), array([-1, 1, 1, 1])]
After a bit of testing, the Numpythonic approach of @wwii that doesn't use strings seems to fit what I need best. For int2sign, I used a for-loop over the exponents with a standard algorithm for the conversion, which will have at most 64 iterations for 64-bit integers. Numpy's broadcasting happens across each integer very efficiently.
packbits and unpackbits are restricted to 8-bit integers; otherwise, I suspect that would've been the best (though I didn't try).
Here are the specific implementations I tested that follow the suggestions in the other answers (thanks to everyone!):
def _sign2int_str(s):
    return int(''.join(np.where(s == -1., 0, s).astype(int).astype(str)), 2)
def sign2int_str(s):
    return np.array(map(_sign2int_str, s))
def _int2sign_str(i, m):
    tmp = np.array(list(bin(i)[2:])).astype(int)
    return np.pad(np.where(tmp == 0, -1, tmp), (m - len(tmp), 0), "constant", constant_values = -1)
def int2sign_str(i,m):
    return np.array(map(lambda x: _int2sign_str(x, m), i.astype(int).tolist())).transpose()
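def sign2int_np(s):
    p = np.arange(s.shape[-1])[::-1]
    # map -1 -> 0 and 1 -> 1, then weight each bit by its power of two
    # (np.power(s + 1, p) would count the last bit as 0**0 == 1)
    bits = (s + 1) / 2
    return np.sum(bits * np.power(2, p), axis = -1).astype(int)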
def int2sign_np(i,m):
    N = i.shape[-1]
    S = np.zeros((m, N))
    for k in range(m):
        b = np.power(2, m - 1 - k).astype(int)
        # floor_divide keeps this an integer division on any Python version
        S[k,:] = np.floor_divide(i.astype(int), b).astype(float)
        i = np.mod(i, b)
    S[S==0.] = -1.
    return S
And here is my test:
import time
import numpy as np
X = np.sign(np.random.normal(size=(5000, 20)))
N = 100
t = time.time()
for i in range(N):
    S = sign2int_np(X)
print('sign2int_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    S = sign2int_str(X)
print('sign2int_str: \t{:10.8f} sec'.format((time.time() - t)/N))
m = 20
S = np.random.randint(0, high=np.power(2,m), size=(5000,))
t = time.time()
for i in range(N):
    X = int2sign_np(S, m)
print('int2sign_np: \t{:10.8f} sec'.format((time.time() - t)/N))
t = time.time()
for i in range(N):
    X = int2sign_str(S, m)
print('int2sign_str: \t{:10.8f} sec'.format((time.time() - t)/N))
This produced the following results:
sign2int_np: 0.00165325 sec
sign2int_str: 0.04121902 sec
int2sign_np: 0.00318024 sec
int2sign_str: 0.24846984 sec
I think numpy.packbits is worth another look. Given a real-valued sign array a, you can use numpy.packbits(a > 0). Decompression is done by numpy.unpackbits. This implicitly flattens multi-dimensional arrays so you'll need to reshape after unpackbits if you have a multi-dimensional array.
Note that you can combine bit packing with conventional compression (e.g., zlib or lzma). If there is a pattern or bias to your data, you may get a useful compression factor, but for unbiased random data, you'll typically see a moderate size increase.
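A small round-trip sketch of the packbits suggestion for a 2-d sign array. Since np.packbits flattens its input by default and pads the last byte with zero bits, the padding is dropped and the original shape restored after unpacking; the names packed, bits and restored are illustrative.

```python
import numpy as np

s = np.array([[1., -1., 1.],
              [1., 1., 1.]])

packed = np.packbits(s > 0)                       # flattens; 6 bits fit in 1 byte
bits = np.unpackbits(packed)[:s.size].reshape(s.shape)
restored = np.where(bits == 1, 1.0, -1.0)         # bits back to signs
print(packed)
print(restored)
```

Note that this packs all rows into one shared bit stream rather than producing one integer per row, so the original shape has to be stored separately.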

Python: Counting identical rows in an array (without any imports)

For example, given:
import numpy as np
data = np.array(
[[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 0, 1],
[0, 1, 1],
[0, 0, 0]])
I want to get a 3-dimensional array, looking like:
result = array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
One way is:
for row in data:
    newArray[row[0]][row[1]][row[2]] += 1
What I'm trying to do is the following:
for i in dimension1:
    for j in dimension2:
        for k in dimension3:
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()
This doesn't seem to work and I would like to achieve the desired result by sticking to my implementation rather than the one mentioned in the beginning (or using any extra imports, eg counter).
Thanks.
You can also use numpy.histogramdd for this:
>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.
Let's take this apart for the case (i,j,k) == (0,0,0)
data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the lines where the first number is 0.
Now from those lines we get the lines where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. And this is the problem. Because if we take those indices from data, i.e., data[data[data[:,0]==0, 1]==0] we do not get the rows where the first and second number are 0, but the 0th and 3rd row instead:
In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
[1, 0, 1]])
And if we now filter for the rows where the third number is 0, we get the wrong result w.r.t. the original data.
And that's why your approach does not work. For better methods, see the other answers.
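If the triple loop from the question is to be kept, one way to repair it, sketched here, is to build all three column masks against the full data array and combine them with &, so that every mask has one entry per original row. The hard-coded range(2) bounds are an assumption matching the example data.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            # each comparison has length len(data), so the masks line up
            mask = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = mask.sum()
print(result)
```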
You can do something like the following
#Get output dimension and construct output array.
>>> dshape = tuple(data.max(axis=0)+1)
>>> dshape
(2, 2, 2)
>>> out = np.zeros(dshape)
If you have numpy 1.8+:
# a plain out.flat[...] += 1 would count repeated rows only once
np.add.at(out, tuple(data.T), 1)
Else:
#Get indices and unique the resulting array
>>> inds = np.ravel_multi_index(data.T, dshape)
>>> inds, inverse = np.unique(inds, return_inverse=True)
>>> values = np.bincount(inverse)
>>> values
array([2, 2, 2])
>>> out.flat[inds] = values
>>> out
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
Numpy versions before 1.8 do not have np.add.at, and the top code will not work without it. As ravel_multi_index may not be the fastest algorithm ever, you can look into taking the unique rows of a numpy array. In effect these two operations should be equivalent.
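Another possible route, sketched below, skips np.unique entirely: np.bincount with minlength counts every flat index directly, including those that never occur, and the result can be reshaped to the output shape. The name counts is illustrative.

```python
import numpy as np

data = np.array([[0, 0, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]])

dshape = tuple(data.max(axis=0) + 1)                  # (2, 2, 2)
flat = np.ravel_multi_index(tuple(data.T), dshape)    # one flat index per row
counts = np.bincount(flat, minlength=int(np.prod(dshape))).reshape(dshape)
print(counts)
```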
Don't fear the imports. They're what make Python awesome.
If the question assumes that you already have the result matrix:
import numpy as np
data = np.array(
[[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 0, 1],
[0, 1, 1],
[0, 0, 0]]
)
result = np.zeros((2,2,2))
# range of each dim, aka allowable values for each dim
dim_ranges = zip(np.zeros(result.ndim), np.array(result.shape)-1)
dim_ranges
# Out[]:
# [(0.0, 2), (0.0, 2), (0.0, 2)]
# Multidimensional histogram will effectively "count" along each dim
sums,_ = np.histogramdd(data,bins=result.shape,range=dim_ranges)
result += sums
result
# Out[]:
# array([[[ 2., 0.],
# [ 0., 2.]],
#
# [[ 0., 2.],
# [ 0., 0.]]])
This solution solves for any "result" ndarray, no matter what the shape. Additionally, it works fine even if your "data" ndarray has indices which are out-of-bounds for your result matrix.

Flip (reverse) image vertically given its string?

So I have a string of RGBA image data, each pixel is a byte long. I know the image's x and y resolution too. Now I want to edit the string in a way which would cause the image to be flipped or reversed vertically, which means have the first "row" of pixels become the last row and the opposite, and like this for all other "rows". Is there a fast way to do it?
To do what you want to the letter this is one way to proceed:
>>> img = 'ABCDEFGHIJKL'
>>> x, y = 4, 3
>>> def chunks(l, n):
...     for i in xrange(0, len(l), n):
...         yield l[i:i+n]
...
>>> [row for row in chunks(img, x)]
['ABCD', 'EFGH', 'IJKL']
>>> ''.join(reversed([row for row in chunks(img, x)]))
'IJKLEFGHABCD'
HOWEVER, unless you have very small images, you would be better off going through numpy, as this is at the very minimum an order of magnitude faster than CPython datatypes. You should look at the flipud function. Example:
>>> A
array([[ 1., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 3.]])
>>> np.flipud(A)
array([[ 0., 0., 3.],
[ 0., 2., 0.],
[ 1., 0., 0.]])
EDIT: thought to add a complete example in case you have never worked with NumPy before. Of course the conversion is worth only for images that are not 2x2, as instantiating the array has an overhead....
>>> import numpy as np
>>> img = [0x00, 0x01, 0x02, 0x03]
>>> img
[0, 1, 2, 3]
>>> x = y = 2
>>> aimg = np.array(img).reshape(x, y)
>>> aimg
array([[0, 1],
[2, 3]])
>>> np.flipud(aimg)
array([[2, 3],
[0, 1]])
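To connect this back to the raw data in the question, here is a sketch that assumes the image arrives as a bytes object in row-major order. The function name flip_vertical and the bytes_per_pixel parameter are made up for illustration (use 4 if each RGBA pixel occupies four bytes, 1 if each pixel is a single byte).

```python
import numpy as np

def flip_vertical(data, width, height, bytes_per_pixel=4):
    """Return the image bytes with the order of the pixel rows reversed."""
    rows = np.frombuffer(data, dtype=np.uint8).reshape(height, width * bytes_per_pixel)
    return rows[::-1].tobytes()      # reverse the rows, keep each row as it is

img = b'ABCDEFGHIJKL'                # 4 x 3 image, one byte per pixel
flipped = flip_vertical(img, 4, 3, bytes_per_pixel=1)
print(flipped)
```

Only the order of the rows changes; the bytes inside each row stay in place, which is what a vertical flip requires.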
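Say you have the image as a list of rows in img, then do
img.reverse()
# reversing each row as well also mirrors the image horizontally
# (a 180-degree rotation); skip this loop for a purely vertical flip
for row in img:
    row.reverse()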
