Suppose:
A=np.array([1,2,0,-4])
B=np.array([1,1,1,1])
C=np.array([1,2,3,4])
With fancy indexing I can assign a scalar value to C wherever A > 0.
C[A > 0] = 1
But is there any way to get something like C = B/A wherever A > 0, while preserving the original values of C for A <= 0, with fancy indexing? If I try something like
C[A > 0] = B/A
I get an error like:
<input>:1: RuntimeWarning: divide by zero encountered in true_divide
Traceback (most recent call last):
File "<input>", line 1, in <module>
ValueError: NumPy boolean array indexing assignment cannot assign 4 input values to the 2 output values where the mask is true
I can get the result with a for loop, or by making copies of A and C, where:
D = np.copy(A)
E = np.copy(C)
D[D <= 0] = 1
E = B/D
E[A <= 0] = C[A <= 0]
or set C = Run(A, B), where
def Run(A, B):
    C = np.zeros(A.shape)
    for i in range(len(A)):
        if A[i] != 0:
            C[i] = B[i]/A[i]
    return C
But I was just wondering if there is a more direct way to do it, without adding so many steps, if I am looping millions of times. Thanks.
You can index the operands: C[A > 0] = B[A > 0] / A[A > 0]. You might want to compute A > 0 once, and reuse it, e.g.
mask = A > 0
C[mask] = B[mask] / A[mask]
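For example, with the sample arrays above this should give the following (note that C is an integer array here, so the float quotients are truncated when assigned into it):
mask = A > 0
C[mask] = B[mask] / A[mask]
# C is now array([1, 0, 3, 4]); with a float C it would be array([1. , 0.5, 3. , 4. ])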
A more efficient alternative is to use the where parameter of np.divide or np.floor_divide. For example,
In [19]: A = np.array([1, 2, 0, -4])
In [20]: B = np.array([1, 1, 1, 1])
In [21]: C = np.array([1, 2, 3, 4])
In [22]: np.floor_divide(B, A, where=A > 0, out=C)
Out[22]: array([1, 0, 3, 4])
In [23]: C
Out[23]: array([1, 0, 3, 4])
I had to use floor_divide because all the arrays are integer arrays, and numpy.divide creates a floating point array, so that function will complain about the type mismatch if the out array is an integer array. If you want a floating point result, C should be an array of floating point values:
In [24]: C = np.array([1., 2., 3., 4.])
In [25]: np.divide(B, A, where=A > 0, out=C)
Out[25]: array([1. , 0.5, 3. , 4. ])
In [26]: C
Out[26]: array([1. , 0.5, 3. , 4. ])
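One caveat, per the documented ufunc behaviour of where: if you call np.divide(B, A, where=A > 0) without supplying out, the entries where the condition is False are left uninitialized, so it is the out=C argument that preserves the original values of C. A minimal sketch wrapping the safe pattern (the helper name is just illustrative):
def divide_where(B, A, C, cond):
    # write B/A into C only where cond is True; the other entries of C keep their values
    np.divide(B, A, where=cond, out=C)
    return C

divide_where(B, A, C, A > 0)  # with the float C above: array([1. , 0.5, 3. , 4. ])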
Please assume a vector of invertible matrices:
import numpy as np
a = np.arange(120).reshape((2, 2, 5, 6))
I want to invert the matrices over their defined axes:
b = np.linalg.inv(a, axis1=0, axis2=1)
but this does not seem to be supported.
How can I achieve this?
The inv docs specify its array input as:
a : (..., M, M) array_like
Matrix to be inverted.
Your a,
a = np.arange(120).reshape((2, 2, 5, 6))
is laid out as (M, M, ...). The dimensions are in the wrong order - change them!
In [44]: a = np.arange(120).reshape((2, 2, 5, 6))
Change the axes to the order that inv accepts:
In [45]: A = a.transpose(2,3,0,1)
In [46]: Ai = np.linalg.inv(A)
In [47]: Ai.shape
Out[47]: (5, 6, 2, 2)
In [48]: ai = Ai.transpose(2,3,0,1) # and back
In [49]: ai.shape
Out[49]: (2, 2, 5, 6)
I was going to test the result, but got:
In [50]: x = a@ai
Traceback (most recent call last):
File "<ipython-input-50-9dfe3616745d>", line 1, in <module>
x = a@ai
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 5 is different from 6)
Like inv, matmul treats the last 2 dimensions as the matrix, the first 2 as 'batch':
In [51]: x = A@Ai
In [52]: x[0,0]
Out[52]:
array([[1., 0.],
[0., 1.]])
In [53]: x[0,3]
Out[53]:
array([[1.00000000e+00, 1.38777878e-17],
[4.44089210e-16, 1.00000000e+00]])
We can do the equivalent with einsum:
In [55]: x = np.einsum('ijkl,jmkl->imkl',a,ai)
In [56]: x[:,:,0,0]
Out[56]:
array([[1., 0.],
[0., 1.]])
You might want to change the original specification to match the inv and matmul usage; it could make life easier for you. Also remember that in numpy the trailing dimensions are the innermost ones.
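For instance, a sketch of that reorganisation using np.moveaxis instead of a hard-coded transpose:
A2 = np.moveaxis(a, (0, 1), (-2, -1))    # shape (5, 6, 2, 2): matrix axes last
A2i = np.linalg.inv(A2)                  # inverts each 2x2 matrix
x = A2 @ A2i                             # batch of 2x2 identity matrices
ai = np.moveaxis(A2i, (-2, -1), (0, 1))  # back to (2, 2, 5, 6) if you really need that layout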
If you know that the matrices are 2x2, you can do it easily using the standard formula for inverting such matrices; otherwise, I fear the only reasonable solution would be to do it with for loops. For example, the following works for any shape (adjusting the sizes accordingly):
b = np.stack([np.linalg.inv(a[:, :, i, j]) for i in range(a.shape[2]) for j in range(a.shape[3])], axis=2)
b = b.reshape(2, 2, 5, 6)
as checked by
for i in range(a.shape[2]):
    for j in range(a.shape[3]):
        assert np.allclose(np.dot(a[:,:,i,j], b[:,:,i,j]), np.eye(2))
In the specific 2x2 case you can do the following, which is fully vectorized hence probably faster:
determinants = a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]
b = 1 / determinants * np.stack([
    np.stack([a[1, 1], -a[0, 1]]),
    np.stack([-a[1, 0], a[0, 0]]),
])
For this specific (small) input size, the second solution is about 10 times faster in my tests (43 µs vs. 537 µs).
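To double-check the vectorized result, one can reuse the einsum batch product from the other answer (a quick sketch):
x = np.einsum('ijkl,jmkl->imkl', a, b)                   # batch product a @ b over the first two axes
assert np.allclose(np.moveaxis(x, (0, 1), (-2, -1)), np.eye(2))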
I have an array called a and another array b. The array a is the main array where I store float data, and b is an array which contains some indices into a.
Example:
a = [1.3, 1.7, 18.4, 56.2, 82.2, 18.1, 81.9, 56.9, -274.45]
b = [0, 1, 2, 3, 4, 5, 6, 7]
In this example b contains indexes of a from 0 to 7.
What I'm trying to do in Python is to remove "duplicates": I want to flag every index in b whose value in a is similar to a value at an earlier index, and write -1 in its place in b. For example, notice that there is the pair 1.3 and 1.7. Also, there are 18.4 and 18.1, etc. I want to find all these values and write -1 at all the corresponding places in array b.
Output should be the following:
b = [0, -1, 2, 3, 4, -1, -1, -1]
I think it is obvious what I am trying to achieve. Here index 1 is replaced with -1 because in a it represents 1.7, which has the "pair" 1.3. Also, the last 3 indices represent 18.1, 81.9 and 56.9, which also have their "pairs" earlier, so they are replaced with -1.
Of course, I have a parameter x which controls how "similar" values have to be. Here x = 2, which means that any two values that differ by less than 2 are similar.
What have I tried? I used 2 nested for loops and a lot of unnecessary variables, and my algorithm eats memory and performance. Is there an elegant NumPy-ish way to achieve it?
Approach #1 : Here's a vectorized approach using broadcasting; it's a bit memory intensive -
x = 2 # threshold that decides similarity
a_b = a[b]                                           # values of a selected by b
mask = np.triu(np.abs(a_b[:,None]-a_b)<x,1).any(0)   # True where a similar value appears earlier
b[mask[:len(b)]] = -1
Sample run -
In [95]: a = np.array([1.3, 1.7, 18.4, 56.2, 82.2, 18.1, 81.9, 56.9, -274.45])
...: b = np.array([0, 1, 2, 3, 4, 5, 6, 7])
...:
# After code run ...
In [97]: b
Out[97]: array([ 0, -1, 2, 3, 4, -1, -1, -1])
Approach #2 : Less memory intensive approach
import pandas as pd
def set_mask(a, b, thresh):
    a_b = a[b]
    N = len(a_b)
    sidx = a_b.argsort()
    sorted_a_b = a_b[sidx]
    # True where consecutive sorted values are within thresh of each other
    mask0 = sorted_a_b[1:] - sorted_a_b[:-1] < thresh
    # group IDs: a new group starts wherever the gap is >= thresh
    id_arr = np.zeros(N, dtype=int)
    id_arr[np.flatnonzero(~mask0)+1] = 1
    ids = id_arr.cumsum()
    d = np.column_stack((ids, sidx))
    df0 = pd.DataFrame(d, columns=(('ids','sidx')))
    # keep the smallest original position in each group, mask the rest
    pp = df0['sidx'].groupby([ids]).min()
    maskc = np.ones(N, dtype=bool)
    maskc[pp.values] = 0
    return maskc
Use this mask in place of the mask at the last step of the previous approach.
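For example, starting again from the sample arrays used above, this should reproduce the same output:
mask = set_mask(a, b, thresh=2)
b[mask] = -1
# b -> array([ 0, -1,  2,  3,  4, -1, -1, -1])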
Suppose you have an array:
a =
[ 0,1,0]
[-1,2,1]
[3,-4,2]
And let's say you add 20 to everything:
b =
[ 20, 21, 20]
[ 19, 22, 21]
[ 23, 16, 22]
Now let's say I want to add the resulting b to the original array a, but only where a < 0, i.e. at positions [1, 0] and [2, 1] (row, column), where a = -1 and -4 respectively, getting the value 0 otherwise. Ultimately this leads to a matrix like this:
c =
[ 0, 0, 0]
[ 18, 0, 0]
[ 0, 12, 0]
18 = 19 (from b) + -1 (from a)
12 = 16 (from b) + -4 (from a)
And assume that I want to be able to extend this to any operation (not just adding 20), so that you can't just filter all values < 20 out of matrix c. So I want to use matrix a as a mask for matrix c, keeping the entries (i, j) where a[i, j] < 0 and zeroing the rest.
I'm having a tough time finding a concise example of how to do this in numpy with Python. I was hoping you might be able to direct me to the correct implementation of such a method.
What I am struggling with is turning this into a mask and performing operations only on the retained values, finally resulting in c.
Thanks for the help in advance.
Probably something like:
(a + b)*(a<0)
should work unless you have very strong requirements concerning the number of intermediate arrays.
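A quick check with the arrays from the question (the boolean array a < 0 acts as 0/1 in the multiplication):
import numpy as np
a = np.array([[0, 1, 0], [-1, 2, 1], [3, -4, 2]])
b = a + 20
(a + b) * (a < 0)
# array([[ 0,  0,  0],
#        [18,  0,  0],
#        [ 0, 12,  0]])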
You can do this through a combination of boolean indexing and broadcasting. Working example below,
import numpy as np
a = np.array([[ 0,1,0],[-1,2,1],[3,-4,2]])
b = a+20
c = np.zeros(a.shape)
c[a<0] = b[a<0] + a[a<0]
which gives c as
array([[ 0., 0., 0.],
[ 18., 0., 0.],
[ 0., 12., 0.]])
The only important line in the code snippet above is the last one. Because the entries of a, b, and c are all aligned, we assign to the entries of c where a < 0 the sum of the corresponding entries of b and a.
Here is another way to get the same result:
c = np.where(a < 0, a + b, 0)
Although this is slightly more verbose than Thomas Baruchel's solution, I found the method signature similar to the ternary operation (a < 0 ? a + b : 0), which makes it easier for me to understand what it is doing right away. Also, this is still a one-liner which makes it elegant enough in my opinion.
reference: numpy.where
Perhaps not the cleanest solution, but how about this?:
def my_mask(a, b, threshold=0):
    c = np.zeros(a.shape)
    idx = np.where(a < threshold)      # tuple of (row, col) index arrays
    for ii in zip(*idx):               # iterate over (row, col) pairs
        c[ii[0], ii[1]] = a[ii[0], ii[1]] + b[ii[0], ii[1]]
    return c
A solution using the numpy.zeros_like function:
import numpy as np
# the initial array
a = [[ 0,1,0],
[-1,2,1],
[3,-4,2]]
a = np.array(a)
b = a + 20 # after adding 20 to each element
c = np.zeros_like(a) # resulting matrix (filled with zeros by default)
neg = a < 0 # indices of negative values
c[neg] = b[neg] + a[neg] # overriding the needed elements
print(c)
The output:
[[ 0 0 0]
[18 0 0]
[ 0 12 0]]
Given two numpy arrays of unequal size, A (a presorted dataset) and B (a list of query values), I want to find the closest "lower" neighbor in array A to each element of array B. Example code below:
import numpy as np
A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132]) #pre-sorted dataset
B = np.array([1.1, 1.9, 2.1, 5.0, 7.0]) #query values, not necessarily sorted
print(A.searchsorted(B))
# RESULT: [1 1 2 4 4]
# DESIRED: [0 0 1 3 4]
In this example, B[0]'s neighbors in A are A[0] and A[1]. searchsorted returns index 1 (the insertion point), but what I want is the lower neighbor at index 0. The same goes for B[1:4], and B[4] should be matched with A[4] because the two values are identical.
I could do something clunky like this:
desired = []
for b in B:
    id = -1
    for a in A:
        if a > b:
            if id == -1:
                desired.append(0)
            else:
                desired.append(id)
            break
        id += 1
print(desired)
# RESULT: [0, 0, 1, 3, 4]
But there has got to be a prettier, more concise way to write this with numpy. I'd like to keep my solution in numpy because I'm dealing with large data sets, but I'm open to other options.
You can introduce the optional argument side and set it to 'right', as mentioned in the docs. Then subtract 1 from the resulting indices for the desired output, like so -
A.searchsorted(B,side='right')-1
Sample run -
In [63]: A
Out[63]: array([ 0.456, 2. , 2.948, 3. , 7. , 12.132])
In [64]: B
Out[64]: array([ 1.1, 1.9, 2.1, 5. , 7. ])
In [65]: A.searchsorted(B,side='right')-1
Out[65]: array([0, 0, 1, 3, 4])
In [66]: A.searchsorted(A,side='right')-1 # With itself
Out[66]: array([0, 1, 2, 3, 4, 5])
Here's one way to do this. np.argmax stops at the first True it encounters, so as long as A is sorted this provides the desired result.
[np.argmax(A>b)-1 for b in B]
Edit: I got the inequality wrong initially, it works now.
Assume I have the following arrays:
N = 8
M = 4
a = np.zeros(M)
b = np.random.randint(M, size=N) # contains indices for a
c = np.random.rand(N) # contains random values
I want to sum the values of c according to the indices provided in b, and store them in a. Writing a loop for this is trivial:
for i, v in enumerate(b):
    a[v] += c[i]
Since N can get quite big in my real-world problem, I'd like to avoid using Python loops, but I can't figure out how to write it as a numpy statement. Can anyone help me out?
OK, here are some example values:
In [27]: b
Out[27]: array([0, 1, 2, 0, 2, 3, 1, 1])
In [28]: c
Out[28]:
array([ 0.15517108, 0.84717734, 0.86019899, 0.62413489, 0.24357903,
0.86015187, 0.85813481, 0.7071174 ])
In [30]: a
Out[30]: array([ 0.77930596, 2.41242955, 1.10377802, 0.86015187])
import numpy as np
N = 8
M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([ 0.15517108, 0.84717734, 0.86019899, 0.62413489, 0.24357903, 0.86015187, 0.85813481, 0.7071174 ])
a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)
returns
array([ 0.77930597, 2.41242955, 1.10377802, 0.86015187])
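As a side note, the same accumulation can also be written with np.add.at (unbuffered in-place addition) or np.bincount, which avoid building the intermediate (M, N) boolean array:
a = np.zeros(M)
np.add.at(a, b, c)                            # in place: a[b[i]] += c[i] for every i
# or equivalently:
a = np.bincount(b, weights=c, minlength=M)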