If I have an array a and a value b:
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = 8.1
and I want to find the index of the value b in a, I can do:
np.nonzero(np.abs(a - b) < 0.5)
to get (2, 1) as the index. But what do I do if b is a 1D or 2D array? Say,
b = np.array([8.1, 3.1, 9.1])
and I want to get (2, 1), (0, 2), (2, 2).
In general I expect only one match in a for every value of b. Can I avoid a for loop?
Use a list comprehension:
[np.nonzero(np.abs(x - a) < 0.5) for x in b]
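A small runnable sketch of that comprehension, using the arrays from the question (assuming exactly one match in a per element of b):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([8.1, 3.1, 9.1])

# One (row, col) pair per element of b; np.nonzero returns a tuple of
# index arrays, so we pull the first (and only) match out of each.
idx = [tuple(int(i[0]) for i in np.nonzero(np.abs(x - a) < 0.5)) for x in b]
print(idx)  # [(2, 1), (0, 2), (2, 2)]
```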
Vectorized approach with NumPy's broadcasting -
np.argwhere((np.abs(a - b[:,None,None])<0.5))[:,1:]
Explanation -
Extend b from a 1D to a 3D case with None/np.newaxis, keeping the elements along the first axis.
Perform absolute subtractions with the 2D array a, thus bringing in broadcasting and leading to a 3D array of elementwise subtractions.
Compare against the threshold of 0.5 and get the indices corresponding to matches along the last two axes and sorted by the first axis with np.argwhere(...)[:,1:].
Sample run -
In [71]: a
Out[71]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [72]: b
Out[72]: array([ 8.1,  3.1,  9.1,  0.7])

In [73]: np.argwhere((np.abs(a - b[:,None,None])<0.5))[:,1:]
Out[73]:
array([[2, 1],
       [0, 2],
       [2, 2],
       [0, 0]])
Related
I have the following array and a list of indices
my_array = np.array([ [1,2], [3,4], [5,6], [7,8] ])
indices = np.array([0,2])
I can get the values of the array corresponding to my indices by just doing my_array[indices], which gives me the expected result
array([[1, 2],
       [5, 6]])
Now I want to get the complement of it. As mentioned in one of the answers, doing
my_array[~indices]
will not give the expected result [[3, 4], [7, 8]].
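To see why, note that ~ on an integer array is bitwise NOT, not a set complement, so it produces valid but unintended negative indices. A small check:

```python
import numpy as np

my_array = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
indices = np.array([0, 2])

# ~ is bitwise NOT on integers: ~0 == -1, ~2 == -3.
print(~indices)            # [-1 -3]
# So this selects rows -1 and -3, i.e. [7 8] and [3 4], not the complement.
print(my_array[~indices])
```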
I was hoping this could be done in a 1-liner way, without having to define additional masks.
You can use numpy.delete. It returns a new array with sub-arrays along an axis deleted.
complement = np.delete(my_array, indices, axis=0)
>>> np.delete(my_array, indices, axis=0)
array([[3, 4],
       [7, 8]])
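A quick runnable check of that, confirming delete returns a new array and leaves my_array unchanged:

```python
import numpy as np

my_array = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
indices = np.array([0, 2])

# np.delete returns a fresh array; the original is not modified in place.
complement = np.delete(my_array, indices, axis=0)
print(complement)      # [[3 4]
                       #  [7 8]]
print(my_array.shape)  # (4, 2)
```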
a=np.arange(8).reshape(2,2,2)
b=np.arange(4).reshape(2,2)
print(np.matmul(a,b))
The result is:
[[[ 2  3]
  [ 6 11]]

 [[10 19]
  [14 27]]]
How should I understand this result? How does it come about?
Short answer: it "broadcasts" the second 2D matrix to 3D, and then performs a "mapping": each submatrix of the first operand is multiplied with the (broadcast) second matrix to produce the corresponding submatrix of the result.
As the documentation on np.matmul [numpy-doc] says:
numpy.matmul(a, b, out=None)
Matrix product of two arrays.
The behavior depends on the arguments in the following way.
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
So here the second item applies. First the second matrix is "broadcast" to a 3D variant as well, which means that we multiply:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
with:
array([[[0, 1],
        [2, 3]],

       [[0, 1],
        [2, 3]]])
and we see these as stacked matrices. So first we multiply:
array([[0, 1],      array([[0, 1],
       [2, 3]])  x         [2, 3]])
which gives us:
array([[ 2,  3],
       [ 6, 11]])
and then the second pair of submatrices:
array([[4, 5],      array([[0, 1],
       [6, 7]])  x         [2, 3]])
and this gives us:
array([[10, 19],
       [14, 27]])
we thus stack these together into the result, and obtain:
>>> np.matmul(a, b)
array([[[ 2,  3],
        [ 6, 11]],

       [[10, 19],
        [14, 27]]])
Although the behavior is thus perfectly defined, it might be better to use this feature carefully, since there are other "sensical" definitions of what a "matrix product" of a 3D array with a 2D array could look like, and those are not the ones used here.
You can think of the multiplication more explicitly as a summation. So, if a has dimensions (i, j, k) and b has dimensions (k, l) then the result will have dimensions of (i, j, l).
In code this can be written (very explicitly) like so:
def matmul(a, b):
    dim1, dim2, dim3 = a.shape
    dim4 = b.shape[1]
    c = np.zeros(shape=(dim1, dim2, dim4))
    for i in range(dim1):
        for j in range(dim2):
            for l in range(dim4):
                c[i, j, l] = sum(a[i, j, k] * b[k, l] for k in range(dim3))
    return c
If you print the result of this matmul() function, it will match the NumPy function.
Note: this function is very inefficient, and it only works when a has three dimensions and b has two, but it can be generalized very easily.
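For what it's worth, the (i, j, k) x (k, l) -> (i, j, l) summation above can also be spelled out as an einsum, which agrees with np.matmul for this stacked case:

```python
import numpy as np

a = np.arange(8).reshape(2, 2, 2)
b = np.arange(4).reshape(2, 2)

# 'ijk,kl->ijl' is exactly the summation from the explicit loop version.
ref = np.matmul(a, b)
alt = np.einsum('ijk,kl->ijl', a, b)
print(np.array_equal(ref, alt))  # True
```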
I'm a newbie in Python and I don't understand the following code; I expected test1 and test2 to give me the same result (8, the sum of the second row). Instead,
a=np.matrix([[1,2,3],[1,3, 4]])
b=np.matrix([[0,1]])
print(np.where(b==1))
test1=a[np.nonzero(b==1),:]
print(test1.sum())
ind,_=np.nonzero(b==1); #found in a code that I'm trying to understand (why the _ ?)
test2=a[ind,:]
print(test2.sum())
gives me
(array([0]), array([1]))
14
6
in the first case I have the sum of the full matrix, in the second case I have the sum of the first row (instead of the 2nd).
I don't understand this behavior.
In [869]: a
Out[869]:
matrix([[1, 2, 3],
        [1, 3, 4]])
In [870]: b
Out[870]: matrix([[0, 1]])
In this use, where is the same as nonzero:
In [871]: np.where(b==1)
Out[871]: (array([0], dtype=int32), array([1], dtype=int32))
In [872]: np.nonzero(b==1)
Out[872]: (array([0], dtype=int32), array([1], dtype=int32))
It gives a tuple, one indexing array for each dimension (2 for an np.matrix). The ind,_= just unpacks those arrays and throws away the 2nd; _ is a conventional name for a value you don't need (it is also reused for the last result in an interactive session such as the one I'm using).
In [873]: ind,_ =np.nonzero(b==1)
In [874]: ind
Out[874]: array([0], dtype=int32)
Selecting with where returns the (0,1) element from a. But is that what you want?
In [875]: a[np.where(b==1)]
Out[875]: matrix([[2]])
Adding the : does index the whole array, but with an added dimension; again probably not what we want
In [876]: a[np.where(b==1),:]
Out[876]:
matrix([[[1, 2, 3]],

        [[1, 3, 4]]])
ind is a single indexing array, and so selects row 0 of a.
In [877]: a[ind,:]
Out[877]: matrix([[1, 2, 3]])
But is the b==1 supposed to find the 2nd element of b, and then select the 2nd row of a? To do that we have to use the 2nd indexing array from where:
In [878]: a[np.where(b==1)[1],:]
Out[878]: matrix([[1, 3, 4]])
Or the 2nd column from a corresponding to the 2nd column of b
In [881]: a[:,np.where(b==1)[1]]
Out[881]:
matrix([[2],
        [3]])
Because a and b are np.matrix, the indexing result is always 2d.
For an array c, where produces a single-element tuple:
In [882]: c=np.array([0,1])
In [883]: np.where(c==1)
Out[883]: (array([1], dtype=int32),)
In [884]: a[_,:] # here _ is the last result, Out[883]
Out[884]: matrix([[1, 3, 4]])
We generally advise using np.array to construct new arrays, even 2d. np.matrix is a convenience for wayward MATLAB users, and often confuses new numpy users.
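As a quick illustration of that advice, here is the same lookup with plain np.array; with a 1-D b, nonzero returns a single index array, so the row selection behaves the way the question expected:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 3, 4]])
b = np.array([0, 1])

# b is 1-D, so np.nonzero(b == 1) is a one-element tuple; its first
# array indexes rows of a directly.
rows = a[np.nonzero(b == 1)[0], :]
print(rows)        # [[1 3 4]]
print(rows.sum())  # 8
```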
Suppose I have
[[array([x1, y1]), z1],
 [array([x2, y2]), z2],
 ......
 [array([xn, yn]), zn]]
And I want to find the index of array([x5, y5]). How can I find it efficiently using NumPy?
To start off, owing to the mixed data format, I don't think you can extract the arrays in a vectorized manner. Thus, you can use a list comprehension to extract the first element of each list item as a 2D array. So, let's say A is the input list; we would have -
arr = np.vstack([a[0] for a in A])
Then, simply do the comparison in a vectorized fashion using NumPy's broadcasting feature, as it will broadcast that comparison along all the rows, and find all matching rows with np.all(axis=1). Finally, use np.flatnonzero to get the final indices. Thus, the final piece of the puzzle would be -
idx = np.flatnonzero((arr == search1D).all(1))
You can read up on the answers to this post to see other alternatives to get indices in such a 1D array searching in 2D array problem.
Sample run -
In [140]: A
Out[140]:
[[array([3, 4]), 11],
[array([2, 1]), 12],
[array([4, 2]), 16],
[array([2, 1]), 21]]
In [141]: search1D = [2,1]
In [142]: arr = np.vstack([a[0] for a in A]) # Extract 2D array
In [143]: arr
Out[143]:
array([[3, 4],
       [2, 1],
       [4, 2],
       [2, 1]])
In [144]: np.flatnonzero((arr == search1D).all(1)) # Finally get indices
Out[144]: array([1, 3])
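The two steps above can be run end to end as a self-contained snippet:

```python
import numpy as np

A = [[np.array([3, 4]), 11],
     [np.array([2, 1]), 12],
     [np.array([4, 2]), 16],
     [np.array([2, 1]), 21]]
search1D = [2, 1]

arr = np.vstack([row[0] for row in A])          # stack the first elements into a 2D array
idx = np.flatnonzero((arr == search1D).all(1))  # rows equal to search1D
print(idx)  # [1 3]
```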
I have a 2D numpy array S representing a state space, with 80000000 rows (as states) and 5 columns (as state variables).
I initialize K0 with S, and at each iteration I apply a state transition function f(x) to all of the states in Ki, and delete states whose f(x) is not in Ki, resulting in Ki+1; this repeats until it converges, i.e. Ki+1 = Ki.
Going like this would take ages:
K = S
to_delete = [0]
while to_delete:
    to_delete = []
    for i in xrange(len(K)):
        if f(K[i]) not in K:
            to_delete.append(i)
    K = delete(K, to_delete, 0)
So I wanted to make a vectorized implementation:
slice K into columns, apply f, and join them once again, thus obtaining f(K) somehow.
The question now is how to get a boolean array of length len(K), say Sel, where each entry Sel[i] determines whether f(K[i]) is in K. Exactly like the function in1d works.
Then it would be simple to do
K = K[Sel]
Your question is difficult to understand because it contains extraneous information and typos. If I understand correctly, you simply want an efficient way to perform a set operation on the rows of a 2D array (in this case, the intersection of the rows of K and f(K)).
You can do this with numpy.in1d if you create a structured array view.
Code:
if this is K:
In [50]: k
Out[50]:
array([[6, 6],
       [3, 7],
       [7, 5],
       [7, 3],
       [1, 3],
       [1, 5],
       [7, 6],
       [3, 8],
       [6, 1],
       [6, 0]])
and this is f(K) (for this example I subtract 1 from the first col and add 1 to the second):
In [51]: k2
Out[51]:
array([[5, 7],
       [2, 8],
       [6, 6],
       [6, 4],
       [0, 4],
       [0, 6],
       [6, 7],
       [2, 9],
       [5, 2],
       [5, 1]])
then you can find all rows in K also found in f(K) by doing something like this:
In [55]: k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]), k2.view(dtype='i,i').reshape(k2.shape[0]))]
Out[55]: array([[6, 6]])
view and reshape create flat structured views so that each row appears as a single element to in1d. in1d then produces a boolean mask of k's matched rows, which is used to fancy-index k and return the filtered array.
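Here is the structured-view trick in a self-contained form; the dtype is pinned to int64 so the 'i8,i8' view matches the row size exactly (the 'i,i' spelling in the transcript assumes the platform's default int width):

```python
import numpy as np

# Explicit int64 so each row (two 8-byte ints) maps onto one 'i8,i8' element.
k  = np.array([[6, 6], [3, 7], [7, 5]], dtype=np.int64)
k2 = np.array([[5, 7], [6, 6], [2, 8]], dtype=np.int64)

# Each row becomes a single structured element, so in1d compares whole rows.
kv  = k.view('i8,i8').reshape(k.shape[0])
k2v = k2.view('i8,i8').reshape(k2.shape[0])
print(k[np.in1d(kv, k2v)])  # [[6 6]]
```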
Not sure if I understand your question entirely, but if the interpretation of Paul is correct, it can be solved efficiently and fully vectorized using the numpy_indexed package as such in a single readable line:
import numpy_indexed as npi
K = npi.intersection(K, f(K))
Also, this works for rows of any type or shape.
The above answer is great.
But if you don't want to deal with structured arrays and want a solution that doesn't care about the type of your array or the dimensions of your array elements, I came up with this:
k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
basically, list(map(np.ndarray.dumps, k)) instead of k.view(dtype='f8,f8').reshape(k.shape[0]).
Take into account that this solution is ~50 times slower.
k = np.array([[6.5, 6.5],
              [3.5, 7.5],
              [7.5, 5.5],
              [7.5, 3.5],
              [1.5, 3.5],
              [1.5, 5.5],
              [7.5, 6.5],
              [3.5, 8.5],
              [6.5, 1.5],
              [6.5, 0.5]])
k = np.tile(k, (1000, 1))
k2 = np.c_[k[:, 0] - 1, k[:, 1] + 1]
In [132]: k.shape, k2.shape
Out[132]: ((10000, 2), (10000, 2))
In [133]: timeit k[np.in1d(k.view(dtype='f8,f8').reshape(k.shape[0]),k2.view(dtype='f8,f8').reshape(k2.shape[0]))]
10 loops, best of 3: 22.2 ms per loop
In [134]: timeit k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
1 loop, best of 3: 892 ms per loop
The difference can be marginal for small inputs, but for the OP's input size it would take about 1 h 20 min instead of 2 min.