I have a 2D numpy array:
[[1,2,3,4,5],[2,4,5,6,7],[0,9,3,2,4]]
I also have a second 1D array:
[2,3,4]
I want to replace all occurrences of the elements of the second array with 0.
So eventually, my first array should look like
[[1,0,0,0,5],[0,0,5,6,7],[0,9,0,0,0]]
Is there a way in Python/NumPy to do this without using a loop?
I already looked at np.where, but as far as I can tell its condition only compares against a single value (for example, element == 1), not against several.
Thanks a lot !
Use numpy.isin.
>>> import numpy as np
>>> a = np.array([[1,2,3,4,5],[2,4,5,6,7],[0,9,3,2,4]])
>>> b = np.array([2,3,4])
>>> a[np.isin(a, b)] = 0
>>> a
array([[1, 0, 0, 0, 5],
       [0, 0, 5, 6, 7],
       [0, 9, 0, 0, 0]])
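If the original array should stay untouched, the same mask can be fed to np.where instead of being used for in-place assignment. This is a minimal sketch of that variant; the names mask and replaced are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [2, 4, 5, 6, 7], [0, 9, 3, 2, 4]])
b = np.array([2, 3, 4])

# np.isin returns a boolean mask with the same shape as a:
# True wherever the element of a appears anywhere in b.
mask = np.isin(a, b)

# np.where picks 0 where the mask is True and the original value elsewhere.
# It returns a new array, so a itself is not modified.
replaced = np.where(mask, 0, a)
print(replaced)
```

Use the in-place form a[mask] = 0 when you want to overwrite a, and np.where when you need to keep both versions.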
I have asked a previous question, but I think my example was not clear. I am still trying to subtract two different sizes of numpy arrays from a list of numpy arrays. For example:
####Data####
### For same size numpy arrays the subtraction works fine!!!!###
easy_data= [[1,2,3],[2,2,2]],[[1,2,3],[1,2,5]]
d = [np.array(i) for i in easy_data] # List of numpy arrays
res = d[1] - d[0]
>> array([[ 0, 0, 0],
          [-1, 0, 3]])
##### Current Issue ####
data = [[1,2,3],[2,2,2]],[[1,2,3],[1,2,5],[1,1,1]]
d = [np.array(i) for i in data]
res = d[1] - d[0] #### As the sizes are different I can't subtract them ###
Desired Output
array([[ 0, 0, 0],
       [-1, 0, 3],
       [ 1, 1, 1]])
I am still getting the hang of NumPy arrays and can't figure out how to make this work. Can anybody help me?
It's easiest to operate on a slice. If you do not want to erase the original array, use a copy:
>>> res=d[1].copy()
>>> res[:d[0].shape[0]]-=d[0]
>>> res
array([[ 0, 0, 0],
       [-1, 0, 3],
       [ 1, 1, 1]])
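The slice-and-copy idea above can be wrapped in a small helper. This is only a sketch: the function name subtract_leading_rows is made up for illustration, and it assumes the smaller array has no more rows than the larger one and the same number of columns.

```python
import numpy as np

def subtract_leading_rows(big, small):
    """Return a copy of big with small subtracted from its first rows.

    Hypothetical helper: assumes small.shape[0] <= big.shape[0]
    and that the remaining dimensions match.
    """
    res = big.copy()                   # leave the caller's array untouched
    res[:small.shape[0]] -= small      # only the overlapping rows change
    return res

data = [[1, 2, 3], [2, 2, 2]], [[1, 2, 3], [1, 2, 5], [1, 1, 1]]
d = [np.array(i) for i in data]
res = subtract_leading_rows(d[1], d[0])
print(res)
```

Rows of the larger array that have no counterpart in the smaller one are carried over unchanged, which matches the desired output in the question.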
I have an array X:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
And I wish to find the index of the row of several values in this array:
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
For this example I would like a result like:
[0,3,4]
I have a code doing this, but I think it is overly complicated:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
result = []
for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
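To make the broadcasting in Approach #1 easier to follow, here is the same one-liner split into steps with the intermediate shapes spelled out. The intermediate names (eq, row_match, s_idx, x_idx) are only for illustration.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# searched_values[:, None] has shape (3, 1, 2). Comparing it with X of
# shape (5, 2) broadcasts to (3, 5, 2): every searched row is compared
# element by element against every row of X.
eq = X == searched_values[:, None]

# A row counts as a match only if all of its elements are equal,
# which reduces the last axis and leaves shape (3, 5).
row_match = eq.all(-1)

# np.where returns two index arrays: positions along the searched_values
# axis and positions along the X axis of each True entry.
s_idx, x_idx = np.where(row_match)
print(x_idx)
```

The intermediate boolean array has len(searched_values) * len(X) * X.shape[1] entries, which is why the following approaches are described as more memory efficient.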
Approach #2
A memory-efficient approach would be to convert each row to its linear index equivalent and then use np.in1d (deprecated in recent NumPy releases in favor of np.isin), like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),
                       np.ravel_multi_index(searched_values.T,dims)))[0]
Approach #3
Another memory-efficient approach, using np.searchsorted and the same idea of converting rows to linear index equivalents, would be like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
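When a match is not guaranteed, the np.searchsorted result can be checked afterwards. The sketch below is one possible way to do that, not part of the original approach: the -1 sentinel for missing rows and the use of both arrays to size the grid are assumptions made here.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
# The last searched row does not occur in X.
searched_values = np.array([[4, 2], [3, 3], [5, 6], [0, 0]])

# Size the grid from both arrays, since searched_values may hold
# values that never appear in X.
dims = np.maximum(X.max(0), searched_values.max(0)) + 1
X1D = np.ravel_multi_index(X.T, dims)
S1D = np.ravel_multi_index(searched_values.T, dims)

sidx = X1D.argsort()
pos = np.searchsorted(X1D, S1D, sorter=sidx)
# searchsorted may return len(X1D) for values above the maximum; clip so
# the lookup below stays in bounds.
pos = np.clip(pos, 0, len(X1D) - 1)
candidates = sidx[pos]

# A candidate is a real match only if its linear index is identical.
found = X1D[candidates] == S1D
out = np.where(found, candidates, -1)   # -1 marks "not present" (assumed convention)
print(out)
```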
How does np.ravel_multi_index work?
This function returns linear (flat) index equivalents. It accepts a 2D array of n-dimensional indices, arranged as columns, together with the shape of the n-dimensional grid onto which those indices are mapped, and computes the equivalent linear indices.
Let's use the inputs we have for the problem at hand. Take the case of input X and note its first row. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index treats each column as one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row in X is 2 in this case, the n-dimensional grid to be mapped onto is 2D. With 3 elements per row in X, it would have been a 3D grid, and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed by hand using row-major ordering: on a grid with 7 columns, index (4, 2) maps to 4*7 + 2 = 30.
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
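For the 2D case, the row-major arithmetic behind these numbers can be written out directly and compared with np.ravel_multi_index. This is just a check of the walkthrough above; the names manual and library are illustrative.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
dims = X.max(0) + 1   # grid shape (10, 7)

# Row-major (C order) on a 2D grid:
# linear index = row * number_of_columns + column
manual = X[:, 0] * dims[1] + X[:, 1]

library = np.ravel_multi_index(X.T, dims)
print(manual)
print(library)
```

Both lines print the same five linear indices, one per row of X.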
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind treating each row of X as an indexing tuple of an n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
Now, as discussed in the previous section, we treat each row as an indexing tuple. Within each such tuple, the first element represents the first axis of the n-dimensional grid, the second element the second axis, and so on until the last element of each row in X. In essence, each column represents one dimension or axis of the grid. If we are to map all elements from X onto the same grid, we need to consider the maximum extent of each axis. Assuming we are dealing with non-negative numbers in X, that extent is the maximum of each column in X, plus 1. The + 1 is there because Python uses 0-based indexing. So, for example, X[1,0] == 9 maps to the 10th row of the proposed grid. Similarly, X[4,1] == 6 goes to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of at least shape (10,7) for our sample case. Larger lengths along any dimension won't hurt and would still give us unique linear indices.
Concluding remarks: One important thing to note here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples non-negative before using np.ravel_multi_index.
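One way to apply such offsets is to shift every column by its minimum before computing the linear indices. The sample data below and the choice to take the minimum over both arrays are assumptions made for this sketch.

```python
import numpy as np

# Illustrative data containing negative entries.
X = np.array([[-4, 2], [9, -3], [8, 5]])
searched_values = np.array([[9, -3], [-4, 2]])

# Shift each column so that its smallest value (over both arrays) becomes 0,
# then size the grid from the shifted maximum.
both = np.vstack([X, searched_values])
offset = both.min(0)
dims = both.max(0) - offset + 1

X1D = np.ravel_multi_index((X - offset).T, dims)
S1D = np.ravel_multi_index((searched_values - offset).T, dims)

# Same membership test as in Approach #2, on the shifted indices.
out = np.where(np.isin(X1D, S1D))[0]
print(out)
```

The same offset must be applied to X and to searched_values, otherwise equal rows would no longer map to equal linear indices.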
Another alternative is to use asvoid (below) to view each row as a single value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed which treat
    entire rows as one value.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        """ Care needs to be taken here since
        np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
        Adding 0. converts -0. to 0.
        """
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that using the 'missing' kwarg, you have full control over the behavior for missing items, and it works for nd-arrays (e.g. stacks of images) as well.
Update: using the same shapes as @Rik, X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well, using numpy and hashlib. It can handle high-dimensional matrices or images in seconds. I ran it on a 520000 x (28 x 28) array against a 20000 x (28 x 28) array in 2 seconds on my CPU.
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
#hash using sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
##Use unique to get the first occurrence of each distinct hash in the in1d results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
##Map those positions back to row indices of X by indexing an array of indices
idx=np.arange(len(xhash))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
[3 3]
[4 2]]
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
S = np.array([[4, 2],
[3, 3],
[5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use the cdist function from scipy.spatial.distance, like this:
from scipy.spatial.distance import cdist
np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get the row numbers of X that have distance zero to a row in searched_values, meaning they are equal. This makes sense if you look at the rows as coordinates.
I had a similar requirement and the following worked for me:
np.argwhere(np.isin(X, searched_values).all(axis=1))
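One caveat worth checking before relying on this: np.isin tests every element on its own against all values in searched_values, so a row that mixes values from different searched rows also passes. The sketch below adds a hypothetical extra row [2, 3] to X to show the difference from a row-wise comparison.

```python
import numpy as np

# Same X as in the question, plus an extra row [2, 3] added for illustration.
X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6], [2, 3]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# Element-wise membership: [2, 3] passes because 2 and 3 both occur
# somewhere in searched_values, although [2, 3] is not one of its rows.
elementwise = np.argwhere(np.isin(X, searched_values).all(axis=1)).ravel()

# Row-wise comparison via broadcasting: a row of X must equal a whole
# row of searched_values.
rowwise = np.where((X == searched_values[:, None]).all(-1).any(0))[0]

print(elementwise)
print(rowwise)
```

On the original X from the question both variants agree, so the difference only shows up when such mixed rows are present.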
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
    equals = [np.equal(orig, p).all(1) for p in search]
    exists = np.max(equals, axis=1)
    indices = np.argmax(equals, axis=1)
    indices[~exists] = -1
    return indices
test:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6],
[0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]
Is there a way in Python to initialize a multi-dimensional array / list without using a loop?
Sure, there is a way:
arr = eval(repr([[0]*5]*10))
or
arr = eval(("[[0]*5]+"*10)[:-1])
but it's horrible and wasteful, so everyone uses loops (usually list comprehensions) or NumPy.
Depending on your real needs, the de facto "standard" package NumPy might provide you with exactly what you need.
You can for instance create a multi-dimensional array with
numpy.empty((10, 4, 100)) # 3D array
(initialized with arbitrary values) or create the same arrays with zeros everywhere with
numpy.zeros((10, 4, 100))
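If a starting value other than zero is needed, numpy.full takes the shape and the fill value. A small sketch with arbitrary example shapes and values:

```python
import numpy as np

# 2 x 3 array where every entry starts as 7
filled = np.full((2, 3), 7)

# Same shape, initialized with zeros (floating point by default)
zeros = np.zeros((2, 3))

print(filled)
print(zeros.dtype)
```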
NumPy is very fast for array operations.
The following does not use any special library, nor eval:
arr = [[0]*5 for x in range(6)]
and it doesn't create duplicated references:
>>> arr[1][1] = 2
>>> arr
[[0, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
Sure, you can just do
mylist = [
[1,2,3],
[4,5,6],
[7,8,9]
]
I don't believe it's possible.
You can do something like this:
>>> a = [[0] * 5] * 5
to create a 5x5 matrix, but every row is the same list object repeated (which you don't want). For example:
>>> a[1][2] = 1
>>> a
[[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
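The repetition can be confirmed with an identity check. This short sketch compares the multiplied list with a list comprehension; the names shared and independent are illustrative.

```python
# Outer multiplication copies the reference to one inner list five times.
shared = [[0] * 5] * 5

# The comprehension evaluates [0] * 5 once per row, giving five distinct lists.
independent = [[0] * 5 for _ in range(5)]

print(shared[0] is shared[1])            # True: same object
print(independent[0] is independent[1])  # False: separate objects

shared[1][2] = 1
independent[1][2] = 1
print(shared[0][2], independent[0][2])   # 1 0
```

The inner [0] * 5 is safe in both cases because integers are immutable; only the outer repetition shares mutable rows.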
You almost certainly need to use some kind of loop as in:
[[0 for y in range(5)] for x in range(5)]
Recursion is your friend :D
It's a pretty naive implementation but it works!
dim = [2, 2, 2]
def get_array(level, dimension):
    # Build one level of nesting per entry in dimension; the innermost value is 0
    if level != len(dimension):
        return [get_array(level+1, dimension) for i in range(dimension[level])]
    else:
        return 0
print(get_array(0, dim))
It depends on what you want to initialize the array to, but sure. You can use a list comprehension to create a 5×3 array, for instance:
>>> [[0 for x in range(3)] for y in range(5)]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> [[3*y+x for x in range(3)] for y in range(5)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]
Yes, I suppose this still has loops—but it's all done in one line, which I presume is the intended meaning of your question?
a = []
a.append([1,2])
a.append([2,3])
Then
>>> a
[[1, 2], [2, 3]]
If you're doing numerical work using NumPy, you can use something like
x = numpy.zeros((m,n))
x = numpy.ones((m,n))
Python does not have built-in multi-dimensional arrays. It has other container types ranging from lists to dictionaries, without forgetting sets - the right one depends on your specific needs.
Assuming your "array" is actually a list, and "initialize" means allocate a list of lists of NxM elements, you can (pseudocode):
for N times: for M times: add an element
for N times: add a row of M elements
write the whole thing out
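In Python, the three options in the pseudocode above could look roughly like this. The sizes N = 3 and M = 4 and the fill value 0 are arbitrary choices for the sketch.

```python
N, M = 3, 4

# 1. for N times: for M times: add an element
grid_a = []
for _ in range(N):
    row = []
    for _ in range(M):
        row.append(0)
    grid_a.append(row)

# 2. for N times: add a row of M elements
grid_b = []
for _ in range(N):
    grid_b.append([0] * M)

# 3. write the whole thing out
grid_c = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(grid_a == grid_b == grid_c)
```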
You say you don't want to loop and that rules out the first two points, but why?
You also say you don't want to write the thing down (in response to JacobM), so how would you exactly do that? I don't know of any other way of getting a data structure without either generating it in smaller pieces (looping) or explicitly writing it down - in any programming language.
Also keep in mind that an initialized but empty list is no better than no list, unless you put data into it. And you don't need to initialize it before putting data...
If this isn't a theoretical exercise, you're probably asking the wrong question. I suggest that you explain what you need to do with that array.
You can do it this way:
First, without using any loop (note that this repeats the same row object m times, so changing one row changes them all):
[[0] * n] * m
Second, using a simple inline list comprehension:
[[0 for column in range(n)] for row in range(m)]
You can use NumPy's N-dimensional array (ndarray). Here is the link to the documentation. http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html