Having trouble with numpy append function - python

I was doing work for a course today, and the assignment was to create a tic tac toe board. The possibilities method takes a tic tac toe board as an input, and checks if any of the values are "0", meaning that it's an open space. My plan would be to add the location of the 0 to an array, called locations, and then return locations at the end of the function. However, when I try to append the location of the 0 to the locations array, I keep getting this issue: "all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 1". Does anyone know how to solve this? Thanks
import numpy as np
def create_board():
    board = np.zeros((3,3), dtype = "int")
    return board
def place(board, player, position):
    x, y = position
    board[x][y] = player
def posibilities(board):
    locations = np.empty(shape=[2,0])
    for i in range(len(board)):
        for x in range(len(board[0])):
            if board[i][x] == 0:
                locations = np.append(locations, [[i,x]], axis=1)
    print(locations)
posibilities(create_board())

As @hpaulj suggested, use a list instead and convert it to np.array at the end, that is:
def posibilities(board):
    locations = []
    for i in range(len(board)):
        for x in range(len(board[0])):
            if board[i][x] == 0:
                locations.append([[i,x]])
    locations = np.array(locations) # shape (n, 1, 2); use np.concatenate(locations) for shape (n, 2)
    print(locations)
This is the proper way to do this because appending to a Python list is cheap and happens in place, whereas a NumPy array has a fixed size, so np.append has to allocate and copy a whole new array on every call.
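To see where the error in the question comes from, it may help to compare shapes: np.empty(shape=[2,0]) has 2 rows and 0 columns, while [[i,x]] is a 1x2 array, so concatenating along axis=1 fails because the row counts (2 vs 1) differ. A minimal sketch of a variant that keeps np.append, assuming one (row, col) pair per row is the desired layout (the sample pairs are made up for illustration):

```python
import numpy as np

# Reproduce the shape mismatch from the question.
try:
    np.append(np.empty(shape=[2, 0]), [[0, 0]], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Start with 0 rows and 2 columns, then add one (1, 2) row at a time.
locations = np.empty((0, 2), dtype=int)
for pair in [(0, 0), (0, 1), (1, 2)]:  # hypothetical open cells
    locations = np.append(locations, [pair], axis=0)

print(mismatch_raised)   # True
print(locations.shape)   # (3, 2)
```

Each np.append call still copies the whole array, so the list-based version is likely the better choice for anything larger than a small board.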

In [530]: board = np.random.randint(0,2,(3,3))
In [531]: board
Out[531]:
array([[0, 0, 0],
       [1, 0, 1],
       [0, 1, 0]])
Looks like you are trying to collect the locations on the board where it is 0. argwhere does this nicely:
In [532]: np.argwhere(board==0)
Out[532]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 1],
       [2, 0],
       [2, 2]])
With the list append:
In [533]: alist = []
In [534]: for i in range(3):
     ...:     for j in range(3):
     ...:         if board[i,j]==0:
     ...:             alist.append([i,j])
     ...:
In [535]: alist
Out[535]: [[0, 0], [0, 1], [0, 2], [1, 1], [2, 0], [2, 2]]
argwhere actually uses np.nonzero to get a tuple of arrays which index the desired locations.
In [536]: np.nonzero(board==0)
Out[536]: (array([0, 0, 0, 1, 2, 2]), array([0, 1, 2, 1, 0, 2]))
Often this nonzero version is easier to use. For example it can be used directly to select all those cells:
In [537]: board[np.nonzero(board==0)]
Out[537]: array([0, 0, 0, 0, 0, 0])
and setting some of those to 1:
In [538]: board[np.nonzero(board==0)] = np.random.randint(0,2,6)
In [539]: board
Out[539]:
array([[0, 0, 1],
       [1, 0, 1],
       [1, 1, 1]])
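Putting this together, the function from the question could be reduced to a single np.argwhere call. This is only a sketch: the name possibilities and the sample moves are illustrative.

```python
import numpy as np

def possibilities(board):
    """Return an (n, 2) array of [row, col] indices of the open (0) cells."""
    return np.argwhere(board == 0)

board = np.zeros((3, 3), dtype=int)
board[1, 1] = 1   # player 1 takes the centre
board[0, 2] = 2   # player 2 takes the top-right corner
open_cells = possibilities(board)
print(len(open_cells))  # 7
```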


Diagonal array in numpy

Suppose I have the array [[1,0,0],[0,1,0],[0,0,1]] (let's call it So), created with numpy.eye(3).
How can I set the elements below the diagonal to 2 and 3, like this: [[1,0,0],[2,1,0],[3,2,1]]? How can I assign a different set of values to vectors of an array?
I know I could use numpy.concatenate to join 3 vectors, and I know how to change rows/columns, but I can't figure out how to change the diagonals below the main diagonal.
I tried np.diagonal(So,-1)=2*np.diagonal(So,-1) to change the diagonal right below the main diagonal, but I get the error message "cannot assign to function call".
I would not start from numpy.eye but rather numpy.ones and use numpy.tril+cumsum to compute the next numbers on the lower triangle:
import numpy as np
np.tril(np.ones((3,3))).cumsum(axis=0).astype(int)
output:
array([[1, 0, 0],
       [2, 1, 0],
       [3, 2, 1]])
Reversed output (from a comment), assuming the array is square:
n = 3
a = np.tril(np.ones((n,n)))
(a*(n+2)-np.eye(n)*n-a.cumsum(axis=0)).astype(int)
Output:
array([[1, 0, 0],
       [3, 1, 0],
       [2, 3, 1]])
Output for n=5:
array([[1, 0, 0, 0, 0],
       [5, 1, 0, 0, 0],
       [4, 5, 1, 0, 0],
       [3, 4, 5, 1, 0],
       [2, 3, 4, 5, 1]])
You can use np.fill_diagonal on a slice of the matrix chosen so that the diagonal you want becomes the principal diagonal of the slice. This is a good solution if you want to put in values other than 2 and 3:
import numpy as np
q = np.eye(3)
# For the first diagonal below the principal one, pass the view q[1:,:]
# (it is 2x3 rather than square, but fill_diagonal still works and writes into q)
val = 2
np.fill_diagonal(q[1:,:], val)
# Here you can use a single value 'val' or
# an array of values of the corresponding size:
# np.fill_diagonal(q[1:,:], [2, 2])
# Then do the same for the next diagonal down
np.fill_diagonal(q[2:,:], 3)
You could follow this approach:
def func(n):
    return np.array([np.array(list(range(i, 0, -1)) + [0,] * (n - i)) for i in range(1, n + 1)])
func(3)
Output:
array([[1, 0, 0],
       [2, 1, 0],
       [3, 2, 1]])
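Another possible formulation, shown here as a sketch rather than as any of the answerers' methods, builds the matrix as a sum of shifted identity matrices: np.eye(n, k=-k) has ones on the k-th diagonal below the main one, so scaling each by the value wanted on that diagonal gives the same result. The helper name lower_diagonals and its values argument are made up for this example.

```python
import numpy as np

def lower_diagonals(values):
    """Place values[k] on the k-th diagonal below the main one (k = 0 is the main diagonal)."""
    n = len(values)
    out = np.zeros((n, n), dtype=int)
    for k, v in enumerate(values):
        # np.eye with a negative k puts the ones k steps below the main diagonal.
        out += v * np.eye(n, k=-k, dtype=int)
    return out

result = lower_diagonals([1, 2, 3])
print(result)
```

Passing [1, 5, 4] instead would put 5 on the first sub-diagonal and 4 on the second, so arbitrary values per diagonal are possible.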

Looking to iterate through list of lists that skips over specific indices

I'm currently working on a Python 3 project that involves iterating several times through a list of lists, and I want to write a code that skips over specific indices of this list of lists. The specific indices are stored in a separate list of lists. I have written a small list of lists, grid and the values I do not want to iterate over, coordinates:
grid = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
coordinates = [[0, 2], [1, 1], [2, 0]]
Basically, I wish to skip over each 1 in grid(the 1s are just used to make the corresponding coordinate locations more visible).
I tried the following code to no avail:
for row in grid:
    for value in row:
        for coordinate in coordinates:
            if coordinate[0] != grid.index(row) and coordinate[1] != row.index(value):
                row[value] += 4
print(grid)
The expected output is: [[4, 4, 1], [4, 1, 4], [1, 4, 4]]
After executing the code, I am greeted with ValueError: 1 is not in list.
I have 2 questions:
1. Why am I being given this error message when each coordinate in coordinates contains a 0th and 1st position?
2. Is there a better way to solve this problem than using for loops?
There are two issues with your code.
1. row contains a list of integers, and value contains the values in those rows. The issue is that you need access to the indices of these values, not the values themselves. The way that you've set up your loops doesn't allow for that.
2. .index() returns the index of the first instance of the argument passed in; it is not a drop-in replacement for indexing with brackets.
Here is a code snippet that does what you've described, fixing both of the above issues:
grid = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
coordinates = [[0, 2], [1, 1], [2, 0]]
for row in range(len(grid)):
    for col in range(len(grid[row])):
        if [row, col] not in coordinates:
            grid[row][col] += 4
print(grid) # -> [[4, 4, 1], [4, 1, 4], [1, 4, 4]]
By the way, if you have a lot of coordinates, you can make it a set of tuples rather than a 2D list so you don't have to iterate over the entire list for each row / column index pair. The set would look like coordinates = {(0, 2), (1, 1), (2, 0)}, and you'd use if (row, col) not in coordinates: as opposed to if [row, col] not in coordinates: if you were using a set instead.
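As a sketch of that suggestion, here is the same loop with the coordinates stored as a set of tuples (the variable name skip is chosen here for illustration):

```python
grid = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
skip = {(0, 2), (1, 1), (2, 0)}  # set of (row, col) tuples: hashed membership tests

for row in range(len(grid)):
    for col in range(len(grid[row])):
        if (row, col) not in skip:
            grid[row][col] += 4

print(grid)  # [[4, 4, 1], [4, 1, 4], [1, 4, 4]]
```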
Here is a numpy way to do this, for anyone looking -
import numpy as np
g = np.array(grid)
c = np.array(coordinates)
mask = np.ones(g.shape, bool)
mask[tuple(c.T)] = False
#This mask skips each coordinate in the list
g[mask]+=4
print(g)
[[4 4 1]
 [4 1 4]
 [1 4 4]]
And a one-liner list comprehension, for those who prefer that -
[[j+4 if [row,col] not in coordinates else j for col,j in enumerate(i)] for row,i in enumerate(grid)]
[[4, 4, 1], [4, 1, 4], [1, 4, 4]]
grid = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0]]
coordinates = [[0, 2], [1, 1], [2, 0]]
for y, row in enumerate(grid):
    for x, value in enumerate(row):
        # coordinates holds [row, col] pairs, i.e. [y, x]; skip those cells
        if [y, x] not in coordinates:
            grid[y][x] += 4
print(grid) # -> [[4, 4, 1], [4, 1, 4], [1, 4, 4]]

Searching values in matrix in python

I am trying to solve a task involving a matrix. I have a function fill_area(matrix, coordinates, value). If I pass, for example, coordinates (1, 0) and value 3, it should write 3 at position [1][0]. If neighbouring numbers have the same value as the original one at that position, it should rewrite them to 3 as well. I would like to use a stack or queue for the positions whose value has to change.
But I have no idea how to check all the neighbouring positions.
matrix = [[2, 0, 1],
          [0, 0, 1],
          [0, 1, 1]]
fill_area(matrix, (1, 0), 3)
Expected result:
matrix = [[2, 0, 1],
          [3, 3, 1],
          [3, 1, 1]]
The preferred way to handle matrices is numpy but there seems to be no obvious way to handle this special case while avoiding loops over items in numpy. I take it from your example that you define "nearness" as the patch around the given coordinates but in coordinate-direction only. I have two versions, one for the entire coordinate-patch around the given coordinates, the other in ij-direction only (per your example). Both are list comprehensions, for better readability you might want to change that to regular loops.
def fill_area(mat, ij, val):
    '''set patch around ij in mat to val for all positions where the original mat value == mat val at ij, full patch'''
    return [[val if abs(i-ij[0])<2 and abs(j-ij[1])<2 and mat[i][j] == mat[ij[0]][ij[1]]
             else mat[i][j] for j in range(len(mat[1]))] for i in range(len(mat))]
def fill_area1(mat, ij, val):
    '''set patch around ij in mat to val for all positions where the original mat value == mat val at ij, patch ij-dir'''
    return [[val if ((abs(i-ij[0])<2 and j==ij[1]) or (abs(j-ij[1])<2 and i==ij[0])) and mat[i][j] == mat[ij[0]][ij[1]]
             else mat[i][j] for j in range(len(mat[1]))] for i in range(len(mat))]
matrix = [[2, 0, 1],
          [0, 0, 1],
          [0, 1, 1]]
print('entire patch: ', fill_area(matrix, (1, 0), 3))
print('ij-directional: ', fill_area1(matrix, (1, 0), 3))
which produces
entire patch: [[2, 3, 1], [3, 3, 1], [3, 1, 1]]
ij-directional: [[2, 0, 1], [3, 3, 1], [3, 1, 1]]
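Since the question mentions a stack or queue, here is a minimal sketch of a queue-based flood fill. It is a different technique from the list comprehensions above: it keeps spreading through 4-connected neighbours that hold the original value instead of stopping one step from the start. That is an assumption about the intended behaviour; on the sample matrix it also rewrites position [0][1], which the expected output in the question leaves at 0. The name flood_fill is made up for this example.

```python
from collections import deque

def flood_fill(matrix, start, value):
    """Return a copy of matrix with the 4-connected region around start set to value."""
    result = [row[:] for row in matrix]
    r0, c0 = start
    old = result[r0][c0]
    if old == value:          # nothing to change, and avoids an endless loop
        return result
    queue = deque([(r0, c0)])
    result[r0][c0] = value
    while queue:
        r, c = queue.popleft()
        # Visit the up, down, left and right neighbours.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < len(result) and 0 <= nc < len(result[nr]) and result[nr][nc] == old:
                result[nr][nc] = value
                queue.append((nr, nc))
    return result

matrix = [[2, 0, 1],
          [0, 0, 1],
          [0, 1, 1]]
filled = flood_fill(matrix, (1, 0), 3)
print(filled)  # [[2, 3, 1], [3, 3, 1], [3, 1, 1]]
```

Replacing popleft() with pop() turns the queue into a stack (depth-first instead of breadth-first); the filled region is the same either way.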

Compare numpy arrays of different sizes and find index that matches in Python 3 [duplicate]

I have an array X:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
And I wish to find the index of the row of several values in this array:
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
For this example I would like a result like:
[0,3,4]
I have a code doing this, but I think it is overly complicated:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
result = []
for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
Approach #2
A memory efficient approach would be to convert each row to its linear index equivalent and then use np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),
                       np.ravel_multi_index(searched_values.T,dims)))[0]
Approach #3
Another memory efficient approach using np.searchsorted and with that same philosophy of converting to linear index equivalents would be like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
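If some rows of searched_values may be absent from X, the searchsorted result can be checked before it is used. The following is a sketch under stated assumptions, not part of the original answer: it assumes non-negative integers, and the function name find_rows and its missing parameter are made up here.

```python
import numpy as np

def find_rows(X, searched, missing=-1):
    # Size the grid from both arrays so every row maps to a valid linear index.
    dims = np.maximum(X.max(axis=0), searched.max(axis=0)) + 1
    x1d = np.ravel_multi_index(X.T, dims)
    s1d = np.ravel_multi_index(searched.T, dims)
    sidx = x1d.argsort()
    pos = np.searchsorted(x1d, s1d, sorter=sidx)
    pos = np.clip(pos, 0, len(x1d) - 1)   # searchsorted may return len(x1d)
    candidates = sidx[pos]
    # Keep a candidate only if it really holds the searched row.
    return np.where(x1d[candidates] == s1d, candidates, missing)

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6], [0, 0]])
result = find_rows(X, searched_values)
print(result)  # [ 0  3  4 -1]
```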
How does np.ravel_multi_index work?
This function gives us the linear index equivalents. It accepts a 2D array of n-dimensional indices, arranged as columns, together with the shape of the n-dimensional grid onto which those indices are to be mapped, and computes the equivalent linear indices.
Let's use the inputs we have for the problem at hand. Take the case of input X and note the first row of it. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index treats each column as one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row in X is 2 in this case, the n-dimensional grid to be mapped onto is 2D. With 3 elements per row in X, it would have been a 3D grid, and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed by hand if row-major ordering is taken into account: 4 full rows of 7 elements each, plus 2, gives 4*7 + 2 = 30.
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
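The row-major rule behind these numbers can be written out by hand: for a grid of shape (n_rows, n_cols), the tuple (i, j) maps to i * n_cols + j. A short check against np.ravel_multi_index, using the X from the question (the names manual and library are just for this example):

```python
import numpy as np

X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
dims = X.max(axis=0) + 1                    # array([10, 7])

manual = X[:, 0] * dims[1] + X[:, 1]        # i * n_cols + j for every row
library = np.ravel_multi_index(X.T, dims)

print(manual)   # [30 66 61 24 41]
```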
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind considering each row of X as indexing tuple of a n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
Now, as discussed in the previous section, we are considering each row as an indexing tuple. Within each such indexing tuple, the first element represents the first axis of the n-dim grid, the second element the second axis, and so on until the last element of each row in X. In essence, each column represents one dimension or axis of the grid. If we are to map all elements from X onto the same n-dim grid, we need to consider the maximum extent of each axis of that grid. Assuming we are dealing with non-negative numbers in X, that extent is the maximum of each column in X, plus 1. The + 1 is there because Python uses 0-based indexing. So, for example, X[1,0] == 9 maps to the 10th row of the proposed grid. Similarly, X[4,1] == 6 goes to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of at least a shape of (10,7) for our sample case. More lengths along the dimensions won't hurt and would give us unique linear indices too.
Concluding remarks : One important thing to be noted here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples as positive numbers before using np.ravel_multi_index.
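A sketch of such an offset, assuming both arrays are shifted by the same per-column minimum so that equal rows still map to equal linear indices (the sample data with negative entries is made up, and np.isin is used here for the membership test):

```python
import numpy as np

X = np.array([[-4, 2], [9, -3], [0, 0]])
searched_values = np.array([[0, 0], [-4, 2]])

# One shared offset per column, taken over both arrays.
offset = np.minimum(X.min(axis=0), searched_values.min(axis=0))
Xs = X - offset
Ss = searched_values - offset
dims = np.maximum(Xs.max(axis=0), Ss.max(axis=0)) + 1

x1d = np.ravel_multi_index(Xs.T, dims)
s1d = np.ravel_multi_index(Ss.T, dims)
idx = np.flatnonzero(np.isin(x1d, s1d))
print(idx)  # [0 2]
```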
Another alternative is to use asvoid (below) to view each row as a single value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed which treat
    entire rows as one value.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        # Care needs to be taken here since
        # np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
        # Adding 0. converts -0. to 0.
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that with the 'missing' kwarg you have full control over the behavior for missing items, and it works for nd-arrays (e.g. stacks of images) as well.
Update: using the same shapes as @Rik, X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well, using numpy and hashlib. It can handle high-dimensional matrices or images in seconds. I ran it on a 520000 x (28 x 28) array and a 20000 x (28 x 28) array in 2 seconds on my CPU.
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
# hashing each row with sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
# use unique to get unique indices into the in1d results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
# compute unique indices by indexing an array of indices
idx=np.array(range(len(xhash)))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
 [3 3]
 [4 2]]
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
S = np.array([[4, 2],
              [3, 3],
              [5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use the cdist function from scipy.spatial.distance, like this:
from scipy.spatial.distance import cdist
np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get the row numbers of X which have distance zero to a row in searched_values, meaning they are equal. It makes sense if you look at rows as coordinates.
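I had a similar requirement and the following worked for me. Note that np.isin tests each element against all values in searched_values rather than comparing whole rows, so it is only reliable when no other row of X is made up entirely of values that occur somewhere in searched_values: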
np.argwhere(np.isin(X, searched_values).all(axis=1))
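Because np.isin works element by element, this one-liner can report rows that are not in searched_values. A small check with made-up data, next to a broadcasting comparison that matches complete rows (the names loose and exact are just for this example):

```python
import numpy as np

X = np.array([[4, 2], [2, 4], [3, 5], [9, 3]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# Element-wise membership: [2, 4] and [3, 5] pass although they are not searched rows.
loose = np.flatnonzero(np.isin(X, searched_values).all(axis=1))

# Whole-row comparison via broadcasting: (4, 1, 2) against (3, 2).
exact = np.flatnonzero((X[:, None] == searched_values).all(-1).any(1))

print(loose)  # [0 1 2]
print(exact)  # [0]
```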
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
    # one boolean row per searched point, marking where it matches a row of orig
    equals = [np.equal(orig, p).all(1) for p in search]
    exists = np.max(equals, axis=1)
    indices = np.argmax(equals, axis=1)
    # argmax returns 0 when there is no match, so mark those as -1
    indices[~exists] = -1
    return indices
test:
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6],
                            [0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]
