Numpy Indexing and creation of new array - python

I am trying to understand what is going on inside the print statement.
I know that indexing is being used to create a 2D array, but I don't understand the order in which the elements are selected:
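import numpy as np
x = np.arange(0, 2*np.pi, 0.001)
# `y` was undefined in the original snippet; a row of ones with the same length as x is assumed
X = np.concatenate(([x], [np.ones(x.shape[0])]), axis=0)
print(X[[[0,1],[0,1]],[[0,0],[-1,-1]]])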
Produces:
array([[0. , 1. ],
[6.283, 1. ]])

I looked into the documentation, and I think the following example from there explains this kind of indexing:
From a 4x3 array the corner elements should be selected using advanced indexing. Thus all elements for which the column is one of [0, 2] and the row is one of [0, 3] need to be selected. To use advanced indexing one needs to select all elements explicitly. Using the method explained previously one could write:
x = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8],
[9, 10, 11]])
rows = np.array([[0, 0],
[3, 3]], dtype=np.intp)
columns = np.array([[0, 2],
[0, 2]], dtype=np.intp)
x[rows, columns]
The last line gives
array([[0, 2],
[9, 11]])
The code in your question performs the same kind of operation as that example (with other values), but passes the index lists directly rather than creating rows and columns first.
X[[[0, 1], [0, 1]], [[0, 0], [-1, -1]]] pairs the two index arrays element by element. Counting from 0, it selects X[0, 0] and X[1, 0] for the first output row and X[0, -1] and X[1, -1] for the second; that is, rows 0 and 1 taken first from column 0 and then from the last column.
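To make the pairing explicit, here is a minimal sketch. It assumes, as in the question, that X holds x in its first row and ones in its second; the names rows, cols, picked and manual are only illustrative.

```python
import numpy as np

x = np.arange(0, 2 * np.pi, 0.001)
X = np.vstack([x, np.ones_like(x)])   # shape (2, len(x))

rows = np.array([[0, 1], [0, 1]])
cols = np.array([[0, 0], [-1, -1]])

picked = X[rows, cols]

# The same selection written out element by element:
# output position (i, j) holds X[rows[i, j], cols[i, j]].
manual = np.array([[X[0, 0], X[1, 0]],
                   [X[0, -1], X[1, -1]]])

print(np.array_equal(picked, manual))
```

The result has the shape of the index arrays (2, 2), not the shape of X.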

Related

What is the use of the data type intp?

The data type intp is mentioned in the following table:
Also, what does indexing mean?
Integer array indexing
Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.
Purely integer array indexing
When the index consists of as many integer arrays as the array being indexed has dimensions, the indexing is straight forward, but different from slicing.
Advanced indexes always are broadcast and iterated as one:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
Note that the result shape is identical to the (broadcast) indexing array shapes ind_1, ..., ind_N.
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
To achieve a behaviour similar to the basic slicing above, broadcasting can be used. The function ix_ can help with this broadcasting. This is best understood with an example.
Example
From a 4x3 array the corner elements should be selected using advanced indexing. Thus all elements for which the column is one of [0, 2] and the row is one of [0, 3] need to be selected. To use advanced indexing one needs to select all elements explicitly. Using the method explained previously one could write:
x = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
rows = np.array([[0, 0],
[3, 3]], dtype=np.intp)
columns = np.array([[0, 2],
[0, 2]], dtype=np.intp)
x[rows, columns]
array([[ 0, 2],
[ 9, 11]])
However, since the indexing arrays above just repeat themselves, broadcasting can be used (compare operations such as rows[:, np.newaxis] + columns) to simplify this:
rows = np.array([0, 3], dtype=np.intp)
columns = np.array([0, 2], dtype=np.intp)
rows[:, np.newaxis]
array([[0],
[3]])
x[rows[:, np.newaxis], columns]
array([[ 0, 2],
[ 9, 11]])
This broadcasting can also be achieved using the function ix_:
x[np.ix_(rows, columns)]
array([[ 0, 2],
[ 9, 11]])
Note that without the np.ix_ call, only the diagonal elements would be selected, as was used in the previous example. This difference is the most important thing to remember about indexing with multiple advanced indexes.
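As a small sketch of that difference (using the same 4x3 array, built here with np.arange for brevity; paired and grid are illustrative names):

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
rows = np.array([0, 3])
columns = np.array([0, 2])

# Without ix_: the two index arrays are paired up, giving x[0, 0] and x[3, 2].
paired = x[rows, columns]

# With ix_: every combination of the given rows and columns is selected.
grid = x[np.ix_(rows, columns)]

print(paired)
print(grid)
```

Here paired is one-dimensional with two elements, while grid is the 2x2 array of corner elements.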
Please read this.
REF: https://numpy.org/doc/stable/reference/arrays.indexing.html
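As for intp itself: NumPy documents it as the integer type used for indexing (normally int32 or int64, depending on the platform). A short sketch, with illustrative variable names, showing that index-producing functions return it and that other integer dtypes are accepted as indices too:

```python
import numpy as np

# Functions that produce indices, such as argsort, return arrays of dtype intp.
order = np.argsort(np.array([30, 10, 20]))
print(order.dtype == np.dtype(np.intp))

# Passing dtype=np.intp is not required for indexing to work;
# another integer dtype is accepted as an index array as well.
x = np.array([10, 20, 30])
idx32 = np.array([0, 2], dtype=np.int32)
print(x[idx32])
```

So dtype=np.intp in the documentation example simply states the native index type explicitly; plain integer lists or arrays behave the same way.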

axis in numpy.ufunc.outer

numpy.ufunc.outer is like Mathematica Outer[], but what I can't seem to figure out is how to treat a two dimensional array as a one dimensional array of one dimensional arrays. That is, suppose
a = [[1, 2], [3, 4]] and b = [[4, 5], [6, 7]]. I want to compute the two dimensional array whose ijth element is the distance (however I define it) between the ith row of a and the jth row of b, so in this case, if we use the supnorm distance, the result will be [[3, 5], [1, 3]]. Obviously one can write a loop, but that seems morally wrong, and precisely what ufunc.outer is meant to avoid.
In [309]: a = np.array([[1, 2], [3, 4]]); b = np.array([[4, 5], [6, 7]])
With broadcasting we can take the row differences:
In [310]: a[:,None,:]-b[None,:,:]
Out[310]:
array([[[-3, -3],
[-5, -5]],
[[-1, -1],
[-3, -3]]])
and reduce those with abs/max on the last axis (I think that's what you mean by the sup norm):
In [311]: np.abs(a[:,None,:]-b[None,:,:]).max(axis=-1)
Out[311]:
array([[3, 5],
[1, 3]])
With subtract.outer, I have to select a subset of the results, and then transpose:
In [318]: np.subtract.outer(a,b)[:,[0,1],:,[0,1]].transpose(2,1,0)
Out[318]:
array([[[-3, -3],
[-1, -1]],
[[-5, -5],
[-3, -3]]])
I don't see any axis controls in the outer docs. Since broadcasting gives finer control, I haven't seen much use of the ufunc.outer feature.
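The broadcasting version can be wrapped in a small helper. This is only a sketch; pairwise_supnorm is a hypothetical name, not a NumPy function.

```python
import numpy as np

def pairwise_supnorm(a, b):
    """Return d with d[i, j] = max_k |a[i, k] - b[j, k]|."""
    a = np.asarray(a)
    b = np.asarray(b)
    # a[:, None, :] has shape (m, 1, k) and b[None, :, :] has shape (1, n, k);
    # broadcasting gives all row-against-row differences, shape (m, n, k).
    diff = a[:, None, :] - b[None, :, :]
    return np.abs(diff).max(axis=-1)

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 5], [6, 7]])
dist = pairwise_supnorm(a, b)
print(dist)
```

Swapping np.abs(...).max for another reduction gives other row-to-row distances. If SciPy is available, scipy.spatial.distance.cdist with metric='chebyshev' computes the same sup-norm distance.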

Compare numpy arrays of different sizes and find index that matches in Python 3 [duplicate]

I have an array X:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
And I wish to find the row index of each of several rows in this array:
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
For this example I would like a result like:
[0,3,4]
I have code that does this, but I think it is overly complicated:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
result = []
for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)
print(result)
I found this answer for a similar question but it works only for 1d arrays.
Is there a way to do what I want in a simpler way?
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
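To see what that one-liner does, here it is split into steps with the intermediate shapes (a sketch; eq, row_match, s_idx and x_idx are illustrative names):

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

# searched_values[:, None] has shape (3, 1, 2); comparing against X (5, 2)
# broadcasts to (3, 5, 2): every searched row against every row of X.
eq = X == searched_values[:, None]

# A row matches only if all of its elements are equal: shape (3, 5).
row_match = eq.all(-1)

# s_idx indexes searched_values, x_idx indexes X.
s_idx, x_idx = np.where(row_match)
print(x_idx)
```

The intermediate boolean array has len(searched_values) * len(X) * ncols elements, which is why the later approaches are described as more memory efficient.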
Approach #2
A memory-efficient approach would be to convert each row to its linear index equivalent and then use np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),\
np.ravel_multi_index(searched_values.T,dims)))[0]
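Recent NumPy releases steer users from np.in1d towards np.isin, so the same idea might be written as follows (a sketch under that assumption; x_lin and s_lin are illustrative names):

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6]])

dims = X.max(0) + 1
x_lin = np.ravel_multi_index(X.T, dims)
s_lin = np.ravel_multi_index(searched_values.T, dims)

# Indices of the rows of X whose linear index occurs among the searched rows.
out = np.flatnonzero(np.isin(x_lin, s_lin))
print(out)
```

Note that with this approach the indices come back in the order of X, not in the order of searched_values.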
Approach #3
Another memory-efficient approach, using np.searchsorted with the same idea of converting to linear index equivalents, would be -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
Please note that this np.searchsorted method assumes there is a match for each row from searched_values in X.
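If some searched rows may be absent from X, one possible extension is to verify each candidate and mark misses with -1. This is a sketch, not part of the approach above; it assumes non-negative values and computes dims from both arrays so that np.ravel_multi_index does not reject out-of-range values.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[4, 2], [3, 3], [5, 6], [0, 0]])  # last row is not in X

# The grid must be large enough for both arrays.
dims = np.maximum(X.max(0), searched_values.max(0)) + 1
X1D = np.ravel_multi_index(X.T, dims)
S1D = np.ravel_multi_index(searched_values.T, dims)

sidx = X1D.argsort()
pos = np.searchsorted(X1D, S1D, sorter=sidx)
pos = np.clip(pos, 0, len(X1D) - 1)   # searchsorted can return len(X1D)
candidates = sidx[pos]

# Keep a candidate only if it really is an exact match.
out = np.where(X1D[candidates] == S1D, candidates, -1)
print(out)
```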
How does np.ravel_multi_index work?
This function gives us the linear index equivalents. It accepts a 2D array of n-dimensional indices, arranged as columns, together with the shape of the n-dimensional grid onto which those indices are mapped, and it computes the equivalent linear indices.
Let's use the inputs we have for the problem at hand. Take the case of input X and note its first row. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index treats each column as one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row in X is 2 in this case, the n-dimensional grid to be mapped onto is 2D. With 3 elements per row in X, it would have been a 3D grid, and so on.
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row from X into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed if row-major ordering is taken into account.
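Spelled out, the row-major computation for a grid with 7 columns is row * 7 + col. A small sketch (by_hand, by_numpy and back are illustrative names):

```python
import numpy as np

dims = (10, 7)
row, col = 4, 2

# Row-major (C) order: each full row of the grid contributes dims[1] elements.
by_hand = row * dims[1] + col

by_numpy = np.ravel_multi_index((row, col), dims)

# np.unravel_index is the inverse mapping.
back = np.unravel_index(by_numpy, dims)

print(by_hand, by_numpy, back)
```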
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row from X.
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind treating each row of X as an indexing tuple into an n-dimensional grid and converting each such tuple to a scalar is to obtain unique scalars for unique tuples, i.e. for unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
Now, as discussed in the previous section, we treat each row as an indexing tuple. Within each tuple, the first element refers to the first axis of the n-dim grid, the second element to the second axis, and so on up to the last element of each row in X. In essence, each column represents one dimension or axis of the grid. To map all elements of X onto the same n-dim grid, we need the maximum extent of each axis of that grid. Assuming we are dealing with non-negative numbers in X, that extent is the maximum of each column in X, plus 1. The + 1 is there because Python uses 0-based indexing. So, for example, X[1,0] == 9 maps to the 10th row of the proposed grid. Similarly, X[4,1] == 6 goes to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid of at least shape (10,7) for our sample case. Larger lengths along any dimension won't hurt and would still give us unique linear indices.
Concluding remarks: One important thing to note here is that if we have negative numbers in X, we need to add proper offsets along each column of X to make those indexing tuples non-negative before using np.ravel_multi_index.
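One way to apply such offsets is to subtract the per-column minimum taken over both arrays, so that the same shift is used for X and for the searched rows. This is a sketch with made-up data; both, offset, x_lin and s_lin are illustrative names.

```python
import numpy as np

X = np.array([[-4, 2], [9, -3], [0, 5]])
searched_values = np.array([[0, 5], [-4, 2]])

# Use one common offset and one common grid for both arrays.
both = np.vstack([X, searched_values])
offset = both.min(axis=0)                  # per-column minimum
dims = (both - offset).max(axis=0) + 1     # grid shape after shifting

x_lin = np.ravel_multi_index((X - offset).T, dims)
s_lin = np.ravel_multi_index((searched_values - offset).T, dims)

out = np.flatnonzero(np.isin(x_lin, s_lin))
print(out)
```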
Another alternative is to use asvoid (below) to view each row as a single value of void dtype. This reduces a 2D array to a 1D array, thus allowing you to use np.in1d as usual:
import numpy as np
def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed which treat
    entire rows as one value.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        """ Care needs to be taken here since
        np.array([-0.]).view(np.void) != np.array([0.]).view(np.void)
        Adding 0. converts -0. to 0.
        """
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]
The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:
import numpy_indexed as npi
result = npi.indices(X, searched_values)
Note that using the 'missing' kwarg, you have full control over the behavior for missing items, and it works for nd-arrays (e.g. stacks of images) as well.
Update: using the same shapes as @Rik, X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
Here is a pretty fast solution that scales up well, using numpy and hashlib. It can handle high-dimensional matrices or images in seconds. I ran it on a 520000 x (28 x 28) array against a 20000 x (28 x 28) array in 2 seconds on my CPU.
Code:
import numpy as np
import hashlib
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6]])
#hash using sha1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]
z=np.in1d(xhash,yhash)
##Use unique to get unique indices into the in1d results
_,unique=np.unique(np.array(xhash)[z],return_index=True)
##Compute unique indices by indexing an array of indices
idx=np.array(range(len(xhash)))
unique_idx=idx[z][unique]
print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])
Output:
unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
[3 3]
[4 2]]
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
S = np.array([[4, 2],
[3, 3],
[5, 6]])
result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]
or
result = [i for s in S for i,row in enumerate(X) if (s==row).all()]
if you want a flat list (assuming there is exactly one match per searched value).
Another way is to use the cdist function from scipy.spatial.distance, like this:
np.nonzero(cdist(X, searched_values) == 0)[0]
Basically, we get the row numbers of X that have distance zero to a row in searched_values, meaning they are equal. This makes sense if you view the rows as coordinates.
I had a similar requirement, and the following worked for me:
np.argwhere(np.isin(X, searched_values).all(axis=1))
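One caveat with this: np.isin tests each element for membership in the whole of searched_values, not each row as a unit, so a row of X can match as soon as its individual values occur anywhere among the searched values. A small sketch with made-up data:

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [2, 4]])
searched_values = np.array([[4, 2]])

# Element-wise membership: [2, 4] passes because both 2 and 4 occur in searched_values.
elementwise = np.isin(X, searched_values).all(axis=1)

# Row-wise comparison: only the exact row [4, 2] matches.
rowwise = (X[:, None] == searched_values).all(-1).any(1)

print(elementwise)
print(rowwise)
```

It gives the expected result for the sample data in the question, but it should not be relied on in general.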
Here's what worked out for me:
def find_points(orig: np.ndarray, search: np.ndarray) -> np.ndarray:
    equals = [np.equal(orig, p).all(1) for p in search]
    exists = np.max(equals, axis=1)
    indices = np.argmax(equals, axis=1)
    indices[exists == False] = -1
    return indices
test:
X = np.array([[4, 2],
[9, 3],
[8, 5],
[3, 3],
[5, 6]])
searched_values = np.array([[4, 2],
[3, 3],
[5, 6],
[0, 0]])
find_points(X, searched_values)
output:
[0,3,4,-1]

Indexing numpy multidimensional arrays depends on a slicing method

I have a 3-D array. When I take a 2-D slice of it, the result depends on whether it is indexed with a list or with a slice. In the first case the result is transposed. I didn't find this behaviour in the manual.
>>> import numpy as np
>>> x = np.array([[[1,1,1],[2,2,2]], [[3,3,3],[4,4,4]]])
>>> x
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
>>> x[0,:,[0,1]]
array([[1, 2],
[1, 2]])
>>> x[0,:,slice(2)]
array([[1, 1],
[2, 2]])
>>>
Could anyone point out the rationale for this?
Because you are actually using advanced indexing when you use [0,1]. From the docs:
Combining advanced and basic indexing
When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element.
In the simplest case, there is only a single advanced index. A single advanced index can for example replace a slice and the result array will be the same, however, it is a copy and may have a different memory layout. A slice is preferable when it is possible.
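Pay attention to two parts of this passage: the condition "at least one slice (:), ellipsis (...) or np.newaxis in the index", and the statement that the result is "like concatenating the indexing result for each advanced index element".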
In particular, in this construction:
>>> x[0,:,[0,1]]
array([[1, 2],
[1, 2]])
This is the case where there is at least one "slice, ellipsis, or np.newaxis" in the index, and the behavior is like concatenating the indexing result for each advanced index element. So:
>>> x[0,:,[0]]
array([[1, 2]])
>>> x[0,:,[1]]
array([[1, 2]])
>>> np.concatenate((x[0,:,[0]], x[0,:,[1]]))
array([[1, 2],
[1, 2]])
However, this construction uses only basic indexing (an integer and slices, no list), so no axis is moved and the result keeps the original axis order:
>>> x[0,:,slice(2)]
array([[1, 1],
[2, 2]])
>>> x[slice(0,1),:,slice(2)]
array([[[1, 1],
[2, 2]]])
Note, though, that the latter is actually three-dimensional: the first part of the index is a slice rather than an integer, so all three indices are slices and all three dimensions are kept.
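A non-square array makes the axis movement easier to see. In the sketch below (made-up array y; mixed, basic and two_step are illustrative names), the integer 0 and the list [0, 1] are both treated as advanced indexes; because a slice sits between them, the axis they produce is placed first in the result.

```python
import numpy as np

y = np.arange(24).reshape(2, 3, 4)

mixed = y[0, :, [0, 1]]      # advanced indexes separated by a slice
basic = y[0, :, 0:2]         # basic indexing only
two_step = y[0][:, [0, 1]]   # take y[0] first, then index its columns

print(mixed.shape)     # the list's axis (length 2) comes first
print(basic.shape)     # original axis order is kept
print(two_step.shape)
```

Indexing in two steps, as in two_step, is one way to use a list of columns while keeping the original axis order.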
As I understand it, when given a list/tuple-like index, NumPy builds the result by following the axis numbering.
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
Once the first two indices are specified (x[0, :, ]), the next question is how to extract along the third dimension. When you give a list or tuple such as (0, 1), NumPy first extracts index 0 along that axis, which gives [1, 2], then extracts index 1 in the same way and stacks it below the existing row [1, 2].
[[1, 1, 1], array([[1, 2],
[2, 2, 2]] =====> [1, 2]])
(visualize this stacking as below (not on top of) the already existing row since axis-0 grows downwards)
Alternatively, when slice(n) is given for the index, it follows the slicing convention (start:stop:step). Note that slice(2) is equivalent to 0:2 in your example. So it extracts [1, 1] first, then [2, 2]. Note how [1, 1] comes on top of [2, 2] here, again following the axis order, since the third dimension stays in place. This is why this result is the transpose of the other.
array([[1, 1],
[2, 2]])
Also note that this behaviour is consistent for arrays of three or more dimensions. Below is an example with a 4-D array (xa = np.arange(36).reshape(2, 2, 3, 3)) and the corresponding results.
In [327]: xa
Out[327]:
array([[[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]],
[[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]],
[[27, 28, 29],
[30, 31, 32],
[33, 34, 35]]]])
In [328]: xa[0, 0, :, [0, 1]]
Out[328]:
array([[0, 3, 6],
[1, 4, 7]])
In [329]: xa[0, 0, :, 0:2]
Out[329]:
array([[0, 1],
[3, 4],
[6, 7]])

Python Matrix sorting via one column

I have an n x 2 matrix of integers. The first column is a series 0, 1, -1, 2, -2; however, these are in the order in which they were compiled from their constituent matrices. The second column is a list of indices from another list.
I would like to sort the matrix by this second column. This would be equivalent to selecting two columns of data in Excel and sorting by Column B (where the data is in columns A and B). Keep in mind that the data in the first column of each row should stay with its second-column counterpart. I have looked at solutions using the following:
data[np.argsort(data[:, 0])]
But this does not seem to work. The matrix in question looks like this:
matrix([[1, 1],
[1, 3],
[1, 7],
...,
[2, 1021],
[2, 1040],
[2, 1052]])
You could use np.lexsort:
numpy.lexsort(keys, axis=-1)
Perform an indirect sort using a sequence of keys.
Given multiple sorting keys, which can be interpreted as columns in a spreadsheet, lexsort returns an array of integer indices that describes the sort order by multiple columns.
In [13]: data = np.matrix(np.arange(10)[::-1].reshape(-1,2))
In [14]: data
Out[14]:
matrix([[9, 8],
[7, 6],
[5, 4],
[3, 2],
[1, 0]])
In [15]: temp = data.view(np.ndarray)
In [16]: np.lexsort((temp[:, 1], ))
Out[16]: array([4, 3, 2, 1, 0])
In [17]: temp[np.lexsort((temp[:, 1], ))]
Out[17]:
array([[1, 0],
[3, 2],
[5, 4],
[7, 6],
[9, 8]])
Note if you pass more than one key to np.lexsort, the last key is the primary key. The next to last key is the second key, and so on.
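A short sketch of the key order with two keys (made-up data with ties in the second column; order and sorted_data are illustrative names):

```python
import numpy as np

data = np.array([[1, 3],
                 [2, 1],
                 [0, 3],
                 [5, 1]])

# Last key is the primary key: sort by column 1, break ties by column 0.
order = np.lexsort((data[:, 0], data[:, 1]))
sorted_data = data[order]

print(order)
print(sorted_data)
```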
Using np.lexsort as shown above requires a temporary array because np.lexsort does not work on numpy matrices. Since temp = data.view(np.ndarray) creates a view rather than a copy of data, it does not require much extra memory. However, temp[np.lexsort((temp[:, 1], ))] is a new array, which does require more memory.
There is also a way to sort by columns in-place. The idea is to view the array as a structured array with two fields. For structured arrays, the sort method accepts an order argument that lets you specify fields as keys:
In [65]: data.dtype
Out[65]: dtype('int32')
In [66]: temp2 = data.ravel().view('int32, int32')
In [67]: temp2.sort(order = ['f1', 'f0'])
Notice that since temp2 is a view of data, it does not require allocating new memory and copying the array. Also, sorting temp2 modifies data at the same time:
In [69]: data
Out[69]:
matrix([[1, 0],
[3, 2],
[5, 4],
[7, 6],
[9, 8]])
You had the right idea, just off by a few characters:
>>> import numpy as np
>>> data = np.matrix([[9, 8],
... [7, 6],
... [5, 4],
... [3, 2],
... [1, 0]])
>>> data[np.argsort(data.A[:, 1])]
matrix([[1, 0],
[3, 2],
[5, 4],
[7, 6],
[9, 8]])
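The reason .A (or a plain ndarray) is needed: slicing a column out of an np.matrix still gives a 2-D object of shape (n, 1), so argsort runs along an axis of length 1 and returns only zeros. A sketch with a smaller made-up matrix (bad_order, arr and good are illustrative names):

```python
import numpy as np

m = np.matrix([[9, 8],
               [7, 6],
               [5, 4]])

# A matrix column is still 2-D, so argsort works along the last axis (length 1).
bad_order = np.asarray(np.argsort(m[:, 1]))
print(bad_order.shape, bad_order.ravel())

# Converting to a plain ndarray first gives a 1-D column and a usable order.
arr = np.asarray(m)
good = arr[np.argsort(arr[:, 1], kind="stable")]
print(good)
```

kind="stable" is optional here; it keeps rows with equal keys in their original relative order.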
