Finding Element in Array Indexing Syntax Explanation

I have an array in Python that looks like this:
array = [[UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7'), array([['1', '1']], dtype='<U21')], [UUID('9cb1feb6-0ef4-4e15-9070-7735545d12c9'), array([['2', '1']], dtype='<U21')], [UUID('955d308b-3570-4166-895e-81a077e6b9f9'), array([['3', '1']], dtype='<U21')]]
I also have a query that looks like this:
query = UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7')
I am trying to find the sub-array in the main array that contains this UUID. So, querying this would return:
[UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7'), array([['1', '1']], dtype='<U21')]
I found this syntax online to do this:
out = array[array[:, 0] == query]
I know that this only works in NumPy if the array itself is an NP array. But why does this work and how? I am extremely confused by the syntax.

You might want to read the numpy tutorial on indexing ndarrays.
Here are some basic explanations; working through the examples is a
good starting point.
So you have an array:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
The most basic indexing is with slices, as for basic lists, but you can access
nested dimensions with tuples (arr[i1:i2, j1:j2]) instead of chained indexing
as with basic lists (arr[i1:i2][j1:j2]):
arr[0] # array([1, 2, 3])
arr[:, 0] # array([1, 4])
arr[0, 0] # 1
Another way of indexing with numpy is to use boolean arrays.
You can use a tuple of lists of booleans, one list per dimension, each list
having the same size as the dimension.
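You can also mix booleans on one dimension ([False, True])
and slices on the other dimension (:):
arr[[False, True], :] # array([[4, 5, 6]])
arr[[False, True]] # array([[4, 5, 6]])
If every dimension gets a boolean list, each list is first converted to the positions of its True values and those positions are combined by broadcasting, so the result is one-dimensional:
arr[[False, True], [True, False, True]] # array([4, 6])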
You can also use a single big boolean numpy array that has the same shape as
the array you are indexing:
arr[np.array([[False, True, False], [False, False, True]])] # array([2, 6])
Also, elementwise operations (+, /, %, == ...) are redefined by
numpy so that they work on whole arrays at once:
def is_even(x):
    return x % 2 == 0
is_even(2) # True
is_even(arr) # array([[False, True, False], [True, False, True]])
Here we just constructed a big boolean array, so it can be used on your original array:
arr[is_even(arr)] # array([2, 4, 6])
But in your case you were only indexing on the first dimension, so the boolean array goes on that dimension and a slice on the other:
arr[is_even(arr[:, 0]), :] # array([[4, 5, 6]])
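Applied to the question, the same two steps explain array[array[:, 0] == query]: array[:, 0] selects the first column (the UUIDs), == query compares each of them with the query and yields a boolean array, and that boolean array then selects the matching rows. A minimal sketch, assuming a two-column object array; the name table, the integer-based UUIDs and the string payloads are made up for illustration:

```python
import uuid
import numpy as np

# Hypothetical stand-in for the question's data: column 0 holds UUIDs,
# column 1 holds an arbitrary payload.
ids = [uuid.UUID(int=i) for i in range(3)]
table = np.empty((3, 2), dtype=object)
for row, (key, payload) in enumerate(zip(ids, ["a", "b", "c"])):
    table[row, 0] = key
    table[row, 1] = payload

query = ids[1]

first_column = table[:, 0]     # 1-D array holding the three UUIDs
mask = first_column == query   # elementwise comparison -> boolean array
out = table[mask]              # keeps only the rows where mask is True

print(mask)       # one boolean per row
print(out.shape)  # (1, 2): one matching row, still two-dimensional
```

Because the mask selects rows, the result stays two-dimensional even when only one row matches; out[0] gives the single [UUID, payload] pair.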

Related

Creating a "bitmask" from several boolean numpy arrays

I'm trying to convert several masks (boolean arrays) to a bitmask with numpy. While that works in theory, I feel that I'm doing too many operations.
For example to create the bitmask I use:
import numpy as np
flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]
flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx # equivalent to flag * 2 ** idx
Which gives me the expected "bitmask":
>>> flag_bits
array([1, 6, 0], dtype=int8)
>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']
However, I feel that the casting to int8 and the addition to the flag_bits array in particular are too complicated. Therefore I wanted to ask whether there is any NumPy functionality that I missed that could be used to create such a "bitmask" array.
Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.

>>> x = 2 ** np.arange(len(flags))  # one weight per flag array: array([1, 2, 4])
>>> np.dot(x, flags)
array([1, 6, 0])
How it works: in a bitmask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 6 = False * 1 + True * 2 + True * 4. This can be expressed as a matrix multiplication, which is very efficient in numpy.
So, the first line creates these weights: x = [1, 2, 4, 8, ..., 2^(n-1)] for n flag arrays.
Then each row of flags is multiplied by the corresponding weight in x and the rows are summed up element by element (this is how matrix multiplication works). At the end, we get the bitmask.

How about this (added conversion to int8, if desired):
flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
.astype(np.int8)
#array([1, 6, 0], dtype=int8)

Here's an approach to directly get to the string bitmask with boolean-indexing -
out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
Sample run -
In [41]: flags
Out[41]:
[array([ True, False, False], dtype=bool),
array([False, True, False], dtype=bool),
array([False, True, False], dtype=bool)]
In [42]: out = np.repeat('0000000',3).astype('S7')
In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
In [44]: out
Out[44]:
array([b'0000001', b'0000110', b'0000000'],
dtype='|S7')
Using the same matrix-multiplication strategy as discussed in detail in @Marat's solution, but using a vectorized scaling array that gives us flag_bits -
np.dot(2**np.arange(3),flags)
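As a quick sanity check, the weighted-sum idea can be compared with the explicit loop from the question. This is only a sketch on the question's three flag arrays; the names weights, loop_bits and dot_bits are illustrative:

```python
import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]

# Reference result: the explicit loop from the question.
loop_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    loop_bits += flag.astype(np.int8) << idx

# Weighted sum: flag array number i contributes bit i, i.e. weight 2**i.
weights = 2 ** np.arange(len(flags))               # array([1, 2, 4])
dot_bits = np.dot(weights, flags).astype(np.int8)  # cast only if int8 is required

print(loop_bits)
print(dot_bits)
```

Both variants produce the same values; the cast to int8 is needed only because the external function in the question expects that dtype.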

Membership checking in Numpy ndarray

I have written a script that evaluates if some entry of arr is in check_elements. My approach does not compare single entries, but whole vectors inside of arr. Thus, the script checks if [8, 3], [4, 5], ... is in check_elements.
Here's an example:
import numpy as np
# arr.shape -> (2, 3, 2)
arr = np.array([[[8, 3],
                 [4, 5],
                 [6, 2]],
                [[9, 0],
                 [1, 10],
                 [7, 11]]])
# check_elements.shape -> (3, 2)
# generally: (n, 2)
check_elements = np.array([[4, 5], [9, 0], [7, 11]])
# rslt.shape -> (2, 3)
rslt = np.zeros((arr.shape[0], arr.shape[1]), dtype=bool)
for i, j in np.ndindex((arr.shape[0], arr.shape[1])):
    if arr[i, j] in check_elements:  # <-- condition is checked against
                                     #     the whole last dimension
        rslt[i, j] = True
    else:
        rslt[i, j] = False
Now:
print(rslt)
...would print:
[[False True False]
[ True False True]]
To get the indices of the matches I use:
print(np.transpose(np.nonzero(rslt)))
...which prints the following:
[[0 1] # arr[0, 1] -> [4, 5] -> is in check_elements
[1 0] # arr[1, 0] -> [9, 0] -> is in check_elements
[1 2]] # arr[1, 2] -> [7, 11] -> is in check_elements
This task would be easy and performant if I would check a condition on single values, like arr > 3 or np.where(...), but I am not interested in single values. I want to check a condition against the whole last dimension (or slices of it).
My question is: is there a faster way to achieve the same result? Am I right that vectorized attempts and things like np.where can not be used for my problem, because they always operate on single values and not on a whole dimension or slices of that dimension?

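Here is a Numpythonic approach using broadcasting. It compares every vector of arr with every row of check_elements, requires all components of a pair to match, and then checks whether any row matched:
>>> (arr[:, :, None] == check_elements).all(axis=-1).any(axis=-1)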
array([[False, True, False],
[ True, False, True]], dtype=bool)
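A vector should count as a member only if all of its components agree with some row of check_elements. The sketch below spells out the broadcasting steps and shows why reducing with any() alone over all components would also report partial matches; the probe vector [4, 99] is a made-up example, not part of the question's data:

```python
import numpy as np

arr = np.array([[[8, 3], [4, 5], [6, 2]],
                [[9, 0], [1, 10], [7, 11]]])
check_elements = np.array([[4, 5], [9, 0], [7, 11]])

# Shape (2, 3, 1, 2) against (3, 2) broadcasts to (2, 3, 3, 2):
# every vector of arr is compared with every row of check_elements.
componentwise = arr[:, :, None, :] == check_elements

# A vector matches a row only if all of its components agree ...
row_match = componentwise.all(axis=-1)   # shape (2, 3, 3)
# ... and it is a member if it matches at least one row.
rslt = row_match.any(axis=-1)            # shape (2, 3)
print(rslt)

# Hypothetical probe that shares only its first component with [4, 5].
probe = np.array([[[4, 99]]])
probe_cmp = probe[:, :, None, :] == check_elements
loose = probe_cmp.any(axis=(-1, -2))            # any single component match counts
strict = probe_cmp.all(axis=-1).any(axis=-1)    # whole row must match
print(loose, strict)
```

On the question's data both reductions happen to agree; they differ only when a vector shares a single component with some row, as the probe does.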

The numpy_indexed package (disclaimer: I am its author) contains functionality to perform these kind of queries; specifically, containment relations for nd (sub)arrays:
import numpy_indexed as npi
flatidx = npi.indices(arr.reshape(-1, 2), check_elements)
idx = np.unravel_index(flatidx, arr.shape[:-1])
Note that the implementation is fully vectorized under the hood.
Also, note that with this approach, the order of the indices in idx matches the order of check_elements; the first item in idx is the row and column of the first item in check_elements. This information is lost when using an approach along the lines you posted above, or when using one of the alternative suggested answers, which will give you the indices sorted by their order of appearance in arr instead, which is often undesirable.

You can use np.in1d even though it is meant for 1D arrays by giving it a 1D view of your array, containing one element per last axis:
arr_view = arr.view((np.void, arr.dtype.itemsize*arr.shape[-1])).ravel()
check_view = check_elements.view((np.void,
check_elements.dtype.itemsize*check_elements.shape[-1])).ravel()
This will give you two 1D arrays, each containing a void-type version of your 2-element arrays along the last axis. Now you can check which of the elements in arr_view are also in check_view by doing:
flatResult = np.in1d(arr_view, check_view)
This will give a flattened array, which you can then reshape to the shape of arr, dropping the last axis:
print(flatResult.reshape(arr.shape[:-1]))
which will give you the desired result:
array([[False, True, False],
[ True, False, True]], dtype=bool)

Deleting elements at specific positions in a M X N numpy array

I am trying to implement Seam carving algorithm wherein we have to delete a seam from the image. Image is stored as a numpy M X N array. I have found the seam, which is nothing but an array of M integers whose value specifies column values to be deleted.
Eg: a 2 X 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete the 1st element from the 1st row (1) and the 2nd element from the 2nd row (5). After deletion, the image will be
print(img_array)
[[2,3]
[4,6]]
Work done:
I am new to Python, and the solutions I have found only deal with one-dimensional arrays or with deleting an entire row or column. I could not find a way to delete individual elements from specific columns.

Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
[4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True, True, True],
[ True, True, True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False, True, True],
[ True, False, True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
[4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshaped back to 2d.
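For the seam-carving case, where exactly one element is removed from every row, such a mask can be built without a loop. A minimal sketch, assuming (as in the question) that seam holds 1-based column numbers; the names rows and carved are illustrative:

```python
import numpy as np

img_array = np.array([[1, 2, 3],
                      [4, 5, 6]])
seam = np.array([1, 2])  # assumed 1-based: column to drop in each row

rows = np.arange(img_array.shape[0])

# Start with "keep everything", then switch off one cell per row.
mask = np.ones(img_array.shape, dtype=bool)
mask[rows, seam - 1] = False

# Exactly one element per row is dropped, so the flat result
# reshapes cleanly to one column fewer.
carved = img_array[mask].reshape(img_array.shape[0], img_array.shape[1] - 1)
print(carved)
```

The reshape is safe here only because every row loses the same number of elements; with an uneven mask it would raise an error or misalign the rows.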

Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print(ary[0])
Gives
[1 2 3]
You could iterate over your matrix, using the values from your seam to remove an element from the current row. Append the result to a modified matrix you are building.
seam = numpy.array([1,2])
for i in range(2):
    tmp = numpy.delete(ary[i], seam[i] - 1)  # seam values are 1-based
    if i == 0:
        modified_ary = tmp
    else:
        modified_ary = numpy.vstack((modified_ary, tmp))
print(modified_ary)
Gives
[[2 3]
[4 6]]

ValueError: too many boolean indices for a n=600 array (float)

I am getting an issue where I am trying to run (on Python):
#Loading in the text file in need of analysis
x,y=loadtxt('2.8k to 293k 15102014_rerun 47_0K.txt',skiprows=1,unpack=True,dtype=float,delimiter=",")
C=-1.0 #Need to flip my voltage axis
yone=C*y #Actually flipping the array
plot(x,yone)#Test
origin=600.0#Where is the origin? i.e V=0, taking the 0 to 1V elements of array
xorg=x[origin:1201]# Array from the origin to the final point (n)
xfit=xorg[(x>0.85)==True] # Taking the array from the origin and shortening it further to get relevant area
It returns the ValueError. I have tried this process with a much smaller array of 10 elements, and the xfit=xorg[(x>0.85)==True] command works fine. What the program is trying to do is narrow the data down to the relevant region so that I can fit a line of best fit to a linear section of the data.
I apologise for the formatting being messy but this is the first question I have asked on this website as I cannot search for something that I can understand where I am going wrong.

This answer is for people that don't know about numpy arrays (like me), thanks MrE for the pointers to numpy docs.
Numpy arrays have this nice feature of boolean masks.
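For numpy arrays, most operators return an array of the operation applied to every element. A plain Python list does not behave this way: in Python 3, comparing a list with a number raises a TypeError (Python 2 returned a single, arbitrary True):
>>> alist = list(range(10))
>>> alist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> alist > 5
TypeError: '>' not supported between instances of 'list' and 'int'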
>>> anarray = np.array(alist)
>>> anarray
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> anarray > 5
array([False, False, False, False, False, False, True, True, True, True], dtype=bool)
You can use an array of bool as the index for a numpy array, in this case you get a filtered array for the positions where the corresponding bool array element is True.
>>> mask = anarray > 5
>>> anarray[mask]
array([6, 7, 8, 9])
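The mask must not be bigger than the array (recent NumPy versions require the mask length to match exactly and raise an IndexError instead of this ValueError):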
>>> anotherarray = anarray[mask]
>>> anotherarray
array([6, 7, 8, 9])
>>> anotherarray[mask]
ValueError: too many boolean indices
So you can't use a mask bigger than the array you are masking:
>>> anotherarray[anarray > 7]
ValueError: too many boolean indices
>>> anotherarray[anotherarray > 7]
array([8, 9])
Since xorg is smaller than x, a mask based on x will be longer than xorg and you get the ValueError exception.
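The length mismatch and two ways around it can be sketched as follows. The linspace data is a made-up stand-in for the file loaded in the question, and the names full_mask, xfit_alt and mismatch_raises are illustrative:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 1201)  # hypothetical stand-in for the loaded voltages
origin = 600                      # an integer, so it can be used in a slice
xorg = x[origin:1201]

full_mask = x > 0.85
print(full_mask.shape, xorg.shape)  # the mask is longer than xorg

# Option 1: build the mask from the array that is being indexed.
xfit = xorg[xorg > 0.85]

# Option 2: slice the full-length mask the same way as the data.
xfit_alt = xorg[full_mask[origin:1201]]

# Using the full-length mask directly fails (IndexError in recent
# NumPy versions, ValueError in old ones).
try:
    xorg[full_mask]
    mismatch_raises = False
except (IndexError, ValueError):
    mismatch_raises = True
print(mismatch_raises)
```

Both options select the same elements; the first is usually easier to read because the mask and the data visibly come from the same array.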

Change
xfit=xorg[x>0.85]
to
xfit=xorg[xorg>0.85]
x is larger than xorg so x > 0.85 has more elements than xorg

Try the following:
replace your code
xorg=x[origin:1201]
xfit=xorg[(x>0.85)==True]
with
mask = x > 0.85
xfit = xorg[mask[origin:1201]]
This works when x is a numpy.ndarray. Keep in mind that the slice x[origin:1201] is a view of x, whereas boolean (advanced) indexing returns a copy; see the SciPy/NumPy documentation.
I'm unsure whether you like to use numpy, but when trying to fit data, numpy/scipy is a good choice anyway...

Indexing NumPy 2D array with another 2D array

I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_, as I read in "Simplfy row AND column extraction, numpy", but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.

The numpy way to do this is by using np.choose or fancy indexing/take (see below):
import numpy as np
m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
select = np.array([0, 1, 0, 0])
result = np.choose(select, m.T)
So there is no need for Python loops or anything, with all the speed advantages numpy gives you. m.T is needed because choose is really a choice between the two arrays, np.choose(select, (m[:,0], m[:,1])), but it is straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, use np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
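The three variants above pick the same elements, which can be checked side by side. This is a sketch on the question's data; np.take_along_axis is added as a fourth option that the answer does not mention, and the by_* names are illustrative:

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
select = np.array([0, 1, 0, 0])

rows = np.arange(len(select))

# One column index per row, expressed four ways.
by_choose = np.choose(select, m.T)
by_fancy = m[rows, select]
by_take = m.take(select + rows * m.shape[1])   # index into the flattened array
by_along = np.take_along_axis(m, select[:, np.newaxis], axis=1)[:, 0]

print(by_fancy)
```

The fancy-indexing form pairs rows[i] with select[i], so it reads as "from row i take column select[i]" and carries over directly to assignment.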

I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this condition array; and
apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case, is the result from a single expression comprised of compound conditional expressions (first line above)
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).

What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])

IMHO, this is simplest variant:
m[np.arange(4), select]

Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of row offsets into the flattened B, one per row of inds
offset = np.arange(0, B.size, B.shape[1])
# np.take(B, C) indexes into the flattened B; the result has the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
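A small sketch of both forms, using made-up data with n = 3, k = 4 and m = 2 so that B and inds deliberately have different widths; the names rows, picked, picked_take and B_marked are illustrative:

```python
import numpy as np

# Hypothetical data: B has shape (n, k) = (3, 4), inds has shape (n, m) = (3, 2).
B = np.arange(12).reshape(3, 4) * 10
inds = np.array([[0, 3],
                 [1, 1],
                 [2, 0]])

# Column vector of row numbers; it broadcasts against inds so that
# entry (i, j) of the result is B[i, inds[i, j]].
rows = np.arange(B.shape[0])[:, np.newaxis]
picked = B[rows, inds]
print(picked)

# Same selection through the flattened array: row i starts at i * k.
offset = np.arange(0, B.size, B.shape[1])
picked_take = np.take(B, offset[:, np.newaxis] + inds)

# The indexing form also works on the left-hand side of an assignment.
B_marked = B.copy()
B_marked[rows, inds] = -1
print(B_marked)
```

Row 1 of inds names column 1 twice, so only five distinct cells are overwritten in B_marked.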

result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])
