Numpy: Find indexes of rows in another array - python

I have two arrays containing 3D coordinates.
I am trying to get a list of the indices at which each row of one array appears in the other array.
import numpy as np

a = np.array([[0.4, 0.6, 0.8],
              [0.4, 1.0, 1.2],
              [0.6, 1.0, 1.4],
              [0.6, 1.2, 1.6]])
b = np.array([[0.4, 1.0, 1.2],
              [0.4, 0.6, 0.8],
              [0.6, 1.2, 1.6],
              [0.6, 1.0, 1.4],
              [0.6, 1.0, 1.4]])
idx = [np.where(np.all(a == i, axis=1)) for i in b]
# desired result: idx = [1, 0, 3, 2, 2]
Is there a way to achieve this using NumPy methods, as my arrays a and b are large (~100k rows each)?
Thanks in advance.

You can use b[:,None] for the comparison. Indexing with None (np.newaxis) inserts a new axis, so b[:,None] has shape (5, 1, 3) instead of (5, 3); broadcasting then compares every row of b against every row of a.
idx = np.where((a == b[:, None]).all(-1))[1]
# output of idx: [1 0 3 2 2]
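To make the broadcasting explicit, here is a small self-contained sketch of the intermediate shapes involved (same data as above):
import numpy as np

a = np.array([[0.4, 0.6, 0.8], [0.4, 1.0, 1.2], [0.6, 1.0, 1.4], [0.6, 1.2, 1.6]])
b = np.array([[0.4, 1.0, 1.2], [0.4, 0.6, 0.8], [0.6, 1.2, 1.6], [0.6, 1.0, 1.4], [0.6, 1.0, 1.4]])

cmp = a == b[:, None]       # b[:, None] has shape (5, 1, 3); broadcasting against a (4, 3) gives (5, 4, 3)
row_matches = cmp.all(-1)   # shape (5, 4): True where a row of b equals a row of a
idx = np.where(row_matches)[1]
print(idx)                  # [1 0 3 2 2]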

Related

How to append values to multidimensional numpy array?

How can I efficiently append values to a multidimensional numpy array?
import numpy as np
a = np.array([[1,2,3], [4,5,6]])
print(a)
I want to append np.NaN k=2 times to each inner array of the outer array.
One option would be to use a loop, but I guess there must be something smarter (vectorized) in NumPy.
Expected result would be:
np.array([[1, 2, 3, np.NaN, np.NaN], [4, 5, 6, np.NaN, np.NaN]])
I.e. I am looking for a way to:
np.concatenate((a, np.NaN))
on all the inner dimensions.
A call like
np.append(a, [[np.NaN, np.NaN]], axis=0)
fails with:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
For your problem np.hstack() or np.pad() should do the job.
Using np.hstack():
k = 2
a_mat = np.array([[1,2,3], [4, 5, 6]])
nan_mat = np.zeros((a_mat.shape[0], k))
nan_mat.fill(np.nan)
a_mat = np.hstack((a_mat, nan_mat))
Using np.pad():
k = 2
padding_shape = [(0, 0), (0, k)]  # [(before, after) pads for each dimension]
a_mat = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # float dtype so np.nan can actually be stored
np.pad(a_mat, padding_shape, mode='constant', constant_values=np.nan)
Note: In case you are using np.pad() to fill with np.nan, check this post out as well: about padding with np.nan
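For reference, the np.append call from the question can also be made to work by appending along axis=1 with a NaN block whose row count matches a; a minimal sketch:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
k = 2

# np.full builds a (2, k) block of NaN; appending along axis=1 keeps the row
# counts matching, and concatenating int with float promotes the result to float64
result = np.append(a, np.full((a.shape[0], k), np.nan), axis=1)
print(result)
# [[ 1.  2.  3. nan nan]
#  [ 4.  5.  6. nan nan]]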

Iterating a filtered NumPy array whilst maintaining index information

I am attempting to pass filtered values from a NumPy array into a function.
I need to pass only the values above a certain threshold, along with their index positions within the NumPy array.
I am attempting to avoid iterating over the entire array within Python by using NumPy's own filtering; the arrays I am dealing with have 20k values in them, with potentially only very few being relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
    somefunction(arrayindex[0], somearray[arrayindex[0]])
This threw up errors about the logic not being able to handle multiple values,
which led me to test it with print statements to see what was going on.
for cell in arrayindex:
    print(f"index {cell}")
    print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked through different methods to iterate through NumPy arrays, such as nditer, but none seem to let me do the filtering of values outside of the for loop.
Is there a solution to my quandary?
Oh, I am aware that it is generally frowned upon to loop through a NumPy array; however, the function I am passing these values to is complex, triggering certain events and uploading data to a database depending on the data's location within the array.
Thanks.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
    somefunction(arrayindex[i], somearray[arrayindex[i]])

for i in range(0, len(arrayindex)):
    print("index", arrayindex[i])
    print("data", somearray[arrayindex[i]])
You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs is (2,1) shape, so i is a (1,) shape array, resulting in the brackets in the display. Occasionally it's useful, but nonzero is used more (often by its other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...: print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
The problem is that ij is an array that acts like a list of indices, so x[ij] indexes only the first dimension. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions take a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...: print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking in the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]

How to delete rows of numpy array by multiple row indices?

I have two lists of indices (idx[0] and idx[1]), and I should delete the corresponding rows from numpy array y_test.
y_test
12 11 10
1 2 2
3 2 3
4 1 2
13 1 10
idx[0] = [0,2]
idx[1] = [1,3]
I tried to delete the rows as follows (using ~). But it didn't work:
result = y_test[(~idx[0]+~idx[1]+~idx[2])]
Expected result:
result =
13 1 10
Instead of removing elements, just make a new array with the desired ones. This will keep any future indexing from getting jumbled up and maintain the old array.
import numpy as np
y_test = np.asarray([[12, 11, 10], [1, 2, 2], [3, 2, 3], [4, 1, 2], [13, 1, 10]])
idx = [[0, 2], [1, 3]]
# flatten list of lists
idx_flat = [i for j in idx for i in j]
# assign values that are NOT in your idx list to a new array
result = [row for num, row in enumerate(y_test) if num not in idx_flat]
# cast this however you want it, right now 'result' is a list of np.arrays
print(result)
[array([13, 1, 10])]
For an understanding of the flatten step using list comprehensions check this out
You can use numpy.delete which deletes the subarrays along the axis.
np.delete(y_test, idx, axis=0)
Make sure that idx.dtype is an integer type and use numpy.astype if not.
Your approach did not work because idx is not a boolean index array but holds the indices. So ~, which is bitwise negation, will produce ~[0, 2] = [-1, -3] (where both sides should be numpy arrays, since ~ does not work on Python lists).
I would definitely recommend reading up on the difference between index arrays and boolean index arrays. For boolean index arrays I would suggest using numpy.logical_not and numpy.logical_or.
+ concatenates Python lists but is the standard plus for numpy arrays.
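As a quick check of the np.delete approach with the data from the question (idx is flattened with np.ravel here, since it is a list of two lists):
import numpy as np

y_test = np.array([[12, 11, 10],
                   [1, 2, 2],
                   [3, 2, 3],
                   [4, 1, 2],
                   [13, 1, 10]])
idx = [[0, 2], [1, 3]]

result = np.delete(y_test, np.ravel(idx), axis=0)  # drop rows 0, 1, 2 and 3
print(result)
# [[13  1 10]]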
Since you are using NumPy I'd suggest masking in this way.
Setup:
import numpy as np

y_test = np.array([[12, 11, 10],
                   [1, 2, 2],
                   [3, 2, 3],
                   [4, 1, 2],
                   [13, 1, 10]])
idx = np.array([[0, 2], [1, 3]])
Generate the mask:
Generate a mask of ones then set to zero elements at index in idx:
mask = np.ones(len(y_test), dtype=int).reshape(5, 1)
mask[idx.flatten()] = 0
Finally apply the mask:
y_test[~np.all(y_test * mask == 0, axis=1)]
#=> [[13 1 10]]
y_test has not been modified.
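If you prefer a boolean mask over the rows directly, here is a minimal sketch of that variant (same setup as above):
import numpy as np

y_test = np.array([[12, 11, 10], [1, 2, 2], [3, 2, 3], [4, 1, 2], [13, 1, 10]])
idx = np.array([[0, 2], [1, 3]])

keep = np.ones(len(y_test), dtype=bool)  # start by keeping every row
keep[idx.ravel()] = False                # mark the listed rows for removal
print(y_test[keep])
# [[13  1 10]]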

Multiplying each array inside another list with an element from another array

I have created a list containing 10 arrays, each consisting of 20 random numbers between 0 and 1.
Now, I wish to multiply each array in the list with the numbers 0.05, 0.1, ..., up to 1.0, so that no element in each array is larger than the number it is multiplied by.
For example, all 20 elements in the first array should lie between 0 and 0.05, all the elements in the second array between 0 and 0.10, and so on.
I create a list of 10 random arrays and a range of numbers between 0 and 1 with:
range1 = np.arange(0.005, 0.105, 0.005)
noise1 = [abs(np.random.uniform(0,1,20)) for i in range(10)]
I then try to multiply the elements with:
noise2 = [noise1 * range1 for i in noise1]
But this doesn't work and just causes all the arrays in the list to have the same values.
I would really appreciate some help with how to do this.
Hoping I have understood the question correctly, here is one solution:
noise2 = [noise1[i] * range1[i] for i in range(len(noise1))]
A more pythonic way would be using zip:
range1 = [1, 2, 3]
noise1 = [3, 4, 5]
noise2 = [i * j for i, j in zip(range1, noise1)]
# [3, 8, 15]
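Since both the scale factors and the noise can live in NumPy arrays, broadcasting removes the Python loop entirely. A sketch, using np.linspace(0.1, 1.0, 10) as an illustrative set of 10 scale factors (one per array, as the question describes):
import numpy as np

scales = np.linspace(0.1, 1.0, 10)         # one scale factor per row (illustrative values)
noise = np.random.uniform(0, 1, (10, 20))  # 10 rows of 20 samples in [0, 1)

noise2 = noise * scales[:, None]           # scales[:, None] is (10, 1), broadcast over (10, 20)
print(noise2.max(axis=1) <= scales)        # each row stays below its own scale factor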

Find the indices of the lowest closest neighbors between two lists in python

Given 2 numpy arrays of unequal size: A (a presorted dataset) and B (a list of query values). I want to find the closest "lower" neighbor in array A to each element of array B. Example code below:
import numpy as np
A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132]) #pre-sorted dataset
B = np.array([1.1, 1.9, 2.1, 5.0, 7.0]) #query values, not necessarily sorted
print(A.searchsorted(B))
# RESULT: [1 1 2 4 4]
# DESIRED: [0 0 1 3 4]
In this example, B[0]'s closest neighbors are A[0] and A[1]. It is closest to A[1], which is why searchsorted returns index 1 as a match, but what I want is the lower neighbor at index 0. Same for B[1:4], and B[4] should be matched with A[4] because both values are identical.
I could do something clunky like this:
desired = []
for b in B:
    id = -1
    for a in A:
        if a > b:
            if id == -1:
                desired.append(0)
            else:
                desired.append(id)
            break
        id += 1
print(desired)
# RESULT: [0, 0, 1, 3, 4]
But there's gotta be a prettier, more concise way to write this with numpy. I'd like to keep my solution in numpy because I'm dealing with large data sets, but I'm open to other options.
You can introduce the optional argument side and set it to 'right', as mentioned in the docs. Then subtract 1 from the resulting indices to get the desired output, like so -
A.searchsorted(B,side='right')-1
Sample run -
In [63]: A
Out[63]: array([ 0.456, 2. , 2.948, 3. , 7. , 12.132])
In [64]: B
Out[64]: array([ 1.1, 1.9, 2.1, 5. , 7. ])
In [65]: A.searchsorted(B,side='right')-1
Out[65]: array([0, 0, 1, 3, 4])
In [66]: A.searchsorted(A,side='right')-1 # With itself
Out[66]: array([0, 1, 2, 3, 4, 5])
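One edge case worth noting: a query value smaller than A[0] has no lower neighbour, and side='right' minus one yields -1 there. The loop in the question appends 0 in that case, which np.clip can reproduce (a small sketch):
import numpy as np

A = np.array([0.456, 2.0, 2.948, 3.0, 7.0, 12.132])
B = np.array([0.1, 1.1, 7.0])       # 0.1 is below every value in A

raw = A.searchsorted(B, side='right') - 1
print(raw)                          # [-1  0  4]
print(np.clip(raw, 0, None))        # [0 0 4]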
Here's one way to do this. np.argmax returns the index of the first True it encounters, so as long as A is sorted this provides the desired result.
[np.argmax(A>b)-1 for b in B]
Edit: I got the inequality wrong initially, it works now.
