find two closest values in numpy ndarray

I have a NumPy ndarray that looks like this:
np.array(
    [[40.26164428, 63.50590524, 58.30951895],
     [50.99019514, 69.0651866 , 60.44005295],
     [20.24845673, 14.31782106, 58.52349955],
     [54.58937626, 53.03772242, 21.09502311],
     [56.75385449, 57.5847202 ,  1.41421356]])
(Note: the arrays I generate come in different shapes. This one has shape (5, 3), but it could just as well be (2, 2), (4, 1), and so on, so this isn't an array of 3D coordinates; it just happened to be generated that way.)
What I need is to find the two closest values in the generated array and return their indices. In this case the values are 58.30951895 and 58.52349955, so the result should be the coordinates [0, 2] and [2, 2].
I've tried cKDTree, but since this isn't a coordinate array it doesn't apply here. How should I do this?

I'm going to be embarrassed when someone points out a one-liner, but here's a way to do it.
I flatten the array, then sort it, then find the deltas between each element. Find the minimum delta. Now, from the sorted array, I know the values of the two closest elements. argwhere then gives me the coordinates.
import numpy as np
data = np.array(
    [[40.26164428, 63.50590524, 58.30951895],
     [50.99019514, 69.0651866 , 60.44005295],
     [20.24845673, 14.31782106, 58.52349955],
     [54.58937626, 53.03772242, 21.09502311],
     [56.75385449, 57.5847202 ,  1.41421356]])
order = np.sort(data.reshape(-1))
delta = np.diff(order)
am = np.argmin(delta)
print( np.argwhere(data == order[am]))
print( np.argwhere(data == order[am+1]))
Output:
C:\tmp>python x.py
[[0 2]]
[[2 2]]
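The same sort-and-diff idea can also be written so that positions are tracked through the sort instead of being recovered by comparing floats with ==. This is only a sketch: the helper name closest_pair_indices is made up for illustration, and it assumes the array has at least two elements.

```python
import numpy as np

def closest_pair_indices(data):
    """Return the N-d indices of the two closest values (hypothetical helper)."""
    data = np.asarray(data)
    flat = data.ravel()
    order = np.argsort(flat, kind="stable")  # flat positions in ascending value order
    gaps = np.diff(flat[order])              # gaps between neighbours in sorted order
    k = int(np.argmin(gaps))                 # smallest gap lies between order[k] and order[k + 1]
    first = tuple(int(i) for i in np.unravel_index(order[k], data.shape))
    second = tuple(int(i) for i in np.unravel_index(order[k + 1], data.shape))
    return first, second

data = np.array(
    [[40.26164428, 63.50590524, 58.30951895],
     [50.99019514, 69.0651866, 60.44005295],
     [20.24845673, 14.31782106, 58.52349955],
     [54.58937626, 53.03772242, 21.09502311],
     [56.75385449, 57.5847202, 1.41421356]])

print(closest_pair_indices(data))  # ((0, 2), (2, 2))
```

Because np.unravel_index only needs the shape, this works for any number of dimensions, and duplicate values are reported as two distinct positions.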

If I understand correctly, their position in the array is irrelevant, so this is simple: put the numbers in a list in a way that remembers their original positions, sort it, and find the smallest difference between two consecutive elements.
>>> import itertools
>>> def pairwise(iterable):
...     a, b = itertools.tee(iterable)
...     next(b, None)
...     return zip(a, b)
...
>>> data = [[40.26164428, 63.50590524, 58.30951895],
...         [50.99019514, 69.0651866, 60.44005295],
...         [20.24845673, 14.31782106, 58.52349955],
...         [54.58937626, 53.03772242, 21.09502311],
...         [56.75385449, 57.5847202, 1.41421356]]
>>> linear = [(value, x, y) for x, row in enumerate(data) for y, value in enumerate(row)]
>>> linear.sort(key=lambda x: x[0])
>>> min(pairwise(linear), key=lambda pair: abs(pair[0][0] - pair[1][0]))
((58.30951895, 0, 2), (58.52349955, 2, 2))

Related

How to get the index of np.maximum?

I know np.maximum computes the element-wise maximum, e.g.
>>> b = np.array([3, 6, 1])
>>> c = np.array([4, 2, 9])
>>> np.maximum(b, c)
array([4, 6, 9])
But is there any way to get the indices as well? For the example above, I would also like something like the following, where each tuple denotes (which array, index). It could be a tuple, a dictionary, or something else. It would also be great if it worked on 3D arrays, i.e. when the two input arrays are 3D.
array([(1, 0), (0, 1), (1, 2)])

You could stack the two 1d-arrays to get a 2d-array and use argmax:
arr = np.vstack((b, c))
indices = np.argmax(arr, axis=0)
This will give you an array of integers, not tuples, but since you know the comparison is done per column, the last element of each tuple is unnecessary anyway: it is just an ascending integer starting at 0. If you really need the tuples, though, you could just add
indices = list(zip(indices, range(len(b))))
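For inputs with more than one dimension, the same stack-and-argmax idea carries over if the arrays are stacked along a new leading axis. The following is a sketch rather than a definitive recipe; the names stacked, which, maxima and pairs are chosen here for illustration.

```python
import numpy as np

b = np.array([3, 6, 1])
c = np.array([4, 2, 9])

stacked = np.stack((b, c))          # new leading axis; inputs may be 1D, 2D, 3D, ...
which = np.argmax(stacked, axis=0)  # 0 where b holds the maximum (or ties), 1 where c does
maxima = np.take_along_axis(stacked, which[np.newaxis], axis=0)[0]

# Pair each "which array" entry with its position, as asked in the question.
pairs = [(int(w), *idx) for idx, w in np.ndenumerate(which)]
print(pairs)  # [(1, 0), (0, 1), (1, 2)]
```

With 3D inputs, which has the same 3D shape as the inputs and each entry of pairs becomes (which array, i, j, k).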

Iterate over numpy with index (numpy equivalent of python enumerate)

I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.
For the example array A
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...
My envisioned output is two numbers (corresponding to the two index values for that element) for each element in the array. So in the example above, it would be the two values that I am assigning to be a and b. I only will need to retrieve these two numbers within the loop (rather than save separately as another data object).
Any thoughts on how to do this would be greatly appreciated!

As I've become more familiar with the numpy and pandas ecosystem, it's become clearer to me that iteration is usually the wrong choice because of how slow it is by comparison, and that using a vectorized operation is best whenever possible. Though the style is not as obvious/Pythonic at first, I've (anecdotally) gained ridiculous speedups with vectorized operations: more than 1000x in one case where I replaced a row-wise .apply(lambda) iteration.
@MSeifert's answer provides this much better and will be significantly more performant on a dataset of any real size.
A more general answer by @cs95 covers and compares alternatives to iteration in pandas.
Original Answer
You can iterate through the values in your array with numpy.ndenumerate to get the indices of the values in your array.
Following the numpy.ndenumerate documentation:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for index, values in np.ndenumerate(A):
    print(index, values) # operate here

You can do it using np.ndenumerate but generally you don't need to iterate over an array.
You can simply create a meshgrid (or open grid) to get all indices at once and you can then process them (vectorized) much faster.
For example
>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
>>> y
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!
For example to calculate the lattice distance for each point to a point say (2, 3):
>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
       [4, 3, 2],
       [3, 2, 1]])
For distances an ogrid would be faster. Just replace np.mgrid with np.ogrid:
>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3) # cartesian distance this time! :-)
array([[ 3.60555128,  2.82842712,  2.23606798],
       [ 3.16227766,  2.23606798,  1.41421356],
       [ 3.        ,  2.        ,  1.        ]])
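If the array can have any number of dimensions, np.indices gives one index grid per axis, so the lattice distance can be computed without naming x and y separately. This is a sketch that reuses the reference point (2, 3) from the example above; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
target = (2, 3)  # reference point from the example above; one entry per axis of A

grid = np.indices(A.shape)                           # shape (A.ndim, *A.shape)
offset = np.reshape(target, (-1,) + (1,) * A.ndim)   # shape (A.ndim, 1, 1) for broadcasting
lattice = np.abs(grid - offset).sum(axis=0)          # sum of per-axis steps
print(lattice.tolist())  # [[5, 4, 3], [4, 3, 2], [3, 2, 1]]
```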

Another possible solution:
import numpy as np
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
for _, val in np.ndenumerate(A):
    ind = np.argwhere(A==val)
    print(val, ind)
This way you get an array of all matching indices when a value appears in the array more than once.

Python: How can I force 1-element NumPy arrays to be two-dimensional?

I have a piece of code that slices a 2D NumPy array and returns the resulting (sub-)array. In some cases, the slicing only indexes one element, in which case the result is a one-element array:
>>> sub_array = orig_array[indices_h, indices_w]
>>> sub_array.shape
(1,)
How can I force this array to be two-dimensional in a general way? I.e.:
>>> sub_array.shape
(1,1)
I know that sub_array.reshape(1,1) works, but I would like to be able to apply it to sub_array generally without worrying about the number of elements in it. To put it in another way, I would like to compose a (light-weight) operation that converts a shape-(1,) array to a shape-(1,1) array, a shape-(2,2) array to a shape-(2,2) array etc. I can make a function:
def twodimensionalise(input_array):
    if input_array.shape == (1,):
        return input_array.reshape(1,1)
    else:
        return input_array
Is this the best I am going to get or does NumPy have something more 'native'?
Addition:
As pointed out in https://stackoverflow.com/a/31698471/865169, I was doing the indexing wrong. I really wanted to do:
sub_array = orig_array[indices_h][:, indices_w]
This does not work when there is only one entry in indices_h, but combining it with np.atleast_2d suggested in another answer, I arrive at:
sub_array = np.atleast_2d(orig_array[indices_h])[:, indices_w]

It sounds like you might be looking for atleast_2d. This function returns a view of a 1D array as a 2D array:
>>> arr1 = np.array([1.7]) # shape (1,)
>>> np.atleast_2d(arr1)
array([[ 1.7]])
>>> _.shape
(1, 1)
Arrays that are already 2D (or have more dimensions) are unchanged:
>>> arr2 = np.arange(4).reshape(2,2) # shape (2, 2)
>>> np.atleast_2d(arr2)
array([[0, 1],
       [2, 3]])
>>> _.shape
(2, 2)

When defining a numpy array you can use the keyword argument ndmin to specify that you want at least two dimensions.
e.g.
arr = np.array(item_list, ndmin=2)
arr.shape
>>> (1, 100) # if item_list is 100 elements long; ndmin prepends the new axis
In the example in the question, just do
sub_array = np.array(orig_array[indices_h, indices_w], ndmin=2)
sub_array.shape
>>> (1,1)
This can be extended to higher dimensions too, unlike np.atleast_2d().

Are you sure you are indexing in the way you want to? In the case where indices_h and indices_w are broadcastable integer indexing arrays, the result will have the broadcasted shape of indices_h and indices_w. So if you want to make sure that the result is 2D, make the indices arrays 2D.
Otherwise, if you want all combinations of indices_h[i] and indices_w[j] (for all i, j), do e.g. a sequential indexing:
sub_array = orig_array[indices_h][:, indices_w]
Have a look at the documentation for details about advanced indexing.
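An alternative to the sequential indexing above is np.ix_, which builds an open mesh from the index sequences so that the result is 2D even when only one row or column is selected. The data below is made up for illustration.

```python
import numpy as np

orig_array = np.arange(20).reshape(4, 5)  # made-up data for illustration
indices_h = [2]                           # a single row
indices_w = [1, 3]                        # two columns

# np.ix_ selects every combination of the given rows and columns.
sub_array = orig_array[np.ix_(indices_h, indices_w)]
print(sub_array.shape)     # (1, 2)
print(sub_array.tolist())  # [[11, 13]]
```

Note that np.ix_ expects 1D index sequences, so scalar indices would need to be wrapped in a list first.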

Sort a list then give the indexes of the elements in their original order

I have an array of n numbers, say [1,4,6,2,3]. The sorted array is [1,2,3,4,6], and the indexes of these numbers in the old array are 0, 3, 4, 1, and 2. What is the best way, given an array of n numbers, to find this array of indexes?
My idea is to run order statistics for each element. However, since I have to rewrite this function many times (in contests), I'm wondering if there's a short way to do this.

>>> a = [1,4,6,2,3]
>>> [b[0] for b in sorted(enumerate(a),key=lambda i:i[1])]
[0, 3, 4, 1, 2]
Explanation:
enumerate(a) returns an enumeration over tuples consisting of the indexes and values in the original list: [(0, 1), (1, 4), (2, 6), (3, 2), (4, 3)]
Then sorted with a key of lambda i:i[1] sorts based on the original values (item 1 of each tuple).
Finally, the list comprehension [b[0] for b in ...] returns the original indexes (item 0 of each tuple).

Using numpy arrays instead of lists may be beneficial if you are doing a lot of statistics on the data. If you choose to do so, this would work:
import numpy as np
a = np.array( [1,4,6,2,3] )
b = np.argsort( a )
argsort() can operate on lists as well, but I believe that in this case it simply copies the data into an array first.
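For completeness, here is what the argsort call produces on the array from the question. The kind="stable" argument is an addition here, not part of the answer above: it keeps equal values in their original relative order, which matches what Python's sorted does.

```python
import numpy as np

a = np.array([1, 4, 6, 2, 3])
order = np.argsort(a, kind="stable")  # indices that would sort a
print(order.tolist())     # [0, 3, 4, 1, 2]
print(a[order].tolist())  # [1, 2, 3, 4, 6]
```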

Here is another way:
>>> sorted(range(len(a)), key=lambda ix: a[ix])
[0, 3, 4, 1, 2]
This approach sorts not the original list, but its indices (created with range; use xrange on Python 2), using the original list as the sort keys.

This should do the trick (on Python 3, zip returns an iterator, so it has to be wrapped in list before indexing):
from operator import itemgetter
indices = list(zip(*sorted(enumerate(my_list), key=itemgetter(1))))[0]

The long way, without a list comprehension, for beginners like me:
a = [1,4,6,2,3]
b = enumerate(a)
c = sorted(b, key = lambda i:i[1])
d = []
for e in c:
    d.append(e[0])
print(d)

Pythonic way to get the first AND the last element of the sequence

What is the easiest and cleanest way to get the first AND the last elements of a sequence? E.g., I have a sequence [1, 2, 3, 4, 5], and I'd like to get [1, 5] via some kind of slicing magic. What I have come up with so far is:
l = len(s)
result = s[0:l:l-1]
I actually need this for a bit more complex task. I have a 3D numpy array, which is cubic (i.e. is of size NxNxN, where N may vary). I'd like an easy and fast way to get a 2x2x2 array containing the values from the vertices of the source array. The example above is an oversimplified, 1D version of my task.

Use this:
result = [s[0], s[-1]]

Since you're using a numpy array, you may want to use fancy indexing:
a = np.arange(27)
indices = [0, -1]
b = a[indices] # array([0, 26])
For the 3d case:
vertices = [(0,0,0),(0,0,-1),(0,-1,0),(0,-1,-1),(-1,-1,-1),(-1,-1,0),(-1,0,0),(-1,0,-1)]
indices = tuple(zip(*vertices)) #Must be a tuple (one index sequence per axis) on current NumPy. Can store this for later use.
a = np.arange(27).reshape((3,3,3)) #dummy array for testing. Can be any shape size :)
vertex_values = a[indices].reshape((2,2,2))
I first write down all the vertices (although I am willing to bet there is a clever way to do it using itertools which would let you scale this up to N dimensions ...). The order you specify the vertices is the order they will be in the output array. Then I "transpose" the list of vertices (using zip) so that all the x indices are together and all the y indices are together, etc. (that's how numpy likes it). At this point, you can save that index array and use it to index your array whenever you want the corners of your box. You can easily reshape the result into a 2x2x2 array (although the order I have it is probably not the order you want).
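The itertools approach hinted at above might look like the following sketch. It uses itertools.product to enumerate every first/last combination per axis, so the vertex order differs from the hand-written list; the index sequences are packed into a tuple, which is what NumPy expects for one index array per axis.

```python
import itertools
import numpy as np

a = np.arange(27).reshape(3, 3, 3)  # dummy cubic array, as in the answer above

# Every combination of first (0) and last (-1) index along each axis.
vertices = list(itertools.product((0, -1), repeat=a.ndim))
indices = tuple(zip(*vertices))     # one index sequence per axis
vertex_values = a[indices].reshape((2,) * a.ndim)
print(vertex_values.tolist())  # [[[0, 2], [6, 8]], [[18, 20], [24, 26]]]
```

Because the shape and the repeat count both come from a.ndim, the same lines work for a 2D or 4D array without changes.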

This would give you a list of the first and last element in your sequence:
result = [s[0], s[-1]]
Alternatively, this would give you a tuple
result = s[0], s[-1]

With the particular case of a (N,N,N) ndarray X that you mention, would the following work for you?
s = slice(0,N,N-1)
X[s,s,s]
Example
>>> N = 3
>>> X = np.arange(N*N*N).reshape(N,N,N)
>>> s = slice(0,N,N-1)
>>> print(X[s,s,s])
[[[ 0  2]
  [ 6  8]]

 [[18 20]
  [24 26]]]

>>> from operator import itemgetter
>>> first_and_last = itemgetter(0, -1)
>>> first_and_last([1, 2, 3, 4, 5])
(1, 5)

Why do you want to use a slice? Getting each element with
result = [s[0], s[-1]]
is better and more readable.
If you really need to use the slice, then your solution is the simplest working one that I can think of.
This also works for the 3D case you've mentioned.
