Get original indices of a sorted Numpy array

I have an array of distances a = np.array([20.5 ,5.3 ,60.7 ,3.0 ], 'double') and I need the indices that would sort the array (for this example, [3, 1, 0, 2], i.e. the order a.sort() would produce). Is there a function in NumPy to do that?

Yes, there's the numpy.argsort(a) function, which does exactly what you're asking for. You can also call it as a method on an ndarray object: a.argsort() (equivalent to numpy.ndarray.argsort(a)).
Here's a link to the documentation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html#numpy.argsort

Here's an example, for reference and convenience:
import numpy as np
# create an array
a = np.array([5, 2, 3])
# np.sort returns a sorted copy of the array
np.sort(a)
# array([2, 3, 5])
# argsort returns the original indices of the sorted elements
np.argsort(a)
# array([1, 2, 0])
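Applied to the distances from the question, a minimal sketch (the names order and desc are only illustrative) shows that indexing with the argsort result reproduces the sorted array, and that reversing the index array gives descending order:

```python
import numpy as np

a = np.array([20.5, 5.3, 60.7, 3.0], dtype=np.double)

# Indices that would sort a in ascending order
order = np.argsort(a)
print(order)  # [3 1 0 2]

# Indexing with these indices reproduces the sorted array (same as np.sort(a))
sorted_a = a[order]

# Reversing the index array gives the indices for descending order
desc = order[::-1]
print(desc)  # [2 0 1 3]
```

If the array can contain ties and you need equal elements to keep their original relative order, np.argsort also accepts kind='stable'.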

Related

Calculation on list of numpy array

I'm trying to do some calculations (mean, sum, etc.) on a list containing numpy arrays.
For example:
list = [array([2, 3, 4]), array([4, 4, 4]), array([6, 5, 4])]
How can I retrieve the mean (for example), either as a list like [4,4,4] or as a numpy array like array([4,4,4])?
Thanks in advance for your help!
EDIT: Sorry, I didn't explain properly what I was aiming to do: I would like to get the mean of the i-th element across the arrays. For example, for index 0:
(2+4+6)/3 = 4
I don't want this:
(2+3+4)/3 = 3
Therefore the end result should be
[4,4,4] and not [3,4,5]

If L were a list of scalars, the mean could be computed with the straightforward expression:
sum(L) / len(L)
Luckily, this works unchanged on lists of arrays, because + on NumPy arrays is element-wise:
import numpy as np
L = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]
sum(L) / len(L)
# array([4., 4., 4.])
For this example it happens to be quite a bit faster than the NumPy function np.mean (timeit runs each lambda 1,000,000 times by default, so these are total seconds):
from timeit import timeit
timeit(lambda: np.mean(L, axis=0))
# 13.708808058872819
timeit(lambda: sum(L) / len(L))
# 3.4780975924804807
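To check that the two expressions agree, here is a minimal sketch that compares only the results (timings vary by machine and NumPy version, so none are reproduced here):

```python
import numpy as np

L = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]

# Built-in sum adds the arrays element-wise (starting from 0), then divides
builtin_mean = sum(L) / len(L)

# np.mean first converts the list to a 2-D array of shape (3, 3),
# then averages along axis 0, i.e. across the arrays
numpy_mean = np.mean(L, axis=0)

print(builtin_mean)                              # [4. 4. 4.]
print(np.array_equal(builtin_mean, numpy_mean))  # True
```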

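If your list is not too big, you can use a for loop and iterate through the arrays in your list:
mean = []
for arr in list:
    mean.append(np.mean(arr))
Note that this computes the mean of each array (3, 4 and 5 here), which is what the question asked before the EDIT, not the element-wise mean across the arrays.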

Given a 1-D array a, np.mean(a) should do the trick.
If you have a 2-D array and want the mean of each row separately, use np.mean(a, axis=1); for the mean of each column (the element-wise mean across rows), use axis=0.
The same axis argument works for np.sum, etc.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
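To make the difference between the axes concrete for the arrays in the question, a small sketch (np.stack is used here only to build the 2-D array explicitly, one row per original array):

```python
import numpy as np

# One row per array from the question
a = np.stack([np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])])

row_means = np.mean(a, axis=1)  # mean of each row:    [3. 4. 5.]
col_means = np.mean(a, axis=0)  # mean of each column: [4. 4. 4.]
col_sums = np.sum(a, axis=0)    # column sums:         [12 12 12]

print(row_means, col_means, col_sums)
```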

You can pass the list directly to np.mean with axis=0:
import numpy as np
my_list = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]
np.mean(my_list, axis=0)  # array([4., 4., 4.])
Note: do not name your variable list, as it shadows the built-in list.

numpy-equivalent of list.pop?

Is there a numpy method which is equivalent to the builtin pop for python lists?
Popping obviously doesn't work on numpy arrays, and I want to avoid a list conversion.

There is no pop method for NumPy arrays, but you can just use basic slicing (which is efficient, since it returns a view, not a copy):
In [104]: y = np.arange(5); y
Out[104]: array([0, 1, 2, 3, 4])
In [105]: last, y = y[-1], y[:-1]
In [106]: last, y
Out[106]: (4, array([0, 1, 2, 3]))
If there were a pop method, it would return the last value in y and modify y.
Above,
last, y = y[-1], y[:-1]
assigns the last value to the variable last and rebinds y to a view of all but the last element; the original array object itself is not modified.
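A short sketch of what that rebinding means in practice (the name original is only there to keep a second reference for the comparison): the new y is a view that shares memory with the old array, and the old array still holds all five elements.

```python
import numpy as np

y = np.arange(5)
original = y  # keep a reference to the original array object

last, y = y[-1], y[:-1]

print(last)                           # 4
print(y)                              # [0 1 2 3]
print(original)                       # [0 1 2 3 4]  (unchanged)
print(np.shares_memory(y, original))  # True: y is a view
```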

Here is one example using numpy.delete(), which returns a new array without the given row:
import numpy as np
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(arr)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
arr = np.delete(arr, 1, 0)  # delete row 1 along axis 0
print(arr)
# [[ 1  2  3  4]
#  [ 9 10 11 12]]

Pop doesn't exist for NumPy arrays, but you can use NumPy indexing in combination with array restructuring, for example hstack/vstack or numpy.delete(), to emulate popping.
Here are some example functions I can think of. A negative index such as -1 has to be converted to the equivalent positive index first (otherwise my_array[i+1:] becomes my_array[0:], the whole array), which the first line of each function does:
import numpy as np
def poprow(my_array, pr):
    """ row popping in numpy arrays
    Input: my_array - NumPy array, pr: row index to pop out
    Output: [new_array, popped_row] """
    i = pr + my_array.shape[0] if pr < 0 else pr  # normalize negative indices
    pop = my_array[i]
    new_array = np.vstack((my_array[:i], my_array[i+1:]))
    return [new_array, pop]
def popcol(my_array, pc):
    """ column popping in numpy arrays
    Input: my_array: NumPy array, pc: column index to pop out
    Output: [new_array, popped_col] """
    i = pc + my_array.shape[1] if pc < 0 else pc  # normalize negative indices
    pop = my_array[:, i]
    new_array = np.hstack((my_array[:, :i], my_array[:, i+1:]))
    return [new_array, pop]
This returns the array without the popped row/column, as well as the popped row/column separately:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,poparow] = poprow(A,0)
>>> poparow
array([1, 2, 3])
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,popacol] = popcol(A,2)
>>> popacol
array([3, 6])
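As an alternative sketch (pop_along_axis is a hypothetical helper name, not a NumPy function), np.take and np.delete both accept an axis argument and negative indices, so a single function can pop a row or a column, including the last one with -1. Neither call modifies the input, so the caller must keep the returned array:

```python
import numpy as np

def pop_along_axis(arr, index, axis=0):
    """Hypothetical helper: return (popped, remaining) for one index along axis."""
    popped = np.take(arr, index, axis=axis)       # the row/column being removed
    remaining = np.delete(arr, index, axis=axis)  # new array without it
    return popped, remaining

A = np.array([[1, 2, 3], [4, 5, 6]])
row, A_rows = pop_along_axis(A, -1, axis=0)  # pop the last row
col, A_cols = pop_along_axis(A, -1, axis=1)  # pop the last column

print(row)     # [4 5 6]
print(col)     # [3 6]
print(A_cols)  # 2x2 array with rows [1 2] and [4 5]
```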

Unlike lists, NumPy arrays have no pop() method. Here are some alternatives you can try:
Using basic slicing
>>> x = np.array([1,2,3,4,5])
>>> x = x[:-1]; x
array([1, 2, 3, 4])
Or by using delete()
Syntax: np.delete(arr, obj, axis=None)
arr: input array
obj: index (or indices) of the element(s), rows or columns to delete
axis: axis along which to delete; if None, obj is applied to the flattened array
>>> x = np.array([1,2,3,4,5])
>>> x = np.delete(x, len(x)-1, 0)
>>> x
array([1, 2, 3, 4])

The important thing is that it takes one element from the original array and removes it.
If you don't mind that this takes more than a single method call, the following code does what you want:
import numpy as np
a = np.arange(0, 3)
i = 0
selected, others = a[i], np.delete(a, i)
print(selected)
print(others)
# result:
# 0
# [1 2]

The most 'elegant' solution for retrieving and removing a random item in Numpy is this:
import numpy as np
import random
arr = np.array([1, 3, 5, 2, 8, 7])
element = random.choice(arr)
elementIndex = np.where(arr == element)[0][0]
arr = np.delete(arr, elementIndex)
For curious coders:
With a single argument, np.where() returns a tuple of index arrays, one per dimension of the input. For a 2-D array, the first array holds the row indexes of the matching elements and the second the column indexes. Our array is 1-D, so the tuple holds a single array, and its first element ([0][0]) is the index we want. If the value occurs more than once, this removes its first occurrence.
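A variant worth considering, sketched here with NumPy's Generator API (np.random.default_rng) in place of the random module: pick a random position rather than a random value, so no np.where() search is needed and duplicate values cannot cause a different occurrence to be removed.

```python
import numpy as np

rng = np.random.default_rng()
arr = np.array([1, 3, 5, 2, 8, 7])

# Random position in [0, arr.size)
idx = rng.integers(arr.size)
element = arr[idx]
arr = np.delete(arr, idx)  # np.delete returns a new, shorter array
```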

To add: if you want to implement pop for a row or column of a 2-D NumPy array, you could do:
col = arr[:, -1]  # gets the last column
arr = np.delete(arr, -1, 1)  # removes the last column (np.delete returns a new array)
and for a row:
row = arr[-1, :]  # gets the last row
arr = np.delete(arr, -1, 0)  # removes the last row (np.delete returns a new array)

unutbu had a simple answer for this, but pop() can also take an index as a parameter. This is how you replicate it with numpy:
pop_index = 4  # must be non-negative here: with -1, y[pop_index+1:] would be y[0:], the whole array
pop = y[pop_index]
y = np.concatenate([y[:pop_index], y[pop_index+1:]])

OK, since I didn't see a good answer that RETURNS the first element and REMOVES it from the original array, I wrote a simple (if kludgy) function that uses a global for a 1-D array (it needs modification for multiple dimensions):
import numpy as np
tmp_array_for_popfunc = np.array([10, 20, 30])  # example values: put your own 1-D array here
def array_pop():
    global tmp_array_for_popfunc
    r = tmp_array_for_popfunc[0]
    tmp_array_for_popfunc = np.delete(tmp_array_for_popfunc, 0)
    return r
Check it with:
print(len(tmp_array_for_popfunc)) # confirm initial size of tmp_array_for_popfunc
print(array_pop()) #prints return value at tmp_array_for_popfunc[0]
print(len(tmp_array_for_popfunc)) # now size is 1 smaller

I made the following function, which does almost the same. It takes 2 arguments, np_array and index, and returns the value at the given index. Because np.delete cannot modify an array in place (it returns a new array), the function also returns the shortened array, which the caller has to keep:
import numpy as np
def np_pop(np_array, index=-1):
    '''
    Pop the "index" from np_array.
    Default value for index is the last element.
    Returns (value, new_array).
    '''
    # read the value of the given array at the given index
    value = np_array[index]
    # np.delete returns a copy without that element; np_array itself is unchanged
    new_array = np.delete(np_array, index, 0)
    return value, new_array
Remember you can add a condition to make sure the given index exists in the array (np_array[index] already raises an IndexError if it doesn't).
Now you can use it like this:
i = 2  # let's assume we want to pop index number 2
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # assume 'y' is our numpy array
popped_val, y = np_pop(y, i)  # value at the popped index; y no longer contains it

Numpy - the best way to remove the last element from 1 dimensional array?

What is the most efficient way to remove the last element from a numpy 1 dimensional array? (like pop for list)

NumPy arrays have a fixed size, so you cannot remove an element in-place. For example using del doesn't work:
>>> import numpy as np
>>> arr = np.arange(5)
>>> del arr[-1]
ValueError: cannot delete array elements
Note that the index -1 represents the last element. That's because negative indices in Python (and NumPy) are counted from the end, so -1 is the last, -2 is the one before last and -len(arr) is the first element. That's just for your information in case you didn't know.
Python lists are variable sized so it's easy to add or remove elements.
So if you want to remove an element you need to create a new array or view.
Creating a new view
You can create a new view containing all elements except the last one using the slice notation:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> arr[:-1] # all but the last element
array([0, 1, 2, 3])
>>> arr[:-2] # all but the last two elements
array([0, 1, 2])
>>> arr[1:] # all but the first element
array([1, 2, 3, 4])
>>> arr[1:-1] # all but the first and last element
array([1, 2, 3])
However a view shares the data with the original array, so if one is modified so is the other:
>>> sub = arr[:-1]
>>> sub
array([0, 1, 2, 3])
>>> sub[0] = 100
>>> sub
array([100, 1, 2, 3])
>>> arr
array([100, 1, 2, 3, 4])
Creating a new array
1. Copy the view
If you don't want this memory sharing, you have to create a new array. In this case it's probably simplest to create a view and then copy it (for example using the copy() method of arrays):
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> sub_arr = arr[:-1].copy()
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
2. Using integer array indexing
However, you can also use integer array indexing to remove the last element and get a new array. Integer array indexing is a form of advanced indexing, which always creates a copy, not a view:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> indices_to_keep = [0, 1, 2, 3]
>>> sub_arr = arr[indices_to_keep]
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
This integer array indexing can be useful to remove arbitrary elements from an array (which can be tricky or impossible when you want a view):
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> arr[[0, 1, 3, 4]] # keep first, second, fourth and fifth element
array([5, 6, 8, 9])
If you want a generalized function that removes the last element using integer array indexing:
def remove_last_element(arr):
    return arr[np.arange(arr.size - 1)]
3. Using boolean array indexing
There is also boolean indexing that could be used, for example:
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> keep = [True, True, True, True, False]
>>> arr[keep]
array([5, 6, 7, 8])
This also creates a copy! And a generalized approach could look like this:
def remove_last_element(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
If you would like more information on NumPy's indexing, the documentation on "Indexing" is quite good and covers a lot of cases.
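The same boolean mask also handles the "remove arbitrary elements" case mentioned for integer indexing. A minimal sketch, where remove_indices is a hypothetical helper name for a 1-D array:

```python
import numpy as np

def remove_indices(arr, indices):
    """Hypothetical helper: return a copy of 1-D arr without the given positions."""
    keep = np.ones(arr.size, dtype=bool)
    keep[indices] = False  # negative indices such as -1 work here too
    return arr[keep]

arr = np.arange(5, 10)                 # [5 6 7 8 9]
print(remove_indices(arr, [0, -1]))    # [6 7 8]
```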
4. Using np.delete()
Normally I wouldn't recommend NumPy functions that "seem" to modify the array in place (like np.append and np.insert) but actually return copies, because they are generally needlessly slow and misleading. You should avoid them whenever possible; that's why this is the last point in my answer. However, in this case it's actually a perfect fit, so I have to mention it:
>>> arr = np.arange(10, 20)
>>> arr
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.delete(arr, -1)
array([10, 11, 12, 13, 14, 15, 16, 17, 18])
5. Using np.resize()
NumPy has another function that sounds like it does an in-place operation but really returns a new array:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> np.resize(arr, arr.size - 1)
array([0, 1, 2, 3])
To remove the last element I simply provided a new shape that is 1 smaller than before, which effectively removes the last element.
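One caveat, shown in a short sketch: np.resize only truncates when the new size is smaller. When it is larger, the result is filled with repeated copies of the data rather than zeros, so it is not a general "change length" tool.

```python
import numpy as np

arr = np.arange(5)

shrunk = np.resize(arr, 4)  # keeps the leading elements: [0 1 2 3]
grown = np.resize(arr, 7)   # repeats the data cyclically: [0 1 2 3 4 0 1]

print(shrunk, grown)
```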
Modifying the array in place
Yes, I wrote previously that you cannot modify an array in place. I said that because in most cases it's not possible, or only possible by disabling some (very useful) safety checks. I'm not sure about the internals, but depending on the old and new size this could involve an (internal-only) copy operation, so it might be slower than creating a view.
Using np.ndarray.resize()
If the array doesn't share its memory with any other array, then it's possible to resize the array in place:
>>> arr = np.arange(5, 10)
>>> arr.resize(4)
>>> arr
array([5, 6, 7, 8])
However, it raises a ValueError if the array is also referenced by another array:
>>> arr = np.arange(5)
>>> view = arr[1:]
>>> arr.resize(4)
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
You can disable that safety check by setting refcheck=False, but that shouldn't be done lightly, because it makes you vulnerable to segmentation faults and memory corruption in case the other reference tries to access the removed elements! This refcheck argument should be treated as an expert-only option!
Summary
Creating a view is really fast and doesn't take much additional memory, so you should work with views whenever possible. However, depending on the use case, it's not so easy to remove arbitrary elements using basic slicing: you can easily remove the first n and/or last n elements, or keep every x-th element (the step argument of slicing), but that is all you can do with it.
But in your case of removing the last element of a one-dimensional array I would recommend:
arr[:-1] # if you want a view
arr[:-1].copy() # if you want a new array
because these most clearly express the intent and everyone with Python/NumPy experience will recognize that.
Timings
Based on the timing framework from this answer:
# Setup
import numpy as np
def view(arr):
    return arr[:-1]
def array_copy_view(arr):
    return arr[:-1].copy()
def array_int_index(arr):
    return arr[np.arange(arr.size - 1)]
def array_bool_index(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
def array_delete(arr):
    return np.delete(arr, -1)
def array_resize(arr):
    return np.resize(arr, arr.size - 1)
# Timing setup
timings = {view: [],
           array_copy_view: [], array_int_index: [], array_bool_index: [],
           array_delete: [], array_resize: []}
sizes = [2**i for i in range(1, 20, 2)]
# Timing
for size in sizes:
    print(size)
    func_input = np.random.random(size=size)
    for func in timings:
        print(func.__name__.ljust(20), ' ', end='')
        res = %timeit -o func(func_input)  # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)
# Plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1)
ax = plt.subplot(111)
for func in timings:
    ax.plot(sizes,
            [time.best for time in timings[func]],
            label=func.__name__)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
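If you are not using IPython, a rough equivalent with the standard timeit module could look like the sketch below. It covers only three of the functions above, and the number/repeat values and the reduced list of sizes are arbitrary choices; absolute timings depend on your machine.

```python
import timeit
import numpy as np

def view(arr):
    return arr[:-1]

def array_copy_view(arr):
    return arr[:-1].copy()

def array_delete(arr):
    return np.delete(arr, -1)

funcs = [view, array_copy_view, array_delete]
sizes = [2**i for i in range(1, 20, 4)]
best = {func.__name__: [] for func in funcs}
number = 100  # calls per measurement

for size in sizes:
    func_input = np.random.random(size=size)
    for func in funcs:
        # Take the best of several repeats, similar to %timeit's "best"
        times = timeit.repeat(lambda: func(func_input), number=number, repeat=5)
        best[func.__name__].append(min(times) / number)

for name, values in best.items():
    print(name.ljust(20), ['%.2e' % t for t in values])
```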
I get the following timings as a log-log plot, to cover all the details. Lower time still means faster, but the range between two ticks represents one order of magnitude instead of a fixed amount. In case you're interested in the specific values, I copied them into this gist:
According to these timings, the two approaches recommended above (arr[:-1] and arr[:-1].copy()) are also the fastest. (Python 3.6 and NumPy 1.14.0)

If you just want the array without its last element (without explicitly removing it), use slicing:
array[:-1]

To delete the last element from a 1-dimensional NumPy array, use the numpy.delete method, like so:
import numpy as np
# Create a 1-dimensional NumPy array that holds 5 values
values = np.array([1, 2, 3, 4, 5])
# Remove the last element of the array using the numpy.delete method
values = np.delete(values, -1)
print(values)
Output:
[1 2 3 4]
The last value of the NumPy array, which was 5, is now removed.

Indexing a numpy array with a list of tuples

Why can't I index an ndarray using a list of tuple indices like so?
idx = [(x1, y1), ... (xn, yn)]
X[idx]
Instead I have to do something unwieldy like
idx2 = numpy.array(idx)
X[idx2[:, 0], idx2[:, 1]] # or more generally:
X[tuple(numpy.vsplit(idx2.T, 1)[0])]
Is there a simpler, more pythonic way?

You can use a tuple of sequences, but the convention is different from what you want: numpy expects a sequence of row indices, followed by a sequence of column indices. You, apparently, want to specify a list of (x,y) pairs.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing
The relevant section in the documentation is 'integer array indexing'.
Here's an example, seeking 3 points in a 2d array. (2 points in 2d can be confusing):
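In [223]: idx
Out[223]: [(0, 1, 1), (2, 3, 0)]
In [224]: X[tuple(idx)]
Out[224]: array([2, 7, 4])
(Older NumPy versions also accepted the list itself, X[idx], with a FutureWarning; since NumPy 1.23 a list is treated as a single index array, so convert it to a tuple.)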
Using your style of xy pairs of indices:
In [230]: idx1 = [(0,2),(1,3),(1,0)]
In [231]: [X[i] for i in idx1]
Out[231]: [2, 7, 4]
In [240]: X[tuple(np.array(idx1).T)]
Out[240]: array([2, 7, 4])
X[tuple(zip(*idx1))] is another way of doing the conversion. In Python 3 the tuple() is required, because zip returns an iterator (in Python 2 it was optional). zip(*...) is a Python idiom that transposes a list of lists.
You are on the right track with:
In [242]: idx2=np.array(idx1)
In [243]: X[idx2[:,0], idx2[:,1]]
Out[243]: array([2, 7, 4])
My tuple() is just a bit more compact (and not necessarily more 'pythonic'). Given the numpy convention, some sort of conversion is necessary.
(Should we check what works with n-dimensions and m-points?)
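A sketch addressing that question. The array X is not given in the answer; X = np.arange(12).reshape(3, 4) is an assumption that happens to reproduce the outputs shown above. The same transpose-then-tuple conversion works for m points in n dimensions:

```python
import numpy as np

# Assumed example array; it reproduces the outputs shown above
X = np.arange(12).reshape(3, 4)
idx1 = [(0, 2), (1, 3), (1, 0)]  # list of (row, col) pairs

# Transpose the pairs into one index sequence per axis, then pass a tuple
via_array = X[tuple(np.array(idx1).T)]  # [2 7 4]
via_zip = X[tuple(zip(*idx1))]          # [2 7 4]

# m = 2 points in a 3-D array
Y = np.arange(24).reshape(2, 3, 4)
pts = [(0, 1, 2), (1, 2, 3)]
via_3d = Y[tuple(np.array(pts).T)]      # [ 6 23]

print(via_array, via_zip, via_3d)
```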

Use a tuple of NumPy arrays, which can be passed directly to index your array (index_tuple here is your list of (x, y) pairs):
index = tuple(np.array(list(zip(*index_tuple))))
new_array = list(prev_array[index])

Python (Numpy) array sorting

I've got this array, named v, of dtype('float64'):
array([[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02]])
... which I've acquired from a file by using the np.loadtxt command. I would like to sort it by the values of the first column, without mixing up the structure that keeps the numbers listed on the same line together. Using v.sort(axis=0) gives me:
array([[ 1.33360000e+05, 8.75886500e+06, 6.19200000e+00],
[ 4.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 9.33350000e+05, 8.75886500e+06, 6.76650000e+02]])
... i.e. places the smallest number of the third column in the first line, etc. I would rather want something like this...
array([[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02]])
... where the elements of each line haven't been moved relative to each other.

Try
v[v[:,0].argsort()]
(with v being the array). v[:,0] is the first column, and .argsort() returns the indices that would sort the first column. You then apply this ordering to the whole array using advanced indexing. Note that you get a sorted copy of the array.
The only way I know of to sort the array in place is to use a record dtype:
v.dtype = [("x", float), ("y", float), ("z", float)]
v.shape = v.size
v.sort(order="x")
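If you only need v's own buffer to end up sorted (so other references to the same array see the change), a simpler sketch than the record dtype is to assign the reordered rows back through a slice. NumPy still builds a temporary sorted copy on the right-hand side, so this is not a memory-free in-place sort. The data is taken from the question; the name alias is only for the check.

```python
import numpy as np

v = np.array([[9.33350e+05, 8.758865e+06, 3.45765e+02],
              [4.33350e+05, 8.758865e+06, 6.19200e+00],
              [1.33360e+05, 8.758865e+06, 6.76650e+02]])

alias = v  # another name for the same array object

# The right-hand side is a sorted copy; slice assignment writes it into v's buffer
v[:] = v[v[:, 0].argsort()]

print(v[:, 0])     # first column is now ascending
print(alias is v)  # True: still the same array object
```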

Alternatively, try
import numpy as np
order = v[:, 0].argsort()
sorted_v = np.take(v, order, 0)
'order' holds the sorting order of the first column,
and 'np.take' then takes the rows (axis 0) in that order.
See the help of 'np.take':
help(np.take)
take(a, indices, axis=None, out=None, mode='raise')
Take elements from an array along an axis.
This function does the same thing as "fancy" indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements
along a given axis.
Parameters
----------
a : array_like
The source array.
indices : array_like
The indices of the values to extract.
axis : int, optional
The axis over which to select values. By default, the flattened
input array is used.
out : ndarray, optional
If provided, the result will be placed in this array. It should
be of the appropriate shape and dtype.
mode : {'raise', 'wrap', 'clip'}, optional
Specifies how out-of-bounds indices will behave.
* 'raise' -- raise an error (default)
* 'wrap' -- wrap around
* 'clip' -- clip to the range
'clip' mode means that all indices that are too large are replaced
by the index that addresses the last element along that axis. Note
that this disables indexing with negative numbers.
Returns
-------
subarray : ndarray
The returned array has the same type as `a`.
See Also
--------
ndarray.take : equivalent method
Examples
--------
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
In this example if `a` is an ndarray, "fancy" indexing can be used.
>>> a = np.array(a)
>>> a[indices]
array([4, 3, 6])

If v[:,0] has some identical values and you want to sort secondarily on columns 1, 2, etc., then you'll want to use numpy.lexsort() or numpy.sort(v, order=('col1', 'col2', ...)); for the order= case, v needs to be a structured array.
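A minimal sketch of the order= case. The field names col1 and col2 are illustrative; with a structured array, np.sort compares by the first listed field and breaks ties with the next one:

```python
import numpy as np

# A small structured array with named fields
s = np.array([(1, 2), (1, 0), (0, 4)], dtype=[('col1', int), ('col2', int)])

# Sort by col1 first; ties in col1 are broken by col2
s_sorted = np.sort(s, order=('col1', 'col2'))
print(s_sorted)
```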

An example application of numpy.lexsort() that sorts the rows of an array and deals with ties in the first column. Note that when given a 2-D key array, lexsort treats each row as a sort key and uses the last row as the primary key. So you need to reverse the column order of a (np.flip(a, axis=1)) and then transpose it, so that the columns become keys with the first column last. lexsort returns the row order, which is then used to index a (you'd have thought this should be easier, but hey!):
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3,4],[1,0,4,1],[0,4,1,1]])
In [3]: a[np.lexsort(np.flip(a, axis=1).T)]
Out[3]:
array([[0, 4, 1, 1],
[1, 0, 4, 1],
[1, 2, 3, 4]])
In [4]: a
Out[4]:
array([[1, 2, 3, 4],
[1, 0, 4, 1],
[0, 4, 1, 1]])
Thanks go to @Paul for the suggestion to use lexsort.
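Alternatively, passing the keys to np.lexsort as a tuple of columns avoids the flip and transpose. A sketch for the same array, with the columns listed in reverse priority because lexsort treats the last key as primary:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [1, 0, 4, 1], [0, 4, 1, 1]])

# Column 0 is the primary key (listed last), column 1 breaks ties, etc.
order = np.lexsort((a[:, 3], a[:, 2], a[:, 1], a[:, 0]))
print(order)     # [2 1 0]
print(a[order])  # rows sorted by column 0, ties broken by column 1, ...
```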
