Python (Numpy) array sorting - python

I've got this array, named v, of dtype('float64'):
array([[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02]])
... which I've acquired from a file by using the np.loadtxt command. I would like to sort it after the values of the first column, without mixing up the structure that keeps the numbers listed on the same line together. Using v.sort(axis=0) gives me:
array([[ 1.33360000e+05, 8.75886500e+06, 6.19200000e+00],
[ 4.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 9.33350000e+05, 8.75886500e+06, 6.76650000e+02]])
... i.e. places the smallest number of the third column in the first line, etc. I would rather want something like this...
array([[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02]])
... where the elements of each line hasn't been moved relatively to each other.

Try
v[v[:,0].argsort()]
(with v being the array). v[:,0] is the first column, and .argsort() returns the indices that would sort the first column. You then apply this ordering to the whole array using advanced indexing. Note that you get a sorte copy of the array.
The only way I know of to sort the array in place is to use a record dtype:
v.dtype = [("x", float), ("y", float), ("z", float)]
v.shape = v.size
v.sort(order="x")

Alternatively
Try
import numpy as np
order = v[:, 0].argsort()
sorted = np.take(v, order, 0)
'order' has the order of the first row.
and then 'np.take' take the columns their corresponding order.
See the help of 'np.take' as
help(np.take)
take(a, indices, axis=None, out=None,
mode='raise')
Take elements from an array along an axis.
This function does the same thing as "fancy" indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements
along a given axis.
Parameters
----------
a : array_like
The source array.
indices : array_like
The indices of the values to extract.
axis : int, optional
The axis over which to select values. By default, the flattened
input array is used.
out : ndarray, optional
If provided, the result will be placed in this array. It should
be of the appropriate shape and dtype.
mode : {'raise', 'wrap', 'clip'}, optional
Specifies how out-of-bounds indices will behave.
* 'raise' -- raise an error (default)
* 'wrap' -- wrap around
* 'clip' -- clip to the range
'clip' mode means that all indices that are too large are
replaced
by the index that addresses the last element along that axis. Note
that this disables indexing with negative numbers.
Returns
-------
subarray : ndarray
The returned array has the same type as `a`.
See Also
--------
ndarray.take : equivalent method
Examples
--------
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
In this example if `a` is an ndarray, "fancy" indexing can be used.
>>> a = np.array(a)
>>> a[indices]
array([4, 3, 6])

If you have instances where v[:,0] has some identical values and you want to secondarily sort on columns 1, 2, etc.., then you'll want to use numpy.lexsort() or numpy.sort(v, order=('col1', 'col2', etc..) but for the order= case, v will need to be a structured array.

An example application of numpy.lexsort() to sort the rows of an array and deals with ties in the first column. Note that lexsort effectively sorts columns and starts with the last column, so you need to reverse the rows of a then take the transpose before the lexsort, and finally transpose the result (you'd have thought this should be easier, but hey!):
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3,4],[1,0,4,1],[0,4,1,1]])
In [3]: a[np.lexsort(np.flip(a, axis=1).T).T]
Out[3]:
array([[0, 4, 1, 1],
[1, 0, 4, 1],
[1, 2, 3, 4]])
In [4]: a
Out[4]:
array([[1, 2, 3, 4],
[1, 0, 4, 1],
[0, 4, 1, 1]])
Thanks go to #Paul for the suggestion to use lexsort.

Related

Updating elements in multiply indexed np.array

I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious way to do this for me does not work since it seems numpy is only updating a copy of the array and not the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
However, I expected this output:
>>> arr
array([[0, 1, -2],
[3, -4, 5],
[6, 7, 8]])
We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is esssentially two steps : arr[idx[:,0],idx[:,1]], which under the hoods calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view. Let's call arr[idx[:,0],idx[:,1]] as arr2.
In the next step, with the combined arr[idx[:,0],idx[:,1]][bool_idx], i.e. arr2[bool_idx], under the hoods it calls arr2.__setitem__(mask), which is implemented to modify arr2 and as such doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both directly used the indexer on arr with arr.__setitem__(indexer) that modifies arr.
You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it on the both of the integer index expressions -- idx[:,0] and idx[:,1])

delete all columns of a dimension except for a specific column

I want to make a function which takes a n-dimensional array, the dimension and the column index, and it will return the (n-1)-dimensional array after removing all the other columns of that specific dimension.
Here is the code I am using now
a = np.arange(6).reshape((2, 3)) # the n-dimensional array
axisApplied = 1
colToKeep = 0
colsToDelete = np.delete(np.arange(a.shape[axisApplied]), colToKeep)
a = np.squeeze(np.delete(a, colsToDelete, axisApplied), axis=axisApplied)
print(a)
# [0, 3]
Note that I have to manually calculate the n-1 indices (the complement of the specific column index) to use np.delete(), and I am wondering whether there is a more convenient way to achieve my goal, e.g. specify which column to keep directly.
Thank you for reading and I am welcome to any suggestions.
In [1]: arr = np.arange(6).reshape(2,3)
In [2]: arr
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
Simple indexing:
In [3]: arr[:,0]
Out[3]: array([0, 3])
Or if you need to used the general axis parameter, try take:
In [4]: np.take(arr,0,axis=1)
Out[4]: array([0, 3])
Picking one element, or a list of elements, along an axis is a lot easier than deleting some. Look at the code for np.delete.

Why is the colon needed to represent the row but not for the column of a matrix?

I've been migrating from Matlab to NumPy/Scipy. There is one fundamental thing I don't clearly understand.
When we have a two-dimensional array like the following:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
in order to represent the first column and row, we use the following expressions.
col = x[:, 0]
row = x[0, ]
So ,we see that : is not needed to represent a row but : is needed for a column.
Could somebody explain what would be the reason?
The slice notation uses tuples to indicate what to slice.
:, 0 is a tuple with two elements; a slice(None, None, None) object (so slicing from start to end with step 1), and the integer 0. The notation , 0 however is not valid Python. You have to have an expression before comma, you can't just leave it blank.
0, on the other hand, is a valid tuple. It contains just one element, the integer 0. Because there is more than one dimension in your array, numpy can extrapolate that you wanted to use all elements for the remaining dimensions, so you don't need to give it a 0, : (== 0, slice(None, None, None)) tuple.
It is merely because [,0] is invalid syntax in Python. Whereas [0,] is perfectly legal.

How do I remove the first and last rows and columns from a 2D numpy array?

I'd like to know how to remove the first and last rows and columns from a 2D array in numpy. For example, say we have a (N+1) x (N+1) matrix called H then in MATLAB/Octave, the code I'd use would be:
Hsub = H(2:N,2:N);
What's the equivalent code in Numpy? I thought that np.reshape might do what I want but I'm not sure how to get it to remove just the target rows as I think if I reshape to a (N-1) x (N-1) matrix, it'll remove the last two rows and columns.
How about this?
Hsub = H[1:-1, 1:-1]
The 1:-1 range means that we access elements from the second index, or 1, and we go up to the second last index, as indicated by the -1 for a dimension. We do this for both dimensions independently. When you do this independently for both dimensions, the result is the intersection of how you're accessing each dimension, which is essentially chopping off the first row, first column, last row and last column.
Remember, the ending index is exclusive, so if we did 0:3 for example, we only get the first three elements of a dimension, not four.
Also, negative indices mean that we access the array from the end. -1 is the last value to access in a particular dimension, but because of the exclusivity, we are getting up to the second last element, not the last element. Essentially, this is the same as doing:
Hsub = H[1:H.shape[0]-1, 1:H.shape[1]-1]
... but using negative indices is much more elegant. You also don't have to use the number of rows and columns to extract out what you need. The above syntax is dimension agnostic. However, you need to make sure that the matrix is at least 3 x 3, or you'll get an error.
Small bonus
In MATLAB / Octave, you can achieve the same thing without using the dimensions by:
Hsub = H(2:end-1, 2:end-1);
The end keyword with regards to indexing means to get the last element for a particular dimension.
Example use
Here's an example (using IPython):
In [1]: import numpy as np
In [2]: H = np.meshgrid(np.arange(5), np.arange(5))[0]
In [3]: H
Out[3]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [4]: Hsub = H[1:-1,1:-1]
In [5]: Hsub
Out[5]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
As you can see, the first row, first column, last row and last column have been removed from the source matrix H and the remainder has been placed in the output matrix Hsub.

Finding the minimum value in a numpy array and the corresponding values for the rest of that array's row

Consider the following NumPy array:
a = np.array([[1,4], [2,1],(3,10),(4,8)])
This gives an array that looks like the following:
array([[ 1, 4],
[ 2, 1],
[ 3, 10],
[ 4, 8]])
What I'm trying to do is find the minimum value of the second column (which in this case is 1), and then report the other value of that pair (in this case 2). I've tried using something like argmin, but that gets tripped up by the 1 in the first column.
Is there a way to do this easily? I've also considered sorting the array, but I can't seem to get that to work in a way that keeps the pairs together. The data is being generated by a loop like the following, so if there's a easier way to do this that isn't a numpy array, I'd take that as an answer too:
results = np.zeros((100,2))
# Loop over search range, change kappa each time
for i in range(100):
results[i,0] = function1(x)
results[i,1] = function2(y)
How about
a[np.argmin(a[:, 1]), 0]
Break-down
a. Grab the second column
>>> a[:, 1]
array([ 4, 1, 10, 8])
b. Get the index of the minimum element in the second column
>>> np.argmin(a[:, 1])
1
c. Index a with that to get the corresponding row
>>> a[np.argmin(a[:, 1])]
array([2, 1])
d. And take the first element
>>> a[np.argmin(a[:, 1]), 0]
2
Using np.argmin is probably the best way to tackle this. To do it in pure python, you could use:
min(tuple(r[::-1]) for r in a)[::-1]

Categories

Resources