Sorting arrays in Python by a non-integer column - python

For example, I have an array that I want to sort by one column in ascending order. This is easy to do for integers using 'sorted()', 'np.arange()', or 'np.argsort()'.
However, what if my column consists of floats?
What would you recommend?
Edit:
I mean, I have something like:
a = array([[1.7, 2, 3],
[4.5, 5, 6],
[0.1, 0, 1]])
and I want to get this:
array([[0.1, 0, 1],
[1.7, 2, 3],
[4.5, 5, 6]])
So far with argsort() I get the following error:
TypeError: only integer scalar arrays can be converted to a scalar index

You can use standard Python's sorted (or list.sort for in-place sorting), no matter what the sequence contains. Just pass a custom key (or, in Python 2 only, a custom compare function, cmp). For example, to sort a list of lists (a 2-D array) in ascending order by its 4th column:
>>> a=[[1.0,2.0,3.0,4.0], [4.0,3.0,2.0,1.0], [0,0,0,0]]
>>> from operator import itemgetter
>>> sorted(a, key=itemgetter(3))
[[0, 0, 0, 0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]

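The standard way to sort in numpy is np.sort, specifying the axis you want to sort along (the default is axis=-1). Note that axis=0 sorts each column independently, so rows are not kept together; it gives the desired result here only because every column of the example happens to be in the same order: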
>>> np.sort(a, axis=0)
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
Or in place:
>>> a.sort(axis=0)
>>> a
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
To sort just on a specific column you can use argsort(), e.g. column 0:
>>> a[np.argsort(a[:,0])]
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
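As a minimal sketch of the difference between the two approaches (the second array b is made up for illustration and is not from the question): argsort on one column reorders whole rows, while np.sort with axis=0 sorts every column on its own. The kind='stable' argument is optional; it only makes ties keep their original order.

```python
import numpy as np

a = np.array([[1.7, 2, 3],
              [4.5, 5, 6],
              [0.1, 0, 1]])

# Row order that sorts column 0 ascending; floats work the same as integers.
order = np.argsort(a[:, 0], kind='stable')
sorted_rows = a[order]

# Hypothetical array whose columns are NOT all in the same order.
b = np.array([[1.7, 0, 9],
              [4.5, 5, 1],
              [0.1, 2, 6]])

rows_kept = b[np.argsort(b[:, 0])]   # rows stay intact
cols_independent = np.sort(b, axis=0)  # each column sorted separately

print(sorted_rows)
print(rows_kept)
print(cols_independent)
```

Here rows_kept still contains the original rows of b, whereas cols_independent mixes values from different rows.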

Related

Get average column value from list of arrays Python

I'm trying to get the average value of values in column 1 and column 2 of a list of arrays. I am using a dict called clusters with an index of clusterNo where I iterate through clusterNo.
print(kMeans.clusters[clusterNo])
When I print the dictionary it gives me this result:
[array([ 5.1, 3.5]), array([ 4.9, 3. ]), array([ 4.7, 3.2]), array([ 4.6, 3.1]), array([ 5. , 3.6])
etc etc..
I cannot figure out how to slice into columns and then get the average. Bear in mind they are float values so I cannot simply avg() them.
Setup
>>> import numpy as np
>>> lst = [np.array([ 5.1, 3.5]), np.array([ 4.9, 3. ]), np.array([ 4.7, 3.2]), np.array([ 4.6, 3.1]), np.array([ 5. , 3.6])]
Solution
>>> np.mean(lst, axis=0)
array([4.86, 3.28])
However, having lst as an array might be advantageous if you need to do more calculations or array operations on that data.
>>> arr = np.array(lst)
>>> arr
array([[5.1, 3.5],
[4.9, 3. ],
[4.7, 3.2],
[4.6, 3.1],
[5. , 3.6]])
>>> arr.mean(axis=0)
array([4.86, 3.28])

Find numpy array coordinates of neighboring maximum

I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. , 2. , 2.2, 3.5],
[ 2.1, 2.4, 3. , 3.3],
[ 1. , 3. , 3.2, 3. ],
[ 2. , 4.1, 4. , 2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4, 3. , 3.5, 3.5],
[ 3. , 3.2, 3.5, 3.5],
[ 4.1, 4.1, 4.1, 4. ],
[ 4.1, 4.1, 4.1, 4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
[[2,1],[2,2],[0,3],[0,3]],
[[3,1],[3,1],[3,1],[3,2]],
[[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders
highest_neighbor_coordinates = np.array([[(argmax2D(A[i-1:i+2, j-1:j+2])+np.array([i-1, j-1])) for j in range(1, A.shape[1]-1)] for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can pad the array with reflected elements to simulate the max-filter operation, get sliding windows on the padded array with scikit-image's view_as_windows, compute the flattened argmax index within each window, and then offset those indices with ranged values to translate them onto the global scale -
from skimage.util import view_as_windows as viewW
def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None])
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
[2.1, 2.4, 3. , 3.3],
[1. , 3. , 3.2, 3. ],
[2. , 4.1, 4. , 2. ]])
In [202]: rows, cols = window_argmax_global2D(A, size=3)
In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
[2, 2, 0, 0],
[3, 3, 3, 3],
[3, 3, 3, 3]])
In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
[1, 2, 3, 3],
[1, 1, 1, 2],
[1, 1, 1, 2]])
Extending to n-dim
We would use np.ogrid for this extension part :
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
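If scikit-image is not available, the same idea can be sketched with NumPy alone. This is a variant, not the code above: it swaps view_as_windows for numpy.lib.stride_tricks.sliding_window_view (which I am assuming is available, i.e. NumPy 1.20 or later), and the function name window_argmax_coords is made up for this example. It also checks that indexing A with the returned coordinates gives back the window maxima.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_argmax_coords(A, size=3):
    """Per cell, coordinates of the maximum inside its size**N window."""
    hsize = (size - 1) // 2            # assumes an odd window size
    win = (size,) * A.ndim
    A1 = np.pad(A, hsize, mode='reflect')
    # windows has shape A.shape + win; flatten each window and take argmax
    windows = sliding_window_view(A1, win)
    idx = windows.reshape(A.shape + (-1,)).argmax(-1)
    local = np.unravel_index(idx, win)  # per-axis position inside the window
    offsets = np.ogrid[tuple(map(slice, A.shape))]
    # shift to global coordinates; abs() folds reflected negative indices back
    return [np.abs(l + o - hsize) for l, o in zip(local, offsets)]

A = np.array([[1. , 2. , 2.2, 3.5],
              [2.1, 2.4, 3. , 3.3],
              [1. , 3. , 3.2, 3. ],
              [2. , 4.1, 4. , 2. ]])

rows, cols = window_argmax_coords(A, size=3)
window_max = A[rows, cols]  # value found at the returned coordinates
print(rows)
print(cols)
print(window_max)
```

For this A, window_max reproduces the maximum_filter output shown in the question, which is a quick way to sanity-check the coordinates.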

Numpy array of distances to list of (row,col,distance)

I have an nd array that looks as follows:
[[ 0. 1.73205081 6.40312424 7.21110255 2.44948974]
[ 1.73205081 0. 5.09901951 5.91607978 1. ]
[ 6.40312424 5.09901951 0. 1. 4.35889894]
[ 7.21110255 5.91607978 1. 0. 5.09901951]
[ 2.44948974 1. 4.35889894 5.09901951 0. ]]
Each element in this array is a distance and I need to turn this into a list with the row,col,distance as follows:
l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be cool to remove the diagonal elements and also the elements (j,i) as (i,j) are already there. Essentially, is it possible to take just the top triangular matrix of this?
Is this possible to do efficiently (without a lot of loops)? I had created this array with squareform, but couldn't find any docs to do this.
squareform does all this. Read the docs and experiment. It works in both directions. If you give it a matrix it returns the upper triangle values (condensed form). If you give it those values, it returns the matrix.
In [668]: M
Out[668]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
In [669]: spatial.distance.squareform(M)
Out[669]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [670]: v=spatial.distance.squareform(M)
In [671]: v
Out[671]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
In [672]: spatial.distance.squareform(v)
Out[672]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0.1, 0. , 2. , 0.3],
[ 0.5, 2. , 0. , 0.2],
[ 0.2, 0.3, 0.2, 0. ]])
You can also specify a force and checks parameter, but without those it just goes by the shape.
Indices can come from triu_indices
In [677]: np.triu_indices(4,1)
Out[677]:
(array([0, 0, 0, 1, 1, 2], dtype=int32),
array([1, 2, 3, 2, 3, 3], dtype=int32))
In [680]: np.vstack((np.triu_indices(4,1),v)).T
Out[680]:
array([[ 0. , 1. , 0.1],
[ 0. , 2. , 0.5],
[ 0. , 3. , 0.2],
[ 1. , 2. , 2. ],
[ 1. , 3. , 0.3],
[ 2. , 3. , 0.2]])
Just to check, we can fill in a 4x4 matrix with these values
In [686]: A=np.vstack((np.triu_indices(4,1),v)).T
In [687]: MM = np.zeros((4,4))
In [688]: MM[A[:,0].astype(int),A[:,1].astype(int)]=A[:,2]
In [689]: MM
Out[689]:
array([[ 0. , 0.1, 0.5, 0.2],
[ 0. , 0. , 2. , 0.3],
[ 0. , 0. , 0. , 0.2],
[ 0. , 0. , 0. , 0. ]])
Those triu indices can also fetch the values from M:
In [693]: I,J = np.triu_indices(4,1)
In [694]: M[I,J]
Out[694]: array([ 0.1, 0.5, 0.2, 2. , 0.3, 0.2])
squareform uses compiled code in spatial.distance._distance_wrap, so I expect it will be quite fast for large arrays. The only problem is that it returns just the condensed form values, not the indices. But given the shape, the indices can always be calculated. They don't need to be stored with the values.
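To get the list of (row, col, distance) tuples the question asks for, one option is to zip the triu indices with the values they select. A minimal sketch, using the 4x4 M from above and plain indexing instead of squareform so that it only needs NumPy (the name triples is just for illustration):

```python
import numpy as np

M = np.array([[0. , 0.1, 0.5, 0.2],
              [0.1, 0. , 2. , 0.3],
              [0.5, 2. , 0. , 0.2],
              [0.2, 0.3, 0.2, 0. ]])

# k=1 skips the diagonal, so only pairs with row < col are returned
I, J = np.triu_indices(M.shape[0], k=1)

# tolist() converts to plain Python ints and floats before zipping
triples = list(zip(I.tolist(), J.tolist(), M[I, J].tolist()))
print(triples)
```

The only Python-level loop is the zip over the already-selected values, so the selection itself stays vectorized.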
If your input is x, first generate the indices:
i0,i1 = np.indices(x.shape)
Then:
np.concatenate((i1,i0,x)).reshape(3,5,5).T
That gives you the first result--for the entire matrix.
As for taking only the upper triangle, you might consider trying np.triu(), but I'm not sure exactly what result you're looking for. You can probably figure out how to mask the parts you don't want now, though.
You can try this:
print([(x,y, value) for (x,y), value in np.ndenumerate(numpymatrixarray)])
output [(0, 0, 0.0), (0, 1, 1.7320508100000001), (0, 2, 6.4031242400000004), (0, 3, 7.2111025499999997), (0, 4, 2.4494897400000002), (1, 0, 1.7320508100000001), (1, 1, 0.0), (1, 2, 5.0990195099999998), (1, 3, 5.9160797799999996), (1, 4, 1.0), (2, 0, 6.4031242400000004), (2, 1, 5.0990195099999998), (2, 2, 0.0), (2, 3, 1.0), (2, 4, 4.3588989400000004), (3, 0, 7.2111025499999997), (3, 1, 5.9160797799999996), (3, 2, 1.0), (3, 3, 0.0), (3, 4, 5.0990195099999998), (4, 0, 2.4494897400000002), (4, 1, 1.0), (4, 2, 4.3588989400000004), (4, 3, 5.0990195099999998), (4, 4, 0.0)]
Do you really want the top triangular matrix for an [nxm] matrix where n>m? That will give you (nxn-n)/2 elements and lose all the data where m⊖n.
What you probably want is the lower triangular matrix:
def tri_reduce(m):
    n=m.shape
    if n[0]>n[1]:
        i=np.tril_indices(n[0],1,n[1])
    else:
        i=np.triu_indices(n[0],1,n[1])
    return np.vstack((i,m[i])).T
Rebuilding it into a list of tuples would require a loop, though, I believe. list(tri_reduce(m)) would give a list of ndarrays.

numpy: how interpolate between two arrays for various timesteps?

I'm looking for a way to do a simple linear interpolation between two numpy arrays that represent a start and endpoint in time.
The two arrays have the same length:
fst = np.random.randint(1, 6, size=10)  # integers from 1 to 5 (upper bound is exclusive)
>>> array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.random.randint(1, 6, size=10)
>>> array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
Between my start and endpoint there are 3 timesteps. How can I interpolate between fst and snd? I want to be able, taking the first entry of fst and snd as an example, to retrieve the value of each timestep like
np.interp(1, [1,5], [4,1])
np.interp(2, [1,5], [4,1])
...
# that is
np.interp([1,2,3,4,5], [1,5], [4,1])
>>> array([ 4. , 3.25, 2.5 , 1.75, 1. ])
But then not just for the first entry, but over the whole array.
Obviously, this won't do it:
np.interp(1, [1,5], [fst,snd])
Well I know I get there in a loop, e.g.
[np.interp(2, [1,5], [item,snd[idx]]) for idx,item in enumerate(fst)]
>>> [3.25, 3.25, 1.5, 3.25, 1.0, 4.25, 3.5, 2.75, 4.75, 2.25]
but I believe when you are looping over numpy arrays you are doing something fundamentally wrong.
The facilities in scipy.interpolate.interp1d allow this to be done quite easily if you form your samples into a 2D matrix. In your case, you can construct a 2xN array, and construct an interpolation function that operates down the columns:
from scipy.interpolate import interp1d
fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
linfit = interp1d([1,5], np.vstack([fst, snd]), axis=0)
You can then generate an interpolated vector at any time of interest. For example linfit(2) produces:
array([ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25])
or you can invoke linfit() with a vector of time values, e.g. linfit([1,2,3]) gives:
array([[ 4. , 4. , 1. , 3. , 1. , 4. , 3. , 2. , 5. , 2. ],
[ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25],
[ 2.5 , 2.5 , 2. , 3.5 , 1. , 4.5 , 4. , 3.5 , 4.5 , 2.5 ]])
If you're only doing linear interpolation, you could also just do something like:
((5-t)/(5-1)) * fst + ((t-1)/(5-1)) * snd
to directly compute the interpolated vector at any time t.
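The same formula also works for several timesteps at once if t is an array and broadcasting does the rest. A minimal sketch, assuming the start and end times 1 and 5 from the question (the names t0, t1 and w are mine):

```python
import numpy as np

fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])

t0, t1 = 1, 5                      # times of the start and end arrays
t = np.array([1, 2, 3, 4, 5])      # timesteps to evaluate

# weight of snd at each timestep, as a column so it broadcasts over entries
w = ((t - t0) / (t1 - t0))[:, None]

# one row per timestep, one column per array entry: shape (5, 10)
interp = (1 - w) * fst + w * snd
print(interp)
```

Row 0 equals fst, the last row equals snd, and row 1 matches the linfit(2) result above.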

Numpy: placing values into an 1-of-n array based on indices in another array

Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])) Our output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
a = zeros((3,3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, a is a 5x4x3 array and b and vals are 5x4 arrays.
Then how do we modify the statement a[r_[:len(b)],b] = vals?
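One possible way to handle the 5x4x3 case, sketched here with random example data (the arrays below are made up; only the shapes come from the question): build broadcastable index arrays for the leading axes with np.ogrid and use b as the index for the last axis. np.put_along_axis should give the same result and does not depend on the number of leading dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.integers(0, 3, size=(5, 4))   # index into the last axis, per (i, j)
vals = rng.random((5, 4))             # value to place at a[i, j, b[i, j]]

# Option 1: explicit index grids for the first two axes
a = np.zeros((5, 4, 3))
i, j = np.ogrid[:5, :4]               # shapes (5, 1) and (1, 4)
a[i, j, b] = vals

# Option 2: put_along_axis, which needs b and vals to have a's ndim
a2 = np.zeros((5, 4, 3))
np.put_along_axis(a2, b[..., None], vals[..., None], axis=-1)

print(np.array_equal(a, a2))
```

Both options leave exactly one filled slot along the last axis for each (i, j) pair.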
