First of all I would like to point out that my question is different from this one: Sort a numpy matrix based on its diagonal
The question is as follows:
Suppose I have a numpy matrix
A=
5 7 8
7 2 9
8 9 3
I would like to sort the matrix based on its diagonal and then re-arrange the matrix elements accordingly, so that now
sorted_A:
2 9 7
9 3 8
7 8 5
Note that:
(1). The diagonal is sorted
(2). The other (non-diagonal) elements are re-arranged accordingly. How?
Because diag(A) = [5,2,3] and diag(sorted_A) = [2,3,5],
the row/column indices [0,1,2] of A become [1,2,0] in sorted_A.
So far I use brute force: I extract the diagonal elements, get the sorted indices (O(N²)), and then re-arrange the matrix (another O(N²)). I wonder if there is a more efficient/elegant way to do this. I appreciate all the help I can get.
Sorting the rows based on the diagonal values is easy:
In [192]: A=np.array([[5,7,8],[7,2,9],[8,9,3]])
In [193]: A
Out[193]:
array([[5, 7, 8],
[7, 2, 9],
[8, 9, 3]])
In [194]: np.diag(A)
Out[194]: array([5, 2, 3])
In [195]: idx=np.argsort(np.diag(A))
In [196]: idx
Out[196]: array([1, 2, 0], dtype=int32)
In [197]: A[idx,:]
Out[197]:
array([[7, 2, 9],
[8, 9, 3],
[5, 7, 8]])
Rearranging the elements in each row so that the original diagonal values are back on the diagonal will take some experimenting - trial and error. We probably have to 'roll' each row by some amount related to the sorting idx. I don't recall whether there is a function to roll each row separately, or whether we have to iterate over the rows to do that.
In [218]: A1=A[idx,:]
In [219]: [np.roll(a,-i) for a,i in zip(A1,[1,1,1])]
Out[219]: [array([2, 9, 7]), array([9, 3, 8]), array([7, 8, 5])]
In [220]: np.array([np.roll(a,-i) for a,i in zip(A1,[1,1,1])])
Out[220]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
So roll with [1,1,1] does the job. But off hand I don't see how that can be derived. I suspect we need to generate several more test cases, possibly larger ones, and look for a pattern.
That roll probably has something to do with how much the row has moved, the difference between the original position and the new one. Let's try:
np.arange(3)-idx
In [222]: np.array([np.roll(a,i) for a,i in zip(A1,np.arange(3)-idx)])
Out[222]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
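If we want to avoid the Python-level loop, the per-row roll can, I believe, be written as a single fancy-indexing step; a minimal sketch reusing A1, idx and the shifts from above:
n = A.shape[0]
shifts = np.arange(n) - idx                     # the same per-row roll amounts as above
# np.roll(A1[i], shifts[i]) takes element (j - shifts[i]) % n for output column j
cols = (np.arange(n) - shifts[:, None]) % n
A1[np.arange(n)[:, None], cols]
# array([[2, 9, 7],
#        [9, 3, 8],
#        [7, 8, 5]])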
Applying the sorting idx to both rows and columns seems to do the trick as well:
In [227]: A[idx,:][:,idx]
Out[227]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
In [229]: A[idx[:,None],idx]
Out[229]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
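For completeness, np.ix_ builds the same open-mesh index, so this should be an equivalent spelling:
A[np.ix_(idx, idx)]        # same open mesh as idx[:, None], idx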
Here I simplify a straightforward solution that has been stated before but can be hard to get your head around.
This is useful if you want to sort a table (e.g. a confusion matrix) by its diagonal magnitude and arrange rows and columns accordingly.
>>> A=np.array([[5,1,4],[7,2,9],[8,0,3]])
>>> A
array([[5, 1, 4],
[7, 2, 9],
[8, 0, 3]])
>>> diag = np.diag(A)
>>> diag
array([5, 2, 3])
>>> idx=np.argsort(diag) # get the sort order of the items on the diagonal
>>> A[idx,:][:,idx] # reorder rows and columns based on the order of the items on the diagonal
array([[2, 9, 7],
[0, 3, 8],
[1, 4, 5]])
If you want to sort in descending order, just add idx = idx[::-1] to reverse the order.
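A hedged sketch of the descending case, with the same A as above:
idx = np.argsort(np.diag(A))[::-1]   # descending order of the diagonal
A[idx,:][:,idx]
# array([[5, 4, 1],
#        [8, 3, 0],
#        [7, 9, 2]])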
Given the following matrix,
In [0]: a = np.array([[1,2,9,4,2,5],[4,5,1,4,2,4],[2,3,6,7,8,9],[5,6,7,4,3,6]])
Out[0]:
array([[1, 2, 9, 4, 2, 5],
[4, 5, 1, 4, 2, 4],
[2, 3, 6, 7, 8, 9],
[5, 6, 7, 4, 3, 6]])
I want to get the indices of the rows that have 9 as a member, that is,
idx = [0,2]
Currently I am doing this,
def myf(x):
    if any(x==9):
        return True
    else:
        return False
aux = np.apply_along_axis(myf, axis=1, arr=a)
idx = np.where(aux)[0]
And I get the result I wanted.
In [1]: idx
Out[1]: array([0, 2], dtype=int64)
But this method is very slow (so there is probably a faster way), and it is certainly not very pythonic.
How can I do this in a cleaner, more pythonic but mainly more efficient way?
Note that this question is close to this one, but here I want to apply the condition to the entire row.
You could combine np.argwhere and np.any:
np.argwhere(np.any(a==9,axis=1))[:,0]
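np.flatnonzero on the same row-wise boolean mask gives the row indices directly as well; a minimal equivalent:
np.flatnonzero(np.any(a == 9, axis=1))   # array([0, 2])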
Use np.argwhere to find the indices where a==9 and use the 0th column of those indices to index a:
In [171]: a = np.array([[1,2,9,4,2,5],[4,5,1,4,2,4],[2,3,6,7,8,9],[5,6,7,4,3,6]])
...:
...: indices = np.argwhere(a==9)
...: a[indices[:,0]]
Out[171]:
array([[1, 2, 9, 4, 2, 5],
[2, 3, 6, 7, 8, 9]])
...or, if you just need the row numbers, save indices[:,0]. If 9 can appear more than once per row and you don't want duplicate rows listed, you can use np.unique to filter your result (it does nothing for this example):
In [173]: rows = indices[:,0]
In [174]: np.unique(rows)
Out[174]: array([0, 2])
You may try np.nonzero and np.unique.
Check on 9
np.unique((a == 9).nonzero()[0])
Out[356]: array([0, 2], dtype=int64)
Check on 6
np.unique((a == 6).nonzero()[0])
Out[358]: array([2, 3], dtype=int64)
Check on 8
np.unique((a == 8).nonzero()[0])
Out[359]: array([2], dtype=int64)
For a non-existent number, an empty array is returned
np.unique((a == 88).nonzero()[0])
Out[360]: array([], dtype=int64)
I have a function that returns a numpy array. I call this function in a loop over different data files, and each iteration gives a different-sized array (which is the desired output), but I cannot figure out how to properly append these arrays. Example arrays, and the method I use for arranging them after I grab the data from a file, are shown below:
a1 = np.array([1,2,3])
a2 = np.vstack(a1)
# array([[1],
#        [2],
#        [3]])
b1 = np.array([4,5,6,7])
b2 = np.vstack(b1)
# array([[4],
#        [5],
#        [6],
#        [7]])
Simply put, I have these two arrays, one with 3 elements and one with 4. I want to arrange them vertically, side by side, to look something like this for export:
1  4
2  5
3  6
   7
I do not want zeros or Na to fill the gaps in the data as that would make more work.
This needs to work for vertical arrays with a column width of 2, so that the output data is organized like this:
1 2   5 6   10 11
2 3   6 7   11 12
3 4   7 8   12 13
      8 9
So the first loop iteration would produce the vertical (3,2) array, the second iteration would produce the (4,2) array, which I would want to append or concatenate to the original (3,2) array, and so on. These arrays will always be 2 columns wide, but their lengths will change from one set to the next.
I have tried the basic np.column_stack, np.concatenate, and np.append functions, but they haven't worked. These can be lists instead of numpy arrays if that works better, or even organizing the output data in a dataframe would be fine.
======= Update =======
To be more specific and after trying some of the solutions provided here are some more details on my issue.
My function gets data from a data file (this works fine) and returns 2 lists or arrays (whichever) of values that have the same dimensions (no issue here either).
Now I am looping over all of the files in a directory, and I want to append/concatenate the two lists (or arrays) from each file, but they can be different sizes. The trouble arises when I try to put them together vertically to yield columns of output data. I also need to do a simple mathematical operation on the values within the loop, so I think they might need to be numpy arrays (or something similar) rather than lists.
Loop #1 returns:
outdata1 = [0.0012, 0.0013, 0.00124, 0.00127]
outdata2 = [0.0016, 0.0014, 0.00134, 0.0013]
Loop #2 returns:
outdata1 = [0.00155, 0.00174, 0.0018]
outdata2 = [0.0019, 0.0020, 0.0021]
and so on...
Now I need to do math on these and write them out as vertically organized column data without cutting off any values. Filling the gaps with Na, or using a data frame, would work, and I could correct those spaces before export. I would like it to look like this:
0.0012 0.0016 0.00155 0.0019
0.0013 0.0014 0.00174 0.0020
0.00124 0.00134 0.0018 0.0021
0.00127 0.0013
First, vstack on an array treats the array as a list on the first dimension. It then makes each 'row/element' into a 2d array, and concatenates them.
These all do the same thing:
In [94]: np.vstack(np.array([1,2,3]))
Out[94]:
array([[1],
[2],
[3]])
In [95]: np.vstack([[1],[2],[3]])
Out[95]:
array([[1],
[2],
[3]])
In [96]: np.concatenate(([[1]],[[2]],[[3]]), axis=0)
Out[96]:
array([[1],
[2],
[3]])
Matching arrays or lists can be column_stack'ed - the arrays are turned into (n,1) arrays, and then joined on the 2nd dimension:
In [97]: np.column_stack(([1,2,3], [4,5,6]))
Out[97]:
array([[1, 4],
[2, 5],
[3, 6]])
But ragged arrays don't work that way.
An array of lists/arrays of differing size has object dtype and is, for many purposes, like a list of lists:
In [98]: np.array(([1,2,3],[4,5,6,7]))
Out[98]: array([list([1, 2, 3]), list([4, 5, 6, 7])], dtype=object)
Your last structure could be written as a ragged list of lists:
In [100]: [[1,2,5,6,10,11],[2,3,6,7,11,12],[3,4,7,8,12,13],[8,9]]
Out[100]: [[1, 2, 5, 6, 10, 11], [2, 3, 6, 7, 11, 12], [3, 4, 7, 8, 12, 13], [8, 9]]
In [101]: np.array(_)
Out[101]:
array([list([1, 2, 5, 6, 10, 11]), list([2, 3, 6, 7, 11, 12]),
list([3, 4, 7, 8, 12, 13]), list([8, 9])], dtype=object)
Notice, though, that this doesn't line up the [8,9] with the others. You need some sort of filler/spacer. The Python itertools function zip_longest provides that:
In [102]: from itertools import zip_longest
In [103]: alist = [[1,2,3],[2,3,4],[5,6,7,8],[11,12,13]]
In [104]: list(zip_longest(*alist))
Out[104]: [(1, 2, 5, 11), (2, 3, 6, 12), (3, 4, 7, 13), (None, None, 8, None)]
With this padding we can make a 2d array (object dtype because of the None):
In [105]: np.array(_)
Out[105]:
array([[1, 2, 5, 11],
[2, 3, 6, 12],
[3, 4, 7, 13],
[None, None, 8, None]], dtype=object)
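If a plain numeric array is preferable to object dtype, zip_longest also accepts a fillvalue; a sketch assuming NaN padding is acceptable:
np.array(list(zip_longest(*alist, fillvalue=np.nan)), dtype=float)
# array([[ 1.,  2.,  5., 11.],
#        [ 2.,  3.,  6., 12.],
#        [ 3.,  4.,  7., 13.],
#        [nan, nan,  8., nan]])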
===
I can generate the numbers in your last display with a little function:
In [232]: def foo(i,n):
     ...:     return np.column_stack((np.arange(i,i+n), np.arange(i+1,i+1+n)))
     ...:
In [233]: foo(1,3)
Out[233]:
array([[1, 2],
[2, 3],
[3, 4]])
In [234]: foo(5,4)
Out[234]:
array([[5, 6],
[6, 7],
[7, 8],
[8, 9]])
In [235]: foo(10,3)
Out[235]:
array([[10, 11],
[11, 12],
[12, 13]])
I can put all those arrays in a list:
In [236]: [Out[233], Out[234], Out[235]]
Out[236]:
[array([[1, 2],
[2, 3],
[3, 4]]), array([[5, 6],
[6, 7],
[7, 8],
[8, 9]]), array([[10, 11],
[11, 12],
[12, 13]])]
I can turn that list into an object dtype array:
In [237]: np.array([Out[233], Out[234], Out[235]])
Out[237]:
array([array([[1, 2],
[2, 3],
[3, 4]]),
array([[5, 6],
[6, 7],
[7, 8],
[8, 9]]),
array([[10, 11],
[11, 12],
[12, 13]])], dtype=object)
I could also display several rows of these arrays with:
In [238]: for i in range(3):
     ...:     print(np.hstack([a[i,:] for a in Out[236]]))
     ...:
[ 1 2 5 6 10 11]
[ 2 3 6 7 11 12]
[ 3 4 7 8 12 13]
but to show the 4th row, which only exists for the middle array, I'd have to add more code to test whether we're off the end, and whether to add padding etc. I'll leave that exercise up to you, if it really matters. :)
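If it does matter, here is one minimal sketch of that padding step, reusing foo from above and filling missing rows with NaN (assuming NaN placeholders are acceptable):
from itertools import zip_longest

blocks = [foo(1, 3), foo(5, 4), foo(10, 3)]     # the three (n, 2) arrays from above
filler = np.full(2, np.nan)                     # stand-in for a missing row

rows = [np.hstack([r if r is not None else filler for r in tup])
        for tup in zip_longest(*blocks)]
np.vstack(rows)                                 # (4, 6) float array, NaN where a block ran out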
Since you mentioned that lists are ok, why not use a list of such "vertical arrays"?:
my_list = []
while not_done_yet:
    two_col_array = your_func(some_param)  # your_func returns an (x,2) array
    my_list.append(two_col_array)
my_list would now be a list of arrays of shape (x,2), where x could be different for different arrays in the list.
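Since the update also says a dataframe would be fine, here is a hedged pandas sketch (data_files and your_func are hypothetical placeholders for your own loop and function): pd.concat with axis=1 aligns on the row index and pads the shorter columns with NaN, which can then be exported.
import numpy as np
import pandas as pd

frames = []
for i, fname in enumerate(data_files):          # hypothetical list of your data files
    outdata1, outdata2 = your_func(fname)       # hypothetical: returns two equal-length lists
    d1 = np.asarray(outdata1) * 1000.0          # stand-in for whatever math you need to do
    d2 = np.asarray(outdata2) * 1000.0
    frames.append(pd.DataFrame({f'a{i}': d1, f'b{i}': d2}))

table = pd.concat(frames, axis=1)               # ragged columns are padded with NaN
table.to_csv('combined.csv', index=False)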
I have two matrices, A and B.
A=np.matrix([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
B=np.matrix([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
I want to subtract some of B's rows (namely 0, 2 and 3) from A. I tried to use
Index=np.array([0,2,3])
for i in Index:
    A[i,:]=A[i,:]-B[i,:]
but it didn't work, because matrix A should look like
matrix([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[6, 7, 8]])
and I got
matrix([[ 1, 2, 3],
[ 2, 3, 4],
[ 7, 8, 9],
[10, 11, 12]])
What's the correct way to do this operation? It took me a long time to notice this problem (the real problem I'm trying to solve has more variables), and I can't seem to figure it out.
If you do mean subtract, then you should use
A[i,:]=A[i,:]-B[i,:]
instead of
A[i,:]=A[i,:]+B[i,:]
Numpy has element-wise subtraction, so something like:
import numpy as np
A=np.matrix([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
B=np.matrix([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
indices = [0,2,3]
for i in indices:
    A[i,:]=np.subtract(A[i,:], B[i,:])
This will give you the following matrix for A:
[[0, 1, 2],
 [4, 5, 6],
 [4, 5, 6],
 [6, 7, 8]]
Is this what you are after? For better performance you could also just change the particular rows of A:
A[indices]=np.subtract(A[indices],B[indices])
Which will give the same answer.
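A minimal standalone sketch of that vectorized form, using np.array instead of np.matrix (which the numpy docs now discourage); the in-place operator does the same thing:
import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
B = np.array([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
idx = [0, 2, 3]

A[idx] -= B[idx]      # equivalent to A[idx] = np.subtract(A[idx], B[idx])
# A is now:
# [[ 0  1  2]
#  [ 4  5  6]
#  [ 4  5  6]
#  [ 6  7  8]]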
I have a 3-dimensional numpy array. Its dimensions can go up to 128 x 64 x 8192. What I want to do is change the order along the first dimension by interchanging elements pairwise.
The only idea I have so far is to create a list of the indices in the correct order:
order = [1,0,3,2,...,127,126]
data_new = data[order]
I fear that this is not very efficient, but I have no better idea so far.
You could reshape to split the first axis into two axes, such that the latter of those axes has length 2, then flip the array along that axis with [::-1], and finally reshape back to the original shape.
Thus, we would have an implementation like so -
a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Sample run -
In [170]: a = np.random.randint(0,9,(6,3))
In [171]: order = [1,0,3,2,5,4]
In [172]: a[order]
Out[172]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
In [173]: a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Out[173]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
Alternatively, if you are looking to efficiently create that pairwise-flipped index order, we could do something like this -
order = np.arange(data.shape[0]).reshape(-1,2)[:,::-1].ravel()
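A hedged alternative for building that order, assuming the first axis has even length: XOR-ing the index with 1 flips its lowest bit, which swaps each even/odd pair.
order = np.arange(data.shape[0]) ^ 1   # [1, 0, 3, 2, ...]; assumes an even first dimension
data_new = data[order]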
For example, I have an ndarray:
a = np.array([1, 3, 5, 7, 2, 4, 6, 8])
Now I want to split a into two parts, one with all numbers < 5 and the other with all numbers >= 5:
[array([1,3,2,4]), array([5,7,6,8])]
Certainly I can traverse a and create two new arrays, but I want to know whether numpy provides a better way.
Similarly, for a multidimensional array, e.g.
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[2, 4, 7]])
I want to split it according to whether the first column is < 3 or >= 3, with the result:
[array([[1, 2, 3],
[2, 4, 7]]),
array([[4, 5, 6],
[7, 8, 9]])]
Is there a better way than traversing it? Thanks.
import numpy as np

def split(arr, cond):
    return [arr[cond], arr[~cond]]

a = np.array([1,3,5,7,2,4,6,8])
print(split(a, a<5))

a = np.array([[1,2,3],[4,5,6],[7,8,9],[2,4,7]])
print(split(a, a[:,0]<3))
This produces the following output:
[array([1, 3, 2, 4]), array([5, 7, 6, 8])]
[array([[1, 2, 3],
[2, 4, 7]]), array([[4, 5, 6],
[7, 8, 9]])]
This might be a quick solution:
a = np.array([1,3,5,7])
b = a >= 3 # variable with condition
a[b] # to slice the array
len(a[b]) # count the elements in sliced array
1d array
a = numpy.array([2,3,4,...])
a_new = a[(a < 5)] # to get elements less than 5
2d array, based on a column (say the value in column i should be less than 5):
a = numpy.array([[1,2],[5,6],...])
a = a[(a[:,i] < 5)]
If your condition involves multiple columns, you can apply the conditions to those columns and combine them into a single boolean mask, then use that mask to index the array just as above.
Note that whatever I have written inside the parentheses returns a boolean mask that selects the matching rows.
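For example, a minimal sketch of a multi-column split (the columns and thresholds here are made up for illustration):
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [2, 4, 7]])

cond = (a[:, 0] < 5) & (a[:, 2] > 3)   # hypothetical condition on columns 0 and 2
a[cond]     # rows [4, 5, 6] and [2, 4, 7]
a[~cond]    # rows [1, 2, 3] and [7, 8, 9]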