I am using a template script to learn data analysis with numpy, and I don't understand this syntax. There are two arrays, dist_data and dataArray, and l is a loop variable (as in for l in range(0,k):). I don't understand the expression below, specifically the purpose of the comma in the second pair of brackets, [l, self.dataArray.shape[1]-1], because I assumed that it represented a column of dist_data.
dist_data[dist_data[:,-1].argsort()][l, self.dataArray.shape[1]-1]
dist_data[:,-1] is the last column of the 2d array dist_data. argsort() on it returns the indices that would sort that column.
So dist_data[dist_data[:,-1].argsort()] is dist_data sorted on the last column.
[l, self.dataArray.shape[1]-1] is just indexing on a 2d array: the comma separates the row index (l) from the column index (self.dataArray.shape[1]-1). That column index is the index of the last column of self.dataArray.
So in sum: sort dist_data on the last column, then pick the l'th row and that column.
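To make the two indexing steps concrete, here is a minimal sketch. The contents of dist_data and the column index col are made up for illustration; in the original script the column index would be self.dataArray.shape[1]-1.

```python
import numpy as np

# Hypothetical stand-in for dist_data; the last column plays the role of the sort key.
dist_data = np.array([
    [1.0, 10.0, 0.9],
    [2.0, 20.0, 0.1],
    [3.0, 30.0, 0.5],
])

order = dist_data[:, -1].argsort()  # row indices that sort the last column
sorted_data = dist_data[order]      # rows reordered by ascending last column

l = 0    # row of the sorted array to pick
col = 1  # assumed value standing in for self.dataArray.shape[1] - 1
value = sorted_data[l, col]         # element at row l, column col

print(order)  # [1 2 0]
print(value)  # 20.0
```

Writing sorted_data[l, col] is the same as sorted_data[l][col]; the comma form does the row and column lookup in one indexing operation.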
I have read in a numpy.ndarray that looks like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]
[5.8600e-01 4.5070e+00 8.7480e+02]]
Let's assume that this array I am reading will not always have a length of 2. (e.g. It could have a length of 1,3, 456, etc.)
I would like to separate this to two separate arrays that look like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]]
[[5.8600e-01 4.5070e+00 8.7480e+02]]
I previously tried searching for a solution to this problem, but this is not the solution I am looking for: python convert 2d array to 1d array
Since you want to extract the rows, you can just index them. So suppose your array is stored in the variable x. x[0] will give you the first row: [1.4600e-01 2.9575e+00 6.1580e+02], while x[1] will give you the second row: [5.8600e-01 4.5070e+00 8.7480e+02], etc.
You can also iterate over the rows doing something like:
for row in x:
    # Do stuff with the row
    pass
If you really want to preserve the outer dimension, you can reshape each row using x[0].reshape((1,-1)), which says to set the first dimension to 1 (meaning it has 1 row) and infer the second dimension from the existing data.
Alternatively if you want to split some number of rows into n groups, you can use the numpy.vsplit() function: https://numpy.org/doc/stable/reference/generated/numpy.vsplit.html#numpy.vsplit
However, I would suggest looping over the rows instead of splitting them up unless you really need to split them up.
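As a rough sketch of the two options mentioned above, using the array from the question (the names rows and parts are just illustrative):

```python
import numpy as np

x = np.array([[1.4600e-01, 2.9575e+00, 6.1580e+02],
              [5.8600e-01, 4.5070e+00, 8.7480e+02]])

# Option 1: reshape each row so it keeps the outer dimension, giving shape (1, 3).
rows = [row.reshape((1, -1)) for row in x]

# Option 2: let vsplit cut the array into one piece per row.
parts = np.vsplit(x, x.shape[0])

print(rows[0].shape)  # (1, 3)
print(len(parts))     # 2
```

Both approaches work for any number of rows, since the row count is read from x.shape[0] rather than hard-coded.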
Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing and get a subarray consisting of the last two rows and the 3rd/4th columns. My current code produces an error:
I'd also like to sum each column. What am I doing wrong? I cannot post a picture because I need at least 10 reputation points.
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because the [] here don't make a slice; they construct lists, so you must explicitly specify every row index and every column index you want in the result. If there are many consecutive indices, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
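Putting the pieces together, a small sketch with the array from the question. The name sub is used here instead of slice only to avoid shadowing Python's built-in slice.

```python
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

# Plain slicing: last two rows, columns 3 and 4.
sub = ray[-2:, 3:5]

# The same selection with np.ix_, listing every index explicitly.
sub_ix = ray[np.ix_([-2, -1], [3, 4])]

# Sum down the rows (axis 0) to get one total per column.
col_sums = sub.sum(0)

print(sub.tolist())       # [[143, 151], [19, 201]]
print(col_sums.tolist())  # [162, 352]
```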
I understand that
np.argmax(np.max(x, axis=1))
returns the index of the row that contains the maximum value and
np.argmax(np.max(x, axis=0))
returns the index of the column that contains the maximum value.
But what if the matrix contained strings? How can I change the code so that it still finds the index of the largest value?
Also (if there's no way to do what I previously asked for), can I change the code so that the operation is only carried out on a sub-section of the matrix, for instance, on the bottom right '2x2' sub-matrix in this example:
array = [['D','F','J'],
['K',3,4],
['B',3,1]]
[[3,4],
[3,1]]
Can you try first converting the column to type dtype? If you take the min/max of a dtype column, it should use string values for the minimum/maximum.
Although not efficient, this could be one way to find index of the maximum number in the original matrix by using slices:
newmax = 0
newmaxrow = 0
newmaxcolumn = 0
# Slice off row 0 and column 0 (the string labels); start=1 keeps the original indices.
for i, row in enumerate(array[1:], start=1):
    for j, num in enumerate(row[1:], start=1):
        if num > newmax:
            newmax = num
            newmaxrow = i
            newmaxcolumn = j
Note: this method would not work if the largest number lies within row 0 or column 0.
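For comparison, a NumPy-based sketch of the sub-matrix case. This is a different technique from the loop above, not the answerer's method: it copies the numeric bottom-right block into a float array and uses np.argmax with np.unravel_index. It assumes every entry outside row 0 and column 0 is numeric.

```python
import numpy as np

array = [['D', 'F', 'J'],
         ['K', 3, 4],
         ['B', 3, 1]]

# Drop row 0 and column 0 (the labels) and build a numeric array from the rest.
sub = np.array([row[1:] for row in array[1:]], dtype=float)

# argmax gives a flat index; unravel_index converts it to (row, column) within sub.
r, c = np.unravel_index(np.argmax(sub), sub.shape)

# Add 1 to each index to map back to positions in the original nested list.
row_idx, col_idx = int(r) + 1, int(c) + 1

print(row_idx, col_idx)         # 1 2
print(array[row_idx][col_idx])  # 4
```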
I have an numpy array 'A' of size 5000x10. I also have another number 'Num'. I want to apply the following to each row of A:
import numpy as np
np.max(np.where(Num > A[0,:]))
Is there a more Pythonic way to do this than writing a for loop?
You could use argmax -
A.shape[1] - 1 - (Num > A)[:,::-1].argmax(1)
Alternatively with cumsum and argmax -
(Num > A).cumsum(1).argmax(1)
Explanation : With np.max(np.where(..), we are basically looking to get the last occurrence of matches along each row on the comparison.
For the same, we can use argmax. But argmax on a boolean array gives us the first occurrence and not the last one. So, one trick is to perform the comparison, flip the columns with [:,::-1], and then use argmax. The resulting column indices are then subtracted from the number of columns minus one (A.shape[1] - 1) to trace them back to the original order.
On the second approach, it's very similar to a related post and therefore quoting from it :
One of the uses of argmax is to get the ID of the first occurrence of the max element along an axis in an array. So, we get the cumsum along the rows and get the first max ID, which represents the last non-zero elem. This is because cumsum on the leftover elements won't increase the sum value after that last non-zero element.
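A small check that both vectorized expressions agree with the per-row loop. The array A and the value Num below are made-up test data, chosen so that every row has at least one element smaller than Num.

```python
import numpy as np

A = np.array([[1, 5, 2, 9],
              [0, 1, 8, 3],
              [7, 2, 2, 2]])
Num = 4

# Reference: the original per-row expression, applied in a loop.
loop = np.array([np.max(np.where(Num > row)) for row in A])

# Approach 1: flip the columns, take the first match, map back to the original order.
flipped = A.shape[1] - 1 - (Num > A)[:, ::-1].argmax(1)

# Approach 2: the cumulative sum first reaches its maximum at the last match.
cums = (Num > A).cumsum(1).argmax(1)

print(loop.tolist())     # [2, 3, 3]
print(flipped.tolist())  # [2, 3, 3]
print(cums.tolist())     # [2, 3, 3]
```

One caveat: if a row has no element smaller than Num, the loop version raises an error on the empty result, while the first approach returns the last column index and the second returns 0, so such rows need separate handling.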
I understood that sorting a numpy array arr by column (for only a particular column, for example, its 2nd column) can be done with:
arr[arr[:,1].argsort()]
How I understood this code sample works: argsort sorts the values of the 2nd column of arr, and gives the corresponding indices as an array. This array is given to arr as row numbers. Am I correct in my interpretation?
Now I wonder what if I want to sort the array arr with respect to the 2nd row instead of the 2nd column? Is the simplest way to transpose the array before sorting it and transpose it back after sorting, or is there a way to do it like previously (by giving an array with the number of the columns we wish to display)?
Instead of doing (n,n)array[(n,)array] (n is the size of the 2d array) I tried to do something like (n,n)array[(n,1)array] to indicate the numbers of the columns but it does not work.
EXAMPLE of what I want:
arr = [[11,25],[33,4]] => base array
arr_col2=[[33,4],[11,25]] => array I got with argsort()
arr_row2=[[25,11],[4,33]] => array I tried to get in a simple way with argsort() but did not succeed
I assume that arr is a numpy array? I haven't seen the syntax arr[:,1] in any other context in python. It would be worth mentioning this in your question!
Assuming this is the case, then you should be using
arr.sort(axis=0)
to sort by column and
arr.sort(axis=1)
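to sort by row. (Both sort in-place, i.e. change the value of arr. If you don't want this you can copy arr into another variable first, and apply sort to that. Note that axis=0 sorts the values within each column independently and axis=1 sorts the values within each row independently, so whole rows or columns are not kept together.)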
If you want to sort just a single row (in this case, the second one) then
arr[1,:].sort()
works.
Edit: I now understand what problem you are trying to solve. You would like to reorder the columns in the matrix so that the nth row goes in increasing order. You can do this simply by
arr[:,arr[1,:].argsort()]
(where here we're sorting by the 2nd row).
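Applied to the example from the question, a short sketch of both reorderings (the variable names follow the question):

```python
import numpy as np

arr = np.array([[11, 25],
                [33, 4]])

# Reorder the rows so that the 2nd column is ascending.
arr_col2 = arr[arr[:, 1].argsort()]

# Reorder the columns so that the 2nd row is ascending.
arr_row2 = arr[:, arr[1, :].argsort()]

print(arr_col2.tolist())  # [[33, 4], [11, 25]]
print(arr_row2.tolist())  # [[25, 11], [4, 33]]
```

In both cases argsort returns a 1d index array; placing it before the comma selects rows, and placing it after the comma selects columns, so no transpose is needed.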