Say I have a numpy 2D array:
>>> t
array([[-0.00880717,  0.02522217, -0.01014062],
       [-0.00866732,  0.01737254,  0.05396272]])
Now using array slicing, I can quickly obtain all items in all rows starting from the column with index 1 and sum them up:
>>> t[:, 1:].sum()
0.08641680780899146
To verify manually, here is what happens:
>>> 0.02522217+0.01737254+-0.01014062+0.05396272
0.08641680999999998
Just to understand the numpy array operations better, is numpy first going over all rows and summing the items of the rows, or is it going down one column, and then down the next one?
Thanks for asking your question, @TMOTTM!
The way NumPy's sum semantics work is documented in the NumPy manual.
To summarize the manual while injecting my own understanding:
arr.sum() called without an axis argument simply sums up everything in the array. It is the most straightforward semantic operation to implement.
arr.sum(axis=0) will collapse axis 0 (the first axis) while summing.
arr.sum(axis=k) will collapse axis k while performing a summation.
Canonically, axis 0 is recognized as the row-wise axis, axis 1 as the column-wise axis, and axis 2 as the depth-wise axis; anything higher-dimensional goes into hypercubes that are not easily described in words.
Made concrete:
In a 2D array, to collapse the rows (i.e. sum column-wise), do arr.sum(axis=0).
In a 2D array, to collapse the columns (i.e. sum row-wise), do arr.sum(axis=1).
At the end of the day, the general axis=k rule is the one you want to remember: reason carefully about which axis you wish to collapse, and you'll never go wrong!
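To see the difference on the array from the question, here is a quick sketch (outputs omitted, since the printed values above are rounded and the exact floats differ in the last digits):
>>> t[:, 1:].sum()         # one scalar: everything from column 1 onwards
>>> t[:, 1:].sum(axis=0)   # collapse the rows: one sum per column
>>> t[:, 1:].sum(axis=1)   # collapse the columns: one sum per row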
Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing to get a subarray consisting of the last two rows and the 3rd/4th columns, and I'd also like to sum each column. My current code (above) produces an error. What am I doing wrong? (I cannot post a screenshot of the error because I need at least 10 reputation points.)
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because the brackets don't make a slice: they construct lists, so you have to specify explicitly every row number and every column number you want in the result. If there are too many consecutive indices to type out, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
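For a quick end-to-end check with the array from the question (using sub instead of slice, so Python's built-in slice isn't shadowed):
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

sub = ray[-2:, 3:5]   # last two rows, columns 3 and 4
sub.sum(0)            # column sums: array([162, 352])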
I have a numpy array called prediction, as follows:
array([[3.7839172e-06, 8.0308418e-09, 2.2542761e-06, 5.9392878e-08,
        5.3137046e-07, 1.7033290e-05, 1.7738441e-07, 1.0742254e-03,
        1.8656212e-06, 9.9890006e-01]], dtype=float32)
In order to get the index of the maximum value in this array, I used the following
np.where(prediction==prediction.max())
But the result I am getting also shows index 0:
(array([0], dtype=int64), array([9], dtype=int64))
Does anyone know why is it showing index 0 also?
Also, how can I get just the index number instead of array([9], dtype=int64)?
Use built-in function for it:
prediction.argmax()
output:
9
Also, that index 0 is the row number, so the max is at row 0 and column 9.
The predictions array here is two dimensional. When you call np.where with only a condition, this is the same as calling np.asarray(condition).nonzero() which returns you the indices of the non-zero elements of prediction==prediction.max() which is a boolean array with the only non-zero element at (0,9).
What you are looking for is the argmax function which will give you the index of the maximum value along an axis. You effectively only have one axis (2d but only one row) here so this should be fine.
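A small illustration of that equivalence (the dtype shown in the output depends on your platform):
>>> np.asarray(prediction == prediction.max()).nonzero()
(array([0], dtype=int64), array([9], dtype=int64))
>>> prediction.argmax()
9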
As the other answers mentioned, you have a 2D array, so you end up with two indices. Since the array is just a row, the first index is always zero. You can bypass this in a number of ways:
Use prediction.argmax(). The default axis argument is None, which means operate on a flattened array. Other options that will get you the same result are prediction.argmax(-1) (last axis) and prediction.argmax(1) (second axis). Keep in mind that you will only ever get the index of the first maximum this way. That's fine if you only ever expect to have one, or only need one.
Use np.flatnonzero to get the linear indices similarly to the way you were doing:
np.flatnonzero(prediction == prediction.max())
Use np.nonzero or np.where, but extract the axis you care about:
np.nonzero(prediction == prediction.max())[1]
ravel the array on input:
np.where(prediction.ravel() == prediction.max())
Do the same thing, but with np.squeeze:
np.nonzero(prediction.squeeze() == prediction.max())
In Matlab, nonzero returns the indexes ordered by column. In NumPy, it seems the returned indexes are ordered by row (for a 2D matrix), but this is not spelled out in the documentation.
So, is it safe to assume that?
An example:
test = np.array([[0,2], [3,0]])
test[test.nonzero()]
gives array([2, 3]) instead of array([3, 2]).
There is the following comment on the C source code of PyArray_NonZero, the C function that handles all the calls to nonzero:
/*NUMPY_API
* Nonzero
*
* TODO: In NumPy 2.0, should make the iteration order a parameter.
*/
NPY_NO_EXPORT PyObject *
PyArray_Nonzero(PyArrayObject *self)
The iteration order is currently hardcoded to be C-order, i.e. the last index varies the fastest, so for the 2D case the results come out sorted by row and then by column. Given that comment, it is very safe to assume that if this ever changes, it will be by adding new functionality that defaults to the current behavior.
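You can see that C-order in the raw indices from the example above (row indices first, varying slowest; exact dtype display may vary):
>>> test.nonzero()
(array([0, 1]), array([1, 0]))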
Yes, NumPy arrays are indexed by row first, then by column. If you want to work in a more Matlab-like way, you can work with a transposed array, e.g.
test.T[test.T.nonzero()]
The T property gives a transposed view of your array. So rows become columns and columns become rows. And because the array is a view, rather than a copy, it's also a very cheap operation.
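Applied to the example above, the transposed view gives the Matlab-like, column-ordered result:
>>> test.T[test.T.nonzero()]
array([3, 2])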
I have a largish 2d numpy array, and I want to extract the lowest 10 elements of each row as well as their indexes. Since my array is largish, I would prefer not to sort the whole array.
I heard about the argpartition() function, with which I can get the indexes of the lowest 10 elements:
top10indexes = np.argpartition(myBigArray,10)[:,:10]
Note that argpartition() partitions along axis -1 by default, which is what I want. argpartition() itself returns an array with the same shape as myBigArray, containing indexes into the respective rows such that the first 10 indexes in each row point to the 10 lowest values; the [:, :10] slice keeps just those.
How can I now extract the elements of myBigArray corresponding to those indexes?
Obvious fancy indexing like myBigArray[top10indexes] or myBigArray[:, top10indexes] does something quite different. I could also use a list comprehension, something like:
array([row[idxs] for row,idxs in zip(myBigArray,top10indexes)])
but that would incur a performance hit iterating numpy rows and converting the result back to an array.
nb: I could just use np.partition() to get the values, and they may even correspond to the indexes (or may not..), but I don't want to do the partition twice if I can avoid it.
You can avoid using the flattened copies and the need to extract all the values by doing:
num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]
myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
For NumPy >= 1.9.0 this will be very efficient and comparable to np.take().
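A minimal sketch of that pattern on a small random array (variable names are just illustrative):
import numpy as np

myBigArray = np.random.rand(6, 20)
num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]   # indices of the 10 lowest per row
rows = np.arange(myBigArray.shape[0])[:, None]            # column vector of row numbers, shape (6, 1)
lowest = myBigArray[rows, top]                            # shape (6, 10): the 10 lowest values per row (unordered)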
I am having issues figuring out how to do this operation.
I have a variable index, a 1xN sparse binary array, and a 2-D array samples of shape NxM. I want to use index to select specific rows of samples and get a 2-D array back.
I have tried stuff like:
idx = index.todense() == 1
samples[idx.T,:]
but nothing.
So far I have made it work doing this:
idx = test_x.todense() == 1
selected_samples = samples[np.array(idx.flat)]
But there should be a cleaner way.
To give an idea using a fraction of the data:
print(idx.shape)      # (1, 22360)
print(samples.shape)  # (22360, 200)
The short answer:
selected_samples = samples[index.nonzero()[1]]
The long answer:
The first problem is that your index matrix is 1xN while your sample ndarray is NxM. (See the mismatch?) This is why you needed to call .flat.
That's not a big deal, though, because we just need the nonzero entries in the sparse vector. Get those with index.nonzero(), which returns a tuple of (row indices, column indices). We only care about the column indices, so we use index.nonzero()[1] to get those by themselves.
Then, simply index with the array of nonzero column indices and you're done.
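A minimal, self-contained sketch of that (made-up toy data; index is a 1xN scipy.sparse matrix and samples is NxM):
import numpy as np
from scipy import sparse

samples = np.arange(20).reshape(5, 4)          # N=5 rows, M=4 columns
index = sparse.csr_matrix([[0, 1, 0, 1, 1]])   # 1x5 binary row selector

rows = index.nonzero()[1]                      # column indices of the nonzero flags: array([1, 3, 4])
selected_samples = samples[rows]               # the corresponding rows of samples, shape (3, 4)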