My question pertains to array iteration but is a bit more complicated. I have an array with a shape of (4, 50), and what I want to do is find the mean of each row. I will show a simple example of what I mean:
A = np.array([[10,5,3],[12,6,6],[9,8,7],[20,3,4]])
When this code is run, you get an array with a shape of (4, 3). What I want is for the mean of each row to be computed and returned.
The result should be an array of [[6],[8],[8],[9]], with the same number of rows and naturally a single column.
Please explain the code and thought process behind it. Thank you very much.
Use the numpy.mean function. The parameter axis=1 means that the row-wise mean will be calculated, and keepdims=True means that the original array's dimensions are kept.
import numpy as np
A = np.array([[10,5,3],[12,6,6],[9,8,7],[20,3,4]])
B = np.mean(A, axis=1, keepdims=True)
print(B)
# Output:
# [[6.]
#  [8.]
#  [8.]
#  [9.]]
Use np.mean in a list comprehension and collect the results into a new array:
import numpy as np

A = np.array([[10,5,3],[12,6,6],[9,8,7],[20,3,4]])
# Use .reshape() to get 4 rows by 1 column.
new_A = np.array([np.mean(row) for row in A]).reshape(-1, 1)
Output:
array([[6.], [8.], [8.], [9.]])
I have a numpy array called prediction as follows:
array([[3.7839172e-06, 8.0308418e-09, 2.2542761e-06, 5.9392878e-08,
5.3137046e-07, 1.7033290e-05, 1.7738441e-07, 1.0742254e-03,
1.8656212e-06, 9.9890006e-01]], dtype=float32)
In order to get the index of the maximum value in this array, I used the following
np.where(prediction==prediction.max())
But the result I am getting shows index 0 as well:
(array([0], dtype=int64), array([9], dtype=int64))
Does anyone know why it is showing index 0 as well?
Also, how can I get just the index number instead of array([9], dtype=int64)?
Use the built-in argmax method for it:
prediction.argmax()
output:
9
Also, that index 0 is the row number, so the max is at row 0 and column 9.
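A minimal sketch of this, reconstructing the prediction array from the question; np.unravel_index (not used in the answer above) is one way to turn the flat index back into a (row, column) pair:
import numpy as np

prediction = np.array([[3.7839172e-06, 8.0308418e-09, 2.2542761e-06, 5.9392878e-08,
                        5.3137046e-07, 1.7033290e-05, 1.7738441e-07, 1.0742254e-03,
                        1.8656212e-06, 9.9890006e-01]], dtype=np.float32)

flat_index = prediction.argmax()     # index into the flattened array
print(flat_index)                    # 9
row, col = np.unravel_index(flat_index, prediction.shape)
print(row, col)                      # 0 9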
The prediction array here is two-dimensional. When you call np.where with only a condition, this is the same as calling np.asarray(condition).nonzero(), which returns the indices of the non-zero elements of prediction == prediction.max(), a boolean array whose only non-zero element is at (0, 9).
What you are looking for is the argmax function, which will give you the index of the maximum value along an axis. You effectively only have one axis here (the array is 2-D but has only one row), so this should be fine.
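A small sketch of that equivalence, continuing with the prediction array constructed in the sketch above:
cond = prediction == prediction.max()   # boolean array, True only at (0, 9)
print(np.where(cond))                   # (array([0]), array([9])) - possibly with dtype=int64 shown
print(np.asarray(cond).nonzero())       # (array([0]), array([9]))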
As the other answers mentioned, you have a 2D array, so you end up with two indices. Since the array is just a single row, the first index is always zero. You can bypass this in a number of ways (a combined sketch follows this list):
Use prediction.argmax(). The default axis argument is None, which means operate on a flattened array. Other options that will get you the same result are prediction.argmax(-1) (last axis) and prediction.argmax(1) (second axis). Keep in mind that you will only ever get the index of the first maximum this way. That's fine if you only ever expect to have one, or only need one.
Use np.flatnonzero to get the linear indices similarly to the way you were doing:
np.flatnonzero(prediction == prediction.max())
Use np.nonzero or np.where, but extract the axis you care about:
np.nonzero(prediction == prediction.max())[1]
ravel the array on input:
np.where(prediction.ravel() == prediction.max())
Do the same thing, but with np.squeeze:
np.nonzero(prediction.squeeze() == prediction.max())
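A minimal runnable sketch of the options above, again assuming the one-row prediction array constructed earlier:
print(prediction.argmax())                                   # 9
print(np.flatnonzero(prediction == prediction.max()))        # [9]
print(np.nonzero(prediction == prediction.max())[1])         # [9]
print(np.where(prediction.ravel() == prediction.max()))      # (array([9]),)
print(np.nonzero(prediction.squeeze() == prediction.max()))  # (array([9]),)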
An example is shown as follows:
>>> import numpy as np
>>> a=np.zeros((288,512))
>>> x1,x2,y1,y2=0,16,0,16
>>> p=a[x1:x2][y1:y2]
>>> p.shape
(16, 512)
>>> p=a[x1:x2,y1:y2]
>>> p.shape
(16, 16)
I try to query a patch from an array, ranging from columns 0 to 16 and rows 0 to 16. I index the array in two ways and get very different results. a[x1:x2][y1:y2] gives me the wrong result.
Why?
Thanks for helping me!
When you do a[x1:x2][y1:y2], you are slicing by rows twice. That is, a[x1:x2] will give you an array of shape (16, 512). The second slice in a[x1:x2][y1:y2] then slices the result of the first operation by rows again, and because y1:y2 is also 0:16 and that result already has only 16 rows, it gives you back the same (16, 512) array.
In the second case, when you do a[x1:x2,y1:y2], you are slicing by the two dimensions of your 2-dimensional array.
Important note: If you have a 2-dimensional array and you slice like this:
a = np.zeros((10,15))
a[1:3].shape
Output:
(2, 15)
you will slice only by rows. Your resulting array will have 2 rows and the total number of columns (15 columns). If you want to slice by rows and columns, you will have to use a[1:3, 1:3].
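For completeness, a quick check of that two-axis slice on the same array:
a[1:3, 1:3].shape
Output:
(2, 2)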
The two methods of indexing you tried are not equivalent. In the first one (a[x1:x2][y1:y2]), you are essentially indexing the first axis twice. In the second, you are indexing the first and second axes.
a[x1:x2][y1:y2] can be rewritten as
p = a[x1:x2] # result still has two dimensions
p = p[y1:y2]
You are first indexing 0:16 in the first dimension. Then you index 0:16 in the first dimension of the result of the previous operation (which will simply return the same as a[x1:x2] because x1==y1 and x2==y2).
In the second method, you index the first and second dimensions directly. I would not write it this way, but one could write it like this to contrast it with the first method:
a[x1:x2][:, y1:y2]
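A quick sketch, using the array from the question, to confirm that the two forms which slice both axes agree while the double row slice does not:
import numpy as np

a = np.zeros((288, 512))
x1, x2, y1, y2 = 0, 16, 0, 16

print(a[x1:x2][y1:y2].shape)                                  # (16, 512) - rows sliced twice
print(a[x1:x2, y1:y2].shape)                                  # (16, 16)
print(np.array_equal(a[x1:x2, y1:y2], a[x1:x2][:, y1:y2]))    # True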
A question about masking 2-D np.array data.
For example:
A 2-D np.array value with shape 20 x 20.
An index list t = [(1,2),(3,4),(5,7),(12,13)]
How can I mask the 2-D array value at the (y, x) positions given in the index?
Usually, replacing with np.nan is based on a specific value, like y[y==7] = np.nan.
In my example, I want to replace the values at specific locations with np.nan.
For now, I can do it by:
Creating a new array value_mask in the shape of 20 x 20
Looping over value and testing each location with (i,j) == t[k]
If True, value_mask[i,j] = value[i,j]; otherwise, value_mask[i,j] = np.nan
My method is too bulky, especially for huge data (3 levels of loops).
Is there a more efficient method to achieve that? Any advice would be appreciated.
You are nearly there.
You can pass arrays of indices to arrays. You probably know this with 1D-arrays.
With a 2-D array you need to pass the array a tuple of lists: one list per axis, with one element in each list (the lists have to be of equal length) for each array element you want to choose. You have a list of tuples, so you just have to "transpose" it.
t1 = tuple(zip(*t))
gives you the index in the right shape (the tuple(...) is needed in Python 3, where zip returns an iterator), which you can now use as the index for any assignment, for example: value[t1] = np.nan (the array needs a float dtype to hold NaN).
(There are lots of nice explanation of this trick (with zip and *) in python tutorials, if you don't know it yet.)
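A minimal sketch of that approach, using placeholder data in place of the real value array from the question:
import numpy as np

value = np.arange(400, dtype=float).reshape(20, 20)   # float dtype so it can hold NaN
t = [(1, 2), (3, 4), (5, 7), (12, 13)]

t1 = tuple(zip(*t))            # ((1, 3, 5, 12), (2, 4, 7, 13)) - one sequence per axis
value[t1] = np.nan             # set those (row, column) positions to NaN

print(np.argwhere(np.isnan(value)))   # the four (row, column) positions from t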
You can use np.logical_and.
arr = np.zeros((20,20))
You can select by location; this is just an example location.
arr[4:8,4:8] = 1
You can create a mask the same shape as arr
mask = np.ones((20,20)).astype(bool)
Then you can use np.logical_and:
mask = np.logical_and(mask, arr == 1)
And finally, you can replace the 1s with np.nan:
arr[mask] = np.nan
I have an array X of <class 'scipy.sparse.csr.csr_matrix'> format with shape (44, 4095)
I would now like to create a new numpy array, say X_train = np.empty([44, 4095]), and copy the rows over in a different order. Say I want the 5th row of X in the 1st row of X_train.
How do I do this (copy an entire row into a new numpy array), similar to Matlab?
Define the new row order as a list of indices, then define X_train using integer indexing:
row_order = [4, ...]
X_train = X[row_order]
Note that unlike Matlab, Python uses 0-based indexing, so the 5th row has index 4.
Also note that integer indexing (due to its ability to select values in arbitrary order) returns a copy of the original NumPy array.
This works equally well for sparse matrices and NumPy arrays.
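A small sketch, with a hypothetical dense stand-in for X (the same indexing syntax also works on a scipy.sparse.csr_matrix, though the result is then sparse rather than dense):
import numpy as np

X = np.arange(20).reshape(5, 4)    # stand-in for the (44, 4095) matrix
row_order = [4, 0, 1, 2, 3]        # put the 5th row first, for example

X_train = X[row_order]             # integer indexing returns a copy
print(X_train[0])                  # [16 17 18 19] - the original 5th row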
Keep in mind that NumPy slices are generally views on the original data rather than copies. What you need to do is make a copy and then swap. I have written a demo function which swaps rows.
import numpy as np  # import numpy

def swapRows(myArray, rowA, rowB):
    """Swap rowA with rowB in place."""
    temp = myArray[rowA, :].copy()   # create a temporary copy of rowA
    myArray[rowA, :] = myArray[rowB, :]
    myArray[rowB, :] = temp

a = np.arange(30)      # generate demo data
a = a.reshape(6, 5)    # reshape the data into a 6x5 matrix
print(a)               # print the matrix before the swap
swapRows(a, 0, 1)      # swap the rows
print(a)               # print the matrix after the swap
To answer your question, one solution would be to use
X_train = np.empty([44, 4095])
X_train[0, :] = X[4, :].toarray()  # store the 5th row of the sparse X in the 1st row, converted to dense
unutbu's answer seems to be the most logical.
I want to compute the integral image. For example:
a = np.array([(1,2,3),(4,5,6)])
b = a.cumsum(axis=0)
This will generate another array b. Can I execute the cumsum in place? If not, are there any other methods to do that?
You have to pass the out argument:
np.cumsum(a, axis=1, out=a)
Note: your array is actually a 2-D array, so you can use axis=0 to accumulate down each column (across the rows) and axis=1 to accumulate along each row.
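A minimal sketch of the in-place version, reusing the array from the question:
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])
np.cumsum(a, axis=1, out=a)   # accumulate along each row, writing the result back into a
print(a)
# [[ 1  3  6]
#  [ 4  9 15]]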
Try using numpy.cumsum(a) directly:
a = np.array([(1,2,3)])
b = np.cumsum(a)
print(b)
# Output: [1 3 6]