I have a numpy array X with shape (768, 8).
The last value in each row is either 0 or 1. I want to keep only the rows where it is 1, and call the result T.
I did:
T = [x for x in X if x[7]==1]
This is correct; however, T is now a list, not a numpy array (in fact, I cannot print T.shape).
What should I do instead to keep this a numpy array?
NumPy's boolean indexing gets the job done in a fully vectorized manner. This approach is generally more efficient (and arguably more elegant) than using list comprehensions and type conversions.
T = X[X[:, -1] == 1]
Demo:
In [232]: first_columns = np.random.randint(0, 10, size=(10, 7))
In [233]: last_column = np.random.randint(0, 2, size=(10, 1))
In [234]: X = np.hstack((first_columns, last_column))
In [235]: X
Out[235]:
array([[4, 3, 3, 2, 6, 2, 2, 0],
       [2, 7, 9, 4, 7, 1, 8, 0],
       [9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [9, 8, 7, 6, 4, 4, 9, 0],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 1, 4, 5, 2, 4, 7, 0],
       [8, 0, 0, 1, 5, 2, 6, 0],
       [7, 9, 9, 3, 9, 3, 9, 1],
       [3, 1, 8, 7, 3, 2, 9, 0]])
In [236]: mask = X[:, -1] == 1
In [237]: mask
Out[237]: array([False, False, True, True, False, True, False, False, True, False], dtype=bool)
In [238]: T = X[mask]
In [239]: T
Out[239]:
array([[9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 9, 9, 3, 9, 3, 9, 1]])
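As a quick sanity check, the sketch below builds a random array with the same shape as in the question (the data and the seed are made up for illustration) and confirms that the mask-based result is still an ndarray and matches the list comprehension row for row.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for X: 7 value columns plus a 0/1 last column
X = np.hstack((rng.integers(0, 10, size=(768, 7)),
               rng.integers(0, 2, size=(768, 1))))

T = X[X[:, -1] == 1]                   # boolean-mask selection
T_list = [x for x in X if x[7] == 1]   # the list comprehension from the question

print(type(T).__name__, T.shape)       # still an ndarray with 8 columns
print(np.array_equal(T, np.array(T_list)))
```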
By writing
T = [x for x in X if x[7]==1]
you are making T a list. To convert any list to a numpy array, just use:
T = np.array([x for x in X if x[7]==1])
Here is what happens with a plain list, which has none of the ndarray attributes (such as .T or .shape):
In [1]: import numpy as np
In [2]: a = [1,2,3,4]
In [3]: a.T
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-9f69ed463660> in <module>()
----> 1 a.T
AttributeError: 'list' object has no attribute 'T'
In [4]: a = np.array(a)
In [5]: a.T
Out[5]: array([1, 2, 3, 4])
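One difference between the two approaches shows up when no row matches. The toy array below is made up for illustration: converting an empty list gives a 1-D array of shape (0,), while the boolean mask keeps the column dimension.

```python
import numpy as np

# Toy data (an assumption for this sketch): no row ends in 1
X = np.array([[1, 2, 0],
              [3, 4, 0]])

from_list = np.array([x for x in X if x[-1] == 1])  # np.array([]) underneath
from_mask = X[X[:, -1] == 1]

print(from_list.shape)  # (0,): the column dimension is lost
print(from_mask.shape)  # (0, 3): still two-dimensional
```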
I have an np.array :
a = np.array([x for x in range(10) for y in range(10)]).reshape(10, 10)
I want to get columns 4, 7, and 10 of the 3rd and 6th rows.
I tried:
a[[2,5]] # gives me the rows I want
a[[2,5], [3,6,9]] # produces an error
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
In the end the result should look like:
[[2,2,2],
[5,5,5]]
Where is my mistake?
The first index list must have shape (2,1) (or be the nested-list equivalent):
In [31]: a[[[2], [5]], [3, 6, 9]]
Out[31]:
array([[2, 2, 2],
       [5, 5, 5]])
Do you understand what the error message means by broadcasting?
For simple addition, a (2,1) array broadcasts with a (3,) to produce a (2,3) result:
In [32]: I, J = np.array((2, 5)), np.array((3, 6, 9))
In [33]: I, J
Out[33]: (array([2, 5]), array([3, 6, 9]))
In [34]: I + J
Traceback (most recent call last):
Input In [34] in <module>
I + J
ValueError: operands could not be broadcast together with shapes (2,) (3,)
In [35]: I[:, None] + J
Out[35]:
array([[ 5,  8, 11],
       [ 8, 11, 14]])
The same idea applies to indexing with several arrays.
Your a can be created with the same logic:
In [38]: np.arange(10)[:, None] + np.zeros(10,int)
Out[38]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])
With 2 arrays (lists) of matching size, the effect is to select a "diagonal", or 1d array of values (rather than the block that you were trying to get):
In [39]: a[[2, 3], [5, 6]]
Out[39]: array([2, 3])
In [40]: a[2, 5], a[3, 6]
Out[40]: (2, 3)
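NumPy also provides np.ix_, which is not used in the answer above but builds the broadcastable index arrays for you. A minimal sketch, reusing the array from the question:

```python
import numpy as np

# Same array as in the question: row i is filled with the value i
a = np.arange(10)[:, None] + np.zeros(10, int)

rows, cols = [2, 5], [3, 6, 9]

# np.ix_ returns index arrays of shape (2, 1) and (1, 3)
block = a[np.ix_(rows, cols)]

# Equivalent manual version: make the row index a (2, 1) column
same = a[np.array(rows)[:, None], cols]

print(block)
```

Both forms select the 2x3 block rather than a "diagonal" of paired elements.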
I believe this link will explain everything, since the question is the same and was given a great answer:
Selecting specific rows and columns from NumPy array
I need to calculate the sum of elementwise differences of a vector, given by the following equation:
sum(y(i) - y(j)) at i!=j
y is given as a numpy array
One option is to iterate with a double loop:
dev = 0
for i in range(y.shape[0]):
    for j in range(y.shape[0]):
        if i == j:
            continue
        dev += y[i] - y[j]
That is definitely not the optimal solution.
How can it be optimized using vectorized numpy operations?
Say y is flat, e.g.
>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> y.shape
(10,)
You could compute the "cartesian differences" as follows
>>> m = np.abs(y[:, None] - y[None, :])
>>> m
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8],
[2, 1, 0, 1, 2, 3, 4, 5, 6, 7],
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6],
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5],
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4],
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3],
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2],
[8, 7, 6, 5, 4, 3, 2, 1, 0, 1],
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]])
and finally
>>> dev = m.sum()/2
>>> dev
165.0
Using itertools.combinations (this matches the absolute-difference total only when y is sorted in ascending order):
import itertools
sum(x2 - x1 for x1, x2 in itertools.combinations(y, 2))
Using np.subtract.outer:
np.abs(np.subtract.outer(y, y)).sum() / 2
Time Comparison:
Method 1 (Using Itertools):
Wall time: 18.9 s
Method 2 (Using KeepAlive's cartesian differences):
Wall time: 491 ms
Method 3 (Using np.subtract.outer):
Wall time: 467 ms
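If the n-by-n intermediate matrix becomes too large, the same total can be computed from a sorted copy of y. In sorted order, element k is the larger member of k pairs and the smaller member of n-1-k pairs, so it contributes (2k - n + 1) * y_sorted[k]. This is not one of the methods timed above, only a sketch, and the function name is made up.

```python
import numpy as np

def pairwise_abs_diff_sum(y):
    """Sum of |y[i] - y[j]| over all unordered pairs (hypothetical helper)."""
    ys = np.sort(np.asarray(y))
    n = ys.size
    # Element k is added k times and subtracted (n - 1 - k) times
    weights = 2 * np.arange(n) - n + 1
    return (weights * ys).sum()

y = np.arange(10)
print(pairwise_abs_diff_sum(y))                   # 165
print(np.abs(np.subtract.outer(y, y)).sum() / 2)  # 165.0
```

This needs O(n log n) time and O(n) memory instead of building an n-by-n array.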
How can I replace array values in place if I don't know the axis beforehand?
For example, if I want to do something like
arr[:,5]
but I don't know the axis beforehand and want to make it general I can use take:
arr.take(5, axis=1)
and it'll work.
However, if I want to do something like
arr[:,5]=10
but I don't know the axis beforehand, how can I do it? I obviously can't do arr.take(5, axis=1) = 10, and I can't find a function to do it.
The function that comes the closest (that I found) would be np.put(), but I don't think it can be done with that.
You could swap the desired axis to the first position and then do the assignment. swapaxes returns a view, so the assignment will do what you want.
For example,
In [87]: np.random.seed(123)
In [88]: a = np.random.randint(1, 9, size=(5, 8))
In [89]: a
Out[89]:
array([[7, 6, 7, 3, 5, 3, 7, 2],
[4, 3, 4, 2, 7, 2, 1, 2],
[7, 8, 2, 1, 7, 1, 8, 2],
[4, 7, 6, 5, 1, 1, 5, 2],
[8, 4, 3, 5, 8, 3, 5, 8]])
In [90]: ax = 1
In [91]: k = 5
In [92]: val = 99
In [93]: a.swapaxes(0, ax)[k] = val
In [94]: a
Out[94]:
array([[ 7, 6, 7, 3, 5, 99, 7, 2],
[ 4, 3, 4, 2, 7, 99, 1, 2],
[ 7, 8, 2, 1, 7, 99, 8, 2],
[ 4, 7, 6, 5, 1, 99, 5, 2],
[ 8, 4, 3, 5, 8, 99, 5, 8]])
In [95]: ax = 0
In [96]: k = 2
In [97]: val = -1
In [98]: a.swapaxes(0, ax)[k] = val
In [99]: a
Out[99]:
array([[ 7, 6, 7, 3, 5, 99, 7, 2],
[ 4, 3, 4, 2, 7, 99, 1, 2],
[-1, -1, -1, -1, -1, -1, -1, -1],
[ 4, 7, 6, 5, 1, 99, 5, 2],
[ 8, 4, 3, 5, 8, 99, 5, 8]])
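The swapaxes idea can be wrapped in a small function. The name set_along_axis is made up for this sketch; it relies on swapaxes returning a view of an ndarray, which np.shares_memory confirms here.

```python
import numpy as np

def set_along_axis(arr, k, val, axis):
    """Assign val at index k along `axis`, in place (hypothetical helper)."""
    view = arr.swapaxes(0, axis)  # a view, so writes reach arr itself
    view[k] = val

a = np.zeros((2, 3, 4), dtype=int)
set_along_axis(a, 1, 7, axis=2)   # same effect as a[:, :, 1] = 7

print(np.shares_memory(a, a.swapaxes(0, 2)))  # True
print(a[:, :, 1])
```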
I don't think there is a NumPy function for this, but it is not too hard to construct your own:
def replace(arr, indices, val, axis):
    s = [slice(None)]*arr.ndim
    s[axis] = indices
    # index with a tuple: recent NumPy versions reject a list of slices
    arr[tuple(s)] = val
For example:
import numpy as np
def replace(arr, indices, val, axis):
    s = [slice(None)]*arr.ndim
    s[axis] = indices
    arr[tuple(s)] = val
arr = np.zeros((3,6,2))
indices = 5
axis = 1
val = 10
replace(arr, indices, val, axis)
print(np.take(arr, indices, axis))
prints
[[ 10. 10.]
[ 10. 10.]
[ 10. 10.]]
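The same index-building idea also accepts a list of positions or a slice along the chosen axis. The sketch below uses a made-up function name and converts the index list to a tuple before indexing, on the assumption that a recent NumPy version is in use (those versions no longer accept a list of slices as a multidimensional index).

```python
import numpy as np

def assign_along_axis(arr, indices, val, axis):
    """Set arr[..., indices, ...] = val along `axis` (hypothetical helper)."""
    index = [slice(None)] * arr.ndim
    index[axis] = indices
    arr[tuple(index)] = val  # tuple, not list, for multidimensional indexing

arr = np.zeros((3, 6, 2))
assign_along_axis(arr, [0, 5], 10, axis=1)       # a list of positions
assign_along_axis(arr, slice(2, 4), -1, axis=1)  # a slice

print(arr[0, :, 0])  # values 10, 0, -1, -1, 0, 10
```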
I have a 2D Numpy ndarray, x, that I need to split in square subregions of size s. For each subregion, I want to get the greatest element (which I do), and its position within that subregion (which I can't figure out).
Here is a minimal example:
>>> x = np.random.randint(0, 10, (6,8))
>>> x
array([[9, 4, 8, 9, 5, 7, 3, 3],
[3, 1, 8, 0, 7, 7, 5, 1],
[7, 7, 3, 6, 0, 2, 1, 0],
[7, 3, 9, 8, 1, 6, 7, 7],
[1, 6, 0, 7, 5, 1, 2, 0],
[8, 7, 9, 5, 8, 3, 6, 0]])
>>> h, w = x.shape
>>> s = 2
>>> f = x.reshape(h//s, s, w//s, s)
>>> mx = np.max(f, axis=(1, 3))
>>> mx
array([[9, 9, 7, 5],
[7, 9, 6, 7],
[8, 9, 8, 6]])
For example, the 8 in the lower left corner of mx is the greatest element from subregion [[1,6], [8, 7]] in the lower left corner of x.
What I want is to get an array similar to mx, that keeps the indices of the largest elements, like this:
[[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]]
where, for example, the 2 in the lower left corner is the index of 8 in the linear representation of [[1, 6], [8, 7]].
I could call np.argmax(f[i, :, j, :]) while iterating over i and j, but that is far slower on large inputs. To give you an idea, I'm trying to use (only) Numpy for max pooling. Basically, I'm asking if there is a faster alternative to what I'm using.
Here's one approach -
# Get shape of output array
m,n = np.array(x.shape)//s
# Reshape and permute axes to bring the block as rows
x1 = x.reshape(h//s, s, w//s, s).swapaxes(1,2).reshape(-1,s**2)
# Use argmax along each row and reshape to output shape
out = x1.argmax(1).reshape(m,n)
Sample input, output -
In [362]: x
Out[362]:
array([[9, 4, 8, 9, 5, 7, 3, 3],
[3, 1, 8, 0, 7, 7, 5, 1],
[7, 7, 3, 6, 0, 2, 1, 0],
[7, 3, 9, 8, 1, 6, 7, 7],
[1, 6, 0, 7, 5, 1, 2, 0],
[8, 7, 9, 5, 8, 3, 6, 0]])
In [363]: out
Out[363]:
array([[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]])
Alternatively, to simplify things, we could use scikit-image that does the heavy work of reshaping and permuting axes for us -
In [372]: from skimage.util import view_as_blocks as viewB
In [373]: viewB(x, (s,s)).reshape(-1,s**2).argmax(1).reshape(m,n)
Out[373]:
array([[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]])
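For max pooling it is often useful to turn the within-block linear index into (row, column) positions in the original array. The following is one possible sketch, reusing x and s from the question; the variable names are made up for illustration.

```python
import numpy as np

x = np.array([[9, 4, 8, 9, 5, 7, 3, 3],
              [3, 1, 8, 0, 7, 7, 5, 1],
              [7, 7, 3, 6, 0, 2, 1, 0],
              [7, 3, 9, 8, 1, 6, 7, 7],
              [1, 6, 0, 7, 5, 1, 2, 0],
              [8, 7, 9, 5, 8, 3, 6, 0]])
s = 2
h, w = x.shape
m, n = h // s, w // s

# One row of s*s values per block, then the linear argmax inside each block
blocks = x.reshape(m, s, n, s).swapaxes(1, 2).reshape(m, n, s * s)
flat = blocks.argmax(axis=2)

# Split the linear index into (row, col) inside the block ...
r, c = np.divmod(flat, s)
# ... and shift by each block's top-left corner to get positions in x
rows = np.arange(m)[:, None] * s + r
cols = np.arange(n)[None, :] * s + c

print(x[rows, cols])  # reproduces the block maxima
```

Indexing x with rows and cols returns the same values as np.max(f, axis=(1, 3)), which is a convenient way to check the positions.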
I have an array of shape 3x3 which looks something like:
import numpy as np
A = np.array(([1,2,3],[11,12,5],[4,9,1]))
>>> A
array([[ 1, 2, 3],
[11, 12, 5],
[ 4, 9, 1]])
I want to repmat one column at a time for 3 times so that I can achieve the following:
B
array([[ 1, 1, 1, 2, 2, 2, 3, 3, 3],
[11, 11, 11, 12, 12, 12, 5, 5, 5],
[ 4, 4, 4, 9, 9, 9, 1, 1, 1]])
I can loop over the columns and repmat each one, but I am looking for a smarter way, as my real array has size 5000x300.
This is the job of numpy.repeat. Quoting an example from the docs:
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
[3, 3, 3, 4, 4, 4]])
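Applied to the array from the question, a minimal sketch looks like this. np.tile is shown only for contrast: it repeats the whole array as a block, which is not the column-by-column layout asked for.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [11, 12, 5],
              [4, 9, 1]])

B = np.repeat(A, 3, axis=1)   # each column repeated 3 times in place
tiled = np.tile(A, (1, 3))    # whole array repeated side by side instead

print(B)
print(tiled[0])  # first row is 1, 2, 3 repeated three times
```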