Let's say I have a matrix:
>> a = np.arange(25).reshape(5, 5)
>> a
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
and two vectors of indices that define a span of matrix elements that I want to extract:
>> indices1 = np.array([0, 1, 1, 0, 0])
>> indices2 = np.array([2, 3, 3, 2, 2])
As you can see, the difference between each pair of corresponding indices is 2.
I would like to do something like this to extract a part of the matrix:
>> submatrix = a[indices1:indices2, :]
so that the result would be 2x5 matrix:
>> submatrix
[[ 0 6 7 3 4],
[ 5 11 12 8 9]]
As far as I know, NumPy accepts integers as slice boundaries, e.g. a[0:2], but does not accept arrays.
Note that what I want to extract is not a regular submatrix: the rows taken differ from column to column.
Do you know of some other way of indexing a NumPy matrix so that arrays can define the spans? So far I have only managed to do it with for loops.
For reference, the most obvious loop (still took several experimental steps):
In [87]: np.concatenate([a[i:j,n] for n,(i,j) in enumerate(zip(indices1,indices2))], ).reshape(-1,2).T
Out[87]:
array([[ 0, 6, 7, 3, 4],
[ 5, 11, 12, 8, 9]])
Broadcasted indices taking advantage of the constant length:
In [88]: indices1+np.arange(2)[:,None]
Out[88]:
array([[0, 1, 1, 0, 0],
[1, 2, 2, 1, 1]])
In [89]: a[indices1+np.arange(2)[:,None],np.arange(5)]
Out[89]:
array([[ 0, 6, 7, 3, 4],
[ 5, 11, 12, 8, 9]])
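The broadcasted-index idea above can be written for any constant span length. This is a minimal sketch, assuming every column uses the same span; the names span and rows are illustrative and do not come from the question, and np.take_along_axis is swapped in as an equivalent of the explicit a[rows, np.arange(5)] indexing.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
indices1 = np.array([0, 1, 1, 0, 0])
indices2 = np.array([2, 3, 3, 2, 2])

# Assumption: the span (stop - start) is the same for every column.
span = int(indices2[0] - indices1[0])
assert np.all(indices2 - indices1 == span)

# rows[k, j] is the row to read for output row k in column j; shape (span, 5).
rows = indices1 + np.arange(span)[:, None]

# For each column j, pick a[rows[:, j], j].
submatrix = np.take_along_axis(a, rows, axis=0)
print(submatrix)
```

If the spans had different lengths per column, the result would no longer be rectangular, so this approach would not apply as written.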
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
...
c = np.dot(a,b)
I want to transpose b so I can calculate the dot product of a and b.
You can use numpy's broadcasting for this:
import numpy as np
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
In [3]: a[:,None]*b
Out[3]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
This has nothing to do with a dot product, though. But in the comments you said that this is the result you want.
You could also use the numpy function outer:
In [4]: np.outer(a, b)
Out[4]:
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
What you want here is the outer product of the two arrays. The function to use for this is np.outer:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7])
np.outer(a,b)
array([[ 0, 0, 0, 0, 0],
[ 3, 4, 5, 6, 7],
[ 6, 8, 10, 12, 14]])
With NumPy you could also reshape a by swapping axes:
a = np.swapaxes([a], 1, 0)
# [[0]
# [1]
# [2]]
Then
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Swapping the axes of b instead requires transposing the product, see below.
Or use the usual NumPy reshape:
a = np.array([0,1,2])
b = np.array([3,4,5,6,7]).reshape(5,1)
print((a * b).T)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Reshape is like b = np.array([ [bb] for bb in [3,4,5,6,7] ]) then b becomes:
# [[3]
# [4]
# [5]
# [6]
# [7]]
When reshaping a instead, there is no need to transpose:
a = np.array([0,1,2]).reshape(3,1)
b = np.array([3,4,5,6,7])
print(a * b)
# [[ 0 0 0 0 0]
# [ 3 4 5 6 7]
# [ 6 8 10 12 14]]
Just out of curiosity, good old list comprehension:
a = [0,1,2]
b = [3,4,5,6,7]
print( [ [aa * bb for bb in b] for aa in a ] )
#=> [[0, 0, 0, 0, 0], [3, 4, 5, 6, 7], [6, 8, 10, 12, 14]]
Others have provided the outer and broadcasted solutions. Here's the dot one(s):
np.dot(a.reshape(3,1), b.reshape(1,5))
a[:,None].dot(b[None,:])
a[None].T.dot( b[None])
Conceptually I think it's a bit of overkill, but due to implementation details, it actually is the fastest.
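To tie the answers together, here is a small sketch showing that np.dot on the two 1-D arrays fails because their lengths differ, while the broadcasting, np.outer and reshaped-dot variants all give the same 3x5 result. The variable names via_broadcast, via_outer and via_dot are illustrative only, and no timing claim is made here.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([3, 4, 5, 6, 7])

# np.dot on two 1-D arrays needs equal lengths, so this raises ValueError.
try:
    np.dot(a, b)
    dot_failed = False
except ValueError:
    dot_failed = True

# Three equivalent ways of building the outer product.
via_broadcast = a[:, None] * b                       # (3, 1) * (5,) -> (3, 5)
via_outer = np.outer(a, b)                           # dedicated function
via_dot = np.dot(a.reshape(3, 1), b.reshape(1, 5))   # (3, 1) @ (1, 5)

print(dot_failed, via_broadcast.shape)
print(np.array_equal(via_broadcast, via_outer), np.array_equal(via_outer, via_dot))
```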
I have a numpy array:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
>> arr
[[ 1 2 3 4 5]
[ 6 7 8 9 10]]
I want to take a portion of the array based on indices (not slices):
ix = np.ix_([0, 1], [0, 2])
>> arr[ix]
[[1 3]
[6 8]]
And I want to modify those elements in the original array, which would work if I did this:
arr[ix] = 0
>> arr
[[ 0 2 0 4 5]
[ 0 7 0 9 10]]
But I only want to change them if they satisfy a specific condition, for example if they are less than 5. I am trying this:
subarr = arr[ix]
subarr[subarr < 5] = 0
But it doesn't modify the original one.
>> arr
[[ 1 2 3 4 5]
[ 6 7 8 9 10]]
>> subarr
[[0 0]
[6 8]]
I am not sure why this is not working, since both accessing the array by indices with np.ix_ and using a mask subarr < 5 should return a view of the array, not a copy.
Fancy indexing returns a copy; hence your original array will not be updated. You can use numpy.where to update your values:
arr[ix] = np.where(arr[ix] < 5, 0, arr[ix])
array([[ 0, 2, 0, 4, 5],
[ 6, 7, 8, 9, 10]])
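Another way to express the same conditional update, sketched here as an alternative rather than as the answer's method, is to build a boolean mask over the whole array that marks the positions chosen by ix and combine it with the condition. The name selected is hypothetical.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
ix = np.ix_([0, 1], [0, 2])

# Mark the positions addressed by ix in a full-size boolean mask.
selected = np.zeros(arr.shape, dtype=bool)
selected[ix] = True

# Assigning through a boolean mask on arr itself modifies arr in place.
arr[selected & (arr < 5)] = 0
print(arr)
```

The assignment works because it is a single indexed assignment on arr, so no intermediate copy is modified.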
When you do:
arr[ix] = 0
The Python interpreter calls arr.__setitem__(ix, 0), hence modifying the original object.
In the second case, subarr is independent of arr: it is a copy of the subset of arr. You then modify this copy.
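The copy-versus-view distinction can be checked directly with np.shares_memory. The sketch below compares the fancy-indexed subarr with a basic slice of the same columns (the name view is illustrative), and then writes the modified copy back into arr.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
ix = np.ix_([0, 1], [0, 2])

subarr = arr[ix]        # advanced (fancy) indexing: returns a copy
view = arr[:, 0:3:2]    # basic slicing of the same columns: returns a view

copy_shares = np.shares_memory(arr, subarr)
view_shares = np.shares_memory(arr, view)
print(copy_shares, view_shares)

subarr[subarr < 5] = 0  # changes only the copy
arr[ix] = subarr        # explicit write-back goes through arr.__setitem__
print(arr)
```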
I have a large NumPy array (OriginalArray) with many rows and 8 columns.
I want to create a new array (NewArray) in which each row has the following properties:
Columns 1, 3, 5, and 7 of NewArray are the sum over N rows of columns 1, 3, 5, and 7 of OriginalArray
Columns 2, 4, 6, and 8 of NewArray are the mean over N rows of columns 2, 4, 6, and 8 of OriginalArray
So, the NewArray has 1/N as many rows as the OriginalArray.
For example:
Original Array = [1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 ]
with N = 2
NewArray = [2 1 2 1 2 1 2 1
2 1 2 1 2 1 2 1]
Please excuse the messy formatting. I'm still very new at this (my first question here, actually).
Thanks!
Here's a vectorized approach making heavy usage of slicing -
nrows = a.shape[0]//N # a is input array
out = np.empty((nrows,8))
out[:,::2] = a[:,::2].reshape(-1,N,4).sum(1)
out[:,1::2] = a[:,1::2].reshape(-1,N,4).mean(1)
Sample run -
In [64]: a # Input array
Out[64]:
array([[5, 1, 5, 8, 5, 0, 3, 1],
[0, 7, 8, 7, 0, 3, 5, 1],
[8, 6, 6, 4, 1, 6, 1, 2],
[4, 5, 5, 7, 5, 2, 1, 2]])
In [65]: N = 2 # Summing/averaging length
In [66]: a[:,::2] # Select [1,3,5,7] cols
Out[66]:
array([[5, 5, 5, 3],
[0, 8, 0, 5],
[8, 6, 1, 1],
[4, 5, 5, 1]])
In [67]: a[:,::2].reshape(-1,N,4).sum(1) # Sum N rows by splitting axis
Out[67]:
array([[ 5, 13, 5, 8],
[12, 11, 6, 2]])
In [68]: a[:,1::2] # Select [2,4,6,8] cols
Out[68]:
array([[1, 8, 0, 1],
[7, 7, 3, 1],
[6, 4, 6, 2],
[5, 7, 2, 2]])
In [69]: a[:,1::2].reshape(-1,N,4).mean(1) # Similarly average across N rows
Out[69]:
array([[ 4. , 7.5, 1.5, 1. ],
[ 5.5, 5.5, 4. , 2. ]])
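The steps of the sample run can be collected into one function. This is a sketch under two assumptions that the answer does not state: leftover rows that do not fill a complete block of N are dropped, and the output is always float. The function name block_sum_mean is hypothetical.

```python
import numpy as np

def block_sum_mean(a, N):
    """Sum columns 1, 3, 5, ... and average columns 2, 4, 6, ... over blocks of N rows."""
    a = np.asarray(a)
    nrows = a.shape[0] // N
    a = a[:nrows * N]                      # assumption: drop incomplete trailing block
    sum_cols = a[:, ::2]                   # 0-based even indices = columns 1, 3, 5, 7
    mean_cols = a[:, 1::2]                 # 0-based odd indices = columns 2, 4, 6, 8
    out = np.empty((nrows, a.shape[1]), dtype=float)
    # Split the row axis into (block, row-within-block) and reduce within each block.
    out[:, ::2] = sum_cols.reshape(nrows, N, sum_cols.shape[1]).sum(axis=1)
    out[:, 1::2] = mean_cols.reshape(nrows, N, mean_cols.shape[1]).mean(axis=1)
    return out

a = np.array([[5, 1, 5, 8, 5, 0, 3, 1],
              [0, 7, 8, 7, 0, 3, 5, 1],
              [8, 6, 6, 4, 1, 6, 1, 2],
              [4, 5, 5, 7, 5, 2, 1, 2]])
result = block_sum_mean(a, 2)
print(result)
```

Applied to the all-ones 4x8 example from the question with N = 2, it gives two rows of 2 1 2 1 2 1 2 1.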
I'm assuming that your original_array (note the PEP8 style) is already formatted in rows and columns. By this I mean, original_array = np.array([[1,1...],[1,...],[1,...],[1,...]])
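An easy one-liner to create a single row of new_array (summing or averaging over all rows of original_array) would be as follows:
import numpy as np
# 0-based even indices are columns 1, 3, 5, 7 (summed); odd indices are columns 2, 4, 6, 8 (averaged)
row = [np.sum(original_array[:, x]) if x % 2 == 0 else np.mean(original_array[:, x]) for x in range(original_array.shape[1])]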
And then to copy the row, simply:
new_array = [row]*N
I have two arrays A and B. In NumPy you can use A as an index to B e.g.
A = np.array([[1,2,3,1,7,3,1,2,3],[4,5,6,4,5,6,4,5,6],[7,8,9,7,8,9,7,8,9]])
B= np.array([1,2,3,4,5,6,7,8,9,0])
c = B[A]
Which produces:
[[2 3 4 2 8 4 2 3 4]
 [5 6 7 5 6 7 5 6 7]
 [8 9 0 8 9 0 8 9 0]]
However, in my case the arrays A and B are SciPy CSR sparse arrays and they don't seem to support indexing.
A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
c = B_sparse[A_sparse]
This results in:
IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.
I've come up with the function below to replicate NumPy's behavior with the sparse arrays:
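from scipy import sparse

def index_sparse(A, B):
    # COO format exposes the row, col and data arrays directly
    A_sparse = sparse.coo_matrix(A)
    B_sparse = sparse.csr_matrix(B)
    # Start from a CSR copy of A so that res has the same sparsity pattern
    res = sparse.csr_matrix(A_sparse)
    for i, j, v in zip(A_sparse.row, A_sparse.col, A_sparse.data):
        res[i, j] = B_sparse[0, v]
    return res

res = index_sparse(A, B)
print(res.todense())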
Looping over the array and having to create a new array in Python isn't ideal. Is there a better way of doing this using built-in functions from SciPy/NumPy?
Sparse indexing is less developed. coo format for example doesn't implement it at all.
I haven't tried to implement this problem, though I have answered others that involve working with the sparse format attributes. So I'll just make some general observations.
B_sparse is a matrix, so its shape is (1,10). So the equivalent to B[A] is
In [294]: B_sparse[0,A]
Out[294]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 24 stored elements in Compressed Sparse Row format>
In [295]: _.A
Out[295]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
B_sparse[A,:] or B_sparse[:,A] gives a 3d warning, since it would be trying to create a matrix version of:
In [298]: B[None,:][:,A]
Out[298]:
array([[[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]]])
As to your function:
A_sparse.nonzero() does A_sparse.tocoo() and returns its row and col. Effectively the same as what you do.
Here's something that should be faster, though I haven't tested it enough to be sure it is robust:
In [342]: Ac=A_sparse.tocoo()
In [343]: res=Ac.copy()
In [344]: res.data[:]=B_sparse[0, Ac.data].A[0]
In [345]: res
Out[345]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 27 stored elements in COOrdinate format>
In [346]: res.A
Out[346]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
In this example there are 3 explicit zeros (27 stored elements versus the 24 above) that could be cleaned up as well (look at res.nonzero()).
Since you are setting each res[i,j] with values from Ac.row and Ac.col, res has the same row,col values as Ac, so I initialize it as a copy. Then it's just a matter of updating the res.data attribute. It would be faster to index Bc.data directly, but that doesn't account for its sparsity.
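The idea of reusing A's row and col arrays and only replacing the data can be sketched as a function. Note the swapped-in technique: instead of indexing B_sparse directly as in the session above, this version densifies B into a 1-D lookup array, which assumes B is small enough to hold in memory. The names index_sparse_vectorized and lookup are hypothetical. Implicit zeros of A are skipped, as in the question's loop, so they are not mapped to B[0].

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

def index_sparse_vectorized(A_sparse, B_sparse):
    Ac = A_sparse.tocoo()                   # exposes row, col and data arrays
    lookup = B_sparse.toarray().ravel()     # assumption: B fits in memory as dense
    new_data = lookup[Ac.data]              # one vectorized lookup, no Python loop
    res = sparse.coo_matrix((new_data, (Ac.row, Ac.col)), shape=Ac.shape).tocsr()
    res.eliminate_zeros()                   # drop stored zeros coming from B
    return res

res = index_sparse_vectorized(sparse.csr_matrix(A), sparse.csr_matrix(B))
print(res.toarray())
print(res.nnz)
```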
I am in need of efficiently padding a numpy array on all 4 sides, using the first and last row/column as the padding data. For example, given the following:
A=np.array([[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12]])
I am trying to end up with:
B=np.array([[1, 1, 2, 3, 4, 4],
            [1, 1, 2, 3, 4, 4],
            [5, 5, 6, 7, 8, 8],
            [9, 9, 10, 11, 12, 12],
            [9, 9, 10, 11, 12, 12]])
Notice that the original array A is located at B[1:-1,1:-1]. I assume I could pad in one direction first (horizontal or vertical), then the other, to get the duplicated corner values. However, my vectorization/numpification is failing me. (Note: the array I am doing this with is quite large, and I need to perform this operation many times, so doing it efficiently is key. I can do it with a loop, but it is quite slow.)
With np.pad, you can specify the width of padding and the padding mode to apply to an array. For your example array, the edge padding mode gives the desired result:
>>> np.pad(A, 1, 'edge')
array([[ 1, 1, 2, 3, 4, 4],
[ 1, 1, 2, 3, 4, 4],
[ 5, 5, 6, 7, 8, 8],
[ 9, 9, 10, 11, 12, 12],
[ 9, 9, 10, 11, 12, 12]])
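As a quick check of the property mentioned in the question, the sketch below confirms that the original array sits at B[1:-1, 1:-1] after edge padding. It also shows that np.pad accepts a different width per axis; the array name C and the chosen widths are only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# Pad one element on every side, repeating the edge values.
B = np.pad(A, 1, mode='edge')
print(np.array_equal(B[1:-1, 1:-1], A))

# Widths can also be given per axis as ((top, bottom), (left, right)).
C = np.pad(A, ((1, 1), (2, 2)), mode='edge')
print(C.shape)
```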