Asymmetric slicing python - python

Consider the following matrix:
X = np.arange(9).reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Let say I want to subset the following array
array([[0, 4, 2],
[3, 7, 5]])
It is possible with some indexing of rows and columns, for instance
col=[0,1,2]
row = [[0,1],[1,2],[0,1]]
Then if I store the result in a variable array I can do it with the following code:
array=np.zeros([2,3],dtype='int64')
for i in range(3):
array[:,i]=X[row[i],col[i]]
Is there a way to broadcast this kind of operation ? I have to do this as a data cleaning stage for a large file ~ 5 Gb, and I would like to use dask to parallelize it. But in a first time if I could avoid using a for loop I would feel great.

For arrays with NumPy's advanced-indexing, it would be -
X[row, np.asarray(col)[:,None]].T
Sample run -
In [9]: X
Out[9]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [10]: col=[0,1,2]
...: row = [[0,1],[1,2],[0,1]]
In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]:
array([[0, 4, 2],
[3, 7, 5]])

Related

Are the elements created by numpy.repeat() views of the original numpy.array or unique elements?

I have a 3D array that I like to repeat 4 times.
Achieved via a mixture of Numpy and Python methods:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
z2.append(z)
>>> z2
[array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis,...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original numpy.array() or unique elements?
If the latter, is there an equivalent NumPy functions that can create views of the original array the same way as numpy.repeat()?
I think such an ability can help reduce the buffer space of z2 in the event size of z is large and when there are many repeats of z involved.
A follow-up on one of #FrankYellin answer:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes from using np.broadcast_to() is the same as np.repeat(). This is surprising given that the former returns a readonly view on the original z array with the given shape. Having said this, I did notice that np.broadcast_to() created the y2 array instantaneously, while the creation of z2 via np.repeat() took abt 40 seconds to complete. Hence,np.broadcast_to() yielded significantly faster performance.
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
> z.shape
(3, 3)
> z.strides
(24, 8)
from numpy.lib.stride_tricks import as_strided
z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))
and you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
>>>
You are using strides to say that I want to create a second array overlayed on top of the first array. You set its new shape, and you prefix 0 to the previous stride, indicating that the first dimension has no effect on the data you want.
Make sure you understand strides.
numpy.repeat creates a new array and not a view (you can check it by looking the __array_interface__ field). In fact, it is not possible to create a view on the original array in the general case since Numpy views does not support such pattern. A views is basically just an object containing a pointer to a raw memory buffer, some strides, a shape and a type. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat 2 items N times (without adding a new dimension to the output array). Thus, no there is no way to build a function like numpy.repeat having the same array output shape to repeat items of the last axis. If adding a new dimension is Ok, then you can build an array with a new dimension and a stride set to 0. Repeating the last dimension is possible though. The answer of #FrankYellin gives a good example. Note that reshaping/ravel the resulting array cause a mandatory copy. Supporting such advanced views would make the Numpy code more complex or/and less efficient for a feature that is only used rarely by users.

Operations with matrices inside a loop in Python

I have two matrices, A and B.
A=np.matrix([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
B=np.matrix([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
I want to substract some of B'rows (namely 0,2 and 3) from A. I tried to use
Index=np.array([0,2,3])
for i in Index:
A[i,:]=A[i,:]-B[i,:]
but it didn't work because matriz A should look like
matrix([[0, 1, 2],
[1, 2, 3],
[4, 5, 6],
[6, 7, 8]])
and I got
matrix([[ 1, 2, 3],
[ 2, 3, 4],
[ 7, 8, 9],
[10, 11, 12]])
What's the correct way to do this operation? I took me a long time to realize this problem (the real problem I'm trying to solve has more variables) and can't seem to figure it out.
If you do mean substract, then your should use
A[i,:]=A[i,:]-B[i,:]
instead of
A[i,:]=A[i,:]+B[i,:]
Numpy has element-wise subtraction, so something like:
import numpy as np
A=np.matrix([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
B=np.matrix([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
indices = [0,2,3]
for i in indices:
A[i,:]=np.subtract(A[i,:], B[i,:])
Will give you this matrix for A:
[[0, 1, 2],
[4, 5, 6],
[4, 5, 6],
[6, 7, 8]])
Is this what you are after? For better performance you could also just change the particular rows of A:
A[indices]=np.subtract(A[indices],B[indices])
Which will give the same answer.

Efficiently change order of numpy array

I have a 3 dimensional numpy array. The dimension can go up to 128 x 64 x 8192. What I want to do is to change the order in the first dimension by interchanging pairwise.
The only idea I had so far is to create a list of the indices in the correct order.
order = [1,0,3,2...127,126]
data_new = data[order]
I fear, that this is not very efficient but I have no better idea so far
You could reshape to split the first axis into two axes, such that latter of those axes is of length 2 and then flip the array along that axis with [::-1] and finally reshape back to original shape.
Thus, we would have an implementation like so -
a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Sample run -
In [170]: a = np.random.randint(0,9,(6,3))
In [171]: order = [1,0,3,2,5,4]
In [172]: a[order]
Out[172]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
In [173]: a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Out[173]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
Alternatively, if you are looking to efficiently create those constantly flipping indices order, we could do something like this -
order = np.arange(data.shape[0]).reshape(-1,2)[:,::-1].ravel()

Numpy: operations on columns (or rows) of NxM array

This may be a silly question, but I've just started using numpy and I have to figure out how to perform some simple operations.
Suppose that I have the 2x3 array
array([[1, 3, 5],
[2, 4, 6]])
And that I want to perform some operation on the first column, for example subtract 1 to all the elements to get
array([[0, 3, 5],
[1, 4, 6]])
How can I perform such an operation?
arr
# array([[1, 3, 5],
# [2, 4, 6]])
arr[:,0] = arr[:,0] - 1 # choose the first column here, subtract one and
# assign it back to the same column
arr
# array([[0, 3, 5],
# [1, 4, 6]])

numpy, Select elements in rows by 1d indexes array

Let we have square array, n*n. For example, n=3 and array is this:
arr = array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
And let we have array of indices in the each ROW. For example:
myidx=array([1, 2, 1], dtype=int64)
I want to get:
[1, 5, 7]
Because in line [0,1,2] take element with index 1, in line [3,4,5] get element with index 2, in line [6,7,8] get element with index 1.
I'm confused, and can't take elements this way using standard numpy indexing.
Thank you for answer.
There's no real pretty way but this does what you are looking for :)
In [1]: from numpy import *
In [2]: arr = array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [3]: myidx = array([1, 2, 1], dtype=int64)
In [4]: arr[arange(len(myidx)), myidx]
Out[4]: array([1, 5, 7])
Simpler way to reach the goal is using choose numpy function:
numpy.choose(myidx, arr.transpose())

Categories

Resources