Using numpy.vectorize() to rotate all elements of a NumPy array - python

I am in the beginning phases of learning NumPy. I have a Numpy array of 3x3 matrices. I would like to create a new array where each of those matrices is rotated 90 degrees. I've studied this answer but I still can't figure out what I am doing wrong.
import numpy as np
# 3x3
m = np.array([[1,2,3], [4,5,6], [7,8,9]])
# array of 3x3
a = np.array([m,m,m,m])
# rotate a single matrix counter-clockwise
def rotate90(x):
return np.rot90(x)
# function that can be called on all elements of an np.array
# Note: I've tried different values for otypes= without success
f = np.vectorize(rotate90)
result = f(a)
# ValueError: Axes=(0, 1) out of range for array of ndim=0.
# The error occurs in NumPy's rot90() function.
Note: I realize I could do the following but I'd like to understand the vectorized option.
t = np.array([ np.rot90(x, k=-1) for x in a])

No need to do the rotations individually: numpy has a builtin numpy.rot90(m, k=1, axes=(0, 1)) function. By default the matrix is thus rotate over the first and second dimension.
If you want to rotate one level deeper, you simply have to set the axes over which rotation happens, one level deeper (and optionally swap them if you want to rotate in a different direction). Or as the documentation specifies:
axes: (2,) array_like
The array is rotated in the plane defined by the
axes. Axes must be different.
So we rotate over the y and z plane (if we label the dimensions x, y and z) and thus we either specify (2,1) or (1,2).
All you have to do is set the axes correctly, when you want to rotate to the right/left:
np.rot90(a,axes=(2,1)) # right
np.rot90(a,axes=(1,2)) # left
This will rotate all matrices, like:
>>> np.rot90(a,axes=(2,1))
array([[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]],
[[7, 4, 1],
[8, 5, 2],
[9, 6, 3]]])
Or if you want to rotate to the left:
>>> np.rot90(a,axes=(1,2))
array([[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]],
[[3, 6, 9],
[2, 5, 8],
[1, 4, 7]]])
Note that you can only specify the axes from numpy 1.12 and (probably) future versions.

Normally np.vectorize is used to apply a scalar (Python, non-numpy) function to all elements of an array, or set of arrays. There's a note that's often overlooked:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
In [278]: m = np.array([[1,2,3],[4,5,6]])
In [279]: np.vectorize(lambda x:2*x)(m)
Out[279]:
array([[ 2, 4, 6],
[ 8, 10, 12]])
This multiplies each element of m by 2, taking care of the looping paper-work for us.
Better yet, when given several arrays, it broadcasts (a generalization of 'outer product').
In [280]: np.vectorize(lambda x,y:2*x+y)(np.arange(3), np.arange(2)[:,None])
Out[280]:
array([[0, 2, 4],
[1, 3, 5]])
This feeds (x,y) scalar tuples to the lambda for all combinations of a (3,) array broadcasted against a (2,1) array, resulting in a (2,3) array. It can be viewed as a broadcasted extension of map.
The problem with np.vectorize(np.rot90) is that rot90 takes a 2d array, but vectorize will feed it scalars.
However I see in the docs that for v1.12 they've added a signature parameter. This is the first time I used it.
Your problem - apply np.rot90 to 2d elements of a 3d array:
In [266]: m = np.array([[1,2,3],[4,5,6]])
In [267]: a = np.stack([m,m])
In [268]: a
Out[268]:
array([[[1, 2, 3],
[4, 5, 6]],
[[1, 2, 3],
[4, 5, 6]]])
While you could describe this a as an array of 2d arrays, it's better to think of it as a 3d array of integers. That's how the np.vectorize(myfun)(a) sees it, giving myfun each number.
Applied to a 2d m:
In [269]: np.rot90(m)
Out[269]:
array([[3, 6],
[2, 5],
[1, 4]])
With the Python work horse, the list comprehension:
In [270]: [np.rot90(i) for i in a]
Out[270]:
[array([[3, 6],
[2, 5],
[1, 4]]), array([[3, 6],
[2, 5],
[1, 4]])]
The result is a list, but we could wrap that in np.array.
Python map does the same thing.
In [271]: list(map(np.rot90, a))
Out[271]:
[array([[3, 6],
[2, 5],
[1, 4]]), array([[3, 6],
[2, 5],
[1, 4]])]
The comprehension and map both iterate on the 1st dimension of a, action on the resulting 2d element.
vectorize with signature:
In [272]: f = np.vectorize(np.rot90, signature='(n,m)->(k,l)')
In [273]: f(a)
Out[273]:
array([[[3, 6],
[2, 5],
[1, 4]],
[[3, 6],
[2, 5],
[1, 4]]])
The signature tells it to pass a 2d array and expect back a 2d array. (I should explore how signature plays with the otypes parameter.)
Some quick time comparisons:
In [287]: timeit np.array([np.rot90(i) for i in a])
10000 loops, best of 3: 40 µs per loop
In [288]: timeit np.array(list(map(np.rot90, a)))
10000 loops, best of 3: 41.1 µs per loop
In [289]: timeit np.vectorize(np.rot90, signature='(n,m)->(k,l)')(a)
1000 loops, best of 3: 234 µs per loop
In [290]: %%timeit f=np.vectorize(np.rot90, signature='(n,m)->(k,l)')
...: f(a)
...:
1000 loops, best of 3: 196 µs per loop
So for a small array, the Python list methods are faster, by quite a bit. Sometimes, numpy approaches do better with larger arrays, though I doubt in this case.
rot90 with the axes parameter is even better, and will do well with larger arrays:
In [292]: timeit np.rot90(a,axes=(1,2))
100000 loops, best of 3: 15.7 µs per loop
Looking at the np.rot90 code, I see that it is just doing np.flip (reverse) and np.transpose, in various combinations depending on the k. In effect for this case it is doing:
In [295]: a.transpose(0,2,1)[:,::-1,:]
Out[295]:
array([[[3, 6],
[2, 5],
[1, 4]],
[[3, 6],
[2, 5],
[1, 4]]])
(this is even faster than rot90.)
I suspect vectorize with the signature is doing something like:
In [301]: b = np.zeros(2,dtype=object)
In [302]: b[...] = [m,m]
In [303]: f = np.frompyfunc(np.rot90, 1,1)
In [304]: f(b)
Out[304]:
array([array([[3, 6],
[2, 5],
[1, 4]]),
array([[3, 6],
[2, 5],
[1, 4]])], dtype=object)
np.stack(f(b)) will convert the object array into a 3d array like the other code.
frompyfunc is the underlying function for vectorize, and returns an array of objects. Here I create an array like your a except it is 1d, containing multiple m arrays. It is an array of arrays, as opposed to a 3d array.

Related

Are the elements created by numpy.repeat() views of the original numpy.array or unique elements?

I have a 3D array that I like to repeat 4 times.
Achieved via a mixture of Numpy and Python methods:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
z2.append(z)
>>> z2
[array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis,...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original numpy.array() or unique elements?
If the latter, is there an equivalent NumPy functions that can create views of the original array the same way as numpy.repeat()?
I think such an ability can help reduce the buffer space of z2 in the event size of z is large and when there are many repeats of z involved.
A follow-up on one of #FrankYellin answer:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes from using np.broadcast_to() is the same as np.repeat(). This is surprising given that the former returns a readonly view on the original z array with the given shape. Having said this, I did notice that np.broadcast_to() created the y2 array instantaneously, while the creation of z2 via np.repeat() took abt 40 seconds to complete. Hence,np.broadcast_to() yielded significantly faster performance.
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
> z.shape
(3, 3)
> z.strides
(24, 8)
from numpy.lib.stride_tricks import as_strided
z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))
and you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
>>>
You are using strides to say that I want to create a second array overlayed on top of the first array. You set its new shape, and you prefix 0 to the previous stride, indicating that the first dimension has no effect on the data you want.
Make sure you understand strides.
numpy.repeat creates a new array and not a view (you can check it by looking the __array_interface__ field). In fact, it is not possible to create a view on the original array in the general case since Numpy views does not support such pattern. A views is basically just an object containing a pointer to a raw memory buffer, some strides, a shape and a type. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat 2 items N times (without adding a new dimension to the output array). Thus, no there is no way to build a function like numpy.repeat having the same array output shape to repeat items of the last axis. If adding a new dimension is Ok, then you can build an array with a new dimension and a stride set to 0. Repeating the last dimension is possible though. The answer of #FrankYellin gives a good example. Note that reshaping/ravel the resulting array cause a mandatory copy. Supporting such advanced views would make the Numpy code more complex or/and less efficient for a feature that is only used rarely by users.

Asymmetric slicing python

Consider the following matrix:
X = np.arange(9).reshape(3,3)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Let say I want to subset the following array
array([[0, 4, 2],
[3, 7, 5]])
It is possible with some indexing of rows and columns, for instance
col=[0,1,2]
row = [[0,1],[1,2],[0,1]]
Then if I store the result in a variable array I can do it with the following code:
array=np.zeros([2,3],dtype='int64')
for i in range(3):
array[:,i]=X[row[i],col[i]]
Is there a way to broadcast this kind of operation ? I have to do this as a data cleaning stage for a large file ~ 5 Gb, and I would like to use dask to parallelize it. But in a first time if I could avoid using a for loop I would feel great.
For arrays with NumPy's advanced-indexing, it would be -
X[row, np.asarray(col)[:,None]].T
Sample run -
In [9]: X
Out[9]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [10]: col=[0,1,2]
...: row = [[0,1],[1,2],[0,1]]
In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]:
array([[0, 4, 2],
[3, 7, 5]])

numpy vectorize dimension increasing function

I would like to create a function that has input: x.shape==(2,2), and outputs y.shape==(2,2,3).
For example:
#np.vectorize
def foo(x):
#This function doesn't work like I want
return x,x,x
a = np.array([[1,2],[3,4]])
print(foo(a))
#desired output
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
#actual output
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
Or maybe:
#np.vectorize
def bar(x):
#This function doesn't work like I want
return np.array([x,2*x,5])
a = np.array([[1,2],[3,4]])
print(bar(a))
#desired output
[[[1 2 5]
[2 4 5]]
[[3 6 5]
[4 8 5]]]
Note that foo is just an example. I want a way to map over a numpy array (which is what vectorize is supposed to do), but have that map take a 0d object and shove a 1d object in its place. It also seems to me that the dimensions here are arbitrary, as one might wish to take a function that takes a 1d object and returns a 3d object, vectorize it, call it on a 5d object, and get back a 7d object.... However, my specific use case only requires vectorizing a 0d to 1d function, and mapping it appropriately over a 2d array.
It would help, in your question, to show both the actual result and your desired result. As written that isn't very clear.
In [79]: foo(np.array([[1,2],[3,4]]))
Out[79]:
(array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]), array([[1, 2],
[3, 4]]))
As indicated in the vectorize docs, this has returned a tuple of arrays, corresponding to the tuple of values that your function returned.
Your bar returns an array, where as vectorize expected it to return a scalar (or single value):
In [82]: bar(np.array([[1,2],[3,4]]))
ValueError: setting an array element with a sequence.
vectorize takes an otypes parameter that sometimes helps. For example if I say that bar (without the wrapper) returns an object, I get:
In [84]: f=np.vectorize(bar, otypes=[object])
In [85]: f(np.array([[1,2],[3,4]]))
Out[85]:
array([[array([1, 2, 5]), array([2, 4, 5])],
[array([3, 6, 5]), array([4, 8, 5])]], dtype=object)
A (2,2) array of (3,) arrays. The (2,2) shape matches the shape of the input.
vectorize has a relatively new parameter, signature
In [90]: f=np.vectorize(bar, signature='()->(n)')
In [91]: f(np.array([[1,2],[3,4]]))
Out[91]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
In [92]: _.shape
Out[92]: (2, 2, 3)
I haven't used this much, so am still getting a feel for how it works. When I've tested it, it is slower than the original scalar version of vectorize. Neither offers any speed advantage of explicit loops. However vectorize does help when 'broadcasting', allowing you to use a variety of input shapes. That's even more useful when your function takes several inputs, not just one as in this case.
In [94]: f(np.array([1,2]))
Out[94]:
array([[1, 2, 5],
[2, 4, 5]])
In [95]: f(np.array(3))
Out[95]: array([3, 6, 5])
For best speed, you want to use existing numpy whole-array functions where possible. For example your foo case can be done with:
In [97]: np.repeat(a[:,:,None],3, axis=2)
Out[97]:
array([[[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4]]])
np.stack([a]*3, axis=2) also works.
And your bar desired result:
In [100]: np.stack([a, 2*a, np.full(a.shape, 5)], axis=2)
Out[100]:
array([[[1, 2, 5],
[2, 4, 5]],
[[3, 6, 5],
[4, 8, 5]]])
2*a takes advantage of the whole-array multiplication. That's true 'numpy-onic' thinking.
Just repeating the value into another dimension is quite simple:
import numpy as np
x = a = np.array([[1,2],[3,4]])
y = np.repeat(x[:,:,np.newaxis], 3, axis=2)
print y.shape
print y
(2L, 2L, 3L)
[[[1 1 1]
[2 2 2]]
[[3 3 3]
[4 4 4]]]
This seems to work for the "f R0 -> R1 mapped over a nd array giving a (n+1)d one"
def foo(x):
return np.concatenate((x,x))
np.apply_along_axis(foo,2,x.reshape(list(x.shape)+[1]))
doesn't generalize all that well, though

How to add a 2D matrix to another 3D matrix in python?

I have an 3D matrix a,like this:
a=np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
[
[[1 2],[2 3]]
[[3 4],[4 5]]
]
a.shape
(2, 2, 2)
Now, I want to add another element, like [[5,6],[6,7]] to this array.
So, new array will be:
[
[[1, 2],[2, 3]]
[[3, 4],[4, 5]]
[[5, 6],[6, 7]]
]
a.shape
(3, 2, 2)
What is the best way to do this?
( I'm working with big datasets so I need the best way)
Use np.vstack to vertically stack after extending the second array to 3D by adding a new axis as its first axis with None/np.newaxis, like so -
np.vstack((a,b[None]))
Sample run -
In [403]: a
Out[403]:
array([[[1, 2],
[2, 3]],
[[3, 4],
[4, 5]]])
In [404]: b
Out[404]:
array([[5, 6],
[6, 7]])
In [405]: np.vstack((a,b[None]))
Out[405]:
array([[[1, 2],
[2, 3]],
[[3, 4],
[4, 5]],
[[5, 6],
[6, 7]]])
You can use np.append to append to matrixes:
a = np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
a = np.append(a, [[[5,6],[6,7]]], axis=0)
Note that I had to add an extra set of brackets around the second part, in order for the dimensions to be correct. Also, you must use an axis or it will all be flattened to a linear array.
Try numpy.append
import numpy as np
a=np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
b=np.array([[3,4],[4,5]])
np.append(a,[b[:,:]],axis=0)

Efficiently change order of numpy array

I have a 3 dimensional numpy array. The dimension can go up to 128 x 64 x 8192. What I want to do is to change the order in the first dimension by interchanging pairwise.
The only idea I had so far is to create a list of the indices in the correct order.
order = [1,0,3,2...127,126]
data_new = data[order]
I fear, that this is not very efficient but I have no better idea so far
You could reshape to split the first axis into two axes, such that latter of those axes is of length 2 and then flip the array along that axis with [::-1] and finally reshape back to original shape.
Thus, we would have an implementation like so -
a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Sample run -
In [170]: a = np.random.randint(0,9,(6,3))
In [171]: order = [1,0,3,2,5,4]
In [172]: a[order]
Out[172]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
In [173]: a.reshape(-1,2,*a.shape[1:])[:,::-1].reshape(a.shape)
Out[173]:
array([[0, 8, 5],
[4, 5, 6],
[0, 0, 2],
[7, 3, 8],
[1, 6, 3],
[2, 4, 4]])
Alternatively, if you are looking to efficiently create those constantly flipping indices order, we could do something like this -
order = np.arange(data.shape[0]).reshape(-1,2)[:,::-1].ravel()

Categories

Resources