Predict memory layout of ufunc output - python

Using numpy ndarrays, most of the time we needn't worry our pretty little heads about memory layout, because results do not depend on it.
Except when they do. Consider, for example, this slightly overengineered way of setting the diagonal of a 3x2 matrix:
>>> a = np.zeros((3,2))
>>> a.reshape(2,3)[:,0] = 1
>>> a
array([[1., 0.],
       [0., 1.],
       [0., 0.]])
As long as we control the memory layout of a, this is fine. But if we don't, it is a bug, and to make matters worse, a nasty silent one:
>>> a = np.zeros((3,2),order='F')
>>> a.reshape(2,3)[:,0] = 1
>>> a
array([[0., 0.],
       [0., 0.],
       [0., 0.]])
This shall suffice to show that memory layout is not merely an implementation detail.
The first thing one might reasonably ask to get on top of array layout is: what do new arrays look like? The factories empty, ones, zeros, identity, etc. return C-contiguous layouts by default.
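For example, the flags confirm this:
>>> np.zeros((3, 2)).flags['C_CONTIGUOUS']
True
>>> np.zeros((3, 2), order='F').flags['F_CONTIGUOUS']
True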
This rule does not, however, extend to every new array allocated by numpy. For example:
>>> a = np.arange(8).reshape(2,2,2).transpose(1,0,2)
>>> aa = a*a
The product aa is a new array allocated by ufunc np.multiply. Is it C-contiguous? No:
>>> aa.strides
(16, 32, 8)
My guess is that this is the result of an optimization that recognizes this operation can be done on a flat linear array, which would explain why the output has the same memory layout as the inputs.
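A quick sanity check of that guess: the output strides simply match the input's.
>>> a.strides
(16, 32, 8)
>>> a.strides == aa.strides
True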
In fact, this can even be useful, unlike the following nonsense function, which nevertheless shows a handy idiom for implementing an axis parameter while still keeping the indexing simple.
>>> def symmetrize_along_axis(a, axis=0):
...     aux = a.swapaxes(0, axis)
...     out = aux + aux[::-1]
...     return out.swapaxes(0, axis)
The slightly surprising but clearly desirable thing is that this produces contiguous output as long as the input is contiguous.
>>> a = np.arange(8).reshape(2,2,2)
>>> symmetrize_along_axis(a,1).flags.contiguous
True
This shall suffice to show that knowing what layouts are returned by ufuncs can be quite useful. Hence my question:
Given the layouts of ufunc arguments are there any rules or guarantees regarding the layout of the output?

In the a = np.zeros((3,2), order='F') case, a.reshape(2,3) creates a copy, not a view. That's why the assignment has no effect; it isn't the memory layout itself.
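A simple way to see the copy-vs-view difference (not part of the original example) is np.shares_memory:
>>> a = np.zeros((3, 2), order='F')
>>> np.shares_memory(a, a.reshape(2, 3))   # F-ordered: reshape must copy
False
>>> c = np.zeros((3, 2))                   # C-ordered: reshape is a view
>>> np.shares_memory(c, c.reshape(2, 3))
True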
Look at a same-shaped array:
In [123]: a = np.arange(6).reshape(3,2)
In [124]: a
Out[124]:
array([[0, 1],
       [2, 3],
       [4, 5]])
In [125]: a.reshape(2,3)
Out[125]:
array([[0, 1, 2],
       [3, 4, 5]])
In [127]: a.reshape(2,3)[:,0]
Out[127]: array([0, 3])
In [125] the values still flow in order C.
and an order F array:
In [128]: b = np.arange(6).reshape(3,2, order='F')
In [129]: b
Out[129]:
array([[0, 3],        # values flow in order F
       [1, 4],
       [2, 5]])
In [130]: b.reshape(2,3)
Out[130]:
array([[0, 3, 1],     # values are jumbled
       [4, 2, 5]])
In [131]: b.reshape(2,3)[:,0]
Out[131]: array([0, 4])
If I keep order F in the reshape:
In [132]: b.reshape(2,3, order='F')
Out[132]:
array([[0, 2, 4],     # values still flow in order F
       [1, 3, 5]])
In [133]: b.reshape(2,3, order='F')[:,0]
Out[133]: array([0, 1])
Confirm with assignment:
In [135]: a.reshape(2,3)[:,0]=10
In [136]: a
Out[136]:
array([[10,  1],
       [ 2, 10],
       [ 4,  5]])
here the assignment does not stick:
In [137]: b.reshape(2,3)[:,0]=10
In [138]: b
Out[138]:
array([[0, 3],
       [1, 4],
       [2, 5]])
but here assignment works:
In [139]: b.reshape(2,3, order='F')[:,0]=10
In [140]: b
Out[140]:
array([[10,  3],
       [10,  4],
       [ 2,  5]])
Or we can use order A to preserve order:
In [143]: b.reshape(2,3, order='A')[:,0]
Out[143]: array([10, 10])
In [144]: b.reshape(2,3, order='A')[:,0] = 20
In [145]: b
Out[145]:
array([[20,  3],
       [20,  4],
       [ 2,  5]])
ufunc order
Suspecting that ufuncs are (mostly) implemented with nditer (the C version), I checked the np.nditer docs; order can be specified in several places, and the tutorial demonstrates the effect of order on iteration.
I don't see order documented for ufuncs, but it is accepted among the kwargs.
In [171]: c = np.arange(8).reshape(2,2,2)
In [172]: d = c.transpose(1,0,2)
In [173]: d.strides
Out[173]: (16, 32, 8)
In [174]: np.multiply(d,d,order='K').strides
Out[174]: (16, 32, 8)
In [175]: np.multiply(d,d,order='C').strides
Out[175]: (32, 16, 8)
In [176]: np.multiply(d,d,order='F').strides
Out[176]: (8, 16, 32)
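Another way to pin down the output layout, if you want a guarantee rather than a rule of thumb, is to pass an explicit out buffer (a small sketch, using the same d as above):
>>> out = np.empty((2, 2, 2), dtype=d.dtype)   # C-contiguous buffer
>>> np.multiply(d, d, out=out).strides
(32, 16, 8)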

Related

Multi-dimensional array notation in Python

I have two arrays A and i with dimensions (1, 3, 3) and (1, 3, 2) respectively. I want to define a new array I which gives the elements of A based on i. The current and desired outputs are attached.
import numpy as np
i=np.array([[[0,0],[1,2],[2,2]]])
A = np.array([[[1,2,3],[4,5,6],[7,8,9]]], dtype=float)
I=A[0,i]
print([I])
The current output is
[array([[[[1.000000000, 2.000000000, 3.000000000],
          [1.000000000, 2.000000000, 3.000000000]],
         [[4.000000000, 5.000000000, 6.000000000],
          [7.000000000, 8.000000000, 9.000000000]],
         [[7.000000000, 8.000000000, 9.000000000],
          [7.000000000, 8.000000000, 9.000000000]]]])]
The desired output is
[array([[[1],[6],[9]]])]
In [131]: A.shape, i.shape
Out[131]: ((1, 3, 3), (1, 3, 2))
That leading size 1 dimension just adds a [] layer, and complicates indexing (a bit):
In [132]: A[0]
Out[132]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
This is the indexing that I think you want:
In [133]: A[0,i[0,:,0],i[0,:,1]]
Out[133]: array([1, 6, 9])
If you really need a trailing size 1 dimension, add it after:
In [134]: A[0,i[0,:,0],i[0,:,1]][:,None]
Out[134]:
array([[1],
       [6],
       [9]])
From the desired numbers, I deduced that you wanted to use the 2 columns of i as indices to two different dimensions of A:
In [135]: i[0]
Out[135]:
array([[0, 0],
       [1, 2],
       [2, 2]])
Another way to do the same thing:
In [139]: tuple(i.T)
Out[139]:
(array([[0],
        [1],
        [2]]),
 array([[0],
        [2],
        [2]]))
In [140]: A[0][tuple(i.T)]
Out[140]:
array([[1],
       [6],
       [9]])
You must enter
I=A[0,:1,i[:,1]]
You can use numpy's take for that.
However, take works with a flat index, so you will need to use [0, 5, 8] for your indexes instead.
Here is an example:
>>> I = [A.shape[2] * x + y for x,y in i[0]] # Convert to flat indexes
>>> I = np.expand_dims(I, axis=(1,2))
>>> A.take(I)
array([[[1.]],
       [[6.]],
       [[9.]]])
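np.ravel_multi_index does the same flat-index conversion without writing the comprehension by hand (a sketch, assuming the same A and i as above):
>>> flat = np.ravel_multi_index((i[0, :, 0], i[0, :, 1]), A.shape[1:])
>>> flat
array([0, 5, 8])
>>> A[0].take(flat)
array([1., 6., 9.])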

How to insert a value in a fixed position of a pytorch tensor

I have a PyTorch tensor
x = [[1,2,3,4,5]]
Now I want to add a value at a fixed position of the tensor x. For example, if I add 11 at position 3, then x will be
x= [[1,2,3,11,4,5]]
How can I perform this operation in Pytorch?
Dynamically extending a tensor to an arbitrary size along a non-singleton dimension, such as the one you mentioned, is unsupported in PyTorch, mainly because the memory is pre-allocated during tensor construction and fixed in size depending on the data type. The only way to grow a non-singleton dimension is to create a new (empty/zero) tensor with the target shape, insert the value(s) at the desired position(s), and copy over the existing values.
In [24]: z = torch.zeros(1, 6)
In [27]: t
Out[27]: tensor([[1, 2, 3, 4, 5]])
In [30]: z[:, :3] = t[:, :3]
In [33]: z[:, -2:] = t[:, -2:]
In [36]: z[z == 0] = 11
In [37]: z
Out[37]: tensor([[ 1., 2., 3., 11., 4., 5.]])
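An alternative sketch that avoids the z == 0 masking (my own variant, not from the answer above) is to concatenate the two halves around the new value with torch.cat:
import torch

t = torch.tensor([[1, 2, 3, 4, 5]])
pos, val = 3, 11
# left part, new value, right part, joined along dim 1
z = torch.cat([t[:, :pos], torch.tensor([[val]]), t[:, pos:]], dim=1)
# z is tensor([[ 1,  2,  3, 11,  4,  5]])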
However, if you had instead wanted to expand the tensor along the singleton dimension, that's easy to achieve using tensor.expand(new_shape). In the example below, we expand the tensor t to length 3 along the 0th dimension, which is originally a singleton dimension.
# make a copy for in-place modification since `expand()` returns a view
In [64]: t_expd = t.expand(3, -1).clone()
In [65]: t_expd
Out[65]:
tensor([[1, 2, 3, 4, 5],
        [1, 2, 3, 4, 5],
        [1, 2, 3, 4, 5]])
# modify 2nd and 3rd rows
In [66]: t_expd[1:, ...] = 23
In [67]: t_expd
Out[67]:
tensor([[ 1,  2,  3,  4,  5],
        [23, 23, 23, 23, 23],
        [23, 23, 23, 23, 23]])

Use Array as Indexing Mask for Multidimensional Array

I have the following arrays:
a = np.arange(12).reshape((2, 2, 3))
and
b = np.zeros((2, 2))
Now I want to use b to access a, s.t. for each index i,j we take the z-th element of a, if b[i, j] = z.
Meaning for the above example the answer should be [[0, 3], [6, 9]].
I feel this is very related to np.choose, but yet somehow cannot quite manage it.
Can you help me?
Two approaches could be suggested.
With explicit range arrays for advanced-indexing -
m,n = b.shape
out = a[np.arange(m)[:,None],np.arange(n),b.astype(int)]
With np.take_along_axis -
np.take_along_axis(a,b.astype(int)[...,None],axis=2)[...,0]
Sample run -
In [44]: a
Out[44]:
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
In [45]: b
Out[45]:
array([[0., 0.],
       [0., 0.]])
In [46]: m,n = b.shape
In [47]: a[np.arange(m)[:,None],np.arange(n),b.astype(int)]
Out[47]:
array([[0, 3],
       [6, 9]])
In [48]: np.take_along_axis(a,b.astype(int)[...,None],axis=2)[...,0]
Out[48]:
array([[0, 3],
       [6, 9]])
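Since the question mentions np.choose, that route works too (a sketch; the choices axis has to be moved to the front because choose selects along the first axis):
>>> np.choose(b.astype(int), np.moveaxis(a, 2, 0))
array([[0, 3],
       [6, 9]])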

How to fill numpy array with another numpy array

I have an empty numpy array, and another one populated with values. I want to fill the empty numpy array with the populated one, x times.
So, when x = 3, the (originally empty) array would look like [[populated_array],[populated_array],[populated_array]]
Where populated_array is the same value/array each time.
I have tried this
a = np.empty(3)
a.fill(np.array([4,6,6,1]))
but get this
ValueError: Input object to FillWithScalar is not a scalar
and want this
[[4,6,6,1],[4,6,6,1],[4,6,6,1]]
cheers for any help.
tile and repeat are handy functions when you want to repeat an array in various ways:
In [233]: np.tile(np.array([4,6,6,1]),(3,1))
Out[233]:
array([[4, 6, 6, 1],
       [4, 6, 6, 1],
       [4, 6, 6, 1]])
On the failure, note the docs for fill:
a.fill(value)
Fill the array with a scalar value.
np.array([4,6,6,1]) is not a scalar value. a was initialized as a 3 element float array.
It is possible to assign values to elements of an array, provided the shapes are right:
In [241]: a=np.empty(3)
In [242]: a[:]=np.array([1,2,3]) # 3 numbers into 3 slots
In [243]: a
Out[243]: array([ 1., 2., 3.])
In [244]: a=np.empty((3,4))
In [245]: a[:]=np.array([1,2,3,4]) # 4 numbers into 4 columns
In [246]: a
Out[246]:
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]])
This fill works with an object type array, but the result is quite different, and should be used with considerable caution:
In [247]: a=np.empty(3, object)
In [248]: a
Out[248]: array([None, None, None], dtype=object)
In [249]: a.fill(np.array([1,2,3,4]))
In [250]: a
Out[250]: array([array([1, 2, 3, 4]), array([1, 2, 3, 4]), array([1, 2, 3, 4])], dtype=object)
This (3,) array is not the same as the (3,4) array produced by other methods. Each element of the object array is a pointer to the same thing. Changing a value in one element of a changes that value in all the elements (because they are the same object).
In [251]: a[0][3]=5
In [252]: a
Out[252]: array([array([1, 2, 3, 5]), array([1, 2, 3, 5]), array([1, 2, 3, 5])], dtype=object)
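A quick way to confirm that all three slots hold the very same object:
>>> a[0] is a[1]
True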
Use broadcasting
vstack, tile, and repeat are all great and whatnot, but broadcasting can be several orders of magnitude faster...
import numpy as np
from time import time

t = time()
for _ in range(10000):
    a = np.array([4, 6, 6, 1])
    b = np.vstack((a,) * 100)
print(time() - t)

t = time()
for _ in range(10000):
    a = np.array([4, 6, 6, 1])
    b = np.tile(a, (3, 1))
print(time() - t)

t = time()
for _ in range(10000):
    a = np.array([4, 6, 6, 1])
    b = np.empty([100, a.shape[0]])
    b[:] = a
print(time() - t)
prints:
2.76399993896
0.140000104904
0.0490000247955
You can vstack it:
>>> a = np.array([4,6,6,1])
>>> np.vstack((a,)*3)
array([[4, 6, 6, 1],
       [4, 6, 6, 1],
       [4, 6, 6, 1]])
Note that you frequently don't need to do this... You can do a lot of neat tricks with numpy's broadcasting...:
>>> a = np.array([4,6,6,1])
>>> ones = np.ones((4, 4))
>>> ones * a
array([[ 4.,  6.,  6.,  1.],
       [ 4.,  6.,  6.,  1.],
       [ 4.,  6.,  6.,  1.],
       [ 4.,  6.,  6.,  1.]])
In some cases, you can also use np.newaxis and ... to do neat things as well. It's probably worth looking at numpy's indexing documentation to get familiar with the options.
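For instance (my sketch, not from the answer above), broadcasting against a column of ones via np.newaxis gives the same stacked result without building a full square ones matrix:
>>> a = np.array([4, 6, 6, 1])
>>> np.ones(3)[:, np.newaxis] * a
array([[4., 6., 6., 1.],
       [4., 6., 6., 1.],
       [4., 6., 6., 1.]])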
As the Numpy Array Documentation states, the first param is shape, so when you're doing:
a = np.empty(3)
You're creating an array with 3 elements (just one dimension).
Instead, you should do:
a = np.empty([3,4])
That creates an array of 3 rows, each with 4 elements (that is, a 3x4 matrix, matching your 4-element fill array).
As the Numpy fill Documentation states, fill only takes a number (scalar) as its parameter, so you cannot pass another array as the argument, which is why what you tried doesn't work:
a.fill(np.array([4,6,6,1]))
To achieve what you're trying to do, I would do:
a = np.array([[4,6,6,1]]*3)
Hope my comments help you!
Repeating tasks like this often reduce to matrix or vector operations. np.outer() can do it even faster than multiplication with an eye matrix or filling an empty array:
>>> a = np.array([4, 6, 6, 1])
>>> b = np.outer(np.ones(3, dtype=int), a)
>>> b
array([[4, 6, 6, 1],
       [4, 6, 6, 1],
       [4, 6, 6, 1]])
You could use np.full(), as described here.
>>> repetitions = 3
>>> fill_array = np.array([4, 6, 6, 1])
>>> fill_shape = np.shape(fill_array)
>>> a = np.full((repetitions, *fill_shape), fill_array)
>>> a
array([[4, 6, 6, 1],
       [4, 6, 6, 1],
       [4, 6, 6, 1]])
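A related trick worth knowing (my addition): np.broadcast_to returns a read-only view with the repeated rows at no extra memory cost; call .copy() on it if you need to write to the result.
>>> np.broadcast_to(np.array([4, 6, 6, 1]), (3, 4))
array([[4, 6, 6, 1],
       [4, 6, 6, 1],
       [4, 6, 6, 1]])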

Python/numpy issue with array/vector with empty second dimension

I have what seems to be an easy question.
Observe the code:
In : x=np.array([0, 6])
Out: array([0, 6])
In : x.shape
Out: (2L,)
Which shows that the array has no second dimension, and therefore x is no different from x.T.
How can I make x have dimension (2L,1L)? The real motivation for this question is that I have an array y of shape [3L,4L], and I want y.sum(1) to be a vector that can be transposed, etc.
While you can reshape arrays, and add dimensions with [:,np.newaxis], you should be familiar with the most basic nested brackets, or list, notation. Note how it matches the display.
In [230]: np.array([[0],[6]])
Out[230]:
array([[0],
       [6]])
In [231]: _.shape
Out[231]: (2, 1)
np.array also takes an ndmin parameter, though it adds the extra dimensions at the start (the default location for numpy):
In [232]: np.array([0,6],ndmin=2)
Out[232]: array([[0, 6]])
In [233]: _.shape
Out[233]: (1, 2)
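If you want the extra dimension at the end instead, you can combine ndmin with a transpose (a small sketch):
>>> np.array([0, 6], ndmin=2).T
array([[0],
       [6]])
>>> _.shape
(2, 1)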
A classic way of making something 2d - reshape:
In [234]: y=np.arange(12).reshape(3,4)
In [235]: y
Out[235]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
sum (and related functions) has a keepdims parameter. Read the docs.
In [236]: y.sum(axis=1,keepdims=True)
Out[236]:
array([[ 6],
       [22],
       [38]])
In [237]: _.shape
Out[237]: (3, 1)
empty 2nd dimension isn't quite the terminology. More like a nonexistent 2nd dimension.
A dimension can have 0 terms:
In [238]: np.ones((2,0))
Out[238]: array([], shape=(2, 0), dtype=float64)
If you are more familiar with MATLAB, which has a minimum of 2d, you might like the np.matrix subclass. It takes steps to ensure that most operations return another 2d matrix:
In [247]: ym=np.matrix(y)
In [248]: ym.sum(axis=1)
Out[248]:
matrix([[ 6],
        [22],
        [38]])
The matrix sum does:
np.ndarray.sum(self, axis, dtype, out, keepdims=True)._collapse(axis)
The _collapse bit lets it return a scalar for ym.sum().
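For example (continuing with the same ym), the no-axis sum collapses all the way down to a plain scalar:
>>> ym.sum()
66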
There is another way to keep the dimension info:
In [42]: X
Out[42]:
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]])
In [43]: X[1].shape
Out[43]: (2,)
In [44]: X[1:2].shape
Out[44]: (1, 2)
In [45]: X[1]
Out[45]: array([0, 1])
In [46]: X[1:2] # this way will keep dimension
Out[46]: array([[0, 1]])
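Indexing with None (np.newaxis) is another way to put the dropped dimension back (my addition):
>>> X[1][None, :].shape
(1, 2)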
