The two commands below return arrays of different shapes. I would appreciate an explanation of why, and a pointer to a reference if there is one; I searched online but did not find a clear explanation.
data.shape
(11,2)
# outputs the values in column 0 as a 1-D array of shape (11,)
data[:,0]
array([-7.24070e-01, -2.40724e+00, 2.64837e+00, 3.60920e-01,
6.73120e-01, -4.54600e-01, 2.20168e+00, 1.15605e+00,
5.06940e-01, -8.59520e-01, -5.99700e-01])
# outputs the values in column-0 in an (11x1) array
data[:,:-1]
array([[-7.24070e-01],
[-2.40724e+00],
[ 2.64837e+00],
[ 3.60920e-01],
[ 6.73120e-01],
[-4.54600e-01],
[ 2.20168e+00],
[ 1.15605e+00],
[ 5.06940e-01],
[-8.59520e-01],
[-5.99700e-01]])
I'll try to consolidate the comments into an answer.
First look at Python list indexing
In [92]: alist = [1,2,3]
selecting an item:
In [93]: alist[0]
Out[93]: 1
making a copy of the whole list:
In [94]: alist[:]
Out[94]: [1, 2, 3]
or a slice of length 2, or 1 or 0:
In [95]: alist[:2]
Out[95]: [1, 2]
In [96]: alist[:1]
Out[96]: [1]
In [97]: alist[:0]
Out[97]: []
Arrays follow the same basic rules
In [98]: x = np.arange(12).reshape(3,4)
In [99]: x
Out[99]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Select a row:
In [100]: x[0]
Out[100]: array([0, 1, 2, 3])
or a column:
In [101]: x[:,0]
Out[101]: array([0, 4, 8])
x[0,1] selects a single element.
https://numpy.org/doc/stable/user/basics.indexing.html#single-element-indexing
Indexing with a slice returns multiple rows:
In [103]: x[0:2]
Out[103]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [104]: x[0:1] # it retains the dimensions, even if only 1 (or even 0)
Out[104]: array([[0, 1, 2, 3]])
Likewise for columns:
In [106]: x[:,0:1]
Out[106]:
array([[0],
[4],
[8]])
subslices on both dimensions:
In [107]: x[0:2,1:3]
Out[107]:
array([[1, 2],
[5, 6]])
https://numpy.org/doc/stable/user/basics.indexing.html
x[[0]] also returns a 2d array, but that gets into "advanced" indexing (which doesn't have a list equivalent).
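To tie this back to the question, here is a small sketch showing that an integer index drops the axis while a slice keeps it. The array `data` and its contents are made up for illustration; only its (11, 2) shape mirrors the question. It also shows two common ways to turn the 1-D result into a column.

```python
import numpy as np

# Hypothetical stand-in for the asker's (11, 2) array.
data = np.arange(22, dtype=float).reshape(11, 2)

col_1d = data[:, 0]      # integer index: the column axis is dropped
col_2d = data[:, :-1]    # slice: the column axis is kept (length 1 here)

print(col_1d.shape)      # (11,)
print(col_2d.shape)      # (11, 1)

# Two ways to turn the 1-D result into an (11, 1) column.
as_column_a = col_1d[:, np.newaxis]
as_column_b = col_1d.reshape(-1, 1)
print(np.array_equal(as_column_a, col_2d))  # True
```

Note that shape (11,) is neither 1x11 nor 11x1: a 1-D array has a single axis, so it is not a row or a column until a second axis is added.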
When using numpy ndarrays, most of the time we needn't worry our pretty little heads about memory layout because results do not depend on it.
Except when they do. Consider, for example, this slightly overengineered way of setting the diagonal of a 3x2 matrix
>>> a = np.zeros((3,2))
>>> a.reshape(2,3)[:,0] = 1
>>> a
array([[1., 0.],
[0., 1.],
[0., 0.]])
As long as we control the memory layout of a, this is fine. But if we don't, it is a bug and, to make matters worse, a nasty silent one:
>>> a = np.zeros((3,2),order='F')
>>> a.reshape(2,3)[:,0] = 1
>>> a
array([[0., 0.],
[0., 0.],
[0., 0.]])
This shall suffice to show that memory layout is not merely an implementation detail.
The first thing one might reasonably ask to get on top of array layout is: what do new arrays look like? The factories empty, ones, zeros, identity etc. return C-contiguous layouts by default.
This rule does, however, not extend to every new array that was allocated by numpy. For example:
>>> a = np.arange(8).reshape(2,2,2).transpose(1,0,2)
>>> aa = a*a
The product aa is a new array allocated by ufunc np.multiply. Is it C-contiguous? No:
>>> aa.strides
(16, 32, 8)
My guess is that this is the result of an optimization that recognizes that this operation can be done on a flat linear array which would explain why the output has the same memory layout as the inputs.
In fact this can even be useful, unlike the following nonsense function. It shows a handy idiom to implement an axis parameter while still keeping indexing simple.
>>> def symmetrize_along_axis(a,axis=0):
... aux = a.swapaxes(0,axis)
... out = aux + aux[::-1]
... return out.swapaxes(0,axis)
The slightly surprising but clearly desirable thing is that this produces contiguous output as long as input is contiguous.
>>> a = np.arange(8).reshape(2,2,2)
>>> symmetrize_along_axis(a,1).flags.contiguous
True
This shall suffice to show that knowing what layouts are returned by ufuncs can be quite useful. Hence my question:
Given the layouts of ufunc arguments are there any rules or guarantees regarding the layout of the output?
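In the a = np.zeros((3,2),order='F') case, a.reshape(2,3) creates a copy, not a view. The assignment goes to that temporary copy, which is why a itself is unchanged; the memory layout matters only because it determines whether reshape can return a view.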
Look at same shape array:
In [123]: a = np.arange(6).reshape(3,2)
In [124]: a
Out[124]:
array([[0, 1],
[2, 3],
[4, 5]])
In [125]: a.reshape(2,3)
Out[125]:
array([[0, 1, 2],
[3, 4, 5]])
In [127]: a.reshape(2,3)[:,0]
Out[127]: array([0, 3])
In [125] the values still flow in order C.
and an order F array:
In [128]: b = np.arange(6).reshape(3,2, order='F')
In [129]: b
Out[129]:
array([[0, 3], # values flow in order F
[1, 4],
[2, 5]])
In [130]: b.reshape(2,3)
Out[130]:
array([[0, 3, 1], # values are jumbled
[4, 2, 5]])
In [131]: b.reshape(2,3)[:,0]
Out[131]: array([0, 4])
If I keep order F in the reshape:
In [132]: b.reshape(2,3, order='F')
Out[132]:
array([[0, 2, 4], # values still flow in order F
[1, 3, 5]])
In [133]: b.reshape(2,3, order='F')[:,0]
Out[133]: array([0, 1])
Confirm with assignment:
In [135]: a.reshape(2,3)[:,0]=10
In [136]: a
Out[136]:
array([[10, 1],
[ 2, 10],
[ 4, 5]])
but here the assignment has no effect (the reshape is a copy):
In [137]: b.reshape(2,3)[:,0]=10
In [138]: b
Out[138]:
array([[0, 3],
[1, 4],
[2, 5]])
but here assignment works:
In [139]: b.reshape(2,3, order='F')[:,0]=10
In [140]: b
Out[140]:
array([[10, 3],
[10, 4],
[ 2, 5]])
Or we can use order A to preserve order:
In [143]: b.reshape(2,3, order='A')[:,0]
Out[143]: array([10, 10])
In [144]: b.reshape(2,3, order='A')[:,0] = 20
In [145]: b
Out[145]:
array([[20, 3],
[20, 4],
[ 2, 5]])
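One way to check up front whether a given reshape returns a view or a copy is np.shares_memory. This is a minimal sketch; the names a_c and a_f are made up for illustration.

```python
import numpy as np

a_c = np.zeros((3, 2))             # C-ordered
a_f = np.zeros((3, 2), order='F')  # Fortran-ordered

# A C-order reshape of a C-contiguous array can reuse the buffer (a view).
print(np.shares_memory(a_c, a_c.reshape(2, 3)))              # True
# The same reshape of an F-ordered array has to copy the data.
print(np.shares_memory(a_f, a_f.reshape(2, 3)))              # False
# order='A' follows the array's own layout, so this is a view again.
print(np.shares_memory(a_f, a_f.reshape(2, 3, order='A')))   # True
```

Assigning through the reshaped array only modifies the original when the check returns True.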
ufunc order
Suspecting that ufuncs are (mostly) implemented with nditer (the C version), I checked the np.nditer docs: order can be specified in several places, and the tutorial demonstrates the effect of order on iteration.
I don't see order documented for ufunc, but it is accepted as a keyword argument.
In [171]: c = np.arange(8).reshape(2,2,2)
In [172]: d = c.transpose(1,0,2)
In [173]: d.strides
Out[173]: (16, 32, 8)
In [174]: np.multiply(d,d,order='K').strides
Out[174]: (16, 32, 8)
In [175]: np.multiply(d,d,order='C').strides
Out[175]: (32, 16, 8)
In [176]: np.multiply(d,d,order='F').strides
Out[176]: (8, 16, 32)
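The same check can be written in terms of contiguity flags rather than raw strides, which avoids depending on the item size. This is a sketch under the assumption that the default for the order keyword is 'K', as the matching strides above suggest; the variable names are illustrative.

```python
import numpy as np

c = np.arange(8, dtype=np.int64).reshape(2, 2, 2)
d = c.transpose(1, 0, 2)                  # a non-contiguous view

out_k = np.multiply(d, d)                 # default: layout follows the input
out_c = np.multiply(d, d, order='C')      # force C-contiguous output
out_f = np.multiply(d, d, order='F')      # force Fortran-contiguous output

print(out_k.strides == d.strides)   # True
print(out_c.flags.c_contiguous)     # True
print(out_f.flags.f_contiguous)     # True

# Alternative: fix the layout after the fact with an explicit copy.
fixed = np.ascontiguousarray(out_k)
print(fixed.flags.c_contiguous)     # True
```

If downstream code relies on a particular layout, requesting it explicitly (or normalizing with np.ascontiguousarray) is safer than relying on what the ufunc happens to return.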
My goal was to insert a column to the right on a numpy matrix. However, I found that the code I was using is putting in two columns rather than just one.
# This one results in a 4x1 matrix, as expected
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 0)
>>>matrix([[0],
[0],
[0],
[0]])
# I would expect this line to return a 2x2 matrix, but it returns a 2x3 matrix instead.
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 1)
>>>matrix([[0, 0, 0],
[0, 0, 0]])
Why do I get the above, in the second example, instead of [[0,0], [0,0]]?
While new use of np.matrix is discouraged, we get the same result with np.array:
In [41]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 0)
Out[41]:
array([[ 1],
[10],
[20],
[ 2]])
In [42]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 1)
Out[42]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
In [44]: np.insert(np.array([[1],[2]]),1, np.array([10,20]), 1)
Out[44]:
array([[ 1, 10],
[ 2, 20]])
Insert at [1] (a one-element sequence instead of a scalar index):
In [46]: np.insert(np.array([[1],[2]]),[1], np.array([[10],[20]]), 1)
Out[46]:
array([[ 1, 10],
[ 2, 20]])
In [47]: np.insert(np.array([[1],[2]]),[1], np.array([10,20]), 1)
Out[47]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
np.insert is a complex function written in Python, so we need to look at that code to see how values are mapped onto the target space.
The docs elaborate on the difference between inserting at 1 and at [1], but offhand I don't see an explanation of how the shape of values matters.
Difference between sequence and scalars (from the docs example; the output below corresponds to a = np.array([[1, 1], [2, 2], [3, 3]])):
>>> np.insert(a, [1], [[1],[2],[3]], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> np.array_equal(np.insert(a, 1, [1, 2, 3], axis=1),
... np.insert(a, [1], [[1],[2],[3]], axis=1))
True
When adding an array at the end of another, I'd use concatenate (or one of its stack variants) rather than insert. None of these operate in-place.
In [48]: np.concatenate([np.array([[1],[2]]), np.array([[10],[20]])], axis=1)
Out[48]:
array([[ 1, 10],
[ 2, 20]])
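Putting the variants above side by side, here is a short sketch of four ways to get the 2x2 result the question expected. The names base and col are made up for illustration.

```python
import numpy as np

base = np.array([[1], [2]])
col = np.array([[10], [20]])       # a (2, 1) column

# Scalar index: pass the new column as a 1-D array.
r1 = np.insert(base, 1, col.ravel(), axis=1)
# Sequence index: the (2, 1) array is inserted as a single column.
r2 = np.insert(base, [1], col, axis=1)
# For plain appending, concatenation is simpler to reason about.
r3 = np.concatenate([base, col], axis=1)
r4 = np.column_stack((base, col))

print(r1)
# [[ 1 10]
#  [ 2 20]]
```

All four produce the same array; the concatenate form has the advantage that the shapes of both inputs directly determine the shape of the result.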
I have the following:
In [14]: A = array([[1, 1], [3, 2], [-4, 1]])
In [15]: A
Out[15]:
array([[ 1, 1],
[ 3, 2],
[-4, 1]])
In [16]: x = array([1, 1])
In [17]: x
Out[17]: array([1, 1])
In [18]: dot(A, x)
Out[18]: array([ 2, 5, -3])
I was expecting a column, because dot() function is described as an ordinary matrix multiplication.
Why does it return a row instead? This behaviour seems very discouraging.
x is a 1D array, and as such has no notion of whether it's a row vector or a column vector. The same goes for the result of dot(A, x).
Turn x into a 2D array, and all will be well:
In [7]: x = array([[1], [1]])
In [8]: x
Out[8]:
array([[1],
[1]])
In [9]: dot(A, x)
Out[9]:
array([[ 2],
[ 5],
[-3]])
Finally, if you prefer to use more natural matrix notation, convert A to numpy.matrix:
In [10]: A = matrix(A)
In [11]: A * x
Out[11]:
matrix([[ 2],
[ 5],
[-3]])
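If x has to stay 1-D elsewhere in the code, the column shape can also be produced at the point of use. This is a small sketch using plain arrays; it uses the @ operator, which for these inputs gives the same result as dot.

```python
import numpy as np

A = np.array([[1, 1], [3, 2], [-4, 1]])
x = np.array([1, 1])

y_1d = A @ x                      # 1-D in, 1-D out
y_col = A @ x[:, np.newaxis]      # (2, 1) in, (3, 1) out
y_col2 = (A @ x).reshape(-1, 1)   # or reshape the 1-D result afterwards

print(y_1d.shape)    # (3,)
print(y_col.shape)   # (3, 1)
```

Both column forms hold the same values as the 1-D result; only the number of axes differs.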
I need to accomplish the following task:
from:
a = array([[1,3,4],[1,2,3]...[1,2,1]])
(add one element to each row) to:
a = array([[1,3,4,x],[1,2,3,x]...[1,2,1,x]])
I have tried doing stuff like a[n] = array([1,3,4,x])
but numpy complained of shape mismatch. I tried iterating through a and appending element x to each item, but the changes are not reflected.
Any ideas on how I can accomplish this?
Appending data to an existing array is a natural thing to want to do for anyone with Python experience. However, if you find yourself regularly appending to large arrays, you'll quickly discover that NumPy doesn't do this as easily or efficiently as a Python list does. Every "append" requires re-allocating the array memory and temporarily doubles the memory requirement. So the more general solution is to allocate arrays as large as the final output of your algorithm, then perform all your operations on subsets (slices) of that array. Array creation and destruction should ideally be minimized.
That said, it's often unavoidable, and the functions that do this are:
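As a rough sketch of that advice, assuming the final number of rows is known up front (n_rows and the row contents here are hypothetical), compare growing an array step by step with filling a preallocated one:

```python
import numpy as np

n_rows = 1000  # hypothetical final size, assumed known in advance

# Growing by repeated stacking: each step allocates a new array and copies.
grown = np.empty((0, 3), dtype=np.int64)
for i in range(n_rows):
    grown = np.vstack((grown, [[i, 2 * i, 3 * i]]))

# Preallocating: one allocation, then rows are filled in place.
filled = np.empty((n_rows, 3), dtype=np.int64)
for i in range(n_rows):
    filled[i] = (i, 2 * i, 3 * i)

print(np.array_equal(grown, filled))  # True
```

Both loops produce the same array; the second avoids copying the accumulated data on every iteration.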
for 2-D arrays:
np.hstack
np.vstack
np.column_stack
np.row_stack
for 3-D arrays (the above plus):
np.dstack
for N-D arrays:
np.concatenate
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
b = np.array([10,20,30])
c = np.hstack((a, np.atleast_2d(b).T))
returns c:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 20],
[ 1, 2, 1, 30]])
One way to do it (may not be the best) is to create another array with the new elements and do column_stack. i.e.
>>>a = array([[1,3,4],[1,2,3]...[1,2,1]])
[[1 3 4]
[1 2 3]
[1 2 1]]
>>>b = array([1,2,3])
>>>column_stack((a,b))
array([[1, 3, 4, 1],
[1, 2, 3, 2],
[1, 2, 1, 3]])
Appending a single scalar can be done a bit more easily than already shown (and without converting to float) by expanding the scalar into a nested Python list:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a, [[x]] * len(a)))
returns b as:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 10],
[ 1, 2, 1, 10]])
Appending a row could be done by:
c = np.vstack((a, [x] * len(a[0])))
returns c as:
array([[ 1, 3, 4],
[ 1, 2, 3],
[ 1, 2, 1],
[10, 10, 10]])
np.insert can also be used for this purpose:
import numpy as np
a = np.array([[1, 3, 4],
[1, 2, 3],
[1, 2, 1]])
x = 5
index = 3 # the position for x to be inserted before
np.insert(a, index, x, axis=1)
array([[1, 3, 4, 5],
[1, 2, 3, 5],
[1, 2, 1, 5]])
index can also be a list/tuple
>>> index = [1, 1, 3] # equivalently (1, 1, 3)
>>> np.insert(a, index, x, axis=1)
array([[1, 5, 5, 3, 4, 5],
[1, 5, 5, 2, 3, 5],
[1, 5, 5, 2, 1, 5]])
or a slice
>>> index = slice(0, 3)
>>> np.insert(a, index, x, axis=1)
array([[5, 1, 5, 3, 5, 4],
[5, 1, 5, 2, 5, 3],
[5, 1, 5, 2, 5, 1]])
If x is just a single scalar value, you could try something like this to ensure the correct shape of the array that is being appended/concatenated to the rightmost column of a:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a,x*np.ones((a.shape[0],1))))
returns b as:
array([[ 1., 3., 4., 10.],
[ 1., 2., 3., 10.],
[ 1., 2., 1., 10.]])
import numpy as np

def append_to_rows(a, x):
    # list.append returns None, so build each extended row explicitly
    target = []
    for line in a.tolist():
        target.append(line + [x])
    return np.array(target)