I have tried to apply the function np.nditer() like zip() with arrays of different dimensions, where the iterator should use only the first dimensions.
Minimal example
import numpy as np
a_all = np.arange(6).reshape(2,3)
idx_all = np.arange(12).reshape(2,3,2)
for a, idx in np.nditer([a_all, idx_all]):
    print((a, idx))
Which throws the error:
ValueError: operands could not be broadcast together with shapes (2,3) (2,3,2)
My use case
I have two data arrays whose elements I want to combine with each other. I also have a list of indices into another array. So I try:
a_all = np.arange(6).reshape(2,3)
b_all = np.arange(6).reshape(2,3)
idx_all = (
((0,0), (0,1), (0,2)),
((1,0), (1,1), (1,2))
)
result = np.zeros((2,3))
for a, b, idx in np.nditer([a_all, b_all, idx_all]):
    result[idx] += a*b
This throws the same error as the minimal example.
I assume the problem is that np.nditer() tries to iterate over all dimensions of idx_all, but I couldn't figure out how to limit it to the first two.
I don't want to use zip(), because then I would need two nested loops:
for a_, b_, idx_ in zip(a_all, b_all, idx_all):
    for a, b, idx in zip(a_, b_, idx_):
        result[idx] += a*b
More sensible example
a_all = np.random.randn(2,3)
b_all = np.random.randn(2)
idx_all = (
((1,1), (2,2))
)
result = np.zeros(2)
for a, b, idx, res in np.nditer([a_all, b_all, idx_all, result], op_flags=['readwrite']):
    res += a[idx] + b
Look at the first case, corrected so the arrays do broadcast together: a_all now has a trailing size-1 axis, so its shape (2,3,1) broadcasts against (2,3,2). (If you don't understand what I've changed, you haven't read enough of the basic numpy docs.)
In [14]: a_all = np.arange(6).reshape(2,3,1)
...: idx_all = np.arange(12).reshape(2,3,2)
...:
...: for a, idx in np.nditer([a_all, idx_all]):
...:     print((a, idx))
...:
(array(0), array(0))
(array(0), array(1))
(array(1), array(2))
(array(1), array(3))
(array(2), array(4))
(array(2), array(5))
(array(3), array(6))
(array(3), array(7))
(array(4), array(8))
(array(4), array(9))
(array(5), array(10))
(array(5), array(11))
nditer iterates in a 'flat' sense, passing single-element (0d) arrays to the body. It's not like zip, which iterates only over the first dimension (or the outer layer of nested lists).
np.vectorize (which I don't recommend either) does the same sort of broadcasting, but passes Python scalar elements to the function instead:
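If the goal is to loop over only the leading dimensions, as the question asks, one option is a plain Python loop over np.ndindex of the shared leading shape. A minimal sketch using the arrays from the minimal example; it is still a Python-level loop, so it is not fast:

```python
import numpy as np

a_all = np.arange(6).reshape(2, 3)
idx_all = np.arange(12).reshape(2, 3, 2)

# np.ndindex yields every index tuple over the given shape. Using the
# shared leading shape (2, 3) means idx_all[ij] keeps its trailing axis,
# which mimics a nested zip over the first two dimensions.
pairs = []
for ij in np.ndindex(*a_all.shape):
    pairs.append((a_all[ij], idx_all[ij]))

for a, idx in pairs:
    print(a, idx)
```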
In [15]: np.vectorize(lambda a,idx: print((a,idx)))(a_all,idx_all)
(0, 0) # test run
(0, 0)
(0, 1)
(1, 2)
(1, 3)
(2, 4)
(2, 5)
(3, 6)
(3, 7)
(4, 8)
(4, 9)
(5, 10)
(5, 11)
Out[15]:
array([[[None, None],
[None, None],
[None, None]],
[[None, None],
[None, None],
[None, None]]], dtype=object)
nditer needs the same sort of performance disclaimer as np.vectorize. It doesn't make anything faster, at least not when used from Python code. In Cython it can be useful, as demonstrated in the longer nditer documentation page.
Also, nditer's rules for its inputs can be complicated, as shown by the TypeError that your last example produces.
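Since nditer does not buy speed in Python code, the use case from the question is probably better written without an explicit loop. A minimal sketch, assuming idx_all is turned into an integer array whose last axis holds (row, col) pairs: np.add.at takes separate row and column index arrays and, unlike a plain `result[rows, cols] += ...`, also accumulates correctly when an index pair repeats.

```python
import numpy as np

a_all = np.arange(6).reshape(2, 3)
b_all = np.arange(6).reshape(2, 3)
# Assumption: the index tuples are converted to an integer array of
# shape (2, 3, 2), where the last axis holds a (row, col) pair.
idx_all = np.array([
    [(0, 0), (0, 1), (0, 2)],
    [(1, 0), (1, 1), (1, 2)],
])

# Reference: the nested-loop version from the question.
# tuple(idx) is needed because idx is now a small array, not a tuple.
expected = np.zeros((2, 3))
for a_, b_, idx_ in zip(a_all, b_all, idx_all):
    for a, b, idx in zip(a_, b_, idx_):
        expected[tuple(idx)] += a * b

# Vectorized: split the pairs into row and column index arrays.
# np.add.at accumulates repeated indices, matching the loop's +=.
result = np.zeros((2, 3))
np.add.at(result, (idx_all[..., 0], idx_all[..., 1]), a_all * b_all)

print(np.array_equal(result, expected))
```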
Your last example:
I had to change idx_all to an array, not a tuple, so that it can be readwrite. Read the op_flags docs more carefully.
And we still get the broadcasting error. It isn't iterating the 'first layer'.
In [24]: a_all = np.random.randn(2,3)
...: b_all = np.random.randn(2)
...: idx_all = (
...: ((1,1), (2,2))
...: ); idx_all=np.array(idx_all)
...: result = np.zeros(2)
...:
...: for a, b, idx, res in np.nditer([a_all, b_all, idx_all, result], op_flags=['readwrite']):
...:     res += a[idx] + b
...:
ValueError: operands could not be broadcast together with shapes (2,3) (2,) (2,2) (2,)
I have the following numpy array:
import numpy as np
np.ones((10, 3, 2))
and I need to reshape it to (10, 1, 3, 2).
How can I do so?
How about this:
np.ones((10, 3, 2)).reshape([10,1,3,2])
x = np.ones((10, 3, 2))
# in place
x.shape = (10,1,3,2)
# new view
x.reshape((10,1,3,2))
# Add new axis
x[:, np.newaxis, :, :]
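A quick check, using np.shares_memory, that these variants give the same shape and return views of x rather than copies:

```python
import numpy as np

x = np.ones((10, 3, 2))

# reshape, np.newaxis and np.expand_dims all insert a length-1 axis at position 1.
r = x.reshape((10, 1, 3, 2))
n = x[:, np.newaxis, :, :]
e = np.expand_dims(x, 1)

print(r.shape, n.shape, e.shape)
# None of these copy the data: each result is a view of x's buffer.
print(np.shares_memory(x, r), np.shares_memory(x, n), np.shares_memory(x, e))
```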
Like others mentioned, you can .reshape it. An alternative is to use np.newaxis or np.expand_dims like this:
arr = np.ones((10, 3, 2))
arr1 = arr[:, np.newaxis, ...]
print(arr1.shape) # (10, 1, 3, 2)
arr2 = np.expand_dims(arr, 1)
print(arr2.shape) # (10, 1, 3, 2)
# check if the two arrays are equal
print(np.array_equal(arr1, arr2)) # True
I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting the pieces small and demonstrating my efforts thus far.
Getting just 'a' dot 'q', the 4 numbers that I want, is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
However, what I am going for is the diagonal of c. While I have not tried it yet, I imagine that when 'a' and 'q' grow beyond (2000, 3) and (2000, 4, 3), c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the full calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
I have read a couple of other, somewhat similar questions. However, I hope this question is seen as more involved than taking the diagonal of a vector's dot product with itself. Also, if the answer is np.einsum(), could you explain the process a bit more than the numpy docs do?
I reposted the question with the einsum() equivalent added at each c. Alexander Korovin linked to an excellent einsum summary.
I just want to get the dot product of some sets of multidimensional data.
For simplicity, I am posting the pieces small and demonstrating my efforts thus far.
Getting just 'a' dot 'q', the 4 numbers that I want, is easy enough.
import numpy as np
a = np.arange(1,4) # shape = (3,)
q = np.array([[x, x, x] for x in range(4)])+1 # shape = (4, 3)
c = np.dot(a, q.T) # array([ 6, 12, 18, 24]) shape = (4,)
c = np.einsum('i,ji->j', a, q)
If I want to add another set to 'a', I can expand the dimensions. Again, pretty easy. The dot product simply reflects the additional dimension.
a = np.arange(1,4).reshape(1,3) # shape = (1,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24]]) shape = (1,4)
c = np.einsum('ij,gj->ig', a, q) # shape = (1,4), matching np.dot
and the other set...
a = np.vstack((a,a+1)) # shape = (2,3)
c = np.dot(a, q.T) # array([[ 6, 12, 18, 24], [ 9, 18, 27, 36]]) shape = (2,4)
c = np.einsum('ij,gj->ig', a, q)
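Since the question asks for more explanation of einsum than the docs give: a subscript that appears in the inputs but not after `->` is summed over, and the output axes follow the order written after `->`. A sketch checking the 'ij,gj->ig' case against explicit loops, rebuilding a and q as in the steps above:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])             # shape (2, 3)
q = np.array([[x, x, x] for x in range(4)]) + 1  # shape (4, 3)

# 'ij,gj->ig': j appears in both inputs but not in the output,
# so it is summed over; i and g are kept as the output axes.
c_einsum = np.einsum('ij,gj->ig', a, q)

# The same contraction written as explicit loops.
c_loop = np.zeros((a.shape[0], q.shape[0]), dtype=a.dtype)
for i in range(a.shape[0]):
    for g in range(q.shape[0]):
        for j in range(a.shape[1]):
            c_loop[i, g] += a[i, j] * q[g, j]

print(c_einsum)
print(np.array_equal(c_einsum, c_loop), np.array_equal(c_einsum, np.dot(a, q.T)))
```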
To add another dimension to q, the transpose needs to be a little more complicated.
q = np.expand_dims(q, axis=0) # shape = (1, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 1, 4)
c = np.einsum('ij,fgj->ifg', a, q) # same (2, 1, 4) layout as np.dot
now stack 'q' matrix
q = np.vstack((q, q+1)) # shape = (2, 4, 3)
c = np.dot(a, np.transpose(q, (0, 2, 1))) # shape = (2, 2, 4)
c = np.einsum('ij,fgj->ifg', a, q)
However, what I am going for is the diagonal of c. While I have not tried it yet, I imagine that when 'a' and 'q' grow beyond (2000, 3) and (2000, 4, 3), c will be (2000, 2000, 4) and I only need 1/2000th of that. Does anyone know how to make this more efficient than doing the full calculation and then taking the diagonal?
Again, what I want is...
c = np.dot(a, np.transpose(q, (0, 2, 1)))
c = c[np.arange(2), np.arange(2)]
or
c[0] = np.dot(a[0:1], np.transpose(q[0:1], (0, 2, 1)))
c[1] = np.dot(a[1:2], np.transpose(q[1:2], (0, 2, 1)))
but without having to make the enormous matrix first and then trim it later.
So do this...
c = np.einsum('ik,ijk->ij', a, q)
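A sketch, with small random stand-ins for a and q (N = 5), checking that this einsum equals the diagonal of the full np.dot product without building it. A batched matmul with a trailing length-1 axis is an equivalent alternative; both scale with N rather than N² in the number of sets.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))     # N sets of 3-vectors
q = rng.standard_normal((5, 4, 3))  # N stacks of four 3-vectors

# Full product, shape (5, 5, 4); only the diagonal entries c[i, i] are wanted.
full = np.dot(a, np.transpose(q, (0, 2, 1)))
diag = full[np.arange(5), np.arange(5)]          # shape (5, 4)

# i is shared but not summed, k is summed: c[i, j] = sum_k a[i, k] * q[i, j, k]
c = np.einsum('ik,ijk->ij', a, q)

# Batched matmul: (5, 4, 3) @ (5, 3, 1) -> (5, 4, 1), then drop the last axis.
c_mm = (q @ a[:, :, None])[..., 0]

print(np.allclose(c, diag), np.allclose(c_mm, diag))
```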
I have a question about converting between arrays of shape (N,) and arrays of shape (N,1). For example, y has shape (2,).
A=np.array([[1,2],[3,4]])
x=np.array([1,2])
y=np.dot(A,x)
y.shape
Out[6]: (2,)
But in the following, y2 has shape (2, 1).
x2=x[:,np.newaxis]
y2=np.dot(A,x2)
y2.shape
Out[14]: (2, 1)
What would be the most efficient way of converting y2 back to y without copying?
Thanks,
Tom
reshape works for this
a = np.arange(3) # a.shape = (3,)
b = a.reshape((3,1)) # b.shape = (3,1)
b2 = a.reshape((-1,1)) # b2.shape = (3,1)
c = b.reshape((3,)) # c.shape = (3,)
c2 = b.reshape((-1,)) # c2.shape = (3,)
Note also that reshape doesn't copy the data unless the new shape requires it (which it doesn't here):
a.__array_interface__['data'] # (22356720, False)
b.__array_interface__['data'] # (22356720, False)
c.__array_interface__['data'] # (22356720, False)
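np.shares_memory is a more direct check than comparing `__array_interface__` addresses. A sketch applying it to the y2 from the question:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([1, 2])
y2 = np.dot(A, x[:, np.newaxis])   # shape (2, 1)

# Both drop the trailing length-1 axis; for this contiguous y2 neither copies.
y_a = y2.reshape(-1)
y_b = y2[:, 0]

print(y_a.shape, y_b.shape)
print(np.shares_memory(y_a, y2), np.shares_memory(y_b, y2))
```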
Use numpy.squeeze:
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)
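Applied to an array shaped like the question's y2, passing axis explicitly is a reasonable choice: it removes only the named axis (and raises if that axis is not length 1), and the result is a view:

```python
import numpy as np

y2 = np.array([[5], [11]])       # shape (2, 1), like y2 in the question

# axis=1 removes only that axis, so a (1, 1) input would become (1,),
# not be squeezed all the way down to a 0-d array.
y = np.squeeze(y2, axis=1)
print(y.shape, np.shares_memory(y, y2))
```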
Slice along the dimension you want, as in the example below. To go in the reverse direction, you can use None as the slice for any dimension that should be treated as a singleton dimension, but which is needed to make shapes work.
In [786]: yy = np.asarray([[11],[7]])
In [787]: yy
Out[787]:
array([[11],
[7]])
In [788]: yy.shape
Out[788]: (2, 1)
In [789]: yy[:,0]
Out[789]: array([11, 7])
In [790]: yy[:,0].shape
Out[790]: (2,)
In [791]: y1 = yy[:,0]
In [792]: y1.shape
Out[792]: (2,)
In [793]: y1[:,None]
Out[793]:
array([[11],
[7]])
In [794]: y1[:,None].shape
Out[794]: (2, 1)
Alternatively, you can use reshape:
In [795]: yy.reshape((2,))
Out[795]: array([11, 7])
The opposite conversion, from (N,) to (N,1), can be made with:
np.atleast_2d(y).T
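For comparison, a sketch of the (N,) to (N,1) direction with np.atleast_2d, None indexing and reshape; all three give the same column:

```python
import numpy as np

y = np.array([5, 11])            # shape (2,)

# Three ways to get a (2, 1) column from a (2,) vector.
col_a = np.atleast_2d(y).T       # (1, 2) transposed to (2, 1)
col_b = y[:, None]               # None inserts a new axis
col_c = y.reshape(-1, 1)         # -1 lets reshape infer the length

print(col_a.shape, col_b.shape, col_c.shape)
print(np.array_equal(col_a, col_b) and np.array_equal(col_b, col_c))
```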
Another option in your toolbox could be ravel:
>>> y2.shape
(2, 1)
>>> y_ = y2.ravel()
>>> y_.shape
(2,)
As with reshape, a copy is made only if needed, and here it isn't:
>>> y2.__array_interface__["data"]
(2700295136768, False)
>>> y_.__array_interface__["data"]
(2700295136768, False)
For further details, you can take a look at this answer.
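The "only if needed" part matters: ravel returns a view for a contiguous array like y2, but has to copy when the data is not laid out in C order, for example after a transpose. A sketch:

```python
import numpy as np

y2 = np.array([[5], [11]])
# Contiguous (2, 1) input: ravel can return a view.
print(np.shares_memory(y2.ravel(), y2))

m = np.arange(6).reshape(2, 3).T          # transposed, not C-contiguous
# ravel defaults to C order, so it must copy this data.
print(np.shares_memory(m.ravel(), m))
```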
Normally, when we know where to insert the new axis, we can write a[:, np.newaxis, ...]. Is there a good way to insert the new axis at an arbitrary given axis?
Here is how I do it now. I think there must be some much better ways than this:
import numpy as np
def addNewAxisAt(x, axis):
    _s = list(x.shape)
    _s.insert(axis, 1)
    return x.reshape(tuple(_s))
def addNewAxisAt2(x, axis):
    ind = [slice(None)]*x.ndim
    ind.insert(axis, np.newaxis)
    return x[tuple(ind)]  # index with a tuple; recent NumPy rejects a list of slices here
That singleton dimension (length 1) could be inserted into the original array's shape with np.insert, changing the shape directly, like so:
x.shape = np.insert(x.shape,axis,1)
We might as well extend this to insert more than one new axis, with a bit of an np.diff and np.cumsum trick, like so:
insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
x.shape = np.insert(x.shape,insert_idx,1)
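To see what the diff/cumsum expression computes, here is a sketch for the multi-axis sample run, assuming axis is sorted ascending and holds positions in the output shape. The expression reduces to axis[k] - k: the k-th new axis is preceded by k earlier new axes, so it goes before original axis number axis[k] - k.

```python
import numpy as np

shape = (5, 6, 8, 9, 4, 2, 6, 7)
axis = (1, 3, 4, 6)   # target positions of the new axes in the output (ascending)

insert_idx = (np.diff(np.append(0, axis)) - 1).cumsum() + 1
# Same values, written as axis[k] - k.
simpler = np.array(axis) - np.arange(len(axis))

new_shape = tuple(int(s) for s in np.insert(shape, insert_idx, 1))
print(insert_idx, simpler, new_shape)
```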
Sample runs:
In [151]: def addNewAxisAt(x, axis):
...:     insert_idx = (np.diff(np.append(0,axis))-1).cumsum()+1
...:     x.shape = np.insert(x.shape,insert_idx,1)
...:
In [152]: A = np.random.rand(4,5)
In [153]: addNewAxisAt(A, axis=1)
In [154]: A.shape
Out[154]: (4, 1, 5)
In [155]: A = np.random.rand(5,6,8,9,4,2)
In [156]: addNewAxisAt(A, axis=5)
In [157]: A.shape
Out[157]: (5, 6, 8, 9, 4, 1, 2)
In [158]: A = np.random.rand(5,6,8,9,4,2,6,7)
In [159]: addNewAxisAt(A, axis=(1,3,4,6))
In [160]: A.shape
Out[160]: (5, 1, 6, 1, 1, 8, 1, 9, 4, 2, 6, 7)
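Note that this addNewAxisAt modifies A's shape in place and returns None. In NumPy 1.18 and later, np.expand_dims accepts a tuple of axes (positions in the output array) and returns a view instead; a sketch reproducing the last sample run:

```python
import numpy as np

A = np.random.rand(5, 6, 8, 9, 4, 2, 6, 7)

# Tuple axis support in np.expand_dims requires NumPy >= 1.18.
# A itself keeps its shape; B is a view with the extra axes.
B = np.expand_dims(A, axis=(1, 3, 4, 6))
print(B.shape, np.shares_memory(A, B))
```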
np.insert does something like this:
slobj = [slice(None)]*ndim
...
slobj[axis] = slice(None, index)
...
new[slobj] = arr[slobj2]
Like your code, it constructs a list of slices and modifies one or more elements.
apply_along_axis constructs an array and converts it to an indexing tuple:
outarr[tuple(i.tolist())] = res
Other numpy functions work this way as well.
My suggestion is to make the initial list large enough to hold the None. Then I don't need to use insert:
In [1076]: x=np.ones((3,2,4),int)
In [1077]: ind=[slice(None)]*(x.ndim+1)
In [1078]: ind[2]=None
In [1080]: x[ind].shape
Out[1080]: (3, 2, 1, 4)
In [1081]: x[tuple(ind)].shape # converting the list to a tuple is wise; recent NumPy versions require it
Out[1081]: (3, 2, 1, 4)
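Building the indexing tuple directly also sidesteps the list-versus-tuple question. A sketch of a hypothetical helper (the name add_new_axis_at is made up here), assuming a non-negative axis between 0 and x.ndim:

```python
import numpy as np

def add_new_axis_at(x, axis):
    """Hypothetical helper: insert a length-1 axis at position `axis` (0 <= axis <= x.ndim)."""
    # `axis` full slices, then None for the new axis; Ellipsis covers the remaining axes.
    return x[(slice(None),) * axis + (None, Ellipsis)]

x = np.ones((3, 2, 4), int)
print(add_new_axis_at(x, 2).shape)
print(add_new_axis_at(x, 0).shape, add_new_axis_at(x, 3).shape)
```

Negative axes are not handled; they would need to be normalized first, as expand_dims does.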
It turns out there is np.expand_dims:
In [1090]: np.expand_dims(x,2).shape
Out[1090]: (3, 2, 1, 4)
It uses reshape like you do, but creates the new shape with tuple concatenation.
def expand_dims(a, axis):
    a = asarray(a)
    shape = a.shape
    if axis < 0:
        axis = axis + len(shape) + 1
    return a.reshape(shape[:axis] + (1,) + shape[axis:])
Timings don't tell me much about which is better. They are in the 2 µs range, where simply wrapping the code in a function makes a difference.