matrixADimensions = matrixA.shape # returns [901,1249,1]
matrixBDimensions = matrixB.shape # returns [901,1249]
I am trying to get the element-wise multiplication of matrixA and matrixB but I am getting the error ValueError: operands could not be broadcast together with shapes (901,1249,1) (901,1249).
I believe it has something to do with the dimensions of both matrices since they are not the same. Actually, technically they are the same since [901,1249,1] is the same thing as [901,1249] but Python does not seem to know this.
How can I multiply matrixA with matrixB?
You can use numpy.squeeze to remove single-dimensional entries from the shape of your array. So in your case, you would do:
import numpy as np
np.squeeze(matrixA) * matrixB
This has the advantage of not needing to know the position of your single-dimensional entry in your array shape (unlike taking an indexing approach such as matrixA[:,:,0]).
Related
I am trying to generate a numpy array with elements as two other numpy arrays, as below.
W1b1 = np.zeros((256, 161))
W2b2 = np.zeros((256, 257))
Wx = np.array([W1b1, W2b2], dtype=np.object)
this gives an error:
ValueError: could not broadcast input array from shape (256,161) into shape (256).
However, if I take entirely different dimensions for of W1b1 and W2b2 then I do not get an error, as below.
A1 = np.zeros((256, 161))
A2 = np.zeros((257, 257))
A3 = np.array([A1, A2], dtype=np.object)
I do not get what is wrong in the first code and why is numpy array trying to broadcast one of the input arrays.
I have tried on below versions (Python 2.7.6, Numpy 1.13.1) and (Python 3.6.4, Numpy 1.14.1).
Don't count on np.array(..., object) making the right object array. At the moment we don't have control over how many dimensions it makes. Conceivably it could make a (2,) array, or (2, 256) (with 1d contents). Sometimes it works, sometimes raises an error. There's something of a pattern, but I haven't seen an analysis of the code that shows exactly what is happening.
For now it is safer to allocate the array, and fill it:
In [57]: arr = np.empty(2, object)
In [58]: arr[:] = [W1b1, W2b2]
np.array([np.zeros((3,2)),np.ones((3,4))], object) also raises this error. So the error arises when the first dimensions match, but the later ones don't. Now that I think about, I've seen this error before.
Earlier SO questions on the topic
numpy array 1.9.2 getting ValueError: could not broadcast input array from shape (4,2) into shape (4)
Creation of array of arrays fails, when first size of first dimension matches
Creating array of arrays in numpy with different dimensions
I am trying to generate a numpy array with elements as two other numpy arrays, as below.
W1b1 = np.zeros((256, 161))
W2b2 = np.zeros((256, 257))
Wx = np.array([W1b1, W2b2], dtype=np.object)
this gives an error:
ValueError: could not broadcast input array from shape (256,161) into shape (256).
However, if I take entirely different dimensions for of W1b1 and W2b2 then I do not get an error, as below.
A1 = np.zeros((256, 161))
A2 = np.zeros((257, 257))
A3 = np.array([A1, A2], dtype=np.object)
I do not get what is wrong in the first code and why is numpy array trying to broadcast one of the input arrays.
I have tried on below versions (Python 2.7.6, Numpy 1.13.1) and (Python 3.6.4, Numpy 1.14.1).
Don't count on np.array(..., object) making the right object array. At the moment we don't have control over how many dimensions it makes. Conceivably it could make a (2,) array, or (2, 256) (with 1d contents). Sometimes it works, sometimes raises an error. There's something of a pattern, but I haven't seen an analysis of the code that shows exactly what is happening.
For now it is safer to allocate the array, and fill it:
In [57]: arr = np.empty(2, object)
In [58]: arr[:] = [W1b1, W2b2]
np.array([np.zeros((3,2)),np.ones((3,4))], object) also raises this error. So the error arises when the first dimensions match, but the later ones don't. Now that I think about, I've seen this error before.
Earlier SO questions on the topic
numpy array 1.9.2 getting ValueError: could not broadcast input array from shape (4,2) into shape (4)
Creation of array of arrays fails, when first size of first dimension matches
Creating array of arrays in numpy with different dimensions
I am trying to use argsort on an array of float arrays, but faced some problem.
Here is what I try to do:
import numpy as np
z = np.array([np.array([30.9,29.0,5.87],dtype=float),np.array([20.3,1.3,8.8,4.4],dtype=float)]) # actually z is transferred from a tree using root2array whose corrseponding branches is a vector<vector<float>>
Index_list = np.argsort(z)
Then I received:
ValueError: operands could not be broadcast together with shapes (4,) (3,)
So what should I do to modify z or change the way of argsort to make it work?
I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have a computed and which depend on its length and the integer-value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincounts. However, np.bincounts differs to accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is 1D cell array of 1x3 arrays used as index tuples, while in np.bincounts it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator which ravels in opposite order to ravel. So my question is how might I use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
EDIT: To avoid wasting time: for these multiD index problems with complicated index manipulation, is the recommend route to just use cython to implement the loops explicity?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 =0) make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue) , holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
for i in range(Lmax+1):
for v in range(maxvalue+!):
ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)`
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[indY,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
I have a list of several hundred 10x10 arrays that I want to stack together into a single Nx10x10 array. At first I tried a simple
newarray = np.array(mylist)
But that returned with "ValueError: setting an array element with a sequence."
Then I found the online documentation for dstack(), which looked perfect: "...This is a simple way to stack 2D arrays (images) into a single 3D array for processing." Which is exactly what I'm trying to do. However,
newarray = np.dstack(mylist)
tells me "ValueError: array dimensions must agree except for d_0", which is odd because all my arrays are 10x10. I thought maybe the problem was that dstack() expects a tuple instead of a list, but
newarray = np.dstack(tuple(mylist))
produced the same result.
At this point I've spent about two hours searching here and elsewhere to find out what I'm doing wrong and/or how to go about this correctly. I've even tried converting my list of arrays into a list of lists of lists and then back into a 3D array, but that didn't work either (I ended up with lists of lists of arrays, followed by the "setting array element as sequence" error again).
Any help would be appreciated.
newarray = np.dstack(mylist)
should work. For example:
import numpy as np
# Here is a list of five 10x10 arrays:
x = [np.random.random((10,10)) for _ in range(5)]
y = np.dstack(x)
print(y.shape)
# (10, 10, 5)
# To get the shape to be Nx10x10, you could use rollaxis:
y = np.rollaxis(y,-1)
print(y.shape)
# (5, 10, 10)
np.dstack returns a new array. Thus, using np.dstack requires as much additional memory as the input arrays. If you are tight on memory, an alternative to np.dstack which requires less memory is to
allocate space for the final array first, and then pour the input arrays into it one at a time.
For example, if you had 58 arrays of shape (159459, 2380), then you could use
y = np.empty((159459, 2380, 58))
for i in range(58):
# instantiate the input arrays one at a time
x = np.random.random((159459, 2380))
# copy x into y
y[..., i] = x