I'm trying to initialize an "empty" array in which each element is t_list, an 8x8 np.zeros array:
t_list = np.zeros((8,8), dtype=np.float32)
I would now like to have an np.array that contains t_list at each index:
result = np.array((t_list, t_list, ...., t_list))
I would like to be able to control the number of times t_list appears in result.
I know that I could use a list instead of an array. The problem is that this goes inside a numba njit function, so I need to specify everything precisely.
The aim is then to change each value in a double for loop.
The shape param of numpy.zeros can be a tuple of ints of any length, so you can create an ndarray with multiple dimensions.
e.g.:
n = 5 # or any other number that you want
result = np.zeros((n,8,8), dtype=np.float32)
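As a minimal sketch of the follow-up step (changing the values in nested loops), the snippet below fills the preallocated array in place. The fill formula is made up purely for illustration. Plain loops over a preallocated array with a fixed dtype are the kind of code numba's njit is generally meant to compile, but that is not exercised here.

```python
import numpy as np

n = 5  # number of 8x8 blocks (arbitrary value for illustration)
result = np.zeros((n, 8, 8), dtype=np.float32)

# Each result[k] is an independent 8x8 block that can be modified in place.
for k in range(n):
    for i in range(8):
        for j in range(8):
            # Hypothetical fill rule, only to show element-wise assignment.
            result[k, i, j] = k * 100 + i * 8 + j
```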
I have some weights that are generated via the command:
weights = np.random.rand(9+1, 8)
for i in range(8):  # 7 to 8
    weights[9][i] = random.uniform(.5,1.5)
Then, I try to insert it into an element of the following lattice:
lattice = np.zeros((2,10,5))
lattice[0][0][0] = weights
print(lattice)
This results in the error:
ValueError: setting an array element with a sequence.
My question is:
How can I insert the weights into the lattice?
I am aware that the problem is that the lattice is filled with float values, so it cannot accept a matrix.
I'm interested in finding a way to generate a lattice with the correct number of elements so that I can insert my matrices. An example would be very helpful.
I've read several posts on stackoverflow, including:
how to append a numpy matrix into an empty numpy array
ValueError: setting an array element with a sequence
Numpy ValueError: setting an array element with a sequence. This message may appear without the existing of a sequence?
Initialize the lattice like this so that its entries can hold matrices:
lattice = np.empty(shape=(2,10,5), dtype='object')
Presumably you won't need this to actually be a numpy array until you've finished filling the lattice. So you can just use nested lists, and then call np.array on the entire list. You could do something like:
lattice = [[[None for _ in range(5)] for _ in range(10)] for _ in range(2)]
and then use:
lattice[0][0][0] = weights
and when you've filled in all the elements, call:
lattice = np.array(lattice)
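A third possibility, sketched here under the assumption that every matrix stored in the lattice has the same shape as weights (10x8), is to give the lattice two extra trailing axes so that it stays an ordinary float array. The object-array variant is shown next to it for comparison. The names lattice_obj and lattice_5d are made up for this example.

```python
import numpy as np

weights = np.random.rand(10, 8)  # same shape as the weights in the question

# Object array: each cell stores a reference to an arbitrary Python object.
lattice_obj = np.empty((2, 10, 5), dtype=object)
lattice_obj[0, 0, 0] = weights

# 5-D float array: the last two axes hold one 10x8 matrix per lattice site.
lattice_5d = np.zeros((2, 10, 5) + weights.shape)
lattice_5d[0, 0, 0] = weights  # copies the matrix into that site
```

The 5-D version keeps a numeric dtype, so slicing and arithmetic across all sites work directly, while the object array is only needed when the stored items differ in shape or type.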
I am trying to vectorize an operation with numpy. I use it in a Python script that I have profiled, and this operation turned out to be the bottleneck, so it needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depend on its length and the integer value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be 1D arrays of the same size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a 1D cell array of 1x3 arrays used as index tuples, while in np.bincount it must be a 1D array of scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to Python from MATLAB and some things do not translate directly, e.g. the colon operator, which ravels in the opposite order to ravel. So my question is: how might I use np.bincount or any other numpy method to achieve an efficient Python implementation of this operation?
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 = 0), make a 4D bool array of size (num_samples,Lmax+1,Lmax+1,maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
ind_Y,ind_len,ind_pos,ind_val=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This assumes that data has shape (n, Lmax), that Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent; must be defined before it is used below
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
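To make the linear-index approach concrete, here is a small self-contained check on made-up toy data (three samples, values 1 or 2, zero padded). It assumes the lengths can be recovered by counting the nonzero entries of each row, which holds only because valid values start at 1. The result is compared against a plain loop that accumulates the same quantity.

```python
import numpy as np

# Hypothetical toy data: 3 samples, Lmax = 3, values in 1..2, zero padded.
data = np.array([[1, 2, 0],
                 [2, 0, 0],
                 [1, 1, 2]])
Y = np.array([10.0, 20.0, 30.0])
n, Lmax = data.shape
maxvalue = 2
Lvec = (data > 0).sum(axis=1)  # assumed way to get the lengths: [2, 1, 3]

shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]  # True on the valid (unpadded) entries

# Turn each (length, position, value) triple into one flat index.
lin_idx = np.ravel_multi_index(
    (np.repeat(Lvec, Lvec),
     np.broadcast_to(posvec, data.shape)[mask],
     data[mask]),
    shape)

# Sum the matching Y values per flat index, then restore the 3D shape.
Y_avg = np.bincount(lin_idx, weights=np.repeat(Y, Lvec),
                    minlength=int(np.prod(shape))).reshape(shape) / n

# Reference: the same accumulation written as explicit loops.
expected = np.zeros(shape)
for s in range(n):
    for p in range(Lvec[s]):
        expected[Lvec[s], p + 1, data[s, p]] += Y[s] / n

assert np.allclose(Y_avg, expected)
```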
I've been running into a TypeError: list indices must be integers, not tuple. However, I can't figure out how to fix it, as I'm apparently misunderstanding where the tuple is (wasn't even aware there would be one from what I understand). Shouldn't my index and the values that I'm passing in all be integers?
def videoVolume(images):
    """ Create a video volume from the image list.
    Note: Simple function to convert a list to a 4D numpy array.
    Args:
        images (list): A list of frames. Each element of the list contains a
                       numpy array of a colored image. You may assume that each
                       frame has the same shape, (rows, cols, 3).
    Returns:
        output (numpy.ndarray): A 4D numpy array. This array should have
                                dimensions (num_frames, rows, cols, 3) and
                                dtype np.uint8.
    """
    output = np.zeros((len(images), images[0].shape[0], images[0].shape[1],
                       images[0].shape[2]), dtype=np.uint8)
    # WRITE YOUR CODE HERE.
    for x in range(len(images)):
        output[:,:,:,:] = [x, images[x,:,3], images[:,x,3], 3]
    # END OF FUNCTION.
    return output
The tuple referred to in the error message is the x,:,3 in the index here:
images[x,:,3]
The reason this is happening is that images is passed in as a list of frames (each a 3d numpy array), but you are trying to access it as though it is itself a numpy array. (Try doing lst = [1, 2, 3]; lst[:,:] and you'll see you get the same error message).
Instead, you meant to access it as something like images[x][:,:,:], for instance:
for x in range(len(images)):
    output[x,:,:,:] = images[x][:,:,:]
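If the explicit loop is not required, a shorter variant is possible. The sketch below uses np.stack (available since NumPy 1.10) to build the 4D array in one call. The function name video_volume and the sample frames are made up for illustration, and the cast to np.uint8 is there only to match the dtype the docstring asks for.

```python
import numpy as np

def video_volume(images):
    """Stack a list of (rows, cols, 3) frames into one 4D uint8 array."""
    # axis=0 puts the frame index first: (num_frames, rows, cols, 3).
    return np.stack(images, axis=0).astype(np.uint8)

# Hypothetical frames, for illustration only: frame k is filled with value k.
frames = [np.full((4, 6, 3), k, dtype=np.uint8) for k in range(5)]
volume = video_volume(frames)
```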
I have a list which contains 1000 integers. The 1000 integers represent the 20x50 elements of a two-dimensional array, which I read from a file into the list.
I need to walk through the list with an indicator in order to find elements that are close to each other. I want my indicator to be represented not by a single index i but by two indices x,y, so that I know where the indicator is in the array.
I tried to reshape the list like this:
data = np.array( l )
shape = ( 20, 50 )
data.reshape( shape )
but I don't know how to access the data array.
Update: Is there any way to find the x, y indices of the integers that are smaller than NUM (let's say NUM=12)?
According to the documentation of numpy.reshape, it returns a new array object with the shape specified by the parameters (provided the new shape leaves the number of elements unchanged) and does not change the shape of the original object. So when you call data.reshape(), you should also assign the result back to data for the change to show up in data.
Example -
data = data.reshape( shape ) # where shape = (20,50)
Another way to change the shape is to assign the new shape directly to the data.shape attribute.
Example -
shape = (20,50)
data.shape = shape # where shape is the new shape
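Once the array is 2D, elements are read with two indices, and the update in the question (finding the x, y positions of values below NUM) can be handled with np.where or np.argwhere. This is a small sketch in which the list l is filled with range(1000) as a stand-in for the values read from the file.

```python
import numpy as np

l = list(range(1000))  # hypothetical stand-in for the values read from the file
data = np.array(l).reshape(20, 50)

# Row x=3, column y=7 corresponds to flat position 3*50 + 7 in the list.
value = data[3, 7]

NUM = 12
xs, ys = np.where(data < NUM)    # separate arrays of row and column indices
pairs = np.argwhere(data < NUM)  # the same positions as rows of (x, y)
```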
I have a list of several hundred 10x10 arrays that I want to stack together into a single Nx10x10 array. At first I tried a simple
newarray = np.array(mylist)
But that returned with "ValueError: setting an array element with a sequence."
Then I found the online documentation for dstack(), which looked perfect: "...This is a simple way to stack 2D arrays (images) into a single 3D array for processing." Which is exactly what I'm trying to do. However,
newarray = np.dstack(mylist)
tells me "ValueError: array dimensions must agree except for d_0", which is odd because all my arrays are 10x10. I thought maybe the problem was that dstack() expects a tuple instead of a list, but
newarray = np.dstack(tuple(mylist))
produced the same result.
At this point I've spent about two hours searching here and elsewhere to find out what I'm doing wrong and/or how to go about this correctly. I've even tried converting my list of arrays into a list of lists of lists and then back into a 3D array, but that didn't work either (I ended up with lists of lists of arrays, followed by the "setting array element as sequence" error again).
Any help would be appreciated.
newarray = np.dstack(mylist)
should work. For example:
import numpy as np
# Here is a list of five 10x10 arrays:
x = [np.random.random((10,10)) for _ in range(5)]
y = np.dstack(x)
print(y.shape)
# (10, 10, 5)
# To get the shape to be Nx10x10, you could use rollaxis:
y = np.rollaxis(y,-1)
print(y.shape)
# (5, 10, 10)
np.dstack returns a new array. Thus, using np.dstack requires as much additional memory as the input arrays. If you are tight on memory, an alternative to np.dstack which requires less memory is to allocate space for the final array first, and then pour the input arrays into it one at a time.
For example, if you had 58 arrays of shape (159459, 2380), then you could use
y = np.empty((159459, 2380, 58))
for i in range(58):
    # instantiate the input arrays one at a time
    x = np.random.random((159459, 2380))
    # copy x into y
    y[..., i] = x
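For the Nx10x10 layout asked for in the question, np.stack (added in NumPy 1.10) puts the new axis first by default, so no rollaxis step is needed. The sketch below also collects the distinct shapes in the list, which is one way to check whether an odd-shaped element might be behind the errors quoted in the question. That cause is a guess, not something the question confirms.

```python
import numpy as np

# Hypothetical input: five 10x10 arrays.
mylist = [np.random.random((10, 10)) for _ in range(5)]

newarray = np.stack(mylist)  # axis=0 by default, giving shape (N, 10, 10)

# If stacking fails, the set of distinct shapes can reveal a mismatched element.
shapes = {np.shape(a) for a in mylist}
```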