How to append an an array but not the memory location? - python

I am generating a large set of monte carlo data and ideally want to store in it an array of arrays.
When i use the array.append(x) and then cycle over a loop that returns a new array for x when I look at the elements of the array at the end they are all the same as the last array x added to the list. I believe this must be because i'm adding the memory location to the list and not the actual array data hence when I add more arrays all the other elements that point to the same location also update.
Is there anyway to prevent this by setting a kwarg or something or do i have to construct my arrays in a different way?
#test to illustrate point
import numpy as np
x = np.random.choice((-1, 1), size=(5, 5))
array_test = []
for T in range(10):
array_test.append(x)
print(x)
x += 10
print(array_test)

Related

How to insert a matrix into an numpy array?

I have some weights that are generated via the command:
weights = np.random.rand(9+1, 8)
for i in range(8): # 7 to 8
weights[9][i] = random.uniform(.5,1.5)
Then, I try to insert it into an element of the following lattice:
lattice = np.zeros((2,10,5))
lattice[0][0][0] = weights
print(lattice)
This results in the error:
ValueError: setting an array element with a sequence.
My question is:
How can I insert the weights into the lattice?
I am aware that the problem is that the lattice is filled with float values, so it cannot accept a matrix.
I'm interested in finding a way to generate a lattice with the correct number of elements so that I can insert my matrices. An example would be very helpful.
I've read several posts on stackoverflow, including:
how to append a numpy matrix into an empty numpy array
ValueError: setting an array element with a sequence
Numpy ValueError: setting an array element with a sequence. This message may appear without the existing of a sequence?
Initialize the lattice like so in order to have entries that can be filled with matrices.
lattice = np.empty(shape=(2,10,5), dtype='object')
Presumably you won't need this to actually be a numpy array until you've finished filling the lattice. Thus, what you can do is just use nested lists, and then call numpy array on the entire list. You could do something like:
lattice = [[[None for _ in range(5)] for _ in range(10)] for _ in range(2)]
and then use:
lattice[0][0][0] = weights
and when you've filled in all the elements, call:
lattice = np.array(lattice)

Create np.array filled with zero arrays

I'm trying to initialize an "empty" array with each elements containing t_list a 8x8 np.zeros array :
t_list = np.zeros((8,8), dtype=np.float32)
I would now want to have a np.array with multiple t_list at each indexes:
result = np.array((t_list, t_list, ...., tlist))
I would like to be able to control the number of time t_list is in result.
I know that I could use list instead of arrays. The problem is, I put this in a numba njit function so I need to precise everything.
The aim is then to change each values in a double for loop.
The shape param of numpy.zeros can be a tuple of ints of any length, so you can create an ndarray with multiple dimensions.
e.g.:
n = 5 # or any other number that you want
result = np.zeros((n,8,8), dtype=np.float32)

Append numpy one dimensional arrays does not lead to a matrix

I am trying to get a 2d array, by randomly generating its rows and appending
import numpy as np
my_nums = np.array([])
for i in range(100):
x = np.random.rand(2, 1)
my_nums = np.append(my_nums, np.array(x))
But I do not get what I want but instead get a 1d array.
What is wrong?
Transposing x did not help either.
You could do this by using np.append(axis=0) or np.vstack. This however requires the rows appended to have the same length as the rows already in the array.
You cannot use the same code to append a row with two values to an empty array, and to append a row to an already existing 2D array: numpy will throw a
ValueError: all the input arrays must have same number of dimensions.
You could initialize my_nums to work around this:
my_nums = np.random.rand(1, 2)
for i in range(99):
x = np.random.rand(1, 2)
my_nums = np.append(my_nums, x, axis=0)
Note the decrease in the range by one due to the initialization row. Also note that I changed the dimensions to (1, 2) to get actual row vectors.
Much easier than appending row-wise will of course be to create the array in the wanted final shape:
my_nums = np.random.rand(100, 2)

np.bincount for 1 line, vectorized multidimensional averaging

I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have a computed and which depend on its length and the integer-value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincounts. However, np.bincounts differs to accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is 1D cell array of 1x3 arrays used as index tuples, while in np.bincounts it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator which ravels in opposite order to ravel. So my question is how might I use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
EDIT: To avoid wasting time: for these multiD index problems with complicated index manipulation, is the recommend route to just use cython to implement the loops explicity?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 =0) make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue) , holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
for i in range(Lmax+1):
for v in range(maxvalue+!):
ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)`
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[indY,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.

Convert a list of 2D numpy arrays to one 3D numpy array?

I have a list of several hundred 10x10 arrays that I want to stack together into a single Nx10x10 array. At first I tried a simple
newarray = np.array(mylist)
But that returned with "ValueError: setting an array element with a sequence."
Then I found the online documentation for dstack(), which looked perfect: "...This is a simple way to stack 2D arrays (images) into a single 3D array for processing." Which is exactly what I'm trying to do. However,
newarray = np.dstack(mylist)
tells me "ValueError: array dimensions must agree except for d_0", which is odd because all my arrays are 10x10. I thought maybe the problem was that dstack() expects a tuple instead of a list, but
newarray = np.dstack(tuple(mylist))
produced the same result.
At this point I've spent about two hours searching here and elsewhere to find out what I'm doing wrong and/or how to go about this correctly. I've even tried converting my list of arrays into a list of lists of lists and then back into a 3D array, but that didn't work either (I ended up with lists of lists of arrays, followed by the "setting array element as sequence" error again).
Any help would be appreciated.
newarray = np.dstack(mylist)
should work. For example:
import numpy as np
# Here is a list of five 10x10 arrays:
x = [np.random.random((10,10)) for _ in range(5)]
y = np.dstack(x)
print(y.shape)
# (10, 10, 5)
# To get the shape to be Nx10x10, you could use rollaxis:
y = np.rollaxis(y,-1)
print(y.shape)
# (5, 10, 10)
np.dstack returns a new array. Thus, using np.dstack requires as much additional memory as the input arrays. If you are tight on memory, an alternative to np.dstack which requires less memory is to
allocate space for the final array first, and then pour the input arrays into it one at a time.
For example, if you had 58 arrays of shape (159459, 2380), then you could use
y = np.empty((159459, 2380, 58))
for i in range(58):
# instantiate the input arrays one at a time
x = np.random.random((159459, 2380))
# copy x into y
y[..., i] = x

Categories

Resources