Pythonic way to create a numpy array of coordinates

I'm trying to create a numpy array of coordinates. Up until now, I've just been using x_coords, y_coords = numpy.indices(shape). Now, however, I want to combine x_coords and y_coords into one array, such that x_coords = thisArray[:,:,0] and y_coords = thisArray[:,:,1]. (Here shape is two-dimensional, so thisArray has three axes.) Is there a simple or pythonic way to do this?
I originally thought about using numpy.outer, but that doesn't quite give me what I need. Another idea is to concatenate the two indices arrays along a new last axis, but that doesn't seem like a very elegant solution (though it may be the cleanest one here).
Thanks!

What np.indices returns is already a single array; it is just stacked along the first axis, so x_coords = thisArray[0, :, :] and y_coords = thisArray[1, :, :]. Unless you have very strict requirements for your array of coordinates (namely, that it be contiguous in memory), you can take a view of that array with the first axis rolled to the end:
thisArray = numpy.rollaxis(numpy.indices(shape), 0, len(shape)+1)
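For illustration, here is a minimal sketch of what that view looks like (the shape and variable names are just examples):
import numpy as np

shape = (3, 4)
coords = np.rollaxis(np.indices(shape), 0, len(shape) + 1)   # shape (3, 4, 2)

# slices along the last axis match the separate index grids
x_coords, y_coords = np.indices(shape)
assert (coords[:, :, 0] == x_coords).all()
assert (coords[:, :, 1] == y_coords).all()
On NumPy 1.11+, numpy.moveaxis(numpy.indices(shape), 0, -1) gives the same view with a clearer calling convention.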

Related

Creating Mask that Applies to Vectors in 3D Array

I could not find a previous post that specifically addresses how to create masks that work on vectors in a 3D array. Previous questions and answers only cover applying masks to individual elements of a 3D array, or to vectors of a 2D array. So, as the title states, that is exactly what I want to do here: remove all zero vectors from a 3D (x,y,z) array. The only method I can think of is to use two for loops running over x and (y,:), as shown in the code below. However, that does not work either, because of the error message I receive when I try to run it:
'list' object cannot be safely interpreted as an integer
Moreover, even if I get this method to work somehow, I know that a double for loop will make the masking very slow, and I eventually want to apply this to arrays with millions of elements. So my main question is: what would be the fastest way to accomplish this?
Code:
import numpy as np

data = np.array([[[0,0,0],[1,2,3],[4,5,6],[0,0,0]],
                 [[7,8,9],[0,0,0],[0,0,0],[10,11,12]]], dtype=float)
datanonzero = np.empty([[],[]], dtype=float)   # this line raises the error above
for maskclear1 in range(0,2):
    for maskclear2 in range(0,4):
        datanonzero[maskclear1,maskclear2,:] = data[~np.all(data[maskclear1,maskclear2,0:3] == 0, axis=0)]
import numpy as np

data = np.array([[[0,0,0],[1,2,3],[4,5,6],[0,0,0]],
                 [[7,8,9],[0,0,0],[0,0,0],[10,11,12]]], dtype=float)
flatten_data = data.reshape(-1, 3)                              # collapse to a 2D array of vectors
datanonzero = flatten_data[~np.all(flatten_data == 0, axis=1)]  # drop the all-zero vectors
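As a quick check of the result (here each original row happens to keep two vectors, so the row grouping can also be restored; in general the rows would be ragged and have to stay flat):
print(datanonzero)
# [[ 1.  2.  3.]
#  [ 4.  5.  6.]
#  [ 7.  8.  9.]
#  [10. 11. 12.]]
per_row = datanonzero.reshape(2, -1, 3)   # only valid because the per-row counts match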

Reshaping array of matrices in Python

I have a Numpy array X of n 2x2 matrices, arranged so that X.shape = (2,2,n), that is, to get the first matrix I call X[:,:,0]. I would like to reshape X into an array Y such that I can get the first matrix by calling Y[0] etc., but performing X.reshape(n,2,2) messes up the matrices. How can I get it to preserve the matrices while reshaping the array?
I am essentially trying to do this:
import numpy as np

Y = np.zeros([n, 2, 2])
for i in range(n):
    Y[i] = X[:,:,i]
but without using the for loop. How can I do this with reshape or a similar function?
(To get an example array X, try X = np.concatenate([np.identity(2)[:,:,None]] * n, axis=2) for some n.)
numpy.moveaxis can be used to take a view of an array with one axis moved to a different position in the shape:
numpy.moveaxis(X, 2, 0)
numpy.moveaxis(a, source, destination) takes a view of array a where the axis originally at position source ends up at position destination, so numpy.moveaxis(X, 2, 0) makes the original axis 2 the new axis 0 in the view.
There's also numpy.transpose, which can be used to perform arbitrary rearrangements of an array's axes in one go if you pass it the optional second argument, and numpy.rollaxis, an older version of moveaxis with a more confusing calling convention.
Use swapaxes:
Y = X.swapaxes(0,2)
Note that this exchanges the row and column axes of each 2x2 matrix as well, so Y[i] is X[:,:,i].T rather than X[:,:,i]; for the identity-matrix example the difference is invisible, but moveaxis is the safer choice in general.
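A quick sanity check (a minimal sketch using the example array from the question; n = 3 is arbitrary):
import numpy as np

n = 3
X = np.concatenate([np.identity(2)[:,:,None]] * n, axis=2)   # shape (2, 2, n)

Y = np.moveaxis(X, 2, 0)    # shape (n, 2, 2), a view of X
assert Y.shape == (n, 2, 2)
assert (Y[0] == X[:,:,0]).all()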

np.bincount for 1 line, vectorized multidimensional averaging

I am trying to vectorize an operation with numpy. I have profiled the python script it belongs to, found this operation to be the bottleneck, and need to optimize it because I will run it many times.
The operation is on a data set of two parts. First, a large set of n 1D vectors of different lengths (with maximum length Lmax), whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples, Lmax), with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, which I have computed and which depends on the vector's length and the integer value at each position. The set of scalars is stored in a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab using the accumarray function, with three 2D arrays of the same size as data whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be 1D arrays of the same size, and
there is no option to choose the shape of the output array.
In the accumarray call above, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a cell array of index arrays used as index tuples, while np.bincount only takes 1D integer bins as far as I understand. I expect np.ravel may be useful, but I am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. MATLAB's colon operator ravels in column-major order, the opposite of numpy's ravel. So my question is: how might I use np.bincount, or any other numpy method, to achieve an efficient python implementation of this operation?
EDIT: To avoid wasting time: for these multi-dimensional index problems with complicated index manipulation, is the recommended route to just use cython to implement the loops explicitly?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 = 0), make a 4D bool array of size (num_samples, Lmax+1, Lmax+1, maxvalue+1) holding where the conditions are satisfied for each value in Y:
ALLcond = np.zeros((num_samples, Lmax+1, Lmax+1, maxvalue+1), dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:,l,i,v] = (data[:,i]==v) & (Lvec==l)
where Lvec = [len(row) for row in data] (the per-sample lengths). Then get the indices of the True entries using np.where, and initialize a 4D float array into which you will assign the values of Y:
[ind_Y, ind_len, ind_pos, ind_val] = np.where(ALLcond)
Yval = np.zeros(np.shape(ALLcond), dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y, ind_len, ind_pos, ind_val] = Y[ind_Y]
Y_avg = Yval.sum(axis=0) / num_samples
This gives a factor of 4 or so speed-up over the direct loop implementation. I was expecting more. Perhaps this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This assumes data has shape (n, Lmax), Lvec is a Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None]  # fill-value independent; needed before the fancy indexing below
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
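To make the pattern concrete, here is a tiny self-contained sketch of the ravel_multi_index + bincount idiom on made-up toy data (all shapes and values here are illustrative, not from the question):
import numpy as np

# toy data: n = 3 samples, Lmax = 4, values in 1..maxvalue with maxvalue = 2
data = np.array([[1, 2, 1, 0],
                 [2, 2, 0, 0],
                 [1, 0, 0, 0]])
Lvec = np.array([3, 2, 1])          # true length of each row
Y = np.array([10.0, 20.0, 30.0])    # one scalar per sample
n, Lmax = data.shape
maxvalue = 2

shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None]       # selects the real (non-padding) entries
lin_idx = np.ravel_multi_index(
    (np.repeat(Lvec, Lvec),                       # length index per entry
     np.broadcast_to(posvec, data.shape)[mask],   # position index per entry
     data[mask]),                                 # value index per entry
    shape)
Y_avg = np.bincount(lin_idx, weights=np.repeat(Y, Lvec),
                    minlength=np.prod(shape)).reshape(shape) / n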

Vectorizing a numpy array call of varying indices

I have a 2D numpy array and a list of lists of indices for which I wish to compute the sum of the corresponding 1D vectors from the numpy array. This can be easily done through a for loop or via list comprehension, but I wonder if it's possible to vectorize it. With similar code I gain about 40x speedups from the vectorization.
Here's sample code:
import numpy as np
indices = [[1,2],[1,3],[2,0,3],[1]]
array_2d = np.array([[0.5, 1.5],[1.5,2.5],[2.5,3.5],[3.5,4.5]])
soln = [np.sum(array_2d[x], axis=-1) for x in indices]
(edit): Note that the indices are not (x,y) coordinates into array_2d; instead, indices[0] = [1,2] selects rows 1 and 2 of array_2d. The number of elements in each list of indices can vary.
This is what I would hope to be able to do:
vectorized_soln = np.sum(array_2d[indices[:]], axis=-1)
Does anybody know if there are any ways of achieving this?
First of all, I think you have a typo in the third element of indices...
The easy way to do that is to build a sub-array with two arrays of indices:
i = np.array([1,1,2])
j = np.array([2,3,?])
sub_arr2d = array_2d[i,j]
and finally you can take the sum of sub_arr2d...
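Since the inner lists have different lengths, a truly rectangular result is impossible, but one way to cut the per-element Python work is to compute all the row sums once, gather them with a single fancy index, and split; a minimal sketch that reproduces soln from the question:
import numpy as np

indices = [[1,2],[1,3],[2,0,3],[1]]
array_2d = np.array([[0.5, 1.5],[1.5,2.5],[2.5,3.5],[3.5,4.5]])

row_sums = array_2d.sum(axis=-1)                 # one scalar per row, computed once
flat = np.concatenate(indices)                   # all indices as one 1D array
split_points = np.cumsum([len(x) for x in indices])[:-1]
vectorized_soln = np.split(row_sums[flat], split_points)
# equals [np.sum(array_2d[x], axis=-1) for x in indices]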

scipy -- how to insert an array of zeros into another array with different dimensions

If I have an array:
myzeros = scipy.zeros((c*pos, c*pos)), with c = 0.1 and pos = 100,
and an array:
grid = scipy.ones((pos, pos))
how can I insert the zeros into the grid at random positions? The problem is with the dimensions.
I know that in 1d you can do:
myzeros = sc.zeros(c*pos)                   # array full of zeros
grid = sc.ones(pos)                         # grid full of available positions (ones)
dist = sc.random.permutation(pos)[:c*pos]   # distribute c*pos zeros in random positions
grid[dist] = myzeros
I tried something similar but it doesn't work. I also tried myzeros = sc.zeros(c*pos), but it still does not work.
There are several ways, but the easiest seems to be to first convert the 2D grid into a 1D grid, proceed as in the 1D case, and then convert back to 2D (note that sizes and slice bounds must be integers, so cast c*pos explicitly):
c = 0.1
pos = 100
n = int(c*pos)
myzeros = scipy.zeros((n, n))
myzeros1D = myzeros.ravel()
grid = scipy.ones((pos, pos))
grid1D = grid.ravel()
dist = scipy.random.permutation(pos*pos)[:n*n]
grid1D[dist] = myzeros1D
myzeros = myzeros1D.reshape((n, n))
grid = grid1D.reshape((pos, pos))
EDIT: to answer your comment: if you only want part of myzeros to go into the grid array, you have to make the dist array smaller. Example:
dist = scipy.random.permutation(pos*pos)[:int(c*pos)]
grid1D[dist] = myzeros1D[:int(c*pos)]
And I hope you are aware that this last line can be written as
grid1D[dist] = 0
if you really only want to set those elements to a single value instead of copying them from another array.
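For completeness, the same idea without the intermediate arrays, as a minimal numpy sketch (assuming the goal is simply to zero out int(c*pos)**2 randomly chosen cells of the grid):
import numpy as np

c, pos = 0.1, 100
grid = np.ones((pos, pos))
n_zeros = int(c * pos) ** 2                        # how many cells to zero out
flat_positions = np.random.permutation(pos * pos)[:n_zeros]
grid.flat[flat_positions] = 0                      # .flat indexes the 2D array as 1D, in place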
