I'm currently trying to find an easy way to do the following operation to an N dimensional array in Python. For simplicity let's start with a 1 dimensional array of size 4.
X = np.array([1,2,3,4])
What I want to do is create a new array, call it Y, such that:
Y = np.array([1,2,3,4],[2,3,4,1],[3,4,1,2],[4,1,2,3])
So what I'm trying to do is create an array Y such that:
Y[:,i] = np.roll(X[:],-i, axis = 0)
I know how to do this using for loops, but I'm looking for a faster method of doing so. The actual array I'm trying to do this to is a 3 dimensional array, call it X. What I'm looking for is a way to find an array Y, such that:
Y[:,:,:,i,j,k] = np.roll(X[:,:,:],(-i,-j,-k),axis = (0,1,2))
I can do this using the itertools.product class using for loops, but this is quite slow. If anyone has a better way of doing this, please let me know. I also have CUPY installed with a GTX-970, so if there's a way of using CUDA to do this faster please let me know. If anyone wants some more context please let me know.
Here is my original code for computing the position space two point correlation function. The array x0 is an n by n by n real valued array representing a real scalar field. The function iterate(j,s) runs j iterations. Each iteration consists of generating a random float between -s and s for each lattice site. It then computes the change in the action dS and accepts the change with a probability of min(1,exp^(-dS))
def momentum(k,j,s):
global Gxa
Gx = numpy.zeros((n,n,t))
for i1 in range(0,k):
iterate(j,s)
for i2,i3,i4 in itertools.product(range(0,n),range(0,n),range(0,n)):
x1 = numpy.roll(numpy.roll(numpy.roll(x0, -i2, axis = 0),-i3, axis = 1),-i4,axis = 2)
x2 = numpy.mean(numpy.multiply(x0,x1))
Gx[i2,i3,i4] = x2
Gxa = Gxa + Gx
Gxa = Gxa/k
Approach #1
We can extend this idea to our 3D array case here. So, simply concatenate with sliced versions along the three dims and then use np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to efficiently get the final output as the strided-view of the concatenated version, like so -
from skimage.util.shape import view_as_windows
X1 = np.concatenate((X,X[:,:,:-1]),axis=2)
X2 = np.concatenate((X1,X1[:,:-1,:]),axis=1)
X3 = np.concatenate((X2,X2[:-1,:,:]),axis=0)
out = view_as_windows(X3,X.shape)
Approach #2
For really large arrays, we might want to initialize the output array and then re-use X3 from earlier approach to assign with slicing it. This slicing process would be faster than the original-rolling. The implementation would be -
m,n,r = X.shape
Yout = np.empty((m,n,r,m,n,r),dtype=X.dtype)
for i in range(m):
for j in range(n):
for k in range(r):
Yout[:,:,:,i,j,k] = X3[i:i+m,j:j+n,k:k+r]
Related
I wrote the following function, which takes as inputs three 1D array (namely int_array, x, and y) and a number lim. The output is a number as well.
def integrate_to_lim(int_array, x, y, lim):
if lim >= np.max(x):
res = 0.0
if lim <= np.min(x):
res = int_array[0]
else:
index = np.argmax(x > lim) # To find the first element of x larger than lim
partial = int_array[index]
slope = (y[index-1] - y[index]) / (x[index-1] - x[index])
rest = (x[index] - lim) * (y[index] + (lim - x[index]) * slope / 2.0)
res = partial + rest
return res
Basically, outside form the limit cases lim>=np.max(x) and lim<=np.min(x), the idea is that the function finds the index of the first value of the array x larger than lim and then uses it to make some simple calculations.
In my case, however lim can also be a fairly big 2D array (shape ~2000 times ~1000 elements)
I would like to rewrite it such that it makes the same calculations for the case that lim is a 2D array.
Obviously, the output should also be a 2D array of the same shape of lim.
I am having a real struggle figuring out how to vectorize it.
I would like to stick only to the numpy package.
PS I want to vectorize my function because efficiency is important and as I understand using for loops is not a good choice in this regard.
Edit: my attempt
I was not aware of the function np.take, which made the task way easier.
Here is my brutal attempt that seems to work (suggestions on how to clean up or to make the code faster are more than welcome).
def integrate_to_lim_vect(int_array, x, y, lim_mat):
lim_mat = np.asarray(lim_mat) # Make sure that it is an array
shape_3d = list(lim_mat.shape) + [1]
x_3d = np.ones(shape_3d) * x # 3 dimensional version of x
lim_3d = np.expand_dims(lim_mat, axis=2) * np.ones(x_3d.shape) # also 3d
# I use np.argmax on the 3d matrices (is there a simpler way?)
index_mat = np.argmax(x_3d > lim_3d, axis=2)
# Silly calculations
partial = np.take(int_array, index_mat)
y1_mat = np.take(y, index_mat)
y2_mat = np.take(y, index_mat - 1)
x1_mat = np.take(x, index_mat)
x2_mat = np.take(x, index_mat - 1)
slope = (y1_mat - y2_mat) / (x1_mat - x2_mat)
rest = (x1_mat - lim_mat) * (y1_mat + (lim_mat - x1_mat) * slope / 2.0)
res = partial + rest
# Make the cases with np.select
condlist = [lim_mat >= np.max(x), lim_mat <= np.min(x)]
choicelist = [0.0, int_array[0]] # Shoud these options be a 2d matrix?
output = np.select(condlist, choicelist, default=res)
return output
I am aware that if the limit is larger than the maximum value in the array np.argmax returns the index zero (leading to wrong results). This is why I used np.select to check and correct for these cases.
Is it necessary to define the three dimensional matrices x_3d and lim_3d, or there is a simpler way to find the 2D matrix of the indices index_mat?
Suggestions, especially to improve the way I expanded the dimension of the arrays, are welcome.
I think you can solve this using two tricks. First, a 2d array can be easily flattened to a 1d array, and then your answers can be converted back into a 2d array with reshape.
Next, your use of argmax suggests that your array is sorted. Then you can find your full set of indices using digitize. Thus instead of a single index, you will get a complete array of indices. All the calculations you are doing are intrinsically supported as array operations in numpy, so that should not cause any problems.
You will have to specifically look at the limiting cases. If those are rare enough, then it might be okay to let the answers be derived by the default formula (they will be garbage values), and then replace them with the actual values you desire.
Is it possible to use numpy to raise an array to the power of another array, in a way that yields a result with a larger dimension than the inputs - i.e. not just simple element wise raising to the power of.
As a simple example, I'm looking to compute the following. Below is the "longhand" form - in practice this is implemented by a loop over a large x array, so it's slow.
x = np.arange(4)
t = np.random.rand(3,3)
y = np.empty_like(x)
y[0] = np.sum(x[0]**t)
y[1] = np.sum(x[1]**t)
y[2] = np.sum(x[2]**t)
y[3] = np.sum(x[3]**t)
I'd like a vectorised solution to replace doing y[i] each time. However, since x has shape [4] and y has shape [3,3], when I try to compute x**t I get an error.
Is there a fast optimized solution?
A straight-forward vectorized way would be with broadcasting -
y = (x[:,None,None]**t).sum((1,2)).astype(x.dtype)
Or with the builtin np.power.outer -
y = np.power.outer(x,t).sum((1,2)).astype(x.dtype)
For large arrays, leverage multi-cores with numexpr module -
import numexpr as ne
y = ne.evaluate('sum(x3D**t1D,1)',{'x3D':x[:,None],'t1D':t.ravel()}).astype(x.dtype)
i got a problem to solve and i cannot come up with a good solution.
To ease it down I got an array of 10x10 and i want to slice out "little arrays" of 3x3. Right now i do this the following way:
array = np.arange(100).reshape((10,10))
patch = np.array(array[:3, :3]
for n in range(3, 10, 3):
for m in range(3, 10, 3):
patch = numpy.append(patch, array[n:n+3, m:m+3]
i basically create the numpy array patch with the first slice and append all other slices afterwards. The problem with this is that its horribly slow and does not do good use of the slicing opportunities of numpy. I need to do this for a high number of much bigger arrays.
Can anyone give me any advice on how to make this more efficient?
1000 thanks!
Your problem is entirely down to using numpy.append. append creates a new array each time you use it. As your patch array gets bigger this will take progressively longer.
Instead, use a presized array (you already know the final size of the patch array), and avoid making intermediary copies of any data.
# setup
x, y = 999, 999
array = np.arange(x * y)
array.shape = x, y
little_array_size = 3
# creates an array of "little arrays"
patch = np.empty(array.size, dtype=int)
patch.shape = -1, little_array_size, little_array_size
i = 0
for n in range(0, array.shape[0], little_array_size):
for m in range(0, array.shape[1], little_array_size):
# uses view, so data is copied straight from array to patch
patch[i,:] = array[n:n+little_array_size, m:m+little_array_size]
i += 1
patch.shape = -1 # flattens array
The above takes about a third of second on my computer (two orders of magnitude faster than using numpy.append (20+ seconds)).
I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have a computed and which depend on its length and the integer-value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincounts. However, np.bincounts differs to accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is 1D cell array of 1x3 arrays used as index tuples, while in np.bincounts it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator which ravels in opposite order to ravel. So my question is how might I use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
EDIT: To avoid wasting time: for these multiD index problems with complicated index manipulation, is the recommend route to just use cython to implement the loops explicity?
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 =0) make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue) , holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
for i in range(Lmax+1):
for v in range(maxvalue+!):
ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)`
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[indY,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is Numpy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
I am trying to create 3D array in python using Numpy and by multiplying 2D array in to 3rd dimension. I am quite new in Numpy multidimensional arrays and basically I am missing something important here.
In this example I am trying to make 10x10x20 3D array using base 2D array(10x10) by copying it 20 times.
My starting 2D array:
a = zeros(10,10)
for i in range(0,9):
a[i+1, i] = 1
What I tried to create 3D array:
b = zeros(20)
for i in range(0,19):
b[i]=a
This approach is probably stupid. So what is correct way to approach construction of 3D arrays from base 2D arrays?
Cheers.
Edit
Well I was doing things wrongly probably because of my R background.
Here is how I did it finally
b = zeros(20*10*10)
b = b.reshape((20,10,10))
for i in b:
for m in range(0, 9):
i[m+1, m] = 1
Are there any other ways to do the same?
There are many ways how to construct multidimensional arrays.
If you want to construct a 3D array from given 2D arrays you can do something like
import numpy
# just some 2D arrays with shape (10,20)
a1 = numpy.ones((10,20))
a2 = 2* numpy.ones((10,20))
a3 = 3* numpy.ones((10,20))
# creating 3D array with shape (3,10,20)
b = numpy.array((a1,a2,a3))
Depending on the situation there are other ways which are faster. However, as long as you use built-in constructors instead of loops you are on the fast side.
For your concrete example in Edit I would use numpy.tri
c = numpy.zeros((20,10,10))
c[:] = numpy.tri(10,10,-1) - numpy.tri(10,10,-2)
Came across similar problem...
I needed to modify 2D array into 3D array like so:
(y, x) -> (y, x, 3).
Here is couple solutions for this problem.
Solution 1
Using python tool set
array_3d = numpy.zeros(list(array_2d.shape) + [3], 'f')
for z in range(3):
array_3d[:, :, z] = array_2d.copy()
Solution 2
Using numpy tool set
array_3d = numpy.stack([array_2d.copy(), ]*3, axis=2)
That is what I came up with. If someone knows numpy to give a better solution I would love to see it! This works but I suspect there is a better way performance-wise.