I have to run the snippet shown below about 200,000 times in a row, and it currently needs about 0.12585 seconds per 1000 iterations. datapoints has a shape of (3, 2704, 64).
output = []
maxium = 0
for datapoint in datapoints:
    tmp = []
    for data in datapoint:
        maxium = max(data)
        if maxium == 0:
            tmp.append(data)
        else:
            tmp.append(data / maxium)
    output.append(tmp)
I have tried to rewrite it using map(), but this gives me an average of 0.23237 seconds per iteration. This is probably due to the repeated max(data) and list() calls.
np.asarray(list(map(lambda datapoint: list(map(lambda data: data / max(data) if max(data) > 0 else data, datapoint)), datapoints)))
Is there a possibility to optimize the code again to improve performance?
Well here's a short answer:
def bar(datapoints):
    m = np.amax(datapoints, axis=2)
    m[m == 0] = 1
    return datapoints / m[:,:,np.newaxis]
Here's an explanation of how you might have got there (it's how I did get there!):
Let's start off with some example data:
>>> x = np.array([[[1, 2, 3, 4], [11, -12, 13, -14]], [[26, 27, 28, 29], [0, 0, 0, 0]]])
Now check what you get on your original function:
def foo(datapoints):
    output = []
    maxium = 0
    for datapoint in datapoints:
        tmp = []
        for data in datapoint:
            maxium = max(data)
            if maxium == 0:
                tmp.append(data)
            else:
                tmp.append(data / maxium)
        output.append(tmp)
    return np.array(output)
The result is:
>>> foo(x)
array([[[ 0.25 , 0.5 , 0.75 , 1. ],
[ 0.84615385, -0.92307692, 1. , -1.07692308]],
[[ 0.89655172, 0.93103448, 0.96551724, 1. ],
[ 0. , 0. , 0. , 0. ]]])
Now let's try out amax:
>>> np.amax(x, axis=0)
array([[26, 27, 28, 29],
[11, 0, 13, 0]])
>>> np.amax(x, axis=2)
array([[ 4, 13],
[29, 0]])
Ah ha, looks like axis=2 is what we're after. Now we want to divide the original array by this, but only in the places where the max is non-zero. How do we divide only in some places? The answer is: we divide everywhere, but in some places we divide by 1 so it has no effect. So let's replace the zeros with ones:
>>> m = np.amax(x, axis=2)
>>> m[m == 0] = 1
>>> m
array([[ 4, 13],
[29, 1]])
Finally, let's divide by this, broadcasting back over axis 2 which we took the maximum over earlier:
>>> x / m[:,:,np.newaxis]
array([[[ 0.25 , 0.5 , 0.75 , 1. ],
[ 0.84615385, -0.92307692, 1. , -1.07692308]],
[[ 0.89655172, 0.93103448, 0.96551724, 1. ],
[ 0. , 0. , 0. , 0. ]]])
Putting that all together you get bar() at the top.
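For a rough idea of the speed-up on data of the original shape, a timing sketch along these lines can be used (exact numbers will of course vary by machine; foo and bar are the functions defined above):
import timeit
import numpy as np

datapoints = np.random.random((3, 2704, 64))

# compare the original loop version against the vectorized version
print("loop:      ", timeit.timeit(lambda: foo(datapoints), number=100))
print("vectorized:", timeit.timeit(lambda: bar(datapoints), number=100))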
Try something like this:
maximum = datapoints.max(axis=2, keepdims=True)
output = np.where(maximum==0, datapoints, datapoints/maximum)
You would see a warning, "invalid value encountered in true_divide", but it should work as expected.
Update: as @ArthurTacca pointed out,
output = datapoints/np.where(maximum==0, 1, maximum)
will eliminate the warning.
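Putting it together, a minimal runnable sketch of this approach (the input here is just random data of the shape mentioned in the question):
import numpy as np

datapoints = np.random.random((3, 2704, 64))

maximum = datapoints.max(axis=2, keepdims=True)
# divide by 1 wherever the row maximum is 0, so those rows pass through unchanged
output = datapoints / np.where(maximum == 0, 1, maximum)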
Yes you can definitely speed this up w/ vectorized numpy operations. Here's how I would do it, if I understand what you're trying to do correctly:
import numpy as np
# I use a randomly initialized array here, replace this with your input
arr = np.random.random(size=(3, 2704, 64))
# Find max for 3rd dimension, returns array w/ shape (3, 2704)
max_arr = np.max(arr, axis=2)
# Set up divisor, returns array w/ shape (3, 2704)
divisor = np.where(max_arr == 0, 1, max_arr)
# Use expand_dims to add third dimension, returns array w/ shape (3, 2704, 1)
divisor = np.expand_dims(divisor, axis=2)
# Perform division, shape is (3, 2704, 64)
ans = np.divide(arr, divisor)
From your code, I gather that you intend to scale your data by the max over the 3rd axis, but in the event of that max being 0, forgo scaling instead. You seem to also want your output to have the same shape as your input, which explains the way you structured output and tmp. That's why I left the code snippet ending with output as a numpy array, but if you need it in its original form regardless, it's a simple loop to re-arrange your data:
output = []
for i in ans:
    tmp = []
    for j in i:
        tmp.append(list(j))
    output.append(tmp)
For future reference, furnish your questions with more detail. It will make it easier for people to participate, and you'll increase the chance of getting your questions answered quickly!
I have a target array A, which represents isobaric pressure levels in NCEP reanalysis data.
I also have the pressure at which a cloud is observed as a long time series, B.
What I am looking for is a k-nearest neighbour lookup that returns the indices of those nearest neighbours, something like knnsearch in Matlab that could be represented the same in python such as: indices, distance = knnsearch(A, B, n)
where indices contains the nearest n indices in A for every value in B, and distance is how far removed each value in B is from the nearest values in A. A and B can be of different lengths (this is the bottleneck I have found with most solutions so far, whereby I would have to loop over each value in B to return my indices and distances).
import numpy as np
A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10]) # this is a fixed 17-by-1 array
B = np.array([923, 584.2, 605.3, 153.2]) # this can be any n-by-1 array
n = 2
What I would like returned from indices, distance = knnsearch(A, B, n) is this:
indices = [[1, 2],[4, 5] etc...]
where 923 in A is matched to first A[1]=925 and then A[2]=850
and 584.2 in A is matched to first A[4]=600 and then A[5]=500
distance = [[2, 73],[15.8, 84.2] etc...]
where 2 represents the distance between the queried value in B and the nearest value in A, e.g. distance[0, 0] == np.abs(B[0] - A[1])
The only solution I have been able to come up with is:
import numpy as np

def knnsearch(A, B, n):
    indices = np.zeros((len(B), n))
    distances = np.zeros((len(B), n))
    for i in range(len(B)):
        a = A
        for N in range(n):
            dif = np.abs(a - B[i])
            ind = np.argmin(dif)
            indices[i, N] = ind + N
            distances[i, N] = dif[ind + N]
            # remove this neighbour from future consideration
            # (note: np.delete returns a new array; it does not modify `a` in place)
            np.delete(a, ind)
    return indices, distances
array_A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10])
array_B = np.array([923, 584.2, 605.3, 153.2])
neighbours = 2
indices, distances = knnsearch(array_A, array_B, neighbours)
print(indices)
print(distances)
returns:
[[ 1. 2.]
[ 4. 5.]
[ 4. 3.]
[10. 11.]]
[[ 2. 73. ]
[ 15.8 84.2]
[ 5.3 94.7]
[ 3.2 53.2]]
There must be a way to remove the for loops, as I need the performance should my A and B arrays contain many thousands of elements with many nearest neighbours...
Please help! Thanks :)
The second loop can easily be vectorized. The most straightforward way to do it is to use np.argsort and select the indices corresponding to the n smallest dif values. However, for large arrays, as only n values should be sorted, it is better to use np.argpartition.
Therefore, the code would look something like this:
def vector_knnsearch(A, B, n):
    indices = np.empty((len(B), n))
    distances = np.empty((len(B), n))
    for i, b in enumerate(B):
        dif = np.abs(A - b)
        min_ind = np.argpartition(dif, n)[:n]  # indexes of the n smallest values,
                                               # but not necessarily sorted
        ind = min_ind[np.argsort(dif[min_ind])]  # sort output of argpartition just in case
        indices[i, :] = ind
        distances[i, :] = dif[ind]
    return indices, distances
As said in the comments, the first loop can also be removed using a meshgrid. However, the extra memory and computation needed to construct the meshgrid make this approach slower for the dimensions I tried (and this will probably get worse for large arrays and may end up in a MemoryError). In addition, the readability of the code decreases. Overall, this probably makes the approach less pythonic.
def mesh_knnsearch(A, B, n):
    m = len(B)
    rng = np.arange(m).reshape((m, 1))
    Amesh, Bmesh = np.meshgrid(A, B)
    dif = np.abs(Amesh - Bmesh)
    min_ind = np.argpartition(dif, n, axis=1)[:, :n]
    ind = min_ind[rng, np.argsort(dif[rng, min_ind], axis=1)]
    return ind, dif[rng, ind]
Note that it is important to define rng as a 2d array in order to retrieve a[rng[0],ind[0]], a[rng[1],ind[1]], etc. and maintain the dimensions of the array, as opposed to a[:,ind], which retrieves a[:,ind[0]], a[:,ind[1]], etc.
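A quick usage sketch on the arrays from the question, checking that both vectorized versions agree with each other (note they may return a different second neighbour than the original delete-based loop for a query like 153.2, where 200 is actually closer than 100):
array_A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200,
                    150, 100, 70, 50, 30, 20, 10])
array_B = np.array([923, 584.2, 605.3, 153.2])

ind_v, dist_v = vector_knnsearch(array_A, array_B, 2)
ind_m, dist_m = mesh_knnsearch(array_A, array_B, 2)
print(np.allclose(ind_v, ind_m), np.allclose(dist_v, dist_m))  # both approaches agree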
I tried to implement a strided convolution of a 2D array using a for loop, i.e.:
arr = np.array([[2,3,7,4,6,2,9],
[6,6,9,8,7,4,3],
[3,4,8,3,8,9,7],
[7,8,3,6,6,3,4],
[4,2,1,8,3,4,6],
[3,2,4,1,9,8,3],
[0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
[1,0,2],
[-1,0,3]])
def stride_conv(arr1, arr2, s, p):
    beg = 0
    end = arr2.shape[0]
    final = []
    for i in range(0, arr1.shape[0]-1, s):
        k = []
        for j in range(0, arr1.shape[0]-1, s):
            k.append(np.sum(arr1[beg+i:end+i, beg+j:end+j] * (arr2)))
        final.append(k)
    return np.array(final)
stride_conv(arr,arr2,2,0)
This results in a 3x3 array:
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Is there a numpy function or scipy function to do the same? My approach is not that good. How can I vectorize this?
Ignoring the padding argument and trailing windows that won't have enough lengths for convolution against the second array, here's one way with np.lib.stride_tricks.as_strided -
def strided4D(arr, arr2, s):
    strided = np.lib.stride_tricks.as_strided
    s0, s1 = arr.strides
    m1, n1 = arr.shape
    m2, n2 = arr2.shape
    out_shp = (1+(m1-m2)//s, m2, 1+(n1-n2)//s, n2)
    return strided(arr, shape=out_shp, strides=(s*s0, s*s1, s0, s1))

def stride_conv_strided(arr, arr2, s):
    arr4D = strided4D(arr, arr2, s=s)
    return np.tensordot(arr4D, arr2, axes=((2,3),(0,1)))
Alternatively, we can use the scikit-image built-in view_as_windows to get those windows elegantly, like so -
from skimage.util.shape import view_as_windows
def strided4D_v2(arr, arr2, s):
    return view_as_windows(arr, arr2.shape, step=s)
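Either set of windows can then be reduced against the kernel with np.tensordot; e.g. a sketch with the view_as_windows variant (its window axes are the last two), on the arrays from the question:
win = strided4D_v2(arr, arr2, s=2)                    # shape (3, 3, 3, 3) here
out = np.tensordot(win, arr2, axes=((2, 3), (0, 1)))  # sum each window against the kernel
print(out)
# [[ 91 100  88]
#  [ 69  91 117]
#  [ 44  72  74]]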
How about using signal.convolve2d from scipy?
My approach is similar to Jason's, but uses indexing to apply the stride.
from scipy import signal

def strideConv(arr, arr2, s):
    return signal.convolve2d(arr, arr2[::-1, ::-1], mode='valid')[::s, ::s]
Note that the kernel has to be reversed. For details, please see the discussion here and here. Otherwise use signal.correlate2d.
Examples:
>>> strideConv(arr, arr2, 1)
array([[ 91, 80, 100, 84, 88],
[ 99, 106, 126, 92, 77],
[ 69, 98, 91, 93, 117],
[ 80, 79, 87, 93, 61],
[ 44, 72, 72, 63, 74]])
>>> strideConv(arr, arr2, 2)
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
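As noted above, the same result can be obtained without flipping the kernel by using signal.correlate2d instead (a sketch assuming arr and arr2 from the question):
from scipy import signal

def strideCorr(arr, arr2, s):
    # correlate2d does not flip the kernel, so it matches the question's
    # sliding sum of products directly
    return signal.correlate2d(arr, arr2, mode='valid')[::s, ::s]

print(strideCorr(arr, arr2, 2))
# [[ 91 100  88]
#  [ 69  91 117]
#  [ 44  72  74]]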
I think we can do a "valid" fft convolution and pick out only those results at strided locations, like this:
import numpy as np
import scipy.signal

def strideConv(arr, arr2, s):
    cc = scipy.signal.fftconvolve(arr, arr2[::-1, ::-1], mode='valid')
    idx = (np.arange(0, cc.shape[1], s), np.arange(0, cc.shape[0], s))
    xidx, yidx = np.meshgrid(*idx)
    return cc[yidx, xidx]
This gives same results as other people's answers.
But I guess this only works if the kernel size is odd numbered.
Also I've flipped the kernel in arr2[::-1,::-1] just to stay consistent with others, you may want to omit it depending on context.
UPDATE:
We currently have a few different ways of doing 2D or 3D convolution using numpy and scipy alone, and I thought about doing some comparisons to give some idea on which one is faster on data of different sizes. I hope this won't be regarded as off-topic.
Method 1: FFT convolution (using scipy.signal.fftconvolve):
import numpy
from scipy.signal import fftconvolve

def padArray(var, pad, method=1):
    if method == 1:
        var_pad = numpy.zeros(tuple(2*pad+numpy.array(var.shape[:2])) + var.shape[2:])
        var_pad[pad:-pad, pad:-pad] = var
    else:
        var_pad = numpy.pad(var, ([pad, pad], [pad, pad]) + ([0, 0],)*(numpy.ndim(var)-2),
                            mode='constant', constant_values=0)
    return var_pad

def conv3D(var, kernel, stride=1, pad=0, pad_method=1):
    '''3D convolution using scipy.signal.fftconvolve.
    '''
    var_ndim = numpy.ndim(var)
    kernel_ndim = numpy.ndim(kernel)
    stride = int(stride)

    if var_ndim < 2 or var_ndim > 3 or kernel_ndim < 2 or kernel_ndim > 3:
        raise Exception("<var> and <kernel> dimension should be in 2 or 3.")

    if var_ndim == 2 and kernel_ndim == 3:
        raise Exception("<kernel> dimension > <var>.")

    if var_ndim == 3 and kernel_ndim == 2:
        kernel = numpy.repeat(kernel[:, :, None], var.shape[2], axis=2)

    if pad > 0:
        var_pad = padArray(var, pad, pad_method)
    else:
        var_pad = var

    conv = fftconvolve(var_pad, kernel, mode='valid')

    if stride > 1:
        conv = conv[::stride, ::stride, ...]

    return conv
Method 2: Special conv (see this answer):
def conv3D2(var, kernel, stride=1, pad=0):
    '''3D convolution by sub-matrix summing.
    '''
    var_ndim = numpy.ndim(var)
    ny, nx = var.shape[:2]
    ky, kx = kernel.shape[:2]
    result = 0

    if pad > 0:
        var_pad = padArray(var, pad, 1)
    else:
        var_pad = var

    for ii in range(ky*kx):
        yi, xi = divmod(ii, kx)
        slabii = var_pad[yi:2*pad+ny-ky+yi+1:1, xi:2*pad+nx-kx+xi+1:1, ...]*kernel[yi, xi]
        if var_ndim == 3:
            slabii = slabii.sum(axis=-1)
        result += slabii

    if stride > 1:
        result = result[::stride, ::stride, ...]

    return result
Method 3: Strided-view conv, as suggested by Divakar:
def asStride(arr, sub_shape, stride):
    '''Get a strided sub-matrices view of an ndarray.

    <arr>: ndarray of rank 2.
    <sub_shape>: tuple of length 2, window size: (ny, nx).
    <stride>: int, stride of windows.

    Return <subs>: strided window view.

    See also skimage.util.shape.view_as_windows()
    '''
    s0, s1 = arr.strides[:2]
    m1, n1 = arr.shape[:2]
    m2, n2 = sub_shape[:2]

    view_shape = (1+(m1-m2)//stride, 1+(n1-n2)//stride, m2, n2)+arr.shape[2:]
    strides = (stride*s0, stride*s1, s0, s1)+arr.strides[2:]
    subs = numpy.lib.stride_tricks.as_strided(arr, view_shape, strides=strides)

    return subs
def conv3D3(var, kernel, stride=1, pad=0):
    '''3D convolution by strided view.
    '''
    var_ndim = numpy.ndim(var)
    kernel_ndim = numpy.ndim(kernel)

    if var_ndim < 2 or var_ndim > 3 or kernel_ndim < 2 or kernel_ndim > 3:
        raise Exception("<var> and <kernel> dimension should be in 2 or 3.")

    if var_ndim == 2 and kernel_ndim == 3:
        raise Exception("<kernel> dimension > <var>.")

    if var_ndim == 3 and kernel_ndim == 2:
        kernel = numpy.repeat(kernel[:, :, None], var.shape[2], axis=2)

    if pad > 0:
        var_pad = padArray(var, pad, 1)
    else:
        var_pad = var

    view = asStride(var_pad, kernel.shape, stride)
    #return numpy.tensordot(aa,kernel,axes=((2,3),(0,1)))
    if numpy.ndim(kernel) == 2:
        conv = numpy.sum(view*kernel, axis=(2, 3))
    else:
        conv = numpy.sum(view*kernel, axis=(2, 3, 4))

    return conv
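As a quick sanity check before the benchmarks, all three methods can reproduce the strided result from the question on the small 2D example above. Note that conv3D uses fftconvolve, which flips the kernel (true convolution), while conv3D2 and conv3D3 compute a plain sliding sum of products, so the kernel is pre-flipped for the first call in this sketch:
# arr and arr2 are the 7x7 input and 3x3 kernel from the question above
print(conv3D(arr, arr2[::-1, ::-1], stride=2))   # FFT result is float, ~[[91, 100, 88], ...]
print(conv3D2(arr, arr2, stride=2))              # [[ 91 100  88] [ 69  91 117] [ 44  72  74]]
print(conv3D3(arr, arr2, stride=2))              # same as conv3D2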
I did 3 sets of comparisons:
Comparison 1: convolution on 2D data, with different input sizes and different kernel sizes, stride=1, pad=0. Results below (color indicates the time used for the convolution, repeated 10 times):
So "FFT conv" is in general the fastest. "Special conv" and "Stride-view conv" get slow as kernel size increases, but decreases again as it approaches the size of input data. The last subplot shows the fastest method, so the big triangle of purple indicates FFT being the winner, but note there is a thin green column on the left side (probably too small to see, but it's there), suggesting that "Special conv" has advantage for very small kernels (smaller than about 5x5). And when kernel size approaches input, "stride-view conv" is fastest (see the diagonal line).
Comparison 2: convolution on 3D data.
Setup: pad=0, stride=2, input dimension=nxnx5, kernel shape=fxfx5.
I skipped computations of "Special conv" and "Stride-view conv" when the kernel size is in the middle of the input size range. Basically "Special conv" shows no advantage now, and "Stride-view conv" is faster than FFT for both small and large kernels.
One additional note: when the size goes above 350, I notice considerable memory usage peaks for "Stride-view conv".
Comparison 3: convolution on 3D data with larger stride.
Setup: pad=0, stride=5, input dimension=nxnx10, kernel shape=fxfx10.
This time I omitted "Special conv". For a larger area "Stride-view conv" surpasses FFT, and the last subplot shows that the difference approaches 100 %.
This is probably because as the stride goes up, the FFT approach computes more values that are then thrown away, so "Stride-view conv" gains a greater advantage for both small and large kernels.
Here is an O(N^d (log N)^d) fft-based approach. The idea is to chop up both operands into strides-spaced grids at all offsets modulo strides, do the conventional fft convolution between grids of corresponding offsets and then pointwise sum the results. It is a bit index-heavy but I'm afraid that can't be helped:
import numpy as np
from numpy.fft import fftn, ifftn
def strided_conv_2d(x, y, strides):
    s, t = strides
    # consensus dtype
    cdt = (x[0, 0, ...] + y[0, 0, ...]).dtype
    xi, xj = x.shape
    yi, yj = y.shape
    # round up modulo strides
    xk, xl, yk, yl = map(lambda a, b: -a//b * -b, (xi, xj, yi, yj), (s, t, s, t))
    # zero pad to avoid circular convolution
    xp, yp = (np.zeros((xk+yk, xl+yl), dtype=cdt) for i in range(2))
    xp[:xi, :xj] = x
    yp[:yi, :yj] = y
    # fold out strides
    xp = xp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    yp = yp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
    # do conventional fft convolution
    xf = fftn(xp, axes=(0, 2))
    yf = fftn(yp, axes=(0, 2))
    result = ifftn(xf * yf.conj(), axes=(0, 2)).sum(axis=(1, 3))
    # restore dtype
    if cdt in (int, np.int_, np.int64, np.int32):
        result = result.real.round()
    return result.astype(cdt)
arr = np.array([[2,3,7,4,6,2,9],
[6,6,9,8,7,4,3],
[3,4,8,3,8,9,7],
[7,8,3,6,6,3,4],
[4,2,1,8,3,4,6],
[3,2,4,1,9,8,3],
[0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
[1,0,2],
[-1,0,3]])
print(strided_conv_2d(arr, arr2, (2, 2)))
Result:
[[ 91 100 88 23 0 29]
[ 69 91 117 19 0 38]
[ 44 72 74 17 0 22]
[ 16 53 26 12 0 0]
[ 0 0 0 0 0 0]
[ 19 11 21 -9 0 6]]
As far as I know, there is no direct implementation of a convolution filter in numpy or scipy that supports stride and padding, so I think it's better to use a DL package such as torch or tensorflow and then cast the final result back to numpy. A torch implementation might be:
import numpy as np
import torch
import torch.nn.functional as F

arr = torch.tensor(np.expand_dims(arr, axis=(0, 1)))
arr2 = torch.tensor(np.expand_dims(arr2, axis=(0, 1)))
output = F.conv2d(arr, arr2, stride=2, padding=0)
output = output.numpy().squeeze()
The result:
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Here is a convolution that supports strides and dilation; numpy.lib.stride_tricks.as_strided is used.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv_view(X, F_s, dr, std):
    X_s = np.array(X.shape)
    F_s = np.array(F_s)
    dr = np.array(dr)
    Fd_s = (F_s - 1) * dr + 1
    if np.any(Fd_s > X_s):
        raise ValueError('(Dilated) filter size must be smaller than X')
    std = np.array(std)
    X_ss = np.array(X.strides)
    Xn_s = (X_s - Fd_s) // std + 1
    Xv_s = np.append(Xn_s, F_s)
    Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
    return as_strided(X, Xv_s, Xv_ss, writeable=False)

def convolve_stride(X, F, dr=None, std=None):
    if dr is None:
        dr = np.ones(X.ndim, dtype=int)
    if std is None:
        std = np.ones(X.ndim, dtype=int)
    if not (X.ndim == F.ndim == len(dr) == len(std)):
        raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
    Xv = conv_view(X, F.shape, dr, std)
    return np.tensordot(Xv, F, axes=X.ndim)
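For example, on the arrays from the question above (a sketch; note that, like the original loop, this computes a sliding sum of products and does not flip the kernel):
out = convolve_stride(arr, arr2, std=[2, 2])
print(out)
# [[ 91 100  88]
#  [ 69  91 117]
#  [ 44  72  74]]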
I have already found two solutions for strided moving windows which can compute mean, max, min, variance, etc. Now, I am looking to add a count-of-unique-values function by axis. By axis, I mean computing all the 2D arrays in a single pass.
len(numpy.unique(array)) can do it, but a lot of iterations would be needed to compute all the arrays. I may work with images as big as 2000 x 2000, so iterations are not a good option. It's all about performance and memory effectiveness.
Here are the two solutions for the strided moving windows:
The first is taken directly from Erik Rigtorp's post at http://www.mail-archive.com/numpy-discussion#scipy.org/msg29450.html
import numpy as np
def rolling_window_lastaxis(a, window):
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def rolling_window(a, window):
    if not hasattr(window, '__iter__'):
        return rolling_window_lastaxis(a, window)
    for i, win in enumerate(window):
        if win > 1:
            a = a.swapaxes(i, -1)
            a = rolling_window_lastaxis(a, win)
            a = a.swapaxes(-2, i)
    return a

filtsize = (3, 3)
a = np.zeros((10, 10), dtype=float)
a[5:7, 5] = 1
b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)
The second is from Alex Rogozhnikov at http://gozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html.
def compute_window_mean_and_var_strided(image, window_w, window_h):
    w, h = image.shape
    strided_image = np.lib.stride_tricks.as_strided(image,
        shape=[w - window_w + 1, h - window_h + 1, window_w, window_h],
        strides=image.strides + image.strides)
    # important: trying to reshape the image will create a complete 4-dimensional copy
    means = strided_image.mean(axis=(2, 3))
    mean_squares = (strided_image ** 2).mean(axis=(2, 3))
    maximums = strided_image.max(axis=(2, 3))

    variations = mean_squares - means ** 2
    return means, maximums, variations

image = np.random.random([500, 500])
compute_window_mean_and_var_strided(image, 20, 20)
Is there a way to add/implement a count-of-unique-values function in one or both solutions?
Clarification: Basically, I need a unique-value filter for a 2D array, just like numpy.ndarray.mean.
Thank you
Alex
Here's one approach with scikit-image's view_as_windows for efficient sliding window extraction.
Steps involved :
Get sliding windows.
Reshape into 2D array. Note that this would make a copy and thus we would lose the efficiency of views, but keep it vectorized.
Sort along the axis of merged block axes.
Get the difference along that axis and count the number of changes; adding 1 gives the count of unique values in each of those sliding windows, and hence the final expected result.
The implementation would be like so -
from skimage.util import view_as_windows as viewW
def sliding_uniq_count(a, BSZ):
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a, BSZ)
    a_slid2D = np.sort(a_slid4D.reshape(-1, np.prod(BSZ)), axis=1)
    return ((a_slid2D[:, 1:] != a_slid2D[:, :-1]).sum(1)+1).reshape(out_shp)
Sample run -
In [233]: a = np.random.randint(0,10,(6,7))
In [234]: a
Out[234]:
array([[6, 0, 5, 7, 0, 8, 5],
[3, 0, 7, 1, 5, 4, 8],
[5, 0, 5, 1, 7, 2, 3],
[5, 1, 3, 3, 7, 4, 9],
[9, 0, 7, 4, 9, 1, 1],
[7, 0, 4, 1, 6, 3, 4]])
In [235]: sliding_uniq_count(a, [3,3])
Out[235]:
array([[5, 4, 4, 7, 7],
[5, 5, 4, 6, 7],
[6, 6, 6, 6, 6],
[7, 5, 6, 6, 6]])
Hybrid approach
To make it work with very large arrays, to accommodate everything into memory, we might have to keep one loop that would iterate along each row of the input data, like so -
def sliding_uniq_count_oneloop(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    a_slid4D = viewW(a, BSZ)
    out = np.empty(out_shp, dtype=int)
    for i in range(a_slid4D.shape[0]):
        a_slid2D_i = np.sort(a_slid4D[i].reshape(-1, S), -1)
        out[i] = (a_slid2D_i[:, 1:] != a_slid2D_i[:, :-1]).sum(-1)+1
    return out
Hybrid approach - Version II
Another version of hybrid one, with the explicit usage of np.lib.stride_tricks.as_strided -
def sliding_uniq_count_oneloop(a, BSZ):
    S = np.prod(BSZ)
    out_shp = np.asarray(a.shape) - BSZ + 1
    strd = np.lib.stride_tricks.as_strided
    m, n = a.strides
    N = out_shp[1]
    out = np.empty(out_shp, dtype=int)
    for i in range(out_shp[0]):
        a_slid3D = strd(a[i], shape=((N,) + tuple(BSZ)), strides=(n, m, n))
        a_slid2D_i = np.sort(a_slid3D.reshape(-1, S), -1)
        out[i] = (a_slid2D_i[:, 1:] != a_slid2D_i[:, :-1]).sum(-1)+1
    return out
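A quick consistency sketch (assuming the functions above are defined; both hybrid variants share the same name, so whichever was defined last is used):
a = np.random.randint(0, 10, (60, 70))
out_full = sliding_uniq_count(a, [3, 3])
out_loop = sliding_uniq_count_oneloop(a, [3, 3])
print(np.array_equal(out_full, out_loop))  # expected: True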
np.mean operates on a given axis without making any copies. Looking at just the shape of the as_strided array it looks much bigger than the original array. But because each 'window' is a view, it doesn't take up any additional space. Reduction operators like mean work fine with that kind of view.
But note that your second example warns about reshape. That creates a copy; it replicates the values in all of those windows.
unique starts with
ar = np.asanyarray(ar).flatten()
so right off the bat it makes a flattened copy. It's a copy, and 1d. Then it sorts the elements, looks for duplicates, etc.
There are ways of finding unique rows, but they require converting rows into large structured array elements. In effect turning a 2d array into a 1d that unique can work with.
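For reference, the row trick mentioned above can look something like this (a hedged sketch, not part of the original answer): each contiguous row is viewed as a single void element so np.unique can treat it as one value. In newer NumPy versions, np.unique(a, axis=0) does this directly.
import numpy as np

a = np.array([[1, 2], [3, 4], [1, 2]])
# view each contiguous row as one opaque (void) element
rows = np.ascontiguousarray(a).view(
    np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, idx = np.unique(rows, return_index=True)
print(a[np.sort(idx)])  # the unique rows, in order of first appearance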
I have a number of different size arrays with a common index.
For example,
Arr1 = np.arange(0, 1000, 1).reshape(100, 10)
Arr2 = np.arange(0, 500, 1).reshape(100,5)
Arr1.shape = (100, 10)
Arr2.shape = (100, 5)
I want to add these together into a new Array, Arr3 which is three dimensional. e.g.
Arr3 = Arr1 + Arr2
Arr3.shape = (100, 10, 5)
Note, in this instance the values should align, e.g.
Arr3[10, 3, 2] = Arr1[10, 3] + Arr2[10, 2]
I have been attempting to use the following method
test = Arr1.copy()
test = test[:, np.newaxis] + Arr2
Now, I've been able to make this work when adding two square matrices together.
m = np.arange(0, 100, 1)
[x, y] = np.meshgrid(m, m)
x.shape = (100, 100)
test44 = x.copy()
test44 = test44[:, np.newaxis] + x
test44.shape = (100, 100, 100)
test44[4, 3, 2] = 4
x[4, 2] = 2
x[3, 2] = 2
However, in my actual program I will not have square matrices for this issue.
In addition this method is extremely memory intensive as evidenced when you begin moving up the number of dimensions as follows.
test44 = test44[:, :, np.newaxis] + x
test44.shape = (100, 100, 100, 100)
# Note this next command will fail with a memory error on my computer.
test44 = test44[:, :, :, np.newaxis] + x
So my question has two parts:
Is it possible to create a 3D array from two differently shaped 2D arrays with a common "shared" axis?
Is such a method extensible to higher-order dimensions?
Any assistance is greatly appreciated.
Yes, what you're trying to do is called broadcasting; it's done automatically by numpy if the inputs have the right shapes. Try this:
Arr1 = Arr1.reshape((100, 10, 1))
Arr2 = Arr2.reshape((100, 1, 5))
Arr3 = Arr1 + Arr2
I've found this to be a really good introduction to broadcasting which should show you how to extend this kind of behavior to n dimensions.
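A short sketch of what this looks like, including the alignment check from the question and the same pattern extended to a hypothetical third array Arr4 sharing the same first axis:
import numpy as np

Arr1 = np.arange(0, 1000, 1).reshape(100, 10)
Arr2 = np.arange(0, 500, 1).reshape(100, 5)

Arr3 = Arr1[:, :, np.newaxis] + Arr2[:, np.newaxis, :]
print(Arr3.shape)                                    # (100, 10, 5)
print(Arr3[10, 3, 2] == Arr1[10, 3] + Arr2[10, 2])   # True

# extending the pattern: each array gets its own output axis
Arr4 = np.arange(0, 700, 1).reshape(100, 7)          # hypothetical third array
Arr5 = Arr1[:, :, None, None] + Arr2[:, None, :, None] + Arr4[:, None, None, :]
print(Arr5.shape)                                    # (100, 10, 5, 7)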