How to evaluate the sum of values within array blocks - python

I have a data array with shape 100x100. I want to divide it into 5x5 blocks, where each block covers 20x20 grid points. The value I want for each block is the sum of all values inside it.
Is there a more elegant way to accomplish it?
import numpy as np

x = np.arange(100)
y = np.arange(100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X)*np.sin(Y)
Z_new = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        Z_new[i, j] = np.sum(Z[i*20:(i+1)*20, j*20:(j+1)*20])
This is based on indices; what if I want to do it based on x instead?
x = np.linspace(0, 1, 100)
y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X)*np.sin(Y)
x_new = np.linspace(0, 1, 15)
y_new = np.linspace(0, 1, 15)
Z_new?

Simply reshape, splitting each of the two axes into two axes of lengths 5 and 20 to form a 4D array, and then sum-reduce along the axes of length 20, like so -
Z_new = Z.reshape(5,20,5,20).sum(axis=(1,3))
Functionally the same, but potentially faster option with np.einsum -
Z_new = np.einsum('ijkl->ik',Z.reshape(5,20,5,20))
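To see why the reshape groups the blocks correctly, here is a minimal sanity check on a small array (an assumed 4x4 example with 2x2 blocks, not part of the original answer):
import numpy as np
a = np.arange(16).reshape(4, 4)
# reshape(H, m//H, W, n//W): axes 1 and 3 index positions inside each block
blocks = a.reshape(2, 2, 2, 2).sum(axis=(1, 3))
# blocks[0, 0] equals the sum of the top-left 2x2 block of a
assert blocks[0, 0] == a[:2, :2].sum()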
Generic block size
Extending to a generic case -
H,W = 5,5 # number of blocks along each axis (each block is m//H x n//W)
m,n = Z.shape
Z_new = Z.reshape(H,m//H,W,n//W).sum(axis=(1,3))
With einsum that becomes -
Z_new = np.einsum('ijkl->ik',Z.reshape(H,m//H,W,n//W))
To compute the average/mean across blocks, use mean instead of sum.
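For instance (assuming H, W, m, n as defined above):
Z_mean = Z.reshape(H, m//H, W, n//W).mean(axis=(1, 3))  # (5, 5) array of block averages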
Generic block size and reduction operation
Extending to generic reduction operations, i.e. ufuncs whose axis parameter accepts a tuple of axes, it would be -
def blockwise_reduction(a, height, width, reduction_func=np.sum):
    m, n = a.shape
    a4D = a.reshape(height, m//height, width, n//width)
    return reduction_func(a4D, axis=(1, 3))
Thus, to solve our specific case, it would be:
blockwise_reduction(Z, height=5, width=5)
and for a block-wise average computation, it would be -
blockwise_reduction(Z, height=5, width=5, reduction_func=np.mean)
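The same helper also covers other reductions whose axis parameter accepts a tuple, e.g. a block-wise maximum (an assumed extra example, not from the original answer):
blockwise_reduction(Z, height=5, width=5, reduction_func=np.max)  # per-block maxima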

You can do the following.
t = np.eye(5).repeat(20, axis=1)
Z_new = t.dot(Z).dot(t.T)
This is correct because Z_new[i, j] = t[i, k] * Z[k, l] * t[j, l], summed over k and l (t[i, k] is 1 exactly when row k falls in block-row i).
Also this seems faster than Divakar's solution.
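A minimal sketch of this matrix-multiplication approach, together with a check against the reshape-based result (the check is an added assumption, not part of the original answer):
import numpy as np
t = np.eye(5).repeat(20, axis=1)   # shape (5, 100); row i is 1 on the columns of block-row i
Z_new_mm = t.dot(Z).dot(t.T)       # (5, 100) @ (100, 100) @ (100, 5) -> (5, 5)
assert np.allclose(Z_new_mm, Z.reshape(5, 20, 5, 20).sum(axis=(1, 3)))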

Such a problem is a very good candidate for a function like scipy.ndimage.measurements.sum, since it lets you sum over labelled regions of an array. You will get what you want with something like:
labels = [[5*(y//20) + x//20 for x in range(100)] for y in range(100)]  # one label (0..24) per 20x20 block
s = scipy.ndimage.measurements.sum(Z, labels, range(25))
(Not tested, but that is the idea).
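A small, self-contained sketch of that idea (written here as an assumed illustration; it uses scipy.ndimage.sum, the same function exposed at the top level of scipy.ndimage):
import numpy as np
from scipy import ndimage
rows, cols = np.indices((100, 100))
labels = 5 * (rows // 20) + cols // 20               # one integer label (0..24) per 20x20 block
s = ndimage.sum(Z, labels, index=np.arange(25)).reshape(5, 5)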

Related

Spatial Encoding (sum of elements within a specific region in a numpy array) [duplicate]


How to create a multi-dimensional grid in python

I have seen similar questions, but none that require the output array to have shape (numpoints, dim).
Here is a simple example of what I have for dim=2
import numpy as np
import matplotlib.pyplot as plt

bounds = [0.5, 0.5]
n = [10, 10]
dim = 2
x = np.linspace(-bounds[0], bounds[0], n[0])
y = np.linspace(-bounds[1], bounds[1], n[1])
X, Y = np.meshgrid(x, y)
s = X.shape
data = np.zeros((n[0]*n[1], dim))
# convert mesh into point vector for which the model can be evaluated
c = 0
for i in range(s[0]):
    for j in range(s[1]):
        data[c, 0] = X[i, j]
        data[c, 1] = Y[i, j]
        c = c + 1
plt.scatter(data[:, 0], data[:, 1])
Is there a faster/better way of doing this so that the data are arranged in this way? I want a general method that could work for any dim.
Edit: Suggested answer does not work.
Yeah, that can be vectorized with
axis_coords = np.meshgrid(x, y, indexing='xy')
data = np.hstack([c.reshape(-1, 1) for c in axis_coords])
c.reshape(-1, 1) just reshapes c from HxW to (H*W)x1 so that it can be stacked horizontally.
Note - if you're looking to generalize to more dims you probably want to switch to indexing='ij' so it's arranged by (row, column, dim2, dim3, ...) rather than (column, row, dim2, dim3, ...), since in numpy rows are considered the 0th dimension and columns the 1st.
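For instance, a sketch of the generalization that note suggests, with a hypothetical helper name and indexing='ij' (not from the original answer):
import numpy as np

def grid_points(*axes):
    # one 1-D coordinate array per dimension; returns an array of shape (numpoints, dim)
    mesh = np.meshgrid(*axes, indexing='ij')
    return np.stack(mesh, axis=-1).reshape(-1, len(axes))

pts = grid_points(np.linspace(-0.5, 0.5, 10), np.linspace(-0.5, 0.5, 10))  # shape (100, 2)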
I managed to solve my problem with this function that is general enough for any dim:
def get_grid_of_points(n, *args):
    ls = [np.linspace(-i, i, n) for i in args]
    mesh_ls = np.meshgrid(*ls)
    all_mesh = [np.reshape(x, [-1]) for x in mesh_ls]
    grid_points = np.stack(all_mesh, axis=1)
    return grid_points
get_grid_of_points(10, 0.5, 0.5)

Poor parallelization using dask

I have a 2D grid on which there is a path. I want to calculate the distance from each point of the grid to each point on the path, then do some operations on those distances. I am using dask.dataframe and dask.array for this task.
The code is:
import numpy as np
import dask.dataframe as dd
import dask.array as da

x = np.linspace(-60, 60, 10000)
xv, yv = da.meshgrid(x, x, sparse=True)
path = da.from_array(np.random.rand(100, 2))
npath = path.shape[0]   # 100 path points
h = 100.0

# function to calculate the distance of every grid point to a single path point p (at height h)
def dist_to_point(x, y, p):
    x_dist = x - p[0]
    y_dist = y - p[1]
    dist = da.sqrt(x_dist**2 + y_dist**2)
    d2 = da.sqrt(dist**2 + h**2)
    return dd.from_dask_array(d2)

distances = [dist_to_point(xv, yv, path[i, :]) for i in range(npath)]
distances_grid = dd.multi.concat(distances, axis=1, ignore_index=True)
So distances_grid should be the concatenation of [grid distance to point 1, grid distance to point 2, ..., grid distance to point 100].
Now suppose I want to get the max across all dataframes; I apply this:
l_max = distances_grid.map_partitions(lambda x: x.groupby(level=0, axis=1).max())
The resulting dask graph (image omitted) does not look to me like proper parallelization of the tasks. Can anyone point me to what I am doing wrong or how I can improve this? My final application will be on 100000x100000 grids, hence the use of dask.
In case anyone runs into this: I solved it by broadcasting the arrays and avoiding the for loop altogether. The code I ended up using is
import numpy as np
import dask.array as da

x = da.from_array(np.linspace(-60, 60, 10000), chunks=1000)
xv, yv = da.meshgrid(x, x, sparse=True)
path = da.from_array(np.random.rand(10, 2))
h = 100.0
ngrid = x.shape[0]

xd = x[:, np.newaxis] - path[:, 0]
yd = x[:, np.newaxis] - path[:, 1]
# squared euclidean distance at height h = 100; broadcasting gives shape (ngrid, ngrid, npath)
z = xd**2 + yd[:, np.newaxis]**2 + h**2
distances_grid = z**0.5
l_max = distances_grid.max(axis=2)
This gave me a nicer graph which I am able to balance even more by changing the sizes of the chunks.
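As a short follow-up sketch (assumed usage, relying on the variables defined above and on graphviz being installed for the visualization):
l_max.visualize(filename='graph.svg')   # renders the task graph that was inspected above
result = l_max.compute()                # runs the computation and returns an in-memory NumPy array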

Vectorized evaluation of a function defined by a matrix over a grid

I'm looking to plot the value of a function defined by a matrix over a grid of values.
Let S be an invertible 2x2 matrix and let x be a 2-dimensional vector. How can I vectorize the evaluation of x@S@x over a two-dimensional grid?
Here is how I currently do it. It works, but it takes a while to perform the computation since the grid is so fine.
import numpy as np
import matplotlib.pyplot as plt

# Initialize a random invertible 2x2 matrix
S = np.zeros(shape=(2, 2))
while np.linalg.matrix_rank(S) < S.shape[1]:
    S = np.random.randint(-5, 5+1, size=(2, 2))

X, Y = [j.ravel() for j in np.meshgrid(np.linspace(-2, 2, 1001), np.linspace(-2, 2, 1001))]
Z = np.zeros_like(X)
for i, v in enumerate(zip(X, Y)):
    v = np.array(v)
    Z[i] = v @ S @ v

n = int(np.sqrt(X.size))
Z = Z.reshape(n, n)
X = X.reshape(n, n)
Y = Y.reshape(n, n)
plt.contour(X, Y, Z)
Simplest would be to stack those X, Y into a 2-column 2D array and then use np.einsum to replace the loopy matrix multiplications -
p = np.column_stack((X,Y)) # or np.stack((X,Y)).T
Zout = np.einsum('ij,jk,ik->i',p,S,p,optimize=True)
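To plug that back into the plotting code from the question, Zout just needs the same reshape as before (a sketch assuming the names X, Y, Zout from above):
n = int(np.sqrt(X.size))
plt.contour(X.reshape(n, n), Y.reshape(n, n), Zout.reshape(n, n))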

numpy's interp function - how to find a value of x for a given value of y?

So I have an array of x values (in increasing order) and the corresponding y values. Numpy's interp function takes in an X value and the x and y arrays. How do I get the value of X for a given value of Y? E.g. if y = 0, x = ?
Cheers!
code:
from numpy import pi, exp, linspace, interp  # assuming these names come from numpy

j = (((1840/(2*pi))**0.5)*exp(phi)) - 1.0  # y axis; phi is a 1-D array
t = linspace(0, 40, 100)  # x axis
a = interp(<x-value>, t, j)  # this gives me the corresponding y value!
So what should I do to get the x value for a given y value!!
y_interp = np.interp(x_interp, x, y) yields an interpolation of the function y_interp = f(x_interp) based on a previous interpolation y = f(x), where x.size = y.size, x_interp.size = y_interp.size.
If you want x as a function of y, you have to construct the inverse function. As @NPE indicated, you have to make sure that x and y are always increasing. An easy way to check this is to use
np.all(np.diff(x) > 0)
np.all(np.diff(y) > 0)
Now finding the inverse function is actually very simple: you have to reverse the roles of x and y (since we deal with interpolations).
With my notations: x_value = np.interp(y_value, y, x).
With your notations: x_value = interp(y_value, j, t)
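A minimal worked example of the inverse lookup (assumed data, not taken from the question):
import numpy as np
x = np.linspace(0, 2, 50)
y = x**2 - 1                      # strictly increasing on this interval, so the inverse is well defined
x_at_y0 = np.interp(0.0, y, x)    # roles swapped: interpolate x as a function of y
# x_at_y0 is close to 1.0, where y crosses zero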
When using Numpy make sure you're using a Numpy array: np.array([])
import numpy as np

# x data array
x_data = np.array([1, 2, 3, 4])
# y data array
y_data = np.array([1, 3, 2, 1])
# the known max y value is 3
y_max = 3

# sort both arrays by increasing y so np.interp sees monotonically increasing sample points
order = y_data.argsort()
y_data = y_data[order]
x_data = x_data[order]

# call the interpolation function with the roles of the datasets reversed to find the corresponding x value
x = np.interp(y_max, y_data, x_data, left=None, right=None, period=None)
For this example the result is x = 2 when max y = 3. The maximum is (2,3).
Furthermore you can append data to a numpy array: x_data = np.append(x_data, appendedValue)
