Applying a Fast Coordinate Transformation in Python

I have a simple 2x2 transformation matrix, s, which encodes some linear transformation of coordinates such that X' = sX.
I have generated a set of uniformly distributed coordinates on a grid using the np.meshgrid() function, and at the moment I traverse each coordinate and apply the transformation one coordinate at a time. Unfortunately, this is very slow for large arrays. Are there any fast ways of doing this? Thanks!
import numpy as np
image_dimension = 1024
image_index = np.arange(0, image_dimension, 1)
xx, yy = np.meshgrid(image_index, image_index)
# Pre-calculated transformation matrix.
s = np.array([[-2.45963439e+04, -2.54997726e-01], [3.55680731e-02, -2.48005486e+04]])
xx_f = xx.flatten()
yy_f = yy.flatten()
for x_t in range(0, image_dimension * image_dimension):
    # Get the current (x, y) coordinate.
    x_y_in = np.matrix([[xx_f[x_t]], [yy_f[x_t]]])
    # Perform the transformation with s.
    optout = s * x_y_in
    # Store the new coordinate.
    xx_f[x_t] = np.array(optout)[0][0]
    yy_f[x_t] = np.array(optout)[1][0]
# Reshape output.
xx_t = xx_f.reshape((image_dimension, image_dimension))
yy_t = yy_f.reshape((image_dimension, image_dimension))

You can use the numpy dot function to get the dot product of your matrices:
xx_tn, yy_tn = np.dot(s, [xx.flatten(), yy.flatten()])
xx_t = xx_tn.reshape((image_dimension, image_dimension))
yy_t = yy_tn.reshape((image_dimension, image_dimension))
which is much faster.

Loops are slow in Python. It is better to use vectorization.
In a nutshell, the idea is to let numpy do the loops in C, which is much faster.
You can express your problem as a single matrix multiplication X' = sX, where you put all the points in the columns of X and transform them all with one call to numpy's dot product:
xy = np.vstack([xx.ravel(), yy.ravel()])
xy_t = np.dot(s, xy)
xx_t, yy_t = xy_t.reshape((2, image_dimension, image_dimension))
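As a quick spot check (a sketch reusing the names above; with the default meshgrid layout, the grid point in row y0, column x0 has coordinates (x0, y0)):
# Hypothetical spot check of the vectorized result against a direct
# matrix-vector product for one point.
x0, y0 = 3, 7
expected = s @ np.array([x0, y0])
assert np.allclose([xx_t[y0, x0], yy_t[y0, x0]], expected)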

Related

Memory Efficient Nearest Neighbour Algorithm

I have 1,000,000 agents, each associated with (x, y) coordinates. I am trying to find agents close to each other (radius = 1.5). I tried to implement this using PyTorch:
X = torch.DoubleTensor(1000000,2).uniform_(0,10000)
torch.cdist(X,X,p=2)
However, with this the session crashes. I am running this on Google Colab. The same happened when I tried constructing the graph using radius_neighbors_graph from the scikit-learn package. It would be of great help if someone suggested a memory-efficient way to implement the same.
It's unlikely that you'll be able to compute a 1M x 1M matrix in its entirety without thinking it through very carefully. You probably want something along the lines of scipy.spatial.KDTree. Once you've constructed a tree, you can pass the coordinates of an agent to the query method to get its neighbors within a certain radius. To get all the neighbors at once, you can compute something like the sparse_distance_matrix of the tree with itself at an appropriate threshold.
Alternatively, you can look into any number of efficient clustering algorithms.
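A minimal sketch of that approach (assuming agents scattered uniformly as in the question; cKDTree is the compiled implementation of the same interface):
import numpy as np
from scipy.spatial import cKDTree

# 1,000,000 agents with random (x, y) coordinates, as in the question.
coords = np.random.uniform(0, 10000, size=(1_000_000, 2))
tree = cKDTree(coords)
# Neighbors of one agent within radius 1.5 (a list of indices into coords).
single = tree.query_ball_point(coords[0], r=1.5)
# All pairwise distances under the threshold, as a sparse matrix.
pairs = tree.sparse_distance_matrix(tree, max_distance=1.5)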
I found three solutions,
Solution 1
import torch
from pykeops.torch import LazyTensor

x = torch.randn(3000000, 2).cuda()
y = x
# Turn our tensors into KeOps symbolic variables:
x_i = LazyTensor(x[:, None, :])
y_j = LazyTensor(y[None, :, :])
# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)
D_ij.argKmin(20, dim=1)
Solution 2
import numpy as np
from pykeops.numpy import LazyTensor as LazyTensor_np

M = 3000000
x = np.random.rand(M, 2)
y = x
x_i = LazyTensor_np(x[:, None, :])  # (M, 1, 2) KeOps LazyTensor, wrapped around the numpy array x
y_j = LazyTensor_np(y[None, :, :])  # (1, M, 2) KeOps LazyTensor, wrapped around the numpy array y
D_ij = ((x_i - y_j) ** 2).sum(-1)  # **symbolic** (M, M) matrix of squared distances
s_i = D_ij.argKmin(20, dim=1).ravel()  # genuine (M,) array of integer indices
Solution 3
from sklearn.neighbors import NearestNeighbors
import numpy as np
M = 3000000
x = np.random.rand(M, 2)
nbrs = NearestNeighbors(n_neighbors=20, algorithm='ball_tree').fit(x)
distances, indices = nbrs.kneighbors(x)
All three solutions take roughly the same time to run (about a minute), while the memory requirements are approximately 2 GB, 1 GB and 1.3 GB, respectively. It would be great to hear ideas to lower the execution time.
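One more candidate worth timing (a sketch, not from the original answers; scipy's cKDTree can batch the k-nearest query over all points in one call):
import numpy as np
from scipy.spatial import cKDTree

M = 3000000
x = np.random.rand(M, 2)
tree = cKDTree(x)
# 20 nearest neighbors of every point in one batched query.
distances, indices = tree.query(x, k=20)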

Distance matrix between two point layers

I have two arrays containing point coordinates as shapely.geometry.Point with different sizes.
Eg:
[Point(X Y), Point(X Y)...]
[Point(X Y), Point(X Y)...]
I would like to create a "cross product" of these two arrays with a distance function. The distance function is from shapely.geometry and is a simple vector distance calculation between two geometries. I am trying to create a distance matrix between M:N points:
Right now I have this function:
import geopandas as gpd
import numpy as np
import pandas as pd

source = gpd.read_file(source)
near = gpd.read_file(near)
source_list = source.geometry.values.tolist()
near_list = near.geometry.values.tolist()
array = np.empty((len(source.ID_SOURCE), len(near.ID_NEAR)))
for index_source, item_source in enumerate(source_list):
    for index_near, item_near in enumerate(near_list):
        array[index_source, index_near] = item_source.distance(item_near)
df_matrix = pd.DataFrame(array, index=source.ID_SOURCE, columns=near.ID_NEAR)
Which does the job fine, but is slow. 4000 x 4000 points takes around 100 seconds (I have datasets which are way bigger, so speed is the main issue). I would like to avoid this double loop if possible. I tried to do it in a pandas dataframe as in the following (which has terrible speed):
for index_source, item_source in source.iterrows():
    for index_near, item_near in near.iterrows():
        df_matrix.at[index_source, index_near] = item_source.geometry.distance(item_near.geometry)
A bit faster is (but still 4x slower than numpy):
for index_source, item_source in enumerate(source_list):
    for index_near, item_near in enumerate(near_list):
        df_matrix.at[index_source, index_near] = item_source.distance(item_near)
Is there a faster way to do this? I guess there is, but I have no idea how to proceed. I might be able to chunk the dataframe into smaller pieces, send each chunk to a different core and concat the results - that is the last resort. If we can somehow use numpy only, with some indexing magic, I can send it to the GPU and be done with it in no time. But the double for loop is a no-go right now. Also, I would like to not use any library other than Pandas/Numpy. I can use SAGA processing and its Point distances module (http://www.saga-gis.org/saga_tool_doc/2.2.2/shapes_points_3.html), which is pretty damn fast, but I am looking for a Python-only solution.
If you can get the coordinates in separate vectors, I would try this:
import numpy as np
x = np.asarray([5.6, 2.1, 6.9, 3.1])  # Replace with data
y = np.asarray([7.2, 8.3, 0.5, 4.5])  # Replace with data
# Broadcast column vectors against row vectors to get all pairwise differences.
x_i = x[:, np.newaxis]
x_j = x[np.newaxis, :]
y_i = y[:, np.newaxis]
y_j = y[np.newaxis, :]
d = (x_i - x_j)**2 + (y_i - y_j)**2
np.sqrt(d, out=d)  # Square root in place to avoid an extra array copy.
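If the inputs are GeoPandas point layers as in the question, the coordinate vectors can be pulled out with the GeoSeries .x and .y accessors (a sketch), and the whole M x N matrix computed in one broadcast:
sx = source.geometry.x.values  # (M,) x coordinates of the source points
sy = source.geometry.y.values
nx = near.geometry.x.values    # (N,) x coordinates of the near points
ny = near.geometry.y.values
# (M, N) matrix of Euclidean distances via broadcasting.
dist = np.sqrt((sx[:, None] - nx[None, :])**2 + (sy[:, None] - ny[None, :])**2)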

Fastest way to convert a set of 3D points into image of heights in python

I am trying to convert a set of 3D points into a heightmap (a 2d image that shows the largest displacements of the points from the floor).
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap; this method is quite slow.
import numpy as np
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x, y, z] for x in np.random.uniform(0, 2, 100) for y in np.random.uniform(0, 2, 100) for z in np.random.uniform(0, 2, 100)])
heightmap = np.zeros((int(np.max(points[:, 1]) / heightmap_resolution) + 1,
                      int(np.max(points[:, 0]) / heightmap_resolution) + 1))
for point in points:
    y = int(point[1] / heightmap_resolution)
    x = int(point[0] / heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, you probably need to check again whether numpy has an operation for it. I saw you wanted to compare items to get the max, and I wasn't sure whether the structure was important, so I changed it.
The second point is that heightmap pre-allocates a lot of memory you aren't going to use. Try using a dictionary with a tuple (x, y) as the key, or this (a dataframe):
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
#didn't know if you wanted to keep the x and y columns so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
points_df.groupby(['x_normalized','y_normalized'])['z'].max()
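If you would rather keep the original 2D array output, a sketch using np.maximum.at (an unbuffered ufunc reduction) removes the Python loop while preserving the structure:
import numpy as np
# Assuming 'points' and 'heightmap_resolution' from the question.
ix = (points[:, 0] / heightmap_resolution).astype(int)
iy = (points[:, 1] / heightmap_resolution).astype(int)
heightmap = np.zeros((iy.max() + 1, ix.max() + 1))
# Accumulate the per-cell maximum of z in a single call.
np.maximum.at(heightmap, (iy, ix), points[:, 2])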

Simultaneously fit linearly every line of a 2d numpy array

I am working in Python on image analysis. I have an image (2d numpy array) with some intensity drift in it. I want to level it.
To remove the increasing/decreasing intensity over the width of the image, I want to fit every row of the 2d numpy array with a line. I however do not want to loop through every row index.
MWE:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
width = 1500
height = 2500
fill_fun = lambda x, a, b: a*x + b
play_image = fill_fun(np.tile(np.arange(width), (height, 1)), 0.15, 2) + np.random.random((height, width))
#For representation purposes:
#plt.imshow(play_image, cmap='Greys_r')
#plt.show()
#1) Fit every row and kill the intensity decrease/increase tendency
fit_func = lambda p, x: p[0]*x + p[1]
errfunc = lambda p, x, y: abs(fit_func(p, x) - y)  # Distance to the target function
x_axis = np.linspace(0, width, width)
for i in range(height):
    row_val = play_image[i, :]
    p0 = [(row_val[-1] - row_val[0])/float(width), row_val[0]]  # guess
    p1, success = optimize.leastsq(errfunc, p0[:], args=(x_axis, row_val))
    play_image[i, :] -= fit_func(p1, x_axis) - p1[1]
By doing this I effectively level my image intensity horizontally. Is there any way I can replace the loop with a matrix operation? To somehow fit all the lines at the same time with a (height, 2) parameter vector?
Thanks for the help
Fitting a line has a simple closed-form formula that you can use directly, which takes about three short lines in numpy (most of the code below is just making and plotting the data and fits):
import numpy as np
import matplotlib.pyplot as plt
# make the data as sequential sections of a circle
theta = np.linspace(np.pi, 0, 120)
y = np.reshape(np.sin(theta), (10, 12))
x = np.repeat(np.arange(12)[None, :], 10, axis=0)
# fit the line row-wise: beta = cov(x, y)/var(x), alpha = mean(y) - beta*mean(x)
m = lambda x: np.mean(x, axis=1)
beta = (m(y*x) - m(x)*m(y)) / (m(x*x) - m(x)**2)
alpha = m(y) - beta*m(x)
# plot the data and fits
plt.plot([y[:, i] for i in range(12)], ".")  # plot the data
plt.gca().set_prop_cycle(None)  # reset the color cycle
fits = alpha[:, None] + beta[:, None]*x  # make lines from the fits for the plots
plt.plot(fits.T)
plt.show()
You can implement the normal equations and their solution pretty easily. The main challenge is keeping track of the appropriate dimensions so all the vectorized operations work correctly. Here's one method:
import numpy as np
# image size
m = 100
n = 125
# A random image to work with.
np.random.seed(123)
img = np.random.randint(0, 100, size=(m, n))
# X is the design matrix. It is the same for each row. It has shape (n, 2).
X = np.column_stack((np.ones(n), np.arange(n)))
# A is X.T.dot(X), but in this case we can use an explicit formula for each term.
s1 = 0.5*n*(n - 1) # Sum of integers
s2 = n*(n - 0.5)*(n - 1)/3.0 # Sum of squared integers
A = np.array([[n, s1], [s1, s2]])
# Y has shape (2, m). Each column is a vector on the right-hand-side of the
# normal equations.
Y = X.T.dot(img.T)
# Solve the normal equations. beta has shape (2, m). Each column gives the
# coefficients of the linear fit for each row of img.
beta = np.linalg.solve(A, Y)
# Create an array that holds the linear drift for each row.
# X has shape (n, 2) and beta has shape (2, m), so row_drift has shape (m, n),
# the same as img.
row_drift = X.dot(beta).T
# Remove the drift from img.
img2 = img - row_drift
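As a quick cross-check (a sketch, not part of the answer above), np.polyfit fits every column of a 2D array at once and should reproduce the same coefficients:
# polyfit returns [slope, intercept] (highest degree first), while beta
# holds [intercept, slope], hence the reversal before comparing.
coeffs = np.polyfit(np.arange(n), img.T, 1)  # shape (2, m)
assert np.allclose(coeffs[::-1], beta)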

Pass coordinates of 2D Numpy pixel array to distance function

I'm working on an image processing program with OpenCV and numpy. For most pixel operations, I'm able to avoid nested for loops by using np.vectorize(), but one of the functions I need to implement requires as a parameter the 'distance from center', or basically the coordinates of the point being processed.
Pseudoexample:
myArr = [[0,1,2],
         [3,4,5]]
def myFunc(val, row, col):
    return [row, col]
f = np.vectorize(myFunc)
myResult = f(myArr, row, col)
I obviously can't get elemX and elemY from the vectorized array, but is there another numpy function I could use in this situation, or do I have to use for loops? Is there a way to do it using OpenCV?
The function I need to put each pixel through is:
f(i, j) = 1/(1 + d(i, j)/L), with d(i, j) being the Euclidean distance of the point from the center of the image.
You can get an array of distances from the center using the following lines (which is one example; there are a lot of ways to do this):
import numpy as np
myArr = np.array([[0,1,2], [3,4,5]])
nx, ny = myArr.shape
# x and y are distances from the center, in units where the array is nx
# long (as opposed to normalizing its length to 1, the other common choice)
x = np.arange(nx) - (nx-1)/2.
y = np.arange(ny) - (ny-1)/2.
# indexing='ij' keeps the grid aligned with myArr's (row, column) layout
X, Y = np.meshgrid(x, y, indexing='ij')
d = np.sqrt(X**2 + Y**2)
# d =
# [[ 1.11803399  0.5  1.11803399]
#  [ 1.11803399  0.5  1.11803399]]
Then you can calculate f(i, j) by:
f = 1/(1 + d/L)
As an aside, your heavy use of np.vectorize() is a bit dubious. Are you sure it's doing what you want, and did you note this statement from the documentation:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
It's generally better to just write your code in vectorized form (like my line for f above, which will work whether L is an array or a scalar) rather than use numpy.vectorize(); these are different things.
np.vectorize doesn't accelerate the code; you can vectorize it this way:
# This computes the distance between all points of MyArray and the center
dist_vector = np.sqrt(np.sum(np.power(center - MyArray, 2), axis=1))
# F will contain the target value for each point
F = 1./(1 + 1. * dist_vector/L)
