Distances to positions on a square integer lattice - python

I want to calculate the distance from the centre of an integer lattice to every other position, and the number of positions at each distance. I'm currently using the following code to calculate this:
x = numpy.arange(-10, 11, 1)
[X, Y] = numpy.meshgrid(x, x)
R = numpy.sqrt(X**2+Y**2)
R2 = numpy.ndarray.flatten(R)
R3 = numpy.unique(R2)
r = R3[1:] # excludes the 0
Nr = numpy.zeros(numpy.size(r))
for i in range(numpy.size(r)):
    Nr[i] = numpy.count_nonzero(R2 == r[i])
This tells me that the possible distances are 1, sqrt2, 2, sqrt5, etc.
It also tells me there are 4 positions at distance 1, 4 at sqrt2, 4 at 2, 8 at sqrt5, etc.
As this is a common problem in physics I was wondering if there is a function from a library such as numpy or scipy which could return these values more easily.

The lattice is centered at (0,0), so it is symmetric across the four quadrants. We can use this restriction to our advantage: compute the unique distances and their counts for one quadrant and multiply those counts by 4 to cover all four quadrants.
So, let's use the first quadrant (upper right). We skip the elements on the y = 0 line, because otherwise the multiplication by 4 would duplicate results. This way we also won't have to exclude the first element (the zero distance), as done in the original post.
Thus, an implementation would be -
import numpy as np
N = 11 # Lattice size
xa, ya = np.ogrid[0:N,1:N] # x's: 0:N, y's: 1:N
unq_dists, count = np.unique(np.sqrt(xa**2 + ya**2), return_counts=1)
count = count*4
For a further performance boost, we could use np.unique on the squared sums and then apply np.sqrt to the unique values only. The idea is to perform the slow square-root computation on the smaller unique set, like so -
unq_dists, count = np.unique(xa**2 + ya**2, return_counts=1)
unq_dists = np.sqrt(unq_dists)
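As a quick check, assembling the whole quadrant approach (a minimal sketch with the import added, nothing beyond the snippets above) reproduces the counts quoted in the question:
import numpy as np

N = 11  # lattice size
xa, ya = np.ogrid[0:N, 1:N]
unq_sq, count = np.unique(xa**2 + ya**2, return_counts=True)
unq_dists = np.sqrt(unq_sq)   # square-root only the unique squared sums
count = count * 4             # account for all four quadrants
print(unq_dists[:4])          # [1.         1.41421356 2.         2.23606798]
print(count[:4])              # [4 4 4 8]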

After you have flattened the array, convert it to a pandas Series and count the unique values:
import pandas
distances = pandas.Series(numpy.ndarray.flatten(R))
distances.value_counts()
# 9.219544 16
# 8.062258 16
# 5.000000 12
#10.000000 12
# 7.071068 12
# 2.236068 8
# ....

Your code can be made more efficient in a few ways. You first form two large matrices and then square the terms; it would be better to square first. Also, forming a meshgrid only to perform an outer addition is unnecessary: there is numpy.add.outer. Lastly, the loop you have is made unnecessary by the return_counts=True option in numpy.unique (which, incidentally, flattens the array itself, so you don't have to). So the code shortens to three lines.
import numpy
x = numpy.arange(-10, 11, 1)
R = numpy.sqrt(numpy.add.outer(x**2, x**2))
r, Nr = numpy.unique(R, return_counts=True)
(And if you want to exclude 0 distance, return r[1:] and Nr[1:])

Related

How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

Given a 10x10 grid (2d-array) filled randomly with numbers (either 0, 1 or 2), how can I find the Euclidean distance (the l2-norm of the distance vector) between two given points while considering periodic boundaries?
Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen rather as a torus instead of a flat plane. In that case, say the centre = matrix[0,2], and we find that there is the same number in matrix[9,2], which would be at the southern boundary of the matrix. The Euclidean distance computed with my code would be for this example np.sqrt(0**2 + 9**2) = 9.0. However, because of periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if periodic boundary values are implemented correctly, distances of magnitude above 8 should not exist.
So, I would be interested on how to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus by applying a wrap-around for the boundaries.
import numpy as np
matrix = np.random.randint(0,3,(10,10))
centre = matrix[0,2]
#rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0,2] = 5
#find the points where entries are same as centre
same = np.where((matrix == centre) == True)
idx_row, idx_col = same
#find distances from centre to all values which are of same value
dist = np.zeros(len(same[0]))
for i in range(0,len(same[0])):
    delta_row = same[0][i] - 0 #row coord of centre
    delta_col = same[1][i] - 2 #col coord of centre
    dist[i] = np.sqrt(delta_row**2 + delta_col**2)
#retrieve the index of the smallest distance
idx = dist.argmin()
print('Centre value: %i. The nearest cell with same value is at (%i,%i)'
      % (centre, same[0][idx], same[1][idx]))
For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.
When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.
Then simply take whichever is smaller:
delta_row = abs(same[0][i] - 0) #row coord of centre; abs as described above
delta_row = min(delta_row, 10 - delta_row)
And similarly for delta_column.
The final dist[i] calculation needs no changes.
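The same idea also works without the Python loop. A minimal vectorized sketch, reusing matrix and centre from the question's code (10x10 grid, centre at row 0, column 2):
import numpy as np

rows, cols = np.where(matrix == centre)
d_row = np.abs(rows - 0)                 # row coordinate of centre is 0
d_col = np.abs(cols - 2)                 # column coordinate of centre is 2
d_row = np.minimum(d_row, 10 - d_row)    # wrap around the row axis
d_col = np.minimum(d_col, 10 - d_col)    # wrap around the column axis
dist = np.sqrt(d_row**2 + d_col**2)
idx = dist.argmin()
print('Nearest cell with the same value: (%i,%i)' % (rows[idx], cols[idx]))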
I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times: once for the normal distance, and 8 times with shifted copies to possibly find a closer 'torus' distance.
As n gets larger, the calculation cost can go sky high. But the torus effect is probably not needed, as there is usually a point nearby without any 'wrap around'.
You can easily test this, because for a grid of size 1, if a point is found at distance 1/2 or closer, you know there is not a closer torus point (right?)
import numpy as np
n=10000
np.random.seed(1)
A = np.random.randint(low=0, high=10, size=(n,n))
I create a 10000x10000 grid of values, and store the locations of the zeros in ONES.
ONES = np.argwhere(A == 0)
Now I define my torus distance, which is trying which of the 9 mirrors is the closest.
from sklearn.neighbors import BallTree

def distance_on_torus( point=[500,500] ):
    index_diff = [[1],[1],[0],[0],[0,1],[0,1],[0,1],[0,1]]
    coord_diff = [[-1],[1],[-1],[1],[-1,-1],[-1,1],[1,-1],[1,1]]
    tree = BallTree( ONES, leaf_size=5*n, metric='euclidean')
    dist, indi = tree.query([point], k=1, return_distance=True)
    distances = [dist[0]]
    for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
        MIRROR = ONES.copy()
        for i, shift in zip(indici_to_shift, coord_direction):
            MIRROR[:,i] = MIRROR[:,i] + (shift * n)
        tree = BallTree( MIRROR, leaf_size=5*n, metric='euclidean')
        dist, indi = tree.query([point], k=1, return_distance=True)
        distances.append(dist[0])
    return np.min(distances)
%%time
distance_on_torus([2,3])
It is slow: the above takes about 15 minutes. For n = 1000 it takes less than a second.
An optimisation would be to first compute the non-torus distance, and only if that minimum could still be beaten, compute the distances with the minimal set of extra 'blocks' around it. This will greatly increase speed.
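A rough sketch of that optimisation (my own variant, not the answer's code): query the unshifted points first, and skip the mirrors entirely whenever the plain distance is already no larger than the distance to the nearest grid edge, since no mirrored copy can be closer than that edge.
from sklearn.neighbors import BallTree
import numpy as np

def distance_on_torus_pruned(point, ones, n):
    tree = BallTree(ones, metric='euclidean')
    plain = tree.query([point], k=1)[0][0, 0]           # nearest non-torus distance
    edge = min(point[0], n - point[0], point[1], n - point[1])
    if plain <= edge:                                   # no mirror can be closer
        return plain
    best = plain
    for dx in (-n, 0, n):                               # otherwise check the 8 shifted copies
        for dy in (-n, 0, n):
            if dx == 0 and dy == 0:
                continue
            shifted = ones + np.array([dx, dy])
            d = BallTree(shifted, metric='euclidean').query([point], k=1)[0][0, 0]
            best = min(best, d)
    return best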

Efficient way for maximum medians from triangle polygons

Objective
I have a soup of triangle polygons. I want to retrieve the largest median as a vector for each triangle.
State of work
Starting point:
Array of points (n,3) , e.g. [x,y,z]
Array of triangle point indices (n, 3) referencing the array of points above, e.g. [[0,1,2],[2,3,4]...]
I combine both into one single matrix containing the real 3D point coordinates. Then I calculate the median vectors and their lengths.
Edit: I updated the code to my current version of it.
import numpy as np

def calcMedians(polygon):
    # C -> AB = C - (A + 0.5(B-A))
    # B -> AC = B - (A + 0.5(C-A))
    # A -> BC = A - (B + 0.5(C-B))
    dim = np.shape(polygon)
    medians = np.zeros((dim[0],3,2,dim[1]))
    medians[:,0,0] = polygon[:,2]
    medians[:,0,1] = polygon[:,0] + 0.5*(polygon[:,1]-polygon[:,0])
    medians[:,1,0] = polygon[:,1]
    medians[:,1,1] = polygon[:,0] + 0.5*(polygon[:,2]-polygon[:,0])
    medians[:,2,0] = polygon[:,0]
    medians[:,2,1] = polygon[:,1] + 0.5*(polygon[:,2]-polygon[:,1])
    m1 = np.linalg.norm(medians[:,0,0]-medians[:,0,1], axis=1)
    m2 = np.linalg.norm(medians[:,1,0]-medians[:,1,1], axis=1)
    m3 = np.linalg.norm(medians[:,2,0]-medians[:,2,1], axis=1)
    medianlengths = np.vstack((m1,m2,m3)).T
    maxlengths = np.argmax(medianlengths, axis=1)
    final = np.zeros((dim[0],2,dim[1]))
    dim = np.shape(medians)
    for i in range(0,dim[0]):
        idx = maxlengths[i]
        final[i] = medians[i,idx]
    return final
Now I am creating the final median vector matrix using an empty matrix first. The lengths are calculated using np.linalg.norm and collected in a matrix. For this matrix, the argmax method is used to identify the target median vector.
Problem
Old: However, I am somewhat confused by the dimensionality and currently not able to get this to work, or to understand whether the result is correct.
Does somebody know how to do this correctly and/or if this approach is efficient?
My target would be a construct of the 3 medians in the form [n_polygons, 3 (due to 3 medians), 2 (start and end point), 3 (xyz)].
Using the max-lengths information, I would like to reduce it to [n_polygons, 2 (start and end point), 3 (xyz)].
Using the improvised for loop in the function, I can create this output, but there has to be a cleaner matrix method for it. Using medians[:,maxlengths,:,:] leads to a shape of [4,n_polygons,2,3] instead of [n_polygons,2,3] and I do not understand why.
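For that last loop, integer-array indexing along the first two axes should give the reduction directly: pairing each row index with its own maxlengths entry picks exactly one median per polygon, whereas medians[:,maxlengths,:,:] keeps all rows for every entry of maxlengths, which is where the extra leading dimension comes from. A minimal sketch (my suggestion, not the original code):
rows = np.arange(medians.shape[0])
final = medians[rows, maxlengths]   # shape: (n_polygons, 2, 3)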
Example image for medians of two triangles:
Unfortunately, I don't have a large example data set, but I guess one can be generated quite quickly. The example data set from the picture shown above is:
polygons = np.array([[0,1,2],[0,3,2]])
points = np.array([[0,0],
                   [1,0],
                   [1,1],
                   [0,1]])
polygons3d = points[polygons[:,:]]
The longest median belongs to the shortest triangle side. The median length formula can be rewritten as
M[i] = sqrt(2*(a^2 + b^2 + c^2) - 3*side[i]^2) / 2
so you can simplify the calculations a bit using only the side lengths (perhaps you already have them).
Concerning the 3D coordinates: just use a projection onto any coordinate plane that is not perpendicular to your triangle's plane, i.e. ignore one dimension (choose the dimension with the lowest value range).
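A rough vectorized sketch of that formula, assuming polygons3d has shape (n, 3, n_dims) as built from points[polygons] above (my code, not part of the answer):
import numpy as np

A, B, C = polygons3d[:, 0], polygons3d[:, 1], polygons3d[:, 2]
a = np.linalg.norm(B - C, axis=1)            # side opposite vertex A
b = np.linalg.norm(C - A, axis=1)            # side opposite vertex B
c = np.linalg.norm(A - B, axis=1)            # side opposite vertex C
sides = np.stack([a, b, c], axis=1)
s2 = (sides**2).sum(axis=1, keepdims=True)   # a^2 + b^2 + c^2 per triangle
median_len = np.sqrt(2*s2 - 3*sides**2) / 2  # M[i] for each side
longest = median_len.argmax(axis=1)          # equivalently sides.argmin(axis=1)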

Finding n-dimensional neighbors

I am trying to get the neighbors of a cell in an n-dimensional space, something like 8-connected or 26-connected cells, but at any dimensionality provided an n-tuple.
Neighbors that are directly adjacent are easy enough: just +1/-1 in any dimension. The part I am having difficulty with is the diagonals, where any number of coordinates can differ by 1.
I wrote a function that recurses over each sub-dimension and generates all +/- combinations:
def point_neighbors_recursive(point):
    neighbors = []
    # 1-dimension
    if len(point) == 1:
        neighbors.append([point[0] - 1]) # left
        neighbors.append([point[0]])     # current
        neighbors.append([point[0] + 1]) # right
        return neighbors
    # n-dimensional
    for sub_dimension in point_neighbors_recursive(point[1:]):
        neighbors.append([point[0] - 1] + sub_dimension) # left
        neighbors.append([point[0]] + sub_dimension)     # center
        neighbors.append([point[0] + 1] + sub_dimension) # right
    return neighbors
However this returns a lot of redundant neighbors.
Are there any better solutions?
I'll bet that all you need is in the itertools package, especially the product method. What you're looking for is the Cartesian product of your current location with each coordinate perturbed by 1 in each direction. Thus, you'll have a list of triples derived from your current point:
diag_coord = [(x-1, x, x+1) for x in point]
Now, you take the product of all those triples, recombine each set, and you have your diagonals.
Is that what you needed?
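A small sketch of how that could be put together, filtering out the point itself (my wording of the suggestion, not the answerer's code):
from itertools import product

def point_neighbors(point):
    # every cell differing by at most 1 in each coordinate, minus the point itself
    point = tuple(point)
    ranges = [(x - 1, x, x + 1) for x in point]
    return [cand for cand in product(*ranges) if cand != point]

print(len(point_neighbors((0, 0))))      # 8 neighbors in 2D
print(len(point_neighbors((0, 0, 0))))   # 26 neighbors in 3D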

Find all the points that lie within a spherical region

For example, see the image below, which explains the problem for a simple 2D case. The label (N) and coordinates (x,y) of each point are known. I need to find all the point labels that lie within the red circle.
My actual problem is in 3D and the points are not uniformly distributed.
A sample input file containing the coordinates of 7.25 M points is attached here: point file.
I tried the following piece of code
import numpy as np
C = [50,50,50]
R = 20
centroid = np.loadtxt('centroid') #chk the file attached
def dist(x,y): return sum([(xi-yi)**2 for xi, yi in zip(x,y)])
elabels=[i+1 for i in range(len(centroid)) if dist(C,centroid[i])<=R**2]
For a single search it takes ~10 min. Any suggestions to make it faster?
Thanks,
Prithivi
When using numpy, avoid using list comprehensions on arrays.
Your computation can be done using vectorized expressions like this
centre = np.array((50., 50., 50.))
points = np.loadtxt('data')
distances2 = np.sum((points-centre)**2, axis=1)
points is an N x 3 array, points-centre is also an N x 3 array,
(points-centre)**2 computes the square of each element of the difference, and finally np.sum(..., axis=1) sums the squared differences along axis no. 1, that is, across columns.
To filter the array of positions, you can use boolean indexing
close = points[distances2<max_dist**2]
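Putting those two steps together for the question's data (a sketch; it assumes the 'centroid' file holds one x y z triple per row and uses the centre and radius from the question):
import numpy as np

centre = np.array((50., 50., 50.))
R = 20
points = np.loadtxt('centroid')
distances2 = np.sum((points - centre)**2, axis=1)
elabels = np.flatnonzero(distances2 <= R**2) + 1   # 1-based labels, as in the question
close = points[distances2 <= R**2]                 # the coordinates themselves
If many such searches with different centres are needed, building a scipy.spatial.cKDTree over the points once and calling its query_ball_point method per centre should also help.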
You are calling the dist function heavily. You could try to low-level optimize it, and check with the timeit Python module which version is more efficient. On my machine, I tried this one:
def dist(x,y):
    d0 = y[0] - x[0]
    d1 = y[1] - x[1]
    d2 = y[2] - x[2]
    return d0*d0 + d1*d1 + d2*d2
and timeit said it was more than 3 times quicker.
This one was just in the middle:
def dist(x,y):
    s = 0
    for i in range(len(x)):
        d = y[i] - x[i]
        s += d * d
    return s
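For the timing comparison mentioned above, timeit can be used roughly like this (a sketch; the point values are arbitrary):
import timeit

# time whichever dist variant is currently defined
print(timeit.timeit('dist((50, 50, 50), (10.0, 20.0, 30.0))',
                    globals=globals(), number=1_000_000))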

Fast, elegant way to calculate empirical/sample covariogram

Does anyone know a good method to calculate the empirical/sample covariogram, if possible in Python?
This is a screenshot of a book which contains a good definition of the covariogram:
If I understood it correctly, for a given lag/width h, I'm supposed to get all the pairs of points that are separated by h (or less than h), multiply their values, and for each of these points calculate its mean, which in this case is defined as m(x_i). However, according to the definition of m(x_{i}), if I want to compute m(x1), I need to obtain the average of the values located within distance h from x1. This looks like a very intensive computation.
First of all, am I understanding this correctly? If so, what is a good way to compute this in a two-dimensional space? I tried to code it in Python (using numpy and pandas), but it takes a couple of seconds and I'm not even sure it is correct, which is why I will refrain from posting that code here. Here is another attempt at a very naive implementation:
import numpy as np
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(np.array(coordinates))) # coordinates is a nx2 array
z = np.array(z) # z are the values
cutoff = np.max(distances)/3.0 # somewhat arbitrary cutoff
width = cutoff/15.0
widths = np.arange(0, cutoff + width, width)
Z = []
Cov = []
for w in np.arange(len(widths)-1): # for each width
    # for each pairwise distance
    for i in np.arange(distances.shape[0]):
        for j in np.arange(distances.shape[1]):
            if distances[i, j] <= widths[w+1] and distances[i, j] > widths[w]:
                m1 = []
                m2 = []
                # when a distance is within a given width, calculate the means of
                # the points involved
                for x in np.arange(distances.shape[1]):
                    if distances[i, x] <= widths[w+1] and distances[i, x] > widths[w]:
                        m1.append(z[x])
                for y in np.arange(distances.shape[1]):
                    if distances[j, y] <= widths[w+1] and distances[j, y] > widths[w]:
                        m2.append(z[y])
                mean_m1 = np.array(m1).mean()
                mean_m2 = np.array(m2).mean()
                Z.append(z[i]*z[j] - mean_m1*mean_m2)
    Z_mean = np.array(Z).mean() # calculate covariogram for width w
    Cov.append(Z_mean) # collect covariances for all widths
However, now I have confirmed that there is an error in my code. I know that because I used the variogram to calculate the covariogram (covariogram(h) = covariogram(0) - variogram(h)) and I get a different plot:
And it is supposed to look like this:
Finally, if you know a Python/R/MATLAB library to calculate empirical covariograms, let me know. At least, that way I can verify what I did.
One could use scipy.cov, but if one does the calculation directly (which is very easy), there are more ways to speed this up.
First, make some fake data that has some spacial correlations. I'll do this by first making the spatial correlations, and then using random data points that are generated using this, where the data is positioned according to the underlying map, and also takes on the values of the underlying map.
Edit 1:
I changed the data point generator so positions are purely random, but the z-values are proportional to the spatial map. And I changed the map so that the left and right sides are shifted relative to each other, to create negative correlation at large h.
from numpy import *
import random
import math
import matplotlib.pyplot as plt
S = 1000
N = 900
# first, make some fake data, with correlations on two spatial scales
# density map
x = linspace(0, 2*pi, S)
sx = sin(3*x)*sin(10*x)
density = .8* abs(outer(sx, sx))
density[:,:S//2] += .2
# make a point cloud motivated by this density
random.seed(10) # so this can be repeated
points = []
while len(points)<N:
    v, ix, iy = random.random(), random.randint(0,S-1), random.randint(0,S-1)
    if True: #v<density[ix,iy]:
        points.append([ix, iy, density[ix,iy]])
locations = array(points).transpose()
print(locations.shape)
plt.imshow(density, alpha=.3, origin='lower')
plt.plot(locations[1,:], locations[0,:], '.k')
plt.xlim((0,S))
plt.ylim((0,S))
plt.show()
# build these into the main data: all pairs into distances and z0 z1 values
L = locations
m = array([[math.sqrt((L[0,i]-L[0,j])**2+(L[1,i]-L[1,j])**2), L[2,i], L[2,j]]
           for i in range(N) for j in range(N) if i>j])
Which gives:
The above is just the simulated data, and I made no attempt to optimize its production, etc. I assume this is where the OP starts, with the task below, since the data already exists in a real situation.
Now calculate the "covariogram" (which is much easier than generating the fake data, btw). The idea here is to sort all the pairs and associated values by h, and then index into these using ihvals. That is, summing up to index ihval is the sum over N(h) in the equation, since this includes all pairs with hs below the desired values.
Edit 2:
As suggested in the comments below, N(h) is now only the pairs that are between h-dh and h, rather than all pairs between 0 and h (where dh is the spacing of h-values in ihvals -- ie, S/1000 was used below).
# now do the real calculations for the covariogram
# sort by h and give clear names
i = argsort(m[:,0]) # h sorting
h = m[i,0]
zh = m[i,1]
zsh = m[i,2]
zz = zh*zsh
hvals = linspace(0,S,1000) # the values of h to use (S should be in the units of distance, here I just used ints)
ihvals = searchsorted(h, hvals)
result = []
for i, ihval in enumerate(ihvals[1:]):
    start, stop = ihvals[i], ihval   # pairs whose h falls in the bin (h-dh, h]
    N = stop-start
    if N>0:
        mnh = sum(zh[start:stop])/N
        mph = sum(zsh[start:stop])/N
        szz = sum(zz[start:stop])/N
        C = szz-mnh*mph
        result.append([h[ihval], C])
result = array(result)
plt.plot(result[:,0], result[:,1])
plt.grid()
plt.show()
which looks reasonable to me, as one can see bumps or troughs at the expected h values, but I haven't done a careful check.
The main speedup here over scipy.cov is that one can precalculate all of the products, zz. Otherwise, one would feed zh and zsh into cov for every new h, and all the products would be recalculated. This calculation could be sped up even more by keeping partial sums, i.e. from ihvals[n-1] to ihvals[n] at each step n, but I doubt that will be necessary.
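For completeness, the partial-sum idea could look roughly like this (my own sketch, not part of the answer): cumulative sums turn every bin's sum into two array lookups, so the cost per bin no longer grows with the bin size.
# prefix sums over the h-sorted arrays
czh, czsh, czz = cumsum(zh), cumsum(zsh), cumsum(zz)
def bin_sum(csum, start, stop):
    # sum of the underlying array over the slice [start, stop)
    return csum[stop-1] - (csum[start-1] if start > 0 else 0.0)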
