Finding n-dimensional neighbors - python

I am trying to get the neighbors of a cell in an n-dimensional space, something like 8-connected or 26-connected cells, but at any dimensionality provided an n-tuple.
Neighbors that are directly adjacent are easy enough: just +1/-1 in any dimension. The part I am having difficulty with is the diagonals, where any number of coordinates can differ by 1.
I wrote a function that recurs for each sub-dimension, and generates all +/- combinations:
def point_neighbors_recursive(point):
    neighbors = []
    # 1-dimension
    if len(point) == 1:
        neighbors.append([point[0] - 1])  # left
        neighbors.append([point[0]])      # current
        neighbors.append([point[0] + 1])  # right
        return neighbors
    # n-dimensional
    for sub_dimension in point_neighbors_recursive(point[1:]):
        neighbors.append([point[0] - 1] + sub_dimension)  # left
        neighbors.append([point[0]] + sub_dimension)      # center
        neighbors.append([point[0] + 1] + sub_dimension)  # right
    return neighbors
However this returns a lot of redundant neighbors.
Are there any better solutions?

I'll bet that all you need is in the itertools package, especially the product method. What you're looking for is the Cartesian product of your current location with each coordinate perturbed by 1 in each direction. Thus, you'll have a list of triples derived from your current point:
diag_coord = [(x-1, x, x+1) for x in point]
Now, you take the product of all those triples, recombine each set, and you have your diagonals.
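For concreteness, here is a minimal sketch of that idea (the helper name point_neighbors is my own; it also drops the original point, since a point is not its own neighbour):
from itertools import product

def point_neighbors(point):
    # Triples (x-1, x, x+1) for each coordinate, then their Cartesian product.
    diag_coord = [(x - 1, x, x + 1) for x in point]
    return [p for p in product(*diag_coord) if p != tuple(point)]

print(len(point_neighbors((1, 2, 3))))  # 26 neighbours in 3 dimensions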
Is that what you needed?

Related

How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

Given a 10x10 grid (2D array) filled randomly with the numbers 0, 1 or 2, how can I find the Euclidean distance (the l2-norm of the distance vector) between two given points while taking periodic boundaries into account?
Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen as a torus rather than a flat plane. In that case, say centre = matrix[0,2], and we find the same number in matrix[9,2], which is at the southern boundary of the matrix. The Euclidean distance computed with my code for this example would be np.sqrt(0**2 + 9**2) = 9.0. However, because of the periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if the periodic boundaries are implemented correctly, distances above 8 should not exist.
So, I would be interested in how to implement in Python a function that computes the Euclidean distance between two arbitrary points on a torus by applying a wrap-around at the boundaries.
import numpy as np

matrix = np.random.randint(0, 3, (10, 10))
centre = matrix[0, 2]
# rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0, 2] = 5
# find the points where entries are the same as centre
same = np.where(matrix == centre)
idx_row, idx_col = same
# find distances from centre to all cells with the same value
dist = np.zeros(len(same[0]))
for i in range(0, len(same[0])):
    delta_row = same[0][i] - 0  # row coord of centre
    delta_col = same[1][i] - 2  # col coord of centre
    dist[i] = np.sqrt(delta_row**2 + delta_col**2)
# retrieve the index of the smallest distance
idx = dist.argmin()
print('Centre value: %i. The nearest cell with same value is at (%i,%i)'
      % (centre, same[0][idx], same[1][idx]))
For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.
When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.
Then simply take whichever is smaller:
delta_row = abs(same[0][i] - 0)  # row coord of centre
delta_row = min(delta_row, 10 - delta_row)
And similarly for delta_col.
The final dist[i] calculation needs no changes.
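Putting it together, a minimal vectorized sketch of the same wrap-around rule (the function name torus_distances is mine; the grid size of 10 and the centre at (0, 2) match the question):
import numpy as np

def torus_distances(rows, cols, centre_row=0, centre_col=2, size=10):
    # Per-axis differences, wrapped around the boundary whenever that is shorter.
    d_row = np.abs(rows - centre_row)
    d_row = np.minimum(d_row, size - d_row)
    d_col = np.abs(cols - centre_col)
    d_col = np.minimum(d_col, size - d_col)
    return np.sqrt(d_row**2 + d_col**2)

# dist = torus_distances(idx_row, idx_col)  # replaces the loop in the question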
I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times: once for the normal distance, and 8 more times with shifted copies to possibly correct for a closer 'torus' distance.
As n gets larger, the calculation cost can go sky high. But the torus effect is probably not needed, as there is usually a point nearby without 'wrap-around'.
You can easily test this: for a grid of size 1, if a point is found at distance 1/2 or closer, you know there is no closer torus point (right?)
import numpy as np
n=10000
np.random.seed(1)
A = np.random.randint(low=0, high=10, size=(n,n))
I create 10000x10000 points, and store the location of the 1's in ONES.
ONES = np.argwhere(A == 1)
Now I define my torus distance, which tries each of the 9 mirrors to see which one gives the closest point.
from sklearn.neighbors import BallTree

def distance_on_torus(point=[500, 500]):
    index_diff = [[1], [1], [0], [0], [0, 1], [0, 1], [0, 1], [0, 1]]
    coord_diff = [[-1], [1], [-1], [1], [-1, -1], [-1, 1], [1, -1], [1, 1]]
    tree = BallTree(ONES, leaf_size=5*n, metric='euclidean')
    dist, indi = tree.query([point], k=1, return_distance=True)
    distances = [dist[0]]
    for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
        MIRROR = ONES.copy()
        for i, shift in zip(indici_to_shift, coord_direction):
            MIRROR[:, i] = MIRROR[:, i] + (shift * n)
        tree = BallTree(MIRROR, leaf_size=5*n, metric='euclidean')
        dist, indi = tree.query([point], k=1, return_distance=True)
        distances.append(dist[0])
    return np.min(distances)
%%time
distance_on_torus([2,3])
It is slow: the above takes about 15 minutes. For n = 1000 it takes less than a second.
An optimisation would be to first compute the non-torus distance and, only if that might not be the smallest, check the minimal set of extra 'blocks' around it. This would greatly increase speed.
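A rough sketch of that early-exit idea, under the assumption that the query point lies inside the n x n grid: a mirror shifted by ±n along an axis can never be closer than the point's distance to the corresponding grid boundary, so when the plain nearest-neighbour distance already beats all four boundary distances, the mirrors can be skipped entirely (the function name is mine):
def distance_on_torus_fast(point):
    tree = BallTree(ONES, leaf_size=5*n, metric='euclidean')
    dist, indi = tree.query([point], k=1, return_distance=True)
    d_plain = dist[0][0]
    px, py = point
    # No shifted copy can be closer than the distance to the nearest grid boundary.
    if d_plain <= min(px, n - px, py, n - py):
        return d_plain
    return distance_on_torus(point)  # fall back to checking all mirrors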

Rectangular lattice fit to noisy coordinates

I have the following problem. Imagine you have a set of coordinates that are somewhat organized in a regular pattern, such as the one shown below.
What I want to do is to automatically extract coordinates such that they are ordered from left to right and top to bottom. In addition, the total number of coordinates should be as large as possible, but only include coordinates such that the extracted set lies on a nearly rectangular grid (even if the coordinates have a different symmetry, e.g. hexagonal). I always want to extract coordinates that follow a rectangular unit cell structure.
For the example shown above, the largest such orthorhombic set would be 8 x 8 coordinates (let's call these dimensions m x n), as framed by the red rectangle.
The problem is that the given coordinates are noisy and distorted.
My approach was to generate an artificial lattice and minimize its difference to the given coordinates, taking into account some rotation, shift and simple distortion of the lattice. However, it turned out to be tricky to define a cost function that covers the complexity of the problem, i.e. minimizing the difference between the given coordinates and the fitted lattice while also maximizing the grid size m x n.
If anyone has a smart idea how to tackle this problem, maybe also with machine learning algorithms, I would be very thankful.
Here is the code that I have used so far:
A function to generate the artificial lattice with m x n coordinates that are spaced by a and b in the "n" and "m" directions. The angle theta allows for a rotation of the lattice.
import numpy as np

def lattice(m, n, a, b, theta):
    coords = []
    for j in range(m):
        for i in range(n):
            coords.append([np.sin(theta)*a*i + np.cos(theta)*b*j,
                           np.cos(theta)*a*i - np.sin(theta)*b*j])
    return np.array(coords)
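As a quick illustration with made-up values, an 8 x 8 lattice with unit spacing rotated by 5 degrees:
pts = lattice(8, 8, 1.0, 1.0, np.deg2rad(5.0))
print(pts.shape)  # (64, 2)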
I used the following function to measure the mean minimal distance between points, which is a good starting point for fitting:
def mean_min_distance(coords):
    from scipy.spatial import distance
    cd = distance.cdist(coords, coords)
    cd_1 = np.where(cd == 0, np.nan, cd)
    return np.mean(np.nanmin(cd_1, axis=1))
The following function provides all possible combinations of m x n that could theoretically fit into the given number of coordinates l, whose arrangement is assumed to be unknown. The ability to limit m and n to minimal and maximal values is already included:
def get_all_mxn(l, min_m=2, min_n=2, max_m=None, max_n=None):
    poss = []
    if max_m is None:
        max_m = l + 1
    if max_n is None:
        max_n = l + 1
    for i in range(min_m, max_m):
        for j in range(min_n, max_n):
            if i * j <= l:
                poss.append([i, j])
    return np.array(poss)
The definition of the cost function I used (for one particular set of m x n), since I first wanted to get a good fit for a certain m x n arrangement:
from scipy.spatial import distance

def cost(x0):
    # relies on the globals m, n and coords defined elsewhere
    a, b, theta, shift_a, shift_b, dd1 = x0
    # generate lattice
    l = lattice(m, n, a, b, theta)
    # distort lattice by affine transformation
    distortion_matr = np.array([[1, dd1], [0, 1]])
    l = np.dot(distortion_matr, l.T).T
    # shift lattice
    l = l + np.array((shift_b, shift_a))
    # some padding to make the lists the same length
    len_diff = coords.shape[0] - l.shape[0]
    l = np.append(l, (1e3, 1e3)*len_diff).reshape((l.shape[0] + len_diff, 2))
    # calculate all distances between all points
    cd = distance.cdist(coords, l)
    # minimum distance between each artificial lattice point and all coords
    cd_min = np.min(cd[:, :coords.shape[0] - len_diff], axis=0)
    # return the root mean square difference of all minimal distances
    return np.sqrt(np.sum(np.abs(cd_min) ** 2))
I then run the minimization:
from scipy.optimize import minimize

md = mean_min_distance(coords)
# initial guess
x0 = np.array((md, md, np.deg2rad(-3.), 3, 1, 0.12))
res = minimize(cost, x0)
However, the results are extremely dependent on the initial parameters x0, and I have not even included a fitting of m and n.

Distances to positions on a square integer lattice

I want to calculate the distance from the centre of an integer lattice to every other position, and the number of positions at each distance. I'm currently using the following code to calculate this:
import numpy

x = numpy.arange(-10, 11, 1)
[X, Y] = numpy.meshgrid(x, x)
R = numpy.sqrt(X**2 + Y**2)
R2 = numpy.ndarray.flatten(R)
R3 = numpy.unique(R2)
r = R3[1:]  # excludes the 0
Nr = numpy.zeros(numpy.size(r))
for i in range(numpy.size(r)):
    Nr[i] = numpy.count_nonzero(R2 == r[i])
This tells me that the possible distances are 1, sqrt2, 2, sqrt5 etc.
It also tells me there are 4x1, 4xsqrt2, 4x2, 8xsqrt5 etc.
As this is a common problem in physics I was wondering if there is a function from a library such as numpy or scipy which could return these values more easily.
The lattice is centered at (0,0), so it is symmetric across the four quadrants. We can use this restriction to our advantage: compute the required unique distances and counts for one quadrant, then multiply those counts by 4 to cover all four quadrants.
So, let's use the first quadrant (upper right). We skip the elements on the y = 0 line, because otherwise the multiplication by 4 would duplicate results. Additionally, this way we won't have to exclude the first element, as was done in the original post.
Thus, an implementation would be -
import numpy as np

N = 11  # lattice size
xa, ya = np.ogrid[0:N, 1:N]  # x's: 0:N, y's: 1:N
unq_dists, count = np.unique(np.sqrt(xa**2 + ya**2), return_counts=True)
count = count*4
For a further performance boost, we could use np.unique on the squared summations and then apply np.sqrt to the unique values only. The idea is to perform the slow square-root computation on the smaller unique set, like so -
unq_dists, count = np.unique(xa**2 + ya**2, return_counts=True)
unq_dists = np.sqrt(unq_dists)
After you have flattened the array, convert it to a Pandas Series and count the unique values:
import pandas
distances = pandas.Series(numpy.ndarray.flatten(R))
distances.value_counts()
# 9.219544 16
# 8.062258 16
# 5.000000 12
#10.000000 12
# 7.071068 12
# 2.236068 8
# ....
Your code can be made more efficient in a few ways. You first form two large matrices and then square the terms; it would be better to square first. Also, forming a meshgrid only to perform an outer addition is unnecessary: there is numpy.add.outer. Lastly, the loop you have is made unnecessary by the return_counts=True option in numpy.unique (which, incidentally, flattens the array itself, so you don't have to). So the code is shortened to three lines.
x = numpy.arange(-10, 11, 1)
R = numpy.sqrt(numpy.add.outer(x**2, x**2))
r, Nr = numpy.unique(R, return_counts=True)
(And if you want to exclude 0 distance, return r[1:] and Nr[1:])
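As a quick sanity check (my own addition), the first few non-zero entries reproduce the counts quoted in the question:
print(r[1:5])   # distances 1, sqrt(2), 2, sqrt(5)
print(Nr[1:5])  # counts 4, 4, 4, 8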

Find irregular region in 4D numpy array of gridded data (lat/lon)

I have a large 4-dimensional dataset of Temperatures [time,pressure,lat,lon].
I need to find all grid points within a region defined by lat/lon indices and calculate an average over the region to leave me with a 2-dimensional array.
I know how to do this if my region is a rectangle (or square) but how can this be done with an irregular polygon?
Below is an image showing the regions I need to average together and the lat/lon grid that the data in the array is gridded to.
I believe this should solve your problem.
The code below generates all cells in a polygon defined by a list of vertices.
It "scans" the polygon row by row keeping track of the transition columns where you (re)-enter or exit the polygon.
def row(x, transitions):
    """ generator spitting all cells in a row given a list of transition (in/out) columns."""
    i = 1
    in_poly = True
    y = transitions[0]
    while i < len(transitions):
        if in_poly:
            while y < transitions[i]:
                yield (x, y)
                y += 1
            in_poly = False
        else:
            in_poly = True
            y = transitions[i]
        i += 1

def get_same_row_vert(i, vertices):
    """ find all vertex columns in the same row as vertices[i], and return next vertex index as well."""
    vert = []
    x = vertices[i][0]
    while i < len(vertices) and vertices[i][0] == x:
        vert.append(vertices[i][1])
        i += 1
    return vert, i

def update_transitions(old, new):
    """ update old transition columns for a row given new vertices.
    That is: merge both lists and remove duplicate values (2 transitions at the same column cancel each other)"""
    if old == []:
        return new
    if new == []:
        return old
    o0 = old[0]
    n0 = new[0]
    if o0 == n0:
        return update_transitions(old[1:], new[1:])
    if o0 < n0:
        return [o0] + update_transitions(old[1:], new)
    return [n0] + update_transitions(old, new[1:])

def polygon(vertices):
    """ generator spitting all cells in the polygon defined by given vertices."""
    vertices.sort()
    x = vertices[0][0]
    transitions, i = get_same_row_vert(0, vertices)
    while i < len(vertices):
        while x < vertices[i][0]:
            for cell in row(x, transitions):
                yield cell
            x += 1
        vert, i = get_same_row_vert(i, vertices)
        transitions = update_transitions(transitions, vert)

# define a "strange" polygon (hook shaped)
vertices = [(0,0),(0,3),(4,3),(4,0),(3,0),(3,2),(1,2),(1,1),(2,1),(2,0)]
for cell in polygon(vertices):
    print(cell)
    # or do whatever you need to do
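To connect this back to the original question, here is a rough sketch of how the generated cells could be used to average the 4-D temperature array over the region; the array temps and its shape are my own stand-ins:
import numpy as np

temps = np.random.rand(12, 17, 10, 10)   # stand-in for the real [time, pressure, lat, lon] data
cells = list(polygon(vertices))          # (lat_index, lon_index) pairs from the code above
lat_idx = [c[0] for c in cells]
lon_idx = [c[1] for c in cells]
# Fancy indexing picks out the region's cells; averaging over them leaves a [time, pressure] array.
region_mean = temps[:, :, lat_idx, lon_idx].mean(axis=-1)
print(region_mean.shape)                 # (12, 17)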
The general class of problems is called "Point in Polygon", where the (fairly) standard algorithm is based on drawing a test line through the point under consideration and counting the number of times it crosses polygon boundaries (it's really cool/weird that it works so simply, I think). This is a really good overview which includes implementation information.
For your problem in particular, since each of your regions are defined based on a small number of square cells - I think a more brute-force approach might be better. Perhaps something like:
For each region, form a list of all of the (lat/lon) squares which define it. Depending on how your regions are defined, this may be trivial, or annoying...
For each point you are examining, figure out which square it lives in. Since the squares are so well behaved, you can do this manually using opposite corners of each square, or using a method like numpy.digitize.
Test whether the square the point lives in is in one of the regions.
If you're still having trouble, please provide some more details about your problem (specifically, how your regions are defined) --- that will make it easier to offer advice.
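As a rough illustration of steps 2 and 3 (the grid edges, the example region, and the function name are all my own assumptions), numpy.digitize maps a point to its grid cell, after which membership is a simple set lookup:
import numpy as np

lat_edges = np.arange(-90, 91, 1.0)    # hypothetical 1-degree grid
lon_edges = np.arange(-180, 181, 1.0)
region_cells = {(100, 40), (100, 41), (101, 40)}   # example irregular region as (lat_idx, lon_idx) cells

def point_in_region(lat, lon):
    # digitize returns 1-based bin indices, hence the -1
    i = np.digitize(lat, lat_edges) - 1
    j = np.digitize(lon, lon_edges) - 1
    return (i, j) in region_cells

print(point_in_region(10.3, -139.7))   # True: this point falls in cell (100, 40)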

How to efficiently determine if a set of points contains two that are close

I need to determine if a set of points (each given by a tuple of floats, each of which is in [0, 1]) contains two that are within some threshold, say 0.01, of each other. I should also mention that in the version of the problem that I am interested in, these "points" are given by tuples of length ~30, that is they are points in [0, 1]^30.
I can test if any two are within this threshold using something like:
from math import sqrt

def is_near(p1, p2):
    return sqrt(sum((x1 - x2)**2 for x1, x2 in zip(p1, p2))) < 0.01  # Threshold.
Using this I can just check every pair using something like:
def contains_near(points):
    from itertools import combinations
    return any(is_near(p1, p2) for p1, p2 in combinations(points, r=2))
However this is quadratic in the length of the list, which is too slow for the long list of points that I have.
Is there an n log n way of solving this?
I tried doing things like snapping these points to a grid so I could use a dictionary / hash map to store them:
def contains_near_hash(points):
    seen = dict()
    for point in points:
        # The rescaling constant should be 1 / threshold.
        grid_point = tuple([round(x * 100, 0) for x in point])
        if grid_point in seen:
            for other in seen[grid_point]:
                if is_near(point, other):
                    return True
            seen[grid_point].append(point)
        else:
            seen[grid_point] = [point]
    return False
However this doesn't work when
points = [(0.1149999,), (0.1150001,)]
As these round to two different grid points. I also tried a version in which the point was appended to all neighbouring grid points however as the examples that I want to do have ~30 coordinates, each grid point has 2^30 neighbours which makes this completely impractical.
A pair of points can only be 'near' each other if their distance in every dimension is less than the threshold. This can be exploited to reduce the number of candidate pairs by filtering one dimension after the other.
I suggest:
- sort the points in one dimension (say: x)
- find all points which are close enough to the next point in the sorted list and put their index into a set of candidates
- do not use sqrt() but the quadratic distance (x1 - x2)**2 or even abs(x1 - x2) for efficiency
- do that for the second dimension as well
- determine the intersection of both sets; these are points near each other
This way, you avoid costly is_near() calls, operate on way smaller sets, only work with unique points, and set lookups are very efficient.
This scheme can easily be expanded to include more than 2 dimensions.
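A rough sketch of that scheme as I read it (the function names are mine); a final exact check over the surviving candidates is included, since the per-axis adjacency test is only a pre-filter:
def near_candidates(points, axis, threshold):
    # Indices of points whose gap to a sorted neighbour along this axis is below the threshold.
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    cand = set()
    for a, b in zip(order, order[1:]):
        if abs(points[a][axis] - points[b][axis]) < threshold:
            cand.update((a, b))
    return cand

def contains_near_filtered(points, threshold=0.01):
    dims = len(points[0])
    cand = near_candidates(points, 0, threshold)
    for axis in range(1, dims):
        cand &= near_candidates(points, axis, threshold)
        if len(cand) < 2:
            return False
    cand = list(cand)
    t2 = threshold**2   # compare squared distances, avoiding sqrt as suggested above
    return any(sum((points[i][k] - points[j][k])**2 for k in range(dims)) < t2
               for a, i in enumerate(cand) for j in cand[a + 1:])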
