I have a 570 x 800 matrix with id values. What I would like to do is find the adjacent neighbors for each item. The max number of neighbors would be 8 unless the cell is along a boundary; in that case there would be fewer (for example, three at a corner). I want to append the neighbors to a list. I saw the posting for finding neighbors when each cell has x and y coordinates, which was very helpful, but how would I modify the code with no coordinates? The ids come in as a string, which is fine because I use it as a key in a dictionary. Any help would be appreciated.
Assuming that what you're trying to do is construct an eight-connected grid on the matrix, and that the position of each item in the matrix defines an x- and y-coordinate, you can use something like this:
def eight_connected_neighbours(xmax, ymax, x, y):
    """The x- and y-components for a single cell in an eight-connected grid

    Parameters
    ----------
    xmax : int
        The width of the grid
    ymax : int
        The height of the grid
    x : int
        The x-position of the cell to find neighbours of
    y : int
        The y-position of the cell to find neighbours of

    Returns
    -------
    results : list of tuple
        A list of (x, y) indices for the neighbours
    """
    results = []
    for dx in [-1, 0, 1]:
        for dy in [-1, 0, 1]:
            newx = x + dx
            newy = y + dy
            if dx == 0 and dy == 0:
                continue
            if 0 <= newx < xmax and 0 <= newy < ymax:
                results.append((newx, newy))
    return results
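For a quick sanity check (assuming your 570 x 800 matrix has its 800 columns along x and 570 rows along y, which is just my reading of the question):

# Corner cell: only three neighbours survive the bounds check
print(eight_connected_neighbours(800, 570, 0, 0))
# [(0, 1), (1, 0), (1, 1)]

# Interior cell: the full eight neighbours
print(len(eight_connected_neighbours(800, 570, 10, 10)))
# 8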
Let me give an alternate answer with numpy, a library you might want to consider if you're doing anything a bit more heavy-duty with your data. The advantage of this method is that it extends easily to larger neighbourhoods via the parameter k. The setup:
from numpy import *
k = 1
# Create the nearest neighbors
Xidx, Yidx = mgrid[-k:k+1,-k:k+1]
# Remove the center (0,0) index
center = (Xidx==0) & (Yidx==0)
Xidx = Xidx[~center]
Yidx = Yidx[~center]
Now you can access the nearest neighbours with A[Xidx+dx, Yidx+dy], where dx and dy are the x- and y-coordinates of the cell whose neighbours you want.
Example
Let's take a random matrix:
A = random.random((5, 5))
print(A)
which for me looks like:
[[ 0.90779297 0.91195651 0.32751438 0.44830373 0.2528675 ]
[ 0.02542108 0.52542962 0.28203009 0.35606998 0.88076027]
[ 0.08955781 0.98903843 0.86881875 0.21246095 0.92005691]
[ 0.57253561 0.08830487 0.06418296 0.59632344 0.53604546]
[ 0.7646322 0.50869651 0.00229266 0.26363367 0.64899637]]
Now we can view the nearest neighbors with
dx, dy = 2, 1
print("Cell value A[%i,%i] = %f" % (dx, dy, A[dx, dy]))
print("k=%i nearest neighbors: " % k, A[Xidx+dx, Yidx+dy])
Giving:
Cell value A[2,1] = 0.989038
k=1 nearest neighbors: [ 0.02542108 0.52542962 0.28203009 0.08955781 0.86881875 0.57253561 0.08830487 0.06418296]
Bonus
As mentioned, by changing k you can easily get the next-nearest neighbours, next-next-nearest neighbours, and so on. In addition, you can index a higher-order array (say, a rank-3 tensor) by adding an additional variable Zidx in the same way.
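A minimal sketch of both extensions; the variable names and array sizes below are my own invention:

# k = 2 gives the 5x5 neighbourhood around a cell, minus the centre (24 offsets)
k2 = 2
Xidx2, Yidx2 = mgrid[-k2:k2+1, -k2:k2+1]
center2 = (Xidx2 == 0) & (Yidx2 == 0)
Xidx2, Yidx2 = Xidx2[~center2], Yidx2[~center2]

# Rank-3 version: add a third offset array in exactly the same way
Xidx3, Yidx3, Zidx = mgrid[-k2:k2+1, -k2:k2+1, -k2:k2+1]
center3 = (Xidx3 == 0) & (Yidx3 == 0) & (Zidx == 0)
Xidx3, Yidx3, Zidx = Xidx3[~center3], Yidx3[~center3], Zidx[~center3]

B = random.random((7, 7, 7))
print(B[Xidx3 + 3, Yidx3 + 3, Zidx + 3])  # the 124 neighbours of B[3, 3, 3]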
Caveats
Be careful at the edges of the matrix. At the rightmost and bottom edges, the positive out-of-range indices will raise an IndexError rather than quietly giving you the smaller list you asked for. At the left and top edges, numpy indexing (and Python's as well) wraps around, so an index of -1 will give you the last element; asking for the neighbours of (0, 0) will therefore still give you eight entries by wrapping around. The other answers here show a good way to check for this.
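For completeness, here is one way that check could look with the k=1 offsets above (a sketch of my own, not from the original answer):

dx, dy = 0, 0  # the troublesome corner cell
nx, ny = Xidx + dx, Yidx + dy
valid = (nx >= 0) & (nx < A.shape[0]) & (ny >= 0) & (ny < A.shape[1])
print(A[nx[valid], ny[valid]])  # only the three in-bounds neighbours of A[0, 0]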
If you want to grab something on the left side edge (and you really don't want to use an if statement), you might change the index as such (making sure to remove the center element as above):
# Create the nearest neighbors (ON THE LEFT EDGE)
Xidx_left, Yidx_left = mgrid[0:k+1,-k:k+1]
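and then drop the centre offset just as before (a small completion of the snippet above):

center_left = (Xidx_left == 0) & (Yidx_left == 0)
Xidx_left = Xidx_left[~center_left]
Yidx_left = Yidx_left[~center_left]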
Code with no coordinates? Do you mean like this:
XMAX = 800
YMAX = 570
NEIGHBOURS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
matrix = range(XMAX * YMAX)

def all_neighbours(m):
    for i in range(len(m)):
        ns = []
        y, x = divmod(i, XMAX)
        for u, v in NEIGHBOURS:
            ux = u + x
            vy = v + y
            if 0 <= ux < XMAX and 0 <= vy < YMAX:
                # flat, row-major index of the neighbour (row vy, column ux)
                ns.append(ux + vy * XMAX)
        yield i, ns

if __name__ == '__main__':
    for field, neighbours in all_neighbours(matrix):
        print(field, neighbours)
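If you want the result keyed by the id strings mentioned in the question, you could wrap this generator like so (a sketch, assuming matrix is a flat, row-major sequence of id strings of length XMAX * YMAX):

neighbour_ids = {}
for field, neighbours in all_neighbours(matrix):
    neighbour_ids[matrix[field]] = [matrix[n] for n in neighbours]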
I am trying to label x and y points based on their being in a specific section of a meshgrid in python. The points are stored in a pandas dataframe.
Here I have a scatter plot of the coordinates and above them I am plotting the grid.
The entire grid is way bigger, from the bottom left point (500,1250) to upper right point (2750, 3250), which means the whole grid is 225x200 sections.
I want to iterate through the sections of the grid and check if a point is inside. If a point is inside a section, I want to add a label to the point. The label should be the same as the section name.
I want to add a column to the dataframe called 'section' that stores the section a point belongs to.
In the example (picture above) I would like to label all the points with
770 <= x <= 780 and 1795 <= y <= 1805 with the section name 'A3'.
My code currently looks like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

df = pd.read_csv('./file.csv', sep=';')
x_min = df['X[mm]'].min()
x_max = df['X[mm]'].max()
y_min = df['Y[mm]'].min()
y_max = df['Y[mm]'].max()
#side of the square in mm:
square_side = 10
xs = np.arange(x_min, x_max+square_side, square_side)
ys = np.arange(y_min, y_max+square_side, square_side)
x_2, y_2 = np.meshgrid(xs, ys, indexing = 'ij')
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(df['X[mm]'], df['Y[mm]'], linewidth=0.2, c='black')
#plot meshgrid as grid instead of points:
segs1 = np.stack((x_2[:,[0,-1]],y_2[:,[0,-1]]), axis=2)
segs2 = np.stack((x_2[[0,-1],:].T,y_2[[0,-1],:].T), axis=2)
plt.gca().add_collection(LineCollection(np.concatenate((segs1, segs2))))
ax.set_aspect('equal', 'box')
plt.show()
I also have a function that determines if a point is inside a rectangle (this does not use meshgrid):
def is_inside_rect(M, A, B, D):
    '''Check if a point M is inside the rectangle with corners A, B, D (C implied)'''
    # 0 <= dot(AB, AM) <= dot(AB, AB)  and  0 <= dot(BD, BM) <= dot(BD, BD)
    # print(np.dot(B - A, D - A))
    return 0 <= np.dot(B - A, M - A) <= np.dot(B - A, B - A) and 0 <= np.dot(D - B, M - B) <= np.dot(D - B, D - B)
I thought of using it in a while loop like this:
x = x_min
y = y_min
while (x <= x_max + square_side) and (y <= y_max + square_side):
    A = np.array([x, y])
    B = np.array([x + square_side, y])
    D = np.array([x + square_side, y + square_side])
    print(A, B, D)
    df['c'] = df[['X[mm]', 'Y[mm]']].apply(lambda coord: 'red' if is_inside_rect(np.array(coord), A, B, D) else 'black', axis=1)
    x += square_side
    y += square_side
but this is very slow, and it changes the colors of all the points in every iteration.
Since all your squares are equally sized, there is no need to define all of your squares beforehand and then determine which squares contain which points. I would use the coordinates of each point to directly determine which square it will land in.
Let's take the 1-dimensional case, for the sake of simplicity. You want to group points on the number line into "squares" (really 1-d line segments). If your first square starts at x=0, your second at x=10, your third at x=20, and so on, how do you find the square for an arbitrary point x? You know that your squares are spaced by 10 (and you know they start at 0, which makes things easier), so you can simply divide by 10 and round down to get the square index.
You can just as easily do the same thing in two dimensions (or n dimensions).
square_side = 10
x_min = df['X[mm]'].min()
y_min = df['Y[mm]'].min()

def label_point(x, y):
    # Double forward slash is integer (round down) division;
    # cast to int so float coordinates give clean labels.
    # Add 1 here if you really want 1-based indexing.
    x_label = int((x - x_min) // square_side)
    y_label = chr(ord('A') + int((y - y_min) // square_side))
    return f'{y_label}{x_label}'

df['label'] = df[['X[mm]', 'Y[mm]']].apply(lambda coord: label_point(*coord), axis=1)
As for the efficiency, this solution looks at each point only once and does a constant amount of work per point, so it is O(n) in the number of points. Your solution looks at each square once, and for each square looks at each point; this is O(n × m), where n is the number of points and m is the number of squares.
Your solution is more general, in that your is_inside_rect function works when your grid of rectangles has an arbitrary rotation. In this case, I would recommend rotating all your points about the origin, and then running my solution.
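A minimal sketch of that rotation step, assuming the grid is rotated by a known angle theta (in radians) about the origin; the column names X_rot/Y_rot and the angle are my own:

import numpy as np

theta = np.radians(15)               # hypothetical grid rotation angle
c, s = np.cos(theta), np.sin(theta)
rot = np.array([[c,  s],
                [-s, c]])            # rotation by -theta, to un-rotate the grid
xy = df[['X[mm]', 'Y[mm]']].to_numpy() @ rot.T
df['X_rot'], df['Y_rot'] = xy[:, 0], xy[:, 1]
# Then recompute x_min/y_min from X_rot/Y_rot and run label_point on those columns.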
Also, your loop adds 10 to both x and y on every iteration, so you are traversing your space diagonally. I don't think you meant to do that.
I have to write a 2D Ising model simulation, where I don't neglect the effect of the distant neighbors, so I want to count the spins in a circle.
I've written a simple function which can get the elements of a grid, which are in a circle.
def countInCircle(g, x, y, r):
    spinSum = 0
    for R in range(0, r + 1, 1):
        for i in range(0, g.shape[0], 1):
            for j in range(0, g.shape[1], 1):
                if ((i - x) ** 2 + (j - y) ** 2) == R:
                    spinSum = spinSum + g[i][j]
    return spinSum
It works like a charm, but it cuts off parts of the circle if they fall out of the grid. How should I solve this for periodic boundary conditions?
Thanks in advance!
Here is a solution. It uses numpy arrays and shifts the grid in such a manner that one can easily sum up all elements within a radius r around the given point (x,y). Some useful questions are
How to apply a disc shaped mask to an array
How to select a window from a numpy array with periodic boundary-conditions
There is one restriction in the following code, which is that 2*radius+1 must be smaller than or equal to the smallest dimension of the grid.
import numpy as np

def countInCircle(grid, x, y, r):
    # restriction: 2*r+1 <= min(grid.shape)
    # Roll the grid so that (x, y) ends up at position (r, r), then sum the
    # (2r+1) x (2r+1) window around it through a disc-shaped mask.
    shifted_grid = np.roll(np.roll(grid, shift=-x+r, axis=0), shift=-y+r, axis=1)
    window = shifted_grid[:2*r+1, :2*r+1]
    Y, X = np.ogrid[-r: r+1, -r: r+1]
    mask = X**2 + Y**2 <= r**2
    return np.sum(window[mask])

g = np.ones((5, 5))
s = countInCircle(g, 0, 0, 2)
print("s = ", s)
# setting r=2 and summing all ones around (0,0) gives 13. Works fine.

# setting some spin (-1,0,1) particles
g2 = np.random.randint(-1, 2, size=(10, 15))
print(g2)
s = countInCircle(g2, 3, 9, r=3)
print("s = ", s)
I have a list of objects that I have mapped on an xy-plane based on similarity. The points for these objects are stored in a 2D python list:
>>> points
[[x1, x2, ... , xn],
[y1, y2, ... , yn]]
The coordinates are floating point numbers and somewhat outline a square on a scatter plot. This is an example of 2000 objects mapped on a plane:
[image: example of plotted points]
I am trying to "fit" these points into a two-dimensional list in Python. Essentially, I want to keep the relative similarities of the points, but map them to a 2D array. For example, let's say I have these four points that form a perfect square:
(1, 1), (1, 2), (2, 1), and (2, 2)
I want an algorithm that will map them (as best as possible, obviously not perfectly unless the points form a perfectly filled square) to a 2D list like this:
[[(2, 1), (2, 2)],
[(1, 1), (1, 2)]]
If I have n points, then the 2D list generated will have m by m dimensions where m*m >= n. Any unused spots in the list will have "None" inside of it.
Here is some code I wrote attempting a solution. I pass the function a list of objects and a 2D list of their xy-coordinates, respectively. I then calculate the upper and lower bounds in the x and y directions, and compute the "grids" that each spot in the list will represent based on the size of the matrix I am trying to fill. Then, I iterate through each point and try to place it in the closest mapped spot I can find.
I try the point itself, and if it is already full, I search the perimeter around the spot in the matrix. If all those are full, I increase the "radius" of the perimeter to check until I find a spot.
This has two issues:
1) I know this is not as accurate a mapping as I could create, but I searched online and cannot find any other ideas.
2) Sometimes, this code never finishes and can't find the proper open spot. I know I can fix this bug somewhere in the calculation of the perimeter, but I figured I'd best spend my time looking for a more optimal solution. Hence, here I am!
This is my first SO post as well, so apologies if I didn't follow proper procedure.
Here is the code. Any assistance would be greatly appreciated.
from math import ceil, sqrt

def insert_nearest(lst, obj, i, j, N):
    if lst[i][j] is None:  # see if "ideal" spot is empty
        lst[i][j] = obj
        return
    yu = i - 1  # top of box
    yd = i + 1  # bottom of box
    xl = j - 1  # left of box
    xr = j + 1  # right of box
    while True:
        to_try = [(yu, x) for x in range(xl, xr+1)] + \
                 [(yd, x) for x in range(xl, xr+1)] + \
                 [(y, xl) for y in range(yu+1, yd)] + \
                 [(y, xr) for y in range(yu+1, yd)]  # "perimeter" points to check
        for coord in to_try:  # check around perimeter for any empty spots
            try:  # in case index is out of range
                if lst[coord[0]][coord[1]] is None:
                    lst[coord[0]][coord[1]] = obj
                    return
            except IndexError:
                continue
        # if perimeter is full, increase "radius" of box and try again
        yu -= 1
        yd += 1
        xl -= 1
        xr += 1

def build_matrix(objList, points):
    N = ceil(sqrt(len(objList)))
    # N independent rows (not [[None]*N]*N, which aliases one row N times)
    matrix = [[None]*N for _ in range(N)]
    xinc = (max(points[0]) - min(points[0])) / N  # grid lines in x direction
    yinc = (max(points[1]) - min(points[1])) / N  # grid lines in y direction
    for i in range(len(points[0])):
        x = min(N-1, int(points[0][i] // xinc))  # map to integer spot in list
        y = min(N-1, int(points[1][i] // yinc))  # map to integer spot in list
        insert_nearest(matrix, objList[i], x, y, N)
    return matrix
I have a question/idea that I am not sure how to do.
I have a scatter plot of X vs. Y
I can draw a rectangle and then pick all the points within in.
Ideally I want to define an ellipse, as it better captures the shape, and exclude all the points that are outside it.
How does one do this? Is it even possible? I drew the plot using matplotlib.
I used Linear Regression (LR) to fit the points but thats not really what I am looking for.
I want to define APPROXIMATELY an ellipse that covers as many points as possible within it and then exclude the points outside it. How can I define an equation/code to pick the ones inside?
If you have the data structure that is represented in the graph, you can do this with a function and a list comprehension.
If you have the data in a list like this:
# Made up data
lst = [
    # First element is X, second is Y.
    (0, 0),
    (92, 20),
    (10, 0),
    (13, 40),
    (27, 31),
    (.5, .5),
]
from math import sqrt

def shape_bounds(x):
    """
    Function that returns lower and upper bounds for y based on x.
    Using a circle as an example here.
    """
    r = 4
    # A circle is x**2 + y**2 = r**2, r = radius
    if -r <= x <= r:
        y = sqrt(r**2 - x**2)
        return -y, y
    else:
        return 1, -1  # Remember, returns lower, upper.
                      # This will fail any lower < y < upper test.
def in_shape(elt):
    """
    Unpacks a pair and tests if y is inside the shape bounds given by x
    """
    x, y = elt
    lower_bound, upper_bound = shape_bounds(x)
    if lower_bound < y < upper_bound:
        return True
    else:
        return False
# Demo walkthrough
for elt in lst:
    x, y = elt
    print(x, y)
    lower_bound, upper_bound = shape_bounds(x)
    if lower_bound < y < upper_bound:
        print("X: {0}, Y: {1} is in the circle".format(x, y))

# New list of only points inside the shape
new_lst = [x for x in lst if in_shape(x)]
As for an ellipse, try changing the shape equation based on the standard ellipse form, (x/a)**2 + (y/b)**2 = 1.
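For instance, a sketch of a shape_bounds replacement for an axis-aligned ellipse centred at the origin; the semi-axes a and b below are made-up values you would pick to match your scatter:

from math import sqrt

def ellipse_bounds(x, a=50.0, b=20.0):
    """Lower and upper y bounds for the ellipse (x/a)**2 + (y/b)**2 = 1."""
    if -a <= x <= a:
        y = b * sqrt(1 - (x / a) ** 2)
        return -y, y
    else:
        return 1, -1  # fails any lower < y < upper test, as above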
I'm currently trying to simulate many particles in a box bouncing around.
I've taken into account @kalhartt's suggestions, and this is the improved code to initialize the particles inside the box:
import numpy as np
import scipy.spatial.distance as d
import matplotlib.pyplot as plt
# 2D container parameters
# Actual container is 50x50 but chose 49x49 to account for particle radius.
limit_x = 20
limit_y = 20
#Number and radius of particles
number_of_particles = 350
radius = 1
def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count + 2) % (limit_x-1) + radius
    y = (count + 2) / (limit_x-1) + radius
    return np.column_stack((x, y))
position = force_init(number_of_particles)
velocity = np.random.randn(number_of_particles, 2)
The initialized positions look like this:
Once I have the particles initialized I'd like to update them at each time-step. The code for updating follows the previous code immediately and is as follows:
# Updating
while np.amax(abs(velocity)) > 0.01:
    # Assume that the velocity slowly dies out
    position += velocity
    velocity *= 0.995
    # Get pair-wise distance matrix
    pair_dist = d.cdist(position, position)
    pair_d = pair_dist <= 4
    # If pair_dist[i,j] is <= 4 then the particles are too close, so treat as a collision
    for i in range(len(pair_d)):
        for j in range(i):
            # Only looking at upper triangular matrix (not inc. diagonal)
            if pair_d[i, j] == True:
                # If two particles are too close then swap velocities
                # It's a bad hack but it'll work for now.
                vel_1 = velocity[j][:]
                velocity[j] = velocity[i][:]*0.9
                velocity[i] = vel_1*0.9
    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0
    # flip velocity and assume that it loses 10% of energy
    velocity[xmax | xmin, 0] *= -0.9
    velocity[ymax | ymin, 1] *= -0.9
    # Force positions to be within 2*radius of the edge they crossed
    position[xmax, 0] = limit_x - 2*radius
    position[xmin, 0] = 2*radius
    position[ymax, 1] = limit_y - 2*radius
    position[ymin, 1] = 2*radius
After updating it and letting it run to completion I get this result:
This is infinitely better than before, but there are still patches where particles end up too close together. I think the updating works... and thanks to @kalhartt my code is wayyyy better and faster (and I learnt some things about numpy... props @kalhartt), but I still don't know where it's screwing up. I've tried changing the order of the actual updates, with the pair-wise distance going last or the position += velocity going last, but to no avail. I added the *0.9 to make the entire thing die down faster, and I tried it with 4 to make sure that 2*radius (=2) wasn't too tight a criterion... but nothing seems to work.
Any and all help would be appreciated.
There are just two typos standing in your way. First, for i in range(len(positions)/2): only iterates over half of your particles; this is why half the particles stay in the x bounds (if you watch for large iteration counts it's more clear). Second, the second y condition should be a minimum (I assume): position[i][1] < 0. The following block works to bound the particles for me (I didn't test with the collision code, so there could be problems there).
for i in range(len(position)):
    if position[i][0] > limit_x or position[i][0] < 0:
        velocity[i][0] = -velocity[i][0]
    if position[i][1] > limit_y or position[i][1] < 0:
        velocity[i][1] = -velocity[i][1]
As an aside, try to leverage numpy to eliminate loops when possible. It is faster, more efficient, and in my opinion more readable. For example force_init would look like this:
def force_init(n):
    # equivalent to np.array(list(range(number_of_particles)))
    count = np.linspace(0, number_of_particles-1, number_of_particles)
    x = (count * 2) % limit_x + radius
    y = (count * 2) / limit_x + radius
    return np.column_stack((x, y))
And your boundary conditions would look like this:
while np.amax(abs(velocity)) > 0.01:
    position += velocity
    velocity *= 0.995
    # Masks for particles beyond the boundary
    xmax = position[:, 0] > limit_x
    xmin = position[:, 0] < 0
    ymax = position[:, 1] > limit_y
    ymin = position[:, 1] < 0
    # flip velocity
    velocity[xmax | xmin, 0] *= -1
    velocity[ymax | ymin, 1] *= -1
Final note: it is probably a good idea to hard-clip position to the bounding box with something like position[xmax, 0] = limit_x; position[xmin, 0] = 0. There may be cases where the velocity is small and a particle outside the box is reflected but doesn't make it back inside on the next iteration, so it will just sit outside the box being reflected forever.
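A one-liner version of that clip using np.clip (equivalent to the assignments above; just a suggestion, not from the original code):

position[:, 0] = np.clip(position[:, 0], 0, limit_x)
position[:, 1] = np.clip(position[:, 1], 0, limit_y)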
EDIT: Collision
The collision detection is a much harder problem, but let's see what we can do. Let's take a look at your current implementation.
pair_dist = d.cdist(position, position)
pair_d = pair_dist <= 4
for i in range(len(pair_d)):
    for j in range(i):
        # Only looking at upper triangular matrix (not inc. diagonal)
        if pair_d[i, j] == True:
            # If two particles are too close then swap velocities
            # It's a bad hack but it'll work for now.
            vel_1 = velocity[j][:]
            velocity[j] = velocity[i][:]*0.9
            velocity[i] = vel_1*0.9
Overall a very good approach: cdist will efficiently calculate the distance between sets of points, and you find which points collide with pair_d = pair_dist <= 4.
The nested for loops are the first problem. We need to iterate over the True values of pair_d where j > i. Your code actually iterates over the lower triangular region by using for j in range(i), so that j < i; that is not particularly important in this instance, since (i, j) pairs are not repeated either way. However, numpy has two builtins we can use instead: np.triu lets us set all values below a diagonal to 0, and np.nonzero gives us the indices of the non-zero elements in a matrix. So this:
pair_dist = d.cdist(position, position)
pair_d = pair_dist <= 4
for i in range(len(pair_d)):
    for j in range(i+1, len(pair_d)):
        if pair_d[i, j]:
            ...
is equivalent to
pair_dist = d.cdist(position, position)
pair_d = np.triu(pair_dist <= 4, k=1)  # k=1 to exclude the diagonal
for i, j in zip(*np.nonzero(pair_d)):
    ...
The second problem (as you noted) is that the velocities are just switched and scaled instead of reflected. What we really want to do is negate and scale the component of each particle's velocity along the axis that connects them. Note that to do this we will need the vector connecting them, position[j] - position[i], and the length of that vector (which we already calculated). So unfortunately part of the cdist calculation gets repeated. Let's quit using cdist and do it ourselves instead. The goal here is to make two arrays, diff and norm, where diff[i][j] is a vector pointing from particle i to j (so diff is a 3D array) and norm[i][j] is the distance between particles i and j. We can do this with numpy like so:
nop = number_of_particles
# Give pos a 3rd index so we can use np.repeat below
# equivalent to pos3d = np.array([position])
pos3d = position.reshape(1, nop, 2)
# 3D arrays with a repeated index so we can form combinations
# diff_i[i][j] = position[i] (for all j)
# diff_j[i][j] = position[j] (for all i)
diff_i = np.repeat(pos3d, nop, axis=1).reshape(nop, nop, 2)
diff_j = np.repeat(pos3d, nop, axis=0)
# diff[i][j] = vector pointing from position[i] to position[j]
diff = diff_j - diff_i
# norm[i][j] = sqrt( diff[i][j]**2 )
norm = np.linalg.norm(diff, axis=2)
# check for collisions and take the region above the diagonal
collided = np.triu(norm < radius, k=1)
for i, j in zip(*np.nonzero(collided)):
    # unit vector from i to j
    unit = diff[i][j] / norm[i][j]
    # flip velocity
    velocity[i] -= 1.9 * np.dot(unit, velocity[i]) * unit
    velocity[j] -= 1.9 * np.dot(unit, velocity[j]) * unit
    # push particle j to be radius units from i
    # This isn't particularly effective when 3+ points are close together
    position[j] += (radius - norm[i][j]) * unit
    ...
Since this post is long enough already, here is a gist of the code with my modifications.