The input is an undirected, unweighted graph with about 80 nodes (more specifically, the coordinates of the nodes in a txt file), in which a route is to be found that visits each node exactly once and does not use any edge twice. Furthermore, the included angle between every two consecutive edges of the route should be greater than 90°. On the left side of the following picture you can see an angle that is not wanted in the route, in contrast to the angle on the right side:
Furthermore, start and end points of the route do not have to be identical. The route does not have to be the shortest one but should be as short as possible.
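To make the angle condition concrete, here is a minimal sketch of how it could be checked for three consecutive route points a, b and c. The name angle_ok is only for illustration, and the sketch assumes points are (x, y) tuples. It uses the sign of the dot product instead of computing the angle itself:

```python
def angle_ok(a, b, c):
    """Return True if the included angle at b (between b->a and b->c) is greater than 90 degrees."""
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    # The angle between two vectors exceeds 90 degrees exactly when their dot product is negative.
    return bax * bcx + bay * bcy < 0


print(angle_ok((0, 0), (1, 0), (2, 1)))  # gentle turn, included angle of 135 degrees
print(angle_ok((0, 0), (1, 0), (0, 1)))  # sharp turn back, included angle of 45 degrees
```

An included angle of exactly 90° is rejected here, since the requirement is "greater than 90°".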
Here's what I've tried so far:
Considering there are 80 nodes in the graph, a depth-first search or backtracking algorithm would be impractical because it would simply take too long. Instead, I've implemented a greedy algorithm, which always makes the locally best decision at the time of the decision. It works well for most of the examples; however, some are just impossible to solve for that type of algorithm. Here's my whole code, which uses the "read_coordinates(file)" function to import the coordinates (the x- and y-coordinates of one point per line). After that, the function "greedy_approach(coordinates)" tries to find a route.
import matplotlib.pyplot as plt
import numpy as np
import math
import matplotlib.animation as animation
def read_coordinates(file):
    with open(file, "r") as f:
        lines = f.readlines()
    coordinates = []
    for line in lines:
        x, y = map(float, line.strip().split())
        coordinates.append((x, y))
    return coordinates
def check_angle(v1, v2):
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    if cos_angle > 1:
        cos_angle = 1
    elif cos_angle < -1:
        cos_angle = -1
    angle = math.acos(cos_angle)
    return abs(angle) < math.pi / 2
def route_length(route):
    # Not shown in the original post; assumed here to be the total Euclidean length of the route
    return sum(math.dist(p, q) for p, q in zip(route, route[1:]))
def greedy_approach(coordinates):
    all_visited = set(range(len(coordinates)))
    overall_best_route = []
    for start in range(len(coordinates)):
        route = [coordinates[start]]
        visited = {start}
        unvisited = all_visited - visited
        while unvisited:
            best_route = None
            best_distance = float("inf")
            for i in unvisited:
                point = coordinates[i]
                # Vector 1: from the candidate point back to the current end of the route
                v1 = np.array(route[-1]) - np.array(point)
                if len(route) > 1:
                    # Vector 2: from the current end of the route back to its predecessor
                    v2 = np.array(route[-2]) - np.array(route[-1])
                else:
                    # No previous edge yet: fixed reference direction (this restricts the first step)
                    v2 = np.array([1, 0])
                if check_angle(v1, v2):
                    temp_route = route + [point]
                    temp_distance = route_length(temp_route)
                    if temp_distance < best_distance:
                        best_route = temp_route
                        best_distance = temp_distance
            if not best_route:
                break
            route = best_route
            visited.add(coordinates.index(route[-1]))
            unvisited = all_visited - visited
        if len(route) > len(overall_best_route):
            overall_best_route = route
        if len(visited) == len(coordinates):
            return overall_best_route
    return overall_best_route
    # return None
def plot_route(coordinates, route):
    x_coo = coordinates[:, 0]
    y_coo = coordinates[:, 1]
    x = [p[0] for p in route]
    y = [p[1] for p in route]
    fig, ax = plt.subplots()
    ax.plot(x_coo, y_coo, 'o')
    line, = ax.plot(x[:1], y[:1], '-')
    def update(num):
        num += 1
        if num >= len(x) + 1:
            ani.event_source.stop()
            return line,
        line.set_data(x[:num], y[:num])
        return line,
    ani = animation.FuncAnimation(fig, update, frames=len(x)+1, interval=100, blit=False)
    ax.plot(x[0], y[0], 'go')
    ax.plot(x[-1], y[-1], 'ro')
    plt.show()
if __name__ == "__main__":
    # raw string, so the backslash in the Windows path is not treated as an escape
    file = r'X:\coordinates6.txt'
    coordinates = read_coordinates(file)
    route = greedy_approach(coordinates)
    plot_route(np.array(coordinates), route)
It goes without saying that you need to change the variable "file" according to the path where you've saved the txt file. Below I've included the txt file of such an "unsolvable" graph:
102.909291 60.107868
-89.453831 162.237392
64.943433 -119.784474
121.392544 56.694081
-107.196865 -77.792599
20.218290 88.031173
202.346980 -189.069699
143.114152 -135.866707
-144.887799 -73.495410
92.255820 -93.514104
-55.091518 198.826966
228.929427 82.624982
96.781707 141.370805
154.870684 140.327660
112.833346 -38.057607
14.005617 -14.015334
138.136997 -31.348808
73.689751 110.224271
100.006932 76.579303
120.906436 131.798810
21.067444 122.164599
49.091876 150.678826
85.043830 108.946389
-194.986965 101.363745
152.102728 -193.381252
238.583388 -133.143524
151.432196 121.427337
221.028639 -139.435079
-139.741580 57.936680
-72.565291 -24.281820
155.405344 -56.437901
58.019653 49.937906
277.821597 104.262606
19.765322 -99.236400
246.621634 101.705861
289.298882 56.051342
172.836936 59.184233
132.794476 135.681392
155.341949 -20.252779
134.692592 -102.152826
-97.391662 124.120512
245.415055 44.794067
255.134924 115.594915
83.005905 64.646774
245.020791 -167.448848
-102.699992 95.632069
-4.590656 -40.067226
-191.216327 -162.689024
210.186432 -127.403563
-51.343758 -57.654823
187.669263 -122.655771
121.661135 85.267672
46.674278 -193.090008
-189.988471 -98.043874
-175.118239 77.842636
-187.485329 -177.031237
56.716498 66.959624
-18.507391 -22.905270
-167.994506 138.195365
81.740403 10.276251
-19.310012 -131.810965
157.588994 -144.200765
40.327635 19.216022
-126.569816 -30.645224
150.526118 -88.230057
76.647124 -7.289705
231.944823 82.961057
58.716620 32.835930
-288.744132 -173.349893
-293.833463 -165.440105
-31.745416 -69.207960
175.677917 98.929343
216.825920 -152.024123
21.176627 -165.421555
-100.569041 140.808607
-90.160190 -25.200829
242.810288 -182.054289
-154.225945 -135.522059
102.223372 174.201904
64.559003 82.567627
I would really appreciate it if you could have a look into that problem :)
Every node has a potential edge to every other node. (It might be possible to optimize things by specifying a cut-off distance, so that two nodes that are far apart do not create an edge.)
Now you need to split nodes so that you end up with nodes that only have two edges connecting them.
Now set a "cost" for each node, which is 1 if the angle between its two edges is greater than the cut-off and "infinite" if it is smaller.
Transform the nodes into edges (with the cost) and the edges into nodes.
Run the spanning tree algorithm on this graph.
This diagram shows how to split nodes so that they have only two (real) edges.
The red edges will have zero cost and be ignored when calculating the angle.
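The first steps above can be sketched roughly as follows. This only covers building the candidate edges with a cut-off distance and assigning the cost to a node for one particular pair of edges; the node splitting and the final search are not shown. The function names candidate_edges and turn_cost are made up for illustration, and points are assumed to be (x, y) tuples:

```python
import math
from itertools import combinations


def candidate_edges(points, cutoff):
    """All index pairs (i, j), i < j, whose points are at most `cutoff` apart."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= cutoff]


def turn_cost(points, i, j, k):
    """Cost of passing through node j when arriving from node i and leaving towards node k."""
    ax, ay = points[i][0] - points[j][0], points[i][1] - points[j][1]
    bx, by = points[k][0] - points[j][0], points[k][1] - points[j][1]
    # Included angle at j is greater than 90 degrees exactly when the dot product is negative.
    return 1.0 if ax * bx + ay * by < 0 else math.inf


points = [(0, 0), (1, 0), (2, 0), (1, 1)]
print(candidate_edges(points, 1.5))
print(turn_cost(points, 0, 1, 2), turn_cost(points, 0, 1, 3))
```

With only about 80 nodes, the full set of pairs is small enough that the cut-off is an optional optimization rather than a necessity.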
I am trying to use a KD-tree data structure to remove the closest points from an array, preferably without for loops.
import sys
import time
import scipy.spatial
class KDTree:
    """
    Nearest neighbor search class with KDTree
    """
    def __init__(self, data):
        # store kd-tree
        self.tree = scipy.spatial.cKDTree(data)
    def search(self, inp, k=1):
        """
        Search NN
        inp: input data, single frame or multi frame
        """
        if len(inp.shape) >= 2:  # multi input
            index = []
            dist = []
            for i in inp.T:
                idist, iindex = self.tree.query(i, k=k)
                index.append(iindex)
                dist.append(idist)
            return index, dist
        dist, index = self.tree.query(inp, k=k)
        return index, dist
    def search_in_distance(self, inp, r):
        """
        find points within a distance r
        """
        index = self.tree.query_ball_point(inp, r)
        return np.asarray(index)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
start = time.time()
fig, ar = plt.subplots()
t = 0
R = 50.0
u = R *np.cos(t)
v = R *np.sin(t)
x = np.linspace(-100,100,51)
y = np.linspace(-100,100,51)
xx, yy = np.meshgrid(x,y)
points =np.vstack((xx.ravel(),yy.ravel())).T
Tree = KDTree(points)
ind = Tree.search_in_distance([u, v],10.0)
ar.scatter(points[:,0],points[:,1],c='k',s=1)
infected = points[ind]
ar.scatter(infected[:,0],infected[:,1],c='r',s=5)
def animate(i):
    global R,t,start,points
    ar.clear()
    u = R *np.cos(t)
    v = R *np.sin(t)
    ind = Tree.search_in_distance([u, v],10.0)
    ar.scatter(points[:,0],points[:,1],c='k',s=1)
    infected = points[ind]
    ar.scatter(infected[:,0],infected[:,1],c='r',s=5)
    #points = np.delete(points,ind)
    t+=0.01
    end = time.time()
    if end - start != 0:
        print((end - start), end="\r")
        start = end
ani = animation.FuncAnimation(fig, animate, interval=20)
plt.show()
But no matter what I do, I can't get np.delete to work with the indices returned by the query_ball_point method. What am I missing?
I would like to make the red colored points vanish in each iteration from the points array.
Your points array is an Nx2 matrix. Your ind indices are a list of row indices. You need to specify the axis along which to delete, like this:
points = np.delete(points,ind,axis=0)
Also, once you delete rows, the indices of the remaining points shift, so watch out for stale indices in your next iteration/calculations (the KD-tree was built from the original array). You may want to keep one copy that you delete from and plot, and another copy that stays intact for the calculations.
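As a small illustration of the difference the axis argument makes, here is a sketch on a toy 4x2 array (the data and the index list are made up). It also shows a boolean mask as one possible way to keep the original array intact while tracking which rows have been removed:

```python
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
ind = [1, 2]  # row indices, e.g. as returned by query_ball_point

# Without axis, np.delete works on the flattened array and returns a 1-D result.
flat = np.delete(points, ind)
print(flat.shape)

# With axis=0, whole rows are removed.
remaining = np.delete(points, ind, axis=0)
print(remaining)

# Alternative: leave `points` untouched and record removed rows in a mask,
# so indices from a tree built on the original array stay valid.
mask = np.ones(len(points), dtype=bool)
mask[ind] = False
print(points[mask])
```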
I have the following code that takes a very long time to execute. The pandas DataFrames df and df_plants are very small (less than 1 MB). I wonder if there is any way to optimise this code:
import pandas as pd
import geopy.distance
import re
def is_inside_radius(latitude, longitude, df_plants, radius):
    if (latitude != None and longitude != None):
        lat = float(re.sub("[a-zA-Z]", "", str(latitude)))
        lon = float(re.sub("[a-zA-Z]", "", str(longitude)))
        for index, row in df_plants.iterrows():
            coords_1 = (lat, lon)
            coords_2 = (row["latitude"], row["longitude"])
            dist = geopy.distance.distance(coords_1, coords_2).km
            if dist <= radius:
                return 1
    return 0
df["inside"] = df.apply(lambda row: is_inside_radius(row["latitude"],row["longitude"],df_plants,10), axis=1)
I use regex to process latitude and longitude in df because the values contain some errors (characters) which should be deleted.
The function is_inside_radius verifies if row[latitude] and row[longitude] are inside the radius of 10 km from any of the points in df_plants.
Can you try this?
import pandas as pd
from geopy import distance
import re
def is_inside_radius(latitude, longitude, df_plants, radius):
    if latitude is not None and longitude is not None:
        lat = float(re.sub("[a-zA-Z]", "", str(latitude)))
        lon = float(re.sub("[a-zA-Z]", "", str(longitude)))
        coords_1 = (lat, lon)
        # itertuples() yields namedtuples, so columns are read as attributes
        for row in df_plants.itertuples():
            coords_2 = (row.latitude, row.longitude)
            if distance.distance(coords_1, coords_2).km <= radius:
                return 1
    return 0
# row-wise application needs DataFrame.apply with axis=1
df["inside"] = df.apply(
    lambda row: is_inside_radius(
        row["latitude"],
        row["longitude"],
        df_plants,
        10),
    axis=1)
According to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas-dataframe-iterrows, pandas.DataFrame.itertuples() returns namedtuples of the values, which is generally faster than pandas.DataFrame.iterrows() and preserves dtypes across the returned rows.
I've encountered such a problem before, and I see one simple optimisation: try to avoid the expensive distance calculation as much as possible, which you can do as follows:
Imagine:
You have a circle, defined by Mx and My (center coordinates) and R (radius).
You have a point, defined by its coordinates X and Y.
If your point (X,Y) is not even within the square centered at (Mx, My) with side length 2*R, then it cannot be within the circle defined by (Mx, My) and radius R either.
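In code (Python, for plane coordinates):
def is_inside(X, Y, Mx, My, R):
    # cheap rejection: outside the bounding square means outside the circle
    if abs(Mx - X) > R or abs(My - Y) > R:
        return False
    else:
        # and only here you perform the more expensive distance calculation
        return (Mx - X) ** 2 + (My - Y) ** 2 <= R ** 2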
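The same rejection idea can be carried over to latitude/longitude. The following is only a sketch under stated assumptions: it uses the haversine formula on a spherical Earth (so results can differ slightly from geopy's geodesic distance), it prefilters on the latitude difference only (a great-circle distance is never smaller than the latitude difference alone), and the names haversine_km and is_inside_radius_fast as well as the plain list of (lat, lon) tuples are made up for illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_inside_radius_fast(lat, lon, plants, radius_km):
    """Return 1 if (lat, lon) is within radius_km of any (lat, lon) pair in plants, else 0."""
    # Largest latitude difference (in degrees) that can still be within the radius.
    max_dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    for plat, plon in plants:
        # Cheap rejection before the trigonometric calculation.
        if abs(lat - plat) > max_dlat:
            continue
        if haversine_km(lat, lon, plat, plon) <= radius_km:
            return 1
    return 0


plants = [(48.8566, 2.3522)]
print(is_inside_radius_fast(48.86, 2.35, plants, 10))
print(is_inside_radius_fast(51.5074, -0.1278, plants, 10))
```

The plant coordinates could be extracted once from df_plants (for example with zip over the two columns) so that no DataFrame row iteration happens inside the function.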
I'm trying to split a Shapely LineString at the nearest point to some other coordinate. I can get the closest point on the line using project and interpolate but I am unable to split the line at this point as it is not a vertex.
I need to split the line along the edge, not snapped to the nearest vertex, so that the nearest point becomes a new vertex on the line.
Here's what I've done so far:
from shapely.ops import split
from shapely.geometry import Point, LineString
line = LineString([(0, 0), (5,8)])
point = Point(2,3)
# Find coordinate of closest point on line to point
d = line.project(point)
p = line.interpolate(d)
print(p)
# >>> POINT (1.910112359550562 3.056179775280899)
# Split the line at the point
result = split(line, p)
print(result)
# >>> GEOMETRYCOLLECTION (LINESTRING (0 0, 5 8))
Thanks!
As it turns out the answer I was looking for was outlined in the documentation as the cut method:
def cut(line, distance):
    # Cuts a line in two at a distance from its starting point
    if distance <= 0.0 or distance >= line.length:
        return [LineString(line)]
    coords = list(line.coords)
    for i, p in enumerate(coords):
        pd = line.project(Point(p))
        if pd == distance:
            return [
                LineString(coords[:i+1]),
                LineString(coords[i:])]
        if pd > distance:
            cp = line.interpolate(distance)
            return [
                LineString(coords[:i] + [(cp.x, cp.y)]),
                LineString([(cp.x, cp.y)] + coords[i:])]
Now I can use the projected distance to cut the LineString:
...
d = line.project(point)
# print(d) 3.6039927920216237
cut(line, d)
# LINESTRING (0 0, 1.910112359550562 3.056179775280899)
# LINESTRING (1.910112359550562 3.056179775280899, 5 8)
I have a list of unsorted points:
List = [(-50.6261, 74.3683), (-63.2489, 75.0038), (-76.0384, 75.6219), (-79.8451, 75.7855), (-30.9626, 168.085), (-27.381, 170.967), (-22.9191, 172.928), (-16.5869, 173.087), (-4.813, 172.505), (-109.056, 92.0063), (-96.0705, 91.4232), (-83.255, 90.8563), (-80.7807, 90.7498), (-54.1694, 89.5087), (-41.6419, 88.9191), (-32.527, 88.7737), (-27.6403, 91.0134), (-22.3035, 95.141), (-18.0168, 100.473), (-15.3918, 105.542), (-13.6401, 112.373), (-13.3475, 118.988), (-14.4509, 125.238), (-17.1246, 131.895), (-21.6766, 139.821), (-28.5735, 149.98), (-33.395, 156.344), (-114.702, 83.9644), (-114.964, 87.4599), (-114.328, 89.8325), (-112.314, 91.6144), (-109.546, 92.0209), (-67.9644, 90.179), (-55.2013, 89.5624), (-34.4271, 158.876), (-34.6987, 161.896), (-33.6055, 164.993), (-87.0365, 75.9683), (-99.8007, 76.0889), (-105.291, 76.5448), (-109.558, 77.3525), (-112.516, 79.2509), (-113.972, 81.3335), (2.30014, 171.635), (4.40918, 169.691), (5.07165, 166.974), (5.34843, 163.817), (5.30879, 161.798), (-29.6746, 73.5082), (-42.5876, 74.0206)]
I want to sort those points to have a continuous curve passing through every point just once, starting from start = (-29.6746, 73.5082) and ending at end = (5.30879, 161.798).
This is what I tried so far:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors
import networkx as nx
X, Y = [], []
for el in List:
    X.append(el[0])
    Y.append(el[1])
x = np.array(X)
y = np.array(Y)
points = np.c_[x, y]
# find 2 nearest neighbors
clf = NearestNeighbors(n_neighbors=2).fit(points)
G = clf.kneighbors_graph()
# note: networkx 3.0 removed this function; from_scipy_sparse_array replaces it
T = nx.from_scipy_sparse_matrix(G)
# indexes of the new order
order = list(nx.dfs_preorder_nodes(T, 0))
# sorted arrays
new_x = x[order]
new_y = y[order]
plt.plot(new_x, new_y)
plt.show()
But I still get an unsorted list, and I couldn't find a way to determine the start point and end point.
We can see the problem as a traveling salesman problem, which we can approximate greedily by always moving to the nearest point that has not been visited yet:
def distance(P1, P2):
    """
    This function computes the distance between 2 points defined by
    P1 = (x1,y1) and P2 = (x2,y2)
    """
    return ((P1[0] - P2[0])**2 + (P1[1] - P2[1])**2) ** 0.5
def optimized_path(coords, start=None):
    """
    This function orders the points by repeatedly moving to the nearest remaining point
    coords should be a list in this format coords = [ [x1, y1], [x2, y2] , ...]
    """
    if start is None:
        start = coords[0]
    pass_by = list(coords)  # copy, so the caller's list is not modified
    path = [start]
    pass_by.remove(start)
    while pass_by:
        nearest = min(pass_by, key=lambda x: distance(path[-1], x))
        path.append(nearest)
        pass_by.remove(nearest)
    return path
# define a start point (it must be an element of the list, with the same type as its items)
start = (-29.6746, 73.5082)  # the start point from the question
path = optimized_path(List, start)
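Since the question also fixes the end point, one possible variation (a sketch, not a guaranteed-optimal ordering) is to hold the end point back and append it last. The function name nearest_neighbor_path is made up for illustration, and points are assumed to be tuples so that start and end can be compared with the list items:

```python
import math


def nearest_neighbor_path(points, start, end):
    """Greedy ordering from start to end: always step to the nearest remaining point."""
    # Hold back start and end; everything else is visited greedily in between.
    remaining = [p for p in points if p != start and p != end]
    path = [start]
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(path[-1], p))
        remaining.remove(nearest)
        path.append(nearest)
    path.append(end)
    return path


pts = [(0, 0), (3, 0), (1, 0), (2, 0), (1, 2)]
print(nearest_neighbor_path(pts, start=(0, 0), end=(3, 0)))
```

Like any nearest-neighbor heuristic, this can still take a wrong turn where two branches of the curve come close to each other, so the result is worth checking with a plot.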
Not an answer, but too much for a comment
I plotted the data points as scatter and line
I see a visually smooth curve (a spline with low-order local derivatives) with ~10% of the points 'out of order'.
Is this typical of the problem? Is the data mostly in order?
How general or specific does the code have to be?
I don't know the "big hammer" libs, but I cleaned up the surrounding code and did the same plot:
List = [(-50.6261, 74.3683), (-63.2489, 75.0038), (-76.0384, 75.6219), (-79.8451, 75.7855), (-30.9626, 168.085), (-27.381, 170.967), (-22.9191, 172.928), (-16.5869, 173.087), (-4.813, 172.505), (-109.056, 92.0063), (-96.0705, 91.4232), (-83.255, 90.8563), (-80.7807, 90.7498), (-54.1694, 89.5087), (-41.6419, 88.9191), (-32.527, 88.7737), (-27.6403, 91.0134), (-22.3035, 95.141), (-18.0168, 100.473), (-15.3918, 105.542), (-13.6401, 112.373), (-13.3475, 118.988), (-14.4509, 125.238), (-17.1246, 131.895), (-21.6766, 139.821), (-28.5735, 149.98), (-33.395, 156.344), (-114.702, 83.9644), (-114.964, 87.4599), (-114.328, 89.8325), (-112.314, 91.6144), (-109.546, 92.0209), (-67.9644, 90.179), (-55.2013, 89.5624), (-34.4271, 158.876), (-34.6987, 161.896), (-33.6055, 164.993), (-87.0365, 75.9683), (-99.8007, 76.0889), (-105.291, 76.5448), (-109.558, 77.3525), (-112.516, 79.2509), (-113.972, 81.3335), (2.30014, 171.635), (4.40918, 169.691), (5.07165, 166.974), (5.34843, 163.817), (5.30879, 161.798), (-29.6746, 73.5082), (-42.5876, 74.0206)]
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors
import networkx as nx
points = np.asarray(List)
# find 2 nearest neighbors
clf = NearestNeighbors(n_neighbors=2).fit(points)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# indexes of the new order
order = list(nx.dfs_preorder_nodes(T, 0))
# sorted arrays
new_points = points[order]
plt.scatter(*zip(*points))
plt.plot(*zip(*new_points), 'r')
plt.show()