The code below should be completely reproducible. I tried my best to make the question clear; if not, please ask for clarification.
What I need to do:
"From set R, find the two closest nodes to the nodes in set P Call the closest node i and the next closest node j" - quoting page 157 end of paragraph that starts step 2, in this paper.
The list R is the ordered set of nodes in a graph, and P contains sublists of nodes, each assigned to a particular vehicle. For example:
R = [1,5,6,9] # Size of R might change for new k
P = [[4],[7,3,8],[2]] # Sizes of sublists might change for new k
So vehicle k=0 gets node P[0] = [4], vehicle k=1 gets nodes P[1] = [7,3,8], and vehicle k=2 gets node P[2] = [2]. For each sublist P[k], I want to find the two closest nodes in R.
The distances are stored in a dict:
dist =
{(1,4) : 52.35456045083369,
(5,4) : 37.48332962798263,
(6,4) : 52.92447448959697,
(9,4) : 76.83749084919418,
(1,7) : 94.89467845985885,
(1,3) : 58.9406481131655,
(1,8) : 11.180339887498949,
(5,7) : 54.817880294662984,
(5,3) : 51.478150704935004,
(5,8) : 45.044422518220834,
(6,7) : 27.80287754891569,
(6,3) : 60.74537019394976,
(6,8) : 72.3671196055225,
(9,7) : 99.68951800465283,
(9,3) : 44.68780594300866,
(9,8) : 15.811388300841896,
(1,2) : 102.44998779892558,
(5,2) : 65.60487786742691,
(6,2) : 42.37924020083418,
(9,2) : 102.55242561733974}
where the first element in the tuple key is the R-node and the second element is the P-node.
So first, P[0] = [4]: the two closest nodes in R are i = 5, since the distance from 5 to 4 is 37.48, and j = 1, since the distance from 1 to 4 is 52.35.
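A quick sanity check of that claim against the dict above (a one-off lookup, not the general function I'm after):
print(sorted(R, key=lambda r: dist[(r, 4)])[:2])
# [5, 1]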
Now we proceed to P[1] = [7,3,8]. Here is where I run into trouble. I interpret the paper as asking "which two nodes in R are closest to the entire group [7,3,8]?" My first instinct was to calculate the average distance from each R-node to the nodes in P[1] and treat the smallest value as closest.
I've made an attempt, but it only works if len(P[k]) == 1. What I need is a function that takes in R and P and spits out i and j for each k. Here is my code:
for k in range(len(P)):
    all_nodes_dict = {}
    for i in range(len(R)):
        all_nodes_dict[(R[i], P[k][0])] = dist[(R[i], P[k][0])]
    min_list = sorted(list(all_nodes_dict.values()), key=lambda x: float(x))
    min_vals = min_list[:2]
    two_closest_nodes = []
    for i in range(len(min_vals)):
        two_closest_nodes += [return_key(min_vals[i], all_nodes_dict)[0]]
    i = two_closest_nodes[0]
    j = two_closest_nodes[1]
    # do something with i and j before resetting them for the next iteration
Here is my code for the function return_key().
# function to return key given value and dict
def return_key(val, my_dict):
    for key, value in my_dict.items():
        if val == value:
            return key
    return "key doesn't exist"
Here is the code to generate all distances, i.e. the dist dictionary in my code:
import math
import random

n = 10
random.seed(1)
# Create n random points
points = [(0, 0)]
points += [(random.randint(0, 100), random.randint(0, 100)) for i in range(n - 1)]
# Dictionary of distances between each pair of points
dist = {
    (i, j): math.sqrt(sum((points[i][p] - points[j][p]) ** 2 for p in range(2)))
    for i in range(n)
    for j in range(n)
    if i != j
}
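For concreteness, here is a minimal sketch of the averaging interpretation described above; the helper name two_closest is mine, and it assumes R, P and dist as defined in this question:

def two_closest(R, P_k, dist):
    # average distance from each R-node to the nodes in P[k]
    avg = {r: sum(dist[(r, p)] for p in P_k) / len(P_k) for r in R}
    # the two R-nodes with the smallest average distance
    closest, next_closest = sorted(avg, key=avg.get)[:2]
    return closest, next_closest

for k in range(len(P)):
    i, j = two_closest(R, P[k], dist)
    # do something with i and j

For P[0] = [4] this reproduces i = 5 and j = 1 from above.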
Problem
I have a list of line segments:
exampleLineSegments = [(1,2),(2,3),(3,4),(4,5),(5,6),(4,7),(8,7)]
These segments contain the indices of the corresponding points in a separate array.
From this sublist, one can see that there is a branching point (4), so three different branches emerge from it.
(In other, more specific problems, there might be multiple branching points for n branches.)
Target
My target is to get a dictionary including information about the existing branches, so e.g.:
result = {"branch_1": [1, 2, 3, 4],
          "branch_2": [4, 5, 6],
          "branch_3": [4, 7, 8]}
Current state of work/problems
Currently, I am identifying the branch points first by setting up a dictionary for each point and checking, for each entry, whether more than 2 neighbor points are found; if so, the point is a branching point.
Afterwards I am crawling through all points emerging from these branch points, checking for successors etc.
These functions contain some for loops and, generally, a lot of "crawling". This is not the cleanest solution, and as the number of points increases, performance degrades.
Question
What is the best / fastest / most performant way to achieve the target in this case?
I think you can achieve it with the following steps:
use a neighbors dict to store the graph
find all branch points, i.e. nodes whose neighbor count is > 2
start from every branch point and use DFS to find all the paths
from collections import defaultdict

def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)
    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]
    res = []

    def dfs(cur, prev, path):
        # reached a leaf
        if len(neighbors[cur]) == 1:
            res.append(path)
            return
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                dfs(neighbor, cur, path + [neighbor])

    # start from all the branch points
    for branch_point in branch_points:
        dfs(branch_point, None, [branch_point])
    return res
Update: an iterative version for big data, since the recursive one may hit a recursion depth problem:
def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)
    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]
    res = []
    # iterative DFS
    stack = [(bp, None, [bp]) for bp in branch_points]
    while stack:
        cur, prev, path = stack.pop()
        if len(neighbors[cur]) == 1 or (prev and cur in branch_points):
            res.append(path)
            continue
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                stack.append((neighbor, cur, path + [neighbor]))
    return res
test and output:
print(find_branch_paths([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# output:
# [[4, 3, 2, 1], [4, 5, 6], [4, 7, 8]]
Hope that helps you, and comment if you have further questions. : )
UPDATE: if there are many branch points, the paths will grow exponentially. So if you only want distinct segments, you can end a path when it encounters another branch point.
change this line
if len(neighbors[cur]) == 1:
to
if len(neighbors[cur]) == 1 or (prev and cur in branch_points):
I have the following situation:
(1) I have a large grid. Based on some conditions, I want to further observe specific points/cells in this grid. Each cell has an ID and coordinates X, Y stored separately. So in this case let's observe one cell only, marked C on the image, which is located on the edge of the grid. By some formula I can get all the neighbouring cells of the first order (marked 1 on the image) and the second order (marked 2 on the image).
(2) With a further condition I identify some cells among the neighbouring cells; these are marked in orange on the second image. What I want to do is connect all orange cells with each other, optimizing the distances and taking into account only min() distances. My first attempt was to consider only cells of the lower order: when looking at cells among neighbours of order 2, I look at the cells in order 1 only. The resulting connections are presented on image 2, but this is not optimal, since the ideal solution would compare the distances of all cells, not only those of the lower neighbour order. Doing that, I get the situation presented on image 3. The problem is that the cells are then, of course, not connected to the centre. What to do?
The current code is:
CO - list of centre points.
data - DataFrame of all IDs with X, Y values.
CO_list = CO['ID'].tolist()
neighbor100 = []
for p in CO_list:
    d = get_neighbors100k2(p, len(data))  # function that finds the IDs of first-order neighbours
    neighbor100.append(d)
neighbor200 = []
for p in CO_list:
    d = get_neighbors200k2(p, len(data))  # function that finds the IDs of second-order neighbours
    neighbor200.append(d)
flat100 = []
for i in neighbor100:
    for j in i:
        flat100.append(j)
flat200 = []
for i in neighbor200:
    for j in i:
        flat200.append(j)
neighbors100 = flat100
neighbors200 = flat200
data_sosedi100 = data.iloc[flat100,].reset_index(drop=True)
data_sosedi200 = data.iloc[flat200,].reset_index(drop=True)
dist200 = []
for b in flat200:
    d = ((pd.DataFrame((data_sosedi100['X'] - data.iloc[b,]['X'])**2
                       + (data_sosedi100['Y'] - data.iloc[b,]['Y'])**2)**0.5)).sum(1)
    dist200.append(d.min())
data_sosedi200['dist'] = dist200
data_sosedi200['id'] = None
for e in CO_list:
    data_sosedi200.loc[data_sosedi200['FID_2'].isin(get_neighbors200k2(e, len(data))), 'id'] = e
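For example, I wondered whether the dist200 loop above could be vectorized with scipy's cdist; this is an untested sketch that assumes data_sosedi100 and data hold the X and Y columns as shown:

from scipy.spatial.distance import cdist

pts100 = data_sosedi100[['X', 'Y']].to_numpy()
pts200 = data.iloc[flat200][['X', 'Y']].to_numpy()
data_sosedi200['dist'] = cdist(pts200, pts100).min(axis=1)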
Do you have any suggestions on how to optimize this a bit further? I hope I presented the whole picture; if needed, I'll clarify further. If you see a part of the code where this loop could be optimized further, I'd be very grateful!
I defined the points manually to work with:
import numpy as np

nodes = [[-2,1], [-2,0], [-1,0], [0,0], [1,1], [2,1], [2,0], [1,2], [2,2]]
center = [0,0]

def find_neighbor(node):
    # all 8 surrounding cells that actually exist in nodes
    n = []
    for i in range(-1, 2):
        for j in range(-1, 2):
            if not (i == 0 and j == 0):
                n.append([node[0] + i, node[1] + j])
    return [N for N in n if N in nodes]

def distance_to_center(node):
    return np.sqrt(node[0]**2 + node[1]**2)

def distance_between_two_nodes(node1, node2):
    return np.sqrt((node1[0] - node2[0])**2 + (node1[1] - node2[1])**2)

def next_node_closest_to_center(node):
    min_dist = distance_to_center(node)
    next_node = node
    for n in find_neighbor(node):
        if distance_to_center(n) < min_dist:
            min_dist = distance_to_center(n)
            next_node = n
    return next_node, min_dist

def get_path_to_center(node):
    node_path = [node]
    distance = 0.
    while node != center:
        new_node = next_node_closest_to_center(node)[0]
        distance += distance_between_two_nodes(node, new_node)
        node_path.append(new_node)
        node = new_node
    return node_path, distance

def furthest_nodes_from_center(nodes):
    max_dist = 0.
    furthest_nodes_pathwise = []
    for n in nodes:
        if get_path_to_center(n)[1] > max_dist:
            furthest_nodes_pathwise = []
            max_dist = get_path_to_center(n)[1]
            furthest_nodes_pathwise.append(n)
        elif get_path_to_center(n)[1] == max_dist:
            furthest_nodes_pathwise.append(n)
    return furthest_nodes_pathwise

def farthest_node_from_center(nodes):
    max_dist = 0.
    farthest_node = center
    for n in nodes:
        if distance_to_center(n) > max_dist:
            max_dist = distance_to_center(n)
            farthest_node = n
    return farthest_node

def closest_node_to_center(nodes):
    closest_node = farthest_node_from_center(nodes)
    min_dist = distance_to_center(closest_node)
    for n in nodes:
        if distance_to_center(n) < min_dist:
            min_dist = distance_to_center(n)
            closest_node = n
    return closest_node

def closest_node_center_with_furthest_distance(node_selection):
    if len(node_selection) == 1:
        return node_selection[0]
    else:
        return closest_node_to_center(node_selection)

print(closest_node_center_with_furthest_distance(furthest_nodes_from_center(nodes)))
Output:
[2, 0]
[Finished in 0.266s]
By running this on all nodes, I can now determine that the node furthest away path-wise but still closest to the center distance-wise is [2,0] and not [2,2]. So we start from there. To find the one on the other side, just split the data, like I said, into negative and positive x values. If you run it over a list of only the negative-x cells, you will get [-2,1].
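A minimal sketch of that split, reusing the functions above (the name negative_x_nodes is mine):

negative_x_nodes = [n for n in nodes if n[0] < 0]
print(closest_node_center_with_furthest_distance(furthest_nodes_from_center(negative_x_nodes)))
# [-2, 1]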
Now that you have your two starting cells, [2,0] and [-2,1], I will leave you to figure out the algorithm to navigate to the center passing by all cells, using the steps in my comments (you can now skip step 1, because this is the answer posted).
I am having trouble wrapping my head around the nested for loop in my code. I am following Kahn's algorithm here on wiki: Kahn's. I don't understand how to test whether outGoingEdge has incoming edges for each endArray element (m).
Here is what I have so far:
def topOrdering(self, graph):
    retList = []
    candidates = set()
    left = []
    right = []
    for key in graph:
        left.append(key)
        right.append(graph[key])
    flattenedRight = [val for sublist in right for val in sublist]
    for element in left:
        if element not in flattenedRight:
            # set of all nodes with no incoming edges
            candidates.add(element)
    candidates = sorted(candidates)
    while len(candidates) != 0:
        a = candidates.pop(0)
        retList.append(a)
        endArray = graph[a]
        for outGoingEdge in endArray:
            if outGoingEdge not in flattenedRight:
                candidates.append(outGoingEdge)
            # flattenedRight.remove(outGoingEdge)
            del outGoingEdge
    if not graph:
        return "the input graph is not a DAG"
    else:
        return retList
Here is a picture visualizing my algorithm. The graph is in the form of an adjacency list.
You can store the indegree (number of incoming edges) of each vertex separately and decrement the count every time you remove a vertex from the empty set. When a count becomes 0, add that vertex to the empty set to be processed later. Here's an example:
def top_sort(adj_list):
    # Find the number of incoming edges for each vertex
    in_degree = {}
    for x, neighbors in adj_list.items():
        in_degree.setdefault(x, 0)
        for n in neighbors:
            in_degree[n] = in_degree.get(n, 0) + 1
    # Find vertices with no incoming edges
    empty = {v for v, count in in_degree.items() if count == 0}
    result = []
    while empty:
        # Take an arbitrary vertex from the empty set
        v = empty.pop()
        result.append(v)
        # Remove edges originating from it; if the vertex is not present
        # in the adjacency list, use an empty list as neighbors
        for neighbor in adj_list.get(v, []):
            in_degree[neighbor] -= 1
            # If neighbor has no more incoming edges, add it to the empty set
            if in_degree[neighbor] == 0:
                empty.add(neighbor)
    if len(result) != len(in_degree):
        return None  # Not a DAG
    else:
        return result

ADJ_LIST = {
    1: [2],
    2: [3],
    4: [2],
    5: [3]
}
print(top_sort(ADJ_LIST))
Output (one valid topological order; set.pop takes an arbitrary element, so other valid orders are possible):
[1, 4, 5, 2, 3]
Given a list of edges such as edges = [[1,2],[2,3],[3,1],[4,5]],
I need to find how many graphs are created; by this I mean how many groups of connected components these edges form. Then I need to get the number of vertices in each group.
However, I am required to handle 10^5 edges, and I am currently having trouble completing the task for a large number of edges.
My algorithm currently takes the list of edges [[1,2],[2,3],[3,1],[4,5]] and merges the lists as sets whenever they have an intersection; this outputs a new list that contains the component groups, such as graphs = [[1,2,3],[4,5]].
There are two connected components: [1,2,3] are connected and [4,5] are connected as well.
I would like to know if there is a much better way of doing this task.
def mergeList(edges):
    sets = [set(x) for x in edges if x]
    m = 1
    while m:
        m = 0
        res = []
        while sets:
            common, r = sets[0], sets[1:]
            sets = []
            for x in r:
                if x.isdisjoint(common):
                    sets.append(x)
                else:
                    m = 1
                    common |= x
            res.append(common)
        sets = res
    return sets
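For reference, on the example edges this produces:

edges = [[1,2],[2,3],[3,1],[4,5]]
print(mergeList(edges))
# [{1, 2, 3}, {4, 5}]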
I would like to try doing this with a dictionary or something more efficient, because this is too slow.
A basic iterative graph traversal in Python isn't too bad.
import collections

def connected_components(edges):
    # build the graph
    neighbors = collections.defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # traverse the graph
    sizes = []
    visited = set()
    for u in neighbors.keys():
        if u in visited:
            continue
        # visit the component that includes u
        size = 0
        agenda = {u}
        while agenda:
            v = agenda.pop()
            visited.add(v)
            size += 1
            agenda.update(neighbors[v] - visited)
        sizes.append(size)
    return sizes
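For example, on the edges from the question this returns the component sizes:

print(connected_components([[1,2],[2,3],[3,1],[4,5]]))
# [3, 2]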
Do you need to write your own algorithm? networkx already has algorithms for this.
To get the length of each component, try:
import networkx as nx
G = nx.Graph()
G.add_edges_from([[1,2],[2,3],[3,1],[4,5]])
components = []
for graph in nx.connected_components(G):
components.append([graph, len(graph)])
components
# [[{1, 2, 3}, 3], [{4, 5}, 2]]
You could use Disjoint-set data structure:
edges = [[1,2],[2,3],[3,1],[4,5]]
parents = {}
size = {}

def get_ancestor(parents, item):
    # Returns the ancestor for a given item and compresses the path
    # Recursion would be easier but might blow the stack
    stack = []
    while True:
        parent = parents.setdefault(item, item)
        if parent == item:
            break
        stack.append(item)
        item = parent
    for item in stack:
        parents[item] = parent
    return parent

for x, y in edges:
    x = get_ancestor(parents, x)
    y = get_ancestor(parents, y)
    if x == y:
        continue  # both vertices are already in the same component
    size_x = size.setdefault(x, 1)
    size_y = size.setdefault(y, 1)
    if size_x < size_y:
        parents[x] = y
        size[y] += size_x
    else:
        parents[y] = x
        size[x] += size_y

print(sum(1 for k, v in parents.items() if k == v))  # 2
In the above, parents is a dict where vertices are keys and ancestors are values. If a given vertex doesn't have a parent, then the value is the vertex itself. For every edge in the list, the ancestor of both vertices is set to the same vertex. Note that when the current ancestor is queried, the path is compressed, so subsequent queries run in near-constant amortized time. This lets the whole algorithm run in close to O(n) time.
Update
In case the components themselves are required instead of just their number, the resulting dict can be iterated to produce them. Note that parents can still hold non-root ancestors for vertices whose paths were never fully compressed, so group by the root returned by get_ancestor:
from collections import defaultdict

components = defaultdict(list)
for k in list(parents):
    components[get_ancestor(parents, k)].append(k)
print(components)
Output:
defaultdict(<class 'list'>, {1: [1, 2, 3], 4: [4, 5]})