I'm using the networkx package. How can I randomly remove multiple edges without causing any disconnection (the number of nodes should stay the same)?
I've tried stratified sampling of the dataframe, but it isn't working and I don't know how to do it.
Any advice would be appreciated.
Here is how I do it so far:
removed_edge_cnt = 0
remove_list = set([])
pbar = tqdm(total=n)
while removed_edge_cnt < n:
    removed = True
    drop_indices = np.random.choice(orig_data_copy.index, 1, replace=False)
    edge = orig_data_copy.iloc[drop_indices, :].values.ravel()
    orig_data_copy = orig_data_copy.drop(drop_indices)  # drop the row whether or not the edge gets removed, so we never pick the same edge twice
    # print('{}-{}:{}'.format(edge[0], edge[1], G.has_edge(edge[0], edge[1])))
    if G.has_edge(edge[0], edge[1]):  # the edge exists
        G.remove_edge(edge[0], edge[1])
        if not nx.is_weakly_connected(G):  # would removing it disconnect the graph?
            G.add_edge(edge[0], edge[1])
            removed = False
    if removed:
        removed_edge_cnt += 1
        remove_list.add((edge[0], edge[1]))
        pbar.update(1)
pbar.close()
A connected graph is one where there is a path from every node to every other node. I assume that this is what you mean by "no disconnection".
To check whether a graph is connected, do a depth-first search from any node and confirm that every node is visited.
So, remove your random edge(s) and check whether the graph is still connected. If not, put the edges back and try again until you find an edge that can be removed.
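As a rough sketch of that loop with networkx, assuming your graph starts out connected (the helper name remove_random_edges and the random test graph are just for illustration; use nx.is_weakly_connected instead of nx.is_connected for a directed graph):
import random
import networkx as nx

def remove_random_edges(G, n):
    # Try random edges one at a time; keep a removal only if G stays connected.
    removed = []
    candidates = list(G.edges())
    random.shuffle(candidates)
    for u, v in candidates:
        if len(removed) == n:
            break
        G.remove_edge(u, v)
        if nx.is_connected(G):
            removed.append((u, v))
        else:
            G.add_edge(u, v)  # undo: removing this edge would split the graph
    return removed

G = nx.erdos_renyi_graph(50, 0.2, seed=1)
print(remove_random_edges(G, 10))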
Related
I am trying to remove nodes at random from graphs using the networkx package. The first block describes the graph construction and the second block gives me the node lists that I have to remove from my graph H (20%, 50% and 70% removals). I want 3 versions of the base graph H in the end, in a list or any data structure. The code in block 3 gives me objects of type "None". The last block shows that it works for a single case.
I am guessing that the problem is in the append function, which somehow returns objects of type "None". I also feel that the base graph H might be getting altered after every iteration. Is there any way around this? Any help would be appreciated :)
import networkx as nx
import numpy as np
import random
# node removals from Graphs at random
# network construction
H = nx.Graph()
H.add_nodes_from([1,2,3,4,5,6,7,8,9,10])
H.add_edges_from([[1,2],[2,4],[5,6],[7,10],[1,5],[3,6]])
nx.info(H)
nodes_list = list(H.nodes)
# list of nodes to be removed
perc = [.20,.50,.70] # percentage of nodes to be removed
random_sample_list = []
for p in perc:
    interior_list = []
    random.seed(2) # for replicability
    sample = round(p*10)
    random_sample = random.sample(nodes_list, sample)
    interior_list.append(random_sample)
    random_sample_list.append(random_sample)
# applying the list of nodes to be removed to create a list of graphs - not working
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    graph_list.append(H1.remove_nodes_from(random_sample_list[i]))
# list access - works
H.remove_nodes_from(random_sample_list[1])
nx.info(H)
Final output should look like:
[Graph with 20% removed nodes, Graph with 50% removed nodes, Graph with 70% removed nodes] - e.g. a list
The function remove_nodes_from does not return the modified graph; it modifies the graph in place and returns None. Consequently, you only need to copy the graph, remove the desired percentage of nodes from the copy, and append the copy to the list:
graph_list = []
for i in range(len(random_sample_list)):
    H1 = H.copy()
    H1.remove_nodes_from(random_sample_list[i])
    graph_list.append(H1)
I have a list of tuples like this.
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
In this list, 1 relates to 2, 2 relates to 5, and 5 relates to 6, therefore 1 relates to 6. Similarly, I need to find the relations between the other elements in the tuples. I need a function that takes the input values and outputs as follows:
input = (1,6) #output = True
input = (5,3) #output = True
input = (2,8) #output = False
I do not have knowledge of itertools or map functions. Can they be used to solve these types of problems?
And, out of curiosity and interest, where can I find these types of questions to practice, and where are these types of problems encountered in real-life situations?
This can be easily done by considering the tuples as edges in a graph. The question is then reduced to checking if there is a path between the two nodes.
There are lots of nice libraries for this; see e.g. networkx:
import networkx as nx
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
G = nx.Graph(a)
nx.has_path(G, 1, 6) # True
nx.has_path(G, 5, 3) # True
nx.has_path(G, 2, 8) # False
This answer here nicely states your problem as a graph problem, where every time you need to run your algorithm you need to check for the existence of a path between your input vertices. The time complexity of every query then depends on the size, order, diameter, and degree of the underlying graph.
However, if you intend to run this algorithm many times with the same array a, it may be worth doing some preprocessing on the input graph to find the connected components (Wikipedia: connected components) first. In that case you can get constant time for every query. Here is the code I suggest:
# NOTE : tested using python 3.6.1
# WARNING : no input sanitization
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
n = 8 # order of the underlying graph
# prepare graph as lists of neighbors for every vertex, i.e. adjacency lists (extra unused vertex '0', just to match the value range of the problem)
graph = [[] for i in range(n+1)]
for edge in a:
    graph[edge[0]].append(edge[1])
    graph[edge[1]].append(edge[0])
print( "graph : " + str(graph) )
# list of unprocessed vertices : contains all of them at the beginning
unprocessed_vertices = {i for i in range(1,n+1)}
# subroutine to discover the connected component of a vertex
def build_component():
    component = [] # current connected component
    curr_vertices = {unprocessed_vertices.pop()} # locally unprocessed vertices, initialize with one of the globally unprocessed vertices
    while len(curr_vertices) > 0:
        curr_vertex = curr_vertices.pop() # vertex to be processed
        # add unprocessed neighbours of current vertex to the set of vertices to process
        for neighbour in graph[curr_vertex]:
            if neighbour in unprocessed_vertices:
                curr_vertices.add(neighbour)
                unprocessed_vertices.remove(neighbour)
        component.append(curr_vertex)
    return component
# main algorithm : graph traversal on multiple connected components
components = []
while len(unprocessed_vertices) > 0:
    components.append( build_component() )
print( "components : " + str(components) )
# assign a number to each component
component_numbers = [None] * (n+1)
curr_number = 1
for comp in components:
    for vertex in comp:
        component_numbers[vertex] = curr_number
    curr_number += 1
print( "component_numbers : " + str(component_numbers) )
# main functionality
def is_connected( pair ):
    return component_numbers[pair[0]] == component_numbers[pair[1]]
# run main functionality on inputs: every call is executed in constant time now, regardless of the size of the graph
print( is_connected( (1,6) ) )
print( is_connected( (5,3) ) )
print( is_connected( (2,8) ) )
I don't really know about the most likely situations where this problem could be encountered, but I suppose it can have applications in some clustering tasks, or if you want to know whether it is possible to go from one place to another. If the edges of the graph represent dependencies between modules, this problem would tell you if two parts depend on each other, so there are potential applications in compiling or the management of large projects. The underlying problem is a "connected components" problem, which is among the problems we know polynomial algorithms for.
It is generally very useful to model these kinds of problems with graphs, as these objects have a very simple structure, and most of the time we can reduce the original problem to a well-known problem on graphs.
I'm writing a recursive breadth-first traversal of a network. The problem I ran into is that the network often looks like this:
  1
 / \
2   3
 \ /
  4
  |
  5
So my traversal starts at 1, then traverses to 2, then 3. The next step is to proceed to 4, so 2 traverses to 4. After this, 3 traverses to 4, and suddenly I'm duplicating work as both lines try to traverse to 5.
The solution I've found is to create a list called self.already_traversed, and every time a node is traversed, I append it to the list. Then, when I'm traversing from node 4, I check to make sure it hasn't already been traversed.
The problem here is that I'm using an instance variable for this, so I need a way to set up the list before the first recursion and a way to clean it up afterwards. The way I'm currently doing this is:
self.already_traversed = []
self._traverse_all_nodes(start_id)
self.already_traversed = []
Of course, it sucks to be twiddling variables outside of the function that's using them. Is there a better way to do this so that it can be handled inside my traversal function?
Here's the actual code, though I recognize it's a bit dense:
def _traverse_all_nodes(self, root_authority, max_depth=6):
    """Recursively build a networkx graph

    Process is:
     - Work backwards through the authorities for self.cluster_end and all
       of its children.
     - For each authority, add it to a networkx graph, if:
        - it happened after self.cluster_start
        - it's in the Supreme Court
        - we haven't exceeded a max_depth of six cases.
        - we haven't already followed this path
    """
    g = networkx.Graph()
    if hasattr(self, 'already_traversed'):
        is_already_traversed = (root_authority.pk in self.already_traversed)
    else:
        # First run. Create an empty list.
        self.already_traversed = []
        is_already_traversed = False

    is_past_max_depth = (max_depth <= 0)
    is_cluster_start_obj = (root_authority == self.cluster_start)
    blocking_conditions = [
        is_past_max_depth,
        is_cluster_start_obj,
        is_already_traversed,
    ]
    if not any(blocking_conditions):
        print "  No blocking conditions. Pressing on."
        self.already_traversed.append(root_authority.pk)
        for authority in root_authority.authorities.filter(
                docket__court='scotus',
                date_filed__gte=self.cluster_start.date_filed):
            g.add_edge(root_authority.pk, authority.pk)
            # Combine our present graph with the result of the next
            # recursion
            g = networkx.compose(g, self._traverse_all_nodes(
                authority,
                max_depth - 1,
            ))
    return g
def add_clusters(self):
    """Do the network analysis to add clusters to the model.

    Process is to:
     - Build a networkx graph
     - For all nodes in the graph, add them to self.clusters
    """
    self.already_traversed = []
    g = self._traverse_all_nodes(
        self.cluster_end,
        max_depth=6,
    )
    self.already_traversed = []
Check out:
How do I pass a variable by reference?
which contains an example of how to pass a list by reference. If you pass the list as an argument, every recursive call to your function will refer to the same list.
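Here is a minimal sketch of that idea applied to a traversal like yours (the node.pk and node.children attributes are placeholders, not your real model): give the function an optional visited argument and hand the same object down every recursive call, so no instance variable has to be set up and torn down around the first call:
import networkx

def traverse(node, max_depth=6, visited=None):
    if visited is None:      # first call only: create the shared list
        visited = []
    g = networkx.Graph()
    if max_depth <= 0 or node.pk in visited:
        return g
    visited.append(node.pk)
    for child in node.children:   # placeholder for the real authorities query
        g.add_edge(node.pk, child.pk)
        g = networkx.compose(g, traverse(child, max_depth - 1, visited))
    return g
Every recursive call receives the same list object, so nothing needs to be reset before or after the outermost call.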
I am looking for an algorithm to check for any valid connection (shortest or longest) between two arbitrary nodes on a graph.
My graph is fixed to a grid with logical (x, y) coordinates with north/south/east/west connections, but nodes can be removed randomly so you can't assume that taking the edge with coords closest to the target is always going to get you there.
The code is in python. The data structure is each node (object) has a list of connected nodes. The list elements are object refs, so we can then search that node's list of connected nodes recursively, like this:
for pnode in self.connected_nodes:
    for cnode in pnode.connected_nodes:
        ...etc
I've included a diagram showing how the nodes map to x,y coords and how they are connected north/east/south/west. Sometimes there are missing nodes (e.g. between J and K), and sometimes there are missing edges (e.g. between G and H). The presence of nodes and edges is in flux (although when we run the algorithm, it is taking a fixed snapshot in time), and can only be determined by checking each node for its list of connected nodes.
The algorithm needs to yield a simple true/false as to whether there is a valid connection between two nodes. Recursing through every list of connected nodes explodes the number of operations required - if the node is n edges away, it requires at most 4^n operations. My understanding is that something like Dijkstra's algorithm works by finding the shortest path based on edge weights, but if there is no connection at all, would it still work?
For some background, I am using this to model 2D destructible objects. Each node represents a chunk of the material, and if one or more nodes do not have a connection to the rest of the material then it should separate off. In the diagram - D, H, R - should pare off from the main body as they are not connected.
UPDATE:
Although many of the posted answers might well work, DFS is quick, easy and very appropriate. I'm not keen on the idea of sticking extra edges with high-value weights between nodes in order to use Dijkstra, because nodes themselves might disappear as well as edges. The SCC method seems more appropriate for distinguishing between strongly and weakly connected graph sections, which in my graph would only matter if there was a single edge between G and H.
Here is my experiment code for DFS search, which creates the same graph as shown in the diagram.
class node(object):
    def __init__(self, id):
        self.connected_nodes = []
        self.id = id

    def dfs_is_connected(self, node):
        # Initialise our stack and our discovered list
        stack = []
        discovered = []
        # Declare operations count to track how many iterations it took
        op_count = 0
        # Push this node to the stack, for our starting point
        stack.append(self)
        # Keep iterating while the stack isn't empty
        while stack:
            # Pop top element off the stack
            current_node = stack.pop()
            # Is this the droid/node you are looking for?
            if current_node.id == node.id:
                # Stop!
                return True, op_count
            # Check if current node has not been discovered
            if current_node not in discovered:
                # Increment op count
                op_count += 1
                # Put this node in the discovered list
                discovered.append(current_node)
                # Iterate through all connected nodes of the current node
                for connected_node in current_node.connected_nodes:
                    # Push this connected node into the stack
                    stack.append(connected_node)
        # Couldn't find the node, return false. Sorry bud
        return False, op_count
if __name__ == "__main__":
    # Initialise all nodes
    a = node('a')
    b = node('b')
    c = node('c')
    d = node('d')
    e = node('e')
    f = node('f')
    g = node('g')
    h = node('h')
    j = node('j')
    k = node('k')
    l = node('l')
    m = node('m')
    n = node('n')
    p = node('p')
    q = node('q')
    r = node('r')
    s = node('s')

    # Connect up nodes
    a.connected_nodes.extend([b, e])
    b.connected_nodes.extend([a, f, c])
    c.connected_nodes.extend([b, g])
    d.connected_nodes.extend([r])
    e.connected_nodes.extend([a, f, j])
    f.connected_nodes.extend([e, b, g])
    g.connected_nodes.extend([c, f, k])
    h.connected_nodes.extend([r])
    j.connected_nodes.extend([e, l])
    k.connected_nodes.extend([g, n])
    l.connected_nodes.extend([j, m, s])
    m.connected_nodes.extend([l, p, n])
    n.connected_nodes.extend([k, m, q])
    p.connected_nodes.extend([s, m, q])
    q.connected_nodes.extend([p, n])
    r.connected_nodes.extend([h, d])
    s.connected_nodes.extend([l, p])

    # Check if a is connected to q
    print a.dfs_is_connected(q)
    print a.dfs_is_connected(d)
    print p.dfs_is_connected(h)
To find this out, you just need to run a simple DFS or BFS algorithm from one of the nodes; it will find all reachable nodes within that connected component of the graph, so you just note whether you encountered the other node during the run of the algorithm.
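If the grid were loaded into a networkx graph, that check collapses to one call; a small sketch with a stand-in graph:
import networkx as nx

# Stand-in snapshot of the grid: h-r-d are cut off from a-b-c.
G = nx.Graph([('a', 'b'), ('b', 'c'), ('h', 'r'), ('r', 'd')])

print(nx.has_path(G, 'a', 'c'))   # True: the traversal from 'a' reaches 'c'
print(nx.has_path(G, 'a', 'h'))   # False: 'h' is in a separate component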
There is a way to use Dijkstra to find the path. If there is an edge between two nodes, give it a weight of 1; if there is no edge, give it a weight of sys.maxint. Then, when the min path is calculated, if it is larger than the number of nodes, there is no path between them.
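For completeness, a rough sketch of that trick, assuming the real grid is already in an undirected graph G; it builds a complete helper graph just for the query, so it is far more expensive than a plain DFS and shown only to illustrate the idea:
import sys
import itertools
import networkx as nx

def connected_via_dijkstra(G, source, target):
    # Existing edges get weight 1, every missing pair gets a huge weight.
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    for u, v in itertools.combinations(G.nodes(), 2):
        H.add_edge(u, v, weight=1 if G.has_edge(u, v) else sys.maxsize)
    # A real path uses at most n-1 unit edges; anything longer than n
    # must have crossed a "missing" edge, i.e. there is no connection.
    length = nx.dijkstra_path_length(H, source, target, weight='weight')
    return length < H.number_of_nodes()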
Another approach is to first find the strongly connected components of the graph. If the nodes are on the same strong component then use Dijkstra to find the path, otherwise there is no path that connects them.
You could take a look at the A* Path Finding Algorithm (which uses heuristics to make it more efficient than Dijkstra's, so if there isn't anything you can exploit in your problem, you might be better off using Dijkstra's algorithm. You would need positive weights though. If this is not something you have in your graph, you could simply give each edge a weight of 1).
Looking at the pseudo code on Wikipedia, A* moves from one node to another by getting the neighbours of the current node. Dijkstra's Algorithm keeps an adjacency list so that it knows which nodes are connected to each other.
Thus, if you were to start from node H, you could only go to R and D. Since these nodes are not connected to the others, the algorithm will not go through the other nodes.
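If you do go that route, networkx also ships an A* implementation; a sketch assuming the grid nodes are (x, y) tuples so a Manhattan-distance heuristic applies (the tiny graph below is hypothetical):
import networkx as nx

def manhattan(u, v):
    # Admissible heuristic on a 4-connected grid of (x, y) nodes.
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

G = nx.Graph([((0, 0), (1, 0)), ((1, 0), (1, 1)), ((3, 3), (3, 4))])

try:
    print(nx.astar_path(G, (0, 0), (1, 1), heuristic=manhattan))
except nx.NetworkXNoPath:
    print("no connection")   # a query like (0, 0) -> (3, 3) ends up here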
You can find the strongly connected components (SCC) of your graph and then check whether the nodes of interest are in the same component or not. In your example, H-R-D would be the first component and the rest the second, so for H and R the result would be true, but for H and A false.
See SCC algorithm here: https://class.coursera.org/algo-004/lecture/53.
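For an undirected grid like this one, plain connected components do the same job; a minimal networkx sketch (use nx.strongly_connected_components instead if the graph is directed):
import networkx as nx

G = nx.Graph([('h', 'r'), ('r', 'd'), ('a', 'b'), ('b', 'c'), ('c', 'g')])

# Label every node with its component once, then each query is O(1).
component_id = {}
for i, comp in enumerate(nx.connected_components(G)):
    for v in comp:
        component_id[v] = i

print(component_id['h'] == component_id['r'])   # True
print(component_id['h'] == component_id['a'])   # False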
I have been trying to build a graph for a project and I have been trying to identify newly added edges after populating it with more information.
For instance below you can see its first and second iteration:
---------------------- General Info Graph H-----------------------------
Total number of Nodes in Graph: 2364
Total number of Edges: 3151
---------------------- General Info Graph G -----------------------------
Total number of Nodes in Graph: 6035
Total number of Edges: 11245
The problem I have been facing is when I try to identify newly added edges using the code:
counter = 0
edges_all = list(G.edges_iter(data=True))
edges_before = list(H.edges_iter(data=True))
print "How many edges in old graph: ", len(edges_before)
print "How many edges in new graph: ", len(edges_all)
edge_not_found = []
for edge in edges_all:
    if edge in edges_before:
        counter += 1
    else:
        edge_not_found.append(edge)
print "Edges found: ", counter
print "Not found: ", len(edge_not_found)
And I have been getting these results:
How many edges in old graph: 3151
How many edges in new graph: 11245
Edges found: 1601
Not found: 9644
I can't understand why I am getting 1601 found instead of 11245-3151 = 8094
Any ideas?
Thank you!
TL/DR: There's a simple explanation for what you see, and if you get to the end, there is a much shorter way to write your code (with a lot of explanation along the way).
First note that it looks like Edges found is intended to be the number of edges that are in both H and G. So it should only have 3151, not 8094. 8094 should be Not found. Note that the number of edges found, 1601, is about half the number that you would expect. That makes sense because:
I believe the problem you are having is that when networkx lists out the edges an edge might appear as (a,b) in edges_before. However in edges_after, it might appear in the list as (b,a).
So (b,a) will not be in edges_before. It will fail your test. Assuming the edge orders aren't correlated between when they are listed for H and G, you'd expect to find about half of them pass. You can do a different test to see if (b,a) is an edge of H. This is H.has_edge(b,a)
A straightforward improvement:
for edge in edges_all:
    if H.has_edge(edge[0], edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)
This lets you avoid even defining edges_before.
You can also avoid defining edges_all through a better improvement:
for edge in G.edges_iter(data=True):
    if H.has_edge(edge[0], edge[1]):
        etc
Note: I've written it as H.has_edge(edge[0], edge[1]) to make clear what's happening. A more compact way to write it is H.has_edge(*edge[:2]); the * notation unpacks the tuple, and the [:2] keeps only the two endpoints, which matters because with data=True each edge is a 3-tuple (u, v, data).
Finally, using a list comprehension gives a better way to get edge_not_found:
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
This creates a list made up of edges which are in G but not in H.
Putting this all together (and using the .size() command to count edges in a network), we arrive at a cleaner version:
print "How many edges in old graph: ", H.size()
print "How many edges in new graph: ", G.size()
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
print "Not found: ", len(edge_not_found)
print "Edges found: ", G.size()-len(edge_not_found)
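One caveat: the snippets above use the networkx 1.x API (edges_iter and Python 2 print). On networkx 2.x or later the same idea would, as far as I can tell, look like this:
# networkx >= 2.x: edges_iter() is gone, but G.edges(data=True) is iterable.
edge_not_found = [edge for edge in G.edges(data=True) if not H.has_edge(*edge[:2])]
print("How many edges in old graph: ", H.size())
print("How many edges in new graph: ", G.size())
print("Not found: ", len(edge_not_found))
print("Edges found: ", G.size() - len(edge_not_found))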