I am working on a graph flow model in the context of transport networks. I have the position of sensors (lat/lon) and would like to associate these sensors with nodes on a graph retrieved using osmnx.
At present, I use get_nearest_node to map a sensor to a node. However, this isn't optimal, as I'm at the mercy of the cartographer -- straight roads will have fewer nodes, and so the mean displacement (and therefore error) will be higher, even when dealing with unsimplified graphs. I had considered using get_nearest_edge, but I'd still need to edit the graph to insert a new node at the position of the sensor.
Instead, I thought a reasonable way of achieving this would be to upsample the graph (perhaps using redistribute_vertices), applying get_nearest_node, and then re-simplifying the graph, but somehow whitelisting the node that is now associated with a sensor to prevent it from being removed.
However, it's not clear to me how to go from the output of redistribute_vertices to a graph -- it returns a LineString or MultiLineString rather than a new graph.
I saw this question posted on the osmnx GitHub project: https://github.com/gboeing/osmnx/issues/304, in which a GeoDataFrame is generated, with a new column containing the redistributed way as a (Multi)LineString. However, I'm not sure how I can map this new gdf back to a Graph -- the corresponding node dataframe hasn't been updated, and u and v values remain the same in the new edges table.
Any pointers (including telling me I'm going about this the wrong way and should be using function XYZ) would be really appreciated.
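One possible alternative to upsampling the whole graph is to split only the edge nearest to each sensor. The sketch below is not an osmnx function: insert_node_on_edge is a hypothetical helper, and it assumes a projected (planar) MultiDiGraph whose nodes carry x/y attributes and whose edges may carry a shapely geometry attribute. The edge (u, v, key) would come from a nearest-edge lookup such as the get_nearest_edge mentioned above.

```python
import networkx as nx
from shapely.geometry import LineString, Point
from shapely.ops import substring


def insert_node_on_edge(G, u, v, key, point, new_node):
    """Split edge (u, v, key) at the position on it closest to `point`.

    Hypothetical helper: assumes planar coordinates, so distances along
    the line are in the same units as the node x/y attributes.
    """
    data = G.edges[u, v, key]
    if "geometry" in data:
        line = data["geometry"]
    else:
        # Edges without a stored geometry are treated as straight segments
        line = LineString([(G.nodes[u]["x"], G.nodes[u]["y"]),
                           (G.nodes[v]["x"], G.nodes[v]["y"])])
    dist = line.project(point)        # distance along the line to the snap point
    snapped = line.interpolate(dist)  # coordinates of the snap point
    # Copy the remaining attributes (highway, name, ...) to both halves
    attrs = {k: val for k, val in data.items() if k not in ("geometry", "length")}
    G.add_node(new_node, x=snapped.x, y=snapped.y)
    G.add_edge(u, new_node, geometry=substring(line, 0, dist),
               length=dist, **attrs)
    G.add_edge(new_node, v, geometry=substring(line, dist, line.length),
               length=line.length - dist, **attrs)
    G.remove_edge(u, v, key)
    return new_node


# Toy example: one straight 10-unit edge, sensor slightly off the road
G = nx.MultiDiGraph()
G.add_node(1, x=0.0, y=0.0)
G.add_node(2, x=10.0, y=0.0)
G.add_edge(1, 2, key=0, length=10.0, highway="residential")
insert_node_on_edge(G, 1, 2, 0, Point(4.0, 3.0), "sensor")
```

This only touches the edges that actually have a sensor, so no re-simplification or whitelisting is needed. A sensor that snaps exactly onto an endpoint would produce a zero-length half, so that case would need separate handling, and on a two-way street the reverse edge (v, u) would have to be split as well.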
Related
I am looking for a way to create a graph/flowchart in Python that will allow me to create nodes as boxes of multiple values of data (ideally separated into different lines of information such as shown in one of the boxes in the picture). So far, I have tried using ETE3 which had the graph and node structure I am looking for but wouldn't allow me to connect two parents to one child. I have also looked into networkx some but there doesn't seem to be much flexibility with the structure of the nodes. Here is a reference to help better understand my goal:
I'm just starting out on my Python graphs adventure and have just run into a conceptual problem. My data consists of activities that have happened over time: activity, timestamp and some additional data. I want to create a graph that shows consecutive steps, so I join the data on a time condition. My ideal graph would show A->B->C->D, but because A is always the initial step, I get A->B (which is fine) plus the redundant edges A->C and A->D. So my question is: do you know any smart way to prune those edges, leaving only the important ones? I was thinking there might be some function in networkx for this. Please help.
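One candidate for this kind of pruning is the transitive reduction, which removes every edge that is already implied by a longer path. The sketch below uses networkx's transitive_reduction on a toy graph shaped like the one described; it assumes the graph is a directed acyclic graph, which holds when edges always point forward in time.

```python
import networkx as nx

# Toy version of the joined data: A precedes everything, so it also
# gets the redundant shortcut edges A->C and A->D
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("A", "D")])

# Keep only edges that are not implied by a longer path
pruned = nx.transitive_reduction(G)

print(sorted(pruned.edges()))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

The returned graph does not carry over node or edge attributes, so any attributes that matter would need to be copied across from the original graph afterwards.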
I have multiple node and edge lists which together form a large graph; let's call that the maingraph. My current strategy is to first read all the node lists and import them with add_vertices. Every node then gets an internal ID that depends on the order in which the nodes are ingested and therefore isn't very reliable (as I've read, if you delete a node, all IDs higher than the deleted one change). I assign every node a name attribute, which corresponds to the external ID I use so I can keep track of my nodes between frameworks, and a type attribute.
Now, how do I add the edges? When I read an edgelist it will start making a new graph (subgraph) and hence starts the internal ID at 0. Therefore, "merging" the graphs with maingraph.add_edges(subgraph.get_edgelist) inevitably fails.
It is possible to work around this and use the name attribute from both maingraph and subgraph to find out which internal ID each edge's incident nodes have in the maingraph:
def _get_real_source_and_target_id(edge):
    '''Take an edge from the to-be-added subgraph and get the ids of the corresponding nodes in the
    maingraph by their name.'''
    source_id = maingraph.vs.select(name_eq=subgraph.vs[edge[0]]["name"])[0].index
    target_id = maingraph.vs.select(name_eq=subgraph.vs[edge[1]]["name"])[0].index
    return (source_id, target_id)
And then I tried
edgelist = [_get_real_source_and_target_id(x) for x in subgraph.get_edgelist()]
maingraph.add_edges(edgelist)
But that is horribly slow. The graph has millions of nodes and edges, and it takes 10 seconds to load with the fast but incorrect maingraph.add_edges(subgraph.get_edgelist) approach. With the correct approach explained above, it takes minutes (I usually stop it after 5 minutes or so). I will have to do this tens of thousands of times. I switched from NetworkX to igraph because of the fast loading, but it doesn't really help if I have to do it like this.
Does anybody have a more clever way to do this? Any help much appreciated!
Thanks!
Never mind, I figured out that the mistake was elsewhere. I used numpy.loadtxt() to read the node names as strings, which somehow did funny stuff when the names were incrementing numbers with more than five figures (see my issue report here). As a result, the above solution got stuck when it tried to look up the nodes whose names numpy had messed up: maingraph.vs.select(name_eq=subgraph.vs[edge[0]]["name"])[0].index simply sat there when it couldn't find the node. Now I use pandas to read the node names and it works fine.
The solution above is still ~10x faster than my previous NetworkX solution, so I will just leave it here in case it helps someone. Feel free to delete it otherwise.
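For anyone who still finds the per-edge select above too slow: each select call has to search the vertex sequence, whereas a name-to-index dictionary built once gives constant-time lookups. The following is a minimal sketch in plain Python; remap_edges is a hypothetical helper, not part of igraph, and it assumes node names are unique within the maingraph.

```python
def remap_edges(main_names, sub_names, sub_edges):
    """Translate subgraph edge endpoints into maingraph vertex indices.

    main_names: vertex names of the maingraph, in index order
    sub_names:  vertex names of the subgraph, in index order
    sub_edges:  (source, target) index pairs from the subgraph
    """
    # Built once: maps each external name to its maingraph index
    name_to_main_id = {name: idx for idx, name in enumerate(main_names)}
    # A name missing from the maingraph raises KeyError instead of hanging
    return [(name_to_main_id[sub_names[s]], name_to_main_id[sub_names[t]])
            for s, t in sub_edges]


# Toy example: subgraph vertex 0 is "c" and vertex 1 is "a"
edges = remap_edges(["a", "b", "c", "d"], ["c", "a"], [(0, 1)])
```

With igraph this could be called as maingraph.add_edges(remap_edges(maingraph.vs["name"], subgraph.vs["name"], subgraph.get_edgelist())). When the same maingraph receives many subgraphs, the dictionary could be built once outside the helper and reused.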
I want to do some calculations on a graph where each link and node has some state variables that are updated (from one time step to another) by considering the states of all connected elements in the previous time step.
In pseudocode this would be something like:
for node in graph:
    state(node, t+1) = f(state(node, t), states(links(node), t))
for link in graph:
    state(link, t+1) = f(state(start_node(link), t), state(end_node(link), t))
with f being some function. Are the data structures in networkx appropriate for something like this or would an implementation be very inefficient? I want to do this with graphs that have up to 50000 edges.
A google search came up empty, any hints would be appreciated.
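As a rough illustration of how the pseudocode could map onto networkx, the sketch below stores each state in a "state" attribute and computes all new values from the previous time step before writing any of them back. The step function, the attribute name, and the two toy update rules are assumptions made for the example, not anything prescribed by networkx.

```python
import networkx as nx


def step(G, f_node, f_link):
    """Advance all node and link states by one synchronous time step."""
    # Compute every new value from the old states first
    new_node = {
        n: f_node(G.nodes[n]["state"], [G.edges[n, m]["state"] for m in G[n]])
        for n in G
    }
    new_link = {
        (u, v): f_link(G.nodes[u]["state"], G.nodes[v]["state"])
        for u, v in G.edges
    }
    # Only then overwrite, so no update sees a half-updated graph
    nx.set_node_attributes(G, new_node, "state")
    nx.set_edge_attributes(G, new_link, "state")


# Toy example on a 3-node path: 0 - 1 - 2
G = nx.path_graph(3)
nx.set_node_attributes(G, {0: 1.0, 1: 2.0, 2: 3.0}, "state")
nx.set_edge_attributes(G, 0.0, "state")

for _ in range(2):
    step(G,
         f_node=lambda s, links: s + sum(links),  # hypothetical rule
         f_link=lambda a, b: (a + b) / 2)         # hypothetical rule
```

Each step visits every node and link once, so the cost grows linearly with the size of the graph. If f is a simple arithmetic rule, keeping the states in NumPy arrays indexed by node and edge number is another option worth comparing.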
Very briefly, two or three basic questions about the minimize_nested_blockmodel_dl function in the graph-tool library. Is there a way to figure out which vertex falls into which block? In other words, to extract a list for each block, containing the labels of its vertices.
The hierarchical visualization is rather difficult to understand for amateurs in network theory, e.g. are the squares with directed edges that are drawn meant to indicate the main direction of the underlying edges between the two blocks under consideration? The blocks are nicely shown using different colors, but on a very conceptual level, which types of patterns or edge/vertex properties are behind the block categorization of vertices? In other words, when two vertices are in the same block, what can I say about their common properties?
Regarding your first question, it is fairly straightforward: The minimize_nested_blockmodel_dl() function returns a NestedBlockState object:
from graph_tool.all import collection, minimize_nested_blockmodel_dl
g = collection.data["football"]
state = minimize_nested_blockmodel_dl(g)
You can query the group membership of the nodes by inspecting the first level of the hierarchy:
lstate = state.levels[0]
This is a BlockState object, from which we get the group memberships via the get_blocks() method:
b = lstate.get_blocks()
print(b[30]) # prints the group membership of node 30
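To get one list of vertices per block, as asked, the membership values can be grouped in plain Python. This is a small sketch: vertices_by_block is a hypothetical helper, and it assumes the labels are passed in vertex-index order, for example via the property map's array view b.a on an unfiltered graph.

```python
from collections import defaultdict


def vertices_by_block(labels):
    """Group vertex indices by block label.

    labels: sequence where labels[i] is the block of vertex i
    """
    groups = defaultdict(list)
    for vertex, block in enumerate(labels):
        groups[int(block)].append(vertex)
    return dict(groups)


# Toy example: vertices 0 and 2 share block 2
groups = vertices_by_block([2, 0, 2, 1])
```

With the objects above, this would be called as vertices_by_block(b.a).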
Regarding your second question, the stochastic block model assumes that nodes that belong to the same group have the same probability of connecting to the rest of the network. Hence, nodes that get classified in the same group by the function above have similar connectivity patterns. For example, if we look at the fit for the football network:
state.draw(output="football.png")
We see that nodes that belong to the same group tend to have more connections to other nodes of the same group --- a typical example of community structure. However, this is just one of the many possibilities that can be uncovered by the stochastic block model. Other topological patterns include core-periphery organization, bipartiteness, etc.