iGraph: selecting vertices connected to a vertex - python

Suppose I have the following graph:
g = ig.Graph([(0,1), (0,2), (2,3), (3,4), (4,2), (2,5), (5,0), (6,3), (5,6)], directed=False)
g.vs["name"] = ["Alice", "Bob", "Claire", "Dennis", "Esther", "Frank", "George"]
and I wish to see who Bob is connected to. Bob is connected to only one person, Alice. However, if I try to find the edge:
g.es.select(_source=1)
>>> <igraph.EdgeSeq at 0x7f15ece78050>
I simply get the above response. How do I infer the vertex index from that? Or, if that isn't possible, how do I find the vertices connected to Bob?

This seems to work. The keyword arguments consist of the property, e.g. _source or _target, and an operator, e.g. eq (=). It also seems you need to check both the source and the target of the edges (even though it's an undirected graph). After filtering the edges, you can use a list comprehension to loop through them and extract the source or target:
connected_from_bob = [edge.target for edge in g.es.select(_source_eq=1)]
connected_to_bob = [edge.source for edge in g.es.select(_target_eq=1)]
connected_from_bob
# []
connected_to_bob
# [0]
The vertices connected with Bob are then the union of the two lists:
connected_with_bob = connected_from_bob + connected_to_bob
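Alternatively, igraph can answer this in one call: Graph.neighbors returns the indices of all vertices adjacent to a given vertex, regardless of which end of the edge it sits on. A minimal sketch using the graph above:
bob = g.vs.find(name="Bob")
print([g.vs[v]["name"] for v in g.neighbors(bob)])
# ['Alice']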

Related

Generate a bipartite structure in gephi

I have constructed a bipartite network in Python with networkx, like this:
import networkx as nx
from random import choice
from string import ascii_lowercase, digits
# Define the characters to choose from
chars = ascii_lowercase + digits
# Create two separate lists of 100 random strings each
lst = [''.join(choice(chars) for _ in range(12)) for _ in range(100)]
lst1 = [''.join(choice(chars) for _ in range(12)) for _ in range(100)]
# Create node labels for each list
List1 = [city for city in lst]
List2 = [city for city in lst1]
# Create the graph object
G = nx.Graph()
# Add nodes to the graph with different bipartite indices
G.add_nodes_from(List1, bipartite=0)
G.add_nodes_from(List2, bipartite=1)
# Add edges connecting nodes from the two lists
for i in range(len(lst)):
    G.add_edge(List1[i], List2[i])
# Save the graph to a file
nx.write_gexf(G, "bipartite_network.gexf")
and I want to export this to Gephi, which results in the following database:
This does not give me a bipartite structure (i.e. two separate sets of nodes connected via edges, namely the list under id connected to the list under Label). What is the right input to give Gephi in order to obtain the desired outcome?
Thank you
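One sanity check worth running before the export (a sketch I'm adding, assuming the graph G built above): networkx can confirm the graph really is bipartite and that the bipartite attribute splits the node sets as intended.
from networkx.algorithms import bipartite
print(nx.is_bipartite(G))  # expect True
top = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
print(bipartite.is_bipartite_node_set(G, top))  # expect True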

How do I compare two lists of pairs to see which pair combinations exist in both?

Here is a simple example of what I am trying to do. So with two lists of pairs, such as:
pairs1 = [(egg,dog),(apple,banana),(orange,chocolate),(elephant,gargoyle),(cat,lizard)]
pairs2 = [(cat,lizard),(ice,hamster),(elephant,giraffe),(apple,gargoyle),(dog,egg)]
I want to be able to retrieve the pair combinations that the two lists have in common. So for these two lists, the pairs retrieved would be (cat,lizard) and (dog,egg). The order of the elements within the pair doesn't matter, just the fact that the pair combination is within the same tuple.
Try:
pairs1 = [
    ("egg", "dog"),
    ("apple", "banana"),
    ("orange", "chocolate"),
    ("elephant", "gargoyle"),
    ("cat", "lizard"),
]
pairs2 = [
    ("cat", "lizard"),
    ("ice", "hamster"),
    ("elephant", "giraffe"),
    ("apple", "gargoyle"),
    ("dog", "egg"),
]
x = set(map(frozenset, pairs1)).intersection(map(frozenset, pairs2))
print(list(map(tuple, x)))
Prints:
[('lizard', 'cat'), ('egg', 'dog')]
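This works because frozenset is hashable and order-insensitive, so ('dog', 'egg') and ('egg', 'dog') compare equal. If you'd rather keep plain tuples (a frozenset would collapse a self-pair like ('dog', 'dog') down to a single element), sorting each pair into canonical order works the same way; a minimal alternative sketch:
common = {tuple(sorted(p)) for p in pairs1} & {tuple(sorted(p)) for p in pairs2}
print(common)
# {('cat', 'lizard'), ('dog', 'egg')}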

How to create a DAG from a list in python

I am using networkx to manually input the (u, v, weights) to a graph. But when the input gets bigger, this manual insertion of nodes and edges becomes a really tiresome task and prone to errors. I'm trying, but I haven't figured out how to perform this task without manual labour.
Sample Input:
my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]", "d3[0, 2]", "d4[1, 4]", "d5[2,
3]", "d6[1, 4]"]
Manual Insertion:
Before inserting nodes into a graph I need to number them, so the first occurrence of 's' or 'd' can be differentiated from later similar characters, e.g. s1, s2, s3, ... and d1, d2, d3, ...
I am aware it is similar to SSA form (from compilers), but I was not able to find anything helpful for my case.
Manually inserting (u, v, weight) tuples into a DiGraph():
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])
Question:
How do I automatically convert that input list (my_list) into a DAG (my_graph), avoiding manual insertion?
Complete Code:
This is what I have written so far.
import networkx as nx
from networkx.drawing.nx_agraph import write_dot, graphviz_layout
from matplotlib import pyplot as plt
my_graph = nx.DiGraph()
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0),
                                  ("d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])
write_dot(my_graph, "graph.dot")
plt.title("draw graph")
pos = graphviz_layout(my_graph, prog='dot')
nx.draw(my_graph, pos, with_labels=True, arrows=True)
plt.show()
plt.clf()
Explanation:
's' and 'd' are instructions that require 1 or 2 registers, respectively, to perform an operation.
In the above example we have 2 's' operations and 6 'd' operations, and there are five registers [0, 1, 2, 3, 4].
Each operation will perform some calculation and store the result in the relevant register(s).
From the input we can see that d1 uses registers 0 and 2, so it cannot operate until both of these registers are free. Therefore, d1 is dependent on s1, because s1 comes before d1 and is using register 0. As soon as s1 finishes, d1 can operate, as register 2 is already free.
E.g. we initialize all registers with 1. s1 doubles its input, while d1 sums two inputs and stores the result in its second register:
so after s1[0]: reg-0 * 2 -> 1 * 2 => reg-0 = 2
and after d1[0, 2]: reg-0 + reg-2 -> 2 + 1 => reg-0 = 2 and reg-2 = 3
Update 1: The graph will be a dependency graph based on some resources [0...4]; each node will require 1 (for 's') or 2 (for 'd') of these resources.
Update 2: Two questions were causing confusion, so I'm separating them. For now I have changed my input list, and there is only a single task of converting that list into a DAG. I have also included an explanation section.
PS: You might need to pip install graphviz if you don't already have it.
OK, now that I have a better idea of how the mapping works, it just comes down to describing the process in code: keep a mapping of which op is currently using which resource and, while iterating over the operations, generate an edge whenever an op uses a resource that a previous operation used. I think this is along the lines of what you are looking for:
import ast

class UniqueIdGenerator:
    def __init__(self, initial=1):
        self.auto_indexing = {}
        self.initial = initial

    def get_unique_name(self, name):
        "adds a number after the given string to ensure uniqueness."
        if name not in self.auto_indexing:
            self.auto_indexing[name] = self.initial
        unique_idx = self.auto_indexing[name]
        self.auto_indexing[name] += 1
        return f"{name}{unique_idx}"
def generate_DAG(source):
    """
    Takes an iterable of tuples in the format (name, list_of_resources), where
    - name doesn't have to be unique
    - list_of_resources is a list of resources in any hashable format (a list of numbers or strings is typical)
    Generates edges in the format (name1, name2, resource), where
    - name1 and name2 are unique-ified versions of the names in the input
    - resource is the value in the list of resources
    Each "edge" represents a handoff of a resource, so name1 and name2 use the same resource sequentially.
    """
    # format: {resource: name} for each resource currently in use.
    resources = {}
    g = UniqueIdGenerator()
    for (op, deps) in source:
        op = g.get_unique_name(op)
        for resource in deps:
            # for each resource this operation requires: if a previous operation used it,
            if resource in resources:
                # yield the new edge
                yield (resources[resource], op, resource)
            # either first use or we just yielded an edge; this op now owns the resource.
            resources[resource] = op
my_list = ["s[0]", "d[0, 2]", "s[0]", "d[1, 3]", "d[0, 2]", "d[1, 4]", "d[2, 3]", "d[1, 4]"]
data = generate_DAG((a[0], ast.literal_eval(a[1:])) for a in my_list)
print(*data, sep="\n")
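To get from these generated edges to an actual networkx DAG like the manually built one, you can pass them straight to add_weighted_edges_from, since each yielded (name1, name2, resource) triple already matches the (u, v, weight) format. A minimal sketch under that assumption:
import networkx as nx
my_graph = nx.DiGraph()
my_graph.add_weighted_edges_from(
    generate_DAG((a[0], ast.literal_eval(a[1:])) for a in my_list))
print(my_graph.edges(data=True))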

Construct a tree out of a list of strings

I have 400 lists that look like this:
[A, B, C, D, E]
[A, C, G, B, E]
[A, Z, B, D, E]
...
[A, B, R, D, E]
Each has a length of 5 items and starts with A.
I would like to construct a tree or directed acyclic graph (with counts as weights) where each level is the index of the item, i.e. A has edges to all possible items at the first index, those have edges to their children at the second index, and so on.
Is there an easy way to build this in networkx? What I thought of doing is to create all the tuples for each level, i.e. for level 0: (A,B), (A,C), (A,Z), etc., but I am not sure how to proceed from there.
If I understood you correctly, you can add each list as a path in a directed graph using nx.add_path.
l = [['A', 'B', 'C', 'D', 'E'],
     ['A', 'C', 'G', 'B', 'E'],
     ['A', 'Z', 'B', 'D', 'E'],
     ['A', 'B', 'R', 'D', 'E']]
Since you have nodes appearing across multiple levels, you should probably rename them according to their level, because you cannot have multiple nodes with the same name. One way could be:
l = [[f'{j}_level{lev}' for lev,j in enumerate(i, 1)] for i in l]
#[['A_level1', 'B_level2', 'C_level3', 'D_level4', 'E_level5'],
# ['A_level1', 'C_level2', 'G_level3', 'B_level4', 'E_level5'],
# ['A_level1', 'Z_level2', 'B_level3', 'D_level4', 'E_level5'],
# ['A_level1', 'B_level2', 'R_level3', 'D_level4', 'E_level5']]
And now construct the graph with:
G = nx.DiGraph()
for path in l:
    nx.add_path(G, path)
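The question also asks for counts as weights; the answer above doesn't cover that, but one way (an extension I'm assuming, not from the original answer) is to bump a weight attribute each time a path reuses an edge:
G = nx.DiGraph()
for path in l:
    for u, v in zip(path, path[1:]):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1  # edge seen again: count it
        else:
            G.add_edge(u, v, weight=1)  # first occurrence of this edge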
Then you could create a tree-like structure using graphviz's dot layout:
from networkx.drawing.nx_agraph import graphviz_layout
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos=pos,
        node_color='lightgreen',
        node_size=1500,
        with_labels=True,
        arrows=True)
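If pygraphviz isn't installed, networkx's own multipartite_layout gives a similar layered drawing; a sketch (my addition, reusing the level suffix encoded in each renamed node):
for node in G.nodes:
    G.nodes[node]["level"] = int(node.split("_level")[1])
pos = nx.multipartite_layout(G, subset_key="level", align="horizontal")
nx.draw(G, pos=pos, with_labels=True, arrows=True)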

Python: Compute all possible pairwise distances of a list (DTW)

I have a list of items like so: T = [T_0, T_1, ..., T_N], where each T_i is itself a time series. I want to find the pairwise distances (via DTW) for all potential pairs.
E.g. if T = [T_0, T_1, T_2] and I had a DTW function f, I would want to find f(T_0, T_1), f(T_0, T_2), f(T_1, T_2).
Note that each T_i actually looks like (id of i, [time series values]).
My code snippet looks like this:
# (imports inferred from the snippet's usage)
from collections import defaultdict
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean
from tqdm import tqdm

cluster = defaultdict( list )
donotcluster = defaultdict( list )
for i, lst1 in tqdm(enumerate(T)):
    for lst2 in tqdm(T):
        if lst2 in cluster[lst1[0]] or lst2 in donotcluster[lst1[0]]:
            pass
        else:
            distance, path = fastdtw(lst1[1], lst2[1], dist=euclidean)
            if distance <= distance_threshold:
                cluster[lst1[0]] += [ lst2 ]
                cluster[lst2[0]] += [ lst1 ]
            else:
                donotcluster[lst1[0]] += [ lst2 ]
                donotcluster[lst2[0]] += [ lst1 ]
Right now I have around 20,000 time series and this takes way too long (it would run for about 5 days). I am using the Python library fastdtw. Is there a more optimised library? Or just a better/faster way of computing all possible distances? Since distances are symmetric, I don't have to calculate, for example, f(T_41, T_33) if I have already calculated f(T_33, T_41).
I would recommend keeping a set of all of the pairs you've done so far, keeping in mind that a set has constant-time lookup. Besides that, you should consider other approaches where you don't extend your lists so often (that nasty += you're doing), since it can be rather expensive. I don't know enough of the implementation of your application to comment on that, though. If you provide more information, I may be able to figure out a way to get rid of some of the += that you don't need. One idea (for efficiency) would be to append each list to a list of lists, and then flatten it at the end of your script with something like:
[i for x in cluster[lst[0]] for i in x]
I modified your code as follows:
cluster = defaultdict( list )
donotcluster = defaultdict( list )
seen = set() # added this

def hashPair( a, b ): # added this; defined before the loop so it is in scope
    return ','.join([str(max(a, b)), str(min(a, b))])

for i, lst1 in tqdm(enumerate(T)):
    for lst2 in tqdm(T):
        if hashPair( lst1[1], lst2[1] ) not in seen and lst2 not in cluster[lst1[0]] and lst2 not in donotcluster[lst1[0]]: # changed around your condition
            seen.add( hashPair( lst1[1], lst2[1] ) ) # added this
            distance, path = fastdtw(lst1[1], lst2[1], dist=euclidean)
            if distance <= distance_threshold:
                cluster[lst1[0]] += [ lst2 ]
                cluster[lst2[0]] += [ lst1 ]
            else:
                donotcluster[lst1[0]] += [ lst2 ]
                donotcluster[lst2[0]] += [ lst1 ]
I cannot answer your question about whether there is a more optimised DTW library, but you can use itertools to get the combinations you want without duplicates:
import itertools
for combination in itertools.combinations(T, 2):
    f(combination[0], combination[1])
Here is an example of the combinations:
('T_1', 'T_2')
('T_1', 'T_3')
('T_1', 'T_4')
('T_1', 'T_5')
('T_2', 'T_3')
('T_2', 'T_4')
('T_2', 'T_5')
('T_3', 'T_4')
('T_3', 'T_5')
('T_4', 'T_5')
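Putting the two answers together (a sketch, assuming each T_i is an (id, values) tuple and that fastdtw, euclidean, and distance_threshold are defined as in the question): iterating over itertools.combinations visits each unordered pair exactly once, so the seen set and the membership checks become unnecessary.
from collections import defaultdict
from itertools import combinations

cluster = defaultdict(list)
donotcluster = defaultdict(list)
for (id1, series1), (id2, series2) in combinations(T, 2):
    # each unordered pair is visited exactly once, self-pairs never
    distance, path = fastdtw(series1, series2, dist=euclidean)
    if distance <= distance_threshold:
        cluster[id1].append((id2, series2))
        cluster[id2].append((id1, series1))
    else:
        donotcluster[id1].append((id2, series2))
        donotcluster[id2].append((id1, series1))
This halves the number of fastdtw calls compared with the double loop and skips self-comparisons entirely.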
