I am trying to generate a truth table using pandas in Python.
I have been given a Boolean network with 3 external nodes (U1, U2, U3) and 6 internal nodes (v1, v2, v3, v4, v5, v6).
I have created a table with all the possible combinations of the 3 external nodes, which is 2^3 = 8.
import pandas as pd
import itertools
in_comb = list(itertools.product([0,1], repeat = 3))
df = pd.DataFrame(in_comb)
df.columns = ['U1','U2','U3']
df.index += 1
   U1  U2  U3
1   0   0   0
2   0   0   1
3   0   1   0
4   0   1   1
5   1   0   0
6   1   0   1
7   1   1   0
8   1   1   1
And I have also created the same table but with all the possible combinations of the 6 internal nodes, which is 2^6 = 64 combinations.
The functions for each node were also given:
v1(t+1) = U1(t)
v2(t+1) = v1(t) and U2(t)
v3(t+1) = v2(t) and v5(t)
v4(t+1) = v3(t)
v5(t+1) = not U3(t)
v6(t+1) = v5(t) or v3(t)
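One way these update rules can be computed is column-wise in pandas with bitwise operators. This is only a sketch, taking v4(t+1) = v3(t) for the fourth node, and using hypothetical `*_next` column names:

```python
import itertools
import pandas as pd

# enumerate all 2^9 = 512 combinations of 3 external inputs + 6 internal state bits
rows = list(itertools.product([0, 1], repeat=9))
cols = ['U1', 'U2', 'U3', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
df = pd.DataFrame(rows, columns=cols)

# each update rule becomes one vectorized bitwise expression over whole columns
df['v1_next'] = df['U1']
df['v2_next'] = df['v1'] & df['U2']
df['v3_next'] = df['v2'] & df['v5']
df['v4_next'] = df['v3']          # assumption: v4(t+1) = v3(t)
df['v5_next'] = 1 - df['U3']      # logical NOT on a 0/1 column
df['v6_next'] = df['v5'] | df['v3']
```

Each row of `df` then holds one (input, state) combination together with the resulting next state.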
The truth table has to be built with pandas, and it has to show the resulting state for every possible combination. For example:
v1 v2 v3 v4 v5 v6

U = [0 0 0]:
v(t) = 000000  ->  v(t+1) = 000010
v(t) = 000001  ->  v(t+1) = 000010
v(t) = 000010  ->  v(t+1) = 000011
...
The table above is an example of how the end product should look, where [0 0 0] is the first combination of the external nodes.
I am confused as to how to compute the functions of each node and how to filter the data to end up with a new table like the one above.
Here I attach an image of the problem I want to solve:
What you seem to have missed is the fact that you don't only have 3 inputs to your network, as the "old state" is also considered an input: that's what a feedback combinational network does, it turns the old state + input into a new state (and often an output).
This means that you have 3+6 = 9 inputs, for 2^9 = 512 combinations. Not very easy to read when printed, but still possible. I modified your code to print this (beware that I'm quite new to pandas, so this code can definitely be improved):
import pandas as pd
import itertools
# list of (u, v) pairs (3 and 6 elements)
# uses bools instead of ints
inputs = list((row[0:3], row[3:]) for row in itertools.product([False, True], repeat=9))

def new_state(u, v):
    # implement the internal nodes
    return (
        u[0],
        v[0] and u[1],
        v[1] and v[4],
        v[2],
        not u[2],
        v[4] or v[2]
    )

new_states = list(new_state(u, v) for u, v in inputs)
# unzip inputs to (u, v), add new_states
raw_rows = zip(*zip(*inputs), new_states)

def format_boolvec(v):
    """Format a tuple of bools like (False, False, True, False) into a string like "0010" """
    return "".join('1' if b else '0' for b in v)

formatted_rows = list(map(lambda row: list(map(format_boolvec, row)), raw_rows))
df = pd.DataFrame(formatted_rows)
df.columns = ['U', "v(t)", "v(t+1)"]
df.index += 1
df
The heart of it is the function new_state, which takes the (u, v) pair of input and old state and produces the resulting new state. It's a direct translation of your specification.
I modified your itertools.product line to use bools, produce length-9 results, and split them into 3+6 length tuples. To still print in your format, I added the format_boolvec(v) function. Other than that, it should be very easy to follow, but feel free to comment if you need more explanation.
To find an input sequence from a given start state to a given end state, you could do it yourself by hand, but it's tedious. I recommend using a graph algorithm, which is easy to implement since we also know the length of the desired path, so we don't need any fancy algorithms like Bellman-Ford or Dijkstra's: we just generate all length-3 paths and filter for the endpoint.
# to find desired inputs
# treat each state as a node in a graph
# (think of visual graph transition diagrams)
# and add edges between them labeled with the inputs
# find all l=3 paths, and check the end nodes
nodes = {format_boolvec(prod): {} for prod in itertools.product([False, True], repeat=6)}
for index, row in df.iterrows():
    nodes[row['v(t)']][row['U']] = row['v(t+1)']
# we have now built the graph; we only need to find a path from start state to end state

def prefix_paths(prefix, paths):
    # aux helper function for all_length_n_paths
    for path, endp in paths:
        yield ([prefix] + path, endp)

def all_length_n_paths(graph, start_node, n):
    """Return all length n paths from a given starting point.
    Yield tuples (path, endpoint) where path is a list of strings of the inputs, and endpoint is the end of the path.
    Uses internal recursion to generate paths."""
    if n == 0:
        yield ([], start_node)
        return
    for inp, nextstate in graph[start_node].items():
        yield from prefix_paths(inp, all_length_n_paths(graph, nextstate, n - 1))

# just iterate over all length=3 paths starting at 101100 and print any that end up at 011001
for path, end in all_length_n_paths(nodes, "101100", 3):
    if end == "011001":
        print(path)
This code should also be easy to follow, except maybe the iterator syntax.
The result is not just one path, but 3 different ones:
['100', '110', '011']
['101', '110', '011']
['111', '110', '011']
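As a quick sanity check, one of these paths can be replayed through the update rules directly. A small sketch using 0/1 ints, where the `step` helper is a hypothetical int version of new_state (taking v4(t+1) = v3(t) as in that function):

```python
# replay an input sequence through the network's update rules
def step(u, v):
    # u = (U1, U2, U3), v = (v1, ..., v6), all 0/1 ints
    return (u[0], v[0] & u[1], v[1] & v[4], v[2], 1 - u[2], v[4] | v[2])

state = (1, 0, 1, 1, 0, 0)               # start state 101100
for bits in ("100", "110", "011"):       # first path found above
    state = step(tuple(int(b) for b in bits), state)

print("".join(map(str, state)))          # -> 011001, the target state
```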
I am trying to determine the chance of homophily, then the homophily, of a dataset having nodes as keys and colors as values.
Example:
Node  Target  Colors
A     N       1
N     A       0
A     D       1
D     A       1
C     X       1
X     C       0
S     D       0
D     S       1
B             0
R     N       2
N     R       2
Colors are associated with the Node column and span from 0 to 2 (int).
The steps for calculating the chance of homophily on a characteristic z (in my case Color) are illustrated as follows:
c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')
print("\nChance of same color:", round(chance_homophily(c_list),2))
where chance_homophily is defined as follows:
from collections import Counter

# The function below takes a dictionary with characteristics as keys and the frequency of their occurrence as values.
# Then it computes the chance homophily for that characteristic (color)
def chance_homophily(dataset):
    freq_dict = Counter([tuple(x) for x in dataset.values()])
    df_freq_counter = freq_dict
    c_list = list(df_freq_counter.values())
    chance_homophily = 0
    for class_count in c_list:
        chance_homophily += (class_count / sum(c_list)) ** 2
    return chance_homophily
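To make concrete what chance homophily measures (the probability that two independently drawn nodes share the characteristic), here is a minimal self-contained sketch on hypothetical toy data:

```python
from collections import Counter

def chance_homophily_sketch(chars):
    # sum of squared class frequencies: P(two random draws share the value)
    counts = Counter(chars.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

# hypothetical toy data: two nodes colored 1, one node colored 0
p = chance_homophily_sketch({'A': 1, 'B': 1, 'C': 0})
# expected: (2/3)^2 + (1/3)^2 = 5/9
```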
Then the homophily is calculated as follows:
def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)
G should be built from my dataset above (so taking into account both node and target columns).
I am not totally familiar with this network property, but I think I have missed something in the implementation (e.g., is it correctly taking account of relationships among nodes in the network?). In another example (with a different dataset) found on the web:
https://campus.datacamp.com/courses/using-python-for-research/case-study-6-social-network-analysis?ex=1
the characteristic is also color (though there it is a string, while I have a numeric variable). I do not know if they take relationships among nodes into consideration, maybe using an adjacency matrix: that part has not been implemented in my code, where I am using
G = nx.from_pandas_edgelist(df, source='Node', target='Target')
Your code works perfectly fine. The only thing you are missing is the IDs dict, which would map the names of your nodes to the names of the nodes in the graph G. By creating the graph from a pandas edgelist, you are already naming your nodes as they appear in the data.
This renders the use of the IDs dict unnecessary. Check out the example below, one time without the IDs dict and one time with a trivial dict to use the original function:
import networkx as nx
import pandas as pd
from collections import Counter
df = pd.DataFrame({"Node":  ["A","N","A","D","C","X","S","D","B","R","N"],
                   "Target":["N","A","D","A","X","C","D","S","","N","R"],
                   "Colors":[1,0,1,1,1,0,0,1,0,2,2]})
c_list = df[['Node','Colors']].set_index('Node').T.to_dict('list')
G = nx.from_pandas_edgelist(df, source='Node', target='Target')

def homophily_without_ids(G, chars):
    """
    Given a network G and a dict of characteristics chars for its nodes,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if n1 in chars and n2 in chars:
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[n1] == chars[n2]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)

print(homophily_without_ids(G, c_list))

# create node ids map - trivial in this case
nodes_ids = {i: i for i in G.nodes()}

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)

print(homophily(G, c_list, nodes_ids))
I have a simple graph with 4 nodes A,B,C,D as well as the following edges:
[A,B]
[B,D]
[B,C]
I want to find paths that start at the node C given a certain length n. For example:
for n = 1 I will only have [C] as a possible path. Result is 1
for n = 2 we only have [C,B]. Result is 1
for n = 3 we have [C,B,C] , [C,B,D], [C,B,A]. Result is 3
etc.
I have written the following (python) code:
dg = {'A': ['B'],
      'B': ['C', 'D', 'A'],
      'D': ['B'],
      'C': ['B']}
beg = ['C']

def makePath(n):
    count = 0
    curArr = beg
    for i in range(n):
        count = len(curArr)
        tmp = []
        for i in curArr:
            tmp.extend(dg[i])
        curArr = tmp
    return count
However, it gets extremely slow above n=12. Is there a better algorithm to solve this, and more importantly, one that can be generalized to any undirected graph (i.e. with up to 20 nodes)?
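Since only the number of paths is needed, one option is to track how many length-i paths end at each node instead of materializing every path; each step then costs O(|E|) regardless of how many paths exist. A sketch (count_paths is a hypothetical helper name):

```python
# DP over path counts: counts[v] = number of length-i paths from start ending at v
def count_paths(graph, start, n):
    counts = {v: 0 for v in graph}
    counts[start] = 1
    for _ in range(n - 1):                # a length-n path has n-1 steps
        nxt = {v: 0 for v in graph}
        for v, c in counts.items():
            for w in graph[v]:
                nxt[w] += c               # extend every path ending at v by one edge
        counts = nxt
    return sum(counts.values())

dg = {'A': ['B'], 'B': ['C', 'D', 'A'], 'D': ['B'], 'C': ['B']}
print(count_paths(dg, 'C', 3))  # -> 3, matching [C,B,C], [C,B,D], [C,B,A]
```

Because the work per step is bounded by the edge count, this stays fast well beyond n=12 even on a 20-node graph.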
Hi, I have a big text file with a format like this:
1 3 1
2 3 -1
5 7 1
6 1 -1
3 2 -1
The first column is the starting node, the second column is the ending node, and the third column shows the sign between the two nodes. So I have positive and negative signs.
I'm reading the graph with the code below:
G = nx.Graph()
G = nx.read_edgelist('network.txt', delimiter='\t', nodetype=int, data=(('weight', int),))
print(nx.info(G))
I also found a function to find the neighbors of a specific node:
list1 = list(G.neighbors(1))
So I have a list with the adjacent nodes of node 1. How can I find the sign between node 1 and each adjacent node? (For example, the edge between 1-3 has sign 1, the edge 1-5 has sign -1, etc.)
An example for node 1:
n_from = 1
for n_to in G.neighbors(n_from):
    sign = G[n_from][n_to]['weight']
    print('edge from {} to {} has sign {}'.format(
        n_from, n_to, sign))
which prints, for the example input you gave:
edge from 1 to 3 has sign 1
edge from 1 to 6 has sign -1
A similar approach, treating G[n_from] as a dict:
n_from = 1
for n_to, e_data in G[n_from].items():
    sign = e_data['weight']
    # then print
You can alternatively use Graph.get_edge_data, as such:
e_data = G.get_edge_data(n_from, n_to)
sign = e_data.get('weight')
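If you want all signs at once rather than per node, you can also iterate the whole edge list with its data. A small self-contained sketch on a toy graph mirroring the example input:

```python
import networkx as nx

# toy graph with the same signed-edge structure as the example
G = nx.Graph()
G.add_edge(1, 3, weight=1)
G.add_edge(1, 6, weight=-1)

# collect every edge's sign in one pass
signs = {(u, v): d['weight'] for u, v, d in G.edges(data=True)}
print(signs)
```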
I want to group some rows in an RDD by key so I can perform more advanced operations on the rows within one group. Please note, I do not want to merely calculate some aggregate values. The rows are key-value pairs, where the key is a GUID and the value is a complex object.
As per the pyspark documentation, I first tried to implement this with combineByKey, as it is supposed to be more performant than groupByKey. The list at the beginning is just for illustration, not my real data:
l = list(range(1000))
numbers = sc.parallelize(l)
rdd = numbers.map(lambda x: (x % 5, x))

def f_init(initial_value):
    return [initial_value]

def f_merge(current_merged, new_value):
    if current_merged is None:
        current_merged = []
    return current_merged.append(new_value)

def f_combine(merged1, merged2):
    if merged1 is None:
        merged1 = []
    if merged2 is None:
        merged2 = []
    return merged1 + merged2

combined_by_key = rdd.combineByKey(f_init, f_merge, f_combine)
c = combined_by_key.collectAsMap()
i = 0
for k, v in c.items():
    if v is None:
        print(i, k, 'value is None.')
    else:
        print(i, k, len(v))
    i += 1
The output of this is:
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
Which is not what I expected. The same logic but implemented with groupByKey returns a correct output:
grouped_by_key = rdd.groupByKey()
d = grouped_by_key.collectAsMap()
i = 0
for k, v in d.items():
    if v is None:
        print(i, k, 'value is None.')
    else:
        print(i, k, len(v))
    i += 1
Returns:
0 0 200
1 1 200
2 2 200
3 3 200
4 4 200
So unless I'm missing something, this is a case where groupByKey is preferred over reduceByKey or combineByKey (the topic of the related discussion: Is groupByKey ever preferred over reduceByKey).
It is a case where understanding basic APIs is preferred. In particular, if you check the list.append docstring:
?list.append
## Docstring: L.append(object) -> None -- append object to end
## Type: method_descriptor
you'll see that, like the other mutating methods in the Python API, it by convention doesn't return the modified object. This means that f_merge always returns None and there is no accumulation whatsoever.
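A corrected f_merge mutates the accumulator and then returns it explicitly. Sketched below without Spark, simulating the per-partition accumulation combineByKey would do for one key:

```python
def f_init(v):
    return [v]

def f_merge(acc, v):
    acc.append(v)      # list.append returns None; mutate in place...
    return acc         # ...then return the accumulator explicitly

def f_combine(a, b):
    return a + b

# simulate combineByKey's merge sequence for a single key's values
values = [0, 5, 10, 15]
acc = f_init(values[0])
for v in values[1:]:
    acc = f_merge(acc, v)
print(acc)  # -> [0, 5, 10, 15]
```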
That being said, for most problems there are much more efficient solutions than groupByKey, but rewriting it with combineByKey (or aggregateByKey) is never one of them.
I am working on implementing the strongly connected components algorithm from an input file of numbers. I know the algorithm for doing this, but I'm having a hard time implementing it in Python.
STRONGLY-CONNECTED-COMPONENTS(G)
1. run DFS on G to compute finish times
2. compute G'
3. run DFS on G', but when selecting which node to vist do so
in order of decreasing finish times (as computed in step 1)
4. output the vertices of each tree in the depth-first forest
of step 3 as a separate strongly connected component
The file looks like this:
5 5
1 2
2 4
2 3
3 4
4 5
The first line is the number of nodes and edges. The rest of the lines are two integers u and v separated by a space, which means a directed edge from node u to node v. The output is to be the strongly connected components and the number of these components.
DFS(G)
1 for each vertex u in G.V
2 u.color = WHITE
3 u.π = NIL
4 time = 0
5 for each vertex u in G.V
6 if u.color == WHITE
7 DFS-VISIT(G, u)
DFS-VISIT(G, u)
1 time = time + 1 // white vertex u has just been discovered
2 u.d = time
3 u.color = GRAY
4 for each v in G.adj[u]
5 if v.color == WHITE
6 v.π = u
7       DFS-VISIT(G, v)
8 u.color = BLACK // blacken u; it is finished
9 time = time + 1
10 u.f = time
In the above algorithm how should I traverse the reverse graph to find SCC.
Here it is, implemented in Python.
Please notice that I construct G and G' at the same time. My DFS is also modified: the visited array stores which component each node is in. Also, the DFS receives a sequence argument, which is the order in which the nodes will be tested. In the first DFS we pass xrange(n), but the second time we pass the reversed(order) from the first execution.
The program will output something like:
3
[1, 1, 1, 2, 3]
In that output, we have 3 strongly connected components, with the 3 first nodes in a single component and the remaining two with one component each.
def DFSvisit(G, v, visited, order, component):
    visited[v] = component
    for w in G[v]:
        if not visited[w]:
            DFSvisit(G, w, visited, order, component)
    order.append(v)

def DFS(G, sequence, visited, order):
    components = 0
    for v in sequence:
        if not visited[v]:
            components += 1
            DFSvisit(G, v, visited, order, components)

n, m = (int(i) for i in raw_input().strip().split())
G = [[] for i in xrange(n)]
Gt = [[] for i in xrange(n)]
for i in xrange(m):
    a, b = (int(i) for i in raw_input().strip().split())
    G[a-1].append(b-1)
    Gt[b-1].append(a-1)

order = []
components = [0]*n
DFS(G, xrange(n), [0]*n, order)
DFS(Gt, reversed(order), components, [])

print max(components)
print components
class graphSCC:
    def __init__(self, graphlist):
        self.graphlist = graphlist
        self.visitednode = {}
        self.SCC_dict = {}
        self.reversegraph = {}

    def build_reversegraph(self):
        # each edge line looks like "u\tv"; reverse it by appending u to v's adjacency list
        for edge in self.graphlist:
            a, b = edge.strip("\r\n").split("\t")
            self.reversegraph.setdefault(b, []).append(a)
            self.reversegraph.setdefault(a, [])
        return self.reversegraph

    def dfs(self):
        SCC_count = 0
        for x in self.reversegraph.keys():
            self.visitednode[x] = 0
        for x in self.reversegraph.keys():
            if self.visitednode[x] == 0:
                SCC_count += 1
                self.explore(x, SCC_count)

    def explore(self, node, count):
        self.visitednode[node] = 1
        for val in self.reversegraph[node]:
            if self.visitednode[val] == 0:
                self.explore(val, count)
        self.SCC_dict.setdefault(count, []).append(node)
length = 0
node = 0
for x in graph.SCC_dict.keys():
    if length < len(graph.SCC_dict[x]):
        length = len(graph.SCC_dict[x])
        node = x
length is the required answer