Newbie here,
I am struggling with a homework question. We are supposed to write a function that prints out a "tree". This tree contains labels, branches and trees. A tree is constructed in the following way:
example_tree = tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])])
The tree-function is already defined. It creates a label and can optionally contain a new sublist (the "branches").
Now we are given the following task:
"""Print a representation of this tree in which each node is
indented by two spaces times its depth from the root.
>>> print_tree(tree(1))
1
>>> print_tree(tree(1, [tree(2)]))
1
2
>>> numbers = tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])])
>>> print_tree(numbers)
1
2
3
4
5
6
7
"""
Our first intuition was to build the following function:
def print_tree(t, indent=0):
    for i in t:
        if len(t) > 1:
            print(i)
            indent = indent + 1
        else:
            print(i)
This results in the following:
1
[2]
[3, [4], [5]]
[6, [7]]
Could someone give a pointer on how to arrive at the correct result - for example, how do we work with the indent? The coding is done on Google Colab.
Thanks a lot!
This sort of task (processing a tree-like structure) is most commonly done using recursion - each branch is essentially a tree of its own, so you treat it in the same way.
class Tree:
    def __init__(self, label, branches=None):
        if not branches:
            branches = []
        self.label = label
        self.branches = branches

    def printTree(self, depth=0):
        # Indent by two spaces per level of depth, then recurse into the branches.
        print("  " * depth + str(self.label))
        for branch in self.branches:
            branch.printTree(depth + 1)

numbers = Tree(1, [Tree(2), Tree(3, [Tree(4), Tree(5)]), Tree(6, [Tree(7)])])
numbers.printTree()
Output:
1
  2
  3
    4
    5
  6
    7
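The same recursive idea works with the functional tree() constructor from the homework. Assuming the usual label(t) and branches(t) selectors over a list-based representation (guessed here from the output your first attempt printed, i.e. [label, branch, branch, ...]; adjust to whatever your course actually provides), a sketch could look like this:

# Assumed list-based ADT, consistent with the output shown in the question:
def tree(label, branches=()):
    return [label] + list(branches)

def label(t):
    return t[0]

def branches(t):
    return t[1:]

def print_tree(t, indent=0):
    # Print this node's label, indented two spaces per level of depth.
    print('  ' * indent + str(label(t)))
    # Each branch is itself a tree, so recurse with one more indent level.
    for b in branches(t):
        print_tree(b, indent + 1)

print_tree(tree(1, [tree(2), tree(3, [tree(4), tree(5)]), tree(6, [tree(7)])]))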
I have an implementation of Kosaraju's algorithm for finding SCCs in Python. The code below contains a recursive (fine on the small test cases) version and a non-recursive one (which I ultimately need because of the size of the real dataset).
I have run both the recursive and non-recursive versions on a few test datasets and get the correct answer. However, running it on the much larger dataset that I ultimately need to use produces the wrong result. Stepping through the real data is not really an option because it contains nearly a million nodes.
My problem is that I don't know how to proceed from here. My suspicion is that I either forgot a certain kind of graph configuration in my test cases, or that I have a more fundamental misunderstanding about how this algorithm is supposed to work.
#!/usr/bin/env python3

import heapq


class Node():
    """A class to represent nodes in a DirectedGraph. It has attributes for
    performing DFS."""

    def __init__(self, i):
        self.id = i
        self.edges = []
        self.rev_edges = []
        self.explored = False
        self.fin_time = 0
        self.leader = 0

    def add_edge(self, edge_id):
        self.edges.append(edge_id)

    def add_rev_edge(self, edge_id):
        self.rev_edges.append(edge_id)

    def mark_explored(self):
        self.explored = True

    def set_leader(self, leader_id):
        self.leader = leader_id

    def set_fin_time(self, fin_time):
        self.fin_time = fin_time


class DirectedGraph():
    """A class to represent directed graphs via the adjacency list approach.
    Each dictionary entry is a Node."""

    def __init__(self, length, list_of_edges):
        self.nodes = {}
        self.nodes_by_fin_time = {}
        self.length = length
        self.fin_time = 1  # counter for the finishing time
        self.leader_count = 0  # counter for the size of leader nodes
        self.scc_heapq = []  # heapq to store the SCCs by size
        self.sccs_computed = False
        for n in range(1, length + 1):
            self.nodes[str(n)] = Node(str(n))
        for n in list_of_edges:
            ns = n[0].split(' ')
            self.nodes[ns[0]].add_edge(ns[1])
            self.nodes[ns[1]].add_rev_edge(ns[0])

    def n_largest_sccs(self, n):
        if not self.sccs_computed:
            self.compute_sccs()
        return heapq.nlargest(n, self.scc_heapq)

    def compute_sccs(self):
        """First compute the finishing times and the resulting order of nodes
        via a DFS loop. Second use that new order to compute the SCCs and order
        them by their size."""
        # Go through the given graph in reverse order, computing the finishing
        # times of each node, and create a second graph that uses the finishing
        # times as the IDs.
        i = self.length
        while i > 0:
            node = self.nodes[str(i)]
            if not node.explored:
                self.dfs_fin_times(str(i))
            i -= 1
        # Populate the edges of the nodes_by_fin_time
        for n in self.nodes.values():
            for e in n.edges:
                e_head_fin_time = self.nodes[e].fin_time
                self.nodes_by_fin_time[n.fin_time].add_edge(e_head_fin_time)
        # Use the nodes ordered by finishing times to calculate the SCCs.
        i = self.length
        while i > 0:
            self.leader_count = 0
            node = self.nodes_by_fin_time[str(i)]
            if not node.explored:
                self.dfs_leaders(str(i))
                heapq.heappush(self.scc_heapq, (self.leader_count, node.id))
            i -= 1
        self.sccs_computed = True

    def dfs_fin_times(self, start_node_id):
        stack = [self.nodes[start_node_id]]
        # Perform depth-first search along the reversed edges of a directed
        # graph. While doing this populate the finishing times of the nodes
        # and create a new graph from those nodes that uses the finishing times
        # for indexing instead of the original IDs.
        while len(stack) > 0:
            curr_node = stack[-1]
            explored_rev_edges = 0
            curr_node.mark_explored()
            for e in curr_node.rev_edges:
                rev_edge_head = self.nodes[e]
                # If the head of the rev_edge has already been explored, ignore
                if rev_edge_head.explored:
                    explored_rev_edges += 1
                    continue
                else:
                    stack.append(rev_edge_head)
            # If the current node has no valid, unexplored outgoing reverse
            # edges, pop it from the stack, populate the fin time, and add it
            # to the new graph.
            if len(curr_node.rev_edges) - explored_rev_edges == 0:
                sink_node = stack.pop()
                # The fin time is 0 if that node has not received a fin time.
                # Prevents dealing with the same node twice here.
                if sink_node and sink_node.fin_time == 0:
                    sink_node.set_fin_time(str(self.fin_time))
                    self.nodes_by_fin_time[str(self.fin_time)] = \
                        Node(str(self.fin_time))
                    self.fin_time += 1

    def dfs_leaders(self, start_node_id):
        stack = [self.nodes_by_fin_time[start_node_id]]
        while len(stack) > 0:
            curr_node = stack.pop()
            curr_node.mark_explored()
            self.leader_count += 1
            for e in curr_node.edges:
                if not self.nodes_by_fin_time[e].explored:
                    stack.append(self.nodes_by_fin_time[e])

    ###### Recursive versions below ###################################

    def dfs_fin_times_rec(self, start_node_id):
        curr_node = self.nodes[start_node_id]
        curr_node.mark_explored()
        for e in curr_node.rev_edges:
            if not self.nodes[e].explored:
                self.dfs_fin_times_rec(e)
        curr_node.set_fin_time(str(self.fin_time))
        self.nodes_by_fin_time[str(self.fin_time)] = Node(str(self.fin_time))
        self.fin_time += 1

    def dfs_leaders_rec(self, start_node_id):
        curr_node = self.nodes_by_fin_time[start_node_id]
        curr_node.mark_explored()
        for e in curr_node.edges:
            if not self.nodes_by_fin_time[e].explored:
                self.dfs_leaders_rec(e)
        self.leader_count += 1
To run:
#!/usr/bin/env python3
import utils
from graphs import scc_computation
# data = utils.load_tab_delimited_file('data/SCC.txt')
data = utils.load_tab_delimited_file('data/SCC_5.txt')
# g = scc_computation.DirectedGraph(875714, data)
g = scc_computation.DirectedGraph(11, data)
g.compute_sccs()
# for e, v in g.nodes.items():
# print(e, v.fin_time)
# for e, v in g.nodes_by_fin_time.items():
# print(e, v.edges)
print(g.n_largest_sccs(20))
Most complex test case (SCC_5.txt):
1 5
1 4
2 3
2 11
2 6
3 7
4 2
4 8
4 10
5 7
5 5
5 3
6 8
6 11
7 9
8 2
8 8
9 3
10 1
11 9
11 6
Drawing of that test case: https://imgur.com/a/LA3ObpN
This produces 4 SCCs:
Bottom: Size 4, nodes 2, 8, 6, 11
Left: Size 3, nodes 1, 10, 4
Top: Size 1, node 5
Right: Size 3, nodes 7, 3, 9
OK, I figured out the missing cases. The algorithm wasn't handling very strongly connected graphs and duplicated edges correctly. Here is an adjusted version of the test case I posted above, with a duplicated edge and more edges to turn the whole graph into one big SCC.
1 5
1 4
2 3
2 6
2 11
3 2
3 7
4 2
4 8
4 10
5 1
5 3
5 5
5 7
6 8
7 9
8 2
8 2
8 4
8 8
9 3
10 1
11 9
11 6
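For anyone debugging a similar implementation: one way to catch cases like these earlier is to cross-check the result against an independent SCC implementation on the same edge list. A minimal sketch, assuming networkx is installed and the file uses the same one-"tail head"-pair-per-line format as the test cases above:

import networkx as nx

G = nx.DiGraph()
with open('data/SCC_5.txt') as f:  # same edge-list format as above
    for line in f:
        tail, head = line.split()
        G.add_edge(tail, head)

# SCC sizes, largest first, to compare against n_largest_sccs().
sizes = sorted((len(c) for c in nx.strongly_connected_components(G)), reverse=True)
print(sizes)  # [4, 3, 3, 1] for the original SCC_5.txt test case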
I am fairly new to Python, so please be patient; this is probably simple. I am trying to build an adjacency-list representation of a graph. In this particular representation I decided to use a list of lists, where the first value of each sublist represents the tail node and all other values represent head nodes. For example, the graph with edges 1->2, 2->3, 3->1, 1->3 will be represented as [[1,2,3],[2,3],[3,1]].
Running the following code on this edge list gives a problem I do not understand.
The edge list (Example.txt):
1 2
2 3
3 1
3 4
5 4
6 4
8 6
6 7
7 8
The Code:
def adjacency_list(graph):
    graph_copy = graph[:]
    g_tmp = []
    nodes = []
    for arc in graph_copy:
        choice_flag_1 = arc[0] not in nodes
        choice_flag_2 = arc[1] not in nodes
        if choice_flag_1:
            g_tmp.append(arc)
            nodes.append(arc[0])
        else:
            idx = [item[0] for item in g_tmp].index(arc[0])
            g_tmp[idx].append(arc[1])
        if choice_flag_2:
            g_tmp.append([arc[1]])
            nodes.append(arc[1])
    return g_tmp
# Read input from file
g = []
with open('Example.txt') as f:
    for line in f:
        line_split = line.split()
        new_line = []
        for element in line_split:
            new_line.append(int(element))
        g.append(new_line)

print('File Read. There are: %i items.' % len(g))
graph = adjacency_list(g)
During runtime, when the code processes arc 6 7 (second to last line in file), the following lines (found in the else statement) append 7 not only to g_tmp but also to graph_copy and graph.
idx = [item[0] for item in g_tmp].index(arc[0])
g_tmp[idx].append(arc[1])
What is happening?
Thank you!
J
P.S. I'm running Python 3.5
P.P.S. I also tried replacing graph_copy = graph[:] with graph_copy = list(graph). Same behavior.
The problem is in the lines

if choice_flag_1:
    g_tmp.append(arc)

When you append arc, you are not appending a copy - you are appending a reference to the very same inner list that graph (and graph_copy) still hold, because the [:] slice only copies the outer list. Replace it with a new list like so:

if choice_flag_1:
    g_tmp.append([arc[0], arc[1]])
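To see what is going on: graph[:] and list(graph) both make a shallow copy - a new outer list whose elements are still the very same inner list objects. A small standalone demonstration:

outer = [[1, 2], [3, 4]]
shallow = outer[:]             # new outer list, same inner list objects
print(shallow[0] is outer[0])  # True: both refer to the same inner list
shallow[0].append(99)
print(outer)                   # [[1, 2, 99], [3, 4]] - the "copy" changed too

import copy
deep = copy.deepcopy(outer)    # copies the inner lists as well
deep[0].append(100)
print(outer)                   # [[1, 2, 99], [3, 4]] - unchanged this time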
I have two CSV files that I want to compare. One looks like this:
"a" 1 6 3 1 8
"b" 15 6 12 5 6
"c" 7 4 1 4 8
"d" 14 8 12 11 4
"e" 1 8 7 13 12
"f" 2 5 4 13 9
"g" 8 6 9 3 3
"h" 5 12 8 2 3
"i" 5 9 2 11 11
"j" 1 9 2 4 9
So "a" possesses the numbers 1,6,3,1,8 etc. The actual CSV file is 1,000s of lines long so you know for efficiency sake when writing the code.
The second CSV file looks like this:
4
15
7
9
2
I have written some code to import these CSV files into lists in python.
import csv

with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)

wn1 = winningnumbers[0]
wn2 = winningnumbers[1]
wn3 = winningnumbers[2]
wn4 = winningnumbers[3]
wn5 = winningnumbers[4]
print(winningnumbers)

with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    enl = list(readere)
How would I now cross-reference number 4 (i.e. wn1, the first value of CSV file 2) with the first CSV file, so that it returns that "b" contains wn1? I imported them as lists to see if I could figure out how to do it, but just ended up running in circles. I also tried using dict() but had no success.
If I understood you correctly, you want to find the first index (or all indexes) of the entries that match the winning numbers. If so, you can do this:
with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)

with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    winning_number_index = -1  # Default value which we will print if nothing is found
    current_index = 0  # Initial index
    for line in readere:  # Iterate over the entries file
        all_numbers_match = True  # Set to False if any element doesn't match winningnumbers
        for i in range(len(line)):
            if line[i] != winningnumbers[i]:  # Values at this index are not equal
                all_numbers_match = False  # Our default value is set to False
                break  # Exit "for" without finishing
        if all_numbers_match == True:  # Still True, so all numbers match
            winning_number_index = current_index  # Current index is written to winning_number_index
            break  # Exit "for" without finishing
        else:  # Not all numbers match
            current_index += 1

print(winning_number_index)
This will print the index of the first winning number in entries (if you want all the indexes, write about it in the comments).
Note: this is not the optimal code to solve your problem. It's just easier to understand and debug if you're not familiar with Python's more advanced features.
You should probably consider not abbreviating your variables. entries_reader takes just a second more to write and 5 seconds less to understand than readere.
This is the variant that is faster, shorter and more memory efficient, but may be harder to understand:
with open('winningnumbers.csv', 'rb') as wn:
    reader = csv.reader(wn)
    winningnumbers = list(reader)

with open('Entries#x.csv', 'rb') as en:
    readere = csv.reader(en)
    for line_index, line in enumerate(readere):
        if all(line[i] == winningnumbers[i] for i in xrange(len(line))):
            winning_number_index = line_index
            break
    else:
        winning_number_index = -1

print(winning_number_index)
The features that might be unclear are probably enumerate(), all(), and using else with for rather than with if. Let's go through them one by one.
To understand this usage of enumerate(), you'll first need to understand this syntax:
a, b = [1, 2]
Variables a and b will be assigned the corresponding values from the list. In this case a will be 1 and b will be 2. Using this syntax we can do this:
for a, b in [[1, 2], [2, 3], ['spam', 'eggs']]:
    # do something with a and b
In each iteration, a and b will be 1 and 2, then 2 and 3, then 'spam' and 'eggs', respectively.
Let's assume we have a list a = ['spam', 'eggs', 'potatoes']. enumerate() then effectively gives us pairs like [(0, 'spam'), (1, 'eggs'), (2, 'potatoes')] (it actually returns an iterator rather than a list). So, when we use it like that,
for line_index, line in enumerate(readere):
    # Do something with line_index and line
line_index will be 0, 1, 2, etc.
The all() function accepts an iterable (list, tuple, generator, etc.) and returns True only if every element in it is truthy.
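For example:

print(all([1 == 1, 2 == 2]))  # True: every element is truthy
print(all([1 == 1, 2 == 3]))  # False: at least one element is falsy
print(all([]))                # True: vacuously true for an empty sequence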
The expression (line[i] == winningnumbers[i] for i in xrange(len(line))) is a generator expression; it lazily produces the same True/False values as the list built by the following code:
mylist = []
for i in range(len(line)):
    mylist.append(line[i] == winningnumbers[i])  # a == b is True if a equals b
So all() will return True only when every number from the entry matches the corresponding winning number.
Code in the else clause of a for loop runs only when the loop was not interrupted by break, so in our situation it's a good place to set the default index.
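A tiny standalone illustration of for/else:

for n in [1, 3, 5]:
    if n % 2 == 0:
        print('found an even number')
        break
else:
    # Runs only because the loop finished without hitting "break".
    print('no even number found')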
Having duplicate numbers seems illogical, but if you want the count of matched numbers for each row regardless of position, make nums a set and sum how many numbers from each row appear in the set:
from itertools import islice, imap
import csv
with open("in.txt") as f,open("numbers.txt") as nums:
# make a set of all winning nums
nums = set(imap(str.rstrip, nums))
r = csv.reader(f)
# iterate over each row and sum how many matches we get
for row in r:
print("{} matched {}".format(row[0], sum(n in nums
for n in islice(row, 1, None))))
Which using your input will output:
a matched 0
b matched 1
c matched 2
d matched 1
e matched 0
f matched 2
g matched 0
h matched 1
i matched 1
j matched 2
This presumes your file is comma-separated and that your numbers file has one number per line.
If you actually want to know which numbers, if any, are present, then you need to iterate over the numbers and print each one that is in our set:
from itertools import islice, imap
import csv
with open("in.txt") as f, open("numbers.txt") as nums:
nums = set(imap(str.rstrip, nums))
r = csv.reader(f)
for row in r:
for n in islice(row, 1, None):
if n in nums:
print("{} is in row {}".format(n, row[0]))
print("")
But again, I am not sure having duplicate numbers makes sense.
To group the rows by how many matches they have, you can use a dict with the sum as the key, appending the first column value:
from itertools import islice, imap
import csv
from collections import defaultdict
with open("in.txt") as f,open("numbers.txt") as nums:
# make a set of all winning nums
nums = set(imap(str.rstrip, nums))
r = csv.reader(f)
results = defaultdict(list)
# iterate over each row and sum how many matches we get
for row in r:
results[sum(n in nums for n in islice(row, 1, None))].append(row[0])
results:
defaultdict(<type 'list'>,
{0: ['a', 'e', 'g'], 1: ['b', 'd', 'h', 'i'],
2: ['c', 'f', 'j']})
The keys are the number of matches; the values are the ids of the rows that matched that many numbers.
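For example, to read the groups back out of results (using the sample data above):

print(results[2])  # ['c', 'f', 'j'] - rows with exactly two matches

for matches in sorted(results, reverse=True):
    print("{} matched: {}".format(matches, ", ".join(results[matches])))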
I have a list assigned to the variable my_list. The value of my_list is [[1,2,3],[3,5,[2,3]], [[3,2],[5,[4]]]]. I need to find the length of my_list, but len(my_list) only returns 3. I want it to return 11. Is there a Python function that will return the full length of my_list, nested lists and all?
Example:
Input
[[1,2,3],[3,5,[2,3]], [[3,2],[5,[4]]]]
Output
11
I would like this to work not only for numbers, but for strings as well.
This function counts the length of a list, treating any object other than a list as length 1 and recursing on list items to find the flattened length. It works with any degree of nesting up to the interpreter's maximum recursion depth.
def recursive_len(item):
    if type(item) == list:
        return sum(recursive_len(subitem) for subitem in item)
    else:
        return 1
Note: depending on how this will be used, it may be better to check if the item is iterable rather than checking if it has the type list, in order to correctly judge the size of tuples, etc. However, checking if the object is iterable will have the side effect of counting each character in a string rather than giving the string length 1, which may be undesirable.
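If you do want to count other iterables (tuples, sets, etc.) while still treating strings as single leaves, one possible variant (a sketch using the Python 3 str type; on Python 2 you would check basestring instead) is:

def recursive_len_iter(item):
    # Treat strings as single leaves so 'abc' counts as 1, not 3.
    if isinstance(item, str):
        return 1
    try:
        iter(item)
    except TypeError:
        return 1  # not iterable at all: count it as one leaf
    return sum(recursive_len_iter(sub) for sub in item)

print(recursive_len_iter([[1, 'two', 3], (4, {5, 6})]))  # 6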
Hack solution - someone had to post it. Convert the list to a string (leaving the heavy lifting / recursion to the __str__ conversion), then count the commas and add 1.
>>> my_list = [[1,2,3],[3,5,[2,3]], [[3,2],[5,[4]]]]
>>> str(my_list).count(",")+1
11
(works for integers & floats, of course fails with strings because they can contain commas)
EDIT: this hack doesn't account for empty lists: we have to remove [] elements:
>>> my_list = [[1,2,3],[3,5,[2,3]], [[3,2],[5,[4],[]]]] # added empty list at the end
>>> s = str(my_list)
>>> s.count(",")-s.count("[]")+1 # still 11
As an alternative, you can use flatten with len:
from compiler.ast import flatten
my_list = [[1,2,3],[3,5,[2,3]], [[3,2],[5,[4]]]]
len(flatten(my_list))
11
PS. Thanks to @thefourtheye for pointing this out; please note:
Deprecated since version 2.6: The compiler package has been removed in Python 3.
Alternatives can be found here: Python 3 replacement for deprecated compiler.ast flatten function
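For completeness, a small Python 3 replacement can be written with collections.abc - a sketch that keeps strings and bytes whole rather than iterating over their characters:

from collections.abc import Iterable

def flatten(items):
    # Yield leaves from arbitrarily nested iterables, keeping strings whole.
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

my_list = [[1, 2, 3], [3, 5, [2, 3]], [[3, 2], [5, [4]]]]
print(len(list(flatten(my_list))))  # 11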
You are essentially looking for a way to compute the number of leaves in a tree.
def is_leaf(tree):
    return type(tree) != list

def count_leaves(tree):
    if is_leaf(tree):
        return 1
    else:
        branch_counts = [count_leaves(b) for b in tree]
        return sum(branch_counts)
The count_leaves function counts the leaves in a tree by recursively computing the branch_counts of the branches, then summing those results. The base case is when the tree is a leaf, which is a tree with 1 leaf. The number of leaves differs from the length of the tree, which is its number of branches.
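For the list in the question, that distinction looks like this:

my_list = [[1, 2, 3], [3, 5, [2, 3]], [[3, 2], [5, [4]]]]
print(len(my_list))           # 3  - the number of top-level branches
print(count_leaves(my_list))  # 11 - the number of leaves, i.e. the flattened length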
This is an alternative solution which might not be as performant, since it fills a new flattened list that is returned at the end:
def flatten_list(ls, flattened_list=None):
    # Use None as the default: a mutable default argument ([]) would be shared
    # between calls and keep growing across repeated invocations.
    if flattened_list is None:
        flattened_list = []
    for elem in ls:
        if not isinstance(elem, list):
            flattened_list.append(elem)
        else:
            flatten_list(elem, flattened_list)
    return flattened_list
flatten_list intuitively flattens the list, and then you can calculate the length of the new returned flattened list with the len() function:
len(flatten_list(my_list))
Here is my implementation:
def nestedList(check):
    returnValue = 0
    for i in xrange(0, len(check)):
        if isinstance(check[i], list):
            returnValue += nestedList(check[i])
        else:
            returnValue += 1
    return returnValue
This is my best attempt, utilizing recursion and only the standard library, plus a visual trace of the elements; I try not to use custom libraries.
def listlength(mylist, k=0, indent=''):
    for l1 in mylist:
        if isinstance(l1, list):
            k = listlength(l1, k, indent + ' ')
        else:
            print(indent + str(l1))
            k += 1
    return k
a = [[1,2,3],[3,5,[2,3]], [[3,2],[5,[4]]]]
listlength(a)
# 11
and for good measure
a = []
x = listlength(a)
print('length={}'.format(x))
# length=0
a = [1,2,3]
x = listlength(a)
print('length={}'.format(x))
#1
#2
#3
#length=3
a = [[1,2,3]]
x = listlength(a)
print('length={}'.format(x))
# 1
# 2
# 3
#length=3
a = [[1,2,3],[1,2,3]]
x = listlength(a)
print('length={}'.format(x))
# 1
# 2
# 3
# 1
# 2
# 3
#length=6
a = [1,2,3, [1,2,3],[1,2,3]]
x = listlength(a)
print('length={}'.format(x))
#1
#2
#3
# 1
# 2
# 3
# 1
# 2
# 3
#length=9
a = [1,2,3, [1,2,3,[1,2,3]]]
x = listlength(a)
print('length={}'.format(x))
#1
#2
#3
# 1
# 2
# 3
# 1
# 2
# 3
#length=9
a = [ [1,2,3], [1,[1,2],3] ]
x = listlength(a)
print('length={}'.format(x))
# 1
# 2
# 3
# 1
# 1
# 2
# 3
#length=7