Create a nested list based on tree relationship - python

I have a data structure that consists of a list of tuples. It represents parent-child relationships.
For example, [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)], where the tuples can have 1, 2, or 3 items in the format (letter, number, number). In the above example, ('B', 1) is the parent of ('B', 1, 1) and ('B', 1, 2), and so on until we reach just a letter.
My question is, how can I create a function that will receive something like the example above and create a nested list where tuples sharing the same letter and leading numbers are grouped together?
For instance, how do I create a function that will take something like:
[('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
and turn it into:
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
Also note that the list will come presorted in alphabetical and numerical order. Only the lowest-order tuples are in the input list as well. (Parent tuples will not appear in the input list if their children are present.)
Thanks!

We can iterate over the tuples and, for each tuple, "dive" into the data structure and add that element. I think, however, that a list is not appropriate, at least as an intermediate structure. A dictionary allows fast lookup by key, which makes the updates cheap.
def to_nested_list(tuples):
    data = {}
    for tup in tuples:
        elem = data
        for ti in tup:
            elem = elem.setdefault(ti, {})
    stck = []
    def to_list(source, dest):
        for k, v in source.items():
            stck.append(k)
            if v:
                dest.append(to_list(v, []))
            else:
                dest.append(tuple(stck))
            stck.pop()
        return dest
    return to_list(data, [])
For the given sample data, we thus first construct a dictionary that looks, before the stck = [] line, like:
{'A': {1: {}, 2: {1: {}, 2: {}}, 3: {}}, 'B': {1: {1: {}, 2: {}}}, 'C': {}}
Next we "harvest" the tuples of this structure by iterating over the dictionary recursively: each time the corresponding value is not empty, we recurse into it and append the resulting sublist; otherwise we append a tuple constructed from the "call path" (the keys currently on stck).
For example:
>>> to_nested_list([('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)])
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)]], ('C',)]
This works for tuples of arbitrary length, as long as the elements of these tuples are all hashable (strings and integers are hashable, so we are safe here if the tuples contain only letters and numbers).
That being said, I'm not sure that using a nested list is a good idea anyway. With such a list it can take a lot of time to verify that it contains a certain tuple, since the elements of the list do not "hint" at the prefix of that tuple. I think the data dictionary is probably a better representation.
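To illustrate why the dictionary is convenient for lookups, here is a minimal sketch. The helper names build_prefix_dict and contains_path are hypothetical (they are not part of the answer above); the first simply repeats the dictionary-building loop, and the second follows a tuple as a key path.

```python
def build_prefix_dict(tuples):
    """Build the nested dictionary used as the intermediate structure."""
    data = {}
    for tup in tuples:
        elem = data
        for ti in tup:
            elem = elem.setdefault(ti, {})
    return data


def contains_path(data, tup):
    """Return True if every element of tup can be followed as a key path."""
    elem = data
    for ti in tup:
        if ti not in elem:
            return False
        elem = elem[ti]
    return True


data = build_prefix_dict([('A', 1), ('A', 2, 1), ('A', 2, 2), ('C',)])
print(contains_path(data, ('A', 2, 1)))  # True
print(contains_path(data, ('A', 2)))     # True: ('A', 2) is a parent (prefix) path
print(contains_path(data, ('B', 1)))     # False
```

Each lookup costs one dictionary access per tuple element. Note that this sketch also reports parent prefixes as present; checking that the final value is an empty dict would restrict it to leaves.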

Set
a = [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
The following solution works for trees with any depth:
First, a helper function that wraps each node in extra brackets if needed:
def self_wrap(x, n):
    output = x
    for _ in range(n):
        output = [output]
    return output
Now, the main loop:
out_list = []
for i in range(len(a)):
    # add 0th element to out_list
    if i == 0:
        out_list.append(self_wrap(a[i], len(a[i])-1))
        continue
    # determine the appropriate bracket level to add a[i]
    prev_list = curr_list = out_list
    j = 0
    while min(len(a[i-1]), len(a[i])) > j and a[i-1][j] == a[i][j]:
        prev_list, curr_list = curr_list, curr_list[-1]
        j += 1
    left_over_len = len(a[i]) - j - 1
    # override if last item was parent
    if j == len(a[i-1]):
        prev_list[-1] = self_wrap(a[i], left_over_len + 1)
        continue
    # append a[i] to appropriate level and wrap with additional brackets if needed
    curr_list.append(self_wrap(a[i], left_over_len) if left_over_len > 0 else a[i])
print(out_list)
This prints
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
as expected.
As people have pointed out, this structure is not very efficient, for 2 reasons:
- redundant information
- lists are hard to manipulate and search
That being said, this is probably the only way to represent paths.

Related

outputs of different list comprehensions in python 3.x

I was studying list comprehensions in Python and I am confused about why these two snippets produce different outputs.
CODE:
print([(letter,num) for letter in 'abc' for num in range(2)])
print([((letter,num) for letter in 'abc') for num in range(2)])
OUTPUT:
[('a', 0), ('a', 1), ('b', 0), ('b', 1), ('c', 0), ('c', 1)]
[<generator object <listcomp>.<genexpr> at 0x000002919E020F20>, <generator object <listcomp>.<genexpr> at 0x000002919E148C10>]
The first example:
print([(letter,num) for letter in 'abc' for num in range(2)])
Prints a list (because of the outer [] brackets) containing tuples (because of the parentheses around letter, num) of letter and num, for each value of letter looping over 'abc' and each value of num looping over the values of range(2) (which are 0 and 1).
Since Python takes the first for as the outer loop, you see ('a', 0), ('a', 1), etc. instead of ('a', 0), ('b', 0), etc.
However, when you add parentheses around a for expression like (letter,num) for letter in 'abc', you're no longer executing the loop in the comprehension, but you're capturing the generators (ready to start yielding their values, but not actually yielding the values into the comprehension).
So:
print([((letter,num) for letter in 'abc') for num in range(2)])
Here, ((letter,num) for letter in 'abc') is just a generator that will yield values as soon as you start asking for them.
Note: the generators do not capture the value of num when they are created; they look it up when they run, after the comprehension has finished. So if you consume them afterwards, you may see a surprising result:
x = [((letter,num) for letter in 'abc') for num in range(2)]
print(next(x[0]))
print(next(x[0]))
print(next(x[0]))
print(next(x[1]))
print(next(x[1]))
print(next(x[1]))
Result:
('a', 1)
('b', 1)
('c', 1)
('a', 1)
('b', 1)
('c', 1)
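One way to give each generator its own value of num is to create it inside a function call. The sketch below assumes a hypothetical helper make_pairs (not in the original answer); num becomes a local variable of each call, so it is no longer shared.

```python
def make_pairs(num):
    # num is local to this call, so each generator keeps its own value
    return ((letter, num) for letter in 'abc')


x = [make_pairs(num) for num in range(2)]
print(next(x[0]))  # ('a', 0)
print(next(x[0]))  # ('b', 0)
print(next(x[1]))  # ('a', 1)
```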
The first list comprehension is equivalent to nested loops:
result = []
for letter in 'abc':
    for num in range(2):
        result.append((letter, num))
print(result)
Each iteration of the inner loop produces an element of the resulting list.
The second is equivalent to a single loop:
result = []
for num in range(2):
    result.append((letter, num) for letter in 'abc')
print(result)
Each iteration of the loop appends a generator object to the resulting list.
You could use a nested list comprehension, but then the result will be nested lists, not a flat list as in the first version.
print([[(letter, num) for letter in 'abc'] for num in range(2)])
# output: [[('a', 0), ('b', 0), ('c', 0)], [('a', 1), ('b', 1), ('c', 1)]]

Python: union of set of tuples

Let's say we have two sets:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
I want a union keyed on the first element, resulting in
u = {('b', 3), ('a', 2), ('c', 6)}
If the same symbol is present in both sets (for example 'b' above), then the element from the first set should be retained.
Thanks.
Just do:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
d = dict(r)
d.update(t)
u = set(d.items())
print(u)
Output:
{('c', 6), ('a', 2), ('b', 3)}
A little bit shorter version:
s = dict((*r, *t))
set(s.items())
Output:
{('a', 2), ('b', 3), ('c', 6)}
for el in r:
    if el[0] not in [x[0] for x in t]:
        t.add(el)
t
{('a', 2), ('b', 3), ('c', 6)}
You can't do that with the built-in set operations. Two objects are either equal or they are not. Since your objects are tuples, ('b', 3) and ('b', 4) are not equal, and you don't get to change that.
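The obvious way would be to create your own class and redefine equality (and hashing, so that instances can be put into a set), something like
class MyTuple:
    def __init__(self, values):
        self.values = values
    def __eq__(self, other):
        return self.values[0] == other.values[0]
    def __hash__(self):
        # equal objects must hash equally, so hash only the first element
        return hash(self.values[0])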
and create sets of such objects.
An alternative using chain:
from itertools import chain
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = set({k: v for k, v in chain(r, t)}.items())
Output
{('b', 3), ('a', 2), ('c', 6)}
Here is my one-line style solution based on comprehensions:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = {*t, *{i for i in r if i[0] not in {j[0] for j in t}}}
print(result) # {('b', 3), ('a', 2), ('c', 6)}
You can also convert to a dictionary to eliminate the duplicates, which is quite a smart solution IMHO:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = {(k,v) for k,v in dict((*r,*t)).items()}
print(result) # {('b', 3), ('a', 2), ('c', 6)}
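The dict-based answers above rely on every tuple being a (key, value) pair. As a hedged sketch, the same idea can be written so that it keys on the first element of tuples of any length; the function name union_by_first is an assumption made for illustration.

```python
def union_by_first(first, second):
    """Union of two sets of tuples keyed on element 0; `first` wins on clashes."""
    by_key = {item[0]: item for item in second}
    # Entries from `first` overwrite entries from `second` that share a key.
    by_key.update({item[0]: item for item in first})
    return set(by_key.values())


t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
print(union_by_first(t, r) == {('b', 3), ('a', 2), ('c', 6)})  # True

# Also works for longer tuples, which dict(...) would reject.
print(union_by_first({('b', 3, 'x')}, {('b', 4, 'y'), ('c', 6, 'z')})
      == {('b', 3, 'x'), ('c', 6, 'z')})  # True
```

If one input set itself contains two tuples with the same first element, only one of them survives, and which one is not defined here.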

In python, how should I implement a min heap on a list of tuple?

I'm trying to implement a min heap on a list of tuple.
For example:
A=[('a',2),('b',1)]
How can I heapify A based on the second element of each tuple, so that A becomes [('b',1),('a',2)]? (I must maintain a min heap.)
As per @JimMischel's comment, place your tuples in a tuple with the priority as the first element. Then use heapq:
import heapq
items = [('a', 2), ('b', 1), ('c', 0), ('d', 1)]  # avoid shadowing the built-in list
heap_elts = [(item[1], item) for item in items]
heapq.heapify(heap_elts) # you specifically asked about heapify, here it is!
while len(heap_elts) > 0:
    print(heapq.heappop(heap_elts)[1]) # element 1 is the original tuple
produces:
('c', 0)
('b', 1)
('d', 1)
('a', 2)
import heapq
A=[('a',2),('b',1), ('d', 0), ('c', 2), ('a', 2)]
h = []
for el in A:
    heapq.heappush(h, (el[1], el[0]))
print(h)
result:
[(0, 'd'), (2, 'a'), (1, 'b'), (2, 'c'), (2, 'a')]
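The list printed above is in heap order, not sorted order: only h[0] is guaranteed to be the smallest element. A small sketch, reusing the same data, of peeking at the minimum and popping everything in ascending order (the variable names smallest and ordered are only illustrative):

```python
import heapq

A = [('a', 2), ('b', 1), ('d', 0), ('c', 2), ('a', 2)]

# Put the number first so the heap orders by it; ties fall back to the letter.
h = [(num, letter) for letter, num in A]
heapq.heapify(h)

smallest = h[0]  # peek at the minimum without removing it
ordered = [heapq.heappop(h) for _ in range(len(h))]  # pops come out ascending

print(smallest)  # (0, 'd')
print(ordered)   # [(0, 'd'), (1, 'b'), (2, 'a'), (2, 'a'), (2, 'c')]
```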

Writing a enumerate function using lambda

I received this exercise:
Write a function enumerate that takes a list and returns a list of
tuples containing (index,item) for each item in the list
My problem is that I cannot insert the index and value in one or a combination of for loops. This is the code I managed to make:
a = ["a", "b", "c","a","b","c"]
index = 0
for i in a:
    print (index,i)
    index+=1
This is roughly the code I want to produce (must be on one line):
my_enumerate = lambda x: [(t) for t in x]
print list(my_enumerate(range(4)))
How can I put it all in one lambda line to get (index, value) back? The output should look like:
[(0, "a"), (1, "b"), (2, "c")]
If you can index into the argument, just look up each value by its index:
my_enumerate = lambda x :[(t, x[t]) for t in range(len(x))]
print list(my_enumerate(a))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'b'), (5, 'c')]
If not, use zip and put range in the lambda:
my_enumerate = lambda x: zip(range(len(x)), x)
print list(my_enumerate(a))
[(i, a[i]) for i in range(len(a))]
my_enumerate = lambda x: [(i, x[i]) for i in xrange(len(x))]
a = ["a", "b", "c", "a", "b", "c"]
print my_enumerate(a)
outputs:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'b'), (5, 'c')]
EDIT: use range instead of xrange and print(...) instead of print if you are using python3
Python's zip and range functions solve this problem pretty handily.
my_enumerate = lambda seq: zip(range(len(seq)), seq)
In Python 2.x, you should use itertools.izip, and xrange instead.
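For reference, a Python 3 sketch of the same zip/range idea. In Python 3 zip returns a lazy iterator, so the lambda wraps it in list to produce the list of tuples the exercise asks for; the comparison with the built-in enumerate is only a sanity check.

```python
my_enumerate = lambda seq: list(zip(range(len(seq)), seq))

a = ["a", "b", "c", "a", "b", "c"]
print(my_enumerate(a))
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (4, 'b'), (5, 'c')]

# Works for any sequence with a length, such as a string.
print(my_enumerate("abc") == list(enumerate("abc")))  # True
```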
You could also do it recursively:
>>> myenumerate = lambda l, n=0: [] if not l else [(n, l.pop(0))] + myenumerate(l, n+1)
list.pop(n) removes the nth value from the list and returns it.
The only problem is that you must pass in a list:
>>> myenumerate([1,2,3,4,5,6,7,8])
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
>>> myenumerate("astring")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
AttributeError: 'str' object has no attribute 'pop'
>>> myenumerate(list("astring"))
[(0, 'a'), (1, 's'), (2, 't'), (3, 'r'), (4, 'i'), (5, 'n'), (6, 'g')]
However, if you just blindly added calls to list, you wouldn't be able to replicate the required functionality without using a slice.
A neat trick for bypassing this requirement is to use another lambda:
>>> myenumerate = lambda l, n=0: [] if not l else (lambda ll: [(n, ll.pop(0))] + myenumerate(ll, n+1))(list(l))
>>> myenumerate("astring")
[(0, 'a'), (1, 's'), (2, 't'), (3, 'r'), (4, 'i'), (5, 'n'), (6, 'g')]

Sorting the content of a dictionary by the value and by the key

Sorting the content of a dictionary by the value has been thoroughly described already, so it can be achieved by something like this:
d={'d':1,'b':2,'c':2,'a':3}
sorted_res_1= sorted(d.items(), key=lambda x: x[1])
# or
from operator import itemgetter
sorted_res_2 = sorted(d.items(), key=itemgetter(1))
My question is, what would be the best way to achieve the following output:
[('d', 1), ('b', 2), ('c', 2), ('a', 3)] instead of [('d', 1), ('c', 2), ('b', 2), ('a', 3)]
so that the tuples are sorted by value and then by key, if the values are equal.
Secondly, would this also be possible in reverse:
[('a', 3), ('b', 2), ('c', 2), ('d', 1)] instead of [('a', 3), ('c', 2), ('b', 2), ('d', 1)]?
The function passed as the key parameter of sorted can return a tuple. In that case, the first item in the tuple is used to sort the items, the second is used to break ties, the third for those still tied, and so on...
In [1]: import operator
In [2]: d={'d':1,'b':2,'c':2,'a':3}
In [3]: sorted(d.items(),key=operator.itemgetter(1,0))
Out[3]: [('d', 1), ('b', 2), ('c', 2), ('a', 3)]
operator.itemgetter(1,0) returns a function that builds a tuple from the second and then the first item. That is, if f=operator.itemgetter(1,0) then f(x) returns (x[1],x[0]).
You just want standard tuple comparison, but on the reversed tuple:
>>> sorted(d.items(), key=lambda x: x[::-1])
[('d', 1), ('b', 2), ('c', 2), ('a', 3)]
An alternative approach, very close to your own example:
sorted(d.items(), key=lambda x: (x[1], x[0]))
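For the second part of the question (value descending, key ascending), one possible sketch is to negate the value inside the key tuple. This assumes the values are numeric; the name by_value_desc is only illustrative.

```python
d = {'d': 1, 'b': 2, 'c': 2, 'a': 3}

# Negate the value to sort it descending while the key stays ascending.
by_value_desc = sorted(d.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_value_desc)  # [('a', 3), ('b', 2), ('c', 2), ('d', 1)]
```

Passing reverse=True instead would reverse the key order as well, giving ('c', 2) before ('b', 2).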
