Python: union of set of tuples

Let's say we have two sets:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
I want a union keyed on the first element, resulting in:
u = {('b', 3), ('a', 2), ('c', 6)}
If the same symbol appears in both sets ('b' in the example above), the element from the first set should be retained.
Thanks.

Just do:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
d = dict(r)
d.update(t)
u = set(d.items())
print(u)
Output:
{('c', 6), ('a', 2), ('b', 3)}

A little bit shorter version:
s = dict((*r, *t))
set(s.items())
Output:
{('a', 2), ('b', 3), ('c', 6)}

for el in r:
    if el[0] not in [x[0] for x in t]:
        t.add(el)
t
{('a', 2), ('b', 3), ('c', 6)}

You can't do that with a plain set union. Two objects are either equal or they are not. Since your objects are tuples, ('b', 3) and ('b', 4) are not equal, and you don't get to change that.
The obvious way would be to create your own class and redefine equality, something like
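class MyTuple:
    def __init__(self, values):
        self.values = values
    def __eq__(self, other):
        # compare on the first element only
        return self.values[0] == other.values[0]
    def __hash__(self):
        # must agree with __eq__, otherwise the objects cannot be stored in a set
        return hash(self.values[0])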
and create sets of such objects.
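A minimal sketch of that idea, with its assumptions spelled out: the wrapper name `KeyedTuple` is hypothetical, it defines `__hash__` alongside `__eq__` (a class that only defines `__eq__` is unhashable in Python 3 and cannot go into a set), and it relies on CPython's behaviour that `a | b` keeps the element from `a` when both sets hold equal elements.

```python
class KeyedTuple:
    """Hypothetical wrapper: equality and hashing look only at the first item."""

    def __init__(self, values):
        self.values = values

    def __eq__(self, other):
        if not isinstance(other, KeyedTuple):
            return NotImplemented
        return self.values[0] == other.values[0]

    def __hash__(self):
        # Must be consistent with __eq__ so equal objects hash alike.
        return hash(self.values[0])


t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}

# Wrap, take the union (on a clash the left operand's element is kept in CPython), unwrap.
wrapped = {KeyedTuple(x) for x in t} | {KeyedTuple(x) for x in r}
u = {k.values for k in wrapped}
print(u)  # same elements as {('b', 3), ('a', 2), ('c', 6)}; set order may vary
```

Compared with the dictionary-based answers this is more code, but it also works for tuples with more than two items.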

An alternative using chain:
from itertools import chain
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = set({k: v for k, v in chain(r, t)}.items())
Output
{('b', 3), ('a', 2), ('c', 6)}

Here is my one-line style solution based on comprehensions:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = {*t, *{i for i in r if i[0] not in {j[0] for j in t}}}
print(result) # {('b', 3), ('a', 2), ('c', 6)}
You can also convert to a dictionary to eliminate the duplicates, which is quite a smart solution IMHO:
t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
result = {(k,v) for k,v in dict((*r,*t)).items()}
print(result) # {('b', 3), ('a', 2), ('c', 6)}
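The dictionary-based answers can be wrapped in a small reusable function. This is only a sketch: the name `union_on_first` is made up for illustration, and it assumes every element is a 2-tuple whose first item is hashable.

```python
def union_on_first(first, second):
    """Hypothetical helper: union of two sets of (key, value) pairs, keyed on
    the first item; pairs from `first` win when a key occurs in both sets."""
    merged = dict(second)   # start with the lower-priority pairs
    merged.update(first)    # overwrite clashes with the pairs from `first`
    return set(merged.items())


t = {('b', 3), ('a', 2)}
r = {('b', 4), ('c', 6)}
u = union_on_first(t, r)
print(u)  # same elements as {('b', 3), ('a', 2), ('c', 6)}; set order may vary
```

Swapping the arguments, `union_on_first(r, t)`, keeps `('b', 4)` instead.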

Related

Create a nested list based on tree relationship

I have a datatype that consists of multiple tuples in a list. It represents the relationship of parent-child.
For example, [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)], where each tuple has one, two, or three items in the format (letter, number, number). In the above example, ('B', 1) is the parent of ('B', 1, 1) and ('B', 1, 2), and so on until we reach just a letter.
My question is, how can I create a function that receives something like the example above and creates a nested list in which tuples sharing the same letter and leading numbers are grouped together?
For instance, how do I create a function that will take something like:
[('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
and turn it into:
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
Also note that the list will come presorted already in alphabetical and numerical order. Only the lowest order tuples are in the input list as well. (Parental tuples will not appear in the input list if their children are present)
Thanks!
We basically can iterate over the tuples, and for each tuple recursively "dive" into the data structure, and add that element. I think however that a list is, at least for an intermediate structure, not appropriate. A dictionary allows fast retrieval, hence it will boost updating.
def to_nested_list(tuples):
    data = {}
    for tup in tuples:
        elem = data
        for ti in tup:
            elem = elem.setdefault(ti, {})
    stck = []
    def to_list(source, dest):
        for k, v in source.items():
            stck.append(k)
            if v:
                dest.append(to_list(v, []))
            else:
                dest.append(tuple(stck))
            stck.pop()
        return dest
    return to_list(data, [])
For the given sample data, we thus first construct a dictionary that looks, before the stck = [] line, like:
{'A': {1: {}, 2: {1: {}, 2: {}}, 3: {}}, 'B': {1: {1: {}, 2: {}}}, 'C': {}}
Next we "harvest" the tuples of this structure by iterating over the dictionary recursively. If the corresponding value is not empty, we recurse and append the resulting sublist; if it is empty, we have reached a leaf and append a tuple constructed from the "call path" (the keys collected in stck).
For example:
>>> to_nested_list([('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)])
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)]], ('C',)]
This works for tuples of arbitrary length, as long as the elements of these tuples are all hashable (strings and integers are hashable, so we are safe here if the tuples contain only letters and numbers).
That being said, I'm not sure that using a nested list is a good idea anyway. With such a list it can take a lot of time to verify whether it contains a certain tuple, since the elements of the list do not "hint" at the prefix of that tuple. I think the data dictionary is probably a better representation.
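To illustrate why the dictionary helps with lookups, here is a rough sketch. The helper names `build_prefix_dict` and `contains_path` are hypothetical; the first one simply repeats the dictionary-building loop from the function above.

```python
def build_prefix_dict(tuples):
    """Build the nested dictionary used as the intermediate structure above."""
    data = {}
    for tup in tuples:
        node = data
        for item in tup:
            node = node.setdefault(item, {})
    return data


def contains_path(data, tup):
    """Hypothetical helper: True if `tup` is a stored path or a prefix of one."""
    node = data
    for item in tup:
        if item not in node:
            return False
        node = node[item]
    return True


data = build_prefix_dict([('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('C',)])
print(contains_path(data, ('A', 2, 2)))  # True
print(contains_path(data, ('B', 1)))     # False
```

Each lookup takes one dictionary access per tuple item, regardless of how many tuples are stored.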
Set
a = [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
The following solution works for trees with any depth:
First, a helper function that wraps each node with excess brackets if needed
def self_wrap(x, n):
    output = x
    for _ in range(n):
        output = [output]
    return output
Now, the main loop:
out_list = []
for i in range(len(a)):
    # add 0th element to out_list
    if i == 0:
        out_list.append(self_wrap(a[i], len(a[i])-1))
        continue
    # determine the appropriate bracket level to add a[i]
    prev_list = curr_list = out_list
    j = 0
    while min(len(a[i-1]), len(a[i])) > j and a[i-1][j] == a[i][j]:
        prev_list, curr_list = curr_list, curr_list[-1]
        j += 1
    left_over_len = len(a[i]) - j - 1
    # override if last item was parent
    if j == len(a[i-1]):
        prev_list[-1] = self_wrap(a[i], left_over_len + 1)
        continue
    # append a[i] to appropriate level and wrap with additional brackets if needed
    curr_list.append(self_wrap(a[i], left_over_len) if left_over_len > 0 else a[i])
print(out_list)
This prints
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
as expected.
As people have pointed out, this structure is not very efficient. There are 2 reasons:
redundant information
lists are hard to manipulate/lookup
That being said, this is probably the only way to represent paths.

In python, how should I implement a min heap on a list of tuple?

I'm trying to implement a min heap on a list of tuples.
For example:
A=[('a',2),('b',1)]
How can I heapify A based on the second element of these tuples, so that A will be heapified to [('b',1),('a',2)]? (I must maintain a min heap.)
As per @JimMischel's comment, place your tuples in a tuple with the priority as the first element. Then use heapq:
import heapq
items = [('a', 2), ('b', 1), ('c', 0), ('d', 1)]
heap_elts = [(item[1], item) for item in items]
heapq.heapify(heap_elts) # you specifically asked about heapify, here it is!
while len(heap_elts) > 0:
    print(heapq.heappop(heap_elts)[1]) # element 1 is the original tuple
produces:
('c', 0)
('b', 1)
('d', 1)
('a', 2)
Another option is to push (priority, value) pairs onto the heap one at a time with heappush:
import heapq
A=[('a',2),('b',1), ('d', 0), ('c', 2), ('a', 2)]
h = []
for el in A:
    heapq.heappush(h, (el[1], el[0]))
print(h)
result:
[(0, 'd'), (2, 'a'), (1, 'b'), (2, 'c'), (2, 'a')]
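Both answers fall back to comparing the remaining tuple items when two priorities tie, which works here because those items are strings. A sketch of a common variation, assuming the payload might not be comparable or that insertion order should decide ties: add an increasing counter as a second field. The names `tie_breaker`, `heap`, and `ordered` are illustrative only.

```python
import heapq
from itertools import count

A = [('a', 2), ('b', 1), ('d', 0), ('c', 2)]

tie_breaker = count()  # 0, 1, 2, ... preserves insertion order among equal priorities
heap = [(item[1], next(tie_breaker), item) for item in A]
heapq.heapify(heap)

ordered = []
while heap:
    priority, _, item = heapq.heappop(heap)
    ordered.append(item)

print(ordered)  # [('d', 0), ('b', 1), ('a', 2), ('c', 2)]
```

Because the counter is unique, the comparison never reaches the original tuple.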

Python: if first element is equal tuple together other elements

Apologies if this has been asked before, but I couldn't find it. If I have something like:
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
and I want to obtain a shorter list:
new = [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), 5, 6)]
so that it groups together the other elements in a tuple by first matching element, what is the fastest way to go about it?
You are grouping, based on a key. If your input groups are always consecutive, you can use itertools.groupby(), otherwise use a dictionary to group the elements. If order matters, use a dictionary that preserves insertion order (a dict in Python 3.6 or newer, or collections.OrderedDict).
Using groupby():
from itertools import groupby
from operator import itemgetter
new = [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
The above uses Python 3.5+ syntax to interpolate tuple elements from an iterable: (..., *iterable).
Using a dictionary:
groups = {}
for key, *values in lst:
    groups.setdefault(key, []).append(values)
new = [(k, *zip(*v)) for k, v in groups.items()]
In Python 3.6 or newer, that'll preserve the input order of the groups.
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
>>> [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
>>> groups = {}
>>> for key, *values in lst:
...     groups.setdefault(key, []).append(values)
...
>>> [(k, *zip(*v)) for k, v in groups.items()]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
If you are using Python 2, you'd have to use:
new = [(k,) + tuple(zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
or
from collections import OrderedDict
groups = OrderedDict()
for entry in lst:
    groups.setdefault(entry[0], []).append(entry[1:])
new = [(k,) + tuple(zip(*v)) for k, v in groups.items()]
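Note that the question's expected output leaves single-member groups unwrapped, (('b', 'c'), 5, 6), whereas the versions above produce (5,) and (6,). A minimal sketch, assuming that unwrapping is really wanted; the helper name `unwrap` is hypothetical and the code needs Python 3.5 or newer:

```python
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]

groups = {}
for key, *values in lst:
    groups.setdefault(key, []).append(values)


def unwrap(column):
    """Hypothetical helper: return the bare value for a one-element column."""
    return column[0] if len(column) == 1 else column


new = [(k, *(unwrap(col) for col in zip(*v))) for k, v in groups.items()]
print(new)  # [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), 5, 6)]
```

Mixing bare values and tuples makes the result harder to process uniformly, so the always-a-tuple form may still be the better choice.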
You could also use a collections.defaultdict to group your tuple keys:
from collections import defaultdict
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
d = defaultdict(tuple)
for tup, fst, snd in lst:
    d[tup] += fst, snd
# defaultdict(<class 'tuple'>, {('a', 'b'): (1, 2, 3, 4), ('b', 'c'): (5, 6)})
for key, value in d.items():
    d[key] = value[0::2], value[1::2]
# defaultdict(<class 'tuple'>, {('a', 'b'): ((1, 3), (2, 4)), ('b', 'c'): ((5,), (6,))})
result = [(k, v1, v2) for k, (v1, v2) in d.items()]
Which outputs:
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
The logic of the above code:
Group the tuples into a defaultdict of tuples.
Split the values into firsts and seconds with slicing [0::2] and [1::2].
Wrap this updated dictionary into the correct tuple structure with a list comprehension.
Depending on your use case, you might find using a dictionary or defaultdict more useful. It will scale better too.
from collections import defaultdict
listmaker = lambda: ([],[]) # makes a tuple of 2 lists for the values.
my_data = defaultdict(listmaker)
for letter_tuple, v1, v2 in lst:
    my_data[letter_tuple][0].append(v1)
    my_data[letter_tuple][1].append(v2)
Then you’ll get a new tuple of lists for each unique (x,y) key. Python handles the checking to see if the key already exists and it’s fast. If you absolutely need it to be a list, you can always convert it too:
new = [(k, tuple(v1s), tuple(v2s)) for k, (v1s, v2s) in my_data.items()]
This list comprehension is a bit opaque, but it will unpack your dictionary into the form specified [(('a', 'b'), (1,3), (2,4)), ... ]

SQL style inner join in Python?

I have two arrays like this:
[('a', 'beta'), ('b', 'alpha'), ('c', 'beta'), .. ]
[('b', 37), ('c', 22), ('j', 93), .. ]
I want to produce something like:
[('b', 'alpha', 37), ('c', 'beta', 22), .. ]
Is there an easy way to do this?
I would suggest a hash-based (discriminative) join:
l = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
r = [('b', 37), ('c', 22), ('j', 93)]
d = {}
for t in l:
    d.setdefault(t[0], ([],[]))[0].append(t[1:])
for t in r:
    d.setdefault(t[0], ([],[]))[1].append(t[1:])
from itertools import product
ans = [ (k,) + l + r for k,v in d.items() for l,r in product(*v)]
results in:
[('c', 'beta', 22), ('b', 'alpha', 37)]
This has complexity closer to O(n+m) than O(nm), because it avoids computing the full product(l,r) and then filtering it, as the naive method would.
Mostly from: Fritz Henglein's Relational algebra with discriminative joins and lazy products
It can also be written as:
def accumulate(it):
    d = {}
    for e in it:
        d.setdefault(e[0], []).append(e[1:])
    return d
l = accumulate([('a', 'beta'), ('b', 'alpha'), ('c', 'beta')])
r = accumulate([('b', 37), ('c', 22), ('j', 93)])
from itertools import product
# distinct loop names (lv, rv) so the dictionaries l and r are not shadowed inside the comprehension
ans = [ (k,) + lv + rv for k in l.keys() & r.keys() for lv, rv in product(l[k], r[k])]
This accumulates both lists separately (turns [(a,b,...)] into {a:[(b,...)]}) and then computes the intersection between their sets of keys. This looks cleaner. In Python 3, l.keys() & r.keys() gives that intersection directly; in Python 2, replace it with set(l)&set(r).
There is no built-in method. Adding a package such as numpy would give extra functionality, I assume.
But if you want to solve it without using any extra packages, you can use a one-liner like this:
ar1 = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
ar2 = [('b', 37), ('c', 22), ('j', 93)]
final_ar = [tuple(list(i)+[j[1]]) for i in ar1 for j in ar2 if i[0]==j[0]]
print(final_ar)
Output:
[('b', 'alpha', 37), ('c', 'beta', 22)]
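The nested comprehension above compares every pair of rows. A minimal sketch of a dictionary-based alternative, assuming each key appears at most once in the second list (duplicate keys there would be silently collapsed by `dict`); the names `lookup` and `joined` are illustrative:

```python
ar1 = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
ar2 = [('b', 37), ('c', 22), ('j', 93)]

# Index the second list by key once, then probe it for each row of the first list.
lookup = dict(ar2)
joined = [row + (lookup[row[0]],) for row in ar1 if row[0] in lookup]
print(joined)  # [('b', 'alpha', 37), ('c', 'beta', 22)]
```

This does one dictionary lookup per row of `ar1` instead of scanning all of `ar2` each time.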

Sorting the content of a dictionary by the value and by the key

Sorting the content of a dictionary by value has been thoroughly described already, so it can be achieved by something like this:
d={'d':1,'b':2,'c':2,'a':3}
sorted_res_1= sorted(d.items(), key=lambda x: x[1])
# or
from operator import itemgetter
sorted_res_2 = sorted(d.items(), key=itemgetter(1))
My question is, what would be the best way to achieve the following output:
[('d', 1), ('b', 2), ('c', 2), ('a', 3)] instead of [('d', 1), ('c', 2), ('b', 2), ('a', 3)]
so that the tuples are sorted by value and then by the key, if the value was equal.
Secondly - would such be possible for reversed:
[('a', 3), ('b', 2), ('c', 2), ('d', 1)] instead of [('a', 3), ('c', 2), ('b', 2), ('d', 1)]?
The sorted key parameter can return a tuple. In that case, the first item in the tuple is used to sort the items, and the second is used to break ties, and the third for those still tied, and so on...
In [1]: import operator
In [2]: d={'d':1,'b':2,'c':2,'a':3}
In [3]: sorted(d.items(),key=operator.itemgetter(1,0))
Out[3]: [('d', 1), ('b', 2), ('c', 2), ('a', 3)]
operator.itemgetter(1,0) returns a tuple formed from the second, and then the first item. That is, if f=operator.itemgetter(1,0) then f(x) returns (x[1],x[0]).
You just want standard tuple comparison, but with each tuple reversed:
>>> sorted(d.items(), key=lambda x: x[::-1])
[('d', 1), ('b', 2), ('c', 2), ('a', 3)]
An alternative approach, very close to your own example:
sorted(d.items(), key=lambda x: (x[1], x[0]))
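For the second part of the question (highest value first, with ties still broken alphabetically by key), one possible sketch negates the value inside the key tuple. This assumes the values are numeric; passing `reverse=True` alone would also reverse the order of the tied keys. The name `by_value_desc` is illustrative.

```python
d = {'d': 1, 'b': 2, 'c': 2, 'a': 3}

# Negating the value sorts it in descending order while keys stay ascending.
by_value_desc = sorted(d.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_value_desc)  # [('a', 3), ('b', 2), ('c', 2), ('d', 1)]
```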
