Merging two lists of tuples in Python

Let's assume these are my lists:
oracle_files = [
(1, "__init__.py"),
(2, "price_calc.py"),
(3, "lang.py")]
predicted_files = [
(5, ["random.py","price_calc.py"]),
(2, ["__init__.py","price_calc.py"]),
(1, ["lang.py","__init__.py"])]
The first list is a list of tuples, each holding an identifier and a string.
The second is a list of tuples, each holding an integer and a list of strings.
My intention is to create a third list that intersects these two by ID (the integer),
and the output should look like this:
result = [(2, "price_calc.py", ["__init__.py","price_calc.py"]),
(1, "__init__.py", ["lang.py","__init__.py"])]
Do you know a way to reach this output? I'm not getting it right.

Here's an approach using dict:
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [(5, ["random.py","price_calc.py"]), (2, ["__init__.py","price_calc.py"]), (1, ["lang.py","__init__.py"])]
dct1 = dict(oracle_files)
dct2 = dict(predicted_files)
result = [(k, dct1[k], dct2[k]) for k in dct1.keys() & dct2.keys()]
print(result) # [(1, '__init__.py', ['lang.py', '__init__.py']), (2, 'price_calc.py', ['__init__.py', 'price_calc.py'])]
This uses the convenient fact that the keys view returned by dict.keys() behaves like a set. Since the intersection is a set, the order of the tuples in result is not guaranteed.
Keys views are set-like since their entries are unique and hashable. [...] For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).
https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects
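If the result should follow the order of predicted_files, as the desired output in the question does (ID 2, then ID 1), one possible variant is to look up each predicted ID in a dict built from oracle_files. This is only a sketch; the name oracle_by_id is made up for illustration.

```python
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [
    (5, ["random.py", "price_calc.py"]),
    (2, ["__init__.py", "price_calc.py"]),
    (1, ["lang.py", "__init__.py"]),
]

# Hypothetical helper name: maps each ID to its oracle file name.
oracle_by_id = dict(oracle_files)

# Walk predicted_files in order and keep only the IDs that also exist in oracle_files.
result = [
    (file_id, oracle_by_id[file_id], candidates)
    for file_id, candidates in predicted_files
    if file_id in oracle_by_id
]
print(result)
# [(2, 'price_calc.py', ['__init__.py', 'price_calc.py']), (1, '__init__.py', ['lang.py', '__init__.py'])]
```

Because the list comprehension iterates over a list rather than a set, the output order is deterministic.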

I think this does what you want.
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [(5, ["random.py","price_calc.py"]), (2, ["__init__.py","price_calc.py"]), (1, ["lang.py","__init__.py"])]
dct = dict(oracle_files)
for k,v in predicted_files:
    if k in dct:
        dct[k] = (dct[k], v)
print(dct)
outlist = [(k,)+v for k,v in dct.items() if isinstance(v,tuple)]
print(outlist)
Output:
{1: ('__init__.py', ['lang.py', '__init__.py']), 2: ('price_calc.py', ['__init__.py', 'price_calc.py']), 3: 'lang.py'}
[(1, '__init__.py', ['lang.py', '__init__.py']), (2, 'price_calc.py', ['__init__.py', 'price_calc.py'])]

Related

How to convert [element, element, element] into [(1, element), (2, element), (3, element)]

Say I have a list:
my_list = ['foo', 'fa', 'goo']
I would like to turn this list into this:
[(1, 'foo'), (2, 'fa'), (3, 'goo')]
This way, I could iterate over the list and see what number it is in the list. Any help would be appreciated, I have been wondering what function does this for so long, I just don't know what exactly to search to find the answer.
The built-in function enumerate(iterable) is made to do literally this:
new_list = list(enumerate(my_list))
# [(0, 'foo'), (1, 'fa'), (2, 'goo')]
Giving a second argument to enumerate() will let you choose what index to start at, so you can 1-index:
new_list = list(enumerate(my_list, 1))
# [(1, 'foo'), (2, 'fa'), (3, 'goo')]
You can alternatively use a list comprehension to 1-index it, if you need to:
new_list = [(i+1, v) for (i, v) in enumerate(my_list)]
# [(1, 'foo'), (2, 'fa'), (3, 'goo')]
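If the goal is only to know the position while iterating, the list of tuples does not have to be built at all: enumerate() can be unpacked directly in a for loop. A minimal sketch, where the names position, item and lines are just illustrative:

```python
my_list = ['foo', 'fa', 'goo']

lines = []
# enumerate() yields (index, element) pairs; start=1 makes the count 1-based.
for position, item in enumerate(my_list, start=1):
    lines.append(f"{position}: {item}")

print(lines)
# ['1: foo', '2: fa', '3: goo']
```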

Access individual elements of tuples of dictionary keys

Considering the code snippet below -
list1 = [1,2,3,4]
list2 = [1,2,3,4]
list3 = ['a','b','c','d']
dct = dict(zip(zip(list1,list2),list3))
print(dct)
gives me,
{(1, 1): 'a', (2, 2): 'b', (3, 3): 'c', (4, 4): 'd'}
Now,
print(dct.keys())
gives me,
dict_keys([(1, 1), (2, 2), (3, 3), (4, 4)])
How can i access first element of the above list of keys?
Something like -
dct.keys[0, 0] = 1
dct.keys[0, 1] = 1
dct.keys[1, 0] = 2
dct.keys[1, 1] = 2
and so on...
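Remember that before Python 3.7 a dict did not guarantee any ordering, so the order of dict.keys() could vary; from Python 3.7 on, keys come back in insertion order.
That said, to access the first element of a list, as you said, you can use list[element_index]. If the element is itself indexable, as a tuple is, do that again!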
So it would be
dct_keys = list(dct.keys())
dct_keys[0][0]  # 1
dct_keys[0][1]  # 1
dct_keys[1][0]  # 2
dct_keys[1][1]  # 2
You need to first convert the dct.keys() output to a list, and then the problem reduces to simple list-of-tuples indexing. There are several ways to convert the .keys() output to a list, for example list(dct.keys()). Personally, I find a list comprehension one of the simplest and most generic:
>>> [key for key in dct.keys()]
[(1, 1), (2, 2), (3, 3), (4, 4)]
And now simply index this list of tuples as:
>>> [key for key in dct.keys()][0][0]
1
Hope that helps.
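As a compact recap of the answers above: iterating over a dict yields its keys, so list(dct) gives the same list as list(dct.keys()), and tuple keys can also be unpacked directly in a loop or comprehension. A small sketch, where keys and first_parts are illustrative names:

```python
list1 = [1, 2, 3, 4]
list2 = [1, 2, 3, 4]
list3 = ['a', 'b', 'c', 'd']
dct = dict(zip(zip(list1, list2), list3))

# Iterating a dict yields its keys, so list(dct) is a list of the tuple keys.
keys = list(dct)
print(keys[0][0], keys[0][1], keys[1][0], keys[1][1])
# 1 1 2 2

# Tuple keys can be unpacked while iterating, without indexing at all.
first_parts = [first for first, second in dct]
print(first_parts)
# [1, 2, 3, 4]
```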

Finding mismatches in tuples and merging them in Python

I have two tuples a = ((1, 'AB'), (2, 'BC'), (3, 'CD')) and b = ((1, 'AB'), (2, 'XY'), (3, 'ZA')). By analysing these two tuples, it can be found that there are mismatches in the tuples, i.e, (2, 'BC') is present in a but (2, 'XY') is present in b.
I need to figure out such mismatches and come with a tuple that has the values as
result = ((2, 'BC', 'XY'), (3, 'CD', 'ZA'))
(order shall be preserved)
The closest reference I could find is "Comparing sublists and merging them", but that is for lists and I couldn't find a way to make it work with tuples.
Is there a way by which I can perform this operation?
Since there cannot be missing "keys" from a or b (or those values should be ignored), I would turn b into a dictionary, then loop on a and compare values.
a = ((1, 'AB'), (2, 'BC'), (3, 'CD'))
b = ((1, 'AB'), (2, 'XY'), (3, 'ZA'))
b = dict(b)
mismatches = [(k,v,b[k]) for k,v in a if b.get(k,v) != v]
print(mismatches)
result:
[(2, 'BC', 'XY'), (3, 'CD', 'ZA')]
The solution has the advantage of being almost one line, fast (because of the dict lookup), and order-preserving.
The if b.get(k,v) != v condition safeguards against a containing a tuple whose number is not a key in the b dictionary. In that case get returns its default value v, so the condition is False and b[k] is never evaluated.
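Since the question asks for a tuple rather than a list, the dict-based approach above could be wrapped in a small function that returns one. This is a sketch only; find_mismatches and b_lookup are hypothetical names, and the explicit membership test stands in for the get() default.

```python
def find_mismatches(a, b):
    """Return (key, value_in_a, value_in_b) for keys whose values differ.

    Keys of `a` that are missing from `b` are ignored; order follows `a`.
    """
    b_lookup = dict(b)  # key -> value, for constant-time lookups
    return tuple(
        (k, v, b_lookup[k])
        for k, v in a
        if k in b_lookup and b_lookup[k] != v
    )


a = ((1, 'AB'), (2, 'BC'), (3, 'CD'))
b = ((1, 'AB'), (2, 'XY'), (3, 'ZA'))
print(find_mismatches(a, b))
# ((2, 'BC', 'XY'), (3, 'CD', 'ZA'))
```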
If the lists are guaranteed to have the same order of the numbers in the tuples, you can do something like:
[ai + (bi[1],) for ai, bi in zip(a, b) if ai != bi]
and if there is no guarantee on the order you can do:
[ai + (bi[1],) for ai, bi in zip(sorted(a), sorted(b)) if ai != bi]

Python Easiest Way to Sum List Intersection of List of Tuples

Let's say I have the following two lists of tuples
myList = [(1, 7), (3, 3), (5, 9)]
otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
returns => [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
I would like to design a merge operation that merges these two lists by checking for any intersections on the first element of the tuple, if there are intersections, add the second elements of each tuple in question (merge the two). After the operation I would like to sort based upon the first element.
I am also posting this because I think it's a pretty common problem that has an obvious solution, but I feel that there could be very pythonic solutions to this question ;)
Use a dictionary for the result:
result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v
If you want a list of tuples, you can get it via list(result.items()). On Python 3.7+ the items come out in first-insertion order (on older versions the order is arbitrary), but of course you can sort the list if desired.
(Note that I renamed your lists to conform with Python's style conventions.)
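Put together with the question's data and the final sort, the dictionary approach might look like the following. It is a sketch rather than the only way to do it; totals and merged are illustrative names.

```python
my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

totals = {}
for k, v in my_list + other_list:
    # get() supplies 0 the first time a key is seen, so every key accumulates.
    totals[k] = totals.get(k, 0) + v

# Tuples sort by their first element first, which is the key here.
merged = sorted(totals.items())
print(merged)
# [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
```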
Use defaultdict:
from collections import defaultdict
results_dict = defaultdict(int)
results_dict.update(my_list)
for a, b in other_list:
    results_dict[a] += b
results = sorted(results_dict.items())
Note: When sorting sequences, sorted sorts by the first item in the sequence. If the first elements are the same, then it compares the second element. You can give sorted a function to sort by, using the key keyword argument:
results = sorted(results_dict.items(), key=lambda x: x[1])  # sort by the 2nd item
or
results = sorted(results_dict.items(), key=lambda x: abs(x[0]))  # sort by absolute value of the 1st item
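To make the effect of the key argument concrete, here is a runnable sketch of the defaultdict approach on the question's data, sorted both ways. The names by_key and by_value are illustrative. Note that sorted() is stable, so items with equal sort keys keep their relative input order.

```python
from collections import defaultdict

my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

results_dict = defaultdict(int)
results_dict.update(my_list)       # seed with the first list's pairs
for a, b in other_list:
    results_dict[a] += b           # missing keys start at int() == 0

by_key = sorted(results_dict.items())                        # default: first item
by_value = sorted(results_dict.items(), key=lambda x: x[1])  # second item

print(by_key)
# [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
print(by_value)
# [(2, 4), (1, 7), (3, 8), (7, 8), (5, 11)]
```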
A method using itertools:
>>> myList = [(1, 7), (3, 3), (5, 9)]
>>> otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
>>> import itertools
>>> merged = []
>>> for k, g in itertools.groupby(sorted(myList + otherList), lambda e: e[0]):
...     merged.append((k, sum(e[1] for e in g)))
...
>>> merged
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
This first concatenates the two lists together and sorts it. itertools.groupby returns the elements of the merged list, grouped by the first element of the tuple, so it just sums them up and places it into the merged list.
>>> [(k, sum(v for x, v in myList + otherList if k == x)) for k in sorted(dict(myList + otherList))]
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
The sorted() call is needed on Python 3.7+, where dict iteration follows insertion order (1, 3, 5, 2, 7 here); the version without it was tested on Python 2.7 and 3.2.
dict(myList + otherList) collects the unique keys of the joined lists.
sum(...) loops through the joined list again for each key k and adds up the tuple items v where k == x,
but the extra looping adds processing overhead. Using an explicit dictionary as proposed by Sven Marnach avoids it.

Get tuples with max value from each key from a list

I have a list of tuples like this:
[(1, 0), (2, 1), (3, 1), (6, 2), (3, 2), (2, 3)]
I want to keep the tuples which have the max first value of every tuple with the same second value. For example (2, 1) and (3, 1) share the same second (key) value, so I just want to keep the one with the max first value -> (3, 1). In the end I would get this:
[(1, 0), (3, 1), (6, 2), (2, 3)]
I don't mind at all if it is not a one-liner but I was wondering about an efficient way to go about this...
from operator import itemgetter
from itertools import groupby
L = [(1, 0), (2, 1), (3, 1), (6, 2), (3, 2), (2, 3)]
[max(items) for key, items in groupby(L, key=itemgetter(1))]
This assumes that your initial list of tuples is sorted by key values.
groupby creates an iterator that yields objects like (0, <itertools._grouper object at 0x01321330>), where the first value is the key value, the second one is another iterator which gives all the tuples with that key.
max(items) just selects the tuple with the maximum value, and since all the second values of the group are the same (and is also the key), it gives the tuple with the maximum first value.
A list comprehension is used to form an output list of tuples based on the output of these functions.
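groupby only groups consecutive items with equal keys, so when the input is not already ordered by the second value it needs to be sorted first. A sketch under that assumption, using a shuffled copy of the question's data; pairs and pairs_by_key are illustrative names:

```python
from itertools import groupby
from operator import itemgetter

# Same tuples as in the question, deliberately out of order.
pairs = [(3, 2), (1, 0), (2, 3), (2, 1), (6, 2), (3, 1)]

# Sort by the second element so equal keys become adjacent.
pairs_by_key = sorted(pairs, key=itemgetter(1))

# Within each group the key is identical, so max() compares the first elements.
result = [max(group) for _, group in groupby(pairs_by_key, key=itemgetter(1))]
print(result)
# [(1, 0), (3, 1), (6, 2), (2, 3)]
```

The sort makes this O(n log n); the dictionary-based answers make a single pass over the list and do not need sorted input.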
Probably using a dict:
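my_tuples = [(1, 0), (2, 1), (3, 1), (6, 2), (3, 2), (2, 3)]
rd = {}
for V, K in my_tuples:
    # setdefault stores V the first time K is seen and returns the stored value
    if V > rd.setdefault(K, V):
        rd[K] = V
result = [(V, K) for K, V in rd.items()]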
import itertools
import operator
l = [(1, 0), (2, 1), (3, 1), (6, 2), (3, 2), (2, 3)]
result = list(max(v, key=operator.itemgetter(0)) for k, v in itertools.groupby(l, operator.itemgetter(1)))
You could use a dictionary keyed on the second element of the tuple:
l = [(1, 0), (2, 1), (3, 1), (6, 2), (3, 2), (2, 3)]
d = dict([(t[1], None) for t in l])
for v, k in l:
    # None cannot be compared with an int on Python 3, so test for it explicitly
    if d[k] is None or d[k] < v:
        d[k] = v
l2 = [(v, k) for (k, v) in d.items() if v is not None]
print(l2)
