Finding mismatches in tuples and merging them in Python

I have two tuples a = ((1, 'AB'), (2, 'BC'), (3, 'CD')) and b = ((1, 'AB'), (2, 'XY'), (3, 'ZA')). Comparing these two tuples shows that some entries do not match, i.e., (2, 'BC') is present in a but (2, 'XY') is present in b.
I need to find such mismatches and build a tuple with the values
result = ((2, 'BC', 'XY'), (3, 'CD', 'ZA'))
(order should be preserved)
The closest reference I could find is Comparing sublists and merging them, but that is for lists and I couldn't find a way to make it work with tuples.
Is there a way by which I can perform this operation?

Since there cannot be missing "keys" from a or b (or those values should be ignored), I would turn b into a dictionary, then loop over a and compare values.
a = ((1, 'AB'), (2, 'BC'), (3, 'CD'))
b = ((1, 'AB'), (2, 'XY'), (3, 'ZA'))
b = dict(b)
mismatches = [(k,v,b[k]) for k,v in a if b.get(k,v) != v]
print(mismatches)
result:
[(2, 'BC', 'XY'), (3, 'CD', 'ZA')]
The solution has the advantage of being almost a one-liner, fast (because of the dict lookup), and order-preserving.
The b.get(k,v) != v condition safeguards against a containing a tuple whose number is not in the b dictionary. In that case, get returns the default value v, so the condition is False and that tuple is skipped.
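The question asked for a tuple rather than a list. A minimal sketch that wraps the same dictionary lookup in a helper (the name find_mismatches is hypothetical, not from the answer) and returns a tuple, keeping a's order and skipping numbers missing from b:

```python
def find_mismatches(a, b):
    """Return (key, value_in_a, value_in_b) for every key whose values differ.

    Keys present in a but missing from b are skipped; a's order is kept.
    """
    lookup = dict(b)  # key -> value in b, for O(1) lookups
    return tuple((k, v, lookup[k]) for k, v in a if lookup.get(k, v) != v)


a = ((1, 'AB'), (2, 'BC'), (3, 'CD'))
b = ((1, 'AB'), (2, 'XY'), (3, 'ZA'))
print(find_mismatches(a, b))  # ((2, 'BC', 'XY'), (3, 'CD', 'ZA'))

# A key that only exists in a (here 4) is ignored instead of raising KeyError.
print(find_mismatches(a + ((4, 'QQ'),), b))  # ((2, 'BC', 'XY'), (3, 'CD', 'ZA'))
```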

If the tuples are guaranteed to list the numbers in the same order, you can do something like:
[ai + (bi[1],) for ai, bi in zip(a, b) if ai != bi]
and if there is no guarantee on the order, you can sort both first (the output then follows the sorted order of the numbers):
[ai + (bi[1],) for ai, bi in zip(sorted(a), sorted(b)) if ai != bi]
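To see why the sorted() variant matters, here is a small sketch with a hypothetical b_shuffled that holds the same pairs as b in a different order. Plain zip pairs tuples purely by position, while sorting both sides first lines the numbers up again. Both versions assume a and b contain the same set of numbers; zip silently stops at the shorter input.

```python
a = ((1, 'AB'), (2, 'BC'), (3, 'CD'))
b_shuffled = ((3, 'ZA'), (1, 'AB'), (2, 'XY'))  # same pairs as b, different order

# Plain zip pairs tuples by position, so a reordered b yields misaligned rows.
by_position = [ai + (bi[1],) for ai, bi in zip(a, b_shuffled) if ai != bi]
print(by_position)  # [(1, 'AB', 'ZA'), (2, 'BC', 'AB'), (3, 'CD', 'XY')]

# Sorting both first lines the numbers up again.
by_sorted = [ai + (bi[1],) for ai, bi in zip(sorted(a), sorted(b_shuffled)) if ai != bi]
print(by_sorted)  # [(2, 'BC', 'XY'), (3, 'CD', 'ZA')]
```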

Related

merging two lists of tuples in python

Let's assume these are my lists:
oracle_files = [
(1, "__init__.py"),
(2, "price_calc.py"),
(3, "lang.py")]
predicted_files = [
(5, ["random.py","price_calc.py"]),
(2, ["__init__.py","price_calc.py"]),
(1, ["lang.py","__init__.py"])]
The first list is a list of tuples where I have an identifier and a string for each.
The second one is a list of tuples of an integer and a list of strings.
My intention is to create a third list that intersects these two by ID (the integer),
and the output should look like this:
result = [(2, "price_calc.py", ["__init__.py","price_calc.py"]),
(1, "__init__.py", ["lang.py","__init__.py"])]
Do you know a way to get this output? I'm not getting it right.
Here's an approach using dict:
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [(5, ["random.py","price_calc.py"]), (2, ["__init__.py","price_calc.py"]), (1, ["lang.py","__init__.py"])]
dct1 = dict(oracle_files)
dct2 = dict(predicted_files)
result = [(k, dct1[k], dct2[k]) for k in dct1.keys() & dct2.keys()]
print(result) # [(1, '__init__.py', ['lang.py', '__init__.py']), (2, 'price_calc.py', ['__init__.py', 'price_calc.py'])]
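This relies on the convenient fact that the keys view returned by dict.keys() behaves like a set. Because the intersection is a set, its iteration order is not guaranteed to follow either input list.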
Keys views are set-like since their entries are unique and hashable. [...] For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).
https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects
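The set intersection does not promise any particular order, whereas the expected result in the question follows the order of predicted_files. A minimal sketch of an order-preserving variant, which swaps the set operation for a plain membership test while walking predicted_files:

```python
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [(5, ["random.py", "price_calc.py"]),
                   (2, ["__init__.py", "price_calc.py"]),
                   (1, ["lang.py", "__init__.py"])]

oracle = dict(oracle_files)  # id -> file name
# Walk predicted_files in order and keep only ids that also appear in oracle_files.
result = [(k, oracle[k], files) for k, files in predicted_files if k in oracle]
print(result)
# [(2, 'price_calc.py', ['__init__.py', 'price_calc.py']), (1, '__init__.py', ['lang.py', '__init__.py'])]
```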
I think this does what you want.
oracle_files = [(1, "__init__.py"), (2, "price_calc.py"), (3, "lang.py")]
predicted_files = [(5, ["random.py","price_calc.py"]), (2, ["__init__.py","price_calc.py"]), (1, ["lang.py","__init__.py"])]
dct = dict(oracle_files)
for k,v in predicted_files:
    if k in dct:
        dct[k] = (dct[k], v)
print(dct)
outlist = [(k,)+v for k,v in dct.items() if isinstance(v,tuple)]
print(outlist)
Output:
{1: ('__init__.py', ['lang.py', '__init__.py']), 2: ('price_calc.py', ['__init__.py', 'price_calc.py']), 3: 'lang.py'}
[(1, '__init__.py', ['lang.py', '__init__.py']), (2, 'price_calc.py', ['__init__.py', 'price_calc.py'])]

How do I print out the elements that are in both tuples?

a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))
#wanted_result = (('we', 3), ('b', 4), ('we', 23), ('b', 2))
How can I get the tuples whose string appears in both a and b, like the wanted_result I have written in the code above?
I would prefer using list comprehensions with filters, by the way. Would that be possible?
You can use set intersection:
keys = dict(a).keys() & dict(b)
tuple(t for t in a + b if t[0] in keys)
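The question's wanted_result lists b's matching tuples first. A small sketch of the same set-intersection answer with the concatenation order swapped to b + a, which reproduces that order exactly:

```python
a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))

keys = dict(a).keys() & dict(b)  # strings that appear in both tuples
# Concatenating b + a (instead of a + b) lists b's matches first,
# which is the order shown in wanted_result.
wanted = tuple(t for t in b + a if t[0] in keys)
print(wanted)  # (('we', 3), ('b', 4), ('we', 23), ('b', 2))
```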
You can make a set of the intersection between the first part of the tuples in both lists. Then use a list comprehension to extract the tuples that match this common set:
a = (('we', 23), ('b', 2))
b = (('we', 3), ('e', 3), ('b', 4))
common = set(next(zip(*a))) & set(next(zip(*b)))
result = [t for t in a+b if t[0] in common]
[('we', 23), ('b', 2), ('we', 3), ('b', 4)]
You can also do something similar using the Counter class from collections (by filtering tuples on string counts greater than 1):
from collections import Counter
common = Counter(next(zip(*a,*b)))
result = [(s,n) for (s,n) in a+b if common[s]>1]
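One caveat with the Counter version: it counts strings across both tuples combined, so a string repeated inside a alone also reaches a count of 2. A sketch with hypothetical data showing the false positive, plus one way to count each string at most once per tuple (this fix is a suggested variation, not part of the original answer):

```python
from collections import Counter

a = (('x', 1), ('x', 2))  # 'x' appears twice, but only in a
b = (('y', 1),)

common = Counter(next(zip(*a, *b)))
naive = [(s, n) for (s, n) in a + b if common[s] > 1]
print(naive)  # [('x', 1), ('x', 2)]

# Counting the set of strings from each tuple avoids the false positive.
common = Counter(set(s for s, _ in a)) + Counter(set(s for s, _ in b))
fixed = [(s, n) for (s, n) in a + b if common[s] > 1]
print(fixed)  # []
```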
If you want a single list comprehension, given that your tuples have exactly two values, you can pair each tuple with a dictionary formed from the other one and use the dictionary as a filter:
result = [t for d,tl in [(dict(b),a),(dict(a),b)] for t in tl if t[0] in d]
Adding two list comprehensions (i.e. concatenating lists):
print([bi for bi in b if any(bi[0]==i[0] for i in a)] +
[ai for ai in a if any(ai[0]==i[0] for i in b)])
# Output: [('we', 3), ('b', 4), ('we', 23), ('b', 2)]
Explanation
[bi for bi in b if any(bi[0]==i[0] for i in a)] # ->>
# Take tuples from b whose first element equals one of the
# first elements of a
[ai for ai in a if any(ai[0]==i[0] for i in b)]
# Similarly, take tuples from a whose first element equals one of the
# first elements of b
Another variation with sets:
filtered_keys=set(k for k,v in a)&set(k for k,v in b)
res=tuple((k, v) for k, v in [*a, *b] if k in filtered_keys)
# (('we', 23), ('b', 2), ('we', 3), ('b', 4))

Access individual elements of tuples of dictionary keys

Considering the code snippet below -
list1 = [1,2,3,4]
list2 = [1,2,3,4]
list3 = ['a','b','c','d']
dct = dict(zip(zip(list1,list2),list3))
print(dct)
gives me,
{(1, 1): 'a', (2, 2): 'b', (3, 3): 'c', (4, 4): 'd'}
Now,
print(dct.keys())
gives me,
dict_keys([(1, 1), (2, 2), (3, 3), (4, 4)])
How can I access the first element of the above list of keys?
Something like -
dct.keys[0, 0] = 1
dct.keys[0, 1] = 1
dct.keys[1, 0] = 2
dct.keys[1, 1] = 2
and so on...
Remember that before Python 3.7 a dict was unordered, so dict.keys() could come out in any order; since Python 3.7, dicts preserve insertion order.
That said, to access an element of a list, as you said, you can use list[element_index]. If the element is itself a sequence (such as a tuple), index it again!
So it would be
dct_keys = list(dct.keys())
dct_keys[0][0]  # 1
dct_keys[0][1]  # 1
dct_keys[1][0]  # 2
dct_keys[1][1]  # 2
You need to first convert the dct.keys() output to a list, and then the problem reduces to simple list-of-tuples indexing. There are multiple ways to convert the .keys() output to a list (list(dct.keys()) is the most direct). I find a list comprehension one of the simplest and most generic ways:
>>> [key for key in dct.keys()]
[(1, 1), (2, 2), (3, 3), (4, 4)]
And now simply index this list of tuples as:
>>> [key for key in dct.keys()][0][0]
1
Hope that helps.
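Two shortcuts worth knowing, sketched against the question's dct: iterating a dict yields its keys, so list(dct) gives the same list as list(dct.keys()), and next(iter(dct)) fetches just the first key without copying all of them. Both rely on insertion ordering, which is guaranteed from Python 3.7.

```python
list1 = [1, 2, 3, 4]
list2 = [1, 2, 3, 4]
list3 = ['a', 'b', 'c', 'd']
dct = dict(zip(zip(list1, list2), list3))

keys = list(dct)      # same as list(dct.keys())
print(keys[1][0])     # 2  (the second key is (2, 2); take its first element)

first_key = next(iter(dct))  # first key without building a list
print(first_key[0])   # 1
```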

Python Easiest Way to Sum List Intersection of List of Tuples

Let's say I have the following two lists of tuples:
myList = [(1, 7), (3, 3), (5, 9)]
otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
returns => [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
I would like to design a merge operation that merges these two lists by checking for intersections on the first element of each tuple: if two tuples share a first element, add their second elements together (merging the two). After the operation, I would like to sort by the first element.
I am also posting this because I think it's a pretty common problem with an obvious solution, but I feel that there could be very pythonic solutions to this question ;)
Use a dictionary for the result:
result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v
If you want a list of tuples, you can get it via list(result.items()). The pairs come out in insertion order (Python 3.7+) rather than sorted, but you can use sorted(result.items()) if desired.
(Note that I renamed your lists to conform with Python's style conventions.)
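Putting the dictionary answer together end to end, a minimal runnable sketch using the renamed lists and sorting the pairs by key at the end, which yields the result asked for in the question:

```python
my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v  # start unseen keys at 0, then add

merged = sorted(result.items())  # sort the (key, total) pairs by key
print(merged)  # [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
```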
Use defaultdict:
from collections import defaultdict
results_dict = defaultdict(int)
results_dict.update(my_list)
for a, b in other_list:
    results_dict[a] += b
results = sorted(results_dict.items())
Note: sorted compares tuples element by element: first by the first item, then, if those are equal, by the second item, and so on. You can give sorted a function that computes the sort key, using the key keyword argument:
results = sorted(results_dict.items(), key=lambda x: x[1]) #sort by the 2nd item
or
results = sorted(results_dict.items(), key=lambda x: abs(x[0])) #sort by absolute value of the 1st item
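The standard library's operator.itemgetter(1) is an equivalent key to lambda x: x[1]. A sketch of the defaultdict answer sorted by total; because sorted is stable, the tied totals (3, 8) and (7, 8) keep their dict insertion order:

```python
from collections import defaultdict
from operator import itemgetter

my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

results_dict = defaultdict(int)
results_dict.update(my_list)  # seed with the first list's pairs
for a, b in other_list:
    results_dict[a] += b      # missing keys start at int() == 0

by_value = sorted(results_dict.items(), key=itemgetter(1))
print(by_value)  # [(2, 4), (1, 7), (3, 8), (7, 8), (5, 11)]
```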
A method using itertools:
>>> myList = [(1, 7), (3, 3), (5, 9)]
>>> otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
>>> import itertools
>>> merged = []
>>> for k, g in itertools.groupby(sorted(myList + otherList), lambda e: e[0]):
...     merged.append((k, sum(e[1] for e in g)))
...
>>> merged
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
This first concatenates the two lists and sorts the result. itertools.groupby then yields the elements of the sorted list grouped by the first element of each tuple, so the loop just sums each group's second elements and appends the total to merged.
>>> [(k, sum(v for x,v in myList + otherList if k == x)) for k in dict(myList + otherList).keys()]
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
>>>
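Tested with both Python 2.7 and 3.2.
dict(myList + otherList).keys() gives the unique keys of the joined lists. On Python 3.7+ they come out in insertion order (1, 3, 5, 2, 7) rather than sorted, so wrap the comprehension in sorted() to get the output shown above.
sum(...) takes each k and loops again through the joined list, adding up the tuple items v where k == x
... but the extra looping adds processing overhead (the joined list is rescanned once per key). Using an explicit dictionary as proposed by Sven Marnach avoids it.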
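Another standard-library option, offered here as an alternative rather than one of the posted answers, is collections.Counter, whose + operator adds values for shared keys. Two assumptions to keep in mind: dict() keeps only the last value if a key repeats within one list, and Counter addition drops totals that are zero or negative, so this only suits positive values like the ones in the question.

```python
from collections import Counter

my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

# Counter + Counter sums values for shared keys; keys in only one list pass through.
totals = Counter(dict(my_list)) + Counter(dict(other_list))
merged = sorted(totals.items())
print(merged)  # [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
```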

sorting tuples in python with a custom key

Hi:
I'm trying to sort a list of tuples in a custom way:
For example:
lt = [(2,4), (4,5), (5,2)]
must be sorted:
lt = [(5,2), (2,4), (4,5)]
Rules:
* b tuple is greater than a tuple if a[1] == b[0]
* a tuple is greater than b tuple if a[0] == b[1]
I've implemented a cmp function like this:
def tcmp(a, b):
    if a[1] == b[0]:
        return -1
    elif a[0] == b[1]:
        return 1
    else:
        return 0
but sorting the list:
lt.sort(tcmp)
lt show me:
lt = [(2, 4), (4, 5), (5, 2)]
What am I doing wrong?
It sounds a lot like you are trying to solve one of the problems from Google's Python Class, which is to sort a list of tuples in increasing order based on their last element.
This is how I did it:
def sort_last(tuples):
    def last_value_tuple(t):
        return t[-1]
    return sorted(tuples, key=last_value_tuple)
EDIT: I didn't read the whole question and assumed the sort was based on the last element of the tuple. Still, I'm going to leave this here because it may be useful to someone.
You could also write your code using a lambda:
def sort(tuples):
    return sorted(tuples, key=lambda last: last[-1])
so sort([(1, 3), (3, 2), (2, 1)]) would yield [(2, 1), (3, 2), (1, 3)]
You can write your own custom key function to specify the key value for sorting.
Ex.
def sort_last(tuples):
    return sorted(tuples, key=last)
def last(a):
    return a[-1]
tuples => tuples sorted by last element
[(1, 3), (3, 2), (2, 1)] => [(2, 1), (3, 2), (1, 3)]
[(1, 7), (1, 3), (3, 4, 5), (2, 2)] => [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
I'm not sure your comparison function is a valid one in a mathematical sense, i.e. transitive. For any a, b, c, a valid comparison must satisfy: if a > b and b > c, then a > c. Sorting procedures rely on this property.
Not to mention that by your rules, for a = [1, 2] and b = [2, 1] you have both a[1] == b[0] and a[0] == b[1], which means that a is both greater and smaller than b.
Your ordering specification is wrong because it is not transitive.
Transitivity means that if a < b and b < c, then a < c. However, in your case:
(1,2) < (2,3)
(2,3) < (3,1)
(3,1) < (1,2)
Try lt.sort(tcmp, reverse=True).
(While this may produce the right answer, there may be other problems with your comparison method)
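lt.sort(tcmp) only works on Python 2; Python 3's list.sort() no longer accepts a comparison function, so the comparator has to be wrapped with functools.cmp_to_key. The sketch below does that and also checks the cycle described above on the question's own data: (2,4), (4,5) and (5,2) each compare as "smaller" than the next one around the loop, so the order the asker got and the order they wanted both satisfy the rules, and no sort can prefer one over the other.

```python
from functools import cmp_to_key


def tcmp(a, b):
    # The rules from the question: a sorts before b when a[1] == b[0],
    # after b when a[0] == b[1], and otherwise the pair counts as "equal".
    if a[1] == b[0]:
        return -1
    elif a[0] == b[1]:
        return 1
    else:
        return 0


lt = [(2, 4), (4, 5), (5, 2)]

# Python 3's list.sort() has no cmp argument, so wrap the comparator.
lt.sort(key=cmp_to_key(tcmp))
print(lt)

# Each step around the cycle (2,4) -> (4,5) -> (5,2) -> (2,4) compares as
# "smaller", so no ordering can satisfy the rules consistently.
checks = [tcmp((2, 4), (4, 5)), tcmp((4, 5), (5, 2)), tcmp((5, 2), (2, 4))]
print(checks)  # [-1, -1, -1]

# The a = [1, 2], b = [2, 1] case: each is "smaller" than the other.
print(tcmp((1, 2), (2, 1)), tcmp((2, 1), (1, 2)))  # -1 -1
```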
