Remove duplicate dicts from list based on uniqueness of keys

Remove duplicate dicts from list based on uniqueness of keys - python

My question is somewhat similar to this question: https://codereview.stackexchange.com/questions/175079/removing-key-value-pairs-in-list-of-dicts. Essentially, I have a list of dictionaries, and I want to remove duplicates from the list based on the unique combination of two (or more) keys within each dictionary.
Suppose I have the following list of dictionaries:
some_list_of_dicts = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 1, 'd': 5, 'e': 1},
{'a': 1, 'b': 1, 'c': 1, 'd': 7, 'e': 8},
{'a': 1, 'b': 1, 'c': 1, 'd': 9, 'e': 6},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
And let's suppose the combination of a, b, and c have to be unique; any other values can be whatever they want, but the combination of these three must be unique to this list. I would want to take whichever unique combo of a, b, and c came first, keep that, and discard everything else where that combination is the same.
The new list, after running it through some remove_duplicates function would look like this:
new_list = [
{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}
]
I've only managed to come up with this:
def remove_duplicates(old_list):
uniqueness_check_list = []
new_list = []
for item in old_list:
# The unique combination is 'a', 'b', and 'c'
uniqueness_check = "{}{}{}".format(
item["a"], item["b"], item["c"]
)
if uniqueness_check not in uniqueness_check_list:
new_list.append(item)
uniqueness_check_list.append(uniqueness_check)
return new_list
But this doesn't feel very Pythonic. It also has the problem that I've hardcoded in the function which keys have to be unique; it would be better if I could specify that as an argument to the function itself, but again, not sure what's the most elegant way to do this.

You can use a dict comprehension to construct a dict from the list of dicts in the reversed order so that the values of the first of any unique combinations would take precedence. Use operator.itemgetter to get the unique keys as a tuple. Reverse again in the end for the original order:
from operator import itemgetter
list({itemgetter('a', 'b', 'c')(d): d for d in reversed(some_list_of_dicts)}.values())[::-1]
This returns:
[{'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 4},
{'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 3, 'd': 2, 'e': 3},
{'a': 1, 'b': 1, 'c': 4, 'd': 2, 'e': 3}]

With the help of a function to keep track of duplicates, you can use some list comprehension:
def remove_duplicates(old_list, cols=('a', 'b', 'c')):
duplicates = set()
def is_duplicate(item):
duplicate = item in duplicates
duplicates.add(item)
return duplicate
return [x for x in old_list if not is_duplicate(tuple([x[col] for col in cols]))]
To use:
>>> remove_duplicates(some_list_of_dicts)
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 2, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 3, 'b': 1, 'e': 3, 'd': 2},
{'a': 1, 'c': 4, 'b': 1, 'e': 3, 'd': 2}
]
You can also provide different columns to key on:
>>> remove_duplicates(some_list_of_dicts, cols=('a', 'd'))
[
{'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2},
{'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 5},
{'a': 1, 'c': 1, 'b': 1, 'e': 8, 'd': 7},
{'a': 1, 'c': 1, 'b': 1, 'e': 6, 'd': 9}
]

Related

Python recursive generator breaks when using list() and append() keywords

I have only recently learned about coroutines using generators and tried to implement the concept in the following recursive function:
def _recursive_nWay_generator(input: list, output={}):
'''
Helper function; used to generate parameter-value pairs
to submit to the model for the simulation.
Parameters
----------
input : list of tuple
every tuple of the list must be of the form:
``('name_of_parameter', iterable_of_values)``
output : list, optional
parameter used for recursion; allows for list building
across subgenerators
Returns
-------
Generator :
Specifications used for simulation setup of the form:
``{'par1': val1, ...}``
'''
# exit condition
if len(input) == 0:
yield output
# recursive loop
else:
curr = input[0]
par_name = curr[0]
for par_value in curr[1]:
output[par_name] = par_value
# coroutines for the win!
yield from _recursive_nWay_generator(input[1:], output=output)
Function somewhat works as intended:
testlist = [('a', (1, 2, 3)), ('b', (4, 5, 6)), ('c', (7, 8))]
for a in _recursive_nWay_generator(testlist):
print(a)
Output:
{'a': 1, 'b': 4, 'c': 7}
{'a': 1, 'b': 4, 'c': 8}
{'a': 1, 'b': 5, 'c': 7}
{'a': 1, 'b': 5, 'c': 8}
{'a': 1, 'b': 6, 'c': 7}
{'a': 1, 'b': 6, 'c': 8}
{'a': 2, 'b': 4, 'c': 7}
{'a': 2, 'b': 4, 'c': 8}
{'a': 2, 'b': 5, 'c': 7}
{'a': 2, 'b': 5, 'c': 8}
{'a': 2, 'b': 6, 'c': 7}
{'a': 2, 'b': 6, 'c': 8}
{'a': 3, 'b': 4, 'c': 7}
{'a': 3, 'b': 4, 'c': 8}
{'a': 3, 'b': 5, 'c': 7}
{'a': 3, 'b': 5, 'c': 8}
{'a': 3, 'b': 6, 'c': 7}
{'a': 3, 'b': 6, 'c': 8}
However, it breaks when I try to append to an existing list or construct a new one:
gen = _recursive_nWay_generator(testlist)
print(list(gen))
Output:
[{'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}, {'a': 3, 'b': 6, 'c': 8}]
This question was attempting to do something close to what I have, but I'm not seeing answers that could help.
I am honestly clueless as to how to solve this, the online searches I tried gave nothing no matter how I phrase the question. If this was answered before I'll be happy to just follow the link.

The problem with your code is reusing the same mutable output dict during the iteration and recursive calls. That is, you yield output and then later on you modify it with output[par_name] = par_value but it's the same dict in each case - so you're modifying the instance which was already returned! If you append each result into a list and then print them all at the end, you'll see that they're identical - it's the same result yielded each time.
The simplest way to "fix" your existing code is to yield copies, i.e. change the line:
yield output
into this:
yield dict(output.items())
However, this algorithm is not great, and I recommend you look for something better. Using recursion is poor choice here. I'll offer you a simple/direct way to generate the sequence more efficiently:
import itertools as it
testlist = [('a', (1, 2, 3)), ('b', (4, 5, 6)), ('c', (7, 8))]
keys, vals = zip(*testlist)
for p in it.product(*vals):
print(dict(zip(keys, p)))

Iterating over combinations of integer values from a dict

I have a dict I'm trying to loop through which stores integer values:
myDict = {"a":1,"b":2,"c":3}
And I'm trying to loop through every combination of these integers from 1 to their value. For example:
{"a":1,"b":1,"c":1}
{"a":1,"b":1,"c":2}
{"a":1,"b":1,"c":3}
{"a":1,"b":2,"c":1}
{"a":1,"b":2,"c":2}
And so on...
Ending at their max values at the start while going through all the combinations.
Is there an elegant way to this that applies to any size dictionary? :
myDict = {"a":1,"b":2,"c":3,"d":5}
I'm currently doing this with nested if statements that edit a clone dictionary, so any solution setting dict values would work :) :
copyDict["c"] = 2

Yes, this will do it, for any size input (that fits in memory, of course):
import itertools
myDict = {"a":1,"b":2,"c":3,"d":5}
keys = list(myDict.keys())
ranges = tuple(range(1,k+1) for k in myDict.values())
outs = [dict(zip(keys,s)) for s in itertools.product(*ranges)]
print(outs)
You can actually replace *ranges with the *(range(1,k+1) for k in myDict.values()) and reduce it by one line, but it's not easier to read.

Another way with itertools.product:
>>> [dict(zip(myDict.keys(), p)) for p in itertools.product(myDict.values(), repeat=len(myDict))]
[{'a': 1, 'b': 1, 'c': 1},
{'a': 1, 'b': 1, 'c': 2},
{'a': 1, 'b': 1, 'c': 3},
{'a': 1, 'b': 2, 'c': 1},
{'a': 1, 'b': 2, 'c': 2},
{'a': 1, 'b': 2, 'c': 3},
{'a': 1, 'b': 3, 'c': 1},
{'a': 1, 'b': 3, 'c': 2},
{'a': 1, 'b': 3, 'c': 3},
{'a': 2, 'b': 1, 'c': 1},
{'a': 2, 'b': 1, 'c': 2},
{'a': 2, 'b': 1, 'c': 3},
{'a': 2, 'b': 2, 'c': 1},
{'a': 2, 'b': 2, 'c': 2},
{'a': 2, 'b': 2, 'c': 3},
{'a': 2, 'b': 3, 'c': 1},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3},
{'a': 3, 'b': 1, 'c': 1},
{'a': 3, 'b': 1, 'c': 2},
{'a': 3, 'b': 1, 'c': 3},
{'a': 3, 'b': 2, 'c': 1},
{'a': 3, 'b': 2, 'c': 2},
{'a': 3, 'b': 2, 'c': 3},
{'a': 3, 'b': 3, 'c': 1},
{'a': 3, 'b': 3, 'c': 2},
{'a': 3, 'b': 3, 'c': 3}]

Flatten product of two dictionary lists [duplicate]

This question already has answers here:
How do I merge two dictionaries in a single expression in Python?
(43 answers)
Closed 5 years ago.
I have two lists of dictionaries which I am trying to get the product of:
from itertools import product
list1 = [{'A': 1, 'B': 1}, {'A': 2, 'B': 2}, {'A': 2, 'B': 1}, {'A': 1, 'B': 2}]
list2 = [{'C': 1, 'D': 1}, {'C': 1, 'D': 2}]
for p in product(list1, list2):
print p
and this gives me the output:
({'A': 1, 'B': 1}, {'C': 1, 'D': 1})
({'A': 1, 'B': 1}, {'C': 1, 'D': 2})
({'A': 2, 'B': 2}, {'C': 1, 'D': 1})
({'A': 2, 'B': 2}, {'C': 1, 'D': 2})
({'A': 2, 'B': 1}, {'C': 1, 'D': 1})
({'A': 2, 'B': 1}, {'C': 1, 'D': 2})
({'A': 1, 'B': 2}, {'C': 1, 'D': 1})
({'A': 1, 'B': 2}, {'C': 1, 'D': 2})
How would I flatten these so the output is a single dict rather than a tuple of dicts?:
{'A': 1, 'B': 1, 'C': 1, 'D': 1}
{'A': 1, 'B': 1, 'C': 1, 'D': 2}
{'A': 2, 'B': 2, 'C': 1, 'D': 1}
{'A': 2, 'B': 2, 'C': 1, 'D': 2}
{'A': 2, 'B': 1, 'C': 1, 'D': 1}
{'A': 2, 'B': 1, 'C': 1, 'D': 2}
{'A': 1, 'B': 2, 'C': 1, 'D': 1}
{'A': 1, 'B': 2, 'C': 1, 'D': 2}

Looks like you want to merge the dictionaries
for p1, p2 in product(list1, list2):
merged = {**p1, **p2}
print(merged)
In earlier versions of Python, you can't merge with this expression. Use p1.update(p2) instead.

Explode a dict - Get all combinations of the values in a dictionary

I want to get all combinations of the values in a dictionary as multiple dictionaries (each containing every key of the original but only one value of the original values). Say I want to parametrize a function call with:
kwargs = {'a': [1, 2, 3], 'b': [1, 2, 3]}
How do I get a list of all the combinations like this:
combinations = [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}, {'a': 1, 'b': 3},
{'a': 2, 'b': 1}, {'a': 2, 'b': 2}, {'a': 2, 'b': 3},
{'a': 3, 'b': 1}, {'a': 3, 'b': 2}, {'a': 3, 'b': 3}]
There can be an arbitary amount of keys in the original kwargs and each value is garantueed to be an iterable but the number of values is not fixed.
If possible: the final combinations should be a generator (not a list).

You can flatten the kwargs to something like this
>>> kwargs = {'a': [1, 2, 3], 'b': [1, 2, 3]}
>>> flat = [[(k, v) for v in vs] for k, vs in kwargs.items()]
>>> flat
[[('b', 1), ('b', 2), ('b', 3)], [('a', 1), ('a', 2), ('a', 3)]]
Then, you can use itertools.product like this
>>> from itertools import product
>>> [dict(items) for items in product(*flat)]
[{'a': 1, 'b': 1},
{'a': 2, 'b': 1},
{'a': 3, 'b': 1},
{'a': 1, 'b': 2},
{'a': 2, 'b': 2},
{'a': 3, 'b': 2},
{'a': 1, 'b': 3},
{'a': 2, 'b': 3},
{'a': 3, 'b': 3}]
itertools.product actually returns an iterator. So you can get the values on demand and build your dictionaries. Or you can use map, which also returns an iterator.
>>> for item in map(dict, product(*flat)):
... print(item)
...
...
{'b': 1, 'a': 1}
{'b': 1, 'a': 2}
{'b': 1, 'a': 3}
{'b': 2, 'a': 1}
{'b': 2, 'a': 2}
{'b': 2, 'a': 3}
{'b': 3, 'a': 1}
{'b': 3, 'a': 2}
{'b': 3, 'a': 3}

Just another way, building the value tuples first and then combining with keys afterwards (pretty much the opposite of #thefourtheye's way :-).
>>> combinations = (dict(zip(kwargs, vs)) for vs in product(*kwargs.values()))
>>> for c in combinations:
print(c)
{'a': 1, 'b': 1}
{'a': 1, 'b': 2}
{'a': 1, 'b': 3}
{'a': 2, 'b': 1}
{'a': 2, 'b': 2}
{'a': 2, 'b': 3}
{'a': 3, 'b': 1}
{'a': 3, 'b': 2}
{'a': 3, 'b': 3}

How to join 2 lists of dicts in python?

I have 2 lists like this:
l1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 5, 'b': 6, 'c': 7, 'd': 8}]
l2 = [{'a': 5, 'b': 6, 'e': 100}, {'a': 1, 'b': 2, 'e': 101}]
and I want to obtain a list l3, which is a join of l1 and l2 where values of 'a' and 'b' are equal in both l1 and l2
i.e.
l3 = [{'a': 1, 'b: 2, 'c': 3, 'd': 4, 'e': 101}, {'a': 5, 'b: 6, 'c': 7, 'd': 8, 'e': 100}]
How can I do this?

You should accumulate the results in a dictionary. You should use the values of 'a' and 'b' to form a key of this dictionary
Here, I have used a defaultdict to accumulate the entries
l1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 5, 'b': 6, 'c': 7, 'd': 8}]
l2 = [{'a': 5, 'b': 6, 'e': 100}, {'a': 1, 'b': 2, 'e': 101}]
from collections import defaultdict
D = defaultdict(dict)
for lst in l1, l2:
for item in lst:
key = item['a'], item['b']
D[key].update(item)
l3 = D.values()
print l3
output:
[{'a': 1, 'c': 3, 'b': 2, 'e': 101, 'd': 4}, {'a': 5, 'c': 7, 'b': 6, 'e': 100, 'd': 8}]

Simple list operations would do the thing for you as well:
l1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 5, 'b': 6, 'c': 7, 'd': 8}]
l2 = [{'a': 5, 'b': 6, 'e': 100}, {'a': 1, 'b': 2, 'e': 101}]
l3 = []
for i in range(len(l1)):
for j in range(len(l2)):
if l1[i]['a'] == l2[j]['a'] and l1[i]['b'] == l2[j]['b']:
l3.append(dict(l1[i]))
l3[i].update(l2[j])

My approach is to sort the the combined list by the key, which is keys a + b. After that, for each group of dictionaries with similar key, combine them:
from itertools import groupby
def ab_key(dic):
return dic['a'], dic['b']
def combine_lists_of_dicts(list_of_dic1, list_of_dic2, keyfunc):
for key, dic_of_same_key in groupby(sorted(list_of_dic1 + list_of_dic2, key=keyfunc), keyfunc):
combined_dic = {}
for dic in dic_of_same_key:
combined_dic.update(dic)
yield combined_dic
l1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 5, 'b': 6, 'c': 7, 'd': 8}]
l2 = [{'a': 5, 'b': 6, 'e': 100}, {'a': 1, 'b': 2, 'e': 101}]
for dic in combine_lists_of_dicts(l1, l2, ab_key):
print dic
Discussion
The function ab_key returns a tuple of value for key a and b, used for sorting a groupping
The groupby function groups all the dictionaries with similar keys together
This solution is less efficient than that of John La Rooy, but should work fine for small lists

One can achieve a nice and quick solution using pandas.
l1 = [{'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'a': 5, 'b': 6, 'c': 7, 'd': 8}]
l2 = [{'a': 5, 'b': 6, 'e': 100}, {'a': 1, 'b': 2, 'e': 101}]
import pandas as pd
pd.DataFrame(l1).merge(pd.DataFrame(l2), on=['a','b']).to_dict('records')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove duplicate dicts from list based on uniqueness of keys - python

Related

Python recursive generator breaks when using list() and append() keywords

Iterating over combinations of integer values from a dict

Flatten product of two dictionary lists [duplicate]

Explode a dict - Get all combinations of the values in a dictionary

How to join 2 lists of dicts in python?

Categories

Resources