List of unique dictionaries - python

Let's say I have a list of dictionaries:
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
How can I obtain a list of unique dictionaries (removing the duplicates)?
[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]

So make a temporary dict with the id as the key. This filters out the duplicates.
The values() of the dict will be the deduplicated list.
In Python 2.7
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
In Python 3
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
In Python 2.5/2.6
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

The usual way to remove duplicates from a collection is to use Python's set class: add all the elements to the set, then convert the set back to a list, and the duplicates are gone.
The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.
If I had this problem, my solution would be to convert each dict into a string that represents the dict, add all the strings to a set(), then read out the string values as a list() and convert back to dict.
A good representation of a dict in string form is JSON format, and Python has a built-in module for JSON (called json, of course).
The remaining problem is that the items in a dict are not stored in any canonical order, so when Python converts a dict to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True when you call json.dumps().
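A minimal sketch of that approach (the variable names are mine, not from the original answer):
import json

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

# dump each dict with sorted keys so equivalent dicts serialize identically,
# dedupe the strings in a set, then decode each survivor back to a dict
unique = [json.loads(s) for s in {json.dumps(d, sort_keys=True) for d in L}]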
EDIT: This solution assumed that a given dict could differ in any part. If we can assume that every dict with the same "id" value matches every other dict with that "id" value, then this is overkill; @gnibbler's solution would be faster and easier.
EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it's safe to assume the whole dict is a duplicate. So this answer is overkill and I recommend @gnibbler's answer.

In case the dictionaries are only uniquely identified by all their items (no ID is available), you can use the JSON-based answer. The following is an alternative that does not use JSON, and works as long as all dictionary values are hashable:
[dict(s) for s in set(frozenset(d.items()) for d in L)]
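For example, with the list from the question (a quick check of mine; the output order may vary, since it comes from a set):
>>> L = [{'id': 1, 'name': 'john', 'age': 34},
...      {'id': 1, 'name': 'john', 'age': 34},
...      {'id': 2, 'name': 'hanna', 'age': 30}]
>>> [dict(s) for s in set(frozenset(d.items()) for d in L)]
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]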

Here's a reasonably compact solution, though I suspect not particularly efficient (to put it mildly):
>>> ds = [{'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30}
... ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]
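In Python 3, map returns an iterator rather than a list, so wrap the call in list() (element order may vary, since it comes from a set):
>>> list(map(dict, set(tuple(sorted(d.items())) for d in ds)))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]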

You can use the numpy library (this works for Python 2.x only):
import numpy as np
list_of_unique_dicts = list(np.unique(np.array(list_of_dicts)))
To get it working with Python 3.x (and recent versions of numpy), convert the array of dicts to a numpy array of strings, e.g.
list_of_unique_dicts = list(np.unique(np.array(list_of_dicts).astype(str)))
Note that the second form yields the string representations of the dicts rather than dicts, so you would need to convert them back (e.g. with ast.literal_eval).

a = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
b = list({x['id']: x for x in a}.values())
print(b)
outputs:
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]

Since the id is sufficient for detecting duplicates, and the id is hashable: run 'em through a dictionary that has the id as the key. The value for each key is the original dictionary.
deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()
In Python 3, values() doesn't return a list; you'll need to wrap the whole right-hand-side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:
deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())
Note that the result likely will not be in the same order as the original. If that's a requirement, you could use a collections.OrderedDict instead of a dict (in Python 3.7+, plain dicts preserve insertion order, so this is no longer necessary).
As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id as key to begin with.
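A sketch of that aside (load_records is a hypothetical stand-in for wherever the data comes from):
# index records by id as they arrive, so duplicates simply overwrite in place
people = {}
for rec in load_records():  # hypothetical data source
    people[rec['id']] = rec
# people.values() is already deduplicated by id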

We can do it with pandas:
import pandas as pd
yourdict = pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
Notice this is slightly different from the accepted answer:
drop_duplicates checks every column; a row is dropped only if all of its columns match another row's.
For example, if we change the 2nd dict's name from john to peter:
L=[
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'peter', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[295]:
[{'age': 34, 'id': 1, 'name': 'john'},
{'age': 34, 'id': 1, 'name': 'peter'},  # this dict is still kept in the output
{'age': 30, 'id': 2, 'name': 'hanna'}]
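If you instead want pandas to dedupe on id alone, drop_duplicates accepts a subset argument (a sketch; keep='first' is the default, so the first row per id wins):
import pandas as pd
unique_by_id = pd.DataFrame(L).drop_duplicates(subset='id').to_dict('records')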

There are a lot of answers here, so let me add another:
import json
from typing import List
def dedup_dicts(items: List[dict]):
    dedupped = [json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
    return dedupped
items = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)

I have summarized my favorites to try out:
https://repl.it/#SmaMa/Python-List-of-unique-dictionaries
# ----------------------------------------------
# Setup
# ----------------------------------------------
myList = [
{"id":"1", "lala": "value_1"},
{"id": "2", "lala": "value_2"},
{"id": "2", "lala": "value_2"},
{"id": "3", "lala": "value_3"}
]
print("myList:", myList)
# -----------------------------------------------
# Option 1: if objects have a unique identifier
# -----------------------------------------------
myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)
# -----------------------------------------------
# Option 2: if uniquely identified by the whole object
# -----------------------------------------------
myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)
# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------
myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashAbleList:", myHashableObjects)

In Python 3, a simple trick, but based on a unique field (id):
data = [ {'id': 1}, {'id': 1}]
list({ item['id'] : item for item in data}.values())

I don't know if you only want the ids of your dicts in the list to be unique, but if the goal is a set of dicts where uniqueness is over all keys' values, you should use a tuple as the key in your comprehension:
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... {'id':2,'name':'hanna', 'age':50}
... ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>> L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>> len(L)
3
Hope this helps you or another person with the same concern.

Expanding on John La Rooy's answer (Python - List of unique dictionaries), making it a bit more flexible:
def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
    return list({''.join(row[column] for column in columns): row
                 for row in list_of_dicts}.values())
Calling the function:
deduped_list_of_dicts = dedup_dict_list(
    unsorted_list_of_dicts, ['id', 'name'])
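Note that ''.join assumes the selected column values are strings. A variant with a tuple key (my tweak, not from the original answer) handles mixed types and avoids accidental collisions from string concatenation:
def dedup_dict_list_t(list_of_dicts: list, columns: list) -> list:
    # tuples of the selected values are hashable as long as the values themselves are
    return list({tuple(row[c] for c in columns): row
                 for row in list_of_dicts}.values())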

If there is no unique id in the dictionaries, then I'd keep it simple and define a function as follows:
def unique(sequence):
    result = []
    for item in sequence:
        if item not in result:
            result.append(item)
    return result
The advantage of this approach is that you can reuse the function for any comparable objects. It makes your code very readable, works in all modern versions of Python, and preserves the order of the dictionaries. Note, though, that the linear membership test makes it O(n²), so it suits smaller lists best.
>>> L = [
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 2, 'name': 'hanna', 'age': 30},
... ]
>>> unique(L)
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]

In Python 3.6+ (where I've tested), just use:
import json
# Toy example, but this will also work for your case
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
# Start by sorting each dictionary by keys
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]
# Use json methods with set() to get the unique entries;
# json.loads returns lists of [key, value] pairs, so rebuild each dict with dict()
myListOfUniqueDicts = [dict(json.loads(s)) for s in set(map(json.dumps, myListOfDictsSorted))]
print(myListOfUniqueDicts)
Explanation: we map json.dumps over the sorted item lists to encode them as JSON strings, which are immutable; set can then produce an iterable of unique strings. Finally, json.loads decodes each string back into key-value pairs and dict() rebuilds the dictionary. Note that one must sort the items first to put each dictionary in a canonical form, since two equal dicts can hold their keys in different insertion order.

All the answers mentioned here are good, but some of them can fail if the dictionary items contain a nested list or dictionary, so I propose a simple answer:
a = [str(i) for i in a]
a = list(set(a))
a = [eval(i) for i in a]
Be aware that eval executes arbitrary code, and that str() does not sort keys, so equal dicts built in different key orders will not collapse.
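A safer variant of the same idea uses ast.literal_eval, which only evaluates Python literals (my substitution, not from the original answer):
import ast

a = [ast.literal_eval(s) for s in set(str(d) for d in a)]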

Objects can go into sets, provided they define __eq__ and __hash__. You can work with objects instead of dicts and, if needed, convert back to a list of dicts after all the set insertions. Example:
class Person:
    def __init__(self, id, age, name):
        self.id = id
        self.age = age
        self.name = name

    # without these, sets compare Person objects by identity,
    # so logically equal duplicates would not be removed
    def __eq__(self, other):
        return (self.id, self.age, self.name) == (other.id, other.age, other.name)

    def __hash__(self):
        return hash((self.id, self.age, self.name))

my_set = {Person(id=2, age=3, name='Jhon')}
my_set.add(Person(id=3, age=34, name='Guy'))
my_set.add(Person(id=2, age=3, name='Jhon'))  # duplicate, ignored

# if needed, convert to a list of dicts
list_of_dict = [{'id': obj.id,
                 'name': obj.name,
                 'age': obj.age} for obj in my_set]
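As a design note, a frozen dataclass gets the equality and hashing for free; a sketch of that alternative (my addition):
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True (with the default eq=True) makes instances hashable
class Person:
    id: int
    age: int
    name: str

people = {Person(2, 3, 'Jhon'), Person(3, 34, 'Guy'), Person(2, 3, 'Jhon')}
# len(people) == 2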

A quick-and-dirty solution is just generating a new list:
uniques = []
for item in listwhichneedsdeduping:
    if item not in uniques:
        uniques.append(item)

Let me add mine:
sort each target dict so that {'a': 1, 'b': 2} and {'b': 2, 'a': 1} are not treated differently
serialize it as JSON
deduplicate via a set (since a set cannot hold dicts)
turn each string back into a dict via json.loads
import json
[json.loads(i) for i in set(json.dumps(i) for i in [dict(sorted(i.items())) for i in target_dicts])]

There may be more elegant solutions, but I thought it might be nice to add a more verbose one that is easier to follow. It assumes there is no unique key, that you have a simple key/value structure, and that you are using a version of Python that preserves dict insertion order (3.7+). This works for the original post.
data_set = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
]
# list of keys
keys = list(data_set[0])
# create a list of lists of the values from the data set
data_set_list = [list(d.values()) for d in data_set]
# dedupe
new_data_set = []
for lst in data_set_list:
    # skip the value list if it is already in the new data set
    if lst in new_data_set:
        print(lst)  # show the duplicate
        continue
    # add the value list to the new data set
    new_data_set.append(lst)
# rebuild the dicts
new_data_set = [dict(zip(keys, lst)) for lst in new_data_set]
print(new_data_set)

Pretty straightforward option:
L = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]
D = dict()
for l in L: D[l['id']] = l
output = list(D.values())
print(output)

Here's an implementation with little memory overhead, at the cost of not being as compact as the rest.
values = [ {'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
{'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
    if values[index]['id'] in count:
        del values[index]
    else:
        count[values[index]['id']] = 1
        index += 1
output:
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]

This is the solution I found:
usedID = []
x = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
for each in x[:]:  # iterate over a copy: removing from the list being iterated can skip elements
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])
print(x)
Basically, you check whether the ID is already in the usedID list; if it is, delete the dictionary, and if not, append the ID to the list.

Related

python - get the list of keys in a list of dictionaries

I have a list of dictionaries
input:
x = [{'id': 19, 'number': 123, 'count': 1},
{'id': 1, 'number': 23, 'count': 7},
{'id': 2, 'number': 238, 'count': 17},
{'id': 1, 'number': 9, 'count': 1}]
How would I get the list of numbers:
[123, 23, 238, 9]
Thank you for reading.
To get these numbers you can use
>>> [ d['number'] for d in x ]
But this is not the "list of keys" for which you ask in the question title.
The list of keys of each dictionary d in x is obtained as d.keys()
which would yield something like ['id', 'number', ...]. Do for example
>>> [ list(d.keys()) for d in x ]
to see. If they are all equal you are probably only interested in the first of these lists. You can get it as
>>> list( x[0].keys() )
Note also that the "elements" of a dictionary are actually the keys rather than the values. So you will also get the list ['id', 'number',...] if you write
>>> [ key for key in x[0] ]
or simply (and better):
>>> list( x[0] )
Getting the first element is more tricky when x is not a list but a set or dict. In that case you can use next(iter(x)).
P.S.: You should think about what you really want the keys to be -- a priori that should be the 'id's, not the 'number's, but your 'id's have duplicates, which contradicts the very definition and meaning of 'id' -- and then use the chosen keys as identifiers to index the elements of your collection x. So if the keys are the 'number's, you should have a dictionary (rather than a list)
x = {123: {'id': 19, 'count': 1}, 23: {'id': 1, 'count': 7}, ...}
(where I additionally assumed that the numbers are indeed integers [which is more efficient] rather than strings, but that's up to you).
Then you can also do, e.g., x[123]['count'] += 1 to increment the 'count' of entry 123.
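A sketch of that re-indexing, assuming the numbers are unique (the names are mine):
# key each record by its 'number', dropping the now-redundant field
x_by_number = {d['number']: {k: v for k, v in d.items() if k != 'number'} for d in x}
x_by_number[123]['count'] += 1  # e.g. increment the 'count' of entry 123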
You can use a list comprehension:
numbers = [dictionary.get('number') for dictionary in list_of_dictionaries]
Using a functional programming approach:
from operator import itemgetter
x_k = list(map(itemgetter('number'), x))
#[123, 23, 238, 9]

Compare two dictionary lists, based on specific key

I'll try to be as concise as I can.
Two dictionary lists as follows:
dictlist1 = [{'name': 'john', 'age': 30}, {'name': 'jessica', 'age': 56}, {'name': 'kirk', 'age': 20}, {'name': 'mario', 'age': 25}]
dictlist2 = [{'name': 'john', 'job': 'engineer'}, {'name': 'jessica', 'job':'nurse'}, {'name': 'mario', 'job': 'electrician'}]
My objective is to match based on the key "name" in both lists and, at the end, create a third dictionary list with the entry that has no match, in this case {'name': 'kirk', 'age': 20}, like this:
listfinal = [{'name': 'kirk', 'age': 20}]
I've successfully compared the matching keys, creating a new dictionary for each match and adding the "job" key to it, like this:
for dict2 in dictlist2:
    for dict1 in dictlist1:
        if dict1['name'] == dict2['name']:
            matchname1 = dict2['name']
            dictoutput = {'name': matchname1, 'age': dict1['age'], 'job': dict2['job']}
            templist.append(dictoutput)
for dictionary in templist:
    print(dictionary)
Output:
{'name': 'john', 'age': '30', 'job': 'engineer'}
{'name': 'jessica', 'age': '56', 'job': 'nurse'}
{'name': 'mario', 'age': '25', 'job': 'electrician'}
But I've had absolutely no luck getting the kirk user alone, not even using "else" in the inner if statement or creating a new if statement with not-equal (!=). I always get all the users when printing.
Any orientation will be highly appreciated.
Enumerate the lists, collect the indices of matched pairs inside the loop, and delete the corresponding elements outside the loop:
templist = []
matched_d1 = []
matched_d2 = []
for j2, dict2 in enumerate(dictlist2):
    for j1, dict1 in enumerate(dictlist1):
        if dict1['name'] == dict2['name']:
            matchname1 = dict2['name']
            dictoutput = {'name': matchname1, 'age': dict1['age'], 'job': dict2['job']}
            templist.append(dictoutput)
            matched_d1.append(j1)
            matched_d2.append(j2)
for j in sorted(matched_d1, reverse=True):
    dictlist1.pop(j)
for j in sorted(matched_d2, reverse=True):
    dictlist2.pop(j)
ans = dictlist1 + dictlist2
You can use sets to find the names that are only in dictlist1, in dictlist2 and also the common names. Then create the listfinal by keeping only the item with the name not in the common names:
dictlist1 = [{"name": "john", "age": 30}, {"name": "jessica", "age": 56}, {"name":"kirk" , "age": 20}, {"name": "mario", "age": 25}]
dictlist2 = [{"name": "john", "job": "engineer"}, {"name": "jessica", "job": "nurse"}, {"name": "mario", "job": "electrician"}]
names_dictlist1 = {item["name"] for item in dictlist1}
names_dictlist2 = {item["name"] for item in dictlist2}
common_names = names_dictlist1 & names_dictlist2
listfinal = [item for item in dictlist1 + dictlist2 if item["name"] not in common_names]
If you want a one-line solution, here it is:
print([person for person in dictlist1 if person['name'] not in map(lambda x: x['name'], dictlist2)])
This prints the elements from the first list whose "name" key does not occur in the second list.
You can use set operations to get the unique names: first make a set of the names in each list, then subtract them, and use the result to pull the matching item from the list:
>>> name1 = set(d["name"] for d in dictlist1)
>>> name2 = set(d["name"] for d in dictlist2)
>>> name1 - name2
{'kirk'}
>>> name2 - name1
set()
>>> unique = name1 - name2
>>> listfinal = [d for n in unique for d in dictlist1 if d["name"]==n]
>>> listfinal
[{'name': 'kirk', 'age': 20}]
>>>
Additionally, following @freude's answer, you can make a dictionary of name:index for each list and subtract their keys (dict keys behave like sets); this avoids the double loop above when fetching the right item from the list:
>>> name1 = {d["name"]:i for i,d in enumerate(dictlist1)}
>>> name2 = {d["name"]:i for i,d in enumerate(dictlist2)}
>>> unique = name1.keys() - name2.keys()
>>> unique
{'kirk'}
>>> [ dictlist1[name1[n]] for n in unique]
[{'name': 'kirk', 'age': 20}]
>>>

How to find latest entry for specific value in a dict?

I have a list of dictionaries which basically follows a structure like this:
elements = [{'id':1, 'date':1}, {'id':1, 'date':5}, {'id':2, 'date': 6}]
I want to write a function that only keeps the latest dict for each duplicate id, based on the date value:
[{'id':1, 'date':5}, {'id':2, 'date': 6}]
Is there an efficient way of doing this? So far I always end up with nested for loops and conditionals and I am sure there is a pythonic solution to this...
You don't need nested loops. This seems reasonably straightforward:
elements = [{'id':1, 'date':1}, {'id':1, 'date':5}, {'id':2, 'date': 6}]
latest = {}
for ele in elements:
    if ele['id'] not in latest or ele['date'] > latest[ele['id']]['date']:
        latest[ele['id']] = ele
print(list(latest.values()))
This will output:
[{'id': 1, 'date': 5}, {'id': 2, 'date': 6}]
First sort the list, then create a dict with the id as the key and the list elements as values; later duplicates overwrite earlier ones, so the entry with the latest date wins.
Now you can extract the values from this dict:
>>> elements = [{'id':1, 'date':1}, {'id':1, 'date':5}, {'id':2, 'date': 6}]
>>> dct = {d['id']:d for d in sorted(elements, key=lambda d: list(d.values()))}
>>> list(dct.values())
[{'id': 1, 'date': 5}, {'id': 2, 'date': 6}]
You can use itertools.groupby to group the dicts by id; operator.itemgetter is a better helper here than a lambda (make sure the list is sorted by id).
As for getting the "latest" (I assume the last entry for each id), you can use a collections.deque with maxlen=1 to keep only the last element of each group:
from itertools import groupby
from collections import deque
from operator import itemgetter
elements = [{'id':1, 'date':1}, {'id':1, 'date':5}, {'id':2, 'date': 6}]
get_id = itemgetter('id')
s_elements = sorted(elements, key=itemgetter('id', 'date'))
output = [deque(g, maxlen=1).pop() for _, g in groupby(s_elements, get_id)]
Output:
[{'id': 1, 'date': 5}, {'id': 2, 'date': 6}]
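An equivalent without the deque, taking the max of each group by date (my variant, same imports as above):
output = [max(g, key=itemgetter('date')) for _, g in groupby(s_elements, get_id)]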
Does this work? It turns the dicts into tuples and sorts them; when converting back to a dictionary, any duplicate id is replaced by the entry with the latest date.
def remove_duplicates(elements):
    elements_as_tuples = sorted((d['id'], d['date']) for d in elements)
    return dict(elements_as_tuples)
print(remove_duplicates(elements))
{1: 5, 2: 6}
The output is not in the original dictionary format but will it do?
If not you could add these steps:
elements_dict = remove_duplicates(elements)
list_of_dicts = [{'id': k, 'date': v} for k, v in elements_dict.items()]
print(list_of_dicts)
[{'id': 1, 'date': 5}, {'id': 2, 'date': 6}]

Use counter on a list of Python dictionaries

I'm trying to use Counter on a list of dictionaries in order to count how many times each dictionary repeats itself.
Not all the dictionaries in the list necessarily have the same keys.
Let's assume I have the following list:
my_list=({"id":1,"first_name":"Jhon","last_name":"Smith"},{"id":2,"first_name":"Jeff","last_name":"Levi"},{"id":3,"first_name":"Jhon"},{"id":1,"first_name":"Jhon","last_name":"Smith"})
My desired solution is
solution={
{"id":1,"first_name":"Jhon","last_name":"Smith"}:2
{"id":2,"first_name":"Jeff","last_name":"Levi"}:1
{"id":3,"first_name":"Jhon"}}
I have tried
import collections
c=collections.Counter(my_list)
but I get the following error
TypeError: unhashable type: 'dict'
Do you have any suggestions?
Thanks
You can't use a dictionary as a key in another dictionary; that's why you get TypeError: unhashable type: 'dict'.
You can, however, serialize each dictionary to a JSON string, which can be used as a dictionary key.
import json
import collections
my_list = [{"id":1,"first_name":"Jhon","last_name":"Smith"},
{"id":2,"first_name":"Jeff","last_name":"Levi"},
{"id":3,"first_name":"Jhon"},
{"id":1,"first_name":"Jhon","last_name":"Smith"}]
c = collections.Counter(json.dumps(l, sort_keys=True) for l in my_list)
print(c)
Counter({'{"first_name": "Jhon", "id": 1, "last_name": "Smith"}': 2,
         '{"first_name": "Jeff", "id": 2, "last_name": "Levi"}': 1,
         '{"first_name": "Jhon", "id": 3}': 1})
(sort_keys=True ensures that equal dicts built in different key orders serialize to the same string.)
Counter is a tool that counts the items in an iterable, stored like a dict whose keys are the items and whose values are the counts.
In a dictionary, however, you cannot have repeated keys, since keys must be unique. There is therefore no point in counting them; we already know each count is 1. On the other hand, the values stored in the dict may repeat. For instance:
>>> from collections import Counter
>>> my_dict = {'a': 'me', 'b':'you', 'c':'me', 'd':'me'}
>>> Counter(my_dict) # As plain dict.
Counter({'b': 'you', 'a': 'me', 'c': 'me', 'd': 'me'})
>>> Counter(my_dict.values()) # As dict values.
Counter({'me': 3, 'you': 1})
Now let's say we have a list of dictionaries, and we want to count the values in those dictionaries, as is the case in your question:
>>> my_dict = [
... {'age': 30, 'name': 'John'},
... {'age': 20, 'name': 'Jeff'},
... {'age': 30, 'name': 'John'},
... {'age': 25, 'name': 'John'}
... ]
>>> c = Counter(tuple(i.values()) for i in my_dict)  # As a generator of values as tuples.
>>> c
Counter({(30, 'John'): 2, (25, 'John'): 1, (20, 'Jeff'): 1})
Now you can of course take these tuples and convert them to a dict:
>>> {key: value for key, value in c.items()}
{(25, 'John'): 1, (30, 'John'): 2, (20, 'Jeff'): 1}
Or go even further and use collections.namedtuple so you can identify your tuples by name, which you can later refer to much more easily and clearly.
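A sketch of the namedtuple variant (the field names are assumed from the example dicts above):
from collections import namedtuple, Counter

Person = namedtuple('Person', ['age', 'name'])
counts = Counter(Person(**d) for d in my_dict)
# Counter({Person(age=30, name='John'): 2, Person(age=20, name='Jeff'): 1, Person(age=25, name='John'): 1})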
Hope this helps.
Learn more about collections.Counter in the documentation or this useful set of examples. You can also refer to Raymond Hettinger (the maintainer of Python's collections toolbox) on YouTube; he has some great tutorials on the different tools.
Unfortunately dicts are not hashable, so I wrote this code. The result is not shaped like your desired solution (that shape is not possible), but maybe you can use it:
ids_l = [i['id'] for i in my_list]
ids_s = list(set(ids_l))
# k is basically [id, how many]
k = [[i, ids_l.count(i)] for i in ids_s]

# find the dict in my_list from an id
def finder(x):
    for i in my_list:
        if i['id'] == x:
            return i

res = []
for i in range(len(ids_s)):
    # k[i][1] is the count
    # finder(k[i][0]) returns the dict
    res.append([k[i][1], finder(k[i][0])])
print(res)
this code returns
[
[2, {'id': 1, 'first_name': 'Jhon', 'last_name': 'Smith'}],
[1, {'id': 2, 'first_name': 'Jeff', 'last_name': 'Levi'}],
[1, {'id': 3, 'first_name': 'Jhon'}]
]
PS: sorry for my poor English

Is there better way to merge dictionaries contained in two lists in Python? [closed]

I have two lists containing dictionaries:
list_a:
[{'id': 1, 'name': 'test'}, {'id': 2, 'name': 'test1'},....]
list_b:
[{'id': 1, 'age': 10}, {'id': 2, 'age': 20}, ....]
I want to merge these two lists with the result being:
[{'id': 1, 'name': 'test', 'age': 10}, {'id': 2, 'name': 'test1', 'age': 20}....]
I want to use a nested loop to do it:
result = []
for i in list_a:
    for j in list_b:
        if i['id'] == j['id']:
            i['age'] = j['age']
            result.append(i)
But list_a has 2000 elements, the ids in list_b all belong to list_a, and list_b may have fewer than 2000 entries. The time complexity of this method is too high; is there a better way to merge them?
Not really, but dict.setdefault and dict.update are probably your friends for this:
data = {}
lists = [
[{'id': 1, 'name': 'test'}, {'id': 2, 'name': 'test1'},],
[{'id': 1, 'age': 10}, {'id': 2, 'age': 20},]
]
for each_list in lists:
    for each_dict in each_list:
        data.setdefault(each_dict['id'], {}).update(each_dict)
Result:
>>> data
{1: {'age': 10, 'id': 1, 'name': 'test'},
2: {'age': 20, 'id': 2, 'name': 'test1'}}
This way you can look up by id (or just take data.values() if you want a plain list). It's been 20 years since I took my algorithms class, but I guess this is close to O(n) while your sample is more like O(n²). This solution has some interesting properties: it does not mutate the original lists, it works for any number of lists, and it works for uneven lists containing distinct sets of ids.
answer = {}
for d in list_a:
    answer[d['id']] = d
for d in list_b:
    if d['id'] not in answer:
        answer[d['id']] = d
        continue
    for k, v in d.items():
        answer[d['id']][k] = v
answer = [d for k, d in sorted(answer.items(), key=lambda s: s[0])]
No, I think this is the best way, because you are joining all the data in the simplest data structure. You can see how to implement it here.
I hope my answer is helpful for you.
It can be done in one line, given that the items in list1 and list2 are in the same order id-wise:
result = [item1 for item1, item2 in zip(list1, list2) if not item1.update(item2)]
(This works because dict.update returns None, so the condition is always true and the update happens as a side effect.)
For a more lengthy version:
for item1, item2 in zip(list1, list2):
    item1.update(item2)
# list1 will be mutated into the result
To find a better way one needs to know how the values are generated and how they will be used.
For example, if you have them as CSV files you can use a table-like module such as pandas (I'll create the frames from your lists, but pandas can read CSV files as well):
import pandas as pd
df1 = pd.DataFrame.from_dict([{'id': 1, 'name': 'test'}, {'id': 2, 'name': 'test1'}])
df2 = pd.DataFrame.from_dict([{'id': 1, 'age': 10}, {'id': 2, 'age': 20}])
pd.merge(df1, df2, on='id')
Or if they come from a database, most databases already support JOIN ... ON (for example MySQL).
