list duplicate values in a nested dictionary - python

I need to check for duplicate values that might occur in a dictionary. I have a dictionary in the following layout. Any advice is welcome, thanks so much!
the original dictionary
dic = {'ab1': [{'ans': 'Male', 'val': '1'},
               {'ans': 'Female', 'val': '2'},
               {'ans': 'Other', 'val': '3'},
               {'ans': 'Prefer not to answer', 'val': '3'}],
       'bc1': [{'ans': 'Employed', 'val': '1'},
               {'ans': 'Unemployed', 'val': '2'},
               {'ans': 'Student', 'val': '3'},
               {'ans': 'Retired', 'val': '4'},
               {'ans': 'Part-time', 'val': '5'},
               {'ans': 'Prefer not to answer', 'val': '7'}],
       'bc2': [{'ans': 'Mother', 'val': '1'},
               {'ans': 'Father ', 'val': '2'},
               {'ans': 'Brother', 'val': '3'},
               {'ans': 'Sister', 'val': '4'},
               {'ans': 'Grandmother', 'val': '4'},
               {'ans': 'Grandfather', 'val': '6'},
               {'ans': 'Son', 'val': '7'},
               {'ans': 'Daughter', 'val': '8'}]}
the expected output - a list that contains ONLY items with identical values per key - so only this
ab1: Other 3, Prefer not to answer 3
bc2: Sister 4, Grandmother 4
Here is the code I have tried. It aims to reverse the dictionary first, but it throws an "unhashable type: 'list'" error. I think this is because the value is treated as a list when it might need to be a tuple, but I don't know how to change it.
from itertools import chain

rev_dict = {}
for k, v in dic.items():
    rev_dict.setdefault(v, set()).add(k)  # TypeError: unhashable type: 'list' (v is a list)
res = set(chain.from_iterable(v for k, v in rev_dict.items()
                              if len(v) > 1))

You've not specified an exact output format, but since you tagged pandas, here's a pandas solution.
import pandas as pd
{k: pd.DataFrame(v)[lambda df: df['val'].duplicated(keep=False)].to_dict(orient='records') for k, v in dic.items()}
Output:
{
'ab1': [{'ans': 'Other', 'val': '3'},
{'ans': 'Prefer not to answer', 'val': '3'}],
'bc1': [],
'bc2': [{'ans': 'Sister', 'val': '4'}, {'ans': 'Grandmother', 'val': '4'}]
}

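The pandas answer is certainly nicer, but here is a version using collections.Counter:
from collections import Counter

lst = []
for key, items in dic.items():
    # Count how often each 'val' occurs under this key
    counts = Counter(item['val'] for item in items)
    new = {item['ans']: item['val'] for item in items if counts[item['val']] > 1}
    if new:
        lst.append(key + ': ' + ', '.join('{} {}'.format(ans, val) for ans, val in new.items()))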

Import itertools and try this:
list(itertools.chain(*[[(k, i['ans'],i['val']) for i in v] for k, v in dic.items()]))
Long version (the loop already builds a flat list of tuples, so no chain is needed):
lst = []
for k, v in dic.items():
    for i in v:
        tup = (k, i['ans'], i['val'])
        lst.append(tup)
lst
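The flattened tuples still contain every answer, not only the duplicated ones. A minimal sketch of one way to filter them, assuming a shortened copy of `dic` and using `collections.Counter` (which is not part of the answer above) to count each `(key, val)` pair:

```python
from collections import Counter

# Shortened sample of the question's dictionary (assumed for illustration).
dic = {
    'ab1': [{'ans': 'Male', 'val': '1'},
            {'ans': 'Other', 'val': '3'},
            {'ans': 'Prefer not to answer', 'val': '3'}],
    'bc1': [{'ans': 'Employed', 'val': '1'},
            {'ans': 'Unemployed', 'val': '2'}],
}

# Flatten into (key, ans, val) tuples.
flat = [(k, i['ans'], i['val']) for k, v in dic.items() for i in v]

# Count how often each val occurs within each top-level key.
counts = Counter((k, val) for k, _, val in flat)

# Keep only the tuples whose (key, val) pair occurs more than once.
dupes = [t for t in flat if counts[(t[0], t[2])] > 1]
print(dupes)
```

Counting on the `(key, val)` pair rather than on `val` alone keeps the check per key, so a `'1'` under `ab1` and a `'1'` under `bc1` are not reported as duplicates of each other.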

Related

dict from a dict of list

I have a Python dictionary with following format:
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'],
'Element2':['1','2','3']}
Expected output:
{'Name': 'ABC', 'Number': '123',
 'Elements': [{'Element 1': '1', 'Element2': '1'},
              {'Element 1': '2', 'Element2': '2'},
              {'Element 1': '3', 'Element2': '3'}]}
I have tried the following:
[{k: v[i] for k, v in d1.items() if i < len(v)}
 for i in range(max([len(l) for l in d1.values()]))]
but getting this result:
[{'Name': 'ABC', 'Number': '123', 'Element 1': '1', 'Element2': '1'},
 {'Element 1': '2', 'Element2': '2'},
 {'Element 1': '3', 'Element2': '3'}]
How can I go from here?
I strongly recommend not trying to do everything in one line. It's not always more efficient, and almost always less readable if you have any branching logic or nested loops.
Given your dict, we can pop() the Name and Number keys into our new dict:
output = dict()
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'], 'Element2':['1','2','3']}
output["Name"] = d1.pop("Name")
output["Number"] = d1.pop("Number")
print(output)
# prints:
# {'Name': ['ABC'], 'Number': ['123']}
print(d1)
# prints:
# {'Element 1': ['1', '2', '3'], 'Element2': ['1', '2', '3']}
Then, we zip all remaining values in the dictionary, and add them to a new list:
mylist = []
keys = d1.keys()
for vals in zip(*d1.values()):
    temp_obj = dict(zip(keys, vals))
    mylist.append(temp_obj)
print(mylist)
# prints:
# [{'Element 1': '1', 'Element2': '1'},
# {'Element 1': '2', 'Element2': '2'},
# {'Element 1': '3', 'Element2': '3'}]
And finally, assign that to output["Elements"]
output["Elements"] = mylist
print(output)
# prints:
# {'Name': ['ABC'], 'Number': ['123'], 'Elements': [{'Element 1': '1', 'Element2': '1'}, {'Element 1': '2', 'Element2': '2'}, {'Element 1': '3', 'Element2': '3'}]}
Since you don't want to hardcode the first two keys, you can copy every key that doesn't contain "element":
for k, v in d1.items():
    if "element" not in k.lower():
        output[k] = v
Or as a dict-comprehension:
output = {k: v for k, v in d1.items() if "element" not in k.lower()}
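Putting these steps together without popping or hardcoding keys, a minimal sketch might look like the following. The helper name `is_element` is made up for illustration, and the `v[0]` unwrapping assumes every non-element value is a one-item list, as in the question's expected output.

```python
d1 = {'Name': ['ABC'], 'Number': ['123'],
      'Element 1': ['1', '2', '3'], 'Element2': ['1', '2', '3']}

def is_element(key):
    # Hypothetical helper: a key counts as an element column if its name contains "element".
    return "element" in key.lower()

# Scalar fields: unwrap the single-item lists (assumption: exactly one item each).
output = {k: v[0] for k, v in d1.items() if not is_element(k)}

# Element fields: zip the columns together into one dict per row.
element_keys = [k for k in d1 if is_element(k)]
output["Elements"] = [dict(zip(element_keys, vals))
                      for vals in zip(*(d1[k] for k in element_keys))]
print(output)
```

Note that `zip` stops at the shortest list, so element columns of unequal length would be truncated.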
Use a list of tuples to create the elements list of dictionaries, and use Convert to build each dictionary item from a tuple.
#https://www.geeksforgeeks.org/python-convert-list-tuples-dictionary/
d1 = {'Name':['ABC'], 'Number':['123'], 'Element 1':['1', '2', '3'],
      'Element2':['1','2','3']}
def Convert(tup, di):
    for a, b in tup:
        di[a] = b
    return di
result = {}  # renamed from dict to avoid shadowing the built-in
listElements = []
for key, value in d1.items():
    if isinstance(value, list) and len(value) > 1:
        for item in value:
            listElements.append((key, item))
    elif isinstance(value, list) and len(value) == 1:
        result[key] = value[0]
    else:
        result[key] = value
result['Elements'] = [Convert([(x, y)], {}) for x, y in listElements]
print(result)
output:
{'Name': 'ABC', 'Number': '123', 'Elements': [{'Element 1': '1'}, {'Element 1': '2'}, {'Element 1': '3'}, {'Element2': '1'}, {'Element2': '2'}, {'Element2': '3'}]}
I'm going to explain step by step:
We build the new_d1 variable, which is the dictionary you expect as output, initialized as {'Name': 'ABC', 'Number': '123'}. To achieve this, we use a dict comprehension that keeps only the keys that do not contain 'Element'.
new_d1 = {key: d1.get(key)[0] for key in filter(lambda x: 'Element' not in x, d1)}
We build the elements variable, a list of the dictionaries we have to manipulate to achieve the expected result. elements is then [{'Element 1': ['1', '2', '3']}, {'Element2': ['1', '2', '3']}].
elements = [{key: d1.get(key)} for key in filter(lambda x: 'Element' in x, d1)]
We compute a Cartesian product using itertools.product (imported as it in the full code below), pairing each key with each item of its values in elements.
product = [list(it.product(d.keys(), *d.values())) for d in elements]
Using zip, we arrange the data and convert them into dictionaries. Finally, we create the "Elements" key in new_d1.
elements_list = [dict(t) for index, t in enumerate(list(zip(*product)))]
new_d1["Elements"] = elements_list
print(new_d1)
Full code:
import itertools as it
new_d1 = {key: d1.get(key)[0] for key in filter(lambda x: 'Element' not in x, d1)}
elements = [{key: d1.get(key)} for key in filter(lambda x: 'Element' in x, d1)]
product = [list(it.product(d.keys(), *d.values())) for d in elements]
elements_list = [dict(t) for index, t in enumerate(list(zip(*product)))]
new_d1["Elements"] = elements_list
Output:
{'Elements': [{'Element 1': '1', 'Element2': '1'},
{'Element 1': '2', 'Element2': '2'},
{'Element 1': '3', 'Element2': '3'}],
'Name': 'ABC',
'Number': '123'}

How to get the keys if one of the values is '0' in a dictionary

I have the dictionary below. I need to create two new dictionaries from it, Out_1 and Out_2.
product = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
If the value field inside a product is '0', extract that product's keys.
Expected output:
Out_1 = {'Product2': {'1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}
Out_2 = {'Product2': ['Marker', 'MYSQL', 'Scale', '0']}
Pseudocode is below. I tried to write it but was not able to produce the output above.
Out_1 = {}
Out_2 = {i:[]}
for i,j in product.items():
    for a,b in j.items():
        if a['value'] == 0:
            Out_2.append(i)
I am getting an indices error. How can I get Out_1 and Out_2?
You can use dict comprehensions for this.
out_1 = {k: v for k, v in product.items() if v['value']=='0'}
out_2 = {k: list(v.values()) for k, v in product.items() if v['value']=='0'}
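Both comprehensions keep the 'index' entry, which the expected output in the question leaves out. A minimal sketch of a variant that drops it, assuming 'index' is the only key to exclude (the helper name `strip_index` is made up for illustration):

```python
product = {
    'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen', 'value': '1'},
    'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'},
}

def strip_index(fields):
    # Hypothetical helper: copy the inner dict without its 'index' key.
    return {k: v for k, v in fields.items() if k != 'index'}

# Products whose 'value' is the string '0', with 'index' removed.
out_1 = {name: strip_index(fields)
         for name, fields in product.items() if fields['value'] == '0'}

# The same products, with the remaining values as a plain list.
out_2 = {name: list(fields.values()) for name, fields in out_1.items()}

print(out_1)
print(out_2)
```

The comparison is against the string '0' because the values in `product` are strings; comparing with the integer 0 would match nothing.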
Do you really need this index key? If not, why not use a list of dicts instead of a dict of dicts? In any case, here is what you wanted:
products = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
for k,product in products.items():
    product.pop('index', None)
    if product['value'] == '0':
        products[k] = list(product.values())
print(products)
>>> {'Product1': {'1': 'Book', '2': 'Pencil', '3': 'Pen', 'value': '1'}, 'Product2': ['Marker', 'MYSQL', 'Scale', '0']}
I did not assign the result to separate variables such as out1/out2, in case you have more than two products.
Here it is:
Code:
product = {'Product1': {'index': '1', '1': 'Book', '2': 'Pencil', '3': 'Pen','value': '1'},
'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale','value': '0'}}
output_1 = {}
output_2 = {}
for key, val in product.items():
    if val['value'] == '0':
        output_1[key] = val
        output_2[key] = list(val.values())  # list() so a plain list is printed on Python 3
print(output_1)
print(output_2)
Output:
{'Product2': {'index': '2', '1': 'Marker', '2': 'MYSQL', '3': 'Scale', 'value': '0'}}
{'Product2': ['2', 'Marker', 'MYSQL', 'Scale', '0']}

Update JSON format from other JSON file

I have two files which are a and b. I want to import certain information from data b to data a with the unique id from every response.
data
a= [{'id':'abc23','name':'aa','age':'22',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc25','name':'bb','age':'32',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc60','name':'cc','age':'24',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}}]
b=[{'id':'abc23','read':'2','speak':'abc','write':'2'},
{'id':'abc25','read':'3','speak':'def','write':'3'},
{'id':'abc60','read':'5','speak':'dgf','write':'1'}]
Code that I used to import from b to a :
from pprint import pprint
for dest in a:
    for source in b:
        if source['id'] == dest['id']:
            dest['data'].update(source)
pprint(a)
Output from the code that i used :
[{ 'age': '22',
'data': {'id': 'abc23', 'read': '2', 'speak': 'abc', 'write': '2'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{ 'age': '32',
'data': {'id': 'abc25', 'read': '3', 'speak': 'def', 'write': '3'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{ 'age': '24',
'data': {'id': 'abc60', 'read': '5', 'speak': 'dgf', 'write': '1'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]
But... This is the output that I want:
[{'age': '22',
'data': {'read': '2', 'speak': 'abc'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{'age': '32',
'data': {'read': '3', 'speak': 'def'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{'age': '24',
'data': {'read': '5', 'speak': 'dgf'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]
It can't work the way you want with your code.
You do
dest['data'].update(source)
where source is
{'id':'abc23','read':'2','speak':'abc','write':'2'}
and dest['data'] is {'read':'','speak':''}.
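update() copies every key-value pair from source into dest['data'], including id and write, and keeps any existing keys that source does not overwrite. To copy only the fields that dest['data'] already has, filter the keys: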
from pprint import pprint
for dest in a:
    for source in b:
        if source['id'] == dest['id']:
            dest['data'] = {k: v for k, v in source.items() if k in dest.get('data', {})}
pprint(a)
This one will look for all the fields that are 'updateable' for each case. You might want to hardcode it, depending on your use case.
This is one approach by changing b to a dict for easy lookup.
Ex:
a= [{'id':'abc23','name':'aa','age':'22',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc25','name':'bb','age':'32',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc60','name':'cc','age':'24',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}}]
b=[{'id':'abc23','read':'2','speak':'abc','write':'2'},
{'id':'abc25','read':'3','speak':'def','write':'3'},
{'id':'abc60','read':'5','speak':'dgf','write':'1'}]
b = {i.pop('id'): i for i in b}  # Convert to dict: key = ID, value = `read`, `speak`, `write`
for i in a:
    i['data'].update(b[i['id']])  # Update list
print(a)
Output:
[{'age': '22',
'data': {'read': '2', 'speak': 'abc', 'write': '2'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{'age': '32',
'data': {'read': '3', 'speak': 'def', 'write': '3'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{'age': '24',
'data': {'read': '5', 'speak': 'dgf', 'write': '1'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]
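This output still contains 'write', which the desired output in the question omits. A minimal sketch that combines the lookup-dict idea with a filter on the keys already present in each record's 'data', using a two-record sample and the made-up name `lookup`:

```python
# Two-record sample of the question's data (shortened for illustration).
a = [{'id': 'abc23', 'name': 'aa', 'age': '22',
      'data': {'read': '', 'speak': ''},
      'responses': {'a': 1, 'b': 2}},
     {'id': 'abc25', 'name': 'bb', 'age': '32',
      'data': {'read': '', 'speak': ''},
      'responses': {'a': 1, 'b': 2}}]
b = [{'id': 'abc23', 'read': '2', 'speak': 'abc', 'write': '2'},
     {'id': 'abc25', 'read': '3', 'speak': 'def', 'write': '3'}]

# Index b by id so each record in a needs a single lookup instead of a scan.
lookup = {row['id']: row for row in b}

for record in a:
    source = lookup.get(record['id'], {})  # empty dict if the id is missing from b
    for field in record['data']:
        # Only overwrite fields that already exist in 'data'; 'id' and 'write' are skipped.
        if field in source:
            record['data'][field] = source[field]

print(a)
```

Unlike the `pop('id')` version, this leaves `b` unmodified, and a record whose id is missing from `b` is simply left as it was.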

How to write two else condition in dict comprehension

Given two dictionaries d1 and d2, I want to create a new dictionary that contains the keys of both.
d1 = {'product': '8', 'order': '8', 'tracking': '3'}
d2 = {'order': 1, 'product': 1,'customer':'5'}
dict3 = { k: [ d1[k], d2[k] ] if k in d2 else [d1[k]] for k in d1 }
dict3
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3']}
How can I add an else [d2[k]] for k in d2 to get the expected output?
My expected output:
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3'],'customer':['5']}
Disclaimer: I have already done this with defaultdict. Please answer with a dict comprehension only.
You could use a nested ternary ... if ... else (... if ... else ...), but what if there are three dictionaries, or four?
Better use a nested list comprehension and iterate over the different dictionaries.
>>> d1 = {'product': '8', 'order': '8', 'tracking': '3'}
>>> d2 = {'order': 1, 'product': 1,'customer':'5'}
>>> {k: [d[k] for d in (d1, d2) if k in d] for k in set(d1) | set(d2)}
{'customer': ['5'], 'order': ['8', 1], 'product': ['8', 1], 'tracking': ['3']}
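To illustrate the point about three or more dictionaries, here is a minimal sketch of the same comprehension over a tuple of inputs. The third dictionary `d3` and the names `dicts`, `all_keys` and `merged` are made up for this example.

```python
d1 = {'product': '8', 'order': '8', 'tracking': '3'}
d2 = {'order': 1, 'product': 1, 'customer': '5'}
d3 = {'tracking': 9, 'customer': '7'}  # hypothetical third input

dicts = (d1, d2, d3)

# Union of all keys; iterating a dict yields its keys.
all_keys = set().union(*dicts)

# For each key, collect the values from every dict that has it, in input order.
merged = {k: [d[k] for d in dicts if k in d] for k in all_keys}
print(merged)
```

Adding a fourth dictionary only means extending `dicts`; the comprehension itself does not change, which is the advantage over a nested ternary.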
You have to iterate over both the dictionaries to include all the keys in new constructed dict.
You can achieve this by using defaultdict
from collections import defaultdict
res = defaultdict(list)
for key, value in d1.items():
    res[key].append(value)
for key, value in d2.items():
    res[key].append(value)
Output:
>>> dict(res)
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3'], 'customer': ['5']}
Using a defaultdict without a comprehension is a much, much better way to go, but as requested:
d1 = {'product': '8', 'order': '8', 'tracking': '3'}
d2 = {'order': 1, 'product': 1,'customer':'5'}
d3 = {
    k: [d1[k], d2[k]]
    if (k in d1 and k in d2)
    else [d1[k]]
    if k in d1
    else [d2[k]]
    for k in list(d1.keys()) + list(d2.keys())
}
d3 is now:
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3'], 'customer': ['5']}
>>> d1 = {'product': '8', 'order': '8', 'tracking': '3'}
>>> d2 = {'order': 1, 'product': 1, 'customer': '5'}
>>> dict3 = {k: [d1[k], d2[k]] if k in d1 and k in d2 else [d1[k]] if k in d1 else [d2[k]] for d in [d1, d2] for k in d}
>>> dict3
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3'],'customer':['5']}
d1 = {'product': '8', 'order': '8', 'tracking': '3'}
d2 = {'order': 1, 'product': 1, 'customer': '5'}
list_ = []
for i in [d1, d2]:
    list_.append(i)
list_
[{'product': '8', 'order': '8', 'tracking': '3'},
 {'order': 1, 'product': 1, 'customer': '5'}]
dict_ = {}
for d in list_:
    for k, v in d.items():
        dict_.setdefault(k, []).append(v)
dict_
{'product': ['8', 1], 'order': ['8', 1], 'tracking': ['3'], 'customer': ['5']}
Comprehension
combined_key = {key for d in list_ for key in d}
combined_key
{'customer', 'order', 'product', 'tracking'}
super_dict = {key:[d[key] for d in list_ if key in d] for key in combined_key}
super_dict
{'customer': ['5'], 'tracking': ['3'], 'order': ['8', 1], 'product': ['8', 1]}

Exclude repeated values from a dictionary and increment the 'qty' field accordingly

Considering '1', '2', '3', '4' are the indexes and everything else the values of a dictionary in Python, I'm trying to exclude the repeating values and increment the quantity field when a duplicate is found, e.g.:
Turn this:
a = {'1': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
'2': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
'3': {'name': 'Green', 'qty': '1', 'sub': []},
'4': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}
into this:
b = {'1': {'name': 'Blue', 'qty': '2', 'sub': ['sky', 'ethernet cable']},
'2': {'name': 'Green', 'qty': '1', 'sub': []},
'3': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}
I was able to exclude the duplicates, but I'm having a hard time incrementing the 'qty' field:
b = {}
for k, v in a.items():  # a.iteritems() on Python 2
    if v not in b.values():
        b[k] = v
P.S.: I posted this question earlier, but forgot to add that the dictionary can have that 'sub' field which is a list. Also, don't mind the weird string indexes.
First, convert the original dict 'name' and 'sub' keys to a comma-delimited string, so we can use set():
data = [','.join([v['name']]+v['sub']) for v in a.values()]
This returns
['Blue,sky,ethernet cable', 'Green', 'Blue,sky,ethernet cable', 'Blue,sea']
Then use the nested dict and list comprehensions as below:
b = {str(i+1): {'name': j.split(',')[0], 'qty': sum([int(qty['qty']) for qty in a.values() if (qty['name']==j.split(',')[0]) and (qty['sub']==j.split(',')[1:])]), 'sub': j.split(',')[1:]} for i, j in enumerate(set(data))}
Maybe you can try to use a counter like this:
b = {}
count = 1
for v in a.values():
    if v not in b.values():
        b[str(count)] = v
        count += 1
print(b)
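The loop above renumbers the unique entries but leaves 'qty' unchanged. A minimal sketch that also sums the quantities, treating two entries as duplicates when both 'name' and 'sub' match (an assumption based on the example in the question) and relying on the insertion-ordered dicts of Python 3.7+:

```python
a = {'1': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
     '2': {'name': 'Blue', 'qty': '1', 'sub': ['sky', 'ethernet cable']},
     '3': {'name': 'Green', 'qty': '1', 'sub': []},
     '4': {'name': 'Blue', 'qty': '1', 'sub': ['sea']}}

merged = []  # unique entries, in first-seen order
for item in a.values():
    for seen in merged:
        if seen['name'] == item['name'] and seen['sub'] == item['sub']:
            # Duplicate: add its quantity to the entry already kept.
            seen['qty'] = str(int(seen['qty']) + int(item['qty']))
            break
    else:
        # No match found: keep a shallow copy so a's 'qty' values stay untouched.
        merged.append(dict(item))

# Renumber the unique entries with string keys starting at '1'.
b = {str(i): item for i, item in enumerate(merged, start=1)}
print(b)
```

The inner scan makes this quadratic in the number of entries, which is fine for small inputs. For larger ones, a dict keyed on `(name, tuple(sub))` would avoid the repeated comparisons.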
