Remove duplicate values from a list of dictionaries in Python

I want to remove duplicate values from the lists nested inside a dictionary. I am trying to write configurable code that works on any field instead of being field-specific.
Input Data :
{'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu', 'saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280, 12177218280]}]}
Expected Output Data :
{'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280]}]}
code tried:
dic = {'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu', 'saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280, 12177218280]}]}
res = []
for i in dic:
    if i not in res:
        res.append(i)

You can use set() to drop the duplicates (note that a set does not preserve the original order of the items):
import json
dic = {
'Customer_Number': 90617174,
'Email': [
{
'Email_Type': 'Primary',
'Email': list(set([
'saman.zonouz#rutgers.edu',
'saman.zonouz#rutgers.edu',
]))
}
],
'Phone_Number': [
{
'Phone_Type': 'Mobile',
'Phone': list(set([
12177218280,
12177218280,
]))
}
]
}
print(json.dumps(dic,indent=2))
If you want to do it on a list of dicts (called dics here), you can do it like this:
for dic in dics:
    for email in dic['Email']:
        email['Email'] = list(set(email['Email']))
    for phone in dic['Phone_Number']:
        phone['Phone'] = list(set(phone['Phone']))

With the approach you started with, you need to go a few levels deeper to find every such "repeating" list and dedupe it.
To dedupe, you can use a set, which is a "container" data structure like a list but holds only unique items and does not keep them in order. The official Python docs give a good introduction to all of this.
for key in dic:
    if isinstance(dic[key], list):
        for inner_dict in dic[key]:
            for inner_key in inner_dict:
                if isinstance(inner_dict[inner_key], list):
                    inner_dict[inner_key] = list(set(inner_dict[inner_key]))
print(dic)
#{'Customer_Number': 90617174,
# 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu']}],
# 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280]}]}
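Since the question asks for code that works on any field, a recursive variant may be useful. The following is only a minimal sketch: `dedupe_nested` is a hypothetical helper name, and it assumes the duplicated items are hashable scalars such as strings or numbers. It uses `dict.fromkeys`, which keeps the first occurrence of each item in its original order, whereas `set()` does not guarantee order.

```python
def dedupe_nested(value):
    """Return a copy of value with duplicates removed from every list of scalars.

    dedupe_nested is a hypothetical helper name, not part of any library.
    """
    if isinstance(value, dict):
        return {key: dedupe_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        cleaned = [dedupe_nested(item) for item in value]
        # Dicts and lists are unhashable, so only lists of scalars are deduped.
        if all(not isinstance(item, (dict, list)) for item in cleaned):
            return list(dict.fromkeys(cleaned))
        return cleaned
    return value


dic = {
    'Customer_Number': 90617174,
    'Email': [{'Email_Type': 'Primary',
               'Email': ['saman.zonouz#rutgers.edu', 'saman.zonouz#rutgers.edu']}],
    'Phone_Number': [{'Phone_Type': 'Mobile',
                      'Phone': [12177218280, 12177218280]}],
}

result = dedupe_nested(dic)
print(result)
```

The function returns a new structure and leaves `dic` unchanged, so the original data is still available for comparison.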

Related

Remove item in JSON if key has value

I have tried everything I can possibly come up with, but the value won't go away.
I have a JSON user, and if an entry in user['permissions'] has the key permission = "DELETE PAGE", I want to remove that entry (del user['permissions'][1] in this example).
I want to have a list of possible values such as "DELETE PAGE" and so on. If an entry's permission is in that list, delete that entry.
Then return the user's JSON without the items that were found.
I have tried del user['permissions'][x], .pop() and so on, but it is still there.
{
'id': 123,
'name': 'My name',
'description': 'This is who I am',
'permissions': [{
'id': 18814,
'holder': {
'type': 'role',
'parameter': '321',
'projectRole': {
'name': 'Admin',
'id': 1,
}
},
'permission': 'VIEW PAGE'
}, {
'id': 18815,
'holder': {
'type': 'role',
'parameter': '123',
'projectRole': {
'name': 'Moderator',
'id': 2
}
},
'permission': 'DELETE PAGE'
}]
}
Here's the code:
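# a is the user dictionary shown in the question
perm = a['permissions']
for p in perm[:]:  # iterate over a copy so that removing items does not skip elements
    if p['permission'] == 'DELETE PAGE':
        perm.remove(p)
print(a)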
Output:
{'id': 123, 'name': 'My name', 'description': 'This is who I am', 'permissions': [{'id': 18814, 'holder': {'type': 'role', 'parameter': '321', 'projectRole': {'name': 'Admin', 'id': 1}}, 'permission': 'VIEW PAGE'}]}
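The question also asks for a list of permission values to filter on. A minimal sketch of that, assuming each entry stores its value under the `permission` key as in the sample data: `strip_permissions` is a hypothetical helper name, and `'EDIT PAGE'` is an invented second value used only for illustration. Building a new list avoids modifying a list while looping over it.

```python
def strip_permissions(user, blocked):
    """Return a copy of user without the permissions listed in blocked.

    strip_permissions is a hypothetical helper name.
    """
    blocked = set(blocked)  # set membership tests are fast
    cleaned = dict(user)    # shallow copy; the original dict is left untouched
    cleaned['permissions'] = [
        p for p in user.get('permissions', [])
        if p.get('permission') not in blocked
    ]
    return cleaned


# Shortened version of the sample data, with one extra entry for illustration.
user = {
    'id': 123,
    'permissions': [
        {'id': 18814, 'permission': 'VIEW PAGE'},
        {'id': 18815, 'permission': 'DELETE PAGE'},
        {'id': 18816, 'permission': 'DELETE PAGE'},
    ],
}

result = strip_permissions(user, ['DELETE PAGE', 'EDIT PAGE'])
print(result['permissions'])
```

Both consecutive `'DELETE PAGE'` entries are dropped here, which is the case where calling `remove()` inside a plain `for` loop over the same list would skip one.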

How to get key values in a json object(Python)

This is my json :
{'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
I want to get a list of all the values of name.
What should my code in Python be?
d.values() gives all the values; then you can get the name entry of each value.
d = {'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
[i['name'] for i in d.values()]
['poulami', 'test']
Also note that d.values() returns a view object (dict_values), not a list, so to convert it to a list use list(d.values())
That is not JSON format. It is a Python Dictionary.
Iterate over the values of the dictionary(d.values()) and get the name from each item.
d = {'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
names_list = []
for i in d.values():
    names_list.append(i['name'])
# names_list is now ['poulami', 'test']

creating a dictionary by partitioning a dictionary with new keys in python

I have a dictionary like this:
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
I want to create another list as follows:
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}]
I have tried some methods with .items() but none of them gives the desired result.
Is that what you want?
dict_ = {"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
output = [{"label": dict_ , "value": dict_ }]
print(output)
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}] == [{"label": dict_ , "value": dict_ }]
Gives True
Following my comment, below is the code I would use, assuming these keys and this output:
# The keys could come from somewhere else
vals = ["1","2","3","4"]
# Probably also coming from an external source
example_op = {"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
# Global list
item_list = []
temp_dict = {}
for key in vals:
    temp_dict[key] = example_op
item_list.append(temp_dict)
Final output of the list would be as:
Out[9]:
[{'1': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'2': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'3': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'4': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'}}]
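One thing to keep in mind with this loop: every key ends up pointing at the same dictionary object, so changing the value under one key changes it under all of them. A minimal sketch, assuming independent entries are wanted, stores a copy per key instead:

```python
vals = ["1", "2", "3", "4"]
example_op = {"Topic": "text", "title": "texttitle", "abstract": "textabs",
              "year": "textyear", "authors": "authors"}

# dict(example_op) makes a shallow copy, which is enough here because all
# values are strings; copy.deepcopy would be needed for nested values.
item_list = [{key: dict(example_op) for key in vals}]

# Changing one entry no longer affects the others.
item_list[0]["1"]["year"] = "2020"
print(item_list[0]["2"]["year"])
```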

Make a list of first item in every element of a dictionary

I am working with a dictionary from a file with an import. This is the dictionary:
[{'id': 76001,
'full_name': 'Alaa Abdelnaby',
'first_name': 'Alaa',
'last_name': 'Abdelnaby',
'is_active': False},
{'id': 76002,
'full_name': 'Zaid Abdul-Aziz',
'first_name': 'Zaid',
'last_name': 'Abdul-Aziz',
'is_active': False},
{'id': 76003,
'full_name': 'Kareem Abdul-Jabbar',
'first_name': 'Kareem',
'last_name': 'Abdul-Jabbar',
'is_active': False}]
What I want to do is get a list out of all the IDs:
player_ids = [76001,76002, 76003]
I have tried:
player_ids = []
for i in player_dict:
    player_ids.append(player_dict[i]['id'])
but I get the error
TypeError: list indices must be integers or slices, not dict
So I get that 'i' is not the place but the actual item I am calling in the dictionary? But I'm not able to make much sense of this based on what I have read.
The pythonic way to do this is with a list comprehension. For example:
player_ids = [d['id'] for d in player_dict]
This loops over all dictionaries in player_dict, which is actually a list in your case, and for every dictionary gets the value stored under the key 'id'. (The loop variable is named d rather than dict to avoid shadowing the built-in dict type.)
You can try list comprehension:
>>> [d['id'] for d in my_list]
[76001, 76002, 76003]
Here is how you can use a list comprehension:
player_dict = [{'id': 76001,
'full_name': 'Alaa Abdelnaby',
'first_name': 'Alaa',
'last_name': 'Abdelnaby',
'is_active': False},
{'id': 76002,
'full_name': 'Zaid Abdul-Aziz',
'first_name': 'Zaid',
'last_name': 'Abdul-Aziz',
'is_active': False},
{'id': 76003,
'full_name': 'Kareem Abdul-Jabbar',
'first_name': 'Kareem',
'last_name': 'Abdul-Jabbar',
'is_active': False}]
player_ids = [d['id'] for d in player_dict]
print(player_ids)
Output:
[76001, 76002, 76003]
In your loop, i is already the dictionary itself rather than an index, so just append its 'id':
player_ids.append(i['id'])

Sort a list of dictionaries while consolidating duplicates in Python?

So I have a list of dictionaries like so:
data = [ {
'Organization' : '123 Solar',
'Phone' : '444-444-4444',
'Email' : '',
'website' : 'www.123solar.com'
}, {
'Organization' : '123 Solar',
'Phone' : '',
'Email' : 'joey#123solar.com',
'Website' : 'www.123solar.com'
}, {
etc...
} ]
Of course, this is not the exact data. But (maybe) from my example here you can catch my problem. I have many records with the same "Organization" name, but not one of them has the complete information for that record.
Is there an efficient method for searching over the list, sorting the list based on the dictionary's first entry, and finally merging the data from duplicates to create a unique entry? (Keep in mind these dictionaries are quite large)
You can make use of itertools.groupby:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
data = [ {
'Organization' : '123 Solar',
'Phone' : '444-444-4444',
'Email' : '',
'website' : 'www.123solar.com'
}, {
'Organization' : '123 Solar',
'Phone' : '',
'Email' : 'joey#123solar.com',
'Website' : 'www.123solar.com'
},
{
'Organization' : '234 test',
'Phone' : '111',
'Email' : 'a#123solar.com',
'Website' : 'b.123solar.com'
},
{
'Organization' : '234 test',
'Phone' : '222',
'Email' : 'ac#123solar.com',
'Website' : 'bd.123solar.com'
}]
data = sorted(data, key=itemgetter('Organization'))
result = {}
for key, group in groupby(data, key=itemgetter('Organization')):
    result[key] = [item for item in group]
pprint(result)
prints:
{'123 Solar': [{'Email': '',
'Organization': '123 Solar',
'Phone': '444-444-4444',
'website': 'www.123solar.com'},
{'Email': 'joey#123solar.com',
'Organization': '123 Solar',
'Phone': '',
'Website': 'www.123solar.com'}],
'234 test': [{'Email': 'a#123solar.com',
'Organization': '234 test',
'Phone': '111',
'Website': 'b.123solar.com'},
{'Email': 'ac#123solar.com',
'Organization': '234 test',
'Phone': '222',
'Website': 'bd.123solar.com'}]}
UPD:
Here's what you can do to group items into single dict:
for key, group in groupby(data, key=itemgetter('Organization')):
    result[key] = {'Phone': [],
                   'Email': [],
                   'Website': []}
    for item in group:
        result[key]['Phone'].append(item['Phone'])
        result[key]['Email'].append(item['Email'])
        # the first sample record spells the key 'website', so fall back to it
        result[key]['Website'].append(item.get('Website', item.get('website')))
then, in result you'll have:
{'123 Solar': {'Email': ['', 'joey#123solar.com'],
'Phone': ['444-444-4444', ''],
'Website': ['www.123solar.com', 'www.123solar.com']},
'234 test': {'Email': ['a#123solar.com', 'ac#123solar.com'],
'Phone': ['111', '222'],
'Website': ['b.123solar.com', 'bd.123solar.com']}}
Is there an efficient method for searching over the list, sorting the list based on the dictionary's first entry, and finally merging the data from duplicates to create a unique entry?
Yes, but there's an even more efficient method without searching and sorting. Just build up a dictionary as you go along:
datadict = {}
for thingy in data:
    organization = thingy['Organization']
    datadict[organization] = merge(thingy, datadict.get(organization, {}))
Now you're making a linear pass over the data, doing a constant-time lookup for each one. So, it's better than any sorted solution by a factor of O(log N). It's also one pass instead of multiple passes, and it will probably have lower constant overhead besides.
It's not clear exactly what you want to do to merge the entries, and there's no way anyone can write the code without knowing what rules you want to use. But here's a simple example:
def merge(d1, d2):
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1
In other words, for each item in d2, if d1 already has a truthy value (like a non-empty string), leave it alone; otherwise, add it.
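Putting the two pieces together, a minimal runnable sketch might look like this. It assumes the key spelling has been normalized to 'Website' (the question's sample mixes 'website' and 'Website'), and it uses shortened sample records; the third record's empty email is invented for illustration. Each record is copied before merging so the input list is not modified.

```python
def merge(d1, d2):
    """Fill the empty or missing fields of d1 with values from d2."""
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1


data = [
    {'Organization': '123 Solar', 'Phone': '444-444-4444', 'Email': '',
     'Website': 'www.123solar.com'},
    {'Organization': '123 Solar', 'Phone': '', 'Email': 'joey#123solar.com',
     'Website': 'www.123solar.com'},
    {'Organization': '234 test', 'Phone': '111', 'Email': '',
     'Website': 'b.123solar.com'},
]

datadict = {}
for record in data:
    organization = record['Organization']
    # dict(record) is a shallow copy, so merge() does not alter the input.
    datadict[organization] = merge(dict(record), datadict.get(organization, {}))

# One consolidated record per organization, in first-seen order.
merged = list(datadict.values())
print(merged)
```

With this merge rule, when two records both carry a non-empty value for the same field, the later record's value is the one that is kept; a different rule would need a different `merge`.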
