Convert a list-of-dictionaries to a dictionary


I have this list of dictionaries I want to convert to one dictionary
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
{'VPCRegion': 'us-east-1', 'VPCId': '9847'},
{'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
I want to convert it to
{'us-east-1': '12ededd4', 'us-east-1': '9847', 'us-west-2': '99485003'}
I used this function
def convert_dict(tags):
    return {tag['VPCRegion']: tag['VPCId'] for tag in tags}
but I get this output, which is missing the first dictionary in the list:
{'us-east-1': '9847', 'us-west-2': '99485003'}

Perhaps a list of dictionaries would fit your need. The code below produces:
[{'us-east-1': '12ededd4'}, {'us-east-1': '9847'}, {'us-west-2': '99485003'}]
To elaborate on what others commented: dictionary keys have to be unique. Because vpcs contains 'VPCRegion': 'us-east-1' twice, new_dict keeps only the last value for that key, while new_list keeps every pair. The commented-out lines at the end try to zip list_dict back into a dictionary; they raise a TypeError, because the items of list_dict are dictionaries, and dictionaries are not hashable and so cannot be used as keys.
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
{'VPCRegion': 'us-east-1', 'VPCId': '9847'},
{'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
def changekey(listofdict):
    new_dict = {}
    new_list = []
    for member in listofdict:
        new_key = member['VPCRegion']
        new_val = member['VPCId']
        new_dict.update({new_key: new_val})
        new_list.append({new_key: new_val})
    return new_dict, new_list
dict1,list_dict=changekey(vpcs)
print(dict1)
print(list_dict)
#dict4=dict(zip(*[iter(list_dict)]*2))
#print(dict4)
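A minimal sketch of why the first entry disappears, using the vpcs list from the question. A dict can hold each key only once, so a later assignment to 'us-east-1' replaces the earlier value. The name flat is only illustrative.

```python
vpcs = [
    {'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
    {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
    {'VPCRegion': 'us-west-2', 'VPCId': '99485003'},
]

flat = {}
for vpc in vpcs:
    # Assigning to an existing key replaces its value; no second entry is created.
    flat[vpc['VPCRegion']] = vpc['VPCId']

print(flat)       # {'us-east-1': '9847', 'us-west-2': '99485003'}
print(len(flat))  # 2, although the input list has 3 items
```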

Since your output must group several values under the same name, your output will be a dict of lists, not a dict of strings.
One way to quickly do it:
import collections
def group_by_region(vpcs):
    result = collections.defaultdict(list)
    for vpc in vpcs:
        result[vpc['VPCRegion']].append(vpc['VPCId'])
    return result
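The result of group_by_region(vpcs) will be a defaultdict equal to {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}.
For fun, here's a cryptic but compact way to get this in one expression. Note that itertools.groupby only groups consecutive items, so it gives this result only when records with the same region are adjacent (as they are here); otherwise sort vpcs by 'VPCRegion' first: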
import itertools
{key: [rec['VPCId'] for rec in group]
 for (key, group) in itertools.groupby(vpcs, lambda vpc: vpc['VPCRegion'])}
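The groupby expression above depends on input order, because itertools.groupby starts a new group whenever the key changes. The sketch below shows the difference on a reordered list; shuffled and by_region are hypothetical names introduced only for this illustration.

```python
import itertools
from operator import itemgetter

# Hypothetical input where the two 'us-east-1' records are not adjacent.
shuffled = [
    {'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
    {'VPCRegion': 'us-west-2', 'VPCId': '99485003'},
    {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
]

by_region = itemgetter('VPCRegion')

# Without sorting, the second 'us-east-1' run overwrites the first one.
unsorted_result = {
    key: [rec['VPCId'] for rec in group]
    for key, group in itertools.groupby(shuffled, by_region)
}

# Sorting by the grouping key first makes equal keys adjacent.
sorted_result = {
    key: [rec['VPCId'] for rec in group]
    for key, group in itertools.groupby(sorted(shuffled, key=by_region), by_region)
}

print(unsorted_result)  # {'us-east-1': ['9847'], 'us-west-2': ['99485003']}
print(sorted_result)    # {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}
```

The defaultdict version does not have this constraint, which is one reason to prefer it when the input order is not known.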

Related

python updating dictionary with list as the value type

I'm trying to iterate through data extracted from a file and store them in a dictionary based on each data's id
These are the id (str) for the data : "sensor", "version", "frame", "function"
The data are hexadecimal strings.
What I basically start with is a huge list of tuples of the form (id, data) that I extracted from a file:
example_list = [("sensor", 245), ("frame", 455), ("frame", 77)] and so on
This example_list stores all the data, so it holds the data for every id.
I want to make a dictionary with the id as key and a list of data as value, so that when I am done iterating through example_list I have a list of values for each id and can iterate through that list to get all the data for a specific id.
To start, all values (list) will start with an empty list
my_dict = {"sensor": [], "frame": [], "version": [], "function": []}
Then, as I iterate through example_list, if the id is in my_dict as a key, I append the value to the values list in my_dict
for itm in example_list:
    if itm[0] in my_dict:
        tmp = my_dict[itm[0]] # since itm[0] is the id
        tmp.append(itm[1])
        my_dict[itm[0]] = tmp # update the list
When I tried this, it seems that each value list in the final my_dict holds only the latest data.
What I mean by this is that if
example_list = [("sensor", 245), ("frame", 455), ("frame", 77)]
then
my_dict = {"sensor": [245], "frame": [77], "version": [], "function": []}
I may be wrong about this interpretation (since the data I'm reading is really big), but when I printed my_dict at the end of the function, each value list had only one item inside, which is far from what I expected (a list of data instead of just one).
I tried searching, and people used the update function to update the dictionary, but that also didn't seem to work and gave me an "unhashable" error or warning.
Is there any way to implement what I want to do?

Try doing it like so:
for itm in example_list:
    if itm[0] in my_dict:
        my_dict[itm[0]].append(itm[1])

Your code is working as required. To simplify, as you've already instantiated the dict with empty lists:
for i,j in example_list:
    my_dict[i].append(j)
print(my_dict)
Output:
{'sensor': [245], 'frame': [455, 77], 'version': [], 'function': []}

What you want to do is:
for itm in example_list:
    if itm[0] in my_dict.keys():  # check that the id is one of the keys
        my_dict[itm[0]].append(itm[1])  # append directly to the stored list
This appends straight to the list stored under the key. Note that tmp = my_dict[itm[0]] in your loop binds tmp to that same list object rather than to a copy, so the temporary variable and the reassignment are redundant but do not delete the old data.
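The answers above can be checked with a short sketch, using the three sample tuples from the question. It also shows a variant based on collections.defaultdict, which creates the lists on demand; that variant is a suggestion of this sketch, not something the question requires, and the name grouped is illustrative.

```python
from collections import defaultdict

example_list = [("sensor", 245), ("frame", 455), ("frame", 77)]

# Variant 1: predefined keys, as in the question.
my_dict = {"sensor": [], "frame": [], "version": [], "function": []}
for key, value in example_list:
    if key in my_dict:
        my_dict[key].append(value)

# A name bound to a stored list refers to the same object, not a copy.
tmp = my_dict["frame"]
print(tmp is my_dict["frame"])  # True

# Variant 2: keys created on demand (only ids that actually occur appear).
grouped = defaultdict(list)
for key, value in example_list:
    grouped[key].append(value)

print(my_dict)        # {'sensor': [245], 'frame': [455, 77], 'version': [], 'function': []}
print(dict(grouped))  # {'sensor': [245], 'frame': [455, 77]}
```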

Check for string in list items using list as reference

I want to replace items in a list based on another list as reference.
Take these example lists stored inside a dictionary:
dict1 = {
"artist1": ["dance pop","pop","funky pop"],
"artist2": ["chill house","electro house"],
"artist3": ["dark techno","electro techno"]
}
Then, I have this list as reference:
wish_list = ["house","pop","techno"]
My result should look like this:
dict1 = {
"artist1": ["pop"],
"artist2": ["house"],
"artist3": ["techno"]
}
I want to check if any of the list items inside "wishlist" is inside one of the values of the dict1. I tried around with regex, any.
This was an approach with just 1 list instead of a dictionary of multiple lists:
check = any(item in artist for item in wish_list)
if check == True:
    artist_genres.clear()
    artist_genres.append()
I am just beginning with Python on my own and am playing around with the SpotifyAPI to clean up my favorite songs into playlists. Thank you very much for your help!

The idea is like this,
dict1 = { "artist1" : ["dance pop","pop","funky pop"],
"artist2" : ["house","electro house"],
"artist3" : ["techno","electro techno"] }
wish_list = ["house","pop","techno"]
dict2={}
for key,value in dict1.items():
    for i in wish_list:
        if i in value:
            dict2[key]=i
            break
print(dict2)
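One detail worth checking against the question's data: `i in value` tests whether the wish equals a whole list item, so "house" is not found in ["chill house", "electro house"]. The sketch below contrasts exact membership with substring matching on the question's original dict1; the names exact and substring are illustrative.

```python
dict1 = {
    "artist1": ["dance pop", "pop", "funky pop"],
    "artist2": ["chill house", "electro house"],
    "artist3": ["dark techno", "electro techno"],
}
wish_list = ["house", "pop", "techno"]

exact = {}
substring = {}
for artist, genres in dict1.items():
    for wish in wish_list:
        # Exact match: the wish must equal one whole list item.
        if wish in genres and artist not in exact:
            exact[artist] = [wish]
        # Substring match: the wish may appear inside a longer genre name.
        if any(wish in genre for genre in genres) and artist not in substring:
            substring[artist] = [wish]

print(exact)      # {'artist1': ['pop']}
print(substring)  # {'artist1': ['pop'], 'artist2': ['house'], 'artist3': ['techno']}
```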

A regex is not needed, you can get away by simply iterating over the list:
wish_list = ["house","pop","techno"]
dict1 = {
"artist1": ["dance pop","pop","funky pop"],
"artist2": ["chill house","electro house"],
"artist3": ["dark techno","electro techno"]
}
dict1 = {
    # The key is reused as-is, no need to change it.
    # The new value is the wishlist, filtered based on its presence in the current value
    key: [genre for genre in wish_list if any(genre in item for item in value)]
    for key, value in dict1.items()  # this method returns a tuple (key, value) for each entry in the dictionary
}
This implementation relies heavily on list comprehensions (and dictionary comprehensions); you might want to read up on them if they are new to you.

List Comprehension returns empty list

I'm trying to query a MongoDB database and throw the two sets of results ('_id' and 'Team') into two separate lists.
import pymongo
client = pymongo.MongoClient('localhost:27017')
db = client['db_name']
query = {'Team': {'$exists': 1}}
projection = {'_id': 1, 'Team': 1}
data = db['collection_name'].find(query, projection) # line 9
id_list = [value for dict in data for key, value in dict.iteritems() if key == '_id']
teams_list = [value for dict in data for key, value in dict.iteritems() if key == 'Team']
print id_list
print teams_list
client.close()
For the code above, the 'id_list' is as expected but 'teams_list' is empty. When I put 'teams_list' before 'id_list' I get the expected 'teams_list' output and 'id_list' is empty. And when I repeat the data call (line 9) in between the two list comprehensions I get the expected output for both lists.
Any idea why this is happening?

You need to define your data as:
data = list(db['collection_name'].find(query, projection))
find() returns a cursor that behaves like a generator: once you have iterated over its values, they are gone. You need to store them in a list, which is what list() does here: it collects the items returned by the cursor.
Instead of iterating over the list twice, a better way is to do it in a single loop:
id_list, teams_list = [], []
# v `dict` is in-built data type, you should not be using it as variable
for d in data:
    for key, value in d.iteritems():
        if key == '_id':
            id_list.append(value)
        elif key == 'Team':
            teams_list.append(value)
Refer to the Generators wiki page for more information about generators.
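The exhaustion effect can be reproduced without a database. The sketch below uses a plain generator as a stand-in for the query result; fake_find and documents are hypothetical names, and a real pymongo cursor is its own object type, so this only illustrates the one-pass behaviour described above.

```python
# Stand-in for the query result: plain dicts instead of MongoDB documents.
documents = [
    {'_id': 1, 'Team': 'Red'},
    {'_id': 2, 'Team': 'Blue'},
]

def fake_find():
    """Hypothetical substitute for find(); yields each document once."""
    for doc in documents:
        yield doc

data = fake_find()
first_pass = [doc['_id'] for doc in data]
second_pass = [doc['Team'] for doc in data]  # data is already exhausted
print(first_pass)   # [1, 2]
print(second_pass)  # []

# Materialising the results once allows any number of passes.
data = list(fake_find())
id_list = [doc['_id'] for doc in data]
teams_list = [doc['Team'] for doc in data]
print(id_list)     # [1, 2]
print(teams_list)  # ['Red', 'Blue']
```

Since each document is a dictionary, the values can be read directly by key, without looping over every key/value pair.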

As already mentioned the culprit here is the find() method which returns a Cursor object which is consumed when you iterate it the first time.
But you are using the wrong method for the job. You need to use the .aggregate() method.
query = {'Team': {'$exists': 1}}
cursor = db['collection_name'].aggregate([
    {'$match': query},
    {'$group': {
        '_id': None,
        'id_list': {'$push': '$_id'},
        'teams_list': {'$push': '$Team'}
    }}
])
The .aggregate() method, like its partner in crime .find(), returns a CommandCursor over the result set, which is a generator-like object.
Because we are grouping by None, iterating the cursor will yield a single document which means that you can safely do:
print list(cursor)[0] # return a dictionary
or
result = list(cursor)[0]
print result['id_list']
print result['teams_list']

Update dictionary if in list

I'm running through an Excel file, reading it line by line to create dictionaries and append them to a list, so I have a list like:
myList = []
and a dictionary in this format:
dictionary = {'name': 'John', 'code': 'code1', 'date': [123,456]}
so I do this: myList.append(dictionary), so far so good. Now I'll go into the next line where I have a pretty similar dictionary:
dictionary_two = {'name': 'John', 'code': 'code1', 'date': [789]}
I'd like to check if I already have a dictionary with 'name' = 'John' in myList so I check it with this function:
def checkGuy(dude_name):
    return any(d['name'] == dude_name for d in myList)
Currently I'm writing this function to add the guys to the list:
def addGuy(row_info):
    if not checkGuy(row_info[1]):
        myList.append({'name':row_info[1],'code':row_info[0],'date':[row_info[2]]})
    else:
        #HELP HERE
In this else branch I'd like to call dict.update(updated_dict), but I don't know how to get hold of the dictionary there.
Could someone help, so that the stored dictionary gets the values of dictionary_two appended?

I would modify checkGuy to something like:
def findGuy(dude_name):
    for d in myList:
        if d['name'] == dude_name:
            return d
    else:
        # for/else: reached only when the loop ends without a match
        return None # or use pass
And then do:
def addGuy(row_info):
    guy = findGuy(row_info[1])
    if guy is None:
        myList.append({'name':row_info[1],'code':row_info[0],'date':[row_info[2]]})
    else:
        guy.update(updated_dict)
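If the goal is to add the new date to the existing entry rather than replace it, one possible variation is to append to the stored 'date' list. This is a sketch under the assumption that each row has the layout (code, name, date), as implied by the question's indexing; it is not the only way to do it.

```python
myList = []

def findGuy(dude_name):
    """Return the stored dict for dude_name, or None if there is none."""
    for d in myList:
        if d['name'] == dude_name:
            return d
    return None

def addGuy(row_info):
    # Assumed row layout, taken from the question: (code, name, date).
    code, name, date = row_info
    guy = findGuy(name)
    if guy is None:
        myList.append({'name': name, 'code': code, 'date': [date]})
    else:
        # Append to the existing list instead of replacing it.
        guy['date'].append(date)

addGuy(('code1', 'John', 123))
addGuy(('code1', 'John', 456))
addGuy(('code1', 'John', 789))
print(myList)  # [{'name': 'John', 'code': 'code1', 'date': [123, 456, 789]}]
```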

This answer is based on the comments, where it was suggested that if "name" is the only criterion to search on, it could be used as the key of a dictionary instead of using a list.
master = {"John" : {'code': 'code1', 'date': [123,456]}}
def addGuy(row_info):
    key = row_info[1]
    code = row_info[0]
    date = row_info[2]
    if master.get(key):
        master.get(key).update({"code": code, "date": date})
    else:
        master[key] = {"code": code, "date": date}

If you dict.update the existing data each time you see a repeated name, your code can be reduced to a dict of dicts right where you read the file. Calling update on existing dicts with the same keys is going to overwrite the values leaving you with the last occurrence so even if you had multiple "John" dicts they would all contain the exact same data by the end.
def read_file():
    results = {name: {"code": code, "date": date}
               for code, name, date in how_you_read_into_rows}
    return results
If you expect update to append the values somehow, it will not. To do that you would need a very different approach. If you actually want to gather the dates and codes per user, use a defaultdict and append each code, date pair to a list with the name as the key:
from collections import defaultdict
d = defaultdict(list)
def read_file():
    for code, name, date in how_you_read_into_rows:
        d[name].append([code, date])
Or some variation depending on what you want the final output to look like.
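A runnable sketch of the defaultdict approach, with a few made-up rows standing in for the spreadsheet. The rows list and the name read_rows are hypothetical; the original placeholder how_you_read_into_rows is left undefined in the answer.

```python
from collections import defaultdict

# Hypothetical rows standing in for the spreadsheet: (code, name, date).
rows = [
    ('code1', 'John', 123),
    ('code1', 'John', 456),
    ('code2', 'Mary', 321),
    ('code1', 'John', 789),
]

def read_rows(rows):
    dates_by_name = defaultdict(list)
    for code, name, date in rows:
        # Key on the value of name, not on the literal string "name".
        dates_by_name[name].append((code, date))
    return dict(dates_by_name)

print(read_rows(rows))
# {'John': [('code1', 123), ('code1', 456), ('code1', 789)], 'Mary': [('code2', 321)]}
```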

Matching values in nested dictionaries

I have two dictionaries which contain nested sub-dictionaries. They are structured as follows:
search_regions = {
'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654}
}
database_variants = {
'chr11:56694718-56694718': {'Chr': 'chr11', 'End': 56694718, 'Start': 56694718},
'chr13:27185659-27185659': {'Chr': 'chr13', 'End': 27185659, 'Start': 27185659}
}
I need to compare them and pull out the dictionaries in database_variants
which fall in the range of the dictionaries in search_regions.
I am building a function to do this (linked to a previous question). This is what I have so far:
def region_to_variant_location_match(search_Variants, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as input.
    Match variants in database_Variants to regions within search_Variants.
    Return matches as a nested dictionary.'''
    #Match on Chr value
    #Where Start value from database_variant is between St and End values in search_variants.
    #return as nested dictionary
The problem I am having is working out how to get to the values in the nested dictionaries (Chr, St, End, etc) for the comparison. I'd like to do this using list comprehension as I've got quite a bit of data to get through so a simpler for loop might be more time consuming.
Any help is much appreciated!
UPDATE
I've tried to implement the solution suggested by bioinfoboy below. My first step was to convert the search_regions and database_variants dictionaries into defaultdict(list) using the following functions:
def search_region_converter(searchDict):
    '''This function takes the dictionary of dictionaries and converts it to a
    DefaultDict(list) to allow matching
    with the database in a corresponding format'''
    search_regions = defaultdict(list)
    for i in search_regions.keys():
        chromosome = i.split(":")[0]
        start = int(i.split(":")[1].split("-")[0])
        end = int(i.split(":")[1].split("-")[1])
        search_regions[chromosome].append((start, end))
    return search_regions #a list with chromosomes as keys
def database_snps_converter(databaseDict):
    '''This function takes the dictionary of dictionaries and converts it to a
    DefaultDict(list) to allow matching
    with the search_snps in a corresponding format'''
    database_variants = defaultdict(list)
    for i in database_variants.keys():
        chromosome = i.split(":")[0]
        start = int(i.split(":")[1].split("-")[0])
        database_variants[chromosome].append(start)
    return database_variants #list of database variants
Then I have made a function for matching (again with bioinfoboy's code), which is as follows:
def region_to_variant_location_match(search_Regions, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as
    input.
    Match variants in database_Variants to regions within search_Variants.'''
    for key, values in database_Variants.items():
        for value in values:
            for search_area in search_Regions[key]:
                print(search_area)
                if (value >= search_area[0]) and (value <= search_area[1]):
                    yield(key, search_area)
However the defaultdict functions return empty dictionaries and I can't quite work out what I need to change.
Any ideas?

I imagine this may help
I'm converting your search_regions and database_variants according to what I've mentioned in the comment.
from collections import defaultdict
_database_variants = defaultdict(list)
_search_regions = defaultdict(list)
for i in database_variants.keys():
    _chromosome = i.split(":")[0]
    _start = int(i.split(":")[1].split("-")[0])
    _database_variants[_chromosome].append(_start)
_search_regions = defaultdict(list)
for i in search_regions.keys():
    _chromosome = i.split(":")[0]
    _start = int(i.split(":")[1].split("-")[0])
    _end = int(i.split(":")[1].split("-")[1])
    _search_regions[_chromosome].append((_start, _end))
def _search(_database_variants, _search_regions):
    for key, values in _database_variants.items():
        for value in values:
            for search_area in _search_regions[key]:
                if (value >= search_area[0]) and (value <= search_area[1]):
                    yield(key, search_area)
I've used yield, so _search returns a generator object that you can iterate over. With the data you provided initially in the question, I iterate like this:
for i in _search(_database_variants, _search_regions):
    print(i)
The output is the following:
('chr11', (56694718, 71838208))
('chr13', (27185654, 39682032))
Is that not what you are trying to achieve?
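Regarding the UPDATE in the question: one likely reason the converter functions return empty results is that each one creates a new, empty defaultdict and then loops over that same empty object instead of over its argument (searchDict or databaseDict). A minimal sketch of a corrected converter, assuming the key format shown in the question; the names search_dict and converted are illustrative.

```python
from collections import defaultdict

search_regions = {
    'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
    'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654},
}

def search_region_converter(search_dict):
    """Group (start, end) tuples by chromosome."""
    converted = defaultdict(list)
    # Iterate over the argument, not over the new (still empty) defaultdict.
    for key in search_dict:
        chromosome, span = key.split(":")
        start, end = (int(part) for part in span.split("-"))
        converted[chromosome].append((start, end))
    return converted

print(dict(search_region_converter(search_regions)))
# {'chr11': [(56694718, 71838208)], 'chr13': [(27185654, 39682032)]}
```

The same change (looping over the parameter and collecting into a differently named defaultdict) would apply to database_snps_converter.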

You should probably do something like
def region_to_variant_location_match(search_Variants, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as input.
    Match variants in database_Variants to regions within search_Variants.
    Return matches as a nested dictionary.'''
    return {
        record[0]: record[1]
        for record, lookup in zip(
            database_Variants.items(),
            search_Variants.items()
        )
        if (
            record[1]['Chr'] == lookup[1]['Chr'] and
            lookup[1]['Start'] <= record[1]['Start'] <= lookup[1]['End']
        )
    }
Note that if you were using Python 2 (instead of Python 3), you would use iteritems() instead of items() and itertools.izip() instead of zip, and on versions before 2.7, which lack dict comprehensions, you would need to pass a generator expression to dict() instead.
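Because zip pairs the n-th variant only with the n-th region, the function above relies on both dictionaries having matching order and length. A hedged alternative sketch compares each variant against every region instead; match_variants is a hypothetical name, and the data is copied from the question.

```python
search_regions = {
    'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
    'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654},
}
database_variants = {
    'chr11:56694718-56694718': {'Chr': 'chr11', 'End': 56694718, 'Start': 56694718},
    'chr13:27185659-27185659': {'Chr': 'chr13', 'End': 27185659, 'Start': 27185659},
}

def match_variants(search_regions, database_variants):
    """Return the variants whose Start lies inside any region on the same chromosome."""
    return {
        name: variant
        for name, variant in database_variants.items()
        if any(
            variant['Chr'] == region['Chr']
            and region['Start'] <= variant['Start'] <= region['End']
            for region in search_regions.values()
        )
    }

matches = match_variants(search_regions, database_variants)
print(sorted(matches))
# ['chr11:56694718-56694718', 'chr13:27185659-27185659']
```

This does more comparisons than the zip version, so for large inputs grouping the regions by chromosome first (as in the defaultdict answer) may be preferable.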
