Matching values in nested dictionaries - python

I have two dictionaries which contain nested sub-dictionaries. They are structured as follows:
search_regions = {
'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654}
}
database_variants = {
'chr11:56694718-56694718': {'Chr': 'chr11', 'End': 56694718, 'Start': 56694718},
'chr13:27185659-27185659': {'Chr': 'chr13', 'End': 27185659, 'Start': 27185659}
}
I need to compare them and pull out the dictionaries in database_variants
which fall in the range of the dictionaries in search_regions.
I am building a function to do this (linked to a previous question). This is what I have so far:
def region_to_variant_location_match(search_Variants, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as input.
    Match variants in database_Variants to regions within search_Variants.
    Return matches as a nested dictionary.'''
    # Match on Chr value
    # Where Start value from database_variant is between Start and End values in search_variants.
    # Return as nested dictionary
The problem I am having is working out how to get to the values in the nested dictionaries (Chr, Start, End, etc.) for the comparison. I'd like to do this with a list comprehension, as I have quite a lot of data to get through and I expect a plain for loop to be slower.
Any help is much appreciated!
UPDATE
I've tried to implement the solution suggested by bioinfoboy below. My first step was to convert the search_regions and database_variants dictionaries into defaultdict(list) using the following functions:
def search_region_converter(searchDict):
    '''This function takes the dictionary of dictionaries and converts it to a
    DefaultDict(list) to allow matching
    with the database in a corresponding format'''
    search_regions = defaultdict(list)
    for i in search_regions.keys():
        chromosome = i.split(":")[0]
        start = int(i.split(":")[1].split("-")[0])
        end = int(i.split(":")[1].split("-")[1])
        search_regions[chromosome].append((start, end))
    return search_regions #a list with chromosomes as keys
def database_snps_converter(databaseDict):
    '''This function takes the dictionary of dictionaries and converts it to a
    DefaultDict(list) to allow matching
    with the search_snps in a corresponding format'''
    database_variants = defaultdict(list)
    for i in database_variants.keys():
        chromosome = i.split(":")[0]
        start = int(i.split(":")[1].split("-")[0])
        database_variants[chromosome].append(start)
    return database_variants #list of database variants
Then I have made a function for matching (again with bioinfoboy's code), which is as follows:
def region_to_variant_location_match(search_Regions, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as
    input.
    Match variants in database_Variants to regions within search_Variants.'''
    for key, values in database_Variants.items():
        for value in values:
            for search_area in search_Regions[key]:
                print(search_area)
                if (value >= search_area[0]) and (value <= search_area[1]):
                    yield(key, search_area)
However the defaultdict functions return empty dictionaries and I can't quite work out what I need to change.
Any ideas?
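A likely cause of the empty results: both converter functions loop over the defaultdict they have just created (which is still empty) instead of over the dictionary passed in as the argument. Below is a minimal sketch of the region converter with that one change, keeping the names from the question. The sample data is copied from the top of the question, and the unpacking of the key is just a shorter way to write the same splits.

```python
from collections import defaultdict

def search_region_converter(searchDict):
    '''Convert the dictionary of dictionaries into a defaultdict(list)
    keyed by chromosome, holding (start, end) tuples.'''
    search_regions = defaultdict(list)
    # Iterate over the argument, not over the new (still empty) defaultdict.
    for key in searchDict:
        chromosome, span = key.split(":")
        start, end = (int(x) for x in span.split("-"))
        search_regions[chromosome].append((start, end))
    return search_regions

search_regions = {
    'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
    'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654},
}
converted = search_region_converter(search_regions)
print(dict(converted))
```

The same change (looping over databaseDict) should apply to database_snps_converter.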

I imagine this may help
I'm converting your search_regions and database_variants according to what I've mentioned in the comment.
from collections import defaultdict
_database_variants = defaultdict(list)
for i in database_variants.keys():
    _chromosome = i.split(":")[0]
    _start = int(i.split(":")[1].split("-")[0])
    _database_variants[_chromosome].append(_start)
_search_regions = defaultdict(list)
for i in search_regions.keys():
    _chromosome = i.split(":")[0]
    _start = int(i.split(":")[1].split("-")[0])
    _end = int(i.split(":")[1].split("-")[1])
    _search_regions[_chromosome].append((_start, _end))
def _search(_database_variants, _search_regions):
    for key, values in _database_variants.items():
        for value in values:
            for search_area in _search_regions[key]:
                if (value >= search_area[0]) and (value <= search_area[1]):
                    yield(key, search_area)
I've used yield, so _search returns a generator that you can iterate over. With the data you provided in the question, I get the following output.
for i in _search(_database_variants, _search_regions):
    print(i)
The output is the following:
('chr11', (56694718, 71838208))
('chr13', (27185654, 39682032))
Is that not what you are trying to achieve?

You should probably do something like
def region_to_variant_location_match(search_Variants, database_Variants):
    '''Take dictionaries for search_Variants and database_Variants as input.
    Match variants in database_Variants to regions within search_Variants.
    Return matches as a nested dictionary.'''
    return {
        record[0]: record[1]
        for record, lookup in zip(
            database_Variants.items(),
            search_Variants.items()
        )
        if (
            record[1]['Chr'] == lookup[1]['Chr'] and
            lookup[1]['Start'] <= record[1]['Start'] <= lookup[1]['End']
        )
    }
Note that if you were using Python 2 (instead of Python 3), you would use iteritems() instead of items() and itertools.izip() instead of zip(), and on versions earlier than 2.7 you would need to pass a generator expression to dict() instead of using a dict comprehension.
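One caveat with the zip() approach: it pairs the n-th variant with the n-th region, so it only finds matches when the two dictionaries happen to line up. Here is a sketch that instead checks each variant against every region, assuming the nested layout from the question. The variant order is shuffled on purpose, and the 'chr13:100-100' entry is made up to show a non-match.

```python
def region_to_variant_location_match(search_regions, database_variants):
    '''Return the entries of database_variants whose Start lies inside
    any region of search_regions on the same chromosome.'''
    return {
        name: variant
        for name, variant in database_variants.items()
        if any(
            variant['Chr'] == region['Chr']
            and region['Start'] <= variant['Start'] <= region['End']
            for region in search_regions.values()
        )
    }

search_regions = {
    'chr11:56694718-71838208': {'Chr': 'chr11', 'End': 71838208, 'Start': 56694718},
    'chr13:27185654-39682032': {'Chr': 'chr13', 'End': 39682032, 'Start': 27185654},
}
database_variants = {
    'chr13:27185659-27185659': {'Chr': 'chr13', 'End': 27185659, 'Start': 27185659},
    'chr11:56694718-56694718': {'Chr': 'chr11', 'End': 56694718, 'Start': 56694718},
    # Hypothetical variant outside every region, not part of the question's data.
    'chr13:100-100': {'Chr': 'chr13', 'End': 100, 'Start': 100},
}
matches = region_to_variant_location_match(search_regions, database_variants)
print(sorted(matches))
```

This compares every variant with every region, so for large inputs it may be worth grouping the regions by chromosome first, as the other answer does.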

Related

Check for string in list items using list as reference

I want to replace items in a list based on another list as reference.
Take this example lists stored inside a dictionary:
dict1 = {
"artist1": ["dance pop","pop","funky pop"],
"artist2": ["chill house","electro house"],
"artist3": ["dark techno","electro techno"]
}
Then, I have this list as reference:
wish_list = ["house","pop","techno"]
My result should look like this:
dict1 = {
"artist1": ["pop"],
"artist2": ["house"],
"artist3": ["techno"]
}
I want to check whether any of the items in wish_list appears inside one of the values of dict1. I experimented with regex and any().
This was an approach with just 1 list instead of a dictionary of multiple lists:
check = any(item in artist for item in wish_list)
if check == True:
    artist_genres.clear()
    artist_genres.append()
I am just beginning with Python on my own and am playing around with the SpotifyAPI to clean up my favorite songs into playlists. Thank you very much for your help!
The idea is like this,
dict1 = {"artist1": ["dance pop", "pop", "funky pop"],
         "artist2": ["house", "electro house"],
         "artist3": ["techno", "electro techno"]}
wish_list = ["house", "pop", "techno"]
dict2 = {}
for key, value in dict1.items():
    for i in wish_list:
        if i in value:
            dict2[key] = i
            break
print(dict2)
A regex is not needed; you can get away with simply iterating over the list:
wish_list = ["house","pop","techno"]
dict1 = {
"artist1": ["dance pop","pop","funky pop"],
"artist2": ["chill house","electro house"],
"artist3": ["dark techno","electro techno"]
}
dict1 = {
    # The key is reused as-is, no need to change it.
    # The new value is the wish list, filtered by whether each genre appears in any item of the current value.
    key: [genre for genre in wish_list if any(genre in item for item in value)]
    for key, value in dict1.items()  # items() yields a (key, value) tuple for each entry in the dictionary
}
This implementation relies heavily on list comprehensions (and dictionary comprehensions); you may want to read up on them if they are new to you.
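One thing to be aware of: genre in item is a substring test on strings, so "pop" would also match a genre such as "synthpop". If that is not wanted, a possible variation is to compare against whole words instead. This is only a sketch: the helper name matching_genres and the "artist4" entry are made up for illustration and are not from the question.

```python
wish_list = ["house", "pop", "techno"]
dict1 = {
    "artist1": ["dance pop", "pop", "funky pop"],
    "artist2": ["chill house", "electro house"],
    "artist3": ["dark techno", "electro techno"],
    "artist4": ["synthpop"],  # hypothetical entry, not in the question
}

def matching_genres(genres, wanted):
    '''Return the entries of wanted that occur as a whole word in any genre.'''
    words = {word for genre in genres for word in genre.split()}
    return [genre for genre in wanted if genre in words]

result = {artist: matching_genres(genres, wish_list) for artist, genres in dict1.items()}
print(result)
```

With whole-word matching, "artist4" ends up with an empty list, whereas the substring test would have given it ["pop"].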

Trying to nest a dictionary from a previous dictionary

I have the following scenario:
dictionary=[
    {'category1':'clothes', 'category2':'cheap', 'category3':10},
    {'category1':'clothes', 'category2':'normal', 'category3':20}]
I need a dictionary that looks like {'clothes': {'cheap': 10, 'normal': 20}}
All I have figured out is something that prints them individually
for i in range(len(dictionary)):
    print({dictionary[i]['category1']: {dictionary[i]['category2']: dictionary[i]['category3']}})
This gives me two separate dictionaries in the format I want, each holding only the values from the first or the second list entry, and I can't figure out how to nest them together. I have also tried
[{item['category1']: {'Attribute': attr_key, 'Value': item[attr_key]}}
 for item in dictionary for attr_key in item if attr_key != 'category1']
The result is the same: it produces more lines, whereas I need a single dictionary keyed by category1 with the other values nested inside it.
raw = {}
for item in dictionary:
    value1 = item.get('category2')
    value2 = item.get('category3')
    raw.update({value1: value2})
data = {}
data[dictionary[0].get('category1')] = raw
Output:
{'clothes': {'cheap': 10, 'normal': 20}}
This should do it.
import collections
dictionary = [
    {'category1': 'clothes', 'category2': 'cheap', 'category3': 10},
    {'category1': 'clothes', 'category2': 'normal', 'category3': 20}
]
newdict = collections.defaultdict(dict)
for item in dictionary:
    newdict[item['category1']].update({item['category2']: item['category3']})
print(newdict)
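Note that printing a defaultdict shows the defaultdict(...) wrapper around the contents. If a plain dict is preferred, one alternative is dict.setdefault, sketched here with the same input data (the name nested is just a placeholder):

```python
dictionary = [
    {'category1': 'clothes', 'category2': 'cheap', 'category3': 10},
    {'category1': 'clothes', 'category2': 'normal', 'category3': 20},
]

nested = {}
for item in dictionary:
    # setdefault creates the inner dict the first time a category1 value is seen,
    # and returns the existing inner dict on later iterations.
    nested.setdefault(item['category1'], {})[item['category2']] = item['category3']

print(nested)  # {'clothes': {'cheap': 10, 'normal': 20}}
```

Unlike the first answer, this does not assume that every entry shares the same category1 value.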

Convert a list-of-dictionaries to a dictionary

I have this list of dictionaries I want to convert to one dictionary
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
{'VPCRegion': 'us-east-1', 'VPCId': '9847'},
{'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
I want to convert it to
{'us-east-1': '12ededd4', 'us-east-1': '9847', 'us-west-2': '99485003'}
I used this function
def convert_dict(tags):
    return {tag['VPCRegion']: tag['VPCId'] for tag in tags}
but I get this output, which drops the first dictionary in the list:
{'us-east-1': '9847', 'us-west-2': '99485003'}
Perhaps a list of dictionaries would fit your need; see the code below:
[{'us-east-1': '12ededd4'}, {'us-east-1': '9847'}, {'us-west-2': '99485003'}]
To elaborate on what others commented: dictionary keys have to be unique, so new_dict below keeps only the last VPCId stored under 'us-east-1', while new_list keeps every pair. The commented-out lines at the end are an experiment in zipping list_dict back into a single dict.
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
{'VPCRegion': 'us-east-1', 'VPCId': '9847'},
{'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
def changekey(listofdict):
    new_dict = {}
    new_list = []
    for member in listofdict:
        new_key = member['VPCRegion']
        new_val = member['VPCId']
        new_dict.update({new_key: new_val})
        new_list.append({new_key: new_val})
    return new_dict, new_list
dict1, list_dict = changekey(vpcs)
print(dict1)
print(list_dict)
#dict4=dict(zip(*[iter(list_dict)]*2))
#print(dict4)
Since your output must group several values under the same name, your output will be a dict of lists, not a dict of strings.
One way to quickly do it:
import collections
def group_by_region(vpcs):
    result = collections.defaultdict(list)
    for vpc in vpcs:
        result[vpc['VPCRegion']].append(vpc['VPCId'])
    return result
The result of group_by_region(vpcs) will be a defaultdict equivalent to {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}.
For entertainment, here's a cryptic but efficient way to get this in one expression:
import itertools
{key: [rec['VPCId'] for rec in group]
 for (key, group) in itertools.groupby(vpcs, lambda vpc: vpc['VPCRegion'])}
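A caveat on the one-liner: itertools.groupby only groups consecutive records that share a key, so it gives the expected result here only because the two 'us-east-1' records sit next to each other. For input in arbitrary order, sorting by the same key first should be safer. A sketch, with the question's records deliberately reordered:

```python
import itertools
from operator import itemgetter

# Same records as in the question, reordered so the us-east-1 entries are not adjacent.
vpcs = [
    {'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
    {'VPCRegion': 'us-west-2', 'VPCId': '99485003'},
    {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
]

by_region = itemgetter('VPCRegion')
grouped = {
    key: [rec['VPCId'] for rec in group]
    for key, group in itertools.groupby(sorted(vpcs, key=by_region), key=by_region)
}
print(grouped)  # {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}
```

Without the sorted() call, the later 'us-east-1' group would overwrite the earlier one in the resulting dict.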

Extracting values from old dictionary into new dictionary using iteration

I am struggling to figure out an assignment.
The problem set up is:
I have a list containing ratios (unique_ratio = ['0.05', '0.98', '1.45']).
I have a dictionary whose k:v pairs are ratio:count, where count is the number of times the ratio appeared in a previous variable (dict = {'0.05':'5', '0.32':'72', '0.98': '21'}).
I want to iterate over the dictionary and extract the k:v pairs for the ratios which appear in the unique_ratio list. I want to store these k:v pairs in a new dictionary (frequencies = {}).
I am running Python 3.7.
I have tried iterating over the dictionary using for loop but am never able to extract the k:v pair.
I am unsure whether I should test for i in unique_ratios or i in dict
for i in dict.values():
    frequencies = { k:v for k,v in comp_dict_count.items() if 'i' in unique_ratios }
print(frequencies)
Everything I have tried has led to syntax errors. The above code leads to an empty frequencies dictionary.
You need a single dictionary comprehension for this. For better performance, you can also check membership against a set, which reduces each lookup to O(1) on average:
unique_ratio = set(['0.05', '0.98', '1.45'])
d = {'0.05':'5', '0.32':'72', '0.98': '21'}
{k:v for k,v in d.items() if k in unique_ratio}
# {'0.05': '5', '0.98': '21'}
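As for why the attempt in the question comes back empty: the condition tests the literal string 'i', which is not one of the ratios, so every pair is filtered out regardless of the loop variable. The sketch below reproduces that and shows one alternative that walks the list instead of the dictionary. The name counts is a placeholder, used to avoid shadowing the built-in dict.

```python
unique_ratio = ['0.05', '0.98', '1.45']
counts = {'0.05': '5', '0.32': '72', '0.98': '21'}

# The literal string 'i' is not one of the ratios, so this filter rejects every pair.
empty = {k: v for k, v in counts.items() if 'i' in unique_ratio}

# Alternative: walk the list and look each ratio up in the dictionary.
frequencies = {ratio: counts[ratio] for ratio in unique_ratio if ratio in counts}

print(empty)        # {}
print(frequencies)  # {'0.05': '5', '0.98': '21'}
```

Either direction gives the same pairs; iterating over whichever collection is smaller does less work.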

Iteration through nested JSON objects

I'm a beginner in Python pulling JSON data consisting of nested objects (dictionaries?). I'm trying to iterate through everything to locate a key all of them share, and select only the objects that have a specific value in that key. I spent days researching and applying and now everything is kind of blurring together in some mix of JS/Python analysis paralysis. This is the general format for the JSON data:
{
"things":{
"firstThing":{
"one":"x",
"two":"y",
"three":"z"
},
"secondThing":{
"one":"a",
"two":"b",
"three":"c"
},
"thirdThing":{
"one":"x",
"two":"y",
"three":"z"
}
}
}
In this example I want to isolate the dictionaries where two == y. I'm unsure if I should be using:
JSON selection (things.things[i].two)
for loop through things, then things[i] looking for two
k/v when I have 3 sets of keys
Can anyone point me in the right direction?
Assuming this is only ever one level deep (things), and you want a 'duplicate' of this dictionary with only the matching child dicts included, then you can do this with a dictionary comprehension:
data = {
"things":{
"firstThing":{
"one":"x",
"two":"y",
"three":"z"
},
"secondThing":{
"one":"a",
"two":"b",
"three":"c"
},
"thirdThing":{
"one":"x",
"two":"y",
"three":"z"
}
}
}
print({"things": {k:v for k, v in data['things'].items() if 'two' in v and v['two'] == 'y'}})
Since you've tagged this with python I assume you'd prefer a python solution. If you know that your 'two' key (whatever it is) is only present at the level of objects that you want, this might be a nice place for a recursive solution: a generator that takes a dictionary and yields any sub-dictionaries that have the correct key and value. This way you don't have to think too much about the structure of your data. Something like this will work, if you're using at least Python 3.3:
def findSubdictsMatching(target, targetKey, targetValue):
    if not isinstance(target, dict):
        # base case
        return
    # check "in" rather than get() to allow None as target value
    if targetKey in target and target[targetKey] == targetValue:
        yield target
    else:
        for key, value in target.items():
            yield from findSubdictsMatching(value, targetKey, targetValue)
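A usage sketch for the recursive generator, run against the sample data from the question. The function is repeated here, with the lookup written as target[targetKey], so that the block runs on its own; the variable names data and matches are placeholders.

```python
def findSubdictsMatching(target, targetKey, targetValue):
    if not isinstance(target, dict):
        # base case: plain values cannot contain sub-dictionaries
        return
    if targetKey in target and target[targetKey] == targetValue:
        yield target
    else:
        for value in target.values():
            yield from findSubdictsMatching(value, targetKey, targetValue)

data = {
    "things": {
        "firstThing": {"one": "x", "two": "y", "three": "z"},
        "secondThing": {"one": "a", "two": "b", "three": "c"},
        "thirdThing": {"one": "x", "two": "y", "three": "z"},
    }
}

# The generator is lazy, so wrap it in list() to collect every match.
matches = list(findSubdictsMatching(data, "two", "y"))
print(matches)
```

Note that the search stops descending once a dictionary matches, so matching dictionaries nested inside another match would not be reported.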
This code lets you add the objects with "two":"y" to a list:
import json
m = '{"things":{"firstThing":{"one":"x","two":"y","three":"z"},"secondThing":{"one":"a","two":"b","three":"c"},"thirdThing":{"one":"x","two":"y","three":"z"}}}'
l = json.loads(m)
y_objects = []
for key in l["things"]:
    l_2 = l["things"][key]
    for key_1 in l_2:
        if key_1 == "two":
            if l_2[key_1] == 'y':
                y_objects.append(l_2)
print(y_objects)
Console:
[{'one': 'x', 'two': 'y', 'three': 'z'}, {'one': 'x', 'two': 'y', 'three': 'z'}]
