Comparing values of different python dictionaries with different keys & structures - python

This is more of a theoretical question than anything. I have 3 dictionaries that have different structure/keys but the values may be the same. E.g.
dict1 = {u'd': {u'results': [{u'number': u'1', u'dispURL': u'www.site.com'},
                             {u'number': u'2', u'dispURL': u'www.othersite.com'}]
                }}
dict2 = {u'result_page': [{u'url': u'www.greatsite.com', u'pos': u'1'},
                          {u'url': u'www.site.com', u'pos': u'2'}]}
dict3 = {u'hits': [{u'displayurl': u'www.othersite.com', u'index': u'1'},
                   {u'displayurl': u'www.site.com', u'index': u'2'}]
         }
Note how dict1 has {u'd': before the {u'results':, and how the individual entries are enclosed in square brackets (lists). Also note how the key/value layout is different in dict1, with the number key coming before the url instead of after as in dict2 and dict3, and how the key names are different in each dictionary.
I have 3 large dictionaries like this and I need to compare them by the position of each url for scoring purposes. i.e.
if dict1[www.site.com index] > dict2[www.site.com index]:
    dict1[www.site.com] score +1
I know the code snippet isn't correct; it's just for illustration. What do I need to do with the dictionaries to be able to perform a comparison like this? I was thinking of taking the required data from each dictionary and putting it into 3 new dictionaries with uniform keys and structure, or even 1 new dictionary. But my program has to be computationally quite fast, so I don't know how this would affect it. Would any of you more experienced Python programmers like to weigh in on this?

The most effective way to approach this is to convert your data into canonical {url:value} format.
For example:
dict1 = {data[u'dispURL']:int(data[u'number']) for data in dict1[u'd'][u'results']}
dict2 = {data[u'url']:int(data[u'pos']) for data in dict2[u'result_page']}
dict3 = {data[u'displayurl']:int(data[u'index']) for data in dict3[u'hits']}
Now they look like
dict1 = {u'www.othersite.com': 2, u'www.site.com': 1}
dict2 = {u'www.greatsite.com': 1, u'www.site.com': 2}
dict3 = {u'www.othersite.com': 1, u'www.site.com': 2}
and your comparison looks like
for url in dict1:
    # skip URLs missing from dict2, otherwise dict2[url] raises KeyError
    if url in dict2 and dict1[url] > dict2[url]:
        pass  # do something
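To tie this back to the scoring idea in the question, here is a minimal sketch, assuming the dictionaries have already been converted to the {url: position} form shown above. The function name score_against and the rule "one point whenever the position number is larger than in another ranking" are assumptions read off the question's pseudocode, not a fixed method.

```python
def score_against(ranking, others):
    """Return {url: score} for every url in `ranking`.

    A url earns one point for each dictionary in `others` that also
    contains it with a smaller position number (hypothetical rule).
    """
    scores = {}
    for url, position in ranking.items():
        scores[url] = sum(
            1 for other in others
            if url in other and position > other[url]
        )
    return scores


dict1 = {u'www.othersite.com': 2, u'www.site.com': 1}
dict2 = {u'www.greatsite.com': 1, u'www.site.com': 2}
dict3 = {u'www.othersite.com': 1, u'www.site.com': 2}

scores = score_against(dict1, [dict2, dict3])
print(scores)
```

Each lookup is a constant-time dictionary access on average, so the cost of scoring grows with the number of URLs times the number of dictionaries compared.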

Related

Refactoring: Merging two dictionaries, but ignoring None values

I need a bit of Python refactoring advice.
I have a list of dict objects (new_monitors), which can be empty. When there are new monitors, however, I want to add a bunch of fields to those monitors.
For each monitor, I would like to append all not None fields from the DogDump.HIDE_FIELDS dict:
if new_monitors:
    for monitor in new_monitors:
        for key, value in DogDump.HIDE_FIELDS.items():
            if value:
                monitor[key] = value
Note: This snippet below worked very well, but it included all of the None fields. I do not want the None fields!
if new_monitors:
    for monitor in new_monitors:
        monitor.update(DogDump.HIDE_FIELDS)
How can I refactor this snippet so that it looks more Pythonic but still maintains good readability?
Not sure which is really the most "pythonic" way of handling your need to filter DogDump.HIDE_FIELDS dict before adding the relevant key / value pairs to your monitor dict. One way would be to perform the "filtering" with dict comprehension.
Also, I would think that you could "filter" your DogDump.HIDE_FIELDS dict before your loop rather than repeating this operation for each loop iteration (unless there are other operations taking place that mutate DogDump.HIDE_FIELDS while you are iterating).
Example of "filtering" with dict comprehension (dump refers to your DogDump.HIDE_FIELDS dict):
monitor = {'key': 'value'}
dump = {'a': 1, 'b': None}
dump_filtered = {k: v for k, v in dump.items() if v is not None}
monitor.update(dump_filtered)
print(monitor)
# OUTPUT
# {'key': 'value', 'a': 1}
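Putting the two suggestions together (filter once, then update each monitor), a minimal sketch might look like the following. HIDE_FIELDS here is a made-up stand-in for DogDump.HIDE_FIELDS, and the monitor contents are invented for illustration. The filter uses an explicit "is not None" test so that other falsy values such as 0 or an empty string are kept.

```python
# Hypothetical stand-in for DogDump.HIDE_FIELDS
HIDE_FIELDS = {'hidden': True, 'owner': None, 'priority': 0}

# Hypothetical monitors; the list may also be empty
new_monitors = [{'name': 'cpu'}, {'name': 'disk'}]

# Filter once, outside the loop, keeping everything that is not None
fields_to_add = {k: v for k, v in HIDE_FIELDS.items() if v is not None}

# Looping over an empty list is a no-op, so no "if new_monitors:" guard is needed
for monitor in new_monitors:
    monitor.update(fields_to_add)

print(new_monitors)
```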

comparing dicts based on key and create a new dict with the missing key, with a 0 value in python3

I have three dicts say dict_a, dict_b and dict_c
dict_a={'key1':a, 'key2':b, 'key3':c}
dict_b={'key1':a, 'key2':b, 'key4':d}
dict_c={'key3':c, 'key1':a, 'key5':e}
Here the keys that are represented overall are: key1, key2, key3, key4, key5 with their respective values.
What I am looking for is, e.g., to create new dicts (or keep the existing ones) and fill in the keys that each dict is missing, compared to the overall set of keys, with 0 values, e.g.:
dict_a={'key1':a, 'key2':b, 'key3':c, 'key4':0, 'key5':0}
dict_b={'key1':a, 'key2':b, 'key3':0, 'key4':d, 'key5':0}
dict_c={'key1':a, 'key2':0, 'key3':c, 'key4':0, 'key5':e}
I am experienced in C, and based on my limited knowledge of Python I would run a nested for-loop with a bunch of if/else statements to solve this. However, I know that Python has tools such as zip, lambda etc. to nail this in a simple way. But I don't know where to begin, or even whether there is a library that can solve this.
It doesn't matter if I create new dicts with the missing keys or simply replace the existing dicts; both are usable.
You could create a union of your keys and then just create new dictionaries containing all keys that you update with your previous values.
all_keys = set(dict_a).union(dict_b, dict_c)
new_dict_a = dict.fromkeys(all_keys, 0)
new_dict_a.update(dict_a)
print(new_dict_a)
# {'key1': 'a', 'key2': 'b', 'key3': 'c', 'key4': 0, 'key5': 0}  (key order may vary)
The same for the other dicts:
new_dict_b = dict.fromkeys(all_keys, 0)
new_dict_b.update(dict_b)
new_dict_c = dict.fromkeys(all_keys, 0)
new_dict_c.update(dict_c)
The dict.fromkeys creates a new dictionary containing all the specified keys with a default value (in this case 0) and then the update overwrites the values that were already in the original dictionary.
You could do this:
all_keys = set(dict_a).union(dict_b, dict_c)
default = 0
dct_a = {key: dict_a.get(key, default) for key in all_keys}
print(dct_a)  # {'key2': 'b', 'key4': 0, 'key5': 0, 'key1': 'a', 'key3': 'c'}  (key order may vary)
...and so on for the other dicts.
Once you have collected all_keys, it's just a one-liner to create the new dictionary. dict.get returns either the value that belongs to the key, if it exists, or default otherwise.
keys = set(dict_a).union(dict_b, dict_c)
for a in keys:
    if a not in dict_a:
        dict_a[a] = 0
    if a not in dict_b:
        dict_b[a] = 0
    if a not in dict_c:
        dict_c[a] = 0
Using sets to avoid duplicates, we collect all the keys there are. Then we check whether each of these keys is missing from any of the dicts and, if it is, we add it with the value 0. This will give your desired output.
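The approaches above can be generalised to any number of dicts. The following is a minimal sketch; the function name fill_missing is made up for illustration, and it assumes the keys are mutually comparable (e.g. all strings) so that they can be sorted for a stable output order. The values 'a' to 'e' are placeholder strings standing in for the question's variables.

```python
def fill_missing(dicts, default=0):
    """Return new dicts that all share the union of keys found in `dicts`.

    Keys absent from a given dict are filled in with `default`.
    The original dicts are left untouched.
    """
    all_keys = set().union(*dicts)  # iterating a dict yields its keys
    return [
        {key: d.get(key, default) for key in sorted(all_keys)}
        for d in dicts
    ]


dict_a = {'key1': 'a', 'key2': 'b', 'key3': 'c'}
dict_b = {'key1': 'a', 'key2': 'b', 'key4': 'd'}
dict_c = {'key3': 'c', 'key1': 'a', 'key5': 'e'}

new_a, new_b, new_c = fill_missing([dict_a, dict_b, dict_c])
print(new_c)
```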

Trying to grow a nested dictionary by adding more key:value pairs

I am facing some trouble with trying to add more key:value pairs to a dictionary object that is itself nested within another dictionary object. Also, the usual way of doing dict[key] = value to assign additional key:value pairs to the dictionary is not suitable for my case here (I'll explain why later below), and thus this makes my objective a lot more challenging to achieve.
I'll illustrate what I'm trying to achieve with some statements from my source code.
First, I have a dictionary object that contains nesting:
environment = { 'v' :
{
'SDC_PERIOD':'{period}s'.format(period = self.period),
'FAMILY':'{legup_family}s'.format(legup_family = self.legup_family),
'DEVICE_FAMILY':'"{fpga_family}s"'.format(fpga_family = self.fpga_family)
}
}
and then following this line, I will do an if test that, if passed, will require me to add this other dictionary:
environment_add = { 'v' : {'LM_LICENSE_FILE' : '1800#adsc-linux'} ,
'l' : 'quartus_full' }
to ultimately form this complete dictionary:
environment = { 'v' :
{
'SDC_PERIOD':'{period}s'.format(period = self.period),
'FAMILY':'{legup_family}s'.format(legup_family = self.legup_family),
'DEVICE_FAMILY':'"{fpga_family}s"'.format(fpga_family = self.fpga_family),
'LM_LICENSE_FILE' : '1800#adsc-linux'
} ,
'l' : 'quartus_full'
}
As you can see, if I were to try and assign a new key:value pair using the dict[key] = value syntax, it would not work for me because it would end up either creating a new key:value pair for me, or overwriting the existing dictionary object and the key:value pairs that are nested under the 'v' key.
So far, in order to accomplish the creation of the dictionary, I've been using the following:
environment = """{ v: {'SDC_PERIOD':'%(period)s','FAMILY':'%(legup_family)s','DEVICE_FAMILY':'"%(fpga_family)s"'}}""" % self
if self.require_license:  # this is the if statement test that I was referring to earlier
    environment = environment.replace('}', '')
    environment += """ ,'LM_LICENSE_FILE':'1800#adsc-linux'}, 'l': 'quartus_full'}"""
and then obtaining the dictionary object later with:
import ast
env_dict = ast.literal_eval(environment)
which effectively converts the environment string into a dictionary object stored under a new variable name of env_dict.
My teammates think that this is much too overkill, especially since the environment or env_dict object will be parsed in 2 separate modules later on. In the first module, the key-value pairs will be broken up and reconstructed to form strings that look like '-v SDC_PERIOD=2500s, LM_LICENSE_FILE=1800#adsc-linux' , while in the second module, the dictionary nested under the 'v' key (of the environment/env_dict dictionary object) will be extracted out and directly fed as an argument to a function that accepts a mapping object.
So as you can see, there is quite a lot of precise parsing required to do the job, and although my method fulfills the objective, it is not accepted by my team and they think that there must be a better way to do this directly from environment being a dictionary object and not a string object.
Thank you very much for studying my detailed post, and I will greatly appreciate any help or suggestions to move forward on this!
for k, v in environment_add.items():  # .iteritems() in Python 2
    if k in environment:
        environment[k].update(v)
    else:
        environment[k] = v
That is, for each item to add, check if it exists, and update it if so, or simply create it. This assumes the items being added, if they exist, will be dicts (you can't update a string like quartus_full).
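If the nesting can go deeper than one level, or if a key may hold a plain string on one side, a recursive merge is one way to lift that assumption. The sketch below is illustrative only: deep_update is a made-up helper name, and the 'FAMILY' value is a placeholder, since the real values come from attributes on self in the question.

```python
def deep_update(target, extra):
    """Merge `extra` into `target` in place, recursing into nested dicts.

    Non-dict values in `extra` simply overwrite (or create) the key.
    """
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


# Placeholder values standing in for the formatted strings in the question
environment = {'v': {'SDC_PERIOD': '2500s', 'FAMILY': 'some_family'}}
environment_add = {'v': {'LM_LICENSE_FILE': '1800#adsc-linux'},
                   'l': 'quartus_full'}

deep_update(environment, environment_add)
print(environment)
```

Because environment stays a real dictionary throughout, the nested dictionary under 'v' can be passed directly to code that expects a mapping, with no string parsing involved.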
Why not just use update?
In [4]: dict_ = {"a": {"b": 2, "c": 3}}
In [5]: dict_["a"].update(d=4)
In [6]: dict_
Out[6]: {'a': {'b': 2, 'c': 3, 'd': 4}}

Accessing elements of Python dictionary by index

Consider a dict like
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
How do I access, for instance, a particular element of this dictionary?
For instance, I would like to print, after some formatting, the first element of Apple, which in our case is 'American' only.
Additional information
The above data structure was created by parsing an input file in a python function. Once created however it remains the same for that run.
I am using this data structure in my function.
So if the file changes, the next time this application is run the contents of the file are different and hence the contents of this data structure will be different but the format would be the same.
So you see, in my function I don't know whether the first element in Apple is 'American' or anything else, so I can't directly use 'American' as a key.
Given that it is a dictionary you access it by using the keys. Getting the dictionary stored under "Apple", do the following:
>>> mydict["Apple"]
{'American': '16', 'Mexican': 10, 'Chinese': 5}
And getting how many of them are American (16), do like this:
>>> mydict["Apple"]["American"]
'16'
If the question is, if I know that I have a dict of dicts that contains 'Apple' as a fruit and 'American' as a type of apple, I would use:
myDict = {'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
          'Grapes':{'Arabian':'25','Indian':'20'} }
print(myDict['Apple']['American'])
as others suggested. If instead the question is that you don't know whether 'Apple' as a fruit and 'American' as a type of 'Apple' exist when you read an arbitrary file into your dict of dict data structure, you could do something like:
print([ftype['American'] for f, ftype in myDict.items() if f == 'Apple' and 'American' in ftype])
or better yet, so you don't unnecessarily iterate over the entire dict of dicts if you know that only Apple has the type American:
if 'Apple' in myDict:
    if 'American' in myDict['Apple']:
        print(myDict['Apple']['American'])
In all of these cases it doesn't matter what order the dictionaries actually store the entries. If you are really concerned about the order, then you might consider using an OrderedDict:
http://docs.python.org/dev/library/collections.html#collections.OrderedDict
As I understand your description, you only know that your parser will give you a dictionary whose values are dictionaries too, like this:
sampleDict = {
"key1": {"key10": "value10", "key11": "value11"},
"key2": {"key20": "value20", "key21": "value21"}
}
So you have to iterate over your parent dictionary. If you want to print out or access all first dictionary keys in sampleDict.values() list, you may use something like this:
for key, value in sampleDict.items():
    print value.keys()[0]
If you want to just access first key of the first item in sampleDict.values(), this may be useful:
print sampleDict.values()[0].keys()[0]
If you use the example you gave in the question, I mean:
sampleDict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'}
}
The output for the first code is:
American
Indian
And the output for the second code is:
American
EDIT 1:
The code examples above do not work in Python 3 and later, because Python 3 changed the return type of the keys() and values() methods from a list to view objects (dict_keys and dict_values). View objects do not support indexing, but they are iterable. So you need to change the code above as follows:
First One:
for key, value in sampleDict.items():
    print(list(value.keys())[0])
Second One:
print(list(list(sampleDict.values())[0].keys())[0])
I know this is 8 years old, but no one seems to have actually read and answered the question.
You can call .values() on a dict to get a list of the inner dicts and thus access them by index.
>>> mydict = {
... 'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
... 'Grapes':{'Arabian':'25','Indian':'20'} }
>>> mylist = list(mydict.values())
>>> mylist[0]
{'American': '16', 'Mexican': 10, 'Chinese': 5}
>>> mylist[1]
{'Arabian': '25', 'Indian': '20'}
>>> myInnerList1 = list(mylist[0].values())
>>> myInnerList1
['16', 10, 5]
>>> myInnerList2 = list(mylist[1].values())
>>> myInnerList2
['25', '20']
As a bonus, I'd like to offer kind of a different solution to your issue. You seem to be dealing with nested dictionaries, which is usually tedious, especially when you have to check for existence of an inner key.
There are some interesting libraries regarding this on pypi, here is a quick search for you.
In your specific case, dict_digger seems suited.
>>> import dict_digger
>>> d = {
...     'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
...     'Grapes':{'Arabian':'25','Indian':'20'}
... }
>>> print(dict_digger.dig(d, 'Apple','American'))
16
>>> print(dict_digger.dig(d, 'Grapes','American'))
None
You can use mydict['Apple'].keys()[0] (Python 2; in Python 3, list(mydict['Apple'])[0]) in order to get the first key in the Apple dictionary, but there's no guarantee that it will be American. Before Python 3.7, the order of keys in a dictionary can change depending on the contents of the dictionary and the order the keys were added.
You can't rely on order of dictionaries, but you may try this:
mydict['Apple'].items()[0][0]
If you want the order to be preserved you may want to use this:
http://www.python.org/dev/peps/pep-0372/#ordered-dict-api
A simple example of how to access elements in a dictionary:
Create a dictionary
d = {'dog' : 'bark', 'cat' : 'meow' }
print(d.get('cat'))   # meow
print(d.get('lion'))  # None (missing key, no default given)
print(d.get('lion', 'Not in the dictionary'))  # Not in the dictionary
print(d.get('lion', 'NA'))  # NA
print(d.get('dog', 'NA'))   # bark
Few people appear, despite the many answers to this question, to have pointed out that dictionaries are un-ordered mappings, and so (until the blessing of insertion order with Python 3.7) the idea of the "first" entry in a dictionary literally made no sense. And even an OrderedDict can only be accessed by numerical index using such uglinesses as mydict[mydict.keys()[0]] (Python 2 only, since in Python 3 keys() returns a non-subscriptable view object.)
From 3.7 onwards, and in practice in 3.6 as well (the new behaviour was introduced then, but not included as part of the language specification until 3.7), iteration over the keys, values or items of a dict will yield the least-recently inserted objects first. This does not apply to sets, which remain unordered. There is still no simple way to access dict entries by numerical index of insertion.
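That said, when insertion order is guaranteed (Python 3.7+), the n-th key can be picked out of the iteration order without building a full list. The following is a minimal sketch; nth_key is a made-up helper name, not a built-in, and lookup still takes time proportional to n.

```python
from itertools import islice


def nth_key(d, n):
    """Return the n-th key (0-based) of `d` in iteration order.

    Raises IndexError if the dictionary has fewer than n + 1 keys.
    """
    try:
        # islice skips the first n keys lazily; next() takes the following one
        return next(islice(d, n, None))
    except StopIteration:
        raise IndexError("dictionary has fewer than %d keys" % (n + 1)) from None


mydict = {
    'Apple': {'American': '16', 'Mexican': 10, 'Chinese': 5},
    'Grapes': {'Arabian': '25', 'Indian': '20'},
}

first_apple_type = nth_key(mydict['Apple'], 0)
print(first_apple_type)
print(mydict['Apple'][first_apple_type])
```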
As to the question of selecting and "formatting" items, if you know the key you want to retrieve in the dictionary you would normally use the key as a subscript to retrieve it (my_var = mydict['Apple']).
If you really do want to be able to index the items by entry number (ignoring the fact that a particular entry's number will change as insertions are made) then the appropriate structure would probably be a list of two-element tuples. Instead of
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
you might use:
mylist = [
    ('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}),
    ('Grapes', {'Arabian': '25', 'Indian': '20'})
]
Under this regime the first entry is mylist[0] in classic list-indexed form, and its value is ('Apple', {'American':'16', 'Mexican':10, 'Chinese':5}). You could iterate over the whole list as follows:
for (key, value) in mylist:  # unpacks to avoid tuple indexing
    if key == 'Apple':
        if 'American' in value:
            print(value['American'])
but if you know you are looking for the key "Apple", why wouldn't you just use a dict instead?
You could introduce an additional level of indirection by cacheing the list of keys, but the complexities of keeping two data structures in synchronisation would inevitably add to the complexity of your code.
With the following small function, digging into a tree-shaped dictionary becomes quite easy:
def dig(tree, path):
    # walk down the tree one dotted path component at a time
    for key in path.split("."):
        if isinstance(tree, dict) and key in tree:
            tree = tree[key]
        else:
            return None
    return tree
Now, dig(mydict, "Apple.Mexican") returns 10, while dig(mydict, "Grapes") yields the subtree {'Arabian':'25','Indian':'20'}. If a key is not contained in the dictionary, dig returns None.
Note that you can easily change (or even parameterize) the separator char from '.' to '/', '|' etc.
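A minimal sketch of the parameterised variant, assuming the separator is passed as a keyword argument named sep (a name chosen here for illustration). It tests membership with "key in tree", so stored values that happen to be falsy, such as 0 or an empty string, are still returned rather than treated as missing.

```python
def dig(tree, path, sep="."):
    """Follow `path` (keys joined by `sep`) down a nested dict.

    Returns the value found, or None if any key along the way is missing.
    """
    for key in path.split(sep):
        if isinstance(tree, dict) and key in tree:
            tree = tree[key]
        else:
            return None
    return tree


mydict = {
    'Apple': {'American': '16', 'Mexican': 10, 'Chinese': 5},
    'Grapes': {'Arabian': '25', 'Indian': '20'},
}

print(dig(mydict, "Apple.Mexican"))
print(dig(mydict, "Grapes/Indian", sep="/"))
print(dig(mydict, "Apple.Russian"))
```

One limitation of this design is that keys containing the separator character cannot be addressed; choosing a separator that never occurs in the keys avoids that.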
mydict = {
'Apple': {'American':'16', 'Mexican':10, 'Chinese':5},
'Grapes':{'Arabian':'25','Indian':'20'} }
for n in mydict:
    print(mydict[n])

Inverse Dict in Python

I am trying to create a new dict using a list of values of an existing dict as individual keys.
So for example:
dict1 = dict({'a':[1,2,3], 'b':[1,2,3,4], 'c':[1,2]})
and I would like to obtain:
dict2 = dict({1:['a','b','c'], 2:['a','b','c'], 3:['a','b'], 4:['b']})
So far, I've not been able to do this in a very clean way. Any suggestions?
If you are using Python 2.5 or above, use the defaultdict class from the collections module; a defaultdict automatically creates values on the first access to a missing key, so you can use that here to create the lists for dict2, like this:
from collections import defaultdict
dict1 = dict({'a':[1,2,3], 'b':[1,2,3,4], 'c':[1,2]})
dict2 = defaultdict(list)
for key, values in dict1.items():
    for value in values:
        # The list for dict2[value] is created automatically
        dict2[value].append(key)
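Note that the order of the lists in dict2 follows the iteration order of dict1, which is only guaranteed to be insertion order from Python 3.7 onwards; in earlier versions dictionaries do not order their key-value pairs.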
If you want an ordinary dict out at the end that will raise a KeyError for missing keys, just use dict2 = dict(dict2) after the above.
Notice that you don't need the dict in your examples: the {} syntax gives you a dict:
dict1 = {'a':[1,2,3], 'b':[1,2,3,4], 'c':[1,2]}
Other way:
dict2 = {}
for x, y in dict1.items():
    for i in y:
        dict2.setdefault(i, []).append(x)
