Improve multidimensional dictionary loop efficiency

I'm taking my first steps with Python and trying to iterate over a multidimensional dictionary while checking that a key exists and that its value is not None.
To be clear, the code works, but I feel there should be a better way to implement it:
for key in sites[website_name]:
    if 'revenue' in sites[website_name][key]:
        if sites[website_name][key]['revenue'] is not None:
            totalSiteIncome += sites[website_name][key]['revenue']
        else:
            sites[website_name][key]['revenue'] = 0
    if 'spent' in sites[website_name][key]:
        if sites[website_name][key]['spent'] is not None:
            totalSiteSpent += sites[website_name][key]['spent']
        else:
            sites[website_name][key]['spent'] = 0
Any idea whether and how I can improve the loop?
Keep in mind that I'm looking for best practice here. Thanks!

Posting a sample of the sites[website_name] dictionary would really help, but if I understand you correctly, this is how I would do it:
totalSiteIncome = sum(x.get('revenue') or 0 for x in sites[website_name].values())
totalSiteSpent = sum(x.get('spent') or 0 for x in sites[website_name].values())
As mentioned in the comments, .get() lets you not care whether the key is there: it returns None (or a default you pass) when the key is missing. The "or 0" then turns both a missing key and an explicit None into 0. Iterating over .values() yields the inner dictionaries rather than their keys. Other than that, it is just a generator expression passed to sum().
In English, the first line reads:
"Get me the revenue, if it exists, from every entry in this site's dictionary and sum them. If the revenue is not logged, assume 0."
As a side note, in your code totalSiteIncome and totalSiteSpent have to be initialised before the loop, otherwise it would not run. In my version they don't have to be, and if they are, their values will be overwritten.
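Since the question does not show the real data, here is a minimal Python 3 sketch on made-up sample data. The shape of sites, the keys such as "2020-01" and the helper name total are assumptions for illustration only. It shows that a missing key and an explicit None are both counted as 0.

```python
# Hypothetical sample data: the original post does not show the real structure.
sites = {
    "example.com": {
        "2020-01": {"revenue": 10.0, "spent": 4.0},
        "2020-02": {"revenue": None, "spent": 1.5},
        "2020-03": {"spent": None},
    }
}
website_name = "example.com"


def total(entries, field):
    """Sum one field across the inner dicts, counting missing or None as 0."""
    return sum(entry.get(field) or 0 for entry in entries.values())


total_site_income = total(sites[website_name], "revenue")
total_site_spent = total(sites[website_name], "spent")
print(total_site_income, total_site_spent)  # 10.0 5.5
```

Unlike the loop in the question, this sketch does not write 0 back into the dictionary; it only reads from it.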

The following approach could be useful if you need a solution that is agnostic to the nesting level of the target fields (revenue and spent). It could also be useful if you want to keep adding fields, since you don't need to repeat the code for each new one.
That said, my suggestion has some downsides compared to your solution: it uses recursion, which is less readable, and a flag (return_totals), which feels hacky. Just adding my 5 cents to the brainstorm.
def _update(input_dict, target_fields, totals=None, return_totals=True):
    # Create a fresh totals dict for each top-level call; a mutable default
    # argument ({}) would be shared between calls.
    if totals is None:
        totals = {}
    result = {}
    for k, v in input_dict.iteritems():
        if isinstance(v, dict):
            # Recurse into nested dictionaries, accumulating into the same totals.
            r = _update(input_dict[k], target_fields, totals, return_totals=False)
            result[k] = r
        else:
            if k in target_fields:
                # Replace None with 0 and add the value to the running total.
                result[k] = input_dict[k] or 0
                if k not in totals:
                    totals[k] = 0
                totals[k] += result[k]
            else:
                result[k] = input_dict[k]
    if return_totals:
        return {
            'updated_dictionary': result,
            'totals': totals,
        }
    return result

new_sites = _update(input_dict=sites, target_fields=['revenue', 'spent'])
print 'updated_dictionary:'
print new_sites['updated_dictionary']
print 'totals:'
print new_sites['totals']
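If only the totals are needed, the flag can be avoided. The following is a rough Python 3 sketch, not a drop-in replacement for the function above: the name field_totals and the sample data are assumptions, and it does not return an updated copy of the dictionary.

```python
from collections import Counter


def field_totals(node, target_fields):
    """Recursively sum the target fields found at any depth of a nested dict."""
    totals = Counter()
    for key, value in node.items():
        if isinstance(value, dict):
            # Merge the totals of the nested dictionary into this level's totals.
            totals.update(field_totals(value, target_fields))
        elif key in target_fields:
            totals[key] += value or 0  # treat None as 0
    return dict(totals)


# Hypothetical sample data with fields at the same depth but missing in places.
sites = {
    "a.com": {"jan": {"revenue": 5, "spent": None}, "feb": {"revenue": 7}},
    "b.com": {"jan": {"spent": 2, "clicks": 9}},
}
print(field_totals(sites, {"revenue", "spent"}))  # {'revenue': 12, 'spent': 2}
```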

Dictionaries should be iterated using their iteration methods: dict.keys(), dict.values() and dict.items().
dict.keys():
d = {'a': '1', 'b': '2'}
for key in d.keys():
    print(key)
Output:
a
b
dict.values():
d = {'a': '1', 'b': '2'}
for value in d.values():
    print(value)
Output:
1
2
dict.items():
d = {'a': '1', 'b': '2'}
for key, value in d.items():
    print(key + " -> " + value)
Output:
a -> 1
b -> 2
NOTE:
These methods work in both Python 2 and Python 3, but only in Python 3 are they lazy (improving efficiency); in Python 2 they build full lists. The lazy iterators in Python 2 are called dict.iterkeys(), dict.itervalues() and dict.iteritems() respectively.

Related

if statement doesn't work properly in python

d={1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
value = raw_input("Choose a value to be searched: ")
data = ""
if value in d:
    data = d.keys["value"]
    print(data)
else:
    print "There isn't such value in the dictionary"
So I enter 'a' and I want to get the key 1,
but it skips the data = d.keys["value"] and print(data) lines and prints the message from the else branch.
What have I done wrong?

Containment checks for dict check the keys, not the values, and 'a' is a value in the dict, not a key.
The simplest fix would be to change your test to:
if value in d.viewvalues(): # d.values() on Python 3
but that's still sub-optimal; you can't perform efficient (O(1)) lookups in the values of a dict (nor can you do d.keys[value] as you seem to think you can; you'd have to perform a second linear scan to find the key, or perform a more complicated single scan to determine if the value exists and pull the key at the same time).
Really though, it seems like you want your dictionary reversed, with the keys as values and vice-versa. Doing it this way:
d = {1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
d_inv = {v: k for k, v in d.items()} # Make inverted version of d
value = raw_input("Choose a value to be searched: ")
if value in d_inv:
    data = d_inv[value]
    print(data)
else:
    print "There isn't such value in the dictionary"
you can perform the containment check and lookup efficiently (if d isn't otherwise needed, you can just replace d with the same structure as d_inv and use d instead of d_inv uniformly).
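One caveat with a plain inversion: in the example dictionary several keys share a value ('a' appears under 1, 4 and 7), so a one-to-one inverted dict keeps only one key per value. A possible Python 3 sketch that keeps all of them follows; the names keys_by_value and find_keys are made up for illustration.

```python
from collections import defaultdict

d = {1: 'a', 2: 'b', 3: 'c', 4: 'a', 5: 'd', 6: 'e', 7: 'a', 8: 'b'}

# Map each value to the list of every key that holds it.
keys_by_value = defaultdict(list)
for key, val in d.items():
    keys_by_value[val].append(key)


def find_keys(value):
    """Return all keys whose value equals `value` (empty list if none)."""
    # .get() avoids inserting an empty list for values that were never seen.
    return keys_by_value.get(value, [])


print(find_keys('a'))  # [1, 4, 7]
print(find_keys('q'))  # []
```

Building the index is a single linear pass; each lookup afterwards is a constant-time dictionary access.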

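As stated, you need to test against the values. Here's an alternative:
if any(v == value for v in d.values()):
Then d.keys["value"] has to become a search for the matching key, for example next(k for k, v in d.items() if v == value).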

You could do something like this:
d={1:'a', 2:'b', 3:'c', 4:'a', 5:'d', 6:'e', 7:'a', 8:'b'}
value = 'a'
data = [key for key, val in d.items() if val == value]
if len(data) > 0:
    print(data)
else:
    print "There isn't such value in the dictionary"
Then you would get the result
[1, 4, 7]

Merging values from 2 dictionaries (Python)

(I'm new to Python!)
Trying to figure out this homework question:
The function will take as input two dictionaries, each mapping strings to integers. The function will return a dictionary that maps strings from the two input dictionaries to the sum of the integers in the two input dictionaries.
My idea was this:
def add(dicA,dicB):
    dicA = {}
    dicB = {}
    newdictionary = dicA.update(dicB)
However, that brings back None.
In the professor's example:
print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
the output is:
{'alice':15, 'Bob':103, 'Carlie':2}
My issue really is that I don't understand how to add up the values from each dictionary. I know that '+' is not supported for dictionaries. I'm not looking for anyone to do my homework for me, but any suggestions would be very much appreciated!

From the documentation:
update([other])
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
You don't want to replace key/value pairs, you want to add the values for similar keys. Go through each dictionary and add each value to the relevant key:
def add(dicA,dicB):
    result = {}
    for d in dicA, dicB:
        for key in d:
            result[key] = result.get(key, 0) + d[key]
    return result
result.get(key, 0) will retrieve the value of an existing key or produce 0 if key is not yet present.

First of all, a.update(b) updates a in place, and returns None.
Secondly, a.update(b) wouldn't help you sum the values; for every key the two dictionaries share, it just overwrites the value in a with the one from b:
>>> a = {'alice':10, 'Bob':3, 'Carlie':1}
>>> b = {'alice':5, 'Bob':100, 'Carlie':1}
>>> a.update(b)
>>> a
{'alice': 5, 'Carlie': 1, 'Bob': 100}
It'd be easiest to use collections.Counter to achieve the desired result. As a plus, it does support addition with +:
from collections import Counter
def add(dicA, dicB):
    return dict(Counter(dicA) + Counter(dicB))
This produces the intended result:
>>> print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
{'alice': 15, 'Carlie': 2, 'Bob': 103}
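One thing to be aware of with this approach: adding two Counter objects with + keeps only the keys whose total is positive, so entries that sum to zero or below disappear. The Python 3 sketch below contrasts this with Counter.update, which keeps them. The function names and the negative sample values are made up for illustration.

```python
from collections import Counter


def add_with_plus(a, b):
    # Counter addition keeps only keys whose summed count is positive.
    return dict(Counter(a) + Counter(b))


def add_with_update(a, b):
    # Counter.update adds counts in place and keeps zero or negative totals.
    total = Counter(a)
    total.update(b)
    return dict(total)


a = {'alice': 10, 'Bob': 3, 'Carlie': -1}
b = {'alice': 5, 'Bob': -3, 'Carlie': 1}
print(add_with_plus(a, b))    # {'alice': 15}
print(add_with_update(a, b))  # {'alice': 15, 'Bob': 0, 'Carlie': 0}
```

For the professor's example all values are positive, so both versions give the same result.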

The following is not meant to be the most elegant solution, but to give a feel for how to work with dicts.
dictA = {'Alice':10, 'Bob':3, 'Carlie':1}
dictB = {'Alice':5, 'Bob':100, 'Carlie':1}
# how to iterate through a dictionary
for k,v in dictA.iteritems():
    print k,v
# make a new dict to keep tally
newdict={}
for d in [dictA,dictB]: # go through a list that has your dictionaries
    print d
    for k,v in d.iteritems(): # go through each dictionary item
        if k not in newdict:
            newdict[k]=v
        else:
            newdict[k]+=v
print newdict
Output:
Bob 3
Alice 10
Carlie 1
{'Bob': 3, 'Alice': 10, 'Carlie': 1}
{'Bob': 100, 'Alice': 5, 'Carlie': 1}
{'Bob': 103, 'Alice': 15, 'Carlie': 2}

def add(dicA,dicB):
You define a function that takes two arguments, dicA and dicB.
dicA = {}
dicB = {}
Then you assign an empty dictionary to both those variables, overwriting the dictionaries you passed to the function.
newdictionary = dicA.update(dicB)
Then you update dicA with the values from dicB, and assign the result to newdictionary. dict.update always returns None though.
And finally, you don’t return anything from the function, so it does not give you any results.
In order to combine those dictionaries, you actually need to use the values that were passed to it. Since dict.update mutates the dictionary it is called on, this would change one of those passed dictionaries, which we generally do not want to do. So instead, we use an empty dictionary, and then copy the values from both dictionaries into it:
def add (dicA, dicB):
    newDictionary = {}
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
If you want the values to sum up automatically, then use a Counter instead of a normal dictionary:
from collections import Counter
def add (dicA, dicB):
    newDictionary = Counter()
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary

I suspect your professor wants you to achieve this using simpler methods. But you can achieve this very easily using collections.Counter.
from collections import Counter
def add(a, b):
    return dict(Counter(a) + Counter(b))
Your professor probably wants something like this (pseudocode):
def add(a, b):
    new_dict = copy of a
    for each key/value pair in b
        if key in new_dict
            add value to value already present in new_dict
        else
            insert key/value pair into new_dict
    return new_dict
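The pseudocode above translates almost line for line into Python. This is one possible rendering, written for Python 3 and using a shallow copy so that the caller's dictionaries are left untouched:

```python
def add(a, b):
    new_dict = dict(a)  # shallow copy, so the caller's dict is not mutated
    for key, value in b.items():
        if key in new_dict:
            new_dict[key] += value   # key seen before: add to the existing value
        else:
            new_dict[key] = value    # new key: insert it as is
    return new_dict


first = {'alice': 10, 'Bob': 3, 'Carlie': 1}
second = {'alice': 5, 'Bob': 100, 'Carlie': 1}
print(add(first, second))  # {'alice': 15, 'Bob': 103, 'Carlie': 2}
```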

You can try this, which also handles keys that appear in only one of the dictionaries:
def add(dict1, dict2):
    return dict((key, dict1.get(key, 0) + dict2.get(key, 0)) for key in set(dict1) | set(dict2))

I personally like using a dictionary's get method for this kind of merge:
def add(a, b):
    result = {}
    for dictionary in (a, b):
        for key, value in dictionary.items():
            result[key] = result.get(key, 0) + value
    return result

Adding nonzero items from a dictionary to another dictionary

I have a set of reactions (keys) with values (0.0 or 100) stored in mydict.
Now I want to place non zero values in a new dictionary (nonzerodict).
def nonzero(cmod):
    mydict = cmod.getReactionValues()
    nonzerodict = {}
    for key in mydict:
        if mydict.values() != float(0):
            nonzerodict[nz] = mydict.values
            print nz
Unfortunately this is not working.
My questions:
Am I iterating over a dictionary correctly?
Am I adding items to the new dictionary correctly?

You are testing if the list of values is not equal to float(0). Test each value instead, using the key to retrieve it:
if mydict[key] != 0:
    nonzerodict[key] = mydict[key]
You are iterating over the keys correctly, but you could also iterate over the key-value pairs:
for key, value in mydict.iteritems():
    if value != 0:
        nonzerodict[key] = value
Note that with floating point values, chances are you'll have very small values, close to zero, that you may want to filter out too. If so, test if the value is close to zero instead:
if abs(value) > 1e-9:
You can do the whole thing in a single dictionary expression:
def nonzero(cmod):
    return {k: v for k, v in cmod.getReactionValues().iteritems() if abs(v) > 1e-9}
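For readers on Python 3, the same filter might be sketched as below. Because cmod.getReactionValues() is specific to the asker's model object, this version takes a plain dict instead. The reaction names and the tolerance parameter are assumptions for illustration.

```python
import math


def nonzero(values, tolerance=1e-9):
    """Return a new dict without entries whose value is within `tolerance` of zero."""
    return {k: v for k, v in values.items()
            if not math.isclose(v, 0.0, abs_tol=tolerance)}


# Hypothetical reaction values standing in for cmod.getReactionValues().
reaction_values = {'R1': 0.0, 'R2': 100.0, 'R3': 1e-12, 'R4': -100.0}
print(nonzero(reaction_values))  # {'R2': 100.0, 'R4': -100.0}
```

math.isclose needs abs_tol here because a purely relative tolerance never considers any nonzero number close to zero.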

It's simple; you can do it like this:
>>> d = {'a':4,'b':2, 'c':0}
>>> dict((k,v) for k,v in d.iteritems() if v!=0)
{'a': 4, 'b': 2}

Replace the if condition in your code with:
if mydict[key]:
    nonzerodict[key] = mydict[key]
Your solution can be further simplified to:
def nonzero(cmod):
    mydict = cmod.getReactionValues()
    return {key: value for key, value in mydict.iteritems() if value}

Python get remaining runoff voting

I am a little stuck on writing a function for a project. This function takes a dictionary of candidates whose values are the number of votes they received. I then have to return a set containing the remaining candidates. In other words, the candidate with the fewest votes should not be in the returned set, and if, for example, all of the candidates have the same number of votes, the set should be empty. I am having trouble getting started here.
For example, I know I can find the candidate with the fewest votes like so:
x = min(canadites, key=canadites.__getitem__)
but that will not work if several candidates have the same value, as it returns only one of them.
Any ideas?
Update: To make things clear.
Let's say I have the following dictionary:
canadites = {'X':22,'Y':1, 'Z':0}
Ideally the function should return a set containing only X and Y. But if Y and Z were both 1,
x = min(canadites, key=canadites.__getitem__)
seems to only return Z

It's cleaner to create a new dict instead of popping items from the old one:
>>> d = {'a':1, 'b':2, 'c':1, 'd':3}
>>> min_val = min(d.values())
>>> {k:v for k,v in d.items() if v > min_val}
{'b': 2, 'd': 3}
In Python 2, itervalues and iteritems would be more efficient, although this is a micro-optimization in most cases.
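Since the question asks for a set rather than a dict, the same idea can be written as a set comprehension. This is a small Python 3 sketch; the function name remaining_candidates is borrowed from the wording of the question, and treating an empty input as an empty result is an assumption.

```python
def remaining_candidates(votes):
    """Return the set of candidates who received more than the minimum vote count."""
    if not votes:
        return set()  # assumption: no candidates means nobody remains
    fewest = min(votes.values())
    # Every candidate tied for the fewest votes is dropped together.
    return {name for name, count in votes.items() if count > fewest}


print(remaining_candidates({'X': 22, 'Y': 1, 'Z': 0}) == {'X', 'Y'})  # True
print(remaining_candidates({'X': 22, 'Y': 1, 'Z': 1}) == {'X'})       # True
print(remaining_candidates({'X': 5, 'Y': 5, 'Z': 5}) == set())        # True
```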

python dict: get vs setdefault

The following two expressions seem equivalent to me. Which one is preferable?
data = [('a', 1), ('b', 1), ('b', 2)]
d1 = {}
d2 = {}
for key, val in data:
    # variant 1)
    d1[key] = d1.get(key, []) + [val]
    # variant 2)
    d2.setdefault(key, []).append(val)
The results are the same but which version is better or rather more pythonic?
Personally I find version 2 harder to understand, as to me setdefault is very tricky to grasp. If I understand correctly, it looks for the value of "key" in the dictionary, if not available, enters "[]" into the dict, returns a reference to either the value or "[]" and appends "val" to that reference. While certainly smooth it is not intuitive in the least (at least to me).
To my mind, version 1 is easier to understand (if available, get the value for "key", if not, get "[]", then join with a list made up from [val] and place the result in "key"). But while more intuitive to understand, I fear this version is less performant, with all this list creating. Another disadvantage is that "d1" occurs twice in the expression which is rather error-prone. Probably there is a better implementation using get, but presently it eludes me.
My guess is that version 2, although more difficult to grasp for the inexperienced, is faster and therefore preferable. Opinions?

Your two examples do the same thing, but that doesn't mean get and setdefault do.
The difference between the two is basically manually setting d[key] to point to the list every time, versus setdefault automatically setting d[key] to the list only when it's unset.
Making the two methods as similar as possible, I ran
from timeit import timeit
print timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number = 1000000)
print timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number = 1000000)
and got
0.794723378711
0.811882272256
0.724429205999
0.722129751973
So setdefault is around 10% faster than get for this purpose.
The get method allows you to do less than you can with setdefault. You can use it to avoid getting a KeyError when the key doesn't exist (if that's something that's going to happen frequently) even if you don't want to set the key.
See Use cases for the 'setdefault' dict method and dict.get() method returns a pointer for some more info about the two methods.
The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow, and often you're better off (speed wise) doing a double lookup, using a defaultdict, or handling the error (depending on the size of the dictionary and your use case).

The accepted answer from agf isn't comparing like with like. After:
print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] contains a list with 10,000 items whereas after:
print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)
and in fact is faster than the faulty setdefault example.
The difference here arises because, when you append using concatenation, the whole list is copied every time (and once you have 10,000 elements that starts to become measurable). Using append, the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
    d4[key].append(val)
else:
    d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.
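To try the four variants side by side, a small harness along these lines could be used. It is a Python 3 sketch: the data set, the function names and the repetition count are arbitrary choices, and the timings it prints will differ from the figures quoted above depending on machine and Python version.

```python
from collections import defaultdict
from timeit import timeit

# Arbitrary test data: 10,000 pairs spread over 100 keys.
data = [(i % 100, i) for i in range(10_000)]


def with_get():
    d = {}
    for key, val in data:
        d[key] = d.get(key, []) + [val]  # copies the list on every step
    return d


def with_setdefault():
    d = {}
    for key, val in data:
        d.setdefault(key, []).append(val)
    return d


def with_defaultdict():
    d = defaultdict(list)
    for key, val in data:
        d[key].append(val)
    return dict(d)


def with_membership_test():
    d = {}
    for key, val in data:
        if key in d:
            d[key].append(val)
        else:
            d[key] = [val]
    return d


variants = [with_get, with_setdefault, with_defaultdict, with_membership_test]
results = [variant() for variant in variants]
for variant in variants:
    # timeit accepts a callable; the number printed is total seconds for 10 runs.
    print(variant.__name__, timeit(variant, number=10))
```

All four functions build the same mapping, so the comparison is like for like; only the cost of getting there differs.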

For those who are still struggling to understand these two methods, here is the basic difference between get() and setdefault():
Scenario-1
root = {}
root.setdefault('A', [])
print(root)
Scenario-2
root = {}
root.get('A', [])
print(root)
In Scenario-1 the output is {'A': []}, while in Scenario-2 it is {}.
So setdefault() sets absent keys in the dict, while get() only provides you with a default value and does not modify the dictionary.
Now let's see where this is useful.
Suppose you are looking up a key in a dict whose value is a list, and you want to modify that list if the key is found, or otherwise create a new key with that list.
using setdefault()
def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)
using get()
def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + lst #Explicit assigning happening here
Now let's examine the timings:
dic = {}
%%timeit -n 10000 -r 4
fn1(dic, 'A', [1,2,3])
Took 288 ns
dic = {}
%%timeit -n 10000 -r 4
fn2(dic, 'A', [1,2,3])
Took 128 s
So there is a very large timing difference between these two approaches.

You might want to look at defaultdict in the collections module. The following is equivalent to your examples.
from collections import defaultdict
data = [('a', 1), ('b', 1), ('b', 2)]
d = defaultdict(list)
for k, v in data:
    d[k].append(v)
There's more here.

1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/
dict.setdefault typical usage
somedict.setdefault(somekey,[]).append(somevalue)
dict.get typical usage
theIndex[word] = 1 + theIndex.get(word,0)
2. More explanation : http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
dict.setdefault() is equivalent to get or set & get. Or set if necessary then get. It's especially efficient if your dictionary key is expensive to compute or long to type.
The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
3. Finally the official docs with difference highlighted http://docs.python.org/2/library/stdtypes.html
get(key[, default])
Return the value for key if key is in the dictionary, else default. If
default is not given, it defaults to None, so that this method never
raises a KeyError.
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
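The remark above that setdefault() always evaluates its default, whether needed or not, can be seen by counting how often the default is built. The Python 3 sketch below uses a made-up helper, make_list, as a stand-in for an expensive default.

```python
from collections import defaultdict

calls = {"setdefault": 0, "defaultdict": 0}


def make_list(label):
    """Stand-in for an expensive default; counts how often it is built."""
    calls[label] += 1
    return []


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

plain = {}
for word in words:
    # The argument is evaluated before setdefault runs, so make_list is
    # called for every word, even when the key already exists.
    plain.setdefault(word[0], make_list("setdefault")).append(word)

lazy = defaultdict(lambda: make_list("defaultdict"))
for word in words:
    # The factory is called only when the key is missing.
    lazy[word[0]].append(word)

print(calls)  # {'setdefault': 5, 'defaultdict': 3}
```

Both loops produce the same grouping; the difference is only in how many throwaway defaults get created along the way.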

The logic of dict.get is:
if key in a_dict:
    value = a_dict[key]
else:
    value = default_value
Take an example:
In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'bytes'])
Out[74]: ['list', 'tuple']
The mechanism of setdefault is:
levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
# group them by the leading letter
group_by_leading_letter = {}
# the logic expressed by an explicit if condition
for level in levels:
    leading_letter = level[0]
    if leading_letter not in group_by_leading_letter:
        group_by_leading_letter[leading_letter] = [level]
    else:
        group_by_leading_letter[leading_letter].append(level)
In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
The setdefault dict method is for precisely this purpose. The preceding for loop can be rewritten as:
In [86]: group_by_leading_letter = {}
In [87]: for level in levels:
    ...:     leading = level[0]
    ...:     group_by_leading_letter.setdefault(leading, []).append(level)
In [88]: group_by_leading_letter
Out[88]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}
Put simply, the element is appended either to the existing list or to a newly inserted empty list.
defaultdict makes this even easier. To create one, you pass a type or function that generates the default value for each slot in the dict:
from collections import defaultdict
group_by_leading_letter = defaultdict(list)
for level in levels:
    group_by_leading_letter[level[0]].append(level)

There is no strict answer to this question. They both accomplish the same purpose. They can both be used to deal with missing values on keys. The only difference that I have found is that with setdefault(), the key that you invoke (if not previously in the dictionary) gets automatically inserted while it does not happen with get(). Here is an example:
Setdefault()
>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C') #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT') #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWESOME'}
>>> myDict.setdefault('E')
>>>
Get()
>>> myDict = {'a': 1, 'b': 2, 'c': 3} #(1)
>>> myDict.get('a',0) #(2)
1
>>> myDict.get('d',0) #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}
Here is my conclusion: there is no single answer as to which one is better when it comes to supplying default values. The only difference is that setdefault() automatically adds any new key with a default value to the dictionary, while get() does not.

In [1]: person_dict = {}
In [2]: person_dict['liqi'] = 'LiQi'
In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'
In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
In [8]: person_dict.get('Dim', '')
Out[8]: ''
In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}
