Python summing up values in a nested dictionary

I have a dictionary P which represents a dictionary within a dictionary within a dictionary. It looks something like this.
P = {key1: {keyA: {value1: 1, value2: 3}, keyB: {value1: 3, value2: 4}},
     key2: {keyA: {value1: 1, value2: 3}, keyB: {value1: 3, value2: 4}},
     key3: {...:{...:}}}
What I am trying to do is to express each value of value1, value2 as a percentage of the totalPopulation, i.e. the sum of all values under their base key.
For example, key1 should look like:
key1: {keyA: {value1: 1/(1+3+3+4), value2: 3/(1+3+3+4)},
       keyB: {value1: 3/(1+3+3+4), value2: 4/(1+3+3+4)}}
What I am not sure about is how to iterate over this dictionary and collect only the innermost values under a given base key, so that I can sum them and then divide each value by that sum.

This can be done in a single line using a dict comprehension and map, like this:
# from __future__ import division  # uncomment on Python 2.x to get true division
p = {"key1": {"keyA": {"value1": 1, "value2": 3}, "keyB": {"value1": 3, "value2": 4}}}
p = {kOuter: {kInner: {kVal: vVal / sum(map(lambda x: sum(x.values()), vOuter.values())) for kVal, vVal in vInner.items()} for kInner, vInner in vOuter.items()} for kOuter, vOuter in p.items()}
A more readable version of the above:
p = {
    kOuter: {
        kInner: {
            kVal: vVal / sum(map(lambda x: sum(x.values()), vOuter.values()))
            for kVal, vVal in vInner.items()
        }
        for kInner, vInner in vOuter.items()
    }
    for kOuter, vOuter in p.items()
}
Output:
>>> p
{'key1': {'keyB': {'value2': 0.36363636363636365, 'value1': 0.2727272727272727}, 'keyA': {'value2': 0.2727272727272727, 'value1': 0.09090909090909091}}}
The only drawback is that the sum is recomputed for every innermost value. You can avoid that by calculating the total for each of key1, key2, ... once, before the dict comprehension, and using the stored values instead:
keyTotals = {kOuter: sum(map(lambda x: sum(x.values()), vOuter.values())) for kOuter, vOuter in p.items()}
Then you can simply look up the precomputed totals by key:
p = {kOuter: {kInner: {kVal: vVal / keyTotals[kOuter] for kVal, vVal in vInner.items()} for kInner, vInner in vOuter.items()} for kOuter, vOuter in p.items()}
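The same two-step idea (compute one total per outer key, then divide) can be wrapped in a small function that returns a new dictionary instead of rebinding p. This is only a sketch: the function name normalize_by_outer_total and the extra key2 sample data are hypothetical, and it assumes exactly three levels with numeric leaves.

```python
def normalize_by_outer_total(p):
    """Return a copy of a three-level dict in which every leaf value is
    divided by the sum of all leaves under the same outer key.
    Assumes numeric leaves and a non-zero total per outer key."""
    result = {}
    for k_outer, v_outer in p.items():
        # Compute the total once per outer key instead of once per leaf.
        total = sum(sum(inner.values()) for inner in v_outer.values())
        result[k_outer] = {
            k_inner: {k_val: v_val / total for k_val, v_val in inner.items()}
            for k_inner, inner in v_outer.items()
        }
    return result


# Hypothetical sample data: key1 from the question plus a made-up key2.
p = {"key1": {"keyA": {"value1": 1, "value2": 3}, "keyB": {"value1": 3, "value2": 4}},
     "key2": {"keyA": {"value1": 2, "value2": 2}}}
shares = normalize_by_outer_total(p)
print(shares["key1"]["keyA"]["value1"])
```

An outer key whose values sum to 0 would raise ZeroDivisionError here, so guard against that if it can occur in your data.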

test = {"key1": {"keyA": {"value1": 1, "value2": 3}, "keyB": {"value1": 3, "value2": 4}}}
for a in test:
    s = 0
    for b in test[a]:
        for c in test[a][b]:
            s += test[a][b][c]
    print(s)
    for b in test[a]:
        for c in test[a][b]:
            test[a][b][c] = test[a][b][c] / s
This should do what you want. I've only included "key1" in this example.
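If the depth of nesting is not always the same, a recursive generator can collect the innermost values regardless of depth. This is a sketch, not part of the answer above; the helper name iter_leaves is hypothetical, and it treats every value that is not a dict as a leaf (yield from needs Python 3.3+).

```python
def iter_leaves(d):
    """Yield every non-dict value found anywhere inside a nested dict."""
    for value in d.values():
        if isinstance(value, dict):
            # Descend one level and re-yield whatever the inner call finds.
            yield from iter_leaves(value)
        else:
            yield value


test = {"key1": {"keyA": {"value1": 1, "value2": 3},
                 "keyB": {"value1": 3, "value2": 4}}}
# One total per top-level key, however deep its leaves are.
totals = {k: sum(iter_leaves(v)) for k, v in test.items()}
print(totals)  # {'key1': 11}
```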

Related

Finding the percentage in a list of values

I have a dictionary that has multiple values assigned to each key. For each key's list of values, I am trying to find the percentage that fit the 'flexibility' criteria. Since the values are strings, this is throwing me for a loop (pun not intended). I want to get one value per key: the percentage of values that are either 'None' or 'Flexible' out of all the values in that list.
Basically if the dictionary looks like this:
dict1 = {'German' : ["None", "None" ,"Flexible", "Hard"],
"French" : ["Hard", "Hard", "Hard", "Hard"]
}
I want the code to give me this (rounding to 2 decimals is fine):
dict1 = {"German" : "0.75",
"French" : "1.00"
}
import pandas as pd
def course_prereq_flexibility(fn):
    df = pd.read_csv(fn)
    df2 = df[["area", "prereq_type"]].copy()
    def percentages(df2):
        dict1 = {}
        for items in range(len(df2)):
            key = df2.iloc[items, 0]
            values = df2.iloc[items, 1]
            dict1.setdefault(key, [])
            dict1[key].append(values)
        return dict1
I am a bit confused about where to go after creating the dictionary and would really appreciate a walk-through of the steps I could take.
Without using pandas, it's reasonably straightforward to do this with just collections.Counter.
>>> from collections import Counter
>>> dict1 = {'German': ["None", "None", "Flexible", "Hard"],
...          "French": ["Hard", "Hard", "Hard", "Hard"]}
>>> {k: c
...  for k, v in dict1.items()
...  for c in (Counter(v),)}
{'German': Counter({'None': 2, 'Flexible': 1, 'Hard': 1}), 'French': Counter({'Hard': 4})}
>>> {k: (c['None'] + c['Flexible']) / len(v)
...  for k, v in dict1.items()
...  for c in (Counter(v),)}
{'German': 0.75, 'French': 0.0}
There are a number of ways to achieve this. The following is one example:
dict1 = {
    "German": ["None", "None", "Flexible", "Hard"],
    "French": ["Hard", "Hard", "Hard", "Hard"]
}
def percentage_in_list(input_list, elements_to_find=None):
    if elements_to_find is None:
        elements_to_find = ["None", "Flexible"]
    nr_found = len([x for x in input_list if x in elements_to_find])
    return (nr_found / len(input_list)) * 100
percentages = {k: percentage_in_list(v) for k, v in dict1.items()}
print(percentages)
The function percentage_in_list returns the percentage of values that correspond to one of the values in elements_to_find, which in this case defaults to "None" and "Flexible". Inside the function, a list comprehension filters out all the elements of input_list that are in elements_to_find. The len of the resulting list is the number of elements that have been found. That number is then divided by the length of the input list and multiplied by 100 to return the percentage.
In the main code, a dictionary comprehension is used to iterate over dict1 and call the function percentage_in_list for every value in the dictionary.
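Since the question starts from a CSV file with "area" and "prereq_type" columns, here is one possible walk-through that skips the intermediate pandas step and produces the two-decimal strings the asker showed. It is a sketch under assumptions: flexibility_by_area is a hypothetical name, the file is assumed to have a header row with exactly those column names, and io.StringIO stands in for the real file.

```python
import csv
import io
from collections import defaultdict

FLEXIBLE = {"None", "Flexible"}


def flexibility_by_area(rows):
    """Group prereq_type values by area and return, per area, the share
    that is 'None' or 'Flexible', formatted with two decimals."""
    # Step 1: collect every prereq_type under its area.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["area"]].append(row["prereq_type"])
    # Step 2: count the flexible entries (True counts as 1) and divide.
    return {
        area: f"{sum(t in FLEXIBLE for t in types) / len(types):.2f}"
        for area, types in grouped.items()
    }


# Stand-in for the CSV file; with a real file use open(fn, newline="").
sample = io.StringIO(
    "area,prereq_type\n"
    "German,None\nGerman,None\nGerman,Flexible\nGerman,Hard\n"
    "French,Hard\nFrench,Hard\nFrench,Hard\nFrench,Hard\n"
)
print(flexibility_by_area(csv.DictReader(sample)))
```

csv.DictReader keeps "None" as an ordinary string, so it is compared like any other value. Note that French comes out as "0.00", since none of its entries are 'None' or 'Flexible'.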

Run calculation multiple times with different values

I have two dictionaries, that look like:
dict1 = {1: 10, 2: 23, .... 999: 12}
dict2 = {1: 42, 2: 90, .... 999: 78}
I want to perform a simple calculation: multiply each value in dict1 by the value in dict2 with the same key (1, 2, and so on).
The code so far is:
dict1[1] * dict2[1]
This calculates 10*42, which is exactly what I want.
Now I want to perform this calculation for every key in the dictionary, from 1 up to 999.
I tried:
i = {1,2,3,4,5,6 ... 999}
dict1[i] * dict2[i]
But it didn't work.
This creates a new dict with the results:
out = { i: dict1[i] * dict2[i] for i in range(1,1000) }
If you need to work with vectors and matrices, take a look at the numpy module. It provides array data structures and a large collection of tools for working with them.
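The range(1, 1000) comprehension assumes every key from 1 to 999 exists in both dictionaries; a missing key raises KeyError. If that is not guaranteed, one option (Python 3) is to iterate over the keys the two dicts share. The sample values below are made up for illustration.

```python
dict1 = {1: 10, 2: 23, 999: 12}
dict2 = {1: 42, 2: 90, 999: 78, 1000: 5}   # 1000 has no partner in dict1

# dict.keys() views support set operations, so & gives the shared keys.
out = {k: dict1[k] * dict2[k] for k in dict1.keys() & dict2.keys()}
print(out)
```

Keys present in only one of the dictionaries (here 1000) are simply skipped instead of causing an error.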

What is the fastest way to return a specific list within a dictionary within a dictionary?

I have a list within a dictionary within a dictionary. The data set is very large. How can I most quickly return the list nested in the two dictionaries if I am given a List that is specific to the key, dict pairs?
{"Dict1": {"Dict2": ['UNIQUE LIST']}}
Is there an alternate data structure to use for this for efficiency?
I do not believe a more efficient data structure exists in Python. Simply retrieving the list using the regular indexing operator should be a very fast operation, even if both levels of dictionaries are very large.
nestedDict = {"Dict1": {"Dict2": ['UNIQUE LIST']}}
uniqueList = nestedDict["Dict1"]["Dict2"]
My only thought for improving performance was to try flattening the data structure into a single dictionary with tuples for keys. This would take more memory than the nested approach since the keys in the top-level dictionary will be replicated for every entry in the second-level dictionaries, but it will only compute the hash function once for every lookup. But this approach is actually slower than the nested approach in practice:
import random
import timeit
nestedDict = {i: {j: ['UNIQUE LIST'] for j in range(1000)} for i in range(1000)}
flatDict = {(i, j): ['UNIQUE LIST'] for i in range(1000) for j in range(1000)}
def accessNested():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return nestedDict[i][j]
def accessFlat():
    i = random.randrange(1000)
    j = random.randrange(1000)
    return flatDict[(i, j)]  # look up the flattened dict, not nestedDict
print(timeit.timeit(accessNested))
print(timeit.timeit(accessFlat))
Output:
2.0440238649971434
2.302736301004188
The fastest way to access the list within the nested dictionary is:
d = {"Dict1": {"Dict2": ['UNIQUE LIST']}}
print(d["Dict1"]["Dict2"])
Output:
['UNIQUE LIST']
If you want to iterate over the list inside the nested dictionary, you can use the following code as an example:
d = {"a": {"b": ['1', '2', '3', '4']}}
for i in d["a"]["b"]:
    print(i)
Output :
1
2
3
4
If I understand correctly, you want to access a nested dictionary structure if...
if I am given a List that is specific to the key
So, here is a sample dictionary and the key you want to access:
d = {'a': {'a': 0, 'b': 1},
'b': {'a': {'a': 2}, 'b': 3}}
key = ('b', 'a', 'a')
The lazy approach
This is quick to write if you already know Python dictionaries; there is nothing new to learn.
>>> value = d
>>> for level in key:
...     value = value[level]
...
>>> value
2
NestedDict from the ndicts package
If you pip install ndicts, you get the same "lazy approach" implementation with a nicer interface.
>>> from ndicts import NestedDict
>>> nd = NestedDict(d)
>>> nd[key]
2
>>> nd["b", "a", "a"]
2
This option is fast in the sense that you can't really write less code than nd[key] to get what you want.
Pandas dataframes
This is the solution aimed at performance. Lookups in DataFrames should be quick, especially if the index is sorted.
In this case we have hierarchical data with multiple levels, so I will create a MultiIndex first. I will use the NestedDict for ease, but anything else to flatten the dictionary will do.
>>> keys = list(nd.keys())
>>> values = list(nd.values())
>>> from pandas import DataFrame, MultiIndex
>>> index = MultiIndex.from_tuples(keys)
>>> df = DataFrame(values, index=index, columns=["Data"]).sort_index()
>>> df
Data
a a NaN 0
b NaN 1
b a a 2
b NaN 3
Use the loc indexer to get a row.
>>> df.loc[key]
Data 2
Name: (b, a, a), dtype: int64

How to query values in a dictionary of a dictionary in python?

I have a dictionary in Python that contains dictionaries:
{'METTS MARK': {'salary': 365788, 'po': 1}, 'HARRY POTTER': {'salary': 3233233, 'po': 0}}
How do I calculate the number of records with 'po' = 1?
I tried this:
sum = 0
for key, values in DIC:
    if values[po] == 1:
        sum = sum + 1
But it raises: too many values to unpack
Thanks in advance
You can simply use sum and sum over the condition:
total = sum(values.get('po') == 1 for values in DIC.values())
which is equivalent to (as @VPfB says):
total = sum(1 for item in DIC.values() if item.get('po') == 1)
but the latter is a bit less efficient.
You should also use 'po' instead of po since it is a string, and you better use .get('po') since this guarantees that it will work if 'po' is not part of every dictionary.
I think you forgot to use .items() in your for loop. Iterating over a dictionary yields only its keys, and a key such as 'METTS MARK' cannot be unpacked into the two names key, values.
Using a generator instead is more memory-efficient (probably faster as well), and it is clean code. Furthermore, iterating directly over .values() instead of .items() should be slightly faster, because it saves on packing and unpacking tuples.
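For comparison, here is the asker's loop with the two fixes applied: iterate over .items() and quote 'po'. It also uses count rather than sum so the built-in sum function is not shadowed; that renaming is a suggestion, not part of the original answer.

```python
DIC = {'METTS MARK': {'salary': 365788, 'po': 1},
       'HARRY POTTER': {'salary': 3233233, 'po': 0}}

count = 0
for key, values in DIC.items():   # .items() yields (key, value) pairs
    if values.get('po') == 1:     # 'po' is a string key; .get avoids KeyError
        count += 1
print(count)  # 1
```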
You can get it like this:
a = {
'METTS MARK': {'salary': 365788, 'po': 1},
'HARRY POTTER': {'salary': 3233233, 'po': 0}
}
print(len([b for b in a.values() if b.get('po')==1]))
Output:
1
Here we create a list of the inner dictionaries whose 'po' value is 1, and then take the length of that list.

Summing up numbers in a defaultdict(list)

I've been experimenting, trying to get this to work, and I've exhausted every idea and web search. Nothing seems to do the trick. I need to sum the numbers in a defaultdict(list), and I just need the final result, but no matter what I do I can only get to it by iterating and outputting every running sum on the way to the final one. What I've been trying, generally:
d = { key : [1,2,3] }
running_total = 0
# Iterate values
for value in d.itervalues():
    # iterate through list inside value
    for x in value:
        running_total += x
        print running_total
The result is :
1,3,6
I understand it's doing this because it's iterating through the for loop. What I don't get is how else I can get to each of these list values without using a loop. Or is there some sort of method I've overlooked?
To be clear, I just want the final number returned, e.g. 6.
EDIT: I neglected a huge factor: the items in the list are timedelta objects, so I have to use .seconds to turn them into integers for adding. The solutions below make sense and I've tried similar ones, but putting the .seconds conversion inside the sum statement throws an error.
d = { key : [timedelta_Obj1,timedelta_Obj2,timedelta_Obj3] }
I think this will work for you:
sum(td.seconds for sublist in d.itervalues() for td in sublist)
Try this approach:
from datetime import timedelta as TD
d = {'foo' : [TD(seconds=1), TD(seconds=2), TD(seconds=3)],
'bar' : [TD(seconds=4), TD(seconds=5), TD(seconds=6), TD(seconds=7)],
'baz' : [TD(seconds=8)]}
print sum(sum(td.seconds for td in values) for values in d.itervalues())
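One caveat about .seconds, which the answers here use: it is only the seconds component of a timedelta (0 to 86399) and ignores whole days, so durations of a day or more are undercounted. The sketch below contrasts it with total_seconds() and with summing the timedelta objects themselves from a zero timedelta start value; the sample data is made up.

```python
from datetime import timedelta as TD

d = {'foo': [TD(seconds=1), TD(days=1, seconds=2)],
     'bar': [TD(minutes=1)]}

# .seconds drops the 1 day in TD(days=1, seconds=2): 1 + 2 + 60
by_seconds_attr = sum(td.seconds for tds in d.values() for td in tds)

# total_seconds() includes days (and microseconds) and returns a float.
by_total = sum(td.total_seconds() for tds in d.values() for td in tds)

# Or add the timedeltas directly; the start value must be a timedelta, not 0.
as_timedelta = sum((td for tds in d.values() for td in tds), TD())

print(by_seconds_attr, by_total, as_timedelta)
```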
You could just sum each of the lists in the dictionary, then take one final sum of the returned list.
>>> d = {'foo' : [1,2,3], 'bar' : [4,5,6,7], 'foobar' : [10]}
# sum each value in the dictionary
>>> [sum(d[i]) for i in d]
[10, 6, 22]
# sum each of the sums in the list
>>> sum([sum(d[i]) for i in d])
38
If you don't want to iterate or to use comprehensions you can use this:
d = {'1': [1, 2, 3], '2': [3, 4, 5], '3': [5], '4': [6, 7]}
print(sum(map(sum, d.values())))
If you use Python 2 and your dict has a lot of keys it's better you use imap (from itertools) and itervalues
from itertools import imap
print sum(imap(sum, d.itervalues()))
Your question was how to get the value "without using a loop". Well, you can't. But there is one thing you can do: use the high-performance itertools module.
If you use chain you won't have an explicit loop in your code. chain manages that for you.
>>> data = {'a': [1, 2, 3], 'b': [10, 20], 'c': [100]}
>>> import itertools
>>> sum(itertools.chain.from_iterable(data.itervalues()))
136
If you have timedelta objects you can use the same recipe.
>>> from datetime import timedelta
>>> data = {'a': [timedelta(minutes=1),
...               timedelta(minutes=2),
...               timedelta(minutes=3)],
...         'b': [timedelta(minutes=10),
...               timedelta(minutes=20)],
...         'c': [timedelta(minutes=100)]}
>>> sum(td.seconds for td in itertools.chain.from_iterable(data.itervalues()))
8160
