Creating nested dictionaries in Python using Openpyxl

I am trying to build a dictionary in Python by looping through an Excel file with Openpyxl. Each key is the Name of a person, and each value is a list of dictionaries in which the key is the Location and the value is a list of [Start, End] pairs.
Here is the Excel file:
And here is what I want:
people = {
    'John': [{20: [[2, 4], [3, 5]]}, {21: [[2, 4]]}],
    'Jane': [{20: [[9, 10]]}, {21: [[2, 4]]}]
}
Here is my current script:
import openpyxl
my_file = openpyxl.load_workbook('Book2.xlsx', read_only=True)
ws = my_file.active
people = {}
for row in ws.iter_rows(row_offset=1):
    a = row[0] # Name
    b = row[1] # Location
    c = row[2] # Start
    d = row[3] # End
    if a.value: # Only operate on rows that contain data
        if a.value in people.keys(): # If name already in dict
            for k, v in people.items():
                for item in v:
                    #print(item)
                    for x in item:
                        if x == int(b.value):
                            print(people[k])
                            people[k][0][x].append([c.value,d.value])
                        else:
                            pass
                            #people[k].append([c.value,d.value]) # Creates inf loop
        else:
            people[a.value] = [{b.value:[[c.value,d.value]]}]
Which successfully creates this:
{'John': [{20: [[2, 4], [9, 10]]}], 'Jane': [{20: [[9, 10]]}]}
But when I uncomment the line after the else: block to try to add a new Location dictionary to the initial list, it creates an infinite loop.
if x == int(b.value):
    people[k][0][x].append([c.value,d.value])
else:
    #people[k].append([c.value,d.value]) # Creates inf loop
I am sure there's a more Pythonic way of doing this, but I am pretty stuck here and looking for a nudge in the right direction. The goal is to analyze all of the dict items for overlapping Start/End ranges per person and per location. For example, John's 3.00 - 5.00 range at location 20 overlaps with his 2.00 - 4.00 range at the same location.

It seems you're overthinking this; a combination of default dictionaries should do the trick.
from collections import defaultdict
person = defaultdict(dict)
for row in ws.iter_rows(min_row=2, max_col=4):
    p, l, s, e = (c.value for c in row)
    if p not in person:
        person[p] = defaultdict(list)
    person[p][l].append((s, e))
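Since the stated goal is to find overlapping Start/End ranges per person and location, the nested defaultdict idea can be extended with a small overlap check. The following is only a sketch under assumptions: the `rows` list is a hypothetical stand-in for the worksheet rows (the Excel file itself is not shown), `find_overlaps` is an illustrative helper name, and ranges that merely touch (one ends where the next starts) are treated as non-overlapping.

```python
from collections import defaultdict

# Hypothetical rows standing in for the worksheet: (name, location, start, end)
rows = [
    ("John", 20, 2, 4),
    ("John", 20, 3, 5),
    ("John", 21, 2, 4),
    ("Jane", 20, 9, 10),
    ("Jane", 21, 2, 4),
]

# name -> location -> list of (start, end) tuples
people = defaultdict(lambda: defaultdict(list))
for name, location, start, end in rows:
    people[name][location].append((start, end))


def find_overlaps(intervals):
    """Return pairs of overlapping intervals (touching ends do not count)."""
    ordered = sorted(intervals)
    pairs = []
    for i, (s1, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 >= e1:
                break  # sorted by start, so no later interval can overlap
            pairs.append(((s1, e1), (s2, e2)))
    return pairs


# Keep only the (name, location) combinations that have at least one overlap
overlaps = {}
for name, by_location in people.items():
    for location, intervals in by_location.items():
        found = find_overlaps(intervals)
        if found:
            overlaps[(name, location)] = found

print(overlaps)  # {('John', 20): [((2, 4), (3, 5))]}
```

Using a lambda as the outer default factory removes the need for the explicit `if p not in person` check, at the cost of a slightly less obvious constructor.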

You can use the Pandas library for this. The core of this solution is a nested dictionary comprehension, each using groupby. You can, as below, use a function to take care of the nesting to aid readability / maintenance.
import pandas as pd
# define dataframe, or df = pd.read_excel('file.xlsx')
df = pd.DataFrame({'Name': ['John']*3 + ['Jane']*2,
                   'Location': [20, 20, 21, 20, 21],
                   'Start': [2.00, 3.00, 2.00, 9.00, 2.00],
                   'End': [4.00, 5.00, 4.00, 10.00, 4.00]})
# convert cols to integers
int_cols = ['Start', 'End']
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast='integer')
# define inner dictionary grouper and split into list of dictionaries
def loc_list(x):
    d = {loc: w[int_cols].values.tolist() for loc, w in x.groupby('Location')}
    return [{i: j} for i, j in d.items()]
# define outer dictionary grouper
people = {k: loc_list(v) for k, v in df.groupby('Name')}
Output:
{'Jane': [{20: [[9, 10]]}, {21: [[2, 4]]}],
 'John': [{20: [[2, 4], [3, 5]]}, {21: [[2, 4]]}]}

Related

Create a key:value pair in the first loop and append more values in subsequent loops

How can I create a key:value pair in a first loop and then just append values in subsequent loops?
For example:
a = [1,2,3]
b = [8,9,10]
c = [4,6,5]
myList= [a,b,c]
positions= ['first_position', 'second_position', 'third_position']
I would like to create a dictionary which records the position values for each letter so:
mydict = {'first_position':[1,8,4], 'second_position':[2,9,6], 'third_position':[3,10,5]}
Imagine that instead of 3 letters with 3 values each, I had millions. How could I loop through each letter and:
In the first loop create the key:value pair 'first_position':[1]
In subsequent loops append values to the corresponding key: 'first_position':[1,8,4]
Thanks!

Try this code:
mydict = {}
for i in range(len(positions)):
    mydict[positions[i]] = [each[i] for each in myList]
Output:
{'first_position': [1, 8, 4],
'second_position': [2, 9, 6],
'third_position': [3, 10, 5]}

dictionary.get('key') returns None if the key doesn't exist. So you can check the result: if it is not None, append to the existing list; otherwise create the list with the first value.
mydict = {}
for values in myList:
    for index, val in enumerate(values):
        this_position = positions[index]
        if mydict.get(this_position) is not None:
            mydict[this_position].append(val)
        else:
            mydict[this_position] = [val]
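The same create-then-append pattern can also be sketched with collections.defaultdict, which creates the empty list the first time a key is touched, so no explicit existence check is needed. This is a minimal sketch using the question's sample data; the names `my_list` and `values` are illustrative.

```python
from collections import defaultdict

a = [1, 2, 3]
b = [8, 9, 10]
c = [4, 6, 5]
my_list = [a, b, c]
positions = ['first_position', 'second_position', 'third_position']

mydict = defaultdict(list)
for values in my_list:
    # Pair each value with the name of its position
    for position, val in zip(positions, values):
        # The first access creates [], later accesses append to the same list
        mydict[position].append(val)

print(dict(mydict))
# {'first_position': [1, 8, 4], 'second_position': [2, 9, 6], 'third_position': [3, 10, 5]}
```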

The zip function will iterate the i'th values of positions, a, b and c in order. So,
a = [1,2,3]
b = [8,9,10]
c = [4,6,5]
positions= ['first_position', 'second_position', 'third_position']
sources = [positions, a, b, c]
mydict = {vals[0]:vals[1:] for vals in zip(*sources)}
print(mydict)
This creates tuples as the values, which is usually fine if the lists are read-only. Otherwise do
mydict = {vals[0]:list(vals[1:]) for vals in zip(*sources)}

Operation similar to group by for lists

I have lists of ids and scores:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
I want to remove duplicates from the list ids so that the scores are summed up accordingly. This is very similar to what groupby.sum() does when you use dataframes.
So, as output I expect :
ids=[1,2,3]
scores=[60,20,40]
I use the following code but it doesn't work well for all cases:
for indi ,i in enumerate(ids):
    for indj ,j in enumerate(ids):
        if(i==j) and (indi!=indj):
            del ids[i]
            scores[indj]=scores[indi]+scores[indj]
            del scores[indi]

You can build a dictionary from ids and scores in which each key is an id and each value is the list of scores for that id. You can then sum the values to get your new ids and scores lists.
from collections import defaultdict
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
dct = defaultdict(list)
#Create the dictionary of element of ids vs list of elements of scores
for id, score in zip(ids, scores):
    dct[id].append(score)
print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})
#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))
print(list(new_ids))
print(list(new_scores))
The output will be
[1, 2, 3]
[60, 20, 40]

As suggested in comments, using a dictionary is one way. You can iterate one time over the list and update the sum per id.
If you want two lists at the end, select the keys and values with keys() and values() methods from the dictionary:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
# Init the dict with all ids at 0
dict_ = {i:0 for i in set(ids)}
for id, score in zip(ids, scores):
    dict_[id] += score
print(dict_)
# {1: 60, 2: 20, 3: 40}
new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]

Simply loop through them and add if the ids match.
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i,s in zip(ids,scores):
    if i in ans:
        ans[i]+=s
    else:
        ans[i]=s
ids, scores=list(ans.keys()), list(ans.values())
Output:
[1, 2, 3]
[60, 20, 40]

# Find all unique ids and keep track of their scores
id_to_score = {id : 0 for id in set(ids)}
# Sum up the scores for that id
for index, id in enumerate(ids):
    id_to_score[id] += scores[index]
unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)
print(unique_ids) # [1, 2, 3]
print(score_sum) # [60, 20, 40]

This may help you.
# Solution 1
import pandas as pd
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
df = pd.DataFrame(list(zip(ids, scores)),
columns=['ids', 'scores'])
print(df.groupby('ids').sum())
#### Output ####
     scores
ids
1        60
2        20
3        40
# Solution 2
from itertools import groupby
zipped_list = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])
#### Output ####
[[1, 60], [2, 20], [3, 40]]
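Solution 2 sorts the pairs first because itertools.groupby only groups consecutive items that share a key. A minimal sketch of the difference, using the same sample data (the names `pairs`, `sum_groups`, `unsorted_result` and `sorted_result` are illustrative):

```python
from itertools import groupby

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]
pairs = list(zip(ids, scores))


def sum_groups(items):
    """Sum the scores of each run of consecutive pairs with the same id."""
    return [(key, sum(score for _, score in group))
            for key, group in groupby(items, key=lambda pair: pair[0])]


# Without sorting, id 1 appears in three separate runs
unsorted_result = sum_groups(pairs)
print(unsorted_result)  # [(1, 10), (2, 20), (1, 40), (3, 40), (1, 10)]

# After sorting, equal ids are adjacent and each id is summed once
sorted_result = sum_groups(sorted(pairs))
print(sorted_result)  # [(1, 60), (2, 20), (3, 40)]
```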

With only built-in Python tools, I would do that task the following way:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid,sum(s for inx,s in enumerate(scores) if ids[inx]==uid))
Output:
1 60
2 20
3 40
The above code just prints the result, but it can easily be changed to produce a dict:
output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}
or another data structure. Keep in mind that this method requires a separate pass over the data for every unique id, so it may be slower than other approaches. Whether that is an issue depends on how big your data is.

python running count of values in dict

I have a dictionary like this
d = {1:'Bob', 2:'Joe', 3:'Bob', 4:'Bill', 5:'Bill'}
I want to keep a count of how many times each name occurs as a dictionary value. So, the output should be like this:
d = {1:['Bob', 1], 2:['Joe',1], 3:['Bob', 2], 4:['Bill',1] , 5:['Bill',2]}

One way of counting the values like you want, is shown below:
from collections import Counter
d = {1:'Bob',2:'Joe',3:'Bob', 4:'Bill', 5:'Bill'}
c = Counter()
new_d = {}
for k in sorted(d.keys()):
    name = d[k]
    c[name] += 1
    new_d[k] = [name, c[name]]
print(new_d)
# {1: ['Bob', 1], 2: ['Joe', 1], 3: ['Bob', 2], 4: ['Bill', 1], 5: ['Bill', 2]}
Here I use Counter to keep track of occurrences of names in the input dictionary. Hope this helps. Maybe not most elegant code, but it works.

To impose an order (which a dict did not guarantee before Python 3.7), let's say you're going in sorted order on the keys. Then you could do the following, assuming the values are hashable, as in your example:
import collections
def enriched_by_count(somedict):
    countsofar = collections.defaultdict(int)
    result = {}
    for k in sorted(somedict):
        v = somedict[k]
        countsofar[v] += 1
        result[k] = [v, countsofar[v]]
    return result

Without using any modules, this is the code I came up with. Maybe not as short, but I am scared of modules.
def new_dict(d):
    check = [] #List for checking against
    new_dict = {} #The new dictionary to be returned
    for i in sorted(d.keys()): #Loop through all the dictionary items
        val = d[i] #Store the dictionary item value in a variable just for clarity
        check.append(val) #Add the current item to the array
        new_dict[i] = [d[i], check.count(val)] #See how many of the items there are in the array
    return new_dict
Use like so:
d = {1:'Bob', 2:'Joe', 3:'Bob', 4:'Bill', 5:'Bill'}
d = new_dict(d)
print(d)
Output:
{1: ['Bob', 1], 2: ['Joe', 1], 3: ['Bob', 2], 4: ['Bill', 1], 5: ['Bill', 2]}

Combine Python dictionaries that have the same Key name

I have two separate Python lists whose dictionaries share common key names. The second list, recordList, contains multiple dictionaries with the same key name, and I want to append them to the first list, clientList. Here are example lists:
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
The end result would be something like this, with the records collected in a new list of dictionaries inside clientList:
clientList = [{'client1': [['c1','f1'], [{'rec_1':['t1','s1']},{'rec_2':['t2','s2']}]]}, {'client2': [['c2','f2']]}]
Seems simple enough but I'm struggling to find a way to iterate both of these dictionaries using variables to find where they match.

When you are sure that the key names are equal in both dictionaries (this assumes clientList and recordList are plain dictionaries, not lists of dictionaries):
clientlist = dict([(k, [clientList[k], recordList[k]]) for k in clientList])
like here:
>>> a = {1:1,2:2,3:3}
>>> b = {1:11,2:12,3:13}
>>> c = dict([(k,[a[k],b[k]]) for k in a])
>>> c
{1: [1, 11], 2: [2, 12], 3: [3, 13]}

Assuming you want a list of values that correspond to each key in the two lists, try this as a start:
from pprint import pprint
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
clientList.extend(recordList)
outputList = {}
for rec in clientList:
    # each dict holds a single key/value pair
    k, v = next(iter(rec.items()))
    if k in outputList:
        outputList[k].append(v)
    else:
        outputList[k] = [v]
pprint(outputList)
It will produce this:
{'client1': [['c1', 'f1'], {'rec_1': ['t1', 's1']}, {'rec_2': ['t2', 's2']}],
'client2': [['c2', 'f2']]}

This could work but I am not sure I understand the rules of your data structure.
# join all the dicts for better lookup and update
clientDict = {}
for d in clientList:
    for k, v in d.items():
        clientDict[k] = clientDict.get(k, []) + v
recordDict = {}
for d in recordList:
    for k, v in d.items():
        recordDict[k] = recordDict.get(k, []) + [v]
for k, v in recordDict.items():
    clientDict[k] = [clientDict[k]] + v
# I don't know why you need a list of one-key dicts but here it is
clientList = [dict([(k, v)]) for k, v in clientDict.items()]
With the sample data you provided this gives the result you wanted, hope it helps.
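For comparison, here is a minimal sketch that aims to reproduce the exact nesting shown in the question, where each client's records are wrapped in their own inner list and clients without records keep only their details. The names `client_list`, `record_list`, `records_by_client` and `combined` are illustrative, not from the original posts.

```python
client_list = [{'client1': ['c1', 'f1']}, {'client2': ['c2', 'f2']}]
record_list = [{'client1': {'rec_1': ['t1', 's1']}},
               {'client1': {'rec_2': ['t2', 's2']}}]

# Step 1: collect every record under its client name
records_by_client = {}
for entry in record_list:
    for client, record in entry.items():
        records_by_client.setdefault(client, []).append(record)

# Step 2: rebuild the list of one-key dicts, attaching the records if any
combined = []
for entry in client_list:
    for client, details in entry.items():
        value = [details]
        if client in records_by_client:
            value.append(records_by_client[client])
        combined.append({client: value})

print(combined)
# [{'client1': [['c1', 'f1'], [{'rec_1': ['t1', 's1']}, {'rec_2': ['t2', 's2']}]]},
#  {'client2': [['c2', 'f2']]}]
```

Building a lookup keyed by client name first keeps the merge to one pass over each list instead of scanning recordList once per client.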

Appending values to dictionary in Python

I have a dictionary to which I want to append to each drug, a list of numbers. Like this:
append(0), append(1234), append(123), etc.
def make_drug_dictionary(data):
    drug_dictionary={'MORPHINE':[],
                     'OXYCODONE':[],
                     'OXYMORPHONE':[],
                     'METHADONE':[],
                     'BUPRENORPHINE':[],
                     'HYDROMORPHONE':[],
                     'CODEINE':[],
                     'HYDROCODONE':[]}
    prev = None
    for row in data:
        if prev is None or prev==row[11]:
            drug_dictionary.append[row[11][]
    return drug_dictionary
I later want to be able to access the entire set of entries in, for example, 'MORPHINE'.
How do I append a number into the drug_dictionary?
How do I later traverse through each entry?

Just use append:
list1 = [1, 2, 3, 4, 5]
list2 = [123, 234, 456]
d = {'a': [], 'b': []}
d['a'].append(list1)
d['a'].append(list2)
print(d['a'])

You should use append to add to the list. Here are also a few code tips:
I would use dict.setdefault or defaultdict to avoid having to specify the empty list in the dictionary definition.
If you use prev to filter out duplicated values, you can simplify the code using groupby from itertools.
Your code with the amendments looks as follows:
import itertools
def make_drug_dictionary(data):
    drug_dictionary = {}
    for key, row in itertools.groupby(data, lambda x: x[11]):
        drug_dictionary.setdefault(key,[]).append(row[?])
    return drug_dictionary
If you don't know how groupby works just check this example:
>>> list(key for key, val in itertools.groupby('aaabbccddeefaa'))
['a', 'b', 'c', 'd', 'e', 'f', 'a']
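The setdefault/defaultdict tip can be sketched end to end as follows. Everything about the data layout here is an assumption made for illustration: the question keeps the drug name in row[11] and does not say which column holds the number, so this sketch uses hypothetical two-column rows and hypothetical `name_col` / `value_col` parameters.

```python
from collections import defaultdict

# Hypothetical rows: drug name in column 0, number in column 1
data = [
    ("MORPHINE", 0),
    ("MORPHINE", 1234),
    ("CODEINE", 123),
    ("MORPHINE", 7),
]


def make_drug_dictionary(rows, name_col=0, value_col=1):
    """Group the numbers by drug name; missing keys start as empty lists."""
    drug_dictionary = defaultdict(list)
    for row in rows:
        drug_dictionary[row[name_col]].append(row[value_col])
    return dict(drug_dictionary)


drug_dictionary = make_drug_dictionary(data)

# Traverse every entry stored for one drug
for number in drug_dictionary["MORPHINE"]:
    print(number)
```

Unlike groupby, this does not require rows for the same drug to be adjacent in the input.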

It sounds as if you are trying to set up a list of lists as each value in the dictionary. Your initial value for each drug in the dict is []. So, assuming that you have a list1 that you want to append to the list for 'MORPHINE', you should do:
drug_dictionary['MORPHINE'].append(list1)
You can then access the various lists in the way that you want as drug_dictionary['MORPHINE'][0] etc.
To traverse the lists stored against a key, you would do:
for listx in drug_dictionary['MORPHINE']:
    print(listx)  # do stuff on listx here

To append entries to the table:
for row in data:
    name = ??? # figure out the name of the drug
    number = ??? # figure out the number you want to append
    drug_dictionary[name].append(number)
To loop through the data:
for name, numbers in drug_dictionary.items():
    print(name, numbers)

If you want to append to the lists of each key inside a dictionary, you can add new values to them using the += operator (tested in Python 3.7):
mydict = {'a':[], 'b':[]}
print(mydict)
mydict['a'] += [1,3]
mydict['b'] += [4,6]
print(mydict)
mydict['a'] += [2,8]
print(mydict)
and the output:
{'a': [], 'b': []}
{'a': [1, 3], 'b': [4, 6]}
{'a': [1, 3, 2, 8], 'b': [4, 6]}
mydict['a'].extend([1,3]) does the same job as += here: both modify the existing list in place. By contrast, mydict['a'] = mydict['a'] + [1,3] builds a new list.
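To see which spellings modify the stored list in place and which build a new one, a small sketch can compare object identity before and after each operation. The variable names (`original`, `same_after_iadd`, and so on) are illustrative only.

```python
mydict = {'a': [1, 3]}
original = mydict['a']  # keep a reference to the list object stored in the dict

# Augmented assignment extends the existing list object
mydict['a'] += [2, 8]
same_after_iadd = mydict['a'] is original

# extend() also modifies the existing list object
mydict['a'].extend([5])
same_after_extend = mydict['a'] is original

# Plain + builds a new list, which is then stored under the key
mydict['a'] = mydict['a'] + [9]
same_after_plus = mydict['a'] is original

print(same_after_iadd, same_after_extend, same_after_plus)  # True True False
print(original)     # [1, 3, 2, 8, 5]
print(mydict['a'])  # [1, 3, 2, 8, 5, 9]
```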

You can use the update() method as well. Note that it adds or replaces whole key/value pairs rather than appending to an existing list:
d = {"a": 2}
d.update({"b": 4})
print(d) # {"a": 2, "b": 4}

How do I append a number into the drug_dictionary?
Do you wish to add "a number" or a set of values?
I use dictionaries to build associative arrays and lookup tables quite a bit.
Since Python is so good at handling strings, I often add the values into a dict as a comma-separated string:
drug_dictionary = {}
drug_dictionary={'MORPHINE':'',
'OXYCODONE':'',
'OXYMORPHONE':'',
'METHADONE':'',
'BUPRENORPHINE':'',
'HYDROMORPHONE':'',
'CODEINE':'',
'HYDROCODONE':''}
drug_to_update = 'MORPHINE'
try:
    oldvalue = drug_dictionary[drug_to_update]
except KeyError:
    oldvalue = ''
# to increment a value
try:
    newval = int(oldvalue)
    newval += 1
except ValueError:
    newval = 1
drug_dictionary[drug_to_update] = "%s" % newval
# to append a value
try:
    newval = int(oldvalue)
    newval += 1
except ValueError:
    newval = 1
drug_dictionary[drug_to_update] = "%s,%s" % (oldvalue, newval)
The append approach allows for storing a list of values but leaves you with a trailing comma,
which you can remove with
drug_dictionary[drug_to_update][:-1]
Appending the values as a string means that you can append lists of values as you need to, and
print("'%s':'%s'" % (drug_to_update, drug_dictionary[drug_to_update]))
can return
'MORPHINE':'10,5,7,42,12,'

vowels = ("a","e","i","o","u") # create a tuple of vowels
my_str = "this is my dog and a cat" # sample string to get the vowel count
count = dict.fromkeys(vowels, 0) # create dict initializing the count of each vowel to 0
for char in my_str:
    if char in count:
        count[char] += 1
print(count)
