I get the error onthis line of code -
result_dict['strat'][k]['name'] = current_comps[0].strip()
The error is : Keyerror: 'strat'
I have an input line
PERSON1 ## CAR1 # ENTRY : 0 | EXIT : 0 ## CAR2 # M1 : YES : 10/01/17 02:00 | M2 : NO : 10/02/16 03:00 | M3 : NO : 05/07/17 11:00 | M4 : YES : 01/01/16 03:00 ## TRUCK # M3 : NO : 03/01/17 03:45 | M23 : NO : 01/01/14 07:00 | M27 : YES : 02/006/18 23:00
I 'm looking to parse this input to generate the output detailed below. As part of this, I'm trying to build a dictionary inserting both keys & values dynamically. I'm having a lot of problems doing this.
Could I please request help on this?
Here is what I've tried so far -
# File read
f = open('input_data', 'r')
file_cont = f.read().splitlines()
f.close()
#json template
# Initialize dictionary
result_arr = []
result_dict = {}
k = 0
for item in file_cont:
strat = item.split('##')
result_dict['Person'] = strat[0].strip()
j = 1
while j < len(strat):
# Split various components of the main line
current_comps = strat[j].split('#')
# Name of strat being parsed
result_dict['strat'][k]['name'] = current_comps[0].strip()
# tfs across the various time frames
tfs = current_comps[1].split('|')
# First travel mode
if current_comps[0].strip() == 'CAR1':
temp_low_arr = tfs[0].split(':')
temp_high_arr = tfs[1].split(':')
result_dict['strat'][k]['Entry'] = temp_low_arr[1].strip()
result_dict['strat'][k]['Exit'] = temp_high_arr[1].strip()
# Second travel mode
elif current_comps[0].strip() == 'CAR2':
z = 0
while z < len(tfs):
# Split components of the sign
sign_comp_car_2 = tfs[z].split(':')
result_dict['strat'][k]['tf'][z]['path'] = sign_comp_ma_cross[0].strip()
result_dict['strat'][k]['tf'][z]['sign'] = sign_comp_ma_cross[1].strip()
result_dict['strat'][k]['tf'][z]['sign_time'] = sign_comp_ma_cross[2].strip()
z += 1
# Third travel mode
elif current_comps[0].strip() == 'CAR3':
b = 0
while b < len(tfs):
# Split components of the sign
sign_car_3 = tfs[z].split(':')
result_dict['strat'][k]['tf'][b]['path'] = sign_all_term[0].strip()
result_dict['strat'][k]['tf'][b]['sign'] = sign_all_term[1].strip()
result_dict['strat'][k]['tf'][b]['sign_time'] = sign_all_term[2].strip()
b += 1
j += 1
k += 1
Expected output
[{
"Person":"",
"Transport":[
{
"Name":"CAR1",
"Entry":"0",
"Exit":"0"
},
{
"name":"CAR2:",
"tf":[
{
"path":"M1",
"sign":"YES",
"sign_time":"10/01/17 02:00"
},
{
"path":"M2",
"sign":"NO",
"sign_time":"10/02/16 03:00"
},
{
"path":"M3",
"sign":"NO",
"sign_time":"05/07/17 11:00"
},
{
"path":"M4",
"sign":"YES",
"sign_time":"01/01/16 03:00"
}
]
},
{
"name":"CAR3",
"tf":[
{
"path":"M3",
"sign":"NO",
"sign_time":"03/01/17 03:45"
},
{
"path":"M23",
"sign":"NO",
"sign_time":"01/01/14 07:00"
},
{
"path":"M27",
"sign":"Yes",
"sign_time":"02/006/18 23:00"
}
]
}
]
}]
The issue is when you try to assign the ['name'] field in result_dict['strat'][k] when result_dict['strat'][k] hasn't been initialized yet. Before you run your for-loop, the dictionary has no key called strat.
Now you could have done something like result_dict['strat'] = dict() (assigning an object to that key in the dict), but when you further subscript it using result_dict['strat'][k], it will try to resolve that first, by accessing result_dict['strat'], expecting either a subscriptable collection or a dictionary in return. However, since that key doesn't exist yet, it throws you the error.
What you could do instead is initialize a default dictionary:
from collections import defaultdict
...
resultdict = defaultdict(dict)
...
Otherwise, in your existing code, you could initialize a dict within result_dict before entering the loop.
I have a list of lists containing company objects:
companies_list = [companies1, companies2]
I have the following function:
def get_fund_amount_by_year(companies_list):
companies_length = len(companies_list)
for idx, companies in enumerate(companies_list):
companies1 = companies.values_list('id', flat=True)
funding_rounds = FundingRound.objects.filter(company_id__in=companies1).order_by('announced_on')
amount_per_year_list = []
for fr in funding_rounds:
fr_year = fr.announced_on.year
fr_amount = fr.raised_amount_usd
if not any(d['year'] == fr_year for d in amount_per_year_list):
year_amount = {}
year_amount['year'] = fr_year
for companies_idx in range(companies_length):
year_amount['amount'+str(companies_idx)] = 0
if companies_idx == idx:
year_amount['amount'+str(companies_idx)] = fr_amount
amount_per_year_list.append(year_amount)
else:
for year_amount in amount_per_year_list:
if year_amount['year'] == fr_year:
year_amount['amount'+str(idx)] += fr_amount
return amount_per_year_list
The problem is the resulting list of dictionaries has only one amount attribute updated.
As you can see "amount0" contains all "0" amounts:
[{'amount1': 12100000L, 'amount0': 0, 'year': 1999}, {'amount1':
8900000L, 'amount0': 0, 'year': 2000}]
What am I doing wrong?
I put list of dictionaries being built in the loop and so when it iterated it overwrote the last input. I changed it to look like:
def get_fund_amount_by_year(companies_list):
companies_length = len(companies_list)
**amount_per_year_list = []**
for idx, companies in enumerate(companies_list):
companies1 = companies.values_list('id', flat=True)
funding_rounds = FundingRound.objects.filter(company_id__in=companies1).order_by('announced_on')
I've created a function for iterating through a multiple level dictionary, and execute a second function ssocr that needs four arguments: coord, background, foreground, type (they are the value of my keys). This is my dictionary which is taken from a json file.
document json
def parse_image(self, d):
bg = d['background']
fg = d['foreground']
results = {}
for k, v in d['boxes'].iteritems():
if 'foreground' in d['boxes']:
myfg = d['boxes']['foreground']
else:
myfg = fg
if k != 'players_home' and k != 'players_opponent':
results[k] = MyAgonism.ssocr(v['coord'], bg, myfg, v['type'])
results['players_home'] = {}
for k, v in d['boxes']['players_home'].iteritems():
if 'foreground' in d['boxes']['players_home']:
myfg = d['boxes']['players_home']['foreground']
else:
myfg = fg
if k != 'background' and k != 'foreground':
for k2, v2 in d['boxes']['players_home'][k].iteritems():
if k2 != 'fouls':
results['players_home'][k] = {}
results['players_home'][k][k2] = MyAgonism.ssocr(v2['coord'], bg, myfg, v2['type'])
return results
In the last iteritems I get the correct value just for the name key. The score key doesn't appear. It looks like name override score in my results['players_home'] dictionary
output: ... "player4": {"name": 9}, "player5": {"name": 24} ...
I would like something like ... "player4": {"name": 9, "score": value}, "player5": {"name": 24, "score": value} ...
What am I doing wrong? Here is the full code just in case: Full Code
This might be the/a problem:
if k2 != 'fouls':
results['players_home'][k] = {}
In your loop, each time that k2 is not 'fouls', you create a new empty dict and store that in results['players_home']. That means that any entry previously stored there is no longer accessible.
Search for a value and get the parent dictionary names (keys):
Dictionary = {dict1:{
'part1': {
'.wbxml': 'application/vnd.wap.wbxml',
'.rl': 'application/resource-lists+xml',
},
'part2':
{'.wsdl': 'application/wsdl+xml',
'.rs': 'application/rls-services+xml',
'.xop': 'application/xop+xml',
'.svg': 'image/svg+xml',
},
'part3':{...}, ...
dict2:{
'part1': { '.dotx': 'application/vnd.openxmlformats-..'
'.zaz': 'application/vnd.zzazz.deck+xml',
'.xer': 'application/patch-ops-error+xml',}
},
'part2':{...},
'part3':{...},...
},...
In above dictionary I need to search values like: "image/svg+xml". Where, none of the values are repeated in the dictionary. How to search the "image/svg+xml"? so that it should return the parent keys in a dictionary { dict1:"part2" }.
Please note: Solutions should work unmodified for both Python 2.7 and Python 3.3.
Here's a simple recursive version:
def getpath(nested_dict, value, prepath=()):
for k, v in nested_dict.items():
path = prepath + (k,)
if v == value: # found value
return path
elif hasattr(v, 'items'): # v is a dict
p = getpath(v, value, path) # recursive call
if p is not None:
return p
Example:
print(getpath(dictionary, 'image/svg+xml'))
# -> ('dict1', 'part2', '.svg')
To yield multiple paths (Python 3 only solution):
def find_paths(nested_dict, value, prepath=()):
for k, v in nested_dict.items():
path = prepath + (k,)
if v == value: # found value
yield path
elif hasattr(v, 'items'): # v is a dict
yield from find_paths(v, value, path)
print(*find_paths(dictionary, 'image/svg+xml'))
This is an iterative traversal of your nested dicts that additionally keeps track of all the keys leading up to a particular point. Therefore as soon as you find the correct value inside your dicts, you also already have the keys needed to get to that value.
The code below will run as-is if you put it in a .py file. The find_mime_type(...) function returns the sequence of keys that will get you from the original dictionary to the value you want. The demo() function shows how to use it.
d = {'dict1':
{'part1':
{'.wbxml': 'application/vnd.wap.wbxml',
'.rl': 'application/resource-lists+xml'},
'part2':
{'.wsdl': 'application/wsdl+xml',
'.rs': 'application/rls-services+xml',
'.xop': 'application/xop+xml',
'.svg': 'image/svg+xml'}},
'dict2':
{'part1':
{'.dotx': 'application/vnd.openxmlformats-..',
'.zaz': 'application/vnd.zzazz.deck+xml',
'.xer': 'application/patch-ops-error+xml'}}}
def demo():
mime_type = 'image/svg+xml'
try:
key_chain = find_mime_type(d, mime_type)
except KeyError:
print ('Could not find this mime type: {0}'.format(mime_type))
exit()
print ('Found {0} mime type here: {1}'.format(mime_type, key_chain))
nested = d
for key in key_chain:
nested = nested[key]
print ('Confirmation lookup: {0}'.format(nested))
def find_mime_type(d, mime_type):
reverse_linked_q = list()
reverse_linked_q.append((list(), d))
while reverse_linked_q:
this_key_chain, this_v = reverse_linked_q.pop()
# finish search if found the mime type
if this_v == mime_type:
return this_key_chain
# not found. keep searching
# queue dicts for checking / ignore anything that's not a dict
try:
items = this_v.items()
except AttributeError:
continue # this was not a nested dict. ignore it
for k, v in items:
reverse_linked_q.append((this_key_chain + [k], v))
# if we haven't returned by this point, we've exhausted all the contents
raise KeyError
if __name__ == '__main__':
demo()
Output:
Found image/svg+xml mime type here: ['dict1', 'part2', '.svg']
Confirmation lookup: image/svg+xml
Here is a solution that works for a complex data structure of nested lists and dicts
import pprint
def search(d, search_pattern, prev_datapoint_path=''):
output = []
current_datapoint = d
current_datapoint_path = prev_datapoint_path
if type(current_datapoint) is dict:
for dkey in current_datapoint:
if search_pattern in str(dkey):
c = current_datapoint_path
c+="['"+dkey+"']"
output.append(c)
c = current_datapoint_path
c+="['"+dkey+"']"
for i in search(current_datapoint[dkey], search_pattern, c):
output.append(i)
elif type(current_datapoint) is list:
for i in range(0, len(current_datapoint)):
if search_pattern in str(i):
c = current_datapoint_path
c += "[" + str(i) + "]"
output.append(i)
c = current_datapoint_path
c+="["+ str(i) +"]"
for i in search(current_datapoint[i], search_pattern, c):
output.append(i)
elif search_pattern in str(current_datapoint):
c = current_datapoint_path
output.append(c)
output = filter(None, output)
return list(output)
if __name__ == "__main__":
d = {'dict1':
{'part1':
{'.wbxml': 'application/vnd.wap.wbxml',
'.rl': 'application/resource-lists+xml'},
'part2':
{'.wsdl': 'application/wsdl+xml',
'.rs': 'application/rls-services+xml',
'.xop': 'application/xop+xml',
'.svg': 'image/svg+xml'}},
'dict2':
{'part1':
{'.dotx': 'application/vnd.openxmlformats-..',
'.zaz': 'application/vnd.zzazz.deck+xml',
'.xer': 'application/patch-ops-error+xml'}}}
d2 = {
"items":
{
"item":
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{"id": "1001", "type": "Regular"},
{"id": "1002", "type": "Chocolate"},
{"id": "1003", "type": "Blueberry"},
{"id": "1004", "type": "Devil's Food"}
]
},
"topping":
[
{"id": "5001", "type": "None"},
{"id": "5002", "type": "Glazed"},
{"id": "5005", "type": "Sugar"},
{"id": "5007", "type": "Powdered Sugar"},
{"id": "5006", "type": "Chocolate with Sprinkles"},
{"id": "5003", "type": "Chocolate"},
{"id": "5004", "type": "Maple"}
]
},
...
]
}
}
pprint.pprint(search(d,'svg+xml','d'))
>> ["d['dict1']['part2']['.svg']"]
pprint.pprint(search(d2,'500','d2'))
>> ["d2['items']['item'][0]['topping'][0]['id']",
"d2['items']['item'][0]['topping'][1]['id']",
"d2['items']['item'][0]['topping'][2]['id']",
"d2['items']['item'][0]['topping'][3]['id']",
"d2['items']['item'][0]['topping'][4]['id']",
"d2['items']['item'][0]['topping'][5]['id']",
"d2['items']['item'][0]['topping'][6]['id']"]
Here are two similar quick and dirty ways of doing this type of operation. The function find_parent_dict1 uses list comprehension but if you are uncomfortable with that then find_parent_dict2 uses the infamous nested for loops.
Dictionary = {'dict1':{'part1':{'.wbxml':'1','.rl':'2'},'part2':{'.wbdl':'3','.rs':'4'}},'dict2':{'part3':{'.wbxml':'5','.rl':'6'},'part4':{'.wbdl':'1','.rs':'10'}}}
value = '3'
def find_parent_dict1(Dictionary):
for key1 in Dictionary.keys():
item = {key1:key2 for key2 in Dictionary[key1].keys() if value in Dictionary[key1][key2].values()}
if len(item)>0:
return item
find_parent_dict1(Dictionary)
def find_parent_dict2(Dictionary):
for key1 in Dictionary.keys():
for key2 in Dictionary[key1].keys():
if value in Dictionary[key1][key2].values():
print {key1:key2}
find_parent_dict2(Dictionary)
Traverses a nested dict looking for a particular value. When success is achieved the full key path to the value is printed. I left all the comments and print statements for pedagogical purposes (this isn't production code!)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 24 17:16:46 2022
#author: wellington
"""
class Tree(dict):
"""
allows autovivification as in Perl hashes
"""
def __missing__(self, key):
value = self[key] = type(self)()
return value
# tracking the key sequence when seeking the target
key_list = Tree()
# dict storing the target success result
success = Tree()
# example nested dict of dicts and lists
E = {
'AA':
{
'BB':
{'CC':
{
'DD':
{
'ZZ':'YY',
'WW':'PP'
},
'QQ':
{
'RR':'SS'
},
},
'II':
{
'JJ':'KK'
},
'LL':['MM', 'GG', 'TT']
}
}
}
def find_keys_from_value(data, target):
"""
recursive function -
given a value it returns all the keys in the path to that value within
the dict "data"
there are many paths and many false routes
at the end of a given path if success has not been achieved
the function discards keys to get back to the next possible path junction
"""
print(f"the number of keys in the local dict is {len(data)}")
key_counter = 0
for key in data:
key_counter += 1
# if target has been located stop iterating through keys
if success[target] == 1:
break
else:
# eliminate prior key from path that did not lead to success
if key_counter > 1:
k_list.pop()
# add key to new path
k_list.append(key)
print(f"printing k_list after append{k_list}")
# if target located set success[target] = 1 and exit
if key == target or data[key] == target:
key_list[target] = k_list
success[target] = 1
break
# if the target has not been located check to see if the value
# associated with the new key is a dict and if so return to the
# recursive function with the new dict as "data"
elif isinstance(data[key], dict):
print(f"\nvalue is dict\n {data[key]}")
find_keys_from_value(data[key], target)
# check to see if the value associated with the new key is a list
elif isinstance(data[key], list):
# print("\nv is list\n")
# search through the list
for i in data[key]:
# check to see if the list element is a dict
# and if so return to the recursive function with
# the new dict as "data
if isinstance(i, dict):
find_keys_from_value(i, target)
# check to see if each list element is the target
elif i == target:
print(f"list entry {i} is target")
success[target] = 1
key_list[target] = k_list
elif i != target:
print(f"list entry {i} is not target")
print(f"printing k_list before pop_b {k_list}")
print(f"popping off key_b {key}")
# so if value is not a key and not a list and not the target then
# discard the key from the key list
elif data[key] != target:
print(f"value {data[key]} is not target")
print(f"printing k_list before removing key_before {k_list}")
print(f"removing key_c {key}")
k_list.remove(key)
# select target values
values = ["PP", "SS", "KK", "TT"]
success = {}
for target in values:
print(f"\nlooking for target {target}")
success[target] = 0
k_list = []
find_keys_from_value(E, target)
print(f"\nprinting key_list for target {target}")
print(f"{key_list[target]}\n")
print("\n****************\n\n")
What is the best solution to pivot/cross-tab tables in Python 3? Is there a built-in function that will do this? Ideally, I'm looking for a Python 3 solution that does not have external dependencies. For example, given a nested list:
nl = [["apples", 2 "New York"],
["peaches", 6, "New York"],
["apples", 6, "New York"],
["peaches", 1, "Vermont"]]
I would like to be able to rearrange rowed data and groupby fields:
apples peaches
New York 2 6
Vermont 6 1
The above is a trivial example, but is there a solution that would be easier than using itertools.groupby everytime a pivot is desired? Ideally, the solution would allow rowed data to be pivoted on any column. I was debating about using pandas, but it is an external library and only has limited Python 3 support.
Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.
class CrossTab(object):
def __init__(
self,
missing=0, # what to return for an empty cell.
# Alternatives: '', 0.0, None, 'NULL'
):
self.missing = missing
self.col_key_set = set()
self.cell_dict = {}
self.headings_OK = False
def add_item(self, row_key, col_key, value):
self.col_key_set.add(col_key)
try:
self.cell_dict[row_key][col_key] += value
except KeyError:
try:
self.cell_dict[row_key][col_key] = value
except KeyError:
self.cell_dict[row_key] = {col_key: value}
def _process_headings(self):
if self.headings_OK:
return
self.row_headings = list(sorted(self.cell_dict.keys()))
self.col_headings = list(sorted(self.col_key_set))
self.headings_OK = True
def get_col_headings(self):
self._process_headings()
return self.col_headings
def generate_row_info(self):
self._process_headings()
for row_key in self.row_headings:
row_dict = self.cell_dict[row_key]
row_vals = [
row_dict.get(col_key, self.missing)
for col_key in self.col_headings
]
yield row_key, row_vals
if __name__ == "__main__":
data = [["apples", 2, "New York"],
["peaches", 6, "New York"],
["apples", 6, "New York"],
["peaches", 1, "Vermont"]]
ctab = CrossTab(missing='uh-oh')
for s in data:
ctab.add_item(row_key=s[2], col_key=s[0], value=s[1])
print()
print('Column headings:', ctab.get_col_headings())
for row_heading, row_values in ctab.generate_row_info():
print(repr(row_heading), row_values)
Output:
Column headings: ['apples', 'peaches']
'New York' [8, 6]
'Vermont' ['uh-oh', 1]
See also this answer.
And this one, which I'd forgotten about.
itertools.groupby was exactly made for this problem. You will be hard-pressed to find something better, especially within the standard library.