The output below is a pretty-printed snapshot of a portion of a dictionary that I am trying to work with. I'm looking to output the highest value of all entries in column p, as well as its main dictionary key.
In the example output below, the value for p in GRTEUR is higher than any other values of p from any of the other main keys so I would like to return the main key and the value, so GRTEUR and -0.1752234098475558.
I've read about Pandas and using pandas.DataFrame.max() but I'm not finding any examples on how to evaluate the values from a key (p) of a nested dictionary (1h).
Any pointers?
data = {
"LUNAEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 6.033,
"h": 6.551,
"l": 5.983,
"ct": "2021-07-09 08:59:59.999000",
"p": -1.660459342023591
},
"stream0": {
"c": 6.444,
"v": 1393.808,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 6.446,
"v": 1171.177,
"ct": "2021-07-09 09:59:59.999000"
}
},
"THETAEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 4.992,
"h": 5.076,
"l": 4.956,
"ct": "2021-07-09 08:59:59.999000",
"p": -0.2963841138114934
},
"stream0": {
"c": 5.061,
"v": 492.138,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 5.067,
"v": 423.079,
"ct": "2021-07-09 09:59:59.999000"
}
},
"GRTEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 0.5616,
"h": 0.5717,
"l": 0.5523,
"ct": "2021-07-09 08:59:59.999000",
"p": -0.1752234098475558
},
"stream0": {
"c": 0.5707,
"v": 105.17,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 0.571,
"v": 19.71,
"ct": "2021-07-09 09:59:59.999000"
}
}
}
Use Python's built-in max(..., key=...) to pick the entry with the largest p:
key, value = max(data.items(), key=lambda x: x[1]["1h"]["p"])
print(key, value["1h"]["p"])
To ignore entries whose "1h" dictionary has no "p" key, you could either supply a very small default value
import sys
max(data.items(), key=lambda x: x[1]["1h"].get("p", -sys.float_info.max))
or filter before finding the max:
max(((key, val) for key, val in data.items() if "p" in val["1h"]),
key=lambda x: x[1]["1h"]["p"])
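The variants above still assume that every entry has a "1h" sub-dictionary. A minimal sketch that also tolerates entries with no "1h" at all; the helper name max_p and the trimmed sample data are made up for illustration:

```python
def max_p(data, frame="1h", field="p"):
    """Return (main_key, value) for the largest `field` found under `frame`."""
    candidates = (
        (key, entry[frame][field])
        for key, entry in data.items()
        if field in entry.get(frame, {})
    )
    # default=None avoids a ValueError when no entry has the field
    return max(candidates, key=lambda pair: pair[1], default=None)


sample = {
    "LUNAEUR": {"1h": {"p": -1.660459342023591}},
    "THETAEUR": {"1h": {"p": -0.2963841138114934}},
    "GRTEUR": {"1h": {"p": -0.1752234098475558}},
    "NOFRAME": {"stream0": {"c": 1.0}},  # hypothetical entry without "1h"
    "NOP": {"1h": {"o": 2.0}},           # hypothetical entry without "p"
}

print(max_p(sample))
```

Returning the key and the value as a pair avoids a second lookup into the nested dictionary.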
functools.reduce can walk a sequence of nested keys in a dictionary. Maybe you could try this:
from functools import reduce
def deep_get(dictionary, *keys):
    # follow the keys one level at a time; return None if the path breaks
    return reduce(lambda d, key: d.get(key) if isinstance(d, dict) else None, keys, dictionary)
val_list = []
key_list = ["LUNAEUR", "THETAEUR", "GRTEUR"]
for item in key_list:
    val_list.append(deep_get(data, item, "1h", "p"))
print(max(val_list))
Output:
-0.1752234098475558
Related
Working on a freshwater fish conservation project. I scraped a JSON file that looks like this:
{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}
And I'm trying to extract the keys "id" and "a" into a python dictionary like this:
fish_id = {
0 : "NONE",
1 : "Hampala macrolepidota",
2 : "Channa micropeltes",
3 : "Chitala ornata"
}
import json
data = """{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}"""
data_dict = json.loads(data)
fish_id = {}
for item in data_dict["fish"]:
fish_id[item["id"]] = item["a"]
print(fish_id)
First save the JSON to a file named fish.json and load it:
import json
with open('fish.json') as json_file:
    data = json.load(json_file)
Then take each fish:
fish1 = data['fish'][0]
fish2 = data['fish'][1]
fish3 = data['fish'][2]
fish4 = data['fish'][3]
After that, take only the values of each fish, since the dictionary is built from values alone:
value_list1=list(fish1.values())
value_list2=list(fish2.values())
value_list3=list(fish3.values())
value_list4=list(fish4.values())
Finally, create the fish_id dictionary:
fish_id = {
f"{value_list1[0]}" : f"{value_list1[2]}",
f"{value_list2[0]}" : f"{value_list2[2]}",
f"{value_list3[0]}" : f"{value_list3[2]}",
f"{value_list4[0]}" : f"{value_list4[2]}",
}
If you run:
print(fish_id)
The result looks like this. Note that the keys are strings rather than integers because of the f-strings; a for loop over data['fish'] would be more concise.
{'0': 'NONE', '1': 'Hampala macrolepidota', '2': 'Channa micropeltes', '3': 'Chitala ornata'}
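The same mapping can be built without naming each fish individually, which also keeps the ids as integers. A short sketch, with an inline records list standing in for data['fish']:

```python
# Stand-in for data['fish'] as loaded from the JSON file
records = [
    {"id": 0, "n": "NO INFORMATION", "a": "NONE", "i": "none.png"},
    {"id": 1, "n": "Hampala barb", "a": "Hampala macrolepidota", "i": "hampala.png"},
    {"id": 2, "n": "Giant snakehead", "a": "Channa micropeltes", "i": "toman.png"},
    {"id": 3, "n": "Clown featherback", "a": "Chitala ornata", "i": "belida.png"},
]

# Look fields up by name instead of by position in values()
fish_id = {fish["id"]: fish["a"] for fish in records}
print(fish_id)
```

Looking fields up by name does not depend on the order of keys inside each record, and it works for any number of fish.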
I have a nested dictionary that represents parent-child relationships. For example:
{
"45273425f5abc05b->s":
{
"12864f455e7c86bb->s": {
"12864f455e7c86bbexternal_call->c": {}
}
},
"c69aead72fcd6ec1->d":
{
"8ade76728bdddf27->d": {
"8ade76728bdddf27external_call->i": {}
},
"b29f07de47c5841f->d": {
"107bec1baede1bff->l": {
"e14ebabea4785c3f->l": {
"e14ebabea4785c3fexternal_call->r": {}
},
"e36b35daa794bd50->l": {
"e36b35daa794bd50external_call->a": {}
}
},
"b29f07de47c5841fexternal_call->l": {}
},
"1906ef2c2897ac01->d": {
"1906ef2c2897ac01external_call->e": {}
}
}
}
I want to do two things with this dictionary. First, I want to remove everything up to and including "->" from each key. Second, after renaming there will be duplicate keys at the same level of the nested dictionary, for example under the second top-level element. If two keys end up with the same name, I want to merge them into one. So the result would look like the following:
{
"s":
{
"s": {
"c"
}
},
"d":
{
"d": {
"i",
"l": {
"l": {
"r",
"a"
}
},
"e"
}
}
}
How can I achieve this? I have written this code so far.
def alter_dict(nested_dict):
    new_dict = {}
    for k, v in nested_dict.items():
        if isinstance(v, dict):
            v = alter_dict(v)
        new_key = k.split("->")[1]
        new_dict[new_key] = v
    return new_dict
It works for a simple one like the first element but doesn't work for the second one. It loses some information. The purpose of this is to create a graph with the dictionary.
You can use recursion:
import json
from collections import defaultdict
def merge(d):
    r = defaultdict(list)
    for i in d:
        for a, b in i.items():
            r[a.split('->')[-1]].append(b)
    return {a: merge(b) for a, b in r.items()}
data = {'45273425f5abc05b->s': {'12864f455e7c86bb->s': {'12864f455e7c86bbexternal_call->c': {}}}, 'c69aead72fcd6ec1->d': {'8ade76728bdddf27->d': {'8ade76728bdddf27external_call->i': {}}, 'b29f07de47c5841f->d': {'107bec1baede1bff->l': {'e14ebabea4785c3f->l': {'e14ebabea4785c3fexternal_call->r': {}}, 'e36b35daa794bd50->l': {'e36b35daa794bd50external_call->a': {}}}, 'b29f07de47c5841fexternal_call->l': {}}, '1906ef2c2897ac01->d': {'1906ef2c2897ac01external_call->e': {}}}}
print(json.dumps(merge([data]), indent=4))
Output:
{
"s": {
"s": {
"c": {}
}
},
"d": {
"d": {
"i": {},
"l": {
"l": {
"r": {},
"a": {}
}
},
"e": {}
}
}
}
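The code in the question loses information because new_dict[new_key] = v overwrites any earlier child that was shortened to the same key. As an alternative sketch, the merge can also be done by descending into an accumulator with setdefault; the function name relabel_and_merge and the small sample are illustrative only:

```python
def relabel_and_merge(nested, out=None):
    """Strip everything up to '->' from each key and merge children of equal keys."""
    if out is None:
        out = {}
    for key, children in nested.items():
        label = key.split("->")[-1]
        # setdefault reuses the existing sub-dict when the label was seen before,
        # so children are merged instead of overwritten
        relabel_and_merge(children, out.setdefault(label, {}))
    return out


sample = {
    "a1->d": {
        "b1->d": {"b1external_call->i": {}},
        "b2->d": {"c1->l": {}},
    },
}
print(relabel_and_merge(sample))
```

Like the recursive answer, this leaves every leaf as an empty dictionary.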
I have a CSV file in which some of the columns are named in the format x;y;z. I am using pandas to read this data, do some pre-processing, and convert it to a list of JSON objects using the to_json/to_dict methods of pandas. When converting these special columns, the JSON object for that column should have the format {x: {y: {z: value}}}. There could be different columns such as x;y;z and x;y;a, and these two have to be merged into a single object in the resulting record, i.e. {x: {y: {z: value1, a: value2}}}.
CSV:
Id,Name,X;Y;Z,X;Y;A,X;B;Z
101,Adam,1,2,3
102,John,4,5,6
103,Sara,7,8,9
Output:
[
{
"Id":101,
"Name":"Adam",
"X":{
"Y":{
"Z":1,
"A":2
},
"B":{
"Z":3
}
}
},
{
"Id":102,
"Name":"John",
"X":{
"Y":{
"Z":4,
"A":5
},
"B":{
"Z":6
}
}
},
{
"Id":103,
"Name":"Sara",
"X":{
"Y":{
"Z":7,
"A":8
},
"B":{
"Z":9
}
}
}
]
I found it easier to use pandas to dump the data as a list of dicts and then use a recursive function to iterate through the keys. Where I encounter a key that contains a ;, I split the key on this delimiter and recursively create the nested dicts. When I reach the last element of the split key, I set that key to the original value and then remove the original key from the dict.
import pandas as pd
from io import StringIO
import json
def split_key_to_nested_dict(original_dict, original_key, nested_dict, nested_keys):
    if nested_keys[0] not in nested_dict:
        nested_dict[nested_keys[0]] = {}
    if len(nested_keys) == 1:
        # last part of the split key: store the value and drop the original flat key
        nested_dict[nested_keys[0]] = original_dict[original_key]
        del original_dict[original_key]
    else:
        split_key_to_nested_dict(original_dict, original_key, nested_dict[nested_keys[0]], nested_keys[1:])
csv_data = StringIO("""Id,Name,X;Y;Z,X;Y;A,X;B;Z
101,Adam,1,2,3
102,John,4,5,6
103,Sara,7,8,9""")
# DataFrame.from_csv was removed in pandas 1.0; read_csv keeps Id as a regular column
df = pd.read_csv(csv_data)
dict_data = df.to_dict('records')
for data in dict_data:
    keys = list(data.keys())
    for key in keys:
        if ';' in key:
            nested_keys = key.split(';')
            split_key_to_nested_dict(data, key, data, nested_keys)
print(json.dumps(dict_data))
OUTPUT
[{"Id": 101, "Name": "Adam", "X": {"Y": {"Z": 1, "A": 2}, "B": {"Z": 3}}}, {"Id": 102, "Name": "John", "X": {"Y": {"Z": 4, "A": 5}, "B": {"Z": 6}}}, {"Id": 103, "Name": "Sara", "X": {"Y": {"Z": 7, "A": 8}, "B": {"Z": 9}}}]
FORMATTED OUTPUT
[
{
"Id": 101,
"Name": "Adam",
"X": {
"Y": {
"Z": 1,
"A": 2
},
"B": {
"Z": 3
}
}
},
{
"Id": 102,
"Name": "John",
"X": {
"Y": {
"Z": 4,
"A": 5
},
"B": {
"Z": 6
}
}
},
{
"Id": 103,
"Name": "Sara",
"X": {
"Y": {
"Z": 7,
"A": 8
},
"B": {
"Z": 9
}
}
}
]
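The nesting step itself does not need pandas or recursion. A minimal sketch that builds a new dict per record by walking the split key with setdefault; the function name nest_record is made up here, and row mimics one entry of df.to_dict('records'):

```python
def nest_record(record, sep=";"):
    """Turn keys such as 'X;Y;Z' into nested dicts, merging shared prefixes."""
    nested = {}
    for column, value in record.items():
        *parents, leaf = column.split(sep)
        target = nested
        for part in parents:
            # reuse the sub-dict if an earlier column already created it
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


row = {"Id": 101, "Name": "Adam", "X;Y;Z": 1, "X;Y;A": 2, "X;B;Z": 3}
print(nest_record(row))
```

Building a fresh dict avoids deleting keys from the record while processing it.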
I'm struggling with this problem. I have a JSON file and need to write it out as CSV. That's fine when the structure is fairly flat with no deeply nested items.
But in this case the nested RACES is messing me up.
How would I go about getting the data in a format like this:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
for each object and each race in the object?
{
"1": {
"VENUE": "JOEBURG",
"COUNTRY": "HAE",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
},
"2": {
"VENUE": "FOOBURG",
"COUNTRY": "ABA",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
}, ...
}
I would like to output this to CSV like this:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
JOEBURG, HAE, XAD, 1, 12:35
JOEBURG, HAE, XAD, 2, 13:10
JOEBURG, HAE, XAD, 3, 13:40
...
...
FOOBURG, ABA, XAD, 1, 12:35
FOOBURG, ABA, XAD, 2, 13:10
So first I get the correct keys:
self.keys = self.data.keys()
keys = ["DATA_KEY"]
for key in self.keys:
    if type(self.data[key]) == dict:
        for k in self.data[key].keys():
            if k not in keys:
                if type(self.data[key][k]) == unicode:
                    keys.append(k)
                elif type(self.data[key][k]) == dict:
                    self.subkey = k
                    for sk in self.data[key][k].values():
                        for subkey in sk.keys():
                            subkey = "%s__%s" % (self.subkey, subkey)
                            if subkey not in keys:
                                keys.append(subkey)
Then add the data:
But how?
This should be a fun one for you skilled forloopers. ;-)
I'd collect keys only for the first object, then assume that the rest of the format is consistent.
The following code also limits the nested object to just one; you did not specify what should happen when there is more than one. Having two or more nested structures of equal length could work (you'd 'zip' those together), but if you have structures of differing length you need to make an explicit choice how to handle those; zip with empty columns to pad, or to write out the product of those entries (A x B rows, repeating information from A each time you find a B entry).
import csv
from operator import itemgetter
# Python 3: open in text mode with newline='' (the Python 2 original used 'wb')
with open(outputfile, 'w', newline='') as outf:
    writer = None  # will be set to a csv.DictWriter later
    for key, item in sorted(data.items(), key=itemgetter(0)):
        row = {}
        nested_name, nested_items = '', {}
        for k, v in item.items():
            if not isinstance(v, dict):
                row[k] = v
            else:
                assert not nested_items, 'Only one nested structure is supported'
                nested_name, nested_items = k, v
        if writer is None:
            # build fields for each first key of each nested item first
            fields = sorted(row)
            # sorted keys of first item in key sorted order
            nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
            fields.extend('__'.join((nested_name, k)) for k in nested_keys)
            writer = csv.DictWriter(outf, fields)
            writer.writeheader()
        for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
            row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
            writer.writerow(row)
For your sample input, this produces:
COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
HAE,XAD,JOEBURG,1,12:35
HAE,XAD,JOEBURG,2,13:10
HAE,XAD,JOEBURG,3,13:40
HAE,XAD,JOEBURG,4,14:10
HAE,XAD,JOEBURG,5,14:55
HAE,XAD,JOEBURG,6,15:30
HAE,XAD,JOEBURG,7,16:05
HAE,XAD,JOEBURG,8,16:40
ABA,XAD,FOOBURG,1,12:35
ABA,XAD,FOOBURG,2,13:10
ABA,XAD,FOOBURG,3,13:40
ABA,XAD,FOOBURG,4,14:10
ABA,XAD,FOOBURG,5,14:55
ABA,XAD,FOOBURG,6,15:30
ABA,XAD,FOOBURG,7,16:05
ABA,XAD,FOOBURG,8,16:40
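The two choices mentioned above for nested structures of differing length, padding versus writing out the product, can be sketched with itertools. The second structure (odds) is purely hypothetical and is not part of the sample data:

```python
from itertools import product, zip_longest

races = [{"RACES__NO": 1}, {"RACES__NO": 2}, {"RACES__NO": 3}]
odds = [{"ODDS__VALUE": "2/1"}, {"ODDS__VALUE": "5/1"}]  # hypothetical second structure

# Option 1: zip and pad the shorter structure with empty dicts,
# so the missing columns stay blank in the CSV row
padded = [{**a, **b} for a, b in zip_longest(races, odds, fillvalue={})]

# Option 2: product, one row per (race, odds) pair (A x B rows)
crossed = [{**a, **b} for a, b in product(races, odds)]

print(len(padded), len(crossed))
```

When such partial rows are passed to csv.DictWriter, fields missing from a row are written using its restval argument, which defaults to an empty string.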