Working on a freshwater fish conservation project. I scraped a JSON file that looks like this:
{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}
And I'm trying to extract the keys "id" and "a" into a Python dictionary like this:
fish_id = {
0 : "NONE",
1 : "Hampala macrolepidota",
2 : "Channa micropeltes",
3 : "Chitala ornata"
}
import json
data = """{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}"""
data_dict = json.loads(data)
fish_id = {}
for item in data_dict["fish"]:
    fish_id[item["id"]] = item["a"]
print(fish_id)
First, save the JSON to a fish.json file and load it:
import json

with open('fish.json') as json_file:
    data = json.load(json_file)
Then, take each fish:
fish1 = data['fish'][0]
fish2 = data['fish'][1]
fish3 = data['fish'][2]
fish4 = data['fish'][3]
After that, take only the values of each fish, because you want to build the dictionary from values only (index 0 is "id" and index 2 is "a"):
value_list1=list(fish1.values())
value_list2=list(fish2.values())
value_list3=list(fish3.values())
value_list4=list(fish4.values())
Finally, create the fish_id dictionary, using the values directly so the ids stay integers as in the question:
fish_id = {
    value_list1[0]: value_list1[2],
    value_list2[0]: value_list2[2],
    value_list3[0]: value_list3[2],
    value_list4[0]: value_list4[2],
}
If you run:
print(fish_id)
the result will be as below. A for loop instead of one variable per fish would be more effective, since it handles any number of fish.
{0: 'NONE', 1: 'Hampala macrolepidota', 2: 'Channa micropeltes', 3: 'Chitala ornata'}
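A loop-based version of the same idea, as a minimal sketch: the helper name fish_id_map is hypothetical, and it assumes every entry in "fish" has both an "id" and an "a" key. Because the ids are never converted, the keys stay integers, as in the fish_id the question asks for.

```python
import json

def fish_id_map(data):
    # Build {id: scientific name} from the parsed JSON; ids are kept as ints
    return {fish["id"]: fish["a"] for fish in data["fish"]}

# The same call works on the result of json.load(json_file) from the answer above
text = '{"fish": [{"id": 0, "n": "NO INFORMATION", "a": "NONE", "i": "none.png"}, {"id": 1, "n": "Hampala barb", "a": "Hampala macrolepidota", "i": "hampala.png"}]}'
print(fish_id_map(json.loads(text)))  # {0: 'NONE', 1: 'Hampala macrolepidota'}
```

With the data loaded from fish.json, fish_id = fish_id_map(data) gives all four entries without one variable per fish.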
The output below is a pretty-printed snapshot of a portion of a dictionary that I am trying to work with. I'm looking to output the highest value of all entries for key p, as well as its main dictionary key.
In the example output below, the value of p in GRTEUR is higher than the value of p under any of the other main keys, so I would like to return the main key and the value, i.e. GRTEUR and -0.1752234098475558.
I've read about Pandas and pandas.DataFrame.max(), but I can't find any examples of how to evaluate the values of a key (p) inside a nested dictionary (1h).
Any pointers?
data = {
"LUNAEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 6.033,
"h": 6.551,
"l": 5.983,
"ct": "2021-07-09 08:59:59.999000",
"p": -1.660459342023591
},
"stream0": {
"c": 6.444,
"v": 1393.808,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 6.446,
"v": 1171.177,
"ct": "2021-07-09 09:59:59.999000"
}
},
"THETAEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 4.992,
"h": 5.076,
"l": 4.956,
"ct": "2021-07-09 08:59:59.999000",
"p": -0.2963841138114934
},
"stream0": {
"c": 5.061,
"v": 492.138,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 5.067,
"v": 423.079,
"ct": "2021-07-09 09:59:59.999000"
}
},
"GRTEUR": {
"1h": {
"ot": "2021-07-09 08:00:00",
"o": 0.5616,
"h": 0.5717,
"l": 0.5523,
"ct": "2021-07-09 08:59:59.999000",
"p": -0.1752234098475558
},
"stream0": {
"c": 0.5707,
"v": 105.17,
"ct": "2021-07-09 09:59:59.999000"
},
"stream1": {
"c": 0.571,
"v": 19.71,
"ct": "2021-07-09 09:59:59.999000"
}
}
}
Use Python's built-in max(..., key=...) to pick the entry whose ["1h"]["p"] value is largest:
key, value = max(data.items(), key=lambda x: x[1]["1h"]["p"])
print(key, value["1h"]["p"])
To ignore entries whose "1h" dictionary doesn't contain "p", you could either provide a very small default value:
import sys
max(data.items(), key=lambda x: x[1]["1h"].get("p", -sys.float_info.max))
or filter before finding the max:
max(((key, val) for key, val in data.items() if "p" in val["1h"]),
    key=lambda x: x[1]["1h"]["p"])
functools.reduce can walk a chain of nested keys (here "1h" and then "p") in each dictionary. Maybe you could try this:
from functools import reduce

def deep_get(dictionary, *keys):
    # Follow the keys one level at a time; return None if any level is missing
    return reduce(lambda d, key: d.get(key, None) if isinstance(d, dict) else None, keys, dictionary)

val_list = []
key_list = ["LUNAEUR", "THETAEUR", "GRTEUR"]
for item in key_list:
    val_list.append(deep_get(data, item, '1h', 'p'))
print(max(val_list))
Output:
-0.1752234098475558
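The loop above only yields the value, while the question also asks for the main key. A minimal sketch that returns both, reusing the same reduce-based deep_get; max_by_nested is a hypothetical helper, and the trimmed data (including the made-up NOPRICE entry without "p") is for illustration only:

```python
from functools import reduce

def deep_get(dictionary, *keys):
    # Follow keys level by level; None as soon as a level is missing or not a dict
    return reduce(lambda d, key: d.get(key) if isinstance(d, dict) else None, keys, dictionary)

def max_by_nested(data, *path):
    # Look up the nested value for every main key and drop the missing ones
    candidates = {k: deep_get(v, *path) for k, v in data.items()}
    candidates = {k: val for k, val in candidates.items() if val is not None}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

data = {
    "LUNAEUR": {"1h": {"p": -1.660459342023591}},
    "THETAEUR": {"1h": {"p": -0.2963841138114934}},
    "GRTEUR": {"1h": {"p": -0.1752234098475558}},
    "NOPRICE": {"1h": {}},  # hypothetical entry without "p"; it is skipped
}
print(max_by_nested(data, "1h", "p"))  # ('GRTEUR', -0.1752234098475558)
```

Skipping the None values before calling max() avoids comparing None with floats, which would raise a TypeError.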
This question already has answers here:
Search for a value in a nested dictionary python
(5 answers)
Closed 1 year ago.
I have a nested JSON object and I would like to find a substring in any pair's value. The result should be the pair's key, or None (or similar) if it isn't found at all. For example, suppose I am looking for the substring "met". Then for:
{
"a": "example",
"d": {
"b": "another example",
"c": "something else"
}
}
the result should be "c" (as "met" is found in "something else"),
and for:
{
"a": "example",
"d": {
"b": "another example",
"c": "yet another one"
}
}
the result should be None, as "met" is not found anywhere.
I have no additional information about the structure of the JSON.
What is the most efficient way to do it?
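You could write a recursive function that checks each of the values and returns the key of the first match. Only return from a nested dictionary if it actually found something; otherwise the remaining keys would never be checked:
def find_nested(item, d):
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse, but keep searching the remaining keys if nothing matched
            found = find_nested(item, value)
            if found is not None:
                return found
        elif isinstance(value, str) and item in value:
            return key
    return None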
For example:
>>> d = {
"a": "example",
"d": {
"b": "another example",
"c": "something else"
}
}
>>> find_nested('met', d)
'c'
>>> d = {
"a": "example",
"d": {
"b": "another example",
"c": "yet another one"
}
}
>>> find_nested('met', d) # returns None
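The recursive approach can be extended to JSON that also contains arrays. A minimal sketch, assuming that a string found directly inside a list should report the key of that list; find_nested_any is a hypothetical name. Non-string leaves such as numbers are skipped instead of raising a TypeError on `in`:

```python
def find_nested_any(item, obj, parent_key=None):
    """Return the key whose string value contains `item`, searching depth-first.

    Dicts are searched in insertion order and lists in index order. A string
    found directly inside a list reports the key that holds the list.
    Returns None when nothing matches.
    """
    if isinstance(obj, str):
        return parent_key if item in obj else None
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        # List elements have no key of their own, so they inherit the parent's
        children = ((parent_key, element) for element in obj)
    else:
        return None  # numbers, booleans and null can't contain a substring
    for key, value in children:
        found = find_nested_any(item, value, key)
        if found is not None:
            return found
    return None
```

One limitation: if the top-level value is itself a list of strings, a match has no enclosing key and is reported as None, the same as "not found".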
I have the following JSON document:
{
"A": "A_VALUE",
"B": {
"C": [
{
"D": {
"E": "E_VALUE1",
"F": "F_VALUE1",
"G": "G_VALUE1"
},
"H": ["01", "23" ]
},
{
"D": {
"E": "E_VALUE2",
"F": "F_VALUE2",
"G": "G_VALUE3"
},
"H": ["45", "67" ]
}
]
}
}
and I would like to extract field H using a jsonpath2 expression in which I specify a value for the E field,
for example:
$..C[?(#.D.E="E_VALUE1")].H[1]
The code I use to parse this is the following (jsonpath version 0.4.3):
from jsonpath2.path import Path
s='{ "A": "A_VALUE", "B": { "C": [ { "D": { "E": "E_VALUE1", "F": "F_VALUE1", "G": "G_VALUE1" }, "H": ["01", "23" ] }, { "D": { "E": "E_VALUE2", "F": "F_VALUE2", "G": "G_VALUE3" }, "H": ["45", "67" ] } ] } }"'
p = Path.parse_str("$..C[?(#.D.E=\"E_VALUE1\")].H[1]")
print ([m.current_value for m in p.match(s)])
output
[]
Now, if I use the JSONPath evaluator at https://jsonpath.com/, I obtain the following result, which is not exactly what I need:
$..C[?(#.D.E="E_VALUE1")].H[1]
output
[23,67]
But if I change the expression this way, it works and I obtain what I need:
$..C[?(#.D.E=="E_VALUE1")].H[1]
output
[23]
I get the same results with other online evaluators, such as https://codebeautify.org/jsonpath-tester.
So what is the correct JSONPath expression to use with the jsonpath2 API in order to correctly extract the two required fields?
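You have to use [*] to access the individual objects inside the array. Note also that this code parses the string with json.loads() and passes the resulting object to match(). This code works: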
from jsonpath2.path import Path
import json
s='{ "A": "A_VALUE", "B": { "C": [ { "D": { "E": "E_VALUE1", "F": "F_VALUE1", "G": "G_VALUE1" }, "H": ["01", "23" ] }, { "D": { "E": "E_VALUE2", "F": "F_VALUE2", "G": "G_VALUE3" }, "H": ["45", "67" ] } ] } }'
jso = json.loads(s)
p = Path.parse_str('$..C[*][?(#.D.E="E_VALUE1")].H[1]') # C[*] accesses each object in the array
print (*[m.current_value for m in p.match(jso)]) # 23
You can refer to this example from the jsonpath2 docs
You should use the == syntax.
Full disclosure: I had never heard of jsonpath before coming across your question, but being somewhat familiar with XPath, I figured I would read about this tool. I came across a site that can evaluate your expression using different implementations: http://jsonpath.herokuapp.com. The net result was that your expression with = could not be parsed by 3 of the 4 implementations. Moreover, the Goessner implementation returned results you weren't expecting (all C elements matched and the result was [23,67]). With the == boolean expression, 3 of the 4 implementations provided the expected result of [23]. The Nebhale implementation again complained about the expression.
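For comparison, the selection the filter is meant to make can be written in plain Python without any JSONPath library, which is handy for checking what a given implementation should return. This sketch only looks at the known location B.C rather than doing a recursive .. search. Note that the H values are strings, so the result is ['23'] rather than [23]:

```python
import json

s = '{ "A": "A_VALUE", "B": { "C": [ { "D": { "E": "E_VALUE1", "F": "F_VALUE1", "G": "G_VALUE1" }, "H": ["01", "23" ] }, { "D": { "E": "E_VALUE2", "F": "F_VALUE2", "G": "G_VALUE3" }, "H": ["45", "67" ] } ] } }'
doc = json.loads(s)

# Keep each element of B.C whose D.E equals "E_VALUE1",
# then take the second item (index 1) of its H list
selected = [entry["H"][1] for entry in doc["B"]["C"] if entry["D"]["E"] == "E_VALUE1"]
print(selected)  # ['23']
```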
Essentially, I need to write a parser for the output of some markup. It's a list of strings formatted like this:
x = [
'A:B:C:D:E',
'A:B:D',
'A:C:E:F',
'B:D:E',
'B:C',
'A:C:F',
]
I need to turn it into a python object like so:
{
"B": [
"C",
{
"D": "E"
}
],
"A": [
{
"B": [
"D",
{
"C": {
"D": "E"
}
}
]
},
{
"C": [
"F",
{
"E": "F"
}
]
}
]
}
You can copy the above and paste it into this inspector to look at the object hierarchy and understand what I'm after. In any case, it's a nested dictionary that combines common keys and sometimes puts items in lists.
TL;DR: I have written the function below:
splits = [l.split(':') for l in x]

def DictDrill(o):
    # list of lists
    if type(o)==type([]) and all([type(l)==type([]) for l in o]):
        d = dict()
        for group in o:
            if type(group)==type([]) and len(group)>1:
                d[group[0]] = d.get(group[0],[]) + [group[1:]]
            if type(group)==type([]) and len(group)==1:
                d[group[0]] = d.get(group[0],[]) + []
        return DictDrill(d)
    # a dictionary
    elif type(o)==type({}):
        next = dict(o)
        for k,groups in next.items():
            next[k] = DictDrill(groups)
        return next
But you'll see that this script only returns dictionaries, and the last item of each path is placed as a key again with an empty dict() as its value. If you run my script as DictDrill(splits) on the example, you will see this:
{
"B": {
"C": {},
"D": {
"E": {}
}
},
"A": {
"C": {
"E": {
"F": {}
},
"F": {}
},
"B": {
"C": {
"D": {
"E": {}
}
},
"D": {}
}
}
}
Notice the useless {} values.
Preferably, I need to solve this in Python. I know a little C#, but moving data around between lists and dictionaries seems very cumbersome there...
You can use itertools.groupby with recursion:
from itertools import groupby as gb

data = ['A:B:C:D:E', 'A:B:D', 'A:C:E:F', 'B:D:E', 'B:C', 'A:C:F']

def to_dict(d):
    if isinstance(d, dict) or not d or any(isinstance(i, (dict, list)) for i in d):
        return d
    return d[0] if len(d) == 1 else {d[0]:to_dict(d[1:])}

def group(d):
    _d = [(a, [c for _, *c in b]) for a, b in gb(sorted(d, key=lambda x:x[0]), key=lambda x:x[0])]
    new_d = [{a:to_dict(b[0] if len(b) == 1 else group(b))} for a, b in _d]
    return [i for b in new_d for i in (b if not all(b.values()) else [b])]

import json
print(json.dumps(group([i.split(':') for i in data]), indent=4))
Output:
[
{
"A": [
{
"B": [
{
"C": {
"D": "E"
}
},
"D"
]
},
{
"C": [
{
"E": "F"
},
"F"
]
}
]
},
{
"B": [
"C",
{
"D": "E"
}
]
}
]
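As an alternative to groupby, here is a sketch that first builds a prefix tree (trie) of nested dictionaries and then renders it into the requested shape. The rendering rules are my reading of the desired output, so treat them as assumptions: a node with a single leaf child becomes that child's name, a node with a single non-leaf child becomes a one-key dict, and otherwise the children form a list with leaves first. Paths that are prefixes of longer paths (e.g. 'A:B' next to 'A:B:C') are not distinguished. The function names are hypothetical:

```python
import json

def build_trie(lines, sep=':'):
    # Merge every path into nested dicts; shared prefixes share a node
    trie = {}
    for line in lines:
        node = trie
        for part in line.split(sep):
            node = node.setdefault(part, {})
    return trie

def render(children):
    # Leaves become plain names, inner nodes become {name: rendered subtree}
    items = [name if not sub else {name: render(sub)} for name, sub in children.items()]
    # Stable sort puts leaf names before dicts, matching the desired output
    items.sort(key=lambda item: not isinstance(item, str))
    # A single child is returned as-is instead of a one-element list
    return items[0] if len(items) == 1 else items

def parse(lines):
    return {name: render(sub) for name, sub in build_trie(lines).items()}

x = ['A:B:C:D:E', 'A:B:D', 'A:C:E:F', 'B:D:E', 'B:C', 'A:C:F']
print(json.dumps(parse(x), indent=4))
```

Unlike the groupby version, the top level here is a dict, as in the object shown in the question.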