How do I pretty-print a JSON file in Python?
Use the indent= parameter of json.dump() or json.dumps() to specify how many spaces to indent by:
>>> import json
>>> your_json = '["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> parsed = json.loads(your_json)
>>> print(json.dumps(parsed, indent=4))
[
    "foo",
    {
        "bar": [
            "baz",
            null,
            1.0,
            2
        ]
    }
]
To parse a file, use json.load():
with open('filename.txt', 'r') as handle:
    parsed = json.load(handle)
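The two steps above (json.load to read, indent= to format) can be combined to rewrite a file in pretty-printed form. The following is only a minimal sketch: the sample data and the file names compact.json and pretty.json are made up for illustration, and a temporary directory is used so that nothing on disk is touched.

```python
import json
import os
import tempfile

# Hypothetical sample data and file names, used only for this illustration.
sample = {"name": "café", "tags": ["a", "b"], "nested": {"x": 1}}

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "compact.json")
    dst = os.path.join(tmp, "pretty.json")

    # Write a compact file to stand in for the input.
    with open(src, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)

    # Read it back and write an indented copy.
    with open(src, "r", encoding="utf-8") as handle:
        parsed = json.load(handle)
    with open(dst, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        json.dump(parsed, handle, indent=4, ensure_ascii=False)

    with open(dst, "r", encoding="utf-8") as handle:
        pretty_text = handle.read()

print(pretty_text)
```

Passing ensure_ascii=False is optional; without it, non-ASCII characters are written as \uXXXX escapes, which is still valid JSON.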
You can do this on the command line:
python3 -m json.tool some.json
(as already mentioned in the comments on the question; thanks to @Kai Petzke for the python3 suggestion).
Actually, Python is not my favourite tool for JSON processing on the command line. For simple pretty-printing it is fine, but manipulating the JSON quickly becomes overcomplicated: you soon need a separate script file, and in Python 2 you can end up with maps whose keys are u"some-key" (Python unicode strings), which makes selecting fields harder and does nothing for pretty-printing.
You can also use jq:
jq . some.json
and you get colors as a bonus (and it is far easier to extend).
Addendum: There is some confusion in the comments about using jq to process large JSON files on the one hand, and having a very large jq program on the other. For pretty-printing a file consisting of a single large JSON entity, the practical limitation is RAM. For pretty-printing a 2GB file consisting of a single array of real-world data, the "maximum resident set size" required for pretty-printing was 5GB (whether using jq 1.5 or 1.6). Note also that jq can be used from within python after pip install jq.
After reading the data with the json standard library module, use the pprint standard library module to display the parsed data. Example:
import json
import pprint
json_data = None
with open('file_name.txt', 'r') as f:
    data = f.read()
    json_data = json.loads(data)
pprint.pprint(json_data)
The output will look like:
{'address': {'city': 'New York',
             'postalCode': '10021-3100',
             'state': 'NY',
             'streetAddress': '21 2nd Street'},
 'age': 27,
 'children': [],
 'firstName': 'John',
 'isAlive': True,
 'lastName': 'Smith'}
Note that this output is not valid JSON; while it shows the content of the Python data structure with nice formatting, it uses Python syntax to do so. In particular, strings are (usually) enclosed in single quotes, whereas JSON requires double quotes, and True and None appear where JSON expects true and null. Swapping the quotes in pprint.pformat output is therefore not a reliable way to get JSON back. To write the data to a JSON file, use json.dump with indent:
with open('file_name.json', 'w') as f:
    json.dump(json_data, f, indent=4)
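To see why swapping quotes in pprint.pformat output does not reliably give JSON, here is a small check. The record is a made-up example, chosen to contain a boolean, a null and an apostrophe; is_valid_json is a helper written just for this sketch.

```python
import json
import pprint

# Hypothetical record with a boolean, a null and an apostrophe in a string.
record = {"name": "O'Brien", "isAlive": True, "spouse": None}

as_python = pprint.pformat(record)      # Python literal syntax
as_json = json.dumps(record, indent=4)  # JSON syntax

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json(as_python))
print(is_valid_json(as_python.replace("'", '"')))
print(is_valid_json(as_json))
```

Only the json.dumps output passes the check; the quote-swapped text still contains True and None, and the apostrophe breaks the string.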
Pygmentize is a powerful tool for coloring the output of terminal commands.
Here is an example of using it to add syntax highlighting to the json.tool output:
echo '{"foo": "bar"}' | python -m json.tool | pygmentize -l json
In a previous Stack Overflow answer, I show in detail how to install and use pygmentize.
Use this function and don't sweat having to remember if your JSON is a str or dict again - just look at the pretty print:
import json
def pp_json(json_thing, sort=True, indents=4):
    if isinstance(json_thing, str):
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None
pp_json(your_json_string_or_dict)
Use pprint: https://docs.python.org/3.6/library/pprint.html
import pprint
pprint.pprint(json)
print() compared to pprint.pprint()
print(json)
{'feed': {'title': 'W3Schools Home Page', 'title_detail': {'type': 'text/plain', 'language': None, 'base': '', 'value': 'W3Schools Home Page'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.w3schools.com'}], 'link': 'https://www.w3schools.com', 'subtitle': 'Free web building tutorials', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': '', 'value': 'Free web building tutorials'}}, 'entries': [], 'bozo': 0, 'encoding': 'utf-8', 'version': 'rss20', 'namespaces': {}}
pprint.pprint(json)
{'bozo': 0,
 'encoding': 'utf-8',
 'entries': [],
 'feed': {'link': 'https://www.w3schools.com',
          'links': [{'href': 'https://www.w3schools.com',
                     'rel': 'alternate',
                     'type': 'text/html'}],
          'subtitle': 'Free web building tutorials',
          'subtitle_detail': {'base': '',
                              'language': None,
                              'type': 'text/html',
                              'value': 'Free web building tutorials'},
          'title': 'W3Schools Home Page',
          'title_detail': {'base': '',
                           'language': None,
                           'type': 'text/plain',
                           'value': 'W3Schools Home Page'}},
 'namespaces': {},
 'version': 'rss20'}
To be able to pretty print from the command line and be able to have control over the indentation etc. you can set up an alias similar to this:
alias jsonpp="python3 -c 'import sys, json; print(json.dumps(json.load(sys.stdin), sort_keys=True, indent=2))'"
And then use the alias in one of these ways:
cat myfile.json | jsonpp
jsonpp < myfile.json
You could try pprintjson.
Installation
$ pip3 install pprintjson
Usage
Pretty print JSON from a file using the pprintjson CLI.
$ pprintjson "./path/to/file.json"
Pretty print JSON from stdin using the pprintjson CLI.
$ echo '{ "a": 1, "b": "string", "c": true }' | pprintjson
Pretty print JSON from a string using the pprintjson CLI.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }'
Pretty print JSON from a string with an indent of 1.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -i 1
Pretty print JSON from a string and save output to a file output.json.
$ pprintjson -c '{ "a": 1, "b": "string", "c": true }' -o ./output.json
import json
def saveJson(data, fileToSave):
    # Open the target path for writing and dump the data with indentation
    with open(fileToSave, 'w') as f:
        json.dump(data, f, ensure_ascii=True, indent=4, sort_keys=True)
It works to display or save it to a file.
Here's a simple example of pretty printing JSON to the console in a nice way in Python, without requiring the JSON to be on your computer as a local file:
import pprint
import json
from urllib.request import urlopen # (Only used to get this example)
# Getting a JSON example for this example
r = urlopen("https://mdn.github.io/fetch-examples/fetch-json/products.json")
text = r.read()
# To print it
pprint.pprint(json.loads(text))
TL;DR: many ways, also consider print(yaml.dump(j, sort_keys=False))
For most uses, indent should do it:
print(json.dumps(parsed, indent=2))
A JSON structure is basically a tree structure.
While trying to find something fancier, I came across this nice post depicting other forms of nice trees that might be interesting: https://blog.ouseful.info/2021/07/13/exploring-the-hierarchical-structure-of-dataframes-and-csv-data/.
It has some interactive trees and even comes with some code, including a collapsing tree from SO.
Other samples include using plotly. Here is the code example from plotly:
import plotly.express as px
fig = px.treemap(
names = ["Eve","Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
parents = ["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve"]
)
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
And using treelib. On that note, this GitHub repository also provides nice visualizations. Here is one example using treelib:
#%pip install treelib
from treelib import Tree
country_tree = Tree()
# Create a root node
country_tree.create_node("Country", "countries")
# Group by country
for country, regions in wards_df.head(5).groupby(["CTRY17NM", "CTRY17CD"]):
    # Generate a node for each country
    country_tree.create_node(country[0], country[1], parent="countries")
    # Group by region
    for region, las in regions.groupby(["GOR10NM", "GOR10CD"]):
        # Generate a node for each region
        country_tree.create_node(region[0], region[1], parent=country[1])
        # Group by local authority
        for la, wards in las.groupby(['LAD17NM', 'LAD17CD']):
            # Create a node for each local authority
            country_tree.create_node(la[0], la[1], parent=region[1])
            for ward, _ in wards.groupby(['WD17NM', 'WD17CD']):
                # Create a leaf node for each ward
                country_tree.create_node(ward[0], ward[1], parent=la[1])
# Output the hierarchical data
country_tree.show()
I have, based on this, created a function to convert json to a tree:
from treelib import Node, Tree, node
def create_node(tree, s, counter_byref, verbose, parent_id=None):
    node_id = counter_byref[0]
    if verbose:
        print(f"tree.create_node({s}, {node_id}, parent={parent_id})")
    tree.create_node(s, node_id, parent=parent_id)
    counter_byref[0] += 1
    return node_id
def to_compact_string(o):
    if type(o) == dict:
        if len(o)>1:
            raise Exception()
        k,v =next(iter(o.items()))
        return f'{k}:{to_compact_string(v)}'
    elif type(o) == list:
        if len(o)>1:
            raise Exception()
        return f'[{to_compact_string(next(iter(o)))}]'
    else:
        return str(o)
def to_compact(tree, o, counter_byref, verbose, parent_id):
    try:
        s = to_compact_string(o)
        if verbose:
            print(f"# to_compact({o}) ==> [{s}]")
        create_node(tree, s, counter_byref, verbose, parent_id=parent_id)
        return True
    except:
        return False
def json_2_tree(o , parent_id=None, tree=None, counter_byref=[0], verbose=False, compact_single_dict=False, listsNodeSymbol='+'):
    if tree is None:
        tree = Tree()
        parent_id = create_node(tree, '+', counter_byref, verbose)
    if compact_single_dict and to_compact(tree, o, counter_byref, verbose, parent_id):
        # no need to do more, inserted as a single node
        pass
    elif type(o) == dict:
        for k,v in o.items():
            if compact_single_dict and to_compact(tree, {k:v}, counter_byref, verbose, parent_id):
                # no need to do more, inserted as a single node
                continue
            key_nd_id = create_node(tree, str(k), counter_byref, verbose, parent_id=parent_id)
            if verbose:
                print(f"# json_2_tree({v})")
            json_2_tree(v , parent_id=key_nd_id, tree=tree, counter_byref=counter_byref, verbose=verbose, listsNodeSymbol=listsNodeSymbol, compact_single_dict=compact_single_dict)
    elif type(o) == list:
        if listsNodeSymbol is not None:
            parent_id = create_node(tree, listsNodeSymbol, counter_byref, verbose, parent_id=parent_id)
        for i in o:
            if compact_single_dict and to_compact(tree, i, counter_byref, verbose, parent_id):
                # no need to do more, inserted as a single node
                continue
            if verbose:
                print(f"# json_2_tree({i})")
            json_2_tree(i , parent_id=parent_id, tree=tree, counter_byref=counter_byref, verbose=verbose,listsNodeSymbol=listsNodeSymbol, compact_single_dict=compact_single_dict)
    else: #node
        create_node(tree, str(o), counter_byref, verbose, parent_id=parent_id)
    return tree
Then for example:
import json
j = json.loads('{"2": 3, "4": [5, 6], "7": {"8": 9}}')
json_2_tree(j ,verbose=False,listsNodeSymbol='+' ).show()
gives:
+
├── 2
│   └── 3
├── 4
│   └── +
│       ├── 5
│       └── 6
└── 7
    └── 8
        └── 9
While
json_2_tree(j ,listsNodeSymbol=None, verbose=False ).show()
+
├── 2
│   └── 3
├── 4
│   ├── 5
│   └── 6
└── 7
    └── 8
        └── 9
And
json_2_tree(j ,compact_single_dict=True,listsNodeSymbol=None).show()
+
├── 2:3
├── 4
│   ├── 5
│   └── 6
└── 7:8:9
As you see, there are different trees one can make, depending on how explicit vs. compact you want to be.
One of my favorites, and one of the most compact ones might be using yaml:
import json
import yaml
j = json.loads('{"2": "3", "4": ["5", "6"], "7": {"8": "9"}}')
print(yaml.dump(j, sort_keys=False))
Gives the compact and unambiguous:
'2': '3'
'4':
- '5'
- '6'
'7':
  '8': '9'
I think it's better to parse the JSON first, to avoid errors:
import json
def format_response(response):
    try:
        parsed = json.loads(response.text)
    except json.JSONDecodeError:
        # Not JSON: return the raw text unchanged
        return response.text
    return json.dumps(parsed, ensure_ascii=True, indent=4)
I had a similar requirement to dump the contents of a JSON file for logging, something quick and easy:
import json, os
print(json.dumps(json.load(open(os.path.join('<myPath>', '<myjson>'), "r")), indent = 4 ))
If you use it often, put it in a function:
def pp_json_file(path, file):
    print(json.dumps(json.load(open(os.path.join(path, file), "r")), indent = 4))
A very simple way is to use rich. With this method you can also highlight the JSON.
This example reads data from a JSON file called config.json:
import json
from rich import print_json
with open('config.json') as setup_type:
    data = json.load(setup_type)
print_json(data=data)
It's far from perfect, but it does the job.
data = data.replace(',"',',\n"')
You can improve it, add indenting and so on, but if you just want to be able to read cleaner JSON, this is the way to go.
I'm trying to convert XML to JSON in Python using the xmltodict library. The XML is converted, but every key in the dict gets an '@' prefix. Below is the code snippet and sample output:
import xmltodict
import json
with open('response.xml','r') as res_file:
    doc = xmltodict.parse(res_file.read())
xml_json_str = json.dumps(doc)
final_json = json.loads(xml_json_str)
Output:
"CustomerInfo": {
    "@address": "Bangalore, Karnataka 560034",
    "@email": "abc@gmail.com",
    "@name": "Sam"
}
How can I remove the @ from all keys in one go?
Finally I found a solution which works like a charm. While parsing the XML, set attr_prefix='' to remove the @ from all keys.
The changes below worked for me:
with open('response.xml','r') as res_file:
    doc = xmltodict.parse(res_file.read(), attr_prefix='')
Check this out:
It will remove the @ from all keys, in any node. I have added one extra node just to show you the example:
def removeAtTheRate(jsonFile,final_json_edited):
    if jsonFile != {} and type(jsonFile) == dict:
        for i in jsonFile.keys():
            final_json_values = {}
            for j in jsonFile[i]:
                if j[:1] == '@':
                    final_json_values[j[1:]] = jsonFile[i][j]
            if i[:1] == '@':
                final_json_edited[i[1:]] = final_json_values
            else:
                final_json_edited[i] = final_json_values
    print(final_json_edited)
doc = {"@CustomerInfo":{"@address": "Bangalore, Karnataka 560034","@email": "abc@gmail.com","@name": "Sam"},"Location":{"@Loc":"Mum"}}
removeAtTheRate(doc,{})
Result:
>> {'Location': {'Loc': 'Mum'}, 'CustomerInfo': {'name': 'Sam', 'address': 'Bangalore, Karnataka 560034', 'email': 'abc@gmail.com'}}
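The function above only looks two levels deep. If the data is already parsed and re-parsing with attr_prefix='' is not an option, a recursive walk handles any depth, including lists. This is just a sketch: strip_key_prefix is a made-up helper name, the sample document is invented, and "@" is passed explicitly because that is xmltodict's default attribute prefix.

```python
def strip_key_prefix(node, prefix):
    """Return a copy of node with prefix removed from the start of every dict key."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            # Only string keys that actually start with the prefix are renamed.
            if isinstance(key, str) and key.startswith(prefix):
                key = key[len(prefix):]
            cleaned[key] = strip_key_prefix(value, prefix)
        return cleaned
    if isinstance(node, list):
        return [strip_key_prefix(item, prefix) for item in node]
    # Strings, numbers, None: values are left untouched.
    return node

# Hypothetical document with prefixed keys at several depths.
doc = {
    "@CustomerInfo": {
        "@address": "Bangalore, Karnataka 560034",
        "orders": [{"@id": "1"}, {"@id": "2"}],
    },
    "Location": {"@Loc": "Mum"},
}
cleaned_doc = strip_key_prefix(doc, "@")
print(cleaned_doc)
```

Only keys are changed; values that happen to contain the prefix character (an email address, for instance) are left alone.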
I have a pretty big dictionary which looks like this:
{
'startIndex': 1,
'username': 'myemail@gmail.com',
'items': [{
'id': '67022006',
'name': 'Adopt-a-Hydrant',
'kind': 'analytics#accountSummary',
'webProperties': [{
'id': 'UA-67522226-1',
'name': 'Adopt-a-Hydrant',
'websiteUrl': 'https://www.udemy.com/',
'internalWebPropertyId': '104343473',
'profiles': [{
'id': '108333146',
'name': 'Adopt a Hydrant (Udemy)',
'type': 'WEB',
'kind': 'analytics#profileSummary'
}, {
'id': '132099908',
'name': 'Unfiltered view',
'type': 'WEB',
'kind': 'analytics#profileSummary'
}],
'level': 'STANDARD',
'kind': 'analytics#webPropertySummary'
}]
}, {
'id': '44222959',
'name': 'A223n',
'kind': 'analytics#accountSummary',
And so on....
When I copy this dictionary into my Jupyter notebook and run the exact same function I run in my Django code, it runs as expected. Everything is literally the same: in my Django code I even print the dictionary out, then copy it to the notebook, run it, and get what I'm expecting.
Just for more info this is the function:
google_profile = gp.google_profile # Get google_profile from DB
print(google_profile)
all_properties = []
for properties in google_profile['items']:
    all_properties.append(properties)
site_selection=[]
for single_property in all_properties:
    single_propery_name=single_property['name']
    for single_view in single_property['webProperties'][0]['profiles']:
        single_view_id = single_view['id']
        single_view_name = (single_view['name'])
        selections = single_propery_name + ' (View: '+single_view_name+' ID: '+single_view_id+')'
        site_selection.append(selections)
print (site_selection)
So my guess is that my notebook has some sort of JSON parser installed or something like that. Is that possible? Why can't I access dictionaries in Django the same way I can in my IPython notebooks?
EDITS
More info:
The error is at the line: for properties in google_profile['items']:
Django debug is: TypeError at /gconnect/ string indices must be integers
Local Vars are:
all_properties =[]
current_user = 'myemail@gmail.com'
google_profile = `the above dictionary`
So, just to make it clear for whoever finds this question:
If you save a dictionary in a database, Django will save it as a string, so you won't be able to index it like a dictionary afterwards.
To solve this you can convert it back to a dictionary.
The answer from this post worked perfectly for me; in other words:
import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}
There are many ways to convert a string to a dictionary, this is only one. If you stumbled in this problem you can quickly check if it's a string instead of a dictionary with:
print(type(var))
In my case I had:
<class 'str'>
before converting it with the above method and then I got
<class 'dict'>
and everything worked as expected.
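The quote-replacement trick above breaks as soon as the stored string contains True, None or an apostrophe inside a value. If the string is the Python repr of a dictionary (which is what printing a dict produces), the standard library's ast.literal_eval may be a more robust way to turn it back into a dict. A minimal sketch, with a made-up stored value:

```python
import ast

# A dict that was stored as its Python repr (hypothetical example value).
stored = "{'username': \"O'Neil\", 'active': True, 'items': [{'id': '67022006'}]}"

# literal_eval only parses Python literals (dicts, lists, strings, numbers,
# True/False/None); it does not execute arbitrary code.
restored = ast.literal_eval(stored)

print(type(restored))
print(restored['items'][0]['id'])
```

Where you control how the data is written, serialising with json.dumps before saving and json.loads after loading avoids the problem altogether.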
I have around 10 EBS volumes attached to a single instance. Below is the lsblk output for some of them. We can't simply mount xvdf or xvdp to some location; the actual mount targets are the partitions xvdf1, xvdf2 and xvdp1. I want a script that lets me iterate through all the partitions under xvdf, xvdp, etc. using Python. I'm new to Python.
[root@ip-172-31-1-65 ec2-user]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvdf    202:80   0   35G  0 disk
├─xvdf1 202:81   0  350M  0 part
└─xvdf2 202:82   0 34.7G  0 part
xvdp    202:0    0    8G  0 disk
└─xvdp1 202:1    0    8G  0 part
If you have a relatively new lsblk, you can easily import its JSON output into a Python dictionary, which then opens up all possibilities for iteration.
# lsblk --version
lsblk from util-linux 2.28.2
For example, you could run the following command to gather all block devices and their children with their name and mount point. Use --help to get a list of all supported columns.
# lsblk --json -o NAME,MOUNTPOINT
{
   "blockdevices": [
      {"name": "vda", "mountpoint": null,
         "children": [
            {"name": "vda1", "mountpoint": null,
               "children": [
                  {"name": "pv-root", "mountpoint": "/"},
                  {"name": "pv-var", "mountpoint": "/var"},
                  {"name": "pv-swap", "mountpoint": "[SWAP]"}
               ]
            }
         ]
      }
   ]
}
So you just have to pipe that output into a file and use python's json parser. Or run the command straight within your script as the example below shows:
#!/usr/bin/python3.7
import json
import subprocess
process = subprocess.run("/usr/bin/lsblk --json -o NAME,MOUNTPOINT".split(),
capture_output=True, text=True)
# blockdevices is a dictionary with all the info from lsblk.
# Manipulate it as you wish.
blockdevices = json.loads(process.stdout)
print(json.dumps(blockdevices, indent=4))
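Once the lsblk output is a dictionary, iterating over the partitions is a matter of walking the nested "children" lists. The sketch below uses a hand-written sample string shaped like the lsblk --json output, so it runs without lsblk; the device names and the /data mount point are invented, and iter_leaf_devices is a helper name made up for this example.

```python
import json

# Sample text shaped like `lsblk --json -o NAME,MOUNTPOINT` output (made up).
sample_output = '''
{"blockdevices": [
  {"name": "xvdf", "mountpoint": null,
   "children": [
     {"name": "xvdf1", "mountpoint": null},
     {"name": "xvdf2", "mountpoint": null}
   ]},
  {"name": "xvdp", "mountpoint": null,
   "children": [
     {"name": "xvdp1", "mountpoint": "/data"}
   ]}
]}
'''

def iter_leaf_devices(devices):
    """Yield (name, mountpoint) for every device that has no children."""
    for device in devices:
        children = device.get("children")
        if children:
            # Descend into partitions, LVM volumes, etc.
            yield from iter_leaf_devices(children)
        else:
            yield device["name"], device.get("mountpoint")

blockdevices = json.loads(sample_output)
leaves = list(iter_leaf_devices(blockdevices["blockdevices"]))
for name, mountpoint in leaves:
    print(name, mountpoint)
```

In a real script, sample_output would be replaced by process.stdout from the subprocess call shown above.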
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import sys
def parse(file_name):
    result = []
    with open(file_name) as input_file:
        for line in input_file:
            temp_arr = line.split(' ')
            for item in temp_arr:
                # Partition names are prefixed with tree-drawing characters
                if '└─' in item or '├─' in item:
                    result.append(item.replace('└─','').replace('├─',''))
    return result
def main(argv):
    if len(argv) != 1:
        print('Usage: ./parse.py input_file')
        return
    result = parse(argv[0])
    print(result)
if __name__ == "__main__":
    main(sys.argv[1:])
The above is what you need. You can modify it to parse the output of lsblk better.
Usage:
1. Save the output of lsblk to a file.
E.g. run this command: lsblk > output.txt
2. python parse.py output.txt
I remixed minhhn2910's answer for my own purposes to work with encrypted partitions, labels and build the output in a tree-like dict object. I'll probably keep a more updated version as I hit edge-cases on GitHub, but here is the basic code:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import sys
import re
import pprint
def parse_blk(blk_filename):
    with open(blk_filename, encoding='utf-8') as blk_file:
        disks = []
        for line in blk_file:
            if line.startswith('NAME'): # skip first line
                continue
            blk_list = re.split(r'\s+', line)
            node_type = blk_list[5]
            node_size = blk_list[3]
            if node_type in set(['disk', 'loop']):
                # new disk
                disk = {'name': blk_list[0], 'type': node_type, 'size': node_size}
                if node_type == 'disk':
                    disk['partitions'] = []
                disks.append(disk)
                # get size info if relevant
                continue
            if node_type in set(['part', 'dm']):
                # new partition (or whatever dm is)
                # the name follows the tree-drawing prefix, e.g. '├─xvdf1'
                node_name = blk_list[0].split('─')[1]
                partition = {'name': node_name, 'type': node_type, 'size': node_size}
                disk['partitions'].append(partition)
                continue
            if len(blk_list) > 8: # if node_type == 'crypt':
                # crypt belonging to a partition
                node_name = blk_list[1].split('─')[1]
                partition['crypt'] = node_name
    return disks
def main(argv):
    if len(argv) != 1:
        print('Usage: ./parse.py blk_filename')
        return
    result = parse_blk(argv[0])
    pprint.PrettyPrinter(indent=4).pprint(result)
if __name__ == "__main__":
    main(sys.argv[1:])
It works for your output as well:
$ python check_partitions.py blkout2.txt
[   {   'name': 'xvdf',
        'partitions': [   {   'name': 'xvdf1', 'size': '350M', 'type': 'part'},
                          {   'name': 'xvdf2', 'size': '34.7G', 'type': 'part'}],
        'size': '35G',
        'type': 'disk'},
    {   'name': 'xvdp',
        'partitions': [{   'name': 'xvdp1', 'size': '8G', 'type': 'part'}],
        'size': '8G',
        'type': 'disk'}]
This is how it works on a slightly more complicated scenario with docker loopback devices and encrypted partitions.
$ python check_partitions.py blkout.txt
[   {   'name': 'sda',
        'partitions': [   {   'crypt': 'cloudfleet-swap',
                              'name': 'sda1',
                              'size': '2G',
                              'type': 'part'},
                          {   'crypt': 'cloudfleet-storage',
                              'name': 'sda2',
                              'size': '27.7G',
                              'type': 'part'}],
        'size': '29.7G',
        'type': 'disk'},
    {   'name': 'loop0', 'size': '100G', 'type': 'loop'},
    {   'name': 'loop1', 'size': '2G', 'type': 'loop'},
    {   'name': 'mmcblk0',
        'partitions': [{   'name': 'mmcblk0p1',
                           'size': '7.4G',
                           'type': 'part'}],
        'size': '7.4G',
        'type': 'disk'}]
I am parsing JSON that stores various code snippets and I am first building a dictionary of languages used by these snippets:
snippets = {'python': {}, 'text': {}, 'php': {}, 'js': {}}
Then, when looping through the JSON, I want to add the information about each snippet, as its own dictionary, to the dictionary listed above. For example, if I had a JS snippet, the end result would be:
snippets = {'js':
{"title":"Script 1","code":"code here", "id":"123456"}
{"title":"Script 2","code":"code here", "id":"123457"}
}
Not to muddy the waters, but in PHP, working on a multi-dimensional array, I would just do the following (I am looking for something similar):
snippets['js'][] = array here
I know I saw one or two people talking about how to create a multidimensional dictionary - but can't seem to track down adding a dictionary to a dictionary within python. Thanks for the help.
This is called autovivification.
You can do it with defaultdict:
import collections
def tree():
    return collections.defaultdict(tree)
d = tree()
d['js']['title'] = 'Script1'
If the idea is to have lists, you can do:
d = collections.defaultdict(list)
d['js'].append({'foo': 'bar'})
d['js'].append({'other': 'thing'})
The idea of defaultdict is to create the element automatically when the key is first accessed. BTW, for this simple case, you can simply do:
d = {}
d['js'] = [{'foo': 'bar'}, {'other': 'thing'}]
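Applied to the question's use case, the defaultdict(list) approach might look like the sketch below. The records list and its "lang" field are assumptions made for illustration, since the question does not show the shape of the parsed JSON.

```python
import collections

# Hypothetical parsed JSON records; the field names are assumptions.
records = [
    {"lang": "js", "title": "Script 1", "code": "code here", "id": "123456"},
    {"lang": "js", "title": "Script 2", "code": "code here", "id": "123457"},
    {"lang": "php", "title": "Script 3", "code": "code here", "id": "123458"},
]

snippets = collections.defaultdict(list)
for record in records:
    # Keep everything except the language in the per-language entry.
    entry = {key: value for key, value in record.items() if key != "lang"}
    # The list for a new language is created on first access.
    snippets[record["lang"]].append(entry)

print(dict(snippets))
```

This mirrors PHP's snippets['js'][] = ... idiom: no language has to be declared up front.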
From
snippets = {'js':
{"title":"Script 1","code":"code here", "id":"123456"}
{"title":"Script 2","code":"code here", "id":"123457"}
}
It looks to me like you want a list of dictionaries. Here is some Python code that should result in what you want:
snippets = {'python': [], 'text': [], 'php': [], 'js': []}
snippets['js'].append({"title":"Script 1","code":"code here", "id":"123456"})
snippets['js'].append({"title":"Script 2","code":"code here", "id":"123457"})
print(snippets['js']) #[{'title': 'Script 1', 'code': 'code here', 'id': '123456'}, {'title': 'Script 2', 'code': 'code here', 'id': '123457'}]
Does that make it clear?
Is there a way to define an XPath-type query for nested Python dictionaries?
Something like this:
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print( foo.select("/morefoo/morebar") )
>> {'bacon' : 'foobar'}
I also needed to select nested lists ;)
This can be done easily with @jellybean's solution:
def xpath_get(mydict, path):
    elem = mydict
    try:
        for x in path.strip("/").split("/"):
            try:
                # Numeric path steps are treated as list indices
                x = int(x)
                elem = elem[x]
            except ValueError:
                elem = elem.get(x)
    except:
        pass
    return elem
foo = {
'spam':'eggs',
'morefoo': [{
'bar':'soap',
'morebar': {
'bacon' : {
'bla':'balbla'
}
}
},
'bla'
]
}
print(xpath_get(foo, "/morefoo/0/morebar/bacon"))
[EDIT 2016] This question and the accepted answer are ancient. The newer answers may do the job better than the original answer. However I did not test them so I won't change the accepted answer.
One of the best libraries I've been able to identify, which, in addition, is very actively developed, is an extracted project from boto: JMESPath. It has a very powerful syntax of doing things that would normally take pages of code to express.
Here are some examples:
search('foo | bar', {"foo": {"bar": "baz"}}) -> "baz"
search('foo[*].bar | [0]', {
"foo": [{"bar": ["first1", "second1"]},
{"bar": ["first2", "second2"]}]}) -> ["first1", "second1"]
search('foo | [0]', {"foo": [0, 1, 2]}) -> 0
There is an easier way to do this now.
http://github.com/akesterson/dpath-python
$ pip install dpath
>>> import dpath.util
>>> dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar")
... done. Or if you don't like getting your results back in a view (merged dictionary that retains the paths), yield them instead:
>>> for (path, value) in dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar", yielded=True):
... and done. 'value' will hold {'bacon': 'foobar'} in that case.
Not exactly beautiful, but you might use something like
def xpath_get(mydict, path):
    elem = mydict
    try:
        for x in path.strip("/").split("/"):
            elem = elem.get(x)
    except:
        pass
    return elem
This doesn't support xpath stuff like indices, of course ... not to mention the / key trap unutbu indicated.
There is the newer jsonpath-rw library supporting a JSONPATH syntax but for python dictionaries and arrays, as you wished.
So your 1st example becomes:
from jsonpath_rw import parse
print( parse('$.morefoo.morebar').find(foo) )
And the 2nd:
print( parse("$.morefoo[0].morebar.bacon").find(foo) )
PS: An alternative simpler library also supporting dictionaries is python-json-pointer with a more XPath-like syntax.
dict > jmespath
You can use JMESPath which is a query language for JSON, and which has a python implementation.
import jmespath # pip install jmespath
data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}
jmespath.search('root.section.item2', data)
Out[42]: 'value2'
The jmespath query syntax and live examples: http://jmespath.org/tutorial.html
dict > xml > xpath
Another option would be converting your dictionaries to XML using something like dicttoxml and then use regular XPath expressions e.g. via lxml or whatever other library you prefer.
from dicttoxml import dicttoxml # pip install dicttoxml
from lxml import etree # pip install lxml
data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}
xml_data = dicttoxml(data, attr_type=False)
Out[43]: b'<?xml version="1.0" encoding="UTF-8" ?><root><root><section><item1>value1</item1><item2>value2</item2></section></root></root>'
tree = etree.fromstring(xml_data)
tree.xpath('//item2/text()')
Out[44]: ['value2']
Json Pointer
Yet another option is Json Pointer which is an IETF spec that has a python implementation:
https://github.com/stefankoegl/python-json-pointer
From the jsonpointer-python tutorial:
from jsonpointer import resolve_pointer
obj = {"foo": {"anArray": [ {"prop": 44}], "another prop": {"baz": "A string" }}}
resolve_pointer(obj, '') == obj
# True
resolve_pointer(obj, '/foo/another%20prop/baz') == obj['foo']['another prop']['baz']
# True
resolve_pointer(obj, '/foo/anArray/0') == obj['foo']['anArray'][0]
# True
If terseness is your fancy:
from functools import reduce
def xpath(root, path, sch='/'):
    return reduce(lambda acc, nxt: acc[nxt],
                  [int(x) if x.isdigit() else x for x in path.split(sch)],
                  root)
Of course, if you only have dicts, then it's simpler:
def xpath(root, path, sch='/'):
    return reduce(lambda acc, nxt: acc[nxt],
                  path.split(sch),
                  root)
Good luck finding any errors in your path spec tho ;-)
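If debuggability matters more than terseness, a plain loop can report which step of the path failed. This is only a sketch of one possible variant, not a replacement for the libraries mentioned elsewhere; xpath_strict is a made-up name.

```python
def xpath_strict(root, path, sep='/'):
    """Walk root along path; raise KeyError naming the step that failed."""
    current = root
    walked = []
    for part in path.strip(sep).split(sep):
        walked.append(part)
        # All-digit steps are used as indices only when the current node is a list.
        key = int(part) if isinstance(current, list) and part.isdigit() else part
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as exc:
            raise KeyError(
                f"cannot resolve {sep.join(walked)!r} in {path!r}"
            ) from exc
    return current

foo = {
    'spam': 'eggs',
    'morefoo': [{'bar': 'soap', 'morebar': {'bacon': 'foobar'}}, 'bla'],
}
print(xpath_strict(foo, '/morefoo/0/morebar/bacon'))
```

Because digits are converted only when the current node is a list, a dictionary key such as '2' still works as a string key.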
Another alternative (besides that suggested by jellybean) is this:
def querydict(d, q):
    keys = q.split('/')
    nd = d
    for k in keys:
        if k == '':
            continue
        if k in nd:
            nd = nd[k]
        else:
            return None
    return nd
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print(querydict(foo, "/morefoo/morebar"))
More work would have to be put into how the XPath-like selector would work.
'/' is a valid dictionary key, so how would
foo={'/':{'/':'eggs'},'//':'ham'}
be handled?
foo.select("///")
would be ambiguous.
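One way to sidestep the delimiter ambiguity is to pass the path as a sequence of keys rather than a single string, so that '/' is just another key. A minimal sketch; get_path is a hypothetical helper, not part of any library mentioned here.

```python
def get_path(data, keys, default=None):
    """Follow a sequence of keys/indices through nested dicts and lists."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a non-container on the way.
            return default
    return current

foo = {'/': {'/': 'eggs'}, '//': 'ham'}
print(get_path(foo, ['/', '/']))
print(get_path(foo, ['//']))
```

The cost is a less compact call syntax than a single XPath-like string.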
Is there any reason you need to query it with an XPath-like pattern? As the commenter on your question suggested, it is just a dictionary, so you can access the elements in a nested manner. Also, if the data is in JSON form, you can use the simplejson module to load it and access the elements too.
There is this project, JSONPATH, which tries to help people do the opposite of what you intend (given an XPath, make it easily accessible via Python objects), which seems more useful.
def Dict(var, *arg, **kwarg):
    """ Return the value of a (nested) dictionary if all fields exist, else return "" unless "default=new_value" is specified as the last argument.
        Avoids TypeError: argument of type 'NoneType' is not iterable
        Ex: Dict(variable_dict, 'field1', 'field2', default = 0)
    """
    for key in arg:
        if isinstance(var, dict) and key and key in var: var = var[key]
        else: return kwarg['default'] if kwarg and 'default' in kwarg else ""  # Allow Dict(var, tvdbid).isdigit() for example
    return kwarg['default'] if var in (None, '', 'N/A', 'null') and kwarg and 'default' in kwarg else "" if var in (None, '', 'N/A', 'null') else var
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print(Dict(foo, 'morefoo', 'morebar'))
print(Dict(foo, 'morefoo', 'morebar', default=None))
Have a SaveDict(value, var, *arg) function that can even append to lists in dict...
I took this from the link.
The following code implements XPath-style lookup on JSON in Python:
import json
import xmltodict
# Parse the json string
class jsonprase(object):
    def __init__(self, json_value):
        try:
            self.json_value = json.loads(json_value)
        except Exception:
            raise ValueError('must be a json str value')
    def find_json_node_by_xpath(self, xpath):
        elem = self.json_value
        nodes = xpath.strip("/").split("/")
        for x in range(len(nodes)):
            try:
                elem = elem.get(nodes[x])
            except AttributeError:
                # elem is a list: collect the key from each item
                elem = [y.get(nodes[x]) for y in elem]
        return elem
    def datalength(self, xpath="/"):
        return len(self.find_json_node_by_xpath(xpath))
    @property
    def json_to_xml(self):
        # Wrap the data in a single root element, as XML requires
        root = {"root": self.json_value}
        return xmltodict.unparse(root, pretty=True)
Test Json :
{
    "responseHeader": {
        "zkConnected": true,
        "status": 0,
        "QTime": 2675,
        "params": {
            "q": "TxnInitTime:[2021-11-01T00:00:00Z TO 2021-11-30T23:59:59Z] AND Status:6",
            "stats": "on",
            "stats.facet": "CountryCode",
            "rows": "0",
            "wt": "json",
            "stats.field": "ItemPrice"
        }
    },
    "response": {
        "numFound": 15162439,
        "start": 0,
        "maxScore": 1.8660598,
        "docs": []
    }
}
Test Code to read the values from above input json.
numFound = jsonprase(ABOVE_INPUT_JSON).find_json_node_by_xpath('/response/numFound')
print(numFound)