Is there a package that converts xmltodict dictionaries to lxml trees?

Is there a package that converts xmltodict dictionaries to lxml trees? - python

The problem I have is this. I've started the XML creation using the dictionary structure used by xmltodict Python package so I can use the unparse method to create the XML. But I think I reached a point where xmltodict can't help me. I have actions in this dictionary format, highly nested each, something like this, just much more complex:
action = {
"#id": 1,
"some-nested-stuff":
{"#attr1": "string value", "child": True}
}
Now I need to group some actions similar to this:
<action id=1>...</action>
<action-group groupId=1>
<action id=2>...</action>
<action id=3>...</action>
</action-group>
<action id=4>...</action>
And yes, the first action needs to go before the action group and the fourth action after it. It seems impossible to do it with just xmltodict. I was thinking that I create the actions' XML tree as an lxml object from these dictionaries, and than I merge those objects into a whole XML. I think that it wouldn't be a big task, but there might be a ready package for that. Is there one?
The alternative solution — that I try to avoid if possible — is to rewrite the project from scratch using just lxml. Or is there a way to create that XML using just xmltodict but not the xml/lxml packages?

It seems that no such package. So far I have this solution. I doesn't handle #text keys and there can be problems with namespaces.
"""
Converts the dictionary used by xmltodict package to represent XMLs
to lxml.
"""
from typing import Dict, Any
from lxml import etree
XmlDictType = Dict[str, Any]
element = etree.Element("for-creating-types")
ElementType = type(element)
ElementTreeType = type(etree.ElementTree(element))
def convert(xml_dict: XmlDictType) -> ElementType:
root_name = list(xml_dict)[0]
inside_dict = xml_dict[root_name]
attrs, children = split_attrs_and_children(inside_dict)
root = etree.Element(root_name, **attrs)
convert_children(root, children)
return root
def split_attrs_and_children(xml_dict: XmlDictType) -> ElementType:
"""Split the categories and fix the types"""
def fix_types(v):
if isinstance(v, (int, float)):
return str(v)
elif isinstance(v, bool):
return {True: "true", False: "false"}[v]
else:
return v
attrs = {k[1:]: fix_types(v) for k, v in xml_dict.items() if k.startswith("#")}
children = {k: fix_types(v) for k, v in xml_dict.items() if not (k.startswith("#") or k.startswith("#"))}
return attrs, children
def convert_children(parent: ElementType, children: XmlDictType) -> ElementType:
for child_name, value in children.items():
if isinstance(value, dict):
attrs, children = split_attrs_and_children(value)
child = etree.SubElement(parent, child_name, **attrs)
convert_children(child, children)
elif isinstance(value, list):
for v in value:
child = etree.SubElement(parent, child_name).text = v
else:
child = etree.SubElement(parent, child_name).text = value
return parent
You can convert for example this dictionary:
xml_dict = {
"mydocument": {
"#has": "an attribute",
"and": {
"many": [
"elements",
"more elements"
]
},
"plus": {
"#a": "complex",
"#text": "element as well"
}
}
}
Note that the #text line is not included yet.

Related

How can I convert/transform a JSON tree structure to a merkle tree

I'm running a web server, where I receive data in JSON format and planning to store it in a NoSQL database. Here is an example:
data_example = {
"key1": "val1",
"key2": [1, 2, 3],
"key3": {
"subkey1": "subval1",
.
.
}
}
I had thoughts about using a Merkle tree to represent my data since JSON is also a tree-like structure.
Essentially, what I want to do is to store my data in (or as) a more secure decentralized tree-like structure. Many entities will have access to create, read, update or delete (CRUD) a record from it. These CRUD operations will ideally need to be verified from other entities in the network, which will also hold a copy of the database. Just like in blockchain.
I'm having a design/concept problem and I'm trying to understand how can I turn my JSON into a Merkle tree structure. This is my Node class:
class Node:
""" class that represents a node in a merkle tree"""
def __init__(data):
self.data = data
self.hash = self.calculate_some_hash() # based on the data or based on its child nodes
I'm interested in the conception/design of this as I couldn't figure out how this can work. Any idea how to save/store my data_example object in a Merkle tree? (is it possible?)

You can create a Merkle Tree by first converting your dictionary to a class object form, and then recursively traverse the tree, hashing the sum of the child node hashes. Since a Merkle Tree requires a single root node, any input dictionaries that have more than one key at the topmost level should become the child dictionary of an empty root node (with a default key of None):
data_example = {
"key1": "val1",
"key2": [1, 2, 3],
"key3": {
"subkey1": "subval1",
"subkey2": "subval2",
"subkey3": "subval3",
}
}
class MTree:
def __init__(self, key, value):
self.key, self.hash = key, None
self.children = value if not isinstance(value, (dict, list)) else self.__class__.build(value, False)
def compute_hashes(self):
#build hashes up from the bottom
if not isinstance(self.children, list):
self.hash = hash(self.children)
else:
self.hash = hash(sum([i.compute_hashes() for i in self.children]))
return self.hash
def update_kv(self, k, v):
#recursively update a value in the tree with an associated key
if self.key == k:
self.children = v
elif isinstance(self.children, list):
_ = [i.update_kv(k, v) for i in self.children]
def update_tree(self, payload):
#update key-value pairs in the tree from payload
for a, b in payload.items():
self.update_kv(a, b)
self.compute_hashes() #after update is complete, recompute the hashes
#classmethod
def build(cls, dval, root=True):
#convert non-hashible values to strings
vals = [i if isinstance(i, (list, tuple)) else (None, i) for i in getattr(dval, 'items', lambda :dval)()]
if root:
if len(vals) > 1:
return cls(None, dval)
return cls(vals[0][0], vals[0][-1])
return [cls(a, b) for a, b in vals]
def __repr__(self):
return f'{self.__class__.__name__}({self.hash}, {repr(self.children)})'
tree = MTree.build(data_example) #create the basic tree with the input dictionary
_ = tree.compute_hashes() #get the hashes for each node (predicated on its children)
print(tree)
Output:
MTree(-1231139208667999673, [MTree(-8069796171680625903, 'val1'), MTree(6, [MTree(1, 1), MTree(2, 2), MTree(3, 3)]), MTree(-78872064628455629, [MTree(-8491910191379857244, 'subval1'), MTree(1818926376495655970, 'subval2'), MTree(1982425731828357743, 'subval3')])])
Updating the tree with the contents from a payload:
tree.update_tree({"key1": "newVal1"})
Output:
MTree(1039734050960246293, [MTree(5730292134016089818, 'newVal1'), MTree(6, [MTree(1, 1), MTree(2, 2), MTree(3, 3)]), MTree(-78872064628455629, [MTree(-8491910191379857244, 'subval1'), MTree(1818926376495655970, 'subval2'), MTree(1982425731828357743, 'subval3')])])

Copy keys and list contents from JSON in python

I am trying to skim through a dictionary that contains asymmetrical data and make a list of unique headings. Aside from the normal key:value items, the data within the dictionary also includes other dictionaries, lists, lists of dictionaries, NoneTypes, and so on at various levels throughout. I would like to be able to keep the hierarchy of keys/indexes if possible. This will be used to assess the scope of the data and it's availability. The data comes from a JSON file and it's contents are subject to change.
My latest attempt is to do this through a series of type checks within a function, skim(), as seen below.
def skim(obj, header='', level=0):
if obj is None:
return
def skim_iterable(iterable):
lvl = level +1
if isinstance(iterable, (list, tuple)):
for value in iterable:
h = ':'.join([header, iterable.index(value)])
return skim(value, header=h, level=lvl)
elif isinstance(iterable, dict):
for key, value in iterable.items():
h = ':'.join([header, key])
return skim(value, header=h, level=lvl)
if isinstance(obj, (int, float, str, bool)):
return ':'.join([header, obj, level])
elif isinstance(obj, (list, dict, tuple)):
return skim_iterable(obj)
The intent is to make a recursive call to skim() until the key or list index position at the deepest level is passed and then returned. skim has a inner function that handles iterable objects which carries the level along with the key value or list index position forward through each nestled iterable object.
An example below
test = {"level_0Item_1": {
"level_1Item_1": {
"level_2Item_1": "value",
"level_2Item_2": "value"
},
"level_1Item_2": {
"level_2Item_1": "value",
"level_2Item_2": {}
}},
"level_0Item_2": [
{
"level_1Item_1": "value",
"level_1Item_2": 569028742
}
],
"level_0Item_3": []
}
collection = [skim(test)]
Right now I'm getting a return of [None] on the above code and would like some help troubleshooting or guidance on how best to approach this. What I was expecting is something like this:
['level_0Item_1:level_1Item_1:level_2Item_1',
'level_0Item_1:level_1Item_1:level_2Item_2',
'level_0Item_1:level_1Item_2:level_2Item_1',
'level_0Item_1:level_1Item_2:level_2Item_2',
'level_0Item_2:level_1Item_1',
'level_0Item_2:level_1Item_2',
'level_0Item_3]
Among other resources, I recently came across this question (python JSON complex objects (accounting for subclassing)), read it and it's included references. Full disclosure here, I've only began coding recently.
Thank you for your help.

You can try something like:
def skim(obj, connector=':', level=0, builded_str= ''):
if isinstance(obj, dict):
for k, v in obj.items():
if isinstance(v, dict) and v:
yield from skim(v, connector, level + 1, builded_str + k + connector)
elif isinstance(v, list) and v:
yield from skim(v[0], connector, level + 1, builded_str + k + connector)
else:
yield builded_str + k
else:
yield builded_str
Test:
test = {"level_0Item_1": {
"level_1Item_1": {
"level_2Item_1": "value",
"level_2Item_2": "value"
},
"level_1Item_2": {
"level_2Item_1": "value",
"level_2Item_2": {}
}},
"level_0Item_2": [
{
"level_1Item_1": "value",
"level_1Item_2": 569028742
}
],
"level_0Item_3": []
}
lst = list(skim(test))
print(lst)
['level_0Item_1:level_1Item_2:level_2Item_1`',
'level_0Item_1:level_1Item_2:level_2Item_2',
'level_0Item_1:level_1Item_1:level_2Item_1',
'level_0Item_1:level_1Item_1:level_2Item_2',
'level_0Item_2:level_1Item_2',
'level_0Item_2:level_1Item_1',
'level_0Item_3']`

Copy nested dictionary to another dictionary with all the levels

For some third party APIs, there is a huge data that needs to be sent in the API parameters. And input data comes to our application in the CSV format.
I receive all the rows of the CSV containing around 120 columns, in a plane dict format by CSV DictReader.
file_data_obj = csv.DictReader(open(file_path, 'rU'))
This gives me each row in following format:
CSV_PARAMS = {
'param7': "Param name",
'param6': ["some name"],
'param5': 1234,
'param4': 999999999,
'param3': "some ",
'param2': {"x name":"y_value"},
'param1': None,
'paramA': "",
'paramZ': 2.687
}
And there is one nested dictionary containing all the third-party API parameters as keys with blank value.
eg. API_PARAMS = {
"param1": "",
"param2": "",
"param3": "",
"paramAZ": {
"paramA": "",
"paramZ": {"test1":1234, "name":{"hello":1}},
...
},
"param67": {
"param6": "",
"param7": ""
},
...
}
I have to map all the CSV Values to API parameters dynamically. following code works but upto 3 level nesting only.
def update_nested_params(self, paramdict, inpdict, result={}):
"""Iterate over nested dictionary up to level 3 """
for k, v in paramdict.items():
if isinstance(v, dict):
for k1, v1 in v.items():
if isinstance(v1, dict):
for k2, _ in v1.items():
result.update({k:{k1:{k2: inpdict.get(k2, '')}}})
else:
result.update({k:{k1: inpdict.get(k1, '')}})
else:
result.update({k: inpdict.get(k, '')})
return result
self.update_nested_params(API_PARAMS, CSV_PARAMS)
Is there any other efficient way to achieve this for n number of nestings of the API Parameters?

You could use recursion:
def update_nested_params(self, template, source):
result = {}
for key, value in template.items():
if key in source:
result[key] = source[key]
elif not isinstance(value, dict):
# assume the template value is a default
result[key] = value
else:
# recurse
result[key] = self.update_nested_params(value, source)
return result
This copies the 'template' (API_PARAMS) recursively, taking any key it finds from source if available, and recurses if not but the value in template is another dictionary. This handles nesting up to sys.getrecursionlimit() levels (default 1000).
Alternatively, use an explicit stack:
# extra import to add at the module top
from collections import deque
def update_nested_params(self, template, source):
top_level = {}
stack = deque([(top_level, template)])
while stack:
result, template = stack.pop()
for key, value in template.items():
if key in source:
result[key] = source[key]
elif not isinstance(value, dict):
# assume the template value is a default
result[key] = value
else:
# push nested dict into the stack
result[key] = {}
stack.append((result[key], value))
return top_level
This essentially just moves the call stack used in recursion to an explicit stack. The order in which keys are processed changes from depth to breath first but this doesn’t matter for your specific problem.

Accessing json type data without knowing layout of data?

I have a file with JSON data I am loading using json.load.
Suppose I want to put a variable in the json data, which references another data field. How can I process this reference in python?
eg:
{
"dictionary" : {
"list_1" : [
"item_1"
],
"list_2" : [
"$dictionary.list_1"
]
}
}
when I come across $, I then want list_2 to grab the data from: dictionary.list_1
and extend list_2, as if I had written in my python code:
jsonData["dictionary"]["list_2"].extend(jsonData["dictionary"]["list_1"])

As far as I know, there is nothing in the JSON standard for doing references. My first suggestion would be to use YAML which does have references in the form of Node Anchors. Python has a good implementation of YAML which supports those.
That being said, if you're set on using JSON, you'll have to roll your own implementation.
One possible example(though this doesn't extend the current array by the referenced array because that's ambiguous in the case of dicts, it replaces the reference by the value it refers to) is below. Note that it doesn't handle malformed references you'll have to add the error-checking yourself or guarantee that there aren't malformed references. If you want to change it to extend instead of replacing, you can, but you know your use-case better than I so you'll be able to specify it that way. This is meant to give you a starting point.
def resolve_references(structure, sub_structure=None):
if sub_structure is None:
return resolve_references(structure, structure)
if isinstance(sub_structure, list):
tmp = []
for item in sub_structure:
tmp.append(resolve_references(structure, item))
return tmp
if isinstance(sub_structure, dict):
tmp = {}
for key,value in sub_structure.items():
tmp[key] = resolve_references(structure, value)
return tmp
if isinstance(sub_structure, str) or isinstance(sub_structure, unicode):
if sub_structure[0] != "$":
return sub_structure
keys = sub_structure[1:].split(".")
def get_value(obj, key):
if isinstance(obj, dict):
return obj[key]
if isinstance(obj, list):
return obj[int(key)]
return value
value = get_value(structure, keys[0])
for key in keys[1:]:
value = get_value(value, key)
return value
return sub_structure
Example usage:
>>> import json
>>> json_str = """
... {
... "dictionary" : {
... "list_1" : [
... "item_1"
... ],
...
... "list_2" : "$dictionary.list_1"
... }
... }
... """
>>> obj = json.loads(json_str)
>>> resolve_references(obj)
{u'dictionary': {u'list_2': [u'item_1'], u'list_1': [u'item_1']}}

Xpath like query for nested python dictionaries

Is there a way to define a XPath type query for nested python dictionaries.
Something like this:
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print( foo.select("/morefoo/morebar") )
>> {'bacon' : 'foobar'}
I also needed to select nested lists ;)
This can be done easily with #jellybean's solution:
def xpath_get(mydict, path):
elem = mydict
try:
for x in path.strip("/").split("/"):
try:
x = int(x)
elem = elem[x]
except ValueError:
elem = elem.get(x)
except:
pass
return elem
foo = {
'spam':'eggs',
'morefoo': [{
'bar':'soap',
'morebar': {
'bacon' : {
'bla':'balbla'
}
}
},
'bla'
]
}
print xpath_get(foo, "/morefoo/0/morebar/bacon")
[EDIT 2016] This question and the accepted answer are ancient. The newer answers may do the job better than the original answer. However I did not test them so I won't change the accepted answer.

One of the best libraries I've been able to identify, which, in addition, is very actively developed, is an extracted project from boto: JMESPath. It has a very powerful syntax of doing things that would normally take pages of code to express.
Here are some examples:
search('foo | bar', {"foo": {"bar": "baz"}}) -> "baz"
search('foo[*].bar | [0]', {
"foo": [{"bar": ["first1", "second1"]},
{"bar": ["first2", "second2"]}]}) -> ["first1", "second1"]
search('foo | [0]', {"foo": [0, 1, 2]}) -> [0]

There is an easier way to do this now.
http://github.com/akesterson/dpath-python
$ easy_install dpath
>>> dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar")
... done. Or if you don't like getting your results back in a view (merged dictionary that retains the paths), yield them instead:
$ easy_install dpath
>>> for (path, value) in dpath.util.search(YOUR_DICTIONARY, "morefoo/morebar", yielded=True)
... and done. 'value' will hold {'bacon': 'foobar'} in that case.

Not exactly beautiful, but you might use sth like
def xpath_get(mydict, path):
elem = mydict
try:
for x in path.strip("/").split("/"):
elem = elem.get(x)
except:
pass
return elem
This doesn't support xpath stuff like indices, of course ... not to mention the / key trap unutbu indicated.

There is the newer jsonpath-rw library supporting a JSONPATH syntax but for python dictionaries and arrays, as you wished.
So your 1st example becomes:
from jsonpath_rw import parse
print( parse('$.morefoo.morebar').find(foo) )
And the 2nd:
print( parse("$.morefoo[0].morebar.bacon").find(foo) )
PS: An alternative simpler library also supporting dictionaries is python-json-pointer with a more XPath-like syntax.

dict > jmespath
You can use JMESPath which is a query language for JSON, and which has a python implementation.
import jmespath # pip install jmespath
data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}
jmespath.search('root.section.item2', data)
Out[42]: 'value2'
The jmespath query syntax and live examples: http://jmespath.org/tutorial.html
dict > xml > xpath
Another option would be converting your dictionaries to XML using something like dicttoxml and then use regular XPath expressions e.g. via lxml or whatever other library you prefer.
from dicttoxml import dicttoxml # pip install dicttoxml
from lxml import etree # pip install lxml
data = {'root': {'section': {'item1': 'value1', 'item2': 'value2'}}}
xml_data = dicttoxml(data, attr_type=False)
Out[43]: b'<?xml version="1.0" encoding="UTF-8" ?><root><root><section><item1>value1</item1><item2>value2</item2></section></root></root>'
tree = etree.fromstring(xml_data)
tree.xpath('//item2/text()')
Out[44]: ['value2']
Json Pointer
Yet another option is Json Pointer which is an IETF spec that has a python implementation:
https://github.com/stefankoegl/python-json-pointer
From the jsonpointer-python tutorial:
from jsonpointer import resolve_pointer
obj = {"foo": {"anArray": [ {"prop": 44}], "another prop": {"baz": "A string" }}}
resolve_pointer(obj, '') == obj
# True
resolve_pointer(obj, '/foo/another%20prop/baz') == obj['foo']['another prop']['baz']
# True
>>> resolve_pointer(obj, '/foo/anArray/0') == obj['foo']['anArray'][0]
# True

If terseness is your fancy:
def xpath(root, path, sch='/'):
return reduce(lambda acc, nxt: acc[nxt],
[int(x) if x.isdigit() else x for x in path.split(sch)],
root)
Of course, if you only have dicts, then it's simpler:
def xpath(root, path, sch='/'):
return reduce(lambda acc, nxt: acc[nxt],
path.split(sch),
root)
Good luck finding any errors in your path spec tho ;-)

Another alternative (besides that suggested by jellybean) is this:
def querydict(d, q):
keys = q.split('/')
nd = d
for k in keys:
if k == '':
continue
if k in nd:
nd = nd[k]
else:
return None
return nd
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print querydict(foo, "/morefoo/morebar")

More work would have to be put into how the XPath-like selector would work.
'/' is a valid dictionary key, so how would
foo={'/':{'/':'eggs'},'//':'ham'}
be handled?
foo.select("///")
would be ambiguous.

Is there any reason for you to the query it the way like the XPath pattern? As the commenter to your question suggested, it just a dictionary, so you can access the elements in a nest manner. Also, considering that data is in the form of JSON, you can use simplejson module to load it and access the elements too.
There is this project JSONPATH, which is trying to help people do opposite of what you intend to do (given an XPATH, how to make it easily accessible via python objects), which seems more useful.

def Dict(var, *arg, **kwarg):
""" Return the value of an (imbricated) dictionnary, if all fields exist else return "" unless "default=new_value" specified as end argument
Avoid TypeError: argument of type 'NoneType' is not iterable
Ex: Dict(variable_dict, 'field1', 'field2', default = 0)
"""
for key in arg:
if isinstance(var, dict) and key and key in var: var = var[key]
else: return kwarg['default'] if kwarg and 'default' in kwarg else "" # Allow Dict(var, tvdbid).isdigit() for example
return kwarg['default'] if var in (None, '', 'N/A', 'null') and kwarg and 'default' in kwarg else "" if var in (None, '', 'N/A', 'null') else var
foo = {
'spam':'eggs',
'morefoo': {
'bar':'soap',
'morebar': {'bacon' : 'foobar'}
}
}
print Dict(foo, 'morefoo', 'morebar')
print Dict(foo, 'morefoo', 'morebar', default=None)
Have a SaveDict(value, var, *arg) function that can even append to lists in dict...

I reference form this link..
Following code is for json xpath base parse implemented in python :
import json
import xmltodict
# Parse the json string
class jsonprase(object):
def __init__(self, json_value):
try:
self.json_value = json.loads(json_value)
except Exception :
raise ValueError('must be a json str value')
def find_json_node_by_xpath(self, xpath):
elem = self.json_value
nodes = xpath.strip("/").split("/")
for x in range(len(nodes)):
try:
elem = elem.get(nodes[x])
except AttributeError:
elem = [y.get(nodes[x]) for y in elem]
return elem
def datalength(self, xpath="/"):
return len(self.find_json_node_by_xpath(xpath))
#property
def json_to_xml(self):
try:
root = {"root": self.json_value}
xml = xmltodict.unparse(root, pretty=True)
except ArithmeticError :
pyapilog().error(e)
return xml
Test Json :
{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 2675,
"params": {
"q": "TxnInitTime:[2021-11-01T00:00:00Z TO 2021-11-30T23:59:59Z] AND Status:6",
"stats": "on",
"stats.facet": "CountryCode",
"rows": "0",
"wt": "json",
"stats.field": "ItemPrice"
}
},
"response": {
"numFound": 15162439,
"start": 0,
"maxScore": 1.8660598,
"docs": []
}
}
Test Code to read the values from above input json.
numFound = jsonprase(ABOVE_INPUT_JSON).find_json_node_by_xpath('/response/numFound')
print(numFound)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Is there a package that converts xmltodict dictionaries to lxml trees? - python

Related

How can I convert/transform a JSON tree structure to a merkle tree

Copy keys and list contents from JSON in python

Copy nested dictionary to another dictionary with all the levels

Accessing json type data without knowing layout of data?

Xpath like query for nested python dictionaries

Categories

Resources