Fail to use xmlschema.from_json - python

I had being look for the answer left and right. Even stackoverflow only has 1 similar question, but the answer does not work in my case. I fail to validate the xml and keep getting this error:
"unable to select an element for decoding data, provide a valid 'path' argument."
My goal is converting the json data to xml with validation. Anyone has any idea?
Below is my simple code:
import xmlschema
import json
from xml.etree import ElementTree
my_xsd = '<?xml version="1.0"?><schema targetNamespace = "urn:oasis:names:tc:emergency:cap:1.2" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <element name="note" type="xs:string"/><element name="age" type="xs:integer"/> </schema>'
schema = xmlschema.XMLSchema(my_xsd)
#jdata = xmlschema.to_json(xml_document = """<note>this is a Note text</note>""", schema = schema)
#jsonData = json.dumps(jdata)
data = json.dumps({'note': 'this is a Note text','age':'5'})
#print (jdata)
#print (jsonData)
print(data)
xml = xmlschema.from_json(data, schema=schema)
ElementTree.dump(xml)

I requested help from xmlschema creator and it turns out I need to have extra parameters as:
from_json(jsonTxt ,schema = CAPSchema, preserve_root=True, namespaces={'': 'urn:oasis:names:tc:emergency:cap:1.2'})

Related

Loading a YAML-file throws an error in Python

I am reading in some YAML-files like this:
data = yaml.safe_load(pathtoyamlfile)
When doing so I get the followin error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:value'
When checking for the line of the YAML-file which is also given in the error messages I recognized that there is always this key-value-pair: simple: =.
Since the YAML-files are autogenerated I am not sure if I can change the files themselves. Is there a way on reading the data of the YAML-files none the less?
It looks like you have hit this bug. There is a workaround suggested in the comments.
Given this content in example.yaml:
example: =
This code fails as you've described in your question:
import yaml
with open('example.yaml') as fd:
data = yaml.safe_load(fd)
print(data)
But this works:
import yaml
yaml.SafeLoader.yaml_implicit_resolvers.pop('=')
with open('example.yaml') as fd:
data = yaml.safe_load(fd)
print(data)
And outputs:
{'example': '='}
If you cannot change the input, you might be able to upgrade the library that you use:
import sys
import ruamel.yaml
yaml_str = """\
example: =
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
for key, value in data.items():
print(f'{key}: {value}')
which gives:
example: =
Please be aware that ruamel.yaml still has this bug in its safe mode loading ( YAML(typ='safe') ).

How to write ElementTree generated by iterparse into an xml file

Please, Note: Novice user of Python.
Hi,
I am working with more than 1Gb of XML file. Using Python2.7. Initially, I was using 'iter' to parse the XML. It worked fine with small files but with file such big I was getting a memory error. Then, I read the documentation and found out that iter load the whole file into memory at once and I should use iterparse. I used and able to load the xml file and make modification while I parse it.
The problem I am facing now is how to write this parsed element tree into a file. The methods I found on Google were suggesting 'write' method of ElementTree which was parsed using 'iter' but mine is parsed using iterparse.
Below is my code snippet. I had commented lines because inner logic of code is pretty big. The only part where I am struggling is writing the updated tree into 'output_pre' file.
The structure of my xml file is like this:
<users>
<user pin=''>
</user>
<user pin=''>
</user>
</users>
Code(inner logic has been removed):
----------------Parser---------------------------
import xml.etree.cElementTree as ET2
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element
output_pre = open("pre_ouput.xml", 'w')
tree = ET2.iterparse("temp-output-preliminary.xml")
for event, elem in tree:
if elem.tag == "users":
pass
if elem.tag == "user":
userContent = list(elem)
#Number of children will help filter dummy users in user-state file.
numberOfChildren = len(userContent)
#assert numberOfChildren != 3
PIN = elem.get('pin')
assert PIN is not None
analysing += 1
logger.info ("Analysing user number: %d", analysing)
if numberOfChildren <= 2:
if numberOfChildren >=4:
if numberOfChildren == 3:
for e in ids:
node = ET2.Element("property", {eid: PROV_DATA})
elem.append(node)
container_id_set.add(e)
tree.write(output_pre, encoding='unicode')
output_pre.write("\n</perk-users")
output_pre.close()
Thanks!

How to get sub attribute value in Python

I have an xml file like this.
<approved-by approved="yes">
<person>
<name>XXX/name><signature>XXXXX</signature>
<location>XX</location><company>XXX</company><department>XX</department>
</person>
</approved-by>
<revision-history suppress="yes">
<rev-info>
<rev>PA1</rev>
<date><y>2013</y><m>01</m><d>22</d></date>
I need to retrieve values of 'rev' from all xmls and the value of approved. I want whtr the doc is approved or not. I have a script like this.
from xml.dom.minidom import parse, parseString
import os
import sys
def shahul(dir):
for r,d,f in os.walk(dir):
for files in f:
if files.endswith(".xml"):
dom=parse(os.path.join(r, files))
name = dom.getElementsByTagName('rev')
title = dom.getElementsByTagName('title')
approved=dom.getAttribute('approved')
print (files, title[0].firstChild.nodeValue,name[0].firstChild.nodeValue, approved, sep='\t')
shahul("location")
I am able to get the value under 'rev' but i am not able to get value of the attribute 'approved-by'. I understand my syntax is not right to get the value of approved, but i dont know it.
i need the following as output.
FILE_NAME, Title, PA1, yes
Please guide me.
Assuming that you have only one approved-by tag in the XML:
Change:
approved = dom.getAttribute('approved')
To:
approved_by = dom.getElementsByTagName('approved-by')
approved = approved_by[0].attributes['approved'].value

Python: Update XML-file using ElementTree while conserving layout as much as possible

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>2</dogs>
</house>
</group>
My current result using the code below is: (the created file is called houses2.xml)
<ns0:group xmlns:ns0="http://dogs.house.local">
<ns0:house>
<ns0:id>2821</ns0:id>
<ns0:dogs>3</ns0:dogs>
</ns0:house>
</ns0:group>
I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):
I want to keep the <?xml version="1.0"?> line.
I do not want to prefix all tags, I´d like to keep it as is.
In conclusion, I don´t want to mess with the document more than I absolutely have to.
My current code (which works except for the above mentioned flaws) generating the above result follows.
I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):
def elementTreeRootAndNamespace(xml_file):
from xml.etree import ElementTree
import re
element_tree = ElementTree.parse(xml_file)
# Search for a namespace on the root tag
namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
# Keep the namespace empty if none exists, if a namespace exists set
# namespace to {namespacename}
namespace = ''
if namespace_search:
namespace = namespace_search.group(1)
return element_tree, namespace
This is my code to update the number of dogs and save it to the new file houses2.xml:
elementTree, namespace = elementTreeRootAndNamespace('houses.xml')
# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))
# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)
# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')
An XML based solution to this problem is to write a helper class for ElementTree which:
Grabs the XML-declaration line before parsing as ElementTree at the time of writing is unable to write an XML-declaration line without also writing an encoding attribute(I checked the source).
Parses the input file once, grabs the namespace of the root element. Registers that namespace with ElementTree as having the empty string as prefix. When that is done the source file is parsed using ElementTree again, with that new setting.
It has one major drawback:
XML-comments are lost. Which I have learned is not acceptable for this situation(I initially didn´t think the input data had any comments, but it turns out it has).
My helper class with example:
from xml.etree import ElementTree as ET
import re
class ElementTreeHelper():
def __init__(self, xml_file_name):
xml_file = open(xml_file_name, "rb")
self.__parse_xml_declaration(xml_file)
self.element_tree = ET.parse(xml_file)
xml_file.seek(0)
root_tag_namespace = self.__root_tag_namespace(self.element_tree)
self.namespace = None
if root_tag_namespace is not None:
self.namespace = '{' + root_tag_namespace + '}'
# Register the root tag namespace as having an empty prefix, as
# this has to be done before parsing xml_file we re-parse.
ET.register_namespace('', root_tag_namespace)
self.element_tree = ET.parse(xml_file)
def find(self, xpath_query):
return self.element_tree.find(xpath_query)
def write(self, xml_file_name):
xml_file = open(xml_file_name, "wb")
if self.xml_declaration_line is not None:
xml_file.write(self.xml_declaration_line + '\n')
return self.element_tree.write(xml_file)
def __parse_xml_declaration(self, xml_file):
first_line = xml_file.readline().strip()
if first_line.startswith('<?xml') and first_line.endswith('?>'):
self.xml_declaration_line = first_line
else:
self.xml_declaration_line = None
xml_file.seek(0)
def __root_tag_namespace(self, element_tree):
namespace_search = re.search('^{(\S+)}', element_tree.getroot().tag)
if namespace_search is not None:
return namespace_search.group(1)
else:
return None
def __main():
el_tree_hlp = ElementTreeHelper('houses.xml')
dogs_tag = el_tree_hlp.element_tree.getroot().find(
'{ns}house/{ns}dogs'.format(
ns=el_tree_hlp.namespace))
one_dog_added = int(dogs_tag.text.strip()) + 1
dogs_tag.text = str(one_dog_added)
el_tree_hlp.write('hejsan.xml')
if __name__ == '__main__':
__main()
The output:
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>3</dogs>
</house>
</group>
If someone has an improvement to this solution please don´t hesitate to grab the code and improve it.
Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).
Depending on your needs, you have the following options:
If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.
You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0=" → <group xmlns=". This is pretty safe unless you can have CDATA in your XML.
You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.
when Save xml add default_namespace argument is easy to avoid ns0, on my code
key code: xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)
if os.path.isfile(xmlfiile):
xmltree = ET.parse(xmlfiile)
root = xmltree.getroot()
xmlnamespace = root.tag.split('{')[1].split('}')[0] //get namespace
initwin=xmltree.find("./{"+ xmlnamespace +"}test")
initwin.find("./{"+ xmlnamespace +"}content").text = "aaa"
xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)
etree from lxml provides this feature.
elementTree.write('houses2.xml',encoding = "UTF-8",xml_declaration = True) helps you in not omitting the declaration
While writing into the file it does not change the namespaces.
http://lxml.de/parsing.html is the link for its tutorial.
P.S : lxml should be installed separately.

Convert JSON to XML in Python

I see a number of questions on SO asking about ways to convert XML to JSON, but I'm interested in going the other way. Is there a python library for converting JSON to XML?
Edit: Nothing came back right away, so I went ahead and wrote a script that solves this problem.
Python already allows you to convert from JSON into a native dict (using json or, in versions < 2.6, simplejson), so I wrote a library that converts native dicts into an XML string.
https://github.com/quandyfactory/dict2xml
It supports int, float, boolean, string (and unicode), array and dict data types and arbitrary nesting (yay recursion).
I'll post this as an answer once 8 hours have passed.
Nothing came back right away, so I went ahead and wrote a script that solves this problem.
Python already allows you to convert from JSON into a native dict (using json or, in versions < 2.6, simplejson), so I wrote a library that converts native dicts into an XML string.
https://github.com/quandyfactory/dict2xml
It supports int, float, boolean, string (and unicode), array and dict data types and arbitrary nesting (yay recursion).
If you don't have such a package, you can try:
def json2xml(json_obj, line_padding=""):
result_list = list()
json_obj_type = type(json_obj)
if json_obj_type is list:
for sub_elem in json_obj:
result_list.append(json2xml(sub_elem, line_padding))
return "\n".join(result_list)
if json_obj_type is dict:
for tag_name in json_obj:
sub_obj = json_obj[tag_name]
result_list.append("%s<%s>" % (line_padding, tag_name))
result_list.append(json2xml(sub_obj, "\t" + line_padding))
result_list.append("%s</%s>" % (line_padding, tag_name))
return "\n".join(result_list)
return "%s%s" % (line_padding, json_obj)
For example:
s='{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
j = json.loads(s)
print(json2xml(j))
Result:
<main>
<aaa>
10
</aaa>
<bbb>
1
2
3
</bbb>
</main>
Load it into a dict using json.loads then use anything from this question...
Serialize Python dictionary to XML
Use dicttoxml to convert JSON directly to XML
Installation
pip install dicttoxml
or
easy_install dicttoxml
In [2]: from json import loads
In [3]: from dicttoxml import dicttoxml
In [4]: json_obj = '{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
In [5]: xml = dicttoxml(loads(json_obj))
In [6]: print(xml)
<?xml version="1.0" encoding="UTF-8" ?><root><main type="dict"><aaa type="str">10</aaa><bbb type="list"><item type="int">1</item><item type="int">2</item><item type="int">3</item></bbb></main></root>
In [7]: xml = dicttoxml(loads(json_obj), attr_type=False)
In [8]: print(xml)
<?xml version="1.0" encoding="UTF-8" ?><root><main><aaa>10</aaa><bbb><item>1</item><item>2</item><item>3</item></bbb></main></root>
For more information on dicttoxml
I found xmltodict to be useful. Looks like it was released after some of the posts here. https://pypi.org/project/xmltodict/
import xmltodict
import json
sample_json = {"note": {"to": "Tove", "from": "Jani", "heading": "Reminder", "body": "Don't forget me this weekend!"}}
#############
#json to xml
#############
json_to_xml = xmltodict.unparse(sample_json)
print(json_to_xml)
#############
#xmlto json
#############
x_to_j_dict = xmltodict.parse(json_to_xml)
x_to_j_string = json.dumps(x_to_j_dict)
back_to_json = json.loads(x_to_j_string)
print(back_to_json)
from json import loads
from dicttoxml import dicttoxml
s='{"main" : {"aaa" : "10", "bbb" : [1,2,3]}}'
xml = dicttoxml(loads(s))
Or if your data is stored in a pandas data.frame as mine often is:
df['xml'] = df['json'].apply(lambda s: dicttoxml(json.loads(s))

Categories

Resources