Python xml etree add sub element containing prefix to xml file

I need to add an element to an existing XML file. The file defines a namespace like this:
<Document xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging" ></Document>
The element that I need to add:
<idPkg:Story src="Stories/Story_main.xml" />
To obtain:
<Document xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">
<idPkg:Story src="Stories/Story_main.xml" />
</Document>
I've tried this:
designmap = ET.parse(os.path.join(path, "designmap.xml"))
designmap.getroot().append(ET.fromstring(f"<idPkg:Story src=\"Stories/Story_{self.id}.xml\" />"))
designmap.write(os.path.join(path, "designmap.xml"))
But I get this error:
xml.etree.ElementTree.ParseError: unbound prefix: line 1, column 0
because the parser cannot resolve the idPkg prefix in the fragment.
Is there a workaround?
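One possible workaround, sketched with the standard library only: ET.fromstring() parses the fragment on its own, so the idPkg prefix declared in designmap.xml is not visible to it. Building the element with its namespace-qualified name ({uri}localname) avoids the prefix lookup, and ET.register_namespace() asks the serializer to write that URI back with the idPkg prefix. The in-memory root and the name IDPKG_NS below are stand-ins for illustration; in the original code the root would come from ET.parse(...).getroot().

```python
import xml.etree.ElementTree as ET

IDPKG_NS = "http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging"

# Ask the serializer to use the "idPkg" prefix for this namespace URI
ET.register_namespace("idPkg", IDPKG_NS)

# Stand-in for ET.parse(os.path.join(path, "designmap.xml")).getroot()
root = ET.fromstring(f'<Document xmlns:idPkg="{IDPKG_NS}"></Document>')

# Create the child under its qualified name instead of parsing a prefixed string
story = ET.SubElement(root, f"{{{IDPKG_NS}}}Story", {"src": "Stories/Story_main.xml"})

result = ET.tostring(root, encoding="unicode")
print(result)
```

With a parsed file, the same SubElement call on designmap.getroot() followed by designmap.write(...) should behave the same way.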

Related

Delete Element from XML file using python

I have been trying to delete the structuredBody element (which is within a component element) within the following Document, but my code seems to not work.
The structure of the XML source file simplified:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
Here is the code I'm using:
import xml.etree.ElementTree as ET
from lxml import objectify, etree
cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root:
    ET.register_namespace("", "urn:hl7-org:v3")
for node in cda_tree.xpath('//component/structuredBody'):
    node.getparent().remove(node)
cda_tree.write('newXML.xml')
Whenever I run the code, the newXML.xml file still has the structuredBody element.
Thanks in advance!

Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. Your document doesn't contain any elements named component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element means that all elements in the document exist by default in that particular namespace, so you need to use that namespace when matching elements:
from lxml import objectify, etree
cda_tree = etree.parse('data.xml')
cda_root = cda_tree.getroot()
ns = {
    'hl7': 'urn:hl7-org:v3',
}
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
    node.getparent().remove(node)
cda_tree.write('newXML.xml')
With the above code, if the input looks like this:
<ClinicalDocument
xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
The output looks like:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
</component>
</ClinicalDocument>
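For comparison, here is a rough equivalent using only the standard-library ElementTree, which has no getparent(). This is a sketch under the assumption that iterating over the parent elements is acceptable; the inline xml_text is a shortened stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the ClinicalDocument file
xml_text = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody><section/></structuredBody>
  </component>
</ClinicalDocument>"""

ns = {"hl7": "urn:hl7-org:v3"}

# Keep the default namespace unprefixed when serializing
ET.register_namespace("", "urn:hl7-org:v3")
root = ET.fromstring(xml_text)

# No getparent() here, so find each parent first and remove the child from it
for parent in root.findall(".//hl7:component", ns):
    for body in parent.findall("hl7:structuredBody", ns):
        parent.remove(body)

result = ET.tostring(root, encoding="unicode")
print(result)
```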

How to deal with xmlns values while parsing an XML file?

I have the following toy example of an XML file. I have thousands of these. I have difficulty parsing this file.
Look at the text in second line. All my original files contain this text. When I delete i:type="Record" xmlns="http://schemas.datacontract.org/Storage" from second line (retaining the remaining text), I am able to get accelx and accely values using the code given below.
How can I parse this file with the original text?
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas.datacontract.org/Storage">
<AvailableCharts>
<Accelerometer>true</Accelerometer>
<Velocity>false</Velocity>
</AvailableCharts>
<Trics>
<Trick>
<EndOffset>PT2M21.835S</EndOffset>
<Values>
<TrickValue>
<Acceleration>26.505801694441629</Acceleration>
<Rotation>0.023379150593228679</Rotation>
</TrickValue>
</Values>
</Trick>
</Trics>
<Values>
<SensorValue>
<accelx>-3.593643144</accelx>
<accely>7.316485176</accely>
</SensorValue>
<SensorValue>
<accelx>0.31103436</accelx>
<accely>7.70408184</accely>
</SensorValue>
</Values>
</ArrayOfRecord>
Code to parse the data:
import lxml.etree as etree
tree = etree.parse(r"C:\testdel.xml")
root = tree.getroot()
val_of_interest = root.findall('./Values/SensorValue')
for sensor_val in val_of_interest:
    print(sensor_val.find('accelx').text)
    print(sensor_val.find('accely').text)
I asked a related question here: How to extract data from xml file that is deep down the tag
Thanks

The confusion is caused by the following default namespace (a namespace declared without a prefix):
xmlns="http://schemas.datacontract.org/Storage"
Note that descendant elements without a prefix implicitly inherit the default namespace from their ancestor. To reference an element in that namespace, map a prefix to the namespace URI and use that prefix in your XPath:
ns = {'d': 'http://schemas.datacontract.org/Storage' }
val_of_interest = root.findall('./d:Values/d:SensorValue', ns)
for sensor_val in val_of_interest:
    print(sensor_val.find('d:accelx', ns).text)
    print(sensor_val.find('d:accely', ns).text)
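As an alternative sketch, the standard-library ElementTree in Python 3.8 and later accepts a {*} wildcard that matches a tag in any namespace, which can be convenient when the namespace URI is not important. The inline xml_text is a trimmed stand-in for the file in the question, and readings is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the file in the question
xml_text = """<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas.datacontract.org/Storage">
  <Values>
    <SensorValue><accelx>-3.593643144</accelx><accely>7.316485176</accely></SensorValue>
    <SensorValue><accelx>0.31103436</accelx><accely>7.70408184</accely></SensorValue>
  </Values>
</ArrayOfRecord>"""

root = ET.fromstring(xml_text)

# "{*}name" matches elements called "name" regardless of their namespace (Python 3.8+)
readings = [
    (float(sv.find("{*}accelx").text), float(sv.find("{*}accely").text))
    for sv in root.findall("./{*}Values/{*}SensorValue")
]
print(readings)
```

The explicit prefix mapping shown above remains the safer choice when several namespaces reuse the same local names.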

python lxml insert xml file into xml string

I have an XML file similar to this:
<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>
What I need to do is embed it into the following XML replacing the <object></object> element with everything in the file.
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>
How do I go about doing this?

This is one possible way to replace an element with another element using lxml (see the comments for how it works):
....
....
# assume that 'tree' is a variable containing the parsed template XML...
# and 'content_tree' is a variable containing the parsed content to be embedded
# get the container element to be replaced
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
# get the parent of the container element
parent = container.getparent()
# replace the container element with the actual content element
parent.replace(container, content_tree)
And here is a working demo:
import lxml.etree as etree
file_content = '''<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>'''
template = '''<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>'''
tree = etree.fromstring(template)
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
parent = container.getparent()
content_tree = etree.fromstring(file_content)
parent.replace(container, content_tree)
print(etree.tostring(tree))
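If lxml is not available, roughly the same replacement can be sketched with the standard-library ElementTree. It has no getparent() or replace(), so this version assumes that stepping to the parent with ".." in the path and swapping by index is good enough. The file content and template are shortened stand-ins, and the prefix names in ns are arbitrary.

```python
import xml.etree.ElementTree as ET

ns = {
    "atom": "http://purl.org/atom/ns#",
    "tes": "http://www.tidalsoftware.com/client/tesservlet",
}

# Shortened stand-ins for the file content and the template from the question
file_content = """<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:name>test_number3</tes:name>
<tes:id>2078</tes:id>
</tes:variable>"""
template = """<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>"""

root = ET.fromstring(template)
content = ET.fromstring(file_content)

# ".." steps from the matched <object> up to its parent element
parent = root.find(".//atom:object/..", ns)
container = parent.find("atom:object", ns)

# Swap the new element into the position the container occupied
index = list(parent).index(container)
parent.remove(container)
parent.insert(index, content)

print(ET.tostring(root, encoding="unicode"))
```

Unless ET.register_namespace() is called first, ElementTree writes generated prefixes such as ns0 in the output; the document is equivalent, only the prefix names differ.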

Remove namespace with xmltodict in Python

xmltodict converts XML to a Python dictionary. It supports namespaces. I can follow the example on the homepage and successfully remove a namespace. However, I cannot remove the namespace from my XML and cannot identify why. Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
mystatus:field1="data1"
mystatus:field2="data2" />
<section2
mystatus:lineA="outputA"
mystatus:lineB="outputB" />
</status>
And using:
xmltodict.parse(xml,process_namespaces=True,namespaces={'http://localhost/mystatus':None})
I get:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'#http://localhost/mystatus:field1', u'data1'), (u'#http://localhost/mystatus:field2', u'data2')])), (u'section2', OrderedDict([(u'#http://localhost/mystatus:lineA', u'outputA'), (u'#http://localhost/mystatus:lineB', u'outputB')]))]))])
instead of:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'field1', u'data1'), (u'field2', u'data2')])), (u'section2', OrderedDict([(u'lineA', u'outputA'), (u'#lineB', u'outputB')]))]))])
Am I making some simple mistake, or is there something about my XML that prevents the process_namespaces modification from working correctly?

xmltodict is based on expat, so namespaces should be applied to the element name, not to attribute names:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<mystatus:section1 field1="data1" field2="data2" />
<mystatus:section2 lineA="outputA" lineB="outputB" />
</status>
When parsed with:
foo = xmltodict.parse(xml,
                      process_namespaces=True,
                      namespaces={'http://localhost/mystatus':None})
outputs:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'#field1', u'data1'), (u'#field2', u'data2')])), (u'section2', OrderedDict([(u'#lineA', u'outputA'), (u'#lineB', u'outputB')]))]))])
Accessing it is easy:
# Get attribute 'lineA' from element 'section2' inside element 'status'
>>> foo.get('status').get('section2').get('#lineA')
u'outputA'
Attribute namespaces are only required when you have multiple attributes of the same name (e.g. multiple id's or multiple prices, etc), in which case, I couldn't get expat or xmltodict to parse it correctly. YMMV though.
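If the XML cannot be changed and the goal is simply a dictionary with namespace-free attribute names, one option is to skip xmltodict for this step. The sketch below uses the standard-library ElementTree instead, which reports namespaced names as {uri}localname, and strips the {uri} part. The helper local_name is hypothetical and written for this example.

```python
import xml.etree.ElementTree as ET

# The XML from the question, without the XML declaration
xml_text = """<status xmlns:mystatus="http://localhost/mystatus">
<section1 mystatus:field1="data1" mystatus:field2="data2" />
<section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>"""

def local_name(qname):
    """Drop a leading '{namespace-uri}' from an ElementTree name, if present."""
    return qname.rsplit("}", 1)[-1]

root = ET.fromstring(xml_text)

# One entry per child element, mapping local attribute names to their values
status = {
    local_name(child.tag): {local_name(key): value for key, value in child.attrib.items()}
    for child in root
}
print(status)
```

Note that stripping namespaces this way would merge attributes that share a local name but live in different namespaces.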

Python ElementTree find() not matching within kml file

I'm trying to find an element from a kml file using element trees as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse("history-03-02-2012.kml")
p = tree.find(".//name")
A sufficient subset of the file to demonstrate the problem follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>
A "name" element exists; why does the search come back empty?

The name element you're trying to match is actually within the KML namespace, but you aren't searching with that namespace in mind.
Try:
p = tree.find(".//{http://www.opengis.net/kml/2.2}name")
If you were using lxml's XPath instead of the standard-library ElementTree, you'd instead pass the namespace in as a dictionary:
>>> tree = lxml.etree.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
... <Document>
... <name>Location history from 03/03/2012 to 03/10/2012</name>
... </Document>
... </kml>''')
>>> tree.xpath('//kml:name', namespaces={'kml': "http://www.opengis.net/kml/2.2"})
[<Element {http://www.opengis.net/kml/2.2}name at 0x23afe60>]
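The standard-library ElementTree also accepts a prefix-to-URI dictionary as the second argument of find() and findall(), which avoids spelling out the full {uri} in every path. A minimal sketch using the KML subset from the question; the prefix name kml is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

kml_text = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>"""

root = ET.fromstring(kml_text)

# Map a prefix of your choosing to the document's default namespace URI
ns = {"kml": "http://www.opengis.net/kml/2.2"}
name = root.find(".//kml:name", ns)
print(name.text)
```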
