Python xml etree add sub element containing prefix to xml file - python

I need to add an element to an existing xml file. The file define a namespace like this:
<Document xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging" ></Document>
The element that i need to add:
<idPkg:Story src="Stories/Story_main.xml" />
To obtain:
<Document xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">
<idPkg:Story src="Stories/Story_main.xml" />
</Document>
I've tried this:
designmap = ET.parse(os.path.join(path, "designmap.xml"))
designmap.getroot().append(ET.fromstring(f"<idPkg:Story src=\"Stories/Story_{self.id}.xml\" />"))
designmap.write(os.path.join(path, "designmap.xml"))
But I get this error:
xml.etree.ElementTree.ParseError: unbound prefix: line 1, column 0
due to the parser not being able to find the prefix.
Is there a work around?

Related

Delete Element from XML file using python

I have been trying to delete the structuredBody element (which is within a component element) within the following Document, but my code seems to not work.
The structure of the XML source file simplified:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
Here is the code I'm using:
import xml.etree.ElementTree as ET
from lxml import objectify, etree
cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root:
ET.register_namespace("", "urn:hl7-org:v3")
for node in cda_tree.xpath('//component/structuredBody'):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
Whenever I run the code, the newXML.xml file still has the structuredBody element.
Thanks in advance!
Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. Your document doesn't contain any elements named component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element mean that all elements in the document exist by default in that particular namespace, so you need to use that namespace when matching elements:
from lxml import objectify, etree
cda_tree = etree.parse('data.xml')
cda_root = cda_tree.getroot()
ns = {
'hl7': 'urn:hl7-org:v3',
}
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
With the above code, if the input looks like this:
<ClinicalDocument
xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
The output looks like:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
</component>
</ClinicalDocument>

How to deal with xmlns values while parsing an XML file?

I have the following toy example of an XML file. I have thousands of these. I have difficulty parsing this file.
Look at the text in second line. All my original files contain this text. When I delete i:type="Record" xmlns="http://schemas.datacontract.org/Storage" from second line (retaining the remaining text), I am able to get accelx and accely values using the code given below.
How can I parse this file with the original text?
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas.datacontract.org/Storage">
<AvailableCharts>
<Accelerometer>true</Accelerometer>
<Velocity>false</Velocity>
</AvailableCharts>
<Trics>
<Trick>
<EndOffset>PT2M21.835S</EndOffset>
<Values>
<TrickValue>
<Acceleration>26.505801694441629</Acceleration>
<Rotation>0.023379150593228679</Rotation>
</TrickValue>
</Values>
</Trick>
</Trics>
<Values>
<SensorValue>
<accelx>-3.593643144</accelx>
<accely>7.316485176</accely>
</SensorValue>
<SensorValue>
<accelx>0.31103436</accelx>
<accely>7.70408184</accely>
</SensorValue>
</Values>
</ArrayOfRecord>
Code to parse the data:
import lxml.etree as etree
tree = etree.parse(r"C:\testdel.xml")
root = tree.getroot()
val_of_interest = root.findall('./Values/SensorValue')
for sensor_val in val_of_interest:
print sensor_val.find('accelx').text
print sensor_val.find('accely').text
I asked related question here: How to extract data from xml file that is deep down the tag
Thanks
The confusion was caused by the following default namespace (namespace declared without prefix) :
xmlns="http://schemas.datacontract.org/Storage"
Note that descendants elements without prefix inherit default namespace from ancestor, implicitly. Now, to reference element in namespace, you need to map a prefix to the namespace URI, and use that prefix in your XPath :
ns = {'d': 'http://schemas.datacontract.org/Storage' }
val_of_interest = root.findall('./d:Values/d:SensorValue', ns)
for sensor_val in val_of_interest:
print sensor_val.find('d:accelx', ns).text
print sensor_val.find('d:accely', ns).text

python lxml insert xml file into xml string

I have an XML file similar to this:
<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>
What I need to do is embed it into the following XML replacing the <object></object> element with everything in the file.
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>
How do I go about doing this?
This is one possible way to replace an element with another element using lxml (see comments for how it works) :
....
....
#assume that 'tree' is variable containing the parsed template XML...
#and 'content_tree' is variable containing the actual content to be embedded, parsed
#get the container element to be replaced
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
#get parent of the container element
parent = container.getparent()
#replace container element with the actual content element
parent.replace(container, content_tree)
And this is a working demo example :
import lxml.etree as etree
file_content = '''<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>'''
template = '''<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>'''
tree = etree.fromstring(template)
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
parent = container.getparent()
content_tree = etree.fromstring(file_content)
parent.replace(container, content_tree)
print(etree.tostring(tree))

Remove namespace with xmltodict in Python

xmltodict converts XML to a Python dictionary. It supports namespaces. I can follow the example on the homepage and successfully remove a namespace. However, I cannot remove the namespace from my XML and cannot identify why? Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<section1
mystatus:field1="data1"
mystatus:field2="data2" />
<section2
mystatus:lineA="outputA"
mystatus:lineB="outputB" />
</status>
And using:
xmltodict.parse(xml,process_namespaces=True,namespaces={'http://localhost/mystatus':None})
I get:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'#http://localhost/mystatus:field1', u'data1'), (u'#http://localhost/mystatus:field2', u'data2')])), (u'section2', OrderedDict([(u'#http://localhost/mystatus:lineA', u'outputA'), (u'#http://localhost/mystatus:lineB', u'outputB')]))]))])
instead of:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'field1', u'data1'), (u'field2', u'data2')])), (u'section2', OrderedDict([(u'lineA', u'outputA'), (u'#lineB', u'outputB')]))]))])
Am I making some simple mistake, or is there something about my XML that prevents the process_namespace modification from working correctly?
xmltodict is based on expat, so namespaces should applied to the class name, not attribute names:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
<mystatus:section1 field1="data1" field2="data2" />
<mystatus:section2 lineA="outputA" lineB="outputB" />
</status>
When parsed with:
foo = xmltodict.parse(xml,
process_namespaces=True,
namespaces={'http://localhost/mystatus':None})
outputs:
OrderedDict([(u'status', OrderedDict([(u'section1', OrderedDict([(u'#field1', u'data1'), (u'#field2', u'data2')])), (u'section2', OrderedDict([(u'#lineA', u'outputA'), (u'#lineB', u'outputB')]))]))])
Accessing it is easy:
# Get attribute 'lineA' from class 'section2' from class 'status'
>>> foo.get('status').get('section2').get('#lineA')
u'outputA'
Attribute namespaces are only required when you have multiple attributes of the same name (e.g. multiple id's or multiple prices, etc), in which case, I couldn't get expat or xmltodict to parse it correctly. YMMV though.

Python ElementTree find() not matching within kml file

I'm trying to find an element from a kml file using element trees as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse("history-03-02-2012.kml")
p = tree.find(".//name")
A sufficient subset of the file to demonstrate the problem follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>
A "name" element exists; why does the search come back empty?
The name element you're trying to match is actually within the KML namespace, but you aren't searching with that namespace in mind.
Try:
p = tree.find(".//{http://www.opengis.net/kml/2.2}name")
If you were using lxml's XPath instead of the standard-library ElementTree, you'd instead pass the namespace in as a dictionary:
>>> tree = lxml.etree.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
... <Document>
... <name>Location history from 03/03/2012 to 03/10/2012</name>
... </Document>
... </kml>''')
>>> tree.xpath('//kml:name', namespaces={'kml': "http://www.opengis.net/kml/2.2"})
[<Element {http://www.opengis.net/kml/2.2}name at 0x23afe60>]

Categories

Resources