Python ElementTree find() not matching within kml file - python

I'm trying to find an element from a kml file using element trees as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse("history-03-02-2012.kml")
p = tree.find(".//name")
A sufficient subset of the file to demonstrate the problem follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>
A "name" element exists; why does the search come back empty?

The name element you're trying to match is actually within the KML namespace, but you aren't searching with that namespace in mind.
Try:
p = tree.find(".//{http://www.opengis.net/kml/2.2}name")
If you were using lxml's XPath instead of the standard-library ElementTree, you'd instead pass the namespace in as a dictionary:
>>> tree = lxml.etree.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
... <Document>
... <name>Location history from 03/03/2012 to 03/10/2012</name>
... </Document>
... </kml>''')
>>> tree.xpath('//kml:name', namespaces={'kml': "http://www.opengis.net/kml/2.2"})
[<Element {http://www.opengis.net/kml/2.2}name at 0x23afe60>]

Related

Delete Element from XML file using python

I have been trying to delete the structuredBody element (which is within a component element) within the following Document, but my code seems to not work.
The structure of the XML source file simplified:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
Here is the code I'm using:
import xml.etree.ElementTree as ET
from lxml import objectify, etree
cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root:
ET.register_namespace("", "urn:hl7-org:v3")
for node in cda_tree.xpath('//component/structuredBody'):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
Whenever I run the code, the newXML.xml file still has the structuredBody element.
Thanks in advance!
Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. Your document doesn't contain any elements named component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element mean that all elements in the document exist by default in that particular namespace, so you need to use that namespace when matching elements:
from lxml import objectify, etree
cda_tree = etree.parse('data.xml')
cda_root = cda_tree.getroot()
ns = {
'hl7': 'urn:hl7-org:v3',
}
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
With the above code, if the input looks like this:
<ClinicalDocument
xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
The output looks like:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
</component>
</ClinicalDocument>

python, xml: how to access the 3rd child by element' name

Would you help me, pleace, to get an access to elemnt with name 'id' by the following construction in Python (i have lxml and xml.etree.ElementTree libraries).
Desirable result: '0000000'
Desirable method:
Search in xml-document a child, where it's name is fcsProtocolEF3.
Search in fcsProtocolEF3 an element with name 'id'.
It is crucial to search by element name. Not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
it works, but it requires to input namespaces. XML-document have different namespaces and I don't know how to define them beforehand.
Thank you.
That would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is ET based solution. NS are in use.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
root = ET.fromstring(xml)
fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
# assuming there is one fcs element and one id under fcs
return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000

python lxml insert xml file into xml string

I have an XML file similar to this:
<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>
What I need to do is embed it into the following XML replacing the <object></object> element with everything in the file.
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>
How do I go about doing this?
This is one possible way to replace an element with another element using lxml (see comments for how it works) :
....
....
#assume that 'tree' is variable containing the parsed template XML...
#and 'content_tree' is variable containing the actual content to be embedded, parsed
#get the container element to be replaced
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
#get parent of the container element
parent = container.getparent()
#replace container element with the actual content element
parent.replace(container, content_tree)
And this is a working demo example :
import lxml.etree as etree
file_content = '''<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>'''
template = '''<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>'''
tree = etree.fromstring(template)
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
parent = container.getparent()
content_tree = etree.fromstring(file_content)
parent.replace(container, content_tree)
print(etree.tostring(tree))

Accessing Elements with and without namespaces using lxml

Is there a way to search for the same element, at the same time, within a document that occur with and without namespaces using lxml? As an example, I would want to get all occurences of the element identifier irrespective of whether or not it is associated with a specific namespace. I am currently only able to access them separately as below.
Code:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
print l.text
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
print l.text
File: xmlfile.xml
<?xml version="1.0"?>
<record>
<header>
<identifier>identifier1</identifier>
<datestamp>datastamp1</datestamp>
<setSpec>setspec1</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>title1</dc:title>
<dc:title>title2</dc:title>
<dc:creator>creator1</dc:creator>
<dc:subject>subject1</dc:subject>
<dc:subject>subject2</dc:subject>
</oai_dc:dc>
</metadata>
<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<baseURL>baseURL1</baseURL>
<identifier>identifier3</identifier>
<datestamp>datestamp2</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
<baseURL>xxxxx</baseURL>
<identifier>identifier4</identifier>
<datestamp>2010-04-27T01:10:31Z</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
</originDescription>
</originDescription>
</provenance>
</about>
</record>
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")

ElementTree returns no nodes parsing simple KML document

I have a very simple KML file which returns no nodes when parsed with ElementTree. This is frustrating me :-). Any clues?
from xml.etree import ElementTree
from pprint import pprint
kml = '''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<Schema parent="Placemark" name="wsr">
<SimpleField type="wstring" name="STATE">
</SimpleField>
</Schema>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>
'''
tree = ElementTree.fromstring(kml)
ElementTree.dump(tree)
for node in tree.iter('wsr'):
pprint(node)
for node in tree.findall('../wsr'):
pprint(node)
The tags are namespaced. If you try tree.iter() with no tag it will show what ElementTree thinks the tags are called. The wsr tag is called {http://earth.google.com/kml/2.0}wsr. This returns a node:
list(tree.iter('{http://earth.google.com/kml/2.0}wsr'))

Categories

Resources