I'm trying to find an element from a kml file using element trees as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse("history-03-02-2012.kml")
p = tree.find(".//name")
A sufficient subset of the file to demonstrate the problem follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>
A "name" element exists; why does the search come back empty?
The name element you're trying to match is actually within the KML namespace, but you aren't searching with that namespace in mind.
Try:
p = tree.find(".//{http://www.opengis.net/kml/2.2}name")
If you were using lxml's XPath instead of the standard-library ElementTree, you'd instead pass the namespace in as a dictionary:
>>> tree = lxml.etree.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
... <Document>
... <name>Location history from 03/03/2012 to 03/10/2012</name>
... </Document>
... </kml>''')
>>> tree.xpath('//kml:name', namespaces={'kml': "http://www.opengis.net/kml/2.2"})
[<Element {http://www.opengis.net/kml/2.2}name at 0x23afe60>]
Related
I have been trying to delete the structuredBody element (which is within a component element) within the following Document, but my code seems to not work.
The structure of the XML source file simplified:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
Here is the code I'm using:
import xml.etree.ElementTree as ET
from lxml import objectify, etree
cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root:
ET.register_namespace("", "urn:hl7-org:v3")
for node in cda_tree.xpath('//component/structuredBody'):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
Whenever I run the code, the newXML.xml file still has the structuredBody element.
Thanks in advance!
Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. Your document doesn't contain any elements named component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element mean that all elements in the document exist by default in that particular namespace, so you need to use that namespace when matching elements:
from lxml import objectify, etree
cda_tree = etree.parse('data.xml')
cda_root = cda_tree.getroot()
ns = {
'hl7': 'urn:hl7-org:v3',
}
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
node.getparent().remove(node)
cda_tree.write('newXML.xml')
With the above code, if the input looks like this:
<ClinicalDocument
xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
<structuredBody>
...
...
</structuredBody>
</component>
</ClinicalDocument>
The output looks like:
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<component>
</component>
</ClinicalDocument>
Would you help me, pleace, to get an access to elemnt with name 'id' by the following construction in Python (i have lxml and xml.etree.ElementTree libraries).
Desirable result: '0000000'
Desirable method:
Search in xml-document a child, where it's name is fcsProtocolEF3.
Search in fcsProtocolEF3 an element with name 'id'.
It is crucial to search by element name. Not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
it works, but it requires to input namespaces. XML-document have different namespaces and I don't know how to define them beforehand.
Thank you.
That would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is ET based solution. NS are in use.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
root = ET.fromstring(xml)
fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
# assuming there is one fcs element and one id under fcs
return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000
I have an XML file similar to this:
<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>
What I need to do is embed it into the following XML replacing the <object></object> element with everything in the file.
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>
How do I go about doing this?
This is one possible way to replace an element with another element using lxml (see comments for how it works) :
....
....
#assume that 'tree' is variable containing the parsed template XML...
#and 'content_tree' is variable containing the actual content to be embedded, parsed
#get the container element to be replaced
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
#get parent of the container element
parent = container.getparent()
#replace container element with the actual content element
parent.replace(container, content_tree)
And this is a working demo example :
import lxml.etree as etree
file_content = '''<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>'''
template = '''<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>'''
tree = etree.fromstring(template)
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
parent = container.getparent()
content_tree = etree.fromstring(file_content)
parent.replace(container, content_tree)
print(etree.tostring(tree))
Is there a way to search for the same element, at the same time, within a document that occur with and without namespaces using lxml? As an example, I would want to get all occurences of the element identifier irrespective of whether or not it is associated with a specific namespace. I am currently only able to access them separately as below.
Code:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
print l.text
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
print l.text
File: xmlfile.xml
<?xml version="1.0"?>
<record>
<header>
<identifier>identifier1</identifier>
<datestamp>datastamp1</datestamp>
<setSpec>setspec1</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>title1</dc:title>
<dc:title>title2</dc:title>
<dc:creator>creator1</dc:creator>
<dc:subject>subject1</dc:subject>
<dc:subject>subject2</dc:subject>
</oai_dc:dc>
</metadata>
<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<baseURL>baseURL1</baseURL>
<identifier>identifier3</identifier>
<datestamp>datestamp2</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
<baseURL>xxxxx</baseURL>
<identifier>identifier4</identifier>
<datestamp>2010-04-27T01:10:31Z</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
</originDescription>
</originDescription>
</provenance>
</about>
</record>
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")
I have a very simple KML file which returns no nodes when parsed with ElementTree. This is frustrating me :-). Any clues?
from xml.etree import ElementTree
from pprint import pprint
kml = '''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>NEXRAD Radar Sites</name>
<Schema parent="Placemark" name="wsr">
<SimpleField type="wstring" name="STATE">
</SimpleField>
</Schema>
<wsr>
<name>KABR</name>
</wsr>
</Document>
</kml>
'''
tree = ElementTree.fromstring(kml)
ElementTree.dump(tree)
for node in tree.iter('wsr'):
pprint(node)
for node in tree.findall('../wsr'):
pprint(node)
The tags are namespaced. If you try tree.iter() with no tag it will show what ElementTree thinks the tags are called. The wsr tag is called {http://earth.google.com/kml/2.0}wsr. This returns a node:
list(tree.iter('{http://earth.google.com/kml/2.0}wsr'))