python lxml insert xml file into xml string - python

I have an XML file similar to this:
<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>
What I need to do is embed it into the following XML replacing the <object></object> element with everything in the file.
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>
How do I go about doing this?

This is one possible way to replace an element with another element using lxml (see comments for how it works) :
....
....
#assume that 'tree' is variable containing the parsed template XML...
#and 'content_tree' is variable containing the actual content to be embedded, parsed
#get the container element to be replaced
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
#get parent of the container element
parent = container.getparent()
#replace container element with the actual content element
parent.replace(container, content_tree)
And this is a working demo example :
import lxml.etree as etree
file_content = '''<tes:variable xmlns:tes="http://www.tidalsoftware.com/client/tesservlet" xmlns="http://purl.org/atom/ns#">
<tes:ownername>OWNER</tes:ownername>
<tes:productiondate>2015-08-23T00:00:00-0400</tes:productiondate>
<tes:readonly>N</tes:readonly>
<tes:publish>N</tes:publish>
<tes:description>JIRA-88</tes:description>
<tes:startcalendar>0</tes:startcalendar>
<tes:ownerid>88</tes:ownerid>
<tes:type>2</tes:type>
<tes:innervalue>4</tes:innervalue>
<tes:calc>N</tes:calc>
<tes:name>test_number3</tes:name>
<tes:startdate>1899-12-30T00:00:00-0500</tes:startdate>
<tes:pub>Y</tes:pub>
<tes:lastvalue>0</tes:lastvalue>
<tes:id>2078</tes:id>
<tes:startdateasstring>18991230000000</tes:startdateasstring>
</tes:variable>'''
template = '''<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://purl.org/atom/ns#">
<tes:Variable.update xmlns:tes="http://www.tidalsoftware.com/client/tesservlet">
<object></object>
</tes:Variable.update>
</entry>'''
tree = etree.fromstring(template)
container = tree.xpath('//d:object', namespaces={'d':'http://purl.org/atom/ns#'})[0]
parent = container.getparent()
content_tree = etree.fromstring(file_content)
parent.replace(container, content_tree)
print(etree.tostring(tree))

Related

xml parsing in python with XPath

I am trying to parse an XML file in Python with the built in xml module and Elemnt tree, but what ever I try to do according to the documentation, it does not give me what I need.
I am trying to extract all the value tags into a list
<?xml version="1.0" encoding="UTF-8"?>
<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
<fullName>testPicklist__c</fullName>
<externalId>false</externalId>
<label>testPicklist</label>
<required>false</required>
<trackFeedHistory>false</trackFeedHistory>
<type>Picklist</type>
<valueSet>
<restricted>true</restricted>
<valueSetDefinition>
<sorted>false</sorted>
<value>
<fullName>a 32</fullName>
<default>false</default>
<label>a 32</label>
</value>
<value>
<fullName>23 432;:</fullName>
<default>false</default>
<label>23 432;:</label>
</value>
and here is the example code that I cant get to work. It's very basic and all I have issues is the xpath.
from xml.etree.ElementTree import ElementTree
field_filepath= "./testPicklist__c.field-meta.xml"
mydoc = ElementTree()
mydoc.parse(field_filepath)
root = mydoc.getroot()
print(root.findall(".//value")
print(root.findall(".//*/value")
print(root.findall("./*/value")
Since the root element has attribute xmlns="http://soap.sforce.com/2006/04/metadata", every element in the document will belong to this namespace. So you're actually looking for {http://soap.sforce.com/2006/04/metadata}value elements.
To search all <value> elements in this document you have to specify the namespace argument in the findall() function
from xml.etree.ElementTree import ElementTree
field_filepath= "./testPicklist__c.field-meta.xml"
mydoc = ElementTree()
mydoc.parse(field_filepath)
root = mydoc.getroot()
# get the namespace of root
ns = root.tag.split('}')[0][1:]
# create a dictionary with the namespace
ns_d = {'my_ns': ns}
# get all the values
values = root.findall('.//my_ns:value', namespaces=ns_d)
# print the values
for value in values:
print(value)
Outputs:
<Element '{http://soap.sforce.com/2006/04/metadata}value' at 0x7fceea043ba0>
<Element '{http://soap.sforce.com/2006/04/metadata}value' at 0x7fceea043e20>
Alternatively you can just search for the {http://soap.sforce.com/2006/04/metadata}value
# get all the values
values = root.findall('.//{http://soap.sforce.com/2006/04/metadata}value')

python, xml: how to access the 3rd child by element' name

Would you help me, pleace, to get an access to elemnt with name 'id' by the following construction in Python (i have lxml and xml.etree.ElementTree libraries).
Desirable result: '0000000'
Desirable method:
Search in xml-document a child, where it's name is fcsProtocolEF3.
Search in fcsProtocolEF3 an element with name 'id'.
It is crucial to search by element name. Not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
it works, but it requires to input namespaces. XML-document have different namespaces and I don't know how to define them beforehand.
Thank you.
That would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is ET based solution. NS are in use.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
root = ET.fromstring(xml)
fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
# assuming there is one fcs element and one id under fcs
return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000

Python -lxml xpath returns empty list

I am reading an xliff file and planning to retrieve specific element. I tried to print all the elements using
from lxml import etree
with open('path\to\file\.xliff', 'r',encoding = 'utf-8') as xml_file:
tree = etree.parse(xml_file)
root = tree.getroot()
for element in root.iter():
print("child", element)
The output was
child <Element {urn:oasis:names:tc:xliff:document:2.0}segment at 0x6b8f9c8>
child <Element {urn:oasis:names:tc:xliff:document:2.0}source at 0x6b8f908>
When I tried to get the specific element (with the help of many posts here) - source tag
segment = tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
it returns an empty list. Can someone tell me how to retrieve it properly.
Input :
<?xml version='1.0' encoding='UTF-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0">
<segment id = 1>
<source>
Hello world
</source>
</segment>
<segment id = 2 >
<source>
2nd statement
</source>
</segment>
</xliff>
I want to get the values of segment and its corresponding source
This code,
tree.xpath('{urn:oasis:names:tc:xliff:document:2.0}segment')
is not accepted by lxml ("lxml.etree.XPathEvalError: Invalid expression"). You need to use findall().
The following works (in the XML sample, the segment elements are children of xliff):
from lxml import etree
tree = etree.parse("test.xliff") # XML in the question; ill-formed attributes corrected
segment = tree.findall('{urn:oasis:names:tc:xliff:document:2.0}segment')
print(segment)
However, the real XML is apparently more complex (segment is not a direct child of xliff). Then you need to add .// to search the whole tree:
segment = tree.findall('.//{urn:oasis:names:tc:xliff:document:2.0}segment')

Accessing Elements with and without namespaces using lxml

Is there a way to search for the same element, at the same time, within a document that occur with and without namespaces using lxml? As an example, I would want to get all occurences of the element identifier irrespective of whether or not it is associated with a specific namespace. I am currently only able to access them separately as below.
Code:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()
for l in root.iter('identifier'):
print l.text
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
print l.text
File: xmlfile.xml
<?xml version="1.0"?>
<record>
<header>
<identifier>identifier1</identifier>
<datestamp>datastamp1</datestamp>
<setSpec>setspec1</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>title1</dc:title>
<dc:title>title2</dc:title>
<dc:creator>creator1</dc:creator>
<dc:subject>subject1</dc:subject>
<dc:subject>subject2</dc:subject>
</oai_dc:dc>
</metadata>
<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
<originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
<baseURL>baseURL1</baseURL>
<identifier>identifier3</identifier>
<datestamp>datestamp2</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
<originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
<baseURL>xxxxx</baseURL>
<identifier>identifier4</identifier>
<datestamp>2010-04-27T01:10:31Z</datestamp>
<metadataNamespace>xxxxx</metadataNamespace>
</originDescription>
</originDescription>
</provenance>
</about>
</record>
You could use XPath to solve that kind of issue:
from lxml import etree
xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")

Python ElementTree find() not matching within kml file

I'm trying to find an element from a kml file using element trees as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse("history-03-02-2012.kml")
p = tree.find(".//name")
A sufficient subset of the file to demonstrate the problem follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Location history from 03/03/2012 to 03/10/2012</name>
</Document>
</kml>
A "name" element exists; why does the search come back empty?
The name element you're trying to match is actually within the KML namespace, but you aren't searching with that namespace in mind.
Try:
p = tree.find(".//{http://www.opengis.net/kml/2.2}name")
If you were using lxml's XPath instead of the standard-library ElementTree, you'd instead pass the namespace in as a dictionary:
>>> tree = lxml.etree.fromstring('''<kml xmlns="http://www.opengis.net/kml/2.2">
... <Document>
... <name>Location history from 03/03/2012 to 03/10/2012</name>
... </Document>
... </kml>''')
>>> tree.xpath('//kml:name', namespaces={'kml': "http://www.opengis.net/kml/2.2"})
[<Element {http://www.opengis.net/kml/2.2}name at 0x23afe60>]

Categories

Resources