Create spring:beans root in lxml - python

I'm trying to create XML with the lxml.etree module for Python 2. It would be an easy task if not for the requirement that the output should look like this:
<spring:beans xmlns="http://membrane-soa.org/proxies/1/"
xmlns:spring="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.2.xsd
http://membrane-soa.org/proxies/1/ http://membrane-soa.org/schemas/proxies-1.xsd">
Any suggestions on how I can do that? All I have been able to achieve so far is:
<ns0:beans xmlns:ns0="http://membrane-soa.org/proxies/1/"/>
So how do I get "spring" instead of "ns0"?
Thanks

Use a dict to declare the namespaces, with None as the key for the default namespace:
from lxml import etree as ET
nsmap = { None: "http://membrane-soa.org/proxies/1/",
"spring": "http://www.springframework.org/schema/beans",
"xsi": "http://www.w3.org/2001/XMLSchema-instance" }
root = ET.Element("{%s}beans" % nsmap["spring"], nsmap=nsmap)
root.set("{%s}schemaLocation" % nsmap["xsi"],
"http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.2.xsd")
Result (after formatting):
<spring:beans
xmlns:spring="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://membrane-soa.org/proxies/1/"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.2.xsd"
/>
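To see why the element is created as "{namespace-uri}beans" rather than "spring:beans", it can help to parse the target document and look at the names the parser reports. The following is only a minimal sketch using the standard library's xml.etree.ElementTree, which reports qualified names in the same "{uri}local" form as lxml; the variable names (target, root_tag, attr_names) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The target root element from the question, self-closed so that it parses.
target = (
    '<spring:beans xmlns="http://membrane-soa.org/proxies/1/" '
    'xmlns:spring="http://www.springframework.org/schema/beans" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.springframework.org/schema/beans '
    'http://www.springframework.org/schema/beans/spring-beans-4.2.xsd"/>'
)

root = ET.fromstring(target)

# After parsing, prefixes are gone; names carry the namespace URI instead.
root_tag = root.tag
attr_names = list(root.attrib)

print(root_tag)
print(attr_names)
```

The prefix is only a serialization detail: the element belongs to the Spring beans namespace and the attribute to the XMLSchema-instance namespace, which is why the answer builds both names from the URIs in nsmap.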

Related

Python, XML: how to access the 3rd child by element name

Could you please help me access the element named 'id' in the following structure in Python? (I have the lxml and xml.etree.ElementTree libraries.)
Desired result: '0000000'
Desired method:
Find the child named fcsProtocolEF3 in the XML document.
Find the element named 'id' inside fcsProtocolEF3.
It is crucial to search by element name, not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
It works, but it requires spelling out the namespaces. The XML documents have different namespaces, and I don't know how to define them beforehand.
Thank you.
It would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is an ElementTree-based solution. The namespaces are spelled out explicitly.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
    root = ET.fromstring(xml)
    fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
    # assuming there is one fcs element and one id under fcs
    return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000
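If the namespaces really are not known in advance, the standard library can also match on local names alone. This is a sketch that assumes Python 3.8 or later, where ElementTree paths accept "{*}name" to match a local name in any namespace; the document below is a shortened copy of the one in the question, and doc and id_text are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question.
doc = '''<ns2:export xmlns:ns2="http://zakupki.gov.ru/oos/export/1"
            xmlns="http://zakupki.gov.ru/oos/types/1">
  <ns2:fcsProtocolEF3 schemeVersion="10.2">
    <id>0000000</id>
    <purchaseNumber>0000000000000000</purchaseNumber>
  </ns2:fcsProtocolEF3>
</ns2:export>'''

root = ET.fromstring(doc)

# "{*}name" matches the local name regardless of namespace (Python 3.8+).
id_text = root.findtext('{*}fcsProtocolEF3/{*}id')
print(id_text)
```

This plays the same role as the local-name() XPath in the lxml answer: the search is by element name, not by position, and no namespace URI has to be typed in.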

How to edit xml config file in python 3?

I have an XML config file and need to update a particular attribute value.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<testCommnication>
<connection intervalInSeconds="50" versionUpdates="15"/>
</testCommnication>
</configuration>
I just need to update the "versionUpdates" value to "10".
How can I achieve this in Python 3?
I have tried xml.etree and minidom and was not able to achieve it.
Use xml.etree.ElementTree to modify the XML:
Edit: If you want to retain the attribute order, use lxml instead. To install it, run pip install lxml
# import xml.etree.ElementTree as ET
from lxml import etree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
# modifying an attribute
for elem in root.iter('connection'):
    elem.set('versionUpdates', '10')
tree.write('modified.xml') # you can write 'sample.xml' as well
Content now in modified.xml:
<configuration>
<testCommnication>
<connection intervalInSeconds="50" versionUpdates="10" />
</testCommnication>
</configuration>
You can use xml.etree.ElementTree in Python 3 to handle XML :
import xml.etree.ElementTree
config_file = xml.etree.ElementTree.parse('your_file.xml')
config_file.findall(".//connection")[0].set('versionUpdates', '10')
config_file.write('your_new_file.xml')
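The same edit can be tried without touching any files. The sketch below works on an in-memory copy of the config from the question and uses only the standard library; config, connection and updated are illustrative names, and the explicit check for a missing element is an addition that the answers above do not include.

```python
import xml.etree.ElementTree as ET

# In-memory copy of the config file from the question.
config = '''<configuration>
  <testCommnication>
    <connection intervalInSeconds="50" versionUpdates="15"/>
  </testCommnication>
</configuration>'''

root = ET.fromstring(config)

# find() returns None when nothing matches, so check before using the result.
connection = root.find('.//connection')
if connection is None:
    raise ValueError('no <connection> element found')

# Attribute values must be strings; an int would fail when serializing.
connection.set('versionUpdates', '10')

updated = ET.tostring(root, encoding='unicode')
print(updated)
```

Passing the value as the string '10' rather than the integer 10 matters here, because ElementTree can only serialize string attribute values.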

Reading XML with the lxml library: getting a strange string from the xmlns attribute

I am writing a program to work on an XML file and change it. But when I try to access any part of it, I get some extra text.
My xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>sbaa__ApprovalChain__c.ExternalID__c</members>
<members>sbaa__ApprovalCondition__c.ExternalID__c</members>
<members>sbaa__ApprovalRule__c.ExternalID__c</members>
<name>CustomField</name>
</types>
<version>40.0</version>
</Package>
And I have my code:
from lxml import etree
import sys
tree = etree.parse('package.xml')
root = tree.getroot()
print( root[0][0].tag )
As output I expect to see members, but I get something like this:
{http://soap.sforce.com/2006/04/metadata}members
Why do I see that URL, and how do I stop it from showing up?
Your document declares a default namespace (Wikipedia, lxml tutorial). When one is declared, it becomes part of the tag of that element and of every descendant that does not declare a different one.
If you want to print the tag without the namespace, it's easy:
tag = root[0][0].tag
print(tag[tag.find('}')+1:])
If you want to remove the namespace from XML, see this question.
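The slicing trick can be wrapped in a small helper. The sketch below uses the standard library's ElementTree, whose tags have the same "{uri}local" form as lxml's; local_name is a hypothetical helper name, and the XML is a shortened copy of the package.xml from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of package.xml from the question.
package = '''<Package xmlns="http://soap.sforce.com/2006/04/metadata">
  <types>
    <members>sbaa__ApprovalChain__c.ExternalID__c</members>
    <name>CustomField</name>
  </types>
  <version>40.0</version>
</Package>'''


def local_name(tag):
    """Return the tag without its '{namespace}' part, if it has one."""
    # rpartition returns the whole tag as the last item when no '}' is present.
    return tag.rpartition('}')[2]


root = ET.fromstring(package)
names = [local_name(el.tag) for el in root.iter()]
print(names)
```

With lxml itself, etree.QName(element).localname should give the same local name without any string handling.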

How to parse tiered XML String

I have an xml string that I need to parse in python that looks like this:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<PostLoadsResponse xmlns="http://webservices.truckstop.com/v11">
<PostLoadsResult xmlns:a="http://schemas.datacontract.org/2004/07/WebServices.Objects" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices">
<Error>
<ErrorMessage>Invalid Location</ErrorMessage>
</Error>
</Errors>
</PostLoadsResult>
</PostLoadsResponse>
</s:Body>
</s:Envelope>
I'm having trouble using ElementTree to get to the error message in this tree without resorting to something like:
import xml.etree.ElementTree as ET
ET.fromstring(text).findall('{http://schemas.xmlsoap.org/soap/envelope/}Body')[0].getchildren()[0].getchildren()[0].getchildren()
You need to handle namespaces and you can do it with xml.etree.ElementTree:
tree = ET.fromstring(data)
namespaces = {
's': 'http://schemas.xmlsoap.org/soap/envelope/',
'd': "http://schemas.datacontract.org/2004/07/WebServices"
}
print(tree.find(".//d:ErrorMessage", namespaces=namespaces).text)
Prints Invalid Location.
Using the partial XPath support:
ET.fromstring(text).find('.//{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage')
That will instruct it to find the first element named ErrorMessage with namespace http://schemas.datacontract.org/2004/07/WebServices at any depth.
However, if you know your message will always contain those elements, it may be faster to spell out the full path:
ET.fromstring(text).find('{http://schemas.xmlsoap.org/soap/envelope/}Body').find('{http://webservices.truckstop.com/v11}PostLoadsResponse').find('{http://webservices.truckstop.com/v11}PostLoadsResult').find('{http://schemas.datacontract.org/2004/07/WebServices}Errors').find('{http://schemas.datacontract.org/2004/07/WebServices}Error').find('{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage')
You can use the iter method on the tree to iterate through the items in it (older code used getiterator, which was removed in Python 3.9). You can check the tag on each item to see if it's the right one.
>>> err = [node.text for node in tree.iter() if node.tag.endswith('ErrorMessage')]
>>> err
['Invalid Location']
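The explicit path and the namespace map can be combined, which keeps the path readable while still naming every level. This is a sketch under the assumption that the response always has the structure shown in the question; the prefixes s, t and d are chosen locally for this script and do not need to match the prefixes in the document, only the URIs do.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question.
text = '''<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body>
    <PostLoadsResponse xmlns="http://webservices.truckstop.com/v11">
      <PostLoadsResult>
        <Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices">
          <Error><ErrorMessage>Invalid Location</ErrorMessage></Error>
        </Errors>
      </PostLoadsResult>
    </PostLoadsResponse>
  </s:Body>
</s:Envelope>'''

# Local prefix-to-URI map; only the URIs have to match the document.
NS = {
    's': 'http://schemas.xmlsoap.org/soap/envelope/',
    't': 'http://webservices.truckstop.com/v11',
    'd': 'http://schemas.datacontract.org/2004/07/WebServices',
}

root = ET.fromstring(text)
path = 's:Body/t:PostLoadsResponse/t:PostLoadsResult/d:Errors/d:Error/d:ErrorMessage'

# findall returns every match, so several <Error> entries are all collected.
messages = [el.text for el in root.findall(path, NS)]
print(messages)
```

Unlike the endswith check, this only matches ErrorMessage elements at the expected place and in the expected namespace.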

Set a DTD using minidom in python

I am trying to include a reference to a DTD in my XML doc using minidom.
I am creating the document like:
from xml.dom.minidom import Document
doc = Document()
foo = doc.createElement('foo')
doc.appendChild(foo)
doc.toxml()
This gives me:
<?xml version="1.0" ?>
<foo/>
I need to get something like:
<?xml version="1.0" ?>
<!DOCTYPE something SYSTEM "http://www.path.to.my.dtd.com/my.dtd">
<foo/>
The documentation is out of date. Use the source, Luke. I do it something like this.
from xml.dom.minidom import DOMImplementation
imp = DOMImplementation()
doctype = imp.createDocumentType(
qualifiedName='foo',
publicId='',
systemId='http://www.path.to.my.dtd.com/my.dtd',
)
doc = imp.createDocument(None, 'foo', doctype)
doc.toxml()
This prints the following.
<?xml version="1.0" ?><!DOCTYPE foo SYSTEM 'http://www.path.to.my.dtd.com/my.dtd'><foo/>
Note how the root element is created automatically by createDocument(). Also, your 'something' has been changed to 'foo': the DTD needs to contain the root element name itself.
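A variant of the same approach, sketched with xml.dom.minidom.getDOMImplementation() instead of instantiating DOMImplementation directly, and with one child added to show how the automatically created root is used. The element name 'bar' and the text 'baz' are made up for illustration; the exact whitespace inside the DOCTYPE line may differ between Python versions, so the sketch does not rely on it.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# Arguments are qualifiedName, publicId, systemId; None means no public ID.
doctype = impl.createDocumentType(
    'foo', None, 'http://www.path.to.my.dtd.com/my.dtd'
)

# createDocument builds the root element <foo> and attaches the doctype.
doc = impl.createDocument(None, 'foo', doctype)

# Further content goes under the automatically created root element.
child = doc.createElement('bar')
child.appendChild(doc.createTextNode('baz'))
doc.documentElement.appendChild(child)

xml_text = doc.toxml()
print(xml_text)
```

The name passed to createDocumentType matches the root element name passed to createDocument, in line with the note above about the DTD naming the root element.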
According to the Python docs, there is no implementation of the DocumentType interface in the minidom.
