Cannot write XML file with default namespace [duplicate] - python

This question already has answers here:
Saving XML files using ElementTree
(5 answers)
Closed 7 years ago.
I'm writing a Python script to update Visual Studio project files. They look like this:
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build"
xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
...
The following code reads and then writes the file:
import xml.etree.ElementTree as ET
tree = ET.parse(projectFile)
root = tree.getroot()
tree.write(projectFile,
xml_declaration = True,
encoding = 'utf-8',
method = 'xml',
default_namespace = "http://schemas.microsoft.com/developer/msbuild/2003")
Python throws an error at the last line, saying:
ValueError: cannot use non-qualified names with default_namespace option
This is surprising since I'm just reading and writing, with no editing in between. Visual Studio refuses to load XML files without a default namespace, so omitting it is not optional.
Why does this error occur? Suggestions or alternatives welcome.

This is a duplicate to Saving XML files using ElementTree
The solution is to define your default namespace BEFORE parsing the project file.
ET.register_namespace('',"http://schemas.microsoft.com/developer/msbuild/2003")
Then write out your file as
tree.write(projectFile,
xml_declaration = True,
encoding = 'utf-8',
method = 'xml')
You have successfully round-tripped your file. And avoided the creation of ns0 tags everywhere.

I think that lxml does a better job handling namespaces. It aims for an ElementTree-like interface but uses xmllib2 underneath.
>>> import lxml.etree
>>> doc=lxml.etree.fromstring("""<?xml version="1.0" encoding="utf-8"?>
... <Project ToolsVersion="4.0" DefaultTargets="Build"
... xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
... <PropertyGroup>
... </PropertyGroup>
... </Project>""")
>>> print lxml.etree.tostring(doc, xml_declaration=True, encoding='utf-8', method='xml', pretty_print=True)
<?xml version='1.0' encoding='utf-8'?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0" DefaultTargets="Build">
<PropertyGroup>
</PropertyGroup>
</Project>

This was the closest answer I could find to my problem. Putting the:
ET.register_namespace('',"http://schemas.microsoft.com/developer/msbuild/2003")
just before the parsing of my file did not work.
You need to find the specific namespace the xml file you are loading is using. To do that, I printed out the Element of the ET tree node's tag which gave me my namespace to use and the tag name, copy that namespace into:
ET.register_namespace('',"XXXXX YOUR NAMESPACEXXXXXX")
before you start parsing your file then that should remove all the namespaces when you write.

Related

store content of a tag in a string with elementtree in python3

I'm using Python 3.7.2 and elementtree to copy the content of a tag in an XML file.
This is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.003.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>nBblsUR-uH..6jmGgZNHLQAAAXgXN1Lu</MsgId>
<CreDtTm>2016-11-10T12:00:00.000+01:00</CreDtTm>
<NbOfTxs>1</NbOfTxs>
<CtrlSum>6</CtrlSum>
<InitgPty>
<Nm>TC 03000 Kunde 55 Protokollr ckf hrung</Nm>
</InitgPty>
</GrpHdr>
</CstmrCdtTrfInitn>
</Document>
I want to copy the content of the 'MsgId' tag and save it as a string.
I've manage to do this with minidom before, but due to new circumstances, I have to settle with elementtree for now.
This is that code with minidom:
dom = xml.dom.minidom.parse('H:\\app_python/in_spsh/{}'.format(filename_string))
message = dom.getElementsByTagName('MsgId')
for MsgId in message:
print(MsgId.firstChild.nodeValue)
Now I want to do the exact same thing with elementtree. How can I achieve this?
To get the text value of a single element, you can use the findtext() method. The namespace needs to be taken into account.
from xml.etree import ElementTree as ET
tree = ET.parse("test.xml") # Your XML document
msgid = tree.findtext('.//{urn:iso:std:iso:20022:tech:xsd:pain.001.003.03}MsgId')
With Python 3.8 and later, it is possible to use a wildcard for the namespace:
msgid = tree.findtext('.//{*}MsgId')

Overwriting an XML file and one of my nameSpace is missing

I am parsing an XML file , replacing value on it and overwriting it, everything works fine but one of my two root's namespaces is missing after the overwrite.
I found that i have to register my namespaces, i did it but it doesnt change it:
There is the Xml file input :
<?xml version="1.0" encoding ="utf8"?>
<Document xmlns:xsi = "sample" xmlns ="sample2">
and there is the output :
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="sample2">
there is when i register my namespace :
ET.register_namespace('xsi' , "sample")
ET.register_namespace('' , "Sample2" )
the writing method :
tree.write(path , xml_declaration=True, method='xml', encoding='UTF-8')
do you have any idea what is the problem and how can i fix it ?
It probably would be easier using lxml library:
from lxml import etree
nsmap = {'xsi': "sample", None: "sample2"}
root = etree.Element('Document', nsmap=nsmap)
print(etree.tostring(root))
Which gives desired output:
<Document xmlns:xsi="sample" xmlns="sample2"/>

Generate XML Document in Python 3 using Namespaces and ElementTree

I am having problems generating a XML document using the ElementTree framework in Python 3. I tried registering the namespace before setting up the document. Right now it seems that I can generate a XML document only by adding the namespace to each element like a=Element("{full_namespace_URI}element_name") which seems tedious.
How do I setup the default namespace and can omit putting it in each element?
Any help is appreciated.
I have written a small demo program for Python 3:
from io import BytesIO
from xml.etree import ElementTree as ET
ET.register_namespace("", "urn:dslforum-org:service-1-0")
"""
desired output
==============
<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0"">
<childNode>content</childNode>
</topNode>
"""
# build XML document without namespaces
a = ET.Element("topNode")
b = ET.Element("childNode")
b.text = "content"
a.append(b)
tree = ET.ElementTree(a)
# build XML document with namespaces
a_ns = ET.Element("{dsl}topNode")
b_ns = ET.Element("{dsl}childNode")
b_ns.text = "content"
a_ns.append(b_ns)
tree_ns = ET.ElementTree(a_ns)
def print_element_tree(element_tree, comment, default_namespace=None):
"""
print element tree with comment to standard out
"""
with BytesIO() as buf:
element_tree.write(buf, encoding="utf-8", xml_declaration=True,
default_namespace=default_namespace)
buf.seek(0)
print(comment)
print(buf.read().decode("utf-8"))
print_element_tree(tree, "Element Tree without XML namespace")
print_element_tree(tree_ns, "Element Tree with XML namespace", "dsl")
I believe you are overthinking this.
Registering a default namespace in your code avoids the ns0: aliases.
Registering any namespaces you will use while creating a document allows you to designate the alias used for each namespace.
To achieve your desired output, assign the namespace to your top element:
a = ET.Element("{urn:dslforum-org:service-1-0}topNode")
The preceding ET.register_namespace("", "urn:dslforum-org:service-1-0") will make that the default namespace in the document, assign it to topNode, and not prefix your tag names.
<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0"><childNode>content</childNode></topNode>
If you remove the register_namespace() call, then you get this monstrosity:
<?xml version='1.0' encoding='utf-8'?>
<ns0:topNode xmlns:ns0="urn:dslforum-org:service-1-0"><childNode>content</childNode></ns0:topNode>

Validate XML with namespaces against Schematron using lxml in Python

I am not able to get lxml Schematron validator to recognize namespaces. Validation works fine in code without namespaces.
This is for Python 3.7.4 and lxml 4.4.0 on MacOS 10.15
Here is the schematron file
<?xml version='1.0' encoding='UTF-8'?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:ns1="http://foo">
<pattern>
<rule context="//ns1:bar">
<assert test="number(.) = 2">
bar must be 2
</assert>
</rule>
</pattern>
</schema>
and here is the xml file
<?xml version="1.0" encoding="UTF-8"?>
<zip xmlns:ns1="http://foo">
<ns1:bar>3</ns1:bar>
</zip>
here is the python code
from lxml import etree, isoschematron
from plumbum import local
schematron_doc = etree.parse(local.path('rules.sch'))
schematron = isoschematron.Schematron(schematron_doc)
xml_doc = etree.parse(local.path('test.xml'))
is_valid = schematron.validate(xml_doc)
assert not is_valid
What I get: lxml.etree.XSLTParseError: xsltCompilePattern : failed to compile '//ns1:bar'
If I remove ns1 from both the XML file and the Schematron file, the example works perfectly-- no error message.
There must be a trick to registering namespaces in lxml Schematron that I am missing. Has anyone done this?
As it turns out, there is a specific way to register namespaces in Schematron. It is described in the Schematron ISO standard
It only required a small change to the Schematron file, adding the "ns" element in as follows:
<?xml version='1.0' encoding='UTF-8'?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
<ns uri="http://foo" prefix="ns1"/>
<pattern>
<rule context="//ns1:bar">
<assert test="number(.) = 2">
bar must be 2
</assert>
</rule>
</pattern>
</schema>
I won't remove the question, since there is a dearth of examples of Schematron rules using namespaces. Hopefully it can be helpful to someone.

Reading xml with lxml lib geting strange string from xmlns tag

I am writing program to work on xml file and change it. But when I try to get to any part of it I get some extra part.
My xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<members>sbaa__ApprovalChain__c.ExternalID__c</members>
<members>sbaa__ApprovalCondition__c.ExternalID__c</members>
<members>sbaa__ApprovalRule__c.ExternalID__c</members>
<name>CustomField</name>
</types>
<version>40.0</version>
</Package>
And I have my code:
from lxml import etree
import sys
tree = etree.parse('package.xml')
root = tree.getroot()
print( root[0][0].tag )
As output I expect to see members but I get something like this:
{http://soap.sforce.com/2006/04/metadata}members
Why do I see that url and how to stop it from showing up?
You have defined a default namespace (Wikipedia, lxml tutorial). When defined, it is a part of every child tag.
If you want to print the tag without the namespace, it's easy
tag = root[0][0].tag
print(tag[tag.find('}')+1:])
If you want to remove the namespace from XML, see this question.

Categories

Resources