xml_declaration = True <?xml version="1.0" encoding="UTF-8"?> - python

I am using this code:
tree.write(xmlFileOut, pretty_print = True, xml_declaration = True, encoding='UTF-8')
to write my XML with an XML declaration, but it produces:
<?xml version='1.0' encoding='UTF-8'?>
But I need it to produce:
<?xml version="1.0" encoding="UTF-8"?>
I am using Python with lxml.
What do I need to do?
Cheers.
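One possible workaround, sketched here with the standard library's xml.etree.ElementTree rather than lxml, is to serialize the tree without any declaration and prepend a hand-written one that uses double quotes. The helper name serialize_with_double_quoted_declaration and the sample document are made up for illustration; the same idea should carry over to lxml by serializing with xml_declaration=False and writing the declaration yourself.

```python
import xml.etree.ElementTree as ET

# Hand-written declaration using double quotes (the form the question asks for).
DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def serialize_with_double_quoted_declaration(root):
    """Return UTF-8 bytes: the hand-written declaration followed by the tree."""
    # encoding="unicode" returns a str and never emits an XML declaration.
    body = ET.tostring(root, encoding="unicode")
    return (DECLARATION + body).encode("utf-8")


# Hypothetical sample document, only for demonstration.
root = ET.fromstring("<root><item>1</item></root>")
data = serialize_with_double_quoted_declaration(root)

with open("out.xml", "wb") as f:
    f.write(data)
```

Both quoting styles are equivalent to an XML parser, so this only matters when a downstream tool compares the declaration as text.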

Related

Modify xml file with extra namespace

I want to modify an existing XML file.
The existing file starts like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
After modifying a field in the XML, I want to write out the new XML file, but the modified file differs from the original:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
The difference between the files:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" // this is missing in the modified file
So this is what I did:
ET.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
ET.register_namespace('', "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03")
#parse the data
tree = ET.parse(self.sepa_xml.path)
root = tree.getroot()
#add a subelement
body = ET.SubElement(root, "{http://www.w3.org/2001/XMLSchema-instance}")
The final result:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<PmtInf>
<PmtInfId>20220929085842-36645</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<ReqdExctnDt>2022-09-29</ReqdExctnDt>
<Dbtr>
<Nm>test name</Nm>
</Dbtr>
</DbtrAgt>
<ChrgBr>SLEV</ChrgBr>
<CdtTrfTxInf>
<PmtId>
<EndToEndId>20220929085842-36645/1</EndToEndId>
</PmtId>
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
<xsi: /></Document> // how can I delete this xsi tag?
The problem now is that I get an extra tag at the end of the XML file:
<xsi: />. I assume this is because I added a subelement. How can I delete this last tag?
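A minimal sketch of one way around this with xml.etree.ElementTree: remove the placeholder element from its parent, and keep the xsi declaration by setting it as a literal attribute on the root. This relies on ElementTree writing unqualified attribute names verbatim, so treat it as a workaround rather than a supported feature. If the xsi namespace is actually used somewhere in the tree, ElementTree would emit its own declaration as well and the duplicate would make the output invalid. The SOURCE string is a shortened stand-in for the real file.

```python
import xml.etree.ElementTree as ET

PAIN_NS = "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened stand-in for the original file (assumption: only the root matters here).
SOURCE = (
    '<Document xmlns:xsi="%s" xmlns="%s">'
    "<CstmrCdtTrfInitn/>"
    "</Document>" % (XSI_NS, PAIN_NS)
)

# Serialize the pain.001 namespace as the default namespace (no prefix).
ET.register_namespace("", PAIN_NS)
root = ET.fromstring(SOURCE)

# If a placeholder element was already appended, remove it from its parent.
stray = ET.SubElement(root, "{%s}" % XSI_NS)
root.remove(stray)

# Workaround: write the unused namespace declaration as a plain attribute.
root.set("xmlns:xsi", XSI_NS)

result = ET.tostring(root, encoding="unicode")
print(result)
```

lxml keeps the namespace declarations of a parsed document, so switching to lxml.etree is another option if the dependency is acceptable.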

Exporting XML header and doctype to XML file [duplicate]

I have tried to use the answer in this question, but can't make it work: How to create "virtual root" with Python's ElementTree?
Here's my code:
import xml.etree.cElementTree as ElementTree
from StringIO import StringIO
s = '<?xml version=\"1.0\" encoding=\"UTF-8\" ?><!DOCTYPE tmx SYSTEM \"tmx14a.dtd\" ><tmx version=\"1.4a\" />'
tree = ElementTree.parse(StringIO(s)).getroot()
header = ElementTree.SubElement(tree,'header',{'adminlang': 'EN',})
body = ElementTree.SubElement(tree,'body')
ElementTree.ElementTree(tree).write('myfile.tmx','UTF-8')
When I open the resulting 'myfile.tmx' file, it contains this:
<?xml version='1.0' encoding='UTF-8'?>
<tmx version="1.4a"><header adminlang="EN" /><body /></tmx>
What am I missing? Or is there a better tool?
You could set the xml_declaration argument of the write function to False so that the output has no XML declaration, then write whatever header you need manually. In fact, if you pass the encoding as 'utf-8' (lowercase), the XML declaration is not added either.
import xml.etree.ElementTree as ElementTree
tree = ElementTree.Element('tmx', {'version': '1.4a'})
ElementTree.SubElement(tree, 'header', {'adminlang': 'EN'})
ElementTree.SubElement(tree, 'body')
with open('myfile.tmx', 'wb') as f:
    # Write the declaration and DOCTYPE by hand, then the tree without its own declaration
    f.write('<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE tmx SYSTEM "tmx14a.dtd">'.encode('utf8'))
    ElementTree.ElementTree(tree).write(f, encoding='utf-8', xml_declaration=False)
Resulting file (newlines added manually for readability):
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE tmx SYSTEM "tmx14a.dtd">
<tmx version="1.4a">
<header adminlang="EN" />
<body />
</tmx>
You could use lxml and its tostring function:
from lxml import etree
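# Parse from bytes: lxml rejects str input that carries an encoding declaration
s = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4a"/>"""
tree = etree.fromstring(s)
header = etree.SubElement(tree,'header',{'adminlang': 'EN'})
body = etree.SubElement(tree,'body')
# tostring returns bytes when an encoding is given, so decode before printing
print(etree.tostring(tree, encoding="UTF-8",
                     xml_declaration=True,
                     pretty_print=True,
                     doctype='<!DOCTYPE tmx SYSTEM "tmx14a.dtd">').decode("utf-8"))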
=>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE tmx SYSTEM "tmx14a.dtd">
<tmx version="1.4a">
<header adminlang="EN"/>
<body/>
</tmx>
I used a different solution to add the DOCTYPE: very simple, very stupid.
import xml.etree.ElementTree as ET
# root is the Element to serialize and path_file is the output path (both defined elsewhere)
with open(path_file, "w", encoding='UTF-8') as xf:
    doc_type = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE dlg:window ' \
               'PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "dialog.dtd">'
    tostring = ET.tostring(root).decode('utf-8')
    file = f"{doc_type}{tostring}"
    xf.write(file)
I couldn't find a solution to this problem using vanilla ElementTree either, and the solution proposed by demalexx produced invalid XML that my application (DITA) rejected.
What I propose is a workaround involving other modules, and it works perfectly for me.
import re
# Found no clean way to specify a <!DOCTYPE ...> stanza in ElementTree, so
# we substitute the current <?xml ... ?> stanza with a full <?xml ... ?> + <!DOCTYPE ...>.
# source_xml is the serialized document as a str; filename is the output path.
new_header = '<?xml version="1.0" encoding="UTF-8" ?>\n' \
             '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">\n'
target_xml = re.sub(r"<\?xml .+?>", new_header, source_xml)
# Open in text mode with an explicit encoding and write the str directly
with open(filename, 'w', encoding='utf-8') as catalog_file:
    catalog_file.write(target_xml)

Read a non formatted xml and export it again formatted? [duplicate]

Here is the code, but the exported XML is badly formatted.
import xml.etree.ElementTree as ET
import os
sampleXML = """<?xml version="1.0" encoding="ASCII"?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
"""
tree = ET.ElementTree(ET.fromstring(sampleXML))
for folder in os.listdir("YourPath"):  # Iterate over the directory
    tree.find("CODE_OK").text = folder  # Update the directory name in the XML
    tree.write(open(os.path.join(r"Path", folder, "newxml.xml"), "wb"))  # Write the XML
How can I make the exported XML properly formatted?
I found in the docs that the xml package includes an implementation of the Document Object Model interface (xml.dom.minidom). Here is a simple example:
from xml.dom.minidom import parseString
example = parseString(sampleXML) # your string
# write to file
with open('file.xml', 'w') as file:
    example.writexml(file, indent='\n', addindent=' ')
Output:
<?xml version="1.0" ?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
Update
You can also write it like this:
example = parseString(sampleXML).toprettyxml()
with open('file.xml', 'w') as file:
    file.write(example)
Output:
<?xml version="1.0" ?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
Update 2
I copied all of your code and only added the indent function from this site, and it works correctly for me.
import xml.etree.ElementTree as ET
import os
sampleXML = "your xml"
tree = ET.ElementTree(ET.fromstring(sampleXML))
indent(tree.getroot()) # this I add
for folder in os.listdir(path):
    tree.find("CODE_OK").text = folder
    tree.write(open(os.path.join(path, folder, "newxml.xml"), "wb"))
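The indent function used above is not shown in the answer. A minimal sketch of what such a helper could look like follows; it is a hypothetical stand-in, not necessarily the code the answer refers to. It walks the tree and rewrites the whitespace in each element's text and tail so that every child starts on its own indented line. On Python 3.9 and later, xml.etree.ElementTree.indent does this job out of the box.

```python
import xml.etree.ElementTree as ET


def indent(elem, level=0, space="  "):
    """Insert newlines and indentation into elem's subtree, in place."""
    pad = "\n" + level * space
    children = list(elem)
    if not children:
        return  # leaf element: keep its text untouched
    # Whitespace-only (or missing) text before the first child becomes indentation.
    if not (elem.text and elem.text.strip()):
        elem.text = pad + space
    for i, child in enumerate(children):
        indent(child, level + 1, space)
        if not (child.tail and child.tail.strip()):
            is_last = i == len(children) - 1
            # The last child's tail lines the closing tag up with its parent.
            child.tail = pad if is_last else pad + space


sample = (
    '<Metadata version="1.0"><CODE_OK>510</CODE_OK>'
    "<DeliveryDate>13/08/2018</DeliveryDate></Metadata>"
)
root = ET.fromstring(sample)
indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Only whitespace-only text and tails are rewritten, so element content such as 510 is left as it is.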
