Creating a .docx file from .xml, .xslt and a .docx template, using Python

I am trying to create a .docx file, using the .docx template, .xml and .xslt file. I want to fill the placeholders in the .docx template file, with the data in the .xml file, and then generate a new word file, containing the data.
The template.docx file looks like this:
The data.xml file looks like this:
<root>
<person>
<Name>John</Name>
<profession>dentist</profession>
<city>Miami</city>
</person>
<person>
<Name>Mia</Name>
<profession>teacher</profession>
<city>London</city>
</person>
</root>
The parser.xslt file that I came up with looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="root/person">
<xsl:value-of select="Name"/>
<xsl:value-of select="profession"/>
<xsl:value-of select="city"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output result.docx file should look like this:
My python code that I came up with looks like this:
import lxml.etree as ET
dom = ET.parse('data.xml')
xslt = ET.parse(r'parser.xslt')
transform = ET.XSLT(xslt)
newdom = transform(dom)
I don't know what the content of the .xslt file must be for this to work, or how to create the result.docx.
Any help would be appreciated.
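One possible direction, offered only as a rough sketch: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, so the placeholders can be replaced there directly. This sketch does not use XSLT at all; it swaps in plain string replacement with the standard library. The `{{Name}}`-style placeholder syntax, the function name `fill_docx_template`, and the stand-in template built in memory are all assumptions, since the real template is not shown. Word sometimes splits typed text across several runs, in which case a placeholder would not match as a single string.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DATA_XML = """<root>
<person><Name>John</Name><profession>dentist</profession><city>Miami</city></person>
<person><Name>Mia</Name><profession>teacher</profession><city>London</city></person>
</root>"""


def fill_docx_template(template_bytes, values):
    """Return new archive bytes with {{key}} placeholders replaced (hypothetical syntax)."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the document stays well-formed XML.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            # Every other part of the archive is copied unchanged.
            target.writestr(item.filename, data)
    return output.getvalue()


def make_demo_template():
    """Build a stand-in archive with only word/document.xml (not a complete .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "word/document.xml",
            '<w:document xmlns:w="http://schemas.openxmlformats.org/'
            'wordprocessingml/2006/main"><w:body><w:p><w:r><w:t>'
            "{{Name}} is a {{profession}} in {{city}}"
            "</w:t></w:r></w:p></w:body></w:document>",
        )
    return buffer.getvalue()


# Use the first <person> from the data as the placeholder values.
person = ET.fromstring(DATA_XML).find("person")
values = {child.tag: child.text for child in person}

result_bytes = fill_docx_template(make_demo_template(), values)
with zipfile.ZipFile(io.BytesIO(result_bytes)) as archive:
    filled = archive.read("word/document.xml").decode("utf-8")
print(filled)
```

With a real template, `template_bytes` would come from reading template.docx in binary mode, and the returned bytes would be written to result.docx. Producing one output per `<person>` would mean calling the function once per element.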

Related

Modify xml file with extra namespace

I want to modify an existing xml file.
The layout of the existing file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
After I modify a field in the XML, I want to write out the new XML file, but the modified file differs from the original:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
What's the difference between the files:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" // this is missing in the modified file
So here is what I did:
ET.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
ET.register_namespace('', "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03")
#parse the data
tree = ET.parse(self.sepa_xml.path)
root = tree.getroot()
#add a subelement
body = ET.SubElement(root, "{http://www.w3.org/2001/XMLSchema-instance}")
The final result:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<PmtInf>
<PmtInfId>20220929085842-36645</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
<CtgyPurp>
<Cd>SALA</Cd>
</CtgyPurp>
</PmtTpInf>
<ReqdExctnDt>2022-09-29</ReqdExctnDt>
<Dbtr>
<Nm>test name</Nm>
</Dbtr>
</DbtrAgt>
<ChrgBr>SLEV</ChrgBr>
<CdtTrfTxInf>
<PmtId>
<EndToEndId>20220929085842-36645/1</EndToEndId>
</PmtId>
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
<xsi: /></Document> // how can I delete this xsi tag?
The problem now is that I get an extra tag at the end of the XML file:
<xsi: />. I assume this is because I added a subelement. How can I delete this last tag?
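A possible way out, sketched under the assumption that `ET` here is the standard library's `xml.etree.ElementTree`: that serializer only writes declarations for namespaces that are actually used in the tree, which would explain why `xmlns:xsi` disappears, and `ET.SubElement(root, "{...}")` creates a real child element with an empty local name, which is where `<xsi: />` comes from. A common workaround is to skip the `SubElement` call and set `xmlns:xsi` as a plain attribute on the root. The shortened document and the changed `PmtMtd` value below are illustrative only.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
PAIN = "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"

# Shortened stand-in for the real file (illustrative content only).
source = (
    f'<Document xmlns:xsi="{XSI}" xmlns="{PAIN}">'
    "<CstmrCdtTrfInitn><PmtInf><PmtMtd>TRF</PmtMtd></PmtInf></CstmrCdtTrfInitn>"
    "</Document>"
)

# Keep the default namespace unprefixed when writing.
ET.register_namespace("", PAIN)
root = ET.fromstring(source)

# Modify a field; element names must be qualified with the default namespace.
root.find(f".//{{{PAIN}}}PmtMtd").text = "CHK"

# Workaround: re-add the unused declaration as an ordinary attribute
# instead of creating a child element in the xsi namespace.
root.set("xmlns:xsi", XSI)

result = ET.tostring(root, encoding="unicode")
print(result)
```

If the stray element has already been added, `root.remove(body)` drops it again. Note that this attribute trick would produce a duplicate declaration if the tree also contained attributes or elements in the xsi namespace, so it only fits files where the prefix is declared but unused.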

How to access the tag below another tag in xml using xml.dom.minidom in python?

I am using Python 3.10.4 and am new to parsing XML files.
For example, let the XML file, named "test.xml", be:
<?xml version="1.0" encoding="UTF-8"?>
<tag1 name="1">
<tag2 name="a"></tag2>
</tag1>
<tag1 name = "2">
<tag2 name = "b"></tag2>
</tag1>
</xml>
python code
import xml.dom.minidom
file = xml.dom.minidom.parse('test.xml')
list = []
tags=file.getElementsByTagName("tag1")
for tag in tags:
    if(tag.getAttribute("name")=="1"):
        print(tag.getAttribute("tag2"))
So here I want to access the tag2 of tag1 with name="1". How can I do it?
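A minimal sketch of one way to do this with `xml.dom.minidom`: `getAttribute` only reads attributes, so child elements are reached by calling `getElementsByTagName` on the matching `tag1` element itself. The sample file as posted is not well-formed (two top-level elements and a stray `</xml>`), so this sketch assumes a wrapping `<root>` element, which is not in the original.

```python
import xml.dom.minidom

# Assumed well-formed version of test.xml; the <root> wrapper is hypothetical.
XML_TEXT = """<root>
  <tag1 name="1">
    <tag2 name="a"></tag2>
  </tag1>
  <tag1 name="2">
    <tag2 name="b"></tag2>
  </tag1>
</root>"""

document = xml.dom.minidom.parseString(XML_TEXT)

names = []
for tag1 in document.getElementsByTagName("tag1"):
    if tag1.getAttribute("name") == "1":
        # Search only inside this tag1 element, not the whole document.
        for tag2 in tag1.getElementsByTagName("tag2"):
            names.append(tag2.getAttribute("name"))

print(names)
```

For a file on disk, `xml.dom.minidom.parse('test.xml')` would take the place of `parseString`.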

Read a non formatted xml and export it again formatted? [duplicate]

Here is the code but the exported xml appears badly formatted.
import xml.etree.ElementTree as ET
import os
sampleXML = """<?xml version="1.0" encoding="ASCII"?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
"""
tree = ET.ElementTree(ET.fromstring(sampleXML))
for folder in os.listdir("YourPath"): #Iterate the dir
    tree.find("CODE_OK").text = folder #Update dir name in XML
    tree.write(open(os.path.join(r"Path", folder, "newxml.xml"), "wb")) #Write to XML
How to make the exported xml appear normally formatted?
I found in the docs that the xml package includes an implementation of the Document Object Model interface (xml.dom.minidom). Here is a simple example:
from xml.dom.minidom import parseString
example = parseString(sampleXML) # your string
# write to file
with open('file.xml', 'w') as file:
    example.writexml(file, indent='\n', addindent=' ')
Output:
<?xml version="1.0" ?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
Update
You can also write it like this:
example = parseString(sampleXML).toprettyxml()
with open('file.xml', 'w') as file:
    file.write(example)
Output:
<?xml version="1.0" ?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
Update 2
I copied all of your code and only added the indent function from this site, and it works correctly for me:
import xml.etree.ElementTree as ET
import os
sampleXML = "your xml"
tree = ET.ElementTree(ET.fromstring(sampleXML))
indent(tree.getroot()) # this I add
for folder in os.listdir(path):
    tree.find("CODE_OK").text = folder
    tree.write(open(os.path.join(path, folder, "newxml.xml"), "wb"))
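As an alternative to a hand-copied `indent` helper, Python 3.9 and later ship `xml.etree.ElementTree.indent`, which rewrites the whitespace between elements in place. The following is a small sketch on the sample document (XML declaration left out for brevity); the two-space indentation is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

sample_xml = """<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>"""

root = ET.fromstring(sample_xml)

# Requires Python 3.9+: adjusts text/tail whitespace so children are indented.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

The same call can be made on `tree.getroot()` right before `tree.write(...)` in the loop above.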

Replace a number in an xml using loop?

Haven't done this kind of process in xml before.
I have these empty folders, called: 125,127,128
and I have this xml:
<?xml version="1.0" encoding="ASCII"?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
I want to replace the number in <CODE_OK>510</CODE_OK> with each folder's name (125, 127 and 128) and save each new XML file in the corresponding folder.
This is one approach.
import xml.etree.ElementTree as ET
import os
sampleXML = """<?xml version="1.0" encoding="ASCII"?>
<Metadata version="1.0">
<CODE_OK>510</CODE_OK>
<DeliveryDate>13/08/2018</DeliveryDate>
</Metadata>
"""
tree = ET.ElementTree(ET.fromstring(sampleXML))
for folder in os.listdir("YourPath"): #Iterate the dir
    tree.find("CODE_OK").text = folder #Update dir name in XML
    tree.write(open(os.path.join(r"YourPath", folder, "yourxml.xml"), "wb")) #Write to XML (binary mode, since ElementTree writes encoded bytes)

Merging Lots of XML files

I have lots of XML files that I need to merge. I have tried the approach from "merging xml files using python's ElementTree",
whose code (edited to fit my needs) is:
import os, os.path, sys
import glob
from xml.etree import ElementTree
def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        print xml_file
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('TALLYMESSAGE'):
            if xml_element_tree is None:
                xml_element_tree = data
                insertion_point = xml_element_tree.findall("./BODY/DATA/TALLYMESSAGE")[0]
            else:
                insertion_point.extend(result)
    if xml_element_tree is not None:
        f = open("myxmlfile.xml", "wb")
        f.write(ElementTree.tostring(xml_element_tree))
run("F:/data/data")
But the problem is that I have lots of XML files, 365 to be precise, and each one is at least 2 MB. Merging them all has led to my PC crashing.
This is the image of the xml tree of my xml file:
My new updated code is:
import os, os.path, sys
import glob
from lxml import etree
def XSLFILE(files):
    xml_files = glob.glob(files +"/*.xml")
    #print xml_files[0]
    xslstring = """<?xml version="1.0" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/DATA">
<DATA>
<xsl:copy>
<xsl:copy-of select="TALLYMESSAGE"/>\n"""
    #print xslstring
    for xmlfile in xml_files[1:]:
        xslstring = xslstring + '<xsl:copy-of select="document(\'' + xmlfile[-16:] + "')/BODY/DATA/TALLYMESSAGE\"/>\n"
    xslstring = xslstring + """</xsl:copy>+
</DATA>
</xsl:template>
</xsl:transform>"""
    #print xslstring
    with open("parsingxsl.xsl", "w") as f:
        f.write(xslstring)
    with open(xml_files[0], "r") as f:
        dom = etree.XML(f.read())
    print etree.tostring(dom)
    with open('F:\data\parsingxsl.xsl', "r") as f:
        xslt_tree = etree.XML(f.read())
    print xslt_tree
    transform = etree.XSLT(xslt_tree)
    newdom = transform(dom)
    #print newdom
    tree_out = etree.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
    print(tree_out)
    xmlfile = open('F:\data\OutputFile.xml','wb')
    xmlfile.write(tree_out)
    xmlfile.close()
XSLFILE("F:\data\data")
Running it produces the following error:
Traceback (most recent call last):
  File "F:\data\xmlmergexsl.py", line 38, in <module>
    XSLFILE("F:\data\data")
  File "F:\data\xmlmergexsl.py", line 36, in XSLFILE
    xmlfile.write(tree_out)
TypeError: must be string or buffer, not None
Consider using XSLT and its document() function to merge the XML files. Python, like many general-purpose languages, can run XSLT 1.0 scripts, here through the third-party lxml module. For background, XSLT is a declarative language designed to transform XML files into other formats and structures.
For your purposes, XSLT may be more efficient than merging the files in application code, since your script holds no lists, loops, or other objects in memory during processing beyond what the XSLT processor itself uses.
XSLT (to be saved externally as an .xsl file)
To avoid copying and pasting, consider first running a Python loop that writes this stylesheet to a text file with one line for each of the 365 documents. Note that the first document is skipped because it is the starting point used in the Python script below:
<?xml version="1.0" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="DATA">
<DATA>
<xsl:copy>
<xsl:copy-of select="TALLYMESSAGE"/>
<xsl:copy-of select="document('Document2.xml')/BODY/DATA/TALLYMESSAGE"/>
<xsl:copy-of select="document('Document3.xml')/BODY/DATA/TALLYMESSAGE"/>
<xsl:copy-of select="document('Document4.xml')/BODY/DATA/TALLYMESSAGE"/>
...
<xsl:copy-of select="document('Document365.xml')/BODY/DATA/TALLYMESSAGE"/>
</xsl:copy>
</DATA>
</xsl:template>
</xsl:transform>
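The stylesheet above can be generated rather than typed out. The following is a rough sketch using only the standard library; the function name `build_merge_xslt` and the `Document2.xml` to `Document365.xml` naming scheme are assumptions taken from the example lines above, and real file names would need to be substituted.

```python
import xml.etree.ElementTree as ET


def build_merge_xslt(other_files):
    """Build the merge stylesheet with one document() line per extra file."""
    lines = [
        '<?xml version="1.0" ?>',
        '<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">',
        '<xsl:template match="DATA">',
        "<DATA>",
        "<xsl:copy>",
        '<xsl:copy-of select="TALLYMESSAGE"/>',
    ]
    for name in other_files:
        # File names containing quotes would need escaping; not handled here.
        lines.append(
            f"<xsl:copy-of select=\"document('{name}')/BODY/DATA/TALLYMESSAGE\"/>"
        )
    lines += ["</xsl:copy>", "</DATA>", "</xsl:template>", "</xsl:transform>"]
    return "\n".join(lines) + "\n"


# Assumed naming scheme; Document1.xml is the input document and is skipped.
other_files = [f"Document{i}.xml" for i in range(2, 366)]
stylesheet = build_merge_xslt(other_files)

# Sanity check: raises ParseError if the generated text is not well-formed XML.
ET.fromstring(stylesheet)

# To save it for the lxml script, write it to disk, for example:
# with open("file.xsl", "w", encoding="utf-8") as f:
#     f.write(stylesheet)
```

The generated text mirrors the hand-written stylesheet line for line, so it can be passed to the same lxml call unchanged.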
Python (to be included in your overall script)
import lxml.etree as ET
# Raw strings keep backslashes literal (otherwise '\f' in '\file.xsl' becomes a form feed)
dom = ET.parse(r'C:\Path\To\XML\Document1.xml')
xslt = ET.parse(r'C:\Path\To\XSL\file.xsl')
transform = ET.XSLT(xslt)
newdom = transform(dom)
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out)
xmlfile = open(r'C:\Path\To\XML\OutputFile.xml','wb')
xmlfile.write(tree_out)
xmlfile.close()
