I am currently transforming XML with XSLT using the following code.
from lxml import etree
xmlRoot = etree.parse('path/abc.xml')
xslRoot = etree.parse('path/abc.xsl')
transform = etree.XSLT(xslRoot)
newdom = transform(xmlRoot)
print(etree.tostring(newdom, pretty_print=True))
The code above runs without errors, but the output contains only the root element, not the whole XML content. When I run the transformation on the same XML and XSL files using Altova, it works fine. Is the syntax for printing the whole XML different, or is there an error here that you can spot?
XML content :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<slide name="slide7.xml" nav_lvl_1="Solutions" nav_lvl_2="Value Map" page_number="7">
<title>Retail Value Map</title>
<Subheadline>Retail </Subheadline>
</slide>
</root>
XSL content:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8" indent="yes" method="xml" standalone="yes" version="1.0"/>
<xsl:template match="/">
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
<xsl:for-each select="root/slide">
<xsl:choose>
<xsl:when test="@nav_lvl_1='Solutions'">
<xsl:if test="@nav_lvl_2='Value Map'">
<p:txBody>
<a:p>
<a:r>
<a:rPr lang="en-US" dirty="0" smtClean="0"/>
<a:t>
<xsl:value-of select="title"/>
</a:t>
</a:r>
<a:endParaRPr lang="en-US" dirty="0"/>
</a:p>
</p:txBody>
</xsl:if>
</xsl:when>
</xsl:choose>
</xsl:for-each>
</p:sld>
</xsl:template>
</xsl:stylesheet>
Current output :
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"/>
Let's say I have this source XML:
<A>
<B>something</B>
<B>something else</B>
</A>
and I want to transform it into this target XML:
<C>
<D>something</D>
<D>something else</D>
</C>
The obvious XSL of course is this:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="A">
<C>
<xsl:for-each select="B">
<D><xsl:value-of select="."/></D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
Now let's say I don't know the paths I'm going to use beforehand and I want to parametrize them from my processor, which happens to be lxml (in Python).
So I change my XSL into this:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="path_of_B"/>
<xsl:template match="A">
<C>
<xsl:for-each select="$path_of_B">
<D><xsl:value-of select="."/></D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
and I call it from Python like this:
source = etree.parse("source.xml")
transform = etree.XSLT(etree.parse("transform.xsl"))
target = transform(source, path_of_B="B")
This doesn't give me the intended result, because paths passed from the processor are always evaluated in a global context: the current() node is always the root, no matter where I use the parameter. Is there any way to evaluate the XPaths in the correct context, as happens in the first example where I write them by hand?
I have tried many approaches, such as:
Passing parameters in nested templates, because I thought the evaluation would use the context of the template.
Passing the parameters as strings and evaluating them later, but XPath 1.0 doesn't have an eval() function like Python.
Attribute value templates, but they are not allowed on xsl elements.
At some point I even tried <xsl:namespace-alias> to dynamically generate my XSL, but it was very confusing.
So in the end, I solved it by pre-processing my xsl file with a template engine or string-formatting. It works, but I was just wondering if there is a "pure" XSLT+processor solution.
"XPath 1.0 doesn't have an eval() function"
No, but the libxslt processor supports the EXSLT dyn:evaluate() extension function - so you could do:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dyn="http://exslt.org/dynamic"
extension-element-prefixes="dyn">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:param name="path_of_B"/>
<xsl:template match="/A">
<C>
<xsl:for-each select="dyn:evaluate($path_of_B)">
<D>
<xsl:value-of select="."/>
</D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
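A minimal sketch of calling such a stylesheet from lxml, assuming the stylesheet is embedded as a string and the sample source from the question is used. The one thing to watch is that lxml treats keyword parameters as XPath expressions, so the path probably has to be passed through etree.XSLT.strparam() to arrive as a plain string for dyn:evaluate():

```python
from lxml import etree

# Stylesheet from the answer above, embedded as a string for this demo
XSLT_TEXT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    extension-element-prefixes="dyn">
  <xsl:param name="path_of_B"/>
  <xsl:template match="/A">
    <C>
      <xsl:for-each select="dyn:evaluate($path_of_B)">
        <D><xsl:value-of select="."/></D>
      </xsl:for-each>
    </C>
  </xsl:template>
</xsl:stylesheet>"""

source = etree.fromstring(b"<A><B>something</B><B>something else</B></A>")
transform = etree.XSLT(etree.fromstring(XSLT_TEXT))

# strparam() quotes the value, so the stylesheet receives the string "B"
# rather than the result of evaluating B as an XPath expression
target = transform(source, path_of_B=etree.XSLT.strparam("B"))

# Collect the text of the generated <D> elements
d_values = [d.text for d in target.getroot().findall("D")]
print(d_values)
```

The string is then evaluated inside the template, so the context node is the matched A element rather than the document root.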
If you want to parametrize both your input and output element names, you could do something like this.
Note that this method will not work well if the structure of your source XML is not always the same.
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="e1_input" select="'A'"/>
<xsl:param name="e1_output" select="'A_OUT'"/>
<xsl:param name="e2_input" select="'B'"/>
<xsl:param name="e2_output" select="'B_OUT'"/>
<xsl:template match="/">
<xsl:for-each select="*[name()=$e1_input]">
<xsl:element name="{$e1_output}">
<xsl:for-each select="*[name()=$e2_input]">
<xsl:element name="{$e2_output}">
<xsl:apply-templates/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
See it working here: https://xsltfiddle.liberty-development.net/aKZkh9
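For completeness, here is a rough sketch of how the name parameters might be overridden from lxml. The output names C and D are taken from the question; the sample input is an assumption, and etree.XSLT.strparam() is used so that the values arrive as strings:

```python
from lxml import etree

# Stylesheet from the answer above, embedded as a string for this demo
XSLT_TEXT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="e1_input" select="'A'"/>
  <xsl:param name="e1_output" select="'A_OUT'"/>
  <xsl:param name="e2_input" select="'B'"/>
  <xsl:param name="e2_output" select="'B_OUT'"/>
  <xsl:template match="/">
    <xsl:for-each select="*[name()=$e1_input]">
      <xsl:element name="{$e1_output}">
        <xsl:for-each select="*[name()=$e2_input]">
          <xsl:element name="{$e2_output}">
            <xsl:apply-templates/>
          </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(XSLT_TEXT))
source = etree.fromstring(b"<A><B>something</B><B>something else</B></A>")

# Override only the output names; the input names keep their defaults (A and B)
strparam = etree.XSLT.strparam
target = transform(source, e1_output=strparam("C"), e2_output=strparam("D"))
root = target.getroot()
print(etree.tostring(root))
```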
I have the following XML file:
<?xml version='1.0' encoding='utf8'?>
<Products>
<item type="dict">
<id type="int">37475</id>
<name type="str">something_something</name>
<slug type="str">something_something</slug>
<permalink type="str">something_something</permalink>
<date_created type="str">date</date_created>
<date_created_gmt type="str">date</date_created_gmt>
<date_modified type="str">date</date_modified>
<date_modified_gmt type="str">date</date_modified_gmt>
<type type="str">simple</type>
<status type="str">publish</status>
<featured type="bool">False</featured>
<catalog_visibility type="str">visible</catalog_visibility>
<description type="str">something_something</description>
</item>
I started with a JSON file that I converted to an XML file, so all of the products in that file start with the <item type="dict"> tag, which is not what I want. I would like all of the products to be enclosed in a <product> tag.
To fix this issue I am doing the following:
import xml.etree.ElementTree as ET

tree = ET.ElementTree(root)
# Serialize the XML of each product to a string so that it can be edited
xmlstr = ET.tostring(root, encoding='utf8', method='xml')
finalstr = xmlstr.decode("utf-8").replace(' />','')  # remove wrong part
finalstr = finalstr.replace('<item type="dict"> <id type="int">','<product> <id type="int">')
This works for other problems in my XML file, but only when they are on one line.
My question is how do I select two or more lines so that I can replace them?
Desired output:
<?xml version='1.0' encoding='utf8'?>
<Products>
<product>
<id type="int">37475</id>
<name type="str">something_something</name>
<slug type="str">something_something</slug>
<permalink type="str">something_something</permalink>
<date_created type="str">date</date_created>
<date_created_gmt type="str">date</date_created_gmt>
<date_modified type="str">date</date_modified>
<date_modified_gmt type="str">date</date_modified_gmt>
<type type="str">simple</type>
<status type="str">publish</status>
<featured type="bool">False</featured>
<catalog_visibility type="str">visible</catalog_visibility>
<description type="str">something_something</description>
</product>
It should be possible with a regular expression:
finalstr = re.sub(r'<item type="dict">\s*<id type="int">', '<product>\n<id type="int">', finalstr)
This lets you match across more than one line: the \s* part between the XML nodes matches any amount of whitespace, including newlines. The closing </item> tags still need their own replacement.
Read more about re.sub here: https://docs.python.org/3/library/re.html
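A small self-contained sketch of the idea, using a shortened sample string (the sample data and the variable names are made up for illustration):

```python
import re

# Shortened sample where the opening tag and the first child sit on different lines
xml_text = '<Products>\n<item type="dict">\n  <id type="int">37475</id>\n</item>\n</Products>'

# \s matches spaces, tabs and newlines, so the pattern can span several lines
renamed = re.sub(r'<item type="dict">\s*<id type="int">',
                 '<product>\n<id type="int">',
                 xml_text)

# The closing tag sits on a single line, so a plain replace is enough
renamed = renamed.replace('</item>', '</product>')
print(renamed)
```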
Here is the XSLT for the scenario. It follows the so-called modified identity transform pattern.
If you need the XML prolog, just change omit-xml-declaration="yes" to omit-xml-declaration="no".
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity template -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<product>
<xsl:apply-templates />
</product>
</xsl:template>
</xsl:stylesheet>
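Since the question uses Python, here is a hedged sketch of running this kind of stylesheet with lxml. The stylesheet is embedded as a string and the input is a shortened version of the sample, both purely for illustration:

```python
from lxml import etree

# Modified identity transform: copy everything, but rename item to product
XSLT_TEXT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="item">
    <product>
      <xsl:apply-templates/>
    </product>
  </xsl:template>
</xsl:stylesheet>"""

# Shortened sample input
XML_TEXT = b"""<Products>
  <item type="dict">
    <id type="int">37475</id>
    <type type="str">simple</type>
  </item>
</Products>"""

transform = etree.XSLT(etree.fromstring(XSLT_TEXT))
result = transform(etree.fromstring(XML_TEXT))
root = result.getroot()
print(str(result))
```

Because the item template applies templates only to child nodes, the type="dict" attribute is dropped, while the attributes of the child elements are copied unchanged.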
I want to merge two XML files. I read many solutions but they are specific to those files. I am using xml.etree.ElementTree as well as lxml for parsing, comparing the files, getting the differences. I understand my next step is:
for element in file2.xml:
if element present in file1.xml:
append to output_file.xml
else:
copy element to the output_file
but I haven't worked much on XML, and the tools to merge are licensed, so I need to write a generic script to merge to the format I want.
file1.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
<great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
<grandpa>
<grandpa_name>grandpa_name_one_1</grandpa_name>
</grandpa>
<grandpa>
<grandpa_name>grandpa_name_two_1</grandpa_name>
</grandpa>
<grandma>
<grandma_name>grandma_name_one_1</grandma_name>
</grandma>
<grandma>
<grandma_name>grandma_name_two_1</grandma_name>
</grandma>
</great_grands>
file2.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
<great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
<grandpa>
<grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
</grandpa>
<grandma>
<grandma_name_2>grandma_name_one_2</grandma_name_2>
</grandma>
</great_grands>
Required output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>
<great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
<great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
<grandpa>
<grandpa_name>grandpa_name_one_1</grandpa_name>
</grandpa>
<grandpa>
<grandpa_name>grandpa_name_two_1</grandpa_name>
</grandpa>
<grandpa>
<grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
</grandpa>
<grandma>
<grandma_name>grandma_name_one_1</grandma_name>
</grandma>
<grandma>
<grandma_name>grandma_name_two_1</grandma_name>
</grandma>
<grandma>
<grandma_name_2>grandma_name_one_2</grandma_name_2>
</grandma>
</great_grands>
Consider XSLT, the special-purpose declarative language and sibling to XPath, designed to transform XML files. Using its document() function, it can read external XML files at relative paths. Python's lxml module can process XSLT 1.0 scripts.
And because XSLT scripts are well-formed XML files, you can parse them from a file or from an embedded string. The example below assumes all files and scripts are saved in the same directory:
XSLT Script (save as .xsl script, notice only file2.xml is referenced)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="/great_grands">
<xsl:copy>
<xsl:copy-of select="great_grandpa_name_one"/>
<xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
<xsl:copy-of select="grandpa"/>
<xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
<xsl:copy-of select="grandma"/>
<xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Python Script (notice only file1.xml is referenced)
from lxml import etree
xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')
transform = etree.XSLT(xsl)
newdom = transform(xml)
# Save the transformation result to a file
with open('Output.xml', 'wb') as f:
    f.write(newdom)
Output
<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
<great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
<great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
<grandpa>
<grandpa_name>grandpa_name_one_1</grandpa_name>
</grandpa>
<grandpa>
<grandpa_name>grandpa_name_two_1</grandpa_name>
</grandpa>
<grandpa>
<grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
</grandpa>
<grandma>
<grandma_name>grandma_name_one_1</grandma_name>
</grandma>
<grandma>
<grandma_name>grandma_name_two_1</grandma_name>
</grandma>
<grandma>
<grandma_name_2>grandma_name_one_2</grandma_name_2>
</grandma>
</great_grands>
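The embedded-string variant could look roughly like the sketch below. It uses shortened versions of the two files and a temporary directory so that it is self-contained; the base_url value (a made-up merge.xsl path) is an assumption that gives the in-memory stylesheet a location from which document('file2.xml') can be resolved:

```python
import os
import tempfile
from lxml import etree

# Shortened versions of file1.xml and file2.xml
FILE1 = b"""<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <grandpa><grandpa_name>grandpa_name_one_1</grandpa_name></grandpa>
  <grandma><grandma_name>grandma_name_one_1</grandma_name></grandma>
</great_grands>"""

FILE2 = b"""<great_grands>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa><grandpa_name_2>grandpa_name_one_2</grandpa_name_2></grandpa>
  <grandma><grandma_name_2>grandma_name_one_2</grandma_name_2></grandma>
</great_grands>"""

XSLT_TEXT = b"""<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:template match="/great_grands">
    <xsl:copy>
      <xsl:copy-of select="great_grandpa_name_one"/>
      <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
      <xsl:copy-of select="grandpa"/>
      <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
      <xsl:copy-of select="grandma"/>
      <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>"""

with tempfile.TemporaryDirectory() as workdir:
    # Only file2.xml has to exist on disk; file1 is parsed from memory
    with open(os.path.join(workdir, "file2.xml"), "wb") as f:
        f.write(FILE2)
    # base_url (hypothetical file name) anchors relative document() lookups in workdir
    xsl = etree.fromstring(XSLT_TEXT, base_url=os.path.join(workdir, "merge.xsl"))
    merged = etree.XSLT(xsl)(etree.fromstring(FILE1))
    merged_tags = [child.tag for child in merged.getroot()]

print(merged_tags)
```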
I'm trying to make a simple XML --> CSV script, using XSLT. I found that etree seems to "want" a tag to output... Does anyone know a workaround? Yes, I've seen this post: XML to CSV Using XSLT.
See below...
Here's some sample XML data, just for reference. My code doesn't do anything with the data yet, as it was failing even to write a header.
<projects>
<project>
<name>Shockwave</name>
<language>Ruby</language>
<owner>Brian May</owner>
<state>New</state>
<startDate>31/10/2008 0:00:00</startDate>
</project>
<project>
<name>Other</name>
<language>Erlang</language>
<owner>Takashi Miike</owner>
<state> Canceled </state>
<startDate>07/11/2008 0:00:00</startDate>
</project>
</projects>
Here's my script:
import sys
from lxml import etree
system_file = sys.argv[1]
xml_file = sys.argv[2]
sys_txt = open( system_file,"r" ).read()
xsl_txt = open( "csv_file.xslt","r" ).read()
sysroot = etree.fromstring( sys_txt )
xslroot = etree.fromstring( xsl_txt )
transform = etree.XSLT( xslroot )
with open( xml_file, "w" ) as f:
    f.write(etree.tostring( transform(sysroot) ) )
This XSLT code does NOT work ( etree.tostring... = None ):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
Hi
</xsl:template>
</xsl:stylesheet>
But THIS XSLT does work... seems etree needs to output an XML file?
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<dummy>
Hi
</dummy>
</xsl:template>
</xsl:stylesheet>
At this point I'm thinking I can proceed with a dummy tag, then remove it at end...
"Python etree XSLT Requires Tag output?"
The answer is NO.
As shown in the "XSLT result objects" section of the lxml documentation, you can use the standard Python str() function to get the expected string representation of the transformation result, which is especially useful when the result has no root element:
from lxml import etree
raw_xml = '''<projects>
<project>
<name>Shockwave</name>
<language>Ruby</language>
<owner>Brian May</owner>
<state>New</state>
<startDate>31/10/2008 0:00:00</startDate>
</project>
<project>
<name>Other</name>
<language>Erlang</language>
<owner>Takashi Miike</owner>
<state>Canceled</state>
<startDate>07/11/2008 0:00:00</startDate>
</project>
</projects>'''
raw_xslt = '''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:text>Hi</xsl:text>
</xsl:template>
</xsl:stylesheet>'''
sysroot = etree.fromstring(raw_xml)
xslroot = etree.fromstring(raw_xslt)
transform = etree.XSLT(xslroot)
print(str(transform(sysroot)))
# output:
# Hi
And as you saw, etree.tostring() is still usable when the transformation result has a root element.
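Building on that, a CSV-producing stylesheet might look something like the sketch below. The choice of three columns and the shortened sample data are assumptions for illustration only, and no quoting of commas inside values is attempted:

```python
from lxml import etree

# Shortened sample data
RAW_XML = b"""<projects>
  <project><name>Shockwave</name><language>Ruby</language><owner>Brian May</owner></project>
  <project><name>Other</name><language>Erlang</language><owner>Takashi Miike</owner></project>
</projects>"""

# method="text" with xsl:text gives exact control over commas and newlines
RAW_XSLT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/projects">
    <xsl:text>name,language,owner&#10;</xsl:text>
    <xsl:for-each select="project">
      <xsl:value-of select="name"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="language"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="owner"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(RAW_XSLT))

# str() returns the text output; there is no root element to serialize
csv_text = str(transform(etree.fromstring(RAW_XML)))
print(csv_text)
```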
Is it possible to sort XML files like the following:
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
Which would become
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
<model name="ford">
<driver>Alice</driver>
<driver>Bob</driver>
</model>
That is, the outermost elements are sorted first, then the second outermost, and so on.
They'd need to be sorted by element name first.
This is a refinement of Kirill's solution. I think it better reflects the stated requirements, and it avoids the type error XSLT 2.0 will give you if the sort key contains more than one value (but it still works in 1.0).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="*">
<xsl:sort select="(@name | text())[1]"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Try this XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()">
<xsl:sort select="text() | @*"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
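To try this from Python, something like the following sketch should work with lxml. The wrapping root element is an assumption, since the sample in the question has no single root, and the input is written without whitespace between elements to keep the demo simple:

```python
from lxml import etree

# Sorting stylesheet from the answer above, embedded as a string
XSLT_TEXT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()">
        <xsl:sort select="text() | @*"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Sample data wrapped in a hypothetical root element
XML_TEXT = (b'<root>'
            b'<model name="ford"><driver>Bob</driver><driver>Alice</driver></model>'
            b'<model name="audi"><driver>Carly</driver><driver>Dean</driver></model>'
            b'</root>')

transform = etree.XSLT(etree.fromstring(XSLT_TEXT))
result = transform(etree.fromstring(XML_TEXT))

# Inspect the resulting order of models and of the drivers within each model
models = [m.get("name") for m in result.getroot()]
drivers = {m.get("name"): [d.text for d in m] for m in result.getroot()}
print(models, drivers)
```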
You can sort nodes by removing them from the parent node, and re-inserting them in the intended order. For example:
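from lxml import etree

def sort_tree(tree):
    """Recursively sort the given etree in place."""
    for child in tree:
        sort_tree(child)
    # Sort by the name attribute when present, otherwise by text content
    sorted_children = sorted(tree, key=lambda n: n.get('name') or n.text or '')
    # Copy the child list before removing, so iteration is not disturbed
    for child in list(tree):
        tree.remove(child)
    for child in sorted_children:
        tree.append(child)
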
tree = etree.fromstring(YOUR_XML)
sort_tree(tree)
print(etree.tostring(tree, pretty_print=True))
You don't need to sort the entire XML DOM.
Instead, take the required nodes into a list and sort them. Because we need the sorted order while processing and not in the file, it's better done at run time.
Maybe like this, using minidom.
import os, sys
from xml.dom import minidom
document = """\
<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model><model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>
"""
document = minidom.parseString(document)
elements = document.getElementsByTagName("model")
# Sort by the attribute's string value; Attr objects themselves are not orderable
elements.sort(key=lambda element: element.attributes['name'].value)