How to evaluate XSLT processor parameters in a local context? - python

Let's say I have this source XML:
<A>
<B>something</B>
<B>something else</B>
</A>
and I want to transform it into this target XML:
<C>
<D>something</D>
<D>something else</D>
</C>
The obvious XSL of course is this:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="A">
<C>
<xsl:for-each select="B">
<D><xsl:value-of select="."/></D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
Now let's say I don't know the paths I'm going to use beforehand and I want to parametrize them from my processor, which happens to be lxml (in Python).
So I change my XSL into this:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="path_of_B"/>
<xsl:template match="A">
<C>
<xsl:for-each select="$path_of_B">
<D><xsl:value-of select="."/></D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
and I call it from Python like this:
source = etree.parse("source.xml")
transform = etree.XSLT(etree.parse("transform.xsl"))
target = transform(source, path_of_B="B")
This doesn't give me the intended result: paths passed from the processor are always evaluated in a global context, so the current() node is always the root, no matter where I use the parameter. Is there a way to evaluate the XPaths in the correct local context, as happens in the first example where I write them by hand?
I have tried many approaches, such as:
Passing parameters in nested templates, because I thought the evaluation would then have the context of the template.
Passing the parameters as strings and evaluating them later, but XPath 1.0 doesn't have an eval() function like Python.
Attribute value templates, but they are not allowed on xsl elements.
At some point I even touched <xsl:namespace-alias> to dynamically generate my XSL but it was very confusing.
So in the end, I solved it by pre-processing my xsl file with a template engine or string-formatting. It works, but I was just wondering if there is a "pure" XSLT+processor solution.
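The pre-processing workaround described above could look roughly like the sketch below. It is only an illustration: the names XSL_TEMPLATE and render_stylesheet are hypothetical, and it uses string.Template from the standard library rather than any particular template engine. The rendered text would then be handed to the XSLT processor as usual.

```python
from string import Template
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

# Hypothetical stylesheet template; ${path_of_B} is filled in before the
# stylesheet ever reaches the XSLT processor. A literal "$" in the stylesheet
# (for example an XSLT variable reference) would have to be written as "$$".
XSL_TEMPLATE = Template("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="A">
    <C>
      <xsl:for-each select=${path_of_B}>
        <D><xsl:value-of select="."/></D>
      </xsl:for-each>
    </C>
  </xsl:template>
</xsl:stylesheet>
""")


def render_stylesheet(path_of_b):
    """Return the stylesheet text with the XPath written in literally."""
    # quoteattr() escapes XML special characters and adds the surrounding quotes
    return XSL_TEMPLATE.substitute(path_of_B=quoteattr(path_of_b))


xsl_text = render_stylesheet("B")
ET.fromstring(xsl_text)  # raises ParseError if the result is not well-formed
```

Because the path ends up as ordinary stylesheet text, it is evaluated in the local context of the for-each, exactly like the hand-written version.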

XPath 1.0 doesn't have an eval() function
No, but the libxslt processor supports the EXSLT dyn:evaluate() extension function - so you could do:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dyn="http://exslt.org/dynamic"
extension-element-prefixes="dyn">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:param name="path_of_B"/>
<xsl:template match="/A">
<C>
<xsl:for-each select="dyn:evaluate($path_of_B)">
<D>
<xsl:value-of select="."/>
</D>
</xsl:for-each>
</C>
</xsl:template>
</xsl:stylesheet>
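A minimal sketch of calling such a stylesheet from lxml, assuming the installed lxml/libxslt build has EXSLT dynamic support enabled. The path is passed as a string via etree.XSLT.strparam(), so the processor does not evaluate it as an XPath expression before the stylesheet sees it. The inline documents are shortened copies of the ones in the question.

```python
from lxml import etree

source = etree.fromstring("<A><B>something</B><B>something else</B></A>")

stylesheet = etree.fromstring("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dyn="http://exslt.org/dynamic"
    extension-element-prefixes="dyn">
  <xsl:param name="path_of_B"/>
  <xsl:template match="/A">
    <C>
      <xsl:for-each select="dyn:evaluate($path_of_B)">
        <D><xsl:value-of select="."/></D>
      </xsl:for-each>
    </C>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)

# strparam() wraps the value as an XSLT string; a bare "B" would instead be
# evaluated as an XPath against the document root before the template runs.
result = transform(source, path_of_B=etree.XSLT.strparam("B"))

texts = [d.text for d in result.getroot()]
print(texts)
```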

If you want to parametrize both your input and output element names you could do something like this.
Although this method would not work well if your source XML's structure is not always the same.
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="e1_input" select="'A'"/>
<xsl:param name="e1_output" select="'A_OUT'"/>
<xsl:param name="e2_input" select="'B'"/>
<xsl:param name="e2_output" select="'B_OUT'"/>
<xsl:template match="/">
<xsl:for-each select="*[name()=$e1_input]">
<xsl:element name="{$e1_output}">
<xsl:for-each select="*[name()=$e2_input]">
<xsl:element name="{$e2_output}">
<xsl:apply-templates/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
See it working here : https://xsltfiddle.liberty-development.net/aKZkh9

Related

Find text across multiple lines to replace it with python (xml)

I have the following XML file:
<?xml version='1.0' encoding='utf8'?>
<Products>
<item type="dict">
<id type="int">37475</id>
<name type="str">something_something</name>
<slug type="str">something_something</slug>
<permalink type="str">something_something</permalink>
<date_created type="str">date</date_created>
<date_created_gmt type="str">date</date_created_gmt>
<date_modified type="str">date</date_modified>
<date_modified_gmt type="str">date</date_modified_gmt>
<type type="str">simple</type>
<status type="str">publish</status>
<featured type="bool">False</featured>
<catalog_visibility type="str">visible</catalog_visibility>
<description type="str">something_something</description>
</item>
I started with a JSON that I converted to a XML file so all of the products in that file start with the <item type="dict"> tag, which is not what I want. I would like for all of the products to be enclosed in a <product> tag.
To fix this issue I am doing the following:
tree = ET.ElementTree(root)
xmlstr = ET.tostring(root, encoding='utf8', method='xml') #xml of each product to string so that it can be edited
finalstr = xmlstr.decode("utf-8").replace(' />','') #remove wrong part
finalstr = finalstr.replace('<item type="dict"> <id type="int">','<product> <id type="int">')
This works for other problems in my XML file, but only when they are on one line.
My question is how do I select two or more lines so that I can replace them?
Desired output:
<?xml version='1.0' encoding='utf8'?>
<Products>
<product>
<id type="int">37475</id>
<name type="str">something_something</name>
<slug type="str">something_something</slug>
<permalink type="str">something_something</permalink>
<date_created type="str">date</date_created>
<date_created_gmt type="str">date</date_created_gmt>
<date_modified type="str">date</date_modified>
<date_modified_gmt type="str">date</date_modified_gmt>
<type type="str">simple</type>
<status type="str">publish</status>
<featured type="bool">False</featured>
<catalog_visibility type="str">visible</catalog_visibility>
<description type="str">something_something</description>
</product>
It should be possible with a regular expression:
import re
finalstr = re.sub(r'<item type="dict">\s*<id type="int">', '<product>\n<id type="int">', finalstr)
This lets you match across more than one line: the \s* part between the XML nodes matches any amount of whitespace, including newlines.
Read more about re.sub here: https://docs.python.org/3/library/re.html
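As a self-contained illustration of the regular-expression approach, here is a small sketch on a shortened sample. The sample data and the extra replace() for the closing tag are assumptions added for the example, not part of the answer above.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's file
xml_text = """<Products>
<item type="dict">
<id type="int">37475</id>
<name type="str">something_something</name>
</item>
</Products>"""

# \s matches spaces, tabs and newlines, so \s* spans the line break
# between the opening <item> tag and the <id> tag.
result = re.sub(
    r'<item type="dict">\s*<id type="int">',
    '<product>\n<id type="int">',
    xml_text,
)

# The closing tag sits on a single line, so a plain replace is enough.
result = result.replace("</item>", "</product>")

print(result)
```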
Here is the XSLT for the scenario. It follows the so-called modified identity transform pattern.
If you need the XML prolog, change omit-xml-declaration="yes" to omit-xml-declaration="no".
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity template -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<product>
<xsl:apply-templates />
</product>
</xsl:template>
</xsl:stylesheet>
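For comparison, the same rename can be sketched without XSLT by editing the parsed tree instead of the serialized string. This swaps in plain xml.etree.ElementTree and is an alternative to the stylesheet above, not a translation of it; the sample data is shortened and hypothetical. Like the XSLT, it drops the attributes of the renamed element and keeps its children unchanged.

```python
import xml.etree.ElementTree as ET

xml_text = """<Products>
<item type="dict">
<id type="int">37475</id>
<name type="str">something_something</name>
</item>
</Products>"""

root = ET.fromstring(xml_text)

# Collect the matches first so the tree is not modified while it is walked
for element in list(root.iter("item")):
    element.tag = "product"   # rename <item> to <product>
    element.attrib.clear()    # drop type="dict", as the XSLT template does

result = ET.tostring(root, encoding="unicode")
print(result)
```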

Putting Namespaces into Different XML Tags in Python

I have an xml file in tmp/Program.ev3p:
<?xml version="1.0" encoding="utf-8"?>
<SourceFile Version="1.0.2.10" xmlns="http://www.ni.com/SourceModel.xsd">
<Namespace Name="Project">
<VirtualInstrument IsTopLevel="false" IsReentrant="false" Version="1.0.2.0" OverridingModelDefinitionType="X3VIDocument" xmlns="http://www.ni.com/VirtualInstrument.xsd">
<FrontPanel>
<fpruntime:FrontPanelCanvas xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:fpruntime="clr-namespace:NationalInstruments.LabVIEW.FrontPanelRuntime;assembly=NationalInstruments.LabVIEW.FrontPanelRuntime" xmlns:Model="clr-namespace:NationalInstruments.SourceModel.Designer;assembly=NationalInstruments.SourceModel" x:Name="FrontPanel" Model:DesignerSurfaceProperties.CanSnapToObjects="True" Model:DesignerSurfaceProperties.SnapToObjects="True" Model:DesignerSurfaceProperties.ShowSnaplines="True" Model:DesignerSurfaceProperties.ShowControlAdorners="True" Width="640" Height="480" />
</FrontPanel>
<BlockDiagram Name="__RootDiagram__">
<StartBlock Id="n1" Bounds="0 0 70 91" Target="X3\.Lib:StartBlockTest">
<ConfigurableMethodTerminal>
<Terminal Id="Result" Direction="Output" DataType="Boolean" Hotspot="0.5 1" Bounds="0 0 0 0" />
</ConfigurableMethodTerminal>
<Terminal Id="SequenceOut" Direction="Output" DataType="NationalInstruments:SourceModel:DataTypes:X3SequenceWireDataType" Hotspot="1 0.5" Bounds="52 33 18 18" />
</StartBlock>
</BlockDiagram>
</VirtualInstrument>
</Namespace>
</SourceFile>
I am trying to modify it with the following code:
import xml.etree.ElementTree as ET
tree = ET.parse('tmp/Program.ev3p')
root = tree.getroot()
namespaces = {'http://www.ni.com/SourceModel.xsd': '' ,
'http://www.ni.com/VirtualInstrument.xsd':'',
'http://schemas.microsoft.com/winfx/2006/xaml/presentation':'',
'http://schemas.microsoft.com/winfx/2006/xaml':'x',
'clr-namespace:NationalInstruments.LabVIEW.FrontPanelRuntime;assembly=NationalInstruments.LabVIEW.FrontPanelRuntime':'fpruntime',
'clr-namespace:NationalInstruments.SourceModel.Designer;assembly=NationalInstruments.SourceModel': 'Model',
}
for uri, prefix in namespaces.items():
    ET._namespace_map[uri] = prefix
diagram = root[0][0][1]
elem = ET.Element('Data')
diagram.append(elem)
tree.write('tmp/Program.ev3p',"UTF-8",xml_declaration=True)
After running the code, my xml file contains:
<?xml version='1.0' encoding='UTF-8'?>
<SourceFile xmlns="http://www.ni.com/SourceModel.xsd" xmlns="http://www.ni.com/VirtualInstrument.xsd" xmlns:Model="clr-namespace:NationalInstruments.SourceModel.Designer;assembly=NationalInstruments.SourceModel" xmlns:fpruntime="clr-namespace:NationalInstruments.LabVIEW.FrontPanelRuntime;assembly=NationalInstruments.LabVIEW.FrontPanelRuntime" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" Version="1.0.2.10">
<Namespace Name="Project">
<VirtualInstrument IsReentrant="false" IsTopLevel="false" OverridingModelDefinitionType="X3VIDocument" Version="1.0.2.0">
<FrontPanel>
<fpruntime:FrontPanelCanvas Height="480" Width="640" Model:DesignerSurfaceProperties.CanSnapToObjects="True" Model:DesignerSurfaceProperties.ShowControlAdorners="True" Model:DesignerSurfaceProperties.ShowSnaplines="True" Model:DesignerSurfaceProperties.SnapToObjects="True" x:Name="FrontPanel" />
</FrontPanel>
<BlockDiagram Name="__RootDiagram__">
<StartBlock Bounds="0 0 70 91" Id="n1" Target="X3\.Lib:StartBlockTest">
<ConfigurableMethodTerminal>
<Terminal Bounds="0 0 0 0" DataType="Boolean" Direction="Output" Hotspot="0.5 1" Id="Result" />
</ConfigurableMethodTerminal>
<Terminal Bounds="52 33 18 18" DataType="NationalInstruments:SourceModel:DataTypes:X3SequenceWireDataType" Direction="Output" Hotspot="1 0.5" Id="SequenceOut" />
</StartBlock>
<Data /></BlockDiagram>
</VirtualInstrument>
</Namespace>
</SourceFile>
I need the namespaces to be in the tags they were registered in the original file instead of having all of them inside SourceFile, is it possible to achieve so in python?
The undocumented ElementTree._namespace_map[uri] = prefix assignment is the approach from older ElementTree versions (before 1.3); the current, documented equivalent is ElementTree.register_namespace(prefix, uri). But even that method does not resolve the root issue: the docs emphasize that the registration applies globally and replaces any previous namespace or prefix:
xml.etree.ElementTree.register_namespace(prefix, uri)
Registers a namespace prefix. The registry is global, and any existing
mapping for either the given prefix or the namespace URI will be
removed. prefix is a namespace prefix. uri is a namespace uri. Tags
and attributes in this namespace will be serialized with the given
prefix, if at all possible.
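The effect can be reproduced on a tiny document. The sketch below uses made-up URIs and prefixes (urn:outer, urn:inner) purely for illustration: even though the inner namespace is declared on a nested element in the input, ElementTree writes every declaration on the root element when serializing.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level document: the inner default namespace is declared
# on <b>, not on the root element.
xml_text = '<a xmlns="urn:outer"><b xmlns="urn:inner"><c/></b></a>'

# The registry is global: these prefixes apply to every later serialization.
ET.register_namespace("outer", "urn:outer")
ET.register_namespace("inner", "urn:inner")

root = ET.fromstring(xml_text)
out = ET.tostring(root, encoding="unicode")

# Split the output into the root start tag and everything after it
first_tag = out[: out.index(">") + 1]
rest = out[len(first_tag):]

print(first_tag)
print(rest)
```

Both xmlns declarations end up in the root start tag and none remain on the nested element, which is the behaviour the question describes.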
To achieve your desired result, and because your XML is a bit complex with multiple default and non-default namespaces, consider XSLT, the special-purpose language for transforming XML files. Python can run XSLT 1.0 scripts with the third-party module lxml (not the built-in etree). Additionally, XSLT is portable, so the very same code can run in other languages (Java, C#, PHP, VB) and dedicated processors (e.g., Saxon, Xalan).
Specifically, you can use a temporary prefix like doc to map the default namespace of lowest level parent, VirtualInstrument and use this prefix to identify the needed nodes. All other elements are copied over as is with the identity transform template. Also, because you are adding an element to the default namespace you can assign it with the xsl:element tag.
XSLT (save below as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://www.ni.com/VirtualInstrument.xsd">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc:BlockDiagram">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<xsl:element name="Data" namespace="http://www.ni.com/VirtualInstrument.xsd"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as ET
# LOAD XML AND XSL
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')
# TRANSFORM INPUT
transform = ET.XSLT(xslt)
newdom = transform(dom)
# OUTPUT RESULT TREE TO CONSOLE
print(newdom)
# SAVE RESULT TREE AS XML
with open('Output.xml','wb') as f:
    f.write(newdom)

Modify xml using python

I have an xml file already generated by python and it looks like this. It has multiple items.
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>title-name-xyz</title>
<link>http://dist.stage.xyzgauri.com/qa/partner/mac.xml</link>
<description>Most recent changes</description>
<language>en</language>
<item>
<title>Version 3.0.22.4</title>
<sparkle:releaseNotesLink>
https://dist.stage.xyzgauri.com.com/qa/partner/mac_notes.html
</sparkle:releaseNotesLink>
<pubDate>Thu, 12 Nov 2015 04:38:23 -0000</pubDate>
<enclosure
url="https://dist.stage.xyzgauri.com/qa/sandisk/InstallCloud.3.0.22.4.pkg"
sparkle:version="3.0.22.4"
sparkle:shortVersionString="3.0.22"
openlength="30455215"
type="application/octet-stream"
sparkle:dsaSignature="MCwCFHvf7peesvwR0AhRbZxTViLarxcjfd758mHPbnOW6wA=="
sparkle:status="live"
/>
<item>
<title>Version 3.0.10.4</title>
<sparkle:releaseNotesLink>
http://dist.stage.xyzgauri.com/qa/partner/mac_notes.html
</sparkle:releaseNotesLink>
<pubDate>Tue, 03 Nov 2015 04:31:18 -0000</pubDate>
<enclosure
url="http://dist.stage.xyzgauri.com/qa/partner/InstallCloud.3.0.10.4.pkg"
sparkle:version="3.0.10.4"
sparkle:shortVersionString="3.0.10"
openlength="29709636"
type="application/octet-stream"
sparkle:dsaSignature="MCwCFDPvLPr7lYkrx5L5XCDbhXYqrFkGzLtLePK6ng=="
sparkle:status="live"
/>
I need to use python to change the sparkle:status from "live" to "expired" for the older version 3.0.10.4. This xml is later pushed to S3.
I am a newbie to python and hence wondering how to implement this. I can even create a whole new jenkins jobs to get this xml and modify it and then push to S3.
Any help is appreciated.
Thanks.
Consider an XSLT solution using the lxml package, which avoids the explicit loop over elements that an XPath solution may require. The script runs an identity transform to copy all nodes and attributes as is, then applies a template to every @sparkle:status attribute whose sibling attribute is @sparkle:version='3.0.10.4'. Note that the sparkle namespace has to be declared in the XSLT header.
The code below loads the XSLT script from a string, but you can also parse it from an external file (saved as .xsl or .xslt), as you do with your XML file.
import lxml.etree as ET
# LOAD XML AND XSL
dom = ET.parse('Input.xml')
xslstr='''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="enclosure[@sparkle:version='3.0.10.4']/@sparkle:status">
<xsl:attribute name="sparkle:status">expired</xsl:attribute>
</xsl:template>
</xsl:transform>'''
xslt = ET.fromstring(xslstr)
# TRANSFORM XML
transform = ET.XSLT(xslt)
newdom = transform(dom)
# SAVE OUTPUT
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode("utf-8"))
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
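For comparison, the looping alternative that the answer mentions can be sketched with the standard library alone. This is not the XSLT approach above; it uses xml.etree.ElementTree on a shortened, hypothetical sample, and the helper name q is made up for the example.

```python
import xml.etree.ElementTree as ET

SPARKLE = "http://www.andymatuschak.org/xml-namespaces/sparkle"

# Keep the "sparkle" prefix when the tree is serialized again
ET.register_namespace("sparkle", SPARKLE)


def q(name):
    """Return the fully qualified attribute name in the sparkle namespace."""
    return "{%s}%s" % (SPARKLE, name)


# Shortened, well-formed sample in the shape of the feed from the question
xml_text = """<rss version="2.0" xmlns:sparkle="%s">
<channel>
<item><enclosure sparkle:version="3.0.22.4" sparkle:status="live"/></item>
<item><enclosure sparkle:version="3.0.10.4" sparkle:status="live"/></item>
</channel>
</rss>""" % SPARKLE

root = ET.fromstring(xml_text)

# Visit every <enclosure> and expire only the requested version
for enclosure in root.iter("enclosure"):
    if enclosure.get(q("version")) == "3.0.10.4":
        enclosure.set(q("status"), "expired")

statuses = [e.get(q("status")) for e in root.iter("enclosure")]
print(statuses)
```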

XSLT transformation gives only root element python lxml

Currently doing XML-XSLT transformation using following code.
from lxml import etree
xmlRoot = etree.parse('path/abc.xml')
xslRoot = etree.parse('path/abc.xsl')
transform = etree.XSLT(xslRoot)
newdom = transform(xmlRoot)
print(etree.tostring(newdom, pretty_print=True))
The code above runs without errors, but the output contains only the root element rather than the whole XML content. When I run the transformation on the same XML and XSL files with Altova, it works fine. Is the syntax for printing the whole XML different, or can you spot any errors here?
XML content :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<slide name="slide7.xml" nav_lvl_1="Solutions" nav_lvl_2="Value Map" page_number="7">
<title>Retail Value Map</title>
<Subheadline>Retail </Subheadline>
</slide>
</root>
XSL content:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8" indent="yes" method="xml" standalone="yes" version="1.0"/>
<xsl:template match="/">
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
<xsl:for-each select="root/slide">
<xsl:choose>
<xsl:when test="@nav_lvl_1='Solutions'">
<xsl:if test="@nav_lvl_2='Value Map'">
<p:txBody>
<a:p>
<a:r>
<a:rPr lang="en-US" dirty="0" smtClean="0"/>
<a:t>
<xsl:value-of select="title"/>
</a:t>
</a:r>
<a:endParaRPr lang="en-US" dirty="0"/>
</a:p>
</p:txBody>
</xsl:if>
</xsl:when>
</xsl:choose>
</xsl:for-each>
</p:sld>
</xsl:template>
</xsl:stylesheet>
Current output :
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"/>

Sorting XML files

Is it possible to sort XML files like the following:
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
Which would become
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
<model name="ford">
<driver>Alice</driver>
<driver>Bob</driver>
</model>
That is, the outermost elements are sorted first, then the second outermost, and so on.
They'd need to be sorted by element name first.
This is a refinement of Kirill's solution. I think it better reflects the stated requirements, and it avoids the type error XSLT 2.0 will give you if the sort key contains more than one value (but it still works on 1.0).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="*">
<xsl:sort select="(@name | text())[1]"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Try this XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()">
<xsl:sort select="text() | @*"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
You can sort nodes by removing them from the parent node and re-inserting them in the intended order. For example:
from lxml import etree

def sort_tree(tree):
    """Recursively sort the given etree in place."""
    for child in tree:
        sort_tree(child)
    # Work on the sorted copy: removing children while iterating over the
    # live element skips nodes. Elements without text sort as "".
    sorted_children = sorted(tree, key=lambda n: n.text or "")
    for child in sorted_children:
        tree.remove(child)
    for child in reversed(sorted_children):
        tree.insert(0, child)

tree = etree.fromstring(YOUR_XML)
sort_tree(tree)
print(etree.tostring(tree, pretty_print=True).decode())
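A runnable variant of the same idea on the sample from the question, sketched with the standard library's xml.etree.ElementTree. The sort key is an assumption chosen to match the stated requirement: element name first, then the name attribute if present, otherwise the text. The wrapping root element is added only so the sample is a single well-formed document.

```python
import xml.etree.ElementTree as ET


def sort_key(node):
    """Sort by tag, then by the name attribute or, failing that, the text."""
    return (node.tag, node.get("name") or (node.text or "").strip())


def sort_tree(node):
    """Recursively sort the children of node in place."""
    for child in node:
        sort_tree(child)
    # Slice assignment replaces the child list with the sorted copy
    node[:] = sorted(node, key=sort_key)


root = ET.fromstring("""<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>""")

sort_tree(root)

models = [m.get("name") for m in root]
drivers = [[d.text for d in m] for m in root]
print(models, drivers)
```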
You don't need to sort the entire XML DOM.
Instead, collect the required nodes into a list and sort that. Since the sorted order is needed during processing rather than in the file, it is better done at run time.
For example, using minidom:
from xml.dom import minidom
document = """\
<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model><model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>
"""
document = minidom.parseString(document)
elements = document.getElementsByTagName("model")
# Sort on the attribute's string value; Attr objects themselves are not orderable
elements.sort(key=lambda element: element.getAttribute('name'))
