lxml preserves attributes order?

I was writing my application using minidom, but minidom does not preserve attribute order (it sorts attributes alphabetically), so I decided to use lxml instead.
However in the following lines of code I'm not getting the desired order:
import lxml.etree as ET
SATNS = "link_1"
NS = "link_2"
location_attribute = '{%s}schemaLocation' % NS
root = ET.Element('{%s}Catalogo' % SATNS, nsmap={'catalogocuentas': SATNS},
                  attrib={location_attribute: 'http://www.sat.gob.mx/catalogocuentas'},
                  Ano="2014", Mes="02", TotalCtas="219", RFC="ALF040329CX6", Version="1.0")
print(ET.tostring(root, pretty_print=True))
This is what I'm expecting to get:
<catalogocuentas:Catalogo xmlns:catalogocuentas="link_1"
xmlns:xsi="link_2" xsi:schemaLocation="http://www.sat.gob.mx/catalogocuentas"
Ano="2014" Mes="02" TotalCtas="219" RFC="XXX010101XXX" Version="1.0">
</catalogocuentas:Catalogo>
That is the order in which I passed the attributes to:
root = ET.Element(...)
But this is what I get instead, with the attributes in no particular order:
<catalogocuentas:Catalogo xmlns:catalogocuentas="link_1"
xmlns:xsi="link_2" RFC="ALF040329CX6" Version="1.0"
Mes="02" xsi:schemaLocation="http://www.sat.gob.mx/catalogocuentas" Ano="2014" TotalCtas="219">
</catalogocuentas:Catalogo>
Is there a way to fix this problem?
Thanks in advance!!

Before Python 3.7, dictionaries in Python were unordered. Keyword arguments are passed to functions in a dictionary, conventionally named **kwargs, and before Python 3.6 their order was lost. The function could not possibly know what order the arguments to ET.Element came in.
As stated in this question, there isn't really any way to get this done. XML doesn't care about attribute order, so there isn't really any good reason to do it.

Related

ElementTree namespace dictionary not working with find() or findall()

I'm stumped on how to use the ElementTree namespace dictionary with subsequent find() and findall() calls using the documented syntax:
A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search
functions:
ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)
The issue I'm having is that if I use the syntax from that doc, passing the "ns" dictionary as the second argument to find() or findall(), I get an empty list. If I type out the full namespace without passing the second argument, it returns all of the expected elements.
I've defined my namespace dictionary as such:
ns = {'ws':'{urn:com.workday/workersync}'}
And here is the ElementTree and root setup:
xmlparser = ET.parse(xmlfile)
xmlroot = xmlparser.getroot()
Here is what I get when I try to use the dictionary shortcut syntax noted in the docs:
>>> xmlroot.findall('ws:Worker', ns)
[]
Just an empty list... Here is what I get if I type out the namespace in the call:
xmlroot.findall('{urn:com.workday/workersync}Worker')
[<Element '{urn:com.workday/workersync}Worker' at 0x03220A78>, <Element '{urn:com.workday/workersync}Worker' at 0x0322D8C0>]
That returns the expected 2 elements in my sample file.
Here is what the top of my sample file looks like for reference:
<?xml version="1.0" encoding="UTF-8"?>
<ws:Worker_Sync xmlns:ws="urn:com.workday/workersync" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ws:Header>
<ws:Version>34.0</ws:Version>
<ws:Prior_Entry_Time>2020-07-04T21:40:25.822-07:00</ws:Prior_Entry_Time>
<ws:Current_Entry_Time>2020-07-04T22:03:47.458-07:00</ws:Current_Entry_Time>
<ws:Prior_Effective_Time>2020-07-04T00:00:00.000-07:00</ws:Prior_Effective_Time>
<ws:Current_Effective_Time>2020-07-05T00:00:00.000-07:00</ws:Current_Effective_Time>
<ws:Full_File>true</ws:Full_File>
<ws:Document_Retention_Policy>30</ws:Document_Retention_Policy>
<ws:Worker_Count>2</ws:Worker_Count>
</ws:Header>
<ws:Worker>
*<snipped rest of XML data>*
The snipped XML data contains 2 <ws:Worker> elements with many subchildren under them.
I've been messing with this for longer than I'd care to admit. I feel like I'm missing something incredibly obvious, because to my eyes my code looks like every example I've found online and the example code in the docs.
Please help!

Remove the curly brackets from the URI string. The namespace dictionary should look like this:
ns = {'ws': 'urn:com.workday/workersync'}
Another option is to use a wildcard for the namespace. This is supported for find() and findall() since Python 3.8:
print(xmlroot.findall('{*}Worker'))
Output:
[<Element '{urn:com.workday/workersync}Worker' at 0x033E6AC8>]
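To illustrate the difference between the two dictionaries, here is a minimal sketch using the standard-library xml.etree.ElementTree. The SAMPLE string is a cut-down, made-up stand-in for the Worker_Sync file (two empty ws:Worker elements are assumed purely for illustration), and bad_ns / good_ns are names introduced here.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the Worker_Sync file (illustrative data only).
SAMPLE = """<ws:Worker_Sync xmlns:ws="urn:com.workday/workersync">
  <ws:Header><ws:Worker_Count>2</ws:Worker_Count></ws:Header>
  <ws:Worker/>
  <ws:Worker/>
</ws:Worker_Sync>"""

xmlroot = ET.fromstring(SAMPLE)

# With braces in the value, the prefix expands to a URI that does not
# match the one declared in the document, so nothing is found.
bad_ns = {'ws': '{urn:com.workday/workersync}'}

# The prefix should map to the bare namespace URI.
good_ns = {'ws': 'urn:com.workday/workersync'}

bad_matches = xmlroot.findall('ws:Worker', bad_ns)
good_matches = xmlroot.findall('ws:Worker', good_ns)

print(len(bad_matches), len(good_matches))
```

The prefix used as the dictionary key does not have to match the prefix in the file; only the URI has to match.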

Setting and accessing namespaces in python lxml

I am writing a script that processes a rdf:skos file with python3 and lxml:
I learnt that I need to pass to the findall procedure the namespaces that the XML mentions. (Ok, strange, since the XML files lists these in the header, so this seems like an unnecessary step but anyway).
When calling
for concept in root.findall('.//skos:Concept', namespaces=root.nsmap):
that works, because a root.nsmap is constructed by lxml.
But then later in my code I also need to perform a test on xml:lang
for pl in concept.findall(".//skos:prefLabel[@xml:lang='en']", namespaces=root.nsmap):
and here python tells me
SyntaxError: prefix 'xml' not found in prefix map
Ok, true, in my skos file there is no extra declaration for the xml namespace. So I try to add it to the root.nsmap dict
root.nsmap['xml'] = "http://www.w3.org/XML/1998/namespace"
but that too doesn't work
nsmap = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'uneskos': 'http://purl.org/umu/uneskos#', 'iso-thes': 'http://purl.org/iso25964/skos-thes#', 'dcterms': 'http://purl.org/dc/terms/', 'skos': 'http://www.w3.org/2004/02/skos/core#', 'rdfs': 'http://www.w3.org/2000/01/rdf-schema#'}
Seems I am not allowed to modify the root.nsmap?
Does anyone have an idea how this is done? I have processed tons of XML in the past with Perl XML::Twig, which is very comfortable, and I assume the Python community has (at least) similarly comfortable ways to do that ... but how?
Any hint appreciated.

Modifying root.nsmap has no effect. But you can create another dictionary and modify that one. Example:
from lxml import etree
doc = """
<root xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<skos:prefLabel xml:lang='en'>FOO</skos:prefLabel>
<skos:prefLabel xml:lang='de'>BAR</skos:prefLabel>
</root>"""
root = etree.fromstring(doc)
nsmap = root.nsmap
nsmap["xml"] = "http://www.w3.org/XML/1998/namespace"
en = root.find(".//skos:prefLabel[@xml:lang='en']", namespaces=nsmap)
print(en.text)
Output:
FOO

python xml xpath query using tag and attribute with ns

I must be doing something inherently wrong here; every example I've seen and searched for on SO seems to suggest this would work.
I'm trying to use an XPath search with the lxml etree library to parse a Garmin TCX file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
<Workouts>
<Workout Sport="Biking">
<Name>3P2 WK16 - 3</Name>
<Step xsi:type="Step_t">
<StepId>1</StepId>
<Name>[MP19]6:28-6:38</Name>
<Duration xsi:type="Distance_t">
<Meters>13000</Meters>
</Duration>
<Intensity>Active</Intensity>
<Target xsi:type="Speed_t">
<SpeedZone xsi:type="PredefinedSpeedZone_t">
<Number>2</Number>
</SpeedZone>
</Target>
</Step>
......
</Workout>
</Workouts>
</TrainingCenterDatabase>
I'd like to return the SpeedZone Element only where the type is PredefinedSpeedZone_t. I thought I'd be able to do:
root = ET.parse(open('file.tcx'))
xsi = {'xsi': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}
for speed_zone in root.xpath(".//xsi:SpeedZone[@xsi:type='PredefinedSpeedZone_t']", namespaces=xsi):
    print(speed_zone)
Though this doesn't seem to be the case. I've tried lots of combinations of removing/adding namespaces and to no avail. If I remove the attribute search and leave it as ".//xsi:SpeedZone" then this does return:
<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x2595188>
as I'd expect.
I guess I could do it inside the for loop but it just feels like it should be possible on one line!

I'm a bit late, but the other answers are confusing IMHO.
In the Python code in the question and in the two other answers, the xsi prefix is bound to the http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 URI. But in the XML document with the Garmin data, xsi is bound to http://www.w3.org/2001/XMLSchema-instance.
Since there are two namespaces at play here, I think the following code gives a clearer picture of what's going on. The namespace associated with the tcd prefix is the default namespace.
from lxml import etree
NSMAP = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
"xsi": "http://www.w3.org/2001/XMLSchema-instance"}
root = etree.parse('file.tcx')
for speed_zone in root.xpath(".//tcd:SpeedZone[@xsi:type='PredefinedSpeedZone_t']",
                             namespaces=NSMAP):
    print(speed_zone)
Output:
<Element {http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}SpeedZone at 0x25b7e18>
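The same two-namespace idea can be tried without a file on disk. The sketch below swaps in the standard-library xml.etree.ElementTree and its findall() instead of lxml's xpath(), and it uses a trimmed copy of the TCX document from the question embedded as a string. The TCX and numbers names are introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Garmin document from the question.
TCX = """<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Workouts>
    <Workout Sport="Biking">
      <Step xsi:type="Step_t">
        <Target xsi:type="Speed_t">
          <SpeedZone xsi:type="PredefinedSpeedZone_t"><Number>2</Number></SpeedZone>
        </Target>
      </Step>
    </Workout>
  </Workouts>
</TrainingCenterDatabase>"""

# One prefix per namespace: tcd for the elements (the document's default
# namespace), xsi for the type attribute.
NSMAP = {"tcd": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2",
         "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

root = ET.fromstring(TCX)
zones = root.findall(".//tcd:SpeedZone[@xsi:type='PredefinedSpeedZone_t']", NSMAP)
numbers = [zone.find("tcd:Number", NSMAP).text for zone in zones]
print(numbers)
```

Elements inherit the default namespace, but unprefixed attributes do not, which is why the element test needs the tcd prefix while the attribute test needs xsi.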

One way to work around this is to avoid specifying the attribute name and use * instead:
.//xsi:SpeedZone[@*='PredefinedSpeedZone_t']
Another option (not as nice as the previous one) is to get all the SpeedZone tags and check the attribute value in the loop:
# root is the ElementTree returned by parse(), so take nsmap from its root element
attribute_name = '{%s}type' % root.getroot().nsmap['xsi']
for speed_zone in root.xpath(".//xsi:SpeedZone", namespaces=xsi):
    if speed_zone.attrib.get(attribute_name) == 'PredefinedSpeedZone_t':
        print(speed_zone)
Hope that helps.

If all else fails you can still use
".//xsi:SpeedZone[@*[name() = 'xsi:type' and . = 'PredefinedSpeedZone_t']]"
Using name() is not as nice as directly addressing the namespaced attribute, but at least etree understands it.

How do you create a non-nested xml element using Python's lxml.objectify?

My current code is
xml_obj = lxml.objectify.Element('root_name')
xml_obj[root_name] = str('text')
lxml.etree.tostring(xml_obj)
but this creates the following xml:
<root_name><root_name>text</root_name></root_name>
In the application I am using this for I could easily use text substitution to solve this problem, but it would be nice to know how to do it using the library.

I'm not that familiar with objectify, but I don't think that's the way it's intended to be used. It represents objects so that a node at any given level is, say, a class name, and its subnodes are field names (with types) and values. The normal way to use it would be something more like this:
xml_obj = lxml.objectify.Element('xml_obj')
xml_obj.root_path = 'text'
etree.dump(xml_obj)
<xml_obj xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="TREE">
  <root_path py:pytype="str">text</root_path>
</xml_obj>
What you want would be way easier to do with etree:
xml_obj = lxml.etree.Element('root_path')
xml_obj.text = 'text'
etree.dump(xml_obj)
<root_path>text</root_path>
If you really need it to be in objectify, it looks like while you shouldn't mix directly, you can use tostring to generate XML, then objectify.fromstring to bring it back. But probably, if this is what you want, you should just use etree to generate it.

I don't think you can write data into the root element. You may need to create a child element like this:
xml_obj = lxml.objectify.Element('root_name')
xml_obj.child_name = str('text')
lxml.etree.tostring(xml_obj)

XML creation from a Dictionary in Python

I am quite new to XML as well as Python, so please overlook any mistakes. I am trying to unpack a dictionary straight into XML format. My code fragment is as follows:
from xml.dom.minidom import Document
def make_xml(dictionary):
    doc = Document()
    result = doc.createElement('result')
    doc.appendChild(result)
    for key in dictionary:
        attrib = doc.createElement(key)
        result.appendChild(attrib)
        value = doc.createTextNode(dictionary[key])
        attrib.appendChild(value)
    print doc
I expected an answer of the format
<xml>
<result>
<attrib#1>value#1</attrib#1>
...
However all I am getting is
<xml.dom.minidom.Document instance at 0x01BE6130>
Please help

You have not checked the docs: http://docs.python.org/library/xml.dom.minidom.html
Printing the Document object only shows its repr. Look at the toxml() or toprettyxml() methods, which return the XML as a string.
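As a rough sketch of that suggestion (written for Python 3, and assuming every dictionary key is a valid XML element name), the function from the question can return the serialized document instead of printing the Document object. The sample dictionary and the xml_text name are made up for illustration.

```python
from xml.dom.minidom import Document

def make_xml(dictionary):
    """Return the dictionary as an XML string, one child element per key."""
    doc = Document()
    result = doc.createElement('result')
    doc.appendChild(result)
    for key, value in dictionary.items():
        child = doc.createElement(key)  # assumes key is a valid element name
        # createTextNode expects a string, so convert other value types.
        child.appendChild(doc.createTextNode(str(value)))
        result.appendChild(child)
    # toprettyxml() returns indented text; toxml() returns it on one line.
    return doc.toprettyxml(indent="  ")

xml_text = make_xml({'name': 'Alice', 'city': 'Paris'})
print(xml_text)
```

Keys such as attrib#1 from the expected output in the question are not valid element names, so real data may need its keys sanitized first.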

You can always use a library like xmler which easily takes a python dictionary and converts it to xml. Full disclosure, I created the package, but I feel like it will probably do what you need.
Also feel free to take a look at the source.
