I have looked at the documentation here:
http://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.SubElement
The parent and tag arguments seem clear enough, but what format do I put the attribute name and value in? I couldn't find any previous example. What format is the **extra argument?
I receive an error when I try to call SubElement itself, saying that it is not defined. Thank you.
SubElement is a function of the ElementTree module (not a method of Element) that lets you create child elements for an Element, so call it through the module, e.g. ET.SubElement(...).
attrib takes a dictionary containing the attributes of the element you want to create.
**extra is used for additional keyword arguments, those will be added as attributes to the Element.
Example:
>>> import xml.etree.ElementTree as ET
>>>
>>> parent = ET.Element("parent")
>>>
>>> myattributes = {"size": "small", "gender": "unknown"}
>>> child = ET.SubElement(parent, "child", attrib=myattributes, age="10" )
>>>
>>> ET.dump(parent)
<parent><child age="10" gender="unknown" size="small" /></parent>
>>>
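Passing a dict as attrib is also how you set attributes whose names are not valid Python identifiers, which cannot be written as **extra keywords. A small sketch; the attribute names "class" and "data-id" are made up for illustration:

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")
# "class" is a Python keyword and "data-id" contains a hyphen, so neither
# can be passed as a keyword argument; the attrib dict accepts both.
# The extra keyword (age) is merged into the same attribute mapping.
child = ET.SubElement(parent, "child", {"class": "note", "data-id": "7"}, age="10")
print(child.attrib)
```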
If you look further down on the same page you linked to, where it covers class xml.etree.ElementTree.Element(tag, attrib={}, **extra), it tells you how the extra arguments work: each extra keyword becomes an attribute of the element, e.g.:
import xml.etree.ElementTree as ET
a = ET.Element('root-node', note='This extra keyword becomes an attribute')
b = ET.SubElement(a, 'nested-node-1')
c = ET.SubElement(a, 'nested-node-2')
d = ET.SubElement(c, 'innermost-node')
ET.dump(a)
This also shows you how SubElement works: you simply tell it which element (which can itself be a subelement) you want to attach the new one to. In future, include some code too, so it's easier to see what you're doing and what you want.
I'm trying to make a flexible argument parser which takes strings, finds arguments within them and creates a dictionary like this:
message = "--mass=12"
physics = comparse(message, "mass", "int", 10, "this is your mass attribute")
Produces: {'mass': 12}
I'm unable to update and add more arguments/keys (like if I wanted to add a 'vel' variable) to the dictionary. Unfortunately, I'm new to Python classes and although the parser detects the first argument, I'm unable to add to the dictionary. Here's my program:
import shlex
class comparse(object):
    def __init__(self, message, attribute, var_type, default, help_txt):
        self.data = {} #This is the "data" dictionary that this function will ultimately return.
        self.message = message
        self.attribute = attribute
        self.var_type = var_type
        self.default = default
        self.help_txt = help_txt
        #Remove unwanted symbols (e.g. "=", ":", etc.)
        self.message = self.message.replace("=", " ")
        self.message = self.message.replace(":", " ")
        self.args = shlex.split(self.message)
    def parse(self):
        try:
            options = {k.strip('-'): True if v.startswith('-') else v
                       for k,v in zip(self.args, self.args[1:]+["--"]) if k.startswith('-') or k.startswith('')}
            if (self.var_type == "int"):
                self.data[self.attribute] = int(options[self.attribute]) #Updates if "attribute" exists, else adds "attribute".
            if (self.var_type == "str"):
                self.data[self.attribute] = str(options[self.attribute]) #Updates if "attribute" exists, else adds "attribute".
        except:
            if self.attribute not in self.message:
                if (self.var_type == "int"):
                    self.data[self.attribute] = int(self.default) #Updates if "x" exists, else adds "x".
                if (self.var_type == "str"):
                    self.data[self.attribute] = str(self.default) #Updates if "x" exists, else adds "x".
        return self.data
    def add_argument(self):
        self.data.update(self.data)
message = "-mass: 12 --vel= 18"
physics = comparse(message, "mass", "int", 10, "this is your mass attribute")
comparse.add_argument(message, "vel", "int", 10, "this is your velocity attribute")
print (physics.parse())
The comparse.add_argument method doesn't work. There's obviously something I'm doing wrong and classes generally confuse me! Could someone point out what's wrong with my program?
I'm a little confused about how your class is designed. I've answered the question in a more holistic way, with suggestions about how you should maybe redesign your class to achieve your goal better.
Typically, when you initialize a class (your parser), you pass it all the data it needs to do its work. It seems, however, that your goal is to create a generic physics parser, physics = comparse(), and then add possible arguments like mass and velocity to it. You then give the physics parser a message string like message = "-mass: 12 --vel= 18", which it should parse to extract the arguments. This suggests that the end of your code snippet, which is currently
message = "-mass: 12 --vel= 18"
physics = comparse(message, "mass", "int", 10, "this is your mass attribute")
comparse.add_argument(message, "vel", "int", 10, "this is your velocity attribute")
print (physics.parse())
should look like so:
message = "-mass: 12 --vel= 18"
# create a parser
physics = comparse()
# tell the parser what arguments it should accept
physics.add_argument("vel", "int", 10, "this is your velocity attribute")
physics.add_argument("mass", "int", 10, "this is your mass attribute")
# give the parser a specific message for which it should extract arguments
print (physics.parse(message))
This code snippet would create a parser, tell the parser what sorts of arguments it should accept (like velocity and mass), and then extract those arguments from a specific string message.
This design adheres better to object-oriented programming principles. The benefit is that you create a physics parser that can be reused, and then ask it to parse strings that it does not store in its own properties (no self.message). This way, you can make additional calls like physics.parse("--mass: 40 --vel=15") and reuse the same physics parser.
If this design fits your intention more accurately, I would modify the code in the following ways to adhere to it:
Modify your __init__ method so that it takes no parameters (other than self).
Since you store several arguments in your class, instead of having self.attribute, self.var_type, self.default and self.help_txt be single variables, I would make them lists to which you append the attribute name, variable type, default and help text of EACH argument. Initialize each of these as an empty list in __init__, like so: self.defaults = []. I would also rename each one to show that it holds a list rather than a single value: attributes, var_types, defaults, help_txts.
Modify add_argument to be the following:
def add_argument(self, attribute, var_type, default, help_txt):
    self.attributes.append(attribute)
    self.var_types.append(var_type)
    self.defaults.append(default)
    self.help_txts.append(help_txt)
Modify parse to take message as a parameter, remove its unwanted symbols, perform the split, and then execute its logic for each of the arguments you set up with add_argument.
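A minimal sketch of the redesigned class, assuming the list-based layout described above and that only "int" and "str" types are needed. The token-pairing loop is a simplified rewrite of the original dict comprehension, not the asker's exact logic: it pairs each -name/--name token with the token that follows it and falls back to the registered default when an argument is missing.

```python
import shlex


class comparse(object):
    """Reusable parser: register arguments first, then parse any number of messages."""

    def __init__(self):
        # One list per property; index i describes the i-th registered argument.
        self.attributes = []
        self.var_types = []
        self.defaults = []
        self.help_txts = []

    def add_argument(self, attribute, var_type, default, help_txt):
        self.attributes.append(attribute)
        self.var_types.append(var_type)
        self.defaults.append(default)
        self.help_txts.append(help_txt)

    def parse(self, message):
        # Treat "=" and ":" as separators, as the original code did.
        message = message.replace("=", " ").replace(":", " ")
        args = shlex.split(message)
        options = {}
        # Pair each "-name" / "--name" token with the token after it.
        # Simplification: a following token starting with "-" counts as
        # "no value given", so negative numbers are not supported here.
        for key, value in zip(args, args[1:] + ["--"]):
            if key.startswith("-") and not value.startswith("-"):
                options[key.lstrip("-")] = value
        converters = {"int": int, "str": str}
        data = {}
        for attribute, var_type, default in zip(self.attributes, self.var_types, self.defaults):
            # Use the registered default when the message omits the argument.
            raw = options.get(attribute, default)
            data[attribute] = converters[var_type](raw)
        return data


physics = comparse()
physics.add_argument("vel", "int", 10, "this is your velocity attribute")
physics.add_argument("mass", "int", 10, "this is your mass attribute")
print(physics.parse("-mass: 12 --vel= 18"))  # {'vel': 18, 'mass': 12}
print(physics.parse("--mass: 40"))           # {'vel': 10, 'mass': 40}
```

Because parse() builds a fresh dict on every call, the same physics object can be reused for any number of messages.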
Please comment if you have any questions, good luck!
In the following code from what you gave:
def add_argument(self):
    self.data.update(self.data)
add_argument does not take any arguments, but you called it like this:
comparse.add_argument(message, "vel", "int", 10, "this is your velocity attribute")
passing several arguments. This is the cause of the problem.
To fix it, modify the add_argument method to accept the parameters you want it to handle, and call it on your instance (physics.add_argument(...)) rather than on the class.
EDIT: based on your comment, the method should have been:
def add_argument(self, message, attribute, var_type, default, help_txt):
    data = self.parse(message, attribute, var_type, default, help_txt)
    self.data.update(data)
But here again, your parse method doesn't take any arguments, so modify it to accept all the arguments you need along with self.
I am getting a weird recurring error using AttrDict 2.0 on Python 2.7. The weird part is that transitive assignment seems to break, but only when using AttrDict.
What's happening is that I want to instantiate a new list on an object if it doesn't exist and then append data to it.
If I use AttrDict, the list somehow gets transformed into a tuple and I get an exception.
from attrdict import AttrDict
class Test(object):
    pass
try:
    for cls_ in [Test,AttrDict]:
        foo = cls_()
        print ("\ntesting with class %s" % (cls_))
        #this
        chk = foo.li = getattr(foo, "li", None) or []
        print(" type(chk):%s, id(chk):%s" % (type(chk),id(chk)))
        print(" type(foo.li):%s, id(foo.li):%s" % (type(foo.li),id(foo.li)))
        foo.li.append(3)
        print (" success appending with class %s: foo.li:%s" % (cls_, foo.li))
except (Exception,) as e:
    # pdb.set_trace()
    raise
Now check out the output, when I use the Test class vs when I use AttrDict.
testing with class <class '__main__.Test'>
type(chk):<type 'list'>, id(chk):4465207704
type(foo.li):<type 'list'>, id(foo.li):4465207704
success appending with class <class '__main__.Test'>: foo.li:[3]
With the custom Test class, as expected, chk and foo.li are both lists and have the same id. append works.
Looking at the pass using AttrDict, id does not match and foo.li is a tuple rather than a list.
testing with class <class 'attrdict.dictionary.AttrDict'>
type(chk):<type 'list'>, id(chk):4465207848
type(foo.li):<type 'tuple'>, id(foo.li):4464595080
Traceback (most recent call last):
File "test_attrdict2.py", line 25, in <module>
test()
File "test_attrdict2.py", line 18, in test
foo.li.append(3)
AttributeError: 'tuple' object has no attribute 'append'
Is attrdict assignment actually returning some kind of property/accessor object that gets changed the 2nd time you access it?
Took @abartnet's suggestion:
from attrdict import AttrDict
a = AttrDict()
a.li = []
print(a.li)
output:
()
OK, but even if that points to some weird behavior on AttrDict's end, how is it the transitive assignment does not assign the tuple as well?
reworked:
from attrdict import AttrDict
a = AttrDict()
b = a.li = []
print("a.li:", a.li)
print("b:",b)
output:
('a.li:', ())
('b:', [])
This is part of the automatic recursiveness of AttrDict, which is explained better in the inline help (which you can find here in the source) than in the README:
If a values which is accessed as an attribute is a Sequence-type (and is not a string/bytes), it will be converted to a _sequence_type with any mappings within it converted to Attrs.
In other words, in order to auto-convert any dict or other mappings recursively inside your AttrDict to AttrDict values when doing attribute access, it also converts all sequences to (by default) tuple. This is a little weird, but appears to be intentional and somewhat-documented behavior, not a bug.
>>> a = AttrDict()
>>> a._sequence_type
tuple
>>> a.li = []
>>> a.li
()
The more flexible AttrMap type lets you specify the sequence type, and documents that you can disable this recursive remapping stuff by passing None:
>>> a = AttrMap(sequence_type=None)
>>> a.li = []
>>> a.li
[]
But of course AttrMap isn't a dict (although it is a collections.abc.MutableMapping, and more generally it duck-types as a dict-like type).
OK, but even if that points to some weird behavior on AttrDict's end, how is it the transitive assignment does not assign the tuple as well?
Because that's not how chained assignment works. Oversimplifying a bit:
target1 = target2 = value
… is not equivalent to this:
target2 = value
target1 = target2
… but to this:
target2 = value
target1 = value
The best way to understand why that's true: targets aren't expressions, and therefore don't have values. Sure, often the exact same sequence of tokens would be valid as an expression elsewhere in the grammar, but that sequence of tokens never gets evaluated as an expression anywhere in an assignment statement—otherwise, simple things like d['spam'] = 'eggs' would have to raise an exception if d['spam'] didn't exist.
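To see this in action, here is a small sketch with a hypothetical Recorder class (not part of any library) that logs every attribute assignment made on it. Both targets receive the very same object, assigned from left to right, and neither target is read back in between.

```python
log = []


class Recorder(object):
    """Hypothetical helper: records every attribute assignment made on it."""

    def __init__(self, label):
        # Bypass our own __setattr__ so the label itself is not logged.
        object.__setattr__(self, "label", label)

    def __setattr__(self, name, value):
        log.append((self.label, name, value))
        object.__setattr__(self, name, value)


first, second = Recorder("first"), Recorder("second")
value = []
first.x = second.y = value
print(log)  # [('first', 'x', []), ('second', 'y', [])]
```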
Also, a.li = [] doesn't actually assign tuple([]) anywhere; it actually stores the [] internally, and does the tuple(…) later, when you try to access a.li. You can't really tell that for sure without reading the source, but when you consider that a['li'] gives you [] rather than (), it pretty much has to be true. And, in fact:
>>> li = []
>>> a.li = li
>>> a['li'] is li
True
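The same store-now, convert-on-read behaviour can be reproduced without attrdict. The class below is a hypothetical, much-simplified stand-in, not AttrDict's actual implementation: it keeps assigned values unchanged and only turns lists into tuples when they are read back as attributes.

```python
class ConvertOnRead(object):
    """Hypothetical stand-in for AttrDict-style attribute access (not its real code)."""

    def __init__(self):
        # Bypass our own __setattr__ for the internal storage dict.
        object.__setattr__(self, "_store", {})

    def __setattr__(self, name, value):
        # Store exactly what was assigned, with no conversion.
        self._store[name] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stored keys.
        try:
            value = self._store[name]
        except KeyError:
            raise AttributeError(name)
        # Convert lists on the way out, similar in spirit to _sequence_type.
        return tuple(value) if isinstance(value, list) else value


foo = ConvertOnRead()
li = []
chk = foo.li = li
print(type(chk).__name__, type(foo.li).__name__)  # list tuple
print(foo._store["li"] is li)                     # True
```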
I'm trying to get both keys and values of attributes of some tag in a XML file (using scrapy and xpath).
The tag is something like:
<element attr1="value1" attr2="value2" ...>
I don't know the keys "attr1", "attr2" and so on, and they can change between two elements. I couldn't figure out how to get both keys and values with XPath. Is there another good practice for doing that?
Short version
>>> for element in selector.xpath('//element'):
...     attributes = []
...     # loop over all attribute nodes of the element
...     for index, attribute in enumerate(element.xpath('@*'), start=1):
...         # use XPath's name() string function on each attribute,
...         # using their position
...         attribute_name = element.xpath('name(@*[%d])' % index).extract_first()
...         # Scrapy's extract() on an attribute returns its value
...         attributes.append((attribute_name, attribute.extract()))
...
>>> attributes # list of (attribute name, attribute value) tuples
[(u'attr1', u'value1'), (u'attr2', u'value2')]
>>> dict(attributes)
{u'attr2': u'value2', u'attr1': u'value1'}
>>>
Long version
XPath has a name(node-set?) function to get node names (an attribute is a node, an attribute node):
The name function returns a string containing a QName representing the expanded-name of the node in the argument node-set that is first in document order.(...) If the argument it omitted, it defaults to a node-set with the context node as its only member.
(source: http://www.w3.org/TR/xpath/#function-name)
>>> import scrapy
>>> selector = scrapy.Selector(text='''
... <html>
... <element attr1="value1" attr2="value2">some text</element>
... </html>''')
>>> selector.xpath('//element').xpath('name()').extract()
[u'element']
(Here, I chained name() on the result of //element selection, to apply the function to all selected element nodes. A handy feature of Scrapy selectors)
One would like to do the same with attribute nodes, right? But it does not work:
>>> selector.xpath('//element/@*').extract()
[u'value1', u'value2']
>>> selector.xpath('//element/@*').xpath('name()').extract()
[]
>>>
Note: I don't know if it's a limitation of lxml/libxml2, which Scrapy uses under the hood, or if the XPath specs disallow it. (I don't see why it would.)
What you can do, though, is use the name(node-set) form, i.e. with a non-empty node-set as the parameter. If you carefully read the part of the XPath 1.0 spec quoted above, you'll see that, as with other string functions, name(node-set) only takes the first node in the node-set (in document order) into account:
>>> selector.xpath('//element').xpath('@*').extract()
[u'value1', u'value2']
>>> selector.xpath('//element').xpath('name(@*)').extract()
[u'attr1']
>>>
Attribute nodes also have positions, so you can loop over all attributes by their position. Here we have 2 (the result of count(@*) on the context node):
>>> for element in selector.xpath('//element'):
...     print(element.xpath('count(@*)').extract_first())
...
2.0
>>> for element in selector.xpath('//element'):
...     for i in range(1, 2+1):
...         print(element.xpath('@*[%d]' % i).extract_first())
...
value1
value2
>>>
Now you can guess what we can do: call name() for each @*[i]:
>>> for element in selector.xpath('//element'):
...     for i in range(1, 2+1):
...         print(element.xpath('name(@*[%d])' % i).extract_first())
...
attr1
attr2
>>>
If you put all this together, and assume that @* will give you attributes in document order (not stated in the XPath 1.0 spec, I think, but it's what I see happening with lxml), you end up with this:
>>> attributes = []
>>> for element in selector.xpath('//element'):
...     for index, attribute in enumerate(element.xpath('@*'), start=1):
...         attribute_name = element.xpath('name(@*[%d])' % index).extract_first()
...         attributes.append((attribute_name, attribute.extract()))
...
>>> attributes
[(u'attr1', u'value1'), (u'attr2', u'value2')]
>>> dict(attributes)
{u'attr2': u'value2', u'attr1': u'value1'}
>>>
I'm trying to get both keys and values of attributes of some tag in a XML file (using scrapy and xpath).
You need @*, which means "any attribute". The XPath expression //element/@* will give you all the attributes of the element elements, and with the attributes, their values.
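If the document is well-formed XML and you don't need Scrapy's selectors for this particular step, a swapped-in alternative is the standard library's xml.etree.ElementTree, whose Element.attrib is already a mapping of attribute names to values. A sketch using the sample markup from the question plus a second, made-up element to show that the names can differ per element:

```python
import xml.etree.ElementTree as ET

markup = '''<html>
<element attr1="value1" attr2="value2">some text</element>
<element other="x">more text</element>
</html>'''

root = ET.fromstring(markup)
for element in root.iter("element"):
    # attrib maps each attribute name to its value, whatever the names are.
    print(dict(element.attrib))
```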
In an IAfterTransitionEvent handler, I'm trying to capture the event of an object being published. When the object is published, two objects are created, and I want to relate them to it.
In the type XML file of the object that's being published, I added this to behaviors:
<element value="plone.app.relationfield.behavior.IRelatedItems" />
So that I could get relatedItems.
In my event function, I have:
@grok.subscribe(InitialContract, IAfterTransitionEvent)
def itemPublished(obj, event):
    site = api.portal.get()
    if event.status['action'] == 'publish':
        house_agreement = customCreateFunction(container...,type..)
        #I get the HouseAgreement object
        labor_contract = customCreateFunction(container....,type)
        #I get the LaborContract object
        relIDs = []
        relIDs.append(RelationValue(IUUID(house_agreement)))
        relIDs.append(RelationValue(IUUID(labor_contract)))
        obj.relatedItems = relIDs
Unfortunately, printing obj.relatedItems gives me an empty list and when I go to the View class and look under Categorization, the Related Items field is empty.
I tried _relatedItems instead of relatedItems, but that doesn't seem to work, as I think it's creating a new attribute on obj.
I also tried using just IUUID instead of converting it to a RelationValue, but that doesn't give me any error at all.
It's as if it isn't setting the relatedItems value, yet it seems to accept the list being passed.
If it's possible, how can I programmatically set relatedItems?
Also, I do plan on adding code to prevent objects from being added twice.
You need to store a list of RelationValues.
>>> from zope.component import getUtility
>>> from zope.intid.interfaces import IIntIds
>>> from z3c.relationfield import RelationValue
>>> intids = getUtility(IIntIds)
>>> source.relatedItems = [RelationValue(intids.getId(target))]
>>> source.relatedItems
[<z3c.relationfield.relation.RelationValue object at 0x10bf2eed8>]
You can now access the target by...
>>> target_ref = source.relatedItems[0]
>>> target_ref.to_object
<XXX at /Plone/target>
The important change in my code example: RelationValues are based on intids, not UUIDs.
I have an XML writing script that outputs XML for a specific 3rd party tool.
I've used the original XML as a template to make sure that I'm building all the correct elements, but the final XML does not appear like the original.
I write the attributes in the same order, but lxml writes them in its own order.
I'm not sure, but I suspect that the 3rd party tool expects attributes to appear in a specific order, and I'd like to resolve this so I can see whether it's the attribute order that's making it fail, or something else.
Source element:
<FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature">
My source script:
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", ID = str(db.ID), Name = db.name, PUID="fileSig/{}".format(str(db.ID)), Version = "", MIMEType = "")
My resultant XML:
<FileFormat MIMEType="" PUID="fileSig/19" Version="" Name="Printer Info File" ID="19">
Is there a way of constraining the order they are written?
It looks like lxml serializes attributes in the order you set them:
>>> from lxml import etree as ET
>>> x = ET.Element("x")
>>> x.set('a', '1')
>>> x.set('b', '2')
>>> ET.tostring(x)
'<x a="1" b="2"/>'
>>> y= ET.Element("y")
>>> y.set('b', '2')
>>> y.set('a', '1')
>>> ET.tostring(y)
'<y b="2" a="1"/>'
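Applied to the FileFormat element from the question, a sketch that calls set() in the wanted order instead of passing keyword arguments. The values are copied from the question's output, and the FileFormats parent is a stand-in for sig.fileformats.

```python
from lxml import etree

fileformats = etree.Element("FileFormats")  # stand-in for sig.fileformats
fileformat = etree.SubElement(fileformats, "FileFormat")
# Set attributes one at a time; lxml serializes them in this order.
for name, value in [
    ("ID", "19"),
    ("Name", "Printer Info File"),
    ("PUID", "fileSig/19"),
    ("Version", ""),
    ("MIMEType", ""),
]:
    fileformat.set(name, value)

print(etree.tostring(fileformat))
```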
Note that when you pass attributes as keyword arguments to ET.SubElement(), Python collects them into a dictionary and passes that dictionary to lxml. In older Python versions this loses any ordering you had in the source file, since dictionaries were unordered (their order was determined by string hash values, which may differ from platform to platform or even from run to run). Keyword-argument order has been preserved since Python 3.6 and dict insertion order is guaranteed since 3.7, so the result can depend on the Python and lxml versions in use.
OrderedDict of attributes
As of lxml 3.3.3 (perhaps also in earlier versions) you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor and the order will be preserved when using lxml.etree.tostring(root):
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", OrderedDict([("ID",str(db.ID)), ("Name",db.name), ("PUID","fileSig/{}".format(str(db.ID))), ("Version",""), ("MIMEType","")]))
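Note that before Python 3.8 the ElementTree API (xml.etree.ElementTree) did not preserve attribute order even if you provided an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor, because it sorted attributes when serializing. Since Python 3.8 it keeps the order in which the attributes were specified.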
UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor for specifying attributes does not preserve attribute order:
>>> from lxml.etree import Element, tostring
>>> from collections import OrderedDict
>>> root = Element("root", OrderedDict([("b","1"),("a","2")])) # attrib parameter
>>> tostring(root)
b'<root b="1" a="2"/>' # preserved
>>> root = Element("root", b="1", a="2") # **extra parameter
>>> tostring(root)
b'<root a="2" b="1"/>' # not preserved
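For comparison, the standard library's xml.etree.ElementTree changed in Python 3.8: serialization no longer sorts attributes and keeps the order in which they were specified, including the attrib dict followed by **extra keywords. A quick check, assuming Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# attrib dict first, then an extra keyword; both keep their order on 3.8+.
root = ET.Element("root", {"b": "1"}, a="2")
print(ET.tostring(root))  # b'<root b="1" a="2" />'
```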
Attribute ordering and readability
As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:
<tag attr1="val1" attr2="val2"/>
<!-- means the same thing as: -->
<tag attr2="val2" attr1="val1"/>
There is an analogous characteristic in SQL, where column order doesn't change the meaning of a table definition. XML attributes and SQL columns are a set (not an ordered set), and so all that can "officially" be said about either one of those is whether the attribute or column is present in the set.
That said, it definitely makes a difference to human readability which order these things appear in, and in situations where constructs like this are authored and appear in text (e.g. source code) and must be interpreted, a careful ordering makes a lot of sense to me.
Typical parser behavior
Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.
As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.
In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:
id = 1
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
'MIMEType="%s">')
xml = template % (id, name, puid, version, mimetype)
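One caveat with a hand-written template: values are pasted in unescaped, so a name containing & or a quote would produce invalid XML. A sketch of the same template idea that escapes each value with the standard library's xml.sax.saxutils.quoteattr (which also adds the surrounding quotes) while keeping the attribute order fixed:

```python
from xml.sax.saxutils import quoteattr

# Attribute names and values in the exact order the tool is expected to want.
fields = [
    ("ID", "1"),
    ("Name", "Development Signature"),
    ("PUID", "dev/1"),
    ("Version", "1.0"),
    ("MIMEType", "text/x-test-signature"),
]
attrs = " ".join("%s=%s" % (name, quoteattr(value)) for name, value in fields)
xml = "<FileFormat %s>" % attrs
print(xml)
```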
I have seen order matter where the consumer of the XML is expecting canonicalized XML. Canonical XML specifies that the attributes be sorted:
in increasing lexicographic order with namespace URI as the primary key and local name as the secondary key (an empty namespace URI is lexicographically least). (section 2.6 of https://www.w3.org/TR/xml-c14n2/)
So if your application is expecting the kind of order you would get out of canonical XML, lxml does support output in canonical form using the method= argument to tostring() (see the C14N heading of https://lxml.de/api.html).
For example:
from lxml import etree as ET
element = ET.Element('Test', B='beta', Z='omega', A='alpha')
val = ET.tostring(element, method="c14n")
print(val)
lxml uses libxml2 under the hood. It preserves attribute order, which means that for an individual element you can sort them like this:
from lxml import etree
x = etree.XML('<x a="1" b="2" d="4" c="3"><y></y></x>')
sorted_attrs = sorted(x.attrib.items())
x.attrib.clear()
x.attrib.update(sorted_attrs)
Not very helpful if you want them all sorted though. If you want them all sorted you can use the c14n2 output method (XML Canonicalisation Version 2):
>>> x = etree.XML('<x a="1" b="2" d="4" c="3"><y></y></x>')
>>> etree.tostring(x, method="c14n2")
b'<x a="1" b="2" c="3" d="4"><y></y></x>'
That will sort the attributes. Unfortunately it has the downside of ignoring pretty_print, which isn't great if you want human-readable XML.
If you use c14n2, lxml uses custom Python serialisation code to write the XML, which itself calls sorted(x.attrib.items()) for all attributes. If you don't, it instead calls into libxml2's xmlNodeDumpOutput() function, which doesn't support sorting attributes but does support pretty-printing.
Therefore the only solution is to manually walk the XML tree and sort all the attributes, like this:
from lxml import etree
x = etree.XML('<x a="1" b="2" d="4" c="3"><y z="1" a="2"><!--comment--></y></x>')
for el in x.iter(etree.Element):
    sorted_attrs = sorted(el.attrib.items())
    el.attrib.clear()
    el.attrib.update(sorted_attrs)
etree.tostring(x, pretty_print=True)
# b'<x a="1" b="2" c="3" d="4">\n  <y a="2" z="1">\n    <!--comment-->\n  </y>\n</x>\n'
You need to wrap each string in a new class that supplies its own ordering when compared, while still giving the string's value when printed or converted to a string.
Here is an example:
class S:
    def __init__(self, _idx, _obj):
        self._obj = (_idx, _obj)
    def get_idx(self):
        return self._obj[0]
    def __le__(self, other):
        return self._obj[0] <= other.get_idx()
    def __lt__(self, other):
        return self._obj[0] < other.get_idx()
    def __str__(self):
        return self._obj[1].__str__()
    def __repr__(self):
        return self._obj[1].__repr__()
    def __eq__(self, other):
        if isinstance(other, str):
            return self._obj[1] == other
        elif isinstance(other, S):
            return self._obj[0] == other.get_idx() and self.__str__() == other.__str__()
        else:
            # other is neither str nor S, so it has no get_idx(); let Python decide
            return NotImplemented
    def __add__(self, other):
        return self._obj[1] + other
    def __hash__(self):
        return self._obj[1].__hash__()
    def __getitem__(self, item):
        return self._obj[1].__getitem__(item)
    def __radd__(self, other):
        return other + self._obj[1]
list_sortable = ['c', 'b', 'a']
list_not_sortable = [S(0, 'c'), S(0, 'b'), S(0, 'a')]
print("list_sortable ---- Before sort ----")
for ele in list_sortable:
    print(ele)
print("list_not_sortable ---- Before sort ----")
for ele in list_not_sortable:
    print(ele)
list_sortable.sort()
list_not_sortable.sort()
print("list_sortable ---- After sort ----")
for ele in list_sortable:
    print(ele)
print("list_not_sortable ---- After sort ----")
for ele in list_not_sortable:
    print(ele)
running result:
list_sortable ---- Before sort ----
c
b
a
list_not_sortable ---- Before sort ----
c
b
a
list_sortable ---- After sort ----
a
b
c
list_not_sortable ---- After sort ----
c
b
a
dict_sortable ---- After sort ----
a 3
b 2
c 1
dict_not_sortable ---- After sort ----
c 1
b 2
a 3