I want to get the actual XPath expression for the attribute node of a specific attribute in an XML element tree (using lxml).
Suppose the following XML tree.
<foo>
<bar attrib_name="hello_world"/>
</foo>
The XPath expression '//@*[local-name() = "attrib_name"]' produces ['hello_world'], which is the values of the matching attributes, and '//@*[local-name() = "attrib_name"]/..' gets me the bar element, which is one level too high. I need the XPath expression to the specific attribute node itself, not its parent element. That is, given the string 'attrib_name', I want to generate '/foo/bar/@attrib_name'.
from lxml import etree
from io import StringIO
f = StringIO('<foo><bar attrib_name="hello_world"></bar></foo>')
tree = etree.parse(f)
print(tree.xpath('//@*[local-name() = "attrib_name"]'))
# --> ['hello_world']
print([tree.getpath(el) for el in tree.xpath('//@*[local-name() = "attrib_name"]/..')])
# --> ['/foo/bar']
As an add-on this should work with namespaces too.
If you remove the /.. then you get _ElementUnicodeResult objects for the attribute values. Each one exposes getparent() and attrname,
which allows you to append the attribute name to the parent's XPath:
>>> print(['%s/@%s' % (tree.getpath(attrib_result.getparent()), attrib_result.attrname) for attrib_result in tree.xpath('//@*[local-name() = "attrib_name"]')])
['/foo/bar/@attrib_name']
Applying that to a namespaced attribute puts the namespace URI, in {uri}name form, into the XPath (which may not be what you want):
>>> tree = etree.parse(StringIO('<foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><bar xsi:attrib_name="hello_world"></bar></foo>'))
>>> print(['%s/@%s' % (tree.getpath(attrib_result.getparent()), attrib_result.attrname) for attrib_result in tree.xpath('//@*[local-name() = "attrib_name"]')])
['/foo/bar/@{http://www.w3.org/2001/XMLSchema-instance}attrib_name']
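If a prefixed name is preferred over the {uri}name form, one possible approach is to map the URI back to a prefix before building the path. The sketch below is plain Python and does not need lxml; the helper name clark_to_prefixed and the hand-written nsmap dictionary are assumptions made for illustration (with lxml, a similar prefix-to-URI mapping is available as the nsmap attribute of an element).

```python
def clark_to_prefixed(attr_name, nsmap):
    """Turn '{uri}local' into 'prefix:local' using a prefix -> URI mapping."""
    if not attr_name.startswith('{'):
        return attr_name  # no namespace, nothing to rewrite
    uri, local = attr_name[1:].split('}', 1)
    for prefix, mapped_uri in nsmap.items():
        # Skip the default namespace (prefix None or ''), which has no usable prefix.
        if mapped_uri == uri and prefix:
            return '%s:%s' % (prefix, local)
    return attr_name  # unknown URI: leave the {uri}name form untouched


# Hypothetical mapping, written by hand for this example.
nsmap = {'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
path = '/foo/bar/@' + clark_to_prefixed(
    '{http://www.w3.org/2001/XMLSchema-instance}attrib_name', nsmap)
print(path)
```

The resulting path only evaluates correctly if the same prefix mapping is passed to xpath() through its namespaces argument.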
Related
I'm using xml.etree.ElementTree to create some basic XML in Python. I have a block of XML that I need to access on its own, so I make it as a root Element:
import xml.etree.ElementTree as Tree
def correction_xml(self):
    correction = Tree.Element('ColourCorrection')
    sop_node = Tree.SubElement(correction, "SOPNode")
    slope = Tree.SubElement(sop_node, 'Slope')
    offset = Tree.SubElement(sop_node, 'Offset')
    power = Tree.SubElement(sop_node, 'Power')
    return correction
I also need to insert multiple instances of this part into a bigger XML, so is there a way to insert my correction Element into another tree as a SubElement? Something like this, except the SubElement factory only accepts a single string, not an Element object:
def correction_list(self):
    list = Tree.Element("List")
    # insert correction_xml into list as a subelement, keeping its children intact
    item_1 = Tree.SubElement(list, self.correction_xml())
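One way this can be done is with Element.append(), which takes an existing Element and attaches it, children included, to the parent. The following is a minimal sketch that assumes the functions are plain module-level functions rather than methods; the count parameter and the name list_el are made up for the example.

```python
import xml.etree.ElementTree as Tree


def correction_xml():
    # Build one standalone ColourCorrection element with its children.
    correction = Tree.Element('ColourCorrection')
    sop_node = Tree.SubElement(correction, 'SOPNode')
    Tree.SubElement(sop_node, 'Slope')
    Tree.SubElement(sop_node, 'Offset')
    Tree.SubElement(sop_node, 'Power')
    return correction


def correction_list(count=2):
    # 'count' is a hypothetical parameter used only for this illustration.
    list_el = Tree.Element('List')
    for _ in range(count):
        # append() accepts an Element object, unlike the SubElement factory.
        list_el.append(correction_xml())
    return list_el


result = correction_list()
print(Tree.tostring(result, encoding='unicode'))
```

Calling correction_xml() once per item gives every entry its own element instead of sharing a single object between several parents.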
I have an XSD-file where I need to get a namespace as defined in the root-tag:
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:abw="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0" xmlns:adv="http://www.adv-online.de/namespaces/adv/gid/6.0" xmlns:bfm="http://www.liegenschaftsbestandsmodell.de/ns/bfm/1.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:sc="http://www.interactive-instruments.de/ShapeChange/AppInfo" elementFormDefault="qualified" targetNamespace="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0" version="1.0.1.0">
<!-- elements -->
</schema>
Now, as the targetNamespace of this schema definition is "http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0", I need to get the short identifier for this namespace, which is abw. To get this identifier I have to find the attribute of the root tag that has the exact same value as my targetNamespace (I can't rely on the identifier already being part of the targetNamespace string; this may change in the future).
On this question How to extract xml attribute using Python ElementTree I found how to get the value of an attribute given its name. However, I don't know the attribute's name, only its value, so what can I do when I have a value and want to select the attribute having this value?
I think of something like this:
for key in root.attrib.keys():
    if root.attrib[key] == targetNamespace:
        return key
but root.attrib only contains elementFormDefault, targetNamespace and version, but not xmlns:abw.
The string must be Unicode, otherwise this error appears:
Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    it = etree.iterparse(StringIO(xml))
TypeError: initial_value must be unicode or None, not str
Code:
>>> from io import StringIO
>>> from xml.etree import ElementTree
>>> xml=u"""<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:abw="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0" xmlns:adv="http://www.adv-online.de/namespaces/adv/gid/6.0" xmlns:bfm="http://www.liegenschaftsbestandsmodell.de/ns/bfm/1.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:sc="http://www.interactive-instruments.de/ShapeChange/AppInfo" elementFormDefault="qualified" targetNamespace="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0" version="1.0.1.0">
<!-- elements -->
</schema>"""
>>> ns = dict([
...     node for _, node in ElementTree.iterparse(
...         StringIO(xml), events=['start-ns']
...     )
... ])
>>> for k,v in ns.iteritems():
...     if v=='http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0':
...         print k
...
output:
abw
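The session above is Python 2 (iteritems and the print statement). On Python 3 the same start-ns idea could look roughly like the sketch below. The schema string is a shortened copy of the one in the question, and the function name prefixes_for is an assumption made for the example.

```python
from io import StringIO
from xml.etree import ElementTree

# Shortened version of the schema from the question.
xsd = '''<schema xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:abw="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    targetNamespace="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0"
    version="1.0.1.0">
</schema>'''


def prefixes_for(xml_text, uri):
    """Return every prefix that the document binds to the given namespace URI."""
    # Each 'start-ns' event carries a (prefix, uri) tuple.
    events = ElementTree.iterparse(StringIO(xml_text), events=['start-ns'])
    return [prefix for _, (prefix, ns_uri) in events if ns_uri == uri]


target = ElementTree.fromstring(xsd).get('targetNamespace')
found = prefixes_for(xsd, target)
print(found)
```

Returning a list rather than a single value keeps the case visible where a URI is bound to more than one prefix.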
Using minidom instead of ETree did it:
import xml.dom.minidom as DOM
tree = DOM.parse(myFile)
root = tree.documentElement
targetNamespace = root.getAttribute("targetNamespace")
d = dict(root.attributes.items())
for key in d:
    if d[key] == targetNamespace:
        return key
This returns either targetNamespace or xmlns:abw, depending on which comes first in the XSD. Of course the first case should be ignored, but that is beyond the scope of this question.
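A possible way to ignore the targetNamespace attribute itself is to look only at attributes whose name starts with xmlns:. The sketch below assumes the schema is available as a string (the question reads it from a file) and uses a shortened copy of it; target_prefix is a name invented for the example.

```python
import xml.dom.minidom as DOM

# Shortened version of the schema from the question.
xsd = '''<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0"
    xmlns:abw="http://www.liegenschaftsbestandsmodell.de/ns/abw/1.0.1.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    version="1.0.1.0">
</schema>'''


def target_prefix(xml_text):
    """Return the prefix bound to the targetNamespace, or None if there is none."""
    root = DOM.parseString(xml_text).documentElement
    target = root.getAttribute('targetNamespace')
    for name, value in root.attributes.items():
        # Only namespace declarations count, so targetNamespace is skipped.
        if name.startswith('xmlns:') and value == target:
            return name[len('xmlns:'):]
    return None


prefix = target_prefix(xsd)
print(prefix)
```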
Here is a sample from the doc I am working with:
<idx:index xsi:schemaLocation="http://www.belscript.org/schema/index index.xsd" idx:belframework_version="2.0">
<idx:namespaces>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns"/>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/hgnc-approved-symbols.belns"/>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/mgi-approved-symbols.belns"/>
I can get all nodes with name "namespace" with the following code:
tree = etree.parse(self.old_files)
urls = tree.xpath('//*[local-name()="namespace"]')
This would return a list of the 3 namespace elements. But what if I want to get to the data in the idx:resourceLocation attribute? Here is my attempt at doing that, using the XPath docs as a guide.
urls = tree.xpath('//*[local-name()="namespace"]/@idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/"',
                  namespaces={'idx' : 'http://www.belscript.org/schema/index'})
What I want is all nodes that have an attribute starting with http://resource.belframework.org/belframework/1.0/namespace. So in the sample doc, it would return me only those strings in the resourceLocation attribute. Unfortunately, the syntax is not quite right, and I am having trouble deriving the proper syntax from the documentation. Thank you!
I think what you are looking for is:
//*[local-name()="namespace"]/@idx:resourceLocation
or
//idx:namespace/@idx:resourceLocation
or, if you want only those @idx:resourceLocation attributes that start with "http://resource.belframework.org/belframework/1.0/namespace" you could use
'''//idx:namespace[
       starts-with(@idx:resourceLocation,
       "http://resource.belframework.org/belframework/1.0/namespace")]
       /@idx:resourceLocation'''
import lxml.etree as ET
content = '''\
<root xmlns:xsi="http://www.xxx.com/zzz/yyy" xmlns:idx="http://www.belscript.org/schema/index">
<idx:index xsi:schemaLocation="http://www.belscript.org/schema/index index.xsd" idx:belframework_version="2.0">
<idx:namespaces>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns"/>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/hgnc-approved-symbols.belns"/>
<idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/mgi-approved-symbols.belns"/>
</idx:namespaces>
</idx:index>
</root>
'''
root = ET.XML(content)
namespaces = {'xsi': 'http://www.xxx.com/zzz/yyy',
              'idx': 'http://www.belscript.org/schema/index'}
for item in root.xpath(
        '//*[local-name()="namespace"]/@idx:resourceLocation', namespaces=namespaces):
    print(item)
yields
http://resource.belframework.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns
http://resource.belframework.org/belframework/1.0/namespace/hgnc-approved-symbols.belns
http://resource.belframework.org/belframework/1.0/namespace/mgi-approved-symbols.belns
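If lxml is not available, the prefix filter can be approximated with the standard library. This is a different technique from the starts-with() XPath above: xml.etree.ElementTree does not support that function, so the sketch filters in Python with str.startswith instead. The sample document is trimmed, and the example.org entry is a made-up value added only to show that non-matching attributes are dropped.

```python
import xml.etree.ElementTree as ET

IDX = 'http://www.belscript.org/schema/index'
PREFIX = 'http://resource.belframework.org/belframework/1.0/namespace'

content = '''<idx:index xmlns:idx="http://www.belscript.org/schema/index">
  <idx:namespaces>
    <idx:namespace idx:resourceLocation="http://resource.belframework.org/belframework/1.0/namespace/entrez-gene-ids-hmr.belns"/>
    <idx:namespace idx:resourceLocation="http://example.org/other.belns"/>
  </idx:namespaces>
</idx:index>'''

root = ET.fromstring(content)
urls = []
# ElementTree addresses namespaced names as '{uri}local'.
for el in root.iter('{%s}namespace' % IDX):
    location = el.get('{%s}resourceLocation' % IDX)
    if location is not None and location.startswith(PREFIX):
        urls.append(location)
print(urls)
```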
I'm using ElementTree findall() to find elements in my XML which have a certain tag. I want to turn the result into a list. At the moment, I'm iterating through the elements, picking out the .text for each element, and appending to the list. I'm sure there's a more elegant way of doing this.
#!/usr/bin/python2.7
#
from xml.etree import ElementTree
import os
myXML = '''<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
<location>New York</location>
<month>January</month>
<year>2013</year>
</event>
</project>
</root>
'''
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        locations = []
        if element.find('location') is not None:
            for events in element.findall('location'):
                locations.append(events.text)
        # Could I use something like this instead?
        # locations.append(''.join.text(*events) for events in element.findall('location'))
        print locations
This outputs the following (which is correct, but I'd like to assign the findall() results directly to a list, in text format, if possible):
my_first_event
['London', 'Dublin', 'New York']
You can try this - it uses a list comprehension to generate the list without having to create a blank one and then append.
if element.find('location') is not None:
    locations = [events.text for events in element.findall('location')]
With this, you can also get rid of the locations definition above, so your code would be:
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        if element.find('location') is not None:
            locations = [events.text for events in element.findall('location')]
        print locations
One thing you will want to be wary of is what you are doing with locations - it won't be defined if location doesn't exist, so you will get a NameError if you try to print it and it doesn't exist. If that is an issue, you can retain the locations = [] definition - if the matching element isn't found, the result will just be an empty list.
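Since findall() returns an empty list when nothing matches, the guard can arguably be dropped altogether, which also avoids the undefined-name problem. Below is a small Python 3 sketch of that variant; the second event (event_without_locations) and the locations_by_event dictionary are additions for illustration and are not part of the original data.

```python
from xml.etree import ElementTree

my_xml = '''<root>
  <project project_name="my_big_project">
    <event name="my_first_event">
      <location>London</location>
      <location>Dublin</location>
      <location>New York</location>
    </event>
    <event name="event_without_locations"/>
  </project>
</root>'''

tree = ElementTree.fromstring(my_xml)
locations_by_event = {}
for event in tree.findall('.//project/event'):
    # findall() yields [] for an event with no <location>, so no guard is needed.
    locations_by_event[event.get('name')] = [
        loc.text for loc in event.findall('location')
    ]
print(locations_by_event)
```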
I have an XML file with the following data. I need to get the value of every driver element and all the other attributes. I wrote a Python script, but it only returns the first driver value.
My XML:
<volume name="sp" type="span" operation="create">
<driver>HDD1</driver>
<driver>HDD2</driver>
<driver>HDD3</driver>
<driver>HDD4</driver>
</volume>
My script:
import xml.etree.ElementTree as ET
doc = ET.parse("vol.xml")
root = doc.getroot() #Returns the root element for this tree.
root.keys() #Returns the elements attribute names as a list. The names are returned in an arbitrary order
root.attrib["name"]
root.attrib["type"]
root.attrib["operation"]
print root.get("name")
print root.get("type")
print root.get("operation")
for child in root:
    #print child.tag, child.attrib
    print root[0].text
My output:
sr-query:~# python volume_check.py aaa
sp
span
create
HDD1
sr-queryC:~#
I am not getting HDD2, HDD3 and HDD4. How do I iterate through this XML to get all the values? Is there an optimized way? I think a for loop can do that, but I am not familiar with Python.
In your for loop, it should be
child.text
not
root[0].text
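Put together, a corrected version might look like the sketch below. It is written for Python 3 and parses the XML from a string instead of vol.xml so that it is self-contained; the variable names attributes and drivers are chosen for the example.

```python
import xml.etree.ElementTree as ET

volume_xml = '''<volume name="sp" type="span" operation="create">
<driver>HDD1</driver>
<driver>HDD2</driver>
<driver>HDD3</driver>
<driver>HDD4</driver>
</volume>'''

root = ET.fromstring(volume_xml)

# All attributes of the root element as a plain dictionary.
attributes = dict(root.attrib)

# Use each child in turn; root[0] would always be the first driver.
drivers = [child.text for child in root.findall('driver')]

print(attributes)
print(drivers)
```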