Get inner xml from lxml - python

I have the following string, which is part of a larger XML document:
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
I want to access Rathaus. My current approach is to parse it with lxml and access the text of the element 'odvNameElem':
from lxml import etree
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)
print(root.text)
This however results in None. What am I doing wrong?
etree.__version__ = '4.2.5'
I am not sure why the following works:
root.xpath("string()") but root.xpath("//text()") only returns an empty list. Can somebody please explain this?

The "Rathaus" string is the value of the tail property of the itdMapItemList element. Examples:
root.xpath("itdMapItemList")[0].tail
root.find("itdMapItemList").tail
See https://lxml.de/tutorial.html#elements-contain-text.
root.xpath("string()") returns the string-value of the root element, which is the concatenation of all of its descendant text nodes. In this case that is "Rathaus".
See https://www.w3.org/TR/xpath-10/#function-string.
root.xpath("//test") does not make sense (there is no test element). Did you mean root.xpath("//text()")?
root.xpath("//text()") returns a list of all text nodes, which in this case is ['Rathaus'].
If the input XML is changed to
<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>
then the result is ['ABC', 'Rathaus']
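The text/tail split can be checked directly. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml, on the assumption that both expose the same text and tail attributes for this input. itertext() stands in for the XPath text() query, which ElementTree does not offer.

```python
import xml.etree.ElementTree as ET

# Modified input from the answer: text both before and after the child element.
content = '<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>'
root = ET.fromstring(content)
child = root.find("itdMapItemList")

root_text = root.text             # text between the start tag and the first child
child_tail = child.tail           # text that follows the <itdMapItemList/> element
all_text = list(root.itertext())  # every text chunk in document order
joined = "".join(all_text)        # comparable to what string() returns

print(root_text, child_tail, all_text, joined)
```

With the original input (no "ABC"), root.text is None because nothing precedes the first child, which is why print(root.text) shows None.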

Related

Modifying element in xml using python

can anyone please explain how to modify xml element in python using elementtree.
I want to keep the rego AD-4214 and change make 'Tata' into 'Nissan' and model 'Sumo' into 'Skyline'.
If rewriting the entire file is acceptable [1], the easiest way would be to turn the XML file into a dictionary (see for example here: How to convert an XML string to a dictionary?), do your modifications on that dictionary, and convert the dict back to XML (for example with https://pypi.org/project/dicttoxml/).
[1] Consider lost formatting: whitespace, number formats, etc. may not be preserved by this.
This should work:
import xml.etree.ElementTree as ET
tree = ET.parse('your_xml_source.xml')
root = tree.getroot()
root[1][1].text = "Nissan"
root[1][2].text = "Skyline"
getroot() gives you the root element (<motorvehicle>), and [1] selects its second child, the <vehicle> with rego AD-4214. The second level of indexing, [1] and [2], gives you that vehicle's <make> and <model> respectively. You can then change their content through the text attribute.
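Positional indexing breaks if the order of the vehicles changes, so an alternative is to look the vehicle up by its rego. The following is a minimal sketch; the sample document is an assumption, since the question's XML is not shown here, and it guesses that each <vehicle> has <rego>, <make> and <model> children in that order.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real file from the question is not available.
SAMPLE = """<motorvehicle>
  <vehicle><rego>AB-1234</rego><make>Ford</make><model>Focus</model></vehicle>
  <vehicle><rego>AD-4214</rego><make>Tata</make><model>Sumo</model></vehicle>
</motorvehicle>"""

root = ET.fromstring(SAMPLE)

# Find the vehicle by its rego value instead of by position.
for vehicle in root.findall("vehicle"):
    if vehicle.findtext("rego") == "AD-4214":
        vehicle.find("make").text = "Nissan"
        vehicle.find("model").text = "Skyline"

result = ET.tostring(root, encoding="unicode")
print(result)
```

When working with a file, the same loop applies to tree.getroot(), followed by tree.write(...) to save the change.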

Using ElementTree to find a node - invalid predicate

I'm very new to this area so I'm sure it's just something obvious. I'm trying to change a python script so that it finds a node in a different way but I get an "invalid predicate" error.
import xml.etree.ElementTree as ET
tree = ET.parse("/tmp/failing.xml")
doc = tree.getroot()
thingy = doc.find(".//File/Diag[@id='53']")
print(thingy.attrib)
thingy = doc.find(".//File/Diag[BaseName = 'HTTPHeaders']")
print(thingy.attrib)
That should find the same node twice but the second find gets the error. Here is an extract of the XML:
<Diag id="53">
<Formatted>xyz</Formatted>
<BaseName>HTTPHeaders</BaseName>
<Column>17</Column>
I hope I've not cut it down too much. Basically, finding it with "@id" works but I want to search on that BaseName tag instead.
Actually, I want to search on a combination of tags so I have a more complicated expression lined up but I can't get the simple one to work!
The code in the question works when using Python 3.7. If the spaces before and after the equals sign in the predicate are removed, it also works with earlier Python versions.
thingy = doc.find(".//File/Diag[BaseName='HTTPHeaders']")
See https://bugs.python.org/issue31648.
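As a rough check, both lookups can be run against a small document. The wrapper elements and the second <Diag> below are assumptions added for illustration; only the first <Diag> comes from the question. The last lookup chains two predicates, which is one way to search on a combination of child tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built around the extract from the question.
SAMPLE = """<Root>
  <File>
    <Diag id="53">
      <Formatted>xyz</Formatted>
      <BaseName>HTTPHeaders</BaseName>
      <Column>17</Column>
    </Diag>
    <Diag id="54">
      <Formatted>abc</Formatted>
      <BaseName>Other</BaseName>
      <Column>3</Column>
    </Diag>
  </File>
</Root>"""

doc = ET.fromstring(SAMPLE)

# Attribute predicate and child-text predicate, written without spaces
# around "=" so they also work on Python versions before 3.7.
by_id = doc.find(".//File/Diag[@id='53']")
by_name = doc.find(".//File/Diag[BaseName='HTTPHeaders']")

# Predicates can be chained to match on several child tags at once.
by_both = doc.find(".//File/Diag[BaseName='HTTPHeaders'][Column='17']")

print(by_id.attrib, by_name.attrib, by_both.attrib)
```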

Python XML 'TypeError: must be xml.etree.ElementTree.Element, not str'

I currently am trying to build an XML file from a CSV file. Currently my code reads the CSV file to data and begins creating the XML from the data that is stored within the CSV.
CSV Example:
Element,XMLFile
SubElement,XMLName,XMLFile
SubElement,XMLDate,XMLName
SubElement,XMLInformation,XMLDate
SubElement,XMLTime,XMLName
Expected Output:
<XMLFile>
<XMLName>
<XMLDate>
<XMLInformation />
</XMLDate>
<XMLTime />
</XMLName>
</XMLFile>
Currently my code attempts to look at the CSV to see what the parent is for the new subelement:
# Defines main element
# xmlElement = xml.Element(XMLFile)
xmlElement = xml.Element(csvData[rowNumber][columnNumber])
# Should Define desired parent (FAIL) and SubElement name (PASS)
# xmlSubElement = xml.SubElement(XMLFile, XMLName)
xmlSubElement = xml.SubElement(csvData[rowNumber][columnNumber + 2], csvData[rowNumber][columnNumber + 1])
When the code attempts to use the CSV source string as the parent parameter, Python 3.5 generates the following error:
TypeError: must be xml.etree.ElementTree.Element, not str
The known cause of the error is that the parent parameter is passed as a string, when it is expected to be an Element or SubElement.
Is it possible to recall the stored value from the CSV and have it reference the Element or SubElement, instead of a string? The goal is to allow the code to read the CSV file and assign any SubElement to the parent listed in the CSV.
I cannot tell for sure, but it looks like you are doing:
ElementTree.SubElement(str, str)
when you should be doing:
ElementTree.SubElement(Element, str)
It also seems like you already know this. The real question, then, is how are you going to reference the parent object when you only know its tag string? You could search for Elements in the ElementTree with that particular tag string, but this is generally not a good idea as XML allows multiple instances of similar elements.
I would suggest you either:
Find a strategy to store references to parent elements
See if there is a way to uniquely identify the parent element using XPath
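The first suggestion could look like the sketch below: keep a dictionary that maps each tag name to the element created for it, and look the parent up there. It assumes tag names are unique within the CSV (true for the sample data, not in general), and the variable names are illustrative and not taken from the question's script.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data from the question; a real script would read the CSV file instead.
CSV_TEXT = """Element,XMLFile
SubElement,XMLName,XMLFile
SubElement,XMLDate,XMLName
SubElement,XMLInformation,XMLDate
SubElement,XMLTime,XMLName
"""

elements = {}  # tag name -> Element object (assumes unique tag names)
root = None

for row in csv.reader(io.StringIO(CSV_TEXT)):
    kind, name = row[0], row[1]
    if kind == "Element":
        root = ET.Element(name)
        elements[name] = root
    else:
        parent = elements[row[2]]  # an Element, not a string
        elements[name] = ET.SubElement(parent, name)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A parent must appear in the CSV before its children for the lookup to succeed; otherwise elements[row[2]] raises KeyError.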

Python XML DOM unique object?

I have a problem understanding Python's way of handling references in lists. I tried googling and reading Python books but did not find a suitable answer for my problem.
If I have a file called test.py with the following code:
from lxml import etree as ET
__check = ET.Entity('check')
def test():
    entries = []
    for c in range(2):
        row = []
        row.append(ET.Element('entry'))
        a = ET.Element('entry')
        a.append(__check)
        row.append(a)
        entries.append(row)
    for row in entries:
        for e in row:
            ET.dump(e)
When executing the test() method the output is:
<entry/>
<entry/>
<entry/>
<entry>&check;</entry>
The expected output would be:
<entry/>
<entry>&check;</entry>
<entry/>
<entry>&check;</entry>
What am I missing? For sure I can just edit the line with a.append(__check) to a.append(copy.deepcopy(__check)) and it works. But I don't understand why the previous example does not work the way I think.
Edit: I am using python 2.7.6
You're appending the same element over and over. XML DOM does not allow the same element to exist in two places in the tree (it wouldn't be a tree if it did), so your second append moves the __check element to the new place in the tree.
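The move-on-append behaviour can be reproduced with the standard library's xml.dom.minidom, which is used here only so the sketch has no third-party dependency; it is not the lxml code from the question. cloneNode(True) plays the role that copy.deepcopy plays in the question.

```python
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("root")
doc.appendChild(root)

first = doc.createElement("entry")
second = doc.createElement("entry")
root.appendChild(first)
root.appendChild(second)

check = doc.createElement("check")
first.appendChild(check)
second.appendChild(check)  # same node again: it is moved out of "first"

first_count_after_move = len(first.childNodes)
second_count_after_move = len(second.childNodes)
moved = root.toxml()

# Appending a deep copy leaves the original where it is.
first.appendChild(check.cloneNode(True))
copied = root.toxml()

print(moved)
print(copied)
```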

Copy and Write XML Node in Python

I've got a large XML file that I need to parse and look for a specific node. Once it has been found, I need to make a copy, edit a couple of values and write the file again.
So far I've managed to get the DOM element that I want. There are actually two of these elements already in the XML, so after I'm finished there will be three. Once I've made a copy of the element and edited the value, how do I then write this into the DOM (and thus the file)?
I'm using Python's from xml.dom import minidom at the moment.
In minidom you start by creating a Document:
from xml.dom.minidom import Document
doc = Document()
Then, if it is a text node you want to add, create it and append it to an existing element (a text node cannot be a direct child of the Document itself):
text_node = doc.createTextNode(str(some_content))
elem.appendChild(text_node)
if you had for example <some_elem key="my value">some my text</some_elem>:
do it like this:
text_node = doc.createTextNode('some my text')
elem.appendChild(text_node)
elem.setAttribute('key', 'my value')
if it is complex element create it with:
elem = doc.createElement('your_elem')
if you need to set attributes do:
elem.setAttribute("some-attribute", your_attr)
if you need to append something to it:
elem.appendChild( some_other_elem )
then append the element:
doc.appendChild( elem )
if you need a string representation do:
doc.toxml()
or
doc.toprettyxml()
From the minidom documentation:
from xml.dom.minidom import getDOMImplementation
impl = getDOMImplementation()
newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement
text = newdoc.createTextNode('Some textual content.')
top_element.appendChild(text)
So I guess appendChild is what you ask for?
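For the copy-and-edit case in the question, one possible approach is cloneNode(True) followed by appendChild on the original's parent. The sketch below assumes a small hypothetical document with <item> elements; the element and attribute names are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical input with two similar elements, as described in the question.
SOURCE = (
    "<items>"
    "<item id='1'><name>first</name></item>"
    "<item id='2'><name>second</name></item>"
    "</items>"
)

doc = minidom.parseString(SOURCE)
original = doc.getElementsByTagName("item")[1]

# Deep-copy the node, including its children, then edit the copy.
copy = original.cloneNode(True)
copy.setAttribute("id", "3")
copy.getElementsByTagName("name")[0].firstChild.data = "third"

# Attach the copy next to the original; the original is left unchanged.
original.parentNode.appendChild(copy)

output = doc.documentElement.toxml()
print(output)
```

To save the result, the document can be serialized with doc.toxml() or doc.writexml(file_object) and written back to the file.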
