Getting XML attribute value with lxml module - python

How can I get the value of an attribute of an XML file with the lxml module?
My XML looks like this:
<process>
<name>somename</name>
<statistics>
<stats param='someparam'>
<value>0.456</value>
<real_value>0.4</real_value>
</stats>
<stats ...>
.
.
.
</stats>
</statistics>
</process>
I want to get the value 0.456 from the value element. I'm iterating through the elements and reading their text, but I'm not sure this is the best way to do it:
for attribute in root.iter('statistics'):
    for stats in attribute:
        for param_value in stats.iter('value'):
            value = param_value.text
Is there an easier way to do this, something like stats.get_value('value')?

Use XPath:
root.find('.//value').text
This gets you the content of the first value tag.
If you want to iterate over all value elements, use findall, which returns a list of all matching elements.
If you only want the value elements inside <stats param='someparam'> elements, make the path more specific:
root.findall("./statistics/stats[@param='someparam']/value")
Edit: Note that find/findall support only a subset of XPath. If you want the full XPath 1.0 functionality, use the xpath method.
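The find/findall calls can be tried end to end with a minimal sketch. It uses the standard library's xml.etree.ElementTree so it runs without extra installs; lxml.etree exposes the same find, findall, findtext, iter and get methods on its elements. The second `<stats param='otherparam'>` block is a hypothetical stand-in for the elided `<stats ...>` in the question. Since the question's title asks about attributes, the last query shows that a real attribute (param) is read with .get(), while value is a child element whose content is in .text.

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same find/findall/findtext/get

# Sample from the question; the second <stats> block is hypothetical
XML = """<process>
  <name>somename</name>
  <statistics>
    <stats param='someparam'>
      <value>0.456</value>
      <real_value>0.4</real_value>
    </stats>
    <stats param='otherparam'>
      <value>0.789</value>
      <real_value>0.7</real_value>
    </stats>
  </statistics>
</process>"""

root = ET.fromstring(XML)

# Text of the first <value> anywhere below the root
first = root.find(".//value").text

# findtext() is a shortcut for find(...).text
same = root.findtext(".//value")

# Text of every <value> element
all_values = [v.text for v in root.findall(".//value")]

# Only the <value> inside <stats param='someparam'>
selected = [v.text for v in root.findall("./statistics/stats[@param='someparam']/value")]

# A real attribute value is read with .get()
params = [s.get("param") for s in root.iter("stats")]

print(first, same, all_values, selected, params)
# 0.456 0.456 ['0.456', '0.789'] ['0.456'] ['someparam', 'otherparam']
```

The text comes back as a string, so use float(first) if you need a number.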

Related

How to get text of innerHTML element?

ProductNames is a list of the elements I need. When I use this line:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom').get_attribute("innerHTML")
I'm getting this:
<span class="a-icon-alt">4.3 out of 5 stars</span>
How can I extract only the text 4.3 out of 5 stars from the span tag?
You should include > span in your CSS selector as well, so that get_attribute("innerHTML") is called on the <span class="a-icon-alt">4.3 out of 5 stars</span> element itself. Note that get_attribute() returns a plain string, so there is no .text to call on its result.
Try something like this:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom > span').get_attribute("innerHTML")
You don't extract text from innerHTML. Instead, you read the text, or the value of an attribute, of a WebElement.
To extract the text 4.3 out of 5 stars, you need to move one step deeper to the <span>, and you can use the following locator strategy:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom>span.a-icon-alt').get_attribute("innerHTML")
Or simply:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom>span').get_attribute("innerHTML")
As an alternative, you can also use the text attribute as follows:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom>span.a-icon-alt').text
Or simply:
ProductNames[3].find_element_by_css_selector('.aok-align-bottom>span').text
References
You can find a couple of relevant discussions in:
get_attribute(): gets the given attribute or property of the element.
text: returns the text of the element.
Difference between text and innerHTML using Selenium
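If you already hold the innerHTML string (for example, the result of the original get_attribute("innerHTML") call) and only want its text, one option outside Selenium is to strip the tags with the standard library's html.parser. This is a minimal sketch; TextExtractor and html_to_text are hypothetical helper names, and the WebElement .text approach remains the more direct route when you still have the element.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.parts.append(data)


def html_to_text(fragment):
    """Hypothetical helper: return the concatenated text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text('<span class="a-icon-alt">4.3 out of 5 stars</span>'))  # 4.3 out of 5 stars
```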

Problems with XPath in Selenium

I want to find an element based on its attributes.
I have already tried searching all divs and filtering by attributes, and even searching with *. None of these worked.
The whole element looks like this:
<div class="charc" data-lvl="66" data-world="walios" data-nick="mirek">
This is my search expression:
driver.find_element_by_xpath('//div[@data-world="walios"] and [@data-nick="mirek"]')
I would like to find this element using Python with Selenium and then click on it.
Currently I am getting this error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@data-world="walios"] and [@data-nick="mirek"]' is not a valid XPath expression.
What am I doing wrong?
The error message is correct: your predicates are not valid XPath.
Try putting both conditions in a single [...] predicate:
driver.find_element_by_xpath('//div[@data-world="walios" and @data-nick="mirek"]')
driver.find_elements_by_xpath('//div[@data-world="walios" and @data-nick="mirek"]')
or
driver.find_elements_by_xpath('//div[@data-world="walios"][@data-nick="mirek"]')
The conditions for selecting the tag can't be joined with and between two separate [] blocks. Either combine them inside a single [] with and, or chain several [] predicates one after another.
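The chained-predicate form can be checked offline with a small sketch. Selenium evaluates XPath in the browser, so this only illustrates the predicate logic using the standard library's xml.etree.ElementTree, whose limited XPath subset accepts chained [@attr="value"] predicates but does not support and inside a predicate. The page fragment and the second (decoy) div are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: the element from the question plus a decoy
page = """<body>
  <div class="charc" data-lvl="66" data-world="walios" data-nick="mirek"></div>
  <div class="charc" data-lvl="12" data-world="walios" data-nick="other"></div>
</body>"""

root = ET.fromstring(page)

# Each [...] narrows the match further, so chaining acts as a logical AND
matches = root.findall('.//div[@data-world="walios"][@data-nick="mirek"]')

print([m.get("data-lvl") for m in matches])  # ['66']
```

In the browser, both '//div[@data-world="walios" and @data-nick="mirek"]' and the chained form select the same element.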
XPath expressions select a node or a list of nodes from the document based on attributes such as ID, name, or class name. XPath axes can additionally be used to locate complex or dynamic elements.

Replace XML element with another element in Python

I need to replace a particular element from one XML file with another element from a different XML file. I get the element with XPath expressions and I don't have a handle to its parent.
What is the easiest way to replace it in place so that the change is reflected when I write the tree to an XML file? That is, I want to do what this pseudocode does:
# Pseudocode
tree1.open('input1.xml')
tree2.open('input2.xml')
element1 = tree1.findall(...)[0]
element2 = tree2.findall(...)[0]
element1.replaceWith(element2)
tree1.writeToXmlFile('merged.xml')
Ok, I tried __setstate__ and __getstate__ and it worked:
element1.__setstate__(element2.__getstate__())
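The same effect can be had without the pickling hooks by overwriting the target element's tag, attributes, text and children through the public ElementTree API. Like __setstate__, this needs no handle to the parent. This is a minimal sketch: replace_in_place is a hypothetical helper, the two sample documents are made up, and the target's tail is deliberately kept because it belongs to the surrounding document rather than to the element.

```python
import copy
import xml.etree.ElementTree as ET


def replace_in_place(target, source):
    """Hypothetical helper: overwrite target with a copy of source, keeping its position."""
    src = copy.deepcopy(source)  # avoid sharing child elements between the two trees
    tail = target.tail           # clear() would discard the text that follows target
    target.clear()               # removes children and attributes, resets text and tail
    target.tag = src.tag
    for key, value in src.attrib.items():
        target.set(key, value)
    target.text = src.text
    target.extend(list(src))
    target.tail = tail


tree1 = ET.ElementTree(ET.fromstring("<root><a x='1'>old</a><b/></root>"))
tree2 = ET.ElementTree(ET.fromstring("<other><c y='2'>new<d/></c></other>"))

element1 = tree1.findall("a")[0]
element2 = tree2.findall("c")[0]
replace_in_place(element1, element2)

print(ET.tostring(tree1.getroot(), encoding="unicode"))
# <root><c y="2">new<d /></c><b /></root>
```

Calling tree1.write('merged.xml') afterwards writes the merged document.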

Python - how to edit a specific XML element content when multiple element attributes of the same name exist?

I've been trying to edit one specific element in an XML file that contains several elements with the same name, but the for loop I use to set the attribute always goes through the entire section and changes all of them.
Let's say that this is my XML:
<SectionA>
<element_content attribute="device_1" type="parameter_1" />
<element_content attribute="device_2" type="parameter_2" />
</SectionA>
I am currently using ElementTree with the code below. It works perfectly when a section contains elements with different names, but not when the names are the same: it simply sets the same attribute value on all of them.
for element in root.iter(section):
    print(element)
    element.set(attribute, attribute_value)
How do I access one specific element and change only that one?
Bear in mind that I don't know in advance which attributes are present in the element_content elements, because I add them dynamically at a user's request.
Edit:
Thanks to @leovp I was able to work around my problem and came up with this solution:
for step in root.findall(section):
    last_element = step.find(element_content + '[last()]')
    last_element.set(attribute, attribute_value)
This makes the loop always change the last element in each section.
Since I add and edit lines dynamically, this changes the one I added most recently.
Thank you.
You can use the limited XPath support that xml.etree provides:
>>> from xml.etree import ElementTree
>>> xml_data = """
... <SectionA>
... <element_content attribute="device_1" type="parameter_1" />
... <element_content attribute="device_2" type="parameter_2" />
... </SectionA>
... """.strip()
>>> tree = ElementTree.fromstring(xml_data)
>>> d2 = tree.find('element_content[@attribute="device_2"]')
>>> d2.set('type', 'new_type')
>>> print(ElementTree.tostring(tree).decode('utf-8'))
<SectionA>
<element_content attribute="device_1" type="parameter_1" />
<element_content attribute="device_2" type="new_type" />
</SectionA>
The key part here is the XPath expression, which finds an element by both its tag name and an attribute value:
d2 = tree.find('element_content[@attribute="device_2"]')
Update: since the XML data is not known beforehand, you can query the first, second, ..., last elements by position (indices start from 1):
tree.find('element_content[1]')
tree.find('element_content[2]')
tree.find('element_content[last()]')
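Positional predicates can be tried on the sample from the question with a minimal sketch. The wrapping root element is an assumption added so that SectionA is found with find(); with the question's XML as the document root, you would query the root directly.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <root> element
xml_data = """<root>
<SectionA>
<element_content attribute="device_1" type="parameter_1" />
<element_content attribute="device_2" type="parameter_2" />
</SectionA>
</root>"""

root = ET.fromstring(xml_data)
section = root.find("SectionA")

# Positions are 1-based; last() selects the final matching sibling
first = section.find("element_content[1]")
last = section.find("element_content[last()]")

last.set("type", "new_type")  # only the last element is modified

print(first.get("type"), last.get("type"))  # parameter_1 new_type
```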
But since you're iterating over the elements anyway, the simplest solution is to check the current element's attributes:
for element in root.iter(section):
    if element.attrib.get('type') == 'parameter_2':
        element.set(attribute, attribute_value)

Getting element by undefined tag name

I'm parsing an XML document in Python using minidom.
I have an element:
<informationRequirement>
<requiredDecision href="id"/>
</informationRequirement>
The only thing I need is the value of href in the subelement, but its tag name can vary (for example, requiredKnowledge instead of requiredDecision; it always begins with required).
If the tag was always the same I would use something like:
element.getElementsByTagName('requiredDecision')[0].attributes['href'].value
But that's not the case. What can I use instead, given that the tag name varies?
(There will always be exactly one subelement.)
If you're always guaranteed to have one subelement, just grab that element:
element.childNodes[0].attributes['href'].value
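However, this is brittle: if the XML is indented, childNodes[0] is a whitespace text node rather than the element. A (perhaps) better approach is to check each child node:
hrefs = []
for child in element.childNodes:
    # skip whitespace text nodes, which have no tagName
    if child.nodeType == child.ELEMENT_NODE and child.tagName.startswith('required'):
        hrefs.append(child.attributes['href'].value)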
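A self-contained minimal sketch of the filtering approach, run against both tag-name variants from the question. required_hrefs is a hypothetical helper name, and the requiredKnowledge document with href="k1" is made-up sample data. It uses minidom's getAttribute, which returns the attribute's value as a string.

```python
from xml.dom import minidom


def required_hrefs(element):
    """Hypothetical helper: href of every child element whose tag starts with 'required'.

    Whitespace between tags appears as Text nodes in childNodes, so only
    ELEMENT_NODE children are inspected.
    """
    return [
        child.getAttribute("href")
        for child in element.childNodes
        if child.nodeType == child.ELEMENT_NODE and child.tagName.startswith("required")
    ]


decision = minidom.parseString(
    '<informationRequirement>\n  <requiredDecision href="id"/>\n</informationRequirement>'
).documentElement
knowledge = minidom.parseString(
    '<informationRequirement><requiredKnowledge href="k1"/></informationRequirement>'
).documentElement

print(required_hrefs(decision), required_hrefs(knowledge))  # ['id'] ['k1']
```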
