Trying to parse an XML file using lxml in Python, how do I simply get the value of an element's attribute? Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item id="123">
<sub>ABC</sub>
</item>
I'd like to get the result 123, and store it as a variable.
When using etree.parse(), simply call .getroot() to get the root element; the .attrib attribute is a dictionary of all attributes, use that to get the value:
>>> from lxml import etree
>>> tree = etree.parse('test.xml')
>>> tree.getroot().attrib['id']
'123'
If you used etree.fromstring() the object returned is the root object already, so no .getroot() call is needed:
>>> tree = etree.fromstring('''\
... <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
... <item id="123">
... <sub>ABC</sub>
... </item>
... ''')
>>> tree.attrib['id']
'123'
Alternatively, you could use an XPath selector:
>>> from lxml import etree
>>> tree = etree.fromstring(b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item id="123">
<sub>ABC</sub>
</item>''')
>>> tree.xpath('/item/#id')
['123']
I think Martijn has answered your question. Building on his answer, you can also use the items() method to get a list of tuples with the attributes and values. This may be useful if you need the values of multiple attributes. Like so:
>>> from lxml import etree
>>> tree = etree.parse('test.xml')
>>> item = tree.xpath('/item')
>>> item.items()
[('id', '123')]
Or in case of string:
>>> tree = etree.fromstring("""\
... <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
... <item id="123">
... <sub>ABC</sub>
... </item>
... """)
>>> tree.items()
[('id', '123')]
Related
I have an xml in python, need to obtain the elements of the "Items" tag in an iterable list.
I need get a iterable list from this XML, for example like it:
Item 1: Bicycle, value $250, iva_tax: 50.30
Item 2: Skateboard, value $120, iva_tax: 25.0
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data>
<info>Listado de items</info>
<detalle>
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<tienda id="tiendaProd" version="1.1.0">
<items>
<item>
<nombre>Bicycle</nombre>
<valor>250</valor>
<data>
<tax name="iva" value="50.30"></tax>
</data>
</item>
<item>
<nombre>Skateboard</nombre>
<valor>120</valor>
<data>
<tax name="iva" value="25.0"></tax>
</data>
</item>
<item>
<nombre>Motorcycle</nombre>
<valor>900</valor>
<data>
<tax name="iva" value="120.50"></tax>
</data>
</item>
</items>
</tienda>]]>
</detalle>
</data>
I am working with
import xml.etree.ElementTree as ET
for example
import xml.etree.ElementTree as ET
xml = ET.fromstring(stringBase64)
ite = xml.find('.//detalle').text
tixml = ET.fromstring(ite)
You can use BeautifulSoup4 (BS4) to do this.
from bs4 import BeautifulSoup
#Read XML file
with open("example.xml", "r") as f:
contents = f.readlines()
#Create Soup object
soup = BeautifulSoup(contents, 'xml')
#find all the item tags
item_tags = soup.find_all("item") #returns everything in the <item> tags
#find the nombre and valor tags within each item
results = {}
for item in item_tags:
num = item.find("nombre").text
val = item.find("valor").text
results[str(num)] = val
#Prints dictionary with key value pairs from the xml
print(results)
I'm using the documentation here to try to get only the values (name,ip , netmask) for certain elements.
This is an example of the structure of my xml:
<?xml version="1.0" ?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:5cf32451-91af-4f71-a0bd-ead244b81b1f">
<data>
<interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
<interface>
<name>GigabitEthernet1</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>192.168.40.30</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
<interface>
<name>GigabitEthernet2</name>
<type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
<enabled>true</enabled>
<ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
<address>
<ip>10.10.10.1</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
<ipv6 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip"/>
</interface>
</interfaces>
</data>
</rpc-reply>
Python code: This code returns nothing .
import xml.etree.ElementTree as ET
tree = ET.parse("C:\\Users\\Redha\\Documents\\test_network\\interface1234.xml")
root = tree.getroot()
namespaces = {'interfaces': 'urn:ietf:params:xml:ns:yang:ietf-interfaces' }
for elem in root.findall('.//interfaces:interfaces', namespaces):
s0 = elem.find('.//interfaces:name',namespaces)
name = s0.text
print(name)
interface = ET.parse('interface2.xml')
interface_root = interface.getroot()
for interface_attribute in interface_root[0][0]:
print(f"{interface_attribute[0].text}, {interface_attribute[3][0][0].text}, {interface_attribute[3][0][1].text}")
Suppose you have an lmxl.etree element with the contents like:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
<element2>
<subelement2>blibli</sublement2>
</element2>
</root>
I can use find or xpath methods to get something an element rendering something like:
<element1>
<subelement1>blabla</subelement1>
</element1>
Is there a way simple to get:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
i.e The element of interest plus all it's ancestors up to the document root?
I am not sure there is something built-in for it, but here is a terrible, "don't ever use it in real life" type of a workaround using the iterancestors() parent iterator:
from lxml import etree as ET
data = """<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
<element2>
<subelement2>blibli</subelement2>
</element2>
</root>"""
root = ET.fromstring(data)
element = root.find(".//subelement1")
result = ET.tostring(element)
for node in element.iterancestors():
result = "<{name}>{text}</{name}>".format(name=node.tag, text=result)
print(ET.tostring(ET.fromstring(result), pretty_print=True))
Prints:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
The following code removes elements that don't have any subelement1 descendants and are not named subelement1.
from lxml import etree
tree = etree.parse("input.xml") # First XML document in question
for elem in tree.iter():
if elem.xpath("not(.//subelement1)") and not(elem.tag == "subelement1"):
if elem.getparent() is not None:
elem.getparent().remove(elem)
print etree.tostring(tree)
Output:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
I'm having similar difficulties to the one posed in this answered question, except the solution provided isn't working with my version of the problem.
With this sample XML data:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element>
<name>XYZ</name>
<value>789</value>
</element>
<element>
<name>ABC</name>
<value>123</value>
</element>
</root>
My goal is to obtain a dictionary with the keys XYZ,ABC and the corresponding values 789,123. In other words, it should output the same as:
dict(XYZ=789,ABC=123)
Find element tags, and their name, value children using findall and find methods:
>>> import xml.etree.ElementTree as ET
>>>
>>> root = ET.fromstring('''<?xml version="1.0" encoding="UTF-8"?>
... <root>
... <element>
... <name>XYZ</name>
... <value>789</value>
... </element>
... <element>
... <name>ABC</name>
... <value>123</value>
... </element>
... </root>
... ''')
>>> {e.find('name').text: e.find('value').text for e in root.findall('element')}
{'XYZ': '789', 'ABC': '123'}
Another try may be using xpath and lxml.etree.
from lxml import etree
s="""<root>
<element>
<name>XYZ</name>
<value>789</value>
</element>
<element>
<name>ABC</name>
<value>123</value>
</element>
</root>"""
tree = etree.fromstring(s)
data = [(i.xpath("./name//text()")[0],i.xpath("./value//text()")[0]) for i in tree.xpath("//element")]
print {k:v for k,v in data}
Output-
{'XYZ': '789', 'ABC': '123'}
I have XML stored as a string "vincontents", formatted as such:
<response>
<data>
<vin>1FT7X2B69CEC76666</vin>
</data>
<data>
<vin>1GNDT13S452225555</vin>
</data>
</response>
I'm trying to use Python's elementtree library to parse out the VIN values into an array or Python list. I'm only interested in the values, not the tags.
def parseVins():
content = etree.fromstring(vincontents)
vins = content.findall("data/vin")
print vins
Outputs all of the tag information:
[<Element 'vin' at 0x2d2eef0>, <Element 'vin' at 0x2d2efd0> ....
Any help would be appreciated. Thank you!
Use .text property:
>>> import xml.etree.ElementTree as etree
>>> data = """<response>
... <data>
... <vin>1FT7X2B69CEC76666</vin>
... </data>
... <data>
... <vin>1GNDT13S452225555</vin>
... </data>
... </response>"""
>>> tree = etree.fromstring(data)
>>> [el.text for el in tree.findall('.//data/vin')]
['1FT7X2B69CEC76666', '1GNDT13S452225555']