Store XML values as Python list - python

I have XML stored as a string "vincontents", formatted as such:
<response>
<data>
<vin>1FT7X2B69CEC76666</vin>
</data>
<data>
<vin>1GNDT13S452225555</vin>
</data>
</response>
I'm trying to use Python's ElementTree library to parse the VIN values into a Python list. I'm only interested in the values, not the tags.
def parseVins():
    content = etree.fromstring(vincontents)
    vins = content.findall("data/vin")
    print(vins)
This prints the Element objects rather than their text:
[<Element 'vin' at 0x2d2eef0>, <Element 'vin' at 0x2d2efd0> ....
Any help would be appreciated. Thank you!

Use the .text property of each element:
>>> import xml.etree.ElementTree as etree
>>> data = """<response>
... <data>
... <vin>1FT7X2B69CEC76666</vin>
... </data>
... <data>
... <vin>1GNDT13S452225555</vin>
... </data>
... </response>"""
>>> tree = etree.fromstring(data)
>>> [el.text for el in tree.findall('.//data/vin')]
['1FT7X2B69CEC76666', '1GNDT13S452225555']
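If the lookup is needed in more than one place, it can be wrapped in a small helper. The following is only a sketch: the function name extract_vins is made up for illustration, and skipping empty <vin/> tags is an assumption about the desired behaviour.

```python
import xml.etree.ElementTree as ET

def extract_vins(xml_text):
    """Return the text of every <vin> element found under a <data> element."""
    root = ET.fromstring(xml_text)
    # el.text is None for an empty <vin/>, so those are skipped.
    return [el.text.strip() for el in root.findall("data/vin") if el.text]

vincontents = """<response>
<data>
<vin>1FT7X2B69CEC76666</vin>
</data>
<data>
<vin>1GNDT13S452225555</vin>
</data>
</response>"""

print(extract_vins(vincontents))
# ['1FT7X2B69CEC76666', '1GNDT13S452225555']
```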


How do I access elements in an XML when multiple default namespaces are used?

I would expect this code to produce a non-empty list:
import xml.etree.ElementTree as et
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<A
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="a:namespace">
<B xmlns="b:namespace">
<C>"Stuff"</C>
</B>
</A>
'''
namespaces = {'a' : 'a:namespace', 'b' : 'b:namespace'}
xroot = et.fromstring(xml)
res = xroot.findall('b:C', namespaces)
Instead, res is an empty list. Why?
When I inspect the contents of xroot I can see that the C item is within b:namespace as expected:
for x in xroot.iter():
    print(x)
# result:
<Element '{a:namespace}A' at 0x7f56e13b95e8>
<Element '{b:namespace}B' at 0x7f56e188d2c8>
<Element '{b:namespace}C' at 0x7f56e188def8>
To check whether something was wrong with my namespacing, I also tried xroot.findall('{b:namespace}C'), but the result was an empty list as well.
Your findall path 'b:C' only matches direct children of the root element. Change it to './/b:C' so the tag is found at any depth in the tree, e.g.:
import xml.etree.ElementTree as et
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<A
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="a:namespace">
<B xmlns="b:namespace">
<C>"Stuff"</C>
</B>
</A>
'''
namespaces = {'a' : 'a:namespace', 'b' : 'b:namespace'}
xroot = et.fromstring(xml)
######## changed the xpath to start with .//
res = xroot.findall('.//b:C', namespaces)
print( f"{res=}" )
for x in xroot.iter():
    print(x)
Output:
res=[<Element '{b:namespace}C' at 0x00000222DFCAAA40>]
<Element '{a:namespace}A' at 0x00000222DFCAA9A0>
<Element '{b:namespace}B' at 0x00000222DFCAA9F0>
<Element '{b:namespace}C' at 0x00000222DFCAAA40>
See the ElementTree documentation for some useful examples of its XPath support: https://docs.python.org/3/library/xml.etree.elementtree.html?highlight=xpath#xpath-support
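To compare the possible path forms side by side, here is a small sketch using a trimmed copy of the XML from the question (the unused xsi declaration is left out). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<A xmlns="a:namespace">
<B xmlns="b:namespace">
<C>"Stuff"</C>
</B>
</A>
'''
namespaces = {"a": "a:namespace", "b": "b:namespace"}
root = ET.fromstring(xml)

# Relative path with no "//": only direct children of <A> are considered.
direct = root.findall("b:C", namespaces)
# Spelling out every step from the root also works.
explicit = root.findall("b:B/b:C", namespaces)
# ".//" searches at any depth below the root.
anywhere = root.findall(".//b:C", namespaces)
# The {uri}tag form needs no prefix map, but still needs ".//" here.
clark = root.findall(".//{b:namespace}C")

print(len(direct), len(explicit), len(anywhere), len(clark))
# 0 1 1 1
```

This also shows why the '{b:namespace}C' attempt in the question came back empty: the namespace was right, but the path still only looked at direct children of the root.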

How to parse XML grouped by specific tag id

I have the following XML and would like to group its contents by Table Id.
xml = """
<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>
"""
I would like to parse it and get something like this.
I tried the code below but couldn't figure it out.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
    name = user.xpath("Table")[0].text
    users[name] = []
    for group in user.xpath("Data/Cell/Text"):
        users[name].append(group.text)
print(users)
Is it possible to get the above result? If so, could anyone help me do this? I really appreciate your effort.
You need to change your xpath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
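# Iterate over each <Table>, not the single <Tables> root.
for user in tree.xpath("//Tables/Table"):
    name = user.attrib['Id']
    users[name] = []
    # The leading ".//" searches below the current <Table> only.
    for group in user.xpath(".//Data/Cell/Text"):
        users[name].append(group.text)
print(users)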
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
for user in tree.xpath("//Tables/Table")
for name in [user.attrib["Id"]]}
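If lxml is not available, roughly the same grouping can be sketched with the standard library's xml.etree.ElementTree instead. This swaps in a different library than the answer above uses, and the name tables is illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>"""

root = ET.fromstring(xml)
# Map each Table's Id attribute to the text of the <Text> elements below it.
tables = {
    table.get("Id"): [text.text for text in table.iter("Text")]
    for table in root.findall("Table")
}
print(tables)
# {'1': ['AA', 'BB'], '2': ['CC', 'DD']}
```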

How to create a subset of document using lxml?

Suppose you have an lxml.etree element with contents like:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
<element2>
<subelement2>blibli</subelement2>
</element2>
</root>
I can use the find or xpath methods to get an element that renders as something like:
<element1>
<subelement1>blabla</subelement1>
</element1>
Is there a simple way to get:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
i.e. the element of interest plus all its ancestors up to the document root?
I am not sure there is something built-in for it, but here is a terrible, "don't ever use it in real life" type of a workaround using the iterancestors() parent iterator:
from lxml import etree as ET
data = """<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
<element2>
<subelement2>blibli</subelement2>
</element2>
</root>"""
root = ET.fromstring(data)
element = root.find(".//subelement1")
# encoding="unicode" returns str rather than bytes, so it can be formatted below.
result = ET.tostring(element, encoding="unicode")
for node in element.iterancestors():
    result = "<{name}>{text}</{name}>".format(name=node.tag, text=result)
print(ET.tostring(ET.fromstring(result), pretty_print=True).decode())
Prints:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
The following code removes elements that don't have any subelement1 descendants and are not named subelement1.
from lxml import etree
tree = etree.parse("input.xml") # First XML document in question
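# Iterate over a snapshot so that removing elements does not disturb the traversal.
for elem in list(tree.iter()):
    if elem.xpath("not(.//subelement1)") and not(elem.tag == "subelement1"):
        if elem.getparent() is not None:
            elem.getparent().remove(elem)
print(etree.tostring(tree).decode())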
Output:
<root>
<element1>
<subelement1>blabla</subelement1>
</element1>
</root>
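A third option is to build a new tree instead of building strings or pruning the original. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, so it has to construct its own parent map; the function name subset_with_ancestors is hypothetical, and copying each ancestor's attributes is an assumption about what is wanted.

```python
import copy
import xml.etree.ElementTree as ET

def subset_with_ancestors(root, target):
    """Return a new tree holding a copy of `target` wrapped in bare copies of its ancestors."""
    # The stdlib ElementTree has no getparent(), so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = copy.deepcopy(target)
    current = target
    while current in parent_of:
        current = parent_of[current]
        # Copy only the ancestor's tag and attributes, not its other children.
        wrapper = ET.Element(current.tag, current.attrib)
        wrapper.append(node)
        node = wrapper
    return node

data = ("<root><element1><subelement1>blabla</subelement1></element1>"
        "<element2><subelement2>blibli</subelement2></element2></root>")
root = ET.fromstring(data)
result = subset_with_ancestors(root, root.find(".//subelement1"))
print(ET.tostring(result, encoding="unicode"))
# <root><element1><subelement1>blabla</subelement1></element1></root>
```

The original tree is left untouched, which may matter if it is still needed afterwards.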

How to add xml nodes in python using ElementTree

I have an XML file like:
<data>
<person>
<Name>xyz</Name>
<add>abc</add>
</person>
</data>
I want to add another person node, like:
<data>
<person>
<Name>xyz</Name>
<add>abc</add>
</person>
<person>
<Name>def</Name>
</person>
</data>
My current Python code is:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element
from xml.etree.ElementTree import ElementTree
root = ET.parse("Lexicon.xml").getroot()
creRoot = Element("person")
creDictionary = Element("Name")
creDictionary.text = "def"
creRoot.append(creDictionary)
print(ET.tostring(creRoot))
creTree= ElementTree(creRoot)
creTree.write("Lexicon.xml")
When I run this code, it creates a new XML file rather than adding to the existing one, and the result is:
<person>
<Name>def</Name>
</person>
and it removes all the previous data.
Can anyone help solve this? Thanks in advance.
Use SubElement to add nodes to an existing node:
import xml.etree.ElementTree as etree
# input_xml is assumed to hold the XML string from the question.
data = etree.XML(input_xml)
person = etree.SubElement(data, 'person')
name = etree.SubElement(person, 'Name')
name.text = 'def'
print(etree.tostring(data))
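Since the question is about a file being overwritten, here is a sketch of the full round trip: parse the file, add the node to the parsed root, and write the same tree back. A temporary file stands in for Lexicon.xml so the example is self-contained; the path and its sample contents are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in for the existing Lexicon.xml (assumed contents, temporary location).
path = os.path.join(tempfile.mkdtemp(), "Lexicon.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write("<data><person><Name>xyz</Name><add>abc</add></person></data>")

tree = ET.parse(path)   # keep the whole parsed tree
root = tree.getroot()

# Attach the new <person> to the existing root instead of building a separate tree.
person = ET.SubElement(root, "person")
ET.SubElement(person, "Name").text = "def"

tree.write(path)        # writes the original content plus the new node

print(ET.tostring(ET.parse(path).getroot(), encoding="unicode"))
# <data><person><Name>xyz</Name><add>abc</add></person><person><Name>def</Name></person></data>
```

The code in the question lost the old data because it wrapped only the new person element in an ElementTree and wrote that to the file.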
The newly created element needs to be appended to its parent element.
Demo:
>>> import xml.etree.ElementTree as ET
>>> input_data = """<data>
... <person>
... <Name>xyz</Name>
... <add>abc</add>
... </person>
... </data>"""
>>> # Create the new elements.
>>> person_tag = ET.Element("person")
>>> name_tag = ET.Element("Name")
>>> # Add text to the element.
>>> name_tag.text = "def"
>>> # Append the element to its parent element.
>>> person_tag.append(name_tag)
>>> # Print just the new parent element.
>>> ET.tostring(person_tag, encoding="unicode")
'<person><Name>def</Name></person>'
>>> # Create the root element with fromstring.
>>> root = ET.fromstring(input_data)
>>> # Append the new element to the root element.
>>> root.append(person_tag)
>>> # Print the root element.
>>> print(ET.tostring(root, encoding="unicode"))
<data>
<person>
<Name>xyz</Name>
<add>abc</add>
</person>
<person><Name>def</Name></person></data>
>>> print(ET.tostring(root, encoding="unicode", method="xml"))
<data>
<person>
<Name>xyz</Name>
<add>abc</add>
</person>
<person><Name>def</Name></person></data>
>>>
Note: Best to use lxml b

Get attribute of first element using lxml

Trying to parse an XML file using lxml in Python, how do I simply get the value of an element's attribute? Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item id="123">
<sub>ABC</sub>
</item>
I'd like to get the result 123, and store it as a variable.
When using etree.parse(), simply call .getroot() to get the root element; the .attrib attribute is a dictionary of all attributes, use that to get the value:
>>> from lxml import etree
>>> tree = etree.parse('test.xml')
>>> tree.getroot().attrib['id']
'123'
If you used etree.fromstring() the object returned is the root object already, so no .getroot() call is needed:
>>> tree = etree.fromstring(b'''\
... <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
... <item id="123">
... <sub>ABC</sub>
... </item>
... ''')
>>> tree.attrib['id']
'123'
Alternatively, you could use an XPath selector:
>>> from lxml import etree
>>> tree = etree.fromstring(b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item id="123">
<sub>ABC</sub>
</item>''')
>>> tree.xpath('/item/@id')
['123']
I think Martijn has answered your question. Building on his answer, you can also use the items() method to get a list of tuples with the attributes and values. This may be useful if you need the values of multiple attributes. Like so:
>>> from lxml import etree
>>> tree = etree.parse('test.xml')
>>> item = tree.xpath('/item')[0]
>>> item.items()
[('id', '123')]
Or in case of string:
>>> tree = etree.fromstring(b"""\
... <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
... <item id="123">
... <sub>ABC</sub>
... </item>
... """)
>>> tree.items()
[('id', '123')]
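To store the value in a variable and guard against a missing attribute, the element's .get() method is another option. The sketch below uses the standard library's xml.etree.ElementTree so it runs without lxml; lxml elements offer the same .get() call. The variable names and the "n/a" default are illustrative.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring(b'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item id="123">
<sub>ABC</sub>
</item>''')

# .get() returns None (or the given default) instead of raising KeyError.
item_id = item.get("id")
missing = item.get("name", "n/a")
# Attribute values are always strings; convert explicitly if a number is needed.
as_number = int(item_id)

print(item_id, missing, as_number)
# 123 n/a 123
```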
