I am trying to parse an XML file in Python, and it seems my XML is structured differently from the usual examples.
Below is my XML snippet:
<records>
<record>
<parameter>
<name>Server</name>
<value>Application_server_01</value>
</parameter>
</record>
</records>
I am trying to get the name and value inside "parameter", but I only seem to get empty values.
I checked the online documentation, and almost all of the XML there is in the format below:
<neighbor name="Switzerland" direction="W"/>
I am able to parse this fine. How can I get the values from my XML without changing the formatting?
working code
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
output
C:/Users/xxxxxx/PycharmProjects/default/parse.py
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
PS: I will be using the XML to fire an API call and doubt if the downstream application would like the second way of formatting.
Below is my python code
import xml.etree.ElementTree as ET
tree = ET.parse('at.xml')
root = tree.getroot()
for name in root.iter('name'):
    print(name.attrib)
Output for the above code
C:/Users/xxxxxx/PycharmProjects/default/learning.py
{}
{}
{}
{}
{}
{}
{}
{}
Use lxml and XPath:
from lxml import etree as et
tree = et.parse(open("/tmp/so.xml"))
name = tree.xpath("/records/record/parameter/name/text()")[0]
value = tree.xpath("/records/record/parameter/value/text()")[0]
print(name, value)
Output:
Server Application_server_01
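For what it's worth, the empty dictionaries in the question appear because name and value are child elements rather than attributes, so their content lives in .text, not .attrib. Here is a minimal standard-library sketch, assuming the XML from the question is held in a string (the names xml_text and pairs are only for illustration):

```python
import xml.etree.ElementTree as ET

# The snippet from the question, with the closing tag of <parameter> completed.
xml_text = """<records>
  <record>
    <parameter>
      <name>Server</name>
      <value>Application_server_01</value>
    </parameter>
  </record>
</records>"""

root = ET.fromstring(xml_text)

pairs = []
for parameter in root.iter("parameter"):
    # findtext() returns the text of the first matching child element.
    pairs.append((parameter.findtext("name"), parameter.findtext("value")))

print(pairs)
```

This should work the same way with ET.parse('at.xml').getroot() in place of ET.fromstring, and the XML itself does not need to change.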
I have an absolute path for the values of XML files I want to retrieve. The absolute path is in the format of "A/B/C". How can I do this in Python?
Another method.
from simplified_scrapy import SimplifiedDoc, utils, req
# Basic
xml = '''<ROOT><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
print (doc.select('A>B>C'))
# Multiple
xml = '''<ROOT><A><B><C>The Value 1</C></B></A><A><B><C>The Value 2</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
# print (doc.selects('A').select('B').select('C'))
print (doc.selects('A').select('B>C'))
# Mixed structure
xml = '''<ROOT><A><other>no B</other></A><A><other></other><B>no C</B></A><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
nodes = doc.selects('A').selects('B').select('C')
for node in nodes:
    for c in node:
        if c:
            print (c)
Result:
{'tag': 'C', 'html': 'The Value'}
[{'tag': 'C', 'html': 'The Value 1'}, {'tag': 'C', 'html': 'The Value 2'}]
{'tag': 'C', 'html': 'The Value'}
Using the ElementTree library (note that this answer uses only the Python standard library, while the other answers use external libraries):
import xml.etree.ElementTree as ET
xml = '''<ROOT><A><B><C>The Value</C></B></A></ROOT>'''
root = ET.fromstring(xml)
print(root.find('./A/B/C').text)
output
The Value
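If the path can match several elements, or none at all, findall and findtext may be more convenient than find(...).text, which raises an AttributeError when find returns None. A small sketch, using a made-up document with two matches and one A element that has no B/C below it:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two A/B/C paths plus one A without a B child.
xml = (
    "<ROOT>"
    "<A><B><C>The Value 1</C></B></A>"
    "<A><B><C>The Value 2</C></B></A>"
    "<A><other/></A>"
    "</ROOT>"
)
root = ET.fromstring(xml)

# findall() returns every element matching the path, in document order.
values = [c.text for c in root.findall("./A/B/C")]
print(values)

# findtext() returns the text of the first match, or the default if nothing matches.
missing = root.findtext("./A/B/D", default="not found")
print(missing)
```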
You can use lxml which you can install via pip install lxml.
See also https://lxml.de/xpathxslt.html
from io import StringIO
from lxml import etree
data = '''\
<prestashop>
<combination>
<id>a</id>
<id_product>b</id_product>
<location>c</location>
<ean13>d</ean13>
<isbn>e</isbn>
<upc>f</upc>
<mpn>g</mpn>
</combination>
</prestashop>
'''
xpath = '/prestashop/combination/ean13'
f = StringIO(data)
tree = etree.parse(f)
matches = tree.xpath(xpath)
for e in matches:
    print(e.text)
I have the following XML file and I would like to group its contents by Table Id.
xml = """
<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>
"""
I would like to parse it and get something like this.
I tried the code below but couldn't figure it out.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
    name = user.xpath("Table")[0].text
    users[name] = []
    for group in user.xpath("Data/Cell/Text"):
        users[name].append(group.text)
print (users)
Is it possible to get the above result? If so, could anyone help me do this? I really appreciate your effort.
You need to change your xpath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
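for user in tree.xpath("//Tables/Table"):
    # changed: select each Table element instead of the Tables root
    name = user.attrib['Id']
    users[name] = []
    for group in user.xpath(".//Data/Cell/Text"):
        # changed: the leading ".//" searches below the current Table only
        users[name].append(group.text)
print (users)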
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
for user in tree.xpath("//Tables/Table")
for name in [user.attrib["Id"]]}
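If installing lxml is not an option, the same grouping can probably be done with the standard library's ElementTree, since the paths involved only need its limited XPath subset. A sketch under that assumption (the variable name grouped is mine, not from the question):

```python
import xml.etree.ElementTree as ET

xml = """<Tables Count="19">
  <Table Id="1">
    <Data><Cell><Brush/><Text>AA</Text><Text>BB</Text></Cell></Data>
  </Table>
  <Table Id="2">
    <Data><Cell><Brush/><Text>CC</Text><Text>DD</Text></Cell></Data>
  </Table>
</Tables>"""

root = ET.fromstring(xml)

# Map each Table's Id attribute to the text of the Text elements below it.
grouped = {
    table.get("Id"): [text.text for text in table.findall("./Data/Cell/Text")]
    for table in root.iter("Table")
}
print(grouped)
```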
If I've got an XML file like this:
<root
xmlns:a="http://example.com/a"
xmlns:b="http://example.com/b"
xmlns:c="http://example.com/c"
xmlns="http://example.com/base">
...
</root>
How can I get a list of the namespace definitions (ie, the xmlns:a="…", etc)?
Using:
import xml.etree.ElementTree as ET
tree = ET.parse('foo.xml')
root = tree.getroot()
print(root.attrib)
Shows an empty attribute dictionary.
Via @mzjn, in the comments, here's how to do it with stock ElementTree: https://stackoverflow.com/a/42372404/407651 :
import xml.etree.ElementTree as ET
my_namespaces = dict([
node for (_, node) in ET.iterparse('file.xml', events=['start-ns'])
])
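To try the iterparse approach without a file on disk, the XML can be wrapped in an in-memory buffer. A self-contained sketch, assuming the document from the question (the names xml_data and namespaces are illustrative). Note that stock ElementTree reports the default namespace under an empty-string prefix:

```python
import io
import xml.etree.ElementTree as ET

xml_data = (
    b'<root xmlns:a="http://example.com/a" '
    b'xmlns:b="http://example.com/b" '
    b'xmlns:c="http://example.com/c" '
    b'xmlns="http://example.com/base"></root>'
)

# Each "start-ns" event carries a (prefix, uri) tuple for one namespace declaration.
namespaces = {
    prefix: uri
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_data), events=["start-ns"])
}
print(namespaces)
```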
You might find it easier to use lxml.
from lxml import etree
xml_data = '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
root_node = etree.fromstring(xml_data)
print(root_node.nsmap)
This outputs
{None: 'http://example.com/base',
'a': 'http://example.com/a',
'b': 'http://example.com/b',
'c': 'http://example.com/c'}
I am using ElementTree to build an XML file.
When I try to set an element's attribute with ET.SubElement().__setattr__(), I get the error AttributeError: __setattr__.
import xml.etree.cElementTree as ET
summary = open('Summary.xml', 'w')
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary')
ET.SubElement(root,'TextSummary').__setattr__('Status','Completed') # Error occurs here
tree = ET.ElementTree(root)
tree.write(summary)
summary.close()
After code execution, my XML should resemble the following:
<Summary>
<TextSummary Status = 'Completed'/>
</Summary>
How do I add attributes to an XML element with Python using xml.etree.cElementTree?
You should be doing:
ET.SubElement(root,'TextSummary').set('Status','Completed')
The Etree documentation shows usage.
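To see the effect without writing a file, the tree can be serialized to a string. A minimal sketch, assuming only a single TextSummary child is wanted (the question's code creates two); the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.Element("Summary")
text_summary = ET.SubElement(root, "TextSummary")
text_summary.set("Status", "Completed")

# encoding="unicode" makes tostring() return a str instead of bytes.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

ElementTree.write also accepts a file name directly, so opening the file by hand as in the question is probably not needed.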
You can specify attributes for an Element or SubElement during creation with keyword arguments.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
ET.SubElement(root, 'TextSummary', Status='Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Alternatively, you can use .set to add attributes to an existing element.
import xml.etree.ElementTree as ET
root = ET.Element('Summary')
sub = ET.SubElement(root, 'TextSummary')
sub.set('Status', 'Completed')
XML:
<Summary>
<TextSummary Status="Completed"/>
</Summary>
Technical Explanation:
The constructors for Element and SubElement include **extra, which accepts attributes as keyword arguments.
xml.etree.ElementTree.Element(tag, attrib={}, **extra)
xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)
This allows you to add an arbitrary number of attributes.
root = ET.Element('Summary', Date='2018/07/02', Timestamp='11:44am')
# <Summary Date = "2018/07/02" Timestamp = "11:44am">
You can also use .set to add attributes to a pre-existing element. However, this can only add one attribute at a time. (As suggested by Thomas Orozco).
root = ET.Element('Summary')
root.set('Date', '2018/07/02')
root.set('Timestamp', '11:44am')
# <Summary Date = "2018/07/02" Timestamp = "11:44am">
Full Example:
import xml.etree.ElementTree as ET
root = ET.Element('school', name='Willow Creek High')
ET.SubElement(root, 'student', name='Jane Doe', grade='9')
print(ET.tostring(root).decode())
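# <school name="Willow Creek High"><student name="Jane Doe" grade="9" /></school>
# (Python 3.8+ keeps attributes in insertion order; older versions sorted them, printing grade before name.)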
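If the attributes for an existing element are already collected in a dictionary, the .set approach can simply be wrapped in a loop. A short sketch (new_attributes is a hypothetical name, not part of the ElementTree API):

```python
import xml.etree.ElementTree as ET

root = ET.Element("Summary")

# Attributes gathered elsewhere, to be applied to an element that already exists.
new_attributes = {"Date": "2018/07/02", "Timestamp": "11:44am"}
for key, value in new_attributes.items():
    root.set(key, value)

print(root.attrib)
```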
One way to set multiple attributes in a single line is to pass an attrib dictionary, as below.
I wrote this code for SVG XML creation:
from xml.etree import ElementTree as ET
svg = ET.Element('svg', attrib={'height':'210','width':'500'})
g = ET.SubElement(svg,'g', attrib={'x':'10', 'y':'12','id':'groupName'})
line = ET.SubElement(g, 'line', attrib={'x1':'0','y1':'0','x2':'200','y2':'200','stroke':'red'})
print(ET.tostring(svg, encoding="us-ascii", method="xml"))
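One practical reason to prefer the attrib dictionary for SVG is that many SVG attribute names, such as stroke-width, are not valid Python identifiers and so cannot be passed as keyword arguments. A small sketch (the attribute values are made up) showing that attrib and keyword arguments can also be combined:

```python
import xml.etree.ElementTree as ET

# "stroke-width" contains a hyphen, so it has to go through the attrib dict;
# "stroke" is a valid identifier and can be passed as a keyword argument.
line = ET.Element(
    "line",
    attrib={"x1": "0", "y1": "0", "x2": "200", "y2": "200", "stroke-width": "2"},
    stroke="red",
)
print(line.attrib)
```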
I have the following XML, generated by an HTTP response:
<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
<Results completed="true" total="25" matched="5" processed="25">
<Resource type="h" DisplayName="Host" name="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
</PerfData>
<Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
</PerfData>
</Resource>
</Resource>
</Results>
</Response>
If you look at this carefully, the outer <Resource> element has another <Resource> element nested inside it.
So the high-level XML structure is as below:
<Resource>
<Resource>
</Resource>
</Resource>
Python ElementTree seems to parse only the outer element. Below is my code:
import re
import xml.etree.ElementTree as xml  # assumed: the alias "xml" used below

# "data" is the raw HTTP response text (not shown in the question)
pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE)
for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)
    for results in responses:
        result = results.tag
        for resources in results:
            resource = resources.tag
            temp = {}
            temp = resources.attrib
            print(temp)
This shows following output (temp)
{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
How can I fetch inner attributes?
Don't parse XML with regular expressions! That won't work reliably; use an XML parsing library instead, lxml for instance:
Edit: the code example now fetches the top-level resources only, then loops over them and tries to fetch their "sub-resources". This was added after the OP's request in a comment.
from lxml import etree
content = '''
YOUR XML HERE
'''
root = etree.fromstring(content)
# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy the resource attributes into a dict
    mashup = dict(resource.attrib)
    # find child Resource elements
    subresources = resource.xpath("./Resource")
    # if we find exactly one child resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # otherwise it is not clear what the OP wants
    print(mashup)
That will output:
{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
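If the nesting can go deeper than one level, or a resource can have several children, a small recursive helper may be easier to extend. The sketch below uses only the standard library and a trimmed copy of the question's XML; resource_to_dict and the "resources" key are names I made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the XML from the question (PerfData omitted).
content = """<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
        <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
      </Resource>
    </Resource>
  </Results>
</Response>"""


def resource_to_dict(resource):
    """Copy a Resource's attributes and recurse into its direct Resource children."""
    info = dict(resource.attrib)
    children = [resource_to_dict(child) for child in resource.findall("Resource")]
    if children:
        info["resources"] = children
    return info


root = ET.fromstring(content)
# Only Resource elements directly under Results count as top-level resources.
top_level = [resource_to_dict(r) for r in root.findall("./Results/Resource")]
print(top_level)
```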