XML Parsing issue in python using xml.etree.ElementTree - python

I have the following XML, generated by an HTTP response:
<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
<Results completed="true" total="25" matched="5" processed="25">
<Resource type="h" DisplayName="Host" name="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
</PerfData>
<Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
<Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
<PerfData attrId="cpuUsage" attrName="Usage">
<Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
</PerfData>
</Resource>
</Resource>
</Results>
</Response>
If you look at this carefully, the outer <Resource> element contains another <Resource> element nested inside it.
So the high-level XML structure is as below:
<Resource>
<Resource>
</Resource>
</Resource>
ElementTree seems to parse only the outer <Resource> element. Below is my code:
import re
import xml.etree.ElementTree as xml  # assumed alias; the original snippet omitted its imports
pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE)
for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)
    for results in responses:
        result = results.tag
        for resources in results:
            resource = resources.tag
            temp = resources.attrib
            print(temp)
This shows the following output (temp):
{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
How can I fetch inner attributes?

Don't parse XML with regular expressions! That won't work reliably; use an XML parsing library instead, lxml for instance.
Edit: the code example now fetches the top-level resources only, then loops over them and tries to fetch their "sub-resources". This change was made after the OP's request in a comment.
from lxml import etree
content = '''
YOUR XML HERE
'''
root = etree.fromstring(content)
# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes in a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else: no idea what the OP wants here
    print(mashup)
That will output:
{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
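Since the question asks about xml.etree.ElementTree specifically, here is a minimal sketch of how the nested attributes could be reached with the standard library alone. The SAMPLE string is a trimmed-down version of the XML from the question, and the helper name resource_attributes is made up for illustration. It relies on Element.iter(), which walks the whole subtree rather than only the direct children.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the XML from the question (assumption: same nesting).
SAMPLE = """<Response rid="1000" status="succeeded" moreData="false">
<Results completed="true" total="25" matched="5" processed="25">
<Resource type="h" DisplayName="Host" name="tango">
<Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango"/>
</Resource>
</Results>
</Response>"""


def resource_attributes(xml_text):
    """Return the attributes of every <Resource>, at any nesting depth."""
    root = ET.fromstring(xml_text)
    # iter() visits descendants in document order, so the outer
    # <Resource> comes first and the nested one second.
    return [dict(element.attrib) for element in root.iter("Resource")]


attrs = resource_attributes(SAMPLE)
for attributes in attrs:
    print(attributes)
```

The plain nested for loops in the question stop at the first <Resource> level because iterating over an element yields only its direct children.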

Related

Increment list indexes to get correct values to be updated in XML data based on Title

List elements to be appended in XML data:
Sorted_TestSpecID: [10860972, 10860972, 10860972, 10860972, 10860972]
Sorted_TestCaseID: [16961435, 16961462, 16961739, 16961741, 16961745]
Sorted_TestText : ['SIG1', 'SIG2', 'SIG3', 'Signal1', 'Signal2']
original xml data:
<tc>
<title>Signal1</title>
<tcid>2c758925-dc3d-4b1d-a5e2-e0ca54c52a47</tcid>
<attributes>
<attr>
<key>TestSpec ID</key>
<value>0</value>
</attr>
<attr>
<key>TestCase ID</key>
<value>0</value>
</attr>
</attributes>
</tc>
I am trying to write a Python script that should:
Search for the title Signal1 (from Sorted_TestText) in the XML data.
Then search for the key TestCase ID and update its value to the corresponding 16961741.
Then find the related key TestSpec ID and update its value to the corresponding 10860972.
from bs4 import BeautifulSoup
soup = BeautifulSoup(xml_data, 'xml')
for tc in soup.find_all('tc'):
    for title, spec, case in zip(Sorted_TestText, Sorted_TestSpecID, Sorted_TestCaseID):
        if tc.find('title').text == title:
            for attr in tc.find_all('attr'):
                if attr.find('key').text == "TestSpec ID":
                    attr.find('value').text = str(spec)
                if attr.find('key').text == "TestCase ID":
                    attr.find('value').text = str(case)
print(soup)
I've tried the script above, but it does not update spec and case based on the title; it only works if spec, case and title are in the same order. My intention was for the script to look up the title and then update its respective attributes. Let's say 'SIG1', 'SIG2' and 'SIG3' are not present in my XML: I want to update the spec and case of Signal1 with spec 10860972 and case 16961741, but with this script it is updating SIG4 with spec 10860972 and case 16961435. The spec and case lists need to be traversed together with the respective title. I tried, but with no luck. Any support is appreciated. Thanks in advance.
I'd use a dictionary where keys are titles and values are TestCaseIDs and TestSpecIDs.
Then, to change the contents of <value> use .string instead of .text:
dct = {
    c: (str(a), str(b))
    for a, b, c in zip(Sorted_TestSpecID, Sorted_TestCaseID, Sorted_TestText)
}
for tc in soup.select("tc"):
    title = tc.title.get_text(strip=True)
    if title not in dct:
        continue
    val = tc.select_one('attr:has(key:-soup-contains("TestSpec ID")) value')
    if val:
        val.string = str(dct[title][0])
    val = tc.select_one('attr:has(key:-soup-contains("TestCase ID")) value')
    if val:
        val.string = str(dct[title][1])
print(soup.prettify())
Prints:
<?xml version="1.0" encoding="utf-8"?>
<tc>
<title>
Signal1
</title>
<tcid>
2c758925-dc3d-4b1d-a5e2-e0ca54c52a47
</tcid>
<attributes>
<attr>
<key>
TestSpec ID
</key>
<value>
10860972
</value>
</attr>
<attr>
<key>
TestCase ID
</key>
<value>
16961741
</value>
</attr>
</attributes>
</tc>
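For comparison, the same dictionary-lookup idea can be sketched with only the standard library. This is an alternative under stated assumptions, not the answer's method: the function name update_ids is hypothetical, and the XML is the single <tc> sample from the question.

```python
import xml.etree.ElementTree as ET

Sorted_TestSpecID = [10860972, 10860972, 10860972, 10860972, 10860972]
Sorted_TestCaseID = [16961435, 16961462, 16961739, 16961741, 16961745]
Sorted_TestText = ['SIG1', 'SIG2', 'SIG3', 'Signal1', 'Signal2']

XML_DATA = """<tc>
<title>Signal1</title>
<tcid>2c758925-dc3d-4b1d-a5e2-e0ca54c52a47</tcid>
<attributes>
<attr><key>TestSpec ID</key><value>0</value></attr>
<attr><key>TestCase ID</key><value>0</value></attr>
</attributes>
</tc>"""


def update_ids(xml_text, titles, spec_ids, case_ids):
    """Set the TestSpec ID / TestCase ID values of each <tc> by its title."""
    # Map each title to the new value for each key, so list order no longer matters.
    lookup = {
        title: {"TestSpec ID": str(spec), "TestCase ID": str(case)}
        for title, spec, case in zip(titles, spec_ids, case_ids)
    }
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself when its tag matches.
    for tc in root.iter("tc"):
        title = (tc.findtext("title") or "").strip()
        new_values = lookup.get(title)
        if new_values is None:
            continue  # this title is not in the lists; leave the element alone
        for attr in tc.iter("attr"):
            key = (attr.findtext("key") or "").strip()
            value = attr.find("value")
            if key in new_values and value is not None:
                value.text = new_values[key]
    return root


updated = update_ids(XML_DATA, Sorted_TestText, Sorted_TestSpecID, Sorted_TestCaseID)
print(ET.tostring(updated, encoding="unicode"))
```

Looking the title up in a dictionary replaces the inner zip loop, which is what ties each title to its own spec and case regardless of position.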

How to parse XML grouped by specific tag id

I have the following XML file and I would like to group its contents by Table Id.
xml = """
<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>
"""
I would like to parse it and get something like this.
I have tried the code below but couldn't figure it out.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
    name = user.xpath("Table")[0].text
    users[name] = []
    for group in user.xpath("Data/Cell/Text"):
        users[name].append(group.text)
print(users)
Is it possible to get the above result? If so, could anyone help me do this? I really appreciate your effort.
You need to change your XPath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
# iterate over each <Table>, not over the <Tables> root
for user in tree.xpath("//Tables/Table"):
    name = user.attrib['Id']
    users[name] = []
    # the leading "." makes the path relative to the current <Table>
    for group in user.xpath(".//Data/Cell/Text"):
        users[name].append(group.text)
print(users)
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
         for user in tree.xpath("//Tables/Table")
         for name in [user.attrib["Id"]]}
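If lxml is not available, the same grouping can be sketched with the standard library's xml.etree.ElementTree, whose findall() supports the limited path syntax needed here. The function name texts_by_table_id is made up for this illustration; the XML is the sample from the question.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<Tables Count="19">
<Table Id="1">
<Data><Cell><Brush/><Text>AA</Text><Text>BB</Text></Cell></Data>
</Table>
<Table Id="2">
<Data><Cell><Brush/><Text>CC</Text><Text>DD</Text></Cell></Data>
</Table>
</Tables>"""


def texts_by_table_id(xml_text):
    """Group the text of every <Text> element under its <Table> Id."""
    root = ET.fromstring(xml_text)  # root is the <Tables> element
    return {
        # get() reads an attribute; findall() is relative to this <Table>
        table.get("Id"): [text.text for text in table.findall("./Data/Cell/Text")]
        for table in root.findall("Table")
    }


grouped = texts_by_table_id(XML_TEXT)
print(grouped)
```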

Get value for XML attribute in python

I am trying to parse an XML file in Python, and it seems my XML is structured differently from the examples I usually see.
Below is my XML snippet:
<records>
<record>
<parameter>
<name>Server</name>
<value>Application_server_01</value>
</parameter>
</record>
</records>
I am trying to get the name and value of each "parameter", but I seem to get empty values.
I checked the online documentation, and almost all of the XML there seems to be in the format below:
<neighbor name="Switzerland" direction="W"/>
I am able to parse this fine. How can I get the values from my XML without changing the formatting?
Working code:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Output:
C:/Users/xxxxxx/PycharmProjects/default/parse.py
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
PS: I will be using the XML to fire an API call, and I doubt the downstream application would accept the second (attribute-based) format.
Below is my Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('at.xml')
root = tree.getroot()
for name in root.iter('name'):
    print(name.attrib)
Output for the above code:
C:/Users/xxxxxx/PycharmProjects/default/learning.py
{}
{}
{}
{}
{}
{}
{}
{}
Use lxml and XPath:
from lxml import etree as et
tree = et.parse(open("/tmp/so.xml"))
name = tree.xpath("/records/record/parameter/name/text()")[0]
value = tree.xpath("/records/record/parameter/value/text()")[0]
print(name, value)
Output:
Server Application_server_01
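The empty dictionaries in the question come from reading .attrib on elements that carry no attributes: in this XML the data is the text content of the <name> and <value> child elements. A minimal standard-library sketch of reading that text follows; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<records>
<record>
<parameter>
<name>Server</name>
<value>Application_server_01</value>
</parameter>
</record>
</records>"""

root = ET.fromstring(XML_TEXT)

parameters = {}
for parameter in root.iter("parameter"):
    # .attrib is empty here because <name> and <value> have no attributes;
    # findtext() returns the text between the opening and closing tags.
    name = parameter.findtext("name")
    value = parameter.findtext("value")
    parameters[name] = value

print(parameters)
```

With ET.parse('at.xml') instead of ET.fromstring(...), the loop stays the same once root = tree.getroot() has been called.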

XML not returning correct child tags/data in Python

Hello, I am making a requests call to return order data from an online store. My issue is that once I have passed my data to a root variable, the iter method does not return the results I expect: it displays multiple tags of the same name rather than one, and it does not show the data within the tag.
I thought this was due to the XML not being correctly formatted, so I formatted it by saving it to a file using pretty_print, but that hasn't fixed the problem.
How do I fix this? Thanks in advance.
Code:
import requests, xml.etree.ElementTree as ET, lxml.etree as etree
url="http://publicapi.ekmpowershop24.com/v1.1/publicapi.asmx"
headers = {'content-type': 'application/soap+xml'}
body = """<?xml version="1.0" encoding="utf-8"?>
<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
<soap12:Body>
<GetOrders xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersRequest>
<APIKey>my_api_key</APIKey>
<FromDate>01/07/2018</FromDate>
<ToDate>04/07/2018</ToDate>
</GetOrdersRequest>
</GetOrders>
</soap12:Body>
</soap12:Envelope>"""
#send request to ekm
r = requests.post(url,data=body,headers=headers)
#save output to file
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(r.text)
file.close()
#take the file and format the xml
x = etree.parse("C:/Users/Mark/Desktop/test.xml")
newString = etree.tostring(x, pretty_print=True)
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(newString.decode('utf-8'))
file.close()
#parse the file to get the roots
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
#access elements names in the data
for child in root.iter('*'):
    print(child.tag)
#show orders elements attributes
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
    out = {}
    for child in order:
        if child.tag in ('OrderID'):
            out[child.tag] = child.text
    print(out)
Elements output:
{http://publicapi.ekmpowershop.com/}Orders
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
Orders Output:
{http://publicapi.ekmpowershop.com/}Order {}
{http://publicapi.ekmpowershop.com/}Order {}
XML structure after formatting:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetOrdersResponse xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersResult>
<Status>Success</Status>
<Errors/>
<Date>2018-07-10T13:47:00.1682029+01:00</Date>
<TotalOrders>10</TotalOrders>
<TotalCost>100</TotalCost>
<Orders>
<Order>
<OrderID>100</OrderID>
<OrderNumber>102/040718/67</OrderNumber>
<CustomerID>6910</CustomerID>
<CustomerUserID>204</CustomerUserID>
<FirstName>TestFirst</FirstName>
<LastName>TestLast</LastName>
<CompanyName>Test Company</CompanyName>
<EmailAddress>test#Test.com</EmailAddress>
<OrderStatus>Dispatched</OrderStatus>
<OrderStatusColour>#00CC00</OrderStatusColour>
<TotalCost>85.8</TotalCost>
<OrderDate>10/07/2018 14:30:43</OrderDate>
<OrderDateISO>2018-07-10T14:30:43</OrderDateISO>
<AbandonedOrder>false</AbandonedOrder>
<EkmStatus>SUCCESS</EkmStatus>
</Order>
</Orders>
<Currency>GBP</Currency>
</GetOrdersResult>
</GetOrdersResponse>
</soap:Body>
</soap:Envelope>
You need to consider the namespace when checking for tags.
>>> # Include the namespace part of the tag in the tag values that we check.
>>> tags = ('{http://publicapi.ekmpowershop.com/}OrderID', '{http://publicapi.ekmpowershop.com/}OrderNumber')
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
...     out = {}
...     for child in order:
...         if child.tag in tags:
...             out[child.tag] = child.text
...     print(out)
...
{'{http://publicapi.ekmpowershop.com/}OrderID': '100', '{http://publicapi.ekmpowershop.com/}OrderNumber': '102/040718/67'}
If you don't want the namespace prefixes in the output, you can strip them by only including that part of the tag after the } character.
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
...     out = {}
...     for child in order:
...         if child.tag in tags:
...             out[child.tag[child.tag.index('}')+1:]] = child.text
...     print(out)
...
{'OrderID': '100', 'OrderNumber': '102/040718/67'}
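Another way to handle the namespace, sketched here as an alternative rather than as the answer's approach, is to pass a prefix-to-URI mapping to findall() and findtext(). The prefix "ekm" and the helper name extract_orders are arbitrary choices for this example, and SAMPLE is a shortened copy of the response shown in the question.

```python
import xml.etree.ElementTree as ET

# The prefix "ekm" is arbitrary; only the URI has to match the document.
NS = {"ekm": "http://publicapi.ekmpowershop.com/"}
WANTED = ("OrderID", "OrderNumber")

SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<GetOrdersResponse xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersResult>
<Orders>
<Order>
<OrderID>100</OrderID>
<OrderNumber>102/040718/67</OrderNumber>
<CustomerID>6910</CustomerID>
</Order>
</Orders>
</GetOrdersResult>
</GetOrdersResponse>
</soap:Body>
</soap:Envelope>"""


def extract_orders(xml_text):
    """Return one dict per <Order>, keyed by the un-prefixed tag names."""
    root = ET.fromstring(xml_text)
    orders = []
    for order in root.findall(".//ekm:Order", NS):
        orders.append(
            {name: order.findtext("ekm:" + name, namespaces=NS) for name in WANTED}
        )
    return orders


orders = extract_orders(SAMPLE)
print(orders)
```

Because the wanted names are listed without the namespace, the resulting keys need no stripping afterwards.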

Python - How to parse xml response and store a elements value in a variable?

I am getting an XML response from an API call.
I need the value of the "testId" element from this response. Please help me with this.
r = requests.get( myconfig.URL_webpagetest + "?url=" + testurl + "&f=xml&k=" + myconfig.apikey_webpagetest )
xmltxt = r.content
print(xmltxt)
testId = XML(xmltxt).find("testId").text
r = requests.get("http://www.webpagetest.org/testStatus.php?f=xml&test=" + testId )
xml response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<statusCode>200</statusCode>
<statusText>Ok</statusText>
<data>
<testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId>
<ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey>
<jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl>
</data>
</response>
The following error is produced:
Traceback (most recent call last):
File "/pagePerformance.py", line 52, in <module>
testId = XML (xmltxt).find("testId").text
AttributeError: 'NoneType' object has no attribute 'text'
Use the following to collect testId from the response:
import xml.etree.ElementTree as ET
response_xml_as_string = "xml response string from API"
responseXml = ET.fromstring(response_xml_as_string)
testId = responseXml.find('data').find('testId')
print(testId.text)
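The AttributeError in the question arises because find() with a bare tag name only looks at the direct children of the element it is called on, and <testId> sits one level deeper, inside <data>. A short sketch of the difference, using the response from the question as a bytes literal (as r.content would be):

```python
import xml.etree.ElementTree as ET

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
<statusCode>200</statusCode>
<statusText>Ok</statusText>
<data>
<testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId>
<ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey>
</data>
</response>"""

root = ET.fromstring(RESPONSE)  # root is the <response> element

# A bare tag name matches direct children only, so this is None.
direct = root.find("testId")

# A path relative to the root reaches the nested element.
nested = root.find("data/testId")

# ".//" searches all descendants, whatever the depth.
anywhere = root.find(".//testId")

print(direct, nested.text, anywhere.text)
```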
from lxml.etree import fromstring
string = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data><testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
response = fromstring(string.encode('utf-8'))
elm = response.xpath('/response/data/testId').pop()
testId = elm.text
This way you can search for any element within the XML from the root/parent element via XPath.
Side note: I don't particularly like using the pop method to remove the item from a single-item list, so if anyone has a better way to do it, please let me know. So far I've considered:
1) elm = next(iter(response.xpath('/response/data/testId')))
2) simply leaving it in a list so it can be used as a star-arg
I found this article the other day when it appeared on my feed, and it may suit your needs. I only skimmed it, but in general the package parses XML data and converts the tags, attributes and values into a dictionary. The author also points out that it preserves the nesting structure of the XML.
https://www.oreilly.com/learning/jxmlease-python-xml-conversion-data-structures
For your use case:
>>> xml = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data> <testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
>>> import jxmlease
>>> root = jxmlease.parse(xml)
>>> testid = root['response']['data']['testId'].get_cdata()
>>> print(testid)
180523_YM_054fd7d84fd4ea7aed237f87289e0c7c
