XML Parsing text and attrib on same line

I am working on a school assignment in Python 3 where we have to parse three elements from an XML file and print them on the same line with the labels Artist:, Title:, and Decade:. I was able to get the artist and title, but the decade is an attribute of the CD element, and I can only get it to print on a separate line; otherwise I get "TypeError: 'dict' object is not callable". My understanding was that reading an attribute requires the iter function, so I can see why my attempt is wrong, but I can't figure out how to fit that into my for loop so that everything prints on one line.
This is what I have:
import xml.etree.ElementTree as et
tree = et.parse("cd_catalog.xml")
root = tree.getroot()
for child in root.findall("CD"):
    artist = child.find("ARTIST").text
    title = child.find("TITLE").text
    decade = child.attrib("decade").text
    print("Artist: %s, Title: %s, Decade: %s" %(artist, title, decade))
The XML file has the following info:
<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
    <CD decade="80s">
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>

The attrib attribute of an XML element is not a function but a dictionary of that element's attributes. The expression child.attrib("decade") therefore raises an exception, because it tries to call the dictionary. The attribute value is also already a string, so there is no .text to read from it.
So replace:
decade = child.attrib("decade").text
with:
decade = child.attrib["decade"]
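For reference, here is a minimal sketch of the whole loop with the attribute read as a dictionary lookup. It parses an inline, shortened copy of the sample record with ET.fromstring instead of reading cd_catalog.xml (the closing CATALOG tag is added so that the string is well-formed), and it uses child.get("decade"), which returns None instead of raising KeyError when the attribute is missing.

```python
import xml.etree.ElementTree as ET

# Inline, shortened copy of the sample record; a real script would call
# ET.parse("cd_catalog.xml").getroot() instead.
CATALOG_XML = """<CATALOG>
  <CD decade="80s">
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
  </CD>
</CATALOG>"""

root = ET.fromstring(CATALOG_XML)
lines = []
for child in root.findall("CD"):
    artist = child.find("ARTIST").text
    title = child.find("TITLE").text
    # Attributes live in a dict on the element; get() returns None if absent.
    decade = child.get("decade")
    lines.append("Artist: %s, Title: %s, Decade: %s" % (artist, title, decade))

print("\n".join(lines))
```

All three values are plain strings by the time they reach the format string, so they print on one line per CD.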

Related

print lines of tags that have no attribute in xml python

Let's assume we have this XML file:
<School Name = "school1">
    <Class Name = "class A">
        <Student Name = "student"/>
        <Student/>
        <!-- -->
    </Class>
</School>
I also have a Python script that parses it. I want to print the line number of a tag; for example, the line numbers of the tags that have no "Name" attribute.
Is that possible?
I saw an example that subclasses ElementTree but couldn't understand it.
import xml.etree.ElementTree as ET
def read_root(root):
    for x in root:
        print(x.lineNum)
        read_root(x)
def main():
    fn = "a.xml"
    try:
        tree = ET.parse(fn)
    except ET.ParseError as e:
        print("\nParse error:", str(e))
        print("while reading: " + fn)
        exit(1)
    root = tree.getroot()
    read_root(root)

The question is not entirely clear, but if you just want to check whether each tag has a Name attribute and print the line number of those that do not, you can use etree from lxml as shown below:
from lxml import etree
doc = etree.parse('test.xml')
for element in doc.iter():
    # Report every node that has no "Name" attribute
    if "Name" not in element.attrib:
        print(f"Line {element.sourceline}: {element.tag}")
Output:
Line 4: Student
Line 5: <cyfunction Comment at 0x13b8e6dc0>
The second line is the comment node, whose tag is the Comment factory function rather than a string.

You need a parser such as ET.XMLPullParser, which can report "comment" and "pi" (processing instruction) events as well as "start", "end", "start-ns", and "end-ns" events.
If your XML file 'comment.xml' looks like:
<?xml version="1.0" encoding="UTF-8"?>
<School Name = "school1">
    <Class Name = "class A">
        <Student Name = "student"/>
        <Student/>
        <!-- Comment xml -->
    </Class>
</School>
You can parse it to find the tags without a "Name" attribute, as well as the comments (the number printed is the zero-based index from enumerate):
import xml.etree.ElementTree as ET
#parser = ET.XMLPullParser(['start', 'end', "comment", "pi", "start-ns", "end-ns"])
parser = ET.XMLPullParser(['start', 'end', 'comment'])
with open('comment.xml', 'r', encoding='utf-8') as xml:
    feedstring = xml.readlines()
for line in enumerate(feedstring):
    parser.feed(line[1])
    for event, elem in parser.read_events():
        if elem.get("Name"):
            pass
        else:
            print(f"{line[0]} Event:{event} | {elem.tag}, {elem.text}")
Output:
4 Event:start | Student, None
4 Event:end | Student, None
5 Event:comment | <function Comment at 0x00000216C4FDA200>, Comment xml
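Neither answer above uses the following technique, but for comparison, here is a sketch that gets line numbers from the standard library alone. It uses xml.parsers.expat directly and reads the parser's CurrentLineNumber inside the start-tag callback, so the result does not depend on how the input is split into chunks. The sample string copies the first XML snippet from the question; the function name lines_without_attribute is made up for illustration.

```python
import xml.parsers.expat

# Copy of the question's first sample, built line by line so that the
# line numbers are easy to check by eye (1-based).
SAMPLE = "\n".join([
    '<School Name = "school1">',
    '  <Class Name = "class A">',
    '    <Student Name = "student"/>',
    '    <Student/>',
    '    <!-- -->',
    '  </Class>',
    '</School>',
])


def lines_without_attribute(xml_text, attr_name="Name"):
    """Return (line_number, tag) pairs for start tags lacking attr_name."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(tag, attrs):
        # attrs is a dict of attribute name -> value for this start tag.
        if attr_name not in attrs:
            found.append((parser.CurrentLineNumber, tag))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return found


print(lines_without_attribute(SAMPLE))
```

Comments are not reported here because no comment handler is registered; setting parser.CommentHandler in the same way would cover them.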

Python - How to parse xml response and store a elements value in a variable?

I am getting an XML response from an API call.
I need the value of the "testId" element from this response. Please help me with this.
r = requests.get( myconfig.URL_webpagetest + "?url=" + testurl + "&f=xml&k=" + myconfig.apikey_webpagetest )
xmltxt = r.content
print(xmltxt)
testId = XML(xmltxt).find("testId").text
r = requests.get("http://www.webpagetest.org/testStatus.php?f=xml&test=" + testId )
XML response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
    <statusCode>200</statusCode>
    <statusText>Ok</statusText>
    <data>
        <testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId>
        <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey>
        <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl>
    </data>
</response>
The following error is produced:
Traceback (most recent call last):
  File "/pagePerformance.py", line 52, in <module>
    testId = XML (xmltxt).find("testId").text
AttributeError: 'NoneType' object has no attribute 'text'

find("testId") only looks at the direct children of the root element, and testId is nested inside data, so it returns None. Use the following to get testId from the response:
import xml.etree.ElementTree as ET
response_xml_as_string = "xml response string from API"
responseXml = ET.fromstring(response_xml_as_string)
testId = responseXml.find('data').find('testId')
print(testId.text)
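As a small illustration of the lookup in this answer, the sketch below compares a few ElementTree path forms on a shortened copy of the response. The variable names are illustrative; only find and findtext from the standard library are used.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the API response from the question.
RESPONSE_XML = """<response>
  <statusCode>200</statusCode>
  <statusText>Ok</statusText>
  <data>
    <testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId>
  </data>
</response>"""

root = ET.fromstring(RESPONSE_XML)

# A bare tag name matches direct children only, so this is None.
direct = root.find("testId")

# A path walks down one level at a time.
by_path = root.findtext("data/testId")

# ".//" searches all descendants, whatever their depth.
anywhere = root.findtext(".//testId")

# findtext returns the default instead of failing when nothing matches.
missing = root.findtext("data/noSuchTag", default="")

print(direct, by_path, anywhere, repr(missing))
```

Using findtext with a default avoids the AttributeError from the question when the element is absent, at the cost of having to check for the default value afterwards.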

from lxml.etree import fromstring
string = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data><testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
response = fromstring(string.encode('utf-8'))
elm = response.xpath('/response/data/testId').pop()
testId = elm.text
This way you can search for any element in the XML from the root or a parent element via XPath.
Side note: I don't particularly like using the pop method to take the item out of a single-item list, so if anyone has a better way to do it, please let me know. So far I've considered:
1) elm = next(iter(response.xpath('/response/data/testId')))
2) simply leaving it in a list so it can be used as a star argument

I came across this article the other day, and it may suit your needs. I only skimmed it, but in general the jxmlease package parses XML data and converts the tags, attributes, and values into a dictionary. The author also points out that it preserves the nesting structure of the XML.
https://www.oreilly.com/learning/jxmlease-python-xml-conversion-data-structures
For your use case:
>>> import jxmlease
>>> xml = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data> <testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
>>> root = jxmlease.parse(xml)
>>> testid = root['response']['data']['testId'].get_cdata()
>>> print(testid)
180523_YM_054fd7d84fd4ea7aed237f87289e0c7c

Error in parsing a YQL string xml Python

I'm trying to scrape data from an API like this:
import urllib2
a = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.xchange%20where%20pair%20in%20(%22USDEUR%22,%20%22USDJPY%22,%20%22USDBGN%22,%20%22USDCZK%22,%20%22USDDKK%22,%20%22USDGBP%22,%20%22USDHUF%22,%20%22USDLTL%22,%20%22USDLVL%22,%20%22USDPLN%22,%20%22USDRON%22,%20%22USDSEK%22,%20%22USDCHF%22,%20%22USDNOK%22,%20%22USDHRK%22,%20%22USDRUB%22,%20%22USDTRY%22,%20%22USDAUD%22,%20%22USDBRL%22,%20%22USDCAD%22,%20%22USDCNY%22,%20%22USDHKD%22,%20%22USDIDR%22,%20%22USDILS%22,%20%22USDINR%22,%20%22USDKRW%22,%20%22USDMXN%22,%20%22USDMYR%22,%20%22USDNZD%22,%20%22USDPHP%22,%20%22USDSGD%22,%20%22USDTHB%22,%20%22USDZAR%22,%20%22USDISK%22)&env=store://datatables.org/alltableswithkeys')
b = a.read()
b is a string object of the xml:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34" yahoo:created="2017-04-21T19:46:11Z" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate><rate id="USDBGN"><Name>USD/BGN</Name><Rate>1.8282</Rate><Date>4/21/2017</Date><Time>3:15pm</Time><Ask>N/A</Ask><Bid>1.8282</Bid></rate><rate id="USDCZK"><Name>USD/CZK</Name><Rate>25.1629</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>25.1702</Ask><Bid>25.1629</Bid></rate><rate id="USDDKK"><Name>USD/DKK</Name><Rate>6.9458</Rate><Date>4/21/2017</Date><Time>6:44pm</Time><Ask>6.9466</Ask><Bid>6.9458</Bid></rate><rate id="USDGBP"><Name>USD/GBP</Name><Rate>0.7812</Rate><Date>4/21/2017</Date><Time>6:29pm</Time><Ask>0.7813</Ask><Bid>0.7812</Bid></rate><rate id="USDHUF"><Name>USD/HUF</Name><Rate>292.4200</Rate><Date>4/21/2017</Date><Time>8:14pm</Time><Ask>292.6200</Ask><Bid>292.4200</Bid></rate><rate id="USDLTL"><Name>USD/LTL</Name><Rate>3.0487</Rate><Date>6/22/2015</Date><Time>9:39am</Time><Ask>3.0491</Ask><Bid>3.0487</Bid></rate><rate id="USDLVL"><Name>USD/LVL</Name><Rate>0.6205</Rate><Date>6/22/2015</Date><Time>9:37am</Time><Ask>0.6206</Ask><Bid>0.6205</Bid></rate><rate id="USDPLN"><Name>USD/PLN</Name><Rate>3.9907</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>3.9916</Ask><Bid>3.9907</Bid></rate><rate id="USDRON"><Name>USD/RON</Name><Rate>4.2276</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>4.2411</Ask><Bid>4.2276</Bid></rate><rate id="USDSEK"><Name>USD/SEK</Name><Rate>9.0293</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>9.0310</Ask><Bid>9.0293</Bid></rate><rate id="USDCHF"><Name>USD/CHF</Name><Rate>0.9977</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>0.9977</Ask><Bid>0.9977</Bid></rate><rate 
id="USDNOK"><Name>USD/NOK</Name><Rate>8.6823</Rate><Date>4/21/2017</Date><Time>7:00pm</Time><Ask>8.6858</Ask><Bid>8.6823</Bid></rate><rate id="USDHRK"><Name>USD/HRK</Name><Rate>6.9250</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>6.9981</Ask><Bid>6.9250</Bid></rate><rate id="USDRUB"><Name>USD/RUB</Name><Rate>56.5055</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>56.5405</Ask><Bid>56.5055</Bid></rate><rate id="USDTRY"><Name>USD/TRY</Name><Rate>3.6473</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>3.6478</Ask><Bid>3.6473</Bid></rate><rate id="USDAUD"><Name>USD/AUD</Name><Rate>1.3263</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>1.3267</Ask><Bid>1.3263</Bid></rate><rate id="USDBRL"><Name>USD/BRL</Name><Rate>3.1473</Rate><Date>4/21/2017</Date><Time>7:02pm</Time><Ask>3.1493</Ask><Bid>3.1473</Bid></rate><rate id="USDCAD"><Name>USD/CAD</Name><Rate>1.3513</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>1.3513</Ask><Bid>1.3513</Bid></rate><rate id="USDCNY"><Name>USD/CNY</Name><Rate>6.8844</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>6.8854</Ask><Bid>6.8844</Bid></rate><rate id="USDHKD"><Name>USD/HKD</Name><Rate>7.7746</Rate><Date>4/21/2017</Date><Time>6:01pm</Time><Ask>7.7754</Ask><Bid>7.7746</Bid></rate><rate id="USDIDR"><Name>USD/IDR</Name><Rate>13316.0000</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>13326.0000</Ask><Bid>13316.0000</Bid></rate><rate id="USDILS"><Name>USD/ILS</Name><Rate>3.6723</Rate><Date>4/21/2017</Date><Time>6:52pm</Time><Ask>3.6823</Ask><Bid>3.6723</Bid></rate><rate id="USDINR"><Name>USD/INR</Name><Rate>64.6490</Rate><Date>4/21/2017</Date><Time>6:26pm</Time><Ask>64.6990</Ask><Bid>64.6490</Bid></rate><rate id="USDKRW"><Name>USD/KRW</Name><Rate>1133.3700</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1134.3700</Ask><Bid>1133.3700</Bid></rate><rate id="USDMXN"><Name>USD/MXN</Name><Rate>18.8424</Rate><Date>4/21/2017</Date><Time>6:16pm</Time><Ask>18.8443</Ask><Bid>18.8424</Bid></rate><rate 
id="USDMYR"><Name>USD/MYR</Name><Rate>4.3980</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>4.4030</Ask><Bid>4.3980</Bid></rate><rate id="USDNZD"><Name>USD/NZD</Name><Rate>1.4226</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1.4236</Ask><Bid>1.4226</Bid></rate><rate id="USDPHP"><Name>USD/PHP</Name><Rate>49.8400</Rate><Date>4/21/2017</Date><Time>6:13pm</Time><Ask>49.8900</Ask><Bid>49.8400</Bid></rate><rate id="USDSGD"><Name>USD/SGD</Name><Rate>1.3966</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>1.3969</Ask><Bid>1.3966</Bid></rate><rate id="USDTHB"><Name>USD/THB</Name><Rate>34.3500</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>34.4000</Ask><Bid>34.3500</Bid></rate><rate id="USDZAR"><Name>USD/ZAR</Name><Rate>13.1525</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>13.1620</Ask><Bid>13.1525</Bid></rate><rate id="USDISK"><Name>USD/ISK</Name><Rate>109.4900</Rate><Date>4/21/2017</Date><Time>5:32pm</Time><Ask>109.9900</Ask><Bid>109.4900</Bid></rate></results></query><!-- total: 1083 -->
<!-- prod_bf1_1;paas.yql;queryyahooapiscomproductionbf1;885cf297-259f-11e7-b972-d4ae52974741 -->
However, when I use the xml.etree module to parse this string into an XML object, I get errors saying the object is not indexable or not iterable. What exactly does this code produce?
import xml.etree.ElementTree as ET
d = ET.ElementTree(ET.fromstring(b))
EDIT: The errors come up when I try to iterate through the children of d like so:
for child in d:
    print child.tag
The error here is "TypeError: 'ElementTree' object is not iterable".
How can I access the children in this XML string to get specific values from it?

You are overdoing things when you wrap the parsed string in an ElementTree; ET.fromstring already returns the root element:
import xml.etree.ElementTree as ET
b = '''<?xml version="1.0" encoding="UTF-8"?>...'''
element = ET.fromstring(b) # that does it!
print(element.attrib)
Now you can use element as you would any instance of xml.etree.ElementTree.Element.
For example, to iterate over the element and all of its descendants:
for child in element.iter():
    print(child, child.tag, child.text, child.attrib)
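To make the difference between the two object types concrete, here is a minimal sketch on a two-entry excerpt of the response (the namespace declaration and the yahoo: attributes are left out for brevity). It shows that the Element returned by fromstring can be iterated directly, while an ElementTree wrapper has to go through getroot() or iter(), and it builds a dictionary of rates keyed by the id attribute.

```python
import xml.etree.ElementTree as ET

# Two-entry excerpt of the YQL response from the question, trimmed for brevity.
SAMPLE = (
    '<query><results>'
    '<rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate></rate>'
    '<rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate></rate>'
    '</results></query>'
)

root = ET.fromstring(SAMPLE)   # an Element: iterable and indexable
tree = ET.ElementTree(root)    # a wrapper: use tree.getroot() or tree.iter()

# Direct children of the root element.
child_tags = [child.tag for child in root]

# Every <rate> element at any depth, keyed by its id attribute.
rates = {rate.get("id"): float(rate.findtext("Rate")) for rate in root.iter("rate")}

# Iterating through the wrapper gives the same elements.
rate_count = sum(1 for _ in tree.iter("rate"))

print(child_tags, rates, rate_count)
```

Only the Rate field is converted to float here; in the full response some fields such as Ask can contain "N/A", which float() would reject.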

XML parsing to get list of values in Python

I have XML output like the one below:
<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>
I want to get the text of all the "getValuesReturn" elements as a Python list. For this, I used the code below:
import libxml2
DOC="""<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>"""
def getValues(cat):
    return [attr.content for attr in doc.xpathEval("/elements/parent[@name='%s']/child/@value" % (cat))]
# print the incoming XML document
doc = libxml2.parseDoc(DOC)
# print the values of the getValuesReturn tag
print(getValues("getValuesReturn"))
It just returns an empty list, but I should get a list such as ["337.81998","129.1","1152.9691"]. Could you please help me out with this?
Thanks in advance.

Where does the XPath expression come from? It doesn't match anything (the document has no elements or parent tags).
Try the following:
DOC = ...
doc = libxml2.parseDoc(DOC)
print([attr.content for attr in doc.xpathEval(".//getValuesReturn")])
prints
['337.81998129.11152.9691', '337.81998', '129.1', '1152.9691']
The first entry is the outer getValuesReturn element, whose content is the concatenated text of its children. To get only the text nodes:
doc = libxml2.parseDoc(DOC)
print([attr.content for attr in doc.xpathEval('.//getValuesReturn/text()')])
prints
['337.81998', '129.1', '1152.9691']
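If libxml2 is not available, the same list can be obtained with the standard library. The sketch below is an alternative to the answer's approach, not a restatement of it: it uses xml.etree.ElementTree on a shortened copy of the SOAP response and keeps only the getValuesReturn elements that have no children, which drops the outer wrapper element of the same name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response; the getValuesReturn elements carry
# no namespace prefix, so they can be matched by their plain tag name.
DOC = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<soapenv:Body>'
    '<ns1:getValuesResponse xmlns:ns1="http://soap.core.green.controlj.com">'
    '<getValuesReturn xsi:type="soapenc:Array">'
    '<getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn>'
    '<getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn>'
    '<getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn>'
    '</getValuesReturn>'
    '</ns1:getValuesResponse>'
    '</soapenv:Body>'
    '</soapenv:Envelope>'
)

root = ET.fromstring(DOC)

# iter() visits the outer wrapper and the three inner elements;
# len(el) == 0 keeps only the leaves, which hold the actual values.
values = [el.text for el in root.iter("getValuesReturn") if len(el) == 0]

print(values)
```

The values come back as strings; wrapping el.text in float() would convert them if numbers are needed.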

How to get ( parse ) subchild in XML from python

I am new to Python and to coding, so please be patient with my question.
Here is my rather busy XML:
<?xml version="1.0" encoding="utf-8"?>
<Total>
    <ID>999</ID>
    <Response>
        <Detail>
            <Nix>
                <Check>pass</Check>
            </Nix>
            <MaxSegment>
                <Status>V</Status>
                <Input>
                    <Name>
                        <First>jack</First>
                        <Last>smiths</Last>
                    </Name>
                    <Address>
                        <StreetAddress1>100 rodeo dr</StreetAddress1>
                        <City>long beach</City>
                        <State>ca</State>
                        <ZipCode>90802</ZipCode>
                    </Address>
                    <DriverLicense>
                        <Number>123456789</Number>
                        <State>ca</State>
                    </DriverLicense>
                    <Contact>
                        <Email>x@me.com</Email>
                        <Phones>
                            <Home>0000000000</Home>
                            <Work>1111111111</Work>
                        </Phones>
                    </Contact>
                </Input>
                <Type>Regular</Type>
            </MaxSegment>
        </Detail>
    </Response>
</Total>
What I am trying to do is extract these values into a clean table.
Here's my code so far, but I can't figure out how to get to the subchildren:
import os
os.chdir('d:/py/xml/')
import xml.etree.ElementTree as ET
tree = ET.parse('xxml.xml')
root = tree.getroot()
x = root.tag
y = root.attrib
print(x, y)
#---PRINT ALL NODES---
for child in root:
    print(child.tag, child.attrib)
Thank you in advance!

You could create a dictionary that maps the column names to XPath expressions that extract the corresponding values, e.g.:
xpath = {
    "ID": "/Total/ID/text()",
    "Check": "/Total/Response/Detail/Nix/Check/text()",  # or "//Check/text()"
}
To populate the table row:
row = {name: tree.xpath(path) for name, path in xpath.items()}
The above assumes that you use lxml, which supports the full XPath syntax. ElementTree supports only a subset of XPath, but it might be enough in your case (drop the "text()" step and read el.text instead), e.g.:
xpath = {
    "ID": ".//ID",
    "Check": ".//Check",
}
row = {name: tree.findtext(path) for name, path in xpath.items()}
To print all text with the corresponding tag names:
import xml.etree.ElementTree as etree
for _, el in etree.iterparse("xxml.xml"):
    if el.text and len(el) == 0:  # leaf element with text
        print(el.tag, el.text)
If the column names differ from the tag names (as in your case), the last example is not enough to build the table.
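The column-to-path mapping described above can be sketched with the standard library as follows. The column names (FirstName, AddressState, and so on) are hypothetical, since the intended table layout is not shown in the question, and the XML is a shortened copy of the sample. Paths that include the parent tag keep the two State elements apart.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document.
SAMPLE = """<Total>
  <ID>999</ID>
  <Response><Detail>
    <Nix><Check>pass</Check></Nix>
    <MaxSegment>
      <Input>
        <Name><First>jack</First><Last>smiths</Last></Name>
        <Address><State>ca</State></Address>
        <DriverLicense><State>ca</State></DriverLicense>
      </Input>
    </MaxSegment>
  </Detail></Response>
</Total>"""

# Hypothetical column names mapped to ElementTree paths relative to the root.
COLUMNS = {
    "ID": "ID",
    "Check": ".//Check",
    "FirstName": ".//Name/First",
    "AddressState": ".//Address/State",
    "LicenseState": ".//DriverLicense/State",
}

root = ET.fromstring(SAMPLE)

# findtext returns the text of the first match, or None if nothing matches.
row = {column: root.findtext(path) for column, path in COLUMNS.items()}

print(row)
```

A list of such row dictionaries, one per document, could then be written out with csv.DictWriter or loaded into a table library.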

This is how you could traverse the tree and print only the leaf elements with their text:
def traverse(node):
    show = True
    for c in node:
        show = False
        traverse(c)
    if show:
        print(node.tag, node.text)
For your example I get the following:
traverse(root)
ID 999
Check pass
Status V
First jack
Last smiths
StreetAddress1 100 rodeo dr
City long beach
State ca
ZipCode 90802
Number 123456789
State ca
Email x@me.com
Home 0000000000
Work 1111111111
Type Regular
Instead of printing, you could store (node.tag, node.text) tuples, or store {node.tag: node.text} in a dict.
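One caveat with a dict keyed by tag name is that State appears twice in the sample, so the second value would overwrite the first. A possible variation, sketched below on a shortened copy of the document, keys each leaf by its full path instead. The function name collect_leaves is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document, keeping both State elements.
SAMPLE = """<Total>
  <ID>999</ID>
  <Input>
    <Address><City>long beach</City><State>ca</State></Address>
    <DriverLicense><Number>123456789</Number><State>ca</State></DriverLicense>
  </Input>
</Total>"""


def collect_leaves(node, prefix=""):
    """Return {path: text} for every leaf element at or below node."""
    path = prefix + "/" + node.tag if prefix else node.tag
    if len(node) == 0:  # no child elements, so this is a leaf
        return {path: node.text}
    leaves = {}
    for child in node:
        leaves.update(collect_leaves(child, path))
    return leaves


leaves = collect_leaves(ET.fromstring(SAMPLE))
for path, text in leaves.items():
    print(path, text)
```

Sibling elements that share both a parent and a tag name would still collide under this scheme; a list of (path, text) tuples avoids that.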
