I'm trying to scrape data from an API like this:
import urllib2
a = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.xchange%20where%20pair%20in%20(%22USDEUR%22,%20%22USDJPY%22,%20%22USDBGN%22,%20%22USDCZK%22,%20%22USDDKK%22,%20%22USDGBP%22,%20%22USDHUF%22,%20%22USDLTL%22,%20%22USDLVL%22,%20%22USDPLN%22,%20%22USDRON%22,%20%22USDSEK%22,%20%22USDCHF%22,%20%22USDNOK%22,%20%22USDHRK%22,%20%22USDRUB%22,%20%22USDTRY%22,%20%22USDAUD%22,%20%22USDBRL%22,%20%22USDCAD%22,%20%22USDCNY%22,%20%22USDHKD%22,%20%22USDIDR%22,%20%22USDILS%22,%20%22USDINR%22,%20%22USDKRW%22,%20%22USDMXN%22,%20%22USDMYR%22,%20%22USDNZD%22,%20%22USDPHP%22,%20%22USDSGD%22,%20%22USDTHB%22,%20%22USDZAR%22,%20%22USDISK%22)&env=store://datatables.org/alltableswithkeys')
b = a.read()
b is a string object of the xml:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="34" yahoo:created="2017-04-21T19:46:11Z" yahoo:lang="en-US"><results><rate id="USDEUR"><Name>USD/EUR</Name><Rate>0.9347</Rate><Date>4/21/2017</Date><Time>7:13pm</Time><Ask>0.9352</Ask><Bid>0.9347</Bid></rate><rate id="USDJPY"><Name>USD/JPY</Name><Rate>109.2200</Rate><Date>4/21/2017</Date><Time>6:58pm</Time><Ask>109.2260</Ask><Bid>109.2200</Bid></rate><rate id="USDBGN"><Name>USD/BGN</Name><Rate>1.8282</Rate><Date>4/21/2017</Date><Time>3:15pm</Time><Ask>N/A</Ask><Bid>1.8282</Bid></rate><rate id="USDCZK"><Name>USD/CZK</Name><Rate>25.1629</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>25.1702</Ask><Bid>25.1629</Bid></rate><rate id="USDDKK"><Name>USD/DKK</Name><Rate>6.9458</Rate><Date>4/21/2017</Date><Time>6:44pm</Time><Ask>6.9466</Ask><Bid>6.9458</Bid></rate><rate id="USDGBP"><Name>USD/GBP</Name><Rate>0.7812</Rate><Date>4/21/2017</Date><Time>6:29pm</Time><Ask>0.7813</Ask><Bid>0.7812</Bid></rate><rate id="USDHUF"><Name>USD/HUF</Name><Rate>292.4200</Rate><Date>4/21/2017</Date><Time>8:14pm</Time><Ask>292.6200</Ask><Bid>292.4200</Bid></rate><rate id="USDLTL"><Name>USD/LTL</Name><Rate>3.0487</Rate><Date>6/22/2015</Date><Time>9:39am</Time><Ask>3.0491</Ask><Bid>3.0487</Bid></rate><rate id="USDLVL"><Name>USD/LVL</Name><Rate>0.6205</Rate><Date>6/22/2015</Date><Time>9:37am</Time><Ask>0.6206</Ask><Bid>0.6205</Bid></rate><rate id="USDPLN"><Name>USD/PLN</Name><Rate>3.9907</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>3.9916</Ask><Bid>3.9907</Bid></rate><rate id="USDRON"><Name>USD/RON</Name><Rate>4.2276</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>4.2411</Ask><Bid>4.2276</Bid></rate><rate id="USDSEK"><Name>USD/SEK</Name><Rate>9.0293</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>9.0310</Ask><Bid>9.0293</Bid></rate><rate id="USDCHF"><Name>USD/CHF</Name><Rate>0.9977</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>0.9977</Ask><Bid>0.9977</Bid></rate><rate id="USDNOK"><Name>USD/NOK</Name><Rate>8.6823</Rate><Date>4/21/2017</Date><Time>7:00pm</Time><Ask>8.6858</Ask><Bid>8.6823</Bid></rate><rate id="USDHRK"><Name>USD/HRK</Name><Rate>6.9250</Rate><Date>4/21/2017</Date><Time>6:53pm</Time><Ask>6.9981</Ask><Bid>6.9250</Bid></rate><rate id="USDRUB"><Name>USD/RUB</Name><Rate>56.5055</Rate><Date>4/21/2017</Date><Time>6:33pm</Time><Ask>56.5405</Ask><Bid>56.5055</Bid></rate><rate id="USDTRY"><Name>USD/TRY</Name><Rate>3.6473</Rate><Date>4/21/2017</Date><Time>6:02pm</Time><Ask>3.6478</Ask><Bid>3.6473</Bid></rate><rate id="USDAUD"><Name>USD/AUD</Name><Rate>1.3263</Rate><Date>4/21/2017</Date><Time>8:35pm</Time><Ask>1.3267</Ask><Bid>1.3263</Bid></rate><rate id="USDBRL"><Name>USD/BRL</Name><Rate>3.1473</Rate><Date>4/21/2017</Date><Time>7:02pm</Time><Ask>3.1493</Ask><Bid>3.1473</Bid></rate><rate id="USDCAD"><Name>USD/CAD</Name><Rate>1.3513</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>1.3513</Ask><Bid>1.3513</Bid></rate><rate id="USDCNY"><Name>USD/CNY</Name><Rate>6.8844</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>6.8854</Ask><Bid>6.8844</Bid></rate><rate id="USDHKD"><Name>USD/HKD</Name><Rate>7.7746</Rate><Date>4/21/2017</Date><Time>6:01pm</Time><Ask>7.7754</Ask><Bid>7.7746</Bid></rate><rate id="USDIDR"><Name>USD/IDR</Name><Rate>13316.0000</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>13326.0000</Ask><Bid>13316.0000</Bid></rate><rate id="USDILS"><Name>USD/ILS</Name><Rate>3.6723</Rate><Date>4/21/2017</Date><Time>6:52pm</Time><Ask>3.6823</Ask><Bid>3.6723</Bid></rate><rate id="USDINR"><Name>USD/INR</Name><Rate>64.6490</Rate><Date>4/21/2017</Date><Time>6:26pm</Time><Ask>64.6990</Ask><Bid>64.6490</Bid></rate><rate id="USDKRW"><Name>USD/KRW</Name><Rate>1133.3700</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1134.3700</Ask><Bid>1133.3700</Bid></rate><rate id="USDMXN"><Name>USD/MXN</Name><Rate>18.8424</Rate><Date>4/21/2017</Date><Time>6:16pm</Time><Ask>18.8443</Ask><Bid>18.8424</Bid></rate><rate id="USDMYR"><Name>USD/MYR</Name><Rate>4.3980</Rate><Date>4/21/2017</Date><Time>6:38pm</Time><Ask>4.4030</Ask><Bid>4.3980</Bid></rate><rate id="USDNZD"><Name>USD/NZD</Name><Rate>1.4226</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>1.4236</Ask><Bid>1.4226</Bid></rate><rate id="USDPHP"><Name>USD/PHP</Name><Rate>49.8400</Rate><Date>4/21/2017</Date><Time>6:13pm</Time><Ask>49.8900</Ask><Bid>49.8400</Bid></rate><rate id="USDSGD"><Name>USD/SGD</Name><Rate>1.3966</Rate><Date>4/21/2017</Date><Time>8:28pm</Time><Ask>1.3969</Ask><Bid>1.3966</Bid></rate><rate id="USDTHB"><Name>USD/THB</Name><Rate>34.3500</Rate><Date>4/21/2017</Date><Time>6:49pm</Time><Ask>34.4000</Ask><Bid>34.3500</Bid></rate><rate id="USDZAR"><Name>USD/ZAR</Name><Rate>13.1525</Rate><Date>4/21/2017</Date><Time>6:50pm</Time><Ask>13.1620</Ask><Bid>13.1525</Bid></rate><rate id="USDISK"><Name>USD/ISK</Name><Rate>109.4900</Rate><Date>4/21/2017</Date><Time>5:32pm</Time><Ask>109.9900</Ask><Bid>109.4900</Bid></rate></results></query><!-- total: 1083 -->
<!-- prod_bf1_1;paas.yql;queryyahooapiscomproductionbf1;885cf297-259f-11e7-b972-d4ae52974741 -->
However, when I'm using xml the xml etree module to parse this string as an xml object, I'm getting errors like the object is not indexable and the object is not iterable. What exactly is the output of this code?
import xml.etree.ElementTree as ET
d = ET.ElementTree(ET.fromstring(b))
EDIT: The errors are coming up when I'm trying to iterate through the children of d like so:
for child in d:
print child.tag
The error here is "TypeError: 'ElementTree' object is not iterable"
How can I access the children in this string xml to get specific values from it?
you are overdoing things when you try to convert the string to an elementtree element:
import xml.etree.ElementTree as ET
b = '''<?xml version="1.0" encoding="UTF-8"?>...'''
element = ET.fromstring(b) # that does it!
print(element.attrib)
now you can access element as you would any instance of xml.etree.ElementTree.Element.
you could do this for example to iterate over all children:
for child in tree.iter():
print(child, child.tag, child.text, child.attrib)
Related
I have an xml file of following format :
<?xml version='1.0' encoding='utf-8'?>
<execute time="0.59">
<exec name="recursive_a" loops="3" fail="2" skipped="0">
<testcase tname="test_a" name="test.cpp" time="0.50">
<pass>
001,test,pass
</pass>
</testcase>
</exec>
</execute>
how can i parse "recursive_a" string from this xml using python? (i am using minidom xml parser)
With xml.etree.ElementTree and pandas one solution could be:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('Code3r.xml')
root = tree.getroot()
for elem in root:
if elem.tag == "exec":
# print(elem.attrib) or with pandas
df = pd.DataFrame.from_dict(elem.attrib, orient='index')
print(df.T.to_string(index=False))
Output:
name loops fail skipped
recursive_a 3 2 0
I am working on a school assignment where we have to parse 3 elements from an xml file and print them on the same line with the titles, Artist:, Title:, Decade: in python 3. I was able to complete the Artist and Title part but the decade is a attrib contained in a element that i can only seem to get to print below or I get a "TypeError: 'dict' object is not callable". From my understanding when trying to parse an attrib I must use iter function so i understand why it's wrong but i just can't figure out how to fit that into my for loop so they can all print on the same line. The code is below
This is what I have:
import xml.etree.ElementTree as et
tree = et.parse("cd_catalog.xml")
root = tree.getroot()
for child in root.findall("CD"):
artist = child.find("ARTIST").text
title = child.find("TITLE").text
decade = child.attrib("decade").text
print("Artist: %s, Title: %s, Decade: %s" %(artist, title, decade))
The XML file has the following info:
<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
<CD decade="80s">
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
The attrib attribute of XML elements is not a function but a dictionary of the xml element attributes. Therefore, the expression child.attrib("decade") raises an exception since you try to call child.attrib.
So you have to change:
decade = child.attrib("decade").text
by:
decade = child.attrib["decade"]
I am getting the XML response from the API call.
I need the "testId" attribute value from this response. Please help me on this.
r = requests.get( myconfig.URL_webpagetest + "?url=" + testurl + "&f=xml&k=" + myconfig.apikey_webpagetest )
xmltxt = r.content
print(xmltxt)
testId = XML(xmltxt).find("testId").text
r = requests.get("http://www.webpagetest.org/testStatus.php?f=xml&test=" + testId )
xml response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<statusCode>200</statusCode>
<statusText>Ok</statusText>
<data>
<testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId>
<ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey>
<jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl>
</data>
</response>
The following error is produced:
Traceback (most recent call last):
File "/pagePerformance.py", line 52, in <module>
testId = XML (xmltxt).find("testId").text
AttributeError: 'NoneType' object has no attribute 'text'
Use the following to collect testId from response:-
import xml.etree.ElementTree as ET
response_xml_as_string = "xml response string from API"
responseXml = ET.fromstring(response_xml_as_string)
testId = responseXml.find('data').find('testId')
print testId.text
from lxml.etree import fromstring
string = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data><testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
response = fromstring(string.encode('utf-8'))
elm = response.xpath('/response/data/testId').pop()
testId = elm.text
This way you can search for any element within the xml from the root/parent element via the XPATH.
Side Note: I don't particular like using the pop method to remove the item from a single item list. So if anyone else has a better way to do it please let me know. So far I've consider:
1) elm = next(iter(response.xpath('/response/data/testId')))
2) simply leaving it in a list so it can use as a stararg
I found this article the other day when it appeared on my feed, and it may suit your needs. I skimmed it, but in general the package parses xml data and converts the tags/attributes/values into a dictionary. Additionally, the author points out that it maintains the nesting structure of the xml as well.
https://www.oreilly.com/learning/jxmlease-python-xml-conversion-data-structures
for your use case.
>>> xml = '<?xml version="1.0" encoding="UTF-8"?> <response> <statusCode>200</statusCode> <statusText>Ok</statusText> <data> <testId>180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</testId> <ownerKey>dfc65d98de13c4770e528ef5b65e9629a52595e9</ownerKey> <jsonUrl>http://www.webpagetest.org/jsonResult.php?test=180523_YM_054fd7d84fd4ea7aed237f87289e0c7c</jsonUrl> </data> </response>'
>>> root = jxmlease.parse(xml)
>>> testid = root['response']['data']['testId'].get_cdata()
>>> print(testid)
>>> '180523_YM_054fd7d84fd4ea7aed237f87289e0c7c'
If I've got an XML file like this:
<root
xmlns:a="http://example.com/a"
xmlns:b="http://example.com/b"
xmlns:c="http://example.com/c"
xmlns="http://example.com/base">
...
</root>
How can I get a list of the namespace definitions (ie, the xmlns:a="…", etc)?
Using:
import xml.etree.ElementTree as ET
tree = ET.parse('foo.xml')
root = tree.getroot()
print root.attrib()
Shows an empty attribute dictionary.
Via #mzjn, in the comments, here's how to do it with stock ElementTree: https://stackoverflow.com/a/42372404/407651 :
import xml.etree.ElementTree as ET
my_namespaces = dict([
node for (_, node) in ET.iterparse('file.xml', events=['start-ns'])
])
You might find it easier to use lxml.
from lxml import etree
xml_data = '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b" xmlns:c="http://example.com/c" xmlns="http://example.com/base"></root>'
root_node = etree.fromstring(xml_data)
print root_node.nsmap
This outputs
{None: 'http://example.com/base',
'a': 'http://example.com/a',
'b': 'http://example.com/b',
'c': 'http://example.com/c'}
i have a XML output like below:
<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>
I want to get all the values regarding "getValuesReturn" attribute as a Python list. For this, i used a code like below:
import libxml2
DOC="""<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:getValuesResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="http://soap.core.green.controlj.com"><getValuesReturn soapenc:arrayType="xsd:string[3]" xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"><getValuesReturn xsi:type="xsd:string">337.81998</getValuesReturn><getValuesReturn xsi:type="xsd:string">129.1</getValuesReturn><getValuesReturn xsi:type="xsd:string">1152.9691</getValuesReturn></getValuesReturn></ns1:getValuesResponse></soapenv:Body></soapenv:Envelope>"""
def getValues(cat):
return [attr.content for attr in doc.xpathEval("/elements/parent[#name='%s']/child/#value" % (cat))]
# gelen xml dosyasini yazdir
doc = libxml2.parseDoc(DOC)
#getValuesReturn etiketinin degerlerini yazdir
print getValues("getValuesReturn")
It just returns me an empty list. But i should get a list such as ["337.81998","129.1","1152.9691"]. Could you please help me out with this ?
Thanks in advance.
Where does the xpath expression come from? It doesn't match anything. (There's no elements, parent tag element)
Try following:
DOC = ...
doc = libxml2.parseDoc(DOC)
print [attr.content for attr in doc.xpathEval(".//getValuesReturn")]
prints
['337.81998129.11152.9691', '337.81998', '129.1', '1152.9691']
doc = libxml2.parseDoc(DOC)
print [attr.content for attr in doc.xpathEval('.//getValuesReturn/text()')]
prints
['337.81998', '129.1', '1152.9691']