Downloading a xml file using an API URL in python - python

I am trying to download data which is returned in an xml file from an api with the following url
URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
When I use the url in my web browser the xml file downloaded looks like this
<?xml version="1.0" encoding="UTF-8"?>
<OASISReport xmlns="http://www.caiso.com/soa/OASISReport_v1.xsd">
<MessageHeader>
<TimeDate>2018-04-06T15:17:51-00:00</TimeDate>
<Source>OASIS</Source>
<Version>v20131201</Version>
</MessageHeader>
<MessagePayload>
<RTO>
<name>CAISO</name>
<REPORT_ITEM>
<REPORT_HEADER>
<SYSTEM>OASIS</SYSTEM>
<TZ>PPT</TZ>
<REPORT>PRC_FUEL</REPORT>
<UOM>US$</UOM>
<INTERVAL>ENDING</INTERVAL>
<SEC_PER_INTERVAL>3600</SEC_PER_INTERVAL>
</REPORT_HEADER>
<REPORT_DATA>
<DATA_ITEM>FUEL_PRC</DATA_ITEM>
<RESOURCE_NAME>CISO</RESOURCE_NAME>
<OPR_DATE>2013-09-19</OPR_DATE>
<INTERVAL_NUM>24</INTERVAL_NUM>
However, when I download using a python script it is something very different.
Python script:
r=requests.get(URL)
r.encoding="UTF-8"
with open ('data.xml','wb') as file:
file.write(r.content)
Downloaded file:
PKEL520130919_20130928_PRC_FUEL_N_20180406_08_44_40_v1.xmlíÝïOǹàïþ+Pt¤ó)ÝåQ*SdJ§_,âlMÿþÇK{Í£yV{í;ï¶vRâ¹XÞÌ<÷=»ûþ×?lü<sy}õ§Ï&O·>Û_½»þöòê»?}öõ¿|þì³?<Ù?}q~rþzþÓõûÛ»ÿÇÕÍ>ûþöö§/67ùå§ï..o®¾»þqóæúbsáï}ûóäé¿n¾ýìî+¼ßÜ\|7ÿëüâÛùû»_¿¹üqþòâv~0Ý<û|kûó­Ý7/¶·¿ØÞú|kë­­ýÍ?þ'ûç×ÿ|ÿn~ðákïoþþ«'ûûíO~ðóÝWMîþcóão=Ùßü÷æï¿>»øõëoï~ãõÓ»ÿ¼ºøq~pøâäütóÃÿ¾ûGg§¯ß¼=ysôêÓ¯þzôâåÑëû?Ìÿßÿß~u·¤¿½¹ûcÿýÿÏÁÙë÷ùúèËýÍßãÉþק¯¾>ÿ¯ýÍûÿñdÿä«7G¯ÿöâË£¯^|u¼¿ùÇoÜýß½~ûÇoÍvï]þã·|üòþ¿ÿúå7/î~uÿ_¿­æþóöîOµ¿ùé÷îÿîóÓ¯_½ýêÅ«£Ãïî5pöúþËÝÇfo=ÿ|ò|óßü´·_}ýê`ºýi!~c᯿yq÷';~õæ¯4Ýz³µûÅïúÇïýÿów/|;«ÿø{|ïÝ«åÅ_l?ݹûÃýö¿?Áýµj¶Ym§í1÷ubDÔ&Ï­T©ÝgPFÕ[tµÚó¨~BK®Ul
#-ô¯Ò¢«Õ&PÛª=¶èjµéÔv£j-ºZm6µ½¨Úc®VÛÚ³¨Úc®V{l÷NjÏ£jM»Üû/0]îV­éLuÿp¦DO®ºm§Iôxð誫Ùp<DÏ®ºm:óÁ$z#xtÕÕlC8 L¢'GW]Í6Â$zDxtÕÕlC8"L¢gGW]=ij-tH(­ºm϶Ð)¡´êj¶!<Û¦¡SBiÕÕlCx¶MC§Òª«Ù0ÿN ¥UW³¥#>þòaÂÛiÞ{v|4óÞU°u\&¨uÁ%¨uÁ%¨uÁ%¨uÁ%¨uÁ%¨uÁ%¨uÁeì×:à2Ø:à2Ø:à2Ø:à2Ø:à2Ø:à2Ø:à2Ø:à2Ø:à2Ø:à2áFplFplkÁ­­Ã«vlÑìŸ46æ½ßòóãɮ¥üð$¨Ñ~VX5:¸dÕèU£¬½bÕèÊ«f÷ó\6úè²Ñ# ¹lô¸e³û.%¹ltpé²Ñ1¹ËF2\6ºä²Ñ3®7ºltÖe£«Û,}Oàqµ1ï}ø2t]sº¬A·5]5yêªÉPWMÞºjòÔUw ¬¼eÑäz&·óX4¹Ç¢ÉÝ<M®æ±hr3Ey,ÜËcÑèZ«&·ò\5¹çªÉ<WM®ä±jt#ÏUy®Üz\mÌu_/}/dI?= lòÔeÏ®<pÕäÉ«&Y]5yïªÉÑ«&§®»jr÷ÂU£{>0\*Ùä#Ì&×ea6¹
³ÉÓVM¾u³É×N`69HÙäÔÒe£#rMîcÀlrù§À6æ½_cápjrnÉ¢É&ço,¿±hrúÆ¢Éá&go,½±htòæªÉÁ«&çn®»¹jrêæªÉ¡«&gn®¹¹jrâæªÉ«Fçm®·¹jrÚæªÉê\µhÌB¼YÍë.|ÇOÎO+^÷ÿK±Þ½òÛf¬÷_`å Ú´ß/
I am assuming it's an encoding issue, but I am struggling with the solution.
Thanks in advance for your help!

This should help. The url you mentioned gives a zip. You can download that and extract it to get your XML.
EX:
import requests
import zipfile
import StringIO
URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
r = requests.get(URL, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

You could also use urllib
import urllib
urllib.urlretrieve(
"http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1",
"oasis.zip")

Related

Python - Download zip files with requests package but get unknown file format

I am using Python 3.8.12. I tried the following code to download files from URLs with the requests package, but got 'Unkown file format' message when opening the zip file. I tested on different zip URLs but the size of all zip files are 18KB and none of the files can be opened successfully.
import requests
file_url = 'https://www.censtatd.gov.
hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
file_download = requests.get(file_url, allow_redirects=True, stream=True)
open(save_path+file_name, 'wb').write(file_download.content)
Zip file opening error message
Zip files size
However, once I updated the url as file_url = 'https://www.td.gov.hk/datagovhk_tis/mttd-csv/en/table41a_eng.csv' the code worked well and the csv file could be downloaded perfectly.
I try to use requests, urllib , wget and zipfile io packages, but none of them work.
The reason may be that the zip URL directs to both the zip file and a web page, while the csv URL directs to the csv file only.
I am really new to this field, could anyone help on it? Thanks a lot!
You might examine headers after sending HEAD request to get information regarding file, examining Content-Type allows you to reveal actual type of file
import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
r = requests.head(file_url)
print(r.headers["Content-Type"])
gives output
text/html
So file you have URL to is actually HTML page.
import wget
url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?
pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
#url = 'https://golang.org/dl/go1.17.3.windows-amd64.zip'
wget.download(url)

Python script to download PDF not downloading the PDF?

I have a Python 3.10 script to download a PDF from a URL, I get no errors but when I run the code the PDF does not download. I've done a sanity check to ensure the PDF is actually on the URL (which it is)
I'm not sure if this maybe has something to do with HTTP/ HTTPS? This site does have an expired HTTPS certificate, but it is a government site and this is really for testing only so I am not worried about that and can ignore the error
from fileinput import filename
import os
import os.path
from datetime import datetime
import urllib.request
import requests
import urllib3
urllib3.disable_warnings()
resp = requests.get('http:// url domain .org', verify=False)
urllib.request.urlopen('http:// my url .pdf')
filename = datetime.now().strftime("%Y_%m_%d-%I_%M_%S_%p")
save_path = "C:/Users/bob/Desktop/folder"
Or maybe is the issue something to do with urllib3 ignoring the error and urllib downloading the file?
Redacted the specific URL here
The urllib.request.urlopen method doesn't save the remote URL to a file -- it returns a response object that can be treated as a file-like object. You could do something like:
response = urllib.request.urlopen('http:// my url .pdf')
with open('filename.pdf') as fd:
fd.write(response.read())
The urllib.request.urlretrieve method, on the other hand, will take care of writing the remote content to a local file. You would use it like this to write the PDF file to a local file named filename.pdf:
response = urllib.request.urlretrieve('http://my url .pdf',
filename='filename.pdf')
See the documentation for information about the return value from the urlretrieve method.

How to download Flickr images using photos url (does not contain .jpg, .png, etc.) using Python

I want to download image from Flickr using following type of links using Python:
https://www.flickr.com/photos/66176388#N00/2172469872/
https://www.flickr.com/photos/clairity/798067744/
This data is obtained from xml file given at https://snap.stanford.edu/data/web-flickr.html
Is there any Python script or way to download images automatically.
Thanks.
I try to find answer from other sources and compiled the answer as follows:
import re
from urllib import request
def download(url, save_name):
html = request.urlopen(url).read()
html=html.decode('utf-8')
img_url = re.findall(r'https:[^" \\:]*_b\.jpg', html)[0]
print(img_url)
with open(save_name, "wb") as fp:
fp.write(request.urlopen(img_url).read())
download('https://www.flickr.com/photos/clairity/798067744/sizes/l/', 'image.jpg')

Download csv file through python (url)

I work on a project and I want to download a csv file from a url. I did some research on the site but none of the solutions presented worked for me.
The url offers you directly to download or open the file of the blow I do not know how to say a python to save the file (it would be nice if I could also rename it)
But when I open the url with this code nothing happens.
import urllib
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
testfile = urllib.request.urlopen(url)
Any ideas?
Try this. Change "folder" to a folder on your machine
import os
import requests
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
f.write(response.content)
You can adapt an example from the docs
import urllib.request
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
with urllib.request.urlopen(url) as testfile, open('dataset.csv', 'w') as f:
f.write(testfile.read().decode())

Download a file to a particular folder python

I can download a file from URL the following way.
import urllib2
response = urllib2.urlopen("http://www.someurl.com/file.pdf")
html = response.read()
One way I can think of is open this file as binary and then resave it to the differnet folder I want to save
but is there a better way?
Thanks
You can use the python module wget for downloading the file. Here is a sample code
import wget
url = 'http://www.example.com/foo.zip'
path = 'path/to/destination'
wget.download(url,out = path)
The function you are looking for is urllib.urlretrieve
import urllib
linkToFile = "http://www.someurl.com/file.pdf"
localDestination = "/home/user/local/path/to/file.pdf"
resultFilePath, responseHeaders = urllib.urlretrieve(linkToFile, localDestination)

Categories

Resources