This is the code that I have been trying:
import ijson
import urllib.request
with urllib.request.urlopen(some_link) as read_file:
    path_array = ijson.items(read_file, object_in_json)
but I get this error:
(b'lexical error: invalid char in json text.\n \x1f\x8b\x08 (right here) ------^\n',)
Probably links are not supported by that library.
I advise you to use the requests module. Install it by running pip install requests.
Then, in your .py file:
import requests
response = requests.get(url)
data = response.json()  # named "data" so the standard json module is not shadowed
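Another reading of the error: the bytes `\x1f\x8b\x08` at the start of the message are the gzip magic number, so the server may have sent a gzip-compressed body, which `urlopen` does not decompress on its own. Below is a minimal sketch under that assumption. The helper name `open_maybe_gzip` is hypothetical, and an `io.BytesIO` object stands in for the real response; the object returned by `urlopen` is assumed to behave the same way.

```python
import gzip
import io
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw):
    """Return a binary file-like object that yields decompressed bytes.

    `raw` is a binary file-like object (hypothetical helper, not part of
    any library). The first bytes are peeked at, not consumed.
    """
    buffered = io.BufferedReader(raw)
    if buffered.peek(2)[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered)
    return buffered


# Stand-in for a compressed HTTP body.
payload = json.dumps({"items": [1, 2, 3]}).encode("utf-8")
compressed = io.BytesIO(gzip.compress(payload))

with open_maybe_gzip(compressed) as stream:
    decoded = json.load(stream)
```

With ijson, the returned stream could be passed in place of `read_file`, since `ijson.items` accepts a file-like object.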
I am having a couple of issues setting up a way to automate the download of a CSV. First, when downloading with a simple pandas read_csv(url) call I get an SSL error, so I switched to requests and tried to parse the response. Second, I am getting errors when parsing that response. I'm not sure whether the cause is that the URL actually returns a zip file and, if so, how I can get around that.
Here is the URL: https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/
and here is the code:
import pandas as pd
import numpy as np
import os
import io
import requests
import urllib3
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ALL:@SECLEVEL=1'
url = "https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/"
res = requests.get(url).content
data = pd.read_csv(io.StringIO(res.decode('utf-8')))
If the content is zip format, you should unzip it, and use its contents (csv, txt...).
I wasn't able to download the file due to the low speed from host
Here is the answer I found. I don't actually need to save these files locally, so if anyone knows how to parse zip files without writing them to disk, that would be great. I'm also not sure why I get that SSL error with pandas but not with requests.
import requests
import zipfile
from io import BytesIO
url = "https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/"
pathSave = "C:/Users/wherever"
filename = url.split('/')[-1]
r = requests.get(url)
# Use a separate name so the zipfile module is not overwritten
zf = zipfile.ZipFile(BytesIO(r.content))
zf.extractall(pathSave)
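For the "without saving locally" part, one option is to open the archive member directly from memory. This is a minimal sketch using only the standard library. The helper name `read_csv_rows_from_zip`, the sample archive, and the UTF-8 encoding are assumptions for illustration; in the real script the bytes would come from `r.content`.

```python
import csv
import io
import zipfile


def read_csv_rows_from_zip(zip_bytes):
    """Return rows (as dicts) from the first .csv member of an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no .csv member")
        with archive.open(csv_names[0]) as member:
            # Decode the binary member as text; UTF-8 is an assumption here.
            text = io.TextIOWrapper(member, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Stand-in for r.content: build a small archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "id,status\n1,approved\n2,pending\n")

rows = read_csv_rows_from_zip(buffer.getvalue())
```

Nothing is written to disk here. Since `pandas.read_csv` also accepts a file-like object, the object returned by `archive.open(...)` could be handed to it instead of `csv.DictReader`.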
I have a Python 3.10 script to download a PDF from a URL. I get no errors, but when I run the code the PDF does not download. I've done a sanity check to ensure the PDF is actually at the URL (which it is).
I'm not sure if this has something to do with HTTP/HTTPS. The site does have an expired HTTPS certificate, but it is a government site and this is really for testing only, so I am not worried about that and can ignore the error.
from fileinput import filename
import os
import os.path
from datetime import datetime
import urllib.request
import requests
import urllib3
urllib3.disable_warnings()
resp = requests.get('http:// url domain .org', verify=False)
urllib.request.urlopen('http:// my url .pdf')
filename = datetime.now().strftime("%Y_%m_%d-%I_%M_%S_%p")
save_path = "C:/Users/bob/Desktop/folder"
Or maybe the issue has something to do with urllib3 ignoring the error and urllib downloading the file?
Redacted the specific URL here
The urllib.request.urlopen method doesn't save the remote URL to a file -- it returns a response object that can be treated as a file-like object. You could do something like:
response = urllib.request.urlopen('http:// my url .pdf')
with open('filename.pdf', 'wb') as fd:  # 'wb': open for writing, in binary mode
    fd.write(response.read())
The urllib.request.urlretrieve method, on the other hand, will take care of writing the remote content to a local file. You would use it like this to write the PDF file to a local file named filename.pdf:
response = urllib.request.urlretrieve('http://my url .pdf',
                                      filename='filename.pdf')
See the documentation for information about the return value from the urlretrieve method.
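If the file is large, the response can also be copied to disk in chunks instead of being read into memory in one go. This is a rough sketch; the helper name `save_stream` and the chunk size are assumptions, and an `io.BytesIO` object stands in for the object returned by `urlopen`.

```python
import io
import os
import shutil
import tempfile


def save_stream(response, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to `path` in chunks; return the file size."""
    with open(path, "wb") as fd:
        shutil.copyfileobj(response, fd, chunk_size)
    return os.path.getsize(path)


# Stand-in for urllib.request.urlopen(...): any object with a read() method works.
fake_response = io.BytesIO(b"%PDF-1.4 example bytes")
target = os.path.join(tempfile.mkdtemp(), "filename.pdf")
written = save_stream(fake_response, target)
```

The same helper should work with a real response, because `shutil.copyfileobj` only needs a `read()` method on the source.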
I am trying to download data which is returned in an xml file from an api with the following url
URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
When I use the url in my web browser the xml file downloaded looks like this
<?xml version="1.0" encoding="UTF-8"?>
<OASISReport xmlns="http://www.caiso.com/soa/OASISReport_v1.xsd">
<MessageHeader>
<TimeDate>2018-04-06T15:17:51-00:00</TimeDate>
<Source>OASIS</Source>
<Version>v20131201</Version>
</MessageHeader>
<MessagePayload>
<RTO>
<name>CAISO</name>
<REPORT_ITEM>
<REPORT_HEADER>
<SYSTEM>OASIS</SYSTEM>
<TZ>PPT</TZ>
<REPORT>PRC_FUEL</REPORT>
<UOM>US$</UOM>
<INTERVAL>ENDING</INTERVAL>
<SEC_PER_INTERVAL>3600</SEC_PER_INTERVAL>
</REPORT_HEADER>
<REPORT_DATA>
<DATA_ITEM>FUEL_PRC</DATA_ITEM>
<RESOURCE_NAME>CISO</RESOURCE_NAME>
<OPR_DATE>2013-09-19</OPR_DATE>
<INTERVAL_NUM>24</INTERVAL_NUM>
However, when I download using a python script it is something very different.
Python script:
import requests
r = requests.get(URL)
r.encoding = "UTF-8"
with open('data.xml', 'wb') as file:
    file.write(r.content)
Downloaded file:
PKEL520130919_20130928_PRC_FUEL_N_20180406_08_44_40_v1.xml (binary data continues...)
I am assuming it's an encoding issue, but I am struggling with the solution.
Thanks in advance for your help!
This should help. The url you mentioned gives a zip. You can download that and extract it to get your XML.
EX:
import io
import zipfile
import requests
URL='http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1'
r = requests.get(URL, stream=True)
# r.content is bytes, so wrap it in io.BytesIO (the StringIO module exists only in Python 2)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()
You could also use urllib
import urllib.request
# In Python 3, urlretrieve lives in urllib.request (Python 2: urllib.urlretrieve)
urllib.request.urlretrieve(
    "http://oasis.caiso.com/oasisapi/SingleZip?queryname=PRC_FUEL&fuel_region_id=ALL&startdatetime=20130919T07:00-0000&enddatetime=20130928T07:00-0000&version=1",
    "oasis.zip")
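If the extracted file is not needed on disk, the XML can be parsed straight out of the archive bytes. This is a minimal sketch; the helper name `parse_first_xml` and the shortened sample document are assumptions for illustration, while the namespace URI is the one shown in the question's XML.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace taken from the OASISReport root element in the question.
NS = {"oasis": "http://www.caiso.com/soa/OASISReport_v1.xsd"}


def parse_first_xml(zip_bytes):
    """Parse the first .xml member of an in-memory ZIP and return its root element."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_names:
            raise ValueError("archive contains no .xml member")
        with archive.open(xml_names[0]) as member:
            return ET.parse(member).getroot()


# Stand-in for r.content: a tiny archive with a shortened report.
sample_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<OASISReport xmlns="http://www.caiso.com/soa/OASISReport_v1.xsd">
<MessageHeader><Source>OASIS</Source></MessageHeader>
</OASISReport>"""
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.xml", sample_xml)

root = parse_first_xml(buffer.getvalue())
source = root.find("oasis:MessageHeader/oasis:Source", NS).text
```

Because the report declares a default namespace, element lookups need the namespace mapping; a plain `root.find("MessageHeader")` would return `None`.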
I'm getting this response when I open this url:
from urllib.request import Request, urlopen
r = Request(r'http://airdates.tv/')
h = urlopen(r).readline()
print(h)
Response:
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed\xbdkv\xdbH\x96.\xfa\xbbj\x14Q\xaeuJ\xce\xee4E\x82\xa4(9m\xe7\xd2\xd3VZ\xaf2e\xab2k\xf5\xc2\n'
What encoding is this?
Is there a way to decode it based on the standard library?
Thank you in advance for any insight on this matter!
PS: It seems to be gzip.
It's gzip compressed HTML, as you suspected.
Rather than use urllib, use requests, which will decompress the response for you:
import requests
r = requests.get('http://airdates.tv/')
print(r.text)
You can install it with pip install requests, and never look back.
If you really must restrict yourself to the standard library, then decompress it with the gzip module:
import gzip
from io import BytesIO
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
f = urlopen('http://airdates.tv/')
# how to determine the content encoding
content_encoding = f.headers.get('Content-Encoding')
#print(content_encoding)
raw = f.read()  # read once; the response body cannot be read a second time
if content_encoding == 'gzip':
    # how to decompress gzip data with Python 3
    response = gzip.decompress(raw)
    # decompress with Python 2 (this form also works in Python 3):
    # response = gzip.GzipFile(fileobj=BytesIO(raw)).read()
else:
    response = raw
mhawke's solution (using requests instead of urllib) works perfectly and in most cases should be preferred.
That said, I was looking for a solution that does not require installing 3rd party libraries (hence my choice of urllib over requests).
I found a solution using standard libraries:
import zlib
from urllib.request import Request, urlopen
r = Request(r'http://airdates.tv/')
h = urlopen(r).read()
decomp_gzip = zlib.decompress(h, 16+zlib.MAX_WBITS)
print(decomp_gzip)
Which yields the following response:
b'<!DOCTYPE html>\n (continues...)'
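The `16 + zlib.MAX_WBITS` argument tells zlib to expect a gzip header and trailer. To avoid failing on servers that send an uncompressed body, the call can be guarded by a check for the gzip magic bytes. This is a small sketch; the helper name `gunzip_if_needed` is hypothetical, and the sample bytes stand in for what `urlopen(r).read()` returns.

```python
import gzip
import zlib


def gunzip_if_needed(body):
    """Return decompressed bytes if `body` starts with the gzip magic, else `body`."""
    if body[:2] == b"\x1f\x8b":
        # 16 + MAX_WBITS selects gzip framing instead of raw zlib framing.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    return body


# Stand-ins for a compressed and an uncompressed response body.
html = b"<!DOCTYPE html>\n<html></html>"
compressed = gzip.compress(html)

from_gzip = gunzip_if_needed(compressed)
from_plain = gunzip_if_needed(html)
```

Both calls return the same HTML bytes, so the caller does not need to know whether the server compressed the response.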
Please could someone convert the following from python2 to python3;
import requests
url = "http://duckduckgo.com/html"
payload = {'q':'python'}
r = requests.post(url, payload)
with open("requests_results.html", "w") as f:
    f.write(r.content)
and I get;
Traceback (most recent call last):
  File "C:\temp\Python\testFile.py", line 1, in <module>
    import requests
ImportError: No module named 'requests'
I have also tried;
import urllib.request
url = "http://duckduckgo.com/html"
payload = {'q':'python'}
r = urllib.request.post(url, payload)
with open("requests_results.html", "w") as f:
    f.write(r.content)
but I get
Traceback (most recent call last):
  File "C:\temp\Python\testFile.py", line 5, in <module>
    r = urllib.request.post(url, payload)
AttributeError: 'module' object has no attribute 'post'
So, in python3.2, r.content is a bytestring, not a str, and write does not like it. You might want to use r.text instead:
with open("requests_results.html", "w") as f:
    f.write(r.text)
You can see it in the requests documentation in http://docs.python-requests.org/en/latest/api.html#main-interface
class requests.Response
content - Content of the response, in bytes.
text - Content of the response, in unicode. if Response.encoding is None and chardet module is available,
encoding will be guessed.
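The bytes/str distinction can be reproduced without any network access. In this small sketch the sample HTML and the file names are made up, with `content` and `text` standing in for `r.content` and `r.text`.

```python
import os
import tempfile

content = "<html>café</html>".encode("utf-8")  # bytes, like r.content
text = content.decode("utf-8")                 # str, like r.text

folder = tempfile.mkdtemp()
bytes_path = os.path.join(folder, "as_bytes.html")
text_path = os.path.join(folder, "as_text.html")

# Bytes go to a file opened in binary mode.
with open(bytes_path, "wb") as f:
    f.write(content)

# Text goes to a file opened in text mode, with an explicit encoding.
with open(text_path, "w", encoding="utf-8") as f:
    f.write(text)

# Mixing the two raises TypeError in Python 3.
error = None
try:
    with open(os.path.join(folder, "mismatch.html"), "w") as f:
        f.write(content)
except TypeError as exc:
    error = exc
```

So besides switching to `r.text`, opening the file with `"wb"` and writing `r.content` is another way to avoid the error.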
Edit:
I posted before seeing the edited question. Yeah, like Martijn Pieters said, you need to install the requests module for python3 in order to be able to import it.
I think the problem here is that the Requests package is not installed. Or, if you have installed it, it is installed in your Python 2.x directory and not in Python 3, which is why you are not able to use the requests module. Try making Python 3 your default interpreter and then install requests.
Also try visiting this article by Michael Foord, which walks you through using all the features of urllib2.
import urllib.request
import urllib.parse
url = "https://duckduckgo.com/html"
values = {'q':'python'}
data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page)
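Passing `data` is what turns the request into a POST. This can be checked without sending anything by building the `Request` objects and inspecting them. The GET variant with the query string in the URL is added here only for comparison and is not part of the original answer.

```python
import urllib.parse
import urllib.request

url = "https://duckduckgo.com/html"
values = {"q": "python"}
data = urllib.parse.urlencode(values).encode("utf-8")

# With a data argument, urllib issues a POST.
post_req = urllib.request.Request(url, data)

# Without data (parameters placed in the URL instead), it issues a GET.
get_req = urllib.request.Request(url + "?" + urllib.parse.urlencode(values))

method_with_data = post_req.get_method()
method_without_data = get_req.get_method()
```

The `data` argument has to be bytes, which is why the encoded form string is passed through `.encode("utf-8")`.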