I am using a website's API to fetch data, and I run several instances of my program at the same time. I added exception handling, but I still get a 502 response and the program stops; several instances stop at the same time. What is the reason?
def search(name):
    global n
    path = 'https://dev.***.com/api/company/queryByName?name=' + str(name)
    s = requests.session()
    s.keep_alive = False  # close extra connections
    try:
        r = s.get(path, timeout=3)
        print(n, r)
    except (ReadTimeout, HTTPError, ConnectionError) as e:
        print(e)
        return search(name)
    else:
        n = n + 1
        result = json.loads(r.text)
Traceback (most recent call last):
File "D:/PyCharm Community Edition/project/company/30.py", line 72, in <module>
data1['social_credit_code'], data1['industry'], data1['reg_place'] = zip(*data1['companyName'].apply(search))
File "C:\Users\13750\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py", line 3848, in apply
739 <Response [502]>
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
File "D:/PyCharm Community Edition/project/company/30.py", line 49, in search
result = json.loads(r.text)
File "C:\Users\13750\.conda\envs\py36\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Users\13750\.conda\envs\py36\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\13750\.conda\envs\py36\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The requests API will only raise an exception if you are not able to communicate with a server. In this case you did reach a server, but the server then responded by telling you 502 Bad Gateway. This error usually means you communicated with some proxy server which was unable to forward your message to the final destination.
Regardless, that response will be captured by the requests API and returned as a Response object. After you receive a response you always need to make sure that the return code is what you expect (commonly 200). requests has a convenient way to do so:
r = s.get(path, timeout=3)
if r.ok:
    ...  # do your work
In this case you didn't check whether the response code was okay, and because the response indicated an error, you didn't receive the JSON data you expected. No exception was raised by the request itself, so execution continued into the else branch, where json.loads raised a JSONDecodeError.
As the traceback clearly shows, a JSONDecodeError is being raised and your code is not catching it.
You should probably not attempt to decode the content of a 502 response. If you want such responses to raise an exception, use raise_for_status:
try:
    r = s.get(path, timeout=3)
    r.raise_for_status()
    print(n, r)
except (ReadTimeout, HTTPError, ConnectionError) as e:
    ...
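Putting both suggestions together, a non-recursive version of search might be sketched as follows. The helper name fetch_json, the attempts and delay parameters, and the choice to treat every error status as retryable are assumptions made for illustration; they are not part of the original code. The helper takes the session's get method as an argument and catches OSError (the base class of requests' RequestException) and ValueError (the base class of the JSON decode errors), so a 502 with a non-JSON body leads to a retry rather than a crash.

```python
import time


def fetch_json(get, url, attempts=3, delay=1.0, timeout=3):
    """Call get(url, timeout=...) up to `attempts` times and return parsed JSON.

    `get` is assumed to behave like requests.Session.get.
    Returns None if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            r = get(url, timeout=timeout)
            r.raise_for_status()  # turns 4xx/5xx (e.g. 502) into an HTTPError
            return r.json()       # raises a ValueError subclass on a non-JSON body
        except (OSError, ValueError) as e:
            # requests' ConnectionError, Timeout and HTTPError all derive from OSError
            print(f"attempt {attempt} failed: {e!r}")
            if attempt < attempts:
                time.sleep(delay)
    return None
```

With requests installed, this could be called as fetch_json(requests.Session().get, path). A bounded loop also avoids the unbounded recursion of return search(name) when the server stays down for a long time.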
Related
I scrape json pages but sometimes I get this error:
ERROR: Spider error processing <GET https://reqbin.com/echo/get/json/page/2>
Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/home/user/path/scraping.py", line 239, in parse_images
jsonresponse = json.loads(response.text)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48662 (char 48661)
So I suspect that the json page does not have the time to be fully loaded and that's why parsing of its json content fails. And if I do it manually, I mean taking the json content as a string and loading it with the json module, it works and I don't get the json.decoder.JSONDecodeError error.
What I've done so far is to set in settings.py:
DOWNLOAD_DELAY = 5
DOWNLOAD_TIMEOUT = 600
DOWNLOAD_FAIL_ON_DATALOSS = False
CONCURRENT_REQUESTS = 8
hoping that it would slow down the scraping and solve my problem but the problem still occurs.
Any idea how to make sure the JSON page has loaded completely so that parsing its content does not fail?
You can try increasing DOWNLOAD_TIMEOUT; it usually helps. If that's not enough, try reducing CONCURRENT_REQUESTS.
If that still doesn't help, retry the request. You can write your own retry_request method and call it with return self.retry_request(response).
Or do something like req = response.request.copy(); req.dont_filter = True and then return req.
You can also use RetryMiddleware. Read more on the documentation page: https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.retry
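The manual retry described above could be sketched roughly like this. The function name parse_json_or_retry, the MAX_JSON_RETRIES limit, and the json_retries meta key are hypothetical names chosen for this example; only response.text, response.meta, response.request.copy() and the dont_filter attribute come from Scrapy itself. The function does not import Scrapy, so it only assumes that the object passed in behaves like a Scrapy response.

```python
import json

MAX_JSON_RETRIES = 3  # hypothetical limit, tune as needed


def parse_json_or_retry(response):
    """Return (data, None) on success, or (None, retry_request) if the body
    is not valid JSON. Returns (None, None) once the retry limit is reached."""
    try:
        return json.loads(response.text), None
    except json.JSONDecodeError:
        retries = response.meta.get("json_retries", 0)
        if retries >= MAX_JSON_RETRIES:
            return None, None
        req = response.request.copy()
        req.dont_filter = True  # let the duplicate filter accept the same URL again
        req.meta["json_retries"] = retries + 1
        return None, req
```

Inside a spider callback, the second element of the returned pair would be yielded when it is not None, and the parsed data processed otherwise. Counting retries in meta keeps a permanently broken page from being requested forever.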
As a part of a small project of mine, I'm using the requests module to make an API call. Here's the snippet:
date = str(day) + '-' + str(month) + '-' + str(year)
req = "https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/findByDistrict?district_id=" + str(distid) + "&date=" + date
response = requests.get(req,headers={'Content-Type': 'application/json'})
st = str(jprint(response.json()))
file = open("data.json",'w')
file.write(st)
file.close()
The jprint function is as follows:
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    return text
This is a part of a nested loop. On the first few runs, it worked successfully but after that it gave the following error:
Traceback (most recent call last):
File "vax_alert2.py", line 99, in <module>
st = str(jprint(response.json()))
File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I tried adding a sleep of 1 second but got the same error. How should I resolve it?
Also, I checked it without using the jprint function yet got the exact same error.
I would suggest recording the response whenever parsing it raises an exception, since the response body is likely empty and accompanied by an error status. You are probably getting a 403 or some other error status (potentially from a DDoS-aware firewall). Once you know which status accompanies the empty response, you can detect it and throttle your requests accordingly.
try:
    st = str(jprint(response.json()))
    file = open("data.json", 'w')
    file.write(st)
    file.close()
except ValueError:
    # response.json() raises a ValueError subclass when the body is not valid JSON
    print(response)
See the following (from https://docs.python-requests.org/en/master/user/quickstart/):
In case the JSON decoding fails, r.json() raises an exception. For
example, if the response gets a 204 (No Content), or if the response
contains invalid JSON, attempting r.json() raises
simplejson.JSONDecodeError if simplejson is installed or raises
ValueError: No JSON object could be decoded on Python 2 or
json.JSONDecodeError on Python 3.
It should be noted that the success of the call to r.json() does not
indicate the success of the response. Some servers may return a JSON
object in a failed response (e.g. error details with HTTP 500). Such
JSON will be decoded and returned. To check that a request is
successful, use r.raise_for_status() or check r.status_code is what
you expect.
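As a rough sketch of the "record, then throttle" idea above: the helper names json_or_report and backoff_delay, the report fields, and the delay values are all assumptions for illustration. The first function relies only on the status_code, headers, text and json() members that a requests response provides.

```python
def json_or_report(response, snippet_len=200):
    """Return (data, None) on success, or (None, report) if the body is not JSON."""
    try:
        return response.json(), None
    except ValueError:  # base class of the json and simplejson decode errors
        report = {
            "status_code": response.status_code,
            "content_type": response.headers.get("Content-Type"),
            "body_start": response.text[:snippet_len],
        }
        return None, report


def backoff_delay(failures, base=1.0, cap=60.0):
    """Exponentially growing delay in seconds after `failures` consecutive failures."""
    return min(cap, base * 2 ** failures)
```

In the nested loop, a non-None report could be printed or logged, and time.sleep(backoff_delay(failures)) called before the next request, with failures reset to zero after a success. A fixed one-second sleep may simply be shorter than what the server's rate limit expects.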
I am trying to get a JSON object, but it tells me it's expecting a value even though I define the path to the JSON in r.json(). Also, r.headers['content-type'] gives me text/html;charset=ISO-8859-1 ... Thank you for your time, everyone.
import requests
import json
session = requests.Session()
username = "------"
password = "-------"
url_cookie = 'http://ludwig.podiumdata.com:----/podium/j_spring_security_check?j_username=--&j_password=----'
url_get = 'http://ludwig.corp.podiumdata.com:----/qdc/entity/v1/getEntities?type=EXTERNAL&count=2&sortAttr=name&sortDir=ASC'
r = requests.get(url_get, auth=(username,password), verify=False)
r.json()
r.headers['content-type']
Traceback (most recent call last):
File "<ipython-input-108-61f8159bb1b5>", line 10, in <module>
r.json()
File "//anaconda3/lib/python3.7/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "//anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "//anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "//anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting value
Your request is receiving the response as text/html, but you want to receive application/json.
You need to include {'Accept': 'application/json'} as a header in your request.
Example: requests.get(url_get, auth=(username,password), headers={'Accept': 'application/json'}, verify=False)
Also, it looks like you aren't using the Session you created on line 3, but that isn't what is causing this error.
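Since the Content-Type header already revealed the problem here, one option is to check it before calling r.json(). The following is a minimal sketch; the helper name looks_like_json is hypothetical, and treating media types ending in +json as JSON is an assumption of this example.

```python
def looks_like_json(response):
    """Return True if the Content-Type header declares a JSON media type."""
    content_type = response.headers.get("Content-Type", "")
    # Drop parameters such as ";charset=ISO-8859-1" and normalise case
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

If looks_like_json(r) is false, printing r.status_code and the first part of r.text usually shows what the server actually sent, for example an HTML error or login page.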
python 3.4 and Coinbase V2 API
I am working on some BTC data analysis and trying to make continuous requests to the Coinbase API. When running my script, it will always eventually crash on one of the calls to
r = client.get_spot_price()
r = client.get_buy_price()
r = client.get_sell_price()
The unusual thing is that the script will always crash at different times. Sometimes it will successfully collect data for an hour or so and then crash, other times it will crash after 5 - 10 minutes.
ERROR:
r = client.get_spot_price()
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 191, in get_spot_price
response = self._get('v2', 'prices', 'spot', data=params)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 129, in _get
return self._request('get', *args, **kwargs)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 116, in _request
return self._handle_response(response)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 125, in _handle_response
raise build_api_error(response)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/error.py", line 49, in build_api_error
blob = blob or response.json()
File "/home/g/.local/lib/python3.4/site-packages/requests/models.py", line 812, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
It seems to be crashing due to some json decoding?
Does anyone have any idea why this will only throw errors at certain times?
I have tried something like the following to avoid crashing due to this error:
# snap is a tuple containing data from the buy, sell and spot prices
if not any(snap):
    print('\n\n-----ENTRY ERROR---- Snap returned None \n\n')
    success = False
    return
but it isn't doing the trick
What are some good ways to handle this error in your opinion?
Thanks, any help is much appreciated!
To me this looks related to this issue: https://github.com/coinbase/coinbase-python/issues/15. It does seem to be an internal library error: the traceback shows the library executing raise build_api_error(response), which supports this.
The problem may also be related to internet connectivity. If your network (or the server) fails, the library can either fail to retrieve the JSON or retrieve an empty body. The library should report this more clearly, though.
As a result, the JSON decoder is asked to decode an empty body, which causes the error.
A temporary workaround would be to wrap your code in a try statement and try again if it fails.
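The try-and-retry workaround might be sketched like this. The helper call_with_retry and its parameters are hypothetical; it retries on ValueError because that is the exception at the bottom of the traceback in the question.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, retry_on=(ValueError,)):
    """Call func() and retry when it raises one of the `retry_on` exceptions.

    Re-raises the last exception if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as e:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({e!r}); retrying in {delay}s")
            time.sleep(delay)
```

With the client from the question, a call could look like r = call_with_retry(client.get_spot_price). Re-raising after the last attempt keeps a persistent failure visible instead of silently returning nothing.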
You have to supply it with a currency to get a price.
Here is an example:
price = client.get_spot_price(currency_pair='XRP-USD')
I'm going over some URLs and can fetch most of the data from the API I'm using (the Imgur API). However, when it finds an image that was posted before but eventually removed, it still returns a successful GET response (code 200), and when I use
j1 = json.loads(r_positive.text)
I get this error:
http://imgur.com/gallery/cJPSzbu.json
<Response [200]>
Traceback (most recent call last):
File "image_poller_multiple.py", line 61, in <module>
j1 = json.loads(r_positive.text)
File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
How can I "fetch" the error inside the j1 variable instead? I'd like to use a conditional structure to solve the problem and keep my program from crashing. Something like
if j1 == ValueError:
    continue
else:
    do_next_procedures()
You need to use try except instead:
try:
    j1 = json.loads(r_positive.text)
except ValueError:
    # decoding failed
    continue
else:
    do_next_procedures()
See Handling Exceptions in the Python tutorial.
What really happens is that you were redirected for that URL and you got the image page instead. If you are using requests to fetch the JSON, look at the response history instead:
if r_positive.history:
    # more than one request, we were redirected:
    continue
else:
    j1 = r_positive.json()
or you could even disallow redirections:
r = requests.get(url, allow_redirects=False)
if r.status_code == 200:
    j1 = r.json()
The URL you listed redirects you to an HTML page. (Use curl to check things like this; it's your friend.)
The HTML page obviously cannot be parsed as JSON.
What you probably need is this:
response = fetch_the_url(url)
if response.status == 200:
    try:
        j1 = json.loads(response.text)
    except ValueError:
        # json can't be parsed
        continue