Python struggling to read many JSON files?
I'm writing a short script to check which five-letter Twitter handles are available: basically five nested for loops, with a call to the Twitter API to check whether each handle is available.
In the middle for loop I have two lines:
response = requests.get("https://twitter.com/users/username_available?username=" + user)
print user, str(response.json()["valid"])
It ran for a little while, then at some point it could no longer decode the JSON responses. Now, when I run it again, it stops immediately with the same error:
File "check.py", line 25, in <module>
main()
File "check.py", line 16, in main
print user, str(response.json()["valid"])
File "/Library/Python/2.7/site-packages/requests/models.py", line 886, in json
return complexjson.loads(self.text, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The only logical answer that comes to my mind is that my computer can't handle so many JSON requests but I was wondering if anyone knew any way to get around this.
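Worked it out after many minutes of confusion.
Twitter has a rate limit of 180 calls every 15 minutes. Once that limit is exceeded, the response is presumably no longer the expected JSON, which is why response.json() raises the ValueError.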
https://dev.twitter.com/rest/public/rate-limiting
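A minimal sketch of how the loop could stay under that limit and survive a non-JSON reply. The helper names min_interval and parse_valid_flag are hypothetical, and the assumption that a rate-limited reply arrives with a non-200 status or a non-JSON body is mine, not something the Twitter documentation is quoted on here.

```python
import json

MAX_CALLS = 180            # calls allowed per window, as quoted in the answer
WINDOW_SECONDS = 15 * 60   # length of the rate-limit window, in seconds


def min_interval(max_calls=MAX_CALLS, window=WINDOW_SECONDS):
    """Seconds to wait between calls so the limit is never exceeded."""
    return window / float(max_calls)


def parse_valid_flag(status_code, body):
    """Return the "valid" field, or None if the reply is not usable JSON."""
    if status_code != 200:
        return None
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
        return None
    if not isinstance(data, dict):
        return None
    return data.get("valid")
```

In the loop from the question, this would mean calling time.sleep(min_interval()) between requests and printing parse_valid_flag(response.status_code, response.text) instead of indexing response.json() directly.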
Related
I scrape json pages but sometimes I get this error:
ERROR: Spider error processing <GET https://reqbin.com/echo/get/json/page/2>
Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/home/user/path/scraping.py", line 239, in parse_images
jsonresponse = json.loads(response.text)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 48662 (char 48661)
So I suspect that the JSON page does not have time to load fully, and that this is why parsing its content fails. If I do it manually, that is, take the JSON content as a string and load it with the json module, it works and I don't get the json.decoder.JSONDecodeError.
What I've done so far is to set in settings.py:
DOWNLOAD_DELAY = 5
DOWNLOAD_TIMEOUT = 600
DOWNLOAD_FAIL_ON_DATALOSS = False
CONCURRENT_REQUESTS = 8
hoping that it would slow down the scraping and solve my problem, but the problem still occurs.
Any idea how to make sure the JSON page has loaded completely, so that parsing its content does not fail?
You can try increasing DOWNLOAD_TIMEOUT; it usually helps. If that's not enough, try reducing CONCURRENT_REQUESTS.
If that still doesn't help, retry the request. You can write your own retry_request method and call it from the callback with return self.retry_request(response).
Or do something like req = response.request.copy(); req.dont_filter = True and then return req.
You can also use RetryMiddleware. Read more on the documentation page https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.retry
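The copy-and-resend idea above can be sketched roughly as follows. The helper name parse_json_or_retry, the meta key "json_retries" and the cap of three attempts are all hypothetical choices for illustration; the only Scrapy features assumed are response.text, response.meta, response.request.copy() and the dont_filter attribute.

```python
import json

MAX_JSON_RETRIES = 3  # hypothetical cap so a permanently broken page is not retried forever


def parse_json_or_retry(response):
    """Return (data, None) on success, or (None, retry_request) on a decode failure.

    When the retry cap is reached, (None, None) is returned instead.
    """
    try:
        return json.loads(response.text), None
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        attempts = response.meta.get("json_retries", 0)
        if attempts >= MAX_JSON_RETRIES:
            return None, None
        retry = response.request.copy()
        retry.dont_filter = True  # let the duplicate filter accept the same URL again
        retry.meta["json_retries"] = attempts + 1
        return None, retry
```

Inside a callback such as parse_images, one would call data, retry = parse_json_or_retry(response) and return (or yield) retry when it is not None, continuing with data otherwise.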
I'm trying to execute Overpass queries from a Python script. I'm practicing at overpass-turbo.eu and found the following query to work as intended:
[out:json][timeout:600];
{{geocodeArea:Niedersachsen}}->.searchArea;
(
node[place=city](area.searchArea);
node[place=town](area.searchArea);
);
out;
However, when I submit the exact same query from a Python script, I get an error:
import requests
overpass_query = """
[out:json][timeout:600];
{{geocodeArea:Niedersachsen}}->.searchArea;
(
node[place=city](area.searchArea);
node[place=town](area.searchArea);
);
out;
"""
overpass_url = "http://overpass-api.de/api/interpreter"
response = requests.get(overpass_url, params={'data': overpass_query})
data = response.json()
/home/enno/events/docker/etl/venv/bin/python /home/enno/events/docker/etl/test2.py
Traceback (most recent call last):
File "/home/enno/events/docker/etl/test2.py", line 16, in <module>
data = response.json()
File "/home/enno/events/docker/etl/venv/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Process finished with exit code 1
Why is this? It seems to have to do with the curly braces, but I can't figure out how to solve this.
Many thanks,
Enno
The curly braces (i.e. {{geocodeArea:Niedersachsen}}) are a special feature of overpass turbo and are not part of the Overpass API. See extended overpass turbo queries for a list of these shortcuts.
{{geocodeArea:name}} will tell overpass turbo to perform a geocoding request using Nominatim. It will then use the first result to construct an area(id) query. You have to perform the same step (using Nominatim or any other geocoder) in your program.
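A minimal sketch of the second half of that step, assuming the geocoder has already returned an OSM object type and ID (Nominatim's JSON results carry osm_type and osm_id fields). It relies on the Overpass convention that an area ID is the relation ID plus 3600000000, or the way ID plus 2400000000. The function names are hypothetical, and the geocoding request itself is left out.

```python
def overpass_area_id(osm_type, osm_id):
    """Convert an OSM way or relation ID into the matching Overpass area ID."""
    offsets = {"relation": 3600000000, "way": 2400000000}
    if osm_type not in offsets:
        raise ValueError("areas are derived from ways or relations, got %r" % osm_type)
    return offsets[osm_type] + int(osm_id)


def build_query(area_id):
    """Build the query from the question with the turbo shortcut replaced by area(id)."""
    return (
        "[out:json][timeout:600];\n"
        "area({0})->.searchArea;\n"
        "(\n"
        "  node[place=city](area.searchArea);\n"
        "  node[place=town](area.searchArea);\n"
        ");\n"
        "out;\n"
    ).format(area_id)
```

The string returned by build_query can then be passed as the data parameter of the requests.get call shown in the question.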
I inherited a program from a former colleague and now have to maintain it.
This Python script queries our Jira instance through the API with a given JQL.
It returns a list of all issues that match the search criteria.
But now it no longer works, and I get a JSON error message both on the server (Ubuntu) and on my local Windows PC.
Note: it has not been run for about a year, but back then it worked.
Here is what the script looks like:
import json
import subprocess
jiraSerachUrl = "https://ourJiraInstance.net/rest/api/2/search?jql=key%20=%20%22TEST-123%22"
jiraResponse = subprocess.Popen(["curl","-l","-s","-u", "jiraUser"+":"+"jiraUserPassword", "-X", "GET", jiraSerachUrl ],stdout=subprocess.PIPE,shell=True).communicate()[0]
## shell=True only added for Windows Instance
print(type(jiraResponse))
##print = <class 'bytes'>
print(jiraResponse)
## print = b''
jiraJsonResponse = json.loads(jiraResponse.decode('utf-8'))
print(jiraJsonResponse)
The JQL/Jira search address returns the following (shortened answer; all fields of the task are returned):
{"expand":"names,schema","startAt":0,"maxResults":50,"total":1,"issues":
[{"expand":"operations,versionedRepresentations,editmeta,changelog,transitions,renderedFields",
"id":"145936","self":"https://ourJiraInstance.net/rest/api/2/issue/145936","key":"TEST-123","fields":{"parent": ...
The Error on the Windows PC is the following
Traceback (most recent call last):
File "C:\Users\User\Desktop\test.py", line 10, in <module>
jiraJsonResponse = json.loads(jiraResponse.decode('utf-8'))
File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\User\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is the error on the Ubuntu Server ( running the same script )
Traceback (most recent call last):
File "searchJira.py", line 33, in <module>
jiraJsonResponse = json.loads(jiraResponse)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
So far I have tried switching the JSON loading to simplejson, but with the same result.
Changing the encoding used for decoding (e.g. unicode) had no effect either.
I experimented a bit and finally got it. By replacing curl with requests I finally got the result I wanted. My request now looks like this:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get(jiraSerachUrl, auth=HTTPBasicAuth(user, password), verify=False)  # verify=False disables TLS certificate checks
jiraJsonResponse = json.loads(r.text)
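Both tracebacks in the question come from handing an empty body (the b'' printed by the script) to json.loads. A small guard makes that case explicit instead of surfacing as a decoder error. This is only a sketch; the helper name decode_json_bytes and the wording of the error message are hypothetical.

```python
import json


def decode_json_bytes(raw, encoding="utf-8"):
    """Decode a bytes response body to JSON, failing loudly when it is empty."""
    text = raw.decode(encoding).strip()
    if not text:
        # An empty body means the HTTP call itself failed or returned nothing,
        # so the problem is upstream of the JSON parser.
        raise RuntimeError("empty response body: the request returned no data")
    return json.loads(text)
```

With the original script, decode_json_bytes(jiraResponse) would have raised the RuntimeError immediately, pointing at the curl call rather than at the JSON decoder.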
python 3.4 and Coinbase V2 API
I am working on some BTC data analysis and trying to make continuous requests to the Coinbase API. When I run my script, it always eventually crashes on one of the calls to
r = client.get_spot_price()
r = client.get_buy_price()
r = client.get_sell_price()
The unusual thing is that the script will always crash at different times. Sometimes it will successfully collect data for an hour or so and then crash, other times it will crash after 5 - 10 minutes.
ERROR:
r = client.get_spot_price()
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 191, in get_spot_price
response = self._get('v2', 'prices', 'spot', data=params)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 129, in _get
return self._request('get', *args, **kwargs)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 116, in _request
return self._handle_response(response)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/client.py", line 125, in _handle_response
raise build_api_error(response)
File "/home/g/.local/lib/python3.4/site-packages/coinbase/wallet/error.py", line 49, in build_api_error
blob = blob or response.json()
File "/home/g/.local/lib/python3.4/site-packages/requests/models.py", line 812, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
It seems to be crashing due to some json decoding?
Does anyone have any idea why this will only throw errors at certain times?
I have tried something like the following to avoid crashing due to this error:
#snap is tuple of data containing data from buy, sell , spot price
if not any(snap):
print('\n\n-----ENTRY ERROR---- Snap returned None \n\n')
success = False
return
but it isn't doing the trick
What are some good ways to handle this error in your opinion?
Thanks, any help is much appreciated!
To me this looks related to this issue: https://github.com/coinbase/coinbase-python/issues/15. It does seem to be an error inside the library: the traceback shows the code reaching raise build_api_error(response), which then fails while decoding the response.
It may also be related to network connectivity. If your network (or the server) fails, the client can either fail to retrieve the JSON or retrieve an empty body. The library should report this more clearly, though.
In that case the JSON decoder is given an empty body, which causes the error.
A temporary workaround is to wrap your calls in a try statement and retry when they fail.
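The try-and-retry workaround could look roughly like this. The helper call_with_retry and its parameters are hypothetical and not part of the coinbase library; catching ValueError matches the exception at the bottom of the traceback in the question.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, exceptions=(ValueError,)):
    """Call func(); on one of the listed exceptions, wait and try again.

    The last exception is re-raised once all attempts are used up.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # brief pause before the next attempt
    raise last_error
```

In the script from the question this would be used as r = call_with_retry(client.get_spot_price), and likewise for get_buy_price and get_sell_price.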
You have to supply it with a currency to get a price.
Here is an example:
price = client.get_spot_price(currency_pair='XRP-USD')
I want to create issues in JIRA using Python, so I am working through the jira-python documentation ("Welcome to jira-python's documentation").
But the first question already puzzles me: what is the server if we are using our own JIRA? The documentation uses https://jira.atlassian.com. If my JIRA's URL looks like https://bugs.company.com/secure/Dashboard.jspa, what is the server for me?
Right now I am using
jira = JIRA(options={'server': 'https://bugs.company.com'})
projects = jira.projects()
keys = [project.key for project in projects]
and I get the error:
Traceback (most recent call last):
File "MethodTest.py", line 9, in <module>
projects = jira.projects()
File "/Library/Python/2.7/site-packages/jira/client.py", line 838, in projects
r_json = self._get_json('project')
File "/Library/Python/2.7/site-packages/jira/client.py", line 1423, in _get_json
r_json = json.loads(r.text)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The problem might come from the fact that you are using a secure (HTTPS) connection to your JIRA instance. You need to set up a proper certificate for your connection or simply disable certificate verification.
See the jira.client.JIRA options and set verify to False, like this:
jira = JIRA(options={'server': 'https://bugs.company.com',
'verify': False})
Are you setting the proper username and password?
Finally, you might want to check with your IT department for the proper url.
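On the question of what the server value should be: a rough sketch, assuming JIRA is served from the root of the host, is to keep only the scheme and host of a page URL such as the dashboard address. The helper name jira_server_from_url is hypothetical, and if the instance runs under a context path (for example /jira), that path belongs in the server URL too, which this sketch does not handle.

```python
from urllib.parse import urlsplit


def jira_server_from_url(page_url):
    """Keep only the scheme and host of a JIRA page URL."""
    parts = urlsplit(page_url)
    return "{0}://{1}".format(parts.scheme, parts.netloc)
```

For the URL in the question this yields the same value already being passed as 'server', so the remaining suspects are the certificate, the credentials, or a different base URL that only the IT department can confirm.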