Error when passing urllib.urlopen result to json.load - python

I'm new to python but would like to use urllib to download tweets, I'm following a tutorial instructions but get the same error every time, I print:
import urllib
import json
response = urllib.urlopen("https://twitter.com/search?q=Microsoft&src=tyah")
print json.load(response)
But everytime I get the error:
Traceback (most recent call last):
File "C:\Python27\print.py", line 4, in <module>
print json.load(response)
File "C:\Python27\Lib\json\__init__.py", line 278, in load
**kw)
File "C:\Python27\Lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\Lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\Lib\json\decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

As noted in comments, the answer is: nothing is wrong with your code, per se.
The problem is that when json.load looks at response, it does not find JSON in there - it is finding HTML.
You need to pass a file-like object containing JSON into the json.load function, or it will raise the exception you see here.
To get JSON from Twitter, you need to call a URL that gives a JSON response. I can tell you now, that none of the Web interface URLs do this directly. You should use the Twitter API.
However, purely for sake of demonstration, if you deconstruct the page at the URL you are calling now, you will find that to load the tweet data, the page makes the following request:
https://twitter.com/i/search/timeline?q=Microsoft&src=tyah&composed_count=0&include_available_features=1&include_entities=1
And this URL does return JSON in response, which would work just fine with your current code.
Of course, I'm pretty sure doing so violates some sort of Twitter TOS, so if you do this there are all sorts of potential negative repercussions to consider. Plus it's just not good sportsmanship. :)

Related

What's the most Pythonic way to parse out this value from a JSON-like blob?

See below. Given a well-known Google URL, I'm trying to retrieve data from that URL. That data will provide me another Google URL from which I can retrieve a list of JWKs.
>>> import requests, json
>>> open_id_config_url = 'https://ggp.sandbox.google.com/.well-known/openid-configuration'
>>> response = requests.get(open_id_config_url)
>>> r.status_code
200
>>> response.text
u'{\n "issuer": "https://www.stadia.com",\n "jwks_uri": "https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com",\n "claims_supported": [\n "iss",\n "aud",\n "sub",\n "iat",\n "exp",\n "s_env",\n "s_app_id",\n "s_gamer_tag",\n "s_purchase_country",\n "s_current_country",\n "s_session_id",\n "s_instance_ip",\n "s_restrict_text_chat",\n "s_restrict_voice_chat",\n "s_restrict_multiplayer",\n "s_restrict_stream_connect",\n ],\n "id_token_signing_alg_values_supported": [\n "RS256"\n ],\n}'
Above I have successfully retrieved the data from the first URL. I can see the entry jwks_uri contains the second URL I need. But when I try to convert that blob of text to a python dictionary, it fails.
>>> response.json()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/saqib.ali/saqib-env-99/lib/python2.7/site-packages/requests/models.py", line 889, in json
self.content.decode(encoding), **kwargs
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
>>> json.loads(response.text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The only way I can get out the JWKs URL is by doing this ugly regular expression parsing:
>>> re.compile('(?<="jwks_uri": ")[^"]+').findall(response.text)[0]
u'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Is there a cleaner, more Pythonic way to extract this string?
I really wish Google would send back a string that could be cleanly JSON-ified.
The returned json string is incorrect because last item of the dictionary ends with ,, which json cannot parse.
": [\n "RS256"\n ],\n}'
^^^
But ast.literal_eval can do that (as python parsing accepts lists/dicts that end with a comma). As long as you don't have booleans or null values, it is possible and pythonic
>>> ast.literal_eval(response.text)["jwks_uri"]
'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Your JSON is invalid because it has an extra comma after the last value in the claims_supported array.
I wouldn't necessarily recommend it, but you could use the similarity of JSON and Python syntax to parse this directly, since Python is much less picky:
ast.literal_eval(response.tezt)
As suggested in this answer use yaml to parse json. It will tolerate the trailing comma as well as other deviations from the json standard.
import yaml
d = yaml.load(response.text)

What am I doing wrong with requests in python: ValueError: Expecting value: line 1 column 1 (char 0)?

I'm not sure even how to ask the question, as it seems it would require quite a lot of code to get into the details. Rather than show the code, I will discuss the behavior when I run.
I am using requests to grab information from an online database. When I run a for loop to go through all of my entries, I get an error like the one below on one of the first 20 entries (usually the first, but not always). The entries in the list are all alike (just different ID numbers). I am using sleep() to ensure that I do not go beyond my rate limit (I have tried increasing sleep to ridiculous wait times, but still get the error). What really surprises me is that it works some, and then gets stuck.... what could cause that?
Also, the code was working before, then I made a large number of edits to other code in the same file, but I didn't think I edited anything related to this.
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 233, in <module>
main()
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 212, in main
match_histories=get_match_histories(challenger_Ids+master_Ids)
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 62, in get_match_histories
match_histories[summoner_Ids[i]]=api.get_match_history_data(summoner_Ids[i])
File "C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py", line 52, in get_match_history_data
return self._request(api_url)
File "C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py", line 25, in _request
return response.json()
File "C:\Users\Mark\Anaconda3\lib\site-packages\requests\models.py", line 819, in json
return json.loads(self.text, **kwargs)
File "C:\Users\Mark\Anaconda3\lib\json\__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "C:\Users\Mark\Anaconda3\lib\json\decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Mark\Anaconda3\lib\json\decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
Here are lines 10-25 of RiotAPI
def _request(self, api_url, params={}):
args = {'api_key':self.api_key}
for key, value in params.items():
if key not in args:
args[key] = value
#requests.get accesses the URL
response = requests.get(
Consts.URL['base'].format(
proxy=self.region,
region=self.region,
url=api_url
),
params=args
)
print(response.url)
return response.json()
Here is the response:
{"matches":[{"matchId":1878534497,"region":"NA","platformId":"NA1","matchMode":"CLASSIC","matchType":"MATCHED_GAME","matchCreation":1436223958539,"matchDuration":2097,"queueType":"RANKED_SOLO_5x5","mapId":11,"season":"SEASON2015","matchVersion":"5.12.0.348","participants":[{"teamId":200,"spell1Id":4,"spell2Id":7,"championId":15,"highestAchievedSeasonTier":"UNRANKED","timeline":{"creepsPerMinDeltas":{"zeroToTen":5.699999999999999,"tenToTwenty":6.9,"twentyToThirty":7.1},"xpPerMinDeltas":{"zeroToTen":358.5,"tenToTwenty":350.0,"twentyToThirty":364.20000000000005},"goldPerMinDeltas":{"zeroToTen":365.3,"tenToTwenty":337.5,"twentyToThirty":287.5},"csDiffPerMinDeltas":{"zeroToTen":-0.7,"tenToTwenty":-1.7000000000000004,"twentyToThirty":1.0999999999999999},"xpDiffPerMinDeltas":{"zeroToTen":-0.9000000000000057,"tenToTwenty":-114.75,"twentyToThirty":-121.19999999999999},"damageTakenPerMinDeltas":{"zeroToTen":480.5,"tenToTwenty":565.3,"twentyToThirty":1258.6},"damageTakenDiffPerMinDeltas":{"zeroToTen":-147.49999999999994,"tenToTwenty":-134.69999999999996,"twentyToThirty":15.0},"role":"DUO_CARRY","lane":"BOTTOM"},"masteries":[{"masteryId":4112,"rank":4},{"masteryId":4114,"rank":1},{"masteryId":4122,"rank":3},{"masteryId":4124,"rank":1},{"masteryId":4132,"rank":1},{"masteryId":4134,"rank":3},{"masteryId":4142,"rank":2},{"masteryId":4144,"rank":1},{"masteryId":4151,"rank":1},{"masteryId":4152,"rank":3},{"masteryId":4162,"rank":1},{"masteryId":4211,"rank":2},{"masteryId":4212,"rank":2},{"masteryId":4221,"rank":1},{"masteryId":4222,"rank":3},{"masteryId":4232,"rank":1}],"stats":{"winner":false,"champLevel":14,"item0":3031,"item1":0,"item2":3142,"item3":3035,"item4":1053,"item5":3250,"item6":3342,"kills":4,"doubleKills":1,"tripleKills":0,"quadraKills":0,"pentaKills":0,"unrealKills":0,"largestKillingSpree":3,"deaths":12,"assists":5,"totalDamageDealt":184710,"totalDamageDealtToChampions":27477,"totalDamageTaken":30740,"largestCriticalStrike":684,"totalHeal":2952,"minionsKilled":237,"neutralMinionsKilled":1,"neutralMinionsKilledTeamJungle":1,"neutralMinionsKilledEnemyJungle":0,"goldEarned":12074,"goldSpent":12065,"combatPlayerScore":0.....etc.}}]}]}
It'd be helpful if you can include a snippet of your code to see how you've actually used it.
However, from the exception, I think you're trying to do a
return response.json()
at line 25 of C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py
but the response is not in JSON format.
You can see the raw response with
print response.text
to see the string version of the response and check that the string is in JSON format.
It would be nice if you could post the actual code. But if you are unable to do so due to confidential reasons we can surmise some information from the stack trace.
You are using some HTTP protocol (SOAP/ResT) API to get a number( or series of numbers) in JSON format. One of these ID numbers has a character that is not expected or the JSON is invalid itself. Try and print the JSON request you receive before you pass it to see which one fails. Then create a unit test and run it try to analyze it with breakpoints.
Could be some sort of hyphenated or foreign character based on the database.

Unexpected behaviour exhibited by Python's JSON module

I am doing a simple wrapper program around a software tool, and this wrapper will, among other things, parse a JSON configuration file that will be coded by the user as a way for the user to specify certain configutations to the program.
Within the program, I have the following lines of code:
with open(os.path.join(args.config_bin_dir, 'basic_config.json'), 'r') as jsondata :
print "JSON File contents:", jsondata.read()
basic_config.update(json.load(jsondata))
print "The basic_config after updating with JSON:", basic_config
jsondata.close()
The basic_config object is a dictionary object that gets constructed somewhere on top earlier up in the program, and over here I want to add the new dictionary key-value pairs obtained from parsing the user's JSON configuration file into the basic_config dictionary object.
When the program is run, I get the following error, a section of which is as follows:
File "path/to/python/script.py", line 42, in main
basic_config.update(json.load(jsondata))
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
I'm quite puzzled by this error, but I noticed that it goes away so long as I remove the line print "JSON File contents:", jsondata.read() and run the code as either:
with open(os.path.join(args.config_bin_dir, 'basic_config.json'), 'r') as jsondata :
# print "JSON File contents:", jsondata.read()
basic_config.update(json.load(jsondata))
print "The basic_config after updating with JSON:", basic_config
jsondata.close()
or
with open(os.path.join(args.config_bin_dir, 'basic_config.json'), 'r') as jsondata :
# print "JSON File contents:", jsondata.read()
basic_config.update(json.load(jsondata))
# print "The basic_config after updating with JSON:", basic_config
jsondata.close()
I'll still get the same error if I comment out the line after the json.load method:
with open(os.path.join(args.config_bin_dir, 'basic_config.json'), 'r') as jsondata :
print "JSON File contents:", jsondata.read()
basic_config.update(json.load(jsondata))
# print "The basic_config after updating with JSON:", basic_config
jsondata.close()
So I'm guessing that it has to do with calling the read method on the JSON file object before parsing that same file. Could anyone please explain to me why this is so, and whether I am mistaken about the cause of the error?
Thank you very much!
When you call read() on a file handle, its current read position is advanced to the end of the file. This mirrors the behavior of the underlying file descriptors. You can reset to the start of a file via jsondata.seek(0, 0).

No JSON object could be decoded. Threading Python

I've been trying to make multiple HTTP request using threading.
First, I have a method that calls to an API, and that method takes a dict as an argument and returns a JSON object. Additionally, it works perfectly fine when I run it without threading. However, here is the code and error I get when trying to use threading.
import sys
sys.path.append(r'C:/dict/path')
import apiModule
import threading
token = 'xxxx'
apiModule = apiModule.Module(token)
urls = [{'url': 'http://www.example.com'}, {'url': 'http://www.example.com/2'}]
data = []
for element in urls:
thread = threading.Thread(target=apiModule.method(),kwargs=element)
thread.start()
data.append(thread)
Traceback (most recent call last):
File "<pyshell#72>", line 2, in <module>
thread = threading.Thread(target=apiModule.method(),kwargs=element)
File "C:/dict/path", line 54, in method
return json.loads(r.content)
File "C:\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Any help would be appreciated!
To expand on #falsetru's reply: According to the threading documentation:
target is the callable object to be invoked by the run() method
In your example you pass target=apiModule.method(), immediately invoking the method "method" on apiModule. This raises the error you're seeing. Currently the code is not even reaching the thread.start() call.

json.loads() failing

So I'm working on a django project which uses a celery task queue to make HTTP requests.
In my celery task code I have:
json.loads('{"content-type": "application/json"}')
print test.headers
json.loads(test.headers)
Which results in:
[2012-07-19 17:02:38,536: WARNING/PoolWorker-4] '{"content-type": "application/json"}'
[2012-07-19 17:02:38,569: ERROR/MainProcess] Task core.tasks.test_run[f304bcdd-72b3-4dd5-9abb-927dc29e7f65] raised exception: ValueError('No JSON object could be decoded',)
Traceback (most recent call last):
File "/usr/local/bin/lib/python2.7/site-packages/celery/task/trace.py", line 212, in trace_task
R = retval = fun(*args, **kwargs)
File "/opt/ironman_deploy/Ironman/core/tasks.py", line 18, in test_run
json.loads(test.headers)
File "/usr/local/bin/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/usr/local/bin/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/bin/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
No JSON object could be decoded: No JSON object could be decoded
I have literally no idea what's going on... clearly json can decode the string, because it doesn't fail 2 lines above, however when I pass the string in by reference it seems to choke.
Could anyone shed light on this for me?
test.headers could be a dict. If you print it, it will output something looking like JSON, yet test.headers might not be JSON at all, and decoding it would cause JSON to choke.
Whats is "test.headers" your snippet doesn't indicate this. If test.headers is being assigned to the result of the first json.loads call, then the second one will obviosuly fail since you aren't providing it a string. The second call should be json.dumps(test.headers)

Categories

Resources