Can't extract JSON from an http request - python

I'm having trouble getting data out of an HTTP response. The body unfortunately comes back with '\n' attached to all the key/value pairs, and json.loads complains that its input must be a str and not "bytes".
I have tried a number of fixes, so my list of imports might look odd or redundant. Any suggestions would be appreciated.
#!/usr/bin/env python3
import urllib.request
from urllib.request import urlopen
import json
import requests
url = "http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL"
response = urlopen(url)
content = response.read()
print(content)
data = json.loads(content)
info = data[0]
print(info)
#got this far - planning to extract "id:" "22144"

When it comes to making requests in Python, I personally like to use the requests library. I find it easier to use.
import json
import requests
r = requests.get('http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL')
json_obj = json.loads(r.text[4:])
print(json_obj[0].get('id'))
The above solution prints: 22144
The response data has a few unnecessary characters at the head, which is why I load only the relevant (JSON) portion of the response: r.text[4:]. Those leading characters are why you couldn't load it as JSON initially.

A bytes object has a decode() method that converts bytes to a string. Checking the response in the browser, there seem to be some extra characters at the beginning of the string that need to be removed (a line feed character followed by two slashes: '\n//'). To skip the first three characters of the string returned by decode(), add [3:] after the method call.
data = json.loads(content.decode()[3:])
print(data[0]['id'])
The output is exactly what you expect:
22144
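Hard-coding the slice length breaks if the prefix ever changes. As a rough sketch (not part of either answer above), the helper below decodes the bytes and then starts parsing at the first '[' or '{'. The name parse_prefixed_json and the sample bytes are made up for illustration; the sample only imitates the shape of the response described above.

```python
import json


def parse_prefixed_json(raw, encoding="utf-8"):
    """Decode bytes if needed, skip any junk before the JSON, then parse it."""
    text = raw.decode(encoding) if isinstance(raw, (bytes, bytearray)) else raw
    # Position of the first array or object opener, whichever comes first.
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON array or object found in the response")
    return json.loads(text[min(starts):])


# Made-up sample that imitates the '\n// ' prefix described above.
sample = b'\n// [\n{"id": "22144", "t": "AAPL"}\n]\n'
data = parse_prefixed_json(sample)
print(data[0]["id"])  # 22144
```

This assumes the junk prefix itself never contains '[' or '{'; if it can, slicing by a known prefix is safer.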

JSON says it must be a str and not "bytes".
Your content is bytes, so decode it to str before passing it to json.loads:
data = json.loads(content.decode())

Related

JSONDecodeError: Expecting value: line 1 column 1 (char 0) / While json parameter include

I'm trying to retrieve data from https://clinicaltrials.gov/, and although I've specified the format as JSON in the request parameter:
fmt=json
the returned value is text by default.
As a consequence, I'm not able to retrieve the response with json().
Good:
import requests
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
response.text
Not Good:
import requests
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
response.json()
Any idea how to turn this text into JSON?
I've tried response.text, which works, but I want to retrieve the data with json().
You can use the following code snippet:
import requests, json
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
jsonResponse = json.loads(response.content)
You should use the json package (it is built into Python, so you don't need to install anything), which converts the text into a Python object (a dictionary) using the json.loads() function.
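"Expecting value: line 1 column 1 (char 0)" generally means the body does not start with JSON at all (for example, it is empty or an HTML page), so it helps to look at what actually came back. A minimal sketch, assuming you pass it response.text; the helper name loads_or_explain and both sample strings are made up for illustration:

```python
import json


def loads_or_explain(text):
    """Parse JSON text, or raise an error that shows how the body starts."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        preview = text[:60]
        raise ValueError(
            f"body is not JSON ({exc.msg} at char {exc.pos}); "
            f"it starts with {preview!r}"
        ) from exc


# Made-up bodies: one valid JSON document, one HTML page.
print(loads_or_explain('{"ok": true}'))
try:
    loads_or_explain("<html>Not found</html>")
except ValueError as err:
    print(err)
```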

python unable to parse JSON Data

I am unable to parse JSON data using Python.
A webpage URL is returning JSON data:
import requests
import json
BASE_URL = "https://www.codechef.com/api/ratings/all"
data = {'page': page, 'sortBy':'global_rank', 'order':'asc', 'itemsPerPage':'40' }
r = requests.get(BASE_URL, data = data)
receivedData = (r.text)
print ((receivedData))
When I printed this, I got a large block of text, and when I validated it using https://jsonlint.com/ it showed VALID JSON.
Later I used
import requests
import json
BASE_URL = "https://www.codechef.com/api/ratings/all"
data = {'page': page, 'sortBy':'global_rank', 'order':'asc', 'itemsPerPage':'40' }
r = requests.get(BASE_URL, data = data)
receivedData = (r.text)
print (json.loads(receivedData))
When I validated the large printed text using https://jsonlint.com/ it showed INVALID JSON
Even if I don't print and directly use the data. It is working properly. So I am sure even internally it is not loading correctly.
Is Python unable to parse the text to JSON properly?
In short, json.loads converts a JSON value (object, array, and so on) into a Python object, in this case a dictionary. When you print that, Python prints the dictionary's own representation, which uses single quotes, so the printed text is no longer valid JSON.
Effectively your code can be expanded:
some_dictionary = json.loads(a_string_which_is_a_json_object)
print(some_dictionary)
To make sure that what you print is valid JSON, you need to re-encode it with json.dumps.
When you use python's json.loads(text) it returns a python dictionary. When you print that dictionary out it is not in json format.
If you want a json output you should use json.dumps(json_object).
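The difference is easy to see on a small example. The JSON text below is made up for illustration:

```python
import json

text = '{"user": "alice", "rank": 1, "active": true}'
obj = json.loads(text)  # JSON text -> Python dict

# Printing the dict shows Python syntax: single quotes and True.
print(obj)              # {'user': 'alice', 'rank': 1, 'active': True}

# json.dumps turns the dict back into valid JSON text.
print(json.dumps(obj))  # {"user": "alice", "rank": 1, "active": true}
```

Only the second line of output will pass a JSON validator.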

Why is it giving me the error, "the JSON object must be str, not 'bytes'", and how do I fix it?

I was following a tutorial about how to use JSON objects (link: https://www.youtube.com/watch?v=Y5dU2aGHTZg). When they ran the code, they got no errors, but I did. Is it something to do with different Python versions or something?
from urllib.request import urlopen
import json
def printResults(data):
    theJSON = json.loads(data)
    print(theJSON)
def main():
    urlData = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"
    webUrl = urlopen(urlData)
    print(webUrl.getcode())
    if webUrl.getcode() == 200:
        data = webUrl.read()
        printResults(data)
    else:
        print("You failed")
main()
The HTTPResponse object returned from urlopen reads bytes data (raw binary data), not str data (textual data), while the json module works with str. You need to know (or inspect the headers to determine) the encoding used for the data received, and decode it appropriately before using json.loads.
Assuming it's UTF-8 (most websites are), you can just change:
data = webUrl.read()
to:
data = webUrl.read().decode('utf-8')
and it should fix your problem.
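If you would rather not assume UTF-8, the charset can be read from the Content-Type header. The sketch below works on a header value and a bytes body so that it runs without a network connection; the helper names and the sample data are made up for illustration. With urlopen, webUrl.headers.get_content_charset() should give the same answer directly.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def load_json_bytes(raw, content_type):
    """Decode raw bytes with the declared charset, then parse them as JSON."""
    return json.loads(raw.decode(charset_from_content_type(content_type)))


# Made-up body and header, standing in for webUrl.read() and
# webUrl.headers["Content-Type"].
raw = '{"place": "M\u00e9rida", "mag": 2.7}'.encode("iso-8859-1")
quake = load_json_bytes(raw, "application/json; charset=ISO-8859-1")
print(quake["mag"])
```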
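I think they were using a different version of Python rather than a different urllib.
In Python 2, urlopen is imported directly from urllib, and read() returns a str that json.loads accepts as-is:
from urllib import urlopen
That import fails in Python 3, so keep from urllib.request import urlopen and decode the bytes before parsing. (From Python 3.6 onward, json.loads also accepts bytes directly.)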

Simple POST using urllib with Python 3.3

I'm participating in a hash-breaking contest and I'm trying to automate posting strings to an HTML form and getting the hash score back. So far I've managed to get SOMETHING posted to the URL, but it's not the exact string I'm expecting, and thus the value returned for the hash is way off from the one obtained by just typing in the string manually.
import urllib.parse, urllib.request
url = "http://almamater.xkcd.com/?edu=una.edu"
data = "test".encode("ascii")
header = {"Content-Type":"application/octet-stream"}
req = urllib.request.Request(url, data, header)
f = urllib.request.urlopen(req)
print(f.read())
#parse f to pull out hash
I obtain the following hash from the site:
0fff9563bb3279289227ac77d319b6fff8d7e9f09da1247b72a0a265cd6d2a62645ad547ed8193db48cff847c06494a03f55666d3b47eb4c20456c9373c86297d630d5578ebd34cb40991578f9f52b18003efa35d3da6553ff35db91b81ab890bec1b189b7f52cb2a783ebb7d823d725b0b4a71f6824e88f68f982eefc6d19c6
This differs considerably from what I expected, which is what you get if you type in "test" (no quotes) into the form:
e21091dbb0d61bc93db4d1f278a04fe1a51165fb7262c7da31f886ae09ff3e04c41483c500db2792c59742958d8f7f39fe4f4f2cdc7940b7b25e3289b89d344e06f76305b9de525933b5df5dae2a37388f82cf76374fe363587acfb49b9d2c8fc131ef4a32c762be083b07330989b298d60e312f56a6b8a4c0f53c9b59864fb7
Obviously the code isn't doing what I'm expecting it to do. Any tips?
When you submit your form data, it also includes the field name, so when you submit "test" the data submitted actually looks like "hashable=test". Try changing your data like this:
data = "hashable=test".encode("ascii")
or alternatively:
data = urllib.parse.urlencode({'hashable': 'test'}).encode('ascii')
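Put together, building the request body might look like the sketch below. It only constructs the request and does not send it. The field name hashable comes from the answer above; leaving out the Content-Type header is my assumption, on the basis that urllib adds application/x-www-form-urlencoded for POST data when none is given, which is what a form submission uses.

```python
import urllib.parse
import urllib.request

url = "http://almamater.xkcd.com/?edu=una.edu"
fields = {"hashable": "test"}

# urlencode returns a str; Request needs bytes for the POST body.
body = urllib.parse.urlencode(fields).encode("ascii")
print(body)  # b'hashable=test'

# Supplying data makes this a POST request.
req = urllib.request.Request(url, data=body)
print(req.get_method())  # POST

# To actually send it: f = urllib.request.urlopen(req); print(f.read())
```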

How do I fix a "JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)"?

I'm trying to get Twitter API search results for a given hashtag using Python, but I'm having trouble with this "No JSON object could be decoded" error. I had to add the extra % towards the end of the URL to prevent a string formatting error. Could this JSON error be related to the extra %, or is it caused by something else? Any suggestions would be much appreciated.
A snippet:
import simplejson
import urllib2
def search_twitter(quoted_search_term):
    url = "http://search.twitter.com/search.json?callback=twitterSearch&q=%%23%s" % quoted_search_term
    f = urllib2.urlopen(url)
    json = simplejson.load(f)
    return json
There were a couple of problems with your initial code. First, you never read in the content from Twitter; you just opened the URL. Second, the URL sets a callback (twitterSearch). A callback wraps the returned JSON in a function call, in this case twitterSearch(...), so the body is no longer plain JSON and the decoder fails at the very first character. A callback is useful if you want a special function to handle the returned results; otherwise, leave it out:
import simplejson
import urllib2
def search_twitter(quoted_search_term):
    url = "http://search.twitter.com/search.json?&q=%%23%s" % quoted_search_term
    f = urllib2.urlopen(url)
    content = f.read()
    json = simplejson.loads(content)
    return json
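If the callback parameter cannot be removed, the wrapper can be stripped before parsing. This is a rough sketch in Python 3 using the standard json module rather than simplejson and urllib2; the helper name strip_jsonp and the sample response are made up for illustration.

```python
import json
import re


def strip_jsonp(text):
    """Return the payload inside callback(...), or the text unchanged."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    return match.group(1) if match else text


# Made-up response that imitates a callback-wrapped body.
sample = 'twitterSearch({"results": [{"text": "hello"}]});'
payload = json.loads(strip_jsonp(sample))
print(payload["results"][0]["text"])  # hello
```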
