JSON Decode Error when requesting JSON response from Google - python

I'm trying to get my head around the requests python package
import requests
url = "https://www.google.com/search?q=london"
response = requests.get(url, headers={"Accept": "application/json"})
data = response.json()
And I'm receiving the following error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
However, this code does work with some other websites. Is there a reason it would fail on specific websites, and is there a way around it? For example, what if I wanted the search results when searching for London on Google?
Thanks

response.json() does not convert an arbitrary server response into JSON; it only parses a response body that is already JSON text. So if the server returns a body that is not valid JSON, it will throw a decode error.
Some servers do return JSON, in which case your code will work. https://www.google.com/search?q=london, however, returns HTML (as you would expect, since it's a web page).
You can test this by printing the response:
print(response.text)
which outputs:
# some very long output that ends with:
...();})();google.drty&&google.drty();</script></body></html>
Notice the </html> tag at the end? The body is HTML, so it cannot be parsed as JSON.
So how do you parse it into usable HTML? You can use Beautiful Soup:
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?q=london"
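# The page is HTML, so no JSON Accept header is needed
response = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(response.text, "html.parser")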
print(soup.prettify())
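If you want code that works for both kinds of server, one option is to look at the Content-Type before decoding. This is only a minimal sketch: parse_json_body is a hypothetical helper (not part of requests), and it assumes the server labels JSON bodies with a Content-Type that contains "json".

```python
import json


def parse_json_body(content_type, text):
    """Return the parsed JSON body, or None when the body is not JSON.

    Hypothetical helper: takes the Content-Type header value and the
    response text as plain strings.
    """
    # Assumption: JSON responses advertise themselves in the Content-Type.
    if "json" not in (content_type or "").lower():
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Labelled as JSON, but the body is empty or malformed.
        return None


# With requests this would be called as (not run here):
#   parse_json_body(response.headers.get("Content-Type"), response.text)
print(parse_json_body("application/json; charset=utf-8", '{"city": "london"}'))
print(parse_json_body("text/html; charset=UTF-8", "<html></html>"))
```

A None result then tells you to fall back to HTML parsing instead of calling response.json().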

Related

How to get the response status code from a GET request?

Hi, I am very new to Python programming. I'm trying to write a Python script that gets a status code using a GET request. I can do it for a single URL, but how do I do it for multiple URLs in a single script?
Here is the basic code I have written, which gets the response code from a URL.
import requests
import json
import jsonpath
#API URL
url = "https://reqres.in/api/users?page=2"
#Send Get Request
response = requests.get(url)
if response:
    print('Response OK')
else:
    print('Response Failed')
# Display Response Content
print(response.content)
print(response.headers)
#Parse response to json format
json_response = json.loads(response.text)
print(json_response)
#Fetch value using Json Path
pages = jsonpath.jsonpath(json_response,'total_pages')
print(pages[0])
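Try this code.
import requests
with open("list_urls.txt") as f:
    for url in f:
        url = url.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        response = requests.get(url)
        print("The url is", url, "and status code is", response.status_code)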
I hope this helps.
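Iterating over a file yields each line with its trailing newline, which can end up in the request URL. A minimal sketch of cleaning the lines first; clean_urls is a hypothetical helper, and the sample list stands in for the contents of list_urls.txt:

```python
def clean_urls(lines):
    """Strip surrounding whitespace and drop blank lines from URL lines."""
    return [line.strip() for line in lines if line.strip()]


# Stand-in for the lines read from list_urls.txt (assumed content).
sample_lines = [
    "https://reqres.in/api/users?page=1\n",
    "\n",
    "https://reqres.in/api/users?page=2\n",
]

for url in clean_urls(sample_lines):
    # Each cleaned URL could now be passed to requests.get(url).
    print(repr(url))
```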
You can access the status code with response.status_code.
You can put your code in a function like this:
def treat_url(url):
    response = requests.get(url)
    if response:
        print('Response OK')
    else:
        print('Response Failed')
    # Display Response Content
    print(response.content)
    print(response.headers)
    #Parse response to json format
    json_response = json.loads(response.text)
    print(json_response)
    #Fetch value using Json Path
    pages = jsonpath.jsonpath(json_response,'total_pages')
    print(pages[0])
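And have a list of URLs and iterate through it:
# Every URL needs a scheme (http:// or https://), otherwise requests raises MissingSchema
url_list=["https://www.google.com","https://reqres.in/api/users?page=2"]
for url in url_list:
    treat_url(url)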
A couple of suggestions: the question itself is not very clear, so a clearer articulation would be useful for all the contributors here :) ...
Now, coming to what I was able to comprehend, there are a few modifications you can make:
response = requests.get(url): you will always get a response object. I think you want to check the status code here, which you can do with response.status_code, and based on its value decide whether or not the request succeeded.
Regarding looping, you can read the last page number from the response JSON as response_json['total_pages'] (the field the question's code reads), run a for loop over range(2, total_pages + 1), and append the page number to the URI to fetch each page's response.
You can fetch the JSON directly from the response object with response.json().
Please refer to the requests documentation.
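The paging idea can be sketched as below. This is only an illustration: page_urls is a hypothetical helper, the base URL is taken from the question, and the page count of 3 is a made-up value standing in for whatever the first response reports.

```python
def page_urls(base_url, last_page, first_page=1):
    """Build one URL per page number, from first_page to last_page inclusive."""
    return [f"{base_url}?page={n}" for n in range(first_page, last_page + 1)]


# Assumed values: in a real script last_page would come from the first
# response, e.g. response.json()["total_pages"].
for url in page_urls("https://reqres.in/api/users", last_page=3, first_page=2):
    # Each URL could be fetched with requests.get(url) and checked via
    # response.status_code.
    print(url)
```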

Request returns hidden characters

I am using requests.get to read a JSON object. The string downloaded is just a URL to download. I try to feed it in using requests.get(), but I get a 404 error. However, when I hardcode the value and run a requests.get(), I get a 200 response. Here is the pseudocode:
response = requests.get(repository, headers=headers, data=data)
pod_map = json.loads(response.text)['locationMap']
for key in pod_map.keys():
    url = pod_map[key]  # url should be something like http://mylink.com
    response = requests.get(url)
    print(response.status_code)
The problem is that when I run the code like this, I get a 404. However, when I just copy/paste the URL into a variable, I get a 200. Is there something I am missing with regard to encoding/decoding the JSON?
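One way to check whether the downloaded string carries hidden characters is to print its repr() and compare it with the hardcoded value. A minimal sketch, assuming the URL arrives with stray whitespace or control characters; clean_url is a hypothetical helper and the sample value is made up:

```python
def clean_url(raw):
    """Drop surrounding whitespace and any non-printable characters."""
    return "".join(ch for ch in raw.strip() if ch.isprintable())


# Made-up example of a URL string that looks fine when printed normally.
raw = "http://mylink.com\r\n"

print(repr(raw))             # repr() makes hidden characters visible
print(repr(clean_url(raw)))  # the cleaned value to pass to requests.get()
```

If the two repr() outputs differ, the extra characters are a likely reason the downloaded URL behaves differently from the pasted one.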

Python request resulting in blank response

I'm relatively new to Python and would like some help. I've created a script which simply uses the requests library and basic auth to connect to an API and return the XML or JSON result.
# Imports
import requests
from requests.auth import HTTPBasicAuth
# Set variables
url = "api"
apiuser = 'test'
apipass = 'testpass'
# CALL API
r = requests.get(url, auth=HTTPBasicAuth(apiuser, apipass))
# Print Statuscode
print(r.status_code)
# Print XML
xmlString = str(r.text)
print(xmlString)
But it returns a blank string.
If I use a browser to call the API and enter the credentials, I get the following response:
<Response>
  <status>SUCCESS</status>
  <callId>99999903219032190321</callId>
  <result xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Dummy">
    <authorFullName>jack jones</authorFullName>
    <authorOrderNumber>1</authorOrderNumber>
  </result>
</Response>
Can anyone tell me where I'm going wrong?
What API are you connecting to?
Try adding a user-agent to the header:
r = requests.get(url, auth=HTTPBasicAuth(apiuser, apipass), headers={'User-Agent':'test'})
Although this is not an exact answer for the OP, it may solve the issue for someone having a blank response from python-requests.
I was getting a blank response because of the wrong content type. I was expecting HTML rather than JSON or a login success message. The correct Content-Type for me was application/x-www-form-urlencoded.
Essentially I had to do the following to make my script work.
data = 'arcDate=2021/01/05'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
}
r = requests.post('https://www.deccanherald.com/getarchive', data=data, headers=headers)
print(r.status_code)
print(r.text)
Learn more about this in application/x-www-form-urlencoded or multipart/form-data?
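For reference, a form-urlencoded body is just key=value pairs with special characters percent-encoded. A minimal sketch using the standard library, with the field taken from the script above; requests does this encoding itself (and sets the Content-Type header) when data is passed as a dict rather than a string:

```python
from urllib.parse import urlencode

# Form fields as a dict instead of a hand-written string.
form = {"arcDate": "2021/01/05"}

# urlencode builds the application/x-www-form-urlencoded body.
body = urlencode(form)
print(body)  # arcDate=2021%2F01%2F05
```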
Run this and see what responses you get.
import requests
url = "https://google.com"
r = requests.get(url)
print(r.status_code)
print(r.json)
print(r.text)
When you start having to pass things in your GET, PUT, DELETE, or POST requests, you add them to the request call.
url = "https://google.com"
headers = {'api key': 'blah92382377432432'}
r = requests.get(url, headers=headers)
Then you should see the same type of responses. Long story short:
print(r.text) to see the response; once you see the format of the response you get, you can process it however you want.
I have an empty response only when the authentication failed or is denied.
The HTTP status is still ≤ 400.
However, in the header you can find :
'X-Seraph-LoginReason': 'AUTHENTICATED_FAILED'
or
'X-Seraph-LoginReason': 'AUTHENTICATED_DENIED'
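A minimal sketch of checking for this in a script. seraph_login_problem is a hypothetical helper, and the two values are simply the ones quoted above; with requests you would pass r.headers, which supports the same .get() call:

```python
# Header values quoted above; other values are treated as "no problem".
FAILURE_REASONS = {"AUTHENTICATED_FAILED", "AUTHENTICATED_DENIED"}


def seraph_login_problem(headers):
    """Return the login failure reason from the headers, or None."""
    reason = headers.get("X-Seraph-LoginReason", "")
    return reason if reason in FAILURE_REASONS else None


# Made-up header dicts standing in for r.headers.
print(seraph_login_problem({"X-Seraph-LoginReason": "AUTHENTICATED_FAILED"}))
print(seraph_login_problem({"Content-Type": "text/xml"}))
```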
If the response is empty, with not even a status code, I would suggest waiting some time before printing. Maybe the server is taking time to return the response to you.
import time
time.sleep(5)
Not the nicest thing, but it's worth trying
How can I make a time delay in Python?
I guess there are no errors during execution
EDIT: nvm, you mentioned that you got a status code; I thought you were literally getting nothing.
On the side, if you are using Python 3 you have to use print(), which replaced the print statement.

Urllib request throws a decode error when parsing from url

I'm trying to parse the JSON-formatted data from this URL: http://ws-old.parlament.ch/sessions?format=json. My browser copes nicely with the JSON data, but my request always throws the following error:
JSONDecodeError: Expecting value: line 3 column 1 (char 4)
I'm using Python 3.5. And this is my code:
import json
import urllib.request
connection = urllib.request.urlopen('http://ws-old.parlament.ch/affairs/20080062?format=json')
js = connection.read()
info = json.loads(js.decode("utf-8"))
print(info)
The site uses User-Agent filtering to serve JSON only to known browsers. Luckily it is easily fooled; just set the User-Agent header to Mozilla:
request = urllib.request.Request(
    'http://ws-old.parlament.ch/affairs/20080062?format=json',
    headers={'User-Agent': 'Mozilla'})
connection = urllib.request.urlopen(request)
js = connection.read()
info = json.loads(js.decode("utf-8"))
print(info)

Python - POST request response and JSON parsing

I'm using Python 2.7.7 to send a POST request to a website. I'm using the requests module and my code looks like this (NAME and PASS are substituted):
r = requests.post("http://play.pokemonshowdown.com/action.php", data="act=login&name=NAME&pass=PASS&challengekeyid="+challstrarr[2]+"&challenge="+challstrarr[3])
print(r.text)
print(r.json())
r.text returns just a blank line, and r.json() raises this error: "ValueError: No JSON object could be decoded"
The website I'm requesting has the following tutorial:
"you'll need to make an HTTP POST request to http://play.pokemonshowdown.com/action.php with the data act=login&name=USERNAME&pass=PASSWORD&challengekeyid=KEYID&challenge=CHALLENGE
Either way, the response will start with ] and be followed by a JSON object which we'll call data."
I'm not sure if the POST request's response is faulty (hence the blank line), or if it's fine and the JSON parsing is off.
You should pass a dictionary to the post function (the data argument) so that requests form-encodes it for you; only in the get method should you pass a query string:
postData = {
    # put your post data here
}
r = requests.post("http://play.pokemonshowdown.com/action.php", data=postData)
print(r.text)
print(r.json())
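Note that the tutorial quoted in the question says the response starts with ] before the JSON object, so r.json() would still fail on a non-empty body. A minimal sketch of handling that prefix; parse_prefixed_json is a hypothetical helper and the sample payload is made up:

```python
import json


def parse_prefixed_json(text):
    """Parse a body of the form ']{...}'; return None for a blank body."""
    text = text.strip()
    if not text:
        # A blank body carries nothing to decode.
        return None
    if text.startswith("]"):
        # Drop the leading ']' described in the tutorial.
        text = text[1:]
    return json.loads(text)


# Made-up payload; with requests this would be parse_prefixed_json(r.text).
print(parse_prefixed_json(']{"loggedin": true}'))
print(parse_prefixed_json("\n"))
```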
