Urllib request throws a decode error when parsing from url

I'm trying to parse the JSON-formatted data from this URL: http://ws-old.parlament.ch/sessions?format=json. My browser displays the JSON data without problems, but my request from Python always throws the following error:
JSONDecodeError: Expecting value: line 3 column 1 (char 4)
I'm using Python 3.5, and this is my code:
import json
import urllib.request
connection = urllib.request.urlopen('http://ws-old.parlament.ch/affairs/20080062?format=json')
js = connection.read()
info = json.loads(js.decode("utf-8"))
print(info)

The site uses User-Agent filtering and only serves JSON to known browsers. Luckily the filter is easily fooled: just set the User-Agent header to Mozilla:
import json
import urllib.request
request = urllib.request.Request(
    'http://ws-old.parlament.ch/affairs/20080062?format=json',
    headers={'User-Agent': 'Mozilla'})
connection = urllib.request.urlopen(request)
js = connection.read()
info = json.loads(js.decode("utf-8"))
print(info)
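
When a server filters on User-Agent, the decode error is easier to diagnose if the failure shows what the server actually sent. The following is a minimal sketch, not part of the original answer: the helper names build_request, decode_json and fetch_json are hypothetical, and the fallback to UTF-8 when the response declares no charset is an assumption.

```python
import json
import urllib.request


def build_request(url, user_agent="Mozilla"):
    # Attach a browser-like User-Agent so the server does not filter the request.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def decode_json(body, charset="utf-8"):
    # Decode the raw bytes, then parse; on failure, show the start of the body
    # so an HTML error page is recognizable at a glance.
    text = body.decode(charset)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            "Response is not JSON; it starts with: %r" % text[:80]
        ) from exc


def fetch_json(url):
    # Performs the network call; falls back to UTF-8 if no charset is declared.
    with urllib.request.urlopen(build_request(url)) as connection:
        charset = connection.headers.get_content_charset() or "utf-8"
        return decode_json(connection.read(), charset)
```

Calling fetch_json('http://ws-old.parlament.ch/affairs/20080062?format=json') would then either return the parsed data or raise an error that quotes the first characters of the unexpected body.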

Related

Creating an HTTP request URL that can be pasted into a browser, from a server and a JSON request body

I have the following three parts: server, json_str, req
server = 'http://example.com:9013/run'
json_str = '[{"Leg":[{"currency":"INR","Type":"NA"}],"P":"xyz","code":"0100"}]'
import json as js
req = js.dumps({
"func": "rfqfunc",
"args": ["dummyQuote", json_str, 'True']
})
I usually get a response for this by calling
requests.post(server, data=req)
but using these parts, how do I build a URL that can be pasted into a browser's address bar, like the following (which is the desired outcome):
http://example.com:9013/run?func=rfqfunc&dummyQuote&[%20{%20"P":%20"xyz",%20"code":%20"0100",%20"Leg":%20[%20{%20"currency":%20"INR",%20"Type":%20"NA"%20}%20]}%20]
I have searched a lot on urlparse, urljoin and urlencode, but nothing has given me anything near the desired result.
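
One way to get a URL of the shape shown above is to percent-encode each part with urllib.parse.quote and join the parts with "&". This is only a sketch: build_pastable_url is a hypothetical helper, and whether the server at example.com accepts the function and arguments as a GET query string instead of a POST body is an assumption that the question does not confirm.

```python
from urllib.parse import quote, unquote


def build_pastable_url(server, func, args):
    # Percent-encode every part so quotes, brackets and spaces survive pasting.
    parts = ["func=" + quote(func, safe="")]
    parts += [quote(arg, safe="") for arg in args]
    return server + "?" + "&".join(parts)


server = 'http://example.com:9013/run'
json_str = '[{"Leg":[{"currency":"INR","Type":"NA"}],"P":"xyz","code":"0100"}]'
url = build_pastable_url(server, "rfqfunc", ["dummyQuote", json_str])
print(url)
```

The encoded JSON can be recovered on the receiving side with unquote, so nothing is lost in the round trip.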

Why am I getting an invalid checksum response when doing a POST request?

I used a REST client (ARC) to send a POST request to a private API and got the correct response, but when I switched to Python and sent the same request with the requests package, the response was this:
b'{"code":"Invalid Checksum","message":"Invalid Checksum"}'
I am using the same URL, headers and body. Where could I be going wrong?
Here is the code snippet:
import requests
import json
request_args = {"Id": -1,"startDate": "2018-01-13","endDate": "2018-01-14","ProviderId": 1}
headers = {'Authorization':'xxxxxx','Content-Type':'application/json','content-md5':'yyyy'}
base_url = "https://myendpoint"
response = requests.post(base_url,data=request_args, headers=headers)
print(response.content)
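
One thing worth checking in the snippet above: passing a dict through data= makes requests form-encode it, so the bytes on the wire are not the JSON text that a checksum was computed over. The sketch below serializes the body once and derives the header from those exact bytes. It assumes the API expects the standard Content-MD5 value (base64 of the raw MD5 digest); the question does not state the private API's checksum scheme, and build_signed_request is a hypothetical helper.

```python
import base64
import hashlib
import json


def build_signed_request(payload):
    # Serialize once so the checksum and the transmitted body are identical bytes.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    digest = hashlib.md5(body).digest()
    headers = {
        "Content-Type": "application/json",
        # Assumption: base64 of the raw MD5 digest, as in the standard header.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
    }
    return body, headers


body, headers = build_signed_request(
    {"Id": -1, "startDate": "2018-01-13", "endDate": "2018-01-14", "ProviderId": 1}
)
```

The request would then be sent as requests.post(base_url, data=body, headers=headers), after adding the Authorization header. Passing bytes through data= sends them unchanged.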

Python requests - using twitter search

I am trying to use requests to get data from Twitter, but when I run my code I get this error: simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is my code so far:
import requests
url = 'https://twitter.com/search?q=memes&src=typed_query'
results = requests.get(url)
better_results = results.json()
better_results['results'][1]['text'].encode('utf-8')
print(better_results)

This happens because you are making a request to a dynamic website.
When you request a dynamic website, the HTML must be rendered first in order to receive all the content you expect.
Just making the request is not enough.
Other libraries, such as requests_html, render the HTML and JavaScript in the background using a lightweight browser.
You can try this code:
# pip install requests_html
from requests_html import HTMLSession
url = 'https://twitter.com/search?q=memes&src=typed_query'
session = HTMLSession()
response = session.get(url)
# rendering part: runs the page's JavaScript in a headless browser
response.html.render(timeout=20)
# the rendered page is HTML, not JSON, so read it from response.html
# instead of calling response.json()
better_results = response.html.html
print(better_results)

JSON Decode Error when requesting JSON response from Google

I'm trying to get my head around the Python requests package:
import requests
url = "https://www.google.com/search?q=london"
response = requests.get(url, headers={"Accept": "application/json"})
data = response.json()
And I'm receiving the following error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
However, this code does work with some other websites. Is there a reason it would fail on specific websites, and is there a way around it? For example, what if I wanted the search results for "London" on Google?
Thanks

response.json() does not convert an arbitrary server response to JSON; it simply parses a response body that is already a JSON string. So if the server returns a string that is not JSON, it will throw a decode error.
Some servers do return JSON, in which case your code will work. https://www.google.com/search?q=london, however, returns HTML (as you would expect, since it's a web page).
You can test this by printing the response:
print(response.text)
which outputs:
# some very long output that ends with:
...();})();google.drty&&google.drty();</script></body></html>
Notice the </html> tag at the end? The response is HTML, so it cannot be parsed as JSON.
So how do you parse this into usable HTML? You can use Beautiful Soup:
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?q=london"
response = requests.get(url)
# name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
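
Since response.json() only works when the server really returns JSON, it can help to look at the Content-Type header before parsing. The following is a small sketch of that check; parse_if_json is a hypothetical helper, and treating only application/json and "+json" media types as JSON is an assumption.

```python
import json


def parse_if_json(content_type, text):
    # Strip parameters such as "; charset=UTF-8" and compare the media type only.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        return None  # not JSON: the caller should treat the body as HTML or text
    return json.loads(text)
```

With requests, the call would be parse_if_json(response.headers.get("Content-Type", ""), response.text). A result of None signals that the body should go to an HTML parser instead.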

HTTPS GET request with Python urllib2

I am trying to fetch data from Quandl using urllib2. Please check the code below.
import json
from pymongo import MongoClient
import urllib2
import requests
import ssl
#import quandl
codes = [100526];
for id in codes:
    url = 'https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30'
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    data = response.read()
    print data
OR
for id in codes:
    url = "https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
    request = requests.get(url,verify=False)
    print request
In the first case I get an HTTPError exception with status 404, and when I use the requests module I get an SSL error even with verify=False. I have looked through previous posts, but most of them deal with plain HTTP requests.
Thanks for help.
J

This works for me. You will get a warning about the SSL certificate, but you can ignore it.
import requests
codes = [100526];
for id in codes:
    url = "https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
    request = requests.get(url, verify=False)
    print request.text
request.text has your response data.

You seem to be using the wrong URL (.com.com instead of .com), and the first version of your code mixes single and double quotes, so the str(id) concatenation ends up inside the string literal. Use the following instead and it should work:
import urllib2
import requests
codes = [100526]
for id in codes:
    url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    print response.read()
for id in codes:
    url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
    response = requests.get(url,verify=False)
    print response.text
To disable the warning about the SSL certificate, use the following code before making the request using requests:
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
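
The quote-mixing problem mentioned above can be avoided altogether by building the URL with str.format and urllib.parse.urlencode instead of string concatenation. This is a Python 3 sketch: build_dataset_url is a hypothetical helper, the single slash before "api" is an assumption (the answer's URL uses a double slash), and the api_key parameter is optional because the answer's URL omits it.

```python
from urllib.parse import urlencode

# Assumption: same host and path as in the answer, with a single slash before "api".
BASE_URL = "https://www.quandl.com/api/v3/datasets/AMFI/"


def build_dataset_url(code, start_date, api_key=None):
    # Collect query parameters in a dict so urlencode handles the escaping.
    params = {"start_date": start_date}
    if api_key is not None:
        params["api_key"] = api_key
    return "{}{}.json?{}".format(BASE_URL, code, urlencode(params))


for code in [100526]:
    print(build_dataset_url(code, "2013-08-30"))
```

The resulting string can be passed to urllib.request.urlopen or requests.get in the same way as the hand-built URL.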
