Web spider: [ValueError] No JSON object could be decoded - python

Recently I have been writing a web spider for fun. I want to learn how to log in to a website that uses a verification code. One way I learnt is to use cookies, so I gave it a try, but I ran into a problem.
For example, I use requests.session() to get the URL www.lovetvshow.com.
I can get all the HTML text, but when I try to convert it to JSON, it fails. It always shows "[ValueError] No JSON object could be decoded". But I already have the text, so why is there no JSON object?
import requests
session = requests.session()
login_data = {'email': email, 'password': password}
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'Host': 'www.lovetvshow.com'
}
# r = session.post('http://www.renren.com/', data=login_data, headers=header)
r = session.get('http://www.lovetvshow.com/',headers=header)
print r
print r.json()
This will yield:
<Response [200]>
Traceback (most recent call last):
File "C:/Users/Hao/PycharmProjects/WebSpiderTutorial1/WebSpiderTutorial1.py", line 128, in <module>
requests_session, requests_cookies = create_session()
File "C:/Users/Hao/PycharmProjects/WebSpiderTutorial1/WebSpiderTutorial1.py", line 104, in create_session
print r.json()
File "C:\Python27\lib\site-packages\requests\models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "C:\Python27\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 383, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Any suggestions? Thanks in advance.

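You have to make sure there is JSON to be decoded. r.json() only works when the response body is JSON, and the page you are fetching is an HTML page (you said yourself that you got "all the html text"), so there is no JSON object to decode. Check the complete text using
print r.text
and look for your JSON.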
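A minimal Python 3 sketch of that check (parse_json_or_report is a made-up helper name, not part of requests): it tries to decode the body and, if that fails, prints the first characters so you can see whether you received HTML, an error page, or nothing at all. In the question's Python 2 code the same failure surfaces as a plain ValueError, which is why the sketch catches ValueError.

```python
import json


def parse_json_or_report(text, preview_chars=80):
    """Return the decoded JSON value, or None after printing how the body starts.

    Note: a body that is literally `null` also decodes to None.
    """
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        print("Not JSON; body starts with:", repr(text[:preview_chars]))
        return None


print(parse_json_or_report('{"ok": true}'))
print(parse_json_or_report("<!DOCTYPE html><html></html>"))
```

With requests you would call it as parse_json_or_report(r.text).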

Related

python/requests/bs4 getting empty output [duplicate]

I am making a Python request to TikTok using the requests library. I managed to dig up the URL for their user details (I don't know whether this is legal or not; if it isn't, please let me know). When I try to parse the response as JSON, it raises an exception. Could someone help me parse/fix this? Here is the code:
Python Code:
r1 = requests.get("https://www.tiktok.com/node/share/user/#nike?isUniqueId=true&verifyFp=verify_kb51zknj_GH98fcme_eDuR_4XzM_ATwp_s8TRdCzr8fwi&_signature=KBbp4AAgEBCtR.e4r-y0ZSgWqPAAHbR").json()
print(r1)
Output:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Thanks
You must provide a User-Agent header, i.e.:
headers = {
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
}
r1 = requests.get("https://www.tiktok.com/node/share/user/#nike?isUniqueId=true&verifyFp=verify_kb51zknj_GH98fcme_eDuR_4XzM_ATwp_s8TRdCzr8fwi&_signature=KBbp4AAgEBCtR.e4r-y0ZSgWqPAAHbR", headers=headers).json()
print(r1)
Sorry, I tried to post this as a comment, but I couldn't format the code there.
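If you make several calls, a sketch like the following (assuming requests is installed; make_session and get_json_or_text are hypothetical helper names) sets the User-Agent once on a requests.Session, so every request made through it carries the header, and falls back to the raw text when the body is still not JSON.

```python
import requests

# Browser-like User-Agent string taken from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent with every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session


def get_json_or_text(session, url, **kwargs):
    """GET url and return the decoded JSON, or the raw text if the body is not JSON."""
    kwargs.setdefault("timeout", 10)
    response = session.get(url, **kwargs)
    try:
        return response.json()
    except ValueError:  # the JSON decode errors raised by requests subclass ValueError
        return response.text
```

Usage would be session = make_session() followed by data = get_json_or_text(session, url), with url being the TikTok URL from the question.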
I wrote a wrapper in Python that allows you to fetch Users, Videos, Hashtags, Videos by music, etc.
The project can be found here - TikTokAPI-Python
For your problem of fetching a user -
Install
pip install PyTikTokAPI
Get user
from TikTokAPI import TikTokAPI
api = TikTokAPI()
user_obj = api.getUserByName("fcbarcelona")

TypeError: Object of type set is not JSON serializable while using requests

So, I was writing a program that uses the requests library. While trying to do something I got this error:
Traceback (most recent call last):
File "C:\Users\myste\Desktop\Bang.com keker\e.py", line 10, in <module>
print(requests.get('https://www.bang.com/login_check',headers=headd,json=content).text)
File "C:\Python39\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Python39\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python39\lib\site-packages\requests\sessions.py", line 528, in request
prep = self.prepare_request(req)
File "C:\Python39\lib\site-packages\requests\sessions.py", line 456, in prepare_request
p.prepare(
File "C:\Python39\lib\site-packages\requests\models.py", line 319, in prepare
self.prepare_body(data, files, json)
File "C:\Python39\lib\site-packages\requests\models.py", line 469, in prepare_body
body = complexjson.dumps(json)
File "C:\Python39\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Python39\lib\json\encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python39\lib\json\encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "C:\Python39\lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
This is my code:
import requests
headd = {
"User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
"Pragma":"no-cache" ,
"Accept":"*/*"
}
content = {
f"\"_username\":\"MY_EMAIL\",\"_password\":\"MY_PASSWORD\",\"_remember_me\":true"
}
print(requests.get('https://www.bang.com/login_check',headers=headd,json=content).text)
What this program basically does is take a list of your emails and passwords and check whether they are valid. It's in case you forget your account's password. Don't ask me what the point of this program is, because I don't know myself; I got a request to make it, so I am making it.
When passing content to the json parameter, you're passing a set; however, the json parameter expects a JSON-serializable object, and for a JSON object body that means a dict.
content = {
f"\"_username\":\"MY_EMAIL\",\"_password\":\"MY_PASSWORD\",\"_remember_me\":true"
}
>>> print(type(content))
<class 'set'>
Instead, it should be:
content = {"_username": "MY_EMAIL", "_password": "MY_PASSWORD", "_remember_me": True}
>>> print(type(content))
<class 'dict'>
Your content is actually a set, not a dict. Replace it with something like:
content = {"_username": "my_email", "_password": "my-password"} # etc...
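Per the traceback, requests serializes the json= argument with a json.dumps call, so the difference can be reproduced with the standard library alone. This sketch (encode_body is a hypothetical helper) shows the set failing with the same TypeError and the dict producing the JSON object the login form expects; note that Python's True becomes JSON's true.

```python
import json

# A set literal: braces around a single string, no top-level key/value pairs.
as_set = {'"_username":"MY_EMAIL","_password":"MY_PASSWORD","_remember_me":true'}

# A dict literal: the shape needed for a JSON object body.
as_dict = {"_username": "MY_EMAIL", "_password": "MY_PASSWORD", "_remember_me": True}


def encode_body(payload):
    """Serialize payload with json.dumps, returning None if it is not serializable."""
    try:
        return json.dumps(payload)
    except TypeError as exc:
        print("Cannot encode:", exc)
        return None


print(type(as_set).__name__)  # set
print(encode_body(as_dict))   # {"_username": "MY_EMAIL", "_password": "MY_PASSWORD", "_remember_me": true}
encode_body(as_set)           # prints the "not JSON serializable" TypeError message
```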

Requests Json throws an error in python 3.9

I am running the code below. It runs fine in Python 2.7 but throws an error in Python 3.7 or 3.9.
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
username = "geeks_for_geeks"
user_info = requests.get('https://instagram.com/%s/?__a=1'%username, headers = headers)
print (user_info.json())
The error in python is below:
Traceback (most recent call last):
File "", line 13, in
File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
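I would bet that you are getting back an error from the Instagram site, and the JSON decoder can't process the message because it is not JSON. The position in the error, line 1 column 1 (char 0), means the decoder rejected the very first character of the body, which is what happens with an empty body or an HTML page. Try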
print(user_info.text)
to see what the site is returning.
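To see why the message says line 1 column 1 (char 0), this Python 3 sketch (where_decoding_failed is a hypothetical helper) reads the position fields that json.JSONDecodeError carries: an HTML page and an empty body both fail on the very first character.

```python
import json


def where_decoding_failed(body):
    """Return (msg, lineno, colno, pos) from the decode error, or None if body is valid JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return (err.msg, err.lineno, err.colno, err.pos)
    return None


print(where_decoding_failed("<!DOCTYPE html><html>...</html>"))  # ('Expecting value', 1, 1, 0)
print(where_decoding_failed(""))                                 # ('Expecting value', 1, 1, 0)
print(where_decoding_failed('{"user": {}}'))                      # None
```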

Python - Requests - JSONDecodeError

Hello there,
I get the following error when running the following script in Python:
import requests
r = requests.get('https://www.instagram.com/p/CJDxE7Yp5Oj/?__a=1')
data = r.json()['graphql']['shortcode_media']
C:\ProgramData\Anaconda3\envs\test\python.exe C:/Users/Solba/PycharmProjects/test/main.py
Traceback (most recent call last):
File "C:/Users/Solba/PycharmProjects/test/main.py", line 4, in
data = r.json()
File "C:\ProgramData\Anaconda3\envs\test\lib\site-packages\requests\models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "C:\ProgramData\Anaconda3\envs\test\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "C:\ProgramData\Anaconda3\envs\test\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\ProgramData\Anaconda3\envs\test\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Process finished with exit code 1
Python version: 3.9
PyCharm version: 2020.3.1
Anaconda version: 1.10.0
Please help. Thank you.
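r.json() expects the response body to be a JSON string. A well-behaved API also states that it is responding with JSON through the Content-Type response header (for example, application/json).
In this case, the URL you are requesting is not responding with valid JSON. Note that r.json() does not look at the headers; as the traceback shows, it passes self.text straight to the JSON decoder, so the header only helps you diagnose the problem.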
You can first check the response sent by the URL by:
data = r.text
print(data)
If the response text is valid JSON, you can also parse it with the standard json module (this is essentially what r.json() does internally):
import json
data = json.loads(r.text)
Note:
You can also check the Content-Type response header, and send an Accept: application/json request header, to make sure the request and response use the data type you expect.
The reason is that the response is not JSON but a whole HTML page. Use r.text instead of r.json(), and process the HTML from there.
If you are not sure what type of content it returns, check the Content-Type header:
h = requests.head('https://www.instagram.com/p/CJDxE7Yp5Oj/?__a=1')
header = h.headers
contentType = header.get('content-type')
print(contentType)
Based on your URL, it returns text/html.
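A small sketch of that check, for deciding before you call r.json() (is_json_content_type is a hypothetical helper): it strips parameters such as charset and accepts application/json as well as +json media types. Reading the header from the GET response you already have, r.headers.get('Content-Type'), avoids a second request; requests.head() does not follow redirects by default, so it can report the Content-Type of a redirect response rather than the final page.

```python
def is_json_content_type(content_type):
    """True if a Content-Type header value declares JSON (application/json or a +json type)."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_content_type("text/html; charset=utf-8"))         # False
print(is_json_content_type("application/json; charset=utf-8"))  # True
```

For example, call r.json() only when is_json_content_type(r.headers.get('Content-Type')) is true, and otherwise inspect r.text.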
Alternatively, you can try adding a User-Agent header to your request, so that it looks like it comes from a browser rather than a script.
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/46.0.2490.80'
}
r = requests.get('https://www.instagram.com/p/CJDxE7Yp5Oj/?__a=1', headers=headers)
data = r.json()

Python execute a request with multiple URLs from a list

I'm new to Python.
I've made a list of URLs and I want to make a urllib.request call for every URL in the list. My list currently has 5 URLs, but I can only request one index at a time with urllib.Request(List[0]); if I use urllib.Request(List[0:4]) I get an error:
Traceback (most recent call last):
File "c:/Users/Farzad/Desktop/Python/Webscraping/Responseheaderinfo.py", line 22, in <module>
response = urllib.urlopen(request)
File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 548, in _open
'unknown_open', req)
File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 1387, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: ['http>
import urllib.request as urllib
import socket
import pyodbc
from datetime import datetime
import ssl
import OpenSSL
List = open("C:\\Users\\Farzad\\Desktop\\hosts.txt").read().splitlines()
length = len(List)
for i in range(length):
print(List)
request = urllib.Request(List[0])
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36')
response = urllib.urlopen(request)
rdata = response.info()
ipaddr = socket.gethostbyname(request.origin_req_host)
The code could be as follows:
import urllib.request as urllib
import socket
import traceback
import pyodbc
from datetime import datetime
import ssl
import OpenSSL
# Read one URL per line; the with-block closes the file afterwards
with open("C:\\Users\\Farzad\\Desktop\\hosts.txt") as f:
    List = f.read().splitlines()
# Loop over the URL strings themselves instead of always using List[0]
for url in List:
    print(url)
    try:
        request = urllib.Request(url)
        request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36')
        response = urllib.urlopen(request)
        rdata = response.info()
        ipaddr = socket.gethostbyname(request.origin_req_host)
    except Exception:
        # Print the full traceback, then continue with the next URL
        traceback.print_exc()
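The error message shows what went wrong with List[0:4]: the slice is a list, it ended up as the text ['http://...', ...], and everything before the first colon, ['http, was taken as the URL scheme. A self-contained sketch of the per-URL loop (fetch_all is a hypothetical helper; the data: URL only stands in for a real address so the example runs offline) builds one Request per string and records an error for each URL that fails instead of stopping.

```python
import urllib.request


def fetch_all(urls, user_agent="Mozilla/5.0", timeout=10):
    """Request each URL separately; map url -> (body bytes or None, error text or None)."""
    results = {}
    for url in urls:  # one Request per URL string, never the whole list
        try:
            request = urllib.request.Request(url, headers={"User-Agent": user_agent})
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[url] = (response.read(), None)
        except (OSError, ValueError) as exc:
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # ValueError covers strings with no URL scheme at all.
            results[url] = (None, str(exc))
    return results


results = fetch_all(["data:text/plain,hello", "nosuchscheme://example.com/"])
for url, (body, error) in results.items():
    print(url, body, error)
```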
