How can Python requests handle a 302 redirect?

I am trying to make an HTTP request with the requests library and capture the redirect URL from the response's Location header. In the Chrome inspector, I can see the response status is 302.
In Python, however, requests always returns a 200 status. I added allow_redirects=False, but the status is still always 200.
The URL is https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679
In the first field, enter the test account moyan429@hotmail.com; in the second field, enter the password 112358; then click the first button to log in.
My Python code:
import requests

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36'
api_key = '211160679'  # client_id taken from the authorize URL above
callback_url = 'http://oauth.weico.cc'  # redirect_uri from the authorize URL

session = requests.session()
session.headers['User-Agent'] = user_agent
session.headers['Host'] = 'api.weibo.com'
session.headers['Origin'] = 'https://api.weibo.com'
session.headers['Referer'] = 'https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679'
session.headers['Connection'] = 'keep-alive'

data = {
    'client_id': api_key,
    'redirect_uri': callback_url,
    'userId': 'moyan429@hotmail.com',
    'passwd': '112358',
    'switchLogin': '0',
    'action': 'login',
    'response_type': 'code',
    'quick_auth': 'null'
}

resp = session.post(
    url='https://api.weibo.com/oauth2/authorize',
    data=data,
    allow_redirects=False
)

code = resp.url[-32:]
print(code)

You are probably getting an API error message. Use print(resp.text) to see what the server says is wrong here.
Note that you can always inspect resp.history to see if there were any redirects; if there were, you'll find a list of response objects there.
Do not set the Host or Connection headers; leave those to requests to handle. I doubt the Origin or Referer headers are needed here either. Since this is an API, the User-Agent header is probably also overkill.
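To make that debugging advice concrete, here is a minimal sketch of where a 302 would actually show up when redirects are disabled; the form data is a placeholder, and the Location check assumes the server signals the redirect that way:
import requests

# Hypothetical debugging sketch for a POST with redirects disabled
resp = requests.post(
    'https://api.weibo.com/oauth2/authorize',
    data={'response_type': 'code'},  # placeholder form data
    allow_redirects=False
)

print(resp.status_code)               # 302 only if the server actually redirected
print(resp.history)                   # always [] when allow_redirects=False
print(resp.headers.get('Location'))   # the redirect target on a real 302
print(resp.text[:200])                # otherwise the error body explains why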

Related

Error 'Unexpected HTTP code on the target page', 'status_code': 403 when I request a JSON URL through a proxy API

I'm trying to scrape https://triller.co/ to get information from profile pages such as https://triller.co/@warnermusicarg. My approach is to request the JSON URL that contains the profile information; in this case it's https://social.triller.co/v1.5/api/users/by_username/warnermusicarg
When I use requests.get() it works normally and I can retrieve all the information.
import requests
from urllib.parse import urlencode

url = 'https://social.triller.co/v1.5/api/users/by_username/warnermusicarg'
headers = {
    'authority': 'social.triller.co',
    'method': 'GET',
    'path': '/v1.5/api/users/by_username/warnermusicarg',
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'ar,en-US;q=0.9,en;q=0.8',
    'authorization': 'Bearer eyJhbGciOiJIUzI1NiIsImlhdCI6MTY0MDc4MDc5NSwiZXhwIjoxNjkyNjIwNzk1fQ.eyJpZCI6IjUyNjQ3ODY5OCJ9.Ds-acbfcGSeUrGDSs47pBiT3b13Eb9SMcB8BF8OylqQ',
    'origin': 'https://triller.co',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
}
response = requests.get(url, headers=headers)
The problem arises when I try to use API proxy providers such as Webscraping.ai, ScrapingBee, etc.:
api_key='my_api_key'
api_url='https://api.webscraping.ai/html?'
params = {'api_key': api_key, 'timeout': '20000', 'url':url}
proxy_url = api_url + urlencode(params)
response2 = requests.get(proxy_url, headers=headers)
This gives me the following error:
2022-01-08 22:30:59 [urllib3.connectionpool] DEBUG: https://api.webscraping.ai:443 "GET /html?api_key=my_api_key&timeout=20000&url=https%3A%2F%2Fsocial.triller.co%2Fv1.5%2Fapi%2Fusers%2Fby_username%2Fwarnermusicarg&render_js=false HTTP/1.1" 502 91
{'status_code': 403, 'status_message': '', 'message': 'Unexpected HTTP code on the target page'}
What I have tried:
1. I looked up the meaning of the 403 code in my API proxy provider's documentation; it says the api_key is wrong, but I'm 100% sure it's correct.
2. I switched to another API proxy provider, but I hit the same issue.
3. I also had the same issue with twitter.com.
I don't know what else to do.
Currently, the code in the question returns a 200 response for me, but there are two possible issues:
1. Some sites block datacenter proxies; try the proxy=residential API parameter (params = {'api_key': api_key, 'timeout': '20000', 'proxy': 'residential', 'url': url}).
2. Some of the headers in your headers parameter are unnecessary. Webscraping.AI uses its own set of headers to mimic the behavior of a normal browser, so setting a custom user-agent, accept-language, etc. may interfere with them and cause 403 responses from the target site. Use only the headers you actually need; in your case that looks like just the authorization header (see the sketch below).
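A minimal sketch of that advice, reusing the proxy_url pattern from the question; the 'proxy': 'residential' parameter comes from the point above, and the assumption that the service forwards the authorization header to the target should be verified against the provider's documentation:
import requests
from urllib.parse import urlencode

api_key = 'my_api_key'
api_url = 'https://api.webscraping.ai/html?'
url = 'https://social.triller.co/v1.5/api/users/by_username/warnermusicarg'

# Ask for a residential proxy instead of a datacenter one
params = {'api_key': api_key, 'timeout': '20000', 'proxy': 'residential', 'url': url}

# Send only the header the target actually requires; drop user-agent,
# sec-*, etc. so they don't clash with the service's own browser headers
headers = {'authorization': 'Bearer <your token>'}

response = requests.get(api_url + urlencode(params), headers=headers)
print(response.status_code)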
I don't know exactly what caused the error, but I tried using their webscraping_ai.ApiClient() instance as shown here and it worked:
import webscraping_ai

configuration = webscraping_ai.Configuration(
    host="https://api.webscraping.ai",
    api_key={
        'api_key': 'my_api_key'
    }
)

with webscraping_ai.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = webscraping_ai.HTMLApi(api_client)
    url_j = url  # str | URL of the target page
    timeout = 20000
    js = False
    proxy = 'datacenter'
    api_response = api_instance.get_html(url_j, headers=headers, timeout=timeout, js=js, proxy=proxy)

Delete default header fields in Python Requests [duplicate]

I want to send a value for "User-agent" while requesting a webpage using Python requests. I am not sure if it is okay to send this as part of the header, as in the code below:
debug = {'verbose': sys.stderr}
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent, config=debug)
The debug information isn't showing the headers being sent during the request.
Is it acceptable to send this information in the header? If not, how can I send it?
The user agent should be specified as a field in the header.
Here is a list of HTTP header fields; you'd probably be interested in the request-specific fields, which include User-Agent.
If you're using requests v2.13 and newer
The simplest way to do what you want is to create a dictionary and specify your headers directly, like so:
import requests

url = 'SOME URL'
headers = {
    'User-Agent': 'My User Agent 1.0',
    'From': 'youremail@domain.example'  # This is another valid field
}
response = requests.get(url, headers=headers)
If you're using requests v2.12.x and older
Older versions of requests clobbered default headers, so you'd want to do the following to preserve default headers and then add your own to them.
import requests

url = 'SOME URL'

# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()

# Update the headers with your custom ones.
# You don't have to worry about case sensitivity of the dictionary
# keys, because default_headers uses a custom CaseInsensitiveDict
# implementation from requests' source code.
headers.update(
    {
        'User-Agent': 'My User Agent 1.0',
    }
)

response = requests.get(url, headers=headers)
It's more convenient to use a session; that way you don't have to remember to set headers each time:
session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})
session.get('https://httpbin.org/headers')
By default, session also manages cookies for you. In case you want to disable that, see this question.
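Since the question title asks about deleting default header fields: requests also lets you drop one of its default headers entirely by setting its value to None on the session. A short sketch:
import requests

session = requests.Session()

# Setting a header's value to None tells requests to omit the header
# altogether, rather than sending it with an empty value
session.headers['Accept-Encoding'] = None

r = session.get('https://httpbin.org/headers')
print(r.text)  # the echoed headers no longer include Accept-Encoding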
This will send the request like a browser:
import requests
url = 'https://Your-url'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
response= requests.get(url.strip(), headers=headers, timeout=10)

Python requests returning IN-QUEUE status after posting

I've been trying to learn how HTTP requests are made between client and server, so I decided to test things out by sending a simple POST request to GeeksforGeeks' online IDE. Using requests and the code below, I get the following response:
{'status': 'SUCCESS', 'sid': 'df1acaffefacb25dfef2c7cf66022925'}
import requests

url = "https://ide.geeksforgeeks.org/main.php"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}

code = """
print("hello")
"""

data = {
    'lang': 'Python3',
    'code': code,
    'input': '0',
    'save': 'false'
}

r = requests.post(url, data=data, headers=headers)
print(r.json())
Next, I know that I have to send the sid parameter back to https://ide.geeksforgeeks.org/submissionResult.php in order to get the result. However, when I run the following code,
requesttype = {
    'sid': r.json()['sid'],
    'requestType': 'fetchResults'
}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
url2 = "https://ide.geeksforgeeks.org/submissionResult.php"

session = requests.Session()
outcome = session.post(url2, data=requesttype, headers=headers)
it returns {'status': 'IN-QUEUE'}. Looking at the network tab in my browser, it seems the online IDE puts submissions into a server-side queue and only processes them when they reach the front. So I would like to know how to obtain a success response by "communicating" with their queue system.
A successful response looks like this:
{"valid":"1","output":"testing\n","time":"0.02","compResult":"S","memory":"0.125","hash":"79c7a40c6f6b36f1dfc23119f27ba66e_Tester_U16","status":"SUCCESS"}
My hunch is that the website is detecting me as a bot, and that I should perhaps use something like Selenium instead.
When I run the code on my computer, it works just fine. The problem could be something to do with SSL certificate verification on your machine.
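Another angle, given the queue behavior described in the question: 'IN-QUEUE' may just mean the submission has not been executed yet, so polling submissionResult.php until the status changes is worth trying. A minimal sketch under that assumption (endpoint and field names taken from the question; r is the response to the first POST):
import time
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
url2 = "https://ide.geeksforgeeks.org/submissionResult.php"
requesttype = {'sid': r.json()['sid'], 'requestType': 'fetchResults'}

session = requests.Session()
result = session.post(url2, data=requesttype, headers=headers).json()

# Assumption: the job leaves the queue after a short delay, so retry
while result.get('status') == 'IN-QUEUE':
    time.sleep(1)  # pause between polls to avoid hammering the server
    result = session.post(url2, data=requesttype, headers=headers).json()

print(result)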

How can I use POST from requests module to login to Github?

I have tried logging into GitHub using the following code:
url = 'https://github.com/login'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    'login': 'username',
    'password': 'password',
    'authenticity_token': 'Token that keeps changing',
    'commit': 'Sign in',
    'utf8': '%E2%9C%93'
}
res = requests.post(url)
print(res.text)
Now, res.text prints the code of the login page. I understand that this may be because the token keeps changing continuously. I have also tried setting the URL to https://github.com/session, but that does not work either.
Can anyone tell me a way to generate the token? I am looking for a way to log in without using the API. I asked another question where I mentioned that I was unable to log in. One comment said that I was not doing it right and that it is possible to log in using just the requests module, without the help of the GitHub API.
Me: So, can I log in to Facebook or GitHub using the POST method? I have tried that and it did not work.
The user: Well, presumably you did something wrong.
Can anyone please tell me what I did wrong?
After the suggestion about using sessions, I have updated my code:
s = requests.Session()
headers = {Same as above}
s.put('https://github.com/session', headers=headers)
r = s.get('https://github.com/')
print(r.text)
I still can't get past the login page.
I think you end up back at the login page because you are redirected, and since your code doesn't send back cookies, you can't keep a session.
You are looking for session persistence; requests provides it:
Session Objects: The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
http://docs.python-requests.org/en/master/user/advanced/
Actually, in a POST request the parameters belong in the request body, not in the headers, so the login data should go in the data parameter.
For GitHub, the authenticity token is present in the value attribute of an input tag; it can be extracted with the BeautifulSoup library.
This code works fine:
import requests
from getpass import getpass
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': input('Username: '),
    'password': getpass()
}

url = 'https://github.com/session'
session = requests.Session()
response = session.get(url, headers=headers)

# Pull the CSRF token out of the login form
soup = BeautifulSoup(response.text, 'html5lib')
login_data['authenticity_token'] = soup.find(
    'input', attrs={'name': 'authenticity_token'})['value']

response = session.post(url, data=login_data, headers=headers)
print(response.status_code)

response = session.get('https://github.com', headers=headers)
print(response.text)
This code works perfectly; it is the same idea, written with a with block:
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': 'your-username',
    'password': 'your-password'
}

with requests.Session() as s:
    url = "https://github.com/session"
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    login_data['authenticity_token'] = soup.find('input', attrs={'name': 'authenticity_token'})['value']
    r = s.post(url, data=login_data, headers=headers)
You can also try the PyGithub library, which wraps the official GitHub API for common tasks.
Check the link below:
https://github.com/PyGithub/PyGithub
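As a small illustration (not from the answer above), here is a sketch of authenticating with PyGithub using a personal access token; the token string is a placeholder:
from github import Github

# Authenticate with a personal access token (placeholder value)
g = Github("your-personal-access-token")

# List the names of the authenticated user's repositories
for repo in g.get_user().get_repos():
    print(repo.name)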

Error: HTTPS site requires a 'Referer header' to be sent by your Web browser, but none was sent

You are seeing this message because this HTTPS site requires a 'Referer header' to be sent by your Web browser, but none was sent. This header is required for security reasons, to ensure that your browser is not being hijacked by third parties.
I was trying to log in to a website using requests but received the error above. How do I create a 'Referer' header?
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req=requests.post(url, data=payload)
print(req.text)
You can pass in any headers you want to send on your request as a keyword argument to requests.post:
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req = requests.post(url, data=payload, headers={'Referer': 'yourReferer'})
print(req.text)
I guess you are using this library: http://docs.python-requests.org/en/latest/user/quickstart/
If that is the case, you have to add a custom Referer header (see the Custom Headers section). The code would be something like this:
url = '...'
payload = ...
headers = {'Referer': 'https://...'}
r = requests.post(url, data=payload, headers=headers)
For more information on the referer see this wikipedia article: https://en.wikipedia.org/wiki/Referer
I was getting the same error in Chrome. What I did was disable all my Chrome extensions, including ad blockers. After that I reloaded the page I wanted to scrape the data from, logged in once again, and then, as @Stephan Kulla mentioned, added headers to the request: user-agent, referer, referrer-policy, and origin. You can get all of these from the Network tab of your browser's inspector.
Add all of those to the header and try to log in again using POST; it should work (it worked for me):
import requests

ori = 'https:......'
login_route = 'login/....'

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
    'origin': 'https://www.screener.in',
    'referer': '/login/',
    'referrer-policy': 'same-origin'
}

s = requests.session()

# The CSRF token is issued as a cookie on the login page
csrf = s.get(ori + login_route).cookies['csrftoken']

payload = {
    'username': 'xxxxxx',
    'password': 'yyyyyyy',
    'csrfmiddlewaretoken': csrf
}

login_req = s.post(ori + login_route, headers=header, data=payload)
