I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure is if it is okay to send this as a part of the header, as in the code below:
debug = {'verbose': sys.stderr}
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent, config=debug)
The debug information isn't showing the headers being sent during the request.
Is it acceptable to send this information in the header? If not, how can I send it?
The user-agent should be specified as a field in the header.
Here is a list of HTTP header fields, and you'd probably be interested in request-specific fields, which includes User-Agent.
If you're using requests v2.13 and newer
The simplest way to do what you want is to create a dictionary and specify your headers directly, like so:
import requests
url = 'SOME URL'
headers = {
'User-Agent': 'My User Agent 1.0',
'From': 'youremail#domain.example' # This is another valid field
}
response = requests.get(url, headers=headers)
If you're using requests v2.12.x and older
Older versions of requests clobbered default headers, so you'd want to do the following to preserve default headers and then add your own to them.
import requests
url = 'SOME URL'
# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()
# Update the headers with your custom ones
# You don't have to worry about case-sensitivity with
# the dictionary keys, because default_headers uses a custom
# CaseInsensitiveDict implementation within requests' source code.
headers.update(
{
'User-Agent': 'My User Agent 1.0',
}
)
response = requests.get(url, headers=headers)
It's more convenient to use a session, this way you don't have to remember to set headers each time:
session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})
session.get('https://httpbin.org/headers')
By default, session also manages cookies for you. In case you want to disable that, see this question.
It will send the request like browser
import requests
url = 'https://Your-url'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
response= requests.get(url.strip(), headers=headers, timeout=10)
Related
I want to land on the main (learning) page of my Duolingo profile but I am having a little trouble finding the correct way to sign into the website with my credentials using Python Requests.
I have tried making requests as well as I understood them but I am pretty much a noob in this so it has all went in vain thus far.
Help would be really appreciated!
This is what I was trying by my own means by the way:
#The Dictionary Keys/Values and the Post Request URL were taken from the Network Source code in Inspect on Google Chrome
import requests
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/81.0.4044.138 Safari/537.36'
}
login_data =
{
'identifier': 'something#email.com',
'password': 'myPassword'
}
with requests.Session() as s:
url = "https://www.duolingo.com/2017-06-30/login?fields="
s.post(url, headers = headers, params = login_data)
r = s.get("https://www.duolingo.com/learn")
print(r.content)
The post request receives the following content:
b'{"details": "Malformed JSON: No JSON object could be decoded", "error": "BAD_REQUEST_SCHEMA"}'
And since the login fails, the get request for the learn page receives this:
b'<html>\n <head>\n <title>401 Unauthorized</title>\n </head>\n <body>\n <h1>401
Unauthorized</h1>\n This server could not verify that you are authorized to access the document you
requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not
understand how to supply the credentials required.<br/><br/>\n\n\n\n </body>\n</html>'
Sorry if I am making any stupid mistakes. I do not know a lot about all this. Thanks!
If you inspect the POST request carefully you can see that:
accepted content type is application/json
there are more fields than you have supplied (distinctId, landingUrl)
the data is sent as a json request body and not url params
The only thing that you need to figure out is how to get distinctId then you can do the following:
EDIT:
Sending email/password as json body appears to be enough and there is no need to get distinctId, example:
import requests
import json
headers = {'content-type': 'application/json'}
data = {
'identifier': 'something#email.com',
'password': 'myPassword',
}
with requests.Session() as s:
url = "https://www.duolingo.com/2017-06-30/login?fields="
# use json.dumps to convert dict to serialized json string
s.post(url, headers=headers, data=json.dumps(data))
r = s.get("https://www.duolingo.com/learn")
print(r.content)
I try to login to the member area of the following website :
https://trader.degiro.nl/
Unfortunately, I tried many way without success.
The post form since to be a json it's the reason why I sent a json instead of the post data
import requests
session = requests.Session()
data = {"username":"test", "password":"test", "isRedirectToMobile": "false", "loginButtonUniversal": ""}
url = "https://trader.degiro.nl/login/#/login"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'}
r = session.post(url, headers=headers, json={'json_payload': data})
Does any one have a idea why it doesn't work ?
Looking at the request my browser sends, the code should be:
url = "https://trader.degiro.nl/login/secure/login"
...
r = session.post(url, headers=headers, json=data)
That is, there's no need to wrap the data in json_payload and the url is slightly different to the one for viewing the login page.
I am trying to make a http request using requests library to the redirect url (in response headers-Location). When using Chrome inspection, I can see the response status is 302.
However, in python, requests always returns a 200 status. I added the allow_redirects=False, but the status is still always 200.
The url is https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679
the first line entered the test account: moyan429#hotmail.com
the second line entered the password: 112358
and then click the first button to login.
My Python code:
import requests
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36'
session = requests.session()
session.headers['User-Agent'] = user_agent
session.headers['Host'] = 'api.weibo.com'
session.headers['Origin']='https://api.weibo.com'
session.headers['Referer'] ='https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679'
session.headers['Connection']='keep-alive'
data = {
'client_id': api_key,
'redirect_uri': callback_url,
'userId':'moyan429#hotmail.com',
'passwd': '112358',
'switchLogin': '0',
'action': 'login',
'response_type': 'code',
'quick_auth': 'null'
}
resp = session.post(
url='https://api.weibo.com/oauth2/authorize',
data=data,
allow_redirects=False
)
code = resp.url[-32:]
print code
You are probably getting an API error message. Use print resp.text to see what the server tells you is wrong here.
Note that you can always inspect resp.history to see if there were any redirects; if there were any you'll find a list of response objects.
Do not set the Host or Connection headers; leave those to requests to handle. I doubt the Origin or Referer headers here needed either. Since this is an API, the User-Agent header is probably also overkill.
You are seeing this message because this HTTPS site requires a 'Referer
header' to be sent by your Web browser, but none was sent. This header is
required for security reasons, to ensure that your browser is not being
hijacked by third parties.
I was trying to login to a website using requests but received the error above, how do I create a 'Referer
header'?
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req=requests.post(url, data=payload)
print(req.text)
You can pass in headers you want to send on your request as a keyword argument to request.post:
payload = {'inUserName': 'xxx.com', 'inUserPass': 'xxxxxx'}
url = 'https:xxxxxx'
req=requests.post(url, data=payload, headers={'Referer': 'yourReferer')
print(req.text)
I guess you are using this library: http://docs.python-requests.org/en/latest/user/quickstart/
If this is the case you have to add a custom header Referer (see section Custom headers). The code would be something like this:
url = '...'
payload = ...
headers = {'Referer': 'https://...'}
r = requests.post(url, data=payload, headers=headers)
For more information on the referer see this wikipedia article: https://en.wikipedia.org/wiki/Referer
I was getting the Same error in Chrome. What I did was just disabled all my chrome extensions including ad blockers. Here after I reloaded the page from where i wanted to scrape the data and logged in once again and then in the code as #Stephan Kulla mentioned you need to add headers inside headers i added user agent, referer, referrer-policy, origin. all these you can get in from inspect sample where you will find a Network part..
add all those in header and try to login again using post it should work.(It worked for me)
ori = 'https:......'
login_route = 'login/....'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36' , 'origin':'https://www.screener.in', 'referer': '/login/','referrer-policy':'same-origin'}
s=requests.session()
csrf = s.get(ori+login_route).cookies['csrftoken']
payload = {
'username': 'xxxxxx',
'password': 'yyyyyyy',
'csrfmiddlewaretoken': csrf
}
login_req = s.post(ori+login_route,headers=header,data=payload)
I am testing some application where I send some POST requests, want to test the behavior of the application when some headers are missing in the request to verify that it generates the correct error codes.
For doing this, my code is as follows.
header = {'Content-type': 'application/json'}
data = "hello world"
request = urllib2.Request(url, data, header)
f = urllib2.urlopen(request)
response = f.read()
The problem is urllib2 adds it's own headers like Content-Length, Accept-Encoding when it sends the POST request, but I don't want urllib2 to add any more headers than the one I specified in the headers dict above, is there a way to do that, I tried setting the other headers I don't want to None, but they still go with those empty values as part of the request which I don't want.
The header takes a dictionary type, example below using a chrome user-agent. For all standard and some non-stranded header fields take a look here. You also need to encode your data with urllib not urllib2. This is all mention in the python documentation here
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()