I'm using the OneDrive Python SDK to handle authentication with OneDrive. The authentication is done as:
import onedrivesdk
from onedrivesdk.helpers import GetAuthCodeServer

redirect_uri = "http://localhost:8080/"
client_secret = "your_app_secret"

client = onedrivesdk.get_default_client(client_id='your_client_id',
                                        scopes=['wl.signin',
                                                'wl.offline_access',
                                                'onedrive.readwrite'])

auth_url = client.auth_provider.get_auth_url(redirect_uri)

# this will block until we have the code
code = GetAuthCodeServer.get_auth_code(auth_url, redirect_uri)
client.auth_provider.authenticate(code, redirect_uri, client_secret)
However, since I run this authentication on an EC2 instance, and I don't want to use a browser just for that, the code blocks indefinitely. Here's the get_auth_code from Microsoft:
def get_auth_code(auth_url, redirect_uri):
    """Easy way to get the auth code. Wraps up all the threading
    and stuff. Does block main thread.

    Args:
        auth_url (str): URL of auth server
        redirect_uri (str): Redirect URI, as set for the app. Should be
            something like "http://localhost:8080" for this to work.

    Returns:
        str: A string representing the auth code, sent back by the server
    """
    HOST, PORT = urlparse(redirect_uri).netloc.split(':')
    PORT = int(PORT)

    # Set up HTTP server and thread
    code_acquired = threading.Event()
    s = GetAuthCodeServer((HOST, PORT), code_acquired, GetAuthCodeRequestHandler)
    th = threading.Thread(target=s.serve_forever)
    th.start()
    webbrowser.open(auth_url)
    # At this point the browser will open and the code
    # will be extracted by the server

    code_acquired.wait()  # First wait for the response from the auth server
    code = s.auth_code
    s.shutdown()
    th.join()
    return code
I want to return the code. Here's a sample of auth_url:
https://login.live.com/oauth20_authorize.srf?scope=wl.offline_access+onedrive.readwrite&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&response_type=code&client_id='your_client_id'
When I enter that URL in the browser, I get the code back:
http://localhost:8080/?code=Mb0bba7d1-adbc-9c1d-f790-3709cd0b9f16
So I want to avoid that cumbersome process and get the code back by using requests. How can I accomplish that?
I know this is an old question, but I was struggling with the same problem - I wanted to get the code using the requests library. I managed to do it, but I doubt it's a very sustainable solution. Hopefully, after reading my solution you will understand better how the authentication works, and you might find an improved solution.
I have a Python Flask app with a MySQL database. Occasionally, I want to create a backup of the database and send the backup file to my OneDrive, and I want to trigger this process from inside my Flask app.
First, I registered my app at the Microsoft Application Registration Portal and added a new platform Web with the redirect URL http://localhost:8080/signin-microsoft. I gave the app read and write permissions and stored the Application Id (client_id) and Application Secret (client_secret).
Second, I added a new route to my Flask App. Note that my Flask App is running on localhost:8080.
@app.route("/signin-microsoft", methods=['GET'])
def get_code():
    return 'Yadda'
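For what it's worth, the route could also read the code query parameter that the redirect carries (compare the sample redirect URL above). This is only a sketch, assuming Flask's request object is imported; the rest of my solution doesn't depend on it:

from flask import request

@app.route("/signin-microsoft", methods=['GET'])
def signin_microsoft():
    # Microsoft redirects here with ?code=... appended to the URL
    code = request.args.get('code', '')
    return 'Got code: %s' % code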
Third, I replicated the HTTP request header created by my browser in my requests.get call. That is, I opened Chrome, pasted auth_url into the address bar, hit enter, inspected the request header and copied its content into my code.
r = requests.get(auth_url,
                 headers={"Host": "login.live.com",
                          "Connection": "keep-alive",
                          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                          "Accept-Encoding": "gzip, deflate, br",
                          "Upgrade-Insecure-Requests": "1",
                          "Accept-Language": "fi-FI,fi;q=0.9,en-US;q=0.8,en;q=0.7",
                          "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
                          "Cookie": (SUPER LONG WONT PASTE HERE)})
Fourth, I parsed the code from the URL the request was redirected to.
re_url = r.url
code = re_url.split('code=')[-1]
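Splitting on 'code=' works here, but a sketch using the standard library's urllib.parse would be more robust if the redirect ever carries additional query parameters:

from urllib.parse import urlparse, parse_qs

re_url = r.url
query = parse_qs(urlparse(re_url).query)
code = query['code'][0]  # raises KeyError if no code came back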
Here is the final code:
redirect_uri = 'http://localhost:8080/signin-microsoft'
client_secret = CLIENT_SECRET
client_id = CLIENT_ID
api_base_url = 'https://api.onedrive.com/v1.0/'
scopes = ['wl.signin', 'wl.offline_access', 'onedrive.readwrite']

http_provider = onedrivesdk.HttpProvider()
auth_provider = onedrivesdk.AuthProvider(
    http_provider=http_provider, client_id=client_id, scopes=scopes)
client = onedrivesdk.OneDriveClient(api_base_url, auth_provider, http_provider)

auth_url = client.auth_provider.get_auth_url(redirect_uri)

r = requests.get(auth_url,
                 headers={"Host": "login.live.com",
                          "Connection": "keep-alive",
                          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                          "Accept-Encoding": "gzip, deflate, br",
                          "Upgrade-Insecure-Requests": "1",
                          "Accept-Language": "fi-FI,fi;q=0.9,en-US;q=0.8,en;q=0.7",
                          "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
                          "Cookie": (SUPER LOOONG)})

re_url = r.url
code = re_url.split('code=')[-1]
client.auth_provider.authenticate(code, redirect_uri, client_secret)
I think there are two main points here: you need an HTTP server that listens on the redirect URI (in Microsoft's example they used HTTPServer from http.server), and you need to get the headers of the request right. Without the headers, the request won't redirect correctly and you won't get the code!
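For the first point, here is a minimal sketch of such a listener built only on the standard library - no browser involved; the handler class and the port are my own choices to match the examples above:

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CodeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Extract ?code=... from the redirect and stash it on the server
        params = parse_qs(urlparse(self.path).query)
        self.server.auth_code = params.get('code', [''])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'You may close this window.')

server = HTTPServer(('localhost', 8080), CodeHandler)
server.auth_code = None
server.handle_request()  # blocks until the single redirect request arrives
code = server.auth_code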
Related
I'm trying to scrape data from a dynamic website using Python requests.
I've had a look through the network requests in developer tools and found the URL the website sends GET requests to in order to access the required data:
When a request is made to the website's API it returns a cookie (via the Set-Cookie header) which I believe the browser then uses in future GET requests to access the data. Here is a screenshot of the request and response headers when the page is first loaded and all previous cookies have been removed:
When I load the request URL directly in my browser it works fine (it's able to acquire a valid cookie from the website and load the data). But when I send a GET request to that same URL via the Python requests module, the cookie returned doesn't seem to be working (I'm getting a 403 - Forbidden error).
My Python code:
import requests

session = requests.Session()

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB,en;q=0.9",
    "Host": "www.oddschecker.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33",
}

# url is the API endpoint found in DevTools (not shown in the question)
response = session.get(url, headers=headers)
# Currently returning a 403 error unless I add the cookie from my browser as a header.
I believe the cookie is the issue because when I instead take the cookie generated by the browser and use that as a header in my Python program it is then able to return the desired information (until the cookie expires.)
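For reference, the working (but temporary) version looks roughly like this - the cookie name and value are placeholders for whatever the browser's DevTools show:

# Temporary workaround: reuse the cookie issued to the browser.
# The cookie name/value below are placeholders, not the site's real ones.
headers["Cookie"] = "session_cookie=VALUE_COPIED_FROM_BROWSER"
response = session.get(url, headers=headers)  # works until the cookie expires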
My goal is for the Python program to be able to acquire a working cookie from this website automatically so it can successfully send requests and gather data.
1 - The target DOMAIN is https://www.dnb.com/
This website is blocking access to it from many countries around the world including mine (Algeria).
So the known solution is clear (use a proxy), which I did.
2 - Configuring the system proxy in the network settings and connecting to the website via Google Chrome works, and using Firefox with the same proxy settings also works fine.
3 - Then I came to my code to start the job:
import requests

# 1. Initialize the proxy
proxy = "xxx.xxx.xxx.xxx:3128"

# 2. Set the headers (I cloned Firefox request headers)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Upgrade-Insecure-Requests": "1",
    "Host": "www.dnb.com",
    "DNT": "1"
}

# 3. URL
URL = "https://www.dnb.com/business-directory/company-profiles.bicicletas_monark_s-a.7ad1f8788ea84850ceef11444c425a52.html"

# 4. Make a GET request.
r = requests.get(URL, headers=headers, proxies={"https": proxy})
# Nothing in return and the program keeps executing (like an infinite loop).
Note:
I know this keeps waiting because the default timeout is None, but the setup is known to work, so the requests library should return a response; setting a timeout here is mainly a way to assess the reliability of the proxy.
So, what is the cause of this? The request is stuck (and so am I), while I get the response and the correct HTML content with Firefox, Chrome, and Postman using the same configuration.
I checked your code and ran it on my local machine. It seems the issue is with the proxy. I added a public proxy and it is working. You can confirm it by adding a timeout argument (a few seconds) to the requests.get call. Also, if the code works properly (even if the response is 403), it means there is an issue with the proxy.
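A minimal sketch of that check (the proxy address is a placeholder):

import requests

proxy = "xxx.xxx.xxx.xxx:3128"  # placeholder; substitute a proxy you trust

try:
    # If the proxy is dead or blocked, this now fails fast instead of hanging.
    r = requests.get("https://www.dnb.com/", proxies={"https": proxy}, timeout=10)
    print(r.status_code)
except requests.exceptions.Timeout:
    print("Proxy did not respond within 10 seconds - try another proxy.")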
My Django website is hosted on an Apache server. I want to send data to my website with requests.post from a Python script on my PC, but it is giving 403 Forbidden.
import json
import requests

url = "http://54.161.205.225/Project/devicecapture"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
           'content-type': 'application/json'}
data = {
    "nb_imsi": "test API",
    "tmsi1": "test",
    "tmsi2": "test",
    "imsi": "test API",
    "country": "USA",
    "brand": "Vodafone",
    "operator": "test",
    "mcc": "aa",
    "mnc": "jhj",
    "lac": "hfhf",
    "cellIid": "test cell"
}
response = requests.post(url, data=json.dumps(data), headers=headers)
print(response.status_code)
I have also given permissions to the directory containing the views.py that this request is routed to.
I have gone through many other answers but they didn't help.
I have also tried the code without json.dumps, but it isn't working that way either.
How do I resolve this?
After investigating, it looks like the URL that you need to post to in order to log in is: http://54.161.205.225/Project/accounts/login/?next=/Project/
You can work out what you need to send in a post request by looking in the Chrome DevTools, Network tab. This tells us that you need to send the fields username, password and csrfmiddlewaretoken, which you need to pull from the page.
You can get it by extracting it from the response of the first get request. It is stored on the page like this:
<input type="hidden" name="csrfmiddlewaretoken" value="OspZfYQscPMHXZ3inZ5Yy5HUPt52LTiARwVuAxpD6r4xbgyVu4wYbfpgYMxDgHta">
So you'll need to do some kind of regex to get it - see the sketch below.
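For example, a minimal sketch of that extraction, assuming the token always appears in a hidden input like the one shown above:

import re

# Pull the CSRF token out of the login page's HTML
match = re.search(r'name="csrfmiddlewaretoken" value="([^"]+)"', response.text)
csrf_token = match.group(1) if match else None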
So first you have to create a session. Then load the login page with a GET request. Then send a POST request with your login credentials to that same URL. Your session will then have the required cookies that will allow you to post to your desired URL. Here is an example:
import requests

# Create session
session = requests.session()

# Add user-agent string
session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) ' +
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})

# Get login page
response = session.get('http://54.161.205.225/Project/accounts/login/?next=/Project/')

# Get csrf token from response.text (see the regex sketch above)

# Post to login
response = session.post('http://54.161.205.225/Project/accounts/login/?next=/Project/', data={
    'username': 'example123',
    'password': 'examplexamplexample',
    'csrfmiddlewaretoken': 'something123123',
})

# Post desired data
response = session.post('http://url.url/other_page', data={
    'data': 'something',
})

print(response.status_code)
Hopefully this should get you there. Good luck.
For more information check out this question on requests: Python Requests and persistent sessions
I have faced this situation many times.
The problems were:
54.161.205.225 is not added to ALLOWED_HOSTS in settings.py (see the snippet below)
the Apache WSGI app is not correctly configured
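A minimal sketch of the ALLOWED_HOSTS fix in the project's settings.py (the localhost entries are optional extras):

# settings.py
# Requests whose Host header is not listed here are rejected by Django.
ALLOWED_HOSTS = ['54.161.205.225', 'localhost', '127.0.0.1']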
Things that might help with debugging:
Check the Apache error logs to investigate what went wrong.
Try running the server locally and post to it, to make sure the problem is not related to Apache.
I'm trying to log in to a website from this URL: https://pollev.com/login. Since I'm using a school email, the portal redirects to the school's login portal and uses it to authenticate the login. It shows up when you type in a uw.edu email (example: myname@uw.edu). After logging in, UW sends a POST request callback to https://www.polleverywhere.com/auth/washington/callback with a SAMLResponse header like this. I think I need to simulate the GET request from pollev's login page and then send the login headers to the UW login page, but what I'm doing right now isn't working.
Here's my code:
import requests

with requests.session() as s:
    header_data = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'referer': 'https://pollev.com/login'
    }
    login_data = {
        'j_username': 'username',
        'j_password': 'password',
        '_eventId_proceed': 'Sign in'
    }
    r = s.get('https://idp.u.washington.edu/idp/profile/SAML2/Redirect/SSO?execution=e2s1',
              headers=header_data, data=login_data)
    print(r.text)
Right now, r.text shows a NoSuchFlowExecutionException HTML page. What am I missing? Logging into the website normally requires a login, password, Referer, and X-CSRF token, which I was able to provide, but I don't know how to navigate a redirect for authentication.
Old question, but I had nearly identical needs and carried on until I solved it. In my case, which may also be the case for the OP, I have the required credentials. I am certain this could be made more efficient / Pythonic and would greatly appreciate those tips / corrections.
import re
import requests

# start HTTP request session
s = requests.Session()

# Prepare for first request - this is the ultimate target URL
url1 = '/URL/needing/shibbolethSAML/authentication'
header_data = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}

# Make first request
r1 = s.get(url1, headers=header_data)

# Prepare for second request - extract the form action URL from the response,
# extend the headers, and add login credentials
ss1 = re.search('action="', r1.text)
ss2 = re.search('" autocomplete', r1.text)
url2 = 'https://idp.u.washington.edu' + r1.text[ss1.span(0)[1]:ss2.span(0)[0]]
header_data.update({'Accept-Encoding': 'gzip, deflate, br', 'Content-Type': 'application/x-www-form-urlencoded'})
cred = {'j_username': 'username', 'j_password': 'password', '_eventId_proceed': 'Sign in'}

# Make second request
r2 = s.post(url2, headers=header_data, data=cred)

# Prepare for third request - extract the URL, RelayState, and SAMLResponse,
# decoding the HTML-entity-escaped characters (&#x3a; and &#x2f;)
ss3 = re.search('<form action="', r2.text)  # expect only one instance of this pattern in string
ss4 = re.search('" method="post">', r2.text)  # expect only one instance of this pattern in string
url3 = r2.text[ss3.span(0)[1]:ss4.span(0)[0]].replace('&#x3a;', ':').replace('&#x2f;', '/')
ss4 = re.search('name="RelayState" value="', r2.text)  # expect only one instance of this pattern in string
ss5 = re.search('"/>', r2.text)
relaystate_value = r2.text[ss4.span(0)[1]:ss5.span(0)[0]].replace('&#x3a;', ':')
ss6 = re.search('name="SAMLResponse" value="', r2.text)
ss7 = [m.span() for m in re.finditer('"/>', r2.text)]  # expect multiple matches, the second being the desired one
saml_value = r2.text[ss6.span(0)[1]:ss7[1][0]]
data = {'RelayState': relaystate_value, 'SAMLResponse': [saml_value, 'Continue']}
header_data.update({'Host': 'training.ehs.washington.edu', 'Referer': 'https://idp.u.washington.edu/', 'Connection': 'keep-alive'})

# Make third request
r3 = s.post(url3, headers=header_data, data=data)
# You should now be at the intended URL
You're not going to be successful faking out SAML2 SSO. The identity provider (IdP) at UW is looking to support an authentication request from the service provider (SP) polleverywhere.com. Part of that is verifying the request actually originated from polleverywhere. This could be as simple as requiring an SSL connection from polleverywhere, or as complicated as requiring an encrypted and signed authentication request. Since you don't have those credentials, the resulting response isn't going to be readable. SPs are registered with IdPs.
Now, there may be a different way to sign into polleverywhere -- a different URL which will not trigger an SSO request, but that might be network restricted or require other difficult authentication.
I am trying to make an HTTP request using the requests library to the redirect URL (in response headers - Location). When using Chrome inspection, I can see the response status is 302.
However, in Python, requests always returns a 200 status. I added allow_redirects=False, but the status is still always 200.
The url is https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679
the first line entered the test account: moyan429@hotmail.com
the second line entered the password: 112358
and then clicked the first button to log in.
My Python code:
import requests

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36'

session = requests.session()
session.headers['User-Agent'] = user_agent
session.headers['Host'] = 'api.weibo.com'
session.headers['Origin'] = 'https://api.weibo.com'
session.headers['Referer'] = 'https://api.weibo.com/oauth2/authorize?redirect_uri=http%3A//oauth.weico.cc&response_type=code&client_id=211160679'
session.headers['Connection'] = 'keep-alive'

# api_key and callback_url are defined elsewhere in my script
data = {
    'client_id': api_key,
    'redirect_uri': callback_url,
    'userId': 'moyan429@hotmail.com',
    'passwd': '112358',
    'switchLogin': '0',
    'action': 'login',
    'response_type': 'code',
    'quick_auth': 'null'
}

resp = session.post(
    url='https://api.weibo.com/oauth2/authorize',
    data=data,
    allow_redirects=False
)

code = resp.url[-32:]
print(code)
You are probably getting an API error message. Use print(resp.text) to see what the server tells you is wrong here.
Note that you can always inspect resp.history to see if there were any redirects; if there were any, you'll find a list of response objects there.
Do not set the Host or Connection headers; leave those to requests to handle. I doubt the Origin or Referer headers are needed here either. Since this is an API, the User-Agent header is probably also overkill.
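A minimal sketch of that debugging, using the resp object from the question's code:

print(resp.status_code)  # final status after the POST
print(resp.history)      # intermediate redirect responses, if any (empty list if none)
print(resp.text)         # the server's error message, if the API rejected the login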