Python requests module with redirect

I am trying to perform a GET request in Python using the requests module. However, before I can do the GET, the website redirects me to a login page. I need to log in first, which will then land me on the page I am requesting.
Following is the content I receive after doing the get. How should I perform the login in order to access the page I am looking for? Any help would be appreciated!
<form action="/idp/profile/SAML2/Redirect/SSO?execution=e1s1" method="post">
<div class="form-element-wrapper">
<label for="username">Username</label>
<input class="form-element form-field" id="username" name="j_username" type="text" value="">
</div>
<div class="form-element-wrapper">
<label for="password">Password</label>
<input class="form-element form-field" id="password" name="j_password" type="password" value="******">
</div>
<div class="form-element-wrapper">
<input type="checkbox" name="donotcache" value="1">Don't Remember Login </div>
<div class="form-element-wrapper">
<input id="_shib_idp_revokeConsent" type="checkbox" name="_shib_idp_revokeConsent" value="true">
Clear prior granting of permission for release of your information to this service.
</div>
<div class="form-element-wrapper">
<button class="form-element form-button" type="submit" name="_eventId_proceed"
onClick="this.childNodes[0].nodeValue='Logging in, please wait...'">Login</button>
</div>
</form>
Following is the code I have written until now:
values = {'j_username':'****'}
with requests.Session() as s:
    p = s.get(url,verify=False)
    logger.info(p.text)

values = {'j_username': '****', 'j_password': '****'}
with requests.Session() as session:
    login_response = session.post(login_url, data=values, verify=False)
    # The session now holds the session cookie, so subsequent requests will be authenticated.
    # It is worth inspecting login_response to make sure it has the expected status code.
    other_response = session.get(url, verify=False)  # expect this not to redirect to the login page
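A minimal sketch of how the POST target and payload for the form above might be assembled, using only the standard library. It assumes the server expects the same fields a browser would send, including the submit button's name (`_eventId_proceed`). `LoginFormParser` and `build_login_request` are illustrative names, not part of requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Record the first form's action plus its named inputs and buttons."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self.in_form = False
        self.done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
            self.action = attrs.get("action", "")
        elif self.in_form and tag in ("input", "button") and attrs.get("name"):
            # Unchecked checkboxes are not submitted by a browser, so skip them.
            if attrs.get("type") == "checkbox" and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True


def build_login_request(page_url, html, username, password):
    """Return (post_url, payload) for the login form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found in the page")
    payload = dict(parser.fields)
    payload["j_username"] = username
    payload["j_password"] = password
    # A relative action is resolved against the URL the form was served from.
    return urljoin(page_url, parser.action), payload


# Usage with requests (not run here; `url` is the protected page):
#   with requests.Session() as s:
#       page = s.get(url)  # follows the redirect to the login page
#       post_url, payload = build_login_request(page.url, page.text, "user", "secret")
#       resp = s.post(post_url, data=payload)
```

In a SAML flow, the response to this POST is often another HTML page with an auto-submitting form (carrying a SAMLResponse field) that a browser would send on via JavaScript. If that is the case here, that second form may need to be posted in the same way before the requested page becomes reachable.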

Related

How to transmit form-data without POST with python requests?

I'm writing a little piece of Python code to log in to several of my "investing" websites and read out the current amount of money invested. I'm using Python's requests library and analyzing the HTML source to identify the form and the fields to fill in.
So, a form may look like this:
<form class="onboarding-form" id="loginForm" action="https://estateguru.co/portal/login/authenticate" method="post" data-redirect="https://estateguru.co/portal/home">
<div class="row">
<div class="col-md-6">
<div class="form-group">
<input type="text" class="form-control main-input" name="username">
<label class="bmd-label-floating main-label">E-Mail</label>
<em id="username-error" class="error bmd-help help-block" style="display:none;">This field is required.</em>
</div>
</div>
<div class="col-md-6">
<div class="form-group">
<input type="password" class="form-control main-input login-pass" name="password">
<label class="bmd-label-floating main-label long-label">Passwort (Mindestens 8 Zeichen)</label>
<em id="password-error" class="error bmd-help help-block" style="display:none;">This field is required.</em>
<i class="zmdi zmdi-eye"></i>
</div>
</div>
</div>
In this case, my code looks like this:
import requests
_username = 'xxx'
_password = 'yyy'
loginUrl = 'https://estateguru.co/portal/login/authenticate'
readUrl = 'https://estateguru.co/portal/portfolio/overview'
with requests.session() as s:
    payload = {"username": _username, "password": _password}
    final = s.post(loginUrl, data = payload)
    result = s.get(readUrl)
    print(result)
This works like a charm for many websites! But now I have a website without method="post" in the form, so I don't know how to transmit the form data.
The html-part (from http://www.reinvest24.com/en/login) looks like this:
<form>
<div class="form-group">
<input type="text" id="email" placeholder="Email" value="" name="email" maxLength="100" class="form-control"/>
</div>
<div class="form-group">
<input type="password" id="password" placeholder="Password" value="" name="password" maxLength="100" class="form-control"/>
</div>
<p class="forgot text-right">
<span>Forgot password?</span>
</p>
<input type="submit" class="btn btn-success" value="Login"/>
<p class="reg text-center">
<span>Don't have an account?</span>
<a href="/en/registration">
<span>Sign up</span></a>
</p>
</form>
So without the method clarified, I tried
final = s.get(loginUrl, data = payload)
but without success. The result in both cases is HTML output saying something about "Loading authorization details...".
So my question is: am I missing the right method (POST/GET) to transmit the data, or am I missing some other parameter? Some websites require a session token, which I retrieve from the login page itself (as is the case with https://www.mintos.com/de/login), but in my opinion that is not the problem here.
By default, servers ignore the body (form data) of an HTTP request when the method is GET. Therefore you shouldn't try to submit the request via GET (not only is it unsafe to transmit sensitive info over GET, the server would just ignore the username/password in your request).
The issue here is that the page is doing some JavaScript magic to submit your request over a different URL. Open up your web inspector and watch the "network" tab whenever you try to login on that website. You should see that the request is being POSTed to https://api-frontend.reinvest24.com/graphql.
When we inspect this POST request, we can see that the data is being transmitted as a JSON body, not a form body. So your request should look something along the lines of this:
login_url = 'https://api-frontend.reinvest24.com/graphql'
payload = {
    "operationName": "login",
    "variables": {
        "email": EMAIL,
        "password": PASSWORD
    },
    "query": "mutation login($email: String!, $password: String!) {\n login(email: $email, password: $password)\n}\n"
}
r = s.post(url=login_url, json=payload)
# note that we used the 'json' parameter here, not 'data'
Chrome web inspector is your friend here to observe how data is transmitted when logging in.
Good luck!
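Since a GraphQL endpoint commonly answers with HTTP 200 even when the operation fails, the status code alone may not show whether the login worked. Here is a small sketch, assuming the server follows the usual GraphQL response shape with top-level `data` and `errors` keys. `graphql_data` is a hypothetical helper, and what the `login` field actually returns (possibly a token) is a guess.

```python
import json


def graphql_data(body):
    """Return the "data" member of a GraphQL response, raising if errors were reported.

    `body` is the decoded JSON response, e.g. r.json() from requests.
    """
    errors = body.get("errors")
    if errors:
        messages = "; ".join(str(e.get("message", e)) for e in errors)
        raise RuntimeError("GraphQL request failed: " + messages)
    return body.get("data")


# Example bodies, made up for illustration (not captured from the real site).
ok_body = json.loads('{"data": {"login": "some-token"}}')
failed_body = json.loads('{"errors": [{"message": "Invalid credentials"}], "data": null}')

print(graphql_data(ok_body))  # prints {'login': 'some-token'}
try:
    graphql_data(failed_body)
except RuntimeError as exc:
    print(exc)  # prints GraphQL request failed: Invalid credentials
```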

Python - Login to Website

I know that there are several posts on this subject and I believe I have read a significant amount of them, however I still can't login to this website.
Below is my inspection of the login page:
<form id="login" name="login" method="POST" action="/signin">
<div id="login_username">
<label>Email</label>
<input class="textfield" id="email" name="email" type="text" autocomplete="off" value="">
</div>
<div id="login_password">
<label>Password</label>
<input class="textfield" id="password" name="password"
type="password" autocomplete="off" value="">
</div>
<input type="hidden" id="hash" name="hash" value="">
<div id="login_submit">
<a id="forgot_password_link">Forgot Password?</a>
<input class="submitbutton" type="submit" value="Sign In">
</div>
</form>
Below is my code:
username = 'XXXXX@gmail.com'
password = 'XXXX'
hash = ''
data = {'password':password, 'email':username,'hash':hash}
login_url = "https://carmel.orangetheoryfitness.com/login"
s = requests.session()
result = s.post(login_url, data=data, headers = dict(referer=login_url))
scrape_url = 'https://carmel.orangetheoryfitness.com/apps/otf/classes/view?id=16297&loc=0'
result = s.get(url=scrape_url)
From here I go on to search the html document but I'm not finding what I want as I am sent back to the login page when getting the scrape_url. I have verified this by inspecting the resulting html document.
Things I have considered:
- Almost all blog posts or SO answers indicate that a CSRF token is usually required. I have searched the login page and can't find a CSRF token.
The form has an action="/signin" attribute so you need to post to https://carmel.orangetheoryfitness.com/signin instead.
result = s.post('https://carmel.orangetheoryfitness.com/signin', data=data, headers = dict(referer=login_url))
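The relative `action` can also be resolved against the login page URL rather than typed out by hand. A short sketch using the standard library's `urljoin` (the variable names are illustrative):

```python
from urllib.parse import urljoin

login_page = "https://carmel.orangetheoryfitness.com/login"
form_action = "/signin"  # value of the form's action attribute

# A leading slash replaces the whole path of the base URL.
post_url = urljoin(login_page, form_action)
print(post_url)  # prints https://carmel.orangetheoryfitness.com/signin

# An absolute action is returned unchanged.
print(urljoin(login_page, "https://example.com/auth"))  # prints https://example.com/auth
```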

Python: Trying to log in with requests and perform an HTTP request

I am trying to log in to my account using the following Python code, without success. The login process has two steps on two pages: first enter the login, then enter the password. I am using Python 3:
from bs4 import BeautifulSoup
import requests, lxml.html
with requests.Session() as s:
    # First login page
    login = s.get('https://accounts.ft.com/login')
    login_html = lxml.html.fromstring(login.text)
    # Getting the form inputs
    hidden_inputs = login_html.xpath(r'//form//input')
    form = {x.name: x.value for x in hidden_inputs}
    # Filling inputs with email
    form['email'] = 'me@mail.com'
    response = s.post('https://accounts.ft.com/login', data=form)
    # Receive response 200
    # Second login page
    login_html = lxml.html.fromstring(response.text)
    # Getting inputs
    hidden_inputs = login_html.xpath(r'//form//input')
    form = {x.name: x.value for x in hidden_inputs}
    # Filling inputs with email and password
    form['email'] = 'me@mail.com'
    form['password'] = 'p****word'
    response = s.post('https://accounts.ft.com/login', data=form)
    # Receive response 200
    # Trying to read an article while logged in
    page = s.get('https://www.ft.com/content/173695cc-1a98-11e7-a266-12672483791a')
    soup = BeautifulSoup(page.content, 'html.parser')
    print(soup.prettify())
    # data-next-is-logged-in="false" => Please Register to read this page...
Here is what the Form looks like:
<div class="js-container" data-component="two-step-login-form" id="content">
<div class="lgn-box">
<form action="/login/submitEmail" class="js-email-lookup-form" data-test-id="enter-email-form" method="POST" name="enter-email-form" novalidate="">
<input name="location" type="hidden" value="" />
<input name="continueUrl" type="hidden" value="" />
<input name="readerId" type="hidden" value="" />
<input name="loginUrl" type="hidden" value="/login" />
<div class="lgn-box__title">
<h1 class="lgn-heading--alpha">
Sign in
</h1>
</div>
<div class="o-forms-group">
<label class="o-forms-label" for="email">
Email address
</label>
<input autocomplete="off" autofocus="" class="o-forms-text js-email" id="email" maxlength="64" name="email" required="" type="email">
<input id="password" name="password" style="display:none" type="password">
<label for="password">
</label>
</input>
</input>
</div>
<div class="o-forms-group">
<button class="o-buttons o-buttons--standout o-buttons--big" name="Next" type="submit">
Next
</button>
</div>
</form>
</div>
Here is what my data passed to POST looks like:
form
{'password': 'p****word', 'continueUrl': '', 'loginUrl': '/login', 'email': 'me#mail.com', 'readerId': '', 'location': ''}
The POST request returns a 200 response for both the first and the second login page. But it seems that I am still not logged in.
I have tried using http://accounts.ft.com/sso/redirects?email=me@mail.com as the URL for the POST request, which returns a 405 (Method Not Allowed) error.
I am not sure that I am actually not logged in, but I have no idea how to monitor that.
Is it possible that the website prevents me from logging in if I am not using a web browser?
Try using Selenium to simulate the web browser, as it appears that FT blocks automated access.
Alternatively, you can see if the site has been archived with something like archive.is (which will pull most sites into a more machine-friendly form).
Finally, the FT offers both a data mining API and a headline API on their developer page.
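On the question of how to monitor whether the session is logged in: the page fetched in the question's code carries a `data-next-is-logged-in` attribute, so one option is to read that attribute from each response. Here is a sketch using the standard library. It assumes the attribute appears once per page and holds "true" or "false", which is inferred only from the comment in the question. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class LoggedInFlagParser(HTMLParser):
    """Remember the first data-next-is-logged-in attribute value in a page."""

    def __init__(self):
        super().__init__()
        self.flag = None

    def handle_starttag(self, tag, attrs):
        if self.flag is None:
            for name, value in attrs:
                if name == "data-next-is-logged-in":
                    self.flag = value


def is_logged_in(html):
    """Return True/False from the flag, or None if the page does not contain it."""
    parser = LoggedInFlagParser()
    parser.feed(html)
    if parser.flag is None:
        return None
    return parser.flag.strip().lower() == "true"


# With the session from the question: is_logged_in(page.text)
print(is_logged_in('<html data-next-is-logged-in="false"><body></body></html>'))  # prints False
```

Calling this after each step (both login POSTs and the article GET) would show at which point, if any, the flag flips.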

400 Bad Request Error on POST request with Flask

I am attempting to return data from an HTML form with a POST request using Flask. For some reason I get a 400 Bad Request error. Looking at Chrome Dev Tools, I can see that all of the form fields with the input tag are part of the POST request. The select tag with the dropdown list is not being captured, and I think this is causing the error. Does anyone know why the select tag is not being captured in the POST request? Any help much appreciated.
Here is the HTML form:
<label for="vendor">Select Vendor</label>
<div class="flextable p-b" style="padd">
<div class="flextable-item">
<select class="selectpicker" data-live-search="true" form="addInvoice" name="vendor" id="vendor">
<option>Jack Jaffa & Associates</option>
<option>Jacobs/Doland/Beer LLC</option>
<option>Jenkins & Huntington Inc.</option>
<option>Joseph J. Blake & Associates, Inc.</option>
<option>Langan (Geotechnical)</option>
<option>Madison Realty Capital</option>
<option>McNamara Salvia, Inc</option>
<option>Metropolis Group, Inc</option>
<option>National Grid</option>
</select>
</div>
<div class="flextable-item">
<button type="button" class="btn btn-xs btn-primary-outline">Add vendor</button>
</div>
<label for="invoice_number">Invoice Number:</label>
<input type="text" class="form-control p-b" placeholder="Every vendor invoice # must be unique" name="invoice_number" id="invoice_number">
<label for="invoice_amount">Amount:</label>
<input type="text" class="form-control p-b" placeholder="$0.00" name="invoice_amount" id="invoice_amount">
<label for="invoice_amount">Description:</label>
<input type="text" class="form-control p-b" placeholder="$0.00" width="100%" name="description" id="description">
<div class="spacer"></div>
<div class="flextable">
<div class="flextable-item">
<label for="date_received">Date received:</label>
</div>
<div>
<div class="flextable-item">
<div class="input-group">
<span class="input-group-addon">
<span class="icon icon-calendar"></span>
</span>
<input type="text" value="01/01/2015" class="form-control" data-provide="datepicker" style="width: 200px;" name="date_received" id="date_received">
</div>
</div>
</div>
</div>
</div>
<div class="modal-actions p-t-lg">
<button type="button" class="btn-link modal-action" data-dismiss="modal">Cancel</button>
<button type="submit" class="btn-link modal-action" id="submit" >
<strong>Save + Continue</strong>
</button>
</div>
</form>
Here is the Flask python route:
@app.route('/add_invoice', methods=['GET', 'POST'])
def add_invoice():
    """Capture form data to add invoice items to the database."""
    if request.method == 'POST':
        find_cost_code = 7777  # eventually need code to look up the cost code from the POST request
        print(request.form['invoice_number'])
        print(request.form['invoice_amount'])
        print(request.form['description'])
        print(request.form['vendor'])
        print(request.form['date_received'])
    return "This is a test"
ADDED INFO:
So if I remove this line, the bad request error goes away:
print(request.form['vendor'])
This is because the "vendor" field is the only one in the HTML form that uses a select tag for input, and that data is not captured in the POST request dictionary (which I can see in Chrome Dev Tools). The POST request is missing the field associated with the select tag. I'm not sure how to capture the select tag in the form data...
I've faced this issue many times when dealing with Flask forms. I think the solution is to enable CSRF token protection:
according to this, you need to initialise and enable it for your app.
Most of the time, error 400 is due to a missing CSRF token.
You can initialise it as follows:
from flask_wtf.csrf import CSRFProtect
csrf = CSRFProtect(app)
and add this to your form:
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}"/>
Hope it helps!
Feel free to edit.
form="addInvoice"
This attribute as part of my HTML form markup was what caused the error. I'm not sure exactly why but when it is removed, the error goes away.
(Thanks for your help above in trying to look into this.)
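One likely explanation for why removing `form="addInvoice"` helped: in HTML, a control's `form` attribute names the id of the form that owns it, overriding the form it sits inside. If no form in the page has that id, the control belongs to no form and is left out of the submission. Flask then answers with a 400 when `request.form['vendor']` looks up a key that is not there. The sketch below models that rule with the standard library. It is a simplification of what browsers do, `submitted_names` is an illustrative helper, and the form id `invoiceForm` is made up because the question does not show the opening form tag.

```python
from html.parser import HTMLParser


class FormOwnerParser(HTMLParser):
    """Collect named controls and the form each one would be submitted with."""

    CONTROLS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.form_ids = set()   # ids of every <form> seen
        self.open_form = None   # id of the <form> currently open, if any
        self.controls = []      # (name, enclosing form id, form attribute)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.open_form = attrs.get("id", "")
            self.form_ids.add(self.open_form)
        elif tag in self.CONTROLS and attrs.get("name"):
            self.controls.append((attrs["name"], self.open_form, attrs.get("form")))

    def handle_endtag(self, tag):
        if tag == "form":
            self.open_form = None


def submitted_names(html, form_id):
    """Names of the controls owned by the form with the given id."""
    parser = FormOwnerParser()
    parser.feed(html)
    names = []
    for name, enclosing, form_attr in parser.controls:
        # An explicit form attribute wins over the enclosing <form>.
        owner = form_attr if form_attr is not None else enclosing
        if owner == form_id and form_id in parser.form_ids:
            names.append(name)
    return names


html = (
    '<form id="invoiceForm" method="post">'
    '<select name="vendor" form="addInvoice"><option>National Grid</option></select>'
    '<input type="text" name="invoice_number">'
    '</form>'
)
print(submitted_names(html, "invoiceForm"))  # prints ['invoice_number']
print(submitted_names(html.replace(' form="addInvoice"', ''), "invoiceForm"))  # prints ['vendor', 'invoice_number']
```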
You need request.form.getlist('vendor') to capture the select.

Python 3 script for logging into a website using the Requests module

I'm trying to write some Python (3.3.2) code to log in to a website using the Requests module. Here is the form section of the login page:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
<input type="hidden" name="token" value="236647d2da7c8408ceb78178ba03876ea1f2b687" />
<div class="logincontainer">
<fieldset>
<div class="clearfix">
<label for="username">Email Address:</label>
<div class="input">
<input class="xlarge" name="username" id="username" type="text" />
</div>
</div>
<div class="clearfix">
<label for="password">Password:</label>
<div class="input">
<input class="xlarge" name="password" id="password" type="password"/>
</div>
</div>
<div align="center">
<p>
<input type="checkbox" name="rememberme" /> Remember Me
</p>
<p>Request a Password Reset</p>
</div>
</fieldset>
</div>
<div class="actions">
<input type="submit" class="btn primary" value="Login" />
</div>
</form>
Here is my code, trying to deal with hidden input:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ibvpn.com/billing/clientarea.php'
body = {'username':'my email address','password':'my password'}
s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text, 'html.parser')
hiddenInputs = soup.findAll(name = 'input', type = 'hidden')
for hidden in hiddenInputs:
    name = hidden['name']
    value = hidden['value']
    body[name] = value
r = s.post(url, data = body)
This just returns the login page. If I post my login data to the URL in the 'action' field, I get a 404 error.
I've seen other posts on StackExchange where automatic cookie handling doesn't seem to work, so I've also tried dealing with the cookies manually using:
cookies = dict(loginPage.cookies)
r = s.post(url, data = body, cookies = cookies)
But this also just returns the login page.
I don't know if this is related to the problem, but after I've run either variant of the code above, entering r.cookies returns <<class 'requests.cookies.RequestsCookieJar'>[]>
If anyone has any suggestions, I'd love to hear them.
You are loading the wrong URL. The form has an action attribute:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
so you must post your login information to:
https://www.ibvpn.com/billing/dologin.php
instead of posting back to the login page. POST to soup.form['action'] instead:
r = s.post(soup.form['action'], data=body)
Your code is handling cookies just fine; I can see that s.cookies holds a cookie after requesting the login form, for example.
If this still doesn't work (a 404 is returned), then the server is using additional techniques to distinguish scripts from real browsers. Usually this is done by parsing the request headers. Look at the headers your browser sends and replicate those. It may just be the User-Agent header that they parse, but the Accept-* headers and Referer can also play a role.
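As a rough sketch of replicating browser headers: the values below are examples only and should be replaced with whatever your own browser sends (visible in the network tab of its developer tools). `browser_like_headers` is an illustrative helper, not part of requests.

```python
def browser_like_headers(referer):
    """Headers resembling a browser's; copy the real values from your own browser."""
    return {
        # Example value only; paste the User-Agent your browser actually sends.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # HTTP spells this header "Referer".
        "Referer": referer,
    }


headers = browser_like_headers("https://www.ibvpn.com/billing/clientarea.php")
# With the session from the question: s.headers.update(headers)
# before calling s.get(...) and s.post(...), so every request carries them.
```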
