python urllib form post - python

<div id="login-section">
<fieldset class="validation-group">
<table id="navLgnMember" cellspacing="0" cellpadding="0" style="border-collapse:collapse;">
<tr>
<td>
<div id="login-user">
<div class="input" id="username-wrapper">
<div class="loginfield-label">Number / ID / Email</div>
<div class="input-field-small float-left submit-on-enter"><div class="left"></div><input name="ctl00$ctl01$navLgnMember$Username" type="text" maxlength="80" id="Username" title="Username" class="center" style="width:85px;" /><div class="right"></div></div>
</div>
<div class="input" id="password-wrapper">
<div class="loginfield-label">
Password</div>
<div class="input-field-small float-left submit-on-enter"><div class="left"></div><input name="ctl00$ctl01$navLgnMember$Password" type="password" id="Password" title="Password" class="center" title="Password" style="width:85px;" /><div class="right"></div></div>
</div>
<div id="login-wrapper">
<input type="submit" name="ctl00$ctl01$navLgnMember$Login" value="" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$ctl01$navLgnMember$Login", "", false, "", "https://tatts.com/tattersalls", false, false))" id="Login" class="button-login" />
</div>
How would one go about submitting to this form with urllib? The current code I have:
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/ target from the form
authentication_url = 'https://tatts.com/tattersalls'
# Input parameters we are going to send
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': '/wEPDwUKMTIwNzM2NDc5NQ9kFgICCBBkZBYCAgcPZBYGZg9kFgJmD2QWAmYPFgIeB1Zpc2libGVoZAIBD2QWAmYPZBYCZg8WAh8AaGQCAg9kFgJmD2QWAgIBD2QWAmYPZBYCZg9kFgYCAw9kFgICBw8WAh4FY2xhc3MFFmxhdGVzdFJlc3VsdHNCb2R5RGl2QmdkAgsPZBYCZg9kFgICBQ8WBB4JaW5uZXJodG1sBR8qRGl2IDEgJDFNIGZvciB1cCB0byA0IHdpbm5lcnMuHwBnZAIND2QWAmYPZBYCZg9kFgYCAQ8PFgIeBFRleHQFNVdobyB3b24gbW9uZXkgaW4gRHJvbWFuYT8gVGF0dHNMb3R0byBwcml6ZSB1bmNsYWltZWQhZGQCAw8PFgIfAwV5QSBkaXZpc2lvbiBvbmUgVGF0dHNMb3R0byBwcml6ZSBvZiAkODI5LDM2MS42OCB3YXMgd29uIGluIERyb21hbmEgaW4gU2F0dXJkYXkgbmlnaHTigJlzIGRyYXcgYnV0IHRoZSB3aW5uZXIgaXMgYSBteXN0ZXJ5IWRkAgUPDxYCHgtOYXZpZ2F0ZVVybAUbL3RhdHRlcnNhbGxzL3dpbm5lci1zdG9yaWVzZGRk40y89P1oSwLqvsMH4ZGTu9vsloo=',
    '__PREVIOUSPAGE': 'PnGXOHeTQRfdct4aw9jgJ_Padml1ip-t05LAdAWQvBe5-2i1ECm5zC0umv9-PrWPJIXsvg9OvNT2PNp99srtKpWlE4J-6Qp1mICoT3eP49RSXSmN6p_XiieWS68YpbKqyBaJrkmYbJpZwCBw0Wq3tSD3JUc1',
    '__EVENTVALIDATION': '/wEdAAfZmOrHFYG4x80t+WWbtymCH/lQNl+1rLkmSESnowgyHVo7o54PGpUOvQpde1IkKS5gFTlJ0qDsO6vsTob8l0lt1XHRKk5WhaA0Ow6IEfhsMPG0mcjlqyEi79A1gbm2y9z5Vxn3bdCWWa28kcUm81miXWvi1mwhfxiUpcDlmGDs/3LMo4Y=',
    'ctl00$ctl01$showUpgradeReminderHid': 'false',
    'ctl00$ctl01$navLgnMember$Username': 'x-tinct',
    'ctl00$ctl01$navLgnMember$Password': '########',
    'ctl00$ctl01$navLgnMember$Login': ''
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
print(contents)
is a fair way off submitting to the right part of the web form.
I'm trying to log in and create a session so that I can then post further web form data to other parts of the site.
Thanks in advance.

According to this other post from SO, Mechanize and Javascript, you have different options, from simulating in Python what the JavaScript code is doing, to using the full-fledged Selenium with its Python bindings.
If you try to proceed the simple Python way, I would strongly urge you to use a network spy such as the excellent Wireshark to analyse what a successful login through a real browser actually gets and sends, and what your Python simulation sends.
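If you go the plain-Python route, the key change to the code in the question is to fetch the login page first and scrape the ASP.NET hidden fields (__VIEWSTATE, __EVENTVALIDATION and friends) instead of hard-coding them, since their values change between page loads. A minimal sketch, reusing the cookie-aware opener installed above and assuming the standard ASP.NET hidden-field markup (the credentials are the placeholders from the question):
import re
import urllib
import urllib2
authentication_url = 'https://tatts.com/tattersalls'
# Fetch the login page so the ASP.NET hidden fields are current; this
# request goes through the cookie-aware opener installed earlier.
login_html = urllib2.urlopen(authentication_url).read()
def hidden_value(field, html):
    # Naive extraction of a hidden input's value attribute; assumes the
    # standard ASP.NET markup where id="..." precedes value="...".
    m = re.search(r'id="%s" value="([^"]*)"' % re.escape(field), html)
    return m.group(1) if m else ''
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': hidden_value('__VIEWSTATE', login_html),
    '__EVENTVALIDATION': hidden_value('__EVENTVALIDATION', login_html),
    'ctl00$ctl01$showUpgradeReminderHid': 'false',
    'ctl00$ctl01$navLgnMember$Username': 'x-tinct',
    'ctl00$ctl01$navLgnMember$Password': '########',
    'ctl00$ctl01$navLgnMember$Login': ''
}
# Supplying data makes this a POST; the session cookie stays in the jar.
resp = urllib2.urlopen(authentication_url, urllib.urlencode(payload))
print resp.read()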

Related

How to fill and submit a form using python

I am filling in a form on a web page with the help of the mechanize module, but I am getting an error when I run my code.
I just want to fill in the form and submit it successfully.
My attempt:
Code snippet from this Stack Overflow answer:
import re
from mechanize import Browser
username="Bob"
password="admin"
br = Browser()
# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
br.open("https://fb.vivoliker.com/app/fb/token")
br.select_form(name="order")
br["u"] = [username]
br["p"]=[password]
response = br.submit()
Output:
Error (FormNotFoundError)
But what name should I pass to br.select_form()? When I look at the source code of the web page, there is no name attribute set on that form.
Html source code of form from web page
<div class="container">
<form ls-form="fb-init">
<input type="hidden" name="machine_id">
<div class="form-group row">
<input id="u" type="text" class="form-control" placeholder="Facebook Username / Id / Email / Mobile Number" required="required">
</div>
<div class="form-group row">
<input id="p" type="password" class="form-control" placeholder="Facebook Password" required="required">
</div>
<div class="form-group row mt-3">
<button type="button" id='generating' class="btn btn-primary btn-block" onclick="if (!window.__cfRLUnblockHandlers) return false; get()" data-cf-modified-4e9e40fa9e78b45594c87eaa-="">Get Access Token</button>
</div>
<div ls-form="event"></div>
</form>
Expected output:
My form should be submitted with the values that I gave.
See the JavaScript of this web page, given below.
I want to fill in and submit the form on this web page:
Web page source
I believe the form you want to select is the one with ls-form="fb-init".
However, since the mechanize module requires replacing hyphens with underscores to turn HTML attributes into keyword arguments, you would want to write it like this:
br.select_form(ls_form='fb-init')
To clarify, the correct form to select is not named 'order'; the form is identified by its ls-form attribute (written as ls_form, with an underscore) whose value is 'fb-init'. So with the change, it should look like this:
import re
from mechanize import Browser
username="Bob"
password="admin"
br = Browser()
# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
br.open("https://fb.vivoliker.com/app/fb/token")
br.select_form(ls_form='fb-init')
And then continue from there.
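A rough continuation is sketched below. Note that the username and password inputs in this form only carry id attributes (no name), so they are looked up by id, and the "Get Access Token" button fires JavaScript, so there is no guarantee that a plain mechanize submit reaches the server the way a real browser would:
br.select_form(ls_form='fb-init')
print(br.form)  # inspect which controls mechanize actually exposes
# Hypothetical: look the inputs up by id since they have no name
# attribute; unnamed controls may not be included in the POST data.
br.form.find_control(id='u').value = username
br.form.find_control(id='p').value = password
response = br.submit()
print(response.read())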

Unsuccessful attempts to post request for form submission

I am working on my first project using Python to submit a form and retrieve weather data from http://www.weather.gov
I am brand new to HTTP form submission and as such it is highly possible I am going about this all wrong. I've read that mechanize and/or selenium are more efficient for this type of job, but I am limited to these modules by my school's server.
import requests
r = requests.get('http://www.weather.gov')
location = raw_input("Enter zipcode: ")
payload = {'key1' : location}
q = requests.post('http://forecast.weather.gov/', data = payload)
print q.text
My attempts to search for a given zip code have been unsuccessful; I am not reaching the local weather for the given zip code.
Note: I have also tried this form submission using urllib & urllib2 without success.
import urllib
import urllib2
location = raw_input("Enter Zip Code here: ")
url = 'http://forecast.weather.gov/'
values = {'inputstring' : location}
data = urllib.urlencode(values)
req = urllib2.Request(url, data = data)
response = urllib2.urlopen(req)
html = response.read()
print html
Form as seen from inspecting the page:
<form name="getForecast" id="getForecast" action="http://forecast.weather.gov/zipcity.php" method="get">
<label for="inputstring">Local forecast by <br>"City, St" or ZIP code</label>
<input id="inputstring" name="inputstring" type="text" value="Enter location ..." onclick="this.value=''" autocomplete="off">
<input name="btnSearch" id="btnSearch" type="submit" value="Go">
<div id="txtError">
<div id="errorNoResults" style="display:none;">Sorry, the location you searched for was not found. Please try another search.</div>
<div id="errorMultipleResults" style="display:none">Multiple locations were found. Please select one of the following:</div>
<div id="errorChoices" style="display:none"></div>
<input id="btnCloseError" type="button" value="Close" style="display:none">
</div>
<div id="txtHelp"><a style="text-decoration: underline;" href="javascript:void(window.open('http://weather.gov/ForecastSearchHelp.html','locsearchhelp','status=0,toolbar=0,location=0,menubar=0,directories=0,resizable=1,scrollbars=1,height=500,width=530').focus());">Location Help</a></div>
</form>
<input id="inputstring" name="inputstring" type="text" value="Enter location ..." onclick="this.value=''" autocomplete="off">
Post to the URL http://forecast.weather.gov/zipcity.php
That is where the PHP script that receives the form submission lives.
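For what it's worth, the form itself declares method="get", so a sketch along these lines (field names taken from the form above; whether the script also accepts POST data is not verified here) may be closer to what a browser actually sends:
import requests
location = raw_input("Enter zipcode: ")
# The form's action is zipcity.php and its method is GET, so send the
# fields as query parameters; btnSearch mirrors the submit button.
r = requests.get('http://forecast.weather.gov/zipcity.php',
                 params={'inputstring': location, 'btnSearch': 'Go'})
# requests follows redirects by default, so this should be the forecast page
print r.text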

Python - Logging in to web scrape

I'm trying to web-scrape a page on www.roblox.com that requires me to be logged in. I have done this using the .ROBLOSECURITY cookie; however, that cookie changes every few days. I want to instead log in using the login form and Python. The form and what I have so far are below. I do NOT want to use any add-on libraries like mechanize or requests.
Form:
<form action="/newlogin" id="loginForm" method="post" novalidate="novalidate" _lpchecked="1"> <div id="loginarea" class="divider-bottom" data-is-captcha-on="False">
<div id="leftArea">
<div id="loginPanel">
<table id="logintable">
<tbody><tr id="username">
<td><label class="form-label" for="Username">Username:</label></td>
<td><input class="text-box text-box-medium valid" data-val="true" data-val-required="The Username field is required." id="Username" name="Username" type="text" value="" autocomplete="off" aria-required="true" aria-invalid="false" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
</tr>
<tr id="password">
<td><label class="form-label" for="Password">Password:</label></td>
<td><input class="text-box text-box-medium" data-val="true" data-val-required="The Password field is required." id="Password" name="Password" type="password" autocomplete="off" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
</tr>
</tbody></table>
<div>
</div>
<div>
<div id="forgotPasswordPanel">
<a class="text-link" href="/Login/ResetPasswordRequest.aspx" target="_blank">Forgot your password?</a>
</div>
<div id="signInButtonPanel" data-use-apiproxy-signin="False" data-sign-on-api-path="https://api.roblox.com/login/v1">
<a roblox-js-onclick="" class="btn-medium btn-neutral">Sign In</a>
<a roblox-js-oncancel="" class="btn-medium btn-negative">Cancel</a>
</div>
<div class="clearFloats">
</div>
</div>
<span id="fb-root">
<div id="SplashPageConnect" class="fbSplashPageConnect">
<a class="facebook-login" href="/Facebook/SignIn?returnTo=/home" ref="form-facebook">
<span class="left"></span>
<span class="middle">Login with Facebook<span>Login with Facebook</span></span>
<span class="right"></span>
</a>
</div>
</span>
</div>
</div>
<div id="rightArea" class="divider-left">
<div id="signUpPanel" class="FrontPageLoginBox">
<p class="text">Not a member?</p>
<h2>Sign Up to Build & Make Friends</h2>
Sign Up
^Don't know what that "Sign Up" thing is doing there, can't delete it.
What I have so far:
import cookielib
import urllib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib2.install_opener(opener)
authentication_url = 'http://www.roblox.com/newlogin'
payload = {
'ReturnUrl' : 'http://www.roblox.com/home',
'Username' : 'usernamehere',
'Password' : 'passwordhere'
}
data = urllib.urlencode(payload)
req = urllib2.Request(authentication_url, data)
resp = urllib2.urlopen(req)
contents = resp.read()
print contents
I am very new to Python, so I don't know how much of this works. Please let me know what is wrong with my code; I only get the login page when I print contents.
PS: The login page is HTTPS
Solution from OP.
I finished the script myself with the code below:
import cookielib
import urllib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib2.install_opener(opener)
authentication_url = 'https://www.roblox.com/newlogin'
payload = {
'username' : 'YourUsernameHere',
'password' : 'YourPasswordHere',
'' : 'Log In',
}
data = urllib.urlencode(payload)
req = urllib2.Request(authentication_url, data)
resp = urllib2.urlopen(req)
PageYouWantToOpen = urllib2.urlopen("http://www.roblox.com/develop").read()
I made this class a few weeks ago using just urllib.request for some web scraping and automatic tab opening. It may help you out or perhaps get you on the right path.
import urllib.request
class Log_in:
    def __init__(self, loginURL, username, password):
        self.loginURL = loginURL
        self.username = username
        self.password = password

    def log_in_to_site(self):
        auth_handler = urllib.request.HTTPBasicAuthHandler()
        auth_handler.add_password(realm=None,
                                  uri=self.loginURL,
                                  user=self.username,
                                  passwd=self.password)
        opener = urllib.request.build_opener(auth_handler)
        urllib.request.install_opener(opener)
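For completeness, a usage sketch of that class is below; the URL and credentials are placeholders, and keep in mind that HTTPBasicAuthHandler covers HTTP Basic authentication, which is a different mechanism from a form-based login like Roblox's:
# Hypothetical usage; example.com and the credentials are placeholders.
login = Log_in("https://example.com/protected", "myuser", "mypassword")
login.log_in_to_site()  # installs the auth-aware opener globally
page = urllib.request.urlopen("https://example.com/protected").read()
print(page)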

Python 3 script for logging into a website using the Requests module

I'm trying to write some Python (3.3.2) code to log in to a website using the Requests module. Here is the form section of the login page:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
<input type="hidden" name="token" value="236647d2da7c8408ceb78178ba03876ea1f2b687" />
<div class="logincontainer">
<fieldset>
<div class="clearfix">
<label for="username">Email Address:</label>
<div class="input">
<input class="xlarge" name="username" id="username" type="text" />
</div>
</div>
<div class="clearfix">
<label for="password">Password:</label>
<div class="input">
<input class="xlarge" name="password" id="password" type="password"/>
</div>
</div>
<div align="center">
<p>
<input type="checkbox" name="rememberme" /> Remember Me
</p>
<p>Request a Password Reset</p>
</div>
</fieldset>
</div>
<div class="actions">
<input type="submit" class="btn primary" value="Login" />
</div>
</form>
Here is my code, trying to deal with hidden input:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ibvpn.com/billing/clientarea.php'
body = {'username':'my email address','password':'my password'}
s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text)
hiddenInputs = soup.findAll(name = 'input', type = 'hidden')
for hidden in hiddenInputs:
    name = hidden['name']
    value = hidden['value']
    body[name] = value
r = s.post(url, data = body)
This just returns the login page. If I post my login data to the URL in the 'action' field, I get a 404 error.
I've seen other posts on StackExchange where automatic cookie handling doesn't seem to work, so I've also tried dealing with the cookies manually using:
cookies = dict(loginPage.cookies)
r = s.post(url, data = body, cookies = cookies)
But this also just returns the login page.
I don't know if this is related to the problem, but after I've run either variant of the code above, entering r.cookies returns <<class 'requests.cookies.RequestsCookieJar'>[]>
If anyone has any suggestions, I'd love to hear them.
You are loading the wrong URL. The form has an action attribute:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
so you must post your login information to:
https://www.ibvpn.com/billing/dologin.php
rather than posting back to the login page. POST to soup.form['action'] instead:
r = s.post(soup.form['action'], data=body)
Your code is handling cookies just fine; I can see that s.cookies holds a cookie after requesting the login form, for example.
If this still doesn't work (a 404 is returned), then the server is using additional techniques to tell scripts from real browsers. Usually this is done by inspecting the request headers. Look at the headers your browser sends and replicate those. It may just be the User-Agent header that they check, but the Accept-* headers and Referer can also play a role.
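For example, a sketch of replicating browser headers on top of the snippet above (the header values are only illustrative; copy the exact ones your own browser sends, visible in its developer tools):
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': url,  # the login page fetched earlier in the session
}
r = s.post(soup.form['action'], data=body, headers=headers)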

Login to webpage not working with Python requests module

I am trying to use the Python requests module to authenticate to a website and then retrieve some information from it. This is the login part of the page:
<div>
<label class="label-left" for="username"> … </label>
<input id="username" class="inputbox" type="text" size="18" alt="username" name="username"></input>
</div>
<div>
<label class="label-left" for="passwd"> … </label>
<input id="passwd" class="inputbox" type="password" alt="password" size="18" name="passwd"></input>
</div>
<div> … </div>
<div class="readon">
<input class="button" type="submit" value="Login" name="Submit"></input>
What I am doing now is:
payload = {
'username': username,
'passwd': password,
'Submit':'Login'
}
with requests.Session() as s:
    s.post(login, data=payload)
    ans = s.get(url)
    print ans.text
The problem is that I get the same login page, even after the authentication. The response code is 200, so everything should be ok. Am I missing something?
UPDATE
Thanks to the comment, I have analyzed the POST requests and seen that there are some hidden parameters. Among them are some parameters whose values vary between requests. For this reason, I am simply getting them with BeautifulSoup and then updating the payload of the POST request as follows:
with requests.Session() as s:
    login_page = s.get(login)
    soup = BeautifulSoup(login_page.text)
    inputs = soup.findAll(name='input', type='hidden')
    for el in inputs:
        name = el['name']
        value = el['value']
        payload[name] = value
    s.post(login, data=payload)
    ans = s.get(url)
Nevertheless, I am still getting the login page. Could there be some other element influencing this?
