I am trying to use the Requests module to log in to a site and get the HTML of the landing page. I am new to this and can't find a decent tutorial for it.
Here's the information that I have about that page:
HTML of the login form (URL: http://14.139.251.99:8080/jopacv06/html/checkouts)
<FORM NAME="form" METHOD="POST" ACTION="./memberlogin" onsubmit="this.onsubmit= function(){return false;}">
<table class='loginTbl' border='1' align="center" cellspacing='3' cellpadding='3' width='60%'>
<input type="hidden" name="hdnrequesttype" value="1" />
<thead>
<tr>
<td colspan='3' align="middle" class='loginHead'>Login</td>
</tr>
</thead>
<tbody class='loginBody'>
<tr>
<td class='loginBodyTd1' nowrap="nowrap">Employee ID</td>
<td class='loginBodyTd2'><input type='text' name='txtmemberid' id='txtmemberid' value='' class='loginTextBox' size='30' maxlength='8'/></td>
<td class='loginBodyTd3' rowspan='2'><input type="submit" class="goclearbutton" value=" Go "></td>
</tr><input type='hidden' name='txtmemberpwd' id='txtmemberpwd' value='' />
</tbody>
<tfoot>
<tr>
<td colspan='3' class='loginFoot'>
<font class='loginRed'>New Visitor?</font>
Send your registration request to library !
</td>
</tr>
</tfoot>
</table>
</form>
I learned that I may need to set a cookie; the cookie name on the landing page is JSESSIONID (in case that's required). I also found that once I log in successfully, I will have to use BeautifulSoup to extract the details. Please help me combine these pieces.
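You will have to do something like this. The form's action is relative ("./memberlogin"), so resolve it against the page URL, and send the hidden fields along with the member ID:
import requests
from urllib.parse import urljoin
page_url = "http://14.139.251.99:8080/jopacv06/html/checkouts"
with requests.Session() as s:  # the session stores the JSESSIONID cookie for you
    s.get(page_url)  # load the form page first so the cookie gets set
    data = {'hdnrequesttype': '1', 'txtmemberid': '1', 'txtmemberpwd': ''}
    response = s.post(urljoin(page_url, "./memberlogin"), data=data)
    if response.status_code == 200:
        html_code = response.text  # pass this HTML to BeautifulSoup from here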
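If you would rather not hard-code the field names, the pieces can be derived from the login page itself. The following is only a sketch using the standard library: FormFieldParser and build_login_request are names made up for this example, and FORM_HTML is a trimmed copy of the form from the question. It collects every named input and resolves the relative action against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldParser(HTMLParser):
    """Record the first form's action and the value of every named input."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_request(page_url, page_html, credentials):
    """Return (absolute_post_url, payload) for the login form in page_html."""
    parser = FormFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload.update(credentials)  # our values override the empty defaults
    return urljoin(page_url, parser.action or ""), payload


PAGE_URL = "http://14.139.251.99:8080/jopacv06/html/checkouts"
# Trimmed copy of the form from the question.
FORM_HTML = """
<FORM NAME="form" METHOD="POST" ACTION="./memberlogin">
<input type="hidden" name="hdnrequesttype" value="1" />
<input type='text' name='txtmemberid' id='txtmemberid' value='' maxlength='8'/>
<input type="submit" class="goclearbutton" value=" Go ">
<input type='hidden' name='txtmemberpwd' id='txtmemberpwd' value='' />
</form>
"""

post_url, payload = build_login_request(PAGE_URL, FORM_HTML, {"txtmemberid": "1"})
print(post_url)
print(payload)
```

The resulting post_url and payload can then be handed to a requests Session as s.post(post_url, data=payload). Note that the submit button has no name attribute, so it contributes nothing to the payload.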
I'm trying to download a file with Python from a site. The issue is the download automatically starts after submitting the form on the page. Using Mechanize, I am able to log in, get to the page where the download lives, fill out the form, and submit the form (which kicks off the download of an xls file).
Looking in content-disposition, I can see attachment name:
attachment {'filename': 'policytransactions.xls'}
but I cannot figure out how to download this file locally.
Looking at the page source, I can see that the answer to my question is somewhere in here:
<td><div id="form1:j_idt37" class="ui-datatable ui-widget dataTable"><table role="grid"><thead><tr role="row"><th id="form1:j_idt37:j_idt38" class="ui-state-default" role="columnheader"><div class="ui-dt-c"><span></span></div></th></tr></thead><tfoot></tfoot><tbody id="form1:j_idt37_data" class="ui-datatable-data ui-widget-content"><tr data-ri="0" class="ui-widget-content ui-datatable-even" role="row"><td role="gridcell"><div class="ui-dt-c">
<script type="text/javascript" src="/policy/app/javax.faces.resource/jsf.js?ln=javax.faces"></script>
<span class="outputText">XLS</span></div></td></tr></tbody></table></div><script id="form1:j_idt37_s" type="text/javascript">PrimeFaces.cw('DataTable','widget_form1_j_idt37',{id:'form1:j_idt37'});</script></td>
<td><table>
<tbody>
<tr>
<td><span id="form1:dateField3"><input id="form1:dateField3_input" name="form1:dateField3_input" type="text" value="04/01/2017" class="ui-inputfield ui-widget ui-state-default ui-corner-all" /></span><script id="form1:dateField3_s" type="text/javascript">$(function(){PrimeFaces.cw('Calendar','widget_form1_dateField3',{id:'form1:dateField3',popup:true,locale:'en_US',dateFormat:'mm/dd/yy',defaultDate:'04/01/2017'});});</script></td>
<td><span id="form1:dateField4"><input id="form1:dateField4_input" name="form1:dateField4_input" type="text" value="04/28/2017" class="ui-inputfield ui-widget ui-state-default ui-corner-all" /></span><script id="form1:dateField4_s" type="text/javascript">$(function(){PrimeFaces.cw('Calendar','widget_form1_dateField4',{id:'form1:dateField4',popup:true,locale:'en_US',dateFormat:'mm/dd/yy',defaultDate:'04/28/2017'});});</script></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<input type="hidden" name="javax.faces.ViewState" id="javax.faces.ViewState" value="e2s1" />
</form>
</div>
Any suggestions on how to grab this? Thanks
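One possible approach, sketched under assumptions: if the response to the form submission already carries the file (which the Content-Disposition header suggests), its body only needs to be written to disk. The helpers below are hypothetical names for this example and use only the standard library. With Mechanize, the object returned by br.submit() should expose the body through read() and the headers through info(), but check that against your version.

```python
import os
from email.message import Message


def filename_from_content_disposition(header, default="download.bin"):
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header
    return msg.get_filename(failobj=default)


def save_attachment(body, content_disposition, directory="."):
    """Write the response body (bytes) to directory and return the file path."""
    name = filename_from_content_disposition(content_disposition)
    # Keep only the last path component so the header cannot choose the directory.
    name = os.path.basename(name) or "download.bin"
    path = os.path.join(directory, name)
    with open(path, "wb") as fh:
        fh.write(body)
    return path
```

A call would look like save_attachment(response.read(), response.info()["Content-Disposition"]), assuming the response object behaves as described above.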
I'm trying to log in to a website using a Python script, store the cookie I receive, and then use that same cookie to access member-only parts of the website. I've read several posts and answers about this topic, but none of the answers have worked for me.
Here is the HTML code for the website login page I'm trying to access.
<form action="/login?task=user.login" method="post">
<fieldset>
<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="70" nowrap="">Username </td>
<td width="260"><input type="text" name="username" id="username" value="" class="validate-username" size="25"/></td>
</tr>
<tr>
<td width="70" nowrap="">Password </td>
<td width="260"><input type="password" name="password" id="password" value="" class="validate-password" size="25"/></td>
</tr>
<tr>
<td colspan="2"><label style="float: left;width: 70%;" for="modlgn_remember">Remember Me</label>
<input style="float: right;width: 20%;"id="modlgn_remember" type="checkbox" name="remember" class="inputbox" value="yes"/></td>
</tr>
<tr>
<td colspan="2" width="100%"> Forgot your password?</td>
</tr>
<tr>
<td colspan="2" width="100%"> Forgot your username?</td>
</tr>
<tr>
<td colspan="2"><button type="submit" class="button cta">Log in</button></td>
<!-- <td colspan="1">Register Now</td>-->
</tr>
</tbody>
</table>
<input type="hidden" name="return"
value="aHR0cHM6Ly9maWYuY29tLw=="/>
<input type="hidden" name="3295f23066f7c6ab53c290c6c022cc4b" value="1" /> </fieldset>
</form>
Here is my own code that I'm using to attempt a login.
from requests import session
payload = {
    'username': 'MY_USERNAME',
    'password': 'MY_PASSWORD'
}
s = session()
s.post('https://fif.com/login?task=user.login', data=payload)
response = s.get('https://fif.com/tools/capacity')
From everything I have read, this should work, but it doesn't. I've been struggling with this for two days, so if you know the answer, I would love the solution.
For reference, here are all the other StackOverflow posts I have looked at in hopes for an answer:
Python Requests and Persistent Sessions
Logging into a site using Python Requests
Login to website using python
How to “log in” to a website using Python's Requests module?
Python: Requests Session Login Cookies
How to use Python to login to a webpage and retrieve cookies for later usage?
cUrl Login then cUrl Download
You should be posting all the required data, including the hidden fields. You can use bs4 to parse the login page and get the values you need:
from requests import session
from bs4 import BeautifulSoup
data = {
    'username': 'MY_USERNAME',
    'password': 'MY_PASSWORD'
}
head = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
with session() as s:
    # Load the login page first so the session gets its cookies and hidden fields.
    soup = BeautifulSoup(s.get("https://fif.com/login").content, "html.parser")
    # The attribute value is quoted because it contains "/" and "?".
    form_data = soup.select('form[action^="/login?task"] input')
    # Add every named input that we have not already set ourselves.
    data.update({inp["name"]: inp.get("value", "") for inp in form_data
                 if inp.get("name") and inp["name"] not in data})
    s.post('https://fif.com/login?task=user.login', data=data, headers=head)
    resp = s.get('https://fif.com/tools/capacity')
If you make a request and look in Chrome DevTools or Firebug, the form data looks like:
username:foo
password:bar
return:aW5kZXgucGhwP29wdGlvbj1jb21fdXNlcnMmdmlldz1wcm9maWxl
d68a2b40daf7b6c8eaa3a2f652f7ee62:1
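Two things stand out in that form data. The name of the last field is different from the one in the question's HTML (3295f23066f7c6ab53c290c6c022cc4b), which suggests it is a per-session token and is why it has to be read from the page rather than hard-coded. The return field is just base64 text. As a small sketch (decode_return_field is a name invented for this example), it can be decoded like this:

```python
import base64


def decode_return_field(value):
    """Decode the base64 text stored in the form's hidden 'return' field."""
    return base64.b64decode(value).decode("utf-8")


# Value from the login form in the question.
from_question = decode_return_field("aHR0cHM6Ly9maWYuY29tLw==")
# Value seen in the browser tools above.
from_devtools = decode_return_field(
    "aW5kZXgucGhwP29wdGlvbj1jb21fdXNlcnMmdmlldz1wcm9maWxl"
)
print(from_question)
print(from_devtools)
```

Both values decode to a redirect target, so the field appears to tell the site where to send you after login. That reading is an assumption based on the decoded text, not something the site documents.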
I am having difficulty logging in to a website from a Python script, so that I can retrieve data from it once I am connected.
I think the part of the HTML page with the form expecting the username and password is the following:
<div class="contentLogin">
<form action="/login/loginSubmit" method="post" class="memberLogin">
<table cellpadding="0" cellspacing="0" border="0" >
<tr>
<td><label class="color2">Déjà membre</label></td>
<td> </td>
<td><input type="text" value="pseudo" class="input" name="login" id="login" /></td>
<td><input type="password" value="pass" class="input" name="pass" id="pass" /></td>
<td><input type="submit" value="ok" class="color2 loginSubmit" /></td>
</tr>
<tr>
<td colspan="3"></td>
<td colspan="2" >
Mot de passe oublié ?
</td>
</tr>
</table>
</form> </div>
I would like to use Python's requests module to make the POST request that will log me in to the site.
My code already contains the following commands:
import requests
pars = {'login': 'dva2tlse', 'pass': 'VeryStrong', 'action': 'Idunno'}
resp = requests.post("http://www.example.com", params=pars)
But it does not seem to work, because I do not even know which action should be indicated in the POST request. (I am not sure how to use it at all, since I have never done this before.)
Thank you for helping me make all of this work correctly,
David
Change the URL in requests.post to match the one given in the <form action> attribute.
Also, remove the action key from your pars dictionary, and pass the fields with data= instead of params=. The form uses method="post", so the fields belong in the request body, not in the query string.
pars = { 'login': 'dva2tlse', 'pass': 'VeryStrong' }
resp = requests.post("http://example.com/login/loginSubmit", data=pars)
If you want to keep your login state for further page calls, you can use requests.Session():
s = requests.Session()
pars = { 'login': 'dva2tlse', 'pass': 'VeryStrong' }
resp = s.post("http://example.com/login/loginSubmit", data=pars)
As long as you keep using s, you will stay logged in.
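For a form with method="post", a browser sends the fields in the request body. In requests, the params= argument builds the query string of the URL, while data= builds a form-encoded body. The sketch below uses only the standard library to show the two shapes; as_query_string and as_form_body are helper names invented for this example.

```python
from urllib.parse import urlencode


def as_query_string(url, fields):
    """What params= produces: the fields are appended to the URL."""
    return url + "?" + urlencode(fields)


def as_form_body(fields):
    """What data= produces: the fields travel in the request body."""
    return urlencode(fields).encode("ascii")


fields = {"login": "dva2tlse", "pass": "VeryStrong"}
query_url = as_query_string("http://example.com/login/loginSubmit", fields)
form_body = as_form_body(fields)
print(query_url)
print(form_body)
```

Besides matching what the server expects, keeping credentials in the body avoids putting the password in the URL, where it can end up in logs and browser history.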
I'm trying to login to Myanimelist.net programmatically using urllib.
I'm not really sure what is happening. I'm not logging in correctly, as the code it returns isn't what I expect it to be. Not sure what the POST parameters should be or what I'm doing wrong. Looked at a bunch of similar stackoverflow questions regarding urllib login authentications but I can't figure it out.
This is the form:
<form action="http://myanimelist.net/login.php" id="loginForm" method="post" name="loginForm">
<table align="center" border="0" cellpadding="6" cellspacing="0">
<tbody>
<tr>
<td width="100"><strong>Username:</strong></td>
<td>
<input class="inputtext" id="loginUNAME" name="username"size="30" type="text" value="">
</td>
</tr>
<tr>
<td><strong>Password:</strong></td>
<td>
<input class="inputtext" name="password" size="30" type="password">
</td>
</tr>
<tr>
<td align="center" colspan="2">
<input name="cookie" type="checkbox" value="1"> Always stay logged in?
</td>
</tr>
<tr>
<td align="center" colspan="2"><input class="inputButton" name= "sublogin" type="submit" value="Login">
<input class="inputButton" name="register" onclick="document.location='http://myanimelist.net/register.php';" type="button" value="Register"></td>
</tr>
<tr>
<td align="center" colspan="2">
<a href="http://myanimelist.net/password.php">Forget Your
Password?</a>
</td>
</tr>
</tbody>
</table>
</form>
My login code:
import http.cookiejar
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
url = 'http://myanimelist.net/login.php'
payload = {
    'username' : '<username>',
    'password' : '<password>',
    'cookie' : '1',
    'sublogin' : 'Login'
}
data = urllib.parse.urlencode(payload).encode('ascii')
request = urllib.request.Request(url=url, data=data)
response = urllib.request.urlopen(request)
html = response.read().decode('utf-8')
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())
and what I get is:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script>
(function(){function getSessionCookies(){cookieArray=new Array();var cName=/^\s?
incap_ses_/;var c=document.cookie.split(";");for(var i=0;i<c.length;i++){key=c[i
].substr(0,c[i].indexOf("="));value=c[i].substr(c[i].indexOf("=")+1,c[i].length)
;if(cName.test(key)){cookieArray[cookieArray.length]=value}}return cookieArray}f
unction setIncapCookie(vArray){try{cookies=getSessionCookies();digests=new Array
(cookies.length);for(var i=0;i<cookies.length;i++){digests[i]=simpleDigest((vArr
ay)+cookies[i])}res=vArray+",digest="+(digests.join())}catch(e){res=vArray+",dig
est="+(encodeURIComponent(e.toString()))}createCookie("___utmvc",res,20)}functio
n simpleDigest(mystr){var res=0;for(var i=0;i<mystr.length;i++){res+=mystr.charC
odeAt(i)}return res}function createCookie(name,value,seconds){if(seconds){var da
te=new Date();date.setTime(date.getTime()+(seconds*1000));var expires="; expires
="+date.toGMTString()}else{var expires=""}document.cookie=name+"="+value+expires
+"; path=/"}function test(o){var res="";var vArray=new Array();for(test in o){sw
itch(o[test]){case"exists":try{vArray[vArray.length]=encodeURIComponent(test+"="
+typeof(eval(test)))}catch(e){vArray[vArray.length]=encodeURIComponent(test+"="+
e)}break;case"value":try{vArray[vArray.length]=encodeURIComponent(test+"="+eval(
test).toString())}catch(e){vArray[vArray.length]=encodeURIComponent(test+"="+e)}
break;case"plugins":try{p=navigator.plugins;pres="";for(a in p){pres+=(p[a]["des
cription"]+" ").substring(0,20)}vArray[vArray.length]=encodeURIComponent("plugin
s="+pres)}catch(e){vArray[vArray.length]=encodeURIComponent("plugins="+e)}break;
case"plugin":try{a=navigator.plugins;for(i in a){f=a[i]["filename"].split(".");i
f(f.length==2){vArray[vArray.length]=encodeURIComponent("plugin="+f[1]);break}}}
catch(e){vArray[vArray.length]=encodeURIComponent("plugin="+e)}break}}vArray=vAr
ray.join();return vArray}var o={navigator:"exists","navigator.vendor":"value",op
era:"exists",ActiveXObject:"exists","navigator.appName":"value",platform:"plugin
",webkitURL:"exists","navigator.plugins.length==0":"value"};try{setIncapCookie(t
est(o));document.createElement("img").src="/_Incapsula_Resource?SWKMTFSR=1&e="+M
ath.random()}catch(e){img=document.createElement("img");img.src="/_Incapsula_Res
ource?SWKMTFSR=1&e="+e}})();
</script>
<script>
(function() {
var z="";var b="7472797B766172207868723B76617220743D6E6577204461746528292E676574
54696D6528293B766172207374617475733D227374617274223B7661722074696D696E673D6E6577
2041727261792833293B77696E646F772E6F6E756E6C6F61643D66756E6374696F6E28297B74696D
696E675B325D3D22723A222B286E6577204461746528292E67657454696D6528292D74293B646F63
756D656E742E637265617465456C656D656E742822696D6722292E7372633D222F5F496E63617073
756C615F5265736F757263653F4553324C555243543D363726743D373826643D222B656E636F6465
555249436F6D706F6E656E74287374617475732B222028222B74696D696E672E6A6F696E28292B22
2922297D3B69662877696E646F772E584D4C4874747052657175657374297B7868723D6E65772058
4D4C48747470526571756573747D656C73657B7868723D6E657720416374697665584F626A656374
28224D6963726F736F66742E584D4C4854545022297D7868722E6F6E726561647973746174656368
616E67653D66756E6374696F6E28297B737769746368287868722E72656164795374617465297B63
61736520303A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2072
657175657374206E6F7420696E697469616C697A656420223B627265616B3B6361736520313A7374
617475733D6E6577204461746528292E67657454696D6528292D742B223A2073657276657220636F
6E6E656374696F6E2065737461626C6973686564223B627265616B3B6361736520323A7374617475
733D6E6577204461746528292E67657454696D6528292D742B223A20726571756573742072656365
69766564223B627265616B3B6361736520333A7374617475733D6E6577204461746528292E676574
54696D6528292D742B223A2070726F63657373696E672072657175657374223B627265616B3B6361
736520343A7374617475733D22636F6D706C657465223B74696D696E675B315D3D22633A222B286E
6577204461746528292E67657454696D6528292D74293B6966287868722E7374617475733D3D3230
30297B706172656E742E6C6F636174696F6E2E72656C6F616428297D627265616B7D7D3B74696D69
6E675B305D3D22733A222B286E6577204461746528292E67657454696D6528292D74293B7868722E
6F70656E2822474554222C222F5F496E63617073756C615F5265736F757263653F535748414E4544
4C3D343436383430363736353433323532383538322C313135353632373036373830323435333434
33392C31373934323137333630323239353234373434352C333830353834222C66616C7365293B78
68722E73656E64286E756C6C297D63617463682863297B7374617475732B3D6E6577204461746528
292E67657454696D6528292D742B2220696E6361705F6578633A20222B633B646F63756D656E742E
637265617465456C656D656E742822696D6722292E7372633D222F5F496E63617073756C615F5265
736F757263653F4553324C555243543D363726743D373826643D222B656E636F6465555249436F6D
706F6E656E74287374617475732B222028222B74696D696E672E6A6F696E28292B222922297D3B";
for (var i=0;i<b.length;i+=2){z=z+parseInt(b.substring(i, i+2), 16)+",";}z = z.s
ubstring(0,z.length-1); eval(eval('String.fromCharCode('+z+')'));})();
</script></head>
<body>
<iframe style="display:none;visibility:hidden;" src="//content.incapsula.com/jsT
est.html" id="gaIframe"></iframe>
</body>
</html>
Any ideas about what I'm doing wrong?
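The problem is not in how the request is sent. Because the Request object is built with data, the following line already submits it as a POST rather than a GET:
response = urllib.request.urlopen(request)
What comes back is not the site's own response but an Incapsula bot-protection page (note the /_Incapsula_Resource URLs in the output). It expects a browser to run its JavaScript and set cookies, which urllib cannot do, so the login form is most likely never processed.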
From the docs:
urllib.request.urlopen(url[, data][, timeout])
Open the URL url, which can be either a string or a Request object.
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
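The documented behaviour quoted above can be checked without touching the network, since a Request object reports the method it will use. This is a minimal sketch; the placeholder credentials are the ones from the question.

```python
import urllib.parse
import urllib.request

url = "http://myanimelist.net/login.php"
data = urllib.parse.urlencode(
    {"username": "<username>", "password": "<password>"}
).encode("ascii")

# Nothing is sent here; the objects are only built and inspected.
with_data = urllib.request.Request(url=url, data=data)
without_data = urllib.request.Request(url=url)

print(with_data.get_method())
print(without_data.get_method())
```

So a Request that carries data is sent as a POST when it is passed to urlopen, and one without data is sent as a GET.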
I tried to convert a page generated by App Engine into a PDF, and ran into some problems.
First: I select the contents with jQuery.
Second: I send this JavaScript variable to a new Python script.
Third: in the new Python script, I use xhtml2pdf to do the conversion.
However, I got stuck on the second step. Below is my approach:
HTML:
<div class="articles">
<h2 class="model_header">PFAM Output</h2>
<form>
<table align="center">
<!--end 04uberoutput_start-->
<table class="out_chemical" width="550" border="1">
<tr>
<th scope="col" colspan="5">
<div align="center">Chemical Inputs</div>
</th>
</tr>
<tr>
<th scope="col" width="250">
<div align="center">Variable</div>
</th>
<th scope="col" width="150">
<div align="center">Unit</div>
</th>
<th scope="col" width="150">
<div align="center">Value</div>
</th>
</tr>
<tr>
<td>
<div align="center">Water Column Half life #20 ℃</div>
</td>
<td>
<div align="center">days</div>
</td>
<td>
<div align="center">11</div>
</td>
</tr>
</table>
</table>
</form>
</div>
JS
$(document).ready(function () {
    var jq_html = $("div.articles").html();
    console.log(jq_html);
    $('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
    $('.getpdf').append('<tr><td><input type="submit" value="Generate PDF"/></td></tr>');
});
New Python script to do the conversion:
def post(self):
    form = cgi.FieldStorage()
    extract = form.getvalue('extract')
    print extract
    self.response.out.write(html)
When I tried to check whether the variable extract is transferred correctly, I got an empty page. It seems like this variable is ignored. The whole framework works fine if I feed extract a number. Could anyone help me identify whether my approach is correct? Thanks!
This line of code does not handle escaping HTML correctly. Additionally, it is a text field rather than a hidden field:
$('.getpdf').append('<tr style="display:none"><td><input name="extract" value="' + jq_html + '"></input></td></tr>');
A better way to do it would be like this:
$('<tr style="display:none"><td><input type="hidden" name="extract"></td></tr>')
    .appendTo('.getpdf')
    .find('input')
    .val(jq_html);
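The reason the string concatenation fails is that the extracted HTML contains double quotes (for example align="center"), and the first one ends the value attribute early. Setting the value through .val() sidesteps that. If the field were instead built as a string on the Python side, the value would need escaping first. The sketch below illustrates this with the standard library; hidden_input is a name invented for this example.

```python
from html import escape


def hidden_input(name, value):
    """Build a hidden <input> whose value attribute is safely escaped."""
    return '<input type="hidden" name="{}" value="{}">'.format(
        escape(name, quote=True), escape(value, quote=True)
    )


field = hidden_input("extract", '<div align="center">11</div>')
print(field)
```

The browser decodes the entities again when it submits the form, so the server should receive the original markup.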