Scraping with Python and ViewState - python

Until a couple of weeks ago I was able to get JSON data from a website by sending POST requests:
payload = {'NumDossier': numCase_com, 'idJuridiction': numJuri, 'typeDossier': "DP"}
r = requests.post(url, json=payload)
Now I am getting this strange HTML on every POST request:
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<form method="post" action="./getJuridiction1instance" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZDDGgir+XqoIRtkvd//GurKfYTbq8hIisZRyOefPLUDj" />
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="79390B9D" />
<div>
</div>
</form>
</body>
</html>
When inspecting the network traffic in Chrome, at no point does the browser send __VIEWSTATE or __VIEWSTATEGENERATOR. I spent two days trying to figure out what I am doing wrong, without success. (I am no expert in ASP.NET.)
PS: I get the same result with postman
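The returned page looks like an ASP.NET WebForms form rather than a JSON response. One thing that might be worth trying, purely as an assumption about how this server now behaves, is to read the hidden fields from that page and send them back as form data (data= instead of json=) together with the original payload. The sketch below covers only the extraction step, using the standard library; extract_hidden_fields is a hypothetical helper name and the __VIEWSTATE value is a placeholder, not the real one.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<input ... />) are routed here as well.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Shortened copy of the page from the question (placeholder __VIEWSTATE value).
sample = """
<form method="post" action="./getJuridiction1instance" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="PLACEHOLDER" />
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="79390B9D" />
</form>
"""

hidden = extract_hidden_fields(sample)
print(hidden)
```

The resulting dict could then be merged with the payload and posted with requests.post(url, data=...). Whether the server accepts that is not something the question confirms.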

Related

How handle a Tag field in JS together with Flask Form?

Intro: I can receive the values from input fields in Flask by
var = request.form['description']
and the following HTML:
<form method="post" enctype="multipart/form-data">
<div class="mb-3">
<label for="description">Beschreibung</label>
<textarea name="description" placeholder="Post description"
class="form-control"></textarea>
</div>
<div class="form-group">
<button type="submit" class="btn btn-primary">Submit</button>
</div>
</form>
Question: Any idea how to handle a tag field, e.g. Link, together with a Flask form? How can I handle a Flask form and a $.ajax POST request simultaneously? Any hint would be highly appreciated. I did my research but found nothing online.
I have now found an easy solution. I don't know if it is very clean, but what I did was basically create a hidden field and pass the variable from JS to the hidden field inside the <form> so that I can grab it with request in Flask. I also did this the other way around, passing the variable from Flask to JS. This way I can use the CSS and JS from the link above outside of the <form>, i.e. I can press Enter
HTML:
<!-- from JS to Flask -->
<input type="hidden" name="hidden_tags" id="hidden_tags" value="" />
<!-- from Flask to JS -->
<input type="hidden" name="hidden_tags2" id="hidden_tags2" value="{{ post['tags'] }}"/>
in JS:
tags = document.getElementById("hidden_tags2").value.split('/');
document.getElementById("hidden_tags").value = tags
in Flask:
request.form['hidden_tags']
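Note that assigning a JS array to an input's value stringifies it, so the tags reach Flask as a single comma-joined string. A minimal sketch of turning that string back into a list on the Python side is shown here; parse_tags is a hypothetical helper name, not part of Flask.

```python
def parse_tags(raw):
    """Split a comma-joined tag string into a list of non-empty tag names."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


# In a Flask view this would be called as parse_tags(request.form["hidden_tags"]).
tags = parse_tags("python, flask,,ajax")
print(tags)
```

Empty entries are dropped so that a trailing comma or an empty hidden field does not produce blank tags.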

How to send HTML form data to SQLite using Python

For the last few days I have been trying to build a website where you enter the post data, hit submit, and the post then appears on the home page/post page.
I have created the form, but the problem is that I don't know how to send the HTML form data to an SQLite database so that it can be viewed by multiple users at any time.
<form class="posts" action="." method="post">
<h2>Title</h2>
<input type="text" name="title" placeholder="Title">
<h2>Description</h2>
<textarea class="des" name="description" placeholder="Description"></textarea>
<h2>Image(Optional)</h2>
<input type="file" name="inpFile" id="inpFile" class="img-btn">
<div class="img-prev" id="imgPrev">
<img src="" alt="Image Preview" class="img-prev__img">
<span class="img-prev__def-text">Image Preview</span>
</div>
<input type="submit" value="Submit">
</form>
I think you should first take a look at a Python web framework such as Flask or Django, so that you can understand this subject a little more clearly.
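Independent of the framework, the database side can be sketched with the standard sqlite3 module. This is only an illustration under assumptions: the posts table, its columns, and the helper names are hypothetical, and an in-memory database stands in for a real file. In a web framework, title and description would come from the submitted form.

```python
import sqlite3


def create_schema(conn):
    """Create the (hypothetical) posts table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL, "
        "description TEXT NOT NULL)"
    )


def save_post(conn, title, description):
    """Insert one post using bound parameters and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO posts (title, description) VALUES (?, ?)",
            (title, description),
        )
    return cursor.lastrowid


def list_posts(conn):
    """Return all posts as (title, description) tuples, oldest first."""
    return conn.execute(
        "SELECT title, description FROM posts ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
create_schema(conn)
save_post(conn, "Hello", "My first post")
posts = list_posts(conn)
print(posts)
```

The ? placeholders let sqlite3 bind the values itself, so user input is never concatenated into the SQL string.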

Python requests with login credentials

I am trying to log in to a URL, download the content, and then parse it. The URL requires a username and password to log in.
The code below does not work:
import requests
url = 'https://test/acx/databaseUsage.jssp?object=all'
values = {'username': 'test_user',
'password': 'test_pswd'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.post(url, data=values, headers=headers)
print(r.content)
Error log output from above code:
I also tried the values below, without any success:
values = {'Login': 'test',
'Password': 'test',
'Log in': 'submit'}
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge"/> <!-- must be first; see SD5930 -->
<title>Test URL login</title>
<!--meta name="apple-mobile-web-app-capable" content="yes" /-->
<link type="text/css" rel="StyleSheet" href="/nl/logon.css"></link>
</head>
<body onLoad="setFocus();">
<div id="htmlContent">
<div id="container">
<div id="content">
<div class="login_frame">
<div class="header_login">
<img src="/nl/img/logo.png" alt="Test URL" />
</div>
<div id="form-main">
<!--[if lte IE 7]>
<div class="warning"><b>Warning</b>: your browser isn't supported by Test URL. <br/>To be able to use Test URL to its full potential, you need to update your browser.</div>
<![endif]-->
<form method="POST" autocorrect="off" autocapitalize="off" name="loginForm" action="/nl/jsp/logon.jsp">
<input type="hidden" name="action" value="submit" />
<input type="hidden" name="target" value="/acx/databaseUsage.jssp?object=all">
<p class="input first">
<label for="login">Login</label>
<span>
<input id="login" name="login" tabindex="1" type="text" value="" />
</span>
</p>
<p class="input">
<label for="password">Password</label>
<span>
<input id="password" name="password" tabindex="2" type="password" autocomplete="off" />
</span>
<br />
</p>
<p class="memorize submit last">
<input id="rememberMe" name="rememberMe" class="checkbox" tabindex="3" type="checkbox" />
<label class="checkbox" for="rememberMe">Keep me logged in</label>
<button id="validate" type="submit">Log in</button>
</p>
</form>
</div>
</div>
</div>
</div>
<div id="footer" class="dashboardFooter">
<div id="footerContent" class="nlui-pageWidth">
<p>
© Test URL 2017
</p>
</div>
</div>
</div>
<script type="text/javascript">
function setFocus() {
document.loginForm.login.focus();
}
</script>
</body>
</html>
In order to log in successfully you'll have to submit the correct data to the correct URL. You can get those values from the HTML form, or by inspecting the network traffic in your browser. You may also want to collect any authentication cookies.
Make sure to use the correct URL. You can get that URL from the form's action attribute (if the form has no action it is submitted to the URL that hosts it). If you examine the form you'll see that it is submitted to: "/nl/jsp/logon.jsp".
Make sure to include all required data. If the form contains hidden inputs they should be included in the POST data. It is important to submit all the form fields because they may contain essential data.
You can use a Session() object to store your cookies. This will collect and use cookies (and other parameters) across requests, and so you can access the site as an authenticated user.
If you want to set or change headers you can use either the headers parameter or the Session.headers attribute, which will use those headers for all requests. Usually changing the default User-Agent is enough, but some sites may expect more headers (a valid Referer, for example).
import requests

url = 'https://example.com/nl/jsp/logon.jsp'
post_data = {
    'login': 'username',
    'password': 'password',
    'target': '/acx/databaseUsage.jssp?object=all',
    'action': 'submit'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'My user-agent'
    r = s.post(url, data=post_data)
    print(r.text)
If you still can't login you may have to use Selenium. Sometimes JavaScript is involved in the login process and requests doesn't run JavaScript code. It may be possible to reverse-engineer this process but it would be much easier/better to use Selenium.
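The rule for finding the submission URL (use the form's action, or the hosting page's URL if there is no action) can be sketched with urljoin from the standard library. resolve_form_action is a hypothetical helper name, and example.com stands in for the real host.

```python
from urllib.parse import urljoin


def resolve_form_action(page_url, action):
    """Return the absolute URL a form submits to.

    A missing or empty action means the form posts back to the page itself.
    """
    return urljoin(page_url, action or "")


page_url = "https://example.com/acx/databaseUsage.jssp?object=all"

with_action = resolve_form_action(page_url, "/nl/jsp/logon.jsp")
without_action = resolve_form_action(page_url, None)

print(with_action)
print(without_action)
```

This also handles relative actions such as "./logon.jsp", which are resolved against the page URL.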

Python web crawler login redirects to login page

I am currently trying to crawl a homepage that requires a login. My current code looks like this:
url="xyz"
session=requests.Session()
mail='mail'
password='pw'
login_data={
'clientid':mail,
'key':password,
}
source_code=session.post('xyz_login', data=login_data)
source_code=session.get(url)
Unfortunately, this does not work. It does not give an error but redirects me to the login page. The HTML of the login is the following:
<div><input name="clientid" style="margin:5px;width:340px;" value=""/></div>
<div style="padding:0 5px;">Password</div>
<div><input name="key" style="margin:5px;width:340px;" type="password"/></div>
<div style="padding:5px;*padding: 0 5px;">
<input type="submit" style="display:none"/>
<img id="submitButton" src="image/buttonLogin.png" alt="submit" onclick="submitForm();$('#loginForm').submit();" style="cursor:pointer;"/>
<img id="waiting" src="image/indicator.gif" alt="waiting" style="display:none"/>
<img src="image/buttonCreateAccount.png" alt="create new Account"/>
Does anyone have an idea where the problem lies? Is it because of submitForm()?

Python 3 script for logging into a website using the Requests module

I'm trying to write some Python (3.3.2) code to log in to a website using the Requests module. Here is the form section of the login page:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
<input type="hidden" name="token" value="236647d2da7c8408ceb78178ba03876ea1f2b687" />
<div class="logincontainer">
<fieldset>
<div class="clearfix">
<label for="username">Email Address:</label>
<div class="input">
<input class="xlarge" name="username" id="username" type="text" />
</div>
</div>
<div class="clearfix">
<label for="password">Password:</label>
<div class="input">
<input class="xlarge" name="password" id="password" type="password"/>
</div>
</div>
<div align="center">
<p>
<input type="checkbox" name="rememberme" /> Remember Me
</p>
<p>Request a Password Reset</p>
</div>
</fieldset>
</div>
<div class="actions">
<input type="submit" class="btn primary" value="Login" />
</div>
</form>
Here is my code, trying to deal with hidden input:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ibvpn.com/billing/clientarea.php'
body = {'username':'my email address','password':'my password'}
s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text, 'html.parser')
hiddenInputs = soup.findAll(name = 'input', type = 'hidden')
for hidden in hiddenInputs:
    name = hidden['name']
    value = hidden['value']
    body[name] = value
r = s.post(url, data = body)
This just returns the login page. If I post my login data to the URL in the 'action' field, I get a 404 error.
I've seen other posts on StackExchange where automatic cookie handling doesn't seem to work, so I've also tried dealing with the cookies manually using:
cookies = dict(loginPage.cookies)
r = s.post(url, data = body, cookies = cookies)
But this also just returns the login page.
I don't know if this is related to the problem, but after I've run either variant of the code above, entering r.cookies returns <<class 'requests.cookies.RequestsCookieJar'>[]>
If anyone has any suggestions, I'd love to hear them.
You are loading the wrong URL. The form has an action attribute:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
so you must post your login information to:
https://www.ibvpn.com/billing/dologin.php
instead of posting back to the login page. POST to soup.form['action'] instead:
r = s.post(soup.form['action'], data=body)
Your code is handling cookies just fine; I can see that s.cookies holds a cookie after requesting the login form, for example.
If this still doesn't work (a 404 is returned), then the server is using additional techniques to distinguish scripts from real browsers. Usually this is done by parsing the request headers. Look at your browser's headers and replicate those. It may just be the User-Agent header that they parse, but Accept-* headers and Referer can also play a role.
