python autofill form in a webpage - python

I am trying to fill in a form on a webpage that has a single text box and a Send button. The HTML looks like this:
<form class="form-horizontal">
<div class="row">
<div class="col-md-12">
<div id="TextContainer" class="textarea-container">
<textarea id="Text" rows="5" maxlength="700" class="form-control remove-border" style="background:none;"></textarea>
</div><button id="Send" class="btn btn-primary-outline" type="button" onclick="SendMessage()" style="margin-top:10px" data-loading-text="Loading..."><span class="icon icon-pencil"></span> Send</button>
</div>
</div>
</form>
I tried to use mechanize to submit the form with this code:
import re
from mechanize import Browser
br = Browser()
response = br.open("https://abcd.com/")
for f in br.forms():
    if f.attrs['class'] == 'form-horizontal':
        br.form = f
text = br.form.find_control(id="Text")
text.value = "something"
br.submit()
The code runs without an error, but nothing is submitted. How can I do it?
Here is the SendMessage function
function SendMessage() {
var text = $('#Text').val();
var userId = $('#RecipientId').val();
if (text.trim() === "")
{
$('#TextContainer').css('border-color', 'red');
}
else if (new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(text))
{
$('#TextContainer').css('border-color', 'red');
$('#message').html("Links are not allowed in messages");
}
else
{
$('#Send').button('loading');
$.ajax(
{
url: '/Messages/SendMessage',
type: 'POST',
cache: false,
data:
{
__RequestVerificationToken: $('<input name="__RequestVerificationToken" type="hidden" value="CfDJ8MQSRebrM95Pv2f7WNJmKQWGnVR66zie_VVqFsquOCZLDuYRRBPP1yzk_755VDntlD3u0L3P-YYR0-Aqqh1qIjd09HrBg8GNiN_AU48MMlrOtUKDyJyYCJrD918coQPG0dmgkLR3W85gV6P4zObdEMw" />').attr('value'),
userId: userId,
text: text
}
});
}
}

I suspect the issue is that the submit button in the HTML form is not of type=submit, so mechanize won't know what to do when you call br.submit(). The fix is to either change the button type in the page's HTML or tell Browser which button to use for submitting the form:
br.submit(type='button', id='Send')
The submit method takes the same arguments as the HTML Forms API, so I recommend taking a look at the documentation for more details.
Update
The problem here seems to be the JavaScript method attached to the button. Mechanize does not support calling JavaScript functions, hence you won't be able to just use the .submit() method to submit the form. Instead, the best option would probably be to read in the SendMessage() JavaScript function, which gets called if someone clicks on the Send button, and translate it to Python manually. In the best case it consists of a simple AJAX POST request which is very easy to implement in Python. Please look here for a related question.
Second Update
Given the new information in your question, in particular the JavaScript function, you can now manually implement the POST request inside your Python script. I suggest the use of the Requests module which will make the implementation much easier.
import requests
data = {
"__RequestVerificationToken": "CfDJ8MQSRebrM95Pv2f7WNJmKQWGnVR66zie_VVqFsquOCZLDuYRRBPP1yzk_755VDntlD3u0L3P-YYR0-Aqqh1qIjd09HrBg8GNiN_AU48MMlrOtUKDyJyYCJrD918coQPG0dmgkLR3W85gV6P4zObdEMw",
"userId": "something",
"text": "something else"
}
response = requests.post("https://example.com/Messages/SendMessage", data=data)
response now holds the server's reply, which you can use to check whether the request succeeded. Note that you will probably need to read the __RequestVerificationToken out of the page each time, as I suspect it is regenerated whenever you open the website. You could read the HTML source with html_source = br.response().read() and then search it for __RequestVerificationToken to extract the corresponding value.
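The token lookup described above could be sketched roughly as follows. This is only an illustration: the html_source fragment and the token value in it are made up, extract_token is a hypothetical helper, and the regular expression assumes the name attribute appears before value, as it does in the question. The server may also expect the cookies it set when the page was loaded, so it is probably safest to fetch the page and send the POST from the same session.

```python
import re

# Hypothetical page fragment; a real page would come from the HTTP response body.
html_source = '''
<form class="form-horizontal">
<input name="__RequestVerificationToken" type="hidden" value="abc123-XYZ_token" />
<textarea id="Text"></textarea>
</form>
'''


def extract_token(html):
    """Return the value of the hidden __RequestVerificationToken input, or None."""
    match = re.search(
        r'name="__RequestVerificationToken"[^>]*\bvalue="([^"]*)"', html
    )
    return match.group(1) if match else None


token = extract_token(html_source)
# Same keys as in the SendMessage() AJAX call from the question.
data = {
    "__RequestVerificationToken": token,
    "userId": "something",
    "text": "something else",
}
print(data)
```

If extract_token returns None, the page layout differs from this assumption and the pattern needs adjusting before any POST is attempted.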

You can give your textarea a name attribute, like this:
<form class="form-horizontal">
<div class="row">
<div class="col-md-12">
<div id="TextContainer" class="textarea-container">
<textarea id="Text" name="sometext" rows="5" maxlength="700" class="form-control remove-border" style="background:none;"></textarea>
</div><button id="Send" class="btn btn-primary-outline" type="button" onclick="SendMessage()" style="margin-top:10px" data-loading-text="Loading..."><span class="icon icon-pencil"></span> Send</button>
</div>
</div>
</form>
Then try this out:
import mechanize
br = mechanize.Browser()
br.open("https://abcd.com/")
br.select_form(nr=0)  # with a single form on the page, nr=0 selects it
br["sometext"] = "something"
response = br.submit()
print(response.read())
If the form is submitted successfully, you can then read the response body.

Related

I am getting a `KeyError` when I receive a request to my API

I am building a dictionary app with Flask where users can add new words. I am trying to read the word from the word input, but I am having issues with the POST request. The error I am receiving in my terminal is this:
line 50, in add_word
word = req['word']
KeyError: 'word'
and this is how I wrote the code in my app.py file:
@app.route('/word', methods=['POST'])
def add_word():
    req = request.get_json()
    word = req['word']
    meaning = req['meaning']
    conn = mysql.get_db()
    cur = conn.cursor()
    cur.execute('insert into word(word, meaning) VALUES (%s, %s)', (word, meaning))
    conn.commit()
    cur.close()
    return json.dumps("success")
Here is the JavaScript that posts JSON to my Flask app:
$('#word-form').submit(function() {
    let word = $('word').val();
    let meaning = $('meaning').val();
    $.ajax({
        url: '/word',
        type: 'POST',
        dataType: 'json',
        data : JSON.stringify({
            'word': word,
            'meaning': meaning
        }),
        contentType: 'application/json, charset = UTF-8',
        success: function(data) {
            location.reload();
        },
        error: function(err) {
            console.log(err);
        }
    })
});
Here is the HTML page:
<div class="div col-md-2 sidenav">
All words
Add New
<div>
<form action="javascript:0" id="word-form">
<div class="form-group">
<label for="word">Word:</label>
<input type="text"
class="form-control"
name="word"
id="word"
placeholder="Type in the word here:"
required>
</div>
<div class="form-group">
<label for="Meaning">Meaning:</label>
<textarea class="form-control" id="meaning"
placeholder="enter the meaning here: " required></textarea>
</div>
<button type="submit" class="btn btn-primary btn-block btn-lg" id="submit">Submit</button>
<button type="button" class="btn btn-warning btn-block btn-lg" id="cancel">Cancel</button>
</form>
</div>
</div>
<div class="div col-md-10 main">
<table style="border: 2px;">
<thead>
<tr>
<th>SN</th>
<th>Word</th>
<th>Meaning</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
{% for word in words %}
<tr>
<td>{{ loop.index }}</td>
<td>{{ word['word'] }}</td>
<td>{{ word['meaning'] }}</td>
<td><button class="btn btn-sm btn-success btn-block edit" id="{{word['id']}}">Edit</button></td>
<td><button class="btn btn-sm btn-danger btn-block delete" id="{{word['id']}}">Delete</button></td>
</tr>
{% else %}
<tr>
<td colspan="3">The dictionary has no words at the moment, please come back later</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
Things seem to be in a confused state in the client code, and potentially the application architecture in general.
There are two general approaches to designing web apps that impact how you create routes and build requests. One approach is the AJAX-based single page app architecture that loads one HTML skeleton, then uses JS (jQuery here) to make AJAX requests to a JSON API and injects the response data into the page using JS instead of page refreshes and navigations. Since you don't have a client-side router, this doesn't qualify as a SPA, but it's worth understanding to provide context on your design.
On the other hand, you can use HTML form submissions (<form action="/path/to/resource" method="POST">) and render_template to display new pages with a browser refresh for all form submissions.
The code here is somewhere in the middle, which is potentially fine (you can use AJAX for certain events and form submissions but mostly rely on full-navigation templates for routes). But it's important to be clear on the request-response workflow you're adopting so the design makes sense and can be debugged.
Here are a few oddities/inconsistencies in your current design:
return json.dumps("success") is not really a JSON response as it seems like you want it to be (it is a bare string served with Flask's default text/html content type)--use jsonify and a dictionary or list, e.g. jsonify({"status": "success"}); it's customary for JSON routes to return JSON responses if they aren't rendering templates or redirecting.
The client ignores the JSON response and calls location.reload. If you're just planning on reloading and you have no special client processing to do, there's not much point in using AJAX or JSON here--just submit the form to the backend and redirect to the template or static HTML page you want to show next. No client-side JS involved. Redirect to an error page or render a template with errors shown on the form on error.
Links with href="#" are poor practice. Better to use buttons if you're adding JS to these handlers and you don't want them to trigger navigation. This is semantically more appropriate and doesn't hijack the URL.
<form action="javascript:0" id="word-form"> looks like it's trying to prevent the form submission, but all this does is replace the page content with the character "0". I can't imagine how this is useful or desirable. Submitting a form to a JSON route can produce the error you're seeing--another sign of confusion about which architecture you're following. Use event.preventDefault() (add the event parameter to the callback to .submit()) to prevent the form submission from refreshing the page.
After you've prevented the page refresh, you can debug the AJAX request.
When a route is complaining about missing keys, consider that objects with keys pointing to undefined disappear when serialized as JSON (undefined is not a thing in JSON):
const word = undefined;
const foo = 42;
const bar = "baz";
console.log({word, foo, bar}); /* =>
{
"word": undefined,
"foo": 42,
"bar": "baz"
}
*/
console.log(JSON.stringify({
word,
foo,
bar,
})); // => {"foo":42,"bar":"baz"}
If you add a console.log to see if your values are there (or print the JSON on the backend route before indexing into it), these values aren't defined:
let word = $('word').val();
let meaning = $('meaning').val();
console.log(word, meaning); // <-- undefined, undefined
Why? The reason is that these selectors are missing the # symbol prefix to denote an id. Without it, jQuery looks for <word></word> and <meaning></meaning> HTML elements that don't exist.
Change these lines to:
const word = $('#word').val();
const meaning = $('#meaning').val();
and now your request body should be ready to send.
Next problem: $.ajax's dataType key specifies the response type, not the request type. Use contentType: "application/json" to set the request header that tells the Flask handler to parse the request body as JSON.
After these changes, things should work, with the caveat that it might be time for a rethink of your overall design, or at least a redesign of this route workflow.
A word of advice: work slowly and test all of your assumptions at each step. The code here shows many errors that are hard to debug because they're stacked on top of each other. Isolate and validate each behavior in your app. For example, when adding the jQuery submit handler and collecting the form values, print them to make sure they're actually there as you expected.
In case you're stuck, here's minimal, complete, runnable code you can reference.
app.py:
from flask import (
    Flask, jsonify, render_template, request, url_for
)
app = Flask(__name__)
@app.route("/")
def index():
    return render_template("index.html")
@app.post("/words/")
def words():
    payload = request.get_json()
    word = payload.get("word")
    meaning = payload.get("meaning")
    if word is None or meaning is None:
        return (jsonify({
            "error": "missing required keys `word` or `meaning`"
        }), 400)
    # handle db operation and report failure as above
    return jsonify({"status": "success"})
if __name__ == "__main__":
    app.run(debug=True)
templates/index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
</head>
<body>
<form id="word-form">
<div>
<label for="word">Word:</label>
<input
name="word"
id="word"
placeholder="Type in the word here:"
required
>
</div>
<div>
<label for="meaning">Meaning:</label>
<textarea
name="meaning"
id="meaning"
placeholder="enter the meaning here: "
required
></textarea>
</div>
<button type="submit">Submit</button>
</form>
<script>
$('#word-form').submit(function (e) {
e.preventDefault();
const word = $('#word').val();
const meaning = $('#meaning').val();
console.log(word, meaning);
$.ajax({
url: "{{url_for('words')}}",
type: "POST",
dataType: "json",
contentType: "application/json",
data: JSON.stringify({word, meaning}),
success: function (data) {
console.log(data);
},
error: function (err) {
console.error(err);
}
});
});
</script>
</body>
</html>
See also: How to get POSTed JSON in Flask?
Here are a few additional notes that are somewhat tangential to the main issue but have to be mentioned.
You have <label for="Meaning">, but no element with id="Meaning" to shift focus to when clicked (the textarea's id is the lowercase meaning, and for must match an id, not a name).
It's another antipattern to put ids on everything promiscuously. Only add ids to elements when they must have one because you're using it for something specific. Prefer classes, especially for styling.
On the backend, the code here is unsafe:
req = request.get_json()
word = req['word']
meaning = req['meaning']
If your client gives a bad request with missing values, you should detect that and return a 400/422 response (or similar) rather than crashing.
For example (from the above code snippet):
req = request.get_json()
word = req.get("word")
meaning = req.get("meaning")
if word is None or meaning is None:
    return (jsonify({
        "error": "missing required keys `word` or `meaning`"
    }), 400)
Similarly, don't assume the database operation will succeed. Always check for errors and return an appropriate response to the client.
Resources are usually plural, not singular: words, users, posts.
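As a rough illustration of the note above about not assuming the database operation will succeed, here is a minimal sketch. It swaps in the standard-library sqlite3 module for the MySQL connection used in the question so that it runs on its own; add_word here is a hypothetical helper that returns a (body, status) pair in the shape a JSON route might pass to jsonify, and the table definition is an assumption.

```python
import sqlite3


def add_word(conn, word, meaning):
    """Insert a word and return a (body, status) pair for a JSON response."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO word (word, meaning) VALUES (?, ?)", (word, meaning)
            )
    except sqlite3.Error as exc:
        # Report the failure to the client instead of letting the route crash.
        return {"error": f"database error: {exc}"}, 500
    return {"status": "success"}, 200


# Assumed schema: the primary key makes a duplicate insert fail on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (word TEXT PRIMARY KEY, meaning TEXT NOT NULL)")
print(add_word(conn, "ephemeral", "lasting a very short time"))
print(add_word(conn, "ephemeral", "duplicate entry"))
```

The second call violates the primary key, so it comes back as an error body with status 500 rather than an unhandled exception.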

Login to https website using Python

I'm new to posting on stackoverflow so please don't bite! I had to resort to making an account and asking for help to avoid banging my head on the table any longer...
I'm trying to log in to the following website, https://account.socialbakers.com/login, using the requests module in Python. It seems as if the requests module is the place to go, but the session.post() function isn't working for me. I can't tell if there is something unique about this type of form or whether it is because the website uses https.
The login form is the following:
<form action="/login" id="login-form" method="post" novalidate="">
<big class="error-message">
<big>
<strong>
</strong>
</big>
</big>
<div class="item-full">
<label for="">
<span class="label-header">
<span>
Your e-mail address
</span>
</span>
<input id="email" name="email" type="email"/>
</label>
</div>
<div class="item-list">
<div class="item-big">
<label for="">
<span class="label-header">
<span>
Password
</span>
</span>
<input id="password" name="password" type="password"/>
</label>
</div>
<div class="item-small">
<button class="btn btn-green" type="submit">
Login
</button>
</div>
</div>
<p>
<a href="/email/reset-password">
<strong>
Lost password?
</strong>
</a>
</p>
</form>
Based on the following post How to "log in" to a website using Python's Requests module? among others I have tried the following code:
from bs4 import BeautifulSoup
from requests import session
url = 'https://account.socialbakers.com/login'
payload = dict(email='Myemail', password='Mypass')
with session() as s:
    soup = BeautifulSoup(s.get(url).content, 'lxml')
    p = s.post(url, data=payload, verify=True)
    print(p.text)
This, however, just gives me the login page again and doesn't seem to log me in.
I have checked in the form that I am referring to the correct names of the inputs 'email' and 'password'. I've tried explicitly passing through cookies as well. The verify=True parameter was suggested as a way to deal with the fact the website is https.
I can't work out what isn't working/what is different about this form to the one on the linked post.
Thanks
Edit: Updated p = s.get to p = s.post
I checked the website. It sends the SHA3 hash of the password instead of sending it as plaintext. You can see this in line 111 of script.js, which is included in the main page as:
<script src="/js/script.js"></script>
inside the head tag.
So you need to replicate this behaviour while sending POST requests. I found pysha3 library that does the job pretty well.
So first install pysha3 by running pip install pysha3 (with sudo if necessary), then run the code below. On Python 3.6 and later, hashlib already provides sha3_512, so pysha3 and the import sha3 line are not needed.
import sha3  # pysha3; only needed on Python < 3.6, where it adds sha3_512 to hashlib
import hashlib
from bs4 import BeautifulSoup
from requests import session
url = 'https://account.socialbakers.com/login'
myemail = "abhigolu10@gmail.com"
mypassword = hashlib.sha3_512(b"st#ck0verflow").hexdigest()  # take the SHA3 hash of the password
payload = {'email': myemail, 'password': mypassword}
with session() as s:
    soup = BeautifulSoup(s.get(url).content, 'lxml')
    p = s.post(url, data=payload, verify=True)
    print(p.text)
and you will get the correct logged-in page!
Two things to look out for. First, use s.post. Second, check the network tab in the browser to see whether the form sends any other values.
The form does not send the password in clear text; it encrypts or hashes it before sending. When you type the password aaaa into the form, it sends this over the network:
b3744bb9a8adb2d67cfdf79095bd84f5e77500a76727e6d73eef460eb806511ba73c9f765d4b3738e0b1399ce4a4c4ac3aed17fff34e0ef4037e9be466adec61
so there is no easy way to log in via the requests library without duplicating this behavior.
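Duplicating that hashing step could look roughly like the sketch below, using only the standard library (Python 3.6 or later). That the site uses exactly SHA3-512 over the UTF-8 bytes of the password is an assumption; it is worth comparing the digest this produces with the value seen in the browser's network tab before relying on it. hash_password and the credentials are placeholders.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return the SHA3-512 hex digest of the password, encoded as UTF-8."""
    return hashlib.sha3_512(password.encode("utf-8")).hexdigest()


# Placeholder credentials; the hashed value replaces the plaintext password
# in the form payload.
payload = {"email": "user@example.org", "password": hash_password("aaaa")}
print(payload["password"])
```

If the printed digest does not match what the browser sends for the same password, the site is likely using a different variant or extra preprocessing, and script.js would need a closer look.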

How to use mechanize to fill search form and get results back?

I am trying to use mechanize with Python to search for a keyword using a search form.
This is the form code:
<form id="searchform" action="http://www.example.com/" method="get">
<input id="s" type="text" onfocus="if (this.value == 'Search') {this.value = '';}" onblur="if (this.value == '') {this.value = 'Search';}" name="s" value="Search">
<input type="image" style="border:0; vertical-align: top;" src="http://www.example.com/wp-content/themes/SimpleColor/images/search.gif">
</form>
I want to be able to submit it and get the results back so I can extract information from them.
Thanks in advance.
From the official page: create a browser object with br = mechanize.Browser() and open the page with br.open("http://www.example.com/"). Then select the form. This form has an id but no name attribute, so br.select_form(name="searchform") will not find it; if it is the first form on the page, br.select_form(nr=0) selects it. Set the input with br["s"] = "your keyword", submit with resp = br.submit(), and use the resp object as you wish (for example, resp.read()).
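Because the form in the question uses method="get", submitting it amounts to requesting the action URL with the s field in the query string. A small sketch of building that URL with the standard library is below; build_search_url is a hypothetical helper, and the click coordinates a browser may add for the image button are ignored here.

```python
from urllib.parse import urlencode


def build_search_url(action, keyword):
    """Build the URL requested when the GET search form is submitted."""
    return action + "?" + urlencode({"s": keyword})


print(build_search_url("http://www.example.com/", "python mechanize"))
# prints http://www.example.com/?s=python+mechanize
```

The resulting URL can be opened with br.open() or any other HTTP client, which sidesteps form selection entirely.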

Python: request module confirm popup

After a POST request, a confirmation box (jQuery) appears in the browser.
How do I confirm that box using the requests API?
Here is the HTML page:
<div class="sent_sms_bg">
<form name="smsAction" id="smsAction" action="/sms_all_action" method="post">
<p><input name="" id="select_all" type="checkbox" value="" /> Select All</p>
<p><input name="Delete_SMS" type="button" class="delete_selected" value="Delete Selected" />
</p>
</div>
and here is what I am trying:
self.session = requests.session()
..
...
url = 'http://www.indyarocks.com/sms_all_action'
form = {
'select_all' : 'on',
'Delete_SMS': 'Delete Selected',
}
status = self.session.post(url , data = form)
POSTing directly to the url should work, as presumably the website has a form with an ajax popup to make sure you really want to submit the form, THEN POSTs the data to the URL. It wouldn't work to POST the data first and then fire the popup (as the form is already submitted so it's too late to do anything about it)

log in to webpage with python to scrape data

I am trying to build a web scraper to extract my stats data from MWO Mercs. To do so, it is necessary to log in to the page and then go through the 6 different stats pages to get the data (this will go into a database later, but that is not my question).
The login form is given below (from https://mwomercs.com/login?return=/profile/stats?type=mech). From what I can see, there are two fields that need data, EMAIL and PASSWORD, and they need to be posted. It should then open http://mwomercs.com/profile/stats?type=mech . After that I need to have a session to cycle through the various stats pages.
I have tried using urllib, mechanize and requests but I have been totally unable to find the right answer - I would prefer to use requests.
I do realise that similar questions have been asked in stackoverflow but I have searched for a very long time with no success.
Thank you for any help that could be provided
<div id="stubPage">
<div class="container">
<h1 id="stubPageTitle">LOGIN</h1>
<div id="loginForm">
<form action="/do/login" method="post">
<legend>MechWarrior Online REGISTER</legend>
<label>Email Address:</label>
<div class="input-prepend"><span class="add-on textColorBlack textPlain">@</span><input id="email" name="email" class="span4" size="16" type="text" placeholder="user@example.org"></div>
<label>Password:</label>
<div class="input-prepend"><span class="add-on"><span class="icon-lock"></span></span><input id="password" name="password" class="span4" size="16" type="password"></div>
<br>
<button type="submit" class="btn btn-large btn-block btn-primary">LOGIN</button>
<br>
<span class="pull-right">[ Forgot Your Password? ]</span>
<br>
<input type="hidden" name="return" value="/profile/stats?type=mech">
</form>
</div>
</div>
</div>
The Requests documentation is very simple and easy to follow when it comes to submitting form data. Please give this a read-through: More Complicated POST requests
Logins usually come down to saving the cookie and sending it with future requests.
After you POST to the login page with requests.post(), use the response object to retrieve the cookies. This is one way to do it:
import requests
post_headers = {'content-type': 'application/x-www-form-urlencoded'}
payload = {'email': email, 'password': password}  # keys must match the form's input names
login_request = requests.post(login_url, data=payload, headers=post_headers)
cookie_dict = login_request.cookies.get_dict()
stats_request = requests.get(stats_url, cookies=cookie_dict)
If you still have problems, check the return code from the request with login_request.status_code or the page content for an error with login_request.text.
Edit:
Some sites will redirect you several times when you make a request. Make sure to check the request.history object to see what happened and why you got bounced out. For example, I get redirects like this all of the time:
>>> some_request.history
(<Response [302]>, <Response [302]>)
Each item in the history is the response to an earlier request in the redirect chain. You can inspect them like normal response objects, such as some_request.history[0].url, and you can disable the redirects by putting allow_redirects=False in your request parameters:
login_request = requests.post(login_url, data=payload, headers=post_headers, allow_redirects=False)
In some cases, I've had to disallow redirects and add new cookies before progressing to the proper page. Try using something like this to keep your existing cookies and add the new cookies to it:
cookie_dict = {**cookie_dict, **new_request.cookies.get_dict()}  # Python 3: merge, with the new cookies taking precedence
Doing this after each request will keep your cookies up-to-date for your next request, similar to how your browser would.
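As an alternative to merging cookies by hand, a requests.Session object keeps the cookies from each response and sends them on later requests. The sketch below assumes the field names and the hidden return value from the form in the question, and that the form's relative action resolves to https://mwomercs.com/do/login; fetch_stats is a hypothetical helper and has not been run against the real site.

```python
# URLs taken from the question; the login URL is the form action resolved
# against the login page address (an assumption).
LOGIN_URL = "https://mwomercs.com/do/login"
STATS_URL = "http://mwomercs.com/profile/stats?type=mech"


def fetch_stats(session, email, password):
    """Log in, then fetch a stats page using the same session.

    `session` is expected to behave like requests.Session, which stores the
    cookies from the login response and reuses them for the next request.
    """
    form = {
        "email": email,
        "password": password,
        "return": "/profile/stats?type=mech",  # hidden field from the form
    }
    login = session.post(LOGIN_URL, data=form)
    login.raise_for_status()
    stats = session.get(STATS_URL)
    stats.raise_for_status()
    return stats.text


# Usage (needs the third-party requests package and real credentials):
#   import requests
#   with requests.Session() as s:
#       html = fetch_stats(s, "user@example.org", "secret")
```

Passing the session in as a parameter keeps the function easy to exercise with a stand-in object, and the same session can then be reused for the remaining stats pages.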
