I am working on my first project using Python to submit a form and retrieve weather data from http://www.weather.gov.
I am brand new to HTTP form submission, so it is highly possible I am going about this all wrong. I've read that mechanize and/or selenium are better suited to this type of job, but I am limited to these modules by my school's server.
import requests
r = requests.get('http://www.weather.gov')
location = raw_input("Enter zipcode: ")
payload = {'key1' : location}
q = requests.post('http://forecast.weather.gov/', data = payload)
print q.text
My attempts to search a given ZIP code have been unsuccessful; I am not reaching the local weather for the given ZIP code.
Note: I have also tried this form submission using urllib & urllib2 without success.
import urllib
import urllib2
location = raw_input("Enter Zip Code here: ")
url = 'http://forecast.weather.gov/'
values = {'inputstring' : location}
data = urllib.urlencode(values)
req = urllib2.Request(url, data = data)
response = urllib2.urlopen(req)
html = response.read()
print html
Form as seen from inspecting the page:
<form name="getForecast" id="getForecast" action="http://forecast.weather.gov/zipcity.php" method="get">
<label for="inputstring">Local forecast by <br>"City, St" or ZIP code</label>
<input id="inputstring" name="inputstring" type="text" value="Enter location ..." onclick="this.value=''" autocomplete="off">
<input name="btnSearch" id="btnSearch" type="submit" value="Go">
<div id="txtError">
<div id="errorNoResults" style="display:none;">Sorry, the location you searched for was not found. Please try another search.</div>
<div id="errorMultipleResults" style="display:none">Multiple locations were found. Please select one of the following:</div>
<div id="errorChoices" style="display:none"></div>
<input id="btnCloseError" type="button" value="Close" style="display:none">
</div>
<div id="txtHelp"><a style="text-decoration: underline;" href="javascript:void(window.open('http://weather.gov/ForecastSearchHelp.html','locsearchhelp','status=0,toolbar=0,location=0,menubar=0,directories=0,resizable=1,scrollbars=1,height=500,width=530').focus());">Location Help</a></div>
</form>
<input id="inputstring" name="inputstring" type="text" value="Enter location ..." onclick="this.value=''" autocomplete="off">
Send the request to the URL http://forecast.weather.gov/zipcity.php
That is where the PHP script that receives the form data lives. Note that the form's method is get, not post, so the inputstring parameter belongs in the query string.
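Since the form's method is get, a minimal sketch (Python 3 syntax; the field name inputstring comes from the form above, and the ZIP code is just an example) would put the location in the query string rather than in a POST body:

```python
import urllib.parse
import urllib.request

location = "90210"  # example ZIP code
base = "http://forecast.weather.gov/zipcity.php"
# Build the query string exactly as the browser's GET submission would.
url = base + "?" + urllib.parse.urlencode({"inputstring": location})
print(url)  # -> http://forecast.weather.gov/zipcity.php?inputstring=90210

# A live request would then be:
# html = urllib.request.urlopen(url).read()
```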
Beginner here, I'm trying to code a website to take a URL as the input and then feed it into my script. Here is my Python script:
import requests
import json
from bs4 import BeautifulSoup
import cgi
import subprocess, cgitb
app = Flask(__name__)
form = cgi.FieldStorage()
url = form.getvalue("market_url")
ua = {"User-Agent":"Mozilla/5.0"}
data = requests.get(url, headers=ua)
soup = BeautifulSoup(data.content, 'html.parser')
And then my HTML looks like this:
<section>
<form action="./app.py" method="post">
<p> Name: <input type="text" name="market_url" id="market_url" value=""
placeholder="Enter Market URL: " /></p>
<input type="submit" value="Submit" />
</form>
</section>
Does someone know what I'm doing wrong here? I keep getting an error that says "requests.exceptions.MissingSchema: Invalid URL 'None': No scheme supplied. Perhaps you meant http://None?"
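For what it's worth, the error message itself shows what went wrong: form.getvalue("market_url") returned None (the script never received the form field), and requests.get(None) fails with exactly that MissingSchema complaint. A small sketch of guarding against it (the message text here is just illustrative):

```python
# Stand-in for form.getvalue("market_url") when the field never arrives:
url = None

# Guard before handing the value to requests.get(), which raises
# MissingSchema ("Invalid URL 'None': No scheme supplied.") on None.
if url is None:
    message = "No market_url was submitted; check the form's method and action."
else:
    message = "Would fetch: " + url
print(message)
```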
I'm new to posting on stackoverflow so please don't bite! I had to resort to making an account and asking for help to avoid banging my head on the table any longer...
I'm trying to log in to the following website, https://account.socialbakers.com/login, using the requests module in Python. It seems as if the requests module is the place to go, but the session.post() function isn't working for me. I can't tell if there is something unique about this type of form, or whether it's the fact that the website uses HTTPS.
The login form is the following:
<form action="/login" id="login-form" method="post" novalidate="">
<big class="error-message">
<big>
<strong>
</strong>
</big>
</big>
<div class="item-full">
<label for="">
<span class="label-header">
<span>
Your e-mail address
</span>
</span>
<input id="email" name="email" type="email"/>
</label>
</div>
<div class="item-list">
<div class="item-big">
<label for="">
<span class="label-header">
<span>
Password
</span>
</span>
<input id="password" name="password" type="password"/>
</label>
</div>
<div class="item-small">
<button class="btn btn-green" type="submit">
Login
</button>
</div>
</div>
<p>
<a href="/email/reset-password">
<strong>
Lost password?
</strong>
</a>
</p>
</form>
Based on the following post How to "log in" to a website using Python's Requests module? among others I have tried the following code:
url = 'https://account.socialbakers.com/login'
payload = dict(email = 'Myemail', password = 'Mypass')
with session() as s:
    soup = BeautifulSoup(s.get(url).content,'lxml')
    p = s.post(url, data = payload, verify=True)
    print(p.text)
This, however, just gives me the login page again and doesn't seem to log me in.
I have checked in the form that I am referring to the correct names of the inputs 'email' and 'password'. I've tried explicitly passing through cookies as well. The verify=True parameter was suggested as a way to deal with the fact the website is https.
I can't work out what isn't working, or what is different about this form compared to the one in the linked post.
Thanks
Edit: Updated p = s.get to p = s.post
I checked the website. It is sending the SHA3 hash of the password instead of sending it as plaintext. You can see this in line 111 of script.js, which is included in the main page as:
<script src="/js/script.js"></script>
inside the head tag.
So you need to replicate this behaviour when sending POST requests. I found the pysha3 library, which does the job pretty well. (On Python 3.6+, hashlib supports SHA-3 natively, so pysha3 may not be needed.)
First install pysha3 by running pip install pysha3 (with sudo if necessary), then run the code below:
import sha3  # pysha3: registers SHA-3 with hashlib on Python < 3.6
import hashlib
import requests
from bs4 import BeautifulSoup

url = 'https://account.socialbakers.com/login'
myemail = "abhigolu10#gmail.com"
mypassword = hashlib.sha3_512(b"st#ck0verflow").hexdigest()  # take SHA3 of password
payload = {'email': myemail, 'password': mypassword}
with requests.Session() as s:
    soup = BeautifulSoup(s.get(url).content, 'lxml')
    p = s.post(url, data=payload, verify=True)
    print(p.text)
and you will get the correct logged in page!
Two things to look out for. One, try to use s.post, and two, you need to check in the browser's network tab whether the form sends any other values.
The form is not sending the password in clear text; it is encrypting or hashing it before sending. When you type the password aaaa into the form, the network tab shows it sends
b3744bb9a8adb2d67cfdf79095bd84f5e77500a76727e6d73eef460eb806511ba73c9f765d4b3738e0b1399ce4a4c4ac3aed17fff34e0ef4037e9be466adec61
so there is no easy way to log in via the requests library without duplicating this behavior.
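Assuming the hash really is SHA3-512 (as the other answer suggests), you can reproduce the shape of that value locally; on Python 3.6+ hashlib has SHA-3 built in:

```python
import hashlib

# Hash the example password "aaaa" the way the page's JavaScript presumably does.
digest = hashlib.sha3_512(b"aaaa").hexdigest()
print(len(digest))   # SHA3-512 yields 64 bytes, i.e. 128 hex characters
print(digest[:16])   # first few characters of the hex digest
```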
<div id="login-section">
<fieldset class="validation-group">
<table id="navLgnMember" cellspacing="0" cellpadding="0" style="border-collapse:collapse;">
<tr>
<td>
<div id="login-user">
<div class="input" id="username-wrapper">
<div class="loginfield-label">Number / ID / Email</div>
<div class="input-field-small float-left submit-on-enter"><div class="left"></div><input name="ctl00$ctl01$navLgnMember$Username" type="text" maxlength="80" id="Username" title="Username" class="center" style="width:85px;" /><div class="right"></div></div>
</div>
<div class="input" id="password-wrapper">
<div class="loginfield-label">
Password</div>
<div class="input-field-small float-left submit-on-enter"><div class="left"></div><input name="ctl00$ctl01$navLgnMember$Password" type="password" id="Password" title="Password" class="center" title="Password" style="width:85px;" /><div class="right"></div></div>
</div>
<div id="login-wrapper">
<input type="submit" name="ctl00$ctl01$navLgnMember$Login" value="" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$ctl01$navLgnMember$Login", "", false, "", "https://tatts.com/tattersalls", false, false))" id="Login" class="button-login" />
</div>
How would one go about submitting to this form with urllib? The current code I have:
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/ target from the form
authentication_url = 'https://tatts.com/tattersalls'
# Input parameters we are going to send
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': '/wEPDwUKMTIwNzM2NDc5NQ9kFgICCBBkZBYCAgcPZBYGZg9kFgJmD2QWAmYPFgIeB1Zpc2libGVoZAIBD2QWAmYPZBYCZg8WAh8AaGQCAg9kFgJmD2QWAgIBD2QWAmYPZBYCZg9kFgYCAw9kFgICBw8WAh4FY2xhc3MFFmxhdGVzdFJlc3VsdHNCb2R5RGl2QmdkAgsPZBYCZg9kFgICBQ8WBB4JaW5uZXJodG1sBR8qRGl2IDEgJDFNIGZvciB1cCB0byA0IHdpbm5lcnMuHwBnZAIND2QWAmYPZBYCZg9kFgYCAQ8PFgIeBFRleHQFNVdobyB3b24gbW9uZXkgaW4gRHJvbWFuYT8gVGF0dHNMb3R0byBwcml6ZSB1bmNsYWltZWQhZGQCAw8PFgIfAwV5QSBkaXZpc2lvbiBvbmUgVGF0dHNMb3R0byBwcml6ZSBvZiAkODI5LDM2MS42OCB3YXMgd29uIGluIERyb21hbmEgaW4gU2F0dXJkYXkgbmlnaHTigJlzIGRyYXcgYnV0IHRoZSB3aW5uZXIgaXMgYSBteXN0ZXJ5IWRkAgUPDxYCHgtOYXZpZ2F0ZVVybAUbL3RhdHRlcnNhbGxzL3dpbm5lci1zdG9yaWVzZGRk40y89P1oSwLqvsMH4ZGTu9vsloo=',
    '__PREVIOUSPAGE': 'PnGXOHeTQRfdct4aw9jgJ_Padml1ip-t05LAdAWQvBe5-2i1ECm5zC0umv9-PrWPJIXsvg9OvNT2PNp99srtKpWlE4J-6Qp1mICoT3eP49RSXSmN6p_XiieWS68YpbKqyBaJrkmYbJpZwCBw0Wq3tSD3JUc1',
    '__EVENTVALIDATION': '/wEdAAfZmOrHFYG4x80t+WWbtymCH/lQNl+1rLkmSESnowgyHVo7o54PGpUOvQpde1IkKS5gFTlJ0qDsO6vsTob8l0lt1XHRKk5WhaA0Ow6IEfhsMPG0mcjlqyEi79A1gbm2y9z5Vxn3bdCWWa28kcUm81miXWvi1mwhfxiUpcDlmGDs/3LMo4Y=',
    'ctl00$ctl01$showUpgradeReminderHid': 'false',
    'ctl00$ctl01$navLgnMember$Username': 'x-tinct',
    'ctl00$ctl01$navLgnMember$Password': '########',
    'ctl00$ctl01$navLgnMember$Login': ''
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
print (resp)
is a fair way off submitting to the right part of the web form.
I'm trying to log in and create a session so I can then post further web-form data to other parts of the site.
Thanks in advance.
According to this other post from SO, Mechanize and Javascript, you have different options, from simulating in Python what the JavaScript is doing, to using the full-fledged Selenium with its Python bindings.
If you try to proceed the simple Python way, I would strongly urge you to use a network spy such as the excellent Wireshark to analyse what a successful login through a real browser actually sends and receives, versus what your Python simulation sends.
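If you do go the plain-Python route, the usual first step is to fetch the login page and echo back every hidden field (__VIEWSTATE, __EVENTVALIDATION, and so on) in your POST. A minimal sketch using only the standard library (the sample HTML and its values here are made up for illustration):

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if a.get("type") == "hidden" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")

html = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="def456" />
  <input type="text" name="username" />
</form>
'''
collector = HiddenInputCollector()
collector.feed(html)
print(collector.fields)
# -> {'__VIEWSTATE': 'abc123', '__EVENTVALIDATION': 'def456'}
```

The collected dict can then be merged into the payload before posting, so the server receives the same state tokens a real browser would send back.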
I'm trying to write some Python (3.3.2) code to log in to a website using the Requests module. Here is the form section of the login page:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
<input type="hidden" name="token" value="236647d2da7c8408ceb78178ba03876ea1f2b687" />
<div class="logincontainer">
<fieldset>
<div class="clearfix">
<label for="username">Email Address:</label>
<div class="input">
<input class="xlarge" name="username" id="username" type="text" />
</div>
</div>
<div class="clearfix">
<label for="password">Password:</label>
<div class="input">
<input class="xlarge" name="password" id="password" type="password"/>
</div>
</div>
<div align="center">
<p>
<input type="checkbox" name="rememberme" /> Remember Me
</p>
<p>Request a Password Reset</p>
</div>
</fieldset>
</div>
<div class="actions">
<input type="submit" class="btn primary" value="Login" />
</div>
</form>
Here is my code, trying to deal with hidden input:
import requests
from bs4 import BeautifulSoup
url = 'https://www.ibvpn.com/billing/clientarea.php'
body = {'username':'my email address','password':'my password'}
s = requests.Session()
loginPage = s.get(url)
soup = BeautifulSoup(loginPage.text)
hiddenInputs = soup.findAll(name = 'input', type = 'hidden')
for hidden in hiddenInputs:
    name = hidden['name']
    value = hidden['value']
    body[name] = value
r = s.post(url, data = body)
This just returns the login page. If I post my login data to the URL in the 'action' field, I get a 404 error.
I've seen other posts on StackExchange where automatic cookie handling doesn't seem to work, so I've also tried dealing with the cookies manually using:
cookies = dict(loginPage.cookies)
r = s.post(url, data = body, cookies = cookies)
But this also just returns the login page.
I don't know if this is related to the problem, but after I've run either variant of the code above, entering r.cookies returns <<class 'requests.cookies.RequestsCookieJar'>[]>
If anyone has any suggestions, I'd love to hear them.
You are loading the wrong URL. The form has an action attribute:
<form method="post" action="https://www.ibvpn.com/billing/dologin.php" name="frmlogin">
so you must post your login information to:
https://www.ibvpn.com/billing/dologin.php
instead of posting back to the login page. That is, POST to soup.form['action']:
r = s.post(soup.form['action'], data=body)
Your code is handling cookies just fine; I can see that s.cookies holds a cookie after requesting the login form, for example.
If this still doesn't work (a 404 is returned), then the server is using additional techniques to tell scripts from real browsers. Usually this is done by inspecting the request headers. Look at the headers your browser sends and replicate them. It may be just the User-Agent header that they check, but the Accept-* headers and Referer (that misspelling is the official HTTP header name) can also play a role.
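As a sketch of what that looks like, here is an illustrative set of browser-like headers (the values are assumptions; copy the real ones from your own browser's network tab) that you would install on the session:

```python
# Illustrative browser-like headers; replace the values with the ones your
# own browser actually sends (visible in the developer tools' network tab).
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.ibvpn.com/billing/clientarea.php",
}

# With requests you would install them once on the session:
#   s = requests.Session()
#   s.headers.update(browser_headers)
# so every subsequent s.get()/s.post() carries them automatically.
print(sorted(browser_headers.keys()))
```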
I am trying to use the Python requests module to authenticate to a website and then retrieve some information from it. This is the login part of the page:
<div>
<label class="label-left" for="username"> … </label>
<input id="username" class="inputbox" type="text" size="18" alt="username" name="username"></input>
</div>
<div>
<label class="label-left" for="passwd"> … </label>
<input id="passwd" class="inputbox" type="password" alt="password" size="18" name="passwd"></input>
</div>
<div> … </div>
<div class="readon">
<input class="button" type="submit" value="Login" name="Submit"></input>
What I am doing now is:
payload = {
'username': username,
'passwd': password,
'Submit':'Login'
}
with requests.Session() as s:
    s.post(login, data=payload)
    ans = s.get(url)
    print ans.text
The problem is that I get the same login page, even after the authentication. The response code is 200, so everything should be ok. Am I missing something?
UPDATE
Thanks to the comment, I have analyzed the POST requests and seen that there are some hidden parameters. Among them are some whose values vary between requests. For this reason, I simply fetch them with BeautifulSoup and then update the payload of the POST request as follows:
with requests.Session() as s:
    login_page = s.get(login)
    soup = BeautifulSoup(login_page.text)
    inputs = soup.findAll(name='input', type='hidden')
    for el in inputs:
        name = el['name']
        value = el['value']
        payload[name] = value
    s.post(login, data=payload)
    ans = s.get(url)
Nevertheless, I am still getting the login page. Could there be some other influencing elements?