I am trying to log in to the website http://ogame.us using Python to access its data. After looking around the web for how to do this, I settled on the mechanize module. I think I have the general gist of the code down, but when I submit the HTML form nothing happens. Here's the code:
import re, sys, os, time, datetime, socket
import urllib, urllib2
import cookielib
import mechanize
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup, Tag
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6')]
br.open('http://ogame.us')
br.select_form(name = 'loginForm' )
br['login'] = 'stackexample'
br['pass'] = 'examplepassword'
br['uni_url'] = ['uni103.ogame.us']
br.submit()
print br.geturl()
The response from geturl() is the same URL I was at before. Does anyone know what is going on?
Try this:
data = br.submit()
html=data.read()
Maybe select the button directly?
response = br.submit(type="submit", id="loginSubmit")
There is a third field (uni) that I was not completing. Everything else was correct.
In the future: in Google Chrome (and probably other browsers) you can see the actual requests the browser sends by opening Chrome Developer Tools and looking at the Network tab. This saves quite a bit of time.
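Once the Network tab shows the exact POST the browser sends, it can help to rebuild that body yourself and compare it with what your script submits. The sketch below is written for Python 3 (unlike the Python 2 snippets here) and only builds the request without sending it. `LOGIN_ACTION` is a hypothetical placeholder and the field names are simply the ones from the question's script; in practice, copy both from the captured request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: copy the real form action from the Network tab.
LOGIN_ACTION = "http://ogame.us/login"

def build_login_request(fields):
    """Build (but do not send) a form-encoded POST like a browser would."""
    # A list of (name, value) pairs keeps the field order stable.
    body = urlencode(fields).encode("ascii")
    req = Request(LOGIN_ACTION, data=body)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req

# Field names taken from the question's script; the real form may need more.
req = build_login_request([
    ("login", "stackexample"),
    ("pass", "examplepassword"),
    ("uni_url", "uni103.ogame.us"),
])
print(req.get_method())   # POST, because data is set
print(req.data.decode())  # login=stackexample&pass=examplepassword&uni_url=uni103.ogame.us
```

If the captured body contains a field that this string lacks, that is the field your script is not filling in.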
Related
I need to log in to a website using mechanize in Python and then continue traversing that website using pycurl. So what I need to know is how to transfer a logged-in state established via mechanize into pycurl. I assume it's not just about copying the cookie over. Or is it? Code examples are welcome ;)
Why I'm not willing to use pycurl alone:
I have time constraints and my mechanize code worked after 5 minutes of modifying this example as follows:
import mechanize
import cookielib
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Open the site
r = br.open('https://thewebsite.com')
html = r.read()
# Show the source
print html
# or
print br.response().read()
# Show the html title
print br.title()
# Show the response headers
print r.info()
# or
print br.response().info()
# Show the available forms
for f in br.forms():
    print f
# Select the first (index zero) form
br.select_form(nr=0)
# Fill in the login form
br.form['username']='someusername'
br.form['password']='somepwd'
br.submit()
print br.response().read()
# Looking at some results in link format
for l in br.links(url_regex=r'\.com'):
    print l
Now if I could only transfer the right information from the br object to pycurl, I would be done.
Why I'm not willing to use mechanize alone:
Mechanize is based on urllib, and urllib is a nightmare. I had too many traumatizing issues with it. I can swallow one or two calls in order to log in, but please no more. In contrast, pycurl has proven stable, customizable and fast for me. In my experience, pycurl is to urllib what Star Trek is to The Flintstones.
PS: In case anyone wonders, I use BeautifulSoup once I have the HTML.
Solved it. Apparently it WAS all about the cookie. Here is my code to get the cookie:
import cookielib
import mechanize
def getNewLoginCookieFromSomeWebsite(username='someusername', pwd='somepwd'):
    """
    returns a login cookie for somewebsite.com by using mechanize
    """
    # Browser
    br = mechanize.Browser()
    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)
    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    # Follows refresh 0 but does not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    # User-Agent
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0')]
    # Open login site
    response = br.open('https://www.somewebsite.com')
    # Select the first (index zero) form
    br.select_form(nr=0)
    # Enter credentials
    br.form['user'] = username
    br.form['password'] = pwd
    br.submit()
    # cj is the jar mechanize filled during login (the same object as the
    # private br._ua_handlers['_cookies'].cookiejar); build a
    # "name1=value1; name2=value2" string from it
    cookiestr = '; '.join(c.name + '=' + c.value for c in cj)
    return cookiestr
To use that cookie with pycurl, set it before c.perform() is called:
c.setopt(pycurl.COOKIE, getNewLoginCookieFromSomeWebsite("username", "pwd"))
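The same conversion can be written as a standalone function that works on any cookie jar (for example the cj passed to br.set_cookiejar()), joining pairs with "; " as in a regular Cookie header and skipping cookies that have already expired. This is a Python 3 sketch using http.cookiejar, the Python 3 name of cookielib; the cookies in the usage example are made up, make_cookie is a hypothetical demo helper, and the pycurl call is only a comment because it assumes pycurl is installed.

```python
import time
from http.cookiejar import Cookie, CookieJar

def cookie_header(jar):
    """Turn every unexpired cookie in a CookieJar into 'a=1; b=2' form."""
    now = time.time()
    pairs = []
    for c in jar:  # iterating a CookieJar yields Cookie objects
        if c.is_expired(now):
            continue
        pairs.append("%s=%s" % (c.name, c.value))
    return "; ".join(pairs)

def make_cookie(name, value, domain="www.somewebsite.com"):
    """Hypothetical helper that builds a session cookie for the demo."""
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )

jar = CookieJar()  # with mechanize, this would be the jar given to set_cookiejar()
jar.set_cookie(make_cookie("sessionid", "abc123"))
jar.set_cookie(make_cookie("lang", "en"))
header = cookie_header(jar)
print(header)
# With pycurl installed: c.setopt(pycurl.COOKIE, header)
```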
Keep in mind: some websites keep modifying the cookie via Set-Cookie headers, and pycurl (unlike mechanize) does not act on them when you only pass a string with pycurl.COOKIE. Pycurl sends the string you give it and leaves it to you to handle any new cookies, unless you enable libcurl's cookie engine, for example with c.setopt(pycurl.COOKIEFILE, '').
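If the site does keep changing its cookies, one option is to collect the response headers yourself and merge any Set-Cookie values into your cookie string before the next request. A minimal Python 3 sketch using http.cookies.SimpleCookie; the header lines are invented, and feeding them from pycurl (for instance through a HEADERFUNCTION callback) is an assumption noted in a comment rather than shown.

```python
from http.cookies import SimpleCookie

def merge_set_cookie(cookies, header_lines):
    """Update a {name: value} dict from raw 'Set-Cookie:' header lines."""
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "set-cookie":
            continue  # status line or some other header
        parsed = SimpleCookie()
        parsed.load(value.strip())
        for key, morsel in parsed.items():
            cookies[key] = morsel.value
    return cookies

def to_header(cookies):
    """Serialize the dict as a Cookie header value for pycurl.COOKIE."""
    return "; ".join("%s=%s" % item for item in cookies.items())

# Invented response headers, e.g. gathered by a pycurl HEADERFUNCTION
# callback (its bytes would need decoding to str first).
headers = [
    "HTTP/1.1 200 OK",
    "Set-Cookie: sessionid=new456; Path=/",
    "Content-Type: text/html",
]
jar = merge_set_cookie({"sessionid": "abc123", "lang": "en"}, headers)
print(to_header(jar))  # sessionid=new456; lang=en
```

This only tracks names and values; it ignores Path, Domain and Expires, which is a deliberate simplification.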
I am trying to login to my bank website using Python and mechanize.
https://chaseonline.chase.com/Logon.aspx
I have looked at all the previous posts but still can't log in. I think it may have to do with the way I am submitting my form. The HTML for the submit button is:
<input type="image" id="logon"
src="https://chaseonline.chase.com/images/logon.gif" onclick="return
check_all_fields_logon_RSA_Auth(document.getElementById('UserID'),
document.getElementById('Password'));" width="58" height="21" border="0"
title="Log On" tabindex="7">
Here is the script I'm using:
import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib
from time import sleep
chase_url = 'https://chaseonline.chase.com/Logon.aspx'
# Browser
br = mechanize.Browser()
# Enable cookie support for urllib2
cookiejar = cookielib.LWPCookieJar()
br.set_cookiejar( cookiejar )
# Browser options
br.set_handle_equiv( True )
br.set_handle_gzip( True )
br.set_handle_redirect( True )
br.set_handle_referer( True )
br.set_handle_robots( False )
# Refresh handle
br.set_handle_refresh( mechanize._http.HTTPRefreshProcessor(), max_time = 1 )
br.addheaders = [ ( 'User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1' ) ]
# authenticate
br.open(chase_url)
br.select_form( nr=0 )
br.form['UserID'] = 'joe1234'
br.form['Password'] = '123456'
br.submit()
print "Success!\n"
sleep(10)
print br.title()
If the login worked, the page title should be "Chase Online - My Account".
What am I doing wrong?
I was trying to do something similar with ESPN a few months ago. Mechanize gave me problems, then someone gave me the answer.
The answer is Selenium.
https://selenium-python.readthedocs.org/getting-started.html
Otherwise, what exactly is going wrong when you run your script? Does it produce an error? What does the error say?
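One detail worth checking when replaying this login by hand: an `<input type="image">` button submits the click coordinates as extra fields. Per the HTML specification they are named `name.x` and `name.y`, or plain `x` and `y` when the button has no `name` attribute, as with the `id="logon"` button above. The Python 3 sketch below only builds the field list a browser would add; logon_payload is a hypothetical helper and the UserID/Password values are the placeholders from the question. Even a correct payload may not be enough if the page relies on the JavaScript in `onclick`, which mechanize does not run.

```python
from urllib.parse import urlencode

def image_click_fields(name=None, x=1, y=1):
    """Fields a browser adds when an <input type="image"> is clicked."""
    prefix = name + "." if name else ""  # no name -> plain "x" and "y"
    return [(prefix + "x", str(x)), (prefix + "y", str(y))]

def logon_payload(user_id, password, button_name=None):
    """Hypothetical helper: credentials plus the image-button coordinates."""
    fields = [("UserID", user_id), ("Password", password)]
    fields.extend(image_click_fields(button_name))
    return urlencode(fields)

print(logon_payload("joe1234", "123456"))
# UserID=joe1234&Password=123456&x=1&y=1
print(logon_payload("joe1234", "123456", button_name="logon"))
# UserID=joe1234&Password=123456&logon.x=1&logon.y=1
```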
I've got a script that logs in to a website. The challenge is that I'm running the script on EC2, and the website asks me for additional verification by sending me a custom code.
I receive the email immediately but need to be able to update that field on the fly.
This is the script
import urllib2
import cookielib
import urllib
import requests
import mechanize
from bs4 import BeautifulSoup
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('https://www.website.com/login')
#select the first form
br.select_form(nr=0)
#user credentials
br['user_key'] = 'username#gmail.com'
br['user_password'] = 'somePassw0rd'
# Login
br.submit()
#enter verification code
input_var = raw_input("Enter something: ")
#put verification code in form
br['Verication'] = str(input_var)
#submit form
br.submit()
The challenge for me is that I keep getting an error saying:
AttributeError: mechanize._mechanize.Browser instance has no attribute __setitem__ (perhaps you forgot to .select_form()?)
What can I do to make this run as intended?
After br.submit() you go straight into
br['Verication'] = str(input_var)
This fails because br.submit() loads a new page and leaves the browser without a selected form, which is what the error message hints at.
After submitting, I would try:
for form in br.forms():
    print form
to see whether there is another form to select.
Read the HTML of the login site and check exactly what happens when you click login. You may have to select a form on the new page again and then assign the verification code to one of its controls.
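To act on that advice without guessing, you can scan the HTML returned after the first br.submit() and find the index of the form that contains the verification field, then pass that index to br.select_form(nr=...). A minimal Python 3 sketch using the standard-library html.parser; the sample HTML is invented, 'Verication' is simply the field name used in the question, and the index should match mechanize's form numbering for ordinary, non-nested forms.

```python
from html.parser import HTMLParser

class FormFieldIndex(HTMLParser):
    """Record the control names inside each <form>, in document order."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one list of field names per form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([])
            self._in_form = True
        elif self._in_form and tag in ("input", "select", "textarea"):
            if attrs.get("name"):
                self.forms[-1].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

def form_index_with_field(html, field_name):
    """Return the nr= index of the first form containing field_name, or None."""
    parser = FormFieldIndex()
    parser.feed(html)
    for index, names in enumerate(parser.forms):
        if field_name in names:
            return index
    return None

page = """
<form id="search"><input name="q"></form>
<form id="verify" method="post">
  <input type="text" name="Verication"><input type="submit" value="Go">
</form>
"""
print(form_index_with_field(page, "Verication"))  # 1
# Then, in the mechanize script: br.select_form(nr=index) before setting the field.
```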
Currently I'm using iMacros to extract data from a website and to fill in and submit forms.
But iMacros is an expensive tool. I need a free library, and I've read about Scrapy for data mining. It's a little more complex to program with, but money rules.
The question is whether I can fill in HTML forms with Scrapy and submit them to the web page. I don't want to use JavaScript; I want to use Python scripts exclusively.
I searched http://doc.scrapy.org/ but didn't find anything about form submission.
Use the scrapy.http.FormRequest class.
The FormRequest class extends the base Request with functionality for dealing with HTML forms
http://doc.scrapy.org/en/latest/topics/request-response.html#formrequest-objects
Mechanize is a python library that lets you automate interactions with a website. It supports HTML form filling.
The program below shows how to fill in a form:
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('http://gmail.com')
# Select the first (index zero) form
br.select_form(nr=0)
# User credentials
br.form['Email'] = 'user'
br.form['Passwd'] = 'password'
# Login
br.submit()
# Filter all links to mail messages in the inbox
all_msg_links = [l for l in br.links(url_regex=r'\?v=c&th=')]
# Select the first 3 messages
for msg_link in all_msg_links[0:3]:
    print msg_link
    # Open each message
    br.follow_link(msg_link)
    html = br.response().read()
    soup = BeautifulSoup(html)
    # Filter html to only show the message content
    msg = str(soup.findAll('div', attrs={'class': 'msg'})[0])
    # Show raw message content
    print msg
    # Convert html to text, easier to read but can fail if you have intl
    # chars
    # print html2text.html2text(msg)
    print
    # Go back to the Inbox
    br.follow_link(text='Inbox')
# Logout
br.follow_link(text='Sign out')
I've been successfully scraping a website using mechanize, but I've had some problems with the page.open getting stuck (and not giving a timeout error) so I'd like to try and perform the same scrape with Requests. However, I can't figure out how to select a form to enter my login credentials. Here is the working code in mechanize:
import cookielib
import mechanize
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.set_proxies({"https": "198.102.28.100:7808", "http": "54.235.92.109:8080"})
# Open Login Page
br.open("https://example.com/login/Signin?")
br.select_form(name="signinForm_0")
br["loginEmail"] = "username"
br["loginPassword"] = 'password'
br.method = "POST"
br.submit()
#Open Page
URL = 'https://example.com'
br.open(URL, timeout=5.0)
I'm unsure how to replicate the br.select_form functionality using Python Requests. Does anyone have any ideas or experience doing this?
If I'm not mistaken, Selenium is similar to mechanize, but Requests is not. Requests is used mostly for HTTP. It is similar to urllib or urllib2, but better. You can send requests (GET or POST) and read the HTML returned by the server, but you need other modules to get at elements on the page, such as BeautifulSoup, lxml or PyQuery.
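A rough equivalent of mechanize's select_form() plus submit() with Requests: parse the form yourself to get its action and any pre-filled (often hidden) fields, fill in the credentials, and POST with a Session so cookies carry over. The Python 3 sketch below uses the standard-library html.parser instead of BeautifulSoup to stay self-contained. The form and field names come from the question, the sample HTML is invented, and only the parsing runs here, since the hypothetical login() helper needs the requests package and a live site. Requests does not time out unless you pass a timeout, so setting timeout= explicitly is what guards against indefinite hangs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FormExtractor(HTMLParser):
    """Collect action, method and default <input> values of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.action = None
        self.method = "get"
        self.fields = {}
        self._inside = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self._inside = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif self._inside and tag == "input" and attrs.get("name"):
            # Simplified: every named <input> is kept (checkboxes, radio
            # buttons, <select> and <textarea> are not handled specially).
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside = False

def select_form(html, page_url, form_name):
    """Return (absolute action URL, method, default fields) for form_name."""
    parser = FormExtractor(form_name)
    parser.feed(html)
    if parser.action is None:
        raise ValueError("form %r not found" % form_name)
    return urljoin(page_url, parser.action), parser.method, parser.fields

def login(login_url, email, password):
    """Hypothetical sketch: needs `pip install requests` and the real site."""
    import requests
    session = requests.Session()
    page = session.get(login_url, timeout=5)
    action, _method, fields = select_form(page.text, page.url, "signinForm_0")
    fields.update({"loginEmail": email, "loginPassword": password})
    # The question's form is submitted as POST.
    session.post(action, data=fields, timeout=5).raise_for_status()
    return session  # reuse for later pages: session.get(url, timeout=5)

sample = """
<form name="signinForm_0" action="/login/Signin" method="POST">
  <input type="hidden" name="csrf" value="t0k3n">
  <input name="loginEmail"><input type="password" name="loginPassword">
</form>
"""
print(select_form(sample, "https://example.com/login/Signin?", "signinForm_0"))
```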