I'm trying to login with mechanize, get the session cookie, then load a protected page but mechanize doesn't seem to be saving or re-using the session. When I try to load the protected resource I get redirected to the login page. Can anyone see what I'm doing wrong from the code below?
import mechanize
import urllib
import cookielib

cookiejar = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cookiejar)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 Compatible')]

params = {'email_address': 'name#company.com', 'password': 'secret'}  # placeholder credentials
data = urllib.urlencode(params)
request = mechanize.Request('/myLoginPage', data=data)  # URL shortened here; it has to be absolute in the real script
response = br.open(request)
html = response.read()
request = mechanize.Request('/myProtectedPage')
response = br.open(request)
At this point the response is not the data from the protected resource; it's a redirect back to the login page.
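A minimal sketch of the pattern that usually keeps the session alive with mechanize, assuming the login page serves an ordinary HTML form with email_address and password fields (the field names are taken from the snippet above; the URLs are made up): let the Browser fetch the login page, fill in its form, and submit, so the jar is populated before the protected page is requested.

import mechanize
import cookielib

cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 Compatible')]

# Let the Browser fetch the login page so it sees the real form (and any hidden fields)
br.open('https://example.com/myLoginPage')    # hypothetical absolute URL
br.select_form(nr=0)                          # assumes the login form is the first form on the page
br['email_address'] = 'name@company.com'
br['password'] = 'secret'
br.submit()                                   # cookies from the login response land in cj

# Reusing the same Browser (and jar) means the session cookie is sent automatically
response = br.open('https://example.com/myProtectedPage')
print response.read()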
Related
I'm a beginner at web scraping. I can scrape an average web page, but I've tried in both Node.js and Python to scrape SolarWinds, and it only returns the login page despite my giving the correct login credentials.
import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib
cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)
br.open("******")
br.select_form(nr=0)
br.form['username'] = '***'
br.form['password'] = '***'
br.submit()
print br.response().read()
I always get this error: mechanize._form.ControlNotFoundError: no control matching name 'username'
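One way to debug this (a sketch, not from the original post) is to print every form and control mechanize actually sees on the login page; if the form is built by JavaScript, mechanize will never see a 'username' control, and the names it does print are the ones to use with br.form[...].

br.open("******")  # the same login URL as above
for form in br.forms():
    print form.name, form.attrs.get('id')
    for control in form.controls:
        print "  control:", control.type, control.name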
I've got a script set up to log into a website. The challenge is that I'm running the script on EC2, and the website is asking me to do additional verification by sending me a custom code.
I receive the email immediately, but I need to be able to enter that code on the fly.
This is the script:
import urllib2
import cookielib
import urllib
import requests
import mechanize
from bs4 import BeautifulSoup
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('https://www.website.com/login')
#select the first form
br.select_form(nr=0)
#user credentials
br['user_key'] = 'username#gmail.com'
br['user_password'] = 'somePassw0rd'
# Login
br.submit()
#enter verification code
input_var = raw_input("Enter something: ")
#put verification code in form
br['Verication'] = str(input_var)
#submit form
br.submit()
The challenge for me is that I keep getting an error saying:
AttributeError: mechanize._mechanize.Browser instance has no attribute __setitem__ (perhaps you forgot to .select_form()?)
What can I do to make this run as intended?
After you call br.submit() you go straight into
br['Verication'] = str(input_var)
This is incorrect: once you call br.submit(), your browser no longer has a form selected.
After submitting, I would try:
for form in br.forms():
    print form
to see whether there is another form to select.
Read the HTML of the page you land on after logging in and check exactly what happens when you click login. You may have to select a form on that same page again and then assign the verification code to one of its controls, as in the sketch below.
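A minimal sketch of that idea, assuming the first submit lands on a verification page whose form really does contain a control named 'Verication' (both the flow and the control name are taken from the question):

# First submit: send the login credentials
br.submit()

# The response should now be the verification page; select its form again
br.select_form(nr=0)                # assumes the verification form is the first form on that page

input_var = raw_input("Enter the emailed code: ")
br['Verication'] = str(input_var)   # control name copied from the question

# Second submit: send the verification code
response = br.submit()
print response.read()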
I've read through different answers to questions like mine but still haven't managed to get this to run :(.
I'm logging in to a site using Python & mechanize; my code looks like this:
import mechanize
import cookielib

br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
...
r = br.open('http://...')
html = r.read()
form = br.forms().next()
br.form = form
br.submit()
Submitting the form is not a problem. The problem is that when I call br.open() again to perform a GET request, Python doesn't send the PHPSESSID cookie (I checked this in Wireshark). Any ideas?
Thanks!
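One quick check that may help narrow this down (a debugging sketch, not from the original thread): print the jar right after the submit to see whether PHPSESSID was ever stored. If it never shows up there, the problem is in the login step rather than in the later GET.

br.submit()
# See what the jar actually holds after logging in
for cookie in cj:
    print cookie.name, cookie.value, cookie.domain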
Another approach is to load an existing Firefox cookies.txt into a MozillaCookieJar and attach it to the opener:

import os
import cookielib, urllib2

ckjar = cookielib.MozillaCookieJar(os.path.join(
    r'C:\Documents and Settings\tom\Application Data\Mozilla\Firefox\Profiles\h5m61j1i.default', 'cookies.txt'))
ckjar.load()  # actually read the cookies from disk into the jar

req = urllib2.Request(url, postdata, header)
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(ckjar))
f = opener.open(req)
htm = f.read()
f.close()
When I try to post data from an HTTP page to an HTTPS page, urllib2 does not return the desired HTTPS page; instead the website asks me to enable cookies.
To get the first HTTP page:
proxyHandler = urllib2.ProxyHandler({'http': "http://proxy:port" })
opener = urllib2.build_opener(proxyHandler)
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')]
urllib2.install_opener(opener)
resp = urllib2.urlopen(url)
content = resp.read()
When I extract data from the above page and post it to the second HTTPS page, urllib2 returns status 200, but the page asks me to enable cookies.
I've checked the POST data; it's fine. I'm getting cookies from the website, but I'm not sure whether they are being sent with the next request, since I read in the Python docs that urllib2 handles cookies automatically.
To get the second HTTPS page:
resp = urllib2.urlopen(url, data=postData)
content = resp.read()
I also tried setting the proxy handler to the following, as suggested in a reply to a similar problem somewhere on Stack Overflow, but got the same result:
proxyHandler = urllib2.ProxyHandler({'https': "http://proxy:port" })
urllib2 "handles" cookies in responses, but it does not automatically store them and resend them with later requests. You'll need to use the cookielib module for that.
There are some examples in the documentation that show how it works with urllib2.
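A minimal sketch of that, under the assumptions of the question (the same proxy serves both schemes, and the second request is a POST to the HTTPS page); the key point is that one opener with one HTTPCookieProcessor handles both requests, so cookies set by the first response are sent with the second:

import urllib2, cookielib

cj = cookielib.CookieJar()
proxyHandler = urllib2.ProxyHandler({'http': 'http://proxy:port', 'https': 'http://proxy:port'})
opener = urllib2.build_opener(proxyHandler, urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')]
urllib2.install_opener(opener)

# First request: any Set-Cookie headers from the response land in cj
resp = urllib2.urlopen(url)
content = resp.read()

# Second request (url here is the HTTPS page, as in the question):
# the same opener sends the stored cookies along with the POST
resp = urllib2.urlopen(url, data=postData)
content = resp.read()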
I am using this code:
import urllib2

def req(url, postfields):
    proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118"})
    opener = urllib2.build_opener(proxy_support)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    return opener.open(url).read()
This makes a simple HTTP GET request (using Tor as the proxy).
Now I would like to know how to make multiple requests using the same cookies.
For example:
req('http://loginpage', 'postfields')
source = req('http://pageforloggedinonly', 0)
#do stuff with source
req('http://anotherpageforloggedinonly', 'StuffFromSource')
I know that my req function doesn't support POST (yet), but I have sent post fields with httplib before, so I can probably figure that out myself. What I don't understand is how to use cookies. The examples I've seen all make a single request; I want to reuse the cookie from the first login request in the succeeding requests, or save/load the cookie from a file (like curl does), which would make everything easier.
The code I posted is only to illustrate what I'm trying to achieve; I think I'll use httplib(2) for the final app.
UPDATE:
cookielib.LWPCookieJar worked fine; here's a sample I did for testing:
import urllib2, cookielib, os

def request(url, postfields, cookie):
    urlopen = urllib2.urlopen
    cj = cookielib.LWPCookieJar()
    Request = urllib2.Request
    if os.path.isfile(cookie):
        cj.load(cookie)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    req = Request(url, postfields, txheaders)
    handle = urlopen(req)
    cj.save(cookie)
    return handle.read()

print request('http://google.com', None, 'cookie.txt')
The cookielib module is what you need to do this. There's a nice tutorial with some code samples.
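For completeness, a variant of the sample above as a small sketch: build one opener that carries both the Tor proxy handler and a cookie processor, and reuse it for every request, so the login cookie is sent automatically afterwards. The URLs and form field names are placeholders taken from or modelled on the question.

import urllib, urllib2, cookielib

cj = cookielib.LWPCookieJar('cookie.txt')
opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': '127.0.0.1:8118'}),   # Tor proxy, as in the question
    urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

def req(url, postfields=None):
    # POST if postfields is given, GET otherwise; the shared jar keeps the session
    data = urllib.urlencode(postfields) if postfields else None
    return opener.open(url, data).read()

req('http://loginpage', {'user': 'me', 'password': 'secret'})   # hypothetical field names
source = req('http://pageforloggedinonly')
cj.save(ignore_discard=True)   # keep session cookies too when writing cookie.txt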