I'm a beginner at web scraping. I can scrape an average web page, but when I try to scrape SolarWinds from either Node.js or Python, I only get the login page back, even though I supply the correct login credentials.
import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib
cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_cookiejar(cj)
br.open("******")
br.select_form(nr=0)
br.form['username'] = '***'
br.form['password'] = '***'
br.submit()
print br.response().read()
I always get this error:
mechanize._form.ControlNotFoundError: no control matching name 'username'
I wrote the following script to submit a file here.
The login works fine, and submitting to a problem works the first time, but I'm not able to re-submit to the same problem.
import mechanize
import cookielib
from bs4 import BeautifulSoup
import html2text
import urllib2
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Chrome')]
# Logging in
br.open('http://codeforces.com/enter')
br.select_form(nr=1)
br.form['handle'] = username
br.form['password'] = password
br.submit()
# Submitting
br.open('http://codeforces.com/problemset/submit')
br.select_form(nr=1)
br.form['submittedProblemCode'] = problemCode
#selecting language
br.form['programTypeId'] = ['42']
br.form.add_file(open("code.cpp"), 'text/plain', "code.cpp")
br.submit()
print br.geturl()
For a successful submission, br.geturl() prints
http://codeforces.com/problemset/status
which is the required page, but for an unsuccessful submission it prints
http://codeforces.com/problemset/submit?csrf_token=/insert token/
I'm trying to log in with mechanize, get the session cookie, and then load a protected page, but mechanize doesn't seem to be saving or reusing the session. When I try to load the protected resource, I get redirected to the login page. Can anyone see what I'm doing wrong in the code below?
import mechanize
import urllib
import Cookie
import cookielib
cookiejar=cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cookiejar)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 Compatible')]
br.set_cookiejar(cookiejar)
# `pass` is a reserved word in Python; `password` is assumed to hold the account password
params = {'email_address': 'name#company.com', 'password': password}
data = urllib.urlencode(params)
request = mechanize.Request('/myLoginPage', data=data)
response = br.open(request)
html = response.read()
request = mechanize.Request('/myProtectedPage')
response = br.open(request)
At this point, response is not the data from the protected resource; it's a redirect to the login page.
I want to log in to http://nas.ub.ac.id/webAuth/ using mechanize.
This is the form:
https://lh5.googleusercontent.com/-xD337tAg6mE/U2eDbrTnsZI/AAAAAAAAEuo/P42DQqjHgBQ/w676-h200-no/forms.PNG
This is my code:
import mechanize
br = mechanize.Browser()
br.open("http://nas.ub.ac.id/webAuth")
br.select_form(nr = 0)
br.form['username'] = "my username"
br.form['password'] = "my password"
br.submit()
Do you know why the login does not succeed? Thanks.
I've got a script that logs into a website. The challenge is that I'm running the script on EC2, and the website asks for additional verification by emailing me a custom code.
I receive the email immediately, but I need to be able to enter the code into the verification field on the fly.
This is the script:
import urllib2
import cookielib
import urllib
import requests
import mechanize
from bs4 import BeautifulSoup
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_refresh(False)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0 but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('https://www.website.com/login')
#select the first form
br.select_form(nr=0)
#user credentials
br['user_key'] = 'username#gmail.com'
br['user_password'] = 'somePassw0rd'
# Login
br.submit()
#enter verification code
input_var = raw_input("Enter something: ")
#put verification code in form
br['Verication'] = str(input_var)
#submit form
br.submit()
The challenge for me is that I keep getting an error saying:
AttributeError: mechanize._mechanize.Browser instance has no attribute __setitem__ (perhaps you forgot to .select_form()?)
What can I do to make this run as intended?
After the first br.submit(), you go straight to
br['Verication'] = str(input_var)
This fails because calling br.submit() leaves the browser with no form selected.
After submitting, I would try:
for form in br.forms():
    print form
to see whether there is another form to select.
Read the HTML of the login page and check exactly what happens when you click login. You may have to select a form again on the resulting page and then assign the verification code to one of its controls.
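The re-selection step described above can be sketched as a small helper. This is only a sketch under assumptions: `select_form_with_control` is a hypothetical name, not part of mechanize, and it assumes the verification field is a named control inside one of the forms on the page that `br.submit()` returned. It relies only on `br.forms()`, `form.controls`, `control.name` and `br.select_form(nr=...)`, and it runs unchanged on Python 2 and Python 3.

```python
def select_form_with_control(br, control_name):
    """Select the first form on the current page that contains a control
    named `control_name`, and return that form's index.

    `br` is expected to be a mechanize.Browser that has already loaded a
    page. The helper name is illustrative, not a mechanize API.
    """
    for index, form in enumerate(br.forms()):
        # Each mechanize form exposes its fields as `form.controls`;
        # every control carries a `name` attribute (possibly None).
        if any(control.name == control_name for control in form.controls):
            br.select_form(nr=index)
            return index
    raise LookupError(
        "no form on this page has a control named %r" % (control_name,)
    )
```

In the script above, this would go between the first `br.submit()` and the assignment: call `select_form_with_control(br, 'Verication')`, then set `br['Verication']` and submit again. If the helper raises `LookupError`, the field name is probably spelled differently on the page, and printing `br.forms()` will show the real names.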
I'm currently using iMacros to extract data from a website and to fill in and submit forms.
But iMacros is an expensive tool. I need a free library, and I've read about Scrapy for data mining. It's a little more complex to program with, but cost is the deciding factor.
My question is whether I can fill in HTML forms with Scrapy and submit them to the web page. I don't want to use JavaScript; I want to use Python scripts exclusively.
I searched http://doc.scrapy.org/ but didn't find anything about form submission.
Use the scrapy.http.FormRequest class.
The FormRequest class extends the base Request with functionality for dealing with HTML forms:
http://doc.scrapy.org/en/latest/topics/request-response.html#formrequest-objects
Mechanize is a Python library that lets you automate interactions with a website. It supports HTML form filling.
The program below shows how to fill in a form:
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0 but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('http://gmail.com')
# Select the first (index zero) form
br.select_form(nr=0)
# User credentials
br.form['Email'] = 'user'
br.form['Passwd'] = 'password'
# Login
br.submit()
# Filter all links to mail messages in the inbox
all_msg_links = [l for l in br.links(url_regex='\?v=c&th=')]
# Select the first 3 messages
for msg_link in all_msg_links[0:3]:
    print msg_link
    # Open each message
    br.follow_link(msg_link)
    html = br.response().read()
    soup = BeautifulSoup(html)
    # Filter html to only show the message content
    msg = str(soup.findAll('div', attrs={'class': 'msg'})[0])
    # Show raw message content
    print msg
    # Convert html to text, easier to read but can fail if you have intl
    # chars
    # print html2text.html2text(msg)
    print
    # Go back to the Inbox
    br.follow_link(text='Inbox')
# Logout
br.follow_link(text='Sign out')