Gmail has this sweet thing going on to get an atom feed:
def gmail_url(user, pwd):
    return "https://"+str(user)+":"+str(pwd)+"@gmail.google.com/gmail/feed/atom"
Now when you do this in a browser, it authenticates and forwards you. But in Python, at least with what I'm trying, it isn't working right.
url = gmail_url(settings.USER, settings.PASS)
print url
opener = urllib.FancyURLopener()
f = opener.open(url)
print f.read()
Instead of forwarding correctly, it's doing this:
>>>
https://user:pass@gmail.google.com/gmail/feed/atom
Enter username for New mail feed at mail.google.com:
This is BAD! I shouldn't have to type in the username and password again!! How can I make it just auto-forward in python as it does in my web browser, so I can get the feed contents without all the BS?
You can use HTTPBasicAuthHandler. I tried the following and it worked:
import urllib2

def get_unread_msgs(user, passwd):
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s@gmail.com' % user,
        passwd=passwd
    )
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    feed = urllib2.urlopen('https://mail.google.com/mail/feed/atom')
    return feed.read()
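For reference, what the handler above ends up sending is an Authorization header built from the username and password (the Basic scheme of RFC 7617). Below is a minimal Python 3 sketch of that encoding. The helper name basic_auth_header is made up for this illustration and is not part of urllib2.

```python
import base64


def basic_auth_header(user, passwd):
    """Return the value of an HTTP Basic Authorization header.

    Hypothetical helper for illustration only.
    """
    # The credentials are joined with a colon and base64-encoded.
    # This is an encoding, not encryption, so it should only travel over HTTPS.
    raw = ("%s:%s" % (user, passwd)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

The handler does this for you once the server answers with a 401 challenge, so a helper like this would only be needed if you wanted to send the header up front.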
I've been writing automated tests for a web application. The application sends emails not just for account creation and password resets; sending emails with virtual documents is the premise of the actual product.
As part of my tests I obviously need to check that these emails contain certain elements, e.g. a link to sign up, a link to documents, etc.
I have written some Python code (for the Gmail atom feed) that finds and prints the title of each email and, if there is a link, prints that too, but it cannot find the link.
import urllib2
import untangle

FEED_URL = 'https://mail.google.com/mail/feed/atom'

def get_unread_msgs(user, passwd):
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='{user}@gmail.com'.format(user=user),
        passwd=passwd
    )
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    feed = urllib2.urlopen(FEED_URL)
    return feed.read()

if __name__ == "__main__":
    import getpass
    user = raw_input('Username: ')
    passwd = getpass.getpass('Password: ')
    xml = get_unread_msgs(user, passwd)
    o = untangle.parse(xml)
    try:
        for item in o.feed.entry:
            title = item.title.cdata
            print title
            link = item.link.cdata
            if link:
                print "Link"
                print ' ', link
    except IndexError:
        pass # no new mail
Edit: I've just realised that the atom feed doesn't actually give the message data.
Could anyone please suggest an alternative method of achieving my goal?
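A note on the link lookup in the code above: in Atom, an entry's URL is normally carried in the href attribute of the link element rather than in its text, which would explain why item.link.cdata comes back empty. With untangle that should be an attribute lookup such as item.link['href'], as far as I know. The sketch below shows the same idea in Python 3 with the standard library only. SAMPLE_FEED is made-up data and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; a real feed has more fields
# and may use a different Atom namespace.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inbox</title>
  <entry>
    <title>Welcome</title>
    <link rel="alternate" href="https://example.com/signup" type="text/html"/>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def entry_links(xml_text):
    """Return a list of (title, href) pairs, one per <entry>."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        if local_name(element.tag) != "entry":
            continue
        title, href = None, None
        for child in element:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "link":
                # The URL is an attribute, not the element's text.
                href = child.get("href")
        results.append((title, href))
    return results
```

Matching on the local tag name keeps the sketch independent of which Atom namespace the feed declares.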
You could access the messages via imaplib instead:
import imaplib

def get_unread_msgs(user, passwd):
    M = imaplib.IMAP4_SSL('imap.gmail.com')
    M.login(user, passwd)
    try:
        M.select()
        try:
            type, data = M.search(None, '(UNSEEN)')
            for num in data[0].split():
                yield M.fetch(num, '(RFC822)')
        finally:
            M.close()
    finally:
        M.logout()
You will need to enable IMAP in your Gmail settings if you haven't already:
Get started with IMAP and POP3
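Once a message has been fetched, the raw RFC822 bytes (data[0][1] of each fetch result in imaplib) still have to be parsed before a test can look for links. Here is a rough Python 3 sketch using the standard email module. SAMPLE_RAW is a made-up message, extract_links is a hypothetical helper, and the URL regex is deliberately simple.

```python
import email
import re

# Made-up message standing in for the bytes returned by IMAP.
SAMPLE_RAW = (
    b"From: app@example.com\r\n"
    b"To: tester@example.com\r\n"
    b"Subject: Confirm your account\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hi, sign up here: https://example.com/signup?token=abc\r\n"
)

# Simple pattern: stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(raw_bytes):
    """Return (subject, [urls]) for a raw RFC822 message."""
    msg = email.message_from_bytes(raw_bytes)
    links = []
    for part in msg.walk():
        # Only text/plain and text/html parts can contain links we care about.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        links.extend(URL_RE.findall(payload.decode(charset, errors="replace")))
    return msg["Subject"], links
```

A test could then assert that the expected sign-up or document URL is in the returned list.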
If you are looking for a (gmail specific) solution without polling the server for updates, you can look into the Gmail Notifications API.
So I have been looking around and have managed to cobble together some code that lets me log in to the website http://forums.somethingawful.com
It works; I can see that from the response.
When I use the same urllib2 opener that I created for the login to open this part of the site, http://forums.somethingawful.com/attachment.php?attachmentid=300 (which I need to be logged in to view), I get a response of "ÿØÿà"
EDIT: http://i.imgur.com/PmWl1s4.png
I have included a screenshot of what the target page looks like when logged in, in case it is of any more help.
Any ideas why?
"""
# Script to log in to website and store cookies.
# run as: python web_login.py USERNAME PASSWORD
#
# sources of code include:
#
# http://stackoverflow.com/questions/2954381/python-form-post-using-urllib2-also-question-on-saving-using-cookies
# http://stackoverflow.com/questions/301924/python-urllib-urllib2-httplib-confusion
# http://www.voidspace.org.uk/python/articles/cookielib.shtml
#
# mashed together by Martin Chorley
#
# Licensed under a Creative Commons Attribution ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-sa/3.0/
"""
import urllib, urllib2
import cookielib
import sys
import urlparse
from BeautifulSoup import BeautifulSoup as bs
class WebLogin(object):
    def __init__(self, username, password):
        # url for website we want to log in to
        self.base_url = 'http://forums.somethingawful.com/'
        # login action we want to post data to
        # could be /login or /account/login or something similar
        self.login_action = '/account.php?'
        # file for storing cookies
        self.cookie_file = 'login.cookies'
        # user provided username and password
        self.username = username
        self.password = password
        # set up a cookie jar to store cookies
        self.cj = cookielib.MozillaCookieJar(self.cookie_file)
        # set up opener to handle cookies, redirects etc
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
        # pretend we're a web browser and not a python script
        self.opener.addheaders = [('User-agent', 'Chrome/16.0.912.77')]
        # open the front page of the website to set and save initial cookies
        response = self.opener.open(self.base_url)
        self.cj.save()
        # try and log in to the site
        response = self.login()
        response2 = self.opener.open("http://forums.somethingawful.com/attachment.php?attachmentid=300")
        print response2.read() + "LLLLLL"

    # method to do login
    def login(self):
        # parameters for login action
        # may be different for different websites
        # check html source of website for specifics
        login_data = urllib.urlencode({
            'action': 'login',
            'username': 'username',
            'password': 'password'
        })
        # construct the url
        login_url = self.base_url + self.login_action
        # then open it
        response = self.opener.open(login_url, login_data)
        # save the cookies and return the response
        self.cj.save()
        return response

if __name__ == "__main__":
    username = "username"
    password = "password"
    # initialise and login to the website
    test = WebLogin(username, password)
Try this instead:
import urllib2,cookielib

def login(username,password):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    url1 = "http://forums.somethingawful.com/attachment.php?attachmentid=300"
    url2 = "http://forums.somethingawful.com/account.php?action=loginform"
    data = "&username="+username+"&password="+password
    socket = opener.open(url1)
    socket = opener.open(url2,data)
    return socket.read()
P.S.: I wrote it as a standalone function; you can integrate it into your class if it works for you. In addition, the call to opener.open(url1) might be redundant; I would need a valid username/password pair to verify that.
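One possible reading of the "ÿØÿà" response in the question: those four characters are the bytes FF D8 FF E0 displayed as Latin-1 text, which is how a JPEG (JFIF) file begins. If so, the opener may well have returned the attachment image itself, and printing it as text is what makes it look broken. A small Python 3 sketch of checking for that and saving the bytes follows. Both helper names are made up for illustration.

```python
# First three bytes of every JPEG stream (start-of-image marker plus
# the first byte of the next marker).
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data):
    """Return True if the byte string starts like a JPEG file."""
    return data[:3] == JPEG_MAGIC


def save_attachment(data, path):
    """Write the response body unchanged; binary mode avoids corrupting it."""
    with open(path, "wb") as fh:
        fh.write(data)
```

With the response body in hand, something like save_attachment(body, 'attachment.jpg') would let you open the file in an image viewer and compare it with the screenshot.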
I'm trying to open a URL in Python that needs username and password. My specific implementation looks like this:
http://char_user:char_pwd@casfcddb.example.com/......
I get the following error spit to the console:
httplib.InvalidURL: nonnumeric port: 'char_pwd@casfcddb.example.com'
I'm using urllib2.urlopen, but the error implies it doesn't understand the user credentials: it sees the ":" and expects a port number rather than the password and actual address. Any ideas what I'm doing wrong here?
Use HTTPBasicAuthHandler for providing the password instead:
import urllib2
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://casfcddb.xxx.com", "char_user", "char_pwd")
auth_handler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
urllib2.urlopen("http://casfcddb.xxx.com")
or using the requests library:
import requests
requests.get("http://casfcddb.xxx.com", auth=('char_user', 'char_pwd'))
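If the credentials arrive embedded in the URL, as in the question, they can be separated out before being handed to the password manager or to requests. This is a Python 3 sketch with urllib.parse. The function name split_credentials and the example path are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, username, password)."""
    parts = urlsplit(url)
    # Rebuild the network location from host and port only,
    # dropping the 'user:password@' prefix.
    host = parts.hostname or ""
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password
```

The cleaned URL is what goes to urlopen or requests.get, while the username and password go to add_password or to the auth tuple.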
I ran into a situation where I needed BasicAuth handling and only had urllib available (no urllib2 or requests). The answer from Uku mostly worked, but here are my mods:
import urllib.request
url = 'https://your/url.xxx'
username = 'username'
password = 'password'
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
auth_handler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
resp = urllib.request.urlopen(url)
data = resp.read()
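It may help to see how the password manager decides which credentials apply, since nothing is sent until a server challenges the request. The Python 3 sketch below makes no network calls. The URLs are placeholders and lookup is a hypothetical helper.

```python
import urllib.request

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Realm None means "use these credentials for any realm at this URL prefix".
passman.add_password(None, "https://example.com/", "username", "password")


def lookup(url, realm="any realm"):
    """Ask the manager which (user, password) it would use for url."""
    return passman.find_user_password(realm, url)
```

Credentials registered for a URL also match pages underneath it, and a different host gets (None, None), so a mismatch between the registered URL and the one actually requested is worth ruling out when authentication seems to be ignored.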
I am trying to write a Python script to POST a multipart form to a site that requires authentication through CAS.
There are two approaches that both solve part of the problem:
The Python requests library works well for submitting multipart forms.
There is caslib, with a login function. It returns an OpenerDirector that can presumably be used for further requests.
Unfortunately, I can't figure out how to get a complete solution out of what I have so far.
These are just some ideas from a couple of hours of research; I am open to just about any solution that works.
Thanks for the help.
I accepted J.F. Sebastian's answer because I think it was closest to what I'd asked, but I actually wound up getting it to work by using mechanize, a Python library for web browser automation.
import argparse
import mechanize
import re
import sys
# (SENSITIVE!) Authentication info
username = r'username'
password = r'password'
# Command line arguments
parser = argparse.ArgumentParser(description='Submit lab to CS 235 site (Winter 2013)')
parser.add_argument('lab_num', help='Lab submission number')
parser.add_argument('file_name', help='Submission file (zip)')
args = parser.parse_args()
# Go to login site
br = mechanize.Browser()
br.open('https://cas.byu.edu/cas/login?service=https%3a%2f%2fbeta.cs.byu.edu%2f~sub235%2fsubmit.php')
# Login and forward to submission site
br.form = br.forms().next()
br['username'] = username
br['password'] = password
br.submit()
# Submit
br.form = br.forms().next()
br['labnum'] = list(args.lab_num)
br.add_file(open(args.file_name), 'application/zip', args.file_name)
r = br.submit()
for s in re.findall('<h4>(.+?)</?h4>', r.read()):
    print s
You could use poster to prepare multipart/form-data. Try to pass poster's opener to the caslib and use caslib's opener to make requests (not tested):
import urllib2
import caslib
import poster.encode
import poster.streaminghttp
opener = poster.streaminghttp.register_openers()
r, opener = caslib.login_to_cas_service(login_url, username, password,
                                        opener=opener)
params = {'file': open("test.txt", "rb"), 'name': 'upload test'}
datagen, headers = poster.encode.multipart_encode(params)
response = opener.open(urllib2.Request(upload_url, datagen, headers))
print response.read()
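To make the multipart step less opaque, here is a rough standard-library sketch (Python 3) of what a multipart/form-data body looks like on the wire. It is not poster's implementation; encode_multipart and its argument layout are made up for this illustration.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, headers_dict).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--%s" % boundary).encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")  # blank line separates part headers from content
        lines.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8")
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    # The closing delimiter carries two extra dashes.
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return body, headers
```

The returned body and headers could then be passed to a Request and opened with whichever opener holds the CAS session cookies.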
You could write an authentication handler for Requests using caslib. Then you could do something like:
auth = CasAuthentication("url", "login", "password")
response = requests.get("http://example.com/cas_service", auth=auth)
Or if you're making tons of requests against the website:
s = requests.session()
s.auth = auth
s.post('http://casservice.com/endpoint', data={'key': 'value'}, files={'filename': open('/path/to/file', 'rb')})
After much reading here on Stack Overflow as well as on the web, I'm still struggling to get things to work.
My challenge: to get access to a restricted part of a website for which I'm a member using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
    response = urllib2.urlopen('http://www.domain.com/restrictedpage')
    page = response.read()
    print response.geturl()
except IOError, e:
    print e
The print doesn't print "http://www.domain.com/restrictedpage", but shows "http://www.domain.com/login" so my credentials aren't stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
Check first manually what is really happening when you are successfully authenticated (instructions for Chrome):
Open developer tools in Chrome (Ctrl + Shift + I).
Click the Network tab.
Do the authentication manually (go to the page, type user + passwd, submit).
Check the POST request in the Network tab of the developer tools.
Check the Request Headers, Query String Parameters and Form Data. There you will find all the information you need for your own POST.
Then install the "Advanced Rest Client (ARC)" Chrome extension.
Use ARC to construct a valid POST for authentication.
Now you know what to have in your headers and form data. Here's some sample code using Requests that worked for me for one particular site:
import requests
USERNAME = 'user' # put correct username here
PASSWORD = 'password' # put correct password here
LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'
session = requests.session()
req_headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}
formdata = {
    'UserName': USERNAME,
    'Password': PASSWORD,
    'LoginButton' : 'Login'
}
# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text
# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
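For anyone who wants to stay with the standard library, as the question does, the same flow can be sketched with urllib in Python 3. The point is that HTTPBasicAuthHandler only reacts to a 401 challenge, whereas a site that redirects to a login page expects a form POST followed by the session cookie on later requests. The form field names are copied from the example above and will differ per site. Both helper names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return (opener, jar); the jar keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the form POST; field names are assumptions taken from the example."""
    form = urllib.parse.urlencode({
        "UserName": username,
        "Password": password,
        "LoginButton": "Login",
    }).encode("ascii")
    # Supplying data makes urllib send a POST with
    # Content-Type application/x-www-form-urlencoded.
    return urllib.request.Request(login_url, data=form)


# Typical use (not executed here, since it needs a real site):
#   opener, jar = make_session_opener()
#   opener.open(build_login_request(LOGINURL, USERNAME, PASSWORD))
#   page = opener.open(DATAURL).read()
```

Reusing the same opener for the second request is what carries the login cookie over to the restricted page.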
For HTTP Basic Auth you can refer to this: http://www.voidspace.org.uk/python/articles/authentication.shtml