Python3: Download DOM from webpage with username/password

Python3: Download DOM from webpage with username/password - python

I am trying to download the DOM from the Yahoo fantasy football page. The data requires a yahoo user. I am looking for the python library where I can add my user/pass to the request.
urllib has HTTPBasicAuthHandler which needs a HTTPPasswordMgr Objects
the add_password field says I am missing an argument, when I try and pass it the 4 it wants. I am not sure what to put for the realm. I am new to Python.
I have found the Requests to look promising, but when I install it, it throws an error and I can not import it properly :\
I was hoping this was a bit easier to do in Python!
import urllib.request
try:
url = "http://football.fantasysports.yahoo.com/"
username = "un"
password = "pw"
pwObj = urllib.request.HTTPPasswordMgr.add_password("http://yahoo.com",url, username, password)
request = urllib.request.HTTPBasicAuthHandler(pwObj)
result = urllib.request.urlopen(request)
print(result.read())
except Exception as e:
print(str(e))
# Error: add_password() missing 1 required positional argument: 'passwd'
The ideal solution would have someone downloading yahoo DOM data from a page that requires credentials :)

Like this:
import urllib.request
try:
url = "http://football.fantasysports.yahoo.com/"
username = "_username"
password = "_password"
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)
opener.open("http://football.fantasysports.yahoo.com/f1/leaderboard")
urllib.request.install_opener(opener)
result = urllib.request.urlopen(url)
print(result.read())
except Exception as e:
print(str(e))
But, how to run java events ?

Related

HTTP Basic Auth Failing or Incomplete

I'm trying to log into the vRealize Operations Rest API using basic auth method from this question:
HTTP Basic Authentication not working in Python 3.4
So I use the second code sample:
import urllib.parse
import urllib.request
import urllib.response
userName = "username"
passWord = "password"
top_level_url = "URL"
# create an authorization handler
p = urllib.request.HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, top_level_url, userName, passWord);
auth_handler = urllib.request.HTTPBasicAuthHandler(p)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
try:
result = opener.open(top_level_url)
messages = result.read()
print (messages)
except IOError as e:
print (e)
and I get a blue font: HTTP Error 401: Unauthorized
So I think this is an issue with trusting the certificate or it has something to do with the headers. How can I work around this? Any advice would be appreciated.

Python httplib.InvalidURL: nonnumeric port fail

I'm trying to open a URL in Python that needs username and password. My specific implementation looks like this:
http://char_user:char_pwd#casfcddb.example.com/......
I get the following error spit to the console:
httplib.InvalidURL: nonnumeric port: 'char_pwd#casfcddb.example.com'
I'm using urllib2.urlopen, but the error is implying it doesn't understand the user credentials. That it sees the ":" and expects a port number rather than the password and actual address. Any ideas what I'm doing wrong here?

Use BasicAuthHandler for providing the password instead:
import urllib2
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://casfcddb.xxx.com", "char_user", "char_pwd")
auth_handler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
urllib2.urlopen("http://casfcddb.xxx.com")
or using the requests library:
import requests
requests.get("http://casfcddb.xxx.com", auth=('char_user', 'char_pwd'))

I ran into a situation where I needed BasicAuth handling and only had urllib available (no urllib2 or requests). The answer from Uku mostly worked, but here are my mods:
import urllib.request
url = 'https://your/url.xxx'
username = 'username'
password = 'password'
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
auth_handler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
resp = urllib.request.urlopen(url)
data = resp.read()

403 error while calling Reddit API.

I'm trying to access data saved by the user. And it keeps returning a 403 error with this being its api end point.
http://www.reddit.com/dev/api#GET_user_{username}_saved
I'm thoroughly confused what to send in my headers to make this request work and the reddit documentation has no mention of it at all. Help?
I'm using Python-requests library to do this.

Referring to line 686 in reddit's code in listingcontroller.py (here) :
if (where in ('saved', 'hidden') and not
((c.user_is_loggedin and c.user._id == vuser._id) or
c.user_is_admin)):
return self.abort403()
you can clearly see that you must be logged in as username or be an admin in order to get the saved or hidden data - otherwise you get a 403 error.

As #zenpoy already mentioned (and which you already know), you have to be logged in. Therefore, you should save the cookie, which you get as a response of a valid call to api/login. I've written some code, which logs a user in and retrieves all saved things:
import urllib
import urllib2
import cookielib
import json
login_url = 'https://ssl.reddit.com/api/login/'
saved_url = 'https://ssl.reddit.com/user/<username>/saved.json'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
def login(username, passwd):
values = {'user': username,
'api_type': 'json',
'passwd': passwd}
data = urllib.urlencode(values)
response = opener.open(login_url, data).read()
print json.loads(response)
def retrieve_saved(username):
url = saved_url.replace('<username>', username)
response = opener.open(url).read()
print json.loads(response)
login(<username>, <passwd>)
retrieve_saved(<username>)

How to authenticate a site with Python using urllib2?

After much reading here on Stackoverflow as well as the web I'm still struggling with getting things to work.
My challenge: to get access to a restricted part of a website for which I'm a member using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
response = urllib2.urlopen('http://www.domain.com/restrictedpage')
page = response.read()
print page.geturl()
except IOError, e:
print e
The print doesn't print "http://www.domain.com/restrictedpage", but shows "http://www.domain.com/login" so my credentials aren't stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?

Check first manually what is really happening when you are successfully authenticated (instructions with Chrome):
Open develper tools in Chrome (Ctrl + Shift + I)
Click Network tab
Go and do the authentication manually (go the the page, type user + passwd + submit)
check the POST method in the Network tab of the developer tools
check the Request Headers, Query String Parameters and Form Data. There you find all the information needed what you need to have in your own POST.
Then install "Advanced Rest Client (ARC)" Chrome extension
Use the ARC to construct a valid POST for authentication.
Now you know what to have in your headers and form data. Here's a sample code using Requests that worked for me for one particular site:
import requests
USERNAME = 'user' # put correct usename here
PASSWORD = 'password' # put correct password here
LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'
session = requests.session()
req_headers = {
'Content-Type': 'application/x-www-form-urlencoded'
}
formdata = {
'UserName': USERNAME,
'Password': PASSWORD,
'LoginButton' : 'Login'
}
# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text
# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text

For HTTP Basic Auth you can refer this : http://www.voidspace.org.uk/python/articles/authentication.shtml

Python form POST using urllib2 (also question on saving/using cookies)

I am trying to write a function to post form data and save returned cookie info in a file so that the next time the page is visited, the cookie information is sent to the server (i.e. normal browser behavior).
I wrote this relatively easily in C++ using curlib, but have spent almost an entire day trying to write this in Python, using urllib2 - and still no success.
This is what I have so far:
import urllib, urllib2
import logging
# the path and filename to save your cookies in
COOKIEFILE = 'cookies.lwp'
cj = None
ClientCookie = None
cookielib = None
logger = logging.getLogger(__name__)
# Let's see if cookielib is available
try:
import cookielib
except ImportError:
logger.debug('importing cookielib failed. Trying ClientCookie')
try:
import ClientCookie
except ImportError:
logger.debug('ClientCookie isn\'t available either')
urlopen = urllib2.urlopen
Request = urllib2.Request
else:
logger.debug('imported ClientCookie succesfully')
urlopen = ClientCookie.urlopen
Request = ClientCookie.Request
cj = ClientCookie.LWPCookieJar()
else:
logger.debug('Successfully imported cookielib')
urlopen = urllib2.urlopen
Request = urllib2.Request
# This is a subclass of FileCookieJar
# that has useful load and save methods
cj = cookielib.LWPCookieJar()
login_params = {'name': 'anon', 'password': 'pass' }
def login(theurl, login_params):
init_cookies();
data = urllib.urlencode(login_params)
txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
try:
# create a request object
req = Request(theurl, data, txheaders)
# and open it to return a handle on the url
handle = urlopen(req)
except IOError, e:
log.debug('Failed to open "%s".' % theurl)
if hasattr(e, 'code'):
log.debug('Failed with error code - %s.' % e.code)
elif hasattr(e, 'reason'):
log.debug("The error object has the following 'reason' attribute :"+e.reason)
sys.exit()
else:
if cj is None:
log.debug('We don\'t have a cookie library available - sorry.')
else:
print 'These are the cookies we have received so far :'
for index, cookie in enumerate(cj):
print index, ' : ', cookie
# save the cookies again
cj.save(COOKIEFILE)
#return the data
return handle.read()
# FIXME: I need to fix this so that it takes into account any cookie data we may have stored
def get_page(*args, **query):
if len(args) != 1:
raise ValueError(
"post_page() takes exactly 1 argument (%d given)" % len(args)
)
url = args[0]
query = urllib.urlencode(list(query.iteritems()))
if not url.endswith('/') and query:
url += '/'
if query:
url += "?" + query
resource = urllib.urlopen(url)
logger.debug('GET url "%s" => "%s", code %d' % (url,
resource.url,
resource.code))
return resource.read()
When I attempt to log in, I pass the correct username and pwd,. yet the login fails, and no cookie data is saved.
My two questions are:
can anyone see whats wrong with the login() function, and how may I fix it?
how may I modify the get_page() function to make use of any cookie info I have saved ?

There are quite a few problems with the code that you've posted. Typically you'll want to build a custom opener which can handle redirects, https, etc. otherwise you'll run into trouble. As far as the cookies themselves so, you need to call the load and save methods on your cookiejar, and use one of subclasses, such as MozillaCookieJar or LWPCookieJar.
Here's a class I wrote to login to Facebook, back when I was playing silly web games. I just modified it to use a file based cookiejar, rather than an in-memory one.
import cookielib
import os
import urllib
import urllib2
# set these to whatever your fb account is
fb_username = "your#facebook.login"
fb_password = "secretpassword"
cookie_filename = "facebook.cookies"
class WebGamePlayer(object):
def __init__(self, login, password):
""" Start up... """
self.login = login
self.password = password
self.cj = cookielib.MozillaCookieJar(cookie_filename)
if os.access(cookie_filename, os.F_OK):
self.cj.load()
self.opener = urllib2.build_opener(
urllib2.HTTPRedirectHandler(),
urllib2.HTTPHandler(debuglevel=0),
urllib2.HTTPSHandler(debuglevel=0),
urllib2.HTTPCookieProcessor(self.cj)
)
self.opener.addheaders = [
('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
'Windows NT 5.2; .NET CLR 1.1.4322)'))
]
# need this twice - once to set cookies, once to log in...
self.loginToFacebook()
self.loginToFacebook()
self.cj.save()
def loginToFacebook(self):
"""
Handle login. This should populate our cookie jar.
"""
login_data = urllib.urlencode({
'email' : self.login,
'pass' : self.password,
})
response = self.opener.open("https://login.facebook.com/login.php", login_data)
return ''.join(response.readlines())
test = WebGamePlayer(fb_username, fb_password)
After you've set your username and password, you should see a file, facebook.cookies, with your cookies in it. In practice you'll probably want to modify it to check whether you have an active cookie and use that, then log in again if access is denied.

If you are having a hard time making your POST requests to work (like I had with a login form), it definitely pays to quickly install the Live HTTP headers extension to Firefox (http://livehttpheaders.mozdev.org/index.html). This small extension can, among other things, show you the exact POST data that are sent when you manually log in.
In my case, I had banged my head against the wall for hours because the site insisted on an extra field with 'action=login' (doh!).

Please using ignore_discard and ignore_expires while save cookie, in mine case it saved OK.
self.cj.save(cookie_file, ignore_discard=True, ignore_expires=True)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python3: Download DOM from webpage with username/password - python

Related

HTTP Basic Auth Failing or Incomplete

Python httplib.InvalidURL: nonnumeric port fail

403 error while calling Reddit API.

How to authenticate a site with Python using urllib2?

Python form POST using urllib2 (also question on saving/using cookies)

Categories

Resources