Adding an authentication header in Python 3

Using the urllib2 library and the add_header function, I am able to authenticate and retrieve data in Python 2.7. But since the urllib2 library is no longer present in Python 3, how do I add the Basic Authentication header with the urllib library?

Please check the add_header method of urllib.request's Request class.
import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)
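For Basic Authentication specifically, you can build the Authorization header by hand and attach it with the same add_header call. A minimal sketch, assuming placeholder credentials and URL:
import base64
import urllib.request
# Placeholder credentials; Basic auth is just a base64 encoding of "user:password".
credentials = base64.b64encode(b'myuser:mypassword').decode('ascii')
req = urllib.request.Request('http://www.example.com/')
req.add_header('Authorization', 'Basic ' + credentials)
r = urllib.request.urlopen(req)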
By the way, I also recommend checking another way, using HTTPBasicAuthHandler:
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
(taken from the same urllib.request documentation page)

Related

How to send cookies with urllib

I'm attempting to connect to a website that requires you to have a specific cookie to access it. For the sake of this question, we'll call the cookie 'required_cookie' and the value 'required_value'.
This is my code:
import urllib.request
import http.cookiejar
from urllib.request import Request, urlopen
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('required_cookie', 'required_value'), ('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
req = Request('https://www.thewebsite.com/')
webpage = urlopen(req).read()
print(webpage)
I'm new to urllib, so please answer me as a beginner.
To do this with urllib, you need to (see the sketch after this list):
1. Construct a Cookie object. The constructor isn't documented in the docs, but if you run help(http.cookiejar.Cookie) in the interactive interpreter, you can see that its constructor demands values for all 16 attributes. Notice that the docs say, "It is not expected that users of http.cookiejar construct their own Cookie instances."
2. Add it to the cookie jar with cj.set_cookie(cookie).
3. Tell the cookie jar to add the correct headers to the request with cj.add_cookie_header(req).
Assuming you've configured the policy correctly, you're set.
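A minimal sketch of those three steps, using the cookie name and value from the question; the domain, path, and expiry attributes below are assumptions, not something the question specifies:
import http.cookiejar
import urllib.request
# Step 1: hand-build a Cookie; all 16 attributes must be supplied.
cookie = http.cookiejar.Cookie(
    version=0, name='required_cookie', value='required_value',
    port=None, port_specified=False,
    domain='www.thewebsite.com', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={})
# Step 2: add it to the cookie jar.
cj = http.cookiejar.CookieJar()
cj.set_cookie(cookie)
# Step 3: let the jar write the Cookie header onto the request.
req = urllib.request.Request('https://www.thewebsite.com/',
                             headers={'User-Agent': 'Mozilla/5.0'})
cj.add_cookie_header(req)
webpage = urllib.request.urlopen(req).read()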
But this is a huge pain. As the docs for urllib.request say:
See also: The Requests package is recommended for a higher-level HTTP client interface.
And, unless you have some good reason you can't install requests, you really should go that way. urllib is tolerable for really simple cases, and it can be handy when you need to get deep under the covers—but for everything else, requests is much better.
With requests, your whole program becomes a one-liner:
webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': 'required_value'}, headers={'User-Agent': 'Mozilla/5.0'}).text
… although it's probably more readable as a few lines:
import requests
cookies = {'required_cookie': 'required_value'}
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.thewebsite.com/', cookies=cookies, headers=headers)
webpage = response.text
With the help of the Kite documentation: https://www.kite.com/python/answers/how-to-add-a-cookie-to-an-http-request-using-urllib-in-python
You can add a cookie this way:
import urllib.request
a_request = urllib.request.Request("http://www.kite.com/")
a_request.add_header("Cookie", "cookiename=cookievalue")
or in a different way:
from urllib.request import Request
url = "https://www.kite.com/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Cookie':'myCookie=lovely'})

FancyURLopener.version equivalence in requests

Sometimes it's necessary to change the version attribute when retrieving a URL using FancyURLopener, e.g.
from urllib.request import FancyURLopener
class NewOpener(FancyURLopener):
    version = 'Some fancy thing'
url = 'http://www.google.com'
opener = NewOpener().retrieve(url, 'google.html')
Is there an equivalent in the requests library when using requests.get()?
As #Sraw commented, the "version" is basically the User-Agent field in the header, so:
requests.get(url, headers={'User-Agent': 'Some fancy thing'})
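If you also want the retrieve(url, 'google.html') part, i.e. saving the page to a file, a rough equivalent with requests might look like this (the filename mirrors the question; the custom User-Agent stands in for the version attribute):
import requests
response = requests.get('http://www.google.com',
                        headers={'User-Agent': 'Some fancy thing'})
# Write the body to disk, as FancyURLopener.retrieve would.
with open('google.html', 'wb') as f:
    f.write(response.content)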

REST GET request in Python sending named argument

I have been playing with curl inside bash on Linux, but now I need to move my script to Python in order to be more effective.
What is the best way to do something like the following line in Python?
curl baseuri:port/resource -u user:pswd
I have already played with urllib2, but I don't have a clue how to send the "-u user:pswd" part.
You'll need to use an HTTPPasswordMgrWithDefaultRealm instance to handle the authentication, and add an HTTPBasicAuthHandler to respond to the authentication challenge:
import urllib2
url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'
pwmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
pwmgr.add_password(None, url, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(pwmgr)
opener = urllib2.build_opener(authhandler)
response = opener.open(url)
Yes, this is a handful.
If you can install a 3rd-party library, then use the requests library; it'll be so much easier:
import requests
url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'
response = requests.get(url, auth=(username, password))
Actually, we use pycurl instead, which also works pretty well. Here is an example that parses JSON out of the response:
import pycurl
import json
from io import BytesIO
url = 'baseuri:port/resource'
data = BytesIO()
pyCurl = pycurl.Curl()
pyCurl.setopt(pycurl.URL, url)
pyCurl.setopt(pycurl.USERPWD, 'user:password')
pyCurl.setopt(pycurl.WRITEFUNCTION, data.write)
pyCurl.perform()
dictionary = json.loads(data.getvalue())

urllib2 failing with https websites

Using urllib2 and trying to get an https page, it keeps failing with
Invalid url, unable to resolve
The url is
https://www.domainsbyproxy.com/default.aspx
but I have this happening on multiple https sites.
I am using Python 2.7, and below is the code I am using to set up the connection:
opener = urllib2.OpenerDirector()
opener.add_handler(urllib2.HTTPHandler())
opener.add_handler(urllib2.HTTPDefaultErrorHandler())
opener.addheaders = [('Accept-encoding', 'gzip')]
fetch_timeout = 12
response = opener.open(url, None, fetch_timeout)
The reason I am setting the handlers manually is that I don't want redirects handled (which works fine). The above works fine for http requests; however, it fails for https.
Any clues?
You should be using HTTPSHandler instead of HTTPHandler
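For example, a sketch of the question's opener setup with HTTPS support added (still urllib2, matching the Python 2.7 code above):
import urllib2
opener = urllib2.OpenerDirector()
opener.add_handler(urllib2.HTTPHandler())
opener.add_handler(urllib2.HTTPSHandler())  # needed for https:// URLs
opener.add_handler(urllib2.HTTPDefaultErrorHandler())
opener.addheaders = [('Accept-encoding', 'gzip')]
response = opener.open('https://www.domainsbyproxy.com/default.aspx', None, 12)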
If you don't mind external libraries, consider the excellent requests module; it takes care of these quirks for you.
Your code, using requests, is:
import requests
r = requests.get(url, headers={'Accept-encoding': 'gzip'}, timeout=12)

Setting the referrer URL in Python urllib.urlretrieve

I am using urllib.urlretrieve in Python to download websites, though some websites seem to not want me to download them unless they get a proper referrer from their own site. Does anybody know of a way I can set a referrer in one of Python's libraries, or an external one?
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
adapted from http://docs.python.org/library/urllib2.html
urllib makes it hard to send arbitrary headers with the request; you could use urllib2, which lets you build and send a Request object with arbitrary headers (including of course the -- alas, sadly spelled ;-) -- Referer). urllib2 doesn't offer urlretrieve, but it's easy to just urlopen as you wish and copy the resulting file-like object to disk if you want (directly, or e.g. via shutil functions).
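For example, a minimal sketch of that urlopen-plus-copy approach (the output filename is an assumption):
import shutil
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
response = urllib2.urlopen(req)
# Copy the response body to disk, much like urlretrieve would.
with open('page.html', 'wb') as out:
    shutil.copyfileobj(response, out)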
Also, using urllib2 with build_opener you can do this:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('Referer', 'http://www.python.org/')]
opener.open('http://www.example.com/')
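If you are on Python 3, where urllib2 has been merged into urllib.request, the same build_opener approach would look roughly like this (a sketch):
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('Referer', 'http://www.python.org/')]
opener.open('http://www.example.com/')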
