Python requests lib, is requests.Session equivalent to urllib2's opener? - python

I need to accomplish a login task in my own project. Luckily, I found that someone has already done it.
Here is the related code.
import re, urllib, urllib2, cookielib

class Login():
    cj = cookielib.LWPCookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    def __init__(self, name='', password='', domain=''):
        self.name = name
        self.password = password
        self.domain = domain
        urllib2.install_opener(self.opener)

    def login(self):
        params = {'domain': self.domain, 'email': self.name, 'password': self.password}
        req = urllib2.Request(
            website_url,
            urllib.urlencode(params)
        )
        self.openrate = self.opener.open(req)
        print self.openrate.geturl()
        info = self.openrate.read()
I've tested the code, and it works great (judging by the contents of info).
Now I want to port it to Python 3, using the requests library instead of urllib2.
My thoughts:
Since the original code uses an opener, I think (though I'm not sure) that its equivalent in requests is requests.Session.
Am I supposed to pass in a jar = cookiejar.CookieJar() when making the request? I'm not sure about that either.
I've tried something like:
import requests
from http import cookiejar
from urllib.parse import urlencode

jar = cookiejar.CookieJar()
s = requests.Session()
s.post(
    website_url,
    data=urlencode(params),
    allow_redirects=True,
    cookies=jar
)
Also, following the answer in Putting a `Cookie` in a `CookieJar`, I tried making the same request again, but none of these attempts worked.
That's why I'm here for help.
Will someone show me the right way to do this? Thank you~

An opener and a Session are not entirely analogous, but for your particular use-case they match perfectly.
You do not need to pass a CookieJar when using a Session: Requests will automatically create one, attach it to the Session, and then persist the cookies to the Session for you.
You don't need to urlencode the data: requests will do that for you.
allow_redirects is True by default, so you don't need to pass that parameter.
Putting all of that together, your code should look like this:
import requests

s = requests.Session()
s.post(website_url, data=params)
Any future requests made using the Session you just created will automatically have cookies applied to them if they are appropriate.
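For reference, here is a minimal sketch of the Login class from the question ported to Python 3 and requests (it assumes website_url is defined elsewhere, exactly as in the original code):

import requests

class Login:
    def __init__(self, name='', password='', domain=''):
        self.name = name
        self.password = password
        self.domain = domain
        # The Session plays the role of the installed opener:
        # it owns a cookie jar and reuses it on every request.
        self.session = requests.Session()

    def login(self):
        params = {'domain': self.domain, 'email': self.name, 'password': self.password}
        # Pass the dict directly; requests form-encodes it and follows redirects.
        response = self.session.post(website_url, data=params)
        print(response.url)
        return response.text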

Related

How to send cookies with urllib

I'm attempting to connect to a website that requires you to have a specific cookie to access it. For the sake of this question, we'll call the cookie 'required_cookie' and the value 'required_value'.
This is my code:
import urllib.request
import http.cookiejar
from urllib.request import Request, urlopen

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('required_cookie', 'required_value'), ('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
req = Request('https://www.thewebsite.com/')
webpage = urlopen(req).read()
print(webpage)
I'm new to urllib, so please answer as if I'm a beginner.
To do this with urllib, you need to:
Construct a Cookie object. The constructor isn't documented in the docs, but if you help(http.cookiejar.Cookie) in the interactive interpreter, you can see that its constructor demands values for all 16 attributes. Notice that the docs say, "It is not expected that users of http.cookiejar construct their own Cookie instances."
Add it to the cookiejar with cj.set_cookie(cookie).
Tell the cookiejar to add the correct headers to the request with cj.add_cookie_header(req).
Assuming you've configured the policy correctly, you're set.
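Put together, those three steps might look like the following sketch (the attribute values for the Cookie are illustrative guesses for the example site, not canonical):

import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()

# All 16 attributes must be supplied explicitly.
cookie = http.cookiejar.Cookie(
    version=0, name='required_cookie', value='required_value',
    port=None, port_specified=False,
    domain='www.thewebsite.com', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=True, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
cj.set_cookie(cookie)

req = urllib.request.Request('https://www.thewebsite.com/',
                             headers={'User-Agent': 'Mozilla/5.0'})
cj.add_cookie_header(req)  # writes the Cookie header onto req
webpage = urllib.request.urlopen(req).read()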
But this is a huge pain. As the docs for urllib.request say:
See also The Requests package is recommended for a higher-level HTTP client interface.
And unless you have a good reason you can't install requests, you really should go that way. urllib is tolerable for really simple cases, and it can be handy when you need to get deep under the covers, but for everything else, requests is much better.
With requests, your whole program becomes a one-liner:
webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': required_value}, headers={'User-Agent': 'Mozilla/5.0'}).text
… although it's probably more readable as a few lines:
cookies = {'required_cookie': required_value}
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.thewebsite.com/', cookies=cookies, headers=headers)
webpage = response.text
With the help of Kite documentation: https://www.kite.com/python/answers/how-to-add-a-cookie-to-an-http-request-using-urllib-in-python
You can add a cookie this way:
import urllib.request

a_request = urllib.request.Request("http://www.kite.com/")
a_request.add_header("Cookie", "cookiename=cookievalue")
or in a different way:
from urllib.request import Request
url = "https://www.kite.com/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Cookie':'myCookie=lovely'})

Download a file from https with authentication

I have a Python 2.6 script that downloads a file from a web server. I want this script to pass a username and password (for authentication before fetching the file), and I am passing them as part of the URL as follows:
import urllib2
response = urllib2.urlopen("http://'user1':'password'@server_name/file")
However, I am getting a syntax error in this case. Is this the correct way to go about it? I am pretty new to Python and coding in general.
Can anybody help me out?
Thanks!
If you can use the requests library, it's insanely easy. I'd highly recommend using it if possible:
import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
I suppose you are trying to get past Basic Authentication. In that case, you can handle it this way:
import urllib2

username = 'user1'
password = '123456'

# This should be the base url you wanted to access.
baseurl = 'http://server_name.com'

# Create a password manager
manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, baseurl, username, password)

# Create an authentication handler using the password manager
auth = urllib2.HTTPBasicAuthHandler(manager)

# Create an opener that will replace the default urlopen method on further calls
opener = urllib2.build_opener(auth)
urllib2.install_opener(opener)

# Here you should access the full url you wanted to open
response = urllib2.urlopen(baseurl + "/file")
Use the requests library and just put the credentials inside your .netrc file.
The library will load them from there, and you will be able to commit the code to your SCM of choice without any security worries.
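For example, a sketch (host and credentials are placeholders matching the earlier answer): with a .netrc entry like the one in the comment below, requests picks the credentials up automatically whenever no explicit auth= argument is given.

import requests

# ~/.netrc (permissions 600), placeholder credentials:
#
#   machine somewebsite.org
#   login bob
#   password hunter2
#
# With no auth= argument, requests falls back to the matching
# .netrc entry for the request's host.
resp = requests.get('http://somewebsite.org/file')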

REST GET request on Python sending named argument

I have been playing on Linux with curl inside bash, but now I need to move my script to Python in order to be more effective.
What is the best way to do something like the following line in Python?
curl baseuri:port/resource -u user:pswd
I've already played with urllib2, but I don't have a clue how to send the "-u user:pswd" part.
You'll need to use a HTTPPasswordMgrWithDefaultRealm instance to handle the authentication, and add in a HTTPBasicAuthHandler handler to respond to the authentication challenge:
import urllib2

url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'

pwmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
pwmgr.add_password(None, url, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(pwmgr)
opener = urllib2.build_opener(authhandler)
response = opener.open(url)
Yes, this is a handful.
If you can install a 3rd-party library, then use the requests library; it'll be so much easier:
import requests
url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'
response = requests.get(url, auth=(username, password))
Actually, we use pycurl instead, which also works pretty well. Here's an example that reads JSON from the response:
import pycurl
import json
from io import BytesIO

data = BytesIO()
pyCurl = pycurl.Curl()
pyCurl.setopt(pycurl.URL, 'baseuri:port/resource')
pyCurl.setopt(pycurl.USERPWD, 'user:pswd')
pyCurl.setopt(pycurl.WRITEFUNCTION, data.write)
pyCurl.perform()
dictionary = json.loads(data.getvalue())

using python urlopen for a url query

Using urlopen for a URL query also seems obvious. What I tried is:
import urllib2
query='http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
f = urllib2.urlopen(query)
s = f.read()
f.close()
However, for this specific URL the query fails with HTTP Error 403: Forbidden.
When I enter this query in my browser, it works.
Also when using http://www.httpquery.com/ to submit the query, it works.
Do you have suggestions how to use Python right to grab the correct response?
It looks like it requires cookies (which you can handle with urllib2), but an easier way, if you're going to do this, is to use requests:
import requests

session = requests.Session()
r = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
This is generally a much easier and less-stressful method of retrieving URLs in Python.
requests will automatically store and re-use cookies for you. Creating a session is slightly overkill here, but it is useful when you need to submit data to login pages or re-use cookies across a site.
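As a quick illustration of that persistence (the second URL is hypothetical, purely to show the jar being reused):

import requests

session = requests.Session()
# Any Set-Cookie headers in this response are stored on the session...
r1 = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
print(session.cookies.get_dict())
# ...and sent automatically with every later request to the same site.
r2 = session.get('http://www.onvista.de/aktien')  # hypothetical follow-up URL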
Using urllib2, it's something like:
import urllib2, cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener( urllib2.HTTPCookieProcessor(cookies) )
data = opener.open('url').read()
It appears that the urllib2 default user agent is banned by the host. You can simply supply your own user agent string:
import urllib2
url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
request = urllib2.Request(url, headers={"User-Agent" : "MyUserAgent"})
contents = urllib2.urlopen(request).read()
print contents
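If you'd rather stay with requests, as suggested above, the same fix looks like this (a sketch reusing the URL and header from this answer):

import requests

url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
# Replace the default 'python-requests/...' User-Agent that the host rejects.
contents = requests.get(url, headers={'User-Agent': 'MyUserAgent'}).text
print(contents)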

Web proxy in python/django?

I need to have a proxy that acts as an intermediary to fetch images. For example, my server requests domain1.com/?url=domain2.com/image.png, and the domain1.com server responds with the data from domain2.com/image.png.
Essentially I want to pass to the proxy the URL I want fetched, and have the proxy server respond with that resource.
Any suggestions on where to start on this?
I need something very easy to use or implement as I'm very much a beginner at all of this.
Most solutions I have found in Python and/or Django have the proxy act as a "translator", i.e. domain1.com/image.png translates to domain2.com/image.png, which is obviously not the same.
I currently have the following code, but fetching images results in garbled data:
import httplib2
from django.conf.urls.defaults import *
from django.http import HttpResponse

def proxy(request, url):
    conn = httplib2.Http()
    if request.method == "GET":
        url = request.GET['url']
        resp, content = conn.request(url, request.method)
        return HttpResponse(content)
Old question but for future googlers, I think this is what you want:
import urllib2
from django.http import HttpResponse

# proxies the google logo
def test(request):
    url = "http://www.google.com/logos/classicplus.png"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return HttpResponse(response.read(), mimetype="image/png")
A very simple Django proxy view with requests and StreamingHttpResponse:
import requests
from django.http import StreamingHttpResponse

def my_proxy_view(request):
    url = request.GET['url']
    response = requests.get(url, stream=True)
    return StreamingHttpResponse(
        response.raw,
        content_type=response.headers.get('content-type'),
        status=response.status_code,
        reason=response.reason)
The advantage of this approach is that you don't need to load the complete file in memory before streaming the content to the client.
As you can see, it forwards some response headers. Depending on your needs, you may want to forward the request headers as well; for example:
response = requests.get(url, stream=True,
                        headers={'user-agent': request.headers.get('user-agent')})
If you need something more complete than my previous answer, you can use this class:
import requests
from django.http import StreamingHttpResponse

class ProxyHttpResponse(StreamingHttpResponse):
    def __init__(self, url, headers=None, **kwargs):
        upstream = requests.get(url, stream=True, headers=headers)
        kwargs.setdefault('content_type', upstream.headers.get('content-type'))
        kwargs.setdefault('status', upstream.status_code)
        kwargs.setdefault('reason', upstream.reason)
        super().__init__(upstream.raw, **kwargs)
        for name, value in upstream.headers.items():
            self[name] = value
You can use this class like so:
def my_proxy_view(request):
    url = request.GET['url']
    return ProxyHttpResponse(url, headers=request.headers)
The advantage of this version is that you can reuse it in multiple views. Also, it forwards all headers, and you can easily extend it to add or exclude some other headers.
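For instance, here is a sketch of one such extension (the EXCLUDED_HEADERS set and the subclass name are illustrative, not part of the answer above): dropping hop-by-hop headers, which WSGI applications are not allowed to set.

import requests
from django.http import StreamingHttpResponse

# Hop-by-hop headers; PEP 3333 forbids WSGI applications from setting these.
EXCLUDED_HEADERS = {
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailers', 'transfer-encoding', 'upgrade',
}

class FilteredProxyHttpResponse(StreamingHttpResponse):
    def __init__(self, url, headers=None, **kwargs):
        upstream = requests.get(url, stream=True, headers=headers)
        kwargs.setdefault('content_type', upstream.headers.get('content-type'))
        kwargs.setdefault('status', upstream.status_code)
        kwargs.setdefault('reason', upstream.reason)
        super().__init__(upstream.raw, **kwargs)
        for name, value in upstream.headers.items():
            if name.lower() not in EXCLUDED_HEADERS:
                self[name] = value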
If the file you're fetching and returning is an image, you'll need to set the mimetype of your HttpResponse object accordingly.
Use mechanize; it allows you to choose a proxy and act like a browser, making it easy to change the user agent, go back and forth in the history, and handle authentication or cookies.
