How do you send a HEAD HTTP request in Python 2? - python

What I'm trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.
>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
...
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))
Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:
>>> print response.geturl()
http://www.google.com.au/index.html

edit: This answer works, but nowadays you should just use the requests library as mentioned by other answers below.
Use httplib.
>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]
There's also a getheader(name) to get a specific header.
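Note that the content-type value above carries a charset parameter, so comparing it directly with 'text/html' would not match. A minimal Python 3 sketch, standard library only, of splitting such a header into MIME type and charset; the helper name split_content_type is made up for illustration:

```python
from email.message import Message

def split_content_type(header_value):
    """Return (mime_type, charset) parsed from a Content-Type header value."""
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_type() drops the parameters and lowercases the type;
    # get_content_charset() returns None when no charset is given.
    return msg.get_content_type(), msg.get_content_charset()

mime, charset = split_content_type('text/html; charset=ISO-8859-1')
print(mime, charset)
```

The same helper can be fed the result of res.getheader('content-type').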

Obligatory Requests way:
import requests
resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

I believe the Requests library should be mentioned as well.

Just:
import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'
response = urllib2.urlopen(request)
response.info().gettype()
Edit: I've just come to realize there is httplib2 :D
import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0].status == 200  # resp[0]['status'] holds the string '200'
assert resp[0]['content-type'].startswith('text/html')  # the header may also carry a charset
...

For completeness, here is a Python 3 equivalent of the accepted answer that uses httplib.
The code is basically the same, except that the library is now called http.client instead of httplib.
from http.client import HTTPConnection
conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()
print(res.status, res.reason)
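The urllib2 approach from the first answer also has a Python 3 counterpart, because urllib.request.Request accepts a method argument. The following is only a sketch: the function names build_head_request and head are invented here, and the URL is the placeholder from the question.

```python
import urllib.request

def build_head_request(url):
    """Build a Request whose HTTP method is HEAD instead of the default GET."""
    return urllib.request.Request(url, method='HEAD')

def head(url, timeout=10):
    """Send the HEAD request and return (final_url, status, mime_type)."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        # resp.headers is an email.message.Message subclass, so it can
        # strip parameters such as "; charset=..." from Content-Type.
        return resp.geturl(), resp.status, resp.headers.get_content_type()

# Building the request does not touch the network:
print(build_head_request('http://somedomain/foo/').get_method())
```

Calling head('http://somedomain/foo/') would then perform the actual request.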

import httplib
import urlparse
def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

As an aside, when using httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not call read on the response, you cannot send another request on the same connection; you will need to open a new one, or accept a long delay between requests.
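For comparison, here is a sketch of several HEAD requests over one connection object on Python 3, where httplib is called http.client. It assumes that read() on a HEAD response returns an empty body right away, which is what I see in current Python 3 versions; the function name head_many is hypothetical.

```python
from http.client import HTTPConnection

def head_many(host, paths, timeout=10):
    """Send one HEAD request per path and return (path, status, content_type) tuples."""
    conn = HTTPConnection(host, timeout=timeout)
    results = []
    try:
        for path in paths:
            conn.request('HEAD', path)
            res = conn.getresponse()
            res.read()  # finish the (empty) HEAD response before the next request
            results.append((path, res.status, res.getheader('Content-Type')))
    finally:
        conn.close()
    return results
```

If the server closes the connection after each response, the connection object reconnects on the next request, so the loop works either way.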

I have found that httplib is faster than urllib2. I timed two programs, one using httplib and the other using urllib2, each sending HEAD requests to 10,000 URLs. The httplib one was faster by several minutes. httplib's total stats were:
real 6m21.334s
user 0m2.124s
sys 0m16.372s
And urllib2's total stats were:
real 9m1.380s
user 0m16.666s
sys 0m28.565s
Does anybody else have input on this?

And yet another approach (similar to Pawel's answer):
import urllib2
import types
request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)
This just avoids storing an unbound function at instance level.
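On Python 3 the same binding trick would look slightly different, since types.MethodType takes only the function and the instance there. A small sketch, using the same placeholder URL:

```python
import types
import urllib.request

request = urllib.request.Request('http://localhost:8080')
# Bind the function to this one instance so it receives `self` like a normal method.
request.get_method = types.MethodType(lambda self: 'HEAD', request)
print(request.get_method())
```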

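Probably easier: use urllib or urllib2. Note that urlopen sends a GET request rather than a HEAD request, so the server still sends the body.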
>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'
f.info() is a dictionary-like object, so you can do f.info()['content-type'], etc.
http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html
The docs note that httplib is not normally used directly.

Related

From HTTPResponse to str in Python 3.6

From a POST request to Vimeo API I get a JSON object encoded as HTTPResponse.
r = http.request('POST', 'https://api.vimeo.com/oauth/authorize/client?grant_type=client_credentials', headers={'Authorization': 'basic XXX'})
I cannot find a way to convert the HTTPResponse to a str or JSON object. On Stack Overflow I found and tried the following options:
json.loads(r.decode('utf-8'))
json.loads(r.readall().decode('utf-8'))
str(r, 'utf-8')
but none of them worked.
Please can you help?
Thanks
Try the requests module:
import requests
import json
r=requests.post('https://api.vimeo.com/oauth/authorize/client?grant_type=client_credentials', varData, headers={'Authorization': 'basic XXX'})
response = json.loads(r.text)
From Python docs (emphasis mine):
class http.client.HTTPResponse(sock, debuglevel=0, method=None, url=None)
Class whose instances are returned upon successful connection. Not instantiated directly by user.
And also:
See also The Requests package is recommended for a higher-level HTTP client interface.
So you're probably better off using requests directly.
After having made your request, just use json.loads(r.text).
You can use the http.client module. Example:
import http.client
import json
# The connection takes a host name, not a full URL; the path goes to request()
conn = http.client.HTTPSConnection('api.vimeo.com')
headers = {'Authorization': 'basic XXX'}
params = varData
conn.request('POST', '/oauth/authorize/client?grant_type=client_credentials', params, headers)
response = conn.getresponse()
content = response.read().decode('utf-8')  # returns a str
res_map = json.loads(content)  # if content is a JSON string
For more information, refer to the http.client documentation.
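The common step in these answers is: read the body as bytes, decode it, then parse the JSON. A minimal sketch of that step, using io.BytesIO as a stand-in for a real response object; the helper name and the token values are invented for illustration:

```python
import io
import json

def json_from_response(response, encoding='utf-8'):
    """Read a file-like HTTP response and parse its body as JSON."""
    raw = response.read()  # bytes, not str
    return json.loads(raw.decode(encoding))

# Stand-in for a real response; the payload is made up.
fake_response = io.BytesIO(b'{"access_token": "abc123", "token_type": "bearer"}')
data = json_from_response(fake_response)
print(data['token_type'])
```

An http.client.HTTPResponse offers the same read() method, so it could be passed in place of the stand-in.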

How to get cookies from urllib.request?

How do I get cookies from a urllib.request response?
import urllib.request
import urllib.parse
data = urllib.parse.urlencode({
    'user': 'user',
    'pass': 'pass'
})
data = data.encode('utf-8')
request = urllib.request.urlopen('http://example.com', data)
print(request.info())
request.info() returns the cookies, but not in a very usable form.
response.info() returns a dict-like object, so you can read any header you need from it. Here is a demo written in Python 3:
from urllib import request
from urllib.error import HTTPError
# declare url, header_params
req = request.Request(url, data=None, headers=header_params, method='GET')
try:
    response = request.urlopen(req)
except HTTPError as err:
    print("err status: {0}".format(err))
else:
    cookie = response.info().get_all('Set-Cookie')
    content_type = response.info()['Content-Type']
You can now parse the cookie variable as your application requires.
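One way to do that parsing is the standard library's http.cookies.SimpleCookie. The sketch below assumes the input is the list returned by get_all('Set-Cookie'); the helper name and the sample header values are invented for illustration.

```python
from http.cookies import SimpleCookie

def parse_set_cookie(header_values):
    """Map cookie names to values from a list of Set-Cookie header strings."""
    cookies = {}
    for value in header_values or []:  # get_all() returns None when the header is absent
        jar = SimpleCookie()
        jar.load(value)
        for name, morsel in jar.items():
            cookies[name] = morsel.value
    return cookies

sample = ['sessionid=abc123; Path=/; HttpOnly', 'theme=dark; Path=/']
print(parse_set_cookie(sample))
```

Attributes such as Path stay available on each morsel (for example morsel['path']) if the application needs them.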
Just used the following code to get the cookie for Python Challenge #17, hope it helps (Python 3.8 being used):
import http.cookiejar
import urllib.request
cookiejar = http.cookiejar.CookieJar()
cookieproc = urllib.request.HTTPCookieProcessor(cookiejar)
opener = urllib.request.build_opener(cookieproc)
response = opener.open(url)
for cookie in cookiejar:
    print(cookie.name, cookie.value)
I think using the requests package is a much better choice these days. Try this sample code, which shows Google setting cookies when you visit:
import requests
url = "http://www.google.com"
r = requests.get(url, timeout=5)
if r.status_code == 200:
    for cookie in r.cookies:
        print(cookie) # Use "print cookie" if you use Python 2.
Gives:
<Cookie NID=67=n0l3ME1Jl3-wwlH7oE5pvxJ_CfU12hT5Kh65wh21bvE3hrKFAo1sJVj_UcuLCr76Ubi3yxENROaYNEitdgW4IttL43YZGlf8xAPl1IbzoLG31KP5U2tiP2y4DzVOJ2fA for .google.se/>
<Cookie PREF=ID=ce66d1288fc0d977:FF=0:TM=1407525509:LM=1407525509:S=LxQv7q8fju-iHJPZ for .google.se/>

Can I do preemptive authentication with httplib2?

I need to perform preemptive basic authentication against an HTTP server, i.e., authenticate right away without waiting on a 401 response. Can this be done with httplib2?
Edit:
I solved it by adding an Authorization header to the request, as suggested in the accepted answer:
headers["Authorization"] = "Basic {0}".format(
    base64.b64encode("{0}:{1}".format(username, password)))
Add an appropriately formed 'Authorization' header to your initial request.
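As a rough illustration of what "appropriately formed" means for Basic authentication, here is a Python 3 sketch that builds the header dictionary. The helper name is made up, and the credentials are the well-known example pair from the Basic authentication RFC, not real ones.

```python
import base64

def basic_auth_header(username, password):
    """Return a headers dict carrying a preemptive HTTP Basic Authorization header."""
    # b64encode works on bytes, so encode first and decode the result back to str.
    token = '{0}:{1}'.format(username, password).encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(token).decode('ascii')}

print(basic_auth_header('Aladdin', 'open sesame'))
```

The resulting dictionary can be passed as the headers argument of the first request, so no 401 round trip is needed.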
This also works with the built-in httplib (for anyone wishing to minimize 3rd-party libs/modules). I am using it to authenticate with our Jenkins server using the API Token that Jenkins can create for each user.
>>> import base64, httplib
>>> headers = {}
>>> headers["Authorization"] = "Basic {0}".format(
...     base64.b64encode("{0}:{1}".format('<username>', '<jenkins_API_token>')))
>>> ## Enable the job
>>> conn = httplib.HTTPConnection('jenkins.myserver.net')
>>> conn.request('POST', '/job/Foo-trunk/enable', None, headers)
>>> resp = conn.getresponse()
>>> resp.status
302
>>> ## Disable the job
>>> conn = httplib.HTTPConnection('jenkins.myserver.net')
>>> conn.request('POST', '/job/Foo-trunk/disable', None, headers)
>>> resp = conn.getresponse()
>>> resp.status
302
I realize this is old, but I figured I'd throw in the solution if you're using Python 3 with httplib2 since I haven't been able to find it anywhere else. I'm also authenticating against a Jenkins server using the API Token for each Jenkins user. If you're not concerned with Jenkins, simply substitute the actual user's password for the API Token.
b64encode expects a byte string of ASCII characters. With Python 3, a TypeError is raised if a plain string is passed in. To get around this, the "user:api_token" portion of the header must be encoded using either 'ascii' or 'utf-8' and passed to b64encode, and the resulting byte string must then be decoded to a plain string before being placed in the header. The following code did what I needed:
import httplib2, base64
cred = base64.b64encode("{0}:{1}".format(
    <user>, <api_token>).encode('utf-8')).decode()
headers = {'Authorization': "Basic %s" % cred}
h = httplib2.Http('.cache')
response, content = h.request("http://my.jenkins.server/job/my_job/enable",
                              "GET", headers=headers)

How can I unshorten a URL?

I want to be able to take a shortened or non-shortened URL and return its un-shortened form. How can I make a python program to do this?
Additional Clarification:
Case 1: shortened --> unshortened
Case 2: unshortened --> unshortened
e.g. bit.ly/silly in the input array should be google.com in the output array
e.g. google.com in the input array should be google.com in the output array
Send an HTTP HEAD request to the URL and look at the response code. If the code is 30x, look at the Location header to get the unshortened URL. Otherwise, if the code is 20x, then the URL is not redirected; you probably also want to handle error codes (4xx and 5xx) in some fashion. For example:
# This is for Py2k. For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse
def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url
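Following the comment at the top of that snippet, a Python 3 version might look roughly like the sketch below. Beyond the straight translation it makes a few choices of its own, all of them assumptions: it picks HTTPS when the scheme asks for it, keeps the query string, sets a timeout, and resolves a relative Location against the original URL.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlparse

def unshorten_url(url, timeout=10):
    """Return the redirect target if `url` answers HEAD with a 3xx, else `url` itself."""
    parsed = urlparse(url)
    conn_cls = HTTPSConnection if parsed.scheme == 'https' else HTTPConnection
    conn = conn_cls(parsed.netloc, timeout=timeout)
    try:
        path = parsed.path or '/'
        if parsed.query:
            path += '?' + parsed.query
        conn.request('HEAD', path)
        response = conn.getresponse()
        location = response.getheader('Location')
        if response.status // 100 == 3 and location:
            return urljoin(url, location)  # Location may be relative
        return url
    finally:
        conn.close()
```

Like the original, this follows a single redirect only; a chain of shorteners needs repeated calls.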
Using requests:
import requests
session = requests.Session() # so connections are recycled
resp = session.head(url, allow_redirects=True)
print(resp.url)
Unshorten.me has an API that lets you send a JSON or XML request and get the full URL returned.
If you are using Python 3.5+ you can use the Unshortenit module that makes this very easy:
from unshortenit import UnshortenIt
unshortener = UnshortenIt()
uri = unshortener.unshorten('https://href.li/?https://example.com')
Open the url and see what it resolves to:
>>> import urllib2
>>> a = urllib2.urlopen('http://bit.ly/cXEInp')
>>> print a.url
http://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
>>> a = urllib2.urlopen('http://google.com')
>>> print a.url
http://www.google.com/
To unshorten a URL, you can use requests. This is a simple solution that works for me.
import requests
url = "http://foo.com"
site = requests.get(url)
print(site.url)
http://github.com/stef/urlclean
sudo pip install urlclean
urlclean.unshorten(url)
Here is source code that takes into account most of the useful corner cases:
Set a custom timeout.
Set a custom User-Agent.
Check whether to use an HTTP or HTTPS connection.
Resolve the input URL recursively and avoid getting stuck in a redirect loop.
The source code is on GitHub at https://github.com/amirkrifa/UnShortenUrl
Comments are welcome ...
import logging
logging.basicConfig(level=logging.DEBUG)
TIMEOUT = 10
class UnShortenUrl:
    def process(self, url, previous_url=None):
        logging.info('Init url: %s'%url)
        import urlparse
        import httplib
        try:
            parsed = urlparse.urlparse(url)
            if parsed.scheme == 'https':
                h = httplib.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
            else:
                h = httplib.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
            resource = parsed.path
            if parsed.query != "":
                resource += "?" + parsed.query
            try:
                h.request('HEAD',
                          resource,
                          headers={'User-Agent': 'curl/7.38.0'}
                          )
                response = h.getresponse()
            except:
                import traceback
                traceback.print_exc()
                return url
            logging.info('Response status: %d'%response.status)
            if response.status/100 == 3 and response.getheader('Location'):
                red_url = response.getheader('Location')
                logging.info('Red, previous: %s, %s'%(red_url, previous_url))
                # Stop when the redirect points back at the previous URL
                if red_url == previous_url:
                    return red_url
                return self.process(red_url, previous_url=url)
            else:
                return url
        except:
            import traceback
            traceback.print_exc()
            return None
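The loop check above only compares against the immediately preceding URL. A variation that remembers every URL seen and caps the number of hops could look like the sketch below. To keep it independent of the network, the HEAD step is passed in as a function: head is a hypothetical callable returning (status, location), and the URLs in the demo table are invented.

```python
from urllib.parse import urljoin

def resolve_redirects(url, head, max_hops=10):
    """Follow 3xx Location headers until a non-redirect, a loop, or max_hops."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            break  # redirect loop: stop at the URL we came back to
        seen.add(url)
        status, location = head(url)
        if status // 100 != 3 or not location:
            break  # final destination (or an error status)
        url = urljoin(url, location)  # Location may be relative
    return url

# Fake HEAD results standing in for real servers.
table = {
    'http://s.example/a': (301, 'http://s.example/b'),
    'http://s.example/b': (302, '/c'),
    'http://s.example/c': (200, None),
    'http://s.example/x': (301, 'http://s.example/y'),
    'http://s.example/y': (301, 'http://s.example/x'),
}
print(resolve_redirects('http://s.example/a', table.get))
```

A real head function would wrap a single HEAD request, for example with http.client, and return the status and the Location header.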
You can use geturl():
from urllib.request import urlopen
url = "http://bit.ly/silly"  # urlopen needs the scheme
unshortened_url = urlopen(url).geturl()
print(unshortened_url)
# google.com
This is a very easy task; you only need four lines of code :)
import requests
url = input('Enter url : ')
site = requests.get(url)
print(site.url)
Just run this code and you will successfully unshorten the URL.

Is there any way to do HTTP PUT in python

I need to upload some data to a server using HTTP PUT in python. From my brief reading of the urllib2 docs, it only does HTTP POST. Is there any way to do an HTTP PUT in python?
I've used a variety of python HTTP libs in the past, and I've settled on requests as my favourite. Existing libs had pretty useable interfaces, but code can end up being a few lines too long for simple operations. A basic PUT in requests looks like:
>>> import requests
>>> payload = {'username': 'bob', 'email': 'bob@bob.com'}
>>> r = requests.put("http://somedomain.org/endpoint", data=payload)
You can then check the response status code with:
r.status_code
or the response with:
r.content
Requests has a lot of syntactic sugar and shortcuts that'll make your life easier.
import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request('http://example.org', data='your_put_data')
request.add_header('Content-Type', 'your/contenttype')
request.get_method = lambda: 'PUT'
url = opener.open(request)
Httplib seems like a cleaner choice.
import httplib
connection = httplib.HTTPConnection('1.2.3.4:1234')
body_content = 'BODY CONTENT GOES HERE'
connection.request('PUT', '/url/path/to/put/to', body_content)
result = connection.getresponse()
# Now result.status and result.reason contain interesting stuff
You can use the requests library, it simplifies things a lot in comparison to taking the urllib2 approach. First install it from pip:
pip install requests
More on installing requests.
Then set up the PUT request:
import requests
import json
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
# Create your header as required
headers = {"content-type": "application/json", "Authorization": "<auth-key>" }
r = requests.put(url, data=json.dumps(payload), headers=headers)
See the quickstart for requests library. I think this is a lot simpler than urllib2 but does require this additional package to be installed and imported.
This was made better in python3 and documented in the stdlib documentation
The urllib.request.Request class gained a method=... parameter in python3.
Some sample usage:
req = urllib.request.Request('https://example.com/', data=b'DATA!', method='PUT')
urllib.request.urlopen(req)
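To see what such a request carries before sending it, the Request object can be inspected directly. A small sketch with a JSON body; the helper name build_json_put and the payload are made up for illustration:

```python
import json
import urllib.request

def build_json_put(url, payload):
    """Build (but do not send) a PUT request with a JSON-encoded body."""
    body = json.dumps(payload).encode('utf-8')  # data must be bytes
    return urllib.request.Request(
        url,
        data=body,
        method='PUT',
        headers={'Content-Type': 'application/json'},
    )

req = build_json_put('https://example.com/', {'some': 'data'})
print(req.get_method(), req.data)
```

Passing req to urllib.request.urlopen would then send it.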
You should have a look at the httplib module. It should let you make whatever sort of HTTP request you want.
I needed to solve this problem too a while back so that I could act as a client for a RESTful API. I settled on httplib2 because it allowed me to send PUT and DELETE in addition to GET and POST. Httplib2 is not part of the standard library but you can easily get it from the cheese shop.
I also recommend httplib2 by Joe Gregorio. I use this regularly instead of httplib in the standard lib.
Have you taken a look at put.py? I've used it in the past. You can also just hack up your own request with urllib.
You can of course roll your own with the existing standard libraries at any level from sockets up to tweaking urllib.
http://pycurl.sourceforge.net/
"PyCurl is a Python interface to libcurl."
"libcurl is a free and easy-to-use client-side URL transfer library, ... supports ... HTTP PUT"
"The main drawback with PycURL is that it is a relative thin layer over libcurl without any of those nice Pythonic class hierarchies. This means it has a somewhat steep learning curve unless you are already familiar with libcurl's C API. "
If you want to stay within the standard library, you can subclass urllib2.Request:
import urllib2
class RequestWithMethod(urllib2.Request):
    def __init__(self, *args, **kwargs):
        self._method = kwargs.pop('method', None)
        urllib2.Request.__init__(self, *args, **kwargs)
    def get_method(self):
        # urllib2.Request is an old-style class in Python 2, so super() does not work on it
        return self._method if self._method else urllib2.Request.get_method(self)
def put_request(url, data):
    opener = urllib2.build_opener(urllib2.HTTPHandler)
    request = RequestWithMethod(url, method='PUT', data=data)
    return opener.open(request)
You can use requests.request
import requests
url = "https://www.example.com/some/url/"
payload = "{\"param1\": 1, \"param2\": 2}"
headers = {
    'Authorization': '....',
    'Content-Type': 'application/json'
}
response = requests.request("PUT", url, headers=headers, data=payload)
print(response.text)
A more proper way of doing this with requests would be:
import requests
payload = {'username': 'bob', 'email': 'bob@bob.com'}
try:
    response = requests.put(url="http://somedomain.org/endpoint", data=payload)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(e)
    raise
This raises an exception if there is an error in the HTTP PUT request.
Using urllib3
To do that, you will need to manually encode query parameters in the URL.
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> from urllib.parse import urlencode
>>> encoded_args = urlencode({"name":"Zion","salary":"1123","age":"23"})
>>> url = 'http://dummy.restapiexample.com/api/v1/update/15410?' + encoded_args
>>> r = http.request('PUT', url)
>>> import json
>>> json.loads(r.data.decode('utf-8'))
{'status': 'success', 'data': [], 'message': 'Successfully! Record has been updated.'}
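Gluing the query string on by hand makes it easy to drop the '?' separator or to clash with a query that is already present. A small sketch that lets urllib.parse do the joining; the helper name add_query is invented here:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_query(url, params):
    """Return `url` with `params` appended to whatever query string it already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + list(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_query('http://dummy.restapiexample.com/api/v1/update/15410',
                {'name': 'Zion', 'age': '23'}))
```

The resulting string can be passed to http.request('PUT', url) as in the session above.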
Using requests
>>> import requests
>>> r = requests.put('https://httpbin.org/put', data = {'key':'value'})
>>> r.status_code
200
