Get HTTP response header using Python requests module

I am using the requests module in Python to query a RESTful API endpoint. Sometimes the endpoint returns an HTTP Error 500. I realize I can get the status code using requests.status_code, but when I get error 500, I'd like to see the HTTP "response text" (I'm not sure what it's called; examples below). So far I've been able to get some of the headers using response.headers. However, the info I'm looking for is still not there.
Using "curl -vvv", I can see the HTTP response that I'm after (some output omitted for clarity):
< HTTP/1.1 200 OK    <--------------------- this is what I'm after
* Server nginx/1.4.1 is not blacklisted
< Server: nginx/1.4.1
< Date: Wed, 05 Feb 2014 16:13:25 GMT
< Content-Type: application/octet-stream
< Connection: close
< Set-Cookie: webapp.session.id="mYzk5NTc0MDZkYjcxZjU4NmM=|1391616805|f83c47a363194c1ae18e"; expires=Fri, 07 Mar 2014 16:13:25 GMT; Path=/
< Content-Disposition: attachment; filename = "download_2014161325.pdf"
< Cache-Control: public
Again, that's from curl. Now, when I use Python's request module and ask for headers, this is all I get:
CaseInsensitiveDict(
{
'date': 'Tue, 04 Feb 2014 21:56:45 GMT',
'set-cookie': 'webapp.session.id="xODgzNThlODkzZ2U0ZTg=|1391551005|a11ca2ad11195351f636fef"; expires=Thu, 06 Mar 2014 21:56:45 GMT; Path=/',
'connection': 'close',
'content-type': 'application/json',
'server': 'nginx/1.4.1'
}
)
Notice the curl response includes "HTTP/1.1 200 OK" but requests.headers does not. Nearly everything else in the response headers is there. requests.status_code gives me the 200. In this example, all I'm after is the "OK". In other scenarios, our nginx server returns more detailed messages, like "HTTP/1.1 500 search unavailable" or "HTTP/1.1 500 bad parameters", etc. I'd like to get this text. Is there a way, or could I hack something together with Popen and curl? requests.content and requests.text don't help.

You are looking for the Response.reason attribute:
>>> import requests
>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
>>> r.reason
'OK'
>>> r = requests.get('http://httpbin.org/status/500')
>>> r.reason
'INTERNAL SERVER ERROR'
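If you want to surface that text in your error handling, here is a minimal sketch (using the same httpbin URL as above):
import requests

# Print the server's reason phrase alongside the status code on failure.
r = requests.get('http://httpbin.org/status/500')
if not r.ok:                      # r.ok is False for status codes >= 400
    print('%d %s' % (r.status_code, r.reason))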

That is an excellent answer, but please keep in mind that for certain applications you need to retrieve the response headers. This is often the case in paginated REST APIs. Those can be retrieved with:
r.headers
and you can iterate over the header names with:
[x for x in r.headers]
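For example, a minimal sketch of reading a pagination header (the URL and the 'Link' header are assumptions about a GitHub-style paginated API, not from the answer above):
import requests

# Hypothetical paginated endpoint: many REST APIs advertise the next
# page in a 'Link' (or similar) response header.
r = requests.get('https://api.example.com/items?page=1')
for name in r.headers:                       # iterating yields header names
    print('%s: %s' % (name, r.headers[name]))
next_link = r.headers.get('Link')            # None if the server omits it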
Happy coding!

Related

If-None-Match header not accepted as request header

I'm trying to understand how ETag works in Django. I added the middleware in settings ('django.middleware.http.ConditionalGetMiddleware') and this seems to work, as it generates the ETag:
HTTP/1.0 200 OK
Date: Mon, 15 Jan 2018 16:58:30 GMT
Server: WSGIServer/0.2 CPython/3.6.0
Content-Type: application/json
Vary: Accept
Allow: GET, HEAD, OPTIONS
X-Frame-Options: SAMEORIGIN
Content-Length: 1210
ETag: "060e28ac5f08d82ba0cd876a8af64e6d"
Access-Control-Allow-Origin: *
However, when I put If-None-Match: '*' in the request header, I get the following error:
Request header field If-None-Match is not allowed by Access-Control-Allow-Headers in preflight response.
And I notice the request method sent back in the response is OPTIONS and the rest of the headers look like this:
HTTP/1.0 200 OK
Date: Mon, 15 Jan 2018 17:00:26 GMT
Server: WSGIServer/0.2 CPython/3.6.0
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: accept, accept-encoding, authorization, content-type, dnt, origin, user-agent, x-csrftoken, x-requested-with
Access-Control-Allow-Methods: DELETE, GET, OPTIONS, PATCH, POST, PUT
Access-Control-Max-Age: 86400
So the question is: how do I get If-None-Match to be an allowed header? I'm not sure if this is a server or a client issue. I'm using Django/DRF/Vue for my stack and Axios for making HTTP requests.
As the response contains various CORS headers, I believe you are already using django-cors-headers. You can adjust Access-Control-Allow-Headers with the CORS_ALLOW_HEADERS config option; see its documentation for more detail.
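A minimal settings.py sketch, assuming django-cors-headers is installed (extend the default allow-list rather than replacing it):
# settings.py
from corsheaders.defaults import default_headers

CORS_ALLOW_HEADERS = list(default_headers) + [
    'if-none-match',  # allow the browser to send conditional requests
]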

urllib2 sometimes returns an old page - returns strange header

I'm working on a Python script which works with JSON returned by a URL.
For a couple of days now, urllib2 has (just sometimes) been returning an old state of the JSON.
I added the headers "Cache-Control": "max-age=0" etc., but it still sometimes happens.
If I print out the request info I get:
Server: nginx/1.8.0
Date: Thu, 03 Sep 2015 17:02:47 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 3539
Status: 200 OK
X-XHR-Current-Location: /shop/169464.json
X-UA-Compatible: IE=Edge,chrome=1
ETag: "b1fbe7a01e0832025a3afce23fc2ab56"
X-Request-Id: 4cc0d399f943ad09a903f18a6ce1c488
X-Runtime: 0.123033
X-Rack-Cache: miss
Accept-Ranges: bytes
X-Varnish: 1707606900 1707225496
Age: 2860
Via: 1.1 varnish
Cache-Control: private, max-age=0, must-revalidate
Pragma: no-cache
X-Cache: HIT
X-Cache: MISS from adsl
X-Cache-Lookup: MISS from adsl:21261
Connection: close
Does it have something to do with the "Age" or "X-Rack-Cache" header? Or any ideas how I can fix it?
Thanks in advance!
Try faking the user-agent, removing cookies, and dropping sessions.
import random
import urllib2

fake_user_agent = ['chrome', 'firefox', 'safari']
request = urllib2.Request(url)  # url is the JSON endpoint you are polling
# Pick one of the fake agent strings at random for this request.
request.add_header('User-Agent', random.choice(fake_user_agent))
content = urllib2.build_opener().open(request)
If all of that doesn't work, then try using Tor to change your IP per request.
If nothing works, then you can't bypass it, because you are most definitely connecting through a transparent proxy.
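Another thing worth trying is a cache-busting query string: appending a unique throwaway parameter makes each URL distinct, so intermediate caches such as Varnish can't serve a stale copy. A rough sketch (the URL is a placeholder based on the question's log):
import time
import urllib2

# Cache-buster: a millisecond timestamp makes every request URL unique.
url = 'http://example.com/shop/169464.json'  # placeholder URL
request = urllib2.Request('%s?_=%d' % (url, int(time.time() * 1000)))
content = urllib2.build_opener().open(request).read()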

not getting all cookie info using python requests module

I'm learning how to log in to an example website using the Python requests module. This video tutorial got me started. Of all the cookies that I see in Google Chrome > Inspect Element > Network tab, I'm not able to retrieve all of them using the following code:
import requests

with requests.Session() as s:
    url = 'http://www.noobmovies.com/accounts/login/?next=/'
    s.get(url)
    allcookies = s.cookies.get_dict()
    print allcookies
Using this I only get csrftoken like below:
{'csrftoken': 'ePE8zGxV4yHJ5j1NoGbXnhLK1FQ4jwqO'}
But in Google Chrome, I see all these other cookies apart from csrftoken (sessionid, _gat, _ga, etc.).
I even tried the following code from here, but the result was the same:
from urllib2 import Request, build_opener, HTTPCookieProcessor, HTTPHandler
import cookielib
#Create a CookieJar object to hold the cookies
cj = cookielib.CookieJar()
#Create an opener to open pages using the http protocol and to process cookies.
opener = build_opener(HTTPCookieProcessor(cj), HTTPHandler())
#create a request object to be used to get the page.
req = Request("http://www.noobmovies.com/accounts/login/?next=/")
f = opener.open(req)
#see the first few lines of the page
html = f.read()
print html[:50]
#Check out the cookies
print "the cookies are: "
for cookie in cj:
    print cookie
Output:
<!DOCTYPE html>
<html xmlns="http://www.w3.org
the cookies are:
<Cookie csrftoken=ePE8zGxV4yHJ5j1NoGbXnhLK1FQ4jwqO for www.noobmovies.com/>
So, how can I get all the cookies? Thanks.
The cookies being set are from other pages/resources, probably loaded by JavaScript code. You can check this by making the request to the page only (without running the JS code), using tools such as wget, curl, or httpie.
The only cookie this server sets is csrftoken, as you can see in:
$ wget --server-response 'http://www.noobmovies.com/accounts/login/?next=/'
--2016-02-01 22:51:55-- http://www.noobmovies.com/accounts/login/?next=/
Resolving www.noobmovies.com (www.noobmovies.com)... 69.164.217.90
Connecting to www.noobmovies.com (www.noobmovies.com)|69.164.217.90|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Tue, 02 Feb 2016 00:51:58 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Expires: Tue, 02 Feb 2016 00:51:58 GMT
Vary: Cookie,Accept-Encoding
Cache-Control: max-age=0
Set-Cookie: csrftoken=XJ07sWhMpT1hqv4K96lXkyDWAYIFt1W5; expires=Tue, 31-Jan-2017 00:51:58 GMT; Max-Age=31449600; Path=/
Last-Modified: Tue, 02 Feb 2016 00:51:58 GMT
Length: unspecified [text/html]
Saving to: ‘index.html?next=%2F’
index.html?next=%2F [ <=> ] 10,83K 2,93KB/s in 3,7s
2016-02-01 22:52:03 (2,93 KB/s) - ‘index.html?next=%2F’ saved [11085]
Note the Set-Cookie line.
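You can confirm the same thing from Python (a sketch using the question's URL): only cookies sent in the server's Set-Cookie header end up in the jar, so anything extra visible in Chrome (_ga, _gat, sessionid after logging in) must be set later by JavaScript or by subsequent requests.
import requests

r = requests.get('http://www.noobmovies.com/accounts/login/?next=/')
print(r.headers.get('Set-Cookie'))  # raw header from the server
print(r.cookies.get_dict())         # parsed jar: only csrftoken here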

Python Scripting Question

I have a question about writing a small tool that would provide the headers for any website. I am new to Python, but I wanted to know if there is anything other than encoding that I would have to account for in my code when developing the tool. I have a rough draft of my code shown below. Any pointers from the Python coders?
#!/usr/bin/python
import sys, urllib

if len(sys.argv) == 2:
    website = urllib.urlopen(sys.argv[1])
    if website.code != 200:
        print "Something went wrong here"
        print website.code
        exit(0)
    print 'Printing the headers'
    print '-----------------------------------------'
    for header, value in website.headers.items():
        print header + ' : ' + value
Seems a fairly straightforward script (though this question seems more of a fit for Stack Overflow). A couple of comments: first, curl -I is a useful command-line tool to compare against. Second, even when you don't get a 200 status, there are still often useful content or headers you may want to display. E.g.,
$ curl -I http://security.stackexchange.com/asdf
HTTP/1.1 404 Not Found
Cache-Control: private
Content-Length: 24068
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=678b5b9c-0130-4398-9834-673475961dc6; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:24:00 GMT
Also note urllib follows redirects automatically. E.g., with curl you'll see:
$ curl -I http://www.security.stackexchange.com
HTTP/1.1 301 Moved Permanently
Content-Length: 157
Content-Type: text/html; charset=UTF-8
Location: http://security.stackexchange.com/
Date: Fri, 25 Apr 2014 07:26:52 GMT
while your tool will just give:
$ python user3567119.py http://www.security.stackexchange.com
Printing the headers
-----------------------------------------
content-length : 68639
set-cookie : prov=9bf4f3d4-e3ae-4161-8e34-9aaa83f0aa4b; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
expires : Fri, 25 Apr 2014 07:29:32 GMT
vary : *
last-modified : Fri, 25 Apr 2014 07:28:32 GMT
connection : close
cache-control : public, no-cache="Set-Cookie", max-age=60
date : Fri, 25 Apr 2014 07:28:31 GMT
x-frame-options : SAMEORIGIN
content-type : text/html; charset=utf-8
Third, if you continue playing around with HTTP requests in Python, I highly recommend using requests. With requests, you'll be able to see the 301 if you do:
In [1]: import requests
In [2]: r=requests.get('http://www.security.stackexchange.com')
In [3]: r
Out[3]: <Response [200]>
In [4]: r.history
Out[4]: (<Response [301]>,)
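If you want to inspect the 301 itself rather than the final page, you can also tell requests not to follow the redirect; a small sketch:
import requests

# With allow_redirects=False, the response is the 301 itself,
# not the page it points to.
r = requests.get('http://www.security.stackexchange.com', allow_redirects=False)
print(r.status_code)           # 301
print(r.headers['Location'])   # http://security.stackexchange.com/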
It's also worth trying out some HTTP requests in just plain old telnet. E.g., run telnet security.stackexchange.com 80, then quickly type:
GET / HTTP/1.1
Host: security.stackexchange.com
followed by a blank line. Then you'll see the actual HTTP response on the wire (instead of recreating it after urllib has processed the HTTP response):
HTTP/1.1 200 OK
Cache-Control: public, no-cache="Set-Cookie", max-age=60
Content-Type: text/html; charset=utf-8
Expires: Fri, 25 Apr 2014 07:38:37 GMT
Last-Modified: Fri, 25 Apr 2014 07:37:37 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Set-Cookie: prov=a75de1f2-678b-4a9d-bbfd-39e933e60237; domain=.stackexchange.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Date: Fri, 25 Apr 2014 07:37:36 GMT
Content-Length: 68849
<!DOCTYPE html>

Python Scraping Web with Session Cookie

Hi, I am trying to scrape some data from this URL:
http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1
As you may have noticed, if cookies and session data are not yet set, you will be redirected to its base URL (http://www.21cineplex.com/).
I tried to do it like this:
import re
import urllib2
from cookielib import CookieJar

def main():
    try:
        cj = CookieJar()
        baseurl = "http://www.21cineplex.com"
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.open(baseurl)
        urllib2.install_opener(opener)
        movieSource = urllib2.urlopen('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1').read()
        splitSource = re.findall(r'<ul class="w462">(.*?)</ul>', movieSource)
        print splitSource
    except Exception, e:
        print str(e)
        print "Error occurred in main block"
However, I ended up failing to scrape that particular URL.
A quick inspection reveals that the website sets a session ID (PHPSESSID) and copies it to the client's cookie.
The question is: how do I work around this?
PS: I've tried to install request (via pip); however, it gives me a 404:
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Getting page https://pypi.python.org/simple/
URLs to search for versions for request:
* https://pypi.python.org/simple/request/
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Could not find any downloads that satisfy the requirement request
Cleaning up...
Thanks to @Chainik I got it to work now. I ended up modifying my code like this:
import urllib2
from cookielib import CookieJar

cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
baseurl = "http://www.21cineplex.com/"
regex = '<ul class="w462">(.*?)</ul>'

# Visit the base URL first so the server stores PHPSESSID in the cookie jar.
opener.open(baseurl)
urllib2.install_opener(opener)

# Send a Referer header so the site serves the page instead of redirecting.
request = urllib2.Request('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1')
request.add_header('Referer', baseurl)
requestData = urllib2.urlopen(request)
htmlText = requestData.read()
Once the HTML text is retrieved, it's all about parsing its content.
Cheers
Try setting a Referer URL; see below.
Without referer URL set (302 redirect):
$ curl -I "http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Thu, 19 Sep 2013 09:19:19 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=5effe043db4fd83b2c5927818cb1a7ca; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:19 GMT; path=/
Location: http://www.21cineplex.com/
With referer URL set (HTTP/200):
$ curl -I -e "http://www.21cineplex.com/"
"http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 19 Sep 2013 09:19:24 GMT
Content-Type: text/html
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=a7abd6592c87e0c1a8fab4f855baa0a4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:24 GMT; path=/
To set a Referer URL using urllib, see this post.
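If you prefer the requests library, here is a rough equivalent sketch (same URLs as above; a Session keeps PHPSESSID across calls):
import requests

s = requests.Session()               # the cookie jar persists across calls
s.get('http://www.21cineplex.com/')  # picks up PHPSESSID (and kota)
r = s.get('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1',
          headers={'Referer': 'http://www.21cineplex.com/'})
print(r.status_code)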
-- ab1
