I'm trying to convert my Python script from issuing a curl command via os.system() to using requests. I thought I'd use pycurl, but this question convinced me otherwise. The problem is that the server returns an error, which I can see using r.text (from this answer), but I need more information. Is there a better way to debug what's happening?
For what it's worth, I think the issue revolves around converting my --data flag from curl/pycurl to requests. I've created a dictionary of the params I was passing to --data before. My guess is that one of those isn't valid, but how can I get more info to know for sure?
example:
headers2 = {"Accept":"*/*", \
"Content-Type":"application/x-www-form-urlencoded", \
"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36", \
"Origin":"https://somedomain.com", \
"X-Requested-With":"XMLHttpRequest", \
"Connection":"keep-alive", \
"Accept-Language":"en-US,en;q=0.8", \
"Referer":"https://somedomain.com/release_cr_new.html?releaseid=%s&v=2&m=a&prev_release_id=%s" % (current_release_id, previous_release_id), \
"Host":"somedomain.com", \
"Accept-Encoding":"gzip,deflate,sdch", \
"Cookie":'cookie_val'}
for bug_id in ids:
    print bug_id
    data = {'dump_json':'1','releaseid':current_release_id, 'v':'2','m':'a','prev_release_id': previous_release_id,'bug_ids': bug_id, 'set_cols':'sqa_status&sqa_updates%5B0%5D%5Bbugid%5D=' + bug_id + '&sqa_updates%5B0%5D%5Bsqa_status%5D=6'}
    print 'current_release_id' , data['releaseid']
    print 'previous_release_id', data['prev_release_id']
    r = requests.post(post_url, data=json.dumps(data), headers=headers2)
    print r.text
The output I'm getting is a pretty generic html message that I've seen before when I've queried the server in the wrong way. So I know I'm reaching the right server at least.
I'm not really expecting any output. This should just post to the server and update a field in the DB.
Anatomy of an http response
Example (loading this page)
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Fri, 27 Sep 2013 19:22:41 GMT
Last-Modified: Fri, 27 Sep 2013 19:21:41 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 27 Sep 2013 19:21:41 GMT
Content-Length: 12706
<!DOCTYPE html>
<html>
... truncated rest of body ...
The first line is the status line. It consists of the HTTP version, the status code, and the status text (reason phrase).
Headers are key/value pairs. They are terminated by an empty line, which marks the end of the headers and the start of the payload (body).
The body consumes the rest of the message.
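The three parts described above can be separated by hand from a raw response. The following is a minimal sketch using only the standard library; split_http_response is a hypothetical helper name, and the raw text is a shortened copy of the example response, not a real capture.

```python
def split_http_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The first empty line separates the header block from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 12706\r\n"
    "\r\n"
    "<!DOCTYPE html>\r\n<html>"
)

status_line, headers, body = split_http_response(raw)
version, code, reason = status_line.split(" ", 2)
print(version, code, reason)
print(headers["content-type"])
print(body[:15])
```

In practice requests does this parsing for you; the sketch is only meant to show where each part of the message begins and ends.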
The following explains how to extract the 3 parts:
Status Line
Use the following to get the status line sent back from the server
>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
File "requests/models.py", line 832, in raise_for_status
raise http_error
requests.exceptions.HTTPError: 404 Client Error
Headers:
r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
# response headers:
r.headers
# request headers:
r.request.headers
Body
Use r.text.
Post Request Encoding
The 'content-type' you send to the server in the request should match the content-type you're actually sending. In your case, you are sending json but telling the server you're sending form data (which is the default if you do not specify).
From the headers you show above:
"Content-Type":"application/x-www-form-urlencoded",
But your requests.post call sets data=json.dumps(data), which is JSON. The headers should say:
"Content-type": "application/json",
The value returned from the request object contains the request information under .request.
Example:
r = requests.request("POST", url, ...)
print("Request headers:", r.request.headers)
print("Request body:", r.request.body)
print("Response status code:", r.status_code)
print("Response text:", r.text.encode('utf8'))
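To see why the Content-Type mismatch matters, it can help to compare the two candidate request bodies offline. This is a sketch using only the standard library: urlencode approximates what requests sends when data= is a dictionary, and parse_qs stands in for a server that trusts the form Content-Type. The field values are a made-up subset of the ones in the question.

```python
import json
from urllib.parse import parse_qs, urlencode

# Illustrative subset of the fields from the question (values are made up).
data = {"dump_json": "1", "v": "2", "m": "a"}

form_body = urlencode(data)   # matches application/x-www-form-urlencoded
json_body = json.dumps(data)  # matches application/json

print(form_body)  # dump_json=1&v=2&m=a
print(json_body)  # {"dump_json": "1", "v": "2", "m": "a"}

# A server that believes the body is form data parses key=value pairs.
parsed_form = parse_qs(form_body)
parsed_json_as_form = parse_qs(json_body)

print(parsed_form)          # {'dump_json': ['1'], 'v': ['2'], 'm': ['a']}
print(parsed_json_as_form)  # {} (no key=value pairs found in the JSON text)
```

A JSON body labelled as form data therefore looks like an empty submission to a form parser, which is one plausible reason for a generic error page.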
Related
I'm trying to mimic, in Python, a curl request that already works. Note that:
It is an HTTPS request
I'm ignoring SSL certificate verification
My server is Django
The curl command I used is
curl -k --dump-header - -H "Content-Type: application/json" -X POST --data '{"environment_name": "foo"}' https://localhost/api/v1/environment/
and the response from the server is successful
HTTP/1.1 201 CREATED
Date: Tue, 17 Jun 2014 00:59:59 GMT
Server: Server
Vary: Accept-Language,Cookie,User-Agent
Content-Language: en-us
Location: https://localhost/api/v1/environment/None/
Status: 201 CREATED
Content-Length: 0
Cneonction: close
Content-Type: text/html; charset=utf-8
However when I try to do a post request in python with 'requests' my script is
import json
data = {'enviornment_name' : 'foo'}
headers = {'Content-type' : 'application/json'}
response = requests.post("https://localhost/api/v1/environment", headers=headers, data=data, verify=False)
When running the script I get back a huge stack trace but the part in red is
E DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect data check',))
I'm not sure why I can't get my script to work via Python.
Your server claims to return gzip-compressed content, but the body is not valid gzip. The response carries Content-Encoding: gzip in its headers. This could be because requests sends Accept-Encoding: gzip, compress by default. Try setting "Accept-Encoding" to None in your headers dictionary, in addition to @Fabricator's suggestion in the comments.
Your headers dictionary will look like:
headers = {'Content-Type': 'application/json', 'Accept-Encoding': None}
And your call to requests will look like
requests.post(url, headers=headers, data=json.dumps(data), verify=False)
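The failure mode described above (a body labelled gzip that is not valid gzip) can be reproduced without a server. This is a rough sketch with the standard gzip module; it triggers a different low-level message than the "incorrect data check" in the question, but the cause is the same kind of mismatch between the Content-Encoding label and the actual bytes.

```python
import gzip

payload = b'{"environment_name": "foo"}'

# A body that really is gzip round-trips cleanly.
compressed = gzip.compress(payload)
assert gzip.decompress(compressed) == payload

# A plain body that is merely labelled gzip cannot be decoded.
try:
    gzip.decompress(payload)
    decode_failed = False
except OSError as exc:  # gzip.BadGzipFile (Python 3.8+) subclasses OSError
    decode_failed = True
    print("decode failed:", exc)
```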
@Fabricator I do need verify=False. However, I noticed the one thing in my code that was an issue for the server I was using: I needed the trailing '/' at the end of the URI. I also needed json.dumps(data), not json.dump(data), in case others are looking. Thanks for the help.
I am using CakePHP 2.4.5. I want to send an HTTP POST with URL parameters. I am using the Python 2.7 requests module to send the HTTP POST. Please assume the payload is structured correctly, as I have tested that part.
URL_post = "http://127.0.0.1/webroot/TestFunc?identity_number=S111A/post"
r = requests.post(URL_post, payload)
On the CakePHP side, the controller looks something like this:
public function TestFunc($id=null)
{
    $identity_number = $this->request->query['identity_number'];
    $this->request->data['Model']['associated_id']=$identity_number;
    $this->Model->saveAll($this->request->data, array('deep' => true));
}
I have tested that the query is not received correctly. However, if I am not using HTTP POST and just throwing in a normal URL, the query can be received correctly.
What have I done wrong?
The query part of the url is sent correctly:
import requests
requests.post('http://localhost/webroot/TestFunc?identity_number=S111A/post',
{'Model': 'data'})
The Request
POST /webroot/TestFunc?identity_number=S111A/post HTTP/1.1
Host: localhost
User-Agent: python-requests/2.2.1 CPython/3.4 Linux/3.2
Accept: */*
Accept-Encoding: gzip, deflate, compress
Content-Type: application/x-www-form-urlencoded
Content-Length: 10
Model=data
You could also make the requests using params:
requests.post('http://localhost/webroot/TestFunc',
data={'Model': 'data'},
params={'identity_number': 'S111A/post'})
The only difference is that S111A/post is sent as S111A%2Fpost (the same url in the end).
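The claim that both spellings end up as the same query can be checked offline. The sketch below uses urllib.parse from the standard library as a stand-in for the encoding that requests applies to params; the URLs are the ones from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "http://localhost/webroot/TestFunc"

# Query written by hand, with the slash left as-is.
inline_url = base + "?identity_number=S111A/post"
# Query built from a dict, the way params= does it: "/" becomes "%2F".
encoded_url = base + "?" + urlencode({"identity_number": "S111A/post"})

print(encoded_url)

# Once the server decodes the query string, both forms are identical.
inline_query = parse_qs(urlsplit(inline_url).query)
encoded_query = parse_qs(urlsplit(encoded_url).query)
print(inline_query == encoded_query)  # True
```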
Look at http://docs.python-requests.org/en/latest/user/quickstart/#passing-parameters-in-urls.
payload = {"identity_number": "S111A/post"}
URL_post = "http://127.0.0.1/webroot/TestFunc"
req = requests.post(URL_post, params=payload)
print(req.status_code)
---SOLVED---
It turned out the request body had literal r"\n" (repr: "\\n") characters in it, and since I simply copy-pasted the body as a Python string, Python interpreted them as newline characters rather than escaped newline characters.
The reason this causes a Bad Request is as follows: the body was JSON, and in JSON, newline characters inside strings must be escaped. So when the server parses the JSON object from the raw text, an error is thrown, causing the Bad Request.
I realised this because the Content-Length header was different in the two cases (\n is one char while \\n is two chars), although perhaps the Content-Length doesn't actually matter.
It is also noteworthy that when a lower Content-Length is sent, Bad Request is returned as well. I believe this is because the JSON body gets truncated, and the server never receives an important character (e.g. the closing brace).
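The difference between the two string forms can be reproduced locally with the json module. This is a small sketch, using a shortened version of the body from the question, that shows both the length difference and the parse failure.

```python
import json

# Pasted as a normal string literal: Python turns \n into a real newline.
pasted = '{"content": "ntestxx\n \n"}'
# Raw string literal: the backslash and the n stay as two characters.
raw = r'{"content": "ntestxx\n \n"}'

# The raw form is longer by one character for every \n in the body.
print(len(pasted), len(raw))

try:
    json.loads(pasted)
    pasted_ok = True
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    pasted_ok = False
    print("rejected:", exc)

# The raw form is valid JSON; decoding turns \n back into real newlines.
decoded = json.loads(raw)
print(repr(decoded["content"]))
```

Building the body with json.dumps from a Python dictionary, rather than pasting it as a string literal, avoids the problem entirely.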
--- Problem:---
Summary:
I am trying to use Python to simulate a POST request to bitbucket.org performed within my Firefox web browser. Here is what I did:
Tracked the POST request using Firebug
Copied the POST request headers
Copied the POST request body (in application/json format)
Code:
Here is the code I use to POST my request, but it's a bit long and not very relevant. My Content-Type is application/json, and my POST body is a JSON-encoded string.
dataString = '{"branch":"master","files":[{"path":"readme.txt","content":"ntestxx\n \n"}],"message":"readme.txt edited online with Bitbucket","parents":["465305dc4da32f91da057b65297cda9b72c"],"repository":{"full_name":"minesite/ica-i18n"},"timestamp":"2014-03-20T23:49:29.759Z","transient":false}'
headers = {'X-CSRFToken': '6TqWjCl698U99Iu6ZYGBAloCxZ', 'Content-Length': '2190', 'Accept-Language': 'en,en-us;q=0.7,zh;q=0.3', 'X-NewRelic-ID': 'VwMGVVZSGwIIUFBQDwU=, VwMGVVZSGwIIUFBQDwU=', 'Cookie': 'csrftoken=6TqWjCl698U99Iu6ZYGBAloCxZ; __utma=254090395.1171276563.1394767875.1394776803.1395358874.3; __utmc=254090395; __utmz=254090395.1394776803.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); bb_session=gpqergylgoa7icpwosqsbpxig0; __utmv=254090395.|1=isBBUser=true=1; recently-viewed-repos_1701252=3802872%2C108928; __utmb=254090395.21.9.1395359363952', 'Connection': 'keep-alive', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0', 'Host': 'bitbucket.org', 'X-Requested-With': 'XMLHttpRequest', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Referer': 'https://bitbucket.org/xxxxxx/xxxxxxx/src/465305dc4da32f91da057b6529a8e4/readme.txt?at=master', 'Content-Type': 'application/json; charset=UTF-8', 'Accept-Encoding': 'gzip, deflate'}
edit = requests.post("https://bitbucket.org/!api/internal/repositories/xxxxxxx/xxxxxxxx/oecommits/", data=dataString, headers=headers)
Results vs. expected results:
When I perform the POST request using my Firefox web browser (using Firebug's "resend request" function), I get a 409 CONFLICT response (Which is the desired response! I am simulating a request to an online editor, so that should be the correct response to a re-sent edit).
However, when I try to simulate the request by copying the request header and the request body, I get a 400 BAD REQUEST response, and the response contains no other information, so I don't even know what my problem is.
Regardless of how many times I send the POST in the web-browser (despite an incorrect timestamp), it achieves the intended outcome, but the server refuses to accept any requests I make using the python requests library.
Response using browser request:
Headers
HTTP/1.1 409 CONFLICT
Server: nginx/1.5.10
Date: Fri, 21 Mar 2014 00:20:55 GMT
Content-Type: text/plain
Content-Length: 45
Connection: keep-alive
x-served-by: app16
X-Render-Time: 0.558492183685
Content-Language: en
X-Static-Version: 48695e7c3140
Vary: Authorization, Accept-Language, Cookie
X-Version: e6778a5040f7
Etag: "92f0b780984e984140de0f8ed0a3992c"
X-Frame-Options: SAMEORIGIN
X-Request-Count: 483
X-NewRelic-App-Data: PxQEVFdXCAITVVlWBgMPUkYdFGQHBDcQUQxLA1tMXV1dSn8UXwJHCwtYGAMPF1pGUw8EFhlQRxYXH1dDC0gKDEQHSgxZVBpaUgtdDVQTQFgrWFsICAZ9V1kQIg1aXF4SLFBYVw4DEUxTEF0DTF0WHgNJCU8EVApUUgUHVFFQCgQCU1FXGwMGX1QdFAEBUVVbA1AJVQEBB1FSA11DHQdSDhdTag==
Body
Specified change not on head of branch master
Response using python request:
Headers
content-length: 11
x-served-by: app10
x-render-time: 0.012787103653
content-language: en
content-type: text/plain
vary: Authorization, Accept-Language, Cookie
connection: keep-alive
server: nginx/1.5.10
x-version: e6778a5040f7
etag: "825644f747baab2c00e420dbbc39e4b3"
x-request-count: 321
x-newrelic-app-data: PxQEVFdXCAITVVlWBgMPUkYdFGQHBDcQUQxLA1tMXV1dSn8UXwJHCwtYGAMPF1pGUw8EFhlQRxYXH1dDC0gRB0MNTRBbXQ5gVhZWFEMCVkBIBhtRSFMJAARQUlsDBw9VXAIBC1tWVU4CUwtUFBpVAwFcWgdTVQIAXQBRWQQAGh9WBQ0RUmw=
date: Fri, 21 Mar 2014 00:51:01 GMT
x-frame-options: SAMEORIGIN
x-static-version: 48695e7c3140
Body
Bad Request
Some of my ideas:
I am thinking that perhaps there is another component to a HTTP POST request that I need to simulate? Perhaps when Firefox sends a POST request, there is some header or wrapper added that makes the request valid?
Or is there something more to a POST request than just a method, headers, and body?
Maybe it's something to do with the fact that it's HTTPS instead of HTTP?
Update:
I have tried sending the "sent cookies" in with the request, to little success.
"Or is there something more to a POST request than just a method, headers, and body?"
No. The important part is the request headers. They should be exactly the same in both cases.
Because Firebug can only track network requests made inside Firefox, you'll need an external network analyzer such as Wireshark to track the requests coming from your Python script.
Of course, you need to run it on the machine where the script runs.
Another solution would be to run your request against a local web server and log the request information there.
Then you'll be able to compare the request made in the browser with the one from your script.
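The local web server approach could look roughly like the sketch below. It uses only the standard library; the handler name, the /echo path, and the JSON body are illustrative assumptions. The client here is urllib.request so the example is self-contained, but the same server can be pointed at from a requests script or a browser.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one dict per request the server receives


class LoggingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many body bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        captured.append({
            "request_line": self.requestline,
            "headers": dict(self.headers.items()),
            "body": body,
        })
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging on stderr


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
port = server.server_address[1]
thread = threading.Thread(target=server.handle_request)  # serve one request
thread.start()

req = urllib.request.Request(
    "http://127.0.0.1:%d/echo" % port,
    data=b'{"branch": "master"}',
    headers={"Content-Type": "application/json"},
    method="POST",
)
# An empty ProxyHandler keeps any system proxy out of the local test.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(req) as resp:
    status = resp.status

thread.join()
server.server_close()

print(captured[0]["request_line"])
for name, value in captured[0]["headers"].items():
    print("%s: %s" % (name, value))
print(captured[0]["body"])
```

Sending the browser's request and the script's request to the same logger makes any difference in headers or body length directly comparable.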
I am using 'requests' module in Python to query a RESTful API endpoint. Sometimes, the endpoint returns an HTTP Error 500. I realize I can get the status code using requests.status_code but when I get error 500, I'd like to see the HTTP "response text" (I'm not sure what it's called, examples below). So far I've been able to get some of the headers using response.headers. However, the info I'm looking for is still not there.
Using "curl -vvv", I can see the HTTP response that I'm after (some output omitted for clarity):
< HTTP/1.1 200 OK <---------------------this is what I'm after)
* Server nginx/1.4.1 is not blacklisted
< Server: nginx/1.4.1
< Date: Wed, 05 Feb 2014 16:13:25 GMT
< Content-Type: application/octet-stream
< Connection: close
< Set-Cookie: webapp.session.id="mYzk5NTc0MDZkYjcxZjU4NmM=|1391616805|f83c47a363194c1ae18e"; expires=Fri, 07 Mar 2014 16:13:25 GMT; Path=/
< Content-Disposition: attachment; filename = "download_2014161325.pdf"
< Cache-Control: public
Again, that's from curl. Now, when I use Python's requests module and ask for headers, this is all I get:
CaseInsensitiveDict(
{
'date': 'Tue, 04 Feb 2014 21:56:45 GMT',
'set-cookie': 'webapp.session.id="xODgzNThlODkzZ2U0ZTg=|1391551005|a11ca2ad11195351f636fef"; expires=Thu, 06 Mar 2014 21:56:45 GMT; Path=/',
'connection': 'close',
'content-type': 'application/json',
'server': 'nginx/1.4.1'
}
)
Notice the curl response includes "HTTP/1.1 200 OK" but the requests.headers does not. Nearly everything else in the response headers are there. The requests.status_code gives me the 200. In this example, all I'm after is the "OK". In other scenarios, our nginx server returns more detailed messages, like "HTTP/1.1 500 search unavailable" or "HTTP/1.1 500 bad parameters", etc. I'd like to get this text. Is there a way or could I hack something with Popen and curl? Requests.content and requests.text don't help.
You are looking for the Response.reason attribute:
>>> import requests
>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
>>> r.reason
'OK'
>>> r = requests.get('http://httpbin.org/status/500')
>>> r.reason
'INTERNAL SERVER ERROR'
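The reason phrase is whatever text the server puts after the status code, so custom messages such as "search unavailable" come through as well. The sketch below demonstrates this with a throwaway local server and http.client from the standard library, so it runs without network access; the handler name and the phrase are illustrative. As far as I know, r.reason in requests is populated from this same field of the status line.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CustomReasonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The second argument replaces the default "Internal Server Error".
        self.send_response(500, "search unavailable")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), CustomReasonHandler)
thread = threading.Thread(target=server.handle_request)  # serve one request
thread.start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
resp = conn.getresponse()
status, reason = resp.status, resp.reason
resp.read()
conn.close()

thread.join()
server.server_close()

print(status, reason)  # 500 search unavailable
```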
That is an excellent answer, but keep in mind that for certain applications you also need to retrieve the response headers. This is often the case with paginated REST APIs. They can be retrieved with:
r.headers
And iterate the keys with:
[x for x in r.headers]
I am using the Python urllib2 library to open a URL, and I want to get the complete header info of the request. When I use response.info() I only get this:
Date: Mon, 15 Aug 2011 12:00:42 GMT
Server: Apache/2.2.0 (Unix)
Last-Modified: Tue, 01 May 2001 18:40:33 GMT
ETag: "13ef600-141-897e4a40"
Accept-Ranges: bytes
Content-Length: 321
Connection: close
Content-Type: text/html
I am expecting the complete info as given by live_http_headers (add-on for firefox), e.g:
http://www.yellowpages.com.mt/Malta-Web/127151.aspx
GET /Malta-Web/127151.aspx HTTP/1.1
Host: www.yellowpages.com.mt
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=156587571.1883941323.1313405289.1313405289.1313405289.1; __utmz=156587571.1313405289.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 141
Date: Mon, 15 Aug 2011 12:17:25 GMT
Location: http://www.trucks.com.mt
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET, UrlRewriter.NET 2.0.0
X-AspNet-Version: 2.0.50727
Set-Cookie: ASP.NET_SessionId=zhnqh5554omyti55dxbvmf55; path=/; HttpOnly
Cache-Control: private
My request function is:
import cookielib
import urllib
import urllib2

def dorequest(url, post=None, headers=None):
    # Build an opener that keeps cookies across requests.
    cOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    urllib2.install_opener(cOpener)
    if post:
        post = urllib.urlencode(post)
    req = urllib2.Request(url, post, headers or {})
    response = cOpener.open(req)
    print response.info()  # this does not give complete header info, how can I get complete header info?
    return response.read()

url = 'http://www.yellowpages.com.mt/Malta-Web/127151.aspx'
html = dorequest(url)
Is it possible to achieve the desired header info details by using urllib2? I don't want to switch to httplib.
Those are all of the headers the server is sending when you do the request with urllib2.
Firefox is showing you the headers it's sending to the server as well.
When the server gets those headers from Firefox, some of them may trigger it to send back additional headers, so you end up with more response headers as well.
Duplicate the exact headers Firefox sends, and you'll get back an identical response.
Edit: That location header is sent by the page that does the redirect, not the page you're redirected to. Just use response.url to get the location of the page you've been sent to.
That first URL uses a 302 redirect. If you don't want to follow the redirect, but see the headers from the first page instead, use a URLOpener instead of a FancyURLOpener, which automatically follows redirects.
I see that the server returns HTTP/1.1 302 Found, which is an HTTP redirect.
urllib automatically follows redirects, so the headers returned by urllib are the headers from http://www.trucks.com.mt, not from http://www.yellowpages.com.mt/Malta-Web/127151.aspx
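One way to see the headers of the redirecting page itself is to stop the opener from following the redirect. The question uses Python 2's urllib2; the sketch below is written for Python 3's urllib.request, which has the same handler design, and runs against a throwaway local server. The class names and the target URL are illustrative assumptions.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "http://example.invalid/target")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect, so urllib raises
        # HTTPError for the 3xx response instead of following it.
        return None


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
thread = threading.Thread(target=server.handle_request)  # serve one request
thread.start()

# The empty ProxyHandler keeps any system proxy out of the local test.
opener = urllib.request.build_opener(NoRedirect, urllib.request.ProxyHandler({}))
status, location = None, None
try:
    opener.open("http://127.0.0.1:%d/start" % server.server_address[1])
except urllib.error.HTTPError as err:
    # The error object carries the headers of the redirecting response.
    status = err.code
    location = err.headers["Location"]

thread.join()
server.server_close()

print(status, location)
```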