---SOLVED---
It turned out the request body had literal r"\n" (repr: "\\n") characters in it, and since I simply copy pasted the body as a Python string, Python thought I was giving it newline characters rather than escaped newline characters.
The reason this causes a Bad Request is as follows: So the body was JSON, and in JSON you have to escape all your newline characters by definition. So when the server loads the JSON object from the raw text, an error is thrown causing Bad Request
I realised this because the Content-Length header was different in both cases (\n is one char while \\\n is two chars, although perhaps the Content-Length doesn't actually matter.
Also it is noteworthy that when a lower Content-Length is sent, Bad Request is also returned. I believe this is because the JSON body gets truncated, and the server doesn't accept the important char (e.g. closing brace or something)
--- Problem:---
Summary:
I am trying to use Python to simulate a POST request to bitbucket.org performed within my Firefox web browser. Here is what I did:
Tracked the POST request using Firebug
Copied the POST request headers
Copied the POST request body (in application/json format)
Code:
Here is the code I use to POST my request, but it's a bit long and not very relevant. My Content-Type is application/json, and my POST body is a JSON-encoded string.
dataString = '{"branch":"master","files":[{"path":"readme.txt","content":"ntestxx\n \n"}],"message":"readme.txt edited online with Bitbucket","parents":["465305dc4da32f91da057b65297cda9b72c"],"repository":{"full_name":"minesite/ica-i18n"},"timestamp":"2014-03-20T23:49:29.759Z","transient":false}'
headers = {'X-CSRFToken': '6TqWjCl698U99Iu6ZYGBAloCxZ', 'Content-Length': '2190', 'Accept-Language': 'en,en-us;q=0.7,zh;q=0.3', 'X-NewRelic-ID': 'VwMGVVZSGwIIUFBQDwU=, VwMGVVZSGwIIUFBQDwU=', 'Cookie': 'csrftoken=6TqWjCl698U99Iu6ZYGBAloCxZ; __utma=254090395.1171276563.1394767875.1394776803.1395358874.3; __utmc=254090395; __utmz=254090395.1394776803.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); bb_session=gpqergylgoa7icpwosqsbpxig0; __utmv=254090395.|1=isBBUser=true=1; recently-viewed-repos_1701252=3802872%2C108928; __utmb=254090395.21.9.1395359363952', 'Connection': 'keep-alive', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0', 'Host': 'bitbucket.org', 'X-Requested-With': 'XMLHttpRequest', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Referer': 'https://bitbucket.org/xxxxxx/xxxxxxx/src/465305dc4da32f91da057b6529a8e4/readme.txt?at=master', 'Content-Type': 'application/json; charset=UTF-8', 'Accept-Encoding': 'gzip, deflate'}
edit = requests.post("https://bitbucket.org/!api/internal/repositories/xxxxxxx/xxxxxxxx/oecommits/", data=dataString, headers=headers)
Results vs. expected results:
When I perform the POST request using my Firefox web browser (using Firebug's "resend request" function), I get a 409 CONFLICT response (Which is the desired response! I am simulating a request to an online editor, so that should be the correct response to a re-sent edit).
However, when I try to simulate the request by copying the request header and the request body, I get a 400 BAD REQUEST response, and the response contains no other information, so I don't even know what my problem is.
Regardless of how many times I send the POST in the web-browser (despite an incorrect timestamp), it achieves the intended outcome, but the server refuses to accept any requests I make using the python requests library.
Response using browser request:
Headers
HTTP/1.1 409 CONFLICT
Server: nginx/1.5.10
Date: Fri, 21 Mar 2014 00:20:55 GMT
Content-Type: text/plain
Content-Length: 45
Connection: keep-alive
x-served-by: app16
X-Render-Time: 0.558492183685
Content-Language: en
X-Static-Version: 48695e7c3140
Vary: Authorization, Accept-Language, Cookie
X-Version: e6778a5040f7
Etag: "92f0b780984e984140de0f8ed0a3992c"
X-Frame-Options: SAMEORIGIN
X-Request-Count: 483
X-NewRelic-App-Data: PxQEVFdXCAITVVlWBgMPUkYdFGQHBDcQUQxLA1tMXV1dSn8UXwJHCwtYGAMPF1pGUw8EFhlQRxYXH1dDC0gKDEQHSgxZVBpaUgtdDVQTQFgrWFsICAZ9V1kQIg1aXF4SLFBYVw4DEUxTEF0DTF0WHgNJCU8EVApUUgUHVFFQCgQCU1FXGwMGX1QdFAEBUVVbA1AJVQEBB1FSA11DHQdSDhdTag==
Body
Specified change not on head of branch master
Response using python request:
Headers
content-length: 11
x-served-by: app10
x-render-time: 0.012787103653
content-language: en
content-type: text/plain
vary: Authorization, Accept-Language, Cookie
connection: keep-alive
server: nginx/1.5.10
x-version: e6778a5040f7
etag: "825644f747baab2c00e420dbbc39e4b3"
x-request-count: 321
x-newrelic-app-data: PxQEVFdXCAITVVlWBgMPUkYdFGQHBDcQUQxLA1tMXV1dSn8UXwJHCwtYGAMPF1pGUw8EFhlQRxYXH1dDC0gRB0MNTRBbXQ5gVhZWFEMCVkBIBhtRSFMJAARQUlsDBw9VXAIBC1tWVU4CUwtUFBpVAwFcWgdTVQIAXQBRWQQAGh9WBQ0RUmw=
date: Fri, 21 Mar 2014 00:51:01 GMT
x-frame-options: SAMEORIGIN
x-static-version: 48695e7c3140
Body
Bad Request
Some of my ideas:
I am thinking that perhaps there is another component to a HTTP POST request that I need to simulate? Perhaps when Firefox sends a POST request, there is some header or wrapper added that makes the request valid?
Or is there something more to a POST request than just a method, headers, and body?
Maybe it's something to do with the fact that it's HTTPS instead of HTTP?
Update:
I have tried sending the "sent cookies" in with the request, to little success.
Or is there something more to a POST request than just a method,
headers, and body?
No. The important part are the request headers. They should be exactly the same in both cases.
Because Firebug can just track the network requests inside Firefox, you'll need an external network analyzer like Wireshark to track the requests coming from your Python script.
Of course you need to run it on the server where the script lies.
Another solution would be to run your request against a local web server and log the request information there.
Then you'll be able to compare the request made in the browser with the one from your script.
Related
Is there a way to form batched requests using python requests module ? I want to send multiple homogenous http API requests in a single POST request. I am trying to use the GCP documentation https://cloud.google.com/dns/docs/reference/batch?hl=en_US&_ga=2.138235123.-2126794010.1660759555 to create multiple DNS records in a single POST request. Can anyone help with a simple example of how this could be achieved using python requests module ?
This is how the sample POST request will look like:
POST /batch/farm/v1 HTTP/1.1
Authorization: Bearer your_auth_token
Host: www.googleapis.com
Content-Type: multipart/mixed; boundary=batch_foobarbaz
Content-Length: total_content_length
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item1:12930812#barnyard.example.com>
GET /farm/v1/animals/pony
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item2:12930812#barnyard.example.com>
PUT /farm/v1/animals/sheep
Content-Type: application/json
Content-Length: part_content_length
If-Match: "etag/sheep"
{
"animalName": "sheep",
"animalAge": "5"
"peltColor": "green",
}
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item3:12930812#barnyard.example.com>
GET /farm/v1/animals
If-None-Match: "etag/animals"
--batch_foobarbaz--
Basically; the main intention here is to not overload the remote API with multiple http requests causing the rate limit throttling but instead use batched http requests so that the remote API gets only a single batched request embedded with multiple requests in the form of parts.
No. HTTP doesn't work that way. You can send multiple requests simultaneously using threads, but you can't send multiple POSTs through a single request.
According to the doc, the individual batch HTTP requests are supposed to go in the body of the request. You can try building the requests up manually. Like so:
import requests
body = """
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item1:12930812#barnyard.example.com>
GET /farm/v1/animals/pony
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item2:12930812#barnyard.example.com>
PUT /farm/v1/animals/sheep
Content-Type: application/json
Content-Length: part_content_length
If-Match: "etag/sheep"
{
"animalName": "sheep",
"animalAge": "5"
"peltColor": "green",
}
--batch_foobarbaz
Content-Type: application/http
Content-ID: <item3:12930812#barnyard.example.com>
GET /farm/v1/animals
If-None-Match: "etag/animals"
--batch_foobarbaz--
"""
response = requests.post(
"https://www.googleapis.com/batch/API/VERSION/batch/form/v1",
data=body,
headers={
'Authorization': 'Bearer your_auth_token',
'Host': 'www.googleapis.com',
'Content-Type': 'multipart/mixed; boundary=batch_foobarbaz'
}
)
Of course, you'd have to build the individual requests manually:
Content-Type: application/http
Content-ID: <item1:12930812#barnyard.example.com>
GET /farm/v1/animals/pony
Unless you can find a library that can construct HTTP requests according to the HTTP/1.1 RFC. I can't think of one off the top of my head.
Please note that I am still very green to the world of coding and have only been writing code for a week. Thus please forgive me if my question is stupid.
My current objective is to log into a website with my python script.
I need to take information out of the HTTP response headers and add it to my POST, this is my current script:
import requests
targetURL = "http://website.com/login.aspx"
headers = { "Host": "192.168.56.101",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"}
get_request = requests.get(url=targetURL,
headers=headers)
The response I get is as follows:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store
Pragma: no-cache,no-cache
Content-Length: 15748
Content-Type: text/html; charset=utf-8
Expires: -1
Set-Cookie: ASP.NET_SessionId=123456
How does one create a HTTP post with the information received from the above?
Any information would be highly appreciated.
The return value of requests.get is an object with properties that you can access. Try the following:
response = requests.get(url=targetURL, headers=headers)
response_headers = response.headers
response_headers is a dictionary. You can modify it and pass it to whatever you need.
I am trying to do a post request on python utilizing the requests library, when I set my custom headers which are the following:
User-Agent: MBAM-C
Content-Type: application/json
Authorization: True
Content-Length: 619
Connection: Close
However when It send the request with the custom headers it adds its own headers which give a bad request response from the server..
User-Agent: MBAM-C
Accept-Encoding: gzip, deflate
Accept: */*
Connection: Close
Content-Type: application/json
Authorization: True
Content-Length: 559
It is due to the design goals of the requests project.
This behavior is documented here. You may want to use a lower level library, if it is problematic for the library to be correcting content length or adding desirable headers. Requests bills itself as: "an elegant and simple HTTP library for Python, built for human beings." and a part of that is advertising that it can accept compressed content and all mime types.
Note: Custom headers are given less precedence than more specific sources of information. For instance:
Authorization headers set with headers= will be overridden if credentials are specified in .netrc, which in turn will be overridden by the auth= parameter.
Authorization headers will be removed if you get redirected off-host.
Proxy-Authorization headers will be overridden by proxy credentials provided in the URL.
Content-Length headers will be overridden when we can determine the length of the content.
I'm trying to mimic a curl request I'm doing using python the call is. Note that
HTTPS Request
Ignoring SSL ceritification verification
My server is django
The curl command I used is
curl -k --dump-header - -H "Content-Type: application/json" -X POST --data '{"environment_name": "foo"}' https://localhost/api/v1/environment/
and the response from the server is successful
HTTP/1.1 201 CREATED
Date: Tue, 17 Jun 2014 00:59:59 GMT
Server: Server
Vary: Accept-Language,Cookie,User-Agent
Content-Language: en-us
Location: https://localhost/api/v1/environment/None/
Status: 201 CREATED
Content-Length: 0
Cneonction: close
Content-Type: text/html; charset=utf-8
However when I try to do a post request in python with 'requests' my script is
import json
data = {'enviornment_name' : 'foo'}
headers = {'Content-type' : 'application/json'}
response = requests.post("https://localhost/api/v1/environment", headers=headers, data=data, verify=False)
When running the script I get back a huge stack trace but the part in red is
E DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing: incorrect data check',))
I'm not sure why I can't my script to work via python
Your server is claiming to return gzip'd content which it is not. The response is returning Content-Encoding: gzip in the headers. This could be because in requests, by default, it sends Accept-Encoding: gzip, compress. You should try setting "Accept-Encoding" to None in your headers dictionary in addition to #Fabricator's suggestion in the comments.
Your headers dictionary will look like:
headers = {'Content-Type': 'application/json', 'Accept-Encoding': None}
And your call to requests will look like
requests.post(url, headers=headers, data=json.dumps(data), verify=False)
#Fabricator I need the verify=False however I noticed the one thing in my code that was an issue for the server I was using I needed the trailing '/' at the end of the URI. In addition I also needed the json.dumps(data) not json.dump(data) in case others are looking. Thanks for the help
I am using the python urllib2 library for opening URL, and what I want is to get the complete header info of the request. When I use response.info I only get this:
Date: Mon, 15 Aug 2011 12:00:42 GMT
Server: Apache/2.2.0 (Unix)
Last-Modified: Tue, 01 May 2001 18:40:33 GMT
ETag: "13ef600-141-897e4a40"
Accept-Ranges: bytes
Content-Length: 321
Connection: close
Content-Type: text/html
I am expecting the complete info as given by live_http_headers (add-on for firefox), e.g:
http://www.yellowpages.com.mt/Malta-Web/127151.aspx
GET /Malta-Web/127151.aspx HTTP/1.1
Host: www.yellowpages.com.mt
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=156587571.1883941323.1313405289.1313405289.1313405289.1; __utmz=156587571.1313405289.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 141
Date: Mon, 15 Aug 2011 12:17:25 GMT
Location: http://www.trucks.com.mt
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET, UrlRewriter.NET 2.0.0
X-AspNet-Version: 2.0.50727
Set-Cookie: ASP.NET_SessionId=zhnqh5554omyti55dxbvmf55; path=/; HttpOnly
Cache-Control: private
My request function is:
def dorequest(url, post=None, headers={}):
cOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
urllib2.install_opener( cOpener )
if post:
post = urllib.urlencode(post)
req = urllib2.Request(url, post, headers)
response = cOpener.open(req)
print response.info() // this does not give complete header info, how can i get complete header info??
return response.read()
url = 'http://www.yellowpages.com.mt/Malta-Web/127151.aspx'
html = dorequest(url)
Is it possible to achieve the desired header info details by using urllib2? I don't want to switch to httplib.
Those are all of the headers the server is sending when you do the request with urllib2.
Firefox is showing you the headers it's sending to the server as well.
When the server gets those headers from Firefox, some of them may trigger it to send back additional headers, so you end up with more response headers as well.
Duplicate the exact headers Firefox sends, and you'll get back an identical response.
Edit: That location header is sent by the page that does the redirect, not the page you're redirected to. Just use response.url to get the location of the page you've been sent to.
That first URL uses a 302 redirect. If you don't want to follow the redirect, but see the headers from the first page instead, use a URLOpener instead of a FancyURLOpener, which automatically follows redirects.
I see that server returns HTTP/1.1 302 Found - HTTP redirect.
urllib automatically follow redirects, so headers returned by urllib is headers from http://www.trucks.com.mt, not http://www.yellowpages.com.mt/Malta-Web/127151.aspx