Django response always chunked with text/html, cannot set Content-Length - python

In my Django application's views.py, I return an HttpResponse object after attempting to set the following HTTP header fields:
# Create a Response Object with the content to return
response = HttpResponse("%s" % output_display, mimetype='text/html')
response['Cache-Control'] = 'must-revalidate, max-age=20'
response['Vary'] = 'Accept-Encoding'
response['Transfer-Encoding'] = 'gzip'
#response['Content-Encoding'] = 'gzip'
response['Connection'] = 'close'
#response['Content-Type'] = 'text/html'
response['Content-Length'] = '%s' % len(output_display)
return response
I then capture the output using the Live HTTP Headers plugin in Firefox, and it looks like:
HTTP/1.1 200 OK
Date: Sun, 10 Mar 2013 14:55:09 GMT
Server: Apache/2.2.22 (Ubuntu)
Transfer-Encoding: gzip, chunked <---------- Why 'chunked'?
Vary: Accept-Encoding
Connection: close
Cache-Control: must-revalidate, max-age=20
Content-Encoding: gzip
Content-Type: text/html <---------------------- No Content-Length even though I set it?
X-Pad: avoid browser bug
I am trying to cache using Apache's mod_mem_cache, so I need the Content-Length to be set and cannot have 'chunked' as the Transfer-Encoding.
My Apache mem_cache.conf looks like this (large numbers are just for testing):
<IfModule mod_mem_cache.c>
CacheEnable mem /
MCacheSize 10000
MCacheMaxObjectCount 10000000
MCacheMinObjectSize 1
MCacheMaxObjectSize 10000000
MCacheMaxStreamingBuffer 10000000
</IfModule>
But even though I explicitly set Content-Length and Transfer-Encoding in my response code, 'chunked' is inserted automatically, and therefore my Content-Length is not honored. Why is this? How can I fix it to get the desired response?

I came across a similar issue recently with a mod_wsgi application; I was trying to update an Apache configuration that used the built-in disk cache to use socache/memcache instead.
The disk cache was working, but switching to memcache or shmcb didn't. If I issued a request for a resource I wanted cached, it wouldn't be stored in the cache (CacheDetailHeader is helpful for checking this). Checking the logs at debug level, I found the message:
[Wed Dec 05 18:52:16.571002 2018] [cache_socache:debug] \
[pid 884:tid 140422596777728] mod_cache_socache.c(389): \
[client 127.0.0.1:56576] AH02346: URL 'http://127.0.1.1:80/cacheme/c?' \
had no explicit size, ignoring, referer: http://127.0.0.1/
It seems that socache doesn't like objects that don't have explicit sizes. I tried setting the newer socache equivalents of those mod_mem_cache settings, CacheSocacheMaxSize and CacheSocacheReadSize, to sufficiently large values.
I know that the Content-Length header was being set and made it through to somewhere; it showed up in the mod_wsgi logs when I deliberately miscalculated it.
A few things I found:
Don't set the Transfer-Encoding header yourself, as this is forbidden by the WSGI specification:
Who set the Transfer-Encoding: chunked header?
Even though you're setting the Content-Length header yourself, the response is also being gzipped by Apache. This changes the length; when Apache doesn't know what the final length will be, it switches to chunked and removes the Content-Length header.
I found that with:
Content-Type: text/html
Content-Length set to my UTF-8-encoded size
set in the Python/mod_wsgi application, and:
SetEnv no-gzip 1
set in the Apache configuration, the object made it into an shmcb cache.
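For illustration, a minimal sketch of such a view (render_output is a hypothetical helper, not from the original code):
from django.http import HttpResponse

def cached_view(request):
    body = render_output()          # hypothetical helper returning a str
    payload = body.encode("utf-8")  # measure the length on bytes, not the str
    response = HttpResponse(payload, content_type="text/html; charset=utf-8")
    response["Content-Length"] = str(len(payload))
    response["Cache-Control"] = "must-revalidate, max-age=20"
    # No Transfer-Encoding here: WSGI forbids setting it from the application.
    return response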
It looks like when Apache gzips an object, it changes the headers so that the object isn't accepted by socache.
I looked around for ways to make them compatible but couldn't find much on this issue. There is some mention of reordering the cache/deflate filters in the mod_cache documentation:
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#finecontrol
This worked if I put in a directive to reorder the cache/deflate filters:
# within a directory
SetOutputFilter CACHE;DEFLATE
Curiously, on a cache miss the server returned gzipped content, but on a cache hit it returned unencoded text/html. This looks odd, but I haven't understood the FilterChain directives well enough to try them out.
I also found some mention of this in a related issue with php/content-length:
https://serverfault.com/questions/183843/content-length-not-sent-when-gzip-compression-enabled-in-apache
The answer there found that if DeflateBufferSize is set to a large enough value, the Content-Length would be set.
I couldn't get this to work.
So it looks like one is stuck choosing between cached and gzipped.

Related

uwsgi/nginx configuration for chunked response

I have two endpoints like below:
GET on /api/v1/foo
POST on /api/v1/foo
I need the POST implementation to send back chunked responses using HTTP/1.1 chunked transfer encoding; however, the GET endpoint should send plain JSON.
My setup is nginx -> uwsgi -> flask.
I see some of my chunks currently getting truncated at a hex size of 1000 (4K in bytes), which is not the same as what my Flask layer sent, probably because I'm missing some nginx or uwsgi configuration.
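For reference, a minimal Flask view of the kind described (a sketch; the endpoint payload is illustrative):
from flask import Flask, Response

app = Flask(__name__)

@app.route("/api/v1/foo", methods=["POST"])
def foo_post():
    def generate():
        # Returning a generator makes Flask stream the body; with no
        # Content-Length, the response goes out chunked. Note that uwsgi
        # and nginx buffering may still re-slice the chunk boundaries.
        for i in range(10):
            yield '{"part": %d}\n' % i
    return Response(generate(), content_type="application/json")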
uwsgi configuration (uwsgi.ini):
[uwsgi]
route = ^/api/v1/foo$ goto:dochunked
route-run = last:
route-label = dochunked
route-if = equal:${REQUEST_METHOD};POST goto:dopostchunked
route-run = last:
route-label = dopostchunked
route-run = chunked:
nginx configuration:
location / {
    uwsgi_pass unix:var/uwsgi.sock;
    uwsgi_read_timeout 600;
    include uwsgi_params;
}
location /api/v1/foo {
    uwsgi_pass unix:var/uwsgi.sock;
    uwsgi_read_timeout 600;
    include uwsgi_params;
    if ($request_method = "POST") {
        set $chunked_transfer_encoding on;
        add_header X-Accel-Buffering no;
    }
}
curl response headers
HTTP/1.1 200 OK
Server: nginx/1.10.1
Date: Wed, 03 Jan 2018 00:06:50 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Accel-Buffering: no
Chunking is about Transfer-Encoding; plain JSON is a Content-Type. The two things are not related.
The transfer encoding is just about the communication method used between the two HTTP/1.1 endpoints (the server and the client), much as gzip compression would be. Using chunked transmission avoids the Content-Length header and allows the response to be sent in multiple chunks, of course. But on the other side, once the response is received, the chunks are reassembled, and you should not see any difference between a response sent via Content-Length plus one big body and a body sent in multiple chunks.
I say should because you may experience problems with bad HTTP/1.1 libraries that do not wait until the end of the message (the last-chunk marker) before firing something like a response-received event for the application language.
Usually, whether or not chunks are used is the responsibility of the HTTP server, and you have little control over that, because chunked support is a required feature of HTTP/1.1. By playing with the size of the response body and the size of the buffers used by the HTTP server, you may see differences in the way the chunks are made. If you have multiple actors in the chain (like Flask and nginx here), each actor can decide to reorganize the chunks, merging some of them (buffering), or not.
But as I said, you should not care about it, unless the client side of your application has bugs with chunked encoding; that would mean your side of the HTTP communication doesn't understand HTTP/1.1.
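For example, with the Python requests library the chunking is invisible to the client (a sketch; the URL stands in for the endpoint above):
import requests

# Whether the server used Content-Length or chunked transfer encoding,
# .text is the fully reassembled body:
r = requests.post("http://localhost/api/v1/foo")
print(r.text)

# Even when streaming, iter_content re-slices the body by the size you
# ask for; it does not expose the wire chunks:
with requests.post("http://localhost/api/v1/foo", stream=True) as r:
    for piece in r.iter_content(chunk_size=4096):
        print(len(piece))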
Finally, if you really need to avoid chunks (but you shouldn't), I see three options:
You could enforce an HTTP/1.0 response. There are no chunks with HTTP/1.0, but that's a very, very old version of the protocol. To do that you'll have to ask for HTTP/1.0 on the request side; you'll get an HTTP/1.1 response from nginx, but without the advanced features of HTTP/1.1 (like chunks).
You could use the nginx chunked_transfer_encoding directive. We can see it's on by default, so usually you would use it to set it to off in a specific location. Your current way of using it (with set, as if it were a variable) does nothing. This option was made specifically for bad HTTP clients, as stated:
It may come in handy when using a software failing to support chunked
encoding despite the standard’s requirement.
You could maybe also try playing with proxy_buffering off; that may work, but I'm unsure.

Why does Django ignore HTTP_X_FORWARDED_PROTO from the wire but not in tests?

Why does Django ignore the HTTP_X_FORWARDED_PROTO header if it comes over the wire?
I added the following config to settings.py:
# make sure we know when we are secure when we are behind a proxy
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
I wrote a test to check this:
def testHttpSupport(self):
    url = reverse('configuration-list')
    response = self.client.get(url, HTTP_X_FORWARDED_PROTO='https')
    cfg = response.data[0]
    cfg_url = cfg['url']
    self.assertTrue(cfg_url.startswith('https'))
This works fine: the URL of the returned object starts with https.
However, if I try:
curl -v -H 'HTTP_X_FORWARDED_PROTO: https' http://localhost:8000/api/users/
...
> GET /api/users/ HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.51.0
> Accept: */*
> HTTP_X_FORWARDED_PROTO: https
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 03 Jul 2017 16:22:04 GMT
< Server: WSGIServer/0.2 CPython/3.6.1
< Content-Type: application/json
< Allow: GET, POST, OPTIONS
< Vary: Accept, Cookie
< X-Frame-Options: SAMEORIGIN
< Content-Length: 197
<
* Curl_http_done: called premature == 0
* Closing connection 0
[{"url":"http://localhost:8000/api/users/1/",...
How come it does not return https-based URLs like in my unit test?
The issue is the header name. When accessing Django through a WSGI server, you should use the X-Forwarded-Proto header instead of the HTTP_X_FORWARDED_PROTO:
curl -v -H 'X-Forwarded-Proto: https' http://localhost:8000/api/users/
The WSGI protocol states that the relevant CGI specifications must be followed, which say:
Meta-variables with names beginning with 'HTTP_' contain values read
from the client request header fields, if the protocol used is HTTP.
The HTTP header field name is converted to upper case, has all
occurrences of "-" replaced with "_" and has 'HTTP_' prepended to
give the meta-variable name.
(source)
So whenever you are using a WSGI server, the X-Forwarded-Proto header is automatically converted to HTTP_X_FORWARDED_PROTO before it is passed in to Django. When you pass in the HTTP_X_FORWARDED_PROTO header instead, HTTP_ must still be prepended according to the specification. Thus, you end up with a header named HTTP_HTTP_X_FORWARDED_PROTO in Django.
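The transformation is easy to express (a sketch of the CGI rule quoted above, not Django's actual code):
def cgi_meta_variable(header_name):
    # "X-Forwarded-Proto" -> "HTTP_X_FORWARDED_PROTO"
    return "HTTP_" + header_name.upper().replace("-", "_")

assert cgi_meta_variable("X-Forwarded-Proto") == "HTTP_X_FORWARDED_PROTO"
# Sending the meta-variable name itself on the wire prefixes it again:
assert cgi_meta_variable("HTTP_X_FORWARDED_PROTO") == "HTTP_HTTP_X_FORWARDED_PROTO"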
self.client is not a WSGI server, and values passed in through the kwargs are inserted directly into the WSGI environment, without any processing. So in that case you have to do the conversion yourself and actually use the HTTP_X_FORWARDED_PROTO key:
CGI specification
The headers sent via **extra should follow CGI specification. For example, emulating a different “Host” header as sent in the HTTP request from the browser to the server should be passed as HTTP_HOST.
(source)

Content-type is blank in the headers of some requests

I've run this query millions (yes, millions) of times before with other URLs. However, I'm getting a KeyError when checking the content type of the following webpage.
Code snippet:
r = requests.get("http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests", timeout=10, headers=headers)
if "text/html" in r.headers["content-type"]:
Error:
KeyError: 'content-type'
I checked the content of r.headers and it's:
CaseInsensitiveDict({'date': 'Fri, 20 May 2016 06:44:19 GMT', 'content-length': '0', 'connection': 'keep-alive', 'server': 'BigIP'})
What could be causing this?
Not all servers set a Content-Type header. Use .get() to retrieve a default if it is missing:
if "text/html" in r.headers.get("content-type", ''):
For the URL you gave I can't reproduce this:
$ curl -s -D - -o /dev/null "http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Powered-By: Brightspot
Content-Type: text/html;charset=UTF-8
Date: Fri, 20 May 2016 06:45:12 GMT
Set-Cookie: JSESSIONID=A0C35776067AABCF9E029150C64D8D91; Path=/; HttpOnly
Transfer-Encoding: chunked
but if the header is missing from your response then it usually isn't Python's fault, and certainly not your code's fault.
It could be that you encountered a buggy server or a temporary glitch, or the server you contacted doesn't like you for one reason or another. Your sample response headers have content-length set to 0 as well, for example, indicating there was no content to serve at all.
The server that gave you that response is BigIP, a load balancer / network router product from a company called F5. It's hard to say exactly what kind (they have global routing servers as well as per-datacenter and per-cluster load balancers). It could be that the load balancer ran out of back-end servers to serve the request, doesn't have servers in your region, or decided that you are sending too many requests and refuses to give you more than just this response, or it is the wrong phase of the moon and Jupiter is in retrograde and it threw a tantrum. We can't know!
But, just in case this happens again, do also look at the response status code. It may well be a 4xx or 5xx status code indicating that something was wrong with your request or with the server. For example, a 429 status code response would indicate you made too many requests in a short amount of time and should slow down. Test for it by checking r.status_code.
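Putting both checks together (a sketch; process_html is a hypothetical handler):
import requests

url = "http://health.usnews.com/health-news/articles/2014/10/15/limiting-malpractice-claims-may-not-curb-costly-medical-tests"
r = requests.get(url, timeout=10)
if r.status_code != 200:
    print("unexpected status:", r.status_code)  # e.g. 429 means slow down
elif "text/html" in r.headers.get("content-type", ""):
    process_html(r.text)  # hypothetical handler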

Testing web-tornado using Firefox's HttpRequest addon

I am testing my web-tornado application using Firefox's HttpRequester add-on, but after I log in and receive my secure cookie data, I am not able to re-use it to consume protected methods.
This is my response data:
POST http://mylocalurl:8888/user/login
Content-Type: application/x-www-form-urlencoded
Login=mylogin;Pass=123
-- response --
200 OK
Content-Length: 33
Content-Type: text/html; charset=UTF-8
Server: TornadoServer/2.2.1
Set-Cookie:
IdUser="Mjk=|1395170421|ffaf0d6fecf2f91c0dccca7cab03d799ef6637a0";
expires=Thu, 17 Apr 2014 19:20:21 GMT; Path=/
{
  "Success": true
}
-- end response --
Now what I am trying to do is configure HttpRequester to use this cookie for my new requests. I tried to add it using the "Headers" tab, but my server keeps sending me a 403 Forbidden.
Can anyone help me with this? A solution using another tool (for Linux) would work too.
I really like Fiddler2 for this kind of thing, and there's an alpha build for Mono that you may wish to try out: http://www.telerik.com/download/fiddler
If you don't mind paid software, you can use Charles, for which there is a free trial.
And if you are testing and already using Python, why not use a simple Python script with requests and its Session object, which persists cookies:
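A minimal sketch of that approach (the protected URL is hypothetical; the login fields are taken from the request above):
import requests

s = requests.Session()  # cookies set by the server persist on the session
login = s.post("http://mylocalurl:8888/user/login",
               data={"Login": "mylogin", "Pass": "123"})
login.raise_for_status()

# The IdUser cookie from Set-Cookie is now in s.cookies and is sent
# automatically on subsequent requests:
r = s.get("http://mylocalurl:8888/user/protected")  # hypothetical endpoint
print(r.status_code, r.text)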

Making Head Requests in Twisted

I am relatively new to using Twisted and I am having trouble returning the Content-Length header when performing a basic HEAD request. I have set up an asynchronous client already, but the trouble comes in this bit of code:
from twisted.internet import defer, reactor
from twisted.web.client import Agent

def getHeaders(url):
    d = Agent(reactor).request("HEAD", url)
    d.addCallbacks(handleResponse, handleError)
    return d

def handleResponse(r):
    print r.code, r.headers
    whenFinished = defer.Deferred()  # the original misspelled Deferred
    r.deliverBody(PrinterClient(whenFinished))
    return whenFinished
I am making a HEAD request and passing the URL. As indicated in this documentation, the Content-Length header is not stored in self.length but can be accessed from the self.headers response. The output returns the status code as expected, but the header output is not what I expected. Using "http://www.espn.go.com" as an example, it currently returns:
Set-Cookie: SWID=77638195-7A94-4DD0-92A5-348603068D58;
path=/; expires=Fri, 31-Jan-2034 00:50:09 GMT; domain=go.com;
X-Ua-Compatible: IE=edge,chrome=1
Cache-Control: max-age=15
Date: Fri, 31 Jan 2014 00:50:09 GMT
P3P: CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa IVAi IVDi CONi
OUR SAMo OTRo BUS PHY ONL UNI PUR COM NAV INT DEM CNT STA PRE"
Content-Type: text/html; charset=iso-8859-1
As you can see, no Content-Length field is returned. If the same request is made with requests, the result does contain the content-length header:
r = requests.head("http://www.espn.go.com")
r.headers
({'content-length': '46133', 'content-encoding': 'gzip'...})
(rest omitted for readability)
What is causing this problem? I am sure it is a simple mistake on my part, but for the life of me I cannot figure out what I have done wrong. Any help is appreciated.
http://www.espn.go.com/ returns one response if the client sends an Accept-Encoding: gzip header and another response if it doesn't.
One of the differences between the two responses is the inclusion of the Content-Length header.
If you want to make requests using Agent including Accept-Encoding: gzip then take a look at ContentDecoderAgent or the third-party treq package.
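A sketch of the wrapped agent (ContentDecoderAgent and GzipDecoder are in twisted.web.client):
from twisted.internet import reactor
from twisted.web.client import Agent, ContentDecoderAgent, GzipDecoder

# The wrapper adds Accept-Encoding: gzip to requests (and transparently
# decompresses bodies, which a HEAD response doesn't have anyway), so the
# server should return the same variant it sends to requests, including
# its Content-Length:
agent = ContentDecoderAgent(Agent(reactor), [("gzip", GzipDecoder)])
d = agent.request("HEAD", "http://www.espn.go.com/")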
HTTP allows (but does not REQUIRE) entity headers in responses to HEAD requests. The only restriction it places is that 200 responses to HEAD requests MUST NOT include an entity payload. It's up to the origin server to decide which, if any, entity headers it would like to include.
In the case of Content-Length, it makes sense for this to be optional for HEAD: if the entity will be computed dynamically (as with compressing/decompressing content), it's better for the server to avoid the extra work of computing the content length when the response won't include the content anyway.
