HTTP protocol, Content-Length, get page content in Python

I'm trying to write my own Python 3 HTTP library to learn more about sockets and the HTTP protocol. My question is: if I do a recv(bytesToRead) on my socket, how can I read only the header, and then use the Content-Length information to continue receiving the page content? Isn't that the purpose of the Content-Length header?
Thanks in advance

In the past, to accomplish this, I would read a portion of the socket data into memory, and then read from that buffer until a "\r\n\r\n" sequence is encountered (you could use a state machine to do this, or simply use str.find()). Once you reach that sequence, you know all of the headers have been read; you can then parse the headers and read exactly Content-Length bytes of body. Be prepared for a response that does not include a Content-Length header, since not all responses contain one.
If you run out of buffer before seeing that sequence, simply read more data from the socket into your buffer and continue processing.
I can post a C# example if you would like to look at it.
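In Python, the approach described above might be sketched like this (a minimal illustration rather than production code; fetch and split_response are hypothetical helper names, and chunked transfer encoding is not handled):

```python
import socket

def split_response(data):
    """Split raw response bytes at the blank line into (headers dict, body-so-far)."""
    head, _, body = data.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body

def fetch(host, path="/", port=80):
    """Read the header block first, then use Content-Length to read the body."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))

        # Buffer until the "\r\n\r\n" that ends the headers is seen.
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
        headers, body = split_response(buf)

        # Keep receiving until Content-Length bytes of body have arrived.
        length = int(headers.get("content-length", len(body)))
        while len(body) < length:
            chunk = sock.recv(4096)
            if not chunk:
                break
            body += chunk
        return headers, body
```
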


Python requests remove Connection header

I currently need to make a single HTTP request to a bunch of servers that I have in a list. However, these HTTP requests contain a 'Connection' header, which I need to remove. How would I do this?
I had the same issue with Accept-Encoding, but I was able to comment out the section that automatically applies that header in httpclient.py (I'm using the requests lib for this). Is there any way around this aside from using sockets and sending raw HTTP requests? Is there maybe another snippet that can be commented out to prevent the header from being automatically added?
I realize that removing the header is a bad idea in the real world, but there's justification for this and it 100% needs to be removed.
I've tried assigning it as an empty string and as None; both appear to fail. I'm wondering if this is something I can't change.
It seems you can actually patch this out by commenting out line 1180 in urllib2.py:
headers["Connection"] = "close"
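Alternatively, with reasonably recent versions of the requests library, the documented way to suppress one of its default headers is to set its value to None on a Session, so no source patching should be needed (example.com is a placeholder):

```python
import requests

session = requests.Session()
# Setting a default header to None tells requests to drop it entirely
# when the request is prepared.
session.headers["Connection"] = None

# The prepared request carries no Connection header at all.
prepared = session.prepare_request(requests.Request("GET", "http://example.com/"))
```

A subsequent session.get(...) would then send the request without the Connection header.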

Access HTML code in HTTP response with scapy

I have a program that uses scapy to sniff data. I'm trying to access the HTML returned in the HTTP response. I can access all the headers and the response body, which includes the HTML, but it appears as ��Z��}ks۸���[u��ܵ�#J��/ɴ+q2��&3��s�.
Accessing packet[Raw].load returns the above result.
Now, looking at the headers, I can see that this is compressed with gzip, which explains why it's being displayed like this. So I tried decompressing it with GzipFile and with zlib, but in both cases I got an error message stating that this is not a gzip file.
Any help on decompressing it properly?
UPDATE: I noticed that the main issue is that I am trying to decompress only part of the stream. The HTTP response is being sent in chunks, and the decompress call fails because I am trying to decompress each chunk separately. If I combine all the chunks, I am able to decompress using zlib and gzip. But the same question remains: can I decompress the chunks one at a time, before combining them?
Well, you can do it with the gzip module:
import gzip
import io

body_stream = io.BytesIO(body)  # body must be bytes
gzipper = gzip.GzipFile(fileobj=body_stream)
data = gzipper.read()
print(data[:25])
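To answer the update directly: zlib's streaming decompressor can consume a gzip stream chunk by chunk, so you don't have to combine the chunks first. The 16 + zlib.MAX_WBITS argument tells zlib to expect a gzip header. A sketch with simulated chunks:

```python
import gzip
import zlib

# A streaming decompressor that understands the gzip container format.
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

# Simulate a gzip body arriving in arbitrary-sized pieces off the wire.
payload = gzip.compress(b"hello world " * 100)
chunks = [payload[i:i + 37] for i in range(0, len(payload), 37)]

# Feed each chunk as it arrives; each call may yield some plaintext.
parts = [decompressor.decompress(chunk) for chunk in chunks]
parts.append(decompressor.flush())  # emit any buffered tail
data = b"".join(parts)
```
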

Sort header from HTTP response before writing it to a file

I am currently trying to implement an HTTP client using Python and sockets. It is very simple, and the only thing it has to do is download a file from a web server and write it into a file supplied by the user.
My code is working fine but I am having a problem of how to exclude the HTTP response header from the file.
The HTTP response header appears only at the beginning of the file, so I was thinking I could just dump all the data into the file and then strip the header out afterwards. That approach is a problem, though, since the extra file I/O is very slow.
My next thought was that I could run a regex on the first response I get from the server, strip away the header, and then dump the rest into the file. This seems like a very clunky way to do it, though.
Does anyone have any suggestions on how to do this in a smart way?
In the HTTP response, the headers are separated from the body by '\r\n\r\n'. To get only the body, you can try this:
bodyBegin = httpResponse.find('\r\n\r\n') + 4
body = httpResponse[bodyBegin:]
saveToFile(body)
(If you are reading raw bytes from the socket, search for b'\r\n\r\n' instead.)

Can Django send multi-part responses for a single request?

I apologise if this is a daft question. I'm currently writing against a Django API (which I also maintain) and wish under certain circumstances to be able to generate multiple partial responses in the case where a single request yields a large number of objects, rather than sending the entire JSON structure as a single response.
Is there a technique to do this? It needs to follow a standard such that client systems using different request libraries would be able to make use of the functionality.
The issue is that the client system, at the point of asking, does not know the number of objects that will be present in the response.
If this is not possible, then I will have to chain requests on the client end - for example, getting the first 20 objects & if the response suggests there will be more, requesting the next 20 etc. This approach is an OK work-around, but any subsequent requests rely on the previous response. I'd rather ask once and have some kind of multi-part response.
As far as I know, no, you can't send a multipart HTTP response, at least not yet. Multipart content is only really viable in HTTP requests. Why? Because no browser that I know of supports it completely:
Firefox 3.5: Renders only the last part, others are ignored.
IE 8: Shows all the content as if it were text/plain, including the boundaries.
Chrome 3: Saves all the content in a single file, nothing is rendered.
Safari 4: Saves all the content in a single file, nothing is rendered.
Opera 10.10: Something weird. Starts rendering the first part as text/plain, and then clears everything. The loading progress bar hangs at 31%.
(Data credits Diego Jancic)
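The chained-request workaround described in the question is essentially offset/limit pagination. A client-side sketch, where fetch_page stands in for the real HTTP call (the parameter names and page size are hypothetical and depend on the API):

```python
def fetch_all(fetch_page, limit=20):
    """Request pages of `limit` objects until a short page signals the end.

    fetch_page(offset, limit) represents the real HTTP call, e.g. a
    requests.get() against the API with offset/limit query parameters.
    """
    objects = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        objects.extend(page)
        if len(page) < limit:  # a short (or empty) page means no more data
            return objects
        offset += limit
```

The downside noted in the question remains: each request depends on the previous response having arrived.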

Content-Length header not returned from Pylons response

I'm still struggling to stream a file to the HTTP response in Pylons. In addition to the original problem, I'm finding that I cannot return the Content-Length header, so for large files the client cannot estimate how long the download will take. I've tried
response.content_length = 12345
and I've tried
response.headers['Content-Length'] = 12345
In both cases the HTTP response (viewed in Fiddler) simply does not contain the Content-Length header. How do I get Pylons to return this header?
(Oh, and if you have any ideas on making it stream the file please reply to the original question - I'm all out of ideas there.)
Edit: while not a generic solution, for serving static files FileApp allows sending the Content-Length header. For dynamic content it looks like Alex Martelli's answer is the only option.
There's a bit of middleware code here that ensures all responses get a Content-Length header if it's missing. You could tweak it so that you set some other header in your response (say 'X-The-Content-Length') and have the middleware use that to set the content length when the latter is missing. I view the whole thing as a workaround for what I consider a Pylons bug (its cavalier attitude to Content-Length!), but apparently the Pylons authors disagree with me on that score, so it's nice to at least have workarounds for it.
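A minimal version of such middleware could look like the sketch below. Note that it has to buffer the entire body to measure it, which works against streaming (this is an illustration of the idea, not the code linked above):

```python
def add_content_length(app):
    """WSGI middleware: add Content-Length if the wrapped app omitted it."""
    def wrapper(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)

        # Buffer the whole response body so its length can be measured.
        body = b"".join(app(environ, capture))

        headers = captured["headers"]
        if not any(name.lower() == "content-length" for name, _ in headers):
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers)
        return [body]

    return wrapper
```
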
Try:
response.headerlist.append((str("Content-Length"), str("123456")))
