How Flask streams data [duplicate] - python

Could someone tell me whether Content-Length or Transfer-Encoding: chunked is mandatory for an HTTP request? I'm using C++ to write an HTTP server.
An HTTP response can signal the end of the message body by closing the socket, but what about a request?
I have checked RFC 2616 for HTTP/1.1, but it did not make this point clear to me.
My question is: if an HTTP request is sent without Content-Length or chunked Transfer-Encoding, how can I use WSARecv to know the length of the message body? Consider the case where a WSARecv call happens to return all the headers, with the stream ending exactly at "\r\n\r\n": I cannot determine the length of the message body. If I issue another WSARecv, it may wait forever because there is no more data; if I don't, I may miss the message body if there is one.
Or are Content-Length and chunked Transfer-Encoding mandatory for HTTP requests, so that the client must set one of them to tell the server the length of the message?

If you don't specify a Transfer-Encoding or Content-Length, then the request (or response) is implicitly of variable length, and the only way to signal the end of the body is to shut down the connection (and, conversely, to detect the shutdown/EOF in the receiver).
This means that this kind of request/response is also implicitly Connection: close.
If you're implementing HTTP/1.1 then you must support all three transfer methods.
When I wrote my HTTP server I abstracted the concept of a "request stream" from the concept of a "connection stream". The "request stream" is polymorphic and supports the concept of reading to "EOF". There's no reason you couldn't also have a method on there to "read_chunk". In the case of a non-chunked request, this could simply read until EOF.
This allowed me to implement simultaneous execution of multiple requests on the same connection (but there is some jiggery-pokery to ensure that the responses go back in the correct order!)
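
To make the three cases concrete, here is a minimal Python sketch (not the answerer's C++ server; the helper names recv_exact, read_line and read_body are made up) of how a body reader can dispatch on the framing headers. Note that per RFC 7230 section 3.3.3, a request with neither header has a zero-length body; read-until-close framing only applies to responses.
import socket

def recv_exact(sock, n):
    # Read exactly n bytes, or fail if the peer closes early.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before body was complete")
        buf += chunk
    return buf

def read_line(sock):
    # Read one CRLF-terminated line, byte by byte (simple, not fast).
    line = b""
    while not line.endswith(b"\r\n"):
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("peer closed mid-line")
        line += byte
    return line[:-2]

def read_body(sock, headers):
    # headers: dict of lower-cased header names, assumed parsed already.
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = b""
        while True:
            size = int(read_line(sock).split(b";")[0], 16)  # hex chunk-size line
            if size == 0:
                read_line(sock)            # consume the final CRLF (ignoring optional trailers)
                return body
            body += recv_exact(sock, size)
            read_line(sock)                # consume the CRLF after each chunk
    if "content-length" in headers:
        return recv_exact(sock, int(headers["content-length"]))
    # Neither header: valid for responses only; the body ends at EOF.
    body = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return body
        body += chunk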

RFC 2616 is obsolete.
The answer is in https://greenbytes.de/tech/webdav/rfc7230.html#header.content-length.

Related

Does setting socket timeout cancel the initial request

I have a request that must only run once. At times, the request takes much longer than it should.
If I were to set a default socket timeout value (using socket.setdefaulttimeout(5)) and it took longer than 5 seconds, will the original request be cancelled so it's safe to retry (see example code below)?
If not, what is the best way to cancel the original request and retry it, ensuring it never runs more than once?
import socket
from googleapiclient.discovery import build
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

@retry(
    retry=retry_if_exception_type(socket.timeout),
    wait=wait_fixed(4),
    stop=stop_after_attempt(3),
)
def create_file_once_only(creds, body):
    service = build('drive', 'v3', credentials=creds)
    file = service.files().create(body=body, fields='id').execute()

socket.setdefaulttimeout(5)
create_file_once_only(creds, body)
It's unlikely that this can be made to work as you hope. An HTTP POST (like any other HTTP request) is implemented by sending a command to the web server, then receiving a response. The Python requests library encapsulates a lot of the tedious parts of that for you, but at the core it's going to do a socket send followed by a socket recv (it may of course require more than one send or recv depending on the size of the data).
Now, if you were able to connect to the web server initially (again, this is taken care of for you by the requests library but typically only takes a few milliseconds), then it's highly likely that the data in your POST request has long since been sent. (If the data you are sending is megabytes long, it's possible that it's only been partially sent, but if it is reasonably short, it's almost certainly been sent in full.)
That in turn means that in all likelihood the server has received your entire request and is working on it or has enqueued your request to work on it eventually. In either case, even if you break the connection to the server by timing out on the recv, it's unlikely that the server will actually even notice that until it gets to the point in its execution where it would be sending its response to your request. By that point, it has probably finished doing whatever it was going to do.
In other words, your socket timeout is not going to apply to the "HTTP request" -- it applies to the underlying socket operations instead -- and almost certainly to the recv part on the tail end. And just breaking the socket connection doesn't cancel the HTTP request.
There is no reliable way to do what you want without designing a transactional protocol with the close cooperation of the HTTP server.
With the server's cooperation, though, you could do something approximating it:
Create a unique ID (UUID or the like)
Send a request to the server that contains that UUID along with the other account info (name, password, whatever else)
The server then only creates the account if it hasn't already created an account with the same unique ID.
That way, you can request the operation multiple times, but know that it will only actually be implemented once. If asked to do the same operation a second time, the server would simply respond with "yep, already did that".
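A minimal sketch of that pattern (the endpoint, field names, and use of the requests library are illustrative assumptions, not part of the original question):
import uuid
import requests

# Generate the unique ID once; reuse the *same* ID on every retry so the
# server can recognize duplicates.
idempotency_key = str(uuid.uuid4())

def create_account(base_url, name, password):
    # The server is assumed to remember IDs it has already processed and
    # to answer a repeated ID with "yep, already did that".
    return requests.post(
        base_url + "/accounts",
        json={"id": idempotency_key, "name": name, "password": password},
        timeout=5,
    )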

basic authentication https proxy sockets displaying tlsv1 decode error after connection established

I am attempting to connect to an HTTPS proxy that requires basic auth (user:pass, base64-encoded). I was able to make non-authenticated proxies work fine, but I recently bought proxies and now I need to make authenticated ones work. It seemed simple: all I had to do was add a Proxy-Authorization header. After doing this it worked, in the sense that it gave me a "200 Connection established" response. I then attempt to wrap the socket with the ssl library, which causes me to receive the following error: "tlsv1 decode error". After looking at some Stack Overflow posts, I figured I should change the protocol version being used. I tried every other available version, which for some reason still resulted in the same error (this was weird because I was not using TLSv1).
import base64
import socket
from ssl import wrap_socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
credentials = base64.encodebytes(b"user:pass").decode().strip(r"\n")
s.connect((proxy[0], int(proxy[1])))  # proxy = (host, port) is assumed set earlier
s.send(f"CONNECT site.com:443 HTTP/1.1\r\nProxy-Authorization: Basic {credentials}\r\nHost: site.com\r\n\r\n".encode())
connect_response = s.recv(4096)
print(connect_response)
s = wrap_socket(s)
Upon doing more research, I figured out that it could have something to do with the "realm" response header. I could not find any docs on this, and I don't know what it's used for. This makes me believe that I am doing something wrong, and that I need more steps before I attempt to wrap the socket.
My question is: why am I getting this error/traceback when attempting to wrap the socket, and how can I fix it?
If you print out the request before you send it you will see this:
b'CONNECT google.com:443 HTTP/1.1\r\nProxy-Authorization: Basic dXNlcjpwYXNz\n\r\nHost: google.com\r\n\r\n'
Notice the \n\r\n at the end of the Proxy-Authorization header. This is simply wrong, i.e. it should be \r\n only.
Due to the extraneous \n, these characters actually get interpreted as the end of the HTTP header, and the rest, Host: google.com\r\n\r\n, remains unread by the server for now. It only gets read when the TLS handshake is expected, i.e. Host: google.com\r\n\r\n is interpreted as part of the TLS ClientHello by the server, and thus the TLS handshake fails.
The fix is to eliminate the wrong \n at the end. While you've tried this with strip(r"\n"), the correct way is strip("\n"): you want to eliminate the 1-byte character "\n", not the characters of the 2-byte string r"\n".
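For reference, a minimal corrected sketch (the proxy address is a placeholder; this also uses ssl.SSLContext rather than the older wrap_socket, and b64encode rather than encodebytes so no strip() is needed at all):
import base64
import socket
import ssl

proxy_host, proxy_port = "proxy.example", 8080   # placeholder proxy address
credentials = base64.b64encode(b"user:pass").decode()  # no trailing newline

s = socket.create_connection((proxy_host, proxy_port))
request = (
    f"CONNECT site.com:443 HTTP/1.1\r\n"
    f"Proxy-Authorization: Basic {credentials}\r\n"
    f"Host: site.com\r\n"
    f"\r\n"
)
s.sendall(request.encode())
print(s.recv(4096))  # expect b"HTTP/1.1 200 Connection established..."

context = ssl.create_default_context()
tls = context.wrap_socket(s, server_hostname="site.com")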

Why does aiohttp need await to get body?

I found that an aiohttp server must use the await keyword to get the request body:
async def handler(request):
    body = await request.json()  # or text(), read()
I thought that when the handler is called, the request body would already be in server-side memory, so reading it should not be I/O-intensive work needing an asynchronous operation.
Am I missing something?
With a very large request message-body, you might not have received the complete body when the handler is called. HTTP/1.1 allows the server to answer before the end of the request (from RFC 2616):
An HTTP/1.1 (or later) client sending a message-body SHOULD monitor the network connection for an error status while it is transmitting the request. If the client sees an error status, it SHOULD immediately cease transmitting the body.
So you could, for example, reply immediately with a 4xx client error code if you do not accept the request (e.g. 401 Unauthorized if the token is invalid), before receiving the whole request message-body.
On the contrary, they are not in memory. Quoting the documentation:
While methods read(), json() and text() are very convenient you should use them carefully. All these methods load the whole response in memory. For example if you want to download several gigabyte sized files, these methods will load all the data in memory. Instead you can use the content attribute.
Also see the other answer for the inner workings of the HTTP protocol.
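Both points can be seen in a minimal aiohttp sketch (the route and token check are made up for illustration): the handler can reply before the body has arrived, and can stream a large body through request.content instead of buffering it all at once.
from aiohttp import web

async def upload(request):
    # Reject before the (possibly huge) body has been received.
    if request.headers.get("Authorization") != "Bearer expected-token":
        return web.Response(status=401, text="Unauthorized")
    # Stream the body instead of buffering it all with read()/json()/text().
    total = 0
    async for chunk in request.content.iter_chunked(64 * 1024):
        total += len(chunk)  # process each chunk without holding the whole body
    return web.Response(text=f"received {total} bytes")

app = web.Application()
app.add_routes([web.post("/upload", upload)])
web.run_app(app)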

How can I make an http request without getting back an http response in Python?

I want to send it and forget it. The HTTP REST service call I'm making takes a few seconds to respond, and the goal is to avoid waiting those few seconds before more code can execute.
I'd rather not use Python threads.
I'll use Twisted async calls if I must and ignore the response.
You are going to have to implement that asynchronously, as the HTTP protocol specifies that you have a request and a reply.
Another option would be to work directly with the socket, bypassing any pre-built module. This would let you violate the protocol and write your own code that ignores any response, in essence dropping the connection after it has made the request.
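A minimal sketch of that idea (plain Python sockets; the host and path are placeholders, and as said above this deliberately ignores HTTP semantics):
import socket

def fire_and_forget(host, path="/"):
    s = socket.create_connection((host, 80))
    s.sendall(
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
    )
    s.close()  # drop the connection without reading any response

fire_and_forget("www.google.com")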
HTTP implies a request and a reply for that request. Go with an async approach.
You do not need twisted for this, just urllib will do. See http://pythonquirks.blogspot.com/2009/12/asynchronous-http-request.html
I am copying the relevant code here but the credit goes to that link:
import urllib2

class MyHandler(urllib2.HTTPHandler):
    def http_response(self, req, response):
        return response

o = urllib2.build_opener(MyHandler())
o.open('http://www.google.com/')

Python: Asynchronous http requests sent in order with automatic handling of cookies?

I am coding a Python (2.6) interface to a web service. I need to communicate via HTTP so that:
Cookies are handled automatically,
The requests are asynchronous,
The order in which the requests are sent is respected (the order in which the responses to these requests are received does not matter).
I have tried what could be easily derived from the built-in libraries, facing different problems:
Using httplib and urllib2, the requests are synchronous unless I use threads, in which case the order is not guaranteed to be respected,
Using asyncore, there was no library to automatically deal with cookies sent by the web service.
After some googling, it seems that there are many examples of Python scripts or libraries that match two out of the three criteria, but not all three. I am thinking of reading through the cookielib sources and adapting what I need of it to asyncore (or only to my application, in an ad hoc manner), but it seems strange that nothing like this exists yet, as I guess I am not the only one interested. If anyone has pointers about this problem, it would be greatly appreciated.
Thank you.
Edit to clarify:
What I am doing is a local proxy that interfaces my IRC client with a webchat. It creates a socket that listens for IRC connections, then upon receiving one, it logs in to the webchat via HTTP. I don't have access to the behaviour of the webchat, and it uses cookies for session IDs. When the client sends several IRC requests to my Python proxy, I have to forward them to the webchat's server via HTTP and with cookies. I also want to do this asynchronously (I don't want to wait for the HTTP response before I send the next request), and currently what happens is that the order in which the HTTP requests are sent is not the order in which the IRC commands were received.
I hope this clarifies the question, and I will of course detail more if it doesn't.
Using httplib and urllib2, the requests are synchronous unless I use threads, in which case the order is not guaranteed to be respected
How would you know that the order has been respected unless you get your response back from the first connection before you send the request on the second connection? After all, you don't care what order the responses come in, so it's very possible that the responses come back in the order you expect even though your requests were processed in the wrong order!
The only way you can guarantee the ordering is by waiting for confirmation that the first request has successfully arrived (e.g. you start receiving the response for it) before beginning the second request. You can do this by not launching the second thread until you reach the response-handling part of the first thread.
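One way to combine all three constraints, sketched below in modern Python with the third-party requests library (both assumptions; the asker is on 2.6): a single worker thread drains a FIFO queue, so requests are sent, and cookies shared, in order, while callers enqueue and move on without waiting.
import queue
import threading
import requests  # its Session object handles cookies automatically

task_queue = queue.Queue()

def worker():
    session = requests.Session()   # one session = one shared cookie jar
    while True:
        method, url, kwargs = task_queue.get()
        try:
            # Blocks until the response arrives, which is exactly what
            # guarantees that the next request is sent after this one.
            session.request(method, url, **kwargs)
        finally:
            task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Callers return immediately; the FIFO queue preserves the send order.
task_queue.put(("POST", "https://webchat.example/login", {"data": {"user": "me"}}))
task_queue.put(("POST", "https://webchat.example/say", {"data": {"msg": "hi"}}))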

Categories

Resources