Python requests: keep connection alive for an indefinite time

I'm trying to get a Python script running which calls an external API (to which I only have read access) at a certain interval. The API uses cookie-based authentication: calling the /auth endpoint initially sets session cookies, which are then used for authentication in further requests.
As for my problem: because the authentication is based on an active session, the cookies are no longer valid once the connection drops, so the session has to be re-established. From what I've read, requests is based on urllib3, which keeps the connection alive by default. Yet, after a few tests, I noticed that under some circumstances the connection is dropped anyway.
I used a Session object from the requests module and tested how long it takes for the connection to be dropped, as follows:
from requests import Session
import logging
from time import time, sleep

logging.basicConfig(level=logging.DEBUG)

def tt(interval):
    credentials = {"username": "user", "password": "pass"}
    s = Session()
    r = s.post("https://<host>:<port>/auth", json=credentials)
    ts = time()
    while r.status_code == 200:
        r = s.get("https://<host>:<port>/some/other/endpoint")
        sleep(interval)
    return time() - ts  # Seconds until connection drop
This might not be the best way to find that out, but I let the function run twice: once with an interval of 1 second and once with an interval of 1 minute. Both ran for about an hour, until I had to stop the execution manually.
However, when I swapped the two lines within the while loop, so that there was a 1-minute delay after the initial POST /auth request, the following GET request failed with a 401 Unauthorized, with this message logged beforehand:
DEBUG:urllib3.connectionpool:Resetting dropped connection: <host>
As the interval between requests may range from a few minutes to multiple hours in my production script, I need to know beforehand how long these sessions are kept alive and whether there are exceptions to that rule (like dropping the connection if no request is made for a short while after the initial POST /auth).
So, how long does requests or rather urllib3 keep the connection alive, and is it possible to extend that time indefinitely?
Or is it the server instead of requests that drops the connection?

By using requests.Session, keep-alive is handled for you automatically.
In the first version of your loop, which continuously polls the server after the /auth call, the server does not drop the connection because of the subsequent GET requests. In the second version, it's likely that the sleep interval exceeds the amount of time the server is configured to keep the connection open.
Depending on the server configuration of the API, the response headers may include a Keep-Alive header with information about how long connections are kept open at a minimum. HTTP/1.0-style keep-alive puts this information in the timeout parameter of the Keep-Alive header. You could use this information to determine how long you have until the server drops the connection.
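For example, a minimal sketch that reads this hint when present, continuing from the Session in your question (the header is optional, so it may well be absent):

r = s.post("https://<host>:<port>/auth", json=credentials)
hint = r.headers.get("Keep-Alive")  # e.g. "timeout=5, max=100"
if hint:
    print("Server keep-alive hint:", hint)
else:
    print("No Keep-Alive header; the timeout exists only as server-side configuration.")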
In HTTP/1.1, persistent connections are used by default and the Keep-Alive header is not used unless the server explicitly implements it for backwards compatibility. Due to this difference, there isn't an immediate way for a client to determine the exact timeout for connections since it may exist solely as server side configuration.
The key to keeping the connection open would be to continue polling at regular intervals. The interval you use must be less than the server's configured connection timeout.
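A hedged sketch of such a heartbeat, assuming the Session from your question and an interval you know (or have measured) to be safely below the server's timeout:

from time import sleep

def keep_alive(s, interval):
    # Poll a cheap endpoint often enough that the server never
    # considers the connection idle. `interval` must stay below the
    # server's (possibly unknown) connection timeout.
    while True:
        r = s.get("https://<host>:<port>/some/other/endpoint")
        if r.status_code != 200:
            break  # connection or session lost; caller should re-authenticate
        sleep(interval)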
One other thing to point out is that artificially extending the length of the session indefinitely this way makes you more vulnerable to session fixation attacks. You may want to consider adding logic that occasionally re-establishes the session to minimize the risk of such attacks.
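A minimal sketch of that idea, assuming /auth can simply be called again to obtain fresh cookies (uses the Session import from the question):

from time import time

def refreshed_session(s, credentials, started, max_age=3600):
    # Discard the session after `max_age` seconds and authenticate anew,
    # so a fixated or leaked cookie has a bounded lifetime.
    if time() - started > max_age:
        s.close()
        s = Session()
        s.post("https://<host>:<port>/auth", json=credentials)
        started = time()
    return s, started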

Related

Does setting socket timeout cancel the initial request?

I have a request that can only run once. At times, the request takes much longer than it should.
If I were to set a default socket timeout value (using socket.setdefaulttimeout(5)) and the request took longer than 5 seconds, would the original request be cancelled, so that it's safe to retry (see example code below)?
If not, what is the best way to cancel the original request and retry it, ensuring it never runs more than once?
import socket

from googleapiclient.discovery import build
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

@retry(
    retry=retry_if_exception_type(socket.timeout),
    wait=wait_fixed(4),
    stop=stop_after_attempt(3),
)
def create_file_once_only(creds, body):
    service = build('drive', 'v3', credentials=creds)
    file = service.files().create(body=body, fields='id').execute()

socket.setdefaulttimeout(5)
create_file_once_only(creds, body)
It's unlikely that this can be made to work as you hope. An HTTP POST (as with any other HTTP request) is implemented by sending a command to the web server, then receiving a response. The Python requests library encapsulates a lot of the tedious parts of that for you, but at the core it's going to do a socket send followed by a socket recv (it may of course require more than one send or recv, depending on the size of the data).
Now, if you were able to connect to the web server initially (again, this is taken care of for you by the requests library but typically only takes a few milliseconds), then it's highly likely that the data in your POST request has long since been sent. (If the data you are sending is megabytes long, it's possible that it's only been partially sent, but if it is reasonably short, it's almost certainly been sent in full.)
That in turn means that in all likelihood the server has received your entire request and is working on it or has enqueued your request to work on it eventually. In either case, even if you break the connection to the server by timing out on the recv, it's unlikely that the server will actually even notice that until it gets to the point in its execution where it would be sending its response to your request. By that point, it has probably finished doing whatever it was going to do.
In other words, your socket timeout is not going to apply to the "HTTP request" -- it applies to the underlying socket operations instead -- and almost certainly to the recv part on the tail end. And just breaking the socket connection doesn't cancel the HTTP request.
There is no reliable way to do what you want without designing a transactional protocol with the close cooperation of the HTTP server.
You could, still with the cooperation of the HTTP server, do something approximating it:
Create a unique ID (UUID or the like)
Send a request to the server that contains that UUID along with the other account info (name, password, whatever else)
The server then only creates the account if it hasn't already created an account with the same unique ID.
That way, you can request the operation multiple times, but know that it will only actually be implemented once. If asked to do the same operation a second time, the server would simply respond with "yep, already did that".
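A short sketch of the client side of this scheme, with the endpoint and field names purely illustrative:

import uuid
import requests

request_id = str(uuid.uuid4())  # generated once, reused verbatim on every retry
payload = {"request_id": request_id, "name": "alice", "password": "secret"}

# Safe to retry: a cooperating server that has already seen this
# request_id responds "already done" instead of acting twice.
response = requests.post("https://example.com/accounts", json=payload, timeout=5)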

How to close a SolrClient connection?

I am using SolrClient for Python with Solr 6.6.2. It works as expected, but I cannot find anything in the documentation about closing the connection after opening it.
def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know if the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent HTTP connection in Jetty is five seconds, as far as I remember, so you do not have to worry about the number of kept-alive connections exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. SolrClient uses a requests session internally, so it should re-use the underlying connection for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead, then request an available client rather than creating a new one each time.
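For instance, a small variant of the code from the question that creates the client once and re-uses its underlying session for every lookup:

# One client (and thus one requests session) for all document ids,
# instead of a fresh client per iteration.
solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))

def getdocbyid(docidlist):
    for doc_id in docidlist:
        print(solr.get('Collection_Test', doc_id=doc_id))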
I'm however pretty sure you won't run into any issues with the default settings.

Why do Python HTTP requests create TIME_WAIT connections?

I have this simple code, which connects to an external server. I call this function hundreds of times a minute, and after a while I get a "system lacked sufficient buffer" exception. When I viewed the connections using TCPView, it showed hundreds of connections to the external server in TIME_WAIT status.
Why this is happening?
Is the Python requests module not suitable for sending hundreds of requests? If so, what should I do?
def sendGetRequest(self, url, payload):
    success = True
    url = self.generateUrl(url)
    result = requests.get(url, params=urllib.parse.urlencode(payload))
    code = result.status_code
    text = result.text
    if code < 200 or code >= 300:
        success = False
    result.close()
    return success, code, text
You are closing a lot of connections you opened with requests at the client side, where the server expected them to be re-used instead.
Because HTTP is built on top of TCP, a bidirectional protocol, closing a socket on the client side means the socket can't fully close until the other end (the server end) acknowledges that the connection has been closed properly. Until that acknowledgement has been exchanged with the server (or until a timeout, set to twice the maximum segment lifetime, is reached), the socket remains in the TIME_WAIT state. In HTTP, closing normally happens on the server side, after a response has been completed; it is the server that'll wait for your client to acknowledge closure.
You see a lot of these on your side, because each new connection must use a new local port number. A server doesn't see nearly the same issues because it uses a fixed port number for the incoming requests, and that single port number can accept more connections even though there may be any number of outstanding TIME_WAIT connection states. A lot of local outgoing ports in TIME_WAIT on the other hand means you'll eventually run out of local ports to connect from.
This is not unique to Python or to requests.
What you instead should do is minimize the number of connections and minimize closing. Modern HTTP servers expect you to be reusing connections for multiple requests. You want to use a requests.Session() object, so it can manage connections for you, and then do not close the connections yourself.
You can also drastically simplify your function by using standard requests functionality; params already handles url encoding, for example, and comparisons already give you a boolean value you could assign directly to success:
session = requests.Session()

def sendGetRequest(self, url, payload):
    result = session.get(self.generateUrl(url), params=payload)
    success = 200 <= result.status_code < 300
    return success, result.status_code, result.text
Note that 3xx status codes are already handled automatically (requests follows redirects), so you could just use result.ok:
def sendGetRequest(self, url, payload):
    result = session.get(self.generateUrl(url), params=payload)
    return result.ok, result.status_code, result.text
Next, you may want to consider using asyncio coroutines (and aiohttp, still using sessions) to make all those check requests. That way your code doesn't have to sit idle while each request-response roundtrip completes, but can be doing something else in the intervening period. I've built applications that handle thousands of concurrent HTTP requests at a time without breaking a sweat, all the while doing lots of meaningful operations while slow network I/O operations complete.
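As a rough sketch of that approach (aiohttp assumed installed; URLs illustrative), the same GET check can run for many URLs concurrently:

import asyncio
import aiohttp

async def check(session, url):
    # One roundtrip; the event loop runs other checks while this one waits.
    async with session.get(url) as resp:
        return 200 <= resp.status < 300, resp.status, await resp.text()

async def main(urls):
    # One session (and its connection pool) shared by all requests.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(check(session, url) for url in urls))

results = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))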

How do I make sure TLS session tickets are being rotated?

I have established a simple TLS 1.2 session between a client and a server using Python's SSL module (running LibreSSL 2.2.7 under the hood) and am wondering if session tickets are automatically rotated.
It looks like the server is hinting to the client that the session ticket should only be valid for 300 seconds (Session Ticket Lifetime Hint: 300 seconds).
But it's been almost an hour and a new session ticket hasn't been issued as I expected. Meanwhile I exchanged some application data between the two parties, but that didn't seem to trigger anything.
Per RFC 4507 I understand the 300-second hint is not strictly required to be followed:
The ticket_lifetime_hint field contains a hint from the server
about how long the ticket should be stored. The value indicates
the lifetime in seconds as a 32-bit unsigned integer in network
byte order. A value of zero is reserved to indicate that the
lifetime of the ticket is unspecified. A client SHOULD delete the
ticket and associated state when the time expires. It MAY delete
the ticket earlier based on local policy. A server MAY treat a
ticket as valid for a shorter or longer period of time than what is
stated in the ticket_lifetime_hint.
But then how do I know if ticket rotation is happening? How do I check how long my client waits before rotating tickets?
The session ticket is issued by the server during the handshake, and to initiate a handshake you must either start a new connection with an empty ticket (by playing with the HTTP keep-alives, for example) or force a rehandshake on an established connection. Unfortunately, just keeping a connection open for a long time and waiting for something like a ticket update to happen is unlikely to work.
If you want to restart with new connections, either program your client to close and reopen connections from time to time, or try the HTTP Keep-Alive header on the server side, which is supposed to inform the client how it should behave.
Unfortunately, the behaviour of this header is not entirely certain: the header itself exists in RFC 2068, but its use is described in an RFC draft which has since expired.
An example of use:
Keep-Alive: timeout=300
The SSL rehandshake is possible if you have access to a low-level API. The server can then send a HelloRequest, forcing the client to start a rehandshake, and at that moment the client is supposed to ask for a new ticket if the previous one is considered expired.
In both cases, you should confirm with a network capture that it is behaving as expected. There is probably no way to see anything if you're not coding with a low-level language (Java, for example, allows you to code rehandshakes, but I'm not sure coding an entire server is worth it).
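One client-side way to observe this from Python (a sketch using the standard ssl module; host and port illustrative) is to save the session from one connection, resume with it later, and check whether resumption actually happened:

import socket
import ssl

ctx = ssl.create_default_context()

# First handshake: capture the session (which carries the ticket).
with ctx.wrap_socket(socket.create_connection(("example.com", 443)),
                     server_hostname="example.com") as s:
    session = s.session

# Second handshake: offer the saved session and see if it was resumed.
with ctx.wrap_socket(socket.create_connection(("example.com", 443)),
                     server_hostname="example.com",
                     session=session) as s:
    print("session reused:", s.session_reused)

If the ticket has expired or been rotated, session_reused will be False and a full handshake, with a fresh ticket, takes place; a network capture remains the authoritative check.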

What happens if an HTTP connection is closed while App Engine is still running

The real question is whether Google App Engine guarantees it will complete an HTTP request even if the connection no longer exists (e.g. terminated, lost Internet connection).
Say we have a Python script running on Google App Engine:
db.put(status = "Outputting")
print very_very_very_long_string_like_1GB
db.put(status = "done")
If the client decides to close the connection in the middle (too much data coming...), will status = "done" be executed? Or will the instance be killed and all following code be ignored?
If the client breaks the connection, the request will continue to execute, unless it reaches the deadline of 60 seconds.
GAE uses a pending queue to queue up requests. If the client drops the connection while the request is already in the queue or being executed, it will not be aborted. AFAIK all other HTTP servers behave the same way.
This becomes a real problem when you make state-changing requests (PUT, POST, DELETE) on mobile networks. On EDGE networks we see about 1% of large requests (uploads, ~500 kB) dropped in the middle of execution (execution takes about 1 s): the server gets the data and processes it, but the client does not receive the response, triggering a retry. This can produce duplicate data in the DB, breaking the integrity of that data.
To alleviate this you will need to make your web methods idempotent: repeating the same method with the same arguments does not change state. The easiest way to achieve this is one of the following:
Hash the relevant data and compare it to existing hashes. In your case that would be the string you are trying to save (very_very_very_long_string_like_1GB). You can do this server-side.
The client provides a unique request-scoped ID, and the server checks whether this ID has already been used (sketched below).
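A minimal in-memory sketch of the second approach; every name here is illustrative, and real code would persist the seen IDs in the datastore rather than in memory:

processed_ids = set()  # illustrative only; use durable storage in practice

def handle_create(request_id, data):
    # Repeating the same request_id is a no-op, making the method idempotent.
    if request_id in processed_ids:
        return "already done"
    store(data)  # hypothetical persistence call
    processed_ids.add(request_id)
    return "created"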
