I am creating service A that communicates with service B over http requests. Since my system needs to support high load of usage, I wanted to create a persistent connection.
I am working with python requests library. I thought of creating a requests.Session object and just keep using the same object for all http requests.
What happens if for some reason the underlying tcp connection was lost? How can I check the session object is still alive and healthy? I tried googling but haven't found anything...
Thank you :)
Related
So I have an Azure functions app written in python and quite often the code throws an error like this.
HTTPSConnectionPool(host='www.***.com', port=443): Max retries exceeded with url: /x/y/z (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7faba31d0438>: Failed to establish a new connection: [Errno 110] Connection timed out',))
This happens in a few diffrent functions that make https connections.
I contacted support and they told me that this was caused by SNAT port exhaustion and adviced me to: "Modify the application to reuse connections instead of creating a connection per request, use connection pooling, use service endpoints if the you are connecting to resources in Azure." They sent me this link https://4lowtherabbit.github.io/blogs/2019/10/SNAT/ and also this https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections
Problem is I am unsure about how to practically reuse and or pool connections in python and I am unsure what the primary cause of exhaustion is, as this data is not publicly available.
So I am looking for help with applying their advice to all our http(s) and database connections.
I made the assumption that pymongo and pyodbc (the database clients we use) would handle pooling an reuse despite me creating a new client each time a function runs. Is this incorrect and if so, how do I reuse these database clients in python to prevent this?
The problem has so far only been caused when using requests (or the zeep SOAP library that internally defaults to using requests) to hit a https endpoint. Is there any way I could improve how I use requests. Like reusing sessions or closing connections explicitly. I am aware that requests creates a session in the background when calling requests.get. But my knowledge about the library is insufficient to figure out if this is the problem and how I could solve it. I am thinking I might be able to create and reuse a single session instance for each specific http(s) call in each function, but I am unsure if this is correct and also I have no idea on how to actually do it.
In a few places I also use aiohttp and if possible would like to achive the same thing there.
I haven't looked into service endpoints yet but I am about to.
So in short. What can I in pratice do to ensure reusage/pooling with requests, pyodbc, pymongo and aiohttp?
I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
def getdocbyid(docidlist):
for id in docidlist:
solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
doc = solr.get('Collection_Test',doc_id=id)
print(doc)
I do not know if the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if it there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent http connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of connections being kept alive exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. solrclient uses a requests session internally, so it should do pipelining for subsequent queries. If you run into issues with this you can keep a set of clients available as a pool in your application instead, then request an available client instead of creating a new one each time.
I'm however pretty sure you won't run into any issues with the default settings.
Assuming that I can verify that a bunch of POST requests are in fact logically independent, how can I set up HTTP pipelining using python-requests and force it to allow POST requests in the pipeline?
Does someone have a working example?
P.S. for extra points, how to handle errors for outstanding requests if pipeline suddenly breaks?
P.P.S. grequests is not an option in this case.
Pipelining requests can be done with the builtin httplib, but only by accessing the connection and response objects below their public interface. This snippet demonstrates.
Edit: updated version for Python3: https://github.com/urllib3/urllib3/issues/52#issuecomment-109756116
The requests library does not support HTTP pipelining.
You can approximate pipelining by using grequests which makes it easier to run many requests in parallel, but each parallel request would still use a new TCP connection.
(requests does pool connections, keeping the TCP connection open if the remote server permits this, but that only helps for sequential connections, and request and response still have to alternate).
Using New Relic on a Tornado application, some external services are being tracked and some are not. I've noticed that the ones that work utilize httplib while the others are using Tornado's HTTP client, which directly communicates with a socket.
My assumption is that the New Relic agent is hooked into the httplib, because under the hood httplib uses the same socket.
Is there anyway to track these requests as well?
The New Relic python agent does not currently support Tornado's HTTP client, but keep an eye on the release notes for any changes in the future:
https://docs.newrelic.com/docs/releases/python
You can also find a list of currently instrumented external service modules here:
https://docs.newrelic.com/docs/python/instrumented-python-packages#external-web-services
EDIT:Question Updated. Thanks Slott.
I have a TCP Server in Python.
It is a server with asynchronous behaviour. .
The message format is Binary Data.
Currently I have a python client that interacts with the code.
What I want to be able to do eventually implement a Web based Front End to this client.
I just wanted to know , what should be correct design for such an application.
Start with any WSGI-based web server. werkzeug is a choice.
The Asynchronous TCP/IP is a seriously complicated problem. HTTP is synchronous. So using the synchronous web server presenting some asynchronous data is always a problem. Always.
The best you can do is to buffer things and have two processes in your web application.
TCP/IP process that collects data from the remove server and buffers it in a file (or files) somewhere.
WSGI web process which handles GET/POST processing.
GET requests will fetch some or all of the buffer and display it.
POST requests will send a message to the TCP/IP server.
For Web-based, talk HTTP. Use JSON or XML as data formats.
Be standards-compliant and make use of the vast number of libraries out there. Don't reinvent the wheel. This way you have less headaches in the long run.
if you need to maintain a connection to a backend server across multiple HTTP requests, Twisted's HTTP server is an ideal choice, since it's built to manage multiple connections easily.