I am developing a function on AWS Lambda/serverless and was wondering how to get connection pooling done. From a programmatic point of view I know how to establish a pymongo connection with pooling, but I have no idea how to achieve this with serverless, since it is stateless and every invocation would open a new connection (and, in theory, many at the same time).
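For reference, this is roughly how I set up pooling in a long-running process (a minimal sketch; the URI, pool size and database name are placeholders):

from pymongo import MongoClient

# pymongo manages an internal connection pool per client
client = MongoClient(
    "mongodb://localhost:27017",  # placeholder connection string
    maxPoolSize=50,               # placeholder pool size
)
db = client["mydb"]  # hypothetical database name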
Any advice?
So I have an Azure functions app written in python and quite often the code throws an error like this.
HTTPSConnectionPool(host='www.***.com', port=443): Max retries exceeded with url: /x/y/z (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7faba31d0438>: Failed to establish a new connection: [Errno 110] Connection timed out',))
This happens in a few different functions that make HTTPS connections.
I contacted support and they told me that this was caused by SNAT port exhaustion and advised me to: "Modify the application to reuse connections instead of creating a connection per request, use connection pooling, use service endpoints if you are connecting to resources in Azure." They sent me this link https://4lowtherabbit.github.io/blogs/2019/10/SNAT/ and also this https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections
The problem is that I am unsure how to practically reuse and/or pool connections in Python, and I am unsure what the primary cause of the exhaustion is, as this data is not publicly available.
So I am looking for help with applying their advice to all our http(s) and database connections.
I made the assumption that pymongo and pyodbc (the database clients we use) would handle pooling and reuse despite me creating a new client each time a function runs. Is this incorrect, and if so, how do I reuse these database clients in Python to prevent this?
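Would moving the clients to module scope, like this, be the correct fix? (A sketch; connection strings and names are placeholders.)

import pyodbc
from pymongo import MongoClient

# created once per worker process, reused across warm invocations
mongo_client = MongoClient("mongodb://localhost:27017")  # placeholder URI
sql_conn_str = "Driver={ODBC Driver 17 for SQL Server};Server=myserver;Database=mydb;UID=user;PWD=secret"  # placeholder

def main(req):  # simplified stand-in for the Azure Functions entry point
    doc = mongo_client.mydb.mycollection.find_one()  # hypothetical names
    conn = pyodbc.connect(sql_conn_str)  # cheap if ODBC-level pooling is enabled
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
    finally:
        conn.close()
    return doc, row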
The problem has so far only occurred when using requests (or the zeep SOAP library, which internally defaults to requests) to hit an HTTPS endpoint. Is there any way I could improve how I use requests, like reusing sessions or closing connections explicitly? I am aware that requests creates a session in the background when calling requests.get, but my knowledge of the library is insufficient to figure out whether this is the problem and how I could solve it. I am thinking I might be able to create and reuse a single session instance for each specific HTTP(S) call in each function, but I am unsure if this is correct, and I also have no idea how to actually do it.
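Something like this is what I have in mind for requests (an untested sketch; the URL is a placeholder):

import requests

# created once at module import, reused across warm invocations
session = requests.Session()

def main(req):  # simplified stand-in for the function entry point
    resp = session.get("https://www.example.com/x/y/z", timeout=10)  # placeholder URL
    resp.raise_for_status()
    return resp.text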
In a few places I also use aiohttp and, if possible, would like to achieve the same thing there.
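For aiohttp I imagine the equivalent would be a lazily created shared session (again an untested sketch; the URL is a placeholder):

import aiohttp

_session = None

async def get_session():
    # create the session on first use, inside the running event loop
    global _session
    if _session is None or _session.closed:
        _session = aiohttp.ClientSession()
    return _session

async def main(req):  # simplified stand-in for an async entry point
    session = await get_session()
    async with session.get("https://www.example.com/x/y/z") as resp:  # placeholder URL
        return await resp.text()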
I haven't looked into service endpoints yet but I am about to.
So, in short: what can I do in practice to ensure reuse/pooling with requests, pyodbc, pymongo and aiohttp?
I have a Python application which interacts with a Vertica database through the vertica-python client. Currently there is no connection pool to manage the connections; instead, a new connection is opened for every request and then closed at the end of the request. However, this design becomes costly when handling concurrent requests. The application runs behind uWSGI and an Nginx server to process multiple requests.
I would like to use an existing connection pool to handle connections to Vertica from Python, but I can't seem to find connection pools like c3p0 or HikariCP for Python. Could you please help me with connection pools for Python and Vertica?
For native Postgres, have a look at some of the connection pools discussed at Should PostgreSQL connections be pooled in a Python web app, or create a new connection per request?
For Vertica, it doesn't look like connection pooling is available in the native driver, though it might be worth posting an issue on GitHub if you'd like more specific details. You could probably use Vertica's ODBC driver through pyODBC, since that supports connection pooling if configured as discussed at http://www.unixodbc.org/doc/conn_pool.html
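A minimal sketch of that configuration plus pyodbc usage might look like this (the DSN, credentials and driver path are all placeholders):

# In odbcinst.ini, enable pooling in the driver manager, e.g.:
#
#   [ODBC]
#   Pooling = Yes
#
#   [Vertica]
#   Description = Vertica ODBC driver
#   Driver = /opt/vertica/lib64/libverticaodbc.so  # placeholder path
#   CPTimeout = 60                                 # keep idle pooled connections for 60s
#
import pyodbc

pyodbc.pooling = True  # must be set before the first connection; True by default

conn = pyodbc.connect("DSN=VerticaDSN;UID=dbadmin;PWD=secret")  # placeholder DSN
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchone())
conn.close()  # with pooling on, the driver manager keeps the connection for reuse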
What are methods of having persistent connections to MongoDB, instead of creating a MongoClient instance and using it when constructing queries? I noticed that it opens/closes a connection on each query operation.
I'm using Python, and have pymongo installed. I've looked around and didn't find much information on connection management. In light of this, what are general recommendations on database management?
Just have a global MongoClient at the top level of a Python module:
from pymongo import MongoClient

client = MongoClient(my_connection_string)
It's critical that you create one client at your application's startup. Use that same client for every operation for the lifetime of your application and never call "close" on it. This will provide optimal performance.
The client manages a connection pool and reuses connections as much as possible. It does not open and close a new connection per query; that would be awful. See PyMongo's docs for connection pooling.
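For example, other modules then import and reuse that one client (the module and collection names here are hypothetical):

# elsewhere in the application: import the shared client, never create a new one
from myapp.db import client  # hypothetical module that holds the global client

def get_user(user_id):
    # every call draws a connection from the client's internal pool
    return client.mydb.users.find_one({"_id": user_id})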
Is there a way to configure Pyramid so that when MongoDB fails over to a secondary replica, Pyramid starts using it?
Pyramid should be using the official Python MongoDB driver. The driver is configured to do this "automatically", but it needs the correct connection string.
See here for the connection strings.
One thing to keep in mind, the definition of "automatic fail-over" is not clear cut.
If you create a new connection to the DB that connection will point at the current primary.
If you use an existing connection from a pool, that connection may be pointing at the wrong server. In this case it will throw an exception the first time and should connect to the correct server the second time.
However, when a fail-over happens, there is a brief window (typically 2-10 seconds) during which there is no primary. If you use a connection during this period, the operation will fail because no member is primary.
Note that this is not specific to python, it's the way Replica Sets function.
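As an illustration, a small retry along these lines covers the case where the first use of a stale pooled connection fails (hosts and names are placeholders):

from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# a replica-set connection string lets the driver discover the current primary
client = MongoClient("mongodb://host1:27017,host2:27017/?replicaSet=rs0")  # placeholder hosts

def find_with_retry(query, retries=2):
    for attempt in range(retries):
        try:
            return client.mydb.mycollection.find_one(query)  # hypothetical names
        except AutoReconnect:
            # the first attempt may hit a connection that points at the old
            # primary; the driver reconnects before the next attempt
            if attempt == retries - 1:
                raise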
I've looked through stackoverflow and can see some oldish posts on this and wondered what the current thinking is about pooling connections in Python for MySQL.
We have a set of python processes that are threading with each thread creating a connection to MySQL. This all works fine, but we can have over 150 connections to MySQL.
When I look at the process state in MySQL I can see that most of the connections are asleep most of the time. The application connects to a Twitter streaming API, so it's busy, but this only accounts for a few connections.
Is there a good way of adding connection pooling to Python MySQL and can this be done simply without re-writing all of the existing code?
Many thanks.
PT
See DBUtils, which provides connection pooling (PooledDB) on top of any DB-API 2 driver.
If you have an abstraction layer for MySQL, you can modify that layer to avoid rewriting all the code.
If not, you have to hack your Python-MySQL driver.
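A minimal sketch with DBUtils' PooledDB and pymysql (credentials and pool size are placeholders), which could be dropped into such an abstraction layer without touching the query code:

import pymysql
from dbutils.pooled_db import PooledDB  # pip install DBUtils

# one pool per process; threads check connections out and return them
pool = PooledDB(
    creator=pymysql,      # any DB-API 2 module works here
    maxconnections=20,    # placeholder cap, well under the current 150
    host="localhost",     # placeholder credentials
    user="app",
    password="secret",
    database="tweets",
)

conn = pool.connection()
try:
    cur = conn.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchone())
    cur.close()
finally:
    conn.close()  # returns the connection to the pool instead of closing it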