I wish to avoid overrunning the HTTP connection pool - python

I am creating a tool that will make many simultaneous calls to a RESTful API. I am using the Python "Requests" module and the "threading" module. Once I stack too many simultaneous GETs on the system, I get exceptions like this:
ConnectionError: HTTPConnectionPool(host='xxx.net', port=80): Max retries exceeded with url: /thing/subthing/ (Caused by : [Errno 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full)
What can I do to either increase the buffer and queue space, or ask the Requests module to wait for an available slot?
(I know I could stuff it in a "try" loop, but that seems clumsy)

Use a session. If you use the requests.request family of methods (get, post, ...), each request will use its own session with its own connection pool, therefore not making any use of connection pooling.
If you need to fine-tune the number of connections used within a session, you can do this by changing its HTTPAdapter.
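A minimal sketch of that approach (the pool sizes are arbitrary example values, not ones from the question):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Keep more connections per host in the pool than the default of 10,
# so many threads can reuse sockets instead of opening new ones.
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=50)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Share this single session across all threads so every GET draws from one pool.
response = session.get('http://xxx.net/thing/subthing/')

Creating the session once (e.g. at module level) and handing it to the worker threads is what makes the pooling effective.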


What can cause a “Resource temporarily unavailable” on sock connect() command

I am debugging a Python flask application. The application runs atop uWSGI configured with 6 threads and 1 process. I am using Flask-Executor to offload some slower tasks. These tasks create a connection with the Flask application, i.e., the same process, and perform some HTTP GET requests. The executor is configured to use 2 threads max. This application runs on Ubuntu 16.04.3 LTS.
Every once in a while the threads in the executor completely stop working. The code uses the Python requests library to do the requests. The underlying error message is:
Action failed. HTTPSConnectionPool(host='somehost.com', port=443): Max retries exceeded with url: /api/get/value (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8d75bb5860>: Failed to establish a new connection: [Errno 11] Resource temporarily unavailable',))
The code that is running within the executor looks like this:
import requests

adapter = requests.adapters.HTTPAdapter(max_retries=3)  # retry failed connections up to 3 times
session = requests.Session()
session.mount('http://somehost.com:80', adapter)
session.headers.update({'Content-Type': 'application/json'})
...
session.get(uri, params=params, headers=headers, timeout=3)
I've spent a good amount of time trying to peel back the Python requests stack down to the C sockets that it uses. I've also tried reproducing this error using small C and Python programs. At first I thought it could be that sockets were not getting closed and we were running out of sockets as a resource, but that gives a message more along the lines of "Too many open files".
Setting aside the Python stack, what could cause a [Errno 11] Resource temporarily unavailable on a socket connect() command? Also, if you've run into this using requests, are there arguments that I could pass in to prevent this?
I've seen the What can cause a "Resource temporarily unavailable" on sock send() command StackOverflow post, but that one is about a send() command and not the initial connect(), which is where I suspect the code is getting hung up.
The error message Resource temporarily unavailable corresponds to the error code EAGAIN.
The connect() manpage states that the error EAGAIN occurs in the following situation:
No more free local ports or insufficient entries in the routing cache. For AF_INET see the description of /proc/sys/net/ipv4/ip_local_port_range in ip(7) for information on how to increase the number of local ports.
This can happen when very many connections to the same IP/port combination are in use and no local port for automatic binding can be found. You can check exactly which connections are causing this with
netstat -tulpen
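If you suspect ephemeral-port exhaustion, here is a small sketch (Linux-specific, not requests-specific) for inspecting the configured local port range from Python:

# The kernel's ephemeral port range used for automatic binding (Linux only).
with open('/proc/sys/net/ipv4/ip_local_port_range') as f:
    low, high = map(int, f.read().split())
print('local port range: %d-%d (%d ports available)' % (low, high, high - low + 1))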

Azure functions python - how to prevent SNAT port exhaustion?

So I have an Azure Functions app written in Python, and quite often the code throws an error like this:
HTTPSConnectionPool(host='www.***.com', port=443): Max retries exceeded with url: /x/y/z (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7faba31d0438>: Failed to establish a new connection: [Errno 110] Connection timed out',))
This happens in a few different functions that make https connections.
I contacted support and they told me that this was caused by SNAT port exhaustion and advised me to: "Modify the application to reuse connections instead of creating a connection per request, use connection pooling, use service endpoints if you are connecting to resources in Azure." They sent me this link https://4lowtherabbit.github.io/blogs/2019/10/SNAT/ and also this https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections
The problem is that I am unsure how to practically reuse and/or pool connections in Python, and I am unsure what the primary cause of the exhaustion is, as this data is not publicly available.
So I am looking for help with applying their advice to all our http(s) and database connections.
I made the assumption that pymongo and pyodbc (the database clients we use) would handle pooling and reuse despite me creating a new client each time a function runs. Is this incorrect, and if so, how do I reuse these database clients in Python to prevent this?
The problem has so far only occurred when using requests (or the zeep SOAP library, which internally defaults to using requests) to hit an https endpoint. Is there any way I could improve how I use requests, like reusing sessions or closing connections explicitly? I am aware that requests creates a session in the background when calling requests.get, but my knowledge of the library is insufficient to figure out whether this is the problem and how I could solve it. I am thinking I might be able to create and reuse a single session instance for each specific http(s) call in each function, but I am unsure if this is correct, and I have no idea how to actually do it.
In a few places I also use aiohttp, and if possible I would like to achieve the same thing there.
I haven't looked into service endpoints yet but I am about to.
So, in short: what can I do in practice to ensure reuse/pooling with requests, pyodbc, pymongo and aiohttp?
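A minimal sketch of the connection-reuse pattern the linked Azure docs describe, assuming a Python function app (the handler signature is simplified and www.example.com is a placeholder host): create the clients once at module scope so every invocation reuses the same pooled connections.

import requests

# Created once per worker process, at import time, and reused by every invocation.
session = requests.Session()

def main(req):
    # Each call draws a connection from the module-level session's pool
    # instead of opening (and using up a SNAT port for) a brand-new one.
    resp = session.get('https://www.example.com/x/y/z', timeout=10)
    resp.raise_for_status()
    return resp.text

The same module-scope pattern applies to pymongo.MongoClient, pyodbc connections and an aiohttp.ClientSession: create them once outside the function body rather than once per invocation.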

Python Post Requests using Requests library giving broken pipe error

I am running an API developed in Go which accepts POST requests over the LAN. My client uses Python to send some data (about 350 KB per request) to the server. The Python code is multithreaded and may be performing simultaneous POST requests, one per thread. The expected average rate between client and server is around 3 requests per second. The requests are not timing out, as the error gets raised much earlier.
I cannot seem to find the source of the error. The network should be robust, as both server and client are on a single 1 Gbps switch. Please help.
HTTPConnectionPool(host='192.168.1.105', port=8080): Max retries exceeded with url: /match (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)

Pass connected SSL Socket to another Process

I am struggling to find a mechanism to send a request to the target server and, when the socket has data to be read, pass the socket to another process to get the data out.
So far, using epoll on Linux, I have implemented it to the point that I do the handshake, I send the request and the request arrives; then I pass the socket fd to another process for further handling. I explicitly save the SSL session using PEM_write_bio_SSL_SESSION and then read it with PEM_read_bio_SSL_SESSION and add it to the context, but I cannot read the SSL socket in the other process because I get either an internal error or a handshake failure.
I've read this article but still couldn't find any mechanism to make it work. I know this is because OpenSSL is an application-level library, but there has to be a way, because Apache is already doing this.
At the very least, if it's not possible, is there a way to decrypt the data from the socket (which I can read normally) using the master key from OpenSSL's session?
The only way you can do this is by cloning the full user-space part of the SSL socket, which is spread over multiple internal data structures. Since you don't have access to all of these structures from Python, you can only do this by cloning the process, i.e. using fork.
Note that once you have forked the process, you should continue to work with the SSL socket in only one of the processes; it is not possible to fork, do some work in the child and then do some more work in the parent process. The reason is that as soon as one process uses the socket, the SSL state changes, but only in that process. In the other process the state is then out of sync, and any attempt to use this stale state later will cause strange errors.
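A minimal sketch of that constraint (Unix-only; example.com is a placeholder host): after fork, only the child keeps using the TLS connection, while the parent leaves it untouched.

import os
import socket
import ssl

ctx = ssl.create_default_context()
raw = socket.create_connection(('example.com', 443))
conn = ctx.wrap_socket(raw, server_hostname='example.com')  # TLS handshake happens here

pid = os.fork()  # both processes now share the kernel socket and a copy of the TLS state
if pid == 0:
    # Child: the only process allowed to keep using the TLS state.
    conn.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')
    print('child read %d bytes' % len(conn.recv(4096)))
    conn.close()
    os._exit(0)
else:
    # Parent: must not read, write or shut down the TLS connection any more,
    # otherwise its copy of the record state diverges from the child's.
    os.waitpid(pid, 0)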

Change the connection pool size for Python's "requests" module when in Threading

(Edit: perhaps I am wrong about what this error means. Is this indicating that the connection pool at my CLIENT is full, or that a connection pool at the SERVER is full and this is the error my client is being given?)
I am attempting to make a large number of http requests concurrently using the python threading and requests module. I am seeing this error in logs:
WARNING:requests.packages.urllib3.connectionpool:HttpConnectionPool is full, discarding connection:
What can I do to increase the size of the connection pool for requests?
This should do the trick:
import requests.adapters
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
session.mount('http://', adapter)
response = session.get("http://example.com/mypage")  # example.com is a placeholder host
Note: Use this solution only if you cannot control the construction of the connection pool (as described in @Jahaja's answer).
The problem is that urllib3 creates the pools on demand. It calls the constructor of the urllib3.connectionpool.HTTPConnectionPool class without parameters. The classes are registered in urllib3.poolmanager.pool_classes_by_scheme. The trick is to replace the classes with your own classes that have different default parameters:
def patch_http_connection_pool(**constructor_kwargs):
    """
    This allows to override the default parameters of the
    HTTPConnectionPool constructor.
    For example, to increase the poolsize to fix problems
    with "HttpConnectionPool is full, discarding connection"
    call this function with maxsize=16 (or whatever size
    you want to give to the connection pool)
    """
    from urllib3 import connectionpool, poolmanager

    class MyHTTPConnectionPool(connectionpool.HTTPConnectionPool):
        def __init__(self, *args, **kwargs):
            kwargs.update(constructor_kwargs)
            super(MyHTTPConnectionPool, self).__init__(*args, **kwargs)

    poolmanager.pool_classes_by_scheme['http'] = MyHTTPConnectionPool
Then you can call it to set new default parameters. Make sure this is called before any connection is made:
patch_http_connection_pool(maxsize=16)
If you use https connections you can create a similar function:
def patch_https_connection_pool(**constructor_kwargs):
    """
    This allows to override the default parameters of the
    HTTPSConnectionPool constructor.
    For example, to increase the poolsize to fix problems
    with "HttpsConnectionPool is full, discarding connection"
    call this function with maxsize=16 (or whatever size
    you want to give to the connection pool)
    """
    from urllib3 import connectionpool, poolmanager

    class MyHTTPSConnectionPool(connectionpool.HTTPSConnectionPool):
        def __init__(self, *args, **kwargs):
            kwargs.update(constructor_kwargs)
            super(MyHTTPSConnectionPool, self).__init__(*args, **kwargs)

    poolmanager.pool_classes_by_scheme['https'] = MyHTTPSConnectionPool
Jahaja's answer already gives the recommended solution to your problem, but it does not answer what is going on or, as you asked, what this error means.
Some very detailed information about this is in the official urllib3 documentation, the package requests uses under the hood to actually perform its requests. Here are the relevant parts for your question, with a few notes of my own added and the code examples omitted, since requests has a different API:
The PoolManager class automatically handles creating ConnectionPool instances for each host as needed. By default, it will keep a maximum of 10 ConnectionPool instances [Note: That's pool_connections in requests.adapters.HTTPAdapter(), and it has the same default value of 10]. If you’re making requests to many different hosts it might improve performance to increase this number
However, keep in mind that this does increase memory and socket consumption.
Similarly, the ConnectionPool class keeps a pool of individual HTTPConnection instances. These connections are used during an individual request and returned to the pool when the request is complete. By default only one connection will be saved for re-use [Note: That's pool_maxsize in HTTPAdapter(), and requests changes the default value from 1 to 10]. If you are making many requests to the same host simultaneously it might improve performance to increase this number
The behavior of the pooling for ConnectionPool is different from PoolManager. By default, if a new request is made and there is no free connection in the pool then a new connection will be created. However, this connection will not be saved if more than maxsize connections exist. This means that maxsize does not determine the maximum number of connections that can be open to a particular host, just the maximum number of connections to keep in the pool. However, if you specify block=True [Note: Available as pool_block in HTTPAdapter()] then there can be at most maxsize connections open to a particular host
Given that, here's what happened in your case:
All pools mentioned are CLIENT pools. You (or requests) have no control over any server connection pools
That warning is about HttpConnectionPool, i.e., the number of simultaneous connections made to the same host, so you could increase pool_maxsize to match the number of workers/threads you're using to get rid of the warning.
Note that requests is already opening as many simultaneous connections as you ask for, regardless of pool_maxsize. If you have 100 threads, it will open 100 connections. But with the default value only 10 of them will be kept in the pool for later reuse, and 90 will be discarded after completing the request.
Thus, a larger pool_maxsize increases performance to a single host by reusing connections, not by increasing concurrency.
If you're dealing with multiple hosts, then you might change pool_connections instead. The default is 10 already, so if all your requests are to the same target host, increasing it will not have any effect on performance (but it will increase the resources used, as noted in the documentation above).
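Putting those knobs together, here is a small sketch (the thread count, pool sizes and target host are placeholder assumptions) that sizes the per-host pool to the number of worker threads and blocks instead of opening throwaway connections:

import threading
import requests
from requests.adapters import HTTPAdapter

NUM_THREADS = 20  # assumption: match this to your own worker/thread count

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=10,       # number of per-host pools kept by the PoolManager (default 10)
    pool_maxsize=NUM_THREADS,  # connections kept per host; with pool_block it is also the cap
    pool_block=True,           # wait for a free pooled connection instead of opening extras
)
session.mount('http://', adapter)
session.mount('https://', adapter)

def worker():
    # example.com stands in for the real target host.
    session.get('http://example.com/', timeout=10)

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()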
In case anyone needs to do this with Python Zeep and wants to save a bit of time figuring it out, here is a quick recipe:
from zeep import Client
from requests import adapters as request_adapters
soap = "http://example.com/BLA/sdwl.wsdl"
wsdl_path = "http://example.com/PATH/TO_WSLD?wsdl"
bind = "Binding"
client = Client(wsdl_path) # Create Client
# switch adapter
session = client.transport.session
adapter = request_adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10)
# mount adapter
session.mount('https://', adapter)
binding = '{%s}%s' % (soap, bind)
# Create Service
service = client.create_service(binding, wsdl_path.split('?')[0])
Basically, the adapter should be mounted on the session before creating the service.
The answer is actually taken from a closed issue in the python-zeep repo; for reference I'll add it --> here
