Celery Redis ConnectionError('max number of clients reached',) - python

I have a Django application that leverages Celery for asynchronous tasks. I've been running into an issue where I hit the max number of Redis connections. I am fairly new to both Celery and Redis.
I'm confused because in my config I define CELERY_REDIS_MAX_CONNECTIONS = 20, which is the limit on my Redis plan.
For experimentation, I bumped the plan up and that solved the issue. I am confused, however, that I am running into this problem again after defining the max number of connections. I downgraded the plan and set the limit to the plan's max.
I am wondering if BROKER_POOL_LIMIT needs to be changed as well.
Is there anything I am missing that would help solve connection errors with Celery?
Is it possible to figure out how many connections all of my tasks need? I have 16 jobs running every minute.
Another thought: I noticed that connecting to the redis-cli threw the connection error. Is it possible that I am at the limit, and accessing the CLI is putting me over?
I also can't kill connections, because I cannot connect to the redis-cli while it is throwing this error.
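One point worth noting is that both of these settings apply per process: each web process and each Celery worker keeps its own pool, so the total seen by the Redis server is roughly (number of processes × pool size) plus result-backend connections, which is how a 20-connection plan can still be exhausted. A minimal settings sketch, assuming old-style uppercase Celery settings in Django's settings.py (the URL and numbers are illustrative, not a drop-in fix):

    # settings.py -- illustrative sketch
    import os

    BROKER_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379/0')
    BROKER_POOL_LIMIT = 1                # broker connections held open *per process*
    CELERY_REDIS_MAX_CONNECTIONS = 20    # caps the result-backend pool only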

Related

Is there any point in using connection pools with a non-multi-threaded web server?

I have built a web server in Python using the Flask framework and psycopg2, and I have some questions about concurrent processing as it relates to databases and the server itself. I am using gunicorn to start my app with a Procfile line of
web: gunicorn app:app
From my understanding, a web server like this processes requests one at a time. So, if someone makes a GET or POST request, the server must finish responding to that request before it can move on to another. If this is the case, then why would I need to make more than one connection/cursor object? For example, if someone makes a POST request that requires me to update something in the database, my server can't process other requests until I return out of that POST endpoint anyway, so that one connection object isn't bottlenecking anything, is it?
Ultimately, I am trying to allow my server to process a large number of requests simultaneously. In order to do so, I think I would first have to make multiple instances of my server, and THEN the connection pool comes into play, right? I think in order to make multiple instances of my server (apologies if any terminology is being used incorrectly here), I would do one of these things:
One way would be to use multiple threads; if the machine my application is deployed on in the cloud has multiple CPU cores, then it can do this(?). However, I have read that Python does not support "true multithreading", meaning a multi-threaded program is not actually running concurrently, it's just switching back and forth between those threads really quickly, so would this really be any different from my current setup?
The second way: use multiple gunicorn workers, or use multiple dynos. I think this route is the solution here, but I don't understand the theory of how to set it up at all. If I spawn additional gunicorn workers, what is happening behind the scenes? Would this all still run on my Heroku application instance? Does the number of cores I have access to on Heroku affect this in any way? Also, regardless of which way I pick, what would I be looking to change in the app.py code, or would the change be solely inside the Procfile?
Assuming I manage to set up multithreading or gunicorn workers, how would this affect the connection pool setup, and what should I do in regards to the connection pool? If anyone familiar with this can provide some theory, explanations, or resources, I would greatly appreciate it. Thanks all!
From my experience with Python, here's what I've learned...
If you are using multiple threads or async code, then you need to use a pool or an async connection.
If you have multiple processes and your code is strictly synchronous with no threads, then a pool is not necessary. You can reuse a single connection within each process, since connections are not shared between processes.
Threads usually don't speed up execution in Python, since Python will only ever run one thread at a time. They can help, though, when threads need to block.
For web servers, the true bottleneck is usually I/O: connecting to a database, reading a file, or whatever. Multiple processes, with those processes made async, give the greatest performance. Starlette is an async version of Flask... kinda, and is usually much faster when set up properly and using async libraries.
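To make the multi-worker route concrete: the Procfile change would be something like web: gunicorn app:app --workers 4, with no change needed in app.py beyond connection handling. If you also enable threaded workers, each worker process can share a pool. A minimal sketch, assuming an illustrative DSN, route, and pool sizes (none of these come from the question):

    # app.py -- illustrative sketch; DSN, route, and pool sizes are made up
    from flask import Flask, jsonify
    from psycopg2.pool import ThreadedConnectionPool

    app = Flask(__name__)

    # One pool per gunicorn worker process; safe to share across threads.
    pool = ThreadedConnectionPool(
        minconn=1, maxconn=5,
        dsn="dbname=mydb user=me password=secret host=localhost")

    @app.route("/items/<int:item_id>")
    def get_item(item_id):
        conn = pool.getconn()          # borrow a connection from the pool
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT name FROM items WHERE id = %s", (item_id,))
                row = cur.fetchone()
            return jsonify({"name": row[0]}) if row else (jsonify({"error": "not found"}), 404)
        finally:
            pool.putconn(conn)         # always return the connection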

gunicorn and/or celery: What is the way to get the best out of both?

I have a machine learning application which uses Flask to expose an API (for production this is not a good idea, but even if I use Django in the future, the idea of the question shouldn't change).
The main problem is how to serve multiple requests to my app. A few months back, Celery was added to get around this problem. The number of Celery workers spawned is equal to the number of cores present in the machine. For very few users this looked fine, and it was in production for some time.
When the number of concurrent users increased, it was evident that we should do performance testing on it. It turns out it is able to handle 20 users on a 30 GB, 8-core machine, without authentication and without any front end. That does not look like a good number.
I didn't know there were things like application servers, web servers, and model servers. When googling for this problem, gunicorn came up as a good application server for Python applications.
Should I use gunicorn or any other application server along with Celery, and why?
If I remove Celery and only use gunicorn with the application, can I achieve concurrency? I have read somewhere that Celery is not good for machine learning applications.
What are the purposes of gunicorn and Celery? How can we get the best out of both?
Note: the main goal is to maximize concurrency. Authentication will be added when serving in production. A front-end application might also come into play in production.
There is no shame in Flask. If in fact you just need a web API wrapper, Flask is probably a much better choice than Django (simply because Django is huge and you'd be using only a fraction of its capability).
However, your concurrency problems are apparently stemming from the fact that you are doing some heavy-duty processing for each request. There is simply no way around that; if you require a certain amount of computational resources per request, you can't magic those up. From here on, it's a juggling act.
If you want a guaranteed response immediately, you need to have as many workers as potential simultaneous requests. This may involve load balancing over multiple servers, if you can't scrounge up enough resources on one server. (cue gunicorn, a web application server, responsible for accepting connections and then distributing them to multiple application processes.)
If you are okay with not getting an immediate response, you can let stuff queue up. (Cue celery, a task queue, which worker processes can use to retrieve the next thing to be done and to deposit results.) This works best if you don't need a response in the same request-response cycle; e.g. the client submits a job and gets back only an acknowledgement that the job has been received; it takes a second request to ask about the status of the job, and possibly the results if it is finished.
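A minimal sketch of that acknowledge-then-poll pattern (the task body, routes, and Redis broker URL below are illustrative assumptions, not details from the question):

    # app.py -- illustrative sketch of submit-and-poll with Celery
    from celery import Celery
    from celery.result import AsyncResult
    from flask import Flask, jsonify

    app = Flask(__name__)
    celery = Celery(__name__, broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")

    @celery.task
    def predict(payload):
        return {"score": 0.42}   # stand-in for the heavy ML work

    @app.route("/jobs", methods=["POST"])
    def submit():
        job = predict.delay({"input": "..."})     # returns immediately
        return jsonify({"job_id": job.id}), 202   # acknowledgement only

    @app.route("/jobs/<job_id>")
    def status(job_id):
        res = AsyncResult(job_id, app=celery)
        if res.ready():
            return jsonify({"state": res.state, "result": res.get()})
        return jsonify({"state": res.state}), 202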
Alternatively, instead of Flask you could use websockets or Tornado to push the response out to the client when it is available (as opposed to the client polling for results, or holding a live HTTP connection open and tying up a server process).

Task queues end up with "(2062, 'Cloud SQL socket open failed with error: No such file or directory')"

We are building an application which uses heavy backend tasks (task queues), and in each task we do I/O against Google Cloud SQL.
GAE has a limit of 12 concurrent connections (not sure whether this is the issue; see https://stackoverflow.com/a/26155819/687692):
"Each App Engine instance running in a Standard environment or using Standard-compatible APIs cannot have more than 12 concurrent connections to a Google Cloud SQL instance." - https://cloud.google.com/sql/faq
Most of my backend tasks (100-500 tasks per second) are failing because of this issue.
I also checked the active connections for the last 4 days: I don't see any connection count going above 12.
So, what approach do I need to take to fix this? Connection pooling (how do I do that on GAE with Google Cloud SQL)? Or some other fix?
Any help or guidance is very much appreciated.
Let me know if anyone needs more information.
Thanks,
It's not likely that you would exceed the 12-connection limit with the standard Python App Engine scaling settings if the connections are handled properly.
To demonstrate, I created a small application that schedules many tasks, with each task acquiring a database connection and doing some work. I was able to run this test without hitting connection issues.
One thing worth making sure of is that you are not leaking any connections (i.e. not closing the connection in some code paths or when exceptions happen).
For MySQLdb, you can guarantee you are not leaking connections by using closing from contextlib:
    from contextlib import closing

    import MySQLdb

    def getDbConnection():
        return MySQLdb.connect(unix_socket='/cloudsql/instance_name',
                               db='db', user='user', charset='utf8')

    with closing(getDbConnection()) as db:
        # do stuff; the connection is guaranteed to be closed when this block exits
        ...

Celery/CloudAMQP error in a Heroku Flask App

I'm running a Flask app on Heroku (on the free tier) and running into some trouble when scheduling tasks using apply_async. If I schedule more than two tasks, I get a long stack trace with the exception:
AccessRefused(403, u"ACCESS_REFUSED - access to exchange 'celeryresults' in vhost 'rthtwchf' refused for user 'rthtwchf'", (40, 10), 'Exchange.declare')
The odd thing is that the first two tasks (before restarting all of my processes) always seem to complete with no issue.
A little bit of search-engine sleuthing leads me to https://stackoverflow.com/questions/21071906/celery-cannot-connect-remote-worker-with-new-username, which makes it look like a permissions issue, but I'd assume that the Heroku CloudAMQP service would have taken care of that already.
Any advice is appreciated!
I think your connections are exceeding 3 (the free plan's limit). Set BROKER_POOL_LIMIT to 1 and it will work.
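A minimal sketch of that change for a Flask app on Heroku, assuming the broker URL comes from the CLOUDAMQP_URL environment variable that the add-on sets (the app name is illustrative):

    # celery_app.py -- illustrative sketch; adjust names to your project
    import os

    from celery import Celery

    celery = Celery('app', broker=os.environ.get('CLOUDAMQP_URL', 'amqp://localhost'))
    celery.conf.update(BROKER_POOL_LIMIT=1)   # at most one pooled broker connection per process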

Do I need celery when I am using gevent?

I am working on a Django web app that has functions (say, for example, sync_files()) that take a long time to return. When I use gevent, my app does not block while sync_files() runs, and other clients can connect and interact with the web app just fine.
My goal is to have the web app responsive to other clients and not block. I do not expect a zillion users to connect to my web app (perhaps 20 connections max), and I do not want to set this up to become the next Twitter. My app is running on a VPS, so I need something lightweight.
So, in the case listed above, is it redundant to use Celery when I am using gevent? Is there a specific advantage to using Celery? I would prefer not to use Celery, since it is yet another service that will be running on my machine.
Edit: I found out that Celery can run its worker pool on gevent. I think I am now a little more unsure about the relationship between gevent and Celery.
In short, you do need Celery.
Even if you use gevent and have concurrency, the problem becomes the request timeout. Let's say your task takes 10 minutes to run, while a typical request timeout is about a minute. If you trigger the task directly within a view, the server will start processing it, but after a minute the client (browser) will probably drop the connection, since it will think the server is offline. As a result, your data can become corrupt, since you cannot guarantee what will happen when the connection closes. Celery solves this because it triggers a background process which handles the task independently of the view. The user gets the view response right away, and at the same time the server starts processing the task. That is the correct pattern for handling any scenario which requires lots of processing.
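A minimal sketch of that pattern in Django (sync_files() is the asker's hypothetical function, stubbed out here; the module layout is an illustrative assumption):

    # tasks.py -- illustrative sketch
    import time

    from celery import shared_task

    def sync_files():
        time.sleep(600)   # stand-in for the real 10-minute job

    @shared_task
    def sync_files_task():
        sync_files()      # runs in the Celery worker, not in the web process

    # views.py (shown inline for brevity)
    from django.http import JsonResponse

    def start_sync(request):
        result = sync_files_task.delay()             # returns almost immediately
        return JsonResponse({"task_id": result.id})  # client gets an ack right away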
