I'm running a Kubernetes cluster with three Celery pods, using a single Redis pod as the message queue. Celery version 4.1.0, Python 3.6.3, standard Redis pod from helm.
Under a sudden influx of tasks, the Celery pods stop processing tasks altogether. They are fine for the first few tasks, but eventually stop working and my tasks hang.
My tasks follow this format:
@app.task(bind=True)
def my_task(self, some_param):
    result = get_data(some_param)
    if result != expectation:
        self.retry(throw=False, countdown=5)
And are generally queued as follows:
from my_code import my_task
my_task.apply_async(queue='worker', kwargs=celery_params)
The relevant portion of my deployment.yaml:
command: ["celery", "worker", "-A", "myapp.implementation.celery_app", "-Q", "http"]
The only difference between this cluster and my local setup (managed with docker-compose) is that the cluster runs the prefork pool, while locally I run the eventlet pool so I can put together a code coverage report. I've tried running eventlet on the cluster, but it makes no difference: the tasks still hang.
Is there something I'm missing about running a Celery worker in Kubernetes? Is there a bug that could be affecting my results? Are there any good ways to break into the cluster to see what's actually happening with this issue?
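For reference, one hypothetical way to peek inside a worker pod (the pod name is a placeholder) is to exec in and run Celery's inspect commands:
kubectl exec -it <celery-pod> -- celery -A myapp.implementation.celery_app inspect active
kubectl exec -it <celery-pod> -- celery -A myapp.implementation.celery_app inspect reserved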
Running the Celery tasks without apply_async allowed me to debug this issue; it revealed a concurrency logic error in the tasks themselves. I highly recommend this method of debugging Celery tasks.
Instead of:
from my_code import my_task
celery_params = {'key': 'value'}
my_task.apply_async(queue='worker', kwargs=celery_params)
I used:
from my_code import my_task
celery_params = {'key': 'value'}
my_task(**celery_params)
This allowed me to locate the concurrency issue. After I had found the bug, I converted the code back to an asynchronous method call using apply_async.
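A related way to get the same synchronous behavior without touching the call sites is Celery's eager mode (a sketch; the setting names are from Celery 4.x):
# Run tasks locally and synchronously instead of sending them to the broker,
# and re-raise any exception a task throws.
app.conf.task_always_eager = True
app.conf.task_eager_propagates = True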
Related
We are scheduling tasks with Airflow using Celery as the executor; both the broker and the result backend are Redis. There are 200+ queues and 100+ workers, and the number of Redis connections is now 9200+, which is very close to the Redis max connection limit of 10000.
I have tried two ways to reduce the Redis connections (sketched below), but neither works well:
1. Set ignore_result=True in @app.task(). This reduces Redis connections significantly, but the task state in Celery is then discarded, and a failed task cannot be rescheduled by Airflow, since the scheduler never learns that the Celery task has failed.
2. Set BROKER_POOL_LIMIT=0. This reduces Redis connections only a little: after adding this setting to airflow.cfg, an idle queue drops from 10 connections to 8.
I have upgraded Celery from 4.0.2 to 4.1.0, but the problem is still there. Is there any other way to solve this properly?
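For reference, a minimal sketch of the two settings described above, written as plain Celery 4.x configuration (how these map into airflow.cfg may vary by Airflow version):

# 1) Per-task: do not store results in the Redis result backend.
@app.task(ignore_result=True)
def my_airflow_task():
    ...

# 2) Globally: disable the broker connection pool
#    (BROKER_POOL_LIMIT in old-style uppercase settings, broker_pool_limit in Celery 4.x).
app.conf.broker_pool_limit = 0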
I would like to leverage Celery (with RabbitMQ as backend MQ) to execute tasks of varying flavors via different Queues. One requirement is that consumption (by the workers) from a particular Queue should have the capability to be paused and resumed.
Celery seems to have this capability via add_consumer and cancel_consumer. While I was able to cancel the consumption of tasks from a queue for a particular worker, I cannot get the worker to resume consumption by calling add_consumer. The code to reproduce this issue is provided here. My guess is that I'm missing some parameter, either in the celeryconfig or in the arguments when starting the workers?
It would be great to get some fresh eyes on this. There is not much discussion of add_consumer on Stack Overflow or on GitHub, so I'm hoping some experts here are willing to share their thoughts/experience.
--
I am running the below:
Windows OS, RabbitMQ 3.5.6, Erlang 18.1, Python 3.3.5, celery 3.1.15
To resume consuming from a queue, you need to specify the queue name as well as the target workers. Here is how to do it:
app.control.add_consumer(queue='high', destination=['celery@asus'])
Here is the add_consumer signature:
def add_consumer(state, queue, exchange=None, exchange_type=None,
                 routing_key=None, **options):
In your case, you are calling it with:
app.control.add_consumer('high', destination=['celery@high1woka'])
So 'high' is being passed as state and queue is empty, which is why consumption does not resume.
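With both arguments passed by keyword, a paired pause/resume against the worker from the question would look something like this:

# stop the worker from consuming the 'high' queue
app.control.cancel_consumer(queue='high', destination=['celery@high1woka'])

# tell the same worker to start consuming from 'high' again
app.control.add_consumer(queue='high', destination=['celery@high1woka'])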
To get the Celery worker to resume working on Windows, my workaround is listed below:
1. Update celery: pip install celery==4.1.0
2. Update billiard/spawn.py: wrap lines 338 to 339 in try: ... except: pass
3. (optional) Install eventlet: pip install eventlet==0.22.1
4. Add --pool=eventlet or --pool=solo when starting workers, per the comment in https://github.com/celery/celery/issues/4178 (example command below)
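Putting the last point together, a worker start command might look like this (the app module and queue name are illustrative):
celery worker -A myapp.celery_app -Q high --pool=solo -l info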
I am using celery for my web application.
Celery executes parent tasks, which then execute a further pipeline of tasks.
The issues with Celery:
I can't get the dependency graph and visualizer that I get with Luigi, so I can't see the status of my parent task.
Celery does not provide a mechanism to restart a failed pipeline from where it failed.
These two things I can easily get from Luigi.
So I was thinking that once Celery runs the parent task, I would execute the Luigi pipeline inside that task.
Is there going to be any issue with that? I.e., I need to autoscale the Celery workers based on queue size; will that affect any Luigi workers across multiple machines?
I've never tried it, but I think it should be possible to call a Luigi task from inside a Celery task, the same way you would from Python code in general:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.CentralPlannerScheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
About scaling your queue and Celery workers: if you have many Celery workers calling Luigi tasks, you will of course need to scale your Luigi scheduler/daemon so it can handle the number of API requests (every time you call a task to be executed you hit the Luigi scheduler API; every N seconds, depending on your config, your tasks hit the scheduler API to say "I'm alive"; every time a task finishes, with error or success, you hit the scheduler API; and so on).
So yes, take a close look at your scheduler to see if it's receiving too many HTTP requests or if its database is becoming a bottleneck (Luigi uses SQLite by default, but you can easily change it to MySQL or PostgreSQL).
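If the scheduler's history database does turn out to be the bottleneck, the backend can be switched in Luigi's config file; a rough sketch of luigi.cfg, with the connection string as a placeholder:

[scheduler]
record_task_history = True

[task_history]
db_connection = mysql://user:password@dbhost/luigi_task_history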
UPDATE:
Since version 2.7.0, luigi.scheduler.CentralPlannerScheduler has been renamed to luigi.scheduler.Scheduler, as you may see here, so the above code should now be:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.Scheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
I'm testing Celery tasks and have stumbled on an issue. If a task contains code that makes a request (through urllib.urlopen), it hangs. What could the reasons be?
I tried starting with a minimal config with Flask.
I have tried both RabbitMQ and Redis as broker and backend, but the result is the same.
The file (run_celery.py) with tasks:
# ... import requests, Celery, and the Flask app ...

celery = Celery(
    app.import_name,
    backend=app.config['CELERY_BROKER_URL'],
    broker=app.config['CELERY_BROKER_URL']
)

@celery.task
def test_task(a):
    print(a)
    print(requests.get('http://google.com'))
I launched the worker like this:
celery -A run_celery.celery worker -l debug
After this, I run ipython and call the task:
from run_celery import test_task
test_task.apply_async(('sfas',))
The worker begins performing the task:
...
Received task: run_celery.test_task...
sfas
Starting new HTTP connection (1)...
And after this it hangs.
This behavior occurs only if the task contains a request.
What did I do wrong?
I found the reason in my code and was very surprised O_o. I don't know why this happens, but the file with the tasks imports a Model, and that import initializes a MagentoAPI instance (https://github.com/bernieke/python-magento). If I comment out this initialization, requests in Celery tasks work correctly.
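A sketch of one way to avoid triggering that at import time (the constructor arguments are placeholders): create the MagentoAPI client lazily inside the task instead of at module level.

# before: module-level init runs as soon as the worker imports the tasks file
# magento = MagentoAPI('magento-host', 8080, 'api_user', 'api_key')

def get_magento():
    # after: create the client only when a task actually needs it
    return MagentoAPI('magento-host', 8080, 'api_user', 'api_key')

@celery.task
def sync_task():
    magento = get_magento()
    ...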
I'm facing a basic issue while setting up python-rq - the rqworker doesn't seem to recognize jobs that are pushed to the queue it's listening on.
Everything runs inside a virtualenv.
I have the following code:
from redis import Redis
from rq import Queue
from rq.registry import FinishedJobRegistry
from videogen import videogen
import time
redis_conn = Redis(port=5001)
videoq = Queue('medium', connection=redis_conn)
fin_registry = FinishedJobRegistry(connection=redis_conn, name='medium')
jobid = 1024
job = videoq.enqueue(videogen, jobid)
while not job.is_finished:
    time.sleep(2)
print job.result
Here videogen is a simple function which immediately returns the integer parameter it receives.
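For reference, a stand-in matching that description (illustrative only) would be:

# videogen.py
def videogen(jobid):
    # the real function presumably generates a video; here it just echoes its input
    return jobid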
On running rqworker medium and starting the app, no result is printed. There are no extra traces from rqworker other than this:
14:41:29 RQ worker started, version 0.5.0
14:41:29
14:41:29 *** Listening on medium...
The Redis instance is accessible from the same shell where I run rqworker, and it even shows the updated keys:
127.0.0.1:5001> keys *
1) "rq:queues"
2) "rq:queue:medium"
3) "rq:job:9a46f9c5-03e1-4b08-946b-61ad2c3815b1"
So what is possibly missing here?
Silly error: I had to supply the Redis connection URL to rqworker:
rqworker --url redis://localhost:5001 medium
It's worth noting that this can also happen if you run your RQ workers on Windows, which the workers do not support. From the documentation:
RQ workers will only run on systems that implement fork(). Most notably, this means it is not possible to run the workers on Windows without using the Windows Subsystem for Linux and running in a bash shell.