Can I use Luigi with Python Celery?

I am using Celery for my web application. Celery executes parent tasks, which then execute a further pipeline of tasks.
The issues with Celery:
I can't get the dependency graph and visualizer that I get with Luigi, so I can't see the status of my parent task.
Celery does not provide a mechanism to restart a failed pipeline and resume from where it failed.
These two things I can easily get from Luigi.
So I was thinking that once Celery runs the parent task, I would execute the Luigi pipeline inside that task.
Is there going to be any issue with that? I need to autoscale the Celery workers based on queue size; will that affect any Luigi workers across multiple machines?

Never tried it, but I think it should be possible to call a Luigi task from inside a Celery task, the same way you would call it from Python code in general:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.CentralPlannerScheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
About scaling your queue and Celery workers: if you have too many Celery workers calling Luigi tasks, you will of course need to scale your Luigi scheduler/daemon so it can handle the number of API requests. Every time you submit a task for execution you hit the Luigi scheduler API; every N seconds (it depends on your config) your tasks hit the scheduler API to say "I'm alive"; every time a task finishes (with error or success) you hit the scheduler API; and so on.
So yes, take a close look at your scheduler to see whether it is receiving too many HTTP requests or whether its database is becoming a bottleneck (Luigi uses SQLite by default, but you can easily change it to MySQL or Postgres).
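For instance, moving the scheduler's task-history database off SQLite is a small configuration change. A sketch of luigi.cfg, assuming the task-history feature is enabled; the paths and connection string are placeholders:
# luigi.cfg -- placeholders for the state path and connection string
[scheduler]
record_task_history = True
state_path = /var/lib/luigi/state.pickle

[task_history]
db_connection = mysql://luigi:password@mysql-host/luigi_task_history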
UPDATE:
Since version 2.7.0, luigi.scheduler.CentralPlannerScheduler has been renamed to luigi.scheduler.Scheduler, as you can see here, so the above code should now be:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.Scheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
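Putting the two together, here is a minimal sketch of a Celery task that kicks off a Luigi pipeline. The app name, broker URL and task parameters are placeholders, not from the original post:
from celery import Celery
from luigi import scheduler, worker
from foobar import MyTask

# Placeholder app name and broker URL
celery_app = Celery('myapp', broker='redis://localhost:6379/0')

@celery_app.task
def run_luigi_pipeline(param, other_param):
    # Runs the pipeline with an in-process Luigi scheduler, as in the
    # snippet above; use luigi.build([...]) instead if the tasks should
    # register with a central luigid daemon.
    task = MyTask(param, other_param)
    sch = scheduler.Scheduler()
    w = worker.Worker(scheduler=sch)
    w.add(task)
    w.run()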

Related

How to test code that creates Celery tasks?

I've read Testing with Celery but I'm still a bit confused. I want to test code that generates a Celery task by running the task manually and explicitly, something like:
def test_something(self):
    do_something_that_generates_a_celery_task()
    assert_state_before_task_runs()
    run_task()
    assert_state_after_task_runs()
I don't want to entirely mock up the creation of the task but at the same time I don't care about testing the task being picked up by a Celery worker. I'm assuming Celery works.
The actual context in which I'm trying to do this is a Django application where there's some code that takes too long to run in a request, so, it's delegated to background jobs.
In test mode use CELERY_TASK_ALWAYS_EAGER = True. You can set this in your Django settings.py if you have followed the default guide for django-celery configuration.
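For example, in the settings used by the test run (a sketch; on older Celery/django-celery the names are CELERY_ALWAYS_EAGER and CELERY_EAGER_PROPAGATES_EXCEPTIONS instead):
# settings.py (test configuration)
CELERY_TASK_ALWAYS_EAGER = True      # run tasks synchronously, in the calling process
CELERY_TASK_EAGER_PROPAGATES = True  # re-raise exceptions from eagerly executed tasks
With that in place, do_something_that_generates_a_celery_task() executes the task body immediately, so the asserts in the test above run against the resulting state.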

Celery Does Not Process Task in Kubernetes with Redis

I'm running a Kubernetes cluster with three Celery pods, using a single Redis pod as the message queue. Celery version 4.1.0, Python 3.6.3, standard Redis pod from helm.
At a seemingly quick influx of tasks, the Celery pods stop processing tasks altogether. They will be fine for the first few tasks, but then eventually stop working and my tasks hang.
My tasks follow this format:
@app.task(bind=True)
def my_task(self, some_param):
    result = get_data(some_param)
    if result != expectation:
        self.retry(throw=False, countdown=5)
And are generally queued as follows:
from my_code import my_task
my_task.apply_async(queue='worker', kwargs=celery_params)
The relevant portion of my deployment.yaml:
command: ["celery", "worker", "-A", "myapp.implementation.celery_app", "-Q", "http"]
The only difference between this cluster and my local cluster (which I manage with docker-compose) is that the cluster runs a prefork pool, while locally I run an eventlet pool to be able to put together a code coverage report. I've tried running eventlet on the cluster, but I see no difference in the results; the tasks still hang.
Is there something I'm missing about running a Celery worker in Kubernetes? Is there a bug that could be affecting my results? Are there any good ways to break into the cluster to see what's actually happening with this issue?
Running the celery tasks without apply_async allowed me to debug this issue, showing that there was a concurrency logic error in the Celery tasks. I highly recommend this method of debugging Celery tasks.
Instead of:
from my_code import my_task
celery_params = {'key': 'value'}
my_task.apply_async(queue='worker', kwargs=celery_params)
I used:
from my_code import my_task
celery_params = {'key': 'value'}
my_task(**celery_params)
This allowed me to locate the concurrency issue. After I had found the bug, I converted the code back to an asynchronous method call using apply_async.

Celery/Django: Get result of periodic task execution

I have a Django 1.7 project using Celery (latest). I have a REST API that receives some parameters, and creates, programmatically, a PeriodicTask. For testing, I'm using a period of seconds:
periodic_task, _ = PeriodicTask.objects.get_or_create(name=task_label, task=task_name, interval=interval_schedule)
I store a reference to this task somewhere. I start celery beat:
python manage.py celery beat
and a worker:
python manage.py celery worker --loglevel=info
and my task runs as I can see in the worker's output.
I've set the result backend:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
and with that, I can check the task results using the TaskMeta model. The objects there contain the task_id (the same one I would get if I called the task with .delay() or .apply_async()), the status, the result, everything. Beautiful.
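For reference, checking those results looks roughly like this (a sketch, assuming django-celery's TaskMeta model and its date_done field):
from djcelery.models import TaskMeta

# The most recent results stored by the database backend
for meta in TaskMeta.objects.order_by('-date_done')[:10]:
    print(meta.task_id, meta.status, meta.result)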
However, I can't find a connection between the PeriodicTask object and TaskMeta.
PeriodicTask has a task property, but it's just the task name/path. Its id is just a consecutive number, not the task_id from TaskMeta, and I really need to be able to match a task that was executed as a PeriodicTask with its TaskMeta entry so I can offer some monitoring of its status. TaskMeta doesn't have any other value that lets me identify which task ran (since I will have several), even just to report the status of the last execution.
I've checked all over Celery docs and in here, but no solution so far.
Any help is highly appreciated.
Thanks
You can run a service that monitors which tasks have been performed by using the command line:
python manage.py celerycam --frequency=10.0
More detail at:
http://www.lexev.org/en/2014/django-celery-setup/

Celery task hangs when making an HTTP request

I'm testing Celery tasks and have stumbled on an issue. If a task contains code that makes an HTTP request (through urllib.urlopen), it hangs. What could be the reasons?
I tried to start with a minimal Flask config.
I used RabbitMQ and Redis as broker and backend, but the result is the same.
File with tasks (run_celery.py):
# ...import Celery and the Flask app...
import requests

celery = Celery(
    app.import_name,
    backend=app.config['CELERY_BROKER_URL'],
    broker=app.config['CELERY_BROKER_URL']
)

@celery.task
def test_task(a):
    print(a)
    print(requests.get('http://google.com'))
I launched the worker like this:
celery -A run_celery.celery worker -l debug
After this, I run IPython and call the task:
from run_celery import test_task
test_task.apply_async(('sfas',))
The worker starts performing the task:
...
Received task: run_celery.test_task...
sfas
Starting new HTTP connection (1)...
And after this it hangs.
This behavior occurs only if the task contains a request.
What did I do wrong?
I found the reason in my own code and was very surprised O_o. I don't know why this happens, but the file with the tasks imports a Model, and importing it initializes a MagentoAPI instance (https://github.com/bernieke/python-magento). If I comment out this initialization, the requests in the Celery tasks perform correctly.
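If that initialization has to stay, one way around it is to defer the client creation until it is actually needed instead of running it at import time. A minimal sketch, with a hypothetical module layout and connection details:
# models.py (hypothetical) -- avoid creating the MagentoAPI client at import time
from magento import MagentoAPI

_magento = None

def get_magento():
    global _magento
    if _magento is None:
        # Hypothetical connection parameters
        _magento = MagentoAPI('magento.example.com', 80, 'api_user', 'api_key')
    return _magento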

How can concurrency per task be controlled with pceleryd?

Can I have finer-grained control over the number of Celery workers running per task? I'm running Pyramid applications and using pceleryd for async work.
From the ini file:
CELERY_IMPORTS = ('learning.workers.matrix_task',
                  'learning.workers.pipeline',
                  'learning.workers.classification_task',
                  'learning.workers.metric')
CELERYD_CONCURRENCY = 6
From learning.workers.matrix_task:
from celery import Task

class BuildTrainingMatrixTask(Task):
    ....

class BuildTestMatrixTask(Task):
    ....
I want up to 6 BuildTestMatrixTask tasks running at a time, but only 1 BuildTrainingMatrixTask running at a time. Is there a way to accomplish this?
You can send tasks to separate queues according to their type, i.e. BuildTrainingMatrixTask to the first queue (let it be named 'training_matrix') and BuildTestMatrixTask to the second one ('test_matrix'). See Routing Tasks for details. Then you should start a worker for each queue with the desired concurrency:
$ celery worker --queues 'test_matrix' --concurrency=6
$ celery worker --queues 'training_matrix' --concurrency=1
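A sketch of the routing side, using the old-style settings to match the config above (the task names assume Celery's default naming for class-based tasks):
CELERY_ROUTES = {
    'learning.workers.matrix_task.BuildTestMatrixTask': {'queue': 'test_matrix'},
    'learning.workers.matrix_task.BuildTrainingMatrixTask': {'queue': 'training_matrix'},
}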
