I'm testing Celery tasks and have stumbled on an issue: if a task contains code that makes an HTTP request (through urllib.urlopen or requests), it hangs. What could the reasons be?
I tried starting from a minimal config with Flask.
I used RabbitMQ and Redis as the broker and backend, but the result is the same.
The file (run_celery.py) with the tasks:
import requests
from celery import Celery
# ... import the Flask app as `app` ...

celery = Celery(
    app.import_name,
    backend=app.config['CELERY_BROKER_URL'],
    broker=app.config['CELERY_BROKER_URL']
)

@celery.task
def test_task(a):
    print(a)
    print(requests.get('http://google.com'))
I launched the worker like this:
celery -A run_celery.celery worker -l debug
After this, I ran ipython and called the task:
from run_celery import test_task
test_task.apply_async(('sfas',))
The worker begins to perform the task:
...
Received task: run_celery.test_task...
sfas
Starting new HTTP connection (1)...
And after this it hangs.
This behavior occurs only if the task contains an HTTP request.
What did I do wrong?
I found the reason in my code and was very surprised O_o. I don't know why this happens, but the file with the tasks imports a Model module, and executing that import initializes a MagentoAPI instance (https://github.com/bernieke/python-magento). If I comment out this initialization, requests in Celery tasks perform correctly.
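A likely workaround is to defer the MagentoAPI construction out of import time. A minimal sketch (the module layout, host, and credentials are placeholders):

from magento import MagentoAPI  # python-magento

# Previously constructed at module import time, which broke HTTP
# requests inside the worker processes:
# magento = MagentoAPI('magento.example.com', 80, 'api_user', 'api_key')

_magento = None

def get_magento():
    # build the client lazily, on first use inside a task
    global _magento
    if _magento is None:
        _magento = MagentoAPI('magento.example.com', 80, 'api_user', 'api_key')
    return _magento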
Following this guide (Complex Example: Showing Status Updates and Results section), I have two Flask endpoints: one for starting a bound task (a POST request) and one for retrieving a task's result by its id (a GET request to /status/<task_id>).
Running the Flask app with flask run in one shell and celery worker -A app.celery -l info in another, it is possible to run a task and then get its result with a GET request to the /status/ endpoint.
After adding gunicorn and setting the number of workers to 3, POST requests run normally, but getting the status of a specific running task is a problem: the endpoint can't get the task (task.info is None). There is a chance of getting the task result from the status endpoint, but if I understand the problem correctly, it depends on which Flask instance gunicorn routes the request to.
I don't set any specific Celery settings, only the broker and result_backend (using RabbitMQ).
How do I correctly configure gunicorn + Flask + Celery for this sort of task?
Fixed by using Redis as the result backend (or, I believe, any backend other than RPC).
According to the documentation:
The RPC result backend (rpc://) is special as it doesn’t actually store the states, but rather sends them as messages. This is an important difference as it means that a result can only be retrieved once, and only by the client that initiated the task. Two different processes can’t wait for the same result.
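A minimal sketch of the change, assuming the usual Flask + Celery wiring from the guide (both URLs are placeholders):

from celery import Celery

# Redis (unlike rpc://) actually stores task states, so any of gunicorn's
# worker processes can read a result, not only the one that queued the task
celery = Celery(
    app.import_name,
    broker='amqp://guest@localhost//',    # RabbitMQ stays as the broker
    backend='redis://localhost:6379/0',   # shared, queryable result store
)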
I'm running a Kubernetes cluster with three Celery pods, using a single Redis pod as the message queue. Celery version 4.1.0, Python 3.6.3, standard Redis pod from Helm.
At a seemingly quick influx of tasks, the Celery pods stop processing tasks altogether. They are fine for the first few tasks, but eventually stop working and my tasks hang.
My tasks follow this format:
@app.task(bind=True)
def my_task(self, some_param):
    result = get_data(some_param)
    if result != expectation:
        self.retry(throw=False, countdown=5)
And are generally queued as follows:
from my_code import my_task
my_task.apply_async(queue='worker', kwargs=celery_params)
The relevant portion of my deployment.yaml:
command: ["celery", "worker", "-A", "myapp.implementation.celery_app", "-Q", "http"]
The only difference between this cluster and my local cluster, which I manage with docker-compose, is that the cluster runs a prefork pool while locally I run an eventlet pool so I can put together a code coverage report. I've tried running eventlet on the cluster, but I see no difference in the results; the tasks still hang.
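For reference, switching the pool is just the worker's -P flag (with eventlet installed in the image); the app and queue names here match the deployment command below:

celery worker -A myapp.implementation.celery_app -Q http -P eventlet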
Is there something I'm missing about running a Celery worker in Kubernetes? Is there a bug that could be affecting my results? Are there any good ways to break into the cluster to see what's actually happening with this issue?
Running the celery tasks without apply_async allowed me to debug this issue, showing that there was a concurrency logic error in the Celery tasks. I highly recommend this method of debugging Celery tasks.
Instead of:
from my_code import my_task
celery_params = {'key': 'value'}
my_task.apply_async(queue='worker', kwargs=celery_params)
I used:
from my_code import my_task
celery_params = {'key': 'value'}
my_task(**celery_params)
This allowed me to locate the concurrency issue. After I had found the bug, I converted the code back to an asynchronous method call using apply_async.
I am using Celery for my web application.
Celery executes parent tasks, which then execute a further pipeline of tasks.
The issues with Celery:
I can't get the dependency graph and visualizer that I get with Luigi to see the status of my parent task.
Celery does not provide a mechanism to restart a failed pipeline from where it failed.
These two things I can easily get from Luigi.
So I was thinking that once Celery runs the parent task, I execute the Luigi pipeline inside that task.
Is there going to be any issue with that? I.e., I need to autoscale the Celery workers based on queue size; will that affect any Luigi workers across multiple machines?
I've never tried it, but I think it should be possible to call a Luigi task from inside a Celery task, the same way you would do it from Python code in general:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.CentralPlannerScheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
About scaling your queue and Celery workers: if you have too many Celery workers calling Luigi tasks, you will of course need to scale your Luigi scheduler/daemon so it can handle the number of API requests (every time you call a task to be executed, you hit the Luigi scheduler API; every N seconds, depending on your config, your running tasks hit the scheduler API to say "I'm alive"; every time a task finishes, with error or success, you hit the scheduler API; and so on).
So yes, take a close look at your scheduler to see whether it's receiving too many HTTP requests or whether its database is becoming a bottleneck (Luigi uses SQLite by default, but you can easily change it to MySQL or Postgres).
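If the history database turns out to be the bottleneck, a hedged sketch of the relevant luigi.cfg entries (the connection string is a placeholder):

[scheduler]
record_task_history = true

[task_history]
db_connection = postgresql://luigi:secret@dbhost/luigi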
UPDATE:
Since version 2.7.0, luigi.scheduler.CentralPlannerScheduler has been renamed to luigi.scheduler.Scheduler, as you may see here, so the above code should now be:
from foobar import MyTask
from luigi import scheduler, worker

task = MyTask(123, 'another parameter value')
sch = scheduler.Scheduler()
w = worker.Worker(scheduler=sch)
w.add(task)
w.run()
Using "send_task" celery actually never verifies a remote task exists i.e:
app.send_task('tasks.i.dont.exist', args=[], kwargs={})
Celery still happily returns an AsyncResult, i.e.:
<AsyncResult: b8c1425a-7411-491f-b75a-34313832b8ba>
Is there a way for it to fail if the remote task does not exist?
I've tried adding .get() and it just freezes.
According to the documentation:
If the task is not registered in the current process then you can also
execute a task by name.
You do this by using the send_task() method of the celery instance
If you want verification, consider using delay instead.
You can read more about how to execute celery tasks here.
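For illustration, a minimal sketch of the difference (the myapp.tasks module is an assumption):

# delay() requires importing the task object, so a missing task fails
# loudly at import time; send_task() queues a message for any name at all
from myapp.tasks import existing_task  # ImportError here if the task doesn't exist

existing_task.delay()                                     # verified locally
app.send_task('tasks.i.dont.exist', args=[], kwargs={})  # never verified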
I have a Django 1.7 project using Celery (latest). I have a REST API that receives some parameters, and creates, programmatically, a PeriodicTask. For testing, I'm using a period of seconds:
periodic_task, _ = PeriodicTask.objects.get_or_create(
    name=task_label, task=task_name, interval=interval_schedule)
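For context, a hedged sketch of how the interval_schedule above might be built with django-celery (the every/period values are illustrative):

from djcelery.models import IntervalSchedule

# run every 10 seconds (illustrative values)
interval_schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10, period='seconds')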
I store a reference to this task somewhere. I start celery beat:
python manage.py celery beat
and a worker:
python manage.py celery worker --loglevel=info
and my task runs, as I can see in the worker's output.
I've set the result backend:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
and with that, I can check the task results using the TaskMeta model. The objects there contain the task_id (the same one I would get if I called the task with .delay() or .apply_async()), the status, the result, everything. Beautiful.
However, I can't find a connection between the PeriodicTask objects and TaskMeta.
PeriodicTask has a task property, but it's just the task name/path. Its id is just a sequential number, not the task_id from TaskMeta, and I really need to be able to match a task that was executed as a PeriodicTask to its TaskMeta row so I can offer some monitoring of its status. TaskMeta doesn't have any other value that lets me identify which task ran (since I will have several), so at the very least I'd like to report the status of the last execution.
I've checked all over the Celery docs and on here, but no solution so far.
Any help is highly appreciated.
Thanks
You can run a service that monitors which tasks have been performed by using the command line:
python manage.py celerycam --frequency=10.0
More details at:
http://www.lexev.org/en/2014/django-celery-setup/
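With celerycam snapshotting events, django-celery's TaskState model records the name of each executed task, so you can correlate runs of a PeriodicTask by its task path. A hedged sketch (the task path is an assumption):

from djcelery.models import TaskState

# latest recorded run of the task that the PeriodicTask points at
last_run = (TaskState.objects
            .filter(name='myapp.tasks.my_periodic_task')
            .order_by('-tstamp')
            .first())
if last_run is not None:
    print(last_run.state, last_run.task_id, last_run.tstamp)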