Celery, periodic task execution, with concurrency

Celery, periodic task execution, with concurrency - python

I would like to launch a periodic task every second but only if the previous task ended (db polling to send task to celery).
In the Celery documentation they are using the Django cache to make a lock.
I tried to use the example:
from __future__ import absolute_import
import datetime
import time
from celery import shared_task
from django.core.cache import cache
LOCK_EXPIRE = 60 * 5
#shared_task
def periodic():
acquire_lock = lambda: cache.add('lock_id', 'true', LOCK_EXPIRE)
release_lock = lambda: cache.delete('lock_id')
a = acquire_lock()
if a:
try:
time.sleep(10)
print a, 'Hello ', datetime.datetime.now()
finally:
release_lock()
else:
print 'Ignore'
with the following configuration:
app.conf.update(
CELERY_IGNORE_RESULT=True,
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
CELERYBEAT_SCHEDULE={
'periodic_task': {
'task': 'app_task_management.tasks.periodic',
'schedule': timedelta(seconds=1),
},
},
)
But in the console, I never see the Ignore message and I have Hello every second. It seems that the lock is not working fine.
I launch the periodic task with:
celeryd -B -A my_app
and the worker with:
celery worker -A my_app -l info
Could you please correct my misunderstanding?

From the Django Cache Framework documentation about local-memory cache:
Note that each process will have its own private cache instance, which
means no cross-process caching is possible.
So basically your workers are each dealing with their own cache. If you need a low resource cost cache backend I would recommend File Based Cache or Database Cache, both allow cross-process.

Related

Celery not consuming tasks when queue name is specified [duplicate]

By default Celery send all tasks to 'celery' queue, but you can change this behavior by adding extra parameter:
#task(queue='celery_periodic')
def recalc_last_hour():
log.debug('sending new task')
recalc_hour.delay(datetime(2013, 1, 1, 2)) # for example
Scheduler settings:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15 # every 15 sec for test
},
}
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
Run worker:
python manage.py celery worker -c 1 -Q celery_periodic -B -E
This scheme doesn't work as expected: this workers sends periodic tasks to 'celery' queue, not 'celery_periodic'. How can I fix that?
P.S. celery==3.0.16

Periodic tasks are sent to queues by celery beat where you can do everything you do with the Celery API. Here is the list of configurations that comes with celery beat:
https://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#available-fields
In your case:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15, # every 15 sec for test
'options': {'queue' : 'celery_periodic'}, # options are mapped to apply_async options
},
}

I found solution for this problem:
1) First of all I changed the way for configuring periodic tasks. I used #periodic_task decorator like this:
#periodic_task(run_every=crontab(minute='5'),
queue='celery_periodic',
options={'queue': 'celery_periodic'})
def recalc_last_hour():
dt = datetime.utcnow()
prev_hour = datetime(dt.year, dt.month, dt.day, dt.hour) \
- timedelta(hours=1)
log.debug('Generating task for hour %s', str(prev_hour))
recalc_hour.delay(prev_hour)
2) I wrote celery_periodic twice in params to #periodic_task:
queue='celery_periodic' option is used when you invoke task from code (.delay or .apply_async)
options={'queue': 'celery_periodic'} option is used when celery beat invokes it.
I'm sure, the same thing is possible if you'd configure periodic tasks with CELERYBEAT_SCHEDULE variable.
UPD. This solution correct for both DB based and file based storage for CELERYBEAT_SCHEDULER.

And if you are using djcelery Database scheduler, you can specify the queue on the Execution Options -> queue field

Django rq-scheduler: jobs in scheduler doesnt get executed

In my Heroku application I succesfully implemented background tasks. For this purpose I created a Queue object at the top of my views.py file and called queue.enqueue() in the appropriate view.
Now I'm trying to set a repeated job with rq-scheduler's scheduler.schedule() method. I know that it is not best way to do it but I call this method again at the top of my views.py file. Whatever I do, I couldn't get it to work, even if it's a simple HelloWorld function.
views.py:
from redis import Redis
from rq import Queue
from worker import conn
from rq_scheduler import Scheduler
scheduler = Scheduler(queue=q, connection=conn)
print("SCHEDULER = ", scheduler)
def say_hello():
print(" Hello world!")
scheduler.schedule(
scheduled_time=datetime.utcnow(), # Time for first execution, in UTC timezone
func=say_hello, # Function to be queued
interval=60, # Time before the function is called again, in seconds
repeat=10, # Repeat this number of times (None means repeat forever)
queue_name='default',
)
worker.py:
import os
import redis
from rq import Worker, Queue, Connection
import django
django.setup()
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL')
if not redis_url:
print("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
raise RuntimeError("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
print(" CREATING NEW WORKER IN worker.py")
worker = Worker(map(Queue, listen))
worker.work()
I'm checking the length of my queue before and after of schedule(), but it looks like length is always 0. I also can see that there are jobs when I call scheduler.get_jobs(), but those jobs doesn't get enqueued or performed I think.
I also don't want to use another cron solution for my project, as I already can do background tasks with rq, it shouldn't be that hard to implement a repeated task, or is it?
I went through documentation a couple times, now I feel so stuck, so I appretiate all the help or advices that I can get.
Using rq 1.6.1 and rq-scheduler 0.10.0 packages with Django 2.2.5 and Python 3.6.10
Edit: When I print jobs in scheduler, I see that their enqueued_at param is set to None, am I missing something really simple?

Why isn't celery periodic task working?

I'm trying to create a periodic task within a Django app.
I added this to my settings.py:
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
'get_checkins': {
'task': 'api.tasks.get_checkins',
'schedule': timedelta(seconds=1)
}
}
I'm just getting started with Celery and haven't figured out which broker I want to use, so I added this as well to just bypass the broker for the time being:
if DEBUG:
CELERY_ALWAYS_EAGER = True
I also created a celery.py file in my project folder:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'testproject.settings')
app = Celery('testproject')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
Inside my app, called api, I made a tasks.py file:
from celery import shared_task
#shared_task
def get_checkins():
print('hello from get checkins')
I'm running the worker and beat with celery -A testproject worker --beat -l info
It starts up fine and I can see the task is registered under [tasks], but I don't see any jobs getting logged. Should be one per second. Can anyone tell why this isn't executing?

I looked at your post and don't see any comment on the broker you are using along with celery.
Have you installed a broker like Rabbitmq? Is it running or logging some kind of error?
Celery needs a broker to send and receive data.
Check the documentation here (http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#choosing-a-broker)

Starting celery worker from multiprocessing

I'm new to celery. All of the examples I've seen start a celery worker from the command line. e.g:
$ celery -A proj worker -l info
I'm starting a project on elastic beanstalk and thought it would be nice to have the worker be a subprocess of my web app. I tried using multiprocessing and it seems to work. I'm wondering if this is a good idea, or if there might be some disadvantages.
import celery
import multiprocessing
class WorkerProcess(multiprocessing.Process):
def __init__(self):
super().__init__(name='celery_worker_process')
def run(self):
argv = [
'worker',
'--loglevel=WARNING',
'--hostname=local',
]
app.worker_main(argv)
def start_celery():
global worker_process
worker_process = WorkerProcess()
worker_process.start()
def stop_celery():
global worker_process
if worker_process:
worker_process.terminate()
worker_process = None
worker_name = 'celery#local'
worker_process = None
app = celery.Celery()
app.config_from_object('celery_app.celeryconfig')

Seems like a good option, definitely not the only option but a good one :)
One thing you might want to look into (you might already be doing this), is linking the autoscaling to the size of your Celery queue. So you only scale up when the queue is growing.
Effectively Celery does something similar internally of course, so there's not a lot of difference. The only snag I can think of is the handling of external resources (database connections for example), that might be a problem but is completely dependent on what you are doing with Celery.

If anyone is interested, I did get this working on Elastic Beanstalk with a pre-configured AMI server running Python 3.4. I had a lot of problems with the Docker based server running Debian Jessie. Something to do with port remapping, maybe. Docker is kind of a black box, and I've found it very hard to work with and debug. Fortunately, the good folks at AWS just added a non-docker Python 3.4 option on April 8, 2015.
I did a lot of searching to get this deployed and working. I saw lots of questions without answers. So here's my very simple deployed python 3.4/flask/celery process.
Celery you can just pip install. You'll need to install rabbitmq from a configuration file with a config command or container_command. I'm using a script in my uploaded project zip, so a container_command is necessary to use the script (regular eb config command takes place before the project is installed).
[yourapproot]/.ebextensions/05_install_rabbitmq.config:
container_commands:
01RunScript:
command: bash ./init_scripts/app_setup.sh
[yourapproot]/init_scripts/app_setup.sh:
#!/usr/bin/env bash
# Download and install Erlang
yum install erlang
# Download the latest RabbitMQ package using wget:
wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.1/rabbitmq-server-3.5.1-1.noarch.rpm
# Install rabbit
rpm --import http://www.rabbitmq.com/rabbitmq-signing-key-public.asc
yum -y install rabbitmq-server-3.5.1-1.noarch.rpm
# Start server
/sbin/service rabbitmq-server start
I'm doing a flask app, so I startup the workers before the first request:
#app.before_first_request
def before_first_request():
task_mgr.start_celery()
The task_mgr creates the celery app object (which I call celery, since the flask app object is app). The -Ofair is pretty key here, for a simple task manager. There's all kinds of strange behavior with task prefetch. This should maybe be the default?
task_mgr/task_mgr.py:
import celery as celery_module
import multiprocessing
class WorkerProcess(multiprocessing.Process):
def __init__(self):
super().__init__(name='celery_worker_process')
def run(self):
argv = [
'worker',
'--loglevel=WARNING',
'--hostname=local',
'-Ofair',
]
celery.worker_main(argv)
def start_celery():
global worker_process
multiprocessing.set_start_method('fork') # 'spawn' seems to work also
worker_process = WorkerProcess()
worker_process.start()
def stop_celery():
global worker_process
if worker_process:
worker_process.terminate()
worker_process = None
worker_name = 'celery#local'
worker_process = None
celery = celery_module.Celery()
celery.config_from_object('task_mgr.celery_config')
My config is pretty simple so far:
task_mgr/celery_config.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json' # 'pickle' warning: can't use datetime in json
CELERY_RESULT_SERIALIZER = 'json' # 'pickle' warning: can't use datetime in json
CELERY_TASK_RESULT_EXPIRES = 18000 # Results hang around for 5 hours
CELERYD_CONCURRENCY = 4
Then you can put tasks wherever you need them:
from task_mgr.task_mgr import celery
import time
#celery.task(bind=True)
def error_task(self):
self.update_state(state='RUNNING')
time.sleep(10)
raise KeyError('im an error')
#celery.task(bind=True)
def long_task(self):
self.update_state(state='RUNNING')
time.sleep(20)
return 'long task finished'
#celery.task(bind=True)
def task_with_status(self, wait):
self.update_state(state='RUNNING')
for i in range(5):
time.sleep(wait)
self.update_state(
state='PROGRESS',
meta={
'current': i + 1,
'total': 5,
'status': 'progress',
'host': self.request.hostname,
}
)
time.sleep(wait)
return 'finished with wait = ' + str(wait)
I also keep a task queue to hold the async results so I can monitor the tasks:
task_queue = []
def queue_task(task, *args):
async_result = task.apply_async(args)
task_queue.append(
{
'task_name':task.__name__,
'task_args':args,
'async_result':async_result
}
)
return async_result
def get_tasks_info():
tasks = []
for task in task_queue:
task_name = task['task_name']
task_args = task['task_args']
async_result = task['async_result']
task_id = async_result.id
task_state = async_result.state
task_result_info = async_result.info
task_result = async_result.result
tasks.append(
{
'task_name': task_name,
'task_args': task_args,
'task_id': task_id,
'task_state': task_state,
'task_result.info': task_result_info,
'task_result': task_result,
}
)
return tasks
And of course, start the tasks where you need to:
from webapp.app import app
from flask import url_for, render_template, redirect
from webapp import tasks
from task_mgr import task_mgr
#app.route('/start_all_tasks')
def start_all_tasks():
task_mgr.queue_task(tasks.long_task)
task_mgr.queue_task(tasks.error_task)
for i in range(1, 9):
task_mgr.queue_task(tasks.task_with_status, i * 2)
return redirect(url_for('task_status'))
#app.route('/task_status')
def task_status():
current_tasks = task_mgr.get_tasks_info()
return render_template(
'parse/task_status.html',
tasks=current_tasks
)
And that's about it. Let me know if you need any help, though my celery knowledge is still fairly limited.

Celery Beat Windows Simple Example (not with Django)

I'm really struggling to set up a periodic task using Celery Beat on Windows 7 (unfortunately that is what I'm dealing with at the moment). The app that will be using celery is written with CherryPy, so the Django libraries are not relevant here. All I'm looking for is a simple example of how to start the Celery Beat Process in the background. The FAQ section says the following, but I haven't been able to actually do it yet:
Windows
The -B / –beat option to worker doesn’t work?¶
Answer: That’s right. Run celery beat and celery worker as separate services instead.
My project layout is as follows:
proj/
__init__.py (empty)
celery.py
celery_schedule.py
celery_settings.py (these work
tasks.py
celery.py:
from __future__ import absolute_import
from celery import Celery
from proj import celery_settings
from proj import celery_schedule
app = Celery(
'proj',
broker=celery_settings.BROKER_URL,
backend=celery_settings.CELERY_RESULT_BACKEND,
include=['proj.tasks']
)
# Optional configuration, see the application user guide.
app.conf.update(
CELERY_TASK_RESULT_EXPIRES=3600,
CELERYBEAT_SCHEDULE=celery_schedule.CELERYBEAT_SCHEDULE
)
if __name__ == '__main__':
app.start()
tasks.py
from __future__ import absolute_import
from proj.celery import app
#app.task
def add(x, y):
return x + y
celery_schedule.py
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.add',
'schedule': timedelta(seconds=3),
'args': (16, 16)
},
}
Running "celery worker --app=proj -l info" from the command line (from the parent directory of "proj") starts the worker thread just fine and I can execute the add task from the Python terminal. However, I just can't figure out how to start the beat service. Obviously the syntax is probably incorrect as well because I haven't gotten past the missing --beat option.

Just start another process via a new terminal window, make sure you are in the correct directory and execute the command celery beat (no '--' needed preceding the beat keyword).
If this does not solve your issue, rename your celery_schedule.py file to celeryconfig.py and include it in your celery.py file as: app.config_from_object('celeryconfig') right above your name == main
then spawn a new celery beat process: celery beat

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Celery, periodic task execution, with concurrency - python

Related

Celery not consuming tasks when queue name is specified [duplicate]

Django rq-scheduler: jobs in scheduler doesnt get executed

Why isn't celery periodic task working?

Starting celery worker from multiprocessing

Celery Beat Windows Simple Example (not with Django)

Categories

Resources