Django celery task run at once on startup of celery server

I need to find how to specify a kind of initial celery task, that will start all other tasks in specially defined way. This initial task should be run immediately at once on celery server startup and never run again.

How about using the celeryd_after_setup or celeryd_init signal?
The following example code is from the documentation:
from celery.signals import celeryd_init
@celeryd_init.connect(sender='worker12@example.com')
def configure_worker12(conf=None, **kwargs):
    ...

I found a way to do this. It has one drawback: crontab cannot pin the year, so the task will run again a year later. Servers usually restart more often than that, though.
from datetime import datetime, timedelta
from celery.schedules import crontab
from celery.task import PeriodicTask  # legacy class-based task API
class InitialTasksStarter(PeriodicTask):
    # Schedule the single run for one minute after this module is imported.
    starttime = datetime.now() + timedelta(minutes=1)
    run_every = crontab(month_of_year=starttime.month, day_of_month=starttime.day, hour=starttime.hour, minute=starttime.minute)
    def run(self, **kwargs):
        ....
        return True

Related

Django rq-scheduler: jobs in scheduler doesnt get executed

In my Heroku application I successfully implemented background tasks. For this purpose I created a Queue object at the top of my views.py file and called queue.enqueue() in the appropriate view.
Now I'm trying to set up a repeated job with rq-scheduler's scheduler.schedule() method. I know it is not the best way to do it, but I call this method at the top of my views.py file as well. Whatever I do, I can't get it to work, even with a simple hello-world function.
views.py:
from datetime import datetime
from redis import Redis
from rq import Queue
from worker import conn
from rq_scheduler import Scheduler
q = Queue(connection=conn)  # the Queue object created at the top of views.py
scheduler = Scheduler(queue=q, connection=conn)
print("SCHEDULER = ", scheduler)
def say_hello():
    print(" Hello world!")
scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # Time for first execution, in UTC timezone
    func=say_hello,  # Function to be queued
    interval=60,  # Time before the function is called again, in seconds
    repeat=10,  # Repeat this number of times (None means repeat forever)
    queue_name='default',
)
worker.py:
import os
import redis
from rq import Worker, Queue, Connection
import django
django.setup()
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL')
if not redis_url:
    print("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
    raise RuntimeError("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
conn = redis.from_url(redis_url)
if __name__ == '__main__':
    with Connection(conn):
        print(" CREATING NEW WORKER IN worker.py")
        worker = Worker(map(Queue, listen))
        worker.work()
I'm checking the length of my queue before and after calling schedule(), but the length is always 0. I can also see that there are jobs when I call scheduler.get_jobs(), but I think those jobs never get enqueued or performed.
I also don't want to use another cron solution for my project. Since I can already run background tasks with rq, it shouldn't be that hard to implement a repeated task, or is it?
I went through the documentation a couple of times and now feel stuck, so I appreciate any help or advice I can get.
Using rq 1.6.1 and rq-scheduler 0.10.0 packages with Django 2.2.5 and Python 3.6.10
Edit: When I print the jobs in the scheduler, I see that their enqueued_at param is set to None. Am I missing something really simple?

Celery's exception 'TimeLimitExceeded' with group of tasks

I have these Celery settings:
WORKER_MAX_TASKS_PER_CHILD = 1
TASK_TIME_LIMIT = 30
When I run a group of tasks:
from celery import group, shared_task
from time import sleep
@shared_task
def do_something(arg):
    sleep(60)
    return arg*2
group([do_something.s(i) for i in range(3)]).apply_async()
I get TimeLimitExceeded inside the group, and then the worker is killed by Celery at once. How can I handle it?

According to the documentation:
The soft time limit allows the task to catch an exception to clean up before it is killed: the hard timeout isn’t catch-able and force terminates the task.
The answer is simple: do not rely on the hard time limit if you want to catch the exception. Set a soft time limit for the task instead.

Celery beat not sending crontab task when hour is set

I'm using celery 4.1 and all my periodic tasks work correctly except where I set the hour in a crontab task. I was thinking it had something to do with the timezone setting, but I can't seem to work out where the problem is.
dashboard/celery.py
from __future__ import absolute_import, unicode_literals
from celery import Celery
app = Celery('dashboard',
             broker='redis://',
             backend='redis://localhost',
             include=['dashboard.tasks'])
app.conf.update(
    result_expires=3600,
    enable_utc=False,
    timezone='America/New_York'
)
if __name__ == '__main__':
    app.start()
This works:
@app.task
@periodic_task(run_every=(crontab()))
def shutdown_vms():
    inst = C2CManage(['stop','kube'])
    inst.run()
    return
This works:
@app.task
@periodic_task(run_every=(crontab(minute=30,hour='*')))
def shutdown_vms():
    inst = C2CManage(['stop','kube'])
    inst.run()
    return
This doesn't work:
@app.task
@periodic_task(run_every=(crontab(minute=30,hour=6)))
def shutdown_vms():
    inst = C2CManage(['stop','kube'])
    inst.run()
    return
Beat picks up the task just fine:
<ScheduleEntry: dashboard.tasks.shutdown_vms dashboard.tasks.shutdown_vms() <crontab: 30 6 * * * (m/h/d/dM/MY)>
But it never sends it. I've let the processes run over a weekend and it never submits the task. I don't know what I'm doing wrong. I do have other tasks that run on timedelta periodicity and they all work perfectly.
Any help would be awesome.
EDIT: host is set to use the America/New_York timezone.
EDIT2: running beat as a separate process:
celery -A dashboard worker -l info
celery -A dashboard beat -l debug
I run them detached mostly or use multi.

It looks like this bug is causing it:
https://github.com/celery/celery/issues/4177
Several other reports also indicate that schedules are not calculated properly when UTC is not used.
I switched Celery to use UTC as the timezone and enabled UTC, and it works fine.
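If you do move Beat to UTC, crontab hours written for local time have to be translated. Here is a rough sketch using only the standard library (Python 3.9+ zoneinfo, with system time zone data assumed to be available). The helper name local_to_utc_hm is hypothetical.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc_hm(day, hour, minute, tz_name):
    """Return (hour, minute) in UTC for a local wall-clock time on a given day."""
    local = datetime(day.year, day.month, day.day, hour, minute,
                     tzinfo=ZoneInfo(tz_name))
    utc = local.astimezone(timezone.utc)
    return utc.hour, utc.minute


# 06:30 in New York during standard time (EST, UTC-5) and daylight time (EDT, UTC-4).
winter = local_to_utc_hm(date(2024, 1, 15), 6, 30, "America/New_York")
summer = local_to_utc_hm(date(2024, 7, 15), 6, 30, "America/New_York")
print(winter, summer)
```

Under UTC the 06:30 New York task would be written as crontab(minute=30, hour=11) in winter. The summer result is one hour earlier, so a fixed UTC hour drifts by an hour in local terms when daylight saving time changes.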

I solved this problem by using celery==4.0.1.

An easy solution is to update the following options in your Celery settings:
app.conf.enable_utc = False
app.conf.timezone = "Asia/Calcutta"  # change to your timezone

Celery task schedule (Ensuring a task is only executed one at a time)

I have a task, somewhat like this:
@task()
def async_work(info):
    ...
At any moment, I may call async_work with some info. For some reason, I need to make sure that only one async_work is running at a time; other calls must wait their turn.
So I came up with the following code:
is_locked = False
@task()
def async_work(info):
    while is_locked:
        pass
    is_locked = True
    ...
    is_locked = False
But it says it's invalid to access local variables...
How can I solve it?

The error itself appears because assigning to is_locked inside the function makes it a local name. Declaring it global would silence the error, but it would still not work as a lock: you can have several Celery workers running tasks, and those workers might even be on different hosts. So there are as many is_locked variable instances as there are Celery worker processes running your async_work task. Thus, even if your code raised no errors, it would not have the desired effect.
To achieve your goal you need to configure Celery to run only one worker process. Since a single worker process handles one task at a time, you get what you need.
EDIT:
According to Workers Guide > Concurrency:
By default multiprocessing is used to perform concurrent execution of
tasks, but you can also use Eventlet. The number of worker
processes/threads can be changed using the --concurrency argument
and defaults to the number of CPUs available on the machine.
Thus you need to run the worker like this:
$ celery worker --concurrency=1
EDIT 2:
Surprisingly there's another solution, moreover it is even in the official docs, see the Ensuring a task is only executed one at a time article.

You probably don't want to use concurrency=1 for your Celery workers, since you want your tasks to be processed concurrently. Instead you can use some kind of locking mechanism. Just ensure the lock's timeout is longer than the time your task needs to finish.
Redis
import redis
from contextlib import contextmanager
redis_client = redis.Redis(host='localhost', port=6378)
LOCK_EXPIRE = 60 * 10  # seconds; assumed value, must exceed the task's run time
@contextmanager
def redis_lock(lock_name):
    """Yield True if the lock was acquired, or None if lock_name is already set in Redis.
    Enables a simple form of locking.
    """
    status = redis_client.set(lock_name, 'lock', nx=True, ex=LOCK_EXPIRE)
    try:
        yield status
    finally:
        # Release only a lock that this call acquired.
        if status:
            redis_client.delete(lock_name)
@task()
def async_work(info):
    with redis_lock('my_lock_name') as acquired:
        if acquired:
            do_some_work()
Memcache
Example inspired by the Celery documentation
from contextlib import contextmanager
from django.core.cache import cache
LOCK_EXPIRE = 60 * 10  # seconds; assumed value, must exceed the task's run time
@contextmanager
def memcache_lock(lock_name):
    # cache.add only stores the key if it is absent, and returns True on success.
    status = cache.add(lock_name, 'lock', LOCK_EXPIRE)
    try:
        yield status
    finally:
        # Release only a lock that this call acquired.
        if status:
            cache.delete(lock_name)
@task()
def async_work(info):
    with memcache_lock('my_lock_name') as acquired:
        if acquired:
            do_some_work()
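To see why the lock timeout matters, here is a self-contained sketch of the same add-if-absent pattern against an in-memory stand-in for the cache. TinyCache and cache_lock are hypothetical names that only mimic the add/delete semantics used above. This is a single-process illustration, not a replacement for Redis or Memcached, which are what make the lock visible across worker processes.

```python
import time
from contextlib import contextmanager


class TinyCache:
    """Hypothetical in-memory stand-in with add-if-absent and expiry."""

    def __init__(self):
        self._data = {}  # key -> expiry timestamp on the monotonic clock

    def add(self, key, timeout):
        """Store key unless an unexpired entry exists; return True on success."""
        now = time.monotonic()
        expires_at = self._data.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._data[key] = now + timeout
        return True

    def delete(self, key):
        self._data.pop(key, None)


cache = TinyCache()


@contextmanager
def cache_lock(name, timeout):
    acquired = cache.add(name, timeout)
    try:
        yield acquired
    finally:
        if acquired:  # never release a lock this call did not take
            cache.delete(name)


# While the lock is held, a second attempt is refused.
with cache_lock("job", timeout=60) as first:
    with cache_lock("job", timeout=60) as second:
        held_twice = first and second  # False

# A timeout shorter than the work lets a second caller in too early.
with cache_lock("job", timeout=0.01) as first:
    time.sleep(0.05)  # "work" that outlives the lock
    with cache_lock("job", timeout=60) as second:
        overlapped = first and second  # True: the lock expired mid-task
```

The second case is the failure mode to avoid: once the entry expires, nothing stops another caller from starting the same work.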

I have implemented a decorator to handle this. It's based on Ensuring a task is only executed one at a time from the official Celery docs.
It uses the function's name and its args and kwargs to create a lock_id, which is set/get in Django's cache layer (I have only tested this with Memcached, but it should work with Redis as well). If the lock_id is already set in the cache, it puts the task back on the queue and exits.
import functools
from time import monotonic
from django.core.cache import cache
CACHE_LOCK_EXPIRE = 30
def no_simultaneous_execution(f):
    """
    Decorator that prevents a task from being executed with the
    same *args and **kwargs more than once at a time.
    """
    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        # Create lock_id used as cache key
        lock_id = '{}-{}-{}'.format(self.name, args, kwargs)
        # Timeout with a small diff, so we'll leave the lock delete
        # to the cache if it's close to being auto-removed/expired
        timeout_at = monotonic() + CACHE_LOCK_EXPIRE - 3
        # Try to acquire a lock, or put task back on queue
        lock_acquired = cache.add(lock_id, True, CACHE_LOCK_EXPIRE)
        if not lock_acquired:
            self.apply_async(args=args, kwargs=kwargs, countdown=3)
            return
        try:
            f(self, *args, **kwargs)
        finally:
            # Release the lock
            if monotonic() < timeout_at:
                cache.delete(lock_id)
    return wrapper
You would then apply it on any task as the first decorator:
@shared_task(bind=True, base=MyTask)
@no_simultaneous_execution
def sometask(self, some_arg):
    ...

Best practice of testing django-rq ( python-rq ) in Django

I'll start using django-rq in my project.
Django integration with RQ, a Redis based Python queuing library.
What is the best practice for testing Django apps that use RQ?
For example, if I want to test my app as a black box, after a user performs some actions I want to execute all jobs in the current queue and then check all results in my DB. How can I do that in my Django tests?

I just found django-rq, which allows you to spin up a worker in a test environment that executes any tasks on the queue and then quits.
from django.test import TestCase
from django_rq import get_worker
class MyTest(TestCase):
    def test_something_that_creates_jobs(self):
        ...  # Stuff that init jobs.
        get_worker().work(burst=True)  # Processes all jobs then stop.
        ...  # Asserts that the job stuff is done.

I separated my rq tests into a few pieces.
Test that I'm correctly adding things to the queue (using mocks).
Assume that if something gets added to the queue, it will eventually be processed. (rq's test suite should cover this).
Test, given the correct input, my tasks work as expected. (normal code tests).
Code being tested:
def handle(self, *args, **options):
    uid = options.get('user_id')
    # ### Need to exclude out users who have gotten an email within $window
    # days.
    if uid is None:
        uids = User.objects.filter(is_active=True, userprofile__waitlisted=False).values_list('id', flat=True)
    else:
        uids = [uid]
    q = rq.Queue(connection=redis.Redis())
    for user_id in uids:
        q.enqueue(mail_user, user_id)
My tests:
class DjangoMailUsersTest(DjangoTestCase):
    def setUp(self):
        self.cmd = MailUserCommand()
    @patch('redis.Redis')
    @patch('rq.Queue')
    def test_no_userid_queues_all_userids(self, queue, _):
        u1 = UserF.create(userprofile__waitlisted=False)
        u2 = UserF.create(userprofile__waitlisted=False)
        self.cmd.handle()
        self.assertItemsEqual(queue.return_value.enqueue.mock_calls,
                              [call(ANY, u1.pk), call(ANY, u2.pk)])
    @patch('redis.Redis')
    @patch('rq.Queue')
    def test_waitlisted_people_excluded(self, queue, _):
        u1 = UserF.create(userprofile__waitlisted=False)
        UserF.create(userprofile__waitlisted=True)
        self.cmd.handle()
        self.assertItemsEqual(queue.return_value.enqueue.mock_calls, [call(ANY, u1.pk)])

I committed a patch that lets you do:
from django.test import TestCase
from django_rq import get_queue
class MyTest(TestCase):
    def test_something_that_creates_jobs(self):
        # Originally spelled async=False; async is a reserved word since
        # Python 3.7, and recent releases name the argument is_async.
        queue = get_queue(is_async=False)
        queue.enqueue(func)  # func will be executed right away
        # Test for job completion
This should make testing RQ jobs easier. Hope that helps!

Just in case this is helpful to anyone: I used a patch with a custom mock object whose enqueue runs the job right away.
# patch django_rq.get_queue
with patch('django_rq.get_queue', return_value=MockBulkJobGetQueue()) as mock_django_rq_get_queue:
    # Perform web operation that starts job. In my case a post to a url
    ...
Then the mock object just had one method:
class MockBulkJobGetQueue(object):
    def enqueue(self, f, *args, **kwargs):
        # Call the function with the keyword arguments passed as kwargs={...};
        # fall back to an empty dict when none were given.
        f(
            **(kwargs.pop('kwargs', None) or {})
        )

What I've done for this case is to detect whether I'm testing and use fakeredis during tests. Finally, in the test itself, I enqueue the Redis worker task in synchronous mode.
First, define a function that detects whether you're testing:
import sys
TESTING = len(sys.argv) > 1 and sys.argv[1] == 'test'
def am_testing():
    return TESTING
Then, in the file that uses Redis to queue up tasks, manage the queue this way.
You could extend get_queue to specify a queue name if needed:
if am_testing():
    from fakeredis import FakeStrictRedis
    from rq import Queue
    def get_queue():
        return Queue(connection=FakeStrictRedis())
else:
    import django_rq
    def get_queue():
        return django_rq.get_queue()
Then, enqueue your task like so:
queue = get_queue()
queue.enqueue(task_mytask, arg1, arg2)
Finally, in your test program, run the task you are testing in synchronous mode, so that it runs in the same process as your test. As a matter of practice, I first clear the fakeredis queue, but I don't think it's necessary since there are no workers:
from rq import Queue
from fakeredis import FakeStrictRedis
FakeStrictRedis().flushall()
# Originally async=False; recent rq releases name the argument is_async.
queue = Queue(is_async=False, connection=FakeStrictRedis())
queue.enqueue(task_mytask, arg1, arg2)
My settings.py has the normal django_rq settings, so django_rq.get_queue() uses these when deployed:
RQ_QUEUES = {
    'default': {
        'HOST': env_var('REDIS_HOST'),
        'PORT': 6379,
        'DB': 0,
        # 'PASSWORD': 'some-password',
        'DEFAULT_TIMEOUT': 360,
    },
    'high': {
        'HOST': env_var('REDIS_HOST'),
        'PORT': 6379,
        'DB': 0,
        'DEFAULT_TIMEOUT': 500,
    },
    'low': {
        'HOST': env_var('REDIS_HOST'),
        'PORT': 6379,
        'DB': 0,
    }
}

None of the answers above really solved how to test without having Redis installed while still using the Django settings. I found that including the following code in the tests does not affect the project itself, yet gives everything needed.
The code uses fakeredis to pretend a Redis service is available, and sets up the connection before RQ Django reads the settings.
The connection must be the same one everywhere, because fakeredis connections do not share state. Therefore it is a singleton object that gets reused.
import django_rq
from fakeredis import FakeStrictRedis, FakeRedis
class FakeRedisConn:
    """Singleton FakeRedis connection."""
    def __init__(self):
        self.conn = None
    def __call__(self, _, strict):
        if not self.conn:
            self.conn = FakeStrictRedis() if strict else FakeRedis()
        return self.conn
django_rq.queues.get_redis_connection = FakeRedisConn()
def test_case():
    ...

You'll need your tests to pause while there are still jobs in the queue. To do this, you can check Queue.is_empty() and suspend execution while jobs remain:
import time
from django.test import TestCase
import django_rq
class TestQueue(TestCase):
    def test_something(self):
        # simulate some User actions which will queue up some tasks
        # Wait for the queued tasks to run
        queue = django_rq.get_queue('default')
        while not queue.is_empty():
            time.sleep(5)  # adjust this depending on how long your tasks take to execute
        # queued tasks are done, check state of the DB
        self.assertTrue(.....)

I came across the same issue. In addition, my jobs perform some mailing, and I wanted to check the Django test mailbox for the resulting emails. However, since with Django RQ the jobs are not executed in the same context as the Django test, the emails that are sent do not end up in the test mailbox.
Therefore I need to execute the jobs in the same context. This can be achieved by:
from django_rq import get_queue
queue = get_queue('default')
queue.enqueue(some_job_callable)
# execute input watcher
jobs = queue.get_jobs()
# execute in the same context as test
while jobs:
    for job in jobs:
        queue.remove(job)
        job.perform()
    jobs = queue.get_jobs()
# check no jobs left in queue
assert not jobs
Here you just get all the jobs from the queue and execute them directly in the test. This can be implemented neatly in a TestCase class and reused.
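The loop above could be factored into a small reusable helper. This is only a sketch and not part of django-rq: run_all_jobs is a hypothetical name, and it assumes nothing beyond what the snippet above uses, namely a queue with get_jobs() and remove(job) and jobs with perform(). The max_rounds guard is an added assumption, so that a job that keeps re-enqueueing itself cannot hang the test.

```python
def run_all_jobs(queue, max_rounds=100):
    """Run queued jobs in the current process until the queue is empty.

    Returns the number of jobs executed.
    """
    executed = 0
    for _ in range(max_rounds):
        jobs = queue.get_jobs()
        if not jobs:
            return executed
        for job in jobs:
            queue.remove(job)
            job.perform()  # may enqueue follow-up jobs, picked up next round
            executed += 1
    raise RuntimeError("queue still not empty after %d rounds" % max_rounds)
```

In a test you would call run_all_jobs(get_queue('default')) after the code under test has enqueued its jobs, then make your assertions against the database or the test mailbox.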
