I set up Celery with a Django app, using Redis as the broker.
@task
def proc(product_id, url, did, did_name):
    pass  ## some long operation here

@task
def Scraping(product_id, num=None):
    if num:
        num = int(num)  ## lets me limit how many subtasks run now
    res = group([proc.s(product_id, url, did, dis[did]) for did in dis.keys()[:num]])()
    result = res.get()
    return sum(result)
The first few subtasks run successfully, but then one of the workers disappears and new tasks stay in the RECEIVED state, because the worker that should process them no longer exists.
I set minimal concurrency and 2 workers in /etc/default/celeryd.
I monitored CPU and memory usage; no high load was detected.
There are no errors in the Celery logs.
What's wrong?
[2015-12-19 04:00:30,131: INFO/MainProcess] Task remains.tasks.proc[fd0ec29c-436f-4f60-a1b6-3785342ac173] succeeded in 20.045763085s: 6
[2015-12-19 04:17:28,895: INFO/MainProcess] missed heartbeat from w2@server.domain.com
[2015-12-19 04:17:28,897: DEBUG/MainProcess] w2@server.domain.com joined the party
[2015-12-19 05:11:44,057: INFO/MainProcess] missed heartbeat from w2@server.domain.com
[2015-12-19 05:11:44,058: DEBUG/MainProcess] w2@server.domain.com joined the party
Solution:
If you use django-celery and want to run Celery as a daemon, do not use a Celery app instance (http://docs.celeryproject.org/en/latest/userguide/application.html). Instead, point the configuration in /etc/default/celeryd directly at your project's manage.py: CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
Do not disable heartbeats.
To run Celery through manage.py you also need to:
set CELERY_APP="" in /etc/default/celeryd, because otherwise beat builds its run command with the old "app" argument;
add the line export DJANGO_SETTINGS_MODULE="your_app.settings" to the celeryd config if you do not use the default settings.
I use Django and Celery to schedule a task, but I have an issue with the logger: it doesn't propagate properly. As you can see in the code below, I have configured both the Python logging module and Celery's get_task_logger.
import logging
from celery import Celery
from celery.utils.log import get_task_logger
# Configure logging
logging.basicConfig(filename='example.log',level=logging.DEBUG)
# Create Celery application and Celery logger
app = Celery('capital')
logger = get_task_logger(__name__)
@app.task()
def candle_updated(d, f):
    logging.warning('This is a log')
    logger.info('This is another log')
    return d + f
I use the django-celery-beat extension to set up periodic tasks from the Django admin. This module stores the schedule in the Django database and presents a convenient admin interface to manage periodic tasks at runtime.
As recommended in the documentation, I start the worker and the scheduler this way:
$ celery -A capital beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
celery beat v4.4.0 (cliffs) is starting.
__ - ... __ - _
LocalTime -> 2020-04-02 22:33:32
Configuration ->
. broker -> redis://localhost:6379//
. loader -> celery.loaders.app.AppLoader
. scheduler -> django_celery_beat.schedulers.DatabaseScheduler
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 seconds (5s)
[2020-04-02 22:33:32,630: INFO/MainProcess] beat: Starting...
[2020-04-02 22:33:32,631: INFO/MainProcess] Writing entries...
[2020-04-02 22:33:32,710: INFO/MainProcess] Scheduler: Sending due task Candles update (marketsdata.tasks.candle_updated)
[2020-04-02 22:33:32,729: INFO/MainProcess] Writing entries...
[2020-04-02 22:33:38,726: INFO/MainProcess] Scheduler: Sending due task Candles update (marketsdata.tasks.candle_updated)
[2020-04-02 22:33:44,751: INFO/MainProcess] Scheduler: Sending due task Candles update (marketsdata.tasks.candle_updated)
Everything seems to run fine. There is console output every 6 seconds (the frequency of the periodic task), so the task seems to be executed in the background, but I can't verify it. The problem is that the file example.log is empty. What could be the reason?
Did you start a worker node as well? beat is only the scheduler; you have to run a worker too:
celery -A capital worker -l info
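To illustrate the split between the two roles, here is a toy model that uses only the Python standard library. It is not how Celery is implemented: the queue stands in for the broker, and beat_tick, worker_drain and registry are hypothetical names made up for this sketch. The point is only that the scheduler publishes task names and never runs the task body, so nothing is logged until a worker consumes the messages.

```python
import queue

broker = queue.Queue()  # stand-in for the Redis/RabbitMQ broker (assumption)


def beat_tick(task_name):
    """Scheduler role: publish a message naming the task; execute nothing."""
    broker.put(task_name)


def worker_drain(registry):
    """Worker role: consume queued messages and run the tasks they name."""
    results = []
    while not broker.empty():
        name = broker.get()
        results.append(registry[name]())
    return results


def candle_updated():
    return "candles updated"


# Hypothetical registry mapping task names to callables.
registry = {"marketsdata.tasks.candle_updated": candle_updated}

# "beat" fires three times; the task body has not run yet.
for _ in range(3):
    beat_tick("marketsdata.tasks.candle_updated")

print(broker.qsize())          # 3
print(worker_drain(registry))  # ['candles updated', 'candles updated', 'candles updated']
```

With only beat running, the real setup stays in the first state: messages accumulate in the broker and the task code, including its logging calls, never executes.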
I am trying to set up a periodic task using Celery (4.2.0) and RabbitMQ (3.7.14), running with Python 3.7.2 on an Azure VM with Ubuntu 16.04. I am able to start the beat and the worker and can see the message being sent from beat to the worker, but at that point I get this error:
[2019-03-29 21:35:00,081: ERROR/MainProcess] Received
unregistered task of type 'facebook-call.facebook_api'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
My code is as follows:
from celery import Celery
from celery.schedules import crontab
app = Celery('facebook-call', broker='amqp://localhost//')
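@app.task
def facebook_api():
    ...  # {function here}

app.conf.beat_schedule = {
    'facebook-api-daily': {  # entry name added for illustration; beat_schedule maps entry names to entries
        'task': 'facebook-call.facebook_api',
        'schedule': crontab(hour=0, minute=0, day_of_week='0-6'),
    },
}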
I am starting the beat and worker processes by using the name of the python file which contains all of the code
celery -A FacebookAPICall beat --loglevel=info
celery -A FacebookAPICall worker --loglevel=info
Again, the beat process starts and I can see the message being successfully passed to the worker but cannot figure out how to "register" the task so that it is processed by the worker.
I was able to resolve the issue by renaming the app from facebook-call to match the name of the file, FacebookAPICall.
Before:
app = Celery('facebook-call', broker='amqp://localhost//')
After:
app = Celery('FacebookAPICall', broker='amqp://localhost//')
From reading the Celery documentation, I don't fully understand why the app name must match the name of the .py file, but that seems to do the trick.
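On why the name matters: a task message carries only the task's name, and the worker looks that name up in its registry. The sketch below is a simplified model of the automatic naming rule described in the Celery documentation (module name plus function name, with the app's main name substituted only when the module is __main__). default_task_name is a hypothetical helper written for illustration, not a Celery function, and the sketch assumes the worker imports the file as the module FacebookAPICall, which is what -A FacebookAPICall implies.

```python
def default_task_name(module_name, func_name, app_main=None):
    """Simplified model of automatic task naming (illustrative only)."""
    # When the defining module runs as a script, the app's main name
    # replaces "__main__"; otherwise the module name is used.
    if module_name == "__main__" and app_main:
        return f"{app_main}.{func_name}"
    return f"{module_name}.{func_name}"


# Name the worker would register under the stated assumption.
registered = default_task_name(
    "FacebookAPICall", "facebook_api", app_main="facebook-call"
)

# Name the beat schedule in the question asks for.
requested = "facebook-call.facebook_api"

print(registered)               # FacebookAPICall.facebook_api
print(registered == requested)  # False
```

Under this model the schedule has to name the task the way the worker registered it. An explicit name passed to the task decorator is another way to keep the two in sync.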
I have four time-consuming tasks that should execute one by one, with the result of each task used as the input of the next. So I chose a Celery chain for this, as in the following code:
mychain = chain(task1.s({'a': 1}), task2.s(), task3.s(), task4.s())
mychain.apply_async()
But the tasks actually execute in this order:
task1() ---> task4() ---> task3() ---> task2()
I don't know what is happening.
I run a web server with Tornado, and it starts the tasks through the chain.
Web server log:
[2018-07-23 18:34:12,816][pid:25557][tid:140228657469056][util.py:109] DEBUG: chain: fetch({}) | callback() | convert() | format()
The tasks themselves run in Celery.
Celery log:
[2018-07-23 18:34:12,816: INFO/MainProcess] Received task: fetch[045acf81-274b-457c-8bb5-6d0248264b76]
[2018-07-23 18:34:17,786: INFO/MainProcess] Received task: format[103b4ffa-57db-4b04-a745-7dfee5786695]
[2018-07-23 18:34:18,227: INFO/MainProcess] Received task: convert[81ddbaf9-37b3-406a-b608-a05affa97f45]
[2018-07-23 18:34:20,942: INFO/MainProcess] Received task: callback[b1ea7c70-db45-4501-9859-7ad22532c38a]
The cause was that the two machines were running different Celery versions.
Once we installed the same Celery version on both, the chain ran in the expected order.
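A quick way to check for this kind of mismatch is to print the installed Celery version on each machine and compare the output. This is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name, not part of Celery.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this on the web server and on the worker machine, then compare.
print("celery:", installed_version("celery"))
```

Pinning the same version in both environments (for example in a shared requirements file) keeps the producer and the worker in agreement about the message format.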
I am hoping someone can help me as I've looked on Stack Overflow and cannot find a solution to my problem. I am running a Django project and have Supervisor, RabbitMQ and Celery installed. RabbitMQ is up and running and Supervisor is ensuring my celerybeat is running, however, while it logs that the beat has started and sends tasks every 5 minutes (see below), the tasks never actually execute:
My supervisor program conf:
[program:nrv_twitter]
; Set full path to celery program if using virtualenv
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery beat -A app --loglevel=INFO --pidfile=/tmp/nrv-celerybeat.pid --schedule=/tmp/nrv-celerybeat-schedule
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celerybeat_twitter.log
redirect_stderr=true
autorestart=true
autostart=true
startsecs=10
user=tsantor
; if rabbitmq is supervised, set its priority higher so it starts first
priority=999
Here is the output of the log from the program above:
[2014-12-16 20:29:42,293: INFO/MainProcess] beat: Starting...
[2014-12-16 20:34:08,161: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:39:08,186: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:44:08,204: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:49:08,205: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:54:08,223: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
Here is my celery.py settings file:
from datetime import timedelta
BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_DISABLE_RATE_LIMITS = True
CELERYBEAT_SCHEDULE = {
    'gettweets-every-5-mins': {
        'task': 'twitter.tasks.get_tweets',
        'schedule': timedelta(seconds=300)  # 300 = every 5 minutes
    },
}
Here is my celeryapp.py:
from __future__ import absolute_import
import os
from django.conf import settings
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
app = Celery('app')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Here is my twitter/tasks.py:
from __future__ import absolute_import
import logging
from celery import shared_task
from twitter.views import IngestTweets
log = logging.getLogger('custom.log')
@shared_task
def get_tweets():
    """
    Get tweets and save them to the DB
    """
    instance = IngestTweets()
    IngestTweets.get_new_tweets(instance)
    log.info('Successfully ingested tweets via celery task')
    return True
The get_tweets task never gets executed, although I know it works because I can run get_tweets manually without problems.
I have spent two days trying to figure out why it's sending due tasks but not executing them. Any help is greatly appreciated. Thanks in advance.
user2097159 thanks for pointing me in the right direction, I was not aware I also must run a worker using supervisor. I thought it was either a worker or a beat, but now I understand that I must have a worker to handle the task and a beat to fire off the task periodically.
Below is the missing worker config for supervisor:
[program:nrv_celery_worker]
; Worker
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery worker -A app --loglevel=INFO
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celery_worker.log
redirect_stderr=true
autostart=true
autorestart=true
startsecs=10
user=tsantor
numprocs=1
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
I then reset the RabbitMQ queue. Now that I have both the beat and worker programs managed via supervisor, all is working as intended. Hope this helps someone else out.
You need to start both a worker process and a beat process. You can create separate processes as described in tsantor's answer, or you can create a single process that runs both a worker and a beat. This can be more convenient during development (but is not recommended for production).
From "Starting the scheduler" in the Celery documentation:
You can also embed beat inside the worker by enabling the workers -B option, this is convenient if you’ll never run more than one worker node, but it’s not commonly used and for that reason isn’t recommended for production use:
$ celery -A proj worker -B
For example Supervisor config files, see https://github.com/celery/celery/tree/master/extra/supervisord/ (linked from "Daemonization")
I tried using this code to try to dynamically add / remove scheduled tasks.
My tasks.py file looks like this:
from celery.decorators import task
import logging
log = logging.getLogger(__name__)
@task
def mytask():
    log.debug("Executing task")
    return
The problem is that the tasks do not actually execute (i.e., there is no log output), but I get the following messages in my Celery log file, exactly on schedule:
[2013-05-10 04:53:00,005: INFO/MainProcess] Got task from broker: cron.tasks.mytask[dfcf397b-e30b-45bd-9f5f-11a17a51b6c4]
[2013-05-10 04:54:00,007: INFO/MainProcess] Got task from broker: cron.tasks.mytask[f013b3cd-6a0f-4060-8bcc-3bb51ffaf092]
[2013-05-10 04:55:00,007: INFO/MainProcess] Got task from broker: cron.tasks.mytask[dfc0d563-ff4b-4132-955a-4293dd3a9ac7]
[2013-05-10 04:56:00,012: INFO/MainProcess] Got task from broker: cron.tasks.mytask[ba093535-0d70-4dc5-89e4-441b72cfb61f]
I can confirm that the logger is configured correctly and working fine. If I call result = mytask.delay() in the interactive shell, result.state stays PENDING indefinitely.
EDIT: See also Django Celery Periodic Tasks Run But RabbitMQ Queues Aren't Consumed