Django rq-scheduler: jobs in scheduler don't get executed - python

In my Heroku application I successfully implemented background tasks. For this purpose I created a Queue object at the top of my views.py file and called queue.enqueue() in the appropriate view.
Now I'm trying to set up a repeated job with rq-scheduler's scheduler.schedule() method. I know it is not the best way to do it, but I call this method at the top of my views.py file as well. Whatever I do, I can't get it to work, even with a simple HelloWorld function.
views.py:
from datetime import datetime

from redis import Redis
from rq import Queue
from worker import conn
from rq_scheduler import Scheduler

q = Queue(connection=conn)  # the Queue object created at the top of views.py
scheduler = Scheduler(queue=q, connection=conn)
print("SCHEDULER = ", scheduler)

def say_hello():
    print(" Hello world!")

scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # Time for first execution, in UTC timezone
    func=say_hello,                    # Function to be queued
    interval=60,                       # Time before the function is called again, in seconds
    repeat=10,                         # Repeat this number of times (None means repeat forever)
    queue_name='default',
)
worker.py:
import os
import redis
from rq import Worker, Queue, Connection
import django

django.setup()

listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL')
if not redis_url:
    print("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
    raise RuntimeError("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        print(" CREATING NEW WORKER IN worker.py")
        worker = Worker(map(Queue, listen))
        worker.work()
I'm checking the length of my queue before and after calling schedule(), but the length is always 0. I can also see that there are jobs when I call scheduler.get_jobs(), but I think those jobs never get enqueued or performed.
I also don't want to use another cron solution for my project. Since I can already run background tasks with rq, it shouldn't be that hard to implement a repeated task, or is it?
I went through the documentation a couple of times and now I feel stuck, so I appreciate any help or advice I can get.
Using rq 1.6.1 and rq-scheduler 0.10.0 packages with Django 2.2.5 and Python 3.6.10
Edit: When I print the jobs in the scheduler, I see that their enqueued_at param is set to None. Am I missing something really simple?
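The symptoms above (jobs visible through scheduler.get_jobs(), a queue length of 0, enqueued_at set to None) are what one would expect if nothing is moving due jobs from the scheduler's store onto the queue. rq-scheduler ships an `rqscheduler` command for that step. Whether such a process is running here is an assumption, since the question does not say. The sketch below is a simplified in-memory model of that separation, not rq-scheduler's actual implementation; `ToyScheduler` and its methods are hypothetical names.

```python
import time


class ToyScheduler:
    """In-memory stand-in for a scheduler's store of scheduled jobs (hypothetical)."""

    def __init__(self):
        self.scheduled = []  # (due_timestamp, func) pairs waiting for their time
        self.queue = []      # jobs a worker would actually pick up

    def schedule(self, due, func):
        # Scheduling only records the job; nothing reaches the queue yet.
        self.scheduled.append((due, func))

    def enqueue_due_jobs(self, now):
        # The step a separate, long-running scheduler process would perform.
        due = [job for job in self.scheduled if job[0] <= now]
        self.scheduled = [job for job in self.scheduled if job[0] > now]
        self.queue.extend(func for _, func in due)
        return len(due)


def say_hello():
    return "Hello world!"


scheduler = ToyScheduler()
scheduler.schedule(time.time(), say_hello)
print(len(scheduler.scheduled), len(scheduler.queue))  # scheduled, but queue still empty
scheduler.enqueue_due_jobs(time.time() + 1)
print(len(scheduler.scheduled), len(scheduler.queue))  # job has moved to the queue
```

In this model the queue stays empty until enqueue_due_jobs() is called, which mirrors the observation that scheduling alone does not change the queue length.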

Related

python : dynamically spawn multithread workers with flask-socket io and python-binance

Hello fellow developers,
I'm currently trying to create a small webapp that would allow me to monitor multiple Binance accounts from a dashboard and maybe, in the future, perform some small automatic trading actions.
My frontend is implemented with Vue+quasar and my backend server is based on Python Flask for the REST API.
What I would like to do is start a background process dynamically when a specific endpoint of my server is called. Once this process is started on the server, I would like it to communicate via websocket with my Vue client.
Right now I can spawn the worker and create the websocket communication, but somehow I can't figure out how to make all the threads in my worker work together. Let me be a bit more specific:
Once my worker is started, I'm trying to create at least two threads. One is the infinite loop that lets me automate some small actions, and the other is the flask-socketio server that will handle the socket connections. Here is the code of that worker:
customWorker.py
import os
import time
from flask import Flask
from flask_socketio import SocketIO, send, emit
import threading
import json
import eventlet
# custom class allowing me to communicate with my MongoDB
from db_wrap import DbWrap
from binance.client import Client
from binance.exceptions import BinanceAPIException, BinanceWithdrawException, BinanceRequestException
from binance.websockets import BinanceSocketManager

def process_message(msg):
    print('got a websocket message')
    print(msg)

class customWorker:
    def __init__(self, workerId, sleepTime, dbWrap):
        self.workerId = workerId
        self.sleepTime = sleepTime
        self.socketio = None
        self.dbWrap = DbWrap()
        # this retrieves worker configuration from database
        self.config = json.loads(self.dbWrap.get_worker(workerId))
        keys = self.dbWrap.get_worker_keys(workerId)
        self.binanceClient = Client(keys['apiKey'], keys['apiSecret'])

    def handle_message(self, data):
        print('My PID is {} and I received {}'.format(os.getpid(), data))
        send(os.getpid())

    def init_websocket_server(self):
        app = Flask(__name__)
        socketio = SocketIO(app, async_mode='eventlet', logger=True, engineio_logger=True, cors_allowed_origins="*")
        eventlet.monkey_patch()
        socketio.on_event('message', self.handle_message)
        self.socketio = socketio
        self.app = app

    def launch_main_thread(self):
        while True:
            print('My PID is {} and workerId {}'
                  .format(os.getpid(), self.workerId))
            if self.socketio is not None:
                info = self.binanceClient.get_account()
                self.socketio.emit('my_account', info, namespace='/')

    def launch_worker(self):
        self.init_websocket_server()
        self.socketio.start_background_task(self.launch_main_thread)
        self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False)
Once the REST endpoint is called, the worker is spawned by calling birth_worker() method of "Broker" object available within my server :
from multiprocessing import Process
from custom_worker import customWorker
#...
    # methods of the Broker object
    def create_worker(self, workerid, sleepTime, dbWrap):
        worker = customWorker(workerid, sleepTime, dbWrap)
        worker.launch_worker()

    def birth_worker(self, workerid, dbWrap):
        # the argument tuple must match create_worker's parameters
        p = Process(target=self.create_worker, args=(workerid, 10, dbWrap))
        p.start()
So when this is done, the worker is launched in a separate process that successfully creates threads and listens for socket connections. But my problem is that I can't use my binanceClient in my main thread. I think that it uses threads internally, and that my use of eventlet, in particular the monkey_patch() function, breaks it. When I try to call the binanceClient.get_account() method I get the error AttributeError: module 'select' has no attribute 'poll'
I'm pretty sure it comes from monkey_patch, because if I call get_account() in the __init__() method of my worker (before patching) it works and I can get the account info. So I guess there is a conflict here, which I've been trying unsuccessfully to resolve.
I've tried using only the threading mode for my socket.io app with async_mode='threading', but then my flask-socketio app won't start and listen for sockets, as the line self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False) blocks everything.
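The general shape of running a periodic loop next to a blocking server call can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a Flask-SocketIO solution: `blocking_server` is a hypothetical stand-in for a blocking call such as socketio.run(...), and `main_loop` stands in for the worker's infinite loop. The point it illustrates is ordering: the loop thread is started before the blocking call, so the blocking call does not prevent it from running.

```python
import threading
import time


def blocking_server(stop_event):
    # Hypothetical stand-in for a blocking call such as socketio.run(...).
    while not stop_event.is_set():
        time.sleep(0.01)


def main_loop(stop_event, ticks):
    # Stand-in for the worker's periodic loop; records one entry per iteration.
    while not stop_event.is_set():
        ticks.append(time.time())
        time.sleep(0.01)


def launch_worker(run_for=0.1):
    stop_event = threading.Event()
    ticks = []
    # Start the loop in a daemon thread *before* the blocking call.
    loop_thread = threading.Thread(target=main_loop, args=(stop_event, ticks), daemon=True)
    loop_thread.start()
    # A timer stops both after run_for seconds so that the sketch terminates.
    threading.Timer(run_for, stop_event.set).start()
    blocking_server(stop_event)  # blocks the calling thread until stopped
    loop_thread.join()
    return ticks


if __name__ == '__main__':
    print(len(launch_worker()), 'loop iterations ran while the server call was blocking')
```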
I'm pretty sure I have an architecture problem here and that I shouldn't start my app by calling socketio.run. I've been unable to start it with gunicorn, for example, because I need it to be dynamic and to call it from my Python scripts. I've been struggling to find the proper way to do this, and that's why I'm here today.
Could someone please give me a hint on how this is supposed to be achieved? How can I dynamically spawn a subprocess that will manage a socket server thread, an infinite loop thread and connections with binanceClient? I've been roaming Stack Overflow without success; every piece of advice is welcome, even an architecture rework.
Here is my environment:
Manjaro Linux 21.0.1
pip-chill:
eventlet==0.30.2
flask-cors==3.0.10
flask-socketio==5.0.1
pillow==8.2.0
pymongo==3.11.3
python-binance==0.7.11
websockets==8.1

Launching celery task_monitor in django

Looking at the Celery docs, I can see that the task monitor is launched in a script (see below). In a Django implementation, as I understand it, this won't be the case, and I'll have to launch the task monitor in a thread.
Currently I'm launching the monitor the first time I run a job, then checking its state each subsequent time I run a job (see further below). This seems like a bad way to do it.
My overall question is: what is the correct way to instantiate the task monitor for Celery in a Django project? A good answer would also cover:
Is threading the accepted way to do this?
Should I launch this in a subprocess?
Do I need to be worried about the volume going through the task monitor (and should I therefore use threading)?
Is there a standard, widely accepted way to do this?
It seems I'm missing something really obvious.
# docs example - not implemented like this in my project
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # task name is sent only with -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])
        print('TASK FAILED: %s[%s] %s' % (
            task.name, task.uuid, task.info(),))

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    # LAUNCHED HERE
    my_monitor(app)
# my current implementation
# If the celery_monitor is not instantiated, set it up
app = Celery('scheduler',
             broker=rabbit_url,  # Rabbit-MQ
             backend=redis_url,  # Redis
             include=tasks
             )
celery_monitor = Thread(target=build_monitor, args=[app], name='monitor-global', daemon=True)

# import celery_monitor into another module
global celery_monitor
if not celery_monitor.is_alive():
    try:
        celery_monitor.start()
        logger.debug('Celery Monitor - Thread Started (monitor-retry) ')
    except RuntimeError as e:  # occurs if thread is dead
        # create new instance if thread is dead
        logger.debug('Celery Monitor - Error restarting thread (monitor-rety): {}'.format(e))
        celery_monitor = Thread(target=build_monitor, args=[app], name='monitor-retry', daemon=True)
        celery_monitor.start()  # start thread
        logger.debug('Celery Monitor - Thread Re-Started (monitor-retry) ')
else:
    logger.debug('Celery Monitor - Thread is already alive. Dont do anything.')
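The start-or-restart check above could be collected into one helper guarded by a lock, so that two requests arriving at the same time cannot both start a monitor. This is a minimal sketch using only the standard library, not a Celery-endorsed pattern; `ensure_monitor_running` is a hypothetical name, and `target` would be the build_monitor function from the snippet above.

```python
import threading

_monitor_lock = threading.Lock()
_monitor_thread = None


def ensure_monitor_running(target, *args):
    """Start the monitor thread if it is not alive; return the live thread."""
    global _monitor_thread
    with _monitor_lock:
        if _monitor_thread is None or not _monitor_thread.is_alive():
            # A Thread object cannot be started twice, so always build a new one.
            _monitor_thread = threading.Thread(
                target=target, args=args, name='monitor-global', daemon=True
            )
            _monitor_thread.start()
        return _monitor_thread
```

Calling ensure_monitor_running(build_monitor, app) before dispatching a job would then replace the try/except around start(), because the helper never tries to reuse a finished thread.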

Undesired delay in the celery process

I am encountering an undesired delay in the Celery process that I cannot explain. My intent is to manage live processing of incoming data (at a rate of 10 to 60 data items per second). Processing one piece of data is divided into two fully sequential tasks, but parallelization is used to start processing the next piece of data (with task 1) while processing of the current one (with task 2) is not finished yet. Getting the shortest possible delay in the process is of utmost importance since it is a live application.
Once in a while, I encounter a freeze in the process. To see where this problem comes from, I started monitoring the occupation of my workers. It appears to happen during the communication between workers. I designed the lightest and simplest example I could to illustrate it here.
Here is my code. As you can see, I have two tasks doing nothing but waiting 10ms each. I call them using Celery chains once every 20ms. I track each worker's occupation using prerun and postrun signals along with logging. In most cases everything happens sequentially, since the time spent by both workers does not exceed the send interval.
from __future__ import absolute_import
import time
from celery import chain
from celery.signals import task_prerun, task_postrun
from celery import Celery
from kombu import Queue, Exchange

N_ITS = 100000  # Total number of chains sent
LOG_FILE = 'log_file.txt'  # Path to the log file

def write_to_log_file(text):
    with open(LOG_FILE, 'a') as f:
        f.write(text)

# Create celery app
app = Celery('live')
app.config_from_object('celeryconfig')
default_exchange = Exchange('default', type='direct')
app.conf.task_queues = tuple(Queue(route['queue'], default_exchange, routing_key=route['queue'])
                             for route in app.conf.task_routes.values() + [{'queue': 'default'}])
app.conf.update(result_expires=3600)

# Define functions that record timings
@task_prerun.connect()
def task_prerun(signal=None, sender=None, task_id=None, task=None, **kwargs):
    text = 'task_prerun; {0}; {1:.16g}\n'.format(task.name, time.time())
    write_to_log_file(text)

@task_postrun.connect()
def task_postrun(signal=None, sender=None, task_id=None, task=None, **kwargs):
    text = 'task_postrun; {0}; {1:.16g}\n'.format(task.name, time.time())
    write_to_log_file(text)

# Define tasks
@app.task
def task_1(i):
    print 'Executing task_1: {}'.format(i)
    time.sleep(0.01)

@app.task
def task_2(i):
    print 'Executing task_2: {}'.format(i)
    time.sleep(0.01)

# Send chained tasks
def main():
    celery_chains = []
    for i in range(N_ITS):
        print '[{}] - Dispatching tasks'.format(i)
        celery_chains.append(chain(task_1.si(i) | task_2.si(i))())
        time.sleep(0.02)
    # wait for all tasks to complete
    [c.get() for c in celery_chains]

if __name__ == '__main__':
    main()
I also give the configuration of celery if needed:
from __future__ import absolute_import
import os
name = 'live'
broker_url = 'pyamqp://{}'.format(os.environ.get('RMQ_HOST', 'localhost'))
print 'broker_url:', broker_url
include = ['live']
DEFAULT_QUEUE = 'celery'
# A named queue that's not already defined in task_queues will be created automatically.
task_create_missing_queues = True
broker_pool_limit = 10000
task_routes = {
    'live.task_1': {'queue': 'worker_1'},
    'live.task_2': {'queue': 'worker_2'}
}
# We always set the routing key to be the queue name so we do it here automatically.
for v in task_routes.values():
    v.update({'routing_key': v['queue']})
task_serializer = 'pickle'
result_serializer = 'pickle'
accept_content = ['json', 'pickle']
timezone = 'Europe/Paris'
enable_utc = True
For the broker, I use the docker image rabbitmq:3.6-alpine with the basic configuration, except that I enabled rabbitmq_management.
This results in the following worker occupation chronogram: (the color indicates the index of the data being processed, so you can link tasks belonging to the same chain)
As you can see, usually everything goes well and task 2 is called right after task 1 is finished. However, sometimes (indicated by the arrows on the figure) task 2 doesn't start immediately even though worker 2 isn't occupied. This introduces a delay of 27ms, which is more than twice the duration of a single task. It happened approximately every 2 seconds during this execution.
I did some additional investigation using firehose to study the message exchange in RabbitMQ, and it showed that the messages are indeed sent on time. As far as I can tell, the worker waits before fetching the message and processing the task, but I cannot understand why.
I tried setting the broker pool limit to a high number, but the issue remains.
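The hand-off delay described above can be measured directly from the log file written by the prerun and postrun handlers. The sketch below assumes the `event; task name; timestamp` line format produced by write_to_log_file, and it assumes that chains complete in dispatch order, so the i-th task_1 end pairs with the i-th task_2 start. `handoff_gaps` is a hypothetical helper name, and the sample lines are made-up values used only to exercise it.

```python
def handoff_gaps(lines, first='live.task_1', second='live.task_2'):
    """Return seconds between each `first` postrun and the matching `second` prerun."""
    ends, starts = [], []
    for line in lines:
        parts = [p.strip() for p in line.split(';')]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        event, name, stamp = parts
        if event == 'task_postrun' and name == first:
            ends.append(float(stamp))
        elif event == 'task_prerun' and name == second:
            starts.append(float(stamp))
    # Assumes chains complete in dispatch order, so the i-th entries pair up.
    return [start - end for end, start in zip(ends, starts)]


# Made-up sample lines in the log format, for illustration only.
sample = [
    'task_prerun; live.task_1; 100.000\n',
    'task_postrun; live.task_1; 100.010\n',
    'task_prerun; live.task_2; 100.012\n',
    'task_postrun; live.task_2; 100.022\n',
]
gaps = handoff_gaps(sample)
```

Passing an open file object for log_file.txt instead of `sample` and filtering for gaps above, say, 0.02 seconds would list the chains affected by the delay.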

Why does this Celery "hello world" loop forever?

Consider the code:
from celery import Celery, group
from time import time

app = Celery('tasks', broker='redis:///0', backend='redis:///1', task_ignore_result=False)

@app.task
def test_task(i):
    print('hi')
    return i

x = test_task.delay(3)
print(x.get())
I run it by calling python script.py, but I'm getting no results. Why?
You don't get any results because you've asked your celery app to execute a task without starting a worker process to do the work executing it. The process you did start is blocked on the call to get().
First things first, when using celery it is critical that you do not have tasks get executed when a module is imported, so let's put your task execution inside of a main() function, and put it in a file called celery_test.py.
from celery import Celery, group
from time import time

app = Celery('tasks', broker='redis:///0', backend='redis:///1', task_ignore_result=False)

@app.task
def test_task(i):
    print('hi')
    return i

def main():
    x = test_task.delay(3)
    print(x.get())

if __name__ == '__main__':
    main()
Now let's start a pool of celery workers to execute tasks for this app. You can do this by opening a new terminal and executing the following.
celery worker -A celery_test --loglevel=INFO
The -A flag refers to the module where celery will find an application to add workers to. You should see some output in the terminal to indicate that the celery worker is running and ready for tasks to process.
Now, try executing your script again with python celery_test.py. You should see hi show up in the worker's log output, and the value 3 returned in the script that called get().
Be warned, if you've been playing with celery without running a worker, it probably has lots of tasks waiting in your broker to execute. The first time you start up the worker pool, you'll see them all execute in parallel until the broker runs out of tasks.

Celery, periodic task execution, with concurrency

I would like to launch a periodic task every second but only if the previous task ended (db polling to send task to celery).
In the Celery documentation they are using the Django cache to make a lock.
I tried to use the example:
from __future__ import absolute_import
import datetime
import time
from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60 * 5

@shared_task
def periodic():
    acquire_lock = lambda: cache.add('lock_id', 'true', LOCK_EXPIRE)
    release_lock = lambda: cache.delete('lock_id')
    a = acquire_lock()
    if a:
        try:
            time.sleep(10)
            print a, 'Hello ', datetime.datetime.now()
        finally:
            release_lock()
    else:
        print 'Ignore'
with the following configuration:
from datetime import timedelta

app.conf.update(
    CELERY_IGNORE_RESULT=True,
    CELERY_ACCEPT_CONTENT=['json'],
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERYBEAT_SCHEDULE={
        'periodic_task': {
            'task': 'app_task_management.tasks.periodic',
            'schedule': timedelta(seconds=1),
        },
    },
)
But in the console, I never see the Ignore message and I have Hello every second. It seems that the lock is not working fine.
I launch the periodic task with:
celeryd -B -A my_app
and the worker with:
celery worker -A my_app -l info
Could you please correct my misunderstanding?
From the Django Cache Framework documentation about local-memory cache:
Note that each process will have its own private cache instance, which
means no cross-process caching is possible.
So basically your workers are each dealing with their own cache. If you need a low resource cost cache backend, I would recommend the File Based Cache or the Database Cache, both of which allow cross-process caching.
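Switching to the file-based backend is a settings change. The fragment below is a minimal sketch for settings.py; the LOCATION path is a hypothetical example and must be a directory that every worker process can read and write. Whether cache.add() on a given backend is atomic enough to serve as a lock is worth checking before relying on it.

```python
# settings.py (fragment). LOCATION is a hypothetical path; it must be
# writable by every worker process that should share the cache.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',
    }
}
```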
