RQ Worker throwing "ValueError"

RQ Worker throwing "ValueError" - python

I'm attempting to get RQ/RQ-Worker running on my Flask application. I've tried to get it down to a very simple test case. Here's the general idea:
The user visits the /test page. Which triggers a job to be queued and returns the queued job's job_key
The worker (worker.py) processes the queued job.
The user can then visit the /retrieve/<job_key> page to retrieve the result. [This is not shown.]
The current job is just to add 2 + 2.
Here is the application code:
from rq import Queue
from rq.job import Job
# import conn from worker.py
from worker import conn
app = Flask(__name__)
q = Queue(connection=conn)
def add():
return 2+2
#app.route('/test')
def test():
job = q.enqueue_call(func="add", args=None, result_ttl=5000)
return job.get_id()
if __name__ == "__main__":
app.run()
The worker.py source code looks like this:
from redis import StrictRedis
from rq import Worker, Queue, Connection
listen = ['default']
redis_url = 'redis://localhost:6379'
conn = StrictRedis.from_url(redis_url)
if __name__ == "__main__":
with Connection(conn):
worker = Worker(list(map(Queue, listen)))
worker.work()
To my knowledge, the application code isn't the issue. I can visit the /test page which will enqueue the job. However, once I run the worker, I get the following error:
Traceback (most recent call last):
File "/home/<>/dev/sched/venv/lib/python3.5/site-packages/rq/worker.py", line 588, in perform_job
rv = job.perform()
File "/home/<>/dev/sched/venv/lib/python3.5/site-packages/rq/job.py", line 498, in perform
self._result = self.func(*self.args, **self.kwargs)
File "/home/<>/dev/sched/venv/lib/python3.5/site-packages/rq/job.py", line 206, in func
return import_attribute(self.func_name)
File "/home/<>/dev/sched/venv/lib/python3.5/site-packages/rq/utils.py", line 149, in import_attribute
module_name, attribute = name.rsplit('.', 1)
ValueError: not enough values to unpack (expected 2, got 1)
I feel like the line:
worker = Worker(list(map(Queue, listen)))
is the problem just b/c of the nature of the error, but I have no idea how to fix it. Especially b/c I've seen other projects that seem to use the exact same worker source code.
My technology stack is:
Flask (0.11.1)
Redis (2.10.5)
RQ (0.6.0)
RQ-Worker (0.0.1)
EDIT:
Beginning to think this is a bug. Check out this issue ticket in RQ's source: issue #531.

For me the issue was caused by RQ not being able to resolve my worker module.
The solution was to supply the "qualified" name to enqueue, e.g:
job = q.enqueue("app.worker.add", data)

Related

How to access redis connection in background task

I'm trying to extend the flask-base project https://github.com/hack4impact/flask-base/tree/master/app which comes with a user model only. I'm trying to add the ability to run a background task on redis using rq. I've found https://devcenter.heroku.com/articles/python-rq which is helpful.
this app has support for redis queues with a background redis queue being implemented by running :
#manager.command
def run_worker():
"""Initializes a slim rq task queue."""
listen = ['default']
conn = Redis(
host=app.config['RQ_DEFAULT_HOST'],
port=app.config['RQ_DEFAULT_PORT'],
db=0,
password=app.config['RQ_DEFAULT_PASSWORD'])
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
using:
$ python manage.py run_worker
In my views I have:
#main.route('/selected')
def background_selected():
from rq import Queue
from manage import run_worker.conn
q = Queue(connection=conn)
return q.enqueue(selected)
The problem is I don't know how to import the connection created in run_worker() into my view. I've tried variations of :
from manage import run_worker.conn
but I'm getting:
SyntaxError: invalid syntax.
How can I get access to the conn variable in the background task?

from the documentation, python-rq Configuration
Can you try by making the below changes:
manager.py
import redis
"""Initializes a slim rq task queue."""
listen = ['default']
conn = redis.Redis(host=app.config['RQ_DEFAULT_HOST'],
port=app.config['RQ_DEFAULT_PORT'],
db=0,
password=app.config['RQ_DEFAULT_PASSWORD'])
#manager.command
def run_worker():
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
and from view:
from rq import Queue
from manage import conn
q = Queue(connection=conn)

I contacted the developer who provided the following:

Celery add_periodic_task blocks Django running in uwsgi environment

I have written a module that dynamically adds periodic celery tasks based on a list of dictionaries in the projects settings (imported via django.conf.settings).
I do that using a function add_tasks that schedules a function to be called with a specific uuid which is given in the settings:
def add_tasks(celery):
for new_task in settings.NEW_TASKS:
celery.add_periodic_task(
new_task['interval'],
my_task.s(new_task['uuid']),
name='My Task %s' % new_task['uuid'],
)
Like suggested here I use the on_after_configure.connect signal to call the function in my celery.py:
app = Celery('my_app')
#app.on_after_configure.connect
def setup_periodic_tasks(celery, **kwargs):
from add_tasks_module import add_tasks
add_tasks(celery)
This setup works fine for both celery beat and celery worker but breaks my setup where I use uwsgi to serve my django application. Uwsgi runs smoothly until the first time when the view code sends a task using celery's .delay() method. At that point it seems like celery is initialized in uwsgi but blocks forever in the above code. If I run this manually from the commandline and then interrupt when it blocks, I get the following (shortened) stack trace:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'tasks'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'tasks'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
(SHORTENED HERE. Just contained the trace from the console through my call to this function)
File "/opt/my_app/add_tasks_module/__init__.py", line 42, in add_tasks
my_task.s(new_task['uuid']),
File "/usr/local/lib/python3.6/site-packages/celery/local.py", line 146, in __getattr__
return getattr(self._get_current_object(), name)
File "/usr/local/lib/python3.6/site-packages/celery/local.py", line 109, in _get_current_object
return loc(*self.__args, **self.__kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/__init__.py", line 72, in task_by_cons
return app.tasks[
File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/usr/local/lib/python3.6/site-packages/celery/app/base.py", line 1228, in tasks
self.finalize(auto=True)
File "/usr/local/lib/python3.6/site-packages/celery/app/base.py", line 507, in finalize
with self._finalize_mutex:
It seems like there is a problem with acquiring a mutex.
Currently I am using a workaround to detect if sys.argv[0] contains uwsgi and then not add the periodic tasks, as only beat needs the tasks, but I would like to understand what is going wrong here to solve the problem more permanently.
Could this problem have something to do with using uwsgi multi-threaded or multi-processed where one thread/process holds the mutex the other needs?
I'd appreciate any hints that can help me solve the problem. Thank you.
I am using: Django 1.11.7 and Celery 4.1.0
Edit 1
I have created a minimal setup for this problem:
celery.py:
import os
from celery import Celery
from django.conf import settings
from myapp.tasks import my_task
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')
app = Celery('my_app')
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
sender.add_periodic_task(
60,
my_task.s(),
name='Testtask'
)
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py:
from celery import shared_task
#shared_task()
def my_task():
print('ran')
Make sure that CELERY_TASK_ALWAYS_EAGER=False and that you have a working message queue.
Run:
./manage.py shell -c 'from myapp.tasks import my_task; my_task.delay()'
Wait about 10 seconds before interrupting to see the above error.

So, I have found out that the #shared_task decorator creates the problem. I can circumvent the problem when I declare the task right in the function called by the signal like so:
def add_tasks(celery):
#celery.task
def my_task(uuid):
print(uuid)
for new_task in settings.NEW_TASKS:
celery.add_periodic_task(
new_task['interval'],
my_task.s(new_task['uuid']),
name='My Task %s' % new_task['uuid'],
)
This solution is actually working for me, but I have one more problem with this: I use this code in a pluggable app, so I can't directly access the celery app outside of the signal handler but would like to also be able to call the my_task function from within other code. By defining it within the function it is not available outside of the function, so I cannot import it anywhere else.
I can probably work around this by defining the task function outside of the signal function, and use it with different decorators here and in the tasks.py. I am wondering though if there is a decorator apart from the #shared_task decorator that I can use in the tasks.py that does not create the problem.
The current best solution could be:
task_app.__init__.py:
def my_task(uuid):
# do stuff
print(uuid)
def add_tasks(celery):
celery_my_task = celery.task(my_task)
for new_task in settings.NEW_TASKS:
celery.add_periodic_task(
new_task['interval'],
celery_my_task(new_task['uuid']),
name='My Task %s' % new_task['uuid'],
)
task_app.tasks.py:
from celery import shared_task
from task_app import my_task
shared_my_task = shared_task(my_task)
myapp.celery.py:
import os
from celery import Celery
from django.conf import settings
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')
app = Celery('my_app')
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
from task_app import add_tasks
add_tasks(sender)
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

Could you give a try that signal #app.on_after_finalize.connect:
some fast snippet from working project celery==4.1.0, Django==2.0, django-celery-beat==1.1.0 and django-celery-results==1.0.1
#app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
""" setup of periodic task :py:func:shopify_data_fetcher.celery.fetch_shopify
based on the schedule defined in: settings.CELERY_BEAT_SCHEDULE
"""
for task_name, task_config in settings.CELERY_BEAT_SCHEDULE.items():
sender.add_periodic_task(
task_config['schedule'],
fetch_shopify.s(**task_config['kwargs']['resource_name']),
name=task_name
)
piece of CELERY_BEAT_SCHEDULE:
CELERY_BEAT_SCHEDULE = {
'fetch_shopify_orders': {
'task': 'shopify.tasks.fetch_shopify',
'schedule': crontab(hour="*/3", minute=0),
'kwargs': {
'resource_name': shopify_constants.SHOPIFY_API_RESOURCES_ORDERS
}
}
}

Django rq worker proper setup passing django settngs

i Have a django project which works fine. I need a rq worker to do a job. I got a redis-server running.
Whis is my worker.py file:
import os
import redis
from rq import Worker, Queue, Connection
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
os.environ['DJANGO_SETTINGS_MODULE']='ak.settings.prod_lt'
worker = Worker(map(Queue, listen))
worker.work()
I run the worker from my ubuntu terminal using the following commandline:
/opt/crm/env/bin/python /opt/crm/ak/ak/worker.py
The worker starts fine.
The job I am giving it is getting data from the database and writing the data to an excel file but I am getting the following error:
rq.exceptions.UnpickleError: (
u'Could not unpickle',
ImportError('No module named ak.settings.prod_lt',
<function model_unpickle at 0x7fcba6b67938>,
(('crm', 'EmailExport'), [], <function simple_class_factory at 0x7fcba6b678c0>)))
Can anybody tell me what could be wrong?

Cannot run in multiple processes: IOLoop instance has already been initialized. You cannot call IOLoop.instance() before calling start_processes()

I'm trying to run multiple process in Tornado and I tried the suggestions made on this thread : run multiple tornado processess
But the error hasn't gone for me. This is the server file.
server.py
import os
import sys
import tornado
#import pymongo
from tornado import ioloop, web, httpserver, websocket
from tornado.options import options
#Loading default setting files
import settings
#Motorengine - ODM for mongodb
#from motorengine import connect
app = tornado.web.Application(handlers=[
(r'/', MainHandler),
(r'/ws', WSHandler),
(r'/public/(.*)', tornado.web.StaticFileHandler, {'path': options.public_path})],
template_path=os.path.join(os.path.dirname(__file__), "app/templates"),
static_path= options.static_path,
autoreload=True,
#images=os.path.join(os.path.dirname(__file__), "images"),
debug=False)
if __name__ == '__main__':
#read settings from commandline
options.parse_command_line()
server = tornado.httpserver.HTTPServer(app, max_buffer_size=1024*1024*201)
server.bind(options.port)
# autodetect cpu cores and fork one process per core
server.start(0)
#app.listen(options.port,xheaders=True)
try:
ioloop = tornado.ioloop.IOLoop.instance()
#connect("attmlplatform", host="localhost", port=27017, io_loop=ioloop)
print("Connected to database..")
ioloop.start()
print ('Server running on http://localhost:{}'.format(options.port))
except KeyboardInterrupt:
tornado.ioloop.IOLoop.instance().stop()
I've commented out the 'connect' import based on the anticipation that it may be triggering the instance and I'm not connecting to the database at all. This is just trying to get the server up.
This is the entire trace :
File "server.py", line 52, in <module>
server.start(0)
File "/home/vagrant/anaconda3/envs/py34/lib/python3.4/site-packages/tornado/tcpserver.py", line 200, in start
process.fork_processes(num_processes)
File "/home/vagrant/anaconda3/envs/py34/lib/python3.4/site-packages/tornado/process.py", line 126, in fork_processes
raise RuntimeError("Cannot run in multiple processes: IOLoop instance "
RuntimeError: Cannot run in multiple processes: IOLoop instance has already been initialized. You cannot call IOLoop.instance() before calling start_processes()
Any suggestions much appreciated!
Thanks!

autoreload is incompatible with multi-process mode. When autoreload is enabled you must run only one process.

Python cassandra-driver OperationTimeOut on every query in Celery task

I have a problem with every insert query (little query) which is executed in celery tasks asynchronously.
In sync mode when i do insert all done great, but when it executed in apply_async() i get this:
OperationTimedOut('errors=errors=errors={}, last_host=***.***.*.***, last_host=None, last_host=None',)
Traceback:
Traceback (most recent call last):
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
File "/var/nfs_www/***/www_v1/app/mods/news_feed/tasks.py", line 26, in send_new_comment_reply_notifications
send_new_comment_reply_notifications_method(comment_id)
File "/var/nfs_www/***www_v1/app/mods/news_feed/methods.py", line 83, in send_new_comment_reply_notifications
comment_type='comment_reply'
File "/var/nfs_www/***/www_v1/app/mods/news_feed/models/storage.py", line 129, in add
CommentsFeed(**kwargs).save()
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/models.py", line 531, in save
consistency=self.__consistency__).save()
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/query.py", line 907, in save
self._execute(insert)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/query.py", line 786, in _execute
tmp = execute(q, consistency_level=self._consistency)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cqlengine/connection.py", line 95, in execute
result = session.execute(query, params)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 1103, in execute
result = future.result(timeout)
File "/var/nfs_www/***/env_v0/local/lib/python2.7/site-packages/cassandra/cluster.py", line 2475, in result
raise OperationTimedOut(errors=self._errors, last_host=self._current_host)
OperationTimedOut: errors={}, last_host=***.***.*.***
Does anyone have ideas about problem?
I found this When cassandra-driver was executing the query, cassandra-driver returned error OperationTimedOut, but my query is very little and problem only in celery tasks.
UPDATE:
I made a test task and it raises this error too.
#celery.task()
def test_task_with_cassandra():
from app import cassandra_session
cassandra_session.execute('use news_feed')
return 'Done'
UPDATE 2:
Made this:
#celery.task()
def test_task_with_cassandra():
from cqlengine import connection
connection.setup(app.config['CASSANDRA_SERVERS'], port=app.config['CASSANDRA_PORT'],
default_keyspace='test_keyspace')
from .models import Feed
Feed.objects.count()
return 'Done'
Got this:
NoHostAvailable('Unable to connect to any servers', {'***.***.*.***': OperationTimedOut('errors=errors=Timed out creating connection, last_host=None, last_host=None',)})
From shell i can connect to it
UPDATE 3:
From deleted thread on github issue (found this in my emails): (this worked for me too)
Here's how, in substance, I plug CQLengine to Celery:
from celery import Celery
from celery.signals import worker_process_init, beat_init
from cqlengine import connection
from cqlengine.connection import (
cluster as cql_cluster, session as cql_session)
def cassandra_init():
""" Initialize a clean Cassandra connection. """
if cql_cluster is not None:
cql_cluster.shutdown()
if cql_session is not None:
cql_session.shutdown()
connection.setup()
# Initialize worker context for both standard and periodic tasks.
worker_process_init.connect(cassandra_init)
beat_init.connect(cassandra_init)
app = Celery()
This is crude, but works. Should we add this snippet in the FAQ ?

I had a similar issue. It seemed to be related to sharing the Cassandra session between tasks. I solved it by creating a session per thread. Make sure you call get_session() from you tasks and then do this:
thread_local = threading.local()
def get_session():
if hasattr(thread_local, "cassandra_session"):
return thread_local.cassandra_session
cluster = Cluster(settings.CASSANDRA_HOSTS)
session = cluster.connect(settings.CASSANDRA_KEYSPACE)
thread_local.cassandra_session = session
return session

Inspired by Ron's answer, I come up with the following code to put in tasks.py:
import threading
from django.conf import settings
from cassandra.cluster import Cluster
from celery.signals import worker_process_init,worker_process_shutdown
thread_local = threading.local()
#worker_process_init.connect
def open_cassandra_session(*args, **kwargs):
cluster = Cluster([settings.DATABASES["cassandra"]["HOST"],], protocol_version=3)
session = cluster.connect(settings.DATABASES["cassandra"]["NAME"])
thread_local.cassandra_session = session
#worker_process_shutdown.connect
def close_cassandra_session(*args,**kwargs):
session = thread_local.cassandra_session
session.shutdown()
thread_local.cassandra_session = None
This neat solution will automatically open/close cassandra sessions when celery worker process starts and stops.
Side note: protocol_version=3, because Cassandra 2.1 only supports protocol versions 3 and lower.

The other answers didn't work for me, but the question's 'update 3' did. Here's what I ended up with (small updates to the suggestion within the question):
from celery.signals import worker_process_init
from cassandra.cqlengine import connection
from cassandra.cqlengine.connection import (
cluster as cql_cluster, session as cql_session)
def cassandra_init(*args, **kwargs):
""" Initialize a clean Cassandra connection. """
if cql_cluster is not None:
cql_cluster.shutdown()
if cql_session is not None:
cql_session.shutdown()
connection.setup(settings.DATABASES["cassandra"]["HOST"].split(','), settings.DATABASES["cassandra"]["NAME"])
# Initialize worker context (only standard tasks)
worker_process_init.connect(cassandra_init)

Using django-cassandra-engine the following resolved the issue for me:
db_connection = connections['cassandra']
#worker_process_init.connect
def connect_db(**_):
db_connection.reconnect()
#worker_shutdown.connect
def disconnect(**_):
db_connection.connection.close_all()
look at here

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

RQ Worker throwing "ValueError" - python

For me the issue was caused by RQ not being able to resolve my worker module. The solution was to supply the "qualified" name to enqueue, e.g: job = q.enqueue("app.worker.add", data)

Related

How to access redis connection in background task

Celery add_periodic_task blocks Django running in uwsgi environment

Django rq worker proper setup passing django settngs

Cannot run in multiple processes: IOLoop instance has already been initialized. You cannot call IOLoop.instance() before calling start_processes()

Python cassandra-driver OperationTimeOut on every query in Celery task

Categories

Resources