Use .replace method with Celery sub-tasks - python

I'm trying to solve a problem in celery:
I have one task that queries an API for ids, and then starts a sub-task for each of these.
I do not know, ahead of time, what the ids are, or how many there are.
For each id, I go through a big calculation that then dumps some data into a database.
After all the sub-tasks are complete, I want to run a summary function (export DB results to an Excel format).
Ideally, I do not want to block my main worker querying the status of the sub-tasks (Celery gets angry if you try this.)
This question looks very similar (if not identical?): Celery: Callback after task hierarchy
So using the "solution" (which is a link to this discussion), I tried the following test script:
# test.py
from celery import Celery, chord
from celery.utils.log import get_task_logger

app = Celery('test', backend='redis://localhost:45000/10?new_join=1',
             broker='redis://localhost:45000/11')
app.conf.CELERY_ALWAYS_EAGER = False
logger = get_task_logger(__name__)

@app.task(bind=True)
def get_one(self):
    print('hello world')
    self.replace(get_two.s())
    return 1

@app.task
def get_two():
    print('Returning two')
    return 2

@app.task
def sum_all(data):
    print('Logging data')
    logger.error(data)
    return sum(data)

if __name__ == '__main__':
    print('Running test')
    x = chord(get_one.s() for i in range(3))
    body = sum_all.s()
    result = x(body)
    print(result.get())
    print('Finished w/ test')
It doesn't work for me. I get an error:
AttributeError: 'get_one' object has no attribute 'replace'
Note that I do have new_join=1 in my backend URL, though not in the broker URL. If I add it there, I get an error:
TypeError: _init_params() got an unexpected keyword argument 'new_join'
What am I doing wrong? I'm using Python 3.4.3 and the following packages:
amqp==1.4.6
anyjson==0.3.3
billiard==3.3.0.20
celery==3.1.18
kombu==3.0.26
pytz==2015.4
redis==2.10.3

The Task.replace method will only be added in Celery 3.2: http://celery.readthedocs.org/en/master/whatsnew-3.2.html#task-replace (the changelog entry is misleading, because it suggests that Task.replace existed before and was merely changed.)
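For reference, here is a minimal sketch of how the fan-out could look once Task.replace is available (assumptions: Celery >= 4.0 semantics, since the 3.2 development line was eventually released as 4.0; the task names and placeholder ids are invented for illustration):

from celery import Celery, chord

app = Celery('test',
             backend='redis://localhost:45000/10',
             broker='redis://localhost:45000/11')

@app.task(bind=True)
def fan_out(self):
    ids = [1, 2, 3]  # placeholder for the real API query
    # Replace this task with a chord: one sub-task per id, then the
    # summary callback once every sub-task has finished.
    raise self.replace(chord((process_one.s(i) for i in ids),
                             summarize.s()))

@app.task
def process_one(i):
    return i * 2  # placeholder for the big calculation

@app.task
def summarize(results):
    return sum(results)  # placeholder for the Excel export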

Related

How do you return the result of a completed celery task and store the data in variables?

I have two Flask modules, app.py and tasks.py.
I set up Celery in tasks.py to complete a Selenium WebDriver request (which takes about 20 seconds). My goal is simply to return the result of that request to app.py.
Running the Celery worker in another terminal, I can see in the console that the Celery task completes successfully and prints all the data I need from the Selenium request. However, now I just want to return the task result to app.py.
How do I obtain the Celery worker's result data from tasks.py and store each result element as a variable in app.py?
app.py:
I define the marketplace, call the task function, and request the indexed results:
import tasks
marketplace = 'cheddar_block_games'
# This is what I am trying to get back:
price_check = tasks.scope(marketplace[0])
image = tasks.scope(marketplace[1])
tasks.py:
from celery import Celery
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
celery = Celery(broker='redis://127.0.0.1:6379')
web = webdriver.Chrome()  # assumed; the question does not show how `web` is created

@celery.task()
def scope(marketplace):
    web.get(f'https://magiceden.io/marketplace/{marketplace}')
    price_check = WebDriverWait(web, 30).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[2]/div[2]/div[3]/div[2]/div[2]/div[3]/div[2]/div[4]/div/div[2]/div[1]/div[2]/div/div[2]/div/div[2]/div/span/div[2]/div/span[1]"))).text
    image = WebDriverWait(web, 30).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[2]/div[2]/div[3]/div[2]/div[2]/div[3]/div[2]/div[4]/div/div[2]/div[1]/div[2]/div/div[1]/div/div/img")))
    return (price_check, image)
This answer might be relevant:
https://stackoverflow.com/a/30760142/9347535
app.py should call the task, e.g. using scope.delay or scope.apply_async, and can then fetch the task result with AsyncResult.get():
https://docs.celeryq.dev/en/latest/userguide/tasks.html#result-backends
Note that tasks.py configures only a broker; to fetch results you also need a result backend, e.g. Celery(broker='redis://127.0.0.1:6379', backend='redis://127.0.0.1:6379'), otherwise get() fails with a DisabledBackend error.
Since the task returns a tuple, you can store each variable by unpacking it:
https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
The result would be something like this:
import tasks
marketplace = 'cheddar_block_games'
result = tasks.scope.delay(marketplace)
price_check, image = result.get()
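One caveat: result.get() blocks the caller until the worker finishes, so a timeout is a sensible safety net. A small sketch (the 60-second value is arbitrary):

import tasks

marketplace = 'cheddar_block_games'
result = tasks.scope.delay(marketplace)
try:
    # get() blocks until the task finishes or the timeout expires
    price_check, image = result.get(timeout=60)
except Exception as exc:
    print(f'Task failed or timed out: {exc}')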

Make python script communicate with flask app

I have two scripts that run in loops independently: a simple Python script that generates data
myData = 0
while True:
    myData = get_data()  # this data is now available for the Flask app
and a Flask application that displays the data
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world(myData):
    return str(myData)

app.run()
I wish to somehow connect the two scripts, so the application displays the data produced by the python script.
myData = 0
app = Flask(__name__)

@app.route('/')
def hello_world(myData):
    return str(myData)

app.run()  # does not return until the server is terminated

while True:
    myData = get_data()
When I combine the scripts as shown above, I can see that execution never reaches the while loop (past the app.run() line) until I terminate the app.
I found a similar question here, but it was not helpful, and another question here that is identical to what I am trying to do, but it also does not give me any clue. I cannot find any info on how to make a Flask application communicate with a separately running script. Here's a similar question with no definite answer. Please give me some insight into how these two things should run together; an example would be greatly appreciated.
Since your script keeps generating data indefinitely, I would suggest transforming it into a generator and iterating over it from the web request handler:
def my_counter():
    i = 0
    while True:
        yield i  # using yield instead of return
        i = i + 1

my_counter_it = my_counter()

@app.route('/')
def hello_world():
    return str(next(my_counter_it))  # return the next value from the generator
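To see the behaviour quickly, here is a usage sketch with Flask's built-in test client (assuming the app object from the question; the counter starts at 0):

with app.test_client() as client:
    print(client.get('/').data)  # b'0'
    print(client.get('/').data)  # b'1' -- each request advances the generator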
You can also communicate with a long running separate process (external command):
import subprocess

def my_counter():
    # run the yes command, which repeatedly outputs y
    # see yes(1) or http://man7.org/linux/man-pages/man1/yes.1.html
    p = subprocess.Popen('yes', stdout=subprocess.PIPE)
    # the following can also be done in one line: yield from p.stdout
    for line in p.stdout:
        yield line
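If you want the route itself to stream that output, Flask's Response can wrap a generator directly. A minimal sketch, assuming the my_counter() generator above:

from flask import Flask, Response

app = Flask(__name__)

@app.route('/stream')
def stream():
    # Response accepts an iterable and streams each yielded chunk
    return Response(my_counter(), mimetype='text/plain')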
You can create a function that will produce the data, which can then be served on the route:
def get_data():
    i = 0
    while i < 1000:
        i += 1
    return str(i)  # note: the loop runs to completion, so this always returns '1000'

@app.route('/')
def hello_world():
    return get_data()

How can I leverage luigi for Openstack tasks

I want to use Luigi to manage workflows in OpenStack. I am new to Luigi. For starters, I just want to authenticate myself to OpenStack and then fetch the image list, flavor list, etc. using Luigi. Any help will be appreciated.
I am not good with Python, but I tried the code below. I am also not able to list images. Error: glanceclient.exc.HTTPNotFound: The resource could not be found. (HTTP 404)
import luigi
import os_client_config
import glanceclient.v2.client as glclient
from luigi.mock import MockFile
import sys
import os

def get_credentials():
    d = {}
    d['username'] = 'X'
    d['password'] = 'X'
    d['auth_url'] = 'X'
    d['tenant_name'] = 'X'
    d['endpoint'] = 'X'
    return d

class LookupOpenstack(luigi.Task):
    d = []

    def requires(self):
        pass

    def output(self):
        gc = glclient.Client(**get_credentials())
        images = gc.images.list()
        print("images", images)
        for i in images:
            print(i)
        return MockFile("images", mirror_on_stderr=True)

    def run(self):
        pass

if __name__ == '__main__':
    luigi.run(["--local-scheduler"], LookupOpenstack())
The general approach to this is to just write Python code that performs the tasks you want using the OpenStack API. https://docs.openstack.org/user-guide/sdk.html It looks like the error you are getting is addressed on the OpenStack site. https://ask.openstack.org/en/question/90071/glanceclientexchttpnotfound-the-resource-could-not-be-found-http-404/
You would then wrap this code in Luigi Tasks as appropriate; there's nothing special about doing this with OpenStack, except that you must define the output() of your Luigi tasks to match up with an output that indicates the task is done. Right now the work is being done in the output() method, but it should be in the run() method. The output() method should only describe what to look for to determine whether run() is complete, so the task doesn't run() again when required by another task if it is already done (see the sketch below).
It's really impossible to say more without understanding more details of your workflow.
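To make the split concrete, here is a minimal sketch of that restructuring (assumptions: glclient and get_credentials() come from the question, the client yields dict-like images, and a local JSON file stands in for whatever completion marker fits your workflow):

import json
import luigi
import glanceclient.v2.client as glclient

class LookupOpenstack(luigi.Task):
    def output(self):
        # Only names the artifact whose existence marks the task as done
        return luigi.LocalTarget('openstack_images.json')

    def run(self):
        # The actual work lives here, not in output()
        gc = glclient.Client(**get_credentials())
        images = [dict(i) for i in gc.images.list()]
        with self.output().open('w') as f:
            json.dump(images, f)

if __name__ == '__main__':
    luigi.build([LookupOpenstack()], local_scheduler=True)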

need to restart python while applying Celery config

That's a small story...
I had this error:
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
I changed tasks.py, as Diederik said at Celery with RabbitMQ: AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for':
app = Celery('tasks', backend='rpc://', broker='amqp://guest@localhost//')
and ran it:
>>> from tasks import add
>>> result = add.delay(4,50)
>>> result.ready()
I got DisabledBackend again... hmm, what was that?
Then I put the code into a file, run.py, and it returned True:
from tasks import add
try:
    result = add.delay(1, 4)
    print(result.ready())
except:
    print("except")
I see that if I call >>> from tasks import add after tasks.py has changed, it doesn't pick up the updates. The behaviour is the same in IPython, so since I can't understand the reason, I advise people to debug from scripts like ~runthis.py
I will be glad for an answer that smashes my idea...
If you are using the interpreter, you need to
reload(tasks)
This will force a re-import of the tasks module.
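Note that in Python 3 the built-in reload() is gone; it lives in importlib. A small sketch, reusing the add task from the question:

import importlib
import tasks

importlib.reload(tasks)  # force a fresh import after editing tasks.py
result = tasks.add.delay(4, 50)
print(result.ready())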

Django: calling celery task declared in same module

I have a Celery task declared in my Django project that I'm trying to call from the same module it's declared in. Right now, it looks like the following:
# myapp.admin.py
import time

from myproject.celery import app as celery_app

@celery_app.task(name='myapp.admin.add')
def add(x, y):
    time.sleep(10000)
    return x + y

def my_custom_admin_action(modeladmin, request, queryset):
    add.delay(2, 4)

# action later declared in the ModelAdmin
Knowing that Celery is sometimes complicated by relative imports, I've specified the task name explicitly. I even added the following to my settings.py:
CELERY_IMPORTS = ('myapp.admin', )
But when I try to use the admin action, I get the following message in my manage.py celeryd output:
[2014-09-18 14:58:25,413: ERROR/MainProcess] Received unregistered task of type 'myapp.admin.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
Traceback (most recent call last):
File "/Users/JJ/.virtualenvs/TCJ/lib/python2.7/site-packages/celery/worker/consumer.py", line 455, in on_task_received
strategies[name](message, body,
KeyError: 'myapp.admin.add'
What am I doing wrong here? I even tried importing within the action as from . import add, but that didn't seem to help.
Celery is not picking up your add task. One alternative way to solve this is to modify your Celery instance.
In myproject/celery.py, change the Celery instance from
app = Celery('name', backend='your_backend', broker='your_broker')
to
app = Celery('name', backend='your_backend', broker='your_broker',
             include=['myapp.admin'])
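Put together, myproject/celery.py would look roughly like this (a sketch; the name, backend, and broker values are placeholders from the answer above):

from celery import Celery

app = Celery(
    'name',
    backend='your_backend',
    broker='your_broker',
    include=['myapp.admin'],  # forces the worker to import and register these tasks
)

After restarting the worker, myapp.admin.add should then appear in the [tasks] section of its startup banner.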
