# tasks.py
from celery.decorators import task

@task()
def add(x, y):
    add.delay(1, 9)
    return x + y

>>> import tasks
>>> res = tasks.add.delay(5, 2)
>>> res.get()
7
If I run this code, I expect tasks to be continuously added to the queue. But they aren't! Only the first task (5, 2) gets added to the queue and processed.
There should continuously be tasks being added, because of this line: add.delay(1, 9).
Note: I need each task to execute another task.
As far as I can see, the periodic_task decorator creates periodic tasks, while task creates just one task, and delay just executes it asynchronously.
You should just use periodic_task instead of recursion.
add inside the function body refers to the original function, not to its decorated version.
If you just need to run a task repeatedly, use @periodic_task instead. You only need recursion if the delay is different each time. In that case, subclass Task instead of using the decorator and you'll be able to use recursion without a problem.
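A rough sketch of both suggestions, assuming the same old-style celery.decorators / celery.task API the question imports (the 10-second interval is made up):

from datetime import timedelta
from celery.decorators import periodic_task
from celery.task import Task

# Option 1: no recursion, just a fixed schedule (requires celerybeat to be running)
@periodic_task(run_every=timedelta(seconds=10))
def add_every_10s():
    return 1 + 9

# Option 2: subclass Task; inside run() the class name refers to the registered
# task, so each execution can queue the next one with new arguments
class AddTask(Task):
    def run(self, x, y):
        result = x + y
        AddTask.delay(result, y)   # re-queue with a different first argument
        return result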
You should look at subtasks and callbacks; they might give you the answer you are looking for:
http://celeryproject.org/docs/userguide/tasksets.html
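For the callback idea, the linked tasksets guide shows roughly this pattern (a sketch, not the asker's code): the task accepts an optional callback subtask and invokes it with its own result, so each task can kick off another one.

from celery.decorators import task
from celery.task.sets import subtask

@task
def add(x, y, callback=None):
    result = x + y
    if callback is not None:
        # hand the result to the callback task asynchronously
        subtask(callback).delay(result)
    return result

# each call triggers another add once the first one has finished
add.delay(1, 9, callback=add.subtask((10,)))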
Related
How to run this kind of Celery task properly?
@app.task
def add(x):
    return x + 1

def some_func():
    result = 'result'
    for i in range(10):
        task_id = uuid()
        add.apply_async((i,), task_id=task_id)
    return result
I need all the tasks to be performed sequentially, each one starting only after the previous one has completed.
I tried using time.sleep(), but then returning result waits until all tasks are completed. I need the result returned immediately while all 10 tasks run sequentially in the background.
There is group() in Celery, but it runs the tasks in parallel.
Finally, I solved it by using an immutable signature and a chain:
tasks = [
    add.si(x).set(task_id=uuid())
    for x in range(10)
]
chain(*tasks).apply_async()
If some_func() is executed outside Celery (say, a script is used as a "producer" that just sends those tasks off to be executed), then nothing stops you from calling .get() on the AsyncResult to wait for a task to finish, and looping that as much as you like.
If, however, you want to execute that loop as some sort of Celery workflow, then you have to build a Chain and use it.
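As a minimal sketch of the first option (a plain producer script, reusing the add task from the question): calling .get() on each AsyncResult makes the loop wait for one task to finish before sending the next.

def some_func():
    result = 'result'
    for i in range(10):
        # .get() blocks until this task has finished, so the tasks run one by one
        add.apply_async((i,)).get()
    return result

This of course blocks some_func itself, which is exactly why the chain approach above is the variant that returns immediately.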
Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called run_worker (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all of them to complete. Each run_worker should run on a separate core/thread. I don't mind what order the processes complete in, hence I used async, and I don't have a return value, so I used map.
I thought that I should use:
import multiprocessing as mp

if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]

    pool = mp.Pool(processes=num_workers)
    pool.map_async(run_worker, param_map)
    pool.close()
    pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the worker? That seems like a hack just to control how many run_workers are created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 on AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()
It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. Both run the work in parallel; however, pool.map blocks until all operations are finished (see also this question). pool.map_async is especially meant for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
Regarding your question about the parameters, the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any method to run multiple functions without parameters.
If you would also share your run_worker function, that might help you get better answers to your question. It might also clear up why you want to run a function without any arguments and return values using a map operation in the first place.
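If the underlying goal really is "call a zero-argument function N times on separate cores", one workaround (just a sketch; run_once is a made-up stand-in for such a function) is to submit it repeatedly with apply_async instead of map:

import multiprocessing as mp

def run_once():
    print("doing the work")   # placeholder body

if __name__ == "__main__":
    num_workers = 20
    with mp.Pool(processes=num_workers) as pool:
        # submit the same no-argument function num_workers times
        async_results = [pool.apply_async(run_once) for _ in range(num_workers)]
        for r in async_results:
            r.get()   # waits for each call and re-raises any worker exception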
Maybe I am misunderstanding something about Celery, but I have been stuck on this for quite a while.
I have a bunch of simple subtasks that I want to run in parallel, and I want to iterate over them as they complete rather than waiting for all of them to complete. I tried this:
def task_generator():
    for row in db:
        yield mytask.s(row)

from celery.result import ResultSet

r = ResultSet(t.delay() for t in task_generator())
for result in r.iterate():
    print(result)
However, Celery runs all of the tasks first and the iteration begins only after all of them have completed, despite the docs for ResultSet.iterate reading "Iterate over the return values of the tasks as they finish one by one."
So how do I iterate over the task results as they are completed?
I've implemented this, and have used it in the past. I think .iterate() is deprecated though, so I am working on a new solution myself.
@task
def task_one(foo, bar):
    return foo + bar

subtasks = []
for x in range(10):
    subtasks.append(task_one.s(x, x + 1))

results = celery.group(subtasks)()  # call this
for result in results.iterate(propagate=False):
    answer = result.iteritems().next()
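Since .iterate() is on its way out, one replacement (only a sketch, and it busy-waits with a short sleep) is to poll the group's individual AsyncResult objects and handle whichever are ready:

import time
from celery import group

results = group(task_one.s(x, x + 1) for x in range(10))()
pending = list(results.results)          # the individual AsyncResult objects
while pending:
    for res in list(pending):
        if res.ready():
            print(res.get(propagate=False))
            pending.remove(res)
    time.sleep(0.1)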
Is it possible to schedule an event in Python without multithreading?
I am trying to achieve something like scheduling a function to execute every x seconds.
Maybe sched?
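A small sketch of what that could look like; sched runs everything in the calling thread, and the callback re-schedules itself (the 5-second interval is arbitrary):

import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def tick():
    print("tick")
    scheduler.enter(5, 1, tick)   # schedule the next run in 5 seconds

scheduler.enter(5, 1, tick)
scheduler.run()                   # blocks here; no extra threads are used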
You could use a combination of signal.alarm and a signal handler for SIGALRM, like so, to repeat the function every 5 seconds:
import signal

def handler(sig, frame):
    print("I am done this time")
    signal.alarm(5)  # schedule this to happen again

signal.signal(signal.SIGALRM, handler)
signal.alarm(5)
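Note that the main program still has to stay alive for the alarms to fire; on Unix one way (a sketch) is:

while True:
    signal.pause()   # sleep until the next signal (the alarm) arrives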
The other option is to use the sched module that comes with Python, but I don't know whether it uses threads or not.
Sched is probably the way to go for this, as @eumiro points out. However, if you don't want to do that, then you could do this:
import time

while 1:
    # call your event
    time.sleep(x)  # wait for x seconds before calling the script again
Without threading it seldom makes sense to call a function periodically, because your main thread is blocked while waiting and simply does nothing else in the meantime.
However, if you really want to do so:
import time

for x in range(3):
    print('Loop start')
    time.sleep(2)
    print('Calling some function...')
Is this what you really want?
You could use celery:
Celery is an open source asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker nodes. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
and a code example:
You probably want to see some code by now, so here's an example task adding two numbers:
from celery.decorators import task

@task
def add(x, y):
    return x + y
You can execute the task in the background, or wait for it to finish:
>>> result = add.delay(4, 4)
>>> result.wait()  # wait for and return the result
8
This is of more general use than the problem you describe requires, though.
I am looking for a way to ease my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=lambda: myFunction(arg))
    t.start()
i.e. running the same function, each time with different parameters, in threads.
This is of course a simplified version of the real code; usually the code inside the for loop is ~10-20 lines long and cannot be made simple by using one auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but could instead do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e. somehow threaded() takes the code inside it and runs it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread():
        # code here
        print(arg)
    t = Thread(target=_thread)
    t.start()
Additionally, with decorators, you can sugar it up a little:
def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread():
        # code here
        print(arg)
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S.: I'm still interested to know what your exact use case is.
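For reference, the standard library now covers this case directly; a sketch with concurrent.futures (myFunction and array are taken from the question, the worker count is arbitrary):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as pool:
    for arg in array:
        pool.submit(myFunction, arg)
# leaving the with-block waits for every submitted call to finish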
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used in a manner similar to the one you've given. Use would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
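If you would rather not add a dependency, a bare-bones version of such a contextual pool could look like this (a sketch only, without the error-flagging that module provides):

import threading

class ThreadPool(object):
    def __enter__(self):
        self.threads = []
        return self

    def add_thread(self, target, args=()):
        t = threading.Thread(target=target, args=args)
        t.start()
        self.threads.append(t)

    def __exit__(self, exc_type, exc_value, traceback):
        for t in self.threads:          # wait for every spawned thread
            t.join()

with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])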
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.daemon = True
    thread.start()
    return thread

# in another module
import util
...
for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print("I'm a new thread with arg=%s" % arg)

    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
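A bare-bones sketch of that idea (the names jobs, worker and the thread count are illustrative):

import queue
import threading

def worker(jobs):
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()

jobs = queue.Queue()
for _ in range(4):                                  # N worker threads
    threading.Thread(target=worker, args=(jobs,), daemon=True).start()

for arg in array:
    jobs.put((myFunction, (arg,)))                  # enqueue the jobs

jobs.join()                                         # block until every job is processed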