I am running this code as a CherryPy Web Service both on Mac OS X and Ubuntu 14.04. Using multiprocessing on Python 3, I want to start the static method worker() asynchronously within a process Pool.
The same code runs flawlessly on Mac OS X, but on Ubuntu 14.04 worker() does not run. By debugging the code inside the POST method I can see that every line is executed, from
reqid = str(uuid.uuid4())
to
return handle_error(202, "Request ID: " + reqid)
On Ubuntu 14.04, however, the worker() method never runs; not even a print() at the top of the method (which would be logged) is executed.
Here's the relevant code (I only omitted the handle_error() method):
import cherrypy
import json
from lib import get_parameters, handle_error
from multiprocessing import Pool
import os
from pymatbridge import Matlab
import requests
import shutil
import uuid
from xml.etree import ElementTree
class Schedule(object):
    exposed = True

    def __init__(self, mlab_path, pool):
        self.mlab_path = mlab_path
        self.pool = pool

    def POST(self, *paths, **params):
        if validate(cherrypy.request.headers):
            try:
                reqid = str(uuid.uuid4())
                path = os.path.join("results", reqid)
                os.makedirs(path)

                wargs = [(self.mlab_path, reqid)]
                self.pool.apply_async(Schedule.worker, wargs)

                return handle_error(202, "Request ID: " + reqid)
            except:
                return handle_error(500, "Internal Server Error")
        else:
            return handle_error(401, "Unauthorized")
    #### this is not executed ####
    @staticmethod
    def worker(args):
        mlab_path, reqid = args
        mlab = Matlab(executable=mlab_path)
        mlab.start()
        mlab.run_code("cd mlab")
        mlab.run_code("sched")
        a = mlab.get_variable("a")
        mlab.stop()
        return reqid
    ####
# to start the Web Service
if __name__ == "__main__":
    # start Web Service with some configuration
    global_conf = {
        "global": {
            "server.environment": "production",
            "engine.autoreload.on": True,
            "engine.autoreload.frequency": 5,
            "server.socket_host": "0.0.0.0",
            "log.screen": False,
            "log.access_file": "site.log",
            "log.error_file": "site.log",
            "server.socket_port": 8084
        }
    }
    cherrypy.config.update(global_conf)
    conf = {
        "/": {
            "request.dispatch": cherrypy.dispatch.MethodDispatcher(),
            "tools.encode.debug": True,
            "request.show_tracebacks": False
        }
    }
    pool = Pool(3)
    cherrypy.tree.mount(Schedule('matlab', pool), "/sched", conf)
    # activate signal handler
    if hasattr(cherrypy.engine, "signal_handler"):
        cherrypy.engine.signal_handler.subscribe()
    # start serving pages
    cherrypy.engine.start()
    cherrypy.engine.block()
Your logic is hiding the problem from you. The apply_async method returns an AsyncResult object which acts as a handler to the asynchronous task you just scheduled. As you ignore the outcome of the scheduled task, the whole thing looks like "failing silently".
If you try to get the results from that task, you'd see the real problem.
handler = self.pool.apply_async(Schedule.worker, wargs)
handler.get()
... traceback here ...
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
In short, you must ensure that the arguments you pass to the Pool are picklable.
Instance and class methods are picklable if the object/class they belong to is picklable as well. Static methods are not picklable because they lose the association with the object itself, so the pickle library cannot serialise them correctly.
As a general rule, it is better to avoid scheduling anything other than top-level functions to a multiprocessing.Pool.
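Here is a minimal sketch of that approach, assuming the same pymatbridge-based job as in the question (the request id is a placeholder). The worker lives at module level so the pool can pickle it, and handler.get() re-raises any exception that occurred in the child process:
from multiprocessing import Pool
from pymatbridge import Matlab

def run_job(args):
    # top-level function: picklable, so the Pool can ship it to a worker process
    mlab_path, reqid = args
    mlab = Matlab(executable=mlab_path)
    mlab.start()
    mlab.run_code("cd mlab")
    mlab.run_code("sched")
    mlab.stop()
    return reqid

if __name__ == "__main__":
    pool = Pool(3)
    handler = pool.apply_async(run_job, [("matlab", "some-request-id")])
    print(handler.get(timeout=300))  # surfaces any error raised in the worker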
To run background tasks with CherryPy it's better to use an asynchronous task queue manager such as Celery or RQ. These services are very easy to install and run, your tasks will run in a completely separate process, and scaling up when your load increases is very straightforward.
You have a simple example with Cherrypy here.
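As a rough sketch of the RQ variant (assuming a local Redis server and a hypothetical top-level function run_matlab_job defined in a jobs module), the web handler only enqueues the work and a separate "rq worker" process executes it:
from redis import Redis
from rq import Queue

from jobs import run_matlab_job  # hypothetical top-level task function

queue = Queue(connection=Redis())

def schedule(mlab_path, reqid):
    # the job runs in a separate worker process started with "rq worker"
    return queue.enqueue(run_matlab_job, mlab_path, reqid)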
I solved it by changing the method from @staticmethod to @classmethod. Now the job runs inside the process Pool. I found classmethods to be more useful in this case, as explained here.
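Roughly, the change looks like this (a sketch of my class, with the Matlab details omitted):
class Schedule(object):
    @classmethod
    def worker(cls, args):
        # with @classmethod the Pool can pickle Schedule.worker on this platform
        mlab_path, reqid = args
        # ... start Matlab and run the scheduled job as before ...
        return reqid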
Thanks.
I'd like to use event subscription/notification together with multithreading. In theory it sounds like it should just work, and the documentation doesn't include any warnings. The events should be synchronous, so no deferring either.
But in practice, when I notify off the main thread, nothing comes in:
def run():
    logging.config.fileConfig(sys.argv[1])
    with bootstrap(sys.argv[1]) as env:
        get_current_registry().notify(FooEvent())  # <- works
        Thread(target=thread).start()  # <- doesn't work

def thread():
    get_current_registry().notify(FooEvent())
Is this not expected to work? Or am I doing something wrong?
I also tried the suggested solution, but it doesn't print the expected event.
class Foo:
    pass

@subscriber(Foo)
def metric_report(event):
    print(event)

def run():
    with bootstrap(sys.argv[1]) as env:
        def foo(env):
            try:
                with env:
                    get_current_registry().notify(Foo())
            except Exception as e:
                print(e)

        t = Thread(target=foo, args=(env,))
        t.start()
        t.join()
get_current_registry() is trying to access the threadlocal variable Pyramid registers when processing requests or config to tell the thread what Pyramid app is currently active IN THAT THREAD. The gotcha here is that get_current_registry() always returns a registry, just not the one you want, so it's hard to see why it's not working.
When spawning a new thread, you need to register your Pyramid app as the current threadlocal. The best way to do this is with pyramid.scripting.prepare. The "easy" way is just to run bootstrap again in your thread. I'll show the "right" way though.
def run():
    pyramid.paster.setup_logging(sys.argv[1])
    get_current_registry().notify(FooEvent())  # doesn't work, just like in the thread
    with pyramid.paster.bootstrap(sys.argv[1]) as env:
        registry = env['registry']
        registry.notify(FooEvent())  # works
        get_current_registry().notify(FooEvent())  # works
        Thread(target=thread_main, args=(env['registry'],)).start()

def thread_main(registry):
    registry.notify(FooEvent())  # works, but threadlocals are not set up if other code triggered by this invokes get_current_request() or get_current_registry()

    # so let's set up threadlocals
    with pyramid.scripting.prepare(registry=registry) as env:
        registry.notify(FooEvent())  # works
        get_current_registry().notify(FooEvent())  # works
pyramid.scripting.prepare is what bootstrap uses under the hood, and is a lot more efficient than running bootstrap multiple times because it shares the registry and all of your app configuration instead of making a completely new copy of your app.
Is it just that the 'with' context applies to the Thread() creation statement only and does not propagate to the thread() method? I.e. in the case that works, the get_current_registry() call is inside the 'with' env context, but this context does not propagate to the point where the thread runs get_current_registry(). So you need to propagate the env to the thread() yourself, perhaps by creating a simple runnable class that takes the env in its __init__ method.
class X:
    def __init__(self, env):
        self.env = env

    def __call__(self):
        with self.env:
            get_current_registry().notify(FooEvent())
        return

def run():
    logging.config.fileConfig(sys.argv[1])
    with bootstrap(sys.argv[1]) as env:
        get_current_registry().notify(FooEvent())
        Thread(target=X(env)).start()
I'm a bit new to coding/scripting in general and need some help implementing multiprocessing.
I currently have two functions that I'll concentrate on here. The first, getting_routes(router), logs into each of my routers (the list of routers comes from a previous function) and runs a command. The second, parse_paths(routes), parses the results of this command.
def get_list_of_routers():
    <some code>
    return routers

def getting_routes(router):
    routes = sh.ssh(router, "show ip route")
    return routes

def parse_paths(routes):
    l = routes.split("\n")
    ...... <more code>.....
    return parsed_list
My list is roughly 50 routers long, and together with the subsequent parsing this takes quite a bit of time, so I'd like to use the multiprocessing module to run the SSH sessions, command execution, and subsequent parsing in parallel across all routers.
I wrote:
#!/usr/bin/env python

from multiprocessing.dummy import Pool as ThreadPool

def get_list_of_routers():  # this part does not need to be threaded
    <some code>
    return routers

def getting_routes(router):
    routes = sh.ssh(router, "show ip route")
    return routes

def parse_paths(routes):
    l = routes.split("\n")
    ...... <more code>.....
    return parsed_list

if __name__ == '__main__':
    worker_1 = multiprocessing.Process(target=getting_routes)
    worker_2 = multiprocessing.Process(target=parse_paths)
    worker_1.start()
    worker_2.start()
What I'd like is to SSH into the routers in parallel, run the command, and return the parsed output. I've been reading http://kmdouglass.github.io/posts/learning-pythons-multiprocessing-module.html and the multiprocessing documentation, but I'm still not getting the results I need and keep getting undefined errors. Any help with what I might be missing in the multiprocessing module? Thanks in advance!
Looks like you're not sending the router parameter to the getting_routes function.
Also, I think using threads will be sufficient; you don't need to create new processes.
What you need to do is create a loop in your main block that starts a new thread for each router returned from the get_list_of_routers function. Then you have two options: either call the parse_paths function from within the thread, or collect the return values from the threads and then call parse_paths.
For example:
import Queue
from threading import Thread

que = Queue.Queue()
threads = []

for router in get_list_of_routers():
    t = Thread(target=lambda q, arg1: q.put(getting_routes(arg1)), args=(que, router))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

results = []
while not que.empty():
    results.append(que.get())

parse_paths(results)
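Alternatively, here is a short sketch using multiprocessing.dummy, the thread-pool API hinted at by the import in your own attempt; Pool.map collects each router's output for you, and parse_paths is then applied per router. This is just one way to wire it up, not the only correct one:
from multiprocessing.dummy import Pool as ThreadPool

def collect_routes(routers):
    pool = ThreadPool(10)  # up to 10 concurrent SSH sessions
    try:
        all_routes = pool.map(getting_routes, routers)  # one result per router
    finally:
        pool.close()
        pool.join()
    return [parse_paths(routes) for routes in all_routes]

parsed = collect_routes(get_list_of_routers())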
I've been following this example to implement a Python daemon, and it seems to be somewhat working, but only the configure function is called.
This is the code I've been using:
import signal
import daemon
import lockfile
import manager

context = daemon.DaemonContext(
    working_directory='/home/debian/station',
    pidfile=lockfile.FileLock('/var/run/station.pid'))

context.signal_map = {
    signal.SIGTERM: manager.Manager.program_terminate,
    signal.SIGHUP: 'terminate',
    signal.SIGUSR1: manager.Manager.program_reload_configuration,
}

manager.Manager.program_configure()

with context:
    manager.Manager.program_start()
This is the code in the Manager class:
@staticmethod
def program_configure():
    logging.info('Configuring program')

@staticmethod
def program_reload_configuration():
    logging.info('Reloading configuration')

@staticmethod
def program_start():
    global Instance
    logging.info('Program started')
    Instance = Manager()
    Instance.run()

@staticmethod
def program_terminate():
    logging.info('Terminating')
And the log shows only:
INFO:root:Configuring program
For some reason program_start() isn't being called.
program_configure() is called every time the Python file is read, so that explains the log line, but why isn't program_start() called?
I start the daemon by typing sudo service station.sh start and the line that runs the script is:
python $DAEMON start
EDIT:
After reading a bit, I've realized that the program probably exits or hangs in context.__enter__() (which the with statement calls), but I have no clue what is causing this.
The problem wasn't that python-daemon wasn't calling the functions; it was the logging that didn't work.
When the daemon creates a new process, it doesn't transfer all file handles from the parent process, so the logs aren't written. See this question for more info.
The solution is to use the files_preserve property like so:
# Set up the logger
LOG_LEVEL = logging.DEBUG

logger = logging.getLogger()
logger.setLevel(LOG_LEVEL)
fh = logging.FileHandler(LOG_FILENAME)
logger.addHandler(fh)

# Now create the context, and tell it to preserve the log file
context = daemon.DaemonContext(
    working_directory='/home/debian/station',
    pidfile=lockfile.FileLock('/var/run/station.pid'),
    files_preserve=[fh.stream],
)
I am trying to use a Celery task as a class and am seeing the following behaviour. I guess I missed something. Let me first tell you what I am trying to achieve:
1. Create a class whose init function is called only once by Celery. This will set up the required params for my class; I am going to create a thread pool here.
2. Create an instance of this Celery task object in the producer and put jobs into it.
To achieve this I tried the naive example mentioned on the Celery site and created a sample class. I start the worker using:
celery -c 1 -A proj worker --loglevel=debug
It seems to work at first, but then I observed that the init of the task is called at import time in tester.py. I could skip this init in normal object usage by passing a flag, but the init during import is the real concern here.
Can you please point me to the correct usage of this example? I do not want the init of the task class to be called more often than what I invoke using the celery command; in a real-life scenario it would create unnecessary threads.
Also, if possible, point me to an example that is closest to my requirement mentioned above.
celery.py
from __future__ import absolute_import

from celery import Celery

app = Celery('proj',
             broker='amqp://',
             backend='amqp://',
             include=['proj.tasks'])

# Optional configuration, see the application user guide.
app.conf.update(
    CELERY_TASK_RESULT_EXPIRES=3600,
)

if __name__ == '__main__':
    app.start()
tasks.py
from __future__ import absolute_import

from proj.celery import app

class NaiveAuthenticateServer(app.Task):

    def __init__(self, celeryInst=1):
        if celeryInst == 1:
            print "Hi, I am celery instant"
        else:
            print "Did you invoke me from command"
        self.users = {'george': 'password'}

    def run(self, username, password):
        try:
            return self.users[username] == password
        except KeyError:
            return False
tester.py
from proj import tasks
obj = tasks.NaiveAuthenticateServer(0)
res = obj.delay('hi', 'hello')
print res.get()
Output of tester.py:
Hi, I am celery instant
Did you invoke me from command
False
You should not create an instance of the task class yourself, but rather let celery do that for you automatically when the process starts up.
Therefore, you need to define a task function that uses the base class:
@app.task(base=NaiveAuthenticateServer)
def my_task(arg1, arg2):
    print arg1, arg2
And then submit the task like this:
from proj import tasks
tasks.my_task.delay('hi', 'hello')
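If the task body needs the state prepared in NaiveAuthenticateServer.__init__ (for example the users dict, or a thread pool), a bound task can reach it through self. This is just a sketch of that idea, not part of the original answer:
@app.task(base=NaiveAuthenticateServer, bind=True)
def authenticate(self, username, password):
    # self is the single NaiveAuthenticateServer instance created in the worker process
    return self.users.get(username) == password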
I'm using Celery (3.1.8) in my Flask application. This is my configuration with the Flask application:
celery.py
from __future__ import absolute_import

from celery import Celery
from cuewords.settings import CELERY_BROKER_URL, CELERY_RESULT_BACKEND

app = Celery('proj',
             broker=CELERY_BROKER_URL,
             backend=CELERY_RESULT_BACKEND)

app.conf.update(CELERY_TASK_RESULT_EXPIRES=3600)

if __name__ == '__main__':
    app.start()
settings.py
CELERY_BROKER_URL='redis://localhost:6379/0'
CELERY_RESULT_BACKEND='redis://localhost:6379/0'
BROKER_TRANSPORT = 'redis'
api.py
class Webcontent(Resource):
    def post(self, session=session):
        args = self.parser.parse_args()
        site_url = args["url"]
        url_present = Websitecontent.site_url_present(session, site_url)
        if site_url.strip() != "" and not url_present:
            try:
                # add data and commit
                session.commit()
                websitecontent = Websitecontent(params*)
                websitecontent.update_url(id, session)
            except:
                session.rollback()
                raise
            finally:
                session.close()
        else:
            return "No data created / data already present"
And in my model I'm registering a method as a task:
model.py
from cuewords.celery import app

class Websitecontent(Base):

    @app.task(name='update_url')
    def update_url(self, id, session):
        ...code goes here..
And this is how I run Celery from the command prompt:
celery -A cuewords.celery worker
I'm also using Flower to monitor the tasks; I can see a worker running, but I can't see any tasks, it's just empty. Any idea what I'm missing or doing wrong?
Thanks
The problem is that your tasks never get imported into the Python runtime when the worker(s) run. The celery command is your entry point, and you're telling Celery to import your cuewords.celery module because that's where your app instance resides. However, this is where the chain of events ends and no further Python code is imported.
Now, the most common mistake is to import the tasks into the same module as the Celery app instance. Unfortunately that means two modules trying to import things from each other, which results in a circular import error. This is no good.
To get around this one could import the task functions into the Celery app module and register them without using the decorator style. For example:
from celery import Celery
from models import my_task
app = Celery()
app.task(name='my_task')(my_task)
This would remove the need to import the app instance in your model module.
However, you're using method tasks. Method tasks need to be treated differently from function tasks, as noted here: http://docs.celeryproject.org/en/latest/reference/celery.contrib.methods.html. They are associated with an instance of an object; in other words, the function is bound to a class. So to use the previous style of registering tasks, you'd need an instance of the class first. To get around this, you should consider making your tasks functions instead of methods.
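For example, a rough sketch of the function-task version; the module and helper names here are illustrative, not taken from the original project:
# tasks.py -- hypothetical module; it must be importable by the worker
# (for example via the app's include setting) for the task to be registered
from cuewords.celery import app

@app.task(name='update_url')
def update_url(website_id):
    # open a fresh DB session here rather than passing one in from the web
    # request, because task arguments have to be serialisable
    pass  # look up the Websitecontent row by website_id and update its URL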