I'd like to use event subscription/notification together with multithreading. It sounds like it should just work in theory, and the documentation doesn't include any warnings. The events should be synchronous, so no deferring either.
But in practice, when I notify off the main thread, nothing comes through:
def run():
    logging.config.fileConfig(sys.argv[1])
    with bootstrap(sys.argv[1]) as env:
        get_current_registry().notify(FooEvent())  # <- works
        Thread(target=thread).start()  # <- doesn't work

def thread():
    get_current_registry().notify(FooEvent())
Is this not expected to work? Or am I doing something wrong?
I also tried the suggested solution. It doesn't print the expected event.
class Foo:
    pass

@subscriber(Foo)
def metric_report(event):
    print(event)

def run():
    with bootstrap(sys.argv[1]) as env:
        def foo(env):
            try:
                with env:
                    get_current_registry().notify(Foo())
            except Exception as e:
                print(e)

        t = Thread(target=foo, args=(env,))
        t.start()
        t.join()
get_current_registry() tries to access the threadlocal variable that Pyramid registers while processing requests or configuration, which tells the thread what Pyramid app is currently active IN THAT THREAD. The gotcha here is that get_current_registry() always returns a registry, just not the one you want, so it's hard to see why it's not working.
When spawning a new thread, you need to register your Pyramid app as the current threadlocal. The best way to do this is with pyramid.scripting.prepare. The "easy" way is just to run bootstrap again in your thread. I'll show the "right" way though.
def run():
    pyramid.paster.setup_logging(sys.argv[1])
    get_current_registry().notify(FooEvent())  # doesn't work, just like in the thread

    with pyramid.paster.bootstrap(sys.argv[1]) as env:
        registry = env['registry']
        registry.notify(FooEvent())  # works
        get_current_registry().notify(FooEvent())  # works

        Thread(target=thread_main, args=(env['registry'],)).start()

def thread_main(registry):
    # works, but threadlocals are not set up if other code triggered by this
    # invokes get_current_request() or get_current_registry()
    registry.notify(FooEvent())

    # so let's set up the threadlocals
    with pyramid.scripting.prepare(registry=registry) as env:
        registry.notify(FooEvent())  # works
        get_current_registry().notify(FooEvent())  # works
pyramid.scripting.prepare is what bootstrap uses under the hood, and is a lot more efficient than running bootstrap multiple times because it shares the registry and all of your app configuration instead of making a completely new copy of your app.
Is it just that the 'with' context applies only to the Thread() creation statement and does not propagate into the thread() function? That is, in the case that works, the get_current_registry() call runs inside the 'with env' context, but that context does not propagate to the point where the thread calls get_current_registry(). So you need to propagate env to the thread yourself - perhaps by creating a simple callable class that takes env in its __init__ method.
class X:
    def __init__(self, env):
        self.env = env

    def __call__(self):
        with self.env:
            get_current_registry().notify(FooEvent())

def run():
    logging.config.fileConfig(sys.argv[1])
    with bootstrap(sys.argv[1]) as env:
        get_current_registry().notify(FooEvent())
        Thread(target=X(env)).start()
I have a working pytest test with multiple patches and fixtures, like the code here.
I want to wrap the setup of the test case, including the patching, into a single decorator so that test cases in other repos can reuse it.
Ideally, with the decorator, the code would look like this:
import pytest
from module.dummy_microservice import DummyMicroservice

@my_custom_decorator(microservice=DummyMicroservice, ini_file="tests/Microservice/appsettings.ini")
def test_receive_message(test_resource: Resources):
    # ARRANGE
    test_resource.start()
    # ACT
    test_resource.transport...
    # ASSERT

class Resources:
    def __init__(self, microservice: Microservice, transport: Transport) -> None:
        self.microservice = microservice
        self.transport = transport

    def start(self, delay=0.05):
        t1 = Thread(target=self.microservice.start)
        t1.daemon = True  # run the thread in the background
        t1.start()
        sleep(delay)
The setup of the test and the creation of the Resources object should happen behind the scenes in my_custom_decorator.
I can pass DummyMicroservice to the microservice parameter, so my_custom_decorator can create a DummyMicroservice instance.
I am new to Python and I am not sure where to start with this.
Can it be done in Python? If yes, can you show me an example?
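A minimal sketch of what such a decorator could look like, assuming Resources, Microservice, and Transport are importable and that their constructors accept the ini file path (those signatures are assumptions for illustration, not the real API):
import functools

def my_custom_decorator(microservice, ini_file):
    def wrap(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            # build the dependencies; the constructor arguments here are
            # illustrative assumptions, not the real signatures
            service = microservice(ini_file)
            transport = Transport(ini_file)
            resources = Resources(service, transport)
            try:
                # inject the Resources instance as the test_resource argument
                return test_func(resources, *args, **kwargs)
            finally:
                # teardown of the global state would go here
                pass
        return wrapper
    return wrap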
I am trying to perform a load test with the Locust library against an API endpoint. I am running Locust as a library instead of using the locust command. I want to perform a global setup and a global teardown, so that a global state is created initially, used by all the users, and then cleared on teardown (e.g. downloading S3 files once and removing them at the end).
There are built-in event hooks for this, such as init and quitting, which work when running a locustfile with the locust command. But I am unable to trigger these events when running Locust as a library. From Locust's source code I can see that these events are fired in Locust's main.py, which isn't called when running as a library.
How do I add such events when running Locust as a library? I have tried the two approaches below. Is adding an event listener and manually calling event.fire() the correct approach, or is directly creating and calling custom methods, instead of using events, a better approach?
In general, should the init and quitting events be used to set up global state initially and clear it at the end, or can test_start and test_stop be used in their place?
Source Code for reference:
Approach - 1 (Using event hooks)
import gevent
from locust import HttpUser, task, between
from locust.env import Environment
from locust.stats import stats_printer, stats_history
from locust.log import setup_logging
from locust import events

setup_logging("INFO", None)

def on_init(environment, **kwargs):
    print("Perform global setup to create a global state")

def on_quit(environment, **kwargs):
    print("Perform global teardown to clear the global state")

events.quitting.add_listener(on_quit)
events.init.add_listener(on_init)

class User(HttpUser):
    wait_time = between(1, 3)
    host = "https://docs.locust.io"

    @task
    def my_task(self):
        self.client.get("/")

    @task
    def task_404(self):
        self.client.get("/non-existing-path")

# set up Environment and Runner
env = Environment(user_classes=[User], events=events)
runner = env.create_local_runner()

### Fire the init event once the environment and local runner have been instantiated
env.events.init.fire(environment=env, runner=runner)  # Is this the correct approach?

# start a WebUI instance
env.create_web_ui("127.0.0.1", 8089)

# start a greenlet that periodically outputs the current stats
gevent.spawn(stats_printer(env.stats))

# start a greenlet that saves current stats to history
gevent.spawn(stats_history, env.runner)

# start the test
env.runner.start(1, spawn_rate=10)

# in 5 seconds, stop the runner
gevent.spawn_later(5, lambda: env.runner.quit())

# wait for the greenlets
env.runner.greenlet.join()

### Fire the quitting event when the locust process is exiting
env.events.quitting.fire(environment=env, reverse=True)  # Is this the correct approach?

# stop the web server for good measure
env.web_ui.stop()
Approach - 2 (Creating custom methods and calling these directly)
import gevent
from locust import HttpUser, task, between
from locust.env import Environment
from locust.stats import stats_printer, stats_history
from locust.log import setup_logging

setup_logging("INFO", None)

class User(HttpUser):
    wait_time = between(1, 3)
    host = "https://docs.locust.io"

    @classmethod
    def perform_global_setup(cls):
        print("Perform global setup to create a global state")

    @classmethod
    def perform_global_teardown(cls):
        print("Perform global teardown to clear the global state")

    @task
    def my_task(self):
        self.client.get("/")

    @task
    def task_404(self):
        self.client.get("/non-existing-path")

# set up Environment and Runner
env = Environment(user_classes=[User])
runner = env.create_local_runner()

### Perform the global setup
for cls in env.user_classes:
    cls.perform_global_setup()  # Is this the correct approach?

# start a WebUI instance
env.create_web_ui("127.0.0.1", 8089)

# start a greenlet that periodically outputs the current stats
gevent.spawn(stats_printer(env.stats))

# start a greenlet that saves current stats to history
gevent.spawn(stats_history, env.runner)

# start the test
env.runner.start(1, spawn_rate=10)

# in 5 seconds, stop the runner
gevent.spawn_later(5, lambda: env.runner.quit())

# wait for the greenlets
env.runner.greenlet.join()

### Perform the global teardown
for cls in env.user_classes:
    cls.perform_global_teardown()  # Is this the correct approach?

# stop the web server for good measure
env.web_ui.stop()
Both approaches are fine. Using event hooks makes more sense if you think you might want to run in the normal (not as-a-library) way in the future, but if that is unlikely to happen then choose the approach that feels most natural to you.
init/quitting differ from test_start/test_stop in a meaningful way only when doing multiple runs in web UI mode (where test_start/test_stop may fire multiple times). Use whichever is appropriate for what you are doing in the event handler; there is no other guideline.
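For reference, a minimal sketch (an illustration, not taken from the answer above) of registering test_start/test_stop listeners the same way Approach 1 registers init/quitting; with a local runner these are fired when the runner starts and stops the test:
from locust import events

def on_test_start(environment, **kwargs):
    print("Global setup at the start of a test run")

def on_test_stop(environment, **kwargs):
    print("Global teardown at the end of a test run")

events.test_start.add_listener(on_test_start)
events.test_stop.add_listener(on_test_stop)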
I'm using the vSphere API; here are the lines I'm dealing with:
task = vm.PowerOff()
while task.info.state not in [vim.TaskInfo.State.success, vim.TaskInfo.State.error]:
    time.sleep(1)
    log.info("task {} is running".format(task))
log.info("task {} is done".format(task))
The problem is that this blocks execution completely until the task finishes. I would like the logging part to run "in parallel", so I can start other tasks.
I thought about writing a function that accepts a task as a parameter and polls the info.state attribute just like this, but how do I make it non-blocking?
EDIT: I'm using Python 2.7
You could use asyncio and create an event loop. You can use asyncio.async() to create an asynchronous task that won't block the event loop execution.
Here is an example of using the threading module:
import threading

class VMShutdownThread(threading.Thread):
    def __init__(self, vm):
        super(VMShutdownThread, self).__init__()  # don't forget to initialize the base Thread
        self.vm = vm

    def run(self):
        task = self.vm.PowerOff()
        while task.info.state not in [vim.TaskInfo.State.success, vim.TaskInfo.State.error]:
            time.sleep(1)
            log.info("task {} is running".format(task))
        log.info("task {} is done".format(task))

vm_shutdown_thread = VMShutdownThread(vm)
vm_shutdown_thread.start()
If you create a logger, you can configure it to print the thread name.
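For example, a minimal logging setup whose format string includes %(threadName)s (the exact format here is just an illustration):
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(threadName)s] %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)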
I am running this code as a CherryPy web service on both Mac OS X and Ubuntu 14.04. Using multiprocessing on Python 3, I want to start the static method worker() asynchronously, within a process pool.
The same code runs flawlessly on Mac OS X; on Ubuntu 14.04 worker() does not run. By debugging the code inside the POST method I can see that each line is executed, from
reqid = str(uuid.uuid4())
to
return handle_error(202, "Request ID: " + reqid)
On Ubuntu 14.04, however, the worker() method is never run, not even a print() at the top of the method (which would be logged).
Here's the relevant code (I only omitted the handle_error() method):
import cherrypy
import json
from lib import get_parameters, handle_error
from multiprocessing import Pool
import os
from pymatbridge import Matlab
import requests
import shutil
import uuid
from xml.etree import ElementTree

class Schedule(object):
    exposed = True

    def __init__(self, mlab_path, pool):
        self.mlab_path = mlab_path
        self.pool = pool

    def POST(self, *paths, **params):
        if validate(cherrypy.request.headers):
            try:
                reqid = str(uuid.uuid4())
                path = os.path.join("results", reqid)
                os.makedirs(path)

                wargs = [(self.mlab_path, reqid)]
                self.pool.apply_async(Schedule.worker, wargs)

                return handle_error(202, "Request ID: " + reqid)
            except:
                return handle_error(500, "Internal Server Error")
        else:
            return handle_error(401, "Unauthorized")

    #### this is not executed ####
    @staticmethod
    def worker(args):
        mlab_path, reqid = args
        mlab = Matlab(executable=mlab_path)
        mlab.start()
        mlab.run_code("cd mlab")
        mlab.run_code("sched")
        a = mlab.get_variable("a")
        mlab.stop()
        return reqid
    ####

# to start the Web Service
if __name__ == "__main__":
    # start Web Service with some configuration
    global_conf = {
        "global": {
            "server.environment": "production",
            "engine.autoreload.on": True,
            "engine.autoreload.frequency": 5,
            "server.socket_host": "0.0.0.0",
            "log.screen": False,
            "log.access_file": "site.log",
            "log.error_file": "site.log",
            "server.socket_port": 8084
        }
    }
    cherrypy.config.update(global_conf)

    conf = {
        "/": {
            "request.dispatch": cherrypy.dispatch.MethodDispatcher(),
            "tools.encode.debug": True,
            "request.show_tracebacks": False
        }
    }

    pool = Pool(3)
    cherrypy.tree.mount(Schedule('matlab', pool), "/sched", conf)

    # activate signal handler
    if hasattr(cherrypy.engine, "signal_handler"):
        cherrypy.engine.signal_handler.subscribe()

    # start serving pages
    cherrypy.engine.start()
    cherrypy.engine.block()
Your logic is hiding the problem from you. The apply_async method returns an AsyncResult object which acts as a handle to the asynchronous task you just scheduled. Since you ignore the outcome of the scheduled task, the whole thing looks like it's "failing silently".
If you try to get the result of that task, you'll see the real problem.
handler = self.pool.apply_async(Schedule.worker, wargs)
handler.get()
... traceback here ...
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
In short, you must ensure the arguments you pass to the Pool are picklable.
Instance and class methods are picklable if the object/class they belong to is picklable as well. Static methods are not picklable because they lose the association with the object itself, so the pickle library cannot serialise them correctly.
As a general rule, it is better to avoid scheduling anything other than top-level functions to a multiprocessing.Pool.
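A minimal sketch of that rule applied to the code above: move the worker out of the class so the Pool only has to pickle a module-level name and plain arguments (the pymatbridge calls are kept as in the question):
from pymatbridge import Matlab

def worker(args):
    mlab_path, reqid = args
    mlab = Matlab(executable=mlab_path)
    mlab.start()
    mlab.run_code("cd mlab")
    mlab.run_code("sched")
    a = mlab.get_variable("a")
    mlab.stop()
    return reqid

# inside Schedule.POST:
#     self.pool.apply_async(worker, [(self.mlab_path, reqid)])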
To run background tasks with CherryPy it's better to use an asynchronous task queue manager like Celery or RQ. These services are very easy to install and run, your tasks will run in a completely separate process, and if you need to scale because your load is increasing it will be very straightforward.
You have a simple example with Cherrypy here.
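Separate from the linked example, a minimal Celery sketch under assumed names (the broker URL and module layout are placeholders, not from the question) might look like:
from celery import Celery

app = Celery("sched", broker="redis://localhost:6379/0")

@app.task
def run_sched(mlab_path, reqid):
    # the MATLAB work from worker() would go here
    return reqid

# inside Schedule.POST, instead of pool.apply_async:
#     run_sched.delay(self.mlab_path, reqid)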
I solved it by changing the method from @staticmethod to @classmethod. Now the job runs inside the process pool. I found class methods to be more useful in this case, as explained here.
Thanks.
I've been following this example to implement a Python daemon, and it seems to be somewhat working, but only the program_configure() function is called.
This is the code I've been using:
import signal
import daemon
import lockfile
import manager

context = daemon.DaemonContext(
    working_directory='/home/debian/station',
    pidfile=lockfile.FileLock('/var/run/station.pid'))

context.signal_map = {
    signal.SIGTERM: manager.Manager.program_terminate,
    signal.SIGHUP: 'terminate',
    signal.SIGUSR1: manager.Manager.program_reload_configuration,
}

manager.Manager.program_configure()

with context:
    manager.Manager.program_start()
This is the code on the manager class:
@staticmethod
def program_configure():
    logging.info('Configuring program')

@staticmethod
def program_reload_configuration():
    logging.info('Reloading configuration')

@staticmethod
def program_start():
    global Instance
    logging.info('Program started')
    Instance = Manager()
    Instance.run()

@staticmethod
def program_terminate():
    logging.info('Terminating')
And the log shows only:
INFO:root:Configuring program
For some reason program_start() isn't being called.
program_configure() is called every time the Python file is loaded, so that's expected, but why isn't program_start() called?
I start the daemon by typing sudo service station.sh start and the line that runs the script is:
python $DAEMON start
EDIT:
After reading a bit, I've realized that the program probably exits or hangs in context.__enter__() (which the with statement calls), but I have no clue what is causing this.
The problem wasn't that python-daemon isn't calling the functions; it's that the logging didn't work.
When the daemon creates a new process it doesn't transfer all file handles from the parent process, so the logs aren't written. See this question for more info.
The solution to that is to use the files_preserve property like so:
# Set up the logger
LOG_LEVEL = logging.DEBUG
logger = logging.getLogger()
logger.setLevel(LOG_LEVEL)
fh = logging.FileHandler(LOG_FILENAME)
logger.addHandler(fh)

# Now create the context, and tell it to preserve the log file
context = daemon.DaemonContext(
    working_directory='/home/debian/station',
    pidfile=lockfile.FileLock('/var/run/station.pid'),
    files_preserve=[fh.stream],
)