I'm looking for a good pattern for how to implement Python sub-commands, where the main command looks up the subcommand at run time (instead of knowing the list of all possible sub-commands; this allows the "application" to be easily extended with new sub-commands without having to change the main code)
E.g:
topcmd.py foo
will look in /some/dir for foo.py and if it exists, run it. Or some variation of it.
The code invoked in foo.py should preferably be a well-defined function or method on a class or object.
While this question is actually quite broad, there are sufficient tools available within a typical default Python installation (i.e. with setuptools) that this is relatively achievable, in a way that is actually extensible so that other packages can be created/installed in a manner that provide new, discoverable subcommands for your main program.
Your base package can provide a standard console_scripts entry point that points at your main function, which feeds all arguments into an argument parser (such as argparse). It can also define its own entry point group (under a scheme similar to console_scripts) that acts as a registry: the main program iterates through every entry in that group, instantiates the objects it finds, and lets each of them register its own subparser onto the main ArgumentParser as a subcommand. That way your users can see which subcommands are actually available and how to invoke them.
To provide an example, in your main package's setup.py, you might have an entry like
setup(
    name='my.package',
    # ...
    entry_points={
        'console_scripts': [
            'topcmd = my.package.runtime:main',
        ],
        'my.package.subcmd': [
            'subcmd1 = my.package.commands:subprog1',
            'subcmd2 = my.package.commands:subprog2',
        ],
    },
    # ...
)
Inside the my/package/runtime.py source file, the main function constructs a new ArgumentParser instance and then registers a subparser for each entry point found via pkg_resources.working_set, for example:
from pkg_resources import working_set

def init_parser(argparser):  # pass in the argparser provided by main
    commands = argparser.add_subparsers(dest='command')
    for entry_point in working_set.iter_entry_points('my.package.subcmd'):
        subparser = commands.add_parser(entry_point.name)
        # load() can raise an exception due to missing imports or an error in object creation
        subcommand = entry_point.load()
        subcommand.init_parser(subparser)
So in the main function, the argparser instance it created can be passed into a function like the one above, and the entry point 'subcmd1 = my.package.commands:subprog1' will be loaded. Inside my/package/commands.py, the referenced object must provide an init_parser method that takes the supplied subparser and populates it with the required arguments:
class SubProgram1(object):
    def init_parser(self, argparser):
        argparser.add_argument(...)

subprog1 = SubProgram1()
Oh, one final thing: after calling argparser.parse_args(...), the name of the chosen subcommand ends up on the resulting namespace (args.command, given dest='command' above). It should be possible to map that name back to the actual instance, but that may or may not achieve exactly what you want (the main program might want to do further work/validation before actually running the command). That part is another complicated piece, but at least the parsed arguments contain the information required to run the correct subprogram.
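To tie it together, here is a minimal sketch of what such a main function might look like. The run(args) method on the subcommand instances is an assumed convention of your own, not something argparse provides, and there is no error checking:
```python
# my/package/runtime.py -- minimal sketch, no error checking
import argparse
import sys

from pkg_resources import working_set


def main(argv=None):
    argparser = argparse.ArgumentParser(prog='topcmd')
    commands = argparser.add_subparsers(dest='command')
    registry = {}  # command name -> loaded subcommand instance
    for entry_point in working_set.iter_entry_points('my.package.subcmd'):
        subparser = commands.add_parser(entry_point.name)
        subcommand = entry_point.load()
        subcommand.init_parser(subparser)
        registry[entry_point.name] = subcommand

    args = argparser.parse_args(argv)
    # dispatch to the selected subcommand; run() is an assumed convention
    return registry[args.command].run(args)


if __name__ == '__main__':
    sys.exit(main())
```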
Naturally, this includes absolutely no error checking, and some must be implemented to prevent faulty subcommand classes from blowing up the main program. I have made use of a pattern like this one (albeit with a much more complex implementation) that can support an arbitrary number of nested subcommands. Also, packages that want to provide custom commands can simply add their own entry to the entry point group (in this case, my.package.subcmd) in their own setup.py. For example:
setup(
    name="some.other.package",
    # ...
    entry_points={
        'my.package.subcmd': [
            'extracmd = some.other.package.commands:extracmd',
        ],
    },
    # ...
)
Addendum:
As requested, an actual implementation that's used in production is in a package (calmjs) that I currently maintain. Installing that package (into a virtualenv) and running calmjs on the command line should show a listing of subcommands identical to the entries defined in the main package's entry points. Installing an additional package that extends the functionality (such as calmjs.webpack) and running calmjs again will now list calmjs.webpack as an additional subcommand.
The entry points reference instances of subclasses of the Runtime class. There is a place in the code where the subparser is added and, if it satisfies the registration requirements (many of the statements following that relate to various error/sanity checks, such as what to do when multiple packages define the same subcommand name for runtime instances, amongst other things), it is registered to the argparser instance on that particular runtime instance, and the subparser is passed into the init_argparser method of the runtime that encapsulates the subcommand. As an example, the calmjs webpack subcommand's subparser is set up by its init_argparser method, and that package registers the webpack subcommand in its own setup.py. (To play with them, simply use pip to install the relevant packages.)
You can use the __import__ function to dynamically import a module using a string name passed on the command line.
import sys

mod = sys.argv[1]
command = __import__(mod)
# assuming your pattern has a run method defined
command.run()
Error handling, etc left as an exercise for the reader
Edit: This would depend on user plugins being installed via pip. If you want users to drop plugins into a folder without installing, then you would have to add that folder to your python path.
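For that drop-in folder case, a minimal sketch (the plugins directory location and the run() convention are assumptions):
```python
import os
import sys

# make ./plugins importable before the dynamic import
plugin_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'plugins')
sys.path.insert(0, plugin_dir)

command = __import__(sys.argv[1])
command.run()
```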
The simplest answer seems to be, if all my commands are in foo.commands:
import importlib
import pkgutil

import foo.commands

for importer, modname, ispkg in pkgutil.iter_modules(foo.commands.__path__):
    mod = importlib.import_module('foo.commands.' + modname)
    mod.run()
This will run all the sub-commands. (Well, in the real code I will run just one. This is the howto.)
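If you only want the single subcommand named on the command line, a variation might look like this (a sketch, assuming every module under foo.commands defines run()):
```python
import importlib
import sys

cmd = sys.argv[1]
# raises ImportError if no such module exists under foo.commands
mod = importlib.import_module('foo.commands.' + cmd)
mod.run()
```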
Related
I'm trying to put together a small build system in Python that generates Ninja files for my C++ project. Its behavior should be similar to CMake; that is, a bldfile.py script defines rules and targets and optionally recurses into one or more directories by calling bld.subdir(). Each bldfile.py script has a corresponding bld.File object. When the bldfile.py script is executing, the bld global should be predefined as that file's bld.File instance, but only in that module's scope.
Additionally, I would like to take advantage of Python's bytecode caching somehow, but the .pyc file should be stored in the build output directory instead of in a __pycache__ directory alongside the bldfile.py script.
I know I should use importlib (requiring Python 3.4+ is fine), but I'm not sure how to:
Load and execute a module file with custom globals.
Re-use the bytecode caching infrastructure.
Any help would be greatly appreciated!
Injecting globals into a module before execution is an interesting idea. However, I think it conflicts with several points of the Zen of Python. In particular, it requires writing code in the module that depends on global values which are not explicitly defined, imported, or otherwise obtained - unless you know the particular procedure required to call the module.
This may be an obvious or slick solution for the specific use case but it is not very intuitive. In general, (Python) code should be explicit. Therefore, I would go for a solution where parameters are explicitly passed to the executing code. Sounds like functions? Right:
bldfile.py
def exec(bld):
    print('Working with bld:', bld)
    # ...
calling the module:
# set bld
# Option 1: static import
import bldfile
bldfile.exec(bld)
# Option 2: dynamic import if bldfile.py is located dynamically
import importlib.util
spec = importlib.util.spec_from_file_location("unique_name", "subdir/subsubdir/bldfile.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
module.exec(bld)
That way no code (apart from the function definition) is executed when importing the module. The exec function needs to be called explicitly and when looking at the code inside exec it is clear where bld comes from.
I studied importlib's source code, and since I don't intend to make a reusable Loader, it seems like a lot of unnecessary complexity. So I just settled on creating a module with types.ModuleType, adding bld to the module's __dict__, compiling and caching the bytecode with compile, and executing the module with exec. At a low level, that's basically all importlib does anyway.
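For illustration, a rough sketch of that approach; the cache file here is plain marshalled code rather than a real .pyc, the staleness check is a simple mtime comparison, and the function and file names are made up:
```python
import marshal
import os
import types


def exec_bldfile(path, bld, cache_path):
    """Execute a bldfile.py with a custom 'bld' global, caching the compiled code."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, 'rb') as f:
            code = marshal.load(f)
    else:
        with open(path) as f:
            code = compile(f.read(), path, 'exec')
        with open(cache_path, 'wb') as f:
            marshal.dump(code, f)

    module = types.ModuleType('bldfile')
    module.__file__ = path
    module.__dict__['bld'] = bld  # inject the per-file global before execution
    exec(code, module.__dict__)
    return module
```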
It is also possible to work around the limitation by using a dummy module (here, an empty userset.py) that other modules load their globals from:
# service.py
import importlib

module = importlib.import_module('userset')
module.user = user  # attach the value to the dummy 'userset' module
module = importlib.import_module('config')

# config.py
from userset import *
# now you can use 'user', as set from service.py
I'm using Python 3 for an application. For that utility, I need to pass command-line arguments as follows:
python3 -m com.xxx.executor -Denvironment=dev -Dtoggle=False
Both the environment and toggle parameters are present in a property file too. If a value is specified on the command line, it should override what is present in the property file.
I'm basically a Java guy, and in Java, properties passed in the form -Dkey=value are set as system properties. These properties can then be read from code as System.getProperty(key, defaultVal).
But when I tried the same in Python 3, it didn't work.
After referring to the Python docs, it seems to me like sys._xoptions is suitable for my requirement.
python3 -Xenvironment=dev -Xtoggle=False -m com.xxx.executor
Then read the properties using sys._xoptions.
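Reading them back is straightforward; note that every value arrives as a string (or True for a bare -Xname), so any conversion and defaults (the ones below are just placeholders) are up to you:
```python
import sys

# python3 -Xenvironment=dev -Xtoggle=False -m com.xxx.executor
xoptions = sys._xoptions                      # {'environment': 'dev', 'toggle': 'False'}
environment = xoptions.get('environment', 'prod')
toggle = xoptions.get('toggle', 'False') == 'True'
```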
I'm using CPython. The aim of this thread is to confirm whether the way I'm proceeding is right, or whether there are better ways in Python to implement the same.
Python veterans, please guide!
For argument parsing, I use the argparse module (docs) to define the valid named and/or positional arguments.
There are third-party modules as well such as click and docopt. You should use what you feel most comfortable with and whether or not you can use third-party modules. The click documentation contains a (possibly biased) comparison between it, argparse and docopt.
I've never used sys._xoptions, nor did I know of its existence. Seems a bit strange that a function starting with an underscore (normally used to indicate a "private" function) is mentioned in the docs. Perhaps someone else can shed some light on this.
For the parsing of a property file, I use the configparser module (docs). Of course, you could opt for a JSON or YAML config file if you'd prefer that.
That said, you'll have to come up with the necessary code to overrule properties when specified as arguments (though that shouldn't be too difficult).
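A rough sketch of that overruling logic with an ini-style property file (the file and section names are only examples):
```python
import argparse
import configparser

# read defaults from the property file
config = configparser.ConfigParser()
config.read('app.ini')
defaults = dict(config['settings']) if config.has_section('settings') else {}

parser = argparse.ArgumentParser()
parser.add_argument('--environment')
parser.add_argument('--toggle')
args = parser.parse_args()

# command-line values win; anything not given falls back to the property file
settings = dict(defaults)
settings.update({k: v for k, v in vars(args).items() if v is not None})
print(settings)
```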
From the docs on -X args
Reserved for various implementation-specific options. CPython currently defines the following possible values:
That means you probably shouldn't be hijacking these for your own purposes. As Kristof mentioned, argparse is a pretty reasonable choice. Since you want both a file and command line arguments, here's a quick example using a json file-based config:
import json
import argparse
argparser = argparse.ArgumentParser()
argparser.add_argument('--environment')
argparser.add_argument('--toggle', action='store_true')
try:
    with open('config.json') as f:
        args = json.load(f)
except (IOError, ValueError) as e:
    # If the file doesn't exist or has invalid JSON
    args = {}
args.update(vars(argparser.parse_args()))
print(args)
There are other possible alternatives for the file-based config, like the configparser module.
vcrpy is the Python HTTP record/replay package; below is the common usage from its documentation:
class TestCloudAPI(unittest.TestCase):
    def test_get_api_token(self):
        with vcr.use_cassette('fixtures/vcr_cassettes/test_get_api_token.yaml'):
            # real request and testing

    def test_container_lifecycle(self):
        with vcr.use_cassette('fixtures/vcr_cassettes/test_container_lifecycle.yaml'):
I want to have different record files, so I have to repeat this in every method.
Is it possible to have one line somewhere to simplify this like:
TEST_CASE_VCR(USE_METHOD_AS_FILENAME)
class TestCloudAPI(unittest.TestCase):
    def test_get_api_token(self):
        # real request and testing

    def test_container_lifecycle(self):
This is now supported in newer versions of vcrpy by omitting the cassette name altogether. From the documentation:
VCR.py now allows the omission of the path argument to the use_cassette function. Both of the following are now legal/should work
@my_vcr.use_cassette
def my_test_function():
    ...
In both cases, VCR.py will use a path that is generated from the provided test function's name. If no cassette_library_dir has been set, the cassette will be in a file with the name of the test function in the directory of the file in which the test function is declared. If a cassette_library_dir has been set, the cassette will appear in that directory in a file with the name of the decorated function.
It is possible to control the path produced by the automatic naming machinery by customizing the path_transformer and func_path_generator vcr variables.
There isn't a feature to do this currently built in to VCR, but you can make your own. Check out the decorator that Venmo created.
This gets a lot easier with vcrpy-unittest which is--as you might guess--integration between vcrpy and unittest.
Your example becomes this:
from vcr_unittest import VCRTestCase

class TestCloudAPI(VCRTestCase):
    def test_get_api_token(self):
        # real request and testing

    def test_container_lifecycle(self):
        # real request and testing
and the cassettes are automatically named according to the test and saved in a cassettes dir alongside the test file. For example, this would create two files: cassettes/TestCloudAPI.test_get_api_token.yaml and cassettes/TestCloudAPI.test_container_lifecycle.yaml.
The directory and naming can be customized by overriding a couple methods: _get_cassette_library_dir and _get_cassette_name but it's probably not necessary.
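If you do want to customize them, a sketch might look like this (the directory name and the dropped class prefix are just examples):
```python
import os

from vcr_unittest import VCRTestCase


class TestCloudAPI(VCRTestCase):
    def _get_cassette_library_dir(self):
        # keep cassettes next to the tests, but in a differently named folder
        return os.path.join(os.path.dirname(__file__), 'vcr_fixtures')

    def _get_cassette_name(self):
        # default is '<ClassName>.<test_name>.yaml'; drop the class name here
        return '{}.yaml'.format(self._testMethodName)

    def test_get_api_token(self):
        # real request and testing
        pass
```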
vcrpy-unittest is on github at https://github.com/agriffis/vcrpy-unittest and PyPI at https://pypi.python.org/pypi/vcrpy-unittest
I want to make something like a plugin system but can't get it working. To be specific, I have some requirements.
I have a main script that should search for other Python scripts in the ./plugins dir and load them.
This main script searches for classes that inherit from Base, using globals().
If I place these classes in the same main file it works very well, but I can't get it to work the way I want.
Is it possible to do this in Python?
I tried to make something like this:
source: plugins/test.py
class SomeClass(Base):
    def __init__(self):
        self.name = "Name of plugin"
The main script just executes some methods on this class.
You could either import the Python file dynamically or use the exec statement (make sure to define a context to execute in, otherwise the context you use the statement in will be used). Then use Base.__subclasses__ (assuming Base is a new-style class), or call a function from the imported plugin module. In the latter case, you must provide a plugin-registration mechanism.
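As an illustration of the Base.__subclasses__ route, a sketch (this assumes plugins is a package on sys.path, i.e. it has an __init__.py, and that Base is importable from your main code at a hypothetical location):
```python
import importlib
import pkgutil

import plugins               # the ./plugins package
from myapp import Base       # hypothetical location of your Base class

# import every module in ./plugins so their classes get defined
for _, name, _ in pkgutil.iter_modules(plugins.__path__):
    importlib.import_module('plugins.' + name)

# every class inheriting from Base is now discoverable
for cls in Base.__subclasses__():
    plugin = cls()
    print(plugin.name)
```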
Use http://docs.python.org/2/library/imp.html#imp.load_module
For py3 I think there is importlib but I don't know how to use that one offhand.
Try importing the modules using imp; imp.load_module will let you create namespace names dynamically if you need to. Then you can use inspect.getmembers() and inspect.isclass() (example code in this answer) to find all the classes defined in your imported module. Test those for being subclasses of your plugin Base.
...or, more pythonically, just use hasattr to find out if the imported classes 'quack like a duck' (ie, have the methods you expect from your plugin).
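Putting those two suggestions together, a sketch (the module name and path are placeholders, and Base is assumed to be in scope):
```python
import imp
import inspect

# load plugins/test.py under a dynamically chosen module name
mod = imp.load_source('plugin_test', 'plugins/test.py')

# find classes defined in the module that subclass Base (excluding Base itself)
for name, obj in inspect.getmembers(mod, inspect.isclass):
    if issubclass(obj, Base) and obj is not Base:
        plugin = obj()
        print(plugin.name)
```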
PS - I'm assuming you're asking for python 2.x. Good idea to tag the post with version # in future.
I want to make my program pluggable. I'd like to use the setuptools way of doing so, using eggs.
I've been able to make a plugin to provide an alternate class for some functionality, and I can use it.
I would like to select the class to use at runtime; either the one in my core module or in any of the plugins. I'd like to use the pkg_resources way of querying for these classes:
for entrypoint in pkg_resources.iter_entry_points("myapp.myclasses"):
How can I create an EntryPoint object for my class in core, and register it, so that iter_entry_points will return it the same way as for my .egg plugin class ?
pkg_resources.iter_entry_points lists any entry points by the given name in any egg, including your own package. Thus, if your entry_points entry in setup.py lists the following, and you've run setup.py develop to generate the metadata, your own entry point will be included:
[myapp.myclasses]
classentry1 = myapp.mymodule:myclassname1
classentry2 = myapp.mymodule:myclassname2
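Consuming them then looks the same for your core class and for any plugin egg (a small sketch):
```python
import pkg_resources

# iterates over every distribution that declares the group, including your own
for entrypoint in pkg_resources.iter_entry_points("myapp.myclasses"):
    cls = entrypoint.load()   # imports myapp.mymodule and returns the class
    print(entrypoint.name, cls)
```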
The Babel project does exactly this; in its setup.py it lists entry points for both babel.checkers and babel.extractors, and these are looked up by babel.messages.checkers:_find_checkers and babel.messages.extract:extract, respectively.
If you do not want to have a setup.py file (which is easy enough to create and/or generate from a template), then you are facing having to alter the internal state of pkg_resources.working_set instead:
working_set.entries is a list of eggs. You'll have to add the path of your project's top-level directory to this.
working_set.entry_keys is a mapping from paths in entries to a list of package names. Add your project as working_set.entry_keys[path] = ['package.name'].
working_set.by_key is a mapping from package name to a pkg_resources.Distribution instances. You'll need to create a Distribution instance and store it under your package name: working_set.by_key['package.name'] = yourdistribution.
For your purposes the Distribution instance can be fairly sparse, but I'd include at least the project name. You'll need to have an entry-point map on it though:
yourdistribution = Distribution(project_name='package.name')
yourdistribution._ep_map = {'myapp.myclasses': {
    'classentry1': entrypointinstance_for_classentry1,
    'classentry2': entrypointinstance_for_classentry2,
}}
The internal structure _ep_map is normally parsed on demand from the egg-info metadata.
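Put together, the whole dance might look roughly like this. This is a sketch only: the path and names are placeholders, and keep in mind the caveat below about undocumented internals:
```python
from pkg_resources import Distribution, EntryPoint, working_set

path = '/path/to/your/project'  # placeholder
dist = Distribution(location=path, project_name='package.name')
dist._ep_map = {
    'myapp.myclasses': {
        'classentry1': EntryPoint.parse(
            'classentry1 = myapp.mymodule:myclassname1', dist=dist),
        'classentry2': EntryPoint.parse(
            'classentry2 = myapp.mymodule:myclassname2', dist=dist),
    },
}

working_set.entries.append(path)
working_set.entry_keys.setdefault(path, []).append('package.name')
working_set.by_key['package.name'] = dist
```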
Please note that this relies entirely on undocumented internal data structures that can change between versions. In other words, you are on your own here. I'd generate a setup.py file and run python setup.py develop to generate the egg metadata instead.