Best architecture for a Python command-line tool with multiple subcommands

I am developing a command-line toolset for a project. The final tool shall support many subcommands, like so
foo command1 [--option1 [value]?]*
So there can be subcommands like
foo create --option1 value --
foo make file1 --option2 --option3
The tool uses the argparse library for handling command-line arguments and help functionality etc.
A few additional requirements and constraints:
Some options and functionality are identical for all subcommands (e.g. parsing a YAML configuration file).
Some subcommands are quick and simple to code, because they e.g. just call an external bash script.
Some subcommands will be complex and hence long code.
Help for the basic tool should be available as well as for an individual subcommand:
foo help
Available commands are: make, create, add, xyz
foo help make
Details for the make subcommand
Error codes should be uniform across the subcommands (like the same error code for "file not found").
For debugging purposes, and to make progress with self-contained functionality for minimal viable versions, I would like to develop some subcommands as self-contained scripts and modules, like
make.py
that can be imported into the main foo.py script and later on invoked as both
make.py --option1 value etc.
and
foo.py make --option1 value
Now, my problem is: What is the best way to modularize such a complex CLI tool with minimal redundancy (e.g. the arguments definition and parsing should only be coded in one component)?
Option 1: Put everything into one big script, but that will become difficult to manage.
Option 2: Develop the functionality for a subcommand in individual modules / files (like make.py, add.py); but these must remain invocable on their own (via if __name__ == '__main__' ...).
The functions from the subcommand modules could then be imported into the main script, and the parser and arguments from the subcommand added as a subparser.
Option 3: The main script could simply reformat the call to a subcommand to subprocess, like so
subprocess.run('./make.py {arguments}', shell=True, check=True, text=True)

I'm more used to answering questions about the details of numpy and argparse, but here's how I envisage a large package.
In a main.py:
import argparse
import submod1
# ....
sublist = [submod1, ...]
def make_parser(sublist):
    parser = argparse.ArgumentParser()
    # parser.add_argument('-f','--foo') # main specific
    # I'd avoid positionals
    sp = parser.add_subparsers(dest='cmd')  # plus any other options you need
    splist = []
    for md in sublist:
        # the module's name doubles as the subcommand name here
        sp1 = sp.add_parser(md.__name__, help='', parents=[md.parser])
        sp1.set_defaults(func=md.func) # subparser func as shown in docs
        splist.append(sp1)
    return parser
if __name__ == '__main__':
    parser = make_parser(sublist)
    args = parser.parse_args()
    # print(args) # debugging display
    args.func(args) # again the subparser func
In submod1.py
import argparse
def make_parser():
    parser = argparse.ArgumentParser(add_help=False) # check docs?
    parser.add_argument(...) # could add a common parents here
    return parser
parser = make_parser()
def func(args):
    # module specific 'main'
    pass
I'm sure this is incomplete in many ways, since I've written this on the fly without testing. It's a basic subparser definition as documented, but using parents to import subparsers as defined in the submodules. parents could also be used to define common arguments for subparsers, but utility functions would work just as well. I think parents is most useful when using a parser that you can't otherwise access, i.e. an imported one.
parents essentially copies Actions from one parser to the new one: copy by reference (not by value or as a copy). It is not a highly developed tool, and there have been a number of SO questions where people ran into problems. So don't try to overextend it.
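As a rough, self-contained illustration of the idea (everything in one file instead of separate modules), here is a minimal sketch. The names common_parser, run_make, run_create and the --config option are made up for the example; they are not taken from the question.

```python
import argparse


def common_parser():
    """Options shared by every subcommand (hypothetical --config option)."""
    parser = argparse.ArgumentParser(add_help=False)  # the child parser supplies -h
    parser.add_argument("--config", default="foo.yaml",
                        help="path to the configuration file")
    return parser


def run_make(args):
    # Stand-in for the real 'make' implementation.
    return f"make {args.target} (config={args.config})"


def run_create(args):
    # Stand-in for the real 'create' implementation.
    return f"create {args.name} (config={args.config})"


def build_parser():
    common = common_parser()
    parser = argparse.ArgumentParser(prog="foo")
    subparsers = parser.add_subparsers(dest="cmd", required=True)

    make = subparsers.add_parser("make", parents=[common], help="build a target")
    make.add_argument("target")
    make.set_defaults(func=run_make)

    create = subparsers.add_parser("create", parents=[common], help="create an item")
    create.add_argument("name")
    create.set_defaults(func=run_create)
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    return args.func(args)


print(main(["make", "file1", "--config", "other.yaml"]))
```

In a multi-module layout, each run_* function and its add_parser block would presumably live in its own module, with main.py only looping over them.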

Consider using the Command Pattern along with the Factory Method Pattern.
In short, create an abstract class called Command and make each command its own class inheriting from Command.
Example:
class Command():
    def execute(self):
        raise NotImplementedError()
class Command1(Command):
    def __init__(self, *args):
        pass
    def execute(self):
        pass
class Command2(Command):
    def __init__(self, *args):
        pass
    def execute(self):
        pass
This will handle execution of commands. For building, make a command factory.
class CommandFactory():
    @staticmethod
    def create(command, *args):
        if command == 'command1':
            return Command1(*args)
        elif command == 'command2':
            return Command2(*args)
Then you'd be able to execute a command with one line:
CommandFactory.create(command, *args).execute()
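If the if/elif chain in the factory gets long, one possible variation is a registry dict filled in by a class decorator. This is only a sketch of that idea; register, COMMANDS and create are names invented for the example.

```python
class Command:
    def __init__(self, *args):
        self.args = args

    def execute(self):
        raise NotImplementedError


# Hypothetical registry: maps a command name to its class.
COMMANDS = {}


def register(name):
    """Class decorator that records a Command subclass under `name`."""
    def decorator(cls):
        COMMANDS[name] = cls
        return cls
    return decorator


@register("command1")
class Command1(Command):
    def execute(self):
        return f"command1 ran with {self.args}"


@register("command2")
class Command2(Command):
    def execute(self):
        return f"command2 ran with {self.args}"


def create(name, *args):
    """Factory: look the class up instead of walking an if/elif chain."""
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name!r}")
    return COMMANDS[name](*args)


print(create("command1", "--option1", "value").execute())
```

Adding a new command then only means defining a new decorated class; the factory itself stays untouched.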

Thanks for all of your suggestions!
I think the most elegant approach is using Typer and following this recipe:
https://typer.tiangolo.com/tutorial/subcommands/add-typer/

Related

Is it possible to send an 'OptionParser' object as input argument to run main of imported python module?

There are two python scripts, 'script1.py' and 'script2.py'.
'script1.py' uses OptionParser to parse command line arguments.
The contents of 'script1.py' look something like this
from optparse import OptionParser
def main():
    parser = OptionParser()
    parser.add_option("-o", "--option1")
    parser.add_option("-p", "--option2")
    (opts, args) = parser.parse_args()
    # Do things with the options
if __name__ == '__main__':
    main()
It is run from the command line with:
python script1.py -o Option1 -p Option2
'script2.py' also uses OptionParser implemented in the same way but with a different set of options.
'script2.py' also has 'script1.py' imported as a module.
I would like to run the main of script1.py from script2.py. What is the best way to do this?
One way I got this to work is by changing the main of script1.py to take OptionParser as an argument.
def main(OptionParser):
    ...
    ...
    ...
if __name__ == '__main__':
    main(OptionParser)
And making sure the OptionParser for both the scripts have exactly the same options. If I do that then I can just pass the OptionParser object from script2 into script1 as follows:
script1.main(OptionParser)
Is there a way to achieve the same result without making the OptionParser in both the scripts the same.
Ideally, I would like it to work as follows:
script1.main(option1="Option1", option2="Option2")
This way I can run script1 from the command line as well as from another script.
Edit:
I'm also aware I can use subprocess and os.system() to execute the python script. I'm wondering if there are neater ways to design the interaction between the two scripts.
Edit 2:
As per Mig's suggestion I moved the option parser out of main.
script1.py looks as follows now
from optparse import OptionParser
def main(option1, option2):
    # Do main things
    pass
if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option("-o", "--option1")
    parser.add_option("-p", "--option2")
    (opts, args) = parser.parse_args()
    main(option1=opts.option1, option2=opts.option2)
Now, from script2.py, after importing script1.py as a module, I can call the main of script1: script1.main(option1="Option1", option2="Option2").

If you have functions that are supposed to work both as a main script and when imported, then I would not use the option parser inside them. There are many ways to do this, but you can have a main that only takes care of the option parser and then passes the right arguments to the function that is really responsible for the job. Do you see what I mean?
Then calling it from the command line will take the arguments from the option parser, but if you use it as a library, you call the function doing the job instead.
Another, pretty similar way to do this is to keep main as the function doing the real job, but create the option parser in the if __name__ == '__main__': block at the end. You build your option parser in this block and call main with the arguments it needs.
All in all the principle is to separate the real job from the option parsing.
I don't know all the details of your application, so it may not be the answer you are looking for, but it is quite a common thing to do in many programming languages.
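To make the first variant concrete, here is a minimal sketch. The function name run and the argv parameter are assumptions made for the illustration; the option names are the ones from the question.

```python
from optparse import OptionParser


def run(option1, option2):
    """Does the real job; knows nothing about the command line."""
    return f"option1={option1}, option2={option2}"


def main(argv=None):
    """Command-line wrapper: parse the options, then delegate to run()."""
    parser = OptionParser()
    parser.add_option("-o", "--option1")
    parser.add_option("-p", "--option2")
    # With argv=None, optparse falls back to sys.argv[1:].
    opts, _args = parser.parse_args(argv)
    return run(opts.option1, opts.option2)


# Library use, e.g. from script2.py: call the worker directly.
print(run(option1="Option1", option2="Option2"))

# Command-line use, simulated here with an explicit argument list.
# A real script would instead end with:
#     if __name__ == '__main__':
#         main()
print(main(["-o", "Option1", "-p", "Option2"]))
```

Both calls should end up in the same run() function, which is the whole point of separating the job from the parsing.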

How do I refactor a script that uses argparse to be callable inside another Python script?

I have a script that finds test names and is widely used in our company. It operates on the command line like so:
find_test.py --type <type> --name <testname>
Inside the script is the equivalent of:
import argparse
parser = argparse.ArgumentParser(description='Get Test Path')
parser.add_argument('--type', dest='TYPE', type=str, default=None, help="TYPE [REQUIRED]")
parser.add_argument('--name', dest='test_name', type=str, default=None, help="Test Name (Slow)")
parser.add_argument('--id', dest='test_id', type=str, default=None, help="Test ID (Fast)")
parser.add_argument('--normalise', dest='normalise', action="store_true", default=False, help="Replace '/' with '.' in Test Name")
args = parser.parse_args()
(I'm not sure what all these arguments do; I personally only use the first two.) These lines are then followed by the code that uses these arguments.
I want to refactor this script so I can import it as a module, but also preserve its command line functionality - since lots of people use this script and it is also called in some of our csh scripts.
I have refactored it so far, like so:
def _main():
    <all the former code that was in find_test.py>
if __name__ == "__main__":
    _main()
And this still runs fine from the command line. But I don't know how then in my parent script I pass arguments with the relevant switches into this.
How do I refactor this further and then call it in parent script?
Is this possible?
I'd also rather not use docopt, which I've read is the new argparse, unless necessary (i.e. it can't be done with argparse), since it's not installed company-wide and getting it installed can be an arduous procedure.

You shouldn't just move all the code directly into a function; that doesn't help at all.
What you should do is move the code that needs to run whatever happens into a function. (And since it is the external interface, it should not begin with _.) The code that only needs to run from the command line - ie the parser stuff - should stay in the __name__ == '__main__' block, and it should pass its results to main().
So:
def main(TYPE, test_name, test_id=None, normalise=False):
    # ... body of script...
if __name__ == "__main__":
    parser = ...
    ...
    args = parser.parse_args()
    main(**vars(args))
(And docopt isn't the new anything; it's an external library which some people like.)
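For the parent-script side, a sketch might look like the following. The body of main is a placeholder (the real lookup code isn't shown in the question), build_parser and cli are hypothetical helper names, and marking --type as required is an assumption based on its help text. Note that main(**vars(args)) only works because each dest matches a parameter name of main.

```python
import argparse


def main(TYPE, test_name=None, test_id=None, normalise=False):
    # Placeholder body: stands in for the real test lookup.
    if normalise and test_name:
        test_name = test_name.replace("/", ".")
    return {"type": TYPE, "name": test_name, "id": test_id}


def build_parser():
    parser = argparse.ArgumentParser(description="Get Test Path")
    parser.add_argument("--type", dest="TYPE", required=True)
    parser.add_argument("--name", dest="test_name")
    parser.add_argument("--id", dest="test_id")
    parser.add_argument("--normalise", action="store_true")
    return parser


def cli(argv=None):
    # What the `if __name__ == "__main__":` block would call.
    args = build_parser().parse_args(argv)
    return main(**vars(args))


# Parent script: after `import find_test`, call find_test.main(...) directly.
print(main("unit", test_name="suite/case", normalise=True))

# Command line, simulated with an explicit argument list.
print(cli(["--type", "unit", "--name", "suite/case"]))
```

Existing csh scripts keep calling find_test.py with the same switches, while Python callers skip argparse entirely.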

Creating a Python command line application

So I wrote a Python 3 library, which serves as an application 'backend'. Now I can sit down with the interpreter, import the source file(s), and hack around using the lib - I know how to do this.
But I would also like to build a command line 'frontend' application using the library. My library defines a few objects which have high-level commands, which should be visible to the application. Such commands may return some data structures, and the command line app would print them nicely. In other words, the command line app would be a thin wrapper around the lib, passing the user's input to the library commands, and presenting results to the user.
The best example of what I'm trying to achieve would probably be Mercurial SCM, as it is written in Python and the 'hg' command does what I'm looking for - for instance, 'hg commit -m message' will find the code responsible for the 'commit' command implementation, pass the arguments from the user and do its work. On the way back, it might get some results and print them out nicely.
Is there a general way of doing it in Python? Like exposing classes/methods/functions as 'high level' commands with an annotation? Does anybody know of any tutorials?

You can do this with argparse. For example here is the start of my deploy script.
import argparse
import os
import pwd
def main(argv):
    """
    Entry point for the deploy script.
    Arguments:
    argv: All command line arguments save the name of the script.
    """
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='also report if files are the same')
    parser.add_argument('-V', '--version', action='version',
                        version=__version__)
    parser.add_argument('command', choices=['check', 'diff', 'install'])
    fname = '.'.join(['filelist', pwd.getpwuid(os.getuid())[0]])
    args = parser.parse_args(argv)
It uses an argument with choices to pick a function. You could define a dictionary mapping choices to functions:
cmds = {'check': do_check, 'diff': do_diff, 'install': do_install}
fn = cmds[args.command]
If you make sure that all the dict keys are in the command choices, you don't need to catch KeyError.
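Put together, a small runnable sketch of this dispatch could look like the one below. The bodies of do_check, do_diff and do_install are stand-ins, and deriving choices from the dict keys is just one way to keep the two in sync.

```python
import argparse


def do_check(verbose):
    return "check" + (" (verbose)" if verbose else "")


def do_diff(verbose):
    return "diff" + (" (verbose)" if verbose else "")


def do_install(verbose):
    return "install" + (" (verbose)" if verbose else "")


# Single source of truth: the command names are the dict keys.
CMDS = {"check": do_check, "diff": do_diff, "install": do_install}


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true")
    # choices come from the dict, so an unknown command never reaches the lookup
    parser.add_argument("command", choices=sorted(CMDS))
    args = parser.parse_args(argv)
    return CMDS[args.command](args.verbose)


print(main(["-v", "diff"]))
```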

Pythonic way to set module-wide settings from external file

Some background (not mandatory, but might be nice to know): I am writing a Python command-line module which is a wrapper around latexdiff. It basically replaces all \cite{ref1, ref2, ...} commands in LaTeX files with written-out and properly formatted references before passing the files to latexdiff, so that latexdiff will properly mark changes to references in the text (otherwise, it treats the whole \cite{...} command as a single "word"). All the code is currently in a single file which can be run with python -m latexdiff-cite, and I have not yet decided how to package or distribute it. To make the script useful for anybody else, the citation formatting needs to be configurable. I have implemented an optional command-line argument -c CONFIGFILE to allow the user to point to their own JSON config file (a default file resides in the module folder and is loaded if the argument is not used).
Current implementation: My single-file command-line Python module currently parses command-line arguments in if __name__ == '__main__', and loads the config file (specified by the user in -c CONFIGFILE) here before running the main function of the program. The config variable is thus available in the entire module and all is well. However, I'm considering publishing to PyPI by following this guide which seems to require me to put the command-line parsing in a main() function, which means the config variable will not be available to the other functions unless passed down as arguments to where it's needed. This "passing down by arguments" method seems a little cluttered to me.
Question: Is there a more pythonic way to set some configuration globals in a module or otherwise accomplish what I'm trying to? (I don't want to rely on 3rd party modules.) Am I perhaps completely off the tracks in some fundamental way?

One way to do it is to have the configurations defined in a class or a simple dict:
import json
class Config(object):
    setting1 = "default_value"
    setting2 = "default_value"
    @staticmethod
    def load_config(json_file):
        """ load settings from config file """
        with open(json_file) as f:
            config = json.load(f)
        for k, v in config.items():
            setattr(Config, k, v)
Then your application can access the settings via this class: Config.setting1 ...
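A quick usage sketch, assuming Python 3 and a throwaway JSON file. The function format_citation and the value "author-year" are invented for the example; main() would call Config.load_config once after parsing -c CONFIGFILE.

```python
import json
import os
import tempfile


class Config:
    setting1 = "default_value"
    setting2 = "default_value"

    @staticmethod
    def load_config(json_file):
        """Overwrite the class-level defaults with values from a JSON file."""
        with open(json_file) as f:
            for key, value in json.load(f).items():
                setattr(Config, key, value)


def format_citation():
    # Any function in the module can read the settings without
    # having them passed down as arguments.
    return f"style={Config.setting1}"


# Simulate a user-supplied config file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"setting1": "author-year"}, tmp)

Config.load_config(tmp.name)
os.remove(tmp.name)

print(format_citation())
print(Config.setting2)  # untouched keys keep their defaults
```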

How to require one command line action argument among several possible but exclusive?

Using argparse, is there a simple way to specify arguments which are mutually exclusive, so that the application requires exactly one of them to be provided?
Example of fictive use-case:
> myapp.py foo --bar
"Foo(bar) - DONE"
> myapp.py read truc.txt
"Read: truc.txt - DONE"
>myapp.py foo read
Error: use "myapp.py foo [options]" or "myapp.py read [options]" (or something similar).
> myapp.py foo truc.txt
Error: "foo" action don't need additional info.
> myapp.py read --bar
Error: "read" action don't have a "--bar" option.
My goal is to have a "driver" application(1) that would internally apply one action depending on the first command line argument and have arguments depending on the action.
So far I see no obvious ways to do this with argparse without manually processing the arguments myself, but maybe I missed something Pythonic? (I'm not a Python3 expert...yet)
(1) I call it "driver" because it might be implemented by calling another application, like gcc does with different compilers.

What you're trying to do is actually supported quite well in Python.
See Mutual Exclusion
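import argparse
def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # members of a mutually exclusive group must be optional arguments;
    # a positional with an explicit dest would raise an error
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--foo', dest='foo', nargs=1)
    group.add_argument('--read', dest='read', nargs=1)
    args = parser.parse_args(argv)
    return args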
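Since the use case in the question has each action as a bare word with its own options, subparsers may be a closer fit than a mutually exclusive group. A minimal sketch, assuming that foo takes an optional --bar flag and read takes one file path (build_parser and main are names made up here):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="myapp.py")
    # Exactly one action must be given; each one gets its own arguments.
    actions = parser.add_subparsers(dest="action", required=True)

    foo = actions.add_parser("foo")
    foo.add_argument("--bar", action="store_true")

    read = actions.add_parser("read")
    read.add_argument("path")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.action == "foo":
        return "Foo(bar) - DONE" if args.bar else "Foo - DONE"
    return f"Read: {args.path} - DONE"


print(main(["foo", "--bar"]))
print(main(["read", "truc.txt"]))
```

With this layout, inputs such as foo read or read --bar are rejected by argparse itself with a usage message and exit status 2.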
