I'm using Python 3 for an application. For that utility, I need to pass command line arguments as follows:
python3 -m com.xxx.executor -Denvironment=dev -Dtoggle=False
Both parameters, environment and toggle, are also present in a property file. If a value is specified on the command line, it should override what is in the property file.
I'm basically a Java guy, and in Java, properties passed in the form -Dkey=value are set as system properties. These properties can then be read from code as System.getProperty(key, defaultVal).
But when I try the same in Python 3, it doesn't work.
After referring to the Python docs, it seems to me that sys._xoptions is suitable for my requirement.
python3 -Xenvironment=dev -Xtoggle=False -m com.xxx.executor
Then read the properties using sys._xoptions.
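To illustrate, this is roughly how I'd read them (the 'prod' default is just an assumption of mine):

import sys

# Run as: python3 -Xenvironment=dev -Xtoggle=False -m com.xxx.executor
# -Xkey=value shows up as {'key': 'value'}; a bare -Xkey shows up as {'key': True}.
opts = sys._xoptions
environment = opts.get('environment', 'prod')  # 'prod' is an assumed default
toggle = opts.get('toggle', 'True') == 'True'  # note: values arrive as strings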
I'm using CPython. The aim of this thread is to check whether the way I'm proceeding is right, or whether there is a better way in Python to implement the same.
Python veterans, please guide!
For argument parsing, I use the argparse module (docs) to define the valid named and/or positional arguments.
There are third-party modules as well, such as click and docopt. Which you should use depends on what you feel most comfortable with, and on whether you can use third-party modules at all. The click documentation contains a (possibly biased) comparison between it, argparse, and docopt.
I've never used sys._xoptions, nor did I know of its existence. Seems a bit strange that a function starting with an underscore (normally used to indicate a "private" function) is mentioned in the docs. Perhaps someone else can shed some light on this.
For the parsing of a property file, I use the configparser module (docs). Of course, you could opt for a JSON or YAML config file if you'd prefer that.
That said, you'll have to come up with the necessary code to overrule properties when specified as arguments (though that shouldn't be too difficult).
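As a minimal sketch of that overriding logic (assuming an INI-style config.ini whose DEFAULT section holds environment and toggle; the file name and section are assumptions):

import argparse
import configparser

parser = argparse.ArgumentParser()
parser.add_argument('--environment')
parser.add_argument('--toggle')

config = configparser.ConfigParser()
config.read('config.ini')
settings = dict(config['DEFAULT'])  # file values, e.g. environment=prod

# Keep only arguments that were actually given, then let them win.
overrides = {k: v for k, v in vars(parser.parse_args()).items() if v is not None}
settings.update(overrides)
print(settings)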
From the docs on the -X argument:
Reserved for various implementation-specific options. CPython currently defines the following possible values:
That means you probably shouldn't be hijacking these for your own purposes. As Kristof mentioned, argparse is a pretty reasonable choice. Since you want both a file and command line arguments, here's a quick example using a JSON file-based config:
import json
import argparse
argparser = argparse.ArgumentParser()
argparser.add_argument('--environment')
argparser.add_argument('--toggle', action='store_true')
try:
    with open('config.json') as f:
        args = json.load(f)
except (IOError, ValueError):
    # If the file doesn't exist or contains invalid JSON
    args = {}
args.update(vars(argparser.parse_args()))
print(args)
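For reference, a config.json that this snippet would pick up might look like {"environment": "prod", "toggle": true}. One caveat: because --toggle uses action='store_true', parse_args() always produces a value for it (False when the flag is omitted), so the command line will always override toggle from the file; use plain optional arguments and filter out None values (as in the configparser sketch above) if that matters to you.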
There are other possible alternatives for the file-based config, like the configparser module.
I like type hints, especially for my method parameters. In my current script one function should retrieve a parameter of type argparse._SubParsersAction. As one can see from the underscore, this is a private type by convention. I guess this is why PyCharm complains with the error message Cannot find reference '_SubParsersAction' in 'argparse.pyi' when trying to import it (although it's there).
The script runs but it feels wrong. The error message seems reasonable to me as private types are meant to be... well, private. My first question is therefore why the public method ArgumentParser.add_subparsers() returns an object of a private type in the first place.
I've looked for a public super class or interface and _SubParsersAction does indeed extend from argparse.Action, but that doesn't help me as Action does not define _SubParsersAction's add_parser() method (which I need).
So my next questions are: Can I use type hints with the argparse API? Or is it only partially possible because the API was designed long before type hints were introduced? Or does my idea of typing not fit to Python's type system?
Here is my affected code snippet. It creates an argument parser with sub arguments as described in the documentation (https://docs.python.org/dev/library/argparse.html#sub-commands).
main.py
from argparse import ArgumentParser
import sub_command_foo
import sub_command_bar
import sub_command_baz
def main():
    parser = ArgumentParser()
    sub_parsers = parser.add_subparsers()

    sub_command_foo.add_sub_parser(sub_parsers)
    sub_command_bar.add_sub_parser(sub_parsers)
    sub_command_baz.add_sub_parser(sub_parsers)

    args = parser.parse_args()
    args.func(args)


if __name__ == '__main__':
    main()
sub_command_foo.py
from argparse import _SubParsersAction, Namespace
def add_sub_parser(sub_parsers: _SubParsersAction):
    arg_parser = sub_parsers.add_parser('foo')
    # Add arguments...
    arg_parser.set_defaults(func=run)


def run(args: Namespace):
    print('foo...')
The problem lies in sub_command_foo.py. PyCharm shows the error message Cannot find reference '_SubParsersAction' in 'argparse.pyi' on the first line from argparse import _SubParsersAction, Namespace.
I don't use PyCharm and haven't done much with type hints, so I can't help you there. But I know argparse well. The bulk of this module was written before 2010, and it's been modified since then at a snail's pace. It's well organized in the OOP sense, but the documentation is more of a glorified tutorial than a formal reference. That is, it focuses on the functions and methods users will need, and doesn't try to formally document all the classes and methods.
I do a lot of my testing in an interactive ipython session where I can look at the objects returned by commands.
In Python the distinction between public and private classes and methods is not as formal as in other languages. The '_' prefix does mark things as 'private': they usually aren't documented, but they are still accessible. Sometimes a '*' import will import all objects whose names don't start with an underscore, but argparse has a more explicit __all__ list.
argparse.ArgumentParser does create an object instance, but that class inherits from two 'private' classes. The add_argument method creates an Action object and puts it on the parser._actions list. It also returns it to the user (though usually that reference is ignored). The action is actually an instance of an Action subclass (all of which are 'private'). add_subparsers is just a specialized version of add_argument.
The add_parser method creates an ArgumentParser object.
It feels to me that type hinting that requires a separate import of 'private' classes is counterproductive. You shouldn't be explicitly referencing those classes, even if your code produces them. Some people (companies) fear they can be changed without notification and thus break their code. Knowing how slowly argparse gets changed, I wouldn't worry too much about that.
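If you do want the hints without importing the private class at run time, one possible workaround (an assumption on my part, not something the argparse docs prescribe) is to import it only under typing.TYPE_CHECKING and quote the annotation:

from argparse import Namespace
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at run time.
    from argparse import _SubParsersAction


def add_sub_parser(sub_parsers: '_SubParsersAction'):
    arg_parser = sub_parsers.add_parser('foo')
    arg_parser.set_defaults(func=run)


def run(args: Namespace):
    print('foo...')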
I am using Python to do some processing on .py files. These .py files may be from unknown sources, so I do not wish to run their code directly (for security), and I may not have their dependencies installed anyway. I am analysing these files using Python's tokenize module and then using the tokens to work out the types of any NAME tokens. For a function or class declared in a file you can just do:
import tokenize

# tokenize the source file ...

all_functions = []
for index, token in enumerate(tokens):
    # check the token type
    if token[0] == tokenize.NAME:
        # check the token's string
        if token[1] == "def":
            # the next token is always the name of the function
            all_functions.append(tokens[index + 1][1])
        elif token[1] == "class":
            # as above but for classes ...
            pass
The problem is that for an imported module I don't know how to tell the difference between a class and a function without seeing its declaration.
Take the following code snippet:
import pathlib
foo = pathlib.Path("some/path")
bar = pathlib.urlquote_from_bytes(b"some bytes")
Because this is well-written, PEP 8 compliant code, I can assume that pathlib.Path is a class because the first character is uppercase, and that pathlib.urlquote_from_bytes is a function because it uses lowercase words with underscores. However, I cannot know for sure without the module's source code (which may not be available). Not all of the .py files I receive will necessarily be PEP 8 compliant, so I cannot rely on naming conventions.
Is there any other way of finding out whether some Python module's attributes are of a given type? A thought I had was to run python3 -m py_compile <file> and then analyse the resulting .pyc file, but I have never looked into CPython's bytecode, so I don't know if this would actually be helpful. Any suggestions would be welcome.
For this specific use case, it turned out I did not need to separate out classes and functions and could treat them all as callables. A callable is essentially any object that has a __call__() method; a call x(arg1, arg2, ...) is shorthand for x.__call__(arg1, arg2, ...).
For anyone using older versions of Python, it is worth noting that the callable() built-in "was first removed in Python 3.0 and then brought back in Python 3.2."
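A quick illustration of why the distinction didn't matter for my purposes:

import pathlib


def greet():
    return 'hi'


print(callable(pathlib.Path))   # True -- a class is callable (calling it constructs an instance)
print(callable(greet))          # True -- functions are callable too
print(callable('some string'))  # False -- plain strings have no __call__()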
Further reading:
Blog post: Is it a class or a function? It's a callable!
Python3 docs: Call function
StackOverflow: What is a callable?
I'm looking for a good pattern for implementing Python sub-commands, where the main command looks up the sub-command at run time instead of knowing the list of all possible sub-commands in advance. This allows the "application" to be easily extended with new sub-commands without having to change the main code.
E.g.:
topcmd.py foo
will look in /some/dir for foo.py and if it exists, run it. Or some variation of it.
The code invoked in foo.py should preferably be a well-defined function or method on a class or object.
While this question is quite broad, a typical default Python installation (i.e. one with setuptools) provides sufficient tools to make this relatively achievable, and in a way that is extensible: other packages can be created and installed to provide new, discoverable subcommands for your main program.
Your base package can provide a standard entry point in the console_scripts group that points to your main function, which feeds all arguments into an argument parser (such as argparse). You can then define a registry under your own entry_points group, following a scheme similar to console_scripts. Your main entry point iterates through every entry in that group and instantiates objects that provide their own ArgumentParser instances, which it dynamically registers to itself as subcommands. This shows your users which subcommands are actually available and what their invocation looks like.
To provide an example, in your main package's setup.py, you might have an entry like
setup(
    name='my.package',
    # ...
    entry_points={
        'console_scripts': [
            'topcmd = my.package.runtime:main',
        ],
        'my.package.subcmd': [
            'subcmd1 = my.package.commands:subprog1',
            'subcmd2 = my.package.commands:subprog2',
        ],
    },
    # ...
)
Inside the my/package/runtime.py source file, the main function will have to construct a new ArgumentParser instance and register a subparser for each of the entry points provided by pkg_resources.working_set. For example:
from pkg_resources import working_set
def init_parser(argparser):  # pass in the argparser provided by main
    commands = argparser.add_subparsers(dest='command')
    for entry_point in working_set.iter_entry_points('my.package.subcmd'):
        subparser = commands.add_parser(entry_point.name)
        # load() can raise an exception due to missing imports or an error
        # in object creation
        subcommand = entry_point.load()
        subcommand.init_parser(subparser)
So in the main function, the argparser instance it created can be passed into a function like the one above, and the entry point 'subcmd1 = my.package.commands:subprog1' will be loaded. Inside my/package/commands.py, an init_parser method must be implemented that takes the provided subparser and populates it with the required arguments:
class SubProgram1(object):
    def init_parser(self, argparser):
        argparser.add_argument(...)


subprog1 = SubProgram1()
Oh, one final thing: after the main argparser.parse_args(...) call, the name of the selected subcommand is available on the resulting namespace as its command attribute (because of dest='command'). It should be possible to map that name back to the actual instance, but that may or may not achieve exactly what you want (the main program might want to do further work/validation before actually using the command). That part is another complicated part, but at least the argument parser should contain the information required to run the correct subprogram.
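To make that concrete, a minimal sketch of such a main function might look like the following (the run(namespace) method on each subcommand object is an assumption; adapt it to whatever protocol your subcommands implement):

import argparse
from pkg_resources import working_set


def main(argv=None):
    argparser = argparse.ArgumentParser(prog='topcmd')
    commands = argparser.add_subparsers(dest='command')
    registry = {}
    for entry_point in working_set.iter_entry_points('my.package.subcmd'):
        subcommand = entry_point.load()
        registry[entry_point.name] = subcommand
        subcommand.init_parser(commands.add_parser(entry_point.name))
    args = argparser.parse_args(argv)
    # Map the recorded name back to the loaded instance and run it.
    registry[args.command].run(args)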
Naturally, this includes absolutely no error checking, and some form of it must be implemented to prevent faulty subcommand classes from blowing up the main program. I have made use of a pattern like this one (albeit with a much more complex implementation) that can support an arbitrary number of nested subcommands. Also, packages that want to implement custom commands can simply add their own entry to the entry point group (in this case, my.package.subcmd) in their own setup.py. For example:
setup(
    name="some.other.package",
    # ...
    entry_points={
        'my.package.subcmd': [
            'extracmd = some.other.package.commands:extracmd',
        ],
    },
    # ...
)
Addendum:
As requested, an actual implementation that's used in production is in a package (calmjs) that I currently maintain. Installing that package (into a virtualenv) and running calmjs on the command line should show a listing of subcommands identical to the entries defined in the main package's entry points. Installing an additional package that extends the functionality (such as calmjs.webpack) and running calmjs again will now list calmjs.webpack as an additional subcommand.
The entry points reference instances of subclasses of the Runtime class, and in it there is a place where the subparser is added and, if it satisfies the registration requirements (many of the statements that follow relate to various error/sanity checks, such as what to do when multiple packages define the same subcommand name for runtime instances, amongst other things), registered to the argparser instance on that particular runtime instance; the subparser is then passed into the init_argparser method of the runtime that encapsulates the subcommand. As an example, the calmjs webpack subcommand subparser is set up by its init_argparser method, and that package registers the webpack subcommand in its own setup.py. (To play with them, simply use pip to install the relevant packages.)
You can use the __import__ function to dynamically import a module using a string name passed on the command line.
import sys

mod = sys.argv[1]
command = __import__(mod)
# assuming your pattern has a run method defined.
command.run()
Error handling, etc. is left as an exercise for the reader.
Edit: This would depend on user plugins being installed via pip. If you want users to drop plugins into a folder without installing, then you would have to add that folder to your python path.
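A sketch of that drop-in-folder variant (the plugin directory /some/dir and the run() convention are assumptions):

import sys

sys.path.insert(0, '/some/dir')  # make modules dropped into the folder importable
command = __import__('foo')      # imports /some/dir/foo.py
command.run()                    # assumes each plugin defines run()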
The simplest answer seems to be, if all my commands are in foo.commands:
import importlib
import pkgutil

import foo.commands

for importer, modname, ispkg in pkgutil.iter_modules(foo.commands.__path__):
    mod = importlib.import_module('foo.commands.' + modname)
    mod.run()
This will run all the sub-commands. (Well, in the real code I will run just one; this is the how-to.)
I'm struggling to understand what the init() function does in the Python mimetypes module. Is it an outdated function that isn't needed in more recent versions of Python?
mimetypes.init() is useful if you want to add MIME type / extension mappings beyond the default. If you don't need to do that, then there's no need to call mimetypes.init(); just use the utility functions normally, and they'll call it themselves if necessary. If you do need to do that, aside from mimetypes.init() there's also mimetypes.read_mime_types() and mimetypes.add_type().
This applies to Python 2 and 3.
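For example, a minimal sketch of adding a custom mapping and then querying it (the application/x-example type and the .example extension are made up):

import mimetypes

mimetypes.add_type('application/x-example', '.example')
print(mimetypes.guess_type('payload.example'))  # ('application/x-example', None)
print(mimetypes.guess_type('page.html'))        # works without calling init() yourself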
According to the mimetypes module documentation:
The functions described below provide the primary interface for this module. If the module has not been initialized, they will call init() if they rely on the information init() sets up.

mimetypes.init(files=None)

Initialize the internal data structures. If given, files must be a sequence of file names which should be used to augment the default type map. If omitted, the file names to use are taken from knownfiles; on Windows, the current registry settings are loaded. Each file named in files or knownfiles takes precedence over those named before it. Calling init() repeatedly is allowed.

Specifying an empty list for files will prevent the system defaults from being applied: only the well-known values will be present from a built-in list.
It's there both in Python 2.7 and Python 3.x.
I have a Python app with many modules. There's a YAML config file which contains configuration settings for each module. In each module's init(), I load the config and then process it into a config dictionary for that module.
The processing is ugly right now. Dozens and dozens of lines at the beginning of each module with a lot of stuff like:
if 'foo' not in config:
    config['foo'] = bar

or

if 'foo' in config:
    config['foo'] = string_to_list(config['foo'])
else:
    config['foo'] = list()
etc.
So now I want to write a centralized config processing method that each module can use. I'm thinking that I want to use a YAML-formatted string to specify what the final config should look like. (Which settings are mandatory, default values, object types, etc.)
I'm thinking the config specification in each module could be something like this: (with the type|default|required values)
config_template = """MySection:
setting1: int|0
setting2: string|None|required
setting3: int|10"""
So far, so good. My real question is whether there's any way to save this config specification in each module, in a docstring for init() or something similar. Since I'm essentially defining exactly what the module expects for its config, along with the defaults and the types, if I could store this specification in the docstring then I ought to be able to write something (and/or configure Sphinx) to pretty it up for the documentation.
So I wonder whether this approach sounds sane in general and, if so, whether there is a way to store this config info in the docstring so that it does double duty.
EDIT: I considered setting up a dictionary with default values first, and certainly that will take care of the defaults. But there are a lot of cases where I need the values to be of a certain type. For example, maybe some have to be a list, but if a single item is entered into the config file it will be read as a string, so I need the config processor to convert the string to a list with that string as its only item.
EDIT 2: The reason I was asking about the docstring is that my config specification would essentially tell a programmer what this module expects in terms of a config dictionary. If I can do it in the docstring then I can specify it once and have the module use it for its config as well as have it appear in the Sphinx documentation. But if that's not possible, then so be it; I can keep it in a variable (as in my example above) and write the docstring manually.
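To convince myself the double-duty part is at least mechanically possible: a docstring is just an ordinary string available at run time via __doc__, so something like the following rough sketch should work (apply_spec is a hypothetical helper I'd still have to write):

import yaml  # PyYAML


def init(config):
    """MySection:
      setting1: int|0
      setting2: string|None|required
      setting3: int|10
    """
    spec = yaml.safe_load(init.__doc__)  # the docstring parses as YAML
    return apply_spec(spec, config)      # hypothetical: enforce types/defaults


def apply_spec(spec, config):
    # Walk the spec, fill in defaults, coerce types, check required keys...
    ...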
Thanks!
Brian