I'm getting pretty confused as to how and where to initialise application configuration in Python 3.
I have configuration that consists of application specific config (db connection strings, url endpoints etc.) and logging configuration.
Before my application performs its intended function I want to initialise the application and logging config.
After a few different attempts, I eventually ended up with something like the code below in my main entry module. It has the nice effect of grouping all imports at the top of the file (https://www.python.org/dev/peps/pep-0008/#imports), but it doesn't feel right, since the config modules are imported for their side effects alone, which is pretty unintuitive.
import config.app_config  # sets up the app config
import config.logging_config  # sets up the logging config
...
if __name__ == "__main__":
    ...
config.app_config looks something like this:
def _get_db_url():
    # somehow get the db url; must be defined before the module-level call below
    ...

_config = {
    'DB_URL': None
}
_config['DB_URL'] = _get_db_url()

def db_url():
    return _config['DB_URL']
and config.logging_config looks like:
import json
import logging
import logging.config
import os

log_level = logging.INFO

if not os.path.isdir('./logs'):
    os.makedirs('./logs')

if os.path.exists('logging_config.json'):
    with open('logging_config.json', 'rt') as f:
        config = json.load(f)
    logging.config.dictConfig(config)
else:
    logging.basicConfig(level=log_level)
What is the common way to set up application configuration in Python, bearing in mind that I will have multiple applications, each using the config.app_config and config.logging_config modules but with different connection strings, possibly read from a file?
I ended up with a cut-down version of the Django approach: https://github.com/django/django/blob/master/django/conf/__init__.py
It seems pretty elegant and has the nice benefit of working regardless of which module imports settings first.
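For reference, here is a minimal sketch of what such a cut-down lazy-settings module might look like (the module name app_config, the APP_SETTINGS_MODULE environment variable and the DB_URL setting are illustrative assumptions, not Django's actual code):

import importlib
import os

class LazySettings(object):
    def __init__(self):
        self._wrapped = None

    def _setup(self):
        # loading is deferred until first attribute access, so it does not
        # matter which module imports `settings` first
        module_name = os.environ.get('APP_SETTINGS_MODULE', 'app_config')
        self._wrapped = importlib.import_module(module_name)

    def __getattr__(self, name):
        if self._wrapped is None:
            self._setup()
        return getattr(self._wrapped, name)

settings = LazySettings()

Any module can then do `from settings import settings` and read `settings.DB_URL`; the underlying settings module is imported exactly once, on first use.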
Related
I would like to initialize config once, and then use it in many modules of my PySpark project.
I see two ways to do it.
1. Load it in the entry point and pass it as an argument to each function
main.py:
import json
import sys

with open(sys.argv[1]) as f:
    config = json.load(f)

df = load_df(config)
df = parse(df, config)
df = validate(df, config, strict=True)
dump(df, config)
But it seems inelegant to pass the same external argument to every function.
2. Load config in config.py and import this object in each module
config.py
import sys
import json
with open(sys.argv[1]) as f:
    config = json.load(f)
main.py
from config import config
df = load_df()
df = parse(df)
df = validate(df, strict=True)
dump(df)
and in each module add the line
from config import config
This seems more elegant, because strictly speaking config is not an argument of these functions; it is the general context in which they execute.
Unfortunately, PySpark pickles config.py and tries to execute it on the executors, but doesn't pass sys.argv to them! So I see this error when running it:
File "/PycharmProjects/spark_test/config.py", line 6, in <module>
CONFIG_PATH = sys.argv[1]
IndexError: list index out of range
What is the best practice to work with general config, loaded from file, in PySpark?
Your program starts execution on the master and passes the main bulk of its work to the executors by invoking some functions on them. The executors are different processes that typically run on different physical machines.
Thus anything the master wants to reference on the executors needs to be either a standard library function (to which the executors have access) or a picklable object that can be sent over.
You typically don't want to load and parse any external resources on the executors, since you would always have to copy them over and make sure you load them properly. Passing a picklable object as an argument of the function (e.g. for a UDF) works much better, since there is only one place in your code where you need to load it.
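For example, a minimal sketch of that approach (the app.json file and the threshold key are illustrative assumptions):

import json
from pyspark import SparkContext

sc = SparkContext()

with open('app.json') as f:  # parsed once, on the driver
    config = json.load(f)

threshold = config['threshold']

# the lambda captures a plain int; ints and dicts pickle fine, so the
# closure can be shipped to the executors safely
print(sc.parallelize([1, 5, 10]).filter(lambda x: x > threshold).collect())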
I would suggest creating a config.py file and adding it as an argument to your spark-submit command:
spark-submit --py-files /path/to/config.py main_program.py
Then you can create spark context like this:
spark_context = SparkContext(pyFiles=['/path/to/config.py'])
and simply use import config wherever you need.
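A minimal sketch of how that might look (FACTOR is an assumed name defined in config.py):

# main_program.py
from pyspark import SparkContext

sc = SparkContext(pyFiles=['/path/to/config.py'])

def scale(x):
    import config  # resolved on each executor from the shipped file
    return x * config.FACTOR

print(sc.parallelize([1, 2, 3]).map(scale).collect())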
You can even include whole Python packages in a tree packaged as a single zip file instead of just a single config.py file, but then be sure to include __init__.py in every folder that needs to be referenced as a Python module.
We are working on an add-on that writes to a log file and we need to figure out where the default var/log directory is located (the value of the ${buildout:directory} variable).
Is there an easy way to accomplish this?
In the past I had a similar use case.
I solved it by declaring the path inside the zope.conf:
zope-conf-additional +=
    <product-config pd.prenotazioni>
        logfile ${buildout:directory}/var/log/prenotazioni.log
    </product-config>
See the README of this product:
https://github.com/PloneGov-IT/pd.prenotazioni/
This zope configuration can then be interpreted with this code:
from App.config import getConfiguration
product_config = getattr(getConfiguration(), 'product_config', {})
config = product_config.get('pd.prenotazioni', {})
logfile = config.get('logfile')
See the full example here: https://github.com/PloneGov-IT/pd.prenotazioni/blob/9a32dc6d2863b5bfb5843d441e652101406d9a2c/pd/prenotazioni/__init__.py#L17
Worth noting is the fact that the initial return avoids duplicate logging if the init function is mistakenly called more than once.
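The guard looks roughly like this (a sketch of the pattern, not the product's exact code):

import logging

logger = logging.getLogger('pd.prenotazioni')

def init_logger(logfile):
    # initial return: if a handler is already attached, a previous call
    # already configured this logger, so bail out to avoid duplicate lines
    if logger.handlers:
        return
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
    logger.addHandler(handler)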
Anyway, if you do not want to play with buildout and custom zope configuration, you may want to get the default event log location.
It is specified in the zope.conf. You should have something like this:
<eventlog>
  level INFO
  <logfile>
    path /path/to/plone/var/log/instance.log
    level INFO
  </logfile>
</eventlog>
I was able to obtain the path with this code:
from App.config import getConfiguration
import os
eventlog = getConfiguration().eventlog
logpath = eventlog.handler_factories[0].instance.baseFilename
logfolder = os.path.split(logpath)[0]
Probably, by looking at the App module code, you will find a more straightforward way of getting this value.
Another possible (IMHO weaker) solution would be to store (through buildout or your preferred method) the logfile path in an environment variable.
You could let buildout set it in parts/instance/etc/zope.conf in an environment variable:
[instance]
recipe = plone.recipe.zope2instance
environment-vars =
    BUILDOUT_DIRECTORY ${buildout:directory}
Check it in Python code with:
import os
buildout_directory = os.environ.get('BUILDOUT_DIRECTORY', '')
By default you already have the INSTANCE_HOME environment variable, which might be enough.
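If it is, the var/log folder can be derived from it (a sketch; the two-levels-up layout is an assumption about a standard buildout tree, so verify it against yours):

import os

instance_home = os.environ.get('INSTANCE_HOME', '')
# INSTANCE_HOME usually points at parts/instance, so the buildout root
# sits two directory levels up
buildout_dir = os.path.abspath(os.path.join(instance_home, '..', '..'))
log_dir = os.path.join(buildout_dir, 'var', 'log')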
I am still new to Python, so keep that in mind when reading this.
I have been hacking away at an existing Python script that was originally put together by a few different people.
The script was originally designed to load its 'configuration' from a module named conf/config.py, which is basically Python code:
SETTING_NAME='setting value'
I've modified this to instead read its settings from a configuration file using ConfigParser:
import ConfigParser

CONFIG_FILE_NAME = "/etc/service_settings.conf"

config_file_parser = ConfigParser.ConfigParser()
config_file_parser.readfp(open(CONFIG_FILE_NAME))
SETTING_NAME = config_file_parser.get('Basic', 'SETTING_NAME')
The problem I am having is how to specify the configuration file to use. Currently I have managed to get it working (somewhat) by having multiple TAC files and setting the CONFIG_FILE_NAME variable there, using another module to hold the variable's value. For example, I have a module conf/ConfigLoader.py:
CONFIG_FILE_NAME = None  # overwritten by the TAC file at startup
Then my TAC file has:
import conf.ConfigLoader as ConfigLoader
ConfigLoader.CONFIG_FILE_NAME = '/etc/service_settings.conf'
So the conf/config.py module now looks like:
import ConfigParser

import ConfigLoader

config_file_parser = ConfigParser.ConfigParser()
config_file_parser.readfp(open(ConfigLoader.CONFIG_FILE_NAME))
It works, but it requires managing two files instead of a single conf file. I then attempted to use the usage.Options feature described at http://twistedmatrix.com/documents/current/core/howto/options.html, so I have twisted/plugins/Options.py:
from twisted.python import usage

class Options(usage.Options):
    optParameters = [['conf', 'c', 'tidepool.conf', 'Configuration File']]

# Get config
config = Options()
config.parseOptions()
CONFIG_FILE_NAME = config.opts['conf']
That does not work at all. Any tips?
I don't know if I understood your problem.
If you want to load the configuration from multiple locations, you can pass a list of filenames to the ConfigParser: https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.read
If you are trying to make a generic configuration manager, you could create a class or function that receives the filename, or you could set the configuration file name in an environment variable and read that variable in your script using something like os.environ.get('CONFIG_FILE_NAME').
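A minimal sketch combining both ideas, the environment variable and the filename-list feature of read() (the file names here are illustrative):

import os
import ConfigParser

# the deployment environment picks the file; fall back to a default path
config_file = os.environ.get('CONFIG_FILE_NAME', '/etc/service_settings.conf')

config_file_parser = ConfigParser.ConfigParser()
# read() accepts a list, silently skips missing files, and lets later
# files override earlier ones
config_file_parser.read(['/etc/service_defaults.conf', config_file])
SETTING_NAME = config_file_parser.get('Basic', 'SETTING_NAME')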
I would like to test a pyramid view like the following one:
def index(request):
    data = request.some_custom_property.do_something()
    return {'some': data}
some_custom_property is added to the request via an event handler like this:
@subscriber(NewRequest)
def prepare_event(event):
    event.request.set_property(
        create_some_custom_property,
        'some_custom_property', reify=True
    )
My problem is: if I create a test request manually, the property is not set up, because no events are triggered. Since the real event handler is more complicated and depends on configuration settings, I don't want to reproduce that code in my test code; I would like to use the pyramid infrastructure as much as possible. I learned from an earlier question how to set up a real pyramid app from an ini file:
from webtest import TestApp
from pyramid.paster import get_app
app = get_app('testing.ini#main')
test_app = TestApp(app)
The test_app works fine, but I can only get back the html output (which is the whole idea of TestApp). What I want to do is execute index in the context of app or test_app, but get back the result of index before it's sent to a renderer.
Any hint on how to do that?
First of all, I believe it is a really bad idea to write doctests like this. It requires a lot of initialization work, which ends up in the documentation (remember, doctests) without actually documenting anything, and to me these tests seem to be a job for unit/integration tests. But if you really want to, here's a way to do it:
import myapp
from pyramid.paster import get_appsettings
from webtest import TestApp

app, conf = myapp.init(get_appsettings('settings.ini#appsection'))
rend = conf.testing_add_renderer('template.pt')

test_app = TestApp(app)
resp = test_app.get('/my/view/url')
rend.assert_(key='val')
Here myapp.init is a function that does the same work as your application's initialization function, the one called by pserve (like the main function here), except that myapp.init takes one argument, the settings dictionary, instead of main(global_config, **settings), and returns both app (i.e. conf.make_wsgi_app()) and conf (the pyramid.config.Configurator instance). rend is a pyramid.testing.DummyTemplateRenderer instance.
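In other words, myapp.init could be sketched like this (the route and its URL are placeholders for whatever your real main function registers):

# myapp/__init__.py
from pyramid.config import Configurator

def init(settings):
    # same work as main(global_config, **settings), minus global_config
    conf = Configurator(settings=settings)
    conf.add_route('myview', '/my/view/url')
    conf.scan()
    return conf.make_wsgi_app(), conf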
P.S. Sorry for my English, I hope you'll be able to understand my answer.
UPD: Forgot to mention that rend has a _received property, which holds the values passed to the renderer, though I would not recommend using it, since it is not part of the public interface.
I've extracted several of my sqlalchemy models to a separate and installable package (../lib/site-packages), to use across several applications. So I only need to:
from models_package import MyModel
from any application needing access to these models.
Everything is ok so far, except that I cannot find a satisfactory way of getting the several application-dependent config variables used by some of the models, which may vary from application to application. Some models need to be aware of variables that previously came from the application they lived in.
Neither
current_app.config['XYZ']
nor
config = LocalProxy(lambda: current_app.config['XYZ'])
has worked (both fail with 'working outside of application context' errors), so I'm stuck right now. Maybe this is poor programming and/or design on my behalf; how do I clear this up? There must be some way, but I haven't reasoned myself toward it yet.
SOLUTION:
As long as I avoid evaluating the config at module load time (like assigning a constant containing an API key), both of the above approaches do work. Anything that touches them outside the context of model-in-the-application use will of course raise an error; methods that return the values you need at call time are fine.
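In code, the distinction looks like this (XYZ stands in for whatever key you need; a sketch of the pattern, not the exact models):

from flask import current_app
from werkzeug.local import LocalProxy

# fine: the proxy defers the config lookup until the value is actually
# used, which by then happens inside an application context
config_xyz = LocalProxy(lambda: current_app.config['XYZ'])

class MyModel(object):
    def api_key(self):
        # fine: evaluated at call time, inside the app context
        return current_app.config['XYZ']

# NOT fine: evaluated at import time, outside any app context
# API_KEY = current_app.config['XYZ']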
If you are using a configuration pattern utilising classes and inheritance as described here, you could simply import your config classes with their respective properties and access them anywhere you want:
class Config(object):
    IMPORT = 'ME'
    DEBUG = False
    TESTING = False
    DATABASE_URI = 'sqlite:///:memory:'

class ProductionConfig(Config):
    DATABASE_URI = 'mysql://user@localhost/foo'

class DevelopmentConfig(Config):
    DEBUG = True

class TestingConfig(Config):
    TESTING = True
Now, in your foo.py:
from config import Config
print(Config.IMPORT) # prints 'ME'
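Picking the class per application can then be as simple as this (the APP_ENV variable name is an assumption; use whatever switch suits your deployment):

import os
from flask import Flask
from config import DevelopmentConfig, ProductionConfig

app = Flask(__name__)
# each application picks its own class, and with it its own
# connection strings and flags
if os.environ.get('APP_ENV') == 'production':
    app.config.from_object(ProductionConfig)
else:
    app.config.from_object(DevelopmentConfig)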
Well, since current_app can be a proxy of your Flask program once the blueprint is registered, and that happens at run time, you can't use it in your models_package modules at import time. (The app tries to import models_package, and models_package requires the app's config to initialize things, thus the import fails.)
One option would be doing circular imports:
assuming everything is in the 'App' module:
__init__.py
import flask

application = flask.Flask(__name__)
application.config.from_pyfile('settings.cfg')  # load configs however you normally do
import models_package  # imported last, once the config is ready
models_package.py
from App import application
config = application.config
or create your own config object, but that somewhat doubles the complexity:
models_package.py
import flask

# note: flask.config.Config takes a root_path as its first argument
config = flask.config.Config('.', defaults=flask.Flask.default_config)

# pick one of these and apply the same config initialization as you do
# in your __init__.py
config.from_pyfile(...)  # or
config.from_object(...)  # or
config.from_envvar(...)