I am still new to Python so keep that in mind when reading this.
I have been hacking away at an existing Python script that was originally "put" together by a few different people.
The script was originally designed to load it's 'configuration' using a module named "conf/config.py" which is basically Python code.
SETTING_NAME='setting value'
I've modified this to instead read it's settings from a configuration file using ConfigParser:
import ConfigParser
config_file_parser = ConfigParser.ConfigParser()
CONFIG_FILE_NAME = "/etc/service_settings.conf"
config_file_parser.readfp(open(r'' + CONFIG_FILE_NAME))
SETTING_NAME = config_file_parser.get('Basic', 'SETTING_NAME')
The problem I am having is how to specify the configuration file to use. Currently I have managed to get it working (somewhat) by having multiple TAC files and setting the "CONFIG_FILE_NAME" variable there using another module to hold the variable value. For example, I have a module 'conf/ConfigLoader.py":
global CONFIG_FILE_NAME
Then my TAC file has:
import conf.ConfigLoader as ConfigLoader
ConfigLoader.CONFIG_FILE_NAME = '/etc/service_settings.conf'
So the conf/config.py module now looks like:
import ConfigLoader
config_file_parser = ConfigParser.ConfigParser()
config_file_parser.readfp(open(r'' + ConfigLoader.CONFIG_FILE_NAME))
It works, but it requires managing two files instead of a single conf file. I attempted to use the "usage.Options" feature as described on http://twistedmatrix.com/documents/current/core/howto/options.html. So I have twisted/plugins/Options.py
from twisted.python import usage
global CONFIG_FILE_NAME
class Options(usage.Options):
optParameters = [['conf', 'c', 'tidepool.conf', 'Configuration File']]
# Get config
config = Options()
config.parseOptions()
CONFIG_FILE_NAME = config.opts['conf']
That does not work at all. Any tips?
I don't know if I understood your problem.
If you want to load the configuration from multiple locations you could pass a list of filenames to the configparser: https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.read
If you were trying to make a generic configuration manager, you could create a class of a functions the receives the filename or you could use set the configuration file name in an environment variable and read that variable in your script using something like os.environ.get('CONFIG_FILE_NAME').
Related
I would like to initialize config once, and then use it in many modules of my PySpark project.
I see 2 ways to do it.
load it in entry point and pass as an argument to each function
main.py:
with open(sys.argv[1]) as f:
config = json.load(f)
df = load_df(config)
df = parse(df, config)
df = validate(df, config, strict=True)
dump(df, config)
But it seems unbeauty to pass one external argument to each function.
Load config in config.py and import this object in each module
config.py
import sys
import json
with open(sys.argv[1]) as f:
config = json.load(f)
main.py
from config import config
df = load_df()
df = parse(df)
df = validate(df, strict=True)
dump(df)
and in each module add row
from config import config
It seems more beauty because config is not, strictly speaking, an argument of function. It is general context where they execute.
Unfortunately, PySpark pickle config.py and tries to execute it on server, but doesn't pass sys.argv to them!
So, I see error when run it
File "/PycharmProjects/spark_test/config.py", line 6, in <module>
CONFIG_PATH = sys.argv[1]
IndexError: list index out of range
What is the best practice to work with general config, loaded from file, in PySpark?
Your program starts execution on master and passes main bulk of its work to executors by invoking some functions on them. The executors are different processes that are typically run on different physical machines.
Thus anything that the master would want to reference on executors needs to be either a standard library function (to which the executors have access) or a pickelable object that can be sent over.
You typically don't want to load and parse any external resources on the executors, since you would always have to copy them over and make sure you load them properly... Passing a pickelable object as an argument of the function (e.g. for a UDF) works much better, since there is only one place in your code where you need to load it.
I would suggest creating a config.py file and add it as an argument to your spark-submit command:
spark-submit --py-files /path/to/config.py main_program.py
Then you can create spark context like this:
spark_context = SparkContext(pyFiles=['/path/to/config.py'])
and simply use import config wherever you need.
You can even include whole python packages in a tree packaged as a single zip file instead of just a single config.py file, but then be sure to include __init__.py in every folder that needs to be referenced as python module.
I'm getting pretty confused as to how and where to initialise application configuration in Python 3.
I have configuration that consists of application specific config (db connection strings, url endpoints etc.) and logging configuration.
Before my application performs its intended function I want to initialise the application and logging config.
After a few different attempts, I eventually ended up with something like the code below in my main entry module. It has the nice effect of all imports being grouped at the top of the file (https://www.python.org/dev/peps/pep-0008/#imports), but it doesn't feel right since the config modules are being imported for side effects alone which is pretty non-intuitive.
import config.app_config # sets up the app config
import config.logging_config # sets up the logging config
...
if __name__ == "__main__":
...
config.app_config looks something like follows:
_config = {
'DB_URL': None
}
_config['DB_URL'] = _get_db_url()
def db_url():
return _config['DB_URL']
def _get_db_url():
#somehow get the db url
and config.logging_config looks like:
if not os.path.isdir('.\logs'):
os.makedirs('.\logs')
if os.path.exists('logging_config.json'):
with open(path, 'rt') as f:
config = json.load(f)
logging.config.dictConfig(config)
else:
logging.basicConfig(level=log_level)
What is the common way to set up application configuration in Python? Bearing in mind that I will have multiple applications each using the config.app_config and config.logging_config module, but with different connection string possibly read from a file
I ended up with a cut down version of the Django approach: https://github.com/django/django/blob/master/django/conf/init.py
It seems pretty elegant and has the nice benefit of working regardless of which module imports settings first.
We are working on an add-on that writes to a log file and we need to figure out where the default var/log directory is located (the value of the ${buildout:directory} variable).
Is there an easy way to accomplish this?
In the past I had a similar use case.
I solved it by declaring the path inside the zope.conf:
zope-conf-additional +=
<product-config pd.prenotazioni>
logfile ${buildout:directory}/var/log/prenotazioni.log
</product-config>
See the README of this product:
https://github.com/PloneGov-IT/pd.prenotazioni/
This zope configuration can then be interpreted with this code:
from App.config import getConfiguration
product_config = getattr(getConfiguration(), 'product_config', {})
config = product_config.get('pd.prenotazioni', {})
logfile = config.get('logfile')
See the full example
here: https://github.com/PloneGov-IT/pd.prenotazioni/blob/9a32dc6d2863b5bfb5843d441e652101406d9a2c/pd/prenotazioni/init.py#L17
Worth noting is the fact that the initial return avoids multiple logging if the init function is mistakenly called more than once.
Anyway, if you do not want to play with buildout and custom zope configuration, you may want to get the default event log location.
It is specified in the zope.conf. You should have something like this:
<eventlog>
level INFO
<logfile>
path /path/to/plone/var/log/instance.log
level INFO
</logfile>
</eventlog>
I was able to obtain the path with this code:
from App.config import getConfiguration
import os
eventlog = getConfiguration().eventlog
logpath = eventlog.handler_factories[0].instance.baseFilename
logfolder = os.path.split(logpath)[0]
Probably looking at in the App module code you will find a more straightforward way of getting this value.
Another possible (IMHO weaker) solution would be store (through buildout or your prefered method) the logfile path into an environment variable.
You could let buildout set it in parts/instance/etc/zope.conf in an environment variable:
[instance]
recipe = plone.recipe.zope2instance
environment-vars =
BUILDOUT_DIRECTORY ${buildout:directory}
Check it in Python code with:
import os
buildout_directory = os.environ.get('BUILDOUT_DIRECTORY', '')
By default you already have the INSTANCE_HOME environment variable, which might be enough.
I'm trying to understand the best way to implement Python's ConfigParser so that elements are accessible to multiple program modules. I'm using:-
import ConfigParser
import os.path
config = ConfigParser.RawConfigParser()
config.add_section('paths')
homedir = os.path.expanduser('~')
config.set('paths', 'user', os.path.join(homedir, 'users'))
[snip]
config.read(configfile)
In the last line the configfile variable is referenced. This has to be passed to the Config module so the user can override a default config file. How should I implement this in a module/class in such a manner that other modules can use config.get(spam, eggs)?
Let's say the code you have written in your question is inside a module configmodule.py. Then other modules get access to it in the following way:
from configmodule import config
config.get(spam, eggs)
Let me explain the use case...
In a simple python web application framework designed for Google App Engine, I'd like to have my models loaded automatically from a 'models' directory, so all that's needed to add a model to the application is place a file user.py (for example), which contains a class called 'User', in the 'models/' directory.
Being GAE, I can't read from the file system so I can't just read the filenames that way, but it seems to me that I must be able to 'import * from models' (or some equivalent), and retrieve at least a list of module names that were loaded, so I can subject them to further processing logic.
To be clear, I want this to be done WITHOUT having to maintain a separate list of these module names for the application to read from.
You can read from the filesystem in GAE just fine; you just can't write to the filesystem.
from models import * will only import modules listed in __all__ in models/__init__.py; there's no automatic way to import all modules in a package if they're not declared to be part of the package. You just need to read the directory (which you can do) and __import__() everything in it.
As explained in the Python tutorial, you cannot load all .py files from a directory unless you list them manually in the list named __all__ in the file __init__.py. One of the reasons why this is impossible is that it would not work well on case-insensitive file systems -- Python would not know in which case the module names should be used.
Let me start by saying that I'm not familiar with Google App Engine, but the following code demonstrates how to import all python files from a directory. In this case, I am importing files from my 'example' directory, which contains one file, 'dynamic_file.py'.
import os
import imp
import glob
def extract_module_names(python_files):
module_names = []
for py_file in python_files:
module_name = (os.path.basename(py_file))[:-3]
module_names.append(module_name)
return module_names
def load_modules(modules, py_files):
module_count = len(modules)
for i in range(0, module_count):
globals()[modules[i]] = imp.load_source(modules[i], py_files[i])
if __name__ == "__main__":
python_files = glob.glob('example/*.py')
module_names = extract_module_names(python_files)
load_modules(module_names, python_files)
dynamic_file.my_func()
Also, if you wish to iterate over these modules, you could modify the load_modules function to return a list of the loaded module objects by appending the 'imp.load_source(..)' call to a list.
Hope it helps.