How to swap configuration files for Production and Development? - python

Some time ago I created a script using Python, the script will execute some actions in an instance based on a configuration file.
This is the issue, I created 2 configuration files.
Config.py
instance= <Production url>
Value1= A
Value2= B
...
TestConfig.py
instance= <Development url>
Value1= C
Value2= D
...
So when I want the script to execute the tasks in a development instance to do tests, I just import the TestConfig.py instead of the Config.py.
Main.py
# from Config import *
from TestConfig import *
The problem comes when I update the script using git. If I want to run the script in development I have to modify the file manually, this means that I will have uncommited changes in the server.
Doing this change takes about 1 min of my time but I feel like I'm doing something wrong.
Do you know if there's a standard or right way to accomplish this kind of tasks?

Use that:
try:
from TestConfig import *
except ImportError:
from Config import *
On production, remove TestConfig.py

Export environment variables on your machines, and chose the settings based on that environment variable.

I think Django addresses this issue best with local_settings.py. Based on this approach. at the end of all your imports (after from config import *), just add:
# Add this at the end of all imports
# This is safe to commit and even push to production so long as you don't have local_config in your production server
try:
from local_config import *
except ImportError:
pass
And create a local_config.py per machine. What this will do is import everything from config, and then again from local_config, overriding global configuration settings, should they have the same name as the settings inside config.

The other answers here offer perfectly fine solutions if you really want to differentiate between production and test environments in your script. I would advocate for a different approach, however: to properly test your code, you should create an entirely separate test environment and run your code there, without any changes (or changes to the config files).
I can't make any specific suggestions for how to go about this since I don't know what the script does. In general though, you should try to create a sandbox that spoofs your production environment and is completely isolated. You can create a wrapper script that will run your code in the sandbox and modify the inputs and outputs as necessary to make your code interact with the test environment instead of the production environment. This wrapper is where you should be choosing which environment the code runs in and which config files it uses.
This approach abstracts the testing away from the code itself, making both easier to maintain. Designing for test is a reasonable approach for hardware, where you are stuck with the hardware you have after fabrication, but it makes less sense for software, where wrappers and spoofed data are easier to manage. You shouldn't have to modify your production code base just to handle testing.
It also entirely elimiates the chance that you'll forget to change something when you want to switch between testing and deployment to production.

Related

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage if deployed in the cloud. Instead, any input / output needs to be written to the connected (encrypted) storage instead.
To allow deployment locally as well as to multiple clouds, I am using the very useful fsspec. However, other developers are working on the project as well, and I need a way to make sure that they aren't accidentally using local File I/O methods - which may pass unit tests, but fail when deployed to the cloud.
For this, my idea is to basically mock/replace any I/O methods in pytest with ones that don't work and make the test fail. However, this is probably not straightforward to implement. I am wondering whether anyone else has had this problem as well, and maybe best practices / a library exists for this already?
During my research, I found pyfakefs, which looks like it is very close what I am trying to do - except I don't want to simulate another file system, I want there to be no local file system at all.
Any input appreciated.
You can not use any pytest addons to make it secure. There will always be ways to overcome it. Even if you patch everything in the standard python library, the code always can use third-party C libraries which can't be patched from the Python side.
Even if you, by some way, restrict every way the python process can write the file, it will still be able to call the OS or other process to write something.
The only ways are to run only the trusted code or to use some sandbox to run the process.
In Unix-like operating systems, the workable solution may be to create a chroot and run the program inside it.
If you're ok with just preventing opening files using open function, you can patch this function in builtins module.
_original_open = builtins.open
class FileSystemUsageError(Exception):
pass
def patched_open(*args, **kwargs):
raise FileSystemUsageError()
#pytest.fixture
def disable_fs():
builtins.open = patched_open
yield
builtins.open = _original_open
I've done this example of code on the basis of the pytest plugin which is written by the company in which I work now to prevent using network in pytests. You can see a full example here: https://github.com/best-doctor/pytest_network/blob/4e98d816fb93bcbdac4593710ff9b2d38d16134d/pytest_network.py

Python 3 Best practices on distinguish production and development configuration

I'm working on an embedded system project where my dev setup is different than my prod.
The differences include variables and packages imports.
What is the best way to structure the config files for python3 application where dev and prod setups are different?
prod: My device exchange messages (using pyserial) with an electronic system and also communicates with a server.
dev: I use a fake and fixed response from a function to mock both the electronic and server responses.
Even if the functions that I mock are essential in prod they are less in dev.
I can mock them because the most important part of this project are the functions that use and treat them.
So, there are packages imports and function calls that do not make sense and introduce errors in dev mode.
Every time I need to switch from one to another I need to change a good amount of the code and some times there are errors introduced. I know this is really (💩) not the best approach and I wanted to know what are the best practices.
The Closest Solution Found
Here there is a good solution to set up different variables for each environment. I was hoping for something similar but for projects that require different packages import for different environments.
My Setup
Basic workflow:
A task thread is executed each second
module_1 do work and call module_2
module_2 do work and call module_3
module_3 do work and send back a response
Basic folder structure:
root
main
config.py
/config
prod
dev
/mod_1
/mod_2
/mod_3
/replace_imports
module_1 and module_3 use, each one, a specific package for prod and must be replaced by a dev function
What do I have:
# config.py
if os.environ["app"] == "dev":
import * from root.config.dev
if os.environ["app"] == "prod":
import * from root.config.prod
# config/prod.py
import _3rd_party_module_alpha
import _3rd_party_module_beta
...
obj_alpha = _3rd_party_module_alpha()
func_beta = _3rd_party_module_beta()
# config/dev.py
import * from root.replace_imports
# replace_imports.py
obj_alpha = fake_3rd_party_module_alpha()
func_beta = fake_3rd_party_module_beta()
You really should not have code changes between a dev at point X, and pushing into QA/CI , then prod at point X. Your dev and prod code can be expected to be different at different stages, of course, and version control is key. But moving to production should not require code changes, just config changes.
Environment variables (see 12 factor app stuff) can help, but sometimes config is in code, for example in Django setting files.
In environments like Django where "it points to" a settings file, I've seen this kinda of stuff:
base_settings.py:
common config
dev_settings.py:
#top of file
import * from base_settings
... dev specifics, including overrides of base...
edit: I am well aware of issues with import *. First, this is a special case, for configurations, where you want to import everything. Second, the real issue with import * is that it clobbers the current namespace. That import is right at the top so that won't happen. Linters aside, and they can be suppressed for just that line, the leftover issue is that you may not always know where a variable magically came from, unless you look in base.
prod_settings.py:
import * from base_settings
...production specifics, including overrides of base...
Advanced users of webpack configuration files (and those are brutal) do the same thing, i.e. use a base.js then import that into a dev.js and prod.js.
The key here is to have as much as possible in base, possibly with the help of environment variables (be careful not to over-rely on those, no ones likes apps with dozens of environment variable settings). Then dev and prod are basically about keys, paths, urls, ports, that kind of stuff. Make sure to keep secrets out of them however, because they gravitate there naturally but have no business being under version control.
re. your current code
appname = os.getenv("app")
if appname == "dev":
#outside of base=>dev/prod merges, at top of "in-config" namespaces,
#I would avoid `import *` here and elsewhere
import root.config.dev as config
elif appname == "prod":
import root.config.prod as config
else:
raise ValueError(f"environment variable $app should have been either `dev` or `prod`. got:{appname}:")
Last, if you are stuck without an "it points to Python settings file" mechanism, similar to that found in Django, you can (probably) roll your own by storing the config module path (xxx.prod,xxx.dev) in an environment variable and then using a dynamic import. Mind you, your current code largely does that already except that you can't experiment with/add other settings files.
Don't worry if you don't get it right right away, and don't over-design up front - it takes a while to find what works best for you/your app.
Pipenv was specially created for these things and more
Use git branches (or mercurial or whatever version control system you're using - you are using a vcs, are you ?) and virtualenvs. That's exactly what they are for. Your config files should only be used for stuff like db connections identifiers, api keys etc.

Python: Securing untrusted scripts/subprocess with chroot and chjail?

I'm writing a web server based on Python which should be able to execute "plugins" so that functionality can be easily extended.
For this I considered the approach to have a number of folders (one for each plugin) and a number of shell/python scripts in there named after predefined names for different events that can occur.
One example is to have an on_pdf_uploaded.py file which is executed when a PDF is uploaded to the server. To do this I would use Python's subprocess tools.
For convenience and security, this would allow me to use Unix environment variables to provide further information and set the working directory (cwd) of the process so that it can access the right files without having to find their location.
Since the plugin code is coming from an untrusted source, I want to make it as secure as possible. My idea was to execute the code in a subprocess, but put it into a chroot jail with a different user, so that it can't access any other resources on the server.
Unfortunately I couldn't find anything about this, and I wouldn't want to rely on the untrusted script to put itself into a jail.
Furthermore, I can't put the main/calling process into a chroot jail either, since plugin code might be executed in multiple processes at the same time while the server is answering other requests.
So here's the question: How can I execute subprocesses/scripts in a chroot jail with minimum privileges to protect the rest of the server from being damaged by faulty, untrusted code?
Thank you!
Perhaps something like this?
# main.py
subprocess.call(["python", "pluginhandler.py", "plugin", env])
Then,
# pluginhandler.py
os.chroot(chrootpath)
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.execle(programpath, arg1, arg2, ..., env)
# or another subprocess call
subprocess.call["python", "plugin", env])
EDIT: Wanted to use fork() but I didn't really understand what it did. Looked it up. New
code!
# main.py
import os,sys
somevar = someimportantdata
pid = os.fork()
if pid:
# this is the parent process... do whatever needs to be done as the parent
else:
# we are the child process... lets do that plugin thing!
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.chroot(chrootpath)
import untrustworthyplugin
untrustworthyplugin.run(somevar)
sys.exit(0)
This was useful and I pretty much just stole that code, so kudos to that guy for a decent example.
After creating your jail you would call os.chroot from your Python source to go into it. But even then, any shared libraries or module files already opened by the interpreter would still be open, and I have no idea what the consequences of closing those files via os.close would be; I've never tried it.
Even if this works, setting up chroot is a big deal so be sure the benefit is worth the price. In the worst case you would have to ensure that the entire Python runtime with all modules you intend to use, as well as all dependent programs and shared libraries and other files from /bin, /lib etc. are available within each jailed filesystem. And of course, doing this won't protect other types of resources, i.e. network destinations, database.
An alternative could be to read in the untrusted code as a string and then exec code in mynamespace where mynamespace is a dictionary defining only the symbols you want to expose to the untrusted code. This would be sort of a "jail" within the Python VM. You might have to parse the source first looking for things like import statements, unless replacing the built-in __import__ function would intercept that (I'm unsure).

Creating an in memory constant when django starts

I have this kind of setup :
Overridden BaseRunserverCommand that adds another option (--token) that would get a string token.
Store it in the app called "vault" as a global variable
Then continue executing the BaseRunserverCommand
Now later when I try to get the value of this global variable after the server started, I am unable to see the value. Is this going out of scope? How to store this one time token that is entered before the django starts?
Judging by the name of the command line option, this sounds like a configuration variable -- why not put it in settings.py like all the other configurations?
If it's a secure value that you don't want checked in to version control, one pattern I've seen is to put secure or environment-sensitive (i.e. only makes sense in production or development) configurations in a local_settings.py file which is not checked in to version control, then add to the end of your settings.py:
try:
from local_settings import *
except ImportError:
# if you require a local_settings to be present,
# you could let this exception rise, or raise a
# more specific exception here
pass
(Note that some people like to invert the import relationship and then use --settings on the command line with runserver -- this works, too, but requires that you always remember to use --settings.)
There's no such thing as a "global variable" in Python that is available from everywhere. You need to import all names before you can use them.
Even if you did this, though, it wouldn't actually work on production. Not only is there no command that you run to start up the production server (it's done automatically by Apache or whatever server software you're using), the server usually runs in multiple processes, so the variable would have to be set in each of them.
You should use a setting, as dcrosta suggests.

Multiple python web applications running on multiple domains (virtualhost)?

I am stuck and desperate.
Is it possible to serve multiple python web applications on multiple different domains using virtualhost on cherrypy? Hmm wait... I will answer myself: Yes, it is possible. With virtual host dispatcher it is possible, until i require this:
I need to use more instances of same application but in different versions. This means that I need to somehow split the namespace for the python import for these applications.
Example:
I have application MyApp and there are two versions of it. I have got two domains app1.com and app2.com.
When I access app1.com I would like to get the application MyApp in version 1. When I access app2.com, it should be MyApp in version 2.
I am now using the VirtualHostDispatcher of cherrypy 3.2 and the problem is that, when I use import from the methods of MyApp version 1 and the MyApp version 2 has been loaded before, python will use the already imported module (because of module cache).
Yes.. it is possible to wrap the import and clean the python module cache everytime (i use this for the top level application object instantiation), but it seems quite unclean for me... And I think that it is also inefficient...
So, what do you recommend me?
I was thinking about using apache2 and cherrypy using Mod_WSGI, but it seems that this does not solve the import problem, becuase there is still one python process for all apps togetger.
Maybe, I am thinking about the whole problem completely wrong and I will need to re-think it. I am opened for every idea or tip. Only limitation is that i want to use Python 3. Anything else is still opened for discussion :-)
Thank you for every response!
Apache/mod_wsgi can do what is required. Each mounted web application under mod_wsgi will run in a distinct sub interpreter in the same process so can be using different code bases. Better still, you use daemon mode of mod_wsgi and delegate each web application to distinct process so not risk of them interfering with each other.
what about creating myapp_selector module that does smth like that:
def application(env, start_response):
import myapp1
import myapp2
if env['SERVER_NAME'] == 'myapp1.com':
myapp = myapp1
elif env['SERVER_NAME'] == 'myapp2.com':
myapp = myapp2
# ...
return myapp.process_request()

Categories

Resources