I want to allow users to make their own Python "mods" for my game, by placing their scripts in a special folder which the game "scans" for Python modules and imports.
What would be the simplest way to prevent "dangerous" scripts from being imported? I don't want people complaining to me that they used someone's mod and it erased their hard drive.
Things I would like to limit is accessing/modifying/creating any files outside of their folder and connecting to the internet/downloading/sending data. If you can thik of anything else, let me know.
So how can this be done?
Restricted Python seems to able to restrict functionality for code in a clean way and is compatible with python up to 2.7.
http://pypi.python.org/pypi/RestrictedPython/
e.g.
By supplying a different __builtins__ dictionary, we can rule out unsafe operations, such as opening files [...]
The obvious way to do it is to load the module as a string and exec it. This has just as many security risks, but might be easier to block by using custom globals and locals. Have a look at this question - it gives some really good guidance on this. As pointed out in Delnan's comments, this isn't completely secure though.
You could also try this. I haven't used it, but it seems to provide a safe environment for unsafe scripts.
There are some serious shortcomings for sandboxed python execution. aquavitae's answer links to some good discussion on the matter, especially this blog post. Read that first.
There is a kernel of secure execution within cPython. The fundamental idea is to replace the __builtins__ global (Note: not the __builtin__ module), which informs python to turn on some security features; making a handful of attributes on certain objects inaccessible, and removing most of the implementation objects from the interpreter when evaulating that bit of code.
You'll then need to write an actual implementation; in such a way that the protected modules are not the leaked into the sandbox. A fairly tested "file" replacement is provided in the linked blog. Getting a look on that might give you an idea of how involved and complex this problem is.
So now that you have understood that this is a challenge in python; you should take a look at languages with sandbox execution as a core feature, such as Lua, which is very popular in games.
Giving them python execution and trying to limit what they do is asking for trouble. See this SO question for discussion and a pointer to a good article. (You would presumably disable "eval", but it wouldn't make much difference in practice.
My suggestion: Turn the question around. Your goal is to provide them with scripting facilities so they can enhance the game. Find or define an interpreter for a suitable scripting language that has the features you need, and use it to execute their scripts. For example, you could support data persistence in a simple keystore model, without giving them file creation access. Or give them a command to create files but ensure it only accepts a path-less filename. The essential thing is to ensure that there is NO way for them to execute python commands directly.
Related
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I've noticed a few Python packages that use config files written in Python. Apart from the obvious privilege escalation, what are the pros and cons of this approach?
Is there much of a precedence for this? Are there any guides as to the best way to implement this?
Just to clarify: In my particular use case, this will only be used by programmers or people who know what they're doing. It's not a config file in a piece of software that will be distributed to end users.
The best example I can think of for this is the django settings.py file, but I'm sure there are tons of other examples for using a Python file for configuration.
There are a couple of key advantages for using Python as config file over other solutions, for example:
There is no need to parse the file: Since the file is already Python, you don't have to write or import a parser to extract the key value pairs from the file.
Configuration settings can be more than just key/values: While it would be folly to have settings define their own classes, you can use them to define tuples, lists or dictionaries of settings allowing for more options and configuration than other options. This is especially true with django, where the settings file has to accommodate for all manner of plug-ins that weren't originally known by the framework designers.
Writing configuration files is easy: This is spurious, but since the configuration is a Python file it can be edited and debugged within the IDE of the program itself.
Implicit error-checking: If your program requires an option called FILE_NAME and that isn't in the settings the program will throw an exception. This means that settings become mandatory and error handling of the settings can be more explicit. This can be a double edged sword, but manually changing config files should be for power editors who should be able to handle the consequences of exceptions.
Config options are easily accessed and namespaces: Once you go import settings you can wildly start calling settings.UI_COLOR or settings.TIMEOUT. These are clear, and with the right IDE, tracking where these settings are made becomes easier than with flat files.
But the most powerful reason: Overrides, overrides, overrides. This is quite an advanced situation and can be use-case specific, but one that is encouraged by django in a few places.
Picture that you are building a web application, where there is a development and production server. Each of these need their own settings, but 90% of them are the same. In that case you can do things like define a config file that covers all of development and make it (if its safer) the default settings, and then override if its production, like so:
PORT = 8080
HOSTNAME = "dev.example.com"
COLOR = "0000FF"
if SITE_IS_LIVE:
import * from production_settings.py
Doing an import * from will cause any settings that have been declared in the production_settings.py file to override the declarations in the settings file.
I've not seen a best practise guideline or PEP document that covers how to do this, but if you wanted some general guidelines, the django settings.py is a good example to follow.
Use consistent variable names, preferably UPPER CASE as they are understood to be settings or constants.
Expect odd data structures, if you are using Python as the configuration language, then try to handle all basic data types.
Don't try and make an interface to change settings, that isn't a simple text editor.
When shouldn't you use this approach? When you are dealing with simple key/value pairs that need to be changed by novice users. Python configs are a power user option only. Novice users will forget to end quotes or lists, not be consistent, will delete options they think don't apply and will commit the unholiest of unholies and will mix tabs and spaces spaces only. Because you are essentially dealing with code not config files, all off these will break your program. On the otherside, writing a tool that would parse through through a python file to find the appropriate options and update them is probably more trouble than it is worth, and you'd be better of reusing an existing module like ConfigParser
I think Python code gets directly used for configuration mostly because it's just so easy, quick, powerful and flexible way to get it done. There is currently no other tool in the Python ecosystem that provides all these benefits together. The ConfigParserShootout cat give you enough reasons why it may be better to roll Python code as config.
There are some security considerations that can be worked around either by defensive code evaluation or by policies such as properly setting the filesystem permissions in deployment.
I've seen so much struggle with rather complex configuration being done in various formats, using various parsers, but in the end being the easiest when done in code.
The only real downside I came upon is that people managing the configuration have to be somewhat aware of Python, at least the syntax, to be able to do anything and not to brake anything. May or may not matter case by case.
Also the fact that some serious projects, such as Django and Sphinx, are using this very approach should be soothing enough:
https://docs.djangoproject.com/en/dev/topics/settings/
http://sphinx-doc.org/config.html
There are many options for writing configuration files, with well written parsers:
ini
json
yaml
xml
csv
there's no good reason to have any kind of configuration be parsed as a python script directly. That could led to many kind of problems, from the security aspects to the hard to debug errors, that could be raised late in the run of the program life.
There's even discussions to build an alternative to the setup.py for python packages, which is pretty close to a python source code based configuration from a python coder's point of view.
Otherwise, you may just have seen python objects exported as strings, that looks a bit like json, though a little more flexible… Which is then perfectly fine as long as you don't eval()/exec() them or even import them, but pass it through a parser, like 'ast.literal_eval' or parsing, so you can make sure you only load static data not executable code.
The only few times I'd understand having something close to a config file written in python, is a module included in a library that defines constants used by that library designed to be handled by the user of the library. I'm not even sure that would be a good design decision, but I'd understand such a thing.
edit:
I wouldn't consider django's settings.py an example of good practice, though I consider it's part of what I'm consider a configuration file for coding-literate users that works fine because django is aimed at being used mostly by coders and sysadmins. Also, django offers a way of configuration through a webpage.
To take #lego's arguments:
There is no need to parse the file
there's no need to explicitly parse it, though the cost of parsing is anecdotic, even more given the safety and the extra safety and the ability to detect problems early on
Configuration settings can be more than just key/values
ini files apart, you can define almost any fundamental python type using json/yaml or xml. And you don't want to define classes, or instanciate complex objects in a configuration file…
Writing configuration files is easy:
but using a good editor, json/yaml or even xml syntax can be checked and verified, to have a perfectly parsable file.
Implicit error-checking:
not an argument neither, as you say it's double sworded, you can have something that parses fine, but causes an exception after many hours of run.
Config options are easily accessed and namespaces:
using json/yaml or xml, options can easily be namespaced, and used as python objects naturally.
But the most powerful reason: Overrides, overrides, overrides
It's not a good argument neither in favor of python code. Considering your code is made of several modules that are interdendant and use a common configuration file, and each of them have their own configuration, then it's pretty easy to load first the main configuration file as a good old python dictionary, and the other configuration files just loaded by updating the dictionary.
If you want to track changes, then there are many recipes to organize a hierarchy of dicts that fallbacks to another dict if it does not contain the value.
And finally, configuration values changed at runtime can't be (actually shouldn't be) serialized in python correctly, as doing so would mean changing the currently running program.
I'm not saying you shouldn't use python to store configuration variables, I'm just saying that whatever syntax you choose, you should get it through a parser before getting it as instances in your program. Never, ever load user modifiable content without double checking. Never trust your users!
If the django people are doing it, it's because they've built a framework that only makes sense when gathering many plugins together to build an application. And then, to configure the application, you're using a database (which is a kind of configuration file… on steroids), or actual files.
HTH
I've done this frequently in company internal tools and games. Primary reason being simplicity: you just import the file and don't need to care about formats or parsers. Usually it has been exactly what #zmo said, constants meant for non programmers in the team to modify (say the size of the grid of the game level. or the display resolution).
Sometimes it has been useful to be able to have logic in the configuration. For example alternative functions that populate the initial configuration of the board in the game. I've found this a great advantage actually.
I acknowledge that this could lead to hard to debug problems. Perhaps in these cases those modules have been more like game level init modules than typical config files. Anyhow I've been really happy about the straightforward way to make clear textual config files with the ability to have logic there too and haven't gotten bit by it.
This is yet another config file option. There are several quite adequate config file formats available.
Please take a moment to understand the system administrator's viewpoint or some 3rd party vendor supporting your product. If there is yet another config file format they might drop your product. If you have a product that is of monumental importance then people will go through the hassle of learning the syntax just to read your config file. (like X.org, or apache)
If you plan on another programming language accessing/writing the config file info then a python based config file would be a bad idea.
I'm making a wxpython app that I will compile with the various freezing utility out there to create an executable for multiple platforms.
the program will be a map editer for a tile-based game engine
in this app I want to provide a scripting system so that advanced users can modify the behavior of the program such as modifying project data, exporting the project to a different format ect.
I want the system to work like so.
the user will place the python script they wish to run into a styled textbox and then press a button to execute the script.
I'm good with this so far thats all really simple stuff.
obtain the script from the text-box as a string compile it to a cod object with the inbuilt function compile() then execute the script with an exec statment
script = textbox.text #bla bla store the string
code = compile(script, "script", "exec") #make the code object
eval(code, globals())
the thing is, I want to make sure that this feature can't cause any errors or bugs
say if there is an import statement in the script. will this cause any problems taking into account that the code has been compiled with something like py2exe or py2app?
how do I make sure that the user can't break critical part of the program like modifying part of the GUI while still allowing them to modify the project data (the data is held in global properties in it's own module)? I think that this would mean modifying the globals dict that is passed to the eval function.
how to I make sure that this eval can't cause the program to hang due to a long or infinite loop?
how do I make sure that an error raised inside the user's code can't crash the whole app?
basically, how to I avoid all those problems that can arise when allowing the user to run their own code?
EDIT: Concerning the answers given
I don't feel like any of the answers so far have really answered my questions
yes they have been in part answered but not completely. I'm well aware the it is impossible to completely stop unsafe code. people are just too clever for one man (or even a teem) to think of all the ways to get around a security system and prevent them.
in fact I don't really care if they do. I'm more worried about some one unintentional breaking something they didn't know about. if some one really wanted to they could tear the app to shreds with the scripting functionality, but I couldn't care less. it will be their instance and all the problem they create will be gone when they restart the app unless they have messed with files on the HD.
I want to prevent the problems that arise when the user dose something stupid.
things like IOError's, SystaxErrors, InfiniteLoopErrors ect.
now the part about scope has been answered. I now understand how to define what functions and globals can be accessed from the eval function
but is there a way to make sure that the execution of their code can be stopped if it is taking too long?
a green thread system perhaps? (green because it would be eval to make users worry about thread safety)
also if a users uses an import module statement to load a module from even the default library that isn't used in the rest of the class. could this cause problems with the app being frozen by Py2exe, Py2app, or Freeze? what if they call a modal out side of the standard library? would it be enough that the modal is present in the same directory as the frozen executable?
I would like to get these answers with out creating a new question but I will if I must.
Easy answer: don't.
You can forbid certain keywords (import) and operations, and accesses to certain data structures, but ultimately you're giving your power users quite a bit of power. Since this is for a rich client that runs on the user's machine, a malicious user can crash or even trash the whole app if they really feel like it. But it's their instance to crash. Document it well and tell people what not to touch.
That said, I've done this sort of thing for web apps that execute user input and yes, call eval like this:
eval(code, {"__builtins__":None}, {safe_functions})
where safe_functions is a dictionary containing {"name": func} type pairs of functions you want your users to be able to access. If there's some essential data structure that you're positive your users will never want to poke at, just pop it out of globals before passing them in.
Incidentally, Guido addressed this issue on his blog a while ago. I'll see if I can find it.
Edit: found.
Short Answer: No
Is using eval in Python a bad practice?
Other related posts:
Safety of Python 'eval' For List Deserialization
It is not easy to create a safety net. The details too many and clever hacks are around:
Python: make eval safe
On your design goals:
It seems you are trying to build an extensible system by providing user to modify a lot of behavior and logic.
Easiest option is to ask them to write a script which you can evaluate (eval) during the program run.
How ever, a good design describes , scopes the flexibility and provides scripting mechanism through various design schemes ranging from configuration, plugin to scripting capabilities etc. The scripting apis if well defined can provide more meaningful extensibility. It is safer too.
I'd suggest providing some kind of plug-in API and allowing users to provide plug-ins in the form of text files. You can then import them as modules into their own namespace, catching syntax errors in the process, and call the various functions defined in the plug-in module, again checking for errors. You can provide an API module that defines the functions/classes from your program that the plug-in module has access to. That gives you the freedom to make changes to your application's architecture without breaking plug-ins, since you can just adapt the API module to expose the functionality in the same way.
If you have the option to switch to Tkinter you can use the bundled tcl interpreter to process your script. For that matter you can probably do that with a wxpython app if you don't start the tk event loop; just use the tcl interpreter without creating any windows.
Since the tcl interpreter is a separate thing it should be nearly impossible to crash the python interpreter if you are careful about what commands you expose to tcl. Plus, tcl makes creating DSLs very easy.
Python - the only scripting language with a built-in scripting engine :-).
i am creating ( researching possibility of ) a highly customizable python client and would like to allow users to actually edit the code in another language to customize the running of program. ( analogous to browser which itself coded in c/c++ and run another language html/js ). so my question is , is there any programming language implemented in pure python which i can see as a reference ( or use directly ? ) -- i need simple language ( simple statements and ifs can do )
edit: sorry if i did not make myself clear but what i want is "a language to customize the running of program" , even though pypi seems a great option, what i am looking for is more simple which i can study and extend myself if need arise. my google searches pointing towards xml based langagues. ( BMEL , XForms etc ).
The question isn't completely clear on scope, but I have a hunch that PyPy, embedding other full languages, and similar solutions might be overkill. It sounds like iamgopal may really be interested in something more like Interpreter Pattern or Little Language.
If the language you want to support is really small (see the Interpreter Pattern link), then hand-coding this yourself in Python won't be too hard. You can write a simple parser (Google around; here's one example), then walk the AST and evaluate user expressions.
However, if you expect this to be used for a long time or by many people, it may be worth throwing a real language at the problem. (I'd recommend Python itself if your users are already familiar with basic Python syntax).
Ren'Py is a modification to Python syntax built on top of Python itself, using the language tools in the stdlib.
For your user's sake, don't use an XML based language - XML is an awful basis for a programming language and your users will hate you for it.
Here is a suggestion. Use a strict subset of Python for your language. Use the compiler module to convert their code into an abstract syntax tree and walk the tree to to validate that the code conforms to your subset before converting the AST into python bytecode.
N.B. I just checked the docs and see that the compiler package is deprecated in 2.6 and removed in Python 3.x. Does anyone know why that is?
Numerous template languages such as Cheetah, Django templates, Genshi, Mako, Mighty might serve as an example.
Why not Python itself? With some care you can use eval to run user code.
One of the good thing about interpreted scripting languages is that you don't need another extra scripting language!
PLY (Python Lex-Yacc)
is something of your interest.
Possibly Common Lisp (or any other Lisp) will be the best choice for that task. Because Lisp make it possible to easily extend host language with powerful macroses and construct DSL (domain specific language).
If all you need is simple if statements and expressions, I'm sure it wouldn't be an awful task to parse each line. Something like
if some flag
activate some feature
deactivate some feature
elif some other flag
activate some feature
activate some feature
else
logout
Just write a class which, while parsing takes the first word, checks if it's "if, elif, else," etc, and if so, check a flag and set a flag saying you either are or are not executing until the next conditional. If it's not a conditional, call a function based on the first keyword that would modify the program state in some way.
The class could store some local execution state (are we in an if statement? If so are we executing this branch?) and have another class containing some global application state (flags that are checkable by if statements, etc).
This is probably the wrong thing to do in your situation (it's very prone to bugs, it's dangerous if you don't treat the data in the scripts correctly), but it's at least a start if you do decide to interpret your own mini-language.
Seriously though, if you try this, be very, very, srs careful. Don't give the scripts any functionality that they don't definitely need, because you are almost certainly opening security holes by doing something like this.
Don't say I didn't warn you.
I'm responsible for developing a large Python/Windows/Excel application used by a financial institution which has offices all round the world. Recently the regulations in one country have changed, and as a result we have been told that we need to create a "locked-down" version of our distribution.
After some frustrating conversations with my foreign counterpart, it seems that they are concerned that somebody might misuse the python interpreter on their computer to generate non-standard applications which might be used to circumvent security.
My initial suggestion was just to take away execute-rights on python.exe and pythonw.exe: Our application works as an Excel plugin which only uses the Python DLL. Those exe files are never actually used.
My counterpart remained concerned that somebody could make calls against the Python DLL - a hacker could exploit the "exec" function, for example from another programming language or virtual machine capable of calling functions in Windows DLLs, for example VBA.
Is there something we can do to prevent the DLL we want installed from being abused? At this point I ran out of ideas. I need to find a way to ensure that Python will only run our authorized programs.
Of course, there is an element of absurdity to this question: Since the computers all have Excel and Word they all have VBA which is a well-known scripting language somewhat equivalent in capability to Python.
It obviously does not make sense to worry about python when Excel's VBA is wide-open, however this is corporate politics and it's my team who are proposing to use Python, so we need to prove that our stuff can be made reasonably safe.
"reasonably safe" defined arbitrarily as "safer than Excel and VBA".
You can't win that fight. Because the fight is over the wrong thing. Anyone can use any EXE or DLL.
You need to define "locked down" differently -- in a way that you can succeed.
You need to define "locked down" as "cannot change the .PY files" or "we can audit all changes to the .PY files". If you shift the playing field to something you can control, you stand a small chance of winning.
Get the regulations -- don't trust anyone else to interpret them for you.
Be absolutely sure what the regulations require -- don't listen to someone else's interpretation.
Sadly, Python is not very safe in this way, and this is quite well known. Its dynamic nature combined with the very thin layer over standard OS functionality makes it hard to lock things down. Usually the best option is to ensure the user running the app has limited rights, but I expect that is not practical. Another is to run it in a virtual machine where potential damage is at least limited to that environment.
Failing that, someone with knowledge of Python's C API could build you a bespoke .dll that explicitly limits the scope of that particular Python interpreter, vetting imports and file writes etc., but that would require quite a thorough inspection of what functionality you need and don't need and some equally thorough testing after the fact.
One idea is to run IronPython inside an AppDomain on .NET. See David W.'s comment here.
Also, there is a module you can import to restrict the interpreter from writing files. It's called safelite.
You can use that, and point out that even the inventor of Python was unable to break the security of that module! (More than twice.)
I know that writing files is not the only security you're worried about. But what you are being asked to do is stupid, so hopefully the person asking you will see "safe Python" and be satisfied.
Follow the geordi way.
Each suspect program is run in its own jailed account. Monitor system calls from the process (I know how to do this in Windows, but it is beyond the scope of this question) and kill the process if needed.
I agree that you should examine the regulations to see what they actually require. If you really do need a "locked down" Python DLL, here's a way that might do it. It will probably take a while and won't be easy to get right. BTW, since this is theoretical, I'm waving my hands over work that varies from massive to trivial :-).
The idea is to modify Python.DLL so it only imports .py modules that have been signed by your private key. It verifies this by using the public key and a code signature stashed in a variable you add to each .py that you trust via a coding signing tool.
Thus you have:
Build a private build from the Python source.
In your build, change the import statement to check for a code signature on every module when imported.
Create a code signing tool that signs all the modules you use and assert are safe (aka trusted code). Something like __code_signature__ = "ABD03402340"; but cryptographically secure in each module's __init__.py file.
Create a private/public key pair for signing the code and guard the private key.
You probably don't want to sign any of the modules with exec() or eval() type capabilities.
.NET's code signing model might be helpful here if you can use IronPython.
I'm not sure its a great idea to pursue, but in the end you would be able to assert that your build of the Python.DLL would only run the code that you have signed with your private key.