With regard to the Python standard logging framework: is there any accepted wisdom in the Python world (or even in the log4j world) about calling setLevel() inside a library module that you're authoring?
Doing a setLevel() inside your library lets you establish a default, and yet that default is easy to override by the software that's using the library module.
Is this accepted as good practice or considered bad practice?
"""library module"""
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO) # Is this accepted as good practice or considered bad practice?
...
logger.info('milepost')
Maybe (A), "Yes, do it!" The practice is widespread -- see for yourself by cd-ing to your Python tree and running:
egrep -r setLevel --exclude-dir=test --exclude-dir=tests *
Or (B), "No, it's bad form!" There already exists a "default", implicit from a logger's ancestor in the hierarchy of loggers.
If you make yet another default, it's confusing to the users of your library.
Also, what if configuring (from a JSON or YAML file, in the main code) happens before your library is loaded? Then later, when your library is imported for the first time, a library default could override what was already set up via the configuration file. That would be confusing too!
Or (C), "Yes, explicit is better than implicit!"
I'm hoping this has a clear answer, but I haven't found it yet.
Thanks for any pointers!
I am reading the passage below as implying DON'T call setLevel() in the library you're authoring. Let it be, because the framework already defaults to WARNING, right?
From https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library,
Configuring Logging for a Library
When developing a library which uses logging, you should take care to document how the library uses logging -- for example, the names of loggers used. Some consideration also needs to be given to its logging configuration. If the using application does not use logging, and library code makes logging calls, then (as described in the previous section) events of severity WARNING and greater will be printed to sys.stderr. This is regarded as the best default behaviour.
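(Aside: the pattern that same howto recommends for a library that wants silence by default is to attach a NullHandler and leave the level alone -- a minimal sketch, mirroring the snippet above:)
"""library module"""
import logging

logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())  # no output unless the application configures logging

logger.info('milepost')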
While developing an application using PyQt5 I'm still trying to find a workflow that works best for me. At the moment I'm mostly working on the GUI (to demo to the client), leaving the back-end code for later, so I'm connecting the button signals to empty functions that I need to write later. Up until now, I've added print('#TODO: xxx') in each function that is empty. This gives me some feedback in the terminal as to how often certain functions are called and which to prioritize. This sort of works for me.
At the same time I am using the logging module as it is intended to be used. It seems like I could add a new logging level and use the logging module for this purpose as well, something like this:
def print_badge(self, pers: Person):
    # TODO: print_badge
    # print('#TODO: print badge')  # <- not needed anymore
    self.log.TODO('print badge')
The logging documentation seems to discourage creating your own logging levels, though. Is there a reason why I shouldn't do this, or is there a better option?
The reason why custom logging levels are discouraged in general is that people who configure logging have to take these levels into account. If multiple libraries did this with multiple levels, one might have to know all of the custom levels used by all of the libraries in use in order to configure logging a particular way. Of course, the logging package allows you to do it for those cases where it might be really necessary - perhaps for an application and libraries which are self-contained and not public, so the question of configuring those libraries by someone else doesn't arise.
It seems that one could easily use the DEBUG level for your use case - just include TODO in the message itself, and one can grep logs for that just as easily as a custom level. Or you can have a logger called TODO to which you log these messages, and handle those in a separate handler/destination (e.g. a file todo.log).
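A minimal sketch of the separate-logger variant; the file name and format here are illustrative choices:
import logging

# Route TODO markers to their own logger and file, at the standard DEBUG level.
todo_log = logging.getLogger('TODO')
todo_log.setLevel(logging.DEBUG)
handler = logging.FileHandler('todo.log')
handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
todo_log.addHandler(handler)

todo_log.debug('TODO: print badge')  # the TODO in the message stays greppable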
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice, but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on python-dev a year ago or so in which people broke out using everything from catching exceptions and poking at internal state to bytecode manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (support for variables, basic conditionals and function calls, but not definitions), you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
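A minimal sketch of approach 2, not a hardened implementation: reject disallowed constructs before compiling. The banned node set below is an illustrative choice; a real whitelist needs much more care (attribute access alone opens many escape routes).
import ast

BANNED = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.Lambda,
          ast.ClassDef, ast.Attribute)

def compile_checked(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, BANNED):
            raise ValueError('disallowed construct: ' + type(node).__name__)
    return compile(tree, '<script>', 'exec')

# Variables, conditionals and calls to whitelisted functions still work:
ns = {'__builtins__': {}, 'heal': lambda hp: print('healed', hp)}
exec(compile_checked("x = 3\nif x > 2:\n    heal(x)"), ns)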
Roughly ten years after the original question, Python 3.8.0 gained auditing (PEP 578). Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook

def block_mischief(event, arg):
    # Block file writes and external program calls while WRITE_LOCK is set.
    if 'WRITE_LOCK' in globals() and (
            (event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Modules like numpy or pickle eventually go through Python's file access, so they are banned from disk writes, too. External program calls have been explicitly disabled as well.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))  # raises OSError
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))  # append is blocked too
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))  # blocked at the 'open' event
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))  # 'subprocess' events blocked
An attempt at removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within CPython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:
Perhaps hypothetical compiled modules that use direct OS calls cannot be audited by CPython - whitelisting the safe pure-Python modules is recommended.
There is definitely still the possibility of crashing or overloading the CPython interpreter.
And maybe there remain some loopholes to write files to the hard drive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to @Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and (
            (event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True

exec("""
import sys
def r(a, b):
    try:
        raise Exception()
    except:
        # Walk up the traceback to the caller's frame and delete WRITE_LOCK.
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil', (object,), {'__ne__': r})()
# The hook evaluates arg[1] != 'r', which invokes w.__ne__ and so runs r().
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run code in a completely isolated environment:
exec(some_python_code, {'__builtins__': {}}, {})
But in such an environment you can do almost nothing :) (you cannot even import a module, and a malicious user can still run an infinite recursion or cause the interpreter to run out of memory). Probably you would want to add some modules that will be the interface to your game engine.
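A tiny sketch of that idea - expose only a hand-picked interface (the heal function here is a made-up example):
# Only names placed in this namespace are reachable from the sandboxed code.
def heal(amount):
    print('healing', amount, 'hp')

sandbox_globals = {'__builtins__': {}, 'heal': heal}
exec("heal(5)", sandbox_globals)             # works
exec("open('/etc/passwd')", sandbox_globals)  # NameError: name 'open' is not defined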
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to the filesystem, with access to other Zope objects controlled by Zope security machinery, and with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature has been around since about 2000.
And here's the magic behind Python Scripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so it can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
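A minimal standalone sketch (assuming a recent RestrictedPython release; the API may differ between versions):
from RestrictedPython import compile_restricted, safe_builtins

source = "result = 40 + 2"
code = compile_restricted(source, filename='<inline>', mode='exec')
ns = {'__builtins__': safe_builtins}  # a vetted subset of the builtins
exec(code, ns)
print(ns['result'])  # 42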
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that called the server with the code to execute and the set of parameters.
Here's an abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
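In Python, that exchange might look something like the following sketch; the endpoint, port and field names are hypothetical, and a real service would also need authentication:
import json
import urllib.request

payload = {'function_id': 'healing potion', 'action': 'use',
           'target': 'self', 'inventory_id': '1234'}
req = urllib.request.Request(
    'http://scripting-host.internal:8000/run',
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'})
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)  # e.g. {'hp': '+5', 'action': {...}}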
Hmm. This is a thought experiment; I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names, etc. (also hasattr/getattr/setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps blacklist other things such as opening files. You then emit the Python code for this and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but do not mention what the 3.0 alternative is (the ast module is its closest equivalent - see the sketch below).
In general, this is parallel to how forum software and the like try to whitelist 'safe' JavaScript or HTML, and such filters historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
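A rough sketch of the prefixing idea using the ast module (the Python 3 replacement for compiler); it only shows the renaming mechanics, not a complete sandbox:
import ast

class Prefixer(ast.NodeTransformer):
    def visit_Name(self, node):
        # Give every identifier a 'usr_' preamble so user code cannot
        # refer to host-side names.
        return ast.copy_location(ast.Name(id='usr_' + node.id, ctx=node.ctx), node)

tree = ast.fix_missing_locations(Prefixer().visit(ast.parse("x = 40\ny = x + 2")))
ns = {}
exec(compile(tree, '<user>', 'exec'), ns)
print(ns['usr_y'])  # 42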
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements, for example.
You can then use Messa's exec sample (or something similar) to allow code execution against only the builtins of your choosing - most likely some sort of API defined by yourself that gives the programmer access to the functionality you deem relevant.
I'm developing a reusable Python module (for Python 2.7 if it matters). I am wondering what the best practices are with regard to logging for others who wish to include my module in a larger framework which has its own logging approach.
Is there a standard way to set up logging within my module to use whatever loggers an external calling program has defined?
Here is a great blog post that outlines some best practices, which I have tried to adopt as my own: http://eric.themoritzfamily.com/learning-python-logging.html
He goes through all the details and rationale, but essentially it comes down to a couple simple principles.
Use getLogger based on your module's __name__ and start logging:
import logging
logger = logging.getLogger(__name__)
logger.info(...)
logger.debug(...)
Don't define handlers or configure log levels in your module, because they may interfere with the "external calling program" you have in mind. You (or your users) can set up the handlers and the desired level of logging detail for all the submodules, as needed, in the main application code.
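For example, the application (not the library) might configure things like this; 'mylib' is a hypothetical package name:
import logging.config

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {'console': {'class': 'logging.StreamHandler'}},
    'loggers': {'mylib': {'level': 'DEBUG'}},  # turn up one library's verbosity
    'root': {'level': 'WARNING', 'handlers': ['console']},
})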
ConfigParser is the much debated vanilla configuration parser for Python.
However you can simply import config where config.py has python code which sets configuration parameters.
What are the pros/cons of these two approaches to configuration?
When should I choose each?
The biggest issue I see with import config is that you don't know what will happen when you import it. Yes, you will get a set of symbols that are naturally referenced using a . style interface. But the code in the configuration file can also do who-knows-what. Now, if you completely trust your users, then allowing them to do whatever they feel like in the config file is possibly a good thing. However, if you have unknown quantities, or you want to protect users from themselves, then having a configuration file in a more traditional format will be safer and more secure.
This completely depends on your needs and goals for the script. One way really isn't "better", just different. For a very detailed discussion of most of Python's config parsers (including ConfigParser and config modules), see:
Python Wiki - ConfigParserShootout
"import config" is very simple, flexible and powerful but, since it can do anything, it might be dangerous if config.py is not in a safe place.
IMO it comes down to a matter of personal style. Do you intend for 3rd parties to edit your config? If so, maybe it makes sense to have a more "natural" configuration style a la ConfigParser that is not as technical and that may not be too far over the heads of your target audience.
Many popular projects such as Fabric and Django use the "native" configuration style, which is essentially just a Python module. Fabric has fabfile.py and Django has settings.py.
Overall, you're going to have a lot more flexibility using a native approach of importing a module simply because you can do anything you want in that file, including defining functions, classes, etc. because it's just another Python module you're importing.
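To make the contrast concrete, here is a minimal sketch of both styles (the file names and keys are made up; the module is spelled configparser in Python 3):
# configparser style - config.ini contains:
#   [server]
#   host = 0.0.0.0
#   port = 8080
import configparser

cfg = configparser.ConfigParser()
cfg.read('config.ini')
port = cfg.getint('server', 'port')  # values come back as strings unless coerced

# module style - config.py contains:
#   HOST = '0.0.0.0'
#   PORT = 8080
import config

port = config.PORT  # already an int, but config.py can run arbitrary code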