I'm planning to build a WebApp that will need to execute scripts based on the argument that a user provides in a text field or in the URL.
Possible solutions that I have found:
create a lib directory in the root directory of the project, put the scripts there, and import them from the views.
use the subprocess module to run the scripts directly, in the following way:
subprocess.call(['python', 'somescript.py', argument_1,...])
argument_1 should be whatever the end user provides.
I'm planning to build a WebApp that will need to execute scripts
Why should it "execute scripts"? Turn your "scripts" into proper modules, import the relevant functions and call them. The fact that Python can be used as a "scripting language" doesn't mean it isn't a proper programming language.
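For instance, a minimal sketch of that refactoring (the module and function names are hypothetical):

# lib/somescript.py -- the old script turned into an importable module
def run(argument_1):
    """Do whatever the script used to do and return the result."""
    return "processed %s" % argument_1

# in the view, instead of shelling out:
from lib.somescript import run

result = run(user_argument)  # user_argument comes from the text field or URL

This way, arguments, return values and exceptions all flow through ordinary Python calls.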
Approach (1) should be the default approach. Never subprocess unless you absolutely have to.
Disadvantages of subprocessing:
Depends on the underlying OS and, in your case, on Python (i.e. is the python command the same Python that runs the original script? See the sketch after these lists).
Potentially harder to make safe.
Harder to pass values, return results and report errors.
Eats more memory and CPU (a side effect is that you can utilize all CPU cores, but since you are writing a web app it is likely you do that anyway).
Generally harder to code and maintain.
Advantages of subprocessing:
Isolates the runtime. This is useful if for example scripts are uploaded by users. You don't want them to mess with your application.
Related to 1: potentially easier to add scripts dynamically. Not that you should do that anyway. It also becomes harder when you have more than one server and you need to synchronize them.
Well, you can run non-Python code that way. But it doesn't apply to your case.
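If you do end up subprocessing anyway, here is a small sketch of the usual fix for the first disadvantage: use sys.executable instead of a bare python, so the child runs under the same interpreter (and virtualenv) as the web app. The script name and argument are the placeholders from the question:

import subprocess
import sys

argument_1 = "user-provided value"  # stands in for the user's input

# sys.executable is the interpreter running this code, so the child
# process uses the same Python as the parent instead of whatever
# "python" happens to resolve to on the server's PATH.
subprocess.call([sys.executable, 'somescript.py', argument_1])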
I need to integrate a body of Python code into an existing OSGi (Apache Felix) deployment.
I assume, or at least hope, that packages exist to help with this effort.
If it helps, the Python code is still relatively new and small, so it can probably be re-architected to meet whatever constraints are needed. However, it must remain in Python, because of dependencies on third-party libraries.
What are suggested best practices?
The trick is to make this an extender, see 1 and 2. You want your Python code to be separate from the code that handles the interaction with the interpreter. So what you do is wrap the Python code and any native libraries in a bundle. This is trivial since it is just a zip file.
You then develop a bundle that listens to starting bundles (see the BundleTracker) that have Python code. A manifest header is often used, but you can also look in a directory in the JAR. If you detect this code, you extract any native libraries and run the code in the interpreter of your choice.
If you can use Jython, that would be highly recommended. You can then carry the interpreter as an OSGi bundle that runs on the VM. If you need a native interpreter, your life is less rosy. You can rely on the environment to provide you with an interpreter, but then why use OSGi in the first place? You basically lose the write-once-run-anywhere advantage. You could go the full monty by creating bundles that contain Python installers for all the platforms you support. It can be done, and it is not even that hard, but it is a maintenance nightmare. Believe me, native code sucks; it just does things a bit faster than Java.
To offer interactive examples about data analysis, I'd like to embed an interactive Python shell. It does not necessarily have to be a real Python shell. Users shall be given tasks that they can execute in the shell. This is similar to existing tutorials, as seen on, e.g., http://www.codecademy.org, but I'd like to work with libraries that those solutions do not offer, as far as I understand.
In order to get a real shell on the website, I think of two approaches:
I found projects like http://www.repl.it, but it seems rather difficult to include the necessary libraries like SciPy, NumPy, and Pandas. In addition, user input has to be validated and I'm not sure whether that works with those shells I found.
I could pipe the commands through a web application to a Python installation on my server, but I'm scared of using eval() on foreign, arbitrary code. Is there a safe mode for Python? I found http://www.pypy.org. Although they offer a Python sandbox, unfortunately they do not support the libraries I need.
Alternatively, I thought of just embedding a "fake shell", which I build to copy the behaviour of the functions that I want to explain. Of course, this would result in more work, as I would have to write a fake interface, but for now it seems to be the only possibility.
I hope that this question is not too generic; I'm looking for either a good HTML/JS library that helps me put a fake shell on my website or a library/service/software that can embed a real Python shell with the required modules installed.
There is no way to run untrusted Python safely; Python's dynamic nature allows for too many ways to break through any protective layers you could care to think of.
Instead, run each session on a new virtual machine, properly locked down (firewalled, unprivileged user), which you shut down after a hard time limit. New sessions get a new, clean virtual machine.
This isolates you from any malicious code that might run and try to break out of a sandbox; a good virtual machine is hardware-isolated by the processor from the host OS, something a Python-only layer could never achieve.
This process is sometimes called sandboxing.
You can find some good information on the Python wiki.
There are basically three options available:
machine-level mechanisms (such as a VM, as Martijn Pieters suggested)
OS-level mechanisms (such as a chroot or SELinux; a resource-limit sketch follows these lists)
custom interpreters, such as PyPy (which has sandboxing capabilities, as you mentioned), or Jython, where you may be able to use the Java security manager or the applet mechanism.
You may also want to check Restricted Python, which is especially useful for very restricted environments, but security will depend on its configuration.
Ultimately, your choice of solution will depend on what you want to restrict:
Filesystem access? Block everything, or allow certain directories?
Network access, such as sockets?
Arbitrary system calls?
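As one concrete taste of the OS-level route, here is a sketch using the standard resource module to cap a child process (POSIX only; the script name is a placeholder). Note that this limits resource consumption but by itself does nothing about filesystem or network access:

import resource
import subprocess
import sys

def limit_resources():
    # Runs in the child between fork() and exec().
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))  # 5 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))  # 256 MiB

proc = subprocess.Popen([sys.executable, "untrusted_script.py"],
                        preexec_fn=limit_resources)
proc.wait()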
I need to send code to remote clients to be executed on them, but security is a concern for me right now. I don't want unsafe code to be executed there, so I would like to control what a program is doing. I mean, for example, knowing if it is making connections, where it is connecting to, whether it is reading local files, etc. Is this possible with Python?
EDIT: I'm thinking of something similar to the Android permission system. I want to know what the code will do and, if it does something different, stop it.
You could use a different Python runtime:
if you run your script using Jython, you can exploit Java's permission system
with PyPy's sandboxed version, you can choose what is allowed to run in your controller script
There used to be a module in Python called Bastion, but that was deprecated as it wasn't that secure. There's also, I believe, something called RPython, but I don't know too much about that.
I would in this case use Pyro and write the code on the target server. That way you know clients can only execute written and tested code.
edit - it's probably worth noting that Pyro also supports http://en.wikipedia.org/wiki/Privilege_separation - although I've not had to use it for that.
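A minimal sketch of that idea with Pyro4 (the class and method names are invented): clients can only call what the server explicitly exposes, so no foreign code runs at all.

# server.py -- only these vetted methods are remotely callable
import Pyro4

@Pyro4.expose
class SafeOperations(object):
    def run_report(self, name):
        # fixed, tested server-side code; 'name' is just data
        return "report for %s" % name

daemon = Pyro4.Daemon()
uri = daemon.register(SafeOperations)
print(uri)  # hand this URI to the client
daemon.requestLoop()

# client side: proxy = Pyro4.Proxy(uri); proxy.run_report("sales")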
I think you are looking for a sandboxed Python. There used to be an effort to implement this, but it was abandoned a couple of years ago.
Sandboxed Python on the Python wiki offers a nice overview of possible options for your use case.
The most rigorous (but probably the slowest) way is to run Python on a bare OS in an emulator.
Depending on the OS you use, there are several ways of running programs with restrictions, but without the overhead of an emulator:
FreeBSD has a nice integrated solution in the form of jails.
These grew out of the chroot system call.
Linux-VServer aims to do more or less the same on Linux.
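For illustration, the chroot route can be driven from Python itself with the standard os module (the jail path and UID are placeholders; this must run as root, and a chroot alone is not a complete sandbox):

import os

os.chroot("/var/jail")  # confine the filesystem view (requires root)
os.chdir("/")           # make sure we are inside the new root
os.setgid(1000)         # then drop privileges so the process
os.setuid(1000)         # cannot simply chroot back out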
As a long-time Python programmer, I wonder if a central aspect of Python culture has eluded me all along: what do we do instead of Makefiles?
Most Ruby projects I've seen (not just Rails) use Rake; shortly after Node.js became popular, there was Cake. In many other (compiled and non-compiled) languages there are classic Makefiles.
But in Python, no one seems to need such infrastructure. I randomly picked Python projects on GitHub, and they had no automation besides the installation provided by setup.py.
What's the reason behind this?
Is there nothing to automate? Do most programmers prefer to run style checks, tests, etc. manually?
Some examples:
dependencies sets up a virtualenv and installs the dependencies
check calls the pep8 and pylint command-line tools.
the test task depends on dependencies, enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
the coffeescript task compiles all CoffeeScript files to minified JavaScript
the runserver task depends on dependencies and coffeescript
the deploy task depends on check and test and deploys the project.
the docs task calls sphinx with the appropriate arguments
Some of them are just one- or two-liners, but IMHO, they add up. Thanks to the Makefile, I don't have to remember them.
To clarify: I'm not looking for a Python equivalent of Rake; I'm happy with Paver. I'm looking for the reasons.
Actually, automation is useful to Python developers too!
Invoke is probably the closest tool to what you have in mind, for automation of common repetitive Python tasks: https://github.com/pyinvoke/invoke
With invoke, you can create a tasks.py like this one (borrowed from the invoke docs)
from invoke import run, task

@task
def clean(docs=False, bytecode=False, extra=''):
    patterns = ['build']
    if docs:
        patterns.append('docs/_build')
    if bytecode:
        patterns.append('**/*.pyc')
    if extra:
        patterns.append(extra)
    for pattern in patterns:
        run("rm -rf %s" % pattern)

@task
def build(docs=False):
    run("python setup.py build")
    if docs:
        run("sphinx-build docs docs/_build")
You can then run the tasks at the command line, for example:
$ invoke clean
$ invoke build --docs
Another option is to simply use a Makefile. For example, a Python project's Makefile could look like this:
docs:
	$(MAKE) -C docs clean
	$(MAKE) -C docs html
	open docs/_build/html/index.html

release: clean
	python setup.py sdist upload

sdist: clean
	python setup.py sdist
	ls -l dist
Setuptools can automate a lot of things, and for things that aren't built-in, it's easily extensible.
To run unit tests, you can use the setup.py test command after adding a test_suite argument to the setup() call. (documentation)
Dependencies (even if not available on PyPI) can be handled by adding a install_requires/extras_require/dependency_links argument to the setup() call. (documentation)
To create a .deb package, you can use the stdeb module.
For everything else, you can add custom setup.py commands (a sketch follows).
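For example, a sketch of such a custom command (the command and package names are invented); setuptools picks it up via the cmdclass argument:

from setuptools import Command, setup

class LintCommand(Command):
    """A hypothetical 'python setup.py lint' command."""
    description = "run pylint on the package"
    user_options = []  # no options needed for this sketch

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        import subprocess
        subprocess.call(["pylint", "mypackage"])  # hypothetical package name

setup(
    name="mypackage",
    version="0.1",
    cmdclass={"lint": LintCommand},
)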
But I agree with S.Lott: most of the tasks you'd wish to automate (except dependency handling, maybe; it's the only one I find really useful) are tasks you don't run every day, so there wouldn't be any real productivity improvement from automating them.
There are a number of options for automation in Python. I don't think there is a culture against automation; there is just not one dominant way of doing it. The common denominator is distutils.
The one closest to your description is buildout. This is mostly used in the Zope/Plone world.
I myself use a combination of the following: Distribute, pip, and Fabric. I mostly develop using Django, which has manage.py for automation commands.
It is also being actively worked on in Python 3.3
Any decent test tool has a way of running the entire suite in a single command, and nothing is stopping you from using rake, make, or anything else, really.
There is little reason to invent a new way of doing things when existing methods work perfectly well - why re-invent something just because YOU didn't invent it? (NIH).
The make utility is an optimization tool which reduces the time spent building a software image. The reduction in time is obtained when all of the intermediate materials from a previous build are still available, and only a small change has been made to the inputs (such as source code). In this situation, make is able to perform an "incremental build": rebuild only a subset of the intermediate pieces that are impacted by the change to the inputs.
When a complete build takes place, all that make effectively does is to execute a set of scripting steps. These same steps could just be deposited into a flat script. The -n option of make will in fact print these steps, which makes this possible.
A Makefile isn't "automation"; it's "automation with a view toward optimized incremental rebuilds." Anything scripted with any scripting tool is automation.
So, why would Python projects eschew tools like make? Probably because Python projects don't struggle with long build times that they are eager to optimize. And also, the compilation of a .py to a .pyc file does not have the same web of dependencies as a .c to a .o.
A C source file can #include hundreds of dependent files; a one-character change in any one of these files can mean that the source file must be recompiled. A properly written Makefile will detect when that is or is not the case.
A big C or C++ project without an incremental build system would mean that a developer has to wait hours for an executable image to pop out for testing. Fast, incremental builds are essential.
In the case of Python, probably all you have to worry about is when a .py file is newer than its corresponding .pyc, which can be handled by simple scripting: loop over all the files and recompile anything newer than its byte code. Moreover, compilation is optional in the first place!
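In fact, the standard library already ships that loop as compileall, which skips any .pyc that is still up to date:

import compileall

# Recompile only the .py files whose bytecode is stale -- essentially
# make-style incremental rebuilding, straight from the standard library.
compileall.compile_dir("src", force=False)  # "src" is a placeholder path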
So the reason Python projects tend not to use make is that their need to perform incremental rebuild optimization is low, and they use other tools for automation; tools that are more familiar to Python programmers, like Python itself.
The original PEP where this was raised can be found here. Distutils has become the standard method for distributing and installing Python modules.
Why? It just happens that Python is a wonderful language for performing the installation of Python modules with.
Here are few examples of makefile usage with python:
https://blog.horejsek.com/makefile-with-python/
https://krzysztofzuraw.com/blog/2016/makefiles-in-python-projects.html
I think most people are not aware of the "Makefile for Python" case. It could be useful, but the "sexiness ratio" is too small for it to propagate rapidly (just my personal POV).
Is there nothing to automate?
Not really. All but two of the examples are one-line commands.
tl;dr Very little of this is really interesting or complex. Very little of this seems to benefit from "automation".
Due to documentation, I don't have to remember the commands to do this.
Do most programmers prefer to run stylechecks, tests, etc. manually?
Yes.
generating documentation
the docs task calls sphinx with the appropriate arguments
It's one line of code. Automation doesn't help much.
sphinx-build -b html source build/html. That's a script. Written in Python.
We do this rarely. A few times a week. After "significant" changes.
running stylechecks (Pylint, Pyflakes and the pep8-cmdtool).
check calls the pep8 and pylint command-line tools
We don't do this. We use unit testing instead of pylint.
You could automate that three-step process.
But I can see how SCons or make might help someone here.
tests
There might be space for "automation" here. It's two lines: the non-Django unit tests (python test/main.py) and the Django tests. (manage.py test). Automation could be applied to run both lines.
We do this dozens of times each day. We never knew we needed "automation".
dependencies sets up a virtualenv and installs the dependencies
Done so rarely that a simple list of steps is all that we've ever needed. We track our dependencies very, very carefully, so there are never any surprises.
We don't do this.
the test task depends on dependencies enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
The start server & run nosetest as a two-step "automation" makes some sense. It saves you from entering the two shell commands to run both steps.
the coffeescript task compiles all coffeescripts to minified javascript
This is something that's very rare for us. I suppose it's a good example of something to be automated. Automating the one-line script could be helpful.
I can see how SCons or make might help someone here.
the runserver task depends on dependencies and coffeescript
Except that the dependencies change so rarely that this seems like overkill. I suppose it can be a good idea if you're not tracking dependencies well in the first place.
the deploy task depends on check and test and deploys the project.
It's an svn co and python setup.py install on the server, followed by a bunch of customer-specific copies from the subversion area to the customer /www area. That's a script. Written in Python.
It's not a general make or SCons kind of thing. It has only one actor (a sysadmin) and one use case. We wouldn't ever mingle deployment with other development, QA or test tasks.
Is there a way to limit the abilities of Python scripts running under an embedded interpreter? Specifically, I wish to prevent the scripts from doing things like the following:
Importing Python extension modules (i.e. .pyd modules), except those specifically allowed by the application.
Manipulating processes in any way (i.e. starting new processes, or terminating the application).
Any kind of networking.
Manipulating the file system (e.g. creating, modifying and deleting files).
No. There's no easy way to prevent those things on CPython. Your options are:
Edit the CPython source code and remove the things you don't want - provide mocking methods for all those things. Very error-prone and hard to do. This is the approach of Google's App Engine.
Use Restricted Python. However, with it you can't prevent your user from exhausting the available memory or running infinite eat-all-CPU loops.
Use another Python implementation. PyPy has a sandbox mode you can use. Jython runs under Java, and I guess Java can be sandboxed.
Maybe this can be helpful. It provides an example of how to work with the ast module.
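For instance, a toy checker built on the ast module (a denylist like this is only a demonstration; it is easily bypassed via __import__, getattr tricks and so on, and is not a security boundary):

import ast

def check_source(source):
    """Raise if the source contains an import statement."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
    return source

check_source("print(1 + 1)")   # passes
check_source("import socket")  # raises ValueError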
What you want is Google's Unladen Swallow project, the Python version that App Engine runs on.
Modules are severely restricted, ctypes is not allowed, sockets are matched against some policy or other; in other words, you get a sandboxed version of Python, in line with their Java offering.
I'd like to point out that this makes the system almost useless - well, useless for anything cooler than yet another App Engine app. Forget monkey-patching system modules; even access to your own stack is restricted. Totally un-dynamic.
OT: games typically embed Lua for scripting; perhaps you should check it out.