I need to perform some background advanced calculations on my data after it is collected in InfluxDb which is stored on the edge server, which means I have limited resources for the calculations.
Also I cannot block the data collection while I do calculations.
I am weighing using Kapacitor UDF streams vs custom Python scripts.
Please note I need to make the scripts configurable so that I can easily deply them to different environments with different sensors
It probably makes little difference, in general, especially for 'simple' usecases, though I lean towards standalone python scripts. (It may be better to use Kapacitor if you can cover your usecase using the kapacitor language for this instead of python based UDFs, but I found it insufficient since I needed to retrieve additional data from other databases)
Standalone python scripts could be a bit lighter, since you don't need to run the Kapacitor service.
Standalone python scripts could be a bit more configurable. Kapacitor is also pretty configurable but you'd have to spend a bit of time learning how to use it.
Standalone python scripts could be a bit more stable. I've experimented with python UDF's a couple of years ago and found them unstable and buggy. While this may have improved by now, you'd still be relying on this being supported and if go is not your language you might have trouble debugging and fixing issues yourself.
Related
For to offer interactive examples about data analysis, I'd like to embed an interactive python shell. It does not necessarily have to be a real Python shell. Users shall be given tasks that they can execute in the shell. This is similar to existing tutorials, as seen on, e.g., http://www.codecademy.org, but I'd like to work with libraries that those solutions do not offer, as far as I understood.
In order to get a real shell on the website, I think of two approaches:
I found projects like http://www.repl.it, but it seems rather difficult to include the necessary libraries like SciPy, NumPy, and Pandas. In addition, user input has to be validated and I'm not sure whether that works with those shells I found.
I could pipe the commands through a web applications to a Python installation on my server, but I'm scared of using eval() on foreign, arbitrary code. Is there a safe mode for Python? I found http://www.pypy.org. Although they offer a Python sandbox, unfortunately, they do not support the libraries I need.
Alternatively, I thought of just embedding a "fake shell", which I build to copy the behaviour of the functions that I want to explain. Of course, this would result in more work, as I would have to write a fake interface, but for now it seems to be the only possibility.
I hope that this question is not too generic; I'm looking for either a good HTML/JS library that helps me put a fake shell on my website or a library/service/software that can embed a real Python shell with the required modules installed.
There is no way to run untrusted Python safely; Python's dynamic nature allows for too many ways to break through any protective layers you could care to think of.
Instead, run each session on a new virtual machine, properly locked down (firewalled, unprivileged user), which you shut down after a hard time limit. New sessions get a new, clean virtual machine.
This isolates you from any malicious code that might run and try to break out of a sandbox; a good virtual machine is hardware-isolated by the processor from the host OS, something a Python-only layer could never achieve.
This process is sometimes called sandboxing.
You can find some good information on the python wiki
There are basically three options available:
machine-level mechanisms (such as a VM, as Martijn Pieters suggested)
OS-level mechanisms (such as a chroot or SELinux)
custom interpreters, such as pypy (which has sandboxing capabilities, as you mentioned), or Jython, where you may be able to use the Java security manager or applet mechanisms.
You may also want to check Restricted Python, which is especially useful for very restricted environments, but security will depend on its configuration.
Ultimately, your choice of solution will depend on what you want to restrict:
Filesystem access? Block everything, or allow certain directories?
Network access, such as sockets?
Arbitrary system calls?
I want to make a Cocoa OS X app. I would prefer to use python scripts in it's core. However, not sure how safe is it. I know that python penetration is quite high, but what about version conflicts and migrations? Is it worth bundling whole python runtime into the OS X app?
Thanks.
So.... what this really boils down to is compatibility issues across versions, something that scripting languages are notoriously bad at maintaining. Python does better than most, but it is still quite problematic.
Apple has generally shipped legacy versions of interpreters on the system for exactly this reason. Thus, if you do rely on the system installed Python, I would recommend locking to a particular version. I.e. use /usr/bin/python2.6 and not the generic /usr/bin/python.
The alternative is as you state; bundle the python interpreter and any needed resources into your app. That is a bit of a pain the butt to do, but it addresses the compatibility issue. More or less; the reality is that Python is, effectively, an interface to the OS and, thus, is quite large with potential to break across any release. Not much you can about that, though.
Another possibility is to go the route that #kindall proposes; use PyObjC and implement your Cocoa application entirely or mostly in Python. Works fine. Been there, done that, and wouldn't do it again, frankly, as the maintenance/debugging issues of large scale scripted applications are nasty.
As an alternative, you might want to investigate using Lua (http://www.lua.org) as it is very much designed to be embedded in applications. Lua has a tiny interpreter and you can fully control exactly what features of your app are accessible at runtime. For example, World of Warcraft's UI is mostly implemented as Lua gluing together a set of fast UI primitives. Fully customizable on the client side, which is really impressive when you consider the security implications.
You should use py2app. It will bundle a Python executable, all the libraries you need, and your script together into a single executable. You can then add other executables (e.g. your Objective-C parts) into that app bundle.
I would like to know if there are any documented performance differences between a Python interpreter that I can install from an rpm (or using yum) and a Python interpreter compiled from sources (with a priori well set flags for compilations).
I am using a Redhat 6.3 machine as Django/Apache/Mod_WSGI production server. I have already properly compiled everything in different setups and in different orders. However, I usually keep the build-dev dependencies on such machine. For some various ego-related (and more or less practical) reasons, I would like to use Python-2.7.3. By default, Redhat comes with Python-2.6.6. I think I could go with it but it would hurt me somehow (I would have to drop and find a replacement for a few libraries and my ego).
However, besides my ego and dependencies, I would like to know what would be the impact in terms of performance for a Django server.
If you compile with the exact same flags that were used to compile the RPM version, you will get a binary that's exactly as fast. And you can get those flags by looking at the RPM's spec file.
However, you can sometimes do better than the pre-built version. For example, you can let the compiler optimize for your specific CPU, instead of for "general 386 compatible" (or whatever the RPM was optimized for). Of course if you don't know what you're doing (or are doing it on purpose), it's always possible to build something slower than the pre-built version, too.
Meanwhile, 2.7.3 is faster in a few areas than 2.6.6. Most of them usually won't affect you, but if they do, they'll probably be a big win.
Finally, for the vast majority of Python code, the speed of the Python interpreter itself isn't relevant to your overall performance or scalability. (And when it is, you probably want to try PyPy, Jython, or IronPython to replace CPython.) This is especially true for a WSGI service. If you're not doing anything slow, Apache will probably be the bottleneck. If you are doing anything slow, it's probably something I/O bound and well outside of Python's control (like reading files).
Ultimately, the only way you can know how much gain you get is by trying it both ways and performance testing. But if you just want a rule of thumb, I'd say expect a 0% gain, and be pleasantly surprised if you get lucky.
Is there a way to limit the abilities of python scripts running under an embedded interpretor? Specifically I wish to prevent the scripts from doing things like the following:
Importing python extension modules (ie .pyd modules), except those specifically allowed by the application.
Manipulating processes in any way (ie starting new processes, or terminating the application).
Any kind of networking.
Manipulating the file system (eg creating, modifying and deleting files).
No. There's no easy way to prevent those things on CPython. Your options are:
Edit CPython source code and remove things you don't want - provide mocking methods for all those things. Very error-prone and hard to do. This is the approach of Google's App Engine.
Use Restricted Python. However, with it you can't prevent your user from exhausting the memory available or running infinite eat-all-cpu loops.
Use another python implementation. PyPy has a sandbox mode you can use. Jython runs under java and I guess java can be sandboxed.
Maybe this can be helpful. You have an example provided on how to work with the ast.
What you want it Google's Unladen Swallow project that Python version of App Engine runs on.
Modules are severely restricted, ctypes are not allowed, sockets are matched against some policy or other, in other words you get a sandboxed version of Python, in line with their Java offering.
I'd like to point out that this makes the system almost useless. Well useless for anything cooler than yet another [App Engine] App. Forget monkey-patching system modules, and even access to own stack is restricted. Totally un-dynamic-like.
OT: games typically embed LUA for scripting, perhaps you should check it out.
The background
I'm building a fair-sized web application with a friend in my own time, and we've decided to go with the Django framework on Python. Django provides us with a lot of features we're going to need, so please don't suggest alternative frameworks.
The only decision I'm having trouble with, is whether we use Python or Jython to develop our application. Now I'm pretty familiar with Java and could possibly benefit from the libraries within the JDK. I know minimal Python, but am using this project as an opportunity to learn a new language - so the majority of work will be written in Python.
The attractiveness of Jython is of course the JVM. The number of python/django enabled web-hosts is extremely minimal - whereas I'm assuming I could drop a jython/django application on a huge variety of hosts. This isn't a massive design decision, but still one I think needs to be decided. I'd really prefer jython over python for the jvm accessibility alone.
Questions
Does Jython have many limitations compared to regular python? Will running django on jython cause problems? How quick is the Jython team to release updates alongside Python? Will Django work as advertised on Jython (with very minimal pre-configuration)?
Decision
Thanks for the helpful comments. What I think I'm going to do is develop in Jython for the JVM support - but to try to only use Python code/libraries. Portability isn't a major concern so if I need a library in the JDK (not readily available in python), I'll use it. As long as Django is fully supported, I'm happy.
Django does work on Jython, although you'll need to use the development release of Jython, since technically Jython 2.5 is still in beta. However, Django 1.0 and up should work unmodified.
So as to whether you should use the regular Python implementation or Jython, I'd say it's a matter of whether you prefer having all the Java libraries available or all of the Python libraries. At this point you can expect almost everything in the Python standard library to work with Jython, but there are still plenty of third-party packages which will not work, especially C extension modules. I'd personally recommend going with regular Python, but if you've got a ton of JVM experience and want to stick with what you know, then I can respect that.
As for finding Python hosting, this page might be helpful.
I'd say that if you like Django, you'll also like Python. Don't make the (far too common) mistake of mixing past language's experience while you learn a new one. Only after mastering Python, you'll have the experience to judge if a hybrid language is better than either one.
It's true that very few cheap hostings offer Django preinstalled; but it's quite probable that that will change, given that it's the most similar environment to Google's app engine. (and most GAE projects can be made to run on Django)
I have recently started working on an open source desktop project in my spare time. So this may not apply. I came to the same the question. I decided that I should write as much of the code as possible in python (and Django) and target all the platforms CPython, Jython, and IronPython.
Then, I decided that I would write plugins that would interface with libraries on different implementations (for example, different GUI libraries).
Why? I decided early on that longevity of my code may depend on targeting not only CPython but also virtual machines. For today's purposes CPython is the way to go because of speed, but who knows about tomorrow. If you code is flexible enough, you may not have to decide on targeting one.
The downside to this approach is that you will have more code to create and maintain.
Django is supposed to be jython-compatible sinc version 1.0.
This tutorial is a bit outdated, but from there you can see there are no special issues.