faking a filesystem / virtual filesystem - python

I have a web service to which users upload Python scripts that are run on a server. Those scripts process files that are on the server, and I want them to be able to see only a certain hierarchy of the server's filesystem (ideally a temporary folder into which I copy the files I want processed and the scripts).
The server will ultimately be a Linux-based one, but if a solution is also possible on Windows it would be nice to know how.
What I thought of is creating a user with restricted access to folders of the FS - ultimately only the folder containing the scripts and files - and launching the Python interpreter as this user.
Can someone give me a better alternative? Relying only on this makes me feel insecure; I would like a real sandboxing or virtual FS feature where I could safely run untrusted code.

Either a chroot jail or a higher-order security mechanism such as SELinux can be used to restrict access to specific resources.

You are probably best to use a virtual machine like VirtualBox or VMware (perhaps even creating one per user/session).
That will allow you some control over other resources, such as memory and network as well as disk.
The only python that I know of that has such features built in is the one on Google App Engine. That may be a workable alternative for you too.

This is inherently insecure software. By letting users upload scripts you are introducing a remote code execution vulnerability. You have more to worry about than just file modification: what's stopping the Python script from accessing the network or other resources?
To solve this problem you need to use a sandbox. To better harden the system you can use a layered security approach.
The first layer, and the most important one, is a Python sandbox: user-supplied scripts will be executed within it. This will give you the fine-grained limitations that you need. Then the entire Python app should run within its own dedicated chroot. I highly recommend using the grsecurity kernel patch set, which improves the strength of any chroot. For instance, a grsecurity chroot cannot be broken out of unless the attacker can rip a hole into kernel land, which is very difficult to do these days. Make sure your kernel is up to date.
The end result is that you are trying to limit the resources that an attacker's script has. Layers are a proven approach to security, as long as the layers are different enough that the same attack won't break both of them. You want to isolate the script from the rest of the system as much as possible. Any resources that are shared are also paths for an attacker.
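To make the "limit the resources" part concrete, here is a minimal sketch (POSIX only) of one such layer: running the user script in a child process with CPU-time and memory limits applied via the standard resource module before exec. The paths and limit values are hypothetical, and this is only one layer, not a complete sandbox on its own.

    import resource
    import subprocess

    def _apply_limits():
        # Called in the child just before exec: 5 CPU-seconds, ~256 MB of
        # address space, no core dumps. Limit values are hypothetical.
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
        resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024, 256 * 1024 * 1024))
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    result = subprocess.run(
        ["python3", "/srv/sandbox/user_script.py"],   # hypothetical path
        preexec_fn=_apply_limits,
        cwd="/srv/sandbox",
        timeout=30,
        capture_output=True,
    )
    print(result.returncode)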

Related

How to limit python script so that it can't access local resources?

I am working on a project that allows users to upload a Python script to an API and run it on a schedule. Currently, I'm trying to figure out a way to limit the functionality of the script so that it cannot access local files, mess with the Flask server running the API, etc. Do you have any ideas on how I can achieve this? Is there any way to make it so only specific libraries are available for importing?
Running other people's scripts on your server is a serious security issue. If you are trying to expose a Python interpreter through your web application, you can try something like judge0 - GitHub. It is free if you deploy it yourself, and it will run scripts safely inside containers.
The simplest way is to ensure the user running the script is not root, but a user specifically designed for this task (e.g. part of a group that can only read and not write or execute). This means at minimum you should ensure all files have the appropriate mode. Then you can just use a pipe or something to run the script.
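If you are on Python 3.9 or newer, subprocess can switch the child process to such a dedicated account for you. A minimal sketch, assuming a pre-created low-privilege user named scriptrunner and a hypothetical upload path; the parent process needs the privileges (e.g. root) to change user:

    import subprocess

    completed = subprocess.run(
        ["python3", "/srv/uploads/job.py"],   # hypothetical path
        user="scriptrunner",                  # drop the child to the restricted account
        group="scriptrunner",
        cwd="/srv/uploads",
        capture_output=True,
        timeout=60,
    )
    print(completed.returncode)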
Alternatively, you could use a runtime that’s not “local”, like a VM or a compute service (AWS Lambda, etc.). The latter would be simplest, and there are lots of vendors who offer compute services with a programmatic API.

What strategy should I use to periodically extract information from a specific folder

With this question I would like to gain some insights/verify that I'm on the right track with my thinking.
The request is as follows: I would like to create a database on a server. This database should be updated periodically by adding information that is present in a certain folder, on a different computer. Both the server and the computer will be within the same network (I may be running into some firewall issues).
So the method I am thinking of using is as follows. Create a tunnel between the two systems. I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds them to the database. I am planning to use Python, which I am fairly familiar with.
Note: I don't think I will be able to install Python on the PC with the files.
Is this at all doable? Is my approach solid? Please let me know if additional information is required.
Create a tunnel between the two systems.
If you mean setting up the firewall between the two machines to allow a connection, then yeah. Just open the PostgreSQL port. Check postgresql.conf for the port number in case it isn't the default. Also put the correct permissions in pg_hba.conf so the computer's IP can connect to it.
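For reference, the two entries might look roughly like this (the database name, user and client address below are made up for the example):

    # postgresql.conf: make sure the server listens on the network, not just localhost
    listen_addresses = '*'        # or the server's specific address
    port = 5432                   # default; change if yours differs

    # pg_hba.conf: allow the machine holding the files to connect to the database
    # TYPE   DATABASE   USER          ADDRESS            METHOD
    host     filesdb    file_loader   192.168.1.50/32    md5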
I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds them to the database. I am planning to use Python, which I am fairly familiar with.
Yeah, that's pretty standard. No problem.
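A minimal sketch of such a loader, assuming psycopg2 as the driver; the folder, connection details and table layout (a raw_files table with a unique filename column) are hypothetical:

    import os
    import psycopg2

    WATCH_DIR = r"\\fileserver\shared\incoming"   # hypothetical folder to scan

    conn = psycopg2.connect(host="db.example.local", dbname="filesdb",
                            user="file_loader", password="secret")
    with conn, conn.cursor() as cur:
        for name in os.listdir(WATCH_DIR):
            path = os.path.join(WATCH_DIR, name)
            if not os.path.isfile(path):
                continue
            with open(path, "r", encoding="utf-8") as fh:
                payload = fh.read()
            # ON CONFLICT skips files already loaded on a previous run
            cur.execute(
                "INSERT INTO raw_files (filename, contents) VALUES (%s, %s) "
                "ON CONFLICT (filename) DO NOTHING",
                (name, payload),
            )
    conn.close()

Schedule it hourly or daily with cron (or the Windows Task Scheduler) and you have the pipeline you described.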
Note: I don't think I will be able to install Python on the PC with the files.
On Windows you can install anaconda for all users or just the current user. The latter doesn't require admin privileges, so that may help.
If you can't install Python, then you can use tools such as PyInstaller to turn your Python program into an executable that contains all the libraries, so you just have to drop it into a folder on the computer and execute it.
If you absolutely cannot install anything or execute any program, then you'll have to create a scheduled task that copies the data over the network to a computer that has Python, and run the Python script there, but that's extra complication.
If the source computer is automatically backed up to a server, you can also use the backup as a data source, but there will be a delay depending on how often it runs.

Loading Python libraries via http

I have several small Python libraries that I wrote with stuff that I find myself wanting over and over again. I think most programmers have something similar. I want to use these libraries from a variety of different machines so I've started keeping this stuff in my DropBox. However, I'd like to be able to use my code on machines on which I can't install DropBox or other cloud storage applications, even in portable form. I can just download the files every time one of them changes (DropBox can provide me a URL for each file in my Public folder), which is only a moderate nuisance. But--and I admit this is a longshot--is there a solution out there that will let me tell Python to load a library from my DropBox via http?
BTW, I'd like to add the whole remote folder to my sys.path, but getting a URL for a folder is complicated, so I'm going to try to walk before I run by starting with individual files.
Yes, it's possible. I think you want the combination of two previous questions:
How to download a file in python over HTTP
How to dynamically load a library in python
So your task basically breaks down into writing a little bit of glue code: download the URL via the first bullet, write it to a local file, and then import that file using the second bullet.
So that's how you'd do that.
BUT - please keep in mind that dynamically downloading and executing code has many potential security pitfalls. Will you be doing this over a secure connection? Who else has the ability to manipulate that URL? There are a bunch of security issues inherent in downloading and executing code on the fly. I would ask you to consider going about your solution in a different way, but I'm giving you the answer you're asking for.
As a simple security check, you can establish a known-good hash for your file, and then refuse to import any file other than one that's on the list of known-good hashes. This makes it a pain to update your modules, but gives you a little bit of extra safety.
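Putting the two pieces together with that hash check, a minimal sketch (the URL and hash below are placeholders):

    import hashlib
    import importlib.util
    import urllib.request

    URL = "https://example.com/mylib.py"   # placeholder for your DropBox public URL
    KNOWN_GOOD_SHA256 = "0123abcd..."      # placeholder known-good hash

    source = urllib.request.urlopen(URL).read()
    if hashlib.sha256(source).hexdigest() != KNOWN_GOOD_SHA256:
        raise RuntimeError("mylib.py does not match the known-good hash")

    with open("mylib.py", "wb") as fh:
        fh.write(source)

    spec = importlib.util.spec_from_file_location("mylib", "mylib.py")
    mylib = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mylib)   # mylib is now usable like any imported module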
Don't use DropBox as a revision control system
Pick a real solution like Git
Setup access to the Git repository on one of your servers
Clone the repository to your worker machines and checkout master
Create a develop branch where you put every change you make
Test the changes and when you consider any of them stable, merge it to master
On your worker machines, set up a cron job which periodically pulls from the master branch of the repository (and possibly restarts some Python processes, since importing the same module again won't make the Python interpreter aware of changes - imported modules are cached); an example crontab entry follows below
Enjoy your automatically updated workers :)
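The cron entry for that step can be as simple as this (repository path and schedule are hypothetical):

    # crontab entry: pull the latest master every hour
    0 * * * * cd /srv/worker/mylibs && git pull --ff-only origin master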
Don't feel ashamed - it happens that even experienced software developers end up with an XY problem

Continuous Deployment: Version Numbering and Jenkins for Deployment?

We want to use continuous deployment.
We have:
all sources (Python) in a local RhodeCode (Git) server.
Jenkins for automated testing
SSH connections to the production systems (Linux).
a tool which can update servers in one command.
Now something like this should be implemented:
run tests with Jenkins
if there is a failure: stop and mail the developers
If all tests are OK:
deploy
We have been in the business long enough to write some scripts to do this.
My questions:
How do you update the version numbers? You could increment them, or you could use a timestamp ...
Since we already use Jenkins, I think we do it in a script called by Jenkins. Any reason to do it with a different (better) tool?
My fear: Jenkins becomes a central server for things which are not related to testing (deploy). I think other tools like SaltStack or Ansible should be used for this. Up to now we use Fabric (simple layer above ssh). Maybe we should switch to a central management system before starting with continuous deployment.
Since we already use Jenkins, I think we do it in a script called by Jenkins. Any reason to do it with a different (better) tool?
To answer your question: No, there aren't any big reasons to not go with Jenkins for deployment.
Pros:
You already know Jenkins (and you probably know some of the quirks)
You don't need to introduce yet another technology
You said that you want to write scripts called by Jenkins, so you can switch easily to a different system later.
Cons:
there might be better tools out there for deployment
It does not integrate as well with change-control tools.
Additional Considerations:
Do not use the same server for prod deployment and continuous build/integration. These are two different tasks performed by two different roles. Therefore two different permission schemes might be employed.
Use permissions wisely. I use two different permission schemes for my deploy and CI servers. We have 3 Jenkins servers right now:
CI and deploy to uncontrolled environments (Developers can play with these environments)
Deploy to controlled environments (QA environments and upwards).
Deploy to prod (yes, that's the only purpose in life of this server) with the most restrictive permission scheme.
Sandbox - actually, there is a fourth server for Jenkins admins to play with.
Store your deployable artifacts outside of Jenkins (and you do if I read your question correctly).
So the tooling decision depends on your existing infrastructure and procedures. Jenkins won't lock you in as long as you keep as much of the logic as possible in scripts that are merely executed by Jenkins.
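To illustrate that last point, a minimal sketch of a deploy script Jenkins could call: it stamps a timestamp-plus-commit version, tags it, and hands off to your existing one-command update tool ("update-servers" here stands in for that tool and is hypothetical, as are the paths):

    import datetime
    import subprocess

    commit = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
    version = datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S") + "-" + commit

    with open("VERSION", "w") as fh:
        fh.write(version + "\n")

    subprocess.check_call(["git", "tag", "deploy-" + version])
    subprocess.check_call(["update-servers", "--version", version])   # your one-command tool
    print("deployed", version)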

Python and os.chroot

I'm writing a web-server in Python as a hobby project. The code is targeted at *NIX machines. I'm new to developing on Linux and even newer to Python itself.
I am worried about people breaking out of the folder that I'm using to serve up the web-site. The most obvious way to handle this is to filter requests for documents like /../../etc/passwd. However, I'm worried that there might be clever ways to go up the directory tree that I'm not aware of and that, consequently, my filter won't catch.
I'm considering using os.chroot so that the root directory is the web-site itself. Is this a safe way of protecting against these jail-breaking attacks? Are there any potential pitfalls to doing this that will hurt me down the road?
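Regarding the filtering approach itself: rather than trying to blacklist patterns like ../, the usual trick is to resolve the requested path and check that it still lies inside the document root. A minimal sketch (the docroot path and function name are hypothetical):

    import os

    DOCROOT = os.path.realpath("/srv/website/public")   # hypothetical document root

    def resolve_request(url_path):
        # Resolve symlinks and ".." components, then verify the result is
        # still inside the document root.
        candidate = os.path.realpath(os.path.join(DOCROOT, url_path.lstrip("/")))
        if os.path.commonpath([DOCROOT, candidate]) != DOCROOT:
            raise PermissionError("request escapes the document root: " + url_path)
        return candidate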
Yes, there are pitfalls. Security-wise:
If you run as root, there are always ways to break out. So first chroot(), then PERMANENTLY drop privileges to another user.
Put nothing which isn't absolutely required into the chroot tree. Especially no suid/sgid files, named pipes, unix domain sockets and device nodes.
Python-wise, your whole module loading gets screwed up. Python is simply not made for such scenarios. If your application is moderately complex, you will run into module-loading issues.
I think much more important than chrooting is running as a non-privileged user and simply using the filesystem permissions to keep that user from reading anything of importance.
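Putting those points together, a minimal sketch of the order of operations: chroot into the web root first, then permanently drop privileges (the path and the uid/gid of 1001 are hypothetical; this has to be started as root):

    import os

    os.chroot("/srv/website/public")   # hypothetical web root
    os.chdir("/")                      # make sure the cwd is inside the new root

    os.setgroups([])                   # drop supplementary groups first
    os.setgid(1001)                    # group before user, or the setgid call will fail
    os.setuid(1001)                    # after this there is no way back to root

    # ... continue serving requests as the unprivileged user ...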
Check out Twisted. twistd supports privilege shedding and chroot operation out of the box. Additionally it has a whole framework for writing network services, daemons, and pretty much everything.
