I am working on a project: a system where students submit their coding assignments to a server, the server executes their code as part of a set of tests, and grades are returned based on the results of those tests.
There is a security concern that a student submitting code could "damage" the server by including code that deletes the directory where the system's files are stored. The files are stored in a directory hierarchy, so if a student somehow figured out the path to it, they could easily code their program to delete this directory.
I have since set up permissions so that the server runs under a different user. This user only has access to a single directory that stores all the submissions for that module. A student could still theoretically wipe out this directory, but that is better than deleting the whole system. It is still not ideal, though, and I am not sure how to approach it.
The server is written in Python. I have tried using os.chown etc. to change the ownership of directories to a single user; however, I found that the program needs to run as a superuser to change ownership, and likewise for calls to os.setuid and os.setgid.
Basically, my question is: is there any way to run a program while restricting it to the directory it is running in? This would mean only allowing it to delete files/folders within its working directory. I know there is a chroot command, but that also requires superuser privileges.
Running the program under a different user without sudo privileges is not possible either. The programs are executed using subprocess.Popen().
I know it's a long shot; I have done a lot of research into permissions, and the current solution, which restricts deletion to the submissions data directory, is as far as I could get. It is still not ideal, however, and the server will not be allowed to run with sudo privileges.
If there are any program attributes that can be set to prevent a program from deleting files, that would be great, but I don't think such a mechanism exists. I may have to resort to "scanning" the submitted code file for dangerous calls and rejecting the file if any such calls are found.
The following directory hierarchy is used:
.handin/module-code/data (data is where submissions for each student are stored)
Currently, the data directory is created with a group handin, which allows any member of that group to create directories inside it. With the server running under a user handin, it creates directories/files inside that data directory with user handin and group handin. So, the only files the server could delete as user handin are the directories underneath data, rather than the whole .handin directory.
Underneath data, you have directories named after the student IDs, e.g. .handin/module-code/data/12345678, and underneath each of those a directory with the assignment name. The assignment directory is the directory the code is executed in. It would be ideal if only that directory could be deleted, but if not, the student-ID directory.
So, I have solved the problem using a separate Docker container for each execution. I created separate images for the different languages the submitted programs would be executed in. I then created a user in these containers that had just enough permissions to create/delete files inside its home directory, essentially sandboxing the execution.
I then used the epicbox Python module (https://pypi.org/project/epicbox/), which was created by Stepik.org to grade programming assignments in their own containers (a problem very similar to the one I needed to solve).
It creates a volume internally to allow each Docker container that is run to share files:
with epicbox.working_directory() as workdir:
    epicbox.run(....)
    epicbox.run(....)
    .....
    epicbox.run(....)
Each run() call spins up a Docker container, but thanks to the internally created volume, each container can access files produced by the previous call to run(). The module allows you to upload files from your local machine to the Docker container and then compile/execute them there.
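To give a fuller picture of how these calls fit together, here is a sketch loosely based on the example in the epicbox README; the profile names, Docker image, commands, and limits are illustrative, not my exact configuration:

import epicbox

# Two illustrative profiles: a root user for compiling, and a locked-down
# 'sandbox' user for running the untrusted submission.
PROFILES = {
    'gcc_compile': {
        'docker_image': 'stepik/epicbox-gcc:6.3.0',
        'user': 'root',
    },
    'gcc_run': {
        'docker_image': 'stepik/epicbox-gcc:6.3.0',
        'user': 'sandbox',
        'read_only': True,
    },
}
epicbox.configure(profiles=PROFILES)

submission = b'#include <cstdio>\nint main() { printf("42\\n"); }\n'

with epicbox.working_directory() as workdir:
    # The first container compiles the submission into the shared volume...
    epicbox.run('gcc_compile', 'g++ -pipe -O2 -static -o main main.cpp',
                files=[{'name': 'main.cpp', 'content': submission}],
                workdir=workdir)
    # ...and a second container executes it under resource limits.
    result = epicbox.run('gcc_run', './main',
                         limits={'cputime': 1, 'memory': 64},
                         workdir=workdir)
    print(result)  # dict with exit code, stdout, stderr, duration, ...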
I did have to do some "hacks" to fit it to my requirements (changing the default working directory in the Docker container, as epicbox did not provide an easy way to change it). This solution adds a few extra seconds to the execution time compared to executing on the server itself, but for my use case it is an acceptable trade-off for security.
With this question I would like to gain some insight and verify that I'm on the right track with my thinking.
The request is as follows: I would like to create a database on a server. This database should be updated periodically by adding information that is present in a certain folder on a different computer. Both the server and the computer will be on the same network (I may run into some firewall issues).
So the method I am thinking of using is as follows. Create a tunnel between the two systems. I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data, and adds it to the database. I am planning to use Python, which I am fairly familiar with.
Note: I don't think I will be able to install Python on the PC with the files.
Is this at all doable? Is my approach solid? Please let me know if additional information is required.
Create a tunnel between the two systems.
If you mean setting up the firewall between the two machines to allow a connection, then yes. Just open the PostgreSQL port. Check postgresql.conf for the port number in case it isn't the default. Also put the correct permissions in pg_hba.conf so the computer's IP can connect to it.
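As a rough sketch, assuming PostgreSQL with the psycopg2 client library and made-up addresses, database, and user names, the server-side configuration and a quick connectivity check from the other computer could look like this:

# On the server (illustrative values):
#   postgresql.conf:  listen_addresses = '*'   and   port = 5432
#   pg_hba.conf:      host  mydb  loader  192.168.1.50/32  md5
# Reload PostgreSQL and open port 5432 in the firewall afterwards.
import psycopg2

conn = psycopg2.connect(host='192.168.1.10', port=5432,
                        dbname='mydb', user='loader',
                        password='...')  # credentials elided
print('connected')  # reaching this line means the firewall and pg_hba are OK
conn.close()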
I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data, and adds it to the database. I am planning to use Python, which I am fairly familiar with.
Yeah, that's pretty standard. No problem.
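For illustration, a minimal version of such a script might look like the following; the watch directory, the CSV file format, and the records table are all assumptions:

import csv
import pathlib
import psycopg2

# Hypothetical mount point of the remote folder (or a local copy of it)
WATCH_DIR = pathlib.Path('/mnt/shared/exports')

def load_new_files():
    conn = psycopg2.connect(host='192.168.1.10', dbname='mydb',
                            user='loader', password='...')
    with conn, conn.cursor() as cur:  # 'with conn' commits on success
        for path in sorted(WATCH_DIR.glob('*.csv')):
            with path.open(newline='') as f:
                for row in csv.reader(f):
                    # hypothetical two-column table
                    cur.execute('INSERT INTO records (name, value) '
                                'VALUES (%s, %s)', (row[0], row[1]))
    conn.close()

if __name__ == '__main__':
    load_new_files()  # schedule hourly/daily via cron or Task Scheduler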
Note: I don't think I will be able to install Python on the PC with the files.
On Windows you can install Anaconda for all users or just for the current user. The latter doesn't require admin privileges, so that may help.
If you can't install Python, then you can use tools such as PyInstaller to turn your Python program into an executable that contains all the libraries, so you just have to drop it into a folder on the computer and execute it.
If you absolutely cannot install anything or execute any program, then you'll have to create a scheduled task that copies the data over the network to a computer that has Python, and run the Python script there, but that's extra complication.
If the source computer is automatically backed up to a server, you can also use the backup as a data source, but there will be a delay depending on how often the backup runs.
Say I’m working on a complex Python program for keeping track of shipment data, using Git for version control. On my local machine, is it best to create two different folder structures, one being the develop folder and the other being the production folder?
So, I push my final (develop) changes to the master branch, then pull those changes into the production folder?
I like that idea of keeping separate folders in order to prevent mistakes, but I think it totally depends on the standards/conventions of your group/environment. If you communicate this approach to them and they approve of it, then I would recommend it. Keep in mind, though, that version control lets you check out any version from the remote anywhere, so by using a single folder location you're saving space on your machine.
Our client has a web application running as a Django instance from a virtualenv on an Ubuntu server. We did a security audit of that service and found a path traversal vulnerability in a file upload form that could allow an attacker to write arbitrary files to any path owned by the django user. Example:
A parameter "Import Name" is supplied with the value
../some/path/to/create
Then the form's file field is supplied with an arbitrary filename and the desired file contents.
The application then does
try:
    # <Import Name> and <Filename From Form> come straight from the request
    path = os.path.join(DEFAULT_UPLOAD_DIR, import_name)
    os.mkdir(path)
    ...
    with open(os.path.join(path, filename_from_form), 'w') as upload_file:
        upload_file.write(file_contents)
    ...
except OSError:
    ...
The unsafe os.path.join allows the attacker to walk up the directory tree and upload to directories other than DEFAULT_UPLOAD_DIR. So basically, if the attacker is able to find a path that doesn't yet exist on the server, os.mkdir() in the try...except succeeds instead of failing, the folder is created, and the file is uploaded there.
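To make the mechanics concrete, here is a short illustration of the join behaviour, together with one common mitigation; the upload root is a made-up path and the check is not from the audited application:

import os

DEFAULT_UPLOAD_DIR = '/srv/app/uploads'   # illustrative root
import_name = '../some/path/to/create'    # attacker-controlled form value

joined = os.path.join(DEFAULT_UPLOAD_DIR, import_name)
print(joined)                    # /srv/app/uploads/../some/path/to/create
print(os.path.realpath(joined))  # /srv/app/some/path/to/create -- escaped!

# A typical fix: resolve the path and require it to stay under the root.
root = os.path.realpath(DEFAULT_UPLOAD_DIR)
if not os.path.realpath(joined).startswith(root + os.sep):
    raise ValueError('path escapes the upload directory')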
Now this translates to a real exploit if the attacker is able to write to
../virtualenvs/<env name>/lib/python2.7/
Since Django modules, for example, are loaded from the site-packages subdirectory within the virtualenv's python directory, and the Python path puts whatever is directly under lib/python2.7 ahead of site-packages, the module loading order essentially allows the attacker to 'overwrite' a module and ensure their code is run on import.
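The ordering can be confirmed from a Python shell inside the virtualenv; the paths printed below are illustrative of a typical layout, not the audited host:

import sys

for entry in sys.path:
    print(entry)
# .../virtualenvs/<env name>/lib/python2.7                <- searched first
# ...
# .../virtualenvs/<env name>/lib/python2.7/site-packages  <- Django lives here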
We did a proof-of-concept penetration test and wrote to
../virtualenvs/somepath/__init__.py
which succeeded, but for some reason we are unable to write to
../virtualenvs/<actual env name>/
This is strange, because the permissions are exactly the same as with somepath, and the owner/group is in both cases the Django user. If I activate the virtualenv as the Django user and go to the Python shell, I am able to do the write, so it seems weird that it fails when called from the vulnerable form view.
The question is: Is there something special about the virtualenv path from which the Django instance is running that makes it unable to write to that path? Or am I missing something?
I am working on a Django-based application whose location on my disk is home/user/Documents/project/application. The application takes in some values from the user and writes them to a file located in a folder under the project directory, i.e. home/user/Documents/project/folder/file. While running the development server using the command python manage.py runserver, everything worked fine; however, after deployment, application/views.py, which accesses the file via open('folder/path', 'w'), is not able to access it anymore, because when deployed via the Apache2 server using mod_wsgi, relative paths are resolved against /var/www by default.
Now, I am not putting the folder into /var/www because it is not good practice to put any Python code there, as it might become readable by clients, which is a major security threat. Please let me know how I can point the deployed application to read and write the correct file.
The real solution is to install your data files in /srv/data/myapp or some such, so that you can give the webserver user the correct permissions to only those directories. Whether you choose to put your code in /var/www or not is a separate question, but I would suggest putting at least your wsgi file there (and, of course, specifying your <DocumentRoot..> correctly).
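One way to wire that up is to keep the data directory in settings and build absolute paths from it rather than relying on the process's working directory; a minimal sketch, where DATA_DIR and the filename are assumptions:

# settings.py
DATA_DIR = '/srv/data/myapp'  # directory the webserver user may write to

# views.py
import os
from django.conf import settings

def write_values(values):
    # Absolute path, so it works under runserver and mod_wsgi alike
    path = os.path.join(settings.DATA_DIR, 'file')
    with open(path, 'w') as f:
        f.write(values)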
We use Django for a project. Prior to 1.7, we were able to dynamically pull files into our Python environment when setup.py was being executed. For example, we have this directory structure:
/foo/bar
/foo/bar/__init__.py
/foo/bar/my_functions.py
Our Django project doesn't know anything about those files to start with. When the web server starts and setup.py is read, we look at a configuration which tells us where to find files to add to our environment. Again, let's assume /foo/bar is in our configuration to be dynamically loaded. We then use import_module() to import it, so that anything under /foo/bar essentially becomes part of the project and can be used. It's basically a "plugins" feature.
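For reference, the loading step is essentially the following; the configuration variable and module names are illustrative, not our real ones:

import sys
from importlib import import_module

PLUGIN_DIRS = ['/foo']  # parents of the plugin packages, from our config

def load_plugins():
    for plugin_dir in PLUGIN_DIRS:
        if plugin_dir not in sys.path:
            sys.path.append(plugin_dir)
    # 'bar' is the package at /foo/bar (it has an __init__.py)
    return import_module('bar.my_functions')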
After Django >=1.7, this causes huge problems, mainly:
django.core.exceptions.AppRegistryNotReady: The translation infrastructure
cannot be initialized before the apps registry is ready. Check that you
don't make non-lazy gettext calls at import time.
It also limits our ability to add new files dynamically, since you always have to put them in place and restart the web server. You couldn't have an "Upload Plugin" page that adds new files to the server, for example.
Is there a way to add files like this both during the web server startup as well as after the startup without restarting it?