I have a client's python website which runs a dropbox-like feature that allows uploading of files.
I want to make sure that uploading files does not open up the server to vulnerabilities.
So I store all uploaded files as blobs in a Postgres database and do not trust the file name or extension supplied with the upload; I let the application determine those for itself.
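(For context, the storage side is just a bytea insert, roughly like the sketch below; the table and column names are made up for the example.)

import psycopg2

def store_upload(conn, data: bytes):
    # Raw bytes go into a bytea column; nothing from the client-supplied
    # name or extension is trusted or stored as-is.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO uploads (content) VALUES (%s)",
            (psycopg2.Binary(data),),
        )
    conn.commit()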
I ran into problems when trying to let the application decide the file format itself, so my question boils down to:
Is it necessary, for security, to limit what file formats are allowed to be uploaded?
If yes, how, if not using something like libmagic, can I determine the file format in the best way?
Are there other measures I need to take in order to remain safe when allowing publicly uploaded files?
Thanks.
The referenced "bug" question (which chains to this) doesn't actually describe a bug; it points out that some MS Office file types are, like Java jars, packaged and compressed as zipfiles. If you rename a .xlsx file to .zip, you can view the contents - in a simple example I found 13 .xml files and a .bin printer-settings file.
For security you can't "trust" the mime-type and file extension provided by the user, but you can in principle use them to validate that the contents are valid for the claimed file type. The first level of checking would ensure that the claimed Office files are in fact valid zipfiles, the second would check that the contents conform to what is expected by the Office application. Not being an Office developer I don't know of a process to inspect a zip archive, determine which Office application it is for, and validate that the application can open it, but I'm sure it exists somewhere on MSDN.
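The first level of checking is easy to sketch with Python's zipfile module (the OOXML marker check here is only a heuristic, not full validation):

import zipfile

def looks_like_ooxml(path):
    # First level: the claimed .xlsx/.docx/.pptx must at least be a valid zip
    # archive, and every OOXML package contains a [Content_Types].xml entry.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return "[Content_Types].xml" in zf.namelist()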
More fundamentally, what do you mean by "security is important to my application"? Security prevents unwanted events - you need to define what you want to prevent. Do you want users to only be able to upload files for whitelisted applications? Should they be prevented from uploading blacklisted file types (like .exe)? Is it OK with you if a user uploaded 10MB of random bits and called it a .xyz file?
I use a bot that writes the file ids of the files sent by the user to a text file, then reads those file ids back from the text file and sends them to the user. The method worked, but when I deploy it to Heroku I can no longer see, process, or download the text file.
Is there a way to view the text files that we deploy to Heroku? Or is there a way to upload the text files to a cloud service and have the bot open (read & write) them via a URL (though I think this would let any user on the internet access and modify my text files, which is not safe)? Or should I create an SQL database, upload the text files, and link each text file to its own URL (but I'm new to SQL)?
Is there any other simple method to solve this problem? What do you advise me to do in this case?
https://github.com/zieadshabkalieh/a
NOTE: The text file in my code is named first.txt
Heroku has an ephemeral filesystem: every file created by the application (and any change to files deployed with the application) is lost on a new deployment or an application restart.
Heroku Dynos also restart every 24 hours.
It is a good idea to persist data to a remote storage (like S3) or a DB (always a good option but requires a little bit more work).
For reading/writing simple files you can check the HerokuFiles repository, which has some Python examples and options. I would suggest S3 (using the Python boto module), as it is easy to use and keeps working even if the number/size of files grows one day.
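For the specific first.txt case in the question above, a minimal S3 sketch might look like this, using boto3 (the current incarnation of boto); the bucket name is a placeholder and credentials would come from environment variables / Heroku config vars:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bot-files"   # hypothetical bucket name

def save_text(contents):
    # Overwrite first.txt in the bucket with the latest contents
    s3.put_object(Bucket=BUCKET, Key="first.txt", Body=contents.encode("utf-8"))

def load_text():
    # Read first.txt back from the bucket
    obj = s3.get_object(Bucket=BUCKET, Key="first.txt")
    return obj["Body"].read().decode("utf-8")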
I'm building an API which will expose (among other things) the following calls:
Upload file to remote server.
Perform various computations (over some set of possible function) on remotely uploaded file.
I'm trying to do this in Python. What are the best practices when the client is untrusted, meaning it can upload arbitrarily crafted files?
What's the standard procedure nowadays? RPC, REST, something else?
I do not need to worry about authentication and/or encryption, requests can be anonymous and in the clear. MITM is not a concern either.
You should treat any client as untrusted, so your case calls for the general approach laid out in the OWASP ASVS (V16: File and Resources Verification Requirements). REST is fine for this purpose.
The main points are:
store files outside of the webroot (so they can't be served directly by the static file server)
avoid setting the execute bit (on Linux)
if possible, limit file types to known-good ones (i.e. validate against a whitelist; check the file type by extension AND by file signature)
check that files have an appropriate size before accepting requests and loading the content into variables (you can check the HTTP Content-Length and filter on it before passing the request to the app)
if possible, scan files with a server-side antivirus
if files are served back to a user, ensure that the appropriate headers (Content-Type, X-Content-Type-Options: nosniff) are set; if they are not, some XSS scenarios become possible
verify that filenames are sanitized so they won't trick your program into serving other files (e.g. a filename of "../../../../../../etc/passwd" could end up serving the actual /etc/passwd). Reject the request if the filename contains ../ or / sequences (a minimal sketch covering this plus the whitelist and size checks follows this list)
never concatenate folder paths with user-supplied filenames, because it leads to the same issue
if computations will be made by calling the command line, beware of command-line injection (this issue and the two previous ones can be addressed by specifying the filename format to the users, e.g. accept only alphanumeric names without spaces or special characters and reject any request that doesn't fit the pattern)
if you can, limit the number of requests per IP
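A minimal sketch of the whitelist, size, and filename checks (the allowed extensions, size limit, and name pattern are all examples you would adjust):

import os
import re

ALLOWED_EXTENSIONS = {".pdf", ".png", ".docx"}   # example whitelist
MAX_SIZE = 10 * 1024 * 1024                      # example limit: 10 MB

# Only plain alphanumeric names (plus - and _) with a short extension.
FILENAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,100}\.[A-Za-z0-9]{1,10}$")

def validate_upload(filename, data):
    if not FILENAME_RE.match(filename):
        # This also rules out "../" traversal, slashes and shell metacharacters.
        raise ValueError("bad filename")
    if os.path.splitext(filename)[1].lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("extension not allowed")
    if len(data) > MAX_SIZE:
        raise ValueError("file too large")
    # A file-signature (magic bytes) check against the extension would go here.
    return True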
Before I pose the question, some background: I'm creating a web management tool that, among other things, allows the user to download, tail, email, and move files between predefined directories via the management panel. Many of these directories are local to the server, but some are actually located on remote hosts and accessed via SSH--however, this is transparent to the user. I've used Twisted to create a pseudo-REST API for the client to access, but since I want to avoid revealing actual server paths to the client, it requests file downloads by POSTing an arbitrary ID to the API, like this: "http://XXXX:8880/api/transfer/download"
with POST params similar to this: {"srckey":"5","srcfile":"solar2-windows-1.10.zip"}. The idea being the client only knows the key of the directory and filename.
Pardon the excessive background--I'm hoping it will make my question more clear: The issue I have is I'm trying to allow users to download a copy of a file from one of the "remote" hosts via the management server that hosts the web panel, all without caching the file locally. I've used Twisted's File() object to stream large static files before, but since the file resides on another server, I'm trying to accomplish the same using a file object provided by Paramiko's "open()" method.
I've tried setting up a consumer/producer system similar to that used in the render methods of twisted.web.static.File, plugging in the file pointer provided by Paramiko in the appropriate places, but only the smallest text files transfer successfully; in all other cases Paramiko throws this error:
socket.error: Socket is closed
The contents of the relevant python files are here:
serve-project.py: http://pastebin.com/YcjsQHu3
WrapSSH.py:
http://pastebin.com/XaKXJwxb
In a nutshell, I'm trying to stream the data from a Paramiko SFTPFile to an HTTP client. I suspect that my approach is majorly faulty, due to my minimal familiarity with Twisted. Anyone have suggestions on a more intelligent way to accomplish this?
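A pull-producer of the kind described would be wired roughly like the sketch below (class name and chunk size are purely illustrative, and since Paramiko's read() blocks, it would still need to be pushed off the reactor thread in practice, e.g. with twisted.internet.threads.deferToThread):

from zope.interface import implementer
from twisted.internet.interfaces import IPullProducer

@implementer(IPullProducer)
class SFTPFileProducer(object):
    def __init__(self, sftp_file, request):
        self.sftp_file = sftp_file
        self.request = request

    def resumeProducing(self):
        # Called by Twisted whenever the client is ready for more data.
        chunk = self.sftp_file.read(64 * 1024)
        if chunk:
            self.request.write(chunk)
        else:
            self.request.unregisterProducer()
            self.request.finish()
            self.sftp_file.close()

    def stopProducing(self):
        # Client went away: release the remote file handle.
        self.sftp_file.close()

# In the resource's render_GET:
#     request.registerProducer(SFTPFileProducer(sftp_file, request), False)
#     return NOT_DONE_YET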
I have an HTTP server which hosts a large file and Python clients (GUI apps) which download it.
I want the clients to download the file only when needed, but have an up-to-date file on each run.
I thought each client would request the file on each run using the If-Modified-Since HTTP header set to the modification time of the existing local file, if any. Can someone suggest how to do this in Python?
Can someone suggest an alternative, easy, way to achieve my goal?
You can use the ETag header (a hash of your file: md5sum, sha256, etc.) to check whether the two files differ, instead of the last-modified date.
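A minimal sketch of the conditional download using only the standard library (this is the If-Modified-Since variant from the question; an ETag check would send If-None-Match with the stored hash instead):

import os
import urllib.request
import urllib.error
from email.utils import formatdate

def download_if_newer(url, local_path):
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        # Send the local file's mtime as an HTTP date.
        mtime = os.path.getmtime(local_path)
        req.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    try:
        with urllib.request.urlopen(req) as resp:
            with open(local_path, "wb") as f:
                f.write(resp.read())
        return True           # file was (re)downloaded
    except urllib.error.HTTPError as e:
        if e.code == 304:     # Not Modified: keep the local copy
            return False
        raise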
I'm assuming some things right now, BUT..
One solution would be to have a separate HTTP endpoint on the server (check.php) which returns a hash/checksum of each file you're hosting. If a server hash differs from the local file's hash, the client downloads the file. This means that if the content of the file on the server changes, the client will notice, since the checksum will differ.
Do an MD5 hash of the file contents, put it in a database or something, and check against it before downloading anything.
Your solution would work too, but it requires the server to actually include the Last-Modified date in the response headers for the GET request (some server software does not do this).
I'd suggest setting up a database table that looks something like this:
[ID] [File_name] [File_hash]
0001 moo.txt asd124kJKJhj124kjh12j
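A minimal sketch of the hashing step on the client side (assuming the server exposes the same hash, e.g. via the check.php endpoint mentioned above):

import hashlib

def file_md5(path, chunk_size=8192):
    # Hash in chunks so large files don't need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Download only when the hash the server reports differs from file_md5(local_path).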
It seems to me the easiest solution is hosting the file in Mercurial and using the Mercurial API to find the file's hash, downloading the file only if the hash has changed.
Calculating the hash can be done as in the answer to this question; for downloading the file, urllib will be enough.
I'm looking for a way to sell someone a card at an event with a unique code that they can use later to download a file (mp3, pdf, etc.) only one time, and to mask the true file location so a savvy person can't download the file more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
from django.http import HttpResponse

with open(path, 'rb') as f:
    response = HttpResponse(f.read())
# A generic binary content type plus Content-Disposition forces a download dialog
response['Content-Type'] = 'application/octet-stream'
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
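A rough sketch of the mapping table described above (field names and limits are just examples, not a fixed schema):

from django.db import models

class DownloadCode(models.Model):
    # Pre-generated code printed on the card, the file it unlocks,
    # and a counter so you can cap the number of downloads.
    code = models.CharField(max_length=32, unique=True)
    file = models.FileField(upload_to="protected/")
    downloads = models.PositiveIntegerField(default=0)
    max_downloads = models.PositiveIntegerField(default=1)

    def can_download(self):
        return self.downloads < self.max_downloads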
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular webservers such as lighttpd or nginx.
It works like this: when enabled, your application (e.g. even a trivial PHP script) can send a special response header, which causes the webserver itself to serve a static file.
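For instance, with Apache and mod_xsendfile enabled, a Django view would only need to emit the header (the view name, path, and filename below are placeholders; nginx's equivalent is X-Accel-Redirect pointing at an internal location instead of a filesystem path):

from django.http import HttpResponse

def protected_download(request):
    # The response body stays empty: the webserver streams the file named
    # in the header, so the real path is never exposed to the client.
    response = HttpResponse()
    response["Content-Type"] = "application/octet-stream"
    response["Content-Disposition"] = 'attachment; filename="example.mp3"'
    response["X-Sendfile"] = "/srv/protected/example.mp3"   # placeholder path
    return response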
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.