I have an HTTP server that hosts a large file, and Python clients (GUI apps) that download it.
I want the clients to download the file only when needed, but have an up-to-date file on each run.
I thought each client could request the file on each run, sending the If-Modified-Since HTTP header with the modification time of the existing local file, if any. Can someone suggest how to do this in Python?
Can someone suggest an easy alternative way to achieve my goal?
You can have the server add an ETag header (a hash of your file, such as an MD5 or SHA-256 digest) and compare that to tell whether the two files differ, instead of relying on the last-modified date.
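A minimal sketch of such a conditional GET with the Python 3 standard library, assuming the server honours If-Modified-Since and/or If-None-Match. The function names and the ".part" temporary suffix are made up for illustration, and the client is assumed to have stored the ETag from a previous response somewhere.

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request


def conditional_headers(local_path, etag=None):
    """Build validator headers for a conditional GET (empty if no local copy)."""
    headers = {}
    if os.path.exists(local_path):
        # HTTP dates are always expressed in GMT.
        mtime = os.path.getmtime(local_path)
        headers["If-Modified-Since"] = email.utils.formatdate(mtime, usegmt=True)
        if etag:
            headers["If-None-Match"] = etag
    return headers


def download_if_changed(url, local_path, etag=None):
    """Return True if a new copy was saved, False if the server answered 304."""
    request = urllib.request.Request(url, headers=conditional_headers(local_path, etag))
    partial = local_path + ".part"  # hypothetical temp name; avoids a half-written file
    try:
        with urllib.request.urlopen(request) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the existing file
            return False
        raise
    os.replace(partial, local_path)
    return True
```

Note that the local modification time here is the time of the last download, not the server's own timestamp, so this relies on the client and server clocks roughly agreeing. Saving the Last-Modified or ETag value from the previous response and sending that back is likely more robust.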
I'm assuming some things right now, but...
One solution would be to have a separate script on the server (check.php) that returns a hash/checksum of each file you're hosting. If the checksum differs from that of the local file, the client downloads the file. This means that if the content of the file on the server changes, the client will notice the change, since the checksum will differ.
Compute an MD5 hash of the file contents, store it in a database or similar, and check against it before downloading anything.
Your solution would work too, but it requires the server to actually include the modification date (the Last-Modified header) in its response to the GET request (some server software does not do this).
I'd say putting up a database that looks something like:
[ID] [File_name] [File_hash]
0001 moo.txt asd124kJKJhj124kjh12j
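On the client side, the comparison could look roughly like this. It is only a sketch: how the client obtains the server's hash (check.php, a database lookup, etc.) is left out, and the function names are made up for illustration.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, server_hash):
    """True when there is no local copy or its hash differs from the server's."""
    try:
        return file_md5(local_path) != server_hash.strip().lower()
    except FileNotFoundError:
        return True
```

MD5 is fine for detecting accidental changes like this; swap in hashlib.sha256 if the hash also has to resist deliberate tampering.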
It seems to me the easiest solution is to host the file in Mercurial and use the Mercurial API to find the file's hash, downloading the file only if the hash has changed.
Calculating the hash can be done as the answer to this question; for downloading the file urllib will be enough.
Related
Before I pose the question, some background: I'm creating a web management tool that, among other things, allows the user to download, tail, email, and move files between predefined directories via the management panel. Many of these directories are local to the server, but some are actually located on remote hosts and accessed via SSH--however, this is transparent to the user. I've used Twisted to create a pseudo-REST API for the client to access, but since I want to avoid revealing actual server paths to the client, it requests downloads of files using a POST with an arbitrary ID to the api, as such: "http://XXXX:8880/api/transfer/download"
with POST params similar to this: {"srckey":"5","srcfile":"solar2-windows-1.10.zip"}. The idea being the client only knows the key of the directory and filename.
Pardon the excessive background--I'm hoping it will make my question more clear: The issue I have is I'm trying to allow users to download a copy of a file from one of the "remote" hosts via the management server that hosts the web panel, all without caching the file locally. I've used Twisted's File() object to stream large static files before, but since the file resides on another server, I'm trying to accomplish the same using a file object provided by Paramiko's "open()" method.
I've tried setting up a consumer/producer system similar to that used in the render methods of twisted.web.static.File, plugging in the file pointer provided by Paramiko in the appropriate places, but only the smallest text files transfer successfully--all other cases cause Paramiko to throw this error:
socket.error: Socket is closed
The contents of the relevant python files are here:
serve-project.py: http://pastebin.com/YcjsQHu3
WrapSSH.py: http://pastebin.com/XaKXJwxb
In a nutshell, I'm trying to stream the data from a Paramiko SFTPFile to an HTTP client. I suspect that my approach is majorly faulty, due to my minimal familiarity with Twisted. Anyone have suggestions on a more intelligent way to accomplish this?
I want to download a file from SkyDrive programmatically using Python on Linux.
I can't use the API as it's a OneNote file and the API can't be used to download these.
My understanding is that SD supports WebDAV, and there are plenty of examples where people have mounted an SD folder using davfs2, but I just want to be able to grab a specific file without mounting.
I can use the API to get the document owner's cid, so I don't need to jump through any Windows-based hoops, but my efforts to download the file (probably lame, as I have not really researched WebDAV) always result in an error.
For example using easywebdav:
import easywebdav
webdav = easywebdav.connect("d.docs.live.net/mycid")
webdav.download('me/skydrive/Documents/Getting\ Started', '/tmp/foo')
# This gives the 302 error mentioned in the comments at the end of the 'jumping through windows hoops' link I posted above.
Is there any workaround for the redirection problem I've seen mentioned?
Or do I have this wrong, and is it sensible, or even essential, to mount a WebDAV share as a file system in order to access its files?
If you are downloading a specific file, and already know the exact path/URL to that file (as per your example), I'm not sure that you really need to worry about the DAV extensions. Have you tried downloading the file using a simple HTTP GET, through something like urllib2?
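A rough sketch of that plain GET in Python 3, where urllib.request replaces urllib2. The helper names are invented for illustration, and whether the server actually accepts an unauthenticated GET for this path is an assumption that has not been checked here. urllib follows 301/302 redirects on GET requests by default, and the path is percent-encoded rather than backslash-escaped.

```python
import shutil
import urllib.parse
import urllib.request


def build_url(base, path):
    """Join base and path, percent-encoding spaces and other unsafe characters."""
    return base.rstrip("/") + "/" + urllib.parse.quote(path.lstrip("/"))


def fetch(url, dest):
    """Download url to dest with a plain GET; urllib follows 301/302 redirects."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

Usage would be along the lines of fetch(build_url("https://d.docs.live.net/mycid", "me/skydrive/Documents/Getting Started"), "/tmp/foo"), with the https scheme being a guess.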
I do a URL fetch to get info from an online txt file. It's a big file (about 2 MB and counting) that gets modified all the time, automatically.
I'm using memcache from Google App Engine to keep the data for a while. But with each new request the incoming bandwidth usage increased, and I started to get Over Quota errors.
I need a way to make a partial download of this file, fetching only what has changed instead of the whole file.
Any ideas? :)
Only if you know what part of the file has been changed.
For example, if you know that the file is only appended to, then you could use a HTTP Range request to request only the end of the file.
If you have no way of knowing where the file has been changed, then it would work only if the server sent you a patch or delta to a previous version.
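For the append-only case, a Range request could be sketched like this. It uses plain urllib rather than App Engine's own fetch API, assumes the server supports byte ranges, and the function names are made up for illustration.

```python
import urllib.error
import urllib.request


def range_request(url, offset):
    """Build a GET that asks only for the bytes from offset to the end."""
    return urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})


def fetch_appended(url, cached):
    """Return the full current content, downloading only bytes past len(cached)."""
    try:
        with urllib.request.urlopen(range_request(url, len(cached))) as response:
            body = response.read()
            if response.status == 206:  # Partial Content: only the new tail
                return cached + body
            return body  # 200: the server ignored Range and sent the full file
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range Not Satisfiable: nothing was appended
            return cached
        raise
```

This only gives correct results if the file really is append-only; if earlier bytes can change, the cached prefix goes stale without any error.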
I have a client's python website which runs a dropbox-like feature that allows uploading of files.
I want to make sure that uploading files does not open up the server to vulnerabilities.
So I store all uploaded files as blobs in a Postgres database and do not trust the file name or extension supplied with the file; I let the application determine the file type for itself.
I ran into problems when trying to let the application decide the file format itself, so my question boils down to:
Is it necessary, for security, to limit what file formats are allowed to be uploaded?
If so, how can I best determine the file format, if not with something like libmagic?
Are there other measures I need to take in order to remain safe when allowing publicly uploaded files?
Thanks.
The referenced "bug" question (which chains to this) doesn't refer to a bug. It says that some MS Office file types are, like Java jars, packaged and compressed as zipfiles. If you rename a .xlsx file to .zip, you can view the contents - I found 13 .xml files and a .bin printer settings file in a simple example.
For security you can't "trust" the mime-type and file extension provided by the user, but you can in principle use them to validate that the contents are valid for the claimed file type. The first level of checking would ensure that the claimed Office files are in fact valid zipfiles; the second would check that the contents conform to what is expected by the Office application. Not being an Office developer, I don't know of a process to inspect a zip archive, determine which Office application it is for, and validate that the application can open it, but I'm sure it exists somewhere on MSDN.
More fundamentally, what do you mean by "security is important to my application"? Security prevents unwanted events - you need to define what you want to prevent. Do you want users to only be able to upload files for whitelisted applications? Should they be prevented from uploading blacklisted file types (like .exe)? Is it OK with you if a user uploaded 10MB of random bits and called it a .xyz file?
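As a rough illustration of the first two levels of checking, here is a sketch using only the standard zipfile module. It relies on the conventional part names that Office writes by default ([Content_Types].xml at the archive root, plus word/document.xml, xl/workbook.xml or ppt/presentation.xml); those names are an assumption about typical files, the function name is invented, and passing this check does not prove that the application can actually open the file.

```python
import io
import zipfile

# Conventional main-part names for the three common Office document kinds.
MAIN_PARTS = {
    "word/document.xml": "Word",
    "xl/workbook.xml": "Excel",
    "ppt/presentation.xml": "PowerPoint",
}


def sniff_office_type(data):
    """Return 'Word', 'Excel', 'PowerPoint', or None for uploaded bytes."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):  # level 1: is it a zip archive at all?
        return None
    try:
        with zipfile.ZipFile(buffer) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" not in names:  # level 2: Office package layout
        return None
    for part, app in MAIN_PARTS.items():
        if part in names:
            return app
    return None
```

The result can then be compared against the extension or mime-type the user claimed, rejecting the upload when they disagree.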
I'm looking for a way to sell someone a card at an event with a unique code printed on it, which they can use later to download a file (mp3, pdf, etc.) only one time. The true file location should be masked, so that a savvy person downloading the file won't be able to download it more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
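To make that idea concrete, here is a small sketch with an in-memory dict standing in for the database table. The alphabet, code length, download limit and function names are all arbitrary choices for illustration.

```python
import secrets

# Letters and digits that are hard to confuse when read off a printed card.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def generate_codes(count, length=10):
    """Return count distinct random codes, each mapped to a download counter of 0."""
    codes = {}
    while len(codes) < count:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        codes.setdefault(code, 0)  # duplicates are simply regenerated
    return codes


def redeem(codes, code, max_downloads=3):
    """Count one download attempt; return True if it is still within the limit."""
    if code not in codes or codes[code] >= max_downloads:
        return False
    codes[code] += 1
    return True
```

The secrets module is used rather than random so that codes cannot be predicted from earlier ones. In a real deployment the counter update would need to happen atomically in the database.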
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a Django URL for serving these files.
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
with open(path, 'rb') as f:
    response = HttpResponse(f.read())
response['Content-Type'] = 'application/octet-stream'
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular webservers such as lighttpd or nginx.
It works like this: when enabled, your application (e.g. a trivial PHP script) can send a special response header, causing the webserver to serve a static file.
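In Python the same idea might look like the following bare WSGI sketch. It assumes an Apache front end with mod_xsendfile enabled; the FILES mapping, the code value and the file path are hypothetical placeholders.

```python
from urllib.parse import parse_qs

# Hypothetical mapping of download codes to files stored outside the web root.
FILES = {"ABC123": "/srv/private/album.mp3"}


def app(environ, start_response):
    """WSGI app: check the code, then let the web server send the file."""
    code = parse_qs(environ.get("QUERY_STRING", "")).get("code", [""])[0]
    path = FILES.get(code)
    if path is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown code\n"]
    start_response("200 OK", [
        # nginx uses X-Accel-Redirect with an internal location URI instead.
        ("X-Sendfile", path),
        ("Content-Type", "application/octet-stream"),
        ("Content-Disposition", 'attachment; filename="download"'),
    ])
    return [b""]  # the front-end server replaces the body with the file
```

The client only ever sees the URL with the code, never the real path, and the application process does not have to stream the file itself.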
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.