I would like a user, without needing an Amazon account, to be able to upload multi-gigabyte files to an S3 bucket of mine.
How can I go about this? I want to enable a user to do this by giving them a key, or perhaps through an upload form, rather than making the bucket world-writable, obviously.
I'd prefer to use Python on my server side, but the idea is that a user would need nothing more than their web browser, or perhaps opening up their terminal and using built-in executables.
Any thoughts?
You are attempting to proxy the file through your Python backend to S3, and large files at that. Instead, you can configure S3 to accept files from the user directly (without proxying through your backend code).
It is explained here: Browser Uploads to S3 using HTML POST Forms. This way your server need not handle any upload load at all.
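As a rough illustration of what that approach looks like server-side, here is a minimal sketch using boto3's generate_presigned_post; the bucket name, key prefix, and size cap are placeholders you would adapt:

import boto3

s3 = boto3.client("s3")

# Build the URL and signed form fields the browser needs to POST the file directly to S3.
post = s3.generate_presigned_post(
    Bucket="my-upload-bucket",                                  # placeholder bucket name
    Key="uploads/${filename}",                                  # S3 substitutes the uploaded file's name
    Conditions=[["content-length-range", 0, 5 * 1024 ** 3]],    # a single POST tops out at 5 GB
    ExpiresIn=3600,                                             # the form stays valid for one hour
)

# post["url"] becomes the form's action; post["fields"] become hidden <input> elements.
print(post["url"], post["fields"])

The user only needs a browser to submit the resulting form; the file bytes go straight to S3 and never touch your server.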
If you also want your users to use an identity they already have elsewhere (Google/Facebook, etc.) for this workflow, that too is possible. They will be able to upload these files to a sub-folder (path) in your bucket without exposing other parts of your bucket. This is detailed here: Web Identity Federation with Mobile Applications. Though it says mobile, you can apply the same approach to web apps.
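A rough sketch of that exchange, assuming you have already created an IAM role that trusts the identity provider; the role ARN and token value here are placeholders:

import boto3

sts = boto3.client("sts")

# Token obtained from the provider's login flow (e.g. a Google ID token); placeholder here.
web_identity_token = "<ID token from the Google/Facebook login>"

# STS exchanges the provider's token for temporary AWS credentials scoped by the role's policy.
response = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/UploaderRole",  # placeholder role ARN
    RoleSessionName="user-upload-session",
    WebIdentityToken=web_identity_token,
    DurationSeconds=3600,
)
temporary_credentials = response["Credentials"]

The role's policy is what confines those temporary credentials to, say, s3:PutObject on a specific prefix in your bucket.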
Having said all that, as @Ratan points out, large file uploads can break partway through when attempted from a browser, and the browser can't retry only the failed parts. This is where the need for a dedicated app comes in. Another option is to ask your users to keep the files in their Dropbox/Box.com account and have your server read from there; those services already take care of large file uploads, with all the retries and so on, using their own apps.
This answer is relevant to .NET as the language.
We had such a requirement, and we created an executable for it. The executable internally called a web method, which validated whether the app was authorized to upload files to AWS S3 or not.
You can do this using a web browser too, but I would not suggest it if you are targeting big files.
I'm building a small website that involves users uploading images that will be displayed later. The images are stored in an S3 bucket.
Sometimes, I need to display a lot of these images at once, and I'm not sure how best to accommodate that, without allowing public access to S3.
Currently, when there's a request to the server, the server downloads the object from S3 and then returns the file to the client. This is understandably slow. I would love to just be able to return the S3 URL and have the client load from there (so the traffic doesn't have to pass through my server and I don't have to wait for the image to travel S3 -> server -> client), but I also don't want S3 bucket URLs that are just unsecured and that anyone can go to.
What is the best architecture to solve this? Is there a way of giving people very brief temporary permission to a bucket? Is it possible to scope that to a specific URL?
I looked around on stackoverflow and github for similar questions, but most of them seem to have to do with how the files are uploaded and not accessing them securely.
As suggested by @jarmod, you can pre-sign your objects' URLs.
In this case, whenever you need to share an image, you create a pre-signed URL for the object and share that URL.
Your server will only provide the URL. The user will access the image directly, without your server in the middle of the request.
The AWS site explains how to use pre-signed URLs:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html
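As a minimal sketch of what that looks like with boto3 (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Signed GET URL that stops working after 60 seconds.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-image-bucket", "Key": "users/42/avatar.png"},
    ExpiresIn=60,
)

# Return `url` to the client (or redirect to it); the browser then fetches the image
# straight from S3, bypassing your server.
print(url)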
I am working on a script that takes files from a gcp bucket and uploads them to another server.
Currently my script downloads all of the files from the gcp bucket into my local storage using blob.download_to_filename and then sends a POST request (using requests library) to upload those files to my server.
I know that it is possible to download the files as a string and then reconstruct the file. But what about files that are, for example, pictures? This bucket could fill up with any type of file format, and I need to make sure all files arrive on my server exactly as they look in GCP.
Is there some way to temporarily store a file so that I can send it from GCP to my server without having to download it to my computer?
I'm looking for a way to refer to the file in the POST request that will let me upload it to my server without it being in my local storage.
Thank you so much in advance!
You will need to write some code that:
Authenticates your request(s) to GCS
Downloads the objects (perhaps to memory; perhaps in chunks)
Optionally: authenticates your request to your destination
Uploads the objects (perhaps in chunks)
You tagged Python and you can do this using Google's Python library for GCS. See Streaming downloads and the recommendation to use ChunkedDownloads.
With ChunkedDownloads, you'd:
Authenticate using the library
Iterate over the GCS bucket's content
Download the objects (files) in chunks (you decide the chunk size) to memory
Preferably upload/stream the chunks to your destination (see the sketch after this list)
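A simplified sketch of that loop, using blob.open("rb") as a file-like streaming reader (a higher-level alternative to ChunkedDownload) and streaming the body into a requests POST; the bucket name, destination URL, and header are placeholders:

from google.cloud import storage
import requests

client = storage.Client()  # authenticates via GOOGLE_APPLICATION_CREDENTIALS

for blob in client.list_blobs("my-source-bucket"):         # placeholder bucket name
    # blob.open("rb") reads the object in chunks rather than loading it all into memory.
    with blob.open("rb") as reader:
        resp = requests.post(
            "https://example.com/upload",                  # placeholder destination
            data=reader,                                   # requests streams file-like bodies
            headers={"X-Filename": blob.name},             # hypothetical header your server expects
        )
        resp.raise_for_status()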
It's very likely that there are utilities that support migrating from GCS to your preferred destination.
I'm not familiar with any of them, and I encourage you to vet any option before proceeding, to ensure it doesn't steal your credentials.
Thanks for reading. I'm not asking for code snippets; an overall architectural explanation of how the below could be achieved, ideally with best practices, would be very much appreciated.
I have a situation where I allow users to upload files from their google drive that is later processed (essentially read) by a celery task. The current flow just authenticates the user through oauth2 and gets the file ID for the file. Then I just save a URL in the form of https://drive.google.com/uc?export=download&id=${file.id} into the database which is later read by the celery task and using the requests library, downloaded and read.
This works fine for unrestricted (publicly shared) files, but when the files are set to have some restrictions (shared only with a group, with specific users, etc.), it returns 403 Forbidden responses.
I'm currently following https://developers.google.com/drive/api/v3/manage-downloads guide and got a snippet working that downloads a restricted file. Essentially, it does a separate oauth2 flow initiated by the server and writes a token.pickle file that is later read to authenticate the file download request and voila, it succeeds in downloading.
The problem is tying the two flows together, specifically, how would / should I set this up so that the celery task is able to download the file sometime after oauth2 was completed?
I'm thinking I'd create the token.pickle file for a user somewhere that celery can read from and do the downloading. I'm not sure if there are any gotchas or security concerns with this.
Altogether, I'm using AWS, so I could put a token-for-user-123.pickle file in S3 from the app, save the bucket name to a DB record, and have the celery task query for that and read from that location? I suppose that would work, but I'm not sure of the security repercussions and even less sure of what would happen when the tokens expire, whenever that is. Files only process on a separate request, so it could happen seconds after the users authenticate and select a file or never.
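For concreteness, the celery-side read described above could look roughly like this, assuming the pickle was written by a standard google-auth-oauthlib flow; the bucket name, key layout, and function signature are hypothetical, and token refresh/expiry is not handled here:

import io
import pickle

import boto3
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

def download_drive_file(user_id, file_id):
    # Fetch the stored credentials for this user from S3 (hypothetical bucket/key layout).
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="my-token-bucket", Key=f"token-for-user-{user_id}.pickle")
    creds = pickle.loads(obj["Body"].read())

    # Build a Drive client with those credentials and stream the restricted file down.
    service = build("drive", "v3", credentials=creds)
    request = service.files().get_media(fileId=file_id)
    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return buffer.getvalue()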
One last thing, if there were a google URL I could hit with the tokens passed in in the form of something like https://drive.google.com/uc?export=download&id=${file.id}&token={token}&whatever-else={whatever-else} that can download restricted files, that would be amazing!
Thank you!
In a Django project of mine, users upload video files. Initially, I was uploading them directly to Azure Blob Storage (equivalent to storing it on Amazon S3). I.e. in models.py I had:
class Video(models.Model):
    video_file = models.FileField(upload_to=upload_path, storage=OverwriteStorage())
Where OverwriteStorage overrides Storage in django.core.files.storage, and essentially uploads the file onto Azure.
Now I need to upload this file to a separate Linux server (not the same one that serves my Django web application). In this separate server, I'll perform some operations on the video file (compression, format change), and then I'll upload it to Azure Storage like before.
My question is: given my goal, how do I change the way I'm uploading the file in models.py? An illustrative example would be nice. I'm thinking I'll need to change FileField.upload_to, but all the examples I've seen indicate it's only to define a local filesystem path. Moreover, I don't want to let the user upload the content normally and then run a process to upload the file to another server. Doing it directly is my preference. Any ideas?
I've solved a similar issue with Amazon's S3, but the concept should be the same.
First, I use django-storages and, by default, upload my media files to S3 (django-storages also supports Azure). Then my team set up an NFS share mount on our Django web servers from the destination server we occasionally need to write user uploads to. Then we simply override django-storages by pointing "upload_to" at the local path that is a mount from the other server.
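A minimal sketch of that override, assuming the NFS share is mounted at a placeholder path on the Django web server:

from django.core.files.storage import FileSystemStorage
from django.db import models

# Placeholder: wherever the other server's export is mounted on this web server.
nfs_storage = FileSystemStorage(location="/mnt/processing-server/uploads")

class Video(models.Model):
    # Files land on the NFS mount, i.e. directly on the processing server,
    # instead of going to Azure/S3 through django-storages.
    video_file = models.FileField(upload_to="videos/", storage=nfs_storage)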
This answer has a quick example of how to set up an NFS share from one server on another: https://superuser.com/questions/300662/how-to-mount-a-folder-from-a-linux-machine-on-another-linux-machine
There are a few ways to skin the cat, but this one seemed easiest to our team. Good luck!
I am trying to serve files securely (images in this case) to my users. I would like to do this using Flask and preferably Amazon S3; however, I would be open to another cloud storage solution if required.
I have managed to get my Flask static files, like CSS and such, onto S3; however, this is all non-secure, so everyone who has the link can open the static files. This is obviously not what I want for secure content. I can't seem to figure out how I can make a file available to just the authenticated user that 'owns' the file.
For example: when I log into my Dropbox account and copy a random file's download link, then go over to another computer and use this link, it will deny me access, even though I am still logged in and the download link is available to the user on the latter PC.
Make the request to your Flask application, which will authenticate the user and then issue a redirect to the S3 object. The trick is that the redirect should be to a signed temporary URL that expires in a minute or so, so it can't be saved and used later or by others.
You can use the boto.s3.key.Key.generate_url function in your Flask app to create the temporary URL.
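A minimal Flask sketch of that flow, using the legacy boto package named above (boto3's generate_presigned_url is the modern equivalent); the route, bucket/key layout, and ownership check are hypothetical:

import boto
from flask import Flask, abort, redirect

app = Flask(__name__)
s3_conn = boto.connect_s3()  # reads credentials from the environment/boto config

def current_user_owns(image_id):
    # Hypothetical placeholder: check the logged-in user against your own records.
    return True

@app.route("/images/<image_id>")
def serve_image(image_id):
    if not current_user_owns(image_id):
        abort(403)
    bucket = s3_conn.get_bucket("my-secure-bucket")          # placeholder bucket
    key = bucket.get_key("images/{}.png".format(image_id))   # placeholder key layout
    if key is None:
        abort(404)
    # Redirect to a signed URL that expires in 60 seconds, so it can't be saved and reused.
    return redirect(key.generate_url(expires_in=60))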