I have a Python script that accepts a file from the user and saves it.
Is it possible not to upload the file immediately, but to queue it up and upload it later, when the server is under less load?
Could this be done by moving the file into the browser's storage area, or by reading it from the user's hard drive into their RAM?
There is no reliable way to do what you're asking, because fundamentally your server has no control over the user's browser, computer, or internet connection. If you don't care about reliability, you could write some JavaScript to trigger the upload at a scheduled time, but it simply wouldn't work if the user closed their browser, navigated away from your page, turned off their computer, walked out of Wi-Fi range, and so on.
If your website really is so heavily loaded that it buckles when lots of users upload files at once, it might be time to profile your code, add more servers, or use a separate upload server that accepts files and schedules their transfer to your main server later.
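One way to realize that "accept now, process later" idea on the server side is to make the upload endpoint do nothing but write the bytes to disk and enqueue the expensive work for a background worker. A minimal sketch, assuming Flask and Celery with a Redis broker (none of which are mentioned in the original question, so treat the names as placeholders):

```python
# Hypothetical sketch: accept the upload immediately, defer the heavy work.
# Assumes Flask and Celery with a Redis broker; adapt to your own stack.
import os
import uuid

from celery import Celery
from flask import Flask, request

app = Flask(__name__)
celery = Celery(__name__, broker="redis://localhost:6379/0")

UPLOAD_DIR = "/tmp/uploads"  # placeholder path
os.makedirs(UPLOAD_DIR, exist_ok=True)


@celery.task
def process_file(path):
    # Expensive work (scanning, transcoding, copying to the main server)
    # happens here, whenever a worker picks the task up.
    ...


@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["file"]
    dest = os.path.join(UPLOAD_DIR, f"{uuid.uuid4()}-{f.filename}")
    f.save(dest)              # cheap: just write the bytes to disk
    process_file.delay(dest)  # defer the heavy part to a worker queue
    return "queued", 202
```

The request itself stays fast; load smoothing then becomes a matter of how many workers you run and when.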
Thanks for reading. I'm not asking for code snippets; an overall architectural explanation of how the below could be achieved, ideally following best practices, would be very much appreciated.
I have a situation where I allow users to upload files from their Google Drive, which are later processed (essentially read) by a Celery task. The current flow just authenticates the user through OAuth2 and gets the file ID. I then save a URL of the form https://drive.google.com/uc?export=download&id=${file.id} in the database; the Celery task later reads it and downloads the file using the requests library.
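For context, that public-file flow presumably reduces to something like this sketch (the broker URL and the final "read" step are placeholders):

```python
# Sketch of the current flow for publicly shared files: the Celery task
# fetches the stored export URL with `requests`. Restricted files fail
# here with a 403 because the request carries no Google credentials.
import requests
from celery import Celery

celery_app = Celery(__name__, broker="redis://localhost:6379/0")


@celery_app.task
def process_drive_file(download_url):
    resp = requests.get(download_url, timeout=60)
    resp.raise_for_status()   # this is where restricted files raise 403
    return len(resp.content)  # placeholder for the real "read" step
```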
This works fine for unrestricted (publicly shared) files, but when a file has restrictions (shared only with a group, with specific users, etc.), the request returns a 403 Forbidden.
I'm currently following the https://developers.google.com/drive/api/v3/manage-downloads guide and have a snippet working that downloads a restricted file. Essentially, it runs a separate OAuth2 flow initiated by the server and writes a token.pickle file that is later read to authenticate the download request, and voilà, the download succeeds.
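The authorized download in that guide has roughly this shape (a sketch; the credentials are assumed to be the ones pickled by the server-side OAuth2 flow):

```python
# Rough shape of the authorized download from the manage-downloads guide.
# The credentials are assumed to come from the token.pickle written by
# the separate server-side OAuth2 flow.
import io
import pickle

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload


def download_restricted_file(file_id, token_path="token.pickle"):
    with open(token_path, "rb") as fh:
        creds = pickle.load(fh)

    service = build("drive", "v3", credentials=creds)
    request = service.files().get_media(fileId=file_id)

    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, request)
    done = False
    while not done:
        _status, done = downloader.next_chunk()
    return buffer.getvalue()
```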
The problem is tying the two flows together: specifically, how would/should I set this up so that the Celery task is able to download the file some time after the OAuth2 flow has completed?
I'm thinking I'd create the token.pickle file for each user somewhere that Celery can read from and do the downloading there, but I'm not sure whether there are any gotchas or security concerns with this.
Altogether, I'm using AWS, so I could have the app put a token-for-user-123.pickle file in S3, save the bucket name to a DB record, and have the Celery task look that record up and read the token from that location. I suppose that would work, but I'm not sure of the security repercussions, and even less sure of what happens when the tokens expire, whenever that is. Files are only processed on a separate request, so that could happen seconds after the user authenticates and selects a file, or never.
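A sketch of what the Celery side of that proposal could look like: the task pulls the per-user credentials out of S3, refreshes them if they have expired (google-auth credentials carry a refresh token for exactly this, provided the original authorization requested offline access), and then calls the Drive API. Bucket and key names here are hypothetical:

```python
# Hypothetical sketch of the Celery side of the proposed setup.
# Bucket/key names are placeholders; the pickled object is assumed to be a
# google.oauth2.credentials.Credentials written by the OAuth2 flow.
import io
import pickle

import boto3
from celery import shared_task
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

TOKEN_BUCKET = "my-app-drive-tokens"  # placeholder bucket name


@shared_task
def download_user_file(user_id, file_id):
    s3 = boto3.client("s3")
    key = f"token-for-user-{user_id}.pickle"
    creds = pickle.loads(s3.get_object(Bucket=TOKEN_BUCKET, Key=key)["Body"].read())

    # Refresh the access token if it has expired; this only works if the
    # original authorization requested offline access (i.e. a refresh token).
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())
        s3.put_object(Bucket=TOKEN_BUCKET, Key=key, Body=pickle.dumps(creds))

    service = build("drive", "v3", credentials=creds)
    request = service.files().get_media(fileId=file_id)
    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, request)
    done = False
    while not done:
        _status, done = downloader.next_chunk()
    return buffer.getvalue()
```

Since those pickles grant access to a user's Drive, restricting the bucket with IAM and enabling server-side encryption would be a sensible precaution regardless of where the tokens live.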
One last thing: if there were a Google URL I could hit with the tokens passed in, something like https://drive.google.com/uc?export=download&id=${file.id}&token={token}&whatever-else={whatever-else}, that could download restricted files, that would be amazing!
Thank you!
I'm designing a website that will display data interactively for users. The data comes from various monitoring computers and will be loaded into a database on the same host as the website, which then displays it. What's the best (most efficient / most secure / most logical) way to design this? My stack is Python (Flask) and MySQL, hosted on AWS. I can think of three possibilities:
HTTP POST request. The monitoring computer runs a script that posts the data to a hidden endpoint served by the host, submitting it through a form along with a password used for verification. After the host receives the data, it loads it into the database. My only concern here is that someone could find this endpoint and DDoS it, since a CAPTCHA is out of the question. (A minimal sketch of this option appears after the list.)
File transfer: the monitoring computer periodically transfers files to the website host, which then loads the new data into the database. This seems like the most straightforward method, but it feels insecure to me. Are my fears founded?
Direct database access (though I'm almost certain this one is out of the question): the monitoring computer writes directly to the MySQL database on the host. I can't imagine that exposing the database publicly is secure at all.
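A minimal sketch of the first option, assuming Flask and a pre-shared secret sent in a header (endpoint, header, and helper names are illustrative, not from the original post):

```python
# Minimal sketch of option 1: a Flask endpoint that only accepts data
# accompanied by a pre-shared secret. Names and payload layout are
# illustrative.
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)
SHARED_SECRET = os.environ["MONITOR_SECRET"]  # never hard-code the secret


@app.route("/ingest", methods=["POST"])
def ingest():
    provided = request.headers.get("X-Monitor-Secret", "")
    if not hmac.compare_digest(provided, SHARED_SECRET):
        abort(403)  # reject anyone who stumbles on the endpoint

    payload = request.get_json(force=True)
    save_reading(payload)  # e.g. a parameterized INSERT into MySQL
    return "", 204


def save_reading(payload):
    # Placeholder: write the reading into the monitoring table.
    ...
```

Rate limiting, or restricting the endpoint to the monitoring machines' addresses via AWS security group rules, addresses the DDoS concern more directly than a CAPTCHA would.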
I have a Django app that allows users to upload videos. It's hosted on Heroku, and the uploaded files are stored in an S3 bucket.
I am using JavaScript to upload the files directly to S3 after obtaining a presigned request from the Django app, because of Heroku's 30-second request timeout.
Is there any way I can upload large files through the Django backend without using JavaScript and without compromising the user experience?
Consider the points below when thinking about a solution.
Why your files should not go through your Django server on the way to S3: sending files to the Django server and then on to S3 wastes both computational power and bandwidth, and there's simply no reason to route files through Django when you can send them directly to your S3 storage.
How you can upload files to S3 without compromising UX: since sending files through the Django server isn't an option, you have to handle the upload on the frontend. The frontend has its own limitations, though, such as limited memory: it won't cope with very large files if everything gets loaded into RAM, because the browser will eventually run out of memory. I would suggest something like dropzone.js. It won't solve the memory problem, but it can provide a good UX, with progress bars, file counts, and so on.
The points in the other answer are valid. The short answer to "Is there any way I can upload large files through the Django backend without using JavaScript?" is "not without switching away from Heroku".
Keep in mind that any data transmitted to your dynos goes through Heroku's routing mesh, which is what enforces the 30 second request limit to conserve its own finite resources. Long-running transactions of any kind use up bandwidth/compute/etc that could be used to serve other requests, so Heroku applies the limit to help keep things moving across the thousands of dynos. When uploading a file, you will first be constrained by client bandwidth to your server. Then, you will be constrained by the bandwidth between your dynos and S3, on top of any processing your dyno actually does.
The larger the file, the more likely it will be that transmitting the data will exceed the 30 second timeout, particularly in step 1 for clients on unreliable networks. Creating a direct path from client to S3 is a reasonable compromise.
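For completeness, the server-side half of that direct path is usually just a presigned request generated by a Django view; the browser then sends the bytes straight to S3 and only the small signing request ever touches a dyno. A sketch with boto3 (bucket name and key prefix are placeholders):

```python
# Sketch: a Django view that hands the browser a presigned POST so the
# file bytes go straight from the client to S3, bypassing the dyno.
# Bucket name and key prefix are placeholders.
import uuid

import boto3
from django.http import JsonResponse

BUCKET = "my-video-uploads"  # placeholder


def presign_upload(request):
    key = f"uploads/{uuid.uuid4()}/{request.GET.get('filename', 'video')}"
    s3 = boto3.client("s3")
    presigned = s3.generate_presigned_post(
        Bucket=BUCKET,
        Key=key,
        ExpiresIn=3600,  # the browser must start the upload within an hour
    )
    # The client POSTs the file to presigned["url"] with presigned["fields"].
    return JsonResponse(presigned)
```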
I'm working on a small Python script (Raspberry Pi + Linux) that takes a filename as a script argument and uploads that file to Google Drive.
To upload the file to Google Drive, I'm following this tutorial:
https://developers.google.com/drive/web/quickstart/quickstart-python
The script basically works well, but it requires manual authorization of the request, EACH time. That makes it unusable as an automated background task.
What I want is to authorize my application only once; from then on, all file uploads should go through without security prompts.
How can I achieve this?
You want to follow server-side auth. Basically, you store the refresh token that you receive the first time the user authorizes you, and you can use it to obtain new access tokens without prompting the user again.
See https://developers.google.com/drive/web/auth/web-server
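A sketch of what that looks like with the current google-auth library (the older client library used in that quickstart has since been deprecated); the client ID/secret and the stored refresh token are assumed to come from your own configuration:

```python
# Sketch: reuse a stored refresh token so uploads never prompt again.
# Client ID/secret and the refresh token are assumed to be in your config.
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

creds = Credentials(
    token=None,  # no access token yet; it is fetched on refresh
    refresh_token="STORED_REFRESH_TOKEN",
    token_uri="https://oauth2.googleapis.com/token",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
creds.refresh(Request())  # exchanges the refresh token for an access token

drive = build("drive", "v3", credentials=creds)
media = MediaFileUpload("/path/to/file.txt", resumable=True)
drive.files().create(body={"name": "file.txt"}, media_body=media).execute()
```

The refresh token is only issued when the user grants offline access during the initial authorization, so make sure the first consent flow requests it.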
I'm building a web service using web.py, with S3 to store files. One part of my web service has a 'commit' method (similar to that of SVN/Git) that takes a list of files in a user's local folder and compares them to the files that have already been uploaded (by md5-hashing the local files and keeping a database record of the hashes of the remote files). The client then zips the files that have been added or modified since the last 'commit' and uploads the zip to the server. The server unzips it, puts the files where they belong, and updates the database with the new hashes.
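For reference, the client-side half of that comparison can be as simple as hashing each local file and diffing against the recorded remote hashes. A sketch; the remote-hash mapping is assumed to come from your database:

```python
# Sketch of the client-side diff: hash local files and compare against the
# hashes recorded at the last commit. `remote_hashes` is assumed to be a
# {relative_path: md5_hex} mapping fetched from the server's database.
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)  # hash in chunks to keep memory flat
    return digest.hexdigest()


def changed_files(local_dir, remote_hashes):
    changed = []
    for root, _dirs, names in os.walk(local_dir):
        for name in names:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            if remote_hashes.get(rel) != md5_of(path):
                changed.append(rel)  # new or modified since the last commit
    return changed
```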
However, I want this to be a scalable web service and handling massive zip archives will be a problem. I've thought of two possible approaches:
Deal with the files one at a time: the server requests each file after the previous one has finished uploading.
Find a way of bypassing my server and letting the user upload straight to S3, while still authenticating the user on my server.
I'm very new to web services, so any advice would be appreciated. Needless to say, I'd like these transactions to be as cheap as possible in terms of memory and processing time.