So I found FileToGoogleCloudStorageOperator, which helps in moving files from my local system to Google Cloud Storage. But is there a similar Airflow operator to move an entire directory to Google Cloud Storage?
Not an official one, but it'd be pretty easy to create one. You can reuse most of the logic from https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/file_to_gcs.py
You can use the same GoogleCloudStorageHook it uses to upload a single file, and just iterate over the directory, uploading all the files. This is what any directory-upload function for GCS would do anyway.
Depending on the number of files you routinely need to upload, you might be better off breaking the upload into multiple tasks. That way, should one upload task fail, you don't have to restart the upload for all files. It depends on your use case though.
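A minimal sketch of the iteration such a custom operator's execute method would perform. The helper name `iter_upload_pairs` is made up for illustration, and the `hook.upload(bucket, object, filename)` call shown in the comment is hedged on the contrib GoogleCloudStorageHook signature linked above:

```python
import os

def iter_upload_pairs(local_dir, gcs_prefix=""):
    """Yield (local_path, destination_object) pairs for every file
    under local_dir, preserving the relative directory layout."""
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            dest = "/".join(p for p in (gcs_prefix, rel.replace(os.sep, "/")) if p)
            yield local_path, dest

# Inside a custom operator's execute(), you would then loop, e.g.:
#   hook = GoogleCloudStorageHook(google_cloud_storage_conn_id=...)
#   for local_path, dest in iter_upload_pairs(self.src_dir, self.dst_prefix):
#       hook.upload(self.bucket, dest, local_path)
```

If you go the multiple-tasks route instead, each pair (or each subdirectory) could become its own task so a failure only retries that slice.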
I want to reorganize a series of linked Google Drives. Currently, each drive contains files corresponding to a letter (A-Z), and I want to reorganize them so the files are organized by year instead. There is a massive amount of data in each drive, so it would take a lot of time to share and then copy files from one drive to another, and there are also many different file types. I've looked at some cloud transfer solutions, but if anyone knows whether this is feasible with the Drive API, please let me know. I've looked at the documentation, but I'm not sure how to apply it to a transfer this large.
Use the Google Drive API, specifically the files.list method via the Google API Python client library, to list the files page by page and store the results in a file or database. Then sort them by year with the help of Python or the database.
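Once you have paged through files().list and collected the metadata dicts, the sorting step is simple. This sketch assumes each dict carries an RFC 3339 timestamp field such as modifiedTime (which the API returns when you request it); `group_files_by_year` is a hypothetical helper name:

```python
from collections import defaultdict

def group_files_by_year(files, time_field="modifiedTime"):
    """Bucket Drive file metadata dicts by the year in their timestamp.
    Drive timestamps are RFC 3339 strings like '2019-04-01T12:30:00.000Z',
    so the year is simply the first four characters."""
    by_year = defaultdict(list)
    for f in files:
        by_year[f[time_field][:4]].append(f)
    return dict(by_year)
```

Each resulting bucket then maps directly to one target folder (e.g. "2019") in the reorganized layout.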
To copy files between accounts, try rclone. It supports server-side copy (--drive-server-side-across-configs), which means files don't have to be downloaded and re-uploaded locally; instead they are copied on the Google Drive side. This should be significantly faster.
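A sketch of the invocation, assuming two remotes named src-drive and dst-drive have already been set up with rclone config (the remote names and paths here are placeholders):

```shell
# Copy one letter-folder into a year-folder on another drive without
# downloading; the copy happens entirely on Google's side.
rclone copy src-drive:Letters/A dst-drive:2019 \
    --drive-server-side-across-configs \
    --progress
```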
So I am working on a Flask application which is pretty much a property manager that involves allowing users to upload images of their properties. I am new to Flask and have never had to deal with images before. From a lot of Googling I understand that there are various ways to manage static files like images.
One way is to allow users to upload images directly to the file system, and then display them by retrieving the file location in the static folder, using something like:
<img src="static/images/filename.jpg">
However, is this really an efficient way, since it means generating and storing the location of each image in the database? Especially when it comes to deploying the application. Another way I discovered was base64-encoding images and storing them directly in the database, which doesn't sound very efficient either.
Another way, which I think might be the best to go about this, is to use an AWS S3 bucket. The user would then be able to upload an image directly to that bucket and be assigned a URL to that image. This URL is stored in the database and can then be used to display the image similarly to the file system method. Is my understanding of this correct? Is there a better way to go about this? And is there something similar to django-storages that can be used to connect Flask to S3?
Any input or pointing me in the right direction would be much appreciated. Thank you!
If you want to store the images on the web server, then the best approach is to put nginx in front of Flask as a reverse proxy and let nginx serve the static folder for all the images.
Nginx is pretty much enough for a small website. Don't try to serve the files through Flask; it is too slow.
If you want to store the images in S3, then you just need to store each image's object name (key) in the database. You can tell Flask to use the S3 bucket as the static folder. You can use the boto3 library in Python to access S3.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
If you are concerned about exposing the S3 bucket to users, you can use a CloudFront distribution. It is cheaper to serve from and also hides your bucket.
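Putting the S3 route together, a sketch assuming boto3 is configured with credentials. The function names `make_object_key` and `upload_property_image` are made up for illustration; the point is that only the object key needs to live in the database:

```python
import mimetypes
import uuid

def make_object_key(filename, prefix="property-images"):
    """Build a collision-free S3 key, keeping the original extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    unique = uuid.uuid4().hex
    return f"{prefix}/{unique}.{ext}" if ext else f"{prefix}/{unique}"

def upload_property_image(s3_client, file_obj, bucket, filename):
    """Upload a file-like object (e.g. Flask's request.files['photo'])
    to S3 and return the key to store in the database.
    s3_client would be boto3.client('s3')."""
    key = make_object_key(filename)
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    s3_client.upload_fileobj(file_obj, bucket, key,
                             ExtraArgs={"ContentType": content_type})
    return key
```

Generating a fresh key avoids both filename collisions and the security issues of trusting user-supplied names.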
How do I transfer files from one cloud storage service to another? The files are CSV.
Where is the best place to start in relation to this problem?
For the time being the script just needs to transfer the files every week via manual execution. Eventually the files will be transferred on a schedule.
You can start by searching for these sites' APIs. For example, Dropbox has a very well documented API for Python.
If you want to automate your script every X days/hours/etc., you can make use of cron if you are running a Unix-based system.
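For the weekly schedule mentioned above, a crontab entry would look like this once the script exists (the script path, log path, and time are placeholders):

```shell
# m h dom mon dow  command
# Run the transfer script every Monday at 03:00.
0 3 * * 1  /usr/bin/python3 /home/me/transfer_csv.py >> /home/me/transfer.log 2>&1
```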
Hope that helped.
I have a directory of images that I'd like to transfer to Google drive via a python script.
What's a good way to upload (recursively) a directory of images to Google drive while preserving the original directory structure? Would there be any benefit to making this multithreaded? And if so, how would that work?
I would do this in two passes. Start by scanning the folder hierarchy, then recreate the folders on Drive, updating your in-memory folder model with the Drive folder ids. Then scan your files, uploading each one with the appropriate parent id.
Only make it multithreaded if each thread will have a unique client id. Otherwise you will end up triggering the rate limit bug in Drive. If you have a large number of files, buy the boxed set of Game Of Thrones.
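Pass 1 of the two-pass approach can be sketched like this. The `create_folder` callback is supplied by the caller; in real use it would wrap `service.files().create(...)` with the folder MIME type (`application/vnd.google-apps.folder`) and return the new folder's id, but that wiring is assumed here rather than shown:

```python
import os

def plan_drive_mirror(local_dir, create_folder, root_id="root"):
    """Recreate local_dir's folder tree on Drive (pass 1).
    Returns a dict mapping each local folder path to its Drive folder id.
    os.walk is top-down, so a parent's id always exists before its
    children are created."""
    folder_ids = {local_dir: root_id}
    for root, dirs, _files in os.walk(local_dir):
        for d in sorted(dirs):
            path = os.path.join(root, d)
            folder_ids[path] = create_folder(d, folder_ids[root])
    return folder_ids
```

Pass 2 then walks the files and uploads each with `parents=[folder_ids[os.path.dirname(path)]]`, preserving the original structure.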
I would like for a user, without having to have an Amazon account, to be able to upload multi-gigabyte files to an S3 bucket of mine.
How can I go about this? I want to enable a user to do this by giving them a key, or perhaps through an upload form, rather than making the bucket world-writable, obviously.
I'd prefer to use Python on my serverside, but the idea is that a user would need nothing more than their web browser or perhaps opening up their terminal and using built-in executables.
Any thoughts?
You are attempting to proxy the files through your Python backend to S3, and large files at that. Instead you can configure S3 to accept files from users directly (without proxying through your backend code).
It is explained here: Browser Uploads to S3 using HTML POST Forms. This way your server need not handle any upload load at all.
If you also want your users to use their external identity (Google/Facebook etc.) for this workflow, that is possible too. They will be able to upload files to a sub-folder (path) in your bucket without exposing other parts of it. This is detailed here: Web Identity Federation with Mobile Applications. Though it says mobile, the same applies to web apps.
Having said all that, as #Ratan points out, large file uploads can break midway when attempted from a browser, and the browser can't retry only the failed parts. This is where the need for a dedicated app comes in. Another option is to ask your users to keep the files in their Dropbox/Box.com account and have your server read from there; these services already handle large file uploads with retries via their apps.
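The HTML POST form approach boils down to generating a presigned POST policy server-side and embedding the returned fields in the form. A sketch assuming boto3, where `s3_client` would be `boto3.client('s3')` and the function name is made up for illustration:

```python
def make_browser_upload_form(s3_client, bucket, key_prefix,
                             max_bytes=5 * 1024 ** 3, expires=3600):
    """Return the presigned POST data a browser form needs to upload
    straight to S3, restricted to keys under key_prefix and to a
    maximum size. The result dict contains 'url' and 'fields' to
    embed as the form action and hidden inputs."""
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key_prefix + "${filename}",  # S3 substitutes the browser's filename
        Conditions=[
            ["starts-with", "$key", key_prefix],
            ["content-length-range", 1, max_bytes],
        ],
        ExpiresIn=expires,
    )
```

Per-user prefixes (e.g. from the federated identity) keep each user confined to their own sub-folder of the bucket.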
This answer uses .NET as the language.
We had such a requirement, for which we created an executable. The executable internally called a web method, which validated whether the app was authorized to upload files to AWS S3 or not.
You can do this from a web browser too, but I would not suggest it if you are targeting big files.