Restructure connected Google Drives - Python

I want to reorganize a series of linked Google Drives. Currently, each drive contains files corresponding to a letter (A-Z), and I want to reorganize them so the files are organized by year instead. There is a massive amount of data in each drive, so it would take a lot of time to share and then copy files from one drive to another, and there are also many different file types. I've looked at some cloud transfer solutions, but if anyone knows whether this is feasible with the Drive API, please let me know. I've looked at the documentation, but I'm not sure how to apply it to a transfer this large.

Use the Google Drive API, specifically the files.list method via the Google API Python client library, to list the files page by page and store the results in a file or database. Then sort them accordingly with the help of Python or the database.
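A minimal sketch of that listing step, assuming you already have authorized credentials (the scope, field selection, and year grouping below are illustrative):

    from googleapiclient.discovery import build

    # Assumes `creds` is an already-authorized credentials object with a
    # Drive scope such as https://www.googleapis.com/auth/drive.readonly.
    service = build("drive", "v3", credentials=creds)

    files = []
    page_token = None
    while True:
        response = service.files().list(
            pageSize=1000,
            fields="nextPageToken, files(id, name, mimeType, createdTime)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            break

    # Group by year (createdTime is used here; modifiedTime is another option).
    by_year = {}
    for f in files:
        by_year.setdefault(f["createdTime"][:4], []).append(f)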
To copy files between accounts, try rclone. It supports server-side copy (--drive-server-side-across-configs), which means the file doesn't have to be downloaded and re-uploaded locally and is instead copied on the Google Drive side. This should be significantly faster.

Related

Google Drive: Upload and get a link through Python

I have a small startup that is growing a little now and I'm trying to optimize some processes.
Every day I manually upload more than 100 PDFs to Google Drive, and after that I create a shareable link for each one, one by one.
Is it possible to do this through Python? I tried to find some information, and there is a lot about uploading but not about getting a shareable link.
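A rough sketch of how this might look with the google-api-python-client, assuming authorized credentials (the "anyone with the link can view" permission and the file path are illustrative):

    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    # Assumes `creds` is an authorized credentials object with the
    # https://www.googleapis.com/auth/drive.file scope.
    service = build("drive", "v3", credentials=creds)

    def upload_and_share(path):
        # Upload the PDF and ask for its browser link in the response.
        media = MediaFileUpload(path, mimetype="application/pdf")
        created = service.files().create(
            body={"name": path.rsplit("/", 1)[-1]},
            media_body=media,
            fields="id, webViewLink",
        ).execute()

        # Make the file viewable by anyone who has the link.
        service.permissions().create(
            fileId=created["id"],
            body={"type": "anyone", "role": "reader"},
        ).execute()

        return created["webViewLink"]

    print(upload_and_share("report.pdf"))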

How do I work with large data in a web application?

I am wrapping up a personal project that involved using Flask, Python, and PythonAnywhere. I learned a lot and now I have some new ideas for personal projects.
My next project involves taking video files and converting them into other file types, for example JPGs. When I drafted up how my system could work, I quickly realized that the platform I am currently using for web application hosting, PythonAnywhere, will be too expensive and perhaps too slow, since I will be working with large files.
I searched around and found AWS S3 for file storage, but I am having trouble finding out how I can operate on that data to do my conversions in Python. I definitely don't want to download from S3, operate on the data in PythonAnywhere, and then re-upload the converted files to a bucket. The project will be available for use on the internet, so I am trying to make it as robust and scalable as possible.
I found it hard to even word this question, as I am not sure I am asking the right questions. I guess I am looking for a way to manipulate large data files, preferably in Python, without having to work with the data locally, if that makes sense.
I am open to learning new technologies and am looking for some direction on how I might achieve this personal project.
Have you looked into AWS Elastic Transcoder?
Amazon Elastic Transcoder lets you convert media files that you have stored in Amazon Simple Storage Service (Amazon S3) into media files in the formats required by consumer playback devices. For example, you can convert large, high-quality digital media files into formats that users can play back on mobile devices, tablets, web browsers, and connected televisions.
Like all things AWS, there are SDKs (e.g. Python SDK) that allow you to programmatically access the service.
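A hedged sketch of submitting a transcoding job with boto3, assuming the S3 buckets, an Elastic Transcoder pipeline, and a preset already exist (the IDs and object keys below are placeholders):

    import boto3

    # Assumes a pipeline (wired to your input/output S3 buckets) and a preset
    # already exist; the IDs and object keys below are placeholders.
    transcoder = boto3.client("elastictranscoder", region_name="us-east-1")

    job = transcoder.create_job(
        PipelineId="1111111111111-abcd11",
        Input={"Key": "uploads/source-video.mp4"},
        Outputs=[{
            "Key": "converted/output.mp4",
            "PresetId": "1351620000001-000010",  # replace with your preset ID
        }],
    )
    print(job["Job"]["Id"], job["Job"]["Status"])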

Alternative to FileToGoogleCloudStorageOperator

So I found FileToGoogleCloudStorageOperator, which helps in moving files from my local system to Google Cloud Storage. But is there a similar Airflow operator to move an entire directory to Google Cloud Storage?
Not an official one, but it'd be pretty easy to create one; you can reuse most of the logic from https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/file_to_gcs.py
You can use the same GoogleCloudStorageHook it uses to upload a single file and just iterate over the directory, uploading all the files. This is what any directory-upload function for GCS would do anyway.
Depending on the number of files you routinely need to upload, you might be better off breaking the upload into multiple tasks. That way, should one upload task fail, you don't have to restart the upload for all files. It depends on your use case though.
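A rough sketch of such an operator against the Airflow 1.x contrib layout referenced above (the class and argument names are illustrative, and the hook's upload signature is assumed to match file_to_gcs.py; import paths differ in newer Airflow versions):

    import os

    from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class DirectoryToGoogleCloudStorageOperator(BaseOperator):
        """Uploads every file in a local directory to a GCS bucket."""

        @apply_defaults
        def __init__(self, src_dir, dst_prefix, bucket,
                     google_cloud_storage_conn_id="google_cloud_default",
                     *args, **kwargs):
            super(DirectoryToGoogleCloudStorageOperator, self).__init__(*args, **kwargs)
            self.src_dir = src_dir
            self.dst_prefix = dst_prefix
            self.bucket = bucket
            self.google_cloud_storage_conn_id = google_cloud_storage_conn_id

        def execute(self, context):
            hook = GoogleCloudStorageHook(
                google_cloud_storage_conn_id=self.google_cloud_storage_conn_id)
            for name in os.listdir(self.src_dir):
                path = os.path.join(self.src_dir, name)
                if os.path.isfile(path):
                    # Same call file_to_gcs.py makes for a single file.
                    hook.upload(self.bucket, os.path.join(self.dst_prefix, name), path)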

How to set up a push notification listener/receiver in Python to receive file changes

I have been trying to use Google Drive's REST API to receive file changes as push notifications, but I have no idea where to start. As I am new to programming altogether, I have been unable to find any solutions.
I am using Python to develop my code, and the script I am writing is meant to monitor any changes in a given spreadsheet and run some operations on the modified spreadsheet data.
Considering I was able to set up the Sheets and Drive (read-only) APIs properly, I am confident that, given some direction, I would be able to set up this notification receiver/listener as well.
Here is the Google Drive API feature page.
Just follow the guide in Detect Changes:
For Google Drive apps that need to keep track of changes to files, the Changes collection provides an efficient way to detect changes to all files, including those that have been shared with a user. The collection works by providing the current state of each file, if and only if the file has changed since a given point in time.
Retrieving changes requires a pageToken to indicate a point in time to fetch changes from.
There's a GitHub code demo that you can test and base your project on.
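A minimal polling sketch of that Changes collection, assuming authorized credentials; true push notifications would additionally require registering a publicly reachable HTTPS webhook with changes().watch(), which is omitted here:

    import time

    from googleapiclient.discovery import build

    # Assumes `creds` is an authorized credentials object with a Drive scope.
    service = build("drive", "v3", credentials=creds)

    # Get a token representing "now"; only changes after this point are returned.
    saved_token = service.changes().getStartPageToken().execute()["startPageToken"]

    while True:
        page_token = saved_token
        while page_token is not None:
            response = service.changes().list(
                pageToken=page_token, spaces="drive").execute()
            for change in response.get("changes", []):
                print("File changed:", change.get("fileId"))
            if "newStartPageToken" in response:
                # Last page of this round; remember where to resume next time.
                saved_token = response["newStartPageToken"]
            page_token = response.get("nextPageToken")
        time.sleep(30)  # polling interval; a webhook would replace this loop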

Storing text files > 1MB in GAE/P

I have a Google App Engine app where I need to store text files that are larger than 1 MB (the maximum entity size).
I'm currently storing them in the Blobstore and I use the Files API for reading and writing them. Current operations include uploading them from a user, reading them to process and update them, and presenting them to a user. Eventually, I would like to allow a user to edit them (likely as a Google Doc).
Are there advantages to storing such text files in Google Cloud Storage, as a Google Doc, or in some other location instead of using the Blobstore?
It really depends on what exactly you need. There are of course advantages to using one service over another, but in the end it doesn't matter much, since all of the solutions will be almost equally fast and not that expensive. If you end up with a huge amount of data after some time, you might consider switching to another solution, simply because you might save some money.
Having said that, I would suggest you continue with the Blobstore API, since that will not require extra communication with external services, more secret keys, etc. Security- and speed-wise it is exactly the same. By the time you reach 10K or 100K users, you will already know whether it's actually worth storing them somewhere else. Continue with what you know best, but make sure you're following the right practices when building on Google App Engine.
If you're already using the Files API to read and write the files, I'd recommend you use Google Cloud Storage rather than the Blobstore. GCS offers a richer RESTful API (makes it easier to do things like access control), does a number of things to accelerate serving static data, etc.
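If you do switch, a minimal sketch of reading and writing with the App Engine GCS client library (the cloudstorage module); the bucket and object names are placeholders:

    # Uses the App Engine GCS client library (GoogleAppEngineCloudStorageClient),
    # which replaced the deprecated Files API; bucket/object names are placeholders.
    import cloudstorage

    def write_text(bucket, name, text):
        with cloudstorage.open('/%s/%s' % (bucket, name), 'w',
                               content_type='text/plain') as f:
            f.write(text)

    def read_text(bucket, name):
        with cloudstorage.open('/%s/%s' % (bucket, name)) as f:
            return f.read()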
Sharing data is easier in Google Docs (now Google Drive) and Google Cloud Storage. Using Google Drive, you can also use the power of Google Apps Script.
