I wrote a script that uploads my models and training examples to Google Drive after every iteration, in case of crashes or anything else that stops the notebook from running. It looks something like this:
import shutil
from os import path

drive_path = 'drive/My Drive/Colab Notebooks/models/'
if path.exists(drive_path):
    shutil.rmtree(drive_path)
shutil.copytree('models', drive_path)
Whenever I check my Google Drive, a few GB are taken up by dozens of deleted models folders in the Trash, which I have to delete manually.
The only function in google.colab.drive seems to be mount, and that's it.
According to this tutorial, shutil.rmtree() removes a directory permanently, but apparently that doesn't hold for Drive.
It is possible to do this inside Google Colab by using the pydrive module. I suggest that you first move your unwanted files and folders to Trash (by ordinarily removing them in your code), and then, whenever you think it necessary (e.g. when you want to free up some space for saving the weights of a new DL project), empty your Trash with the lines below.
To permanently empty your Google Drive's Trash, run the following in your Google Colab notebook:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
my_drive = GoogleDrive(gauth)
After entering the authentication code and creating a valid instance of the GoogleDrive class, write:
for a_file in my_drive.ListFile({'q': "trashed = true"}).GetList():
    # Print the name of the file being deleted.
    print(f'the file "{a_file["title"]}" is about to get deleted permanently.')
    # Delete the file permanently.
    a_file.Delete()
If you don't want to use my suggestion and instead want to permanently delete one specific folder in your Drive, you may have to make more complex queries and deal with fileId, parentId, and the fact that a file or folder in your Drive can have multiple parent folders, when querying the Google Drive API.
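As a starting point, here is a minimal sketch of such a query (the folder ID is a hypothetical placeholder; PyDrive passes the q string straight to the Drive API):

# Hypothetical ID of the folder whose trashed contents you want gone.
folder_id = 'YOUR_FOLDER_ID'

# List only trashed files whose parent is that folder, then delete them permanently.
query = f"'{folder_id}' in parents and trashed = true"
for a_file in my_drive.ListFile({'q': query}).GetList():
    a_file.Delete()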
For more information:
You can find examples of more complex (yet typical) queries here.
You can find an example of checking whether a file is in a specific folder here.
The statement that files and folders in Google Drive can each have multiple parent folders becomes clearer after reading this post.
Files move to the bin upon deletion, so this neat trick reduces the file size to 0 before deleting (cannot be undone!):
import os

delete_filepath = 'drive/My Drive/Colab Notebooks/somefolder/examplefile.png'
open(delete_filepath, 'w').close()  # overwrite and make the file blank instead - ref: https://stackoverflow.com/a/4914288/3553367
os.remove(delete_filepath)  # deleting the now-blank file from Google Drive moves it to the bin
You just have to move the files into the trash and connect to your Drive. From there, delete the notebooks permanently.
I'm new to Google Colaboratory.
My team is doing a mini-project together, so my partner created a Drive folder and shared it with me. The problem is that her code links to the file in her 'My Drive', while she shared only the "miniproject" folder with me, so when I run the code on the file in it, I get an error because of the wrong path.
Her code:
df = pandas.read_csv("/content/drive/MyDrive/ColabNotebooks/miniproject/zoo6.csv")
The code I need to run on my account:
df = pandas.read_csv("/content/drive/MyDrive/miniproject/zoo6.csv")
(since I made a shortcut to it in my My Drive)
How can I run the code on her Drive folder from my own account?
There are currently some workarounds that involve adding the files to your Drive, though this is less than ideal. You can check out this answer.
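One way to make the same notebook run for both of you is to probe the candidate paths; a minimal sketch, assuming each of you has made a shortcut to the shared miniproject folder under your own My Drive (the two paths are just the ones from the question):

import os
import pandas
from google.colab import drive

drive.mount('/content/drive')

# Try the owner's original path first, then the shortcut path.
candidates = [
    "/content/drive/MyDrive/ColabNotebooks/miniproject/zoo6.csv",  # her path
    "/content/drive/MyDrive/miniproject/zoo6.csv",                 # your shortcut path
]
csv_path = next(p for p in candidates if os.path.exists(p))
df = pandas.read_csv(csv_path)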
I'm using Python in Colab to access specific files by their IDs, analyze them, and record info about them. I need to save the recorded info in a team folder on Drive. I'm reluctant to mount because the team folder is deep in the architecture of the company's Drive folders; I don't know how to reach it from the main drive, so I have it starred in my Drive for easy access. I also just feel weird mounting my entire drive with all the company info in it. I really wish you could mount a single folder (I know you can change the path after the mount, but that also feels weird).
I have found a ton of ways to download a file based on its file ID, but I can't find any way to upload or save to that file ID. I know there's also a way with Pandas to read info from a file ID and use it as a data frame, which is an option, but can you save the new info back to the file ID with Pandas? There also seems to be an easy way to download with the Google API given a file ID, but again, no easy way to upload to a file ID or folder ID and overwrite the file.
These files are going to be really big as time goes on (tens of thousands of lines), so whatever I use needs to deal with that, either by uploading only the new info or by handling long downloads.
Edit: I did also just now try using gspread, but I'm not able to share files with emails outside of our company domain, so I'm unable to use gspread. ):
You can use pydrive to read and write based on a file ID:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
fid = 'Your File ID'
# read it
f = drive.CreateFile({'id': fid}) # just open an existing file
f.FetchMetadata(fetch_all=True)
text = f.GetContentString() # or f.GetContentFile('im.png') to save a local file
# or write it
f.SetContentString('Sample upload file content') # or SetContentFile('im.png')
f.Upload()
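To address the Pandas part of the question, here is a sketch of a read-modify-write round trip through the same file ID (the appended row is a hypothetical placeholder):

import io
import pandas as pd

f = drive.CreateFile({'id': fid})
df = pd.read_csv(io.StringIO(f.GetContentString()))  # file ID -> DataFrame
df.loc[len(df)] = ['new', 'row', 'values']           # hypothetical new record
f.SetContentString(df.to_csv(index=False))           # DataFrame -> same file ID
f.Upload()                                           # overwrites the Drive file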
I'm trying to mount a directory from https://drive.google.com/drive/folders/my_folder_name for use in a google colab notebook.
The instructions for mounting a folder show an example for a directory starting with /content/drive:
from google.colab import drive
drive.mount('/content/drive')
but my directory doesn't start with /content/drive, and the following things I've tried have all resulted in ValueError: Mountpoint must be in a directory that exists:
drive.mount("/content/drive/folders/my_folder_name")
drive.mount("content/drive/folders/my_folder_name")
drive.mount("drive/folders/my_folder_name")
drive.mount("https://drive.google.com/drive/folders/my_folder_name")
How can I mount a google drive location which doesn't start with /content/drive?
The path in drive.mount('/content/drive') is the path (mount point) where the GDrive is mounted inside the virtual machine your notebook is running on (refer to 'mount point' in Unix/Linux). It does not point to the path within your Google Drive that you are trying to access.
Leave "/content/drive" intact and work like this instead:
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/path/in/google_drive/from/root" # Your path
gdrive_path = "/content/drive" + "/My Drive" + my_path # Change according to your locale, if needed.
# "/content/drive/My Drive/path/in/google_drive/from/root"
Modify my_path to point to your desired folder in GDrive (I don't know if "/My Drive/" changes according to your locale). Colab saves notebooks by default in "/Colab Notebooks", so in MY case the root of my GDrive is actually gdrive_path = "/content/drive/My Drive" (and I'm guessing yours is too).
This leaves us with:
import pandas as pd
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/folders/my_folder_name" # THIS is your GDrive path
gdrive_path = "/content/drive" + "/My Drive" + my_path
# /content/drive/My Drive/folders/my_folder_name
sample_input_file = gdrive_path + "/input.csv" # The specific file you are trying to access
rawdata = pd.read_csv(sample_input_file)
# /content/drive/My Drive/folders/my_folder_name/input.csv
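A slightly safer variant, if you'd rather not concatenate path strings by hand (it is easy to drop a separator that way), is to let os.path.join insert them; a sketch with the same hypothetical folder:

import os
import pandas as pd
from google.colab import drive

drive.mount("/content/drive")
gdrive_path = os.path.join("/content/drive", "My Drive", "folders", "my_folder_name")
rawdata = pd.read_csv(os.path.join(gdrive_path, "input.csv"))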
After a successful mount, you will be asked to paste a validation code once you have granted permissions to the drive.mount API.
Update: GColab does not require copy/paste of the code anymore but instead to simply confirm you are who you say you are via a usual Google login page.
You can try it this way:
drive.mount('/gdrive')
Now access your file from this path:
/gdrive/'My Drive'/folders/my_folder_name
In my case, this is what worked. I think this is what Katardin suggested, except that I first had to add these subfolders (that I was given access to via a link) to My Drive:
Right-click on the subfolders in the Google Drive link I was given and select "Add to My Drive."
Log into my Google Drive and add the subfolders to a new folder in my Drive, my_folder_name.
Then I could access the contents of those subfolders from colab with the following standard code:
import os
from google.colab import drive

drive.mount('/content/drive')
data_dir = 'drive/My Drive/my_folder_name'
os.listdir(data_dir)  # shows the subfolders that were shared with me
I have found that the reason one can't mount one's own Google Drive for these things is a race condition with Google. First it was suggested to change the mount location from /content/gdrive to something else, but this didn't fix it. What I ended up doing was manually copying the files that get copied to Google Drive, then installing the Google Drive desktop application. In Windows 10 I would then go to the folder, which is now located on Google Drive, disable file-permission inheritance, and manually grant full-control rights on the folder to the Users group and the Authenticated Users group. This seems to have fixed it for me.

Other times I have noticed with these Colabs (not this one in particular) that some of the components used, like the trained models, are missing from the repository (as if they had been removed). The only solution for this is to look around for other sources of these files. This includes scurrying through the Google search engine, looking at the git checkout level to find branches besides master, and looking for projects that cloned the project on GitHub to see if they still include the files.
Open Google Drive and share the link with everybody or with your own accounts.
Colab part:
from google.colab import drive
drive.mount('/content/drive')
You may want to try the following, though it depends on whether you're doing this in a Pro or personal account. There is a My Drive that Google Drive keeps in place in the file structure after /content/drive/:
drive.mount('/content/drive/My Drive/folders/my_folder_name')
Copy your Colab document link and open it in a Chrome incognito window, then run the command again ;) It should work with no error.
I have a Google CoLab notebook used by third-parties. The user of the notebook needs the notebook to read CSVs both from their personal mounted GDrive as well as from a 3rd-party publicly shared GDrive.
As far as I can tell, reading from these two different sources each requires the user to complete an authentication workflow, copy/pasting a verification code each time. The UX would be much improved if they only had to do a single authentication verification rather than two.
Put another way: if I've already authenticated and verified who I am to mount my drive, then why do I need to do it again to read data from a publicly shared Google Drive?
I figured there would be some way to use the authentication from the first method in the second one (see details below), or to somehow request permissions for both in a single step, but I am not having any luck figuring it out.
Background
There has been a lot written about how to read data into Google Colab notebooks: Import data into Google Colaboratory, Towards Data Science - 3 ways to load CSV files into colab, and Google CoLab's official helper notebook are some good references.
To quickly recap, you have a few options, depending on where the data is coming from. If you are working with your own data, then an easy solution is to put your data in Google Drive, and then mount your drive.
from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')
And you can read files as if they were in your local filesystem at /content/mountedDrive/.
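For example (the path under the mount point is hypothetical):

import pandas as pd
df = pd.read_csv('/content/mountedDrive/My Drive/some_folder/data.csv')  # hypothetical path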
Sometimes mounting your drive is not sufficient. For example, let's say you want to read data from a publicly shared Google Drive owned by a 3rd party. In this case, you can't mount your drive, because the shared data is not in your Drive. You could copy all of the data out of the 3rd party's drive and into your own, but it would be preferable to read directly from the public Drive, especially if this is a shared notebook that many people use.
In this case, you can use PyDrive (see same references).
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
You have to look up the Drive file ID for your dataset, and then you can read it, e.g., like this:
import pandas as pd

file_id = '...'  # the file ID you looked up
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('Filename.csv')
df = pd.read_csv('Filename.csv')
In both of these workflows, you must authenticate your Google Account by following a special link, copying a code, and pasting the code back into the notebook.
Here is my problem:
I want to do both of these things in the same notebook: (1) read from a mounted google drive and (2) read from a publicly shared GDrive.
The user of my notebook is a third party. If the notebook runs both sets of code, then the user is forced to perform the verification-code authentication twice. It's a bad, confusing UX, and it seems like it should be unnecessary.
Things I have tried:
Regarding this code:
auth.authenticate_user() # We already authenticated when we mounted our GDrive
gauth = GoogleAuth()
I thought there might be a way to pass the gauth object into the .mount() function so that if credentials already existed, you would not need to re-request authentication with a new verification code. But I have not been able to find documentation on google.colab.drive.mount(), and guessing randomly at passing parameters is not working out.
Alternatively, we could go the other way around; however, I am not sure whether it is possible to save/extract authentication permissions from .mount().
Next I tried removing the explicit authenticate_user() call, since the mounting had already happened, like this:
from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
The first 2 lines run as expected, including the authentication link and verification code.
However once we get to the line gauth.credentials = GoogleCredentials.get_application_default() my 3rd party user gets the following error:
1260 # If no credentials, fail.
-> 1261 raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
1262
1263 @staticmethod
ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
I'm not 100% sure what these different lines accomplish, so I tried removing the erroring line as well:
from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
# gauth.credentials = GoogleCredentials.get_application_default() # Commented out, hoping we don't need this line if we are already mounted?
drive = GoogleDrive(gauth)
This now runs without error; however, when I then try to read a file from the public drive, I get the following error:
InvalidConfigError: Invalid client secrets file ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)
At this point I noticed something that is probably important:
When I run the drive-mounting code, the authentication requests access to Google Drive File Stream.
When I run the PyDrive authentication, the authentication requests access on behalf of Google Cloud SDK.
So these are different permissions.
So, the question is... is there any way to streamline this and package all of these permissions into a single-verification-code authentication workflow? If I want to read from both my mounted Drive AND from a publicly shared GDrive, must the notebook user really authenticate twice?
Thanks for any pointers to documentation or examples.
There is no way to do this. The OAuth scopes are different: one is for the Google Drive file system; the other is for the Google Cloud SDK.
I am using Google Colab and I would like to use my custom libraries / scripts, that I have stored on my local machine. My current approach is the following:
# (Question 1)
from google.colab import drive
drive.mount("/content/gdrive")
# Annoying chain of granting access to Google Colab
# and entering the OAuth token.
And then I use:
# (Question 2)
!cp /content/gdrive/My\ Drive/awesome-project/*.py .
Question 1:
Is there a way to avoid mounting the drive entirely? Whenever the execution context changes (e.g. when I select "Hardware Acceleration = GPU", or when I wait an hour), I have to re-generate and re-enter the OAuth token.
Question 2:
Is there a way to sync files between my local machine and my Google Colab scripts more elegantly?
A partial (not very satisfying) answer regarding Question 1: I saw that one could install and use Dropbox. Then you can hardcode the API key into the application and mounting is done, regardless of whether or not it is a new execution context. I wonder if a similar approach exists based on Google Drive as well.
Question 1.
Great question, and yes there is. I have been using this workaround, which is particularly useful if you are a researcher and want others to be able to re-run your code, or just want to 'colab'orate when working with larger datasets. The method below has worked well for us as a team, since there are challenges when each person keeps their own version of the datasets.
I have used this regularly on 30+ GB of image files downloaded and unzipped to the Colab runtime.
The file ID is in the link provided when you share from Google Drive.
You can also select multiple files, share them all, and then generate e.g. a .txt or .json file which you can parse to extract the file IDs.
from google_drive_downloader import GoogleDriveDownloader as gdd

# Some file id / list of file ids parsed from the file urls.
google_fid_id = '1-4PbytN2awBviPS4Brrb4puhzFb555g2'
destination = 'dir/dir/fid'

# If it's a zip file, add the kwarg unzip=True.
gdd.download_file_from_google_drive(file_id=google_fid_id,
                                    dest_path=destination,
                                    unzip=True)
A URL-parsing function to get file IDs from a list of URLs might look like this:
def parse_urls():
    # Expects one line of comma-separated share URLs; the file ID is the
    # second-to-last path segment of each URL.
    with open('/dir/dir/files_urls.txt', 'r') as fb:
        txt = fb.readlines()
    return [url.split('/')[-2] for url in txt[0].split(',')]
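Tying the two together, a sketch of a download loop over the parsed IDs (the destination pattern is a hypothetical choice):

for fid in parse_urls():
    # Download each shared file by its ID; unzip if the files are archives.
    gdd.download_file_from_google_drive(file_id=fid,
                                        dest_path=f'data/{fid}.zip',
                                        unzip=True)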
One health warning: you can only repeat this a small number of times in a 24-hour window for the same files.
Here's the gdd git repo:
https://github.com/ndrplz/google-drive-downloader
Here is a working example (my own) of how it works inside a bigger script:
https://github.com/fdsig/image_utils
Question 2.
You can connect to a local runtime, but this also means using local resources (GPU/CPU etc.).
Really hope this helps :-).
F~
If your code isn't secret, you can use git to sync your local code to GitHub. Then git clone it into Colab with no need for any authentication.
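A minimal sketch of that flow, assuming a public repo (the user and repo names are hypothetical):

# In a Colab cell: clone the public repo and make its modules importable.
!git clone https://github.com/your-username/awesome-project.git
import sys
sys.path.append('/content/awesome-project')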