I'm trying to mount a directory from https://drive.google.com/drive/folders/my_folder_name for use in a google colab notebook.
The instructions for mounting a folder show an example for a directory starting with /content/drive:
from google.colab import drive
drive.mount('/content/drive')
but my directory doesn't start with /content/drive, and the following things I've tried have all resulted in ValueError: Mountpoint must be in a directory that exists:
drive.mount("/content/drive/folders/my_folder_name")
drive.mount("content/drive/folders/my_folder_name")
drive.mount("drive/folders/my_folder_name")
drive.mount("https://drive.google.com/drive/folders/my_folder_name")
How can I mount a google drive location which doesn't start with /content/drive?
The path in drive.mount('/content/drive') is the path (mount point) where the GDrive is mounted inside the virtual machine where your notebook is running (refer to 'mount point' in Unix/Linux). It does not point to the path inside your Google Drive that you are trying to access.
Leave "/content/drive" intact and work like this instead:
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/path/in/google_drive/from/root" # Your path
gdrive_path = "/content/drive" + "/My Drive" + my_path # Change according to your locale, if needed.
# "/content/drive/My Drive/path/in/google_drive/from/root"
And modify my_path to your desired folder located in GDrive (I don't know if "/My Drive/" changes according to your locale). Now, Colab Notebooks saves notebooks by default in "/Colab Notebooks", so in MY case the root of my GDrive is actually gdrive_path = "/content/drive/My Drive" (and I'm guessing yours is too).
This leaves us with:
import pandas as pd
from google.colab import drive
drive.mount("/content/drive") # Don't change this.
my_path = "/folders/my_folder_name" # THIS is your GDrive path
gdrive_path = "/content/drive" + "/My Drive" + my_path
# /content/drive/My Drive/folders/my_folder_name
sample_input_file = gdrive_path + "/input.csv" # The specific file you are trying to access
rawdata = pd.read_csv(sample_input_file)
# /content/drive/My Drive/folders/my_folder_name/input.csv
When mounting, you will be asked to paste a validation code after you have granted permissions to the drive.mount API.
Update: Colab no longer requires copying and pasting the code; instead, you simply confirm you are who you say you are via the usual Google login page.
You can try it this way:
drive.mount('/gdrive')
Now access your files from this path:
/gdrive/'My Drive'/folders/my_folder_name
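For completeness, a minimal sketch of that approach (the folder path below is a placeholder taken from this answer; adjust it to your own Drive layout):
from google.colab import drive
import os

drive.mount('/gdrive')  # the mount point just needs to live under a directory that exists
print(os.listdir('/gdrive/My Drive/folders/my_folder_name'))  # placeholder path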
In my case, this is what worked. I think this is what Katardin suggested, except that I had to first add these subfolders (that I was given access to via a link) to My Drive:
Right-click on the subfolders in the Google Drive link I was given and select "Add to My Drive."
Log into my Google Drive and add the subfolders to a new folder in my Drive, my_folder_name.
Then I could access the contents of those subfolders from colab with the following standard code:
import os
from google.colab import drive

drive.mount('/content/drive')
data_dir = 'drive/My Drive/my_folder_name'
os.listdir(data_dir)  # shows the subfolders that were shared with me
I have found that the reason one can't mount one's own Google Drive for these things is a race condition with Google. It was first suggested to change the mount location from /content/gdrive to /content/something_else, but this didn't fix it. What I ended up doing was manually copying the files that get copied to Google Drive, then installing the Google Drive desktop application. In Windows 10 I would then go to the folder, which is now located on Google Drive, disable file-permission inheritance, and manually grant full control rights on the folder to the Users group and the Authenticated Users group. This seems to have fixed it for me.
Other times I have noticed with these Colabs (not this one in particular) that some of the components used, like the trained models, are missing from the repository (as if they had been removed). The only solution for this is to look around for other sources of these files. This includes scouring Google's search engine, looking at the git checkout level for branches besides master, and looking for projects that forked the project on GitHub to see if they still include the files.
Open Google Drive and share the folder's link with everybody or with your own accounts.
Colab part:
from google.colab import drive
drive.mount('/content/drive')
You may want to try the following, though it may depend on whether you are using a Pro or a personal account. There is a My Drive directory that Google Drive keeps in place in the file structure after /content/drive/.
drive.mount('/content/drive/My Drive/folders/my_folder_name')
Copy your Colab document's link and open it in a Chrome incognito window, then run the command again ;) It should work with no error.
Related
I'm new to Google Colaboratory.
My team is doing a miniproject together, so my partner created a Drive folder and shared it with me. The problem is that her code links to the file in her 'My Drive',
while she shared only the "miniproject" folder with me, so when I run the code on the file in it, I get an error because the path is wrong.
Her code:
df = pandas.read_csv("/content/drive/MyDrive/ColabNotebooks/miniproject/zoo6.csv")
The code I need to run on my account:
df = pandas.read_csv("/content/drive/MyDrive/miniproject/zoo6.csv")
(since I made a shortcut to my My Drive)
How can I run the code by my drive account on her drive folder?
There currently exist some workarounds, such as adding the files to your own Drive, though this is less than ideal. You can check out this answer; a sketch of making the path account-agnostic follows below.
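A minimal sketch of such a workaround, assuming you have added a shortcut to the shared folder in your own My Drive (the two candidate paths are taken from the question; everything else is an assumption):
import os
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')

# Try the owner's layout first, then the shortcut layout, and read whichever exists.
candidate_paths = [
    "/content/drive/MyDrive/ColabNotebooks/miniproject/zoo6.csv",  # her layout
    "/content/drive/MyDrive/miniproject/zoo6.csv",                 # shortcut layout
]
csv_path = next(p for p in candidate_paths if os.path.exists(p))
df = pd.read_csv(csv_path)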
I wrote a script to upload my models and training examples to Google Drive after every iteration in case of crashes or anything that stops the notebook from running, which looks something like this:
import shutil
from os import path

drive_path = 'drive/My Drive/Colab Notebooks/models/'
if path.exists(drive_path):
    shutil.rmtree(drive_path)
shutil.copytree('models', drive_path)
Whenever I check my Google Drive, a few GBs are taken up by dozens of deleted models folders in the Trash, which I have to delete manually.
The only function in google.colab.drive seems to be mount and that's it.
According to this tutorial, shutil.rmtree() removes a directory permanently but apparently it doesn't work for Drive.
It is possible to perform this action inside Google Colab by using the pydrive module. I suggest that you first move your unwanted files and folders to Trash (by ordinarily removing them in your code), and then, anytime you think it's necessary (e.g. you want to free up some space for saving weights of a new DL project), empty your trash by coding the following lines.
In order to permanently empty your Google Drive's Trash, code the following lines in your Google Colab notebook:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
my_drive = GoogleDrive(gauth)
After entering authentication code and creating a valid instance of GoogleDrive class, write:
for a_file in my_drive.ListFile({'q': "trashed = true"}).GetList():
    # print the name of the file being deleted.
    print(f'the file "{a_file["title"]}" is about to get deleted permanently.')
    # delete the file permanently.
    a_file.Delete()
If you don't want to use my suggestion and want to permanently delete a specific folder in your Drive, it is possible that you have to make more complex queries and deal with fileId, parentId, and the fact that a file or folder in your Drive may have multiple parent folders, when making queries to Google Drive API.
For more information:
You can find examples of more complex (yet typical) queries, here.
You can find an example of Checking if a file is in a specific folder, here.
This statement that Files and folders in Google Drive can each have multiple parent folders may become better and more deeply understood, by reading this post.
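If you do go that route, a minimal sketch of such a query, assuming the my_drive instance from the snippet above and a known folder ID (the 'FOLDER_ID' placeholder is hypothetical):
query = "'FOLDER_ID' in parents and trashed = true"  # hypothetical folder ID
for a_file in my_drive.ListFile({'q': query}).GetList():
    print(f'permanently deleting "{a_file["title"]}"')
    a_file.Delete()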
Files will move to the bin upon deletion, so this neat trick reduces the file size to 0 before deleting (cannot be undone!):
import os

delete_filepath = 'drive/My Drive/Colab Notebooks/somefolder/examplefile.png'
open(delete_filepath, 'w').close()  # overwrite and make the file blank instead - ref: https://stackoverflow.com/a/4914288/3553367
os.remove(delete_filepath)  # deleting the now-blank file from Google Drive will move it to the bin instead
You just have to move them into the trash and connect to your Drive. From there, delete the notebooks permanently.
I am using Google Colab and I would like to use my custom libraries / scripts, that I have stored on my local machine. My current approach is the following:
# (Question 1)
from google.colab import drive
drive.mount("/content/gdrive")
# Annoying chain of granting access to Google Colab
# and entering the OAuth token.
And then I use:
# (Question 2)
!cp /content/gdrive/My\ Drive/awesome-project/*.py .
Question 1:
Is there a way to avoid mounting the drive entirely? Whenever the execution context changes (e.g. when I select "Hardware Acceleration = GPU", or when I wait an hour), I have to re-generate and re-enter the OAuth token.
Question 2:
Is there a way to sync files between my local machine and my Google Colab scripts more elegantly?
Partial (and not very satisfying) answer regarding Question 1: I saw that one could install and use Dropbox. Then you can hardcode the API key into the application, and mounting is done regardless of whether or not it is a new execution context. I wonder if a similar approach exists based on Google Drive as well.
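For what it's worth, a minimal sketch of that Dropbox idea with the official dropbox package (the token and paths are placeholders; note that hardcoding a token exposes it to anyone who can read the notebook):
import dropbox

dbx = dropbox.Dropbox("YOUR_ACCESS_TOKEN")  # placeholder token
# download a script from Dropbox into the Colab working directory
dbx.files_download_to_file("utils.py", "/awesome-project/utils.py")  # placeholder paths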
Question 1.
Great question, and yes, there is. I have been using the workaround below, which is particularly useful if you are a researcher and want others to be able to re-run your code, or if you just want to 'colab'orate when working with larger datasets. The method has worked well for us as a team, since there are challenges when each person keeps their own version of the datasets.
I have used this regularly on 30+ GB of image files downloaded and unzipped to the Colab runtime.
The file id is in the link provided when you share a file from Google Drive.
You can also select multiple files, share them all, and then generate, for example, a .txt or .json file which you can parse to extract the file ids.
from google_drive_downloader import GoogleDriveDownloader as gdd

# some file id / list of file ids parsed from file urls.
google_fid_id = '1-4PbytN2awBviPS4Brrb4puhzFb555g2'
destination = 'dir/dir/fid'
# if it is a zip file, add the kwarg unzip=True
gdd.download_file_from_google_drive(file_id=google_fid_id,
                                    dest_path=destination,
                                    unzip=True)
A url parsing function to get file ids from a list of urls might look like this:
def parse_urls():
    with open('/dir/dir/files_urls.txt', 'r') as fb:
        txt = fb.readlines()
    return [url.split('/')[-2] for url in txt[0].split(',')]
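Combining the two pieces might look something like this (the destination paths are assumptions):
for file_id in parse_urls():
    gdd.download_file_from_google_drive(file_id=file_id,
                                        dest_path=f'data/{file_id}.zip',  # assumed layout
                                        unzip=True)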
One health warning is that you can only repeat this a small number of times in a 24 hour window for the same files.
Here's the gdd git repo:
https://github.com/ndrplz/google-drive-downloader
Here is a working example (my own) of how it works inside a bigger script:
https://github.com/fdsig/image_utils
Question 2.
You can connect to a local runtime, but this also means using local resources (GPU/CPU etc.).
Really hope this helps :-).
F~
If your code isn't secret, you can use git to sync your local code to GitHub. Then, git clone it into Colab with no need for any authentication.
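A minimal sketch of that flow inside a Colab cell (the repository URL is a placeholder):
!git clone https://github.com/your_user/awesome-project.git  # placeholder public repo
import sys
sys.path.append('/content/awesome-project')  # make its modules importable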
I am trying to upload a whole folder to Dropbox at once, but I can't seem to get it done. Is it possible? And even when I am uploading a single file, I have to specify the file extension in the Dropbox path. Is there another way to do it?
The code I am using:
import dropbox

client = dropbox.client.DropboxClient(access_token)
f = open(file_path)
response = client.put_file('/pass', f)
but it's not working
The Dropbox SDK doesn't automatically find all the local files for you, so you'll need to enumerate them yourself and upload each one at a time. os.walk is a convenient way to do that in Python.
Below is working code with some explanation in the comments. Usage is like this: python upload_dir.py abc123xyz /local/folder/to/upload /path/in/Dropbox:
import os
import sys
from dropbox.client import DropboxClient

# get an access token, local (from) directory, and Dropbox (to) directory
# from the command-line
access_token, local_directory, dropbox_destination = sys.argv[1:4]

client = DropboxClient(access_token)

# enumerate local files recursively
for root, dirs, files in os.walk(local_directory):
    for filename in files:
        # construct the full local path
        local_path = os.path.join(root, filename)
        # construct the full Dropbox path
        relative_path = os.path.relpath(local_path, local_directory)
        dropbox_path = os.path.join(dropbox_destination, relative_path)
        # upload the file
        with open(local_path, 'rb') as f:
            client.put_file(dropbox_path, f)
EDIT: Note that this code doesn't create empty directories. It will copy all the files to the right location in Dropbox, but if there are empty directories, those won't be created. If you want the empty directories, consider using client.file_create_folder (using each of the directories in dirs in the loop).
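A sketch of what that might look like, reusing the variables from the script above (untested, so treat it as a starting point rather than a definitive implementation):
for root, dirs, files in os.walk(local_directory):
    for dirname in dirs:
        # mirror each local directory in Dropbox, including empty ones
        relative_dir = os.path.relpath(os.path.join(root, dirname), local_directory)
        client.file_create_folder(os.path.join(dropbox_destination, relative_dir))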
For me there was a better way: since Dropbox installs a folder on the local machine, you can use it and write to that folder with Python the same way you would write to any folder:
1. Install the Dropbox app (and folder) on your local machine or server.
2. Write the files and folders you want to the Dropbox folder directory, the same way as you would to any other folder.
3. Let Dropbox do the syncing automatically (do nothing).
Dropbox generally installs a "share" drive locally. When you upload to the remote, there is a lot of syncing overhead that makes the whole process slower. I chose to let Dropbox do the syncing in the background; it made more sense for the problem I was facing, and my guess is that it is the right solution for most problems. Remember that Dropbox is not a remote database; it is a local folder that is mirrored everywhere.
I didn't really measure precisely, but writing locally took me about 10 seconds while the other way took around 22 minutes, so all in all it was roughly 130 times faster to write to the local folder and let Dropbox do the syncing than to write to Dropbox using the other method people seem to recommend.
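In code, that approach is nothing more than writing into the locally synced folder (the paths below are assumptions about where the desktop app put it):
import os
import shutil

dropbox_folder = '/home/me/Dropbox/backups'  # wherever the desktop app created the synced folder
# copy the local models directory in; the desktop client uploads it in the background
shutil.copytree('models', os.path.join(dropbox_folder, 'models'), dirs_exist_ok=True)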
How can I download a complete folder hierarchy using the Python Google Drive API? Do I have to query each file in the folder and then download it? Doing it that way, the folder hierarchy will be lost. Is there a way to do it properly? Thanks.
You can achieve this using Google GAM.
gam all users show filelist >filelist.csv
gam all users show filetree >filetree.csv
I got all the answers from this site. I found it very useful.
https://github.com/jay0lee/GAM/wiki
The default maximum number of results per query is 100; you must use pageToken/nextPageToken to keep paging through the results.
see Python Google Drive API - list the entire drive file tree
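A sketch of that pagination with the Drive v3 API, assuming a service object built with googleapiclient.discovery.build('drive', 'v3', credentials=...) and a hypothetical FOLDER_ID placeholder:
page_token = None
while True:
    response = service.files().list(
        q="'FOLDER_ID' in parents",                        # hypothetical folder ID
        fields="nextPageToken, files(id, name, mimeType)",
        pageToken=page_token,
    ).execute()
    for f in response.get('files', []):
        print(f['name'], f['id'])
    page_token = response.get('nextPageToken')
    if page_token is None:
        break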