How to fetch an image dataset from Google Drive into Colab? - python

I have this very weird problem. I have searched across the internet and read the documentation, but I am not able to figure out how to do it. I want to train a classifier using Colab, and for that I have an image dataset of dogs on my local machine.
So what I did was pack that dataset folder of images into a zip file and upload it to Drive. Then from Colab I mounted the Drive and tried to unzip the files from there. Everything looked good. But I've realised that after some time some of the extracted files get deleted. And the thing is that those files aren't on Colab storage but on Drive, and I don't know why they are getting deleted after a while, roughly an hour.
So far I've used the following commands to do the extraction -
from google.colab import drive
drive.mount('/content/drive')
from zipfile import ZipFile
filename = 'Stanford Dogs Dataset.zip'
with ZipFile(filename, 'r') as zip:
    zip.extractall()
print('Done')
and also tried this -
!unzip filename -d destination
Not sure where I am going wrong. Also, I don't know why the extracted files, though extracted to a subfolder within Drive, also start showing up in the main root directory. And no, I am not talking about the Recent section, because when I check their location they point to the root of the Drive. It's all so confusing.

First you mount Google Drive:
from google.colab import drive
drive.mount('/gdrive')
Then you can copy from your Drive using !cp:
!cp '/gdrive/My Drive/my_file' 'my_file'
Then you can work as on your own PC: unzip the file and so on. A minimal sketch of the whole workflow follows.
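Putting the pieces together, here is a minimal sketch of the whole workflow (the archive name and Drive path are the ones from the question; adjust them to your own layout):
from google.colab import drive
from zipfile import ZipFile
import shutil

drive.mount('/gdrive')

# Copy the archive onto Colab's local disk first, then extract it there.
# Extracting under /content instead of inside Drive avoids writing
# thousands of small files back to Drive.
shutil.copy('/gdrive/My Drive/Stanford Dogs Dataset.zip', 'dataset.zip')
with ZipFile('dataset.zip', 'r') as zf:
    zf.extractall('dataset')   # everything ends up under /content/dataset
print('Done')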


Download folder from Google Drive to Colab notebook [duplicate]

I have a dataset folder of size 690 MB in my Google Drive. I would like to copy the whole dataset to my Google Colab notebook to train my model, but the copying process is very slow, so how can I download the folder from Google Drive using a Python script?
Maybe you have too many files in the root directory of Google Drive or in the dataset folder.
If you have too many files and folders in the root directory, you should clean up and sort them into fewer folders.
If you have many files in the dataset folder, try the following solutions:
Make a compressed file of your dataset folder and save it to Drive. Then, at run time, copy that compressed file (it will take less than a minute for 690 MB) and extract it in Colab.
Upload your dataset to any other platform (say OneDrive, Mega, etc.), get the link, and download it on Colab using that link, as in the sketch below.
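A rough sketch of that link-based route (the URL is a placeholder; substitute whatever direct-download link your hosting service gives you):
!wget -O dataset.zip "https://example.com/path/to/dataset.zip"
!unzip -q dataset.zip -d dataset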

Google Colab - FileNotFoundError

I am trying to train YOLOv3 with a custom dataset on Google Colab. I uploaded my folders, weights, etc. When I run my train.py, I get a path error. I run the code like this:
!python3 "drive/TrainYourOwnYolo/2_Training/Train_YOLO.py"
The error says,
content/TrainYourOwnYolo/2_Training/dataset/img20.jpg is not found.
As I understand it, on Colab all my folders are under the drive folder. I don't understand why YOLO is trying to find my dataset under the content folder. Do you have any idea?
It seems you have uploaded your data to /drive/TrainYourOwnYolo/, and not to /content/TrainYourOwnYolo/, where your script is looking.
The /content folder is normally used by Colab to save files if you don't use Google Drive. But you have mounted your Google Drive under /drive, so your notebook unsurprisingly fails to find the files.
You should change the file paths in your Train_YOLO.py script to replace references to /content with /drive.
If this is not possible, you can find the /content folder in the file browser on the left of your Colab notebook; by right-clicking on it, you'll see an option for uploading files there.
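A quick, purely illustrative check to confirm which prefix is correct before editing the script (the paths are copied from the question):
import os

# Where does the image actually live?
print(os.path.exists('/content/TrainYourOwnYolo/2_Training/dataset/img20.jpg'))
print(os.path.exists('/drive/TrainYourOwnYolo/2_Training/dataset/img20.jpg'))

# If only the second print is True, rewrite the hard-coded prefix, e.g.:
img_path = '/content/TrainYourOwnYolo/2_Training/dataset/img20.jpg'
img_path = img_path.replace('/content/', '/drive/', 1)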

Making an existing directory of Python programs available to Colaboratory

I have a number of directories of Python programs that I would like to execute on Colaboratory. Is there a way to do that, other than loading and saving the files one by one? If it helps, the directories are all in my own Google Drive, so all I would need (I think) is a way to cd into a given directory. I tried !cd .., which presumably should go to my top Google Drive directory, but it doesn't seem to work.
I just copied a directory into Google Drive\Colab Notebooks using the file explorer, but Colab refused to cd to that directory.
You'll need to mount your Google Drive before files contained therein will be available in Colab.
A recipe for how to mount Drive is available in this answer:
https://stackoverflow.com/a/47744465/8841057
After you mount the drive, don't use !cd; use %cd instead. !cd runs in a throwaway subshell, so the notebook's working directory doesn't actually change, whereas %cd changes it for the rest of the session.
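For example (the project directory name is just a placeholder):
from google.colab import drive
drive.mount('/gdrive')

# %cd persists for the rest of the notebook; !cd would only affect a subshell.
%cd "/gdrive/My Drive/my_project_directory"
!ls   # the directory's files should now be listed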

How to import python files in google colaboratory?

I am trying to run my program on Google Colab, where my code makes use of .py files written separately.
On my local system I have all the files inside one folder and it works using import xyz, but when I tried using the same folder from Google Drive it gives an import error.
Now in Google Colab (as of Nov 2018) you can upload your Python files easily:
Navigate to Files (the tab on your left panel).
Click on UPLOAD and upload your Python folder or .py files.
Use the Colab notebook to access the files.
If you have just 2-3 files, you can try the solution I gave in another question here.
Importing .py files in Google Colab
But if you have something like 5-10 files, I would suggest you put your library on GitHub, then !git clone it to Google Colab. Another solution is to zip all your library files, then modify the first solution by unzipping with !unzip mylib.zip (see the sketch below).
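A sketch of the zip route (the archive, folder, and module names are placeholders):
from google.colab import files

files.upload()                      # pick mylib.zip from your local machine
!unzip -q mylib.zip -d mylib        # unpack it next to the notebook

import sys
sys.path.append('/content/mylib')   # make the unzipped folder importable
import my_module                     # whichever module the zip contains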
If those library files are not in a folder structure, but are just a few files in the same folder, you can upload and save them, then import them. Upload them with:
def upload_files():
    from google.colab import files
    uploaded = files.upload()
    for k, v in uploaded.items():
        open(k, 'wb').write(v)
    return list(uploaded.keys())
For example, say you have a module like this:
simple.py
def helloworld():
    print("hello")
Click the arrow on the left panel => choose the Files tab => upload simple.py.
In the notebook, the code looks like this:
import simple
simple.helloworld()
=> hello
Something I've used when I have multiple Python scripts and want to import them automatically through code is to set them up as a package and clone from the repo.
First set up the scripts in a repo with setup.py and __init__.py files (obviously).
Then add this to the top of your notebook:
!rm -rf <repo-name> # in case you need to refresh after pushing changes
!git clone https://github.com/<user>/<repo-name>.git
Then install the package:
!pip install ./<repo-name>
Now conveniently import functions or whatever:
from <app-name>.<module> import <function>
I found this the easiest way:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/directory-location
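If you'd rather not change directories, appending the folder to sys.path also works (the path is the one mounted above; xyz is the module name from the question):
import sys
sys.path.append('/content/drive/MyDrive/directory-location')
import xyz   # any .py file in that folder is now importable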

How to upload and save large data to Google Colaboratory from local drive?

I have downloaded large image training data as a zip from this Kaggle link:
https://www.kaggle.com/c/yelp-restaurant-photo-classification/data
How do I efficiently achieve the following?
Create a project folder in Google Colaboratory
Upload the zip file to the project folder
Unzip the files
Thanks
EDIT: I tried the code below but it's crashing for my large zip file. Is there a better/more efficient way to do this where I can just specify the location of the file on my local drive?
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
!pip install kaggle
api_token = {"username":"USERNAME","key":"API_KEY"}
import json
import zipfile
import os
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config set -n path -v /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
for file in os.listdir():
    zip_ref = zipfile.ZipFile(file, 'r')
    zip_ref.extractall()
    zip_ref.close()
There is a minor change on line 9, without which I was encountering an error.
Source: https://gist.github.com/jayspeidell/d10b84b8d3da52df723beacc5b15cb27
(I couldn't add this as a comment because of reputation.)
You may refer to these threads:
Import data into Google Colaboratory
Load local data files to Colaboratory
Also check out the I/O example notebook. For example, for access to .xls files, you'll want to upload the file to Google Sheets. Then you can use the gspread recipes in the same I/O example notebook.
You may need to use the kaggle-cli module to help with the download.
It’s discussed in this fast.ai thread.
I just wrote this script that downloads and extracts data from the Kaggle API to a Colab notebook. You just need to paste in your username, API key, and competition name.
https://gist.github.com/jayspeidell/d10b84b8d3da52df723beacc5b15cb27
The manual upload function in Colab is kind of buggy now, and it's better to download files via wget or an API service anyway because you start with a fresh VM each time you open the notebook. This way the data will download automatically.
Another option is to upload the data to Dropbox (if it fits) and get a download link. Then, in the notebook, do:
!wget link -O new-name && ls
