Downloading Data from Google Drive Colab - python

I'm a beginner in TensorFlow and Python in general, so any help would be much appreciated. I'm following this tutorial from TensorFlow, just with my own data.
So I'm trying to download my own data via a share link for a folder that I uploaded to Google Drive, and then use that data in an image classifier model. This is the code I'm running:
dataset_url_training = "https://drive.google.com/drive/folders/genericid?usp=sharing"
data_dir_training = tf.keras.utils.get_file('flower_photos', origin=dataset_url_training, untar=True)
data_dir_training = pathlib.Path(data_dir_training)
The download appears to start, but the output is just:
Downloading data from https://drive.google.com/drive/folders/genericid?usp=sharing
106496/Unknown - 2s 14us/step
And then it just stops. When I then run the following code:
print(image_count)
The output spits out: 0
I'm really confused and I don't know what to do. Some suggestions have been to use a zip file URL, but that only applies to individual files and doesn't work for whole folders like mine. Furthermore, as far as I know, Google Drive doesn't allow you to get links for zip files, just sharing links (they are my own files, for clarification).
Thank you.
Edit 1: Just want to be clear: I'm NOT looking for a path. I'm looking for a URL, hence the use of a directory. I've also tried using the link of a zip file, but I got the same error message as before.

When you want to use a file from Google Drive in Colab, you can mount your Drive in Colab.
from google.colab import drive
drive.mount('/content/gdrive')
Then you can open files from Google Drive.
For example, if your file is in the directory "folder" on your main Drive page:
path = "gdrive/My Drive/folder/flower_photos"
Edit/Addition:
To make it clearer, you change this part of the tutorial
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
to this
import pathlib
from google.colab import drive
drive.mount('/content/gdrive')
data_dir = "gdrive/My Drive/flower_photos"
data_dir = pathlib.Path(data_dir)
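If it helps, here is a rough sketch of how the next steps of the tutorial would pick up from there. It is untested and assumes your Drive folder keeps the tutorial's layout (one subfolder per class, JPEG images) and that your TensorFlow version provides tf.keras.utils.image_dataset_from_directory:
import pathlib
import tensorflow as tf
from google.colab import drive

drive.mount('/content/gdrive')
data_dir = pathlib.Path("gdrive/My Drive/flower_photos")

# Sanity check: this should now be greater than 0.
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)

# The tutorial's dataset-loading step then works unchanged.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),
    batch_size=32)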

Related

How do I mount one Google Drive folder with images only onto Google Colab and iterate through it?

I have a folder of images I extracted for a computer-vision project, but don't know how to mount it onto Google Colab and iterate through the folder. I have a function I need to apply to each image inside the folder, but don't know how to get the images.
I've tried looking for resources but haven't found anything helpful to my situation, because most of them were for unzipping files that were not images. Can you please help me out? Thank you.
You can use the OpenCV library for this.
from google.colab import drive
import os
import cv2
First, mount your Drive and change the current working directory to your image folder:
drive.mount('/content/drive')
os.chdir("/content/drive/MyDrive/yourfolder")
Then you can iterate through every image, apply your function to each one, and save the result like this:
for file in os.listdir():
    img = cv2.imread(file)          # load the image
    result = myfunction(img)        # apply your own processing function
    cv2.imwrite(file, result)       # overwrite the file with the processed image
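If you'd rather not overwrite the originals, a small variation along these lines writes the processed copies to a separate folder and skips non-image files (the folder names and myfunction are placeholders for your own):
import os
import cv2
from google.colab import drive

drive.mount('/content/drive')

src_dir = "/content/drive/MyDrive/yourfolder"       # assumed input folder
out_dir = "/content/drive/MyDrive/yourfolder_out"   # assumed output folder
os.makedirs(out_dir, exist_ok=True)

for file in os.listdir(src_dir):
    if not file.lower().endswith((".jpg", ".jpeg", ".png")):
        continue  # skip anything that isn't an image
    img = cv2.imread(os.path.join(src_dir, file))
    result = myfunction(img)  # your own processing function
    cv2.imwrite(os.path.join(out_dir, file), result)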

Save/Export a custom tokenizer from google colab notebook

I have a custom tokenizer and want to use it for prediction in Production API. How do I save/download the tokenizer?
This is my code trying to save it:
import pickle
from tensorflow.python.lib.io import file_io
with file_io.FileIO('tokenizer.pickle', 'wb') as handle:
pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
No error, but I can't find the tokenizer after saving it. So I assume the code didn't work?
Here is the situation, using a simple file to disentangle the issue from irrelevant specifics like pickle, TensorFlow, and tokenizers:
# Run in a new Colab notebook:
%pwd
/content
%ls
sample_data/
Let's save a simple file foo.npy:
import numpy as np
np.save('foo', np.array([1,2,3]))
%ls
foo.npy sample_data/
At this stage, %ls should show tokenizer.pickle in your case instead of foo.npy.
Now, Google Drive & Colab do not communicate by default; you have to mount the drive first (it will ask for identification):
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
After which, an %ls command will give:
%ls
drive/ foo.npy sample_data/
and you can now navigate (and save) inside drive/ (i.e. actually in your Google Drive), changing the path accordingly. Anything saved there can be retrieved later.
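Applied back to your tokenizer, a minimal sketch (the Drive path is an assumption, and tokenizer is your fitted tokenizer object) would be:
import pickle
from google.colab import drive

drive.mount('/content/drive')

# Save straight into Google Drive so the file survives the Colab session.
# The target folder is assumed to already exist in your Drive.
with open('/content/drive/My Drive/tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)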

python how to import data in folder

Hey guys, I am new to Python and have been trying to use a Google Colaboratory notebook to learn pandas. I have been trying to import data but was unable to do so, the error being:
`FileNotFoundError: [Errno 2] No such file or directory: './train.csv'`
but I have the csv file in the folder that my notebook is in.
This is the code I ran. I have no idea why it doesn't work. Thanks for any suggestions.
train = pd.read_csv("./train.csv")
test = pd.read_csv("./test.csv")
Assuming you uploaded your files to Google Colab correctly, I suspect that you're not using the exact location of the files (train.csv and test.csv).
Once you navigate to the location of the files, find the full path using:
pwd
Once you find the location, you can read the files in pandas:
train = pd.read_csv(location_of_train_file)
test = pd.read_csv(location_of_test_file)
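For example, a quick sketch of that check in Colab (the /content path below is just the usual default working directory; adjust it to whatever the listing actually shows):
import os
import pandas as pd

print(os.getcwd())      # same as pwd; usually /content in Colab
print(os.listdir('.'))  # check whether train.csv and test.csv are really here

# Assuming the listing shows the files live in /content:
train = pd.read_csv('/content/train.csv')
test = pd.read_csv('/content/test.csv')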

How to load large xml dataset file in python?

Hi, I am working on a data analysis project in Python where I have an XML file of around 2.8 GB, which is too large to open. I downloaded EmEditor, which helped me open the file. The problem is that when I try to load the file in Python in Google Colaboratory like this:
import xml.etree.ElementTree as ET
tree = ET.parse('dataset.xml')  # dataset.xml is the name of my file
root = tree.getroot()
I get the error No such file or directory: 'dataset.xml'. I have my dataset.xml file on my desktop, and it can be opened using EmEditor, which gives me the idea that it can be edited and loaded via EmEditor, but I don't know. I would appreciate your help with loading the data in Python in Google Colab.
Google Colab runs remotely on a computer from Google, and can't access files that are on your desktop.
To open the file in Python, you'll first need to transfer the file to your Colab instance. There are multiple ways to do this, and you can find them here: https://colab.research.google.com/notebooks/io.ipynb
The easiest is probably this:
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
Keep in mind, though, that every time you start a new Colab session you'll need to re-upload the file. This is because Google would like to use the computer for someone else when you are not using it, and thus wipes all the data on it.
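If re-uploading 2.8 GB at every session is too slow, one alternative (sketched below, with the Drive path as an assumption) is to put dataset.xml in your Google Drive once and read it from the mounted Drive; for a file this size, iterating with ET.iterparse instead of ET.parse also avoids holding the whole tree in memory:
import xml.etree.ElementTree as ET
from google.colab import drive

drive.mount('/content/drive')

# Assumed location of the file inside your Drive.
xml_path = '/content/drive/My Drive/dataset.xml'

# Stream the file element by element instead of loading it all at once.
for event, elem in ET.iterparse(xml_path, events=('end',)):
    # ... process elem here ...
    elem.clear()  # free memory for elements that have been handled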

How to specify a return path in Azure Jupyter Notebook

I want to import >1100 seismic time series files from Azure into an Azure-hosted online notebook for processing. My code currently copies the file directly into my project source directory instead of neatly into my "../data/" directory.
I can import things from all over the project using the "~/library/../" string. However, the same trick isn't working when I try to direct the data where to go.
I did some research online but most results don't seem to cover this particular use case. I've tried many variations of file paths but to no avail.
How can I write files to a directory relative to my home path?
import_path = "~/library/04_processedData/seismicflatfile.csv"
return_path = "RSN1100_KOBE_ABN-UP.csv"
blob_service.get_blob_to_path(container_name, "RSN1100_KOBE_ABN-UP.DT2", return_path)
You can build the local path with:
import os

local_path = os.path.join(folder_path, file_name)
if not os.path.isfile(local_path):
    blob_service.get_blob_to_path(CONTAINER_NAME, blob_name, local_path)
Refer to the sample here.
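Applied to your case, a rough sketch (the ~/library/data folder name is an assumption; blob_service and container_name are the ones from your existing code) would be:
import os

# Expand "~" into the real home directory and make sure the data folder exists.
data_dir = os.path.expanduser("~/library/data")
os.makedirs(data_dir, exist_ok=True)

local_path = os.path.join(data_dir, "RSN1100_KOBE_ABN-UP.csv")
if not os.path.isfile(local_path):
    blob_service.get_blob_to_path(container_name, "RSN1100_KOBE_ABN-UP.DT2", local_path)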
