Permanently saving train data in google colab - python

I have about 50 GB of training data.
My Google Drive capacity was 15 GB, so I upgraded it to 200 GB and uploaded my training data to Google Drive.
I connected to Colab, but I could not find my training data in the Colab session, so I manually uploaded it to Colab, which has about 150 GB of disk capacity.
Colab says the data will be deleted when my connection ends.
Is it impossible to save training data in Colab permanently? And is the 150 GB of Colab storage free?
I also see that Colab provides an NVIDIA P4, which costs almost $5,000. When a P4 is assigned to me, can I use 100% of it, or is only some portion (like 0.1%) shared with me?

The way to do this is to mount your Google Drive into the Colab environment. Assume your files are kept under a folder named myfolder in your Google Drive. This is what I would suggest; do this before you read or write any file:
import os
from google.colab import drive
MOUNTPOINT = '/content/gdrive'
DATADIR = os.path.join(MOUNTPOINT, 'My Drive', 'myfolder')
drive.mount(MOUNTPOINT)
Then, for example, a file bigthing.zip residing under myfolder in your Google Drive will be available in Colab as path = os.path.join(DATADIR, 'bigthing.zip').
Similarly, when you save a file to a path like the above, you will find it in Google Drive under the same directory.
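The resulting paths can be sketched as follows (myfolder and bigthing.zip are just the example names from above; the 'My Drive' component is where Drive contents appear after mounting):

```python
import os

MOUNTPOINT = '/content/gdrive'  # where drive.mount() attaches Drive
DATADIR = os.path.join(MOUNTPOINT, 'My Drive', 'myfolder')

# A file stored in myfolder on Drive is then addressable like a local file.
path = os.path.join(DATADIR, 'bigthing.zip')
print(path)  # /content/gdrive/My Drive/myfolder/bigthing.zip
```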

Regarding the final questions: you get the whole GPU, not a slice of it, but there are fairly inconsistent usage restrictions. Generally, you only get about 8 hours straight before you get kicked off, you must be running code to keep the connection alive, and you can only use a GPU a few times in a row before you lose access for a day or so. You can pay for Colab Pro, which gives you more access and generally better GPUs, for $10/month.
In my experience, before Colab Pro you could get a top GPU (Tesla P100) about 50% of the time. Since the Pro version launched, I rarely get a P100 and get kicked off more often. So it can be a bit of a game to get regular use.
Another site that lets you do basically the same thing is https://console.paperspace.com/
They give you only 6-hour shifts on a "notebook", but you won't get kicked off before then, and I can usually get a P5000, which is generally better than what Colab gives me.
https://www.kaggle.com/ will also give you 30 GPU hours per week, so across these services you really could get up to nearly 2 GPU hours for every hour of the day if you planned your life around it.

Related

How can I access Google Colab on my PC without RDP?

I want to know whether I can access all of Google Colab's features (such as Colab's internet speed and its GPUs) from my PC without using any RDP.
DISPLAY= /opt/google/chrome-remote-desktop/start-host --code="4/0ARtbsJrzHf5E-l5eowqGJXek2W_2KOp3cDJArBtC7u1br3vDY4sjios03DW1rNEI4WcPFA" --redirect-url="https://remotedesktop.google.com/_/oauthredirect" --name=$(hostname)

When I run deep learning training code on Google Colab, do the resulting weights and biases get saved somewhere?

I am training some deep learning code from this repository on a Google Colab notebook. The training is ongoing and seems like it is going to take a day or two.
I am new to deep learning, but my question is:
Once the Google Colab notebook has finished running the training script, does this mean that the resulting weights and biases will be hard written to a model somewhere (in the repository folder that I have on my Google Drive), and therefore I can then run the code on any test data I like at any point in the future? Or, once I close the Google Colab notebook, do I lose the weight and bias information and would have to run the training script again if I wanted to use the neural network?
I realise that this might depend on the details of the script (again, the repository is here), but I thought that there might be a general way that these things work also.
Any help in understanding would be greatly appreciated.
No; Colab comes with no built-in checkpointing; any saving must be done by the user, so unless the repository code does so, it's up to you.
Note that the repo would need logic for connecting to a remote server (or to your local device) for data transfer; skimming through its train.py, there is no such thing.
How to save a model? See this SO answer; as a minimal version, the most common and reliable option is to "mount" your Google Drive onto Colab and point your save/load paths at directories inside it:
from google.colab import drive
drive.mount('/content/drive') # this should trigger an authentication prompt
%cd '/content/drive/My Drive/'
# alternatively, %cd '/content/drive/My Drive/my_folder/'
Once you have cd'd into, for example, DL Code in your My Drive, you can simply do model.save("model0.h5"), and this will create model0.h5 in DL Code, containing the entire model architecture and its optimizer. For just the weights, use model.save_weights().
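If you save once per epoch, it also helps to give each checkpoint a distinct name so earlier epochs are not overwritten. A minimal, framework-agnostic sketch (the DL Code folder is just the example above; the prefix and zero-padding are arbitrary choices):

```python
import os

def checkpoint_path(save_dir, epoch, prefix='model'):
    """Build a per-epoch filename such as 'model_epoch003.h5'."""
    return os.path.join(save_dir, f'{prefix}_epoch{epoch:03d}.h5')

SAVE_DIR = '/content/drive/My Drive/DL Code'
path = checkpoint_path(SAVE_DIR, 3)
print(path)  # /content/drive/My Drive/DL Code/model_epoch003.h5
# In the training loop this path would be handed to model.save(path),
# so every epoch leaves its own recoverable file on Drive.
```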

How to overcome TrainingException when training a large model with Azure Machine Learning service?

I'm training a large-ish model, trying to use for the purpose Azure Machine Learning service in Azure notebooks.
I thus create an Estimator to train locally:
from azureml.train.estimator import Estimator
estimator = Estimator(source_directory='./source_dir',
                      compute_target='local',
                      entry_script='train.py')
(my train.py should load and train starting from a large word vector file).
When running with
run = experiment.submit(config=estimator)
I get
TrainingException:
====================================================================
While attempting to take snapshot of
/data/home/username/notebooks/source_dir Your total
snapshot size exceeds the limit of 300.0 MB. Please see
http://aka.ms/aml-largefiles on how to work with large files.
====================================================================
The link provided in the error is likely broken.
Contents in my ./source_dir indeed exceed 300 MB.
How can I solve this?
You can place the training files outside source_dir so that they don't get uploaded as part of submitting the experiment, and then upload them separately to the data store (which is basically using the Azure storage associated with your workspace). All you need to do then is reference the training files from train.py.
See the Train model tutorial for an example of how to upload data to the data store and then access it from the training file.
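On the train.py side, the usual pattern is to accept the data location as a script argument (passed via the Estimator's script_params when submitting) and read the large files from there instead of from source_dir. A sketch, where --data-folder and word_vectors.bin are assumed names rather than anything required by the SDK:

```python
import argparse
import os

# In the real script this would be parser.parse_args(), reading sys.argv;
# an explicit list is used here so the sketch is self-contained.
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, default='.',
                    help='folder where the large training files are mounted')
args = parser.parse_args(['--data-folder', '/mnt/azureml/data'])

# The big word-vector file now lives outside source_dir, so it is not snapshotted.
vectors_path = os.path.join(args.data_folder, 'word_vectors.bin')
print(vectors_path)  # /mnt/azureml/data/word_vectors.bin
```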
After reading the GitHub issue "Encounter total snapshot size 300MB while start logging" and the official document "Manage and request quotas for Azure resources" for the Azure ML service, I think this is an open issue that will need some time for Azure to fix.
Meanwhile, I recommend migrating the current work to another service, Azure Databricks: upload your dataset and code, then run them in an Azure Databricks notebook, which is hosted on an HDInsight Spark cluster, without worrying about memory or storage limits. You can refer to these samples for Azure ML on Azure Databricks.

Using Custom Libraries in Google Colab without Mounting Drive

I am using Google Colab and I would like to use my custom libraries / scripts, that I have stored on my local machine. My current approach is the following:
# (Question 1)
from google.colab import drive
drive.mount("/content/gdrive")
# Annoying chain of granting access to Google Colab
# and entering the OAuth token.
And then I use:
# (Question 2)
!cp /content/gdrive/My\ Drive/awesome-project/*.py .
Question 1:
Is there a way to avoid mounting the drive entirely? Whenever the execution context changes (e.g. when I select "Hardware acceleration = GPU", or when I wait an hour), I have to re-generate and re-enter the OAuth token.
Question 2:
Is there a way to sync files between my local machine and my Google Colab scripts more elegantly?
Partial (and not very satisfying) answer regarding Question 1: I saw that one could install and use Dropbox. You can then hardcode the API key into the application, and mounting is done regardless of whether or not it is a new execution context. I wonder whether a similar approach exists based on Google Drive as well.
Question 1.
Great question, and yes there is. I have been using the workaround below, which is particularly useful if you are a researcher and want others to be able to re-run your code, or just to 'colab'orate when working with larger datasets. It has worked well for us as a team, since there are challenges when each person keeps their own version of the datasets.
I have used this regularly on 30+ GB of image files, downloaded and unzipped into the Colab runtime.
The file id is in the link you get when you share a file from Google Drive.
You can also select multiple files, share them all, and generate e.g. a .txt or .json file that you can parse to extract the file ids.
from google_drive_downloader import GoogleDriveDownloader as gdd

# Some file id (or one of a list of ids) parsed from the shared-file URLs.
google_file_id = '1-4PbytN2awBviPS4Brrb4puhzFb555g2'
destination = 'dir/dir/fid'

# For a zip file, pass unzip=True to extract after downloading.
gdd.download_file_from_google_drive(file_id=google_file_id,
                                    dest_path=destination,
                                    unzip=True)
A url parsing function to get file ids from a list of urls might look like this:
def parse_urls():
    with open('/dir/dir/files_urls.txt', 'r') as fb:
        txt = fb.readlines()
    return [url.split('/')[-2] for url in txt[0].split(',')]
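To see why the split works: a Drive share link has the form https://drive.google.com/file/d/&lt;FILE_ID&gt;/view, so the second-to-last path segment is the file id. A small self-contained check (the id below is made up):

```python
def file_id_from_url(url):
    """Extract the Drive file id from a share link."""
    return url.split('/')[-2]

url = 'https://drive.google.com/file/d/1AbCdEfGhIjKlMnOp/view?usp=sharing'
print(file_id_from_url(url))  # 1AbCdEfGhIjKlMnOp
```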
One health warning: you can only repeat this a small number of times in a 24-hour window for the same files.
Here's the gdd git repo:
https://github.com/ndrplz/google-drive-downloader
Here is a working example (my own) of how it fits inside a bigger script:
https://github.com/fdsig/image_utils
Question 2.
You can connect to a local runtime, but this also means using local resources (GPU/CPU, etc.).
Really hope this helps :-).
F~
If your code isn't secret, you can use git to sync your local code to GitHub. Then git clone it into Colab, with no need for any authentication.

How to take and restore snapshots of model training on another VM in Google Colab?

There is a 12-hour time limit for training DL models on a GPU, according to Google Colab. Others have asked similar questions in the past, but there has been no clear answer on how to save and load models halfway through training when the 12-hour limit is exceeded, including saving the number of completed epochs and other parameters. Is there an automated script to save the relevant parameters and resume operations on another VM? I am a complete noob; clear-cut answers will be much appreciated.
As far as I know, there is no way to automatically reconnect to another VM whenever you reach the 12 hours limit. So in any case, you have to manually reconnect when the time is up.
As Bob Smith points out, you can mount Google Drive in Colab VM so that you can save and load data from there. In particular, you can periodically save model checkpoints so that you can load the most recent one whenever you connect to a new Colab VM.
Mount Drive in your Colab VM:
from google.colab import drive
drive.mount('/content/gdrive')
Create a saver in your graph:
saver = tf.train.Saver()
Periodically (e.g. every epoch) save a checkpoint in Drive:
saver.save(session, CHECKPOINT_PATH)
When you connect to a new Colab VM (because of the timeout), mount Drive again in your VM and restore the most recent checkpoint before the training phase:
saver.restore(session, CHECKPOINT_PATH)
...
# Start training with the restored model.
Take a look at the documentation to read more about tf.train.Saver.
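The question also asks about saving the number of completed epochs. A minimal, framework-agnostic sketch is to keep a small JSON file next to the checkpoints on Drive (the /tmp path here stands in for a directory under the mounted /content/gdrive; the file name is an arbitrary choice):

```python
import json
import os

STATE_PATH = '/tmp/train_state.json'  # on Colab: somewhere under /content/gdrive

def save_state(epoch, path=STATE_PATH):
    """Record how many epochs have completed so far."""
    with open(path, 'w') as f:
        json.dump({'epoch': epoch}, f)

def load_state(path=STATE_PATH):
    """Return the last completed epoch, or 0 on a fresh VM."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)['epoch']

save_state(7)        # call this right after saver.save(...) each epoch
print(load_state())  # 7 -- on a new VM, resume training from epoch 8
```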
Mount Drive, and save and load persistent data from there.
from google.colab import drive
drive.mount('/content/gdrive')
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=RWSJpsyKqHjH
From Colab you can access GitHub, which makes it possible to push your model checkpoints to GitHub periodically. When a session ends, you can start another session and load the checkpoint back from your GitHub repo.
