I want to know whether I can access all of Google Colab's features, such as Colab's internet speed and GPUs, from my PC without using any RDP.
DISPLAY= /opt/google/chrome-remote-desktop/start-host --code="4/0ARtbsJrzHf5E-l5eowqGJXek2W_2KOp3cDJArBtC7u1br3vDY4sjios03DW1rNEI4WcPFA" --redirect-url="https://remotedesktop.google.com/_/oauthredirect" --name=$(hostname)
My motive is to build an MLOps pipeline that is 100% independent of cloud services like AWS, GCP, and Azure. I have a project for a client in a production factory and would like to build a camera-based object-tracking ML service for them. I want to build this pipeline on my own server (an on-premise computer). I am really confused about what stack I should use; I keep ending up with a cloud-component-based solution. It would be great to get some advice on the components I can use, preferably open source.
Assuming your main objective is to build a 100% no-cloud MLOps pipeline, you can do that with mostly open-source tech. All of the following can be installed on-prem, without cloud services.
For Training: You can use whatever you want. I'd recommend PyTorch because it plays nicer with some of the following suggestions, but TensorFlow is also a popular choice.
For CI/CD: If this is going to be on-prem and you are going to retrain the model with production data, or need to trigger updates to your deployment with each code update, you can use Jenkins (open source) or CircleCI (commercial).
For Model Packaging: Chassis (open source) is the only project I am aware of for generically turning AI/ML model files into something that can be run on your intended hardware. It takes an AI/ML model file as input and produces a Docker image as its output. It's open source and supports Intel and ARM, on both CPU and GPU. The website is here: http://www.chassis.ml and the Git repo is here: https://github.com/modzy/chassis
For Deployment: Chassis model containers are automatically built with internal gRPC servers that can be deployed locally as Docker containers. If you just want to stream a single source of data through them, the SDK has methods for doing that. If you want something that accepts multiple streams or auto-scales to the available resources on your infrastructure, you'll need a Kubernetes cluster with a deployment solution like Modzy or KServe. Chassis containers work out of the box with either.
KServe (https://github.com/kserve/kserve) is free, but basically just gives you a centralized processing platform hosting a bunch of copies of your running model. It doesn't allow later triage of the model's processing history.
Modzy (https://www.modzy.com/) is commercial, but also adds RBAC, job history preservation, auditing, etc. Modzy also has an edge deployment feature if you want to manage your models centrally but run them in a distributed manner on the camera hardware instead of on a centralized server.
As per your requirement for an on-prem solution, you may go ahead with Kubeflow (a minimal pipeline sketch follows below). Also use the following:
default storage class: nfs-provisioner
on-prem load balancing: MetalLB
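To give a feel for what running on Kubeflow looks like, here is a minimal sketch using the Kubeflow Pipelines SDK (kfp, v2-style API). The component and pipeline names, and the step bodies, are hypothetical placeholders, not part of the original answer:

from kfp import compiler, dsl


@dsl.component
def preprocess(msg: str) -> str:
    # Placeholder step: in a real pipeline this would pull camera frames / label data.
    return msg + " preprocessed"


@dsl.component
def train(data: str) -> str:
    # Placeholder step: in a real pipeline this would launch PyTorch training.
    return "model trained on: " + data


@dsl.pipeline(name="on-prem-object-tracking")
def tracking_pipeline(msg: str = "camera-batch"):
    pre = preprocess(msg=msg)
    train(data=pre.output)


if __name__ == "__main__":
    # Compile to a YAML spec that can be uploaded to the Kubeflow Pipelines
    # instance running on the on-prem cluster.
    compiler.Compiler().compile(pipeline_func=tracking_pipeline,
                                package_path="tracking_pipeline.yaml")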
I have about 50 GB of training data.
My Google Drive capacity was 15 GB, so I upgraded it to 200 GB and uploaded my training data to my Google Drive.
I connected to Colab, but I cannot find my training data in the Colab session, so I manually uploaded it to the Colab instance, which has about 150 GB of capacity.
It says the data will be deleted when my Colab connection is closed.
Is it impossible to save training data for Colab permanently? And is Colab free for 150 GB?
I also see that Colab provides an NVIDIA P4, which costs almost $5000. When a P4 is assigned to me, can I use it 100%, or is only some portion (like 0.1%) shared with me?
The way you can do this is to mount your Google Drive into the Colab environment. Assume your files are kept under a folder named myfolder in your Google Drive. This is what I would suggest; do this before you read/write any file:
import os
from google.colab import drive

MOUNTPOINT = '/content/gdrive'                              # where Drive will appear in the Colab filesystem
DATADIR = os.path.join(MOUNTPOINT, 'My Drive', 'myfolder')  # your data folder on Drive
drive.mount(MOUNTPOINT)                                     # triggers an authentication prompt
Then, for example, a file bigthing.zip residing under myfolder in your Google Drive will be available in Colab as path = os.path.join(DATADIR, 'bigthing.zip').
Similarly, when you save a file to a path like the above, you will find the file in Google Drive under the same directory.
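For instance, once the mount succeeds the file can be read like any local file. A minimal sketch using the zip archive from the example above (the local extraction folder is just illustrative):

import os
import zipfile

DATADIR = '/content/gdrive/My Drive/myfolder'   # same folder as mounted above
archive = os.path.join(DATADIR, 'bigthing.zip')

# Unpack onto the Colab VM's local disk; reading many small files during
# training is usually faster from local storage than from the Drive mount.
with zipfile.ZipFile(archive) as zf:
    zf.extractall('/content/data')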
Regarding the final questions: you are able to use it 100%; however, the restrictions are very inconsistent. Generally, you only get about 8 hours straight before you get kicked off, you must be running code to keep the connection alive, and you can only use a GPU a few times in a row before you lose access for a day or so. You can pay for Colab Pro, which gives you more access and generally better GPUs, for $10/month.
In my experience, before Colab Pro you could get a top GPU (a Tesla P100) about 50% of the time. Now that they have started the Pro version, I rarely get a P100 and get kicked off more often. So it can be a bit of a game to get regular use.
Another site that lets you do basically the same thing is https://console.paperspace.com/
They only give you 6-hour sessions on a "notebook", but you won't get kicked off before then, and I can usually get a P5000, which is generally better than what Colab gives me.
https://www.kaggle.com/ will also give you 30 hours of GPU time per week, so you really could get up to nearly 2 GPU-hours for every hour of the day if you planned your life around it.
I am training some deep learning code from this repository in a Google Colab notebook. The training is ongoing and looks like it will take a day or two.
I am new to deep learning, but here is my question:
Once the Google Colab notebook has finished running the training script, does that mean the resulting weights and biases will be written to a model file somewhere (in the repository folder that I have on my Google Drive), so that I can then run the code on any test data I like at any point in the future? Or, once I close the Google Colab notebook, do I lose the weight and bias information and have to run the training script again if I want to use the neural network?
I realise that this might depend on the details of the script (again, the repository is here), but I thought that there might be a general way these things work.
Any help in understanding would be greatly appreciated.
No; Colab comes with no built-in checkpointing; any saving must be done by the user - so unless the repository code does so, it's up to you.
Note that the repo would need to figure out how to connect to a remote server (or connect to your local device) for data transfer; skimming through its train.py, there's no such thing.
How do you save the model? See this SO answer; as a minimal version, the most common and reliable option is to "mount" your Google Drive onto Colab and point your save/load paths to directories on the Drive:
from google.colab import drive
drive.mount('/content/drive') # this should trigger an authentication prompt
%cd '/content/drive/My Drive/'
# alternatively, %cd '/content/drive/My Drive/my_folder/'
Once you've cd'd into, for example, a DL Code folder in your My Drive, you can simply do model.save("model0.h5"), and this will create model0.h5 in DL Code, containing the entire model architecture and its optimizer state. For just the weights, use model.save_weights().
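If you would rather not wait until training finishes before anything is written, a common pattern is a checkpoint callback pointed at the mounted Drive. A minimal, self-contained sketch assuming tf.keras and the DL Code folder from above; the tiny model and random data are placeholders for the repository's real ones:

import numpy as np
import tensorflow as tf

# Tiny stand-in model and data, purely to show the checkpointing pattern;
# swap in the real model and data from the repository.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')
x, y = np.random.rand(32, 4), np.random.rand(32, 1)

# Write the best weights so far (by training loss) to the mounted Drive after
# every epoch, so an interrupted Colab session doesn't lose training progress.
# The 'DL Code' folder is assumed to already exist on Drive.
ckpt = tf.keras.callbacks.ModelCheckpoint(
    '/content/drive/My Drive/DL Code/checkpoint.h5',
    monitor='loss', save_best_only=True)

model.fit(x, y, epochs=3, callbacks=[ckpt])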
I'm very new to cloud computing and I don't come from a software engineering background, so excuse me if some of what I say is incorrect.
I'm used to working in an IDE like Spyder and I'd like to keep it that way. Lately, my organization has been experimenting with Google Cloud, and what I'm trying to do is run a simple script on the cloud instead of on my computer, using Google Cloud's APIs.
Say I want to run this on the cloud through Spyder:
x=3
y=2
print(f'your result is {x+y}')
I'm guessing I could do something like:
from googleapiclient import discovery
compute = discovery.build('compute', 'v1')
request = compute.instances().start(project=project, zone=zone, instance=instance)
request.execute()
#Do something to connect to instance
x=3
y=2
print(f'your result is {x+y}')
Is there any way to do this, or to tell Python to run script.py? Thanks, and please tell me if I'm not being clear.
You needn't apologize; everyone is new to cloud computing at some point.
I encourage you to read around on cloud computing to get more of a feel for what it is and how it compares with your current experience.
The code you included won't work as-is.
There are two parts to this with Compute Engine, which is one of several compute services in Google Cloud Platform: provisioning an instance and then actually using it.
Fundamentally, interacting with a Compute Engine instance is similar to how you'd interact with your laptop. To run a Python program, you'd either start Python's REPL or create a script and then run it through the Python interpreter. This is also how it works on a Compute Engine instance.
You can do this on Linux in a single line:
python -c "x=2; y=3; print(x+y)"
But first, you have to tell Compute Engine to create an instance for you. You may do this using the Google Cloud Console (http://console.cloud.google.com), the Google Cloud SDK aka "gcloud", or, e.g., Google's Python client library for Compute Engine (which is what your code does). Regardless of which of these approaches you use, all of them ultimately make REST calls against Google Cloud to, e.g., provision an instance:
from googleapiclient import discovery

compute = discovery.build('compute', 'v1')
# NB: instances().start only (re)starts an existing, stopped instance;
# creating a brand-new instance is done with instances().insert (see below).
request = compute.instances().start(project=PROJECT, zone=ZONE, instance=INSTANCE)
request.execute()
# Do something to connect to the instance
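For completeness, creating (rather than just starting) an instance through the same client library looks roughly like the sketch below. This is an illustration only: the project, zone, instance name, machine type and image are all placeholder values.

from googleapiclient import discovery

# Placeholder values; substitute your own project and zone.
PROJECT = 'my-project'
ZONE = 'us-central1-a'

compute = discovery.build('compute', 'v1')

config = {
    'name': 'demo-instance',  # hypothetical instance name
    'machineType': f'zones/{ZONE}/machineTypes/e2-micro',
    'disks': [{
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            'sourceImage': 'projects/debian-cloud/global/images/family/debian-12',
        },
    }],
    'networkInterfaces': [{'network': 'global/networks/default'}],
}

# instances().insert creates the VM; the returned operation can be polled
# until the instance is ready.
operation = compute.instances().insert(project=PROJECT, zone=ZONE, body=config).execute()
print(operation['name'])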
Your example ends with "connect to instance", and this marks the transition between provisioning an instance and interacting with it. An alternative to the Python code above is Google's command-line tool, often called "gcloud", e.g.:
gcloud compute instances create ${INSTANCE} \
--project=${PROJECT} \
--zone=${ZONE}
gcloud also provides a convenience command that lets you use ssh while taking care of authentication for you:
gcloud compute ssh ${INSTANCE} \
--project=${PROJECT} \
--zone=${ZONE} \
--command='python -c "x=2; y=3; print(x+y)"'
NB This command ssh's into the Compute Engine instance and then runs your Python program.
This is not the best way to achieve this, but I hope it shows you one way that you could.
As you learn about Google Cloud Platform, you'll find that there are other compute services. These provide a higher level of abstraction: instead of provisioning a virtual machine, you can deploy code directly to, e.g., a Python runtime. Google App Engine and Google Cloud Functions both provide a way to deploy your program directly to a compute service without provisioning instances. Because these services operate at a higher level, you can write, test, and even deploy code from within an IDE too.
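As an illustration of that higher level of abstraction, an HTTP-triggered Cloud Function in Python is just a function that takes a request. A minimal sketch mirroring the script above (the function name is arbitrary, and the deployment steps are omitted):

def add(request):
    # HTTP Cloud Function entry point: 'request' is the incoming HTTP request.
    x = 3
    y = 2
    return f'your result is {x + y}'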
Google Cloud Platform provides a myriad of compute services depending on your requirements. These are accompanied by storage services, machine-learning, analytics, internet-of-things, developer tools etc. etc. It can be overwhelming but you should start with the basics (follow some "hello world" tutorials) and take it from there.
HTH!
I am looking for a way to access a .csv document that I have stored on Drive in order to perform data analysis. The idea would be to have something similar to pandas' read_csv, but accessing a remote file rather than one stored locally. Note that I don't want to access a Google Sheets document: it's a .csv file that I have shared on Google Drive. Ideally, I'd like to be able to save it back to Drive as well.
Thank you for the help,
Best,
You will want to use Google Drive File Stream to do this. What it does is basically mount the drive to your computer so that you can access it from anywhere.
So on my Windows computer I can open a terminal and then access anything on my drive. (Or if you have a Mac, you will find it mounted under /Volumes.)
>>>ls /mnt/g/
$RECYCLE.BIN My Drive Team Drives
>>>ls /mnt/g/My\ Drive/
test.csv
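Once the drive is mounted like that, the file behaves like any local file, so pandas can read and write it directly. A minimal sketch, assuming the mount point and file name from the listing above (adjust the path to your own machine):

import pandas as pd

# Path under the File Stream mount; adjust to your own mount point / folder.
path = '/mnt/g/My Drive/test.csv'

df = pd.read_csv(path)           # read the shared .csv straight from Drive
df.to_csv(path, index=False)     # ...and save results back to the same place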