Uploading Python files to GCP and executing code - python

I was trying to create a VM, upload some Python files to it, and run the code against the data in my buckets.
I created the VM and SSHed into it, then set up the instance with all the Python libraries that I need. Now I'm trying to upload my Python files to the VM so that I can execute the code.
So on my Mac, I did gcloud init and then tried the following:
gcloud compute scp /Users/username/Desktop/LZ_demo_poc/helper_functions.py /home/user_name/lz_text_classification/
However, I keep getting these error messages.
WARNING: `gcloud compute copy-files` is deprecated. Please use `gcloud compute scp` instead. Note that `gcloud compute scp` does not have recursive copy on by default. To turn on recursion, use the `--recurse` flag.
ERROR: (gcloud.compute.copy-files) Source(s) must be remote when destination is local. Got sources: [/Users/username/Desktop/LZ_demo_poc/helper_functions.py], destination: /home/user_name/lz_text_classification/
Can anyone help me with the process of running a Python script on GCP using data that is saved in buckets?

You also need to specify the instance you want the files copied to; otherwise the destination is interpreted as a local path, which is what the second line of your error message is complaining about. From the gcloud compute scp examples:
Conversely, files from your local computer can be copied to a virtual
machine:
$ gcloud compute scp ~/localtest.txt ~/localtest2.txt \
example-instance:~/narnia
In your case it should be something like:
gcloud compute scp /Users/username/Desktop/LZ_demo_poc/helper_functions.py your_instance_name:/home/user_name/lz_text_classification/
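Once the file is on the VM, note that objects in a bucket still aren't automatically visible as ordinary local paths. As a minimal sketch of the "run the code on my buckets" part, assuming the google-cloud-storage client library is installed on the VM and that the bucket and object names below are placeholders (alternatively, gsutil cp gs://your-bucket/path . from the VM's shell does the same job):
# Minimal sketch, assuming google-cloud-storage is installed (pip install google-cloud-storage);
# the bucket and object names are placeholders, not taken from the question.
from google.cloud import storage

client = storage.Client()  # uses the VM's default service account credentials
bucket = client.bucket("your-bucket-name")
blob = bucket.blob("path/to/your_data.csv")

# Download the object to the VM's local disk so helper_functions.py can read it normally.
blob.download_to_filename("/tmp/your_data.csv")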

Related

Is it possible to get a Docker container to read a file from the host file system without using volumes?

I have a Python script inside a container that needs to continuously read changing values inside a file located on the host file system. Using a volume to mount the file won't work because that only captures a snapshot of the values in the file at that moment. I know it's possible since the node_exporter container is able to read files on the host filesystem using custom methods in Golang. Does anyone know a general method to accomplish this?
I have a Python script [...] that needs to continuously read changing values inside a file located on the host file system.
Just run it. Most Linux systems have Python preinstalled. You don't need Docker here. You can use tools like Python virtual environments if your application has Python library dependencies that need to be installed.
Is it possible to get a Docker container to read a file from the host file system without using volumes?
You need some kind of mount, perhaps a bind mount; docker run -v /host/path:/container/path image-name. Make sure to not overwrite the application's code in the image when you do this, since the mount will completely hide anything in the underlying image.
Without a bind mount, you can't access the host filesystem at all. This filesystem isolation is a key feature of Docker.
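As a hedged sketch (the paths and image name here are placeholders, not taken from the question): start the container with a bind mount such as docker run -v /host/data/values.txt:/data/values.txt:ro image-name, and have the script simply re-read the file whenever it needs fresh values; the bind-mounted file is a live view of the host file, not a one-time snapshot.
# Minimal sketch of a script that keeps re-reading a bind-mounted file.
# /data/values.txt is a placeholder and must match the container side of the
# -v /host/data/values.txt:/data/values.txt:ro mount mentioned above.
import time

while True:
    with open("/data/values.txt") as f:
        value = f.read().strip()
    print(f"current value: {value}")
    time.sleep(5)  # poll every few seconds; edits made on the host show up here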
...the [Prometheus] node_exporter container...
Reading from the GitHub link in the question, "It's not recommended to deploy it as a Docker container because it requires access to the host system." The docker run example there uses a bind mount to access the entire host filesystem, circumventing Docker's filesystem isolation.

List files in Google Cloud Virtual Machines before SCP

I am trying to download specific files from a Google Cloud virtual machine. The majority of the directories that my gcloud command searches have just one file of that name. However, some directories have multiple files with similar names and different timestamps. Is there a command I can use to list the files within a directory on the VM so I can find the latest file name before using SCP?
I am currently using the following f string via os.system to download the files. However, this is not good enough for the case where multiple files are in the directory.
download_file = f"gcloud compute scp {project}:/nfs-client/example/documents/ID-{ID}/files/response* --zone=europe-west2-c ./temp-documents/ID-{ID}.xml"
os.system(download_file)
You can use the gcloud compute ssh command to get the latest file from a folder:
gcloud compute ssh example-instance --zone=us-central1-a --command "ls -t /nfs-client/example/documents/ID-{ID}/files/response* | head -1"
or something like that, then substitute the output of that command into your scp command to fetch the latest file.
You can use:
gcloud compute scp instance-2:$(gcloud compute ssh instance-2 --zone=europe-west2-c --command "ls -t /nfs-client/example/documents/ID-{ID}/files/response* | head -1") ./temp-documents/ID-{ID}.xml --zone=europe-west2-c
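Since the question is already building these commands from Python, the same two-step idea can be sketched with subprocess instead of os.system (hedged: the instance name, zone, and paths below are the placeholders already used above):
# Sketch: find the newest response file over SSH, then scp just that file.
# "instance-2", the zone, and the ID paths are placeholders from the question and answer.
import subprocess

list_cmd = (
    'gcloud compute ssh instance-2 --zone=europe-west2-c '
    f'--command "ls -t /nfs-client/example/documents/ID-{ID}/files/response* | head -1"'
)
latest = subprocess.run(list_cmd, shell=True, capture_output=True, text=True).stdout.strip()

copy_cmd = [
    "gcloud", "compute", "scp",
    f"instance-2:{latest}",
    f"./temp-documents/ID-{ID}.xml",
    "--zone=europe-west2-c",
]
subprocess.run(copy_cmd, check=True)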
It sounds like what you really want to do is download all the files in the directory. You can do this by passing scp the --recurse flag, like this:
command = f"gcloud compute scp --recurse {project}:/nfs-client/example/documents/ID-{ID}/files/ --zone=europe-west2-c ./temp-documents/ID-{ID}".format(project, ID)
os.system(command)
This will create a directory with the ID, and then put all the response files into that directory.
If, on the other hand, you'd really like to list the files, you could get a list by using gcloud compute ssh, and then fetching the output. You'll need to use something like subprocess.Popen instead of os.system, though:
import subprocess
command = f"gcloud compute ssh {project} -- ls /nfs-client/example/documents/ID-{ID}/files/ --zone=europe-west2-c".format(project, ID)
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
out = process.communicate()
files = out[0].split()
A few things of note here:
Fetching a list of files via SSH like this is pretty hacky and prone to breakage. Shelling out to gcloud for this is also inelegant; you're better off putting those files in something like Google Cloud Storage so you can access them easily from your local machine.
The command needs to have your instance name (which you seem to be calling project) and your ID templated in -- I assume you're doing that and just put in a placeholder.
You also need to parse the output. Parsing with str.split() works okay but can be error prone, particularly if there are spaces in the filenames. There are ways to handle this, but that's another rabbit hole.
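If you do go the listing route, a slightly sturdier variant of the Popen snippet above (hedged sketch; same placeholder instance name and paths) uses subprocess.run with text output and splits on newlines, which copes better with spaces in filenames:
# Sketch: list the remote files with subprocess.run and split on newlines
# rather than arbitrary whitespace. 'project' is really the instance name
# and ID is the question's placeholder.
import subprocess

cmd = [
    "gcloud", "compute", "ssh", project,
    "--zone=europe-west2-c",
    "--command", f"ls /nfs-client/example/documents/ID-{ID}/files/",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
files = result.stdout.splitlines()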

How can I connect VS Code to a GPU instance on Google Cloud Platform?

I'm on a Windows 10 machine. I have a GPU instance running on Google Cloud Platform to train deep learning models.
Historically, I have been running Jupyter notebooks on the cloud server without problems, but recently I've begun to prefer running Python notebooks in VS Code instead of the server-based Jupyter notebooks. I'd like to train my VS Code notebooks on my GPUs, but I don't have access to my Google instances from VS Code; I can only run locally on my CPU.
Normally, to run a typical model, I spin up my instance in the cloud.google.com Compute Engine interface. I use the Ubuntu installation on the Windows Subsystem for Linux and get in like this:
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
I have tried installing the Cloud Code extension in VS Code so far, but as I go through the tutorials, I always sort of get stuck. One error I keep experiencing is that gcloud won't work in anything EXCEPT my Ubuntu terminal. I'd like it to work in the terminal inside VS Code.
Alternatively, I'd like to run the code . command on my Ubuntu command line so I can open VS Code from there, but that won't work either. I've googled a few solutions, but they lead me to these same problems, with neither gcloud nor code . working.
Edit: I just tried the Google Cloud SDK installer from https://cloud.google.com/sdk/docs/quickstart-windows
and then I tried running gcloud compute ssh from the powershell from within VSCODE. This is the new error I got:
(base) PS C:\Users\user\Documents\dev\project\python> gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
WARNING: The PuTTY PPK SSH key file for gcloud does not exist.
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
ERROR: (gcloud.compute.ssh) could not parse resource []
It still runs from Ubuntu using WSL; I logged in fine. I guess I just don't know enough about how they're separated, what's shared, what's missing, and how to get all my command lines using the same configuration.
It seems as if your SSH key paths are configured correctly for your Ubuntu terminal but not for the VS Code one. If your account is not configured to use OS Login, with which Compute Engine stores the generated key with your user account, local SSH keys are needed. SSH keys are specific to each instance you want to access, and here is where you can find them. Once you have found them, you can specify their path using the --ssh-key-file flag.
Another option is to use OS Login as I have mentioned before.
Here is another thread with a problem similar to yours.

How to run a python script from my Mac's terminal after connecting to a GCP virtual machine?

I followed the instructions from this answer (https://datascience.stackexchange.com/questions/27352/how-to-run-a-python-script-on-gcp-compute-engine) to connect to a virtual machine which I created on GCP Compute Engine and run a python script from the terminal of my laptop on GCP Compute Engine.
This answer suggests that after I have connected to the virtual machine, I must enter the following at the terminal of my laptop: python your_script.py
However, when I point to the exact location of my .py file and enter:
python /Users/Paul/PycharmProjects/Eyeglasses_colour/Main.py
the response is the following:
python: can't open file '/Users/Paul/PycharmProjects/Eyeglasses_colour/Main.py': [Errno 2] No such file or directory
What is wrong? Can't I run my python script while connected to the GCP virtual machine?
You will not have access to any /Users folders on a Google Cloud machine because it is running Linux, and /Users is a macOS path.
That link assumes your files are already on the server. There are several ways that can happen.
The code was initially written there.
You check out the code from version control.
You scp the files using gcloud compute scp (sketched below).
This problem is not specific to Python or Google's services; it applies to any remote SSH session.
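If you want to script the scp option, a hedged sketch of the copy-then-run sequence from the Mac side, shelling out to gcloud from Python ("your-instance" and "your-zone" are placeholders, not from the question):
# Sketch: copy the script to the VM, then run it there over SSH.
# "your-instance" and "your-zone" are placeholders; Main.py lands in the remote home directory.
import subprocess

subprocess.run([
    "gcloud", "compute", "scp",
    "/Users/Paul/PycharmProjects/Eyeglasses_colour/Main.py",
    "your-instance:Main.py",
    "--zone=your-zone",
], check=True)

subprocess.run([
    "gcloud", "compute", "ssh", "your-instance",
    "--zone=your-zone",
    "--command", "python Main.py",
], check=True)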

Some elementary doubts about running MapReduce programs using mrjob on Amazon EMR

I am new to mrjob and I am having problems getting a job to run on Amazon EMR. I will write them out in order.
I can run an mrjob job on my local machine. However, when I have mrjob.conf at /home/ankit/.mrjob.conf and at /etc/mrjob.conf, the job does not run on my local machine.
Here is what I am getting. https://s3-ap-southeast-1.amazonaws.com/imagna.sample/local.txt
What is MRJOB_CONF in "the location specified by MRJOB_CONF" in the documentation?
What is the use of 'base_tmp_directory'? Also, do I need to upload the input data to S3 before starting the job, or will it be loaded from my local computer when execution starts?
Do I need to do some bootstrapping if I use libraries like numpy, scikit, etc.? If yes, how?
This is what I am getting when I execute the command to run a job on EMR: https://s3-ap-southeast-1.amazonaws.com/imagna.sample/emr.txt
Any solutions?
Thanks a lot.
Your first URL is invalid (I get an "Access Denied" error).
mrjob.conf is a configuration file. It can live in any of several locations; see http://pythonhosted.org/mrjob/configs-conf.html
You can use input data from your local machine just by specifying the paths to the input files on the command line. MRJob will upload the data to S3 for you. If you specify an s3://... URL, MRJob will use the data at that S3 path.
To use non-standard packages, see http://pythonhosted.org/mrjob/writing-and-running.html#custom-python-packages
Your second URL is also invalid (I get an "Access Denied" error).
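To make the input-data and packaging points above concrete, here is a minimal hedged sketch of an mrjob job (the class name, file name, and bucket path are placeholders, not from the question). Run it locally by passing a local path; with -r emr, mrjob uploads a local input file to S3 for you, or you can point it directly at an s3:// URL:
# word_count.py -- minimal mrjob sketch; all names here are placeholders.
#
# Run locally:            python word_count.py input.txt
# Run on EMR:             python word_count.py -r emr input.txt
# Run on EMR, S3 input:   python word_count.py -r emr s3://your-bucket/input.txt
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # emit one count per word in the input line
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        # sum the per-word counts emitted by the mappers
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()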
