Run a gsutil command in a Google Cloud Function - python

I would like to run a gsutil command every x minutes as a cloud function. I tried the following:
# main.py
import os
def sync():
    line = "gsutil -m rsync -r gs://some_bucket/folder gs://other_bucket/other_folder"
    os.system(line)
While the Cloud Function gets triggered, the line itself does nothing (i.e. the files are not copied from one bucket to the other). However, it works fine when I run it locally in PyCharm or from cmd. What is different about Cloud Functions?

You can use Cloud Run for this. You have very few changes to make in your code.
Create a container with both gsutil and Python installed, for example using gcr.io/google.com/cloudsdktool/cloud-sdk as the base image.
Take care with the service account used when you deploy to Cloud Run: grant it the correct permissions for accessing your buckets.
Let me know if you need more guidance.
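For illustration, here is a minimal sketch of what the Cloud Run service could look like, assuming the container is built from the cloud-sdk image (so gsutil is on the PATH) and Flask is installed; the bucket names are taken from the question above, and error handling is kept to a bare minimum.
import subprocess
from flask import Flask

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def sync():
    # gsutil picks up the credentials of the Cloud Run service account
    result = subprocess.run(
        ["gsutil", "-m", "rsync", "-r",
         "gs://some_bucket/folder", "gs://other_bucket/other_folder"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return result.stderr, 500
    return "sync complete", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Cloud Scheduler (or any other HTTP trigger) can then call the service's URL every x minutes.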

Cloud Functions server instances don't have gsutil installed. It works on your local machine because you do have it installed and configured there.
I suggest trying to find a way to do what you want with the Cloud Storage client library for Python. Alternatively, you could work out how to deploy gsutil with your function and how to configure and invoke it from your code, but that might be very difficult.

There's no straightforward option for that.
I think the best option for Cloud Functions is to use the google-cloud-storage Python library.
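For example, a copy loop along these lines could replicate the rsync from the question with the client library (a rough sketch: it assumes the function's service account can read the source bucket and write to the destination bucket, and unlike rsync it does not delete extra objects at the destination).
from google.cloud import storage

def sync(request):
    client = storage.Client()
    src = client.bucket("some_bucket")
    dst = client.bucket("other_bucket")
    for blob in client.list_blobs(src, prefix="folder/"):
        # Copy each object under the destination prefix
        new_name = blob.name.replace("folder/", "other_folder/", 1)
        src.copy_blob(blob, dst, new_name)
    return "ok"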

Related

How to simulate AWS services and env locally using Python moto?

Is it practically possible to simulate an AWS environment locally using Moto and Python?
I want to write an AWS Glue job that fetches records from my local database and uploads them to an S3 bucket for data quality checks, and later triggers a Lambda function as a cron job, using the Moto library's moto.lock_glue decorator. Any suggestion or documentation would be highly appreciated, as I don't see much of a clue on this. Thank you in advance.
AFAIK, moto is meant to patch boto modules for testing.
I have experience working with LocalStack, a Docker container you can run locally that acts as a live service emulator for most AWS services (some are only available to paying users).
https://docs.localstack.cloud/getting-started/
You can see here which services are supported by the free version.
https://docs.localstack.cloud/user-guide/aws/feature-coverage/
In order to use it, you need to change the endpoint URL so that it points to the local service running in Docker.
Since it runs as a Docker container, you can incorporate it into remote tests as well, e.g. if you're using k8s or a similar orchestrator.
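As a rough sketch, pointing boto3 at LocalStack usually just means overriding endpoint_url; the port below is LocalStack's default edge port, the credentials are dummies that LocalStack accepts, and the bucket name is a placeholder.
import boto3

# Point the S3 client at the local emulator instead of AWS
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)

s3.create_bucket(Bucket="my-test-bucket")
s3.put_object(Bucket="my-test-bucket", Key="sample.csv", Body=b"a,b\n1,2\n")
print(s3.list_objects_v2(Bucket="my-test-bucket")["KeyCount"])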

Unable to write files in a GCP bucket using gcsfuse

I have mounted a storage bucket on a VM using the command:
gcsfuse my-bucket /path/to/mount
After this I'm able to read files from the bucket in Python using Pandas, but I'm not able to write files nor create new folders. I have tried with Python and from the terminal using sudo but get the same error.
I have also tried Using the key_file from the bucket:
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=Notebooks/xxxxxxxxxxxxxx10b3464a1aa9.json <BUCKET> <PATH>
It does not throw errors when I run the command, but I'm still not able to write to the bucket.
I have also tried:
gcloud auth login
But still have the same issue.
I ran into the same thing a while ago, which was really confusing. You have to set the correct access scope for the virtual machine so that anyone using the VM is able to call the storage API. The documentation shows that the default access scope for storage on a VM is read-only:
When you create a new Compute Engine instance, it is automatically
configured with the following access scopes:
Read-only access to Cloud Storage:
https://www.googleapis.com/auth/devstorage.read_only
All you have to do is change this scope so that you are also able to write to storage buckets from the VM. You can find an overview of different scopes here. To apply the new scope to your VM, you have to first shut it down. Then from your local machine execute the following command:
gcloud compute instances set-scopes INSTANCE_NAME \
--scopes=storage-rw \
--zone=ZONE
You can do the same thing from the Cloud Console if you go to the settings of your VM, scroll all the way down, and choose "Set access for each API". You have the same options when you create the VM for the first time.
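If you want to double-check which scopes the instance actually has, a small sketch like the following (run on the VM itself) queries the Compute Engine metadata server.
import urllib.request

# Ask the metadata server for the default service account's access scopes
req = urllib.request.Request(
    "http://metadata.google.internal/computeMetadata/v1/instance/"
    "service-accounts/default/scopes",
    headers={"Metadata-Flavor": "Google"},
)
print(urllib.request.urlopen(req).read().decode())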

How to update gcloud on Google Cloud Composer worker nodes?

There's a similar question here, but it's from 2018 and the solution requires changing the base image for the workers. Another suggestion is to SSH into each node and apt-get install there. That doesn't seem useful, because when autoscaling spawns new nodes you'd have to do it again and again.
Anyway, is there a reasonable way to upgrade the base gcloud in late 2020?
Because task instances run in a shared execution environment, it is generally not recommended to use the gcloud CLI within Composer Airflow tasks, when possible, to avoid state or version conflicts. For example, if you have multiple users using the same Cloud Composer environment, and either of them changes the active credentials used by gcloud, then they can unknowingly break the other's workflows.
Instead, consider using the Cloud SDK Python libraries to do what you need to do programmatically, or use the airflow.providers.google.cloud operators, which may already have what you need.
If you really need to use the gcloud CLI and don't share the environment, then you can use a BashOperator with an install/upgrade script as a prerequisite for any tasks that need the CLI. Alternatively, you can build a custom Docker image with gcloud installed and use GKEPodOperator or KubernetesPodOperator to run the CLI command in a Kubernetes pod. That would be slower, but more reliable than verifying dependencies each time.
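As a rough illustration of the pod-based approach, something like this runs an up-to-date gcloud in its own container; import paths differ between Airflow/provider versions, and the DAG id, namespace and image tag are illustrative placeholders.
from datetime import datetime

from airflow import DAG
# Import path varies with the cncf.kubernetes provider version
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG("gcloud_in_pod", start_date=datetime(2020, 11, 1), schedule_interval=None) as dag:
    run_gcloud = KubernetesPodOperator(
        task_id="run_gcloud",
        name="run-gcloud",
        namespace="default",
        # Image ships its own gcloud, independent of the worker's version
        image="gcr.io/google.com/cloudsdktool/cloud-sdk:slim",
        cmds=["gcloud"],
        arguments=["version"],
    )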

Is there a way to create a Python script on Google Cloud Function that Downloads a file from a Bucket to your local computer?

Currently, I know of two ways to download a file from a bucket to your computer:
1) Manually go to the bucket and click download
2) Use gsutil
Is there a way to do this in a program running in Google Cloud Functions? You can't execute gsutil in a Python script. I also tried this:
with open("C:\\", "wb") as file_obj:
    blob.download_to_file(file_obj)
but I believe that looks for a directory on Google Cloud. Please help!
The code you are using is for downloading files to your local machine; this is covered in this document. However, as you mention, it looks for the local path on the machine executing the Cloud Function.
I would suggest you create a cron job that downloads the files to your local machine with gsutil; doing this through a Cloud Function is not going to be possible.
Hope you find this useful.
You can't directly achieve what you want: the Cloud Function would have to be able to reach your local computer in order to copy the file onto it.
The common way to send a file to a computer is the FTP protocol. So you could install an FTP server on your computer and set up your function to read from your bucket and then send the file to your FTP server (you have to get your public IP and make sure the firewall rules/routers are configured for this, ...). It's not the easiest way.
A gsutil rsync command works perfectly for this; use a scheduled task on Windows if you want to cron it.
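For instance, a small local script like the one below could be scheduled with cron or the Windows Task Scheduler; the bucket path and local folder are placeholders, and gsutil is assumed to be installed and authenticated on the local machine.
import subprocess

# Mirror the bucket folder into a local directory
subprocess.run(
    ["gsutil", "-m", "rsync", "-r",
     "gs://some_bucket/folder", r"C:\local\folder"],
    check=True,
)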

How to use gcloud commands programmatically via Python

The Google documentation is a little generic on this topic and I find it hard to navigate the different APIs and terms they're using, so I'm wondering if someone could point me in the right direction.
I'm looking for a way to call the gcloud command directly from Python. I've installed gcloud in my Python environment and as an example to follow, I'd like to know how to do the following from Python:
gcloud compute copy-files [Source directory or file name] [destination directory of file name]
You should check out gcloud:
https://pypi.python.org/pypi/gcloud
There's nothing magic about uploading files to a Compute Engine VM. I ended up using paramiko to upload the files.
You can of course call gcloud from python directly and not care about the implementation details, or you can try to see what gcloud does:
Try running gcloud compute copy-files with the --dry-run flag. That will expose the scp command it uses underneath, and with what arguments. Knowing which scp parameters you need, you can recreate the call programmatically using paramiko_scp in Python. More information on this here: How to scp in python?
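A rough sketch of that approach might look like this; the host, username, key path and file names are placeholders, and it needs paramiko plus the third-party scp package installed.
import paramiko
from scp import SCPClient

# Open an SSH connection to the VM using the Compute Engine SSH key
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(
    "VM_EXTERNAL_IP",
    username="my-user",
    key_filename="/home/my-user/.ssh/google_compute_engine",
)

# Copy a local file to the remote directory over SCP
with SCPClient(ssh.get_transport()) as scp:
    scp.put("local_file.txt", "/remote/destination/")
ssh.close()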
You can use the subprocess.run function in Python to execute commands from your terminal/shell. That is what I have done to run gcloud commands from Python, rather than using the Python SDK.
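For example, something like this; the instance name, zone and paths are placeholders, and the gcloud CLI is assumed to be installed and authenticated on the machine running the script.
import subprocess

# Run the same command the question mentions, capturing its output
result = subprocess.run(
    ["gcloud", "compute", "copy-files", "./local_dir",
     "my-instance:/remote/dir", "--zone", "us-central1-a"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)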
