Unable to write files in a GCP bucket using gcsfuse - python

I have mounted a storage bucket on a VM using the command:
gcsfuse my-bucket /path/to/mount
After this I'm able to read files from the bucket in Python using pandas, but I'm not able to write files or create new folders. I have tried from Python and from the terminal using sudo, but I get the same error.
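For reference, the failing write looks roughly like this (a hedged sketch; the paths and file names are placeholders):
import pandas as pd

# Reading through the gcsfuse mount works fine
df = pd.read_csv("/path/to/mount/existing.csv")
# Writing back through the mount is what fails with a permission error
df.to_csv("/path/to/mount/output.csv", index=False)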
I have also tried using the key_file for the bucket:
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=Notebooks/xxxxxxxxxxxxxx10b3464a1aa9.json <BUCKET> <PATH>
It does not throw errors when I run the command, but I'm still not able to write to the bucket.
I have also tried:
gcloud auth login
But still have the same issue.

I ran into the same thing a while ago, which was really confusing. You have to set the correct access scope for the virtual machine so that anyone using the VM is able to call the storage API. The documentation shows that the default access scope for storage on a VM is read-only:
When you create a new Compute Engine instance, it is automatically
configured with the following access scopes:
Read-only access to Cloud Storage:
https://www.googleapis.com/auth/devstorage.read_only
All you have to do is change this scope so that you are also able to write to storage buckets from the VM. You can find an overview of different scopes here. To apply the new scope to your VM, you have to first shut it down. Then from your local machine execute the following command:
gcloud compute instances set-scopes INSTANCE_NAME \
    --scopes=storage-rw \
    --zone=ZONE
You can do the same thing from the Cloud Console if you go to the settings of your VM, scroll down, and choose "Set access for each API". You have the same options when you create the VM for the first time.

Related

How to simulate AWS services and environments locally using Python's moto?

Is it practically possible to simulate an AWS environment locally using Moto and Python?
I want to write an AWS Glue job that will fetch records from my local database and upload them to an S3 bucket for data quality checks, and later trigger a Lambda function for a cron-job run, using the Moto library and the moto.lock_glue decorator. Any suggestion or documentation would be highly appreciated, as I don't see much of a clue on this. Thank you in advance.
AFAIK, moto is meant to patch boto modules for testing.
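For illustration, a minimal sketch of what a moto-based test might look like (this assumes moto >= 5, which exposes the mock_aws decorator; the bucket and object names are hypothetical):
# Everything inside the decorated test runs against moto's in-memory fake AWS.
import boto3
from moto import mock_aws

@mock_aws
def test_upload_to_s3():
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="my-test-bucket")  # exists only inside the mock
    s3.put_object(Bucket="my-test-bucket", Key="data.csv", Body=b"a,b\n1,2\n")
    body = s3.get_object(Bucket="my-test-bucket", Key="data.csv")["Body"].read()
    assert body.startswith(b"a,b")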
I have experience working with LocalStack, a Docker container you can run locally that acts as a live service emulator for most AWS services (some are only available to paying users).
https://docs.localstack.cloud/getting-started/
You can see here which services are supported by the free version.
https://docs.localstack.cloud/user-guide/aws/feature-coverage/
In order to use it, you need to change the endpoint URL to point to the local service running in Docker, as sketched below.
As it's a Docker container, you can incorporate it into remote tests as well, e.g. if you're using Kubernetes or a similar orchestrator.
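For example, a hedged sketch of pointing boto3 at LocalStack instead of AWS (this assumes LocalStack is running locally on its default edge port 4566):
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # route calls to the local emulator
    region_name="us-east-1",
    aws_access_key_id="test",              # LocalStack accepts dummy credentials
    aws_secret_access_key="test",
)
s3.create_bucket(Bucket="local-bucket")
print(s3.list_buckets()["Buckets"])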

Connection killed Azure container

I am trying to access an Azure container to download some blobs with my Python code.
My code works perfectly on Windows, but when I execute it on my Debian VM I get this error message:
<azure.storage.blob._container_client.ContainerClient object at 0x7f0c51cafd10>
Killed
admin_bbt#vm-bbt-cegidToAZ:/lsbCodePythonCegidToAZ/fuzeo_bbt_vmLinux_csvToAZ$
The blob I am trying to access is not mine, but I do have the SAS key.
My code fails after this line:
container = ContainerClient.from_container_url(sas_url)
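For context, the surrounding download code is roughly the following (a hypothetical sketch; the SAS URL is a placeholder and the process gets killed somewhere after this call on the VM):
from azure.storage.blob import ContainerClient

sas_url = "https://<account>.blob.core.windows.net/<container>?<sas-token>"  # placeholder
container = ContainerClient.from_container_url(sas_url)
for blob in container.list_blobs():
    data = container.download_blob(blob.name).readall()
    print(blob.name, len(data))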
What I have tried:
move my VM to another location
open port 445 on my VM
install cifs-utils
Usually this issue arises when the VM has not been enabled for managed identities for Azure resources. These MS Docs helped me enable it successfully (MSDocs1, MSDocs2).
We also need to check the storage account's network access rules, which are as below:
Go to the storage account you want to secure.
Select the settings menu called Networking.
To deny access by default, choose to allow access from Selected networks. To allow traffic from all networks, choose to allow access from All networks.
Select Save to apply your changes.
Along with these settings changes, we need to ensure users can access blob storage, and we might need to add VNet integration.
Check this MS Docs page to understand Azure Storage firewall rules.
We can also use MSI (managed identity) to authenticate from the VM.
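For illustration, a minimal sketch of authenticating with the VM's managed identity via azure-identity (the account URL and container name are placeholders; this assumes the identity has been granted a blob data role on the storage account):
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()  # picks up the VM's managed identity
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
container = service.get_container_client("my-container")
for blob in container.list_blobs():
    print(blob.name)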

Run a gsutil command in a Google Cloud Function

I would like to run a gsutil command every x minutes as a cloud function. I tried the following:
# main.py
import os

def sync():
    line = "gsutil -m rsync -r gs://some_bucket/folder gs://other_bucket/other_folder"
    os.system(line)
While the Cloud Function gets triggered, the execution of the line does not work (i.e. the files are not copied from one bucket to another). However, it works fine when I run it locally in PyCharm or from cmd. What is different about Cloud Functions?
You can use Cloud Run for this. You have very few changes to make in your code.
Create a container with gsutil and Python installed, for example using gcr.io/google.com/cloudsdktool/cloud-sdk as the base image
Take care of the service account used when you deploy to Cloud Run; grant it the correct permissions to access your bucket
Let me know if you need more guidance
Cloud Functions server instances don't have gsutil installed. It works on your local machine because you do have it installed and configured there.
I suggest trying to find a way to do what you want with the Cloud Storage SDK for Python. Or figure out how to deploy gsutil with your function and how to configure and invoke it from your code, but that might be very difficult.
There's no straightforward option for that.
I think the best option for Cloud Functions is to use the google-cloud-storage Python library.
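For example, a hedged sketch of replacing the gsutil rsync call with the google-cloud-storage client inside the function (bucket and folder names follow the question; note this is a plain copy, not a full rsync, so it does not delete extra objects):
from google.cloud import storage

def sync(request):
    client = storage.Client()
    src = client.bucket("some_bucket")
    dst = client.bucket("other_bucket")
    for blob in client.list_blobs(src, prefix="folder/"):
        new_name = blob.name.replace("folder/", "other_folder/", 1)
        src.copy_blob(blob, dst, new_name)  # server-side copy, nothing downloaded
    return "done"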

Simple Google Cloud deployment: Copy Python files from Google Cloud repository to a Compute Engine VM

I'm implementing continuous integration and continuous delivery for a large enterprise data warehouse project.
All the code resides in Google Cloud Repository and I'm able to set up a Google Cloud Build trigger, so that every time files of a specific type (Python scripts) are pushed to the master branch, a Google Cloud Build run starts.
The Python scripts don't make up an app. They contain an ODBC connection string and a script to extract data from a source and store it as a CSV file. The Python scripts are to be executed on a Google Compute Engine VM instance with Airflow installed.
So the deployment of the Python scripts is as simple as can be: the .py files only have to be copied from the Google Cloud repository folder to a specific folder on the Google VM instance. There is not really a traditional build to run, as all the Python files are separate from each other and not part of an application.
I thought this would be really easy, but I have now spent several days trying to figure this out with no luck.
Google Cloud Platform provides several cloud builders, but as far as I can see none of them can do this simple task. Using gcloud also does not work: it can copy files, but only from the local PC to the VM, not from the source repository to the VM.
What I'm looking for is a YAML or JSON build config file to copy those Python files from source repository to Google Compute Engine VM Instance.
Hoping for some help here.
The files/folders in the Google Cloud repository aren't directly accessible (it's like a bare git repository); you need to first clone the repo, then copy the desired files/folders from the cloned repo to their destinations.
It might be possible to use a standard Fetching dependencies build step to clone the repo, but I'm not 100% certain of it in your case, since you're not actually doing a build:
steps:
- name: gcr.io/cloud-builders/git
  args: ['clone', 'https://github.com/GoogleCloudPlatform/cloud-builders']
If not you may need one (or more) custom build steps. From Creating Custom Build Steps:
A custom build step is a container image that the Cloud Build worker
VM pulls and runs with your source volume-mounted to /workspace.
Your custom build step can execute any script or binary inside the
container; as such, it can do anything a container can do.
Custom build steps are useful for:
Downloading source code or packages from external locations
...
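Putting that together, a hedged sketch of a cloudbuild.yaml for this copy-only deployment might look like the following. It assumes the build trigger already checks the repository out into /workspace (otherwise prepend the git clone step shown above), that the Cloud Build service account is allowed to SSH into the instance, and that INSTANCE_NAME, ZONE, the source folder and the destination path are placeholders you would replace:
steps:
- name: gcr.io/cloud-builders/gcloud
  args:
  - compute
  - scp
  - --recurse
  - --zone=ZONE
  - ./python_scripts/
  - INSTANCE_NAME:/home/airflow/scripts/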

Google Cloud Dataflow: Failed to write a file to temp location

I am building a beam pipeline on Google cloud dataflow.
I am getting an error that Cloud Dataflow does not have permission to write to a temp directory.
This is confusing since Dataflow clearly has the ability to write to the bucket; it created a staging folder.
Why would I be able to write to a staging folder, but not a temp folder?
I am running from within a Docker container on a Compute Engine instance. I am fully authenticated with my service account.
PROJECT=$(gcloud config list project --format "value(core.project)")
BUCKET=gs://$PROJECT-testing

python tests/prediction/run.py \
    --runner DataflowRunner \
    --project $PROJECT \
    --staging_location $BUCKET/staging \
    --temp_location $BUCKET/temp \
    --job_name $PROJECT-deepmeerkat \
    --setup_file tests/prediction/setup.py
EDIT
In response to @alex amato:
Does the bucket belong to the project or is it owned by another project?
Yes, when I go to the home screen for the project, this is one of four buckets listed. I commonly upload data and interact with other Google Cloud services (the Cloud Vision API) from this bucket.
Would you please provide the full error message?
"(8d8bc4d7fc4a50bd): Failed to write a file to temp location 'gs://api-project-773889352370-testing/temp/api-project-773889352370-deepmeerkat.1498771638.913123'. Please make sure that the bucket for this directory exists, and that the project under which the workflow is running has the necessary permissions to write to it."
"8d8bc4d7fc4a5f8f): Workflow failed. Causes: (8d8bc4d7fc4a526c): One or more access checks for temp location or staged files failed. Please refer to other error messages for details. For more information on security and permissions, please see https://cloud.google.com/dataflow/security-and-permissions."
Can you confirm that there isn't already an existing GCS object which matches the name of the GCS folder path you are trying to use?
Yes, there is no folder named temp in the bucket.
Could you please verify the permissions you have match the members you run as?
Bucket permissions have global admin, which matches my gcloud auth.
@chamikara was correct. Despite inheriting credentials from my service account, Cloud Dataflow needs its own credentials.
Can you also give access to the cloudservices account (<project-number>@developer.gserviceaccount.com), as mentioned in cloud.google.com/dataflow/security-and-permissions?
I ran into the same issue with a different cause: I had set an object retention policy, which prevents manual deletions. Given that renaming triggers a deletion, this error happened.
Therefore, if anyone runs into a similar issue, investigate your temp bucket's properties and potentially lift any retention policies.
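If you want to check from the command line, a hedged example (the bucket name is a placeholder):
gsutil retention get gs://YOUR_TEMP_BUCKET
# and, if appropriate, remove an unlocked policy:
gsutil retention clear gs://YOUR_TEMP_BUCKET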
I've got similar errors while moving from DirectRunner to DataflowRunner:
Staged package XXX.jar at location 'gs://YYY/staging/XXX.jar' is inaccessible.
After I played with the permissions, this is what I did:
in the Storage Browser, I clicked Edit Bucket Permissions (for the specific bucket) and added the right storage permission for the member ZZZ-compute@developer.gserviceaccount.com.
I hope this will save future time for other users as well.
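For reference, the equivalent grant from the command line might look like this (the member and bucket names follow the placeholders above):
gsutil iam ch \
  serviceAccount:ZZZ-compute@developer.gserviceaccount.com:roles/storage.objectAdmin \
  gs://YYY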
