Securely accessing storage via GCP cloud function? - python

I need to write a Cloud Function in GCP which responds to HTTP requests and has service-account access to GCP Cloud Storage. My function will receive a string and threshold parameters. It will retrieve a CSV file from Cloud Storage, compute the similarity between the supplied string and the entities in the CSV file, and return the entities that satisfy the threshold requirements.
From Google's Cloud Functions tutorials, I have yet to see anything that shows how to give a function Cloud Storage access, a service account for that access, etc.
Could anyone link a resource or otherwise explain how to get from A to B?

Let the magic happen!
In fact, nothing is magic. With most Google Cloud products, you have a service account that you can grant whatever permissions you want. On Cloud Functions, the default service account is the App Engine default service account, which follows the pattern <projectID>@appspot.gserviceaccount.com.
When you deploy a Cloud Function you can use a custom service account with the --service-account= parameter. It's safer because your Cloud Function can have its own service account with limited permissions (the App Engine default service account has the Project Editor role by default, which is far too broad!).
This service account is loaded automatically with your Cloud Function, and the Google Cloud auth libraries can access it via the metadata server. The credentials are taken from the runtime context; they are the default credentials of the environment.
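For example, a deployment with a custom service account might look like this (the function name, runtime, and service-account email are placeholders):
gcloud functions deploy my-function \
    --runtime=python310 \
    --trigger-http \
    --service-account=my-function-sa@my-project.iam.gserviceaccount.com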
As for your code, you can keep it as simple as this:
from google.cloud import storage

client = storage.Client()  # Use default credentials
bucket = client.get_bucket('myBucket')
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.size)
On your workstation, if you want to run the same code, you can use your own credentials by running the command gcloud auth application-default login. If you prefer to use a service account key file (which I strongly recommend against, but that's not the topic), you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS with the file path as its value.
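Both options as commands (the key-file path is a placeholder):
gcloud auth application-default login
# or, if you really must use a key file:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json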

Related

How can I resolve "HttpError: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401" error?

The problem:
I'm trying to read in a .gz JSON file that is stored in one of my project's Cloud Storage buckets using a Google Colab Python notebook, and I keep getting this error:
HttpError: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401
My code:
import gzip
import json

import gcsfs

fs = gcsfs.GCSFileSystem(project='my-project')
with fs.open('bucket/path.json.gz') as f:
    gz = gzip.GzipFile(fileobj=f)
    file_as_string = gz.read()
    json_a = json.loads(file_as_string)
I've tried all of these authentication methods and still get the same 401 error:
!gcloud auth login
!gcloud auth list
!gcloud projects list
!gcloud config set project 'myproject-id'
from google.colab import auth
auth.authenticate_user()
!gcloud config set account 'my GCP email'
!gcloud auth activate-service-account
!gcloud auth application-default login
!gsutil config
!gcloud config set pass_credentials_to_gsutil false
!gsutil config -a
I've also set my GCP IAM permissions to:
Editor
Owner
Storage Admin
Storage Object Admin
Storage Object Creator
Storage Object Viewer
Storage Transfer Admin
It's not entirely clear from your question but:
gcloud and Google's SDKs both use Google's identity/auth platform, but they don't share state. You usually (!) can't log in using gcloud and expect code using an SDK to be authenticated too.
@john-hanley correctly points out that one (often confusing) way to share state between gcloud and code using Google SDKs is to use gcloud auth application-default login. However, this only works because gcloud writes its state locally, and code using Google SDKs, when running as the same user on the same host, will be able to access this state. I think (!?) this won't work with browser-based Colab.
I'm unfamiliar with gcsfs.GCSFileSystem, but it is not a Google SDK. Unless its developers have been particularly thoughtful, it won't be able to leverage authentication done by the Google SDK using auth.authenticate_user().
So...
I think you should:
Ensure that your user account (you@gmail.com or whatever) has roles/storage.objectAdmin (or any predefined role that permits storage.objects.get).
Use google.colab.auth and auth.authenticate_user() to obtain credentials for the browser's logged-in user (i.e. you@gmail.com).
Use a Google Cloud Storage library, e.g. google-cloud-storage, to access the GCS object. The Google library can leverage the credentials obtained in the previous step.
Update
Here's an example.
NOTE: it uses the API Client Library rather than the Cloud Client Library, but the two are functionally equivalent.
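A minimal sketch of those steps in a Colab notebook, assuming placeholder bucket and object names:
from google.colab import auth
auth.authenticate_user()  # credentials for the browser's logged-in user

import gzip
import json

from googleapiclient.discovery import build

# build() picks up the credentials established by authenticate_user().
service = build('storage', 'v1')

# Media download: returns the object's raw (gzipped) bytes.
content = service.objects().get_media(
    bucket='my-bucket',
    object='path.json.gz',
).execute()

json_a = json.loads(gzip.decompress(content))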

How can I grant a Cloud Run service access to service account's credentials without the key file?

I'm developing a Cloud Run Service that accesses different Google APIs using a service account's secrets file, with the following Python 3 code:
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    SECRETS_FILE_PATH, scopes=SCOPES)
In order to deploy it, I upload the secrets file during the build/deploy process (via gcloud builds submit and gcloud run deploy commands).
How can I avoid uploading the secrets file like this?
Edit 1:
I think it is important to note that I need to impersonate user accounts from GSuite/Workspace (with domain wide delegation). The way I deal with this is by using the above credentials followed by:
delegated_credentials = credentials.with_subject(USER_EMAIL)
Using Secret Manager might help you, as you can manage the multiple secrets you have without storing them as files, as you are doing right now. I would recommend you take a look at this article here, so you can get more information on how to use it with Cloud Run and improve the way you manage your secrets.
In addition to that, as clarified in this similar case here, you have two options: use the default service account that comes with the service, or deploy another one with the Service Admin role. This way, you won't need to specify keys with variables, as clarified by a Google developer in this specific answer.
To improve security, the best way is to never use a service account key file, locally or on GCP (I wrote an article on this). To achieve this, Google Cloud services have an automatically loaded service account: either the default one or, when possible, a custom one.
On Cloud Run, the default service account is the Compute Engine default service account (I recommend you never use it: it has the Editor role on the project, which is far too broad!), or you can specify the service account to use with the --service-account= parameter.
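For example, a deploy command might look like this (the service name, image, and service-account email are placeholders):
gcloud run deploy my-service \
    --image=gcr.io/my-project/my-image \
    --service-account=my-sa@my-project.iam.gserviceaccount.com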
Then, in your code, simply use the ADC (Application Default Credentials) mechanism to get your credentials, like this in Python:
import google.auth
credentials, project_id = google.auth.default(scopes=SCOPES)
I've found one way to solve the problem.
First, as suggested by guillaume blaquiere's answer, I used the google.auth ADC mechanism:
import google.auth
credentials, project_id = google.auth.default(scopes=SCOPES)
However, as I need to impersonate GSuite (now Workspace) accounts, this method is not enough, as the credentials object it generates does not have the with_subject method. This led me to this similar post, and to a specific answer which shows a way to convert google.auth credentials into the Credentials object returned by service_account.Credentials.from_service_account_file. There was one problem with that solution: it seemed that an authentication scope was missing.
All I had to do was add the https://www.googleapis.com/auth/cloud-platform scope in the following places:
The SCOPES variable in the code
Google Admin > Security > API Controls > Set client ID and scope for the service account I am deploying with
At the OAuth Consent Screen of my project
After that, my Cloud Run service had access to credentials that could impersonate users' accounts without using key files.
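Putting it together, here is a minimal sketch of the conversion, assuming placeholder service-account and user emails, and assuming the runtime service account has roles/iam.serviceAccountTokenCreator on itself:
import google.auth
from google.auth import iam
from google.auth.transport.requests import Request
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
SA_EMAIL = 'my-runtime-sa@my-project.iam.gserviceaccount.com'  # placeholder
USER_EMAIL = 'user@example.com'  # Workspace user to impersonate (placeholder)
TOKEN_URI = 'https://oauth2.googleapis.com/token'

# Default credentials of the Cloud Run runtime (no key file involved).
source_credentials, project_id = google.auth.default(scopes=SCOPES)

# Sign the service account's JWT assertions via the IAM Credentials API
# instead of a local private key.
signer = iam.Signer(Request(), source_credentials, SA_EMAIL)

credentials = service_account.Credentials(
    signer, SA_EMAIL, TOKEN_URI, scopes=SCOPES)
delegated_credentials = credentials.with_subject(USER_EMAIL)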

How to authenticate Google APIs (Google Drive API) from Google Compute Engine and locally without downloading Service Account credentials?

Our company is working on processing data from Google Sheets (within Google Drive) from Google Cloud Platform and we are having some problems with the authentication.
There are two different places where we need to run code that makes API calls to Google Drive: within production in Google Compute Engine, and within development environments i.e. locally on our developers' laptops.
Our company is quite strict about credentials and does not allow the downloading of Service Account credential JSON keys (this is better practice and provides higher security). Seemingly all of the docs from GCP say to simply download the JSON key for a Service Account and use that. Or the Google APIs/Developers docs say to create an OAuth2 Client ID and download its key, like here.
They often use code like this:
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/sqlservice.admin']
SERVICE_ACCOUNT_FILE = '/path/to/service.json'

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
But we can't (or just don't want to) download our Service Account JSON keys, so we're stuck if we just follow the docs.
For the Google Compute Engine environment, we have been able to authenticate by using GCP Application Default Credentials (ADC), i.e. not explicitly specifying credentials in code and letting the client libraries "just work". This works great as long as the VM is created with the correct scope (https://www.googleapis.com/auth/drive) and the default Compute service account email is given permission to the Sheet that needs to be accessed, as explained in the docs here. You can do this like so:
from googleapiclient.discovery import build

service = build('sheets', 'v4')

SPREADSHEET_ID = "<sheet_id>"
RANGE_NAME = "A1:A2"

s = service.spreadsheets().values().get(
    spreadsheetId=SPREADSHEET_ID,
    range=RANGE_NAME, majorDimension="COLUMNS"
).execute()
However, how do we do this for development, i.e. locally on our developers' laptops? Again, without downloading any JSON keys, and preferably with the most “just works” approach possible?
Usually we use gcloud auth application-default login to create default application credentials that the Google client libraries use and that "just work", for instance with Google Storage. However, this doesn't work for Google APIs outside of GCP, like the Google Drive API (service = build('sheets', 'v4')), which fails with the error "Request had insufficient authentication scopes.". We then tried all kinds of solutions, like:
credentials, project_id = google.auth.default(scopes=["https://www.googleapis.com/auth/drive"])
and
credentials, project_id = google.auth.default()
credentials = google_auth_oauthlib.get_user_credentials(
    ["https://www.googleapis.com/auth/drive"],
    credentials._client_id, credentials._client_secret)
and more...
All of which give a myriad of errors/issues we can't get past when trying to authenticate to the Google Drive API :(
Any thoughts?
One method for making authentication from development environments easy is to use Service Account impersonation.
Here is a blog post about using service account impersonation, including the benefits of doing this. @johnhanley (who wrote the blog post) is a great guy and has lots of very informative answers on SO too!
To have your local machine authenticate for the Google Drive API, you will need to create default application credentials on your local machine that impersonate a Service Account, applying the scopes needed for the APIs you want to access.
To be able to impersonate a Service Account your user must have the role roles/iam.serviceAccountTokenCreator. This role can be applied to an entire project or to an individual Service Account.
You can use gcloud to do this:
gcloud iam service-accounts add-iam-policy-binding [COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL] \
    --member user:[USER_EMAIL] \
    --role roles/iam.serviceAccountTokenCreator
Once this is done create the local credentials:
gcloud auth application-default login \
    --scopes=openid,https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/accounts.reauth \
    --impersonate-service-account=[COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL]
This will solve the scopes error you got. The extra scopes added beyond the Drive API scope are the default scopes that gcloud auth application-default login applies, and they are needed.
If you apply scopes without impersonation you will get an error like this when trying to authenticate:
HttpError: <HttpError 403 when requesting https://sheets.googleapis.com/v4/spreadsheets?fields=spreadsheetId&alt=json returned "Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the sheets.googleapis.com. We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.">
Once you have set up the credentials you can use the same code that is run on Google Compute Engine on your local machine :)
Note: it is also possible to set the impersonation for all gcloud commands:
gcloud config set auth/impersonate_service_account [COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL]
Creating default application credentials on your local machine by impersonating a service account is a slick way of authenticating development code. It means the code will have exactly the same permissions as the Service Account it impersonates. If this is the same Service Account that will run the code in production, you know that code in development runs the same as in production. It also means that you never have to create or download any Service Account keys.

Authenticating with google cloud sdk without service account

Running from a local Jupyter notebook.
I am trying to do the very simple task of downloading a file from a GCP storage bucket using the following code:
from google.cloud import storage
# Initialise a client
storage_client = storage.Client("gcpkey.json")
# Create a bucket object for our bucket
bucket = storage_client.get_bucket("gcp_bucket")
# Create a blob object from the filepath
blob = bucket.blob("file_in_bucket_I_want.file")
# Download the file to a destination
blob.download_to_filename("destination_file_name")
Importantly, I want to use my end-user account and cannot use a service account. I am finding the Google docs incredibly confusing. Could someone please tell me where to get the GCP JSON key, and whether it is as simple as the code snippet above or whether I have to add some intermediary steps?
When I follow the linked docs I get sent to an OAuth portal, and when I log in through Google I get this error message:
This app isn't verified
This app hasn't been verified by Google yet. Only proceed if you know and trust the developer.
If you’re the developer, submit a verification request to remove this screen. Learn more
The easiest would be to just run
gcloud auth application-default login
And then in the script just:
storage_client = storage.Client()
And it will get the credentials from the environment.
Alternatively, you can follow this doc to go through the OAuth consent screen and generate a client secret.
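A minimal sketch of that flow, assuming a client-secret file downloaded from the consent screen (the paths, project, and scope are placeholders):
from google.cloud import storage
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ['https://www.googleapis.com/auth/devstorage.read_only']

# Opens a browser window for the end-user login; no service account involved.
flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', SCOPES)
credentials = flow.run_local_server(port=0)

storage_client = storage.Client(project='my-project', credentials=credentials)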

Use Python Google Storage Client without credentials

I am using the Python Google Storage Client with a bucket that has public read/write access. (I know this is usually a terrible idea, but I have a rare use case where it is fine.)
When I try to retrieve some files, I get a DefaultCredentialsError.
from google.cloud import storage

BUCKET_NAME = 'my-public-bucket-name'

storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)

def list_blobs(prefix, delimiter=None):
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    print('Blobs:')
    for blob in blobs:
        print(blob.name)
The specific error reads:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
That page suggests using OAuth or other tokens, but I shouldn't need these since my bucket is public? I can make an HTTP request to the bucket in Chrome and receive data.
How should I get around this issue? Can I provide default or null credentials?
The default for a storage client with no parameters is to use environment credentials (e.g. authenticate with the gcloud tools first). If you want to use a client with no credentials you have to use
the create_anonymous_client method, which lets you access resources available to allUsers.
Be careful, though, which APIs you use; not all of them support anonymous credentials. E.g. instead of client.get_bucket('my-bucket') you have to use client.bucket(bucket_name='my-bucket').
Also note that any permissions error seems to return a generic ValueError: Anonymous credentials cannot be refreshed. For example, you'll get it if you try to overwrite an existing file while only having read/write permissions.
So a full example of uploading a file to a publicly accessible bucket is:
from google.cloud import storage
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')
blob = bucket.blob('my-file')
blob.upload_from_filename('my-local-file')
From "Cloud Storage Authentication":
Most of the operations you perform in Cloud Storage must be authenticated. The only exceptions are operations on objects that allow anonymous access. Objects are anonymously accessible if the allUsers group has READ permission. The allUsers group includes anyone on the Internet.
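Similarly, a download only needs the object to be readable by allUsers (the names below are placeholders):
from google.cloud import storage

client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')
blob = bucket.blob('my-file')
blob.download_to_filename('my-local-copy')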
