Running from a local Jupyter notebook.
I am trying to do the very simple task of downloading a file from a GCP storage bucket using the following code:
from google.cloud import storage
# Initialise a client
storage_client = storage.Client("gcpkey.json")
# Create a bucket object for our bucket
bucket = storage_client.get_bucket("gcp_bucket")
# Create a blob object from the filepath
blob = bucket.blob("file_in_bucket_I_want.file")
# Download the file to a destination
blob.download_to_filename("destination_file_name")
Importantly, I want to use my end-user account and cannot use a service account. I am finding the Google docs incredibly confusing. Could someone please tell me where to get the GCP JSON key, and whether it is as simple as the code snippet above or whether I have to add some intermediary steps?
When I follow the linked docs I get sent to an OAuth portal and when I login through google I get this error message:
This app isn't verified
This app hasn't been verified by Google yet. Only proceed if you know and trust the developer.
If you’re the developer, submit a verification request to remove this screen. Learn more
The easiest would be to just run
gcloud auth application-default login
And then in the script just:
storage_client = storage.Client()
And it will get the credentials from the environment.
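Putting it together, a minimal sketch (reusing the bucket and file names from the question):
from google.cloud import storage

# Uses Application Default Credentials created by
# `gcloud auth application-default login`; no key file is needed.
storage_client = storage.Client()

bucket = storage_client.get_bucket("gcp_bucket")
blob = bucket.blob("file_in_bucket_I_want.file")
blob.download_to_filename("destination_file_name")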
Alternatively, you can follow this doc to go through the OAuth consent screen and generate a client secret.
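If you go that route, a sketch of the flow with the generated client secret might look like this (assuming the google-auth-oauthlib package; the file path, scope, and project ID below are placeholders):
from google.cloud import storage
from google_auth_oauthlib.flow import InstalledAppFlow

# Run the OAuth 2.0 installed-app flow with the downloaded client secret.
flow = InstalledAppFlow.from_client_secrets_file(
    "client_secret.json",  # placeholder path to the generated client secret
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"],
)
credentials = flow.run_local_server(port=0)

# With user credentials the project must be given explicitly.
storage_client = storage.Client(project="my-project", credentials=credentials)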
Related
The problem:
I'm trying to read in a .gz JSON file that is stored in one of my project's Cloud Storage buckets using a Google Colab Python notebook, and I keep getting this error:
HttpError: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401
My code:
import gzip
import json

import gcsfs

fs = gcsfs.GCSFileSystem(project='my-project')
with fs.open('bucket/path.json.gz') as f:
    gz = gzip.GzipFile(fileobj=f)
    file_as_string = gz.read()
json_a = json.loads(file_as_string)
I've tried all of these authentication methods and still get the same 401 error:
!gcloud auth login
!gcloud auth list
!gcloud projects list
!gcloud config set project 'myproject-id'
from google.colab import auth
auth.authenticate_user()
!gcloud config set account 'my GCP email'
!gcloud auth activate-service-account
!gcloud auth application-default login
!gsutil config
!gcloud config set pass_credentials_to_gsutil false
!gsutil config -a
I've also set my GCP IAM permissions to:
Editor
Owner
Storage Admin
Storage Object Admin
Storage Object Creator
Storage Object Viewer
Storage Transfer Admin
It's not entirely clear from your question but:
gcloud and the Google SDKs both use Google's identity/auth platform but they don't share state. You usually (!) can't log in using gcloud and expect code using an SDK to be authenticated too.
@john-hanley correctly points out that one (often confusing) way to share state between gcloud and code using Google SDKs is to use gcloud auth application-default login. However, this only works because gcloud writes its state locally; code using Google SDKs, when running as the same user on the same host, will be able to access this state. I think (!?) this won't work with browser-based Colab.
I'm unfamiliar with gcsfs.GCSFileSystem but it is not a Google SDK. Unless its developers have been particularly thoughtful, it won't be able to leverage authentication done by the Google SDK using auth.authenticate_user().
So...
I think you should:
Ensure that your user account (you@gmail.com or whatever) has roles/storage.objectAdmin (or any predefined role that permits storage.objects.get).
Use google.colab.auth and auth.authenticate_user() to obtain credentials for the browser's logged-in user (i.e. you@gmail.com).
Use a Google Cloud Storage library, e.g. google-cloud-storage, to access the GCS object (as shown in the sketch below). The Google library can leverage the credentials obtained in the previous step.
Update
Here's an example.
NOTE: it uses the API Client Library rather than the Cloud Client Library, but these are functionally equivalent.
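A minimal sketch along those lines (the bucket and object names are placeholders):
import gzip
import json

from google.colab import auth
from googleapiclient.discovery import build

# Obtain credentials for the user logged in to the browser.
auth.authenticate_user()

# Build a client for the Cloud Storage JSON API (the API Client Library).
service = build("storage", "v1")

# storage.objects.get as a media download returns the object's bytes.
data = service.objects().get_media(
    bucket="my-bucket",
    object="path.json.gz",
).execute()

json_a = json.loads(gzip.decompress(data))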
I am trying to access BigQuery from Python code in a Jupyter notebook run on a local machine, so I installed the Google Cloud API packages on my laptop.
I need to pass the OAuth2 authentication. But unfortunately, I only have a user account for our BigQuery. I do not have a service account or application credentials, nor do I have the permissions to create them. I am only allowed to work with a user account.
When running the bigquery.Client() function, it appears to look for application credentials via the environment variable GOOGLE_APPLICATION_CREDENTIALS. But this, it seems, points to my non-existent application credentials.
I cannot find any other way to connect using user-account authentication, which I find extremely weird because:
The Google API for the R language works simply with user authentication. Parallel code in R (it has a different API) just works!
I run the code from the DataSpell IDE. In the IDE I have created a database resource connection to BigQuery (with my user authentication). There I am able to open a console for the database and run SQL queries with no problem. I have attached the BigQuery session to my Python notebook, and I can see my notebook attached to the BigQuery session in the services pane. But I am still missing something in order to access a valid running connection in the Python code. (I do not know how to get a Python object representing a valid connected client.)
I have been reading manuals from Google and looking for code examples for hours... Alas, I cannot find any description of connecting a client using a user account from my notebook.
Please, can someone help?
You can use the pydata-google-auth library to authenticate with a user account. Its get_user_credentials function loads credentials from a cache on disk or initiates an OAuth 2.0 flow if they are not found. Note that this is not the recommended way to authenticate.
import pandas_gbq
import pydata_google_auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)

df = pandas_gbq.read_gbq(
    "SELECT my_col FROM `my_dataset.my_table`",
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
)
The recommended way to authenticate is to contact your GCP administrator and ask them to create a key for your account, following these instructions.
Then you can use this code to set up the authentication with the key that you have:
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    '/path/to/key.json')
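You can then pass these credentials to the client explicitly, for example (a sketch; the key file path is a placeholder):
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    '/path/to/key.json')  # placeholder path

# Pass the credentials explicitly instead of relying on the
# GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = bigquery.Client(credentials=credentials,
                         project=credentials.project_id)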
You can see more of the documentation here.
I need to write a cloud function in GCP which responds to HTTP requests and has service account access to GCP Cloud Storage. My function will receive a string and threshold parameters. It will retrieve a CSV file from Cloud Storage, compute the similarity between the supplied text string and the entities in the CSV file, and return the entities that satisfy the threshold requirements.
From Google's Cloud Functions tutorials, I have yet to see anything that shows how to give a function Cloud Storage access, a service account for access therein, etc.
Could anyone link a resource or otherwise explain how to get from A to B?
Let the magic happen!
In fact, nothing is magic. With most Google Cloud products, you have a service account to which you can grant the permissions that you want. On Cloud Functions, the default service account is the App Engine default service account, with the pattern <projectID>@appspot.gserviceaccount.com.
When you deploy a Cloud Function you can use a custom service account with the parameter --service-account=. It's safer because your Cloud Function can have its own service account with limited permissions (the App Engine default service account has the Project Editor role by default, which is far too broad!).
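For example, a deployment sketch (the function name, runtime, and service account email are placeholders):
gcloud functions deploy my-function \
  --runtime=python310 \
  --trigger-http \
  --service-account=my-sa@my-project.iam.gserviceaccount.com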
So, this service account is loaded automatically with your Cloud Function, and the Google Cloud auth libraries can access it via the metadata server. The credentials are taken from the runtime context; they are the default credentials of the environment.
As for your code, keep it as simple as this:
from google.cloud import storage

client = storage.Client()  # Uses the default credentials

bucket = client.get_bucket('myBucket')
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.size)
On your workstation, if you want to execute the same code, you can use your own credentials by running the command gcloud auth application-default login. If you prefer using a service account key file (which I strongly recommend against, but that's not the topic), you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS with the file path as its value.
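For example (the key file path is a placeholder):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"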
Our company is working on processing data from Google Sheets (within Google Drive) from Google Cloud Platform and we are having some problems with the authentication.
There are two different places where we need to run code that makes API calls to Google Drive: within production in Google Compute Engine, and within development environments i.e. locally on our developers' laptops.
Our company is quite strict about credentials and does not allow downloading Service Account credential JSON keys (this is better practice and provides higher security). Seemingly all of the GCP docs say to simply download the JSON key for a Service Account and use that, or the Google APIs/Developers docs say to create an OAuth2 Client ID and download its key, like here.
They often use code like this:
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/sqlservice.admin']
SERVICE_ACCOUNT_FILE = '/path/to/service.json'

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
But we can't (or just don't want to) download our Service Account JSON keys, so we're stuck if we just follow the docs.
For the Google Compute Engine environment we have been able to authenticate by using GCP Application Default Credentials (ADCs), i.e. not explicitly specifying credentials in code and letting the client libraries "just work". This works great as long as one ensures that the VM is created with the correct scope (https://www.googleapis.com/auth/drive) and that the default compute Service Account email is given permission to the Sheet that needs to be accessed; this is explained in the docs here. You can do this like so:
from googleapiclient.discovery import build

service = build('sheets', 'v4')

SPREADSHEET_ID = "<sheet_id>"
RANGE_NAME = "A1:A2"

s = service.spreadsheets().values().get(
    spreadsheetId=SPREADSHEET_ID,
    range=RANGE_NAME, majorDimension="COLUMNS"
).execute()
However, how do we do this for development, i.e. locally on our developers' laptops? Again, without downloading any JSON keys, and preferably with the most “just works” approach possible?
Usually we use gcloud auth application-default login to create default application credentials that the Google client libraries use and that "just work", such as for Google Storage. However, this doesn't work for Google APIs outside of GCP, like the Google Drive API: service = build('sheets', 'v4') fails with the error "Request had insufficient authentication scopes.". Then we tried all kinds of solutions, like:
credentials, project_id = google.auth.default(scopes=["https://www.googleapis.com/auth/drive"])
and
credentials, project_id = google.auth.default()
credentials = google_auth_oauthlib.get_user_credentials(
["https://www.googleapis.com/auth/drive"], credentials._client_id, credentials._client_secret)
)
and more...
All of which give a myriad of errors/issues we can't get past when trying to authenticate to the Google Drive API :(
Any thoughts?
One method for making the authentication from development environments easy is to use Service Account impersonation.
Here is a blog about using Service Account impersonation, including the benefits of doing this. @johnhanley (who wrote the blog post) is a great guy and has lots of very informative answers on SO too!
To be able to have your local machine authenticate for Google Drive API you will need to create default application credentials on your local machine that impersonates a Service Account and apply the scopes needed for the APIs you want to access.
To be able to impersonate a Service Account your user must have the role roles/iam.serviceAccountTokenCreator. This role can be applied to an entire project or to an individual Service Account.
You can use gcloud to do this:
gcloud iam service-accounts add-iam-policy-binding [COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL] \
--member user:[USER_EMAIL] \
--role roles/iam.serviceAccountTokenCreator
Once this is done create the local credentials:
gcloud auth application-default login \
--scopes=openid,https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/accounts.reauth \
--impersonate-service-account=[COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL]
This will solve the scopes error you got. The three extra scopes added beyond the Drive API scope are the default scopes that gcloud auth application-default login applies and are needed.
If you apply scopes without impersonation you will get an error like this when trying to authenticate:
HttpError: <HttpError 403 when requesting https://sheets.googleapis.com/v4/spreadsheets?fields=spreadsheetId&alt=json returned "Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the sheets.googleapis.com. We recommend configuring the billing/quota_project setting in gcloud or using a service account through the auth/impersonate_service_account setting. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.">
Once you have set up the credentials you can use the same code that is run on Google Compute Engine on your local machine :)
Note: it is also possible to set the impersonation for all gcloud commands:
gcloud config set auth/impersonate_service_account [COMPUTE_SERVICE_ACCOUNT_FULL_EMAIL]
Creating default application credentials on your local machine by impersonating a Service Account is a slick way of authenticating development code. It means that the code will have exactly the same permissions as the Service Account it is impersonating. If this is the same Service Account that will run the code in production, you know that code in development runs the same as in production. It also means that you never have to create or download any Service Account keys.
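As an aside, the impersonation can also be set up in code rather than in gcloud; google-auth supports this directly. A sketch (the service account email is a placeholder):
import google.auth
from google.auth import impersonated_credentials
from googleapiclient.discovery import build

# Start from whatever default credentials are available locally.
source_credentials, _ = google.auth.default()

# Impersonate the Service Account with the Drive scope.
credentials = impersonated_credentials.Credentials(
    source_credentials=source_credentials,
    target_principal='my-sa@my-project.iam.gserviceaccount.com',
    target_scopes=['https://www.googleapis.com/auth/drive'],
)

service = build('sheets', 'v4', credentials=credentials)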
The following can be read on developers.google.com:
With some Google APIs, you can make authorized API calls using a signed JWT directly as a bearer token, rather than an OAuth 2.0 access token. When this is possible, you can avoid having to make a network request to Google's authorization server before making an API call.
If the API you want to call has a service definition published in the Google APIs GitHub repository, you can make authorized API calls using a JWT instead of an access token
My question is: is it possible to make authorized calls to the Google Drive API using a signed JWT? I searched the Google APIs GitHub repo but didn't find anything. If yes, can someone give me a link to a page that shows how to do it (preferably in Python)?
You want to use the Drive API using a service account.
You want to achieve this using Python.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
In your situation, how about using google-api-python-client?
Usage:
1. Create service account:
In this case, please create the service account and download a JSON file. Ref
2. Install "google-api-python-client"
In order to use the following sample script, please install "google-api-python-client".
$ pip install google-api-python-client
3. Sample script:
Before you run the script, please set the variable of SERVICE_ACCOUNT_FILE.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SERVICE_ACCOUNT_FILE = '###'  # Please set the file path of the service account credentials JSON.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('drive', 'v3', credentials=credentials)

fileList = service.files().list().execute()
print(fileList)
When this script is run, the file list in the Google Drive of the service account is retrieved using the Files: list method of the Drive API.
Note:
The Google Drive of the service account is different from your own Google Drive. If you want to retrieve a file in your Google Drive using the service account, you need to share the file with the service account. Please be careful about this.
References:
google-api-python-client
Creating a service account
Files: list
If I misunderstood your question and this was not the direction you want, I apologize.
Currently, the Google Drive API doesn't allow the use of a JWT token in the Authorization Bearer header. You can see the list of allowed APIs here: https://github.com/googleapis/googleapis/tree/master/google. Unfortunately, Drive is not in the list, and an extra step to get an access_token for the Drive API is required.
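That extra step looks roughly like this (a sketch; the key file path is a placeholder):
import google.auth.transport.requests
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    '/path/to/key.json',  # placeholder path
    scopes=['https://www.googleapis.com/auth/drive.metadata.readonly'])

# The extra step: the signed JWT is exchanged for an OAuth 2.0 access token
# via a network request to Google's authorization server.
credentials.refresh(google.auth.transport.requests.Request())
print(credentials.token)  # a Bearer token usable against the Drive API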