(There are a lot of similar threads here but unfortunately I couldn't find the answer to my error anywhere here or on Google)
I'm trying to query a federated table in BigQuery which is pointing to a spreadsheet in Drive.
I've run the following command to create default application credentials for gcloud:
$ gcloud auth application-default login
But this doesn't include the Drive scope, so I'm getting the following error message (which makes sense): Forbidden: 403 Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
Then I've tried to auth with explicit Drive scope:
$ gcloud auth application-default login --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/bigquery
After that I'm getting the following error when I try to use the BigQuery Python API:
"Forbidden: 403 Access Denied: BigQuery BigQuery: Access Not Configured. Drive API has not been used in project 764086051850 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/drive.googleapis.com/overview?project=764086051850 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
The project number above does not exist in our organisation and the provided link leads to a page which says:
The API "drive.googleapis.com" doesn't exist or you don't have permission to access it
The Drive API is definitely enabled for the default project, so the error message doesn't make much sense. I can also query the table from the terminal using the bq query command.
I'm currently out of ideas on how to debug this further. Any suggestions?
Configuration:
Google Cloud SDK 187.0.0
Python 2.7
google-cloud 0.27.0
google-cloud-bigquery 0.29.0
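One way to investigate the unfamiliar project number is to look at the application default credentials file that gcloud wrote. The following is a minimal sketch, not a definitive diagnosis. It assumes the file is in the usual Linux/macOS location and that OAuth client IDs start with the number of the project that owns the client. summarize_adc is a hypothetical helper name, not part of any Google library.

```python
import json
import os

# Assumed default location on Linux/macOS; on Windows the file lives under %APPDATA%\gcloud.
DEFAULT_ADC_PATH = os.path.expanduser(
    "~/.config/gcloud/application_default_credentials.json"
)


def summarize_adc(path=DEFAULT_ADC_PATH):
    """Return the non-secret fields of an ADC file that help with debugging."""
    with open(path) as handle:
        info = json.load(handle)
    client_id = info.get("client_id", "")
    return {
        "type": info.get("type"),
        # Assumption: client IDs look like "<project number>-<hash>.apps.googleusercontent.com".
        "client_project_number": client_id.split("-", 1)[0] or None,
        "quota_project_id": info.get("quota_project_id"),
    }
```

If client_project_number matches the number in the error message, that would suggest the Drive API check is being made against the project that owns the OAuth client rather than against your own project.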
There might be issues when using the default credentials. However, you can use a service account, save the credentials in a JSON file and add the necessary scopes. I did a quick test and this code worked for me:
from google.cloud import bigquery
from google.oauth2.service_account import Credentials

scopes = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive'
)
credentials = Credentials.from_service_account_file('/path/to/credentials.json')
credentials = credentials.with_scopes(scopes)
client = bigquery.Client(credentials=credentials)

query = "SELECT * FROM dataset.federated_table LIMIT 5"
query_job = client.query(query)
rows = query_job.result()
for row in rows:
    print(row)
If you get a 404 Not Found error, it is because you need to share the spreadsheet with the service account (view permission is enough).
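To find the address the spreadsheet has to be shared with, you can read it from the key file. This is a small sketch, assuming a standard service account JSON key, which stores the address under client_email. service_account_email_from_key is a hypothetical helper name.

```python
import json


def service_account_email_from_key(path):
    """Return the service account address stored in a JSON key file."""
    with open(path) as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file: {}".format(path))
    return key["client_email"]
```

The returned address is the one to add in the spreadsheet's sharing dialog.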
Related
I am trying to build an Airflow DAG (on Cloud Composer) that reads emails from Gmail, using the Google API Python client.
I would like to avoid the use of JSON files for Service Accounts, and therefore I am trying to take advantage of Workload Identity. Therefore, I performed the following steps:
Created a service account (my-service-account@my-project.iam.gserviceaccount.com) that will act on behalf of the Gmail user my-email@my-domain.com.
Granted the Cloud Composer service account the roles/iam.serviceAccountTokenCreator role on that service account.
Delegated domain-wide authority to the service account with the scope 'https://www.googleapis.com/auth/gmail.readonly', so that my-service-account@my-project.iam.gserviceaccount.com is authorized to access the emails of my-email@my-domain.com.
Now I'm trying to use the Google API Python client to instantiate a Gmail service and use it to search the inbox of my-email@my-domain.com. Here's the code:
import logging

import google.auth
import google.auth.impersonated_credentials
from googleapiclient.discovery import build

SERVICE_ACCOUNT = 'my-service-account@my-project.iam.gserviceaccount.com'
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

# Application Default Credentials (the Cloud Composer service account).
credentials, project_id = google.auth.default()
logging.info(f'Obtained application default credentials for project {project_id}.')

# Short-lived credentials for the target service account.
impersonated_credentials = google.auth.impersonated_credentials.Credentials(
    source_credentials=credentials,
    target_principal=SERVICE_ACCOUNT,
    target_scopes=SCOPES,
)
logging.info(f'Obtained impersonated credentials for {SERVICE_ACCOUNT}')

service = build(
    serviceName='gmail',
    version='v1',
    credentials=impersonated_credentials,
    cache_discovery=False,
)
So initially, the code obtains the Application Default Credentials (those of Cloud Composer) and then uses them to impersonate the my-service-account@my-project.iam.gserviceaccount.com service account. Finally, it uses the returned credentials to build the Gmail service.
When attempting to run a query:
results = service.users().messages().list(userId='me', q='from: someEmail@outlook.com').execute()
I get the following error:
[2022-11-14, 18:23:47 UTC] {standard_task_runner.py:93} ERROR - Failed to execute job 604219 for task test (<HttpError 400 when requesting https://gmail.googleapis.com/gmail/v1/users/me/messages?q=from%3A+someEmail%40outlook.com&alt=json returned "Precondition check failed.". Details: "Precondition check failed.">; 30352)
Any clue what I might be missing here? I've found a few similar questions but apparently they all use Service Account JSON files, which is clearly not the case here.
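One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the impersonated credentials above never name the mailbox user on whose behalf the service account should act, and domain-wide delegation normally expects that user in the sub claim of the signed token request. Below is a minimal sketch of the claim set used in Google's OAuth 2.0 service account flow. delegation_claims is a hypothetical helper, and signing the claims (for example through the IAM Credentials API) and exchanging them for an access token is left out.

```python
import time

TOKEN_URI = "https://oauth2.googleapis.com/token"


def delegation_claims(service_account, user_email, scopes, lifetime=3600, now=None):
    """Build the JWT claim set for a domain-wide delegation token request."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": service_account,       # the service account that holds the delegation
        "sub": user_email,            # the user whose mailbox is accessed
        "scope": " ".join(scopes),    # space-delimited list of scopes
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + lifetime,  # at most one hour after iat
    }
```

With userId='me', the Gmail API resolves the mailbox from the token's subject, which is why a token without a user subject could plausibly fail the precondition check.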
To create a default bigquery client I use:
from google.cloud import bigquery
client = bigquery.Client()
This uses the (default) credentials available in the environment.
But how can I see which (default) service account is used?
While you can interrogate the credentials directly (be it json keys, metadata server, etc), I have occasionally found it valuable to simply query bigquery using the SESSION_USER() function.
Something quick like this should suffice:
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query("SELECT SESSION_USER() as whoami")
results = query_job.result()
for row in results:
    print("i am {}".format(row.whoami))
This led me in the right direction:
Google BigQuery Python Client using the wrong credentials
To see the service-account used you can do:
client._credentials.service_account_email
However:
The statement above works when you run it in a Jupyter notebook (in Vertex AI), but when you run print(client._credentials.service_account_email) in a Cloud Function, it just logs 'default' to Cloud Logging. The default service account for a Cloud Function should be <project_id>@appspot.gserviceaccount.com.
This will also give you the wrong answer:
client.get_service_account_email()
The call to client.get_service_account_email() does not return the credential's service account email address. Instead, it returns the BigQuery service account email address used for KMS encryption/decryption.
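The 'default' value appears to be a placeholder: as far as I can tell, credentials obtained from the metadata server only learn the real address once they have been refreshed. This is a sketch under that assumption. resolved_service_account_email is a hypothetical helper, and the commented usage needs the google-auth package and a Google Cloud runtime.

```python
def resolved_service_account_email(credentials, request=None):
    """Return the credentials' service account address, refreshing if it is still the alias."""
    email = getattr(credentials, "service_account_email", None)
    if email == "default" and request is not None:
        # Refreshing makes the credentials fetch their details from the metadata server.
        credentials.refresh(request)
        email = getattr(credentials, "service_account_email", None)
    return email


# Usage on Cloud Functions or Compute Engine (not executed here):
#   import google.auth
#   from google.auth.transport.requests import Request
#   credentials, _ = google.auth.default()
#   print(resolved_service_account_email(credentials, Request()))
```

Credentials that have no service_account_email attribute at all (for example user credentials) simply yield None.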
Following John Hanley's comment (when running on Compute Engine), you can query the metadata service to get the service account email:
https://cloud.google.com/compute/docs/metadata/default-metadata-values
So you can either use curl from a Linux shell:
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email" -H "Metadata-Flavor: Google"
Or python:
import requests

headers = {'Metadata-Flavor': 'Google'}
response = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email",
    headers=headers
)
print(response.text)
The "default" in the URL is an alias for the service account that is actually in use.
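Because "default" is only an alias, the same endpoint can also be queried for a specific account, and without the requests dependency. This is a sketch using only the standard library. metadata_email_request and fetch_email are hypothetical names, and the call only succeeds from inside a Google Cloud runtime that exposes the metadata server.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_email_request(account="default"):
    """Build the metadata request for a service account's email address."""
    url = "{}/instance/service-accounts/{}/email".format(METADATA_ROOT, account)
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_email(account="default", timeout=2):
    """Return the email address, or None when no metadata server is reachable."""
    try:
        with urllib.request.urlopen(metadata_email_request(account), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:  # URLError and socket timeouts are subclasses of OSError
        return None
```

Returning None instead of raising keeps the helper usable on a laptop, where the metadata host does not resolve.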
I am trying to use BigQuery inside python to query a table that is generated via a sheet:
from google.cloud import bigquery
# Prepare connection and query
bigquery_client = bigquery.Client(project="my_project")
query = """
select * from `table-from-sheets`
"""
df = bigquery_client.query(query).to_dataframe()
I can usually do queries to BigQuery tables, but now I am getting the following error:
Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
What do I need to do to access drive from python?
Is there another way around?
You are missing the scopes for the credentials. I'm pasting the code snippet from the official documentation.
In addition, do not forget to give the service account at least Viewer access to the Google Sheet.
from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)

# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)
I am trying to run a simple query on Google BigQuery via a python script, but am getting the below error that my service account is missing bigquery.jobs.create permission.
My service Account has the following roles applied:
Owner
BigQuery Admin
BigQuery Job User
I've also tried creating a custom role with bigquery.jobs.create and applying that to the service account, but still consistently get this error. What am I doing wrong?
from google.cloud import bigquery
from google.oauth2 import service_account

project_id = "my-test-project"
credentials = service_account.Credentials.from_service_account_file("credentials.json")
client = bigquery.Client(
    credentials=credentials,
    project=project_id
)
print(client.project) # returns "my-test-project"
query = client.query("select 1 as test;")
Access Denied: Project my-test-project: The user
my-service-account@my-test-project.iam.gserviceaccount.com does not have
bigquery.jobs.create permission in project my-test-project.
Authenticating the client using client = bigquery.Client.from_service_account_json("credentials.json") is the preferred method to avoid "Access Denied" errors. For one reason or another (I'm not sure why since bigquery does use oauth 2.0 access tokens to authorize requests), setting credentials through google.oauth2.service_account can lead to permission issues.
I have a Google App Engine site, and what I want to do is get access to the files on my Drive and publish them. Note that my account owns both the Drive and the App Engine page.
I have tried looking at the google drive api, and the problem is that I don't know where to start with the following boilerplate code located in their documentation.
If you take a look at this function:
def get_credentials(authorization_code, state):
    """Retrieve credentials using the provided authorization code.

    This function exchanges the authorization code for an access token and queries
    the UserInfo API to retrieve the user's e-mail address.
    If a refresh token has been retrieved along with an access token, it is stored
    in the application database using the user's e-mail address as key.
    If no refresh token has been retrieved, the function checks in the application
    database for one and returns it if found or raises a NoRefreshTokenException
    with the authorization URL to redirect the user to.

    Args:
        authorization_code: Authorization code to use to retrieve an access token.
        state: State to set to the authorization URL in case of error.
    Returns:
        oauth2client.client.OAuth2Credentials instance containing an access and
        refresh token.
    Raises:
        CodeExchangeException: Could not exchange the authorization code.
        NoRefreshTokenException: No refresh token could be retrieved from the
            available sources.
    """
    email_address = ''
    try:
        credentials = exchange_code(authorization_code)
        user_info = get_user_info(credentials)
        email_address = user_info.get('email')
        user_id = user_info.get('id')
        if credentials.refresh_token is not None:
            store_credentials(user_id, credentials)
            return credentials
        else:
            credentials = get_stored_credentials(user_id)
            if credentials and credentials.refresh_token is not None:
                return credentials
    except CodeExchangeException as error:
        logging.error('An error occurred during code exchange.')
        # Drive apps should try to retrieve the user and credentials for the current
        # session.
        # If none is available, redirect the user to the authorization URL.
        error.authorization_url = get_authorization_url(email_address, state)
        raise error
    except NoUserIdException:
        logging.error('No user ID could be retrieved.')
    # No refresh token has been retrieved.
    authorization_url = get_authorization_url(email_address, state)
    raise NoRefreshTokenException(authorization_url)
This is part of the boilerplate code. However, where am I supposed to get authorization_code from?
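In the standard OAuth 2.0 web flow, the authorization code is not something you create yourself: after the user approves access, Google redirects the browser to your redirect URI with code (and your state) in the query string. Here is a minimal sketch of pulling them out of that URL. It is written for Python 3, and the function name is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_code_and_state(redirect_url):
    """Return (authorization_code, state) from an OAuth 2.0 redirect URL."""
    params = parse_qs(urlparse(redirect_url).query)
    if "error" in params:
        # The user denied access or the request was invalid.
        raise ValueError("authorization failed: {}".format(params["error"][0]))
    code = params.get("code", [None])[0]
    state = params.get("state", [None])[0]
    return code, state
```

The two values can then be passed to get_credentials(authorization_code, state).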
I recently had to implement something similar, and it is quite tricky to find the relevant pieces of documentation.
This is what worked for me.
One-time setup to enable Google Drive for your Google App Engine project
Go to the Google APIs Console and select your App Engine project. If you don't see your App Engine project listed, you need to enable the cloud integration in the App Engine admin tool first (Administration > Application Settings > Cloud Integration > Create project)
In Google APIs Console, now go to Services and look for the "Drive API" in that long list. Turn it on.
Go to the API Access section of the Google APIs Console and find the "Simple API Access" API key. (see screenshot below)
Getting and installing the Python Drive API Client
Download the Python Drive API Client: https://developers.google.com/api-client-library/python/start/installation#appengine
Documentation on this Python API: https://google-api-client-libraries.appspot.com/documentation/drive/v2/python/latest/
Using the Python Drive API Client
To create the Drive service object, I use this:
import httplib2

def createDriveService():
    """Builds and returns a Drive service object authorized with the
    application's service account.

    Returns:
        Drive service object.
    """
    from oauth2client.appengine import AppAssertionCredentials
    from apiclient.discovery import build

    credentials = AppAssertionCredentials(scope='https://www.googleapis.com/auth/drive')
    http = httplib2.Http()
    http = credentials.authorize(http)
    # API_KEY is the "Simple API Access" key found during the one-time setup.
    return build('drive', 'v2', http=http, developerKey=API_KEY)
You can then use this service object to execute Google Drive API calls, for example, to create a folder:
service = createDriveService()
res = {'title': foldername,
       'mimeType': "application/vnd.google-apps.folder"}
service.files().insert(body=res).execute()
Caveats
I was not able to get the Drive API to work in unit tests, nor on the dev_appserver. I always get an error that my credentials are not valid. However, it works fine on the real App Engine server.