BigQuery GCP Python Integration

BigQuery GCP Python Integration - python

I am trying to write all my scripts in Python instead of BigQuery. I set my active project using 'glcoud config set project' but I still get this ERROR 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/analytics-supplychain-thd/jobs: Caller does not have required permission to use project analytics-supplychain-thd. Grant the caller the Owner or Editor role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=analytics-supplychain-thd and then retry (propagation of new permission may take a few minutes).
How do I fix this?

I suspect you are picking up the wrong "key".json, at least in terms of permissions for one of the operation you are trying to perform. The key currently defined [1] in GOOGLE_APPLICATION_CREDENTIALS seems not have right permission. A list of roles you should grant to the Service Account can be find here [2], anyway from your error you would need at least a primitive role as Owner or Editor. The latter depends on your needs and targets (operation you perform through such script).
You should pick up the right role for your operation and associating it to the Service Account you want to use, defining therefore an identity for it through the IAM portal UI, also doable also through the CLI or API calls.
After that be sure the client you are using is logged in with the correct service account (correct json key path).
Particularly, I used the code you gave me to test and I have been able to load the data:
import pandas_gbq
import google.oauth2.service_account as service_account
# TODO: Set project_id to your Google Cloud Platform project ID
project_id = "xxx-xxxx-xxxxx"
sql = """SELECT * FROM xxx-xxxx-xxxxx.fourth_dataset.2test LIMIT 100"""
credentials = service_account.Credentials.from_service_account_file('/home/myself/key.json')
df = pandas_gbq.read_gbq(sql, project_id=project_id, dialect="standard", credentials=credentials)
This
Hope this helps!!
[1] https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
[2] https://cloud.google.com/iam/docs/understanding-roles#primitive_roles

Related

How can get list of multiple - GCP projects assigned under one service account ? ( GCP )( python-client library files)

I am trying to list the projects which are assigned under the service account file (credentials.json).
Below: by following this method i have added multiple project under one service account
And then I have generated the key (i.e) credientials.json file from the project B as mentioned in above image.
I checked inside the credientials.json : But that service account has only project name. it doesn't have the project A name.
Even though i applied this credientials.json file to my code its showing only the project B name.
it does't showes the project A name.
if the service account conneting method is wrong please correct me
Here is my code
import os
import google.auth
credential_path = r'c:test\credientials.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
credentials,project = google.auth.default()
print(credentials)
print(project)

On Google Cloud, the IAM permission are resource centric (a project is a resource, like a Cloud Functions or a VM) and not account centric (as AWS is).
Therefore, you can't know all the resources granted on an account, but you have to browse ALL the resource and to check if the account is authorized on it.
Because it's boring, you can use Cloud Asset Inventory search IAM policies to make this task easier

How can I grant a Cloud Run service access to service account's credentials without the key file?

I'm developing a Cloud Run Service that accesses different Google APIs using a service account's secrets file with the following python 3 code:
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(SECRETS_FILE_PATH, scopes=SCOPES)
In order to deploy it, I upload the secrets file during the build/deploy process (via gcloud builds submit and gcloud run deploy commands).
How can I avoid uploading the secrets file like this?
Edit 1:
I think it is important to note that I need to impersonate user accounts from GSuite/Workspace (with domain wide delegation). The way I deal with this is by using the above credentials followed by:
delegated_credentials = credentials.with_subject(USER_EMAIL)

Using the Secret Manager might help you, as you can manage the multiple secrets you have and not have them stored as files, as you are doing right now. I would recommend you to take a look at this article here, so you can get more information on how to use it with Cloud Run, to improve the way you manage your secrets.
In addition to that, as clarified in this similar case here, you have two options: use default service account that comes with it or deploy another one with the Service Admin role. This way, you won't need to specify keys with variables - as clarified by a Google developer in this specific answer.

To improve the security, the best way is to never use service account key file, locally or on GCP (I wrote an article on this). To achieve this, Google Cloud service have an automatically loaded service account, either this one by default or, when possible, a custom one.
On Cloud Run, the default service account is the Compute Engine default service account (I recommend you to never use it, it has editor role on the project, it's too wide!), or you can specify the service account to use (--service-account= parameter)
Then, in your code, simply use the ADC mechanism (Application Default Credential) to get your credentials, like this in Python
import google.auth
credentials, project_id = google.auth.default(scopes=SCOPES)

I've found one way to solve the problem.
First, as suggested by guillaume blaquiere answer, I used google.auth ADC mechanism:
import google.auth
credentials, project_id = google.auth.default(scopes=SCOPES)
However, as I need to impersonate GSuite's (now Workspace) accounts, this method is not enough, as the credentials object generated from this method does not have the with_subject property. This led me to this similar post and specific answer which works a way to convert google.auth.credentials into the Credential object returned by service_account.Credentials.from_service_account_file. There was one problem with his solution, as it seemed that an authentication scope was missing.
All I had to do is add the https://www.googleapis.com/auth/cloud-platform scope to the following places:
The SCOPES variable in the code
Google Admin > Security > API Controls > Set client ID and scope for the service account I am deploying with
At the OAuth Consent Screen of my project
After that, my Cloud Run had access to credentials that were able to impersonate user's accounts without using key files.

BigQuery cross project access via cloud functions

Let's say I have two GCP Projects, A and B. And I am the owner of both projects. When I use the UI, I can query BigQuery tables in project B from both projects. But I run into problems when I try to run a Cloud Function in project A, from which I try to access a BigQuery table in project B. Specifically I run into a 403 Access Denied: Table <>: User does not have permission to query table <>.. I am a bit confused as to why I can't access the data in B and what I need to do. In my Cloud Function all I do is:
from google.cloud import bigquery
client = bigquery.Client()
query = cient.query(<my-query>)
res = query.result()
The service account used to run the function exists in project A - how do I give it editor access to BigQuery in project B? (Or what else should I do?).

Basically you have an issue with IAM Permissions and roles on the service account used to run the function.
You should define the role bigquery.admin on your service account and it would do the trick.
However it may not be the adequate solution in regards to best practices. The link below provides a few scenarios with examples of roles most suited to your case.
https://cloud.google.com/bigquery/docs/access-control-examples

Is it possible to limit a Google service account to specific BigQuery datasets within a project?

I've set up a service account using the GCP UI for a specific project Project X. Within Project X there are 3 datasets:
Dataset 1
Dataset 2
Dataset 3
If I assign the role BigQuery Admin to Project X this is currently being inherited by all 3 datasets.
Currently all of these datasets inherit the permissions assigned to the service account at the project level. Is there any way to modify the permissions for the service account such that it only has access to specified datasets? e.g. allow access to Dataset 1 but not Dataset 2 or Dataset 3.
Is this type of configuration possible?
I've tried to add a condition in the UI but when I use the Name resource type and set the value equal to Dataset 1 I'm not able to access any of the datasets - presumably the value is not correct. Or a dataset is not a valid name resource.
UPDATE
Adding some more detail regarding what I'd already tried before posting, as well as some more detail on what I'm doing.
For my particular use case, I'm trying to perform SQL queries as well as modifying tables in BigQuery through the API (using Python).
Case A:
I create a service account with the role 'BigQuery Admin'.
This role is propagated to all datasets within the project - the property is inherited and I can not delete this service account role from any of the datasets.
In this case I'm able to query all datasets and tables using the Python API - as you'd expect.
Case B:
I create a service account with no default role.
No role is propagated and I can assign roles to specific datasets by clicking on the 'Share dataset' option in the UI to assign the 'BigQuery Admin' role to them.
In this case I'm not able to query any of the datasets or tables and get the following error if I try:
*Forbidden: 403 POST https://bigquery.googleapis.com/bq/projects/project-x/jobs: Access Denied: Project X: User does not have bigquery.jobs.create permission in project Project X.*
Even though the permissions required (bigquery.jobs.create in this case) exist for the dataset I want, I can't query the data as it appears that the bigquery.jobs.create permission is also required at a project level to use the API.

I'm posting the solution that I found to the problem in case it is useful to anyone else trying to accomplish the same.
Assign the role "BigQuery Job User" at a project level in order to have the permission bigquery.jobs.create assigned to the service account for that project.
You can then manually assign specific datasets the role of "BigQuery Data Editor" in order to query them through the API in Python. Do this by clciking on "Share dataset" in the BigQuery UI. So for this example, I've "Shared" Dataset 1 and Dataset 2 with the service account.
You should now be able to query the datasets for which you've assigned the BigQuery Data Editor role in Python.
However, for Dataset 3, for which the "BigQuery Data Editor" role has not been assigned, if you attempt to query a table this should return the error:
Forbidden: 403 Access Denied: Table Project-x:dataset_1.table_1: User does not have permission to query table Project-x:dataset_1.table_1.
As described above, we now have sufficient permissions to access the project but not the table within Dataset 3 - by design.

As you can see here, you can grant access in your dataset to some entities, including service accounts:
Google account e-mail: Grants an individual Google account access to
the dataset
Google Group: Grants all members of a Google group access
to the dataset Google Apps
Domain: Grants all users and groups in a
Google domain access to the dataset
Service account: Grants a service
account access to the dataset
Anybody: Enter "allUsers" to grant
access to the general public
All Google accounts: Enter
"allAuthenticatedUsers" to grant access to any user signed in to a
Google Account
I suggest that you create a service account without permissions in BigQuery and then grant the access for a specific dataset.
I hope it helps you.

Please keep in mind that access to BigQuery can be granted at project level or dataset level.
The dataset is the lowest level you can assign permissions, so that accounts can access all the resources in the dataset, e.g. tables, views, columns and rows. Permissions at project level permissions, as you have already noticed, are propagated (heritage) for all the datasets in the project.
Regarding your service account, by default Google Cloud assigns it a structure like service_accunt_name#example.gserviceaccount.com, and during the process of sharing the dataset, as commented by #rmesteves, you will need this email address to grant it the desired permissions.
It seems that the steps you described "Name resource type" are not the correct ones. In the BigQuery UI please try:
Click on the dataset name (e.g. Dataset1 in your example) you want to share.
Then, at the right on the screen you will see the option "Share Dataset", click on it.
Follow instructions to set up to your service account a BigQuery role like BigQuery Admin, BigQuery Data Owner, BigQuery User, among others. Check the previous link to be aware of what kind of things the roles can perform.

Azure python SDK ComputerManagementClient error

I get an error when trying to deallocate a virtual machine with the Python SDK for Azure.
Basically I try something like:
credentials = ServicePrincipalCredentials(client_id, secret, tenant)
compute_client = ComputeManagementClient(credentials, subscription_id, '2015-05-01-preview')
compute_client.virtual_machines.deallocate(resource_group_name, vm_name)
pprint (result.result())
-> exception:
msrestazure.azure_exceptions.CloudError: Azure Error: AuthorizationFailed
Message: The client '<some client UUID>' with object id '<same client UUID>' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/deallocate/action' over scope '/subscriptions/<our subscription UUID>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<our-machine>'.
What I don't understand is that the error message contains an unknown client UUID that I have not used in the credentials.
Python is version 2.7.13 and the SDK version was from yesterday.
What I guess I need is a registration for an Application, which I did to get the information for the credentials. I am not quite sure which exact permission(s) I need to register for the application with IAM. For adding an access entry I can only pick existing users, but not an application.
So is there any programmatic way to find out which permissions are required for an action and which permissions our client application has?
Thanks!

As #GauravMantri & #LaurentMazuel said, the issue was caused by not assign role/permission to a service principal. I had answered another SO thread Cannot list image publishers from Azure java SDK, which is similar with yours.
There are two ways to resolve the issue, which include using Azure CLI & doing these operations on Azure portal, please see the details of my answer for the first, and I update below for the second way which is old.
And for you want to find out these permissions programmatically, you can refer to the REST API Role Definition List to get all role definitions that are applicable at scope and above, or refer to Azure Python SDK Authentication Management to do it via the code authorization_client.role_definitions.list(scope).
Hope it helps.

Thank you all for your answers! The best recipe for creating an application and to register it with the right role - Virtual Machine Contributor - is presented indeed on https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal
The main issue I had was that there is a bug in the adding a role within IAM. I use add. I select "Virtual Machine Contributor". With "Select" I get presented a list of users, but not the application that I have created for this purpose. Entering the first few letters of the name of my application will give a filtered output that includes my application this time though. Registration is then finished and things can proceed.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.