BigQuery cross project access via cloud functions - python

Let's say I have two GCP projects, A and B, and I am the owner of both. When I use the UI, I can query BigQuery tables in project B from both projects. But I run into problems when I try to run a Cloud Function in project A from which I access a BigQuery table in project B. Specifically, I get a 403 Access Denied: Table <>: User does not have permission to query table <>. I am a bit confused as to why I can't access the data in B and what I need to do. In my Cloud Function all I do is:
from google.cloud import bigquery
client = bigquery.Client()
query = client.query(<my-query>)
res = query.result()
The service account used to run the function exists in project A - how do I give it editor access to BigQuery in project B? (Or what else should I do?).

Basically you have an issue with IAM permissions and roles on the service account used to run the function.
Granting the role bigquery.admin (BigQuery Admin) to that service account in project B will do the trick.
However, this may not be the most appropriate solution with respect to best practices. The link below provides a few scenarios, with examples of the roles best suited to each case.
https://cloud.google.com/bigquery/docs/access-control-examples
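Once the service account from project A has been granted a suitable role in project B, the function code only needs the fully qualified table name. A minimal sketch (the project, dataset and table names below are placeholders):
from google.cloud import bigquery

# Jobs run (and are billed) in project A, where the function's service account lives.
client = bigquery.Client(project="project-a")

# Reference the table in project B by its fully qualified name.
query = """
    SELECT COUNT(*) AS row_count
    FROM `project-b.my_dataset.my_table`
"""
for row in client.query(query).result():
    print(row.row_count)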

Related

BigQuery GCP Python Integration

I am trying to write all my scripts in Python instead of BigQuery. I set my active project using 'gcloud config set project', but I still get this error: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/analytics-supplychain-thd/jobs: Caller does not have required permission to use project analytics-supplychain-thd. Grant the caller the Owner or Editor role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=analytics-supplychain-thd and then retry (propagation of new permission may take a few minutes).
How do I fix this?
I suspect you are picking up the wrong "key".json, at least in terms of permissions for one of the operations you are trying to perform. The key currently defined [1] in GOOGLE_APPLICATION_CREDENTIALS seems not to have the right permissions. A list of roles you should grant to the Service Account can be found here [2]; in any case, from your error you need at least a primitive role such as Owner or Editor. Which one depends on your needs and targets (the operations you perform through such a script).
You should pick the right role for your operations and associate it with the Service Account you want to use, thereby defining an identity for it through the IAM portal UI; this can also be done through the CLI or API calls.
After that, make sure the client you are using is authenticated as the correct service account (i.e. with the correct JSON key path).
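For example, with the plain google-cloud-bigquery client you can point it at the key file explicitly and run a trivial query to verify the permissions (the key path and project ID below are placeholders):
from google.cloud import bigquery

# Placeholder key path and project ID; use the key of the service account
# that holds the Owner/Editor (or equivalent) role on the project.
client = bigquery.Client.from_service_account_json('/home/myself/key.json', project='xxx-xxxx-xxxxx')
rows = client.query('SELECT 1 AS ok').result()
print(list(rows))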
In particular, I used the code you gave me to test and was able to load the data:
import pandas_gbq
import google.oauth2.service_account as service_account
# TODO: Set project_id to your Google Cloud Platform project ID
project_id = "xxx-xxxx-xxxxx"
sql = """SELECT * FROM `xxx-xxxx-xxxxx.fourth_dataset.2test` LIMIT 100"""
credentials = service_account.Credentials.from_service_account_file('/home/myself/key.json')
df = pandas_gbq.read_gbq(sql, project_id=project_id, dialect="standard", credentials=credentials)
This worked for me. Hope it helps!
[1] https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
[2] https://cloud.google.com/iam/docs/understanding-roles#primitive_roles

Is it possible to limit a Google service account to specific BigQuery datasets within a project?

I've set up a service account using the GCP UI for a specific project Project X. Within Project X there are 3 datasets:
Dataset 1
Dataset 2
Dataset 3
If I assign the role BigQuery Admin to the service account at the Project X level, it is currently inherited by all 3 datasets.
In other words, all of these datasets inherit the permissions assigned to the service account at the project level. Is there any way to modify the permissions for the service account so that it only has access to specified datasets, e.g. allow access to Dataset 1 but not Dataset 2 or Dataset 3?
Is this type of configuration possible?
I've tried to add a condition in the UI, but when I use the Name resource type and set the value equal to Dataset 1 I'm not able to access any of the datasets - presumably the value is not correct, or a dataset is not a valid name resource.
UPDATE
Adding some more detail regarding what I'd already tried before posting, as well as on what I'm doing.
For my particular use case, I'm trying to perform SQL queries as well as modifying tables in BigQuery through the API (using Python).
Case A:
I create a service account with the role 'BigQuery Admin'.
This role is propagated to all datasets within the project - the property is inherited and I can not delete this service account role from any of the datasets.
In this case I'm able to query all datasets and tables using the Python API - as you'd expect.
Case B:
I create a service account with no default role.
No role is propagated and I can assign roles to specific datasets by clicking on the 'Share dataset' option in the UI to assign the 'BigQuery Admin' role to them.
In this case I'm not able to query any of the datasets or tables and get the following error if I try:
*Forbidden: 403 POST https://bigquery.googleapis.com/bq/projects/project-x/jobs: Access Denied: Project X: User does not have bigquery.jobs.create permission in project Project X.*
Even though the permissions required (bigquery.jobs.create in this case) exist for the dataset I want, I can't query the data as it appears that the bigquery.jobs.create permission is also required at a project level to use the API.
I'm posting the solution that I found to the problem in case it is useful to anyone else trying to accomplish the same.
Assign the role "BigQuery Job User" at a project level in order to have the permission bigquery.jobs.create assigned to the service account for that project.
You can then manually assign specific datasets the role of "BigQuery Data Editor" in order to query them through the API in Python. Do this by clicking on "Share dataset" in the BigQuery UI. So for this example, I've "shared" Dataset 1 and Dataset 2 with the service account.
You should now be able to query the datasets for which you've assigned the BigQuery Data Editor role in Python.
However, for Dataset 3, for which the "BigQuery Data Editor" role has not been assigned, if you attempt to query a table this should return the error:
Forbidden: 403 Access Denied: Table Project-x:dataset_3.table_1: User does not have permission to query table Project-x:dataset_3.table_1.
As described above, we now have sufficient permissions to access the project but not the table within Dataset 3 - by design.
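To make the effect concrete, here is a rough sketch of what this looks like from the Python client, assuming placeholder project, dataset and key-file names:
from google.cloud import bigquery
from google.api_core.exceptions import Forbidden

# Key for the service account that has "BigQuery Job User" on the project
# and "BigQuery Data Editor" on dataset_1 only (placeholder names throughout).
client = bigquery.Client.from_service_account_json('key.json', project='project-x')

# Succeeds: dataset_1 was shared with the service account.
rows = client.query('SELECT * FROM `project-x.dataset_1.table_1` LIMIT 10').result()

# Fails with 403 Access Denied: dataset_3 was not shared.
try:
    client.query('SELECT * FROM `project-x.dataset_3.table_1` LIMIT 10').result()
except Forbidden as exc:
    print(exc)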
As you can see here, you can grant access in your dataset to some entities, including service accounts:
Google account e-mail: Grants an individual Google account access to the dataset
Google Group: Grants all members of a Google group access to the dataset
Google Apps Domain: Grants all users and groups in a Google domain access to the dataset
Service account: Grants a service account access to the dataset
Anybody: Enter "allUsers" to grant access to the general public
All Google accounts: Enter "allAuthenticatedUsers" to grant access to any user signed in to a Google Account
I suggest that you create a service account without permissions in BigQuery and then grant it access to a specific dataset.
I hope it helps you.
Please keep in mind that access to BigQuery can be granted at project level or dataset level.
The dataset is the lowest level at which you can assign permissions, so that accounts can access all the resources in the dataset, e.g. tables, views, columns and rows. Project-level permissions, as you have already noticed, are propagated (inherited) to all the datasets in the project.
Regarding your service account, by default Google Cloud assigns it an address like service_account_name@example.gserviceaccount.com, and during the process of sharing the dataset, as commented by @rmesteves, you will need this email address to grant it the desired permissions.
It seems that the steps you described ("Name resource type") are not the correct ones. In the BigQuery UI, please try:
Click on the name of the dataset (e.g. Dataset 1 in your example) you want to share.
Then, at the right of the screen, you will see the option "Share Dataset"; click on it.
Follow the instructions to assign your service account a BigQuery role such as BigQuery Admin, BigQuery Data Owner or BigQuery User, among others. Check the previous link to be aware of what each role can do.
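If you prefer to script the "Share dataset" step instead of using the UI, a minimal sketch with the Python client (all names below are placeholders) would look like this:
from google.cloud import bigquery

client = bigquery.Client(project='project-x')
dataset = client.get_dataset('project-x.dataset_1')

# Append a dataset-level access entry for the service account.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role='WRITER',  # dataset-level equivalent of BigQuery Data Editor
        entity_type='userByEmail',
        entity_id='my-service-account@project-x.iam.gserviceaccount.com',
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ['access_entries'])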

New Azure Subscriptions are not listed with azure python sdk

I created 2 new subscriptions from Azure Portal, but I've not been able to list those newly created subscriptions using the python SDK. It lists the old subscriptions fine.
from azure.mgmt.resource import SubscriptionClient
...
subscriptionClient = SubscriptionClient(credentials)
for subscription in subscriptionClient.subscriptions.list():
    print(subscription)
...
I was having the same problem with CLI as well, but logging out and logging back in resolved the issue.
I don't see any other subscription operations to scan and refresh the subscriptions. Is there something I need to do under Azure Active Directory to manage new subscriptions?
I tried to reproduce your issue and succeeded: it is caused by the client registered in Azure AD having no permission to retrieve the information of these subscriptions. So the solution is to add permission for each subscription by assigning a role such as Owner to your client, via Access control (IAM) in the portal.
Then your code works fine, but I know the solution is not perfect for you. I'm looking for a better one.

Azure python SDK ComputeManagementClient error

I get an error when trying to deallocate a virtual machine with the Python SDK for Azure.
Basically I try something like:
from pprint import pprint
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.compute import ComputeManagementClient

credentials = ServicePrincipalCredentials(client_id, secret, tenant)
compute_client = ComputeManagementClient(credentials, subscription_id, '2015-05-01-preview')
result = compute_client.virtual_machines.deallocate(resource_group_name, vm_name)
pprint(result.result())
-> exception:
msrestazure.azure_exceptions.CloudError: Azure Error: AuthorizationFailed
Message: The client '<some client UUID>' with object id '<same client UUID>' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/deallocate/action' over scope '/subscriptions/<our subscription UUID>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<our-machine>'.
What I don't understand is that the error message contains an unknown client UUID that I have not used in the credentials.
Python is version 2.7.13 and the SDK version was from yesterday.
What I guess I need is a registration for an Application, which I did to get the information for the credentials. I am not quite sure which exact permission(s) I need to register for the application with IAM. For adding an access entry I can only pick existing users, but not an application.
So is there any programmatic way to find out which permissions are required for an action and which permissions our client application has?
Thanks!
As @GauravMantri & @LaurentMazuel said, the issue was caused by not assigning a role/permission to the service principal. I had answered another SO thread, Cannot list image publishers from Azure java SDK, which is similar to yours.
There are two ways to resolve the issue: using the Azure CLI, or doing these operations in the Azure portal. Please see my answer linked above for the first; the second, older, portal-based way is described below.
If you want to find out these permissions programmatically, you can refer to the REST API Role Definitions - List to get all role definitions that are applicable at a given scope and above, or use the Azure Python SDK authorization management client and call authorization_client.role_definitions.list(scope).
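As a rough, version-dependent sketch of that last call (reusing the variable names from your snippet; model and attribute names may vary between SDK releases):
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.authorization import AuthorizationManagementClient

credentials = ServicePrincipalCredentials(client_id, secret, tenant)
authorization_client = AuthorizationManagementClient(credentials, subscription_id)

# Scope can be a subscription, a resource group or a single resource.
scope = '/subscriptions/{}/resourceGroups/{}'.format(subscription_id, resource_group_name)
for definition in authorization_client.role_definitions.list(scope):
    print(definition)  # inspect the role name and permissions on each definition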
Hope it helps.
Thank you all for your answers! The best recipe for creating an application and registering it with the right role - Virtual Machine Contributor - is indeed presented at https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal
The main issue I had was a quirk in adding a role within IAM: I use Add, select "Virtual Machine Contributor", and under "Select" I am presented with a list of users but not the application I created for this purpose. However, entering the first few letters of my application's name filters the list and the application then appears. The role assignment can then be completed and things proceed.

Is it possible to connect and query a BigQuery table from Google App-Engine (python) without OAuth2 authentication dialog?

I'm working on a Google App Engine project which stores around 100K entities in the Datastore. Since I have to search within the string properties of those entities, I need to find an effective way to do it.
After some research I found Google's BigQuery service, which looks perfect for me. I have already imported the entities into BigQuery via the web interface, but I cannot connect to and run a query against BigQuery from the App Engine code.
My App-Engine project has no web interface. It generates only JSON outputs which are consumed by mobile applications.
So, my question is this: is it possible to connect and run a query from the App-Engine python code without the OAuth2 authentication dialog?
Yes. Simply use what's known as a "service account" as described here. Then, some simple Python code once you've exported GOOGLE_APPLICATION_CREDENTIALS to point to the credential file:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
for dataset in client.list_datasets():
    do_something_with(dataset)
More info here too.
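For the search use case in the question, running a query from the same service-account-authenticated client is just as simple; a minimal sketch (the dataset, table and column names are placeholders):
from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')
# Placeholder dataset/table/column; searches string properties exported from Datastore.
query = "SELECT * FROM `PROJECT_ID.my_dataset.my_entities` WHERE name LIKE '%search term%' LIMIT 100"
for row in client.query(query).result():
    print(row)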
Just a quick caution: this is case sensitive, so check the capitalisation of 'Client':
my_bigquery_client = bigquery.client(project = 'my_project')  # fails with "TypeError: 'module' object is not callable"
my_bigquery_client = bigquery.Client(project = 'my_project')  # this works
