BigQuery on Google Cloud: how to share a dataset with other users using Python?

I have a BigQuery dataset defined in Google Cloud under my userA account, and I want my colleague userB, who is a member of the same group, to be able to see it. Using the bq command-line interface, userB can see the project but not the dataset. How can I share the dataset created by userA with userB using a Python script?

Another thing you may run into is that access must be granted at the dataset level in BigQuery. Depending on how you have set up user roles in Cloud Platform and BigQuery, you may need to give the service account direct access to the BigQuery dataset.
To do this, go into BigQuery, hover over your dataset, click the down arrow, and select 'share data set'. A modal will open where you can specify which email addresses and service accounts to share the dataset with, and control their access rights.
Let me know if my instructions are too confusing and I'll upload some images showing exactly how to do this.
Good Luck!!

An example using the Python Client Library. Adapted from here but adding a get_dataset call to get the current ACL policy for already existing datasets:
from google.cloud import bigquery

project_id = "PROJECT_ID"
dataset_id = "DATASET_NAME"
group_name = "google-group-name@google.com"
role = "READER"

client = bigquery.Client(project=project_id)

# Fetch the dataset first so the existing ACL entries are preserved.
dataset_info = client.get_dataset(client.dataset(dataset_id))
access_entries = dataset_info.access_entries
access_entries.append(
    bigquery.AccessEntry(role, "groupByEmail", group_name)
)
dataset_info.access_entries = access_entries

# Send only the access_entries field back to the API.
dataset_info = client.update_dataset(
    dataset_info, ['access_entries'])
Another way to do it is using the Google Python API Client and the get and patch methods. First, we retrieve the existing dataset ACL, add the group as READER to the response and patch the dataset metadata:
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

project_id = "PROJECT_ID"
dataset_id = "DATASET_NAME"
group_name = "google-group-name@google.com"
role = "READER"

credentials = GoogleCredentials.get_application_default()
bq = discovery.build("bigquery", "v2", credentials=credentials)

# Read the current dataset metadata, including its ACL.
response = bq.datasets().get(projectId=project_id, datasetId=dataset_id).execute()
# Append the new entry and send the metadata back.
response['access'].append({u'role': u'{}'.format(role), u'groupByEmail': u'{}'.format(group_name)})
bq.datasets().patch(projectId=project_id, datasetId=dataset_id, body=response).execute()
Replace the project_id, dataset_id, group_name and role variables accordingly.
Versions used:
$ pip freeze | grep -E 'bigquery|api-python'
google-api-python-client==1.7.7
google-cloud-bigquery==1.8.1
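The question asks about a single colleague (userB), while both snippets above grant access to a group. In the REST representation of a dataset, an individual user is listed under the userByEmail key instead of groupByEmail. The following is only a minimal sketch of how the 'access' list from datasets().get() could be extended without adding the same entry twice; the helper name add_access_entry and the example.com addresses are made up for illustration.

```python
def add_access_entry(access, role, entity_type, entity_id):
    """Return a copy of the ACL list that contains the entry exactly once."""
    entry = {"role": role, entity_type: entity_id}
    if entry in access:
        # Already granted: return an unchanged copy.
        return list(access)
    return list(access) + [entry]


# Hypothetical ACL, shaped like the 'access' field returned by datasets().get().
current = [
    {"role": "OWNER", "userByEmail": "userA@example.com"},
]

updated = add_access_entry(current, "READER", "userByEmail", "userB@example.com")
# Calling it a second time does not create a duplicate entry.
updated_again = add_access_entry(updated, "READER", "userByEmail", "userB@example.com")

print(updated)
```

The resulting list would replace response['access'] before calling patch, as in the second snippet above.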

Related

How to get DB password from Azure app vault using python ? I am running this python file on google Dataproc cluster

My SQL Server DB password is saved in an Azure key vault, with DATAREF ID as its identifier. I need that password to create a Spark dataframe from a table in SQL Server. I am running this .py file on a Google Dataproc cluster. How can I get that password using Python?

Since you are accessing an Azure service from a non-Azure service, you will need a service principal. It can authenticate with a certificate or a secret. See THIS link for the different methods. You will need to give the service principal proper access, and how you do that depends on whether your key vault uses RBAC or access policies.
So the steps you need to follow are:
Create a key vault and create a secret.
Create a service principal or application registration. Store the clientid, clientsecret and tenantid.
Give the service principal proper access to the key vault (if you are using access policies) or to the specific secret (if you are using the RBAC model).
The python link for the code is HERE.
The code that will work for you is below:
from azure.identity import ClientSecretCredential
from azure.keyvault.secrets import SecretClient

# Replace the placeholders with the values stored for your service principal.
tenantid = "<your_tenant_id>"
clientsecret = "<your_client_secret>"
clientid = "<your_client_id>"

my_credentials = ClientSecretCredential(tenant_id=tenantid, client_id=clientid, client_secret=clientsecret)
secret_client = SecretClient(vault_url="https://<your_keyvault_name>.vault.azure.net/", credential=my_credentials)
secret = secret_client.get_secret("<your_secret_name>")
print(secret.name)
print(secret.value)
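The question goes on to use the password to build a Spark dataframe from a SQL Server table. As a rough sketch of that last step, the retrieved value could be placed into the options of Spark's JDBC reader. The helper name sqlserver_jdbc_options and the host, database, table and user values are hypothetical, and this assumes the Microsoft SQL Server JDBC driver is available on the Dataproc cluster.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option dict expected by Spark's JDBC data source."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Illustrative values only; in practice the password would be secret.value.
options = sqlserver_jdbc_options(
    host="myserver.example.com",
    database="mydb",
    table="dbo.mytable",
    user="myuser",
    password="<value read from the key vault>",
)

# With an existing SparkSession named spark (not created in this sketch):
# df = spark.read.format("jdbc").options(**options).load()
print(options["url"])
```

Keeping the option building separate from the Spark call makes it easy to check the connection string without a cluster.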

How can I see the service account that the python bigquery client uses?

To create a default bigquery client I use:
from google.cloud import bigquery
client = bigquery.Client()
This uses the (default) credentials available in the environment.
But how can I then see which (default) service account is used?

While you can interrogate the credentials directly (be it json keys, metadata server, etc), I have occasionally found it valuable to simply query bigquery using the SESSION_USER() function.
Something quick like this should suffice:
from google.cloud import bigquery

client = bigquery.Client()
query_job = client.query("SELECT SESSION_USER() as whoami")
results = query_job.result()
for row in results:
    print("i am {}".format(row.whoami))

This led me in the right direction:
Google BigQuery Python Client using the wrong credentials
To see the service-account used you can do:
client._credentials.service_account_email
However:
The statement above works when you run it in a Jupyter notebook (in Vertex AI), but when you run it in a Cloud Function with print(client._credentials.service_account_email), it just logs 'default' to Cloud Logging, whereas the default service account for a Cloud Function should be <project_id>@appspot.gserviceaccount.com.
This will also give you the wrong answer:
client.get_service_account_email()
The call to client.get_service_account_email() does not return the credential's service account email address. Instead, it returns the BigQuery service account email address used for KMS encryption/decryption.
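Because _credentials is a private attribute and not every credential type carries a service_account_email, a defensive lookup may be safer than accessing the attribute directly. The sketch below is one way to do that; describe_credentials is a made-up helper name, and the SimpleNamespace objects are stand-ins for real credential objects so the example runs without any Google libraries.

```python
from types import SimpleNamespace


def describe_credentials(credentials):
    """Summarise a credentials object without assuming its concrete type."""
    return {
        "type": type(credentials).__name__,
        # Not every credential type has this attribute, hence getattr.
        "service_account_email": getattr(credentials, "service_account_email", None),
    }


# Stand-ins for real credential objects, for illustration only.
fake_service_account = SimpleNamespace(
    service_account_email="my-sa@my-project.iam.gserviceaccount.com"
)
fake_user_credentials = SimpleNamespace()

print(describe_credentials(fake_service_account))
print(describe_credentials(fake_user_credentials))
```

With the real library, the object returned by google.auth.default() could be passed in instead. As noted above, some environments may still report 'default' rather than the actual email.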

Following John Hanley's comment (when running on Compute Engine), you can query the metadata service to get the service account email:
https://cloud.google.com/compute/docs/metadata/default-metadata-values
So you can either use curl from a Linux shell:
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email" -H "Metadata-Flavor: Google"
Or Python:
import requests

headers = {'Metadata-Flavor': 'Google'}
response = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email",
    headers=headers
)
print(response.text)
The "default" in the URL is an alias for the service account actually in use.
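The same lookup can be sketched with only the standard library, which avoids depending on requests and adds a timeout so the call fails quickly when the code is not running on Google Cloud. The function names here are made up for illustration, and returning None on failure is a design choice of this sketch, not something the metadata service prescribes.

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_metadata_request(url=METADATA_URL):
    """Create the request with the header the metadata server requires."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_service_account_email(timeout=2.0):
    """Return the email, or None if the metadata server cannot be reached."""
    try:
        with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        return None
```

Calling fetch_service_account_email() on a machine outside Google Cloud should return None instead of raising.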

Python client for accessing kubernetes cluster on GKE

I am struggling to programmatically access a kubernetes cluster running on Google Cloud. I have set up a service account and pointed GOOGLE_APPLICATION_CREDENTIALS to a corresponding credentials file. I managed to get the cluster and credentials as follows:
import google.auth
import google.auth.transport.requests
from google.cloud.container_v1 import ClusterManagerClient
from kubernetes import client

credentials, project = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform',])
credentials.refresh(google.auth.transport.requests.Request())
cluster_manager = ClusterManagerClient(credentials=credentials)
cluster = cluster_manager.get_cluster(project, 'us-west1-b', 'clic-cluster')
So far so good. But then I want to start using the kubernetes client:
config = client.Configuration()
config.host = f'https://{cluster.endpoint}:443'
config.verify_ssl = False
config.api_key = {"authorization": "Bearer " + credentials.token}
config.username = credentials._service_account_email
client.Configuration.set_default(config)
kub = client.CoreV1Api()
print(kub.list_pod_for_all_namespaces(watch=False))
And I get an error message like this:
pods is forbidden: User "12341234123451234567" cannot list resource "pods" in API group "" at the cluster scope: Required "container.pods.list" permission.
I found this website describing the container.pods.list, but I don't know where I should add it, or how it relates to the API scopes described here.

As per the error:
pods is forbidden: User "12341234123451234567" cannot list resource
"pods" in API group "" at the cluster scope: Required
"container.pods.list" permission.
it seems evident that the credentials you are using do not have permission to list pods.
The full list of permissions is documented at https://cloud.google.com/kubernetes-engine/docs/how-to/iam.
Several roles can come into play here:
If you are able to get the cluster, that is covered by several roles, such as Kubernetes Engine Cluster Admin, Kubernetes Engine Cluster Viewer, Kubernetes Engine Developer and Kubernetes Engine Viewer.
Whereas if you want to list pods with kub.list_pod_for_all_namespaces(watch=False), you might need the Kubernetes Engine Viewer role.
You should be able to add multiple roles.
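Roles are granted to the service account through the project's IAM policy, which is a list of bindings from a role to its members. As a rough illustration of what adding a role means, the sketch below merges a binding into a policy dictionary of that shape. The helper name add_binding and the service account address are made up, and roles/container.viewer is assumed here to be the identifier of the Kubernetes Engine Viewer role.

```python
import copy


def add_binding(policy, role, member):
    """Return a copy of the IAM policy with member added to role."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            # Role already present: add the member only if it is missing.
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical policy and service account, for illustration only.
policy = {
    "bindings": [
        {"role": "roles/container.clusterViewer",
         "members": ["serviceAccount:my-sa@my-project.iam.gserviceaccount.com"]},
    ]
}
member = "serviceAccount:my-sa@my-project.iam.gserviceaccount.com"
new_policy = add_binding(policy, "roles/container.viewer", member)

print([b["role"] for b in new_policy["bindings"]])
```

The same change can be made in the IAM page of the Cloud Console. The sketch only shows the data structure involved.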

how do I get the project of a service account?

I'm using the Python google.cloud API.
For example, using the metrics module:
from google.cloud import monitoring
client = monitoring.Client()
client.query('my/gcp/metric', minutes=10)
For my GOOGLE_APPLICATION_CREDENTIALS I'm using a service account that has specific access to a GCP project.
Does google.cloud have any modules that can let me derive the project from the service account (like get what project the service account is in)?
This would be convenient because each service account only has access to a single project, so I could set my service account and be able to reference that project in code.

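Not sure if this will work, you may need to tweak it:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
# 'yourservicename' and 'v1' are placeholders: use a service and version that expose projects().list().
service = discovery.build('yourservicename', 'v1', credentials=credentials)
# list() only builds the request; execute() sends it. The shape of the response depends on the service.
response = service.projects().list().execute()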

The Google Cloud Identity and Access Management (IAM) API has a 'serviceAccounts.get' method, which shows the projects associated with a service account, as shown here. You need to have proper permissions on the projects for the API call to work.

The method google.auth.default returns a tuple (credentials, project_id); the project ID is filled in if that information is available in the environment.
Also, the client object knows to which project it is linked from (either client.project or client.project_id, I'm not sure which one for the Monitoring API).
If you set the service account manually with the GOOGLE_APPLICATION_CREDENTIALS env var, you can open the file and load its json. One of the parameters in a service account key file is the project id.
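Reading the project from the key file can be sketched with the standard library alone. The helper name project_from_key_file is made up, and the temporary file below is only a stand-in for a real service account key so that the example can run anywhere.

```python
import json
import os
import tempfile


def project_from_key_file(path):
    """Return the project_id stored in a service account key file, if any."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("project_id")


# Stand-in key file containing only the fields needed for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"type": "service_account", "project_id": "my-project"}, tmp)

demo_project = project_from_key_file(tmp.name)
os.remove(tmp.name)

print(demo_project)
```

In real use the path would come from os.environ["GOOGLE_APPLICATION_CREDENTIALS"].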

How to use Bigquery streaming insertall on app engine & python

I would like to develop an App Engine application that streams data directly into a BigQuery table.
According to Google's documentation, there is a simple way to stream data into BigQuery:
http://googlecloudplatform.blogspot.co.il/2013/09/google-bigquery-goes-real-time-with-streaming-inserts-time-based-queries-and-more.html
https://developers.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples
(note: in the above link you should select the python tab and not Java)
Here is the sample code snippet on how streaming insert should be coded:
body = {"rows":[
    {"json": {"column_name":7.7,}}
]}
response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()
Although I've downloaded the client API, I couldn't find any reference to the "bigquery" module/object used in Google's example above.
Where should the bigquery object (from the snippet) come from?
Can anyone show a more complete way to use this snippet (with the right imports)?
I've been searching for this a lot and found the documentation confusing and incomplete.

Minimal working (as long as you fill in the right ids for your project) example:
import httplib2
from apiclient import discovery
from oauth2client import appengine

_SCOPE = 'https://www.googleapis.com/auth/bigquery'

# Change the following 3 values:
PROJECT_ID = 'your_project'
DATASET_ID = 'your_dataset'
TABLE_ID = 'TestTable'

body = {"rows":[
    {"json": {"Col1":7,}}
]}

# Authenticate as the App Engine service ("robot") account.
credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
http = credentials.authorize(httplib2.Http())

# This is where the bigquery object comes from.
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()
print(response)
As Jordan says: "Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset. Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs."
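When streaming more than one record, it may help to build the request body programmatically and to inspect the response for rows that were rejected. The sketch below assumes the insertAll request and response shapes of the BigQuery v2 REST API (an optional insertId per row, and an insertErrors list in the response). The helper names and the sample response are made up for illustration.

```python
import uuid


def build_insert_all_body(records):
    """Wrap plain dicts in the row structure expected by tabledata().insertAll()."""
    return {
        "rows": [
            # insertId lets BigQuery de-duplicate on a best-effort basis,
            # provided the same body is reused when a request is retried.
            {"insertId": str(uuid.uuid4()), "json": record}
            for record in records
        ]
    }


def failed_row_indexes(response):
    """Return the indexes of rows that the API reported as not inserted."""
    return [error["index"] for error in response.get("insertErrors", [])]


body = build_insert_all_body([{"Col1": 7}, {"Col1": 8}])

# Hypothetical response in which the second row was rejected.
sample_response = {
    "kind": "bigquery#tableDataInsertAllResponse",
    "insertErrors": [{"index": 1, "errors": [{"reason": "invalid"}]}],
}

print(failed_row_indexes(sample_response))
```

The body produced here can be passed as body=body in the insertAll call above, and the returned response checked with failed_row_indexes.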

Here is a working code example from an appengine app that streams records to a BigQuery table. It is open source at code.google.com:
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/main.py#124
To find out where the bigquery object comes from, see
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/config.py
Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset.
Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs.
