Databricks table usage read/writes in a database

I tried googling how to obtain read/write usage statistics for the tables in a database with Spark SQL, but had no success.
It can be as simple as:
table1 | 3 times this month
table2 | 4 times this month
Any other, more specific statistics would do as well.
I'm not an owner of the TAC cluster, so I don't have detailed access to the driver logs.
Thanks.

Configure diagnostic log delivery
Log in to the Azure portal as an Owner or Contributor for the Azure Databricks workspace and click your Azure Databricks Service resource.
In the Monitoring section of the sidebar, click the Diagnostic settings tab.
Click Turn on diagnostics.
On the Diagnostic settings page, provide the following configuration:
Name: Enter a name for the logs to create.
Archive to a storage account
Reference: https://learn.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs
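
Once the logs are delivered, a tally like the one asked for in the question could be produced with a few lines of Python. This is only a sketch: the record layout (dicts with `table` and `timestamp` keys) is hypothetical, and real diagnostic log records would first have to be mapped to it.

```python
from collections import Counter
from datetime import datetime


def monthly_table_usage(events):
    """Count events per (table, 'YYYY-MM') pair.

    `events` is an iterable of dicts with the hypothetical keys
    'table' (str) and 'timestamp' (ISO 8601 str).
    """
    counts = Counter()
    for event in events:
        month = datetime.fromisoformat(event["timestamp"]).strftime("%Y-%m")
        counts[(event["table"], month)] += 1
    return counts


# Made-up sample records, for illustration only.
sample_events = [
    {"table": "table1", "timestamp": "2023-01-03T10:00:00"},
    {"table": "table1", "timestamp": "2023-01-15T08:30:00"},
    {"table": "table1", "timestamp": "2023-01-20T17:45:00"},
    {"table": "table2", "timestamp": "2023-01-04T09:00:00"},
    {"table": "table2", "timestamp": "2023-02-01T12:00:00"},
]

for (table, month), n in sorted(monthly_table_usage(sample_events).items()):
    print(f"{table} | {n} times in {month}")
```

Grouping by month keeps the output close to the "table | N times this month" format from the question; any other bucket (day, week) would only change the `strftime` pattern.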

Related

Copy an Azure table (SAS) to a db on Microsoft SQL Server

Just that: is there a way to copy an Azure table (with a SAS connection) to a db on Microsoft SQL Server? Would it be possible with Python?
Thank you all!
I've tried with SSIS in Visual Studio 2019, with no success.

You can use Azure Data Factory or Azure Synapse to copy the data from Azure Table storage to Azure SQL Database. If you are new to Data Factory, refer to the MS document "Introduction to Azure Data Factory - Azure Data Factory | Microsoft Learn".
Also refer to the MS document "Copy data to and from Azure Table storage - Azure Data Factory & Azure Synapse | Microsoft Learn".
I tried to reproduce this in my environment.
Linked services are created for Azure Table storage and Azure SQL Database.
In the linked service for Azure Table storage, SAS URI is selected as the authentication method, and the URL and token are provided.
Similarly, the linked service for Azure SQL Database is created by giving the server name, database name, username and password.
Then a Copy activity is added, and a source dataset for Table storage is created and selected in the source settings.
Similarly, a sink dataset is created.
Once the source and sink datasets are configured in the Copy activity, the pipeline is run to copy the data from Table storage to Azure SQL Database.
In this way, data can be copied from Azure Table storage with a SAS key to Azure SQL Database.
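
Since the question also asks about Python, here is a rough sketch of the same copy done in code rather than in a pipeline. It assumes the `azure-data-tables` and `pyodbc` packages are installed, that the target SQL table already exists with matching columns, and that the endpoint, SAS token, connection string, table names and column list are all supplied by you; every name shown is a placeholder, not something from the original posts.

```python
def build_insert(sql_table, columns):
    """Return a parameterized INSERT statement using pyodbc's '?' markers.

    `sql_table` and `columns` are interpolated as identifiers, so they
    must come from trusted configuration, never from user input.
    """
    column_list = ", ".join(f"[{c}]" for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {sql_table} ({column_list}) VALUES ({placeholders})"


def entities_to_rows(entities, columns):
    """Turn dict-like table entities into tuples ordered like `columns`.

    Properties missing from an entity become None (NULL in SQL).
    """
    return [tuple(entity.get(c) for c in columns) for entity in entities]


def copy_table(endpoint, sas_token, source_table, odbc_conn_str, sql_table, columns):
    """Copy all entities of an Azure table into an existing SQL Server table."""
    # Third-party imports are local so the helpers above work without them.
    import pyodbc
    from azure.core.credentials import AzureSasCredential
    from azure.data.tables import TableClient

    table_client = TableClient(
        endpoint=endpoint,
        table_name=source_table,
        credential=AzureSasCredential(sas_token),
    )
    rows = entities_to_rows(table_client.list_entities(), columns)

    conn = pyodbc.connect(odbc_conn_str)
    try:
        if rows:  # executemany rejects an empty parameter list
            cursor = conn.cursor()
            cursor.executemany(build_insert(sql_table, columns), rows)
            conn.commit()
    finally:
        conn.close()
    return len(rows)
```

This loads the whole table into memory before inserting, which is fine for small tables; for large ones you would want to insert in batches.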

How to read and edit MS Access file located in Cloud or in a Website using Python? [duplicate]

I have an MS Access 2003 database located at (Http://SubDomain.DomanName.Com/Folder1/mydb.mdb) and Visual Basic 6.0. How do I establish a connection?
I'm just a registered user on the host domain, with a sub-domain and full access to the (Folder1) directory.
How can I connect to this path?

You can't.
Jet is an embedded database technology. The database engine runs in-process and needs full file I/O access to the MDB, LDB, and MDW files involved.
There is the possibility of using Remote Data Service, but this is an "unconnected" model of access, basically a sort of Web Service. A 3rd party hosting provider would be very unlikely to provision this service or give you the ability to do so yourself.
Instead you'll probably have to design and implement some sort of Web Service or otherwise run a middle tier on the server instead.

In GCP, how can we fetch resources which don't have any labels?

Is there any way from the CLI (gcloud or Python) to fetch all the resources in Compute, Cloud Composer, GCS buckets, Kubernetes Engine, Dataproc, and AI notebooks which don't have any labels? It is required for audit purposes.

Posting as a Community Wiki as it's based on John Hanley's comments:
You can use the beta version of the asset command in gcloud. More specifically, you could use:
gcloud beta asset feeds list
You can find more information on the flags required for that command in its reference documentation.
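
Whatever command produces the inventory, the "no labels" part can be done as a small filter in Python. The sketch below assumes the inventory was exported as a JSON list of objects with `name`, `assetType` and an optional `labels` map (roughly what something like `gcloud asset search-all-resources --format=json` emits; the exact field names are an assumption here, and the sample data is made up).

```python
import json


def unlabeled(resources):
    """Return the resources whose 'labels' field is missing or empty."""
    return [r for r in resources if not r.get("labels")]


# Hypothetical sample inventory; the resource names are invented.
sample = json.loads("""
[
  {"name": "//compute.googleapis.com/projects/demo/zones/us-central1-a/instances/vm-1",
   "assetType": "compute.googleapis.com/Instance",
   "labels": {"team": "data"}},
  {"name": "//storage.googleapis.com/demo-bucket",
   "assetType": "storage.googleapis.com/Bucket"},
  {"name": "//compute.googleapis.com/projects/demo/zones/us-central1-a/instances/vm-2",
   "assetType": "compute.googleapis.com/Instance",
   "labels": {}}
]
""")

for resource in unlabeled(sample):
    print(resource["assetType"], resource["name"])
```

Treating an empty `labels` map the same as a missing one is a deliberate choice for audit purposes; both mean the resource carries no labels.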

BigQuery cross project access via cloud functions

Let's say I have two GCP Projects, A and B. And I am the owner of both projects. When I use the UI, I can query BigQuery tables in project B from both projects. But I run into problems when I try to run a Cloud Function in project A, from which I try to access a BigQuery table in project B. Specifically I run into a 403 Access Denied: Table <>: User does not have permission to query table <>.. I am a bit confused as to why I can't access the data in B and what I need to do. In my Cloud Function all I do is:
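from google.cloud import bigquery
client = bigquery.Client()  # runs as the function's service account (project A)
query = client.query("<my-query>")  # placeholder for the actual SQL string
res = query.result()  # waits for the query job to finish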
The service account used to run the function exists in project A - how do I give it editor access to BigQuery in project B? (Or what else should I do?).

Basically, you have an issue with the IAM permissions and roles of the service account used to run the function.
You should grant the role bigquery.admin to that service account in project B, and that should do the trick.
However, this may not be an adequate solution with regard to best practices. The link below provides a few scenarios with examples of the roles best suited to your case.
https://cloud.google.com/bigquery/docs/access-control-examples
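
Once the service account has access in project B, the function itself only needs to reference the table by its fully qualified name. A minimal sketch, assuming the `google-cloud-bigquery` package and using made-up project, dataset and table names:

```python
def qualified_table(project, dataset, table):
    """Return a backtick-quoted `project.dataset.table` reference for SQL."""
    return f"`{project}.{dataset}.{table}`"


def count_rows(job_project, data_project, dataset, table):
    """Run a query job in `job_project` against a table in `data_project`."""
    # Third-party import (google-cloud-bigquery), kept local to this function.
    from google.cloud import bigquery

    client = bigquery.Client(project=job_project)
    sql = f"SELECT COUNT(*) AS n FROM {qualified_table(data_project, dataset, table)}"
    rows = client.query(sql).result()  # blocks until the job completes
    return next(iter(rows))["n"]


# Hypothetical names: the job runs in project A, the data lives in project B.
print(qualified_table("project-b", "my_dataset", "my_table"))
```

The query job is created in the project passed to `bigquery.Client`, so the service account needs permission to create jobs there and permission to read the data in the other project.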

Is it possible to limit a Google service account to specific BigQuery datasets within a project?

I've set up a service account using the GCP UI for a specific project Project X. Within Project X there are 3 datasets:
Dataset 1
Dataset 2
Dataset 3
If I assign the role BigQuery Admin to the service account on Project X, it is inherited by all 3 datasets.
Currently all of these datasets inherit the permissions assigned to the service account at the project level. Is there any way to modify the permissions for the service account such that it only has access to specified datasets? e.g. allow access to Dataset 1 but not Dataset 2 or Dataset 3.
Is this type of configuration possible?
I've tried to add a condition in the UI but when I use the Name resource type and set the value equal to Dataset 1 I'm not able to access any of the datasets - presumably the value is not correct. Or a dataset is not a valid name resource.
UPDATE
Adding some more detail regarding what I'd already tried before posting, as well as some more detail on what I'm doing.
For my particular use case, I'm trying to perform SQL queries as well as modify tables in BigQuery through the API (using Python).
Case A:
I create a service account with the role 'BigQuery Admin'.
This role is propagated to all datasets within the project - the property is inherited and I cannot delete this service account role from any of the datasets.
In this case I'm able to query all datasets and tables using the Python API - as you'd expect.
Case B:
I create a service account with no default role.
No role is propagated and I can assign roles to specific datasets by clicking on the 'Share dataset' option in the UI to assign the 'BigQuery Admin' role to them.
In this case I'm not able to query any of the datasets or tables and get the following error if I try:
*Forbidden: 403 POST https://bigquery.googleapis.com/bq/projects/project-x/jobs: Access Denied: Project X: User does not have bigquery.jobs.create permission in project Project X.*
Even though the permissions required (bigquery.jobs.create in this case) exist for the dataset I want, I can't query the data as it appears that the bigquery.jobs.create permission is also required at a project level to use the API.

I'm posting the solution that I found to the problem in case it is useful to anyone else trying to accomplish the same.
Assign the role "BigQuery Job User" at a project level in order to have the permission bigquery.jobs.create assigned to the service account for that project.
You can then manually assign the service account the role of "BigQuery Data Editor" on specific datasets in order to query them through the API in Python. Do this by clicking on "Share dataset" in the BigQuery UI. So for this example, I've "Shared" Dataset 1 and Dataset 2 with the service account.
You should now be able to query the datasets for which you've assigned the BigQuery Data Editor role in Python.
However, for Dataset 3, for which the "BigQuery Data Editor" role has not been assigned, if you attempt to query a table this should return the error:
Forbidden: 403 Access Denied: Table Project-x:dataset_3.table_1: User does not have permission to query table Project-x:dataset_3.table_1.
As described above, we now have sufficient permissions to access the project but not the table within Dataset 3 - by design.
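
The "Share dataset" step can also be scripted instead of clicked. The following is a sketch only, assuming the `google-cloud-bigquery` package; the dataset ID and service account e-mail are placeholders, and it uses the legacy dataset role "WRITER", which corresponds to BigQuery Data Editor.

```python
def access_entry_fields(service_account_email, role="WRITER"):
    """Keyword arguments for a dataset-level access entry.

    Service accounts are addressed by e-mail, like individual users.
    """
    return {
        "role": role,
        "entity_type": "userByEmail",
        "entity_id": service_account_email,
    }


def share_dataset(client, dataset_id, service_account_email, role="WRITER"):
    """Append an access entry for the service account to one dataset."""
    # Third-party import (google-cloud-bigquery), kept local to this function.
    from google.cloud import bigquery

    dataset = client.get_dataset(dataset_id)
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(**access_entry_fields(service_account_email, role))
    )
    dataset.access_entries = entries
    # Only the access_entries field is sent in the update.
    return client.update_dataset(dataset, ["access_entries"])
```

Calling `share_dataset` for Dataset 1 and Dataset 2 only, and never for Dataset 3, would reproduce the setup described above. The project-level "BigQuery Job User" role still has to be granted separately.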

As you can see here, you can grant access in your dataset to some entities, including service accounts:
Google account e-mail: Grants an individual Google account access to the dataset
Google Group: Grants all members of a Google group access to the dataset
Google Apps Domain: Grants all users and groups in a Google domain access to the dataset
Service account: Grants a service account access to the dataset
Anybody: Enter "allUsers" to grant access to the general public
All Google accounts: Enter "allAuthenticatedUsers" to grant access to any user signed in to a Google Account
I suggest that you create a service account without permissions in BigQuery and then grant the access for a specific dataset.
I hope it helps you.

Please keep in mind that access to BigQuery can be granted at project level or dataset level.
The dataset is the lowest level at which you can assign permissions, so accounts can access all the resources in the dataset, e.g. tables, views, columns and rows. Permissions at project level, as you have already noticed, are propagated (inherited) to all the datasets in the project.
Regarding your service account, by default Google Cloud assigns it an address with a structure like service_account_name@example.gserviceaccount.com, and during the process of sharing the dataset, as commented by @rmesteves, you will need this email address to grant it the desired permissions.
It seems that the steps you described ("Name resource type") are not the correct ones. In the BigQuery UI please try:
Click on the name of the dataset you want to share (e.g. Dataset1 in your example).
Then, on the right of the screen, you will see the option "Share Dataset"; click on it.
Follow the instructions to assign your service account a BigQuery role such as BigQuery Admin, BigQuery Data Owner or BigQuery User, among others. Check the previous link to see what each role is allowed to do.
