I know that using the Google client library (dataset.AccessEntry), we can update roles on a specific dataset for the requested user (Reference). But I want to know how to remove that access when a role has changed (e.g. from Reader to Writer/Owner).
I want to do this automatically: the role, dataset name and email come from the UI as input, and the Python code should update the roles on the requested dataset. Appreciate your help.
I am able to delete the entry from dataset.AccessEntry by using the remove() method, which removes the first matching element (passed as an argument) from a list in Python. You need to specify PROJECT, DATASET_NAME and the role, entity_type, entity_id of the corresponding entry you wish to remove.
from google.cloud import bigquery

PROJECT = '<PROJECT_NAME>'
bq = bigquery.Client(project=PROJECT)
dsinfo = bq.get_dataset("<DATASET_NAME>")

# Specify the entry that will lose access to the dataset
entry = bigquery.AccessEntry(
    role="<ROLE>",
    entity_type="<ENTITY_TYPE>",
    entity_id="<EMAIL>",
)

if entry in dsinfo.access_entries:
    entries = list(dsinfo.access_entries)
    entries.remove(entry)
    dsinfo.access_entries = entries
    dsinfo = bq.update_dataset(dsinfo, ["access_entries"])
else:
    print("Entry wasn't found in dsinfo.access_entries")

print(dsinfo.access_entries)
You can find the official documentation for google.cloud.bigquery.dataset.AccessEntry here.
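If the role, dataset name and email arrive from the UI as input, the same logic can be wrapped in a small helper (a sketch; the function name and parameter names are mine, not part of the client library):
from google.cloud import bigquery

def remove_access(project, dataset_name, role, entity_type, email):
    """Remove a matching access entry from the dataset, if present."""
    bq = bigquery.Client(project=project)
    dsinfo = bq.get_dataset(dataset_name)
    entry = bigquery.AccessEntry(role=role, entity_type=entity_type, entity_id=email)
    if entry not in dsinfo.access_entries:
        return False
    entries = list(dsinfo.access_entries)
    entries.remove(entry)
    dsinfo.access_entries = entries
    bq.update_dataset(dsinfo, ["access_entries"])
    return True

# e.g. with values supplied by the UI (hypothetical example values):
# remove_access("my-project", "my_dataset", "READER", "userByEmail", "user@example.com")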
Related
I've been trying to attach policy tags to existing BigQuery tables using GCP's Python client libraries but can't seem to find a way.
I'm aware of how to create taxonomies and policy tags within PolicyTagManager
But I can't seem to attach policy tags.
I found 'google.cloud.bigquery.schema.PolicyTagList' but I'm not sure where the table ID gets specified, or how policy tags would be attached using this method.
Appreciate the help in advance!
Assumption: column names stay the same before and after each one is updated with policy tags.
Steps:
Pull the table using get_table() and iterate over its schema (a list of SchemaField)
Create new SchemaField object(s), copying the parameter values from each SchemaField in step 1 (name, field_type, description, etc.), except for policy_tags (type = PolicyTagList),
for example:
google.cloud.bigquery.SchemaField(name=column.name, field_type=column.field_type, policy_tags=google.cloud.bigquery.PolicyTagList(["projects/some-project/locations/us/taxonomies/123454321/policyTags/180000"]))
Append these new SchemaField objects to a list and re-assign the table's schema to this new list
Use update_table() to update the schema, for example:
google.cloud.bigquery.Client(project = BIGQUERY_PROJECT).update_table(table, ['schema'])
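Putting the steps together, a minimal sketch might look like this (untested; the project, dataset, table, column and taxonomy names are placeholders):
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.my_dataset.my_table")

policy_tags = bigquery.PolicyTagList(
    names=["projects/my-project/locations/us/taxonomies/123454321/policyTags/180000"]
)

new_schema = []
for column in table.schema:
    if column.name == "ssn":  # hypothetical sensitive column
        # rebuild the field, keeping its existing attributes and adding the policy tag
        new_schema.append(
            bigquery.SchemaField(
                name=column.name,
                field_type=column.field_type,
                mode=column.mode,
                description=column.description,
                policy_tags=policy_tags,
            )
        )
    else:
        new_schema.append(column)

table.schema = new_schema
table = client.update_table(table, ["schema"])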
I have a program reading data from a Google firestore database.
The database contains data for different users, and each instance of the program is supposed to read data only for a specified user.
The data is organized in this way:
UsersInfo (Collection)
|________User01 (document)
|________User02 (document)
...
|________UserN (document)
where each of the User documents contains an identifying ID.
The first time the program runs, it initializes the database and looks for the right document containing the user info this way:
import firebase_admin
from firebase_admin import credentials, firestore

cred = credentials.Certificate("credentials_file.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

docs = db.collection(u'UsersInfo').stream()
user_found = False
current_user_document = ''

## find the right document, based on user_ID
try:
    for doc in docs:
        if doc.to_dict()['Userid'] == user_ID:
            current_user_document = doc.id
            user_found = True
            print(f"User found in document {current_user_document}")
            break
except:
    print("Impossible to find user in firestore!!!")
At this point, the correct document for the required user has been located.
This info is passed to other processes in the system, which routinely check this document to retrieve some info, something like:
doc_ref = db.collection(u'UserInfo').document(UserXX)
return doc_ref.get().to_dict()['some_field']
I was expecting that:
during the initialization, the program checks all the UserXX documents in the collection (there are about 50 of them) -> 50 reads;
every time the other processes check the identified User document, it counts as another read.
However, the number of reported reads is skyrocketing. I ran the system a couple of times today; each time it performed the initialization, and the other components checked the User document 4 or 5 times. But now the usage reports 11K reads!
Am I doing something wrong, or did I misunderstand what even counts as a read?
This one line alone immediately costs one read for every document in the collection:
docs = db.collection(u'UsersInfo').stream()
It doesn't matter what you do next - all of the documents are now read and available in memory.
If you are looking for only documents in the collection whose Userid field contains a specific value, you should query the collection using a filter on that field.
docs = db.collection(u'UsersInfo').where(u'Userid', u'==', user_ID).stream()
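Applied to the initialization code above, the lookup could look roughly like this (a sketch using the same names as in the question):
docs = db.collection(u'UsersInfo').where(u'Userid', u'==', user_ID).stream()

current_user_document = ''
user_found = False
for doc in docs:
    # only the matching document(s) are streamed, so only they are billed as reads
    current_user_document = doc.id
    user_found = True
    print(f"User found in document {current_user_document}")
    break

if not user_found:
    print("Impossible to find user in firestore!!!")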
I can see in the Couchbase admin console that my following Python code is putting the data/documents in the bucket:
def write_to_bucket(received_cluster, current_board, always_never, position):
    board_as_a_string = ''.join(['s', 'i', 'e'])
    cb = received_cluster.open_bucket('boardwascreated')
    cb.upsert(board_as_a_string,
              {'BoardAsString': board_as_a_string})
But then no matter what I do I can't query the data from Python. I try things like:
def database_query(receiving_cluster):
    cb = receiving_cluster.open_bucket('boardwascreated')
    for row in cb.n1ql_query('SELECT * FROM boardwascreated'):
        print(row)
I am trying every possible thing from https://docs.couchbase.com/python-sdk/2.5/n1ql-queries-with-sdk.html, but every time I try something I get the following error:
No index available on keyspace boardwascreated that matches your query.
Use CREATE INDEX or CREATE PRIMARY INDEX to create an index,
or check that your expected index is online.
To run N1QL queries on a bucket, you need an index on that bucket. The basic way to do that is to create a primary index on it.
Open the admin console and go to the Query Workbench (the "Query" tab on the left), then run this command to create the primary index:
CREATE PRIMARY INDEX ON boardwascreated
You will also need to supply username/password credentials in order to access the bucket. Initially, you can use whatever Administrator/admin_password combination you have created. I'm not sure off-hand how to supply that using the Python SDK; check the docs.
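For the 2.x Python SDK that the question links to, supplying credentials looks roughly like this (a sketch; the connection string and credentials are placeholders):
from couchbase.cluster import Cluster, PasswordAuthenticator

# placeholder connection string and credentials
cluster = Cluster('couchbase://localhost')
cluster.authenticate(PasswordAuthenticator('Administrator', 'admin_password'))

cb = cluster.open_bucket('boardwascreated')
for row in cb.n1ql_query('SELECT * FROM boardwascreated'):
    print(row)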
Later, you should go to the Security tab and create a specialized user for whatever application you are building and give that user whatever query permissions they need.
I'm using the Python SDK to create a TDE file. I want to add multiple tables to the TDE file. So I tried doing that but I got a duplicate name error:
dataextract.Exceptions.TableauException: TableauException (303):
duplicate table name
No problemo, I changed the name so that it counts up with each table I create:
tde_table = tde_file.addTable('Extract'+str(i), table_definition)
but then I get a new and exciting error:
dataextract.Exceptions.TableauException: TableauException (303): table
name must be "Extract"
Perhaps Extracts created through the SDK cannot have more than one table per extract? If every table in an extract needs to be named the same thing, but they can't have duplicate names... I'm confused. Can someone help clarify this for me?
Here's all the relevant code I think, but I don't know if it'll be much help:
...
for i, df in enumerate(dataframes):
    table_return_list = _form_table_definition(df, data_types, read_out)
    table_definition = table_return_list[0]
    header_type_map = table_return_list[1]

    # use the table definition to create the table and row
    tde_table = tde_file.addTable('Extract' + str(i), table_definition)
    tde_row = tde.Row(table_definition)
...
It seems that, at the moment, it's impossible to add more than one table to a data extract through the Python SDK. I don't know otherwise.
http://onlinehelp.tableau.com/current/api/sdk/en-us/SDK/Python/html/classtableausdk_1_1_extract_1_1_extract.html#a70b49a6eca6f1724bd89a928c73ecc8c
From their SDK documentation:
def tableausdk.Extract.Extract.addTable(self, name, tableDefinition)
Adds a table to the extract.
Parameters
self: The object pointer.
name: The name of the table to add. Currently, this method can only add a table named "Extract".
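Given that limitation, one possible workaround (a sketch, not tested) is to write one .tde file per dataframe, each containing its single table named "Extract":
import dataextract as tde  # the legacy SDK module used in the question

for i, df in enumerate(dataframes):
    table_return_list = _form_table_definition(df, data_types, read_out)
    table_definition = table_return_list[0]

    # one extract file per dataframe, each with its single allowed table
    tde_file = tde.Extract('extract_{}.tde'.format(i))
    tde_table = tde_file.addTable('Extract', table_definition)
    # ... populate tde_table with tde.Row(table_definition) as before ...
    tde_file.close()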
I'm having some trouble wrapping my head around NDB. For some reason it's just not clicking. The thing I'm struggling with the most is the whole key/kind/ancestor structure.
I'm just trying to store a simple set of JSON data. When I store data, I want to check beforehand whether a duplicate entity exists (based on the key, not the data) so I don't store a duplicate entity.
class EarthquakeDB(ndb.Model):
    data = ndb.JsonProperty()
    datetime = ndb.DateTimeProperty(auto_now_add=True)
Then, to store data:
quake_entry = EarthquakeDB(parent=ndb.Key('Earthquakes', quake['id']), data=quake).put()
So my questions are:
How do I check to see if that particular key exists before I insert more data?
How would I go about pulling that data out to read, based on the key?
After some trial and error, and with the assistance of voscausa, here is what I came up with to solve the problem. The data is read in via a for loop.
for quake in data:
    quake_entity = EarthquakeDB.get_by_id(quake['id'])
    if quake_entity:
        continue
    else:
        quake_entity = EarthquakeDB(id=quake['id'], data=quake).put()
Because you do not provide a full NDB key (only a parent), you will always insert a unique key.
But you use your own entity id for the parent? Why?
I think you mean:
quake_entry = EarthquakeDB(id=quake['id'], data=quake)
quake_entry.put()
To get it, you can use:
quake_entry = ndb.Key('Earthquakes', quake['id']).get()
Here you can find two excellent videos about the datastore, strong consistency and entity groups. Datastore Introduction and Datastore Query, Index and Transaction.
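As a side note, ndb also has get_or_insert(), which combines the existence check and the insert in one call; a sketch for the loop above:
for quake in data:
    # creates the entity with this id only if it does not already exist
    EarthquakeDB.get_or_insert(quake['id'], data=quake)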