Random IDs showing up in Firestore collection - python

I'm getting auto-generated IDs in my Firestore collection even though I'm specifying IDs when creating documents.
I'm load testing my FastAPI app, currently running the tests synchronously. My Firestore IDs come from a counter stored in Firebase's Realtime Database. The counter consists of alphanumeric characters, and I increment it in a transaction. I then check whether a document with that ID already exists in Firestore; when I find an ID that doesn't exist, I use .set() to create a document with that ID.
def increment_realtimedb() -> str:
    try:
        return rdb.transaction(crement)
    except db.TransactionAbortedError:
        increment_realtimedb()

def insert_firestore(payload: dict):
    new_id = increment_realtimedb()
    doc = collection.document(new_id).get()
    while doc.exists:
        new_id = increment_realtimedb()
        doc = collection.document(new_id).get()
    collection.document(new_id).set(payload)

Figured out that increment_realtimedb() was somehow returning None. I changed the while loop to also check whether new_id is None, and that seems to have fixed the problem:

    while doc.exists or new_id is None:

Edit:
After further research, it turns out the Realtime Database will return None when you max out the retries for a transaction.
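Worth noting: the except branch in increment_realtimedb() also discards the result of the recursive retry (there is no return in front of the call), so the function yields None whenever that path is taken. Below is a minimal sketch of a version that guards against both causes; the bounded retry loop and its limit of 5 are illustrative choices, not from the original post:

def increment_realtimedb() -> str:
    # Loop instead of recursing, so the result is always returned
    # and the retry depth stays bounded.
    for _ in range(5):
        try:
            new_id = rdb.transaction(crement)
            if new_id is not None:
                return new_id
        except db.TransactionAbortedError:
            continue  # transaction gave up; try once more
    raise RuntimeError("could not allocate an ID from the Realtime Database")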

Related

Reading data from Google firestore with Python, daily reads count increases too fast

I have a program reading data from a Google firestore database.
The database contains data for different users, and each instance of the program is supposed to read data only for a specified user.
The data is organized in this way:
UsersInfo (Collection)
|________User01 (document)
|________User02 (document)
...
|________UserN (document)
where each of the User documents contains an identifying ID.
The first time the program runs, it initializes the database and looks for the right document containing the user info this way:
cred = credentials.Certificate('credentials_file.json')
firebase_admin.initialize_app(cred)
db = firestore.client()

docs = db.collection(u'UsersInfo').stream()
user_found = False
current_user_document = ''

## find the right document, based on user_ID
try:
    for doc in docs:
        if doc.to_dict()['Userid'] == user_ID:
            current_user_document = doc.id
            user_found = True
            print(f"User found in document {current_user_document}")
            break
except:
    print("Impossible to find user in firestore!!!")
At this point, the correct document for the required user has been located.
This info is passed to other processes in the system, which routinely check this document to retrieve some info, something like:
doc_ref = db.collection(u'UsersInfo').document(current_user_document)
return doc_ref.get().to_dict()['some_field']
I was expecting that:
during initialization, the program checks all the UserXX documents in the collection (about 50 of them) -> 50 reads;
every time the other processes check the identified User document, it counts as another read.
However, the number of reported reads is skyrocketing. I ran the system a couple of times today; each time it performed the initialization and the other components checked the User document 4 or 5 times, yet Usage now reports 11K reads!
Am I doing something wrong, or did I misunderstand what counts as a read?
This one line alone immediately costs one read for every document in the collection:
docs = db.collection(u'UsersInfo').stream()
It doesn't matter what you do next - all of the documents are now read and available in memory.
If you are looking for only documents in the collection whose Userid field contains a specific value, you should query the collection using a filter on that field.
docs = db.collection(u'UsersInfo').where(u'Userid', u'==', user_ID).stream()
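If Userid values are unique, you can also cap the cost of the lookup explicitly. A minimal sketch of the initialization rewritten around the filtered query; the limit(1) and the None default are illustrative:

docs = db.collection(u'UsersInfo').where(u'Userid', u'==', user_ID).limit(1).stream()
current_user_document = next((doc.id for doc in docs), None)  # at most one read
if current_user_document is None:
    print("Impossible to find user in firestore!!!")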

python mysqlx.Result.get_autoincrement_value() doesn't work

I'm trying to use the document store of MySQL 8 in my Python project (Python 3.8). The version of MySQL Connector/Python is 8.0.20. According to the API reference and the X DevAPI User Guide, I tried to get the auto-increment document ID after adding a document to the DB. Each time, the data is inserted successfully, but get_autoincrement_value() returns -1.
My code is just like below:
try:
    schema = session.get_schema('my_schema')
    collection = schema.get_collection('my_collection')
    topic_dict = protobuf_to_dict(topic)
    doc_id = collection.add(topic_dict).execute().get_autoincrement_value()
    logger.debug('doc_id: {}', doc_id)
    return doc_id
except Exception as e:
    logger.exception("failed to add topic to db, topic: {}, err: {}", topic, e)
Is there anything wrong with my usage? Thank you all~
Seems like you are interested in the document id that has been auto-generated. If that is the case, you should instead use get_generated_ids:
doc_id = collection.add(topic_dict).execute().get_generated_ids()[0]
In this case, the method returns a list of all the ids that were generated in the scope of the add() operation.
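As a usage sketch, reusing the collection and topic_dict from the question, the generated id can then be used to fetch the document back:

result = collection.add(topic_dict).execute()
doc_id = result.get_generated_ids()[0]
# Look the document up again by its server-generated _id.
doc = collection.find('_id = :id').bind('id', doc_id).execute().fetch_one()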
The documentation is probably not clear enough, but get_autoincrement_value() only contains useful data if you are inserting a row with either session.sql() or table.insert() on a table containing an AUTO_INCREMENT column. It has no meaning in the scope of NoSQL collections, because in the end a collection is just a table created like this (condensed version):
CREATE TABLE collection (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32),
    PRIMARY KEY (`_id`)
)
Which means there isn't anything to "auto increment".
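For contrast, a minimal sketch of the case where get_autoincrement_value() does return the generated key; the my_schema.events table here is made up for illustration:

session.sql(
    "CREATE TABLE IF NOT EXISTS my_schema.events ("
    "  id INT AUTO_INCREMENT PRIMARY KEY,"
    "  note VARCHAR(100))"
).execute()
res = session.sql(
    "INSERT INTO my_schema.events (note) VALUES ('hello')"
).execute()
print(res.get_autoincrement_value())  # the AUTO_INCREMENT id of the new row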
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js

BigQuery Dataset How to Remove Roles From dataset.AccessEntry

I know that using the Google client library (dataset.AccessEntry), we can update roles on a specific dataset for a requested user (Reference). But I want to know how to remove that access when a role changes (for example from Reader to Writer/Owner).
I want to do this deletion automatically: the role, dataset name, and email come from the UI as input, and the Python code should update the roles on the requested dataset. Appreciate your help.
I was able to delete the entry from dataset.AccessEntry by using the remove() method, which removes the first matching element (passed as an argument) from a list in Python. You need to specify the PROJECT and DATASET_NAME, plus the role, entity_type, and entity_id of the entry you wish to remove.
from google.cloud import bigquery
from google.cloud.bigquery.dataset import DatasetReference

PROJECT = '<PROJECT_NAME>'
bq = bigquery.Client(project=PROJECT)
dsinfo = bq.get_dataset("<DATASET_NAME>")

# Specify the entry that will lose access to the dataset
entry = bigquery.AccessEntry(
    role="<ROLE>",
    entity_type="<ENTITY_TYPE>",
    entity_id="<EMAIL>",
)

if entry in dsinfo.access_entries:
    entries = list(dsinfo.access_entries)
    entries.remove(entry)
    dsinfo.access_entries = entries
    dsinfo = bq.update_dataset(dsinfo, ["access_entries"])
else:
    print("Entry wasn't found in dsinfo.access_entries")

print(dsinfo.access_entries)
You can find the official documentation for google.cloud.bigquery.dataset.AccessEntry here.
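To change a role rather than only revoke it, one approach is to drop the old entry and append the new one in the same update. A sketch under the same assumptions as above; the Reader-to-Writer values are illustrative:

entries = [e for e in dsinfo.access_entries
           if not (e.entity_id == "<EMAIL>" and e.role == "READER")]
entries.append(bigquery.AccessEntry(role="WRITER",
                                    entity_type="userByEmail",
                                    entity_id="<EMAIL>"))
dsinfo.access_entries = entries
dsinfo = bq.update_dataset(dsinfo, ["access_entries"])  # one API call applies both changes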

python firebase update items using where query

I am using Python with Firebase and have a table. I need to update all items that have a specific value in a field; I found out how to query for all the items having that value here and here.
Now I need to update all the items that I get in return.
What I would like to do is use a single query to perform a single operation that finds the correct items using the where query and updates a certain field to a certain value. I need that functionality in order to keep some data consistent.
Thanks
It is possible to do it, but in more steps:
# the usual preparation
import firebase_admin
from firebase_admin import credentials, firestore

databaseURL = {'databaseURL': "https://<YOUR-DB>.firebaseio.com"}
cred = credentials.Certificate("<YOUR-SERVICE-KEY>.json")
firebase_admin.initialize_app(cred, databaseURL)
database = firestore.client()
col_ref = database.collection('<YOUR-COLLECTION>')

# Query generator for the documents; get all people named Pepe
results = col_ref.where('name', '==', 'Pepe').get()

# Update Pepe to José
field_updates = {"name": "José"}
for item in results:
    doc = col_ref.document(item.id)
    doc.update(field_updates)
(I use Cloud Firestore, maybe it is different in Realtime Database)
You can iterate through the query and find the DocumentReference in each iteration.
db = firestore.client()
a = request_json['message']['a']
b = request_json['message']['b']

ref = db.collection('doc').where(u'a', u'==', a).stream()
for i in ref:
    i.reference.update({"c": "c"})  # get the document reference and use it to update/delete ...
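Firestore has no single update-where operation, so if the goal is keeping the matched documents consistent, one option is a batched write, which commits all the updates atomically. A minimal sketch under the same names as above; note a single batch is limited to 500 writes:

batch = db.batch()
for snapshot in db.collection('doc').where(u'a', u'==', a).stream():
    batch.update(snapshot.reference, {"c": "c"})
batch.commit()  # all updates succeed or fail together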

How to query if entity exists in app engine NDB

I'm having some trouble wrapping my head around NDB. For some reason it's just not clicking. The thing I'm struggling with the most is the whole key/kind/ancestor structure.
I'm just trying to store a simple set of JSON data. When I store data, I want to check beforehand whether a duplicate entity exists (based on the key, not the data) so I don't store a duplicate.
class EarthquakeDB(ndb.Model):
    data = ndb.JsonProperty()
    datetime = ndb.DateTimeProperty(auto_now_add=True)
Then, to store data:
quake_entry = EarthquakeDB(parent=ndb.Key('Earthquakes', quake['id']), data=quake).put()
So my questions are:
How do I check whether that particular key exists before I insert more data?
How would I go about pulling that data back out to read, based on the key?
After some trial and error, and with the assistance of voscausa, here is what I came up with to solve the problem. The data is read in via a for loop.

for quake in data:
    quake_entity = EarthquakeDB.get_by_id(quake['id'])
    if quake_entity:
        continue
    else:
        quake_entity = EarthquakeDB(id=quake['id'], data=quake).put()
Because you do not provide a full NDB key (only a parent) you will always insert a unique key.
But you use your own entity id for the parent? Why?
I think you mean:
quake_entry = EarthquakeDB(id=quake['id'], data=quake)
quake_entry.put()
To get it, you can use:
quake_entry = ndb.Key('Earthquakes', quake['id']).get()
Here you can find two excellent videos about the datastore, strong consistency and entity groups. Datastore Introduction and Datastore Query, Index and Transaction.
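As an aside, NDB also has a transactional shortcut for this check-then-create pattern; a minimal sketch using get_or_insert, which returns the existing entity or creates it atomically if the id is free:

for quake in data:
    # No race between the existence check and the write.
    quake_entity = EarthquakeDB.get_or_insert(quake['id'], data=quake)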
