Check if DynamoDB table is empty - Python

I have a DynamoDB table and I want to check if there are any items in it (using Python). In other words, return True if the table is empty.
I am not sure how to go about this. Any suggestions?

Using Scan
The simplest way is to scan and check the count. You are presumably using boto3, the AWS SDK for Python: use its scan function to scan the table and read the count. This need not be costly, because a single Scan request returns at most 1 MB of data, so it will not read the entire table in one call and is not time-consuming.
Read the docs for more details: Boto3 Docs
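For example, a minimal sketch of this approach (the table name 'details' is an assumption; Limit=1 stops the scan after reading a single item, which is enough to decide emptiness):

import boto3

dynamodb = boto3.client('dynamodb')

def is_table_empty(table_name):
    # Select='COUNT' returns only the count, and Limit=1 stops the
    # scan after evaluating at most one item.
    response = dynamodb.scan(TableName=table_name, Select='COUNT', Limit=1)
    return response['Count'] == 0

print(is_table_empty('details'))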
Using DescribeTable
This can be helpful for getting the count as well, but
DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
so it should only be used when you do not need the most recently updated value.
Read the docs for more details: describe table dynamodb
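A minimal sketch of the DescribeTable approach (the table name 'details' is an assumption; remember the count can be up to six hours stale):

import boto3

dynamodb = boto3.client('dynamodb')

# ItemCount is refreshed by DynamoDB roughly every six hours.
response = dynamodb.describe_table(TableName='details')
print(response['Table']['ItemCount'])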

You can simply take the count of that particular table using boto3, which is the AWS SDK for Python:
import boto3

def table_is_empty(table_name):
    dynamo_resource = boto3.resource('dynamodb')
    table = dynamo_resource.Table(table_name)
    return table.item_count == 0
Note that the values are updated periodically and the result might not be precise:
The number of items in the specified index. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
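For example, a quick usage check (assuming a table named 'details' exists in your account):

print(table_is_empty('details'))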

You can use the DescribeTable function from boto3; in the response you get the number of items in the table, as you can see in the response example at the link.
Part of the command response:
'TableSizeBytes': 123,
'ItemCount': 123,
'TableArn': 'string',
'TableId': 'string',
As said in the comments, the value is updated approximately every six hours, so recent changes may not be reflected.

Related

How to get the live count immediately after insertion from DynamoDB using boto3

I have a table called details.
I am trying to get the live item count from the table.
Below is the code.
I already had 7 items in the table and I inserted 8 more, so my output should show 15.
But it still shows 7, which is the old count. How do I get the live, updated count?
From the UI, when I check, it also shows 7, but when I checked the live count with Start Scan, I got 15 entries.
Is there some delay, like a few hours, before the live count is updated?
import boto3

dynamo_resource = boto3.resource('dynamodb')
dynamodb_table_name = dynamo_resource.Table('details')
item_count_table = dynamodb_table_name.item_count
print('table_name count for field is', item_count_table)
Using the client:
dynamoDBClient = boto3.client('dynamodb')
table = dynamoDBClient.describe_table(TableName='details')
print(table['Table']['ItemCount'])
In your example you are calling DynamoDB DescribeTable which only updates its data approximately every 6 hours:
Item Count:
The number of items in the specified table. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableDescription.html#DDB-Type-TableDescription-ItemCount
In order to get the "Live Count" you have two possible options:
Option 1: Scan the entire table.
import boto3

dynamoDBClient = boto3.client('dynamodb')
response = dynamoDBClient.scan(TableName='details', Select='COUNT')
print(response['Count'])
Be aware this will call for a full table Scan and may not be the best option for large tables.
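Also note that a single Scan request examines at most 1 MB of data, so counting a larger table means following LastEvaluatedKey and summing the per-page counts. A sketch of that loop (the table name comes from the question):

import boto3

dynamoDBClient = boto3.client('dynamodb')

def live_item_count(table_name):
    total = 0
    kwargs = {'TableName': table_name, 'Select': 'COUNT'}
    while True:
        response = dynamoDBClient.scan(**kwargs)
        total += response['Count']
        # Keep scanning until DynamoDB stops returning a pagination key.
        if 'LastEvaluatedKey' not in response:
            return total
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

print(live_item_count('details'))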
Option 2: Update a counter for each item you write
You would need to use TransactWriteItems and Update a live counter for every Put which you do to the table. This can be useful, but be aware that you will limit your throughput to 1000 WCU per second as you are focusing all your writes on a single item.
You can then return the live item count with a simple GetItem.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html
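A rough sketch of what that write path could look like (the key schema, attribute names, and counter item key below are assumptions, not taken from the question):

import boto3

dynamoDBClient = boto3.client('dynamodb')

# Write the new item and bump a dedicated counter item in one transaction.
dynamoDBClient.transact_write_items(
    TransactItems=[
        {
            'Put': {
                'TableName': 'details',
                'Item': {'pk': {'S': 'item#123'}}
            }
        },
        {
            'Update': {
                'TableName': 'details',
                'Key': {'pk': {'S': 'live_count'}},
                'UpdateExpression': 'ADD item_count :one',
                'ExpressionAttributeValues': {':one': {'N': '1'}}
            }
        }
    ]
)

# The live count is then a single GetItem away.
response = dynamoDBClient.get_item(
    TableName='details',
    Key={'pk': {'S': 'live_count'}}
)
print(response['Item']['item_count']['N'])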
According to the DynamoDB documentation,
ItemCount - The number of items in the global secondary index. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
You might need to keep track of the item counts yourself if you need to return an updated count shortly after editing the table.

How can I get the total number of calls to my DynamoDB table?

I'm working with AWS Lambda. I created a Lambda function that performs a get operation on my DynamoDB table. Depending on the id (primary key) I pass to this get function, it should return the correct item in JSON format. For that, I'm using the get_item function from boto3:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.get_item
So normally, if I call my Lambda function via an API (created with API Gateway) and specify an ID, I should get the corresponding item. The problem is that I also need the number of times a result has been retrieved. For example, if it's the seventh time I call my Lambda function, I should get an item (still depending on the id) and the index 7, like this:
{
    "7": {
        "id": 1246,
        "toy": "car",
        "color": "red"
    }
}
Logically, the number of times I call my Lambda function is the number of times I call DynamoDB. So I suppose the correct way to get this number is to use DynamoDB itself, but I have already spent hours trying to find a way to get this number of events/calls to my table by looking everywhere... What can I do to get this number, and how could I implement it using boto3?
There is no out-of-the-box solution to get the number of calls to a table in DynamoDB. You need to write a custom counter that is shared across Lambda invocations.
The easiest and probably fastest solution is Redis and its INCR operation, which performs atomic increments. If you're not familiar with Redis, check the docs for the INCR operation, specifically the Pattern: Counter section.
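A minimal sketch with the redis-py client (the host, port, and key name are assumptions):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# INCR is atomic, so concurrent Lambda invocations cannot lose an update.
call_number = r.incr('dynamodb_call_count')
print(call_number)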
If you only can use the DynamoDb, you need to maintain a counter in a separate single item. Example:
{
    "partitionKey": "counter_item",
    "counter": 1
}
Then you can execute update calls to increment the counter, like this:
import boto3

table = boto3.resource('dynamodb').Table('your_table_name')

response = table.update_item(
    # 'partitionKey' must match your table's actual partition key name.
    Key={'partitionKey': 'counter_item'},
    ReturnValues='ALL_NEW',
    UpdateExpression='SET #counter = if_not_exists(#counter, :default) + :incr',
    ExpressionAttributeValues={
        ':incr': 1,
        ':default': 0
    },
    ExpressionAttributeNames={
        '#counter': 'counter'
    }
)
There will be an updated item in the response so you can get the counter field from it.
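For example, reading the new value back (a usage sketch continuing the code above):

new_count = response['Attributes']['counter']
print(new_count)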
You can check this DynamoDB guide for more examples in Python.
I feel there are plenty of ways to find out the number of server calls.
If you have the logs, you can easily get the number of calls for any specific page.
Use the AWS dashboard to get the metrics (it has everything: latency, failure ratio, calls, etc.).
Write your own function that counts the get, post, and update calls. (This is similar to counting profile hits; generally it is done at the initial stage of a project.)
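If you go the metrics route, a sketch like the following can pull a per-operation request count from CloudWatch (the table name is an assumption; DynamoDB's SuccessfulRequestLatency metric reports the number of requests as its SampleCount statistic):

import datetime
import boto3

cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.utcnow()

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/DynamoDB',
    MetricName='SuccessfulRequestLatency',
    Dimensions=[
        {'Name': 'TableName', 'Value': 'your_table_name'},
        {'Name': 'Operation', 'Value': 'GetItem'},
    ],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=3600,
    Statistics=['SampleCount'],
)
for point in response['Datapoints']:
    print(point['SampleCount'])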

How to select all data in PyMongo?

I want to select all data, or select with a condition, from the table random, but I can't find any guide for doing this with MongoDB in Python.
And I can't display all the data that was selected.
Here is my code:
from pymongo import MongoClient

def mongoSelectStatement(result_queue):
    client = MongoClient('mongodb://localhost:27017')
    db = client.random
    cursor = db.random.find({"gia_tri": "0.5748676522161966"})
    # cursor = db.random.find()
    inserted_documents_count = cursor.count()
    for document in cursor:
        result_queue.put(document)
There is quite comprehensive documentation for MongoDB. For Python (PyMongo), here is the URL: https://api.mongodb.org/python/current/
Note: Consider the version you are running, since the latest version has new features and functions.
To verify which PyMongo version you are using, execute the following:
import pymongo
pymongo.version
Now, regarding the select query you asked about: as far as I can tell, the code you presented is fine. Here is the select structure in MongoDB.
First off, it is called find().
In PyMongo, if you want to select specific rows (they are not really rows in MongoDB; they are called documents, but I am saying rows to make it easy to understand, assuming you are comparing MongoDB to SQL) from the table (called a collection in MongoDB), use the following structure. I will use random as the collection name, and assume the random collection has the attributes age: 10, type: ninja, class: black, level: 1903:
db.random.find({ "age":"10" }) This will return all documents that have age 10 in them.
you could add more conditions simply by separating with commas
db.random.find({ "age":"10", "type":"ninja" }) This will select all data with age 10 and type ninja.
if you want to get all data just leave empty as:
db.random.find({})
The previous examples return everything (age, type, class, level, and _id). If you want to display specific attributes, say only the age, you have to add another argument to find, called a projection (1 means show, 0 means do not show):
{'age': 1}
Note that this returns age as well as _id; _id is always returned by default. You have to explicitly tell it not to return it:
db.random.find({ "age":"10", "type":"ninja" }, {"age": 1, "_id": 0})
I hope that gets you started.
Take a look at the documentation; it is very thorough.

PyMongo cursor operations are very slow

I'm new to both MongoDB and PyMongo, and am having some performance issues regarding cursors.
TL;DR: Any operation I try to perform using a cursor takes about a second.
Long version
I have a small database, which I bulkloaded. Each entry has 3 fields:
dom: domain name (unique)
date: date, YYYYMMDD
flag: string
I've loaded about 1.9 million entries, without incident, and quite quickly.
I created a hash index on the dom field.
Now, I want to grab certain records by the domain field, and update them, using a Python program.
That's where the problem lies.
I'm using the latest MongoDB, and the latest pyMongo.
Stripped-down program...
import pymongo
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')  # client creation was missing in the original snippet
db = client.myindexname
posts = db.posts
print(list(db.profiles.index_information()))  # shows hash index is present

for k in newdomainlist.keys():  # iterate list of domains to check
    ret = posts.find({"dom": k})  # this runs fine, and quickly
    # 'ret' is a cursor
    print(ret)  # this runs quickly
    # Here's the problem
    print(ret.count())  # this takes about a second. why?
If I just print ret, the speed is fine. However, if I try to reference anything in the cursor, the speed drops to the floor: I can do about 1 operation per second.
In this case, I'm just trying to see if ret.count() returns '0' (we don't have this domain) or '1' (we have it already).
I've tried adding a batch_size(10000) to the find, without it helping.
I DO have the Python C extensions loaded.
What the heck am I doing wrong?
thanks
It turned out that I'd created my hashed index on the wrong collection rather than on 'posts'. Chalk it up to MongoDB inexperience. We can close this one now, or delete it entirely.
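For reference, creating the hashed index on the collection that is actually queried would look roughly like this (a sketch; the collection and field names come from the question's snippet):

import pymongo
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
posts = client.myindexname.posts

# Hashed index on the field used in the find() filter.
posts.create_index([('dom', pymongo.HASHED)])
print(posts.index_information())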

MongoDB update (with upsert=True) does not update existing data but inserts a new document?

In my program, ten processes write to MongoDB via update(key, doc, upsert=True).
The "key" is a MongoDB index, but it is not unique.
query = {'hotelid':hotelid,"arrivedate":arrivedate,"leavedate":leavedate}
where = "data.%s" % sourceid
data_value_where = {where:value}
self.collection.update(query,{'$set':data_value_where},True)
the "query" id the not unique index
I found sometimes the update not update exists data, but create a new data.
I write a log for update method return, the return is " {u'ok': 1.0, u'err': None, u'upserted': ObjectId('5245378b4b184fbbbea3f790'), u'singleShard': u'rs1/192.168.0.21:10000,192.168.1.191:10000,192.168.1.192:10000,192.168.1.41:10000,192.168.1.113:10000', u'connectionId': 1894107, u'n': 1, u'updatedExisting': False, u'lastOp': 5928205554643107852L}"
I modify the update method to update(query, {'$set':data_value_where},upsert=True, safe=True), but three is no change for this question.
You can call it "threadsafe", as the update itself is not done in Python, it's in the mongodb, which is built to cater many requests at once.
So in summary: You can safely do that.
You would not end up with duplicate documents due to the operator you are using; you are actually using an atomic operator to update.
Atomic operations (not to be confused with SQL's all-or-nothing atomicity) are applied in sequence, so each process will never pick up a stale document or be allowed to write two ids to the same array, since the document each $set operation picks up will have the result of the last $set.
The fact that you did get duplicate documents most likely means you have an error in your code.
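As a side note, in current PyMongo the deprecated update() call from the question would be written with update_one(); here is a sketch using the question's field names (the database and collection names, and the placeholder values, are assumptions):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
collection = client.mydb.hotels  # database and collection names are assumptions

# Placeholder values standing in for the question's variables.
hotelid, arrivedate, leavedate = 123, '20240101', '20240105'
sourceid, value = 'source_a', 42

query = {'hotelid': hotelid, 'arrivedate': arrivedate, 'leavedate': leavedate}
data_value_where = {'data.%s' % sourceid: value}

# upsert=True inserts a new document only when no document matches query.
result = collection.update_one(query, {'$set': data_value_where}, upsert=True)
print(result.upserted_id, result.modified_count)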
