The goal is to scan and return all of the items in a DynamoDB table, but before the response is returned, modify a specific attribute of each specific item.
I have this completed already, but I'm curious to know if there is a more cost-effective way without looping through all the items.
Currently I'm returning a complete scan of the table and looping through each item in the list (I found out the response's 'Items' is a list, not an object):
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('<table name>')

response = table.scan()
items = response['Items']
for item in items:
    item['Thumbnail'] = 'https://s3.amazonaws.com/<s3bucket>/' + item['Thumbnail']
return items
I doubt this can be solved without looping, but if there is a solution that avoids looping I'm eager to hear it!
The cost of the loop that updates the items will be measured in milliseconds. The DynamoDB scan plus network latency will take much more time.
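One caveat on the snippet above: a single table.scan() call returns at most 1 MB of data. If the table is larger, the same approach needs to follow LastEvaluatedKey; a minimal sketch of that, keeping the placeholder table and bucket names from the question:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('<table name>')

# Follow LastEvaluatedKey so items beyond the first 1 MB page are included
items = []
response = table.scan()
items.extend(response['Items'])
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response['Items'])

for item in items:
    item['Thumbnail'] = 'https://s3.amazonaws.com/<s3bucket>/' + item['Thumbnail']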
When I use boto3, I can paginate if I am making a query or scan.
Is it possible to do the same with put_item?
The closest to "paginating" PutItem with boto3 is probably the included BatchWriter class and associated context manager. This class handles buffering and sending items in batches. Aside from PutItem, it supports DeleteItem as well.
Here is an example of how to use it:
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("name")

with table.batch_writer() as batch_writer:
    for _ in range(1000):
        batch_writer.put_item(Item={"HashKey": "...",
                                    "Otherstuff": "..."})
Paginating is what happens when DynamoDB reaches its maximum response size of 1 MB, or when you are using --limit. It allows you to get the next "page" of data.
That does not make sense for PutItem, as you are simply putting a single item.
If what you mean is that you want to put more than one item at a time, then use the BatchWriteItem API, where you can pass in a batch of up to 25 items, as in the sketch below.
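For reference, a minimal low-level BatchWriteItem call might look like this (the table and attribute names are placeholders, and unprocessed items would still need to be retried manually):

import boto3

client = boto3.client('dynamodb')
response = client.batch_write_item(
    RequestItems={
        'name': [  # up to 25 write requests per table per call
            {'PutRequest': {'Item': {'HashKey': {'S': 'a'}, 'Otherstuff': {'S': '1'}}}},
            {'PutRequest': {'Item': {'HashKey': {'S': 'b'}, 'Otherstuff': {'S': '2'}}}},
        ]
    }
)
# Anything DynamoDB could not write comes back here and must be resent
unprocessed = response['UnprocessedItems']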
You can also use high-level interfaces like the batch_writer in boto3: you can give it a list of items of any size, and it breaks the list into chunks of 25 for you, writes those batches, and handles any retry logic:
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("name")

with table.batch_writer() as batch_writer:
    for _ in range(1000):
        batch_writer.put_item(Item=myitem)
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/dynamodb.html#
I have a table called details and I am trying to get the live item count from it.
Below is the code.
I already had 7 items in the table and then inserted 8 more, so my output should show 15. But it still shows 7, the old count. How do I get the live, updated count?
From the UI it also shows 7, but when I checked the live count with Start Scan, I got 15 entries.
Is there some delay, like a few hours, before the live count updates?
import boto3

dynamo_resource = boto3.resource('dynamodb')
table = dynamo_resource.Table('details')
print('Item count for the table is', table.item_count)
Using the client:
dynamoDBClient = boto3.client('dynamodb')
table = dynamoDBClient.describe_table(TableName='details')
print(table['Table']['ItemCount'])
In your example you are calling DynamoDB DescribeTable, which only updates its data approximately every six hours:
Item Count:
The number of items in the specified table. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TableDescription.html#DDB-Type-TableDescription-ItemCount
In order to get the "Live Count" you have two possible options:
Option 1: Scan the entire table.
dynamoDBClient = boto3.client('dynamodb')
# Sum Count across pages: a single Scan call stops after scanning 1 MB
paginator = dynamoDBClient.get_paginator('scan')
pages = paginator.paginate(TableName='details', Select='COUNT')
print(sum(page['Count'] for page in pages))
Be aware this performs a full table Scan and may not be the best option for large tables.
Option 2: Update a counter for each item you write
You would need to use TransactWriteItems and update a live counter for every Put you make to the table. This can be useful, but be aware that you will limit your throughput to 1000 WCU per second, as you are focusing all your writes on a single item.
You can then return the live item count with a simple GetItem.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html
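A minimal sketch of that pattern, assuming a table named 'details' with a string partition key 'pk' and a dedicated counter item (the key and attribute names here are placeholders):

import boto3

client = boto3.client('dynamodb')

# Write the new item and bump the counter item in one atomic transaction
client.transact_write_items(
    TransactItems=[
        {'Put': {
            'TableName': 'details',
            'Item': {'pk': {'S': 'student#123'}, 'name': {'S': 'Jane'}},
        }},
        {'Update': {
            'TableName': 'details',
            'Key': {'pk': {'S': 'LiveCount'}},
            'UpdateExpression': 'ADD itemCount :one',
            'ExpressionAttributeValues': {':one': {'N': '1'}},
        }},
    ]
)

# The live count is then a single GetItem away
counter = client.get_item(TableName='details', Key={'pk': {'S': 'LiveCount'}})
print(counter['Item']['itemCount']['N'])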
According to the DynamoDB documentation,
ItemCount - The number of items in the global secondary index. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
You might need to keep track of the item counts yourself if you need to return an updated count shortly after editing the table.
I have a DynamoDB table and I want to check if there are any items in it (using Python); in other words, return True if the table is empty.
I am not sure how to go about this. Any suggestions?
Using Scan
The best way is to scan and check the count. You are probably using boto3, the AWS SDK for Python; use its scan function and check the returned count. This need not be costly: a single Scan call returns at most 1 MB of data, so checking whether the table is empty does not require reading the entire table and is not time-consuming.
Read the docs for more details: Boto3 Docs
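For an emptiness check specifically, you can cap the Scan at a single item so almost nothing is read; a sketch, assuming the table name from the earlier question:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('details')

# With Limit=1 and no filter, a non-empty table always returns one item
response = table.scan(Limit=1)
print(len(response['Items']) == 0)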
Using describe table
This can also be used to get the count, but:
DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
So it is only an option if you don't need the most recently updated value.
Read the docs for more details: describe table dynamodb
You can simply take the count of that particular table using boto3, which is the AWS SDK for Python:
import boto3

def table_is_empty(table_name):
    dynamo_resource = boto3.resource('dynamodb')
    table = dynamo_resource.Table(table_name)
    return table.item_count == 0
Note that the values are updated periodically and the result might not be precise:
The number of items in the specified index. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
You can use the DescribeTable function from boto3; in its response you can get the number of items in the table, as you can see in the response example at the link.
Part of the command response:
'TableSizeBytes': 123,
'ItemCount': 123,
'TableArn': 'string',
'TableId': 'string',
As said in the comments, the value is updated approximately every 6 hours, so recent changes may not be reflected.
I have hundreds of thousands of records in the collection named "student_details".
I am using the pymongo query below:
students_info = db.student_details.find()
It gives me all the records, which is huge. I don't want any filters there, i.e. no where clause; I want to fetch all the records.
Now I am using a for loop and appending each record to a list:
def student_information():
    student_list = []
    for student in students_info:
        # ... a number of if/else blocks here ...
        student_list.append(student)
    return jsonify({"result": student_list})
It takes a huge amount of time, which makes the response very slow. Please help me make it more time-efficient.
I use a loop query, and I want to avoid fetching data that was previously fetched.
The best idea I came up with is to keep an ever-expanding blacklist of the data that has been fetched, and filter out blacklisted data on every fetch.
I've managed to do so by adding every piece of data that was fetched successfully to a blacklist (called 'allWords'):
allWords.extend(fetchedData)
And then fetching all the items which are not in 'allWords':
c.execute("SELECT formatted FROM dictionary WHERE formatted LIKE '__A_'")
words = [item[0] for item in c.fetchall() if item[0] not in allWords]
return words
But this way I still fetch all the data; is there any smarter way to do it?
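One way to avoid fetching blacklisted rows at all is to push the filter into the query itself; a sketch, assuming the same sqlite3-style cursor and table as above (the '?' placeholder syntax depends on your DB driver):

# Let SQL exclude blacklisted rows so they are never fetched
if allWords:
    placeholders = ','.join('?' for _ in allWords)
    c.execute("SELECT formatted FROM dictionary "
              "WHERE formatted LIKE '__A_' "
              "AND formatted NOT IN (%s)" % placeholders,
              list(allWords))
else:
    c.execute("SELECT formatted FROM dictionary WHERE formatted LIKE '__A_'")
words = [row[0] for row in c.fetchall()]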