Paginating a DynamoDB query in boto3 - python

How can I loop through all results in a DynamoDB query if they span more than one page? This answer implies that pagination is built into the query function (at least in boto v2), but when I try this in boto3, my items seem limited:
import boto3
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
fooTable = dynamodb.Table('Foo')
response = fooTable.query(
    KeyConditionExpression=Key('list_id').eq('123')
)
count = 0
for i in response['Items']:
    count += 1
print(count)  # Prints a subset of my total items

ExclusiveStartKey is the parameter you are looking for.
Pass it the value that was returned as LastEvaluatedKey by the previous operation.
Per the documentation, the attribute values inside ExclusiveStartKey must be of type String, Number or Binary; set types are not allowed.
http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#DynamoDB.Client.query
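A minimal sketch of that loop with the boto3 resource API: keep calling query, passing the previous response's LastEvaluatedKey as ExclusiveStartKey, until a response comes back without a LastEvaluatedKey. The helper name query_all is hypothetical, and table is assumed to be a boto3 Table such as fooTable above.

```python
def query_all(table, **query_kwargs):
    """Collect every item of a DynamoDB query, following pagination.

    `table` is assumed to be a boto3 resource Table (e.g. dynamodb.Table('Foo'));
    `query_kwargs` are passed straight through to Table.query().
    """
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get('Items', []))
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            # No LastEvaluatedKey means this was the last page
            return items
        # Resume right after the last key DynamoDB evaluated
        query_kwargs['ExclusiveStartKey'] = last_key


# Usage (requires AWS credentials and the Foo table from the question):
#   from boto3.dynamodb.conditions import Key
#   items = query_all(fooTable, KeyConditionExpression=Key('list_id').eq('123'))
#   print(len(items))
```

Because query_kwargs is collected with **, it is a fresh dict inside the function, so adding ExclusiveStartKey to it does not modify the caller's arguments.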

Related

boto3 DynamoDB update_item() API creates a new item (with range key) instead of updating it

I am trying a simple operation on my DynamoDB table. The schema is very simple:
(Hash Key) SummaryId : String
(Sort Key) Status : String
import boto3
from pprint import pprint
def put_item(dynamo_table, summary_id, status):
    return dynamo_table.put_item(
        Item={
            'SummaryId': summary_id,
            'Status': status
        },
        ReturnValues="ALL_OLD"
    )
def update_item(dynamo_table, summary_id, status):
    response = dynamo_table.update_item(
        Key={'SummaryId': summary_id},
        AttributeUpdates={
            'Status': status,
        },
        ReturnValues="UPDATED_OLD"
    )
    return response
def initialize_dynamodb_table():
    dynamodb = boto3.Session(profile_name=PROFILE_NAME,
                             region_name=REGION_NAME) \
        .resource('dynamodb')
    return dynamodb.Table(TABLE_NAME)
def main():
    dynamodb_table = initialize_dynamodb_table()
    # Update the above item
    response = put_item(dynamodb_table, "Id1::Id2::Id4", "IN_PROGRESS")
    pprint(response)
    response = update_item(dynamodb_table, "Id1::Id2::Id4", "COMPLETE")
    pprint(response)
if __name__ == '__main__':
    main()
The item with PK "Id1::Id2::Id4" doesn't exist, so put_item() is expected to add it.
My intention with the update_item() API is that it will change the item's status from "IN_PROGRESS" to "COMPLETE".
But instead, the update_item() API creates a new item in the table with PK "Id1::Id2::Id4" and RK "COMPLETE".
How do I achieve the expected behavior? Thanks in advance!
Based on the schema you described, the two operations (put and update) result in two different items; this is expected behaviour.
DynamoDB's Core Concepts page describes Partition Key (Hash) and Sort Key (Range) like so:
Partition key – A simple primary key, composed of one attribute known as the partition key.
and
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
The important bit for you is this:
In a table that has a partition key and a sort key, it's possible for two items to have the same partition key value. However, those two items must have different sort key values.
Applied to your case, this means you're creating one item with PK Id1::Id2::Id4 and SK IN_PROGRESS, and another item with PK Id1::Id2::Id4 and SK COMPLETE.
If your unique identifier is just Id1::Id2::Id4, change your table's schema so that it has only the partition key. If you do that, the code above will first insert an item with that ID and status IN_PROGRESS and then update that same item to COMPLETE.
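Separately, the question's update_item call passes AttributeUpdates={'Status': status}, but that legacy parameter expects a map per attribute such as {'Value': ..., 'Action': 'PUT'}. A minimal sketch using UpdateExpression instead, assuming the table has been recreated with SummaryId as its only key; update_status is a hypothetical name. Status is on DynamoDB's reserved-word list, so it is aliased through ExpressionAttributeNames.

```python
def update_status(table, summary_id, status):
    """Set Status on the item whose partition key is summary_id.

    Assumes `table` is a boto3 resource Table whose only key is SummaryId.
    """
    return table.update_item(
        Key={'SummaryId': summary_id},
        # SET creates the attribute if it is missing and overwrites it otherwise
        UpdateExpression='SET #status = :status',
        # Alias the attribute name, since Status is a DynamoDB reserved word
        ExpressionAttributeNames={'#status': 'Status'},
        ExpressionAttributeValues={':status': status},
        ReturnValues='UPDATED_OLD',
    )
```

Like put_item, update_item will create the item if no item with that key exists yet; adding ConditionExpression='attribute_exists(SummaryId)' makes the call fail instead in that case.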

How to use KeyConditionExpression in the Query method of a DynamoDB table using boto3

I am trying to query a DynamoDB table to display its items.
I am using table.query with some parameters such as ProjectionExpression.
Since a scan reads the whole table, which is time-consuming, I'm trying to use a query instead, but I'm having trouble with the key condition, which is mandatory.
Can you help me with this code?
Using this URL: http://localhost:0000/rsi/hotels
My Repository.py:
from boto3.dynamodb.conditions import Key
def list_items(self):
    ProjectionExpression = "Id,#Name,Description"
    ean = {"#Name": "Name"}
    # Using this URL: http://localhost:0000/rsi/hotels
    # how can I use a key condition for this in the table below?
    esk = Key('Id').gt(0)  # stuck here, don't know what exact condition to give
    limit = settings.AWS_SCAN_LIMIT
    items = []
    tableResponse = table.query(
        ProjectionExpression=ProjectionExpression,
        ExpressionAttributeNames=ean,
        KeyConditionExpression=esk,
        Limit=limit)
    items.extend(tableResponse['Items'])
    while 'LastEvaluatedKey' in tableResponse:
        tableResponse = table.query(
            ProjectionExpression=ProjectionExpression,
            ExpressionAttributeNames=ean,
            ExclusiveStartKey=tableResponse['LastEvaluatedKey'],
            KeyConditionExpression=esk,
            Limit=limit
        )
        items.extend(tableResponse['Items'])
    return items
ERROR: ERROR - An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
My table looks like this:
id     name     description
1      hai      des
09     bye      des2
123    there    des3
...
I can't figure out any other way to use parallel scan and query effectively, so I'm asking here.
Please help me with this!
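Assuming Id is the table's partition key, the ValidationException comes from Key('Id').gt(0): Query only accepts an equality condition on the partition key (range operators such as gt are allowed only on the sort key). To list every hotel without knowing their IDs, a paginated Scan with the same projection is one option. A minimal sketch; list_all_hotels and page_size are hypothetical names, and table is assumed to be a boto3 resource Table.

```python
def list_all_hotels(table, page_size=None):
    """Return Id, Name and Description for every item, following Scan pagination.

    `table` is assumed to be a boto3 resource Table. `page_size` maps to Limit,
    i.e. the number of items evaluated per request, not the total returned.
    """
    scan_kwargs = {
        'ProjectionExpression': 'Id, #Name, Description',
        # Name is a DynamoDB reserved word, hence the #Name alias
        'ExpressionAttributeNames': {'#Name': 'Name'},
    }
    if page_size is not None:
        scan_kwargs['Limit'] = page_size
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get('Items', []))
        if 'LastEvaluatedKey' not in response:
            return items
        # Continue the scan where the previous page stopped
        scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
```

If you do know a specific hotel's Id, a Query with Key('Id').eq(hotel_id) and the same projection is the cheaper call.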

Return the item with the maximum sort key in DynamoDB

I'm using a Python script to access a DynamoDB database in AWS.
I have a table with a hash key and a sort key.
For a given hash key, I want to find the item with the largest sort key that is less than a certain value. How can I do that?
Alternatively, is there a way to find the previous item from a given key?
I am not trying to find the item with the largest attribute value (an expensive task in DynamoDB); I want the largest key value.
I found the answer:
import boto3
import botocore
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(table_name)
response = table.query(
    Limit=1,
    ScanIndexForward=False,
    KeyConditionExpression=Key('device').eq(device) & Key('epoch').lte(threshold)
)
Where:
'device' is my hash key
'epoch' is my sort key
threshold is the value I want to search below
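Wrapped in a small helper, the same query can also handle the case where nothing matches. A minimal sketch; latest_item_below is a hypothetical name, and the caller builds the key condition (for example with Key from boto3.dynamodb.conditions). Note that lte includes the threshold itself; Key('epoch').lt(threshold) gives a strict "less than". Limit=1 is safe here only because there is no FilterExpression: Limit caps the number of items evaluated before any filter runs, so with a filter the single evaluated item could be discarded.

```python
def latest_item_below(table, key_condition):
    """Return the item with the largest sort key matching key_condition, or None.

    `table` is assumed to be a boto3 resource Table; `key_condition` is built by
    the caller, e.g. Key('device').eq(device) & Key('epoch').lt(threshold).
    """
    response = table.query(
        KeyConditionExpression=key_condition,
        ScanIndexForward=False,  # sort-key descending, so the largest comes first
        Limit=1,                 # only the first (largest) matching item is needed
    )
    items = response.get('Items', [])
    return items[0] if items else None
```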

Returning the entire dataset using Google App Engine indexed search

Is there any way to fetch the entire dataset in an App Engine search index? The search below takes an integer limit through QueryOptions, and that limit always needs to be present.
I'm unable to determine whether there is some special flag that can bypass this limit and return the entire result set. If the query is made without QueryOptions, the result set is somehow limited to 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
    query,
    options=search.QueryOptions(
        limit=limit,
        sort_options=search.SortOptions(...))))
Any ideas?
If you really want every document in the index rather than every result of a query, you could customise the "delete all" example: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search
def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)
    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
So you might end up with something like:
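start_id = None
while True:
    # Nothing is deleted here, so page through the index by start_id
    document_ids = [document.doc_id
                    for document in doc_index.get_range(start_id=start_id,
                                                        include_start_object=False,
                                                        ids_only=True)]
    if not document_ids:
        break
    # Then do something with each document
    for doc_id in document_ids:
        document = doc_index.get(doc_id)
    # Resume after the last ID seen
    start_id = document_ids[-1]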
You'd probably want to get the document itself in the list comprehension rather than getting the ID and then fetching the document by that ID, but you get the idea.
First, if you peek into the constructor of QueryOptions, you'll see why it returns 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
             offset=None, sort_options=None, returned_fields=None,
             ids_only=False, snippeted_fields=None,
             returned_expressions=None):
I think the API does this to avoid fetching results unnecessarily. If you need more results upon user action, fetch them with an offset instead of always fetching all results. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size)))
# calculate the number of pages, rounding up
pages = (results.number_found + page_size - 1) // page_size
# user chooses page number ith (0-based) and hence an offset into the results
next_page = ith * page_size
# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size, offset=next_page)))

Retrieve all items from DynamoDB using query?

I am trying to retrieve all items in a dynamodb table using a query. Below is my code:
import boto.dynamodb2
from boto.dynamodb2.table import Table
from time import sleep
c = boto.dynamodb2.connect_to_region(aws_access_key_id="XXX", aws_secret_access_key="XXX", region_name="us-west-2")
tab = Table("rip.irc", connection=c)
x = tab.query()
for i in x:
    print(i)
    sleep(1)
However, I receive the following error:
ValidationException: ValidationException: 400 Bad Request
{'message': 'Conditions can be of length 1 or 2 only', '__type': 'com.amazon.coral.validate#ValidationException'}
The code I have is pretty straightforward and out of the boto dynamodb2 docs, so I am not sure why I am getting the above error. Any insights would be appreciated (new to this and a bit lost). Thanks
EDIT: I have both a hash key and a range key. I am able to query by specific hash keys. For example,
x = tab.query(hash__eq="2014-01-20 05:06:29")
How can I retrieve all items though?
Ah, OK, I figured it out. In case anyone needs it:
You can't use the query method on a table without specifying a specific hash key. The method to use instead is scan. So if I replace:
x = tab.query()
with
x = tab.scan()
I get all the items in my table.
I'm using Groovy, but this should give you a hint. The error:
{'message': 'Conditions can be of length 1 or 2 only'}
is telling you that your key condition can have length 1 (hash key only) or length 2 (hash key + range key). Anything in the key condition beyond those keys will provoke this error.
The cause of this error is that you are trying to run a search on other attributes through the key condition. You have to add a separate filter expression to perform your query.
My code:
String keyQuery = " hashKey = :hashKey and rangeKey between :start and :end "
queryRequest.setKeyConditionExpression(keyQuery)// define key query
String filterExpression = " yourParam = :yourParam "
queryRequest.setFilterExpression(filterExpression)// define filter expression
queryRequest.setExpressionAttributeValues(expressionAttributeValues)
queryRequest.setSelect('ALL_ATTRIBUTES')
QueryResult queryResult = client.query(queryRequest)
.scan() does not automatically return all items of a table, because results are paginated: a single Scan response is limited to 1 MB of data (see DynamoDB's max response limit).
Here is an implementation of a boto3 scan that loops over all the pages:
import boto3
dynamo = boto3.resource('dynamodb')
def scanRecursive(tableName, **kwargs):
    """
    NOTE: Anytime you are filtering by a specific equivalency attribute such as id, name
    or date equal to ... etc., you should consider using a query, not a scan.
    kwargs are any parameters you want to pass to the scan operation.
    """
    dbTable = dynamo.Table(tableName)
    response = dbTable.scan(**kwargs)
    if kwargs.get('Select') == "COUNT":
        # Each page only counts its own items, so sum the counts across pages
        count = response.get('Count', 0)
        while 'LastEvaluatedKey' in response:
            response = dbTable.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
            count += response.get('Count', 0)
        return count
    data = response.get('Items', [])
    while 'LastEvaluatedKey' in response:
        # Fetch the next page from the same table, starting after the last evaluated key
        response = dbTable.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        data.extend(response['Items'])
    return data
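As an alternative to the manual LastEvaluatedKey loop, boto3's low-level client provides a paginator for Scan that issues the follow-up requests for you. A minimal sketch, assuming client = boto3.client('dynamodb'); scan_all_pages is a hypothetical name. Unlike the resource-level Table, the client returns attributes in DynamoDB's typed form (e.g. {'S': 'abc'}); boto3.dynamodb.types.TypeDeserializer can convert them to plain Python values if needed.

```python
def scan_all_pages(client, table_name, **scan_kwargs):
    """Scan a whole table using the low-level client's built-in paginator.

    `client` is assumed to be boto3.client('dynamodb'). Items come back in
    DynamoDB's typed form, e.g. {'id': {'S': 'abc'}}.
    """
    paginator = client.get_paginator('scan')
    items = []
    # paginate() sends follow-up Scan calls with ExclusiveStartKey automatically
    for page in paginator.paginate(TableName=table_name, **scan_kwargs):
        items.extend(page.get('Items', []))
    return items
```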
I ran into this error when I mistakenly used KeyConditionExpression instead of FilterExpression when querying a DynamoDB table.
KeyConditionExpression should only be used with partition key or sort key values.
FilterExpression should be used when you want to filter your results further.
Note, however, that a FilterExpression consumes the same reads as the query would without it: DynamoDB first performs the query based on the KeyConditionExpression and then removes items from the results based on your FilterExpression.
Source: Working with Queries
This is how I do a query (in Ruby), in case anyone still needs a solution:
def method_name(a, b)
  results = self.query(
    key_condition_expression: '#T = :t',
    filter_expression: 'contains(#S, :s)',
    expression_attribute_names: {
      '#T' => 'your_table_field_name',
      '#S' => 'your_table_field_name'
    },
    expression_attribute_values: {
      ':t' => a,
      ':s' => b
    }
  )
  results
end
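For comparison, a boto3 (Python) sketch of the same key-condition-plus-filter query, assuming a resource-level Table; the function and parameter names are hypothetical placeholders for your own schema. Because the filter is applied to each page after it is read, a page can come back with no Items yet still carry a LastEvaluatedKey, so the loop keeps paging until DynamoDB stops returning one.

```python
def query_with_filter(table, key_attr, key_value, filter_attr, substring):
    """Query one partition and keep only items whose filter_attr contains substring.

    `table` is assumed to be a boto3 resource Table; key_attr and filter_attr
    are placeholders for your own attribute names.
    """
    query_kwargs = {
        'KeyConditionExpression': '#T = :t',     # must target the key attributes
        'FilterExpression': 'contains(#S, :s)',  # applied after items are read
        'ExpressionAttributeNames': {'#T': key_attr, '#S': filter_attr},
        'ExpressionAttributeValues': {':t': key_value, ':s': substring},
    }
    items = []
    while True:
        response = table.query(**query_kwargs)
        # A filtered page may be empty yet still have a LastEvaluatedKey
        items.extend(response.get('Items', []))
        if 'LastEvaluatedKey' not in response:
            return items
        query_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
```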
