AWS DynamoDB Python - boto3 Key() methods not recognized (Query)

I am using Lambda (Python) to query my DynamoDB database. I am using the boto3 library, and I was able to make an equality ("eq") query.
This script works:
import boto3
from boto3.dynamodb.conditions import Key, Attr
import json

def create_list(event, context):
    resource = boto3.resource('dynamodb')
    table = resource.Table('Table_Name')
    response = table.query(
        TableName='Table_Name',
        IndexName='Custom-Index-Name',
        KeyConditionExpression=Key('Number_Attribute').eq(0)
    )
    return response
However, when I change the query expression to this:
KeyConditionExpression=Key('Number_Attribute').gt(0)
I get the error:
"errorType": "ClientError",
"errorMessage": "An error occurred (ValidationException) when calling the Query operation: Query key condition not supported"
According to this [1] resource, "gt" is a method of Key(). Does anyone know if this library has been updated, or what methods other than "eq" are available?
[1] http://boto3.readthedocs.io/en/latest/reference/customizations/dynamodb.html#ref-dynamodb-conditions
---------EDIT----------
I also just tried the old method using:
response = client.query(
    TableName='Table_Name',
    IndexName='Custom_Index',
    KeyConditions={
        'Custom_Number_Attribute': {
            'ComparisonOperator': 'EQ',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
This worked, but when I try:
response = client.query(
    TableName='Table_Name',
    IndexName='Custom_Index',
    KeyConditions={
        'Custom_Number_Attribute': {
            'ComparisonOperator': 'GT',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
...it does not work.
Why would EQ be the only method working in these cases? I'm not sure what I'm missing in the documentation.

From what I can tell:
Your partition key is Number_Attribute, so you cannot do a gt when doing a query (you can do an eq and that is it).
You can do a gt or between on your sort key when doing a query. It is also called the range key, and because it stores items with the same partition key next to each other in sorted order, it can evaluate gt and between efficiently in a query.
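For example (a minimal sketch; the partition and sort attribute names here are hypothetical, assuming a table with a composite key):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Table_Name')

# eq is required on the partition key; gt/between are only valid on the sort key
response = table.query(
    KeyConditionExpression=Key('Partition_Attribute').eq('some_id') & Key('Sort_Attribute').gt(0)
)
items = response['Items']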
Now, if you want to do a gt or between on your partition key, then you will have to use scan, like below:
fe = Key('Number_Attribute').gt(0)
response = table.scan(
    FilterExpression=fe
)
Keep the following in mind concerning scan:
The scan method reads every item in the entire table, and returns all of the data in the table. You can provide an optional filter_expression, so that only the items matching your criteria are returned. However, note that the filter is only applied after the entire table has been scanned.
So in other words, it's a bit of a costly operation compared to query. You can see an example in the documentation.
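Also note that a single scan call returns at most 1 MB of data; a minimal sketch of pulling every matching item (reusing the same filter) would be:

fe = Key('Number_Attribute').gt(0)
items = []
response = table.scan(FilterExpression=fe)
items.extend(response['Items'])

# keep scanning while DynamoDB reports there is more data to read
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=fe,
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    items.extend(response['Items'])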
Hope that helps!

Related

Problem querying AWS Athena from Lambda introducing a variable

I need help on a little problem that I have with my AWS Lambda function. This function queries my AWS Athena database.
The code looks like this:
import json
import boto3
import time

def lambda_handler(event, context):
    client = boto3.client('athena')
    QueryResponse = client.start_query_execution(
        QueryString="MY QUERY;",
        QueryExecutionContext={
            'Database': 'myDatabase'
        },
        ResultConfiguration={
            'OutputLocation': 's3://mys3Bucket'
        }
    )
    # Observe results:
    queryId = QueryResponse['QueryExecutionId']
The code works great, but I am having some trouble with the "WHERE" part of my SQL query (which is a long one).
Here is the part of my Query :
WHERE x.id_date > cast(date_format(date_trunc('day', current_timestamp -
interval '3' day), '%Y%m%d') as integer)
and x.id_date <= cast(date_format(current_timestamp, '%Y%m%d') as integer)
and c.label = 'NAME'
My query is written on a single line to fit into the Python code, replacing "MY QUERY".
The problem is:
I need to replace the 'NAME' part with a variable (string) that will be passed to my Lambda. I tried using %s formatting, but since the query already contains '%Y%m%d', Python expects values for those placeholders too, even though they are only there to format the date. I tried replacing NAME with a literal string and it works perfectly, so I know the query itself is not the problem. I also tried putting c.label = '%s' first, to see whether % formatting would simply replace the first %s and leave the others alone, but it didn't work.
So my question is: how can I replace 'NAME' with a str variable? Can I do this while keeping my query on a single line? (If yes, how?) Or, at least, how can I split my query into separate parts that I can work with?
Thanks for your help.
As said in a comment, the solution was to use:
MyString = 'my string to replace in query'
QueryString = f"SELECT * FROM {MyString};"
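A minimal sketch of how that applies to the WHERE clause above (assuming the label value arrives in the Lambda event under a hypothetical key named label); f-strings only substitute expressions inside {}, so the '%Y%m%d' literals are left untouched:

def lambda_handler(event, context):
    label = event['label']  # hypothetical event key carrying the value for c.label
    # adjacent string literals are concatenated, so the query still ends up as a single line
    query = (
        "MY QUERY "
        "WHERE x.id_date > cast(date_format(date_trunc('day', current_timestamp - interval '3' day), '%Y%m%d') as integer) "
        "and x.id_date <= cast(date_format(current_timestamp, '%Y%m%d') as integer) "
        f"and c.label = '{label}'"
    )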

How to query all rows of one column in DynamoDB?

I have only been working with AWS DynamoDB for a short time. I am wondering how I can get the same result as this statement (without a WHERE clause):
SELECT column1 FROM DynamoTable;
I tried (but failed) with:
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('DynamoTable')
from boto3.dynamodb.conditions import Key, Attr
resp = table.query(KeyConditionExpression=Key('column1'))
It requires Key().eq() or Key().begins_with() ...
I already tried resp = table.scan(), but the response contains too many fields, while I only need column1.
Thanks.
This lets you get just the required column directly, so you do not need to pull back every attribute and filter them out yourself:
import boto3

def getColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    resp = table.scan(AttributesToGet=['column1'])
    return resp['Items']
You should definitely use the Scan operation. Check the documentation to implement it with Python.
Regarding how to select just a specific attribute you could use:
import boto3

def getColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    response = table.scan()
    return [i['column1'] for i in response['Items']]
You have to iterate over the entire table and just pick out the column you need.
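A hedged variant of the same idea using ProjectionExpression (the newer equivalent of AttributesToGet) and handling pagination, since a single scan call returns at most 1 MB of data:

import boto3

def getColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    items = []
    # ProjectionExpression limits which attributes come back with each item
    response = table.scan(ProjectionExpression='column1')
    items.extend(response['Items'])
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression='column1',
            ExclusiveStartKey=response['LastEvaluatedKey']
        )
        items.extend(response['Items'])
    return [i['column1'] for i in items]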

How to do bulk insert with ordered false in mongoengine

I'm trying to insert documents in bulk. I have created a unique index on my collection and want to skip documents which are duplicates while doing the bulk insertion. This can be accomplished with the native mongodb function:
db.collection.insert(
    <document or array of documents>,
    {
        ordered: <boolean>
    }
)
I want to accomplish this with mongoengine. If anybody knows how to achieve this, please answer the question, thanks.
If you have a class like this:
class Foo(db.Document):
    bar = db.StringField()
    meta = {'indexes': [{'fields': ['bar'], 'unique': True}]}
And having a list of Foo instances foos = [Foo('a'), Foo('a'), Foo('a')]
and trying Foo.objects.insert(foos) you will get mongoengine.errors.NotUniqueError
The 1st workaround would be to delete the index from mongodb, insert the duplicates, and then ensure the index with {unique: true, dropDups: true}.
The 2nd workaround would be to use the underlying pymongo API for bulk ops: https://docs.mongodb.com/manual/reference/method/db.collection.initializeOrderedBulkOp/#db.collection.initializeOrderedBulkOp
For now I am using raw pymongo from mongoengine as a workaround for this. This is the 2nd workaround that @Alexey Smirnov mentioned. So for a mongoengine Document class DocClass you access the underlying pymongo collection and execute a query like the one below:
from pymongo.errors import BulkWriteError

try:
    doc_list = [doc.to_mongo() for doc in me_doc_list]  # Convert ME objects to what pymongo can understand
    DocClass._get_collection().insert_many(doc_list, ordered=False)
except BulkWriteError as bwe:
    print("Batch inserted with some errors. Maybe some duplicates were found and were skipped.")
    print(f"Count is {DocClass.objects.count()}.")
except Exception as e:
    print({'error': str(e)})

How to rename DynamoDB column/key

In one of my DynamoDb tables I have a column/key named "status", which turned out to be a reserved keyword. Unfortunately it isn't an option to delete the whole table and reinitiate it. How can I rename the key?
Here is the Lambda Code that causes the Exception:
try:
    response = table.query(
        IndexName='myId-index',
        KeyConditionExpression=Key('myId').eq(someId)
    )
    for item in response['Items']:
        print('Updating Item: ' + item['id'])
        table.update_item(
            Key={
                'id': item['id']
            },
            UpdateExpression='SET myFirstKey = :val1, mySecondKey = :val2, myThirdKey = :val3, myFourthKey = :val4, myFifthKey = :val5, status = :val6',
            ExpressionAttributeValues={
                ':val1': someValue1,
                ':val2': someValue2,
                ':val3': someValue3,
                ':val4': someValue4,
                ':val5': someValue5,
                ':val6': someValue6
            }
        )
except Exception as e:
    print('ok error: %s' % e)
And here is the Exception:
2016-06-14 18:47:24 UTC+2
ok error: An error occurred (ValidationException) when calling the UpdateItem operation: Invalid UpdateExpression: Attribute name is a reserved keyword; reserved keyword: status
There is no real easy way to rename a column. You will have to create a new attribute for each of the entries and then delete all the values for the existing attribute.
There is no reason to drop your attribute/column; if you are having trouble querying the table, use Expression Attribute Names.
From the Expression Attribute Names documentation:
On some occasions, you might need to write an expression containing an attribute name that conflicts with a DynamoDB reserved word... To work around this, you can define an expression attribute name. An expression attribute name is a placeholder that you use in the expression, as an alternative to the actual attribute name.
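Applied to the update in the question, a minimal sketch (reusing the hypothetical attribute names from the question, and showing only two of the values for brevity) maps the reserved word status to a placeholder:

table.update_item(
    Key={'id': item['id']},
    UpdateExpression='SET myFirstKey = :val1, #st = :val6',
    ExpressionAttributeNames={'#st': 'status'},  # '#st' stands in for the reserved word 'status'
    ExpressionAttributeValues={
        ':val1': someValue1,
        ':val6': someValue6
    }
)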
There is a simple solution instead of renaming the column: use ProjectionExpression and ExpressionAttributeNames in your query.
I ran into the same problem (my table contains the column "name"). Here is a sample query:
TableName: 'xxxxxxxxxx',
ExpressionAttributeNames: {
    '#A': 'country',
    '#B': 'postcode',
    '#C': 'name'
},
ExpressionAttributeValues: {
    ':a': {S: 'DE'},
    ':c': {S: 'Peter Benz'}
},
FilterExpression: 'country = :a AND #C = :c',
ProjectionExpression: '#A, #B, #C'
Using the NoSQL Workbench app that AWS supports, I exported a copy of the table which had the column name(s) I wanted to change. I opened the JSON file NoSQL Workbench created, then did a simple Find/Replace for the name of the column in question.
With the names looking correct in the JSON file, I re-imported the table back into DynamoDB using the NoSQL Workbench app. The import overwrites existing data, which will wipe out the bad column name.
If you have a HUGE data set, downloading a copy to your local computer may not be a good solution, but for my small table it worked pretty well.

How do I query AWS DynamoDB in python?

I'm fairly new to NoSQL and using AWS DynamoDB. I'm calling it from AWS Lambda using python 2.7
I'm trying to retrieve a value from an order_number field.
This is what my table looks like (I only have one record):
primary partition key: subscription_id
and my global secondary index: order_number
Is my setup correct?
If so given the order_number how do I retrieve the record using python?
I can't figure out the syntax to do it.
I've tried
response = table.get_item(Key={'order_number': myordernumber})
But I get:
An error occurred (ValidationException) when calling the GetItem operation: The provided key element does not match the schema: ClientError
DynamoDB does not automatically index all of the fields of your object. By default you can define a hash key (subscription_id in your case) and, optionally, a range key and those will be indexed. So, you could do this:
response = table.get_item(Key={'subscription_id': mysubid})
and it will work as expected. However, if you want to retrieve an item based on order_number you would have to use a scan operation which looks through all items in your table to find the one(s) with the correct value. This is a very expensive operation. Or you could create a Global Secondary Index in your table that uses order_number as the primary key. If you did that and called the new index order_number-index you could then query for objects that match a specific order number like this:
from boto3.dynamodb.conditions import Key, Attr

response = table.query(
    IndexName='order_number-index',
    KeyConditionExpression=Key('order_number').eq(myordernumber)
)
DynamoDB is a very fast, scalable, and efficient database, but it does require a lot of thought about what fields you might want to search on and how to do that efficiently.
The good news is that you can now add GSIs to an existing table. Previously you would have had to delete your table and start all over again.
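For reference, a GSI can be added to an existing table with update_table; a minimal sketch (index and attribute names taken from this question, the throughput numbers are placeholder assumptions):

import boto3

client = boto3.client('dynamodb')
client.update_table(
    TableName='recurring_charges',
    AttributeDefinitions=[
        {'AttributeName': 'order_number', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'order_number-index',
                'KeySchema': [{'AttributeName': 'order_number', 'KeyType': 'HASH'}],
                'Projection': {'ProjectionType': 'ALL'},
                'ProvisionedThroughput': {'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1}
            }
        }
    ]
)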
Make sure you've imported this:
from boto3.dynamodb.conditions import Key, Attr
If you don't have it, you'll get the error for sure. It's in the documentation examples.
Thanks @altoids for the comment above, as this is the correct answer for me. I wanted to bring attention to it with a "formal" answer.
To query DynamoDB using an index with a filter:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb', region_name=region)
table = dynamodb.Table('<TableName>')

response = table.query(
    IndexName='<Index>',
    KeyConditionExpression=Key('<key1>').eq('<value>') & Key('<key2>').eq('<value>'),
    FilterExpression=Attr('<attr>').eq('<value>')
)
print(response['Items'])
If a filter is not required, then don't use FilterExpression in the query.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name=region_name)
table = dynamodb.Table(tableName)

def queryDynamo(pk, sk):
    response = table.query(
        ProjectionExpression="#pk, #sk, keyA, keyB",
        ExpressionAttributeNames={"#pk": "pk", "#sk": "sk"},
        KeyConditionExpression=Key('pk').eq(pk) & Key('sk').eq(sk)
    )
    return response['Items']
If you use the boto3 dynamodb client, you can do the following (again you would need to use subscription_id as that is the primary key):
dynamodb = boto3.client('dynamodb')

response = dynamodb.query(
    TableName='recurring_charges',
    KeyConditionExpression="subscription_id = :subscription_id",
    ExpressionAttributeValues={":subscription_id": {"S": "id"}}
)
So far, this is the cleanest way I've discovered; the query is in JSON format.
dynamodb_client = boto3.client('dynamodb')

def query_items():
    arguments = {
        "TableName": "your_dynamodb_table",
        "IndexName": "order_number-index",
        "KeyConditionExpression": "order_number = :V1",
        "ExpressionAttributeValues": {":V1": {"S": "value"}},
    }
    return dynamodb_client.query(**arguments)
