How do I query AWS DynamoDB in Python?

I'm fairly new to NoSQL and am using AWS DynamoDB. I'm calling it from AWS Lambda using Python 2.7.
I'm trying to retrieve a value from an order_number field.
This is what my table looks like (I only have one record):
primary partition key: subscription_id
and my secondary global index: order_number
Is my setup correct?
If so given the order_number how do I retrieve the record using python?
I can't figure out the syntax to do it.
I've tried
response = table.get_item(Key={'order_number': myordernumber})
But I get:
An error occurred (ValidationException) when calling the GetItem operation: The provided key element does not match the schema: ClientError

DynamoDB does not automatically index all of the fields of your object. By default you can define a hash key (subscription_id in your case) and, optionally, a range key, and those will be indexed. So, you could do this:
response = table.get_item(Key={'subscription_id': mysubid})
and it will work as expected. However, if you want to retrieve an item based on order_number, you would have to use a scan operation, which looks through every item in your table to find the one(s) with the correct value; this is a very expensive operation. Alternatively, you could create a Global Secondary Index on your table that uses order_number as its hash key. If you did that, and called the new index order_number-index, you could then query for items that match a specific order number like this:
from boto3.dynamodb.conditions import Key, Attr

response = table.query(
    IndexName='order_number-index',
    KeyConditionExpression=Key('order_number').eq(myordernumber)
)
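The matching items are then available in response['Items'].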
DynamoDB is a very fast, scalable, and efficient database, but it does require a lot of thought about which fields you might want to search on and how to do that efficiently.
The good news is that you can now add GSIs to an existing table. Previously you would have had to delete your table and start all over again.
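For reference, here is a minimal sketch of adding such an index to an existing table with the low-level client; the index and attribute names follow this answer, while the table name and throughput numbers are placeholders:

import boto3

client = boto3.client('dynamodb')

# Sketch: add a GSI keyed on order_number to an existing table.
# 'recurring_charges' and the throughput values are placeholders.
client.update_table(
    TableName='recurring_charges',
    AttributeDefinitions=[
        {'AttributeName': 'order_number', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'order_number-index',
                'KeySchema': [
                    {'AttributeName': 'order_number', 'KeyType': 'HASH'},
                ],
                'Projection': {'ProjectionType': 'ALL'},
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': 1,
                    'WriteCapacityUnits': 1,
                },
            }
        }
    ]
)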

Make sure you've imported this:
from boto3.dynamodb.conditions import Key, Attr
If you don't have it, you'll get the error for sure. It's in the documentation examples.
Thanks @altoids for the comment above, as this is the correct answer for me. I wanted to bring attention to it with a "formal" answer.

To query DynamoDB using an index with a filter:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb', region_name=region)
table = dynamodb.Table('<TableName>')

response = table.query(
    IndexName='<Index>',
    KeyConditionExpression=Key('<key1>').eq('<value>') & Key('<key2>').eq('<value>'),
    FilterExpression=Attr('<attr>').eq('<value>')
)
print(response['Items'])
If a filter is not required, don't pass FilterExpression to the query.
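For example, the same query without the filter would just drop that argument (same placeholder names as above):

response = table.query(
    IndexName='<Index>',
    KeyConditionExpression=Key('<key1>').eq('<value>') & Key('<key2>').eq('<value>')
)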

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name=region_name)
table = dynamodb.Table(tableName)

def queryDynamo(pk, sk):
    response = table.query(
        # ProjectionExpression limits which attributes are returned;
        # '#pk' and '#sk' are mapped to real names in ExpressionAttributeNames
        ProjectionExpression="#pk, #sk, keyA, keyB",
        ExpressionAttributeNames={"#pk": "pk", "#sk": "sk"},
        KeyConditionExpression=Key('pk').eq(pk) & Key('sk').eq(sk)
    )
    return response['Items']
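A hypothetical call, with made-up key values (the real ones depend entirely on your table's schema):

items = queryDynamo('user#42', 'order#2021-01-01')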

If you use the boto3 DynamoDB client, you can do the following (again, you would need to use subscription_id as that is the primary key):
dynamodb = boto3.client('dynamodb')

response = dynamodb.query(
    TableName='recurring_charges',
    KeyConditionExpression="subscription_id = :subscription_id",
    ExpressionAttributeValues={":subscription_id": {"S": "id"}}
)
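Note that the low-level client returns items in DynamoDB's typed JSON format ({'S': ...}). If you want plain Python values, one option (a sketch building on the response above) is boto3's TypeDeserializer:

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
# Convert each typed attribute ({'S': 'abc'}) into a plain Python value
items = [
    {k: deserializer.deserialize(v) for k, v in item.items()}
    for item in response['Items']
]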

So far, this is the cleanest way I've discovered; the query is in JSON format.
dynamodb_client = boto3.client('dynamodb')

def query_items():
    arguments = {
        "TableName": "your_dynamodb_table",
        "IndexName": "order_number-index",
        "KeyConditionExpression": "order_number = :V1",
        "ExpressionAttributeValues": {":V1": {"S": "value"}},
    }
    return dynamodb_client.query(**arguments)
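One thing to keep in mind: query returns at most 1 MB of data per call, so for larger result sets you would follow LastEvaluatedKey. A sketch building on the function above:

def query_all_items():
    items = []
    arguments = {
        "TableName": "your_dynamodb_table",
        "IndexName": "order_number-index",
        "KeyConditionExpression": "order_number = :V1",
        "ExpressionAttributeValues": {":V1": {"S": "value"}},
    }
    while True:
        response = dynamodb_client.query(**arguments)
        items.extend(response['Items'])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        # Resume the query from where the previous page stopped
        arguments['ExclusiveStartKey'] = last_key
    return items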

Related

How to get/fetch certain columns from dynamoDB using python in Lambda?

I have a table called 'DATA' in DynamoDB with 20 to 25 columns, but I need to pull only 3 of them.
The required columns are status, ticket_id, and country.
table_name = 'DATA'
# dynamodb client
dynamodb_client = boto3.client('dynamodb')
I'm able to achieve this using scan, as shown below, but I want to do the same using the query method.
response = table.scan(AttributesToGet=['ticket_id','ticket_status'])
I tried the below code with the query method, but I'm getting an error.
response = table.query(ProjectionExpression=['ticket_id','ticket_status']),keyConditionExpression('opco_type').eq('cwc') or keyConditionExpression('opco_type').eq('cwp'))
Is there any way of getting only required columns from dynamo?
As already commented, you need to use ProjectionExpression:
dynamodb = boto3.resource('dynamodb', region_name=region)
table = dynamodb.Table(table_name)

item = table.get_item(
    Key={'Title': 'Scarface', 'Year': 1983},
    ProjectionExpression='status, ticket_id, country'
)
Some things to note:
It is better to use resource instead of client; this avoids the special DynamoDB JSON syntax.
You need to pass the full (composite) key to get_item
Selected columns should be in a comma-separated string
It is a good idea to always use expression attribute names; status, for example, is a DynamoDB reserved word, so requests that reference it directly can be rejected:
item = table.get_item(
    Key={'Title': 'Scarface', 'Year': 1983},
    ProjectionExpression='#status, ticket_id, country',
    ExpressionAttributeNames={'#status': 'status'}
)

How to query all rows of one column in DynamoDB?

I have only been working with AWS DynamoDB for a short time. I am wondering how I can get the same result as this statement (without a WHERE clause):
SELECT column1 FROM DynamoTable;
I tried (but failed) with:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('DynamoTable')
resp = table.query(KeyConditionExpression=Key('column1'))
It requires Key().eq() or Key().begins_with() ...
I already tried resp = table.scan(), but the response contains far more fields than I need; I only want column1.
Thanks.
This gets you just the required column directly, so you do not need to iterate over the full items afterwards:
import boto3

def getColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    resp = table.scan(AttributesToGet=['column1'])
    return resp['Items']
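Note that AttributesToGet is a legacy parameter; the equivalent with the current ProjectionExpression syntax would be (a sketch, assuming column1 is not a reserved word):

resp = table.scan(ProjectionExpression='column1')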
You should definitely use the Scan operation. Check the documentation to see how to implement it in Python.
Regarding how to select just a specific attribute, you could use:
import boto3

def getColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    response = table.scan()
    return [i['column1'] for i in response['Items']]
You have to iterate over the entire table and just fetch the column you need.
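Also worth noting: a single scan call returns at most 1 MB of data, so to really cover the entire table you would paginate with LastEvaluatedKey. A sketch under that assumption:

import boto3

def getAllColumn1Items():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('DynamoTable')
    values = []
    response = table.scan(ProjectionExpression='column1')
    while True:
        values.extend(i['column1'] for i in response['Items'])
        if 'LastEvaluatedKey' not in response:
            break
        # Continue scanning from where the previous page stopped
        response = table.scan(ProjectionExpression='column1',
                              ExclusiveStartKey=response['LastEvaluatedKey'])
    return values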

get the last modified date of tables using bigquery tables GET api

I am trying to get the list of tables and their last_modified_date using the BigQuery REST API.
In the BigQuery API explorer I am getting all the fields correctly, but when I use the API from Python code it returns 'None' for the modified date.
This is the code I have written for this in Python:
from google.cloud import bigquery

client = bigquery.Client(project='temp')
datasets = list(client.list_datasets())
for dataset in datasets:
    print dataset.dataset_id
for dataset in datasets:
    for table in dataset.list_tables():
        print table.table_id
        print table.created
        print table.modified
In this code I am getting the created date correctly, but the modified date is 'None' for all the tables.
Not quite sure which version of the API you are using, but I suspect the latest versions do not have the method dataset.list_tables().
Still, this is one way of getting the last modified field; see if this works for you (or gives you some idea of how to get this data):
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('/key.json')
dataset_list = list(client.list_datasets())
for dataset_item in dataset_list:
    dataset = client.get_dataset(dataset_item.reference)
    tables_list = list(client.list_tables(dataset))
    for table_item in tables_list:
        table = client.get_table(table_item.reference)
        print "Table {} last modified: {}".format(
            table.table_id, table.modified)
If you want to get the last modified time from only one table:
from google.cloud import bigquery

def get_last_bq_update(project, dataset, table_name):
    client = bigquery.Client.from_service_account_json('/key.json')
    table_id = f"{project}.{dataset}.{table_name}"
    table = client.get_table(table_id)
    print(table.modified)
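A hypothetical call, with placeholder names:

get_last_bq_update('my-project', 'my_dataset', 'my_table')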

How to do bulk insert with ordered false in mongoengine

I'm trying to insert documents in bulk. I have created a unique index in my collection and want to skip documents which are duplicates while doing the bulk insertion. This can be accomplished with the native MongoDB function:
db.collection.insert(
    <document or array of documents>,
    {
        ordered: <boolean>
    }
)
I want to accomplish this with mongoengine. If anybody knows how to achieve this, please answer the question. Thanks.
If you have a class like this:
class Foo(db.Document):
    bar = db.StringField()
    meta = {'indexes': [{'fields': ['bar'], 'unique': True}]}
and you have a list of Foo instances, foos = [Foo(bar='a'), Foo(bar='a'), Foo(bar='a')],
then trying Foo.objects.insert(foos) will raise mongoengine.errors.NotUniqueError.
The 1st workaround would be to delete the index from MongoDB, insert the duplicates, and then ensure the index with {unique: true, dropDups: true} (note that dropDups was removed in MongoDB 3.0).
The 2nd workaround would be to use the underlying pymongo API for bulk ops: https://docs.mongodb.com/manual/reference/method/db.collection.initializeOrderedBulkOp/#db.collection.initializeOrderedBulkOp
For now I am using raw pymongo from mongoengine as a workaround for this. This is the 2nd workaround that @Alexey Smirnov mentioned. So for a mongoengine Document class DocClass, you would access the underlying pymongo collection and execute the query like below:
from pymongo.errors import BulkWriteError

try:
    # Convert mongoengine documents to plain dicts that pymongo understands
    doc_list = [doc.to_mongo() for doc in me_doc_list]
    DocClass._get_collection().insert_many(doc_list, ordered=False)
except BulkWriteError as bwe:
    print("Batch inserted with some errors. Maybe some duplicates were found and skipped.")
    print(f"Count is {DocClass.objects.count()}.")
except Exception as e:
    print({'error': str(e)})
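If you also want to know what was skipped, BulkWriteError carries the per-document errors in its details attribute. A sketch of what could go inside the except BulkWriteError block (error code 11000 is a duplicate key):

dup_errors = [err for err in bwe.details.get('writeErrors', [])
              if err.get('code') == 11000]
print(f"Skipped {len(dup_errors)} duplicate documents.")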

AWS DynamoDB Python - boto3 Key() methods not recognized (Query)

I am using Lambda (Python) to query my DynamoDB database. I am using the boto3 library, and I was able to make an "equivalent" query:
This script works:
import boto3
from boto3.dynamodb.conditions import Key, Attr
import json

def create_list(event, context):
    resource = boto3.resource('dynamodb')
    table = resource.Table('Table_Name')
    response = table.query(
        TableName='Table_Name',
        IndexName='Custom-Index-Name',
        KeyConditionExpression=Key('Number_Attribute').eq(0)
    )
    return response
However, when I change the query expression to this:
KeyConditionExpression=Key('Number_Attribute').gt(0)
I get the error:
"errorType": "ClientError",
"errorMessage": "An error occurred (ValidationException) when calling the Query operation: Query key condition not supported"
According to this [1] resource, "gt" is a method of Key(). Does anyone know if this library has been updated, or what other methods are available other than "eq"?
[1] http://boto3.readthedocs.io/en/latest/reference/customizations/dynamodb.html#ref-dynamodb-conditions
---------EDIT----------
I also just tried the old method using:
response = client.query(
    TableName='Table_Name',
    IndexName='Custom_Index',
    KeyConditions={
        'Custom_Number_Attribute': {
            'ComparisonOperator': 'EQ',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
This worked, but when I try:
response = client.query(
    TableName='Table_Name',
    IndexName='Custom_Index',
    KeyConditions={
        'Custom_Number_Attribute': {
            'ComparisonOperator': 'GT',
            'AttributeValueList': [{'N': '0'}]
        }
    }
)
...it does not work.
Why would EQ be the only method working in these cases? I'm not sure what I'm missing in the documentation.
Here is what I think is going on:
Your partition key is Number_Attribute, so you cannot do a gt when doing a query (you can do an eq, and that is it).
You can do a gt or between on your sort key when doing a query. The sort key is also called the range key, and because DynamoDB stores items with the same partition key next to each other in sort-key order, it can do gt and between efficiently in a query.
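To illustrate, a sketch assuming a hypothetical table whose partition key is id and whose numeric sort key is Number_Attribute:

from boto3.dynamodb.conditions import Key

# eq on the partition key is mandatory; gt is allowed on the sort key
response = table.query(
    KeyConditionExpression=Key('id').eq('some-id') & Key('Number_Attribute').gt(0)
)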
Now, if you want to do a gt or between on your partition key, then you will have to use scan, like below:
fe = Key('Number_Attribute').gt(0)
response = table.scan(
    FilterExpression=fe
)
Keep in mind the following concerning scan:
The scan method reads every item in the entire table and returns all of the data in the table. You can provide an optional filter_expression so that only the items matching your criteria are returned; however, the filter is only applied after the entire table has been scanned.
So in other words, it's a bit of a costly operation compared to query. You can see an example in the documentation here.
Hope that helps!
