AWS DynamoDB ExclusiveStartKey default value - python

I'm trying to make a query to DynamoDB, and if a LastEvaluatedKey is returned (meaning the query exceeds 1 MB) I want to make other queries in order to fetch all the required data from the table, using LastEvaluatedKey as ExclusiveStartKey for the next query.
This is the code I have for now:
query_response = table.query(
    KeyConditionExpression=Key('brand').eq(brand)
)
pagination_key = None
if 'LastEvaluatedKey' in query_response:
    pagination_key = query_response['LastEvaluatedKey']
    while pagination_key:
        next_query_response = table.query(
            KeyConditionExpression=Key('brand').eq(brand),
            ExclusiveStartKey=pagination_key
        )
However, I'd like to refactor this code by extracting the query into a method that takes pagination_key as an argument. To do this, I'd either have to set ExclusiveStartKey to False, None or some other default value for the first call (I didn't find anything on this), or exclude ExclusiveStartKey altogether, and I don't know how to do that either.

Using keyword arguments (**kwargs), it might look like this. I set up the query once beforehand and only update ExclusiveStartKey on each iteration.
query = { "KeyConditionExpression": Key('brand').eq(brand) }
ExclusiveStartKey = None
while True:
    if ExclusiveStartKey is not None:
        query['ExclusiveStartKey'] = ExclusiveStartKey
    query_response = table.query(**query)
    if 'LastEvaluatedKey' in query_response:
        ExclusiveStartKey = query_response['LastEvaluatedKey']
    else:
        break

I found an easy way of building the parameters:
query_params = { 'KeyConditionExpression': Key('brand').eq(brand) }
if pagination_key:
    query_params['ExclusiveStartKey'] = pagination_key
# Unpack the dict so each entry is passed as a keyword argument
query_response = table.query(**query_params)
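Putting the two answers together, the refactoring asked for in the question could look roughly like the sketch below. The function name query_all is made up for illustration, and it assumes only that table.query accepts keyword arguments and returns a dict with 'Items' and, when more pages remain, 'LastEvaluatedKey'.

```python
def query_all(table, **query_params):
    """Collect the items from every page of a DynamoDB query."""
    items = []
    pagination_key = None
    while True:
        # Copy so the caller's parameters are never mutated
        params = dict(query_params)
        if pagination_key is not None:
            # Only send ExclusiveStartKey once a previous page returned a key
            params["ExclusiveStartKey"] = pagination_key
        response = table.query(**params)
        items.extend(response.get("Items", []))
        pagination_key = response.get("LastEvaluatedKey")
        if pagination_key is None:
            break
    return items
```

It would be called as query_all(table, KeyConditionExpression=Key('brand').eq(brand)). Leaving the key out of the first call sidesteps the need for a default value for ExclusiveStartKey.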

Related

Filtering with SQLAlchemy

I am new to ORMs and am trying to query a table with a timestamp column. My results are empty, understandably, since I am querying a timestamp field with a date. I read that I can use func (from sqlalchemy.sql import func), but my filter is created dynamically from the query params, so I am not sure how to apply it.
Code for query model:
def merch_trans_sum_daily_summaries(db_engine, query_params):
    query_filters = get_filters(query_params)
    page, limit, filters = query_filters.page, query_filters.limit, query_filters.filters
    strict_limit = query_filters.strict_limit
    with Session(db_engine) as sess:
        results = paginate(sess.query(MerchTransSumDaily)
                           .filter_by(**filters).yield_per(1000),
                           page, limit, strict_limit)
        metadata = results.metadata
        query_data = results.data
        if not query_data:
            raise exc.NoResultFound
        data = [record._asdict() for record in query_data]
        return data, metadata
Here is my get_filters function:
def get_filters(query_parameters, strict_limit=100, default_page=1):
    if query_parameters and "batch_type" in query_parameters:
        query_parameters.pop('batch_type')
    limit = int(query_parameters["limit"]) if query_parameters and "limit" in query_parameters else strict_limit
    page = int(query_parameters["page"]) if query_parameters and "page" in query_parameters else default_page
    # Empty dict (not a string) so filter_by(**filters) still works without parameters
    filters = {}
    if query_parameters:
        filters = {key_: value_ for key_, value_ in query_parameters.items() if key_ not in ["page", "limit", "paginate", "filter"]}
    return QueryFilters(limit, page, filters, strict_limit)
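One possible approach, which is not from the original post, is to turn the incoming date into a half-open datetime range and compare the timestamp column against both ends instead of testing equality. The helper name day_bounds is hypothetical, and the sketch assumes the date arrives as an ISO string such as '2022-03-07'.

```python
from datetime import date, datetime, time, timedelta


def day_bounds(day_string):
    """Return (start, end) datetimes such that start <= ts < end covers one day."""
    day = date.fromisoformat(day_string)
    start = datetime.combine(day, time.min)  # midnight at the start of the day
    end = start + timedelta(days=1)          # midnight at the start of the next day
    return start, end
```

With SQLAlchemy the two bounds could then be applied through filter() rather than filter_by(), for example .filter(Model.ts >= start, Model.ts < end), where Model.ts stands in for the real timestamp column.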

Perform $gte and $lt on the same field _id in MongoDB

db.comments.find({"_id" : {"$gte": ObjectId("6225f932a7bce76715a9f3bd"), "$lt":ObjectId("6225f932a7bce76715a9f3bd")}}).sort({"created_datetime":1}).limit(10).pretty()
I am using this query, which should give me the current document ("6225f932a7bce76715a9f3bd"), the 4 documents inserted before it and the 5 documents inserted after it. But when I run it, I get an empty result. Where am I going wrong?
I had no other option but to separate my queries in order to get the result I expected.
query = request.args.to_dict()
find_query = {}
find_query["_id"] = {"$lt": ObjectId(query["comment_id"])}
previous_comments = list(db.comments.find(find_query))
find_query["_id"] = {"$gte": ObjectId(query["comment_id"])}
next_comments = list(db.comments.find(find_query))
previous_comments.extend(next_comments)
return {"comments":previous_comments}
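The original query returns nothing because $gte and $lt on the same value describe an empty range: no _id can be both greater than or equal to X and less than X. A rough sketch of the two-query idea with the 4-before and 5-after limits applied is below. The function name window_queries and the dict layout are made up for illustration, and it assumes _id order roughly follows insertion order, which is usually true for ObjectIds.

```python
def window_queries(comment_id, before=4, after=5):
    """Build two find() specs around comment_id.

    The first selects up to `before` documents preceding it (newest first),
    the second selects the document itself plus up to `after` following ones.
    """
    previous = {
        "filter": {"_id": {"$lt": comment_id}},
        "sort": [("_id", -1)],  # descending, so the closest earlier docs come first
        "limit": before,
    }
    current_and_next = {
        "filter": {"_id": {"$gte": comment_id}},
        "sort": [("_id", 1)],   # ascending, starting at the current doc
        "limit": after + 1,     # +1 to include the current doc itself
    }
    return previous, current_and_next
```

With pymongo each spec could be run as db.comments.find(spec["filter"]).sort(spec["sort"]).limit(spec["limit"]). The results of the first query come back newest first, so they would need reversing before the second list is appended.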

SQLAlchemy - How to filter_by multiple dynamic OR values?

I have a pretty reasonable use case: multiple possible filter_by matches for a single column. Basically, a multiselect JS dropdown on the front end posts multiple company industries to the backend. I need to know how to write the SQLAlchemy query and am surprised that I couldn't find an example.
{ filters: { type: "Industry", minmax: false, value: ["Financial Services", "Biotechnology"] } }
@app.route("/dev/api/saved/symbols", methods=["POST"])
@cross_origin(origin="*")
def get_saved_symbols():
    req = request.get_json()
    # res = None
    # if "minmax" in req["filters"]:
    #     idx = req["filters"].index("minmax")
    #     if req["filters"][idx] == "min":
    #         res = db.session.query.filter(Company[req["filter"]["type"]] >= req["filters"]["value"])
    #     else:
    #         res = db.session.query.filter(Company[req["filter"]["type"]] <= req["filters"]["value"])
    # else:
    res = db.session.query.filter_by(Company[req["filters"]["type"]] == req["filters"]["value"])
    return jsonify(res)
As you can see I am also working on a minmax which is like an above or below filter for other columns like price or market cap. However, the multiselect OR dynamic statement is really what I am stuck on...
I ended up creating a separate filter function for this, whose results I can then loop over.
I will just show the first case for brevity. I send a list of strings, build a list of filters from it, and then combine them with the or_ operator imported from the sqlalchemy package.
def company_filter(db, filter_type, filter_value, minmax):
    match filter_type:
        case "industry":
            filter_list = []
            for filter in filter_value:
                filter_list.append(Company.industry == filter)
            return db.query(Company).with_entities(Company.id, Company.symbol, Company.name, Company.monthly_exp).filter(or_(*filter_list))
        ...

Incrementing a counter in DynamoDB when value to be updated is in a map field

I have a Lambda function that needs to retrieve an item from DynamoDB and update a counter on that item. However, the counter sits inside a map field.
The DynamoDB table is structured as:
id: int
options: map
    some_option: 0
    some_other_option: 0
I need to first retrieve the item of the table that has a certain id and a certain option listed as a key in the options.
Then I want to increment that counter by some value.
Here is what I have so far:
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('options')
response = None
try:
    response = table.get_item(Key={'id': id})
except ClientError as e:
    print(e.response['Error']['Message'])
option = response.get('Item', None)
if option:
    option['options'][some_option] = int(option['options'][some_option]) + some_value
    # how to update item in DynamoDB now?
My issue is how to update the record now and, more importantly, whether such a solution will cause data races. Could two simultaneous Lambda calls that try to update the same option on the same item cause a data race? If so, what's the way to solve this?
Any pointers/help is appreciated.
Ok, I found the answer:
All I need is:
response = table.update_item(
    Key={
        'id': my_id,
    },
    UpdateExpression='SET options.#s = options.#s + :val',
    ExpressionAttributeNames={
        "#s": my_option
    },
    ExpressionAttributeValues={
        ':val': Decimal(some_value)
    },
    ReturnValues="UPDATED_NEW"
)
This is inspired by Step 3.4: Increment an Atomic Counter, which provides an atomic approach to incrementing values. According to the documentation:
DynamoDB supports atomic counters, which use the update_item method to
increment or decrement the value of an existing attribute without
interfering with other write requests. (All write requests are applied
in the order in which they are received.)
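For reuse across several options, the arguments of the call above could be built by a small helper. This is only a sketch: the name build_increment_kwargs is made up, and the key attribute 'id' and the map attribute 'options' are taken from the table layout in the question.

```python
from decimal import Decimal


def build_increment_kwargs(item_id, option_name, amount):
    """Build update_item arguments that add `amount` to options.<option_name>."""
    return {
        "Key": {"id": item_id},
        # #s is a placeholder so the option name can be any string
        "UpdateExpression": "SET options.#s = options.#s + :val",
        "ExpressionAttributeNames": {"#s": option_name},
        # Going through str() avoids float artifacts when amount is a float
        "ExpressionAttributeValues": {":val": Decimal(str(amount))},
        "ReturnValues": "UPDATED_NEW",
    }
```

It would be used as table.update_item(**build_increment_kwargs(my_id, my_option, some_value)). Because the addition happens inside DynamoDB rather than in the Lambda, there is no read-modify-write step for two concurrent calls to race on.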

Cosmos DB - Delete Document with Python

In this SO question I learnt that I cannot delete a Cosmos DB document using SQL.
Using Python, I believe I need the DeleteDocument() method. This is how I'm getting the document IDs that are required (I believe) to then call the DeleteDocument() method.
# set up the client
client = document_client.DocumentClient()
# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
result_iterable = client.QueryDocuments('dbs/DB/colls/coll', query, options)
results = list(result_iterable)
for x in range(0, len (results)):
    docID = results[x]['id']
Now, at this stage I want to call DeleteDocument().
The inputs into which are document_link and options.
I can define document_link as something like
document_link = 'dbs/DB/colls/coll/docs/'+docID
And successfully call ReadAttachments() for example, which has the same inputs as DeleteDocument().
When I do however, I get an error...
The partition key supplied in x-ms-partitionkey header has fewer
components than defined in the the collection
...and now I'm totally lost
UPDATE
Following on from Jay's help, I believe I'm missing the partitionKey element in the options.
In this example, I've created a testing database, it looks like this
So I think my partition key is /testPART
When I include the partitionKey in the options however, no results are returned, (and so print len(results) outputs 0).
Removing partitionKey means that results are returned, but the delete attempt fails as before.
# Query them in SQL
query = { 'query': 'SELECT * FROM c' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = '/testPART'
result_iterable = client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options)
results = list(result_iterable)
# should be > 0
print len(results)
for x in range(0, len (results)):
    docID = results[x]['id']
    print docID
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/'+docID, options=options)
    print 'deleted', docID
Following your description, I tried using the pydocumentdb module to delete a document in my Azure DocumentDB account, and it works for me.
Here is my code:
import pydocumentdb
import pydocumentdb.document_client as document_client
config = {
    'ENDPOINT': 'Your url',
    'MASTERKEY': 'Your master key',
    'DOCUMENTDB_DATABASE': 'familydb',
    'DOCUMENTDB_COLLECTION': 'familycoll'
}
# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})
# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
result_iterable = client.QueryDocuments('dbs/familydb/colls/familycoll', query, options)
results = list(result_iterable)
print(results)
client.DeleteDocument('dbs/familydb/colls/familycoll/docs/id1',options)
print 'delete success'
Console Result:
[{u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgABAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub1', u'val': u'value1'}, {u'subId': u'sub2', u'val': u'value2'}], u'_ts': 1507687788, u'_rid': u'hitPAL3OLgABAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002100-0000-0000-0000-59dd7d6c0000"', u'id': u'id1'}, {u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgACAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub3', u'val': u'value3'}, {u'subId': u'sub4', u'val': u'value4'}], u'_ts': 1507687809, u'_rid': u'hitPAL3OLgACAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002200-0000-0000-0000-59dd7d810000"', u'id': u'id2'}]
delete success
Please notice that you need to set the enableCrossPartitionQuery property to True in options if your documents are cross-partitioned.
Must be set to true for any query that requires to be executed across
more than one partition. This is an explicit flag to enable you to
make conscious performance tradeoffs during development time.
You could find above description from here.
Update Answer:
I think you misunderstand the meaning of the partitionKey property in options.
For example, my container is created like this:
My documents are as below:
{
    "id": "1",
    "name": "jay"
}
{
    "id": "2",
    "name": "jay2"
}
My partition key is 'name', so here I have two partitions: 'jay' and 'jay2'.
So here you should set the partitionKey property to 'jay' or 'jay2', not 'name'.
Please modify your code as below:
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = 'jay'  # please change this value in your code
result_iterable = client.QueryDocuments('dbs/db/colls/testcoll', query, options)
results = list(result_iterable)
print(results)
Hope it helps you.
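To make the path-versus-value distinction concrete, here is a small sketch that reads the partition key value out of a document, given the partition key path defined on the collection. The helper name partition_key_value is hypothetical and it assumes the path is a simple slash-separated list of property names such as '/testPART'.

```python
def partition_key_value(document, partition_key_path):
    """Follow a path like '/name' or '/device/id' into the document."""
    value = document
    for part in partition_key_path.strip("/").split("/"):
        value = value[part]  # raises KeyError if the document lacks the property
    return value
```

When deleting documents in a loop, the value it returns for each document (for example 'jay'), not the path '/name', is what would go into options['partitionKey'] before calling DeleteDocument.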
Using the azure.cosmos library:
Install and import the azure-cosmos package:
from azure.cosmos import exceptions, CosmosClient, PartitionKey
Define a function that deletes items, in this case using the partition key in the query:
def deleteItems(deviceid):
    client = CosmosClient(config.cosmos.endpoint, config.cosmos.primarykey)
    # Create the database if it does not exist
    database = client.create_database_if_not_exists(id='azure-cosmos-db-name')
    # Create the container if it does not exist
    # Using a good partition key improves the performance of database operations.
    container = database.create_container_if_not_exists(id='container-name', partition_key=PartitionKey(path='/your-partition-path'), offer_throughput=400)
    # Fetch items
    query = f"SELECT * FROM c WHERE c.device.deviceid IN ('{deviceid}')"
    items = list(container.query_items(query=query, enable_cross_partition_query=False))
    for item in items:
        # The second argument is the partition key value of the item
        container.delete_item(item, 'partition-key')
Usage:
deviceid = 10
deleteItems(deviceid)
github full example here: https://github.com/eladtpro/python-iothub-cosmos
