DynamoDB Query for users with expired IP addresses - python

So I have a DynamoDB database table which looks like this (exported to csv):
"email (S)","created_at (N)","firstName (S)","ip_addresses (L)","lastName (S)","updated_at (N)"
"name#email","1628546958.837838381","ddd","[ { ""M"" : { ""expiration"" : { ""N"" : ""1628806158"" }, ""IP"" : { ""S"" : ""127.0.0.1"" } } }]","ddd","1628546958.837940533"
I want to be able to run a "query", not a "scan", for all of the IPs (an attribute attached to users) that are expired. The time is stored in Unix time.
Right now I'm scanning the entire table, looking through each user one by one, and then looping through all of their IPs to see whether they are expired. But I need to do this using a query, because scans are expensive.
The table layout is like this:
primaryKey = email
attributes = firstName, lastName, ip_addresses (array of {} maps where each map has IP, and Expiration as two keys).
I have no idea how to do this using a query so I would greatly appreciate if anyone could show me how! :)
I'm currently running the scan using python and boto3 like this:
response = client.scan(
    TableName='users',
    Select='SPECIFIC_ATTRIBUTES',
    AttributesToGet=[
        'ip_addresses',
    ])

As per the boto3 documentation, the Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
Use the KeyConditionExpression parameter to provide a specific value for the partition key. The Query operation will return all of the items from the table or index with that partition key value. You can optionally narrow the scope of the Query operation by specifying a sort key value and a comparison operator in KeyConditionExpression. To further refine the Query results, you can optionally provide a FilterExpression. A FilterExpression determines which items within the results should be returned to you. All of the other results are discarded.
So, long story short, a query only returns items whose partition key value you supply when running it. It cannot search across every user in the table.
A Query operation always returns a result set. If no matching items are found, the result set will be empty.
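To make the limitation concrete, here is a minimal sketch of what a query can do with this table: fetch one user's ip_addresses by partition key, then pick out the expired entries in Python. The helper names build_user_query and expired_ips are made up for illustration, and the item layout is assumed to match the low-level format in the CSV export above. Finding expired IPs across all users would still need a scan or a different table or index design.

```python
import time


def build_user_query(email):
    """Arguments for the low-level client.query() call for one user.

    A query must name a partition key value, so this covers a single email.
    """
    return {
        "TableName": "users",
        "KeyConditionExpression": "email = :e",
        "ExpressionAttributeValues": {":e": {"S": email}},
        "ProjectionExpression": "ip_addresses",
    }


def expired_ips(item, now=None):
    """Return the IPs in one low-level item whose expiration is in the past."""
    now = time.time() if now is None else now
    entries = item.get("ip_addresses", {}).get("L", [])
    return [
        entry["M"]["IP"]["S"]
        for entry in entries
        if float(entry["M"]["expiration"]["N"]) < now
    ]


# Sample item shaped like the exported row in the question.
sample_item = {
    "ip_addresses": {"L": [
        {"M": {"expiration": {"N": "1628806158"}, "IP": {"S": "127.0.0.1"}}},
    ]}
}
```

With a real client this would be used roughly as client.query(**build_user_query("name#email"))["Items"], passing each returned item to expired_ips.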

Related

Retrieving data from a DynamoDB using Python

A newbie to DynamoDb and python in general here. I have a task to complete where I have to retrieve information from a DynamoDB using some information provided. I've set up the access keys and such and I've been provided a 'Hash Key' and a table name. I'm looking for a way to use the hash key in order to retrieve the information but I haven't been able to find something specific online.
#Table Name
table_name = 'Waypoints'
#Dynamodb client
dynamodb_client = boto3.client('dynamodb')
#Hash key
hash_key = {
''
}
#Retrieve items
response = dynamodb_client.get_item(TableName = table_name, Key = hash_key)
Above is what I have written, but that doesn't work. get_item only returns one item from what I can gather, but I'm not sure what to pass to make it work in the first place.
Any sort of help would be greatly appreciated.
First of all, in a get_item() request the key should not be just the key's value, but rather a map with the key's name and value. For example, if your hash-key attribute is called "p", the Key you should pass would be {'p': hash_key}.
Second, is the hash key the entire key in your table? If you also have a sort key, in a get_item() you must also specify that part of the key - and the result is one item. If you are looking for all the items with a particular hash key but different sort keys, then the function you need to use is query(), not get_item().
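As a rough sketch of the first point: the question uses the low-level boto3 client, which additionally expects each key value wrapped in a type descriptor such as {'S': ...} for strings or {'N': ...} for numbers. The helper build_key below is hypothetical, and the attribute name "p" is only the placeholder from the answer; the real name depends on the table's schema.

```python
def build_key(attr_name, value):
    """Build the Key argument for the low-level client's get_item().

    Strings become {"S": ...}; numbers become {"N": ...} with the value
    passed as a string, which is how the low-level API represents numbers.
    """
    if isinstance(value, str):
        typed = {"S": value}
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        typed = {"N": str(value)}
    else:
        raise TypeError(f"unsupported key type: {type(value).__name__}")
    return {attr_name: typed}


# "p" and the value are placeholders, not names from a real table.
key = build_key("p", "some-hash-value")
```

The result would then be passed as dynamodb_client.get_item(TableName=table_name, Key=key), and the item, if any, read from response.get("Item").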

Best practice for DynamoDB composite primary key travelling inside the system (partition key and sort key)

I am working on a system where I am storing data in DynamoDB and it has to be sorted chronologically. For partition_key I have an id (uuid) and for sort_key I have a date_created value. Now originally it was enough to save unique entries using only the ID, but then a problem arose that this data was not being sorted as I wanted, so a sort_key was added.
Using python boto3 library, it would be enough for me to get, update or delete items using only the id primary key since I know that it is always unique:
import boto3

resource = boto3.resource('dynamodb')
table = resource.Table('my_table_name')

table.get_item(
    Key={'item_id': 'unique_item_id'}
)
table.update_item(
    Key={'item_id': 'unique_item_id'}
)
table.delete_item(
    Key={'item_id': 'unique_item_id'}
)
However, DynamoDB requires a sort key to be provided as well, since primary keys are composed of a partition key and a sort key.
table.get_item(
    Key={
        'item_id': 'unique_item_id',
        'date_created': 12345  # timestamp
    }
)
First of all, is it the right approach to use sort key to sort data chronologically or are there better approaches?
Secondly, what would be the best approach for transmitting partition key and sort key across the system? For example I have an API endpoint which accepts the ID, by this ID the backend performs a get_item query and returns the corresponding data. Now since I also need the sort key, I was thinking about using a hashing algorithm internally, where I would hash a JSON like this:
{
"item_id": "unique_item_id",
"date_created": 12345
}
and the single resulting value then becomes my identifier for this database entry. I would then dehash this value before performing any database queries. Is this approach common?
First of all, is it the right approach to use sort key to sort data chronologically
Sort keys are the means of sorting data in DynamoDB. Using a timestamp as a sort key field is the right thing to do, and a common pattern in DDB.
DynamoDB requires a sort key to be provided ... since primary keys are composed partition key and sort key.
This is true. However, when reading from DDB it is possible to specify only the partition key using the query operation (as opposed to the get_item operation, which requires the full primary key). This is a powerful construct that lets you specify which items you want to read from a given partition.
You may want to look into KSUIDs for your unique identifiers. KSUIDs are like UUIDs, but they contain a time component. This allows them to be sorted by generation time. There are several KSUID libraries in python, so you don't need to implement the algorithm yourself.
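A minimal sketch of the partition-key-only read described above, using the attribute names from the question (item_id as partition key, date_created as sort key). The helper build_partition_query is hypothetical; it only assembles keyword arguments for the boto3 resource's table.query().

```python
def build_partition_query(item_id, newest_first=True, limit=None):
    """Arguments for table.query() that read one partition by item_id only.

    Results come back ordered by the sort key (date_created);
    ScanIndexForward=False reverses that order so the newest item is first.
    """
    kwargs = {
        "KeyConditionExpression": "item_id = :id",
        "ExpressionAttributeValues": {":id": item_id},
        "ScanIndexForward": not newest_first,
    }
    if limit is not None:
        kwargs["Limit"] = limit
    return kwargs


latest = build_partition_query("unique_item_id", limit=1)
```

With the table object from the question this would be called as table.query(**latest)["Items"], so the API endpoint can keep accepting just the ID without knowing the sort key.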

Is it possible to filter a DynamoDB query result in python?

The format of my data looks like this
{
ID:'some uuid'
Email:'some#email.com',
Tags=[tag1,tag2,tag3...],
Content:' some content'
}
The partition key is ID and the sort key is Email
I created a secondary index of email which is "email_index" if I only want to query by Email,
Now I want to query data both by Email and by a specific tag
For example I want to find all data that Email='some#email.com' and Tags contains 'tag2',
I want to first query by "email_index"
result = table.query(
    IndexName='EmailIndex',
    KeyConditionExpression='Email=:email',
    ExpressionAttributeValues={
        ':email': 'some#email.com'
    }
)['Items']
then scan the result with Attr('Tags').contains('tag2')
So is it possible to do both at the same time? Or I have to write a loop to filter query results in Python?
Tags can be a tricky use case for DynamoDB.
One option is to use a FilterExpression on your query operation
result = table.query(
    IndexName='EmailIndex',
    KeyConditionExpression='Email=:email',
    FilterExpression='contains(Tags, :tag)',
    ExpressionAttributeValues={
        ':email': 'some#email.com',
        ':tag': 'tag1'
    }
)['Items']
Another option, as you've outlined, is to do the check in your application code.
If this isn't flexible enough for your use case, you may want to look into a more robust search solution like Elasticsearch.
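For the second option, checking in application code, a short sketch could look like the following. The helper filter_by_tag and the sample items are made up for illustration. Note that a FilterExpression is applied after the items have been read, so neither option reduces the read capacity the query consumes; the filter only trims what is sent back.

```python
def filter_by_tag(items, tag):
    """Keep only the items whose Tags attribute contains the given tag."""
    return [item for item in items if tag in item.get("Tags", [])]


# Stand-in for the 'Items' list returned by the query on the email index.
items = [
    {"ID": "1", "Email": "some#email.com", "Tags": ["tag1", "tag2"]},
    {"ID": "2", "Email": "some#email.com", "Tags": ["tag3"]},
    {"ID": "3", "Email": "some#email.com"},  # item without a Tags attribute
]

matching = filter_by_tag(items, "tag2")
```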

Document found on table but not present on an global secondary index

Really wondering what's going on here.
We have stored documents in the (abbreviated) form:
{
"a_certain_id": "259217078123",
"name": "company name",
"vat_number": "BE0912111111"
}
the pk in the table is the "vat_number" property, but we also have a global secondary index on "a_certain_id".
When we perform a query on the table with the vat number, we get the document as expected.
We then perform a query on the secondary index with the copy-pasted property from the document,
and we find no document.
We then perform a scan with the vat number described above, no document is found.
I can only conclude that the document doesn't exist in the index!
Is there a way to manage this, such as repopulating the index, or is there something wrong with the chosen hash key / pk? According to the documentation, there shouldn't be.
We do the queries in the following form:
key_condition_expression = Key(hash_key).eq(hash_value)
query_args = {"IndexName": index, "KeyConditionExpression": key_condition_expression}
result = dynamo_table.query(**query_args)
but that should not matter as we get the same result either via boto3 or via the aws console client.
And the query is working for most companies, it's only companies which are not in the index, apparently.

boto dynamodb batch_write and delete_item -- 'The provided key element does not match the schema'

I'm trying to delete a large number of items in a DynamoDB table using boto and python. My Table is set up with the primary key as a device ID (think MAC address.) There are multiple entries in the table for each device ID, as the secondary key is a UNIX timestamp.
From my reading this code should work:
from boto.dynamodb2.table import Table

def delete_batch(self, guid):
    table = Table('Integers')
    with table.batch_write() as batch:
        batch.delete_item(Id=guid)
Source: http://docs.pythonboto.org/en/latest/dynamodb2_tut.html#batch-writing
However it returns 'The provided key element does not match the schema' as the error message.
I suspect the problem is because guid is not unique in my table.
Given that, is there a way to delete multiple items with the same primary key without specifying the secondary key?
You are providing only the hash part of the key and not an item (hash+range) - this is why you get an error and can't delete items.
You can't ask DynamoDB to delete all items with a hash key (the same way Query gets them all)
Read this answer by Steffen for more information
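One common workaround, sketched here with boto3 rather than the legacy boto library used in the question, is to query for every item under the hash key and then delete each one by its full key. The function name delete_all_for_hash is hypothetical, and the range key attribute name is an assumption because the question does not give it; the table argument is expected to behave like a boto3 Table resource.

```python
def delete_all_for_hash(table, hash_name, range_name, hash_value):
    """Query every item under one hash key, then delete each by its full key.

    Returns the number of delete requests handed to the batch writer.
    """
    query_kwargs = {
        "KeyConditionExpression": "#h = :h",
        # The alias avoids clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#h": hash_name},
        "ExpressionAttributeValues": {":h": hash_value},
    }
    deleted = 0
    with table.batch_writer() as batch:
        while True:
            page = table.query(**query_kwargs)
            for item in page["Items"]:
                # Each delete needs both parts of the key (hash + range).
                batch.delete_item(Key={
                    hash_name: item[hash_name],
                    range_name: item[range_name],
                })
                deleted += 1
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            # Continue from where the previous page stopped.
            query_kwargs["ExclusiveStartKey"] = last_key
    return deleted
```

Assuming the range attribute were called "Timestamp", a call might look like delete_all_for_hash(boto3.resource('dynamodb').Table('Integers'), 'Id', 'Timestamp', guid). The query still consumes read capacity for every item it returns, so this is a workaround rather than a single-request delete.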
