Send Rekognition response to a DynamoDB table using Lambda (Python)

I am using Lambda to detect faces with Rekognition and would like to send the response to a DynamoDB table.
This is the code I am using:
import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')
dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# --------------- Helper Functions to call Rekognition APIs ------------------
def detect_faces(bucket, key):
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=['ALL'])
    TableName = 'table_test'
    for face in response['FaceDetails']:
        table_response = dynamodb.put_item(TableName=TableName, Item='{0} - {1}%')
    return response
My problem is in this line:
for face in response['FaceDetails']:
    table_response = dynamodb.put_item(TableName=TableName, Item= {'key:{'S':'value'}, {'S':'Value')
I am able to see the result in the console.
I don't want to add specific items to the table; I need the whole response to be transferred to the table.
To do this:
1. What should I use as the key and partition key in the table?
2. How can I transfer the whole response to the table?
I have been stuck on this for three days now and can't figure it out. Please help!
******************* EDIT *******************
I tried this code:
import os
import uuid

import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')

# --------------- Helper Functions to call Rekognition APIs ------------------
def detect_faces(bucket, key):
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=['ALL'])
    TableName = 'table_test'
    for face in response['FaceDetails']:
        face_id = str(uuid.uuid4())
        Age = face["AgeRange"]
        Gender = face["Gender"]
        print('Generating new DynamoDB record, with ID: ' + face_id)
        print('Input Age: ' + Age)
        print('Input Gender: ' + Gender)
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(os.environ['test_table'])
        table.put_item(
            Item={
                'id': face_id,
                'Age': Age,
                'Gender': Gender
            }
        )
    return response
It gave me two errors:
1. Error processing object xxx.jpg
2. cannot concatenate 'str' and 'dict' objects
Can you please help!

When you create a table in DynamoDB, you must specify at least a partition key. Go to your DynamoDB table and grab its partition key. Once you have it, you can build a new object that contains this partition key with some value, plus the object you want to store itself. A value for the partition key is always required when creating a new item in a DynamoDB table.
Your JSON object should look like this:
{
    "myPartitionKey": "myValue",
    "attr1": "val1",
    "attr2": "val2"
}
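If what you want is to store the whole Rekognition response as a single item rather than picking attributes out of it, a minimal sketch could look like this (assuming a table named table_test whose partition key is called id; the json round trip converts the response's floats to Decimal, which boto3's resource layer requires):

import json
import uuid
from decimal import Decimal

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table_test')  # assumed table name; partition key assumed to be 'id'

def save_whole_response(response):
    # boto3's resource layer rejects Python floats, so round-trip through json
    # to turn every float in the nested response into a Decimal.
    item = json.loads(json.dumps(response), parse_float=Decimal)
    item['id'] = str(uuid.uuid4())  # supply the required partition key
    table.put_item(Item=item)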
EDIT: After the OP updated his question, here's some new information:
For problem 1)
Are you sure the image you are trying to process is a valid one? If it is a corrupted file, Rekognition will fail and throw that error.
For problem 2)
You cannot concatenate a string with a dictionary in Python. Your Age and Gender variables are dictionaries, not strings, so you need to access an inner attribute within them. I am not a Python developer, but the Gender object has a 'Value' attribute you need to read, while the Age object has 'Low' and 'High' attributes instead.
You can see the complete list of attributes in the docs.
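For illustration, a minimal rewrite of the loop along those lines (inside your detect_faces helper, after the Rekognition call, and assuming the standard detect_faces response shape) might look like:

import os
import uuid

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['test_table'])  # same env var as in the question

for face in response['FaceDetails']:
    face_id = str(uuid.uuid4())
    # Gender['Value'] is a string; AgeRange carries integer 'Low'/'High' bounds.
    gender = face['Gender']['Value']
    age_range = face['AgeRange']
    print('Generating new DynamoDB record, with ID: ' + face_id)
    print('Input Age: {0}-{1}'.format(age_range['Low'], age_range['High']))
    print('Input Gender: ' + gender)
    table.put_item(
        Item={
            'id': face_id,
            'AgeLow': age_range['Low'],
            'AgeHigh': age_range['High'],
            'Gender': gender
        }
    )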
Hope this helps!

Related

Passing parameter in REST API request from another file or list variable

I am new to Python and APIs. I have a list of values like below:
typeid=['1','12','32','1000','9']
I have to pass these values as a parameter in the API request, so that it takes one typeid at a time and appends the JSON. The code I have is below, but I am not sure how it will move from one value to the next.
# activity type ids are stored as follows in another .py file: typeid=['1','12','32','1000','9']
# importing the file in the main program file
From typeid list import activitytypeids

act1 = requests.get(host + '/rest/v1/activities.json',
                    params={
                        'activityTypeIds': activitytypeids[0]
                    }).text
json_obj = json.loads(act1)
results.append(json_obj)
more_result = json_obj['moreResult']
while True:
    act1 = requests.get(host + '/rest/v1/activities.json',
                        params={
                            'activityTypeIds': activitytypeids[0]
                        }).text
    json_obj = json.loads(act1)
    results.append(json_obj)
    more_result = json(results['moreResult'])
    if not more_result:
        break
How do I pass the activity type ids in the request param one by one, so that I get the results for all type ids?
Take your code that gets one id and put it in a function that accepts an activity_id, then change every activitytypeids[0] to just activity_id:
import requests

# Hypothetical module name; adjust to wherever your activitytypeids list actually lives.
from typeid_list import activitytypeids

def get_activity_id(activity_id):
    act1 = requests.get(host + '/rest/v1/activities.json',
                        params={
                            'activityTypeIds': activity_id
                        })
    return act1.json()
Then you can just iterate over your list:
results = [get_activity_id(id) for id in activitytypeids]
That said, it seems very surprising that a parameter named activityTypeIds only accepts one id ... I would very much expect it to accept a list, based on nothing more than the parameter name.
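If that hunch turns out to be right, a sketch like the one below would send every id in a single call; requests encodes a list value as repeated query parameters (activityTypeIds=1&activityTypeIds=12&...), though whether the API accepts that form, or wants a comma-separated string instead, is an assumption you would need to verify:

import requests

resp = requests.get(host + '/rest/v1/activities.json',
                    params={'activityTypeIds': activitytypeids})
all_results = resp.json()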

Azure Function to CosmosDB

I need help getting a function that takes a JSON payload and writes the values to Cosmos DB. Everything I have read shows only single parameters.
name = req.params.get('name')
if not name:
    try:
        req_body = req.get_json()
    except ValueError:
        pass
    else:
        name = req_body.get('name')
if name:
    count = 1
    try:
        counter = container_client.read_item(item=name, partition_key=name)
        counter['count'] += 1
        container_client.replace_item(item=counter['id'], body=counter)
        count = counter['count']
    except exceptions.CosmosResourceNotFoundError:
        # Create new item
        container_client.create_item({'id': name, 'count': count})
return func.HttpResponse(f"Hello, {name}! Current count is {count}.")
This code works, but I would like to send something like {"name": "Kyle", "job": "engineer"} and have those values added to the table.
I followed this blog to achieve your requirement.
Try adding the JSON values in the format below and inserting them into Cosmos DB.
if name:
    newdocs = func.DocumentList()
    # creating the user details as JSON in a container of a Cosmos DB
    newproduct_dict = {
        "id": str(uuid.uuid4()),
        "name": name
    }
    newdocs.append(func.Document.from_dict(newproduct_dict))
    doc.set(newdocs)
Using this, I was able to add the JSON values to Cosmos DB.
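To store an arbitrary JSON body such as {"name": "Kyle", "job": "engineer"} rather than just a name, a minimal sketch (assuming a Cosmos DB output binding named doc configured in function.json) can pass the whole request body through:

import uuid
import azure.functions as func

def main(req: func.HttpRequest, doc: func.Out[func.Document]) -> func.HttpResponse:
    # Assumes a Cosmos DB output binding named "doc" in function.json.
    body = req.get_json()                # e.g. {"name": "Kyle", "job": "engineer"}
    body["id"] = str(uuid.uuid4())       # Cosmos DB documents need an "id"
    doc.set(func.Document.from_dict(body))
    return func.HttpResponse(f"Stored document {body['id']}.")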

Incrementing a counter in DynamoDB when value to be updated is in a map field

I have a lambda function that needs to retrieve an item from DynamoDB and update the counter of that item. But..
The DynamoDB table is structured as:
id: int
options: map
    some_option: 0
    some_other_option: 0
I need to first retrieve the item of the table that has a certain id and a certain option listed as a key in the options.
Then I want to increment that counter by some value.
Here is what I have so far:
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('options')

response = None
try:
    response = table.get_item(Key={'id': id})
except ClientError as e:
    print(e.response['Error']['Message'])

option = response.get('Item', None)
if option:
    option['options'][some_option] = int(option['options'][some_option]) + some_value
    # how to update item in DynamoDB now?
My issue is how to update the record now, and more importantly, will such a solution cause data races? Could two simultaneous Lambda calls that try to update the same item at the same option cause data races? If so, what's the way to solve this?
Any pointers/help is appreciated.
Ok, I found the answer:
All I need is:
from decimal import Decimal

response = table.update_item(
    Key={
        'id': my_id,
    },
    UpdateExpression='SET options.#s = options.#s + :val',
    ExpressionAttributeNames={
        "#s": my_option
    },
    ExpressionAttributeValues={
        ':val': Decimal(some_value)
    },
    ReturnValues="UPDATED_NEW"
)
This is inspired by Step 3.4: Increment an Atomic Counter, which provides an atomic approach to incrementing values. According to the documentation:
DynamoDB supports atomic counters, which use the update_item method to
increment or decrement the value of an existing attribute without
interfering with other write requests. (All write requests are applied
in the order in which they are received.)
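One caveat, as an addition to the snippet above: SET options.#s = options.#s + :val can fail with a ValidationException if the option key does not exist in the map yet. A minimal variation using if_not_exists (reusing my_id, my_option and some_value from above; the parent options map must already exist) keeps the update atomic while defaulting the missing counter to zero:

from decimal import Decimal

response = table.update_item(
    Key={'id': my_id},
    # Default the counter to 0 when the option key is missing; the increment
    # itself is still applied atomically on the server side.
    UpdateExpression='SET options.#s = if_not_exists(options.#s, :zero) + :val',
    ExpressionAttributeNames={'#s': my_option},
    ExpressionAttributeValues={':zero': Decimal(0), ':val': Decimal(some_value)},
    ReturnValues='UPDATED_NEW'
)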

Passing pandas dataframe to fastapi

I wish to create an API with which I can take a pandas dataframe as input and store it in my DB.
I am able to do so with a CSV file. However, the problem with that is that my datatype information is lost (column datatypes like int, array, float, and so on), which is important for what I am trying to do.
I have already read this: Passing a pandas dataframe to FastAPI for NLP ML
I cannot create a class like this:
class Data(BaseModel):
    # id: str
    project: str
    messages: str
The reason is that I don't have any fixed schema; the dataframe could be of any shape with varying data types. I have created a dynamic query to create a table for the incoming dataframe and to insert into that table as well.
However, being new to FastAPI, I am not able to figure out whether there is an efficient way of sending this changing (dynamic) dataframe and storing it via the queries that I have created.
If the information is not sufficient, I can try to provide more examples.
Is there a way I can send the pandas dataframe from my Jupyter notebook itself?
Any guidance on this would be greatly appreciated.
@router.post("/send-df")
async def push_df_funct(
    target_name: Optional[str] = Form(...),
    join_key: str = Form(...),
    local_csv_file: UploadFile = File(None),
    db: Session = Depends(pg.get_db)
):
    """
    API to upload dataframe to database
    """
    return upload_dataframe(db, featureset_name, local_csv_file, join_key)
def registration_cassandra(self, feature_registation_dict):
    '''
    Table creation in Cassandra as per the given feature registration JSON
    Takes:
        1. feature_registration_dict: Feature registration JSON
    Returns:
        - Response stating that the table has been created in Cassandra
    '''
    logging.info(feature_registation_dict)
    target_table_name = feature_registation_dict.get('featureset_name')
    join_key = feature_registation_dict.get('join_key')
    metadata_list = feature_registation_dict.get('metadata_list')
    table_name_delimiter = "__"
    logging.info(metadata_list)
    column_names = [sub['name'] for sub in metadata_list]
    data_types = [DataType.to_cass_datatype(eval(sub['data_type']).value) for sub in metadata_list]
    logging.info(f"Column names: {column_names}")
    logging.info(f"Data types: {data_types}")
    ls = list(zip(column_names, data_types))
    target_table_name = target_table_name + table_name_delimiter + join_key
    base_query = f"CREATE TABLE {self.keyspace}.{target_table_name} ("
    # CREATE TABLE images_by_month5 (tid object PRIMARY KEY, cc_num object, amount object, fraud_label object, activity_time object, month object);
    # create_query_new = "CREATE TABLE vpinference_dev.images_by_month4 (month int, activity_time timestamp, amount double, cc_num varint, fraud_label varint,
    #     tid text, PRIMARY KEY (month, activity_time, tid)) WITH CLUSTERING ORDER BY (activity_time DESC, tid ASC)"
    # CREATE TABLE group_join_dates (groupname text, joined timeuuid, username text, email text, age int, PRIMARY KEY (groupname, joined))
    flag = True
    for name, data_type in ls:
        base_query += " " + name
        base_query += " " + data_type
        #if flag:
        #    base_query += " PRIMARY KEY "
        #    flag = False
        base_query += ','
    create_query = base_query.strip(',').rstrip(' ') + ', month varchar, activity_time timestamp,' + ' PRIMARY KEY (' + f'month, activity_time, {join_key}) )' + f' WITH CLUSTERING ORDER BY (activity_time DESC, {join_key} ASC' + ');'
    logging.info(f"Query to create table in cassandra: {create_query}")
    try:
        session = self.get_session()
        session.execute(create_query)
    except Exception as e:
        logging.exception(f"Some error occurred while doing the registration in cassandra. Details :: {str(e)}")
        raise AppException(f"Some error occurred while doing the registration in cassandra. Details :: {str(e)}")
    response = f"Table created successfully in cassandra at: vpinference_dev.{target_table_name}__{join_key};"
    return response
This is the dictionary that I am passing:
feature_registation_dict = {
    'featureSetName': 'data_type_testing_29',
    'teamName': 'Harsh',
    'frequency': 'DAILY',
    'joinKey': 'tid',
    'model_version': 'v1',
    'model_name': 'data type testing',
    'metadata_list': [
        {'name': 'tid',
         'data_type': 'text',
         'definition': 'Credit Card Number (Unique)'},
        {'name': 'cc_num',
         'data_type': 'bigint',
         'definition': 'Aggregated Metric: Average number of transactions for the card aggregated by past 10 minutes'},
        {'name': 'amount',
         'data_type': 'double',
         'definition': 'Aggregated Metric: Average transaction amount for the card aggregated by past 10 minutes'},
        {'name': 'datetime',
         'data_type': 'text',
         'definition': 'Required feature for event timestamp'}
    ]
}
Not sure I understood exactly what you need, but I'll give it a try. To send any dataframe to FastAPI, you could do something like:
# FastAPI side
@app.post("/receive_df")
def receive_df(df_in: str):
    df = pd.read_json(df_in)

# Jupyter side
payload = {"df_in": df.to_json()}
# df_in is declared as a plain str parameter, so FastAPI reads it from the query string.
requests.post("http://localhost:8000/receive_df", params=payload)
I can't really test this right now and there are probably some mistakes in there, but the gist is just serializing the DataFrame to JSON and then deserializing it in the endpoint. If you need (JSON) validation, you can also use the pydantic.Json data type. If there is no fixed schema, then you can't use BaseModel in any useful way. But just sending a plain JSON string should be all you need, if your data comes only from reliable sources (your Jupyter notebook).
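Since the original concern was losing column dtypes, one variation worth sketching (same query-string transport as above, and assuming the serialized frame is small enough for a URL) is pandas' "table" JSON orient, which embeds a JSON Table Schema so most dtypes survive the round trip:

import pandas as pd
import requests

# Jupyter side: orient="table" serializes the schema (column names and dtypes)
# together with the rows.
payload = {"df_in": df.to_json(orient="table")}
requests.post("http://localhost:8000/receive_df", params=payload)

# FastAPI side: read it back with the same orient to restore the dtypes.
# df = pd.read_json(df_in, orient="table")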

Empty a DynamoDB table with boto

How can I optimally (in terms of financial cost) empty a DynamoDB table with boto (as we can do in SQL with a TRUNCATE statement)?
boto.dynamodb2.table.delete() or boto.dynamodb2.layer1.DynamoDBConnection.delete_table() deletes the entire table, while boto.dynamodb2.table.delete_item() or boto.dynamodb2.table.BatchTable.delete_item() only deletes the specified items.
While I agree with Johnny Wu that dropping the table and recreating it is much more efficient, there may be cases, such as when many GSIs or trigger events are associated with a table, where you don't want to have to re-associate those. The script below should work: it recursively scans the table and uses the batch writer to delete all items in the table. For massively large tables, though, this may not work, as it requires all items in the table to be loaded into memory on your machine.
import boto3

dynamo = boto3.resource('dynamodb')

def truncateTable(tableName):
    table = dynamo.Table(tableName)

    # get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]

    """
    NOTE: there are reserved attributes for key names, please see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
    if a hash or range key is in the reserved word list, you will need to use the ExpressionAttributeNames parameter
    described at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan
    """

    # Only retrieve the keys for each item in the table (minimize data transfer)
    ProjectionExpression = ", ".join(tableKeyNames)

    response = table.scan(ProjectionExpression=ProjectionExpression)
    data = response.get('Items')

    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=ProjectionExpression,
            ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])

    with table.batch_writer() as batch:
        for each in data:
            batch.delete_item(
                Key={key: each[key] for key in tableKeyNames}
            )

truncateTable("YOUR_TABLE_NAME")
As Johnny Wu mentioned, deleting a table and re-creating it is more efficient than deleting individual items. You should make sure your code doesn't try to create a new table before it is completely deleted.
import boto3

client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')

def deleteTable(table_name):
    print('deleting table')
    return client.delete_table(TableName=table_name)

def createTable(table_name):
    waiter = client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print('creating table')
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        StreamSpecification={
            'StreamEnabled': False
        }
    )

def emptyTable(table_name):
    deleteTable(table_name)
    createTable(table_name)
Deleting a table is much more efficient than deleting items one-by-one. If you are able to control your truncation points, then you can do something similar to rotating tables as suggested in the docs for time series data.
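As an illustration of that table-rotation idea (the naming scheme below is just a made-up example, not from the linked docs), rotation can be as simple as deriving the table name from the current period and pointing writes at it:

from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource('dynamodb')

def current_period_table(prefix='events'):
    # One table per month; "truncating" old data then just means deleting
    # last month's table once it is no longer needed.
    suffix = datetime.now(timezone.utc).strftime('%Y_%m')
    return dynamodb.Table(f'{prefix}_{suffix}')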
This builds on the answer given by Persistent Plants. If the table already exists, you can extract the table definitions and use that to recreate the table.
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-2')

def delete_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    return table.delete()

def create_table_ddb(table_name, key_schema, attribute_definitions,
                     provisioned_throughput, stream_enabled, billing_mode):
    settings = dict(
        TableName=table_name,
        KeySchema=key_schema,
        AttributeDefinitions=attribute_definitions,
        StreamSpecification={'StreamEnabled': stream_enabled},
        BillingMode=billing_mode
    )
    if billing_mode == 'PROVISIONED':
        settings['ProvisionedThroughput'] = provisioned_throughput
    return dynamodb.create_table(**settings)

def truncate_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    key_schema = table.key_schema
    attribute_definitions = table.attribute_definitions

    if table.billing_mode_summary:
        billing_mode = 'PAY_PER_REQUEST'
    else:
        billing_mode = 'PROVISIONED'

    if table.stream_specification:
        stream_enabled = True
    else:
        stream_enabled = False

    capacity = ['ReadCapacityUnits', 'WriteCapacityUnits']
    provisioned_throughput = {k: v for k, v in table.provisioned_throughput.items() if k in capacity}

    delete_table_ddb(table_name)
    table.wait_until_not_exists()

    return create_table_ddb(
        table_name,
        key_schema=key_schema,
        attribute_definitions=attribute_definitions,
        provisioned_throughput=provisioned_throughput,
        stream_enabled=stream_enabled,
        billing_mode=billing_mode
    )
Now call the function:
table_name = 'test_ddb'
truncate_table_ddb(table_name)
