It is my first encounter with DynamoDB and I have been given a JSON File that looks like:
{
    "metadata": {
        "schemaVersion": "1.0",
        "importType": "LEX",
        "importFormat": "JSON"
    },
    "resource": {
        "description": "First Names",
        "name": "ASDUKfirstNames",
        "version": "1",
        "enumerationValues": [
            {
                "value": "Zeshan"
            },
            {
                "value": "Zoe"
            },
            {
                "value": "Zul"
            }
        ],
        "valueSelectionStrategy": "ORIGINAL_VALUE"
    }
}
and I want to import the data, mapping each value to FirstName, into a DynamoDB table I have created named customerDetails, whose items have the attributes CustomerID, FirstName and LastName.
Is there a way to use the boto3 put_item function to loop over the contents of the JSON file, writing each value as FirstName?
You should use Python to do the data transformation. The boto3 DynamoDB documentation covers the put_item call used below.
import json

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('customerDetails')

# Load the exported JSON into a dict (the path here is just a placeholder)
with open('firstnames.json') as f:
    json_data = json.load(f)

for enumeration_value in json_data['resource']['enumerationValues']:
    ddb_item = {
        "CustomerID": 123,  # placeholder key; each item needs its own CustomerID
        "FirstName": enumeration_value['value']
    }
    table.put_item(Item=ddb_item)
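Note that a hard-coded CustomerID of 123 would overwrite the same item on every iteration if CustomerID is the table's partition key. One way around that (a sketch, assuming any unique string is acceptable as the key) is to generate a key per item:
import uuid

for enumeration_value in json_data['resource']['enumerationValues']:
    table.put_item(Item={
        "CustomerID": str(uuid.uuid4()),  # assumption: any unique string works as the partition key
        "FirstName": enumeration_value['value']
    })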
Hello, this is my first post here as I am learning how to code. When I try to update my table in DynamoDB using a Lambda function, I get the following error message: "The provided key element does not match the schema". My table name is correct and I am able to connect to it. My primary key is just a hash key, id, whose value is 1, so I do not see why I am getting this error.
import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Visitors')

def lambda_handler(event, context):
    response = table.update_item(
        Key={
            "id": {"N": "1"}
        },
        ExpressionAttributeNames={
            "#c": "Counters"
        },
        UpdateExpression="set #c = :val",
        ExpressionAttributeValues={
            ":val": {"N": "1"}
        }
    )
Since you are using the table resource, you should refer to this documentation. For example, the Key parameter should have the following syntax:
Key={
    'string': 'string'|123|Binary(b'bytes')|True|None|set(['string'])|set([123])|set([Binary(b'bytes')])|[]|{}
}
This means that the DynamoDB data type is inferred from the Python data type. So instead of {"N":"1"}, you can use 1 directly. Here is a corrected version of your code snippet:
import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Visitors')

def lambda_handler(event, context):
    response = table.update_item(
        Key={
            "id": 1
        },
        ExpressionAttributeNames={
            "#c": "Counters"
        },
        UpdateExpression="set #c = :val",
        ExpressionAttributeValues={
            ":val": 1
        }
    )
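For comparison, the low-level {"N": "1"} attribute-value format is what the boto3 client API expects. A sketch of the same update using boto3.client('dynamodb') (assuming the same table and attribute names) would look like this:
import boto3

client = boto3.client('dynamodb')

# The client API takes explicit DynamoDB type descriptors ({"N": ...}, {"S": ...}, ...)
response = client.update_item(
    TableName='Visitors',
    Key={'id': {'N': '1'}},
    UpdateExpression='SET #c = :val',
    ExpressionAttributeNames={'#c': 'Counters'},
    ExpressionAttributeValues={':val': {'N': '1'}}
)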
I have an object in DynamoDB:
{ 'UserID': 'Hank', 'ConnectionList': {'con1', 'con2'} }
Using boto3 in a Lambda function, I would like to add 'con3' to the String Set.
So far, I have been trying with the following code without success:
ddbClient = boto3.resource('dynamodb')
table = ddbClient.Table("UserInfo")

table.update_item(
    Key={
        "UserId": 'Hank'
    },
    UpdateExpression="SET ConnectionList = list_append(ConnectionList, :i)",
    ExpressionAttributeValues={
        ":i": {"S": "Something"}
    },
    ReturnValues="ALL_NEW"
)
However, no matter how I try to put the information into the String Set, it always raises an error.
Since you're using the resource API, you have to use the Python set data type in your statement:
table.update_item(
    Key={
        "UserId": 'Hank'
    },
    UpdateExpression="ADD ConnectionList :i",
    ExpressionAttributeValues={
        ":i": {"Something"},  # needs to be a set type
    },
    ReturnValues="ALL_NEW"
)
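If you were using the low-level client API instead, the same update would be expressed with an explicit string-set type descriptor. A sketch (assuming the same table and key):
import boto3

client = boto3.client('dynamodb')

# With the client API the string set is passed as an explicit "SS" attribute value
client.update_item(
    TableName='UserInfo',
    Key={'UserId': {'S': 'Hank'}},
    UpdateExpression='ADD ConnectionList :i',
    ExpressionAttributeValues={':i': {'SS': ['con3']}},
    ReturnValues='ALL_NEW'
)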
I am trying to parse the event data in an AWS Lambda function. I have connected the function to SQS and I am sending JSON messages through the queue.
This is my AWS Lambda function
import json

def lambda_handler(event, context):
    # print(event)
    # print(event['Records'][0])
    x = event['Records'][0]['body']
    print(x)
    print(type(x))
Following is the event data
{
    "Records": [
        {
            "messageId": "916f5e95-b2f6-4148-9c62-2ac8e764f06c",
            "receiptHandle": "AQEBmLuoGWtLtFFgvyCFdSPMJh2HKgHOIPWNUq22EOwCzGT8iILZm97CE6j4J6oR71ZpDr3sgxQcJyVZ+dmmvGl+fFftT9GCJqZYrjMGsR2Q6WsMd8ciI8bTtDXyvsk8ektd7UGfh4gxIZoFp7WUKVRcMEeBkubKd8T4/Io81D0l/AK7MxcEfCj40vWEsex1kkGmMRlBtdSeGyy7fJgUq5CFAYWciiWtbSit8S0Y38xZPmsIFhoxP0egQRoJcW4aUgMi469Gj5+khizetybtgC8vux5NCg/IejxcCueXkQ7LKVF8kfRdqRSUYB6DsOrGgfmZpK4wpXIarByNz0R2p7J88meYpj2IVULv/emXsSYaKG4rXnpbH4J9ijbLWckYLAd7wPDzCYri1ZSTgAz0kchsEw==",
            "body": "{\n\"name\": \"aniket\",\n\"tag\": \"hello\"\n}",
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1602046897707",
                "SenderId": "AIDAR3BXDV4FCWXL56NUU",
                "ApproximateFirstReceiveTimestamp": "1602046897712"
            },
            "messageAttributes": {},
            "md5OfBody": "98da683a47692b39c1d43bd4fa21ed89",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:ap-south-1:126817120010:documentation",
            "awsRegion": "ap-south-1"
        }
    ]
}
I am trying to access the body of the data.
This is what I am getting:
"{\n\"name\": \"aniket\",\n\"tag\": \"hello\"\n}"
And its type is string.
What do I need to do to convert it into proper JSON format?
I also tried the following:
import json

def lambda_handler(event, context):
    data = json.dumps(event['Records'][0]['body'])
    print(data)
This is the output
"{\n\"name\": \"aniket\",\n\"tag\": \"hello\"\n}"
But this time the type is JSON.
The expected format is
{
    "name": "aniket",
    "tag": "hello"
}
You have to use json.loads, not json.dumps.
Try this:
import json

event = {
    "Records": [
        {
            "messageId": "916f5e95-b2f6-4148-9c62-2ac8e764f06c",
            "receiptHandle": "AQEBmLuoGWtLtFFgvyCFdSPMJh2HKgHOIPWNUq22EOwCzGT8iILZm97CE6j4J6oR71ZpDr3sgxQcJyVZ+dmmvGl+fFftT9GCJqZYrjMGsR2Q6WsMd8ciI8bTtDXyvsk8ektd7UGfh4gxIZoFp7WUKVRcMEeBkubKd8T4/Io81D0l/AK7MxcEfCj40vWEsex1kkGmMRlBtdSeGyy7fJgUq5CFAYWciiWtbSit8S0Y38xZPmsIFhoxP0egQRoJcW4aUgMi469Gj5+khizetybtgC8vux5NCg/IejxcCueXkQ7LKVF8kfRdqRSUYB6DsOrGgfmZpK4wpXIarByNz0R2p7J88meYpj2IVULv/emXsSYaKG4rXnpbH4J9ijbLWckYLAd7wPDzCYri1ZSTgAz0kchsEw==",
            "body": "{\n\"name\": \"aniket\",\n\"tag\": \"hello\"\n}",
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1602046897707",
                "SenderId": "AIDAR3BXDV4FCWXL56NUU",
                "ApproximateFirstReceiveTimestamp": "1602046897712"
            },
            "messageAttributes": {},
            "md5OfBody": "98da683a47692b39c1d43bd4fa21ed89",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:ap-south-1:126817120010:documentation",
            "awsRegion": "ap-south-1"
        }
    ]
}

parsed = json.loads(event['Records'][0]['body'])
print(json.dumps(parsed, indent=4, sort_keys=True))
Output:
{
    "name": "aniket",
    "tag": "hello"
}
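Inside the Lambda function itself, the same idea applies to every record in the event. A minimal handler sketch (assuming each message body is valid JSON like the one above):
import json

def lambda_handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['body'])  # body is a JSON string, so parse it
        print(payload['name'], payload['tag'])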
Try using json.loads(string) to deserialize the json.
Also, I don't believe you need to specify the index [0] since 'body' is an object and not an array.
I have some .json where not all fields are present in all records; e.g. caseclass.json looks like:
[{
    "name": "john smith",
    "age": 12,
    "cars": ["ford", "toyota"],
    "comment": "i am happy"
},
{
    "name": "a. n. other",
    "cars": "",
    "comment": "i am panicking"
}]
Using Elasticsearch 7.6.1 via the Python client elasticsearch:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import json
import os
from elasticsearch_dsl import Document, Text, Date, Integer, analyzer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

class Person(Document):
    class Index:
        using = es
        name = 'person_index'
    name = Text()
    age = Integer()
    cars = Text()
    comment = Text(analyzer='snowball')

Person.init()

with open("caseclass.json") as json_file:
    data = json.load(json_file)
    for indexid in range(len(data)):
        document = Person(name=data[indexid]['name'], age=data[indexid]['age'], cars=data[indexid]['cars'], comment=data[indexid]['comment'])
        document.meta.id = indexid
        document.save()
Naturally I get KeyError: 'age' when the second record is read. My question is: is it possible to load such records into an Elasticsearch index using the Python client and a pre-defined mapping, instead of dynamic mapping? The code above works if all fields are present in all records, but is there a way to do this without checking the presence of each field per record? The actual records have a complex structure and there are millions of them. Thanks
The error has nothing to do with your mapping -- it's just telling you that age could not be accessed in one of your records.
The index mapping is created when you call Person.init() -- you can verify that by calling print(es.indices.get_mapping(Person.Index.name)) right after Person.init().
I've cleaned up your code a bit:
import json
import os

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Document, Text, Date, Integer, analyzer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

class Person(Document):
    class Index:
        using = es
        name = 'person_index'
    name = Text()
    age = Integer()
    cars = Text()
    comment = Text(analyzer='snowball')

Person.init()
print(es.indices.get_mapping(Person.Index.name))

with open("caseclass.json") as json_file:
    data = json.load(json_file)
    for indexid, case in enumerate(data):
        document = Person(**case)
        document.meta.id = indexid
        document.save()
Notice how I used **case to spread all key-value pairs inside of a case instead of using data[property_key].
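Because Person(**case) only sets the keys that actually appear in a given case, a record without age simply produces a document with that field unset rather than raising KeyError. A small illustration, using the second record from the question's caseclass.json:
# Second record from caseclass.json: note there is no "age" key
case = {"name": "a. n. other", "cars": "", "comment": "i am panicking"}

document = Person(**case)   # no KeyError; "age" is simply not set on this document
document.meta.id = 1
document.save()             # the stored document just omits the "age" field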
The generated mapping is as follows:
{
    "person_index" : {
        "mappings" : {
            "properties" : {
                "age" : {
                    "type" : "integer"
                },
                "cars" : {
                    "type" : "text"
                },
                "comment" : {
                    "type" : "text",
                    "analyzer" : "snowball"
                },
                "name" : {
                    "type" : "text"
                }
            }
        }
    }
}
Given a variable-length list of items in Python containing primary keys (e.g. itemList = ["item1","item2","item3"]), how can I use boto3 to translate this list into the proper format for a DynamoDB batch query?
I'm able to successfully run a query by manually formatting the request but my problem is how to elegantly translate a python list into this format. I've tried the serializer function in boto3 which seems to be the right direction, but I am missing some piece of the puzzle.
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-west-2')

response = dynamodb.batch_get_item(
    RequestItems={
        "dynamodb-table-name": {
            "Keys": [
                {
                    'pk': {
                        'S': 'item1'
                    },
                    'sk': {
                        'S': 'ITEM'
                    }
                },
                {
                    'pk': {
                        'S': 'item2'
                    },
                    'sk': {
                        'S': 'ITEM'
                    }
                }
            ]
        }
    }
)
If I create a serializer (serializer = boto3.dynamodb.types.TypeSerializer()) and use it on my list, I get back {'L': [{'S': 'item1'}, {'S': 'item2'}]}
I think I have figured out my own question. The first issue is that there's a difference between boto3.resource('dynamodb') and boto3.client('dynamodb'), and .client is the one I was able to get working. For the other piece, while the serializer might still be a viable option to explore, what I did was this:
import boto3

client = boto3.client('dynamodb')

# items is the plain Python list of primary keys from the question, e.g. ["item1", "item2", "item3"]
items = ["item1", "item2", "item3"]

itemList = []
for item in items:
    itemList.append({'pk': {'S': item}, 'sk': {'S': 'ITEM'}})

response = client.batch_get_item(
    RequestItems={
        "dynamodb-table-name": {
            "Keys": itemList
        }
    }
)
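The same Keys list can be built more compactly with a list comprehension; a small sketch reusing the client and the placeholder table name from above:
# Build the batch keys directly from the plain list of partition-key values
keys = [{'pk': {'S': pk}, 'sk': {'S': 'ITEM'}} for pk in ["item1", "item2", "item3"]]

response = client.batch_get_item(
    RequestItems={"dynamodb-table-name": {"Keys": keys}}
)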