How to convert DynamoDB JSON to regular JSON in Python?

I get JSON from DynamoDB in this format:
payload_stack = {
    'Records': [{
        'eventID': '123456',
        'eventName': 'INSERT',
        'eventVersion': '1.1',
        'eventSource': 'aws:dynamodb',
        'awsRegion': 'sa-east-1',
        'dynamodb': {
            'ApproximateCreationDateTime': 1644956685.0,
            'Keys': {'body_field': {'N': '1931'}},
            'NewImage': {
                'body_field': {'N': '1931'},
                'txt_vlr_parm_requ': {'M': {
                    'headers': {'M': {
                        'Authorization': {'S': 'token'},
                        'correlationID': {'S': '987654321'},
                    }},
                    'requestContext': {'M': {'requestId': {'S': '123'}}},
                    'body': {'M': {
                        'avro_schema': {'S': '{"type":"record","namespace":"Tutorialspoint","name":"Employee","fields":[{"name":"Name","type":"string"},{"name":"Age","type":"int"}, {"name":"Address","type":"string"}, {"name":"Role","type":"string"} ]}'},
                        'cluster': {'S': 'events'},
                        'sigla': {'S': 'ft7'},
                        'subject': {'S': 'teste-dynamo'},
                        'branch': {'S': 'development'},
                        'id_requisicao': {'N': '1818'},
                    }},
                }},
                'nom_tabe': {'S': 'tabela_teste'},
                'cod_situ_psst_ingo': {'S': 'NOVO'},
                'historic': {'S': '{"historico": [{"data/hora": "09-02-22 18:18:41", "status": "NOVO"}]}'},
                'nom_arqu_bckt': {'S': 'arquivo.avro'},
            },
            'SequenceNumber': '87226300000000005691898607',
            'SizeBytes': 1672,
            'StreamViewType': 'NEW_IMAGE',
        },
        'eventSourceARN': 'arn:aws',
    }]
}
However, I need to convert it into regular JSON and take only the 'body' field, for example:
'body': {
    "cluster": "events",
    "subject": "teste-dynamo",
    "id_requisicao": 1818,
    "branch": "development"
}
I can imagine how to catch the body field, e.g. by indexing into the structure in Python.
But any idea how I can convert this DynamoDB JSON into regular JSON?
Thanks.

I authored a library called cerealbox that makes it easier to perform this common conversion, as follows:
from cerealbox.dynamo import from_dynamodb_json
# convert the DynamoDB image to a regular python dictionary
result = from_dynamodb_json(payload_stack['Records'][0]['dynamodb']['NewImage'])
# access the body as a regular dictionary
body = result['txt_vlr_parm_requ']['body']
The documentation covers how to perform the inverse using as_dynamodb_json.
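Presumably that mirrors the call above, along these lines (a sketch based only on the names mentioned here; see the docs for the exact signature):
from cerealbox.dynamo import as_dynamodb_json

# convert a regular python dictionary back to a DynamoDB-typed image
dynamo_image = as_dynamodb_json(result)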
This can also be done using boto3's TypeDeserializer/TypeSerializer - a good example of this can be found here.
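For reference, a minimal sketch of the boto3 route (note that TypeDeserializer turns DynamoDB numbers into decimal.Decimal):
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
image = payload_stack['Records'][0]['dynamodb']['NewImage']
# deserialize each top-level attribute of the DynamoDB-typed image
result = {key: deserializer.deserialize(value) for key, value in image.items()}
body = result['txt_vlr_parm_requ']['body']  # a plain dict, e.g. body['cluster'] == 'events'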

I was able to develop a little code to convert this DynamoDB JSON into regular JSON, using the dynamodb_json package:
from dynamodb_json import json_util as json

# payload_stack is the one from the question
payload_stack = {...}
convert_regular_json = json.loads(payload_stack)
print(convert_regular_json)
The output:
{
    'Records': [{
        'eventID': '123456',
        'eventName': 'INSERT',
        'eventVersion': '1.1',
        'eventSource': 'aws:dynamodb',
        'awsRegion': 'sa-east-1',
        'dynamodb': {
            'ApproximateCreationDateTime': 1644956685.0,
            'Keys': {'body_field': 1931},
            'NewImage': {
                'body_field': 1931,
                'txt_vlr_parm_requ': {
                    'headers': {
                        'Authorization': 'token',
                        'correlationID': '987654321'
                    },
                    'requestContext': {'requestId': '123'},
                    'body': {
                        'avro_schema': '{"type":"record","namespace":"Tutorialspoint","name":"Employee","fields":[{"name":"Name","type":"string"},{"name":"Age","type":"int"}, {"name":"Address","type":"string"}, {"name":"Role","type":"string"} ]}',
                        'cluster': 'events',
                        'sigla': 'ft7',
                        'subject': 'teste-dynamo',
                        'branch': 'development',
                        'id_requisicao': 1818
                    }
                },
                'nom_tabe': 'tabela_teste',
                'cod_situ_psst_ingo': 'NOVO',
                'historic': '{"historico": [{"data/hora": "09-02-22 18:18:41", "status": "NOVO"}]}',
                'nom_arqu_bckt': 'arquivo.avro'
            },
            'SequenceNumber': '87226300000000005691898607',
            'SizeBytes': 1672,
            'StreamViewType': 'NEW_IMAGE'
        },
        'eventSourceARN': 'arn:aws'
    }]
}
And to catch the 'body' field:
catch_body_payload = convert_regular_json['Records'][0].get('dynamodb').get('NewImage').get('txt_vlr_parm_requ').get('body')
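If you then need the body as a JSON string rather than a Python dict, the standard library can serialize it (note the dynamodb_json import above shadows the name json, so import the standard module under another name):
import json as stdlib_json

body_json = stdlib_json.dumps(catch_body_payload, indent=2)
print(body_json)  # double-quoted, viewer-friendly JSON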

Related

JSON viewers don't accept my pattern even after dict going through json.dumps() + json.loads()

The result of running a = json.dumps(dicter) followed by print(json.loads(a)) is this:
{
    '10432981': {
        'tournament': {
            'name': 'Club Friendly Games',
            'slug': 'club-friendly-games',
            'category': {
                'name': 'World',
                'slug': 'world',
                'sport': {
                    'name': 'Football',
                    'slug': 'football',
                    'id': 1
                },
                'id': 1468,
                'flag': 'international'
            },
            'uniqueTournament': {
                'name': 'Club Friendly Games',
                'slug': 'club-friendly-games',
                'category': {
                    'name': 'World',
                    'slug': 'world',
                    'sport': {
                        'name': 'Football',
                        'slug': 'football',
                        'id': 1
                    },
                    'id': 1468,
                    'flag': 'international'
                },
                'userCount': 0,
                'hasPositionGraph': False,
                'id': 853,
                'hasEventPlayerStatistics': False,
                'displayInverseHomeAwayTeams': False
            },
            'priority': 0,
            'id': 86
        }
    }
}
But when trying to read it in any JSON viewer, they warn that the format is incorrect but don't specify where the problem is.
If it doesn't generate any error when converting the dict to JSON, nor when reading it back, why do viewers warn of failure?
You must enclose the strings in double quotes ("). json.loads returns a Python dictionary, and printing it shows the Python repr (single quotes, False instead of false), which is not valid JSON. If you want valid JSON, use the string that json.dumps returns.
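A minimal sketch of the difference, using a small stand-in for dicter:
import json

dicter = {'name': 'Football', 'hasPositionGraph': False}
a = json.dumps(dicter)  # '{"name": "Football", "hasPositionGraph": false}' -- valid JSON text
b = json.loads(a)       # a python dict again
print(a)  # paste this into a JSON viewer
print(b)  # {'name': 'Football', 'hasPositionGraph': False} -- python repr, not JSON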

Fastest way to get specific key from a dict if it is found

I am currently writing a scraper that reads from an API that returns JSON. Calling response.json() returns a dict, so I can easily use e.g. response["object"] to get the value I want. The current mock data looks like this:
data = {
    'id': 336461,
    'thumbnail': '/images/product/123456?trim&h=80',
    'variants': None,
    'name': 'Testing',
    'data': {
        'Videoutgång': {
            'Typ av gränssnitt': {
                'name': 'Typ av gränssnitt',
                'value': 'PCI Test'
            }
        }
    },
    'stock': {
        'web': 0,
        'supplier': None,
        'displayCap': '50',
        '1': 0,
        'orders': {
            'CL': {
                'ordered': -10,
                'status': 1
            }
        }
    }
}
The catch is that the API sometimes contains "orders -> CL" and sometimes doesn't, so I need to handle both the happy path and the unhappy path; what I am looking for is the fastest way to get data from a dict.
I have currently done something like this:
data = {...}  # same mock data as above
if (
    "stock" in data
    and "orders" in data["stock"]
    and "CL" in data["stock"]["orders"]
    and "status" in data["stock"]["orders"]["CL"]
    and data["stock"]["orders"]["CL"]["status"]
):
    print(f'{data["stock"]["orders"]["CL"]["status"]}: {data["stock"]["orders"]["CL"]["ordered"]}')
which prints:
1: -10
However, my question remains: what is the fastest way to get the data from a dict when it is present?
Lookups in dictionaries are fast because Python implements them using hash tables; in Big O terms, a dictionary lookup has average-case constant time complexity, O(1). Here is another approach, using the .get() method:
data = {...}  # same mock data as above
if data.get('stock', {}).get('orders', {}).get('CL'):
    print(f'{data["stock"]["orders"]["CL"]["status"]}: {data["stock"]["orders"]["CL"]["ordered"]}')
Here is a nice writeup on lookups in Python with list and dictionary as example.
I got your point. For this question, since stock has only a few keys, it is hard to say whether the .get() method will be faster than a loop. If your dictionary had many more items, .get() would certainly be faster, but with this few keys a loop will not make much difference.
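If you want to settle the speed question empirically, the standard library's timeit module is the usual tool (a minimal sketch; absolute numbers vary by machine):
import timeit

def chained_get():
    return data.get('stock', {}).get('orders', {}).get('CL')

print(timeit.timeit(chained_get, number=1_000_000))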

cerberus - how to validate arbitrary dict keys?

I have read issues here and here using keysrules and valuesrules, but I've only seen them validate nested dicts, not the root. I'd like to validate the top-level root dict keys.
from cerberus import Validator

schema = {
    'any_arbitrary_str': {
        'type': 'dict',
        'keysrules': {'type': 'string'},
        'valuesrules': {'type': 'integer'},
    },
}
v = Validator(schema)
v.validate({'test': {'a': 1, 'b': 2}})
print(v.errors)
In this example, I'd like to validate that the document is a dict of str: Dict[str, int], where the keys can be any arbitrary string.
I'm not sure I'm using it right (docs); this fails with cerberus.schema.SchemaError: {'any_arbitrary_str': [{'keysrules': ['unknown rule'], 'valuesrules': ['unknown rule']}]}, but it's still looking for the literal key any_arbitrary_str instead of accepting any string.
You can just nest it: wrap the real document under a fixed key, so that keysrules and valuesrules apply to the arbitrary keys one level down. Not pretty, but it works; I have not found a more elegant solution yet.
schema = {
    'document': {
        'type': 'dict',
        'keysrules': {'type': 'string'},
        'valuesrules': {
            'type': 'dict',
            'keysrules': {'type': 'string'},
            'valuesrules': {'type': 'integer'},
        },
    },
}
v = Validator(schema)
document_to_test = {'test': {'a': 1, 'b': 2}}
v.validate({'document': document_to_test})
print(v.errors)  # {} -- the wrapped document validates
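To see the rules bite, validating a bad value reports an error instead of an empty dict (a quick check reusing the same validator; the value 'not-an-int' is made up for illustration):
v.validate({'document': {'test': {'a': 'not-an-int'}}})
print(v.errors)  # flags 'a' as not an integer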

How to iterate over a JSON array and get values for a key which itself is a JSON object

I have been trying to do something that seems simple, yet it is hard for me to solve!
I have a JSON object that looks like this:
jsonObject = {
    'attributes': {
        '192': {  # this key can change from time to time (a different number)
            'id': '192',
            'code': 'hello',
            'label': 'world',
            'options': [
                {'id': '211', 'label': '5'},
                {'id': '1202', 'label': '8.5'},
                {'id': '54', 'label': '9'},
                {'id': '1203', 'label': '9.5'},
                {'id': '58', 'label': '10'}
            ]
        }
    },
    'template': '12345',
    'basePrice': '51233',
    'oldPrice': '51212',
    'productId': 'hello',
}
What I want to do is get the values from options (to have both id and label saved into a list).
For now I have only managed to do:
for att, value in jsonObject.items():
    print(f"{att} - {value}")
How can I get the label and id?
You can try the following code:
attr = jsonObject['attributes']
temp = list(attr.values())[0]  # same as "temp = attr['192']", but you said '192' can change
options = temp['options']
for option in options:
    print(f"id: {option['id']}, label: {option['label']}")

Python Eve not following set schema and returning full document when using aggregation pipeline

I have a simple API to which coordinates and a distance are provided, and documents from within that distance are returned. I intend it to return just the id and distance, but the defined schema is being ignored and the whole document is returned. Any ideas?
item = {
    'item_title': 'relate',
    'datasource': {
        'source': 'api',
        'filter': {'_type': 'line'},
        'aggregation': {'pipeline': [
            {'$geoNear': {
                'near': {'type': 'point', 'coordinates': '$coords'},
                'distanceField': 'distance',
                'maxDistance': '$maxDist',
                'num': 1,
                'spherical': 'true'}}
        ]}
    },
    'schema': {
        '_id': {'type': 'string'},
        'distance': {'type': 'float'}
    },
}
DOMAIN = {"data": item}
DOMAIN = {"data": item}
and the postman query is:
http://localhost:8090/data?aggregate={"$maxDist": 500, "$coords": [-1.47, 50.93]}
EDIT:
Following Neil's comment I tried this:
item = {
    'item_title': 'relate',
    'schema': {
        'uri': {'type': 'string'},
        'distance': {'type': 'float'}
    },
    'datasource': {
        'source': 'api',
        'filter': {'_type': 'link'},
        'aggregation': {'pipeline': [
            {'$geoNear': {
                'near': {'type': 'point', 'coordinates': ['$lng', '$lat']},
                'distanceField': 'distance',
                'maxDistance': '$maxDist',
                'num': 1,
                'spherical': 'true'}}
        ]}
    }
}
With the following postman request:
http://localhost:8090/data?aggregate={"$maxDist": 500, "$lng": -1.47, "$lat": 50.93}
This is leading to the following error:
geoNear command failed: { ok: 0.0, errmsg: "'near' field must be point", code: 17304, codeName: "Location17304" }
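A possible cause of that last error: MongoDB's GeoJSON type names are case-sensitive, so the 'near' document may need 'Point' rather than 'point', e.g.:
'near': {'type': 'Point', 'coordinates': ['$lng', '$lat']}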
