cerberus - how to validate arbitrary dict keys? - python

I have read issues here and here using keysrules and valuesrules, but I've only seen them validate nested dicts, not the root. I'd like to validate the top-level root dict keys.
from cerberus import Validator

schema = {
    'any_arbitrary_str': {
        'type': 'dict',
        'keysrules': {'type': 'string'},
        'valuesrules': {'type': 'integer'},
    },
}
v = Validator(schema)
v.validate({'test': {'a': 1, 'b': 2}})
print(v.errors)
In this example, I'd just like to validate that the document is a dict of str: Dict[str, int], where the keys can be any arbitrary string.
I'm not sure I'm using it right (docs); this fails with cerberus.schema.SchemaError: {'any_arbitrary_str': [{'keysrules': ['unknown rule'], 'valuesrules': ['unknown rule']}]}, and it's also still looking for the literal key any_arbitrary_str instead of any string.

You can just nest it. Not pretty, but it works. I have not found a more elegant solution yet.
schema = {
    'document': {
        'type': 'dict',
        'keysrules': {'type': 'string'},
        'valuesrules': {
            'type': 'dict',
            'keysrules': {'type': 'string'},
            'valuesrules': {'type': 'integer'},
        },
    },
}
v = Validator(schema)
document_to_test = {'test': {'a': 1, 'b': 2}}
v.validate({'document': document_to_test})
print(v.errors)
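Note: keysrules and valuesrules are the names used by Cerberus 1.3 and later; older releases spell them keyschema and valueschema, which would explain the 'unknown rule' SchemaError. A minimal sketch of the same schema under the pre-1.3 spelling, assuming Cerberus < 1.3:

schema = {
    'document': {
        'type': 'dict',
        'keyschema': {'type': 'string'},
        'valueschema': {
            'type': 'dict',
            'keyschema': {'type': 'string'},
            'valueschema': {'type': 'integer'},
        },
    },
}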

Related

JSON viewers don't accept my pattern even after the dict goes through json.dumps() + json.loads()

The result of running a = json.dumps(dicter) followed by print(json.loads(a)) is this:
{
'10432981': {
'tournament': {
'name': 'Club Friendly Games',
'slug': 'club-friendly-games',
'category': {
'name': 'World',
'slug': 'world',
'sport': {
'name': 'Football',
'slug': 'football',
'id': 1
},
'id': 1468,
'flag': 'international'
},
'uniqueTournament': {
'name': 'Club Friendly Games',
'slug': 'club-friendly-games',
'category': {
'name': 'World',
'slug': 'world',
'sport': {
'name': 'Football',
'slug': 'football',
'id': 1
},
'id': 1468,
'flag': 'international'
},
'userCount': 0,
'hasPositionGraph': False,
'id': 853,
'hasEventPlayerStatistics': False,
'displayInverseHomeAwayTeams': False
},
'priority': 0,
'id': 86
}
}
}
But when I try to read it in any JSON viewer, they warn that the format is incorrect but don't specify where the problem is.
If converting the dict to JSON doesn't generate any error, and neither does reading it back, why do viewers warn of failure?
You must enclose the strings in double quotes ("). json.loads returns a Python dictionary, and printing it shows Python's repr (single quotes, False instead of false), which is not a valid JSON document. If you want valid JSON, use the string that json.dumps returns directly.
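A minimal sketch of the difference, using a trimmed version of the dict from the question:

import json

dicter = {'10432981': {'tournament': {'name': 'Club Friendly Games'}}}

a = json.dumps(dicter)  # a str containing valid JSON, with double quotes
print(a)                # {"10432981": {"tournament": {"name": "Club Friendly Games"}}}
print(json.loads(a))    # a Python dict again; its repr uses single quotes, so it is not valid JSON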

How to convert a DynamoDB JSON into a regular JSON in Python?

I get JSON from DynamoDB in this format:
payload_stack = {'Records': [{'eventID': '123456', 'eventName': 'INSERT', 'eventVersion': '1.1',
'eventSource': 'aws:dynamodb', 'awsRegion': 'sa-east-1',
'dynamodb': {'ApproximateCreationDateTime': 1644956685.0,
'Keys': {'body_field': {'N': '1931'}},
'NewImage': {'body_field': {'N': '1931'}, 'txt_vlr_parm_requ': {'M': {
'headers': {'M': {'Authorization': {
'S': 'token'},
'correlationID': {'S': '987654321'}}},
'requestContext': {
'M': {'requestId': {'S': '123'}}},
'body': {'M': {'avro_schema': {
'S': '{"type":"record","namespace":"Tutorialspoint","name":"Employee","fields":[{"name":"Name","type":"string"},{"name":"Age","type":"int"}, {"name":"Address","type":"string"}, {"name":"Role","type":"string"} ]}'},
'cluster': {'S': 'events'}, 'sigla': {'S': 'ft7'},
'subject': {'S': 'teste-dynamo'},
'branch': {'S': 'development'},
'id_requisicao': {'N': '1818'}}}}},
'nom_tabe': {'S': 'tabela_teste'},
'cod_situ_psst_ingo': {'S': 'NOVO'}, 'historic': {
'S': '{"historico": [{"data/hora": "09-02-22 18:18:41", "status": "NOVO"}]}'},
'nom_arqu_bckt': {'S': 'arquivo.avro'}},
'SequenceNumber': '87226300000000005691898607', 'SizeBytes': 1672,
'StreamViewType': 'NEW_IMAGE'},
'eventSourceARN': 'arn:aws'}]}
However, I need to convert it into regular JSON and take only the 'body' field, for example:
'body': {
    "cluster": "events",
    "subject": "teste-dynamo",
    "id_requisition": 1818,
    "branch": "development"
}
I can imagine how to catch the body field, e.g. by indexing into the dict in Python. But any idea how I can convert this DynamoDB JSON into a regular JSON?
Thanks.
I authored a library called cerealbox that makes it easier to perform this common conversion as follows.
from cerealbox.dynamo import from_dynamodb_json
# convert the DynamoDB image to a regular python dictionary
result = from_dynamodb_json(payload_stack['Records'][0]['dynamodb']['NewImage'])
# access the body as a regular dictionary
body = result['txt_vlr_parm_requ']['body']
The documentation covers how to perform the inverse using as_dynamodb_json.
This can also be done using boto3's TypeDeserializer/TypeSerializer; a good example of this can be found here.
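For reference, a minimal sketch of the boto3 approach, assuming payload_stack from the question is in scope:

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
image = payload_stack['Records'][0]['dynamodb']['NewImage']

# Deserialize each top-level attribute of the DynamoDB image into plain Python values
result = {key: deserializer.deserialize(value) for key, value in image.items()}
body = result['txt_vlr_parm_requ']['body']

Note that TypeDeserializer turns DynamoDB 'N' values into Decimal, not int or float.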
I was able to develop a little code to convert this DynamoDB JSON into a regular dict. I used the dynamodb_json library:
from dynamodb_json import json_util as json

# payload_stack is the one from the question
payload_stack = {...}
convert_regular_json = json.loads(payload_stack)
print(convert_regular_json)
The output:
{
'Records': [{
'eventID': '123456',
'eventName': 'INSERT',
'eventVersion': '1.1',
'eventSource': 'aws:dynamodb',
'awsRegion': 'sa-east-1',
'dynamodb': {
'ApproximateCreationDateTime': 1644956685.0,
'Keys': {
'body_field': 1931
},
'NewImage': {
'body_field': 1931,
'txt_vlr_parm_requ': {
'headers': {
'Authorization': 'token',
'correlationID': '987654321'
},
'requestContext': {
'requestId': '123'
},
'body': {
'avro_schema': '{"type":"record","namespace":"Tutorialspoint","name":"Employee","fields":[{"name":"Name","type":"string"},{"name":"Age","type":"int"}, {"name":"Address","type":"string"}, {"name":"Role","type":"string"} ]}',
'cluster': 'events',
'sigla': 'ft7',
'subject': 'teste-dynamo',
'branch': 'development',
'id_requisicao': 1818
}
},
'nom_tabe': 'tabela_teste',
'cod_situ_psst_ingo': 'NOVO',
'historic': '{"historico": [{"data/hora": "09-02-22 18:18:41", "status": "NOVO"}]}',
'nom_arqu_bckt': 'arquivo.avro'
},
'SequenceNumber': '87226300000000005691898607',
'SizeBytes': 1672,
'StreamViewType': 'NEW_IMAGE'
},
'eventSourceARN': 'arn:aws'
}]
}
And to catch the 'body' field:
catch_body_payload = convert_regular_json['Records'][0].get('dynamodb').get('NewImage').get('txt_vlr_parm_requ').get('body')

Fastest way to get specific key from a dict if it is found

I am currently writing a scraper that reads from an API that returns JSON. Calling response.json() returns a dict, so we can use e.g. response["object"] to get the value we want. The current mock data looks like this:
data = {
'id': 336461,
'thumbnail': '/images/product/123456?trim&h=80',
'variants': None,
'name': 'Testing',
'data': {
'Videoutgång': {
'Typ av gränssnitt': {
'name': 'Typ av gränssnitt',
'value': 'PCI Test'
}
}
},
'stock': {
'web': 0,
'supplier': None,
'displayCap': '50',
'1': 0,
'orders': {
'CL': {
'ordered': -10,
'status': 1
}
}
}
}
The issue is that the API sometimes contains "orders -> CL" and sometimes doesn't, so I need to handle both the happy path and the unhappy path. I am looking for the fastest way to get the data from the dict.
I have currently done something like this:
data = {...}  # same mock data as above
if (
"stock" in data
and "orders" in data["stock"]
and "CL" in data["stock"]["orders"]
and "status" in data["stock"]["orders"]["CL"]
and data["stock"]["orders"]["CL"]["status"]
):
print(f'{data["stock"]["orders"]["CL"]["status"]}: {data["stock"]["orders"]["CL"]["ordered"]}')
This prints:
1: -10
My question is: what is the fastest way to get the data from a dict, if it is in the dict?
Lookups in dictionaries are fast because Python implements them using hash tables.
In Big O terms, dictionary lookups have constant time complexity, O(1). Here is another approach, using the .get() method:
data = {...}  # same mock data as in the question
if (data.get('stock', {}).get('orders', {}).get('CL')):
print(f'{data["stock"]["orders"]["CL"]["status"]}: {data["stock"]["orders"]["CL"]["ordered"]}')
Here is a nice writeup on lookups in Python with list and dictionary as example.
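For deeper paths, a small helper can avoid repeating the .get() chain. A minimal sketch, assuming the data dict above; the helper name get_nested is my own, not part of the question:

from functools import reduce

def get_nested(d, *keys, default=None):
    # Walk the key chain, falling back to default as soon as a level is missing
    return reduce(
        lambda acc, key: acc.get(key, default) if isinstance(acc, dict) else default,
        keys,
        d,
    )

status = get_nested(data, 'stock', 'orders', 'CL', 'status')
ordered = get_nested(data, 'stock', 'orders', 'CL', 'ordered')
if status is not None:
    print(f'{status}: {ordered}')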
I get your point. For this question, since your stock has just 4 values, it is hard to say whether the .get() method will be faster than a loop. If your dictionary had more items, .get() would certainly be much faster, but with so few keys a loop will not make much difference.

Storing List of Dict in a DynamoDB Table

I want to store a list of tags of an Elasticsearch domain in a DynamoDB table, and I'm facing some errors.
I'm getting the list of tags using the list_tags() function:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/es.html#ElasticsearchService.Client.list_tags
response = client.list_tags(
ARN='string'
)
It returns this:
{
'TagList': [
{
'Key': 'string',
'Value': 'string'
},
]
}
Here's what they say in the docs:
Response Structure
(dict) --
The result of a ListTags operation. Contains tags for all requested Elasticsearch domains.
TagList (list) --
List of Tag for the requested Elasticsearch domain.
(dict) --
Specifies a key value pair for a resource tag.
Now I've tried to insert the list into DynamoDB in various ways, but I always get errors:
':TagList': {
'M': response_list_tags['TagList']
},
Invalid type for parameter ExpressionAttributeValues.:TagList.M, value: [{'Key': 'Automation', 'Value': 'None'}, {'Key': 'Owner', 'Value': 'owner'}, {'Key': 'BU', 'Value': 'DS'}, {'Key': 'Support', 'Value': 'teamA'}, {'Key': 'Note', 'Value': ''}, {'Key': 'Environment', 'Value': 'dev'}, {'Key': 'Creator', 'Value': ''}, {'Key': 'SubProject', 'Value': ''}, {'Key': 'DateTimeTag', 'Value': 'nodef'}, {'Key': 'ApplicationCode', 'Value': ''}, {'Key': 'Criticity', 'Value': '3'}, {'Key': 'Name', 'Value': 'dev'}], type: <class 'list'>, valid types: <class 'dict'>: ParamValidationError
I tried with L instead of M and got this:
Unknown parameter in ExpressionAttributeValues.:TagList.L[11]: "Value", must be one of: S, N, B, SS, NS, BS, M, L, NULL, BOOL: ParamValidationError
The specific error you are getting is because you are using the native DynamoDB document item JSON format, which requires that every attribute value (including key-values in a map nested in a list) be fully qualified with a type as a key-value.
There are two ways you can do that, and from your question I'm not sure whether you wanted to store those key-value tag objects as a list, or as an actual map in Dynamo.
Either way, I recommend you JSON-encode your list and just store it in DynamoDB as a string value. There's no really good reason to go through the trouble of storing it as a map or list.
However, if you really wanted to, you could do the conversion to the DynamoDB native JSON and store it as a map. You would end up with something like this:
':TagList': {
'M': {
'Automation': { 'S': 'None' },
'Owner': {'S': 'owner'},
'BU': {'S': 'DS'},
'Support': {'S': 'teamA'}
...
}
}
Another possibility would be using a list of maps:
':TagList': {
    'L': [
        {'M': {'Key': {'S': 'Automation'}, 'Value': {'S': 'None'}}},
        {'M': {'Key': {'S': 'Owner'}, 'Value': {'S': 'owner'}}},
        {'M': {'Key': {'S': 'BU'}, 'Value': {'S': 'DS'}}},
        {'M': {'Key': {'S': 'Support'}, 'Value': {'S': 'teamA'}}},
        ...
    ]
}
But in my experience I have never gotten any real value out of storing data like this in Dynamo. Instead, storing those tags as a JSON string is both easier and less error prone. You end up with this:
':TagList': {
    'S': '[{"Key": "Automation", "Value": "None"}, {"Key": "Owner", "Value": "owner"}, {"Key": "BU", "Value": "DS"}, {"Key": "Support", "Value": "teamA"}, ... ]'
}
And all you have to do is write the equivalent of:
':TagList': {
'S': json.dumps(response_list_tags['TagList'])
}
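A minimal end-to-end sketch of that write, assuming a hypothetical table name ('domains') and key attribute ('DomainArn'); response_list_tags is the list_tags response from the question:

import json
import boto3

client = boto3.client('dynamodb')

client.update_item(
    TableName='domains',  # assumption: your table name
    Key={'DomainArn': {'S': 'arn:aws:es:...'}},  # assumption: your key schema
    UpdateExpression='SET TagList = :TagList',
    ExpressionAttributeValues={
        ':TagList': {'S': json.dumps(response_list_tags['TagList'])}
    },
)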
Thank you Mike, I ended up with a similar solution. I stored the tag list as a string like this:
':TagList': {
'S': str(response_list_tags['TagList'])
}
Then to convert the string to a list for a later use i did this :
import ast
...
TagList= ast.literal_eval(db_result['Item']['TagList']['S'])
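Note that ast.literal_eval works here because str() stores Python's repr of the list; if you store the value with json.dumps as suggested above, json.loads is the natural way to read it back.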

Python Eve not following set schema and returning full document when using aggregation pipeline

I have a simple API to which coordinates and a distance are provided, and documents within that distance are returned. I intend it to return just the id and distance, but the defined schema is being ignored and the whole document is returned. Any ideas?
item = {'item_title': 'relate',
'datasource': {
'source': 'api',
'filter': {'_type': 'line'},
'aggregation': {'pipeline': [{'$geoNear':{'near':{'type': 'point', 'coordinates': '$coords'},'distanceField': 'distance','maxDistance': '$maxDist','num': 1, 'spherical': 'true'}}]}
},
'schema': {
'_id': {'type': 'string'},
'distance': {'type': 'float'}
},
}
DOMAIN = {"data": item}
and the postman query is:
http://localhost:8090/data?aggregate={"$maxDist": 500, "$coords": [-1.47, 50.93]}
EDIT:
Following Neil's comment I tried this:
item = {'item_title': 'relate',
'schema': {
'uri': {'type': 'string'},
'distance': {'type': 'float'}
},
'datasource': {
'source': 'api',
'filter': {'_type': 'link'},
'aggregation': {'pipeline': [{'$geoNear':{'near':{'type': 'point', 'coordinates': ['$lng', '$lat']},'distanceField': 'distance','maxDistance': '$maxDist','num': 1, 'spherical': 'true'}}]}
}
}
With the following postman request:
http://localhost:8090/data?aggregate={"$maxDist": 500, "$lng": -1.47, "$lat": 50.93}
This is leading to the following error:
geoNear command failed: { ok: 0.0, errmsg: "'near' field must be point", code: 17304, codeName: "Location17304" }
