Validating arbitrary dict keys with strict schemas in Cerberus - Python

I am trying to validate JSON whose schema specifies a dict with arbitrary string keys, the corresponding values of which are dicts with a strict schema (i.e., the keys of the inner dict are strictly some string, here 'a'). From the Cerberus docs, I think what I want is the 'keysrules' rule. The example in the docs only seems to show how to use 'keysrules' to validate arbitrary keys, not their values. I wrote the code below as an example; the best I could do was assume that 'keysrules' would support a 'schema' argument for defining a schema for these values.
keysrules = {
    'myDict': {
        'type': 'dict',
        'keysrules': {
            'type': 'string',
            'schema': {
                'type': 'dict',
                'schema': {
                    'a': {'type': 'string'}
                }
            }
        }
    }
}
keysRulesTest = {
    'myDict': {
        'arbitraryStringKey': {
            'a': 'arbitraryStringValue'
        },
        'anotherArbitraryStringKey': {
            'shouldNotValidate': 'arbitraryStringValue'
        }
    }
}
from cerberus import Validator

def test_rules():
    v = Validator(keysrules)
    if not v.validate(keysRulesTest):
        print(v.errors)
        assert(0)
This example validates, but I would like it to fail on 'shouldNotValidate', because that key should be 'a'. Does the flexibility implied by 'keysrules' (i.e., keys governed by 'keysrules' have no constraint other than {'type': 'string'}) propagate down recursively to all schemas beneath it? Or have I made some other error? How can I achieve my desired outcome?

I didn't want keysrules, I wanted valuesrules:
keysrules = {
    'myDict': {
        'type': 'dict',
        'valuesrules': {
            'type': 'dict',
            'schema': {
                'a': {'type': 'string'}
            }
        }
    }
}
keysRulesTest = {
    'myDict': {
        'arbitraryStringKey': {
            'a': 'arbitraryStringValue'
        },
        'anotherArbitraryStringKey': {
            'shouldNotValidate': 'arbitraryStringValue'
        }
    }
}
def test_rules():
    v = Validator(keysrules)
    if not v.validate(keysRulesTest):
        print(v.errors)
        assert(0)
This produces my desired outcome.
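For reference, keysrules and valuesrules can also be combined when both the keys and the values need constraints; a minimal sketch (the regex is just an illustrative assumption):
from cerberus import Validator

# keysrules constrains the arbitrary keys, valuesrules their values
# (both names are Cerberus 1.3+; earlier versions used keyschema/valueschema).
schema = {
    'myDict': {
        'type': 'dict',
        'keysrules': {'type': 'string', 'regex': '^[a-zA-Z]+$'},
        'valuesrules': {
            'type': 'dict',
            'schema': {
                'a': {'type': 'string'}
            }
        }
    }
}

v = Validator(schema)
print(v.validate({'myDict': {'someKey': {'a': 'ok'}}}))      # True
print(v.validate({'myDict': {'someKey': {'bad': 'nope'}}}))  # False
print(v.errors)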

Related

MongoDB watch() aggregation match by field value

When I use the watch() function on my collection, I am passing an aggregation pipeline to filter what comes through. I was able to get operationType to work correctly, but I also only want to include documents in which the city field is equal to Vancouver. The current syntax I am using does not work:
change_stream = client.mydb.mycollection.watch([
    {
        '$match': {
            'operationType': {'$in': ['replace', 'insert']},
            'fullDocument': {'city': {'$eq': 'Vancouver'}}
        }
    }
])
And for reference, this is what the dictionary that I'm aggregating looks like:
{'_id': {'_data': '825F...E0004'},
 'clusterTime': Timestamp(1595565179, 2),
 'documentKey': {'_id': ObjectId('70fc7871...')},
 'fullDocument': {'_id': ObjectId('70fc7871...'),
                  'city': 'Vancouver'},
 'ns': {'coll': 'notification', 'db': 'pipeline'},
 'operationType': 'replace'}
I found I just have to use a dot to access the nested dictionary:
change_stream = client.mydb.mycollection.watch([
    {
        '$match': {
            'operationType': {'$in': ['replace', 'insert']},
            'fullDocument.city': 'Vancouver'
        }
    }
])
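For reference, consuming the filtered stream then looks something like this (a sketch, assuming pymongo and a replica-set deployment, which change streams require):
# Each change document that passes the $match stage is yielded as a dict;
# iteration blocks until the next matching change arrives.
for change in change_stream:
    print(change['operationType'], change['fullDocument']['city'])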

Accessing a json object nested in a json array with Python 3.x

Given the JSON payload below, how do I get the value of 'hotspot' using Python 3.x? The top level seems to be a dict with one key-value pair: 'Recs' is the key, and the value is a Python list. I have loaded the payload into a Python object using json.loads(payload).
json payload:
{
    'Recs': [{
        'eSrc': 'big-a1',
        'reqPs': {
            'srcIP': '11.111.11.111'
        },
        'a1': {
            'a1Ver': '1.0',
            'obj': {
                'eTag': '38f028e',
                'sz': 1217,
                'seq': '02391D2',
                'hotspot': 'web/acme/srv/dev/8dd'
            },
            'confId': 'acme-contains',
            'pipe': {
                'name': 'acme.dev',
                'oId': {
                    'pId': 'BDAD'
                }
            }
        }
    }]
}
{ indicates a dict and [ indicates a list, so hotspot is at:
my_json['Recs'][0]['a1']['obj']['hotspot']
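Put together, a runnable sketch (with the payload trimmed to the keys on the path; note that real JSON needs double quotes):
import json

# The payload above was pasted with single quotes (Python repr style);
# actual JSON text uses double quotes, as here.
payload = '{"Recs": [{"a1": {"obj": {"hotspot": "web/acme/srv/dev/8dd"}}}]}'

my_json = json.loads(payload)
print(my_json['Recs'][0]['a1']['obj']['hotspot'])  # web/acme/srv/dev/8dd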

Python Eve, how to replace a 'dict' with a PATCH request

I am using Python Eve, which is awesome; however, I ran into a problem and am not sure if there is a solution.
I have a 'fields' dict in this schema:
'profiles': {
    'fields': {
        'type': 'dict',
        'default': {}
    }
}
I'd like to be able to update the 'fields' dict with a PATCH request, but the issue is that a PATCH request will never REMOVE any field inside 'fields', and I cannot use a PUT request or else all my other profile fields (not shown above) will disappear.
I tried using a subresource like this:
'profile-fields': {
    'schema': {
        'fields': {
            'type': 'dict',
            'default': {}
        }
    },
    'datasource': {
        'source': 'profiles',
        'projection': {'fields': 1}
    }
},
but as the Python Eve documentation states:
Please note that POST and PATCH methods will still allow the whole schema to be manipulated
http://python-eve.org/config.html#multiple-api-endpoints-one-datasource
Anyone know of a way to do this?
For example:
# Create a record
POST /api/profiles
{
    'name': 'Test',
    'fields': {
        'one': 1,
        'two': 2
    }
}
# => { _created: 'blah', _id: '123456' }
# then update fields with a PATCH request
PATCH /api/profiles/123456
{
    'fields': {
        'three': 3,
        'four': 4
    }
}
# then get the updated record
GET /api/profiles/123456
# RESPONSE (desired)
{
    '_id': '123456',
    'name': 'Test',
    'fields': {
        'one': 1,
        'two': 2,
        'three': 3,
        'four': 4
    }
}
I have just conceded to using a PUT request and sending the entire object back again, which is OK I guess; I just thought there might be a better way to do this.
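That read-merge-replace round trip might look like the following sketch (assuming the requests library, Eve's default If-Match/ETag concurrency control, and a hypothetical local endpoint):
import requests

URL = 'http://localhost:5000/api/profiles/123456'  # hypothetical endpoint

# Fetch the current document, including its _etag.
doc = requests.get(URL).json()

# Merge the new keys into the existing 'fields' dict client-side.
fields = doc.get('fields', {})
fields.update({'three': 3, 'four': 4})

# PUT the full replacement; Eve expects the ETag in If-Match by default.
resp = requests.put(
    URL,
    json={'name': doc['name'], 'fields': fields},
    headers={'If-Match': doc['_etag']},
)
print(resp.status_code, resp.json())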

Cannot serialize data when patching to a field that has a 'valueschema' that is of type 'dict' in Eve

So say I have the following document:
test_obj = {
    'my_things': {
        'id17': {
            'blah': 3,
            'weird': 'yay',
            'thechallenge': ObjectId('5712d06fdb4d0856551300d2')
        },
        'id32': {
            'blah': 62,
            'weird': 'hoorah',
            'thechallenge': ObjectId('5712d06fdb4d0856551300d4')
        }
    },
    '_id': 12,
    'an_extra_field': 'asdf'
}
For this document I have the following schema:
API.config['DOMAIN']['test_obj']['schema'] = {
    'id': {'type': 'int'},
    'an_extra_field': {'type': 'string'},
    'my_things': {
        'type': 'dict',
        'valueschema': {
            'type': 'dict',
            'schema': {
                'blah': {'type': 'dict'},
                'weird': {'type': 'string'},
                'thechallenge': {'type': 'objectid'}
            }
        }
    }
}
Now say I make a PATCH with the following pseudocode:
data = {
    'my_things': {
        'id17': {
            'thechallenge': '5712d06fdb4d0856551300d8'
        }
    }
}
PATCH(url='/v1/test_objs/12', data=data)
When I make this PATCH, Cerberus raises an error during validation, saying "value '5712d06fdb4d0856551300d8' cannot be converted to a ObjectId". Now this is a valid object id, and I find that if I make a PATCH to other non-valueschema fields it does not raise this error. It seems like valueschema was not meant to have a value of type dict, and adding an extra 'schema' attribute was the only way I could get around Cerberus raising a SchemaError / having Cerberus actually validate my fields. But Eve does not appear to be serializing the fields in my dictionary correctly; the value should already be of type ObjectId when it gets passed to Cerberus.
The way I'm temporarily getting around this is by modifying the code in Eve. In the common.py module, in the serialize function at line 398, I added the following where it checks whether the field schema is a 'valueschema':
elif field_type == 'dict' and 'schema' in field_schema['valueschema']:
    for subdocument in document[field].values():
        serialize(subdocument, schema=field_schema['valueschema']['schema'])
Should I not be using type dict for the valueschema? If not, how else should I handle this scenario? I would like to not have to maintain my own fork of Eve, so if others do want the ability to have valueschema be of type dict, should I submit a pull request for this change?
This has been fixed with Eve v0.6.4, which has just been released.
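For reference, the nested valueschema/schema pattern itself validates as intended in plain Cerberus; a minimal sketch with plain strings standing in for the ObjectId values ('objectid' is an Eve extension, and 'valueschema' was later renamed 'valuesrules' in Cerberus 1.3):
from cerberus import Validator

# Standalone Cerberus check of the nested valueschema/schema pattern;
# strings stand in for ObjectId, since the 'objectid' type is Eve-specific.
schema = {
    'my_things': {
        'type': 'dict',
        'valueschema': {                 # 'valuesrules' in Cerberus >= 1.3
            'type': 'dict',
            'schema': {
                'blah': {'type': 'integer'},
                'weird': {'type': 'string'},
                'thechallenge': {'type': 'string'},
            },
        },
    },
}

v = Validator(schema)
doc = {'my_things': {'id17': {'blah': 3, 'weird': 'yay',
                              'thechallenge': '5712d06fdb4d0856551300d8'}}}
print(v.validate(doc))  # True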

JSON Schema: Input malformed

I'm using Tornado_JSON, which is based on jsonschema, and there is a problem with my schema definition. I tried fixing it in an online schema validator, and the problem seems to lie in "additionalItems": True. True with a capital T works for Python but leads to an error in the online validator ("Schema is invalid JSON."). With true the online validator is happy and the example JSON validates against the schema, but my Python script doesn't start anymore (NameError: name 'true' is not defined). Can this be resolved somehow?
@schema.validate(
    input_schema={
        'type': 'object',
        'properties': {
            'DB': {
                'type': 'number'
            },
            'values': {
                'type': 'array',
                'items': [
                    {
                        'type': 'array',
                        'items': [
                            {
                                'type': 'string'
                            },
                            {
                                'type': [
                                    'number',
                                    'string',
                                    'boolean',
                                    'null'
                                ]
                            }
                        ]
                    }
                ],
                'additionalItems': true  # valid JSON, but a NameError in Python
            }
        }
    },
    input_example={
        'DB': 22,
        'values': [['INT', 44], ['REAL', 33.33], ['CHAR', 'b']]
    }
)
I changed it according to your comments (external file with json.loads()). Perfect. Thank you.
Put the schema in a triple-quoted string or an external file, then parse it with json.loads(). Use the lower-case spelling.
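A sketch of that approach, keeping the schema as real JSON (lower-case true) and parsing it at import time:
import json

# json.loads() maps JSON's true onto Python's True, so the schema can
# stay valid JSON while the script still gets a Python bool.
input_schema = json.loads("""
{
    "type": "object",
    "properties": {
        "DB": {"type": "number"},
        "values": {
            "type": "array",
            "items": [
                {
                    "type": "array",
                    "items": [
                        {"type": "string"},
                        {"type": ["number", "string", "boolean", "null"]}
                    ]
                }
            ],
            "additionalItems": true
        }
    }
}
""")

print(input_schema['properties']['values']['additionalItems'])  # True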
The error stems from trying to put a builtin Python datatype into a JSON schema. The latter is a template syntax used to check type consistency and should not hold actual data. Instead, under input_schema you'll want to define "additionalItems" to be of { "type": "boolean" } and then add it to the test JSON in your input_example with a boolean value for testing purposes.
Also, I'm not too familiar with Tornado_JSON, but it looks like you aren't complying with the schema definition language by placing "additionalItems" inside of the "values" property. Bring that up one level.
More specifically, I think what you're trying to do should look like:
"values": {
...value schema definition...
}
"additionalItems": {
"type": "boolean"
}
And the input example would become:
input_example={
    "DB": 22,
    "values": [['INT', 44], ['REAL', 33.33], ['CHAR', 'b']],
    "additionalItems": true
}
