python jsonschema remove additional and use defaults - python

I'm using the Python jsonschema library (https://python-jsonschema.readthedocs.io/en/latest/)
and I'm trying to find out how to apply default values and remove additional fields when they are found.
Does anyone know how I'm supposed to do it?
Or is there another way to validate against a JSON Schema that supports default values and removes any additional fields (like Ajv in JavaScript)?

Hidden in the FAQs you'll find this:
Why doesn’t my schema’s default property set the default on my instance?
The basic answer is that the specification does not require
that default actually do anything.
For an inkling as to why it doesn’t actually do anything, consider
that none of the other validators modify the instance either. More
importantly, having default modify the instance can produce quite
peculiar things. It’s perfectly valid (and perhaps even useful) to
have a default that is not valid under the schema it lives in! So an
instance modified by the default would pass validation the first time,
but fail the second!
Still, filling in defaults is a thing that is useful. jsonschema
allows you to define your own validator classes and callables, so you
can easily create a jsonschema.IValidator that does do default
setting. Here’s some code to get you started. (In this code, we add
the default properties to each object before the properties are
validated, so the default values themselves will need to be valid
under the schema.)
from jsonschema import Draft4Validator, validators


def extend_with_default(validator_class):
    validate_properties = validator_class.VALIDATORS["properties"]

    def set_defaults(validator, properties, instance, schema):
        # Fill in missing defaults first, then run the normal "properties" check.
        for property, subschema in properties.items():  # .iteritems() on Python 2
            if "default" in subschema:
                instance.setdefault(property, subschema["default"])

        for error in validate_properties(
            validator, properties, instance, schema,
        ):
            yield error

    return validators.extend(
        validator_class, {"properties": set_defaults},
    )


DefaultValidatingDraft4Validator = extend_with_default(Draft4Validator)

# Example usage:
obj = {}
schema = {'properties': {'foo': {'default': 'bar'}}}
# Note jsonschema.validate(obj, schema, cls=DefaultValidatingDraft4Validator)
# will not work because the metaschema contains `default` directives.
DefaultValidatingDraft4Validator(schema).validate(obj)
assert obj == {'foo': 'bar'}
From: https://python-jsonschema.readthedocs.io/en/latest/faq/#why-doesn-t-my-schema-s-default-property-set-the-default-on-my-instance
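The FAQ only covers defaults. For the other half of the question (dropping additional fields), one option is to prune the instance yourself before validating. The following is only a minimal sketch in plain Python, not part of jsonschema: the helper name prune_additional is made up, and it assumes a simple schema that uses only "properties" and a single-schema "items" (no $ref, patternProperties, or combinators such as anyOf).

```python
def prune_additional(instance, schema):
    """Recursively drop keys that are not listed under the schema's "properties".

    Hypothetical helper: only "properties" and dict-valued "items" are understood.
    """
    if isinstance(instance, dict) and "properties" in schema:
        properties = schema["properties"]
        for key in list(instance):  # copy the keys, since we delete while looping
            if key not in properties:
                del instance[key]
            else:
                prune_additional(instance[key], properties[key])
    elif isinstance(instance, list) and isinstance(schema.get("items"), dict):
        for item in instance:
            prune_additional(item, schema["items"])
    return instance


example_schema = {
    "properties": {
        "a": {},
        "nested": {"properties": {"b": {}}},
        "rows": {"items": {"properties": {"c": {}}}},
    }
}
example_data = {
    "a": 1,
    "extra": 2,
    "nested": {"b": 1, "junk": 0},
    "rows": [{"c": 1, "x": 2}],
}
pruned = prune_additional(example_data, example_schema)
```

Objects whose subschema has no "properties" key are left untouched on purpose, so a bare {"type": "object"} does not wipe out its contents. After pruning, the instance can be passed to the validator as usual.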

Related

Pydantic does not validate the key/values of dict fields

I have the following simple data model:
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]
I want to add multiple tables like this:
tables = TableModel(table={'T1': 'Tea'})
print(tables) # table={'T1': 'Tea'}
tables.table['T2'] = 'coffee'
tables.table.update({'T3': 'Milk'})
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk'}
So far everything is working as expected. However the next piece of code does not raise any error:
tables.table[1] = 2
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk', 1: 2}
I changed the table field name to __root__. With this change I see the same behavior.
I also added validate_assignment = True to the model Config, but that does not help either.
How can I get the model to validate the dict fields? Am I missing something basic here?
There are actually two distinct issues here that I'll address separately.
Mutating a dict on a Pydantic model
Observed behavior
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]

    class Config:
        validate_assignment = True
instance = TableModel(table={"a": "b"})
instance.table[1] = object()
print(instance)
Output: table={'a': 'b', 1: <object object at 0x7f7c427d65a0>}
Both key and value type clearly don't match our annotation of table. So, why does the assignment instance.table[1] = object() not cause a validation error?
Explanation
The reason is rather simple: There is no mechanism to enforce validation here. You need to understand what happens here from the point of view of the model.
A model can validate attribute assignment (if you configure validate_assignment = True). It does so by hooking into the __setattr__ method and running the value through the appropriate field validator(s).
But in that example above, we never called BaseModel.__setattr__. Instead, we called the __getattribute__ method that BaseModel inherits from object to access the value of instance.table. That returned the dictionary object ({"a": "b"}). And then we called the dict.__setitem__ method on that dictionary and added a key-value-pair of 1: object() to it.
The dictionary is just a regular old dictionary without any validation logic. And the mutation of that dictionary is completely invisible to the Pydantic model. It has no way of knowing that after accessing the object currently assigned to the table field, we changed something inside that object.
Validation would only be triggered if we actually assigned a new object to the table field of the model. But that is not what happens here.
If we instead tried to do instance.table = {1: object()}, we would get a validation error because now we are actually setting the table attribute and trying to assign a value to it.
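To see the mechanism without Pydantic involved, here is a small stand-in written in plain Python. The class name CheckedHolder is made up for illustration and this is not how Pydantic is implemented internally; it only mimics the idea of validating inside __setattr__.

```python
class CheckedHolder:
    """Toy stand-in for a model with assignment validation (not Pydantic itself)."""

    def __init__(self, table):
        self.table = table  # goes through __setattr__ below

    def __setattr__(self, name, value):
        # Validation hook: runs only when an attribute is (re)assigned.
        if name == "table":
            for key, val in value.items():
                if not isinstance(key, str) or not isinstance(val, str):
                    raise TypeError(
                        f"table entries must be str -> str, got {key!r}: {val!r}"
                    )
        super().__setattr__(name, value)


holder = CheckedHolder({"a": "b"})

# Mutating the dict in place never calls CheckedHolder.__setattr__,
# so the bad entry slips through unnoticed.
holder.table[1] = 2

# Re-assigning the attribute does call __setattr__, so the check fires.
try:
    holder.table = dict(holder.table)
    reassignment_rejected = False
except TypeError:
    reassignment_rejected = True
```

The in-place mutation succeeds silently, while the re-assignment of the very same data is rejected, which is the asymmetry described above.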
Possible workaround
Depending on how you intend to use the model, you could ensure that changes to the table dictionary always happen "outside" of the model and are followed by a re-assignment of the form instance.table = .... I would say that is probably the most practical option. In general, re-parsing (subsets of) the data should ensure consistency after you have mutated values. Something like this re-runs validation on the mutated data, raising an error for anything that cannot be coerced to the annotated types:
tables.table[1] = 2
tables = TableModel.parse_obj(tables.dict())
Another option might be to play around and define your own subtype of Dict and add validation logic there, but I am not sure how much "reinventing the wheel" that might entail.
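As a rough idea of what such a dict subtype could look like, here is a minimal sketch in plain Python. The name StrDict is hypothetical, it is not exhaustive (for example the |= operator is not covered), and wiring it into a Pydantic field is left out because Pydantic may copy values into a plain dict during validation.

```python
class StrDict(dict):
    """dict that only accepts str keys and str values (hypothetical helper)."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    @staticmethod
    def _check(key, value):
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"expected str -> str, got {key!r}: {value!r}")

    def __setitem__(self, key, value):
        self._check(key, value)
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # dict.update bypasses __setitem__, so route every pair through it.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def setdefault(self, key, default=None):
        # dict.setdefault also bypasses __setitem__, so re-implement it.
        if key not in self:
            self[key] = default
        return self[key]


tables = StrDict({"T1": "Tea"})
tables["T2"] = "coffee"
tables.update({"T3": "Milk"})

try:
    tables[1] = 2
    bad_item_rejected = False
except TypeError:
    bad_item_rejected = True
```

Unlike the model-level hook, the check here lives on the container itself, so in-place mutations are validated too.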
The most sophisticated option could maybe be a descriptor-based approach, where instead of just calling __getattribute__, a custom descriptor intercepts the attribute access and triggers the assignment validation. But that is just an idea. I have not tried this and don't know if that might break other Pydantic magic.
Implicit type coercion
Observed behavior
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]
instance = TableModel(table={1: 2})
print(instance)
Output: table={'1': '2'}
Explanation
This is very easily explained. This is expected behavior and was put in place by choice. The idea is that if we can "simply" coerce a value to the specified type, we want to do that. Although you defined both the key and value type as str, passing an int for each is no big deal because the default string validator can just do str(1) and str(2) respectively.
Thus, instead of raising a validation error, the table value ends up as {"1": "2"}.
Possible workaround
If you do not want this implicit coercion to happen, there are strict types that you can use in the annotation. In this case you could use table: Dict[StrictStr, StrictStr] (StrictStr is importable from pydantic). Then the previous example would indeed raise a validation error.

Trying to understand JSONField for django postgresql

I'm reading the docs on JSONField, a special postgresql field type. Since I intend to create a custom field that subclasses JSONField, with the added features of being able to convert my Lifts class:
import collections.abc


class Lifts(object):
    def __init__(self, series):
        for serie in series:
            if type(serie) != LiftSerie:
                raise TypeError("List passed to constructor should only contain LiftSerie objects")
        self.series = series


class AbstractSerie(object):
    def __init__(self, activity, amount):
        self.activity_name = activity.name
        self.amount = amount

    def pre_json(self):
        """A dict that can easily be turned into json."""
        pre_json = {
            self.activity_name:
            self.amount
        }
        return pre_json

    def __str__(self):
        return str(self.pre_json())


class LiftSerie(AbstractSerie):
    def __init__(self, lift, setlist):
        """ lift should be an instance of LiftActivity.
        setlist is a list containing reps for each set
        that has been performed.
        """
        # collections.Sequence was removed in Python 3.10; use collections.abc.
        if not (isinstance(setlist, collections.abc.Sequence) and not isinstance(setlist, str)):
            raise TypeError("setlist has to behave as a list and can not be a string.")
        super().__init__(lift, setlist)
I've read here that to_python() and from_db_value() are two methods on the Field class that are involved in loading values from the database and deserializing them. Also, the docstring of the to_python() method on the Field class says that it should be overridden by subclasses. So, I looked in JSONField. Guess what, it doesn't override it. Also, from_db_value() isn't even defined on Field (and not on JSONField either).
So what is going on here? This is making it very hard to understand how JSONField takes values and turns them into json and stores them in the database, and then the opposite when we query the database.
A summary of my questions:
1. Why isn't to_python() overridden in JSONField?
2. Why isn't from_db_value() overridden in JSONField?
3. Why isn't from_db_value() even defined on Field?
4. How does JSONField go about taking a python dict for example, converting it to a JSON string, and storing it in the database?
5. How does it do the opposite?
Sorry for many questions, but I really want to understand this and the docs are a bit lacking IMO.
For Django database fields, there are three relevant states/representations of the same data: form, python and database. In case of the example HandField, form/database representations are the same string, the python representation is the Hand object instance.
In case of a custom field on top of JSONField, the internal python might be a LiftSerie instance, the form representation a json string, the value sent to the database a json string and the value received from the database a json structure converted by psycopg2 from the string returned by postgres, if that makes sense.
In terms of your questions:
1. The Python value is not customized, so the Python data type of the field is the same as the expected input. This is in contrast to the HandField example, where the input could be a string or a Hand instance. In the latter case, the base Field.to_python() implementation, which just returns the input, would be enough.
2. Psycopg2 already converts the database value to a JSON structure (see point 5). This is also true for other types like int/IntegerField.
3. from_db_value is not defined in the base Field class, but it is certainly taken into account if it exists. If you look at the implementation of Field.get_db_converters(), from_db_value is added to the converters if the Field has an attribute with that name.
4. django.contrib.postgres.fields.JSONField has an optional encoder argument. By default, it uses json.dumps without an encoder to convert a JSON structure to a JSON string.
5. psycopg2 automatically converts database types to Python types. This is called adaptation. The documentation for JSON adaptation explains how that works and how it can be customized.
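The "picked up if it exists" behaviour from point 3 can be illustrated without Django. The sketch below is a simplified imitation in plain Python: ToyField, PlainField, JsonTextField and load are made-up names, and only the hasattr check and the (value, expression, connection) converter signature are modelled on Django.

```python
import json


class ToyField:
    """Simplified stand-in for a Django model field (not the real class)."""

    def get_db_converters(self, connection):
        # The hook is discovered by name: no base-class definition is needed.
        if hasattr(self, "from_db_value"):
            return [self.from_db_value]
        return []


class PlainField(ToyField):
    pass  # no from_db_value, so the driver's value is used unchanged


class JsonTextField(ToyField):
    def from_db_value(self, value, expression, connection):
        # Decode a JSON string coming from the database into Python objects.
        return value if value is None else json.loads(value)


def load(field, raw, connection=None):
    """Run a raw database value through whatever converters the field provides."""
    for converter in field.get_db_converters(connection):
        raw = converter(raw, None, connection)
    return raw


plain_result = load(PlainField(), '{"a": 1}')
json_result = load(JsonTextField(), '{"a": 1}')
```

A field without the hook passes the driver's value through untouched, which matches the case where psycopg2 has already done the conversion.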
Note that when implementing a custom field, I would suggest writing tests for it during development, especially if the mechanisms are not completely understood. You can get inspiration for such tests in for example django-localflavor.
The short answer is that both to_python and from_db_value return Python strings that should serialize to JSON with no encoding errors, all things being equal.
If you're okay with strings, that's fine, but I usually override Django's JSONField's from_db_value method to return a dict or a list, not a string, for use in my code. I created a custom field for that.
To me, the whole point of a JSON field is to be able to interact with its values as dicts or lists.

Overriding/blocking __set__ on db.Property to enforce an alternate set method/process

I'm trying to subclass db.Property and override the __set__ method to implement things like pre- and post-set logic.
The problem is that __set__ is being called directly on the property by db.Model.__init__() during the from_entity conversion of entity to instance (after it comes out of the datastore), so obviously pre and post set logic should not be called.
from google.appengine.ext import db


class MyProperty(db.StringProperty):
    def __set__(self, model_instance, value):
        self.pre_set(value)
        super(MyProperty, self).__set__(model_instance, value)
        self.post_set(value)


class MyModel(db.Model):
    foo = MyProperty()


my_model = MyModel()
my_model.put()
my_model.foo = u'A new string.'  # pre/post set logic runs.

# On load, the __set__ method will be called again
loaded_model = db.get(my_model.key())

# In db.Model.__init__()
for prop in self.properties().values():
    value = kwargs.get(prop.name, None) or prop.default()  # or something like that
    prop.__set__(self, value)  # pre/post set logic also runs :(
How can I differentiate between these two occurrences without having to override db.Model.__init__()? or should I just do that? Am I not supposed to be doing this with prop.__set__()?
Unfortunately, there's no easy way to distinguish the two situations. You could examine the stack, but that's an extremely kludgy approach. You could override Model.__init__ - either by monkeypatching it or by requiring users of your property extend your custom model subclass - but I'm not sure how you'd modify it in a way that would help yet remain backwards compatible with existing property classes.
You might want to check out Guido's NDB project - it may be more flexible in this respect, and is still under active development.

Using Property Builtin with GAE Datastore's Model

I want to turn attributes of a GAE Model into properties, for cases such as converting a value to uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, the GAE Datastore has its own concept of a Property class. I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provide getter/setter-like functionality? If so, how do I do it?
Added:
One potential issue of overriding get_value_for_datastore that I think of is it might not get called before the object was put into datastore. Hence getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()


class MyModel(db.Model):
    foo = MyStringProperty()


class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding __get__ instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.
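The core idea of computing the value in __get__ can be shown with a plain Python descriptor. This is only an analogue under simplifying assumptions, not the aetycoon implementation: the names Derived and Person are made up, and nothing here touches the datastore (the real DerivedProperty subclasses db.Property so the computed value is stored and queryable).

```python
class Derived:
    """Read-only descriptor whose value is computed on every access (hypothetical)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        return self.func(instance)

    def __set__(self, instance, value):
        raise AttributeError("derived attributes cannot be assigned directly")


class Person:
    def __init__(self, name):
        self.name = name

    # Always reflects the current value of `name`, even before any save.
    name_upper = Derived(
        lambda self: self.name.upper() if self.name is not None else None
    )


person = Person("Hello World")
first_read = person.name_upper
person.name = "changed"
second_read = person.name_upper
```

Because the value is derived at read time, the in-memory instance and a stored copy cannot drift apart the way they do when the transformation happens only in get_value_for_datastore.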

Elixir not creating my tables with default values

class MyObject(Entity):
    name = Field(Unicode(256), default=u'default name', nullable=False)

    using_options(shortnames=True)
    using_mapper_options(save_on_init=False)

    def __init__(self):
        self.name = None
I am using MySQL in this case, but have also checked against SQLite and I get the same result. It respects nullable, but ignores default entirely. I don't get any error messages, and it creates the tables just fine. I could go back through and add the defaults, but this is a serious pain that I would like to avoid if possible.
I've tried it with other Field types, but still no joy.
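The default keyword argument in SQLAlchemy is a Python-side runtime default: it is applied when an INSERT statement is issued, not written into the table definition. Use a PassiveDefault() object as a positional argument when you really need a database-level default (later SQLAlchemy releases replaced PassiveDefault with the server_default column argument).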
