I have the following simple data model:
from typing import Dict
from pydantic import BaseModel
class TableModel(BaseModel):
    table: Dict[str, str]
I want to add multiple tables like this:
tables = TableModel(table={'T1': 'Tea'})
print(tables) # table={'T1': 'Tea'}
tables.table['T2'] = 'coffee'
tables.table.update({'T3': 'Milk'})
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk'}
So far everything is working as expected. However the next piece of code does not raise any error:
tables.table[1] = 2
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk', 1: 2}
I also changed the tables field name to __root__, but I see the same behavior with that change as well.
I also added validate_assignment = True to the model's Config, but that does not help either.
How can I get the model to validate the dict fields? Am I missing something basic here?
There are actually two distinct issues here that I'll address separately.
Mutating a dict on a Pydantic model
Observed behavior
from typing import Dict
from pydantic import BaseModel
class TableModel(BaseModel):
    table: Dict[str, str]

    class Config:
        validate_assignment = True
instance = TableModel(table={"a": "b"})
instance.table[1] = object()
print(instance)
Output: table={'a': 'b', 1: <object object at 0x7f7c427d65a0>}
Both key and value type clearly don't match our annotation of table. So, why does the assignment instance.table[1] = object() not cause a validation error?
Explanation
The reason is rather simple: There is no mechanism to enforce validation here. You need to understand what happens here from the point of view of the model.
A model can validate attribute assignment (if you configure validate_assignment = True). It does so by hooking into the __setattr__ method and running the value through the appropriate field validator(s).
But in that example above, we never called BaseModel.__setattr__. Instead, we called the __getattribute__ method that BaseModel inherits from object to access the value of instance.table. That returned the dictionary object ({"a": "b"}). And then we called the dict.__setitem__ method on that dictionary and added a key-value-pair of 1: object() to it.
The dictionary is just a regular old dictionary without any validation logic. And the mutation of that dictionary is completely obscure to the Pydantic model. It has no way of knowing that after accessing the object currently assigned to the table field, we changed something inside that object.
Validation would only be triggered, if we actually assigned a new object to the table field of the model. But that is not what happens here.
If we instead tried to do instance.table = {1: object()}, we would get a validation error because now we are actually setting the table attribute and trying to assign a value to it.
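To illustrate, here is a minimal sketch (Pydantic v1, assuming the TableModel with validate_assignment = True and the instance from above) showing that a real re-assignment does trigger validation:
from pydantic import ValidationError

try:
    instance.table = {1: object()}  # real attribute assignment -> __setattr__ -> field validation
except ValidationError as e:
    print(e)  # object() is not a valid str, so the assignment is rejected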
Possible workaround
Depending on how you intend to use the model, you could ensure that changes in the table dictionary will always happen "outside" of the model and are followed by a re-assignment in the form instance.table = .... I would say that is probably the most practical option. In general, re-parsing (subsets of) data should ensure consistency, if you mutated values. Something like this should work (i.e. cause an error):
tables.table[1] = 2
tables = TableModel.parse_obj(tables.dict())
Another option might be to play around and define your own subtype of Dict and add validation logic there, but I am not sure how much "reinventing the wheel" that might entail.
The most sophisticated option might be a descriptor-based approach, where instead of just calling __getattribute__, a custom descriptor intercepts the attribute access and triggers assignment validation. But that is just an idea; I have not tried it and don't know whether it might break other Pydantic magic.
Implicit type coercion
Observed behavior
from typing import Dict
from pydantic import BaseModel
class TableModel(BaseModel):
    table: Dict[str, str]
instance = TableModel(table={1: 2})
print(instance)
Output: table={'1': '2'}
Explanation
This is very easily explained. This is expected behavior and was put in place by choice. The idea is that if we can "simply" coerce a value to the specified type, we want to do that. Although you defined both the key and value type as str, passing an int for each is no big deal because the default string validator can just do str(1) and str(2) respectively.
Thus, instead of a validation error being raised, the table value ends up as {"1": "2"}.
Possible workaround
If you do not want this implicit coercion to happen, there are strict types that you can use in the annotation. In this case you could use table: Dict[StrictStr, StrictStr]. Then the previous example would indeed raise a validation error.
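A minimal sketch (Pydantic v1; StrictTableModel is just an illustrative name):
from typing import Dict
from pydantic import BaseModel, StrictStr, ValidationError

class StrictTableModel(BaseModel):
    table: Dict[StrictStr, StrictStr]

try:
    StrictTableModel(table={1: 2})
except ValidationError as e:
    print(e)  # both the key and the value are rejected instead of being coerced to strings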
Related
I generated a Pydantic model and would like to import it into SQLModel. Since said model does not inherit from the SQLModel class, it is not registered in the metadata, which is why
SQLModel.metadata.create_all(engine)
just ignores it.
In this discussion I found a way to manually add models:
SQLModel.metadata.tables["hero"].create(engine)
But doing so throws a KeyError for me.
SQLModel.metadata.tables["sopro"].create(engine)
KeyError: 'sopro'
My motivation for tackling the problem this way is that I want to generate an SQLModel from a simple dictionary like this:
model_dict = {"feature_a": int, "feature_b": str}
And in this SO answer, I found a working approach. Thank you very much in advance for your help!
As far as I know, it is not possible to simply convert an existing Pydantic model to an SQLModel at runtime. (At least as of now.)
There are a lot of things that happen during model definition. There is a custom meta class involved, so there is no way that you can simply substitute a regular Pydantic model class for a real SQLModel class, short of manually monkeypatching all the missing pieces.
That being said, you clarified that your actual motivation was to be able to dynamically create an SQLModel class at runtime from a dictionary of field definitions. Luckily, this is in fact possible. All you need to do is utilize the Pydantic create_model function and pass the correct __base__ and __cls_kwargs__ arguments:
from pydantic import create_model
from sqlmodel import SQLModel
field_definitions = {
    # your field definitions here
}

Hero = create_model(
    "Hero",
    __base__=SQLModel,
    __cls_kwargs__={"table": True},
    **field_definitions,
)
With that, SQLModel.metadata.create_all(engine) should create the corresponding database table according to your field definitions.
See this question for more details.
Be sure to use the correct form for the field definitions, as the example you gave would not be valid. As the documentation says, you need to define fields as 2-tuples of (type, default) (or just a default value):
model_dict = {
    "feature_a": (int, ...),
    "feature_b": (str, ...),
    "feature_c": 3.14,
}
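Putting the two pieces together, the full flow might look like this sketch (the id primary-key column and the in-memory SQLite engine are illustrative assumptions; SQLModel/SQLAlchemy need a primary key to map a table):
from typing import Optional
from pydantic import create_model
from sqlmodel import Field, SQLModel, create_engine

model_dict = {
    "id": (Optional[int], Field(default=None, primary_key=True)),
    "feature_a": (int, ...),
    "feature_b": (str, ...),
}

Sopro = create_model("Sopro", __base__=SQLModel, __cls_kwargs__={"table": True}, **model_dict)

engine = create_engine("sqlite://")
SQLModel.metadata.create_all(engine)  # creates the "sopro" table from the dynamic model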
Hope this helps.
so I've got this model:
class Action(models.Model):
    d_changes = ArrayField(models.FloatField(), default=list(), verbose_name='D Changes')
    w_changes = ArrayField(models.FloatField(), default=list(), verbose_name='A Changes')
And when I want to create a migration or a fixture I always receive the following warning for both fields:
backend.Action.d_changes: (postgres.E003) ArrayField default should be a callable instead of an instance so that it's not shared between all field instances.
HINT: Use a callable instead, e.g., use `list` instead of `[]`.
For my migrations it's not such a big deal, since everything still works fine. But when I try to create a fixture of my DB, this bit always ends up at the very top of my .json file:
System check identified some issues:
WARNINGS:
[33;1mbackend.Action.d_changes: (postgres.E003) ArrayField default should be a callable instead of an instance so that it's not shared between all field instances.
HINT: Use a callable instead, e.g., use `list` instead of `[]`.[0m
[33;1mbackend.Action.w_changes: (postgres.E003) ArrayField default should be a callable instead of an instance so that it's not shared between all field instances.
HINT: Use a callable instead, e.g., use `list` instead of `[]`.[0m
Which breaks my .json file and thus I cannot use loaddata, as I always receive a DeserializationError(), so I have to manually remove that part.
So what exactly is wrong with the model fields? I mean I'm literally using default=list() which is a callable?
Thanks for the help :)
You have to do this:
class Action(models.Model):
    d_changes = ArrayField(models.FloatField(), default=list, verbose_name='D Changes')
    w_changes = ArrayField(models.FloatField(), default=list, verbose_name='A Changes')
list() is not a callable; list is. list() has already been called, so its result (a single list instance) would be shared as the default between all field instances.
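A plain-Python illustration of the difference:
shared_default = list()   # default=list() evaluates this once, when the class is defined
row_a = shared_default    # every new row would get this same list object...
row_b = shared_default
row_a.append(1.5)
print(row_b)              # [1.5] - the value leaks between "instances"

# default=list passes the callable itself, so Django calls list() per row and
# each row starts with its own fresh, empty list.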
I have the following Enum:
class ModeEnum(str, Enum):
    """ mode """
    map = "map"
    cluster = "cluster"
    region = "region"
This enum is used in two Pydantic data structures.
In one data structure I need all Enum options.
In the other data structure I need to exclude region.
If I use custom validation for this and try to enter some other value, the standard validation error message says that all three values are allowed.
So what is the best approach in this situation?
P.S.
I use map as a member name in ModeEnum. Is that bad? I can't imagine a situation where it could shadow the built-in map object, but still, is it OK?
It's a little bit of a hack, but if you mark your validator with pre=True, you should be able to force it to run first, and then you can throw a custom error with the allowed values.
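A minimal sketch (Pydantic v1) of what that could look like, assuming the ModeEnum above; RestrictedRequest and the mode field are illustrative names:
from enum import Enum
from pydantic import BaseModel, validator

class ModeEnum(str, Enum):
    """ mode """
    map = "map"
    cluster = "cluster"
    region = "region"

class RestrictedRequest(BaseModel):
    mode: ModeEnum

    @validator("mode", pre=True)
    def mode_must_not_be_region(cls, value):
        # runs before the enum coercion, so we can report only the allowed subset
        allowed = {ModeEnum.map.value, ModeEnum.cluster.value}
        if value not in allowed:
            raise ValueError(f"mode must be one of {sorted(allowed)}")
        return value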
I am trying to extract some data from a JSON object generated by a graphql API response. My thought was to write a python class for the "parent" object, and then establish each piece of required data as an attribute.
For example, if I were to get the JSON object:
{
    'order_id': 'A1234567',
    'quantity': 4,
    'price': 100.99,
    'carrier': 'USPS',
    'tracking_id': 'ABC987654321',
    'fulfillment_status': 'FULFILLED'
}
I would assign each value as self.order_id, self.quantity, etc. in the class. I wanted to approach it this way because some of the json response is deeply nested and it seemed like a cleaner way to access and use the data rather than using a dictionary approach.
I suppose I could also just create variables for each datum, but the JSON object will almost always be a list of orders, and my thought was to make a list of instantiations of the class and then loop through them to "do stuff" with each instance.
The problem I have is that in some cases the API will not return a value for some of the keys, so I then cannot assign it to an attribute. I thought I could write try/except clauses for each attribute in the class, and assign None if there is no value, but that seemed insane (there are about 20 potential attributes that could/would need to be assigned for each instance).
I will then have some logic in the program afterwards to only do certain actions for 'fulfillment_status': 'Fulfilled', for example, that would not occur for 'fulfillment_status': 'UNFULFILLED', and either do something (or ignore) instances of the class that had None as the value for certain attributes.
Am I thinking about this the wrong way? It feels like it.
A dataclass can work for you (it supports default values):
from dataclasses import dataclass
data = {
    'order_id': 'A1234567',
    'quantity': 4,
    'price': 100.99,
    'carrier': 'USPS',
    'tracking_id': 'ABC987654321',
    'fulfillment_status': 'FULFILLED'
}
@dataclass
class OrderDetails:
    order_id: str
    quantity: int
    price: float
    carrier: str
    tracking_id: str
    fulfillment_status: str
order_details = OrderDetails(**data)
print(order_details)
output
OrderDetails(order_id='A1234567', quantity=4, price=100.99, carrier='USPS', tracking_id='ABC987654321', fulfillment_status='FULFILLED')
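If some keys may be missing from the response, a variant of the same dataclass with defaults (here None, as an assumption) handles that without any try/except:
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrderDetails:
    order_id: Optional[str] = None
    quantity: Optional[int] = None
    price: Optional[float] = None
    carrier: Optional[str] = None
    tracking_id: Optional[str] = None
    fulfillment_status: Optional[str] = None

partial = {'order_id': 'B7654321', 'fulfillment_status': 'UNFULFILLED'}
print(OrderDetails(**partial))  # missing keys fall back to None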
To me it seems like a solid idea and I definitely encourage working with classes instead of raw JSON data or plain dictionaries. You can get the values from the JSON objects with self.attribute = json_obj.get('attribute', 'default') method. This way if the attribute is absent or null, you can declare a default value to go well with whatever logic you have in place for processing the data. It also doesn't require any try/except blocks to implement.
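A minimal sketch of that approach (the class and attribute names are illustrative):
class Order:
    def __init__(self, json_obj):
        self.order_id = json_obj.get('order_id')              # None if missing
        self.quantity = json_obj.get('quantity', 0)            # or an explicit default
        self.fulfillment_status = json_obj.get('fulfillment_status', 'UNFULFILLED')

api_response = [
    {'order_id': 'A1234567', 'quantity': 4, 'fulfillment_status': 'FULFILLED'},
    {'order_id': 'B7654321'},  # some keys missing
]
orders = [Order(o) for o in api_response]
print(orders[1].fulfillment_status)  # 'UNFULFILLED'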
I'm reading the docs on JSONField, a special PostgreSQL field type. I intend to create a custom field that subclasses JSONField, with the added feature of being able to convert my Lifts class:
import collections  # on Python 3.10+, use collections.abc.Sequence instead

class Lifts(object):
    def __init__(self, series):
        for serie in series:
            if type(serie) != LiftSerie:
                raise TypeError("List passed to constructor should only contain LiftSerie objects")
        self.series = series

class AbstractSerie(object):
    def __init__(self, activity, amount):
        self.activity_name = activity.name
        self.amount = amount

    def pre_json(self):
        """A dict that can easily be turned into json."""
        pre_json = {
            self.activity_name: self.amount
        }
        return pre_json

    def __str__(self):
        return str(self.pre_json())

class LiftSerie(AbstractSerie):
    def __init__(self, lift, setlist):
        """lift should be an instance of LiftActivity.
        setlist is a list containing reps for each set
        that has been performed.
        """
        if not (isinstance(setlist, collections.Sequence) and not isinstance(setlist, str)):
            raise TypeError("setlist has to behave as a list and can not be a string.")
        super().__init__(lift, setlist)
I've read here that to_python() and from_db_value() are two methods on the Field class that are involved in loading values from the database and deserializing them. Also, the docstring of the to_python() method on the Field class says that it should be overridden by subclasses. So I looked in JSONField. Guess what, it doesn't override it. Also, from_db_value() isn't even defined on Field (and not on JSONField either).
So what is going on here? This is making it very hard to understand how JSONField takes values and turns them into json and stores them in the database, and then the opposite when we query the database.
A summary of my questions:
1. Why isn't to_python() overridden in JSONField?
2. Why isn't from_db_value() overridden in JSONField?
3. Why isn't from_db_value() even defined on Field?
4. How does JSONField go about taking a python dict, for example, converting it to a JSON string, and storing it in the database?
5. How does it do the opposite?
Sorry for many questions, but I really want to understand this and the docs are a bit lacking IMO.
For Django database fields, there are three relevant states/representations of the same data: form, python and database. In the case of the example HandField, the form/database representations are the same string, while the python representation is the Hand object instance.
In case of a custom field on top of JSONField, the internal python might be a LiftSerie instance, the form representation a json string, the value sent to the database a json string and the value received from the database a json structure converted by psycopg2 from the string returned by postgres, if that makes sense.
In terms of your questions:
1. The python value is not customized, so the python data type of the field is the same as the expected input. This is in contrast to the HandField example, where the input could be a string or a Hand instance; in the latter case, the base Field.to_python() implementation, which just returns the input, would be enough.
2. Psycopg2 already converts the database value to json, see 5. This is also true for other types like int/IntegerField.
3. from_db_value is not defined in the base Field class, but it is certainly taken into account if it exists. If you look at the implementation of Field.get_db_converters(), from_db_value is added to the converters if the Field has an attribute with that name (see the sketch after this list).
4. The django.contrib.postgres JSONField has an optional encoder argument. By default, it uses json.dumps without an encoder to convert a json structure to a JSON string.
5. psycopg2 automatically converts from database types to python types. It's called adaptation. The documentation for JSON adaptation explains how that works and how it can be customized.
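For reference, the hook mentioned in point 3 looks roughly like this in Django's base Field (a paraphrased sketch; check the source of your Django version):
def get_db_converters(self, connection):
    # from_db_value is only picked up as a converter if the field defines it
    if hasattr(self, "from_db_value"):
        return [self.from_db_value]
    return []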
Note that when implementing a custom field, I would suggest writing tests for it during development, especially if the mechanisms are not completely understood. You can get inspiration for such tests in for example django-localflavor.
The short answer is that both to_python and from_db_value return Python strings that should serialize to JSON with no encoding errors, all things being equal.
If you're okay with strings, that's fine, but I usually override Django's JSONField's from_db_value method to return a dict or a list, not a string, for use in my code. I created a custom field for that.
To me, the whole point of a JSON field is to be able to interact with its values as dicts or lists.
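A minimal sketch of such a custom field, assuming django.contrib.postgres and that the raw database value may arrive either as a JSON string or already decoded (the class name is illustrative; on Django older than 2.0, from_db_value also receives a context argument):
import json
from django.contrib.postgres.fields import JSONField

class DictJSONField(JSONField):
    def from_db_value(self, value, expression, connection):
        if value is None or isinstance(value, (dict, list)):
            return value           # already decoded (or NULL)
        return json.loads(value)   # decode the JSON string into a dict/list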