How to limit choices for pydantic using Enum - python

I have the following Enum options:
from enum import Enum

class ModeEnum(str, Enum):
    """ mode """
    map = "map"
    cluster = "cluster"
    region = "region"
This enum is used in two Pydantic data structures.
In one data structure I need all the Enum options.
In the other data structure I need to exclude region.
If I use custom validation for this and try to enter some other value, the standard validation error message says that the allowed values are all three.
So what is the best approach in this situation?
P.S.
I use a map member in ModeEnum. Is that bad? I can't imagine a situation where it could override the built-in map object, but still, is it OK?

It's a little bit of a hack, but if you mark your validator with pre=True, you should be able to force it to run first, and then you can throw a custom error with the allowed values.
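A minimal sketch of that idea, assuming Pydantic v1 syntax; the RestrictedModel name and its mode field are just for illustration:

from enum import Enum
from pydantic import BaseModel, validator

class ModeEnum(str, Enum):
    """ mode """
    map = "map"
    cluster = "cluster"
    region = "region"

class RestrictedModel(BaseModel):
    mode: ModeEnum

    @validator("mode", pre=True)
    def mode_must_not_be_region(cls, value):
        # Runs before the enum coercion, so we control the error message
        # and only advertise the values this model actually accepts.
        allowed = {ModeEnum.map.value, ModeEnum.cluster.value}
        if value not in allowed:
            raise ValueError(f"mode must be one of {sorted(allowed)}")
        return value

With that in place, RestrictedModel(mode="region") fails with the custom message, while a model that uses ModeEnum directly still accepts all three values.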

Pydantic does not validate the key/values of dict fields

I have the following simple data model:
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]
I want to add multiple tables like this:
tables = TableModel(table={'T1': 'Tea'})
print(tables) # table={'T1': 'Tea'}
tables.table['T2'] = 'coffee'
tables.table.update({'T3': 'Milk'})
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk'}
So far everything is working as expected. However the next piece of code does not raise any error:
tables.table[1] = 2
print(tables) # table={'T1': 'Tea', 'T2': 'coffee', 'T3': 'Milk', 1: 2}
I changed the table field name to __root__, but I see the same behavior.
I also added validate_assignment = True in the model Config, but that does not help either.
How can I get the model to validate the dict fields? Am I missing something basic here?
There are actually two distinct issues here that I'll address separately.
Mutating a dict on a Pydantic model
Observed behavior
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]

    class Config:
        validate_assignment = True

instance = TableModel(table={"a": "b"})
instance.table[1] = object()
print(instance)
Output: table={'a': 'b', 1: <object object at 0x7f7c427d65a0>}
Both key and value type clearly don't match our annotation of table. So, why does the assignment instance.table[1] = object() not cause a validation error?
Explanation
The reason is rather simple: There is no mechanism to enforce validation here. You need to understand what happens here from the point of view of the model.
A model can validate attribute assignment (if you configure validate_assignment = True). It does so by hooking into the __setattr__ method and running the value through the appropriate field validator(s).
But in that example above, we never called BaseModel.__setattr__. Instead, we called the __getattribute__ method that BaseModel inherits from object to access the value of instance.table. That returned the dictionary object ({"a": "b"}). And then we called the dict.__setitem__ method on that dictionary and added a key-value-pair of 1: object() to it.
The dictionary is just a regular old dictionary without any validation logic. And the mutation of that dictionary is completely obscure to the Pydantic model. It has no way of knowing that after accessing the object currently assigned to the table field, we changed something inside that object.
Validation would only be triggered if we actually assigned a new object to the table field of the model. But that is not what happens here.
If we instead tried to do instance.table = {1: object()}, we would get a validation error because now we are actually setting the table attribute and trying to assign a value to it.
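As a quick, self-contained illustration of that difference (a sketch, reusing the model from above):

from typing import Dict
from pydantic import BaseModel, ValidationError

class TableModel(BaseModel):
    table: Dict[str, str]

    class Config:
        validate_assignment = True

instance = TableModel(table={"a": "b"})
try:
    # Re-assigning the whole attribute goes through BaseModel.__setattr__,
    # so the field validators run and reject the bad value type.
    instance.table = {1: object()}
except ValidationError as exc:
    print(exc)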
Possible workaround
Depending on how you intend to use the model, you could ensure that changes in the table dictionary will always happen "outside" of the model and are followed by a re-assignment in the form instance.table = .... I would say that is probably the most practical option. In general, re-parsing (subsets of) data should ensure consistency if you mutated values. Something like this should work (i.e. cause an error):
tables.table[1] = 2
tables = TableModel.parse_obj(tables.dict())
Another option might be to play around and define your own subtype of Dict and add validation logic there, but I am not sure how much "reinventing the wheel" that might entail.
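For example, a rough sketch of that idea using Pydantic v1's custom-type hook (the StrDict name is made up here):

from pydantic import BaseModel

class StrDict(dict):
    """A dict that rejects non-str keys/values, even on later mutation."""

    def __setitem__(self, key, value):
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("StrDict only accepts str keys and str values")
        super().__setitem__(key, value)

    @classmethod
    def __get_validators__(cls):
        # Pydantic v1 hook: called to collect validators for this custom type.
        yield cls.validate

    @classmethod
    def validate(cls, value):
        if not isinstance(value, dict):
            raise TypeError("dict required")
        checked = cls()
        for key, val in value.items():
            checked[key] = val  # goes through __setitem__, so it is type-checked
        return checked

class TableModel(BaseModel):
    table: StrDict

instance = TableModel(table={"a": "b"})
instance.table["c"] = "d"     # fine
instance.table[1] = object()  # now fails at mutation time

The trade-off is that mutation errors surface as a plain TypeError rather than a Pydantic ValidationError.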
The most sophisticated option could maybe be a descriptor-based approach, where instead of just calling __getattribute__, a custom descriptor intercepts the attribute access and triggers the assignment validation. But that is just an idea. I have not tried this and don't know if that might break other Pydantic magic.
Implicit type coercion
Observed behavior
from typing import Dict
from pydantic import BaseModel

class TableModel(BaseModel):
    table: Dict[str, str]

instance = TableModel(table={1: 2})
print(instance)
Output: table={'1': '2'}
Explanation
This is very easily explained. This is expected behavior and was put in place by choice. The idea is that if we can "simply" coerce a value to the specified type, we want to do that. Although you defined both the key and value type as str, passing an int for each is no big deal because the default string validator can just do str(1) and str(2) respectively.
Thus, instead of raising a validation error, the table value ends up as {"1": "2"}.
Possible workaround
If you do not want this implicit coercion to happen, there are strict types that you can use for the annotation. In this case you could use table: Dict[StrictStr, StrictStr]. Then the previous example would indeed raise a validation error.
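A short sketch of that, reusing the example model:

from typing import Dict
from pydantic import BaseModel, StrictStr

class TableModel(BaseModel):
    table: Dict[StrictStr, StrictStr]

TableModel(table={1: 2})  # now raises a ValidationError instead of coercing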

Initialize classes from POST JSON data

I am writing a Django app which will send some data from the site to a Python script for processing. I am planning on sending this data as a JSON string (this need not be the case). Some of the values sent over would ideally be class instances; however, this is clearly not possible, so the class name plus any arguments needed to initialize the class must somehow be serialized into a JSON value before being deserialized by the Python script. This could be achieved with the code below, but it has several problems:
My attempt
I have put all the data needed for each class in a list and used that to initialize each class:
import json

class Class1():
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._others = args

class Bar():
    POTENTIAL_OBJECTS = {"RANGE": range,
                         "Class1": Class1}

    def __init__(self, json_string):
        python_dict = json.loads(json_string)
        for key, value in python_dict.items():
            if isinstance(value, list) and value[0] in Bar.POTENTIAL_OBJECTS:
                setattr(self, key, Bar.POTENTIAL_OBJECTS[value[0]](*value[1], **value[2]))
            else:
                setattr(self, key, value)

example = ('{ "key_1":"Some string", "key_2":["heres", "a", "list"],'
           '"key_3":["RANGE", [10], {}], "key_4":["Class1", ["stuff"], {"stuff2":"x"}] }')
a = Bar(example)
The Problems with my approach
Apart from generally being a bit messy and not particularly elegant, there are other problems. Some of the lists in the JSON object will be generated by the user, and this obviously presents problems if the user uses a key from POTENTIAL_OBJECTS. (In a non-simplified version, Bar will have lots of subclasses, each with its own POTENTIAL_OBJECTS, so keeping track of all the potential values for front-end validation would be tricky.)
My Question
It feels like this must be a reasonably common thing that is needed and there must be some standard patterns or ways of achieving this. Is there a common/better approach/method to achieve this?
EDIT: I have realised one way round the problem is to make all the keys in POTENTIAL_OBJECTS start with an underscore, and then validate against any underscores in user input at the front end. It still seems like there must be a better way to deserialize JSON into more complex objects than strings/ints/bools/lists etc.
Instead of having one master method to turn any arbitrary JSON into an arbitrary hierarchy of Python objects, the typical pattern would be to create a Django model for each type of thing you are trying to model. Relationships between them would then be modeled via relationship fields (ForeignKey, ManyToMany, etc, as appropriate). For instance, you might create a class Employee that models an employee, and a class Paycheck. Paycheck could then have a ForeignKey field named issued_to that refers to an Employee.
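A minimal sketch of that pattern (the field names besides issued_to are just illustrative; this would live in an app's models.py):

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=100)

class Paycheck(models.Model):
    # Each paycheck refers to exactly one employee.
    issued_to = models.ForeignKey(Employee, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=10, decimal_places=2)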
Note also that any scheme similar to the one you describe (where user-created JSON is translated directly into arbitrary Python objects) would have security implications, potentially allowing users to execute arbitrary code in the context of the Django server. If you were to attempt it anyway, the whitelist approach you have started here would be a decent place to start as a way to do it safely.
In short, you're reinventing most of what Django already does for you. The Django ORM features will help you to create models of the specific things you are interested in, validate the data, turn those data into Python objects safely, and even save instances of these models in the database for retrieval later.
That said, if you were to parse a JSON string directly into an object hierarchy, you would have to do a full traversal instead of just going over the top-level items. To do that, you should look into something like a depth-first traversal, creating new model instances at each new node in the hierarchy. And if you also want to validate these inputs on the client side, you'd need to replicate this work in JavaScript.
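If you do go down that road anyway, a rough sketch of such a depth-first, whitelist-based rebuild might look like this (the WHITELIST mapping and the build name are hypothetical):

import json

# Hypothetical whitelist of constructors, mirroring POTENTIAL_OBJECTS.
WHITELIST = {"RANGE": range}

def build(node):
    """Depth-first walk: rebuild nested structures bottom-up."""
    if isinstance(node, dict):
        return {key: build(value) for key, value in node.items()}
    if isinstance(node, list):
        if len(node) == 3 and isinstance(node[0], str) and node[0] in WHITELIST:
            tag, args, kwargs = node
            return WHITELIST[tag](*[build(a) for a in args],
                                  **{k: build(v) for k, v in kwargs.items()})
        return [build(item) for item in node]
    return node

data = build(json.loads('{"key_3": ["RANGE", [10], {}]}'))
print(data)  # {'key_3': range(0, 10)}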

Trying to understand JSONField for django postgresql

I'm reading the docs on JSONField, a special PostgreSQL field type. I intend to create a custom field that subclasses JSONField, with the added feature of being able to convert my Lifts class:
import collections

class Lifts(object):
    def __init__(self, series):
        for serie in series:
            if type(serie) != LiftSerie:
                raise TypeError("List passed to constructor should only contain LiftSerie objects")
        self.series = series

class AbstractSerie(object):
    def __init__(self, activity, amount):
        self.activity_name = activity.name
        self.amount = amount

    def pre_json(self):
        """A dict that can easily be turned into json."""
        pre_json = {
            self.activity_name:
                self.amount
        }
        return pre_json

    def __str__(self):
        return str(self.pre_json())

class LiftSerie(AbstractSerie):
    def __init__(self, lift, setlist):
        """ lift should be an instance of LiftActivity.
        setlist is a list containing reps for each set
        that has been performed.
        """
        if not (isinstance(setlist, collections.Sequence) and not isinstance(setlist, str)):
            raise TypeError("setlist has to behave as a list and can not be a string.")
        super().__init__(lift, setlist)
I've read here that to_python() and from_db_value() are two methods on the Field class that are involved in loading values from the database and deserializing them. Also, the docstring of the to_python() method on the Field class says that it should be overridden by subclasses. So I looked in JSONField. Guess what, it doesn't override it. Also, from_db_value() isn't even defined on Field (and not on JSONField either).
So what is going on here? This is making it very hard to understand how JSONField takes values and turns them into json and stores them in the database, and then the opposite when we query the database.
A summary of my questions:
1. Why isn't to_python() overridden in JSONField?
2. Why isn't from_db_value() overridden in JSONField?
3. Why isn't from_db_value() even defined on Field?
4. How does JSONField go about taking a python dict for example, converting it to a JSON string, and storing it in the database?
5. How does it do the opposite?
Sorry for many questions, but I really want to understand this and the docs are a bit lacking IMO.
For Django database fields, there are three relevant states/representations of the same data: form, python and database. In the case of the HandField example, the form and database representations are the same string, while the python representation is the Hand object instance.
In the case of a custom field on top of JSONField, the internal python value might be a LiftSerie instance, the form representation a JSON string, the value sent to the database a JSON string, and the value received from the database a JSON structure that psycopg2 has already converted from the string returned by Postgres, if that makes sense.
In terms of your questions:
1. The python value is not customized, so the python data type of the field is the same as the expected input. This is in contrast to the HandField example, where the input could be a string or a Hand instance. In the latter case, the base Field.to_python() implementation, which just returns the input, would be enough.
2. Psycopg2 already converts the database value to json, see 5. This is also true for other types like int/IntegerField.
3. from_db_value is not defined in the base Field class, but it is certainly taken into account if it exists. If you look at the implementation of Field.get_db_converters(), from_db_value is added to the converters if the Field has an attribute with that name.
4. django.contrib.postgres.JSONField has an optional encoder argument. By default, it uses json.dumps without a custom encoder to convert a json structure to a JSON string.
5. psycopg2 automatically converts database types to python types. This is called adaptation. The documentation for JSON adaptation explains how that works and how it can be customized.
Note that when implementing a custom field, I would suggest writing tests for it during development, especially if the mechanisms are not completely understood. You can find inspiration for such tests in, for example, django-localflavor.
The short answer is: both to_python and from_db_value return python strings that should serialize to JSON with no encoding errors, all other things being equal.
If you're okay with strings, that's fine, but I usually override Django JSONField's from_db_value method to return a dict or a list, not a string, for use in my code. I created a custom field for that.
To me, the whole point of a JSON field is to be able to interact with its values as dicts or lists.
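For reference, a sketch of such a custom field, assuming a Django 2.x-style from_db_value() signature (the DictJSONField name is made up):

import json
from django.contrib.postgres.fields import JSONField

class DictJSONField(JSONField):
    """Always hand parsed data (dict/list) back to Python code."""

    def from_db_value(self, value, expression, connection):
        # Some setups hand back a raw JSON string; parse it so model
        # attributes are always dicts/lists rather than strings.
        if isinstance(value, str):
            return json.loads(value)
        return value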

Overriding methods for defining custom model field in django

I have been trying to define a custom Django model field in Python. I referred to the Django docs at the following location: https://docs.djangoproject.com/en/1.10/howto/custom-model-fields/. However, I am confused over the following methods (which I have divided into groups as per my understanding):
Group 1 (Methods in this group are inter-related as per docs)
__init__()
deconstruct()
Group 2
db_type()
rel_db_type()
get_internal_type()
Group 3
from_db_value()
to_python()
get_prep_value()
get_db_prep_value()
get_db_prep_save()
value_from_object()
value_to_string()
Group 4
formfield
I have the following questions:
When is deconstruct() used? The docs say it's useful during migration, but it's not clearly explained. Moreover, when is it called?
What is the difference between db_type() and get_internal_type()?
What is the difference between get_prep_value() and get_db_prep_value()?
What is the difference between value_from_object() and value_to_string()? value_from_object() is not covered in the docs.
from_db_value(), value_to_string() and to_python() all give a python object from a string. Then why do these different methods exist?
I know I have asked a bit of a lengthy question, but I couldn't find a better way to ask it.
Thanks in advance.
I'll try to answer them:
Q: When is deconstruct() used?
A: This method is used when you have an instance of your Field and it has to be re-created based on the arguments you originally passed to __init__.
As mentioned in the docs, if you are setting the max_length arg to a static value in your __init__ method, you do not need it for your instances, so you can delete it in your deconstruct() method. With this, max_length won't show up in the serialized arguments (e.g. in migrations) while you are using the field in your models. You can think of deconstruct as a last clean-up and control point before your field is used in a model.
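A short sketch of that, modelled on the docs' example (the CommaSepField name and the fixed max_length are just illustrative):

from django.db import models

class CommaSepField(models.CharField):

    def __init__(self, *args, **kwargs):
        kwargs["max_length"] = 25  # always the same, so not configurable
        super().__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        # max_length is hard-coded above, so keep it out of the serialized
        # arguments that migrations will use to re-create this field.
        del kwargs["max_length"]
        return name, path, args, kwargs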
Q: Difference between db_type() and get_internal_type()
A: They are related, but operate at different levels.
If your custom field's data type depends on which DB you are using, db_type() is the place where you can handle that. Again, as mentioned in the docs, if your field holds a date/time value, you should/may check whether the current database is PostgreSQL or MySQL in this method, because date/time values are stored as timestamp in PostgreSQL but as datetime in MySQL.
The get_internal_type() method is a kind of higher-level version of db_type(). Let's continue with the date/time example: if you don't want to check and handle each database's data types yourself, you can reuse the column type of a built-in Django field. Instead of deciding whether it should be datetime or timestamp, you can simply return "DateField" from your get_internal_type() method. As mentioned in the docs, if you've already created a db_type() method, in most cases you do not need get_internal_type().
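A sketch of both approaches for the date/time example (illustrative field names; exact column type names depend on the backend):

from django.db import models

class TimestampField(models.Field):
    # Low-level: pick the column type per database backend.
    def db_type(self, connection):
        if connection.vendor == "postgresql":
            return "timestamp"
        return "datetime"

class SimpleDateField(models.Field):
    # Higher-level: reuse the column type of a built-in Django field.
    def get_internal_type(self):
        return "DateField"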
Q: Difference between get_prep_value() and get_db_prep_value()
A: These two have a relationship similar to the one between db_type() and get_internal_type(). Both methods are for converting python objects into values the database can use (not the other way around), but, like db_type(), get_db_prep_value() is the backend-specific step, while get_prep_value() is the general, backend-independent preparation.
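Roughly, the two hooks look like this in a custom field (a sketch; the str() conversion is just a placeholder):

from django.db import models

class MyField(models.Field):
    def get_prep_value(self, value):
        # Backend-independent: python object -> value usable in queries.
        return str(value)

    def get_db_prep_value(self, value, connection, prepared=False):
        # Backend-specific: may inspect `connection` to adjust the value.
        if not prepared:
            value = self.get_prep_value(value)
        return value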
Q: Difference between value_from_object() and value_to_string(). value_from_object() is not given in docs
A: From the docs:
To customize how the values are serialized by a serializer, you can override value_to_string(). Using value_from_object() is the best way to get the field's value prior to serialization.
So we don't actually need to override value_from_object(), as documented. This method is used to get the field's raw value before serialization: get the value with this method, and customize how it should be serialized in value_to_string(). They even include example code in the docs.
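The pattern from the docs boils down to something like this (sketch):

from django.db import models

class MyField(models.Field):
    def value_to_string(self, obj):
        # value_from_object() pulls the raw field value off the model
        # instance; we then decide how it should be serialized.
        value = self.value_from_object(obj)
        return self.get_prep_value(value)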
Q: from_db_value(), value_to_string() and to_python() all give a python object from a string. Then why do these different methods exist?
A: While to_python() converts the field value to a valid python object, value_to_string() converts field values to strings with your custom serialization. They do different jobs.
And from_db_value() converts the value returned by the database to a python object. I had actually never heard of it, but check this part of the docs:
This method is not used for most built-in fields as the database backend already returns the correct Python type, or the backend itself does the conversion.

Python: Assign an object's variable from a function (OpenERP)

I'm working on a OpenERP environment, but maybe my issue can be answered from a pure python perspective. What I'm trying to do is define a class whose "_columns" variable can be set from a function that returns the respective dictionary. So basically:
class repos_report(osv.osv):
    _name = "repos.report"
    _description = "Reposition"
    _auto = False

    def _get_dyna_cols(self):
        ret = {}
        cr = self.cr
        cr.execute('Select ... From ...')
        pass  # <- Fill dictionary
        return ret

    _columns = _get_dyna_cols()

    def init(self, cr):
        pass  # Other stuff here too, but I need to set my _columns before as per openerp

repos_report()
I have tried many ways, but this code reflects my basic need. When I execute my module for installation I get the following error:
TypeError: _get_dyna_cols() takes exactly 1 argument (0 given)
When defining the _get_dyna_cols function I'm required to have self as the first parameter (even before executing). Also, I need a reference to OpenERP's 'cr' cursor in order to query data to fill my _columns dictionary. So, how can I call this function so that it can be assigned to _columns? What parameter could I pass to this function?
From an OpenERP perspective, I guess I made my need quite clear. So any other approach suggested is also welcome.
From an OpenERP perspective, the right solution depends on what you're actually trying to do, and that's not quite clear from your description.
Usually the _columns definition of a model must be static, since it will be introspected by the ORM and (among other things) will result in the creation of corresponding database columns. You could set the _columns in the __init__ method (not init, see note [1] below) of your model, but that would not make much sense, because the result must not change over time (and it will only get called once when the model registry is initialized anyway).
Now there are a few exceptions to the "static columns" rules:
Function Fields
When you simply want to dynamically handle read/write operations on a virtual column, you can simply use a column of the fields.function type. It needs to emulate one of the other field types, but can do anything it wants with the data dynamically. Typical examples will store the data in other (real) columns after some pre-processing. There are hundreds of examples in the official OpenERP modules.
Dynamic columns set
When you are developing a wizard model (a subclass of TransientModel, formerly osv_memory), you don't usually care about the database storage, and simply want to obtain some input from the user and take corresponding actions.
It is not uncommon in that case to need a completely dynamic set of columns, where the number and types of the columns may change every time the model is used. This can be achieved by overriding a few key API methods to simulate dynamic columns:
fields_view_get is the API method that is called by the clients to obtain the definition of a view (form/tree/...) for the model.
fields_get is included in the result of fields_view_get but may be called separately, and returns a dict with the columns definition of the model.
search, read, write and create are called by the client in order to access and update record data, and should gracefully accept or return values for the columns that were defined in the result of fields_get
By overriding properly these methods, you can completely implement dynamic columns, but you will need to preserve the API behavior, and handle the persistence of the data (if any) yourself, in real static columns or in other models.
There are a few examples of such dynamic columns sets in the official addons, for example in the survey module that needs to simulate survey forms based on the definition of the survey campaign.
[1] The init() method is only called when the model's module is installed or updated, in order to set up/update the database backend for this model. It relies on the _columns to do this.
When you write _columns = _get_dyna_cols() in the class body, that function call is made right there, in the class body, as Python is still parsing the class itself. At that point, your _get_dyna_cols method is just a function object in the local (class body) namespace, and it is called.
The error message you get is due to the missing self parameter, which is inserted only when you access your function as a method. But this error message is not what is really wrong here: what is wrong is that you are making an immediate function call and expecting special behavior, like late execution.
The way in Python to achieve what you want, i.e. to have the method called automatically when the _columns attribute is accessed, is to use the property built-in.
In this case, just do this: _columns = property(_get_dyna_cols)
This will create a class attribute named _columns which, through a mechanism called the "descriptor protocol", will call the desired method whenever the attribute is accessed from an instance.
To learn more about the property built-in, check the docs: http://docs.python.org/library/functions.html#property
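A tiny standalone sketch of that mechanism, outside OpenERP:

class Example(object):
    def _get_dyna_cols(self):
        # Built lazily, every time the attribute is read on an instance.
        return {"name": "value"}

    # Descriptor: reading instance._columns calls _get_dyna_cols(instance).
    _columns = property(_get_dyna_cols)

e = Example()
print(e._columns)  # {'name': 'value'}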
