I'm reading the docs on JSONField, a PostgreSQL-specific field type, because I intend to create a custom field that subclasses JSONField and adds the ability to convert my Lifts class:
import collections


class Lifts(object):
    def __init__(self, series):
        for serie in series:
            if type(serie) != LiftSerie:
                raise TypeError("List passed to constructor should only contain LiftSerie objects")
        self.series = series


class AbstractSerie(object):
    def __init__(self, activity, amount):
        self.activity_name = activity.name
        self.amount = amount

    def pre_json(self):
        """A dict that can easily be turned into json."""
        pre_json = {
            self.activity_name: self.amount
        }
        return pre_json

    def __str__(self):
        return str(self.pre_json())


class LiftSerie(AbstractSerie):
    def __init__(self, lift, setlist):
        """lift should be an instance of LiftActivity.
        setlist is a list containing reps for each set
        that has been performed.
        """
        if not (isinstance(setlist, collections.Sequence) and not isinstance(setlist, str)):
            raise TypeError("setlist has to behave as a list and can not be a string.")
        super().__init__(lift, setlist)
I've read here that to_python() and from_db_value() are two methods on the Field class that are involved in loading values from the database and deserializing them. Also, the docstring of the to_python() method on the Field class says that it should be overridden by subclasses. So I looked in JSONField. Guess what, it doesn't override it. Also, from_db_value() isn't even defined on Field (nor on JSONField).
So what is going on here? This is making it very hard to understand how JSONField takes values, turns them into JSON and stores them in the database, and then does the opposite when we query the database.
A summary of my questions:
1. Why isn't to_python() overridden in JSONField?
2. Why isn't from_db_value() overridden in JSONField?
3. Why isn't from_db_value() even defined on Field?
4. How does JSONField go about taking a Python dict, for example, converting it to a JSON string and storing it in the database?
5. How does it do the opposite?
Sorry for the many questions, but I really want to understand this and the docs are a bit lacking IMO.
For Django database fields, there are three relevant states/representations of the same data: form, python and database. In the case of the example HandField, the form and database representations are the same string, and the python representation is the Hand object instance.
In the case of a custom field on top of JSONField, the internal python representation might be a LiftSerie instance, the form representation a JSON string, the value sent to the database a JSON string, and the value received from the database a JSON structure converted by psycopg2 from the string returned by postgres, if that makes sense.
In terms of your questions:
1. The python value is not customized, so the python data type of the field is the same as the expected input. This is in contrast to the HandField example, where the input could be a string or a Hand instance. In such a case, the base Field.to_python() implementation, which just returns the input, would be enough.
2. Psycopg2 already converts the database value to the corresponding Python structure, see point 5. This is also true for other types, like int/IntegerField.
3. from_db_value is not defined in the base Field class, but it is certainly taken into account if it exists. If you look at the implementation of Field.get_db_converters(), from_db_value is added to the converters if the field has an attribute with that name (see the first sketch below this list).
4. django.contrib.postgres.fields.JSONField has an optional encoder argument. By default, it uses json.dumps without a custom encoder to convert a JSON structure to a JSON string.
5. psycopg2 automatically converts from database types to Python types. This is called adaptation. The documentation for JSON adaptation explains how that works and how it can be customized (see the second sketch below this list).
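To illustrate point 3, the hook is very small; this is roughly what Field.get_db_converters() looks like in Django 1.x/2.x (simplified, check your version's source):

def get_db_converters(self, connection):
    if hasattr(self, 'from_db_value'):
        return [self.from_db_value]
    return []

And for point 5, a hedged sketch of customizing psycopg2's JSON adaptation (register_default_json/register_default_jsonb are real psycopg2.extras helpers; parsing numbers as Decimal is just an example tweak, not something Django requires):

import json
from decimal import Decimal

import psycopg2.extras


def custom_loads(value):
    # Example customization: JSON numbers come back as Decimal, not float.
    return json.loads(value, parse_float=Decimal)


psycopg2.extras.register_default_json(globally=True, loads=custom_loads)
psycopg2.extras.register_default_jsonb(globally=True, loads=custom_loads)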
Note that when implementing a custom field, I would suggest writing tests for it during development, especially if the mechanisms are not completely understood. You can get inspiration for such tests in for example django-localflavor.
The short answer is that both to_python and from_db_value return python strings that should serialize to JSON with no encoding errors, all things being equal.
If you're okay with strings, that's fine, but I usually override Django JSONField's from_db_value method to return a dict or a list, not a string, for use in my code. I created a custom field for that.
To me, the whole point of a JSON field is being able to interact with its values as dicts or lists.
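A minimal sketch of that kind of custom field (DictJSONField is a made-up name, and this is not the library's own implementation; the from_db_value signature shown is the Django 1.x one, newer versions drop the trailing context argument):

import json

from django.contrib.postgres.fields import JSONField


class DictJSONField(JSONField):
    """Sketch: always hand dicts/lists (never raw strings) to Python code."""

    def from_db_value(self, value, expression, connection, context=None):
        if value is None:
            return value
        if isinstance(value, str):
            # Defensive: psycopg2 normally decodes json/jsonb columns already.
            return json.loads(value)
        return value

From here you could go one step further and build a Lifts instance out of the decoded structure, which is what the question is ultimately after.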
Related
I am writing a REST API that will store several complex objects to AWS DynamoDB and then, when requested, retrieve them, perform computations on them, and return a result. Here is a bit of extracted, simplified, renamed pseudo-code.
import boto3
from flask import Flask, request, jsonify
from marshmallow import Schema, fields, post_load


class Widget:
    def __init__(self, height, weight):
        self.height = height
        self.weight = weight


class Machine:
    def __init__(self, widgets):
        self.widgets = widgets

    def useful_method(self):
        return "something great"


class WidgetSchema(Schema):
    height = fields.Decimal()
    weight = fields.Decimal()

    @post_load
    def make_widget(self, data):
        return Widget(**data)


class MachineSchema(Schema):
    widgets = fields.List(fields.Nested(WidgetSchema))

    @post_load
    def make_machine(self, data):
        return Machine(**data)


app = Flask(__name__)
dynamodb = boto3.resource("dynamodb", ...)


@app.route("/machine/<uuid:machine_id>", methods=['POST'])
def create_machine(machine_id):
    input_json = request.get_json()
    validated_input = MachineSchema().load(input_json)
    # NOTE: validated_input should be a Python dict which
    # contains Decimals instead of floats, for storage in DynamoDB.
    validated_input['id'] = machine_id
    dynamodb.Table('machine').put_item(Item=validated_input)
    return jsonify({"status": "success", "error_message": ""})


@app.route("/machine/<uuid:machine_id>", methods=['GET'])
def get_machine(machine_id):
    result = dynamodb.Table('machine').get_item(Key={'id': str(machine_id)})
    return jsonify(result['Item'])


@app.route("/machine/<uuid:machine_id>/compute", methods=['GET'])
def compute_machine(machine_id):
    result = dynamodb.Table('machine').get_item(Key={'id': str(machine_id)})
    validated_input = MachineSchema().load(result['Item'])
    # NOTE: validated_input should be a Machine object
    # which has made use of the post_load
    return jsonify(validated_input.useful_method())
The issue with this is that I need to have my Marshmallow schema pull double duty. For starters, in the create_machine function, I need the schema to ensure that the user calling my REST API has passed me a properly formed object with no extra fields, meeting all required fields, etc. I need to make sure I'm not storing invalid junk in the DB, after all. It also needs to recursively crawl the input JSON and translate all of the JSON values to the right type. For example, floats are not supported in Dynamo, so they need to be Decimals, as shown here. This is something Marshmallow makes pretty easy. If there was no post_load, this is exactly what would be produced as validated_input.
The second job of the schema is that it needs to take the Python object retrieved from DynamoDB, which looks almost exactly like the user input JSON except that floats are Decimals, and translate it into my Python objects, Machine and Widget. This is where I'll need to read the object again, but this time use the post_load to create objects. In this case, however, I do not want my numbers to be Decimals. I'd like them to be standard Python floats.
I could write two totally different Marshmallow schemas for this and be done with it, clearly. One would have Decimals for the height and weight and one would have just floats. One would have post_loads for every object and one would have none. But writing two nearly identical schemas is a huge pain. My schema definitions are several hundred lines long. Inheriting a DB version with a post_load didn't seem like the right direction, because I would need to change every fields.Nested to point to the correct class. For example, even if I inherited MachineSchemaDBVersion from MachineSchema and added a post_load, MachineSchemaDBVersion would still reference WidgetSchema, not some DB version of WidgetSchema, unless I overrode the widgets field as well.
I could potentially derive my own Schema class and pass a flag for whether we are in DB mode or not; a rough sketch of that idea is below.
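For what it's worth, a sketch of that flag idea using marshmallow's schema context (the db_mode key and the specific behavior are made up; the post_load signature with **kwargs works for both marshmallow 2.x and 3.x):

from marshmallow import Schema, fields, post_load


class WidgetSchema(Schema):
    height = fields.Decimal()
    weight = fields.Decimal()

    @post_load
    def make_widget(self, data, **kwargs):
        if self.context.get('db_mode'):
            # DB mode: keep the plain dict, Decimals and all.
            return data
        # Object mode: build the domain object with plain floats.
        return Widget(**{key: float(value) for key, value in data.items()})


db_schema = WidgetSchema()
db_schema.context['db_mode'] = True   # load() now produces dicts with Decimals
obj_schema = WidgetSchema()           # load() produces Widget objects with floats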
How are people generally handling this issue of wanting to store REST API input more or less directly to a DynamoDB with some validation and then use that data later to construct Python objects for a computation?
One method I have tried is to have my schema always instantiate my Python objects and then dump them to the database using dumps from a fully constructed object. The problem with this is that the computation library's objects, in my example Machine and Widget, do not have all the fields that I need to store in the database, like the IDs, or names, or descriptions. The objects are made specifically for doing the computations.
I ended up finding a solution to this. Effectively, what I've done is to make the Marshmallow schema exclusively responsible for translation from DynamoDB into the Python objects. All Schema classes have @post_load methods that translate into the Python objects, and all fields are declared with the type they need to be in the Python world, not the database world.
When validating the input from the REST API and ensuring that no bad data is allowed to get into the database, I call MySchema().validate(input_json), check to see that there are no errors, and if not, dump the input_json into the database.
This leaves only one extra problem, which is that the input_json needs to be cleaned up for entry into the database, which I was previously doing with Marshmallow. However, this can also easily be done by adjusting my JSON decoder to parse floats as Decimals.
So in summary, my JSON decoder is doing the work of recursively walking the data structure and converting Float to Decimal separately from Marshmallow. Marshmallow is running a validate on the fields of every object, but the results are only checked for errors. The original input is then dumped into the database.
I needed to add this line to do the conversion to Decimal.
from functools import partial
import decimal
import flask

app.json_decoder = partial(flask.json.JSONDecoder, parse_float=decimal.Decimal)
My create function now looks like this. Notice how the original input_json, parsed by my updated JSON decoder, is inserted directly into the database, rather than any munged output from Marshmallow.
from json import dumps


@app.route("/machine/<uuid:machine_id>", methods=['POST'])
def create_machine(machine_id):
    input_json = request.get_json()  # Already ready to be DB input as is.
    errors = MachineSchema().validate(input_json)
    if errors:
        return jsonify({"status": "failure", "message": dumps(errors)})
    else:
        input_json['id'] = machine_id
        dynamodb.Table('machine').put_item(Item=input_json)
        return jsonify({"status": "success", "error_message": ""})
I am writing a Django app which will send some data from the site to a python script to process. I am planning on sending this data as a JSON string (this need not be the case). Some of the values sent over would ideally be class instances; however, this is clearly not possible, so the class name plus any arguments needed to initialize the class must somehow be serialized into a JSON value before being deserialized by the python script. This could be achieved with the code below, but it has several problems:
My attempt
I have put all the data needed for each class in a list and used that to initialize each class:
import json


class Class1():
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._others = args


class Bar():
    POTENTIAL_OBJECTS = {"RANGE": range,
                         "Class1": Class1}

    def __init__(self, json_string):
        python_dict = json.loads(json_string)
        for key, value in python_dict.items():
            if isinstance(value, list) and value[0] in Bar.POTENTIAL_OBJECTS:
                setattr(self, key, Bar.POTENTIAL_OBJECTS[value[0]](*value[1], **value[2]))
            else:
                setattr(self, key, value)


example = ('{ "key_1":"Some string", "key_2":["heres", "a", "list"],'
           '"key_3":["RANGE", [10], {}], "key_4":["Class1", ["stuff"], {"stuff2":"x"}] }')

a = Bar(example)
The Problems with my approach
Apart from generally being a bit messy and not particularly elegant, there are other problems. Some of the lists in the JSON object will be generated by the user, and this obviously presents problems if the user uses a key from POTENTIAL_OBJECTS. (In a non-simplified version, Bar will have lots of subclasses, each with its own POTENTIAL_OBJECTS, so keeping track of all the potential values for front-end validation would be tricky.)
My Question
It feels like this must be a reasonably common thing that is needed and there must be some standard patterns or ways of achieving this. Is there a common/better approach/method to achieve this?
EDIT: I have realised one way round the problem is to make all the keys in POTENTIAL_OBJECTS start with an underscore, and then validate against any underscores in user input at the front-end. It still seems like there must be a better way to deserialize from JSON to more complex objects than strings/ints/bools/lists etc.
Instead of having one master method to turn any arbitrary JSON into an arbitrary hierarchy of Python objects, the typical pattern would be to create a Django model for each type of thing you are trying to model. Relationships between them would then be modeled via relationship fields (ForeignKey, ManyToMany, etc, as appropriate). For instance, you might create a class Employee that models an employee, and a class Paycheck. Paycheck could then have a ForeignKey field named issued_to that refers to an Employee.
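For example, a minimal sketch of the models this paragraph describes (field names beyond issued_to are just illustrative):

from django.db import models


class Employee(models.Model):
    name = models.CharField(max_length=100)


class Paycheck(models.Model):
    issued_to = models.ForeignKey(Employee, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    issued_on = models.DateField()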
Note also that any scheme similar to the one you describe (where user-created JSON is translated directly into arbitrary Python objects) has security implications, potentially allowing users to execute arbitrary code in the context of the Django server. If you were to attempt it anyway, the whitelist approach you have started here would be a decent place to start as a way to do it safely.
In short, you're reinventing most of what Django already does for you. The Django ORM features will help you to create models of the specific things you are interested in, validate the data, turn those data into Python objects safely, and even save instances of these models in the database for retrieval later.
That said, if you do want to parse a JSON string directly into an object hierarchy, you would have to do a full traversal instead of just going over the top-level items. To do that, you should look into something like a depth-first traversal, creating new model instances at each new node in the hierarchy; and if you also want to validate these inputs in the browser, you'd need to replicate that work in Javascript.
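If you do go that route, a very rough, hypothetical sketch of such a traversal might look like this (the model_map whitelist and the "type" key are made-up conventions, not a standard API):

def build_objects(node, model_map):
    """Depth-first walk: turn whitelisted nested dicts into model instances.

    model_map maps a whitelisted type name to a Django model class,
    e.g. {"Employee": Employee, "Paycheck": Paycheck}.
    """
    if isinstance(node, dict):
        kind = node.pop("type", None)
        children = {key: build_objects(value, model_map) for key, value in node.items()}
        if kind in model_map:
            return model_map[kind].objects.create(**children)
        return children
    if isinstance(node, list):
        return [build_objects(item, model_map) for item in node]
    return node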
I have been trying to define a custom Django model field in Python. I referred to the Django docs at the following location: https://docs.djangoproject.com/en/1.10/howto/custom-model-fields/. However, I am confused by the following methods (which I have divided into groups as per my understanding):
Group 1 (Methods in this group are inter-related as per docs)
__init__()
deconstruct()
Group 2
db_type()
rel_db_type()
get_internal_type()
Group 3
from_db_value()
to_python()
get_prep_value()
get_db_prep_value()
get_db_prep_save()
value_from_object()
value_to_string()
Group 4
formfield
I have the following questions:
When is deconstruct() used? The docs say that it's useful during migration, but it's not clearly explained. Moreover, when is it called?
Difference between db_type() and get_internal_type()
Difference between get_prep_value() and get_db_prep_value()
Difference between value_from_object() and value_to_string(). value_from_object() is not given in docs.
from_db_value(), value_to_string() and to_python() all give a Python object from a string. Then why do these different methods exist?
I know I have asked a bit of a lengthy question, but I couldn't find any better way to ask it.
Thanks in advance.
I'll try to answer them:
Q: When is deconstruct() used?
A: This method is used when Django needs to re-create an instance of your Field based on the arguments you passed to __init__, most notably when writing migrations.
As they mention in the docs, if you are setting the max_length arg to a static value in your __init__ method, you do not need it for your instances, so you can delete it from the keyword arguments in your deconstruct() method. With this, max_length won't show up in the migration while you are using the field in your models. You can think of deconstruct() as a last clean-up and control point before your field is written into a migration.
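A sketch along the lines of the docs' HandField example (the same HandField mentioned earlier in this thread):

from django.db import models


class HandField(models.Field):
    """max_length is always 104, so the user never passes it in."""

    def __init__(self, *args, **kwargs):
        kwargs['max_length'] = 104
        super().__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        del kwargs['max_length']   # no need to record it in the migration
        return name, path, args, kwargs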
Q: Difference between db_type() and get_internal_type()
A: They are both related, but belong to different levels.
If your custom field's data type depends on which DB you are using, db_type() is the place where you can handle that. Again, as mentioned in the docs, if your field is a kind of date/time value, you should / may check whether the current database is PostgreSQL or MySQL in this method, because while the date/time column type is called timestamp in PostgreSQL, it is called datetime in MySQL.
The get_internal_type() method is a kind of higher-level version of db_type(). Let's go over the date/time value example: if you don't want to check and handle the data types of each different database, you can piggyback on a built-in Django field. Instead of deciding whether it should be datetime or timestamp, you can simply return 'DateField' from your get_internal_type() method, and Django will use the column type it already knows for that field on each backend. As mentioned in the docs, if you've already created a db_type() method, in most cases you do not need get_internal_type().
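Roughly, as a sketch rather than drop-in fields:

from django.db import models


class VendorAwareDateTimeField(models.Field):
    # Lower level: pick the column type per database backend yourself.
    def db_type(self, connection):
        if connection.vendor == 'postgresql':
            return 'timestamp'
        if connection.vendor == 'mysql':
            return 'datetime'
        return 'datetime'


class DelegatingDateTimeField(models.Field):
    # Higher level: reuse the column type Django already knows for a built-in field.
    def get_internal_type(self):
        return 'DateTimeField'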
Q: Difference between get_prep_value() and get_db_prep_value()
A: These two share a relationship similar to the one between db_type() and get_internal_type(). Both methods are about converting Python objects into values the database can work with in queries, but get_prep_value() does the backend-independent part of that conversion, while get_db_prep_value() is the place for backend-specific conversions (by default it just calls get_prep_value()).
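A sketch of where each hook sits (the field here is hypothetical, not a real Django field):

import json

from django.db import models


class JSONTextField(models.TextField):
    """Hypothetical field storing Python structures as JSON text."""

    def get_prep_value(self, value):
        # Backend-independent: Python structure -> value usable in queries.
        if value is None:
            return value
        return json.dumps(value)

    def get_db_prep_value(self, value, connection, prepared=False):
        # Backend-specific step; by default this just falls back to get_prep_value().
        if not prepared:
            value = self.get_prep_value(value)
        # ...any per-backend adaptation could happen here...
        return value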
Q: Difference between value_from_object() and value_to_string(). value_from_object() is not given in docs
A: From the docs:
To customize how the values are serialized by a serializer, you can
override value_to_string(). Using value_from_object() is the best way
to get the field’s value prior to serialization.
So, as documented, value_from_object() is simply how you get the field's raw value from a model instance prior to serialization; value_to_string() is where you customize how that value is serialized. They even put example code in the docs.
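The docs' example boils down to this:

from django.db import models


class HandField(models.Field):
    # ...
    def value_to_string(self, obj):
        value = self.value_from_object(obj)   # raw value from the model instance
        return self.get_prep_value(value)     # customize the serialized form here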
Q: Both from_db_value(), value_to_string() and to_python() gives python object from string. Then, why these different methods are exists ?
A: While to_python() converts the field value to a valid Python object, value_to_string() converts field values to a string with your custom serialization. They do different jobs.
And from_db_value() converts the value returned by the database to a Python object. I hadn't actually come across it before, but check this part of the docs:
This method is not used for most built-in fields as the database
backend already returns the correct Python type, or the backend itself
does the conversion.
I'm working in an OpenERP environment, but maybe my issue can be answered from a pure Python perspective. What I'm trying to do is define a class whose _columns variable can be set from a function that returns the respective dictionary. So basically:
class repos_report(osv.osv):
    _name = "repos.report"
    _description = "Reposition"
    _auto = False

    def _get_dyna_cols(self):
        ret = {}
        cr = self.cr
        cr.execute('Select ... From ...')
        pass  # <- Fill dictionary
        return ret

    _columns = _get_dyna_cols()

    def init(self, cr):
        pass  # Other stuff here too, but I need to set my _columns before, as per openerp

repos_report()
I have tried many ways, but this code reflects my basic need. When I execute my module for installation, I get the following error:
TypeError: _get_dyna_cols() takes exactly 1 argument (0 given)
When defining the _get_dyna_cols function I'm required to have self as the first parameter (even before executing). Also, I need a reference to OpenERP's cr cursor in order to query data to fill my _columns dictionary. So, how can I call this function so that it can be assigned to _columns? What parameter could I pass to this function?
From an OpenERP perspective, I guess I made my need quite clear. So any other approach suggested is also welcome.
From an OpenERP perspective, the right solution depends on what you're actually trying to do, and that's not quite clear from your description.
Usually the _columns definition of a model must be static, since it will be introspected by the ORM and (among other things) will result in the creation of corresponding database columns. You could set _columns in the __init__ method (not init, see note 1 below) of your model, but that would not make much sense because the result must not change over time (and it will only get called once when the model registry is initialized anyway).
Now there are a few exceptions to the "static columns" rules:
Function Fields
When you simply want to dynamically handle read/write operations on a virtual column, you can simply use a column of the fields.function type. It needs to emulate one of the other field types, but can do anything it wants with the data dynamically. Typical examples store the data in other (real) columns after some pre-processing. There are hundreds of examples in the official OpenERP modules; a minimal sketch follows.
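A minimal OpenERP 7-style sketch (the model, field and computation are made up for illustration):

from openerp.osv import osv, fields


class product_extra(osv.osv):
    _inherit = 'product.product'

    def _compute_double_weight(self, cr, uid, ids, field_name, arg, context=None):
        res = {}
        for product in self.browse(cr, uid, ids, context=context):
            res[product.id] = (product.weight or 0.0) * 2
        return res

    _columns = {
        # Virtual column: computed on read, nothing stored unless store=... is used.
        'double_weight': fields.function(_compute_double_weight, type='float',
                                         string='Double weight'),
    }

product_extra()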
Dynamic columns set
When you are developing a wizard model (a subclass of TransientModel, formerly osv_memory), you don't usually care about the database storage, and simply want to obtain some input from the user and take corresponding actions.
It is not uncommon in that case to need a completely dynamic set of columns, where the number and types of the columns may change every time the model is used. This can be achieved by overriding a few key API methods to simulate dynamic columns:
fields_view_get is the API method that is called by the clients to obtain the definition of a view (form/tree/...) for the model.
fields_get is included in the result of fields_view_get but may be called separately, and returns a dict with the columns definition of the model.
search, read, write and create are called by the client in order to access and update record data, and should gracefully accept or return values for the columns that were defined in the result of fields_get
By properly overriding these methods, you can completely implement dynamic columns, but you will need to preserve the API behavior and handle the persistence of the data (if any) yourself, in real static columns or in other models.
There are a few examples of such dynamic columns sets in the official addons, for example in the survey module that needs to simulate survey forms based on the definition of the survey campaign.
1. The init() method is only called when the model's module is installed or updated, in order to set up/update the database backend for this model. It relies on _columns to do this.
When you write _columns = _get_dyna_cols() in the class body, that function call is made right there, in the class body, while Python is still parsing the class itself. At that point, your _get_dyna_cols method is just a function object in the local (class body) namespace - and it is called without arguments.
The error message you get is due to the missing self parameter, which is inserted only when you access your function as a method - but this error message is not what is really wrong here: what is wrong is that you are making an immediate function call and expecting special behavior, like late execution.
The way in Python to achieve what you want - i.e. to have the method called automatically when the attribute _columns is accessed - is to use the property built-in.
In this case, just do this: _columns = property(_get_dyna_cols)
This will create a class attribute named _columns which, through a mechanism called the "descriptor protocol", will call the desired method whenever the attribute is accessed from an instance.
To learn more about the property built-in, check the docs: http://docs.python.org/library/functions.html#property
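A tiny illustration of what property does on attribute access (the dictionary content is just a stand-in):

class Demo(object):
    def _get_cols(self):
        return {'name': 'imagine this is built dynamically'}

    _columns = property(_get_cols)


d = Demo()
print(d._columns)   # accessing the attribute calls _get_cols(d) and returns the dict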
I feel that this is a very simple question, but I'm new to Python, and I'm learning Django at the same time.
My objective is to create a string representation of a dictionary, i.e. it's dictionary-formatted but a string, of a Model instance in Django. How can I do that? Is there a built-in function that I can call straight on the object's instance, or do I have to define one?
UPDATE:
I would like to call this functionality within the model definition itself, i.e. I'm implementing a class method or function which needs this functionality. I'm thinking of something which behaves like python's built-in function locals() but only returns the model's attributes.
I also would like to add that I'll be calling this functionality on a model instance which has not yet been saved to the database. So in essence, I'll be working on a model instance representing a record which is not yet in the database, so any function using a Manager or QuerySet is, I guess, not what I'm looking for.
Example:
class Person(models.Model):
    name = ...
    age = ...

    def func_doing_something(self):
        # get the string dictionary representation of this model's instance
        # do something to it
        # return something
Thanks everyone!
Use p = Person.objects.filter(name='john', age=10).values() (note that values() is a QuerySet method, not an instance method).
See here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values
To get it to a string use:
s = str(p[0])
You can serialize your objects to the json format, e.g. with Django's built-in serializers.
This also allows you to deserialize quite easily.
I found from this SO post the solution I was looking for...
For some Django model instance obj:
[(field.name, getattr(obj,field.name)) for field in obj._meta.fields]
and just call dict() on the result.
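Putting that together with the model from the question (the field types here are just assumptions), something like this works even before the instance is saved:

from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=50)
    age = models.IntegerField()

    def func_doing_something(self):
        # {field name: value} for this (possibly unsaved) instance
        as_dict = dict((f.name, getattr(self, f.name)) for f in self._meta.fields)
        return str(as_dict)   # e.g. "{'id': None, 'name': 'john', 'age': 10}"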