I am writing a Django app which will send some data from the site to a Python script for processing. I am planning on sending this data as a JSON string (though this need not be the case). Some of the values sent over would ideally be class instances; however, this is clearly not possible, so the class name plus any arguments needed to initialize the class must somehow be serialized into a JSON value and then deserialized by the Python script. This could be achieved with the code below, but it has several problems:
My attempt
I have put all the data needed for each class, in a list and used that to initialize each class:
import json

class Class1():
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._others = args

class Bar():
    POTENTIAL_OBJECTS = {"RANGE": range,
                         "Class1": Class1}

    def __init__(self, json_string):
        python_dict = json.loads(json_string)
        for key, value in python_dict.items():
            if isinstance(value, list) and value[0] in Bar.POTENTIAL_OBJECTS:
                setattr(self, key, Bar.POTENTIAL_OBJECTS[value[0]](*value[1], **value[2]))
            else:
                setattr(self, key, value)

example = ('{ "key_1":"Some string", "key_2":["heres", "a", "list"],'
           '"key_3":["RANGE", [10], {}], "key_4":["Class1", ["stuff"], {"stuff2":"x"}] }')

a = Bar(example)
The Problems with my approach
Apart from generally being a bit messy and not particularly elegant, there are other problems. Some of the lists in the JSON object will be generated by the user, and this obviously presents problems if the user uses a key from POTENTIAL_OBJECTS. (In a non-simplified version, Bar will have lots of subclasses, each with its own POTENTIAL_OBJECTS, so keeping track of all the potential values for front-end validation would be tricky.)
My Question
It feels like this must be a reasonably common thing that is needed and there must be some standard patterns or ways of achieving this. Is there a common/better approach/method to achieve this?
EDIT: I have realised that one way around the problem is to make all the keys in POTENTIAL_OBJECTS start with an underscore, and then validate against any underscores in user input at the front end. It still seems like there must be a better way to deserialize JSON into more complex objects than strings/ints/bools/lists etc.
Instead of having one master method to turn any arbitrary JSON into an arbitrary hierarchy of Python objects, the typical pattern would be to create a Django model for each type of thing you are trying to model. Relationships between them would then be modeled via relationship fields (ForeignKey, ManyToMany, etc, as appropriate). For instance, you might create a class Employee that models an employee, and a class Paycheck. Paycheck could then have a ForeignKey field named issued_to that refers to an Employee.
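A minimal sketch of the Employee/Paycheck models the paragraph mentions might look like this (field names and types are my own illustration, not prescribed by the question):

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=100)

class Paycheck(models.Model):
    # each paycheck points at exactly one employee
    issued_to = models.ForeignKey(Employee, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=10, decimal_places=2)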
Note also that any scheme similar to the one you describe (where user-created JSON is translated directly into arbitrary Python objects) has security implications, potentially allowing users to execute arbitrary code in the context of the Django server. If you were to attempt it, the whitelist approach you have started here would be a decent way to do it safely.
In short, you're reinventing most of what Django already does for you. The Django ORM features will help you to create models of the specific things you are interested in, validate the data, turn those data into Python objects safely, and even save instances of these models in the database for retrieval later.
That said, if you do want to parse a JSON string directly into an object hierarchy, you would have to do a full traversal instead of just going over the top-level items. To do that, you should look into something like a depth-first traversal, creating new model instances at each new node in the hierarchy. And if you want to validate these inputs on the client side as well, you'd need to replicate this work in JavaScript.
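As a rough illustration of such a traversal (a sketch only: it reuses the question's ["TAG", [args], {kwargs}] convention with a hypothetical WHITELIST mapping; a real Django app would build model instances instead of arbitrary objects):

import json

# Hypothetical whitelist mapping type tags to constructors; names are illustrative.
WHITELIST = {
    "RANGE": range,
}

def build(node):
    """Recursively turn parsed JSON into Python objects, depth first."""
    if isinstance(node, dict):
        return {key: build(value) for key, value in node.items()}
    if isinstance(node, list):
        # Treat ["TAG", [args], {kwargs}] as a constructor call if TAG is whitelisted.
        if len(node) == 3 and isinstance(node[0], str) and node[0] in WHITELIST:
            args, kwargs = build(node[1]), build(node[2])
            return WHITELIST[node[0]](*args, **kwargs)
        return [build(item) for item in node]
    return node  # strings, numbers, booleans and None pass through unchanged

data = build(json.loads('{"key_3": ["RANGE", [10], {}]}'))
print(data)  # {'key_3': range(0, 10)}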
Related
I'm reading the docs on JSONField, a special PostgreSQL field type. I intend to create a custom field that subclasses JSONField, with the added feature of being able to convert instances of my Lifts class:
import collections

class Lifts(object):
    def __init__(self, series):
        for serie in series:
            if type(serie) != LiftSerie:
                raise TypeError("List passed to constructor should only contain LiftSerie objects")
        self.series = series

class AbstractSerie(object):
    def __init__(self, activity, amount):
        self.activity_name = activity.name
        self.amount = amount

    def pre_json(self):
        """A dict that can easily be turned into json."""
        pre_json = {
            self.activity_name:
            self.amount
        }
        return pre_json

    def __str__(self):
        return str(self.pre_json())

class LiftSerie(AbstractSerie):
    def __init__(self, lift, setlist):
        """lift should be an instance of LiftActivity.
        setlist is a list containing reps for each set
        that has been performed.
        """
        if not (isinstance(setlist, collections.Sequence) and not isinstance(setlist, str)):
            raise TypeError("setlist has to behave as a list and can not be a string.")
        super().__init__(lift, setlist)
I've read here that to_python() and from_db_value() are two methods on the Field class that are involved in loading values from the database and deserializing them. Also, the docstring of the to_python() method on the Field class says that it should be overridden by subclasses. So I looked in JSONField. Guess what, it doesn't override it. Also, from_db_value() isn't even defined on Field (and not on JSONField either).
So what is going on here? This is making it very hard to understand how JSONField takes values and turns them into json and stores them in the database, and then the opposite when we query the database.
A summary of my questions:
Why isn't to_python() overridden in JSONField?
Why isn't from_db_value() overridden in JSONField?
Why isn't from_db_value() even defined on Field?
How does JSONField go about taking a python dict for example, converting it to a JSON string, and storing it in the database?
How does it do the opposite?
Sorry for many questions, but I really want to understand this and the docs are a bit lacking IMO.
For Django database fields, there are three relevant states/representations of the same data: form, Python and database. In the case of the HandField example from the docs, the form and database representations are the same string, and the Python representation is the Hand object instance.
In the case of a custom field on top of JSONField, the internal Python value might be a LiftSerie instance, the form representation a JSON string, the value sent to the database a JSON string, and the value received from the database a JSON structure converted by psycopg2 from the string returned by Postgres, if that makes sense.
In terms of your questions:
1. The Python value is not customized, so the Python data type of the field is the same as the expected input. This is in contrast to the HandField example, where the input could be a string or a Hand instance. In JSONField's case, the base Field.to_python() implementation, which just returns the input, is enough.
2. Psycopg2 already converts the database value to a JSON structure, see point 5. This is also true for other types like int/IntegerField.
3. from_db_value() is not defined in the base Field class, but it is certainly taken into account if it exists. If you look at the implementation of Field.get_db_converters(), from_db_value is added to the converters if the field has an attribute with that name.
4. django.contrib.postgres.fields.JSONField has an optional encoder argument. By default it uses json.dumps without a custom encoder to convert a JSON structure to a JSON string (a short sketch follows these notes).
5. psycopg2 automatically converts from database types to Python types; this is called adaptation. The documentation for JSON adaptation explains how that works and how it can be customized.
Note that when implementing a custom field, I would suggest writing tests for it during development, especially if the mechanisms are not completely understood. You can get inspiration for such tests in for example django-localflavor.
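As a small illustration of point 4, the encoder argument can be swapped for something like DjangoJSONEncoder when the stored structures contain values that plain json.dumps rejects (the model and field names here are made up):

from django.contrib.postgres.fields import JSONField
from django.core.serializers.json import DjangoJSONEncoder
from django.db import models

class Workout(models.Model):
    # DjangoJSONEncoder also serializes datetimes, Decimals and UUIDs,
    # which the default json.dumps call would reject
    data = JSONField(encoder=DjangoJSONEncoder, default=dict)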
The short answer is that both to_python and from_db_value return Python strings that should serialize to JSON with no encoding errors, all things being equal.
If you're okay with strings, that's fine, but I usually override Django's JSONField's from_db_value method to return a dict or a list rather than a string for use in my code. I created a custom field for that.
To me, the whole point of a JSON field is to be able to interact with its values as dicts or lists.
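For illustration, a custom field along those lines might look like this (the class name is made up; this assumes the database adapter hands back a raw JSON string, and on Django versions before 2.0 from_db_value takes an extra context argument):

import json
from django.contrib.postgres.fields import JSONField

class ParsedJSONField(JSONField):
    def from_db_value(self, value, expression, connection):
        # return a dict/list instead of the raw string, when a string comes back
        if isinstance(value, str):
            return json.loads(value)
        return value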
Using Python 2.6, I'm trying to handle tables in a growing variety of formats (xls, csv, shp, json, xml, html table data) and feed the content into an ArcGIS database table (stay with me please, this is more about the Python part of the process than the GIS part). In the current design, my base class formats the target database table and populates it with the content of the source format. The subclasses are currently designed to feed the content into a dictionary so that the base class can handle the content no matter what the source format was.
The problem is that my users could be feeding a file or table of any one of these formats into the script, so the subclass would optimally be determined at runtime. I do not know how to do this other than by running a really involved if-elif-elif-... block. The structure kind of looks like this:
class Input:
    def __init__(self, name):  # name is the filename, including path
        self.name = name
        self.ext = name[3:]
        self.d = {}  # content goes here
        ...          # dictionary content written to database table here

# each subclass writes to self.d

class xls(Input):
    ...

class xml(Input):
    ...

class csv(Input):
    ...

x = Input(r"c:\foo.xls")
y = Input(r"c:\bar.xml")
My understanding of duck-typing and polymorphism suggests this is not the way to go about it, but I'm having a tough time figuring out a better design. Advice on that front would be welcome, but what I'm really after is how to turn x.ext or y.ext into the fork at which the subclass (and thus the input handling) is determined.
If it helps, let's say that foo.xls and bar.xml have the same data, and so x.d and y.d will eventually have the same items, such as {'name':'Somegrad', 'lat':52.91025, 'lon':47.88267}.
This problem is commonly solved with a factory function that knows about subclasses.
import os

input_implementations = {'xls': xls, 'xml': xml, 'csv': csv}

def input_factory(filename):
    ext = os.path.splitext(filename)[1][1:].lower()
    impl = input_implementations.get(ext, None)
    if impl is None:
        print 'rain fire from the skies'
    else:
        return impl(filename)
It's harder to do from the base class itself (Input('file.xyz')) because the subclasses aren't defined yet when Input is defined. You can get tricky, but a simple factory is easy.
How about if each derived class contained a list of possible file extensions that it could parse? Then you could try to match the input file's extension with one of these to decide which subclass to use.
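That idea combines naturally with the factory above; a rough sketch (class names and the extensions attribute are illustrative):

import os

class Input(object):
    extensions = ()  # each subclass lists the extensions it can parse

    def __init__(self, name):
        self.name = name
        self.d = {}

class XlsInput(Input):
    extensions = ('xls',)

class XmlInput(Input):
    extensions = ('xml',)

class CsvInput(Input):
    extensions = ('csv',)

def input_for(filename):
    ext = os.path.splitext(filename)[1][1:].lower()
    for cls in Input.__subclasses__():
        if ext in cls.extensions:
            return cls(filename)
    raise ValueError('no handler for extension %r' % ext)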
You're on the right track. Use your subclasses:
x = xls(r"c:\foo.xls")
y = xml(r"c:\bar.xml")
Write methods in each subclass to parse the appropriate data type, and use the base class (Input) to write the data to a database.
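That split might look roughly like this (a sketch only; parse and write are made-up method names):

class Input(object):
    def __init__(self, name):
        self.name = name
        self.d = self.parse()   # subclass-specific reading
        self.write()            # shared database write

    def parse(self):
        raise NotImplementedError("subclasses read their own format")

    def write(self):
        pass  # write self.d to the target database table here

class xls(Input):
    def parse(self):
        # open the workbook (e.g. with xlrd) and build the dict
        return {'name': 'Somegrad', 'lat': 52.91025, 'lon': 47.88267}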
I'm working in an OpenERP environment, but maybe my issue can be answered from a pure Python perspective. What I'm trying to do is define a class whose _columns variable can be set from a function that returns the respective dictionary. So basically:
class repos_report(osv.osv):
    _name = "repos.report"
    _description = "Reposition"
    _auto = False

    def _get_dyna_cols(self):
        ret = {}
        cr = self.cr
        cr.execute('Select ... From ...')
        pass  # <- Fill dictionary
        return ret

    _columns = _get_dyna_cols()

    def init(self, cr):
        pass  # Other stuff here too, but I need to set my _columns before, as per openerp

repos_report()
I have tried many ways, but this code reflects my basic need. When I execute my module for installation I get the following error:
TypeError: _get_dyna_cols() takes exactly 1 argument (0 given)
When defining the _get_dyna_cols function I'm required to have self as the first parameter (even before executing). Also, I need a reference to OpenERP's cr cursor in order to query data to fill my _columns dictionary. So, how can I call this function so that it can be assigned to _columns? What parameter could I pass to this function?
From an OpenERP perspective, I guess I made my need quite clear. So any other approach suggested is also welcome.
From an OpenERP perspective, the right solution depends on what you're actually trying to do, and that's not quite clear from your description.
Usually the _columns definition of a model must be static, since it will be introspected by the ORM and (among other things) will result in the creation of corresponding database columns. You could set the _columns in the __init__ method (not init¹) of your model, but that would not make much sense, because the result must not change over time (and it will only get called once when the model registry is initialized anyway).
Now there are a few exceptions to the "static columns" rule:
Function Fields
When you simply want to dynamically handle read/write operations on a virtual column, you can simply use a column of the fields.function type. It needs to emulate one of the other field types, but can do anything it wants with the data dynamically. Typical examples will store the data in other (real) columns after some pre-processing. There are hundreds of examples in the official OpenERP modules.
Dynamic columns set
When you are developing a wizard model (a subclass of TransientModel, formerly osv_memory), you don't usually care about the database storage, and simply want to obtain some input from the user and take corresponding actions.
It is not uncommon in that case to need a completely dynamic set of columns, where the number and types of the columns may change every time the model is used. This can be achieved by overriding a few key API methods to simulate dynamic columns:
- fields_view_get is the API method that is called by the clients to obtain the definition of a view (form/tree/...) for the model.
- fields_get is included in the result of fields_view_get, but may be called separately, and returns a dict with the columns definition of the model.
- search, read, write and create are called by the client in order to access and update record data, and should gracefully accept or return values for the columns that were defined in the result of fields_get.
By properly overriding these methods, you can completely implement dynamic columns, but you will need to preserve the API behavior and handle the persistence of the data (if any) yourself, in real static columns or in other models.
There are a few examples of such dynamic columns sets in the official addons, for example in the survey module that needs to simulate survey forms based on the definition of the survey campaign.
¹ The init() method is only called when the model's module is installed or updated, in order to set up or update the database backend for this model. It relies on _columns to do this.
When you write _columns = _get_dyna_cols() in the class body, that function call is made right there, in the class body, while Python is still executing the class definition itself. At that point, your _get_dyna_cols method is just a function object in the local (class body) namespace, and it is called.
The error message you get is due to the missing self parameter, which is inserted only when you access your function as a method. But this error message is not what is really wrong here: what is wrong is that you are making an immediate function call and expecting special behavior, like late execution.
The way in Python to achieve what you want, i.e. to have the method called automatically whenever the _columns attribute is accessed, is to use the property built-in.
In this case, do just this: _columns = property(_get_dyna_cols)
This will create a class attribute named _columns which, through a mechanism called the descriptor protocol, will call the desired method whenever the attribute is accessed from an instance.
To learn more about the property built-in, check the docs: http://docs.python.org/library/functions.html#property
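As a plain-Python illustration of that descriptor behaviour (outside OpenERP, with made-up column values):

class repos_report(object):
    def _get_dyna_cols(self):
        # in the real model this would build the dict from a database query
        return {"col_a": "char", "col_b": "integer"}

    # accessed through an instance, this calls _get_dyna_cols() each time
    _columns = property(_get_dyna_cols)

report = repos_report()
print(report._columns)  # {'col_a': 'char', 'col_b': 'integer'}

Note that the property only fires on instance access; repos_report._columns evaluated on the class itself still yields the property object, which matters if the framework introspects the class rather than an instance.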
I'm fairly new (but not brand new) to Python, SQLAlchemy and Postgresql, and I'm trying hard to understand inheritance.
As I am taking over another programmer's code, I need to understand what is necessary, and where, for the inheritance concept to work.
My questions are:
1. Is it possible to rely only on SQLAlchemy for inheritance? In other words, can SQLAlchemy apply inheritance on Postgresql database tables that were created without specifying INHERITS=?
2. Is the declarative_base technology (SQLAlchemy) necessary to use inheritance the proper way? If so, we'll have to rewrite everything, so please don't discourage me.
3. Assuming we can use Table instances, empty Entity classes and mapper(), could you give me a (very simple) example of how to go through the process properly (or a link to an easily understandable tutorial - I have not found any easy enough yet)?
The real-world domain we are working on is real estate objects. So we basically have:
- one table immobject(id, createtime)
- one table objectattribute(id, immoobject_id, oatype)
- several attribute tables: oa_attributename(oa_id, attributevalue)
Thanks for your help in advance.
Vincent
Welcome to Stack Overflow: in the future, if you have more than one question, you should provide a separate post for each. Feel free to link them together if it might help provide context.
Table inheritance in postgres is a very different thing and solves a different set of problems from class inheritance in python, and sqlalchemy makes no attempt to combine them.
When you use table inheritance in postgres, you're doing some trickery at the schema level so that more elaborate constraints can be enforced than might be easy to express in other ways. Once you have designed your schema, applications aren't normally aware of the inheritance; if they insert a row, it just magically appears in the parent table (much like a view). This is useful, for instance, for making some kinds of bulk operations more efficient (you can just drop the table for the month of January).
This is a fundamentally different idea from inheritance as seen in OOP (in python or otherwise, with relational persistence or otherwise). In that case, the application is aware that two types are related, and that the subtype is a permissible substitute for the supertype. "A holding is an address; a contact has an address; therefore a contact can have a holding."
Which of these (mostly orthogonal) tools you need depends on the application. You might need neither, or you might need both.
SQLAlchemy's mechanisms for working with object inheritance are flexible and robust; you should use them in favor of a home-built solution if they are compatible with your particular needs (this should be true for almost all applications).
The declarative extension is a convenience: it allows you to describe the mapped table, the Python class and the mapping between the two in one 'thing' instead of three. It makes your code more "DRY"; it is, however, only a convenience layered on top of "classic SQLAlchemy", and it isn't necessary by any measure.
If you find that you need table inheritance that's visible from SQLAlchemy, your mapped classes won't be any different from ones not using those features; tables with inheritance are still normal relations (like tables or views) and can be mapped without any knowledge of the inheritance in the Python code.
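As a hedged illustration (SQLAlchemy 1.4+ declarative style; the class and table names are made up rather than taken from your schema), joined-table inheritance at the ORM level looks roughly like this:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class ImmObject(Base):
    __tablename__ = 'immobject'
    id = Column(Integer, primary_key=True)
    kind = Column(String(50))
    __mapper_args__ = {'polymorphic_identity': 'immobject', 'polymorphic_on': kind}

class Flat(ImmObject):
    __tablename__ = 'flat'
    id = Column(Integer, ForeignKey('immobject.id'), primary_key=True)
    rooms = Column(Integer)
    __mapper_args__ = {'polymorphic_identity': 'flat'}

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Flat(rooms=3))
    session.commit()
    # querying the parent class returns Flat instances polymorphically
    print(session.query(ImmObject).all())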
For your #3, you don't necessarily have to declare empty entity classes to use mapper. If your application doesn't need fancy properties, you can just use introspection and metaclasses to model the existing tables without defining them. Here's what I did:
import sqlalchemy
import sqlalchemy.orm

mymetadata = sqlalchemy.MetaData()
myengine = sqlalchemy.create_engine(...)

def named_table(tablename):
    u"return a sqlalchemy.Table object given a SQL table name"
    return sqlalchemy.Table(tablename, mymetadata, autoload=True, autoload_with=myengine)

def new_bound_class(engine, table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a sqlalchemy.Table object"
    fieldnames = table.c.__dict__['_data']

    def format_attributes(obj, transform):
        attributes = [u'%s=%s' % (x, transform(getattr(obj, x, None))) for x in fieldnames]
        return u', '.join(attributes)

    class DynamicORMClass(object):
        def __init__(self, **kw):
            u"Keyword arguments may be used to initialize fields/columns"
            for key in kw:
                if key in fieldnames:
                    setattr(self, key, kw[key])
                else:
                    raise KeyError, '%s is not a valid field/column' % (key,)

        def __repr__(self):
            return u'%s(%s)' % (self.__class__.__name__, format_attributes(self, repr))

        def __str__(self):
            return u'%s(%s)' % (str(self.__class__), format_attributes(self, str))

    DynamicORMClass.__doc__ = u"This is a dynamic class created using SQLAlchemy based on table %s" % (table,)
    sqlalchemy.orm.mapper(DynamicORMClass, table)
    return DynamicORMClass

def named_orm_class(table):
    u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a table name or object"
    if not isinstance(table, sqlalchemy.Table):
        table = named_table(table)
    return new_bound_class(myengine, table)
Example of use:
>>> myclass = named_orm_class('mytable')
>>> session = Session()
>>> obj = myclass(name='Fred', age=25, ...)
>>> session.add(obj)
>>> session.commit()
>>> print str(obj) # will print all column=value pairs
I beefed up my versions of new_bound_class and named_orm_class a little more with decorators, etc. to provide extra capabilities, and you can too. Of course, under the covers, it is declaring an empty entity class. But you don't have to do it, except this one time.
This will tide you over until you decide that you're tired of doing all those joins yourself and wonder why you can't just have an object attribute that does a lazy select query against related classes whenever you use it. That's when you make the leap to declarative (or Elixir).
I want to give GAE Model properties getter/setter-style attribute behaviour. The reason is for cases like turning the value into uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, the GAE Datastore has its own concept of a Property class. I looked into the documentation, and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue with overriding get_value_for_datastore that I can think of is that it might not get called until the object is put into the datastore. Hence, getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()

class MyModel(db.Model):
    foo = MyStringProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
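A rough sketch of that two-name arrangement (the attribute names are made up; only name_stored actually exists in the datastore and is the name you would use in GQL):

from google.appengine.ext import db

class Employee(db.Model):
    # the real datastore property; queries must refer to this name
    name_stored = db.StringProperty()

    def _get_name(self):
        return self.name_stored

    def _set_name(self, value):
        self.name_stored = value.upper() if value is not None else None

    # the in-memory attribute with the uppercasing behaviour
    name = property(_get_name, _set_name)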
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.