I have a list of objects that I need to jsonify. I've looked at the flask jsonify docs, but I'm just not getting it.
My class has several inst-vars, each of which is a string: gene_id, gene_symbol, p_value. What do I need to do to make this serializable as JSON?
My naive code:
jsonify(eqtls = my_list_of_eqtls)
Results in:
TypeError: <__main__.EqtlByGene object at 0x1073ff790> is not JSON serializable
Presumably I have to tell jsonify how to serialize an EqtlByGene, but I can't find an example that shows how to serialize an instance of a class.
I've been trying to follow some of the suggestions shown below to create my own JSONEncoder subclass. My code is now:
class EqtlByGene(Resource):
    def __init__(self, gene_id, gene_symbol, p_value):
        self.gene_id = gene_id
        self.gene_symbol = gene_symbol
        self.p_value = p_value

class EqtlJSONEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, EqtlByGene):
            return {
                'gene_id': obj.gene_id,
                'gene_symbol': obj.gene_symbol,
                'p_value': obj.p_value
            }
        return super(EqtlJSONEncoder, self).default(obj)

class EqtlByGeneList(Resource):
    def get(self):
        eqtl1 = EqtlByGene(1, 'EGFR', 0.1)
        eqtl2 = EqtlByGene(2, 'PTEN', 0.2)
        eqtls = [eqtl1, eqtl2]
        return jsonify(eqtls_by_gene=eqtls)

api.add_resource(EqtlByGeneList, '/eqtl/eqtlsbygene')
app.json_encoder(EqtlJSONEncoder)

if __name__ == '__main__':
    app.run(debug=True)
When I try to reach it via curl, I get:
TypeError(repr(o) + " is not JSON serializable")
Give your EqtlByGene an extra method that returns a dictionary:
class EqtlByGene(object):
    #
    def serialize(self):
        return {
            'gene_id': self.gene_id,
            'gene_symbol': self.gene_symbol,
            'p_value': self.p_value,
        }
then use a list comprehension to turn your list of objects into a list of serializable values:
jsonify(eqtls=[e.serialize() for e in my_list_of_eqtls])
The alternative would be to write a hook function for the json.dumps() function, but since your structure is rather simple, the list comprehension and custom method approach is simpler.
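For comparison, the hook-function alternative mentioned above could look roughly like the sketch below. It uses only the standard library's json.dumps() and its default= parameter; the EqtlByGene class is a plain stand-in for the one in the question, and encode_eqtl is a hypothetical helper name.

```python
import json


class EqtlByGene:
    """Plain stand-in for the class from the question."""

    def __init__(self, gene_id, gene_symbol, p_value):
        self.gene_id = gene_id
        self.gene_symbol = gene_symbol
        self.p_value = p_value


def encode_eqtl(obj):
    """Hypothetical hook: called by json.dumps() only for objects it cannot serialize."""
    if isinstance(obj, EqtlByGene):
        return {
            'gene_id': obj.gene_id,
            'gene_symbol': obj.gene_symbol,
            'p_value': obj.p_value,
        }
    # Anything else is still an error, as it would be without the hook.
    raise TypeError('%s is not JSON serializable' % type(obj).__name__)


eqtls = [EqtlByGene(1, 'EGFR', 0.1), EqtlByGene(2, 'PTEN', 0.2)]
payload = json.dumps({'eqtls': eqtls}, default=encode_eqtl)
print(payload)
```

The hook has to be passed on every json.dumps() call, which is part of why the serialize() method plus a list comprehension tends to be simpler for a single class.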
You can also be really adventurous and subclass flask.json.JSONEncoder; give it a default() method that turns your EqtlByGene() instances into a serializable value:
from flask.json import JSONEncoder

class MyJSONEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, EqtlByGene):
            return {
                'gene_id': obj.gene_id,
                'gene_symbol': obj.gene_symbol,
                'p_value': obj.p_value,
            }
        return super(MyJSONEncoder, self).default(obj)
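and assign this to the app.json_encoder attribute (this attribute was deprecated in Flask 2.2 and removed in 2.3, where a custom app.json provider takes its place):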
app = Flask(__name__)
app.json_encoder = MyJSONEncoder
and just pass in your list directly to jsonify():
return jsonify(my_list_of_eqtls)
You could also look at the Marshmallow project for a more full-fledged and flexible project for serializing and de-serializing objects to Python primitives that easily fit JSON and other such formats; e.g.:
from marshmallow import Schema, fields

class EqtlByGeneSchema(Schema):
    gene_id = fields.Integer()
    gene_symbol = fields.String()
    p_value = fields.Float()
and then use
jsonify(eqtls=EqtlByGeneSchema().dump(my_list_of_eqtls, many=True))
to produce JSON output. The same schema can be used to validate incoming JSON data and (with the appropriate extra methods) to produce EqtlByGene instances again.
If you look at the docs for the json module, it mentions that you can subclass JSONEncoder to override its default method and add support for types there. That would be the most generic way to handle it if you're going to be serializing multiple different structures that might contain your objects.
If you want to use jsonify, it's probably easier to convert your objects to simple types ahead of time (e.g. by defining your own method on the class, as Martijn suggests).
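As a rough illustration of the generic approach, here is a minimal standard-library sketch of a JSONEncoder subclass handling objects nested at different depths. The EqtlByGene class is a plain stand-in for the one in the question, and using vars() assumes every instance attribute is itself serializable.

```python
import json


class EqtlByGene:
    """Plain stand-in for the class from the question."""

    def __init__(self, gene_id, gene_symbol, p_value):
        self.gene_id = gene_id
        self.gene_symbol = gene_symbol
        self.p_value = p_value


class EqtlJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is consulted wherever an unsupported object appears,
        # no matter how deeply it is nested in the structure.
        if isinstance(obj, EqtlByGene):
            return vars(obj)
        return super().default(obj)


nested = {
    'top_hit': EqtlByGene(1, 'EGFR', 0.1),
    'others': [EqtlByGene(2, 'PTEN', 0.2)],
}
payload = json.dumps(nested, cls=EqtlJSONEncoder)
print(payload)
```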
Long gone are the days of creating marshmallow schemas identical to my models. I found this excellent answer that explained how I could auto generate schemas from my SQA models using a simple decorator, so I implemented it and replaced the deprecated ModelSchema for the newer SQLAlchemyAutoSchema:
def add_schema(cls):
    class Schema(SQLAlchemyAutoSchema):
        class Meta:
            model = cls

    cls.Schema = Schema
    return cls
This worked great... until I bumped into a model with a bloody Enum.
The error: Object of type MyEnum is not JSON serializable
I searched online and I found this useful answer.
But I'd like to implement it as part of the decorator so that it is generated automatically as well. In other words, I'd like to automatically overwrite all Enums in my model with EnumField(TheEnum, by_value=True) when generating the schema using the add_schema decorator; that way I won't have to overwrite all the fields manually.
What would be the best way to do this?
I have found that the enum support initially suggested only works if OneOf is the only validation class in field_details. I added some rudimentary argument parsing (looking for choices after stringifying the result of _repr_args() from the first validator) to check the validation classes and hopefully make this implementation more universally usable:
def add_schema(cls):
    class Schema(ma.SQLAlchemyAutoSchema):
        class Meta:
            model = cls

    fields = Schema._declared_fields
    # support for enum types
    for field_name, field_details in fields.items():
        if len(field_details.validate) > 0:
            check = str(field_details.validate[0]._repr_args())
            if "choices" in check:
                enum_list = field_details.validate[0].choices
                enum_dict = {choice: choice for choice in enum_list}
                enum_clone = Enum(field_name.capitalize(), enum_dict)
                fields[field_name] = EnumField(enum_clone, by_value=True, validate=validate.OneOf(enum_list))

    cls.Schema = Schema
    return cls
Thank you jgozal for the initial solution, as I really needed this lead for my current project.
This is my solution:
from marshmallow import validate
from marshmallow_sqlalchemy import SQLAlchemyAutoSchema
from marshmallow_enum import EnumField
from enum import Enum

def add_schema(cls):
    class Schema(SQLAlchemyAutoSchema):
        class Meta:
            model = cls

    fields = Schema._declared_fields
    # support for enum types
    for field_name, field_details in fields.items():
        if len(field_details.validate) > 0:
            enum_list = field_details.validate[0].choices
            enum_dict = {enum_list[i]: enum_list[i] for i in range(0, len(enum_list))}
            enum_clone = Enum(field_name.capitalize(), enum_dict)
            fields[field_name] = EnumField(enum_clone, by_value=True, validate=validate.OneOf(enum_list))

    cls.Schema = Schema
    return cls
The idea is to iterate over the fields in the Schema and find those that have validation (usually enums). From there we can extract a list of choices which can then be used to build an enum from scratch. Finally we overwrite the schema field with a new EnumField.
By all means, feel free to improve the answer!
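The "build an enum from scratch" step can be tried in isolation with only the standard library. The sketch below assumes the choices are plain strings; enum_from_choices is a hypothetical helper name, not part of marshmallow.

```python
from enum import Enum


def enum_from_choices(field_name, choices):
    """Hypothetical helper: create an Enum whose member names equal their values."""
    return Enum(field_name.capitalize(), {choice: choice for choice in choices})


# Example with made-up choices for a made-up "status" field.
Status = enum_from_choices('status', ['draft', 'published'])

print(Status.__name__)            # class name derived from the field name
print([m.value for m in Status])  # members keep the order of the choices
```

Because names and values coincide, looking a member up by value (as by_value=True does) or by name gives the same result.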
The regular way of JSON-serializing custom non-serializable objects is to subclass json.JSONEncoder and then pass a custom encoder to json.dumps().
It usually looks like this:
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Foo):
            return obj.to_json()
        return json.JSONEncoder.default(self, obj)

print(json.dumps(obj, cls=CustomEncoder))
What I'm trying to do is make something serializable with the default encoder. I looked around but couldn't find anything.
My thought is that there would be some field the encoder looks at to determine the JSON encoding. Something similar to __str__. Perhaps a __json__ field.
Is there something like this in python?
I want to make one class of a module I'm making to be JSON serializable to everyone that uses the package without them worrying about implementing their own [trivial] custom encoders.
As I said in a comment to your question, after looking at the json module's source code, it does not appear to lend itself to doing what you want. However, the goal could be achieved by what is known as monkey-patching (see question What is a monkey patch?).
This could be done in your package's __init__.py initialization script and would affect all subsequent json module serialization since modules are generally only loaded once and the result is cached in sys.modules.
The patch changes the default json encoder's default method—the default default().
Here's an example implemented as a standalone module for simplicity's sake:
Module: make_json_serializable.py
""" Module that monkey-patches json module when it's imported so
JSONEncoder.default() automatically checks for a special "to_json()"
method and uses it to encode the object if found.
"""
from json import JSONEncoder

def _default(self, obj):
    return getattr(obj.__class__, "to_json", _default.default)(obj)

_default.default = JSONEncoder.default  # Save unmodified default.
JSONEncoder.default = _default  # Replace it.
Using it is trivial since the patch is applied by simply importing the module.
Sample client script:
import json
import make_json_serializable  # apply monkey-patch

class Foo(object):
    def __init__(self, name):
        self.name = name

    def to_json(self):  # New special method.
        """ Convert to JSON format string representation. """
        return '{"name": "%s"}' % self.name

foo = Foo('sazpaz')
print(json.dumps(foo))  # -> "{\"name\": \"sazpaz\"}"
To retain the object type information, the special method can also include it in the string returned:
        return ('{"type": "%s", "name": "%s"}' %
                (self.__class__.__name__, self.name))
Which produces the following JSON that now includes the class name:
"{\"type\": \"Foo\", \"name\": \"sazpaz\"}"
Magick Lies Here
Even better than having the replacement default() look for a specially named method would be for it to serialize most Python objects automatically, including user-defined class instances, without needing a special method. After researching a number of alternatives, the following, which uses the pickle module and is based on an answer by @Raymond Hettinger to another question, seemed closest to that ideal to me:
Module: make_json_serializable2.py
""" Module that imports the json module and monkey-patches it so
JSONEncoder.default() automatically pickles any Python objects
encountered that aren't standard JSON data types.
"""
from json import JSONEncoder
import pickle

def _default(self, obj):
    return {'_python_object': pickle.dumps(obj)}

JSONEncoder.default = _default  # Replace with the above.
Of course, not everything can be pickled (extension types, for example). However, the pickle protocol defines ways to handle them by writing special methods, similar to what you suggested and I described earlier, and doing that would likely be necessary in far fewer cases.
Deserializing
Regardless, using the pickle protocol also means it would be fairly easy to reconstruct the original Python object by passing a custom object_hook function to any json.loads() call; the hook unpickles the value whenever the dictionary passed in has a '_python_object' key. Something like:
def as_python_object(dct):
    try:
        return pickle.loads(str(dct['_python_object']))
    except KeyError:
        return dct

pyobj = json.loads(json_str, object_hook=as_python_object)
If this has to be done in many places, it might be worthwhile to define a wrapper function that automatically supplies the extra keyword argument:
json_pkloads = functools.partial(json.loads, object_hook=as_python_object)
pyobj = json_pkloads(json_str)
Naturally, this could be monkey-patched into the json module as well, making the function the default object_hook (instead of None).
I got the idea for using pickle from an answer by Raymond Hettinger to another JSON serialization question, whom I consider exceptionally credible as well as an official source (as in Python core developer).
Portability to Python 3
The code above does not work as shown in Python 3 because pickle.dumps() returns a bytes object, which the JSONEncoder can't handle. However, the approach is still valid. A simple way to work around the issue is to latin1 "decode" the value returned from pickle.dumps() and then "encode" it to latin1 again before passing it on to pickle.loads() in the as_python_object() function. This works because arbitrary binary strings are valid latin1, which can always be decoded to Unicode and then encoded back to the original bytes again (as pointed out in this answer by Sven Marnach).
(Although the following works fine in Python 2, the latin1 decoding and encoding it does is superfluous.)
import json
import pickle
from decimal import Decimal

class PythonObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        return {'_python_object': pickle.dumps(obj).decode('latin1')}

def as_python_object(dct):
    try:
        return pickle.loads(dct['_python_object'].encode('latin1'))
    except KeyError:
        return dct

class Foo(object):  # Some user-defined class.
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if type(other) is type(self):  # Instances of same class?
            return self.name == other.name
        return NotImplemented

    __hash__ = None

data = [1, 2, 3, set(['knights', 'who', 'say', 'ni']), {'key': 'value'},
        Foo('Bar'), Decimal('3.141592653589793238462643383279502884197169')]
j = json.dumps(data, cls=PythonObjectEncoder, indent=4)
data2 = json.loads(j, object_hook=as_python_object)
assert data == data2  # both should be same
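The latin1 trick relied on above can be checked on its own. This small sketch (standard library only, Python 3) shows that every possible byte value survives a latin1 decode, a trip through JSON, and a latin1 encode unchanged; the sample payload is made up for illustration.

```python
import json
import pickle

# A pickle containing every possible byte value, to exercise the full range.
original = pickle.dumps({'answer': 42, 'raw': bytes(range(256))})

as_text = original.decode('latin1')          # bytes -> str, one char per byte
via_json = json.loads(json.dumps(as_text))   # str survives a JSON round trip
restored = via_json.encode('latin1')         # str -> the original bytes

print(restored == original)
print(pickle.loads(restored)['answer'])
```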
You can extend the dict class like so:
#!/usr/local/bin/python3
import json

class Serializable(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # hack to fix _json.so make_encoder serialize properly
        self.__setitem__('dummy', 1)

    def _myattrs(self):
        return [
            (x, self._repr(getattr(self, x)))
            for x in self.__dir__()
            if x not in Serializable().__dir__()
        ]

    def _repr(self, value):
        if isinstance(value, (str, int, float, list, tuple, dict)):
            return value
        else:
            return repr(value)

    def __repr__(self):
        return '<%s.%s object at %s>' % (
            self.__class__.__module__,
            self.__class__.__name__,
            hex(id(self))
        )

    def keys(self):
        return iter([x[0] for x in self._myattrs()])

    def values(self):
        return iter([x[1] for x in self._myattrs()])

    def items(self):
        return iter(self._myattrs())
Now to make your classes serializable with the regular encoder, extend 'Serializable':
class MySerializableClass(Serializable):
    attr_1 = 'first attribute'
    attr_2 = 23

    def my_function(self):
        print('do something here')

obj = MySerializableClass()
print(obj) will print something like:
<__main__.MySerializableClass object at 0x1073525e8>
print(json.dumps(obj, indent=4)) will print something like:
{
    "attr_1": "first attribute",
    "attr_2": 23,
    "my_function": "<bound method MySerializableClass.my_function of <__main__.MySerializableClass object at 0x1073525e8>>"
}
I suggest putting the hack into the class definition. This way, once the class is defined, it supports JSON. Example:
import json

class MyClass( object ):
    def _jsonSupport( *args ):
        def default( self, xObject ):
            return { 'type': 'MyClass', 'name': xObject.name() }

        def objectHook( obj ):
            if 'type' not in obj:
                return obj
            if obj[ 'type' ] != 'MyClass':
                return obj
            return MyClass( obj[ 'name' ] )

        json.JSONEncoder.default = default
        json._default_decoder = json.JSONDecoder( object_hook = objectHook )

    _jsonSupport()

    def __init__( self, name ):
        self._name = name

    def name( self ):
        return self._name

    def __repr__( self ):
        return '<MyClass(name=%s)>' % self._name

myObject = MyClass( 'Magneto' )
jsonString = json.dumps( [ myObject, 'some', { 'other': 'objects' } ] )
print( "json representation:", jsonString )

decoded = json.loads( jsonString )
print( "after decoding, our object is the first in the list", decoded[ 0 ] )
The problem with overriding JSONEncoder().default is that you can do it only once, so you are stuck if you stumble upon a special data type that does not work with that pattern (for example, if you use a strange encoding). With the pattern below, you can always make your class JSON serializable, provided that the class field you want to serialize is serializable itself (and can be added to a Python list, which is true of almost anything). Otherwise, you have to apply the same pattern recursively to your JSON field (or extract the serializable data from it):
import json

# base class that will make all derivatives JSON serializable:
class JSONSerializable(list):  # need to derive from a serializable class.
    def __init__(self, value=None):
        # Rebinding self would have no effect; fill the list itself instead.
        super().__init__([value])

    def setJSONSerializableValue(self, value):
        self[:] = [value]  # replace the list contents in place

    def getJSONSerializableValue(self):
        return self[0] if len(self) else None

# derive your classes from JSONSerializable:
class MyJSONSerializableObject(JSONSerializable):
    def __init__(self):  # or any other function
        # ....
        # suppose your__json__field is the class member to be serialized.
        # it has to be serializable itself.
        # Every time you want to set it, call this function:
        self.setJSONSerializableValue(your__json__field)
        # ...
        # ... and when you need access to it, get this way:
        do_something_with_your__json__field(self.getJSONSerializableValue())

# now you have a JSON default-serializable class:
a = MyJSONSerializableObject()
print(json.dumps(a))
I don't understand why you can't write a serialize function for your own class? You implement the custom encoder inside the class itself and allow "people" to call the serialize function that will essentially return self.__dict__ with functions stripped out.
edit:
This question agrees with me that the simplest way is to write your own method and return the JSON-serialized data that you want. They also recommend trying jsonpickle, but then you're adding an additional dependency for beauty when the correct solution comes built in.
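A minimal sketch of such a serialize method, assuming the instance attributes other than callables are themselves JSON serializable. The Foo class and its callback attribute are made up for illustration.

```python
import json


class Foo:
    def __init__(self, name):
        self.name = name
        self.callback = print  # a callable attribute that should not be serialized

    def serialize(self):
        # Essentially self.__dict__ with functions (any callables) stripped out.
        return {key: value for key, value in self.__dict__.items()
                if not callable(value)}


payload = json.dumps(Foo('sazpaz').serialize())
print(payload)  # -> {"name": "sazpaz"}
```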
For a production environment, prepare your own json module with your own custom encoder instead, to make it clear that you are overriding something.
Monkey-patching is not recommended, but you can monkey-patch in your test environment.
For example,
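import datetime
import json

import phonenumbers

class JSONDatetimeAndPhonesEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.date().isoformat()
        elif isinstance(obj, datetime.date):
            # plain dates have no .date() method
            return obj.isoformat()
        elif isinstance(obj, str):
            # NOTE: the encoder serializes strings itself, so default()
            # is not normally called for them and this branch is rarely hit.
            try:
                number = phonenumbers.parse(obj)
            except phonenumbers.NumberParseException:
                return json.JSONEncoder.default(self, obj)
            else:
                return phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.NATIONAL)
        else:
            return json.JSONEncoder.default(self, obj)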
you want:
payload = json.dumps(your_data, cls=JSONDatetimeAndPhonesEncoder)
or:
payload = your_dumps(your_data)
or:
payload = your_json.dumps(your_data)
However, in a testing environment, go ahead:
@pytest.fixture(scope='session', autouse=True)
def testenv_monkey_patching():
    json._default_encoder = JSONDatetimeAndPhonesEncoder()
which will apply your encoder to all json.dumps calls that use the default arguments.
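The your_dumps variant mentioned above could be built with functools.partial, as in the sketch below. DateEncoder is a hypothetical, simplified encoder used only to keep the example free of third-party dependencies.

```python
import datetime
import functools
import json


class DateEncoder(json.JSONEncoder):
    """Hypothetical, simplified encoder used only for this illustration."""

    def default(self, obj):
        if isinstance(obj, datetime.date):  # also matches datetime.datetime
            return obj.isoformat()
        return super().default(obj)


# An explicitly named wrapper makes it obvious that a custom encoder is in use.
your_dumps = functools.partial(json.dumps, cls=DateEncoder)

payload = your_dumps({'when': datetime.date(2020, 1, 2)})
print(payload)  # -> {"when": "2020-01-02"}
```

Callers can still pass other keyword arguments such as indent= through the wrapper.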
I have two classes: Website and WordpressWebsite.
WordpressWebsite subclasses Website.
When an instance of WordpressWebsite is being encoded into JSON, only the attributes of WordpressWebsite are present (and none of the attributes of Website).
My goal is to write a custom encoder which will encode a WordpressWebsite as a Website instead.
This is what I have so far:
from django.core.serializers.json import DjangoJSONEncoder
from websites.models import Website

class WebsiteEncoder(DjangoJSONEncoder):
    def default(self, obj):
        raise Exception()  # TEST
        if isinstance(obj, Website) and hasattr(obj, 'website_ptr'):
            return super().default(obj.website_ptr)
        return super().default(obj)
I have the following test case:
from django.core import serializers
from django.test import TestCase
from websites.models.wordpress import WordpressWebsite
from websites.serialize import WebsiteEncoder

class SerializationTest(TestCase):
    def setUp(self):
        self.wordpress = WordpressWebsite.objects.create(
            domain='test.com'
        )

    def test_foo(self):
        JSONSerializer = serializers.get_serializer("json")
        json_serializer = JSONSerializer()
        json_serializer.serialize(
            WordpressWebsite.objects.all(),
            cls=WebsiteEncoder
        )
        data = json_serializer.getvalue()
        print(data)
This test case runs fine. It does not raise an exception.
Does anyone know why WebsiteEncoder.default is not being invoked?
Django models are encoded natively with its serializers. Django's own DjangoJSONEncoder supplies a complete serializer for all possible models with any of the default Django datatypes. If you look at the JSONEncoder.default() documentation, you'll notice that you would only supply encoders for datatypes that are not yet known to the encoder.
Only if you were using a field type which Django doesn't natively support, you could provide an encoder for it - and only that field type - through .default(). Therefore DjangoJSONEncoder isn't what you're looking for.
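The behaviour of default() described here can be observed with the standard library alone. In this sketch, RecordingEncoder and the seen list are made-up names; the encoder records every object that reaches default().

```python
import json

seen = []  # type names of the objects that reached default()


class RecordingEncoder(json.JSONEncoder):
    def default(self, obj):
        seen.append(type(obj).__name__)
        if isinstance(obj, set):
            return sorted(obj)
        return super().default(obj)


encoded = json.dumps({'a': [1, 'x', None], 'b': {2, 1}}, cls=RecordingEncoder)
print(encoded)  # -> {"a": [1, "x", null], "b": [1, 2]}
print(seen)     # -> ['set']
```

Only the set reaches default(); the dict, list, int, str and None are handled by the encoder itself. In the same way, a serializer that has already converted model instances to primitive types before encoding never gives default() a chance to run on them.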
Trying to make your example work I discovered you can actually customize the process by subclassing django.core.serializers.json.Serializer:
from django.core.serializers.json import Serializer

class WebsiteSerializer(Serializer):
    def get_dump_object(self, obj):
        return {
            "pk": obj.pk,
            **self._current,
        }
After that, you can make your serializer work in the test case like so:
    def test_foo(self):
        serializer = WebsiteSerializer()
        data = serializer.serialize(WordpressWebsite.objects.all())
        print(data)
Say I've got this simple little Pony ORM mapping here. The built-in Enum class is new as of Python 3.4, and backported to 2.7.
from enum import Enum
from pony.orm import Database, Required

class State(Enum):
    ready = 0
    running = 1
    errored = 2

if __name__ == '__main__':
    db = Database('sqlite', ':memory:', create_db=True)

    class StateTable(db.Entity):
        state = Required(State)

    db.generate_mapping(create_tables=True)
When I run the program, an error is thrown.
TypeError: No database converter found for type <enum 'State'>
This happens because Pony doesn't support mapping the enum type. Of course, the workaround here is to just store the Enum value, and provide a getter in Class StateTable to convert the value to the Enum once again. But this is tedious and error prone. I can also just use another ORM. Maybe I will if this issue becomes too much of a headache. But I would rather stick with Pony if I can.
I would much rather create a database converter to store the enum, like the error message is hinting at. Does anyone know how to do this?
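For reference, the tedious workaround described above (store the raw value, expose a getter) might look like this plain-Python sketch. StateRecord is a hypothetical stand-in for an entity class and does not use Pony at all.

```python
from enum import Enum


class State(Enum):
    ready = 0
    running = 1
    errored = 2


class StateRecord:
    """Hypothetical stand-in for an entity: stores the raw int, exposes the Enum."""

    def __init__(self, state):
        self.state = state  # goes through the setter below

    @property
    def state(self):
        # Convert the stored value back to the Enum on every read.
        return State(self._state_value)

    @state.setter
    def state(self, new_state):
        if not isinstance(new_state, State):
            raise ValueError('Must be a State. Got {}'.format(type(new_state)))
        self._state_value = new_state.value


record = StateRecord(State.running)
print(record._state_value)  # the value that would be stored
print(record.state)         # the Enum member seen by callers
```

Every entity with an Enum attribute needs this pair of accessors, which is the repetition a converter avoids.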
UPDATE:
Thanks to Ethan's help, I have come up with the following solution.
from enum import Enum
from pony.orm import Database, Required, db_session
from pony.orm.dbapiprovider import StrConverter

class State(Enum):
    ready = 0
    running = 1
    errored = 2

class EnumConverter(StrConverter):
    def validate(self, val):
        if not isinstance(val, Enum):
            raise ValueError('Must be an Enum. Got {}'.format(type(val)))
        return val

    def py2sql(self, val):
        return val.name

    def sql2py(self, value):
        # Any enum type can be used, so py_type ensures the correct one is used to create the enum instance
        return self.py_type[value]

if __name__ == '__main__':
    db = Database('sqlite', ':memory:', create_db=True)

    # Register the type converter with the database
    db.provider.converter_classes.append((Enum, EnumConverter))

    class StateTable(db.Entity):
        state = Required(State)

    db.generate_mapping(create_tables=True)

    with db_session:
        s = StateTable(state=State.ready)
        print('Got {} from db'.format(s.state))
Excerpt from some random mailing list:
2.2. CONVERTER METHODS
Each converter class should define the following methods:
class MySpecificConverter(Converter):
    def init(self, kwargs):
        # Override this method to process additional positional
        # and keyword arguments of the attribute
        if self.attr is not None:
            # self.attr.args can be analyzed here
            self.args = self.attr.args
        self.my_optional_argument = kwargs.pop("kwarg_name")
        # You should take all valid options from this kwargs
        # What is left in is regarded as unrecognized option

    def validate(self, val):
        # convert value to the necessary type (e.g. from string)
        # validate all necessary constraints (e.g. min/max bounds)
        return val

    def py2sql(self, val):
        # prepare the value (if necessary) to storing in the database
        return val

    def sql2py(self, value):
        # convert value (if necessary) after the reading from the db
        return value

    def sql_type(self):
        # generate corresponding SQL type, based on attribute options
        return "SOME_SQL_TYPE_DEFINITION"
You can study the code of the existing converters to see how these methods are implemented.
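To see how validate(), py2sql() and sql2py() fit together for an Enum without a database, here is a standalone sketch. EnumNameCodec is a hypothetical class that only mirrors the method names from the excerpt; it is not a Pony class and does not inherit from Converter.

```python
from enum import Enum


class State(Enum):
    ready = 0
    running = 1
    errored = 2


class EnumNameCodec:
    """Hypothetical stand-in mirroring the converter methods; not part of Pony."""

    def __init__(self, py_type):
        self.py_type = py_type  # the concrete Enum class handled by this codec

    def validate(self, val):
        if not isinstance(val, self.py_type):
            raise ValueError('Must be a {}. Got {}'.format(self.py_type, type(val)))
        return val

    def py2sql(self, val):
        return val.name  # store the member name as a string

    def sql2py(self, value):
        return self.py_type[value]  # look the member up by name


codec = EnumNameCodec(State)
stored = codec.py2sql(codec.validate(State.errored))
restored = codec.sql2py(stored)
print(stored, restored)
```

Storing the name rather than the numeric value keeps the stored data readable, at the cost of breaking existing rows if a member is ever renamed.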