Python3.5 object and json.dumps() output

I wrote a class that would allow me to add days (integers) to dates (strings in %Y-%m-%d format). The objects of this class need to be JSON serializable.
Adding days in the form of integers to my objects works as expected. However, json.dumps(obj) returns too much information ("2016-03-23 15:57:47.926362") for my original object. Why? How would I need to modify the class to get "2016-03-23" instead? Please see the example below.
Code:
from datetime import datetime, timedelta
import json
class Day(str):
    def __init__(self, _datetime):
        self.day = _datetime
    def __str__(self):
        return self.day.date().isoformat()
    def __repr__(self):
        return "%s" % self.day.date().isoformat()
    def __add__(self, day):
        new_day = self.day + timedelta(days=day)
        return Day(new_day).__str__()
    def __sub__(self, day):
        new_day = self.day - timedelta(days=day)
        return Day(new_day).__str__()
if __name__ == "__main__":
    today = Day(datetime.today())
    print(today) # 2016-03-23
    print(json.dumps(today)) # "2016-03-23 15:57:47.926362"
    print(today+1) # 2016-03-24
    print(json.dumps(today+1)) # "2016-03-24"
    print(today-1) # 2016-03-22
    print(json.dumps(today-1)) # "2016-03-22"
Update. Here's my final code for those interested:
from datetime import datetime, timedelta
import json
class Day(str):
    def __init__(self, datetime_obj):
        self.day = datetime_obj
    def __new__(cls, datetime_obj):
        # str is immutable, so the string value must be set here, not in __init__
        return str.__new__(cls, datetime_obj.date().isoformat())
    def __add__(self, day):
        new_day = self.day + timedelta(days=day)
        return Day(new_day)
    def __sub__(self, day):
        new_day = self.day - timedelta(days=day)
        return Day(new_day)
if __name__ == "__main__":
    today = Day(datetime.today())
    print(type(today))
    print(today) # 2016-03-23
    print(json.dumps(today)) # "2016-03-23"
    print(today + 1) # 2016-03-24
    print(json.dumps(today + 1)) # "2016-03-24"
    print(today - 1) # 2016-03-22
    print(json.dumps(today - 1)) # "2016-03-22"
    print(json.dumps(dict(today=today))) # {"today": "2016-03-23"}
    print(json.dumps(dict(next_year=today+365))) # {"next_year": "2017-03-23"}
    print(json.dumps(dict(last_year=today-366))) # {"last_year": "2015-03-23"}

Cool! Let's go with it. You are seeing:
print(json.dumps(today)) # "2016-03-23 15:57:47.926362"
Because somewhere in the encoding process, when deciding how to serialize what was passed to it, json.dumps calls isinstance(..., str) on your object. This returns True, so your object is serialized as the string it secretly is.
But where does the "2016-03-23 15:57:47.926362" value come from?
When you call day = Day(datetime_obj), two things happen:
__new__ is called to instantiate the object. You haven't provided a __new__ method, so str.__new__ is used.
__init__ is called to initialize the object.
So day = Day(datetime_obj) effectively translates to:
day = str.__new__(Day, datetime_obj)
For json.dumps, your object will be a str, but the value of the str is set to the default string representation of datetime_obj, which happens to be the full format you are seeing. Builtins, man!
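To see this in isolation, here is a minimal sketch (assuming Python 3, and a fixed timestamp in place of datetime.today() so the output is predictable) of a Day class without its own __new__: str() goes through the __str__ override, while json.dumps uses the value that str.__new__ stored.

```python
import json
from datetime import datetime

class Day(str):
    # No __new__ here, so str.__new__(Day, datetime_obj) runs first and
    # stores str(datetime_obj) as the string value of the instance.
    def __init__(self, datetime_obj):
        self.day = datetime_obj

    def __str__(self):
        return self.day.date().isoformat()

dt = datetime(2016, 3, 23, 15, 57, 47, 926362)
day = Day(dt)
print(str(day))         # 2016-03-23  (our __str__ override)
print(json.dumps(day))  # "2016-03-23 15:57:47.926362"  (the stored str value)
```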
I played around with this, and it seems that if you roll your own __new__ (which is slightly exciting territory, so tread carefully) that intercepts the str.__new__ call, you should be fine:
class Day(str):
    def __new__(self, datetime):
        return str.__new__(Day, datetime.date().isoformat())
But you didn't hear it from me if the whole thing catches fire.
PS: The proper way would be to subclass JSONEncoder. But there is zero fun in it.
PS2: Oh, shoot, I tested this on 2.7. I may be completely off here, and if I am, just give me a "you tried" badge.
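For comparison, the JSONEncoder route from the PS might look roughly like the sketch below. It is an illustration under assumptions, not the asker's code: DateOnlyEncoder is a hypothetical name, and the values are kept as plain datetime objects rather than a str subclass, because JSONEncoder.default() is only consulted for objects that json cannot already serialize (a str subclass never reaches it).

```python
import json
from datetime import date, datetime, timedelta

class DateOnlyEncoder(json.JSONEncoder):
    """Encode date and datetime objects as 'YYYY-MM-DD' strings."""

    def default(self, o):
        # default() is only called for objects json cannot encode by itself
        if isinstance(o, datetime):  # check datetime first: it subclasses date
            return o.date().isoformat()
        if isinstance(o, date):
            return o.isoformat()
        return super().default(o)  # raises TypeError for anything else

today = datetime(2016, 3, 23, 15, 57, 47)
data = {"today": today, "next_year": today + timedelta(days=365)}
print(json.dumps(data, cls=DateOnlyEncoder))
# {"today": "2016-03-23", "next_year": "2017-03-23"}
```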

The reason for json.dumps(today)'s behavior is not as obvious as it might appear at first glance. To understand the issue, you should be able to answer two questions:
Where does the string value that includes the time come from?
Why is Day.__str__ not called by the JSON encoder? Should it be?
Here are some prerequisites:
The datetime.today() method is similar to datetime.now(): it includes the current time (hours, minutes, etc.). You could use date.today() to get only the date.
str creates immutable objects in Python; the value is set in the __new__ method, which you have not overridden, and therefore the default conversion str(datetime.today()) is used to initialize Day's value as a string. In your case, that creates a string value that includes both the date and the time. You could override __new__ to get a different string value:
def __new__(cls, _datetime):
    return str.__new__(cls, _datetime.date())
Day is a str subclass, and therefore its instances are encoded as JSON strings.
str methods return str objects instead of the corresponding subclass objects unless you override them, e.g.:
>>> class S(str):
...     def duplicate(self):
...         return S(self * 2)
...
>>> s = S('abc')
>>> s.duplicate().duplicate()
'abcabcabcabc'
>>> s.upper().duplicate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'duplicate'
s.upper() returns a str object instead of an S here, so the following .duplicate() call fails.
In your case, to create the corresponding JSON string, json.dumps(today) performs an operation on the today object (a re.sub() call in json.encoder.encode_basestring()) that uses its value as a string; i.e., the issue is that neither re.sub() nor encode_basestring() calls the __str__() method on instances of str subclasses. Even if encode_basestring(s) were as simple as return '"' + s + '"', the result would be the same: '"' + today returns a str object, and Day.__str__ is not called.
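A small sketch of that point, using a hypothetical Shout class rather than Day: str() honours the __str__ override, but both json.dumps and plain string concatenation work with the value the object was created with.

```python
import json

class Shout(str):
    # Overrides __str__ but leaves the stored string value untouched
    def __str__(self):
        return self.upper()

s = Shout("abc")
print(str(s))         # ABC: str() calls the override
print(json.dumps(s))  # "abc": the encoder reads the stored value
print('"' + s + '"')  # "abc": so does concatenation
```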
I don't know whether the re module should call str(obj) in functions that accept isinstance(obj, str) arguments, or whether json.encoder.encode_basestring() should do it (or neither).
If you can't fix the Day class, you could patch json.encoder.encode_basestring() to call str(obj) and get the desired JSON representation for str subtype instances (if you want the value returned by the __str__() method, putting aside whether it is wise to override __str__() on a str subclass in the first place):
import json
for suffix in ['', '_ascii']:
    function_name = 'encode_basestring' + suffix
    orig_function = getattr(json.encoder, function_name)
    setattr(json.encoder, function_name, lambda s, _e=orig_function: _e(str(s)))
Related Python issue: Cannot override JSON encoding of basic type subclasses

Related

Not specifying an actual name for positional argument in function definition

A single-parameter function can be written out like this:
import datetime
def func1(mydate):
    frmt = '%d-%m-%Y'
    return datetime.datetime.strptime(mydate, frmt)
The Python 3.5 interpreter also accepts this form of the same function:
def func2(str):
    frmt = '%d-%m-%Y'
    return datetime.datetime.strptime(str, frmt)
I am having trouble learning/researching why the latter single-parameter function works correctly.
Researching the web using phrases like "defining function with argument that is an actual type keyword", "type function as argument", etc., yields no information. If anybody is familiar with this behavior, or can direct me to a resource, I would be very grateful. Here is a verifiable example that can be pasted into the interpreter:
import datetime
def func1(mydate):
    frmt = '%d-%m-%Y'
    return datetime.datetime.strptime(mydate, frmt)
def func2(str):
    frmt = '%d-%m-%Y'
    return datetime.datetime.strptime(str, frmt)
# test
todaysdate = "28-10-2019"
print(func1(todaysdate) == func2(todaysdate))
Thank you for your help :)

Python doesn't stop you from "shadowing" built-in object types with variable names. It's not recommended, of course. In func2, you're simply creating a variable (the parameter) with the name str.
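A short sketch, reusing func2 from the question: the parameter named str exists only inside the function, and the built-in type remains reachable through the builtins module (import builtins is an addition here, not part of the question).

```python
import builtins
import datetime

def func2(str):
    # Inside func2, "str" is just the parameter, a local name that shadows
    # the built-in type; the built-in is still reachable as builtins.str.
    frmt = '%d-%m-%Y'
    parsed = datetime.datetime.strptime(str, frmt)
    return builtins.str(parsed.date())

print(func2("28-10-2019"))  # 2019-10-28
print(str(42))              # 42: outside func2, str is the built-in again
```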

I'll put this in an answer instead of a comment, just to add to the other answer. You aren't passing a type to your function; you are shadowing a built-in type. Though Python allows this, you shouldn't do it. It will lead to confusion and errors down the road.
def func(str):
    print(str)
    print(str(2))
func('test')
Output:
test
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    func('test')
  File "main.py", line 3, in func
    print(str(2))
TypeError: 'str' object is not callable
Versus:
def func(my_str):
    print(my_str) # prints test
    print(str(2)) # prints 2
func('test')

import datetime
def func1(mydate):
    frmt = '%d-%m-%Y'
    return str(datetime.datetime.strptime(mydate, frmt))
def func2(str):
    frmt = '%d-%m-%Y'
    return str(datetime.datetime.strptime(str, frmt))
# test
todaysdate = "28-10-2019"
print(func1(todaysdate) == func2(todaysdate))
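This program shows the resulting error: because func2's parameter shadows the built-in str() function, the str(...) call inside func2 fails with TypeError: 'str' object is not callable.
See also the monkey patching concept in Python: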
https://www.geeksforgeeks.org/monkey-patching-in-python-dynamic-behavior/

Python statically typed constructor?

First of all, I'm brand new to Python and have a basic understanding of C/C++/C#, which are all statically typed languages. So can the following be done in Python?
I want the variable birthday to be a datetime, so that whenever I instantiate the class I have to pass a datetime in with the parameters.
import datetime
class Person:
    """class representing a person."""
    def __init__(self, name, sirname, gender, birthday):
        self.name = name
        self.sirname = sirname
        self.gender = gender
        self.birthday = datetime.date(birthday)
    def getage(self):
        """returns age"""
        today = datetime.date.today()
        return today.year - self.birthday.year
Further down, I instantiate as follows:
BIRTHDAY = datetime.date(1989, 10, 9)
NIELSON = Person('Nielson', 'Jansen', 'Male', BIRTHDAY)
This gives me the error:
TypeError: an integer is required (got type datetime.date)
Is my instantiation wrong, or should I get the following out of my head ASAP with Python?
self.birthday = datetime.date(birthday)
(I would like to do this so that the getage method is always presented with a datetime.date instead of something random if I make an instantiation mistake.)
PS: also, if my terminology is not correct, don’t hesitate to correct me. :)

I'm assuming that you fix
self.birthday = datetime.date(birthday)
into
self.birthday = birthday
as suggested by jonrsharpe.
Now if you want to check the type of birthday, you can write
assert isinstance(birthday, datetime.date)
at the beginning of the constructor. This is, however, not a static check, because the check is only performed when the assertion runs (and assert statements are skipped entirely when Python runs with the -O option).
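As a sketch (Python 3; the annotations, the TypeError check and the optional today argument to getage are additions, and the attribute is spelled surname here): annotations document the expected type and let a separate static checker such as mypy flag a wrong argument before the program runs, but Python itself does not enforce them. An explicit isinstance check with raise enforces the type at run time and, unlike assert, is not removed by -O.

```python
import datetime
from typing import Optional

class Person:
    """Class representing a person."""

    def __init__(self, name: str, surname: str, gender: str,
                 birthday: datetime.date) -> None:
        # The annotations above are not enforced at run time; this check is,
        # and unlike an assert it is not stripped by "python -O".
        if not isinstance(birthday, datetime.date):
            raise TypeError("birthday must be a datetime.date, not %s"
                            % type(birthday).__name__)
        self.name = name
        self.surname = surname
        self.gender = gender
        self.birthday = birthday

    def getage(self, today: Optional[datetime.date] = None) -> int:
        """Return the difference in calendar years, like the original getage."""
        if today is None:
            today = datetime.date.today()
        return today.year - self.birthday.year

nielson = Person('Nielson', 'Jansen', 'Male', datetime.date(1989, 10, 9))
print(nielson.getage(datetime.date(2020, 1, 1)))  # 31
```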

Left truncate using python 3.5 str.format?

Q: Is it possible to create a format string using Python 3.5's string formatting syntax to left-truncate?
Basically what I want to do is take a git SHA:
"c1e33f6717b9d0125b53688d315aff9cf8dd9977"
And, using only a format string, display only the rightmost 8 chars:
"f8dd9977"
Things I've tried:
Invalid Syntax
>>> "{foo[-8:]}".format(foo="c1e33f6717b9d0125b53688d315aff9cf8dd9977")
>>> "{foo[-8]}".format(foo="c1e33f6717b9d0125b53688d315aff9cf8dd9977")
>>> "{:8.-8}".format("c1e33f6717b9d0125b53688d315aff9cf8dd9977")
Wrong Result
### Results in first 8 not last 8.
>>> "{:8.8}".format("c1e33f6717b9d0125b53688d315aff9cf8dd9977")
Works but inflexible and cumbersome
### solution requires that bar is always length of 40.
>>> bar="c1e33f6717b9d0125b53688d315aff9cf8dd9977"
>>> "{foo[32]}{foo[33]}{foo[34]}{foo[35]}{foo[36]}{foo[37]}{foo[38]}{foo[39]}".format(foo=bar)
A similar question was asked but never answered. However, mine differs in that I am limited to using only a format string; I don't have the ability to change the range of the input param. This means that the following is an unacceptable solution:
>>> bar="c1e33f6717b9d0125b53688d315aff9cf8dd9977"
>>> "{0}".format(bar[-8:])
One more aspect I should clarify: the above explains the simplest form of the problem. In the actual context, the problem is more correctly expressed as:
>>> import os
>>> "foo {git_sha}".format(**os.environ)
Here I want to left-truncate the "git_sha" environment variable. Admittedly this is a tad more complex than the simplest form, but if I can solve the simplest, I can find a way to solve the more complex one.

So here is my solution, with thanks to @JacquesGaudin and folks on #Python for providing much guidance...
class MyStr(object):
    """Additional format string options."""
    def __init__(self, obj):
        super(MyStr, self).__init__()
        self.obj = obj
    def __format__(self, spec):
        if spec.startswith("ltrunc."):
            offset = int(spec[7:])
            return self.obj[offset:]
        else:
            return self.obj.__format__(spec)
So this works when doing this:
>>> f = {k: MyStr(v) for k, v in os.environ.items()}
>>> "{PATH:ltrunc.-8}".format(**f)

Subclassing str and overriding the __format__ method is an option:
class CustomStr(str):
    def __format__(self, spec):
        if spec == 'trunc_left':
            return self[-8:]
        else:
            return super().__format__(spec)
git_sha = 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'
s = CustomStr(git_sha)
print('{:trunc_left}'.format(s))
Better, though, you can create a custom Formatter that inherits from string.Formatter, which provides a format method. By doing this, you can override a number of methods used in the process of formatting strings. In your case, you want to override format_field:
from string import Formatter
class CustomFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec.startswith('trunc_left.'):
            char_number = int(format_spec[len('trunc_left.'):])
            return value[-char_number:]
        return super().format_field(value, format_spec)
environ = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'}
fmt = CustomFormatter()
print(fmt.format('{git_sha:trunc_left.8}', **environ))
Depending on the usage, you could put this in a context manager and temporarily shadow the builtin format function:
from string import Formatter
class CustomFormat:
    class CustomFormatter(Formatter):
        def format_field(self, value, format_spec):
            if format_spec.startswith('trunc_left.'):
                char_number = int(format_spec[len('trunc_left.'):])
                return value[-char_number:]
            return super().format_field(value, format_spec)
    def __init__(self):
        self.custom_formatter = self.CustomFormatter()
    def __enter__(self):
        self.builtin_format = format
        return self.custom_formatter.format
    def __exit__(self, exc_type, exc_value, traceback):
        # make sure global format is set back to the original
        global format
        format = self.builtin_format
environ = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'}
with CustomFormat() as format:
    # Inside this context, format is our custom formatter's method
    print(format('{git_sha:trunc_left.8}', **environ))
print(format) # checking that format is now the builtin function
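For the os.environ case from the question, one option (a sketch, not tested against the asker's setup) is Formatter.vformat, which accepts the keyword mapping directly instead of unpacking it with **. The env dict below is a stand-in for os.environ; in real use you would pass os.environ itself.

```python
from string import Formatter

class CustomFormatter(Formatter):
    def format_field(self, value, format_spec):
        # "trunc_left.N" keeps only the last N characters of the value
        if format_spec.startswith('trunc_left.'):
            char_number = int(format_spec[len('trunc_left.'):])
            return value[-char_number:]
        return super().format_field(value, format_spec)

env = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'}  # stand-in for os.environ
fmt = CustomFormatter()
# vformat(format_string, args, kwargs) takes the mapping as-is, no ** needed
print(fmt.vformat('foo {git_sha:trunc_left.8}', (), env))  # foo f8dd9977
```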

Python class inheritance

Happy New Year, guys!
I'm new to Python and have been experimenting with class inheritance. I created the code below and have a few questions:
Why is shDate3 of type numpy.datetime64 instead of SHDate3? shDate seems to be of type SHDate, which is the behavior I was expecting.
Why can't shDate2 be created? I'm receiving an "an integer is required" error...
Thanks a lot!
from datetime import *
from numpy import *
class SHDate(date):
    def __init__(self, year, month, day):
        date.__init__(self, year, month, day)
class SHDate2(date):
    def __init__(self, dateString):
        timeStruct = strptime(dateString, "%Y-%m-%d")
        date.__init__(self, timeStruct.tm_year, timeStruct.tm_mon, timeStruct.tm_mday)
class SHDate3(datetime64):
    def __init__(self, dateString):
        super(SHDate3, self).__init__(dateString)
if __name__ == '__main__':
    shDate = SHDate(2010,1,31)
    print type(shDate)
    shDate3 = SHDate3("2011-10-11")
    print shDate3
    print type(shDate3)
    shDate2 = SHDate2("2011-10-11")
    print shDate2

Quick answers:
Make sure you know when to use type and when to use isinstance; they are different. You may want to take a look at this question, which clarifies type and isinstance usage.
You shouldn't be using __init__ to customize your date class, because it is an immutable class. This question provides some discussion on customizing instances of such classes.
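A sketch of the __new__ approach for SHDate2, in Python 3 syntax (in Python 2 the call would be super(SHDate2, cls).__new__(...)); the numpy part is left out. It uses datetime.strptime instead of the bare strptime in the question and passes the parsed fields to date.__new__. The original SHDate only appears to work because its __init__ receives the same three integers that date.__new__ already accepted; SHDate2 fails because date.__new__ is handed the string before __init__ ever runs.

```python
from datetime import date, datetime

class SHDate2(date):
    """A date built from a 'YYYY-MM-DD' string."""

    def __new__(cls, date_string):
        # date is immutable, so its fields are fixed in __new__; an __init__
        # runs too late (date.__new__ would already have rejected the string).
        parsed = datetime.strptime(date_string, "%Y-%m-%d")
        return super().__new__(cls, parsed.year, parsed.month, parsed.day)

sh_date2 = SHDate2("2011-10-11")
print(type(sh_date2).__name__, sh_date2)  # SHDate2 2011-10-11
```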

Python: Pickling a dict with some unpicklable items

I have an object gui_project which has an attribute .namespace, which is a namespace dict. (i.e. a dict from strings to objects.)
(This is used in an IDE-like program to let the user define his own object in a Python shell.)
I want to pickle this gui_project, along with the namespace. Problem is, some objects in the namespace (i.e. values of the .namespace dict) are not picklable objects. For example, some of them refer to wxPython widgets.
I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.
How can I do this?
(One thing I tried is to go one by one on the values and try to pickle them, but some infinite recursion happened, and I need to be safe from that.)
(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)

I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.
http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects
ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.
You could start with something like this (untested):
import cPickle
# wxObject is assumed to be wxPython's base class (e.g. wx.Object); import it accordingly
def persistent_id(obj):
    if isinstance(obj, wxObject):
        return "filtered:wxObject"
    else:
        return None
class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)
def persistent_load(obj_id):
    if obj_id.startswith('filtered:'):
        return FilteredObject(obj_id[9:])
    else:
        raise cPickle.UnpicklingError('Invalid persistent id')
def dump_filtered(obj, file):
    p = cPickle.Pickler(file)
    p.persistent_id = persistent_id
    p.dump(obj)
def load_filtered(file):
    u = cPickle.Unpickler(file)
    u.persistent_load = persistent_load
    return u.load()
Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.
You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.
Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.
from cPickle import Pickler, Unpickler, UnpicklingError
class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)
class MyPickler(object):
    def __init__(self, file, protocol=0):
        pickler = Pickler(file, protocol)
        pickler.persistent_id = self.persistent_id
        self.dump = pickler.dump
        self.clear_memo = pickler.clear_memo
    def persistent_id(self, obj):
        if not hasattr(obj, '__getstate__') and not isinstance(obj,
                (basestring, int, long, float, tuple, list, set, dict)):
            return "filtered:%s" % type(obj)
        else:
            return None
class MyUnpickler(object):
    def __init__(self, file):
        unpickler = Unpickler(file)
        unpickler.persistent_load = self.persistent_load
        self.load = unpickler.load
        self.noload = unpickler.noload
    def persistent_load(self, obj_id):
        if obj_id.startswith('filtered:'):
            return FilteredObject(obj_id[9:])
        else:
            raise UnpicklingError('Invalid persistent id')
if __name__ == '__main__':
    from cStringIO import StringIO
    class UnpickleableThing(object):
        pass
    f = StringIO()
    p = MyPickler(f)
    p.dump({'a': 1, 'b': UnpickleableThing()})
    f.seek(0)
    u = MyUnpickler(f)
    obj = u.load()
    print obj
    assert obj['a'] == 1
    assert isinstance(obj['b'], FilteredObject)
    assert obj['b'].about
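The code above targets Python 2 (cPickle, cStringIO, basestring, long). A rough Python 3 sketch of the same persistent-ID idea subclasses pickle.Pickler and pickle.Unpickler and overrides their documented persistent_id() and persistent_load() methods. Two assumptions: the types to filter are listed up front (a thread lock stands in for a wxPython widget), and the hasattr(obj, '__getstate__') test is dropped, because since Python 3.11 every object has a __getstate__ method, so that test would filter nothing. FilteringPickler, FilteringUnpickler, dumps_filtered and loads_filtered are hypothetical names.

```python
import io
import pickle
import threading

class FilteredObject:
    """Placeholder that replaces an object which was not pickled."""
    def __init__(self, about):
        self.about = about

    def __repr__(self):
        return 'FilteredObject(%r)' % (self.about,)

# Assumption: the unpicklable types are known in advance. A thread lock
# stands in here for something like a wxPython widget.
FILTERED_TYPES = (type(threading.Lock()),)

class FilteringPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, FILTERED_TYPES):
            return 'filtered:' + type(obj).__name__  # stored instead of obj
        return None  # None means "pickle obj normally"

class FilteringUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        if isinstance(pid, str) and pid.startswith('filtered:'):
            return FilteredObject(pid[len('filtered:'):])
        raise pickle.UnpicklingError('unsupported persistent id: %r' % (pid,))

def dumps_filtered(obj):
    buf = io.BytesIO()
    FilteringPickler(buf).dump(obj)
    return buf.getvalue()

def loads_filtered(data):
    return FilteringUnpickler(io.BytesIO(data)).load()

namespace = {'a': 1, 'lock': threading.Lock()}
restored = loads_filtered(dumps_filtered(namespace))
print(restored)
```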

This is how I would do this (I did something similar before and it worked):
Write a function that determines whether or not an object is pickleable
Make a list of all the pickleable variables, based on the above function
Make a new dictionary (called D) that stores all the non-pickleable variables
For each variable in D (this only works if you have very similar variables in D):
make a list of strings, where each string is legal Python code, such that when all these strings are executed in order, you get the desired variable
Now, when you unpickle, you get back all the variables that were originally pickleable. For all variables that were not pickleable, you now have a list of strings (legal Python code) that, when executed in order, gives you the desired variable.
Hope this helps
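Steps 1 to 3 above could be sketched in Python 3 as follows; is_picklable and split_namespace are hypothetical helper names, and step 4 (rebuilding objects from source strings) is application-specific, so it is left out. Catching Exception broadly also covers a RecursionError from very deeply nested values, which was one of the question's concerns.

```python
import pickle

def is_picklable(obj):
    """Step 1: report whether obj can be pickled at all."""
    try:
        pickle.dumps(obj)
    except Exception:  # e.g. TypeError, pickle.PicklingError, RecursionError
        return False
    return True

def split_namespace(namespace):
    """Steps 2 and 3: separate picklable values from the rest (dict D)."""
    picklable, unpicklable = {}, {}
    for name, value in namespace.items():
        if is_picklable(value):
            picklable[name] = value
        else:
            unpicklable[name] = value
    return picklable, unpicklable

ns = {'a': 1, 'b': [1, 2, 3], 'gen': (n for n in range(3))}
picklable, unpicklable = split_namespace(ns)
print(sorted(picklable), sorted(unpicklable))  # ['a', 'b'] ['gen']
```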

I ended up coding my own solution to this, using Shane Hathaway's approach.
Here's the code. (Look for CutePickler and CuteUnpickler.) Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.
If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

One approach would be to inherit from pickle.Pickler, and override the save_dict() method. Copy it from the base class, which reads like this:
def save_dict(self, obj):
    write = self.write
    if self.bin:
        write(EMPTY_DICT)
    else: # proto 0 -- can't use EMPTY_DICT
        write(MARK + DICT)
    self.memoize(obj)
    self._batch_setitems(obj.iteritems())
However, in the _batch_setitems call, pass an iterator that filters out all items that you don't want to be dumped, e.g.:
def save_dict(self, obj):
    write = self.write
    if self.bin:
        write(EMPTY_DICT)
    else: # proto 0 -- can't use EMPTY_DICT
        write(MARK + DICT)
    self.memoize(obj)
    # bad_type: the type (or tuple of types) whose values should be skipped
    self._batch_setitems(item for item in obj.iteritems()
                         if not isinstance(item[1], bad_type))
As save_dict isn't an official API, you need to check for each new Python version whether this override is still correct.

The filtering part is indeed tricky. Using simple tricks, you can easily get the pickle to work. However, you might end up filtering out too much and losing information that you could keep if the filter looked a little deeper. But the sheer variety of things that can end up in .namespace makes building a good filter difficult.
However, we could leverage pieces that are already part of Python, such as deepcopy in the copy module.
I made a copy of the stock copy module, and did the following things:
create a new type named LostObject to represent an object that will be lost in pickling.
change _deepcopy_atomic to make sure x is picklable. If it's not, return an instance of LostObject.
objects can define the methods __reduce__ and/or __reduce_ex__ to provide a hint about whether and how to pickle them. We make sure these methods do not throw an exception; if they do, we take that as a hint that the object cannot be pickled.
to avoid making unnecessary copies of big objects (a la actual deepcopy), we recursively check whether an object is picklable and only copy the unpicklable parts. For instance, for a tuple of a picklable list and an unpicklable object, we make a copy of the tuple (just the container) but not of its member list.
The following is the diff:
[~/Development/scratch/] $ diff -uN /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py mcopy.py
--- /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py 2010-01-09 00:18:38.000000000 -0800
+++ mcopy.py 2010-11-10 08:50:26.000000000 -0800
@@ -157,6 +157,13 @@
     cls = type(x)
+    # if x is picklable, there is no need to make a new copy, just ref it
+    try:
+        dumps(x)
+        return x
+    except TypeError:
+        pass
+
     copier = _deepcopy_dispatch.get(cls)
     if copier:
         y = copier(x, memo)
@@ -179,10 +186,18 @@
                     reductor = getattr(x, "__reduce_ex__", None)
                     if reductor:
                         rv = reductor(2)
+                        try:
+                            x.__reduce_ex__()
+                        except TypeError:
+                            rv = LostObject, tuple()
                     else:
                         reductor = getattr(x, "__reduce__", None)
                         if reductor:
                             rv = reductor()
+                            try:
+                                x.__reduce__()
+                            except TypeError:
+                                rv = LostObject, tuple()
                         else:
                             raise Error(
                                 "un(deep)copyable object of type %s" % cls)
@@ -194,7 +209,12 @@
 _deepcopy_dispatch = d = {}
+from pickle import dumps
+class LostObject(object): pass
 def _deepcopy_atomic(x, memo):
+    try:
+        dumps(x)
+    except TypeError: return LostObject()
     return x
 d[type(None)] = _deepcopy_atomic
 d[type(Ellipsis)] = _deepcopy_atomic
Now back to the pickling part. You simply make a deepcopy using this new deepcopy function and then pickle the copy. The unpicklable parts have been removed during the copying process.
x = dict(a=1)
xx = dict(x=x)
x['xx'] = xx
x['f'] = file('/tmp/1', 'w')
class List():
    def __init__(self, *args, **kwargs):
        print 'making a copy of a list'
        self.data = list(*args, **kwargs)
x['large'] = List(range(1000))
# now x contains a loop and an unpicklable file object
# the following line will throw
from pickle import dumps, loads
try:
    dumps(x)
except TypeError:
    print 'yes, it throws'
def check_picklable(x):
    try:
        dumps(x)
    except TypeError:
        return False
    return True
class LostObject(object): pass
from mcopy import deepcopy
# though x has a big List object, this deepcopy will not make a new copy of it
c = deepcopy(x)
dumps(c)
cc = loads(dumps(c))
# check loop reference
if cc['xx']['x'] == cc:
    print 'yes, loop reference is preserved'
# check unpicklable part
if isinstance(cc['f'], LostObject):
    print 'unpicklable part is now an instance of LostObject'
# check large object
if loads(dumps(c))['large'].data[999] == x['large'].data[999]:
    print 'large object is ok'
Here is the output:
making a copy of a list
yes, it throws
yes, loop reference is preserved
unpicklable part is now an instance of LostObject
large object is ok
You can see that 1) mutual pointers (between x and xx) are preserved and we do not run into an infinite loop; 2) the unpicklable file object is converted to a LostObject instance; and 3) no new copy of the large object is created, since it is picklable.