What is the recommended way of serializing a namedtuple to json with the field names retained?
Serializing a namedtuple to json results in only the values being serialized and the field names being lost in translation. I would like the fields also to be retained when json-ized and hence did the following:
class foobar(namedtuple('f', 'foo, bar')):
    __slots__ = ()
    def __iter__(self):
        yield self._asdict()
The above serializes to JSON as I expect and behaves as a namedtuple in the other places I use it (attribute access, etc.), except that iterating over it gives non-tuple-like results (which is fine for my use case).
What is the "correct way" of converting to json with the field names retained?
If it's just one namedtuple you're looking to serialize, using its _asdict() method will work (with Python >= 2.7)
>>> from collections import namedtuple
>>> import json
>>> FB = namedtuple("FB", ("foo", "bar"))
>>> fb = FB(123, 456)
>>> json.dumps(fb._asdict())
'{"foo": 123, "bar": 456}'
This is pretty tricky, since namedtuple() is a factory which returns a new type derived from tuple. One approach would be to have your class also inherit from UserDict.DictMixin, but tuple.__getitem__ is already defined and expects an integer denoting the position of the element, not the name of its attribute:
>>> f = foobar('a', 1)
>>> f[0]
'a'
At its heart the namedtuple is an odd fit for JSON, since it is really a custom-built type whose key names are fixed as part of the type definition, unlike a dictionary where key names are stored inside the instance. This prevents you from "round-tripping" a namedtuple, e.g. you cannot decode a dictionary back into a namedtuple without some other piece of information, like an app-specific type marker in the dict {'a': 1, '#_type': 'foobar'}, which is a bit hacky.
This is not ideal, but if you only need to encode namedtuples into dictionaries, another approach is to extend or modify your JSON encoder to special-case these types. Here is an example of subclassing the Python json.JSONEncoder. This tackles the problem of ensuring that nested namedtuples are properly converted to dictionaries:
from collections import namedtuple
from json import JSONEncoder
# Python 2 code: relies on the private _iterencode hooks of JSONEncoder.
class MyEncoder(JSONEncoder):
    def _iterencode(self, obj, markers=None):
        if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
            gen = self._iterencode_dict(obj._asdict(), markers)
        else:
            gen = JSONEncoder._iterencode(self, obj, markers)
        for chunk in gen:
            yield chunk
class foobar(namedtuple('f', 'foo, bar')):
    pass
enc = MyEncoder()
for obj in (foobar('a', 1), ('a', 1), {'outer': foobar('x', 'y')}):
    print enc.encode(obj)
{"foo": "a", "bar": 1}
["a", 1]
{"outer": {"foo": "x", "bar": "y"}}
It looks like you used to be able to subclass simplejson.JSONEncoder to make this work, but with the latest simplejson code, that is no longer the case: you have to actually modify the project code. I see no reason why simplejson should not support namedtuples, so I forked the project, added namedtuple support, and I'm currently waiting for my branch to be pulled back into the main project. If you need the fixes now, just pull from my fork.
EDIT: Looks like the latest versions of simplejson now natively support this with the namedtuple_as_object option, which defaults to True.
I wrote a library for doing this: https://github.com/ltworf/typedload
It can go from and to named-tuple and back.
It supports quite complicated nested structures, with lists, sets, enums, unions, default values. It should cover most common cases.
edit: The library also supports dataclass and attr classes.
It's impossible to serialize namedtuples correctly with the native python json library. It will always see tuples as lists, and it is impossible to override the default serializer to change this behaviour. It's worse if objects are nested.
Better to use a more robust library like orjson:
import orjson
from typing import NamedTuple
class Rectangle(NamedTuple):
    width: int
    height: int
def default(obj):
    if hasattr(obj, '_asdict'):
        return obj._asdict()
    raise TypeError  # let orjson report types this hook does not handle
rectangle = Rectangle(width=10, height=20)
print(orjson.dumps(rectangle, default=default))
=>
{
"width":10,
"height":20
}
A more convenient solution is to use a decorator (it uses the protected field _fields).
Python 2.7+:
import json
from collections import namedtuple, OrderedDict
def json_serializable(cls):
    def as_dict(self):
        yield OrderedDict(
            (name, value) for name, value in zip(
                self._fields,
                iter(super(cls, self).__iter__())))
    cls.__iter__ = as_dict
    return cls
# Usage:
C = json_serializable(namedtuple('C', 'a b c'))
print json.dumps(C('abc', True, 3.14))
# or
@json_serializable
class D(namedtuple('D', 'a b c')):
    pass
print json.dumps(D('abc', True, 3.14))
Python 3.6.6+:
import json
from typing import NamedTuple
def json_serializable(cls):
    def as_dict(self):
        yield {name: value for name, value in zip(
            self._fields,
            iter(super(cls, self).__iter__()))}
    cls.__iter__ = as_dict
    return cls
# Usage:
@json_serializable
class C(NamedTuple):
    a: str
    b: bool
    c: float
print(json.dumps(C('abc', True, 3.14)))
It recursively converts the namedTuple data to json.
print(m1)
## Message(id=2, agent=Agent(id=1, first_name='asd', last_name='asd', mail='2@mai.com'), customer=Customer(id=1, first_name='asd', last_name='asd', mail='2@mai.com', phone_number=123123), type='image', content='text', media_url='h.com', la=123123, ls=4512313)
def reqursive_to_json(obj):
    _json = {}
    if isinstance(obj, tuple):
        datas = obj._asdict()
        for data in datas:
            if isinstance(datas[data], tuple):
                _json[data] = (reqursive_to_json(datas[data]))
            else:
                print(datas[data])
                _json[data] = (datas[data])
    return _json
data = reqursive_to_json(m1)
print(data)
{'agent': {'first_name': 'asd',
           'last_name': 'asd',
           'mail': '2@mai.com',
           'id': 1},
 'content': 'text',
 'customer': {'first_name': 'asd',
              'last_name': 'asd',
              'mail': '2@mai.com',
              'phone_number': 123123,
              'id': 1},
 'id': 2,
 'la': 123123,
 'ls': 4512313,
 'media_url': 'h.com',
 'type': 'image'}
The jsonplus library provides a serializer for NamedTuple instances. Use its compatibility mode to output simple objects if needed, but prefer the default as it is helpful for decoding back.
This is an old question. However:
A suggestion for all those with the same question: think carefully before using any of the private or internal features of NamedTuple, because they have changed before and will change again over time.
For example, if your NamedTuple is a flat value object and you only need to serialize it on its own, not nested inside another object, you can avoid the trouble that would come from __dict__ being removed or _asdict() changing and just do something like this (and yes, this is Python 3, because this answer is for the present):
import json
from typing import NamedTuple
class ApiListRequest(NamedTuple):
    group: str = "default"
    filter: str = "*"
    def to_dict(self):
        return {
            'group': self.group,
            'filter': self.filter,
        }
    def to_json(self):
        return json.dumps(self.to_dict())
I tried to use the default callable kwarg to dumps in order to do the to_dict() call if available, but that didn't get called as the NamedTuple is convertible to a list.
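The behaviour described above can be checked with a short sketch: json.dumps only calls default for objects it cannot serialize natively, and a NamedTuple is already a tuple, so it is written as a JSON array. One way around this is to convert the data before encoding. The names Point and to_plain are hypothetical, chosen only for this illustration.

```python
import json
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


calls = []


def default(obj):
    # Only reached for objects json cannot serialize natively.
    calls.append(obj)
    return obj._asdict()


# The tuple code path wins, so default is never consulted here.
as_list = json.dumps(Point(1, 2), default=default)


def to_plain(obj):
    """Convert namedtuples (also nested ones) to dicts before encoding."""
    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):
        return {k: to_plain(v) for k, v in obj._asdict().items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    return obj


as_object = json.dumps(to_plain({"p": Point(1, 2)}))
```

Here as_list is the array form and calls stays empty, while as_object keeps the field names.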
Here is my take on the problem. It serializes the NamedTuple and takes care of nested NamedTuples and lists inside of them.
from typing import Any
def recursive_to_dict(obj: Any) -> dict:
    _dict = {}
    if isinstance(obj, tuple):
        node = obj._asdict()
        for item in node:
            if isinstance(node[item], list):  # Process as a list
                # Only namedtuple elements are converted; other elements are kept as-is
                _dict[item] = [recursive_to_dict(x) if hasattr(x, "_asdict") else x for x in node[item]]
            elif getattr(node[item], "_asdict", False):  # Process as a NamedTuple
                _dict[item] = recursive_to_dict(node[item])
            else:  # Process as a regular element
                _dict[item] = node[item]
    return _dict
simplejson.dump() instead of json.dump does the job. It may be slower though.
Related
Unfortunately I have to load a dictionary containing an invalid name (which I can't change):
dict = {..., "invalid-name": 0, ...}
I would like to cast this dictionary into a dataclass object, but I can't define an attribute with this name.
from dataclasses import dataclass
@dataclass
class Dict:
    ...
    invalid-name: int  # can't do this
    ...
The only solution I could find is to change the dictionary key into a valid one right before casting it into a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals...
Is there any better solution to this?
One solution would be to use dict-to-dataclass. As mentioned in its documentation, it has two options:
1. Passing dictionary keys
It's probably quite common that your dataclass fields have the same names as the dictionary keys they map to but in case they don't, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")
origin_dict = {
    "nameInDictionary": "field value"
}
dataclass_instance = MyDataclass.from_dict(origin_dict)
>>> dataclass_instance.name_in_dataclass
"field value"
2. Custom converters
If you need to convert a dictionary value that isn't covered by the defaults, you can pass in a converter function using field_from_dict's converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"
@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)
dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})
>>> dataclass_instance.is_yes
True
The following code filters out the keys that do not correspond to a field of the dataclass:
import dataclasses
@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...
dict = {..., "invalid-name": 0, ...}
field_names = {f.name for f in dataclasses.fields(ClassDict)}  # fields() returns a tuple of Field objects
dict = {k: v for k, v in dict.items() if k in field_names}
However, I'm sure there should be a better way to do it since this is a bit hacky.
I would define a from_dict class method anyway, which would be a natural place to make the change.
@dataclass
class MyDict:
    ...
    valid_name: int
    ...
    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)
md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place or do something to avoid unnecessary copies is another matter.
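If modifying d in place is a concern, one possible variant builds a renamed copy instead. This is only a sketch: the KEY_MAP table and the extra field other are hypothetical and not part of the answer above.

```python
from dataclasses import dataclass

# Hypothetical mapping from external dictionary keys to field names.
KEY_MAP = {"invalid-name": "valid_name"}


@dataclass
class MyDict:
    valid_name: int
    other: str = ""

    @classmethod
    def from_dict(cls, d):
        # Build a renamed copy; the caller's dictionary is left untouched.
        renamed = {KEY_MAP.get(key, key): value for key, value in d.items()}
        return cls(**renamed)


source = {"invalid-name": 3, "other": "x"}
md = MyDict.from_dict(source)
```

The string literal now lives in a single table rather than in the conversion code, at the cost of one shallow copy per call.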
Another option could be to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings, as needed in this case.
I've also timed it with the builtin timeit module, and found it to be (on average) about 5x faster than a solution with dict_to_dataclass. I've added the code I used for comparison below.
from dataclasses import dataclass
from timeit import timeit
from typing_extensions import Annotated # Note: in Python 3.9+, can import this from `typing` instead
from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict
@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]
@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')
my_dict = {"invalid-name": 0}
n = 100_000
print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard: ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))
i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)
# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results, on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098
I want to create a dictionary that would store its keys as string variables, but would still be able to retrieve its values using enums.
The reason for that is that my code is sharing this dictionary with other programs and processes, by saving it into a database and by sending it to other applications using REST. In order to make it easier to manage the dictionary itself, I would like the keys to be of type string. However, I would appreciate it if I would be able to get values by enums.
I don't want to access values simply by using strings since that would be using "magic strings".
The option of constants variables is just ugly, and it seems like enum is the nicest solution to this problem. That being said, I don't want it to use the .value each time because it makes the code longer and uglier.
Is it possible to add support for both strings and enums?
You would need to override the __setitem__ and __getitem__ methods in order to support both string and enum types.
Your goal is to take the method's parameter and cast it into a string if it's an enum. That way you will be able to retrieve and store string values using the enum variable.
Your code should look like this:
from collections import UserDict
from enum import Enum
class EnumDict(UserDict):
    def __setitem__(self, key, value):
        string_key = key.value if isinstance(key, Enum) else key
        super().__setitem__(string_key, value)
    def __getitem__(self, item):
        string_item = item.value if isinstance(item, Enum) else item
        return super().__getitem__(string_item)
Pay attention that this would work only if you actually inherit from UserDict.
Please read this excellent blog post "The problem with inheriting from dict and list in Python" in order to understand why I'm using UserDict instead of a regular dict. Otherwise, you would need to override __init__, update, and get methods too.
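The difference between the two base classes can be seen in a small sketch. UpperDict and UpperUserDict are hypothetical classes used only to make the effect visible: both override __setitem__ to upper-case the key, but only the UserDict version sees keys passed through the constructor or update.

```python
from collections import UserDict


class UpperDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


class UpperUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)


plain = UpperDict({"a": 1})   # dict.__init__ bypasses the override
plain.update({"b": 2})        # so does dict.update
plain["c"] = 3                # only direct assignment uses it

wrapped = UpperUserDict({"a": 1})  # UserDict routes everything through __setitem__
wrapped.update({"b": 2})
wrapped["c"] = 3
```

After this runs, plain still holds the lower-case keys "a" and "b", while every key in wrapped has been upper-cased.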
Pay attention that if you're using python2 (which you shouldn't since it is no longer supported) the above code won't work because UserDict is used as a backward compatibility thing for the time in which you couldn't inherit directly from dict. It does not have the same set of features and capabilities as in python3.
Anyway, if you're using python2 or just don't want to inherit from UserDict, this is what your code should look like:
from enum import Enum
class EnumDict(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)
    def update(self, *args, **kwargs):
        try:
            for key, value in args[0].items():
                self.__setitem__(key, value)
        except Exception:
            # Fall back to the regular update; keyword arguments must be unpacked with **
            super().update(*args, **kwargs)
    def __setitem__(self, key, value):
        string_key = key.value if isinstance(key, Enum) else key
        super().__setitem__(string_key, value)
    def __getitem__(self, item):
        string_item = item.value if isinstance(item, Enum) else item
        return super().__getitem__(string_item)
    def get(self, key, default=None):
        string_key = key.value if isinstance(key, Enum) else key
        return super().get(string_key, default)
To sum things up, the whole idea here is to cast the enum into its string value. When using dict and not UserDict, one should also take care of the initialization, update and regular get processes, making sure you won't get stuck with enums instead of regular strings.
Here is an example of how this dictionary works:
In [1]: class Test(Enum):
   ...:     A = "abc"
   ...:     B = "xyz"
   ...:     C = "qwerty"
In [2]: my_dict = EnumDict({"ok": "fine", Test.B: "bla-bla"})
In [3]: my_dict
Out[3]: {'ok': 'fine', 'xyz': 'bla-bla'}
In [4]: my_dict[Test.A] = "hello"
In [5]: my_dict["testing"] = "works"
In [6]: my_dict
Out[6]: {'ok': 'fine', 'xyz': 'bla-bla', 'abc': 'hello', 'testing': 'works'}
In [7]: my_dict.update({"isOkay": "yes", Test.B: "wow"})
In [8]: my_dict
Out[8]: {'ok': 'fine', 'xyz': 'wow', 'abc': 'hello', 'testing': 'works', 'isOkay': 'yes'}
In [9]: my_dict.get(Test.C, "does not exist")
Out[9]: 'does not exist'
In [10]: my_dict.get(Test.A)
Out[10]: 'hello'
In [11]: my_dict[Test.A]
Out[11]: 'hello'
In [12]: my_dict["xyz"]
Out[12]: 'wow'
Combine your Enum with str, then each member will be a string as well as an Enum and you can use a normal dict:
class Test(str, Enum):
    A = "abc"
    B = "xyz"
    C = "qwerty"
my_dict = {"ok": "fine", Test.B: "bla-bla"}
The only difference is the display of the dict when an Enum member is the key:
>>> my_dict
{'ok': 'fine', <Test.B: 'xyz'>: 'bla-bla'}
But it still works just fine:
>>> my_dict['xyz']
'bla-bla'
>>> my_dict[Test.B]
'bla-bla'
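Since the question is about sharing the dictionary with other programs, here is a small sketch of how such a dict behaves when serialized with the standard json module. It assumes the same str-and-Enum mixin as above; the variable names payload and decoded are only for illustration.

```python
import json
from enum import Enum


class Test(str, Enum):
    A = "abc"
    B = "xyz"


my_dict = {"ok": "fine", Test.B: "bla-bla"}

# Members are real strings, so json writes the plain value as the key.
payload = json.dumps(my_dict)

# After decoding, the keys are ordinary strings, yet enum lookup still works
# because a member compares and hashes like its string value.
decoded = json.loads(payload)
```

Only the repr of the in-memory dict shows the enum member; the serialized form contains plain string keys.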
Is there a way to add duplicate keys to json with python?
From my understanding, you can't have duplicate keys in Python dictionaries. Usually, I create JSON by building a dictionary and then calling json.dumps. However, I need duplicated keys within the JSON for testing purposes, and I can't get them because I can't add duplicate keys to a Python dictionary. I am trying to do this in Python 3.
You could always construct such a string value by hand.
On the other hand, one can make the CPython json module encode duplicate keys. This is very tricky in Python 2 because the json module does not respect duck typing at all.
The straightforward solution would be to inherit from collections.Mapping - well you can't, since "MyMapping is not a JSON serializable."
Next one tries to subclass a dict - well, but if json.dumps notices that the type is dict, it skips from calling __len__, and sees the underlying dict directly - if it is empty, {} is output directly, so clearly if we fake the methods, the underlying dictionary must not be empty.
The next source of joy is that actually __iter__ is called, which iterates keys; and for each key, the __getitem__ is called, so we need to remember what is the corresponding value to return for the given key... thus we arrive at a very ugly solution for Python 2:
class FakeDict(dict):
    def __init__(self, items):
        # need to have something in the dictionary
        self['something'] = 'something'
        self._items = items
    def __getitem__(self, key):
        return self.last_val
    def __iter__(self):
        def generator():
            for key, value in self._items:
                self.last_val = value
                yield key
        return generator()
In CPython 3.3+ it is slightly easier... no, collections.abc.Mapping does not work, yes, you need to subclass a dict, yes, you need to fake that your dictionary has content... but the internal JSON encoder calls items instead of __iter__ and __getitem__!
Thus on Python 3:
import json
class FakeDict(dict):
    def __init__(self, items):
        self['something'] = 'something'
        self._items = items
    def items(self):
        return self._items
print(json.dumps(FakeDict([('a', 1), ('a', 2)])))
prints out
{"a": 1, "a": 2}
Thanks a lot Antti Haapala, I figured out you can even use this to convert an array of tuples into a FakeDict:
def function():
    array_of_tuples = []
    array_of_tuples.append(("key","value1"))
    array_of_tuples.append(("key","value2"))
    return FakeDict(array_of_tuples)
print(json.dumps(function()))
Output:
{"key": "value1", "key": "value2"}
And if you change the FakeDict class to the following, empty dictionaries will be serialized correctly:
class FakeDict(dict):
    def __init__(self, items):
        if items != []:
            self['something'] = 'something'
        self._items = items
    def items(self):
        return self._items
def test():
    array_of_tuples = []
    return FakeDict(array_of_tuples)
print(json.dumps(test()))
Output:
{}
Actually, it's very easy:
$> python -c "import json; print json.dumps({1: 'a', '1': 'b'})"
{"1": "b", "1": "a"}
I'm very new to python and I wish I could do . notation to access values of a dict.
Lets say I have test like this:
>>> test = dict()
>>> test['name'] = 'value'
>>> print(test['name'])
value
But I wish I could do test.name to get value. In fact, I did it by overriding the __getattr__ method in my class like this:
class JuspayObject:
    def __init__(self,response):
        self.__dict__['_response'] = response
    def __getattr__(self,key):
        try:
            return self._response[key]
        except KeyError,err:
            sys.stderr.write('Sorry no key matches')
and this works! when I do:
test.name // I get value.
But the problem is when I just print test alone I get the error as:
'Sorry no key matches'
Why is this happening?
This functionality already exists in the standard libraries, so I recommend you just use their class.
>>> from types import SimpleNamespace
>>> d = {'key1': 'value1', 'key2': 'value2'}
>>> n = SimpleNamespace(**d)
>>> print(n)
namespace(key1='value1', key2='value2')
>>> n.key2
'value2'
Adding, modifying and removing values is achieved with regular attribute access, i.e. you can use statements like n.key = val and del n.key.
To go back to a dict again:
>>> vars(n)
{'key1': 'value1', 'key2': 'value2'}
The keys in your dict should be string identifiers for attribute access to work properly.
Simple namespace was added in Python 3.3. For older versions of the language, argparse.Namespace has similar behaviour.
I assume that you are comfortable in Javascript and want to borrow that kind of syntax... I can tell you by personal experience that this is not a great idea.
It sure does look less verbose and neat; but in the long run it is just obscure. Dicts are dicts, and trying to make them behave like objects with attributes will probably lead to (bad) surprises.
If you need to manipulate the fields of an object as if they were a dictionary, you can always resort to the internal __dict__ attribute when you need it, and then it is explicitly clear what you are doing. Or use getattr(obj, 'key') to take the inheritance structure and class attributes into account too.
But by reading your example it seems that you are trying something different... As the dot operator will already look in the __dict__ attribute without any extra code.
In addition to this answer, one can add support for nested dicts as well:
from types import SimpleNamespace
class NestedNamespace(SimpleNamespace):
    def __init__(self, dictionary, **kwargs):
        super().__init__(**kwargs)
        for key, value in dictionary.items():
            if isinstance(value, dict):
                self.__setattr__(key, NestedNamespace(value))
            else:
                self.__setattr__(key, value)
nested_namespace = NestedNamespace({
    'parent': {
        'child': {
            'grandchild': 'value'
        }
    },
    'normal_key': 'normal value',
})
print(nested_namespace.parent.child.grandchild)  # value
print(nested_namespace.normal_key)  # normal value
Note that this does not support dot notation for dicts that are somewhere inside e.g. lists.
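When the data arrives as JSON text, a different route that also reaches dicts inside lists is to let the decoder build the namespaces. This is a sketch that swaps in json.loads with object_hook rather than the NestedNamespace class above; the sample document in raw is made up for illustration.

```python
import json
from types import SimpleNamespace

raw = '{"parent": {"child": {"grandchild": "value"}}, "entries": [{"id": 1}, {"id": 2}]}'

# object_hook is called for every decoded JSON object, including those in lists.
ns = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

grandchild = ns.parent.child.grandchild
second_id = ns.entries[1].id
```

As with SimpleNamespace in general, the keys need to be valid identifiers for dot access to be usable.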
Could you use a named tuple?
from collections import namedtuple
Test = namedtuple('Test', 'name foo bar')
my_test = Test('value', 'foo_val', 'bar_val')
print(my_test)
print(my_test.name)
__getattr__ is used as a fallback when all other attribute lookup rules have failed. When you try to "print" your object, Python looks for a __repr__ method, and since you don't implement it in your class it ends up calling __getattr__ (yes, in Python methods are attributes too). You shouldn't assume which key __getattr__ will be called with, and, most importantly, __getattr__ must raise an AttributeError if it cannot resolve the key.
As a side note: don't use self.__dict__ for ordinary attribute access, just use the plain attribute notation:
class JuspayObject:
    def __init__(self,response):
        # don't use self.__dict__ here
        self._response = response
    def __getattr__(self,key):
        try:
            return self._response[key]
        except KeyError,err:
            raise AttributeError(key)
Now if your class has no other responsibility (and your Python version is >= 2.6 and you don't need to support older versions), you may just use a namedtuple: http://docs.python.org/2/library/collections.html#collections.namedtuple
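For readers on Python 3, the same rule can be sketched as follows. The class mirrors the one in the question; the __repr__ method is an addition made here for illustration so that printing shows the wrapped data.

```python
class JuspayObject:
    def __init__(self, response):
        self._response = response

    def __getattr__(self, key):
        # Called only after normal attribute lookup has failed.
        try:
            return self._response[key]
        except KeyError:
            # Signal "no such attribute" the way Python expects.
            raise AttributeError(key) from None

    def __repr__(self):
        return "JuspayObject({!r})".format(self._response)


obj = JuspayObject({"name": "value"})
name = obj.name
has_missing = hasattr(obj, "missing")
fallback = getattr(obj, "missing", "fallback")
text = repr(obj)
```

Because AttributeError is raised for unknown keys, hasattr and getattr with a default behave normally instead of silently receiving None.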
You can use the argparse.Namespace class from the standard library:
import argparse
args = argparse.Namespace()
args.name = 'value'
print(args.name)
# 'value'
You can also get the original dict via vars(args).
class convert_to_dot_notation(dict):
    """
    Access dictionary attributes via dot notation
    """
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
test = {"name": "value"}
data = convert_to_dot_notation(test)
print(data.name)
You have to be careful when using __getattr__, because it's used for a lot of builtin Python functionality.
Try something like this...
class JuspayObject:
    def __init__(self,response):
        self.__dict__['_response'] = response
    def __getattr__(self, key):
        # First, try to return from _response
        try:
            return self.__dict__['_response'][key]
        except KeyError:
            pass
        # If that fails, return default behavior so we don't break Python
        try:
            return self.__dict__[key]
        except KeyError:
            raise AttributeError, key
>>> j = JuspayObject({'foo': 'bar'})
>>> j.foo
'bar'
>>> j
<__main__.JuspayObject instance at 0x7fbdd55965f0>
Here is a simple, handy dot notation helper example that is working with nested items:
import re
def dict_get(data: dict, path: str, default=None):
    pathList = re.split(r'\.', path, flags=re.IGNORECASE)
    result = data
    for key in pathList:
        try:
            # Numeric path segments are treated as list indices
            key = int(key) if key.isnumeric() else key
            result = result[key]
        except (KeyError, IndexError, TypeError):
            result = default
            break
    return result
Usage example:
my_dict = {"test1": "str1", "nested_dict": {"test2": "str2"}, "nested_list": ["str3", {"test4": "str4"}]}
print(dict_get(my_dict, "test1"))
# str1
print(dict_get(my_dict, "nested_dict.test2"))
# str2
print(dict_get(my_dict, "nested_list.1.test4"))
# str4
With a small addition to this answer you can support lists as well:
from types import SimpleNamespace
class NestedNamespace(SimpleNamespace):
    def __init__(self, dictionary, **kwargs):
        super().__init__(**kwargs)
        for key, value in dictionary.items():
            if isinstance(value, dict):
                self.__setattr__(key, NestedNamespace(value))
            elif isinstance(value, list):
                # Build a real list (map() is lazy in Python 3); only dict elements are wrapped
                self.__setattr__(key, [NestedNamespace(v) if isinstance(v, dict) else v for v in value])
            else:
                self.__setattr__(key, value)
2022 answer: I've created the dotwiz package -- this is a fast, tiny library that seems to perform really well in most cases.
>>> from dotwiz import DotWiz
>>> test = DotWiz(hello='world')
>>> test.works = True
>>> test
✫(hello='world', works=True)
>>> test.hello
'world'
>>> assert test.works
This feature is baked into OmegaConf:
from omegaconf import OmegaConf
your_dict = {"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]}
adot_dict = OmegaConf.create(your_dict)
print(adot_dict.k)
print(adot_dict.list)
Installation is:
pip install omegaconf
This lib comes in handy for configurations, which it is actually made for:
from omegaconf import OmegaConf
cfg = OmegaConf.load('config.yml')
print(cfg.data_path)
I use the dotted_dict package:
>>> from dotted_dict import DottedDict
>>> test = DottedDict()
>>> test.name = 'value'
>>> print(test.name)
value
Advantages over SimpleNamespace
(See @win's answer.) DottedDict is an actual dict:
>>> isinstance(test, dict)
True
This allows, for example, checking for membership:
>>> 'name' in test
True
whereas for SimpleNamespace you need something much less readable like hasattr(test, 'name').
Don't use DotMap
I found this out the hard way. If you reference a non-member it adds it rather than throwing an error. This can lead to hard to find bugs in code:
>>> from dotmap import DotMap
>>> dm = DotMap()
>>> 'a' in dm
False
>>> x = dm.a
>>> 'a' in dm
True
#!/usr/bin/env python3
import json
from sklearn.utils import Bunch
from collections.abc import MutableMapping
def dotted(inpt: MutableMapping,
           *args,
           **kwargs
           ) -> Bunch:
    """
    Enables recursive dot notation for ``dict``.
    """
    return json.loads(json.dumps(inpt),
                      object_hook=lambda x:
                      Bunch(**{**Bunch(), **x}))
You can make hacks adding dot notation to Dicts mostly work, but there are always namespace problems. As in, what does this do?
x = DotDict()
x["values"] = 1989
print(x.values)
I use pydash, which is a Python port of JS's lodash, to do these things a different way when the nesting gets too ugly.
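The namespace problem raised above can be made concrete with a minimal sketch. DotDict here is a hypothetical stand-in, written in the same style as the dict-based helpers in the other answers; it is not taken from any particular library.

```python
class DotDict(dict):
    """Minimal dot-access dict, for illustration only."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


x = DotDict()
x["values"] = 1989
x["year"] = 1989

shadowed = x.values   # normal lookup finds the dict.values method first
plain = x.year        # no such attribute, so __getattr__ returns the item
```

__getattr__ is only consulted when normal lookup fails, so any key that matches a dict method name (values, keys, items, get, ...) is reachable only through x["values"].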
Add a __repr__() method to the class so that you can customize the text shown when the object is printed.
Learn more here: https://web.archive.org/web/20121022015531/http://diveintopython.net/object_oriented_framework/special_class_methods2.html