What is the recommended way of serializing a namedtuple to json with the field names retained?
Serializing a namedtuple to json results in only the values being serialized and the field names being lost in translation. I would like the fields also to be retained when json-ized and hence did the following:
class foobar(namedtuple('f', 'foo, bar')):
    __slots__ = ()
    def __iter__(self):
        yield self._asdict()
The above serializes to JSON as I expect and behaves like a namedtuple everywhere else I use it (attribute access, etc.), except that iterating over it gives non-tuple-like results (which is fine for my use case).
What is the "correct way" of converting to json with the field names retained?
If it's just one namedtuple you're looking to serialize, using its _asdict() method will work (with Python >= 2.7)
>>> from collections import namedtuple
>>> import json
>>> FB = namedtuple("FB", ("foo", "bar"))
>>> fb = FB(123, 456)
>>> json.dumps(fb._asdict())
'{"foo": 123, "bar": 456}'
This is pretty tricky, since namedtuple() is a factory which returns a new type derived from tuple. One approach would be to have your class also inherit from UserDict.DictMixin, but tuple.__getitem__ is already defined and expects an integer denoting the position of the element, not the name of its attribute:
>>> f = foobar('a', 1)
>>> f[0]
'a'
At its heart the namedtuple is an odd fit for JSON, since it is really a custom-built type whose key names are fixed as part of the type definition, unlike a dictionary, where key names are stored inside the instance. This prevents you from "round-tripping" a namedtuple: you cannot decode a dictionary back into a namedtuple without some other piece of information, like an app-specific type marker in the dict {'a': 1, '#_type': 'foobar'}, which is a bit hacky.
This is not ideal, but if you only need to encode namedtuples into dictionaries, another approach is to extend or modify your JSON encoder to special-case these types. Here is an example of subclassing the Python json.JSONEncoder. This tackles the problem of ensuring that nested namedtuples are properly converted to dictionaries:
from collections import namedtuple
from json import JSONEncoder

class MyEncoder(JSONEncoder):
    # Note: _iterencode and _iterencode_dict are private methods of the
    # older json module; newer versions may not call this override.
    def _iterencode(self, obj, markers=None):
        if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
            gen = self._iterencode_dict(obj._asdict(), markers)
        else:
            gen = JSONEncoder._iterencode(self, obj, markers)
        for chunk in gen:
            yield chunk

class foobar(namedtuple('f', 'foo, bar')):
    pass

enc = MyEncoder()
for obj in (foobar('a', 1), ('a', 1), {'outer': foobar('x', 'y')}):
    print enc.encode(obj)
{"foo": "a", "bar": 1}
["a", 1]
{"outer": {"foo": "x", "bar": "y"}}
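The encoder above hooks into a private method, so it is tied to one version of the json module. As a hedged alternative for current Python 3, the same nested conversion can be sketched by turning namedtuples into dicts before encoding. The names to_jsonable and Foobar are made up for this sketch and are not part of any library:

```python
import json
from collections import namedtuple


def to_jsonable(obj):
    """Recursively replace namedtuples with dicts (hypothetical helper)."""
    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):
        return {key: to_jsonable(value) for key, value in obj._asdict().items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    return obj


Foobar = namedtuple("Foobar", "foo bar")

for obj in (Foobar("a", 1), ("a", 1), {"outer": Foobar("x", "y")}):
    print(json.dumps(to_jsonable(obj)))
```

Plain tuples still come out as JSON arrays; only objects that are tuples and also have an _asdict method are treated as namedtuples.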
It looks like you used to be able to subclass simplejson.JSONEncoder to make this work, but with the latest simplejson code, that is no longer the case: you have to actually modify the project code. I see no reason why simplejson should not support namedtuples, so I forked the project, added namedtuple support, and I'm currently waiting for my branch to be pulled back into the main project. If you need the fixes now, just pull from my fork.
EDIT: Looks like the latest versions of simplejson now natively support this with the namedtuple_as_object option, which defaults to True.
I wrote a library for doing this: https://github.com/ltworf/typedload
It can go from and to named-tuple and back.
It supports quite complicated nested structures, with lists, sets, enums, unions, default values. It should cover most common cases.
edit: The library also supports dataclass and attr classes.
It's impossible to serialize namedtuples correctly with the native python json library. It will always see tuples as lists, and it is impossible to override the default serializer to change this behaviour. It's worse if objects are nested.
Better to use a more robust library like orjson:
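import orjson
from typing import NamedTuple

class Rectangle(NamedTuple):
    width: int
    height: int

def default(obj):
    # Called by orjson for objects it cannot serialize natively
    if hasattr(obj, '_asdict'):
        return obj._asdict()
    # Without this, unsupported objects would silently become null
    raise TypeError

rectangle = Rectangle(width=10, height=20)
print(orjson.dumps(rectangle, default=default))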
=>
{
"width":10,
"height":20
}
A more convenient solution is to use a decorator (it relies on the protected field _fields).
Python 2.7+:
import json
from collections import namedtuple, OrderedDict

def json_serializable(cls):
    def as_dict(self):
        yield OrderedDict(
            (name, value) for name, value in zip(
                self._fields,
                iter(super(cls, self).__iter__())))
    cls.__iter__ = as_dict
    return cls

# Usage:
C = json_serializable(namedtuple('C', 'a b c'))
print json.dumps(C('abc', True, 3.14))

# or

@json_serializable
class D(namedtuple('D', 'a b c')):
    pass

print json.dumps(D('abc', True, 3.14))
Python 3.6.6+:
import json
from typing import NamedTuple

def json_serializable(cls):
    def as_dict(self):
        yield {name: value for name, value in zip(
            self._fields,
            iter(super(cls, self).__iter__()))}
    cls.__iter__ = as_dict
    return cls

# Usage:
@json_serializable
class C(NamedTuple):
    a: str
    b: bool
    c: float

print(json.dumps(C('abc', True, 3.14)))
It recursively converts the namedTuple data to json.
print(m1)
## Message(id=2, agent=Agent(id=1, first_name='asd', last_name='asd', mail='2@mai.com'), customer=Customer(id=1, first_name='asd', last_name='asd', mail='2@mai.com', phone_number=123123), type='image', content='text', media_url='h.com', la=123123, ls=4512313)
def reqursive_to_json(obj):
    _json = {}
    if isinstance(obj, tuple):
        datas = obj._asdict()
        for data in datas:
            if isinstance(datas[data], tuple):
                _json[data] = (reqursive_to_json(datas[data]))
            else:
                print(datas[data])
                _json[data] = (datas[data])
    return _json
data = reqursive_to_json(m1)
print(data)
{'agent': {'first_name': 'asd',
           'last_name': 'asd',
           'mail': '2@mai.com',
           'id': 1},
 'content': 'text',
 'customer': {'first_name': 'asd',
              'last_name': 'asd',
              'mail': '2@mai.com',
              'phone_number': 123123,
              'id': 1},
 'id': 2,
 'la': 123123,
 'ls': 4512313,
 'media_url': 'h.com',
 'type': 'image'}
The jsonplus library provides a serializer for NamedTuple instances. Use its compatibility mode to output simple objects if needed, but prefer the default as it is helpful for decoding back.
This is an old question. However:
A suggestion for everyone with the same question: think carefully before relying on any of the private or internal features of NamedTuple, because they have changed before and will change again over time.
For example, if your NamedTuple is a flat value object and you only need to serialize it directly (not nested inside another object), you can avoid the trouble that would come from __dict__ being removed or _asdict() changing, and just do something like this (and yes, this is Python 3, because this answer is for the present):
import json
from typing import NamedTuple

class ApiListRequest(NamedTuple):
    group: str = "default"
    filter: str = "*"

    def to_dict(self):
        return {
            'group': self.group,
            'filter': self.filter,
        }

    def to_json(self):
        return json.dumps(self.to_dict())
I tried to use the default callable kwarg to dumps in order to do the to_dict() call if available, but that didn't get called as the NamedTuple is convertible to a list.
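A minimal sketch of that behavior with the standard json module, assuming a made-up Point type: the default hook is only consulted for objects the encoder cannot handle, and a NamedTuple is already handled as a tuple, so the hook never runs.

```python
import json
from typing import NamedTuple


class Point(NamedTuple):  # hypothetical example type
    x: int
    y: int


calls = []


def default(obj):
    # Record every object the encoder hands to the fallback hook.
    calls.append(obj)
    return obj._asdict()


encoded = json.dumps(Point(1, 2), default=default)
print(encoded)  # [1, 2]
print(calls)    # []
```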
Here is my take on the problem. It serializes the NamedTuple and takes care of nested NamedTuples and lists inside of them:
from typing import Any

def recursive_to_dict(obj: Any) -> dict:
    _dict = {}
    if isinstance(obj, tuple):
        node = obj._asdict()
        for item in node:
            if isinstance(node[item], list):  # Process as a list
                _dict[item] = [recursive_to_dict(x) for x in (node[item])]
            elif getattr(node[item], "_asdict", False):  # Process as a NamedTuple
                _dict[item] = recursive_to_dict(node[item])
            else:  # Process as a regular element
                _dict[item] = (node[item])
    return _dict
simplejson.dump() instead of json.dump does the job. It may be slower though.
I'm very new to Python and I wish I could use dot notation to access the values of a dict.
Let's say I have test like this:
>>> test = dict()
>>> test['name'] = 'value'
>>> print(test['name'])
value
But I wish I could do test.name to get value. In fact, I did it by overriding the __getattr__ method in my class like this:
import sys

class JuspayObject:
    def __init__(self,response):
        self.__dict__['_response'] = response
    def __getattr__(self,key):
        try:
            return self._response[key]
        except KeyError,err:
            sys.stderr.write('Sorry no key matches')
and this works! when I do:
test.name // I get value.
But the problem is when I just print test alone I get the error as:
'Sorry no key matches'
Why is this happening?
This functionality already exists in the standard libraries, so I recommend you just use their class.
>>> from types import SimpleNamespace
>>> d = {'key1': 'value1', 'key2': 'value2'}
>>> n = SimpleNamespace(**d)
>>> print(n)
namespace(key1='value1', key2='value2')
>>> n.key2
'value2'
Adding, modifying and removing values is achieved with regular attribute access, i.e. you can use statements like n.key = val and del n.key.
To go back to a dict again:
>>> vars(n)
{'key1': 'value1', 'key2': 'value2'}
The keys in your dict should be string identifiers for attribute access to work properly.
SimpleNamespace was added in Python 3.3. For older versions of the language, argparse.Namespace has similar behaviour.
I assume that you are comfortable in Javascript and want to borrow that kind of syntax... I can tell you by personal experience that this is not a great idea.
It sure does look less verbose and neat; but in the long run it is just obscure. Dicts are dicts, and trying to make them behave like objects with attributes will probably lead to (bad) surprises.
If you need to manipulate the fields of an object as if they were a dictionary, you can always resort to the internal __dict__ attribute when you need it, and then it is explicitly clear what you are doing. Or use getattr(obj, 'key') to take the inheritance structure and class attributes into account too.
But by reading your example it seems that you are trying something different... As the dot operator will already look in the __dict__ attribute without any extra code.
In addition to this answer, one can add support for nested dicts as well:
from types import SimpleNamespace

class NestedNamespace(SimpleNamespace):
    def __init__(self, dictionary, **kwargs):
        super().__init__(**kwargs)
        for key, value in dictionary.items():
            if isinstance(value, dict):
                self.__setattr__(key, NestedNamespace(value))
            else:
                self.__setattr__(key, value)

nested_namespace = NestedNamespace({
    'parent': {
        'child': {
            'grandchild': 'value'
        }
    },
    'normal_key': 'normal value',
})
print(nested_namespace.parent.child.grandchild)  # value
print(nested_namespace.normal_key)  # normal value
Note that this does not support dot notation for dicts that are somewhere inside e.g. lists.
Could you use a named tuple?
from collections import namedtuple
Test = namedtuple('Test', 'name foo bar')
my_test = Test('value', 'foo_val', 'bar_val')
print(my_test)
print(my_test.name)
__getattr__ is used as a fallback when all other attribute lookup rules have failed. When you try to "print" your object, Python looks for a __repr__ method, and since you don't implement it in your class it ends up calling __getattr__ (yes, in Python methods are attributes too). You shouldn't assume which key __getattr__ will be called with, and, most importantly, __getattr__ must raise an AttributeError if it cannot resolve key.
As a side note: don't use self.__dict__ for ordinary attribute access, just use the plain attribute notation:
class JuspayObject:
    def __init__(self,response):
        # don't use self.__dict__ here
        self._response = response
    def __getattr__(self,key):
        try:
            return self._response[key]
        except KeyError,err:
            raise AttributeError(key)
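For readers on Python 3, here is a hedged sketch of the same idea using the modern except syntax. The sample dict is made up for illustration. Because a missing key becomes an AttributeError, built-ins such as hasattr, getattr with a default, and print behave normally:

```python
class JuspayObject:
    def __init__(self, response):
        self._response = response

    def __getattr__(self, key):
        # Only called when normal attribute lookup has already failed.
        try:
            return self._response[key]
        except KeyError:
            raise AttributeError(key) from None


test = JuspayObject({"name": "value"})
print(test.name)                       # value
print(hasattr(test, "missing"))        # False
print(getattr(test, "missing", None))  # None
```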
Now if your class has no other responsibility (and your Python version is >= 2.6 and you don't need to support older versions), you may just use a namedtuple: http://docs.python.org/2/library/collections.html#collections.namedtuple
You can use the standard-library class argparse.Namespace:
import argparse
args = argparse.Namespace()
args.name = 'value'
print(args.name)
# 'value'
You can also get the original dict via vars(args).
class convert_to_dot_notation(dict):
    """
    Access dictionary attributes via dot notation
    """
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

test = {"name": "value"}
data = convert_to_dot_notation(test)
print(data.name)
You have to be careful when using __getattr__, because it's used for a lot of builtin Python functionality.
Try something like this...
class JuspayObject:
    def __init__(self,response):
        self.__dict__['_response'] = response
    def __getattr__(self, key):
        # First, try to return from _response
        try:
            return self.__dict__['_response'][key]
        except KeyError:
            pass
        # If that fails, return default behavior so we don't break Python
        try:
            return self.__dict__[key]
        except KeyError:
            raise AttributeError, key
>>> j = JuspayObject({'foo': 'bar'})
>>> j.foo
'bar'
>>> j
<__main__.JuspayObject instance at 0x7fbdd55965f0>
Here is a simple, handy dot notation helper example that is working with nested items:
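import re

def dict_get(data: dict, path: str, default=None):
    # Split "a.b.0.c" into its individual keys
    pathList = re.split(r'\.', path)
    result = data
    for key in pathList:
        try:
            # Numeric path segments are treated as list indexes
            key = int(key) if key.isnumeric() else key
            result = result[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, bad index, or a non-subscriptable value
            result = default
            break
    return result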
Usage example:
my_dict = {"test1": "str1", "nested_dict": {"test2": "str2"}, "nested_list": ["str3", {"test4": "str4"}]}
print(dict_get(my_dict, "test1"))
# str1
print(dict_get(my_dict, "nested_dict.test2"))
# str2
print(dict_get(my_dict, "nested_list.1.test4"))
# str4
With a small addition to this answer you can support lists as well:
from types import SimpleNamespace

class NestedNamespace(SimpleNamespace):
    def __init__(self, dictionary, **kwargs):
        super().__init__(**kwargs)
        for key, value in dictionary.items():
            if isinstance(value, dict):
                self.__setattr__(key, NestedNamespace(value))
            elif isinstance(value, list):
                # Build a real list (map() is lazy in Python 3) and only
                # wrap the elements that are dicts
                self.__setattr__(key, [NestedNamespace(v) if isinstance(v, dict) else v
                                       for v in value])
            else:
                self.__setattr__(key, value)
2022 answer: I've created the dotwiz package -- this is a fast, tiny library that seems to perform really well in most cases.
>>> from dotwiz import DotWiz
>>> test = DotWiz(hello='world')
>>> test.works = True
>>> test
✫(hello='world', works=True)
>>> test.hello
'world'
>>> assert test.works
This feature is baked into OmegaConf:
from omegaconf import OmegaConf
your_dict = {"k" : "v", "list" : [1, {"a": "1", "b": "2", 3: "c"}]}
adot_dict = OmegaConf.create(your_dict)
print(adot_dict.k)
print(adot_dict.list)
Installation is:
pip install omegaconf
This lib comes in handy for configurations, which it is actually made for:
from omegaconf import OmegaConf
cfg = OmegaConf.load('config.yml')
print(cfg.data_path)
I use the dotted_dict package:
>>> from dotted_dict import DottedDict
>>> test = DottedDict()
>>> test.name = 'value'
>>> print(test.name)
value
Advantages over SimpleNamespace
(See @win's answer.) DottedDict is an actual dict:
>>> isinstance(test, dict)
True
This allows, for example, checking for membership:
>>> 'name' in test
True
whereas for SimpleNamespace you need something much less readable like hasattr(test, 'name').
Don't use DotMap
I found this out the hard way. If you reference a non-member it adds it rather than throwing an error. This can lead to hard to find bugs in code:
>>> from dotmap import DotMap
>>> dm = DotMap()
>>> 'a' in dm
False
>>> x = dm.a
>>> 'a' in dm
True
#!/usr/bin/env python3
import json
from sklearn.utils import Bunch
from collections.abc import MutableMapping

def dotted(inpt: MutableMapping,
           *args,
           **kwargs
           ) -> Bunch:
    """
    Enables recursive dot notation for ``dict``.
    """
    return json.loads(json.dumps(inpt),
                      object_hook=lambda x:
                      Bunch(**{**Bunch(), **x}))
You can make hacks adding dot notation to Dicts mostly work, but there are always namespace problems. As in, what does this do?
x = DotDict()
x["values"] = 1989
print(x.values)
I use pydash, which is a Python port of JS's lodash, to do these things a different way when the nesting gets too ugly.
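To make the namespace problem above concrete, here is a small sketch with a hypothetical minimal DotDict (the class is an assumption for illustration, in the style of the dict-subclass answers on this page). Because __getattr__ only runs when normal lookup fails, the built-in dict.values method wins over the stored key:

```python
class DotDict(dict):
    """Hypothetical minimal dot-access dict, for illustration only."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


x = DotDict()
x["values"] = 1989
print(x["values"])         # 1989
print(callable(x.values))  # True: this is the dict.values method, not 1989
```

Any key that matches a dict method name (keys, items, update, and so on) is shadowed in the same way.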
Add a __repr__() method to the class so that you can customize the text shown by print test.
Learn more here: https://web.archive.org/web/20121022015531/http://diveintopython.net/object_oriented_framework/special_class_methods2.html
I have a class that inherits from a dictionary in order to add some custom behavior - in this case it passes each key and value to a function for validation. In the example below, the 'validation' simply prints a message.
Assignment to the dictionary works as expected, printing messages whenever items are added to the dict. But when I try to use the custom dictionary type as the __dict__ attribute of a class, attribute assignments (which in turn put keys/values into my custom dictionary class) somehow manage to insert values into the dictionary while completely bypassing __setitem__ (and the other methods I've defined that may add keys).
The custom dictionary:
from collections import MutableMapping

class ValidatedDict(dict):
    """A dictionary that passes each value it ends up storing through
    a given validator function.
    """
    def __init__(self, validator, *args, **kwargs):
        self.__validator = validator
        self.update(*args, **kwargs)
    def __setitem__(self, key, value):
        self.__validator(value)
        self.__validator(key)
        dict.__setitem__(self, key, value)
    def copy(self): pass # snipped
    def fromkeys(validator, seq, v = None): pass # snipped
    setdefault = MutableMapping.setdefault
    update = MutableMapping.update

def Validator(i): print "Validating:", i
Using it as the __dict__ attribute of a class yields behavior I don't understand.
>>> d = ValidatedDict(Validator)
>>> d["key"] = "value"
Validating: value
Validating: key
>>> class Foo(object): pass
...
>>> foo = Foo()
>>> foo.__dict__ = ValidatedDict(Validator)
>>> type(foo.__dict__)
<class '__main__.ValidatedDict'>
>>> foo.bar = 100 # Yields no message!
>>> foo.__dict__['odd'] = 99
Validating: 99
Validating: odd
>>> foo.__dict__
{'odd': 99, 'bar': 100}
Can someone explain why it doesn't behave the way I expect? Can it or can't it work the way I'm attempting?
This is an optimization. To support metamethods on __dict__, every single instance assignment would need to check for the existence of the metamethod. This is a fundamental operation (every attribute lookup and assignment), so the extra couple of branches needed to check this would become overhead for the whole language, for something that's more or less redundant with obj.__getattr__ and obj.__setattr__.
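Since attribute assignment does not go through the __setitem__ of a custom __dict__, the validation can instead be hooked in at __setattr__, as the answer hints. A minimal Python 3 sketch, where validator and the validated list are stand-ins made up for this example:

```python
validated = []


def validator(item):
    # Stand-in for real validation: just record what was seen.
    validated.append(item)


class Foo:
    def __setattr__(self, name, value):
        # Every attribute assignment on an instance passes through here.
        validator(value)
        validator(name)
        super().__setattr__(name, value)


foo = Foo()
foo.bar = 100
print(validated)     # [100, 'bar']
print(foo.__dict__)  # {'bar': 100}
```

Writes made directly to foo.__dict__ still bypass this hook, so it only covers normal attribute assignment.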
I'm new to Python, and am sort of surprised I cannot do this.
dictionary = {
    'a' : '123',
    'b' : dictionary['a'] + '456'
}
I'm wondering what the Pythonic way to do this correctly in my script is, because I feel like I'm not the only one who has tried to do this.
EDIT: Enough people were wondering what I'm doing with this, so here are more details for my use cases. Let's say I want to keep dictionary objects to hold file system paths. The paths are relative to other values in the dictionary. For example, this is what one of my dictionaries may look like.
dictionary = {
    'user': 'sholsapp',
    'home': '/home/' + dictionary['user']
}
It is important that at any point in time I may change dictionary['user'] and have all of the dictionary's values reflect the change. Again, this is an example of what I'm using it for, so I hope that it conveys my goal.
From my own research I think I will need to implement a class to do this.
No fear of creating new classes -
You can take advantage of Python's string formatting capabilities
and simply do:
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item) % self

dictionary = MyDict({
    'user' : 'gnucom',
    'home' : '/home/%(user)s',
    'bin' : '%(home)s/bin'
})
print dictionary["home"]
print dictionary["bin"]
The nearest I came up with, without writing a class:
dictionary = {
    'user' : 'gnucom',
    'home' : lambda:'/home/'+dictionary['user']
}
print dictionary['home']()
dictionary['user']='tony'
print dictionary['home']()
>>> dictionary = {
... 'a':'123'
... }
>>> dictionary['b'] = dictionary['a'] + '456'
>>> dictionary
{'a': '123', 'b': '123456'}
This works fine. In your version, the name dictionary hasn't been defined yet at the point where you try to use it (because Python has to evaluate the dictionary literal first, before binding the name).
But be careful because this assigns to the key of 'b' the value referenced by the key of 'a' at the time of assignment and is not going to do the lookup every time. If that is what you are looking for, it's possible but with more work.
What you're describing in your edit is how an INI config file works. Python has a built-in library called ConfigParser (configparser in Python 3) which should work for what you're describing.
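A minimal sketch of that approach with Python 3's configparser, assuming a made-up [paths] section. The default interpolation expands %(user)s each time a value is read, so changing user is reflected in home:

```python
import configparser

# Hypothetical INI content; %(user)s refers to another option in the same section.
INI_TEXT = """
[paths]
user = sholsapp
home = /home/%(user)s
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
paths = parser["paths"]

print(paths["home"])  # /home/sholsapp
paths["user"] = "tony"
print(paths["home"])  # /home/tony
```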
This is an interesting problem. It seems like Greg has a good solution. But that's no fun ;)
jsbueno has a very elegant solution, but it only applies to strings (as you requested).
The trick to a 'general' self referential dictionary is to use a surrogate object. It takes a few (understatement) lines of code to pull off, but the usage is about what you want:
S = SurrogateDict(AdditionSurrogateDictEntry)
d = S.resolve({'user': 'gnucom',
               'home': '/home/' + S['user'],
               'config': [S['home'] + '/.emacs', S['home'] + '/.bashrc']})
The code to make that happen is not nearly so short. It lives in three classes:
import abc

class SurrogateDictEntry(object):
    __metaclass__ = abc.ABCMeta
    def __init__(self, key):
        """record the key on the real dictionary that this will resolve to a
        value for
        """
        self.key = key
    def resolve(self, d):
        """ return the actual value"""
        if hasattr(self, 'op'):
            # any operation done on self will store its name in self.op.
            # if this is set, resolve it by calling the appropriate method
            # now that we can get self.value out of d
            self.value = d[self.key]
            return getattr(self, self.op + 'resolve__')()
        else:
            return d[self.key]
    @staticmethod
    def make_op(opname):
        """A convenience helper. This will be the form of all op hooks for subclasses
        The actual logic for the op is in __op__resolve__ (e.g. __add__resolve__)
        """
        def op(self, other):
            self.stored_value = other
            self.op = opname
            return self
        op.__name__ = opname
        return op
Next comes the concrete class. Simple enough.
class AdditionSurrogateDictEntry(SurrogateDictEntry):
    __add__ = SurrogateDictEntry.make_op('__add__')
    __radd__ = SurrogateDictEntry.make_op('__radd__')
    def __add__resolve__(self):
        return self.value + self.stored_value
    def __radd__resolve__(self):
        return self.stored_value + self.value
Here's the final class
class SurrogateDict(object):
    def __init__(self, EntryClass):
        self.EntryClass = EntryClass
    def __getitem__(self, key):
        """record the key and return"""
        return self.EntryClass(key)
    @staticmethod
    def resolve(d):
        """I eat generators resolve self references"""
        stack = [d]
        while stack:
            cur = stack.pop()
            # This just tries to set it to an appropriate iterable
            it = xrange(len(cur)) if not hasattr(cur, 'keys') else cur.keys()
            for key in it:
                # sorry for being a duche. Just register your class with
                # SurrogateDictEntry and you can pass whatever.
                while isinstance(cur[key], SurrogateDictEntry):
                    cur[key] = cur[key].resolve(d)
                # I'm just going to check for iter but you can add other
                # checks here for items that we should loop over.
                if hasattr(cur[key], '__iter__'):
                    stack.append(cur[key])
        return d
In response to gnucom's question about why I named the classes the way that I did:
The word surrogate is generally associated with standing in for something else so it seemed appropriate because that's what the SurrogateDict class does: an instance replaces the 'self' references in a dictionary literal. That being said, (other than just being straight up stupid sometimes) naming is probably one of the hardest things for me about coding. If you (or anyone else) can suggest a better name, I'm all ears.
I'll provide a brief explanation. Throughout, S refers to an instance of SurrogateDict and d is the real dictionary.
1. A reference S[key] triggers S.__getitem__, which returns SurrogateDictEntry(key) to be placed in d.
2. When S[key] = SurrogateDictEntry(key) is constructed, it stores key. This will be the key into d for the value that this instance of SurrogateDictEntry is acting as a surrogate for.
3. After S[key] is returned, it is either entered into d or has some operation(s) performed on it. If an operation is performed on it, that triggers the corresponding __op__ method, which simply stores the value that the operation is performed with and the name of the operation, and then returns itself. We can't actually resolve the operation yet because d hasn't been constructed.
4. After d is constructed, it is passed to S.resolve. This method loops through d, finding any instances of SurrogateDictEntry and replacing them with the result of calling the resolve method on the instance.
5. The SurrogateDictEntry.resolve method receives the now-constructed d as an argument and can use the value of key that it stored at construction time to get the value that it is acting as a surrogate for. If an operation was performed on it after creation, the op attribute will have been set to the name of the operation that was performed. If the class has an __op__ method, then it has a corresponding __op__resolve__ method with the actual logic that would normally be in the __op__ method. So now we have the logic (self.op__resolve) and all necessary values (self.value, self.stored_value) to finally get the real value of d[key]. We return that, and step 4 places it in the dictionary.
6. Finally, the SurrogateDict.resolve method returns d with all references resolved.
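The flow above can be condensed into a much smaller sketch. This is not the code from this answer: the names Entry and Surrogate are hypothetical, only + is supported, only top-level values are resolved, and each operation returns a new entry instead of mutating and returning itself. It is only meant to show the record-now, resolve-later idea.

```python
import operator


class Entry(object):
    """Stands in for d[key] until d exists (hypothetical, simplified)."""

    def __init__(self, key, ops=()):
        self.key = key          # key in d whose value this entry stands in for
        self.ops = tuple(ops)   # deferred (function, operand) pairs

    def __add__(self, other):
        # d is not built yet, so just record the operation and its operand.
        return Entry(self.key, self.ops + ((operator.add, other),))

    def resolve(self, d):
        value = d[self.key]
        # The referenced value may itself still be an unresolved entry.
        while isinstance(value, Entry):
            value = value.resolve(d)
        for func, operand in self.ops:
            if isinstance(operand, Entry):
                operand = operand.resolve(d)
            value = func(value, operand)
        return value


class Surrogate(object):
    """Stands in for the dictionary inside its own literal."""

    def __getitem__(self, key):
        return Entry(key)

    @staticmethod
    def resolve(d):
        # Replace every top-level Entry with its real value.
        for key in list(d):
            while isinstance(d[key], Entry):
                d[key] = d[key].resolve(d)
        return d


S = Surrogate()
d = Surrogate.resolve({'a': 1, 'b': S['a'] + 2, 'c': S['b'] + S['a']})
print(d)  # {'a': 1, 'b': 3, 'c': 4} on Python 3.7+
```

Because Entry.resolve follows references recursively, the result does not depend on the order in which the keys are visited.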
That's a rough sketch. If you have any more questions, feel free to ask.
If you, like me, are wondering how to make @jsbueno's snippet work with {}-style substitutions, below is some example code (which is probably not very efficient, though):
import string
class MyDict(dict):
    def __init__(self, *args, **kw):
        super(MyDict, self).__init__(*args, **kw)
        self.itemlist = super(MyDict, self).keys()
        self.fmt = string.Formatter()
    def __getitem__(self, item):
        return self.fmt.vformat(dict.__getitem__(self, item), {}, self)
xs = MyDict({
    'user' : 'gnucom',
    'home' : '/home/{user}',
    'bin' : '{home}/bin'
})
>>> xs["home"]
'/home/gnucom'
>>> xs["bin"]
'/home/gnucom/bin'
I tried to make it work by simply replacing % self with .format(**self), but that doesn't work for nested expressions (like 'bin' in the listing above, which references 'home', which has its own reference to 'user') because of the evaluation order: the ** expansion happens before the actual format call, so it isn't delayed the way the lookups are in the original % version.
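To make the evaluation-order point concrete, here is a small comparison. The class names EagerDict and LazyDict are made up for this illustration, and the second variant assumes Python 3.2+, where str.format_map passes the mapping through without copying it, so every lookup goes through the overridden __getitem__. The output of the first variant is what CPython produces.

```python
class EagerDict(dict):
    """.format(**self) copies the raw values into a new dict up front."""

    def __getitem__(self, item):
        return dict.__getitem__(self, item).format(**self)


class LazyDict(dict):
    """.format_map(self) looks each field up through __getitem__ on demand."""

    def __getitem__(self, item):
        return dict.__getitem__(self, item).format_map(self)


paths = {'user': 'gnucom', 'home': '/home/{user}', 'bin': '{home}/bin'}

eager_bin = EagerDict(paths)['bin']
lazy_bin = LazyDict(paths)['bin']

print(eager_bin)  # /home/{user}/bin  (only one level was substituted)
print(lazy_bin)   # /home/gnucom/bin
```

format_map is a shorter alternative to the string.Formatter().vformat call above; both defer the lookups until the format call actually needs them.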
Write a class, maybe something with properties:
class PathInfo(object):
    def __init__(self, user):
        self.user = user
    @property
    def home(self):
        return '/home/' + self.user
p = PathInfo('thc')
print(p.home)  # /home/thc
As sort of an extended version of @Tony's answer, you could build a dictionary subclass that calls its values if they are callables:
class CallingDict(dict):
    """Returns the result rather than the value of referenced callables.
    >>> cd = CallingDict({1: "One", 2: "Two", 'fsh': "Fish",
    ...                   "rhyme": lambda d: ' '.join((d[1], d['fsh'],
    ...                                                d[2], d['fsh']))})
    >>> cd["rhyme"]
    'One Fish Two Fish'
    >>> cd[1] = 'Red'
    >>> cd[2] = 'Blue'
    >>> cd["rhyme"]
    'Red Fish Blue Fish'
    """
    def __getitem__(self, item):
        it = super(CallingDict, self).__getitem__(item)
        if callable(it):
            return it(self)
        else:
            return it
Of course this would only be usable if you're not actually going to store callables as values. If you need to be able to do that, you could wrap the lambda declaration in a function that adds some attribute to the resulting lambda, and check for it in CallingDict.__getitem__, but at that point it's getting complex, and long-winded, enough that it might just be easier to use a class for your data in the first place.
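For what it's worth, the marker idea mentioned above could look roughly like this. The names computed, is_computed and MarkedCallingDict are invented for this sketch; the only assumption is that the wrapped function accepts attributes, which is true for lambdas and def functions.

```python
def computed(func):
    """Tag func so the dict calls it on lookup (hypothetical helper)."""
    func.is_computed = True
    return func


class MarkedCallingDict(dict):
    """Calls only values tagged with computed(); other callables are plain data."""

    def __getitem__(self, item):
        value = super(MarkedCallingDict, self).__getitem__(item)
        if getattr(value, 'is_computed', False):
            return value(self)
        return value


cd = MarkedCallingDict({
    'first': 'Red',
    'shout': str.upper,  # ordinary callable, stored and returned as-is
    'loud': computed(lambda d: d['shout'](d['first'])),
})

print(cd['loud'])                # RED
print(cd['shout'] is str.upper)  # True
```

This keeps plain callables storable, at the cost of having to remember the wrapper for every computed entry, which is part of why a class may end up simpler.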
This is very easy in a lazily evaluated language (Haskell).
Since Python is strictly evaluated, we can do a little trick to turn things lazy:
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
d1 = lambda self: lambda: {
    'a': lambda: 3,
    'b': lambda: self()['a']()
}
# fix the d1, and evaluate it
d2 = Y(d1)()
# to get a
d2['a']() # 3
# to get b
d2['b']() # 3
Syntax-wise this is not very nice, because we need to explicitly construct lazy expressions with lambda: ... and explicitly evaluate them with ...(). It's the opposite of the problem in lazy languages, which need strictness annotations; here in Python we end up needing laziness annotations.
I think that with some more metaprogramming and a few more tricks, the above could be made easier to use.
Note that this is basically how let-rec works in some functional languages.
jsbueno's answer in Python 3:
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item).format(self)
dictionary = MyDict({
    'user' : 'gnucom',
    'home' : '/home/{0[user]}',
    'bin' : '{0[home]}/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
Here we use Python 3 string formatting with curly braces {} and the .format() method.
Documentation: https://docs.python.org/3/library/string.html