I have a Python dict-of-dict structure with a large number of outer-dict keys (millions to billions). The inner dicts are mostly empty, but can store key-value pairs. Currently I create a separate dict as each of the inner dicts. But it uses a lot of memory that I don't end up using. Each empty dict is small, but I have a lot of them. I'd like to delay creating the inner dict until needed.
Ideally, I'd like to even delay creating the inner dict until a key-value pair is set in the inner dict. I envision using a single DelayDict object for ALL outer-dict values. This object would act like an empty dict for get and getitem calls, but as soon as a setitem or update call comes in it would create an empty dict to take its place. I run into trouble having the delaydict object know how to connect the new empty dict with the dict-of-dict structure.
class DelayDict(object):  # can do much more - only showing get/set
    def __init__(self, dod):
        self.dictofdict = dod  # the outer dict
    def __getitem__(self, key):
        raise KeyError(key)
    def __setitem__(self, key, value):
        replacement = {key: value}
        # replace myself in the outer dict!!
        self.dictofdict[?????] = replacement
I can't think of how to store the new replacement dict in the dict-of-dict structure so that it replaces the DelayDict class as the inner dict. I know properties can do similar things, but I believe the same fundamental trouble arises when I try to replace myself inside the outer dict.
Old question, but I came across a similar problem. I'm not sure that it's a good idea to try to spare some memory, but if you really need to do that, you should try to build your own data structure.
If you are stuck with the dict of dict, here's a solution.
First, you need a way to create keys in the OuterDict without a value (by default, the value would be {}). If OuterDict is a wrapper around a dict __d:
    def create(self, key):
        self.__d[key] = None
How much memory will you spare?
>>> import sys
>>> a = {}
>>> sys.getsizeof(a)
136
As you pointed out, None is created only once, but you have to keep a reference to it. In CPython (64-bit), a reference is 8 bytes. For 1 billion elements, you save (136 - 8) * 10**9 bytes = 128 GB (the exact size of an empty dict depends on the CPython version). You need to give a placeholder when someone asks for the value. The placeholder keeps track of the outer dict and the key in the outer dict. It wraps a dict and assigns this dict to outer[key] when you assign a value.
No more talking, code:
class OuterDict():
    def __init__(self):
        self.__d = {}
    def __getitem__(self, key):
        v = self.__d[key]
        if v is None:  # an orphan key
            v = PlaceHolder(self.__d, key)
        return v
    def create(self, key):
        self.__d[key] = None

class PlaceHolder():
    def __init__(self, parent, key):
        self.__parent = parent
        self.__key = key
        self.__d = {}
    def __getitem__(self, key):
        return self.__d[key]
    def __setitem__(self, key, value):
        if not self.__d:
            self.__parent[self.__key] = self.__d  # store my dict in the outer dict
        self.__d[key] = value
    def __repr__(self):
        return repr("PlaceHolder for " + str(self.__d))
    # __len__, ...
A test:
o = OuterDict()
o.create("a") # a is empty
print(o["a"])
try:
    o["a"]["b"]  # KeyError
except KeyError as e:
    print("KeyError", e)
o["a"]["b"] = 2
print(o["a"])
# output:
# 'PlaceHolder for {}'
# KeyError 'b'
# {'b': 2}
Why doesn't it use much memory? Because you are not building billions of placeholders: each one is created when a key is accessed and released when you no longer need it. Maybe you will need just one at a time.
Possible improvements: you can create a pool of PlaceHolders. A stack may be a good data structure, since recently created placeholders are likely to be released soon. When you need a new PlaceHolder, look into the stack, and if a placeholder is referenced only by the stack, you can reuse it (check with sys.getrefcount, keeping in mind that the count includes the temporary reference held by the call itself). To speed up the search for a free placeholder, remember the placeholder with the maximum refcount seen so far and swap the free placeholder with this "max refcount" placeholder. Hence, the placeholders with the maximum refcount are sent to the bottom of the stack.
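A minimal sketch of the same placeholder idea built on collections.abc.MutableMapping, so that get, in, len, keys and the other mapping methods come for free instead of being written out by hand. The class names and the single-underscore attributes are this sketch's own choices, not the original answer's; like the original, it creates a short-lived placeholder per access to an empty key and only allocates a real inner dict on the first write.

```python
from collections.abc import MutableMapping


class OuterDict(MutableMapping):
    """Outer mapping that stores None for keys whose inner dict is still empty."""

    def __init__(self):
        self._d = {}

    def create(self, key):
        # Register the key without allocating an inner dict
        self._d[key] = None

    def __getitem__(self, key):
        value = self._d[key]  # KeyError for unknown outer keys
        if value is None:
            return _PlaceHolder(self._d, key)
        return value

    def __setitem__(self, key, value):
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


class _PlaceHolder(MutableMapping):
    """Stands in for an empty inner dict until the first assignment."""

    def __init__(self, parent, key):
        self._parent, self._key, self._d = parent, key, {}

    def __getitem__(self, key):
        return self._d[key]

    def __setitem__(self, key, value):
        if self._parent.get(self._key) is not self._d:
            # First write: put the real dict into the outer mapping
            self._parent[self._key] = self._d
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __repr__(self):
        return 'PlaceHolder({!r})'.format(self._d)
```

After the first assignment, o["a"] returns the plain dict stored in the outer mapping, so the placeholder is no longer involved for that key.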
I have a bit of code I was hoping to clean up/shrink down.
I have a function which receives a key and returns the corresponding value from either of two dictionaries, or a default value if the key is present in neither.
Here is a verbose (but explicit) version of the problem:
def lookup_function( key ):
    if key.lower() in Dictionary_One: return Dictionary_One[ key.lower() ]
    if key.lower() in Dictionary_Two: return Dictionary_Two[ key.lower() ]
    return Globally_Available_Default_Value
Not horrifying to look at. Just seems a bit voluminous to me.
So, assuming that both dictionaries and the default value are available from the global scope, and that the key must be a string in lowercase, what is the cleanest, shortest, most graceful, and most pythonic way of achieving this?
Have fun!
You can shorten that to:
def lookup_function( key ):
    key = key.lower()
    return Dictionary_One.get(key, Dictionary_Two.get(key, Globally_Available_Default_Value))
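Alternative answer using an "on the fly" mixed dictionary (note that it builds a new dict on every call, and in Python 3 the ** form only works if Dictionary_One's keys are strings):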
dict(Dictionary_Two, **Dictionary_One).get(key.lower(), Globally_Available_Default_Value)
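If more than two dictionaries may be involved, a sketch that checks them in order without building a merged copy could look like this; `lookup` is a hypothetical name, and it expects the caller to lower-case the key as in the question.

```python
def lookup(key, *dicts, default=None):
    """Return the value for key from the first dict that contains it."""
    # The generator stops at the first dict containing the key; later dicts are not touched
    return next((d[key] for d in dicts if key in d), default)
```

Usage in the question's terms would be lookup(key.lower(), Dictionary_One, Dictionary_Two, default=Globally_Available_Default_Value).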
Some good answers. Here's mine!
return { **{ key.lower(): default_value }, **dictionary_one, **dictionary_two }[ key.lower() ]
Just requires changing the names a bit to be a little shorter. This method is NOT recommended for long dictionaries, because it copies both of them into a new dict on every call, but it is perfectly workable for short ones. dictionary_one overrides the default value if it contains the key, and dictionary_two then overrides any keys in dictionary_one. Note that this gives dictionary_two priority, the reverse of the original function; swap the two to match it.
Not perfect in all cases, but for general use this is what I'm going to go with.
Lesser/shorter code is not necessarily better as there are other considerations, such as generality, reusability, and extensibility.
Raymond Hettinger contributed a recipe to the Python Cookbook™, Second Edition for a Chainmap class that automates chained dictionary lookups which I think would be a very elegant way of doing what you want (and the class is reusable).
Update: I just discovered that the Python 3 collections module (3.3 and later) contains a ChainMap, so it ought to be easier to use it instead of writing your own as shown below.
class Chainmap(object):
    def __init__(self, *mappings):
        # record the sequence of mappings into which we must look
        self._mappings = mappings
    def __getitem__(self, key):
        # try looking up into each mapping in sequence
        for mapping in self._mappings:
            try:
                return mapping[key]
            except KeyError:
                pass
        # 'key' not found in any mapping, so raise KeyError exception
        raise KeyError(key)
    def get(self, key, default=None):
        # return self[key] if present, otherwise 'default'
        try:
            return self[key]
        except KeyError:
            return default
    def __contains__(self, key):
        # return True if 'key' is present in self, otherwise False
        try:
            self[key]
            return True
        except KeyError:
            return False
Sample usage:
dictionary_one = {'key1': 1, 'key2':2}
dictionary_two = {'key2': 3, 'key4':4}
globally_available_default_value = 42
chmap = Chainmap(dictionary_one, dictionary_two)
def lookup_function(key):
    try:
        return chmap[key.lower()]
    except KeyError:
        return globally_available_default_value
print(lookup_function('Key1')) # -> 1
print(lookup_function('Key2')) # -> 2
print(lookup_function('Key3')) # -> 42
print(lookup_function('Key4')) # -> 4
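Since Python 3.3 the standard library's collections.ChainMap does the same job, so a sketch of the sample usage above rewritten with it might look like this (the names are reused from the sample; passing the default to get is this sketch's choice):

```python
from collections import ChainMap

dictionary_one = {'key1': 1, 'key2': 2}
dictionary_two = {'key2': 3, 'key4': 4}
globally_available_default_value = 42

# Lookups search dictionary_one first, then dictionary_two
chmap = ChainMap(dictionary_one, dictionary_two)


def lookup_function(key):
    # ChainMap.get returns the default when no mapping contains the key
    return chmap.get(key.lower(), globally_available_default_value)
```

Because ChainMap keeps references to the underlying dicts rather than copies, later changes to dictionary_one or dictionary_two are visible through chmap.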
I'm looking for a fast, clean and pythonic way of slicing custom made objects while preserving their type after the operation.
To give you some context, I have to deal with a lot of semi-unstructured data, and to handle it I work with lists of dictionaries. To streamline some operations I have created an "ld" object that inherits from "list". Amongst its many capabilities, it checks that the data was provided in the correct format. Let's simplify it by saying it ensures that all entries of the list are dictionaries containing some key "a", as shown below:
class ld( list):
    def __init__(self, x):
        list.__init__(self, x)
        self.__init_check()
    def __init_check(self):
        for record in self:
            if isinstance( record, dict) and "a" in record:
                pass
            else:
                raise TypeError("not all entries are dictionaries or have the key 'a'")
        return
This behaves correctly when the data is as desired and initialises ld:
tt = ld( [{"a": 1, "b":2}, {"a":4}, {"a":6, "c":67}])
type( tt)
It also does the right thing (raises TypeError) when the data is incorrect:
ld( [{"w":1}])
ld( [1,2,3])
However, the problem comes when I proceed to slice the object:
type( tt[:2])
tt[:2] is a list and no longer has all the methods and attributes that I created in the full-fledged ld object. I could reconvert the slice into an ld, but that means it would have to go through the entire initial data-checking process again, slowing down computations a lot.
Here is the solution I came up with to speed things up:
class ld( list):
    def __init__(self, x, safe=True):
        list.__init__(self, x)
        self.__init_check( safe)
    def __init_check(self, is_safe):
        if not is_safe:
            return
        for record in self:
            if isinstance( record, dict) and "a" in record:
                pass
            else:
                raise TypeError("not all entries are dictionaries or have the key 'a'")
        return
    def __getslice__(self, i, j):
        # note: __getslice__ is only called by Python 2
        return ld( list.__getslice__( self, i, j), safe=False)
Is there a cleaner and more pythonic way of going about it?
Thanks in advance for your help.
I don't think subclassing list to verify the shape or type of its contents in general is the right approach. The list pointedly doesn't care about its contents, and implementing a class whose constructor behavior varies based on flags passed to it is messy. If you need a constructor that verifies inputs, just do your check logic in a function that returns a list.
class InvalidItemError(TypeError):
    """Raised for items that fail verify_item (not defined in the original answer)."""

def make_verified_list(items):
    """
    :type items: list[object]
    :rtype: list[dict]
    """
    new_list = []
    for item in items:
        if not verify_item(item):
            raise InvalidItemError(item)
        new_list.append(item)
    return new_list

def verify_item(item):
    """
    :type item: object
    :rtype: bool
    """
    return isinstance(item, dict) and "a" in item
Take this approach and you won't find yourself struggling with the behavior of core data structures.
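If you do keep the list subclass, note that __getslice__ is only consulted by Python 2. In Python 3, slicing calls __getitem__ with a slice object, so a sketch of the asker's safe=False shortcut for Python 3 could look like the following. Other operations that build new lists, such as + or copy(), still return plain lists.

```python
class ld(list):
    """List of dicts that each contain the key "a" (checked on construction)."""

    def __init__(self, x, safe=True):
        super().__init__(x)
        if safe:
            for record in self:
                if not (isinstance(record, dict) and "a" in record):
                    raise TypeError("not all entries are dictionaries or have the key 'a'")

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # Entries of a slice were already validated, so skip the check
            return type(self)(result, safe=False)
        return result
```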
Is there a way to add duplicate keys to json with python?
From my understanding, you can't have duplicate keys in Python dictionaries. Usually, I create JSON by building a dictionary and then calling json.dumps. However, I need duplicated keys within the JSON for testing purposes, and I can't add duplicate keys to a Python dictionary. I am trying to do this in Python 3.
You could always construct such a string value by hand.
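For instance, a minimal sketch that builds the object text from a list of (key, value) pairs, letting json.dumps handle the quoting of each individual key and value; `dumps_pairs` is a hypothetical helper name, and converting keys with str() is an assumption:

```python
import json


def dumps_pairs(pairs):
    """Serialize (key, value) pairs as a JSON object, keeping duplicate keys."""
    # json.dumps escapes each key and value; the separators match json.dumps' defaults
    members = ('{}: {}'.format(json.dumps(str(key)), json.dumps(value))
               for key, value in pairs)
    return '{' + ', '.join(members) + '}'
```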
On the other hand, one can make the CPython json module encode duplicate keys. This is very tricky in Python 2 because the json module does not respect duck typing at all.
The straightforward solution would be to inherit from collections.Mapping; well, you can't, since json.dumps rejects it with "MyMapping is not JSON serializable".
Next, one tries to subclass dict; but if json.dumps notices that the type is dict, it skips calling __len__ and looks at the underlying dict directly. If that is empty, {} is output directly, so if we fake the methods, the underlying dictionary must not be empty.
The next source of joy is that __iter__ is actually called, which iterates the keys, and for each key __getitem__ is called, so we need to remember the corresponding value to return for the given key... thus we arrive at a very ugly solution for Python 2:
class FakeDict(dict):
    def __init__(self, items):
        # need to have something in the dictionary
        self['something'] = 'something'
        self._items = items
    def __getitem__(self, key):
        return self.last_val
    def __iter__(self):
        def generator():
            for key, value in self._items:
                self.last_val = value
                yield key
        return generator()
In CPython 3.3+ it is slightly easier... no, collections.abc.Mapping does not work, yes, you need to subclass a dict, yes, you need to fake that your dictionary has content... but the internal JSON encoder calls items instead of __iter__ and __getitem__!
Thus on Python 3:
import json
class FakeDict(dict):
    def __init__(self, items):
        self['something'] = 'something'
        self._items = items
    def items(self):
        return self._items
print(json.dumps(FakeDict([('a', 1), ('a', 2)])))
prints out
{"a": 1, "a": 2}
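When the goal is testing, it may also help to read such output back. A plain json.loads keeps only the last value for a repeated key, while the standard object_pairs_hook parameter receives every (key, value) pair in document order; `parse_pairs` below is just an illustrative wrapper:

```python
import json


def parse_pairs(text):
    """Parse a JSON object into a list of (key, value) pairs, keeping duplicates."""
    # object_pairs_hook=list turns each decoded object into a list of pairs
    return json.loads(text, object_pairs_hook=list)
```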
Thanks a lot, Antti Haapala! I figured out you can even use this to convert an array of tuples into a FakeDict:
def function():
    array_of_tuples = []
    array_of_tuples.append(("key","value1"))
    array_of_tuples.append(("key","value2"))
    return FakeDict(array_of_tuples)
print(json.dumps(function()))
Output:
{"key": "value1", "key": "value2"}
And if you change the FakeDict class to the following, empty dictionaries will be serialized correctly:
class FakeDict(dict):
    def __init__(self, items):
        if items != []:
            self['something'] = 'something'
        self._items = items
    def items(self):
        return self._items
def test():
    array_of_tuples = []
    return FakeDict(array_of_tuples)
print(json.dumps(test()))
Output:
{}
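Actually, it's very easy, because json.dumps converts non-string keys such as 1 to strings (output shown for Python 3.7+, where dicts keep insertion order):
$> python3 -c "import json; print(json.dumps({1: 'a', '1': 'b'}))"
{"1": "a", "1": "b"}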
I'm imitating the behavior of the ConfigParser module to write a highly specialized parser that exploits some well-defined structure in the configuration files for a particular application I work with. Several sections of the config file contain hundreds of variable and routine mappings prefixed with either Variable_ or Routine_, like this:
[Map.PRD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
[Map.SHD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
I'd like to maintain the basic structure of ConfigParser where each section is stored as a single dictionary, so users would still have access to the classic syntax:
config.content['Mappings']['Variable_FOO'] = 'LOC1'
but also be able to use a simplified API that drills down to this section:
config.vmapping('PRD')['FOO'] = 'LOC1'
config.vmapping('PRD')['BAR'] = 'LOC2'
config.rmapping('PRD')['FOO'] = 'LOC3'
config.rmapping('PRD')['BAR'] = 'LOC4'
Currently I'm implementing this by storing the section in a special subclass of dict to which I've added a prefix attribute. The variable and routine properties of the parser set the prefix attribute of the dict-like object to 'Variable_' or 'Routine_', and then modified __getitem__ and __setitem__ methods glue the prefix to the key to access the appropriate item. It works, but it involves a lot of boilerplate to implement all the associated niceties like supporting iteration.
I suppose my ideal solution would be to dispense with the subclassed dict and have the variable and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
Update
Here's the solution I implemented, largely based on @abarnet's answer:
class MappingDict(object):
    def __init__(self, prefix, d):
        self.prefix, self.d = prefix, d
    def prefixify(self, name):
        return '{}_{}'.format(self.prefix, name)
    def __getitem__(self, name):
        name = self.prefixify(name)
        return self.d.__getitem__(name)
    def __setitem__(self, name, value):
        name = self.prefixify(name)
        return self.d.__setitem__(name, value)
    def __delitem__(self, name):
        name = self.prefixify(name)
        return self.d.__delitem__(name)
    def __iter__(self):
        # match 'Variable_' exactly, then strip everything up to the first '_'
        return (key.partition('_')[-1] for key in self.d
                if key.startswith(self.prefix + '_'))
    def __repr__(self):
        # self is not a dict, so build the repr from the unprefixed items
        return 'MappingDict({!r})'.format({key: self[key] for key in self})
import re

class MyParser(object):
    SECTCRE = re.compile(r'\[(?P<header>[^]]+)\]')
    def __init__(self, filename):
        self.filename = filename
        self.content = {}
        with open(filename) as f:
            lines = [x.strip() for x in f.read().splitlines() if x.strip()]
        for line in lines:
            match = re.match(self.SECTCRE, line)
            if match:
                section = match.group('header')
                self.content[section] = {}
            else:
                key, sep, value = line.partition('=')
                self.content[section][key] = value
    def write(self, filename):
        # sectionsort and cpfsort are the asker's own sort-key helpers (not shown)
        with open(filename, 'w') as fp:
            for section in sorted(self.content, key=sectionsort):
                fp.write("[%s]\n" % section)
                for key in sorted(self.content[section], key=cpfsort):
                    value = str(self.content[section][key])
                    fp.write("%s\n" % '='.join([key, value]))
                fp.write("\n")
    def vmapping(self, nsp):
        section = 'Map.{}'.format(nsp)
        return MappingDict('Variable', self.content[section])
    def rmapping(self, nsp):
        section = 'Map.{}'.format(nsp)
        return MappingDict('Routine', self.content[section])
It's used like this:
config = MyParser('myfile.cfg')
vmap = config.vmapping('PRD')
vmap['FOO'] = 'LOC5'
vmap['BAR'] = 'LOC6'
config.write('newfile.cfg')
The resulting newfile.cfg reflects the LOC5 and LOC6 changes.
I don't think you want inheritance here. You end up with two separate dict objects which you have to create on load and then paste back together on save…
If that's acceptable, you don't even need to bother with the prefixing during normal operations; just do the prefixing while saving, like this:
class Config(object):
    def save(self):
        merged = {'variable_{}'.format(key): value for key, value
                  in self.variable_dict.items()}
        merged.update({'routine_{}'.format(key): value for key, value
                       in self.routine_dict.items()})
        # now save merged
If you want that merged object to be visible at all times, but don't expect it to be accessed very often, make it a @property.
If you want to access the merged dictionary regularly, at the same time you're accessing the two sub-dictionaries, then yes, you want a view:
I suppose my ideal solution would be to dispense with the subclassed dict and have the global and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
This is going to be very hard to do with inheritance. Certainly not with inheritance from dict; and inheriting from the dict_items view type is not an option either, since CPython does not allow that type to be subclassed.
But with delegation, it's easy. Each sub-dictionary just holds a reference to the parent dict:
class PrefixedDict(object):
    def __init__(self, prefix, d):
        self.prefix, self.d = prefix, d
    def prefixify(self, key):
        return '{}_{}'.format(self.prefix, key)
    def __getitem__(self, key):
        return self.d.__getitem__(self.prefixify(key))
    def __setitem__(self, key, value):
        return self.d.__setitem__(self.prefixify(key), value)
    def __delitem__(self, key):
        return self.d.__delitem__(self.prefixify(key))
    def __iter__(self):
        # strip the prefix plus the '_' separator that prefixify adds
        return (key[len(self.prefix) + 1:] for key in self.d
                if key.startswith(self.prefix + '_'))
You don't get any of the dict methods for free that way—but that's a good thing, because they were mostly incorrect anyway, right? Explicitly delegate the ones you want. (If you do have some you want to pass through as-is, use __getattr__ for that.)
Besides being conceptually simpler and harder to screw up by accidentally forgetting to override something, this also means that PrefixedDict can work with any type of mapping, not just a dict.
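As a sketch of the same delegation approach, subclassing collections.abc.MutableMapping (an abstract base class, not a concrete dict) means only the five core methods need to be written; get, items, keys, in, update, pop and friends are then derived from them. The '_' separator and the key stripping follow the PrefixedDict above; the rest is this sketch's assumption.

```python
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """View of the keys in d that start with prefix + '_', with the prefix stripped."""

    def __init__(self, prefix, d):
        self.prefix, self.d = prefix, d

    def _full(self, key):
        # 'FOO' -> 'Variable_FOO'
        return '{}_{}'.format(self.prefix, key)

    def __getitem__(self, key):
        return self.d[self._full(key)]

    def __setitem__(self, key, value):
        self.d[self._full(key)] = value

    def __delitem__(self, key):
        del self.d[self._full(key)]

    def __iter__(self):
        start = self.prefix + '_'
        # Snapshot the matching keys so the view can be modified while iterating
        return iter([key[len(start):] for key in self.d if key.startswith(start)])

    def __len__(self):
        return sum(1 for _ in self)

    def __repr__(self):
        return '{}({!r}, {!r})'.format(type(self).__name__, self.prefix, dict(self.items()))
```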
So, no matter which way you go, where and how do these objects get created?
The easy answer is that they're attributes that you create when you construct a Config:
def __init__(self):
    self.d = {}
    self.variable = PrefixedDict('Variable', self.d)
    self.routine = PrefixedDict('Routine', self.d)
If this needs to be dynamic (e.g., there can be an arbitrary set of prefixes), create them at load time:
def load(self):
    # load up self.d
    prefixes = set(key.split('_')[0] for key in self.d)
    for prefix in prefixes:
        setattr(self, prefix, PrefixedDict(prefix, self.d))
If you want to be able to create them on the fly (so config.newprefix['foo'] = 3 adds 'Newprefix_foo'), you can do this instead:
def __getattr__(self, name):
    return PrefixedDict(name.title(), self.d)
But once you're using dynamic attributes, you really have to question whether it isn't cleaner to use dictionary (item) syntax instead, like config['newprefix']['foo']. For one thing, that would actually let you call one of the sub-dictionaries 'global', as in your original question…
Or you can build the dictionary syntax first and then use what's usually referred to as an attrdict (search ActiveState recipes and PyPI for 3000 implementations…), which lets you automatically make config.newprefix mean config['newprefix'], so you can use attribute syntax when you have valid identifiers but fall back to dictionary syntax when you don't.
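A minimal attrdict sketch, not any particular PyPI package: a dict subclass whose __getattr__ falls back to item lookup. Keys that collide with dict method names (such as 'items' or 'keys') still have to use item syntax, as do keys like 'global' that are not valid identifiers.

```python
class AttrDict(dict):
    """dict whose keys can also be read and written as attributes (hypothetical helper)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup (including dict methods) fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value
```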
There are a couple of options for how to proceed.
The simplest might be to use nested dictionaries, so Variable_FOO becomes config["variable"]["FOO"]. You might want to use a defaultdict(dict) for the outer dictionary so you don't need to worry about initializing the inner ones when you add the first value to them.
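A sketch of the nested-dictionary option, assuming the prefix is everything before the first underscore (`add_nested` is a hypothetical helper):

```python
from collections import defaultdict

config = defaultdict(dict)


def add_nested(config, name, value):
    # 'Variable_FOO' -> config['Variable']['FOO']; split only on the first underscore
    prefix, _, suffix = name.partition('_')
    config[prefix][suffix] = value  # the inner dict is created on first use
```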
Another option would be to use tuple keys in a single dictionary. That is, Variable_FOO would become config[("variable", "FOO")]. This is easy to do with code, since you can simply assign to config[tuple(some_string.split("_"))]. Though, I suppose you could also just use the unsplit string as your key in this case.
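And a sketch of the tuple-key option; `add_tuple_key` and `section_view` are hypothetical helpers, and str.partition is used so that a name without an underscore still produces a two-element key:

```python
config = {}


def add_tuple_key(config, name, value):
    # 'Variable_FOO' -> config[('Variable', 'FOO')]
    prefix, _, suffix = name.partition('_')
    config[(prefix, suffix)] = value


def section_view(config, prefix):
    """Collect the entries for one prefix into a plain dict (a copy, not a live view)."""
    return {suffix: value for (p, suffix), value in config.items() if p == prefix}
```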
A final approach allows you to use the syntax you want (where Variable_FOO is accessed as config.Variable["FOO"]), by using __getattr__ and a defaultdict behind the scenes:
from collections import defaultdict
class Config(object):
    def __init__(self):
        self._attrdicts = defaultdict(dict)
    def __getattr__(self, name):
        return self._attrdicts[name]
You could extend this with behavior for __setattr__ and __delattr__, but it's probably not necessary. The only serious limitation to this approach (given the original version of the question) is that the attribute names (like Variable) must be legal Python identifiers. You can't use strings with leading digits, Python keywords (like global), or strings containing whitespace characters.
A downside to this approach is that it's a bit more difficult to use programmatically (for instance, from your config-file parser). To read a value of Variable_FOO and save it to config.Variable["FOO"], you'll probably need to use the built-in getattr function, like this:
name, value = line.split("=", 1)
prefix, suffix = name.split("_", 1)
getattr(config, prefix)[suffix] = value
I'm new to Python, and am sort of surprised I cannot do this.
dictionary = {
'a' : '123',
'b' : dictionary['a'] + '456'
}
I'm wondering what the Pythonic way to do this correctly is, because I feel like I'm not the only one who has tried to do this.
EDIT: Enough people were wondering what I'm doing with this, so here are more details for my use case. Let's say I want to keep dictionary objects to hold file system paths. The paths are relative to other values in the dictionary. For example, this is what one of my dictionaries may look like:
dictionary = {
'user': 'sholsapp',
'home': '/home/' + dictionary['user']
}
It is important that at any point in time I may change dictionary['user'] and have all of the dictionary's values reflect the change. Again, this is just an example of what I'm using it for, so I hope that it conveys my goal.
From my own research I think I will need to implement a class to do this.
No fear of creating new classes: you can take advantage of Python's string formatting capabilities and simply do:
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item) % self
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/%(user)s',
'bin' : '%(home)s/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
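A quick check of the requirement from the question, that changing 'user' is reflected in the derived values. This is the same MyDict run under Python 3 (the % operator works there too); it assumes every stored value is a string, and a literal percent sign would have to be written as %%:

```python
class MyDict(dict):
    def __getitem__(self, item):
        # '%(name)s' placeholders are looked up through self, so they resolve recursively
        return dict.__getitem__(self, item) % self


dictionary = MyDict({
    'user': 'gnucom',
    'home': '/home/%(user)s',
    'bin': '%(home)s/bin',
})
dictionary['user'] = 'tony'  # 'home' and 'bin' follow on the next lookup
```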
The nearest I came up with without writing a class:
dictionary = {
'user' : 'gnucom',
'home' : lambda:'/home/'+dictionary['user']
}
print(dictionary['home']())
dictionary['user']='tony'
print(dictionary['home']())
>>> dictionary = {
... 'a':'123'
... }
>>> dictionary['b'] = dictionary['a'] + '456'
>>> dictionary
{'a': '123', 'b': '123456'}
That works. Your version fails because, when you try to use dictionary inside the literal, it hasn't been defined yet (Python has to evaluate the whole dictionary literal first).
But be careful because this assigns to the key of 'b' the value referenced by the key of 'a' at the time of assignment and is not going to do the lookup every time. If that is what you are looking for, it's possible but with more work.
What you're describing in your edit is how an INI config file works. Python has a built-in library called ConfigParser (configparser in Python 3) whose value interpolation should work for what you're describing.
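A sketch with the Python 3 configparser module, whose default BasicInterpolation expands %(name)s references when a value is read, so later changes to user show up in home and bin. The section name paths is an assumption made for the example:

```python
import configparser

config = configparser.ConfigParser()  # BasicInterpolation is the default
config.read_string("""
[paths]
user = sholsapp
home = /home/%(user)s
bin = %(home)s/bin
""")

# Interpolation happens on every read, not when the file is parsed
config['paths']['user'] = 'tony'
```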
This is an interesting problem. It seems like Greg has a good solution. But that's no fun ;)
jsbueno has a very elegant solution, but it only applies to strings (as you requested).
The trick to a 'general' self referential dictionary is to use a surrogate object. It takes a few (understatement) lines of code to pull off, but the usage is about what you want:
S = SurrogateDict(AdditionSurrogateDictEntry)
d = S.resolve({'user': 'gnucom',
'home': '/home/' + S['user'],
'config': [S['home'] + '/.emacs', S['home'] + '/.bashrc']})
The code to make that happen is not nearly so short. It lives in three classes:
import abc
class SurrogateDictEntry(metaclass=abc.ABCMeta):
    def __init__(self, key):
        """record the key on the real dictionary that this will resolve to a
        value for
        """
        self.key = key
    def resolve(self, d):
        """ return the actual value"""
        if hasattr(self, 'op'):
            # any operation done on self will store its name in self.op.
            # if this is set, resolve it by calling the appropriate method
            # now that we can get self.value out of d
            self.value = d[self.key]
            return getattr(self, self.op + 'resolve__')()
        else:
            return d[self.key]
    @staticmethod
    def make_op(opname):
        """A convenience function. This will be the form of all op hooks for subclasses.
        The actual logic for the op is in __op__resolve__ (e.g. __add__resolve__)
        """
        def op(self, other):
            self.stored_value = other
            self.op = opname
            return self
        op.__name__ = opname
        return op
Next comes the concrete class. Simple enough.
class AdditionSurrogateDictEntry(SurrogateDictEntry):
    __add__ = SurrogateDictEntry.make_op('__add__')
    __radd__ = SurrogateDictEntry.make_op('__radd__')
    def __add__resolve__(self):
        return self.value + self.stored_value
    def __radd__resolve__(self):
        return self.stored_value + self.value
Here's the final class
class SurrogateDict(object):
    def __init__(self, EntryClass):
        self.EntryClass = EntryClass
    def __getitem__(self, key):
        """record the key and return"""
        return self.EntryClass(key)
    @staticmethod
    def resolve(d):
        """I eat generators resolve self references"""
        stack = [d]
        while stack:
            cur = stack.pop()
            # This just tries to set it to an appropriate iterable
            it = range(len(cur)) if not hasattr(cur, 'keys') else cur.keys()
            for key in it:
                # Just register your class with SurrogateDictEntry (an ABC)
                # and you can pass whatever.
                while isinstance(cur[key], SurrogateDictEntry):
                    cur[key] = cur[key].resolve(d)
                # I'm just going to check for iter but you can add other
                # checks here for items that we should loop over.
                # Strings are skipped because on Python 3 they have __iter__ too.
                if hasattr(cur[key], '__iter__') and not isinstance(cur[key], str):
                    stack.append(cur[key])
        return d
In response to gnucom's question about why I named the classes the way that I did:
The word surrogate is generally associated with standing in for something else so it seemed appropriate because that's what the SurrogateDict class does: an instance replaces the 'self' references in a dictionary literal. That being said, (other than just being straight up stupid sometimes) naming is probably one of the hardest things for me about coding. If you (or anyone else) can suggest a better name, I'm all ears.
I'll provide a brief explanation. Throughout S refers to an instance of SurrogateDict and d is the real dictionary.
1. A reference S[key] calls S.__getitem__, which returns SurrogateDictEntry(key).
2. When SurrogateDictEntry(key) is constructed, it stores key. This is the key into d for the value that this entry is acting as a surrogate for.
3. After S[key] is returned, it is either entered into d directly or has some operation(s) performed on it. If an operation is performed on it, the corresponding __op__ method is triggered, which simply stores the other operand and the name of the operation and then returns the entry itself. We can't actually resolve the operation yet because d hasn't been constructed.
4. After d is constructed, it is passed to S.resolve. This method loops through d, finding any instances of SurrogateDictEntry and replacing them with the result of calling the resolve method on the instance.
5. The SurrogateDictEntry.resolve method receives the now constructed d as an argument and can use the key stored at construction time to get the value that it is acting as a surrogate for. If an operation was performed on it after creation, the op attribute will have been set to the name of that operation. If the class has an __op__ method, it also has an __op__resolve__ method containing the logic that would normally be in the __op__ method. So now we have the logic (the __op__resolve__ method) and all necessary values (self.value, self.stored_value) to finally compute the real value. We return that, and step 4 places it in the dictionary.
6. Finally, the SurrogateDict.resolve method returns d with all references resolved.
That's a rough sketch. If you have any more questions, feel free to ask.
If you, like me, are wondering how to make @jsbueno's snippet work with {}-style substitutions, below is example code (which is probably not very efficient):
import string
class MyDict(dict):
    def __init__(self, *args, **kw):
        super(MyDict,self).__init__(*args, **kw)
        self.itemlist = super(MyDict,self).keys()
        self.fmt = string.Formatter()
    def __getitem__(self, item):
        return self.fmt.vformat(dict.__getitem__(self, item), {}, self)
xs = MyDict({
'user' : 'gnucom',
'home' : '/home/{user}',
'bin' : '{home}/bin'
})
>>> xs["home"]
'/home/gnucom'
>>> xs["bin"]
'/home/gnucom/bin'
I tried to make it work by simply replacing % self with .format(**self), but it turns out that doesn't work for nested expressions (like 'bin' in the listing above, which references 'home', which has its own reference to 'user') because of the evaluation order: the ** expansion is done before the actual format call, and it isn't delayed like in the original % version.
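In Python 3.2 and later, str.format_map avoids that problem because it looks names up in the mapping itself instead of in a ** copy, so nested references go back through __getitem__. A sketch (literal braces in stored values would need doubling as {{ and }}):

```python
class MyDict(dict):
    def __getitem__(self, item):
        # format_map looks names up through self (no ** copy), so nested references resolve
        return dict.__getitem__(self, item).format_map(self)


xs = MyDict({
    'user': 'gnucom',
    'home': '/home/{user}',
    'bin': '{home}/bin',
})
```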
Write a class, maybe something with properties:
class PathInfo(object):
    def __init__(self, user):
        self.user = user
    @property
    def home(self):
        return '/home/' + self.user
p = PathInfo('thc')
print(p.home) # /home/thc
As a sort of extended version of @Tony's answer, you could build a dictionary subclass that calls its values if they are callables:
class CallingDict(dict):
    """Returns the result rather than the value of referenced callables.
    >>> cd = CallingDict({1: "One", 2: "Two", 'fsh': "Fish",
    ...                   "rhyme": lambda d: ' '.join((d[1], d['fsh'],
    ...                                                d[2], d['fsh']))})
    >>> cd["rhyme"]
    'One Fish Two Fish'
    >>> cd[1] = 'Red'
    >>> cd[2] = 'Blue'
    >>> cd["rhyme"]
    'Red Fish Blue Fish'
    """
    def __getitem__(self, item):
        it = super(CallingDict, self).__getitem__(item)
        if callable(it):
            return it(self)
        else:
            return it
Of course this is only usable if you're not actually going to store callables as values. If you need to be able to do that, you could wrap the lambda declaration in a function that adds some attribute to the resulting lambda and check for it in CallingDict.__getitem__, but at that point it's getting complex and long-winded enough that it might just be easier to use a class for your data in the first place.
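A sketch of that marker idea: a hypothetical `computed` decorator sets an attribute on the function, and CallingDict only calls values that carry it, so ordinary callables such as len can still be stored and returned as-is:

```python
def computed(func):
    """Mark func as a value that CallingDict should call on lookup (hypothetical helper)."""
    func.is_computed = True
    return func


class CallingDict(dict):
    def __getitem__(self, item):
        it = super().__getitem__(item)
        # Only values explicitly marked with computed() are called; other callables are returned as-is
        if getattr(it, 'is_computed', False):
            return it(self)
        return it


cd = CallingDict({1: 'One', 2: 'Two', 'fsh': 'Fish',
                  'rhyme': computed(lambda d: ' '.join((d[1], d['fsh'], d[2], d['fsh']))),
                  'func': len})
```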
This is very easy in a lazily evaluated language (Haskell).
Since Python is strictly evaluated, we can do a little trick to turn things lazy:
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
d1 = lambda self: lambda: {
'a': lambda: 3,
'b': lambda: self()['a']()
}
# fix the d1, and evaluate it
d2 = Y(d1)()
# to get a
d2['a']() # 3
# to get b
d2['b']() # 3
Syntax-wise this is not very nice. That's because we need to explicitly construct lazy expressions with lambda: ... and explicitly evaluate them with ...(). It's the opposite of the problem in lazy languages, which need strictness annotations; here in Python we end up needing laziness annotations.
I think with some more metaprogramming and some more tricks, the above could be made easier to use.
Note that this is basically how let-rec works in some functional languages.
jsbueno's answer in Python 3:
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item).format(self)
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/{0[user]}',
'bin' : '{0[home]}/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
Here we use Python 3 string formatting with curly braces {} and the .format() method.
Documentation : https://docs.python.org/3/library/string.html