Python construct for naming string constants - python

Often I use string constants such as:
DICT_KEY1 = 'DICT_KEY1'
DICT_KEY2 = 'DICT_KEY2'
...
Many of those times I don't mind what the actual literals are, as long as they're unique and understandable to the human reader.
This way it is easier to refactor and change a literal across the project.
So my question is, is there a standard way to make these string constants declarations simpler? I don't want to repeat writing the literal 'DICT_KEYn'.
For example, something like this could work:
#string_consts
class DictKeys:
DICT_KEY1: str
DICT_KEY2: str
...
assert DictKeys.DICT_KEY1 == 'DICT_KEY1'

If it helps, the implementation for your string_consts decorator would be
def string_consts(cls):
for key, value in cls.__annotations__.items():
if value is str:
setattr(cls, key, key)
return cls
This makes
#string_consts
class DictKeys:
DICT_KEY1: str
DICT_KEY2: str
assert DictKeys.DICT_KEY1 == 'DICT_KEY1'
as in your example work OOTB.

When at global level, you could do something like this:
for i in range(100):
globals()['DICT_KEY'+str(i)] = 'DICT_KEY'+str(i)
At class level, setattr could be used in the same way:
for i in range(100):
setattr(self, 'DICT_KEY'+str(i), 'DICT_KEY'+str(i))
If your keys are not a numeric list as suggested by your question, you could still do this:
for key in ['KEY', 'OTHERKEY', 'SOMEOTHERKEY']:
globals()[key] = key
In my opinion, this is way easier to understand than using the decorator solution - and does the job.

Related

Can I shorten this piece of code with less elif? [duplicate]

I love using the expression
if 'MICHAEL89' in USERNAMES:
...
where USERNAMES is a list.
Is there any way to match items with case insensitivity or do I need to use a custom method? Just wondering if there is a need to write extra code for this.
username = 'MICHAEL89'
if username.upper() in (name.upper() for name in USERNAMES):
...
Alternatively:
if username.upper() in map(str.upper, USERNAMES):
...
Or, yes, you can make a custom method.
str.casefold is recommended for case-insensitive string matching. #nmichaels's solution can trivially be adapted.
Use either:
if 'MICHAEL89'.casefold() in (name.casefold() for name in USERNAMES):
Or:
if 'MICHAEL89'.casefold() in map(str.casefold, USERNAMES):
As per the docs:
Casefolding is similar to lowercasing but more aggressive because it
is intended to remove all case distinctions in a string. For example,
the German lowercase letter 'ß' is equivalent to "ss". Since it is
already lowercase, lower() would do nothing to 'ß'; casefold()
converts it to "ss".
I would make a wrapper so you can be non-invasive. Minimally, for example...:
class CaseInsensitively(object):
def __init__(self, s):
self.__s = s.lower()
def __hash__(self):
return hash(self.__s)
def __eq__(self, other):
# ensure proper comparison between instances of this class
try:
other = other.__s
except (TypeError, AttributeError):
try:
other = other.lower()
except:
pass
return self.__s == other
Now, if CaseInsensitively('MICHAEL89') in whatever: should behave as required (whether the right-hand side is a list, dict, or set). (It may require more effort to achieve similar results for string inclusion, avoid warnings in some cases involving unicode, etc).
Usually (in oop at least) you shape your object to behave the way you want. name in USERNAMES is not case insensitive, so USERNAMES needs to change:
class NameList(object):
def __init__(self, names):
self.names = names
def __contains__(self, name): # implements `in`
return name.lower() in (n.lower() for n in self.names)
def add(self, name):
self.names.append(name)
# now this works
usernames = NameList(USERNAMES)
print someone in usernames
The great thing about this is that it opens the path for many improvements, without having to change any code outside the class. For example, you could change the self.names to a set for faster lookups, or compute the (n.lower() for n in self.names) only once and store it on the class and so on ...
Here's one way:
if string1.lower() in string2.lower():
...
For this to work, both string1 and string2 objects must be of type string.
I think you have to write some extra code. For example:
if 'MICHAEL89' in map(lambda name: name.upper(), USERNAMES):
...
In this case we are forming a new list with all entries in USERNAMES converted to upper case and then comparing against this new list.
Update
As #viraptor says, it is even better to use a generator instead of map. See #Nathon's answer.
You could do
matcher = re.compile('MICHAEL89', re.IGNORECASE)
filter(matcher.match, USERNAMES)
Update: played around a bit and am thinking you could get a better short-circuit type approach using
matcher = re.compile('MICHAEL89', re.IGNORECASE)
if any( ifilter( matcher.match, USERNAMES ) ):
#your code here
The ifilter function is from itertools, one of my favorite modules within Python. It's faster than a generator but only creates the next item of the list when called upon.
To have it in one line, this is what I did:
if any(([True if 'MICHAEL89' in username.upper() else False for username in USERNAMES])):
print('username exists in list')
I didn't test it time-wise though. I am not sure how fast/efficient it is.
Example from this tutorial:
list1 = ["Apple", "Lenovo", "HP", "Samsung", "ASUS"]
s = "lenovo"
s_lower = s.lower()
res = s_lower in (string.lower() for string in list1)
print(res)
My 5 (wrong) cents
'a' in "".join(['A']).lower()
UPDATE
Ouch, totally agree #jpp, I'll keep as an example of bad practice :(
I needed this for a dictionary instead of list, Jochen solution was the most elegant for that case so I modded it a bit:
class CaseInsensitiveDict(dict):
''' requests special dicts are case insensitive when using the in operator,
this implements a similar behaviour'''
def __contains__(self, name): # implements `in`
return name.casefold() in (n.casefold() for n in self.keys())
now you can convert a dictionary like so USERNAMESDICT = CaseInsensitiveDict(USERNAMESDICT) and use if 'MICHAEL89' in USERNAMESDICT:

Use an object (e.g., an Enum) as a key in **kwargs?

Very simple question from a Python newbie:
My understanding is that the keys in a dict are able to be just about any immutable data type. Is it possible to pass an immutable object (e.g., a member of an enum class) as a key in the **kwargs dictionary for a function or a class? I have tried it and the answer seems to be "no":
from enum import Enum
class MyEnum(Enum):
X= 'X'
Y= 'Y'
def func(*args,**kwargs):
pass
func(MyEnum.X = 1)
Output:
"SyntaxError: keyword can't be an expression"
However, there may be something I am missing.
EDIT: Note that I am not trying to make the key equal to MyEnum.X.value (which is a string in this case); I want the key to be the actual Enum object, e.g. MyEnum.X.
You're doing:
func(MyEnum.X = 1)
Here, the problem is MyEnum.X = 1 -- Your keyword (MyEnum.X) is actually an expression (getattr(MyEnum, 'X')), and expressions can't be used as keywords in function calls. In fact, only identifiers can be used as keywords.
To get your call to work, you'll need to use dictionary unpacking like this:
func(**{MyEnum.X.name: 1})
Note, to get the name of the attribute, I needed to do MyEnum.X.name or MyEnum.X.value, depending on how you set up your enum -- In your case, I they are the same thing.
>>> from enum import Enum
>>> class Foo(Enum):
... X = 'X'
...
>>> Foo.X.value
'X'
>>> Foo.X.name
'X'
This won't work, because of the way keyword arguments are being processed. The documentation says:
[...] Next, for each keyword argument, the identifier is used to determine the corresponding slot (if the identifier is the same as the first formal parameter name, the first slot is used, and so on) [...]
So there must be a way to match the key from the dictionary to the formal parameter name. The exception:
keywords must be strings
when you try to pass something that's not a string:
func(**{MyEnum.X: 1})
suggest the simplest case is required: keys must be strings.
A possible workaround is to make implicit things explicit: just create a class that contains all the necessary information you want to pass in its attributes and pass it. The code will surely be more readable.
The answer to my original question is indeed "no". However, thanks to the input from mgilson and BartoszKP and others, the following work around I came up with is not a bad solution, and solves my current problem. I offer it for others to look at who are trying to do something similar:
from enum import Enum
class MyEnum(Enum):
X= 'X'
Y= 'Y'
def func(*args,**kwargs):
#replace kwargs with kwargsNew
kwargsNew = {}
for kwkey, kwvalue in kwargs.items():
try: kwargsNew[MyEnum(kwkey)] = kwvalue
except ValueError: kwargsNew[kwkey] = kwvalue
doStuffWithKwargs(kwargsNew)
def doStuffWithKwargs(k):
for K in k:
print(K)
#Pass the name X or Y as the key;
#all other keys not found in `MyEnum` are treated normally
func(X = 1, Y = 2, Z = 3)
Output:
Z
MyEnum.X
MyEnum.Y
(no errors)
Do you actually want to create an instnace of MyEnum?
myenum = MyEnum()
func(myenum.X = 1)
One alternative I have found is to pass a dict into *args instead of **kwargs, or to assign a dict to kwargs[0] directly:
func({MyEnum.X: 1})
func(kwargs = {MyEnum.X: 1})
(No errors produced)
However, I really don't like either of these methods.
EDIT: See my second answer for a much better solution.

How can I apply a prefix to dictionary access?

I'm imitating the behavior of the ConfigParser module to write a highly specialized parser that exploits some well-defined structure in the configuration files for a particular application I work with. Several sections of the config file contain hundreds of variable and routine mappings prefixed with either Variable_ or Routine_, like this:
[Map.PRD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
[Map.SHD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
I'd like to maintain the basic structure of ConfigParser where each section is stored as a single dictionary, so users would still have access to the classic syntax:
config.content['Mappings']['Variable_FOO'] = 'LOC1'
but also be able to use a simplified API that drills down to this section:
config.vmapping('PRD')['FOO'] = 'LOC1'
config.vmapping('PRD')['BAR'] = 'LOC2'
config.rmapping('PRD')['FOO'] = 'LOC3'
config.rmapping('PRD')['BAR'] = 'LOC4'
Currently I'm implementing this by storing the section in a special subclass of dict to which I've added a prefix attribute. The variable and routine properties of the parser set the prefix attribute of the dict-like object to 'Variable_' or 'Routine_' and then modified __getitem__ and __setitem__ attributes of the dict handle gluing the prefix together with the key to access the appropriate item. It's working, but involves a lot of boilerplate to implement all the associated niceties like supporting iteration.
I suppose my ideal solution would be do dispense with the subclassed dict and have have the variable and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
Update
Here's the solution I implemented, largely based on #abarnet's answer:
class MappingDict(object):
def __init__(self, prefix, d):
self.prefix, self.d = prefix, d
def prefixify(self, name):
return '{}_{}'.format(self.prefix, name)
def __getitem__(self, name):
name = self.prefixify(name)
return self.d.__getitem__(name)
def __setitem__(self, name, value):
name = self.prefixify(name)
return self.d.__setitem__(name, value)
def __delitem__(self, name):
name = self.prefixify(name)
return self.d.__delitem__(name)
def __iter__(self):
return (key.partition('_')[-1] for key in self.d
if key.startswith(self.prefix))
def __repr__(self):
return 'MappingDict({})'.format(dict.__repr__(self))
class MyParser(object):
SECTCRE = re.compile(r'\[(?P<header>[^]]+)\]')
def __init__(self, filename):
self.filename = filename
self.content = {}
lines = [x.strip() for x in open(filename).read().splitlines()
if x.strip()]
for line in lines:
match = re.match(self.SECTCRE, line)
if match:
section = match.group('header')
self.content[section] = {}
else:
key, sep, value = line.partition('=')
self.content[section][key] = value
def write(self, filename):
fp = open(filename, 'w')
for section in sorted(self.content, key=sectionsort):
fp.write("[%s]\n" % section)
for key in sorted(self.content[section], key=cpfsort):
value = str(self.content[section][key])
fp.write("%s\n" % '='.join([key,value]))
fp.write("\n")
fp.close()
def vmapping(self, nsp):
section = 'Map.{}'.format(nsp)
return MappingDict('Variable', self.content[section])
def rmapping(self, nsp):
section = 'Map.{}'.format(nsp)
return MappingDict('Routine', self.content[section])
It's used like this:
config = MyParser('myfile.cfg')
vmap = config.vmapping('PRD')
vmap['FOO'] = 'LOC5'
vmap['BAR'] = 'LOC6'
config.write('newfile.cfg')
The resulting newfile.cfg reflects the LOC5 and LOC6 changes.
I don't think you want inheritance here. You end up with two separate dict objects which you have to create on load and then paste back together on save…
If that's acceptable, you don't even need to bother with the prefixing during normal operations; just do the prefixing while saving, like this:
class Config(object):
def save(self):
merged = {'variable_{}'.format(key): value for key, value
in self.variable_dict.items()}
merged.update({'routine_{}'.format(key): value for key, value
in self.routine_dict.items()}
# now save merged
If you want that merged object to be visible at all times, but don't expect to be called on that very often, make it a #property.
If you want to access the merged dictionary regularly, at the same time you're accessing the two sub-dictionaries, then yes, you want a view:
I suppose my ideal solution would be do dispense with the subclassed dict and have have the global and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
This is going to be very hard to do with inheritance. Certainly not with inheritance from dict; inheritance from builtins.dict_items might work if you're using Python 3, but it still seems like a stretch.
But with delegation, it's easy. Each sub-dictionary just holds a reference to the parent dict:
class PrefixedDict(object):
def __init__(self, prefix, d):
self.prefix, self.d = prefix, d
def prefixify(self, key):
return '{}_{}'.format(self.prefix, key)
def __getitem__(self, key):
return self.d.__getitem__(self.prefixify(key))
def __setitem__(self, key, value):
return self.d.__setitem__(self.prefixify(key), value)
def __delitem__(self, key):
return self.d.__delitem__(self.prefixify(key))
def __iter__(self):
return (key[len(self.prefix):] for key in self.d
if key.startswith(self.prefix)])
You don't get any of the dict methods for free that way—but that's a good thing, because they were mostly incorrect anyway, right? Explicitly delegate the ones you want. (If you do have some you want to pass through as-is, use __getattr__ for that.)
Besides being conceptually simpler and harder to screw up through accidentally forgetting to override something, this also means that PrefixDict can work with any type of mapping, not just a dict.
So, no matter which way you go, where and how do these objects get created?
The easy answer is that they're attributes that you create when you construct a Config:
def __init__(self):
self.d = {}
self.variable = PrefixedDict('Variable', self.d)
self.routine = PrefixedDict('Routine', self.d)
If this needs to be dynamic (e.g., there can be an arbitrary set of prefixes), create them at load time:
def load(self):
# load up self.d
prefixes = set(key.split('_')[0] for key in self.d)
for prefix in prefixes:
setattr(self, prefix, PrefixedDict(prefix, self.d)
If you want to be able to create them on the fly (so config.newprefix['foo'] = 3 adds 'Newprefix_foo'), you can do this instead:
def __getattr__(self, name):
return PrefixedDict(name.title(), self.d)
But once you're using dynamic attributes, you really have to question whether it isn't cleaner to use dictionary (item) syntax instead, like config['newprefix']['foo']. For one thing, that would actually let you call one of the sub-dictionaries 'global', as in your original question…
Or you can first build the dictionary syntax, use what's usually referred to as an attrdict (search ActiveState recipes and PyPI for 3000 implementations…), which lets you automatically make config.newprefix mean config['newprefix'], so you can use attribute syntax when you have valid identifiers, but fall back to dictionary syntax when you don't.
There are a couple of options for how to proceed.
The simplest might be to use nested dictionaries, so Variable_FOO becomes config["variable"]["FOO"]. You might want to use a defaultdict(dict) for the outer dictionary so you don't need to worry about initializing the inner ones when you add the first value to them.
Another option would be to use tuple keys in a single dictionary. That is, Variable_FOO would become config[("variable", "FOO")]. This is easy to do with code, since you can simply assign to config[tuple(some_string.split("_"))]. Though, I suppose you could also just use the unsplit string as your key in this case.
A final approach allows you to use the syntax you want (where Variable_FOO is accessed as config.Variable["FOO"]), by using __getattr__ and a defaultdict behind the scenes:
from collections import defaultdict
class Config(object):
def __init__(self):
self._attrdicts = defaultdict(dict)
def __getattr__(self, name):
return self._attrdicts[name]
You could extend this with behavior for __setattr__ and __delattr__ but it's probably not necessary. The only serious limitation to this approach (given the original version of the question), is that the attributes names (like Variable) must be legal Python identifiers. You can't use strings with leading numbers, Python keywords (like global) or strings containing whitespace characters.
A downside to this approach is that it's a bit more difficult to use programatically (by, for instance, your config-file parser). To read a value of Variable_FOO and save it to config.Variable["FOO"] you'll probably need to use the global getattr function, like this:
name, value = line.split("=")
prefix, suffix = name.split("_")
getattr(config, prefix)[suffix] = value

Hashing a dictionary?

For caching purposes I need to generate a cache key from GET arguments which are present in a dict.
Currently I'm using sha1(repr(sorted(my_dict.items()))) (sha1() is a convenience method that uses hashlib internally) but I'm curious if there's a better way.
Using sorted(d.items()) isn't enough to get us a stable repr. Some of the values in d could be dictionaries too, and their keys will still come out in an arbitrary order. As long as all the keys are strings, I prefer to use:
json.dumps(d, sort_keys=True)
That said, if the hashes need to be stable across different machines or Python versions, I'm not certain that this is bulletproof. You might want to add the separators and ensure_ascii arguments to protect yourself from any changes to the defaults there. I'd appreciate comments.
If your dictionary is not nested, you could make a frozenset with the dict's items and use hash():
hash(frozenset(my_dict.items()))
This is much less computationally intensive than generating the JSON string or representation of the dictionary.
UPDATE: Please see the comments below, why this approach might not produce a stable result.
EDIT: If all your keys are strings, then before continuing to read this answer, please see Jack O'Connor's significantly simpler (and faster) solution (which also works for hashing nested dictionaries).
Although an answer has been accepted, the title of the question is "Hashing a python dictionary", and the answer is incomplete as regards that title. (As regards the body of the question, the answer is complete.)
Nested Dictionaries
If one searches Stack Overflow for how to hash a dictionary, one might stumble upon this aptly titled question, and leave unsatisfied if one is attempting to hash multiply nested dictionaries. The answer above won't work in this case, and you'll have to implement some sort of recursive mechanism to retrieve the hash.
Here is one such mechanism:
import copy
def make_hash(o):
"""
Makes a hash from a dictionary, list, tuple or set to any level, that contains
only other hashable types (including any lists, tuples, sets, and
dictionaries).
"""
if isinstance(o, (set, tuple, list)):
return tuple([make_hash(e) for e in o])
elif not isinstance(o, dict):
return hash(o)
new_o = copy.deepcopy(o)
for k, v in new_o.items():
new_o[k] = make_hash(v)
return hash(tuple(frozenset(sorted(new_o.items()))))
Bonus: Hashing Objects and Classes
The hash() function works great when you hash classes or instances. However, here is one issue I found with hash, as regards objects:
class Foo(object): pass
foo = Foo()
print (hash(foo)) # 1209812346789
foo.a = 1
print (hash(foo)) # 1209812346789
The hash is the same, even after I've altered foo. This is because the identity of foo hasn't changed, so the hash is the same. If you want foo to hash differently depending on its current definition, the solution is to hash off whatever is actually changing. In this case, the __dict__ attribute:
class Foo(object): pass
foo = Foo()
print (make_hash(foo.__dict__)) # 1209812346789
foo.a = 1
print (make_hash(foo.__dict__)) # -78956430974785
Alas, when you attempt to do the same thing with the class itself:
print (make_hash(Foo.__dict__)) # TypeError: unhashable type: 'dict_proxy'
The class __dict__ property is not a normal dictionary:
print (type(Foo.__dict__)) # type <'dict_proxy'>
Here is a similar mechanism as previous that will handle classes appropriately:
import copy
DictProxyType = type(object.__dict__)
def make_hash(o):
"""
Makes a hash from a dictionary, list, tuple or set to any level, that
contains only other hashable types (including any lists, tuples, sets, and
dictionaries). In the case where other kinds of objects (like classes) need
to be hashed, pass in a collection of object attributes that are pertinent.
For example, a class can be hashed in this fashion:
make_hash([cls.__dict__, cls.__name__])
A function can be hashed like so:
make_hash([fn.__dict__, fn.__code__])
"""
if type(o) == DictProxyType:
o2 = {}
for k, v in o.items():
if not k.startswith("__"):
o2[k] = v
o = o2
if isinstance(o, (set, tuple, list)):
return tuple([make_hash(e) for e in o])
elif not isinstance(o, dict):
return hash(o)
new_o = copy.deepcopy(o)
for k, v in new_o.items():
new_o[k] = make_hash(v)
return hash(tuple(frozenset(sorted(new_o.items()))))
You can use this to return a hash tuple of however many elements you'd like:
# -7666086133114527897
print (make_hash(func.__code__))
# (-7666086133114527897, 3527539)
print (make_hash([func.__code__, func.__dict__]))
# (-7666086133114527897, 3527539, -509551383349783210)
print (make_hash([func.__code__, func.__dict__, func.__name__]))
NOTE: all of the above code assumes Python 3.x. Did not test in earlier versions, although I assume make_hash() will work in, say, 2.7.2. As far as making the examples work, I do know that
func.__code__
should be replaced with
func.func_code
The code below avoids using the Python hash() function because it will not provide hashes that are consistent across restarts of Python (see hash function in Python 3.3 returns different results between sessions). make_hashable() will convert the object into nested tuples and make_hash_sha256() will also convert the repr() to a base64 encoded SHA256 hash.
import hashlib
import base64
def make_hash_sha256(o):
hasher = hashlib.sha256()
hasher.update(repr(make_hashable(o)).encode())
return base64.b64encode(hasher.digest()).decode()
def make_hashable(o):
if isinstance(o, (tuple, list)):
return tuple((make_hashable(e) for e in o))
if isinstance(o, dict):
return tuple(sorted((k,make_hashable(v)) for k,v in o.items()))
if isinstance(o, (set, frozenset)):
return tuple(sorted(make_hashable(e) for e in o))
return o
o = dict(x=1,b=2,c=[3,4,5],d={6,7})
print(make_hashable(o))
# (('b', 2), ('c', (3, 4, 5)), ('d', (6, 7)), ('x', 1))
print(make_hash_sha256(o))
# fyt/gK6D24H9Ugexw+g3lbqnKZ0JAcgtNW+rXIDeU2Y=
Here is a clearer solution.
def freeze(o):
if isinstance(o,dict):
return frozenset({ k:freeze(v) for k,v in o.items()}.items())
if isinstance(o,list):
return tuple([freeze(v) for v in o])
return o
def make_hash(o):
"""
makes a hash out of anything that contains only list,dict and hashable types including string and numeric types
"""
return hash(freeze(o))
MD5 HASH
The method which resulted in the most stable results for me was using md5 hashes and json.stringify
from typing import Dict, Any
import hashlib
import json
def dict_hash(dictionary: Dict[str, Any]) -> str:
"""MD5 hash of a dictionary."""
dhash = hashlib.md5()
# We need to sort arguments so {'a': 1, 'b': 2} is
# the same as {'b': 2, 'a': 1}
encoded = json.dumps(dictionary, sort_keys=True).encode()
dhash.update(encoded)
return dhash.hexdigest()
While hash(frozenset(x.items()) and hash(tuple(sorted(x.items())) work, that's doing a lot of work allocating and copying all the key-value pairs. A hash function really should avoid a lot of memory allocation.
A little bit of math can help here. The problem with most hash functions is that they assume that order matters. To hash an unordered structure, you need a commutative operation. Multiplication doesn't work well as any element hashing to 0 means the whole product is 0. Bitwise & and | tend towards all 0's or 1's. There are two good candidates: addition and xor.
from functools import reduce
from operator import xor
class hashable(dict):
def __hash__(self):
return reduce(xor, map(hash, self.items()), 0)
# Alternative
def __hash__(self):
return sum(map(hash, self.items()))
One point: xor works, in part, because dict guarantees keys are unique. And sum works because Python will bitwise truncate the results.
If you want to hash a multiset, sum is preferable. With xor, {a} would hash to the same value as {a, a, a} because x ^ x ^ x = x.
If you really need the guarantees that SHA makes, this won't work for you. But to use a dictionary in a set, this will work fine; Python containers are resiliant to some collisions, and the underlying hash functions are pretty good.
Updated from 2013 reply...
None of the above answers seem reliable to me. The reason is the use of items(). As far as I know, this comes out in a machine-dependent order.
How about this instead?
import hashlib
def dict_hash(the_dict, *ignore):
if ignore: # Sometimes you don't care about some items
interesting = the_dict.copy()
for item in ignore:
if item in interesting:
interesting.pop(item)
the_dict = interesting
result = hashlib.sha1(
'%s' % sorted(the_dict.items())
).hexdigest()
return result
Use DeepHash from DeepDiff Module
from deepdiff import DeepHash
obj = {'a':'1',b:'2'}
hashes = DeepHash(obj)[obj]
To preserve key order, instead of hash(str(dictionary)) or hash(json.dumps(dictionary)) I would prefer quick-and-dirty solution:
from pprint import pformat
h = hash(pformat(dictionary))
It will work even for types like DateTime and more that are not JSON serializable.
You can use the maps library to do this. Specifically, maps.FrozenMap
import maps
fm = maps.FrozenMap(my_dict)
hash(fm)
To install maps, just do:
pip install maps
It handles the nested dict case too:
import maps
fm = maps.FrozenMap.recurse(my_dict)
hash(fm)
Disclaimer: I am the author of the maps library.
You could use the third-party frozendict module to freeze your dict and make it hashable.
from frozendict import frozendict
my_dict = frozendict(my_dict)
For handling nested objects, you could go with:
import collections.abc
def make_hashable(x):
if isinstance(x, collections.abc.Hashable):
return x
elif isinstance(x, collections.abc.Sequence):
return tuple(make_hashable(xi) for xi in x)
elif isinstance(x, collections.abc.Set):
return frozenset(make_hashable(xi) for xi in x)
elif isinstance(x, collections.abc.Mapping):
return frozendict({k: make_hashable(v) for k, v in x.items()})
else:
raise TypeError("Don't know how to make {} objects hashable".format(type(x).__name__))
If you want to support more types, use functools.singledispatch (Python 3.7):
#functools.singledispatch
def make_hashable(x):
raise TypeError("Don't know how to make {} objects hashable".format(type(x).__name__))
#make_hashable.register
def _(x: collections.abc.Hashable):
return x
#make_hashable.register
def _(x: collections.abc.Sequence):
return tuple(make_hashable(xi) for xi in x)
#make_hashable.register
def _(x: collections.abc.Set):
return frozenset(make_hashable(xi) for xi in x)
#make_hashable.register
def _(x: collections.abc.Mapping):
return frozendict({k: make_hashable(v) for k, v in x.items()})
# add your own types here
One way to approach the problem is to make a tuple of the dictionary's items:
hash(tuple(my_dict.items()))
This is not a general solution (i.e. only trivially works if your dict is not nested), but since nobody here suggested it, I thought it might be useful to share it.
One can use a (third-party) immutables package and create an immutable 'snapshot' of a dict like this:
from immutables import Map
map = dict(a=1, b=2)
immap = Map(map)
hash(immap)
This seems to be faster than, say, stringification of the original dict.
I learned about this from this nice article.
For nested structures, having string keys at the top level dict, you can use pickle(protocol=5) and hash the bytes object. If you need safety, you can use a safe serializer.
I do it like this:
hash(str(my_dict))

What do I do when I need a self referential dictionary?

I'm new to Python, and am sort of surprised I cannot do this.
dictionary = {
'a' : '123',
'b' : dictionary['a'] + '456'
}
I'm wondering what the Pythonic way to correctly do this in my script, because I feel like I'm not the only one that has tried to do this.
EDIT: Enough people were wondering what I'm doing with this, so here are more details for my use cases. Lets say I want to keep dictionary objects to hold file system paths. The paths are relative to other values in the dictionary. For example, this is what one of my dictionaries may look like.
dictionary = {
'user': 'sholsapp',
'home': '/home/' + dictionary['user']
}
It is important that at any point in time I may change dictionary['user'] and have all of the dictionaries values reflect the change. Again, this is an example of what I'm using it for, so I hope that it conveys my goal.
From my own research I think I will need to implement a class to do this.
No fear of creating new classes -
You can take advantage of Python's string formating capabilities
and simply do:
class MyDict(dict):
def __getitem__(self, item):
return dict.__getitem__(self, item) % self
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/%(user)s',
'bin' : '%(home)s/bin'
})
print dictionary["home"]
print dictionary["bin"]
Nearest I came up without doing object:
dictionary = {
'user' : 'gnucom',
'home' : lambda:'/home/'+dictionary['user']
}
print dictionary['home']()
dictionary['user']='tony'
print dictionary['home']()
>>> dictionary = {
... 'a':'123'
... }
>>> dictionary['b'] = dictionary['a'] + '456'
>>> dictionary
{'a': '123', 'b': '123456'}
It works fine but when you're trying to use dictionary it hasn't been defined yet (because it has to evaluate that literal dictionary first).
But be careful because this assigns to the key of 'b' the value referenced by the key of 'a' at the time of assignment and is not going to do the lookup every time. If that is what you are looking for, it's possible but with more work.
What you're describing in your edit is how an INI config file works. Python does have a built in library called ConfigParser which should work for what you're describing.
This is an interesting problem. It seems like Greg has a good solution. But that's no fun ;)
jsbueno as a very elegant solution but that only applies to strings (as you requested).
The trick to a 'general' self referential dictionary is to use a surrogate object. It takes a few (understatement) lines of code to pull off, but the usage is about what you want:
S = SurrogateDict(AdditionSurrogateDictEntry)
d = S.resolve({'user': 'gnucom',
'home': '/home/' + S['user'],
'config': [S['home'] + '/.emacs', S['home'] + '/.bashrc']})
The code to make that happen is not nearly so short. It lives in three classes:
import abc
class SurrogateDictEntry(object):
__metaclass__ = abc.ABCMeta
def __init__(self, key):
"""record the key on the real dictionary that this will resolve to a
value for
"""
self.key = key
def resolve(self, d):
""" return the actual value"""
if hasattr(self, 'op'):
# any operation done on self will store it's name in self.op.
# if this is set, resolve it by calling the appropriate method
# now that we can get self.value out of d
self.value = d[self.key]
return getattr(self, self.op + 'resolve__')()
else:
return d[self.key]
#staticmethod
def make_op(opname):
"""A convience class. This will be the form of all op hooks for subclasses
The actual logic for the op is in __op__resolve__ (e.g. __add__resolve__)
"""
def op(self, other):
self.stored_value = other
self.op = opname
return self
op.__name__ = opname
return op
Next, comes the concrete class. simple enough.
class AdditionSurrogateDictEntry(SurrogateDictEntry):
__add__ = SurrogateDictEntry.make_op('__add__')
__radd__ = SurrogateDictEntry.make_op('__radd__')
def __add__resolve__(self):
return self.value + self.stored_value
def __radd__resolve__(self):
return self.stored_value + self.value
Here's the final class
class SurrogateDict(object):
def __init__(self, EntryClass):
self.EntryClass = EntryClass
def __getitem__(self, key):
"""record the key and return"""
return self.EntryClass(key)
#staticmethod
def resolve(d):
"""I eat generators resolve self references"""
stack = [d]
while stack:
cur = stack.pop()
# This just tries to set it to an appropriate iterable
it = xrange(len(cur)) if not hasattr(cur, 'keys') else cur.keys()
for key in it:
# sorry for being a duche. Just register your class with
# SurrogateDictEntry and you can pass whatever.
while isinstance(cur[key], SurrogateDictEntry):
cur[key] = cur[key].resolve(d)
# I'm just going to check for iter but you can add other
# checks here for items that we should loop over.
if hasattr(cur[key], '__iter__'):
stack.append(cur[key])
return d
In response to gnucoms's question about why I named the classes the way that I did.
The word surrogate is generally associated with standing in for something else so it seemed appropriate because that's what the SurrogateDict class does: an instance replaces the 'self' references in a dictionary literal. That being said, (other than just being straight up stupid sometimes) naming is probably one of the hardest things for me about coding. If you (or anyone else) can suggest a better name, I'm all ears.
I'll provide a brief explanation. Throughout S refers to an instance of SurrogateDict and d is the real dictionary.
A reference S[key] triggers S.__getitem__ and SurrogateDictEntry(key) to be placed in the d.
When S[key] = SurrogateDictEntry(key) is constructed, it stores key. This will be the key into d for the value that this entry of SurrogateDictEntry is acting as a surrogate for.
After S[key] is returned, it is either entered into the d, or has some operation(s) performed on it. If an operation is performed on it, it triggers the relative __op__ method which simple stores the value that the operation is performed on and the name of the operation and then returns itself. We can't actually resolve the operation because d hasn't been constructed yet.
After d is constructed, it is passed to S.resolve. This method loops through d finding any instances of SurrogateDictEntry and replacing them with the result of calling the resolve method on the instance.
The SurrogateDictEntry.resolve method receives the now constructed d as an argument and can use the value of key that it stored at construction time to get the value that it is acting as a surrogate for. If an operation was performed on it after creation, the op attribute will have been set with the name of the operation that was performed. If the class has a __op__ method, then it has a __op__resolve__ method with the actual logic that would normally be in the __op__ method. So now we have the logic (self.op__resolve) and all necessary values (self.value, self.stored_value) to finally get the real value of d[key]. So we return that which step 4 places in the dictionary.
finally the SurrogateDict.resolve method returns d with all references resolved.
That'a a rough sketch. If you have any more questions, feel free to ask.
If you, just like me wandering how to make #jsbueno snippet work with {} style substitutions, below is the example code (which is probably not much efficient though):
import string
class MyDict(dict):
def __init__(self, *args, **kw):
super(MyDict,self).__init__(*args, **kw)
self.itemlist = super(MyDict,self).keys()
self.fmt = string.Formatter()
def __getitem__(self, item):
return self.fmt.vformat(dict.__getitem__(self, item), {}, self)
xs = MyDict({
'user' : 'gnucom',
'home' : '/home/{user}',
'bin' : '{home}/bin'
})
>>> xs["home"]
'/home/gnucom'
>>> xs["bin"]
'/home/gnucom/bin'
I tried to make it work with the simple replacement of % self with .format(**self) but it turns out it wouldn't work for nested expressions (like 'bin' in above listing, which references 'home', which has it's own reference to 'user') because of the evaluation order (** expansion is done before actual format call and it's not delayed like in original % version).
Write a class, maybe something with properties:
class PathInfo(object):
def __init__(self, user):
self.user = user
#property
def home(self):
return '/home/' + self.user
p = PathInfo('thc')
print p.home # /home/thc
As sort of an extended version of #Tony's answer, you could build a dictionary subclass that calls its values if they are callables:
class CallingDict(dict):
"""Returns the result rather than the value of referenced callables.
>>> cd = CallingDict({1: "One", 2: "Two", 'fsh': "Fish",
... "rhyme": lambda d: ' '.join((d[1], d['fsh'],
... d[2], d['fsh']))})
>>> cd["rhyme"]
'One Fish Two Fish'
>>> cd[1] = 'Red'
>>> cd[2] = 'Blue'
>>> cd["rhyme"]
'Red Fish Blue Fish'
"""
def __getitem__(self, item):
it = super(CallingDict, self).__getitem__(item)
if callable(it):
return it(self)
else:
return it
Of course this would only be usable if you're not actually going to store callables as values. If you need to be able to do that, you could wrap the lambda declaration in a function that adds some attribute to the resulting lambda, and check for it in CallingDict.__getitem__, but at that point it's getting complex, and long-winded, enough that it might just be easier to use a class for your data in the first place.
This is very easy in a lazily evaluated language (haskell).
Since Python is strictly evaluated, we can do a little trick to turn things lazy:
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
d1 = lambda self: lambda: {
'a': lambda: 3,
'b': lambda: self()['a']()
}
# fix the d1, and evaluate it
d2 = Y(d1)()
# to get a
d2['a']() # 3
# to get b
d2['b']() # 3
Syntax wise this is not very nice. That's because of us needing to explicitly construct lazy expressions with lambda: ... and explicitly evaluate lazy expression with ...(). It's the opposite problem in lazy languages needing strictness annotations, here in Python we end up needing lazy annotations.
I think with some more meta-programmming and some more tricks, the above could be made more easy to use.
Note that this is basically how let-rec works in some functional languages.
The jsbueno answer in Python 3 :
class MyDict(dict):
def __getitem__(self, item):
return dict.__getitem__(self, item).format(self)
dictionary = MyDict({
'user' : 'gnucom',
'home' : '/home/{0[user]}',
'bin' : '{0[home]}/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
Her ewe use the python 3 string formatting with curly braces {} and the .format() method.
Documentation : https://docs.python.org/3/library/string.html

Categories

Resources