Do you know of a Python library which provides mutable strings? Google returned surprisingly few results. The only usable library I found is http://code.google.com/p/gapbuffer/ which is in C but I would prefer it to be written in pure Python.
Edit: Thanks for the responses but I'm after an efficient library. That is, ''.join(list) might work but I was hoping for something more optimized. Also, it has to support the usual stuff regular strings do, like regex and unicode.
In Python, the built-in mutable sequence type is bytearray; see the bytearray documentation for details.
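For example, a bytearray supports both in-place edits and resizing:
>>> b = bytearray(b'hello world')
>>> b[0:5] = b'HELLO'
>>> b
bytearray(b'HELLO world')
>>> b += b'!'
>>> b
bytearray(b'HELLO world!')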
This will allow you to efficiently change characters in a string, although you can't change the string's length (Python 2 example):
>>> import ctypes
>>> a = 'abcdefghijklmn'
>>> mutable = ctypes.create_string_buffer(a)
>>> mutable[5:10] = ''.join( reversed(list(mutable[5:10].upper())) )
>>> a = mutable.value
>>> print (a, type(a))
('abcdeJIHGFklmn', <type 'str'>)
class MutableString(object):
    def __init__(self, data):
        self.data = list(data)
    def __repr__(self):
        return "".join(self.data)
    def __setitem__(self, index, value):
        self.data[index] = value
    def __getitem__(self, index):
        if type(index) == slice:
            return "".join(self.data[index])
        return self.data[index]
    def __delitem__(self, index):
        del self.data[index]
    def __add__(self, other):
        # return a new object rather than mutating self in place,
        # matching the semantics of + on regular sequences
        return MutableString(self.data + list(other))
    def __len__(self):
        return len(self.data)
...
and so on, and so forth.
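A quick interactive check of this sketch:
>>> m = MutableString("hello")
>>> m[0] = "H"
>>> m
Hello
>>> m[1:5]
'ello'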
You could also subclass StringIO, buffer, or bytearray.
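As a rough sketch of the StringIO route (patching text through a file-like interface; the buffer grows on write but doesn't delete in place):
import io

buf = io.StringIO()
buf.write("hello world")
buf.seek(0)
buf.write("H")               # overwrite the first character in place
print(buf.getvalue())        # Hello world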
How about simply sub-classing list (the prime example for mutability in Python)?
class CharList(list):
    def __init__(self, s):
        list.__init__(self, s)

    @property
    def list(self):
        return list(self)

    @property
    def string(self):
        return "".join(self)

    def __setitem__(self, key, value):
        if isinstance(key, int) and len(value) != 1:
            cls = type(self).__name__
            raise ValueError("attempt to assign sequence of size {} to {} item of size 1".format(len(value), cls))
        super(CharList, self).__setitem__(key, value)

    def __str__(self):
        return self.string

    def __repr__(self):
        cls = type(self).__name__
        return "{}('{}')".format(cls, self.string)
This only joins the list back to a string if you want to print it or actively ask for the string representation.
Mutating and extending are trivial, and the user knows how to do it already since it's just a list.
Example usage:
s = "te_st"
c = CharList(s)
c[1:3] = "oa"
c += "er"
print c # prints "toaster"
print c.list # prints ['t', 'o', 'a', 's', 't', 'e', 'r']
The caveat described next is fixed by the custom __setitem__ above; see the update that follows.
There's one (solvable) caveat: there's no check (yet) that each element is indeed a character. It will at least fail printing for everything but strings. However, non-character strings can be joined and may cause weird situations like this: [see code example below]
Update: with the custom __setitem__, assigning a string of length != 1 to a CharList item raises a ValueError. Everything else can still be freely assigned, but will raise a TypeError: sequence item n: expected string, X found when printing, due to the string.join() operation. If that's not good enough, further checks can easily be added (potentially also to __setslice__, or by switching the base class to collections.Sequence, though performance might differ).
s = "test"
c = CharList(s)
c[1] = "oa"
# with custom __setitem__ a ValueError is raised here!
# without custom __setitem__, we could go on:
c += "er"
print c # prints "toaster"
# this looks right until here, but:
print c.list # prints ['t', 'oa', 's', 't', 'e', 'r']
Efficient mutable strings in Python are arrays.
A Python 3 example for a Unicode string, using array.array from the standard library:
>>> import array, re
>>> ua = array.array('u', 'teststring12')
>>> ua[-2:] = array.array('u', '345')
>>> ua
array('u', 'teststring345')
>>> re.search('string.*', ua.tounicode()).group()
'string345'
bytearray is the predefined equivalent for bytes and handles conversion and compatibility more automatically.
You can also consider memoryview / buffer, numpy arrays, mmap and multiprocessing.shared_memory for certain cases.
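For instance, a memoryview over a bytearray gives zero-copy, in-place edits (same-length slice assignment only):
>>> data = bytearray(b'teststring12')
>>> view = memoryview(data)
>>> view[-2:] = b'34'
>>> data
bytearray(b'teststring34')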
The FIFOStr package on PyPI supports pattern matching and mutable strings. It may or may not be exactly what you want, but it was created as part of a pattern parser for a serial port (chars are added one at a time from the left or right; see the docs). It is derived from deque.
from fifostr import FIFOStr
myString = FIFOStr("this is a test")
myString.head(4) == "this" #true
myString[2] = 'u'
myString.head(4) == "thus" #true
(full disclosure I'm the author of FIFOstr)
Just do this:
string = "big"
string = list(string)
string[0] = string[0].upper()
string = "".join(string)
print(string)
# Output: Big
I am trying to implement slice functionality for a class I am making that creates a vector representation.
I have this code so far, which I believe will properly implement the slice, but whenever I do something like v[4] where v is a vector, Python raises an error about not having enough arguments. So I am trying to figure out how to define the __getitem__ special method in my class to handle both plain indexes and slicing.
def __getitem__(self, start, stop, step):
    index = start
    if stop == None:
        end = start + 1
    else:
        end = stop
    if step == None:
        stride = 1
    else:
        stride = step
    return self.__data[index:end:stride]
The __getitem__() method will receive a slice object when the object is sliced. Simply look at the start, stop, and step members of the slice object in order to get the components for the slice.
>>> class C(object):
...     def __getitem__(self, val):
...         print val
...
>>> c = C()
>>> c[3]
3
>>> c[3:4]
slice(3, 4, None)
>>> c[3:4:-2]
slice(3, 4, -2)
>>> c[():1j:'a']
slice((), 1j, 'a')
I have a "synthetic" list (one where the data is larger than you would want to create in memory) and my __getitem__ looks like this:
def __getitem__(self, key):
    if isinstance(key, slice):
        # Get the start, stop, and step from the slice
        return [self[ii] for ii in xrange(*key.indices(len(self)))]
    elif isinstance(key, int):
        if key < 0:  # Handle negative indices
            key += len(self)
        if key < 0 or key >= len(self):
            raise IndexError, "The index (%d) is out of range." % key
        return self.getData(key)  # Get the data from elsewhere
    else:
        raise TypeError, "Invalid argument type."
The slice doesn't return the same type, which is a no-no, but it works for me.
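If returning the same type matters to you, one possible fix is to rewrap the slice result, assuming your constructor accepts an iterable (a sketch in the same Python 2 style as above):
if isinstance(key, slice):
    # rewrap so that slicing returns an instance of the same class
    return type(self)(self[ii] for ii in xrange(*key.indices(len(self))))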
How do I define the __getitem__ method to handle both plain indexes and slicing?
Slice objects are created automatically when you use a colon in the subscript notation, and that is what is passed to __getitem__. Use isinstance to check whether you have a slice object:
from __future__ import print_function

class Sliceable(object):
    def __getitem__(self, subscript):
        if isinstance(subscript, slice):
            # do your handling for a slice object:
            print(subscript.start, subscript.stop, subscript.step)
        else:
            # Do your handling for a plain index
            print(subscript)
Say we were using a range object, but we want slices to return lists instead of new range objects (as it does):
>>> range(1,100, 4)[::-1]
range(97, -3, -4)
We can't subclass range because of internal limitations, but we can delegate to it:
class Range:
    """like builtin range, but when sliced gives a list"""
    __slots__ = "_range"
    def __init__(self, *args):
        self._range = range(*args)  # takes no keyword arguments
    def __getattr__(self, name):
        return getattr(self._range, name)
    def __getitem__(self, subscript):
        result = self._range.__getitem__(subscript)
        if isinstance(subscript, slice):
            return list(result)
        else:
            return result
r = Range(100)
We don't have a perfectly replaceable Range object, but it's fairly close:
>>> r[1:3]
[1, 2]
>>> r[1]
1
>>> 2 in r
True
>>> r.count(3)
1
To better understand the slice notation, here's example usage of Sliceable:
>>> sliceme = Sliceable()
>>> sliceme[1]
1
>>> sliceme[2]
2
>>> sliceme[:]
None None None
>>> sliceme[1:]
1 None None
>>> sliceme[1:2]
1 2 None
>>> sliceme[1:2:3]
1 2 3
>>> sliceme[:2:3]
None 2 3
>>> sliceme[::3]
None None 3
>>> sliceme[::]
None None None
>>> sliceme[:]
None None None
Python 2, be aware:
In Python 2, there's a deprecated method that you may need to override when subclassing some builtin types.
From the datamodel documentation:
object.__getslice__(self, i, j)
Deprecated since version 2.0: Support slice objects as parameters to the __getitem__() method. (However, built-in types in CPython currently still implement __getslice__(). Therefore, you have to override it in derived classes when implementing slicing.)
This is gone in Python 3.
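A minimal sketch of that Python 2 override, simply forwarding the old-style call to __getitem__:
def __getslice__(self, i, j):
    # Python 2 only: forward old-style slicing to __getitem__
    return self.__getitem__(slice(i, j))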
To extend Aaron's answer, for things like numpy, you can do multi-dimensional slicing by checking to see if given is a tuple:
class Sliceable(object):
    def __getitem__(self, given):
        if isinstance(given, slice):
            # do your handling for a slice object:
            print("slice", given.start, given.stop, given.step)
        elif isinstance(given, tuple):
            print("multidim", given)
        else:
            # Do your handling for a plain index
            print("plain", given)

sliceme = Sliceable()
sliceme[1]
sliceme[::]
sliceme[1:, ::2]
Output:
('plain', 1)
('slice', None, None, None)
('multidim', (slice(1, None, None), slice(None, None, 2)))
The correct way to do this is to have __getitem__ take one parameter, which can either be a number or a slice object.
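Putting that together, a minimal sketch of the one-parameter form (the Vector name and _data attribute here are illustrative):
class Vector(object):
    def __init__(self, data):
        self._data = list(data)
    def __getitem__(self, index):
        # index is either an int or a slice object
        if isinstance(index, slice):
            return type(self)(self._data[index])
        return self._data[index]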
I have nested dictionaries that may contain other dictionaries or lists. I need to be able to compare a list (or set, really) of these dictionaries to show that they are equal.
The order of the list is not uniform. Typically, I would turn the list into a set, but it is not possible since there are values that are also dictionaries.
a = {'color': 'red'}
b = {'shape': 'triangle'}
c = {'children': [{'color': 'red'}, {'age': 8},]}
test_a = [a, b, c]
test_b = [b, c, a]
print(test_a == test_b) # False
print(set(test_a) == set(test_b)) # TypeError: unhashable type: 'dict'
Is there a good way to approach this to show that test_a has the same contents as test_b?
You can use a simple loop to check whether each element of one list is in the other:
def areEqual(a, b):
    if len(a) != len(b):
        return False
    for d in a:
        if d not in b:
            return False
    return True
(Note that this ignores duplicate counts: [x, x, y] and [x, y, y] have the same length and would compare equal.)
I suggest writing a function that turns any Python object into something orderable, with its contents, if it has any, in sorted order. If we call it canonicalize, we can compare nested objects with:
canonicalize(test_a) == canonicalize(test_b)
Here's my attempt at writing a canonicalize function:
import collections.abc

def canonicalize(x):
    if isinstance(x, dict):
        x = sorted((canonicalize(k), canonicalize(v)) for k, v in x.items())
    elif isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
        x = sorted(map(canonicalize, x))
    else:
        try:
            bool(x < x)  # test for unorderable types like complex
        except TypeError:
            x = repr(x)  # replace with something orderable
    return x
This should work for most Python objects. It won't work for lists of heterogeneous items, containers that contain themselves (which will cause the function to hit the recursion limit), nor float('nan') (which has bizarre comparison behavior, and so may mess up the sorting of any container it's in).
It's possible that this code will do the wrong thing for non-iterable, unorderable objects, if they don't have a repr function that describes all the data that makes up their value (e.g. what is tested by ==). I picked repr as it will work on any kind of object and might get it right (it works for complex, for example). It should also work as desired for classes that have a repr that looks like a constructor call. For classes that have inherited object.__repr__ and so have repr output like <Foo object at 0xXXXXXXXX> it at least won't crash, though the objects will be compared by identity rather than value. I don't think there's any truly universal solution, and you can add some special cases for classes you expect to find in your data if they don't work with repr.
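With the question's data, a quick check using the function above:
>>> canonicalize(test_a) == canonicalize(test_b)
True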
If the elements in both lists are shallow, the idea of sorting them and then comparing for equality can work. The problem with @Alex's solution is that it only uses id; if instead of id one uses a function that sorts dictionaries properly, things should just work:
def sortkey(element):
    if isinstance(element, dict):
        element = sorted(element.items())
    return repr(element)

sorted(test_a, key=sortkey) == sorted(test_b, key=sortkey)
(I wrap the key in repr because it casts all elements to strings before comparison, which avoids a TypeError if different elements are of unorderable types; that would almost certainly happen if you are using Python 3.x.)
Just to be clear, if your dictionaries and lists have nested dictionaries themselves, you should use the answer by @m_callens. If your inner lists are also unordered, you can fix this to work by just sorting them inside the key function as well.
In this case they are the same dicts so you can compare ids (docs). Note that if you introduced a new dict whose values were identical it would still be treated differently. I.e. d = {'color': 'red'} would be treated as not equal to a.
sorted(map(id, test_a)) == sorted(map(id, test_b))
As @jsbueno points out, you can do this with the kwarg key.
sorted(test_a, key=id) == sorted(test_b, key=id)
An elegant and relatively fast solution:
class QuasiUnorderedList(list):
    def __eq__(self, other):
        """This method isn't as inefficient as you think! It runs in
        O(1 + 2 + 3 + ... + n) time, possibly better than recursively
        freezing/checking all the elements."""
        for item in self:
            for otheritem in other:
                if otheritem == item:
                    break
            else:
                # no break was reached, item not found
                return False
        return True
This runs in O(1 + 2 + 3 + ... + n) comparisons, i.e. O(n^2), flat. While slow for dictionaries of low depth, this is faster for dictionaries of high depth.
Here's a considerably longer snippet which is faster for dictionaries where depth is low and length is high.
import collections

class FrozenDict(collections.Mapping, collections.Hashable):  # collections.Hashable = portability
    """Adapted from http://stackoverflow.com/a/2704866/1459669"""
    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)
        self._hash = None
    def __iter__(self):
        return iter(self._d)
    def __len__(self):
        return len(self._d)
    def __getitem__(self, key):
        return self._d[key]
    def __hash__(self):
        # It would have been simpler and maybe more obvious to
        # use hash(tuple(sorted(self._d.iteritems()))) from this discussion
        # so far, but this solution is O(n). I don't know what kind of
        # n we are going to run into, but sometimes it's hard to resist the
        # urge to optimize when it will gain improved algorithmic performance.
        # Now thread safe by CrazyPython
        if self._hash is None:
            _hash = 0
            for pair in self.iteritems():
                _hash ^= hash(pair)
            self._hash = _hash
        return self._hash
def freeze(obj):
    if type(obj) in (str, int, ...):  # other immutable atoms you store in your data structure
        return obj
    elif issubclass(type(obj), list):  # ugly but needed
        # frozenset rather than set, so the result is itself hashable
        return frozenset(freeze(item) for item in obj)
    elif issubclass(type(obj), dict):  # for defaultdict, etc.
        return FrozenDict({key: freeze(value) for key, value in obj.items()})
    else:
        raise NotImplementedError("freeze() doesn't know how to freeze " + type(obj).__name__ + " objects!")
class FreezableList(list, collections.Hashable):
    _stored_freeze = None
    _hashed_self = None
    def __eq__(self, other):
        if self._stored_freeze and (self._hashed_self == self):
            frozen = self._stored_freeze
        else:
            frozen = freeze(self)
        if frozen is not self._stored_freeze:
            # cache the freeze, and remember the contents it was made from
            self._stored_freeze = frozen
            self._hashed_self = list(self)
        return frozen == freeze(other)
    def __hash__(self):
        if self._stored_freeze and (self._hashed_self == self):
            frozen = self._stored_freeze
        else:
            frozen = freeze(self)
        if frozen is not self._stored_freeze:
            self._stored_freeze = frozen
            self._hashed_self = list(self)
        return hash(frozen)

class UncachedFreezableList(list, collections.Hashable):
    def __eq__(self, other):
        """No-caching version of __eq__. May be faster.
        Don't forget to get rid of the declarations at the top of the class!
        Considerably more elegant."""
        return freeze(self) == freeze(other)
    def __hash__(self):
        """No-caching version of __hash__. See the notes in the docstring of __eq__."""
        return hash(freeze(self))
Test all three (QuasiUnorderedList, FreezableList, and UncachedFreezableList) and see which one is faster in your situation. I'll betcha it's faster than the other solutions.
Does python have immutable lists?
Suppose I wish to have the functionality of an ordered collection of elements, but which I want to guarantee will not change, how can this be implemented? Lists are ordered but they can be mutated.
Yes. It's called a tuple.
So, instead of [1,2] which is a list and which can be mutated, (1,2) is a tuple and cannot.
Further Information:
A one-element tuple cannot be instantiated by writing (1), instead, you need to write (1,). This is because the interpreter has various other uses for parentheses.
You can also do away with parentheses altogether: 1,2 is the same as (1,2)
Note that a tuple is not exactly an immutable list; lists and tuples differ in more ways than mutability alone.
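For example, item assignment simply fails on a tuple:
>>> t = (1, 2)
>>> t[0] = 3
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment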
Here is an ImmutableList implementation. The underlying list is not exposed through any direct data member; still, it can be reached via the __closure__ attribute of the member function. As long as we follow the convention of not modifying the closure's contents through that attribute, this implementation will serve the purpose. An instance of this ImmutableList class can be used anywhere a normal Python list is expected.
from functools import reduce
__author__ = 'hareesh'
class ImmutableList:
    """
    An unmodifiable List class which uses a closure to wrap the original list.

    Since nothing is truly private in python, even closures can be accessed and
    modified using the __closure__ member of a function. As long as this is
    not done by the client, this can be considered as an unmodifiable list.

    This is a wrapper around the python list class
    which is passed in the constructor while creating an instance of this class.
    The second optional argument to the constructor 'copy_input_list' specifies
    whether to make a copy of the input list and use it to create the immutable
    list. To make the list truly immutable, this has to be set to True. The
    default value is False, which makes this a mere wrapper around the input
    list. In scenarios where the input list handle is not available to other
    pieces of code for modification, this approach is fine. (E.g., scenarios
    where the input list is created as a local variable within a function OR
    it is a part of a library for which there is no public API to get a handle
    to the list).

    The instance of this class can be used in almost all scenarios where a
    normal python list can be used. For example:
    01. It can be used in a for loop
    02. It can be used to access elements by index i.e. immList[i]
    03. It can be clubbed with other python lists and immutable lists. If
        lst is a python list and imm is an immutable list, the following can
        be performed to get a clubbed list:
        ret_list = lst + imm
        ret_list = imm + lst
        ret_list = imm + imm
    04. It can be multiplied by an integer to increase the size
        (imm * 4 or 4 * imm)
    05. It can be used in the slicing operator to extract sub lists
        (imm[3:4] or imm[:3] or imm[4:])
    06. The len method can be used to get the length of the immutable list.
    07. It can be compared with other immutable and python lists using the
        >, <, ==, <=, >= and != operators.
    08. Existence of an element can be checked with the 'in' clause as in the
        case of normal python lists. (e.g. '2' in imm)
    09. The copy, count and index methods behave in the same manner as python
        lists.
    10. The str() method can be used to print a string representation of the
        list similar to the python list.
    """
    @staticmethod
    def _list_append(lst, val):
        """
        Private utility method used to append a value to an existing list and
        return the list itself (so that it can be used in the functools.reduce
        method for chained invocations).

        @param lst: List to which value is to be appended
        @param val: The value to append to the list
        @return: The input list with an extra element added at the end.
        """
        lst.append(val)
        return lst
    @staticmethod
    def _methods_impl(lst, func_id, *args):
        """
        This static private method is where all the delegate methods are
        implemented. This function should be invoked with a reference to the
        input list, the function id and other arguments required to
        invoke the function.

        @param lst: The list that the Immutable list wraps.
        @param func_id: should be the key of one of the functions listed in
            the 'functions' dictionary, within the method.
        @param args: Arguments required to execute the function. Can be empty.
        @return: The execution result of the function specified by the func_id
        """
        # returns iterator of the wrapped list, so that for loop and other
        # functions relying on the iterable interface can work.
        _il_iter = lambda: lst.__iter__()
        _il_get_item = lambda: lst[args[0]]  # index access method.
        _il_len = lambda: len(lst)  # length of the list
        _il_str = lambda: lst.__str__()  # string function
        # Following represent the >, <, >=, <=, ==, != operators.
        _il_gt = lambda: lst.__gt__(args[0])
        _il_lt = lambda: lst.__lt__(args[0])
        _il_ge = lambda: lst.__ge__(args[0])
        _il_le = lambda: lst.__le__(args[0])
        _il_eq = lambda: lst.__eq__(args[0])
        _il_ne = lambda: lst.__ne__(args[0])
        # The following is to check for existence of an element with the
        # in clause.
        _il_contains = lambda: lst.__contains__(args[0])
        # * operator with an integer to multiply the list size.
        _il_mul = lambda: lst.__mul__(args[0])
        # + operator to merge with another list and return a new merged
        # python list.
        _il_add = lambda: reduce(
            lambda x, y: ImmutableList._list_append(x, y), args[0], list(lst))
        # Reverse + operator, to have python list as the first operand of the
        # + operator.
        _il_radd = lambda: reduce(
            lambda x, y: ImmutableList._list_append(x, y), lst, list(args[0]))
        # Reverse * operator. (same as the * operator)
        _il_rmul = lambda: lst.__mul__(args[0])
        # Copy, count and index methods.
        _il_copy = lambda: lst.copy()
        _il_count = lambda: lst.count(args[0])
        _il_index = lambda: lst.index(
            args[0], args[1], args[2] if args[2] else len(lst))

        functions = {0: _il_iter, 1: _il_get_item, 2: _il_len, 3: _il_str,
                     4: _il_gt, 5: _il_lt, 6: _il_ge, 7: _il_le, 8: _il_eq,
                     9: _il_ne, 10: _il_contains, 11: _il_add, 12: _il_mul,
                     13: _il_radd, 14: _il_rmul, 15: _il_copy, 16: _il_count,
                     17: _il_index}
        return functions[func_id]()
    def __init__(self, input_lst, copy_input_list=False):
        """
        Constructor of the Immutable list. Creates a dynamic function/closure
        that wraps the input list, which can be later passed to the
        _methods_impl static method defined above. This is
        required to avoid maintaining the input list as a data member, to
        prevent the caller from accessing and modifying it.

        @param input_lst: The input list to be wrapped by the Immutable list.
        @param copy_input_list: specifies whether to clone the input list and
            use the clone in the instance. See class documentation for more
            details.
        """
        assert(isinstance(input_lst, list))
        lst = list(input_lst) if copy_input_list else input_lst
        self._delegate_fn = lambda func_id, *args: \
            ImmutableList._methods_impl(lst, func_id, *args)

    # All overridden methods.
    def __iter__(self): return self._delegate_fn(0)
    def __getitem__(self, index): return self._delegate_fn(1, index)
    def __len__(self): return self._delegate_fn(2)
    def __str__(self): return self._delegate_fn(3)
    def __gt__(self, other): return self._delegate_fn(4, other)
    def __lt__(self, other): return self._delegate_fn(5, other)
    def __ge__(self, other): return self._delegate_fn(6, other)
    def __le__(self, other): return self._delegate_fn(7, other)
    def __eq__(self, other): return self._delegate_fn(8, other)
    def __ne__(self, other): return self._delegate_fn(9, other)
    def __contains__(self, item): return self._delegate_fn(10, item)
    def __add__(self, other): return self._delegate_fn(11, other)
    def __mul__(self, other): return self._delegate_fn(12, other)
    def __radd__(self, other): return self._delegate_fn(13, other)
    def __rmul__(self, other): return self._delegate_fn(14, other)
    def copy(self): return self._delegate_fn(15)
    def count(self, value): return self._delegate_fn(16, value)
    def index(self, value, start=0, stop=0):
        return self._delegate_fn(17, value, start, stop)
def main():
    lst1 = ['a', 'b', 'c']
    lst2 = ['p', 'q', 'r', 's']
    imm1 = ImmutableList(lst1)
    imm2 = ImmutableList(lst2)
    print('Imm1 = ' + str(imm1))
    print('Imm2 = ' + str(imm2))
    add_lst1 = lst1 + imm1
    print('List + Immutable List: ' + str(add_lst1))
    add_lst2 = imm1 + lst2
    print('Immutable List + List: ' + str(add_lst2))
    add_lst3 = imm1 + imm2
    print('Immutable List + Immutable List: ' + str(add_lst3))
    is_in_list = 'a' in lst1
    print("Is 'a' in lst1 ? " + str(is_in_list))
    slice1 = imm1[2:]
    slice2 = imm2[2:4]
    slice3 = imm2[:3]
    print('Slice 1: ' + str(slice1))
    print('Slice 2: ' + str(slice2))
    print('Slice 3: ' + str(slice3))
    imm1_times_3 = imm1 * 3
    print('Imm1 Times 3 = ' + str(imm1_times_3))
    three_times_imm2 = 3 * imm2
    print('3 Times Imm2 = ' + str(three_times_imm2))
    # For loop
    print('Imm1 in For Loop: ', end=' ')
    for x in imm1:
        print(x, end=' ')
    print()
    print("3rd Element in Imm1: '" + imm1[2] + "'")
    # Compare lst1 and imm1
    lst1_eq_imm1 = lst1 == imm1
    print("Are lst1 and imm1 equal? " + str(lst1_eq_imm1))
    imm2_eq_lst1 = imm2 == lst1
    print("Are imm2 and lst1 equal? " + str(imm2_eq_lst1))
    imm2_not_eq_lst1 = imm2 != lst1
    print("Are imm2 and lst1 different? " + str(imm2_not_eq_lst1))
    # Finally print the immutable lists again.
    print("Imm1 = " + str(imm1))
    print("Imm2 = " + str(imm2))
    # The following statements will give errors.
    # imm1[3] = 'h'
    # print(imm1)
    # imm1.append('d')
    # print(imm1)

if __name__ == '__main__':
    main()
You can simulate a Lisp-style immutable singly-linked list using two-element tuples (note: this is different than the any-element tuple answer, which creates a tuple that's much less flexible):
nil = ()
cons = lambda ele, l: (ele, l)
e.g. for the list [1, 2, 3], you would have the following:
l = cons(1, cons(2, cons(3, nil))) # (1, (2, (3, ())))
Your standard car and cdr functions are straightforward:
car = lambda l: l[0]
cdr = lambda l: l[1]
Since this list is singly linked, appending to the front is O(1). Since this list is immutable, if the underlying elements in the list are also immutable, then you can safely share any sublist to be reused in another list.
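For instance, a small sketch of that structural sharing, using the cons/cdr helpers above:
l1 = cons(1, cons(2, cons(3, nil)))   # (1, (2, (3, ())))
l2 = cons(0, cdr(l1))                 # (0, (2, (3, ()))), reusing l1's tail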
This question deserves a modern answer, now that type annotations and type checking via mypy are getting more popular.
Replacing a List[T] with a tuple may not be the ideal solution when using type annotations. Conceptually, a list has a generic arity of 1, i.e., it has a single generic argument T (of course, this argument can be a Union[A, B, C, ...] to account for heterogeneously typed lists). In contrast, tuples are inherently variadic generics: Tuple[A, B, C, ...]. This makes tuples an awkward list replacement.
In fact, type checking offers another possibility: It is possible to annotate variables as immutable lists by using typing.Sequence, which corresponds to the type of the immutable interface collections.abc.Sequence. For example:
from typing import Sequence

def f(immutable_list: Sequence[str]) -> None:
    # We want to prevent mutations like:
    immutable_list.append("something")

mutable_list = ["a", "b", "c"]
f(mutable_list)
print(mutable_list)
Of course, in terms of runtime behavior this isn't immutable, i.e., the Python interpreter will happily mutate immutable_list, and the output would be ["a", "b", "c", "something"].
However, if your project uses a type checker like mypy, it will reject the code with:
immutable_lists_1.py:6: error: "Sequence[str]" has no attribute "append"
Found 1 error in 1 file (checked 1 source file)
So under the hood you can continue to use regular lists, but the type checker can effectively prevent any mutations at type-check time.
Similarly you could prevent modification of list members, e.g. in immutable dataclasses (note that field assignment on a frozen dataclass is in fact prevented at runtime):
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableData:
    immutable_list: Sequence[str]

def f(immutable_data: ImmutableData) -> None:
    # mypy will prevent mutations here as well:
    immutable_data.immutable_list.append("something")
The same principle can be used for dicts via typing.Mapping.
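A minimal sketch of the Mapping variant (the exact mypy message may differ slightly between versions):
from typing import Mapping

def g(config: Mapping[str, int]) -> None:
    config["retries"] = 3  # mypy error: unsupported target for indexed assignment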
But if a tuple contains mutable objects such as lists, those inner objects can still be modified:
>>> a = ([1, 2, 3], (4, 5, 6))
>>> a
([1, 2, 3], (4, 5, 6))
>>> a[0][0] = 'one'
>>> a
(['one', 2, 3], (4, 5, 6))
List and Tuple differ in their working style.
In a LIST we can make changes after its creation, but if you want an ordered sequence in which no changes can be applied in the future, you can use a TUPLE.
Further information:
1) The LIST is mutable, which means you can make changes to it after its creation.
2) A TUPLE cannot be changed once it is created.
3) The list syntax is:
abcd = [1, 'avn', 3, 2.0]
4) The syntax for a tuple is:
abcd = (1, 'avn', 3, 2.0)
or abcd = 1, 'avn', 3, 2.0, which is also correct.
Instead of a tuple, you can use a frozenset, which creates an immutable set. Note that a frozenset is unordered and its members must themselves be hashable, so a list has to be converted (to a tuple, for example) before it can be a member; you can then access every element with a single for loop.
As an exercise, and mostly for my own amusement, I'm implementing a backtracking packrat parser. The inspiration for this is that I'd like to have a better idea of how hygienic macros would work in an algol-like language (as opposed to the syntax-free lisp dialects you normally find them in). Because of this, different passes through the input might see different grammars, so cached parse results are invalid unless I also store the current version of the grammar along with the cached parse results. (EDIT: a consequence of this use of key-value collections is that they should be immutable, but I don't intend to expose the interface to allow them to be changed, so either mutable or immutable collections are fine.)
The problem is that python dicts cannot appear as keys to other dicts. Even using a tuple (as I'd be doing anyways) doesn't help.
>>> cache = {}
>>> rule = {"foo":"bar"}
>>> cache[(rule, "baz")] = "quux"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>>
I guess it has to be tuples all the way down. Now the Python standard library provides approximately what I'd need: collections.namedtuple has a very different syntax, but can be used as a key. Continuing from the above session:
>>> from collections import namedtuple
>>> Rule = namedtuple("Rule",rule.keys())
>>> cache[(Rule(**rule), "baz")] = "quux"
>>> cache
{(Rule(foo='bar'), 'baz'): 'quux'}
Ok. But I have to make a class for each possible combination of keys in the rule I would want to use, which isn't so bad, because each parse rule knows exactly what parameters it uses, so that class can be defined at the same time as the function that parses the rule.
Edit: An additional problem with namedtuples is that they are strictly positional. Two tuples that look like they should be different can in fact be the same:
>>> you = namedtuple("foo",["bar","baz"])
>>> me = namedtuple("foo",["bar","quux"])
>>> you(bar=1,baz=2) == me(bar=1,quux=2)
True
>>> bob = namedtuple("foo",["baz","bar"])
>>> you(bar=1,baz=2) == bob(bar=1,baz=2)
False
tl;dr: How do I get dicts that can be used as keys to other dicts?
Having hacked a bit on the answers, here's the more complete solution I'm using. Note that this does a bit of extra work to make the resulting dicts vaguely immutable for practical purposes. Of course it's still quite easy to hack around it by calling dict.__setitem__(instance, key, value), but we're all adults here.
class hashdict(dict):
    """
    hashable dict implementation, suitable for use as a key into
    other dicts.

    >>> h1 = hashdict({"apples": 1, "bananas":2})
    >>> h2 = hashdict({"bananas": 3, "mangoes": 5})
    >>> h1+h2
    hashdict(apples=1, bananas=3, mangoes=5)
    >>> d1 = {}
    >>> d1[h1] = "salad"
    >>> d1[h1]
    'salad'
    >>> d1[h2]
    Traceback (most recent call last):
    ...
    KeyError: hashdict(bananas=3, mangoes=5)

    based on answers from
    http://stackoverflow.com/questions/1151658/python-hashable-dicts
    """
    def __key(self):
        return tuple(sorted(self.items()))
    def __repr__(self):
        return "{0}({1})".format(self.__class__.__name__,
            ", ".join("{0}={1}".format(
                str(i[0]), repr(i[1])) for i in self.__key()))
    def __hash__(self):
        return hash(self.__key())
    def __setitem__(self, key, value):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def __delitem__(self, key):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def clear(self):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def pop(self, *args, **kwargs):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def popitem(self, *args, **kwargs):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def setdefault(self, *args, **kwargs):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    def update(self, *args, **kwargs):
        raise TypeError("{0} does not support item assignment"
                        .format(self.__class__.__name__))
    # update is not ok because it mutates the object
    # __add__ is ok because it creates a new object
    # while the new object is under construction, it's ok to mutate it
    def __add__(self, right):
        result = hashdict(self)
        dict.update(result, right)
        return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Here is the easy way to make a hashable dictionary. Just remember not to mutate them after embedding in another dictionary for obvious reasons.
class hashabledict(dict):
    def __hash__(self):
        return hash(tuple(sorted(self.items())))
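For example, with the question's cache:
>>> cache = {}
>>> rule = hashabledict({"foo": "bar"})
>>> cache[(rule, "baz")] = "quux"
>>> cache[(rule, "baz")]
'quux'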
Hashables should be immutable -- not enforcing this but TRUSTING you not to mutate a dict after its first use as a key, the following approach would work:
class hashabledict(dict):
    def __key(self):
        return tuple((k, self[k]) for k in sorted(self))
    def __hash__(self):
        return hash(self.__key())
    def __eq__(self, other):
        return self.__key() == other.__key()
If you DO need to mutate your dicts and STILL want to use them as keys, complexity explodes hundredfolds -- not to say it can't be done, but I'll wait until a VERY specific indication before I get into THAT incredible morass!-)
All that is needed to make dictionaries usable for your purpose is to add a __hash__ method:
class Hashabledict(dict):
    def __hash__(self):
        return hash(frozenset(self))
Note, the frozenset conversion will work for all dictionaries (i.e. it doesn't require the keys to be sortable). Likewise, there is no restriction on the dictionary values.
If there are many dictionaries with identical keys but with distinct values, it is necessary to have the hash take the values into account. The fastest way to do that is:
class Hashabledict(dict):
    def __hash__(self):
        return hash((frozenset(self), frozenset(self.itervalues())))
This is quicker than frozenset(self.iteritems()) for two reasons. First, the frozenset(self) step reuses the hash values stored in the dictionary, saving unnecessary calls to hash(key). Second, using itervalues accesses the values directly, avoiding the many memory-allocator calls made by items, which forms new key/value tuples in memory every time you do a lookup.
The given answers are okay, but they could be improved by using frozenset(...) instead of tuple(sorted(...)) to generate the hash:
>>> import timeit
>>> timeit.timeit('hash(tuple(sorted(d.iteritems())))', "d = dict(a=3, b='4', c=2345, asdfsdkjfew=0.23424, x='sadfsadfadfsaf')")
4.7758948802947998
>>> timeit.timeit('hash(frozenset(d.iteritems()))', "d = dict(a=3, b='4', c=2345, asdfsdkjfew=0.23424, x='sadfsadfadfsaf')")
1.8153600692749023
The performance advantage depends on the content of the dictionary, but in most cases I've tested, hashing with frozenset is at least 2 times faster (mainly because it does not need to sort).
A reasonably clean, straightforward implementation is
import collections
class FrozenDict(collections.Mapping):
    """Don't forget the docstrings!!"""
    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)
    def __iter__(self):
        return iter(self._d)
    def __len__(self):
        return len(self._d)
    def __getitem__(self, key):
        return self._d[key]
    def __hash__(self):
        return hash(tuple(sorted(self._d.iteritems())))
I keep coming back to this topic... Here's another variation. I'm uneasy with subclassing dict to add a __hash__ method; there's virtually no escape from the problem that dicts are mutable, and trusting that they won't change seems like a weak idea. So I've instead looked at building a mapping based on a builtin type that is itself immutable. Although tuple is an obvious choice, accessing values in it implies a sort and a bisect; not a problem, but it doesn't seem to be leveraging much of the power of the type it's built on.
What if you jam key, value pairs into a frozenset? What would that require, how would it work?
Part 1, you need a way of encoding the 'item's in such a way that a frozenset will treat them mainly by their keys; I'll make a little subclass for that.
import collections
class pair(collections.namedtuple('pair_base', 'key value')):
    def __hash__(self):
        return hash((self.key, None))
    def __eq__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self.key == other.key
    def __repr__(self):
        return repr((self.key, self.value))
That alone puts you in spitting distance of an immutable mapping:
>>> frozenset(pair(k, v) for k, v in enumerate('abcd'))
frozenset([(0, 'a'), (2, 'c'), (1, 'b'), (3, 'd')])
>>> pairs = frozenset(pair(k, v) for k, v in enumerate('abcd'))
>>> pair(2, None) in pairs
True
>>> pair(5, None) in pairs
False
>>> goal = frozenset((pair(2, None),))
>>> pairs & goal
frozenset([(2, None)])
D'oh! Unfortunately, when you use the set operators on elements that are equal but not the same object, it is undefined which one ends up in the return value; we'll have to go through some more gyrations.
>>> pairs - (pairs - goal)
frozenset([(2, 'c')])
>>> iter(pairs - (pairs - goal)).next().value
'c'
However, looking values up in this way is cumbersome, and worse, creates lots of intermediate sets; that won't do! We'll create a 'fake' key-value pair to get around it:
class Thief(object):
    def __init__(self, key):
        self.key = key
    def __hash__(self):
        return hash(pair(self.key, None))
    def __eq__(self, other):
        self.value = other.value
        return pair(self.key, None) == other
Which results in the less problematic:
>>> thief = Thief(2)
>>> thief in pairs
True
>>> thief.value
'c'
That's all the deep magic; the rest is wrapping it all up into something that has an interface like a dict. Since we're subclassing from frozenset, which has a very different interface, there's quite a lot of methods; we get a little help from collections.Mapping, but most of the work is overriding the frozenset methods for versions that work like dicts, instead:
class FrozenDict(frozenset, collections.Mapping):
    def __new__(cls, seq=()):
        return frozenset.__new__(cls, (pair(k, v) for k, v in seq))
    def __getitem__(self, key):
        thief = Thief(key)
        if frozenset.__contains__(self, thief):
            return thief.value
        raise KeyError(key)
    def __eq__(self, other):
        if not isinstance(other, FrozenDict):
            return dict(self.iteritems()) == other
        if len(self) != len(other):
            return False
        for key, value in self.iteritems():
            try:
                if value != other[key]:
                    return False
            except KeyError:
                return False
        return True
    def __hash__(self):
        return hash(frozenset(self.iteritems()))
    def get(self, key, default=None):
        thief = Thief(key)
        if frozenset.__contains__(self, thief):
            return thief.value
        return default
    def __iter__(self):
        for item in frozenset.__iter__(self):
            yield item.key
    def iteritems(self):
        for item in frozenset.__iter__(self):
            yield (item.key, item.value)
    def iterkeys(self):
        for item in frozenset.__iter__(self):
            yield item.key
    def itervalues(self):
        for item in frozenset.__iter__(self):
            yield item.value
    def __contains__(self, key):
        return frozenset.__contains__(self, pair(key, None))
    has_key = __contains__
    def __repr__(self):
        return type(self).__name__ + (', '.join(repr(item) for item in self.iteritems())).join('()')
    @classmethod
    def fromkeys(cls, keys, value=None):
        return cls((key, value) for key in keys)
which, ultimately, does answer my own question:
>>> myDict = {}
>>> myDict[FrozenDict(enumerate('ab'))] = 5
>>> FrozenDict(enumerate('ab')) in myDict
True
>>> FrozenDict(enumerate('bc')) in myDict
False
>>> FrozenDict(enumerate('ab', 3)) in myDict
False
>>> myDict[FrozenDict(enumerate('ab'))]
5
The accepted answer by @Unknown, as well as the answer by @AlexMartelli, work perfectly fine, but only under the following constraints:
The dictionary's values must be hashable. For example, hash(hashabledict({'a':[1,2]})) will raise TypeError.
Keys must support comparison operations. For example, hash(hashabledict({'a':'a', 1:1})) will raise TypeError.
The comparison operator on keys imposes total ordering. For example, if the two keys in a dictionary are frozenset((1,2,3)) and frozenset((4,5,6)), neither compares less than the other. Therefore, sorting the items of a dictionary with such keys can result in an arbitrary order, and will violate the rule that equal objects must have the same hash value.
The much faster answer by @ObenSonne lifts constraints 2 and 3, but is still bound by constraint 1 (values must be hashable).
The faster yet answer by @RaymondHettinger lifts all 3 constraints because it does not include .values() in the hash calculation. However, its performance is good only if:
Most of the (non-equal) dictionaries that need to be hashed do not have identical .keys().
If this condition isn't satisfied, the hash function will still be valid, but may cause too many collisions. For example, in the extreme case where all the dictionaries are generated from a website template (field names as keys, user input as values), the keys will always be the same, and the hash function will return the same value for all the inputs. As a result, a hashtable that relies on such a hash function will become as slow as a list when retrieving an item (O(N) instead of O(1)).
I think the following solution will work reasonably well even if all 4 constraints I listed above are violated. It has an additional advantage that it can hash not only dictionaries, but any containers, even if they have nested mutable containers.
I'd much appreciate any feedback on this, since I only tested this lightly so far.
# python 3.4
import collections
import operator
import sys
import itertools
import reprlib

# a wrapper to make an object hashable, while preserving equality
class AutoHash:
    # for each known container type, we can optionally provide a tuple
    # specifying: type, transform, aggregator
    # even immutable types need to be included, since their items
    # may make them unhashable

    # transformation may be used to enforce the desired iteration
    # the result of a transformation must be an iterable
    # default: no change; for dictionaries, we use .items() to see values
    # usually transformation choice only affects efficiency, not correctness

    # aggregator is the function that combines all items into one object
    # default: frozenset; for ordered containers, we can use tuple
    # aggregator choice affects both efficiency and correctness
    # e.g., using a tuple aggregator for a set is incorrect,
    # since identical sets may end up with different hash values
    # frozenset is safe since at worst it just causes more collisions
    # unfortunately, no collections.ABC class is available that helps
    # distinguish ordered from unordered containers
    # so we need to just list them out manually as needed

    type_info = collections.namedtuple(
        'type_info',
        'type transformation aggregator')

    ident = lambda x: x
    # order matters; first match is used to handle a datatype
    known_types = (
        # dict also handles defaultdict
        type_info(dict, lambda d: d.items(), frozenset),
        # no need to include set and frozenset, since they are fine with defaults
        type_info(collections.OrderedDict, ident, tuple),
        type_info(list, ident, tuple),
        type_info(tuple, ident, tuple),
        type_info(collections.deque, ident, tuple),
        type_info(collections.Iterable, ident, frozenset)  # other iterables
    )

    # hash_func can be set to replace the built-in hash function
    # cache can be turned on; if it is, cycles will be detected,
    # otherwise cycles in a data structure will cause failure
    def __init__(self, data, hash_func=hash, cache=False, verbose=False):
        self._data = data
        self.hash_func = hash_func
        self.verbose = verbose
        self.cache = cache
        # cache objects' hashes for performance and to deal with cycles
        if self.cache:
            self.seen = {}

    def hash_ex(self, o):
        # note: isinstance(o, Hashable) won't check inner types
        try:
            if self.verbose:
                print(type(o),
                      reprlib.repr(o),
                      self.hash_func(o),
                      file=sys.stderr)
            return self.hash_func(o)
        except TypeError:
            pass

        # we let built-in hash decide if the hash value is worth caching
        # so we don't cache the built-in hash results
        if self.cache and id(o) in self.seen:
            return self.seen[id(o)][0]  # found in cache

        # check if o can be handled by decomposing it into components
        for typ, transformation, aggregator in AutoHash.known_types:
            if isinstance(o, typ):
                # another option is:
                # result = reduce(operator.xor, map(_hash_ex, handler(o)))
                # but collisions are more likely with xor than with frozenset
                # e.g. hash_ex([1,2,3,4])==0 with xor
                try:
                    # try to frozenset the actual components, it's faster
                    h = self.hash_func(aggregator(transformation(o)))
                except TypeError:
                    # components not hashable with built-in;
                    # apply our extended hash function to them
                    h = self.hash_func(aggregator(map(self.hash_ex, transformation(o))))
                if self.cache:
                    # storing the object too, otherwise memory location will be reused
                    self.seen[id(o)] = (h, o)
                if self.verbose:
                    print(type(o), reprlib.repr(o), h, file=sys.stderr)
                return h

        raise TypeError('Object {} of type {} not hashable'.format(repr(o), type(o)))

    def __hash__(self):
        return self.hash_ex(self._data)

    def __eq__(self, other):
        # short circuit to save time
        if self is other:
            return True

        # 1) type(self) a proper subclass of type(other) => self.__eq__ will be called first
        # 2) any other situation => lhs.__eq__ will be called first

        # case 1. one side is a subclass of the other, and AutoHash.__eq__ is not overridden in either
        # => the subclass instance's __eq__ is called first, and we should compare self._data and other._data
        # case 2. neither side is a subclass of the other; self is lhs
        # => we can't compare to another type; we should let the other side decide what to do, return NotImplemented
        # case 3. neither side is a subclass of the other; self is rhs
        # => we can't compare to another type, and the other side already tried and failed;
        # we should return False, but NotImplemented will have the same effect
        # any other case: we won't reach the __eq__ code in this class, no need to worry about it
        if isinstance(self, type(other)):  # identifies case 1
            return self._data == other._data
        else:  # identifies cases 2 and 3
            return NotImplemented

d1 = {'a': [1,2], 2: {3: 4}}
print(hash(AutoHash(d1, cache=True, verbose=True)))

d = AutoHash(dict(a=1, b=2, c=3, d=[4,5,6,7], e='a string of chars'), cache=True, verbose=True)
print(hash(d))
You might also want to add these two methods to get the v2 pickling protocol to work with hashdict instances. Otherwise cPickle will try to use hashdict.__setitem__, resulting in a TypeError. Interestingly, with the other two versions of the protocol your code works just fine.
def __setstate__(self, objstate):
    for k, v in objstate.items():
        dict.__setitem__(self, k, v)

def __reduce__(self):
    return (hashdict, (), dict(self))
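A quick round-trip check, assuming the two methods above have been added to hashdict:
import pickle
h = hashdict(apples=1, bananas=2)
assert pickle.loads(pickle.dumps(h, protocol=2)) == h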
Serialize the dict to a string with the json package (pass sort_keys=True so that equal dicts always serialize to the same key string):
import json

d = {'a': 1, 'b': 2}
s = json.dumps(d, sort_keys=True)
Restore the dict when you need it:
d2 = json.loads(s)
If you don't put numbers in the dictionary and you never lose the variables containing your dictionaries, you can do this:
cache[id(rule)] = "whatever"
since id() is unique for every object during its lifetime
EDIT:
Oh sorry, yeah in that case what the other guys said would be better. I think you could also serialize your dictionaries as a string, like
cache[ 'foo:bar' ] = 'baz'
If you need to recover your dictionaries from the keys though, then you'd have to do something uglier like
cache[ 'foo:bar' ] = ( {'foo':'bar'}, 'baz' )
I guess the advantage of this is that you wouldn't have to write as much code.