Get reference to Python dict key - python

In Python (3.7 and above) I would like to obtain a reference to a dict key. More precisely, let d be a dict where the keys are strings. In the following code, the value of k is potentially stored at two distinct locations in memory (one pointed to by the dict and one pointed to by k), whereas the value of v is stored at only one location (the one pointed to by the dict).
# d is a dict
# k is a string dynamically constructed, in particular not from iterating over d's keys
if k in d:
    v = d[k]
    # Now store k and v in other data structures
In my case, the dict is very large and the string keys are very long. To keep memory usage down I would like to replace k with a pointer to the corresponding string used by d before storing k in other data structures. Is there a straightforward way of doing this, that is using the keys of the dict as a string pool?
(Footnote: this may seem like premature optimisation, and perhaps it is, but being an old-school C programmer I sleep better at night doing "memory tricks". Joking aside, I would genuinely like to know the answer out of curiosity, and I am indeed going to run my code on a Raspberry Pi and will probably face memory issues.)

Where does the key k come from? Is it dynamically constructed by something like str.join, + , slicing another string, bytes.decode etc? Is it read from a file or input()? Did you get it from iterating over d at some point? Or does it originate from a literal somewhere in your source code?
In the last two cases, you don't need to worry about it since it is going to be a single instance anyway.
If not, you could use sys.intern to intern your keys. If a == b then sys.intern(a) is sys.intern(b).
Another possible solution, useful if you want the strings to be garbage collected at some point or you want to intern non-string values such as tuples of strings, is the following:
# create this dictionary once after `d` has all the right keys
canonical_keys = {key: key for key in d}
k = canonical_keys.get(k, k) # use the same instance if possible
I recommend reading up on Python's data model.
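Both suggestions above can be checked with a short sketch. The sample key and the variable names are made up for illustration; the strings are built with str.join so that they are constructed at run time rather than taken from a literal:

```python
import sys

# A dict whose key was built at run time
d = {"-".join(["alpha", "beta"]): 1}

# Pool that maps every key of d to the key object that d itself holds
canonical_keys = {key: key for key in d}

# An equal string, constructed separately from the one stored in d
k = "-".join(["alpha", "beta"])
k = canonical_keys.get(k, k)  # swap in the dict's own instance if there is one

stored_key = next(iter(d))
print(k is stored_key)  # True: both names refer to the same object

# sys.intern gives the same guarantee for equal strings
interned_a = sys.intern("-".join(["x", "y"]))
interned_b = sys.intern("-".join(["x", "y"]))
print(interned_a is interned_b)  # True
```

The pool costs one extra dict entry per key, but it lets the strings be freed again once both the pool and d drop them.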

Related

Choose one key arbitrarily in a dictionary without iteration [duplicate]

I just want to make sure that in Python dictionaries there's no way to get just one key (with no specific quality or relation to a certain value) except by iterating. As far as I can tell, you have to build a list of the keys by going through the whole dictionary in a loop. Something like this:
list_keys=[k for k in dic.keys()]
The thing is I just need an arbitrary key if the dictionary is not empty and don't care about the rest. I guess iterating over a long dictionary in addition to creation of a long list for just randomly getting a key is a whole lot overhead, isn't it?
Is there a better trick somebody can point out?
Thanks
A lot of the answers here produce a random key but the original question asked for an arbitrary key. There's quite a difference between those two. Randomness has a handful of mathematical/statistical guarantees.
Before Python 3.7, dictionaries were not ordered in any meaningful way; from 3.7 on they preserve insertion order. Either way, a dict has no positional index, so accessing an arbitrary key requires an iterator. But for a single arbitrary key, we do not need to iterate the entire dictionary. The built-in functions next and iter are useful here:
key = next(iter(mapping))
The iter built-in creates an iterator over the keys in the mapping. On Python 3.7 and later the first key it yields is the earliest one inserted that is still present; on older versions the order is arbitrary. The next built-in returns the first item from the iterator, raising StopIteration if the mapping is empty. Iterating the whole mapping is not necessary for an arbitrary key.
If you're going to end up deleting the key from the mapping, you may instead use dict.popitem. Here's the docstring:
D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple;
but raise KeyError if D is empty.
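A minimal sketch of both options, assuming Python 3.7 or later (where iteration follows insertion order and popitem removes the most recently inserted pair). arbitrary_key is a hypothetical helper name, not a built-in:

```python
def arbitrary_key(mapping, default=None):
    """Return one key of mapping without building a list, or default if it is empty."""
    return next(iter(mapping), default)


d = {"a": 1, "b": 2}

first = arbitrary_key(d)   # "a": the earliest inserted key on Python 3.7+
empty = arbitrary_key({})  # None: the default avoids StopIteration

# popitem removes the pair it returns, so work on a copy to keep d intact
k, v = dict(d).popitem()   # ("b", 2): the most recently inserted pair
```

Passing a default to next avoids a separate emptiness check.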
You can use random.choice (here dict stands for your dictionary variable, not the built-in type):
rand_key = random.choice(dict.keys())
This only works in Python 2.x. In Python 3.x, dict.keys() returns a view object that random.choice cannot index, so you have to convert it to a list first:
rand_key = random.choice(list(dict.keys()))
So, for example -
import random
d = {'rand1':'hey there', 'rand2':'you love python, I know!', 'rand3' : 'python has a method for everything!'}
random.choice(list(d.keys()))
Output (one possible result; it varies between runs) -
rand1
You are correct: there is not a way to get a random key from an ordinary dict without using iteration. Even solutions like random.choice must iterate through the dictionary in the background.
However you could use a sorted dict:
import random
from sortedcontainers import SortedDict as sd
d = sd(dic)
i = random.randrange(len(d))
ran_key = d.iloc[i]  # in sortedcontainers 2.x, d.keys()[i] replaces the deprecated iloc
More here:
http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html
Note that whether or not using something like SortedDict will result in any efficiency gains is going to be entirely dependent upon the actual implementation. If you are creating a lot of SD objects, or adding new keys very often (which have to be sorted), and are only getting a random key occasionally in relation to those other two tasks, you are unlikely to see much of a performance gain.
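How about something like this (Python 2):
import random
arbitrary_key = random.choice( dic.keys() )
BTW, the list comprehension in your question is redundant. In Python 2, dic.keys() already returns a list, so:
dic.keys() == [k for k in dic.keys()]
In Python 3, where keys() returns a view, write list(dic) instead.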
Check that the dictionary is not empty first; this should do:
import random
if len(yourdict) > 0:
    # Python 3's random.sample needs a sequence, so list the keys first
    randomKey = random.sample(list(yourdict), 1)
    print(randomKey[0])
else:
    pass  # do something
random.sample returns a list; since we passed 1, the list holds a single key, which randomKey[0] retrieves.

Finding a key in a dictionary without knowing its full name

I have a dictionary with a key called ev#### where #### is some number that I do not know ahead of time. There is only one of this type of key in the dictionary and no other key starts with ev.
What's the cleanest way to access that key without knowing what the #### is?
You can try this list comprehension:
result = [v for k, v in d.items() if k.startswith('ev')][0]
Or this approach using a generator expression:
result = next(v for k, v in d.items() if k.startswith('ev'))
(On Python 2, use d.iteritems() instead of d.items().)
Note that these will both require a linear scan of the items in the dictionary, unlike an ordinary key lookup, which runs in constant time on average (assuming a good hash function). The generator expression, however, can stop as soon as it finds the key. The list comprehension will always scan the entire dictionary.
If there is only one such value in the dictionary, I would say it's better to use an approach similar to this:
for k, v in d.items():
    if k.startswith('ev'):
        result = v
        break
else:
    raise KeyError()  # or set to default value
That way you don't have to loop through every value in the dictionary, but only until you find the key, which should speed up the calculation by ~ 2x on average.
Store the item in the dictionary without the ev prefix in the first place.
If you also need to access it with the prefix, store it both ways.
If there can be multiple prefixes for a given number, use a second dictionary that stores the actual keys associated with each number as a list or sub-dictionary, and use that to find the available keys in the main dictionary matching the number.
If you can't easily do this when the dictionary is initially created (e.g. someone else's code is giving you the dict and you can't change it), and you will be doing a lot of lookups of this sort, it is probably worthwhile to iterate over the dict once and make the second dict, or use a dict to cache the lookups, or something of that sort, to avoid iterating the keys each time.
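The idea of iterating once and building a second dict could look roughly like the sketch below. The function name index_by_number and the assumption that keys look like a prefix followed only by digits are illustrative, not part of the question:

```python
import re


def index_by_number(d, prefix="ev"):
    """Map each numeric suffix to the list of keys in d that carry it."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    index = {}
    for key in d:
        match = pattern.match(key)  # match() anchors at the start of the key
        if match:
            index.setdefault(match.group(1), []).append(key)
    return index


d = {"ev1234": "payload", "name": "x", "other": 1}
index = index_by_number(d)

# Later lookups go through the index instead of rescanning d
value = d[index["1234"][0]]
```

The index has to be rebuilt or updated whenever matching keys are added to or removed from d.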

Python, checksum of a dict

I'm thinking of creating a checksum of a dict to know whether it was modified or not.
For the moment I have this:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create a unique id of a dict to create a good ETag.
EDIT: I can have abstract data in the dict.
Something like this (in Python 3, reduce must be imported from functools):
from functools import reduce
reduce(lambda x, y: x ^ y, [hash(item) for item in d.items()])
Take the hash of each (key, value) tuple in the dict and XOR them all together.
@katrielalex: if the dict contains unhashable items, you could do this:
hash(str(d))
or maybe even better
hash(repr(d))
In Python 3, the hash of str and bytes objects is salted with a random value that differs for each Python process (unless PYTHONHASHSEED is set), so hash-based checksums are not stable across sessions. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
import zlib
d = {'key1': 'value1', 'key2': 'value2'}
checksum = 0
for item in d.items():
    c1 = 1
    for t in item:
        c1 = zlib.adler32(bytes(repr(t), 'utf-8'), c1)
    checksum = checksum ^ c1
print(checksum)
I would recommend an approach very similar to the one you propose, but with some extra guarantees:
import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()
sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ascii characters, to make sure the representation does not change
We use this for our ETag.
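To see the effect of sort_keys, the one-liner above can be wrapped in a small helper and applied to two dicts that differ only in insertion order. dict_etag is a hypothetical name, and the sketch assumes every key and value is JSON-serializable:

```python
import hashlib
import json


def dict_etag(d):
    """Return an MD5 hex digest of d that does not depend on key order."""
    payload = json.dumps(d, sort_keys=True, ensure_ascii=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}  # same content, different insertion order

print(dict_etag(d1) == dict_etag(d2))              # True
print(dict_etag(d1) == dict_etag({"a": 1, "b": 3}))  # False
```

Values that json cannot encode (sets, custom objects and so on) raise TypeError unless you pass a default= converter to json.dumps.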
I don't know whether pickle guarantees that the dict is serialized the same way every time.
If you only have dictionaries, I would go for a combination of calls to keys() and sorted(): build a string based on the sorted key/value pairs and compute the checksum on that.
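I think you may not realise some of the subtleties that go into this. The first problem is that the order in which items appear in a dict is not part of its value: it depends on insertion order (and, before Python 3.7, on implementation details), so two equal dicts can list their items differently. This means that simply asking for str of a dict doesn't work, because you could have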
str(d1) == "{'a':1, 'b':2}"
str(d2) == "{'b':2, 'a':1}"
and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply
hash(tuple(sorted(hash(x) for x in d.items())))
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have equal items which compare different.
The moral of the story is that hashing dicts isn't supported in Python for a reason.
As you said, you want to generate an ETag based on the dictionary content, so an OrderedDict, which preserves the order of the dictionary, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.

How to rewrite this Dictionary For Loop in Python?

I have a Dictionary of Classes where the classes hold attributes that are lists of strings.
I made this function to find the maximum number of items in any one of those lists for a particular person.
def find_max_var_amt(some_person):
    # Pass in a patient id number; get back the largest item count among that patient's variable lists
    max_vars = 0
    for key, value in patients[some_person].__dict__.items():
        challenger = len(value)
        if max_vars < challenger:
            max_vars = challenger
    return max_vars
What I want to do is rewrite it so that I do not have to use the .items() method. This find_max_var_amt function works fine as is, but I am converting my code from using a dictionary to using a database via the dbm module, so typical dictionary methods will no longer work for me even though the syntax for assigning and accessing the key:value pairs will be the same. Thanks for your help!
Since dbm doesn't let you iterate over the values directly, you can iterate over the keys. To do so, you could modify your for loop to look like
for key in patients[some_person].__dict__:
    value = patients[some_person].__dict__[key]
    # then continue as before
I think a bigger issue, though, will be the fact that dbm only stores strings (or bytes). So you won't be able to store the list directly in the database; you'll have to store a string representation of it. That means that computing the length of the list won't be as simple as len(value); you'll need some code to work out the length from whatever string representation you use. It could be as simple as len(the_string.split(',')), but be aware that you have to do it.
By the way, your existing function could be rewritten using a generator expression, like so:
def find_max_var_amt(some_person):
    return max(len(value) for value in patients[some_person].__dict__.values())
and if you did it that way, the change to iterating over keys would look like
def find_max_var_amt(some_person):
    dct = patients[some_person].__dict__
    return max(len(dct[key]) for key in dct)
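As a rough illustration of the point that dbm only stores strings, the sketch below keeps each list as a JSON string and decodes it before taking its length. JSON is swapped in here for the comma-split idea mentioned above, and the "patient:attribute" key scheme and sample data are assumptions made for the example:

```python
import dbm
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patients")
    with dbm.open(path, "c") as db:
        # dbm stores strings/bytes only, so serialize each list first
        db["p1:meds"] = json.dumps(["aspirin", "ibuprofen", "statin"])
        db["p1:allergies"] = json.dumps(["pollen"])

        # Iterate over keys, decode each stored value, and take the longest list
        max_vars = max(len(json.loads(db[key])) for key in db.keys())

print(max_vars)  # 3
```

JSON handles list items that themselves contain commas, which a plain split(',') would miscount.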

Most efficient way to update attribute of one instance

I'm creating an arbitrary number of instances (using for loops and ranges). At some event in the future, I need to change an attribute for only one of the instances. What's the best way to do this?
Right now, I'm doing the following:
1) Manage the instances in a list.
2) Iterate through the list to find a key value.
3) Once I find the right object within the list (i.e. key value = value I'm looking for), change whatever attribute I need to change.
for Instance in ListofInstances:
    if Instance.KeyValue == SearchValue:
        Instance.AttributeToChange = 10
This feels really inefficient: I'm basically iterating over the entire list of instances, even though I only need to change an attribute in one of them.
Should I be storing the Instance references in a structure more suitable for random access (e.g. dictionary with KeyValue as the dictionary key?) Is a dictionary any more efficient in this case? Should I be using something else?
Thanks,
Mike
Should I be storing the Instance references in a structure more suitable for random access (e.g. dictionary with KeyValue as the dictionary key?)
Yes, if you are mapping from a key to a value (which you are in this case), such that one typically accesses an element via its key, then a dict rather than a list is better.
Is a dictionary any more efficient in this case?
Yes, it is much more efficient. A dictionary takes O(1) time on average to look up an item by its key, whereas a list takes O(n) to look up an item by its key, which is what you are currently doing.
Using a Dictionary
# Construct the dictionary
d = {}
# Insert items into the dictionary
d[key1] = value1
d[key2] = value2
# ...
# Checking if an item exists
if key in d:
    # Do something requiring d[key]
    # such as updating an attribute:
    d[key].attr = val
As you mention, you need to keep an auxiliary dictionary with the key value as the key and the instance (or list of instances with that value for their attribute) as the value(s) -- way more efficient. Indeed, there's nothing more efficient than a dictionary for such uses.
It depends on what the other needs of your program are. If all you ever do with these objects is access the one with that particular key value, then sure, a dictionary is perfect. But if you need to preserve a particular order of the elements, be aware that a dict only preserves insertion order (guaranteed since Python 3.7) and cannot be indexed by position. (You could store them in both a dict and a list, or there might be a data structure that provides a compromise between random access and order preservation.) Alternatively, if more than one object can have the same key value, then you can't store both of them in a single dict at the same time, at least not directly. (You could have a dict of lists or something.)
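The "dict of lists" idea can be sketched as follows, keeping the original list for ordering and adding a dictionary index for lookups. The Item class and its attribute names are placeholders standing in for the question's instances:

```python
from collections import defaultdict


class Item:
    def __init__(self, key_value):
        self.key_value = key_value
        self.attribute_to_change = 0


# The list keeps the creation order; several items may share a key value
items = [Item(i % 3) for i in range(6)]

# Index built once: key value -> list of instances carrying it
by_key = defaultdict(list)
for item in items:
    by_key[item.key_value].append(item)

# Update only the instances whose key value is 1, without scanning the list
for item in by_key[1]:
    item.attribute_to_change = 10

print([item.attribute_to_change for item in items])  # [0, 10, 0, 0, 10, 0]
```

The list and the index hold references to the same objects, so an update made through either is visible through both.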
