I have a collections.OrderedDict built from a list of (key, value) pairs. I would like to compute the index i such that the ith key matches a given key. For example:
food = OrderedDict([('beans',33),('rice',44),('pineapple',55),('chicken',66)])
I want to go from the key chicken to the index 3, or from the key rice to the index 1. I can do this now with
list(food).index('rice')
but is there any way to leverage the OrderedDict's ability to look things up quickly by key name? Otherwise it seems like the index-finding would be O(N) rather than O(log N), and I have a lot of items.
I suppose I can do this manually by making my own index:
>>> foodIndex = {k:i for i,k in enumerate(food.keys())}
>>> foodIndex
{'chicken': 3, 'rice': 1, 'beans': 0, 'pineapple': 2}
but I was hoping there might be something built in to an OrderedDict.
Basically, no. OrderedDict gets its ability to look things up quickly by key name just by using a regular, unordered dict under the hood. The order information is stored separately in a doubly linked list. Because of this, there's no way to go directly from the key to its index. The order in an OrderedDict is mainly intended to be available for iteration; a key does not "know" its own order.
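On Python 3, keys() returns a view with no .index() method, so the lookup from the question becomes list(food).index(key), which costs O(n) per call. The sketch below shows the usual workaround. It assumes the dict changes rarely. A key-to-position map is built once, giving O(1) lookups afterwards, and it is rebuilt after any deletion, because removing a key shifts every later position. The rebuild_positions helper is hypothetical and is not part of collections.

```python
from collections import OrderedDict

food = OrderedDict([('beans', 33), ('rice', 44), ('pineapple', 55), ('chicken', 66)])

# O(n) per call: materialize the keys as a list, then scan it.
rice_index = list(food).index('rice')
print(rice_index)  # 1

def rebuild_positions(od):
    """Hypothetical helper: map each key to its position in one O(n) pass."""
    return {key: i for i, key in enumerate(od)}

positions = rebuild_positions(food)
print(positions['chicken'])  # 3, and each later lookup is O(1)

# Deleting a key shifts every later key down by one,
# so the map is stale until it is rebuilt.
del food['rice']
positions = rebuild_positions(food)
print(positions['chicken'])  # 2
```

Appending a brand-new key does not need a full rebuild. If the map is in sync, setting positions[key] = len(positions) just before inserting the key keeps it correct.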
As others have pointed out, an OrderedDict is just a dictionary that internally remembers the order in which entries were added to it. However, you can leverage its ability to look things up quickly by storing the desired index along with the rest of the data for each entry. Here's what I mean:
from collections import OrderedDict
foods = [('beans', 33), ('rice', 44), ('pineapple', 55), ('chicken', 66)]
food = OrderedDict((name, (value, i)) for i, (name, value) in enumerate(foods))  # store each value with its index i
print(food['rice'][1]) # --> 1
print(food['chicken'][1]) # --> 3
The OrderedDict is a subclass of dict which has the ability to traverse its keys in order (and reversed order) by maintaining a doubly linked list. So it does not know the index of a key. It can only traverse the linked list to find the items in O(n) time.
Perusing the source code may be the most satisfying way to confirm that OrderedDict does not maintain an index. You'll see that nowhere is an index ever used or obtained.
I'm looking for the fastest way to do the following: given a dictionary and a key value, return the lowest key in the dictionary greater than the value given. Per this question, the natural way would seem to be to create an OrderedDict, then use bisect on the keys to find the proper key location. The OrderedDict.keys() method doesn't support indexing, so per e.g. this question, one has to convert the keys to a list before doing bisect or similar.
So once an OrderedDict has been created with its keys in order, in order to access the correct position one has to do the following:
Convert the keys to a list
Do a binary search of the keys with bisect or similar.
Check that this insertion point isn't at the end of the list, before retrieving the key located after this index.
Retrieve the key value in our original OrderedDict.
I'm most concerned about step 1 above, from an efficiency perspective (although all of this looks roundabout to me). Without knowing the details of how Python does the conversion to a list, it seems like it would have to be O(n), completely eliminating the savings of using OrderedDict and binary search. I'm hoping someone can tell me whether this assumption about step 1 is correct and, regardless, whether there may be a better method.
As an alternative, I could simply create a list of tuples, sorted by the first element (key), where the second element is the dict value associated with that key. Then I could pass the key lambda x:x[0] to bisect. This seems reasonable, but I'd prefer to store my key / value pairs more uniformly (e.g. JSON), since that's how it's done with other dicts in the same project that don't need this specific type of comparison.
Here's some example code for a single lookup. Edit: But lest anyone think I'm over-optimizing, the actual dictionary has ~3 million keys, and will be accessed ~7 million times in a batch, daily. So I'm very interested in finding a fast way of doing this.
# Single lookup example
from collections import OrderedDict
from bisect import bisect
d = OrderedDict()
d[5] = 'lowest_value'
d[7] = 'middle_value'
d[12] = 'highest_value'
sample_key = 6 # we want to find the value for the key above this in d, e.g. d[7]
list_of_keys = list(d.keys())
key_insertion_index = bisect(list_of_keys,sample_key)
if key_insertion_index < len(list_of_keys):
    next_higher_key = list_of_keys[key_insertion_index]
    v = d[next_higher_key]
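On step 1: yes, converting the keys to a list is O(n). However, if the dictionary doesn't change during the daily batch, that cost can be paid once and the same sorted list reused for every lookup, which leaves O(log n) per query. The sketch below assumes exactly that. The value_above name is hypothetical. Because the keys are sorted explicitly, a plain dict works and insertion order is irrelevant. bisect.bisect is bisect_right, so equal keys are skipped and the key found is strictly greater than the one given.

```python
from bisect import bisect_right

# Sample data from the question; any dict works because the keys are sorted below.
d = {5: 'lowest_value', 7: 'middle_value', 12: 'highest_value'}

# Pay the O(n log n) sort (and O(n) list copy) once, not once per lookup.
sorted_keys = sorted(d)

def value_above(sample_key):
    """Return the value of the smallest key strictly greater than sample_key, or None."""
    i = bisect_right(sorted_keys, sample_key)  # O(log n)
    if i < len(sorted_keys):
        return d[sorted_keys[i]]
    return None

print(value_above(6))   # middle_value
print(value_above(7))   # highest_value (7 itself is skipped)
print(value_above(12))  # None: no larger key exists
```

If keys are added or removed between lookups, sorted_keys must be rebuilt (or kept updated with bisect.insort), otherwise lookups will use stale data.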
D = {(1,1):2, (2,3):6, (3,4):12, (0,1):0, (4,9):36}
for (i,j),val in D.items():
    print(i,j,"-->",val)
When I loop over the (key, value) pairs of a dictionary, is the order deterministic? How can I loop over them in a random order? The code below works when the dictionary is small, but raises a MemoryError when there are thousands of pairs.
from itertools import permutations
from random import sample
P = list(permutations(D.items()))
for (i,j),val in sample(P,1)[0]:
    print(i,j,"-->",val)
It should be relatively simple to shuffle your entries before iterating over them.
from random import shuffle
dict_as_list_of_entries = [*D.items()]
shuffle(dict_as_list_of_entries)  # randomizes the order in place
for (i, j), value in dict_as_list_of_entries:
    print(i, j, "-->", value)  # do something with each entry
As for determinism: on CPython 3.6 (and on any Python 3.7+), iteration order is deterministic because dictionaries remember their insertion order. On lower versions the order is not deterministic (it's a bit more nuanced than "not deterministic", actually, but this will do).
Note that the output of shuffle can also be made deterministic by seeding the random number generator. "True randomness" is still only a concept; computers come close to achieving it, though. Most randomizer routines are a tradeoff between randomness and performance.
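Here is a minimal sketch of seeded shuffling, for cases where reproducibility is the goal (for example, in tests). It uses a dedicated random.Random instance rather than random.seed(), so other code that uses the module-level generator is unaffected. The shuffled_items helper and the seed value 42 are arbitrary choices for illustration.

```python
import random

D = {(1, 1): 2, (2, 3): 6, (3, 4): 12, (0, 1): 0, (4, 9): 36}

def shuffled_items(d, seed):
    """Return d's items in an order determined by the seed (and d's own order)."""
    items = list(d.items())
    # A private generator: seeding it does not touch the global random state.
    random.Random(seed).shuffle(items)
    return items

first = shuffled_items(D, seed=42)
second = shuffled_items(D, seed=42)
print(first == second)  # True: same seed and same input order give the same shuffle
```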
Note that you're running out of memory because you're generating every single permutation of your dictionary. There are n! of them for n entries, when in reality you just want one random permutation, not all of them.
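To see why memory runs out, count the permutations. n entries have n! orderings, and list(permutations(...)) builds every one of them in memory. Here is a quick check with the five-entry D from the question:

```python
import math
from itertools import permutations

D = {(1, 1): 2, (2, 3): 6, (3, 4): 12, (0, 1): 0, (4, 9): 36}

# list(permutations(...)) materializes every ordering: n! tuples of n items each.
all_orders = list(permutations(D.items()))
print(len(all_orders) == math.factorial(len(D)))  # True: 5! = 120

# The count explodes quickly: just 20 entries already have over 2 * 10**18 orderings.
print(math.factorial(20))  # 2432902008176640000
```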
When I loop over the (key,value) pairs of a dictionary, is the order deterministic?
This depends on what version of Python you're using.
For Python < 3.6, the order is inconsistent, but not truly random. It will likely be different for different Python implementations, but you can't count on it being the same or not the same for two different people or two different runs.
For Python 3.6 (specifically CPython), the iteration order happens to be the same as the order of insertion, but it's still not "officially" guaranteed.
For Python 3.7 and later, iteration order is explicitly guaranteed to be the order of insertion.
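Here is a short illustration of what "order of insertion" means on Python 3.7+. Assigning to an existing key keeps that key in its original position, while deleting a key and adding it again moves it to the end:

```python
# Python 3.7+: a plain dict iterates in insertion order.
d = {}
d['c'] = 3
d['a'] = 1
d['b'] = 2
initial_order = list(d)
print(initial_order)  # ['c', 'a', 'b']

# Assigning to an existing key keeps its original position...
d['c'] = 30
after_update = list(d)
print(after_update)  # ['c', 'a', 'b']

# ...but deleting and re-inserting a key moves it to the end.
del d['c']
d['c'] = 3
after_reinsert = list(d)
print(after_reinsert)  # ['a', 'b', 'c']
```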
How can I loop over them in a random order?
Try randomizing the order of the keys, then looping over those:
import random
shuffled_keys = random.sample(list(D), len(D))  # sample() needs a sequence; dict views are rejected on Python 3.11+
for k in shuffled_keys:
    print(f'{k} --> {D[k]}')
You can make a shuffled list of the dictionary's keys, and then iterate over that. This means you have to make a copy of all the keys, but it avoids making a copy of the values. Depending on what is in your dictionary, this may save you some memory.
import random
d = {(1, 1): 2, (2, 3): 6, (3, 4): 12, (0, 1): 0, (4, 9): 36}
for key in random.sample(list(d), k=len(d)):  # list() copies only the keys
    value = d[key]
    i, j = key
    print(i, j, "-->", value)
(Disclaimer: I haven't tested whether it actually saves memory over cs95's solution, or other solutions. Intuitions about memory use and performance can often be wrong, so you should test how this code works on your data to see how it compares to other solutions.)
Before Python 3.7, a dict was an unordered collection of key-value pairs, so iterating over it gave an arbitrary (though not truly random) order. To explicitly randomize the sequence of key-value pairs, you need to work with an ordered object such as a list. In Python 2, dict.items(), dict.keys(), and dict.values() each return lists, which can be shuffled directly. In Python 3 they return views, so wrap them in list() before shuffling. Hope this helps.
I have a dict stored in the sprite_info attribute of a SpriteSheet class. sprite_info holds the name, location, and xd/yd of each sprite on the sheet, e.g.
{'stone_Wall' : { 'x': '781', 'xd': '70', 'y': '568', 'yd': '70'} ... }
What I'd like to do is sort the dict by name. There are other names in the dict, such as stone_Right and stone_Mid, and the one thing they all have in common is the stone_ prefix.
What's the most efficient way of doing this? My limited experience tells me to just go into a bunch of nested for-loops, but I know there's a better way.
Further clarification:
Once everything is sorted, I would like to separate the dict by name. Using my example, for any key that includes stone or stone_, add it to a new dict within the already existing dict.
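You can't sort a dict in place: it is a hash table, and its iteration order does not follow the sort order of its keys (before Python 3.7 the order was arbitrary; since then it is insertion order).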
You can use OrderedDict which will keep track of the order elements are added.
To create a sorted OrderedDict from a dict, sort the key/value pairs from the dict by key.
Edit: having thought a little more, sorted compares tuples element by element, so item zero (the key) is compared first. Since dict keys are unique, two tuples never tie on the key, so the values are never compared and we don't need to do anything clever with sorted's key parameter.
from collections import OrderedDict
# from operator import itemgetter
sprites = dict()
# sorted_sprites = OrderedDict(sorted(sprites.items(), key=itemgetter(0)))
sorted_sprites = OrderedDict(sorted(sprites.items())) # equivalent to above
sorted is a built-in function that returns a new sorted list from any iterable.
The key parameter determines how to order the values in the sequence. As we are passing sprites.items() it is getting tuple pairs e.g. ('stone_Wall', { 'x': '781', 'xd': '70', 'y': '568', 'yd': '70'}), so the key we want is the zero-th element of the tuple, 'stone_Wall'.
itemgetter builds a callable that retrieves a particular item (or items) from a sequence. Here we ask it for the zero-th.
However, as noted above, default tuple comparison will do this for us. See this related question: python tuple comparison
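The follow-up question asks how to separate the sprites by name. Here is a minimal sketch. It assumes the group is the text before the first underscore in each name, so stone_Wall belongs to stone. It builds a new nested OrderedDict rather than modifying sprite_info in place. The group_by_prefix helper, the grass_Top entry, and all coordinates except stone_Wall's are hypothetical sample data.

```python
from collections import OrderedDict

# Hypothetical sample data; only the stone_Wall entry comes from the question.
sprite_info = {
    'stone_Wall': {'x': '781', 'xd': '70', 'y': '568', 'yd': '70'},
    'grass_Top': {'x': '0', 'xd': '70', 'y': '0', 'yd': '70'},
    'stone_Mid': {'x': '851', 'xd': '70', 'y': '568', 'yd': '70'},
    'stone_Right': {'x': '921', 'xd': '70', 'y': '568', 'yd': '70'},
}

def group_by_prefix(sprites):
    """Nest sprites under the text before the first underscore, in sorted name order."""
    groups = OrderedDict()
    for name, info in sorted(sprites.items()):
        prefix = name.split('_', 1)[0]  # 'stone_Wall' -> 'stone'
        groups.setdefault(prefix, OrderedDict())[name] = info
    return groups

grouped = group_by_prefix(sprite_info)
print(list(grouped))           # ['grass', 'stone']
print(list(grouped['stone']))  # ['stone_Mid', 'stone_Right', 'stone_Wall']
```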
I've been researching online for a simple way to create an ordered dictionary and landed on OrderedDict and its update method. I've successfully implemented this once, but now the code doesn't sort the newly added terms. For example, the items being placed are:
Doc1: Alpha, zebra, top
Doc2: Andres, tell, exta
Output: Alpha, top, zebra, Andres, exta, tell
My goal is to have Alpha, Andres......, top, zebra
This is the code:
import collections

finalindex = collections.OrderedDict()
ctr = 0
while ctr < docCtr:
    filename = 'dictemp%d.csv' % (ctr,)
    ctr += 1
    dicTempList = io.openTempDic(filename)  # the asker's own helper module, not the stdlib io
    print(filename)
    for key in dicTempList:
        if key in finalindex:
            print(key)
            # append this document's value to the existing entry
            finalindex[key] = finalindex[key] + "," + dicTempList[key]
        else:
            finalindex[key] = dicTempList[key]
io.saveTempDic(filename, finalindex)
Can someone please assist me?
OrderedDicts remember the order in which keys were inserted. If you want one sorted, you need to insert the items in sorted order when you create it. Here's how to sort an OrderedDict, an example taken from the docs:
from collections import OrderedDict
d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
sorted_dict = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
This will work with another ordered dict, and I prefer to import the module and reference functions and classes from it for clarity for the reader, so this is done in a slightly different style, but again, to have it sorted, you need to sort it before creating a new OrderedDict:
import collections
ordered_dict=collections.OrderedDict()
ordered_dict['foo'] = 1
ordered_dict['bar'] = 2
ordered_dict['baz'] = 3
sorted_dict = collections.OrderedDict(sorted(ordered_dict.items(),
key=lambda t: t[0]))
and sorted_dict returns:
OrderedDict([('bar', 2), ('baz', 3), ('foo', 1)])
If lambdas are confusing, you can use operator.itemgetter
import operator
get_first = operator.itemgetter(0)
sorted_dict = collections.OrderedDict(sorted(ordered_dict.items(),
key=get_first))
I'm using key arguments to demonstrate their usage in case you want to sort by values, but Python sorts tuples (which dict.items() provides as a list in Python 2 and as a view in Python 3) by first element, then second, and so on, so you can even do this and get the same result:
sorted_dict = collections.OrderedDict(sorted(ordered_dict.items()))
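For the question's data, the fix is to merge all documents first and sort once at the end. Here is a minimal sketch. The docs dict is a hypothetical stand-in for the question's CSV files. The sketch also passes a case-insensitive key, because default string sorting places every uppercase letter before every lowercase one. That would matter as soon as terms such as 'Top' and 'exta' are mixed.

```python
from collections import OrderedDict

# Hypothetical stand-in for the question's CSV files: document -> terms.
docs = OrderedDict([('Doc1', ['Alpha', 'zebra', 'top']),
                    ('Doc2', ['Andres', 'tell', 'exta'])])

# Merge first: each term maps to a comma-separated list of documents.
merged = OrderedDict()
for doc, terms in docs.items():
    for term in terms:
        if term in merged:
            merged[term] = merged[term] + ',' + doc
        else:
            merged[term] = doc

# Then sort once, comparing terms case-insensitively.
final_index = OrderedDict(sorted(merged.items(), key=lambda item: item[0].lower()))
print(list(final_index))  # ['Alpha', 'Andres', 'exta', 'tell', 'top', 'zebra']

# Why the key matters: by default, uppercase letters sort before all lowercase ones.
print(sorted(['exta', 'Top']))                 # ['Top', 'exta']
print(sorted(['exta', 'Top'], key=str.lower))  # ['exta', 'Top']
```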
An ordered dictionary is not a sorted dictionary.
From the documentation 8.3. collections — High-performance container datatypes:
OrderedDict: dict subclass that remembers the order entries were added
(emphasis on "added" mine)
The ordered dictionary is a hash-table-backed structure that also maintains a linked list alongside it, recording the order in which items were inserted. When iterated over, the dictionary uses that linked list.
This type of structure is very useful for LRU caches where one wants to only maintain the N most recent items requested, and then evict the oldest one when a new one would push it over capacity.
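To make the LRU use case concrete, here is a minimal sketch built on OrderedDict.move_to_end() and popitem(last=False). Both are part of the collections.OrderedDict API in Python 3. The LRUCache class and its get/put methods are illustrative and not a standard interface. For caching the results of a function, the standard library already provides functools.lru_cache.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch: the OrderedDict's order tracks recency of use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)  # new or updated entries are the most recent
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')           # 'a' becomes the most recently used
cache.put('c', 3)        # over capacity, so 'b' (least recent) is evicted
print(list(cache.data))  # ['a', 'c']
```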
The code is working correctly.
Some explanation of the design philosophy behind this can be found at Why are there no containers sorted by insertion order in Python's standard libraries? which suggests that the lack of sorted structures confuses the "one obvious way to do it" when it comes to selecting which container you want (compare with all the different types of classes implementing Map, Set and List in Java - do you use a LinkedHashMap? or a ConcurrentSkipListMap? or a TreeMap? or a WeakHashMap?).
When I declare a list such as 1,2,3,4 and do something with it, even just print it, I get back the same sequence 1,2,3,4.
But when I do anything with dictionaries, the order always changes, as if they were being sorted in some twisted way I can't understand.
test1 = [4,1,2,3,6,5]
print test1
test2 = {"c":3,"a":1,"b":2,"d":4}
print test2
[4, 1, 2, 3, 6, 5]
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
How in the world did 'a' become the first element, followed by 'c'? Even if it had sorted the dictionary alphabetically, the result should have been 1,2,3,4 or a,b,c,d, not 1,3,2,4.
So how do I print or get values from a dictionary without changing the positions of the elements?
Dictionaries in Python (before 3.7) are unordered by definition. Use OrderedDict if you need the order in which values were inserted (it's available in Python 2.7 and 3.x).
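Here is a minimal sketch for the question's test2. It builds the OrderedDict from a list of pairs rather than from a dict literal. On versions where plain dicts are unordered, a literal would already have lost the original order before OrderedDict ever saw it.

```python
from collections import OrderedDict

# A list of pairs preserves the written order on Python 2.7 and 3.x alike.
test2 = OrderedDict([("c", 3), ("a", 1), ("b", 2), ("d", 4)])
print(list(test2.keys()))  # ['c', 'a', 'b', 'd']
print(test2["b"])          # 2: lookups work exactly as with a plain dict
```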
Dictionary order is undefined (in Python versions before 3.7)! Do not rely on it for anything. Look for a sorted dictionary if you really want a sorted dictionary, but usually you don't need one.
Examples:
Python 2.7: OrderedDict is built into the collections module
Older Django versions shipped with a SortedDict (it has since been removed)
Python 2.4 to 2.6: install the ordereddict backport with pip or easy_install
Before you get so angry and frustrated, perhaps you should read about what a dictionary actually is and how it works:
http://docs.python.org/library/stdtypes.html#mapping-types-dict
Python dicts use a hash table as the underlying storage mechanism. That means a hash value is computed from each key you provide, and there are no ordering guarantees for these hash values. The entries in a dictionary are fetched in sequential order of their location in the underlying hash table when you request values(), keys(), or items(). (That was true before Python 3.6; since Python 3.7, iteration order is guaranteed to be insertion order.)
The advantage of using a hash table is that it is extremely fast. Unlike C++'s std::map, which is typically implemented as a red-black tree kept sorted by key, a hash table doesn't need to keep its entries sorted, so lookups take O(1) time on average instead of O(log n). For more on hash tables, see:
http://en.wikipedia.org/wiki/Hash_table
Like the other posters have said, look up OrderedDict if you need a dictionary that remembers insertion order; note that it is not sorted by key.
Good Luck!
Clearly you know about lists. You can ask for the element at the ith index of a list. This is because lists are ordered.
>>> [1,2,3,4] == [1,4,3,2]
False
In this context, you can think of a dictionary as being like a list, except that the index is the key. Therefore, two dictionaries are equal if the corresponding values of all keys in both dictionaries are the same (if one dictionary has keys that the other doesn't, then the two are not equal). Thus:
>>> {1:'a', 2:'b'} == {2:'b', 1:'a'}
True
Further Trivia
A dictionary does something called hashing on the keys of the dictionary so that when you ask for the value at a particular key (index), it can retrieve this value faster.
Hope this helps
Dictionaries are unsorted. This is well-documented. Do not rely on the ordering of dictionaries.
If you want to see the entries in sorted key order, use something like:
test2 = {"c":3,"a":1,"b":2,"d":4}
for key in sorted(test2):  # sorted() returns a new list of the keys (works on Python 2 and 3)
    print(key + ':' + str(test2[key]))
(cut, paste, season to taste)