How does this code sort a dictionary? - python

I was looking for ways to sort a dictionary and came across this code on a SO thread:
import operator
x = {1: 2, 3: 4, 4:3, 2:1, 0:0}
sorted_x = sorted(x.iteritems(), key=operator.itemgetter(1))
How does this code work?
When I call iteritems() over a dictionary I get this:
<dictionary-itemiterator object at 0xf09f18>
I know that this is a reference, but how do you use it?
And afaik, in sorted(a, b), a is supposed to be the thing you want to sort, and b would be the indicator for sorting, right? How does itemgetter(1) work here?

operator.itemgetter(1) is equivalent to lambda x: x[1]. It's an efficient way to specify a function that returns the value at index 1 of its input.
.iteritems() is a method of a dictionary that returns an iterator over the entries in the dictionary in (key,value) tuple form.

iteritems() is just like items(), except it returns an iterator rather than a list. For large dictionaries, this saves memory because you can iterate over each individual element without having to build up the complete list of items in memory first.
sorted accepts a keyword argument key, which is a function used to determine what to compare by when sorting. In this case, it is using operator.itemgetter(1), which is the function version of doing something[1]. Therefore, the code is sorting on the [1] item of the tuples returned by iteritems(), which is the value stored in the dictionary.
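As a minimal sketch of the same idea on Python 3, where iteritems() no longer exists and items() takes its place: the dictionary is the one from the question, and by_value_lambda is a hypothetical name added only to show that the lambda form gives the same result.

```python
import operator

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# items() yields (key, value) tuples; itemgetter(1) picks the value out of each.
sorted_x = sorted(x.items(), key=operator.itemgetter(1))

# Equivalent spelling with a lambda (hypothetical name, for comparison only).
by_value_lambda = sorted(x.items(), key=lambda pair: pair[1])

print(sorted_x)  # [(0, 0), (2, 1), (1, 2), (4, 3), (3, 4)]
```

The result is a list of tuples ordered by value, not a dictionary.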

Most Python built-ins that deal with lists or list-like objects also accept iterators. An iterator is like a pointer into the list, which you can advance to the next item with its next() member function (the built-in next() in Python 3). This can be very convenient for infinite lists or very large lists (either many elements or very large elements), to keep memory usage down. See http://docs.python.org/library/stdtypes.html#iterator-types
iteritems() gives an iterator into the list of items in the dictionary.
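To make the "pointer you can advance" picture concrete, here is a small sketch. It assumes Python 3.7 or later (so the items come out in insertion order) and uses iter(x.items()) where Python 2 code would call x.iteritems().

```python
x = {1: 2, 3: 4}

it = iter(x.items())  # Python 2 equivalent: x.iteritems()

first = next(it)      # (1, 2)
second = next(it)     # (3, 4)

# The iterator is now exhausted. next() would raise StopIteration,
# unless a default value is supplied as the second argument.
done = next(it, None)
```

sorted() consumes such an iterator in the same way, one item at a time, until it is exhausted.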

Related

How can I pass each element of a set to a function?

I have a set with multiple tuples: set1 = {(1,1),(2,1)} for example.
Now I want to pass each tuple of the set to a method with this signature: process_tuple(self, tuple).
I am doing it with a for loop like this:
for tuple in set1:
    process_tuple(tuple)
Is there a better way to do it?
Your question is basically "how can I loop without using a loop". While it's possible to do what you're asking without an explicit for loop, the loop is by far the clearest and best way to go.
There are some alternatives, but mostly they're just changing how the loop looks, not preventing it in the first place. If you want to collect the return values from the calls to your function in a list, you can use a list comprehension to build the list at the same time as you loop:
results = [process_tuple(tuple) for tuple in set1]
You can also do set or dict comprehensions if those seem useful to your specific needs. For example, you could build a dictionary mapping from the tuples in your set to their processed results with:
results_dict = {tuple: process_tuple(tuple) for tuple in set1}
If you don't want to write out for tuple in set1 at all, you could use the builtin map function to do the looping and passing of values for you. It returns an iterator, which you'll need to fully consume to run the function over the full input. Passing the map object to list sometimes makes sense, for instance, to convert inputs into numbers:
user_numbers = list(map(int, input("Enter space-separated integers: ").split()))
But I'd also strongly encourage you to think of your current code as perhaps the best solution. Just because you can change it to something else, doesn't mean you should.
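The three options discussed above can be put side by side in a short sketch. Here process_tuple is a hypothetical stand-in (it just adds the two elements), and the loop variable is called pair rather than tuple to avoid shadowing the built-in type.

```python
def process_tuple(pair):
    # Hypothetical stand-in for the asker's method: add the two elements.
    return pair[0] + pair[1]

set1 = {(1, 1), (2, 1)}

# Explicit loop: the clearest choice when only the side effects matter.
for pair in set1:
    process_tuple(pair)

# List comprehension: loops and collects the return values.
results = [process_tuple(pair) for pair in set1]

# map(): lazy, so nothing is called until the iterator is consumed.
lazy = map(process_tuple, set1)
mapped = list(lazy)
```

Because a set has no defined order, the order of results and mapped should not be relied on.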

Why enumerate should accept a set as an input?

A Python set is meant to be unordered, so why does enumerate accept it as input?
The same question would apply to dictionary.
From my point of view these are giving the false impression that there is a predictable way of enumerating them, but there is not.
This is quite misleading. I would have expected at least a warning from enumerate when I request enumerate(set) or enumerate(dict).
Can anyone explain why this warning is not there? is it "pythonic" to allow enumeration which can be not predictable?
There is a distinction between a container and its iterator. Technically, enumerate doesn't work on a set, dict, or list directly, because none of those types is an iterator. They are iterable, though, meaning enumerate can get an iterator from each by implicitly calling iter (i.e., enumerate(some_list_dict_or_set) behaves like enumerate(iter(some_list_dict_or_set))):
>>> iter([1,2,3])
<list_iterator object at 0x109d924e0>
>>> iter(dict(a=1, b=2))
<dict_keyiterator object at 0x109d4b818>
>>> iter({1,2,3})
<set_iterator object at 0x109d53ab0>
So while a given container may not have any inherent ordering of its elements, its iterator can impose an order, and enumerate simply pairs that ordering with a sequence of int values.
You can really see the difference between inherent ordering and imposed ordering when comparing dict and OrderedDict in Python 3.7 or later. Both remember the order in which their keys were added, but that order isn't an important part of a dict's identity. That is, two dicts with the same keys and values mapped to those keys are equivalent, no matter what order the keys were added.
>>> dict(a=1, b=2) == dict(b=2, a=1)
True
The same is not true of two OrderedDicts, which are only equal if they have the same keys, the same values for those keys, and the keys were added in the same order.
>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False
enumerate accepts any iterable, which includes set and dict. A set might be unordered, but its order of iteration is not arbitrary; if you iterate the same unmodified set multiple times, it will yield elements in the same order.
Also note that as of Python 3.7 dict preserves insertion order. Whether or not this is useful solely depends on your use case.
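A short sketch of what enumerate does with a set. The set contents are made up for illustration; which element gets which index is deliberately not shown, since that depends on the set's iteration order.

```python
letters = {"a", "b", "c"}

# enumerate() calls iter() on its argument, so both forms behave alike.
first_pass = list(enumerate(letters))
second_pass = list(enumerate(iter(letters)))

# The indices always count up from 0; only the pairing with elements
# depends on the order the set's iterator happens to produce.
indices = [i for i, _ in first_pass]
```

The index here is a position in one particular iteration, not a property of the element.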

Python loading sorted json Dict is inconsistent in Windows and Linux

I am using Python 3.6 to process the data I receive in a text file containing a Dict having sorted keys. An example of such file can be:
{"0.1":"A","0.2":"B","0.3":"C","0.4":"D","0.5":"E","0.6":"F","0.7":"G","0.8":"H","0.9":"I","1.0":"J"}
My data load and transform is simple - I load the file, and then transform the dict into a list of tuples. Simplified it looks like this:
import json
import decimal
with open('test.json') as fp:
    o = json.loads(fp.read())
l = [(decimal.Decimal(key), val) for key, val in o.items()]
is_sorted = all(l[i][0] <= l[i+1][0] for i in range(len(l)-1))
print(l)
print('Sorted:', is_sorted)
The list is always sorted in Windows and never in Linux. I know that I can sort the list manually, but since the data file is always sorted already and rather big, I'm looking for a different approach. Is there a way to somehow force json package to load the data to my dict sorted in both Windows and Linux?
For the clarification: I have no control over the structure of data I receive. My goal is to find the most efficient method to load the data into the list of tuples for further processing from what I get.
A dictionary is just a mapping between its keys and corresponding values. It doesn't have any order. It doesn't make sense to say you always find them sorted. In addition, any dictionary member access is O(1) so it's probably fast enough for your need. In case you think you still need some order, ordered dictionary may be useful.
Dicts are unordered objects in Python, so the problem you're running into is actually by design.
If you want to get a sorted list of tuples, you can do something like:
sorted_tuples = sorted(my_dict.items(),key=lambda x:x[0])
or
import operator
sorted_tuples = sorted(my_dict.items(), key=operator.itemgetter(0))
The dict method .items() yields the dict's entries as (key, value) tuples (a list in Python 2, a view in Python 3), and sorted() returns them as a sorted list. The key parameter to sorted explains how to sort the list. Both lambda x: x[0] and operator.itemgetter(0) select the first element of the tuple (the key from the original dict), and sort on that element.
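One alternative, offered as a sketch rather than as what the answers above propose: json.loads accepts an object_pairs_hook argument, a function that receives each JSON object's (key, value) pairs as a list, in the order they appear in the text. That lets a flat object like the one in the question be turned into a list of tuples without passing through a dict. The sample text and the name to_pairs are made up for illustration, and the hook is applied to every object in the document, so nested objects would need extra handling.

```python
import json
from decimal import Decimal

# Hypothetical sample, deliberately out of order.
text = '{"0.3":"C","0.1":"A","0.2":"B"}'

def to_pairs(pairs):
    # pairs is a list of (key, value) tuples in document order.
    return [(Decimal(key), val) for key, val in pairs]

in_file_order = json.loads(text, object_pairs_hook=to_pairs)

# If sorting is still needed, sort on the Decimal key rather than the
# raw string, since strings compare character by character ("10.0" < "2.0").
sorted_pairs = sorted(in_file_order, key=lambda pair: pair[0])
```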

Python 3.5 OrderedDict: applying sorted to a nested dictionary iterator

So the Python documentation suggests using itemgetter, attrgetter, or methodcaller from the operator module when applying sorted to complex data types. Further, iterators are smaller and faster than lists for large objects.
Thus I am wondering how to create an iterator over an OrderedDict's values. The reason is that, in the OrderedDict I wish to sort, all the values are themselves (regular) dictionaries.
For regular dictionaries, one could do this with:
sorted(my_dict.itervalues(), key=itemgetter('my_key'))
However, OrderedDict only seems to have the method __iter__(), which works on the OrderedDict keys.
So how can I efficiently make an iterator for the values of the OrderedDict?
Note, I am not looking for list comprehension, a lambda function, or extracting the relevant sub key (key inside the dictionary (a value)) values of the OrderedDict.
e.g.
sorted(my_dict, key=lambda key: my_dict[key]['my_key'])
example nested:
test = OrderedDict({'a': {'x':1, 'y':2, 'z':3},
'b': {'x':1, 'y':2, 'z':3}
})
Neither dict nor OrderedDict have an itervalues() method in Python 3. That method only exists in Python 2.
Use dict.values():
sorted(my_dict.values(), key=itemgetter('my_key'))
In Python 2 you want to use itervalues() not so much because it is an iterator, but because dict.values() has to create a new list object which is then discarded again. Iterators are also not faster in general (rather, they are often slower!); they are instead more memory efficient. In this case itervalues() is faster only because creating a (large) list that you then discard takes time.
In Python 3, dict.values() creates a view instead, a lightweight object that like dict.itervalues() yields values on demand and doesn't have to produce a list up front.
You don't have to call iter() on this. sorted() takes an iterable, and will itself call iter() on whatever you passed in. Because it does this from native code and doesn't have to look up a global name, it can do this much faster than Python code ever could.
The answer is to call the method .values() to get a view and wrap it in iter():
sorted(iter(my_dict.values()), key=itemgetter('my_subkey'))
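A small runnable sketch of sorting an OrderedDict's nested dictionaries on Python 3. The data is invented for illustration (the values differ so that the sort is visible), and keys_by_x is a hypothetical extra showing how to keep the outer keys when they are needed.

```python
from collections import OrderedDict
from operator import itemgetter

# Hypothetical data: each value is a regular dict.
test = OrderedDict([
    ("a", {"x": 3, "y": 2}),
    ("b", {"x": 1, "y": 5}),
    ("c", {"x": 2, "y": 0}),
])

# values() is a view; sorted() iterates it directly, no iter() needed.
by_x = sorted(test.values(), key=itemgetter("x"))

# To keep the outer keys, sort the items on the nested value instead.
keys_by_x = [key for key, _ in sorted(test.items(), key=lambda kv: kv[1]["x"])]
```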

dict.keys()[0] on Python 3 [duplicate]

This question already has answers here:
Accessing dict_keys element by index in Python3
(7 answers)
Closed 2 years ago.
I have this function:
def Ciudad(prob):
    numero = random.random()
    ciudad = prob.keys()[0]
    for i in prob.keys():
        if(numero > prob[i]):
            if(prob[i] > prob[ciudad]):
                ciudad = i
        else:
            if(prob[i] > prob[ciudad]):
                ciudad = i
    return ciudad
But when I call it this error pops:
TypeError: 'dict_keys' object does not support indexing
Is it a version problem? I'm using Python 3.3.2
dict.keys() is a dictionary view. Just use list() directly on the dictionary instead if you need a list of keys, item 0 will be the first key in the (arbitrary) dictionary order:
list(prob)[0]
or better still just use:
next(iter(dict))
Either method works in both Python 2 and 3 and the next() option is certainly more efficient for Python 2 than using dict.keys(). Note however that dictionaries have no set order and you will not know what key will be listed first.
It looks as if you are trying to find the key with the maximum value instead; use max() with dict.get:
def Ciudad(prob):
    return max(prob, key=prob.get)
The function result is certainly going to be the same for any given prob dictionary, as your code doesn't differ in codepaths between the random number comparison branches of the if statement.
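As a quick check of the max() approach, with made-up city names and probabilities (the data is hypothetical; only the max(prob, key=prob.get) call comes from the answer):

```python
# Hypothetical input: city names mapped to probabilities.
prob = {"Madrid": 0.2, "Bogota": 0.5, "Lima": 0.3}

def ciudad(prob):
    # Iterating a dict yields its keys; prob.get maps each key to its
    # value, so max() returns the key whose value is largest.
    return max(prob, key=prob.get)

best = ciudad(prob)
print(best)  # Bogota
```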
In Python 3.x, dict.keys() does not return a list, it returns an iterable (specifically, a dictionary view). It is worth noting that dict itself is also an iterable of the keys.
If you want to obtain the first key, use next(iter(dict)) instead. (Note that before Python 3.6 dictionaries were unordered, so the 'first' element was an arbitrary one. In CPython 3.6 insertion order is preserved as an implementation detail, and from Python 3.7 it is guaranteed by the language. If you need that behaviour in older versions or with cross-version compatibility, you can use collections.OrderedDict).
This works quite simply: we take the iterable from the dictionary view with iter(), then use next() to advance it by one and get the first key.
If you need to iterate over the keys, there is definitely no need to construct a list:
for key in dict:
    ...
These are all advantageous when compared to using list() as it means a list isn't constructed, making it faster and more memory efficient (hence why the default behaviour of keys() was changed in 3.x). Even in Python 2.x you would be better off doing next(iter(dict.iterkeys())).
Note all these things apply to dict.values() and dict.items() as well.
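A brief sketch of the next(iter(...)) pattern applied to keys, values, and items. It assumes Python 3.7 or later, where "first" means first inserted; the dictionary contents are made up.

```python
prob = {"a": 3, "b": 2, "c": 3}

first_key = next(iter(prob))             # 'a'
first_value = next(iter(prob.values()))  # 3
first_item = next(iter(prob.items()))    # ('a', 3)

# On an empty dict next() raises StopIteration; a default avoids that.
empty_first = next(iter({}), None)
```

None of these calls builds an intermediate list, which is the advantage over list(prob)[0].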
I've had success turning the iterables taken from a dictionary into a list.
So, for dic.keys(), dic.values(), and dic.items(), in Python3.6, you can:
dic = {'a':3, 'b':2, 'c':3}
print(dic)
dictkeys = dic.keys() # or values/items
print(dictkeys)
keylist = []
keylist.extend(iter(dictkeys)) # my big revelation
print('keylist', keylist)
