Sorting an OrderedDict both ascendingly and descendingly - python

Define:
diction = {"Book":(1, 2), "Armchair":(2, 2), "Lamp":(1, 3)}
How can this dictionary be sorted by the second element of each value (diction[key][1]) in descending order, and by key in ascending order to break ties?
Desired output:
diction = {"Lamp":(1, 3), "Armchair":(2, 2), "Book":(1, 2)}
After receiving the correct answer by Sinan: note that I did not ask about sorting either ascendingly or descendingly, for which you sent me the relevant links. I asked about doing both at the same time, which is not solved by the trick of minus given by Sinan.

The main idea is copied from here.
from collections import OrderedDict
diction = {"Book":(1, 2), "Armchair":(2, 2), "Lamp":(1, 3)}
diction_list = list(diction.items())
diction = OrderedDict(sorted(diction_list, key=lambda x: (-x[1][1], x[0])))
print(diction)
OrderedDict([('Lamp', (1, 3)), ('Armchair', (2, 2)), ('Book', (1, 2))])
Here is where the magic is happening:
sorted(diction_list, key=lambda x: (-x[1][1], x[0]))
sorted sorts stuff for you. It is very good at it. If you use the key parameter with sorted, you can give it a function that is applied to each item to produce the value used for comparison.
In this case, we are giving it a lambda that returns a tuple: the negated second value of the item's tuple, -x[1][1], and the key, x[0]. The negative sign makes that part sort in reverse. Of course, this would not work with non-numeric values.
The OrderedDict is not strictly necessary. A plain dict preserves insertion order as an implementation detail of CPython 3.6 and as a guaranteed language feature from Python 3.7 onward. An OrderedDict always keeps its insertion order, so it is used here so that the result stays ordered independent of the Python version.
For a better way of doing this see the Sorting HOW TO in Python Documentation.
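For values that cannot be negated (strings, for example), one option is to rely on the stability of Python's sort and sort in two passes, secondary criterion first. The following is a minimal sketch under that assumption; the helper name sort_desc_then_asc is made up for illustration and is not part of the original answer.

```python
from operator import itemgetter

def sort_desc_then_asc(mapping):
    """Sort by the second element of each value (descending), then by key (ascending)."""
    # Pass 1: sort by the secondary criterion (the key, ascending).
    items = sorted(mapping.items(), key=itemgetter(0))
    # Pass 2: sort by the primary criterion (descending).
    # The sort is stable, so items that tie here keep their pass-1 order.
    items.sort(key=lambda kv: kv[1][1], reverse=True)
    return items

diction = {"Book": (1, 2), "Armchair": (2, 2), "Lamp": (1, 3)}
print(sort_desc_then_asc(diction))
# [('Lamp', (1, 3)), ('Armchair', (2, 2)), ('Book', (1, 2))]
```

Because no negation is involved, the same function also works when the compared element is a string.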

Related

Python 3.x: How do I sort a list of tuples that has the same second element (count) by string value?

So for example
['John','John','Mike','Mike','Kate','Kate']
Should return:
[('John', 2), ('Kate', 2), ('Mike', 2)]
How can I write code so there is order instead of those three pairs just being in random order?
I need to sort the list of tuples by count from biggest to smallest unless there are ties, then I need to sort the items alphabetically.
This works:
>>> from collections import Counter
>>> names = ['John','John','Mike','Mike','Kate','Kate']
>>> sorted(Counter(names).items(), key=lambda item: (-item[1], item[0]))
[('John', 2), ('Kate', 2), ('Mike', 2)]
The counter's items will give you tuples of (name, count). Normally you'd use Counter.most_common to get the items in order of their counts, but as far as I can tell, it only sorts by count and disregards any key (name) information in the sorting.
Since we have to re-sort again anyway, we might as well use sorted on the items instead. Since tuples sort lexicographically, and you want to sort primarily by the count, the key function should return a tuple of the format (count, name). However, since you want this to be decreasing by count, but increasing by name, the only thing we can do is return a tuple of the format (-count, name). This way, larger count will result in a lower value so it will sort before values with lower counts.
You can sort your result with the sorted() function, using the key argument to define how the items are ordered:
result = [('John', 2), ('Kate', 2), ('Mike', 3)]
sorted_result = sorted(result, key=lambda x: (-x[1], x[0]))
Because you want to sort the result by count in descending order and then by name in ascending order, the key (-x[1], x[0]) will do the trick.
The sorted_result will be:
[('Mike', 3), ('John', 2), ('Kate', 2)]
There are two ways to do this. In the first method, a sorted list is returned. In the second method, the list is sorted in-place.
# Method 1: sorted() returns a new list
a = [('Mike', 2), ('John', 2), ('Kate', 2), ('Arvind', 5)]
print(sorted(a, key=lambda x: (-x[1], x[0])))
# Method 2: list.sort() sorts the list in place
a = [('Mike', 2), ('John', 2), ('Kate', 2), ('Arvind', 5)]
a.sort(key=lambda x: (-x[1], x[0]))
print(a)

Use OrderedDict or ordered list?(novice)

(Using Python 3.4.3)
Here's what I want to do: I have a dictionary where the keys are strings and the values are the number of times that string occurs in a file. I need to output which string(s) occur with the greatest frequency, along with their frequencies (if there's a tie for the most frequent, output all of the most frequent).
I had tried to use OrderedDict. I can create it fine, but I struggle to get it to output specifically the most frequently occurring. I can keep trying, but I'm not sure an OrderedDict is really what I should be using, since I'll never need the actual OrderedDict once I've determined and output the most-frequent strings and their frequency. A fellow student recommended an ordered list, but I don't see how I'd preserve the link between the keys and values as I currently have them.
Is OrderedDict the best tool to do what I'm looking for, or is there something else? If it is, is there a way to filter/slice(or equivalent) the OrderedDict?
You can simply use sorted with a proper key function. In this case you can use operator.itemgetter(1), which sorts your items based on their values.
from operator import itemgetter
print(sorted(my_dict.items(), key=itemgetter(1), reverse=True))
This can be solved in two steps. First sort your dictionary entries by their frequency so that the highest frequency is first.
Second, use itertools.groupby to take matching entries from the list. As you are only interested in the highest, you stop after one iteration. For example:
from itertools import groupby
from operator import itemgetter
my_dict = {"a" : 8, "d" : 3, "c" : 8, "b" : 2, "e" : 2}
for k, g in groupby(sorted(my_dict.items(), key=itemgetter(1), reverse=True), key=itemgetter(1)):
    print(list(g))
    break
This would display:
[('a', 8), ('c', 8)]
This is because a and c are tied for the highest frequency.
If you remove the break statement, you would get the full list:
[('a', 8), ('c', 8)]
[('d', 3)]
[('b', 2), ('e', 2)]
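If only the top group is needed, a full sort can be avoided. The sketch below is one possible alternative, not the approach from the answers above: it finds the highest frequency with max() and then keeps every entry that matches it. The function name most_frequent is hypothetical.

```python
def most_frequent(counts):
    """Return all (string, frequency) pairs that share the highest frequency."""
    if not counts:
        return []
    top = max(counts.values())
    # Keep every entry tied for the top frequency, sorted by string for stable output.
    return sorted((k, v) for k, v in counts.items() if v == top)

my_dict = {"a": 8, "d": 3, "c": 8, "b": 2, "e": 2}
print(most_frequent(my_dict))
# [('a', 8), ('c', 8)]
```

This makes two linear passes over the dictionary, which keeps the link between each string and its frequency without needing an OrderedDict.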

Using dictionaries in loop

I am trying to write code that implements a greedy algorithm, and for that I need to make sure that my calculations use the highest value possible first. The potential values are stored in a dictionary, and my goal is to use the largest value first and then move on to lower values. However, since dictionary values are not ordered, the for loop visits them in an arbitrary sequence. For example, the output of the code below starts with 25.
How can I make sure that my code is using a dictionary yet following the sequence of (500,100,25,10,5)?
a={"f":500,"o":100,"q":25,"d":10,"n":5}
for i in a:
    print(a[i])
Two ideas spring to mind:
1. Use collections.OrderedDict, a dictionary subclass which remembers the order in which items are added. As long as you add the pairs in descending value order, looping over this dict will return them in the right order.
2. If you can't be sure the items will be added to the dict in the right order, you could construct them by sorting:
   - Get the values of the dictionary with values()
   - Sort by (ascending) value: this is sorted(), and Python will default to sorting in ascending order
   - Get them by descending value instead: this is reverse=True
Here's an example:
for value in sorted(a.values(), reverse=True):
    print(value)
Dictionaries yield their keys when you iterate them normally, but you can use the items() view to get tuples of the key and value. That'll be un-ordered, but you can then use sorted() on the "one-th" element of the tuples (the value) with reverse set to True:
import operator
a={"f":500,"o":100,"q":25,"d":10,"n":5}
for k, v in sorted(a.items(), key=operator.itemgetter(1), reverse=True):
    print(v)
I'm guessing that you do actually need the keys, but if not, you can just use values() instead of items(): sorted(a.values(), reverse=True)
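To show how the sorted items might feed a greedy loop, here is a minimal sketch. The question does not say what the greedy step actually computes, so this assumes a change-making style breakdown of an amount; the function name greedy_breakdown and the amount 1290 are invented for illustration.

```python
def greedy_breakdown(amount, denominations):
    """Use the largest values first; return ({key: count}, unused remainder)."""
    used = {}
    # Visit the (key, value) pairs from the largest value to the smallest.
    for key, value in sorted(denominations.items(), key=lambda kv: kv[1], reverse=True):
        count, amount = divmod(amount, value)
        if count:
            used[key] = count
    return used, amount

a = {"f": 500, "o": 100, "q": 25, "d": 10, "n": 5}
print(greedy_breakdown(1290, a))
# ({'f': 2, 'o': 2, 'q': 3, 'd': 1, 'n': 1}, 0)
```

Sorting the items once up front keeps the dictionary as the data source while still guaranteeing the 500, 100, 25, 10, 5 sequence.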
You can use this
>>> a={"f":500,"o":100,"q":25,"d":10,"n":5}
>>> for value in sorted(a.values(), reverse=True):
...     print(value)
...
500
100
25
10
5
>>>
a={"f":500,"o":100,"q":25,"d":10,"n":5}
k = sorted(a, key=a.__getitem__, reverse=True)
v = sorted(a.values(), reverse=True)
sorted_a = list(zip(k, v))
print(sorted_a)
Output:
[('f', 500), ('o', 100), ('q', 25), ('d', 10), ('n', 5)]

Getting all keys in a dict that overlap with other keys in the same dict

I have a list comprehension that looks like this:
cart = [ ((p,pp),(q,qq)) for ((p,pp),(q,qq))\
         in itertools.product(C.items(), repeat=2)\
         if p[1:] == q[:-1] ]
C is a dict with keys that are tuples of arbitrary integers. All the tuples have the same length. In the worst case, all the combinations should be included in the new list. This can happen quite frequently.
As an example, I have a dictionary like this:
C = { (0,1):'b',
      (2,0):'c',
      (0,0):'d' }
And I want the result to be:
cart = [ (((2, 0), 'c'), ((0, 1), 'b')),
         (((2, 0), 'c'), ((0, 0), 'd')),
         (((0, 0), 'd'), ((0, 1), 'b')),
         (((0, 0), 'd'), ((0, 0), 'd')) ]
So, by overlap I am referring to, for instance, that the tuples (1,2,3,4) and (2,3,4,5) have the overlapping section (2,3,4). The overlapping sections must be on the "edges" of the tuples. I only want overlaps that have length one shorter than the tuple length. Thus (1,2,3,4) does not overlap with (3,4,5,6). Also note that when removing the first or last element of a tuple we might end up with non-distinct tuples, all of which must be compared to all the other elements. This last point was not emphasized in my first example.
The better part of my code's execution time is spent in this list comprehension. I always need all elements of cart, so there appears to be no speedup when using a generator instead.
My question is: Is there a faster way of doing this?
A thought I had was that I could try to create two new dictionaries like this:
from collections import defaultdict
aa = defaultdict(list)
bb = defaultdict(list)
for p in C:
    aa[p[1:]].append(p)
    bb[p[:-1]].append(p)
And somehow merge all combinations of elements of the list in aa[i] with the list in bb[i] for all i, but I can not seem to wrap my head around this idea either.
Update
Both the solutions added by tobias_k and shx2 have better complexity than my original code (as far as I can tell). My code is O(n^2) whereas the two other solutions are O(n). For my problem size and composition, however, all three solutions seem to run in more or less the same time. I suppose this has to do with a combination of overhead associated with function calls, as well as the nature of the data I am working with. In particular, the number of different keys, as well as the actual composition of the keys, seem to have a large impact. The latter I know because the code runs much slower for completely random keys. I have accepted tobias_k's answer because his code is the easiest to follow. However, I would still greatly welcome other suggestions on how to perform this task.
You were actually on the right track, using the dictionaries to store all the prefixes to the keys. However, keep in mind that (as far as I understand the question) two keys can also overlap if the overlap is less than len-1, e.g. the keys (1,2,3,4) and (3,4,5,6) would overlap, too. Thus we have to create a map holding all the prefixes of the keys. (If I am mistaken about this, just drop the two inner for loops.) Once we have this map, we can iterate over all the keys a second time, and check whether for any of their suffixes there are matching keys in the prefixes map. (Update: Since keys can overlap w.r.t. more than one prefix/suffix, we store the overlapping pairs in a set.)
from collections import defaultdict
def get_overlap(keys):
    # create map: prefix -> set(keys with that prefix)
    prefixes = defaultdict(set)
    for key in keys:
        for prefix in [key[:i] for i in range(len(key))]:
            prefixes[prefix].add(key)
    # get keys with matching prefixes for all suffixes
    overlap = set()
    for key in keys:
        for suffix in [key[i:] for i in range(len(key))]:
            overlap.update([(key, other) for other in prefixes[suffix]
                            if other != key])
    return overlap
(Note that, for simplicity, I only care about the keys in the dictionary, not the values. Extending this to return the values, too, or doing this as a postprocessing step, should be trivial.)
Overall running time should be only 2*n*k, n being the number of keys and k the length of the keys. Space complexity (the size of the prefixes map) should be between n*k and n^2*k, if there are very many keys with the same prefixes.
Note: The above answer is for the more general case that the overlapping region can have any length. For the simpler case that you consider only overlaps one shorter than the original tuple, the following should suffice and yield the results described in your examples:
def get_overlap_simple(keys):
    prefixes = defaultdict(list)
    for key in keys:
        prefixes[key[:-1]].append(key)
    return [(key, other) for key in keys for other in prefixes[key[1:]]]
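The note above says that returning the values along with the keys should be straightforward. The following is one way that extension might look for the simple case (overlap one shorter than the tuple), written for Python 3; the function name overlapping_items is made up here and is not from the original answer.

```python
from collections import defaultdict

def overlapping_items(C):
    """Pair (key, value) items where key[1:] of the first equals key[:-1] of the second."""
    # Map: key without its last element -> list of (key, value) items.
    by_prefix = defaultdict(list)
    for key, value in C.items():
        by_prefix[key[:-1]].append((key, value))
    # For each item, look up the items whose prefix matches this key's suffix.
    # .get() is used so that missing suffixes do not add empty entries to the map.
    return [((key, value), other)
            for key, value in C.items()
            for other in by_prefix.get(key[1:], [])]

C = {(0, 1): 'b', (2, 0): 'c', (0, 0): 'd'}
for pair in overlapping_items(C):
    print(pair)
```

For the example dictionary from the question this yields the four pairs listed there, including the pair of ((0, 0), 'd') with itself.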
Your idea of preprocessing the data into a dict was a good one. Here goes:
C = { (0,1): 'b', (2,0): 'c', (0,0): 'd' }
def my_groupby(seq, key):
    """
    >>> my_groupby(range(10), lambda x: 'mod=%d' % (x % 3))
    {'mod=0': [0, 3, 6, 9], 'mod=1': [1, 4, 7], 'mod=2': [2, 5, 8]}
    """
    groups = dict()
    for x in seq:
        y = key(x)
        groups.setdefault(y, []).append(x)
    return groups
def get_overlapping_items(C):
    # group the items by their key with the last element removed
    prefixes = my_groupby(C.items(), key=lambda item: item[0][:-1])
    for k1, v1 in C.items():
        prefix = k1[1:]
        for k2, v2 in prefixes.get(prefix, []):
            yield (k1, v1), (k2, v2)
for x in get_overlapping_items(C):
    print(x)
(((2, 0), 'c'), ((0, 1), 'b'))
(((2, 0), 'c'), ((0, 0), 'd'))
(((0, 0), 'd'), ((0, 1), 'b'))
(((0, 0), 'd'), ((0, 0), 'd'))
And by the way, instead of:
itertools.product(*[C.items()]*2)
do:
itertools.product(C.items(), repeat=2)

I want to rank document and store them in a list in python

I am just a beginner in Python. I have document scores stored in a dictionary, score = {1: 0.98876, 8: 0.12245, 13: 0.57689}. The keys are document ids and the values are the scores for each document id. How do I rank the documents based on their scores?
inverse=[(value, key) for key, value in score.items()]
fmax=max(inverse)
I already found the maximum value using the method above, which returns:
(0.98876, 1)
But what I want is to rank the documents and store them in a list:
[(0.98876, 1), (0.57689, 13), (0.12245, 8)]
sorted(score.items(), key=lambda x:-x[1])
should do the trick
A plain dictionary does not keep its elements sorted by value (and before Python 3.7 its iteration order was not guaranteed at all), so the result of the sorting has to be stored in a list (or an OrderedDict).
You should convert it to a list of tuples using items(). With sorted() you can sort them; the key parameter tells it to sort by the negation of the second tuple element, which puts the highest score first.
Full example:
>>> score = {1: 0.98876, 8: 0.12245, 13: 0.57689}
>>> sorted(score.items(), key=lambda x: -x[1])
[(1, 0.98876), (13, 0.57689), (8, 0.12245)]
>>> print([(y, x) for (x, y) in _])
[(0.98876, 1), (0.57689, 13), (0.12245, 8)]
This also shows how to reverse the elements in the tuple if you really want to do that.
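The answer above mentions an OrderedDict as an alternative to a list. A minimal sketch of that option follows; the variable names ranked and ranks are chosen here for illustration only.

```python
from collections import OrderedDict

score = {1: 0.98876, 8: 0.12245, 13: 0.57689}

# Build an OrderedDict whose iteration order is highest score first.
ranked = OrderedDict(sorted(score.items(), key=lambda kv: kv[1], reverse=True))
print(list(ranked))  # document ids, best first: [1, 13, 8]

# Optionally derive a 1-based rank for each document id.
ranks = {doc_id: rank for rank, doc_id in enumerate(ranked, start=1)}
print(ranks)  # {1: 1, 13: 2, 8: 3}
```

This keeps lookups by document id available while still preserving the ranking order when iterating.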
If you want to modify the original list inverse, use inverse.sort(reverse=True).
If you want to produce a new list and leave original list untouched, use sorted(inverse, reverse=True).
You don't need an intermediate list, however, just use score:
>>> sorted(score.items(), key=lambda x: x[1], reverse=True)
[(1, 0.98876), (13, 0.57689), (8, 0.12245)]
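After your inverse method, this would do the trick (note that list.sort() sorts in place and returns None, so sorted() is used to get a new list):
ranked = sorted(inverse, reverse=True)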
And here's some more info on sorting in python: http://wiki.python.org/moin/HowTo/Sorting/
Sort the inverse list:
inverse.sort()
This sorts the list in place in ascending order. If you want it in descending order, reverse it as well:
inverse.reverse()
use this:
inverse.sort(reverse=True)
have a look here for more info on sorting
If you want to rank the items in the dict:
score = {1: 0.98876, 8: 0.12245, 13: 0.57689}
# Get a list of (id, score) items.
items = list(score.items())
print(items)
[(1, 0.98876), (8, 0.12245), (13, 0.57689)]
# Sort the items by score (ascending).
items.sort(key=lambda item: item[1])
print(items)
[(8, 0.12245), (13, 0.57689), (1, 0.98876)]
# Reverse the order so the highest score comes first.
items.reverse()
print(items)
[(1, 0.98876), (13, 0.57689), (8, 0.12245)]
