Sorting a dictionary of tuples in Python - python

I know there are tons of questions on sorting lists/dictionaries in Python already, but I can't seem to find one that helps in my case, and I'm looking for the most efficient solution, as I'm going to be sorting a rather large dataset.
My data basically looks like this at the moment:
a = {'a': (1, 2, 3), 'b': (3, 2, 1)}
I'm basically creating a word list in which I store each word along with some stats about it (n, Sigma(x), Sigma(x^2) )
I want to sort it based on a particular stat. So far I've been trying something along the lines of:
b = a.items()
b.sort(key = itemgetter(1), reverse=True)
I'm not sure how to control which index it sorts on when it's effectively a list of tuples of tuples. I guess I need to nest two itemgetter operations, but I'm not really sure how to do this.
If there's a better data structure I should be using instead please let me know. Should I perhaps create a small class/struct and then use a lambda function to access a member of the class?
Many Thanks

Something like this?
>>> a = {'a': (1, 2, 3), 'b': (3, 2, 1)}
>>> b = a.items()
>>> b
[('a', (1, 2, 3)), ('b', (3, 2, 1))]
>>> b.sort(key=lambda x:x[1][2]) # sorting by the third item in the tuple
>>> b
[('b', (3, 2, 1)), ('a', (1, 2, 3))]
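On Python 3, `a.items()` returns a view that has no `sort` method, so the same idea is usually written with `sorted`. A minimal sketch, assuming the stats layout from the question; the names `STAT_INDEX`, `by_third` and `by_third_desc` are illustrative only:

```python
a = {'a': (1, 2, 3), 'b': (3, 2, 1)}

STAT_INDEX = 2  # hypothetical: position of the stat to sort by inside each value tuple

# sorted() accepts any iterable, so it works on the items view directly
by_third = sorted(a.items(), key=lambda kv: kv[1][STAT_INDEX])
print(by_third)

# reverse=True gives descending order, as in the question's attempt
by_third_desc = sorted(a.items(), key=lambda kv: kv[1][STAT_INDEX], reverse=True)
print(by_third_desc)
```

Changing `STAT_INDEX` selects a different stat without touching the rest of the code.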

Names are easier to work with and remember than indices, so I would go with a class:
class Word(object): # don't need `object` in Python 3
    def __init__(self, word):
        self.word = word
        self.sigma = (some calculation)
        self.sigma_sq = (some other calculation)
    def __repr__(self):
        return "Word(%r)" % self.word
    def __str__(self):
        return self.word
    @property
    def sigma(self):
        return self._sigma
    @sigma.setter # requires python 2.6+
    def sigma(self, value):
        if not value:
            raise ValueError("sigma must be ...")
        self._sigma = value
word_list = [Word('python'), Word('totally'), Word('rocks')]
word_list.sort(key=lambda w: w.sigma_sq)
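If a full class feels heavy, a lighter way to get named fields is `collections.namedtuple`. This is only a sketch of that alternative, not the approach from the answer above; the type name `WordStats` and its field names are assumptions based on the stats listed in the question:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type; field names follow the stats named in the question
WordStats = namedtuple("WordStats", ["n", "sigma", "sigma_sq"])

stats = {
    "a": WordStats(n=1, sigma=2, sigma_sq=3),
    "b": WordStats(n=3, sigma=2, sigma_sq=1),
}

# Sort the words by one named stat of their record
words_by_sigma_sq = sorted(stats, key=lambda w: stats[w].sigma_sq)
print(words_by_sigma_sq)

# attrgetter works when sorting the records themselves
records_by_n = sorted(stats.values(), key=attrgetter("n"), reverse=True)
print(records_by_n[0])
```

The records are still tuples, so existing index-based code keeps working while new code can use the names.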

Related

Reverse lexicographical using heapq

Essentially I am looking for an efficient way to implement custom comparators using heapq.
For instance x = [('a',5),('c',3),('d',2),('e',1)]
I could heapify it heapq.heapify(x) then pop the min value heapq.heappop(x) which would return ('a', 5). How can I make it such that it returns in reverse lexicographical order ('e', 1)?
I know the convention when it comes to numbers is simply multiply the first element of the tuple by -1. Are there simple tricks like that when it comes to strings? I know I could potentially implement a map from a to z ... z to a but it sounds cumbersome.
For numbers you would do something like this:
import heapq
x = [(1, 5), (3, 3), (4, 2), (5, 1)]
x = [(-a, b) for a, b in x]
heapq.heapify(x)
result = heapq.heappop(x)
result = (-result[0], result[1])
Similarly, I would do this with letters:
import heapq
x = [('a',5), ('c',3), ('d',2), ('e',1)]
x = [(-ord(a), b) for a, b in x]
heapq.heapify(x)
result = heapq.heappop(x)
result = (chr(-result[0]), result[1])
You may want to treat the second element of each tuple in the same way.
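The `-ord(a)` trick above only handles single-character strings. For longer strings, one possible approach is a small wrapper whose comparison is inverted. A minimal sketch; the class name `ReverseOrder` is hypothetical and not part of `heapq`:

```python
import heapq
from functools import total_ordering

@total_ordering
class ReverseOrder:
    """Hypothetical wrapper that inverts the ordering of the wrapped value."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __lt__(self, other):
        # Inverted on purpose: the larger value counts as "smaller"
        return self.value > other.value

x = [('a', 5), ('c', 3), ('d', 2), ('e', 1)]
heap = [(ReverseOrder(s), n) for s, n in x]
heapq.heapify(heap)

# Pop the lexicographically largest string and unwrap it
key, n = heapq.heappop(heap)
result = (key.value, n)
print(result)
```

Because the wrapper only relies on `==` and `>` of the wrapped value, it works for any comparable type, not just strings.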

How to replace a tuple in sorted list of tuples basing on first value Python

I am new to python, and I'm looking for help with solving the following problem
I need to create a sorted list of tuples/dictionaries based on provided data. Then, when new data is provided that has the same first key, I want to replace its value with the new one. To be clearer, here is an example. Imagine I have data that looks as follows:
data = [(1, 100), (2, 200), (4, 400), (7, 900)]
Then I have new entry from user for example:
(4,500)
So now I want to replace the tuple (4,400) with (4,500). I know that tuples are immutable, so I don't want to update anything in place, just remove the previous tuple based on its key (here: 4) and put the new tuple in its place.
So far I have used a class from another post, which inserts every new entry (tuple) into a list of tuples in sorted order. I want to keep it this way, because later I need to find the closest lower and higher number to a given one in case it's not in the list.
My code looks as follows:
from bisect import bisect_left
class KeyWrapper:
    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key
    def __getitem__(self, i):
        return self.key(self.it[i])
    def __len__(self):
        return len(self.it)
data = [(1, 100), (2, 200), (4, 400), (7, 900)]
data.sort(key=lambda c: c[0])
newcol = (3, 500)
bslindex = bisect_left(KeyWrapper(data, key=lambda c: c[0]), newcol[0])
data.insert(bslindex, newcol)
The output looks as follows:
[(1,100),(2,200),(3,500),(4,400),(7,900)]
Then, given a new entry such as newcols2 = (3,600), which matches an existing tuple on its first element (here 3),
I want the output to be:
[(1,100),(2,200),(3,600),(4,400),(7,900)]
How about this?
data = [(1, 100), (2, 200), (4, 400), (7, 900)]
new = (4, 500)
# Filter out matching
data = [i for i in data if i[0] != new[0]]
# Add the new
data.append(new)
# Re-sort
data = sorted(data, key=lambda x: x[0])
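Since the list is already sorted, the filter-and-re-sort step can also be replaced by a binary search that either overwrites the matching tuple or inserts at the right position. A minimal sketch, assuming the list is sorted by first element; the helper name `upsert` is hypothetical:

```python
from bisect import bisect_left

def upsert(data, new):
    """Replace the tuple whose first element matches new[0], else insert in order.

    Hypothetical helper; assumes data is already sorted by first element.
    """
    keys = [item[0] for item in data]  # O(n) key extraction; fine for a sketch
    i = bisect_left(keys, new[0])
    if i < len(data) and data[i][0] == new[0]:
        data[i] = new        # same key: overwrite in place
    else:
        data.insert(i, new)  # new key: insert at the sorted position

data = [(1, 100), (2, 200), (4, 400), (7, 900)]
upsert(data, (4, 500))
upsert(data, (3, 600))
print(data)
```

On Python 3.10 and later, `bisect_left` accepts a `key` argument, which would avoid building the `keys` list on every call.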

Order dictionary by key with numerical representation

I have this input, where each value has a range of 200:
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
And I am looking for this expected order:
{'400-600': 1, '600-800': 3, '1000-1200': 5, '1800-2000': 3, '2600-2800': 1}
Already tried something like this, but the order is just wrong:
import collections
od = collections.OrderedDict(sorted(d.items()))
print od
You can split the key into parts at '-' and use the first part as integer value to sort it. The second part is irrelevant for ordering because of the nature of your key-values (when converted to integer):
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
import collections
od = collections.OrderedDict(sorted(d.items(),key =lambda x: int(x[0].split("-")[0])))
print od
Output:
OrderedDict([('400-600', 1), ('600-800', 3), ('1000-1200', 5),
('1800-2000', 3), ('2600-2800', 1)])
Documentation:
sorted(iterable,key)
Related:
How to sort a list of objects based on an attribute of the objects? for more "sort by key" examples
Are dictionaries ordered in Python 3.6+? .. which lets you omit the OrderedDict from 3.7+ on (or 3.6 CPython)
If you want to order your dictionary by the first number of each range first (and then by the second number if needed, which is unnecessary in the given example, but feels more natural), you need to convert to integers and set a custom key:
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}
sorted(d.items(), key=lambda t: tuple(map(int, t[0].split("-"))))
# [('400-600', 1),
# ('600-800', 3),
# ('1000-1200', 5),
# ('1800-2000', 3),
# ('2600-2800', 1)]
The conversion to integers is needed because e.g. "1000" < "200", but 1000 > 200. This list can be passed to OrderedDict afterwards like in your code, if needed.
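On Python 3.7+ a plain `dict` preserves insertion order, so the sorted pairs can go straight into `dict` instead of `OrderedDict`. A minimal sketch of that variant; the helper name `range_key` is an assumption for illustration:

```python
d = {'600-800': 3, '1800-2000': 3, '1000-1200': 5, '400-600': 1, '2600-2800': 1}

def range_key(item):
    """Hypothetical helper: turn a ('lo-hi', value) pair into (lo, hi) as integers."""
    lo, hi = item[0].split("-")
    return int(lo), int(hi)

# dict() keeps the order in which the sorted pairs are inserted
ordered = dict(sorted(d.items(), key=range_key))
print(list(ordered))
```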

Efficient Way of making a set of tuple in which the order of tuple doesn't matters

I want to make a set of tuples in which the order of the elements within each tuple shouldn't matter.
For example, if the tuples I want to add are:
[(1,2),(1,3),(2,1)]
It should output like this:
{(1,2),(1,3)}
Is there any efficient way of doing this in python?
You can apply sorted and then tuple, followed by conversion to set:
L = [(1, 2), (1, 3), (2, 1)]
res = set(map(tuple, map(sorted, L)))
print(res)
{(1, 2), (1, 3)}
Explanation
There are a couple of good reasons why you should not convert each tuple to set as an initial step:
Tuples (1, 1, 2) and (1, 2) would become equal after conversion to set.
Even in the case where we are considering tuples of length 2, we would be adding an assumption that tuple({1, 2}) and tuple({2, 1}) are equal. While this may be true, it would be considered an implementation detail, since set is considered to be unordered.
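The first point can be checked directly. The sketch below uses `frozenset` as the set-based conversion, since it is the hashable set type; the variable names are illustrative only:

```python
pairs = [(1, 2), (1, 3), (2, 1)]
mixed = [(1, 1, 2), (1, 2)]

# Sorting normalises the order but keeps duplicates and length
deduped = {tuple(sorted(t)) for t in pairs}
via_sorted = {tuple(sorted(t)) for t in mixed}

# A frozenset drops the repeated 1, so both tuples collapse to one element
via_frozenset = {frozenset(t) for t in mixed}

print(deduped)
print(len(via_sorted), len(via_frozenset))
```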
Function composition
Function composition is not native to Python, but if you have access to the 3rd party toolz library you can avoid nested map:
from toolz import compose
tup_sort = compose(tuple, sorted)
res = set(map(tup_sort, L))
You can sort the tuples:
l = [(1,2),(1,3),(2,1)]
res = set(map(lambda x: tuple(sorted(x)), l))
print(res)
{(1, 2), (1, 3)}
The other answers all work! I'm just going to post mine here because I'm a beginner and I love to practice.
mainSet = set()
l = [(1,2),(1,3),(2,1)]
for i in l:
    if tuple(sorted(i)) not in mainSet:
        mainSet.add(tuple(sorted(i)))
print(mainSet)
Gives back
{(1, 2), (1, 3)}
Whether you want to use this or not is up to you! The other answers are much shorter.
You can use comprehension, too:
l=[(1, 2), (1, 3), (2, 1)]
res={ tuple(sorted(t)) for t in l }
print(res)
{(1, 2), (1, 3)}

Is it possible to redefine the equal operator for tuples?

I have some code in which edges are represented as tuple
(vertex_1, vertex_2)
and I have lists of edges that represent planar embedded faces, as for the example below.
I need to check whether an edge is present in the list, but the check should return True for both (v1, v2) and (v2, v1). At the moment I get:
>>> f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
>>> (6,1) in f1
True
>>> (1,6) in f1
False
You cannot override the equality method for existing types, so you would have to create your own type which would then require you to replace all your existing tuples with your custom type.
If your main problem is just the (6,1) in f1 use case, then maybe you should just consider creating a method for that instead:
def contains(t, lst):
    return (t[0], t[1]) in lst or (t[1], t[0]) in lst
And then you can just use it like this:
>>> f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
>>> contains((6, 1), f1)
True
>>> contains((1, 6), f1)
True
This essentially has the benefit that you don’t need to replace your tuples by a different type instead. So you can work with all your data sources the way they are.
You should make a tuple subclass and change its equality method (__eq__):
class UnorderedTuple(tuple):
    def __eq__(self, other):
        return len(self) == len(other) and set(self) == set(other)
    def __hash__(self): # needed on Python 3; order-independent so equal tuples hash alike
        return hash(frozenset(self))
This will work for your case (tuples of length 2), provided the tuple elements are hashable, that is, immutable and with a well-defined comparison.
To convert your list of tuples to a list of UnorderedTuple objects, do:
f1 = [UnorderedTuple(f_) for f_ in f1]
A containment query (the in operator) over a list can be slow, so you'd better use a set than a list. Wrap the query in UnorderedTuple as well, because a plain tuple hashes differently:
set_f1 = { UnorderedTuple(f_) for f_ in f1 }
UnorderedTuple((6,1)) in set_f1
UnorderedTuple((1,6)) in set_f1
This implementation will not be very performant, as it creates a new set for each comparison. So if your tuples will always have two elements, it is more performant to have the __eq__ method unrolled like:
    def __eq__(self, other):
        return super(UnorderedTuple, self).__eq__(other) or (self[0] == other[1] and self[1] == other[0])
"Is it possible to redefine the equal operator for tuples"
Sort of. You can't do it on the basic tuple type, but you can do it on a subclass:
class MyTuple(tuple):
    def __eq__(self, other):
        orig_eq = super(MyTuple, self).__eq__(other)
        if orig_eq and orig_eq is not NotImplemented:
            return True
        else:
            return super(MyTuple, self).__eq__(other[::-1])
Generally, this probably isn't the best approach. Depending on the constraints of the problem, you could try a set of frozenset:
f1_set = {frozenset(tup) for tup in f1}
frozenset((1, 6)) in f1_set
The advantage here is that if you're doing multiple membership tests on the same data, you'll likely get better runtime (Each membership test on the list is O(N) and you need to do up to two for each item you want to check whereas you only have a single O(N) step to build f1_set and then each membership test is O(1) afterward).
As others have posted, you can use a class to redefine the equality operator for tuples, but you still have to wrap every tuple in that class. So if you have
class new_tuple:
    ...
then you have to use:
t = (1,6)
t = new_tuple(t)
I think it's easier to use a function to determine whether a tuple is in the list:
def check(tuple_, list_):
    v1, v2 = tuple_
    if (v1, v2) in list_ or (v2, v1) in list_:
        return True
    return False
f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
print(check((6, 1), f1)) # this prints True
print(check((1, 6), f1)) # this prints True
The general solution to this problem is to use multisets, which are sets where an element may appear more than once. The collections module defines a Counter class, which is a subclass of dict, that implements multisets. The dict keys are the elements of the multiset, and the values are the number of times the keys occur.
This avoids limitations on the number of elements in the multiset, and is already available. The main shortcoming is that there is no "frozen", hashable version that I know of.
Examples:
>>> from collections import Counter
>>> Counter((3, 6, 2, 4, 2, 8)) == Counter((8, 4, 3, 6, 2, 2))
True
>>> Counter((3, 6, 2, 4, 2, 8)) == Counter((8, 4, 3, 6, 4, 2))
False
You can use the Counter class directly, which is probably simplest, but if you want to retain the underlying tuple representation, you can use the Counter class to implement a more general version of the tuple subclass that others have proposed:
class MultisetTuple(tuple):
    def __eq__(self, other):
        return Counter(self) == Counter(other)
Examples:
>>> MultisetTuple((3, 6, 2, 4, 2, 8)) == MultisetTuple((8, 4, 3, 6, 2, 2))
True
>>> MultisetTuple((3, 6, 2, 4, 2, 8)) == MultisetTuple((8, 4, 3, 6, 4, 2))
False
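The lack of a hashable multiset mentioned above can be worked around, at least for hashable elements, by freezing the counter's items. This is only a sketch of one possible workaround; the helper name `multiset_key` is hypothetical:

```python
from collections import Counter

def multiset_key(items):
    """Hypothetical helper: an order-independent, hashable key for a sequence."""
    # Each (element, count) pair is hashable, so the frozenset of pairs is too
    return frozenset(Counter(items).items())

f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
edge_keys = {multiset_key(edge) for edge in f1}

print(multiset_key((1, 6)) in edge_keys)
print(multiset_key((1, 7)) in edge_keys)
```

Unlike a plain `frozenset` of the elements, this key keeps the counts, so `(2, 2)` and `(2,)` stay distinct.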
