I am learning Python, and I do not know the best way to sort a list of objects by several attributes. This is what I have now:
class Example:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
List = [Example(3,1,5), Example(2,1,2), Example(2,2,2), Example(1,4,1),Example(1,4,5), Example(1,4,2)]
I do not know how to sort it. Is there a tool in Python to help with this, or do I need to write a custom function?
You need to implement rich comparison methods like __lt__ and __ne__ in your class in order to be able to sort a list of instances of your class. Rather than implementing all six comparisons, though, we can get away with only implementing two of them (__eq__ and one of the inequalities) if we decorate with functools.total_ordering.
If you want a lexicographic sort, so that you first compare on a, and then if tied, compare on b, and if still tied, compare on c, see below:
import functools
@functools.total_ordering
class Example:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __eq__(self, other):
        if self.a == other.a and self.b == other.b and self.c == other.c:
            return True
        else:
            return False
    def __lt__(self, other):
        if self.a < other.a:
            return True
        elif self.a == other.a and self.b < other.b:
            return True
        elif self.a == other.a and self.b == other.b and self.c < other.c:
            return True
        else:
            return False
    def __repr__(self): # included for readability in an interactive session
        return 'Example({}, {}, {})'.format(self.a, self.b, self.c)
Now, we can do the following:
>>> lst = [Example(3,1,5), Example(2,1,2), Example(2,2,2), Example(1,4,1),Example(1,4,5), Example(1,4,2)]
>>> lst
[Example(3, 1, 5), Example(2, 1, 2), Example(2, 2, 2), Example(1, 4, 1), Example(1, 4, 5), Example(1, 4, 2)]
>>> lst.sort()
>>> lst
[Example(1, 4, 1), Example(1, 4, 2), Example(1, 4, 5), Example(2, 1, 2), Example(2, 2, 2), Example(3, 1, 5)]
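The chained comparisons above can probably be shortened, because Python already compares tuples lexicographically. The following is only a sketch of that idea; the `_key` helper is a hypothetical name introduced for illustration and is not part of the answer above.

```python
import functools


@functools.total_ordering
class Example:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def _key(self):
        # hypothetical helper: the attributes in comparison order
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Example):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Example):
            return NotImplemented
        # tuples compare element by element, left to right
        return self._key() < other._key()

    def __repr__(self):
        return 'Example({}, {}, {})'.format(self.a, self.b, self.c)


lst = [Example(3, 1, 5), Example(2, 1, 2), Example(1, 4, 5), Example(1, 4, 1)]
lst.sort()
print(lst)
```

The remaining operators (`<=`, `>`, `>=`) are still filled in by `functools.total_ordering`.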
You can sort by multiple items as follows:
import operator
List.sort(key=lambda e: [e.a, e.b, e.c])
# or
List.sort(key=operator.attrgetter('a', 'b', 'c'))
This all depends on what you are planning to sort by. Whatever that may be, you are probably looking for a key function, typically a lambda. Say you wanted to sort by the self.a attribute; you would write your sort like this:
#[Example(3, 1, 5), Example(2, 1, 2), Example(2, 2, 2), Example(1, 4, 1), Example(1, 4, 5), Example(1, 4, 2)]
List.sort(key=lambda x: x.a, reverse=False)
#[Example(1, 4, 1), Example(1, 4, 2), Example(1, 4, 5), Example(2, 1, 2), Example(2, 2, 2), Example(3, 1, 5)]
One way would be, as @senshin already explained, to make the object ordered. That works if Example is inherently ordered and that ordering can also be used, e.g., to compare standalone objects. However, if your sorting order may vary, then sorted or list.sort with the key argument is what you need, and the operator module functions can make it more elegant:
from operator import attrgetter
sorted(alist, key=attrgetter('a')) # sort just by a
sorted(alist, key=attrgetter('c', 'b')) # sort by c then by b
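As a runnable illustration of the attrgetter approach, here is a small sketch that reuses the Example class and data from the question; the variable name `by_c_then_b` is made up for this example.

```python
from operator import attrgetter


class Example:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __repr__(self):
        return 'Example({}, {}, {})'.format(self.a, self.b, self.c)


alist = [Example(3, 1, 5), Example(2, 1, 2), Example(2, 2, 2),
         Example(1, 4, 1), Example(1, 4, 5), Example(1, 4, 2)]

# attrgetter with several names returns a tuple, here (c, b),
# so the sort is by c first and by b among equal c values
by_c_then_b = sorted(alist, key=attrgetter('c', 'b'))
print(by_c_then_b)
```

Because the key is built per call, the same list can be sorted by a different attribute order without touching the class.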
Related
Consider the following piece of code
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
class Widget:
    def __init__(self, low=None, mid=None, high=None):
        self.low = low
        self.mid = mid
        self.high = high
widget = Widget(low=[Point(0, 1), Point(1, 2), Point(2, 3)],
                mid=[Point(3, 4), Point(4, 5), Point(5, 6)],
                high=[Point(6, 7), Point(7, 8), Point(8, 9)])
a, b, c = Point(11, 11), Point(12, 12), Point(13, 13)
Now I would like to alter the attributes of the widget instance. Each attribute has to be altered in a certain way. Specifically, let us consider the (simplified) example where the first element of widget.low needs to be set to a, the second element of widget.mid to b and the last element of widget.high to c. Since these operations are very similar I am tempted to write it in a nested fashion like so,
for attr, ix, value in (('low', 0, a), ('mid', 1, b), ('high', 2, c)):
    getattr(widget, attr)[ix] = value
Now, this feels very naughty, because I am using getattr to set (part of) an attribute. The first answer to a related question states that setting attributes should be done with setattr. The above construction would then become something like,
for attr, ix, value in (('low', 0, a), ('mid', 1, b), ('high', 2, c)):
    setattr(widget, attr, getattr(widget, attr)[:ix] + [value] + getattr(widget, attr)[ix+1:])
Wow! That is really ugly. I believe the result of these two loops is the same; for instance, as expected, widget.mid = [Point(3, 4), Point(12, 12), Point(5, 6)]. What is the 'correct' (most Pythonic) way to do this? I know 'flat is better than nested', and for this task I could write out three lines. But I am considering a situation where nesting can save a significant amount of duplication. Thanks in advance :)
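One way to check the belief that both loops give the same result is to run them side by side. The sketch below does that under a simplifying assumption: plain integers stand in for the Point objects, and `make_widget` is a hypothetical helper. It also shows the one difference between the variants: getattr hands back the list itself, so the first loop mutates it in place, while the setattr loop rebinds the attribute to a new list.

```python
class Widget:
    def __init__(self, low=None, mid=None, high=None):
        self.low = low
        self.mid = mid
        self.high = high


def make_widget():
    # plain integers stand in for the Point objects of the question
    return Widget(low=[0, 1, 2], mid=[3, 4, 5], high=[6, 7, 8])


changes = (('low', 0, 11), ('mid', 1, 12), ('high', 2, 13))

# Variant 1: getattr returns the list itself, so item assignment mutates it in place
w1 = make_widget()
low_before = w1.low
for attr, ix, value in changes:
    getattr(w1, attr)[ix] = value

# Variant 2: setattr rebinds the attribute to a newly built list
w2 = make_widget()
for attr, ix, value in changes:
    old = getattr(w2, attr)
    setattr(w2, attr, old[:ix] + [value] + old[ix + 1:])

print(w1.low, w1.mid, w1.high)
print(w1.low is low_before)
```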
rules: If one list is shorter than the other, the last element of the shorter list should be repeated as often as necessary. If one or both lists are empty, the empty list should be returned.
merge([0, 1, 2], [5, 6, 7])
should return [(0, 5), (1, 6), (2, 7)]
merge([2, 1, 0], [5, 6])
should return [(2, 5), (1, 6), (0, 6)]
merge([ ], [2, 3])
should return []
This is what I've written so far:
def merge(a, b):
    mergelist = []
    for pair in zip(a, b):
        for item in pair:
            mergelist.append(item)
    return mergelist
print(merge([0, 1, 2], [5, 6]))
Thanks for asking the question.
I tried to amend your code, as it is always easier to understand our own code.
Please find the modifications below:
def merge(a, b):
    mergelist = []
    if not a or not b:
        return []
    # pad copies so the caller's lists are left unchanged
    a, b = list(a), list(b)
    if len(a) > len(b):
        occ = len(a) - len(b)
        b.extend([b[-1] for i in range(occ)])
    elif len(a) < len(b):
        occ = len(b) - len(a)
        a.extend([a[-1] for i in range(occ)])
    for pair in zip(a, b):
        mergelist.append(pair)
    return mergelist
print(merge([2, 1, 0], [5, 6]))
You need to build each tuple yourself and append it to the result list, because you have to check whether the index is still within the shorter list. This is one way of solving it:
def merge(l1, l2):
    new = []
    if not l1 or not l2:
        return new
    for i in range(max(len(l1), len(l2))):
        # past the end of a list, reuse its last element
        s1 = l1[min(i, len(l1) - 1)]
        s2 = l2[min(i, len(l2) - 1)]
        new.append((s1, s2))
    return new
"""
>>> merge([0,1,2],[5,6,7])
[(0, 5), (1, 6), (2, 7)]
>>> merge([2,1,0],[5,6])
[(2, 5), (1, 6), (0, 6)]
>>> merge([],[2,3])
[]
"""
This is actually somewhat tricky.
You would think something simple like this would work:
def merge(a, b):
    # use iterator to keep iterations state after zip
    a, b = iter(a), iter(b)
    rtrn = list(zip(a, b))
    try:
        taila, tailb = rtrn[-1]
    except IndexError: # one or both empty
        return rtrn
    # only one of these loops will run, draining the longer input list
    rtrn.extend((ai, tailb) for ai in a)
    rtrn.extend((taila, bi) for bi in b)
    return rtrn
Here the trick is to use an iterator, not an iterable. An iterator keeps its state. So after the zip, both iterators should still point at the place where zip stopped.
However, this does not work if b is the shorter list. Because then zip will have removed one value from a and will discard it. You have to be careful to avoid this.
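The discarded value can be seen in a small experiment. This is only an illustrative sketch with made-up inputs:

```python
a = iter([1, 2, 3])
b = iter([10])

pairs = list(zip(a, b))
# zip pulled 2 from a before noticing that b was exhausted, then dropped it
leftover = list(a)
print(pairs)     # [(1, 10)]
print(leftover)  # [3]
```

With the arguments swapped, zip asks the shorter iterator first and nothing is lost, which is why the iterator version only fails when b is the shorter input.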
The easiest way is to just materialize two lists and deal with the length differences explicitly.
def merge(a, b):
    # ensure that we have lists, not anything else like iterators, sets, etc
    a, b = list(a), list(b)
    rtrn = list(zip(a, b))
    try:
        taila, tailb = rtrn[-1]
    except IndexError: # one or both empty
        return rtrn
    rtrnlen = len(rtrn)
    # only one of these loops will run, draining the longer input list
    # You could also use itertools.zip_longest for this
    rtrn.extend((ai, tailb) for ai in a[rtrnlen:])
    rtrn.extend((taila, bi) for bi in b[rtrnlen:])
    return rtrn
I'd use zip_longest:
from itertools import zip_longest
def merge(a, b):
    return list(a and b and zip_longest(a, b, fillvalue=min(a, b, key=len)[-1]))
Same thing, different style:
def merge(a, b):
    if a and b:
        short = min(a, b, key=len)
        return list(zip_longest(a, b, fillvalue=short[-1]))
    return []
from itertools import zip_longest
def merge(a, b):
    if not a or not b:
        return []
    if len(a) > len(b):
        return list(zip_longest(a, b, fillvalue=b[-1]))
    else:
        return list(zip_longest(a, b, fillvalue=a[-1]))
For example:
>>> a = [2,3,5]
>>> b = [1,2]
>>> merge(a,b)
[(2, 1), (3, 2), (5, 2)]
Link to documentation for zip_longest
https://docs.python.org/3/library/itertools.html#itertools.zip_longest
I have some code in which edges are represented as tuple
(vertex_1, vertex_2)
and I have lists of edges that represent planar embedded faces, as for the example below.
I need to check whether an edge is present in the list, but the check should return True whether I use (v1, v2) or (v2, v1). Currently I get:
>>> f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
>>> (6,1) in f1
True
>>> (1,6) in f1
False
You cannot override the equality method for existing types, so you would have to create your own type which would then require you to replace all your existing tuples with your custom type.
If your main problem is just the (6,1) in f1 use case, then maybe you should just consider creating a method for that instead:
def contains(t, lst):
    return (t[0], t[1]) in lst or (t[1], t[0]) in lst
And then you can just use it like this:
>>> f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
>>> contains((6, 1), f1)
True
>>> contains((1, 6), f1)
True
This essentially has the benefit that you don’t need to replace your tuples by a different type instead. So you can work with all your data sources the way they are.
You should make a tuple subclass and change its equality method (__eq__). In Python 3, a class that overrides __eq__ must also define __hash__ to stay hashable, and the hash must not depend on element order:
class UnorderedTuple(tuple):
    def __eq__(self, other):
        return len(self) == len(other) and set(self) == set(other)
    def __hash__(self):
        # order-independent, so (6, 1) and (1, 6) hash alike
        return hash(frozenset(self))
This will work for your case (tuples of length 2) as long as the tuple elements are hashable, which usually means they are immutable and have a well-defined comparison.
To convert your list of tuples to a list of UnorderedTuple instances, do:
f1 = [UnorderedTuple(f_) for f_ in f1]
A containment query (the in operator) over a list can be slow, so you'd better use a set than a list. Wrap the tuple you are looking for as well, so that it is hashed the same way as the stored edges:
set_f1 = {UnorderedTuple(f_) for f_ in f1}
UnorderedTuple((6, 1)) in set_f1
UnorderedTuple((1, 6)) in set_f1
This implementation will not be very performant, as it creates a new set for each comparison. So if your tuples will always have two elements, it is more performant to unroll the __eq__ method like this:
    def __eq__(self, other):
        return super(UnorderedTuple, self).__eq__(other) or (self[0] == other[1] and self[1] == other[0])
"Is it possible to redefine the equal operator for tuples"
Sort of. You can't do it on the basic tuple type, but you can do it on a subclass:
class MyTuple(tuple):
    def __eq__(self, other):
        orig_eq = super(MyTuple, self).__eq__(other)
        if orig_eq and orig_eq is not NotImplemented:
            return True
        else:
            return super(MyTuple, self).__eq__(other[::-1])
Generally, this probably isn't the best approach. Depending on the constraints of the problem, you could try a set of frozenset:
f1_set = {frozenset(tup) for tup in f1}
frozenset((1, 6)) in f1_set
The advantage here is that if you're doing multiple membership tests on the same data, you'll likely get better runtime (Each membership test on the list is O(N) and you need to do up to two for each item you want to check whereas you only have a single O(N) step to build f1_set and then each membership test is O(1) afterward).
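A runnable sketch of the frozenset approach, using the face from the question. The `has_edge` helper is a hypothetical name added for illustration:

```python
f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]

# one O(N) pass to build the lookup set
f1_set = {frozenset(edge) for edge in f1}


def has_edge(v1, v2, edge_set=f1_set):
    # hypothetical helper: orientation-independent membership test
    return frozenset((v1, v2)) in edge_set


print(has_edge(6, 1))  # True
print(has_edge(1, 6))  # True
print(has_edge(1, 7))  # False
```

One thing to keep in mind with this design: an edge from a vertex to itself, such as (3, 3), collapses to a one-element frozenset.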
As others have posted, you can use a class to redefine the equality operator for tuples, but you still have to wrap every tuple in that class. So if you have
class new_tuple:
    ...
then you have to use:
t = (1, 6)
t = new_tuple(t)
I think it's easier to use a function to determine whether a tuple is in the list:
def check(tuple_, list_):
    v1, v2 = tuple_
    if (v1, v2) in list_ or (v2, v1) in list_:
        return True
    return False
f1 = [(6, 1), (1, 2), (2, 7), (7, 6)]
print(check((6, 1), f1)) # this prints True
print(check((1, 6), f1)) # this prints True
The general solution to this problem is to use multisets, which are sets where an element may appear more than once. The collections module defines a Counter class, which is a subclass of dict, that implements multisets. The dict keys are the elements of the multiset, and the values are the number of times the keys occur.
This avoids limitations on the number of elements in the multiset, and is already available. The main shortcoming is that there is no "frozen", hashable version that I know of.
Examples:
>>> from collections import Counter
>>> Counter((3, 6, 2, 4, 2, 8)) == Counter((8, 4, 3, 6, 2, 2))
True
>>> Counter((3, 6, 2, 4, 2, 8)) == Counter((8, 4, 3, 6, 4, 2))
False
You can use the Counter class directly, which is probably simplest, but if you want to retain the underlying tuple representation, you can use the Counter class to implement a more general version of the tuple subclass that others have proposed:
class MultisetTuple(tuple):
    def __eq__(self, other):
        return Counter(self) == Counter(other)
Examples:
>>> MultisetTuple((3, 6, 2, 4, 2, 8)) == MultisetTuple((8, 4, 3, 6, 2, 2))
True
>>> MultisetTuple((3, 6, 2, 4, 2, 8)) == MultisetTuple((8, 4, 3, 6, 4, 2))
False
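If a hashable multiset is needed, for example to store edges in a set, one possible workaround is to freeze the (element, count) pairs of a Counter. This is a sketch of that idea rather than a standard library feature, and `multiset_key` is a hypothetical helper name:

```python
from collections import Counter


def multiset_key(items):
    # hypothetical helper: a hashable stand-in for Counter(items)
    return frozenset(Counter(items).items())


edges = {multiset_key((6, 1)), multiset_key((1, 2))}
print(multiset_key((1, 6)) in edges)                       # True
print(multiset_key((2, 2, 3)) == multiset_key((2, 3, 3)))  # False
```

Unlike frozenset(items), this keeps the multiplicities, so (2, 2, 3) and (2, 3, 3) stay distinct.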
I am trying to create a method (sum) that takes a variable number of vectors and adds them together. For educational purposes, I have written my own Vector class, and the underlying data is stored in an instance variable named data.
My code for the @classmethod sum works (for each of the vectors passed in, loop through each element in the data variable and add it to a result list), but it seems non-Pythonic, and I am wondering if there is a better way?
class Vector(object):
    def __init__(self, data):
        self.data = data
    @classmethod
    def sum(cls, *args):
        result = [0 for _ in range(len(args[0].data))]
        for v in args:
            if len(v.data) != len(result):
                raise ValueError("vectors must have the same length")
            for i, element in enumerate(v.data):
                result[i] += element
        return cls(result)
itertools.zip_longest (izip_longest in Python 2) may come in very handy in your situation:
import itertools
a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5, 6]
c = [1, 2]
lists = (a, b, c)
result = [sum(el) for el in itertools.zip_longest(*lists, fillvalue=0)]
And here you get what you wanted:
>>> result
[3, 6, 6, 8, 5, 6]
What it does is simply zip your lists together, filling missing values with 0. For example, zip_longest(a, c, fillvalue=0) would yield [(1, 1), (2, 2), (3, 0), (4, 0)]. Then it just sums up the values in each tuple of the intermediate list.
So here you go, step by step:
>>> lists
([1, 2, 3, 4], [1, 2, 3, 4, 5, 6], [1, 2])
>>> list(itertools.zip_longest(*lists, fillvalue=0))
[(1, 1, 1), (2, 2, 2), (3, 3, 0), (4, 4, 0), (0, 5, 0), (0, 6, 0)]
So if you run a list comprehension, summing up all sub-elements, you get your result.
Another thing that you could do (and that might be more "pythonic") would be to implement the __add__ magic method, so you can use + and sum directly on vectors.
class Vector(object):
    def __init__(self, data):
        self.data = data
    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector([s + o for s, o in zip(self.data, other.data)])
        if isinstance(other, int):
            return Vector([s + other for s in self.data])
        raise TypeError("can not add %s to vector" % other)
    def __radd__(self, other):
        return self.__add__(other)
    def __repr__(self):
        return "Vector(%r)" % self.data
Here, I also implemented addition of Vector and int, adding the number on each of the Vector's data elements, and the "reverse addition" __radd__, to make sum work properly.
Example:
>>> v1 = Vector([1,2,3])
>>> v2 = Vector([4,5,6])
>>> v3 = Vector([7,8,9])
>>> v1 + v2 + v3
Vector([12, 15, 18])
>>> sum([v1,v2,v3])
Vector([12, 15, 18])
args = [[1, 2, 3],
[10, 20, 30],
[7, 3, 15]]
result = [sum(data) for data in zip(*args)]
# [18, 25, 48]
Is this what you want?
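Applied to the Vector class from the question, the zip(*args) idea might look like the sketch below. The explicit length check and the ValueError messages are assumptions added here, since plain zip would silently truncate to the shortest vector.

```python
class Vector(object):
    def __init__(self, data):
        self.data = data

    @classmethod
    def sum(cls, *vectors):
        if not vectors:
            raise ValueError("sum() needs at least one vector")
        length = len(vectors[0].data)
        if any(len(v.data) != length for v in vectors):
            raise ValueError("all vectors must have the same length")
        # zip(*...) groups the i-th elements of every vector together
        return cls([sum(column) for column in zip(*(v.data for v in vectors))])


total = Vector.sum(Vector([1, 2, 3]), Vector([10, 20, 30]), Vector([7, 3, 15]))
print(total.data)  # [18, 25, 48]
```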
I know there are tons of questions on sorting lists and dictionaries in Python already, but I can't seem to find one that helps in my case, and I'm looking for the most efficient solution, as I'm going to be sorting a rather large dataset.
My data basically looks like this at the moment:
a = {'a': (1, 2, 3), 'b': (3, 2, 1)}
I'm basically creating a word list in which I store each word along with some stats about it (n, Sigma(x), Sigma(x^2) )
I want to sort it based on a particular stat. So far I've been trying something along the lines of:
b = a.items()
b.sort(key = itemgetter(1), reverse=True)
I'm not sure how to control which index it is sorted on when it's effectively a list of tuples of tuples. I guess I need to nest two itemgetter operations, but I'm not really sure how to do this.
If there's a better data structure I should be using instead please let me know. Should I perhaps create a small class/struct and then use a lambda function to access a member of the class?
Many Thanks
Something like this?
>>> a = {'a': (1, 2, 3), 'b': (3, 2, 1)}
>>> b = list(a.items()) # items() is a view in Python 3, so copy it into a list
>>> b
[('a', (1, 2, 3)), ('b', (3, 2, 1))]
>>> b.sort(key=lambda x:x[1][2]) # sorting by the third item in the tuple
>>> b
[('b', (3, 2, 1)), ('a', (1, 2, 3))]
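To choose the stat index at call time, the lambda can be wrapped in a small factory. This is a sketch only; `by_stat` is a hypothetical helper name, and sorted() is used so that it also works with the items view returned by Python 3:

```python
a = {'a': (1, 2, 3), 'b': (3, 2, 1)}


def by_stat(index):
    # hypothetical helper: key function selecting stats[index] from a (word, stats) pair
    return lambda item: item[1][index]


# sorted() accepts the items view directly and returns a new list
by_third_desc = sorted(a.items(), key=by_stat(2), reverse=True)
by_first_asc = sorted(a.items(), key=by_stat(0))
print(by_third_desc)  # [('a', (1, 2, 3)), ('b', (3, 2, 1))]
```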
Names are easier to work with and remember than indices, so I would go with a class:
class Word(object): # don't need `object` in Python 3
    def __init__(self, word):
        self.word = word
        self.sigma = (some calculation)
        self.sigma_sq = (some other calculation)
    def __repr__(self):
        return "Word(%r)" % self.word
    def __str__(self):
        return self.word
    @property
    def sigma(self):
        return self._sigma
    @sigma.setter # requires python 2.6+
    def sigma(self, value):
        if not value:
            raise ValueError("sigma must be ...")
        self._sigma = value
word_list = [Word('python'), Word('totally'), Word('rocks')]
word_list.sort(key=lambda w: w.sigma_sq)
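If validation is not needed, a namedtuple gives the same named access with less code. The sketch below is an assumption-laden illustration: the `WordStats` type, its field names, and all the numbers are made up for this example.

```python
from collections import namedtuple
from operator import attrgetter

# hypothetical record type; field names follow the stats in the question
WordStats = namedtuple('WordStats', ['word', 'n', 'sigma', 'sigma_sq'])

word_list = [
    WordStats('python', 3, 6, 14),   # made-up numbers for illustration
    WordStats('totally', 1, 2, 4),
    WordStats('rocks', 2, 9, 45),
]

word_list.sort(key=attrgetter('sigma_sq'), reverse=True)
print([w.word for w in word_list])  # ['rocks', 'python', 'totally']
```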