I am trying to implement a user-defined sort function, similar to Python's list.sort(cmp=None, key=None, reverse=False).
Here is my code so far:
from operator import itemgetter
class Sort:
    def __init__(self, sList, key = itemgetter(0), reverse = False):
        self._sList = sList
        self._key = key
        self._reverse = reverse
        self.sort()
    def sort(self):
        for index1 in range(len(self._sList) - 1):
            for index2 in range(index1, len(self._sList)):
                if self._reverse == True:
                    if self._sList[index1] < self._sList[index2]:
                        self._sList[index1], self._sList[index2] = self._sList[index2], self._sList[index1]
                else:
                    if self._sList[index1] > self._sList[index2]:
                        self._sList[index1], self._sList[index2] = self._sList[index2], self._sList[index1]
List = [[1 ,2],[3, 5],[5, 1]]
Sort(List, reverse = True)
print(List)
I'm having a really hard time with the key parameter.
More specifically, I would like to know whether there is a way to write a list with optional indexes (similar to foo(*parameters)).
I really hope you understand my question.
key is a function to convert the item to a criterion used for comparison.
Called with the item as the sole parameter, it returns a comparable value of your choice.
A classic key example for integers stored as strings is:
lambda x: int(x)
so strings are sorted numerically.
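The example below is a quick illustration of that idea using the built-in sorted. The sample strings are made up for this sketch, and key=int is equivalent to the lambda above.

```python
numbers_as_text = ["10", "9", "100", "1"]

# Without a key, strings compare character by character, so "100" < "9".
lexicographic = sorted(numbers_as_text)

# With key=int, each item is converted to an int only for the comparison;
# the result still contains the original strings.
numeric = sorted(numbers_as_text, key=int)

print(lexicographic)  # ['1', '10', '100', '9']
print(numeric)        # ['1', '9', '10', '100']
```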
In your algorithm, you would have to replace
self._sList[index1] < self._sList[index2]
by
self._key(self._sList[index1]) < self._key(self._sList[index2])
so the values computed from items are compared, rather than the items themselves.
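If you apply that replacement throughout, the class could look like the sketch below. It is still the same O(n²) exchange sort. Two parts are choices made for this sketch rather than part of the original code: the identity default for key (instead of itemgetter(0)) and the helper method _out_of_order.

```python
from operator import itemgetter

class Sort:
    """Exchange sort that mimics list.sort(key=..., reverse=...) in place."""

    def __init__(self, sList, key=None, reverse=False):
        self._sList = sList
        # With no key, compare the items themselves, as list.sort does.
        self._key = key if key is not None else (lambda item: item)
        self._reverse = reverse
        self.sort()

    def _out_of_order(self, first_key, second_key):
        # True when the earlier item's key should come after the later one's.
        if self._reverse:
            return first_key < second_key
        return first_key > second_key

    def sort(self):
        items = self._sList
        for index1 in range(len(items) - 1):
            for index2 in range(index1 + 1, len(items)):
                # Compare the computed keys, never the raw items.
                if self._out_of_order(self._key(items[index1]), self._key(items[index2])):
                    items[index1], items[index2] = items[index2], items[index1]

List = [[1, 2], [3, 5], [5, 1]]
Sort(List, key=itemgetter(1))  # sort by the second element of each inner list
print(List)  # [[5, 1], [1, 2], [3, 5]]
```

Unlike list.sort, this exchange sort is not stable: items with equal keys can swap places.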
Note that Python 3 dropped the cmp parameter and kept only key.
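If you already have an old-style comparison function, functools.cmp_to_key (available since Python 2.7 and 3.2) converts it into a key function. The comparison function below is a made-up example for this sketch.

```python
from functools import cmp_to_key

def compare_by_length_then_text(a, b):
    # Old-style cmp contract: negative if a < b, zero if equal, positive if a > b.
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi"]
result = sorted(words, key=cmp_to_key(compare_by_length_then_text))
print(result)  # ['fig', 'kiwi', 'pear', 'apple']
```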
Also note that in your case, using itemgetter(0) as the key function works for subscriptable items such as lists (sorting by the first item only) or strings (sorting by the first character only).
Related
I have a list of dictionaries as below:
test = [{'a':100, 'b':1, 'd':3.2},
{'a':200, 'b':5, 'd':8.75},
{'a':500, 'b':2, 'd':6.67},
{'a':150, 'b':7, 'd':3.86},
{'a':425, 'b':2, 'd':7.72},
{'a':424, 'b':2, 'd':7.72}]
Given a 'b' value, I need to find the maximum value of 'd' and extract the corresponding value of 'a' in that dictionary. If there's a tie, take the highest value of 'a'. For example, {'a':425, 'b':2, 'd':7.72} and {'a':424, 'b':2, 'd':7.72} both have b = 2 and equal d values. In that case, I return a = 425.
The following code works. However, I would like to know possible ways to optimise this or to use an anonymous function (lambda) to solve this.
def gerMaxA(bValue):
    temporary_d = -999.99
    temporary_a = 0
    for i in test:
        if i['b'] == bValue:
            if i['d'] > temporary_d:
                temporary_a = i['a']
                temporary_d = i['d']
            elif i['d'] == temporary_d:
                if i['a'] >= temporary_a:
                    temporary_a = i['a']
    ans = (temporary_a, temporary_d)
    return ans
Appreciate any insights.
However, I would like to know possible ways to optimise this or to use an anonymous function (lambda) to solve this.
"Optimise" is a red herring: you cannot simply "optimise" something in a vacuum; you must optimise it for some quality (speed, memory usage, etc.).
Instead, I will show how to make the code simpler and more elegant. This is theoretically off-topic for Stack Overflow, but in my experience the system doesn't work very well if I try to send people elsewhere.
Given a 'b' value
This means that we will be selecting elements of the list that meet a condition (i.e., the 'b' value matches a target). Another word for this is filtering; while Python does have a built-in filter function, it is normally cleaner and more Pythonic to use a comprehension or generator expression for this purpose. Since we will be doing further processing on the results, we shouldn't choose yet.
I need to find the maximum value of 'd'
More accurately: you seek the element that has the maximum value for 'd'. Or, as we like to think of it in the Python world, the maximum element, keyed by 'd'. This is built in, via the max function. Since we will feed data directly to this function, we don't need to build up a container, so we will choose a generator expression for the first step.
The first step looks like this, and means exactly what it says, read left to right:
(element for element in test if element['b'] == b_value)
"A generator (()) producing: the element, for each element found in test, but only including it if the element's ['b'] value is == b_value".
In the second step, we wrap the max call around that, and supply the appropriate key function. This is, indeed, where we could use lambda:
max(
    (element for element in test if element['b'] == b_value),
    key=lambda element: (element['d'], element['a'])
)
The lambda here is a function that transforms a given element into that pair of values; max will then compare the filtered dicts according to what value is produced by that lambda for each.
Alternatively, we could use a named function. Lambdas are the same thing, just without a name and with limits on what can be done within them:
def my_key(element):
    return element['d'], element['a']
# and then we do
max((element for element in test if element['b'] == b_value), key=my_key)
Or we could use the standard library:
from operator import itemgetter
max((element for element in test if element['b'] == b_value), key=itemgetter('d', 'a'))
The last step, of course, is simply to extract the ['a'] value from the max result.
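Putting the three steps together gives a function like the one below. Two details are additions for this sketch: the name max_a_for_b, and the default=None argument to max (Python 3.4+). That argument makes the function return None instead of raising ValueError when no element has the requested 'b' value.

```python
from operator import itemgetter

def max_a_for_b(b_value, data):
    # Step 1: lazily keep only the dicts whose 'b' matches.
    matching = (element for element in data if element['b'] == b_value)
    # Step 2: pick the element with the largest ('d', 'a') pair;
    # a tie on 'd' is broken by the larger 'a'.
    best = max(matching, key=itemgetter('d', 'a'), default=None)
    # Step 3: extract 'a', or return None when nothing matched.
    return None if best is None else best['a']

test = [{'a': 100, 'b': 1, 'd': 3.2},
        {'a': 200, 'b': 5, 'd': 8.75},
        {'a': 500, 'b': 2, 'd': 6.67},
        {'a': 150, 'b': 7, 'd': 3.86},
        {'a': 425, 'b': 2, 'd': 7.72},
        {'a': 424, 'b': 2, 'd': 7.72}]

print(max_a_for_b(2, test))  # 425
print(max_a_for_b(3, test))  # None
```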
Here's an approach that uses built-ins:
In [1]: from operator import itemgetter
In [2]: def max_a_value(b_value, data):
   ...:     matching_values = (d for d in data if d['b'] == b_value)
   ...:     return max(matching_values, key=itemgetter('d','a'))['a']
...:
In [3]: test = [{"a":100, "b":1, "d":3.2},
...: {"a":200, "b":5, "d":8.75},
...: {"a":500, "b":2, "d":6.67},
...: {"a":150, "b":7, "d":3.86},
...: {"a":425, "b":2, "d":7.72},
...: {"a":424, "b":2, "d":7.72}]
In [4]: max_a_value(2, test)
Out[4]: 425
Note that this isn't more efficient algorithmically: both versions are O(N).
Yes, you can optimize this. With the given specifications, there is no reason to retain inferior entries. There is also no particular reason to keep this as a list of dictionaries. Instead, make this a simple data frame or reference table. The key is the 'b' value; the value is the desired 'a' value.
Make one pass over your data to convert it to a single dict:
test = {
    1: 100,
    2: 425,
    5: 200,
    7: 150 }
There's your better data storage; you've already managed one version of conversion logic.
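The one-pass conversion could look like the sketch below. It applies the same rule (largest 'd', then largest 'a') as it walks the list. The name build_lookup is hypothetical.

```python
def build_lookup(rows):
    best = {}  # b value -> (d, a) of the best row seen so far
    for row in rows:
        candidate = (row['d'], row['a'])
        # Tuple comparison: a higher d wins, and on a tie a higher a wins.
        if row['b'] not in best or candidate > best[row['b']]:
            best[row['b']] = candidate
    # Keep only the 'a' value for each 'b'.
    return {b: d_and_a[1] for b, d_and_a in best.items()}

rows = [{'a': 100, 'b': 1, 'd': 3.2},
        {'a': 200, 'b': 5, 'd': 8.75},
        {'a': 500, 'b': 2, 'd': 6.67},
        {'a': 150, 'b': 7, 'd': 3.86},
        {'a': 425, 'b': 2, 'd': 7.72},
        {'a': 424, 'b': 2, 'd': 7.72}]

lookup = build_lookup(rows)
print(lookup[2])  # 425
```

After this single pass, each query is a plain dict lookup.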
Maybe you want to check this. I don't know if it is more efficient, but at least it looks Pythonic:
def gerMaxA(bValue):
    d_values = {i: x['d'] for i, x in enumerate(test) if x['b'] == bValue}
    idx = max(d_values, key=d_values.get)
    max_d = test[idx]['d']
    a_values = {k: test[k]['a'] for k in d_values if d_values[k] == max_d}
    idx_a = max(a_values, key=a_values.get)
    return test[idx_a]['a'], test[idx]['d']
The last three lines of code make sure that the larger 'a' value is taken when several entries share the maximum 'd'.
Useful information:
For information on how to sort a list of various data types see:
How to sort (list/tuple) of lists/tuples?
...and for information on how to perform a binary search on a sorted list, see: Binary search (bisection) in Python
My question:
How can you neatly apply binary search (or another O(log n) search algorithm) to a list of some data type, where the key is an inner component of the data type itself? To keep the question simple, we can use a list of tuples as an example:
x = [("a", 1), ("b",2), ("c",3)]
binary_search(x, "b") # search for "b", should return 1
# note how we are NOT searching for ("b",2) yet we want ("b",2) returned anyways
To simplify even further: we only need to return a single search result, not multiple if for example ("b",2) and ("b",3) both existed.
Better yet:
How can we modify the following simple code to perform the above operation?
from bisect import bisect_left
def binary_search(a, x, lo=0, hi=None):  # can't use a to specify default for hi
    hi = hi if hi is not None else len(a)  # hi defaults to len(a)
    pos = bisect_left(a, x, lo, hi)  # find insertion position
    return (pos if pos != hi and a[pos] == x else -1)  # don't walk off the end
PLEASE NOTE: I am not looking for the complete algorithm itself. Rather, I am looking for the application of some of Python's standard(ish) libraries, and/or Python's other functionalities so that I can easily search a sorted list of some arbitrary data type at any time.
Thanks
Take advantage of how lexicographic ordering deals with tuples of unequal length:
import bisect
# bisect_right would also work
index = bisect.bisect_left(x, ('b',))
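The sketch below turns the question's binary_search into a complete function along these lines. It assumes the list is sorted by its first component. Like the question's code, it returns -1 when the key is absent.

```python
import bisect

def binary_search(a, key):
    # (key,) sorts before every (key, ...) tuple, so bisect_left lands on the
    # first tuple whose first component is >= key.
    pos = bisect.bisect_left(a, (key,))
    if pos != len(a) and a[pos][0] == key:
        return pos
    return -1

x = [("a", 1), ("b", 2), ("c", 3)]
print(binary_search(x, "b"))  # 1
print(binary_search(x, "z"))  # -1
```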
It may sometimes be convenient to feed a custom sequence type to bisect:
class KeyList(object):
    # bisect (before Python 3.10) doesn't accept a key function, so we build the key into our sequence.
    def __init__(self, l, key):
        self.l = l
        self.key = key
    def __len__(self):
        return len(self.l)
    def __getitem__(self, index):
        return self.key(self.l[index])
import bisect
import operator
# bisect_right would *not* work for this one.
index = bisect.bisect_left(KeyList(x, operator.itemgetter(0)), 'b')
What about converting the list of tuples to a dict?
>>> d = dict([("a", 1), ("b",2), ("c",3)])
>>> d['b'] # 2
I came across this question in a very specific context but I soon realized that it has a quite general relevance.
FYI: I'm getting data from a framework, and at some point I have transformed it into a list of unordered pairs (it could be a list of lists or tuples of any size as well, but at the moment I have 100% pairs). In my case these pairs represent relationships between data objects, and I want to refine my data.
I have a list of unordered tuples and want a list of objects, or in this case a dict of dicts. If the same letter indicates the same class and differing numbers indicate different instances, I want to accomplish this transformation:
[(a1, x1), (x2, a2), (y1, a2), (y1, a1)] -> {a1:{"y":y1,"x":x1},a2:{"y":y1,"x":x2}}
Note that there can be many "a"s connected to the same "x" or "y", but every "a" has at most one "x" and at most one "y". I can't rely on either the order of the tuples or the order of the elements within a tuple (because the framework does not distinguish between "a" and "x"), and I obviously don't care about the order of elements in my dicts; I just need the proper relations. There are many other pairs I don't care about, and they can contain "a", "x" or "y" elements as well.
So the main question is "How to iterate over nested data when there is no reliable order but a need of accessing and checking all elements of the lowest level?"
I tried it in several ways but they don't seem right. For simplicity I just check for A-X pairs here:
def first_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        if pair[0].__class__ is A and pair[1].__class__ is X:
            result[pair[0]] = {"X": pair[1]}
        if pair[0].__class__ is X and pair[1].__class__ is A:
            result[pair[1]] = {"X": pair[0]}
    return result
def second_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for index, item in enumerate(pair):
            if item.__class__ is A:
                other_index = (index + 1) % 2
                if pair[other_index].__class__ is X:
                    result[item] = {"X": pair[other_index]}
    return result
def third_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for item in pair:
            if item.__class__ is A:
                for any_item in pair:
                    if any_item.__class__ is X:
                        result[item] = {"X": any_item}
    return result
The third draft actually works for sublists of any size and gets rid of the non-Pythonic integer indexing. But it iterates over the same list while already iterating over it, and it needs quintuple nesting for just one line of code. That does not seem right to me. I learned: "When there is a problem with iteration in Python and you don't know a good solution, there is a great solution in itertools!" I just didn't find one.
Does anyone know a built-in that can help me, or simply a better way to implement my methods?
You can do something like this with strings:
l = [('a1', 'x1','z3'), ('x2', 'a2'), ('y1', 'a2'), ('y1', 'a1')]
res = {}
for tup in l:
    main_class = ""
    for item in tup:
        if item.startswith('a'):
            main_class = item
    if not main_class:
        continue  # skip tuples that contain no "a" element
    sub_classes = list(tup)
    sub_classes.remove(main_class)
    if main_class not in res:
        res[main_class] = {}
    for item in sub_classes:
        res[main_class][item[0]] = item
If your objects aren't strings, you just need to change if item.startswith('a'): to something that determines whether an item is the "main class", i.e. the one that should become the key.
This also handles tuples longer than two. It iterates over each tuple, finds the "main class", and then removes it from a list version of the tuple, so the new list contains only the sub classes.
Looks like Ned Batchelder was right. He said that whenever you have a problem with iterables and don't think Python has a nice solution, there is one in itertools. I finally found a solution I overlooked last time: itertools.permutations.
from itertools import permutations
def final_draft(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        for permutation in permutations(pair):
            if permutation[0].__class__ is A:
                my_a = permutation[0]
                if permutation[1].__class__ is X:
                    my_x = permutation[1]
                    if my_a not in result:
                        result[my_a] = {}
                    result[my_a]["key for X"] = my_x
    return result
I still have quintuple nesting because I added a check for whether the key exists. With that check, my original drafts would need sextuple nesting and two productive lines of code. But I got rid of the double iteration over the same iterable, and I have both minimal index usage and the possibility of working with triplets in the future.
One could avoid the assignments, but I prefer "my_a" over permutation[0].
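For comparison, the sketch below removes some of the nesting by combining isinstance, permutations(pair, 2) and dict.setdefault. The classes A and X are hypothetical stand-ins for the framework's types, and the dict key "X" is chosen for illustration. Unlike the __class__ is checks, isinstance also matches subclasses.

```python
from itertools import permutations

class A:  # hypothetical stand-in for the framework's "a" objects
    pass

class X:  # hypothetical stand-in for the framework's "x" objects
    pass

def relations(list_of_pairs):
    result = {}
    for pair in list_of_pairs:
        # permutations(pair, 2) yields every ordered pair of two distinct
        # positions, so (x, a) is also seen as (a, x): element order no longer matters.
        for first, second in permutations(pair, 2):
            if isinstance(first, A) and isinstance(second, X):
                result.setdefault(first, {})["X"] = second
    return result

a1, a2, x1, x2 = A(), A(), X(), X()
pairs = [(a1, x1), (x2, a2), (a1, a2)]  # the (a1, a2) pair is ignored
found = relations(pairs)
print(found[a1]["X"] is x1, found[a2]["X"] is x2)  # True True
```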
This is how my class looks:
class Item:
    def __init__(self, name, is_old, is_init):
        self.name = name
        self.is_old = is_old
        self.is_init = is_init
I have a list of these objects, and I want to sort them like this: if the is_init parameter is true, they have to be at the front of the list; if the is_old parameter is true, they have to be at the end of the list. The others should be in the middle. I would also like to generate a count for each category (how many have the is_old parameter true, how many have the is_init parameter true, etc.).
I have been using this:
is_init_count = sum(p.is_init == True for p in item_list)
is_old_count = sum(p.is_old == True for p in item_list)
other_count = len(item_list) - (is_init_count + is_old_count)
but I'm thinking there might be a more pythonic way and that this could be done together with the sorting.
You can sort with a key that returns a tuple of the properties you're interested in:
item_list = sorted(item_list, key=lambda x: (not x.is_init, x.is_old))
Because False sorts before True, items with is_init set get False as the first key component and move to the front, while items with is_old set get True as the second component and move to the back.
Next, you can loop through this list a single time and count all the properties you're interested in:
init = 0
old = 0
for i in item_list:
    if i.is_init:
        init += 1
    if i.is_old:
        old += 1
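The sketch below uses one category function both as the sort key and as the input to collections.Counter, so the ordering and the counts cannot disagree. The helper category and the sample items are hypothetical. The sketch assumes an item is never both is_init and is_old; if one were, is_init would win here.

```python
from collections import Counter

class Item:
    def __init__(self, name, is_old, is_init):
        self.name = name
        self.is_old = is_old
        self.is_init = is_init

def category(item):
    # 0 = init (front), 1 = other (middle), 2 = old (back)
    if item.is_init:
        return 0
    if item.is_old:
        return 2
    return 1

item_list = [Item("p", True, False), Item("q", False, False),
             Item("r", False, True), Item("s", True, False)]
item_list.sort(key=category)  # stable: equal categories keep their order
counts = Counter(category(item) for item in item_list)
print([item.name for item in item_list])  # ['r', 'q', 'p', 's']
print(counts[0], counts[1], counts[2])    # 1 1 2
```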
I have a list of the following kind:
class Ind(object):
    def __init__(self,ID,mate):
        self.ID=ID
        self.mate=mate
population=[Ind(8,None), Ind(1,2), Ind(20,3), Ind(2,1), Ind(12,None), Ind(3,20), Ind(10,11), Ind(11,10)]
You can think of this list population as a population of individuals who all have an ID. Some of them have a mate (an individual who is present in the same population, i.e. the same list). The mate value is actually the ID of the mate! Therefore, if there is an instance of Ind whose ID equals 12 and whose mate equals 34, then there is necessarily an individual in the list whose ID equals 34 and whose mate equals 12. Individuals that do not have a mate have None in the mate attribute. Does that make sense?
I'd like to sort this list so that the first individual mates with the last one, the second individual mates with the second-to-last individual, etc. Individuals whose mate attribute equals None should stand in the middle of the list.
There are many possible outputs that fit what I want. Here is one example of these outputs for the above list:
population=[Ind(1,2), Ind(20,3), Ind(10,11), Ind(8,None), Ind(12,None), Ind(11,10), Ind(3,20), Ind(2,1)]
You can try something like this:
def custom_sort(population):
    pop_dict = { ind.ID: ind for ind in population }
    start = []
    nones = []
    end = []
    for ind in population:
        if ind.mate is None:
            nones.append(ind)
        elif pop_dict[ind.mate] not in start:
            start.insert(0, ind)
            end.append(pop_dict[ind.mate])
    return start + nones + end
This assumes that "being a mate" is a one-to-one relation.
You just need a key for the sorting function. The following example requires that individuals are monogamous and not married to themselves. It also requires that if (a,b) is listed, (b,a) is also listed. If these prerequisites are not met and Ind(2,1) can occur without Ind(1,2), this function will place Ind(2,1) towards the end of the list. The first element of the key tuple is the type. "First" in a relationship (where ID < mate) comes first, singles (mate is None) come second, and "second" in a relationship (where ID > mate) comes third. The first and second types are sorted by their IDs; the last type is sorted in reverse order by its mate.
def keyfun(x):
    if x.mate is None:
        return (1, x.ID)
    elif x.ID < x.mate:
        return (0, x.ID)
    else:
        return (2, -x.mate)
population = sorted(population, key=keyfun)
Another way to handle this, still assuming that if (a,b) is in the list (b,a) will also be in the list, is to just preprocess by removing (b,a) cases, then postprocess by adding them back in in reverse order.
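The preprocess/postprocess idea could be sketched like this. Keep only the half of each couple with ID < mate and sort those by ID. Then place the singles in the middle and append the removed halves in reverse order. The function name mirror_sort is hypothetical, and the sketch relies on the same assumption that every (a, b) has a matching (b, a).

```python
class Ind(object):
    def __init__(self, ID, mate):
        self.ID = ID
        self.mate = mate

    def __repr__(self):
        return 'Ind({},{})'.format(self.ID, self.mate)

def mirror_sort(population):
    by_id = {ind.ID: ind for ind in population}
    # Preprocess: drop the (b, a) half of every couple and sort the rest by ID.
    firsts = sorted((ind for ind in population
                     if ind.mate is not None and ind.ID < ind.mate),
                    key=lambda ind: ind.ID)
    singles = [ind for ind in population if ind.mate is None]
    # Postprocess: add the dropped halves back in, in reverse order.
    seconds = [by_id[ind.mate] for ind in reversed(firsts)]
    return firsts + singles + seconds

population = [Ind(8,None), Ind(1,2), Ind(20,3), Ind(2,1),
              Ind(12,None), Ind(3,20), Ind(10,11), Ind(11,10)]
print(mirror_sort(population))
# [Ind(1,2), Ind(3,20), Ind(10,11), Ind(8,None), Ind(12,None), Ind(11,10), Ind(20,3), Ind(2,1)]
```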
How about this: split the list into three lists: one with ID < mate, one with ID > mate, and one where mate is None. Then concatenate the sorted lists. The ID < mate list and the singles are sorted by ID. The ID > mate list is sorted by mate in descending order, so that each individual ends up opposite its mate.
I've added a __repr__ method to the Ind class for output readability.
class Ind(object):
    def __init__(self,ID,mate):
        self.ID=ID
        self.mate=mate
    def __repr__(self):
        return 'Ind({},{})'.format(self.ID,self.mate)
population=[Ind(8,None), Ind(1,2), Ind(2,3), Ind(2,1), Ind(12,None), Ind(3,2), Ind(10,11), Ind(11,10)]
def custom_sort(pop):
    singles, less, more = [], [], []
    for p in pop:
        if p.mate is None:
            singles.append(p)
        elif p.ID < p.mate:
            less.append(p)
        elif p.ID > p.mate:
            more.append(p)
    by_id = lambda x: x.ID
    # Sorting "more" by mate (descending) mirrors "less", which is sorted by ID.
    by_mate = lambda x: x.mate
    return sorted(less, key=by_id) + sorted(singles, key=by_id) + sorted(more, key=by_mate, reverse=True)
print(custom_sort(population))
This outputs:
[Ind(1,2), Ind(2,3), Ind(10,11), Ind(8,None), Ind(12,None), Ind(11,10), Ind(3,2), Ind(2,1)]
There is a lot you can do with custom key functions:
def my_key(ind):
    if ind.mate is None:
        return 0
    if ind.ID < ind.mate:
        return -ind.ID - 1
    else:
        return ind.mate + 1
population.sort(key=my_key)
This assumes that IDs will never be negative. If IDs are always greater than 0, you can discard the - 1 and + 1.