Ranking a list without breaking ties in Python

I need to attribute ranks to elements of a list while making sure tied elements get the same rank.
For instance:
data = [[1],[3],[2],[2]]
c = 0
for i in sorted(data, reverse=True):
    i.append(c+1)
    c += 1
print data
returns:
[[1, 4], [3, 1], [2, 2], [2, 3]]
Where a rank is appended to the score.
What would I need to change in this simple code to instead obtain:
[[1, 3], [3, 1], [2, 2], [2, 2]]
Where elements scoring 2 are tied and both obtain the second place, while 1, the previously fourth place, is promoted to third place?

Using itertools.groupby, enumerate:
>>> from itertools import groupby
>>> data = [[1],[3],[2],[2]]
>>> sorted_data = sorted(data, reverse=True)
>>> for rank, (_, grp) in enumerate(groupby(sorted_data, key=lambda xs: xs[0]), 1):
...     for x in grp:
...         x.append(rank)
...
>>> print data
[[1, 3], [3, 1], [2, 2], [2, 2]]
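For Python 3, where print is a function, the same dense ranking can also be computed without mutating the input. This is only a minimal sketch, not the answer's code: dense_rank is a hypothetical helper name, and it assumes the score is the first element of each inner list.

```python
def dense_rank(scores):
    """Map each distinct score to its rank (1 = highest); ties share a rank."""
    ordered = sorted(set(scores), reverse=True)
    return {score: rank for rank, score in enumerate(ordered, 1)}

data = [[1], [3], [2], [2]]
ranks = dense_rank(row[0] for row in data)

# Build new rows instead of appending to the originals.
ranked = [row + [ranks[row[0]]] for row in data]
print(ranked)  # [[1, 3], [3, 1], [2, 2], [2, 2]]
```

Because the ranks live in a dictionary keyed by score, the original order of data is preserved and data itself is left unchanged.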

Related

Python list remove does opposite of condition specified

I'm trying to remove the elements where cost > gas, but the opposite seems to happen; this is what gets printed:
[[2, 4], [4, 1], [5, 2]]
def canCompleteCircuit(gas,cost):
    pair=[[a,b] for a,b in zip(gas,cost)] #gas,cost
    for a in pair:
        if(a[1]>a[0]):
            pair.remove(a)
    print(pair)
gas=[1,2,3,4,5]
cost = [3,4,5,1,2]
canCompleteCircuit(gas,cost)
Removing items from a list while iterating over it makes the loop skip elements. Instead of removing, try to create a new list:
>>> [[g,c] for g, c in zip(gas, cost) if g >= c]
[[4, 1], [5, 2]]
Why don't you use a simple list comprehension?
out = [[g,c] for g,c in zip(gas,cost) if g>=c]
output: [[4, 1], [5, 2]]
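If the list really has to be modified with remove, one common workaround is to iterate over a copy, so that deletions do not shift the items still to be visited. A minimal sketch, assuming the same gas and cost lists as in the question; remove_unaffordable is a hypothetical name:

```python
def remove_unaffordable(gas, cost):
    pairs = [[g, c] for g, c in zip(gas, cost)]
    # pairs[:] is a shallow copy, so removing from pairs
    # does not disturb the sequence being iterated.
    for pair in pairs[:]:
        if pair[1] > pair[0]:
            pairs.remove(pair)
    return pairs

gas = [1, 2, 3, 4, 5]
cost = [3, 4, 5, 1, 2]
print(remove_unaffordable(gas, cost))  # [[4, 1], [5, 2]]
```

The list comprehension is still usually the simpler choice; this variant only shows why the original loop misbehaved.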

Fastest way to find all elements that maximize / minimize a function in a Python list

Let's use a simple example: say I have a list of lists
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
and I want to find all longest lists, which means all lists that maximize the len function. Of course we can do
def func(x):
    return len(x)
maxlen = func(max(ll, key=lambda x: func(x)))
res = [l for l in ll if func(l) == maxlen]
print(res)
Output
[[1, 2, 3], [2, 3, 4]]
But I wonder if there is a more efficient way to do this, especially when the function is very expensive or the list is very long. Any suggestions?
From a computer science/algorithms perspective, this is a classic "reduce" problem: a single pass over the list is enough.
Here it is in pseudocode; it's honestly very straightforward.
metric() := a mapping from elements to non-negative numbers
winner = []
maxmetric = 0
for element in ll:
    if metric(element) larger than maxmetric:
        winner = [ element ]
        maxmetric = metric(element)
    else if metric(element) equal to maxmetric:
        append element to winner
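The pseudocode above could be written in Python roughly as follows. This is a sketch under a few assumptions: all_max is a hypothetical name, the metric is evaluated once per element and stored in a local variable, and None is used as the starting value so the metric does not have to be non-negative.

```python
def all_max(items, metric):
    """Return every item that maximizes metric, calling metric once per item."""
    winners = []
    best = None
    for item in items:
        value = metric(item)  # the only call to metric for this item
        if best is None or value > best:
            winners = [item]
            best = value
        elif value == best:
            winners.append(item)
    return winners

ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
print(all_max(ll, len))  # [[1, 2, 3], [2, 3, 4]]
```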
Regarding "when the function is very expensive":
Note that you compute func(x) twice for each element, first here
maxlen = func(max(ll, key=lambda x: func(x)))
then here
res = [l for l in ll if func(l) == maxlen]
so it would be beneficial to store what was already computed. functools.lru_cache makes that easy; just replace
def func(x):
    return len(x)
with
import functools
@functools.lru_cache(maxsize=None)
def func(x):
    return len(x)
However, beware: because of how the cached results are stored, the argument(s) must be hashable, so in your example you would first need to convert the lists, e.g. to tuples:
ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
See the description in the docs for further discussion.
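To check that the cache really avoids the second round of computation, the calls can be counted. This is an illustrative sketch only; the calls counter is an addition that is not part of the answer, and len stands in for an expensive function.

```python
import functools

calls = 0  # illustrative counter, not needed in real code

@functools.lru_cache(maxsize=None)
def func(x):
    global calls
    calls += 1
    return len(x)

# Tuples instead of lists, because cached arguments must be hashable.
ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
maxlen = func(max(ll, key=func))
res = [l for l in ll if func(l) == maxlen]

print(res)    # [(1, 2, 3), (2, 3, 4)]
print(calls)  # 5: one real computation per distinct element
```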
Is it not OK to use a dictionary, like below? (This is O(n).)
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
from collections import defaultdict
dct = defaultdict(list)
for l in ll:
    dct[len(l)].append(l)
dct[max(dct)]
Output:
[[1, 2, 3], [2, 3, 4]]
>>> dct
defaultdict(list, {2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]})
Or use setdefault without defaultdict, like below:
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
dct = {}
for l in ll:
    dct.setdefault(len(l), []).append(l)
Output:
>>> dct
{2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]}

remove duplicates from 2d lists regardless of order [duplicate]

I have a 2d list
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
How can I get the result:
result = [[1,2],[1,3],[2,3]]
Where duplicates are removed regardless of the order of the elements within the inner lists.
In [3]: b = []
In [4]: for aa in a:
   ...:     if not any([set(aa) == set(bb) for bb in b if len(aa) == len(bb)]):
   ...:         b.append(aa)
In [5]: b
Out[5]: [[1, 2], [1, 3], [2, 3]]
Try using a set to keep track of what lists you have seen:
from collections import Counter
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2], [1, 2, 1]]
seen = set()
result = []
for lst in a:
    current = frozenset(Counter(lst).items())
    if current not in seen:
        result.append(lst)
        seen.add(current)
print(result)
Which outputs:
[[1, 2], [1, 3], [2, 3], [1, 2, 1]]
Note: Since lists are not hashable, you can store a frozenset of each list's Counter items to detect orderless duplicates. This removes the need to sort at all.
You can try
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
aa = [tuple(sorted(elem)) for elem in a]
set(aa)
output
{(1, 2), (1, 3), (2, 3)}
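Note that set(aa) does not guarantee any order and yields tuples rather than the original lists. If that matters, the sorted tuple can be used purely as a lookup key. A sketch, with unique_unordered as a hypothetical helper name:

```python
def unique_unordered(rows):
    """Keep the first occurrence of each row, ignoring element order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row))  # hashable, order-independent key
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
print(unique_unordered(a))  # [[1, 2], [1, 3], [2, 3]]
```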
The 'Set' concept would come in handy here. The list you have (which contains duplicates) can be converted to a Set (which will never contain a duplicate).
Find more about Sets here : Set
Example :
l = ['foo', 'foo', 'bar', 'hello']
A set can be created directly:
s = set(l)
Now if you check the contents of the set:
print(s)
>>> {'foo', 'bar', 'hello'}
A set works this way with any iterable of hashable objects; inner lists have to be converted to tuples or frozensets first.
Hope it helps!

Checking for duplicates in list of list and sorting them

I have a table containing:
table = [[5, 7],[4, 3],[3, 3],[2, 3],[1, 3]]
and the first value in each list (5, 4, 3, 2, 1) can be said to be the ID of a person. The second value (7, 3, 3, 3, 3) is a score. What I'm trying to do is detect duplicate values in the second column, which in this case are the 3s. Because four of the lists have 3 as the second value, I now want to sort them based on the first value.
In the table, notice that [1,3] has 1 as its first value; hence, it should take the position of [4,3] in the table. [2,3] should in turn take the position of [3,3].
Expected output: [[5,7],[1,3],[2,3],[3,3],[4,3]]
I attempted:
def checkDuplicate(arr):
    i = 0
    while (i<len(arr)-1):
        if arr[i][1] == arr[i+1][1] and arr[i][0] > arr[i+1][0]:
            arr[i],arr[i+1] = arr[i+1],arr[i]
        i+=1
    return arr
checkDuplicate(table)
The code doesn't produce the output I wanted, and I would appreciate some help on this matter.
You can use sorted with a key.
table = [[5, 7], [4, 3], [3, 3], [2, 3], [1, 3]]
# Sorts by second index in decreasing order and then by first index in increasing order
sorted_table = sorted(table, key=lambda x: (-x[1], x[0]))
# sorted_table: [[5, 7], [1, 3], [2, 3], [3, 3], [4, 3]]
You should sort the entire list by the second column, using the first to break ties. This has the advantage of correctly grouping the threes even when the seven is interspersed among them, e.g. something like
table = [[4, 3],[3, 3],[5, 7],[2, 3],[1, 3]]
In Python, you can do it with a one-liner:
result = sorted(table, key=lambda x: (-x[1], x[0]))
If you want an in-place sort, do
table.sort(key=lambda x: (-x[1], x[0]))
Another neat thing you can do in this situation is to rely on the stability of Python's sorting algorithm. The docs actually suggest doing multiple sorts in complex cases like this, in the reverse order of the keys. Using the functions from operator supposedly speeds up the code as well:
from operator import itemgetter
result = sorted(table, key=itemgetter(0))
result.sort(key=itemgetter(1), reverse=True)
The first sort will arrange the IDs in the correct order. The second will sort by score, in descending order, leaving the IDs undisturbed for identical scores since the sort is stable.
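As a quick check, the tuple-key sort and the two-pass stable sort can be compared on the table with the seven interspersed among the threes. This is only a sketch; the variable names by_tuple_key and by_two_passes are illustrative.

```python
from operator import itemgetter

table = [[4, 3], [3, 3], [5, 7], [2, 3], [1, 3]]

# One pass: score descending, then ID ascending.
by_tuple_key = sorted(table, key=lambda x: (-x[1], x[0]))

# Two passes: secondary key first, then primary key.
# reverse=True keeps the sort stable, so IDs stay ascending within equal scores.
by_two_passes = sorted(table, key=itemgetter(0))
by_two_passes.sort(key=itemgetter(1), reverse=True)

print(by_tuple_key)                   # [[5, 7], [1, 3], [2, 3], [3, 3], [4, 3]]
print(by_tuple_key == by_two_passes)  # True
```

The tuple-key version relies on the score being numeric so that it can be negated; the two-pass version works for any sortable score type.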
If you want to leave the list items with non-duplicate second elements untouched, and also handle cases where several different second values are duplicated, I think you'll need more than the built-in sort.
What my function achieves:
Say your list is: table = [[5, 7], [6, 1], [8, 9], [3, 1], [4, 3], [3, 3], [2, 3], [1, 3]]
It will not touch the items [5, 7] and [8, 9], but will reorder the remaining items among the positions that share the same second element. The result will be:
[[5, 7], [3, 1], [8, 9], [6, 1], [1, 3], [2, 3], [3, 3], [4, 3]]
Here is the code:
from collections import Counter
def secondItemSort(table):
    # First get your second values
    secondVals = [e[1] for e in table]
    # The second values that are duplicate
    dups = [k for k,v in Counter(secondVals).items() if v>1]
    # The indices of those duplicate second values
    indices = dict()
    for d in dups:
        for i, e in enumerate(table):
            if e[1]==d:
                indices.setdefault(d, []).append(i)
    # Now do the sort by swapping the items intelligently
    for dupVal, indexList in indices.items():
        sortedItems = sorted([table[i] for i in indexList])
        c = 0
        for i in range(len(table)):
            if table[i][1] == dupVal:
                table[i] = sortedItems[c]
                c += 1
    # And return the intelligently sorted list
    return table
Test
Let's test on a little bit more complicated table:
table = [[5, 7], [6, 1], [8, 9], [3, 1], [4, 3], [3, 9], [3, 3], [2, 2], [2, 3], [1, 3]]
Items that should stay in their places: [5, 7] and [2, 2].
Items that should be swapped:
[6, 1] and [3, 1].
[8, 9] and [3, 9]
[4, 3], [3, 3], [2, 3], [1, 3]
Drumroll...
In [127]: secondItemSort(table)
Out[127]:
[[5, 7],
[3, 1],
[3, 9],
[6, 1],
[1, 3],
[8, 9],
[2, 3],
[2, 2],
[3, 3],
[4, 3]]

Sort lists by items at given index

Sorry if I get terminology wrong - I've only just started learning Python, and I'm receiving instruction from friends instead of being on an actual course.
I want to search a list containing lots of arrays containing multiple elements, and find arrays with some elements matching, but some different.
In less confusing terms e.g. I have a list of arrays that each contain 2 elements (I think this is called a 2D array?) so:
list = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2], ...]
In my specific example, the first elements in each sub array just ascend linearly, but the second elements are almost random. I want to find or sort the arrays only by the second number. I could just remove the first number from each array:
list = [2, 2, 5, 1, 2 ...]
And then use something like "if list[x] == 1" to find '1' etc.
(side note: I'm not sure how to find ALL the values if one value is repeated - I can't remember quite what I wrote but it would only ever find the first instance where the value matched, so e.g. it would detect the first '2' but not the second or third)
But I want to keep the first values in each array. My friend told me that you could use a dictionary with values and keys, which would work for my example, but I want to know what the more general method would be.
So in my example, I hoped that if I wrote this:
if list[[?, x]] == [?, 1]
Then it would find the array where the second value of the array was 1, (i.e. [4, 1] in my example) and not care about the first value. Obviously it didn't work because '?' isn't Python syntax as far as I'm aware, but hopefully you can see what I'm trying to do?
So for a more general case, if I had a list of 5 dimensional arrays and I wanted to find the second and fourth values of each array, I would write:
if list[[?, x, ?, y, ?]] == [?, a, ?, b, ?]
And it would match any array where the value of the second element was 'a', and the value of the fourth was 'b'.
e.g. [3, a, 4, b, 7], [20, a, 1, b, 9], ['cat', a, 'dog', b, 'fish'] etc. would all be possible results found by the command.
So I want to know if there's any similar way to my method of using a question mark (but that actually works) to denote that an element in an array can have any value.
To sort on the second element for a list containing lists (or tuples):
from operator import itemgetter
mylist = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
sortedlist = sorted(mylist, key=itemgetter(1))
See the Python sorting howto.
Use sorted if you want to keep the original list unaffected:
lst = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
In [103]: sorted(lst, key=lambda x: x[1])
Out[103]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
Otherwise use list.sort to sort the list in place:
In [106]: lst.sort(key=lambda x: x[1])
In [107]: lst
Out[107]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
or use operator.itemgetter
from operator import itemgetter
In [108]: sorted(lst, key=itemgetter(1))
Out[108]: [[4, 1], [1, 2], [2, 2], [5, 2], [3, 5]]
You could use a list comprehension to build a list of all the desired items:
In [16]: seq = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
To find all items where the second element is 1:
In [17]: [pair for pair in seq if pair[1] == 1]
Out[17]: [[4, 1]]
This finds all items where the second element is 2:
In [18]: [pair for pair in seq if pair[1] == 2]
Out[18]: [[1, 2], [2, 2], [5, 2]]
Instead of
if list[[?, x, ?, y, ?]] == [?, a, ?, b, ?]
you could use
[item for item in seq if item[1] == 'a' and item[3] == 'b']
Note, however, that each time you use a list comprehension, Python has to loop
through all the elements of seq. If you are doing this search multiple times,
you might be better off building a dict:
import collections
seq = [[1, 2], [2, 2], [3, 5], [4, 1], [5, 2]]
dct = collections.defaultdict(list)
for item in seq:
    key = item[1]
    dct[key].append(item)
And then you could access the items like this:
In [22]: dct[1]
Out[22]: [[4, 1]]
In [23]: dct[2]
Out[23]: [[1, 2], [2, 2], [5, 2]]
The list comprehension
[pair for pair in seq if pair[1] == 1]
is roughly equivalent to
result = list()
for pair in seq:
    if pair[1] == 1:
        result.append(pair)
in the sense that result would then equal the list comprehension.
The list comprehension is just a syntactically prettier way to express the same
thing.
The list comprehension above has three parts:
[expression for-loop conditional]
The expression is pair, the for-loop is for pair in seq, and the conditional is if pair[1] == 1.
Most, but not all list comprehensions share this syntax. The full list comprehension grammar is given here.
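Something close to the question-mark idea from the question can be built with a sentinel object that stands for "any value". This is only a sketch: ANY and matches are hypothetical names, not part of Python or of the answers above.

```python
ANY = object()  # hypothetical wildcard: matches any value in that position

def matches(item, pattern):
    """True if item has the same length as pattern and agrees on every non-ANY slot."""
    return len(item) == len(pattern) and all(
        p is ANY or p == v for v, p in zip(item, pattern)
    )

seq = [
    [3, 'a', 4, 'b', 7],
    [20, 'a', 1, 'b', 9],
    [1, 'x', 2, 'b', 3],
]
found = [item for item in seq if matches(item, [ANY, 'a', ANY, 'b', ANY])]
print(found)  # [[3, 'a', 4, 'b', 7], [20, 'a', 1, 'b', 9]]
```

For a fixed pair of positions, the plain comprehension with item[1] == 'a' and item[3] == 'b' is simpler; the pattern form is mainly useful when the positions to compare vary.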
