Say I have a list:
my_list = ['foo', 'fa', 'goo']
I would like to turn this list into this:
[(1, 'foo'), (2, 'fa'), (3, 'goo')]
This way, I could iterate over the list and see each item's position in it. Any help would be appreciated; I have been wondering for a long time which function does this, but I don't know exactly what to search for to find the answer.
The built-in function enumerate(iterable) is made to do literally this:
new_list = list(enumerate(my_list))
# [(0, 'foo'), (1, 'fa'), (2, 'goo')]
Giving a second argument to enumerate() will let you choose what index to start at, so you can 1-index:
new_list = list(enumerate(my_list, 1))
# [(1, 'foo'), (2, 'fa'), (3, 'goo')]
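If the goal is only to iterate while knowing each item's number, there is no need to build the list first. Here is a minimal sketch using the same my_list, passing start by keyword for readability:

```python
my_list = ['foo', 'fa', 'goo']

# enumerate is lazy: it yields (index, value) pairs one at a time,
# so no intermediate list is built. start=1 makes the count 1-based.
for position, value in enumerate(my_list, start=1):
    print(f"{position}: {value}")
# prints:
# 1: foo
# 2: fa
# 3: goo

# Materialize the pairs only when an actual list is needed
new_list = list(enumerate(my_list, start=1))
```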
You can alternatively use a list comprehension to 1-index it, if you need to:
new_list = [(i+1, v) for (i, v) in enumerate(my_list)]
# [(1, 'foo'), (2, 'fa'), (3, 'goo')]
All, I have a list of lists of tuples here:
A =[[(1, 52), (1, 12), (-1, -1)],[(-1, 23), (1, 42), (-1, -1)],[(1, -1), (-1, -1), (1, 42)]]
I want to get the tuples containing the max value of the second element of the tuple, column-wise.
I tried accessing columns like this:
A[:,2]
But I get the error
TypeError: list indices must be integers, not tuple
Thanks in advance. Please let me know if you need any other information.
Edit 1:
Desired output:
[(1, 52),(1, 42),(1, 42)]
[max(a,key=lambda x:x[1]) for a in zip(*A)]
output:
[(1, 52), (1, 42), (1, 42)]
Let me know if this works for you and I will explain the answer.
You can access columns like this:
>>> list(zip(*A))[0]
((1, 52), (-1, 23), (1, -1))
>>> list(zip(*A))[1]
((1, 12), (1, 42), (-1, -1))
Explanation
zip: https://docs.python.org/3/library/functions.html#zip
>>> x=[1,2,3]
>>> y=['a','b','c']
>>> z=['first','second','third']
>>> list(zip(x,y,z))
[(1, 'a', 'first'), (2, 'b', 'second'), (3, 'c', 'third')]
Now imagine x, y, z are the rows you had in A. zip(x, y, z) returns the 1st elements together, then the 2nd elements, then the 3rd elements, and so on, thereby giving us the columns of the rows we passed in.
Note: zip takes each iterable as a separate argument, so the rows must be passed separately, like zip(x, y, z), not as a single list like zip([x, y, z]). That is what *A does: it unpacks A so that each row is passed to zip as its own argument.
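The effect of the * can be checked directly. Unpacking A passes each row as a separate argument, so zip(*A) is the same call as zip(A[0], A[1], A[2]). A small sketch using the A from the question:

```python
A = [[(1, 52), (1, 12), (-1, -1)],
     [(-1, 23), (1, 42), (-1, -1)],
     [(1, -1), (-1, -1), (1, 42)]]

# *A unpacks the outer list, so these two calls are identical
columns = list(zip(*A))
assert columns == list(zip(A[0], A[1], A[2]))

# Without the star, zip sees a single iterable and wraps each row in a 1-tuple
assert list(zip(A)) == [(A[0],), (A[1],), (A[2],)]

for col in columns:
    print(col)
# ((1, 52), (-1, 23), (1, -1))
# ((1, 12), (1, 42), (-1, -1))
# ((-1, -1), (-1, -1), (1, 42))
```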
Now we have the columns.
max: https://docs.python.org/3/library/functions.html#max
max(1,2) #Will return 2
max(cars,key=lambda x:x.speed) #Will give you the fastest car
max(cars,key=lambda x:x.capacity) #Will give you the biggest passenger car
max(tups,key=lambda x:x[1]) #Will give you the tuple with biggest 2nd element
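The cars objects above are only illustrative. Here is a runnable sketch, assuming a hypothetical Car namedtuple with speed and capacity fields. Note that key has to be passed by keyword; a second positional argument would be treated as another item to compare.

```python
from collections import namedtuple

# Hypothetical record type, only for illustration
Car = namedtuple("Car", ["name", "speed", "capacity"])

cars = [Car("hatchback", 180, 5), Car("van", 150, 8), Car("coupe", 240, 2)]

fastest = max(cars, key=lambda c: c.speed)        # compares by speed only
roomiest = max(cars, key=lambda c: c.capacity)    # compares by capacity only

tups = [(1, 12), (1, 42), (-1, -1)]
biggest_second = max(tups, key=lambda t: t[1])    # compares by 2nd element

print(fastest.name, roomiest.name, biggest_second)
# coupe van (1, 42)
```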
List comprehension: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
A=[1,2,3]
[x**2 for x in A] #Will give you [1,4,9]
[x**3 for x in A] #Will give you [1,8,27]
Finally,
[max(a,key=lambda x:x[1]) for a in zip(*A)]
gives you the max for each column!
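For readers who find the one-liner dense, it unrolls into the explicit loop below. This is a sketch using the question's A, and it behaves the same as the comprehension:

```python
A = [[(1, 52), (1, 12), (-1, -1)],
     [(-1, 23), (1, 42), (-1, -1)],
     [(1, -1), (-1, -1), (1, 42)]]

result = []
for column in zip(*A):                  # each column is a tuple of tuples
    # pick the tuple whose second element is largest
    result.append(max(column, key=lambda x: x[1]))

print(result)  # [(1, 52), (1, 42), (1, 42)]
```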
You can try this:
A =[[(1, 52), (1, 12), (-1, -1)],[(-1, 23), (1, 42), (-1, -1)],[(1, -1), (-1, -1), (1, 42)]]
new_A = [max(a, key=lambda x: x[-1]) for a in zip(*A)]
Output:
[(1, 52), (1, 42), (1, 42)]
A is a list of lists of tuples. Basic Python does not recognise multiple-element subscripting, although NumPy and similar modules extend it. Your subscript expression :,2 is therefore interpreted as a tuple whose first element is a slice and whose second element is an integer, which (as the message explains) is not acceptable as a list index.
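You can see this interpretation by giving a throwaway class a __getitem__ that just returns whatever subscript it receives. The Probe class is hypothetical and purely for illustration. Plain lists reject that tuple, and a list comprehension is the usual pure-Python way to pull out one column:

```python
class Probe:
    """Hypothetical helper: returns the raw subscript it is given."""
    def __getitem__(self, key):
        return key

key = Probe()[:, 2]
print(key)  # (slice(None, None, None), 2)

A = [[(1, 52), (1, 12), (-1, -1)],
     [(-1, 23), (1, 42), (-1, -1)],
     [(1, -1), (-1, -1), (1, 42)]]

try:
    A[:, 2]
except TypeError as exc:
    print("TypeError:", exc)   # exact wording varies between Python versions

# Column 2 of every row, without NumPy
column_2 = [row[2] for row in A]
print(column_2)  # [(-1, -1), (-1, -1), (1, 42)]
```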
Unfortunately, "the tuples containing the max values in second element of the tuple, column-wise" isn't a terribly good description of the actual desired result.
I presume the answer you would like is [(1, 52), (1, 42), (1, 42)].
One relatively simple way to achieve this is to sort each of the sub-lists separately and take the last element of each. This could be spelled as:
result = [sorted(x, key=lambda z: z[1])[-1] for x in A]
The key argument to the sorted function ensures that each list is sorted on its second element, the [-1] subscript takes the last (and therefore highest) element of the sorted list, and the for x in A ensures that each element of the output corresponds to an element (i.e., a list of three tuples) of the input.
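Here are two remarks on this approach, sketched with the question's data. First, max with the same key returns the same element as sorted(...)[-1] without sorting the whole sub-list (a linear scan instead of O(n log n)). On ties, though, max keeps the first maximal item while sorted(...)[-1] keeps the last. Second, because for x in A walks over rows, this computes a per-row maximum. That coincides with the column-wise answer for this particular A, but iterating over zip(*A) instead gives the column-wise version in general:

```python
A = [[(1, 52), (1, 12), (-1, -1)],
     [(-1, 23), (1, 42), (-1, -1)],
     [(1, -1), (-1, -1), (1, 42)]]

def second(t):
    return t[1]

row_sorted = [sorted(x, key=second)[-1] for x in A]   # per row, via sorting
row_max = [max(x, key=second) for x in A]             # per row, linear scan
col_max = [max(c, key=second) for c in zip(*A)]       # per column

print(row_sorted)  # [(1, 52), (1, 42), (1, 42)]
print(row_max)     # [(1, 52), (1, 42), (1, 42)]
print(col_max)     # [(1, 52), (1, 42), (1, 42)]

# On ties the two per-row variants can disagree
ties = [(1, 5), (2, 5)]
print(sorted(ties, key=second)[-1], max(ties, key=second))  # (2, 5) (1, 5)
```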
This question is an extension of "What's the most Pythonic way to identify consecutive duplicates in a list?"
Suppose you have a list of tuples:
my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]
and you sort it by each tuple's last value:
my_list = sorted(my_list, key=lambda tuple: tuple[1])
# [(3,2), (5,2), (2,3), (1,4), (4,4)]
then we have two consecutive runs (looking at the last value in each tuple), namely [(3,2), (5,2)] and [(1,4), (4,4)].
What is the Pythonic way to reverse each run (not the tuples within), e.g.
reverse_runs(my_list)
# [(5,2), (3,2), (2,3), (4,4), (1,4)]
Is this possible to do within a generator?
UPDATE
It has come to my attention that perhaps the example list was not clear. So instead consider:
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
Where the ideal output from reverse_runs would be
[(7,"A"), (6,"A"), (1,"A"), (2,"B"), (3,"C"), (4,"C"), (5,"C"), (8,"D")]
To be clear on terminology, I am using "run" in the sense used to describe TimSort, the algorithm Python's sort function is based on, which gives the sort its stability.
Thus, if you sort a collection of multi-part items on one dimension, only that dimension is compared, and if two elements are equal in that dimension, their relative order is not altered.
Thus the following function:
sorted(my_list,key=lambda t: t[1])
yields:
[(1, 'A'), (6, 'A'), (7, 'A'), (2, 'B'), (5, 'C'), (4, 'C'), (3, 'C'), (8, 'D')]
and the run on "C" (i.e. (5, 'C'), (4, 'C'), (3, 'C') ) is not disturbed.
So, in conclusion, the yet-to-be-defined function reverse_runs should:
1.) sort the tuples by their last element
2.) reverse the order of the tuples within each run on the last element, keeping the sorted order of the runs themselves
Ideally I would like this as a generator function, but that does not (to me, at the moment) seem possible.
Thus one could adopt the following strategy:
1.) Sort the tuples by the last element via sorted(my_list, key=lambda tuple: tuple[1])
2.) Identify the runs, i.e. the indexes i where the last element of tuple i+1 differs from the last element of tuple i
3.) Make an empty list
4.) Using the slice operator, obtain each run, reverse it, and then append it to the list
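Here is a minimal sketch of that four-step plan. It assumes runs are defined by equality of the last element, and the name reverse_runs_by_slicing is hypothetical. It uses extend rather than append in step 4 so the result stays flat:

```python
def reverse_runs_by_slicing(items):
    # 1.) Sort by the last element; the sort is stable, so each run keeps input order
    s = sorted(items, key=lambda t: t[-1])

    # 2.) Run boundaries: 0, every i+1 whose key differs from position i,
    #     plus len(s) as a sentinel end
    starts = [0] + [i + 1 for i in range(len(s) - 1) if s[i][-1] != s[i + 1][-1]]
    bounds = starts + [len(s)]

    # 3.) Empty output list
    result = []

    # 4.) Slice out each run, reverse it, and add its items to the output
    for begin, end in zip(bounds, bounds[1:]):
        result.extend(reversed(s[begin:end]))
    return result


print(reverse_runs_by_slicing([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)]))
# [(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
```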
I think this will work.
my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]
my_list = sorted(my_list, key=lambda tuple: (tuple[1], -tuple[0]))
print(my_list)
Output
[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
I misunderstood the question. This is less pretty, but it should work for what you really want:
from itertools import groupby
from operator import itemgetter
def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))
    reversed_runs = [e for sublist in reversed_groups for e in sublist]
    return reversed_runs
if __name__ == '__main__':
    print(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)]))
    print(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")]))
Output
[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
[(7, 'A'), (6, 'A'), (1, 'A'), (2, 'B'), (3, 'C'), (4, 'C'), (5, 'C'), (8, 'D')]
Generator version:
from itertools import groupby
from operator import itemgetter
def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))
    for group in reversed_groups:
        yield from group
if __name__ == '__main__':
    print(list(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)])))
    print(list(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")])))
The most general case requires 2 sorts. The first sort is a reversed sort on the second criterion. The second sort is a forward sort on the first criterion:
from operator import itemgetter
pass1 = sorted(my_list, key=itemgetter(0), reverse=True)
result = sorted(pass1, key=itemgetter(1))
We can sort in multiple passes like this because python's sort algorithm is guaranteed to be stable.
However, in real life it's often possible to simply construct a more clever key function which allows the sorting to happen in one pass. This usually involves "negating" one of the values and relying on the fact that tuples order themselves lexicographically:
result = sorted(my_list, key=lambda t: (t[1], -t[0]))
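Here is a quick check, on the first example from the question, that the two-pass sort and the negated one-pass key agree. This sketch assumes the first element is numeric, since -t[0] only makes sense for numbers:

```python
from operator import itemgetter

my_list = [(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)]

# Two passes: secondary key first (descending), then primary key (ascending);
# stability keeps the pass-1 order among equal primary keys
pass1 = sorted(my_list, key=itemgetter(0), reverse=True)
two_pass = sorted(pass1, key=itemgetter(1))

# One pass: negate the numeric secondary key so ascending order reverses it
one_pass = sorted(my_list, key=lambda t: (t[1], -t[0]))

print(two_pass)  # [(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
assert two_pass == one_pass
```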
In response to your update, it looks like the following might be a suitable solution:
from operator import itemgetter
from itertools import chain, groupby
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
pass1 = sorted(my_list, key=itemgetter(1))
result = list(chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1))))
print(result)
We can take apart the expression:
chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1)))
to try to figure out what it's doing...
First, let's look at groupby(pass1, key=itemgetter(1)). groupby yields 2-tuples. The first item (k) in each tuple is the "key", i.e. whatever itemgetter(1) returned. The key isn't really needed once the grouping has taken place, so we don't use it. The second item (g, for "group") is an iterable that yields the consecutive values sharing that key. These are exactly the items you requested, but they're in the order they had after sorting, and you requested them in reverse order. To reverse an arbitrary iterable, we can construct a list from it and then reverse the list, e.g. reversed(list(g)). Finally, we need to paste those chunks back together again, which is where chain.from_iterable comes in.
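You can inspect the pieces one at a time. This sketch materializes each intermediate step for the updated example; list() is used only so the steps can be printed:

```python
from itertools import chain, groupby
from operator import itemgetter

my_list = [(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")]
pass1 = sorted(my_list, key=itemgetter(1))

# groupby yields (key, group) pairs; each group must be consumed before
# groupby advances, so list(g) is taken immediately
groups = [(k, list(g)) for k, g in groupby(pass1, key=itemgetter(1))]
for k, g in groups:
    print(k, g)
# A [(1, 'A'), (6, 'A'), (7, 'A')]
# B [(2, 'B')]
# C [(5, 'C'), (4, 'C'), (3, 'C')]
# D [(8, 'D')]

# Reverse each run, then stitch the runs back together
reversed_runs = [list(reversed(g)) for _, g in groups]
result = list(chain.from_iterable(reversed_runs))
print(result)
# [(7, 'A'), (6, 'A'), (1, 'A'), (2, 'B'), (3, 'C'), (4, 'C'), (5, 'C'), (8, 'D')]
```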
If we want to get more clever, we might do better from an algorithmic standpoint (assuming that the "key" for the bins is hashable). The trick is to bin the objects in a dictionary and then sort the bins. This means that we're potentially sorting a much shorter list than the original:
from collections import defaultdict, deque
from itertools import chain
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
bins = defaultdict(deque)
for t in my_list:
    bins[t[1]].appendleft(t)
print(list(chain.from_iterable(bins[key] for key in sorted(bins))))
Note that whether this does better than the first approach is very dependent on the initial data. Since TimSort is such a beautiful algorithm, if the data starts already grouped into bins, then this algorithm will likely not beat it (though, I'll leave it as an exercise for you to try...). However, if the data is well scattered (causing TimSort to behave more like MergeSort), then binning first will possibly make for a slight win.
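You could set up that exercise along these lines. This is a sketch, not a benchmark result. It assumes randomly scattered integer data (make_data is a hypothetical helper), checks that both approaches agree, and leaves running the timings to you:

```python
import random
import timeit
from collections import defaultdict, deque
from itertools import chain, groupby
from operator import itemgetter


def by_groupby(items):
    # Sort by key, then reverse each run of equal keys
    s = sorted(items, key=itemgetter(1))
    return list(chain.from_iterable(reversed(list(g)) for _, g in groupby(s, key=itemgetter(1))))


def by_bins(items):
    # appendleft puts each bin in reverse input order; only the bin keys are sorted
    bins = defaultdict(deque)
    for t in items:
        bins[t[1]].appendleft(t)
    return list(chain.from_iterable(bins[k] for k in sorted(bins)))


def make_data(n, n_keys, seed=0):
    # Hypothetical generator: n tuples whose run keys are scattered at random
    rng = random.Random(seed)
    return [(i, rng.randrange(n_keys)) for i in range(n)]


data = make_data(10_000, 50)
assert by_groupby(data) == by_bins(data)

for fn in (by_groupby, by_bins):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=20))
```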