How to get the highest 4 tuple values? - python

I am trying to get the highest 4 values in a list of tuples and put them into a new list. However, if there are two tuples with the same value, I want to take the one with the lowest number.
The list originally looks like this:
[(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)...]
And I want the new list to look like this:
[(9,20), (3,16), (54, 13), (2,10)]
This is my current code. Any suggestions?
sorted_y = sorted(sorted_x, key=lambda t: t[1], reverse=True)[:5]
sorted_z = []
while n < 4:
    n = 0
    x = 0
    y = 0
    if sorted_y[x][y] > sorted_y[x+1][y]:
        sorted_z.append(sorted_y[x][y])
        print(sorted_z)
        print(n)
        n = n + 1
    elif sorted_y[x][y] == sorted_y[x+1][y]:
        a = sorted_y[x]
        b = sorted_y[x+1]
        if a > b:
            sorted_z.append(sorted_y[x+1][y])
        else:
            sorted_z.append(sorted_y[x][y])
        n = n + 1
        print(sorted_z)
        print(n)
Edit: By lowest value I mean: take the tuples with the highest second values, and if two tuples have the same second value, take the one with the lower first value.

How about groupby?
from itertools import groupby, islice
from operator import itemgetter
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
result = [sorted(group, key=itemgetter(0))[0] for key, group in islice(groupby(pre_sorted, key=itemgetter(1)), 4)]
print(result)
Output:
[(9, 20), (3, 16), (54, 13), (2, 10)]
Explanation:
This first sorts the data by the second element's value in descending order. groupby then puts them into groups where each tuple in the group has the same value for the second element.
Using islice, we take the top four groups and sort each by the value of the first element in ascending order. Taking the first value of each group, we arrive at our answer.
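For comparison, here is a minimal sketch of the same selection without groupby, assuming ties in the second value should keep the tuple with the smallest first value (as the question's edit says). It sorts on the compound key (-second, first), so each tie group is already in the desired order, then keeps the first tuple seen for each second value until four have been collected. The names ordered, seen_second and result are illustrative.

```python
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]

# Sort by second value descending, then by first value ascending,
# so within a tie the tuple with the lowest first value comes first.
ordered = sorted(data, key=lambda t: (-t[1], t[0]))

result = []
seen_second = set()
for t in ordered:
    if t[1] not in seen_second:  # keep only the first tuple per second value
        seen_second.add(t[1])
        result.append(t)
    if len(result) == 4:  # stop once the top four are collected
        break

print(result)  # [(9, 20), (3, 16), (54, 13), (2, 10)]
```

Encoding the tie-break in the sort key means no per-group sort is needed afterwards.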

You can try this:
l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
asv = set([i[1] for i in l]) # The set of unique second elements
new_l = [(min([i[0] for i in l if i[1]==k]),k) for k in asv]
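Output (in set iteration order, which is not guaranteed; the result is neither sorted by second value nor cut to the top four):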
[(3, 16), (2, 10), (9, 20), (54, 13)]
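A sketch that keeps the one-pass idea but also sorts and limits the result, assuming "top 4" means the four highest distinct second values. A dict (hypothetically named best_first) maps each second value to the smallest first value seen with it, so the list is scanned once instead of once per unique value.

```python
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]

# Map each second value to the smallest first value seen with it.
best_first = {}
for first, second in data:
    if second not in best_first or first < best_first[second]:
        best_first[second] = first

# Sort the unique second values from highest to lowest and keep the top 4.
top4 = [(best_first[s], s) for s in sorted(best_first, reverse=True)[:4]]
print(top4)  # [(9, 20), (3, 16), (54, 13), (2, 10)]
```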

Related

How to find the maximum per group in an rdd?

I'm using PySpark and I have an RDD that looks like this:
[
("MovieX", [(1, 100), (2, 20), (3, 50)]),
("MovieY", [(1, 100), (2, 250), (3, 100), (4, 120)]),
("MovieZ", [(1, 1000), (2, 250)]),
("MovieX", [(4, 50), (5, 10), (6, 0)]),
("MovieY", [(3, 0), (4, 260)]),
("MovieZ", [(5, 180)]),
]
The first element in the tuple represents the week number and the second element represents the number of viewers. I want to find the week with the most views for each movie, but ignoring the first week.
I've tried some things but nothing worked, for example:
stats.reduceByKey(max).collect()
returns:
[('MovieX', [(4, 50), (5, 10), (6, 0)]),
('MovieY', [(3, 0), (4, 260)]),
('MovieZ', [(5, 180)])]
so it returns an entire list per movie (max compares the lists lexicographically) rather than the best week.
Also this:
stats.groupByKey().reduce(max)
which returns just this:
('MovieZ', <pyspark.resultiterable.ResultIterable at 0x558f75eeb0>)
How can I solve this?
If you want the most views per movie, ignoring the first week, i.e. [('MovieA', 50), ('MovieC', 250), ('MovieB', 260)] for the data below, then you'll want your own map function rather than a reduce.
movie_stats = spark.sparkContext.parallelize([
("MovieA", [(1, 100), (2, 20), (3, "50")]),
("MovieC", [(1, 100), (2, "250"), (3, 100), (4, "120")]),
("MovieB", [(1, 1000), (2, 250)]),
("MovieA", [(4, 50), (5, "10"), (6, 0)]),
("MovieB", [(3, 0), (4, "260")]),
("MovieC", [(5, "180")]),
])
def get_views_after_first_week(v):
    values = iter(v)  # iterator over the lists of (week, views) tuples grouped under this key
    result = list()
    for x in values:
        result.extend([int(y[1]) for y in x if y[0] > 1])
    return result
mapped = movie_stats.groupByKey().mapValues(get_views_after_first_week).mapValues(max)
mapped.collect()
To include the week number, i.e. [('MovieA', (3, 50)), ('MovieC', (2, 250)), ('MovieB', (4, 260))]:
def get_max_weekly_views_after_first_week(v):
    values = iter(v)  # iterator over the lists of (week, views) tuples grouped under this key
    max_views = float('-inf')
    max_week = None
    for x in values:
        for t in x:
            week, views = t
            views = int(views)
            if week > 1 and views > max_views:
                max_week = week
                max_views = views
    return (max_week, max_views)
mapped = movie_stats.groupByKey().mapValues(get_max_weekly_views_after_first_week)
Some code is needed to convert the strings into ints and to apply a map function that 1) filters out week-1 data and 2) gets the week with the most views.
def helper(arr: list):
    max_week = None
    for sub_arr in arr:
        for item in sub_arr:
            if item[0] == 1:
                continue
            count = int(item[1])
            if max_week is None or max_week[1] < count:
                max_week = [item[0], count]
    return max_week
movie_stats.groupByKey().map(lambda x: (x[0], helper(x[1]))).collect()
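An alternative that avoids groupByKey (which shuffles every value for a key, whereas reduceByKey can combine values before the shuffle) is to flatten the lists and reduce pairwise. This is a sketch under assumptions: the helper names parse and more_views are hypothetical, the logic is checked locally in plain Python, and the Spark wiring in the trailing comment uses the RDD methods flatMapValues, mapValues, filter and reduceByKey but was not run here.

```python
from functools import reduce

def parse(pair):
    """Convert a (week, views) pair so that views is always an int."""
    week, views = pair
    return (week, int(views))

def more_views(a, b):
    """Return whichever (week, views) pair has more views (a wins ties)."""
    return a if a[1] >= b[1] else b

# Local check of the same logic without Spark, using the answer's data
rows = [
    ("MovieA", [(1, 100), (2, 20), (3, "50")]),
    ("MovieC", [(1, 100), (2, "250"), (3, 100), (4, "120")]),
    ("MovieB", [(1, 1000), (2, 250)]),
    ("MovieA", [(4, 50), (5, "10"), (6, 0)]),
    ("MovieB", [(3, 0), (4, "260")]),
    ("MovieC", [(5, "180")]),
]

per_movie = {}
for movie, weeks in rows:                # mirrors flatMapValues
    for pair in map(parse, weeks):       # mirrors mapValues(parse)
        if pair[0] > 1:                  # mirrors filter: drop week 1
            per_movie.setdefault(movie, []).append(pair)

best = {movie: reduce(more_views, pairs) for movie, pairs in per_movie.items()}
print(best)  # {'MovieA': (3, 50), 'MovieC': (2, 250), 'MovieB': (4, 260)}

# With Spark, the equivalent pipeline would be roughly:
# (movie_stats.flatMapValues(lambda weeks: weeks)
#             .mapValues(parse)
#             .filter(lambda kv: kv[1][0] > 1)
#             .reduceByKey(more_views)
#             .collect())
```

Under Spark the order in which reduceByKey combines values is not fixed, so a tie (MovieA has 50 views in weeks 3 and 4) may resolve to either week.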

Sort a list of tuples in consecutive order

I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output=[]
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1]  # Get the next element to search
print(output)
Is there a pythonic way to achieve such an order?
And is there a way to do it "in-place" (with only a list)?
In my problem, the input can be reordered in a circular way using all the tuples, so it does not matter which element is chosen first.
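Because the input is circular, any ordering can be checked with a small helper such as the hypothetical is_consecutive below. It is a sketch that assumes the wrap-around from the last tuple back to the first must also match, as it does in the example output.

```python
def is_consecutive(pairs):
    """Return True if each tuple's first element equals the previous tuple's
    last element, including the wrap-around from the last tuple to the first."""
    if not pairs:
        return True
    # Pair every tuple with its successor; pairs[:1] closes the cycle
    successors = pairs[1:] + pairs[:1]
    return all(a[-1] == b[0] for a, b in zip(pairs, successors))

print(is_consecutive([(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]))  # True
print(is_consecutive([(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]))  # False
```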
Assuming the tuples in your list form a cycle, you can use a dict to achieve this in O(n) time:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input) # Convert list of `tuples` to dict
elem = input[0][0] # start point in the new list
new_list = [] # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values, and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
If two tuples start with the same number, you will lose one of them, and if the chain closes before all the start numbers are used, the lookup start_dict[start] raises a KeyError.
But maybe this is something to build on.
There are many open questions about what output you intend to have and what should happen if the input list does not have the structure you need.
Assume the input consists of pairs where each number appears exactly twice. Then we can consider the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you suppose that this graph is cyclic and looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows that the graph can be stored more compactly as the list [10, 7, 13, 4, 9]. And here is the script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
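The last step, turning the compact node list back into pairs, can also be written with zip. A small sketch, assuming output_in_place holds the node list that the loop above produces for this input:

```python
output_in_place = [10, 7, 13, 4, 9]

# Rotate the node list by one and zip it with itself: each node is paired
# with its successor, and the last node wraps around to the first.
output = list(zip(output_in_place, output_in_place[1:] + output_in_place[:1]))
print(output)  # [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```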
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
chosen_tuples = input[:1]  # Start from the first
first_values = defaultdict(list)  # missing keys start as empty lists
for tup in input[1:]:
    first_values[tup[0]].append(tup)
while first_values:  # Loop will end when all lists are removed
    value = chosen_tuples[-1][1]  # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value]  # List empty, remove it
You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]] # output contains the first element of input
temp = input[1:] # temp contains the rest of elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop()  # We compare each element with output[-1]
    output.append(item)  # We add the right item to output
    temp.remove(item)  # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
Here is a solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst
output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
def match_tuples(input):
    # making a copy so as not to mess up the original one
    tuples = input[:]  # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0)  # (10,7)
    # { "first tuple's element": "index in list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)}  # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem  # yields the first element
    for i in range(len(tuples)):
        # find the index of the tuple whose first element matches the second element of the last tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index]  # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
This is a variant (less efficient than the dictionary version) where the list is changed in-place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl)-1):  # iterate over the indices of the list
    item = tpl[i-1]  # the last tuple that is already in place
    for j, next_item in enumerate(tpl[i:]):  # find the next item
        # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]  # now swap the items
print(tpl)
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
The list is changed in-place; the dictionary only contains the starting values of the tuples and their index in the list.
To get an O(n) algorithm, one needs to make sure not to do a double loop over the array. One way to do this is to keep already processed values in some sort of lookup table (a dict would be a good choice).
For example, something like this (I hope the inline comments explain the functionality well). This modifies the list in-place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found, saving all
    # other items in the dictionary so we avoid unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item

How to merge repeated elements in list in python?

I have a list of coordinates like:
list_coordinate =[(9,0),(9,1),(9,3) ... (53,0),(53,1),(53,3)...(54,0),(54,1)..]
value = []
for m in range(0,len(list_coordinate)):
    if m != len(list_coordinate)-1:
        if list_coordinate[m][0]==list_coordinate[m+1][0]:
            value.append(list_coordinate[m][0])
Output of this code:
value = [9,9 ,9,...,53,53,53,...,54,54,54,54...]
I want to merge the repeated elements of this value list and get this output:
Expected output:
[9,53,54]
If you prefer one-liners, you can do it like this:
list(set(map(lambda x: x[0], list_coordinate)))
It will output:
[9, 53, 54]
Note: As set is being used in the code, ordering of the elements is not guaranteed here.
You can use itertools.groupby:
from itertools import groupby
value = [9, 9, 9, 53, 53, 53, 54, 54, 54, 54]
g = [k for k,_ in groupby(value)]
print(g)
which produces
[9, 53, 54]
and it is guaranteed to be in the same order as the input list (if it matters).
Basically
groupby(iterable[, keyfunc])
groups the elements in the iterable, starting a new group whenever the value of the key function changes.
If the key function is omitted, the identity function is assumed, and the key for the group will be each element encountered.
So as long as the elements in value stay the same, they will be grouped under the same key, which is the element itself.
Note: this works for contiguous repetitions only. If you want to get rid of duplicates that reoccur later in the list, you should sort the list first (as the groupby docs explain).
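A quick illustration of that caveat, using hypothetical data where the repeats are not adjacent:

```python
from itertools import groupby

value = [9, 53, 9, 54, 53, 9]  # hypothetical data with non-contiguous repeats

# Without sorting, groupby only collapses neighbouring repeats
unsorted_keys = [k for k, _ in groupby(value)]
print(unsorted_keys)  # [9, 53, 9, 54, 53, 9]

# Sorting first brings equal values together, so each appears once
sorted_keys = [k for k, _ in groupby(sorted(value))]
print(sorted_keys)  # [9, 53, 54]
```

Sorting changes the order to ascending, so this only fits when the original order does not matter.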
As per your comment below, in case you want to operate on the coordinates directly:
list_coordinate = [(9,0), (9,1), (9,3), (53,0), (53,1), (53,3), (54,0), (54,1)]
g = [k for k,_ in groupby(list_coordinate, lambda x: x[0])]
print(g)
produces the same output
[9, 53, 54]
You could use an OrderedDict for both of your cases. Firstly, for just the x coordinates:
from collections import OrderedDict
list_coords = [(9, 0), (9, 1), (9, 3), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord[0]] = 1
print(list(merged.keys()))
Giving:
[9, 53, 54]
Note, if for example (9, 0) was repeated later on, it would not change the output.
Secondly, for whole coordinates. Note that the data has (10, 0) repeated 3 times:
list_coords = [(9, 0), (9, 1), (9, 3), (10, 0), (10, 0), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord] = 1
print(list(merged.keys()))
Giving:
[(9, 0), (9, 1), (9, 3), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
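On Python 3.7 and later, plain dicts are guaranteed to keep insertion order, so dict.fromkeys gives the same order-preserving de-duplication without OrderedDict. A sketch for both cases above (the names unique_x and unique_coords are illustrative):

```python
list_coords = [(9, 0), (9, 1), (9, 3), (10, 0), (10, 0), (10, 0),
               (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]

# dict.fromkeys drops repeated keys while keeping first-seen order
unique_x = list(dict.fromkeys(x for x, _ in list_coords))
unique_coords = list(dict.fromkeys(list_coords))

print(unique_x)       # [9, 10, 53, 54]
print(unique_coords)  # [(9, 0), (9, 1), (9, 3), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
```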
Why don't you use a set:
{ k[0] for k in list_coordinate }

Group continuous numbers in a tuple with tolerance range

If I have a list of tuples of numbers:
locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
I want to group them with a tolerance range of +/- 3.
aFunc(locSet)
which returns
[(62.5, 121.0), (144.0, 41.5)]
I have seen "Identify groups of continuous numbers in a list", but that is for continuous integers.
If I have understood correctly, you are looking for the tuples whose coordinate sums differ by an absolute amount in the tolerance range [0, 1, 2, 3] (only whole-number differences match, because the code tests membership in range(4)).
Assuming this, my solution returns a list of lists, where every internal list contains tuples that satisfy the condition.
def aFunc(locSet):
    # Sort the list.
    locSet = sorted(locSet,key=lambda x: x[0]+x[1])
    toleranceRange = 3
    resultLst = []
    for i in range(len(locSet)):
        sum1 = locSet[i][0] + locSet[i][1]
        tempLst = [locSet[i]]
        for j in range(i+1,len(locSet)):
            sum2 = locSet[j][0] + locSet[j][1]
            if (abs(sum1-sum2) in range(toleranceRange+1)):
                tempLst.append(locSet[j])
        if (len(tempLst) > 1):
            for lst in resultLst:
                if (list(set(tempLst) - set(lst)) == []):
                    # This solution is part of a previous solution.
                    # Doesn't include it.
                    break
            else:
                # Valid solution.
                resultLst.append(tempLst)
    return resultLst
Here two use examples:
locSet1 = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
locSet2 = [(10, 20), (12, 20), (13, 20), (14, 20)]
print(aFunc(locSet1))
[[(62.5, 121.0), (144.0, 41.5)]]
print(aFunc(locSet2))
[[(10, 20), (12, 20), (13, 20)], [(12, 20), (13, 20), (14, 20)]]
I hope to have been of help.
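If the goal is instead the question's expected output (one representative per cluster of nearby points), a different interpretation may fit: treat two points as close when each coordinate differs by at most 3. The sketch below assumes that reading; group_with_tolerance is a hypothetical name, and the greedy approach compares each point with the representatives found so far (the first point of each group), so results can depend on input order.

```python
def group_with_tolerance(points, tol=3):
    """Return one representative per group of points whose coordinates are
    within +/- tol of that group's first (representative) point."""
    reps = []
    for p in points:
        # Start a new group unless p is close to an existing representative
        if not any(abs(p[0] - r[0]) <= tol and abs(p[1] - r[1]) <= tol
                   for r in reps):
            reps.append(p)
    return reps

locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),
          (63.000003814697266, 121.00001525878906), (144.0, 41.5)]
print(group_with_tolerance(locSet))  # [(62.5, 121.0), (144.0, 41.5)]
```

Each point is compared with every representative, so the cost is O(n·k) for k groups, which is fine for small inputs.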

max second element in tuples python [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Sorting or Finding Max Value by the second element in a nested list. Python
I've written a program that gives me a list of tuples. I need to grab the tuple with the max number in the second value.
(840, 32), (841, 3), (842, 4), (843, 4), (844, 6), (845, 6), (846, 12), (847, 6), (848, 10), (849, 4), ..snip...
I need to get back (840, 32) because 32 is the highest second number in the tuples. How can I achieve this? I've tried a variety of ways but keep getting stuck. Here is the complete code:
D = {}
def divisor(n):
    global D
    L = []
    for i in range(1,n+1):
        if n % i == 0:
            L.append(i)
    D[n] = len(L)
for j in range(1001):
    divisor(j)
print(D.items())
Use max() with lambda:
In [22]: lis=[(840, 32), (841, 3), (842, 4), (843, 4), (844, 6), (845, 6), (846, 12), (847, 6), (848, 10), (849, 4)]
In [23]: max(lis, key=lambda x:x[1])
Out[23]: (840, 32)
or operator.itemgetter:
In [24]: import operator
In [25]: max(lis, key=operator.itemgetter(1))
Out[25]: (840, 32)
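Applied to the question's own dictionary, the same call works directly on D.items(). A sketch, assuming the goal is the number up to 1000 with the most divisors; the helper divisor_count is hypothetical and swaps the question's full loop for trial division up to the square root, which yields the same counts.

```python
from operator import itemgetter

def divisor_count(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            # i and n // i are both divisors; count once if they coincide
            count += 1 if i * i == n else 2
        i += 1
    return count

# Build the same mapping as the question's D, without a global
D = {n: divisor_count(n) for n in range(1001)}

# max() with a key compares only the second element of each (n, count) pair
best = max(D.items(), key=itemgetter(1))
print(best)  # (840, 32)
```

If several numbers shared the top count, max would return the first one in iteration order.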
