I am trying to get the highest 4 values in a list of tuples and put them into a new list. However, if there are two tuples with the same value I want to take the one with the lowest number.
The list originally looks like this:
[(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)...]
And I want the new list to look like this:
[(9,20), (3,16), (54, 13), (2,10)]
This is my current code; any suggestions?
sorted_y = sorted(sorted_x, key=lambda t: t[1], reverse=True)[:5]
sorted_z = []
n = 0
x = 0
y = 0
while n < 4:
    if sorted_y[x][y] > sorted_y[x+1][y]:
        sorted_z.append(sorted_y[x][y])
        print(sorted_z)
        print(n)
        n = n + 1
    elif sorted_y[x][y] == sorted_y[x+1][y]:
        a = sorted_y[x]
        b = sorted_y[x+1]
        if a > b:
            sorted_z.append(sorted_y[x+1][y])
        else:
            sorted_z.append(sorted_y[x][y])
        n = n + 1
        print(sorted_z)
        print(n)
Edit: When talking about the lowest value I mean: rank the tuples by their second value (highest first), and if two second values are the same, take the one with the lower first value.
How about groupby?
from itertools import groupby, islice
from operator import itemgetter
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
result = [sorted(group, key=itemgetter(0))[0] for key, group in islice(groupby(pre_sorted, key=itemgetter(1)), 4)]
print(result)
Output:
[(9, 20), (3, 16), (54, 13), (2, 10)]
Explanation:
This first sorts the data by the second element's value in descending order. groupby then puts them into groups where each tuple in the group has the same value for the second element.
Using islice, we take the top four groups and sort each one by the value of its first element in ascending order. Taking the first tuple of each sorted group, we arrive at our answer.
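To make the grouping step concrete, here is a small sketch (mine, not part of the original answer) that re-runs the intermediate part of the code above and prints each group; the expected output is shown as comments:

from itertools import groupby
from operator import itemgetter

data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
for key, group in groupby(pre_sorted, key=itemgetter(1)):
    print(key, list(group))
# 20 [(9, 20)]
# 16 [(3, 16)]
# 13 [(54, 13)]
# 10 [(67, 10), (2, 10)]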
You can try this:
l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
asv = set([i[1] for i in l]) # The set of unique second elements
new_l = [(min([i[0] for i in l if i[1]==k]),k) for k in asv]
Output:
[(3, 16), (2, 10), (9, 20), (54, 13)]
I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output = []
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1]  # Get the next element to search
print(output)
Is there a pythonic way to achieve such order?
And a way to do it "in-place" (with only a list)?
In my problem, the input can be reordered in a circular way using all the tuples, so the choice of the first element is not important.
Assuming the tuples in your list are circular, you may use a dict to achieve it in O(n) time:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

input_dict = dict(input)  # Convert list of `tuples` to dict
elem = input[0][0]        # start point in the new list
new_list = []             # List of tuples for holding the values in required order

for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values, and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}

start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]

print(res)
If two tuples start with the same number you will lose one of them, and if not all the start numbers are used the loop will fail with a KeyError.
But maybe this is something to build on.
Actually, there are many open questions about what you intend to have as output, and about what should happen if the input list has an invalid structure for what you need.
Assuming you have an input of pairs where each number appears exactly twice, we can consider such input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you suppose that this graph is cyclic and looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows that you can reduce the list that stores the graph to [10, 7, 13, 4, 9]. Here is a script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)

# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict

chosen_tuples = input[:1]         # Start from the first
first_values = defaultdict(list)
for tup in input[1:]:
    first_values[tup[0]].append(tup)

while first_values:               # Loop will end when all lists are removed
    value = chosen_tuples[-1][1]  # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value]   # List empty, remove it
You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]]  # output contains the first element of input
temp = input[1:]     # temp contains the rest of the elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop()  # We compare each element with output[-1]
    output.append(item)  # We add the right item to output
    temp.remove(item)    # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
Here is a robust solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst

output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
def match_tuples(input):
    # making a copy so as to not mess up the original one
    tuples = input[:]  # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]

    last_elem = tuples.pop(0)  # (10,7)

    # {"first element of the tuple": "index in the list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)}  # {9: 3, 4: 0, 13: 1, 7: 2}

    yield last_elem  # yields the first element

    for i in range(len(tuples)):
        # find where in the list the tuple is whose first element
        # matches the last element of the previous tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index]  # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
This is a variant (less efficient than the dictionary version) where the list is changed in place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

for i in range(1, len(tpl)-1):  # iterate over the indices of the list
    item = tpl[i-1]             # the tuple that is already in place
    for j, next_item in enumerate(tpl[i:]):  # find the next item
                                             # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]  # now swap the items
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

start_index = {item[0]: i for i, item in enumerate(tpl)}

item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]

print(tpl)
The list is changed in place; the dictionary only contains the starting values of the tuples and their index in the list.
To get an O(n) algorithm one needs to make sure that one doesn't do a double loop over the array. One way to do this is by keeping already processed values in some sort of lookup table (a dict would be a good choice).
For example, something like this (I hope the inline comments explain the functionality well). This modifies the list in place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]

# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}

# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]

# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue

    # Search the list until the next value is found saving all
    # other items in the dictionary so we avoid unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item
I have a list of tuples which I create dynamically.
The list appears as:
List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Each tuple (a, b) of the list represents a range of indexes from a certain table.
The ranges (a, b) and (b, d) are, in my situation, the same as (a, d).
I want to merge the tuples where the 2nd element matches the first of any other.
So, in the example above, I want to merge (8, 10) and (10, 13) to obtain (8, 13), and remove (8, 10) and (10, 13).
Merging (19, 25) and (25, 30) should yield (19, 30).
I don't have a clue where to start. The tuples are non-overlapping.
Edit: I have been trying to avoid any kind of for loop, as I have a pretty large list.
If you need to take into account things like skovorodkin's example in the comment,
[(1, 4), (4, 8), (8, 10)]
(or even more complex examples), then one way to do it efficiently would be to use graphs.
Say you create a digraph (possibly using networkx), where each pair is a node, and there is an edge from (a, b) to node (c, d) if b == c. Now run topological sort, iterate according to the order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.
I realize your question states you'd like to avoid loops on account of the long list size. However, for long lists, I doubt you'll find an efficient linear-time solution using a list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.
Here is a possible implementation:
Say we start with
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
It simplifies the following to remove duplicates, so let's do:
l = list(set(l))
Now to build the digraph:
import networkx as nx
import collections
g = nx.DiGraph()
The vertices are simply the pairs:
g.add_nodes_from(l)
To build the edges, we need a dictionary:
froms = collections.defaultdict(list)
for p in l:
    froms[p[0]].append(p)
Now we can add the edges:
for p in l:
    for from_p in froms[p[1]]:
        g.add_edge(p, from_p)
Next two lines are unneeded - they're just here to show what the graph looks like at this point:
>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]
>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]
Now, let's sort the pairs by topological sort:
l = list(nx.topological_sort(g))
Finally, here's the tricky part. The result will be a DAG. We have to traverse things recursively, but remember what we visited already.
Let's create a dict of what we visited:
visited = {p: False for p in l}
Now a recursive function, that given a node, returns the maximum range edge from any node reachable from it:
def visit(p):
    neighbs = list(g.neighbors(p))
    if visited[p] or not neighbs:
        visited[p] = True
        return p[1]
    mx = max([visit(neighb_p) for neighb_p in neighbs])
    visited[p] = True
    return mx
We're all ready. Let's create a list for the final pairs:
final_l = []
and visit all nodes:
for p in l:
    if visited[p]:
        continue
    final_l.append((p[0], visit(p)))
Here's the final result:
>>> final_l
[(1, 4), (8, 13), (14, 16)]
If they don't overlap, then you can sort them, and then just combine adjacent ones.
Here's a generator that yields the new tuples:
def combine_ranges(L):
    L = sorted(L)  # Make a copy as we're going to remove items!
    while L:
        start, end = L.pop(0)  # Get the first item
        while L and L[0][0] == end:
            # While the first of the rest connects to it, adjust
            # the end and remove the first of the rest
            _, end = L.pop(0)
        yield (start, end)
print(list(combine_ranges(List)))
If speed is important, use a collections.deque instead of a list (with popleft() in place of .pop(0)), so that removing items from the front takes constant time.
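For reference, a minimal sketch of that deque variant (combine_ranges_deque is my own name, not from the original answer):

from collections import deque

def combine_ranges_deque(L):
    q = deque(sorted(L))             # sorted copy; popleft() is O(1)
    while q:
        start, end = q.popleft()
        while q and q[0][0] == end:  # chain while the next range starts at `end`
            _, end = q.popleft()
        yield (start, end)

print(list(combine_ranges_deque([(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)])))
# [(1, 4), (8, 13), (14, 16), (19, 30)]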
A non-recursive approach, using sorting (I've added more nodes to handle a more complex case):
l = [(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30), (30, 34), (38, 40)]
l = sorted(l)

r = []
idx = 0
while idx < len(l):
    local = idx + 1
    previous_value = l[idx][1]
    # search longest chain
    while local < len(l):
        if l[local][0] != previous_value:
            break
        previous_value = l[local][1]
        local += 1
    # store tuple
    r.append((l[idx][0], l[local-1][1]))
    idx = local

print(r)
result:
[(1, 4), (8, 13), (14, 16), (19, 34), (38, 40)]
The only drawback is that the original input order is not preserved. I don't know if that's a problem.
Here is one optimized recursion approach:
In [44]: def find_intersection(m_list):
             for i, (v1, v2) in enumerate(m_list):
                 for j, (k1, k2) in enumerate(m_list[i + 1:], i + 1):
                     if v2 == k1:
                         m_list[i] = (v1, m_list.pop(j)[1])
                         return find_intersection(m_list)
             return m_list
Demo:
In [45]: lst = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
In [46]: find_intersection(lst)
Out[46]: [(1, 4), (8, 13), (19, 30), (14, 16)]
You can use a dictionary to map the different end indices to the range ending at that index; then just iterate the list sorted by start index and merge the segments accordingly:
def join_lists(lst):
    ending = {}  # will map end position to range
    for start, end in sorted(lst):  # iterate in sorted order
        if start in ending:
            ending[end] = (ending[start][0], end)  # merge
            del ending[start]  # remove old value
        else:
            ending[end] = (start, end)
    return list(ending.values())  # return remaining values from dict
Alternatively, as pointed out by Tomer W in comments, you can do without the sorting, by iterating the list twice, making this solution take only linear time (O(n)) w.r.t. the length of the list.
def join_lists(lst):
    ending = {}  # will map end position to range
    # first pass: add to dictionary
    for start, end in lst:
        ending[end] = (start, end)
    # second pass: lookup and merge
    for start, end in lst:
        if start in ending:
            ending[end] = (ending[start][0], end)
            del ending[start]
    # return remaining values from dict
    return list(ending.values())
Example output, for both cases:
>>> join_lists([(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)])
[(1, 4), (8, 13), (14, 16), (19, 30)]
>>> join_lists(lst = [(1, 4), (4, 8), (8, 10)])
[(1, 10)]
The list is first sorted and adjacent pairs of (min1, max1), (min2, max2) are merged together if they overlap.
MIN = 0
MAX = 1

def normalize(intervals):
    isort = sorted(intervals)
    for i in range(len(isort) - 1):
        if isort[i][MAX] >= isort[i + 1][MIN]:
            vmin = isort[i][MIN]
            vmax = max(isort[i][MAX], isort[i + 1][MAX])
            isort[i] = None
            isort[i + 1] = (vmin, vmax)
    return [r for r in isort if r is not None]

List1 = [(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)]
List2 = [(1, 4), (4, 8), (8, 10)]
print(normalize(List1))
print(normalize(List2))
# [(1, 4), (8, 13), (14, 16), (19, 30)]
# [(1, 10)]
The following should work. It breaks the tuples into individual numbers, then finds the tuple bounds of each cluster. This should work even with difficult overlaps, like [(4, 10), (9, 12)].
It's a very simple approach.
# First turn your list of tuples into a list of numbers:
my_list = []
for item in List:
    my_list = my_list + [i for i in range(item[0], item[1]+1)]

# Then create tuple pairs:
output = []
a = False
for x in range(max(my_list)+1):
    if (not a) and (x in my_list):
        a = x
    if (a) and (x+1 not in my_list):
        output.append((a, x))
        a = False

print output
Given a tuple list a:
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
We can count how many appearances of each tuple we have using Counter:
>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1})
Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.
For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.
I have tried to use np.random.choice:
a[np.random.choice(len(a), 3, p=b/len(a))]
But I'm not able to generate the tuples.
I'm trying:
a = [(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []

print "counter list"
print b

for item in b:
    print "item from current list"
    print item
    print "prob of the item"
    print (float(b[item])/float(len(a)))
    c.append(float(b[item])/float(len(a)))

print "prob list"
print c

print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))
In this case I'm getting the random indexes of the array. Is there a more optimized way that doesn't require calculating the probabilities array? Also, there is the issue that the prob array does not correspond to the order of the Counter.
If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.
e.g.
import random
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)
As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).
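If you would rather not reorder a in place, random.sample gives the same effect (a small sketch, not part of the original answer):

import random

a = [(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
random_sample = random.sample(a, 3)  # draws 3 entries without replacement over positions,
                                     # so tuples that occur more often are more likely
print(random_sample)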
This will do the trick
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random

listOfNumbers = [(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres = []
for k, v in b.most_common():
    c.append(float(v)/float(len(listOfNumbers)))
    pres.append(k)

resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)

ass = []
for res in resultIndex:
    ass.append(pres[res])

print ass
Now it's just a matter of seeing whether there is any way to optimize it.
You can repeat the following steps 3 times:
Randomly choose a number i in the range [0..n-1], where n is the current number of elements in a.
Find the tuple at the i-th position in a and add it to the resulting triplet.
Remove all occurrences of that tuple from a.
Pay attention to the corner case where a can be empty.
The overall time complexity will be O(n) for a list.
In the first step, the number i should be generated according to a uniform distribution, which the regular random module provides. The more occurrences a particular tuple has in a, the more likely it is to be chosen.
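A minimal sketch of this procedure (pick_three and the variable names are mine, not part of the answer):

import random

def pick_three(a):
    remaining = list(a)    # work on a copy
    result = []
    for _ in range(3):
        if not remaining:  # corner case: the list has been exhausted
            break
        chosen = random.choice(remaining)  # uniform over remaining entries,
                                           # so weight follows multiplicity
        result.append(chosen)
        remaining = [t for t in remaining if t != chosen]  # drop every occurrence
    return result

a = [(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
print(pick_three(a))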
So I am making a program that generates a list of prime numbers and then sorts them into twin prime pairs, then calculates which two sets of twin primes have the largest difference. I have gotten to sorting them into a list of twin prime pairs with my code, but now I am having a hard time figuring out how to make the next part happen. I am not quite sure how I can calculate the largest gap between the primes. Here is what I have so far:
def is_factor(n, f):
    '''
    Returns True if f is a factor of n,
    OTW returns False.
    Both n and f are ints.
    '''
    TV = (n % f == 0)
    return TV

def properFactorsOf(n):
    '''
    Returns a list of the proper factors
    of n. n is an int.
    f is a proper factor of n if:
        f is a factor of n
        f > 1 and f < n.
    '''
    L = []
    upper = n//2  # largest f to test
    for f in range(2, upper + 1):
        if is_factor(n, f):
            L.append(f)
    return L

def is_prime(n):
    '''
    Returns True if n is a prime,
    OTW returns False.
    n is an int.
    Use properFactorsOf(n) to check whether n is prime.
    '''
    TV = len(properFactorsOf(n)) == 0
    return TV

def LofPrimes(n):
    '''
    Returns a list of the first n primes.
    Uses is_prime(n).
    '''
    primes = [2, 3]
    p = 5
    while len(primes) < n:
        if is_prime(p):
            primes.append(p)
        p += 2
    return primes
def twins():
    n = eval(input("How many primes?"))
    L = LofPrimes(n)  # Makes the list of the first n primes
    m = 1             # If m were zero it would take the zeroth position
    L3 = []           # This is the list of twins
    for i in range(len(L)-1):      # keeps it in range
        tp = L[m] - L[m-1]         # subtract position m-1 from position m
        if tp == 2:                # If the difference of the pair is 2
            L3.append([L[m-1], L[m]])  # add the twins to the list L3
        m += 1                     # move on to the next pair
    print(L3)
So I feel like I am kind of on the right path; I made some pseudocode to give you an idea of where my thinking is going:
assign a temp variable to m-1 on the first run,
assign a temp variable to m on the second run,
make a loop to go through the list of twins
take the difference of m-1 from the first set and m from the second set
in this loop calculate the max gap
return the greatest difference
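For reference, a direct loop translation of that pseudocode (a sketch; max_twin_gap and the variable names are mine, not part of the original code):

def max_twin_gap(twins):
    # twins is a list of twin-prime pairs, e.g. [(3, 5), (5, 7), (11, 13), ...]
    best_gap, best_pairs = 0, None
    for prev, cur in zip(twins, twins[1:]):
        gap = cur[0] - prev[1]  # first of this pair minus last of the previous pair
        if gap > best_gap:
            best_gap, best_pairs = gap, (prev, cur)
    return best_gap, best_pairs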
Suppose we have a list of the pairs of primes, what you call L3 which could be like this:
L3 = [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61),
(71, 73), (101, 103), (107, 109), (137, 139)]
Then what we want to do is take the first element of a pair minus the last element of the previous pair.
We also want to accumulate a list of these values so later we can see at which index the maximum happens. The reduce function is good for this.
def helper_to_subtract_pairs(acc, x):
    return acc[:-1] + [x[0] - acc[-1]] + [x[1]]
Then printing
reduce(helper_to_subtract_pairs, L3, [0])
gives
[3, 0, 4, 4, 10, 10, 16, 10, 28, 4, 28, 139]
The first element happens because inside the call to reduce we use a starting value of [0] (so the first prime 3 leads to 3 - 0). We can ignore that. The final item, 139, represents the number that would be part of the subtraction if there was one more pair on the end. But since there is not, we can ignore that too:
In [336]: reduce(helper_to_subtract_pairs, L3, [0])[1:-1]
Out[336]: [0, 4, 4, 10, 10, 16, 10, 28, 4, 28]
Now, we want the index(es) where the max occurs. For this let's use the Python recipe for argmax:
def argmax(some_list):
    return max(enumerate(some_list), key=lambda x: x[1])[0]
Then we get:
In [338]: argmax(reduce(helper_to_subtract_pairs, L3, [0])[1:-1])
Out[338]: 7
telling us that the gap at index 7 is the biggest (or at least tied for the biggest). The 7th gap in this example was between (101, 103) and (71, 73) (remember Python is 0-based, so the 7th gap is really between pair 7 and pair 8).
So as a function of just the list L3, we can write:
def max_gap(prime_list):
    gaps = reduce(helper_to_subtract_pairs, prime_list, [0])[1:-1]
    max_idx = argmax(gaps)
    return gaps[max_idx], prime_list[max_idx:max_idx + 2]
The whole thing could look like:
def argmax(some_list):
    return max(enumerate(some_list), key=lambda x: x[1])[0]

def helper_to_subtract_pairs(acc, x):
    return acc[:-1] + [x[0] - acc[-1]] + [x[1]]

def max_gap(prime_list):
    gaps = reduce(helper_to_subtract_pairs, prime_list, [0])[1:-1]
    max_idx = argmax(gaps)
    return gaps[max_idx], prime_list[max_idx:max_idx + 2]

L3 = [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61),
      (71, 73), (101, 103), (107, 109), (137, 139)]

print max_gap(L3)
which prints
In [342]: print max_gap(L3)
(28, [(71, 73), (101, 103)])
It's going to be more effort to modify argmax to return all occurrences of the max. If this is for investigating the asymptotic growth of the gap size, you won't need to, since any max will do to summarize a particular collection of primes.
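If you do want every occurrence of the max, a small sketch (argmax_all is my own name) could look like:

def argmax_all(some_list):
    # every index at which the maximum occurs
    m = max(some_list)
    return [i for i, x in enumerate(some_list) if x == m]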