I have a list of tuples which I create dynamically.
The list appears as:
List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Each tuple (a, b) of list represents the range of indexes from a certain table.
In my situation, the ranges (a, b) and (b, d) together are the same as (a, d).
I want to merge the tuples where the 2nd element matches the first of any other.
So, in the example above, I want to merge (8, 10), (10,13) to obtain (8,13) and remove (8, 10), (10,13)
(19,25) and (25,30) merge should yield (19, 30)
I don't have a clue where to start. The tuples are non overlapping.
Edit: I have been trying to just avoid any kind of for loop as I have a pretty large list
If you need to take into account things like skovorodkin's example in the comment,
[(1, 4), (4, 8), (8, 10)]
(or even more complex examples), then one way to do it efficiently is to use graphs.
Say you create a digraph (possibly using networkx), where each pair is a node, and there is an edge from node (a, b) to node (c, d) if b == c. Now run a topological sort, iterate in that order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.
I realize your question states you'd like to avoid loops on account of the long list size. For long lists, though, I doubt you'll find an efficient linear-time solution using a list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.
Here is a possible implementation:
Say we start with
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Removing duplicates simplifies what follows, so let's do:
l = list(set(l))
Now to build the digraph:
import networkx as nx
import collections
g = nx.DiGraph()
The vertices are simply the pairs:
g.add_nodes_from(l)
To build the edges, we need a dictionary:
froms = collections.defaultdict(list)
for p in l:
    froms[p[0]].append(p)
Now we can add the edges:
for p in l:
    for from_p in froms[p[1]]:
        g.add_edge(p, from_p)
Next two lines are unneeded - they're just here to show what the graph looks like at this point:
>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]
>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]
Now, let's sort the pairs by topological sort:
l = list(nx.topological_sort(g))  # list(): networkx 2.x and later return a generator here
Finally, here's the tricky part. The result will be a DAG. We have to traverse things recursively, but remember what we visited already.
Let's create a dict of what we visited:
visited = {p: False for p in l}
Now a recursive function that, given a node, returns the maximum range end of any node reachable from it:
def visit(p):
    neighbs = list(g.neighbors(p))  # list(): an iterator would always be truthy in the test below
    if visited[p] or not neighbs:
        visited[p] = True
        return p[1]
    mx = max([visit(neighb_p) for neighb_p in neighbs])
    visited[p] = True
    return mx
We're all ready. Let's create a list for the final pairs:
final_l = []
and visit all nodes:
for p in l:
    if visited[p]:
        continue
    final_l.append((p[0], visit(p)))
Here's the final result (sorted for display, since a topological order is not unique):
>>> sorted(final_l)
[(1, 4), (8, 13), (14, 16), (19, 30)]
If they don't overlap, then you can sort them, and then just combine adjacent ones.
Here's a generator that yields the new tuples:
def combine_ranges(L):
    L = sorted(L)  # Make a copy as we're going to remove items!
    while L:
        start, end = L.pop(0)  # Get the first item
        while L and L[0][0] == end:
            # While the first of the rest connects to it, adjust
            # the end and remove the first of the rest
            _, end = L.pop(0)
        yield (start, end)
print(list(combine_ranges(List)))
If speed is important, use a collections.deque instead of a list, so that removing the first item (popleft() in place of .pop(0)) takes constant time.
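A minimal sketch of that deque variant, assuming the same non-overlapping input as above. The name combine_ranges_deque is made up for this example:

```python
from collections import deque

def combine_ranges_deque(ranges):
    # Sort a copy into a deque so that popleft() is O(1)
    queue = deque(sorted(ranges))
    while queue:
        start, end = queue.popleft()
        # Absorb every following range that starts where this one ends
        while queue and queue[0][0] == end:
            _, end = queue.popleft()
        yield (start, end)

List = [(1, 4), (8, 10), (19, 25), (10, 13), (14, 16), (25, 30)]
print(list(combine_ranges_deque(List)))
# [(1, 4), (8, 13), (14, 16), (19, 30)]
```

The sort still dominates the running time, so this only removes the cost of shifting the list on every pop(0).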
A non-recursive approach, using sorting (I've added more nodes to handle a more complex case):
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30), (30,34), (38,40)]
l = sorted(l)
r = []
idx = 0
while idx < len(l):
    local = idx + 1
    previous_value = l[idx][1]
    # extend the chain of connected tuples as far as it goes
    while local < len(l):
        if l[local][0] != previous_value:
            break
        previous_value = l[local][1]
        local += 1
    # store tuple
    r.append((l[idx][0], l[local-1][1]))
    idx = local
print(r)
result:
[(1, 4), (8, 13), (14, 16), (19, 34), (38, 40)]
The only drawback is that original sort order is not preserved. I don't know if it's a problem.
Here is one optimized recursion approach:
In [44]: def find_intersection(m_list):
    ...:     for i, (v1, v2) in enumerate(m_list):
    ...:         for j, (k1, k2) in enumerate(m_list[i + 1:], i + 1):
    ...:             if v2 == k1:
    ...:                 m_list[i] = (v1, m_list.pop(j)[1])
    ...:                 return find_intersection(m_list)
    ...:     return m_list
Demo:
In [45]: lst = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
In [46]: find_intersection(lst)
Out[46]: [(1, 4), (8, 13), (19, 30), (14, 16)]
You can use a dictionary to map the different end indices to the range ending at that index; then just iterate the list sorted by start index and merge the segments accordingly:
def join_lists(lst):
    ending = {}  # will map end position to range
    for start, end in sorted(lst):  # iterate in sorted order
        if start in ending:
            ending[end] = (ending[start][0], end)  # merge
            del ending[start]  # remove old value
        else:
            ending[end] = (start, end)
    return list(ending.values())  # return remaining values from dict
Alternatively, as pointed out by Tomer W in comments, you can do without the sorting, by iterating the list twice, making this solution take only linear time (O(n)) w.r.t. the length of the list.
def join_lists(lst):
    ending = {}  # will map end position to range
    # first pass: add to dictionary
    for start, end in lst:
        ending[end] = (start, end)
    # second pass: lookup and merge
    for start, end in lst:
        if start in ending:
            ending[end] = (ending[start][0], end)
            del ending[start]
    # return remaining values from dict
    return list(ending.values())
Examples output, for both cases:
>>> join_lists([(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)])
[(1, 4), (8, 13), (14, 16), (19, 30)]
>>> join_lists(lst = [(1, 4), (4, 8), (8, 10)])
[(1, 10)]
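One caveat: the two-pass version appears to depend on the input order. Tracing it by hand on [(8, 10), (4, 8), (1, 4)] leaves (4, 10) and (1, 8) instead of (1, 10), because (8, 10) is merged before (4, 8) has been extended. A possible order-independent sketch, still linear, maps each start to its end and follows every chain from its head. The name join_chains is made up here, and the code assumes non-overlapping ranges with unique start values and no cycles:

```python
def join_chains(lst):
    # Assumption: non-overlapping ranges, unique starts, no cycles
    end_of = dict(lst)             # start -> end
    ends = set(end_of.values())    # every value that ends some range
    result = []
    for start in end_of:
        if start in ends:
            continue               # continues another range, so not a chain head
        end = end_of[start]
        while end in end_of:       # follow the chain to its last range
            end = end_of[end]
        result.append((start, end))
    return result

print(join_chains([(8, 10), (4, 8), (1, 4)]))
# [(1, 10)]
```

Each range is visited once as part of exactly one chain, so the total work stays proportional to the length of the list.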
The list is first sorted and adjacent pairs of (min1, max1), (min2, max2) are merged together if they overlap.
MIN=0
MAX=1
def normalize(intervals):
    isort = sorted(intervals)
    for i in range(len(isort) - 1):
        if isort[i][MAX] >= isort[i + 1][MIN]:
            vmin = isort[i][MIN]
            vmax = max(isort[i][MAX], isort[i + 1][MAX])
            isort[i] = None
            isort[i + 1] = (vmin, vmax)
    return [r for r in isort if r is not None]
List1 = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
List2 = [(1, 4), (4, 8), (8, 10)]
print(normalize(List1))
print(normalize(List2))
#[(1, 4), (8, 13), (14, 16), (19, 30)]
#[(1, 10)]
The following should work. It breaks the tuples into individual numbers, then finds the bounds of each cluster of consecutive numbers. This should work even with difficult overlaps, like [(4, 10), (9, 12)]. Note that ranges that are merely adjacent, such as (1, 4) and (5, 6), also end up in the same cluster.
It's a very simple fix.
# First turn your list of tuples into a set of numbers:
numbers = set()
for item in List:
    numbers.update(range(item[0], item[1] + 1))
# Then create tuple pairs:
output = []
a = None  # start of the current cluster (None rather than False, so a start of 0 works)
for x in range(max(numbers) + 1):
    if a is None and x in numbers:
        a = x
    if a is not None and x + 1 not in numbers:
        output.append((a, x))
        a = None
print(output)
Given a list of iterables:
li = [(1,2), (3,4,8), (3,4,7), (9,)]
I want to sort by the third element if present, otherwise leave the order unchanged. So here the desired output would be:
[(1,2), (3,4,7), (3,4,8), (9,)]
Using li.sort(key=lambda x: x[2]) raises an IndexError. I tried a custom function:
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        return  # (ie return None)
li.sort(key=lambda x: safefetch(x, 2))
But None in sorting yields a TypeError.
Broader context: I first want to sort by the first element, then the second, then the third, etc., up to the length of the longest element, i.e. I want to run several sorts of decreasing priority (as in SQL's ORDER BY COL1, COL2), while preserving order among those elements that aren't relevant. So: first sort everything by the first element; then, among the ties on el_1, sort on el_2, and so on until el_n. My feeling is that calling a sort function on the whole list is probably the wrong approach.
(Note that this was an "XY question": for my actual question, just using sorted on tuples is simplest, as Patrick Artner pointed out in the comments. But the question as posed is trickier.)
We can first get the indices for distinct lengths of elements in the list via a defaultdict and then sort each sublist with numpy's fancy indexing:
import numpy as np
from collections import defaultdict
# {length -> inds} mapping
d = defaultdict(list)
# collect indices per length
for j, tup in enumerate(li):
    d[len(tup)].append(j)
# sort
li = np.array(li, dtype=object)
for inds in d.values():
    li[inds] = sorted(li[inds])
# convert back to list if desired
li = li.tolist()
to get li at the end as
[(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
For some other samples:
In [134]: the_sorter([(12,), (3,4,8), (3,4,7), (9,)])
Out[134]: [(9,), (3, 4, 7), (3, 4, 8), (12,)]
In [135]: the_sorter([(12,), (3,4,8,9), (3,4,7), (11, 9), (9, 11), (2, 4, 4, 4)])
Out[135]: [(12,), (2, 4, 4, 4), (3, 4, 7), (9, 11), (11, 9), (3, 4, 8, 9)]
where the_sorter is above procedure wrapped in a function (name lacks imagination...)
def the_sorter(li):
    # {length -> inds} mapping
    d = defaultdict(list)
    # collect indices per length
    for j, tup in enumerate(li):
        d[len(tup)].append(j)
    # sort
    li = np.array(li, dtype=object)  # dtype=object, matching the snippet above
    for inds in d.values():
        li[inds] = sorted(li[inds])
    return li.tolist()
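If you would rather not depend on numpy for this, the same idea (group positions by tuple length, sort each group, write the sorted tuples back into the same slots) can be sketched in plain Python. The name sort_within_lengths is made up for this example:

```python
from collections import defaultdict

def sort_within_lengths(li):
    # Group the positions of the tuples by tuple length
    positions = defaultdict(list)
    for j, tup in enumerate(li):
        positions[len(tup)].append(j)
    result = list(li)  # copy, so the input list is left untouched
    for inds in positions.values():
        # Sort the tuples of one length and put them back into the same slots
        ordered = sorted(li[k] for k in inds)
        for j, tup in zip(inds, ordered):
            result[j] = tup
    return result

print(sort_within_lengths([(1, 2), (3, 4, 8), (3, 4, 7), (9,)]))
# [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
```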
Whatever you return as fallback value must be comparable to the other key values that might be returned. In your example that would require a numerical value.
import sys
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        return sys.maxsize  # a very large int, used as the fallback sort key
This would put all the short ones at the back of the sort order, but maintain a stable order among them.
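If the elements might not be numbers, a variation on the same idea is to return a pair as the key, so the fallback is never compared with a real value. This is only a sketch; third_or_last is a name made up here:

```python
def third_or_last(t, idx=2):
    # (False, value) sorts before (True, 0), so short tuples go to the back;
    # the placeholder 0 is only ever compared with other placeholders
    return (len(t) <= idx, t[idx] if len(t) > idx else 0)

li = [(1, 2), (3, 4, 8), (3, 4, 7), (9,)]
print(sorted(li, key=third_or_last))
# [(3, 4, 7), (3, 4, 8), (1, 2), (9,)]
```

As with the sys.maxsize fallback, the short tuples end up at the back in their original relative order, because Python's sort is stable.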
Inspired by @Mustafa Aydın, here is a solution in Pandas. I would prefer one without the memory overhead of a dataframe, but this might be good enough.
import pandas as pd
li = [(1,2), (3,4,8), (3,4,7), (9,)]
tmp = pd.DataFrame(li)
[tuple(int(el) for el in t if not pd.isna(el)) for t in tmp.sort_values(by=tmp.columns.tolist()).values]
> [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
Since (0,2) and (4,6) both lie within the range of (0,6), I want to remove them. The resulting list would be:
list_of_tuple = [(0,6), (6,7), (8,9)]
It seems I need to sort this list of tuples somehow to make the removal easier. But how do I sort a list of tuples?
Given two tuples of array indexes, [m,n] and [a,b], if:
m >= a and n <= b
then [m,n] is included in [a,b], so [m,n] should be removed from the list.
To remove all tuples from list_of_tuple whose range lies within the specified tuple:
list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
def rm(lst, tup):
    return [tup] + [t for t in lst if t[0] < tup[0] or t[1] > tup[1]]
print(rm(list_of_tuple,(0,6)))
Output:
[(0, 6), (6, 7), (8, 9)]
Here's a dead-simple solution, but it's O(n^2):
intervals = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)] # list_of_tuple
result = [
    t for t in intervals
    if not any(t != u and t[0] >= u[0] and t[1] <= u[1] for u in intervals)
]
It filters out intervals that are not equal to, but contained in, any other intervals.
Seems like an opportunity to abuse both reduce() and Python's logical operators! Solution assumes list is sorted as in the OP's example, primarily on the second element of each tuple, and secondarily on the first:
from functools import reduce
list_of_sorted_tuples = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]
def contains(a, b):
    return a[0] >= b[0] and a[1] <= b[1] and [b] or b[0] >= a[0] and b[1] <= a[1] and [a] or [a, b]
reduced_list = reduce(lambda x, y: x[:-1] + contains(x[-1], y) if x else [y], list_of_sorted_tuples, [])
print(reduced_list)
OUTPUT
> python3 test.py
[(0, 6), (6, 7), (8, 9)]
>
You could try something like this to check if both ends of the (half-open) interval are contained within another interval:
list_of_tuple = [(0,2), (0,6), (4,6), (6,7), (8,9)]
reduced_list = []
for t in list_of_tuple:
    add = True
    for o in list_of_tuple:
        if t is not o:
            r = range(*o)
            if t[0] in r and (t[1] - 1) in r:
                add = False
    if add:
        reduced_list.append(t)
print(reduced_list) # [(0, 6), (6, 7), (8, 9)]
Note: This assumes that your tuples are half-open intervals, i.e. [0, 6) where 0 is inclusive but 6 is exclusive, similar to how range would treat the start and stop parameters. A couple of small changes would have to be made for the case of fully closed intervals:
range(*o) -> range(o[0], o[1] + 1)
and
if t[0] in r and (t[1] - 1) in r: -> if t[0] in r and t[1] in r:
Here is the first step towards a solution that can be done in O(n log(n)):
def non_cont(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    i = 1
    while i < len(s):
        if s[i][0] >= s[i - 1][0] and s[i][1] <= s[i - 1][1]:
            del s[i]
        else:
            i += 1
    return s
The idea is that after sorting with the special key function, each element that is contained in some other element will be located directly after an element that contains it. Then we sweep the list, removing elements that are contained in the element that precedes them. Now, the sweep-and-delete loop is itself of complexity O(n^2), because each del has to shift the rest of the list. The above solution is for clarity more than anything else. We can move to the next implementation:
def non_cont_on(lot):
    s = sorted(lot, key = lambda t: (t[0], -t[1]))
    result = s[:1]
    for t in s[1:]:
        # keep t only if it is not contained in the last interval kept
        if not (t[0] >= result[-1][0] and t[1] <= result[-1][1]):
            result.append(t)
    return result
There is no quadratic sweep and delete loop here, only a nice, linear process of constructing the result. Space complexity is O(n). It is possible to perform this algorithm without extra, non-constant, space, but I will leave this out.
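For what it's worth, the variant without an extra result list could look roughly like the sketch below. It sorts the list in place and compacts it with a write index. The name non_cont_inplace is made up here, and "constant extra space" only holds apart from whatever list.sort uses internally:

```python
def non_cont_inplace(lot):
    # Sorts and compacts `lot` itself; only two index variables are extra
    lot.sort(key=lambda t: (t[0], -t[1]))
    write = 0  # index of the last interval kept so far
    for read in range(1, len(lot)):
        kept, cur = lot[write], lot[read]
        if not (cur[0] >= kept[0] and cur[1] <= kept[1]):
            write += 1
            lot[write] = cur
    del lot[write + 1:]  # drop the leftover tail in one step
    return lot

intervals = [(0, 2), (0, 6), (4, 6), (6, 7), (8, 9)]
print(non_cont_inplace(intervals))
# [(0, 6), (6, 7), (8, 9)]
```

Unlike the two functions above, this one modifies the list that is passed in.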
A side effect of both algorithms is that the intervals end up sorted.
If you want to preserve the information about the inclusion-structure (by which enclosing interval an interval of the original set is consumed) you can build a "one-level tree":
def contained(tpl1, tpl2):
    return tpl1[0] >= tpl2[0] and tpl1[1] <= tpl2[1]
def interval_hierarchy(lst):
    if not lst:
        return
    root = lst.pop()  # note: pop() consumes the list that was passed in
    children_dict = {root: []}
    while lst:
        t = lst.pop()
        curr_children = list(children_dict.keys())
        for k in curr_children:
            if contained(k, t):
                children_dict[t] = (children_dict[t] if t in children_dict else []) +\
                                   [k, *children_dict[k]]
                children_dict.pop(k)
            elif contained(t, k):
                children_dict[k].append(t)
                if t in children_dict:
                    children_dict[k] += children_dict[t]
                    children_dict.pop(t)
            else:
                if t not in children_dict:
                    children_dict[t] = []
    # return whatever information you might want to use
    return children_dict, list(children_dict.keys())
It appears you are trying to merge intervals which are overlapping. For example, (9,11), (10,12) are merged in the second example below to produce (9,12).
In that case, a simple sort using sorted will automatically handle tuples.
Approach: Store the next interval to be added. Keep extending the end of the interval until you encounter a value whose "start" comes after (>=) the "end" of the next value to add. At that point, that stored next interval can be appended to the results. Append at the end to account for processing all values.
def merge_intervals(val_input):
    if not val_input:
        return []
    vals_sorted = sorted(val_input)  # sorts by tuple values "natural ordering"
    result = []
    x0, x1 = vals_sorted[0]  # store next interval to be added as (x0, x1)
    for start, end in vals_sorted[1:]:
        if start >= x1:  # reached next separate interval
            result.append((x0, x1))
            x0, x1 = (start, end)
        elif end > x1:
            x1 = end  # extend length of next interval to be added
    result.append((x0, x1))
    return result
print(merge_intervals([(0,2), (0,6), (4,6), (6,7), (8,9)]))
print(merge_intervals([(1,2), (9,11), (10,12), (1,7)]))
Output:
[(0, 6), (6, 7), (8, 9)]
[(1, 7), (9, 12)]
Alright. So I've been through some SO answers such as Find an element in a list of tuples in python and they don't seem that specific to my case. And I am getting no idea on how to use them in my issue.
Let us say I have a list of a tuple of tuples; i.e. the list stores several data points each referring to a Cartesian point. Each outer tuple represents the entire data of the point. There is an inner tuple in this tuple which is the point exactly. That is, let us take the point (1,2) and have 5 denoting some meaning to this point. The outer tuple will be ((1,2),5)
Well, it is easy to figure out how to generate this. However, I want to search for an outer tuple based on the value of the inner tuple. That is I wanna do:
for y in range(0, 10):
    for x in range(0, 10):
        if (x, y) in ###:
            print("Found")
or something of this sense. How can this be done?
Based on the suggestion posted as a comment by @timgen, here is some pseudo-sample data.
The list is gonna be
selectPointSet = [((9, 2), 1), ((4, 7), 2), ((7, 3), 0), ((5, 0), 0), ((8, 1), 2)]
So I may wanna iterate through the whole domain of points which ranges from (0,0) to (9,9) and do something if the point is one among those in selectPointSet; i.e. if it is (9, 2), (4, 7), (7, 3), (5, 0) or (8, 1)
Using the data structure you currently have, you can do it like this:
listTuple = [((1,1),5),((2,3),5)]  # dummy list of tuples
for y in range(0, 10):
    for x in range(0, 10):
        for point, value in listTuple:  # loop through the list of tuples
            if (x, y) == point:  # test whether (x, y) is the inner tuple of this entry
                print(str((x, y)), "Found")
You can make use of a dictionary.
temp = [((1,2),3),((2,3),4),((6,7),4)]
newDict = {}
# a dictionary with inner tuple as key
for t in temp:
    newDict[t[0]] = t[1]
for y in range(0, 10):
    for x in range(0, 10):
        if (x, y) in newDict:
            print("Found")
I hope this is what you are asking for.
Make a set from the two-element tuples for O(1) lookup.
>>> data = [((1,2),3),((2,3),4),((6,7),4)]
>>> tups = {x[0] for x in data}
Now you can query tups with any tuple you like.
>>> (6, 7) in tups
True
>>> (3, 2) in tups
False
Searching for values from 0 to 9:
>>> from itertools import product
>>> for x, y in product(range(10), range(10)):
...     if (x, y) in tups:
...         print('found ({}, {})'.format(x, y))
...
found (1, 2)
found (2, 3)
found (6, 7)
If you need to retain information about the third number (and the two-element inner tuples in data are unique) then you can also construct a dictionary instead of a set.
>>> d = dict(data)
>>> d
{(1, 2): 3, (2, 3): 4, (6, 7): 4}
>>> (2, 3) in d
True
>>> d[(2, 3)]
4
I want to compare the tuples in two or more lists and print out their intersection. Every tuple has 25 elements (some of which are empty), and the number of tuples differs from list to list.
So far I have tried taking intersection of two lists, the code that I used can be seen below :
res_final = set(tuple(x) for x in res).intersection(set(tuple(x) for x in res1))
output:
set()
(res and res1 are my lists)
Hope this example helps:
import numpy as np
np.random.seed(0) # random seed for repeatability
a_ = np.random.randint(15,size=(1000,2)) # create random data for tuples
b_ = np.random.randint(15,size=(1000,2)) # create random data for tuples
a, b = set(tuple(d) for d in a_), set(tuple(d) for d in b_) # set of tuples
intersection = a&b # intersection
print(intersection) # result
In the code, matrices of random variables are created, then the rows are converted to tuples. Then we get the set of tuples and finally the important part for you, the intersection of the tuples.
If your input looks something like this:
in_1 = [(1, 1), (2, 2), (3, 3)]
in_2 = [(4, 4), (5, 5), (1, 1)]
in_3 = [(6, 6), (7, 7), (1, 1)]
ins = [in_1, in_2, in_3]
then I think you can use itertools.combinations to find pairwise intersections, and then take a set from them in order to remove duplicates.
from itertools import combinations
intersected = []
for first, second in combinations(ins, 2):
    elems = set(first).intersection(set(second))
    intersected.extend(elems)
dedup_intersected = set(intersected)
print(dedup_intersected)
# {(1, 1)}
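Note that the pairwise approach collects tuples shared by at least two of the lists. If what you want is the tuples present in every list, a small sketch using set.intersection with several arguments could look like this (common_to_all is a name made up here, and ins is assumed to be non-empty):

```python
in_1 = [(1, 1), (2, 2), (3, 3)]
in_2 = [(4, 4), (5, 5), (1, 1)]
in_3 = [(6, 6), (7, 7), (1, 1)]
ins = [in_1, in_2, in_3]

# Tuples present in every list: intersect the first set with all the others
common_to_all = set(ins[0]).intersection(*ins[1:])
print(common_to_all)
# {(1, 1)}
```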
I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output=[]
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1]  # Get the next element to search
print(output)
Is there a Pythonic way to achieve such an order?
And is there a way to do it "in-place" (with only one list)?
In my problem, the input can always be reordered in a circular way using all the tuples, so it does not matter which element is chosen first.
Assuming the tuples in your list form a cycle, you can use a dict to do this in O(n):
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input)  # Convert list of `tuples` to dict
elem = input[0][0]  # start point in the new list
new_list = []  # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values, and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
If two tuples start with the same number, you will lose one of them... And if the tuples do not form a single cycle, the lookup start_dict[start] will fail with a KeyError before the dictionary is empty.
But maybe this is something to build on.
Actually, there are many open questions about what you intend to have as output, and about what should happen if the input list does not have the structure needed for this.
Assume you have an input of pairs in which each number appears exactly twice. Then we can consider the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you suppose that this graph is a cycle and looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows that the list needed to store the graph can be reduced to [10, 7, 13, 4, 9]. And here is the script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict
chosen_tuples = input[:1]  # Start from the first
first_values = defaultdict(list)  # a list factory is needed so that .append works on new keys
for tup in input[1:]:
    first_values[tup[0]].append(tup)
while first_values:  # Loop will end when all lists are removed
    value = chosen_tuples[-1][1]  # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value]  # List empty, remove it
You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]] # output contains the first element of input
temp = input[1:] # temp contains the rest of elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop()  # We compare each element with output[-1]
    output.append(item)  # We add the right item to output
    temp.remove(item)  # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
Here is a robust solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst
output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
def match_tuples(input):
    # make a copy so as not to mess with the original
    tuples = input[:]  # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0)  # (10,7)
    # {"first element of tuple": "index in list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)}  # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem  # yields the first element
    for i in range(len(tuples)):
        # find the index of the tuple whose first element matches the last element of the previous tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index]  # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
This is a (less efficient than the dictionary version) variant where the list is changed in-place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl)-1):  # iterate over the positions still to be filled
    item = tpl[i - 1]  # the tuple already in place just before position i
    for j, next_item in enumerate(tpl[i:], start=i):  # find the next item
                                                      # in the remaining list
        if next_item[0] == item[1]:
            next_index = j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]  # now swap the items
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
The list is changed in-place; the dictionary only contains the starting values of the tuples and their index in the list.
To get an O(n) algorithm one needs to make sure that one doesn't do a double loop over the array. One way to do this is by keeping already processed values in some sort of lookup table (a dict would be a good choice).
For example something like this (I hope the inline comments explain the functionality well). This modifies the list in-place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found, saving all
    # other items in the dictionary so we avoid unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item