Merging overlapping items in a list

My goal is to merge overlapping tuples in the example list below.
If an item overlaps the next one, the two tuples have to be merged. The resulting tuple covers the range of both items (minimum to maximum value). For instance, [(1,6),(2,5)] will result in [(1,6)], as (2,5) falls within the range of (1,6).
mylist=[(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]
My attempt:
c=[]
t2=[]
for i, x in enumerate(mylist):
    w=x,mylist[i-1]
    if x[0]-mylist[i-1][1]<=1:
        d=min([x[0] for x in w]),max([x[1] for x in w])
        c.append(d)
for i, x in enumerate(set(c)):
    t=x,c[i-1]
    if x[0]-c[i-1][1]<=1:
        t1=min([x[0] for x in t]),max([x[1] for x in t])
        t2.append(t1)
print(sorted(set(t2)))
Derived Output:
[(1, 6), (1, 10)]
Desired output:
[(1, 6), (9, 10)]
Any suggestions on how to get the desired output (in fewer lines if possible)? Thanks.

Based on the answer from @Valera, a Python implementation:
mylist=[(1, 6), (2, 5), (1, 1), (3, 7), (4, 4), (9, 10)]
result = []
for item in sorted(mylist):
    result = result or [item]
    if item[0] > result[-1][1]:
        result.append(item)
    else:
        old = result[-1]
        result[-1] = (old[0], max(old[1], item[1]))
print(result) # [(1, 7), (9, 10)]

You can solve this problem in O(n log n).
First, sort your intervals by their starting points. Then create a new stack and, for each interval, do the following:
if the stack is empty, just push the current interval
if it's not, check whether the interval on top of the stack overlaps with your current interval. If it does, pop it, merge it with your current interval, and push the result back. If it doesn't, just push your current interval. After you have checked all the intervals, your stack will contain all the merged intervals.
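
The steps above can be sketched as a small function. This is only an illustration: the name merge_intervals is made up here, and it assumes closed intervals given as (start, end) tuples, where an interval that starts at or before the end of the interval on top of the stack counts as overlapping.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) tuples (hypothetical helper name)."""
    stack = []
    for start, end in sorted(intervals):  # sorting dominates: O(n log n)
        if stack and start <= stack[-1][1]:
            # Overlaps the interval on top of the stack: pop, merge, push back.
            top_start, top_end = stack.pop()
            stack.append((top_start, max(top_end, end)))
        else:
            # Empty stack or no overlap: push the interval as it is.
            stack.append((start, end))
    return stack


print(merge_intervals([(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]))
# [(1, 6), (9, 10)]
```

If adjacent intervals such as (1, 2) and (3, 4) should also be merged, the comparison would need to become start <= stack[-1][1] + 1; whether that is wanted depends on the data.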

Related

Sorting a list of tuples made up of two numbers by consecutive numbers

The list of tuples is the output from a capacitated vehicle-routing optimization and represents the arcs from one stop to another, where 0 represents the depot of the vehicle (start and end point of vehicle). As the vehicle must drive several laps it may return to the depot before all stops were made. The solver will always return the starting-arcs first, which in the example below means that the first consecutive tuples starting with 0, namely (0, 3), (0, 7), (0, 8) will determine how many laps (= 3) there are.
How can I sort the arcs in consecutive order so that one vehicle could drive the arcs one after another?
Input:
li = [(0, 3), (0, 7), (0, 8), (3, 0), (4, 0), (7, 3), (8, 4), (3, 0)]
Output:
[(0, 3), (3, 0), (0, 7), (7, 3), (3, 0), (0, 8), (8, 4), (4, 0)]
What I tried so far:
def sort_arcs(li): # function name assumed; the original def line was not included
    laps = 0
    for arc in li:
        if arc[0] == 0:
            laps = laps + 1
    new_list = []
    for i in range(laps):
        value = li.pop(0)
        new_list.append([value])
    for i in range(laps):
        while new_list[i][-1][1] != 0:
            arc_end = new_list[i][-1][1]
            for j in range(len(li)):
                if li[j][0] == arc_end:
                    value = li.pop(j)
                    new_list[i].append(value)
                    break
    flat_list = [item for sublist in new_list for item in sublist]
    return flat_list

You are trying to solve the problem of finding cycles in a directed graph. The problem itself is not a difficult one to solve, and Python has a very good package for solving such problems: networkx. It would be a good idea to learn a bit about fundamental graph algorithms. There is also another Stack Overflow question about this algorithm that you can consult: "Finding all cycles in a directed graph".
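
For this particular input, a full cycle-finding library may not even be needed. The following is a minimal sketch, not a general cycle-enumeration algorithm: the helper name order_arcs and its depot parameter are made up for illustration, and it assumes each lap can be completed by greedily following the first unused arc out of the current stop.

```python
from collections import defaultdict, deque


def order_arcs(arcs, depot=0):
    """Walk the arcs lap by lap, starting and ending at the depot.

    Hypothetical helper; assumes a greedy walk can finish every lap.
    """
    outgoing = defaultdict(deque)  # stop -> destinations, in input order
    for src, dst in arcs:
        outgoing[src].append(dst)

    ordered = []
    while outgoing[depot]:  # one iteration per lap
        node = depot
        while True:
            if not outgoing[node]:
                raise ValueError("no unused arc leaves stop {}".format(node))
            nxt = outgoing[node].popleft()
            ordered.append((node, nxt))
            node = nxt
            if node == depot:
                break  # lap finished
    return ordered


li = [(0, 3), (0, 7), (0, 8), (3, 0), (4, 0), (7, 3), (8, 4), (3, 0)]
print(order_arcs(li))
# [(0, 3), (3, 0), (0, 7), (7, 3), (3, 0), (0, 8), (8, 4), (4, 0)]
```

Grouping the arcs by their start stop avoids rescanning the whole list for every step, and the input list is left unmodified.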

Sort a list of tuples in consecutive order

I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output=[]
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1] # Get the next element to search
print(output)
Is there a Pythonic way to achieve this ordering?
And is there a way to do it "in place" (with only one list)?
In my problem, the input can always be arranged into a cycle that uses all the tuples, so it does not matter which element is chosen first.

Assuming the tuples in your list form a cycle, you can use a dict to achieve this in O(n):
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input) # Convert list of `tuples` to dict
elem = input[0][0] # start point in the new list
new_list = [] # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]

If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
If two tuples start with the same number, you will lose one of them. If the chain breaks before all tuples are used (the next start number is no longer in the dictionary), the lookup will raise a KeyError.
But maybe this is something to build on.
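
One way to build on it is to make the two failure cases explicit. This is only a sketch: the function name chain_tuples and the choice of ValueError are assumptions made for illustration, not part of the answer above.

```python
def chain_tuples(pairs):
    """Order pairs so that each one starts where the previous one ends.

    Hypothetical helper; raises ValueError instead of silently dropping
    a tuple or failing on a missing dictionary key.
    """
    if not pairs:
        return []

    start_dict = {}
    for pair in pairs:
        if pair[0] in start_dict:
            raise ValueError("duplicate start value {}".format(pair[0]))
        start_dict[pair[0]] = pair

    res = []
    start = pairs[0][0]
    while start_dict:
        if start not in start_dict:
            # The chain closed or broke before every tuple was used.
            raise ValueError("chain breaks at {}".format(start))
        item = start_dict.pop(start)
        res.append(item)
        start = item[-1]
    return res


print(chain_tuples([(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]))
# [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```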

Actually, there are many open questions about what you intend to have as output, and about what should happen if the input list does not have a valid structure for what you need.
Assume you have an input of pairs where each number appears exactly twice. We can then consider the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you suppose that this graph is cyclic and looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows that the list storing the graph can be reduced to [10, 7, 13, 4, 9]. Here is the script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)

I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict
chosen_tuples = input[:1] # Start from the first
first_values = defaultdict(list) # Missing keys start out as empty lists
for tup in input[1:]:
    first_values[tup[0]].append(tup)
while first_values: # Loop will end when all lists are removed
    value = chosen_tuples[-1][1] # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value] # List empty, remove it

You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]] # output contains the first element of input
temp = input[1:] # temp contains the rest of elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop() # We compare each element with output[-1]
    output.append(item) # We add the right item to output
    temp.remove(item) # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]

Here is a robust solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst
output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).

My two cents:
def match_tuples(input):
    # making a copy so as not to mess up the original one
    tuples = input[:] # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0) # (10,7)
    # { "first tuple's element": "index in list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)} # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem # yields the first element
    for i in range(len(tuples)):
        # find where in the list the tuple is whose first element matches the last element of the last tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index] # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]

This is a variant (less efficient than the dictionary version) where the list is changed in place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl)-1): # iterate over the indices of the list
    item = tpl[i-1] # the tuple that is already in place
    for j, next_item in enumerate(tpl[i:]): # find the next item
                                            # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i] # now swap the items
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
The list is changed in place; the dictionary only contains the starting values of the tuples and their index in the list.

To get an O(n) algorithm, one needs to make sure that one doesn't do a double loop over the array. One way to do this is by keeping already processed values in some sort of lookup table (a dict would be a good choice).
For example, something like this (I hope the inline comments explain the functionality well). This modifies the list in place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found, saving all
    # other items in the dictionary so we avoid unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item

Insert element to list based on previous and next elements

I'm trying to add a new tuple to a list of tuples (sorted by first element in tuple), where the new tuple contains elements from both the previous and the next element in the list.
Example:
oldList = [(3, 10), (4, 7), (5,5)]
newList = [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
(4,10) was constructed from and added in between (3,10) and (4,7).
Construct (x,y) from (a,y) and (x,b)
I've tried using enumerate() to insert at the specific position, but that doesn't really let me access the next element.

oldList = [(3, 10), (4, 7), (5,5)]
def pair(lst):
    # create two iterators
    it1, it2 = iter(lst), iter(lst)
    # move second to the second tuple
    next(it2)
    for ele in it1:
        # yield original
        yield ele
        # the last tuple has no successor, so stop there; a StopIteration
        # escaping from next() inside a generator is a RuntimeError on Python 3.7+
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])
Which will give you:
In [3]: oldList = [(3, 10), (4, 7), (5, 5)]
In [4]: list(pair(oldList))
Out[4]: [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
Obviously we still need some error handling for other situations, such as an empty list.
You could also do it using a single iterator if you prefer:
def pair(lst):
    it = iter(lst)
    prev = next(it)
    for ele in it:
        yield prev
        # first ele from the next tuple, second from the previous one
        yield (ele[0], prev[1])
        prev = ele
    # the last tuple is yielded unchanged
    yield prev
You can use itertools.tee in place of calling iter:
from itertools import tee
def pair(lst):
    # create two iterators
    it1, it2 = tee(lst)
    # move second to the second tuple
    next(it2)
    for ele in it1:
        # yield original
        yield ele
        # stop after the last tuple, which has no successor
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])

You can use a list comprehension and itertools.chain():
>>> from itertools import chain
>>> list(chain.from_iterable([((i, j), (x, j)) for (i, j), (x, y) in zip(oldList, oldList[1:])])) + oldList[-1:]
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]

Not being a big fan of one-liners (or complexity) myself, I will propose a very explicit and readable (which is usually a good thing!) solution to your problem.
So, in a very simplistic approach, you could do this:
def insertElements(oldList):
    """
    Return a new list, alternating oldList tuples with
    new tuples in the form (oldList[i+1][0],oldList[i][1])
    """
    newList = []
    for i in range(len(oldList)-1):
        # take one tuple as is
        newList.append(oldList[i])
        # then add a new one with items from current and next tuple
        newList.append((oldList[i+1][0],oldList[i][1]))
    else:
        # don't forget the last tuple
        newList.append(oldList[-1])
    return newList
oldList = [(3, 10), (4, 7), (5, 5)]
newList = insertElements(oldList)
That will give you the desired result in newList:
print(newList)
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
This is not much longer code than other more sophisticated (and memory efficient!) solutions, like using generators, AND I consider it a lot easier to read than intricate one-liners. Also, it would be easy to add some checks to this simple function (like making sure you have a list of tuples).
Unless you already know you need to optimize this particular piece of your code (assuming this is part of a bigger project), this should be good enough. At the same time it is: easy to implement, easy to read, easy to explain, easy to maintain, easy to extend, easy to refactor, etc.
Note: all other previous answers to your question are also better solutions than this simple one, in many ways. Just wanted to give you another choice. Hope this helps.

Sorting out strands in python

I have a list of pairs of numbers, with the list sorted by the number on the right, e.g.:
[(7, 1),
 (6, 2),
 (5, 3),
 (8, 5),
 (9, 7),
 (4, 9)]
and I want to get out the strands that are linked. A strand is defined as:
x->y->z
where tuples exist:
(y, x)
(z, y)
The strands in the above example are:
1->7->9->4
2->6
3->5->8
I cannot think of any sensible code, as simple iteration with a counting variable will cause significant repeats. Please give me some pointers.

There's an easier way to do this than a real linked list. Since there's no real need for traversal, you can simply build regular lists as you go.
ts = [(7, 1),
      (6, 2),
      (5, 3),
      (8, 5),
      (9, 7),
      (4, 9)]
def get_strands(tuples):
    '''builds a list of lists of connected x,y tuples
    get_strands([(2,1), (3,2), (4,3)]) -> [[1,2,3,4]]
    Note that this will not handle forked or merging lists intelligently
    '''
    lst = []
    for end, start in tuples:
        strand = next((strand for strand in lst if strand[-1]==start), None)
        # give me the sublist that ends with `start`, or None
        if strand is None:
            lst.append([start, end]) # start a new strand
        else:
            strand.append(end)
    return lst
Demo:
In [21]: get_strands(ts)
Out[21]: [[1, 7, 9, 4], [2, 6], [3, 5, 8]]

I think the most complete solution is to create a graph from your data and then perform a topological sort on it. It will provide your expected result as long as your graph doesn't have any cycles.
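
As a rough sketch of the topological-sort idea, the standard-library graphlib module (Python 3.9+) can be used. The variable names are chosen for illustration only. Note that a topological sort yields one linear order in which every number comes after its predecessor; it does not by itself split the result into separate strands, so that step would still have to be added.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

ts = [(7, 1), (6, 2), (5, 3), (8, 5), (9, 7), (4, 9)]

sorter = TopologicalSorter()
for end, start in ts:
    # add(node, *predecessors): `start` has to come before `end`
    sorter.add(end, start)

# static_order() raises graphlib.CycleError if the graph has a cycle
order = list(sorter.static_order())
print(order)
```

The exact order printed can differ between valid topological orders, which is why no expected output is shown here.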

Python Easiest Way to Sum List Intersection of List of Tuples

Let's say I have the following two lists of tuples
myList = [(1, 7), (3, 3), (5, 9)]
otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
returns => [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
I would like to design a merge operation that combines these two lists by checking for intersections on the first element of each tuple. If there is an intersection, the second elements of the two tuples in question are added together (merging the two). After the operation, I would like to sort the result by the first element.
I am also posting this because I think it's a pretty common problem that has an obvious solution, but I feel that there could be very Pythonic solutions to this question ;)

Use a dictionary for the result:
result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v
If you want a list of tuples, you can get it via result.items(). The resulting list is not sorted by key, but of course you can sort it if desired.
(Note that I renamed your lists to conform with Python's style conventions.)

Use defaultdict:
from collections import defaultdict
results_dict = defaultdict(int)
results_dict.update(my_list)
for a, b in other_list:
    results_dict[a] += b
results = sorted(results_dict.items())
Note: When sorting sequences, sorted sorts by the first item in the sequence. If the first elements are the same, then it compares the second element. You can give sorted a function to sort by, using the key keyword argument:
results = sorted(results_dict.items(), key=lambda x: x[1]) #sort by the 2nd item
or
results = sorted(results_dict.items(), key=lambda x: abs(x[0])) #sort by absolute value

A method using itertools:
>>> myList = [(1, 7), (3, 3), (5, 9)]
>>> otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
>>> import itertools
>>> merged = []
>>> for k, g in itertools.groupby(sorted(myList + otherList), lambda e: e[0]):
...     merged.append((k, sum(e[1] for e in g)))
...
>>> merged
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
This first concatenates the two lists together and sorts it. itertools.groupby returns the elements of the merged list, grouped by the first element of the tuple, so it just sums them up and places it into the merged list.

>>> [(k, sum(v for x,v in myList + otherList if k == x)) for k in sorted(dict(myList + otherList).keys())]
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
Tested for both Python 2.7 and 3.2.
dict(myList + otherList).keys() returns an iterable containing the set of keys for the joined lists. Wrapping it in sorted() guarantees ascending order; on Python 3.7+ dicts preserve insertion order, so the keys would otherwise come out as 1, 3, 5, 2, 7.
sum(...) takes 'k' to loop again through the joined list and add up the tuple items 'v' where k == x
... but the extra looping adds processing overhead. Using an explicit dictionary as proposed by Sven Marnach avoids it.
