I'm trying to add a new tuple to a list of tuples (sorted by first element in tuple), where the new tuple contains elements from both the previous and the next element in the list.
Example:
oldList = [(3, 10), (4, 7), (5,5)]
newList = [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
(4,10) was constructed from and added in between (3,10) and (4,7).
Construct (x,y) from (a,y) and (x,b)
I've tried using enumerate() to insert at the specific position, but that doesn't really let me access the next element.
oldList = [(3, 10), (4, 7), (5,5)]
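def pair(lst):
    # create two iterators
    it1, it2 = iter(lst), iter(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        # peek at the next tuple; stop cleanly after the last one
        # (a bare next() here raises RuntimeError on Python 3.7+)
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])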
Which will give you:
In [3]: oldList = [(3, 10), (4, 7), (5, 5)]
In [4]: list(pair(oldList))
Out[4]: [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
Obviously we need to do some error handling to handle different possible situations.
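One way to cover those situations (an empty list, or a list holding a single tuple) is to pair each tuple with its successor using zip. This is only a sketch; the name pair_safe is made up for illustration and the input is assumed to be a list of 2-tuples.

```python
def pair_safe(lst):
    """Yield each tuple of lst followed by a tuple built from its successor.

    pair_safe is a hypothetical name; lst is assumed to be a list of 2-tuples.
    """
    # zip stops at the shorter input, so empty and one-element
    # lists need no special StopIteration handling
    for cur, nxt in zip(lst, lst[1:]):
        yield cur
        # first element of the next tuple, second element of the current one
        yield (nxt[0], cur[1])
    # the last tuple (if any) has no successor and is emitted unchanged
    yield from lst[-1:]


print(list(pair_safe([(3, 10), (4, 7), (5, 5)])))
# [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
print(list(pair_safe([])))        # []
print(list(pair_safe([(1, 2)])))  # [(1, 2)]
```

Slicing with lst[1:] copies the list, so for very large inputs the iterator-based versions may be preferable.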
You could also do it using a single iterator if you prefer:
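def pair(lst):
    it = iter(lst)
    prev = next(it)
    for ele in it:
        yield prev
        # first element of the next tuple, second element of the current one
        yield (ele[0], prev[1])
        prev = ele
    # the last tuple has no successor, so emit it unchanged
    yield prev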
You can use itertools.tee in place of calling iter:
from itertools import tee
def pair(lst):
    # create two iterators
    it1, it2 = tee(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        # peek at the next tuple; stop cleanly after the last one
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])
You can use a list comprehension and itertools.chain():
>>> from itertools import chain
>>> list(chain.from_iterable([((i, j), (x, j)) for (i, j), (x, y) in zip(oldList, oldList[1:])])) + oldList[-1:]
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
Not being a big fan of one-liners (or complexity) myself, I will propose a very explicit and readable (which is usually a good thing!) solution to your problem.
So, in a very simplistic approach, you could do this:
def insertElements(oldList):
    """
    Return a new list, alternating oldList tuples with
    new tuples in the form (oldList[i+1][0],oldList[i][1])
    """
    newList = []
    for i in range(len(oldList)-1):
        # take one tuple as is
        newList.append(oldList[i])
        # then add a new one with items from current and next tuple
        newList.append((oldList[i+1][0],oldList[i][1]))
    else:
        # don't forget the last tuple
        newList.append(oldList[-1])
    return newList
oldList = [(3, 10), (4, 7), (5, 5)]
newList = insertElements(oldList)
That will give you the desired result in newList:
print(newList)
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
This is not much longer code than other more sophisticated (and memory efficient!) solutions, like using generators, AND I consider it a lot easier to read than intricate one-liners. Also, it would be easy to add some checks to this simple function (like making sure you have a list of tuples).
Unless you already know you need to optimize this particular piece of your code (assuming this is part of a bigger project), this should be good enough. At the same time it is: easy to implement, easy to read, easy to explain, easy to maintain, easy to extend, easy to refactor, etc.
Note: all other previous answers to your question are also better solutions than this simple one, in many ways. Just wanted to give you another choice. Hope this helps.
Related
Given a list of iterables:
li = [(1,2), (3,4,8), (3,4,7), (9,)]
I want to sort by the third element if present, otherwise leave the order unchanged. So here the desired output would be:
[(1,2), (3,4,7), (3,4,8), (9,)]
Using li.sort(key=lambda x:x[2]) returns an IndexError. I tried a custom function:
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        return # (ie return None)
li.sort(key=lambda x: safefetch(x, 2))
But None in sorting yields a TypeError.
Broader context: I first want to sort by the first element, then the second, then the third, etc. until the length of the longest element, ie I want to run several sorts of decreasing privilege (as in SQL's ORDER BY COL1 , COL2), while preserving order among those elements that aren't relevant. So: first sort everything by first element; then among the ties on el_1 sort on el_2, etc.. until el_n. My feeling is that calling a sort function on the whole list is probably the wrong approach.
(Note that this was an "XY question": for my actual problem, just using sorted on tuples is simplest, as Patrick Artner pointed out in the comments. But the question as posed is trickier.)
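For the broader goal described above (sort by the first element, then the second, and so on), Python's built-in tuple comparison already works element by element, and a tuple that is a prefix of a longer one sorts first. A minimal sketch using the sample data from the question; the name by_columns is only illustrative:

```python
li = [(1, 2), (3, 4, 8), (3, 4, 7), (9,)]

# tuples compare lexicographically: first elements, then second, ...
by_columns = sorted(li)
print(by_columns)
# [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]

# a shorter tuple that is a prefix of a longer one sorts before it
prefixes = sorted([(3, 4, 7), (3, 4), (3,)])
print(prefixes)
# [(3,), (3, 4), (3, 4, 7)]
```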
We can first get the indices for distinct lengths of elements in the list via a defaultdict and then sort each sublist with numpy's fancy indexing:
from collections import defaultdict
import numpy as np
# {length -> inds} mapping
d = defaultdict(list)
# collect indices per length
for j, tup in enumerate(li):
    d[len(tup)].append(j)
# sort
li = np.array(li, dtype=object)
for inds in d.values():
    li[inds] = sorted(li[inds])
# convert back to list if desired
li = li.tolist()
to get li at the end as
[(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
For some other samples:
In [134]: the_sorter([(12,), (3,4,8), (3,4,7), (9,)])
Out[134]: [(9,), (3, 4, 7), (3, 4, 8), (12,)]
In [135]: the_sorter([(12,), (3,4,8,9), (3,4,7), (11, 9), (9, 11), (2, 4, 4, 4)])
Out[135]: [(12,), (2, 4, 4, 4), (3, 4, 7), (9, 11), (11, 9), (3, 4, 8, 9)]
where the_sorter is the above procedure wrapped in a function (the name lacks imagination...)
def the_sorter(li):
    # {length -> inds} mapping
    d = defaultdict(list)
    # collect indices per length
    for j, tup in enumerate(li):
        d[len(tup)].append(j)
    # sort
    li = np.array(li, dtype=object)  # object dtype, as in the snippet above
    for inds in d.values():
        li[inds] = sorted(li[inds])
    return li.tolist()
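The same idea (group positions by tuple length, then sort within each group) can also be sketched without NumPy. The helper name sort_within_lengths is hypothetical, and this is an illustration rather than a drop-in replacement for the_sorter:

```python
from collections import defaultdict


def sort_within_lengths(li):
    """Sort tuples of equal length among the positions they occupy.

    Hypothetical helper: same idea as the_sorter, without NumPy.
    """
    # {length -> positions of tuples with that length}
    positions = defaultdict(list)
    for j, tup in enumerate(li):
        positions[len(tup)].append(j)

    result = list(li)  # copy so the input list is left untouched
    for inds in positions.values():
        # sort the tuples of this length, then write them back
        # into the same positions in ascending order
        for idx, tup in zip(inds, sorted(li[i] for i in inds)):
            result[idx] = tup
    return result


print(sort_within_lengths([(1, 2), (3, 4, 8), (3, 4, 7), (9,)]))
# [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
print(sort_within_lengths([(12,), (3, 4, 8), (3, 4, 7), (9,)]))
# [(9,), (3, 4, 7), (3, 4, 8), (12,)]
```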
Whatever you return as fallback value must be comparable to the other key values that might be returned. In your example that would require a numerical value.
import sys
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        # large sentinel; Python 3 ints are unbounded, so this is
        # not a true maximum, only larger than any realistic value
        return sys.maxsize
This would put all the short ones at the back of the sort order, but maintain a stable order among them.
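A short usage sketch of that fallback, followed by a variant that avoids a magic number by sorting on a (is_short, value) tuple. The function third_or_last is a hypothetical name introduced here, not part of the original answer:

```python
import sys


def safefetch(seq, idx):
    try:
        return seq[idx]
    except IndexError:
        # fallback that compares with ints and sorts after realistic values
        return sys.maxsize


li = [(1, 2), (3, 4, 8), (3, 4, 7), (9,)]
with_sentinel = sorted(li, key=lambda t: safefetch(t, 2))
print(with_sentinel)
# [(3, 4, 7), (3, 4, 8), (1, 2), (9,)]


def third_or_last(t, idx=2):
    # (False, value) sorts before (True, 0); short tuples all share
    # the same key, so the stable sort keeps their relative order
    return (len(t) <= idx, t[idx] if len(t) > idx else 0)


with_flag = sorted(li, key=third_or_last)
print(with_flag)
# [(3, 4, 7), (3, 4, 8), (1, 2), (9,)]
```

The tuple key never compares the placeholder 0 with a real value, because the boolean in front already decides the order, so it should also work when the elements are not integers.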
Inspired by @Mustafa Aydın, here is a solution in Pandas. I would prefer one without the memory overhead of a dataframe, but this might be good enough.
import pandas as pd
li = [(1,2), (3,4,8), (3,4,7), (9,)]
tmp = pd.DataFrame(li)
[tuple(int(el) for el in t if not pd.isna(el)) for t in tmp.sort_values(by=tmp.columns.tolist()).values]
> [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
I want to sort a list of tuples in a consecutive order, so the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output=[]
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1] # Get the next element to search
print(output)
Is there a pythonic way to achieve such order?
And a way to do it "in-place" (with only a list)?
In my problem, the input can always be reordered into a cycle that uses all the tuples, so it does not matter which element is chosen first.
Assuming your tuples in the list will be circular, you may use dict to achieve it within complexity of O(n) as:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input) # Convert list of `tuples` to dict
elem = input[0][0] # start point in the new list
new_list = [] # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values, and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
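If two tuples start with the same number, you will lose one of them. If the tuples do not form a single cycle that uses every start number, the loop will not finish normally: the lookup start_dict[start] raises a KeyError as soon as the chain reaches a number that is no longer (or never was) a key.
But maybe this is something to build on.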
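The two caveats above (duplicate start numbers, and input that does not form one chain through all tuples) can be turned into explicit errors. A sketch under those assumptions; chain_tuples is a hypothetical name and ValueError is just one reasonable choice of exception:

```python
def chain_tuples(pairs):
    """Order pairs so each tuple starts where the previous one ended.

    Hypothetical helper; raises ValueError instead of failing midway
    when the input does not form one cycle through all tuples.
    """
    if not pairs:
        return []
    start_dict = {item[0]: item for item in pairs}
    if len(start_dict) != len(pairs):
        raise ValueError("two tuples start with the same number")
    start = pairs[0][0]
    res = []
    while start_dict:
        # pop() with a default avoids a bare KeyError on a broken chain
        item = start_dict.pop(start, None)
        if item is None:
            raise ValueError("no unused tuple starts with {}".format(start))
        res.append(item)
        start = item[-1]
    return res


print(chain_tuples([(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]))
# [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
```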
There are actually many open questions about what output you expect and what should happen if the input list does not have the structure you need.
Assume you have an input of pairs in which each number appears exactly twice. We can then treat the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you assume that this graph is a cycle and looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows you that you can reduce the list to store the graph to [10, 7, 13, 4, 9]. And here is the script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict
chosen_tuples = input[:1] # Start from the first
first_values = defaultdict(list)  # missing keys start as empty lists
for tup in input[1:]:
    first_values[tup[0]].append(tup)
while first_values: # Loop will end when all lists are removed
    value = chosen_tuples[-1][1] # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value] # List empty, remove it
You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]] # output contains the first element of input
temp = input[1:] # temp contains the rest of elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop() # We compare each element with output[-1]
    output.append(item) # We add the right item to output
    temp.remove(item) # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
Here is a robust solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst
output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
def match_tuples(input):
    # making a copy so as not to modify the original list
    tuples = input[:] # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0) # (10,7)
    # { "first tuple's element": "index in list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)} # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem # yields the first element
    for i in range(len(tuples)):
        # find the tuple whose first element matches the second element of the last tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index] # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
This is a (less efficient than the dictionary version) variant where the list is changed in-place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl)-1): # iterate over the indices of the list
    item = tpl[i-1] # the tuple that position i has to follow
    for j, next_item in enumerate(tpl[i:]): # find the next item
        # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i] # now swap the items
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
The list is changed in-place; the dictionary only contains the starting values of the tuples and their index in the list.
To get a O(n) algorithm one needs to make sure that one doesn't do a double-loop over the array. One way to do this is by keeping already processed values in some sort of lookup-table (a dict would be a good choice).
For example something like this (I hope the inline comments explain the functionality well). This modifies the list in-place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found saving all
    # other items in the dictionary so we avoid to do unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item
My goal is to merge overlapping tuples in the example list below.
If an item falls within the range of the next, the two tuples have to be merged. The resulting tuple covers the range of both items (minimum to maximum values). For instance, [(1,6),(2,5)] results in [(1,6)], because (2,5) falls within the range of (1,6).
mylist=[(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]
My attempt:
c=[]
t2=[]
for i, x in enumerate(mylist):
    w=x,mylist[i-1]
    if x[0]-mylist[i-1][1]<=1:
        d=min([x[0] for x in w]),max([x[1] for x in w])
        c.append(d)
for i, x in enumerate(set(c)):
    t=x,c[i-1]
    if x[0]-c[i-1][1]<=1:
        t1=min([x[0] for x in t]),max([x[1] for x in t])
        t2.append(t1)
print sorted(set(t2))
Derived Output:
[(1, 6), (1, 10)]
Desired output:
[(1, 6), (9, 10)]
Any suggestions on how to get the desired output (in fewer lines if possible)? Thanks.
Based on the answer from @Valera, a Python implementation:
mylist=[(1, 6), (2, 5), (1, 1), (3, 7), (4, 4), (9, 10)]
result = []
for item in sorted(mylist):
    result = result or [item]
    if item[0] > result[-1][1]:
        result.append(item)
    else:
        old = result[-1]
        result[-1] = (old[0], max(old[1], item[1]))
print(result) # [(1, 7), (9, 10)]
You can solve this problem in O(n log n) time.
First, sort your intervals by their starting points. Then create a new stack and, for each interval, do the following:
If the stack is empty, just push the current interval.
If it is not, check whether the interval on top of the stack overlaps your current interval. If it does, pop it, merge it with the current interval, and push the result back. If it doesn't, just push the current interval. After you have checked all the intervals, the stack will contain all the merged intervals.
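The steps above can be sketched as follows. The function name merge_intervals is hypothetical, and the sketch assumes that intervals sharing an endpoint count as overlapping while merely adjacent integers (such as (1, 2) and (3, 4)) do not:

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) tuples; hypothetical helper name."""
    stack = []
    for start, end in sorted(intervals):  # sort by starting point
        if stack and start <= stack[-1][1]:
            # overlaps the interval on top of the stack:
            # pop it, merge, and push the result back
            top_start, top_end = stack.pop()
            stack.append((top_start, max(top_end, end)))
        else:
            # stack is empty or there is no overlap: push as is
            stack.append((start, end))
    return stack


print(merge_intervals([(1, 1), (1, 6), (2, 5), (4, 4), (9, 10)]))
# [(1, 6), (9, 10)]
```

Sorting dominates the running time; the single pass over the sorted intervals is linear.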
How can I iterate over groupby results in pairs? What I tried isn't quite working:
from itertools import groupby,izip
groups = groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)],key=len)
def grouped(iterable, n):
    return izip(*[iterable]*n)
for g, gg in grouped(groups,2):
    print list(g[1]), list(gg[1])
Output I get:
[] [(1, 2), (1, 2)]
[] [(3, 4)]
Output I would like to have:
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]
import itertools as IT
groups = IT.groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)], key=len)
groups = (list(group) for key, group in groups)
def grouped(iterable, n):
    return IT.izip(*[iterable]*n)
for p1, p2 in grouped(groups, 2):
    print p1, p2
yields
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]
The code you posted is very interesting. It has a mundane problem, and a subtle problem.
The mundane problem is that itertools.groupby returns an iterator which outputs both a key and a group on each iteration.
Since you are interested in only the groups, not the keys, you need something like
groups = (group for key, group in groups)
The subtle problem is more difficult to explain -- I'm not really sure I understand it fully. Here is my guess: The iterator returned by groupby has turned its input,
[(1,2,3),(1,2),(1,2),(3,4,5),(3,4)]
into an iterator. That the groupby iterator is wrapped around the underlying data iterator is analogous to how a csv.reader is wrapped around an underlying file object iterator. You get one pass through this iterator and one pass only. The itertools.izip function, in the process of pairing items in groups, causes the groups iterator to advance from the first item to the second. Since you only get one pass through the iterator, the first item has been consumed, so when you call list(g[1]) it is empty.
A not-so-satisfying fix to this problem is to convert the iterators in groups into lists:
groups = (list(group) for key, group in groups)
so itertools.izip will not prematurely consume them. Edit: On second thought, this fix is not so bad. groups remains an iterator, and only turns the group into a list as it is consumed.
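The snippets above are Python 2 (itertools.izip, print statement). A possible Python 3 port of the same fix, where the built-in zip takes the place of izip; the names data and pairs are introduced here only for illustration:

```python
from itertools import groupby

data = [(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)]

# materialize each group as a list while groupby is still positioned on it
groups = (list(group) for key, group in groupby(data, key=len))


def grouped(iterable, n):
    # n references to one iterator, so zip pulls n consecutive items
    return zip(*[iter(iterable)] * n)


pairs = list(grouped(groups, 2))
for p1, p2 in pairs:
    print(p1, p2)
# [(1, 2, 3)] [(1, 2), (1, 2)]
# [(3, 4, 5)] [(3, 4)]
```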
When you try to look at the second key from the groupby, you are forcing it to iterate that far into the source iterator. Since there is normally nowhere to store the items from the first group, they are simply discarded.
So now we understand why we'll need to make sure we've stored the items from the first group before we try to look at the key (or the items) of the second group.
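The discarding can be observed directly. This small Python 3 sketch (variable names are illustrative) advances the groupby object twice before reading the first group:

```python
from itertools import groupby

data = [(1, 2, 3), (1, 2), (1, 2)]
groups = groupby(data, key=len)

key1, group1 = next(groups)
key2, group2 = next(groups)  # advancing groupby invalidates group1

first = list(group1)
second = list(group2)
print(first)   # []
print(second)  # [(1, 2), (1, 2)]
```

Calling list(group1) before the second next(groups) would have returned [(1, 2, 3)] instead.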
Some people are sure to hate this, but
>>> groups = groupby([(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)], key=len)
>>> for i, j in ((list(i[1]), list(next(groups)[1])) for i in groups):
...     print i, j
...
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]
Let's say I have the following two lists of tuples
myList = [(1, 7), (3, 3), (5, 9)]
otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
returns => [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
I would like to design a merge operation that merges these two lists by checking for any intersections on the first element of the tuple, if there are intersections, add the second elements of each tuple in question (merge the two). After the operation I would like to sort based upon the first element.
I am also posting this because I think it's a pretty common problem that has an obvious solution, but I feel that there could be very pythonic solutions to this question ;)
Use a dictionary for the result:
result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v
If you want a list of tuples, you can get it via result.items(). The resulting list will be in arbitrary order, but of course you can sort it if desired.
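Put together with the sample data from the question, and with the sorting step included, this might look as follows (the name merged is only illustrative):

```python
my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]

result = {}
for k, v in my_list + other_list:
    # add v to the running total for key k (0 if k is new)
    result[k] = result.get(k, 0) + v

# items() yields (key, total) pairs; sorted() orders them by key
merged = sorted(result.items())
print(merged)
# [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
```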
(Note that I renamed your lists to conform with Python's style conventions.)
Use defaultdict:
from collections import defaultdict
results_dict = defaultdict(int)
results_dict.update(my_list)
for a, b in other_list:
    results_dict[a] += b
results = sorted(results_dict.items())
Note: When sorting sequences, sorted sorts by the first item in the sequence. If the first elements are the same, then it compares the second element. You can give sorted a function to sort by, using the key keyword argument:
results = sorted(results_dict.items(), key=lambda x: x[1]) #sort by the 2nd item
or
results = sorted(results_dict.items(), key=lambda x: abs(x[0])) #sort by absolute value
A method using itertools:
>>> myList = [(1, 7), (3, 3), (5, 9)]
>>> otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
>>> import itertools
>>> merged = []
>>> for k, g in itertools.groupby(sorted(myList + otherList), lambda e: e[0]):
...     merged.append((k, sum(e[1] for e in g)))
...
>>> merged
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
This first concatenates the two lists together and sorts it. itertools.groupby returns the elements of the merged list, grouped by the first element of the tuple, so it just sums them up and places it into the merged list.
>>> [(k, sum(v for x,v in myList + otherList if k == x)) for k in dict(myList + otherList).keys()]
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
>>>
tested for both Python2.7 and 3.2
dict(myList + otherList).keys() returns an iterable containing a set of the keys for the joined lists
sum(...) takes 'k' to loop again through the joined list and add up tuple items 'v' where k == x
... but the extra looping adds processing overhead. Using an explicit dictionary as proposed by Sven Marnach avoids it.