Python - strategy to construct triplet tuples from pair tuples - python

I don't think the title does a great job acting as a high level explanation of the problem, but I do think this is an interesting problem to try to solve:
Given a python list of tuples of length 2:
pairs = [('G', 'H'), ('C', 'D'), ('B', 'D'), ('A', 'B'), ('B', 'C')]
I would like to create a new list containing tuples of length 3, on the condition that the tuple ('X', 'Y', 'Z') is created only if the pairs ('X', 'Y'), ('Y', 'Z'), and ('X', 'Z') all appear as tuples in the pairs list. In the case of my pairs list, only the triplet ('B', 'C', 'D') would be created (preferably alphabetically).
I haven't used python in several months, so am a bit rusty and would prefer to solve this using mostly base python packages, but open to any suggestions. Thanks in advance for any help!

I'd use itertools to check if all the pairs exist.
from itertools import combinations
doubles = [('G', 'H'), ('C', 'D'), ('B', 'D'), ('A', 'B'), ('B', 'C')]
keys = set([x for double in doubles for x in double])
options = combinations(keys, 3)
triples = list()
for option in options:
    x, y, z = sorted(option)
    first, second, third = (x, y), (x, z), (y, z)
    if first in doubles and second in doubles and third in doubles:
        # append the sorted triple so the result comes out alphabetically
        triples.append((x, y, z))
This assumes that all the tuples in your list are already sorted though.

vals = set([i for (i, j) in pairs] + [j for (i, j) in pairs])
triples = [(i, j, k) for i in vals
for j in vals
for k in vals
if (((i, j) in pairs) and
((j, k) in pairs) and
((i, k) in pairs))]
Now, this only works if the order within each tuple matters. If it does not, you'd want to include the reverse-order tuples in pairs as well.
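One way to make the check independent of the order inside each pair is to compare unordered pairs instead of adding reversed tuples. The following is a minimal sketch under that assumption; the function name find_triples and the reversed ('D', 'C') pair in the sample data are illustrative and not part of the original question.

```python
from itertools import combinations

def find_triples(pairs):
    # Store every pair as a frozenset so ('C', 'D') and ('D', 'C') compare equal.
    edges = {frozenset(p) for p in pairs}
    # Sorting the items makes each generated triple alphabetical.
    items = sorted({x for p in pairs for x in p})
    return [
        (a, b, c)
        for a, b, c in combinations(items, 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    ]

# Same data as the question, with one pair deliberately reversed.
pairs = [('G', 'H'), ('D', 'C'), ('B', 'D'), ('A', 'B'), ('B', 'C')]
triples = find_triples(pairs)
print(triples)
```

Because the pairs are kept in a set, each membership test is a hash lookup rather than a scan of the list.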

Related

Find longest combination of non-overlapping pairs

I need to find the length of the longest combination of pairs that can be made from a list of pairs, without any common elements.
For example the following list of pairs:
[(A, B), (A, D), (B, C), (B, D), (C, D)]
Would have these combinations:
[(A, B), (C, D)]
[(A, D), (B, C)]
[(B, D)]
And so the longest combination would be 2 pairs in length.
This needs to be able to handle up to several thousand pairs so generating all possible combinations of pairs at each possible length and checking for overlaps would not work.
However, the total number of unique elements across all pairs is capped at 100, so the longest possible combination that could be encountered would be 50 pairs.
Is there an efficient way to do this?
Okay, this is what I have; maybe not the best, but it's something.
Combo initializes a combination from any 2 non-overlapping pairs and feeds it to Combine along with the rest of the array that has not been checked yet.
Combine takes the leftover array, the current combo and a list of used elements, then checks each remaining tuple: if the tuple from the leftover array has any element in the used list, it is skipped; if it doesn't, it is added to the combo, which is passed to a further recursive Combine call until the combo is as long as it can be.
arr = [('A', 'B'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('E', 'D'), ("A",'F'),('J','K'),('M','K'),('K','D'),('B','F')]
def Combo(arr):
    combos = []
    for i, tup1 in enumerate(arr):
        combo = [tup1]
        used = [tup1[0], tup1[1]]
        for j, tup2 in enumerate(arr[i:]):
            if (tup2[0] in used) or (tup2[1] in used):
                continue
            else:
                for el in tup2:
                    used.append(el)
                combo.append(tup2)
                combo=Combine(arr[j:], combo, used)
                combos.append(combo)
    return combos
def Combine(arr, combo, used):
    if arr==[]:
        return combo
    for i, tup in enumerate(arr):
        unique = True
        for el in tup:
            if el in used:
                unique = False
                continue
        if unique:
            combo.append(tup)
            for el in tup:
                used.append(el)
            return Combine(arr[i:], combo, used)
    return combo
Combo(arr)
OUTPUT
[[('A', 'B'), ('E', 'D'), ('J', 'K')],
[('A', 'D'), ('B', 'C'), ('J', 'K')],
[('B', 'C'), ('E', 'D'), ('A', 'F'), ('J', 'K')],
[('B', 'D'), ('A', 'F'), ('J', 'K')],
[('E', 'D'), ('A', 'F'), ('B', 'C'), ('J', 'K')],
[('A', 'F'), ('J', 'K'), ('B', 'C'), ('E', 'D')],
[('J', 'K'), ('B', 'F'), ('E', 'D')],
[('M', 'K'), ('B', 'F'), ('E', 'D')],
[('K', 'D'), ('B', 'F')]]
As far as I know, this should give you each unique combination in a list.
Rephrasing the question, we want to find the biggest set of non-overlapping elements of pairs. Probably not the best solution but should work:
def process(pairs):
    output = {}
    max_length = 0
    for i in range(len(pairs)):
        curr = 1
        output[pairs[i]] = set(pairs[i])
        rest = pairs[:i] + pairs[i + 1:]
        for j in range(len(rest)):
            subset = output[pairs[i]] | set(rest[j])
            if len(subset) == len(output[pairs[i]]) + 2:
                curr += 1
                output[pairs[i]] = subset
        max_length = max(curr, max_length)
    return max_length
We populate our initial set with the current pair and then, if the next pair's elements are not present in the current set, we extend it. We continue this process until we have checked all remaining pairs. I used this function for testing:
import random, string, timeit
def get_random_pairs(num):
    return [(random.choice(string.ascii_uppercase), random.choice(string.ascii_uppercase)) for _ in range(num)]
print(timeit.timeit('process(pairs)', number=5, setup="from __main__ import process,get_random_pairs; pairs = get_random_pairs(3000)")/5)
On my machine (Intel i7-9750H (12) @ 4.500GHz) it takes about 5-6 seconds to process 3000 pairs.

Convert list of tuples such that [(a,b,c)] converts to [(a,b),(a,c)]

Thoughts on how I would do this? I want the first value in the tuple to pair with each successive value. This way each resulting tuple would be a pair starting with the first value.
I need to do this: [(a,b,c)] --> [(a,b),(a,c)]
You can try this.
(t,)=[('a','b','c')]
[(t[0],i) for i in t[1:]]
# [('a', 'b'), ('a', 'c')]
Using itertools.product
import itertools
it=iter(('a','b','c'))
list(itertools.product(next(it),it))
# [('a', 'b'), ('a', 'c')]
Using itertools.repeat
it=iter(('a','b','c'))
list(zip(itertools.repeat(next(it)),it))
# [('a', 'b'), ('a', 'c')]
a = [('a','b','c')]
a = a[0]
a = [tuple([a[0], a[index]]) for index in range(1, len(a))]
Try this !
A solution that uses the combinations function from itertools.
from itertools import combinations
arr = (['a','b','c'])
for i in list(combinations(arr, 2)):
    if(i[0]==arr[0]):
        print(i ,end = " ")
This prints ('a', 'b') ('a', 'c')
You can just append pairs of tuples to a list:
original = [(1,2,3)]
def makePairs(lis):
    ret = []
    for t in lis:
        ret.append((t[0],t[1]))
        ret.append((t[0],t[2]))
    return ret
print(makePairs(original))
Output:
[(1, 2), (1, 3)]
If your tuples are arbitrary length you can write a simple generator:
def make_pairs(iterable):
    iterator = iter(iterable)
    first = next(iterator)
    for item in iterator:
        yield first, item
example result:
my_tuple = ('a', 'b', 'c', 'd')
list(make_pairs(my_tuple))
Out[170]: [('a', 'b'), ('a', 'c'), ('a', 'd')]
This is a memory-efficient solution.
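The answers above unpack a single tuple. If the input list holds several tuples of varying length, the same idea can be sketched with starred unpacking inside a comprehension. The function name first_with_rest and the sample data are illustrative assumptions, and every tuple is assumed to be non-empty.

```python
def first_with_rest(tuples):
    # For each tuple, pair its first element with every later element.
    return [(first, other) for first, *rest in tuples for other in rest]

sample = [('a', 'b', 'c'), (1, 2), ('x', 'y', 'z', 'w')]
paired = first_with_rest(sample)
print(paired)
```

A tuple with a single element contributes no pairs, since there is nothing to pair its first value with.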

Combine lists of tuples by pairs into specific length of new list of tuples

I am newish to Python. I'm trying to combine two lists of tuples pairwise into a single list of tuples, where the tuples are of defined length (let's say 8):
For example,
input:
x = [(0,1,2,3),(4,5,6,7),(8,9,10,11)]
y = [('a','b','c','d'),('e','f','g','h'),('i','j','k','l')]
output:
[('a', 0, 'b', 1, 'c', 2, 'd', 3),
('e', 4, 'f', 5,'g', 6, 'h', 7),
('i', 8, 'j', 9, 'k', 10, 'l', 11)]
I've tried a few different loops that attempt to concatenate the pairwise combination tuples and then add them for a given length, but no luck. See below.
new = []
for n in range(len(x)):
    for p in range(len(x[n])):
        if p == len(x[n])-1:
            new += [(x[n][p],y[n][p])]
            for v in range(len(x[n])):
                newer += new[v]
        else:
            new += [(x[n][p],y[n][p])]
The above 'newer' list is not useful, but the 'new' list provides the pairwise combination of tuples that I'm looking for, like I believe merge() would do, at least.
[('a', 0),('b', 1),('c', 2),('d', 3),('e', 4),('f', 5),('g', 6),('h', 7),('i', 8),('j', 9),('k', 10), ('l', 11)]
I was thinking I could make a sort of window that reads across the desired length (in this case four) and concatenates the selection, but I am having trouble getting that to work.
Any other solutions are welcome.
Using a buffer:
b = [None] * 8
[tuple(b) for b[::2], b[1::2] in zip(y, x)]
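The one-liner above works because a for clause may assign to slices: on each iteration the letters are written into the even positions of the buffer and the numbers into the odd positions. An expanded sketch of the same idea, using the x and y from the question and a fresh buffer per row (the name interleaved is illustrative), might look like this:

```python
x = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
y = [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h'), ('i', 'j', 'k', 'l')]

interleaved = []
for letters, numbers in zip(y, x):
    # One slot per element of both tuples.
    buffer = [None] * (len(letters) + len(numbers))
    buffer[::2] = letters    # even positions: 0, 2, 4, ...
    buffer[1::2] = numbers   # odd positions: 1, 3, 5, ...
    interleaved.append(tuple(buffer))

print(interleaved)
```

Extended-slice assignment requires both tuples in a row to have the same length; otherwise Python raises a ValueError.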
Start by zipping your sub-lists into lists of matched pairs:
pair_list = [list(zip(a, b)) for a, b in zip(x, y) ]
Result:
[[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')],
[(4, 'e'), (5, 'f'), (6, 'g'), (7, 'h')],
[(8, 'i'), (9, 'j'), (10, 'k'), (11, 'l')]]
Now, simply flatten the inner lists. You can look up simple flattening as an "exercise for the student", okay?
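For reference, the flattening step can be sketched with itertools.chain.from_iterable. This version zips the letters first so the result matches the output requested in the question; the name combined is an illustrative choice.

```python
from itertools import chain

x = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
y = [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h'), ('i', 'j', 'k', 'l')]

# zip(letters, numbers) yields ('a', 0), ('b', 1), ...;
# chain.from_iterable flattens those pairs into one sequence per row.
combined = [
    tuple(chain.from_iterable(zip(letters, numbers)))
    for numbers, letters in zip(x, y)
]
print(combined)
```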
(Posted solution on behalf of the question author to move it to the answer space).
I ended up getting something to work:
final_list = []
for outer in range(len(a)):
    for ele in range(len(a[outer])):
        if ele == 0:
            slice_start = 0
        else:
            slice_start += len(b[ele-1])
        slice_end = len(b[ele])+slice_start
        capture = [target for sublist in a[slice_start:slice_end] for target in sublist]
        final_list.append(capture)
final_list = final_list[:len(a)]
Definitely not as beautiful as heap_overflow's answer.

How to merge all intersecting tuples in a list? [duplicate]

Consider the following list:
tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
How can I achieve this?
new_tuple_list = [('c', 'e', 'd'), ('a', 'b')]
I have tried:
for tuple in tuple_list:
    for tup in tuple_list:
        if tuple[0] == tup[0]:
            new_tup = (tuple[0],tuple[1],tup[1])
            new_tuple_list.append(new_tup)
But it only works if I have the elements of the tuple in a certain order which means it will result in this instead:
new_tuple_list = [('c', 'e', 'd'), ('a', 'b'), ('d', 'e')]
You could consider the tuples as edges in a graph and your goal as finding connected components within the graph. Then you could simply loop over vertices (items in tuples) and for each vertex you haven't visited yet execute DFS to generate a component:
from collections import defaultdict
def dfs(adj_list, visited, vertex, result, key):
    visited.add(vertex)
    result[key].append(vertex)
    for neighbor in adj_list[vertex]:
        if neighbor not in visited:
            dfs(adj_list, visited, neighbor, result, key)
edges = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
adj_list = defaultdict(list)
for x, y in edges:
    adj_list[x].append(y)
    adj_list[y].append(x)
result = defaultdict(list)
visited = set()
for vertex in adj_list:
    if vertex not in visited:
        dfs(adj_list, visited, vertex, result, vertex)
print(result.values())
Output:
[['a', 'b'], ['c', 'e', 'd']]
Note that in above both the components and elements within a component are in random order.
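The connected-components idea can also be sketched without recursion, using a disjoint-set (union-find) structure. That sidesteps Python's recursion limit on very long chains of tuples. This is a swapped-in technique, not the answer's own method; the function name merge_intersecting is illustrative, and every input tuple is assumed to be non-empty.

```python
def merge_intersecting(tuples):
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tup in tuples:
        first = tup[0]
        find(first)  # register single-element tuples too
        for other in tup[1:]:
            union(first, other)

    # Group every item under the root of its component.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(group) for group in groups.values()]

tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
merged = merge_intersecting(tuple_list)
print(merged)
```

On Python 3.7 and later, dictionaries keep insertion order, so items appear within each group in the order they were first seen.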
If you don't need duplicate values (the ability to preserve ['a', 'a', 'b'], for example), this is a simple and fast way to do what you want via sets:
iset = set([frozenset(s) for s in tuple_list])  # Convert to a set of sets
result = []
while(iset):                            # While there are sets left to process:
    nset = set(iset.pop())              # Pop a new set
    check = len(iset)                   # Does iset contain more sets
    while check:                        # Until no more sets to check:
        check = False
        for s in iset.copy():           # For each other set:
            if nset.intersection(s):    # if they intersect:
                check = True            # Must recheck previous sets
                iset.remove(s)          # Remove it from remaining sets
                nset.update(s)          # Add it to the current set
    result.append(tuple(nset))          # Convert back to a list of tuples
gives
[('c', 'e', 'd'), ('a', 'b')]
This performs poorly because the containment checks are O(n), but it's quite short:
result = []
for tup in tuple_list:
    for idx, already in enumerate(result):
        # check if any items are equal
        if any(item in already for item in tup):
            # tuples are immutable so we need to set the result item directly
            result[idx] = already + tuple(item for item in tup if item not in already)
            break
    else:
        # else in for-loops are executed only if the loop wasn't terminated by break
        result.append(tup)
This has the nice side-effect that the order is kept:
>>> result
[('c', 'e', 'd'), ('a', 'b')]
I had that problem with sets, so I'm contributing my solution to this. It combines sets that share one or more common elements for as long as possible.
My example data:
data = [['A','B','C'],['B','C','D'],['D'],['X'],['X','Y'],['Y','Z'],['M','N','O'],['M','N','O'],['O','A']]
data = list(map(set,data))
My code to solve the problem:
oldlen = len(data)+1
while len(data)<oldlen:
    oldlen = len(data)
    for i in range(len(data)):
        for j in range(i+1,len(data)):
            if len(data[i]&data[j]):
                data[i] = data[i]|data[j]
                data[j] = set()
    data = [data[i] for i in range(len(data)) if data[i]!= set()]
Result:
[{'A', 'B', 'C', 'D', 'M', 'N', 'O'}, {'X', 'Y', 'Z'}]
The task becomes trivial with NetworkX, a library for graph manipulation. Similar to this answer by @niemmi, you'd need to find the connected components:
import networkx as nx
tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
graph = nx.Graph(tuple_list)
result = list(nx.connected_components(graph))
print(result)
# [{'e', 'c', 'd'}, {'b', 'a'}]
To get the result as a list of tuples:
result = list(map(tuple, nx.connected_components(graph)))
print(result)
# [('d', 'e', 'c'), ('a', 'b')]
Use sets. You are checking for overlap and accumulation of (initially small) sets, and Python has a data type for that:
#!python3
#tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
tuple_list = [(1,2), (3,4), (5,), (1,3,5), (3,'a'),
(9,8), (7,6), (5,4), (9,'b'), (9,7,4),
('c', 'e'), ('e', 'f'), ('d', 'e'), ('d', 'f'),
('a', 'b'),
]
set_list = []
print("Tuple list:", tuple_list)
for t in tuple_list:
    #print("Set list:", set_list)
    tset = set(t)
    matched = []
    for s in set_list:
        if tset & s:
            s |= tset
            matched.append(s)
    if not matched:
        #print("No matches. New set: ", tset)
        set_list.append(tset)
    elif len(matched) > 1:
        #print("Multiple Matches: ", matched)
        for i,iset in enumerate(matched):
            if not iset:
                continue
            for jset in matched[i+1:]:
                if iset & jset:
                    iset |= jset
                    jset.clear()
        set_list = [s for s in set_list if s]
print('\n'.join([str(s) for s in set_list]))
I bumped into this problem when resolving coreferences: I needed to merge the sets in a list of sets that have common elements:
import copy
def merge(list_of_sets):
    # init states
    list_of_sets = copy.deepcopy(list_of_sets)
    result = []
    indices = find_first_overlapping_sets(list_of_sets)
    while indices:
        # Keep other sets
        result = [
            s
            for idx, s in enumerate(list_of_sets)
            if idx not in indices
        ]
        # Append merged set
        result.append(
            list_of_sets[indices[0]].union(list_of_sets[indices[1]])
        )
        # Update states
        list_of_sets = result
        indices = find_first_overlapping_sets(list_of_sets)
    return list_of_sets
def find_first_overlapping_sets(list_of_sets):
    # Return the indices of the first overlapping pair, or None if there is none
    for i, i_set in enumerate(list_of_sets):
        for j, j_set in enumerate(list_of_sets[i+1:]):
            if i_set.intersection(j_set):
                return i, i+j+1

How to separate elements of tuples into occurrences of pairs in Python?

I have a tuple that looks like:
t=(('a','b'),('a','c','d','e'),('c','d','e'))
I need to rearrange it so I have a new tuple that will look like:
t2=(('a','b'),('a','c'),('c','d'),('d','e'),('c','d'),('d','e'))
Basically the new tuple takes pairs (of 2) from each element of the old tuple. But I am not sure how to get started. Thanks for your help.
Use a generator expression with zip to pair and convert to a tuple at the end:
>>> t = (('a','b'),('a','c','d','e'),('c','d','e'))
>>> tuple((x) for tupl in t for x in zip(tupl, tupl[1:]))
(('a', 'b'), ('a', 'c'), ('c', 'd'), ('d', 'e'), ('c', 'd'), ('d', 'e'))
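The zip(tupl, tupl[1:]) idiom needs inputs that can be sliced. For arbitrary iterables, a sketch based on itertools.tee produces the same adjacent pairs; the helper name adjacent_pairs is an illustrative choice, not something from the answer.

```python
from itertools import tee

def adjacent_pairs(iterable):
    # Duplicate the iterator and advance the second copy by one step,
    # so zipping the two yields (item, next_item) pairs.
    first, second = tee(iterable)
    next(second, None)
    return zip(first, second)

t = (('a', 'b'), ('a', 'c', 'd', 'e'), ('c', 'd', 'e'))
t2 = tuple(pair for group in t for pair in adjacent_pairs(group))
print(t2)
```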
Try this out:
tuple([(t[i][j],t[i][j+1]) for i in range(len(t)) for j in range(len(t[i])-1)])
#(('a', 'b'), ('a', 'c'), ('c', 'd'), ('d', 'e'), ('c', 'd'), ('d', 'e'))
You can also try another way. If the problem is reduced to doing this for one tuple alone:
def pairs(my_tuple):
    return [(my_tuple[i],my_tuple[i+1]) for i in range(len(my_tuple)-1)]
Then this can be mapped over all the tuples:
tuple(sum(list(map(pairs,t)),[]))
#(('a', 'b'), ('a', 'c'), ('c', 'd'), ('d', 'e'), ('c', 'd'), ('d', 'e'))
Explanation:
map(pairs,t) : maps the function pairs over every element in tuple t
list(map(pairs,t)) : the output of the above, but as a nested list
[[('a', 'b')], [('a', 'c'), ('c', 'd'), ('d', 'e')], ...]
sum(list(...),[]) : flattens this nested list into the desired output
Here's what I came up with really quickly:
def transform(t):
    out = []
    for tup in t:
        for i in range(0, len(tup) - 1):
            out.append((tup[i], tup[i+1]))
    return tuple(out)
You can use this easy to understand code:
t = (('a','b'),('a','c','d','e'),('c','d','e'))
t2 = []
for i in t:
    for j in range(len(i)-1):
        t2.append((i[j], i[j+1]))
t2 = tuple(t2)
Obviously it isn't as optimized as the other answers, but it is easy to understand.
It is equivalent to:
t2 = tuple((i[j], i[j+1]) for i in t for j in range(len(i)-1))
That is a generator expression, which is quite similar to a list comprehension (it uses parentheses instead of square brackets). They do basically the same thing in simple code like this. The main difference is that a generator expression produces its items lazily and can be iterated only once, while a list comprehension builds the whole list in memory and can be reused.
Anyway, the generator expression means:
t2 = tuple(...) # Make a tuple from the result; otherwise it would be a generator object.
for i in t # Iterate over each item of t; each one is called i.
for i in t for j in range(len(i)-1) # Iterate over each item of t (called i), and then over range(len(i)-1) (called j).
(i[j], i[j+1]) for i in t for j in range(len(i)-1) # The same as before, but for each new j (each iteration of the second loop) produce (i[j], i[j+1]).
I know, putting two for clauses in the same generator expression or list comprehension looks strange. I always look at an answer like this one to remember how to do that.
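The practical difference between a generator expression and a list comprehension can be seen in a small sketch (the variable names are illustrative): the generator can be consumed only once, while the list can be read repeatedly.

```python
squares_gen = (n * n for n in range(4))   # generator expression: lazy, single use
squares_list = [n * n for n in range(4)]  # list comprehension: built immediately

first_pass = list(squares_gen)    # consumes the generator
second_pass = list(squares_gen)   # nothing left to yield

print(first_pass, second_pass)
print(squares_list, squares_list)  # the list can be read as often as needed
```

When the result is passed straight to tuple(...), as in the answer, the single-use behaviour does not matter because the generator is consumed exactly once.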
My old answer was:
t = (('a','b'),('a','c','d','e'),('c','d','e'))
t2 = []
for i in t:
    for j in range(len(i)):
        if j < len(i) - 1:
            t2.append((i[j], i[j+1]))
t2 = tuple(t2)
But I noticed that by subtracting 1 from the len() in the loop I can avoid that line, because I will never get an index out of range error.
