How can I generate all unique nested 2-tuples (nested pairings) of a set of n objects in Python? - python

By nested 2-tuples, I mean something like this: ((a,b),(c,(d,e))) where all tuples have two elements. I don't need different orderings of the elements, just the different ways of putting parentheses around them. For items = [a, b, c, d], there are 5 unique pairings, which are:
(((a,b),c),d)
((a,(b,c)),d)
(a,((b,c),d))
(a,(b,(c,d)))
((a,b),(c,d))
In a perfect world I'd also like to have control over the maximum depth of the returned tuples, so that if I generated all pairings of items = [a, b, c, d] with max_depth=2, it would only return ((a,b),(c,d)).
This problem turned up because I wanted to find a way to generate the results of addition on non-commutative, non-associative numbers. If a+b doesn't equal b+a, and a+(b+c) doesn't equal (a+b)+c, what are all the possible sums of a, b, and c?
I have made a function that generates all pairings, but it also returns duplicates.
import itertools
def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing
For example, it returns ((a,b),(c,d)) twice for items=[a, b, c, d], since it pairs up (a,b) first in one case and (c,d) first in the second case.
Returning duplicates becomes a bigger and bigger problem for larger numbers of items. With duplicates, the number of pairings grows factorially, and without duplicates it grows exponentially, according to the Catalan Numbers (https://oeis.org/A000108).
n | With duplicates: (n-1)! | Without duplicates: (2(n-1))!/(n!(n-1)!)
1 | 1 | 1
2 | 1 | 1
3 | 2 | 2
4 | 6 | 5
5 | 24 | 14
6 | 120 | 42
7 | 720 | 132
8 | 5040 | 429
9 | 40320 | 1430
10 | 362880 | 4862
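The two columns of the table can be reproduced with a short check. This is only a sketch; the function names count_with_duplicates and count_unique are made up for illustration and are not part of the question.

```python
from math import comb, factorial

def count_with_duplicates(n):
    # Results produced by the naive recursive pairing: (n-1)!
    return factorial(n - 1)

def count_unique(n):
    # Catalan number C(n-1) = (2(n-1))! / (n! (n-1)!),
    # computed here as comb(2(n-1), n-1) / n
    return comb(2 * (n - 1), n - 1) // n

for n in range(1, 11):
    print(n, count_with_duplicates(n), count_unique(n))
```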
Because of this, I have been trying to come up with an algorithm that doesn't need to search through all the possibilities, only the unique ones. Again, it would also be nice to have control over the maximum depth, but that could probably be added to an existing algorithm. So far I've been unsuccessful in coming up with an approach, and I also haven't found any resources that cover this specific problem. I'd appreciate any help or links to helpful resources.

Using a recursive generator:
items = ['a', 'b', 'c', 'd']
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
list(split(items))
Output:
[('a', ('b', ('c', 'd'))),
('a', (('b', 'c'), 'd')),
(('a', 'b'), ('c', 'd')),
(('a', ('b', 'c')), 'd'),
((('a', 'b'), 'c'), 'd')]
Check of uniqueness:
assert len(list(split(list(range(10))))) == 4862
Reversed order of the items:
items = ['a', 'b', 'c', 'd']
def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)
list(split(items))
[((('a', 'b'), 'c'), 'd'),
(('a', ('b', 'c')), 'd'),
(('a', 'b'), ('c', 'd')),
('a', (('b', 'c'), 'd')),
('a', ('b', ('c', 'd')))]
With maxdepth:
items = ['a', 'b', 'c', 'd']
def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth <= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)
list(split(items))
# or
list(split(items, maxdepth=3))
# or
list(split(items, maxdepth=2))
[('a', ('b', ('c', 'd'))),
('a', (('b', 'c'), 'd')),
(('a', 'b'), ('c', 'd')),
(('a', ('b', 'c')), 'd'),
((('a', 'b'), 'c'), 'd')]
list(split(items, maxdepth=1))
[('a', ('b', 'c', 'd')),
(('a', 'b'), ('c', 'd')),
(('a', 'b', 'c'), 'd')]
list(split(items, maxdepth=0))
[('a', 'b', 'c', 'd')]
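Note that the maxdepth version above flattens whatever is left once the limit is reached, whereas the question asked for pairings deeper than the limit to be left out entirely (so that max_depth=2 on four items gives only ((a,b),(c,d))). A possible sketch of that variant follows. The name split_limited is hypothetical, and the pruning rule assumes that a strictly binary nesting of depth d can hold at most 2**d items.

```python
def split_limited(items, max_depth):
    # A single item needs no further nesting.
    if len(items) == 1:
        yield items[0]
        return
    # Assumption: a binary nesting of depth d holds at most 2**d items,
    # so anything larger cannot fit and the branch is dropped.
    if max_depth <= 0 or len(items) > 2 ** max_depth:
        return
    for i in range(1, len(items)):
        for a in split_limited(items[:i], max_depth - 1):
            for b in split_limited(items[i:], max_depth - 1):
                yield (a, b)

items = ['a', 'b', 'c', 'd']
print(list(split_limited(items, 2)))       # [(('a', 'b'), ('c', 'd'))]
print(len(list(split_limited(items, 3))))  # 5
```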

Full-credit to mozway for the algorithm - my original idea was to represent the pairing in reverse-polish notation, which would not have lent itself to the following optimizations:
First, we replace the two nested loops:
for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)
-with itertools.product, which will itself cache the results of the inner split(...) call, as well as produce the pairing in internal C code, which will run much faster.
yield from product(split(l[:i]), split(l[i:]))
Next, we cache the results of the previous split(...) calls. To do this we must sacrifice the laziness of generators, as well as ensure that our function parameters are hashable. Explicitly, this means creating a wrapper that casts the input list to a tuple, and to modify the function body to return lists instead of yielding.
def split(l):
    return _split(tuple(l))
def _split(l):
    if len(l) == 1:
        return l[:1]
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res
We then decorate the function with functools.cache, to perform the caching. So putting it all together:
from itertools import product
from functools import cache
def split(l):
    return _split(tuple(l))
@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res
Testing with the following input:
test = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']
produces the following timings:
Original: 5.922573089599609
Revised: 0.08888077735900879
I also verified that the results match the original exactly, order and all.
Again, full credit to mozway for the algorithm. I've just applied a few optimizations to speed it up a bit.
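For reference, the equality check mentioned above could look roughly like this. It is a sketch rather than the exact test that was run; the names split_original and split_cached are invented here so that both versions can live side by side, and functools.cache needs Python 3.9 or later.

```python
from functools import cache
from itertools import product

def split_original(l):
    # Recursive generator version.
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split_original(l[:i]):
            for b in split_original(l[i:]):
                yield (a, b)

def split_cached(l):
    # Wrapper: tuples are hashable, so they can serve as cache keys.
    return _split(tuple(l))

@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res

test = list("abcdefgh")
# Same pairings, in the same order.
assert list(split_original(test)) == split_cached(test)
```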

Related

Find longest combination of non-overlapping pairs

I need to find the length of the longest combination of pairs that can be made from a list of pairs, without any common elements.
For example the following list of pairs:
[(A, B), (A, D), (B, C), (B, D), (C, D)]
Would have these combinations:
[(A, B), (C, D)]
[(A, D), (B, C)]
[(B, D)]
And so the longest combination would be 2 pairs in length.
This needs to be able to handle up to several thousand pairs so generating all possible combinations of pairs at each possible length and checking for overlaps would not work.
However, the total number of unique elements across all pairs is capped at 100, so the longest possible combination that could be encountered would be 50 pairs.
Is there an efficient way to do this?
Okay, this is what I have. It may not be the best, but it's something.
Combo initializes a combination with two pairs and feeds it to Combine along with the part of the array not checked yet.
Combine takes the leftover array, the current combo, and a list of used elements, then checks each remaining tuple. If the tuple has any element in the used list, it is skipped; if it doesn't, it is added to the combo, which is passed to a further recursive call of Combine until the combo is as long as it can be.
arr = [('A', 'B'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('E', 'D'), ("A",'F'),('J','K'),('M','K'),('K','D'),('B','F')]
def Combo(arr):
    combos = []
    for i, tup1 in enumerate(arr):
        combo = [tup1]
        used = [tup1[0], tup1[1]]
        for j, tup2 in enumerate(arr[i:]):
            if (tup2[0] in used) or (tup2[1] in used):
                continue
            else:
                for el in tup2:
                    used.append(el)
                combo.append(tup2)
                combo=Combine(arr[j:], combo, used)
                combos.append(combo)
    return combos
def Combine(arr, combo, used):
    if arr==[]:
        return combo
    for i, tup in enumerate(arr):
        unique = True
        for el in tup:
            if el in used:
                unique = False
                continue
        if unique:
            combo.append(tup)
            for el in tup:
                used.append(el)
            return Combine(arr[i:], combo, used)
    return combo
Combo(arr)
OUTPUT
[[('A', 'B'), ('E', 'D'), ('J', 'K')],
[('A', 'D'), ('B', 'C'), ('J', 'K')],
[('B', 'C'), ('E', 'D'), ('A', 'F'), ('J', 'K')],
[('B', 'D'), ('A', 'F'), ('J', 'K')],
[('E', 'D'), ('A', 'F'), ('B', 'C'), ('J', 'K')],
[('A', 'F'), ('J', 'K'), ('B', 'C'), ('E', 'D')],
[('J', 'K'), ('B', 'F'), ('E', 'D')],
[('M', 'K'), ('B', 'F'), ('E', 'D')],
[('K', 'D'), ('B', 'F')]]
As far as I know, this should give you each unique combination in a list.
Rephrasing the question, we want to find the biggest set of non-overlapping elements of pairs. Probably not the best solution but should work:
def process(pairs):
    output = {}
    max_length = 0
    for i in range(len(pairs)):
        curr = 1
        output[pairs[i]] = set(pairs[i])
        rest = pairs[:i] + pairs[i + 1:]
        for j in range(len(rest)):
            subset = output[pairs[i]] | set(rest[j])
            if len(subset) == len(output[pairs[i]]) + 2:
                curr += 1
                output[pairs[i]] = subset
        max_length = max(curr, max_length)
    return max_length
We populate our initial set with the current pair and then if the next pair's elements are not presented in the current set we extend it. We continue this process until we checked all remaining pairs. I used this function for testing:
import random, string, timeit
def get_random_pairs(num):
    return [(random.choice(string.ascii_uppercase), random.choice(string.ascii_uppercase)) for _ in range(num)]
print(timeit.timeit('process(pairs)', number=5, setup="from __main__ import process,get_random_pairs; pairs = get_random_pairs(3000)")/5)
On my machine (Intel i7-9750H (12) @ 4.500GHz) it takes about 5-6 seconds to process 3000 pairs.

All permutations of an array in python

I have an array. I want to generate all permutations from that array, including single element, repeated element, change the order, etc. For example, say I have this array:
arr = ['A', 'B', 'C']
And if I use the itertools module by doing this:
from itertools import permutations
perms = [''.join(p) for p in permutations(['A','B','C'])]
print(perms)
Or using loop like this:
def permutations(head, tail=''):
    if len(head) == 0:
        print(tail)
    else:
        for i in range(len(head)):
            permutations(head[:i] + head[i+1:], tail + head[i])
arr= ['A', 'B', 'C']
permutations(arr)
I only get:
['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
But what I want is:
['A', 'B', 'C',
'AA', 'AB', 'AC', 'BB', 'BA', 'BC', 'CA', 'CB', 'CC',
'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ACA', 'ACC', 'BBB', 'BAA', 'BAB', 'BAC', 'CCC', 'CAA', 'CCA'.]
The result is all permutations from the array given. Since the array is 3 element and all the element can be repetitive, so it generates 3^3 (27) ways. I know there must be a way to do this but I can't quite get the logic right.
A generator that would generate all sequences as you describe (which has infinite length if you would try to exhaust it):
from itertools import product
def sequence(xs):
    n = 1
    while True:
        yield from (product(xs, repeat=n))
        n += 1
# example use: print first 100 elements from the sequence
s = sequence('ABC')
for _ in range(100):
    print(next(s))
Output:
('A',)
('B',)
('C',)
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
('A', 'A', 'A')
('A', 'A', 'B')
('A', 'A', 'C')
('A', 'B', 'A')
...
Of course, if you don't want tuples, but strings, just replace the next(s) with ''.join(next(s)), i.e.:
print(''.join(next(s)))
If you don't want the sequences to exceed the length of the original collection:
from itertools import product
def sequence(xs):
    n = 1
    while n <= len(xs):
        yield from (product(xs, repeat=n))
        n += 1
for element in sequence('ABC'):
    print(''.join(element))
Of course, in that limited case, this will do as well:
from itertools import product
xs = 'ABC'
for s in (''.join(x) for n in range(len(xs)) for x in product(xs, repeat=n+1)):
    print(s)
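As a quick sanity check of the bounded variant, the sketch below collects the sequences into a list and counts them. The function name bounded_sequences is made up for this example. For three symbols the total is 3 + 9 + 27 = 39, not 27, because the sequences of length 1 and 2 are included as well.

```python
from itertools import product

def bounded_sequences(xs):
    # All sequences over xs of length 1..len(xs), shortest first.
    for n in range(1, len(xs) + 1):
        yield from product(xs, repeat=n)

words = [''.join(t) for t in bounded_sequences('ABC')]
print(len(words))  # 39
print(words[:5])   # ['A', 'B', 'C', 'AA', 'AB']
```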
Edit: In the comments, OP asked for an explanation of the yield from (product(xs, repeat=n)) part.
product() is a function in itertools that generates the cartesian product of iterables, which is a fancy way to say that you get all possible combinations of elements from the first iterable, with elements from the second etc.
Play around with it a bit to get a better feel for it, but for example:
list(product([1, 2], [3, 4])) == [(1, 3), (1, 4), (2, 3), (2, 4)]
If you take the product of an iterable with itself, the same happens, for example:
list(product('AB', 'AB')) == [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
Note that I keep calling product() with list() around it here. That's because product() returns a lazy iterator, and passing it to list() exhausts it into a list for printing.
The final step with product() is that you can also give it an optional repeat argument, which tells product() to do the same thing, but just repeat the iterable a certain number of times. For example:
list(product('AB', repeat=2)) == [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
So, you can see how calling product(xs, repeat=n) will generate all the sequences you're after, if you start at n=1 and keep exhausting it for ever greater n.
Finally, yield from is a way to yield results from another generator one at a time in your own generator. For example, yield from some_gen is the same as:
for x in some_gen:
    yield x
So, yield from (product(xs, repeat=n)) is the same as:
for p in (product(xs, repeat=n)):
    yield p

How to merge all intersecting tuples in a list? [duplicate]

Consider the following list:
tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
How can I achieve this?
new_tuple_list = [('c', 'e', 'd'), ('a', 'b')]
I have tried:
for tuple in tuple_list:
    for tup in tuple_list:
        if tuple[0] == tup[0]:
            new_tup = (tuple[0],tuple[1],tup[1])
            new_tuple_list.append(new_tup)
But it only works if I have the elements of the tuple in a certain order which means it will result in this instead:
new_tuple_list = [('c', 'e', 'd'), ('a', 'b'), ('d', 'e')]
You could consider the tuples as edges in a graph and your goal as finding connected components within the graph. Then you could simply loop over vertices (items in tuples) and for each vertex you haven't visited yet execute DFS to generate a component:
from collections import defaultdict
def dfs(adj_list, visited, vertex, result, key):
    visited.add(vertex)
    result[key].append(vertex)
    for neighbor in adj_list[vertex]:
        if neighbor not in visited:
            dfs(adj_list, visited, neighbor, result, key)
edges = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
adj_list = defaultdict(list)
for x, y in edges:
    adj_list[x].append(y)
    adj_list[y].append(x)
result = defaultdict(list)
visited = set()
for vertex in adj_list:
    if vertex not in visited:
        dfs(adj_list, visited, vertex, result, vertex)
print(result.values())
Output:
[['a', 'b'], ['c', 'e', 'd']]
Note that in the above, neither the order of the components nor the order of the elements within a component is guaranteed.
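The recursive DFS can hit Python's recursion limit on very large components. One alternative, which is a different technique from the answer above, is a union-find (disjoint-set) structure. The sketch below is only illustrative: the function name connected_groups is invented, and it assumes every edge is a 2-tuple of hashable items.

```python
def connected_groups(edges):
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union the two endpoints of every edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the items under their root.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

edges = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
print(connected_groups(edges))
```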
If you don't need duplicate values (the ability to preserve ['a', 'a', 'b'], for example), this is a simple and fast way to do what you want via sets:
iset = set([frozenset(s) for s in tuple_list])  # Convert to a set of sets
result = []
while(iset):                          # While there are sets left to process:
    nset = set(iset.pop())            # Pop a new set
    check = len(iset)                 # Does iset contain more sets
    while check:                      # Until no more sets to check:
        check = False
        for s in iset.copy():         # For each other set:
            if nset.intersection(s):  # if they intersect:
                check = True          # Must recheck previous sets
                iset.remove(s)        # Remove it from remaining sets
                nset.update(s)        # Add it to the current set
    result.append(tuple(nset))        # Convert back to a list of tuples
gives
[('c', 'e', 'd'), ('a', 'b')]
This has a bad performance because list-contains checks are O(n) but it's quite short:
result = []
for tup in tuple_list:
    for idx, already in enumerate(result):
        # check if any items are equal
        if any(item in already for item in tup):
            # tuples are immutable so we need to set the result item directly
            result[idx] = already + tuple(item for item in tup if item not in already)
            break
    else:
        # else in for-loops are executed only if the loop wasn't terminated by break
        result.append(tup)
This has the nice side-effect that the order is kept:
>>> result
[('c', 'e', 'd'), ('a', 'b')]
I had that problem with sets, so I'm contributing my solution. It keeps combining sets that share one or more common elements for as long as possible.
My example data:
data = [['A','B','C'],['B','C','D'],['D'],['X'],['X','Y'],['Y','Z'],['M','N','O'],['M','N','O'],['O','A']]
data = list(map(set,data))
My code to solve the problem:
oldlen = len(data)+1
while len(data)<oldlen:
    oldlen = len(data)
    for i in range(len(data)):
        for j in range(i+1,len(data)):
            if len(data[i]&data[j]):
                data[i] = data[i]|data[j]
                data[j] = set()
    data = [data[i] for i in range(len(data)) if data[i]!= set()]
Result:
[{'A', 'B', 'C', 'D', 'M', 'N', 'O'}, {'X', 'Y', 'Z'}]
The task becomes trivial with NetworkX, a library for graph manipulation. Similar to this answer by @niemmi, you'd need to find the connected components:
import networkx as nx
tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
graph = nx.Graph(tuple_list)
result = list(nx.connected_components(graph))
print(result)
# [{'e', 'c', 'd'}, {'b', 'a'}]
To get the result as a list of tuples:
result = list(map(tuple, nx.connected_components(graph)))
print(result)
# [('d', 'e', 'c'), ('a', 'b')]
Use sets. You are checking for overlap and accumulation of (initially small) sets, and Python has a data type for that:
#!python3
#tuple_list = [('c', 'e'), ('c', 'd'), ('a', 'b'), ('d', 'e')]
tuple_list = [(1,2), (3,4), (5,), (1,3,5), (3,'a'),
(9,8), (7,6), (5,4), (9,'b'), (9,7,4),
('c', 'e'), ('e', 'f'), ('d', 'e'), ('d', 'f'),
('a', 'b'),
]
set_list = []
print("Tuple list:", tuple_list)
for t in tuple_list:
    #print("Set list:", set_list)
    tset = set(t)
    matched = []
    for s in set_list:
        if tset & s:
            s |= tset
            matched.append(s)
    if not matched:
        #print("No matches. New set: ", tset)
        set_list.append(tset)
    elif len(matched) > 1:
        #print("Multiple Matches: ", matched)
        for i,iset in enumerate(matched):
            if not iset:
                continue
            for jset in matched[i+1:]:
                if iset & jset:
                    iset |= jset
                    jset.clear()
        set_list = [s for s in set_list if s]
print('\n'.join([str(s) for s in set_list]))
I bumped into this problem when resolving coreferences: I needed to merge the sets in a list of sets that have common elements:
import copy
def merge(list_of_sets):
    # init states
    list_of_sets = copy.deepcopy(list_of_sets)
    result = []
    indices = find_first_overlapping_sets(list_of_sets)
    while indices:
        # Keep other sets
        result = [
            s
            for idx, s in enumerate(list_of_sets)
            if idx not in indices
        ]
        # Append merged set
        result.append(
            list_of_sets[indices[0]].union(list_of_sets[indices[1]])
        )
        # Update states
        list_of_sets = result
        indices = find_first_overlapping_sets(list_of_sets)
    return list_of_sets
def find_first_overlapping_sets(list_of_sets):
    # Return the indices of the first overlapping pair, or None if there is none
    for i, i_set in enumerate(list_of_sets):
        for j, j_set in enumerate(list_of_sets[i+1:]):
            if i_set.intersection(j_set):
                return i, i+j+1

Generate iterator object with tuples of varying size

I am trying to create a branch and bound algorithm, to do this I would like to create an iterator object which stores all possible combinations of a list of items of size 0 to n.
Take the following example to demonstrate:
import itertools as it
list_tmp = ['a', 'b', 'c', 'd']
tmp_it = sum([list(map(list, it.combinations(list_tmp, i))) for i in range(2 + 1)], [])
tmp_it is a list of all possible combinations of size 0 to 2. This code works perfectly for small list sizes, but I need to act on a larger list and so would like to preserve the iterator characteristics of the it.combinations object (generate the combinations on the fly), e.g.
for iteration in it.combinations(list_tmp, 2):
    print(iteration)
Is there any method of doing this for combinations of multiple sizes? Rather than converting to a list and losing the characteristics of the iterator object.
You can do this using itertools.chain.from_iterable, which lazily evaluates its argument. Something like this:
tmp_it = it.chain.from_iterable(it.combinations(list_tmp, i) for i in range(2+1))
You can chain iterators:
>>> sizes = it.chain.from_iterable(it.combinations(list_tmp, i) for i in range(len(list_tmp)))
>>> for i in sizes:
...     print(i)
...
()
('a',)
('b',)
('c',)
('d',)
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')
('a', 'b', 'c')
('a', 'b', 'd')
('a', 'c', 'd')
('b', 'c', 'd')
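To make the laziness visible, the chaining can be wrapped in a small helper. This is only a sketch; the name combinations_up_to and its max_size parameter are hypothetical and not part of itertools.

```python
from itertools import chain, combinations

def combinations_up_to(items, max_size):
    # The generator expression is consumed lazily by chain.from_iterable,
    # so each combinations() iterator is only created when it is reached.
    return chain.from_iterable(
        combinations(items, r) for r in range(max_size + 1)
    )

lazy = combinations_up_to(['a', 'b', 'c', 'd'], 2)
print(next(lazy))            # ()
print(next(lazy))            # ('a',)
print(sum(1 for _ in lazy))  # 9 remaining, out of 1 + 4 + 6 = 11 in total
```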

Checking if a set of tuple contains items from another set

Let's say I have a set of tuples like this:
foo = {('A', 'B'), ('C', 'D'), ('B', 'C'), ('A', 'C')}
var = {'A', 'C', 'B'}
I want to check if every item from var is in any place in the set of tuples and returning True if it is and False if it isn't.
I tried with this but I don't have luck so far.
all((x for x in var) in (a,b) for (a,b) in foo)
Desired output : True
Actual output : False
However if:
var = {'A','C','D'}
I want it to return False. The logic is to check whether the strings all 'know' each other.
To explain, for my last var:
A is paired with C and C is paired with D, but D is not paired with A.
For my first var:
A is paired with B, B is paired with C, and C is paired with A, so everyone 'knows' each other.
Generate all the pairs you expect to be present and see if they're there with a subset check:
from itertools import combinations
def _norm(it):
    return {tuple(sorted(t)) for t in it}
def set_contains(foo, var):
    return _norm(combinations(var, 2)) <= _norm(foo)
print(set_contains({('A', 'B'), ('C', 'D'), ('B', 'C'), ('A', 'C')},
                   {'A', 'C', 'B'}))  # True
print(set_contains({('A', 'B'), ('C', 'D'), ('B', 'C'), ('A', 'C')},
                   {'A', 'C', 'D'}))  # False
It may be possible to reduce the amount of sorting, depending on how exactly combinations works (I'm not 100% sure what to make of the docs), or, if you reuse either foo or var several times, by sorting that part just once beforehand.
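One way to avoid sorting altogether, offered here as a sketch of a slightly different approach, is to treat each pair as an unordered frozenset. The function name all_know_each_other is made up for this example.

```python
from itertools import combinations

def all_know_each_other(pairs, members):
    # frozenset ignores order, so ('A', 'B') and ('B', 'A') compare equal.
    known = {frozenset(p) for p in pairs}
    return all(frozenset(c) in known for c in combinations(members, 2))

foo = {('A', 'B'), ('C', 'D'), ('B', 'C'), ('A', 'C')}
print(all_know_each_other(foo, {'A', 'C', 'B'}))  # True
print(all_know_each_other(foo, {'A', 'C', 'D'}))  # False
```

If foo is reused for many queries, the known set could be built once and passed in instead.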
Try this:
foo = {('A', 'B'), ('C', 'D'), ('B', 'C'), ('A', 'C')}
var = {'A', 'C', 'B'}
for elem in var:
    if any(elem in tuples for tuples in foo):
        print(True)
This is not as 'compact' as the others but works the same.
for x in var:
    for y in foo:
        if x in y:
            print('Found %s in %s' % (x, y))
        else:
            print('%s not in %s' % (x, y))
B not in ('C', 'D')
B not in ('A', 'C')
Found B in ('A', 'B')
Found B in ('B', 'C')
A not in ('C', 'D')
Found A in ('A', 'C')
Found A in ('A', 'B')
A not in ('B', 'C')
Found C in ('C', 'D')
Found C in ('A', 'C')
C not in ('A', 'B')
Found C in ('B', 'C')
