Given a list of letters, say L=['a','b','c','d','e','f'], and a list of tuples, for example T=[('a','b'),('a','c'),('b','c')].
Now I want to create the maximum number of tuples from the letters in L that are not already contained in T. This needs to be done without duplicates, i.e. (a,b) counts as the same as (b,a). Also, each letter can only be matched with one other letter.
My idea was:
# Create a list of all possible tuples first:
all_tuples = [(x, y) for x in L for y in L if x != y]
# Now remove duplicates
unique_tuples = list(set(tuple(sorted(elem)) for elem in all_tuples))
# Now build a list that matches each letter only once with another letter
visited = set()
output = []
for letter1, letter2 in unique_tuples:
    # Skip pairs that are already in T, in either order
    if (letter1, letter2) in T or (letter2, letter1) in T:
        continue
    if letter1 not in visited and letter2 not in visited:
        visited.add(letter1)
        visited.add(letter2)
        output.append((letter1, letter2))
print(output)
However, this does not always give the maximum number of tuples, depending on what T is. For example, say the candidate pairs come out as unique_tuples=[('a','b'),('a','d'),('b','c')].
If we append ('a','b') to the output first, we can no longer append ('b','c'), since 'b' is already matched, and ('a','d') is blocked as well. However, if we append ('a','d') first, we can still add ('b','c') afterwards and reach the maximum of two tuples.
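The order dependence described here can be reproduced with a small sketch. The helper name greedy_pairs is hypothetical and only mirrors the loop above; it assumes the candidate pairs have already been filtered against T.

```python
def greedy_pairs(candidates):
    """Accept pairs in the given order, using each letter at most once."""
    visited = set()
    output = []
    for first, second in candidates:
        if first not in visited and second not in visited:
            visited.update((first, second))
            output.append((first, second))
    return output

# Same candidate pairs, two different processing orders.
unlucky = greedy_pairs([('a', 'b'), ('a', 'd'), ('b', 'c')])
lucky = greedy_pairs([('a', 'd'), ('a', 'b'), ('b', 'c')])
print(unlucky)  # [('a', 'b')]
print(lucky)    # [('a', 'd'), ('b', 'c')]
```

Since the result depends only on the iteration order, and the order of a set is arbitrary, the original loop can return either answer.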
How can one solve this?
If we ignore the business about not matching the same letter twice, this is a straightforward use of combinations:
>>> from itertools import combinations
>>> L=['a','b','c','d','e','f']
>>> T=[('a','b'),('a','c'),('b','c')]
>>> [t for t in combinations(L, 2) if t not in T]
[('a', 'd'), ('a', 'e'), ('a', 'f'), ('b', 'd'), ('b', 'e'), ('b', 'f'), ('c', 'd'), ('c', 'e'), ('c', 'f'), ('d', 'e'), ('d', 'f'), ('e', 'f')]
If we limit ourselves to only using each letter once, the problem is very straightforward, because we know that we can only have (letters / 2) tuples at most. Just find the available letters (by subtracting those already present in T) and then pair them up in any arbitrary order.
>>> used_letters = {c for t in T for c in t}
>>> free_letters = [c for c in L if c not in used_letters]
>>> [tuple(free_letters[i:i+2]) for i in range(0, 2 * (len(free_letters) // 2), 2)]
[('d', 'e')]
Without using libraries, you could do it like this:
L=['a','b','c','d','e','f']
T=[('a','b'),('a','c'),('b','c')]
L = sorted(L,key=lambda c: -sum(c in t for t in T))
used = set()
r = [(a,b) for i,a in enumerate(L) for b in L[i+1:]
     if (a,b) not in T and (b,a) not in T
     and used.isdisjoint((a,b)) and not used.update((a,b))]
print(r)
[('a', 'd'), ('b', 'e'), ('c', 'f')]
The letters are sorted in descending order of frequency in T before combining them. This way the hardest letters to match are processed first, which tends to leave more pairing options for the remaining letters. Note that this is a greedy heuristic, so it is not guaranteed to find the maximum for every T.
Alternatively, you could use a recursive (DP) approach and check all possible pairing combinations.
def maxTuples(L,T):
    maxCombos = []                               # will return longest
    for i,a in enumerate(L):                     # first letter of tuple
        for j,b in enumerate(L[i+1:],i+1):       # second letter of tuple
            if (a,b) in T: continue              # tuple not in T
            if (b,a) in T: continue              # inverted tuple not in T
            rest = L[:i]+L[i+1:j]+L[j+1:]        # recurse with rest of letters
            R = [(a,b)]+maxTuples(rest,T)        # adding to selected pair
            if len(R)*2+1>=len(L): return R      # max possible, stop here
            if len(R)>len(maxCombos):            # longer combination of tuples
                maxCombos = R                    # Track it
    return maxCombos
...
L=['a','b','c','d','e','f']
T=[('a','b'),('a','c'),('b','c'),('c','f')]
print(maxTuples(L,T))
[('a', 'd'), ('b', 'f'), ('c', 'e')]
L = list("ABCDEFGHIJKLMNOP")
T = [('K', 'N'), ('G', 'F'), ('I', 'P'), ('C', 'A'), ('O', 'M'),
('D', 'B'), ('L', 'J'), ('E', 'H'), ('F', 'E'), ('L', 'H'),
('J', 'G'), ('N', 'I'), ('C', 'M'), ('A', 'P'), ('D', 'O'),
('K', 'B'), ('G', 'H'), ('O', 'A'), ('I', 'J'), ('N', 'M'),
('F', 'P'), ('E', 'B'), ('K', 'L'), ('D', 'C'), ('D', 'E'),
('L', 'F'), ('B', 'H'), ('I', 'A'), ('K', 'G'), ('M', 'O'),
('P', 'C'), ('N', 'J'), ('J', 'E'), ('N', 'P'), ('A', 'G'),
('H', 'O'), ('I', 'B'), ('K', 'F'), ('M', 'C'), ('L', 'D'),
('A', 'B'), ('C', 'E'), ('D', 'F'), ('G', 'I'), ('H', 'J'),
('K', 'M'), ('L', 'N'), ('O', 'P')]
print(maxTuples(L,T))
[('A', 'D'), ('B', 'C'), ('E', 'G'), ('F', 'H'),
('I', 'K'), ('J', 'M'), ('L', 'P'), ('N', 'O')]
Note that the function will be slow if the tuples in T exclude so many pairings that it is impossible to produce a combination of len(L)/2 tuples. It can be optimized further by filtering letters that are completely excluded as we go down the recursion:
def maxTuples(L,T):
    if not isinstance(T,dict):
        T,E = {c:{c} for c in L},T                   # convert T to a dictionary
        for a,b in E: T[a].add(b);T[b].add(a)        # of excluded letter sets
    L = [c for c in L if not T[c].issuperset(L)]     # filter fully excluded
    maxCombos = []                                   # will return longest
    for i,a in enumerate(L):                         # first letter of tuple
        for j,b in enumerate(L[i+1:],i+1):           # second letter of tuple
            if b in T[a]: continue                   # exclude tuples in T
            rest = L[:i]+L[i+1:j]+L[j+1:]            # recurse with rest of letters
            R = [(a,b)]+maxTuples(rest,T)            # adding to selected pair
            if len(R)*2+1>=len(L): return R          # max possible, stop here
            if len(R)>len(maxCombos):                # longer combination of tuples
                maxCombos = R                        # Track it
    return maxCombos
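One possible way to cut repeated work in the recursive search above is to memoize on the tuple of remaining letters and to branch only on the first remaining letter (either it stays unmatched or it takes one allowed partner). This is a minimal sketch, not the answer's own method: the name max_pairs is hypothetical, and it assumes the letters are distinct and hashable. It is still exponential in the worst case.

```python
from functools import lru_cache

def max_pairs(letters, excluded):
    """Return a largest list of allowed pairs using each letter at most once."""
    banned = {frozenset(pair) for pair in excluded}  # order-insensitive lookup

    @lru_cache(maxsize=None)
    def best(remaining):
        if len(remaining) < 2:
            return ()
        first, rest = remaining[0], remaining[1:]
        result = best(rest)                      # option 1: leave `first` unmatched
        for k, partner in enumerate(rest):       # option 2: pair `first` with a partner
            if frozenset((first, partner)) in banned:
                continue
            candidate = ((first, partner),) + best(rest[:k] + rest[k + 1:])
            if len(candidate) > len(result):
                result = candidate
        return result

    return list(best(tuple(letters)))

pairs = max_pairs(list('abcdef'), [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'f')])
print(len(pairs))  # 3
```

Branching only on the first letter avoids exploring the same set of pairs in several different orders, which the nested i/j loops would otherwise do.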
I need a way to reverse my key values pairing. Let me illustrate my requirements.
dict = {1: ('a', 'b'), 2: ('c', 'd'), 3: ('e', 'f')}
I want the above to be converted to the following:
dict = {1: ('e', 'f'), 2: ('c', 'd'), 3: ('a', 'b')}
You just need:
new_dict = dict(zip(old_dict, reversed(old_dict.values())))
Note, prior to Python 3.8, where dict_values objects are not reversible, you will need something like:
new_dict = dict(zip(old_dict, reversed(list(old_dict.values()))))
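If the code has to run on both older and newer Python 3 versions, the second form can be wrapped in a small helper. This is only a sketch: reverse_values is a hypothetical name, and it relies on dicts preserving insertion order (guaranteed from Python 3.7 on).

```python
def reverse_values(mapping):
    """Keep the keys in place and assign the values in reverse order."""
    keys = list(mapping)
    values = list(mapping.values())  # a list is reversible on every version
    return dict(zip(keys, reversed(values)))

old_dict = {1: ('a', 'b'), 2: ('c', 'd'), 3: ('e', 'f')}
print(reverse_values(old_dict))
# {1: ('e', 'f'), 2: ('c', 'd'), 3: ('a', 'b')}
```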
List instead of a dict
Assuming that your keys are always the integers from 1 to N, it seems that your dict should actually be a list. And whatever you use, you shouldn't use dict as a variable name.
You would not lose any information with a list:
d = {1: ('a', 'b'), 3: ('e', 'f'), 2: ('c', 'd')}
l = [v for k, v in sorted(d.items())]
# [('a', 'b'), ('c', 'd'), ('e', 'f')]
You also wouldn't lose any information by shifting the indices by -1.
Getting the information back
You have the sorted values directly inside l.
If you need the keys, you can simply call:
range(len(l))
# range(0, 3)
If you want the value at index i, you can use l[i].
l[1] # Indices have been shifted. 2 is now 1
# ('c', 'd')
If you want the original dict, you can call:
>>> dict(enumerate(l))
{0: ('a', 'b'), 1: ('c', 'd'), 2: ('e', 'f')}
>>> dict(enumerate(l, 1))
{1: ('a', 'b'), 2: ('c', 'd'), 3: ('e', 'f')}
In order to get the reversed values, you can simply reverse the list:
>>> l[::-1]
[('e', 'f'), ('c', 'd'), ('a', 'b')]
>>> l[::-1][0]
('e', 'f')
And, in order to answer your original question, if you really want to keep the proposed data format, you can call:
>>> dict(list(enumerate(l[::-1])))
{0: ('e', 'f'), 1: ('c', 'd'), 2: ('a', 'b')}
>>> dict(list(enumerate(l[::-1], 1)))
{1: ('e', 'f'), 2: ('c', 'd'), 3: ('a', 'b')}
This should accomplish the desired outcome.
def rev_keys(d: dict) -> dict:
    '''Return dictionary structure with the
    keys reassigned in opposite order'''
    old_keys = list(d.keys())
    new_keys = old_keys[::-1]
    nd = {}
    for ki in range(len(new_keys)):
        nd[new_keys[ki]] = d[old_keys[ki]]
    return nd
Given an input like:
dt = {'1': ('a','b'), '2': ('c','d'), '3': ('e','f')}
rev_keys(dt)
returns:
{'3': ('a', 'b'), '2': ('c', 'd'), '1': ('e', 'f')}
Try the following
dict_ = {1: ('a','b'), 2: ('c','d'), 3: ('e','f')}
values = [y for x, y in dict_.items()][::-1]
res = {}
for x, y in enumerate(dict_.items()):
    res[y[0]] = values[x]
print(res)
This is the output:
{1: ('e', 'f'), 2: ('c', 'd'), 3: ('a', 'b')}
You can zip the original dictionary's keys with the original dictionary's values, and since you want the values to be reversed, you can use the negative striding [::-1].
Note that dict.values() cannot be subscripted, hence you will need to convert it into a list first:
dct = {1: ('a', 'b'), 2: ('c', 'd'), 3: ('e', 'f')}
dct = dict(zip(dct, list(dct.values())[::-1]))
print(dct)
Output:
{1: ('e', 'f'), 2: ('c', 'd'), 3: ('a', 'b')}
I have a big list of lists of tuples like
actions = [ [('d', 'r'), ... ('c', 'e'),('', 'e')],
[('r', 'e'), ... ('c', 'e'),('d', 'r')],
... ,
[('a', 'b'), ... ('c', 'e'),('c', 'h')]
]
and I want to find the co-occurrences of the tuples.
I have tried the suggestions from this question, but the accepted answer is just too slow. For example, for a list of 1494 lists of tuples, the resulting dictionary has 18225703 entries and it took hours to run for 2-tuple co-occurrence. So plain permutation and counting doesn't seem to be the answer, since I have a bigger list.
I expect the output to extract the groups of two (or more: 3, 4, 5 at most) tuples that co-occur most often. Using the previous list as an example:
('c', 'e'),('d', 'r')
would be a common co-occurrence when searching for pairs, since they co-occur frequently. Is there an efficient method to achieve this?
I think there is no hope for a faster algorithm: you have to compute the combinations to count them. However, if there is a threshold of co-occurrences below which you are not interested, you can try to reduce the complexity of the algorithm. In both cases, there is hope for lower space complexity.
Let's take a small example:
>>> actions = [[('d', 'r'), ('c', 'e'),('', 'e')],
... [('r', 'e'), ('c', 'e'),('d', 'r')],
... [('a', 'b'), ('c', 'e'),('c', 'h')]]
General answer
This answer is probably the best for a large list of lists, but you can avoid creating intermediary lists. First, create an iterable on all present pairs of elements (elements are pairs too in your case, but that doesn't matter):
>>> import itertools
>>> it = itertools.chain.from_iterable(itertools.combinations(pair_list, 2) for pair_list in actions)
If we want to see the result, we have to consume the iterable:
>>> list(it)
[(('d', 'r'), ('c', 'e')), (('d', 'r'), ('', 'e')), (('c', 'e'), ('', 'e')), (('r', 'e'), ('c', 'e')), (('r', 'e'), ('d', 'r')), (('c', 'e'), ('d', 'r')), (('a', 'b'), ('c', 'e')), (('a', 'b'), ('c', 'h')), (('c', 'e'), ('c', 'h'))]
Then count the sorted pairs (with a fresh it!)
>>> it = itertools.chain.from_iterable(itertools.combinations(pair_list, 2) for pair_list in actions)
>>> from collections import Counter
>>> c = Counter((a,b) if a<=b else (b,a) for a,b in it)
>>> c
Counter({(('c', 'e'), ('d', 'r')): 2, (('', 'e'), ('d', 'r')): 1, (('', 'e'), ('c', 'e')): 1, (('c', 'e'), ('r', 'e')): 1, (('d', 'r'), ('r', 'e')): 1, (('a', 'b'), ('c', 'e')): 1, (('a', 'b'), ('c', 'h')): 1, (('c', 'e'), ('c', 'h')): 1})
>>> c.most_common(2)
[((('c', 'e'), ('d', 'r')), 2), ((('', 'e'), ('d', 'r')), 1)]
At least in terms of space, this solution should be efficient, since everything is lazy and the number of entries in the Counter is the number of combinations of elements that share a list, that is at most N(N-1)/2, where N is the number of distinct elements across all the lists ("at most" because some elements never "meet" each other, and therefore some combinations never happen).
The time complexity is O(M . L^2) where M is the number of lists and L the size of the largest list.
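The same counting idea extends to groups of 3, 4 or 5 elements, which the question also asks about. The sketch below is an assumption-laden illustration rather than part of the answer: count_cooccurrences is a hypothetical name, and sorting the distinct elements of each list is used to give every group one canonical order.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(actions, size=2):
    """Count how many lists contain each group of `size` distinct elements."""
    counts = Counter()
    for row in actions:
        # Sorted distinct items: each group is always generated in the same order.
        counts.update(combinations(sorted(set(row)), size))
    return counts

actions = [[('d', 'r'), ('c', 'e'), ('', 'e')],
           [('r', 'e'), ('c', 'e'), ('d', 'r')],
           [('a', 'b'), ('c', 'e'), ('c', 'h')]]
pairs = count_cooccurrences(actions, 2)
print(pairs.most_common(1))  # [((('c', 'e'), ('d', 'r')), 2)]
triples = count_cooccurrences(actions, 3)
print(len(triples))  # 3
```

The number of groups per list grows quickly with size, so for larger groups the pruning described next matters even more.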
With a threshold on the number of co-occurrences
I assume that all elements in a list are distinct. The key idea is that if an element is present in only one list, then it has no chance of winning at this game: it will have 1 co-occurrence with each of its neighbors, and 0 with the elements of other lists. If there are a lot of such "orphans", it might be useful to remove them before computing the combinations:
>>> d = Counter(itertools.chain.from_iterable(actions))
>>> d
Counter({('c', 'e'): 3, ('d', 'r'): 2, ('', 'e'): 1, ('r', 'e'): 1, ('a', 'b'): 1, ('c', 'h'): 1})
>>> orphans = set(e for e, c in d.items() if c <= 1)
>>> orphans
{('a', 'b'), ('r', 'e'), ('c', 'h'), ('', 'e')}
Now, try the same algorithm:
>>> it = itertools.chain.from_iterable(itertools.combinations((p for p in pair_list if p not in orphans), 2) for pair_list in actions)
>>> c = Counter((a,b) if a<=b else (b,a) for a,b in it)
>>> c
Counter({(('c', 'e'), ('d', 'r')): 2})
Note the comprehension: no brackets but parentheses.
If you have K orphans in a list of N elements, the number of combinations for that list falls from N(N-1)/2 to (N-K)(N-K-1)/2, that is (if I'm not mistaken!) K(2N-K-1)/2 fewer combinations.
This can be generalized: if an element is present in two or fewer lists, then it will have at most 2 co-occurrences with any other element, and so on.
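The generalization can be sketched as a function with an explicit threshold. The name frequent_pairs and the parameter min_count are hypothetical; the sketch assumes, as above, that only pairs occurring in at least min_count lists are of interest.

```python
from collections import Counter
from itertools import chain, combinations

def frequent_pairs(actions, min_count):
    """Return the pairs that co-occur in at least `min_count` lists."""
    # Number of lists each element appears in.
    presence = Counter(chain.from_iterable(set(row) for row in actions))
    # An element seen in fewer than min_count lists cannot be in a frequent pair.
    keep = {item for item, n in presence.items() if n >= min_count}
    pairs = Counter()
    for row in actions:
        survivors = sorted(item for item in set(row) if item in keep)
        pairs.update(combinations(survivors, 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}

actions = [[('d', 'r'), ('c', 'e'), ('', 'e')],
           [('r', 'e'), ('c', 'e'), ('d', 'r')],
           [('a', 'b'), ('c', 'e'), ('c', 'h')]]
print(frequent_pairs(actions, 2))  # {(('c', 'e'), ('d', 'r')): 2}
```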
If this is still too slow, then switch to a faster language.
I have the following list,
p=[list(['a', 'b', 'c']), list(['d', 'e'])]
I would like to build the subsets of size 2 of each element and list them, which would give the following output:
[[('a', 'b'), ('a', 'c'), ('b', 'c')],[('d', 'e')]]
To achieve this I wrote the following function,
def x(m,n):
    for i in m:
        z=list(itertools.combinations(i, n))
    return(z)
yet when I call it, e.g. x(p,2), I only get the last element:
[('d', 'e')]
I wonder what I am doing wrong?
It is because you are overwriting z on each iteration instead of appending to it:
import itertools
def x(m,n):
    z = []
    for i in m:
        z.append(list(itertools.combinations(i, n)))
    return(z)
yields:
[[('a', 'b'), ('a', 'c'), ('b', 'c')], [('d', 'e')]]
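The same accumulation can also be written as a list comprehension, which avoids the overwrite problem altogether. A minimal sketch, with sublist_combinations as a hypothetical name:

```python
from itertools import combinations

def sublist_combinations(groups, size):
    """Return, for each sublist, the list of its combinations of `size` items."""
    return [list(combinations(group, size)) for group in groups]

p = [['a', 'b', 'c'], ['d', 'e']]
print(sublist_combinations(p, 2))
# [[('a', 'b'), ('a', 'c'), ('b', 'c')], [('d', 'e')]]
```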
I have three lists that are generated by other functions. Let's assume for now they are:
x = ['d', 'e']
g = ['1', '2']
y = ['f', g]
As you can see, g is part of y. I am trying to get all combinations of the elements of the three lists. I have tried going about this in two ways:
One way:
l = []
l.append([a]+[b] for a in x for b in y)
Another way using itertools:
import itertools
l = list(itertools.product([a for a in x], [b for b in y]))
Both ways produce the following combinations:
[('d', 'f'), ('d', ['1', '2']), ('e', 'f'), ('e', ['1', '2'])]
But what I would like to get is:
[('d', 'f'), ('d', '1'), ('d','2'), ('e', 'f'), ('e', '1'), ('e','2')]
Also, when x for example is empty, I get no combinations at all when I am still expecting to get the element combinations of the remaining two lists.
As @BrenBarn commented, you can flatten the list y with the chain function, and then use product:
from itertools import product, chain
list(product(x, chain.from_iterable(y)))
# [('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]
This is inspired by @Psidom's answer, but it uses a specifically tailored flatten function to make sure that only items that should be flattened are iterated:
def flatten(x, types=list):
    lst = []
    for item in x:
        if isinstance(item, types):
            for subitem in item:
                lst.append(subitem)
        else:
            lst.append(item)
    return lst
>>> from itertools import product
>>> list(product(x, flatten(y)))
[('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]
Note that there is unfortunately no such flatten function in the standard library, but you could also use one from an external library, for example iteration_utilities.deepflatten. Note that this requires passing str (or basestring on Python 2) as ignore:
>>> from iteration_utilities import deepflatten
>>> list(product(x, deepflatten(y, ignore=str)))
[('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]
To exclude empty iterables from the product simply exclude empty subiterables. For example:
>>> x = []
>>> iterables = [subiterable for subiterable in (x, list(deepflatten(y, ignore=str))) if subiterable]
>>> list(product(*iterables))
[('f',), ('1',), ('2',)]
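The same filtering can be done with the standard library only. The sketch below is one possible arrangement, not the answer's own code: flatten_once and product_skipping_empty are hypothetical names, and only one level of nesting is flattened.

```python
from itertools import product

def flatten_once(items, types=(list, tuple)):
    """Flatten one level: expand list/tuple items, keep everything else."""
    flat = []
    for item in items:
        if isinstance(item, types):
            flat.extend(item)
        else:
            flat.append(item)
    return flat

def product_skipping_empty(*iterables):
    """Cartesian product of the flattened inputs, ignoring empty ones."""
    flattened = [flatten_once(group) for group in iterables]
    non_empty = [group for group in flattened if group]
    return list(product(*non_empty))

x = ['d', 'e']
y = ['f', ['1', '2']]
print(product_skipping_empty(x, y))
# [('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]
print(product_skipping_empty([], y))
# [('f',), ('1',), ('2',)]
```

If every input is empty, product() is called without arguments and yields a single empty tuple, so callers may want to handle that case separately.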
I would like to point out two implementations for flatten-like functions available in more_itertools (install via pip install more_itertools).
flatten is an itertools recipe and emulates @Psidom's proposal:
import itertools as it
import more_itertools as mit
list(it.product(x, mit.flatten(y)))
# [('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]
However, for flattening more deeply nested iterables, consider using collapse:
# Example
x = ['d', 'e']
g = [('1'), [[['2']]]]
y = [{'f'}, g]
# Bad
list(it.product(x, mit.flatten(y)))
# [('d', 'f'), ('d', '1'), ('d', [[['2']]]), ('e', 'f'), ('e', '1'), ('e', [[['2']]])]
# Good
list(it.product(x, mit.collapse(y)))
# [('d', 'f'), ('d', '1'), ('d', '2'), ('e', 'f'), ('e', '1'), ('e', '2')]