Generate combinations of elements from multiple lists - python

I'm making a function that takes a variable number of lists as input (i.e., an arbitrary argument list).
I need to compare each element from each list to each element of all other lists, but I couldn't find any way to approach this.

Depending on your goal, you can make use of some of the itertools utilities. For example, you can use itertools.product on *args:
from itertools import product
for comb in product(*args):
    if len(set(comb)) < len(comb):
        print(comb)  # there are equal values in this combination
That said, it isn't very clear from your question what you want to achieve. If I misunderstood you, please try to state the question more specifically.
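A self-contained sketch of the product(*args) idea, assuming the goal is to collect every combination (one element taken from each list) that contains at least one repeated value. The function name find_repeated_combinations is hypothetical, and the elements must be hashable because the check uses a set:

```python
from itertools import product

def find_repeated_combinations(*lists):  # hypothetical helper name
    """Return every combination (one element per list) that contains a repeated value."""
    result = []
    for comb in product(*lists):
        # A set drops duplicates, so it is shorter than the tuple when values repeat.
        if len(set(comb)) < len(comb):
            result.append(comb)
    return result

print(find_repeated_combinations([1, 2], [2, 3], [3, 4]))
# [(1, 3, 3), (2, 2, 3), (2, 2, 4), (2, 3, 3)]
```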

I think @LevLeitsky's answer is the best way to loop over the items from your variable number of lists. However, if the purpose of the loop is just to find common elements between pairs of lists, I'd do it a bit differently.
Here's an approach that finds the common elements between each pair of lists:
import itertools
def func(*args):
    sets = [set(l) for l in args]
    for a, b in itertools.combinations(sets, 2):
        common = a & b # set intersection
        # do stuff with the set of common elements...
I'm not sure what you need to do with the common elements, so I'll leave it there.
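One possible way to fill in the "do stuff" step, as a minimal sketch: collect the intersections into a dict keyed by the pair of list indices. The name common_elements and this return format are assumptions, not part of the answer above:

```python
import itertools

def common_elements(*args):  # hypothetical helper name
    """Map each pair of list indices (i, j), with i < j, to the set of elements both lists share."""
    sets = [set(l) for l in args]
    result = {}
    # combinations() over the indexed sets visits each unordered pair of lists once.
    for (i, a), (j, b) in itertools.combinations(enumerate(sets), 2):
        result[(i, j)] = a & b  # set intersection
    return result

print(common_elements([1, 2, 3], [2, 3, 4], [3, 5]))
# {(0, 1): {2, 3}, (0, 2): {3}, (1, 2): {3}}
```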

The itertools module provides a lot of useful tools just for such tasks. You can adapt the following example to your task by integrating it into your specific comparison logic.
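Note that the following assumes your comparison function is commutative, i.e. comparing x with y gives the same result as comparing y with x. That is why each pair of elements from two different lists is generated only once, in one order, and the mirrored half of the tuples is omitted.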
Example:
import itertools
def generate_pairs(*args):
    # assuming function is commutative
    for i, l in enumerate(args, 1):
        # pair every element of this sequence with every element of the later sequences
        for x, y in itertools.product(l, itertools.chain(*args[i:])):
            yield (x, y)
# you can use lists instead of strings as well
for x, y in generate_pairs("ab", "cd", "ef"):
    print((x, y))
# e.g., apply your comparison logic
print(any(x == y for x, y in generate_pairs("ab", "cd", "ef")))
print(all(x != y for x, y in generate_pairs("ab", "cd", "ef")))
Output:
$ python test.py
('a', 'c')
('a', 'd')
('a', 'e')
('a', 'f')
('b', 'c')
('b', 'd')
('b', 'e')
('b', 'f')
('c', 'e')
('c', 'f')
('d', 'e')
('d', 'f')
False
True
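The same cross-list pairing can also be written by first pairing up the lists with itertools.combinations and then taking the product of each pair of lists. This is a sketch under the same commutativity assumption; the name cross_list_pairs is hypothetical, and because pairs come out grouped by pair of lists, the order differs from generate_pairs even though the set of pairs is the same:

```python
import itertools

def cross_list_pairs(*args):  # hypothetical helper name
    """Yield (x, y) for x and y taken from two different input sequences, each pair once."""
    # combinations() gives each unordered pair of sequences once;
    # product() then pairs every element of the first with every element of the second.
    for first, second in itertools.combinations(args, 2):
        yield from itertools.product(first, second)

print(list(cross_list_pairs("ab", "cd", "ef")))
```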

If you want the keyword arguments as a dictionary:
def kw(**kwargs):
    for key, value in kwargs.items():
        print(key, value)
If you want all the positional arguments as a tuple:
def arg(*args):
    for item in args:
        print(item)
You can use both:
def using_both(*args, **kwargs):
    # unpack again with ** and * so kw and arg receive them as separate arguments
    kw(**kwargs)
    arg(*args)
Call it like this:
using_both([1,2,3,4,5],a=32,b=55)
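To make it concrete what each parameter actually receives in a call like the one above, here is a small sketch (show_arguments is a hypothetical name) that simply returns them: positional arguments arrive as a tuple, keyword arguments as a dict.

```python
def show_arguments(*args, **kwargs):  # hypothetical helper name
    """Return exactly what the function received, for inspection."""
    return args, kwargs

args, kwargs = show_arguments([1, 2, 3, 4, 5], a=32, b=55)
print(args)    # ([1, 2, 3, 4, 5],)
print(kwargs)  # {'a': 32, 'b': 55}
```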

Related

Pythonic way to generate a pseudo ordered pair list from an iterable (e.g. a list or set)

Given an iterable consisting of a finite set of elements, for example:
(a, b, c, d)
what would be a Pythonic way to generate the following (pseudo) ordered pairs from it:
ab
ac
ad
bc
bd
cd
A trivial way would be to use for loops, but I'm wondering if there is a Pythonic way of generating this list from the iterable above?
Try using combinations.
import itertools
combinations = itertools.combinations('abcd', n)
will give you an iterator over all n-element combinations. You can loop over it or convert it into a list with list(combinations).
In order to only include pairs as in your example, you can pass 2 as the argument:
combinations = itertools.combinations('abcd', 2)
>>> print(list(combinations))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
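To get the pairs in exactly the form shown in the question (ab, ac, ...), a sketch that joins each tuple back into a string:

```python
from itertools import combinations

# each pair is a tuple such as ('a', 'b'); ''.join turns it into 'ab'
pairs = [''.join(pair) for pair in combinations('abcd', 2)]
print(pairs)  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
```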
You can accomplish this in a list comprehension.
data = [1, 2, 3, 4]
output = [(data[n], other) for n in range(len(data)) for other in data[n+1:]]
print(repr(output))
Pulling the comprehension apart: for each index n, it pairs data[n] with every member of the list that comes after it. data[n+1:] tells Python to take all members after the nth (data[n:] would also include data[n] itself and produce pairs such as (1, 1)).
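For an arbitrary iterable (not only a list that supports indexing), the same idea can be wrapped in a generator. This is a sketch (ordered_pairs is a hypothetical name) that materializes the iterable once and pairs each element with every later one, which is the same result itertools.combinations(iterable, 2) gives:

```python
from itertools import combinations

def ordered_pairs(iterable):  # hypothetical helper name
    """Yield (earlier, later) for every pair of positions in the iterable."""
    items = tuple(iterable)  # materialize so slicing works, even for generators
    for i, first in enumerate(items):
        for second in items[i + 1:]:
            yield first, second

data = [1, 2, 3, 4]
print(list(ordered_pairs(data)))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(list(ordered_pairs(data)) == list(combinations(data, 2)))  # True
```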
Use list comprehensions - they seem to be considered pythonic.
import operator
stuff = ('a','b','c','d')
Obtain the n-digit binary numbers, in string format, in which exactly two of the digits are 1, where n is the number of items.
n = len(stuff)
s = '{{:0{}b}}'.format(n)  # builds the format string '{:04b}' when n == 4
z = [s.format(x) for x in range(2**n) if s.format(x).count('1') == 2]
Find the indices of the ones for each combination.
y = [[i for i, c in enumerate(combo) if c == '1'] for combo in z]
Use the indices to select the items, then sort.
x = [''.join(operator.itemgetter(*indices)(stuff)) for indices in y]
x.sort()

Max, len, split Python

I'm trying to find all combinations of A,B repeated 3 times.
Once I've done this, I would like to count how many A's there are in a row by splitting the string and taking the length of the max piece. However, this is going crazy on me. I must have misunderstood len(max(tmp.split("A"))).
Can anyone explain what this really does? (len returns the length of the string, and max returns the highest integer of that string, based on my split?) I expect it to return the number of A's in a row: "A,B,A" should return 1 even though there are two A's.
Suggestions and clarifications would be sincerely welcome.
import itertools
list = list(itertools.product(["A", "B"], repeat=3))
count = 0;
for i in list:
    count += 1;
    tmp = str(i);
    var = len(max(tmp.split("B")))
    print(count, i, var)
You can use itertools.groupby to find groups of identical elements in an iterable. groupby generates a sequence of (key, group) tuples, where key is the value of the elements in the group, and group is an iterator over that group (which shares the underlying iterable with groupby). To get the length of the group, we need to convert it to a list.
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max([len(list(g)) for k, g in groupby(t) if k == "A"] or [0])
    print(t, a)
output
('A', 'A', 'A') 3
('A', 'A', 'B') 2
('A', 'B', 'A') 1
('A', 'B', 'B') 1
('B', 'A', 'A') 2
('B', 'A', 'B') 1
('B', 'B', 'A') 1
('B', 'B', 'B') 0
We need to append or [0] to the list comprehension to cover the situation where no "A"s are found, otherwise max complains that we're trying to find the maximum of an empty sequence.
Update
Padraic Cunningham reminded me that the Python 3 version of max (3.4 and later) accepts a default argument to handle the situation when you pass it an empty iterable. He also shows another way to calculate the length of an iterable that is a bit nicer, since it avoids capturing the iterable into a list, so it consumes less RAM (and may be a bit faster), which can be handy when working with large iterables. So we can rewrite the above code as
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max((sum(1 for _ in g) for k, g in groupby(t) if k == "A"), default=0)
    print(t, a)
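For comparison, the split idea from the question can also be made to work. This is a sketch, assuming each tuple is first joined into a plain string: str(i) produces text such as "('A', 'A', 'B')", whose parentheses, quotes and commas end up in the split pieces, and max() over strings picks the lexicographically largest piece rather than the longest one, so the lengths have to be compared instead. longest_run_of_a is a hypothetical name:

```python
from itertools import product

def longest_run_of_a(t):  # hypothetical helper name
    """Length of the longest run of consecutive "A"s in a tuple of "A"/"B" strings."""
    # "".join turns ('A', 'B', 'A') into "ABA"; splitting on "B" leaves the runs of A's,
    # and max() is taken over the lengths, not over the strings themselves.
    return max(len(run) for run in "".join(t).split("B"))

for t in product("AB", repeat=3):
    print(t, longest_run_of_a(t))
```

This prints the same numbers as the groupby version above.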

Combining lists of tuples based on a common tuple element

Consider two lists of tuples:
data1 = [([X1], 'a'), ([X2], 'b'), ([X3], 'c')]
data2 = [([Y1], 'a'), ([Y2], 'b'), ([Y3], 'c')]
Where len(data1) == len(data2)
Each tuple contains two elements:
a list of some strings (e.g. [X1]);
a common element for data1 and data2: the strings 'a', 'b', and so on.
I would like to combine them into following:
[('a', [X1], [Y1]), ('b', [X2], [Y2]),...]
Does anyone know how I can do this?
You can use the zip function and a list comprehension:
[(s1,l1,l2) for (l1,s1),(l2,s2) in zip(data1,data2)]
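A runnable version with concrete placeholder values (the strings "x1", "y1", ... are made up to stand in for the lists [X1], [Y1], ... from the question). It assumes both lists give the common elements in the same order, and the assert checks that assumption before combining:

```python
data1 = [(["x1"], 'a'), (["x2"], 'b'), (["x3"], 'c')]
data2 = [(["y1"], 'a'), (["y2"], 'b'), (["y3"], 'c')]

# zip pairs the i-th tuple of data1 with the i-th tuple of data2,
# so the common elements must line up position by position.
assert all(s1 == s2 for (_, s1), (_, s2) in zip(data1, data2))
combined = [(s1, l1, l2) for (l1, s1), (l2, s2) in zip(data1, data2)]
print(combined)
# [('a', ['x1'], ['y1']), ('b', ['x2'], ['y2']), ('c', ['x3'], ['y3'])]
```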
@Kasramvd's solution is good if the order is the same among all elements in the data lists. If it is not, that solution doesn't take it into account and pairs up tuples with different common elements.
A solution that does handle it uses a defaultdict:
from collections import defaultdict
d = defaultdict(list) # values are initialized to empty list
data1 = [("s1", 'a'), ("s2", 'c'), ("s3", 'b')]
data2 = [("s1", 'c'), ("s2", 'b'), ("s3", 'a')]
for value, common in data1 + data2:
    d[common].append(value)
To get a list from it, simply wrap d.items() in a list() call:
res = list(d.items())
print(res)
# Prints (Python 3.7+, where dicts keep insertion order): [('a', ['s1', 's3']), ('c', ['s2', 's1']), ('b', ['s3', 's2'])]
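To get the exact shape asked for in the question, ('a', [X1], [Y1]), rather than each key paired with a single list of values, the grouped values can be unpacked into each tuple. A sketch (combine_by_key is a hypothetical name); since dicts preserve insertion order in Python 3.7+, keys come out in the order they first appear:

```python
from collections import defaultdict

def combine_by_key(*datasets):  # hypothetical helper name
    """Group the first element of each (value, key) tuple by key, across all datasets."""
    grouped = defaultdict(list)
    for data in datasets:
        for value, key in data:
            grouped[key].append(value)
    # (key, *values) flattens the collected values into the tuple after the key.
    return [(key, *values) for key, values in grouped.items()]

data1 = [("s1", 'a'), ("s2", 'c'), ("s3", 'b')]
data2 = [("s1", 'c'), ("s2", 'b'), ("s3", 'a')]
print(combine_by_key(data1, data2))
# [('a', 's1', 's3'), ('c', 's2', 's1'), ('b', 's3', 's2')]
```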
We can do this in a single comprehension expression, using the reduce function:
from functools import reduce
from operator import add
[tuple([x]+reduce(add,([y[0]] for y in data1+data2 if y[1]==x))) for x in set(y[1] for y in data1+data2)]
Because data1+data2 is rebuilt inside the comprehension for every key, it might be better to pre-compute it once if the lists are large:
combdata = data1+data2
[tuple([x]+reduce(add,([y[0]] for y in combdata if y[1]==x))) for x in set(y[1] for y in combdata)]
This solution does not rely on all "keys" occurring in both lists, or the order being the same.
If returned order is important, we can even do
sorted([tuple([x]+reduce(add,([y[0]] for y in data1+data2 if y[1]==x))) for x in set(y[1] for y in data1+data2)],key = lambda x,y=[x[0] for x in data1+data2]: y.index(x[1]))
to ensure that the order is the same as in the original lists. Again, pre-computing data1+data2 gives
sorted([tuple([x]+reduce(add,([y[0]] for y in combdata if y[1]==x))) for x in set(y[1] for y in combdata)],key = lambda x,y=[x[0] for x in combdata]: y.index(x[1]))

Most efficient way to list all oriented cycles given n elements in Python

I have a list of elements which can be quite big (100+ elements): elements = [a, b, c, d, e, f, g...],
and I need to build the list of all possible directed cycles, considering that the sequences
[a,b,c,d,e], [b,c,d,e,a], [c,d,e,a,b], [d,e,a,b,c], [e,a,b,c,d] are considered identical since they are different representations of the same directed cycle. Only the starting point differs.
Also, since direction matters, [a,b,c,d,e] and [e,d,c,b,a] are different.
I am looking for all the oriented cycles of all lengths, from 2 to len(elements). What's the most Pythonic way to do it, leveraging the optimized built-in tools like permutations, combinations, etc.?
Maybe I'm missing something, but this seems straightforward to me:
def gen_oriented_cycles(xs):
    from itertools import combinations, permutations
    for length in range(2, len(xs) + 1):
        for pieces in combinations(xs, length):
            # fix the first element so rotations of the same cycle are generated only once
            first = pieces[0], # 1-tuple
            for rest in permutations(pieces[1:]):
                yield first + rest
Then, e.g.,
for c in gen_oriented_cycles('abcd'):
    print(c)
displays:
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')
('a', 'b', 'c')
('a', 'c', 'b')
('a', 'b', 'd')
('a', 'd', 'b')
('a', 'c', 'd')
('a', 'd', 'c')
('b', 'c', 'd')
('b', 'd', 'c')
('a', 'b', 'c', 'd')
('a', 'b', 'd', 'c')
('a', 'c', 'b', 'd')
('a', 'c', 'd', 'b')
('a', 'd', 'b', 'c')
('a', 'd', 'c', 'b')
Is that missing some essential property you're looking for?
EDIT
I thought it might be missing this part of your criteria:
Also, since direction matters, [a,b,c,d,e] and [e,d,c,b,a] are different.
but on second thought I think it meets that requirement, since [e,d,c,b,a] is the same as [a,e,d,c,b] to you.
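As a quick sanity check (a sketch, not part of the answer): each directed cycle of length k drawn from n distinct elements is counted once when rotations are identified, which gives C(n, k) * (k - 1)! cycles per length. Comparing that closed form with the generator's output also shows why 100+ elements is out of reach; expected_count is a hypothetical helper name:

```python
from itertools import combinations, permutations
from math import comb, factorial

def gen_oriented_cycles(xs):
    for length in range(2, len(xs) + 1):
        for pieces in combinations(xs, length):
            first = pieces[0],  # 1-tuple
            for rest in permutations(pieces[1:]):
                yield first + rest

def expected_count(n):  # hypothetical helper name
    """Number of directed cycles of length 2..n on n distinct elements, up to rotation."""
    return sum(comb(n, k) * factorial(k - 1) for k in range(2, n + 1))

print(len(list(gen_oriented_cycles('abcd'))), expected_count(4))  # 20 20
print(expected_count(100) > 10**150)  # True: far too many to enumerate
```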
Is there any good reason to hold a canonical representation of all of this in memory? It's going to be huge, and whatever your use case is, there may be a better way of dealing with it.
It looks like for your source material you would use any combination of X elements, not necessarily even homogeneous ones (i.e. you would have (a,e,g,x,f) etc.)? Then I would do this as a nested loop. The outer one would select by length and choose subsets of the entire list to use. The inner one would construct permutations of each subset and then throw out matching items. I would use a dictionary with a frozenset of the items as the key (for immutability and fast lookup) and a list of already-detected cycles as the value. It's going to be slow/long-running no matter how you do it, but this is one way.
First, you need a way to determine if two tuples (or lists) represent the same cycle. You can do that like this:
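def match_cycle(test_cycle, ref_cycle):
    """Return True if test_cycle is a rotation of ref_cycle."""
    if len(test_cycle) != len(ref_cycle):
        return False
    try:
        refi = ref_cycle.index(test_cycle[0])
    except ValueError:  # first element of test_cycle does not occur in ref_cycle
        return False
    partlen = len(ref_cycle) - refi
    return not (any(test_cycle[i] != ref_cycle[i+refi] for i in range(partlen)) or
                any(test_cycle[i+partlen] != ref_cycle[i] for i in range(refi)))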
Then, the rest.
from collections import defaultdict
from itertools import combinations, permutations
def all_cycles(elements):
    for tuple_length in range(2, len(elements) + 1):
        testdict = defaultdict(list)
        for outer in combinations(elements, tuple_length):
            for inner in permutations(outer):
                testset = frozenset(inner)
                # only yield inner if it is not a rotation of a cycle already seen
                if not any(match_cycle(inner, x) for x in testdict[testset]):
                    testdict[testset].append(inner)
                    yield inner
For 5 elements this yields 84 cycles (10 + 20 + 30 + 24 for lengths 2 through 5), which matches the count C(n, k) * (k - 1)! summed over k. Note that the running time grows factorially, though. Timed with the loop stopping at length len(elements) - 1 (60 cycles for 5 elements), length(5) took 1.34 ms/loop, length(6) took 22.1 ms, length(7) took 703 ms, and length(8) took 33.5 s. length(100) might finish before you retire, but I wouldn't bet on it.
There might be a better way, and there probably is, but in general the number of subsets of 100 elements (2^100) is enormous, even when reduced somewhat for cycles. So this is probably not the right way to approach whatever problem you are trying to solve.
This may work:
import itertools
import collections
class Cycle(object):
    def __init__(self, cycle):
        self.all_possible = self.get_all_possible(cycle)
        self.canonical = self.get_canonical(self.all_possible)
    def __eq__(self, other):
        return self.canonical == other.canonical
    def __hash__(self):
        return hash(self.canonical)
    def get_all_possible(self, cycle):
        # collect every rotation of the cycle
        output = []
        cycle = collections.deque(cycle)
        for i in range(len(cycle)):
            cycle.rotate(1)
            output.append(list(cycle))
        return output
    def get_canonical(self, cycles):
        # the lexicographically smallest rotation is the same for every rotation of one cycle
        return min(map(tuple, cycles))
    def __repr__(self):
        return 'Cycle({0})'.format(self.canonical)
def list_cycles(elements):
    output = set()
    for i in range(2, len(elements) + 1):
        # the set drops Cycle objects whose canonical form is already present
        output.update(set(map(Cycle, itertools.permutations(elements, i))))
    return list(output)
def display(seq):
    for cycle in seq:
        print(cycle.canonical)
        print('\n'.join(' ' + str(item) for item in cycle.all_possible))
def main():
    elements = 'abcdefghijkl'
    final = list_cycles(elements)
    display(final)
if __name__ == '__main__':
    main()
It creates a class to represent any given cycle, which will be hashed and checked for equality against a canonical representation of the cycle. This lets a Cycle object be placed in a set, which will automatically filter out any duplicates. Unfortunately, it's not going to be highly efficient, since it generates every single possible permutation first.
This should give you the right answer for cycles of length 2 to len(elements). It might not be the fastest way to do it, though. I used qarma's hint of rotating each cycle so that it always starts with the smallest element.
from itertools import permutations
def rotate_min(l):
    '''Rotates the list so that the smallest element comes first '''
    minIndex = l.index(min(l))
    rotatedTuple = l[minIndex:] + l[:minIndex]
    return rotatedTuple
def getCycles(elements):
    elementIndicies = tuple(range(len(elements))) #tuple is hashable so it works with set
    cyclesIndices = set()
    cycles = []
    for length in range(2, len(elements)+1):
        allPermutation = permutations(elementIndicies, length)
        for perm in allPermutation:
            rotated_perm = rotate_min(perm)
            if rotated_perm not in cyclesIndices:
                #If the cycle of indices is not in the set, add it.
                cyclesIndices.add(rotated_perm)
                #convert indices to the respective elements and append
                cycles.append([elements[i] for i in rotated_perm])
    return cycles
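As a cross-check (a sketch combining two of the answers above, lightly adapted so the block runs on its own and returns tuples so the results can be compared as sets), the rotate-to-minimum approach and the combinations/permutations generator should produce the same cycles when the elements are distinct, since both put the element with the lowest index first. get_cycles here is an adapted version of getCycles:

```python
from itertools import combinations, permutations

def rotate_min(l):
    """Rotate so that the smallest element comes first."""
    i = l.index(min(l))
    return l[i:] + l[:i]

def get_cycles(elements):
    """Rotate-to-minimum approach over index tuples, returning tuples of elements."""
    seen = set()
    cycles = []
    for length in range(2, len(elements) + 1):
        for perm in permutations(range(len(elements)), length):
            rotated = rotate_min(perm)
            if rotated not in seen:
                seen.add(rotated)
                cycles.append(tuple(elements[i] for i in rotated))
    return cycles

def gen_oriented_cycles(xs):
    """combinations/permutations approach: fix the first element of each subset."""
    for length in range(2, len(xs) + 1):
        for pieces in combinations(xs, length):
            for rest in permutations(pieces[1:]):
                yield (pieces[0],) + rest

a = get_cycles('abcde')
b = list(gen_oriented_cycles('abcde'))
print(len(a), len(b), set(a) == set(b))  # 84 84 True
```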

Python - input from list of tuples

I've declared a list of tuples that I would like to manipulate. I have a function that returns an option from the user. I would like to see if the user has entered any one of the keys 'A', 'W', 'K'. With a dictionary, I would say this: while option not in authors: option = get_option(). How can I accomplish this with a list of tuples?
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]
option = get_option()
while option not in (x[0] for x in authors):
    option = get_option()
How this works:
(x[0] for x in authors) is a generator expression. It yields the first element (index 0) of each tuple in the authors list one at a time, and each of those is compared with option. As soon as a match is found, the in test short-circuits and stops.
Generator expressions yield one item at a time, so they are memory efficient.
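A runnable sketch of the same loop. Since get_option is the asker's own function, a hypothetical stand-in built from a list of canned answers is used here; ask_until_valid is also a hypothetical name. Collecting the keys into a set once avoids rescanning the list of tuples on every retry:

```python
authors = [('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")]

def ask_until_valid(get_option, choices):  # hypothetical helper name
    """Call get_option() until it returns one of the first elements of choices."""
    valid = {key for key, _ in choices}  # build the set of keys once
    option = get_option()
    while option not in valid:
        option = get_option()
    return option

# Hypothetical stand-in for get_option(): returns canned answers in order.
answers = iter(['X', 'q', 'W'])
print(ask_until_valid(lambda: next(answers), authors))  # W
```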
How about something like
option in list(zip(*authors))[0]
We are using zip to essentially separate the letters from the words. Since zip takes each sequence as a separate argument, we unpack the list of tuples using *. In Python 3, zip returns an iterator, so it is wrapped in list() before indexing:
>>> list(zip(*authors))
[('A', 'W', 'K'), ('Aho', 'Weinberger', 'Kernighan')]
>>> list(zip(*authors))[0]
('A', 'W', 'K')
Then we simply use option in to test whether option is contained in list(zip(*authors))[0].
There are good answers here that cover doing this operation with zip, but you don't have to do it like that - you can use an OrderedDict instead.
from collections import OrderedDict
authors = OrderedDict([('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")])
Since it remembers its entry order, you can iterate over it without fear of getting odd or unusual orderings of your keys.
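With the OrderedDict, the check from the question works directly, because `in` on a dict tests its keys. A sketch, with get_option again replaced by a hypothetical iterator of canned answers; in Python 3.7 and later a plain dict also keeps insertion order, so dict(...) would behave the same way here:

```python
from collections import OrderedDict

authors = OrderedDict([('A', "Aho"), ('W', "Weinberger"), ('K', "Kernighan")])

answers = iter(['Z', 'K'])  # hypothetical stand-in for get_option()
option = next(answers)
while option not in authors:  # membership test on the keys
    option = next(answers)

print(option, authors[option])  # K Kernighan
print(list(authors))            # ['A', 'W', 'K'], in insertion order
```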
