Max, len, split Python - python

I'm trying to find all combinations of A,B repeated 3 times.
Once I've done this, I would like to count how many A's there are in a row by splitting the string and taking the length of the max value. However, this is going crazy on me. I must have misunderstood len(max(tmp.split("B"))).
Can anyone explain what this really does? (As I understand it, len returns the length of a string, and max returns the highest value in the list produced by my split.) I expect it to return the number of A's in a row. "A,B,A" should return 1 even though there are two A's.
Suggestions and clarifications would be sincerely welcome
import itertools
list = list(itertools.product(["A", "B"], repeat=3))
count = 0
for i in list:
    count += 1
    tmp = str(i)
    var = len(max(tmp.split("B")))
    print(count, i, var)

You can use itertools.groupby to find groups of identical consecutive elements in an iterable. groupby generates a sequence of (key, group) tuples, where key is the value of the elements in the group, and group is an iterator over that group (which shares the underlying iterable with groupby). To get the length of the group we need to convert it to a list.
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max([len(list(g)) for k, g in groupby(t) if k == "A"] or [0])
    print(t, a)
output
('A', 'A', 'A') 3
('A', 'A', 'B') 2
('A', 'B', 'A') 1
('A', 'B', 'B') 1
('B', 'A', 'A') 2
('B', 'A', 'B') 1
('B', 'B', 'A') 1
('B', 'B', 'B') 0
We need to append or [0] to the list comprehension to cover the situation where no "A"s are found, otherwise max complains that we're trying to find the maximum of an empty sequence.
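To make the (key, group) pairs and the empty-list fallback more concrete, here is a minimal sketch. The helper name run_lengths is only an illustration and is not part of itertools.

```python
from itertools import groupby

def run_lengths(seq):
    # One (key, length) pair per run of identical consecutive elements
    return [(key, sum(1 for _ in group)) for key, group in groupby(seq)]

print(run_lengths("AABA"))  # [('A', 2), ('B', 1), ('A', 1)]

# With no "A" at all, the filtered list is empty, so max() needs a fallback
a_runs = [n for key, n in run_lengths("BBB") if key == "A"]
print(a_runs)              # []
print(max(a_runs or [0]))  # 0
```

An empty list is falsy, so a_runs or [0] evaluates to [0] and max has something to work with.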
Update
Padraic Cunningham reminded me that the Python 3 version of max accepts a default arg to handle the situation when you pass it an empty iterable. He also shows another way to calculate the length of an iterable that is a bit nicer since it avoids capturing the iterable into a list, so it's a bit faster and consumes less RAM, which can be handy when working with large iterables. So we can rewrite the above code as
from itertools import product, groupby
for t in product("AB", repeat=3):
    a = max((sum(1 for _ in g) for k, g in groupby(t) if k == "A"), default=0)
    print(t, a)

Related

How to combine lists by choosing one element per index?

I'm trying to figure out how to make a function that takes two lists, then returns a list of all the possible combinations of those lists made by choosing one element from one of the lists per index. I don't think I'm describing it well, but what I'm looking for is:
input:
['a','b'], ['c','d']
output:
['ab', 'ad', 'cb', 'cd']
I've made a function that does this semi-successfully here:
import itertools

def mix_list(lst1, lst2):
    res = []
    k = max(len(lst1), len(lst2))
    ref = itertools.product(range(2), repeat=k)
    for comb in list(ref):
        temp = [None] * k
        for i, e in enumerate(comb):
            if e == 0:
                try:
                    temp[i] = lst1[i]
                except IndexError:
                    temp[i] = lst2[i]
            elif e == 1:
                try:
                    temp[i] = lst2[i]
                except IndexError:
                    temp[i] = lst1[i]
        res.append(temp)
    return [''.join(i) for i in set(map(tuple, res))]
My first thought was that itertools would have some function that would accomplish this, but I couldn't find anything. Besides that, I did some googling and searching on here, but I haven't been able to find something that does what I'm looking for a bit faster or a bit more simply.
Is there a better way to accomplish this, or maybe a library that has a function that already does this?
You can zip() the two lists together to get a list of pairs of the values at corresponding indices, and then use product() to get all the combinations of picking one element at each index:
>>> from itertools import product
>>> lst1, lst2 = ['a','b'], ['c','d']
>>> list(product(*zip(lst1, lst2)))
[('a', 'b'), ('a', 'd'), ('c', 'b'), ('c', 'd')]
This can easily be extended to an arbitrary number and length of lists, but the size of the result will grow exponentially.
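A minimal sketch of that extension, assuming the elements are strings and that None never appears as a real element. The name mix_lists and the use of zip_longest to cope with lists of unequal length are choices made for this example, not something the answer above prescribes.

```python
from itertools import product, zip_longest

def mix_lists(*lists):
    # Column i holds the candidates for index i; zip_longest pads short
    # lists with None, which is filtered out again below.
    columns = zip_longest(*lists)
    choices = [[x for x in col if x is not None] for col in columns]
    # Pick one candidate per index, in every possible way
    return ["".join(combo) for combo in product(*choices)]

print(mix_lists(['a', 'b'], ['c', 'd']))       # ['ab', 'ad', 'cb', 'cd']
print(mix_lists(['a', 'b', 'x'], ['c', 'd']))  # ['abx', 'adx', 'cbx', 'cdx']
```

With plain zip instead of zip_longest, the result would be truncated to the length of the shortest list.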
You have to convert
['a','b'], ['c','d']
to
['a','c'], ['b','d']
(first list has values for index 1, second list has values for index 2)
and then use itertools.product()
import itertools
a, b = zip(['a','b'], ['c','d'])
# print(a, b)  # ('a', 'c') ('b', 'd')
data = itertools.product(a, b)
data = ["".join(item) for item in data]
print(data)
Result
['ab', 'ad', 'cb', 'cd']

partial intersection - multiple groups

I am not sure how to approach my problem, thus I haven't been able to see if it already exists (apologies in advance)
Group Item
A 1
A 2
A 3
B 1
B 3
C 1
D 2
D 3
I want to know all combinations of groups that share at least X items (2 in this example), and I want to know which items they share.
RESULT:
A-B: 2 (item 1 and item 3)
A-D: 2 (item 2 and item 3)
The list of groups and items is really long and the maximum number of item matches across groups is probably not more than 3-5.
NB More than 2 groups can have shared items - e.g. A-B-E: 3
So it's not sufficient to only compare two groups at a time. I need to compare all combination of groups.
My thoughts
First round: one pile of all groups - are at least two values shared amongst all?
Second round: All-1 group (all combinations)
Third round: All-2 groups (all combinations)
Until I reach the comparison between only two groups (all combinations).
However this seems super heavy performance-wise!! And I have no idea of how to do this.
What are your thoughts?
Thanks!
Unless you have additional information to restrict the search, I would just process all subsets (having size >= 2) of the set of unique groups.
For each subset, I would search the items belonging to all members of the set:
from itertools import chain, combinations

a = df['Group'].unique()
for cols in chain(*(combinations(a, i) for i in range(2, len(a) + 1))):
    vals = df['Item'].unique()
    for col in cols:
        vals = df.loc[(df.Group==col)&(df.Item.isin(vals)), 'Item'].unique()
    if len(vals) > 0: print(cols, vals)
it gives:
('A', 'B') [1 3]
('A', 'C') [1]
('A', 'D') [2 3]
('B', 'C') [1]
('B', 'D') [3]
('A', 'B', 'C') [1]
('A', 'B', 'D') [3]
This is how I would approach the problem, it may not be the most efficient way to deal with it, but it has the merit to be clear.
List for each group, all items possessed by the group.
Then for each pair of group, list all shared items (for instance, for all items of group A, check if it is an item of group B).
Check if the number of shared items is higher than your threshold X.
It's not an off-the-shelf function, but it should be rather easy (or at least a good exercise) to implement.
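A minimal sketch of those three steps in plain Python, assuming the data is available as (group, item) pairs and that "higher than the threshold" means at least X shared items, as in the expected result of the question. The names rows and shared_items are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

rows = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 3),
        ("C", 1), ("D", 2), ("D", 3)]

def shared_items(rows, threshold=2):
    # Step 1: list, for each group, all items possessed by the group
    items_by_group = defaultdict(set)
    for group, item in rows:
        items_by_group[group].add(item)
    result = {}
    # Step 2: for each pair of groups, find the shared items
    for g1, g2 in combinations(sorted(items_by_group), 2):
        common = items_by_group[g1] & items_by_group[g2]
        # Step 3: keep the pair only if enough items are shared
        if len(common) >= threshold:
            result[(g1, g2)] = sorted(common)
    return result

print(shared_items(rows))  # {('A', 'B'): [1, 3], ('A', 'D'): [2, 3]}
```

This only compares two groups at a time; larger combinations would need combinations(..., 3) and up, intersecting all the sets involved.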
Have fun !
Here is a new solution that works with all combinations.
Steps:
get a dataframe "grouped" that lists, for each item, all the groups the item is in
from each row of grouped, get all possible combinations of groups that have some common items
for each combination, count the common items in "grouped"; if there are 2 or more, add the combination to a dictionary
Note: It only loops through group combinations that have common items, so if you have lots of groups, it already filters out the large share of possible combinations that have no common items.
import pandas as pd
from itertools import combinations

d = {
    "Group": "A,A,A,B,B,C,D,D".split(","),
    "Item": [1, 2, 3, 1, 3, 1, 2, 3],
}
df = pd.DataFrame(d)

# For each item, the list of groups that contain it
grouped = df.groupby("Item")["Group"].apply(list)

# Every combination of 2 or more groups that shares at least one item.
# A set removes duplicates and copes with combinations of different sizes.
all_combinations_with_common = {
    comb
    for item in grouped
    for i in range(2, len(item) + 1)
    for comb in combinations(sorted(item), i)
}

commons = {}
REPEAT_COUNT = 2
for comb in sorted(all_combinations_with_common):
    # True for each item whose group list contains every group in comb
    items = grouped.apply(lambda x: set(comb).issubset(x))
    if items.sum() >= REPEAT_COUNT:
        commons["-".join(comb)] = grouped[items].index.values
print(commons)
output
{'A-B': array([1, 3]), 'A-D': array([2, 3])}

Pythonic way to generate a pseudo ordered pair list from an iterable (e.g. a list or set)

Given an iterable consisting of a finite set of elements:
(a, b, c, d)
as an example
What would be a Pythonic way to generate the following (pseudo) ordered pair from the above iterable:
ab
ac
ad
bc
bd
cd
A trivial way would be to use for loops, but I'm wondering if there is a pythonic way of generating this list from the iterable above ?
Try using combinations.
import itertools
combinations = itertools.combinations('abcd', n)
will give you an iterator, you can loop over it or convert it into a list with list(combinations)
In order to only include pairs as in your example, you can pass 2 as the argument:
combinations = itertools.combinations('abcd', 2)
>>> print(list(combinations))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
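If the joined form from the question (ab, ac, ...) is wanted rather than tuples, a minimal sketch, assuming the elements are strings (the function name ordered_pairs is only for illustration):

```python
from itertools import combinations

def ordered_pairs(items):
    # combinations keeps the input order, so each pair is (earlier, later)
    return ["".join(pair) for pair in combinations(items, 2)]

print(ordered_pairs("abcd"))  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
```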
You can accomplish this in a list comprehension.
data = [1, 2, 3, 4]
output = [(data[n], other) for n in range(len(data)) for other in data[n + 1:]]
print(repr(output))
Pulling the comprehension apart, it pairs each member of the iterable with every member that comes after it. data[n + 1:] tells Python to take all members after the nth.
Use list comprehensions - they seem to be considered pythonic.
import operator
stuff = ('a','b','c','d')
Obtain the n-digit binary numbers, in string format, where only two of the digits are one and n is the length of the items.
n = len(stuff)
s = '{{:0{}b}}'.format(n)
z = [s.format(x) for x in range(2**n) if s.format(x).count('1') ==2]
Find the indices of the ones for each combination.
y = [[i for i, c in enumerate(combo) if c == '1'] for combo in z]
Use the indices to select the items, then sort.
x = [''.join(operator.itemgetter(*indices)(stuff)) for indices in y]
x.sort()

Most efficient way to list all oriented cycles given n elements in Python

I have a list of elements which can be quite big (100+ elements): elements = [a, b, c, d, e, f, g...].
and I need to build the list of all possible directed cycles, considering that the sequences
[a,b,c,d,e], [b,c,d,e,a], [c,d,e,a,b], [d,e,a,b,c], [e,a,b,c,d] are considered identical since they are different representations of the same directed cycle. Only the starting point differs.
Also, since direction matters, [a,b,c,d,e] and [e,d,c,b,a] are different.
I am looking for all the oriented cycles of all lengths, from 2 to len(elements). What's the most pythonic way to do it leveraging the optimization of built-in permutations, combinations, etc ?.
Maybe I'm missing something, but this seems straightforward to me:
def gen_oriented_cycles(xs):
    from itertools import combinations, permutations
    for length in range(2, len(xs) + 1):
        for pieces in combinations(xs, length):
            first = pieces[0],  # 1-tuple
            for rest in permutations(pieces[1:]):
                yield first + rest
Then, e.g.,
for c in gen_oriented_cycles('abcd'):
    print(c)
displays:
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')
('a', 'b', 'c')
('a', 'c', 'b')
('a', 'b', 'd')
('a', 'd', 'b')
('a', 'c', 'd')
('a', 'd', 'c')
('b', 'c', 'd')
('b', 'd', 'c')
('a', 'b', 'c', 'd')
('a', 'b', 'd', 'c')
('a', 'c', 'b', 'd')
('a', 'c', 'd', 'b')
('a', 'd', 'b', 'c')
('a', 'd', 'c', 'b')
Is that missing some essential property you're looking for?
EDIT
I thought it might be missing this part of your criteria:
Also, since direction matters, [a,b,c,d,e] and [e,d,c,b,a] are different.
but on second thought I think it meets that requirement, since [e,d,c,b,a] is the same as [a,e,d,c,b] to you.
Is there any good reason to have a canonical representation in memory of this? It's going to be huge, and possibly whatever use case you have for this may have a better way of dealing with it.
It looks like for your source material, you would use any combination of X elements, not necessarily even homogeneous ones? (i.e. you would have (a,e,g,x,f) etc.). Then, I would do this as a nested loop. The outer one would select by length, and select subsets of the entire list to use. The inner one would construct combinations of the subset, and then throw out matching items. It's going to be slow no matter how you do it, but I would use a dictionary with a frozenset as the key (of the items, for immutability and fast lookup), and the items to be a list of already-detected cycles. It's going to be slow/long-running no matter how you do it, but this is one way.
First, you need a way to determine if two tuples (or lists) represent the same cycle. You can do that like this:
def match_cycle(test_cycle, ref_cycle):
    # True if test_cycle is a rotation of ref_cycle
    try:
        refi = ref_cycle.index(test_cycle[0])
        partlen = len(ref_cycle) - refi
        # != also works for non-numeric elements, unlike subtraction
        return not (any(test_cycle[i] != ref_cycle[i+refi] for i in range(partlen)) or
                    any(test_cycle[i+partlen] != ref_cycle[i] for i in range(refi)))
    except (ValueError, IndexError):
        return False
Then, the rest.
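from collections import defaultdict
from itertools import combinations, permutations

def all_cycles(elements):
    # Note: this range stops at len(elements) - 1, so cycles that use every
    # element are not generated; use len(elements) + 1 to include them.
    for tuple_length in range(2, len(elements)):
        testdict = defaultdict(list)
        for outer in combinations(elements, tuple_length):
            for inner in permutations(outer):
                testset = frozenset(inner)
                if not any(match_cycle(inner, x) for x in testdict[testset]):
                    testdict[testset].append(inner)
                    yield inner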
This produced 60 items for elements of length 5, which seems about right and looked OK from inspection. Note that this is going to be exponential though.... length(5) took 1.34 ms/loop. length(6) took 22.1 ms. length(7) took 703 ms, length(8) took 33.5 s. length(100) might finish before you retire, but I wouldn't bet on it.
There might be a better way, and probably is, but in general the number of subsets of 100 elements is pretty large, even when reduced somewhat for cycles. So this is probably not the right way to approach whatever problem you are trying to solve.
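To put a number on that growth: a directed cycle of length k over n distinct elements can be chosen in C(n, k) * (k - 1)! ways (pick the k elements, fix one of them as the starting point, then order the remaining k - 1). A minimal sketch that sums this over the lengths; the function name and the max_length parameter are made up for this example.

```python
from math import comb, factorial

def count_oriented_cycles(n, max_length=None):
    # Sum C(n, k) * (k - 1)! over cycle lengths k = 2 .. max_length
    if max_length is None:
        max_length = n
    return sum(comb(n, k) * factorial(k - 1) for k in range(2, max_length + 1))

print(count_oriented_cycles(5, max_length=4))  # 60
print(count_oriented_cycles(5))                # 84
# Number of decimal digits in the count for 100 elements
print(len(str(count_oriented_cycles(100))))
```

The value 60 corresponds to lengths 2 through 4 over 5 elements; including the full-length cycles gives 84. For 100 elements the count has more than 150 digits, so enumerating every cycle is out of reach no matter how the generator is written.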
This may work:
import itertools
import collections

class Cycle(object):
    def __init__(self, cycle):
        self.all_possible = self.get_all_possible(cycle)
        self.canonical = self.get_canonical(self.all_possible)

    def __eq__(self, other):
        return self.canonical == other.canonical

    def __hash__(self):
        return hash(self.canonical)

    def get_all_possible(self, cycle):
        # Collect every rotation of the cycle
        output = []
        cycle = collections.deque(cycle)
        for i in range(len(cycle)):
            cycle.rotate(1)
            output.append(list(cycle))
        return output

    def get_canonical(self, cycles):
        # Pick one rotation as the representative of the whole cycle
        return min(map(tuple, cycles), key=lambda item: hash(item))

    def __repr__(self):
        return 'Cycle({0})'.format(self.canonical)

def list_cycles(elements):
    output = set()
    for i in range(2, len(elements) + 1):
        output.update(set(map(Cycle, itertools.permutations(elements, i))))
    return list(output)

def display(seq):
    for cycle in seq:
        print(cycle.canonical)
        print('\n'.join(' ' + str(item) for item in cycle.all_possible))

def main():
    elements = 'abcdefghijkl'
    final = list_cycles(elements)
    display(final)

if __name__ == '__main__':
    main()
It creates a class to represent any given cycle, which will be hashed and checked for equality against a canonical representation of the cycle. This lets a Cycle object be placed in a set, which will automatically filter out any duplicates. Unfortunately, it's not going to be highly efficient, since it generates every single possible permutation first.
This should give you the right answer with cycles with length 2 to len(elements). Might not be the fastest way to do it though. I used qarma's hint of rotating it to always start with the smallest element.
from itertools import permutations

def rotate_min(l):
    '''Rotates the list so that the smallest element comes first '''
    minIndex = l.index(min(l))
    rotatedTuple = l[minIndex:] + l[:minIndex]
    return rotatedTuple

def getCycles(elements):
    elementIndicies = tuple(range(len(elements)))  # a tuple is hashable, so it works with set
    cyclesIndices = set()
    cycles = []
    for length in range(2, len(elements)+1):
        allPermutation = permutations(elementIndicies, length)
        for perm in allPermutation:
            rotated_perm = rotate_min(perm)
            if rotated_perm not in cyclesIndices:
                # If the cycle of indices is not in the set, add it.
                cyclesIndices.add(rotated_perm)
                # Convert indices to the respective elements and append
                cycles.append([elements[i] for i in rotated_perm])
    return cycles

Generate combinations of elements from multiple lists

I'm making a function that takes a variable number of lists as input (i.e., an arbitrary argument list).
I need to compare each element from each list to each element of all other lists, but I couldn't find any way to approach this.
Depending on your goal, you can make use of some of the itertools utilities. For example, you can use itertools.product on *args:
from itertools import product
for comb in product(*args):
    if len(set(comb)) < len(comb):
        # there are equal values....
        pass
But currently it's not very clear from your question what you want to achieve. If I didn't understand you correctly, you can try to state the question in a more specific way.
I think @LevLeitsky's answer is the best way to do a loop over the items from your variable number of lists. However, if the purpose of the loop is just to find common elements between pairs of items from the lists, I'd do it a bit differently.
Here's an approach that finds the common elements between each pair of lists:
import itertools

def func(*args):
    sets = [set(l) for l in args]
    for a, b in itertools.combinations(sets, 2):
        common = a & b  # set intersection
        # do stuff with the set of common elements...
I'm not sure what you need to do with the common elements, so I'll leave it there.
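As one possible way to finish it, here is a minimal sketch that collects the common elements per pair of lists, keyed by the positions of the two lists in the argument list. The name common_by_pair and the choice of a dictionary as the return value are assumptions made for this example.

```python
from itertools import combinations

def common_by_pair(*lists):
    sets = [set(l) for l in lists]
    # Map each pair of list positions (i, j) to the elements both lists share
    return {
        (i, j): sets[i] & sets[j]
        for i, j in combinations(range(len(sets)), 2)
    }

result = common_by_pair([1, 2, 3], [2, 3, 4], [3, 5])
print(result[(0, 1)])  # {2, 3}
print(result[(1, 2)])  # {3}
```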
The itertools module provides a lot of useful tools just for such tasks. You can adapt the following example to your task by integrating it into your specific comparison logic.
Note that the following assumes a commutative function. That is, about half of the tuples are omitted for reasons of symmetry.
Example:
import itertools

def generate_pairs(*args):
    # assuming function is commutative
    for i, l in enumerate(args, 1):
        for x, y in itertools.product(l, itertools.chain(*args[i:])):
            yield (x, y)

# you can use lists instead of strings as well
for x, y in generate_pairs("ab", "cd", "ef"):
    print((x, y))

# e.g., apply your comparison logic
print(any(x == y for x, y in generate_pairs("ab", "cd", "ef")))
print(all(x != y for x, y in generate_pairs("ab", "cd", "ef")))
Output:
$ python test.py
('a', 'c')
('a', 'd')
('a', 'e')
('a', 'f')
('b', 'c')
('b', 'd')
('b', 'e')
('b', 'f')
('c', 'e')
('c', 'f')
('d', 'e')
('d', 'f')
False
True
If you want the arguments as a dictionary:
def kw(**kwargs):
    for key, value in kwargs.items():
        print(key, value)
If you want all the arguments as a list:
def arg(*args):
    for item in args:
        print(item)
You can use both:
def using_both(*args, **kwargs):
    kw(**kwargs)  # unpack the dict back into keyword arguments
    arg(*args)    # unpack the tuple back into positional arguments
Call it like this:
using_both([1,2,3,4,5], a=32, b=55)
