I have two lists of items:
A = 'mno'
B = 'xyz'
I want to generate all permutations, without replacement, simulating replacing all combinations of items in A with items in B, without repetition. e.g.
>>> do_my_permutation(A, B)
['mno', 'xno', 'mxo', 'mnx', 'xyo', 'mxy', 'xyz', 'zno', 'mzo', 'mnz', ...]
This is straightforward enough for me to write from scratch, but I'm aware of Python's standard itertools module, which I believe may already implement this. However, I'm having trouble identifying the function that implements this exact behavior. Is there a function in this module I can use to accomplish this?
Is this what you need:
["".join(elem) for elem in itertools.permutations(A+B, 3)]
and replace permutations with combinations if you want all orderings of the same three letters to be collapsed down into a single item (e.g. so that 'mxo' and 'mox' do not each individually appear in the output).
You're looking for itertools.permutations.
From the docs:
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values.
To have only unique, lexically sorted, permutations, you can use this code:
import itertools
A = 'mno'
B = 'xyz'
s= {"".join(sorted(elem)) for elem in itertools.permutations(A+B, 3)}
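As a rough cross-check of the set comprehension above: sorting each permutation and deduplicating should give the same strings as itertools.combinations applied to the sorted letters, without first generating all 120 ordered triples. This is only a sketch; the names pool, via_permutations and via_combinations are made up for illustration.

```python
from itertools import combinations, permutations

A = 'mno'
B = 'xyz'
pool = sorted(A + B)  # six distinct letters, already in lexical order

# Approach from the answer: build every ordered triple, sort it, deduplicate.
via_permutations = {"".join(sorted(p)) for p in permutations(pool, 3)}

# combinations() emits each 3-letter subset once, in input order.
via_combinations = {"".join(c) for c in combinations(pool, 3)}

print(via_permutations == via_combinations)
```

If the order of letters inside each result matters to you, stay with permutations; combinations only fits when 'mxo' and 'mox' should count as the same item.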
Is there a default python function to be able to separate group of numbers without using a conventional loop?
inputArray=["slide_0000_00.jpg", "slide_0000_01.jpg","slide_0000_02.jpg","slide_0001_01.jpg","slide_0001_02.jpg","slide_0002_01.jpg"]
resultArray=[["slide_0000_01.jpg", "slide_0000_02.jpg", "slide_0000_03.jpg"],["slide_0001_01.jpg", "slide_0001_02.jpg"], ["slide_0002_01.jpg"]]
use itertools.groupby to group consecutive items by middle part:
inputArray=["slide_0000_00.jpg",
"slide_0000_01.jpg",
"slide_0000_02.jpg",
"slide_0001_01.jpg",
"slide_0001_02.jpg",
"slide_0002_01.jpg"]
import itertools
result = [list(g) for _,g in itertools.groupby(inputArray,key = lambda x:x.split("_")[1])]
which gives:
>>> result
[['slide_0000_00.jpg', 'slide_0000_01.jpg', 'slide_0000_02.jpg'],
['slide_0001_01.jpg', 'slide_0001_02.jpg'],
['slide_0002_01.jpg']]
Note that groupby only groups consecutive items, so if items with the same key are not adjacent, the grouping won't work (unless you sort the list first; a simple sort would work here, but it adds O(n log n) cost). A classic alternative in that case is to use collections.defaultdict(list):
import collections
d = collections.defaultdict(list)
for x in inputArray:
    d[x.split("_")[1]].append(x)
result = list(d.values())
The result is identical (order can vary depending on the Python version and whether dictionaries preserve insertion order; CPython 3.6 preserves it as an implementation detail, and the language guarantees it from 3.7).
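For completeness, here is a minimal sketch of the sort-first variant of groupby for input whose groups are not contiguous. The function name group_by_middle and the sample list are assumptions for illustration. Because sorted() is stable, items keep their original relative order inside each group.

```python
from itertools import groupby

def middle_part(name):
    # "slide_0001_02.jpg" -> "0001"
    return name.split("_")[1]

def group_by_middle(names):
    # Sort by the same key that groupby uses, so equal keys become adjacent.
    ordered = sorted(names, key=middle_part)
    return [list(group) for _, group in groupby(ordered, key=middle_part)]

shuffled = ["slide_0001_01.jpg", "slide_0000_00.jpg",
            "slide_0001_02.jpg", "slide_0000_01.jpg"]
print(group_by_middle(shuffled))
```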
I am given a sequence of letters and have to produce all the N-length anagrams of the sequence given, where N is the length of the sequence.
I am following a kinda naive approach in python, where I am taking all the permutations in order to achieve that. I have found some similar threads like this one but I would prefer a math-oriented approach in Python. So what would be a more performant alternative to permutations? Is there anything particularly wrong in my attempt below?
from itertools import permutations
def find_all_anagrams(word):
    pp = permutations(word)
    perm_set = set()
    for i in pp:
        perm_set.add(i)
    ll = [list(i) for i in perm_set]
    ll.sort()
    print(ll)
If there are lots of repeated letters, the key will be to produce each anagram only once instead of producing all possible permutations and eliminating duplicates.
Here's one possible algorithm which only produces each anagram once:
from collections import Counter
def perm(unplaced, prefix):
    if unplaced:
        for element in unplaced:
            yield from perm(unplaced - Counter(element), prefix + element)
    else:
        yield prefix
def permutations(iterable):
    yield from perm(Counter(iterable), "")
That's actually not much different from the classic recursion to produce all permutations; the only difference is that it uses a collections.Counter (a multiset) to hold the as-yet-unplaced elements instead of just using a list.
The number of Counter objects produced in the course of the iteration is certainly excessive, and there is almost certainly a faster way of writing that; I chose this version for its simplicity and (hopefully) its clarity.
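One way to avoid creating a new Counter at every step is to keep a single Counter and undo each change after the recursive call returns (plain backtracking). The sketch below assumes the input is a string; the name unique_permutations is hypothetical, and I have not benchmarked it against the version above.

```python
from collections import Counter

def unique_permutations(word):
    counts = Counter(word)   # the one shared multiset of unplaced letters
    total = len(word)
    prefix = []              # letters placed so far

    def backtrack():
        if len(prefix) == total:
            yield "".join(prefix)
            return
        for letter in counts:            # each distinct letter is tried once per position
            if counts[letter] > 0:
                counts[letter] -= 1      # place the letter
                prefix.append(letter)
                yield from backtrack()
                prefix.pop()             # undo the placement
                counts[letter] += 1

    yield from backtrack()

print(list(unique_permutations("aab")))
```

Only the counts change during iteration, never the set of keys, so looping over the Counter while updating it is safe here.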
This is very slow for long words with many repeated characters. Slow compared to theoretical maximum performance, that is. For example, permutations("mississippi") will produce a much longer list than necessary. It will have a length of 39916800, but the set has a size of only 34650.
>>> len(list(permutations("mississippi")))
39916800
>>> len(set(permutations("mississippi")))
34650
So the big flaw with your method is that you generate ALL anagrams and then remove the duplicates. Use a method that only generates the unique anagrams.
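The two figures above can be checked without generating anything: the number of distinct anagrams of a word is n! divided by the factorial of each letter's count. A small sketch (the function name count_distinct_anagrams is made up for illustration):

```python
from collections import Counter
from math import factorial

def count_distinct_anagrams(word):
    # n! orderings in total, divided by k! for every letter that occurs k times.
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

print(factorial(len("mississippi")))           # all orderings, duplicates included
print(count_distinct_anagrams("mississippi"))  # distinct anagrams only
```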
EDIT:
Here is some working, but extremely ugly and possibly buggy code. I'm making it nicer as you're reading this. It does give 34650 for mississippi, so I assume there aren't any major bugs. Warning again. UGLY!
# Returns a dictionary with letter counts.
# get_letter_list("mississippi") returns
# {'i':4, 'm':1, 'p': 2, 's':4}
def get_letter_list(word):
    w = sorted(word)
    dd = {}
    dd[w[0]]=1
    for l in range(1,len(w)):
        if w[l]==w[l-1]:
            # Same letter as the previous one: bump its count
            dd[w[l]]=dd[w[l]]+1
        else:
            # First occurrence of a new letter
            dd[w[l]]=1
    return dd
def sum_dict(d):
    s=0
    for x in d:
        s=s+d[x]
    return s
# Recursively create the anagrams. It takes a letter list
# from the above function as an argument.
def create_anagrams(dd):
    if sum_dict(dd)==1: # If there's only one letter left
        for l in dd:
            return l # Ugly hack, because I'm not used to dicts
    a = []
    for l in dd:
        if dd[l] != 0:
            newdd=dict(dd)
            newdd[l]=newdd[l]-1
            if newdd[l]==0:
                newdd.pop(l)
            newl=create_anagrams(newdd)
            for x in newl:
                a.append(str(l)+str(x))
    return a
>>> print (len(create_anagrams(get_letter_list("mississippi"))))
34650
It works like this: For every unique letter l, create all unique permutations with one less occurrence of the letter l, and then prepend l to all these permutations.
For "mississippi", this is way faster than set(permutations(word)) and it's far from optimally written. For instance, dictionaries are quite slow and there's probably lots of things to improve in this code, but it shows that the algorithm itself is much faster than your approach.
Maybe I am missing something, but why don't you just do this:
from itertools import permutations
def find_all_anagrams(word):
    return sorted(set(permutations(word)))
You could simplify to:
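from itertools import permutations
def find_all_anagrams(word):
    # Caution: set() drops repeated letters, so this matches the original
    # only for words with no duplicate letters.
    word = set(''.join(sorted(word)))
    return list(permutations(word))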
In the doc for permutations the code is detailed and it seems already optimized.
I don't know Python, but I want to try to help: there are probably many more performant algorithms, but I've thought of this one. It's completely recursive and it should cover all the cases of a permutation. Let me start with a basic example:
permutation of ABC
Now, this algorithm works in this way: Length times, you rotate the letters by one position, with the letter that falls off one end wrapping around to the other (you could easily do this with a queue).
Back to the example, we will have:
ABC
BCA
CAB
Now you repeat the first (and only) step with the substring built from the second letter to the last one.
Unfortunately, with this algorithm you cannot consider permutation with repetition.
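Since the answer above gives no code, here is one possible reading of the rotation idea as a Python sketch. The function name rotation_permutations is made up, and the rotation direction follows the ABC, BCA, CAB example. As noted, repeated letters produce duplicate results.

```python
def rotation_permutations(s):
    # A string of length 0 or 1 has exactly one arrangement.
    if len(s) <= 1:
        yield s
        return
    for i in range(len(s)):
        rotated = s[i:] + s[:i]   # ABC -> ABC, BCA, CAB
        head, rest = rotated[0], rotated[1:]
        # Repeat the same step on the substring after the first letter.
        for tail in rotation_permutations(rest):
            yield head + tail

print(list(rotation_permutations("ABC")))
```

Each rotation puts a different letter in front, and the recursion arranges the remaining letters, so every ordering of distinct letters appears exactly once.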
The following examples give the same result:
A.
product = []
for a in "abcd":
    for b in "xy":
        product.append((a,b))
B.
from itertools import product
list(product("abcd","xy"))
How can I calculate the cartesian product like in example A when I don't know the number of arguments n?
REASON I'm asking this:
Consider this piece of code:
allocations = list(product(*strategies.values()))
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
The values of the strategies dictionary are list of tuples, help is an auxiliary variable (a list with the same length of every alloc) and coalitions is another dictionary that assigns to the tuples in help some numeric value.
Since strategies values are sorted, I know that the if statement won't be true anymore after a certain alloc. Since allocations is a pretty big list, I would avoid tons of comparisons and tons of sums if I could use the example algorithm A.
You can do:
items = ["abcd","xy"]
from itertools import product
list(product(*items))
The list items can contain an arbitrary number of strings, and product will provide you with the Cartesian product of those strings.
Note that you don't have to turn it into a list - you can iterate over it and stop when you no longer wish to continue:
for item in product(*items):
    print(item)
    if condition:
        break
If you just want to abort the allocations after you hit a certain condition, and you want to avoid generating all the elements from the cartesian product for those, then simply don’t make a list of all combinations in the first place.
itertools.product is lazy, which means that it will only generate a single value of the cartesian product at a time. So you never need to generate all elements, and you also never need to compare the elements then. Just don’t call list() on the result as that would iterate the whole sequence and store all possible combinations in memory:
allocations = product(*strategies.values())
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
    # check whether you can stop looking at more values from the cartesian product
    if someCondition(alloc):
        break
It’s just important to note how itertools.product generates the values, what pattern it follows. It’s basically equivalent to the following:
for a in firstIterable:
    for b in secondIterable:
        for c in thirdIterable:
            …
                for n in nthIterable:
                    yield (a, b, c, …, n)
So you get an increasing pattern from the left side of your iterables. So make sure that you order the iterables in a way that you can correctly specify a break condition.
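To make the ordering concrete, here is a tiny sketch with made-up inputs. It shows that the rightmost iterable varies fastest and that breaking out of the loop skips the rest of the product. The helper name take_until and the stop condition are assumptions for illustration only.

```python
from itertools import product

def take_until(iterables, should_stop):
    """Collect tuples from product(*iterables) until should_stop(tuple) is true."""
    kept = []
    for combo in product(*iterables):
        if should_stop(combo):
            break            # later tuples are never consumed from the product
        kept.append(combo)
    return kept

# The rightmost iterable cycles fastest, like the innermost for loop.
order = list(product("ab", "xy"))
print(order)

# Stop as soon as the first position reaches "b".
print(take_until(["ab", "xy"], lambda combo: combo[0] == "b"))
```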
I am trying to create a program which outputs all permutations of a string of length n whilst avoiding a defined substring, of length k. For example:
Derive all possible strings, up to a length of 5 characters, that can be generated from an initial empty set, which can either go to A or B, but the string cannot contain the substring "AAB" which is not allowed.
i.e. base case of [""] is the empty set.
The dictionary would be - A:{A}, B:{A,B}
From the empty set we can go to A, and we can go to B. We can not go to a B after an A but we can go to an A after a B. And both A and B can access themselves
example output: a,b,aa,bb,ba,aaa,bbb,baa,bba ... etc
How would I go about prompting a user to define a substring to avoid, and from that generate a dictionary which abides by these rules?
Any help or clarification would be greatly received.
Regards,
rkhad
The itertools module has a useful method called permutations():
(from http://docs.python.org/library/itertools.html#itertools.permutations)
itertools.permutations(iterable[, r])
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the
iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input
iterable is sorted, the permutation tuples will be produced in sorted
order.
List comprehensions provide an easy way to filter generated permutations like this, but beware that if you are storing permutations of a large string that you will quickly get a very large list. You may want to therefore use a set to whittle down your list to non-duplicates. Also, you may find the function sorted to be useful if you intend to iterate through your "paths" in lexicographic order. Lastly, the in operator, when applied to strings, checks for a substring (x in y checks if x is a substring of y).
>>> from itertools import permutations
>>> perms = [''.join(p) for p in permutations('AAAABBBB', 4)]
>>> len(perms)
1680
>>> len(set(perms))
16
>>> filtered = [p for p in sorted(set(perms)) if 'AB' not in p]
>>> filtered
['AAAA', 'BAAA', 'BBAA', 'BBBA', 'BBBB']
I'm working on my dissertation right now too, in the area of Formal Languages. The concept of substring membership can be represented by a very simple regular grammar which corresponds to a deterministic finite automaton. To jog your memory:
http://en.wikipedia.org/wiki/Regular_grammar
http://en.wikipedia.org/wiki/Finite-state_machine
When you look into these you will find that you need to somehow keep track of the current "state" of your computation if you want it to have different "dictionaries" at different phases. I encourage you to read the wikipedia articles, and ask me some follow-up questions as I'd be happy to help you work through this.
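As a starting point that is simpler than a full automaton, the sketch below grows strings one character at a time and drops any candidate that would end with the forbidden substring. Because every kept prefix is already clean, checking the ending is enough. The function name strings_avoiding and its parameters are hypothetical, and the forbidden substring is assumed to be non-empty.

```python
def strings_avoiding(alphabet, forbidden, max_len):
    """Yield every string over alphabet, of length 1..max_len, without forbidden in it."""
    frontier = [""]                      # all valid strings of the current length
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for ch in alphabet:
                candidate = prefix + ch
                # prefix is clean, so forbidden can only appear at the very end
                if not candidate.endswith(forbidden):
                    next_frontier.append(candidate)
                    yield candidate
        frontier = next_frontier

print(list(strings_avoiding("AB", "AAB", 3)))
```

The forbidden substring could come from input() instead of a literal. A finite automaton would replace the endswith check with a state lookup, which matters mainly for long patterns.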
The following code creates a multi dimensional list (not sure if that's the Pythonic way of saying it. PHP guy here).
patterns.append(list(itertools.permutations('1234567',7)))
the value of patterns becomes:
([
[1,2,3,4,5,6,7],
[1,2,3,4,5,7,6], ...
])
What I want is for the result to be like this:
([1,2,3,4,5,6,7], [1,2,3,4,5,7,6]...)
If i try doing:
patterns = list(itertools.permutations('1234567',7))
the result is a list of individual numbers
123445671234576
What am I missing?
Thanks,
Use extend() instead of append().
patterns.extend(itertools.permutations('1234567',7))
This also makes list() redundant because extend() works on iterables.
This assumes you are ok with the permutations themselves being tuples. Your question is confusing because the notation doesn't correspond with what you wrote in words.
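A small side-by-side sketch of the difference, using '123' instead of '1234567' to keep the output short (the variable names nested and flat are made up for illustration):

```python
from itertools import permutations

nested = []
nested.append(list(permutations('123', 3)))  # adds ONE element: a list of 6 tuples

flat = []
flat.extend(permutations('123', 3))          # adds SIX elements, one per permutation

print(len(nested), len(nested[0]))
print(len(flat), flat[0])
```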
If you need to get
([1,2,3,4,5,6,7], [1,2,3,4,5,7,6]...)
then you can use:
from itertools import permutations
patterns = tuple(list(int(y) for y in x) for x in permutations('1234567',7))
OR you can use xrange (range in Python 3) instead of '1234567' if you need to get numbers:
patterns = tuple(list(x) for x in permutations(xrange(1,8),7))
You can get a tuple of lists with
tuple(list(p) for p in itertools.permutations('1234567', 7))
If you want integers instead of one-element strings, then an easy and general way to do that is
digits = [int(digit) for digit in '1234567']
tuple(list(p) for p in itertools.permutations(digits, 7))