Getting permutations in Python, itertools

I want to get all the 3 letter permutations possible from every letter in the alphabet using itertools. This comes back blank:
import itertools
def permutations(ABCDEFGHIJKLMNOPQRSTUVWXYZ, r=3):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)
What am I doing wrong?

You are a bit mixed up: that is just code explaining what permutations does. itertools is actually written in C; the Python equivalent is only given to show how it works.
>>> from itertools import permutations
>>> from string import ascii_uppercase
>>> for x in permutations(ascii_uppercase, r=3):
...     print(x)
...
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'B', 'E')
('A', 'B', 'F')
.....
That should work fine

The code in the itertools.permutations documentation explains how the function is implemented, not how to use it. You want to do this:
perms = itertools.permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ', r=3)
You can print them all by converting the iterator to a list (print(list(perms))), or simply iterate over them in a for loop if you want to do something else with them, e.g.:
for perm in perms:
    ...
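As a minimal sketch of what the question seems to be after, the tuples can be joined into 3-letter strings. The variable name words is only illustrative; with 26 letters there are 26 * 25 * 24 = 15600 results, since permutations never reuses a letter within one result.

```python
from itertools import permutations
from string import ascii_uppercase

# Join each 3-tuple of distinct letters into a string.
words = ["".join(p) for p in permutations(ascii_uppercase, 3)]

print(len(words))  # 15600
print(words[:3])   # ['ABC', 'ABD', 'ABE']
```

If repeated letters such as 'AAA' should be included as well, itertools.product(ascii_uppercase, repeat=3) is the function to reach for instead.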

Related

Python package for converting finite regex to a text array? [duplicate]

Suppose I have the following string:
trend = '(A|B|C)_STRING'
I want to expand this to:
A_STRING
B_STRING
C_STRING
The OR condition can be anywhere in the string. i.e STRING_(A|B)_STRING_(C|D)
would expand to
STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D
I also want to cover the case of an empty conditional:
(|A_)STRING would expand to:
A_STRING
STRING
Here's what I've tried so far:
def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []
    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)
But this is obviously not working.
Is there any easy way to do this using regex?

Here's a pretty clean way. You'll have fun figuring out how it works :-)
def expander(s):
    import re
    from itertools import product
    # Split on parenthesised groups; the capture group keeps each group's contents.
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    # Every piece becomes a list of alternatives (literal text has just one).
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)
Then:
for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print("  ", t)
displays:
(A|B|C)_STRING ->
   A_STRING
   B_STRING
   C_STRING
(|A_)STRING ->
   STRING
   A_STRING
STRING_(A|B)_STRING_(C|D) ->
   STRING_A_STRING_C
   STRING_A_STRING_D
   STRING_B_STRING_C
   STRING_B_STRING_D
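For readers who would rather not puzzle it out, here is a small sketch that prints the intermediate values for one input. It relies on the documented behaviour that re.split keeps the text matched by a capturing group in the result list; the variable names are only illustrative.

```python
import re
from itertools import product

s = "STRING_(A|B)_STRING_(C|D)"

# re.split keeps the captured group contents between the literal parts.
pieces = re.split(r"\(([^)]*)\)", s)
print(pieces)   # ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Literal parts contain no "|", so they become one-element lists.
choices = [piece.split("|") for piece in pieces]
print(choices)  # [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# product picks one alternative per position; joining rebuilds each string.
expanded = ["".join(p) for p in product(*choices)]
print(expanded)
```

One consequence of this design is that a literal "|" outside parentheses would also be treated as a separator, so the approach assumes "|" only appears inside groups.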

import exrex
trend = '(A|B|C)_STRING'
trend2 = 'STRING_(A|B)_STRING_(C|D)'
>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']
>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']

I would do this to extract the groups:
def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
And then you can evaluate the product of those extracted groups using itertools.product:
expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
Now it's just a question of splicing those back onto your original expression. I'll use re for that :)
# Python 3.3+
import re
def _gen(it):
    yield from it
p = re.compile(r'\(.*?\)')
for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    # Each match is replaced by the next element of the current tuple.
    print(p.sub(lambda x: next(gen), expr))
STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D
There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
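One possibly more readable variant, offered only as a sketch: a plain iter() over each tuple can feed re.sub directly, and the same pattern can extract the groups. The names GROUP and expand are hypothetical and not part of the answer above.

```python
import re
from itertools import product

# Matches one parenthesised group and captures its contents.
GROUP = re.compile(r"\(([^)]*)\)")

def expand(expr):
    """Return every string obtained by choosing one alternative per group."""
    groups = [m.group(1).split("|") for m in GROUP.finditer(expr)]
    results = []
    for combo in product(*groups):
        replacements = iter(combo)
        # Each successive match consumes the next chosen alternative.
        results.append(GROUP.sub(lambda m: next(replacements), expr))
    return results

print(expand("STRING_(A|B)_STRING_(C|D)"))
```

Like the code above, this assumes the groups are not nested.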

This is easy to achieve with the sre_yield module:
>>> import sre_yield
>>> trend = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']
The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (same thing used internally by the re module), and constructing chained/repeating iterators as appropriate. There may be duplicate results, depending on your input string though -- these are cases that sre_parse did not optimize.

Permutations of several lists in python efficiently

I'm trying to write a Python script that will generate random permutations of several lists without repeating,
i.e. for [a,b] and [c,d]:
a,c
b,c
a,d
b,d
I can generate every permutation using the following; however, the result is not random:
for r in itertools.product(list1, list2):
    target.write("%s,%s" % (r[0], r[1]))
Does anyone know a way I can implement this such that I can extract only 2 permutations, and they will be completely random but never repeated?

You can use random.choice():
>>> from itertools import product
>>> import random
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['d', 'e', 'f']
>>> prod = tuple(product(l1, l2))
>>>
>>> random.choice(prod)
('c', 'e')
>>> random.choice(prod)
('a', 'f')
>>> random.choice(prod)
('c', 'd')
Or simply use a nested list comprehension for creating the products:
>>> lst = [(i, j) for j in l2 for i in l1]
If you don't want to produce duplicate items, you can build a set from the product and then pop items from it. Note that set.pop() removes an arbitrary element, which is not the same as a uniformly random one:
>>> prod = set(product(l1, l2))
>>>
>>> prod.pop()
('c', 'f')
>>> prod.pop()
('a', 'f')
>>> prod.pop()
('a', 'd')
Or use random.shuffle to shuffle the list of products, as @ayhan has suggested in his answer.

You can use random.shuffle then pop to make sure the results will not be repeated:
import itertools
import random
list1 = ["a", "b"]
list2 = ["c", "d"]
p = list(itertools.product(list1, list2))
random.shuffle(p)
e1 = p.pop()
e2 = p.pop()
list(itertools.product(...)) is not memory-efficient, as it generates and stores every pair. If you have big lists, you can generate one pair at a time and check whether it has already been used:
import random
s = set()  # pairs that have already been written
list1 = ["a", "b"]
list2 = ["c", "d"]
while True:
    r = (random.choice(list1), random.choice(list2))
    if r not in s:
        # target is the open output file from the question
        target.write("%s,%s" % (r[0], r[1]))
        s.add(r)
        break
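Another option, sketched here under the assumption that only a few pairs are needed from large lists: sample distinct positions in the product with random.sample and decode each position into a pair, so the full product is never built. The function name sample_pairs is hypothetical.

```python
import random

def sample_pairs(list1, list2, k):
    """Return k distinct (x, y) pairs drawn at random from list1 x list2."""
    total = len(list1) * len(list2)
    # random.sample never repeats a position; it raises ValueError if k > total.
    positions = random.sample(range(total), k)
    # Decode each position as (row, column) in the len(list1) x len(list2) grid.
    return [(list1[pos // len(list2)], list2[pos % len(list2)]) for pos in positions]

print(sample_pairs(["a", "b"], ["c", "d"], 2))
```

Uniqueness here is by position, so if the input lists themselves contain repeated values, equal-looking pairs can still appear.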

Run Length Encoding in Python with List Comprehension

I have a more basic Run Length Encoding question compared to many of the questions about this topic that have already been answered. Essentially, I'm trying to take the string
string = 'aabccccaaa'
and have it return
a2b1c4a3
I thought that if I can manage to get all the information into a list like I have illustrated below, I would easily be able to return a2b1c4a3
test = [['a','a'], ['b'], ['c','c','c','c'], ['a','a','a']]
I came up with the following code so far, but was wondering if someone would be able to help me figure out how to make it create the output I illustrated above.
def string_compression():
    for i in xrange(len(string)):
        prev_item, current_item = string[i-1], string[i]
        print prev_item, current_item
        if prev_item == current_item:
            <HELP>
If anyone has any additional comments regarding more efficient ways to go about solving a question like this I am all ears!

You can use itertools.groupby():
from itertools import groupby
grouped = [list(g) for k, g in groupby(string)]
This will produce your per-letter groups as a list of lists.
You can turn that into a RLE in one step:
rle = ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
Each k is the letter being grouped, and each g is an iterator that yields that letter once per occurrence in the run; the sum(1 for _ in g) expression counts those items without building an intermediate list.
Demo:
>>> from itertools import groupby
>>> string = 'aabccccaaa'
>>> [list(g) for k, g in groupby(string)]
[['a', 'a'], ['b'], ['c', 'c', 'c', 'c'], ['a', 'a', 'a']]
>>> ''.join(['{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(string)])
'a2b1c4a3'
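For comparison with the loop the question started from, here is a sketch of the same encoding without itertools. It tracks the current character and its run length instead of comparing index pairs; the function name rle_encode is hypothetical.

```python
def rle_encode(text):
    """Run-length encode text, e.g. 'aab' -> 'a2b1'."""
    if not text:
        return ""
    parts = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1                       # still inside the same run
        else:
            parts.append(f"{current}{count}")  # close the finished run
            current, count = ch, 1
    parts.append(f"{current}{count}")          # close the final run
    return "".join(parts)

print(rle_encode("aabccccaaa"))  # a2b1c4a3
```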

Consider using the more_itertools.run_length tool.
Demo
import more_itertools as mit
iterable = "aabccccaaa"
list(mit.run_length.encode(iterable))
# [('a', 2), ('b', 1), ('c', 4), ('a', 3)]
Code
"".join(f"{x[0]}{x[1]}" for x in mit.run_length.encode(iterable)) # python 3.6
# 'a2b1c4a3'
"".join(x[0] + str(x[1]) for x in mit.run_length.encode(iterable))
# 'a2b1c4a3'
Alternative itertools/functional style:
import itertools as it
"".join(map(str, it.chain.from_iterable(mit.run_length.encode(iterable))))
# 'a2b1c4a3'
Note: more_itertools is a third-party library that is installable via pip install more_itertools.

I'm a Python beginner and this is what I wrote for RLE.
from itertools import groupby
s = 'aabccccaaa'
# (letter, run length) for each run of identical letters
grouped_d = [(k, len(list(g))) for k, g in groupby(s)]
result = ''
for key, count in grouped_d:
    result += key + str(count)
print(f'result = {result}')

python unique string creation

I've looked at several other SO questions (and google'd tons) that are 'similar'-ish to this, but none of them seem to fit my question right.
I am trying to make a non fixed length, unique text string, only containing characters in a string I specify. E.g. made up of capital and lower case a-zA-Z characters. (for this example I use only a, b, and c lower case)
Something like this (broken code below)
def next(index, validCharacters = 'abc'):
    return uniqueShortAsPossibleString
The index argument would be an index (integer) that relate to a text string, for instance:
next(1) == 'a'
next(2) == 'b'
next(3) == 'c'
next(4) == 'aa'
next(5) == 'ab'
next(6) == 'ac'
next(7) == 'ba'
next(8) == 'bb'
next(9) == 'bc'
next(10) == 'ca'
next(11) == 'cb'
next(12) == 'cc'
And so forth. The string:
Must be unique, I'll be using it as an identifier, and it can only be a-zA-Z chars
As short as possible, with lower index numbers being shortest (see above examples)
Contain only the characters specified in the given argument string validCharacters
In conclusion, how could I write the next() function to relate an integer index value to an unique short string with the characters specified?
P.S. I'm new to SO, this site has helped me tons throughout the years, and while I've never made an account or asked a question (till now), I really hope I've done an okay job explaining what I'm trying to accomplish with this.

What you are trying to do is write the parameter of the next function in another base.
Let's suppose validCharacters contains k characters: then the job of the next function will be to transform parameter p into base k by using the characters in validCharacters.
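In your example, you can write the numbers in base 3 using the digits 1, 2 and 3 instead of 0, 1 and 2 (so-called bijective base 3), and then associate each digit with one letter:
next(1) -> 1 -> 'a'
next(2) -> 2 -> 'b'
next(3) -> 3 -> 'c'
next(4) -> 11 -> 'aa'
next(7) -> 21 -> 'ba'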
And so forth.
With this method, you can call next(x) without knowing or computing any next(x-i), which you can't do with iterative methods.
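A minimal sketch of that conversion, assuming indices start at 1 as in the question. The name index_to_name is hypothetical (chosen to avoid shadowing the built-in next). Subtracting 1 before each division is what shifts the digits from 0..k-1 to 1..k.

```python
def index_to_name(index, valid_characters="abc"):
    """Map 1 -> 'a', 2 -> 'b', 3 -> 'c', 4 -> 'aa', ... for the given characters."""
    if index < 1:
        raise ValueError("index must be >= 1")
    k = len(valid_characters)
    letters = []
    while index > 0:
        # Shift by one so the remainders 0..k-1 stand for the digits 1..k.
        index, remainder = divmod(index - 1, k)
        letters.append(valid_characters[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(letters))

print([index_to_name(i) for i in range(1, 13)])
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```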

You're trying to convert a number to a number in another base, but using arbitrary characters for the digits of that base.
import string
chars = string.ascii_lowercase + string.ascii_uppercase
def identifier(x, chars):
    output = []
    base = len(chars)
    # Note: x = 0 skips the loop and yields an empty string.
    while x:
        # Digits come out least significant first.
        output.append(chars[x % base])
        x //= base  # integer division, so x eventually reaches 0
    return ''.join(reversed(output))
print(identifier(1, chars))
This lets you jump to any position. Because you are simply counting, the identifiers are unique; it is easy to use any character set of two or more characters; and lower numbers give shorter identifiers.

itertools can always give you obfuscated one-liner iterators:
from itertools import combinations_with_replacement, chain
chars = 'abc'
a = chain(*(combinations_with_replacement(chars, i) for i in range(1, len(chars) + 1)))
Basically, this code creates an iterator that chains together all combinations (with replacement) of chars of lengths 1, 2, ..., len(chars). Note that reorderings such as ('b', 'a') are not produced.
The output of for x in a: print(x) is:
('a',)
('b',)
('c',)
('a', 'a')
('a', 'b')
('a', 'c')
('b', 'b')
('b', 'c')
('c', 'c')
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')

You can't really "associate" the index with anything, but the following is a generator that will yield output along the lines of what you're asking for:
from itertools import combinations_with_replacement
def uniquenames(chars):
    for i in range(1, len(chars)):
        for j in combinations_with_replacement(chars, i):
            yield ''.join(j)
print(list(uniquenames('abc')))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'bb', 'bc', 'cc']

As far as I understand, we shouldn't specify a maximum length for the output string, so range is not enough:
>>> from itertools import combinations_with_replacement, count
>>> def u(chars):
...     for i in count(1):
...         for k in combinations_with_replacement(chars, i):
...             yield "".join(k)
...
>>> g = u("abc")
>>> next(g)
'a'
>>> next(g)
'b'
>>> next(g)
'c'
>>> next(g)
'aa'
>>> next(g)
'ab'
>>> next(g)
'ac'
>>> next(g)
'bb'
>>> next(g)
'bc'

So it seems like you are trying to enumerate all the strings over the alphabet {'a','b','c'}. This can be done using finite state automata (though you don't want to do that). One simple way to enumerate them is to start with a list and append all the strings of length 1 in order (so a, then b, then c). Then, to get the strings of length n, append each letter in the alphabet to each string of length n-1. This will keep the list in order as long as you append all the letters in the alphabet to a given string before moving on to the lexicographically next string.
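The enumeration described above can be sketched with itertools.product, which yields every string of a given length in lexicographic order (including reorderings such as 'ba'). The generator name all_names is hypothetical.

```python
from itertools import count, islice, product

def all_names(valid_characters="abc"):
    """Yield 'a', 'b', 'c', 'aa', 'ab', ... without an upper length limit."""
    for length in count(1):
        # product(..., repeat=length) gives every string of this length, in order.
        for letters in product(valid_characters, repeat=length):
            yield "".join(letters)

# The generator is infinite, so take a finite slice.
first_twelve = list(islice(all_names("abc"), 12))
print(first_twelve)
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

Unlike a direct index-to-string conversion, reaching the n-th string this way requires stepping through the n-1 strings before it.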

Split sublist of a list into other sublists

I am having problems with 'splitting' a larger list into several of its combinations. Here is an example:
Let's say I have this list:
x = [['a','b'],['c','f'],['q','w','t']]
and I want to end up with
x = [['a','b'],['c','f'],['q','w'],['q','t'],['w','t']]
so essentially
['q','w','t']
becomes
['q','w'],['q','t'],['w','t']
I see how I can convert
['q','w','t']
to
[['q','w'],['q','t'],['w','t']] #notice the extra brackets
with itertools combinations, but then I am stuck with
x = [['a','b'],['c','f'],[['q','w'],['q','t'],['w','t']]] #notice the extra brackets
Which is not what I want.
How should I do this?
EDIT:
Here is the "solution", that does not give me the result that I want:
from itertools import combinations
x = [['a','b'],['c','f'],['q','w','t']]
new_x = []
for sublist in x:
    if len(sublist) == 2:
        new_x.append(sublist)
    if len(sublist) > 2:
        new_x.append([list(ele) for ele in (combinations(sublist,2))])
Thank You

I generally use a nested list comprehension to flatten a list like this:
>>> import itertools
>>> x = [['a','b'],['c','f'],['q','w','t']]
>>> [c for s in x for c in itertools.combinations(s, 2)]
[('a', 'b'), ('c', 'f'), ('q', 'w'), ('q', 't'), ('w', 't')]
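The comprehension above yields tuples. If lists are wanted, as in the question's expected output, a small variation (a sketch, with new_x as an illustrative name) converts each pair:

```python
from itertools import combinations

x = [['a', 'b'], ['c', 'f'], ['q', 'w', 't']]

# combinations(sub, 2) of a 2-element sublist is just that one pair,
# so short sublists pass through unchanged and longer ones are split.
new_x = [list(pair) for sub in x for pair in combinations(sub, 2)]
print(new_x)
# [['a', 'b'], ['c', 'f'], ['q', 'w'], ['q', 't'], ['w', 't']]
```

Note that a sublist with fewer than two elements has no 2-combinations and would therefore disappear from the result.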

Not the best way to do it, but pretty clear to understand:
from itertools import combinations
a = [['a','b'],['c','f'],['q','w','t']]
def get_new_list(x):
    newList = []
    for l in x:
        if len(l) > 2:
            # Replace a long sublist with all of its 2-element combinations.
            newList.extend([list(sl) for sl in combinations(l, 2)])
        else:
            newList.append(l)
    return newList
print(get_new_list(a))
# [['a', 'b'], ['c', 'f'], ['q', 'w'], ['q', 't'], ['w', 't']]
