Separating a String - python

Given a string, I want to generate every way of splitting it into contiguous pieces. In other words, all possible ways of putting commas between its characters.
For example:
input: ["abcd"]
output: ["abcd"]
["abc","d"]
["ab","cd"]
["ab","c","d"]
["a","bc","d"]
["a","b","cd"]
["a","bcd"]
["a","b","c","d"]
I am a bit stuck on how to generate all the possible lists. Combinations only gives me fixed-length subsets of the characters, and permutations gives every possible ordering.
I can produce all the cases with a single comma by iterating over the slice positions, but I can't produce cases with two commas such as "ab","c","d" and "a","b","cd".
My attempt with slices:
test = "abcd"
for x in range(len(test)):
    print(test[:x], test[x:])

How about something like:
from itertools import combinations
def all_splits(s):
    for numsplits in range(len(s)):
        for c in combinations(range(1, len(s)), numsplits):
            split = [s[i:j] for i, j in zip((0,) + c, c + (None,))]
            yield split
after which:
>>> for x in all_splits("abcd"):
...     print(x)
...
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']

You can certainly use itertools for this, but I think it's easier to write a recursive generator directly:
def gen_commas(s):
    yield s
    for prefix_len in range(1, len(s)):
        prefix = s[:prefix_len]
        # Every way of splitting the rest, with this prefix in front.
        for tail in gen_commas(s[prefix_len:]):
            yield prefix + "," + tail
Then
print(list(gen_commas("abcd")))
prints
['abcd', 'a,bcd', 'a,b,cd', 'a,b,c,d', 'a,bc,d', 'ab,cd', 'ab,c,d', 'abc,d']
I'm not sure why I find this easier. Maybe just because it's dead easy to do it directly ;-)

You could generate the power set of the n - 1 places where a comma can go (see the question "what's a good way to combinate through a set?") and then insert a comma at each position in every subset.
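The power-set idea above can be sketched roughly as follows. This is only an illustration, assuming the gaps are numbered 1 to len(s) - 1; the function name comma_placements is made up for the example and does not come from the answer.

```python
from itertools import chain, combinations

def comma_placements(s):
    """Yield s with commas inserted at every subset of the len(s) - 1 gaps."""
    gaps = range(1, len(s))
    # Power set of the gap positions: subsets of size 0, 1, ..., len(s) - 1.
    subsets = chain.from_iterable(
        combinations(gaps, k) for k in range(len(s))
    )
    for cuts in subsets:
        # Turn the chosen cut points into consecutive slice boundaries.
        bounds = (0,) + cuts + (len(s),)
        yield ",".join(s[i:j] for i, j in zip(bounds, bounds[1:]))

print(list(comma_placements("abcd")))
```

For "abcd" this yields 8 strings, one for each subset of the 3 gaps.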

Using itertools:
import itertools
input_str = "abcd"
for k in range(1, len(input_str)):
    for subset in itertools.combinations(range(1, len(input_str)), k):
        s = list(input_str)
        # Each earlier insertion shifts the later positions right by one.
        for i, x in enumerate(subset):
            s.insert(x + i, ",")
        print("".join(s))
Gives:
a,bcd
ab,cd
abc,d
a,b,cd
a,bc,d
ab,c,d
a,b,c,d
Also a recursive version:
def commatoze(s, p=1):
    if p == len(s):
        print(s)
        return
    commatoze(s[:p] + ',' + s[p:], p + 2)  # put a comma at position p
    commatoze(s, p + 1)                    # or leave this gap alone
input_str = "abcd"
commatoze(input_str)

You can solve the integer composition problem and use the compositions to guide where to split the list. Integer composition can be solved fairly easily with a little bit of dynamic programming.
def composition(n):
    if n == 1:
        return [[1]]
    comp = composition(n - 1)
    # Either append a new part of size 1 or grow the last part by 1.
    return [x + [1] for x in comp] + [y[:-1] + [y[-1]+1] for y in comp]
def split(lst, guide):
    ret = []
    total = 0
    for g in guide:
        ret.append(lst[total:total+g])
        total += g
    return ret
lst = list('abcd')
for guide in composition(len(lst)):
    print(split(lst, guide))
Another way to generate integer composition:
from itertools import groupby
def composition(n):
    for i in range(2**(n-1)):
        # Run lengths of the n-bit binary form of i give one composition of n.
        yield [len(list(group)) for _, group in groupby('{0:0{1}b}'.format(i, n))]
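To show how a composition produced this way can drive the split of a string, here is a small self-contained sketch. It assumes Python 3 and n >= 1; the names compositions and split_by are illustrative only, and the running totals come from itertools.accumulate rather than the manual counter used earlier.

```python
from itertools import accumulate, groupby

def compositions(n):
    """Yield every composition of n (n >= 1) as a list of positive integers."""
    for i in range(2 ** (n - 1)):
        bits = format(i, "0{}b".format(n))
        # Each run of equal bits becomes one part; the run lengths sum to n.
        yield [len(list(group)) for _, group in groupby(bits)]

def split_by(seq, guide):
    """Cut seq into consecutive pieces whose lengths are given by guide."""
    ends = list(accumulate(guide))
    starts = [0] + ends[:-1]
    return [seq[a:b] for a, b in zip(starts, ends)]

for guide in compositions(4):
    print(guide, split_by("abcd", guide))
```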

Given
import more_itertools as mit
Code
list(mit.partitions("abcd"))
Output
[[['a', 'b', 'c', 'd']],
 [['a'], ['b', 'c', 'd']],
 [['a', 'b'], ['c', 'd']],
 [['a', 'b', 'c'], ['d']],
 [['a'], ['b'], ['c', 'd']],
 [['a'], ['b', 'c'], ['d']],
 [['a', 'b'], ['c'], ['d']],
 [['a'], ['b'], ['c'], ['d']]]
Install more_itertools with pip install more-itertools.

Related

How to generate combination of characters in a string at a particular position?

I have a string list :
li = ['a', 'b', 'c', 'd']
Using the following code in Python, I generated all the possible combination of characters for list li and got a result of 256 strings.
from itertools import product
li = ['a', 'b', 'c', 'd']
for comb in product(li, repeat=4):
    print(''.join(comb))
Say, for example, that I know the characters at the second and fourth positions of the string: 'b' and 'c'.
Then the result will be a set of only 16 strings:
abac
abbc
abcc
abdc
bbac
bbbc
bbcc
bbdc
cbac
cbbc
cbcc
cbdc
dbac
dbbc
dbcc
dbdc
How to get this result? Is there a Pythonic way to achieve this?
Thanks.
Edit: The list li I actually want runs from a to z, and the value for repeat is 13. When I tried the above code with those values, Python threw a memory error!
Use a list comprehension:
from itertools import product
li = ['a', 'b', 'c', 'd']
combs = [list(x) for x in product(li, repeat=4)]
selected_combs = [comb for comb in combs if (comb[1] == 'b' and comb[3] == 'c')]
print(["".join(comb) for comb in selected_combs])
# ['abac', 'abbc', 'abcc', 'abdc', 'bbac', 'bbbc', 'bbcc', 'bbdc', 'cbac', 'cbbc', 'cbcc', 'cbdc', 'dbac', 'dbbc', 'dbcc', 'dbdc']
To save memory in case you do not need all the combinations combs, you can simply do:
li = ['a', 'b', 'c', 'd']
selected_combs = [comb for comb in product(li, repeat=4) if (comb[1] == 'b' and comb[3] == 'c')]
print(["".join(comb) for comb in selected_combs])
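Filtering still walks through the full product, which becomes impractical for the larger case mentioned in the question's edit (26 letters, repeat=13). One possible alternative, sketched here under the assumption that the known characters are passed as a dict from 0-based position to character, is to take the product over the free positions only. The names constrained_strings and fixed are hypothetical and not part of the original answers.

```python
from itertools import product

def constrained_strings(alphabet, length, fixed):
    """Lazily yield strings of the given length over alphabet.

    fixed maps a 0-based position to the character known to be there.
    """
    # One pool per position: a single choice where known, all letters otherwise.
    pools = [[fixed[i]] if i in fixed else alphabet for i in range(length)]
    for combo in product(*pools):
        yield "".join(combo)

li = ['a', 'b', 'c', 'd']
result = list(constrained_strings(li, 4, {1: 'b', 3: 'c'}))
print(result)
```

Because the function is a generator, a large result can be consumed one string at a time instead of being collected in a list.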
def permute(s):
    out = []
    if len(s) == 1:
        return s
    else:
        for i, let in enumerate(s):
            for perm in permute(s[:i] + s[i+1:]):
                out += [let + perm]
    return out
per = permute(['a', 'b', 'c', 'd'])
print(per)
Do you want this?

How to get ('a', 'a/b', 'a/b/c') from ('a', 'b', 'c')?

How can I go from this structure
>>> input = ['a', 'b', 'c']
to this one
>>> output
['a', 'a/b', 'a/b/c']
in an elegant (functional) way?
For now I have this:
>>> from functools import reduce
>>> res = []
>>> for i in range(len(input)):
...     res.append(reduce(lambda a, b: a + '/' + b, input[:i+1]))
...
>>> res
['a', 'a/b', 'a/b/c']
You can use itertools.accumulate():
from itertools import accumulate
l = ['a', 'b', 'c']
print(list(accumulate(l, '{}/{}'.format)))
This outputs:
['a', 'a/b', 'a/b/c']
You can do this using a simple list comprehension.
l = ['a', 'b', 'c']
['/'.join(l[:i]) for i in range(1, len(l)+1)]
# ['a', 'a/b', 'a/b/c']
If performance is important, you can roll your own implementation of accumulate:
out = [l[0]]
for l_ in l[1:]:
    out.append('{}/{}'.format(out[-1], l_))
out
# ['a', 'a/b', 'a/b/c']
This turns out to be slightly faster than itertools for the given problem.
This should work:
l = ['a', 'b', 'c']
new_list = []
for i in range(len(l)):
    new_list.append("/".join(l[:i+1]))
If you must use reduce you could do it like this:
from functools import reduce
input = ['a', 'b', 'c']
output = [reduce(lambda a, b: f"{a}/{b}", input[:n + 1]) for n in range(0, len(input))]
I prefer the built in join function:
output = ['/'.join(input[:n + 1]) for n in range(0, len(input))]
You can use count to slice a string in steps:
from itertools import count
input = ['a', 'b', 'c']
s = '/'.join(input)
c = count(1, 2)
[s[:next(c)] for _ in input]
# ['a', 'a/b', 'a/b/c']
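Note that slicing at 1, 3, 5, ... relies on every item being exactly one character long. A sketch of a variant that should also cope with longer items, assuming Python 3, accumulates the end offsets instead of counting in steps of two. The function name prefix_paths is made up for this example.

```python
from itertools import accumulate

def prefix_paths(parts, sep="/"):
    """Return the sep-joined prefixes of parts by slicing one joined string."""
    joined = sep.join(parts)
    # Running total of item lengths plus one separator per item.
    ends = accumulate(len(p) + len(sep) for p in parts)
    # Drop the trailing separator counted for the last item of each prefix.
    return [joined[:end - len(sep)] for end in ends]

print(prefix_paths(['a', 'b', 'c']))
print(prefix_paths(['usr', 'local', 'bin']))
```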
A recursive solution:
The idea is quite simple: reduce the problem to a shorter list.
The problem is solved once we know the answer for the first n-1 strings (or characters); all that remains is to join all n items into one string separated by '/' ('a/b/c' in this case).
We pass an empty list as the second parameter to store the result.
input = ['a', 'b', 'c']
def foo(list1, list2):
    if len(list1) == 0:
        return list2
    else:
        s = list1[0]
        for char in list1[1:]:
            s += '/' + char
        list2.insert(0, s)  # shorter prefixes are inserted in front of it later
        return foo(list1[:-1], list2)
>>> foo(input, [])
['a', 'a/b', 'a/b/c']

Python all combinations of a list of lists

So I have a list of lists of strings
[['a','b'],['c','d'],['e','f']]
and I want to get all possible combinations, such that the result is
[['a','b'],['c','d'],['e','f'],
['a','b','c','d'],['a','b','e','f'],['c','d','e','f'],
['a','b','c','d','e','f']]
So far I have come up with this code snippet
import itertools
input = [['a','b'],['c','d'],['e','f']]
combs = []
for i in range(1, len(input)+1):
    els = [x for x in itertools.combinations(input, i)]
    combs.extend(els)
print(combs)
largely following an answer in this post.
But that results in
[(['a','b'],),(['c','d'],),(['e','f'],),
(['a','b'],['c','d']),(['a','b'],['e','f']),(['c','d'],['e','f']),
(['a','b'],['c', 'd'],['e', 'f'])]
and I am currently stumped, trying to find an elegant, pythonic way to unpack those tuples.
You can use itertools.chain.from_iterable to flatten each tuple of lists into a single list. Example:
import itertools
input = [['a','b'],['c','d'],['e','f']]
combs = []
for i in range(1, len(input)+1):
    els = [list(itertools.chain.from_iterable(x)) for x in itertools.combinations(input, i)]
    combs.extend(els)
Demo -
>>> import itertools
>>> input = [['a','b'],['c','d'],['e','f']]
>>> combs = []
>>> for i in range(1, len(input)+1):
...     els = [list(itertools.chain.from_iterable(x)) for x in itertools.combinations(input, i)]
...     combs.extend(els)
...
>>> import pprint
>>> pprint.pprint(combs)
[['a', 'b'],
 ['c', 'd'],
 ['e', 'f'],
 ['a', 'b', 'c', 'd'],
 ['a', 'b', 'e', 'f'],
 ['c', 'd', 'e', 'f'],
 ['a', 'b', 'c', 'd', 'e', 'f']]
One way to do this is to map each integer i in [0..2**n-1], where n is the number of sublists, to one target element using a very simple rule:
Take the sublist of index k if (2**k)&i != 0. In other words, i is read bitwise, and for each bit that is set, the corresponding sublist of l is kept. Mathematically this is one of the cleanest ways to do it, since it closely follows the definition of the power set (a set with n elements has exactly 2**n subsets).
I have not tried it, but something like this should work:
l = [['a','b'],['c','d'],['e','f']]
n = len(l)
output = []
for i in range(2**n):
    s = []
    for k in range(n):
        if (2**k)&i: s = s + l[k]
    output.append(s)
If you don't want the empty list, just replace the relevant line with:
for i in range(1,2**n):
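Since the bitwise snippet is described as untried, here is a quick check of it, written as a sketch: it wraps the same loop in a function and compares the result with the itertools.combinations approach. The function names are invented for this comparison.

```python
from itertools import chain, combinations

def merged_subsets_bitwise(lists):
    """Concatenate the sublists selected by each non-zero bitmask."""
    n = len(lists)
    out = []
    for mask in range(1, 2 ** n):
        merged = []
        for k in range(n):
            if mask & (1 << k):  # bit k set: keep sublist k
                merged = merged + lists[k]
        out.append(merged)
    return out

def merged_subsets_itertools(lists):
    """Same result via combinations of every size, flattened with chain."""
    return [
        list(chain.from_iterable(combo))
        for size in range(1, len(lists) + 1)
        for combo in combinations(lists, size)
    ]

l = [['a', 'b'], ['c', 'd'], ['e', 'f']]
# The two approaches produce the same lists, only in a different order.
print(sorted(merged_subsets_bitwise(l)) == sorted(merged_subsets_itertools(l)))
```

The bitwise version enumerates by mask value, so its results come out in a different order from the size-by-size order of combinations.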
If you want all combinations of exactly three sublists, you may consider this simple way (it does not generalize to longer inputs):
import itertools
a = [['a','b'],['c','d'],['e','f']]
a = a + [i + j for n, i in enumerate(a) for j in a[n+1:]] + [list(itertools.chain.from_iterable(a))]
With a list comprehension (assuming l is the input list and itertools is imported):
combs = [sum(x, []) for i in range(len(l)) for x in itertools.combinations(l, i+1)]

Python: find all possible word combinations with a sequence of characters (word segmentation)

I'm doing some word segmentation experiments like the following.
lst is a sequence of characters, and output is all the possible words.
lst = ['a', 'b', 'c', 'd']
def foo(lst):
    ...
    return output
output = [['a', 'b', 'c', 'd'],
          ['ab', 'c', 'd'],
          ['a', 'bc', 'd'],
          ['a', 'b', 'cd'],
          ['ab', 'cd'],
          ['abc', 'd'],
          ['a', 'bcd'],
          ['abcd']]
I've checked combinations and permutations in itertools library,
and also tried combinatorics.
However, it seems that I'm looking at the wrong side because this is not pure permutation and combinations...
It seems that I can achieve this by using lots of loops, but the efficiency might be low.
EDIT
The word order is important so combinations like ['ba', 'dc'] or ['cd', 'ab'] are not valid.
The order should always be from left to right.
EDIT
@Stuart's solution doesn't work in Python 2.7.6
EDIT
@Stuart's solution does work in Python 2.7.6, see the comments below.
itertools.product should indeed be able to help you.
The idea is this:-
Consider A1, A2, ..., AN separated by slabs. There will be N-1 slabs.
If there is a slab there is a segmentation. If there is no slab, there is a join.
Thus, for a given sequence of length N, you should have 2^(N-1) such combinations.
As in the code below:
import itertools
lst = ['a', 'b', 'c', 'd']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)
solution = []
for combination in combinatorics:
    i = 0
    one_such_combination = [lst[i]]
    for slab in combination:
        i += 1
        if not slab: # there is a join
            one_such_combination[-1] += lst[i]
        else:
            one_such_combination += [lst[i]]
    solution.append(one_such_combination)
print(solution)
#!/usr/bin/env python
from itertools import combinations
a = ['a', 'b', 'c', 'd']
a = "".join(a)
cuts = []
for i in range(0,len(a)):
    cuts.extend(combinations(range(1,len(a)),i))
for i in cuts:
    last = 0
    output = []
    for j in i:
        output.append(a[last:j])
        last = j
    output.append(a[last:])
    print(output)
output:
zsh 2419 % ./words.py
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']
There are 8 options, each mirroring the binary numbers 0 through 7:
000
001
010
011
100
101
110
111
Each 0 and 1 represents whether or not the 2 letters at that index are "glued" together. 0 for no, 1 for yes.
>>> lst = ['a', 'b', 'c', 'd']
... output = []
... formatstr = "{{:0{}.0f}}".format(len(lst)-1)
... for i in range(2**(len(lst)-1)):
...     output.append([])
...     s = "{:b}".format(i)
...     s = str(formatstr.format(float(s)))
...     lstcopy = lst[:]
...     for j, c in enumerate(s):
...         if c == "1":
...             lstcopy[j+1] = lstcopy[j] + lstcopy[j+1]
...         else:
...             output[-1].append(lstcopy[j])
...     output[-1].append(lstcopy[-1])
... output
[['a', 'b', 'c', 'd'],
 ['a', 'b', 'cd'],
 ['a', 'bc', 'd'],
 ['a', 'bcd'],
 ['ab', 'c', 'd'],
 ['ab', 'cd'],
 ['abc', 'd'],
 ['abcd']]
>>>
You can use a recursive generator:
def split_combinations(L):
    for split in range(1, len(L)):
        for combination in split_combinations(L[split:]):
            yield [L[:split]] + combination
    yield [L]
print (list(split_combinations('abcd')))
Edit. I'm not sure how well this would scale up for long strings and at what point it hits Python's recursion limit. Similarly to some of the other answers, you could also use combinations from itertools to work through every possible combination of split-points.
from itertools import combinations
def split_string(s, t):
    return [s[start:finish] for start, finish in zip((None, ) + t, t + (None, ))]
def split_combinations(s):
    for i in range(len(s)):
        for split_points in combinations(range(1, len(s)), i):
            yield split_string(s, split_points)
These both seem to work as intended in Python 2.7 (see here) and Python 3.2 (here). As @twasbrillig says, make sure you indent it as shown.

Create all sequences from the first item within a list

Say I have a list, ['a', 'b', 'c', 'd']. Are there any built-ins or methods in Python to easily create all contiguous sublists (i.e. sub-sequences) starting from the first item?
['a']
['a', 'b']
['a', 'b', 'c']
['a', 'b', 'c', 'd']
Note that I am excluding lists/sequences such as ['a' ,'c'], ['a', 'd'], ['b'], ['c'] or ['d'].
To match your example output (prefixes), you can just use:
prefixes = [your_list[:end] for end in range(1, len(your_list) + 1)]
You can do this with a simple list comprehension:
>>> l = ['a', 'b', 'c', 'd']
>>>
>>> [l[:i+1] for i in range(len(l))]
[['a'], ['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
See also: range()
If you're using Python 2.x, use xrange() instead.
A little more Pythonic than using (x)range (with the benefit of being the same solution for either Python 2 or Python 3):
lst = list('abcde')
prefixes = [ lst[:i+1] for i,_ in enumerate(lst) ]
If you decided that the empty list should be a valid (zero-length) prefix, a small hack will include it:
# Include 0 as a slice index and still get the full list as a prefix
prefixes = [ lst[:i] for i,_ in enumerate(lst + [None]) ]
Just as an alternative:
def prefixes(seq):
    result = []
    for item in seq:
        result.append(item)
        yield result[:]
for x in prefixes(['a', 'b', 'c', 'd']):
    print(x)
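As for built-ins: in Python 3, itertools.accumulate comes close. The following is a minimal sketch, assuming Python 3.2 or later, that wraps each item in a one-element list so that the default addition concatenates them. The function name prefixes is reused here only for illustration.

```python
from itertools import accumulate

def prefixes(seq):
    """Return every non-empty prefix of seq as a list."""
    # accumulate adds up the one-item lists: [a], [a] + [b], ...
    return list(accumulate([item] for item in seq))

print(prefixes(['a', 'b', 'c', 'd']))
```

Each step builds a new list with +, so the returned prefixes do not share state with one another.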
