Generating n-grams from a string

Generating n-grams from a string - python

I need to make a list of all 𝑛 -grams beginning at the head of string for each integer 𝑛 from 1 to M. Then return a tuple of M such lists.
def letter_n_gram_tuple(s, M):
s = list(s)
output = []
for i in range(0, M+1):
output.append(s[i:])
return(tuple(output))
From letter_n_gram_tuple("abcd", 3) output should be:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd']))
However, my output is:
(['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd'], ['d']).
Should I use string slicing and then saving slices into the list?

you can use nested for, first for about n-gram, second to slice the string
def letter_n_gram_tuple(s, M):
output = []
for i in range(1, M + 1):
gram = []
for j in range(0, len(s)-i+1):
gram.append(s[j:j+i])
output.append(gram)
return tuple(output)
or just one line by list comprehension:
output = [[s[j:j+i] for j in range(0, len(s)-i+1)] for i in range(1, M + 1)]
or use windowed in more_itertools:
import more_itertools
output = [list(more_itertools.windowed(s, i)) for i in range(1, M + 1)]
test and output:
print(letter_n_gram_tuple("abcd", 3))
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])

You need one more for loop to iterate over letters or str :
def letter_n_gram_tuple(s, M):
output = []
for i in range(0, M):
vals = [s[j:j+i+1] for j in range(len(s)) if len(s[j:j+i+1]) == i+1]
output.append(vals)
return tuple(output)
print(letter_n_gram_tuple("abcd", 3))
Output:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])

Use the below fuction:
def letter_n_gram_tuple(s, M):
s = list(s)
output = [s]
for i in range(M + 1):
output.append([''.join(sorted(set(a + b), key=lambda x: (a + b).index(x))) for a, b in zip(output[-1], output[-1][1:])])
return tuple(filter(lambda x: len(x) > 1, output))
And now:
print(letter_n_gram_tuple('abcd',3))
Returns:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])

def n_grams(word,max_size):
i=1
output=[]
while i<= max_size:
index = 0
innerArray=[]
while index < len(word)-i+1:
innerArray.append(word[index:index+i])
index+=1
i+=1
output.append(innerArray)
innerArray=[]
return tuple(output)
print(n_grams("abcd",3))

Related

Split a string into chunks of substrings with successively increasing length

Let's say I have this string:
a = 'abcdefghijklmnopqrstuvwxyz'
And I want to split this string into chunks, like below:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']
so that every chunk has a different number of characters. For instance, the first one should have one character, the second two and so on.
If there are not enough characters in the last chunk, then I need to add spaces so it matches the length.
I tried this code so far:
print([a[i: i + i + 1] for i in range(len(a))])
But it outputs:
['a', 'bc', 'cde', 'defg', 'efghi', 'fghijk', 'ghijklm', 'hijklmno', 'ijklmnopq', 'jklmnopqrs', 'klmnopqrstu', 'lmnopqrstuvw', 'mnopqrstuvwxy', 'nopqrstuvwxyz', 'opqrstuvwxyz', 'pqrstuvwxyz', 'qrstuvwxyz', 'rstuvwxyz', 'stuvwxyz', 'tuvwxyz', 'uvwxyz', 'vwxyz', 'wxyz', 'xyz', 'yz', 'z']
Here is my desired output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']

I don't think any one liner or for loop will look as elegant, so let's go with a generator:
from itertools import islice, count
def get_increasing_chunks(s):
it = iter(s)
c = count(1)
nxt, c_ = next(it), next(c)
while nxt:
yield nxt.ljust(c_)
nxt, c_ = ''.join(islice(it, c_+1)), next(c)
return out
[*get_increasing_chunks(a)]
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']

Thanks to #Prune's comment, I managed to figure out a way to solve this:
a = 'abcdefghijklmnopqrstuvwxyz'
lst = []
c = 0
for i in range(1, len(a) + 1):
c += i
lst.append(c)
print([a[x: y] + ' ' * (i - len(a[x: y])) for i, (x, y) in enumerate(zip([0] + lst, lst), 1) if a[x: y]])
Output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']
I find the triangular numbers than do a list comprehension, and add spaces if the length is not right.

so what you need is to have a number that controls how many characters you're going to grab (in this case the amount of iterations), and a second number that remembers what the last index was, plus one last number to tell where to stop.
my_str = "abcdefghijklmnopqrstuvwxyz"
last_index = 0
index = 1
iter_count = 1
while True:
sub_string = my_str[last_index:index]
print(sub_string)
last_index = index
iter_count += 1
index = index + iter_count
if last_index > len(my_str):
break
note that you don't need the while loop. i was just feeling lazy

It seems like the split_into recipe at more_itertools can help here. This is less elegant than the answer by #cs95, but perhaps this will help others discover the utility of the itertools module.
Yield a list of sequential items from iterable of length ‘n’ for each integer ‘n’ in sizes.
>>> list(split_into([1,2,3,4,5,6], [1,2,3]))
[[1], [2, 3], [4, 5, 6]]
To use this, we need to construct a list of sizes like [1, 2, 3, 3, 5, 6, 7].
import itertools
def split_into(iterable, sizes):
it = iter(iterable)
for size in sizes:
if size is None:
yield list(it)
return
else:
yield list(itertools.islice(it, size))
a = 'abcdefghijklmnopqrstuvwxyz'
sizes = [1]
while sum(sizes) <= len(a):
next_value = sizes[-1] + 1
sizes.append(next_value)
# sizes = [1, 2, 3, 4, 5, 6, 7]
list(split_into(a, sizes))
# [['a'],
# ['b', 'c'],
# ['d', 'e', 'f'],
# ['g', 'h', 'i', 'j'],
# ['k', 'l', 'm', 'n', 'o'],
# ['p', 'q', 'r', 's', 't', 'u'],
# ['v', 'w', 'x', 'y', 'z']]
chunks = list(map("".join, split_into(a, sizes)))
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz']
# Pad last item with whitespace.
chunks[-1] = chunks[-1].ljust(sizes[-1], " ")
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']

Here is a solution using accumulate from itertools.
>>> from itertools import accumulate
>>> from string import ascii_lowercase
>>> s = ascii_lowercase
>>> n = 0
>>> accum = 0
>>> while accum < len(s):
n += 1
accum += n
>>> L = [s[j:i+j] for i, j in enumerate(accumulate(range(n)), 1)]
>>> L[-1] += ' ' * (n-len(L[-1]))
>>> L
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']
Update: Could also be obtained within the while loop
n = 0
accum = 0
L = []
while accum < len(s):
n += 1
L.append(s[accum:accum+n])
accum += n
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz']

Adding a little to U11-Forward's answer:
a = 'abcdefghijklmnopqrstuvwxyz'
l = list(range(len(a))) # numberes list / 1 to len(a)
triangular = [sum(l[:i+2]) for i in l] # sum of 1, 2 and 1,2,3 and 1,2,3,4 and etc
print([a[x: y].ljust(i, ' ') for i, (x, y) in enumerate(zip([0] + triangular, triangular), 1) if a[x: y]])
Output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz ']
Find the triangular numbers, do a list comprehension and fill with spaces if the length is incorrect.

a = 'abcdefghijklmnopqrstuvwxyz'
inc = 0
output = []
for i in range(0, len(a)):
print(a[inc: inc+i+1])
inc = inc+i+1
if inc > len(a):
break
output.append(a[inc: inc+i+1])
print(output)
Hey, here is the snippet for your required output. I have just altered your logic.
Output:
['b', 'de', 'ghi', 'klmn', 'pqrst', 'vwxyz']

complete list if the first and last element is equal

I have a problem trying to transform a list.
The original list is like this:
[['a','b','c',''],['c','e','f'],['c','g','h']]
now I want to make the output like this:
[['a','b','c','e','f'],['a','b','c','g','h']]
When the blank is found ( '' ) merge the three list into two lists.
I need to write a function to do this for me.
Here is what I tried:
for x in mylist:
if x[len(x) - 1] == '':
m = x[len(x) - 2]
for y in mylist:
if y[0] == m:
combine(x, y)
def combine(x, y):
for m in y:
if not m in x:
x.append(m)
return(x)
but its not working the way I want.

try this :
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
def combine(x, y):
for m in y:
if not m in x:
x.append(m)
return(x)
result = []
for x in mylist:
if x[len(x) - 1] == '':
m = x[len(x) - 2]
for y in mylist:
if y[0] == m:
result.append(combine(x[0:len(x)-2], y))
print(result)
your problem was with
combine(x[0:len(x)-2], y)
output :
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]

So you basically want to merge 2 lists? If so, you can use one of 2 ways :
Either use the + operator, or use the
extend() method.
And then you put it into a function.

I made it with standard library only with comments. Please refer it.
mylist = [['a','b','c',''],['c','e','f'],['c','g','h']]
# I can't make sure whether the xlist's item is just one or not.
# So, I made it to find all
# And, you can see how to get the last value of a list as [-1]
xlist = [x for x in mylist if x[-1] == '']
ylist = [x for x in mylist if x[-1] != '']
result = []
# combine matrix of x x y
for x in xlist:
for y in ylist:
c = x + y # merge
c = [i for i in c if i] # drop ''
c = list(set(c)) # drop duplicates
c.sort() # sort
result.append(c) # add to result
print (result)
The result is
[['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]

Your code almost works, except you never do anything with the result of combine (print it, or add it to some result list), and you do not remove the '' element. However, for a longer list, this might be a bit slow, as it has quadratic complexity O(n²).
Instead, you can use a dictionary to map first elements to the remaining elements of the lists. Then you can use a loop or list comprehension to combine the lists with the right suffixes:
lst = [['a','b','c',''],['c','e','f'],['c','g','h']]
import collections
replacements = collections.defaultdict(list)
for first, *rest in lst:
replacements[first].append(rest)
result = [l[:-2] + c for l in lst if l[-1] == "" for c in replacements[l[-2]]]
# [['a', 'b', 'c', 'e', 'f'], ['a', 'b', 'c', 'g', 'h']]
If the list can have more than one placeholder '', and if those can appear in the middle of the list, then things get a bit more complicated. You could make this a recursive function. (This could be made more efficient by using an index instead of repeatedly slicing the list.)
def replace(lst, last=None):
if lst:
first, *rest = lst
if first == "":
for repl in replacements[last]:
yield from replace(repl + rest)
else:
for res in replace(rest, first):
yield [first] + res
else:
yield []
for l in lst:
for x in replace(l):
print(x)
Output for lst = [['a','b','c','','b',''],['c','b','','e','f'],['c','g','b',''],['b','x','y']]:
['a', 'b', 'c', 'b', 'x', 'y', 'e', 'f', 'b', 'x', 'y']
['a', 'b', 'c', 'g', 'b', 'x', 'y', 'b', 'x', 'y']
['c', 'b', 'x', 'y', 'e', 'f']
['c', 'g', 'b', 'x', 'y']
['b', 'x', 'y']

try my solution
although it will change the order of list but it's quite simple code
lst = [['a', 'b', 'c', ''], ['c', 'e', 'f'], ['c', 'g', 'h']]
lst[0].pop(-1)
print([list(set(lst[0]+lst[1])), list(set(lst[0]+lst[2]))])

Find all substrings in a string in python 3 with brute-force

I want to find all substrings 'A' to 'B' in L = ['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A'] with bruteforce, this is what i've done:
def find_substring(L):
t = 0
s = []
for i in range(len(L) - 1):
l = []
if ord(L[i]) == 65:
for j in range(i, len(L)):
l.append(L[j])
if ord(L[j]) == 66:
t = t + 1
s.append(l)
return s, t
Now I want the output:
[['A','B'], ['A','B','A','A','X','B'], ['A','A','X','B'], ['A','X','B']]
But i get:
[['A','B','A','A','X','B','Y','A'],['A','B','A','A','X','B','Y','A'],['A','A','X','B','Y','A'],['A','X','B','Y','A']]
Can someone tell me what I'm doing wrong?

The problem is that the list s, holds references to the l lists.
So even though you are appending the correct l lists to s, they are changed after being appended as the future iterations of the j loop modify the l lists.
You can fix this by appending a copy of the l list: l[:].
Also, you can compare strings directly, no need to convert to ASCII.
def find_substring(L):
s = []
for i in range(len(L) - 1):
l = []
if L[i] == 'A':
for j in range(i, len(L)):
l.append(L[j])
if L[j] == 'B':
s.append(l[:])
return s
which now works:
>>> find_substring(['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A'])
[['A', 'B'], ['A', 'B', 'A', 'A', 'X', 'B'], ['A', 'A', 'X', 'B'], ['A', 'X', 'B']]

When you append l to s, you are adding a reference to a list which you then continue to grow. You want to append a copy of the l list's contents at the time when you append, to keep it static.
s.append(l[:])
This is a common FAQ; this question should probably be closed as a duplicate.

You would be better first finding all indices of 'A' and 'B', then iterating over those, avoiding brute force.
def find_substrings(lst)
idx_A = [i for i, c in enumerate(lst) if c == 'A']
idx_B = [i for i, c in enumerate(lst) if c == 'B']
return [lst[i:j+1] for i in idx_A for j in idx_B if j > i]

You can reset l to a copy of the string after l is appended l = l[:] right after the last append.

So, you want all the substrings that start with 'A' and end with 'B'?
When you use #Joeidden's code you can change need the for i in range(len(L) - 1): to for i in range(len(L)): because only strings that end with 'B' will be appended to s.
def find_substring(L):
s = []
for i in range(len(L)):
l = []
if L[i] == 'A':
for j in range(i, len(L)):
l.append(L[j])
if L[j] == 'B':
s.append(l[:])
return s

Another slightly different approach would be this:
L = ['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A']
def find_substring(L):
output = []
# Start searching for A.
for i in range(len(L)):
# If you found one start searching all B's until you reach the end.
if L[i]=='A':
for j in range(i,len(L),1):
# If you found a B, append the sublist from i index to j+1 index (positions of A and B respectively).
if L[j]=='B':
output.append(L[i:j+1])
return output
result = find_substring(L)
print(result)
Output:
[['A', 'B'], ['A', 'B', 'A', 'A', 'X', 'B'], ['A', 'A', 'X', 'B'], ['A', 'X', 'B']]
In case you need a list comprehension of the above:
def find_substring(L):
output = [L[i:j+1] for i in range(len(L)) for j in range(i,len(L),1) if L[i]=='A' and L[j]=='B']
return output

How can I group a list of objects by continuity?

Given a very large (gigabytes) list of arbitrary objects (I've seen a similar solution to this for ints), can I either group it easily into sublists by equivalence? Either in-place or by generator which consumes the original list.
l0 = [A,B, A,B,B, A,B,B,B,B, A, A, A,B] #spaces for clarity
Desired result:
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
I wrote a looping version like so:
#find boundaries
b0 = []
prev = A
group = A
for idx, elem in enumerate(l0):
if elem == group:
b0.append(idx)
prev = elem
b0.append(len(l0)-1)
for idx, b in enumerate(b0):
try:
c = b0[idx+1]
except:
break
if c == len(l0)-1:
l1.append(l0[b:])
else:
l1.append(l0[b:c])
Can this be done as a generator gen0(l) that will work like:
for g in gen(l0):
print g
....
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
....
etc?
EDIT: using python 2.6 or 2.7
EDIT: preferred solution, mostly based on the accepted answer:
def gen_group(f, items):
out = [items[0]]
while items:
for elem in items[1:]:
if f(elem, out[0]):
break
else:
out.append(elem)
for _i in out:
items.pop(0)
yield out
if items:
out = [items[0]]
g = gen_group(lambda x, y: x == y, l0)
for out in g:
print out

Maybe something like this:
def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
Used like:
>>> items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
>>> g = subListGenerator(lambda x: x == 'A',items)
>>> for x in g: print(x)
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']

I assume that A is your breakpoint.
>>> A, B = 'A', 'B'
>>> x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
>>> map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]

Here's a simple generator to perform your task:
def gen_group(L):
DELIMETER = "A"
out = [DELIMETER]
while L:
for ind, elem in enumerate(L[1:]):
if elem == DELIMETER :
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = [DELIMETER ]
The idea is to cut down the list and yield the sublists until there is nothing left. This assumes the list starts with "A" (DELIMETER variable).
Sample output:
for out in gen_group(l0):
print out
Produces
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
['A']
Comparitive Timings:
timeit.timeit(s, number=100000) is used to test each of the current answers, where s is the multiline string of the code (listed below):
Trial 1 Trial 2 Trial 3 Trial 4 | Avg
This answer (s1): 0.08247 0.07968 0.08635 0.07133 0.07995
Dilara Ismailova (s2): 0.77282 0.72337 0.73829 0.70574 0.73506
John Coleman (s3): 0.08119 0.09625 0.08405 0.08419 0.08642
This answer is the fastest, but it is very close. I suspect the difference is the additional argument and anonymous function in John Coleman's answer.
s1="""l0 = ["A","B", "A","B","B", "A","B","B","B","B", "A", "A", "A","B"]
def gen_group(L):
out = ["A"]
while L:
for ind, elem in enumerate(L[1:]):
if elem == "A":
break
else:
out.append(elem)
for i in range(ind + 1):
L.pop(0)
yield out
out = ["A"]
out =gen_group(l0)"""
s2 = """A, B = 'A', 'B'
x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))"""
s3 = """def subListGenerator(f,items):
i = 0
n = len(items)
while i < n:
sublist = [items[i]]
i += 1
while i < n and not f(items[i]):
sublist.append(items[i])
i += 1
yield sublist
items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
g = subListGenerator(lambda x: x == 'A',items)"""

The following works in this case. You could change the l[0] != 'A' condition to be whatever. I would probably pass it as an argument, so that you can reuse it somewhere else.
def gen(l_arg, boundary):
l = l_arg.copy() # Optional if you want to save memory
while l:
sub_list = [l.pop(0)]
while l and l[0] != boundary: # Here boundary = 'A'
sub_list.append(l.pop(0))
yield sub_list
It assumes that there is an 'A' at the beginning of your list. And it copies the list, which isn't great when the list is in the range of Gb. you could remove the copy to save memory if you don't care about keeping the original list.

Python: find all possible word combinations with a sequence of characters (word segmentation)

I'm doing some word segmentation experiments like the followings.
lst is a sequence of characters, and output is all the possible words.
lst = ['a', 'b', 'c', 'd']
def foo(lst):
...
return output
output = [['a', 'b', 'c', 'd'],
['ab', 'c', 'd'],
['a', 'bc', 'd'],
['a', 'b', 'cd'],
['ab', 'cd'],
['abc', 'd'],
['a', 'bcd'],
['abcd']]
I've checked combinations and permutations in itertools library,
and also tried combinatorics.
However, it seems that I'm looking at the wrong side because this is not pure permutation and combinations...
It seems that I can achieve this by using lots of loops, but the efficiency might be low.
EDIT
The word order is important so combinations like ['ba', 'dc'] or ['cd', 'ab'] are not valid.
The order should always be from left to right.
EDIT
#Stuart's solution doesn't work in Python 2.7.6
EDIT
#Stuart's solution does work in Python 2.7.6, see the comments below.

itertools.product should indeed be able to help you.
The idea is this:-
Consider A1, A2, ..., AN separated by slabs. There will be N-1 slabs.
If there is a slab there is a segmentation. If there is no slab, there is a join.
Thus, for a given sequence of length N, you should have 2^(N-1) such combinations.
Just like the below
import itertools
lst = ['a', 'b', 'c', 'd']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)
solution = []
for combination in combinatorics:
i = 0
one_such_combination = [lst[i]]
for slab in combination:
i += 1
if not slab: # there is a join
one_such_combination[-1] += lst[i]
else:
one_such_combination += [lst[i]]
solution.append(one_such_combination)
print solution

#!/usr/bin/env python
from itertools import combinations
a = ['a', 'b', 'c', 'd']
a = "".join(a)
cuts = []
for i in range(0,len(a)):
cuts.extend(combinations(range(1,len(a)),i))
for i in cuts:
last = 0
output = []
for j in i:
output.append(a[last:j])
last = j
output.append(a[last:])
print(output)
output:
zsh 2419 % ./words.py
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']

There are 8 options, each mirroring the binary numbers 0 through 7:
000
001
010
011
100
101
110
111
Each 0 and 1 represents whether or not the 2 letters at that index are "glued" together. 0 for no, 1 for yes.
>>> lst = ['a', 'b', 'c', 'd']
... output = []
... formatstr = "{{:0{}.0f}}".format(len(lst)-1)
... for i in range(2**(len(lst)-1)):
... output.append([])
... s = "{:b}".format(i)
... s = str(formatstr.format(float(s)))
... lstcopy = lst[:]
... for j, c in enumerate(s):
... if c == "1":
... lstcopy[j+1] = lstcopy[j] + lstcopy[j+1]
... else:
... output[-1].append(lstcopy[j])
... output[-1].append(lstcopy[-1])
... output
[['a', 'b', 'c', 'd'],
['a', 'b', 'cd'],
['a', 'bc', 'd'],
['a', 'bcd'],
['ab', 'c', 'd'],
['ab', 'cd'],
['abc', 'd'],
['abcd']]
>>>

You can use a recursive generator:
def split_combinations(L):
for split in range(1, len(L)):
for combination in split_combinations(L[split:]):
yield [L[:split]] + combination
yield [L]
print (list(split_combinations('abcd')))
Edit. I'm not sure how well this would scale up for long strings and at what point it hits Python's recursion limit. Similarly to some of the other answers, you could also use combinations from itertools to work through every possible combination of split-points.
def split_string(s, t):
return [s[start:finish] for start, finish in zip((None, ) + t, t + (None, ))]
def split_combinations(s):
for i in range(len(s)):
for split_points in combinations(range(1, len(s)), i):
yield split_string(s, split_points)
These both seem to work as intended in Python 2.7 (see here) and Python 3.2 (here). As #twasbrillig says, make sure you indent it as shown.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Generating n-grams from a string - python

def n_grams(word,max_size): i=1 output=[] while i<= max_size: index = 0 innerArray=[] while index < len(word)-i+1: innerArray.append(word[index:index+i]) index+=1 i+=1 output.append(innerArray) innerArray=[] return tuple(output) print(n_grams("abcd",3))

Related

Split a string into chunks of substrings with successively increasing length

complete list if the first and last element is equal

Find all substrings in a string in python 3 with brute-force

How can I group a list of objects by continuity?

Python: find all possible word combinations with a sequence of characters (word segmentation)

Categories

Resources