I have a string "BANANA". I would like to generate a list of all possible sequential substrings:
[B, BA, BAN, BANA, BANAN, BANANA, A, AN, ANA, ...]
Is this something I can accomplish using a Python List Comprehension or would I just generate them in a brute force manner? Note: I am new to Python. TIA
Using a list comprehension:
s = "BANANA"
l = len(s)
# s[i:j+1] is the substring that starts at index i and ends at index j
ar = [s[i:j+1] for i in range(l) for j in range(i, l)]
print(*ar)
Using nested loops:
s = "BANANA"
l = len(s)
ar = []
for i in range(l):
    for j in range(i, l):
        ar.append(s[i:j+1])
print(*ar)
Both output:
B BA BAN BANA BANAN BANANA A AN ANA ANAN ANANA N NA NAN NANA A AN ANA N NA A
N.B.: The itertools has already been explained in A.J.'s answer.
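Try the following with itertools. Note that product builds every combination of the characters, not only the contiguous substrings:
import itertools
s = "BANANA"
result = [[''.join(j) for j in itertools.product(s, repeat=i)] for i in range(1, len(s)+1)]
>>> result[0]
['B', 'A', 'N', 'A', 'N', 'A']
>>> result[1]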
['BB', 'BA', 'BN', 'BA', 'BN', 'BA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA', 'NB', 'NA', 'NN', 'NA', 'NN', 'NA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA', 'NB', 'NA', 'NN', 'NA', 'NN', 'NA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA']
>>>
If you want all the possible sublists, you can use two for clauses in one list comprehension:
def sublists(lst):
    # m is the start index, n the (inclusive) end index
    return [lst[m:n+1] for m in range(len(lst)) for n in range(m, len(lst))]
sublists("banana")
=> ['b', 'ba', 'ban', 'bana', 'banan', 'banana', 'a', 'an', 'ana', 'anan', 'anana', 'n', 'na', 'nan', 'nana', 'a', 'an', 'ana', 'n', 'na', 'a']
If you don't want repeated elements (a set does not preserve order, so the order may vary):
def sublistsWithoutRepeated(lst):
    return list(set(sublists(lst)))
sublistsWithoutRepeated("banana")
=> ['a', 'b', 'ba', 'nana', 'na', 'nan', 'an', 'anana', 'anan', 'n', 'bana', 'ban', 'banan', 'banana', 'ana']
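One more way to look at the substring question, as a hedged sketch: every contiguous substring is fixed by a pair of slice boundaries with start < end, so itertools.combinations over the boundary positions enumerates them in the order the question lists. The helper names substrings and unique_substrings are made up for this illustration.

```python
from itertools import combinations

def substrings(s):
    # Each pair (start, end) with start < end selects one contiguous slice.
    return [s[start:end] for start, end in combinations(range(len(s) + 1), 2)]

def unique_substrings(s):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(substrings(s)))

print(substrings("BANANA")[:7])
print(len(substrings("BANANA")), len(unique_substrings("BANANA")))
```

For "BANANA" this gives 21 substrings in total, 15 of them distinct.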
I am trying to increment through all of the possible base-n numbers, where numbers are represented by a list of chars.
For example,
For base-5 numbers (where n = 5) limited to 4 places, and the base 5 numbers are represented by the list:
digits=['a','b','c','d','e']
incrementation would look like
a, b, c, d, e, aa, ab, ac, ad, ae, ba, bb, bc, ... , eeee
What is the most pragmatic approach in python to do this where n=5 or n=105
You can get the result with itertools.product, like this
>>> from itertools import product
>>> base = 3
>>> ["".join(item) for i in range(1, base) for item in product('abcde', repeat=i)]
['a',
'b',
'c',
'd',
'e',
'aa',
'ab',
'ac',
'ad',
'ae',
'ba',
'bb',
'bc',
'bd',
'be',
'ca',
'cb',
'cc',
'cd',
'ce',
'da',
'db',
'dc',
'dd',
'de',
'ea',
'eb',
'ec',
'ed',
'ee']
What is the most pragmatic approach in python to do this where n=5 or n=105
I would say, don't create the list at all. You might exhaust the computer's memory. Better use the iterator and use the value as and when you need it. That is exactly why product returns an iterator.
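As a rough sketch of that advice (the function name base_n_words and the max_places parameter are assumptions for illustration, not part of the question), the products for each length can be chained lazily so that only one string exists at a time:

```python
from itertools import chain, islice, product

def base_n_words(digits, max_places):
    # Lazily yield every string of 1..max_places digits, shortest first.
    per_length = (product(digits, repeat=places) for places in range(1, max_places + 1))
    return map(''.join, chain.from_iterable(per_length))

digits = ['a', 'b', 'c', 'd', 'e']

# Peek at the first few values without building the full list.
print(list(islice(base_n_words(digits, 4), 7)))

# The total count is n + n**2 + ... + n**max_places, so it can be
# computed without generating anything.
print(sum(len(digits) ** places for places in range(1, 5)))
```

With five digits and four places the count is 780; with n=105 it is well over a hundred million, which is why iterating beats materialising the list.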
I have the following letters:
Letters = ["a", "b", "c", "d", "e"]
What I would like is to write a generator function that will create strings that can be formed by taking a combination of any of the letters, preferably in some deterministic order like from smallest to biggest.
So for example if I were to run the generator 20 times I would get
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
cb
cc
cd
ce
da
How would I write this generator?
Generator function:
from itertools import count, product
def wordgen(letters):
    for n in count(1):
        yield from map(''.join, product(letters, repeat=n))
Usage:
for word in wordgen('abcde'):
    print(word)
Output:
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
...
A self-made alternative without using itertools:
def wordgen(letters):
    yield from letters
    for word in wordgen(letters):
        for letter in letters:
            yield word + letter
Golf-version (admittedly starts with the empty string):
def w(s):yield'';yield from(w+c for w in w(s)for c in s)
Use the combinations functions from the itertools library. There are versions both with replacement and without replacement.
import itertools
for item in itertools.combinations(Letters, 2):
    print("".join(item))
https://docs.python.org/3.4/library/itertools.html
Use itertools.product():
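from itertools import product
letters = ["a", "b", "c", "d", "e"]
# Python 3: the built-in map replaces itertools.imap.
# product() copies its input up front, so extending letters here is safe.
letters += map(''.join, product(letters, repeat=2))
print(letters)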
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc', 'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee']
I use a recursive generator function (without itertools)
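Letters = ["a", "b", "c", "d", "e"]
def my_generator(letters, first=""):
    # first yield every word that extends the prefix by one letter
    for letter in letters:
        yield first + letter
    # then create one sub-generator per longer prefix
    my_generators = []
    for letter in letters:
        my_generators.append(my_generator(letters, first + letter))
    i = 0
    while True:
        # take one full block of equal-length words from each sub-generator in turn
        for j in range(len(letters)**(i//len(letters)+1)):
            yield next(my_generators[i%len(letters)])
        i+=1
gen = my_generator(Letters)
[next(gen) for c in range(160)]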
you get
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb',
'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc',
'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee', 'aaa', 'aab', 'aac', 'aad',
'aae', 'aba', 'abb', 'abc', 'abd', 'abe', 'aca', 'acb', 'acc', 'acd',
'ace', 'ada', 'adb', 'adc', 'add', 'ade', 'aea', 'aeb', 'aec', 'aed',
'aee', 'baa', 'bab', 'bac', 'bad', 'bae', 'bba', 'bbb', 'bbc', 'bbd',
'bbe', 'bca', 'bcb', 'bcc', 'bcd', 'bce', 'bda', 'bdb', 'bdc', 'bdd',
'bde', 'bea', 'beb', 'bec', 'bed', 'bee', 'caa', 'cab', 'cac', 'cad',
'cae', 'cba', 'cbb', 'cbc', 'cbd', 'cbe', 'cca', 'ccb', 'ccc', 'ccd',
'cce', 'cda', 'cdb', 'cdc', 'cdd', 'cde', 'cea', 'ceb', 'cec', 'ced',
'cee', 'daa', 'dab', 'dac', 'dad', 'dae', 'dba', 'dbb', 'dbc', 'dbd',
'dbe', 'dca', 'dcb', 'dcc', 'dcd', 'dce', 'dda', 'ddb', 'ddc', 'ddd',
'dde', 'dea', 'deb', 'dec', 'ded', 'dee', 'eaa', 'eab', 'eac', 'ead',
'eae', 'eba', 'ebb', 'ebc', 'ebd', 'ebe', 'eca', 'ecb', 'ecc', 'ecd',
'ece', 'eda', 'edb', 'edc', 'edd', 'ede', 'eea', 'eeb', 'eec', 'eed',
'eee', 'aaaa', 'aaab', 'aaac', 'aaad', 'aaae']
I am trying to do the following. The outer product of an array [a,b; c,d] with itself can be described as a 4x4 array of 'strings' of length 2. So in the upper left corner of the 4x4 matrix, the values are aa, ab, ac, ad. What's the best way to generate these strings in numpy/python or matlab?
This is an example for just one outer product. The goal is to handle k successive outer products, that is the 4x4 matrix can be multiplied again by [a,b; c,d] and so on.
You can obtain @Jaime's result in a much simpler way using np.char.array():
import numpy as np
a = np.char.array(list('abcd'))
print(a[:,None]+a)
which gives:
chararray([['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']],
dtype='|S2')
Using a funky mix of itertools and numpy you could do:
>>> from itertools import product
>>> s = 'abcd' # s = ['a', 'b', 'c', 'd'] works the same
>>> np.fromiter((a+b for a, b in product(s, s)), dtype='S2',
...             count=len(s)*len(s)).reshape(len(s), len(s))
array([['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']],
dtype='|S2')
You can also avoid using numpy getting a little creative with itertools:
>>> from itertools import product, islice
>>> it = (a+b for a, b in product(s, s))
>>> [list(islice(it, len(s))) for j in range(len(s))]
[['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']]
You could use list comprehensions in Python:
array = [['a', 'b'], ['c', 'd']]
flatarray = [ x for row in array for x in row]
outerproduct = [[y+x for x in flatarray] for y in flatarray]
Output: [['aa', 'ab', 'ac', 'ad'], ['ba', 'bb', 'bc', 'bd'], ['ca', 'cb', 'cc', 'cd'], ['da', 'db', 'dc', 'dd']]
To continue the discussion after Jose Varz's answer:
def foo(A,B):
    flatA = [x for row in A for x in row]
    flatB = [x for row in B for x in row]
    outer = [[y+x for x in flatA] for y in flatB]
    return outer
In [265]: foo(A,A)
Out[265]:
[['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']]
In [268]: A3=np.array(foo(foo(A,A),A))
In [269]: A3
Out[269]:
array([['aaa', 'aab', 'aac', 'aad', 'aba', 'abb', 'abc', 'abd', 'aca',
'acb', 'acc', 'acd', 'ada', 'adb', 'adc', 'add'],
['baa', 'bab', 'bac', 'bad', 'bba', 'bbb', 'bbc', 'bbd', 'bca',
'bcb', 'bcc', 'bcd', 'bda', 'bdb', 'bdc', 'bdd'],
['caa', 'cab', 'cac', 'cad', 'cba', 'cbb', 'cbc', 'cbd', 'cca',
'ccb', 'ccc', 'ccd', 'cda', 'cdb', 'cdc', 'cdd'],
['daa', 'dab', 'dac', 'dad', 'dba', 'dbb', 'dbc', 'dbd', 'dca',
'dcb', 'dcc', 'dcd', 'dda', 'ddb', 'ddc', 'ddd']],
dtype='|S3')
In [270]: A3.reshape(4,4,4)
Out[270]:
array([[['aaa', 'aab', 'aac', 'aad'],
['aba', 'abb', 'abc', 'abd'],
['aca', 'acb', 'acc', 'acd'],
['ada', 'adb', 'adc', 'add']],
[['baa', 'bab', 'bac', 'bad'],
['bba', 'bbb', 'bbc', 'bbd'],
['bca', 'bcb', 'bcc', 'bcd'],
['bda', 'bdb', 'bdc', 'bdd']],
[['caa', 'cab', 'cac', 'cad'],
['cba', 'cbb', 'cbc', 'cbd'],
['cca', 'ccb', 'ccc', 'ccd'],
['cda', 'cdb', 'cdc', 'cdd']],
[['daa', 'dab', 'dac', 'dad'],
['dba', 'dbb', 'dbc', 'dbd'],
['dca', 'dcb', 'dcc', 'dcd'],
['dda', 'ddb', 'ddc', 'ddd']]],
dtype='|S3')
With this definition, np.array(foo(A,foo(A,A))).reshape(4,4,4) produces the same array.
In [285]: A3.reshape(8,8)
Out[285]:
array([['aaa', 'aab', 'aac', 'aad', 'aba', 'abb', 'abc', 'abd'],
['aca', 'acb', 'acc', 'acd', 'ada', 'adb', 'adc', 'add'],
['baa', 'bab', 'bac', 'bad', 'bba', 'bbb', 'bbc', 'bbd'],
['bca', 'bcb', 'bcc', 'bcd', 'bda', 'bdb', 'bdc', 'bdd'],
['caa', 'cab', 'cac', 'cad', 'cba', 'cbb', 'cbc', 'cbd'],
['cca', 'ccb', 'ccc', 'ccd', 'cda', 'cdb', 'cdc', 'cdd'],
['daa', 'dab', 'dac', 'dad', 'dba', 'dbb', 'dbc', 'dbd'],
['dca', 'dcb', 'dcc', 'dcd', 'dda', 'ddb', 'ddc', 'ddd']],
dtype='|S3')
Could it be that you want the Kronecker product of two char.arrays?
A quick adaptation of np.kron (numpy/lib/shape_base.py):
import numpy as np
def outer(a,b):
    # custom 'outer' for this issue
    # a,b must be np.char.array for '+' to be defined
    return a.ravel()[:, np.newaxis]+b.ravel()[np.newaxis,:]
def kron(a,b):
    # assume a,b are 2d char array
    # functionally same as np.kron, but using custom outer()
    result = outer(a, b).reshape(a.shape+b.shape)
    result = np.hstack(np.hstack(result))
    result = np.char.array(result)
    return result
A = np.char.array(list('abcd')).reshape(2,2)
produces:
A =>
[['a' 'b']
['c' 'd']]
outer(A,A) =>
[['aa' 'ab' 'ac' 'ad']
['ba' 'bb' 'bc' 'bd']
['ca' 'cb' 'cc' 'cd']
['da' 'db' 'dc' 'dd']]
kron(A,A) =>
[['aa' 'ab' 'ba' 'bb']
['ac' 'ad' 'bc' 'bd']
['ca' 'cb' 'da' 'db']
['cc' 'cd' 'dc' 'dd']]
kron rearranges the outer elements by reshaping it to (2,2,2,2), and then concatenating twice on axis=1.
kron(kron(A,A),A) =>
[['aaa' 'aab' 'aba' 'abb' 'baa' 'bab' 'bba' 'bbb']
['aac' 'aad' 'abc' 'abd' 'bac' 'bad' 'bbc' 'bbd']
['aca' 'acb' 'ada' 'adb' 'bca' 'bcb' 'bda' 'bdb']
['acc' 'acd' 'adc' 'add' 'bcc' 'bcd' 'bdc' 'bdd']
['caa' 'cab' 'cba' 'cbb' 'daa' 'dab' 'dba' 'dbb']
['cac' 'cad' 'cbc' 'cbd' 'dac' 'dad' 'dbc' 'dbd']
['cca' 'ccb' 'cda' 'cdb' 'dca' 'dcb' 'dda' 'ddb']
['ccc' 'ccd' 'cdc' 'cdd' 'dcc' 'dcd' 'ddc' 'ddd']]
kron(kron(kron(A,A),A),A) =>
# (16,16)
[['aaaa' 'aaab' 'aaba' 'aabb'...]
['aaac' 'aaad' 'aabc' 'aabd'...]
['aaca' 'aacb' 'aada' 'aadb'...]
['aacc' 'aacd' 'aadc' 'aadd'...]
...]
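The rearrangement that kron performs can also be written out with plain index arithmetic, which may make the ordering easier to follow. This is only a sketch on nested lists of strings (the name kron_strings is made up here, and it does not use numpy): it places a[i][j] + b[k][l] at row i*rows_b + k and column j*cols_b + l, which is the layout shown for kron(A,A) above.

```python
def kron_strings(a, b):
    # a and b are rectangular nested lists of strings.
    rows_b, cols_b = len(b), len(b[0])
    return [
        # Column order: outer loop over a's columns, inner loop over b's columns.
        [a[i][j] + b[k][l] for j in range(len(a[0])) for l in range(cols_b)]
        # Row order: outer loop over a's rows, inner loop over b's rows.
        for i in range(len(a))
        for k in range(rows_b)
    ]

A = [['a', 'b'], ['c', 'd']]
for row in kron_strings(A, A):
    print(row)
```

Applying it again, kron_strings(kron_strings(A, A), A), gives the 8x8 layout of three-letter strings.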
I don't know of a better way to word what I'm looking for, so please bear with me.
Let's say that I have a list of 17 elements. For the sake of brevity we'll represent this list as ABCDEFGHIJKLMNOPQ. If I wanted to divide this into 7 sufficiently "even" sub-lists, it might look like this:
ABC DE FGH IJ KL MNO PQ
Here, the lengths of each sub-list are 3, 2, 3, 2, 2, 3, 2. The maximum length is only one more than the minimum length: ABC DE FGH I JKL MN OPQ has seven sub-lists as well, but the range of lengths is two here.
Furthermore, examine how many 2's separate each pair of 3's: this follows the same rule of RANGE ≤ 1. The range of lengths in ABC DEF GH IJ KLM NO PQ is 1 as well, but they are imbalanced: 3, 3, 2, 2, 3, 2, 2. Ideally, if one were to keep reducing the sub-list in such a fashion, the numbers would never deviate from one another by more than one.
Of course, there is more than one way to "evenly" divide a list into sub-lists in this fashion. I'm not looking for an exhaustive set of solutions - if I can get one solution in Python for a list of any length and any number of sub-lists, that's good enough for me. The problem is that I don't even know where to begin when solving such a problem. Does anyone know what I'm looking for?
>>> s='ABCDEFGHIJKLMNOPQ'
>>> parts=7
>>> [s[i*len(s)//parts:(i+1)*len(s)//parts] for i in range(parts)]
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
>>> import string
>>> for j in range(26):
...     print([string.ascii_uppercase[i*j//parts:(i+1)*j//parts] for i in range(parts)])
...
['', '', '', '', '', '', '']
['', '', '', '', '', '', 'A']
['', '', '', 'A', '', '', 'B']
['', '', 'A', '', 'B', '', 'C']
['', 'A', '', 'B', '', 'C', 'D']
['', 'A', 'B', '', 'C', 'D', 'E']
['', 'A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'E', 'F', 'GH']
['A', 'B', 'C', 'DE', 'F', 'G', 'HI']
['A', 'B', 'CD', 'E', 'FG', 'H', 'IJ']
['A', 'BC', 'D', 'EF', 'G', 'HI', 'JK']
['A', 'BC', 'DE', 'F', 'GH', 'IJ', 'KL']
['A', 'BC', 'DE', 'FG', 'HI', 'JK', 'LM']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MNO']
['AB', 'CD', 'EF', 'GHI', 'JK', 'LM', 'NOP']
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
['AB', 'CDE', 'FG', 'HIJ', 'KL', 'MNO', 'PQR']
['AB', 'CDE', 'FGH', 'IJ', 'KLM', 'NOP', 'QRS']
['AB', 'CDE', 'FGH', 'IJK', 'LMN', 'OPQ', 'RST']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STUV']
['ABC', 'DEF', 'GHI', 'JKLM', 'NOP', 'QRS', 'TUVW']
['ABC', 'DEF', 'GHIJ', 'KLM', 'NOPQ', 'RST', 'UVWX']
['ABC', 'DEFG', 'HIJ', 'KLMN', 'OPQ', 'RSTU', 'VWXY']
If you have a list of length N, and you want some number of sub-lists S, it seems to me that you should start with a division with remainder. For N == 17 and S == 7, you have 17 // 7 == 2 and 17 % 7 == 3. So you can start with 7 length values of 2, but know that you need to increment 3 of the length values by 1 to handle the remainder. Since your list of length values is length 7, and you have 3 values to increment, you could compute X = 7 / 3 and use that as a stride: increment the 0th item, then the int(X) item, the int(2*X) item, and so on.
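A minimal sketch of the stride idea described above. It assumes integer arithmetic (k * S // R, where R is the remainder) in place of int(k * X) to sidestep floating-point rounding, and the names balanced_lengths and split_evenly are invented for this example.

```python
def balanced_lengths(n, parts):
    # Start every sub-list at the floor length, then spread the remainder.
    base, extra = divmod(n, parts)
    lengths = [base] * parts
    for k in range(extra):
        # Equivalent to int(k * parts / extra): a stride of parts/extra
        # between the sub-lists that receive one extra element.
        lengths[k * parts // extra] += 1
    return lengths

def split_evenly(seq, parts):
    # Cut seq into consecutive slices of the computed lengths.
    out, start = [], 0
    for length in balanced_lengths(len(seq), parts):
        out.append(seq[start:start + length])
        start += length
    return out

print(balanced_lengths(17, 7))
print(split_evenly('ABCDEFGHIJKLMNOPQ', 7))
```

For N == 17 and S == 7 this yields lengths 3, 2, 3, 2, 3, 2, 2, so the three longer sub-lists sit at positions 0, 2 and 4.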
If that doesn't work for you, I suggest you get a book called The Algorithm Design Manual by Skiena, and look through the set and tree algorithms.
http://www.algorist.com/
See the "grouper" example at http://docs.python.org/library/itertools.html