Here is the string:
st = 'abcdfdedf'
I want to split the string into 3 substrings and get all the possible splits, like this:
result = [['abc','dfd','edf'],['abcd','f','dedf'], ...]
Note: Order is not important here.
Unless I misunderstand the question, it is a simple matter of keeping track of the two split positions. For example:
from pprint import pprint
st = 'abcdfdedf'
n = len(st)
splits = []
for i in range(1, n-1):
    for j in range(i+1, n):
        splits.append([st[0:i], st[i:j], st[j:]])
pprint(splits)
which results in
[['a', 'b', 'cdfdedf'],
['a', 'bc', 'dfdedf'],
['a', 'bcd', 'fdedf'],
['a', 'bcdf', 'dedf'],
['a', 'bcdfd', 'edf'],
['a', 'bcdfde', 'df'],
['a', 'bcdfded', 'f'],
['ab', 'c', 'dfdedf'],
['ab', 'cd', 'fdedf'],
['ab', 'cdf', 'dedf'],
['ab', 'cdfd', 'edf'],
['ab', 'cdfde', 'df'],
['ab', 'cdfded', 'f'],
['abc', 'd', 'fdedf'],
['abc', 'df', 'dedf'],
['abc', 'dfd', 'edf'],
['abc', 'dfde', 'df'],
['abc', 'dfded', 'f'],
['abcd', 'f', 'dedf'],
['abcd', 'fd', 'edf'],
['abcd', 'fde', 'df'],
['abcd', 'fded', 'f'],
['abcdf', 'd', 'edf'],
['abcdf', 'de', 'df'],
['abcdf', 'ded', 'f'],
['abcdfd', 'e', 'df'],
['abcdfd', 'ed', 'f'],
['abcdfde', 'd', 'f']]
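If the number of pieces might change, the same idea of tracking cut positions can probably be generalised with itertools.combinations. This is only a sketch: the helper name split_into and the parameter k are made up for illustration and are not part of the answer above.

```python
from itertools import combinations

def split_into(s, k):
    """Yield every way to cut s into k non-empty contiguous pieces."""
    # Choose k-1 distinct cut positions strictly inside the string.
    for cuts in combinations(range(1, len(s)), k - 1):
        bounds = (0, *cuts, len(s))
        # Pair each boundary with the next one to slice out the pieces.
        yield [s[a:b] for a, b in zip(bounds, bounds[1:])]

splits = list(split_into('abcdfdedf', 3))
print(len(splits))   # 28
print(splits[0])     # ['a', 'b', 'cdfdedf']
```

For k=3 this should give the same 28 splits, in the same order, as the nested loops.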
I need a list of unique 6-character elements, like 000001, 000002, 000003, etc. They don't necessarily have to be digits; they can be strings, like AAAAAA, AAAAAB, ABCDEF, etc.
If I generate a list with np.arange() I won't have 6-character elements. So far I have only come up with nested 'for' loops like
but I think there are much more convenient ways to do this.
You need the Cartesian product of the string "ABCDEF" with itself, taken five times (in other words, the product of six identical strings). It can be calculated with the product() function from the itertools module. The product yields 6-tuples of individual characters, which are converted to strings with join().
from itertools import product
symbols = "ABCDEF"
[''.join(x) for x in product(*([symbols] * len(symbols)))]
#['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE',
# 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD',...
# 'FFFFFA', 'FFFFFB', 'FFFFFC', 'FFFFFD', 'FFFFFE', 'FFFFFF']
You can change the value of symbols to any other combination of distinct characters.
You can use the function combinations_with_replacement():
from itertools import combinations_with_replacement
list(map(''.join, combinations_with_replacement('ABC', r=3)))
# ['AAA', 'AAB', 'AAC', 'ABB', 'ABC', 'ACC', 'BBB', 'BBC', 'BCC', 'CCC']
If you need all possible combinations use the function product():
from itertools import product
list(map(''.join, product('ABC', repeat=3)))
# ['AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB', 'ACC', 'BAA', 'BAB', 'BAC', 'BBA', 'BBB', 'BBC', 'BCA', 'BCB', 'BCC', 'CAA', 'CAB', 'CAC', 'CBA', 'CBB', 'CBC', 'CCA', 'CCB', 'CCC']
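To see how the two functions differ in size, here is a small check of the counts. It assumes Python 3.8+ for math.comb; the variable names are just for illustration.

```python
from itertools import combinations_with_replacement, product
from math import comb

symbols, r = 'ABC', 3
n = len(symbols)

# Order ignored: each multiset of letters appears once.
unordered = list(combinations_with_replacement(symbols, r))
# Order matters: every arrangement appears.
ordered = list(product(symbols, repeat=r))

print(len(unordered), comb(n + r - 1, r))  # 10 10
print(len(ordered), n ** r)                # 27 27
```

So for the original 6-symbol, 6-character case, product() would give 6**6 = 46656 strings.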
You can use np.unravel_index to get an index array:
import numpy as np
idx = np.array(np.unravel_index(np.arange(30000), 6*(6,)), order='F').T
idx
# array([[0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 1],
# [0, 0, 0, 0, 0, 2],
# ...,
# [3, 5, 0, 5, 1, 3],
# [3, 5, 0, 5, 1, 4],
# [3, 5, 0, 5, 1, 5]])
You can replace the indices with more or less anything you like afterwards:
symbols = np.fromiter('ABCDEF', 'U1')
symbols
# array(['A', 'B', 'C', 'D', 'E', 'F'], dtype='<U1')
symbols[idx]
# array([['A', 'A', 'A', 'A', 'A', 'A'],
# ['A', 'A', 'A', 'A', 'A', 'B'],
# ['A', 'A', 'A', 'A', 'A', 'C'],
# ...,
# ['D', 'F', 'A', 'F', 'B', 'D'],
# ['D', 'F', 'A', 'F', 'B', 'E'],
# ['D', 'F', 'A', 'F', 'B', 'F']], dtype='<U1')
If you need the result as a list of words:
final = symbols[idx].view('U6').ravel().tolist()
final[:20]
# ['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE', 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD', 'AAAABE', 'AAAABF', 'AAAACA', 'AAAACB', 'AAAACC', 'AAAACD', 'AAAACE', 'AAAACF', 'AAAADA', 'AAAADB']
I have the following letters:
Letters = ["a", "b", "c", "d", "e"]
What I would like is to write a generator function that will create strings that can be formed by taking a combination of any of the letters, preferably in some deterministic order like from smallest to biggest.
So for example if I were to run the generator 20 times I would get
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
cb
cc
cd
ce
da
How would I write this generator?
Generator function:
from itertools import count, product
def wordgen(letters):
    for n in count(1):
        yield from map(''.join, product(letters, repeat=n))
Usage:
for word in wordgen('abcde'):
    print(word)
Output:
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
...
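Since the generator never stops, the loop above runs forever. One way to take only the first few words is itertools.islice; this is just a usage sketch that repeats the generator so it runs on its own (the name first_20 is made up).

```python
from itertools import count, islice, product

def wordgen(letters):
    # Words of length 1, then length 2, and so on, without end.
    for n in count(1):
        yield from map(''.join, product(letters, repeat=n))

# islice stops after 20 items without exhausting the generator.
first_20 = list(islice(wordgen('abcde'), 20))
print(first_20)
```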
A self-made alternative without using itertools:
def wordgen(letters):
    yield from letters
    for word in wordgen(letters):
        for letter in letters:
            yield word + letter
Golf-version (admittedly starts with the empty string):
def w(s):yield'';yield from(w+c for w in w(s)for c in s)
Use the combinations functions from the itertools library. There are versions both with and without replacement:
import itertools
for item in itertools.combinations(Letters, 2):
    print("".join(item))
https://docs.python.org/3.4/library/itertools.html
Use itertools.product():
from itertools import product
letters = ["a", "b", "c", "d", "e"]
letters += map(''.join, product(letters, repeat=2))
print(letters)
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc', 'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee']
I use a recursive generator function (without itertools):
Letters = ["a", "b", "c", "d", "e"]
def my_generator(list, first=""):
    for letter in list:
        yield first + letter
    my_generators = []
    for letter in list:
        my_generators.append(my_generator(list, first + letter))
    i = 0
    while True:
        for j in range(len(list)**(i//len(list)+1)):
            yield next(my_generators[i%len(list)])
        i+=1
gen = my_generator(Letters)
[next(gen) for c in range(160)]
you get
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb',
'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc',
'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee', 'aaa', 'aab', 'aac', 'aad',
'aae', 'aba', 'abb', 'abc', 'abd', 'abe', 'aca', 'acb', 'acc', 'acd',
'ace', 'ada', 'adb', 'adc', 'add', 'ade', 'aea', 'aeb', 'aec', 'aed',
'aee', 'baa', 'bab', 'bac', 'bad', 'bae', 'bba', 'bbb', 'bbc', 'bbd',
'bbe', 'bca', 'bcb', 'bcc', 'bcd', 'bce', 'bda', 'bdb', 'bdc', 'bdd',
'bde', 'bea', 'beb', 'bec', 'bed', 'bee', 'caa', 'cab', 'cac', 'cad',
'cae', 'cba', 'cbb', 'cbc', 'cbd', 'cbe', 'cca', 'ccb', 'ccc', 'ccd',
'cce', 'cda', 'cdb', 'cdc', 'cdd', 'cde', 'cea', 'ceb', 'cec', 'ced',
'cee', 'daa', 'dab', 'dac', 'dad', 'dae', 'dba', 'dbb', 'dbc', 'dbd',
'dbe', 'dca', 'dcb', 'dcc', 'dcd', 'dce', 'dda', 'ddb', 'ddc', 'ddd',
'dde', 'dea', 'deb', 'dec', 'ded', 'dee', 'eaa', 'eab', 'eac', 'ead',
'eae', 'eba', 'ebb', 'ebc', 'ebd', 'ebe', 'eca', 'ecb', 'ecc', 'ecd',
'ece', 'eda', 'edb', 'edc', 'edd', 'ede', 'eea', 'eeb', 'eec', 'eed',
'eee', 'aaaa', 'aaab', 'aaac', 'aaad', 'aaae']
I am trying to do the following. The outer product of an array [a,b; c,d] with itself can be described as a 4x4 array of 'strings' of length 2. So in the upper left corner of the 4x4 matrix, the values are aa, ab, ac, ad. What's the best way to generate these strings in numpy/python or matlab?
This is an example for just one outer product. The goal is to handle k successive outer products, that is the 4x4 matrix can be multiplied again by [a,b; c,d] and so on.
You can obtain @Jaime's result in a much simpler way using np.char.array():
import numpy as np
a = np.char.array(list('abcd'))
print(a[:,None]+a)
which gives:
chararray([['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']],
dtype='|S2')
Using a funky mix of itertools and numpy you could do:
>>> from itertools import product
>>> s = 'abcd' # s = ['a', 'b', 'c', 'd'] works the same
>>> np.fromiter((a+b for a, b in product(s, s)), dtype='S2',
count=len(s)*len(s)).reshape(len(s), len(s))
array([['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']],
dtype='|S2')
You can also avoid using numpy getting a little creative with itertools:
>>> from itertools import product, islice
>>> it = (a+b for a, b in product(s, s))
>>> [list(islice(it, len(s))) for j in range(len(s))]
[['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']]
You could use list comprehensions in Python:
array = [['a', 'b'], ['c', 'd']]
flatarray = [ x for row in array for x in row]
outerproduct = [[y+x for x in flatarray] for y in flatarray]
Output: [['aa', 'ab', 'ac', 'ad'], ['ba', 'bb', 'bc', 'bd'], ['ca', 'cb', 'cc', 'cd'], ['da', 'db', 'dc', 'dd']]
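The question also asks about k successive outer products. A rough pure-Python sketch of how the comprehension above might be repeated is below; the names flatten and repeated_outer are invented for this example, and each step flattens the running result before combining it with the original letters.

```python
def flatten(rows):
    """Turn a list of rows into one flat list."""
    return [x for row in rows for x in row]

def repeated_outer(array, k):
    """Apply the string outer product with `array` k times (k >= 1)."""
    base = flatten(array)
    result = array
    for _ in range(k):
        left = flatten(result)
        # One row per element of the running result, one column per base letter.
        result = [[x + y for y in base] for x in left]
    return result

array = [['a', 'b'], ['c', 'd']]
once = repeated_outer(array, 1)
twice = repeated_outer(array, 2)
print(once[0])                     # ['aa', 'ab', 'ac', 'ad']
print(len(twice), len(twice[0]))   # 16 4
```

After k steps there are 4**(k+1) strings of length k+1; how to arrange them (16x4, 4x4x4, 8x8, ...) is a separate reshaping decision.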
To continue the discussion after Jose Varz's answer:
def foo(A,B):
    flatA = [x for row in A for x in row]
    flatB = [x for row in B for x in row]
    outer = [[y+x for x in flatA] for y in flatB]
    return outer
In [265]: foo(A,A)
Out[265]:
[['aa', 'ab', 'ac', 'ad'],
['ba', 'bb', 'bc', 'bd'],
['ca', 'cb', 'cc', 'cd'],
['da', 'db', 'dc', 'dd']]
In [268]: A3=np.array(foo(foo(A,A),A))
In [269]: A3
Out[269]:
array([['aaa', 'aab', 'aac', 'aad', 'aba', 'abb', 'abc', 'abd', 'aca',
'acb', 'acc', 'acd', 'ada', 'adb', 'adc', 'add'],
['baa', 'bab', 'bac', 'bad', 'bba', 'bbb', 'bbc', 'bbd', 'bca',
'bcb', 'bcc', 'bcd', 'bda', 'bdb', 'bdc', 'bdd'],
['caa', 'cab', 'cac', 'cad', 'cba', 'cbb', 'cbc', 'cbd', 'cca',
'ccb', 'ccc', 'ccd', 'cda', 'cdb', 'cdc', 'cdd'],
['daa', 'dab', 'dac', 'dad', 'dba', 'dbb', 'dbc', 'dbd', 'dca',
'dcb', 'dcc', 'dcd', 'dda', 'ddb', 'ddc', 'ddd']],
dtype='|S3')
In [270]: A3.reshape(4,4,4)
Out[270]:
array([[['aaa', 'aab', 'aac', 'aad'],
['aba', 'abb', 'abc', 'abd'],
['aca', 'acb', 'acc', 'acd'],
['ada', 'adb', 'adc', 'add']],
[['baa', 'bab', 'bac', 'bad'],
['bba', 'bbb', 'bbc', 'bbd'],
['bca', 'bcb', 'bcc', 'bcd'],
['bda', 'bdb', 'bdc', 'bdd']],
[['caa', 'cab', 'cac', 'cad'],
['cba', 'cbb', 'cbc', 'cbd'],
['cca', 'ccb', 'ccc', 'ccd'],
['cda', 'cdb', 'cdc', 'cdd']],
[['daa', 'dab', 'dac', 'dad'],
['dba', 'dbb', 'dbc', 'dbd'],
['dca', 'dcb', 'dcc', 'dcd'],
['dda', 'ddb', 'ddc', 'ddd']]],
dtype='|S3')
With this definition, np.array(foo(A,foo(A,A))).reshape(4,4,4) produces the same array.
In [285]: A3.reshape(8,8)
Out[285]:
array([['aaa', 'aab', 'aac', 'aad', 'aba', 'abb', 'abc', 'abd'],
['aca', 'acb', 'acc', 'acd', 'ada', 'adb', 'adc', 'add'],
['baa', 'bab', 'bac', 'bad', 'bba', 'bbb', 'bbc', 'bbd'],
['bca', 'bcb', 'bcc', 'bcd', 'bda', 'bdb', 'bdc', 'bdd'],
['caa', 'cab', 'cac', 'cad', 'cba', 'cbb', 'cbc', 'cbd'],
['cca', 'ccb', 'ccc', 'ccd', 'cda', 'cdb', 'cdc', 'cdd'],
['daa', 'dab', 'dac', 'dad', 'dba', 'dbb', 'dbc', 'dbd'],
['dca', 'dcb', 'dcc', 'dcd', 'dda', 'ddb', 'ddc', 'ddd']],
dtype='|S3')
Could it be that you want the Kronecker product of two char.arrays?
A quick adaptation of np.kron (numpy/lib/shape_base.py):
import numpy as np
def outer(a,b):
    # custom 'outer' for this issue
    # a,b must be np.char.array for '+' to be defined
    return a.ravel()[:, np.newaxis]+b.ravel()[np.newaxis,:]
def kron(a,b):
    # assume a,b are 2d char array
    # functionally same as np.kron, but using custom outer()
    result = outer(a, b).reshape(a.shape+b.shape)
    result = np.hstack(np.hstack(result))
    result = np.char.array(result)
    return result
A = np.char.array(list('abcd')).reshape(2,2)
produces:
A =>
[['a' 'b']
['c' 'd']]
outer(A,A) =>
[['aa' 'ab' 'ac' 'ad']
['ba' 'bb' 'bc' 'bd']
['ca' 'cb' 'cc' 'cd']
['da' 'db' 'dc' 'dd']]
kron(A,A) =>
[['aa' 'ab' 'ba' 'bb']
['ac' 'ad' 'bc' 'bd']
['ca' 'cb' 'da' 'db']
['cc' 'cd' 'dc' 'dd']]
kron rearranges the outer elements by reshaping it to (2,2,2,2), and then concatenating twice on axis=1.
kron(kron(A,A),A) =>
[['aaa' 'aab' 'aba' 'abb' 'baa' 'bab' 'bba' 'bbb']
['aac' 'aad' 'abc' 'abd' 'bac' 'bad' 'bbc' 'bbd']
['aca' 'acb' 'ada' 'adb' 'bca' 'bcb' 'bda' 'bdb']
['acc' 'acd' 'adc' 'add' 'bcc' 'bcd' 'bdc' 'bdd']
['caa' 'cab' 'cba' 'cbb' 'daa' 'dab' 'dba' 'dbb']
['cac' 'cad' 'cbc' 'cbd' 'dac' 'dad' 'dbc' 'dbd']
['cca' 'ccb' 'cda' 'cdb' 'dca' 'dcb' 'dda' 'ddb']
['ccc' 'ccd' 'cdc' 'cdd' 'dcc' 'dcd' 'ddc' 'ddd']]
kron(kron(kron(A,A),A),A) =>
# (16,16)
[['aaaa' 'aaab' 'aaba' 'aabb'...]
['aaac' 'aaad' 'aabc' 'aabd'...]
['aaca' 'aacb' 'aada' 'aadb'...]
['aacc' 'aacd' 'aadc' 'aadd'...]
...]
I don't know of a better way to word what I'm looking for, so please bear with me.
Let's say that I have a list of 17 elements. For the sake of brevity we'll represent this list as ABCDEFGHIJKLMNOPQ. If I wanted to divide this into 7 sufficiently "even" sub-lists, it might look like this:
ABC DE FGH IJ KL MNO PQ
Here, the lengths of each sub-list are 3, 2, 3, 2, 2, 3, 2. The maximum length is only one more than the minimum length: ABC DE FGH I JKL MN OPQ has seven sub-lists as well, but the range of lengths is two here.
Furthermore, examine how many 2's separate each pair of 3's: this follows the same rule of RANGE ≤ 1. The range of lengths in ABC DEF GH IJ KLM NO PQ is 1 as well, but they are imbalanced: 3, 3, 2, 2, 3, 2, 2. Ideally, if one were to keep reducing the sub-list in such a fashion, the numbers would never deviate from one another by more than one.
Of course, there is more than one way to "evenly" divide a list into sub-lists in this fashion. I'm not looking for an exhaustive set of solutions - if I can get one solution in Python for a list of any length and any number of sub-lists, that's good enough for me. The problem is that I don't even know where to begin when solving such a problem. Does anyone know what I'm looking for?
>>> s='ABCDEFGHIJKLMNOPQ'
>>> parts=7
>>> [s[i*len(s)//parts:(i+1)*len(s)//parts] for i in range(parts)]
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
>>> import string
>>> for j in range(26):
...     print([string.ascii_uppercase[i*j//parts:(i+1)*j//parts] for i in range(parts)])
...
['', '', '', '', '', '', '']
['', '', '', '', '', '', 'A']
['', '', '', 'A', '', '', 'B']
['', '', 'A', '', 'B', '', 'C']
['', 'A', '', 'B', '', 'C', 'D']
['', 'A', 'B', '', 'C', 'D', 'E']
['', 'A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'E', 'F', 'GH']
['A', 'B', 'C', 'DE', 'F', 'G', 'HI']
['A', 'B', 'CD', 'E', 'FG', 'H', 'IJ']
['A', 'BC', 'D', 'EF', 'G', 'HI', 'JK']
['A', 'BC', 'DE', 'F', 'GH', 'IJ', 'KL']
['A', 'BC', 'DE', 'FG', 'HI', 'JK', 'LM']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MNO']
['AB', 'CD', 'EF', 'GHI', 'JK', 'LM', 'NOP']
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
['AB', 'CDE', 'FG', 'HIJ', 'KL', 'MNO', 'PQR']
['AB', 'CDE', 'FGH', 'IJ', 'KLM', 'NOP', 'QRS']
['AB', 'CDE', 'FGH', 'IJK', 'LMN', 'OPQ', 'RST']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STUV']
['ABC', 'DEF', 'GHI', 'JKLM', 'NOP', 'QRS', 'TUVW']
['ABC', 'DEF', 'GHIJ', 'KLM', 'NOPQ', 'RST', 'UVWX']
['ABC', 'DEFG', 'HIJ', 'KLMN', 'OPQ', 'RSTU', 'VWXY']
If you have a list of length N, and you want some number of sub-lists S, it seems to me that you should start with a division with remainder. For N == 17 and S == 7, you have 17 // 7 == 2 and 17 % 7 == 3. So you can start with 7 length values of 2, but know that you need to increment 3 of the length values by 1 to handle the remainder. Since your list of length values is length 7, and you have 3 values to increment, you could compute X = 7 / 3 and use that as a stride: increment the 0th item, then the int(X) item, the int(2*X) item, and so on.
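A minimal sketch of that stride idea, under a couple of assumptions: the function name even_split is made up, and the index int(m*X) is computed with integer arithmetic as m * parts // extra, which is the same floor without floating-point rounding.

```python
def even_split(seq, parts):
    """Split seq into `parts` contiguous chunks whose lengths differ by at most 1."""
    base, extra = divmod(len(seq), parts)   # e.g. 17 // 7 == 2, 17 % 7 == 3
    lengths = [base] * parts
    for m in range(extra):
        # Spread the `extra` longer chunks out with stride parts / extra.
        lengths[m * parts // extra] += 1
    chunks, start = [], 0
    for n in lengths:
        chunks.append(seq[start:start + n])
        start += n
    return chunks

print(even_split('ABCDEFGHIJKLMNOPQ', 7))
# ['ABC', 'DE', 'FGH', 'IJ', 'KLM', 'NO', 'PQ']
```

Because extra is always smaller than parts, the stride is greater than 1, so no chunk gets incremented twice.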
If that doesn't work for you, I suggest you get a book called The Algorithm Design Manual by Skiena, and look through the set and tree algorithms.
http://www.algorist.com/
See the "grouper" example at http://docs.python.org/library/itertools.html