Break a string into sequential substrings? [duplicate]


This question already has answers here:
How To Get All The Contiguous Substrings Of A String In Python?
(9 answers)
Closed 5 years ago.
I have a string "BANANA". I would like to generate a list of all possible sequential substrings:
[B, BA, BAN, BANA, BANAN, BANANA, A, AN, ANA, ...]
Is this something I can accomplish using a Python List Comprehension or would I just generate them in a brute force manner? Note: I am new to Python. TIA

Using a list comprehension:
s = "BANANA"
l = len(s)
ar = [s[i:j+1] for i in range(l) for j in range(i, l)]
print(*ar)
Using a nested loop:
s = "BANANA"
l = len(s)
ar = []
for i in range(l):
    for j in range(i, l):
        ar.append(s[i:j+1])
print(*ar)
Both output:
B BA BAN BANA BANAN BANANA A AN ANA ANAN ANANA N NA NAN NANA A AN ANA N NA A
N.B.: The itertools approach has already been explained in A.J.'s answer.

Try the following with itertools. Note that product builds every ordered selection of characters (with repetition), so the result contains more than just the contiguous substrings:
import itertools
s = "BANANA"
result = [[''.join(j) for j in itertools.product(s, repeat=i)] for i in range(1, len(s)+1)]
>>> result[0]
['B', 'A', 'N', 'A', 'N', 'A']
>>> result[1]
['BB', 'BA', 'BN', 'BA', 'BN', 'BA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA', 'NB', 'NA', 'NN', 'NA', 'NN', 'NA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA', 'NB', 'NA', 'NN', 'NA', 'NN', 'NA', 'AB', 'AA', 'AN', 'AA', 'AN', 'AA']
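For contiguous substrings specifically, one itertools-based option is to choose pairs of slice boundaries with combinations instead of choosing characters with product. This is a minimal sketch; the function name contiguous_substrings is only illustrative and does not come from any of the answers here.

```python
from itertools import combinations

def contiguous_substrings(s):
    """Return every contiguous substring of s, ordered by start index, then length."""
    # Each pair (i, j) with i < j marks exactly one slice s[i:j].
    return [s[i:j] for i, j in combinations(range(len(s) + 1), 2)]

subs = contiguous_substrings("BANANA")
print(subs[:6])        # ['B', 'BA', 'BAN', 'BANA', 'BANAN', 'BANANA']
print(len(subs))       # 21
print(len(set(subs)))  # 15 distinct substrings
```

A string of length n has n*(n+1)/2 such slices, which is why "BANANA" yields 21 of them before duplicates are removed.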

If you want all the possible sublists, you can use two for clauses in one list comprehension:
def sublists(lst):
    return [lst[m:n+1] for m in range(len(lst)) for n in range(m, len(lst))]
sublists("banana")
=> ['b', 'ba', 'ban', 'bana', 'banan', 'banana', 'a', 'an', 'ana', 'anan', 'anana', 'n', 'na', 'nan', 'nana', 'a', 'an', 'ana', 'n', 'na', 'a']
If you don't want repeated elements (a set is unordered, so the order of the result may vary):
def sublistsWithoutRepeated(lst):
    return list(set(sublists(lst)))
sublistsWithoutRepeated("banana")
=> ['a', 'b', 'ba', 'nana', 'na', 'nan', 'an', 'anana', 'anan', 'n', 'bana', 'ban', 'banan', 'banana', 'ana']

Related

Need to compare a portion of one list with another and see if they have the same numeric order and if not see the elements that are in other position

I'm trying to compare one list, lst, with another, lst2, to see whether the values in lst correspond to a portion of lst2 in the same order and, if they do not, to return the values that are out of position.
These are the examples:
lst = ['a', 'b', 'd', 'c', 'e']
lst2 = ['DD', 'OO', 'CC' ,'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AA' 'QQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
Suppose the values in lst will change: it will have a different length, with other string values added in the near future. The positions of 'GG' and 'AA' in lst2 will not change; only the values between them (currently 'a' to 'e', as in lst) will change, but the process will be the same.
Is it better to use pandas DataFrames, "\n".join() on string columns, or just plain lists?

The question is not entirely clear, but as I understand it, you want to find the elements in lst2 that do not appear in lst.
I checked both of the following approaches against the example inputs you provided:
lst = ['a', 'b', 'd', 'c', 'e']
lst2 = ['DD', 'OO', 'CC' ,'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AA' 'QQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
If the order of appearance must be the same in both lists, you can use a trick based on string manipulation, like this:
lst_str = ' '.join(lst)
lst2_str = ' '.join(lst2)
index_found = lst2_str.find(lst_str)
lst4 = lst2_str.split()
if index_found != -1:
    lst4 = (lst2_str[0:index_found] + lst2_str[index_found+len(lst_str):]).split()
Output:
['DD', 'OO', 'CC', 'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AAQQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
This approach assumes that the list elements contain no whitespace (you can of course use another separator).
Otherwise, if the order does not matter, a simple list comprehension will do the job:
lst3 = [item for item in lst2 if item not in lst]
Output:
['DD', 'OO', 'CC', 'WW', 'GG', 'AAQQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
The two approaches give different outputs because the order of the elements in lst differs from their order in lst2: the first approach finds no match and removes nothing, while the second removes every shared element regardless of order.
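The same check can also be done on the lists directly, without joining them into strings. The sketch below is one possible reading of the question, not the asker's confirmed requirement: find_sublist and out_of_place are hypothetical helper names, lst2 is shortened for readability, and the window is assumed to start right after 'GG'.

```python
def find_sublist(needle, haystack):
    """Return the start index of needle as a contiguous run in haystack, or -1."""
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if haystack[start:start + n] == needle:
            return start
    return -1

def out_of_place(needle, haystack, start):
    """Return the items of needle that differ from haystack at the same offset."""
    window = haystack[start:start + len(needle)]
    return [a for a, b in zip(needle, window) if a != b]

lst = ['a', 'b', 'd', 'c', 'e']
lst2 = ['GG', 'a', 'b', 'c', 'd', 'e', 'AA']  # shortened example data

print(find_sublist(lst, lst2))                        # -1, because the order differs
print(out_of_place(lst, lst2, lst2.index('GG') + 1))  # ['d', 'c']
```

Working on the lists avoids the separator caveat of the string-based approach, since elements are compared whole.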

How can I move an item in a numpy array?

I have a numpy array -
['L', 'London', 'M', 'Moscow', 'NYC', 'Paris', 'nan']
I want 'nan' to be first, like so:
['nan', 'L', 'London', 'M', 'Moscow', 'NYC', 'Paris']
How can I do that?

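If you want to use numpy, you can use numpy.roll, which shifts every element by one position and wraps the last element around to the front:
import numpy as np
a = np.array(['L', 'London', 'M', 'Moscow', 'NYC', 'Paris', 'nan'])
a = np.roll(a, 1)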
print(a)
Prints:
['nan' 'L' 'London' 'M' 'Moscow' 'NYC' 'Paris']

Get all possible combination of values in a list -- Python

I have a list containing ['a', 'bill', 'smith'], and I would like to write Python code to obtain all possible combinations that meet a certain criterion. More precisely, I would like to obtain combinations of those three elements plus the first letter of each element, provided that element isn't already present in the output list. For example, given the list ['a', 'bill', 'smith'], part of the expected output would be ['a', 'bill', 'smith'], ['bill', 'smith'], ['a', 'smith'], but also ['a', 'b', 'smith'], ['bill', 's'], ['a', 's']. What I don't expect is output like ['s', 'bill', 'smith'], because the first element ('s') is already accounted for by the third element ('smith'). Can someone help me?
This is what I've done so far:
mapping = dict(enumerate(['a', 'bill', 'smith']))
for i in mapping.items():
    if len(i[1])>1:
        mapping[i[0]] = [i[1], i[1][0]]
    else:
        mapping[i[0]] = [i[1]]
print(mapping)
{0: ['a'], 1: ['bill', 'b'], 2: ['smith', 's']}
I'm now stuck. I would like to use the itertools library to iterate over the dict values and create all possible combinations.
Thanks in advance :)

You can use some itertools:
from itertools import product, permutations
lst = [list({s, s[:1]}) for s in ['a', 'bill', 'smith']]
# [['a'], ['bill', 'b'], ['s', 'smith']]
for perms in map(permutations, product(*lst)):
    for p in perms:
        print(p)
('a', 'bill', 's')
('a', 's', 'bill')
('bill', 'a', 's')
('bill', 's', 'a')
('s', 'a', 'bill')
('s', 'bill', 'a')
('a', 'bill', 'smith')
('a', 'smith', 'bill')
('bill', 'a', 'smith')
('bill', 'smith', 'a')
('smith', 'a', 'bill')
('smith', 'bill', 'a')
('a', 'b', 's')
('a', 's', 'b')
('b', 'a', 's')
('b', 's', 'a')
('s', 'a', 'b')
('s', 'b', 'a')
('a', 'b', 'smith')
('a', 'smith', 'b')
('b', 'a', 'smith')
('b', 'smith', 'a')
('smith', 'a', 'b')
('smith', 'b', 'a')
The first step creates the list of equivalent lists:
[['a'], ['bill', 'b'], ['s', 'smith']]
Then, product produces the Cartesian product of the lists in said list:
('a', 'bill', 's')
('a', 'bill', 'smith')
('a', 'b', 's')
...
and for each of those, permutations gives you, well, all permutations:
('a', 'bill', 's')
('a', 's', 'bill')
('bill', 'a', 's')
...

You could do something like this with combinations from itertools:
Here I am assuming you want the first letter of each word in the list only if it has a length greater than 1. If not you can change the if condition.
from itertools import combinations
lst = ['a', 'bill', 'smith']
lst_n=[]
for words in lst:
    lst_n.append(words)
    if len(words)>1:
        lst_n.append(words[0])
for t in range(1,len(lst_n)+1):
    for comb in combinations(lst_n,r=t):
        print(list(comb))
OUTPUT:
['a']
['bill']
['b']
['smith']
['s']
['a', 'bill']
['a', 'b']
['a', 'smith']
['a', 's']
['bill', 'b']
['bill', 'smith']
['bill', 's']
['b', 'smith']
['b', 's']
['smith', 's']
['a', 'bill', 'b']
['a', 'bill', 'smith']
['a', 'bill', 's']
['a', 'b', 'smith']
['a', 'b', 's']
['a', 'smith', 's']
['bill', 'b', 'smith']
['bill', 'b', 's']
['bill', 'smith', 's']
['b', 'smith', 's']
['a', 'bill', 'b', 'smith']
['a', 'bill', 'b', 's']
['a', 'bill', 'smith', 's']
['a', 'b', 'smith', 's']
['bill', 'b', 'smith', 's']
['a', 'bill', 'b', 'smith', 's']
If you want only combinations of length 3, remove the for loop over range and set r=3.
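The question also asks that a word and its own first letter never appear together (for example, no ['s', 'bill', 'smith']). One way to sketch that constraint, assuming order follows the input list and that each word contributes at most one form, is to give every word three options (omit it, use it in full, or use its initial) and take their product. The name variants is hypothetical.

```python
from itertools import product

def variants(words):
    """Build combinations where each word is omitted, used in full, or replaced by its initial."""
    # None means "leave this word out"; single-letter words have no separate initial.
    options = [[None, w] + ([w[0]] if len(w) > 1 else []) for w in words]
    results = []
    for choice in product(*options):
        combo = [c for c in choice if c is not None]
        if combo:  # skip the combination that omits every word
            results.append(combo)
    return results

for combo in variants(['a', 'bill', 'smith']):
    print(combo)
```

For the example list this yields 17 combinations (2 * 3 * 3 choices minus the empty one), including ['a', 'b', 'smith'] and ['bill', 's'] but never a word alongside its own initial.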

Sorting a list using both length and alphabetically [duplicate]

This question already has answers here:
Length-wise-sorted list but, same length in alphabetical-order in a step
(2 answers)
Closed 5 years ago.
I have a list of all combinations of the word HACK, like this:
lista = ['H', 'A', 'C', 'K', 'HA', 'HC', 'HK', 'AC', 'AK', 'CK']
I tried sorting the above using:
lista.sort(lambda x,y:cmp(len(x),len(y)))
but it gives me the same result.
How can I sort by both length and alphabetical order?
Expected Output:
['A', 'C', 'H', 'K', 'AC', 'AH', 'AK', 'CH', 'CK', 'HK']
Update:
from itertools import combinations
inp = "HACK 2".split(" ")
lista = []
for i in range(1,int(inp[1])+1):
    for item in list(combinations(inp[0],i)):
        lista.append("".join(item))
lista = sorted(lista, key=lambda x: (len(x), x))
print(lista)
#Output
['A', 'C', 'H', 'K', 'AC', 'AK', 'CK', 'HA', 'HC', 'HK']
#Expected Output
['A', 'C', 'H', 'K', 'AC', 'AH', 'AK', 'CH', 'CK', 'HK']
Also is there anything wrong with how I am iterating the combinations ?

list.sort and sorted both accept an optional keyword argument, key. The return value of the key function is used to compare elements instead of the elements themselves.
For your case, you can use a key function that returns a tuple of (length, string itself):
>>> lista = ['H', 'A', 'C', 'K', 'HA', 'HC', 'HK', 'AC', 'AK', 'CK']
>>> sorted(lista, key=lambda x: (len(x), x))
['A', 'C', 'H', 'K', 'AC', 'AK', 'CK', 'HA', 'HC', 'HK']

You want to sort not just the lista list, but also all the strings in it. So
>>> lista = ['H', 'A', 'C', 'K', 'HA', 'HC', 'HK', 'AC', 'AK', 'CK']
>>> for i, string in enumerate(lista):
...     lista[i] = ''.join(sorted(list(string)))
...
>>> lista
['H', 'A', 'C', 'K', 'AH', 'CH', 'HK', 'AC', 'AK', 'CK']
>>> lista.sort(key=lambda s: (len(s), s))
>>> lista
['A', 'C', 'H', 'K', 'AC', 'AH', 'AK', 'CH', 'CK', 'HK']
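The two steps can also be folded into the generation itself: if the letters are sorted before calling combinations, every combination comes out with its letters already in alphabetical order. A minimal sketch, where the function name sorted_combinations and its max_len parameter are illustrative only:

```python
from itertools import combinations

def sorted_combinations(word, max_len):
    """Return combinations of word's letters up to max_len, sorted by length, then alphabetically."""
    letters = sorted(word)  # sorting first keeps each combination internally ordered
    combos = ["".join(c)
              for r in range(1, max_len + 1)
              for c in combinations(letters, r)]
    return sorted(combos, key=lambda s: (len(s), s))

print(sorted_combinations("HACK", 2))
# ['A', 'C', 'H', 'K', 'AC', 'AH', 'AK', 'CH', 'CK', 'HK']
```

The final sorted call with the (length, string) key mirrors the approach used in the answers above.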

Splitting a K-length list into L sub-lists that are as "even" as possible, even if K/L leaves a remainder

I don't know of a better way to word what I'm looking for, so please bear with me.
Let's say that I have a list of 17 elements. For the sake of brevity we'll represent this list as ABCDEFGHIJKLMNOPQ. If I wanted to divide this into 7 sufficiently "even" sub-lists, it might look like this:
ABC DE FGH IJ KL MNO PQ
Here, the lengths of each sub-list are 3, 2, 3, 2, 2, 3, 2. The maximum length is only one more than the minimum length: ABC DE FGH I JKL MN OPQ has seven sub-lists as well, but the range of lengths is two here.
Furthermore, examine how many 2's separate each pair of 3's: this follows the same rule of RANGE ≤ 1. The range of lengths in ABC DEF GH IJ KLM NO PQ is 1 as well, but they are imbalanced: 3, 3, 2, 2, 3, 2, 2. Ideally, if one were to keep reducing the sub-list in such a fashion, the numbers would never deviate from one another by more than one.
Of course, there is more than one way to "evenly" divide a list into sub-lists in this fashion. I'm not looking for an exhaustive set of solutions - if I can get one solution in Python for a list of any length and any number of sub-lists, that's good enough for me. The problem is that I don't even know where to begin when solving such a problem. Does anyone know what I'm looking for?

>>> s='ABCDEFGHIJKLMNOPQ'
>>> parts=7
>>> [s[i*len(s)//parts:(i+1)*len(s)//parts] for i in range(parts)]
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
>>> import string
>>> for j in range(26):
...     print([string.ascii_uppercase[i*j//parts:(i+1)*j//parts] for i in range(parts)])
...
['', '', '', '', '', '', '']
['', '', '', '', '', '', 'A']
['', '', '', 'A', '', '', 'B']
['', '', 'A', '', 'B', '', 'C']
['', 'A', '', 'B', '', 'C', 'D']
['', 'A', 'B', '', 'C', 'D', 'E']
['', 'A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'E', 'F', 'GH']
['A', 'B', 'C', 'DE', 'F', 'G', 'HI']
['A', 'B', 'CD', 'E', 'FG', 'H', 'IJ']
['A', 'BC', 'D', 'EF', 'G', 'HI', 'JK']
['A', 'BC', 'DE', 'F', 'GH', 'IJ', 'KL']
['A', 'BC', 'DE', 'FG', 'HI', 'JK', 'LM']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MNO']
['AB', 'CD', 'EF', 'GHI', 'JK', 'LM', 'NOP']
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
['AB', 'CDE', 'FG', 'HIJ', 'KL', 'MNO', 'PQR']
['AB', 'CDE', 'FGH', 'IJ', 'KLM', 'NOP', 'QRS']
['AB', 'CDE', 'FGH', 'IJK', 'LMN', 'OPQ', 'RST']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STUV']
['ABC', 'DEF', 'GHI', 'JKLM', 'NOP', 'QRS', 'TUVW']
['ABC', 'DEF', 'GHIJ', 'KLM', 'NOPQ', 'RST', 'UVWX']
['ABC', 'DEFG', 'HIJ', 'KLMN', 'OPQ', 'RSTU', 'VWXY']

If you have a list of length N, and you want some number of sub-lists S, it seems to me that you should start with a division with remainder. For N == 17 and S == 7, you have 17 // 7 == 2 and 17 % 7 == 3. So you can start with 7 length values of 2, but know that you need to increment 3 of the length values by 1 to handle the remainder. Since your list of length values is length 7, and you have 3 values to increment, you could compute X = 7 / 3 and use that as a stride: increment the 0th item, then the int(X) item, the int(2*X) item, and so on.
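The division-with-remainder idea described above could be sketched as follows. The function name even_split is hypothetical, and the stride is computed with integer arithmetic (k * parts // extra) as a stand-in for int(k * X), which avoids floating-point rounding.

```python
def even_split(seq, parts):
    """Split seq into `parts` chunks whose lengths differ by at most one."""
    base, extra = divmod(len(seq), parts)
    sizes = [base] * parts
    # Spread the `extra` longer chunks across the list at a regular stride.
    for k in range(extra):
        sizes[k * parts // extra] += 1
    chunks, start = [], 0
    for size in sizes:
        chunks.append(seq[start:start + size])
        start += size
    return chunks

print(even_split('ABCDEFGHIJKLMNOPQ', 7))
# ['ABC', 'DE', 'FGH', 'IJ', 'KLM', 'NO', 'PQ']
```

For N == 17 and S == 7 the incremented positions are 0, 2 and 4, giving lengths 3, 2, 3, 2, 3, 2, 2. Because the remainder is always smaller than the number of parts, the stride is greater than one and no position is incremented twice.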
If that doesn't work for you, I suggest you get a book called The Algorithm Design Manual by Skiena, and look through the set and tree algorithms.
http://www.algorist.com/

See the "grouper" example at http://docs.python.org/library/itertools.html
