Python string operation: separating stuck-together words

I have a fairly challenging problem and would appreciate some help.
The problem is this:
I have a string, for example "abcde".
I want to split this string in every possible way into consecutive substrings, keeping the characters in their original order, and get each split as a list of strings.
For example:
my_function('abcde')
output =
[
['a', 'b', 'c', 'd', 'e'],
['a', 'b', 'c', 'de'],
['a', 'b', 'cde'],
['a', 'bced'],
['a', 'b', 'cd', 'e'],
['a', 'bc', 'd', 'e'],
['a', 'bc', 'de'],
['a', 'bcd', 'e'],
['a', 'bcde'],
['ab', 'c', 'd', 'e'],
['ab', 'c', 'de'],
['ab', 'cd', 'e'],
['ab', 'cde'],
['abc','d','e'],
['abc', 'de'],
['abcd', 'e'],
['abcde']
]
This is not quite a permutation problem, because the characters must stay in their original order.

Same result without itertools:
s = 'python'
splits = len(s) - 1
output = []
for i in range(2 ** splits):
    combination = []
    word = ''
    for position in range(splits + 1):
        word += s[position]
        if not (i & (1 << position)):
            combination.append(word)
            word = ''
    output.append(combination)
output.sort()
for combination in output:
    print(combination)
Just for beginners.

You could do this:
import itertools
def get_slices(values):
    slices_len = len(values) - 1
    for is_slice in itertools.product([True, False], repeat=slices_len):
        start_index = 0
        slices = []
        for slice_index, is_index_slice in enumerate(is_slice, 1):
            if is_index_slice:
                index_slice = values[start_index:slice_index]
                start_index = slice_index
                slices.append(index_slice)
        slices.append(values[start_index:])
        yield slices
The most important part of this code is the itertools.product call at the beginning, which generates every possible pattern of cuts. A pattern is a tuple of slices_len booleans, one for each gap between two adjacent elements of values; True means the sequence is cut at that gap, and False means the two neighbours stay joined.
list(get_slices("abcde")) will return the list you requested. If you don't need all results immediately and just want to iterate over them, you can drop the surrounding list call.
If you want the reverse order, you can replace [True, False] with [False, True].
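To make the mapping from booleans to slices concrete, here is a small sketch for the three-character string "abc". The helper name split_by_flags is hypothetical (it is not part of the answer above); it applies a single pattern of cut flags the same way the generator does.

```python
import itertools

def split_by_flags(values, flags):
    """Cut `values` at gap i (between positions i-1 and i) when that flag is True."""
    pieces, start = [], 0
    for index, cut_here in enumerate(flags, 1):
        if cut_here:
            pieces.append(values[start:index])
            start = index
    pieces.append(values[start:])  # whatever is left after the last cut
    return pieces

# "abc" has two gaps, so there are 2 ** 2 = 4 patterns.
for flags in itertools.product([True, False], repeat=2):
    print(flags, split_by_flags("abc", flags))
```

Each printed line pairs one tuple of flags with the split it produces, which makes it easier to see why swapping [True, False] reverses the order of the results.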

I get 16 items where your list has 17 :-) Your list contains ['a', 'bced'], which looks like a mistyped duplicate of ['a', 'bcde'].
def fn(base_str):
    result = [[base_str]]
    for i in range(1, len(base_str)):
        child = fn(base_str[i:])
        for x in child:
            x.insert(0, base_str[0:i])
        result = child + result
    return result
print(fn("abcde"))

Related

Unique combinations from list with "distance limit"

Given the list a = ['a', 'b', 'c', 'd', 'e'], I would use itertools.combinations to get all unique combos like ['ab', 'ac', ...], as per the classic SO answer
How can I limit the unique combinations to items that are not farther away than n spots?
Example
If I want list items no more than n=2 spots away, I would accept 'ab' and 'ac' as combinations but not 'ae', because the distance between 'a' and 'e' is greater than n=2
Edit - code
Below is my plain Python solution. I'd like to avoid it because of the nested for loop, which is not ideal for large lists.
a = ['a', 'b', 'c', 'd', 'e']
n_items = len(a)
n_max_look_forward = 2
unique_combos = []
for i, item in enumerate(a):
    for j in range(i+1, min(i+n_max_look_forward+1, n_items)):
        unique_combos.append( item+a[j] )
print(unique_combos)
Complexity-wise, your solution is close to the best possible.
You could refactor it to be a generator to generate the values only when you need them so that you don't have to hold all of them in memory at the same time:
def combis(source, max_distance=2):
    for i, item in enumerate(source):
        for j in range(i+1, min(i+max_distance+1, len(source))):
            yield item+source[j]
You can then iterate over the generator:
>>> for combi in combis(['a', 'b', 'c', 'd', 'e']):
...     print(combi)
...
ab
ac
bc
bd
cd
ce
de
If you need all of them in memory as a list, you can still use the generator to initialise it:
>>> list(combis(['a', 'b', 'c', 'd', 'e']))
['ab', 'ac', 'bc', 'bd', 'cd', 'ce', 'de']
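For comparison, here is a sketch of what an itertools.combinations version could look like, since the question started from that function. The name combis_filtered is hypothetical. It pairs up indices and discards pairs that are too far apart, so it always inspects all n*(n-1)/2 pairs and is likely slower than the generator above on long lists with a small distance limit.

```python
from itertools import combinations

def combis_filtered(source, max_distance=2):
    """Yield source[i] + source[j] for i < j with j - i <= max_distance."""
    for i, j in combinations(range(len(source)), 2):
        if j - i <= max_distance:
            yield source[i] + source[j]

print(list(combis_filtered(['a', 'b', 'c', 'd', 'e'])))
```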

Python 3 sort list -> all entries starting with lower case first

l1 = ['B','c','aA','b','Aa','C','A','a']
the result should be
['a','aA','b','c','A','Aa','B','C']
so same as l1.sort() but beginning with all words that start with lower case.
Try this:
>>> l = ['B', 'b','a','A', 'aA', 'Aa','C', 'c']
>>> sorted(l, key=str.swapcase)
['a', 'aA', 'b', 'c', 'A', 'Aa', 'B', 'C']
EDIT:
A one-liner using the list.sort method for those who prefer the imperative approach:
>>> l.sort(key=str.swapcase)
>>> print(l)
['a', 'aA', 'b', 'c', 'A', 'Aa', 'B', 'C']
Note:
The first approach leaves the state of l unchanged while the second one does change it.
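A short sketch of why str.swapcase works as a key: in the default string ordering, uppercase ASCII letters sort before lowercase ones, so swapping the case of each key reverses that preference. The variable names below are only for illustration.

```python
words = ['B', 'c', 'aA', 'b', 'Aa', 'C', 'A', 'a']

# The key each word is compared by: 'aA' is compared as 'Aa', 'A' as 'a', and so on.
keys = {word: word.swapcase() for word in words}
print(keys)

# Sorting by the swapped keys puts lowercase-initial words first.
result = sorted(words, key=str.swapcase)
print(result)
```

Note that the swap applies to every character, not just the first, so the ordering inside each group also follows the swapped case.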
Here is what you might be looking for:
li = ['a', 'A', 'b', 'B']
def sort_low_case_first(li):
    li.sort()  # sorts the list in place; uppercase-initial strings come first
    index = 0  # position where the list needs to be cut
    for i, x in enumerate(li):  # iterate over the sorted list
        if x[0].islower():  # first string that starts with a lowercase letter
            index = i  # remember its position
            break  # stop searching
    # lowercase-initial strings (already sorted), then uppercase-initial strings
    return li[index:] + li[:index]
sorted_li = sort_low_case_first(li)  # run the function
print(sorted_li)  # ['a', 'b', 'A', 'B']

My Python module returns wrong list

I wrote the following Python function, which should return a list of sublists.
def checklisting(inputlist, repts):
    result = []
    temprs = []
    ic = 1;
    for x in inputlist:
        temprs.append(x)
        ic += 1
        if ic == repts:
            ic = 1
            result.append(temprs)
    return result
Example: If I called the function with the following arguments:
checklisting(['a', 'b', 'c', 'd'], 2)
it would return
[['a', 'b'], ['c', 'd']]
or if I called it like:
checklisting(['a', 'b', 'c', 'd'], 4)
it would return
[['a', 'b', 'c', 'd']]
However what it returns is a weird huge list:
>>> l.checklisting(['a','b','c','d'], 2)
[['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']]
Someone please help! I need this script to organize a list with data such as:
['water tax', 20, 'per month', 'electric tax', 1, 'per day']
The idea is to split the list into consecutive sublists of length repts, so the data is easier to organize. I don't want arbitrary chunks of sublists; the solutions in the other question don't specify the size of the sequence correctly.
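Your logic is flawed.
Here are the bugs: you keep appending to the same temprs list and never reset it, so every entry of result refers to one ever-growing list. Once repts elements have been collected, you need to start a new temprs list. Also, ic is incremented before it is compared with repts, so it should start at 0 instead of 1.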
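A minimal sketch of the aliasing problem, separate from the counter logic. The names shared, fresh, collected and collected_ok are illustrative and do not come from the question.

```python
# Appending the same list object repeatedly stores references, not copies.
shared = []
collected = []
for x in ['a', 'b']:
    shared.append(x)
    collected.append(shared)
print(collected)                     # both entries show the full list
print(collected[0] is collected[1])  # the two entries are the same object

# Rebinding the name to a new list after each append keeps the entries separate.
fresh = []
collected_ok = []
for x in ['a', 'b']:
    fresh.append(x)
    collected_ok.append(fresh)
    fresh = []
print(collected_ok)
```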
Replace your def with:
def checklisting(inputlist, repts):
    result = []
    temprs = []
    ic = 0
    for x in inputlist:
        temprs.append(x)
        ic += 1
        if ic == repts:
            ic = 0
            result.append(temprs)
            temprs = []
    return result
def split_into_sublists(list_, size):
    return list(map(list, zip(*[iter(list_)] * size)))
# [iter(list_)] * size creates a list holding `size` references to the
# same iterator (not `size` separate lists).
# zip pulls one item per reference, so each tuple takes `size` consecutive
# items; leftover items that do not fill a tuple are dropped.
# map converts the tuples to lists.
# list converts the map object to a list.
print(split_into_sublists(['a', 'b', 'c', 'd'], 2))
[['a', 'b'], ['c', 'd']]
print(split_into_sublists(['a', 'b', 'c', 'd'], 4))
[['a', 'b', 'c', 'd']]
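The zip trick relies on every argument being the same iterator object. A small sketch of that behaviour follows, including what happens to leftover items; using itertools.zip_longest to keep them is my suggestion, not part of the answer above.

```python
from itertools import zip_longest

data = ['a', 'b', 'c', 'd', 'e']

# Both arguments are the same iterator, so zip consumes two items per tuple.
it = iter(data)
pairs = list(zip(it, it))
print(pairs)   # the unpaired 'e' is dropped

# zip_longest pads the last group with None instead of dropping it.
padded = list(zip_longest(*[iter(data)] * 2))
print(padded)
```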
I got lost in your code. I think the more Pythonic approach is to slice the list. And I can never resist list comprehensions.
def checklisting(inputlist, repts):
    return [inputlist[i * repts:(i + 1) * repts] for i in range(len(inputlist) // repts)]
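As a sketch of the slicing approach applied to the data from the question: the function name chunk is hypothetical, and unlike the counter-based loop, this variant steps through the list with range(start, stop, step) and keeps a shorter final chunk when the length is not a multiple of size.

```python
def chunk(items, size):
    """Split `items` into consecutive slices of length `size` (the last may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

taxes = ['water tax', 20, 'per month', 'electric tax', 1, 'per day']
print(chunk(taxes, 3))
print(chunk(['a', 'b', 'c', 'd', 'e'], 2))
```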

Python: find all possible word combinations with a sequence of characters (word segmentation)

I'm doing some word segmentation experiments like the following.
lst is a sequence of characters, and output is the list of all possible segmentations into words.
lst = ['a', 'b', 'c', 'd']
def foo(lst):
    ...
    return output
output = [['a', 'b', 'c', 'd'],
['ab', 'c', 'd'],
['a', 'bc', 'd'],
['a', 'b', 'cd'],
['ab', 'cd'],
['abc', 'd'],
['a', 'bcd'],
['abcd']]
I've checked combinations and permutations in the itertools library,
and also tried combinatorics.
However, it seems that I'm looking in the wrong place, because this is not a pure permutation or combination problem...
It seems that I can achieve this by using lots of loops, but the efficiency might be low.
EDIT
The word order is important so combinations like ['ba', 'dc'] or ['cd', 'ab'] are not valid.
The order should always be from left to right.
EDIT
@Stuart's solution doesn't work in Python 2.7.6
EDIT
@Stuart's solution does work in Python 2.7.6, see the comments below.
itertools.product should indeed be able to help you.
The idea is this:
Consider A1, A2, ..., AN separated by slabs. There will be N-1 possible slab positions.
If there is a slab, there is a segmentation. If there is no slab, there is a join.
Each position independently either has a slab or not, so for a given sequence of length N you should have 2^(N-1) such combinations.
For example:
import itertools
lst = ['a', 'b', 'c', 'd']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)
solution = []
for combination in combinatorics:
    i = 0
    one_such_combination = [lst[i]]
    for slab in combination:
        i += 1
        if not slab:  # there is a join
            one_such_combination[-1] += lst[i]
        else:
            one_such_combination += [lst[i]]
    solution.append(one_such_combination)
print(solution)
#!/usr/bin/env python
from itertools import combinations
a = ['a', 'b', 'c', 'd']
a = "".join(a)
cuts = []
for i in range(0,len(a)):
    cuts.extend(combinations(range(1,len(a)),i))
for i in cuts:
    last = 0
    output = []
    for j in i:
        output.append(a[last:j])
        last = j
    output.append(a[last:])
    print(output)
output:
zsh 2419 % ./words.py
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']
There are 8 options, each mirroring the binary numbers 0 through 7:
000
001
010
011
100
101
110
111
Each 0 and 1 represents whether or not the 2 letters at that index are "glued" together. 0 for no, 1 for yes.
>>> lst = ['a', 'b', 'c', 'd']
>>> output = []
>>> formatstr = "{{:0{}.0f}}".format(len(lst)-1)
>>> for i in range(2**(len(lst)-1)):
...     output.append([])
...     s = "{:b}".format(i)
...     s = str(formatstr.format(float(s)))
...     lstcopy = lst[:]
...     for j, c in enumerate(s):
...         if c == "1":
...             lstcopy[j+1] = lstcopy[j] + lstcopy[j+1]
...         else:
...             output[-1].append(lstcopy[j])
...     output[-1].append(lstcopy[-1])
...
>>> output
[['a', 'b', 'c', 'd'],
['a', 'b', 'cd'],
['a', 'bc', 'd'],
['a', 'bcd'],
['ab', 'c', 'd'],
['ab', 'cd'],
['abc', 'd'],
['abcd']]
>>>
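The same binary-number idea can also be sketched with integer bit operations instead of string formatting. The function name segmentations is hypothetical, and the sketch assumes a non-empty list of strings; the leftmost bit of each number corresponds to the first gap, which reproduces the ordering shown above.

```python
def segmentations(chars):
    """Return every left-to-right segmentation of a non-empty list of strings."""
    gaps = len(chars) - 1
    results = []
    for mask in range(2 ** gaps):
        words = [chars[0]]
        for j in range(gaps):
            # Bit for gap j, reading the mask from its most significant bit.
            glued = (mask >> (gaps - 1 - j)) & 1
            if glued:
                words[-1] += chars[j + 1]   # 1: glue onto the previous word
            else:
                words.append(chars[j + 1])  # 0: start a new word
        results.append(words)
    return results

for words in segmentations(['a', 'b', 'c', 'd']):
    print(words)
```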
You can use a recursive generator:
def split_combinations(L):
    for split in range(1, len(L)):
        for combination in split_combinations(L[split:]):
            yield [L[:split]] + combination
    yield [L]
print (list(split_combinations('abcd')))
Edit. I'm not sure how well this would scale up for long strings and at what point it hits Python's recursion limit. Similarly to some of the other answers, you could also use combinations from itertools to work through every possible combination of split-points.
from itertools import combinations
def split_string(s, t):
    return [s[start:finish] for start, finish in zip((None, ) + t, t + (None, ))]
def split_combinations(s):
    for i in range(len(s)):
        for split_points in combinations(range(1, len(s)), i):
            yield split_string(s, split_points)
These both seem to work as intended in Python 2.7 (see here) and Python 3.2 (here). As @twasbrillig says, make sure you indent it as shown.
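On the scaling question: a string of length n has 2^(n-1) segmentations, so the number of results usually becomes the limit long before recursion depth does. Because the combinations-based version is a generator, you can take only the first few results lazily. The sketch below is my own variant that builds explicit slice bounds; the name first_three is illustrative.

```python
from itertools import combinations, islice

def split_combinations(s):
    """Lazily yield every segmentation of s, fewest cuts first."""
    for count in range(len(s)):
        for cuts in combinations(range(1, len(s)), count):
            bounds = (0,) + cuts + (len(s),)
            yield [s[a:b] for a, b in zip(bounds, bounds[1:])]

# A 40-character string has 2 ** 39 segmentations; islice takes just three.
first_three = list(islice(split_combinations("a" * 40), 3))
print(first_three)
```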

Create all sequences from the first item within a list

Say I have a list, ['a', 'b', 'c', 'd']. Are there any built-ins or methods in Python to easily create all contiguous sublists (i.e. sub-sequences) starting from the first item?
['a']
['a', 'b']
['a', 'b', 'c']
['a', 'b', 'c', 'd']
Note that I am excluding lists/sequences such as ['a' ,'c'], ['a', 'd'], ['b'], ['c'] or ['d']
To match your example output (prefixes), you can just use the following (xrange is Python 2 only; use range in Python 3):
prefixes = [your_list[:end] for end in xrange(1, len(your_list) + 1)]
You can do this with a simple list comprehension:
>>> l = ['a', 'b', 'c', 'd']
>>>
>>> [l[:i+1] for i in range(len(l))]
[['a'], ['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
See also: range()
If you're using Python 2.x, use xrange() instead.
A little more Pythonic than using (x)range (with the benefit of being the same solution for either Python 2 or Python 3):
lst = list('abcde')
prefixes = [ lst[:i+1] for i,_ in enumerate(lst) ]
If you decide that the empty list should be a valid (zero-length) prefix, a small hack will include it:
# Include 0 as a slice index and still get the full list as a prefix
prefixes = [ lst[:i] for i,_ in enumerate(lst + [None]) ]
Just as an alternative:
def prefixes(seq):
    result = []
    for item in seq:
        result.append(item)
        yield result[:]
for x in prefixes(['a', 'b', 'c', 'd']):
    print(x)
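Since the question asks about built-ins: one candidate in Python 3 is itertools.accumulate, which by default combines successive items with +. The sketch below wraps each item in a one-element list so that + means list concatenation; the variable names are illustrative.

```python
from itertools import accumulate

items = ['a', 'b', 'c', 'd']

# Each step concatenates the running prefix with the next one-element list.
prefixes = list(accumulate([item] for item in items))
print(prefixes)
```

For a plain string, accumulate("abcd") works directly and yields the string prefixes 'a', 'ab', 'abc', 'abcd'.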
