Split string into strings of repeating elements

I want to split a string like:
'aaabbccccabbb'
into
['aaa', 'bb', 'cccc', 'a', 'bbb']
What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.

That is the use case for itertools.groupby :)
>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
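
To see why groupby fits here: it yields one (key, group) pair per run of consecutive equal items, where the group is an iterator over that run. The following is a minimal sketch that makes the runs visible; the names runs and chunks are only illustrative.

```python
from itertools import groupby

s = 'aaabbccccabbb'

# Each group is an iterator over one run of equal characters,
# so it has to be consumed (here via list()) to measure its length.
runs = [(key, len(list(group))) for key, group in groupby(s)]

# Rebuilding each run from its key and length gives the same chunks
# as ''.join(group) would.
chunks = [key * count for key, count in runs]

print(runs)
print(chunks)
```

Note that groupby only groups adjacent items, which is exactly what is wanted here: the later 'a' and 'bbb' form their own runs.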

You can write a generator that does this explicitly, favoring readability over brevity:
def yield_same(string):
    # Assumes a non-empty string.
    it_str = iter(string)
    result = next(it_str)  # it_str.next() only exists in Python 2
    for next_chr in it_str:
        if next_chr != result[0]:
            yield result
            result = ""
        result += next_chr
    yield result

>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']
Edit: OK, so there is itertools.groupby, which probably does something like this.

Here's the best way I could find using regex:
import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])

>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
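
Both regex answers rely on a backreference: a group captures one character and the backreference extends the match over its repeats. One caveat is that \w only matches word characters, so runs of spaces or punctuation would be skipped. Below is a minimal sketch, assuming you want runs of any character; split_runs is a hypothetical helper name, not something from the answers above.

```python
import re

def split_runs(s):
    # (.) captures a single character; \1* extends the match over
    # every immediate repeat of that same character.
    # re.DOTALL lets "." match newline characters as well.
    return [m.group(0) for m in re.finditer(r"(.)\1*", s, flags=re.DOTALL)]

print(split_runs("aaabbccccabbb"))
print(split_runs("aa  !!b"))
```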

Related

python from ['a','b','c','d'] to ['a', 'ab', 'abc', 'abcd']

I have a list ['a','b','c','d'] and want to make another list like this: ['a', 'ab', 'abc', 'abcd'].
Thanks
Tried:
list1=['a','b','c', 'd']
for i in range(1, (len(list1)+1)):
    for j in range(1, 1+i):
        print(*[list1[j-1]], end = "")
    print()
returns:
a
ab
abc
abcd
It does print what I want, but I'm not sure how to add it to a list so that it looks like ['a', 'ab', 'abc', 'abcd'].

Use itertools.accumulate, which by default adds up the elements as it goes, like a cumulative sum. Since addition (__add__) is defined for str and results in the concatenation of the strings,
assert "a" + "b" == "ab"
we can use accumulate as is:
import itertools
list1 = ["a", "b", "c", "d"]
list2 = list(itertools.accumulate(list1)) # list() because accumulate returns an iterator
print(list2) # ['a', 'ab', 'abc', 'abcd']
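
To make the default behaviour explicit, here is a small sketch showing that accumulate with no function argument behaves like passing operator.add, and that any other two-argument function can be supplied instead. The variable names and the dash-joining lambda are only illustrative.

```python
import itertools
import operator

list1 = ["a", "b", "c", "d"]

# Default: each step adds the next element to the running total.
default = list(itertools.accumulate(list1))

# Passing operator.add explicitly gives the same result.
explicit = list(itertools.accumulate(list1, operator.add))

# Any two-argument function works, e.g. joining with a separator.
dashed = list(itertools.accumulate(list1, lambda acc, x: acc + "-" + x))

print(default)
print(explicit)
print(dashed)
```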

Append to a second list in a loop:
list1=['a','b','c', 'd']
list2 = []
s = ''
for c in list1:
    s += c
    list2.append(s)
print(list2)
Output:
['a', 'ab', 'abc', 'abcd']

list1=['a','b','c', 'd']
l = []
for i in range(len(list1)):
    l.append("".join(list1[:i+1]))
print(l)
Printing is useless if you want to do anything else with the data. Only use it when you actually want to display something on the console.

You could form a string and slice it in a list comprehension:
s = ''.join(['a', 'b', 'c', 'd'])
out = [s[:i+1] for i, _ in enumerate(s)]
print(out)
Output:
['a', 'ab', 'abc', 'abcd']

You can do this in a list comprehension:
vals = ['a', 'b', 'c', 'd']
res = [''.join(vals[:i+1]) for i, _ in enumerate(vals)]

Code:
[''.join(list1[:i+1]) for i,l in enumerate(list1)]
Output:
['a', 'ab', 'abc', 'abcd']

Python : list of strings to list of unique characters [closed]

I have a list of strings
ll = ['abc', 'abd', 'xyz', 'xzk']
I want a list of unique characters across all strings in the given list.
For ll, output should be
['a','b','c','d','x','y','z','k']
Is there a clean way to do this?

You want to produce a set of the letters:
{l for word in ll for l in word}
You can always convert that back to a list:
list({l for word in ll for l in word})
Demo:
>>> ll = ['abc', 'abd', 'xyz', 'xzk']
>>> {l for word in ll for l in word}
{'b', 'a', 'x', 'k', 'd', 'c', 'z', 'y'}
You can also use itertools.chain.from_iterable() to provide a single iterator over all the characters:
from itertools import chain
set(chain.from_iterable(ll))
If you must have a list that reflects the order of the first occurrence of the characters, you can use a collections.OrderedDict() object instead of a set, then extract the keys with list():
from collections import OrderedDict
from itertools import chain
list(OrderedDict.fromkeys(chain.from_iterable(ll)))
Demo:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(chain.from_iterable(ll)))
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
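
As a side note, plain dict objects preserve insertion order in Python 3.7 and later, so the same first-occurrence ordering can be obtained without OrderedDict. This is a minimal sketch under that version assumption; unique_in_order is just an illustrative name.

```python
from itertools import chain

ll = ['abc', 'abd', 'xyz', 'xzk']

# dict.fromkeys keeps the first occurrence of each key, and dicts
# remember insertion order (guaranteed from Python 3.7 onwards).
unique_in_order = list(dict.fromkeys(chain.from_iterable(ll)))

print(unique_in_order)
```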

I do not know the simplest way to do this, but I know one way:
words = ['abc', 'abd', 'xyz', 'xzk']  # avoid naming a variable "list", which shadows the built-in
new = set()
for word in words:
    for letter in word:
        new.add(letter)
print(new)
This is an easy way for a beginner because it doesn't need any modules that you probably don't know how to use yet.

Here's an inefficient way that preserves the order. It's ok when the total number of chars is small, otherwise, you should use Martijn's OrderedDict approach.
ll = ['abc', 'abd', 'xyz', 'xzk']
s = ''.join(ll)
print(sorted(set(s), key=s.index))
output
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
Here's an alternative way to preserve the order which is less compact, but more efficient than the previous approach.
ll = ['abc', 'abd', 'xyz', 'xzk']
d = {c: i for i, c in enumerate(reversed(''.join(ll)))}
print(sorted(d, reverse=True, key=d.get))
output
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
Using s.index as the key function is inefficient because it has to perform a linear scan on the s string for each character that it sorts, whereas my d dict can get the index of each character in O(1). I use the reversed iterator because we want earlier chars to overwrite later duplicates of the same char, and using reversed is a little more efficient than building a new string with [::-1].
Creating the d dict is only slightly slower than creating set(s), and it may be a little faster than using OrderedDict. It certainly uses less RAM.
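
To illustrate the reversed-index trick described above, this sketch builds the same d dict and checks how its values relate to s.index. Because later assignments overwrite earlier ones, walking the string backwards leaves each character mapped to the reversed position of its first occurrence. The variable names follow the answer; the comparison with s.index is only there for illustration.

```python
ll = ['abc', 'abd', 'xyz', 'xzk']
s = ''.join(ll)

# Walking the string backwards means the first occurrence of each
# character is seen last, so it is the one that stays in the dict.
d = {c: i for i, c in enumerate(reversed(s))}

# Position i in the reversed string is position len(s) - 1 - i in s,
# so larger values in d mean earlier first occurrences.
ordered = sorted(d, reverse=True, key=d.get)

print(d)
print(ordered)
```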

Consider using a set()
s = set()
for word in ll:
    for letter in word:
        s.add(letter)
Now s should have all the unique letters. You can convert s to a list using list(s).

You can use itertools for that:
import itertools
ll = ['abc', 'abd', 'xyz', 'xzk']
set(itertools.chain(*[list(x) for x in ll]))
{'a', 'b', 'c', 'd', 'k', 'x', 'y', 'z'}

l2 = list()
for i in ll:
    for j in i:
        l2.append(j)
[''.join(i) for i in set(l2)]
output:
['a', 'c', 'b', 'd', 'k', 'y', 'x', 'z']

Just another one...
>>> set().union(*ll)
{'d', 'a', 'y', 'k', 'c', 'x', 'b', 'z'}
Wrap list(...) around it if needed, though why would you.

This is a function that takes the list and returns all the unique letters. I also added a print at the end:
lst = ['abc', 'abd', 'xyz', 'xzk']
def uniqueLetters(lst1):
    unique = set()
    for word in lst1:
        for letter in word:
            unique.add(letter)
    return unique
print(uniqueLetters(lst))
To store the unique letters in a variable, call the function like so:
uniqueLetters123 = uniqueLetters(lst)
You can replace lst with the name of your own list.

Combinations and Permutations of characters

I am trying to come up with elegant code that creates combinations/permutations of characters from a single character:
E.g. from a single character I'd like code to create these permutations (order of the result is not important):
'a' ----> ['a', 'aa', 'A', 'AA', 'aA', 'Aa']
The not so elegant solutions I have thus far:
# this does it...
from itertools import permutations
char = 'a'
p = [char, char*2, char.upper(), char.upper()*2]
pp = [] # stores the final list of permutations
for j in range(1,3):
    for i in permutations(p,j):
        p2 = ''.join(i)
        if len(p2) < 3:
            pp.append(p2)
print(pp)
['a', 'aa', 'A', 'AA', 'aA', 'Aa']
# this also works...
char = 'a'
p = ['', char, char*2, char.upper(), char.upper()*2]
pp = [] # stores the final list of permutations
for i in permutations(p,2):
    j = ''.join(i)
    if len(j) < 3:
        pp.append(j)
print(list(set(pp)))
['a', 'aa', 'aA', 'AA', 'Aa', 'A']
# and finally... so does this:
char = 'a'
p = ['', char, char.upper()]
pp = [] # stores the final list of permutations
for i in permutations(p,2):
    pp.append(''.join(i))
print(list(set(pp)) + [char*2, char.upper()*2])
['a', 'A', 'aA', 'Aa', 'aa', 'AA']
I'm not great with lambdas, and I suspect that may be where a better solution lies.
So, could you help me find the most elegant/pythonic way to the desired result?

You can simply use the itertools.product with different repeat values to get the expected result
>>> pop = ['a', 'A']
>>> from itertools import product
>>> [''.join(item) for i in range(len(pop)) for item in product(pop, repeat=i + 1)]
['a', 'A', 'aa', 'aA', 'Aa', 'AA']
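
The same idea can be wrapped in a function that starts from the single character in the question. This is a rough sketch; case_variants and its max_len parameter are hypothetical names introduced here, not part of the answer above.

```python
from itertools import product

def case_variants(char, max_len=2):
    # The two "letters" to choose from: lower- and upper-case forms.
    pop = [char.lower(), char.upper()]
    # product(pop, repeat=n) yields every length-n sequence over pop,
    # so looping n over 1..max_len covers all lengths up to max_len.
    return [''.join(item)
            for n in range(1, max_len + 1)
            for item in product(pop, repeat=n)]

print(case_variants('a'))
```

With max_len=3 this would return 2 + 4 + 8 = 14 strings, since each extra position doubles the number of choices.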

Separating a String

Given a string, I want to generate all possible combinations. In other words, all possible ways of putting a comma somewhere in the string.
For example:
input: ["abcd"]
output: ["abcd"]
["abc","d"]
["ab","cd"]
["ab","c","d"]
["a","bc","d"]
["a","b","cd"]
["a","bcd"]
["a","b","c","d"]
I am a bit stuck on how to generate all the possible lists. Combinations will just give me lists with length of subset of the set of strings, permutations will give all possible ways to order.
I can make all the cases with only one comma in the list because of iterating through the slices, but I can't make cases with two commas like "ab","c","d" and "a","b","cd"
My attempt w/slice:
test="abcd"
for x in range(len(test)):
    print(test[:x], test[x:])

How about something like:
from itertools import combinations
def all_splits(s):
    for numsplits in range(len(s)):
        for c in combinations(range(1,len(s)), numsplits):
            split = [s[i:j] for i,j in zip((0,)+c, c+(None,))]
            yield split
after which:
>>> for x in all_splits("abcd"):
... print(x)
...
['abcd']
['a', 'bcd']
['ab', 'cd']
['abc', 'd']
['a', 'b', 'cd']
['a', 'bc', 'd']
['ab', 'c', 'd']
['a', 'b', 'c', 'd']

You can certainly use itertools for this, but I think it's easier to write a recursive generator directly:
def gen_commas(s):
    yield s
    for prefix_len in range(1, len(s)):
        prefix = s[:prefix_len]
        for tail in gen_commas(s[prefix_len:]):
            yield prefix + "," + tail
Then
print(list(gen_commas("abcd")))
prints
['abcd', 'a,bcd', 'a,b,cd', 'a,b,c,d', 'a,bc,d', 'ab,cd', 'ab,c,d', 'abc,d']
I'm not sure why I find this easier. Maybe just because it's dead easy to do it directly ;-)
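
Since the question asks for lists of pieces rather than comma-joined strings, the same recursion can yield lists directly. This is a small variation on the generator above, offered as a sketch; gen_splits is a hypothetical name.

```python
def gen_splits(s):
    # The whole string as a single piece is always one valid split.
    yield [s]
    # Otherwise, pick a non-empty proper prefix and recursively
    # split whatever remains after it.
    for prefix_len in range(1, len(s)):
        prefix = s[:prefix_len]
        for tail in gen_splits(s[prefix_len:]):
            yield [prefix] + tail

for pieces in gen_splits("abcd"):
    print(pieces)
```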

You could generate the power set of the n - 1 places that you could put commas:
what's a good way to combinate through a set?
and then insert commas in each position.
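
One way to enumerate that power set without any helper is to count through bitmasks: each of the n - 1 gaps between characters gets one bit, and a set bit means "cut here". The following is a minimal sketch of that idea; splits_by_mask is a hypothetical name, and it returns lists of pieces rather than comma-joined strings.

```python
def splits_by_mask(s):
    n = len(s)
    results = []
    # There are n - 1 gaps between characters, so 2 ** (n - 1) masks.
    for mask in range(2 ** max(n - 1, 0)):
        parts, start = [], 0
        for pos in range(1, n):
            # Bit (pos - 1) decides whether to cut just before s[pos].
            if mask & (1 << (pos - 1)):
                parts.append(s[start:pos])
                start = pos
        parts.append(s[start:])  # the final piece after the last cut
        results.append(parts)
    return results

for parts in splits_by_mask("abcd"):
    print(parts)
```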

Using itertools:
import itertools
input_str = "abcd"
for k in range(1,len(input_str)):
    for subset in itertools.combinations(range(1,len(input_str)), k):
        s = list(input_str)
        for i,x in enumerate(subset): s.insert(x+i, ",")
        print("".join(s))
Gives:
a,bcd
ab,cd
abc,d
a,b,cd
a,bc,d
ab,c,d
a,b,c,d
Also a recursive version:
def commatoze(s,p=1):
    if p == len(s):
        print(s)
        return
    commatoze(s[:p] + ',' + s[p:], p + 2)
    commatoze(s, p + 1)
input_str = "abcd"
commatoze(input_str)

You can solve the integer composition problem and use the compositions to guide where to split the list. Integer composition can be solved fairly easily with a little bit of dynamic programming.
def composition(n):
    if n == 1:
        return [[1]]
    comp = composition(n - 1)
    return [x + [1] for x in comp] + [y[:-1] + [y[-1]+1] for y in comp]
def split(lst, guide):
    ret = []
    total = 0
    for g in guide:
        ret.append(lst[total:total+g])
        total += g
    return ret
lst = list('abcd')
for guide in composition(len(lst)):
    print(split(lst, guide))
Another way to generate integer composition:
from itertools import groupby
def composition(n):
    for i in range(2**(n-1)):  # xrange in Python 2
        yield [len(list(group)) for _, group in groupby('{0:0{1}b}'.format(i, n))]
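
The bit-pattern version is easier to follow with a concrete case. Each integer below 2**(n-1) is written as an n-digit binary string, and the lengths of its runs of equal digits form one composition of n. Here is a Python 3 sketch of that idea (using range, since xrange is Python 2 only); the intermediate name bits is added for readability.

```python
from itertools import groupby

def composition(n):
    for i in range(2 ** (n - 1)):
        # Zero-padded to n digits, e.g. i=1, n=3 -> '001'.
        bits = '{0:0{1}b}'.format(i, n)
        # Run lengths of equal digits always sum to n: '001' -> [2, 1].
        yield [len(list(group)) for _, group in groupby(bits)]

for comp in composition(3):
    print(comp)
```

Because i stays below 2**(n-1), the leading digit is always 0, so no two values of i produce the same pattern of runs.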

Given
import more_itertools as mit
Code
list(mit.partitions("abcd"))
Output
[[['a', 'b', 'c', 'd']],
[['a'], ['b', 'c', 'd']],
[['a', 'b'], ['c', 'd']],
[['a', 'b', 'c'], ['d']],
[['a'], ['b'], ['c', 'd']],
[['a'], ['b', 'c'], ['d']],
[['a', 'b'], ['c'], ['d']],
[['a'], ['b'], ['c'], ['d']]]
Install more_itertools via > pip install more-itertools.

Python: function that gets n-th element of a list

I have a list of strings like
['ABC', 'DEF', 'GHIJ']
and I want a list of strings containing the first letter of each string, i.e.
['A', 'D', 'G'].
I thought about doing that using map and the function that returns the first element of a list: my_list[0]. But how can I pass this to map?
Thanks.

you can try
In [14]: l = ['ABC', 'DEF', 'GHIJ']
In [15]: [x[0] for x in l]
Out[15]: ['A', 'D', 'G']

You should use a list comprehension, like @avasal did, since it's more Pythonic, but here's how to do it with map (in Python 3, map returns an iterator, so wrap it in list()):
>>> from operator import itemgetter
>>> L = ['ABC', 'DEF', 'GHIJ']
>>> list(map(itemgetter(0), L))
['A', 'D', 'G']
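
Since the title asks for the n-th element in general, the itemgetter approach extends naturally to any index. This is a minimal sketch; nth_of_each is a hypothetical helper name, and it assumes every string is long enough (otherwise itemgetter raises IndexError).

```python
from operator import itemgetter

def nth_of_each(strings, n):
    # itemgetter(n) builds a callable equivalent to lambda x: x[n].
    # list() is needed in Python 3 because map returns a lazy iterator.
    return list(map(itemgetter(n), strings))

print(nth_of_each(['ABC', 'DEF', 'GHIJ'], 0))
print(nth_of_each(['ABC', 'DEF', 'GHIJ'], 2))
```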

use list comprehension like so:
results = [i[0] for i in mySrcList]

One way:
l1=['ABC', 'DEF', 'GHIJ']
l1=list(map(lambda x:x[0], l1))  # list() is needed in Python 3, where map returns an iterator

a=['ABC','DEF','GHI']
b=[]
for i in a:
    b.append(i[0])
b is the list you need.

Try this.
>>> myArray=['ABC', 'DEF', 'GHIJ']
>>> newArray=[]
>>> for i in map(lambda x:x[0],myArray):
...     newArray.append(i)
...
>>> print(newArray)
['A', 'D', 'G']
