Split a string into sets of twos - python

I want to split a string into sets of twos, e.g.
['abcdefg']
to
['ab','cd','ef']
Here is what I have so far:
string = 'acabadcaa\ndarabr'
newString = []
for i in string:
    newString.append(string[i:i+2])

One option using regular expressions:
>>> import re
>>> re.findall(r'..', 'abcdefg')
['ab', 'cd', 'ef']
re.findall returns a list of all non-overlapping matches in a string. The pattern '..' matches any two consecutive characters (except newlines, unless re.DOTALL is set), so a trailing unpaired character is dropped.
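One caveat worth checking: the string in the question contains a newline, and '.' skips newlines by default. A minimal sketch of the difference, reusing the question's string (the variable names are only illustrative):

```python
import re

text = 'acabadcaa\ndarabr'  # the string from the question

# By default '.' does not match '\n', so the 'a' right before the newline is skipped.
pairs_default = re.findall(r'..', text)

# With re.DOTALL, '.' also matches '\n', so every pair of characters is returned.
pairs_dotall = re.findall(r'..', text, flags=re.DOTALL)

print(pairs_default)
print(pairs_dotall)
```

Which behaviour is wanted depends on whether the newline should count as a character of the string.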

def splitCount(s, count):
    return [''.join(x) for x in zip(*[list(s[z::count]) for z in range(count)])]
splitCount('abcdefg',2)

To split a string s into a list of equally long substrings of length n, truncating any shorter trailing fragment:
n = 2
s = 'abcdef'
lst = [s[i:i+n] for i in range(0, len(s)-len(s)%n, n)]
# lst is ['ab', 'cd', 'ef']

Try this
s = "abcdefg"
newList = [s[i:i+2] for i in range(0,len(s)-1,2)]

This function returns chunks of any size:
def chunk(s,chk):
    ln = len(s)
    return [s[i:i+chk] for i in range(0, ln - ln % chk, chk)]
In [2]: s = "abcdefg"
In [3]: chunk(s,2)
Out[3]: ['ab', 'cd', 'ef']
In [4]: chunk(s,3)
Out[4]: ['abc', 'def']
In [5]: chunk(s,5)
Out[5]: ['abcde']
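If the shorter trailing fragment should be kept rather than dropped, the same slicing idea works with a plain range(0, len(s), n). A small sketch; chunk_keep_tail is a hypothetical name, not taken from the answers above:

```python
def chunk_keep_tail(s, n):
    """Split s into pieces of length n; the last piece may be shorter."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [s[i:i + n] for i in range(0, len(s), n)]


print(chunk_keep_tail("abcdefg", 2))  # ['ab', 'cd', 'ef', 'g']
print(chunk_keep_tail("abcdefg", 3))  # ['abc', 'def', 'g']
```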

Related

How to filter out strings that do not start with specific chars from a list [python]? [duplicate]

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?
This simple filtering can be achieved in many ways with Python. The best approach is to use "list comprehensions" as follows:
>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']
Another way is to use the filter function. In Python 2:
>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']
In Python 3, it returns an iterator instead of a list, but you can convert it with list():
>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']
Though it's better practice to use a comprehension.
[x for x in L if 'ab' in x]
# To match only at the beginning of each string, not anywhere in it:
items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'
list(filter(lambda x: x.startswith(prefix), items))  # list() is needed in Python 3
Tried this out quickly in the interactive shell:
>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>
Why does this work? Because the in operator is defined for strings to mean: "is substring of".
Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:
l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
    if 'ab' in s:
        result.append(s)
mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist
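The answers above mostly test for 'ab' anywhere in the string. If only strings that begin with certain characters should be kept, as the title suggests, str.startswith also accepts a tuple of prefixes. A minimal sketch with made-up sample data:

```python
items = ['a', 'ab', 'abc', 'bac', 'cab']
prefixes = ('ab', 'ba')  # str.startswith accepts a tuple of alternatives

# Keep strings that begin with any of the prefixes.
starts_with = [s for s in items if s.startswith(prefixes)]

# For contrast: keep strings that contain 'ab' anywhere.
contains_ab = [s for s in items if 'ab' in s]

print(starts_with)  # ['ab', 'abc', 'bac']
print(contains_ab)  # ['ab', 'abc', 'cab']
```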

Python string to list?

I'm trying to convert a string to a list
str = "ab(1234)bcta(45am)in23i(ab78lk)"
Expected Output
res_str = ["ab","bcta","in23i"]
I tried removing brackets from str.
re.sub(r'\([^)]*\)', '', str)
You may use a negated character class with a lookahead:
>>> s = "ab(1234)bcta(45am)in23i(ab78lk)"
>>> print (re.findall(r'[^()]+(?=\()', s))
['ab', 'bcta', 'in23i']
RegEx Details:
[^()]+: Match 1 or more of any character that is not ( or )
(?=\(): Lookahead to assert that there is a ( ahead
So many options here. One possibility would be using split:
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split(r'\(.*?\)', str)[:-1])
Returns:
['ab', 'bcta', 'in23i']
A second option would be to split on every parenthesis and slice the resulting list:
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split('[()]', str)[0:-1:2])
Where [0:-1:2] means to start at index 0, to stop at second to last index, and step two indices.
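To see why the step of two is needed, it may help to look at the intermediate list. A short sketch using the same input (renamed to text here to avoid shadowing the built-in str):

```python
import re

text = "ab(1234)bcta(45am)in23i(ab78lk)"

# Splitting on either parenthesis alternates outside/inside pieces
# and leaves an empty string after the final ')'.
parts = re.split(r'[()]', text)
print(parts)  # ['ab', '1234', 'bcta', '45am', 'in23i', 'ab78lk', '']

# Even indices hold the pieces outside the parentheses; the slice stops
# before the trailing empty string.
outside = parts[0:-1:2]
print(outside)  # ['ab', 'bcta', 'in23i']
```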
Use re.split
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split(r'\(.*?\)', str))
Returns:
['ab', 'bcta', 'in23i', '']
If you want to get rid of empty strings in your list, you may use a filter:
print(list(filter(None, re.split(r'\(.*?\)', str))))
Returns:
['ab', 'bcta', 'in23i']
You may match all alphanumeric characters followed by a (:
>>> re.findall(r'\w+(?=\()',str)
['ab', 'bcta', 'in23i']
or using re.sub as you were:
>>> re.sub(r'\([^)]+\)',' ',str).split()
['ab', 'bcta', 'in23i']
Just for the sake of complexity:
>>> str = "ab(1234)bcta(45am)in23i(ab78lk)"
>>> res_str = [y[-1] for y in [ x.split(')') for x in str.split('(')]][0:-1]
>>> res_str
['ab', 'bcta', 'in23i']

Using Variables instead of pattern in Regular Expression

I am fairly new to python. I have searched several forums and have not quite found the answer.
I have a list defined and would like to search a line for occurrences in the list. Something like
import re
list = ['a', 'b', 'c']
for xa in range(0, len(list)):
    m = re.search(r, list[xa], line):
    if m:
        print(m)
Is there any way to pass a variable into the regex?
Yes, you could do it like this:
for xa in range(0, len(lst)):
    m = re.search(lst[xa], line)
    if m:
        print(m.group())
Example:
>>> line = 'foo bar'
>>> import re
>>> lst = ['a', 'b', 'c']
>>> for xa in range(0, len(lst)):
...     m = re.search(lst[xa], line)
...     if m:
...         print(m.group())
...
a
b
You can build the variable into the regex parameter, for example:
import re
line = '1y2c3a'
lst = ['a', 'b', 'c']
for x in lst:
    m = re.search(r'\d' + x, line)
    if m:
        print(m.group())
Output:
3a
2c
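If the list items can contain characters that are special in regular expressions (such as . or parentheses), they can be passed through re.escape first so that they match literally. A minimal sketch with made-up data:

```python
import re

line = 'price: 3.50 (approx)'
needles = ['3.50', '(approx)', '3x50']

found = []
for needle in needles:
    # re.escape makes every character in needle match literally.
    m = re.search(re.escape(needle), line)
    if m:
        found.append(m.group())

print(found)  # ['3.50', '(approx)']
```

Without re.escape, '(approx)' would be read as a capturing group and the match would not include the parentheses.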

Search a list of strings with a list of substrings

I have a list of strings and currently I can search for one substring at the time:
str = ['abc', 'efg', 'xyz']
[s for s in str if "a" in s]
which correctly returns
['abc']
Now let's say I have a list of substrings instead:
subs = ['a', 'ef']
I want a command like
[s for s in str if anyof(subs) in s]
which should return
['abc', 'efg']
>>> s = ['abc', 'efg', 'xyz']
>>> subs = ['a', 'ef']
>>> [x for x in s if any(sub in x for sub in subs)]
['abc', 'efg']
Don't use str as a variable name, it's a builtin.
Gets a little convoluted but you could do
[s for s in str if any([sub for sub in subs if sub in s])]
Simply nest the two loops (note that a string matching several substrings then appears once per match):
[s for s in str for r in subs if r in s]
>>> r = ['abc', 'efg', 'xyz']
>>> s = ['a', 'ef']
>>> [t for t in r for x in s if x in t]
['abc', 'efg']
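One difference between the nested-loop form and the any() form is that the nested loop emits a string once per matching substring. A small sketch with made-up data in which one string matches twice:

```python
strings = ['abc', 'efg', 'xyz']
subs = ['a', 'b', 'ef']

# Nested loops: 'abc' matches both 'a' and 'b', so it is emitted twice.
nested = [t for t in strings for x in subs if x in t]

# any(): each string is emitted at most once.
with_any = [t for t in strings if any(x in t for x in subs)]

print(nested)    # ['abc', 'abc', 'efg']
print(with_any)  # ['abc', 'efg']
```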
I still like map and filter, even though a comprehension can always replace them. Hence, here is a map + filter + lambda version:
print(list(filter(lambda x: any(map(x.__contains__, subs)), s)))
which reads:
filter elements of s that contain any element from subs
I like how this uses words that carry a strong semantic meaning, rather than only if, for, in

How to sort the letters in a string alphabetically in Python

Is there an easy way to sort the letters in a string alphabetically in Python?
So for:
a = 'ZENOVW'
I would like to return:
'ENOVWZ'
You can do:
>>> a = 'ZENOVW'
>>> ''.join(sorted(a))
'ENOVWZ'
>>> a = 'ZENOVW'
>>> b = sorted(a)
>>> print(b)
['E', 'N', 'O', 'V', 'W', 'Z']
sorted returns a list, so you can make it a string again using join:
>>> c = ''.join(b)
which joins the items of b together with an empty string '' in between each item.
>>> print(c)
ENOVWZ
The sorted() solution can give unexpected results with other strings.
List of other solutions:
Sort letters and make them distinct:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower())))
' belou'
Sort letters and make them distinct while keeping caps:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s)))
' Bbelou'
Sort letters and keep duplicates:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(s))
' BBbbbbeellou'
If you want to get rid of the space in the result, append .strip() in any of the cases above (the space sorts first, so it ends up at the start of the result):
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower()))).strip()
'belou'
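Python's sorted function orders the characters of a string by their code points (ASCII values for plain English text).
INCORRECT: In the example below, e and d come after H and W because uppercase letters have lower ASCII values than lowercase ones.
>>> a = "Hello World!"
>>> "".join(sorted(a))
' !HWdellloor'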
CORRECT: To sort the string alphabetically regardless of case, without changing the case of any letter, use:
>>> a = "Hello World!"
>>> "".join(sorted(a,key=lambda x:x.lower()))
' !deHllloorW'
OR (Ref: https://docs.python.org/3/library/functions.html#sorted)
>>> a = "Hello World!"
>>> "".join(sorted(a,key=str.lower))
' !deHllloorW'
If you also want to remove everything that is not a letter (punctuation, digits and spaces), use:
>>> a = "Hello World!"
>>> "".join(filter(lambda x:x.isalpha(), sorted(a,key=lambda x:x.lower())))
'deHllloorW'
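Because sorted is stable, letters that differ only in case keep their original relative order under key=str.lower. If a fixed order is preferred, a tuple key is one option. A sketch with a made-up input:

```python
s = "bBaA"

# Stable sort: ties between 'a' and 'A' keep their input order.
by_lower = "".join(sorted(s, key=str.lower))

# Tuple key: compare case-insensitively first, then by the character itself,
# so uppercase comes before lowercase within each letter.
by_lower_then_char = "".join(sorted(s, key=lambda c: (c.lower(), c)))

print(by_lower)            # aAbB
print(by_lower_then_char)  # AaBb
```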
You can use reduce (in Python 3, import it from functools):
>>> from functools import reduce
>>> a = 'ZENOVW'
>>> reduce(lambda x,y: x+y, sorted(a))
'ENOVWZ'
The following code sorts a string alphabetically without calling sorted():
k = input("Enter any string again ")
li = []
x = len(k)
for i in range (0,x):
    li.append(k[i])
print("List is : ",li)
for i in range(0,x):
    for j in range(0,x):
        if li[i]<li[j]:
            temp = li[i]
            li[i]=li[j]
            li[j]=temp
j=""
for i in range(0,x):
    j = j+li[i]
print("After sorting String is : ",j)
Really liked the answer with the reduce() function. Here's another way to sort the string using accumulate().
from itertools import accumulate
s = 'mississippi'
print(tuple(accumulate(sorted(s)))[-1])
sorted(s) -> ['i', 'i', 'i', 'i', 'm', 'p', 'p', 's', 's', 's', 's']
tuple(accumulate(sorted(s))) -> ('i', 'ii', 'iii', 'iiii', 'iiiim', 'iiiimp', 'iiiimpp', 'iiiimpps', 'iiiimppss', 'iiiimppsss', 'iiiimppssss')
We then select the last element (index -1) of the tuple.
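For comparison, the reduce and accumulate variants give the same result as ''.join, which does the job directly. A quick check, assuming Python 3:

```python
from functools import reduce
from itertools import accumulate

s = 'mississippi'

joined = ''.join(sorted(s))
reduced = reduce(lambda x, y: x + y, sorted(s))
accumulated = list(accumulate(sorted(s)))[-1]

print(joined)  # iiiimppssss
print(joined == reduced == accumulated)  # True
```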
