Python string to list? - python

I'm trying to convert a string to a list:
str = "ab(1234)bcta(45am)in23i(ab78lk)"
Expected Output
res_str = ["ab","bcta","in23i"]
I tried removing brackets from str.
re.sub(r'\([^)]*\)', '', str)

You may use a negated character class with a lookahead:
>>> s = "ab(1234)bcta(45am)in23i(ab78lk)"
>>> print (re.findall(r'[^()]+(?=\()', s))
['ab', 'bcta', 'in23i']
RegEx Details:
[^()]+: Match 1 or more of any character that is neither ( nor )
(?=\(): Lookahead to assert that there is a ( ahead
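To see what the lookahead buys you, here is a small sketch that runs the same pattern on the question's string and on a made-up variant (`extra` is my own input, not from the question) with text after the last closing parenthesis. Text that is not followed by ( is simply skipped:

```python
import re

pattern = re.compile(r'[^()]+(?=\()')

s = "ab(1234)bcta(45am)in23i(ab78lk)"
first = pattern.findall(s)

# Hypothetical input: "tail" is not followed by "(", so the lookahead rejects it.
extra = "ab(1234)bcta(45am)tail"
second = pattern.findall(extra)

print(first)   # ['ab', 'bcta', 'in23i']
print(second)  # ['ab', 'bcta']
```

If you also need trailing text such as "tail", the split-based answers are probably a better fit.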

So many options here. One possibility would be using split:
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split(r'\(.*?\)', str)[:-1])
Returns:
['ab', 'bcta', 'in23i']
A second option would be to split on all parentheses and slice the resulting list:
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split('[()]', str)[0:-1:2])
Here [0:-1:2] means: start at index 0, stop before the last index, and take every second element.
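It may help to look at the intermediate list before slicing. A minimal sketch (the names `parts`, `outside` and `inside` are mine): splitting on either parenthesis gives a list that alternates between text outside and text inside the brackets, plus a trailing empty string because the input ends with ).

```python
import re

s = "ab(1234)bcta(45am)in23i(ab78lk)"

# Splitting on "(" or ")" alternates outside / inside text.
parts = re.split(r'[()]', s)
print(parts)    # ['ab', '1234', 'bcta', '45am', 'in23i', 'ab78lk', '']

outside = parts[0:-1:2]   # even positions, without the trailing ''
inside = parts[1::2]      # odd positions

print(outside)  # ['ab', 'bcta', 'in23i']
print(inside)   # ['1234', '45am', 'ab78lk']
```

Note that this relies on the brackets being flat and balanced; nested brackets would break the alternation.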

Use re.split
import re
str = "ab(1234)bcta(45am)in23i(ab78lk)"
print(re.split(r'\(.*?\)', str))
Returns:
['ab', 'bcta', 'in23i', '']
If you want to get rid of empty strings in your list, you may use a filter:
print(list(filter(None, re.split(r'\(.*?\)', str))))
Returns:
['ab', 'bcta', 'in23i']

You may match all alphanumeric characters followed by a ( :
>>> re.findall(r'\w+(?=\()', str)
['ab', 'bcta', 'in23i']
or using re.sub as you were:
>>> re.sub(r'\([^)]+\)', ' ', str).split()
['ab', 'bcta', 'in23i']

Just for the sake of complexity :
>>> str = "ab(1234)bcta(45am)in23i(ab78lk)"
>>> res_str = [y[-1] for y in [x.split(')') for x in str.split('(')]][0:-1]
>>> res_str
['ab', 'bcta', 'in23i']

Related

Split a string after multiple delimiters and include it

Hello, I'm trying to split a string without removing the delimiter, and there can be multiple delimiters.
The delimiters can be 'D', 'M' or 'Y'
For example:
>>> string = '1D5Y4D2M'
>>> re.split(someregex, string) # should ideally return
['1D', '5Y', '4D', '2M']
To keep the delimiter I use the approach from "Python split() without removing the delimiter":
>>> re.split('([^D]+D)', '1D5Y4D2M')
['', '1D', '', '5Y4D', '2M']
For multiple delimiters I use the approach from "In Python, how do I split a string and keep the separators?":
>>> re.split('(D|M|Y)', '1D5Y4D2M')
['1', 'D', '5', 'Y', '4', 'D', '2', 'M', '']
Combining both doesn't quite make it.
>>> re.split('([^D]+D|[^M]+M|[^Y]+Y)', string)
['', '1D', '', '5Y4D', '', '2M', '']
Any ideas?
I'd use findall() in your case. How about:
re.findall(r'\d+[DYM]', string)
Which will result in:
['1D', '5Y', '4D', '2M']
(?<=(?:D|Y|M))
You need to split on a zero-width assertion. This can be done with the regex module for Python (since Python 3.7, the built-in re.split also splits on empty matches).
See demo.
https://regex101.com/r/aKV13g/1
You can split at the locations right after D, Y or M but not at the end of the string with
re.split(r'(?<=[DYM])(?!$)', text)
See the regex demo. Details:
(?<=[DYM]) - a positive lookbehind that matches a location that is immediately preceded with D or Y or M
(?!$) - a negative lookahead that fails the match if the current position is the string end position.
Note
In the current scenario, (?<=[DYM]) can be used instead of a more verbose (?<=D|Y|M) since all alternatives are single characters. If you have multichar delimiters, you would have to use a non-capturing group, (?:...), with lookbehind alternatives inside it. For example, to separate right after Y, DX and MZB you would use (?:(?<=Y)|(?<=DX)|(?<=MZB)). See Python Regex Engine - "look-behind requires fixed-width pattern" Error
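A short sketch of both variants from the note, assuming Python 3.7 or later (where re.split accepts patterns that match the empty string). The second input string is made up to exercise the Y, DX and MZB delimiters:

```python
import re

# Single-character delimiters: one lookbehind with a character class.
single = re.split(r'(?<=[DYM])(?!$)', '1D5Y4D2M')
print(single)  # ['1D', '5Y', '4D', '2M']

# Multi-character delimiters: one lookbehind per alternative, because each
# lookbehind must have a fixed width. The input here is hypothetical.
multi = re.split(r'(?:(?<=Y)|(?<=DX)|(?<=MZB))(?!$)', '1Y2DX3MZB4Y')
print(multi)   # ['1Y', '2DX', '3MZB', '4Y']
```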
I think this works fine without regex or split, with time complexity O(n):
string = '1D5Y4D2M'
temp = ''
res = []
for x in string:
    if x == 'D':
        temp += 'D'
        res.append(temp)
        temp = ''
    elif x == 'M':
        temp += 'M'
        res.append(temp)
        temp = ''
    elif x == 'Y':
        temp += 'Y'
        res.append(temp)
        temp = ''
    else:
        temp += x
print(res)
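The three branches do the same thing, so the loop can be folded into one membership test. A possible sketch of the same single-pass idea (the function name and the `delimiters` parameter are mine, not from the answer):

```python
def split_keep_delimiters(text, delimiters=frozenset('DMY')):
    """Split text after each delimiter character, keeping the delimiter."""
    pieces = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in delimiters:
            pieces.append(''.join(current))
            current = []
    # Trailing characters with no closing delimiter are dropped,
    # which mirrors the behaviour of the loop above.
    return pieces


print(split_keep_delimiters('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
```

Collecting characters in a list and joining once per piece avoids repeated string concatenation, though for short inputs like this it makes no practical difference.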
Using translate:
string = '1D5Y4D2M'
delimiters = ['D', 'Y', 'M']
result = string.translate({ord(c): f'{c}*' for c in delimiters}).strip('.*').split('*')
print(result)
>>> ['1D', '5Y', '4D', '2M']

string split considering quotation

Imagine this string:
"a","b","hi, this is Mboyle"
I would like to split it on commas, unless the comma is between two quotations:
i.e:
["a","b","hi, this is Mboyle"]
How do I achieve this? Using split, the "hi, this is Mboyle" gets split as well!
You can split your string not by commas, but by ",":
In [1]: '"a","b","hi, this is Mboyle"'.strip('"').split('","')
Out[1]: ['a', 'b', 'hi, this is Mboyle']
My take on the problem (use with caution!)
s = '"a","b","hi, this is Mboyle"'
new_s = eval(f'[{s}]')
print(new_s)
Output:
['a', 'b', 'hi, this is Mboyle']
EDIT (safer version):
import ast
s = '"a","b","hi, this is Mboyle"'
new_s = ast.literal_eval(f'[{s}]')
Solved.
import csv, gzip
with gzip.open(file, 'rt') as handler:
    for row in csv.reader(handler, delimiter=","):
        print(row)  # placeholder body: each row is a list of fields
This does the trick! Thank you all.
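The csv module takes care of the quoting for you. Here is a small sketch of the same idea applied to the example string itself, without the gzip file. Wrapping the string in io.StringIO is my own choice for illustration, so that csv.reader has a file-like object to read from:

```python
import csv
import io

s = '"a","b","hi, this is Mboyle"'

# csv.reader expects an iterable of lines; StringIO provides that for one string.
row = next(csv.reader(io.StringIO(s), delimiter=","))
print(row)  # ['a', 'b', 'hi, this is Mboyle']
```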
You could include the quotations in the split, so with .split('","'). Then remove the quotations on the end list items as needed.
You can use re.split:
import re
s = '"a","b","hi, this is Mboyle"'
new_s = list(map(lambda x:x[1:-1], re.split('(?<="),(?=")', s)))
Output:
['a', 'b', 'hi, this is Mboyle']
However, re.findall is much cleaner:
new_result = re.findall('"(.*?)"', s)
Output:
['a', 'b', 'hi, this is Mboyle']

python how to split string with more than one character?

I would like to split a string as below
1234ABC into 1234 and ABC
2B into 2 and B
10E into 10 and E
I found that the split function does not work because there is no delimiter.
You can use itertools.groupby with boolean isdigit function.
from itertools import groupby
test1 = '123ABC'
test2 = '2B'
test3 = '10E'
def custom_split(s):
    return [''.join(gp) for _, gp in groupby(s, lambda char: char.isdigit())]
for t in [test1, test2, test3]:
    print(custom_split(t))
# ['123', 'ABC']
# ['2', 'B']
# ['10', 'E']
This can quite easily be accomplished using the re module:
>>> import re
>>>
>>> re.findall('[a-zA-Z]+|[0-9]+', '1234ABC')
['1234', 'ABC']
>>> re.findall('[a-zA-Z]+|[0-9]+', '2B')
['2', 'B']
>>> re.findall('[a-zA-Z]+|[0-9]+', '10E')
['10', 'E']
>>> # additional test case
...
>>> re.findall('[a-zA-Z]+|[0-9]+', 'abcd1234efgh5678')
['abcd', '1234', 'efgh', '5678']
>>>
The regex is very simple. Here is a quick walkthrough:
[a-zA-Z]+: match one or more alphabetic characters, lower or upper case
|: or...
[0-9]+: match one or more digits
Another way to solve it using the re package (test_string is the input, e.g. '1234ABC'):
r = re.search('([0-9]*)([a-zA-Z]*)', test_string)
r.groups()
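A runnable variant of this idea, as a sketch only: the helper name is hypothetical, and I have swapped re.search with * for re.fullmatch with +, so that inputs which are not "digits followed by letters" return None instead of a pair of possibly empty strings.

```python
import re


def split_number_and_letters(test_string):
    """Return (digits, letters) for inputs like '1234ABC', else None."""
    match = re.fullmatch(r'([0-9]+)([a-zA-Z]+)', test_string)
    if match is None:
        return None
    return match.groups()


for value in ['1234ABC', '2B', '10E']:
    print(split_number_and_letters(value))
# ('1234', 'ABC')
# ('2', 'B')
# ('10', 'E')
```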

Find all strings in nested brackets

How do I find strings in nested brackets?
Let's say I have a string
uv(wh(x(yz))
and I want to find all strings in brackets (so wh, x, yz)
import re
s="uuv(wh(x(yz))"
regex = r"(\(\w*?\))"
matches = re.findall(regex, s)
The above code only finds yz
Can I modify this regex to find all matches?
To get all properly parenthesized text:
import re
def get_all_in_parens(text):
    in_parens = []
    n = "has something to substitute"
    while n:
        text, n = re.subn(r'\(([^()]*)\)', # match flat expression in parens
                          lambda m: in_parens.append(m.group(1)) or '', text)
    return in_parens
Example:
>>> get_all_in_parens("uuv(wh(x(yz))")
['yz', 'x']
Note: there is no 'wh' in the result due to the unbalanced paren.
If the parentheses are balanced, it returns all three nested substrings:
>>> get_all_in_parens("uuv(wh(x(yz)))")
['yz', 'x', 'wh']
>>> get_all_in_parens("a(b(c)de)")
['c', 'bde']
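For comparison, here is a regex-free sketch of the same idea using an explicit stack. This is an alternative technique, not the answer's method, and the function name is mine. It gives the same results on the three examples above, though for other inputs the order of the results may differ from the re.subn version.

```python
def get_all_in_parens_stack(text):
    """Collect the text of every properly closed parenthesized group."""
    results = []
    stack = []  # one character buffer per currently open parenthesis
    for ch in text:
        if ch == '(':
            stack.append([])
        elif ch == ')':
            if stack:  # ignore a stray closing parenthesis
                results.append(''.join(stack.pop()))
        elif stack:
            stack[-1].append(ch)
    # Groups that were opened but never closed are left on the stack and dropped.
    return results


print(get_all_in_parens_stack("uuv(wh(x(yz)))"))  # ['yz', 'x', 'wh']
print(get_all_in_parens_stack("uuv(wh(x(yz))"))   # ['yz', 'x']
print(get_all_in_parens_stack("a(b(c)de)"))       # ['c', 'bde']
```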
Would a string split work instead of a regex?
s='uv(wh(x(yz))'
match=[''.join(x for x in i if x.isalpha()) for i in s.split('(')]
>>> print(match)
['uv', 'wh', 'x', 'yz']
>>> match.pop(0)
You can pop off the first element: if the string starts with a parenthesis, that element is empty, and if it doesn't, the element is the text before the first parenthesis, which was not inside any brackets. Either way, you don't want it.
Since that wasn't flexible enough something like this would work:
def match(string):
    unrefined_match = re.findall(r'\((\w+)|(\w+)\)', string)
    return [x for i in unrefined_match for x in i if x]
>>> match('uv(wh(x(yz))')
['wh', 'x', 'yz']
>>> match('a(b(c)de)')
['b', 'c', 'de']
Using regex a pattern such as this might potentially work:
\((\w{1,})
Result:
['wh', 'x', 'yz']
Your current pattern escapes the ( ) and doesn't treat them as a capture group.
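Well, if you know how to convert a PHP (PCRE) regex to Python, you can use this recursive pattern (note that Python's built-in re module does not support (?R) recursion; the third-party regex module does):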
\(((?>[^()]+)|(?R))*\)

Split a string into sets of twos

I want to split a string into sets of twos, e.g.
['abcdefg']
to
['ab','cd','ef']
Here is what I have so far:
string = 'acabadcaa\ndarabr'
newString = []
for i in string:
    newString.append(string[i:i+2])
One option using regular expressions:
>>> import re
>>> re.findall(r'..', 'abcdefg')
['ab', 'cd', 'ef']
re.findall returns a list of all non-overlapping matches from a string. '..' says match any two consecutive characters.
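One caveat for the string in the question, which contains a newline: by default . does not match '\n', so a pair that would include the newline is skipped. A small sketch showing the difference with the re.DOTALL flag (the variable names are mine):

```python
import re

string = 'acabadcaa\ndarabr'

# Default: "." does not match "\n", so the "a" just before the newline is dropped.
default_pairs = re.findall(r'..', string)
print(default_pairs)  # ['ac', 'ab', 'ad', 'ca', 'da', 'ra', 'br']

# With re.DOTALL, "." matches any character, including the newline.
dotall_pairs = re.findall(r'..', string, flags=re.DOTALL)
print(dotall_pairs)   # ['ac', 'ab', 'ad', 'ca', 'a\n', 'da', 'ra', 'br']
```

Which behaviour you want depends on whether the newline should count as a character in your pairs.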
def splitCount(s, count):
    return [''.join(x) for x in zip(*[list(s[z::count]) for z in range(count)])]
splitCount('abcdefg',2)
To split a string s into a list of (guaranteed) equally long substrings of the length n, and truncating smaller fragments:
n = 2
s = 'abcdef'
lst = [s[i:i+n] for i in range(0, len(s)-len(s)%n, n)]  # use xrange on Python 2
['ab', 'cd', 'ef']
Try this
s = "abcdefg"
newList = [s[i:i+2] for i in range(0,len(s)-1,2)]
This function will get any chunk :
def chunk(s,chk):
    ln = len(s)
    return [s[i:i+chk] for i in range(0, ln - ln % chk, chk)]  # use xrange on Python 2
In [2]: s = "abcdefg"
In [3]: chunk(s,2)
Out[3]: ['ab', 'cd', 'ef']
In [4]: chunk(s,3)
Out[4]: ['abc', 'def']
In [5]: chunk(s,5)
Out[5]: ['abcde']
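If you sometimes want the leftover characters instead of truncating them, the same slicing idea extends naturally. A sketch for Python 3, where the `keep_remainder` parameter is my own addition and not part of the answer above:

```python
def chunk(s, size, keep_remainder=False):
    """Split s into pieces of length size; optionally keep a shorter last piece."""
    stop = len(s) if keep_remainder else len(s) - len(s) % size
    return [s[i:i + size] for i in range(0, stop, size)]


print(chunk("abcdefg", 2))                       # ['ab', 'cd', 'ef']
print(chunk("abcdefg", 2, keep_remainder=True))  # ['ab', 'cd', 'ef', 'g']
print(chunk("abcdefg", 3))                       # ['abc', 'def']
```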
