This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python: Split string with multiple delimiters
I have a small syntax problem. I have a string and another string that has a list of separators. I need to split it via the .split method.
I can't seem to figure out how; this certainly gives a TypeError:
String.split([' ', '{', '='])
How can I split it with multiple separators?
str.split() only accepts one separator.
Use re.split() to split using a regular expression.
import re
re.split(r"[ {=]", "foo bar=baz{qux")
Output:
['foo', 'bar', 'baz', 'qux']
That's not how the built-in split() method works. It simply uses a single string as the separator, not a list of single-character separators.
You can use regular-expression based splitting, instead. This would probably mean building a regular expression that is the "or" of all your desired delimiters:
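import re
# re.escape keeps delimiters such as "|" or "." from being read as regex syntax
splitters = "|".join(map(re.escape, [" ", "{", "="]))
re.split(splitters, my_string)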
You can do this with the re (regex) library like so:
import re
result=re.split("[abc]", "my string with characters i want to split")
Where the characters in the square brackets are the characters you want to split with.
Use split from regular expressions instead:
>>> import re
>>> s = 'toto + titi = tata'
>>> re.split('[+=]', s)
['toto ', ' titi ', ' tata']
>>>
import re
string_test = "abc cde{fgh=ijk"
re.split(r'[\s{=]', string_test)
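The outputs above keep the whitespace next to each delimiter (for example 'toto ' and ' titi '). Here is a minimal sketch, assuming you want that whitespace trimmed and empty fields dropped. The function name split_fields and its default delimiters are hypothetical, chosen only for illustration:

```python
import re

def split_fields(text, delimiters="+="):
    """Split text on any single character in `delimiters`, trimming whitespace."""
    # Build a character class from the delimiter characters, escaped so that
    # characters such as "]" or "^" are taken literally.
    char_class = "[" + re.escape(delimiters) + "]"
    # \s* on both sides swallows whitespace next to each delimiter.
    parts = re.split(r"\s*" + char_class + r"\s*", text.strip())
    # Drop empty strings produced by leading, trailing or doubled delimiters.
    return [part for part in parts if part]

print(split_fields("toto + titi = tata"))
```

Whether empty fields should be dropped depends on your data; remove the final filter if they are meaningful.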
This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Closed 8 years ago.
I found some answers online, but I have no experience with regular expressions, which I believe is what is needed here.
I have a string that needs to be split by either a ';' or ', '.
That is, it has to be either a semicolon or a comma followed by a space. Individual commas without trailing spaces should be left untouched.
Example string:
"b-staged divinylsiloxane-bis-benzocyclobutene [124221-30-3], mesitylene [000108-67-8]; polymerized 1,2-dihydro-2,2,4- trimethyl quinoline [026780-96-1]"
should be split into a list containing the following:
('b-staged divinylsiloxane-bis-benzocyclobutene [124221-30-3]' , 'mesitylene [000108-67-8]', 'polymerized 1,2-dihydro-2,2,4- trimethyl quinoline [026780-96-1]')
Luckily, Python has this built-in :)
import re
re.split('; |, ', string_to_split)
Update: Following your comment:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
Do a str.replace('; ', ', ') and then a str.split(', ')
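A quick sketch of the replace-then-split idea applied to the example string from the question. It assumes every semicolon is followed by a space, as in that example; a bare ';' would not be split this way:

```python
text = ("b-staged divinylsiloxane-bis-benzocyclobutene [124221-30-3], "
        "mesitylene [000108-67-8]; polymerized 1,2-dihydro-2,2,4- "
        "trimethyl quinoline [026780-96-1]")

# Normalize "; " to ", " first, then split on the single remaining delimiter.
# Commas without a trailing space (as in "1,2-dihydro") are left untouched.
parts = text.replace('; ', ', ').split(', ')

for part in parts:
    print(part)
```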
Here's a safe way for any iterable of delimiters, using regular expressions:
>>> import re
>>> delimiters = "a", "...", "(c)"
>>> example = "stackoverflow (c) is awesome... isn't it?"
>>> regex_pattern = '|'.join(map(re.escape, delimiters))
>>> regex_pattern
'a|\\.\\.\\.|\\(c\\)'
>>> re.split(regex_pattern, example)
['st', 'ckoverflow ', ' is ', 'wesome', " isn't it?"]
re.escape lets you build the pattern automatically, with every delimiter escaped properly.
Here's this solution as a function for your copy-pasting pleasure:
def split(delimiters, string, maxsplit=0):
    import re
    # Escape each delimiter so it is matched literally, then join with "|"
    regex_pattern = '|'.join(map(re.escape, delimiters))
    return re.split(regex_pattern, string, maxsplit=maxsplit)
If you're going to split often using the same delimiters, compile your regular expression beforehand as described and use the compiled pattern's split method (RegexObject.split in the Python 2 documentation).
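A minimal sketch of compiling the pattern once and reusing it. The delimiter set and the sample lines are made up for illustration:

```python
import re

DELIMITERS = ("; ", ", ")  # hypothetical delimiter set for illustration

# Compile once; the compiled pattern object exposes its own split() method.
SPLITTER = re.compile("|".join(map(re.escape, DELIMITERS)))

lines = ["a, b; c", "d; e", "f"]  # made-up sample input
rows = [SPLITTER.split(line) for line in lines]
print(rows)
```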
If you'd like to leave the original delimiters in the string, you can change the regex to use a lookbehind assertion instead (splitting on such zero-width matches requires Python 3.7 or later):
>>> import re
>>> delimiters = "a", "...", "(c)"
>>> example = "stackoverflow (c) is awesome... isn't it?"
>>> regex_pattern = '|'.join('(?<={})'.format(re.escape(delim)) for delim in delimiters)
>>> regex_pattern
'(?<=a)|(?<=\\.\\.\\.)|(?<=\\(c\\))'
>>> re.split(regex_pattern, example)
['sta', 'ckoverflow (c)', ' is a', 'wesome...', " isn't it?"]
(replace ?<= with ?= to attach the delimiters to the right-hand side instead of the left)
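Another way to keep the delimiters, which does not depend on zero-width splitting, is to wrap the alternation in a capturing group: re.split then returns each matched delimiter as its own list item. A small sketch reusing the same sample data:

```python
import re

delimiters = "a", "...", "(c)"
example = "stackoverflow (c) is awesome... isn't it?"

# Wrapping the whole alternation in one capturing group makes re.split
# return the matched delimiters as items between the text pieces.
pattern = "(" + "|".join(map(re.escape, delimiters)) + ")"
pieces = re.split(pattern, example)
print(pieces)
```

Since nothing is discarded, joining the pieces back together reproduces the original string.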
In response to Jonathan's answer above, this only seems to work for certain delimiters. For example:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
>>> b='1999-05-03 10:37:00'
>>> re.split('- :', b)
['1999-05-03 10:37:00']
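The pattern '- :' only matches that exact three-character sequence, which does not occur in b. Putting the delimiters in square brackets turns them into a character class that matches any one of them: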
>>> re.split('[- :]', b)
['1999', '05', '03', '10', '37', '00']
This is what the regex looks like:
import re
# "semicolon or (a comma followed by a space)"
pattern = re.compile(r";|, ")
# "(semicolon or a comma) followed by a space"
pattern = re.compile(r"[;,] ")
print(pattern.split(text))
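To make the difference between the two patterns concrete, here is a small sketch run on a made-up input string (the variable names loose and strict are only for illustration):

```python
import re

text = "a;b, c; d,e"  # made-up sample input

# ";" on its own, or "," followed by a space, is a delimiter.
loose = re.split(r";|, ", text)
# ";" or "," counts as a delimiter only when a space follows it.
strict = re.split(r"[;,] ", text)

print(loose)
print(strict)
```

With the first pattern a bare ';' splits and the following space stays in the next field; with the second a bare ';' is left alone.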
This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 4 years ago.
I am looking to extract numbers in the format:
[number]['/' or ' ' or '\' possible, ignore]:['/' or ' ' or '\'
possible, ignore][number]['/' or ' ' or '\' possible, ignore]:...
For example:
"4852/: 5934: 439028/:\23"
Would extract: ['4852', '5934', '439028', '23']
Use re.findall to extract all occurrences of a pattern. Note that you should use a double backslash to represent a literal backslash in a quoted string.
>>> import re
>>> re.findall(r'\d+', '4852/: 5934: 439028/:\\23')
['4852', '5934', '439028', '23']
>>>
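re.findall returns strings. If you need actual integers, a short sketch (assuming every match fits the int() constructor, which holds for runs of digits):

```python
import re

raw = '4852/: 5934: 439028/:\\23'

# \d+ grabs each run of digits; int() converts the matched text.
numbers = [int(token) for token in re.findall(r'\d+', raw)]
print(numbers)
```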
Python has a regular-expression module, re, in both 2.7 and 3.x.
The function you probably want is re.split().
A code snippet would be
import re
numbers = re.split(r'[/:\\ ]+', your_string)
The code above works if you only want to split on those specific non-alphanumeric characters. But you could also split on every run of non-digit characters, like this:
numbers = re.split(r'\D+', your_string)
or you could do
numbers = re.findall(r'\d+', your_string)
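One practical difference between the last two options: when the string starts or ends with a non-digit, re.split leaves empty strings at the edges, while re.findall does not. A sketch on a made-up input:

```python
import re

sample = "/12:34/"  # made-up input that starts and ends with a delimiter

split_result = re.split(r'\D+', sample)
findall_result = re.findall(r'\d+', sample)

print(split_result)    # includes empty strings at both ends
print(findall_result)  # only the digit runs
```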
Kudos!
This question already has answers here:
Python 3 How to get string between two points using regex?
(2 answers)
Closed 4 years ago.
I have the following string:
str1 = "I am doing 'very well' for your info"
and I want to extract the part between the single quotes i.e. very well
How am I supposed to set my regular expression?
I tried the following but obviously it will give wrong result
import re
pt = re.compile(r'\'*\'')
m = pt.findall(str1)
Thanks
You can use re.findall to capture the group between the single quotes:
import re
str1 = "I am doing 'very well' for your info"
data = re.findall("'(.*?)'", str1)[0]
Output:
'very well'
Another way to solve the problem with re.findall: find all sequences that begin and end with a quote, but do not contain a quote.
re.findall("'([^']*)'", str1)
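The non-greedy quantifier and the negated character class matter as soon as the string holds more than one quoted section. A sketch on a made-up sentence, comparing them with the greedy form:

```python
import re

text = "say 'one' and 'two' now"  # made-up input with two quoted parts

greedy = re.findall(r"'(.*)'", text)       # runs from the first quote to the last
lazy = re.findall(r"'(.*?)'", text)        # stops at the nearest closing quote
negated = re.findall(r"'([^']*)'", text)   # never crosses a quote character

print(greedy)
print(lazy)
print(negated)
```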
You need to place a character class of word characters and spaces, wrapped in a capturing group, between the single quotes.
import re
pt = re.compile(r"'([\w ]*)'")
m = pt.findall(str1)
Is using regular expressions entirely necessary for your case? It often is but sometimes regular expressions just complicate simple string operations.
If not, you can use Python's built-in str.split method to split the string into a list using ' as the divider, and then index into the list it creates.
str1 = "I am doing 'very well' for your info"
str2 = str1.split("'")
print(str2[1]) # should print: very well
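The same idea extends to several quoted sections: after splitting on the quote character, the quoted parts sit at the odd indexes. A sketch, assuming the quotes are balanced and never nested (the sample sentence is made up):

```python
text = "say 'one' and 'two' now"  # made-up input with balanced quotes

pieces = text.split("'")
# Even indexes hold text outside quotes, odd indexes hold text inside them.
quoted = pieces[1::2]
print(quoted)
```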
Try this:
import re
pattern=r"'(\w.+)?'"
str1 = "I am doing 'very well' for your info"
print(re.findall(pattern,str1))
output:
['very well']
This question already has answers here:
Python regex: splitting on pattern match that is an empty string
(2 answers)
Closed 5 years ago.
import re
s = 'PythonCookbookListOfContents'
# the first line does not work
print re.split('(?<=[a-z])(?=[A-Z])', s )
# second line works well
print re.sub('(?<=[a-z])(?=[A-Z])', ' ', s)
# it should be ['Python', 'Cookbook', 'List', 'Of', 'Contents']
How to split a string from the border of a lower case character and an upper case character using Python re?
Why does the first line fail to work while the second line works well?
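According to the re.split documentation for Python versions before 3.7 (from 3.7 on, re.split does split on empty matches):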
Note that split will never split a string on an empty pattern match.
For example:
>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']
How about using re.findall instead? (Instead of focusing on separators, focus on the item you want to get.)
>>> import re
>>> s = 'PythonCookbookListOfContents'
>>> re.findall('[A-Z][a-z]+', s)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
UPDATE
Using the regex module (an alternative regular expression module intended to replace re), you can split on a zero-width match:
>>> import regex
>>> s = 'PythonCookbookListOfContents'
>>> regex.split('(?<=[a-z])(?=[A-Z])', s, flags=regex.VERSION1)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
NOTE: Specify regex.VERSION1 flag to enable split-on-zero-length-match behavior.
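On Python 3.7 and later, the standard re module also splits on empty matches, so the lookaround pattern from the question works without a third-party module. A sketch, assuming such an interpreter:

```python
import re

s = 'PythonCookbookListOfContents'

# Zero-width position between a lowercase letter and an uppercase letter.
# Requires Python 3.7+, where re.split honors empty matches.
words = re.split(r'(?<=[a-z])(?=[A-Z])', s)
print(words)
```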
The title of How do I do what strtok() does in C, in Python? suggests it should answer my question but the specific strtok() behavior I'm looking for is breaking on any one of the characters in the delimiter string. That is, given:
const char* delim = ", ";
str1 = "123,456";
str2 = "234 567";
str3 = "345, 678";
strtok() finds the substrings of digits regardless of how many characters from delim are present. Python's split expects the entire delimiting string to be there so I can't do:
delim = ', '
"123,456".split(delim)
because it doesn't find delim as a substring and returns a single-element list.
If you know that the tokens are going to be numbers, you should be able to use the split function from Python's re module:
import re
re.split(r"\D+", "123,456")
More generally, you could match on any of the delimiter characters:
re.split("[ ,]", "123,456")
or:
re.split("[" + re.escape(delim) + "]", "123,456")
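Putting these pieces together, here is a rough strtok-like helper. The name tokenize is hypothetical, and the sketch assumes delims is a non-empty string of single-character delimiters. The trailing + treats a run of delimiters as one separator, and empty tokens are dropped, which mirrors how strtok() skips consecutive delimiters:

```python
import re

def tokenize(text, delims):
    """Return the non-empty tokens of text, split on any character in delims."""
    # re.escape guards against characters such as "]", "^" or "-" that are
    # special inside a character class. delims must not be empty.
    pattern = "[" + re.escape(delims) + "]+"
    return [token for token in re.split(pattern, text) if token]

for sample in ("123,456", "234 567", "345, 678"):
    print(tokenize(sample, ", "))
```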
Using replace() to normalize your delimiters all to the same character, and split()-ting on that character, is one way to deal with simpler cases. For your examples, replace(',',' ').split() should work (converting the commas to spaces, then using the special no-argument form of split to split on runs of whitespace).
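A sketch of that replace-and-split approach on the three strings from the question:

```python
samples = ["123,456", "234 567", "345, 678"]

# Turn commas into spaces, then let the no-argument split() collapse
# any run of whitespace into a single separator.
results = [sample.replace(',', ' ').split() for sample in samples]
print(results)
```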
In Python, when things start getting too complex for split and replace you generally turn to the re module; see Sam Mussmann's more general answer.