Python regular expression quoted string [duplicate] - python

This question already has answers here:
Python 3 How to get string between two points using regex?
(2 answers)
Closed 4 years ago.
I have the following string:
str1 = "I am doing 'very well' for your info"
and I want to extract the part between the single quotes i.e. very well
How am I supposed to set my regular expression?
I tried the following but obviously it will give wrong result
import re
pt = re.compile(r'\'*\'')
m = pt.findall(str1)
Thanks

You can use re.findall to capture the group between the single quotes:
import re
str1 = "I am doing 'very well' for your info"
data = re.findall("'(.*?)'", str1)[0]
Output:
'very well'

Another way to solve the problem with re.findall: find all sequences that begin and end with a quote, but do not contain a quote.
re.findall("'([^']*)'", str1)
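The two patterns above behave the same on the sample string, but they differ from a greedy `.*` as soon as a string holds more than one quoted segment. A minimal sketch, assuming a made-up input with two quoted parts (the `text` value is not from the question):

```python
import re

# Hypothetical input with two quoted segments (not from the question).
text = "I am 'very well' and 'quite busy' today"

greedy = re.findall(r"'(.*)'", text)      # runs on to the last quote in the string
lazy = re.findall(r"'(.*?)'", text)       # stops at the nearest closing quote
negated = re.findall(r"'([^']*)'", text)  # a quote can never appear inside a match

print(greedy)   # ["very well' and 'quite busy"]
print(lazy)     # ['very well', 'quite busy']
print(negated)  # ['very well', 'quite busy']
```

The negated character class is usually the safer choice, since it cannot run past a closing quote no matter what follows.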

You need a character class that allows word characters and spaces between the single quotes, with a capture group around it:
import re
pt = re.compile(r"'([\w ]*)'")
m = pt.findall(str1)

Are regular expressions really necessary in your case? They often are, but sometimes they just complicate a simple string operation.
If not, you can use Python's built-in str.split method to split the string into a list, using ' as the separator, and take the second element of that list.
str1 = "I am doing 'very well' for your info"
str2 = str1.split("'")
print(str2[1]) # should print: very well

Try this:
import re
pattern=r"'(\w.+)?'"
str1 = "I am doing 'very well' for your info"
print(re.findall(pattern,str1))
output:
['very well']

Related

Replace a regex pattern in a string with another regex pattern in Python [duplicate]

This question already has answers here:
How do I add tags to certain strings in python using re.sub?
(2 answers)
Closed 4 months ago.
Is there a way to replace a regex pattern in a string with another regex pattern? I tried this but it didn't work as intended:
s = 'This is a test. There are two tests'
re.sub(r'\btest(s)??\b', "<b><font color='blue'>\btest(s)??\b</font></b>", s)
The output was:
"This is a <b><font color='blue'>\x08test(s)??\x08</font></b>. There are two <b><font color='blue'>\x08test(s)??\x08</font></b>"
Instead of the desired result, with the keywords test and tests wrapped in HTML tags:
"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"
And if there was a workaround, how could I apply that to a text column in a dataframe?
Thanks in advance.
To reuse the matched text in the replacement, wrap the pattern in () to capture it, then refer to the captured group as \1 in the replacement string.
re.sub(r'(\btest(s)??\b)', r"<b><font color='blue'>\1</font></b>", s)
BTW: the replacement string also needs the r prefix so that \ is treated as a literal backslash. Without it, \b becomes a backspace character and \1 becomes the character \x01.
Result:
"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"
If you use more than one (), each group captures its own part of the match and gets its own number: \1, \2, and so on.
For example:
re.sub(r'(.*) (.*)', r'\2 \1', 'first second')
gives:
'second first'
In the test example above, the inner (s) is also captured, as group \2.
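If only the whole match is needed, the extra outer group can be avoided: Python's replacement syntax also accepts \g<0>, which stands for the entire match. A short sketch, assuming the pattern can be simplified from test(s)?? to tests? (the two match the same words here):

```python
import re

s = 'This is a test. There are two tests'

# \g<0> stands for the whole match, so no extra capturing group is needed.
tagged = re.sub(r'\btests?\b', r"<b><font color='blue'>\g<0></font></b>", s)
print(tagged)
```

This should print the same tagged sentence as the \1 version above.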
You can use a function to replace.
import re
def replacer(match):
    return f"<b><font color='blue'>{match[0]}</font></b>"
s = 'This is a test. There are two tests'
ss = re.sub(r'\btest(s)??\b', replacer, s)
print(ss)
This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>

Extracting the last statement in []'s (regex) [duplicate]

This question already has answers here:
Remove text between square brackets at the end of string
(3 answers)
Closed 3 years ago.
I'm trying to extract the last statement in brackets. However my code is returning every statement in brackets plus everything in between.
Ex: 'What [are] you [doing]'
I want '[doing]', but I get back '[are] you [doing]' when I run re.search.
I ran re.search using a regex expression that SHOULD get the last statement in brackets (plus the brackets) and nothing else. I also tried adding \s+ at the beginning hoping that would fix it, but it didn't.
string = '[What] are you [doing]'
m = re.search(r'\[.*?\]$' , string)
print(m.group(0))
I should just get [doing] back, but instead I get the entire string.
re.findall(r'\[(.+?)\]', 'What [are] you [doing]')[-1]
'doing'
The capture group leaves the brackets out; drop the parentheses from the pattern to get '[doing]'.
To extract only the last statement in brackets, as the question requires:
import re
s = 'What [are] you [doing]'
m = re.search(r'.*(\[[^\[\]]+\])', s)
res = m.group(1) if m else m
print(res) # [doing]
You can use findall and take the last element:
import re
string = 'What [are] you [doing]'
re.findall(r"\[\w+\]", string)[-1]
Output
'[doing]'
This also works with the example posted by @MonkeyZeus in the comments: if the last pair of brackets is empty, it is skipped rather than returned. For example
string = 'What [are] you []'
Output
'[are]'
You can use a negative lookahead pattern to ensure that there isn't another pair of brackets to follow the matching pair of brackets:
re.search(r'\[[^\]]*\](?!.*\[.*\])', string).group()
or you can use .* to consume all the leading characters until the last possible match:
re.search(r'.*(\[.*?\])', string).group(1)
Given string = 'abc [foo] xyz [bar] 123', both snippets above return '[bar]'.
This captures bracketed segments with anything in between the brackets (not necessarily letters or digits: any symbols/spaces/etc):
import re
string = '[US 1?] Evaluate any matters identified when testing segment information.[US 2!]'
print(re.findall(r'\[[^]]*\]', string)[-1])
gives
[US 2!]
A minor fix to your regex: you don't need the $ at the end, and you should use re.findall rather than re.search.
import re
string = 'What [are] you [doing]'
re.findall(r"\[.*?\]", string)[-1]
Output:
'[doing]'
If your string contains an empty [], the method above returns it as well. To skip it, change the regex from \[.*?\] to \[..*?\], which requires at least one character after the opening bracket:
import re
string = "What [are] you []"
re.findall(r"\[..*?\]", string)[-1]
Output:
'[are]'
If nothing matches, findall returns an empty list and [-1] raises an IndexError. The other answers fail in a similar way, so you will have to use try and except or check the result first.
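One way to handle the no-match case without try/except is to check the list before indexing. A minimal sketch, assuming a hypothetical helper named last_bracketed and the rule that an empty [] should be skipped; it uses a negated character class so that a match can never span two bracket pairs:

```python
import re

def last_bracketed(text):
    """Return the last non-empty [...] segment in text, or None if there is none."""
    # [^\[\]]+ requires at least one character that is not a bracket,
    # so an empty [] is skipped and a match never spans two bracket pairs.
    matches = re.findall(r'\[[^\[\]]+\]', text)
    return matches[-1] if matches else None

print(last_bracketed('What [are] you [doing]'))  # [doing]
print(last_bracketed('What [are] you []'))       # [are]
print(last_bracketed('no brackets here'))        # None
```

Returning None leaves it to the caller to decide what a missing match means, instead of raising an error.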

RegEx: Extract Unknown # of Numbers of Unknown Length, With Separators, and Characters to Ignore [duplicate]

This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 4 years ago.
I am looking to extract numbers in the format:
[number]['/' or ' ' or '\' possible, ignore]:['/' or ' ' or '\'
possible, ignore][number]['/' or ' ' or '\' possible, ignore]:...
For example:
"4852/: 5934: 439028/:\23"
Would extract: ['4852', '5934', '439028', '23']
Use re.findall to extract all occurrences of a pattern. Note that a literal backslash in a normal (non-raw) string literal must be written as a double backslash.
>>> import re
>>> re.findall(r'\d+', '4852/: 5934: 439028/:\\23')
['4852', '5934', '439028', '23']
>>>
Python includes the re regular expression module in both 2.7 and 3.x.
The function you probably want is re.split().
A code snippet would be
import re
numbers = re.split(r'[/: \\]+', your_string)
The code above works if you only want to split on those specific separator characters. You could also split on all non-numeric characters, like this:
numbers = re.split(r'\D+', your_string)
or you could extract the numbers directly:
numbers = re.findall(r'\d+', your_string)
Kudos!
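The split and findall approaches give the same result on the sample string, but they differ when the input starts or ends with separator characters. A small sketch, assuming a made-up input (the `raw` value is not from the question) with a leading slash and a trailing colon:

```python
import re

# Hypothetical input that starts and ends with separator characters.
raw = '/4852/: 5934: 439028/:\\23:'

by_findall = re.findall(r'\d+', raw)
by_split = re.split(r'\D+', raw)

print(by_findall)                          # ['4852', '5934', '439028', '23']
print(by_split)                            # ['', '4852', '5934', '439028', '23', '']
print([part for part in by_split if part]) # empty strings filtered out
```

re.split keeps an empty string on each side where the input begins or ends with a separator, so findall is usually the simpler choice when only the numbers matter.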

why doesn't python return result using regex when there's more than one match? [duplicate]

This question already has answers here:
Extract string with Python re.match
(5 answers)
Closed 8 years ago.
Here's the code:
import re
pattern = re.compile(r'ea')
match = pattern.match('sea ea')
if match:
    print(match.group())
Nothing is printed. But when I change the code to pattern = re.compile(r'sea'), the output is "sea".
Could anyone give me an explanation?
p.s.
By the way, what I want is to retrieve "#{year}" from the string "select * from records where year = #{year}". Please give me a usable regex. Thanks in advance!
Summary:
Thanks to all of you; with your pointers I found it in the Python documentation. Since I can accept only one answer, I gave it to the one who answered first. Thanks again.
pattern.match is anchored at the beginning of the string.
You need pattern.search.
From the documentation:
Python offers two different primitive operations based on regular
expressions: re.match() checks for a match only at the beginning of
the string, while re.search() checks for a match anywhere in the
string (this is what Perl does by default).
You mean to use search, not match. match will match the regular expression only if it is at the start of the string.
pattern = re.compile(r'ea')
match = pattern.search('sea ea')
if match:
    print(match.group())
match only checks the start of the string; it doesn't search for things. Splitting on a captured group does find the placeholder:
>>> pattern = re.compile(r'(#{\w+})')
>>> pattern.split('select * from records where year = #{year}')
['select * from records where year = ', '#{year}', '']
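For the placeholder in the question, search (or findall) returns the text directly. A short sketch, assuming the placeholder always has the form #{name} with word characters inside the braces:

```python
import re

query = 'select * from records where year = #{year}'
placeholder = re.compile(r'#\{\w+\}')

print(placeholder.match(query))      # None: the string does not start with '#{'
found = placeholder.search(query)    # scans the whole string for the first match
print(found.group())                 # #{year}
print(placeholder.findall(query))    # ['#{year}']
```

findall is convenient when a query may contain several placeholders, since it returns all of them in order.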

Trying to split a string [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python: Split string with multiple delimiters
I have a small syntax problem. I have a string and another string that holds a list of separators. I need to split the first one using the .split method.
I can't seem to figure out how; this certainly gives a TypeError:
String.split([' ', '{', '='])
How can I split it with multiple separators?
str.split() only accepts one separator.
Use re.split() to split using a regular expression.
import re
re.split(r"[ {=]", "foo bar=baz{qux")
Output:
['foo', 'bar', 'baz', 'qux']
That's not how the built-in split() method works. It takes a single string as the separator, not a list of single-character separators.
You can use regular-expression based splitting instead. This means building a regular expression that is the "or" of all your desired delimiters, escaping each one so that special characters are matched literally:
splitters = "|".join(map(re.escape, [" ", "{", "="]))
re.split(splitters, my_string)
You can do this with the re (regex) library like so:
import re
result=re.split("[abc]", "my string with characters i want to split")
Where the characters in the square brackets are the characters you want to split with.
Use split from regular expressions instead:
>>> import re
>>> s = 'toto + titi = tata'
>>> re.split('[+=]', s)
['toto ', ' titi ', ' tata']
>>>
import re
string_test = "abc cde{fgh=ijk"
re.split(r'[\s{=]', string_test)
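The answers above can be combined into a small reusable helper. This is only a sketch: the name split_on_any is hypothetical, and it assumes that empty pieces (produced by adjacent separators) should be dropped.

```python
import re

def split_on_any(text, separators):
    """Split text on any of the given separator strings, dropping empty pieces."""
    # re.escape makes characters such as '{', '|' or '.' match literally.
    pattern = '|'.join(re.escape(sep) for sep in separators)
    return [piece for piece in re.split(pattern, text) if piece]

print(split_on_any('abc cde{fgh=ijk', [' ', '{', '=']))  # ['abc', 'cde', 'fgh', 'ijk']
print(split_on_any('a|b.c', ['|', '.']))                 # ['a', 'b', 'c']
```

Escaping each separator matters once the list contains regex metacharacters such as | or ., which would otherwise change the meaning of the pattern.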
