Regular Expression to find brackets in a string - python

I have a string which has multiple brackets. Let's say
s="(a(vdwvndw){}]"
I want to extract all the brackets as a separate string.
I tried this:
>>> brackets=re.search(r"[(){}[]]+",s)
>>> brackets.group()
But it only gives me the last two brackets:
'}]'
Why is that? Shouldn't it fetch one or more of any of the brackets in the character set?

You have to escape the first closing square bracket.
r'[(){}[\]]+'
To combine all of them into a string, you can search for anything that doesn't match and remove it.
brackets = re.sub( r'[^(){}[\]]', '', s)

Use the following (Closing square bracket must be escaped inside character class):
brackets=re.search(r"[(){}[\]]+",s)

The regular expression "[(){}[]]+" (or rather "[](){}[]+" or "[(){}[\]]+" (as others have suggested)) finds a sequence of consecutive characters.
What you need to do is find all of these sequences and join them.
One solution is this:
brackets = ''.join(re.findall(r"[](){}[]+",s))
Note also that I rearranged the order of characters in a class, as ] has to be at the beginning of a class so that it is not interpreted as the end of class definition.
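To make the answers above concrete, here is a small sketch that runs the original pattern and the suggested fixes against the string from the question. The variable names are only illustrative.

```python
import re

s = "(a(vdwvndw){}]"

# Unescaped: the class ends at the first "]", so it is "[(){}[" and the
# pattern then requires one or more literal "]" characters after it.
broken = re.search(r"[(){}[]]+", s)

# Escaping "]" (or placing it first in the class) keeps it inside the class.
runs = re.findall(r"[(){}[\]]+", s)
runs_alt = re.findall(r"[](){}[]+", s)

# Two ways to collect every bracket into one string.
joined = "".join(runs)
stripped = re.sub(r"[^(){}[\]]", "", s)

print(broken.group())  # }]
print(runs)            # ['(', '(', '){}]']
print(joined)          # ((){}]
print(stripped)        # ((){}]
```

Both the findall-and-join approach and the negated-class substitution give the same result here; the substitution is a single pass and avoids the intermediate list.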

You could also do this without a regex:
s="(a(vdwvndw){}]"
keep = {"(",")","[","]","{","}"}
print("".join([ch for ch in s if ch in keep]))
((){}]

Related

Replace text between parentheses in python

My string will contain () in it. What I need to do is to change the text between the brackets.
Example string: "B.TECH(CS,IT)".
In my string I need to change the content present inside the brackets to something like this.. B.TECH(ECE,EEE)
What I tried to resolve this problem is as follows..
reg = r'(()([\s\S]*?)())'
a = 'B.TECH(CS,IT)'
re.sub(reg,"(ECE,EEE)",a)
But I got output like this..
'(ECE,EEE)B(ECE,EEE).(ECE,EEE)T(ECE,EEE)E(ECE,EEE)C(ECE,EEE)H(ECE,EEE)((ECE,EEE)C(ECE,EEE)S(ECE,EEE),(ECE,EEE)I(ECE,EEE)T(ECE,EEE))(ECE,EEE)'
Valid output should be like this..
B.TECH(ECE,EEE)
What am I missing, and how do I correctly replace the text?
The problem is that you're using parentheses, which have another meaning in RegEx. They're used as grouping characters, to catch output.
You need to escape the () where you want them as literal tokens. You can escape characters using the backslash character: \(.
Here is an example:
reg = r'\([\s\S]*\)'
a = 'B.TECH(CS,IT)'
re.sub(reg, '(ECE,EEE)', a)
# == 'B.TECH(ECE,EEE)'
The reason your regex does not work is because you are trying to match parentheses, which are considered meta characters in regex. () actually captures a null string, and will attempt to replace it. That's why you get the output that you see.
To fix this, you'll need to escape those parens – something along the lines of
\(...\)
For your particular use case, might I suggest a simpler pattern?
In [268]: re.sub(r'\(.*?\)', '(ECE,EEE)', 'B.TECH(CS,IT)')
Out[268]: 'B.TECH(ECE,EEE)'
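The following sketch illustrates why the unescaped pattern inserts the replacement everywhere, and how the greedy and lazy variants differ. The second input string (`two`) is a made-up example, not from the question.

```python
import re

a = "B.TECH(CS,IT)"

# An unescaped "()" is an empty group: it matches the empty string
# at every position, which is why the replacement shows up everywhere.
empty_hits = re.findall(r"()", "ab")

# Escaped parentheses match the literal characters.
fixed = re.sub(r"\(.*?\)", "(ECE,EEE)", a)

# Greedy and lazy only differ when there is more than one bracketed part.
two = "B.TECH(CS,IT) M.TECH(CS)"  # hypothetical input
greedy = re.sub(r"\(.*\)", "(X)", two)
lazy = re.sub(r"\(.*?\)", "(X)", two)

print(empty_hits)  # ['', '', '']
print(fixed)       # B.TECH(ECE,EEE)
print(greedy)      # B.TECH(X)
print(lazy)        # B.TECH(X) M.TECH(X)
```

For a single pair of parentheses either form works; with several pairs the lazy `.*?` is usually what is wanted.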

Regular Expression returns nothing

I have written the following regular expression to return everything except alphabets & letters. However this regular expression returns nothing. What can be the regular expression for such case?
Regex:
r'[^[a-z]+]'
Regards
You are misusing the character class []. Here is the correct one (without uppercase):
r'[^a-z]+'
To match the whole string from start to end, and to exclude uppercase letters as well:
r'^[^a-zA-Z]+$'
And here is how you can use it:
print(re.findall(r'([^a-zA-Z]+)', input_string))
() means capture the group so that it returns after the matching is performed.
This is how the regex engine sees your regex:
[^[a-z]+ # Not any of these characters '[', nor a-z
] # literal ']'
So, as @Sajuj says, you just need to remove the outer square brackets: [^a-z]+
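As a rough illustration of the breakdown above, this sketch compares the original pattern with the corrected ones. The sample strings are made up for the example.

```python
import re

input_string = "abc123DEF!? ghi"  # hypothetical sample input

# Corrected patterns: runs of characters that are not letters.
lower_only = re.findall(r"[^a-z]+", input_string)
any_case = re.findall(r"[^a-zA-Z]+", input_string)

# Original pattern: the class "[^[a-z]" repeated, then a literal "]".
# It only matches when such a run is followed by "]".
original = re.findall(r"[^[a-z]+]", input_string)
original_with_bracket = re.findall(r"[^[a-z]+]", "abc123]def")

print(lower_only)             # ['123DEF!? ']
print(any_case)               # ['123', '!? ']
print(original)               # []
print(original_with_bracket)  # ['123]']
```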

Pattern for '.' separated words with arbitrary number of whitespaces

It's the first time that I'm using regular expressions in Python and I just can't get it to work.
Here is what I want to achieve: I want to find all strings, where there is a word followed by a dot followed by another word. After that an unknown number of whitespaces followed by either (off) or (on). For example:
word1.word2 (off)
Here is what I have come up so far.
string_group = re.search(r'\w+\.\w+\s+[(\(on\))(\(off\))]', analyzed_string)
\w+ for the first word
\. for the dot
\w+ for the second word
\s+ for the whitespaces
[(\(on\))(\(off\))] for the (off) or (on)
I think that the last expression might not be doing what I need it to. With the implementation right now, the program does find the right place in the string, but the output of
string_group.group(0)
Is just
word1.word2 (
instead of the whole expression I'm looking for. Could you please give me a hint what I am doing wrong?
[ ... ] is used for character class, and will match any one character inside them unless you put a quantifier: [ ... ]+ for one or more time.
But simply adding that won't work...
\w+\.\w+\s+[(\(on\))(\(off\))]+
Will match garbage stuff like word1.word2 )(fno(nofn too, so you actually don't want to use a character class, because it'll match the characters in any order. What you can use is a capturing group, and a non-capturing group along with an OR operator |:
\w+\.\w+\s+(\((?:on|off)\))
(?:on|off) will match either on or off
Now, if you don't want the parentheses to be captured in the first group as well, you can change that to:
\w+\.\w+\s+\((on|off)\)
You've got your logical OR mixed up.
[(\(on\))(\(off\))]
should be
\((?:on|off)\)
[]s are just for matching single characters.
The square brackets are a character class, which matches any one of the characters in the brackets. You appear to be trying to use it to match one of the sub-regexes (\(one\)) and (\(two\)). The way to do that is with an alternation operation, the pipe symbol: (\(one\)|\(two\)).
I think your problem may be with the square brackets []
they indicate a set of single characters to match. So your expression would match a single instance of any of the following chars: "()ofn"
So for the string "word1.word2 (on)", you are matching only this part: "word1.word2 ("
Try using this one instead:
re.search(r'\w+\.\w+\s+\((on|off)\)', analyzed_string)
This match assumes that the () will be there, and looks for either "on" or "off" inside the parenthesis.
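Here is a short sketch contrasting the character class from the question with the alternation suggested in the answers. The sample text in `analyzed_string` is an assumption made for the example.

```python
import re

analyzed_string = "foo word1.word2    (off) bar"  # hypothetical sample input

# Character class: matches exactly one of the characters ( ) o n f.
char_class = re.search(r"\w+\.\w+\s+[(\(on\))(\(off\))]", analyzed_string)

# Alternation inside literal parentheses: matches "(on)" or "(off)" as a whole.
alternation = re.search(r"\w+\.\w+\s+\((on|off)\)", analyzed_string)

# Adding "+" to the class accepts the characters in any order.
garbage = re.fullmatch(
    r"\w+\.\w+\s+[(\(on\))(\(off\))]+", "word1.word2 )(fno(nofn"
)

print(repr(char_class.group(0)))   # 'word1.word2    ('
print(repr(alternation.group(0)))  # 'word1.word2    (off)'
print(alternation.group(1))        # off
print(garbage is not None)         # True
```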

Find several strings with regular expressions

I'm looking for an OR capability to match on several strings with regular expressions.
# I would like to find either "-hex", "-mos", or "-sig"
# the result would be -hex, -mos, or -sig
# You see I want to get rid of the double quotes around these three strings.
# Other double quoting is OK.
# I'd like something like.
messWithCommandArgs = ' -o {} "-sig" "-r" "-sip" '
messWithCommandArgs = re.sub(
r'"(-[hex|mos|sig])"',
r"\1",
messWithCommandArgs)
This works:
messWithCommandArgs = re.sub(
r'"(-sig)"',
r"\1",
messWithCommandArgs)
Square brackets are for character classes that can only match a single character. If you want to match multiple character alternatives you need to use a group (parentheses instead of square brackets). Try changing your regex to the following:
r'"(-(?:hex|mos|sig))"'
Note that I used a non-capturing group (?:...) because you don't need another capture group, but r'"(-(hex|mos|sig))"' would actually work the same way since \1 would still be everything but the quotes.
Alternatively, you could use r'"-(hex|mos|sig)"' with r"-\1" as the replacement (since the - is no longer part of the group).
You should remove [] metacharacters in order to match hex or mos or sig. (?:-(hex|mos|sig))
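The difference between the two forms can be checked with a sketch like the one below. The second input string passed to the original pattern is made up to show what the character class actually matches.

```python
import re

messWithCommandArgs = ' -o {} "-sig" "-r" "-sip" '

# Group with alternation: strips the quotes only around -hex, -mos or -sig.
grouped = re.sub(r'"(-(?:hex|mos|sig))"', r"\1", messWithCommandArgs)

# Same result with the dash outside the group and re-added in the replacement.
dash_outside = re.sub(r'"-(hex|mos|sig)"', r"-\1", messWithCommandArgs)

# Original character class: a dash followed by ONE character out of "hexmosig|".
single_char = re.sub(r'"(-[hex|mos|sig])"', r"\1", ' "-h" "-sig" ')

print(repr(grouped))       # ' -o {} -sig "-r" "-sip" '
print(repr(dash_outside))  # ' -o {} -sig "-r" "-sip" '
print(repr(single_char))   # ' -h "-sig" '
```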

python regular expressions to return complete result

How can I get what was matched from a python regular expression?
re.match("^\\\w*", "/welcome")
All python returns is a valid match; but I want the entire result returned; how do I do that?
Just use re.findall function.
>>> re.findall("a+", 'sdaaddaa')
['aa', 'aa']
You could use a group.
res = re.search("^(\\\w*)", "/welcome")
if res:
    res.group(1)
Calling the group() method of the returned match object without any arguments will return the matched portion of the string.
The regular expression "^\\\w*" will match a string beginning with a backslash followed by 0 or more w characters. The string you are searching begins with a forward slash so your regex won't match. That's why you aren't getting anything back.
Note that if you print your regex string, it contains \\w: the \\ matches a single backslash and the w matches a literal w. If you want a backslash followed by word characters, you need to escape the first backslash, and the easiest way is to use a raw string: r"^\\\w*" would match "\\welcome" but still not "/welcome".
Notice that your "^" says your string has to start at the beginning of a line. RegexBuddy doesn't tell you that by default.
Maybe you could tell us what exactly you are trying to find?
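Putting the answers together, here is a minimal sketch of reading the matched text from a match object. The pattern `r"^/\w*"` is only a guess at what the asker wanted (a leading forward slash followed by word characters); the question does not confirm it.

```python
import re

# Assumed intent (not stated in the question): "/" followed by word characters.
m = re.match(r"^/\w*", "/welcome")
whole = m.group() if m else None

# The question's pattern written as a raw string with the same characters:
# a literal backslash, then zero or more "w" characters.
literal_w = r"^\\w*"
no_match = re.match(literal_w, "/welcome")
w_match = re.match(literal_w, "\\www")

# A literal backslash followed by word characters.
word_chars = re.match(r"^\\\w*", "\\welcome")

print(whole)               # /welcome
print(no_match)            # None
print(w_match.group())     # \www
print(word_chars.group())  # \welcome
```

Because `re.match` returns `None` when nothing matches, checking the result before calling `group()` avoids an `AttributeError`.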
