I have written the following regular expression to return everything except letters. However, this regular expression returns nothing. What would be the correct regular expression for this case?
Regex:
r'[^[a-z]+]'
Regards
You are misusing the character class syntax []. Here is the corrected pattern (lowercase letters only):
r'[^a-z]+'
If you want to anchor the match to the start and end of the string and also exclude uppercase letters:
r'^[^a-zA-Z]+$'
And here is how you can use it:
print(re.findall(r'([^a-zA-Z]+)', input_string))
The parentheses () form a capturing group, so re.findall returns the text captured by that group for each match.
This is how the regex engine sees your regex:
[^[a-z]+ # Not any of these characters '[', nor a-z
] # literal ']'
So, as @Sajuj says, you just need to remove the extra square brackets: [^a-z]+
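To see the difference concretely, here is a minimal sketch comparing the original pattern with the corrected one. The sample string is a made-up input, not taken from the question.

```python
import re

text = "abc123-def 45!"  # hypothetical sample input

# Original pattern: one or more characters that are neither '[' nor a-z,
# followed by a literal ']'. The sample contains no ']', so nothing matches.
broken = re.findall(r'[^[a-z]+]', text)

# Corrected pattern: one or more characters that are not ASCII letters.
fixed = re.findall(r'[^a-zA-Z]+', text)

print(broken)  # []
print(fixed)   # ['123-', ' 45!']
```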
I would like to replace ー with - in text that matches a regular expression like \d+(ー)\d+(ー)\d+. I tried re.sub, but it replaces the whole match, including the numbers. Is it possible to replace only the parts in parentheses?
e.g.
re.sub(r'\d+(ー)\d+(ー)\d+', '-', '4ー3ー1') should return '4-3-1'. Assume that a simple replace cannot be used, because the text contains other ー characters that are not part of a match for the regular expression. My current solution is to split the text and do the replacement only on the parts that match the regular expression.
You can use group references (backreferences) in the replacement string here.
import re
before = '4ー3ー1ーー4ー31'
after = re.sub(r'(\d+)ー(\d+)ー(\d+)', r'\1-\2-\3', before)
print(after) # '4-3-1ーー4ー31'
Here, r'\1' refers to the first group, i.e. the text matched by the first pair of parentheses.
You could use a function for the repl argument in re.sub to only touch the match groups.
import re
s = '1234ー2134ー5124'
re.sub(r"\d+(ー)\d+(ー)\d+", lambda x: x.group(0).replace('ー', '-'), s)
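As a rough check of the function-based approach, the sketch below runs it on the mixed string from the group-reference answer. The helper name to_hyphen is hypothetical and only used for illustration.

```python
import re

def to_hyphen(match):
    # Rewrite only the text of this match; anything outside a match is untouched.
    return match.group(0).replace('ー', '-')

before = '4ー3ー1ーー4ー31'
after = re.sub(r'\d+ー\d+ー\d+', to_hyphen, before)

print(after)  # 4-3-1ーー4ー31
```

The trailing 4ー31 is left alone because it has only two numbers and therefore never matches the three-number pattern.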
Using a slightly different pattern, you can take advantage of a lookahead assertion, which does not consume the part of the string it matches. That is, the ー is matched only where the lookahead (or lookbehind) condition also holds, but the text checked by the assertion is not part of the match and so is not replaced.
re.sub(r"ー(?=\d+)", "-", s)
If you can live with a fixed-length expression for the part preceding the ー (Python's re module requires lookbehinds to be fixed-width), you can combine the lookahead with a lookbehind to make the regex a little more conservative.
re.sub(r"(?<=\d)ー(?=\d+)", "-", s)
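The effect of the two lookaround variants can be sketched on a made-up string (the sample below is an assumption, not from the question). Since the assertion only needs to see one digit, the sketch uses \d rather than \d+ inside the lookahead.

```python
import re

s = 'aー1 2ー3 4ーb'  # hypothetical sample

# Lookahead only: ー must be followed by a digit.
ahead = re.sub(r'ー(?=\d)', '-', s)

# Lookbehind plus lookahead: ー must sit between two digits.
between = re.sub(r'(?<=\d)ー(?=\d)', '-', s)

print(ahead)    # a-1 2-3 4ーb
print(between)  # aー1 2-3 4ーb
```

Note that both variants rewrite any ー next to digits, including a lone 2ー3, which the original three-number pattern would not match.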
re.sub(r'\d+(ー)\d+(ー)\d+', '-', '4ー3ー1')
As you pointed out, this call returns '-', because re.sub replaces the entire match, not just the groups, with '-'. To replace only the ー with -, you can use
import re
input_string = '4ー3ー1'
re.sub('ー','-', input_string)
Or you could find all the runs of digits and join them with '-':
'-'.join(re.findall(r'\d+', input_string))
Both methods should give you '4-3-1'.
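A small sketch of how these two shortcuts behave, assuming a hypothetical second input that also contains ー outside the number pattern (the case the question says must be left alone):

```python
import re

simple = '4ー3ー1'
mixed = '4ー3ー1 コーヒー'  # hypothetical input with ー in ordinary text

# On the simple input both shortcuts agree.
replaced_simple = re.sub('ー', '-', simple)
joined_simple = '-'.join(re.findall(r'\d+', simple))

# On the mixed input, plain replacement also rewrites the ー in the word,
# and the join keeps only the digits.
replaced_mixed = re.sub('ー', '-', mixed)
joined_mixed = '-'.join(re.findall(r'\d+', mixed))

print(replaced_simple, joined_simple)  # 4-3-1 4-3-1
print(replaced_mixed)                  # 4-3-1 コ-ヒ-
print(joined_mixed)                    # 4-3-1
```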
Today, I found out that regex r"['a', 'b']" matches 'a, b'.
Why is that? What do the comma and ' mean inside []?
Thank you.
[] is used to define character sets in regular expressions. The expression will match if the string contains any of the characters in that set.
Your regular expression:
r"['a', 'b']"
says "match a single character that is ', a, a comma, a space, or b". As @Patrick Haugh mentions in his comment, your expression is equivalent to [', ab]. Repeating the same character in the set does nothing.
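A quick way to confirm this is to list what each pattern matches, one character at a time. This is only an illustrative sketch; the second test string is made up.

```python
import re

original = re.compile(r"['a', 'b']")
equivalent = re.compile(r"[', ab]")

# Every character of 'a, b' is in the set, so each one is its own match.
matches = original.findall("a, b")
same = equivalent.findall("a, b")

# A string with none of those five characters produces no matches.
none = original.findall("xyz")

print(matches)  # ['a', ',', ' ', 'b']
print(same)     # ['a', ',', ' ', 'b']
print(none)     # []
```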
http://www.regexpal.com/ is a great site for testing your regular expressions. It can help break it down for you and explain what your expression does and why it matches on certain strings.
Suppose I am using the following regular expression. Logically, it means: match anything that starts with the prefix foo: and continues with anything that is not a space. The match group is the part that excludes the prefix foo:.
My question is: what exactly does "anything" mean in Python 2.7? Any ASCII character, or something else? If anyone could point me to some documentation, that would be great. Thanks.
a = re.compile('foo:([^ ]+)')
thanks in advance,
Lin
Try:
a = re.compile(r'foo:\S*')
\S matches any character that is not whitespace.
I recommend you check out http://pythex.org.
It's really good for testing out regular expressions and has a decent cheat sheet.
UPDATE:
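The dot (.) matches any character except a newline (pass re.DOTALL to include newlines). It is not limited to ASCII: on a unicode string it matches any Unicode character, while on a Python 2.7 byte string (str) it matches one byte at a time.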
The regular expression metacharacter which matches any character (except a newline, unless re.DOTALL is used) is . (dot).
a = re.compile('foo:(.+)')
The character class [^ ] matches any one character which isn't one of the characters between the square brackets (a literal space, in this example). The quantifier + specifies one or more repetitions of the preceding expression.
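The three patterns discussed in this thread differ in where they stop. The sketch below is written for Python 3 (where str is Unicode) rather than 2.7, and the sample text is an assumption chosen to contain a non-ASCII letter, a tab, and a space.

```python
import re

text = "foo:caf\u00e9\tbar baz"  # hypothetical sample: 'café', a tab, then 'bar baz'

# [^ ]+ stops only at a literal space, so the tab is included.
not_space = re.search(r'foo:([^ ]+)', text).group(1)

# \S+ stops at any whitespace, including the tab.
non_ws = re.search(r'foo:(\S+)', text).group(1)

# .+ runs to the end of the line (it would stop at a newline).
any_char = re.search(r'foo:(.+)', text).group(1)

print(repr(not_space))  # 'café\tbar'
print(repr(non_ws))     # 'café'
print(repr(any_char))   # 'café\tbar baz'
```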
I have a string which has multiple brackets. Let's say
s="(a(vdwvndw){}]"
I want to extract all the brackets as a separate string.
I tried this:
>>> brackets=re.search(r"[(){}[]]+",s)
>>> brackets.group()
But it is only giving me last two brackets.
'}]'
Why is that? Shouldn't it fetch one or more of any of the brackets in the character set?
You have to escape the first closing square bracket.
r'[(){}[\]]+'
To combine all of them into a string, you can search for anything that doesn't match and remove it.
brackets = re.sub( r'[^(){}[\]]', '', s)
Use the following (Closing square bracket must be escaped inside character class):
brackets = re.search(r"[(){}[\]]+", s)  # note the escaped \] inside the class
The regular expression "[(){}[]]+" (or rather "[](){}[]+" or "[(){}[\]]+", as others have suggested) finds only a single run of consecutive bracket characters.
What you need to do is find all of these runs and join them.
One solution is this:
brackets = ''.join(re.findall(r"[](){}[]+",s))
Note also that I rearranged the order of characters in the class: an unescaped ] has to come first in a class so that it is not interpreted as the end of the class definition.
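The behaviours described in these answers can be compared side by side. This is a minimal sketch using the string from the question; the variable names are only for illustration.

```python
import re

s = "(a(vdwvndw){}]"

# Question's pattern: one character from the class [(){}[] followed by one or more ']'.
original = re.search(r"[(){}[]]+", s).group()

# Escaped version with re.search: only the first run of brackets is returned.
first_run = re.search(r"[(){}[\]]+", s).group()

# Collect every run and join them.
joined = "".join(re.findall(r"[(){}[\]]+", s))

# Alternative: delete everything that is not a bracket.
stripped = re.sub(r"[^(){}[\]]", "", s)

print(original)   # }]
print(first_run)  # (
print(joined)     # ((){}]
print(stripped)   # ((){}]
```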
You could also do this without a regex:
s="(a(vdwvndw){}]"
keep = {"(",")","[","]","{","}"}
print("".join([ch for ch in s if ch in keep]))
((){}]
How can I get what was matched from a python regular expression?
re.match("^\\\w*", "/welcome")
All Python returns is a match object, but I want the matched text returned. How do I do that?
Just use the re.findall function.
>>> re.findall("a+", 'sdaaddaa')
['aa', 'aa']
You could use a group.
res = re.search("^(\\\w*)", "/welcome")
if res:
    print(res.group(1))
Calling the group() method of the returned match object without any arguments will return the matched portion of the string.
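As a short illustration of the match object API, the sketch below reuses the 'sdaaddaa' string from the findall answer. The second pattern with two groups is a made-up example.

```python
import re

m = re.search(r"a+", "sdaaddaa")
whole = m.group() if m else None   # group() with no arguments is group(0)
span = m.span() if m else None     # start and end index of the match

# Numbered groups return only the text captured by each pair of parentheses.
grouped = re.search(r"s(d)(a+)", "sdaaddaa")
parts = (grouped.group(0), grouped.group(1), grouped.group(2))

print(whole, span)  # aa (2, 4)
print(parts)        # ('sdaa', 'd', 'aa')
```

re.match and re.search return None when nothing matches, so it is worth checking the result before calling group().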
The regular expression "^\\\w*" will match a string beginning with a backslash followed by 0 or more w characters. The string you are searching begins with a forward slash so your regex won't match. That's why you aren't getting anything back.
Note that your regex, if you print the string, contains \\w. The \\ means match a single backslash, and the w then matches a literal w. If you want a backslash followed by word characters, you need to escape the first backslash, and the easiest way is to use a raw string: r"^\\\w*" would match "\\welcome", but it would still not match "/welcome".
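The escaping can be checked directly. In this sketch the patterns are written as raw strings; the last one, with a forward slash, is only an assumption about what the asker may have intended.

```python
import re

# The question's literal "^\\\w*" evaluates to the characters ^ \ \ w *,
# which is the same pattern as this raw string.
as_written = r"^\\w*"    # literal backslash, then zero or more 'w'
escaped = r"^\\\w*"      # literal backslash, then zero or more word characters
with_slash = r"^/\w*"    # assumption: a forward slash was meant

r1 = re.match(as_written, "\\www").group()
r2 = re.match(escaped, "\\welcome").group()
r3 = re.match(as_written, "/welcome")
r4 = re.match(with_slash, "/welcome").group()

print(r1)  # \www
print(r2)  # \welcome
print(r3)  # None
print(r4)  # /welcome
```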
Notice that your "^" says the match has to start at the beginning of the string (or at the beginning of a line in re.MULTILINE mode). RegexBuddy doesn't tell you that by default.
Could you tell us what exactly you are trying to find?