How can I get what was matched from a python regular expression?
re.match("^\\\w*", "/welcome")
All python returns is a valid match; but I want the entire result returned; how do I do that?
Just use re.findall function.
>>> re.findall("a+", 'sdaaddaa')
['aa', 'aa']
You could use a group.
res = re.search("^(\\\w*)", "/welcome")
if res:
    print(res.group(1))
Calling the group() method of the returned match object without any arguments will return the matched portion of the string.
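A minimal sketch of that call. The pattern r"^/\w*" is an assumption, chosen only because it actually matches the sample string; it is not the pattern from the question.

```python
import re

# Assumed pattern: a leading forward slash followed by word characters.
match = re.match(r"^/\w*", "/welcome")

# group() with no arguments (same as group(0)) returns the whole matched text.
matched_text = match.group() if match else None
print(matched_text)  # /welcome
```

Checking the match object before calling group() avoids an AttributeError when the pattern does not match and None comes back.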
The regular expression "^\\\w*" will match a string beginning with a backslash followed by 0 or more w characters. The string you are searching begins with a forward slash so your regex won't match. That's why you aren't getting anything back.
Note that if you print out your regex string, it contains \\w. The \\ means match a single backslash, and the w then means match a literal w. If you want a backslash followed by word characters, you need to escape the first backslash, and the easiest way is to use a raw string: r"^\\\w*" would match "\\welcome" but still not match "/welcome".
Notice that your "^" says the match has to start at the beginning of the string (or of each line, with re.MULTILINE). RegexBuddy doesn't tell you that by default.
Maybe you could tell us what exactly you are trying to find?
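The difference between the two readings can be sketched like this. Both patterns are written as raw strings here so that what the regex engine receives is unambiguous; the variable names are only illustrative.

```python
import re

# What the engine receives for "^\\\w*": a backslash, then zero or more literal 'w'.
literal_w = re.compile(r"^\\w*")
# A backslash followed by zero or more word characters.
word_chars = re.compile(r"^\\\w*")

subject = "\\welcome"  # the 8-character string \welcome

literal_result = literal_w.match(subject).group()
word_result = word_chars.match(subject).group()
slash_result = word_chars.match("/welcome")

print(literal_result)  # \w
print(word_result)     # \welcome
print(slash_result)    # None
```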
I'm having a hell of a time trying to transfer my experience with javascript regex to Python.
I'm just trying to get this to work:
print(re.match('e','test'))
...but it prints None. If I do:
print(re.match('e','est'))
It matches... does it by default match the beginning of the string? When it does match, how do I use the result?
How do I make the first one match? Is there better documentation than the python site offers?
re.match implicitly anchors your regex at the start of the string, as if it began with ^ (strictly \A, since this holds even with re.MULTILINE). In other words, it only matches at the start of the string.
re.search will retry at all positions.
Generally speaking, I recommend using re.search and adding ^ explicitly when you want it.
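A short sketch of the difference, using the strings from the question (the variable names are just for illustration):

```python
import re

at_start = re.match("e", "test")    # None: "test" does not start with "e"
anywhere = re.search("e", "test")   # finds the "e" at index 1
anchored = re.search("^e", "test")  # None: the explicit ^ makes search behave like match here

print(at_start)
print(anywhere.group(), anywhere.start())
print(anchored)
```

When there is a match, the returned match object gives you the text via group() and the position via start() and end().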
http://docs.python.org/library/re.html
The docs are clear, I think.
re.match(pattern, string[, flags])
If zero or more characters **at the beginning of string** match the
regular expression pattern, return a corresponding MatchObject
instance. Return None if the string does not match the pattern; note
that this is different from a zero-length match.
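The last remark in that excerpt, about None versus a zero-length match, can be illustrated with a small sketch (the patterns are arbitrary examples):

```python
import re

empty = re.match("x*", "abc")    # succeeds, but matches zero characters
missing = re.match("x+", "abc")  # needs at least one "x", so no match at all

print(repr(empty.group()))  # ''
print(missing)              # None
```

A match object is always truthy, even when the matched text is empty, so `if re.match(...):` distinguishes the two cases.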
I have a for loop that produces a variable current_out_dir. Sometimes the variable will have /. at the end of the line (that is, /.$), and I want to replace /.$ with /$. Currently I have .replace('/.','/'), but this would also affect hidden directories that start with ., e.g. /home/.log/file.txt
I've looked into re.sub() but I can't figure out how to apply it.
An unescaped dot matches any character except a newline, so you need to escape the dot to match a literal dot.
re.sub(r'(?<=/)\.$', r'', string)
/\.(?=$)
Try this; it should work for you. It uses a positive lookahead to assert the end of the string. Replace the match with /.
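Applied with re.sub(), the idea could look like the sketch below. The helper name strip_trailing_dot is hypothetical, and the pattern is a simplified variant (a plain $ instead of a lookaround) that should behave the same for these inputs.

```python
import re

def strip_trailing_dot(path):
    """Turn a trailing "/." into "/" and leave everything else alone."""
    # \. is a literal dot; $ anchors the match at the end of the string.
    return re.sub(r"/\.$", "/", path)

print(strip_trailing_dot("/home/user/."))         # /home/user/
print(strip_trailing_dot("/home/.log/file.txt"))  # /home/.log/file.txt
```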
The question was about using regex, but I've come up with a more pythonic solution to the problem.
import os
if os.path.split(current_out_dir)[1] == '.':
    current_out_dir = os.path.split(current_out_dir)[0]
I have written the following regular expression to return everything except alphabetic letters. However, this regular expression returns nothing. What should the regular expression be for this case?
Regex:
r'[^[a-z]+]'
Regards
You are misusing the character class []. Here is the correct one (without uppercase):
r'[^a-z]+'
If you want to match from the start to the end of the string, including uppercase letters:
r'^[^a-zA-Z]+$'
And here is how you can use it:
print(re.findall(r'([^a-zA-Z]+)', input_string))
() creates a capturing group, so findall returns what the group matched. Since the group wraps the whole pattern here, the result is the same with or without it.
This is how the regex engine sees your regex:
[^[a-z]+ # Not any of these characters '[', nor a-z
] # literal ']'
So, as @Sajuj says, you just need to remove the outer square brackets: [^a-z]+
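A small sketch comparing the two patterns on a made-up sample string. Recent Python versions emit a FutureWarning about a possible nested set for the original pattern, so the sketch silences it; that warning handling is an addition for illustration only.

```python
import re
import warnings

text = "abc123def!?"  # made-up sample input

with warnings.catch_warnings():
    # The '[' inside the class triggers "possible nested set" on newer Pythons.
    warnings.simplefilter("ignore", FutureWarning)
    broken = re.findall(r"[^[a-z]+]", text)  # requires a literal ']' after the run

fixed = re.findall(r"[^a-z]+", text)

print(broken)  # []
print(fixed)   # ['123', '!?']
```

The original pattern finds nothing because the sample contains no ']' for the trailing literal to match.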
import re
re.compile(([0-9]|[A-Z0-9]))
Is this the correct way about doing it?
Thank you!
You need to give re.compile() a string, and your current regular expression would only match a single character. Try changing it to the following:
import re
pattern = re.compile(r'^[A-Z\d]+$')
Now you can test strings to see if they match this pattern by using pattern.match(some_string).
Note that I used a raw string literal, which ensures the proper handling of backslashes.
The ^ at the beginning and $ at the end are called anchors: ^ matches only at the beginning of the string and $ matches only at the end of the string (or just before a trailing newline). They are necessary since you specified that you only want to match strings that consist entirely of uppercase characters or digits; otherwise a substring could match.
Correct way is:
re.compile(r'^[A-Z\d]+$')
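To see what the anchors buy you, here is a minimal sketch with made-up test strings. The re.fullmatch() call at the end is an alternative not mentioned in the answers above; it rejects a trailing newline that $ would tolerate.

```python
import re

anchored = re.compile(r"^[A-Z\d]+$")
unanchored = re.compile(r"[A-Z\d]+")

whole_ok = anchored.match("ABC123")                # matches the entire string
with_space = anchored.match("ABC 123")             # None: the space is not allowed
partial = unanchored.search("abc123")              # matches only the "123" substring
trailing_newline = anchored.match("ABC123\n")      # still matches: $ allows one final newline
strict = re.fullmatch(r"[A-Z\d]+", "ABC123\n")     # None: fullmatch must consume everything

print(bool(whole_ok), with_space, partial.group(), bool(trailing_newline), strict)
```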
I have this weirdly formatted URL. I have to extract the contents in '()'.
Sample URL : http://sampleurl.com/(K(ThinkCode))/profile/view.aspx
If I can extract ThinkCode out of it, I will be a happy man! I am having a tough time with regexing special chars like '(' and '/'.
>>> foo = re.compile( r"(?<=\(K\()[^\)]*" )
>>> foo.findall( r"http://sampleurl.com/(K(ThinkCode))/profile/view.aspx" )
['ThinkCode']
Explanation
In regex-world, a lookbehind is a way of saying "I want to match ham, but only if it's preceded by spam". We write this as (?<=spam)ham. So in this case, we want to match [^\)]*, but only if it's preceded by \(K\(.
Now \(K\( is a nice, easy regex, because it's plain text! It means, match exactly the string (K(. Notice that we have to escape the brackets (by putting \ in front of them), since otherwise the regex parser would think they were part of the regex instead of a character to match!
Finally, when you put something in square brackets in regex-world, it means "any of the characters in here is OK". If you put something inside square brackets where the first character is ^, it means "any character not in here is OK". So [^\)] means "any character that isn't a right-bracket", and [^\)]* means "as many characters as possible that aren't right-brackets".
Putting it all together, (?<=\(K\()[^\)]* means "match as many characters as you can that aren't right-brackets, preceded by the string (K(".
Oh, one last thing. Because \ means something inside strings in Python as well as inside regexes, we use raw strings -- r"spam" instead of just "spam". That tells Python to leave the \'s alone instead of treating them as escape characters.
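A quick sketch of what the r prefix changes, using only string comparisons (no regex involved):

```python
# Without the r prefix, every backslash has to be doubled to survive.
plain = "\\(K\\("
raw = r"\(K\("
same_pattern = plain == raw

# "\n" is one newline character; r"\n" is a backslash followed by the letter n.
newline_len = len("\n")
raw_len = len(r"\n")

print(same_pattern, newline_len, raw_len)  # True 1 2
```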
Another way
If lookbehind is a bit complicated for you, you can also use capturing groups. The idea behind those is that the regex matches patterns, but can also remember subpatterns. That means that you don't have to worry about lookaround, because you can match the entire pattern and then just extract the subpattern inside it!
To capture a group, simply put it inside brackets: (foo) will capture foo as the first group. Then, use .groups() to spit out all the groups that you matched! This is the way the other answer works.
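A minimal sketch of the capturing-group approach on the sample URL. The pattern here is one possible choice, not necessarily the one any particular answer used.

```python
import re

url = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# The inner parentheses (unescaped) capture; the escaped ones match literal brackets.
match = re.search(r"\(K\(([^)]*)\)\)", url)

all_groups = match.groups()   # tuple of every captured group
first_group = match.group(1)  # just the first one

print(all_groups)   # ('ThinkCode',)
print(first_group)  # ThinkCode
```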
It's not too hard, especially since / isn't actually a special character in Python regular expressions. You just backslash the literal parens you want. How about this:
import re
s = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"
mo = re.match(r"http://sampleurl\.com/\(K\(([^)]+)\)\)/profile/view\.aspx", s)
print(mo.group(1))
Note the use of r"" raw strings to preserve the backslashes in the regular expression pattern string.
If you want to match special characters literally in a regex, you need to escape them, such as \( or \\. (As noted above, / needs no escaping in Python.)
Matching things inside of nested parentheses is quite a pain in regex. If the format is always the same, you could use this:
\(.*?\((.*?)\).*?\)
Basically: find an open paren, match characters until you find another open paren, capture characters until you see a close paren, then make sure there is one more close paren somewhere after it.
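Used from Python, that pattern could be applied as in this sketch (re.search is an assumption here, since the pattern is not anchored to the start of the URL):

```python
import re

url = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"

# Lazy quantifiers (.*?) stop at the first paren that lets the rest of the pattern match.
match = re.search(r"\(.*?\((.*?)\).*?\)", url)
inner = match.group(1) if match else None

print(inner)  # ThinkCode
```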
import re
mystr = "http://sampleurl.com/(K(ThinkCode))/profile/view.aspx"
print(re.sub(r'^.*\((\w+)\).*', r'\1', mystr))