I need help finding words that contain a . in the middle or at the end, using regex in Python.
For example: N., N.E., North.East or N.East.
Not sure if you specifically need to use regex, but here's how you can do it without. Here are a couple of ways of looking at it:
If you're looking anywhere in the word (let's call it MyString) except the first character, you can use '.' in MyString[1:] (Python strings have no .contains() method; the in operator does this job).
If you want to check the exact center of a string, you can use MyString[len(MyString)//2] == '.' (integer division, because a plain / yields a float in Python 3, which cannot be used as an index); if the string has an even number of characters, the right-hand middle character will be checked ('d' in 'abcdef', for instance).
If you want to check the very last character without checking anything else, MyString[-1] == '.' is enough.
Assuming that your words are sent as strings, anyway.
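The three checks above can be sketched as small functions. The names dot_after_first, dot_at_center and dot_at_end are made up for this illustration, and the sketch assumes Python 3:

```python
def dot_after_first(word):
    # Any '.' except in the first position.
    return "." in word[1:]

def dot_at_center(word):
    # Exact center; for an even length this is the right-hand middle character.
    return bool(word) and word[len(word) // 2] == "."

def dot_at_end(word):
    # endswith avoids an IndexError on an empty string.
    return word.endswith(".")

for w in ["N.", "N.E.", "North.East", ".N"]:
    print(w, dot_after_first(w), dot_at_center(w), dot_at_end(w))
```

Using endswith instead of MyString[-1] is only a convenience here, so that an empty string gives False rather than raising.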
Maybe this is what you are looking for:
/\w+\.\w*\.?/g
https://regex101.com/r/iH9bO6/1
^\w+(?:\.\w+)*\.?$
Try this. See demo.
https://regex101.com/r/sS2dM8/15
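For use from Python, here is a minimal sketch built on the second pattern. Note that on its own the pattern also accepts a word with no dot at all (the dotted parts are optional), so the sketch adds an explicit check for a dot. The helper name has_inner_or_trailing_dot and the sample word list are assumptions for illustration:

```python
import re

# Pattern from the answer above; re.fullmatch makes the ^ and $ anchors implicit.
DOTTED = re.compile(r"\w+(?:\.\w+)*\.?")

def has_inner_or_trailing_dot(word):
    # Require at least one dot, then let the pattern reject leading or doubled dots.
    return "." in word and DOTTED.fullmatch(word) is not None

words = ["N.", "N.E.", "North.East", "N.East", "North", ".N", "N..E"]
matches = [w for w in words if has_inner_or_trailing_dot(w)]
print(matches)  # ['N.', 'N.E.', 'North.East', 'N.East']
```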
Suppose I have a string, s="panpanIpanAMpanJOEpan". I want to find the word pan and replace it with spaces so that the output string is "I AM JOE". How can I do it?
I also don't know how to find a certain substring in a long string without spaces, such as the one above.
It would be great if someone could help me learn about this.
If you don't know pan, you can exploit the fact that the letters you want to keep are all upper case.
fillword = min(set("".join(i if i.islower() else ' ' for i in s).split(' '))-set(['']),key=len)
This works by first replacing all upper case letters with space, then splitting on space and finding the minimal nonempty word.
Use replace to replace with space, and then strip to remove excess spacing.
s="panpanIpanAMpanJOEpan"
s.replace(fillword,' ').strip()
gives:
'I AM JOE'
s="panpanIpanAMpanJOEpan"
print(s.replace("pan"," ").strip())
use replace
Output:
I AM JOE
As DarrylG and others mentioned, .replace will do what you asked for, where you define what you want to replace ("pan") and what you want to replace it with (" ").
To find a certain string in a longer string you can use .find(), which takes a string you are looking for and optionally where to start and stop looking for it (as integers) as arguments.
If you want to find all of the occurrences of a string in a bigger string, there are two options:
Find the string with find(), then cut the string so it no longer contains your search term, and repeat this until find() returns -1 (which means the search term is no longer found in the string),
or use the re module and its finditer function to find all occurrences of your string. (This is explained in another answer on Stack Overflow.)
Edit: If you don't know what you are searching for, it becomes a bit trickier, but you can write a regular expression that extracts this data as well, using the same re module. This is easy if you know what the end result is supposed to be (I AM JOE in your case). If you don't, it becomes more complicated and we would need additional information to help with this.
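Both options can be sketched as follows. Instead of cutting the string, this version passes the optional start argument of find() mentioned above, which has the same effect. The helper name find_all is made up for this example:

```python
import re

def find_all(haystack, needle):
    # Collect the start index of every non-overlapping occurrence of needle.
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume the search just past the occurrence we found.
        start = haystack.find(needle, start + len(needle))
    return positions

s = "panpanIpanAMpanJOEpan"
with_find = find_all(s, "pan")
# re.escape keeps the search term literal even if it contains regex metacharacters.
with_finditer = [m.start() for m in re.finditer(re.escape("pan"), s)]
print(with_find)      # [0, 3, 7, 12, 18]
print(with_finditer)  # [0, 3, 7, 12, 18]
```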
You can use replace to replace all occurrences of a substring at once.
In case you want to find the substrings yourself, you can do it manually:
s = "panpanIpanAMpanJOEpan"
while True:
    panPosition = s.find('pan')  # -1 == 'pan' not found!
    if panPosition == -1:
        s = s.strip()
        break
    # Cut out pan from s and replace it with a blank.
    s = s[:panPosition] + ' ' + s[panPosition + 3:]
print(s)
Out:
I AM JOE
I need to check whether the string contains exactly three letters, no more, no less.
I tried:
import re
rege = r'[A-Z]{3,3}'
word = 'AAAD'
if re.match(rege, word):
    print('yes')
else:
    print('no')
My second try was:
import re
rege = r'[A-Z][A-Z][A-Z]'
word = 'AAAD'
if re.match(rege, word):
    print('yes')
else:
    print('no')
Both regex tests give the answer 'yes'. Of course I can check len(word), but this regex will be part of a more complicated expression and I do not want to use a structure like
if re.match(r'[A-Z][A-Z][A-Z]', word[0:3]):
    if word[3] == '-':
        if ....:
            if ....:
                ....
Thank you.
You want to use anchors:
^[a-zA-Z]{3}$
^ will match the beginning of the string, $ will match the end.
^[A-Z]{3}$
will do the magic for you
You expected [A-Z]{3} to work, but it only checks that three letters are present (with re.match, at the start of the string), not that there are exactly three. The string may contain more letters after them.
With the anchors, my regex checks the letters from the start of the string all the way to its end.
You should use:
^[A-Z]{3}$
since ^ and $ mark the beginning and the end of the line, making sure nothing else is in there.
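As a rough sketch of the difference in Python 3 (3.4 or later, for re.fullmatch), the following compares the anchored pattern with re.fullmatch, which requires the whole string to match without writing the anchors. The helper name is_exactly_three_upper is just for illustration:

```python
import re

def is_exactly_three_upper(word):
    # fullmatch succeeds only if the whole string matches the pattern.
    return re.fullmatch(r"[A-Z]{3}", word) is not None

for word in ["AAA", "AAAD", "AA", "AAA\n"]:
    anchored = re.match(r"^[A-Z]{3}$", word) is not None
    print(repr(word), is_exactly_three_upper(word), anchored)
```

The last sample shows one difference: $ also matches just before a trailing newline, so the anchored pattern accepts "AAA\n" while re.fullmatch rejects it. In a larger expression the same idea applies; anchor the expression as a whole rather than each part.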
I am writing a regex that will be used for recognizing commands in a string. I have three possible words the commands could start with and they always end with a semi-colon.
I believe the regex pattern should look something like this:
(command1|command2|command3).+;
The problem, I have found, is that since . matches any character and + tells it to match one or more, it skips right over the first instance of a semi-colon and continues going.
Is there a way to get it to stop at the first instance of a semi-colon it comes across? Is there something other than . that I should be using instead?
The issue you are facing with this: (command1|command2|command3).+; is that the + is greedy, meaning that .+ will match as much as it can, so the match runs up to the last semicolon in the string.
To fix this, you will need to make it non-greedy, and to do that you need to add the ? operator, like so: (command1|command2|command3).+?;
Just as an FYI, the same applies for the * operator. Adding a ? will make it non greedy.
Tell it to find only non-semicolons.
[^;]+
What you are looking for is a non-greedy match.
.+?
The "?" after your greedy + quantifier will make it match as little as possible, instead of as much as possible, which is the default.
Your regex would be
'(command1|command2|command3).+?;'
See Python RE documentation
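To see the three variants side by side, here is a small sketch. The sample text is invented, and the group is written as non-capturing, (?:...), only so that re.findall returns the whole match instead of just the command word:

```python
import re

text = "command1 foo; noise command2 bar baz; command3 qux;"

# Greedy: .+ runs to the last semicolon in the string.
greedy = re.findall(r"(?:command1|command2|command3).+;", text)
# Non-greedy: .+? stops at the first semicolon it can.
lazy = re.findall(r"(?:command1|command2|command3).+?;", text)
# Negated class: cannot cross a semicolon at all.
negated = re.findall(r"(?:command1|command2|command3)[^;]+;", text)

print(greedy)   # ['command1 foo; noise command2 bar baz; command3 qux;']
print(lazy)     # ['command1 foo;', 'command2 bar baz;', 'command3 qux;']
print(negated)  # ['command1 foo;', 'command2 bar baz;', 'command3 qux;']
```

One practical difference: by default . does not match a newline, while [^;] does, so the negated class also handles commands that span several lines.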
This is in reference to a question I asked before here
I received a solution to the problem in that question but ended up needing to go with regex for this particular part.
I need a regular expression that searches a string for two identical vowels in a row, such as the "oo" in "took" or the "ee" in "bees", and replaces each such pair with a single copy of that vowel followed by a :.
Some examples of expected behavior:
"took" should become "to:k"
"waaeek" should become "wa:e:k"
"raaag" should become "ra:ag"
Thank you for the help.
Try this:
re.sub(r'([aeiou])\1', r'\1:', str)
Search for ([aeiou])\1 and replace it with \1:
I don't know about python, but you should be able to make the regex case insensitive and global with something like /([aeiou])\1/gi
What NOT to do:
As noted, this will match any two vowels together. Leaving this answer as an example of what NOT to do. The correct answer (in this case) is to use backreferences as mentioned in numerous other answers.
import re
data = ["took","waaeek","raaag"]
for s in data:
    print(re.sub(r'([aeiou]){2}', r'\1:', s))
This matches exactly two occurrences {2} of any member of the set [aeiou], which need not be the same vowel, and replaces them with the last vowel matched, which is captured by the parentheses () and inserted into the replacement string by \1, followed by a ':'.
Output:
to:k
wa:e:k
ra:ag
You'll need to use a back reference in your search expression. Try something like ([aeiou])\1 for just a double, or ([aeiou])\1+ for a longer run of the same vowel.
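A short sketch comparing the backreference version with the {2} version from the "what NOT to do" answer. The function names and the extra test word "aeon" are assumptions added to show where the two differ:

```python
import re

def mark_double_vowels(word):
    # \1 must repeat the vowel captured by group 1, so only identical pairs match.
    return re.sub(r"([aeiou])\1", r"\1:", word)

def mark_any_two_vowels(word):
    # The {2} version: any two vowels in a row match, identical or not.
    return re.sub(r"([aeiou]){2}", r"\1:", word)

for w in ["took", "waaeek", "raaag", "aeon"]:
    print(w, mark_double_vowels(w), mark_any_two_vowels(w))
```

Both versions agree on the three examples from the question (to:k, wa:e:k, ra:ag), but only the backreference leaves "aeon" alone; the {2} version turns it into "e:on".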
One rule that I need is that if the last vowel (aeiou) of a string is before a character from the set ('t','k','s','tk'), then a : needs to be added right after the vowel.
So, in Python if I have the string "orchestras" I need a rule that will turn it into "orchestra:s"
edit: The (t, k, s, tk) would be the final character(s) in the string
re.sub(r"([aeiou])(t|k|s|tk)([^aeiou]*)$", r"\1:\2\3", "orchestras")
re.sub(r"([aeiou])(t|k|s|tk)$", r"\1:\2", "orchestras")
You don't say if there can be other consonants after the t/k/s/tk. The first regex allows for this as long as there aren't any more vowels, so it'll change "fist" to "fi:st" for instance. If the word must end with the t/k/s/tk then use the second regex, which will do nothing for "fist".
If you have not figured it out yet, I recommend trying [python_root]/tools/scripts/redemo.py. It is a nice testing area.
Another take on the replacement regex:
re.sub("(?<=[aeiou])(?=(?:t|k|s|tk)$)", ":", "orchestras")
This one does not need to replace using remembered groups.
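The three substitutions from the answers above can be compared on a few words. This is only a sketch; the function names and the sample words "fist" and "batk" (a made-up word ending in tk) are assumptions for illustration:

```python
import re

def mark_allowing_trailing_consonants(word):
    # First regex: other consonants may follow the t/k/s/tk.
    return re.sub(r"([aeiou])(t|k|s|tk)([^aeiou]*)$", r"\1:\2\3", word)

def mark_strict(word):
    # Second regex: the word must end with t/k/s/tk right after the vowel.
    return re.sub(r"([aeiou])(t|k|s|tk)$", r"\1:\2", word)

def mark_lookaround(word):
    # Lookaround version: matches the empty position between vowel and ending.
    return re.sub(r"(?<=[aeiou])(?=(?:t|k|s|tk)$)", ":", word)

for w in ["orchestras", "fist", "batk"]:
    print(w, mark_allowing_trailing_consonants(w), mark_strict(w), mark_lookaround(w))
```

All three turn "orchestras" into "orchestra:s" and "batk" into "ba:tk"; only the first one changes "fist" (to "fi:st").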