regex in python with defined number of letters but no more

I need to check whether a string contains exactly three letters: no more, no less.
I tried:
import re
rege = r'[A-Z]{3,3}'
word = 'AAAD'
if re.match(rege, word):
    print('yes')
else:
    print('no')
My second try was:
import re
rege = r'[A-Z][A-Z][A-Z]'
word = 'AAAD'
if re.match(rege, word):
    print('yes')
else:
    print('no')
Both regexes give the answer 'yes'. Of course I could check len(word), but this pattern will be part of a more complicated regular expression, and I do not want to end up with a structure like this:
if re.match(r'[A-Z][A-Z][A-Z]', word[0:3]):
    if word[3] == '-':
        if ....:
            if ....:
                ....
Thank you.

You want to use anchors:
^[a-zA-Z]{3}$
^ will match the beginning of the string, $ will match the end.
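A minimal sketch of the same check in Python 3. The helper name is_three_letters is made up for illustration. It uses re.fullmatch (available since Python 3.4), which requires the whole string to match and so makes the explicit anchors unnecessary:

```python
import re

def is_three_letters(word):
    """Return True only if word consists of exactly three ASCII letters."""
    # re.fullmatch succeeds only when the pattern spans the entire string.
    return re.fullmatch(r'[a-zA-Z]{3}', word) is not None

for word in ['AAA', 'AAAD', 'AA']:
    print(word, is_three_letters(word))
```

One difference worth knowing: $ also matches just before a trailing newline, so re.match(r'^[a-zA-Z]{3}$', 'AAA\n') succeeds, while re.fullmatch rejects that string.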

^[A-Z]{3}$
will do the magic for you.
You expected [A-Z]{3} to work, but on its own it only checks that the string starts with three letters (re.match looks at the beginning of the string), not that it consists of exactly three letters. The string may contain more characters after them.
The anchored regex checks the letters from the start of the string to its end.

You should use:
^[A-Z]{3}$
The ^ and $ anchors mark the beginning and the end of the line, making sure nothing else is in there.

Related

how to find substring from a single line string

Suppose I have a string, s="panpanIpanAMpanJOEpan". I want to find the word pan and replace it with spaces so that I get the output string "I AM JOE". How can I do it?
I also don't know how to find a certain substring in a long string without spaces, such as the one above.
It would be great if someone could help me learn about this.

If you don't know pan in advance, you can exploit the fact that the letters you want to keep are all upper case.
fillword = min(set("".join(i if i.islower() else ' ' for i in s).split(' ')) - set(['']), key=len)
This works by first replacing every character that is not lower case (here, the upper-case letters) with a space, then splitting on spaces and taking the shortest nonempty word.
Use replace to substitute a space for it, and then strip to remove the excess spacing.
s="panpanIpanAMpanJOEpan"
s.replace(fillword,' ').strip()
gives:
'I AM JOE'

Use replace:
s = "panpanIpanAMpanJOEpan"
print(s.replace("pan", " ").strip())
Output:
I AM JOE

As DarrylG and others mentioned, .replace will do what you asked for, where you define what you want to replace ("pan") and what you want to replace it with (" ").
To find a certain string in a longer string you can use .find(), which takes a string you are looking for and optionally where to start and stop looking for it (as integers) as arguments.
If you want to find all occurrences of a string in a bigger string, there are two options:
Find the string with find(), then cut the string so it no longer contains your search term, and repeat until find() returns -1 (meaning the search term is no longer in the string),
or use the re module and its finditer function to find all occurrences of your string.
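The second option can be sketched as follows, assuming Python 3 and the sample string from the question. re.escape is used so that the search term is treated literally even if it contains regex metacharacters:

```python
import re

s = "panpanIpanAMpanJOEpan"

# Collect the start index of every non-overlapping occurrence of "pan".
positions = [m.start() for m in re.finditer(re.escape("pan"), s)]
print(positions)  # [0, 3, 7, 12, 18]
```

Each match object also offers end() and span() if the end of each occurrence is needed.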
Edit: If you don't know what you are searching for, it becomes a bit trickier, but you can write a regular expression that extracts this data as well, using the same re module. This is easy if you know what the end result is supposed to be (I AM JOE in your case). If you don't, it becomes more complicated, and we would need additional information to help with this.

You can use replace to replace all occurrences of a substring at once.
In case you want to find the substrings yourself, you can do it manually:
s = "panpanIpanAMpanJOEpan"
while True:
    panPosition = s.find('pan')  # -1 means 'pan' was not found
    if panPosition == -1:
        s = s.strip()
        break
    # Cut 'pan' out of s and replace it with a blank.
    s = s[:panPosition] + ' ' + s[panPosition + 3:]
print(s)
Out:
I AM JOE

Find words containing . in middle or at the end

I need help finding words that contain a . in the middle or at the end, using regex in Python.
For example: N., N.E., North.East or N.East.

Not sure if you specifically need regex, but here's how you can do it without. There are a couple of ways of looking at it:
If you're looking anywhere in the word (let's call it MyString) except the first character, you can use '.' in MyString[1:].
If you want to check the exact center of a string, you can use MyString[len(MyString) // 2] == '.'; if the string has an even number of characters, the right-hand one of the two middle characters is checked ('d' in 'abcdef', for instance).
If you want to check only the very last character, MyString[-1] == '.' is enough (or MyString.endswith('.'), which also copes with an empty string).
This assumes your words arrive as strings, anyway.

Maybe this is what you are looking for:
/\w+\.\w*\.?/g
https://regex101.com/r/iH9bO6/1
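The /.../g notation above is the regex101 (JavaScript-style) way of writing a global search. In Python, a rough equivalent is re.findall with the bare pattern. A short sketch, assuming Python 3 and the examples from the question as input:

```python
import re

text = "N. or N.E. or North.East or N.East."

# One or more word characters, a dot, optional word characters, optional trailing dot.
dotted = re.findall(r'\w+\.\w*\.?', text)
print(dotted)  # ['N.', 'N.E.', 'North.East', 'N.East.']
```

Note that this pattern covers at most two dots per word, so a longer abbreviation such as A.B.C would only be matched in part.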

^\w+(?:\.\w+)*\.?$
Try this. See the demo:
https://regex101.com/r/sS2dM8/15
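A sketch of how the anchored pattern might be applied to one word at a time in Python 3. The names DOTTED_WORD and has_inner_or_trailing_dot are invented for this example. The pattern on its own also accepts words without any dot (such as North), so the extra '.' in word test is an assumption about what the question wants:

```python
import re

DOTTED_WORD = re.compile(r'^\w+(?:\.\w+)*\.?$')

def has_inner_or_trailing_dot(word):
    # Require at least one dot, and let the pattern enforce that every dot
    # follows a word character (so no leading dot and no ".." runs).
    return '.' in word and DOTTED_WORD.match(word) is not None

for word in ['N.', 'N.E.', 'North.East', 'N.East.', 'North', '.North']:
    print(word, has_inner_or_trailing_dot(word))
```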

in python find index in list if combination of strings exist

I'm writing my first script and trying to learn python.
But I'm stuck and can't get out of this one.
I'm writing a script to change file names.
Let's say I have string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
I want the result to be string = "This Is Test3 E00"
This is what I have so far:
l = list(string)  # Transform the string into a list
for i in l:
    if "E" in l:
        p = l.index("E")
        if isinstance((p+1), int) is True:
            if isinstance((p+2), int) is True:
                delp = p+3
                a = p-3
                del l[delp:]
new = "".join(l)
new = new.replace("."," ")
print (new)
The idea is to get the index of "E" and check whether there are two integers after it,
then delete everything after the second integer.
However, this will not work if there is an "E" anywhere else.
At the moment the result I get is:
this is tEst
because it finds the index of the first "E" in the list and deletes everything after index+3.
I guess my question is how to get the index in the list where a combination of strings exists,
but I can't seem to find out how.
Thanks for everyone's answers.
I was going in another direction, but it is also not working.
If someone could see why, that would be awesome. It is much better to learn by doing than by just copying what others write :)
This is what I came up with:
for i in l:
    if i=="E" and isinstance((i+1), int) is True:
        p = l.index(i)
        print (p)
Can anyone tell me why this isn't working? I get an error.
Thank you so much.

Have you ever heard of regular expressions?
Check out Python's re module in the standard library documentation.
Basically, you can define a "regex" that matches "E and then two digits" and gives you its index.
After that, I'd just use Python's slice notation to keep the piece of the string that you want.
Then check out the string methods: str.replace to swap the periods for spaces, and str.title to put the result in title case.
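Those steps can be sketched roughly as follows in Python 3. The function name clean_name is invented for this example, and returning the name unchanged when no "E plus two digits" marker is found is an assumption, since the question does not say what should happen in that case:

```python
import re

def clean_name(filename):
    """Keep everything up to 'E' plus two digits, then tidy the result."""
    match = re.search(r'E\d{2}', filename)
    if match is None:
        return filename  # assumption: leave names without the marker untouched
    kept = filename[:match.end()]          # slice up to the end of "E00"
    return kept.replace('.', ' ').title()  # dots to spaces, then title case

print(clean_name("this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"))  # This Is Test3 E00
```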

An easy way is to use a regex to match everything up to and including the E followed by two digits, with s as your string:
import re
up_until = re.match(r'(.*?E\d{2})', s).group(1)
# this.is.tEst3.E00
Then we replace each . with a space and title-case the result:
output = up_until.replace('.', ' ').title()
# This Is Test3 E00

The technique to consider using is Regular Expressions. They allow you to search for a pattern of text in a string, rather than a specific character or substring. Regular Expressions have a bit of a tough learning curve, but are invaluable to learn and you can use them in many languages, not just in Python. Here is the Python resource for how Regular Expressions are implemented:
http://docs.python.org/2/library/re.html
The pattern you are looking to match in your case is an "E" followed by two digits. In Regular Expressions (usually shortened to "regex" or "regexp"), that pattern looks like this:
E\d\d # ('\d' is the specifier for any digit 0-9)
In Python, you create a string containing the regex pattern you want to match, and pass that and your file name string to the search() function of the re module. Regex patterns tend to use a lot of backslashes, so it's common in Python to prefix the pattern string with 'r' (a raw string), which tells the Python interpreter not to treat backslashes as escape characters. All of this together looks like this:
import re
filename = 'this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv'
match_object = re.search(r'E\d\d', filename)
if match_object:
    # Group 0 is the entire match; end(0) is the index just past it
    index_of_Exx = match_object.end(0)
    truncated_filename = filename[:index_of_Exx]
    # Now take care of any more processing
Regular expressions can get very detailed (and complex). In fact, you can probably accomplish your entire task of fully changing the file name using a single regex that's correctly put together. But since I don't know the full details about what sorts of weird file names might come into your program, I can't go any further than this. I will add one more piece of information: if the 'E' could possibly be lower-case, then you want to add a flag as a third argument to your pattern search which indicates case-insensitive matching. That flag is 're.I' and your search() method would look like this:
match_object = re.search(r'E\d\d', filename, re.I)
Read the documentation on Python's 're' module for more information, and you can find many great tutorials online, such as this one:
http://www.zytrax.com/tech/web/regex.htm
And before you know it you'll be a superhero. :-)

The reason why this isn't working:
for i in l:
    if i=="E" and isinstance((i+1), int) is True:
        p = l.index(i)
        print (p)
...is that 'i' holds a character from the list 'l', not an integer index. Comparing it with 'E' works, but adding 1 to a string raises a TypeError.
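One possible way to repair the loop, sketched for Python 3 under the assumption that the goal is the index of the first "E" that is directly followed by two digits. enumerate supplies the integer index alongside each character, and str.isdigit replaces the isinstance test:

```python
l = list("this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv")

p = None
for index, char in enumerate(l):
    following = l[index + 1:index + 3]  # the (up to) two characters after this one
    if char == "E" and len(following) == 2 and all(c.isdigit() for c in following):
        p = index
        break

print(p)  # 14
```

The "E" inside tEst3 (index 9) is skipped because it is followed by "st", not by digits.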

Using variable in re.match in python

I am trying to create a list of patterns to match against a description line, so that I can ignore matching lines later in my script. Below is a sample script that I have been working on on the side.
Basically, I am trying to match a bunch of strings against a bunch of other strings.
In other words:
if asdf or asfs or wrtw is in the string: true, continue with the script
if not: print this.
import re
ignorelist = ['^test', '(.*)set']
def guess(a):
    for ignore in ignorelist:
        if re.match(ignore, a):
            return('LOSE!')
        else:
            return('WIN!')
a = raw_input('Take a guess: ')  # Python 2; use input() on Python 3
print(guess(a))
Thanks

You have a bit of a logic/flow problem.
You test the first term in the list. If it doesn't match, you go to the else and return "WIN!" without testing any of the other terms in the list.
(Also, ignorelist is outside the function.)
[EDIT: I see you edited the question to include regular expressions, so I will edit the answer back to a re context...] Note that you should use re.search instead of re.match if you want to give it actual regexes, since re.match only matches at the beginning of the string.
There are innumerable ways to change this, depending on how you want your program to work.
I would rewrite guess along these lines. (You can also put ignorelist inside the function instead of passing it.):
import re
ignorelist = [r'^test', r'[abc]set']
def guess(a, il):
    for reg in il:
        if re.search(reg, a):
            return "LOSE"
    return "WIN"
a = raw_input()  # Python 2; use input() on Python 3
print(guess(a, ignorelist))
In this case, it will loop through each pattern, exiting as soon as it finds a match; if it doesn't (it completes the loop without returning anything), it finally returns "WIN".

I think it would be far better to use a single regex, or a set of them if a single one would be too big to compile. Something like:
GUESSER = re.compile('|'.join(ignorelist))
def guess(a):
    if GUESSER.search(a):
        return('LOSE!')
    else:
        return('WIN!')
Note: each pattern in ignorelist should be enclosed in a pair of parentheses if it uses the "|" (or) operator itself.
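To avoid having to remember that note, each pattern can be wrapped in a non-capturing group while the combined regex is built. A sketch for Python 3, reusing the question's ignorelist and its convention that a match means 'LOSE!':

```python
import re

ignorelist = [r'^test', r'(.*)set']

# Wrap every pattern in (?:...) so an inner "|" cannot leak into its neighbours.
GUESSER = re.compile('|'.join('(?:%s)' % pattern for pattern in ignorelist))

def guess(a):
    return 'LOSE!' if GUESSER.search(a) else 'WIN!'

for attempt in ['testing', 'reset', 'hello']:
    print(attempt, guess(attempt))
```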

python regex pattern not working properly

Good day Stack Overflow, I have a problem with my program. I want to test whether the string I entered is alphanumeric or not.
import datetime
import re
def logUtb(fl, str):
    now = datetime.datetime.now()
    fl.write(now.strftime('%Y-%m-%d %H:%M') + " - " + str + "\n")
    return
#Test alphanumeric
def testValidationAlphaNum():
    valid = re.match('[A-Za-z0-9]', '!###$#$#')
    if valid == True:
        logUtb(f, 'Alphanumeric')
    else:
        logUtb(f, 'Unknown characters')
As you can see, I entered '!###$#$#' to be tested by my regex pattern. Instead of writing "Unknown characters" to my report log, it writes "Alphanumeric". Can you guys please tell me what is wrong with my program? Thanks!

re.match() returns None if the string didn't match and a MatchObject if it did. So the == True test will never be satisfied. If you're really seeing the 'Alphanumeric' output, then it's not a result of the code you have posted.
In any case, you should use str.isalnum() for this:
>>> 'abc'.isalnum()
True
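A short Python 3 sketch of both points, using the question's test string and an extra 'abc' input chosen for illustration:

```python
import re

result = re.match('[A-Za-z0-9]', '!###$#$#')
print(result)           # None, because the first character is not alphanumeric

hit = re.match('[A-Za-z0-9]', 'abc')
print(hit == True)      # False: a match object is truthy but not equal to True
print(bool(hit))        # True, which is why "if hit:" is the right test

print('!###$#$#'.isalnum())  # False
print('abc123'.isalnum())    # True
```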

I agree with the others who have said that calling str.isalnum() would be a simpler option here.
However, I would like to point out a few things with regard to the regex pattern you tried. Like Alex Baldwin said, your pattern as-is will only be looking for a single alphanumeric at the beginning of the string. So, you could potentially have anything else in the rest of the string and still get a match.
What you should do instead is quantify your character class, and anchor that class to the end of the string. To test that the string contains some alphanumerics, you ought to choose the + quantifier, which looks for at least one alphanumeric. Make sure you use the $ to anchor the pattern to the end of the string, or else you could have some non-alphanumerics sneak in at the end:
re.match('[A-Za-z0-9]+$', '!###$#$#')
This will return None, of course, for the given string. The problem with using * instead of + here is that it would return a MatchObject even for an empty string, and I assume you want at least one alphanumeric character to be present. Notice also that using ^ to anchor the character class to the beginning of the string is not necessary, because re.match() only ever matches at the beginning of the string. What you then want to test in your conditional is whether or not a MatchObject was returned by re.match():
valid = re.match('[A-Za-z0-9]+$', '!###$#$#')
if valid:
    logUtb(f, 'Alphanumeric')
else:
    logUtb(f, 'Unknown characters')
Additional information on the quantifiers and anchors can be found in the documentation:
http://docs.python.org/2/library/re.html

valid = re.match('[A-Za-z0-9]', '!###$#$#')
could be
valid = re.match(r'^\w*$', '!###$#$#')
and work. \w matches alphanumeric characters. (I'd like to add that the underscore also counts as a word character, according to Python.) So if you don't want underscores, your regex should be: ^[A-Za-z0-9]*$
Or the character class could be written [^_\W].
In either case, if valid == True must become if valid for the test to work.
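The underscore behaviour is easy to check. A small Python 3 sketch with a sample string chosen for illustration:

```python
import re

sample = 'abc_123'

with_w = bool(re.match(r'^\w*$', sample))             # \w accepts the underscore
explicit = bool(re.match(r'^[A-Za-z0-9]*$', sample))  # explicit ranges reject it
negated = bool(re.match(r'^[^\W_]*$', sample))        # so does [^\W_]

print(with_w, explicit, negated)  # True False False
```

In Python 3, \w on a str pattern also matches non-ASCII letters and digits unless the re.ASCII flag is passed, and all three patterns accept the empty string because of the * quantifier.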
