Python Regex for string matching - python

There are 2 rules that I am trying to match with regex. I've tested various cases, and some of them give me unwanted results.
Rules are as follows:
Find all strings that are numbers (integer, decimal, and negative values included)
Find all strings that have no numeric value. This is referring to special characters like !##$%^&*()
So in my attempt to match these rules, I got this:
import re

def rule(word):
    if re.match(r"\W", word):
        return True
    elif re.match(r"[-.\d]", word):
        return True
    else:
        return False
Input and output of my tests:
word = '972.2' : True
word = '-88.2' : True
word = '8fdsf' : True
word = '86fdsf' : True (I want this to be False)
word = '&^(' : True
There were some more tests, but I just wanted to show the one I want to return False. It seems like it's matching just the first character, so I tried changing the regex expressions, but that made things worse.

As the documentation says, re.match returns a match object (which always evaluates to True) whenever the regex matches at the start of the string, even if the rest of the string does not match.
Thus, you need to use anchors in the regex to make sure that only a whole-string match counts, e.g. [-.\d]$ (note the dollar sign).
EDIT: plus what Max said - use + so your regex won't just match a single letter.
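To see the difference the + and the $ make, here is a small check on the question's problem input. It is only a sketch; the extra '86\n' case is an added illustration of one subtlety of $, which also matches just before a trailing newline, whereas re.fullmatch() does not:

```python
import re

word = '86fdsf'
# Unanchored: re.match only needs the pattern to match at the start,
# so the single leading '8' is enough.
print(bool(re.match(r"[-.\d]", word)))        # True
# With + and $ every character must be a digit, '-' or '.'.
print(bool(re.match(r"[-.\d]+$", word)))      # False
print(bool(re.match(r"[-.\d]+$", '-88.2')))   # True
# $ also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(r"[-.\d]+$", '86\n')))    # True
print(bool(re.fullmatch(r"[-.\d]+", '86\n'))) # False
```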

Your regexes (both of them) only look at the first character of your string. Add +$ to the end of each one to make sure your string is made only of the target characters.
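Combining the anchoring advice with stricter patterns, a minimal sketch of the two rules could look like this. The names `NUMBER` and `SYMBOLS` are hypothetical, and the number format is an assumption: an optional leading minus sign, digits, and an optional fractional part, so forms such as `.5`, `5.` or `1e3` are not accepted. It uses `re.fullmatch()`, which only succeeds when the entire string matches:

```python
import re

# Assumed number format: optional '-', digits, optional '.digits'
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
# One or more non-word characters (anything but letters, digits and '_')
SYMBOLS = re.compile(r"\W+")


def rule(word):
    """Return True if word is entirely a number or entirely special characters."""
    return bool(NUMBER.fullmatch(word) or SYMBOLS.fullmatch(word))


for word in ['972.2', '-88.2', '8fdsf', '86fdsf', '&^(']:
    print(word, rule(word))
```

Because '-' and '.' are themselves non-word characters, a string such as '--' is accepted under the symbols rule, while a mix like '1-2' is rejected by both patterns.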

As I understand it, you need to exclude everything except strings that satisfy rules 1 and 2.
Try this:
import re

def rule(word):
    # True when the word contains no letters or underscores
    return re.search(r"[^\W\d\-\.]+", word) is None
Results on the provided samples:
972.2: True
-88.2: True
8fdsf: False
86fdsf: False
&^(: True

Related

How to validate a string?

I was wondering how to write code that validates a string. For example, a user inputs a postal code (a string), and I have to make sure it follows the format L#L#L#, where L represents a letter only and # represents only a single digit (not a decimal number). If it doesn't, the user should be asked to enter it again.
String methods
For your example, you could slice the string with a step of 2 and check that the characters at even positions are letters and those at odd positions are digits:
str.isdecimal() checks for characters that make up base-10 number systems (such as 0-9).
str.isalpha() checks for letters (A-Z and a-z, plus other Unicode letters).
test_good = 'L5L5L5'
test_bad = 'LLLLLL'

def check_string(test):
    # L#L#L# is exactly 6 characters: letters at even indices, digits at odd indices
    return len(test) == 6 and test[0::2].isalpha() and test[1::2].isdecimal()
Test it out:
>>> check_string(test_good)
True
Negative test:
>>> check_string(test_bad)
False
Regex
Regex does pattern matching and a lot more. In the example below, I compiled the pattern ahead of time so that it looks clean and can be reused if needed.
I also use fullmatch(), which requires the entire provided string to match, not just part of it. On its own it returns either None or a match object, so I check whether a match exists and return True if it does, or False if it is None.
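import re

# Compiled once at module level so it can be reused on every call
re_pattern = re.compile(r'[A-Z][0-9][A-Z][0-9][A-Z][0-9]')

def match(test):
    # fullmatch returns a match object or None; convert that to True/False
    return re_pattern.fullmatch(test) is not None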
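The question also asks to prompt the user again when the format is wrong. A minimal sketch of that loop, built on the same fullmatch() idea; the function names are hypothetical, and accepting lowercase letters is an assumption (drop `a-z` from the pattern to require uppercase only):

```python
import re

# Assumption: letters may be upper- or lowercase
POSTAL_CODE = re.compile(r"[A-Za-z][0-9][A-Za-z][0-9][A-Za-z][0-9]")


def is_valid_postal_code(text):
    """Return True if text is exactly letter-digit-letter-digit-letter-digit."""
    return POSTAL_CODE.fullmatch(text) is not None


def ask_postal_code(prompt="Enter postal code: "):
    """Keep asking until the user enters a code in L#L#L# format."""
    while True:
        code = input(prompt).strip()  # ignore surrounding spaces
        if is_valid_postal_code(code):
            return code
        print("Invalid format: expected L#L#L#, e.g. A1B2C3. Please try again.")
```

Calling `code = ask_postal_code()` returns only once a valid code has been entered.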

Trying to find a match between strings that ignores the case as well as certain 0s

I am trying to find matches between two lists of strings. Both lists contain strings with different formatting that point to the same information.
One list includes strings formatted as "A02A18", while the other could have the same string as "a2a18".
There are also some strings like "A05" that should match "a5".
I say "certain 0s" in the title because I don't want strings such as "A15A20" and "a15a2" to match, which would happen if I stripped all 0s from the strings (05 is the same as 5, but 20 is not the same as 2).
I am looking for a way to get them to match each other if found.
Ideally it would look like:
first = "A02A18"
second = "a2a18"
if first == second:
    print("Yes")
What I had:
Initially, I had a statement that would match strings like "A05" and "a5".
This looked like:
first = "A05"
second = "a5"
if first[1:].lstrip("0") == second[1:].lstrip("0"):
    print("yeah")
This takes both strings and compares them from the second character onward, so in the previous example "A05" and "a5" are compared after the first letter, which ignores the case. Then lstrip("0") strips any leading 0. I strip the 0 on both sides in case a future string has "a05" instead of "a5" (just trying to cover all bases).
While this works for this case, it would not work for strings such as "A02A18" and "a2a18".
I would use regex to remove the zeroes following a letter, and compare the results (uppercasing the source to be able to compare without casing):
import re

def compare(s1, s2):
    def convert(s):
        # Uppercase first, then drop any run of zeroes that directly follows a letter
        return re.sub(r"([A-Z])0+", r"\1", s.upper())
    return convert(s1) == convert(s2)

print(compare("A02A18", "a2a18"))
print(compare("A20A18", "a2a18"))
result:
True
False
Note: this also works for A000B12 (the zeroes are simply removed). However, if there's a risk of false positives because inputs can be both A00B1 and AB1 (both convert to AB1), the convert function could instead build a list of strings and converted integers:
def convert(s):
    return [int(x) if x.isdigit() else x.upper() for x in re.findall(r"[a-zA-Z]+|\d+", s)]
Or, a simpler version that uppercases the source first (shorter, and probably slightly faster because upper() is called only once):
def convert(s):
    return [int(x) if x.isdigit() else x for x in re.findall(r"[A-Z]+|\d+", s.upper())]
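To compare the two conversions side by side, here is a small sketch; `convert_regex` and `convert_tokens` are hypothetical names for the first `convert` and for the uppercase-first list variant:

```python
import re


def convert_regex(s):
    # Drop any run of zeroes that directly follows a letter
    return re.sub(r"([A-Z])0+", r"\1", s.upper())


def convert_tokens(s):
    # Split into letter runs and integers, so "02" and "2" both become 2
    return [int(x) if x.isdigit() else x for x in re.findall(r"[A-Z]+|\d+", s.upper())]


for a, b in [("A02A18", "a2a18"), ("A20A18", "a2a18"), ("A00B1", "AB1")]:
    print(a, b, convert_regex(a) == convert_regex(b), convert_tokens(a) == convert_tokens(b))
```

Expected output:
A02A18 a2a18 True True
A20A18 a2a18 False False
A00B1 AB1 True False

The last line shows the false positive of the regex version: A00B1 becomes AB1, while the token version keeps ['A', 0, 'B', 1] apart from ['AB', 1].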
Assuming that you mean all zeros should be ignored unless the last character is a zero, the following code should perform the task.
def main(first, second):
    first = first.lower()
    second = second.lower()
    string = ""
    # Copy every character except zeros, leaving out the last character...
    for i in range(0, len(first) - 1):
        if first[i] != "0":
            string = string + first[i]
    # ...then always keep the last character, even if it is a zero
    string = string + first[-1]
    return string == second

answer = main("A02A18", "a2a18")
print(answer)
answer = main("A15A20", "a15a2")
print(answer)
This code returns True for the first call and False for the second. In the future, keep in mind the str.lower() and str.upper() methods. They are very useful in cases like this.

Check text for numbers and upper- and lowercase letters, excluding symbols

I need your help, guys.
I can't decide whether to use a list or a set. A list seems more efficient, and a dictionary also needs an index. But my problem is that the text should be a string, so the variable must hold the text as a string; I can't use D=['a','b','c'].
Comparing the text against such a list gives me an error because it can only compare individual characters, whereas I must accept whole strings such as abc, or a word such as _success, and confirm that it is valid.
This is my code so far, but I have a problem: it now accepts numbers, letters, and symbols. Symbols such as !##$% should return False.
Having it as its own function works, but I need it in the if statement.
return text.isalnum() doesn't work in the if statement. That's my problem: symbols should be False.
def check(text):
    if text == '':
        return False
    if text.isalpha() == text.isdigit():
        return True
    else:
        return text.isalnum()

def main():
    text = str(raw_input("Enter text: "))
    print(check(text))

main()
Output problem:
Enter text: _
False
_ is supposed to be the one symbol that returns True. For example, _success123 should be True.
!##$% is supposed to be False, but the output shows True. Another example is !##A123, whose output should be False.
The code above does accept the underscore, letters, and numbers.
Output:
_success123
But the problem is that it also accepts !##$ as True.
return text.isalnum() does reject the symbols, but it's not working in the if statement.
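It's overkill, but you can use a regex. It's easy to add new characters (e.g. symbols) to the character class:
import re

def check(text):
    # '!' is included only to show how an extra symbol can be allowed;
    # + (rather than *) rejects the empty string
    return bool(re.match(r'^[a-zA-Z0-9_!]+$', text))

text = input("Enter text: ")  # Python 3; on Python 2 use raw_input()
print(check(text))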
If you want to avoid a regular expression, you could use Python sets:
allowed = set('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_0123456789')

def check(text):
    # Valid if the text is non-empty and nothing remains
    # after removing the allowed characters
    return bool(text) and not (set(text) - allowed)

for text in ['_success123', '!##$%']:
    print(text, check(text))
This converts your text into a set of characters and removes all the characters that are allowed. If any characters remain then you know it is invalid. For two examples, this gives:
_success123 True
!##$% False
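The reason symbols slip through in the question's code is that text.isalpha() == text.isdigit() is also True when both calls return False, which is the case for '!##$%'. A minimal sketch that keeps the question's if-statement structure and uses neither a regex nor a set:

```python
def check(text):
    # Reject the empty string, as the question's code does
    if text == '':
        return False
    # Every character must be a letter, a digit, or the underscore
    return all(ch.isalnum() or ch == '_' for ch in text)


for text in ['_success123', '_', '!##$%', '!##A123']:
    print(repr(text), check(text))
```

Note that str.isalnum() also accepts non-ASCII letters and digits such as 'é'; the explicit set of allowed characters is stricter if only ASCII should pass.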

how to check if each letter in a password is part of a list of allowed characters [duplicate]

Alright so for this problem I am meant to be writing a function that returns True if a given string contains only characters from another given string. So if I input "bird" as the first string and "irbd" as the second, it would return True, but if I used "birds" as the first string and "irdb" as the second it would return False. My code so far looks like this:
def only_uses_letters_from(string1, string2):
    """Takes two strings and returns True if the first string only contains characters also in the second string.
    string, string -> bool"""
    if string1 in string2:
        return True
    else:
        return False
When I run the script, it only returns True if the first string appears in the second in exactly the same order, or if I input only one letter (e.g. "bird" and "bird", or "b" and "bird", but not "bird" and "irdb").
This is a perfect use case of sets. The following code will solve your problem:
def only_uses_letters_from(string1, string2):
    """Check if the first string only contains characters also in the second string."""
    return set(string1) <= set(string2)
Sets are fine, but they aren't required (and may be less efficient depending on your string lengths). You could also simply do:
s1 = "bird"
s2 = "irbd"
print(all(ch in s2 for ch in s1))  # True: every character of s1 occurs in s2
Note that this stops as soon as a character of s1 isn't found in s2 and returns False.
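A quick comparison of the set-based and all()-based checks on the question's examples. This is a sketch; `uses_only_all` is a hypothetical name for the all() version. Both agree, including returning True for an empty first string, since it contains no disallowed characters:

```python
def only_uses_letters_from(string1, string2):
    # Set version: string1's characters must be a subset of string2's
    return set(string1) <= set(string2)


def uses_only_all(string1, string2):
    # all() version: stops at the first character of string1 missing from string2
    return all(ch in string2 for ch in string1)


cases = [("bird", "irbd"), ("birds", "irdb"), ("b", "bird"), ("", "abc")]
for s1, s2 in cases:
    print(repr(s1), repr(s2), only_uses_letters_from(s1, s2), uses_only_all(s1, s2))
```

`ch in string2` scans string2 for each character of string1, so the all() version does roughly len(string1) × len(string2) work in the worst case, while the set version builds two sets once; for short strings like these the difference is negligible.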

Remove strings with repeating characters [Python]

I need to determine if a string is composed of a certain repeating character, for example eeeee, 55555, or !!!.
I know the regex 'e{1,15}' can match eeeee, but it obviously can't match 555. I tried [a-z0-9]{1,15}, but it also matches strings I don't want, like Hello.
The solution doesn't have to be regex. I just can't think of any other way to do this.
A string consists of a single repeating character if and only if all characters in it are the same. You can easily test that by constructing a set out of the string: set('55555').
All characters are the same if and only if the set has size 1:
>>> len(set('55555')) == 1
True
>>> len(set('Hello')) == 1
False
>>> len(set('')) == 1
False
If you want to allow the empty string as well (set size 0), then use <= 1 instead of == 1.
Regex solution (via re.search() function):
import re
s = 'eeeee'
print(bool(re.search(r'^(.)\1+$', s))) # True
s = 'ee44e'
print(bool(re.search(r'^(.)\1+$', s))) # False
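^(.)\1+$ :
^ and $ - anchor the match to the start and end of the string
(.) - capture any character (except a newline)
\1+ - backreference to the previously captured group, repeated one or more times; because at least one repeat is required, a single-character string such as 'e' does not match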
You do not have to use regex for this; a test that checks whether all characters in the string are the same will produce the desired output:
s = "eee"
assert len(s) > 0
reference = s[0]
result = all(c == reference for c in s)
Or use a set, as Thomas showed, which is probably the better way.
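The question's title is about removing such strings. A minimal sketch, assuming the input is a list of strings, that filters them out with the set-size test; `is_single_repeated_char` is a hypothetical helper name:

```python
def is_single_repeated_char(s):
    # True for strings such as 'eeeee', '55555' or '!!!'; False for '' and 'Hello'
    return len(set(s)) == 1


words = ['eeeee', '55555', '!!!', 'Hello', 'ee44e', 'x']
kept = [w for w in words if not is_single_repeated_char(w)]
print(kept)  # ['Hello', 'ee44e']
```

Unlike the regex ^(.)\1+$, this treats a one-character string such as 'x' as repeated and removes it; add `and len(s) > 1` to the test if single characters should be kept.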
