Python Regex Check for Credit Cards - python

I created a script to check specific card numbers, which are stored in the list credit_cards. It is a simple function that marks each one as Valid/Invalid depending on the regex patterns listed.
My problem is understanding how to extend this regex check to allow spaces and periods as separators, for example 3423.3423.2343.3433 or 3323 3223 442 3234. I already check for hyphens as a delimiter, but how can I also allow multiple delimiters, such as periods and spaces?
Here is my script:
import re

credit_cards = ['6011346728478930','5465625879615786','5711424424442444',
                '5000-2368-7954-3214', '4444444444444444', '5331625879615786', '5770625879615786',
                '5750625879615786', '575455879615786']

def World_BINS(credit_cards):
    valid_BINS = r"^5(?:465|331|000|[0-9]{2})(-?\d{4}){3}$"
    do_not_repeat = r"((\d)-?(?!(-?\2){3})){16}"
    filters = valid_BINS, do_not_repeat
    for num in credit_cards:
        if all(re.match(f, num) for f in filters):
            print(f"{num} is Valid")
        else:
            print(f"{num} is invalid")

World_BINS(credit_cards)

You can use
import re

credit_cards = ['5000 2368 7954 3214','5000.2368.7954.3214','6011346728478930','5465625879615786', '5711424424442444', '5000-2368-7954-3214', '4444444444444444', '5331625879615786', '5770625879615786','5750625879615786', '575455879615786']

def World_BINS(credit_cards):
    valid_BINS = r"^5(?:465|331|000|[0-9]{2})(?=([\s.-]?))(\1\d{4}){3}$"
    do_not_repeat = r"^((\d)([\s.-]?)(?!(\3?\2){3})){16}$"
    filters = [valid_BINS, do_not_repeat]
    for num in credit_cards:
        if all(re.match(f, num) for f in filters):
            print(f"{num} is Valid")
        else:
            print(f"{num} is invalid")

World_BINS(credit_cards)
See the Python demo.
The (?=([\s.-]?))(\1\d{4}){3} part of the first regex uses a lookahead to capture an optional whitespace (\s), . or - into Group 1 without consuming it. The \1 backreference in the next group then matches that same value (an empty string, a whitespace, . or -) before each block of four digits. This makes sure the delimiters / separators are used consistently in the string.
In the second regex, ^((\d)([\s.-]?)(?!(\3?\2){3})){16}$, a similar technique is used: the whitespace, . or - is captured into Group 3, and \3? optionally matches that same value inside the subsequent quantified group. The negative lookahead fails the match when a digit (Group 2) is followed by three more occurrences of the same digit, each optionally preceded by that same separator.
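As a minimal sketch of the consistency trick in isolation (the names SAME_SEPARATOR and uses_one_separator and the sample strings are made up for illustration, and the BIN prefix check is left out), the lookahead captures the first separator and the backreference forces the later ones to be identical:

```python
import re

# Hypothetical, simplified pattern: four groups of four digits.
# The lookahead captures the first separator (possibly empty) into
# Group 1 without consuming it; \1 then requires that same separator
# before every following group.
SAME_SEPARATOR = re.compile(r"^\d{4}(?=([\s.-]?))(?:\1\d{4}){3}$")

def uses_one_separator(number):
    """Return True if the 16 digits are split by one consistent separator."""
    return SAME_SEPARATOR.match(number) is not None

for sample in ["5000 2368 7954 3214", "5000236879543214", "5000-2368.7954 3214"]:
    print(sample, uses_one_separator(sample))
```

The last sample mixes -, . and a space, so the backreference cannot match the second separator and the whole string is rejected.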

Related

How to check subsequent elements of string in python using iterators?

I have a sentence that I want to parse to check for some conditions:
a) If there is a period and it is followed by a whitespace followed by a lowercase letter
b) If there is a period internal to a sequence of letters with no adjacent whitespace (e.g. www.abc.com)
c) If there is a period followed by a whitespace followed by an uppercase letter and preceded by one of a short list of titles (e.g. Mr., Dr., Mrs.)
Currently I am iterating through the string (line) and using the next() function to see whether the next character is a space or lowercase, etc. And then I just loop through the line. But how would I check what the character after the next one is? And how would I find the previous ones?
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
b = next(t)
for i in line[:len(line)-1]:
    a = next(t)
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")
Any help would be appreciated. Thank you.
Regular expressions are what you want.
Since you're going to check for a pattern in a string, you can make use of Python's built-in support for regular expressions through the re library.
Example:
# To check if there is a period internal to a sequence of letters with no adjacent whitespace
import re
text = 'www.google.com'
# A letter, a literal period, then another letter.
# search() scans the whole string, so no leading or trailing .* is needed.
pattern = r'[A-Za-z]\.[A-Za-z]'
obj = re.compile(pattern)
if obj.search(text):
    print("Pattern matched")
Similarly, write patterns for the other conditions you want to check in your string.
# If there is a period and it is followed by a whitespace followed by a lowercase letter
regex = r'\. [a-z]'
You can generate and test your regular expressions online using this simple tool
Read more extensively about re library here
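As a rough sketch of how the three conditions from the question could be covered, here is one pattern per condition. The names LOWER_AFTER, INTERNAL, AFTER_TITLE and classify_periods are invented for this example, and the title list (Mr, Mrs, Dr) is only an assumption about which titles matter:

```python
import re

# a) period, one whitespace character, then a lowercase letter
LOWER_AFTER = re.compile(r"(\.)\s[a-z]")
# b) period with a letter directly on both sides (no adjacent whitespace)
INTERNAL = re.compile(r"(?<=[A-Za-z])(\.)(?=[A-Za-z])")
# c) period after a known title, then whitespace and an uppercase letter
AFTER_TITLE = re.compile(r"\b(?:Mr|Mrs|Dr)(\.)\s[A-Z]")

def classify_periods(line):
    """Return the index of every period that satisfies each condition."""
    return {
        "lowercase_after": [m.start(1) for m in LOWER_AFTER.finditer(line)],
        "internal": [m.start(1) for m in INTERNAL.finditer(line)],
        "after_title": [m.start(1) for m in AFTER_TITLE.finditer(line)],
    }

print(classify_periods("See www.abc.com. Mr. Smith left. ok"))
```

Each pattern puts the period in Group 1, so m.start(1) gives the position of the period itself rather than the start of the surrounding context.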
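You can use multiple next operations to get more data. Calling next() twice on a single iterator inside the loop would skip characters and run off the end of the string, so keep one iterator per look-ahead distance and advance each with next():
line = "This is line.1 www.abc.com. Mr."
t1 = iter(line)
next(t1)            # t1 is now one character ahead of the loop
t2 = iter(line)
next(t2); next(t2)  # t2 is now two characters ahead of the loop
for i in line[:len(line)-2]:
    a = next(t1)    # the next character
    c = next(t2)    # the character after next
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")
You can get the previous characters by saving them to a temporary list as you iterate.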
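If juggling iterators gets awkward, plain indexing gives both the previous and the following characters in one place. This is only a sketch; with_context and its before/after parameters are hypothetical names chosen for illustration:

```python
def with_context(text, before=1, after=2):
    """Yield (previous chars, current char, following chars) for each position."""
    for i, ch in enumerate(text):
        prev = text[max(0, i - before):i]   # up to `before` characters to the left
        nxt = text[i + 1:i + 1 + after]     # up to `after` characters to the right
        yield prev, ch, nxt

line = "This is line.1 www.abc.com. Mr."
for prev, ch, nxt in with_context(line):
    # Same check as in the question: a period directly followed by a digit.
    if ch == "." and nxt[:1].isdigit():
        print("True")
```

Slices never raise at the ends of the string, so the first and last characters simply get shorter context strings.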

Python: strip function definition using regex

I am a complete beginner at programming and am reading the book "Automate the Boring Stuff with Python". In Chapter 7, there is a practice project: a regex version of strip(). My code below does not work (I use Python 3.6.1). Could anyone help?
import re

string = input("Enter a string to strip: ")
strip_chars = input("Enter the characters you want to be stripped: ")

def strip_fn(string, strip_chars):
    if strip_chars == '':
        blank_start_end_regex = re.compile(r'^(\s)+|(\s)+$')
        stripped_string = blank_start_end_regex.sub('', string)
        print(stripped_string)
    else:
        strip_chars_start_end_regex = re.compile(r'^(strip_chars)*|(strip_chars)*$')
        stripped_string = strip_chars_start_end_regex.sub('', string)
        print(stripped_string)
You can also use re.sub to remove the characters at the start or end.
Let us say the char is 'x':
re.sub(r'^x+', "", string)
re.sub(r'x+$', "", string)
The first line acts as lstrip and the second as rstrip.
This just looks simpler.
When using the r'^(strip_chars)*|(strip_chars)*$' string literal, strip_chars is not interpolated, i.e. it is treated as literal text inside the pattern. You need to pass it as a variable to the regex. However, just passing it in the current form would result in a "corrupt" regex because (...) in a regex is a grouping construct, while you want to match a single char from the defined set of chars stored in the strip_chars variable.
You could just wrap the string with a pair of [ and ] to create a character class, but if the variable contains, say, z-a, it would make the resulting pattern invalid. You also need to escape each char to play it safe.
Replace
r'^(strip_chars)*|(strip_chars)*$'
with
r'^[{0}]+|[{0}]+$'.format("".join([re.escape(x) for x in strip_chars]))
I advise replacing the * (zero or more occurrences) quantifier with + (one or more occurrences) because in most cases, when we want to remove something, we need to match at least 1 occurrence of the unnecessary string(s).
Also, you may replace r'^(\s)+|(\s)+$' with r'^\s+|\s+$' since the repeated capturing groups keep overwriting the group values on each iteration, which slightly hampers the regex execution.
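Putting those pieces together, a minimal sketch of the whole function could look like this. The name regex_strip and the choice to return the result instead of printing it are my own assumptions, not part of the book exercise:

```python
import re

def regex_strip(text, strip_chars=None):
    """Regex version of str.strip(): remove characters from both ends only."""
    if not strip_chars:
        # No characters given: strip leading and trailing whitespace.
        pattern = r"^\s+|\s+$"
    else:
        # Escape every character so that ], ^, - and \ are safe in a class.
        char_class = "".join(re.escape(c) for c in strip_chars)
        pattern = r"^[{0}]+|[{0}]+$".format(char_class)
    return re.sub(pattern, "", text)

print(regex_strip("   hello world  "))
print(regex_strip("xxhixyx", "xy"))
```

Characters in the middle of the string are left alone because both alternatives are anchored, one with ^ and the other with $.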
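#! python
# Regex version of strip()
import re

def RegexStrip(mainString, charsToBeRemoved=None):
    if charsToBeRemoved:
        # Escape the characters so they are safe inside a character class,
        # and anchor the pattern so only the two ends are stripped.
        charClass = re.escape(charsToBeRemoved)
        regex = re.compile(r'^[%s]+|[%s]+$' % (charClass, charClass))
        return regex.sub('', mainString)
    else:
        regex = re.compile(r'^\s+')   # leading whitespace
        regex1 = re.compile(r'\s+$')  # trailing whitespace ($ must come last)
        newString = regex1.sub('', mainString)
        newString = regex.sub('', newString)
        return newString

Str = ' hello3123my43name is antony '
print(RegexStrip(Str))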
Maybe this could help, it can be further simplified of course.

How to make a regular expression 'greedy but optional'

I'm trying to write a parser for a string which represents a file path, optionally followed by a colon (:) and a string representing access flags (e.g. r+ or w). The file name can itself contain colons, e.g., foo:bar.txt, so the colon separating the access flags should be the last colon in the string.
Here is my implementation so far:
import re

def parse(string):
    SCHEME = r"file://" # File prefix
    PATH_PATTERN = r"(?P<path>.+)" # One or more of any character
    FLAGS_PATTERN = r"(?P<flags>.+)" # The letters r, w, a, b, a '+' symbol, or any digit
    # FILE_RESOURCE_PATTERN = SCHEME + PATH_PATTERN + r":" + FLAGS_PATTERN + r"$" # This makes the first test pass, but the second one fail
    FILE_RESOURCE_PATTERN = SCHEME + PATH_PATTERN + optional(r":" + FLAGS_PATTERN) + r"$" # This makes the second test pass, but the first one fail
    tokens = re.match(FILE_RESOURCE_PATTERN, string).groupdict()
    return tokens['path'], tokens['flags']

def optional(re):
    '''Encloses the given regular expression in a group which matches 0 or 1 repetitions.'''
    return '({})?'.format(re)
I've tried the following tests:
import pytest

def test_parse_file_with_colon_in_file_name():
    assert parse("file://foo:bar.txt:r+") == ("foo:bar.txt", "r+")

def test_parse_file_without_acesss_flags():
    assert parse("file://foobar.txt") == ("foobar.txt", None)

if __name__ == "__main__":
    pytest.main([__file__])
The problem is that by either using or not using optional, I can make one or the other test pass, but not both. If I make r":" + FLAGS_PATTERN optional, then the preceding regular expression consumes the entire string.
How can I adapt the parse method to make both tests pass?
You should build the regex like
^file://(?P<path>.+?)(:(?P<flags>[^:]+))?$
See the regex demo.
In your code, the ^ anchor is not necessary because re.match already anchors the match at the start of the string. The path group matches any 1+ chars lazily, so all the text that the optional flags group can match will land in that group: the path extends only up to the first : that is followed by 1+ chars other than : and then the end of the string. Thanks to the $ anchor, the path group will match the whole rest of the string if the optional flags group is not matched.
Use the following fix:
PATH_PATTERN = r"(?P<path>.+?)" # One or more of any character, as few as possible
FLAGS_PATTERN = r"(?P<flags>[^:]+)" # One or more characters other than ':'
See the online Python demo.
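For reference, here is a compact sketch of parse with the fix applied. Compiling the pattern once as FILE_RESOURCE and raising ValueError on a non-matching string are my own choices, not something the question requires:

```python
import re

# Lazy path, then an optional ":flags" suffix where flags contain no colon.
FILE_RESOURCE = re.compile(r"file://(?P<path>.+?)(?::(?P<flags>[^:]+))?$")

def parse(string):
    match = FILE_RESOURCE.match(string)
    if match is None:
        raise ValueError("not a file resource: {!r}".format(string))
    # flags is None when the optional group did not take part in the match.
    return match.group("path"), match.group("flags")

print(parse("file://foo:bar.txt:r+"))
print(parse("file://foobar.txt"))
```

Both tests from the question pass with this version, since only the text after the last colon can end up in flags.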
Just for fun, I wrote this parse function, which I think is better than using RE?
def parse(string):
    s = string.split('//')[-1]
    try:
        path, flags = s.rsplit(':', 1)
    except ValueError:
        path, flags = s.rsplit(':', 1)[0], None
    return path, flags

Python Regex Partial Match or "hitEnd"

I'm writing a scanner, so I'm matching an arbitrary string against a list of regex rules. It would be useful if I could emulate the Java "hitEnd" functionality of knowing not just when the regular expression didn't match, but when it can't match; when the regular expression matcher reached the end of the input before deciding it was rejected, indicating that a longer input might satisfy the rule.
For example, maybe I'm matching html tags for starting to bold a sentence of the form "< b >". So I compile my rule
bold_html_rule = re.compile("<b>")
And I run some tests:
good_match = bold_html_rule.match("<b>")
uncertain_match = bold_html_rule.match("<")
bad_match = bold_html_rule.match("goat")
How can I tell the difference between the "bad" match, for which goat can never be made valid by more input, and the ambiguous match that isn't a match yet, but could be?
Attempts
It is clear that in the above form, there is no way to distinguish, because both the uncertain attempt and the bad attempt return None. If I wrap all rules in "(RULE)?" then any input will return a match, because at the least the empty string is a substring of all strings. However, when I try to see how far the regex progressed before rejecting my string by using the group method or the endpos attribute, it is always just the length of the string.
Does the Python regex package do a lot of extra work and traverse the whole string even if it's an invalid match on the first character? I can see why it would have to if I used search, which will verify if the sequence is anywhere in the input, but it seems very strange to do so for match.
I've found the question asked before (on non-stackoverflow places) like this one:
https://mail.python.org/pipermail/python-list/2012-April/622358.html
but he doesn't really get a response.
I looked at the regular expression package itself but wasn't able to discern its behavior; could I extend the package to get this result? Is this the wrong way to tackle my task in the first place? (I've built effective Java scanners using this strategy in the past.)
Try this out. It does feel like a hack, but at least it achieves the result you are looking for. I am a bit concerned about the PrepareCompileString function, though: it should be able to handle all the escaped characters, but it cannot handle any wildcards.
import re

#Grouping every single character
def PrepareCompileString(regexString):
    newstring = ''
    escapeFlag = False
    for char in regexString:
        if escapeFlag:
            char = escapeString+char
            escapeFlag = False
            escapeString = ''
        if char == '\\':
            escapeFlag = True
            escapeString = char
        if not escapeFlag:
            newstring += '({})?'.format(char)
    return newstring

def CheckMatch(match):
    # counting the number of non matched groups
    count = match.groups().count(None)
    # If all groups matched - good match
    # all groups did not match - bad match
    # few groups matched - uncertain match
    if count == 0:
        print('Good Match:', match.string)
    elif count < len(match.groups()):
        print('Uncertain Match:', match.string)
    elif count == len(match.groups()):
        print('Bad Match:', match.string)

regexString = '<b>'
bold_html_rule = re.compile(PrepareCompileString(regexString))
good_match = bold_html_rule.match("<b>")
uncertain_match = bold_html_rule.match("<")
bad_match = bold_html_rule.match("goat")
for match in [good_match, uncertain_match, bad_match]:
    CheckMatch(match)
I got this result:
Good Match: <b>
Uncertain Match: <
Bad Match: goat
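Because every character becomes optional, an input such as "<>" or "<x" would also be reported as uncertain by the code above, although no further input can turn it into "<b>". For rules that are plain literal strings (no wildcards at all), a stricter sketch is a simple prefix test. The function classify_literal is a hypothetical helper, not part of the re module, and it does not generalise to real regex rules:

```python
def classify_literal(rule, text):
    """Classify text against a literal rule as 'good', 'uncertain' or 'bad'."""
    if text.startswith(rule):
        return "good"        # like re.match, extra input after the rule is allowed
    if rule.startswith(text):
        return "uncertain"   # text is a proper prefix of the rule: more input may match
    return "bad"             # text already deviates from the rule

for text in ["<b>", "<", "goat", "<x"]:
    print(text, classify_literal("<b>", text))
```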

Python Regular Expressions Findall

To look through data, I am using regular expressions. One of my regular expressions is (they are dynamic and change based on what the computer needs to look for --- using them to search through data for a game AI):
O,2,([0-9],?){0,},X
After the 2, there can (and most likely will) be other numbers, each followed by a comma.
To my understanding, this will match:
O,2,(any amount of numbers - can be 0 in total, each followed by a comma),X
This is fine, and works (in RegExr) for:
O,4,1,8,6,7,9,5,3,X
X,6,3,7,5,9,4,1,8,2,T
O,2,9,6,7,11,8,X # matches this
O,4,6,9,3,1,7,5,O
X,6,9,3,5,1,7,4,8,O
X,3,2,7,1,9,4,6,X
X,9,2,6,8,5,3,1,X
My issue is that I need to match all the numbers after the original, provided number. So, I want to match (in the example) 9,6,7,11,8.
However, implementing this in Python:
import re
pattern = re.compile("O,2,([0-9],?){0,},X")
matches = pattern.findall(s) # s is the above string
matches is ['8'], the last number, but I need to match all of the numbers after the given (so '9,6,7,11,8').
Note: I need to use pattern.findall because there will be more than one match (I shortened my list of strings, but there are actually around 20 thousand strings), and I need to find the shortest one (as this would be the shortest way for the AI to win).
Is there a way to match the entire string (or just the last numbers after those I provided)?
Thanks in advance!
Use this:
O,2,((?:[0-9],?){0,}),X
See it in action: http://regex101.com/r/cV9wS1
import re
s = '''O,4,1,8,6,7,9,5,3,X
X,6,3,7,5,9,4,1,8,2,T
O,2,9,6,7,11,8,X
O,4,6,9,3,1,7,5,O
X,6,9,3,5,1,7,4,8,O
X,3,2,7,1,9,4,6,X
X,9,2,6,8,5,3,1,X'''
pattern = re.compile("O,2,((?:[0-9],?){0,}),X")
matches = pattern.findall(s) # s is the above string
print(matches)
Outputs:
['9,6,7,11,8']
Explained:
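When a pattern contains a capture group, findall returns the text of that group rather than the whole match, and a repeated capture group such as ([0-9],?){0,} keeps only its last iteration, which is why you got ['8']. By wrapping the entire repeated part between 2, and ,X in (), you capture all of it at once. I then used (?: ) to make the inner group non-capturing, so it is ignored.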
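Since the goal in the question is the shortest sequence, a small follow-up sketch could pick it from the findall results. The function shortest_sequence and the sample data are made up for illustration; "shortest" is taken here to mean the fewest comma-separated numbers:

```python
import re

pattern = re.compile(r"O,2,((?:[0-9],?){0,}),X")

def shortest_sequence(data):
    """Return the captured sequence with the fewest numbers, or None."""
    matches = pattern.findall(data)
    if not matches:
        return None
    # Count the numbers in each sequence by splitting on the commas.
    return min(matches, key=lambda seq: len(seq.split(",")))

sample = "O,2,9,6,7,11,8,X\nO,2,5,3,X\nX,3,2,7,1,9,4,6,X"
print(shortest_sequence(sample))
```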
You don't have to use regex:
split the string into a list on the commas
check that item 0 == 'O' and item 1 == '2'
check that the last item == 'X'
check that each of items[2:-1] is a number (str.isdigit)
That's all.
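Those steps could be sketched roughly as follows. The function name moves_after and its default arguments are assumptions for the example, and it handles one line at a time:

```python
def moves_after(line, first="O", second="2", last="X"):
    """Return the numbers between the given prefix and suffix, or None."""
    items = line.strip().split(",")
    # Need at least the two leading items and the trailing one.
    if len(items) < 3:
        return None
    if items[0] != first or items[1] != second or items[-1] != last:
        return None
    middle = items[2:-1]
    # Every remaining item must consist of digits only.
    if not all(item.isdigit() for item in middle):
        return None
    return middle

print(moves_after("O,2,9,6,7,11,8,X"))
print(moves_after("O,4,1,8,6,7,9,5,3,X"))
```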
