Validate both numbers and letters - python

There have been cases where I have needed to validate a string filled with numbers and letters and I want to know the easiest way to do it
For example, in Tic Tac Toe / Noughts and Crosses, I need to make sure that the position that the user has entered is between "1-3" and "a-c"
For better understanding of what I am asking:
pos = "2c"
>>> Input is valid
pos = "1z"
>>> Input is invalid: Letters outside range a-c
pos = "5b"
>>> Input is invalid: Numbers outside range 1-3

There are only 9 possible valid inputs, so you could just check them all, or you could use a regular expression to see if the input matches all the valid inputs.
import re
pattern = re.compile(r'^[123][abc]$')
m = pattern.match("2b")
if m:
print("It's a match!")
The regular expression r'^[123][abc]$' looks for the start of a string, followed by 1, 2, or 3, followed by a, b, or c, followed by the end of the string. No inputs outside that range (or that are longer than two characters) should match.

Without regular expressions, you could have something like:
def validate(i):
if type(i) != str or len(i) != 2:
return False
d, char = int(i[0]), i[1]
return d >= 1 and d <= 3 and char in 'abc'
print(validate('1c')) #True
print(validate('3a')) #True
print(validate('2b')) #True
print(validate('36')) # False
print(validate('106')) # False
print(validate('10c')) # False
print(validate(10)) # False
This does, however, assume that the first character in your input can be converted to an int.

Use regex as follows:
import re
example_str = '1c'
p = re.compile('^[1-3][a-c]$')
if p.match(example_str):
# Valid
else:
# Invalid
You can use - for range selection in pattern.

Related

How do i make the program print specific letters in this specific format i give to it?

so i need to code a program which, for example if given the input 3[a]2[b], prints "aaabb" or when given 3[ab]2[c],prints "abababcc"(basicly prints that amount of that letter in the given order). i tried to use a for loop to iterate the first given input and then detect "[" letters in it so it'll know that to repeatedly print but i don't know how i can make it also understand where that string ends
also this is where i could get it to,which probably isnt too useful:
string=input()
string=string[::-1]
bulundu=6
for i in string:
if i!="]":
if i!="[":
lst.append(i)
if i=="[":
break
The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
# If item is a number, print that many of the next item
if item.isdigit():
result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb
A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall('(\d+)\[([a-zA-Z]+)\]',data)
# [(3, 'a'), (2, 'b')]
for x in matches:
print(x[1] * int(x[0]), end='')
#aaabb
Lenghty and documented version using NO regex but simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombinate them again
I opted to document with inline comments
This could be done like so:
# testcases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
("3[ab]2[c]","abababcc"),
("5[12]6[c]","1212121212cccccc"),
("22[a]","a"*22)]
# now we use our algo for all those testcases
for inp,res in testcases:
split_inp = [] # list that takes the splitted values of the input
num = 0 # accumulator variable for more-then-1-digit numbers
in_text = False # bool that tells us if we are currently collecting letters
# go over all letters : O(n)
for c in inp:
# when a [ is reached our num is complete and we need to store it
# we collect all further letters until next ] in a list that we
# add at the end of your split_inp
if c == "[":
split_inp.append(num) # add the completed number
num = 0 # and reset it to 0
in_text = True # now in text
split_inp.append([]) # add a list to collect letters
# done collecting letters
elif c == "]":
in_text = False # no longer collecting, convert letters
split_inp[-1] = ''.join(split_inp[-1]) # to text
# between [ and ] ... simply add letter to list at end
elif in_text:
split_inp[-1].append(c) # add letter
# currently collecting numbers
else:
num *= 10 # increase current number by factor 10
num += int(c) # add newest number
print(repr(inp), split_inp, sep="\n") # debugging output for parsing part
# now we need to build the string from our parsed data
amount = 0
result = [] # intermediate list to join ['aaa','bb']
# iterate the list, if int remember it, it text, build composite
for part in split_inp:
if isinstance(part, int):
amount = part
else:
result.append(part*amount)
# join the parts
result = ''.join(result)
# check if all worked out
if result == res:
print("CORRECT: ", result + "\n")
else:
print (f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases of '5[12]' wich some of the other solutions wont.
You can capture both the number of repetitions n and the pattern to repeat v in one go using the described pattern. This essentially matches any sequence of digits - which is the first group we need to capture, reason why \d+ is between brackets (..) - followed by a [, followed by anything - this anything is the second pattern of interest, hence it is between backets (...) - which is then followed by a ].
findall will find all these matches in the passed line, then the first match - the number - will be cast to an int and used as a multiplier for the string pattern. The list of int(n) * v is then joined with an empty space. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile("(\d+)\[(.*?)\]")
def func(x): return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
I am not much more than a beginner and looking at the other answers, I thought understanding regex might be a challenge for a new contributor such as yourself since I myself haven't really dealt with regex.
The beginner friendly way to do this might be to loop through the input string and use string functions like isnumeric() and isalpha()
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
if char.isnumeric():
nums.append(char)
if char.isalpha():
chars.append(char)
for i, char in enumerate(chars):
substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string functions, which are fairly self-explanatory. They are used as <your_string_here>.isalpha() which returns true if and only if the string is an alphabet (whitespace, numerics, and symbols return false
And, similarly for isnumeric()
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
IF YOU NEED ANY CLARIFICATION ON A FUNCTION USED, PLEASE DO NOT HESITATE TO LEAVE A COMMENT

Python anagram strings

I'm supposed to write a function to determine if two strings are anagrams or not.
The codes I've written now, doesn't really work well.
For example, one of sample input is
Tom Cruise
So I'm cuter
output should be True, but my code keep says False.
For another example, when the input is
the eyes
they see
My code actually says True which is the right answer.
So I have no idea why my code only works for certain input.
Can anyone help?
def anagram(a, b):
if(sorted(a)==sorted(b)):
return True
else:
return False
You need to remove all the non alphabet characters and then convert all letters to lower case:
import re
regex = re.compile('[^a-zA-Z]')
return sorted(regex.sub('', a).lower()) == sorted(regex.sub('', b).lower())
As I mentioned in the comment section above, you need to remove any symbol such as ' and convert each letter to either uppercase or lowercase to avoid case mismatches. So, your code should look like this:
def anagram(a, b):
newA = ''.join(elem.lower() for elem in a if elem.isalpha())
newB = ''.join(elem.lower() for elem in b if elem.isalpha())
if(sorted(newA)==sorted(newB)):
return True
else:
return False
a = "Tom Cruise"
b = "So I'm cuter"
print(anagram(a,b))
This will give you:
True
Check this one...
s1=input("Enter first string:")
s2=input("Enter second string:")
a=''.join(t for t in s1 if t.isalnum())
b=''.join(t for t in s2 if t.isalnum())
a=a.lower()
b=b.lower()
if(sorted(a)==sorted(b)):
print("The strings are anagrams.")
else:
print("The strings aren't anagrams.")
def anagram(a, b):
a = ''.join(e for e in a if e.isalpha()).lower()
b = ''.join(e for e in b if e.isalpha()).lower()
return sorted(a)==sorted(b)
a = "Tom Cruise"
b = "So I'm cuter"
anagram(a,b)
You got to remove the none alpha characters in the string and convert the two string into the consistent case.

How can you group a very specfic pattern with regex?

Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with
several letters between them (ie. ++d+===+c++==a) and for the string
to be true each letter must be surrounded by a + symbol. So the string
to the left would be false. The string will not be empty and will have
at least one letter.
Input:"+d+=3=+s+"
Output:"true"
Input:"f++d+"
Output:"false"
I'm trying to create a regular expression for the following problem, but I keep running into various problems. How can I produce something that returns the specified rules('+\D+')?
import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []
Here I thought I could create my own class that searches for the pattern.
import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']
Here I thought I the '\\+' would single out the plus symbol and read it as a char.
mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'
Here using the same shell, I tried using the search instead of findall, but I just ended up with the first letter which isn't even surrounded by a plus.
My end result is to group the string 'adf+a+=4=+S+' into ['+a+','+S+'] and so on.
edit:
Solution:
import re
def SimpleSymbols(str):
#added padding, because if str = 'y+4==+r+'
#then program would return true when it should return false.
string = '=' + str + '='
#regex that returns false if a letter *doesn't* have a + in front or back
plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
#if statement that returns "true" if regex doesn't find any letters
#without a + behind or in front
if plusReg.search(string) is None:
return "true"
return "false"
print SimpleSymbols(raw_input())
I borrowed some code from ekhumoro and Sanjay. Thanks
One approach is to search the string for any letters that are either: (1) not preceeded by a +, or (2) not followed by a +. This can be done using look ahead and look behind assertions:
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')
So if rgx.search(string) returns None, the string is valid:
>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True
but if it returns a match, the string is invalid:
>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False
The important thing about look ahead/behind assertions is that they don't consume characters, so they can handle overlapping matches.
Something like this should do the trick:
import re
def is_valid_str(s):
return re.findall('[a-zA-Z]', s) == re.findall('\+([a-zA-Z])\+', s)
Usage:
In [10]: is_valid_str("f++d+")
Out[10]: False
In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True
I think you are on the right track. The regular expression you have is correct, but it can simplify down to just letters:
search_pattern = re.compile(r'\+[a-zA-z]\+')
for upper and lower case strings. Now we can use this regex with the findall function:
results = re.findall(search_pattern, 'adf+a+=4=+S+') # returns ['+a+', '+S+']
Now the question needs you to return a boolean depending on if the string is valid to the specified pattern so we can wrap this all up into a function:
def is_valid_pattern(pattern_string):
search_pattern = re.compile(r'\+[a-zA-z]?\+')
letter_pattern = re.compile(r'[a-zA-z]') # to search for all letters
results = re.findall(search_pattern, pattern_string)
letters = re.findall(letter_pattern, pattern_string)
# if the lenght of the list of all the letters equals the length of all
# the values found with the pattern, we can say that it is a valid string
return len(results) == len(letter_pattern)
You should be looking for what isn't there, as opposed to what is. You should search for something like, ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical or operator. Then on either side, it checks if it can find any scenario where there is a letter without a "+" on the left/right respectively. If if finds something, that means the string fails. If it can't find anything, that means that there are no instances of a letter not being surrounded by "+"'s.

Search a string for a given key

I've been doing some more CodeEval challenges and came across one on the hard tab.
You are given two strings. Determine if the second string is a substring of the first (Do NOT use any substr type library function). The second string may contain an asterisk() which should be treated as a regular expression i.e. matches zero or more characters. The asterisk can be escaped by a \ char in which case it should be interpreted as a regular '' character. To summarize: the strings can contain alphabets, numbers, * and \ characters.
So you are given two strings in a file that look something like this: Hello,ell your job is to figure out if ell is in hello, what I do:
I haven't quite gotten it perfect, but I did get it to the point where it passes and works with a 65% complete. How it runs through the string, and the key, and checks if the characters match. If the characters match, it appends the character into a list. After this it divides the length of the string by 2 and checks if the length of the list is either greater than, or equal to half of the string. I figured half of the string length would be enough to verify if it indeed matches or not. Example of how it works:
h == e -> no
e == e -> yes -> list
l == e -> no
l == e -> no
...
My question is what can I do better to the point where I can verify the wildcards that are said above?
import sys
def search_string(string, key):
""" Search a string for a specified key.
If the key exists out put "true" if it doesn't output "false"
>>> search_string("test", "est")
true
>>> search_string("testing", "rawr")
false"""
results = []
for c in string:
for ch in key:
if c == ch:
results.append(c)
if len(string) / 2 < len(results) or len(string) / 2 == len(results):
return "true"
else:
return "false"
if __name__ == '__main__':
with open(sys.argv[1]) as data:
for line in data.readlines():
data_list = line.rstrip().split(",")
search_key = data_list[1]
word = data_list[0]
print(search_string(word, search_key))
I've come up with a solution to this problem. You've said "Do NOT use any substr type library function", I'm not sure If some of the functions I used are allowed or not, so tell me if I've broken any rules :D
Hope this helps you :)
def search_string(string, key):
key = key.replace("\\*", "<NormalStar>") # every \* becomes <NormalStar>
key = key.split("*") # splitting up the key makes it easier to work with
#print(key)
point = 0 # for checking order, e.g. test = t*est, test != est*t
found = "true" # default
for k in key:
k = k.replace("<NormalStar>", "*") # every <NormalStar> becomes *
if k in string[point:]: # the next part of the key is after the part before
point = string.index(k) + len(k) # move point after this
else: # k nbt found, return false
found = "false"
break
return found
print(search_string("test", "est")) # true
print(search_string("t....est", "t*est")) # true
print(search_string("n....est", "t*est")) # false
print(search_string("est....t", "t*est")) # false
print(search_string("anything", "*")) # true
print(search_string("test", "t\*est")) # false
print(search_string("t*est", "t\*est")) # true

Check if a string contains a number

Most of the questions I've found are biased on the fact they're looking for letters in their numbers, whereas I'm looking for numbers in what I'd like to be a numberless string.
I need to enter a string and check to see if it contains any numbers and if it does reject it.
The function isdigit() only returns True if ALL of the characters are numbers. I just want to see if the user has entered a number so a sentence like "I own 1 dog" or something.
Any ideas?
You can use any function, with the str.isdigit function, like this
def has_numbers(inputString):
return any(char.isdigit() for char in inputString)
has_numbers("I own 1 dog")
# True
has_numbers("I own no dog")
# False
Alternatively you can use a Regular Expression, like this
import re
def has_numbers(inputString):
return bool(re.search(r'\d', inputString))
has_numbers("I own 1 dog")
# True
has_numbers("I own no dog")
# False
You can use a combination of any and str.isdigit:
def num_there(s):
return any(i.isdigit() for i in s)
The function will return True if a digit exists in the string, otherwise False.
Demo:
>>> king = 'I shall have 3 cakes'
>>> num_there(king)
True
>>> servant = 'I do not have any cakes'
>>> num_there(servant)
False
Use the Python method str.isalpha(). This function returns True if all characters in the string are alphabetic and there is at least one character; returns False otherwise.
Python Docs: https://docs.python.org/3/library/stdtypes.html#str.isalpha
https://docs.python.org/2/library/re.html
You should better use regular expression. It's much faster.
import re
def f1(string):
return any(i.isdigit() for i in string)
def f2(string):
return re.search('\d', string)
# if you compile the regex string first, it's even faster
RE_D = re.compile('\d')
def f3(string):
return RE_D.search(string)
# Output from iPython
# In [18]: %timeit f1('assdfgag123')
# 1000000 loops, best of 3: 1.18 µs per loop
# In [19]: %timeit f2('assdfgag123')
# 1000000 loops, best of 3: 923 ns per loop
# In [20]: %timeit f3('assdfgag123')
# 1000000 loops, best of 3: 384 ns per loop
You could apply the function isdigit() on every character in the String. Or you could use regular expressions.
Also I found How do I find one number in a string in Python? with very suitable ways to return numbers. The solution below is from the answer in that question.
number = re.search(r'\d+', yourString).group()
Alternatively:
number = filter(str.isdigit, yourString)
For further Information take a look at the regex docu: http://docs.python.org/2/library/re.html
Edit: This Returns the actual numbers, not a boolean value, so the answers above are more correct for your case
The first method will return the first digit and subsequent consecutive digits. Thus 1.56 will be returned as 1. 10,000 will be returned as 10. 0207-100-1000 will be returned as 0207.
The second method does not work.
To extract all digits, dots and commas, and not lose non-consecutive digits, use:
re.sub('[^\d.,]' , '', yourString)
I'm surprised that no-one mentionned this combination of any and map:
def contains_digit(s):
isdigit = str.isdigit
return any(map(isdigit,s))
in python 3 it's probably the fastest there (except maybe for regexes) is because it doesn't contain any loop (and aliasing the function avoids looking it up in str).
Don't use that in python 2 as map returns a list, which breaks any short-circuiting
You can accomplish this as follows:
if a_string.isdigit():
do_this()
else:
do_that()
https://docs.python.org/2/library/stdtypes.html#str.isdigit
Using .isdigit() also means not having to resort to exception handling (try/except) in cases where you need to use list comprehension (try/except is not possible inside a list comprehension).
You can use NLTK method for it.
This will find both '1' and 'One' in the text:
import nltk
def existence_of_numeric_data(text):
text=nltk.word_tokenize(text)
pos = nltk.pos_tag(text)
count = 0
for i in range(len(pos)):
word , pos_tag = pos[i]
if pos_tag == 'CD':
return True
return False
existence_of_numeric_data('We are going out. Just five you and me.')
You can use range with count to check how many times a number appears in the string by checking it against the range:
def count_digit(a):
sum = 0
for i in range(10):
sum += a.count(str(i))
return sum
ans = count_digit("apple3rh5")
print(ans)
#This print 2
import string
import random
n = 10
p = ''
while (string.ascii_uppercase not in p) and (string.ascii_lowercase not in p) and (string.digits not in p):
for _ in range(n):
state = random.randint(0, 2)
if state == 0:
p = p + chr(random.randint(97, 122))
elif state == 1:
p = p + chr(random.randint(65, 90))
else:
p = p + str(random.randint(0, 9))
break
print(p)
This code generates a sequence with size n which at least contain an uppercase, lowercase, and a digit. By using the while loop, we have guaranteed this event.
any and ord can be combined to serve the purpose as shown below.
>>> def hasDigits(s):
... return any( 48 <= ord(char) <= 57 for char in s)
...
>>> hasDigits('as1')
True
>>> hasDigits('as')
False
>>> hasDigits('as9')
True
>>> hasDigits('as_')
False
>>> hasDigits('1as')
True
>>>
A couple of points about this implementation.
any is better because it works like short circuit expression in C Language and will return result as soon as it can be determined i.e. in case of string 'a1bbbbbbc' 'b's and 'c's won't even be compared.
ord is better because it provides more flexibility like check numbers only between '0' and '5' or any other range. For example if you were to write a validator for Hexadecimal representation of numbers you would want string to have alphabets in the range 'A' to 'F' only.
What about this one?
import string
def containsNumber(line):
res = False
try:
for val in line.split():
if (float(val.strip(string.punctuation))):
res = True
break
except ValueError:
pass
return res
containsNumber('234.12 a22') # returns True
containsNumber('234.12L a22') # returns False
containsNumber('234.12, a22') # returns True
I'll make the #zyxue answer a bit more explicit:
RE_D = re.compile('\d')
def has_digits(string):
res = RE_D.search(string)
return res is not None
has_digits('asdf1')
Out: True
has_digits('asdf')
Out: False
which is the solution with the fastest benchmark from the solutions that #zyxue proposed on the answer.
Also, you could use regex findall. It's a more general solution since it adds more control over the length of the number. It could be helpful in cases where you require a number with minimal length.
s = '67389kjsdk'
contains_digit = len(re.findall('\d+', s)) > 0
Simpler way to solve is as
s = '1dfss3sw235fsf7s'
count = 0
temp = list(s)
for item in temp:
if(item.isdigit()):
count = count + 1
else:
pass
print count
alp_num = [x for x in string.split() if x.isalnum() and re.search(r'\d',x) and
re.search(r'[a-z]',x)]
print(alp_num)
This returns all the string that has both alphabets and numbers in it. isalpha() returns the string with all digits or all characters.
This too will work.
if any(i.isdigit() for i in s):
print("True")
You can also use set.intersection
It is quite fast, better than regex for small strings.
def contains_number(string):
return True if set(string).intersection('0123456789') else False
An iterator approach. It consumes all characters unless a digit is met. The second argument of next fix the default value to return when the iterator is "empty". In this case it set to False but also '' works since it is casted to a boolean value in the if.
def has_digit(string):
str_iter = iter(string)
while True:
char = next(str_iter, False)
# check if iterator is empty
if char:
if char.isdigit():
return True
else:
return False
or by looking only at the 1st term of a generator comprehension
def has_digit(string):
return next((True for char in string if char.isdigit()), False)
I'm surprised nobody has used the python operator in. Using this would work as follows:
foo = '1dfss3sw235fsf7s'
bar = 'lorem ipsum sit dolor amet'
def contains_number(string):
for i in range(10):
if str(i) in list(string):
return True
return False
print(contains_number(foo)) #True
print(contains_number(bar)) #False
Or we could use the function isdigit():
foo = '1dfss3sw235fsf7s'
bar = 'lorem ipsum sit dolor amet'
def contains_number(string):
for i in list(string):
if i.isdigit():
return True
return False
print(contains_number(foo)) #True
print(contains_number(bar)) #False
These functions basically just convert s into a list, and check whether the list contains a digit. If it does, it returns True, if not, it returns False.

Categories

Resources