Find substring in string but only if whole words? - python

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?
Perhaps an example will demonstrate what I mean:
string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2) # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2) # this should be False
How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:
def string_found(string1, string2):
    if string2.find(string1 + " "):
        return True
    return False
But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)

You can use regular expressions and the word boundary special character \b (emphasis mine):
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
import re
def string_found(string1, string2):
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return True
    return False
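As a quick sanity check of how \b behaves, the sketch below wraps the same pattern in a helper named whole_word_found (a hypothetical name, not from the answer) and runs it against the examples from the question plus two edge cases: punctuation counts as a boundary, while an underscore does not.

```python
import re

def whole_word_found(needle, haystack):
    # \b matches only between a word character (\w) and a non-word
    # character, or at the start/end of the string next to a word character.
    pattern = r"\b" + re.escape(needle) + r"\b"
    return re.search(pattern, haystack) is not None

print(whole_word_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))   # True
print(whole_word_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))   # False
print(whole_word_found("GODDARD", "ADDLESHAW GODDARD, LLP"))            # True: "," ends the word
print(whole_word_found("GODDARD", "ADDLESHAW GODDARD_LLP"))             # False: "_" is a word character
```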
If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to both strings:
def string_found(string1, string2):
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # str.find returns -1 when there is no match, so compare explicitly.
    return string2.find(string1) != -1

The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:
string = "My Name Is Josh"
substring = "Name"
for word in string.split():
    if substring == word:
        print("Match Found")
For a bonus, here's a one-liner:
any(substring == word for word in string.split())

Here's a way to do it without a regex (as requested) assuming that you want any whitespace to serve as a word separator.
import string
def find_substring(needle, haystack):
    index = haystack.find(needle)
    if index == -1:
        return False
    if index != 0 and haystack[index-1] not in string.whitespace:
        return False
    L = index + len(needle)
    if L < len(haystack) and haystack[L] not in string.whitespace:
        return False
    return True
And here's some demo code (codepad is a great idea: Thanks to Felix Kling for reminding me)

I'm building off aaronasterling's answer.
The problem with the above code is that it will return false when there are multiple occurrences of needle in haystack, with the second occurrence satisfying the search criteria but not the first.
Here's my version:
import string
def find_substring(needle, haystack):
    search_start = 0
    while (search_start < len(haystack)):
        index = haystack.find(needle, search_start)
        if index == -1:
            return False
        is_prefix_whitespace = (index == 0 or haystack[index-1] in string.whitespace)
        search_start = index + len(needle)
        is_suffix_whitespace = (search_start == len(haystack) or haystack[search_start] in string.whitespace)
        if (is_prefix_whitespace and is_suffix_whitespace):
            return True
    return False
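To illustrate the point about multiple occurrences, here is a minimal sketch of the same idea that resumes the search one character after a rejected candidate, so that overlapping candidates are also considered. The name find_whole_word and the decision to reject an empty needle are assumptions made for this example only.

```python
import string

def find_whole_word(needle, haystack):
    if not needle:
        return False  # assumption: an empty needle never counts as a match
    start = 0
    while True:
        index = haystack.find(needle, start)
        if index == -1:
            return False
        end = index + len(needle)
        before_ok = index == 0 or haystack[index - 1] in string.whitespace
        after_ok = end == len(haystack) or haystack[end] in string.whitespace
        if before_ok and after_ok:
            return True
        # Resume one character later so overlapping candidates are not skipped.
        start = index + 1

print(find_whole_word("ADVANCE", "ADVANCED ADVANCE"))                 # True
print(find_whole_word("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))  # False
print(find_whole_word("a a", "xa a a"))                               # True
```

The last call is the overlapping case: the first candidate starts right after "x" and is rejected, and the valid candidate begins inside it.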

One approach using the re, or regex, module that should accomplish this task is:
import re
string1 = "pizza pony"
string2 = "who knows what a pizza pony is?"
search_result = re.search(r'\b' + string1 + r'\W', string2)
print(search_result.group())

Excuse me REGEX fellows, but the simpler answer is:
text = "this is the esquisidiest piece never ever writen"
word = "is"
" {0} ".format(text).lower().count(" {0} ".format(word).lower())
The trick here is to pad both the 'text' and the 'word' with a space on each side, which guarantees that only whole-word occurrences are counted and avoids trouble at the beginning and end of the 'text' being searched.

Thanks to @Chris Larson's comment, I tested it and updated it as below:
import re
string1 = "massage"
string2 = "muscle massage gun"
try:
    re.search(r'\b' + string1 + r'\W', string2).group()
    print("Found word")
except AttributeError as ae:
    print("Not found")

def string_found(string1, string2):
    # Locate the first occurrence of string1 inside string2.
    index = string2.find(string1)
    if index == -1:
        return False
    end = index + len(string1)
    # The match must start and end at a space or at an edge of string2.
    starts_word = index == 0 or string2[index - 1] == " "
    ends_word = end == len(string2) or string2[end] == " "
    return starts_word and ends_word

Related

Regex Python with min a letter, a number and min a non-alphanumeric character

I would like to check that a string contains at least 12 characters, at least one letter, at least one number, and at least one non-alphanumeric character.
I am in the process of creating a Regex but it does not meet my expectations.
Here is the Regex:
regex = re.compile('([A-Za-z]+[0-9]+\W+){12,}')
def is_valid(string):
    return re.fullmatch(regex, string) is not None
test_string = "abdfjhfl58425!!"
print(is_valid(test_string))
When the string contains numbers after letters, it does not match!
Could you help me? Thank you.
Your regex is wrong. I found this on another post which describes a different, albeit very similar, scenario.
You can tweak this regex so that it reads like this:
^(.{0,11}|[^a-zA-Z]{1,}|[^\d]{1,}|[^\W]{1,})$|[\s]
What you have here is a regex that matches only when the password is invalid: if there is no match, the password is valid, and if there is a match, it is invalid. You will need to alter the code to suit, but try the regex above instead and it should work for all combinations.
The final working code would then be (with extra tests):
import re
# Matches invalid input: fewer than 12 characters, no letter, no digit, or no non-word character.
regex = re.compile(r'^(.{0,11}|[^a-zA-Z]{1,}|[^\d]{1,}|[^\W]{1,})$|[\s]')
def is_valid(string):
    return re.fullmatch(regex, string) is None
test_string = "abdfl58425B!!"
print(is_valid(test_string))
test_string = "ABRER58425B!!"
print(is_valid(test_string))
test_string = "eruaso58425!!"
print(is_valid(test_string))
Regex is not really suited to this task as it involves remembering counts of each type of character. You could construct a regex to do it but it would end up being very long and unreadable. Much simpler to write a function to count the number of occurrences of each type of character, something like:
def is_valid(test_string):
    if len(test_string) >= 12 \
            and len([c for c in test_string if c.isalpha()]) >= 1 \
            and len([c for c in test_string if c.isnumeric()]) >= 1 \
            and len([c for c in test_string if not c.isalnum()]) >= 1:
        return True
    else:
        return False
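The same counting idea can be sketched more compactly with any(), which stops scanning as soon as one character of the required kind is found. The function name meets_requirements is hypothetical, and str.isdigit is used here in place of str.isnumeric as an assumption about what "a number" means.

```python
def meets_requirements(password):
    # Each any() short-circuits on the first character that qualifies.
    return (
        len(password) >= 12
        and any(c.isalpha() for c in password)
        and any(c.isdigit() for c in password)
        and any(not c.isalnum() for c in password)
    )

print(meets_requirements("abdfjhfl58425!!"))  # True
print(meets_requirements("abdfjhfl58425"))    # False: no non-alphanumeric character
print(meets_requirements("ab1!"))             # False: too short
```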
If it helps: if you want to do the same thing without regex, you can use this function that I wrote:
def is_strong_password(a_string):
    if len(a_string) >= 12:
        chiffre = 0  # digits
        lettre = 0  # letters
        alnum = 0  # non-alphanumeric characters
        for x in a_string:
            if x.isalpha():
                lettre += 1
            if x.isdigit():
                chiffre += 1
            if not x.isalnum():
                alnum += 1
        # At least one character of each kind is required.
        if lettre >= 1 and chiffre >= 1 and alnum >= 1:
            return True
        else:
            return False
    else:
        return False
You could use four positive lookaheads:
(?i)(?=.{12})(?=.*[a-z])(?=.*\d)(?=.*[^a-z\d])
(?i) specifies that matches are to be case-insensitive.
The four positive lookaheads are as follows:
(?=.{12}) # assert that the string contains (at least) 12 characters
(?=.*[a-z]) # assert that the string contains a letter
(?=.*\d) # assert that the string contains a digit
(?=.*[^a-z\d]) # assert that the string contains a non-alphanumeric character
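A minimal sketch of how this pattern might be used from Python follows. The names STRONG and is_strong are hypothetical. Because lookaheads consume nothing, re.match is enough: it anchors the checks at the start of the string.

```python
import re

# The four lookaheads from the answer, compiled once.
STRONG = re.compile(r"(?i)(?=.{12})(?=.*[a-z])(?=.*\d)(?=.*[^a-z\d])")

def is_strong(candidate):
    # match() anchors at position 0, where all four assertions must hold.
    return STRONG.match(candidate) is not None

print(is_strong("abdfjhfl58425!!"))  # True
print(is_strong("ABCDEFGHIJ1!"))     # True: exactly 12 characters, case-insensitive
print(is_strong("abdfjhfl58425"))    # False: no non-alphanumeric character
print(is_strong("ab1!"))             # False: too short
```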

Exact match of a string variable in another string in python [duplicate]

I'm trying to determine whether a substring is in a string.
The issue I'm running into is that I don't want my function to return True if the substring is found within another word in the string.
For example, if the substring is "Purple cow"
and the string is "Purple cows make the best pets.",
this should return False, since "cow" isn't plural in the substring.
And if the substring is "Purple cow"
and the string is "Your purple cow trampled my hedge!",
it should return True.
My code looks something like this:
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    return phrase in text
text = "Purple cows make the best pets!"
phrase = "Purple cow"
print(is_phrase_in(phrase, text))
In my actual code I clean up unnecessary punctuation and spaces in 'text' before comparing it to phrase, but otherwise this is the same.
I've tried using re.search, but I don't understand regular expressions very well yet and have only gotten the same functionality from them as in my example.
Thanks for any help you can provide!
Since your phrase can have multiple words, doing a simple split and intersect won't work. I'd go with regex for this one:
import re
def is_phrase_in(phrase, text):
    # re.escape keeps any regex metacharacters in the phrase literal.
    return re.search(r"\b{}\b".format(re.escape(phrase)), text, re.IGNORECASE) is not None
phrase = "Purple cow"
print(is_phrase_in(phrase, "Purple cows make the best pets!")) # False
print(is_phrase_in(phrase, "Your purple cow trampled my hedge!")) # True
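One caveat with \b is that it only matches next to a word character, so a phrase that begins or ends with punctuation (for example "C++") will not be found. A possible variant, which is not part of the answer above, swaps \b for lookarounds that merely forbid an adjacent word character. The function name is hypothetical.

```python
import re

def is_phrase_in_lookaround(phrase, text):
    # (?<!\w) / (?!\w): no word character directly before / after the phrase.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(is_phrase_in_lookaround("Purple cow", "Purple cows make the best pets!"))     # False
print(is_phrase_in_lookaround("Purple cow", "Your purple cow trampled my hedge!"))  # True
print(is_phrase_in_lookaround("C++", "I write c++ daily"))                          # True
```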
Using PyParsing:
import pyparsing as pp
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    rule = pp.ZeroOrMore(pp.Keyword(phrase))
    for t, s, e in rule.scanString(text):
        if t:
            return True
    return False
text = "Your purple cow trampled my hedge!"
phrase = "Purple cow"
print(is_phrase_in(phrase, text))
Which yields:
True
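One can do this very literally with a loop:
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    j = 0  # number of phrase characters matched so far
    for i in range(len(text)):
        if j == len(phrase):
            # The whole phrase matched; it must be followed by a space.
            return text[i] == " "
        if phrase[j] == text[i]:
            j += 1
        else:
            # Restart, letting the current character begin a new match.
            j = 1 if phrase[0] == text[i] else 0
    # True only if the phrase ran right up to the end of the text.
    return j == len(phrase)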
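Or by splitting:
def is_phrase_in(phrase, text):
    phrase_words = phrase.lower().split()
    text_words = text.lower().split()
    n = len(phrase_words)
    # Look for the phrase words as a contiguous run of the text words.
    return any(text_words[i:i + n] == phrase_words for i in range(len(text_words) - n + 1))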
or using regular expressions:
import re
pattern = re.compile(r"(^|[^\w])" + re.escape(phrase.lower()) + r"([^\w]|$)")
pattern.search(text.lower())
to say that no word character may directly precede or follow the phrase, while whitespace, punctuation, or the start or end of the text is okay.
Regular Expressions should do the trick
import re
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    if re.findall('\\b'+phrase+'\\b', text):
        found = True
    else:
        found = False
    return found
Here you go, hope this helps:
# Declarations
string = "My name is Ramesh and I am cool. You are Ram ?"
sub = "Ram"
# Check the string for the substring
result = sub in string
# Condition check
if result:
    # Find the starting position
    start_position = string.index(sub)
    # Get the substring length
    length = len(sub)
    # Slice out the matched substring
    output = string[start_position:start_position + length]

How to check that a string contains only “a-z”, “A-Z” and “0-9” characters [duplicate]

I am importing string and trying to check if text contains only "a-z", "A-Z", and "0-9".
But I only get the input prompt, and it doesn't print success when I enter letters and digits:
import string
text=input("Enter: ")
correct = string.ascii_letters + string.digits
if text in correct:
    print("Success")
You could use regex for this, e.g. check string against following pattern:
import re
pattern = re.compile("[A-Za-z0-9]+")
pattern.fullmatch(string)
Explanation:
[A-Za-z0-9] matches a character in the range of A-Z, a-z and 0-9, so letters and numbers.
+ means to match 1 or more of the preceding token.
The re.fullmatch() method checks whether the whole string matches the regular expression pattern. It returns a corresponding match object if it does, and None otherwise.
All together:
import re
if __name__ == '__main__':
    string = "YourString123"
    pattern = re.compile("[A-Za-z0-9]+")
    # if found match (entire string matches pattern)
    if pattern.fullmatch(string) is not None:
        print("Found match: " + string)
    else:
        # if not found match
        print("No match")
Just use str.isalnum()
>>> '123AbC'.isalnum()
True
>>> '1&A'.isalnum()
False
Referencing the docs:
Return true if all characters in the string are alphanumeric and there
is at least one character, false otherwise. A character c is alphanumeric
if one of the following returns True: c.isalpha(), c.isdecimal(),
c.isdigit(), or c.isnumeric().
If you would rather spell out the per-character check yourself, combine str.isnumeric() and str.isalpha(). Note that none of these methods accepts a decimal point, and all of them also accept non-ASCII letters and digits:
>>> all(c.isnumeric() or c.isalpha() for c in '123AbC')
True
>>> all(c.isnumeric() or c.isalpha() for c in '1&A')
False
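Since the question asks specifically for "a-z", "A-Z" and "0-9", it may be worth noting that str.isalnum() also accepts non-ASCII letters and digits. One possible way to restrict it, sketched here under the assumption that Python 3.7+ is available (for str.isascii), is to combine the two checks. The function name is hypothetical.

```python
def is_ascii_alnum(text):
    # isascii() rules out characters such as "é"; isalnum() rules out
    # punctuation, whitespace and the empty string.
    return text.isascii() and text.isalnum()

print("é1".isalnum())            # True: isalnum alone accepts non-ASCII letters
print(is_ascii_alnum("é1"))      # False
print(is_ascii_alnum("123AbC"))  # True
print(is_ascii_alnum("1&A"))     # False
```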
You must compare each letter of the incoming text separately.
import string
text = input("Enter: ")
correct = string.ascii_letters + string.digits
status = True
for char in text:
    if char not in correct:
        status = False
if status:
    print('Correct')
else:
    print('InCorrect')
You are testing whether the entire string is in the string 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.
So say the input is 'testInput'.
You are checking if 'testInput' is in the string 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', which it is not.
You need to check each character individually.
You can do this with a function for ease of use:
import string
def validCharacters(text):
    correct = string.ascii_letters + string.digits
    for character in text:
        if character not in correct:
            return False
    return True
text = input("Enter: ")
if validCharacters(text):
    print("Correct")
else:
    print("Incorrect")
We can remove all characters that aren't A-Z, a-z, or 0-9 then check the length of the remaining string. If it's greater than 0, there are characters that aren't the ones above:
import re
text = input("Enter: ")
result = re.sub("[A-Za-z0-9]", '', text)
if len(result) == 0:
    print("Success")
else:
    print("Failure")
Result:
abcDEF123 -> Success
hello! -> Failure
You can do it in many ways; I would use sets, as follows:
import string
correct = {char for char in string.ascii_letters + string.digits}
def is_correct(text):
    return {char for char in text}.issubset(correct)
print(is_correct('letters123')) # True
print(is_correct('???')) # False
print(is_correct('\n')) # False
I simply check whether the set of characters in the given text is a subset of the set of all legal characters. Keep in mind that this always processes the whole text, which is not ideal for performance (you could stop as soon as you find the first illegal character), but it should not be a problem unless you need to handle very long texts or have very little time.
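For completeness, the early-exit variant mentioned above could look roughly like this. It is only a sketch: ALLOWED and is_correct_early_exit are hypothetical names, and all() stops at the first character that fails the membership test.

```python
import string

# Set membership is O(1), so each character costs one lookup.
ALLOWED = frozenset(string.ascii_letters + string.digits)

def is_correct_early_exit(text):
    # all() short-circuits on the first illegal character.
    return all(char in ALLOWED for char in text)

print(is_correct_early_exit('letters123'))  # True
print(is_correct_early_exit('???'))         # False
print(is_correct_early_exit('\n'))          # False
```

Like the set-based version, this returns True for an empty string, so add a length check if empty input should be rejected.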
You can use a regex full match to check that the string contains only letters and digits:
import re
text = input("enter:")
regex = r"[0-9a-zA-Z]+"
# fullmatch succeeds only if the whole string fits the pattern
match = re.fullmatch(regex, text)
if match is not None:
    print("success")

Check special symbols in string endings

How can I check for special symbols such as !?,(). at the end of a word? For example, Hello??? or Hello,, or Hello! should return True, but H!??llo or Hel,lo should return False.
I know how to check only the last character of a string, but how do I check whether two or more trailing characters are symbols?
You may have to use regex for this.
import re
def checkword(word):
    m = re.match(r"\w+[!?,().]+$", word)
    if m is not None:
        return True
    return False
That regex is:
\w+ # one or more word characters (a-z, A-Z, 0-9 and _)
[!?,().]+ # one or more of the characters inside the brackets
# (this is called a character class)
$ # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.
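As a quick check against the examples from the question, here is a small sketch that uses re.fullmatch, which anchors both ends so neither ^ nor $ is needed. The names TRAILING_SYMBOLS and ends_with_symbols are hypothetical.

```python
import re

# One or more word characters followed by one or more trailing symbols.
TRAILING_SYMBOLS = re.compile(r"\w+[!?,().]+")

def ends_with_symbols(word):
    # fullmatch requires the pattern to cover the entire word.
    return TRAILING_SYMBOLS.fullmatch(word) is not None

for word in ["Hello???", "Hello,,", "Hello!", "H!??llo", "Hel,lo"]:
    print(word, ends_with_symbols(word))
```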
You can try something like this:
word = "Hello!"
def checkSym(word):
    return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
In case you want to find every symbol from the end of the string, you can use:
def symbolCount(word):
    i = len(word)-1
    c = 0
    # Stop at the start of the string so an all-symbol word cannot run past index 0.
    while i >= 0 and word[i] in "!?,().":
        c = c + 1
        i = i - 1
    return c
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3
If you want to get a count of the 'special' characters at the end of a given string.
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
    if c in special:
        count += 1
    else:
        break
print("Found {} special characters at the end of the string.".format(count))
You can use re.findall:
import re
s = "Hello???"
if re.findall(r'\W+$', s):
    pass
You could try this.
string="gffrwr."
print(string[-1] in "!?,().")

How do I check if a string contains ALL letters of the alphabet in python?

I am trying to write a python program that checks if a given string is a pangram - contains all letters of the alphabet.
Therefore, "We promptly judged antique ivory buckles for the next prize" should return True while any string that does not contain every letter of the alphabet at least once should return False.
I believe I should be using RegEx for this one, but I'm not sure how. It should look similar to this:
import sys
import re
input_string_array = sys.stdin.readlines()
input_string = input_string_array[0]
if re.search('string contains every letter of the alphabet', input_string):
    return True
else:
    return False
This is not something I'd solve with a regular expression, no. Create a set of the lowercased string and check if it is a superset of the letters of the alphabet:
import string
alphabet = set(string.ascii_lowercase)
def ispangram(input_string):
    return set(input_string.lower()) >= alphabet
Only if every letter of the alphabet is in the set created from the input text will it be a superset; by using a superset and not equality, you allow for punctuation, digits and whitespace, in addition to the (ASCII) letters.
Demo:
>>> import string
>>> alphabet = set(string.ascii_lowercase)
>>> input_string = 'We promptly judged antique ivory buckles for the next prize'
>>> set(input_string.lower()) >= alphabet
True
>>> set(input_string[:15].lower()) >= alphabet
False
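The same set arithmetic can also report which letters are absent, which may help when debugging a near-pangram. This is only a sketch; ALPHABET and missing_letters are hypothetical names.

```python
import string

ALPHABET = set(string.ascii_lowercase)

def missing_letters(text):
    # Set difference: letters of the alphabet that never occur in the text.
    return sorted(ALPHABET - set(text.lower()))

print(missing_letters("We promptly judged antique ivory buckles for the next prize"))  # []
print(missing_letters("The quick brown fox"))
```

An empty list means the text is a pangram.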
This is my solution in python:
alphabet = "abcdefghijklmnopqrstuvwxyz"
sentence = input()
sentence = sentence.lower()
missing = ''
for letter in alphabet:
    if letter not in sentence:
        missing = missing+letter
if (len(missing) != 0):
    print("missing", missing)
else:
    print("pangram")
You don't need regex. What you want can be done in two lines with good space efficiency.
ms = range(ord("a"), ord("z") + 1)  # code points for 'a' through 'z', inclusive
flag = all(chr(o) in string.lower() for o in ms)
That's it. string is the string you want to check. flag will be True if every letter appears in string (in either case) and False otherwise.
A pangram is a sentence that contains every letter of the alphabet at least once.
I have tried in this way:
def pangram():
    n = str(input('give me a word to check if it is a pangram:\n'))
    n = n.lower()
    n = n.replace(' ','')
    if not isinstance(n, str):
        return n, False
    elif set(n) >= set('abcdefghijklmnopqrstuvxywz'):
        return n, True
    else:
        return n, False
The function isinstance(n, str) checks if n is a string. The function set() gives us a set. For example set('penny') returns {'y', 'e', 'p', 'n'}... as you see it is a set without the repeated letters.
I was doing the same exercise today. Maybe it's not the best approach, but I think it's easy to understand.
def ispangram(s):
    stringy = ''
    flag = True
    validLetters = "abcdefghijklmnopqrstuvwxyz"
    #transform the given string in simple string with no symbols only letters
    for char in s.lower():
        if(char in validLetters):
            stringy += char
    #check if all the letters of the alphabet exist on the string
    for char in validLetters:
        if(char in stringy):
            pass
        else:
            flag = False
            break
    return flag
if(ispangram("We promptly judged antique ivory buckles for the next prize")):
    print("It's PANGRAM!")
else:
    print("It's not Pangram :(")
import string
def ispangram(str1, alphabet=string.ascii_lowercase):
    return ''.join(sorted(set(str1.lower().replace(" ","")))) == alphabet
This first converts all letters to lowercase and removes all spaces using replace. It then converts the result into a set to get the unique characters and uses sorted to order them alphabetically. Since sorted returns a list, join glues it back into a string without spaces, which is then compared with the lowercase alphabet. Note that this only works when the input contains nothing but letters and spaces.
Here is my solution:
import string
def isaPangrams(s):
    alph = list(string.ascii_lowercase)
    s = s.lower()
    s = list(s)
    for letter in alph:
        if letter not in s:
            print('not pangram')
            present='false'
            break
        if letter in s:
            present = 'true'
    if present == 'true':
        print('pangram')
if __name__ == '__main__':
    s = input()
    answer = isaPangrams(s)
