Checking Sentence Case in String - python

I wonder how to check for sentence case in a string. I know about the isupper() method, which checks whether all cased characters are uppercase. But what about sentence case? For example:
def check_string(string):
    <...>
    return result

>>> check_string('This is a proper sentence case.')
True
>>> check_string('THIS IS UPPER CASE')
False
>>> check_string('This string is Clean')
False

A quick trick is to call capitalize() and check whether the result equals the original input:
def check_string(my_string):
    # capitalize() upper-cases the first character and lower-cases the rest
    return my_string.capitalize() == my_string
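A small sketch to see what the comparison actually does: it prints capitalize() for the question's three examples plus two made-up inputs (the last two are hypothetical, added only to show edge cases). It suggests two caveats: a capital letter later in the sentence (the pronoun "I", a proper noun) makes the check fail, and a string whose first character is not a letter passes because there is nothing to upper-case.

```python
# Print what capitalize() produces for each input; the trick returns True
# exactly when the two sides are equal.
samples = [
    'This is a proper sentence case.',  # unchanged -> True
    'THIS IS UPPER CASE',               # rest gets lower-cased -> False
    'This string is Clean',             # 'Clean' -> 'clean' -> False
    'Then I left.',                     # hypothetical: pronoun 'I' -> 'i' -> False
    '123 starts with a digit',          # hypothetical: nothing to upper-case -> True
]
for s in samples:
    print(repr(s), '->', repr(s.capitalize()), s.capitalize() == s)
```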

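If the string may contain several sentences, split it on periods and check each sentence separately:
def check_string(string):
    sentences = string.split(".")
    for sentence in sentences:
        sentence = sentence.strip()
        # Each non-empty sentence must start uppercase and contain no other uppercase letters
        if sentence and (not sentence[0].isupper() or any(ch.isupper() for ch in sentence[1:])):
            return False
    return True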
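The answer above only splits on ".". If sentences can also end with "!" or "?", a minimal variant, assuming those three characters are the only sentence terminators, could split with re.split instead:

```python
import re


def check_string(string: str) -> bool:
    # Split on ., ! or ? (assumed to be the only sentence terminators).
    for sentence in re.split(r'[.!?]', string):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip the empty piece left after a trailing terminator
        # First character must be uppercase, the rest must contain no uppercase.
        if not sentence[0].isupper() or any(ch.isupper() for ch in sentence[1:]):
            return False
    return True


print(check_string('Hello there! How are you? Fine.'))
print(check_string('Hello there! how are you?'))
```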

Related

Assert only if an entire word is filled out

string1 = "Billie Jean"
string2 = " "

def check(string1, string2):
    teststring = string2.split(" ")
    for word in teststring:
        if word in string1:
            return True
    return False
How can I make it return True when string2 is, for example, "Baby Jean", but False when it is "ea"?
I believe the fix is to split string1 into a list of words before doing the in check, as follows:
string1 = "Billie Jean"
string2 = " "
string3 = 'Baby Jean'
words1 = string1.split()

def check(string):
    teststring = string.split()
    for word in teststring:
        # Notice that we check membership in a list of words instead of in string1
        if word in words1:
            return True
    return False

print(check(string2))
print(check(string3))
Another option is the set.intersection method, which returns the elements the set shares with another iterable; a non-empty result is truthy, so bool() turns it into True or False:
string1 = "Billie Jean"
string2 = " "
string3 = 'Baby Jean'
words1 = set(string1.split())

def check(string):
    # An empty intersection means no whole word is shared
    return bool(words1.intersection(string.split()))

print(check(string2))
print(check(string3))
Outputs:
False
True
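A couple of pointers:
split() with no argument splits on any run of whitespace and discards empty strings, so you don't need to pass " " explicitly (and usually shouldn't: split(" ") keeps empty strings).
Since you are checking whether a whole word appears in another string, you must split string1 too.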
Check the code below:
def func(string1, string2):
    for word in string2.split():
        if word in string1.split():
            return True
    return False

print(func("Billie Jean", "Baby Jean"))  # True
print(func("Billie Jean", "ea"))  # False
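To see why the pointers above matter, here is a quick sketch with the question's own values. It shows that split(" ") turns " " into two empty strings, and since the empty string is contained in every string, the original loop returns True immediately; split() with no argument drops those empty pieces.

```python
string1 = "Billie Jean"
string2 = " "

# split(" ") keeps the empty strings on either side of the separator...
print(string2.split(" "))          # ['', '']
# ...and the empty string is a substring of every string,
print('' in string1)               # True
# whereas split() with no argument drops empty pieces entirely.
print(string2.split())             # []
print("  Baby   Jean ".split())    # ['Baby', 'Jean']
```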
A more interesting solution would be the following:
def func(string1, string2):
    # Create a set of words for each string and check whether the intersection is empty
    return bool(set(string1.split()) & set(string2.split()))

print(func("Billie Jean", "Baby Jean"))
print(func("Billie Jean", "ea"))
Since the set intersection relies on hashing (each membership test is O(1) on average instead of a linear scan of a list), this method will be much faster at scale. (It might not matter to you now, but it is good to know.)
P.S. You might want to convert both strings to the same case (lower or upper) if you want "Jean" to match "jean".
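Following up on the P.S., a minimal case-insensitive sketch of the set-based func. It uses str.casefold(), which is like lower() but intended for caseless matching; punctuation attached to words (e.g. "Jean!") is still not stripped, so that is left as an assumption you may need to handle separately.

```python
def func(string1: str, string2: str) -> bool:
    # Normalise case before splitting so "Jean" and "jean" compare equal.
    words1 = set(string1.casefold().split())
    words2 = set(string2.casefold().split())
    # A non-empty intersection means at least one whole word is shared.
    return bool(words1 & words2)


print(func("Billie Jean", "baby jean"))  # True
print(func("Billie Jean", "ea"))         # False
```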

How to check if the substring is present in the parent string in same order

Input
Indian
3
nda
dan
ndani
Output
True
True
False
Explanation
The 1st line is the parent string.
The 2nd line is the number of test cases, n.
The next n lines are the queries.
The characters of the first and second queries appear in the same order in the parent string; those of the third query do not.
For each query, initialize a pointer at the beginning of the query string, then loop through the parent string and advance the pointer only when the current character matches the query character it points to:
parent = 'Indian'
query = 'nda'
start = 0
for x in parent:
    if x == query[start]:
        start += 1
        if start == len(query):
            print(True)
            break
else:
    # The loop finished without a break, so not every query character was matched
    print(False)
You can do this for each query.
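The snippet above handles a single query. A sketch that wraps it in a function and applies it to the input format described in the question (parent string, then n, then n queries) might look like this; the names is_in_order and solve are hypothetical, and a real submission would presumably read the lines with sys.stdin.read().splitlines() instead of the hard-coded sample:

```python
from typing import List


def is_in_order(query: str, parent: str) -> bool:
    # Two-pointer scan: pos only advances when the next query character is found.
    pos = 0
    for ch in parent:
        if pos < len(query) and ch == query[pos]:
            pos += 1
    return pos == len(query)


def solve(lines: List[str]) -> List[bool]:
    # lines[0] is the parent string, lines[1] the number of queries n.
    parent = lines[0].strip()
    n = int(lines[1])
    return [is_in_order(q.strip(), parent) for q in lines[2:2 + n]]


sample = """Indian
3
nda
dan
ndani"""

for answer in solve(sample.splitlines()):
    print(answer)
```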
You can also build a regular expression from the query. For example, to check whether 'nda' appears in order in 'Indian' (the first test case), form the regular expression n.*d.*a and search for it in 'Indian':
import re

string = 'Indian'
substrings = [
    'nda',
    'dan',
    'ndani'
]
for substring in substrings:
    # re.escape makes each character match literally: 'nda' -> 'n.*d.*a'
    rex = '.*'.join(re.escape(ch) for ch in substring)
    print('True' if re.search(rex, string) else 'False')
Prints:
True
True
False
You just need to check each character of the word against the rest of the main string, starting just after the position where the previous character matched.
You can try this:
main_str = 'Indian'
words = ['nda', 'dan', 'ndani']
for word in words:
    checker = []
    pos = 0  # search starts just after the previous match
    for ch in word:
        idx = main_str.find(ch, pos)
        if idx == -1:
            checker.append('False')
            break
        checker.append('True')
        pos = idx + 1
    if 'False' in checker:
        print('False')
    else:
        print('True')
It's not very elegant, but it gets the job done. You can modify the code to fit your input.
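This is a very simple program, but note that the in operator only checks for a contiguous substring, so it answers a stricter question than the one asked: 'nda' and 'dan' are both reported as False even though their letters appear in order in 'Indian'.
string = 'Indian'
words = ['nda', 'dan', 'ndani']
for sub in words:
    print(sub in string)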
This is my approach: use iter.
def isSubsequence(sub: str, orig: str) -> bool:
    it = iter(orig)
    # 'ch in it' consumes the iterator up to and including the match,
    # so each character is searched for only after the previous one
    return all(ch in it for ch in sub)

if __name__ == '__main__':
    sub = 'rice'
    orig = 'practice'
    assert isSubsequence(sub, orig)
    assert isSubsequence('pace', orig)
    assert not isSubsequence('acts', orig)
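The iter() trick relies on how the in operator behaves on an iterator: it consumes items until it finds a match, so a later search resumes just after the previous match. A small demonstration:

```python
it = iter('practice')

print('r' in it)   # True: consumes 'p' and 'r'
print(next(it))    # a: the iterator resumes right after the 'r'
print('p' in it)   # False: the only 'p' was already consumed
```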

Efficiently Detecting Kangaroo Words

This is a bit of a puzzler, and I'd like to find an ideal algorithm for this type of question. The problem is about kangaroo words: words that contain all the letters of another word, in the same order. It's easiest to explain with an example. Take the word courage (which seems to be the usual example floating around online for this type of question):
courage
courage -> core
courage -> cog
Here is working code to detect the lines above:
def areAllCharsInWordInOrder(word, lookup_word):
    curr_idx = 0
    are_all_letters_consecutive = True
    for individual_char in lookup_word:
        try:
            new_idx = word.index(individual_char, curr_idx)
        except ValueError:
            # str.index raises ValueError when the character is not found
            return False
        if new_idx - curr_idx != 1:
            are_all_letters_consecutive = False
        curr_idx = new_idx
    if are_all_letters_consecutive:
        return False
    return True
However, the caveat is that the letters must not all be consecutive. So devil and evil are not kangaroo words, because evil appears in devil as one contiguous run. However, devilishly and evil would be, because the letters of evil can also be picked out of devilishly non-consecutively (for example, using its second l).
The nuance is that now, I believe, we have to explore every possible matching index to see whether it leads to a valid path. Is this true? Is there a more optimal algorithm? This was my cleanest attempt (lightly tested):
def findAllIndexes(char, curr_idx, word):
    return [i for i, ltr in enumerate(word) if ltr == char and i > curr_idx]

def kangarooHelper(lookup_word, lookup_idx, curr_idx, are_all_letters_consecutive, word):
    if lookup_idx >= len(lookup_word):
        # We're done: we've iterated through the whole lookup word
        return not are_all_letters_consecutive
    new_indices = findAllIndexes(lookup_word[lookup_idx], curr_idx, word)
    if len(new_indices) == 0:
        return False
    return any(
        kangarooHelper(lookup_word, lookup_idx + 1, new_idx,
                       (new_idx - curr_idx == 1) and are_all_letters_consecutive, word)
        for new_idx in new_indices
    )

def areAllCharsInWordInOrderFixed(word, lookup_word):
    # Should return False if the letters only appear consecutively
    are_all_letters_consecutive = True
    lookup_idx = 0
    if len(lookup_word) == 0:
        return True
    try:
        curr_idx = word.index(lookup_word[lookup_idx], 0)
    except ValueError:
        return False
    return kangarooHelper(lookup_word, lookup_idx + 1, curr_idx, are_all_letters_consecutive, word)
if __name__ == '__main__':
    print(areAllCharsInWordInOrderFixed('encourage', 'urge'))  # True
    print(areAllCharsInWordInOrderFixed('devil', 'evil'))  # False
    print(areAllCharsInWordInOrderFixed('devilishly', 'evil'))  # True
    print(areAllCharsInWordInOrderFixed('encourage', 'nrage'))  # True
    print(areAllCharsInWordInOrderFixed('encourage', 'rage'))  # False
Again, it's only been lightly tested, but I'd love to clean up both the algorithm and the code.
Thanks! Any advice and suggestions would be greatly appreciated.
Here's a regex-based approach to the problem. We form a regex from lookup_word by putting .* between each pair of letters, then search for it in word. Since .* is greedy, the match that starts at the leftmost possible position extends as far right as possible, so you get the longest possible match inside word. You can then compare the length of the matched string to the length of lookup_word: if the match is longer, the letters can be picked non-consecutively, so word is a kangaroo word for lookup_word:
import re

def areAllCharsInWordInOrderFixed(word, lookup_word):
    # e.g. 'urge' -> 'u.*r.*g.*e'; re.escape guards against regex metacharacters
    regex = '.*'.join(map(re.escape, lookup_word))
    match = re.search(regex, word)
    return match is not None and len(match.group()) > len(lookup_word)

print(areAllCharsInWordInOrderFixed('encourage', 'urge'))  # True
print(areAllCharsInWordInOrderFixed('devil', 'evil'))  # False
print(areAllCharsInWordInOrderFixed('devilishly', 'evil'))  # True
print(areAllCharsInWordInOrderFixed('encourage', 'nrage'))  # True
print(areAllCharsInWordInOrderFixed('encourage', 'rage'))  # False
Output:
True
False
True
True
False
Alternatively, you can take an iterative approach. Two conditions must be true for the input to be a kangaroo word:
the letters of the lookup word must be present in the word, in order;
there must be at least one extra letter between the letters of the lookup word.
The first condition can be tested by checking each letter in turn to see that it occurs after the previous letter in the word. The second condition can be checked by testing that the last occurrence of the last letter is at least len(lookup_word) positions after the first occurrence of the first letter (a contiguous match spans exactly len(lookup_word) - 1 positions). For example:
def areAllCharsInWordInOrderFixed(word, lookup_word):
    # Earliest occurrence of the first letter
    first = start = word.find(lookup_word[0])
    if first == -1:
        return False
    # Earliest in-order occurrence of each middle letter
    for c in lookup_word[1:-1]:
        start = word.find(c, start + 1)
        if start == -1:
            return False
    # Latest occurrence of the last letter after the middle letters
    end = word.rfind(lookup_word[-1], start + 1)
    # No need to check for end == -1, as the next test will fail if it is
    return end - first >= len(lookup_word)
The results are the same as the regex version.
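To gain some confidence that the fast versions agree with the exhaustive search the question describes, here is a sketch of a brute-force reference that tries every increasing choice of positions with itertools.combinations. It is exponential in the word length, so it is only meant for testing; both function names are hypothetical.

```python
import re
from itertools import combinations


def is_kangaroo_bruteforce(word: str, lookup_word: str) -> bool:
    # Exhaustive reference: try every increasing tuple of positions in word.
    n = len(lookup_word)
    if n == 0:
        return False
    for idxs in combinations(range(len(word)), n):
        if all(word[i] == c for i, c in zip(idxs, lookup_word)):
            # A contiguous run spans exactly n - 1; anything wider has a gap.
            if idxs[-1] - idxs[0] >= n:
                return True
    return False


def is_kangaroo_regex(word: str, lookup_word: str) -> bool:
    # Same idea as the regex answer: greedy .* gives the widest possible match.
    match = re.search('.*'.join(map(re.escape, lookup_word)), word)
    return match is not None and len(match.group()) > len(lookup_word)


cases = [('encourage', 'urge'), ('devil', 'evil'), ('devilishly', 'evil'),
         ('encourage', 'nrage'), ('encourage', 'rage')]
for word, lookup in cases:
    fast, slow = is_kangaroo_regex(word, lookup), is_kangaroo_bruteforce(word, lookup)
    assert fast == slow, (word, lookup)
    print(word, lookup, fast)
```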

Exact match of a string variable in another string in python [duplicate]

I'm trying to determine whether a substring is in a string.
The issue I'm running into is that I don't want my function to return True if the substring is found within another word in the string.
For example, if the substring is "Purple cow"
and the string is "Purple cows make the best pets.",
this should return False, since "cow" isn't plural in the substring.
But if the substring is "Purple cow"
and the string is "Your purple cow trampled my hedge!",
it should return True.
My code looks something like this:
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    return phrase in text

text = "Purple cows make the best pets!"
phrase = "Purple cow"
print(is_phrase_in(phrase, text))
In my actual code I clean up unnecessary punctuation and spaces in 'text' before comparing it to phrase, but otherwise this is the same.
I've tried using re.search, but I don't understand regular expressions very well yet and have only gotten the same functionality from them as in my example.
Thanks for any help you can provide!
Since your phrase can have multiple words, doing a simple split and intersect won't work. I'd go with regex for this one:
import re

def is_phrase_in(phrase, text):
    # \b anchors the phrase at word boundaries; re.escape treats the phrase as literal text
    return re.search(r"\b{}\b".format(re.escape(phrase)), text, re.IGNORECASE) is not None

phrase = "Purple cow"
print(is_phrase_in(phrase, "Purple cows make the best pets!"))  # False
print(is_phrase_in(phrase, "Your purple cow trampled my hedge!"))  # True
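One limitation of \b: it only marks a boundary between a word character and a non-word character, so a phrase that ends with punctuation (say "C++") can never be followed by \b when the next character is a space. A sketch using lookarounds instead, assuming "word character" in the re sense (letters, digits, underscore) is the right notion of a word:

```python
import re


def is_phrase_in(phrase: str, text: str) -> bool:
    # (?<!\w) / (?!\w): no word character immediately before / after the phrase.
    # Unlike \b, this still works when the phrase starts or ends with punctuation.
    pattern = r'(?<!\w)' + re.escape(phrase) + r'(?!\w)'
    return re.search(pattern, text, re.IGNORECASE) is not None


print(is_phrase_in("Purple cow", "Purple cows make the best pets!"))     # False
print(is_phrase_in("Purple cow", "Your purple cow trampled my hedge!"))  # True
print(is_phrase_in("C++", "I write C++ daily"))                          # True
```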
Using PyParsing:
import pyparsing as pp

def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    rule = pp.ZeroOrMore(pp.Keyword(phrase))
    for t, s, e in rule.scanString(text):
        if t:
            return True
    return False

text = "Your purple cow trampled my hedge!"
phrase = "Purple cow"
print(is_phrase_in(phrase, text))
Which yields:
True
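One can do this very literally with a loop, checking every start position and requiring that no letter or digit touches the match on either side:
def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    for i in range(len(text) - len(phrase) + 1):
        if text[i:i + len(phrase)] != phrase:
            continue
        end = i + len(phrase)
        # The characters just before and just after the match must not be letters or digits
        before_ok = i == 0 or not text[i - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            return True
    return False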
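Or by splitting into words and looking for the phrase's words as a consecutive run:
def is_phrase_in(phrase, text):
    phrase_words = phrase.lower().split()
    text_words = text.lower().split()
    n = len(phrase_words)
    # Compare every window of n consecutive words with the phrase
    return any(text_words[i:i + n] == phrase_words
               for i in range(len(text_words) - n + 1))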
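or using regular expressions:
import re

def is_phrase_in(phrase, text):
    # The phrase must be preceded and followed by a non-word character (or the start/end of the text)
    pattern = re.compile(r"(?:^|\W)" + re.escape(phrase.lower()) + r"(?:\W|$)")
    return pattern.search(text.lower()) is not None
This says that no word character may directly precede or follow the phrase, but whitespace or punctuation is okay.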
Regular expressions should do the trick:
import re

def is_phrase_in(phrase, text):
    phrase = phrase.lower()
    text = text.lower()
    # re.escape makes the phrase match literally; \b marks word boundaries
    found = bool(re.findall(r'\b' + re.escape(phrase) + r'\b', text))
    return found
Here you go, hope this helps:
# Declarations
string = "My name is Ramesh and I am cool. You are Ram ?"
sub = "Ram"
# Check whether the string contains the substring
result = sub in string
# Condition check
if result:
    # Find the starting position
    start_position = string.index(sub)
    # Get the substring length
    length = len(sub)
    # Slice out the matched text
    output = string[start_position:start_position + length]
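Note that sub in string and string.index(sub) also find "Ram" inside "Ramesh" (at index 11), which is exactly what the question wants to avoid. A minimal sketch of the same idea with a word-boundary regex, which skips "Ramesh" and finds the standalone "Ram":

```python
import re

string = "My name is Ramesh and I am cool. You are Ram ?"
sub = "Ram"

# \b on both sides: "Ram" must not be glued to other word characters,
# so the "Ram" at the start of "Ramesh" is skipped.
match = re.search(r'\b' + re.escape(sub) + r'\b', string)
if match:
    print(match.start(), match.group())
```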

python - regex: we are looking for the input of a function

import re

def step_through_with(s):
    pattern = re.compile(s + ',')
    if pattern == True:
        return True
    else:
        return False
The task is to find a word (the function's input parameter) in a sentence. What should the syntax look like?
If you want to find a word in a sentence, you have to take word boundaries into account (so that searching for 'fun' won't match 'function', for instance).
An example:
import re

def step_through_with(sentence, word):
    # \b on both sides, so 'fun' does not match inside 'function'
    pattern = r'\b{}\b'.format(re.escape(word))
    if re.search(pattern, sentence):
        return True
    return False

sentence = 'we are looking for the input of a function'
print(step_through_with(sentence, 'input'))  # True
print(step_through_with(sentence, 'fun'))  # False
