Find frequency for a specific word in a given string


I have written some Python code to find the frequency of a specific word in a string (the code below returns the frequency of the specified word). I am pretty new to Python and would like your help to see whether I could write this better and more efficiently. As a beginner, I have the feeling my code is unnecessarily long and could be written much better; the only good thing is that it works, but I want to learn how to do it better. I also don't know whether my class WordCounter makes sense with its attributes.
class WordCounter:
    def __init__(self, word, frequency):
        self.word = word
        self.frequency = frequency
    # calculate_frequency_for_word should return the frequency of the specified word
    def frequency_specific_word(text: str, word: str) -> int:
        lookup_word = word #this contains the specified word to search for
        incoming_string = [word.lower() for word in text.split() if word.isalpha()]
        count = 0 #count is increased when the specified word is found in the string
        i=0 #this is used as counter for the index
        j=0 #the loop will run from j=0 till the length on the incoming_string
        length = len(incoming_string) #checking the length of incoming_string
        while j < length:
            j += 1
            if lookup_word in incoming_string[i]: #Specified word is found, add 1 to count
                count += 1
                incoming_string[i] = incoming_string[i + 1] #move to next word in incoming string
            else:
                incoming_string[i] #Specified word not found, do nothing
                #print("No," + lookup_word + " not found in List : " + incoming_string[i])
            i += 1
        return count
print("The word 'much' found " +str(WordCounter.frequency_specific_word("Your help is much appreciated, this code could be done much better I think, much much better", "much"))+" times in text\n")

You can try the list.count() method:
>>> s = "Your help is much appreciated, this code could be done much better I think, much much better"
>>> s.lower().split().count('much')
4
To eliminate punctuation, you can use the built-in re module:
>>> import re
>>> s = "Your help is much appreciated, this code could be done much better I think, much much better"
>>> re.findall(r'\b\w+\b', s.lower()).count('much')
4
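If you need the frequency of more than one word, a minimal sketch along the same lines could use collections.Counter from the standard library. The function name word_frequencies is made up for this example, and it assumes the same regex-based definition of a "word" as above:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count every word in text, ignoring case and punctuation."""
    # Same tokenisation as the re.findall example above
    words = re.findall(r"\b\w+\b", text.lower())
    return Counter(words)

s = "Your help is much appreciated, this code could be done much better I think, much much better"
freq = word_frequencies(s)
print(freq["much"])         # 4
print(freq.most_common(1))  # [('much', 4)]
```

A Counter returns 0 for words that never occur, so no special handling is needed for a missing word.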

Related

Number of words in text you can fully type using this keyboard

There is such a task with Leetcode. Everything works for me when I press RUN, but when I submit, it gives an error:
text = "a b c d e"
brokenLetters = "abcde"
Output : 1
Expected: 0
def canBeTypedWords(self, text, brokenLetters):
    for i in brokenLetters:
        cnt = 0
        text = text.split()
        s1 = text[0]
        s2 = text[1]
        if i in s1 and i in s2:
            return 0
        else:
            cnt += 1
        return cnt
Can you please help me see what I missed here?
Everything works except the case where the text consists of separate single letters.

So consider logically what you have to do, then write that algorithmically.
Logically, you have a list of words, a list of broken letters, and you need to return the count of words that have none of those broken letters in them.
"None of those broken letters in them" is the important bit -- if even one broken letter is in the word, it's no good.
def count_words(broken_letters, word_list) -> int:
    words = word_list.split()  # split on spaces
    broken_letters = set(broken_letters)  # we'll be doing membership checks
                                          # on this kind of a lot, so changing
                                          # it to a set is more performant
    count = 0
    for word in words:
        for letter in word:
            if letter in broken_letters:
                # this word doesn't work, so break out of the
                # "for letter in word" loop
                break
        else:
            # a for..else block is only entered if execution
            # falls off the bottom naturally, so in this case
            # the word works!
            count += 1
    return count
This can, of course, be written much more concisely and (one might argue) idiomatically, but it is less obvious to a novice how this code works. As an exercise for the reader: see if you can understand how this code works and how you might modify it if the exercise instead gave you all the letters that work rather than the letters that are broken.
def count_words(broken_letters, word_list) -> int:
    words = word_list.split()
    broken_letters = set(broken_letters)
    return sum(1 for word in words if all(lett not in broken_letters for lett in word))
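As a further sketch of the same idea, the per-letter check can be delegated to set.isdisjoint, which is true when a word shares no letter with the broken set. The function name count_typable_words and its argument order are assumptions made for this example, not part of the original task:

```python
def count_typable_words(text, broken_letters):
    """Count the words in text that contain none of the broken letters."""
    broken = set(broken_letters)
    # isdisjoint accepts any iterable, so each word can be passed directly
    return sum(1 for word in text.split() if broken.isdisjoint(word))

print(count_typable_words("a b c d e", "abcde"))  # 0
print(count_typable_words("hello world", "ad"))   # 1
```

The first call is the failing case from the question: every word contains a broken letter, so the count is 0.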

Reverse a specific word function

I'm having trouble with the following task:
Basically, I need to build a function that receives (sentence, word, occurrence),
searches for that word, and reverses it only at the given occurrence.
For example:
function("Dani likes bananas, Dani also likes apples", "lik", "2")
returns: "Dani likes bananas, Dani also kiles apples"
As you can see, the "word" is 'lik', and the second time it occurs it is reversed to 'kil'.
I wrote something, but it's too messy and this part still doesn't work for me:
def q2(sentence, word, occurrence):
    count = 0
    reSentence = ''
    reWord = ''
    for char in word:
        if sentence.find(word) == -1:
            print('could not find the word')
            break
        for letter in sentence:
            if char == letter:
                if word != reWord:
                    reWord += char
                    reSentence += letter
                    break
                elif word == reWord:
                    if count == int(occurrence):
                        reWord = word[::-1]
                        reSentence += reWord
                    elif count > int(occurrence):
                        print("no such occurrence")
                    else:
                        count += 1
            else:
                reSentence += letter
    print(reSentence)
sentence = 'Dani likes bananas, Dani also likes apples'
word = 'li'
occurrence = '2'
q2(sentence,word,occurrence)
The main problem right now is that after it breaks, it goes back to checking from the start of the sentence, so it will find the i in "Dani". I couldn't think of a way to make it continue from where it stopped.
I tried using enumerate but still had no idea how.

This will work for the given scenario
scentence = 'Dani likes bananas, Dani also likes apples'
word = 'lik'
st = word
occ = 2
lt = scentence.split(word)
op = ''
if (len(lt) > 1):
    for i,x in enumerate(lt[:-1]):
        if (i+1) == occ:
            word = ''.join(reversed(word))
        op = op + x + word
        word = st
print(op+lt[-1])
Please test it yourself for other scenarios.
The line for i,x in enumerate(lt[:-1]) loops over the list, excluding the last element. With enumerate we get the index of each element in i and its value in x. As the code loops, it re-joins the split pieces using the same word they were split on, but substitutes the reversed word at the position you asked for. The last element is excluded from the loop because the loop appends the word after each piece; if the whole list were included, there would be an extra word at the end. Hope that explains it.

Your approach shows that you've clearly thought about the problem and are using the means you know well enough to solve it. However, your code has a few too many issues to simply fix, for example:
you only check for occurrence of the word once you're inside the loop;
you loop over the entire sentence for each letter in the word;
you only compare a character at a time, and make some mistakes in keeping track of how much you've matched so far;
you pass a string '2', which you intend to use as a number 2.
All of that and other problems can be fixed, but you would do well to use what the language gives you. Your task breaks down into:
find the n-th occurrence of a substring in a string
replace it with another word where found and return the string
Note that you're not really looking for a 'word' per se, as your example shows you replacing only part of a word (i.e. 'lik') and a 'word' is commonly understood to mean a whole word between word boundaries.
def q2(sentence, word, occurrence):
    # the first bit: find the start of the n-th occurrence
    position = -1  # start at -1 so a match at index 0 is not skipped
    count = 0
    while count < occurrence:
        position = sentence.find(word, position+1)
        count += 1
        if position == -1:
            # stop right away, otherwise the next search would restart from the beginning
            print (f'Word "{word}" does not appear {occurrence} times in "{sentence}"')
            return None
    # and then using what was found for a result
    return sentence[0:position] + word[::-1] + sentence[position+len(word):]
print(q2('Dani likes bananas, Dani also likes apples','lik',2))
print(q2('Dani likes bananas, Dani also likes apples','nope',2))
A bit of explanation on that return statement:
sentence[0:position] gets sentence from the start 0 to the character just before position, this is called a 'slice'
word[::-1] gets word from start to end, but going in reverse (step -1). Leaving out the values in the slice implies 'from one end to the other'
sentence[position+len(word):] gets sentence from the position position + len(word), which is the character after the found word, until the end (no index, so taking everything).
All those combined is the result you need.
Note that the function returns None if it can't find the word the right number of times - that may not be what is needed in your case.
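If returning None is not what you want, one possible alternative is to raise an exception so the caller has to deal with the missing occurrence explicitly. This is only a sketch; the name reverse_nth_occurrence and the choice of ValueError are assumptions for illustration:

```python
def reverse_nth_occurrence(sentence, word, occurrence):
    """Reverse the occurrence-th appearance of word in sentence (1-based)."""
    position = -1
    for _ in range(occurrence):
        # continue searching just past the previous match
        position = sentence.find(word, position + 1)
        if position == -1:
            raise ValueError(f'"{word}" does not appear {occurrence} times')
    return sentence[:position] + word[::-1] + sentence[position + len(word):]

print(reverse_nth_occurrence('Dani likes bananas, Dani also likes apples', 'lik', 2))
# Dani likes bananas, Dani also kiles apples
```

Callers can then wrap the call in try/except ValueError instead of checking the result for None.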

import re
from itertools import islice
s = "Dani likes bananas, Dani also likes apples"
t = "lik"
n = 2
# re.escape so that characters such as "." in t are matched literally
x = re.finditer(re.escape(t), s)
try:
    # skip the first n - 1 matches and take the start of the n-th
    i = next(islice(x, n - 1, n)).start()
except StopIteration:
    i = -1
if i >= 0:
    y = s[i: i + len(t)][::-1]
    print(f"{s[:i]}{y}{s[i + len(t):]}")
else:
    print(s)
This finds the starting index of the n-th match (here the 2nd), if it exists, using a regex. It may require two passes over string s in the worst case: one to find the index and one to form the output. This can also be done in one pass using two pointers, but I'll leave that to you. From what I see, no one has yet offered a solution that does it in one pass.
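For comparison, here is a minimal sketch of a variant that builds the result in a single call. It does not use the two-pointer idea mentioned above; it assumes instead that re.sub with a counting callback is acceptable. The name reverse_nth is made up for this example:

```python
import re

def reverse_nth(s, t, n):
    """Return s with the n-th (1-based) occurrence of t reversed."""
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        # reverse only the n-th match, leave every other match unchanged
        return match.group(0)[::-1] if seen == n else match.group(0)

    # re.escape makes t match literally, even if it contains regex characters
    return re.sub(re.escape(t), repl, s)

print(reverse_nth("Dani likes bananas, Dani also likes apples", "lik", 2))
# Dani likes bananas, Dani also kiles apples
```

If t occurs fewer than n times, the string comes back unchanged, which matches the behaviour of the snippet above.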

index = find the index of the nth occurrence
Use slice notation to get the part you are interested in (you have its beginning and length)
Reverse it
Construct your result string:
result = sentence[:index] + reversed part + sentence[index+len(word):]

How to avoid Runtime error in this coding challenge?

I am completing this HackerRank coding challenge. Essentially, the challenge asks us to find all the substrings of the input string without mixing up the letters. Then, we count the number of substrings that start with a vowel and count the number of substrings that start with a consonant.
The coding challenge is structured as a game where Stuart's score is the number of consonant starting substrings and Kevin's score is the number of vowel starting substrings. The program outputs the winner, i.e. the one with the most substrings.
For example, I created the following code:
def constwordfinder(word):
    word = word.lower()
    return_lst = []
    for indx in range(1,len(word)+1):
        if word[indx-1:indx] not in ['a','e','i','o','u']:
            itr = indx
            while itr < len(word)+1:
                return_lst.append(word[indx-1:itr])
                itr +=1
    return return_lst
def vowelwordfinder(word):
    word = word.lower()
    return_lst = []
    for indx in range(1,len(word)+1):
        if word[indx-1:indx] in ['a','e','i','o','u']:
            itr = indx
            while itr < len(word)+1:
                return_lst.append(word[indx-1:itr])
                itr +=1
    return return_lst
def game_scorer(const_list, vow_list):
    if len(const_list) == len(vow_list):
        return 'Draw'
    else:
        if len(const_list) > len(vow_list):
            return 'Stuart ' + str(len(const_list))
        else:
            return 'Kevin ' + str(len(vow_list))
input_str = input()
print(game_scorer(constwordfinder(input_str), vowelwordfinder(input_str)))
This worked for smaller strings like BANANA, although when HackerRank started inputting strings like the following, I got multiple Runtime errors on the test cases:
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I tried structuring the program to be a bit more concise, although I still got Runtime errors on the longer test cases:
def wordfinder(word):
    word = word.lower()
    return_lst = []
    for indx in range(1,len(word)+1):
        itr = indx
        while itr < len(word)+1:
            return_lst.append(word[indx-1:itr])
            itr +=1
    return return_lst
def game_scorer2(word_list):
    kevin_score = 0
    stuart_score = 0
    for word in word_list:
        if word[0:1] not in ['a','e','i','o','u']:
            stuart_score += 1
        else:
            kevin_score +=1
    if stuart_score == kevin_score:
        return 'Draw'
    else:
        if stuart_score > kevin_score:
            return 'Stuart ' + str(stuart_score)
        else:
            return 'Kevin ' + str(kevin_score)
print(game_scorer2(wordfinder(input())))
What else exactly should I be doing to structure my program to avoid Runtime errors like before?

Here's a quick and dirty partial solution based on my hints:
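input_str = input()  # Python 3; the original answer used Python 2's raw_input()
kevin = 0
for i, c in enumerate(input_str):
    if c.lower() in "aeiou":
        # every substring starting at this vowel counts for Kevin
        kevin += len(input_str) - i
print(kevin)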
Basically, iterate over each character, and if it is in the set of vowels, Kevin's score increases by the number of remaining characters in the string.
The remaining work should be rather obvious, I hope!
[stolen from the spoilers section of the site in question]
This works because, for each consonant, you can make n substrings beginning with that consonant, where n is the number of characters from that consonant to the end of the string. In the BANANA example, look at the first B. With that B you can make: B, BA, BAN, BANA, BANAN, BANANA. That's six substrings starting with that B, or length(string) - indexof(character), which means that B adds 6 to the score. So you go through the string, looking for each consonant, and add length(string) - index to the score.
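Putting the counting rule together for both players, a minimal sketch might look like this. The function name minion_game is an assumption, and the return strings follow the 'Stuart N' / 'Kevin N' / 'Draw' format used in the question's own code:

```python
def minion_game(string):
    """Score the game in one pass, without building any substrings."""
    vowels = set("aeiou")
    n = len(string)
    kevin = stuart = 0
    for i, c in enumerate(string.lower()):
        # a character at index i starts exactly n - i substrings
        if c in vowels:
            kevin += n - i
        else:
            stuart += n - i
    if stuart == kevin:
        return "Draw"
    if stuart > kevin:
        return f"Stuart {stuart}"
    return f"Kevin {kevin}"

print(minion_game("BANANA"))  # Stuart 12
```

This does a constant amount of work per character and stores only two counters, so long inputs like the ones above are not a problem.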

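The problem here is your algorithm. You are building every substring of the text. A string of length n has n(n+1)/2 substrings, so storing them all takes a quadratic number of list entries and roughly cubic time and memory overall, because each substring is copied. That is most likely why you get runtime errors on the long inputs. You need a better algorithm that counts the substrings without constructing them.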

Scrambling strings with random.shuffle in Python 3.x

Hello, I'm new to programming and have been teaching myself Python through various means for the past year. I have been lurking here for a while and have gotten a lot of help, but now I have a specific problem:
I'm working on a word scramble game and I ran into a snag.
Python 3.x code
import random
def scrambler(word):
    word_to_scramble = list(word)
    random.shuffle(word_to_scramble)
    new_word = ''.join(word_to_scramble)
    return new_word
Now the code above works as desired, but occasionally it will return the original word I fed into the scrambler function.
My question is: is there a way to ensure that the string returned is always different from the one given?
I attempted to use the scrambler function inside itself with a while loop, which would crash the script and give me an error stating the recursion limit had been reached.
Any help would be greatly appreciated.

If you're hitting the recursion limit, then it's likely that you're trying to scramble a word that has no other arrangement, such as a 1-character word. However, it's quite easy to remove the recursion in this problem:
import random
def _scrambler(word):
    word_to_scramble = list(word)
    random.shuffle(word_to_scramble)
    new_word = ''.join(word_to_scramble)
    return new_word
def scrambler(word):
    new_word = _scrambler(word)
    # only retry when a different arrangement exists (at least two distinct letters)
    while new_word == word and len(set(word)) > 1:
        new_word = _scrambler(word)
    return new_word
I've added a len(set(word)) > 1 check, so this will eventually give you a different word than what you put in whenever it is possible to do so. A word made of a single repeated letter (such as 'aa') has no other arrangement and is returned unchanged.

I hope you enjoy programming, Chris.
Well, if you test it like this...
from random import shuffle
word='test'
def scrambler(word):
    word_to_scramble = list(word)
    shuffle(word_to_scramble)
    new_word = ''.join(word_to_scramble)
    return new_word
print(scrambler(word))
you will get the word 'test' back within roughly 25 tries, as I did (I can't be sure of the number, because it's not really random).
So you can just put an if statement in your function: if new_word is the same as word, call the function again. It doesn't need a loop; it works like a recursive function.
Something like this...
from random import shuffle
word = 'test'
def scrambler(word):
    word_to_scramble = list(word)
    shuffle(word_to_scramble)
    new_word = ''.join(word_to_scramble)
    # print(new_word)  # uncomment if you want to check what is going on
    if new_word == word:
        # try again and return the result of the retry;
        # note: this hits the recursion limit if no other arrangement exists (e.g. a 1-letter word)
        return scrambler(word)
    return new_word
print(scrambler(word))

Creating a word scrambler but it won't work, need help as a beginner

Beginner Python coder here, so keep things simple, please.
I need the code below to swap two letters of each word without touching the first or last letter. Everything seems to work right up until the switcher() function.
from random import randint
def wordScramble(string):
    stringArray = string.split()
    for word in stringArray:
        if len(word) >= 4:
            letter = randint(1,len(word)-2)
            point = letter
            while point == letter:
                point = randint(1, len(word)-2)
            word = switcher(word,letter,point)
    ' '.join(stringArray)
    return stringArray
def switcher(word,letter,point):
    word = list(word)
    word[letter],word[point]=word[point],word[letter]
    return word
print(wordScramble("I can't wait to see how this turns itself out"))
The outcome is always:
I can't wait to see how this turns itself out

Since you are a beginner, I tried to change your code as little as possible. The main issue is that you expect changes to word to change the contents of your list stringArray, but reassigning word inside the loop does not touch the list. The comments mark the changes and the reasons.
from random import randint
def wordScramble(myString): # avoid name clashes with python modules
    stringArray = myString.split()
    for i, word in enumerate(stringArray): # keep the index so we can update the list
        if len(word) >= 4:
            letter = randint(1,len(word)-2)
            point = letter
            while point == letter:
                point = randint(1, len(word)-2)
            stringArray[i] = switcher(word,letter,point) # update the array
    return ' '.join(stringArray) # return the result of the join
def switcher(word,letter,point):
    word = list(word)
    word[letter],word[point]=word[point],word[letter]
    return ''.join(word) # return word back as a string
print(wordScramble("I can't wait to see how this turns itself out"))

Because there had to be a cleaner (and better documented) way to do this:
from random import sample
def wordScramble(sentence):
    # Split sentence into words; apply switcher to each; rejoin into a sentence
    return ' '.join([switcher(x) for x in sentence.split()])
def switcher(word):
    if len(word) <= 3: # Don't bother if not enough letters to scramble
        return word
    # Pick 2 positions from interior of word (range instead of Python 2's xrange)
    a,b = sorted(sample( range(1,len(word)-1), 2 ))
    # Re-assemble word with our 2 positions swapped using bits before, between & after them
    return word[:a] + word[b] + word[a+1:b] + word[a] + word[b+1:]
print(wordScramble("I can't wait to see how this turns itself out"))
