Scanning for two word phrase in Python dictionary - python

I am trying to use a Python dictionary object to help translate an input string to other words or phrases. I am having success with translating single words from the input, but I can't seem to figure out how to translate multi-word phrases.
Example:
sentence = input("Please enter a sentence: ")
myDict = {"hello": "hi","mean adult":"grumpy elder", ...ect}
How can I return hi grumpy elder if the user enters hello mean adult for the input?

"fast car" is a key to the dictionary, so you can extract the value if you use the key coming back from it.
If you're taking the input straight from the user and using it to reference the dictionary, get is safer, as it allows you to provide a default value in case the key doesn't exist.
print(myDict.get(sentence, "Phrase not found"))
Since you've clarified your requirements a bit more, the hard part now is the splitting; the get doesn't change. If you can guarantee the order and structure of the sentences (that is, it's always going to be structured such that we have a phrase with 1 word followed by a phrase with 2 words), then split only on the first occurrence of a space character.
split_input = sentence.split(' ', 1)
print("{} {}".format(myDict.get(split_input[0]), myDict.get(split_input[1])))
More complex split requirements I leave as an exercise for the reader. A hint would be to use the keys of myDict to determine what valid tokens are present in the sentence.
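One way to follow that hint, as a minimal sketch: build a regular expression from the dictionary keys, longest first, and substitute every match. The function name translate_phrases is made up for this illustration. Matching is case-sensitive, and the sketch assumes every key starts and ends with a letter or digit so that the word-boundary markers behave as expected.

```python
import re

def translate_phrases(sentence, mapping):
    """Replace every dictionary key found in sentence with its value."""
    if not mapping:
        return sentence
    # Try longer keys first so a multi-word key wins over a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(0)], sentence)

myDict = {"hello": "hi", "mean adult": "grumpy elder"}
print(translate_phrases("hello mean adult", myDict))  # hi grumpy elder
```

Words that are not keys are left untouched, so no fixed sentence structure is required.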

The same way as you normally would.
translation = myDict['fast car']
A solution to your particular problem would be something like the following, where maxlen is the maximum number of words in a single phrase in the dictionary.
translation = []
words = sentence.split(' ')
maxlen = 3
index = 0
while index < len(words):
    # Try the longest candidate phrase first, then shorter ones
    for i in range(maxlen, 0, -1):
        phrase = ' '.join(words[index:index+i])
        if phrase in myDict:
            translation.append(myDict[phrase])
            index += i
            break
    else:
        # No phrase starting at this word matched: keep the word as it is
        translation.append(words[index])
        index += 1
print(' '.join(translation))
Given the sentence "hello this is a nice fast car" (and a dictionary that also maps "nice" to "sweet" and "fast car" to "quick ride"), it outputs "hi this is a sweet quick ride".

This will check for each word and also a two word phrase using the word before and after the current word to make the phrase:
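myDict = {"hello": "hi",
          "fast car": "quick ride"}
sentence = input("Please enter a sentence: ")
words = sentence.split()
for i, word in enumerate(words):
    # Single-word match
    if word in myDict:
        print(myDict.get(word))
        continue
    # Phrase made of the previous word and the current word
    if i:
        phrase = ' '.join([words[i-1], word])
        if phrase in myDict:
            print(myDict.get(phrase))
            continue
    # Phrase made of the current word and the next word
    # Note: a two-word key is found from both of its words, so it is printed twice
    if i < len(words)-1:
        phrase = ' '.join([word, words[i+1]])
        if phrase in myDict:
            print(myDict.get(phrase))
            continue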

Related

Python Question Relating to Finding Anagram from Dictionary

I am struggling with this project that I am working on.
Edit: I want the program to find 2 words from the dictionary that are the anagram of the input word(s). The way I wanted to approach this program is by using counter(input()) and then looping through the dictionary content twice (finding first word anagram then the next). The loop would take every word from the dictionary, counter(that word) and see if it is <= counter(input word). Once the program finds first anagram, it adds that word to candidate and proceeds to second loop to find the second word.
To put to simple words, if I input a word (or a phrase), I would like the program to run through a dictionary text file (which I have saved) and find two words from the dictionary that becomes anagram to my input. For instance, if I input "dormitory" the program output should be "dirty room" and if input "a gentleman", output "elegant man". Here is what I have done so far:
from pathlib import Path
from collections import Counter
my_dictionary = open(Path.home() / 'dictionary.txt')
my_words = my_dictionary.read().strip().split('\n')
my_dictionary.close()
letter_number = 0
my_word = []
print('Please type in your phrase:')
word = input()
word = word.replace(" ","")
word_map = Counter(word.lower())
for a_word in my_words:
    test = ''
    candidate = ''
    test_word = Counter(a_word.lower())
    for letter in test_word:
        if test_word[letter] <= word_map[letter]:
            test += letter
    if Counter(test) == test_word:
        candidate += a_word.lower()
        for a_word in my_words:
            test = ''
            test_word = Counter(a_word.lower())
            for letter in test_word:
                if test_word[letter] <= word_map[letter]:
                    test += letter
            if Counter(test) == test_word:
                candidate += a_word.lower()
            if Counter(candidate) == word_map:
                my_word.append(candidate)
print(my_word)
For some reason the program finds nothing: after I enter my input, I get no results.
I have also tried using del to remove the first word's letters from the counter before searching for the second word, but that didn't work either.
There must be a mistake somewhere in the code that prevents the program from producing any output.
Please help me find my mistake.
Thanks in advance.
Code can be optimized as follows:
# script.py
from pathlib import Path
from collections import Counter
filename = 'dictionary.txt'
my_words = Path.home().joinpath(filename).read_text().strip().splitlines()
word = input('Please type in your phrase:\n').replace(" ","")
word_counter = Counter(word.lower())
def parse(my_words=my_words):
    # Keep every word whose letters are all available in the input
    matches = []
    for a_word in my_words:
        a_word_counter = Counter(a_word.lower())
        if all(c <= word_counter[w] for w, c in a_word_counter.items()):
            matches.append(a_word)
    return matches
def exactly_parse(my_words=my_words):
    # Keep only words that use exactly the letters of the input
    return [w for w in my_words if Counter(w.lower()) == word_counter]
my_word = parse()
print(my_word)
Let's say content of dictionary.txt:
$ cat dictionary.txt
how
are
you
fine
thanks
Say the input word is how. The expected output is how:
$ python script.py
Please type in your phrase:
how
['how']
$ python script.py
Please type in your phrase:
thanksyou
['you', 'thanks']
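Note that parse only lists the single words that fit inside the input; it does not pair them up. A rough sketch of the two-word search the question asks for might look like the following. The function name two_word_anagrams and the small in-memory word list are assumptions for illustration and are not part of the original answer.

```python
from collections import Counter, defaultdict

def two_word_anagrams(phrase, words):
    """Return sorted (word, word) pairs that together use exactly the letters of phrase."""
    target = Counter(phrase.replace(" ", "").lower())
    # Index the word list by sorted letters so the second word is a direct lookup.
    by_letters = defaultdict(list)
    for w in words:
        by_letters["".join(sorted(w.lower()))].append(w.lower())
    pairs = set()
    for w in words:
        first = w.lower()
        first_count = Counter(first)
        # Skip words that need a letter more often than the phrase provides it.
        if any(n > target[ch] for ch, n in first_count.items()):
            continue
        remainder = target - first_count
        key = "".join(sorted(remainder.elements()))
        if not key:
            continue  # the first word already used every letter
        for second in by_letters.get(key, []):
            pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)

sample_words = ["dirty", "room", "elegant", "man", "how"]
print(two_word_anagrams("dormitory", sample_words))    # [('dirty', 'room')]
print(two_word_anagrams("a gentleman", sample_words))  # [('elegant', 'man')]
```

Indexing by sorted letters avoids a second full pass over the word list for every candidate first word.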

Python program for word count, average word length, word frequency and frequency of words starting with letters of the alphabet

Need to write a Python program that analyzes a file and counts:
The number of words
The average length of a word
How many times each word occurs
How many words start with each letter of the alphabet
I've got the code to do the first 2 things:
with open(input('Please enter the full name of the file: '),'r') as f:
    w = [len(word) for line in f for word in line.rstrip().split(" ")]
total_w = len(w)
avg_w = sum(w)/total_w
print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)
But I'm not sure on how to do the others. Any help is appreciated.
Btw, when I say "How many words start with each letter of the alphabet" I mean how many words start with "A", how many start with "B", how many start with "C", etc all the way through to "Z".
There are many ways to achieve this. A more advanced approach would first gather the text and its words, then work on the data with ML/DS tools, from which you could extract more statistics (things like "a new paragraph mostly starts with X words" or "X words are mostly preceded/succeeded by Y words").
If you just need very basic statistics you can gather them while iterating over each word and do the calculations at the end of it, like:
stats = {
    'amount': 0,
    'length': 0,
    'word_count': {},
    'initial_count': {}
}
with open('lorem.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        for word in line.split():
            word = word.lower()
            initial = word[0]
            # Add word and length count
            stats['amount'] += 1
            stats['length'] += len(word)
            # Add initial count
            if initial not in stats['initial_count']:
                stats['initial_count'][initial] = 0
            stats['initial_count'][initial] += 1
            # Add word count
            if word not in stats['word_count']:
                stats['word_count'][word] = 0
            stats['word_count'][word] += 1
# Calculate average word length
stats['average_length'] = stats['length'] / stats['amount']
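For comparison, the same counts can be collected with collections.Counter from the standard library. This is only a sketch: word_stats is a hypothetical helper, it takes the text as a string rather than reading a file, and unlike the loop above it strips surrounding punctuation from each word, which is an assumption about what should count as a word.

```python
from collections import Counter
import string

def word_stats(text):
    # Lowercase each token and strip punctuation from its ends (an assumption).
    words = [token.strip(string.punctuation).lower() for token in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    total = len(words)
    return {
        "total": total,
        "average_length": sum(len(w) for w in words) / total if total else 0.0,
        "word_count": Counter(words),
        "initial_count": Counter(w[0] for w in words),
    }

stats = word_stats("The cat and the dog. A cat!")
print(stats["total"])               # 7
print(stats["word_count"]["cat"])   # 2
print(stats["initial_count"]["t"])  # 2
```

A Counter returns 0 for a key it has never seen, so the "if not in" checks are no longer needed.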
Interesting challenge you were given. I wrote a proposal for question 3, how many times a word occurs in the text. This code is not optimal at all, but it does work.
I used the file text.txt.
Edit: I noticed I had forgotten to create wordlist, as it was still in memory from my session.
with open('text.txt', 'r') as doc:
    print('opened txt')
    for words in doc:
        wordlist = words.split()
        for numbers in range(len(wordlist)):
            for inner_numbers in range(len(wordlist)):
                if inner_numbers != numbers:
                    if wordlist[numbers] == wordlist[inner_numbers]:
                        print('word: %s == %s' %(wordlist[numbers], wordlist[inner_numbers]))
Answer to question four: this one isn't hard once you have a list of all the words. A string can be indexed like a list, so string[0] gives the first letter of a string, and stringList[position of word][0] gives the first letter of a word in a list of strings.
for numbers in range(len(wordlist)):
    if wordlist[numbers][0] == 'a':
        print(wordlist[numbers])

Python: Iterate through string and print only specific words

I'm taking a class in python and now I'm struggling to complete one of the tasks.
The aim is to ask for an input, iterate through that string and print only words that start with letters > g. If the word starts with a letter larger than g, we print that word. Otherwise, we empty the word and iterate through the next word(s) in the string to do the same check.
This is the code I have, and the output. Would be grateful for some tips on how to solve the problem.
# [] create words after "G" following the Assignment requirements use of functions, methods and keywords
# sample quote "Wheresoever you go, go with all your heart" ~ Confucius (551 BC - 479 BC)
# [] copy and paste in edX assignment page
quote = input("Enter a sentence: ")
word = ""
# iterate through each character in quote
for char in quote:
    # test if character is alpha
    if char.isalpha():
        word += char
    else:
        if word[0].lower() >= "h":
            print(word.upper())
        else:
            word=""
Enter a sentence: Wheresoever you go, go with all your heart
WHERESOEVER
WHERESOEVERYOU
WHERESOEVERYOUGO
WHERESOEVERYOUGO
WHERESOEVERYOUGOGO
WHERESOEVERYOUGOGOWITH
WHERESOEVERYOUGOGOWITHALL
WHERESOEVERYOUGOGOWITHALLYOUR
The output should look like,
Sample output:
WHERESOEVER
YOU
WITH
YOUR
HEART
Simply a list comprehension with split will do:
s = "Wheresoever you go, go with all your heart"
print(' '.join([word for word in s.split() if word[0].lower() > 'g']))
# Wheresoever you with your heart
Modifying to match with the desired output (Making all uppercase and on new lines):
s = "Wheresoever you go, go with all your heart"
print('\n'.join([word.upper() for word in s.split() if word[0].lower() > 'g']))
'''
WHERESOEVER
YOU
WITH
YOUR
HEART
'''
Without list comprehension:
s = "Wheresoever you go, go with all your heart"
for word in s.split():  # Split the sentence into words and iterate through each.
    if word[0].lower() > 'g':  # Check if the first character (lowercased) > g.
        print(word.upper())  # If so, print the word all capitalised.
Here is a readable and commented solution. The idea is first to split the sentence into a list of words using re.findall (from the re regular-expression module) and iterate through this list, instead of iterating over each character as you did. It is then quite easy to print only the words starting with a letter greater than 'g':
import re
# Prompt for an input sentence
quote = input("Enter a sentence: ")
# Split the sentence into a list of words
words = re.findall(r'\w+', quote)
# Iterate through each word
for word in words:
    # Print the word if its 1st letter is greater than 'g'
    if word[0].lower() > 'g':
        print(word.upper())
To go further, here is also the one-line style solution based on exactly the same logic, using list comprehension:
import re
# Prompt for an input sentence
quote = input("Enter a sentence: ")
# Print each word starting by a letter greater than 'g', in upper case
print(*[word.upper() for word in re.findall(r'\w+', quote) if word[0].lower() > 'g'], sep='\n')
import string
s = "Wheresoever you go, go with all your heart"
# Replace every punctuation character with a space before splitting
out = s.translate(str.maketrans(string.punctuation, " "*len(string.punctuation)))
desired_result = [word.upper() for word in out.split() if word and word[0].lower() > 'g']
print(*desired_result, sep="\n")
Your problem is that you're only resetting word to an empty string in the else clause. You need to reset it to an empty string immediately after the print(word.upper()) statement as well for the code as you've written it to work correctly.
That being said, if it's not explicitly disallowed for the class you're taking, you should look into string methods, specifically str.split().
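A sketch of the character-by-character loop with that reset applied. Two further guards are assumptions added here and are not part of the answer: the "if word" check avoids an IndexError when two non-letters appear in a row (such as a comma followed by a space), and a trailing space is appended so that the last word is tested too. The function name words_after_g is invented for the example.

```python
def words_after_g(quote):
    """Collect, in upper case, the words whose first letter comes after 'g'."""
    found = []
    word = ""
    # The extra space makes sure the final word is checked as well.
    for char in quote + " ":
        if char.isalpha():
            word += char
        else:
            # Only test non-empty words, then always start a fresh word.
            if word and word[0].lower() >= "h":
                found.append(word.upper())
            word = ""
    return found

for w in words_after_g("Wheresoever you go, go with all your heart"):
    print(w)
```

Returning a list instead of printing inside the loop keeps the function easy to check against the sample output.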

How to change the value of a string with a dictionary value in Python

I have the following dictionary in Python:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
and, given a string like "How are you?", I want the following result after comparing it to myDict:
"como are tu?"
As you can see, if a word such as "are" doesn't appear in myDict, it appears unchanged in the result.
This is my code until now:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(word):
    word = word.lower()
    word = word.split()
    for letter in word:
        if letter in myDict:
            return myDict[letter]
print(translate("How are you?"))
As a result I only get the first word, como. What am I doing wrong that keeps me from getting the entire sentence?
Thanks in advance for your help!
The function returns (exits) the first time it hits a return statement. In this instance, that will always be the first word.
What you should do is make a list of words, and where you see the return currently, you should add to the list.
Once you have added each word, you can then return the list at the end.
PS: your terminology is confusing. What you have are phrases, each made up of words. "This is a phrase" is a phrase of 4 words: "This", "is", "a", "phrase". A letter would be the individual part of the word, for example "T" in "This".
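A minimal sketch of that suggestion, assuming the same myDict as in the question: collect each translated (or unchanged) word in a list, and only return once the loop has finished. Joining the list into a string at the end is a choice made here for readability; returning the list itself would work too.

```python
myDict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}

def translate(sentence):
    translated = []
    for word in sentence.lower().split():
        # Use the translation when there is one, otherwise keep the word.
        translated.append(myDict.get(word, word))
    # Return after the loop, so every word has been processed.
    return " ".join(translated)

print(translate("How are you?"))  # como are tu?
```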
The problem is that you are returning the first word that is mapped in your dictionary. You can use this instead (I have changed some variable names because the originals are confusing):
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(string):
    string = string.lower()
    words = string.split()
    translation = ''
    for word in words:
        if word in myDict:
            translation += myDict[word]
        else:
            translation += word
        translation += ' ' # add a space between words
    return translation[:-1] # remove last space
print(translate("How are you?"))
Output:
como are tu?
When you call return, the function that is currently executing terminates, which is why yours stops after finding one word. For your function to work properly, you would have to append to a string stored in a local variable within the function.
Here's a function that uses a list comprehension to translate each word of a string if it exists in the dictionary:
def translate(myDict, string):
    return ' '.join([myDict[x.lower()] if x.lower() in myDict else x for x in string.split()])
Example:
myDict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}
print(translate(myDict, 'How are you?'))
>> como are tu?
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
s = "How are you?"
newString = ''
for word in s.lower().split():
    newWord = word
    if word in myDict:
        newWord = myDict[word]
    newString = newString+' '+newWord
# strip() removes the leading space added before the first word
print(newString.strip())

Need help making a "compliment generator"

I'm trying to make a simple compliment generator that takes a noun and an adjective from two separate lists and randomly combines them. I can get one on its own to work, but trying to get the second word to appear makes weird stuff happen. What am I doing wrong here? Any input would be great.
import random
sentence = "Thou art a *adj *noun."
sentence = sentence.split()
adjectives = ["decadent", "smelly", "delightful", "volatile", "marvelous"]
indexCount = 0
noun = ["dandy", "peasant", "mule", "maiden", "sir"]
wordCount = 0
for word in sentence:
    if word == "*adj":
        wordChoice = random.choice(adjectives)
        sentence[indexCount] = wordChoice
    indexCount += 1
for word in sentence:
    if "*noun" in word:
        wordChoice = random.choice(noun)
        sentence[wordCount] = wordChoice
wordCount += 1
st = ""
for word in sentence:
    st += word + " "
print(st)
The end result nets me a double noun. How would I get rid of the duplicate?
You aren't incrementing wordCount in the second loop as you do indexCount in the first.
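Beyond that fix, one way to sidestep the manual counters altogether is enumerate, which yields the index alongside each word. The following is a sketch and not the original poster's code: make_compliment is a hypothetical name, and it uses str.replace so that the trailing period on "*noun." survives the substitution.

```python
import random

def make_compliment(template, adjectives, nouns):
    words = template.split()
    # enumerate supplies the position, so no separate counter is needed.
    for index, word in enumerate(words):
        if "*adj" in word:
            words[index] = word.replace("*adj", random.choice(adjectives))
        elif "*noun" in word:
            words[index] = word.replace("*noun", random.choice(nouns))
    return " ".join(words)

adjectives = ["decadent", "smelly", "delightful", "volatile", "marvelous"]
nouns = ["dandy", "peasant", "mule", "maiden", "sir"]
print(make_compliment("Thou art a *adj *noun.", adjectives, nouns))
```

Because both placeholders are handled in a single pass, each one is replaced exactly once and in its own position.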
