I'm writing code that goes over each word in words, looks it up in dictionary and then adds the dictionary value to counter. However, if I print counter, I only get the last number from my if statement, if any. If I place print counter inside the loop, I get the number for each individual word, but no total.
My code is the following:
dictionary = {'word': 2, 'other': 5, 'string': 10}
words = "this is a string of words you see and other things"
if word in dictionary.keys():
    number = dictionary[word]
    counter += number
print counter
My example gives me:
[10]
[5]
whereas I want 15, preferably printed outside the loop, since in the real code words is not a single string but many strings that are being looped over.
Can anyone help me with this?
Here's a pretty straightforward example that prints 15:
dictionary = {'word': 2, 'other': 5, 'string': 10}
words = "this is a string of words you see and other things"
counter = 0
for word in words.split():
    if word in dictionary:
        counter += dictionary[word]
print(counter)
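Note that you should set counter = 0 before the loop, and use word in dictionary instead of word in dictionary.keys(): testing membership on the dictionary itself is a hash lookup, whereas in Python 2 dictionary.keys() builds a new list that is then searched linearly.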
You can also write the same thing in one line using sum():
print(sum(dictionary[word] for word in words.split() if word in dictionary))
or:
print(sum(dictionary.get(word, 0) for word in words.split()))
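Since the real code loops over many strings, the same idea should extend to a nested loop: initialise counter once, before the outer loop, and print it only after both loops finish. In this sketch texts is a hypothetical list standing in for those strings, and dictionary.get(word, 0) is used so that words missing from the dictionary add nothing:

```python
dictionary = {'word': 2, 'other': 5, 'string': 10}

# Hypothetical stand-in for the many strings the real code loops over.
texts = [
    "this is a string of words you see and other things",
    "another string",
    "word word",
]

counter = 0  # initialised once, before the outer loop
for text in texts:
    for word in text.split():
        counter += dictionary.get(word, 0)  # 0 for words not in the dictionary

print(counter)  # printed once, after all strings have been processed
```

Here the total is 15 + 10 + 4 = 29.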
You should declare counter outside the loop (initialised to 0); otherwise there is no running total to add to. Apart from that, and quoting the dictionary keys, your logic is correct.
The correct code:
dictionary = {'word': 2, 'other': 5, 'string': 10}
words = "this is a string of words you see and other things"
counter = 0
for word in words.split():
    if word in dictionary.keys():
        number = dictionary[word]
        counter += number
print(counter)
I'm not sure what you're doing with that code, since I don't see any loop there. However, a way to do what you want would be the following:
sum(dictionary[word] for word in words.split() if word in dictionary)
Related
This program works like an anagram finder. The segment below is a small algorithm that goes through the words stored in a list named word_list and compares each of them to a choice word entered by the user.
The first loop iterates through every item in the list and assigns it to word. It then resets shared_letters (a counter used to decide whether the letters of word can all be found in choice) to zero before counting the letters shared by the two words, so that the count from the previous word doesn't carry over into the second loop.
The second loop iterates x over the length of word, which is stored in word_length. Inside it, an if statement checks whether the letter at index x of sliced_word (which is just list(word)) is found in sliced_choice (list(choice)). If it is, the shared_letters counter goes up by 1; otherwise the loop breaks and control goes back to the first loop to get a new word.
This looping approach has worked fine for me before, but for some reason in this segment the second loop never runs at all. I've tried adding print statements to check which route the program takes, and it always skips over the nested for loop. Even when I turned it into a function, the program just wouldn't go through that function.
choice = input("Enter a word: ") # User enters a word
# Algorithm below compares the entered word with all the words found in the dictionary, then saves any words found into "sushi" list
for i in range(num_words): # Word loop, gives iterated word
    word = word_list[i] # Picks a word
    shared_letters = 0 # Resets # of shared letters
    for x in range(word_length): # Goes through the letters of iterated word
        if sliced_word[x] in sliced_choice:
            shared_letters = x + 1
        elif sliced_word[x] not in sliced_choice:
            break
Here is the complete program in case it gives a better idea of what I'm doing. Sorry if the code looks jumbled; I've been trying a lot of things with this program and never seem to reach a good solution.
word_list = ["race","care","car","rake","caring","scar"]
sushi = []
word = ""
sliced_word = list(word)
word_length = len(sliced_word)
choice_word = ""
sliced_choice = list(choice_word)
choice_length = len(sliced_choice)
shared_letters = 0
num_words = len(word_list)
next_word = False
choice = input("Enter a word: ") # User enters a word
# Algorithm below compares the entered word with all the words found in the dictionary, then saves any words found into "sushi" list
for i in range(num_words): # Word loop, gives iterated word
    word = word_list[i] # Picks a word
    shared_letters = 0 # Resets # of shared letters
    for x in range(word_length): # Goes through the letters of iterated word
        if sliced_word[x] in sliced_choice:
            # Checks if the letters of the iterated word can be found in the choice word
            shared_letters = x + 1
        elif sliced_word[x] not in sliced_choice:
            break # If any of the letters within the iterated word are not within the choice word, it moves onto the next word
    if shared_letters == word_length:
        sushi.append(word_list[i])
        # If all of the letters within the iterated word are found in the choice word, it appends the iterated word into the "sushi" list. Then moves onto the next word in the word_list.
You have a number of issues, but I think the biggest is that this search does not account for anagrams with repeated letters. The easiest way to determine whether two words are anagrams is to check that they have the same count for each letter.
The collections module in the standard library has a helper class called Counter that can help with this.
>>> from collections import Counter
>>> Counter("care")
Counter({'c': 1, 'a': 1, 'r': 1, 'e': 1})
>>> Counter("race")
Counter({'r': 1, 'a': 1, 'c': 1, 'e': 1})
>>> Counter("care") == Counter("race")
True
Working this into your example, you could refactor like this:
from collections import Counter
word_list = ["race","care","car","rake","caring","scar"]
sushi = []
for word in word_list:
    if Counter(choice) == Counter(word):
        sushi.append(word)
This is slow if you have to build the Counter objects over and over again, so you could store them in a dictionary instead:
from collections import Counter
word_list = ["race","care","car","rake","caring","scar"]
word_counters = {word: Counter(word) for word in word_list}
choice_counter = Counter(choice)  # build the Counter for choice only once
sushi = []
for word, counter in word_counters.items():
    if choice_counter == counter:
        sushi.append(word)
If you want an imperfect match, for example checking whether one word's letters are contained in another's, you can use the - operator and test whether the left-hand side has any letters left over afterwards (Counter subtraction drops zero and negative counts):
>>> not (Counter("race") - Counter("racecar"))
True
>>> not (Counter("race") - Counter("bob"))
False
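Working that back into the example, keeping the words whose letters are all available in choice (which is what the loop in the question checks):
from collections import Counter
word_list = ["race","care","car","rake","caring","scar"]
word_counters = {word: Counter(word) for word in word_list}
sushi = []
for word, counter in word_counters.items():
    # nothing left over means every letter of word (with repeats) is in choice
    if not (counter - Counter(choice)):
        sushi.append(word)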
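Separately from the repeated-letter problem, the reason the nested loop in the question never runs seems to be that sliced_word and word_length are computed once, from word = "", before the outer loop starts, so word_length is 0 and range(word_length) is empty. sliced_choice is likewise built from choice_word = "" rather than from choice. A minimal sketch of the original approach with those values recomputed where they are needed, using a hard-coded choice instead of input() for illustration:

```python
word_list = ["race", "care", "car", "rake", "caring", "scar"]
choice = "scare"  # hard-coded for illustration; the question uses input()
sliced_choice = list(choice)  # built from choice, not from an empty string
sushi = []

for word in word_list:
    # Recompute these for every word; computing them once from "" makes
    # word_length 0, so range(word_length) is empty and the loop never runs.
    sliced_word = list(word)
    word_length = len(sliced_word)
    shared_letters = 0
    for x in range(word_length):
        if sliced_word[x] in sliced_choice:
            shared_letters = x + 1
        else:
            break  # a letter is missing from choice, move on to the next word
    if shared_letters == word_length:
        sushi.append(word)

print(sushi)
```

Like the original, this only checks that each letter appears somewhere in choice, not how many times, which is the issue the Counter approach addresses.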
I have the following dictionary in Python:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
Given a string like "How are you?", I want the following result after comparing it with myDict:
"como are tu?"
As you can see, if a word doesn't appear in myDict (like "are"), it appears unchanged in the result.
This is my code so far:
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(word):
    word = word.lower()
    word = word.split()
    for letter in word:
        if letter in myDict:
            return myDict[letter]
print(translate("How are you?"))
As a result, I only get the first "letter": como. What am I doing wrong that I don't get the entire sentence?
Thanks in advance for your help!
The function returns (exits) the first time it hits a return statement. In this instance, that will always be the first word.
What you should do is make a list of words, and where you see the return currently, you should add to the list.
Once you have added each word, you can then return the list at the end.
PS: your terminology is confusing. What you have are phrases, each made up of words. "This is a phrase" is a phrase of 4 words: "This", "is", "a", "phrase". A letter would be the individual part of the word, for example "T" in "This".
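A minimal sketch of what this answer describes, reusing the question's myDict: collect the translated words in a list and return only after the loop has finished. Joining the list back into a string with " ".join and falling back to the untranslated word via dict.get(word, word) are choices made for this sketch:

```python
myDict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}

def translate(phrase):
    translated = []  # one entry per word, instead of returning early
    for word in phrase.lower().split():
        # keep the word unchanged when it has no translation
        translated.append(myDict.get(word, word))
    return " ".join(translated)  # return only after every word is processed

print(translate("How are you?"))  # como are tu?
```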
The problem is that you return the first word that is found in your dictionary. You can use this instead (I have changed some variable names because they were a bit confusing):
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
def translate(string):
    string = string.lower()
    words = string.split()
    translation = ''
    for word in words:
        if word in myDict:
            translation += myDict[word]
        else:
            translation += word
        translation += ' ' # add a space between words
    return translation[:-1] #remove last space
print(translate("How are you?"))
Output:
como are tu?
When you call return, the function that is currently executing terminates, which is why yours stops after finding one word. For your function to work properly, you would have to append each word to a string stored in a local variable and return it after the loop.
Here's a function that uses a list comprehension to translate each word of a string if it exists in the dictionary:
def translate(myDict, string):
    return ' '.join([myDict[x.lower()] if x.lower() in myDict else x for x in string.split()])
Example:
myDict = {"how": "como", "you?": "tu?", "goodbye": "adios", "where": "donde"}
print(translate(myDict, 'How are you?'))
>> como are tu?
myDict = {"how":"como", "you?":"tu?", "goodbye":"adios", "where":"donde"}
s = "How are you?"
newString =''
for word in s.lower().split():
    newWord = word
    if word in myDict:
        newWord = myDict[word]
    newString = newString+' '+newWord
print(newString.strip())  # strip the leading space added before the first word
So I need to make a program that gets the user to enter a sentence, and then turns that sentence into numbers corresponding to each word's position in the list. I came across the enumerate function here: "Python using enumerate inside list comprehension", but this gets every character, not every word. This is my code so far; can anyone help me fix it?
list = []
lists = ""
sentence= input("Enter a sentence").lower()
print(sentence)
list.append(lists)
print(lists)
for i,j in enumerate(sentence):
    print (i,j)
Your sentence is a string, so enumerate goes over it one character at a time. You should split it into words first:
for i,j in enumerate(sentence.split(' ')):
    print(i, j)
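The difference between split(' ') and split() matters when the user types more than one space between words. A small illustration with a made-up sentence; enumerate's start parameter is used here on the assumption that positions should begin at 1 rather than 0:

```python
sentence = "the  cat sat on the mat"  # note the double space after "the"

# split(' ') splits on every single space, so consecutive spaces give empty strings
print(sentence.split(' '))  # ['the', '', 'cat', 'sat', 'on', 'the', 'mat']

# split() with no argument splits on runs of whitespace and drops empty strings
for position, word in enumerate(sentence.split(), start=1):
    print(position, word)
```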
You can also try this:
>>> sentence = 'I like Movie'
>>> sentence = sentence.lower()
>>> sentence = sentence.split()
>>> for i, j in enumerate(sentence):
...     print(i, j)
...
0 i
1 like
2 movie
I'm trying to make a word guessing program and I'm having trouble printing parallel tuples. I need to print the "secret word" with the corresponding hint, but the code that I wrote doesn't work. I can't figure out where I'm going wrong.
Any help would be appreciated :)
This is my code so far:
import random
Words = ("wallet","canine")
Hints = ("Portable money holder","Man's best friend")
vowels = "aeiouy"
secret_word = random.choice(Words)
new_word = ""
for letter in secret_word:
    if letter in vowels:
        new_word += '_'
    else:
        new_word += letter
maxIndex = len(Words)
for i in range(1):
    random_int = random.randrange(maxIndex)
    print(new_word,"\t\t\t",Hints[random_int])
The issue is that random_int is chosen independently of secret_word, so you only get the matching hint by chance.
A quick fix is to use the tuple.index method: get the index of secret_word in the tuple Words, then use that index on Hints to get the corresponding hint. Your print statement would look like:
print(new_word,"\t\t\t",Hints[Words.index(secret_word)])
This does the trick but is clunky. Python has a data structure called a dictionary with which you can map one value to another. This could make your life easier in the long run. To create a dictionary from the two tuples we can zip them together:
mapping = dict(zip(Words, Hints))
which creates a structure like:
{'canine': "Man's best friend", 'wallet': 'Portable money holder'}
Another detail you could improve is how you create new_word: instead of looping, you can use a generator expression to produce the respective letters and join them on the empty string "" to create the resulting string:
new_word = "".join("_" if letter in vowels else letter for letter in secret_word)
with exactly the same effect. And since you now have the dictionary mapping, getting the respective hint is easy: just look up the key secret_word in mapping and it will return the corresponding hint.
A revised version of your code looks like this:
import random
Words = ("wallet", "canine")
Hints = ("Portable money holder", "Man's best friend")
mapping = dict(zip(Words, Hints))
vowels = "aeiouy"
secret_word = random.choice(Words)
new_word = "".join("_" if letter in vowels else letter for letter in secret_word)
print(new_word,"\t\t\t", mapping[secret_word])
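An alternative sketch (not part of the original answer) that avoids the lookup entirely: pick a (word, hint) pair from mapping.items() in a single call, so the hint always belongs to the chosen word. random.choice needs a sequence, hence the list() around the items view:

```python
import random

Words = ("wallet", "canine")
Hints = ("Portable money holder", "Man's best friend")
mapping = dict(zip(Words, Hints))
vowels = "aeiouy"

# Pick the word and its hint together so they can never get out of step
secret_word, hint = random.choice(list(mapping.items()))
new_word = "".join("_" if letter in vowels else letter for letter in secret_word)
print(new_word, "\t\t\t", hint)
```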
I need to display the 10 most frequent words in a text file, from most to least frequent, along with the number of times each has been used. I can't use a dictionary or the Counter class. So far I have this:
import urllib
cnt = 0
i=0
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []
for line in txtFile:
    words = line.split()
    for word in words:
        if word not in uniques:
            uniques.append(word)
for word in words:
    while i<len(uniques):
        i+=1
        if word in uniques:
            cnt += 1
print cnt
Now I think I should look up every word in the array 'uniques', see how many times it is repeated in the file, and then add that to another array that counts the occurrences of each word. But this is where I'm stuck; I don't know how to proceed.
Any help would be appreciated. Thank you
This can easily be done with collections.Counter from Python's standard library.
Below is the solution.
from collections import Counter
data_set = ("Welcome to the world of Geeks "
            "This portal has been created to provide well written well "
            "thought and well explained solutions for selected questions "
            "If you like Geeks for Geeks and would like to contribute "
            "here is your chance You can write article and mail your article "
            " to contribute at geeksforgeeks org See your article appearing on "
            "the Geeks for Geeks main page and help thousands of other Geeks. ")
# split() returns a list of all the words in the string
split_it = data_set.split()
# Pass the split_it list to an instance of the Counter class.
Counters_found = Counter(split_it)
# most_common(k) returns the k most common elements and their counts,
# from the most common to the least.
most_occur = Counters_found.most_common(4)
print(most_occur)
You're on the right track. Note that this algorithm is quite slow because for each unique word, it iterates over all of the words. A much faster approach without hashing would involve building a trie.
# The following assumes that we already have alice30.txt on disk.
# Start by splitting the file into lowercase words.
words = open('alice30.txt').read().lower().split()
# Get the set of unique words.
uniques = []
for word in words:
    if word not in uniques:
        uniques.append(word)
# Make a list of (count, unique) tuples.
counts = []
for unique in uniques:
    count = 0 # Initialize the count to zero.
    for word in words: # Iterate over the words.
        if word == unique: # Is this word equal to the current unique?
            count += 1 # If so, increment the count
    counts.append((count, unique))
counts.sort() # Sorting the list puts the lowest counts first.
counts.reverse() # Reverse it, putting the highest counts first.
# Print the ten words with the highest counts.
for i in range(min(10, len(counts))):
    count, word = counts[i]
    print('%s %d' % (word, count))
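The trie mentioned above could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: children are stored as a list of (character, node) pairs rather than a dict so that no hashing is involved, and the sample words are made up. Each word is counted at the node where it ends, so every word in the text is walked once instead of once per unique word:

```python
class TrieNode:
    """One node of a character trie; children are kept in a plain list."""

    def __init__(self):
        self.children = []  # list of (character, TrieNode) pairs, no dict/hashing
        self.count = 0      # how many times the word ending at this node was seen

    def child(self, ch):
        """Return the child for ch, creating it if it does not exist yet."""
        for c, node in self.children:
            if c == ch:
                return node
        node = TrieNode()
        self.children.append((ch, node))
        return node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.child(ch)
        node.count += 1  # one more occurrence of this exact word
    return root


def collect(node, prefix, out):
    """Walk the trie and append (count, word) for every word it contains."""
    if node.count:
        out.append((node.count, prefix))
    for ch, child in node.children:
        collect(child, prefix + ch, out)
    return out


words = "the cat and the hat and the bat".split()
counts = collect(build_trie(words), "", [])
counts.sort(key=lambda pair: (-pair[0], pair[1]))  # highest count first, ties alphabetical
for count, word in counts[:10]:
    print(word, count)
```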
from string import punctuation #you will need it to strip the punctuation
import urllib
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
counter = {}
for line in txtFile:
    words = line.split()
    for word in words:
        k = word.strip(punctuation).lower() #the The or you You counted only once
        # you still have words like I've, you're, Alice's
        # you could change re to are, ve to have, etc...
        if "'" in k:
            ks = k.split("'")
        else:
            ks = [k,]
        #now the tally
        for k in ks:
            counter[k] = counter.get(k, 0) + 1
#and sorting the counter by the value which holds the tally
for word in sorted(counter, key=lambda k: counter[k], reverse=True)[:10]:
    print word, "\t", counter[word]
import urllib
import operator
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt").readlines()
txtFile = " ".join(txtFile) # this with .readlines() replaces new lines with spaces
txtFile = "".join(char for char in txtFile if char.isalnum() or char.isspace()) # removes everything that's not alphanumeric or spaces.
word_counter = {}
for word in txtFile.split(" "): # split in every space.
    if len(word) > 0 and word != '\r\n':
        if word not in word_counter: # if 'word' not in word_counter, add it, and set value to 1
            word_counter[word] = 1
        else:
            word_counter[word] += 1 # if 'word' already in word_counter, increment it by 1
for i,word in enumerate(sorted(word_counter,key=word_counter.get,reverse=True)[:10]):
    # sorts the dict by the values, from top to bottom, takes the 10 top items,
    print "%s: %s - %s"%(i+1,word,word_counter[word])
output:
1: the - 1432
2: and - 734
3: to - 703
4: a - 579
5: of - 501
6: she - 466
7: it - 440
8: said - 434
9: I - 371
10: in - 338
This method ensures that only alphanumeric characters and spaces end up in the counter, although that doesn't matter much.
Personally I'd make my own implementation of collections.Counter. I assume you know how that object works, but if not I'll summarize:
import collections
text = "some words that are mostly different but are not all different not at all"
words = text.split()
resulting_count = collections.Counter(words)
# {'all': 2,
# 'are': 2,
# 'at': 1,
# 'but': 1,
# 'different': 2,
# 'mostly': 1,
# 'not': 2,
# 'some': 1,
# 'that': 1,
# 'words': 1}
We can certainly sort that based on frequency by using the key keyword argument of sorted, and return the first 10 items in that list. However that doesn't much help you because you don't have Counter implemented. I'll leave THAT part as an exercise for you, and show you how you might implement Counter as a function rather than an object.
def counter(iterable):
    d = {}
    for element in iterable:
        if element in d:
            d[element] += 1
        else:
            d[element] = 1
    return d
Not difficult, actually. Go through each element of an iterable. If that element is NOT in d, add it to d with a value of 1. If it IS in d, increment that value. It's more easily expressed by:
def counter(iterable):
    d = {}
    for element in iterable:
        # get() returns 0 for elements not seen yet
        d[element] = d.get(element, 0) + 1
    return d
Note that in your use case, you probably want to strip out the punctuation and possibly casefold the whole thing (so that someword gets counted the same as Someword rather than as two separate words). I'll leave that to you as well, but I will point out that str.strip takes an optional argument specifying which characters to strip, and string.punctuation contains all the punctuation you're likely to need.
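A minimal sketch of that preprocessing, combined with the counter() function and the sorted(..., key=...) step mentioned earlier; the sample text is made up for illustration. str.strip(string.punctuation) only removes punctuation at the ends of each word, so contractions such as "I've" would keep their apostrophe:

```python
import string

def counter(iterable):
    d = {}
    for element in iterable:
        d[element] = d.get(element, 0) + 1
    return d

text = 'Alice said "Off with her head!" and the Queen said it again.'

# Strip punctuation from both ends and casefold, so "Said" and "said," match
words = [w.strip(string.punctuation).casefold() for w in text.split()]
words = [w for w in words if w]  # drop tokens that were pure punctuation

counts = counter(words)
top = sorted(counts, key=counts.get, reverse=True)[:10]  # ten most frequent
for word in top:
    print(word, counts[word])
```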
You can also do it with pandas DataFrames and get the result in a convenient form: a table of words and their frequencies, ordered by count.
import pandas as pn

def count_words(words_list):
    words_df = pn.DataFrame(words_list, columns=["word"])
    words_df_unique = pn.DataFrame(pn.unique(pn.Series(words_list)), columns=["unique"])
    words_df_unique["count"] = 0
    for i, word in enumerate(words_df_unique["unique"].tolist()):
        # number of rows in words_df equal to this unique word
        words_df_unique.iloc[i, 1] = (words_df["word"] == word).sum()
    res = words_df_unique.sort_values("count", ascending=False)
    return res
To do the same operation on a pandas DataFrame, you can use the Counter class from the collections module:
from collections import Counter
cnt = Counter()
for text in df['text']:
    for word in text.split():
        cnt[word] += 1
# Find most common 10 words from the Pandas dataframe
cnt.most_common(10)