Adding more than one value to dictionary when looping through string - python

Still super new to Python 3 and have encountered a problem... I am trying to create a function which returns a dictionary with the keys being the length of each word and the values being the words in the string.
For example, if my string is: "The dogs run quickly forward to the park", my dictionary should return
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
Problem is that when I loop through the items, it is only appending one of the words in the string.
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    unique_list = []
    for item in input_list:
        if item.lower() not in unique_list:
            unique_list.append(item.lower())
    for word in unique_list:
        dictionary[len(word)] = []
        dictionary[len(word)].append(word)
    return (dictionary)
print (word_len_dict("The dogs run quickly forward to the park"))
The code returns
{2: ['to'], 3: ['run'], 4: ['park'], 7: ['forward']}
Can someone point me in the right direction? Perhaps not giving me the answer freely, but what do I need to look at next in terms of adding the missing words to the list. I thought that appending them to the list would do it, but it's not.
Thank you!

This will solve all your problems:
def word_len_dict(my_string):
    input_list = my_string.split(" ")
    unique_set = set()
    dictionary = {}
    for item in input_list:
        word = item.lower()
        if word not in unique_set:
            unique_set.add(word)
            key = len(word)
            if key not in dictionary:
                dictionary[key] = []
            dictionary[key].append(word)
    return dictionary
You were wiping dict entries each time you encountered a new word. There were also some efficiency problems (searching a list for membership while growing it resulted in an O(n**2) algorithm for an O(n) task). Replacing the list membership test with a set membership test corrected the efficiency problem.
It gives the correct output for your sample sentence:
>>> print(word_len_dict("The dogs run quickly forward to the park"))
{2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
I noticed some of the other posted solutions are failing to map words to lowercase and/or failing to remove duplicates, which you clearly wanted.

You can first build the set of unique words, which avoids the first loop, and then populate the dictionary in a second step.
unique_words = set("The dogs run quickly forward to the park".lower().split(" "))
word_dict = {}  # not named dict, which would shadow the built-in type
for word in unique_words:
    key, value = len(word), word
    if key not in word_dict:  # same as checking word_dict.keys()
        word_dict[key] = [value]
    else:
        word_dict[key].append(value)
print(word_dict)

You are assigning an empty list to the dictionary item before you append the latest word, which erases all previous words.

for word in unique_list:
    dictionary[len(word)] = [x for x in input_list if len(x) == len(word)]

Your code is simply resetting the key to an empty list each time, which is why you only get one value (the last value) in the list for each key.
To make sure there are no duplicates, you can set the default value of a key to a set which is a collection that enforces uniqueness (in other words, there can be no duplicates in a set).
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    for word in input_list:
        if len(word) not in dictionary:
            dictionary[len(word)] = set()
        dictionary[len(word)].add(word.lower())
    return dictionary
Once you add that check, you can get rid of the first loop as well. Now it will work as expected.
You can also optimize the code further, by using the setdefault method of dictionaries.
for word in input_list:
    dictionary.setdefault(len(word), set()).add(word.lower())
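Putting the pieces above together, a minimal sketch of the whole function with setdefault could look like the following. Converting each set to a sorted list at the end is my own assumption about the desired output, not something the answer above does.

```python
def word_len_dict(my_string):
    """Group the unique lowercase words of my_string by their length."""
    dictionary = {}
    for word in my_string.split():
        # setdefault returns the existing set for this length, or inserts
        # and returns a new empty set if the key is missing
        dictionary.setdefault(len(word), set()).add(word.lower())
    # Assumption: turn each set into a sorted list for stable, readable output
    return {length: sorted(found) for length, found in dictionary.items()}


print(word_len_dict("The dogs run quickly forward to the park"))
```

Because the values are sets while the dictionary is being built, "The" and "the" collapse into a single entry without any separate uniqueness pass.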

Pythonic way,
Using itertools.groupby
>>> import itertools
>>> my_str = "The dogs run quickly forward to the park"
>>> {x: list(y) for x, y in itertools.groupby(sorted(my_str.split(), key=len), key=len)}
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
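Note that the groupby one-liner keeps both "The" and "the". A possible variant that lowercases and de-duplicates first, as the question asked, is sketched below. The alphabetical tie-break in the sort is my own assumption, added only to make the output deterministic.

```python
import itertools


def word_len_dict(my_string):
    # Lowercase and de-duplicate first, as the question asked
    unique_words = set(word.lower() for word in my_string.split())
    # groupby only merges adjacent items, so sort by length first
    # (ties are broken alphabetically to make the output deterministic)
    ordered = sorted(unique_words, key=lambda w: (len(w), w))
    return {length: list(group)
            for length, group in itertools.groupby(ordered, key=len)}


print(word_len_dict("The dogs run quickly forward to the park"))
# {2: ['to'], 3: ['run', 'the'], 4: ['dogs', 'park'], 7: ['forward', 'quickly']}
```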

This option starts by creating a unique set of lowercase words and then takes advantage of dict's setdefault to avoid searching the dictionary keys multiple times.
>>> a = "The dogs run quickly forward to the park"
>>> b = set((word.lower() for word in a.split()))
>>> result = {}
>>> {result.setdefault(len(word), []).append(word.lower()) for word in b}
{None}
>>> result
{2: ['to'], 3: ['the', 'run'], 4: ['park', 'dogs'], 7: ['quickly', 'forward']}

Related

How do I classify words in list according to a dictionary?

Say I have a dictionary with keys being a category name and values being words within that category. For example:
words={
'seasons':('summer','spring','autumn','winter'),
'rooms':('kitchen','loo','livingroom','hall','diningroom','bedroom'),
'insects':('bee','butterfly','beetle')}
For any given input I want to create a list with two items where the first item is the value word and the second is the key word. For example, the expected output should be:
input: kitchen
output: ['kitchen','rooms']
input: bee
output: ['bee','insects']
I checked the question Get key by value in dictionary but afaict all answers work for dictionaries with 1 value per key.
I've tried the following naive, closed form code:
word=input('Enter a word: ')
word_pair=[]
word_pair.append(word)
if word in (words['seasons']):
    index=0
elif word in (words['rooms']):
    index=1
elif word in (words['insects']):
    index=2
else:
    index=3
try:
    key_val=list(words.keys())[index]
    word_pair.append(key_val)
except IndexError:
    word_pair.append('NA')
print(word_pair)
Obviously, this code only works for this specific dictionary as is. If I wanted to add a category, I'd have to add an elif clause. If I wanted to change the name of a category or their order, remove one, combine two or more, etc., I'd have to tweak a whole bunch of things by hand.
Is there a more generalized way to do this?
All help is appreciated.
You can use a generator with unpacking:
inp = input()
result, *_ = ([inp, k] for k, v in words.items() if inp in v)
It is even better to use next() with the generator, because it stops as soon as the first match is found:
result = next([inp, k] for k, v in words.items() if inp in v)
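One caveat: next() raises StopIteration when no category contains the word. A small sketch that passes a default as the second argument to next() avoids that. The function name classify is hypothetical, and the 'NA' fallback is borrowed from the question's own code.

```python
words = {
    'seasons': ('summer', 'spring', 'autumn', 'winter'),
    'rooms': ('kitchen', 'loo', 'livingroom', 'hall', 'diningroom', 'bedroom'),
    'insects': ('bee', 'butterfly', 'beetle'),
}


def classify(inp):
    # The second argument to next() is returned when the generator is empty,
    # so an unknown word yields [inp, 'NA'] instead of raising StopIteration
    return next(([inp, k] for k, v in words.items() if inp in v), [inp, 'NA'])


print(classify('kitchen'))  # ['kitchen', 'rooms']
print(classify('dragon'))   # ['dragon', 'NA']
```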
You can invert that dict with:
>>> {s_v:k for k,v in words.items() for s_v in v}
{'summer': 'seasons', 'spring': 'seasons', 'autumn': 'seasons', 'winter': 'seasons', 'kitchen': 'rooms', 'loo': 'rooms', 'livingroom': 'rooms', 'hall': 'rooms', 'diningroom': 'rooms', 'bedroom': 'rooms', 'bee': 'insects', 'butterfly': 'insects', 'beetle': 'insects'}
And then lookup your input in the inverted dict.
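As a rough sketch of that lookup step: the helper name word_pair and the 'NA' fallback are assumptions taken from the question, and the sketch assumes each word belongs to a single category (with duplicates, the last category seen would win).

```python
words = {
    'seasons': ('summer', 'spring', 'autumn', 'winter'),
    'rooms': ('kitchen', 'loo', 'livingroom', 'hall', 'diningroom', 'bedroom'),
    'insects': ('bee', 'butterfly', 'beetle'),
}

# Build the reverse mapping once: each word points to its category
inverted = {s_v: k for k, v in words.items() for s_v in v}


def word_pair(word):
    # dict.get returns the fallback 'NA' when the word is unknown
    return [word, inverted.get(word, 'NA')]


print(word_pair('bee'))     # ['bee', 'insects']
print(word_pair('dragon'))  # ['dragon', 'NA']
```

Building the inverted dict costs one pass over all the words, after which each lookup is a single hash lookup.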
You can do it like this:
words={
'seasons':('summer','spring','autumn','winter'),
'rooms':('kitchen','loo','livingroom','hall','diningroom','bedroom'),
'insects':('bee','butterfly','beetle')}
def find_cat(word):
    for category, items in words.items():
        if word in items:
            return category
word=input('Enter a word: ')
print(find_cat(word))
Explanation: words.items() yields a (key, value) tuple for each key in the dictionary. In this case, value is a tuple of words, so we can use the in operator to check whether the word is in it. If it is, simply return the key.
You can iterate over the dictionary keys:
words={
'seasons':('summer','spring','autumn','winter'),
'rooms':('kitchen','loo','livingroom','hall','diningroom','bedroom'),
'insects':('bee','butterfly','beetle')}
search_text = input("input: ")
for key in words.keys():
    if search_text in words[key]:
        print("output: {0}".format([search_text,key]))
You would test using in rather than exact match (see word in value in the code below), and you'd probably also want to include some kind of check that there is only one matching key, and provided that there is, use the first one.
words = {
'seasons':('summer','spring','autumn','winter'),
'rooms':('kitchen','loo','livingroom','hall','diningroom','bedroom'),
'insects':('bee','butterfly','beetle')}
word = 'kitchen'
keys = [key for key, value in words.items() if word in value]
if len(keys) != 1:
    raise ValueError
word_pair = [word, keys[0]]
print(word_pair)
Gives:
['kitchen', 'rooms']
Another way to do it is to transform your input dictionary to invert the logic: make the values the keys, and the keys the values.
So a solution like this one:
def invert_dic(index):
    new_dic = {}
    for k,v in index.items():
        for x in v:
            new_dic.setdefault(x,[]).append(k)
    return new_dic
and then you'd do:
index = invert_dic(words) # linear time, O(n)
word = input('Enter a word: ')
categories = index.get(word, ['None']) # constant time, O(1)
print(f"Your word is in those categories: {', '.join(categories)}")
That solution is mimicking the concept of indexes in databases, where you spend time at the creation of the database (the words dictionary being converted as the index dictionary) and memory to store that index, to have very fast resolution when looking up a word using the hash algorithm (where looking a key is in constant time).
A bonus of the above solution is that if a word is in two categories, you'll get the list of all the categories your word is in.

concatenate two parts of a word in a list using iterator

I need to concatenate certain words that appear separated in a list of words, such as "computer" (below). These words appear separated in the list due to line breaks and I want to fix this.
lst=['love','friend', 'apple', 'com', 'puter']
the expected result is:
lst=['love','friend', 'apple', 'computer']
My code doesn't work. Can anyone help me to do that?
the code I am trying is:
from collections import defaultdict
import enchant
import string
words=['love', 'friend', 'car', 'apple',
       'com', 'puter', 'vi']
myit = iter(words)
dic=enchant.Dict('en_UK')
lst=[]
errors=[]
for i in words:
    if dic.check(i) is True:
        lst.append(i)
    if dic.check(i) is False:
        a= i + next(myit)
        if dic.check(a) is True:
            lst.append(a)
        else:
            continue
print (lst)
Notwithstanding the fact that this method is not very robust (you would miss "ham-burger", for example), the main error was that you didn't loop on the iterator, but on the list itself. Here is a corrected version.
Note that I renamed the variables to give them more expressive names, and I replaced the dictionary check with a simple word in dic test against a sample vocabulary. The module you import is not part of the standard library, which makes your code as-is difficult to run for those of us who don't have it.
dic = {'love', 'friend', 'car', 'apple',
       'computer', 'banana'}
words=['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
words_it = iter(words)
valid_words = []
for word in words_it:
    if word in dic:
        valid_words.append(word)
    else:
        try:
            concatenated = word + next(words_it)
            if concatenated in dic:
                valid_words.append(concatenated)
        except StopIteration:
            pass
print (valid_words)
# ['love', 'friend', 'car', 'apple', 'computer']
You need the try ... except part in case the last word of the list is not in the dictionary, as next() will raise a StopIteration in this case.
The main problem with your code is that you are, on the one hand, iterating over words in the for loop and, on the other hand, advancing the iterator myit. These two iterations are independent, so you cannot use next(myit) within your loop to get the word after i (also, if i is the last word there would be no next word). In addition, your problem can be complicated by the fact that the parts of a split word may themselves be in the dictionary (e.g. printable is a word, but so are print and able).
Assuming a simple scenario where split word parts are never in the dictionary, I think this algorithm could work better for you:
import enchant
words = ['love', 'friend', 'car', 'apple', 'com', 'puter', 'vi']
dic = enchant.Dict('en_UK')
lst = []
# The word that you are currently considering
current = ''
for i in words:
    # Add the next word
    current += i
    # If the current word is in the dictionary
    if dic.check(current):
        # Add it to the list
        lst.append(current)
        # Clear the current word
        current = ''
    # If the word is not in the dictionary we keep adding words to current
print(lst)

Maximum frequency value for word

Seeking help on Homework
I am given a list and asked to find the most frequently occurring value in it and return the number of times it occurs. This question is fairly big and I have managed to get through the other parts by myself, but this one stumped me. I should add that this is for an assignment, so any guidance would be appreciated.
Question Statement : Maximum (word) Frequency
For example in a book with the following words ['big', 'big', 'bat', 'bob', 'book'] the maximum frequency is 2, i.e., big is the most frequently occurring word, therefore 2 is the maximum frequency.
def maximum_frequency(new_list):
    word_counter = {}
    for word in new_list:
        if word in word_counter:
            word_counter[word] += 1
        else:
            word_counter[word] = 1
I have gotten this far but I am not sure if its right/where to go from here
Try this:
from collections import Counter
c = Counter(['big', 'big', 'bat', 'bob', 'book'])
max(c.items(), key=lambda x:x[1])
max returns the (word, count) pair with the highest count, so you can unpack it:
key,rate = max(c.items(), key=lambda x:x[1])
Here key will be big and rate will be 2.
You can also see the counts of all the items with dict(c), which gives
{'big': 2, 'bat': 1, 'bob': 1, 'book': 1}
Edit:
As schwobaseggl said, the best practice for finding the most frequent item in a Counter is to use most_common.
c.most_common(1)[0]
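Wrapped into the function from the question, a minimal sketch could look like this. Returning 0 for an empty list is my own assumption, since the assignment statement quoted above does not cover that case.

```python
from collections import Counter


def maximum_frequency(new_list):
    # Assumption: an empty list has a maximum frequency of 0
    if not new_list:
        return 0
    # most_common(1) returns a one-element list of (word, count) pairs
    word, count = Counter(new_list).most_common(1)[0]
    return count


print(maximum_frequency(['big', 'big', 'bat', 'bob', 'book']))  # 2
```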
You just need to count the occurrence of all the unique elements and compare the frequency with the previously computed frequency.
sample is a list of words.
def maxfreq(sample):
    m=0
    frequency=0
    word=''
    set_sample=list(set(sample))
    for i in range(len(set_sample)):
        c=sample.count(set_sample[i])
        if c>m:
            m=c
            frequency=m
            word=set_sample[i]
    return (frequency,word)
Since it sounds like this is some kind of challenge and/or homework you're supposed to be working on, instead of directly providing a code sample let me give you some concepts.
First off, the best way to know if you've seen a word or not is to use a map. In Python the term is "dict" and the syntax is simply {}; you can store values like this: my_dict['value'] = True, or whatever key/value you need.
So if you're going to read your words, one by one, and store them into this dict, then what should the value be? You know you want to know the maximum frequency, right? Well, so let's use that as our value. By default, if we add a word, we should make sure to set its initial value to 1 (we've seen it once). And if we see a word a second time, we then increment our frequency.
Now that you have a dict full of words and their frequencies, perhaps you might be able to figure out how to find the one with the largest frequency?
So that being said, things you should look into are:
How to determine if a key exists in a dict
How to modify the value of a key in a dict
How to iterate a dict's key/value pairs
After that, your answer should be pretty easy to figure out.
try this :
>>> MyList = ["above", "big", "above", "cat", "cat", "above", "cat"]
>>> my_dict = {i:MyList.count(i) for i in MyList}
>>> my_dict
{'above': 3, 'big': 1, 'cat': 3}
It can also be accomplished using collections.Counter, which is available in Python 2.7 and 3.x:
>>> from collections import Counter
>>> MyList = ['big', 'big', 'bat', 'bob', 'book']
>>> dict(Counter(MyList))
{'big': 2, 'bat': 1, 'bob': 1, 'book': 1}
If you are open to Pandas then it can be done as follows:
>>> import pandas as pd
>>> pd.Series(MyList).value_counts()
big 2
book 1
bob 1
bat 1
dtype: int64
# Answer to the OP's follow-up question in the comments: what if I wanted just the maximum value instead of the word?
>>> pd.Series(MyList).value_counts().max()
2
How about this:
def maximum_frequency(new_list):
    word_counter = {}
    for word in new_list:
        if word in word_counter:
            word_counter[word] += 1
        else:
            word_counter[word] = 1
    max_freq = max(word_counter.items(), key=(lambda x: x[1]))
    return max_freq
if __name__ == '__main__':
    test_data = ['big', 'big', 'bat', 'bob', 'book']
    print(maximum_frequency(test_data))
Output:
('big', 2)
Works fine with Python 2 and 3 and returns result as a tuple of most frequent word and occurrences count.
EDIT:
If you don't care at all which word has the highest count and you want only the frequency number you can simplify it a bit to:
def maximum_frequency(new_list):
    word_counter = {}
    for word in new_list:
        if word in word_counter:
            word_counter[word] += 1
        else:
            word_counter[word] = 1
    return max(word_counter.values())
if __name__ == '__main__':
    test_data = ['big', 'big', 'bat', 'bob', 'book']
    print(maximum_frequency(test_data))

Keeping number of hits in a dictionary

I have a list of (unique) words:
words = [store, worry, periodic, bucket, keen, vanish, bear, transport, pull, tame, rings, classy, humorous, tacit, healthy]
I want to crosscheck these against two different lists of lists (with the same range), while counting the number of hits.
l1 = [[terrible, worry, not], [healthy], [fish, case, bag]]
l2 = [[vanish, healthy, dog], [plant], [waves, healthy, bucket]]
I was thinking of using a dictionary and assume the word as the key, but would need two 'values' (one for each list) for the number of hits.
So the output would be something like:
{"store": [0, 0]}
{"worry": [1, 0]}
...
{"healthy": [1, 2]}
How would something like this work?
Thank you in advance!
You can use itertools to flatten the list and then use dictionary comprehension:
from itertools import chain
words = ['store', 'worry', 'periodic', 'bucket', 'keen', 'vanish', 'bear', 'transport', 'pull', 'tame', 'rings', 'classy', 'humorous', 'tacit', 'healthy']
l1 = [['terrible', 'worry', 'not'], ['healthy'], ['fish', 'case', 'bag']]
l2 = [['vanish', 'healthy', 'dog'], ['plant'], ['waves', 'healthy', 'bucket']]
l1 = list(chain(*l1))
l2 = list(chain(*l2))
final_count = {i:[l1.count(i), l2.count(i)] for i in words}
For your dictionary example, you would just need to iterate over each list and add those to the dictionary as so:
my_dict = {}
for word in l1:
    if word in words: #This makes sure you only work with words that are in your list of unique words
        if word not in my_dict:
            my_dict[word] = [0,0]
        my_dict[word][0] += 1
for word in l2:
    if word in words:
        if word not in my_dict:
            my_dict[word] = [0,0]
        my_dict[word][1] += 1
(Or you could turn that repeated code into a function that takes the list, the dictionary, and the index as parameters, so that you repeat fewer lines.)
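One way that helper might look is sketched below. The name count_hits and its parameter names are hypothetical, and the sample data is a shortened, flattened version of the lists in the question.

```python
def count_hits(word_list, wanted, counts, index):
    """Add hits from word_list to counts[word][index] for words in wanted."""
    for word in word_list:
        if word in wanted:
            # Create the [0, 0] pair the first time a word is seen
            counts.setdefault(word, [0, 0])[index] += 1


words = ['worry', 'bucket', 'vanish', 'healthy']
l1 = ['terrible', 'worry', 'not', 'healthy', 'fish', 'case', 'bag']
l2 = ['vanish', 'healthy', 'dog', 'plant', 'waves', 'healthy', 'bucket']

my_dict = {}
count_hits(l1, set(words), my_dict, 0)
count_hits(l2, set(words), my_dict, 1)
print(my_dict)
```

As in the loops above, words that never appear in either list get no entry at all. If you want them reported as [0, 0], you could start from {word: [0, 0] for word in words} instead of an empty dict.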
If your lists are 2d like in your example, then you just change the first iteration in the for loop to be 2d.
my_dict = {}
for group in l1:
    for word in group:
        if word in words:
            if word not in my_dict:
                my_dict[word] = [0,0]
            my_dict[word][0] += 1
for group in l2:
    for word in group:
        if word in words:
            if word not in my_dict:
                my_dict[word] = [0,0]
            my_dict[word][1] += 1
Though if you just want to know which words are in common, sets could be an option as well, since the set intersection operator makes it easy to view all the words in common. However, sets eliminate duplicates, so if the counts are necessary, sets aren't an option.
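For illustration, here is a small sketch of the set-based approach. It uses a shortened version of the word list from the question and the & (intersection) operator, and the variable names seen_in_l1 and seen_in_l2 are made up for this example.

```python
from itertools import chain

words = {'worry', 'bucket', 'vanish', 'healthy', 'store'}
l1 = [['terrible', 'worry', 'not'], ['healthy'], ['fish', 'case', 'bag']]
l2 = [['vanish', 'healthy', 'dog'], ['plant'], ['waves', 'healthy', 'bucket']]

# Flatten each list of lists into a set; duplicates collapse here
seen_in_l1 = set(chain.from_iterable(l1))
seen_in_l2 = set(chain.from_iterable(l2))

# & is set intersection: words of interest that appear in each list
print(sorted(words & seen_in_l1))               # ['healthy', 'worry']
print(sorted(words & seen_in_l2))               # ['bucket', 'healthy', 'vanish']
print(sorted(words & seen_in_l1 & seen_in_l2))  # ['healthy']
```

This only tells you whether a word occurs, not how often: 'healthy' appears twice in l2 but shows up once in the set.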

Creating a dictionary where the key is an integer and the value is the length of a random sentence

Super new to Python here, and I've been struggling with this code for a while now. Basically the function should return a dictionary with integers as keys, where the values are all the words whose length corresponds to each key.
So far I'm able to create a dictionary where the values are the total number of each word but not the actual words themselves.
So passing the following text
"the faith that he had had had had an affect on his life"
to the function
def get_word_len_dict(text):
    result_dict = {'1':0, '2':0, '3':0, '4':0, '5':0, '6' :0}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))] += 1
    return result_dict
returns
1 - 0
2 - 3
3 - 6
4 - 2
5 - 1
6 - 1
Where I need the output to be:
2 - ['an', 'he', 'on']
3 - ['had', 'his', 'the']
4 - ['life', 'that']
5 - ['faith']
6 - ['affect']
I think I need to have to return the values as a list. But I'm not sure how to approach it.
I think that what you want is a dict of lists.
def get_word_len_dict(text):
    result_dict = {'1':[], '2':[], '3':[], '4':[], '5':[], '6' :[]}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))].append(word)
    return result_dict
Fixing Sabian's answer so that duplicates aren't added to the list:
def get_word_len_dict(text):
    result_dict = {1:[], 2:[], 3:[], 4:[], 5:[], 6 :[]}
    for word in text.split():
        n = len(word)
        if n in result_dict and word not in result_dict[n]:
            result_dict[n].append(word)
    return result_dict
Check out list comprehensions
Integers are legal dictionaries keys so there is no need to make the numbers strings unless you want it that way for some other reason.
The if statement in the for loop ensures each word is added only once. You could get this effect automatically by using a set instead of a list as the value data structure. See more in the docs. I believe the following does the job:
def get_word_len_dict(text):
    result_dict = {len(word) : [] for word in text.split()}
    for word in text.split():
        if word not in result_dict[len(word)]:
            result_dict[len(word)].append(word)
    return result_dict
try to make it better ;)
Instead of defining the default value as 0, assign it as set() and within if condition do, result_dict[str(len(word))].add(word).
Also, instead of preassigning result_dict, you should use collections.defaultdict.
Since you need non-repetitive words, I am using set as value instead of list.
Hence, your final code should be:
from collections import defaultdict
def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        result_dict[str(len(word))].add(word)
    return result_dict
If you really need lists as values (I think a set should satisfy your requirement), you need to convert each one afterwards:
for key, value in result_dict.items():
    result_dict[key] = list(value)
What you need is a map-to-list construct, i.e. a dict whose values are lists (if there are not many words, otherwise a 'Counter' would be fine):
Each list stands for a word class (number of characters). The map is checked for whether the word class ('3') was found before. The list is checked for whether the word ('had') was found before.
def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        if not result_dict.get(str(len(word))): # add list to map?
            result_dict[str(len(word))] = []
        if word not in result_dict[str(len(word))]: # add word to list?
            result_dict[str(len(word))].append(word)
    return result_dict
-->
3 ['the', 'had', 'his']
2 ['he', 'an', 'on']
5 ['faith']
4 ['that', 'life']
6 ['affect']
The problem here is that you are counting the words by length, whereas you want to group them. You can achieve this by storing a collection of words instead of an int (a set here, to avoid duplicates):
def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        if len(word) in result_dict:
            result_dict[len(word)].add(word)
        else:
            result_dict[len(word)] = {word} #using a set instead of list to avoid duplicates
    return result_dict
Other improvements:
don't hardcode the key in the initialized dict but let it empty instead. Let the code add the new keys dynamically when necessary
you can use int as keys instead of strings, it will save you the conversion
use sets to avoid repetitions
Using groupby
Well, I'll try to propose something different: you can group by length using groupby from the python standard library
import itertools
def get_word_len_dict(text):
    # sort and group by length (you get an iterator of (key, group of values) tuples)
    groups = itertools.groupby(sorted(text.split(), key=len), key=len)
    # convert to a dictionary with sets
    return {l: set(words) for l, words in groups}
You say you want the keys to be integers but then you convert them to strings before storing them as a key. There is no need to do this in Python; integers can be dictionary keys.
Regarding your question, simply initialize the values of the keys to empty lists instead of the number 0. Then, in the loop, append the word to the list stored under the appropriate key (the length of the word), like this:
string = "the faith that he had had had had an affect on his life"
def get_word_len_dict(text):
    result_dict = {i : [] for i in range(1, 7)}
    for word in text.split():
        length = len(word)
        if length in result_dict:
            result_dict[length].append(word)
    return result_dict
This results in the following:
>>> get_word_len_dict(string)
{1: [], 2: ['he', 'an', 'on'], 3: ['the', 'had', 'had', 'had', 'had', 'his'], 4: ['that', 'life'], 5: ['faith'], 6: ['affect']}
If, as you mentioned, you wish to remove duplicate words from your input string, it seems elegant to use a set and convert to a list as a final processing step, if this is needed. Also note the use of defaultdict, so you don't have to manually initialize the dictionary keys and values: the default value set() (i.e. the empty set) gets inserted for each key that we try to access, and no other keys are created:
from collections import defaultdict
string = "the faith that he had had had had an affect on his life"
def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        length = len(word)
        result_dict[length].add(word)
    return {k : list(v) for k, v in result_dict.items()}
This gives the following output:
>>> get_word_len_dict(string)
{2: ['he', 'on', 'an'], 3: ['his', 'had', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}
Your code is counting the occurrence of each word length - but not storing the words themselves.
In addition to capturing each word into a list of words with the same size, you also appear to want:
If a word length is not represented, do not return an empty list for that length - just don't have a key for that length.
No duplicates in each word list
Each word list is sorted
A set container is ideal for accumulating the words - sets naturally eliminate any duplicates added to them.
Using defaultdict(set) will set up an empty dictionary of sets; a dictionary key will only be created if it is referenced in our loop that examines each word.
from collections import defaultdict
def get_word_len_dict(text):
    # create empty dictionary of sets
    d = defaultdict(set)
    # the key is the length of each word
    # The value is a growing set of words
    # sets automatically eliminate duplicates
    for word in text.split():
        d[len(word)].add(word)
    # the sets in the dictionary are unordered
    # so sort them into a new dictionary, which is returned
    # as a dictionary of lists
    return {i:sorted(d[i]) for i in d.keys()}
In your example string of
a="the faith that he had had had had an affect on his life"
Calling the function like this:
z=get_word_len_dict(a)
Returns the following dictionary:
print(z)
{2: ['an', 'he', 'on'], 3: ['had', 'his', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}
The type of each value in the dictionary is "list".
print(type(z[2]))
<class 'list'>
