Removal of Duplicates from a dictionary - python

I have built an anagram dictionary from a words.txt file, using the code below.
with open('words.txt', 'r') as read:
    line = read.readlines()
def make_anagram_dict(line):
    word_list = {}
    for word in line:
        word = word.lower()
        key = ''.join(sorted(word))
        if key in word_list and len(word) > 5 and word not in word_list:
            word_list[key].append(word)
        else:
            word_list[key] = [word]
    return word_list
if __name__ == '__main__':
    word_list = make_anagram_dict(line)
    for key, words in word_list.items():
        if len(words) > 1:
            print('Key value' + ' '*len(key) + '| words')
            print(key + ' '*len(key) + ':' + str(words))
            print('---------------------------------------------')
The output looks like this (a random excerpt):
Key value | words
hortwy :['worthy\n', 'wrothy\n']
---------------------------------------------
But I also get output like this (the duplicate part I am trying to fix):
Key value | words
eipprz :['zipper\n', 'zipper\n']
---------------------------------------------
The problem is that the words.txt file contains duplicates that differ only in the capital letter at the start, e.g. Zipper and zipper. The code therefore reports zipper as an anagram of itself, when it shouldn't. I tried to fix it with the part in bold. I would really appreciate any help!

Going from list to set and then back to list removes duplicates. To apply this to the values of a dictionary use a comprehension:
new_dict = {key: list(set(value)) for key, value in old_dict.items()}
Example
my_dict = {"abc": ["abc", "acb", "bac", "bca"],
           "aab": ["aab", "aba", "baa", "aab"]}
new_dict = {key: list(set(value)) for key, value in my_dict.items()}
print(new_dict)
> {'abc': ['abc', 'bca', 'bac', 'acb'], 'aab': ['aba', 'aab', 'baa']}
It would in fact be more efficient to just use a set from the start, i.e.:
if key in word_list and len(word) > 5 and word not in word_list:
    word_list[key].add(word)
else:
    word_list[key] = {word}
If you really need it as a list, you can convert back at the end. Note that a set does not preserve insertion order, so the words may come back in a different order.
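Putting the pieces together, here is a minimal sketch of one way to build the anagram dictionary without the case-only duplicates. It is not the asker's exact code: it strips the trailing newline, drops the len(word) > 5 filter for brevity, and uses a dict as an ordered set so the word order is kept (guaranteed from Python 3.7 on). The sample list is a made-up stand-in for words.txt.

```python
def make_anagram_dict(lines):
    """Group words by their sorted letters, ignoring case and duplicates."""
    groups = {}
    for raw in lines:
        word = raw.strip().lower()   # drop the trailing newline, fold case
        if not word:
            continue                 # skip blank lines
        key = ''.join(sorted(word))
        # A dict used as an ordered set: keys are unique and keep insertion order.
        groups.setdefault(key, {})[word] = None
    return {key: list(words) for key, words in groups.items()}


# Made-up sample standing in for the lines of words.txt.
sample = ['Zipper\n', 'zipper\n', 'worthy\n', 'wrothy\n']

# Keep only keys that have at least two distinct words, i.e. real anagrams.
anagrams = {k: v for k, v in make_anagram_dict(sample).items() if len(v) > 1}
print(anagrams)
```

With this sample, 'Zipper' and 'zipper' collapse into a single entry, so only the worthy/wrothy pair is reported.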

Related

Randomly pick a value from each list in a dictionary in python

I have the following code:
result = set()
with open("words.txt") as fd:
    for line in fd:
        matching_words = {word for word in line.lower().split() if len(word)==4 and "'" not in word}
        result.update(matching_words)
print(result)
print(len(result))
result_dict = {}
for word in result:
    result_dict[word[2:]] = result_dict.get(word[2:], []) + [word]
print(result_dict)
print({key: len(value) for key, value in result_dict.items()})
Output
This takes a .txt file, finds all the unique four-letter words, and excludes any that include an apostrophe. The words are then grouped by their last two characters. Each word ending is added to a dictionary, and the number of words with that ending is displayed as the value.
What I now need to do is disregard any list with fewer than 30 words in it.
Then randomly select one word from each of the remaining lists and print the list of words.
The following comprehension should work:
[random.choice(v) for v in result_dict.values() if len(v) >= 30]
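Alternatively, use random.choice with a list comprehension that limits the values given to it. Note that this picks a single key (a word ending) whose list has at least 30 words, rather than one word from each list: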
random.choice([k for k, v in result_dict.items() if len(v) >= 30])
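For completeness, here is a small self-contained sketch of the "one random word per sufficiently large group" approach. The function name pick_one_per_ending, the min_group_size parameter, and the sample words are all made up for illustration; the threshold is lowered to 2 so the tiny sample produces output.

```python
import random
from collections import defaultdict


def pick_one_per_ending(words, min_group_size=30):
    """Group words by their last two letters, then pick one random word
    from every group that has at least min_group_size members."""
    groups = defaultdict(list)
    for word in words:
        groups[word[2:]].append(word)   # word[2:] is the ending of a four-letter word
    return [random.choice(group)
            for group in groups.values()
            if len(group) >= min_group_size]


# Made-up sample data; a real run would use the words read from the file.
sample_words = ['band', 'hand', 'land', 'sand', 'bake', 'cake', 'bird']
picked = pick_one_per_ending(sample_words, min_group_size=2)
print(picked)   # one word ending in 'nd' and one ending in 'ke'
```

The 'rd' group has only one word, so it is skipped; the actual words chosen vary from run to run.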

I'm trying to append multiple values to a key in a dictionary in Python

I am trying to read a file and convert it into a dictionary. After reading, I have to take each word and use its first character as the key and the word itself as the value. If another word with the same first character comes up, it should be appended to the values of the existing key.
import io
file1 = open("text.txt")
line = file1.read()
words = line.split()
Dict={}
for w in words:
    if w[0] in Dict.keys():
        key1=w[0]
        wor=str(w)
        Dict.setdefault(key1,[])
        Dict[key1].append(wor)
    else:
        Dict[w[0]] = w
print(Dict)
I simplified your code. There is no point in having an else branch when using setdefault:
words = 'hello how are you'.split()
dictionary = {}
for word in words:
    key = word[0]
    dictionary.setdefault(key, []).append(word)
print(dictionary)
To avoid setdefault altogether, use defaultdict:
from collections import defaultdict
words = 'hello how are you'.split()
dictionary = defaultdict(list)
for word in words:
    key = word[0]
    dictionary[key].append(word)
print(dictionary.items())

In how many lists does the word appear?

I have different lists in python:
list1 = [hello,there,hi]
list2 = [my,name,hello]
I need to make a dictionary with the key being the number of lists a word appears in. So my answer would look like
{2:hello,1:hi ....}
I am new to python and I have no idea how to do this.
You need to use a dictionary to store key-value results.
Here is some code to help you get started, but you'll have to modify it to fit your exact problem.
#!/usr/bin/python
list1 = ["hello","there","hi"]
list2 = ["my","name","hello"]
result = dict()
for word in list1:
    if word in result:
        result[word] = result[word] + 1
    else:
        result[word] = 1
for word in list2:
    if word in result:
        result[word] = result[word] + 1
    else:
        result[word] = 1
print(result)
As a first step, build a dictionary of word counts. Initialize it:
words_count = {}
Then, for each list of words, do:
for word in list_of_words:
    if word not in words_count:
        words_count[word] = 1
    else:
        words_count[word] += 1
Then invert words_count:
inv_words_count = {v: k for k, v in words_count.items()}
inv_words_count is the desired result. Note that dictionary keys are unique, so if several words share the same count, only one of them survives the inversion.
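If the goal is to keep every word, one option is to map each count to a list of words instead of a single word. The sketch below also counts each word at most once per list, which is one reading of "the number of lists a word appears in". The function name words_by_list_count is made up for illustration.

```python
from collections import Counter, defaultdict


def words_by_list_count(*lists):
    """Map 'number of lists containing the word' -> sorted list of such words."""
    counts = Counter()
    for words in lists:
        counts.update(set(words))    # set(): count a word once per list
    grouped = defaultdict(list)
    for word, n in counts.items():
        grouped[n].append(word)
    return {n: sorted(ws) for n, ws in grouped.items()}


list1 = ["hello", "there", "hi"]
list2 = ["my", "name", "hello"]
result = words_by_list_count(list1, list2)
print(result)
```

Here 'hello' ends up under key 2 and the remaining four words under key 1.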
I have slightly modified your input lists (list1 & list2) as shown below:
list1 = ['hello,there,hi'] # Added quotes as it is a string
list2 = ['my,name,hello']
Here is the logic:
list1 = list1[0].split(',')
list2 = list2[0].split(',')
list_final = list1 + list2
dict_final = {}
for item in list_final:
    if item in dict_final.keys():
        dict_final.update({item:(dict_final.get(item) + 1)})
    else:
        dict_final.update({item:1})
Hope it will work as you are expecting :)

Append to List Nested in Dictionary

I am trying to append to lists nested in a dictionary so I can see which letters follow a letter. The result I would like to get is shown at the bottom. Why is this not matching up?
word = 'google'
word_map = {}
word_length = len(word)
last_letter = word_length - 1
for index, letter in enumerate(word):
    if index < last_letter:
        if letter not in word_map.keys():
            word_map[letter] = list(word[index+1])
        if letter in word_map.keys():
            word_map[letter].append(word[index+1])
    if index == last_letter:
        word_map[letter] = None
print(word_map)
desired_result = {'g':['o', 'l'], 'o':['o', 'g'], 'l':['e'],'e':None}
print(desired_result)
Use the standard library to your advantage:
from itertools import zip_longest  # Python 3 name; Python 2 called this izip_longest
from collections import defaultdict
s = 'google'
d = defaultdict(list)
for l1, l2 in zip_longest(s, s[1:], fillvalue=None):
    d[l1].append(l2)
print(d)
The first trick here is to yield the letters pair-wise (with a None at the end). That's exactly what zip_longest(s, s[1:], fillvalue=None) does. From there, it's a simple matter of appending the second letter to the dictionary list which corresponds to the first character. The defaultdict lets us avoid all the tests that check whether the key is already in the dict. Note that the last letter ends up with None appended to its list (so 'e' maps to [None]) rather than mapping to None itself.
if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
# now letter IS in word_map, so this also executes:
if letter in word_map.keys():
    word_map[letter].append(word[index+1])
You meant:
if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
else:
    word_map[letter].append(word[index+1])
Another thing: what if the last letter also occurs in the middle of the word?
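One possible way to handle that case is sketched below: collect followers for every adjacent pair, and give the last letter None only if it has no followers from earlier in the word. The function name followers is made up for illustration, and treating a repeated last letter this way is an assumption about what the asker would want.

```python
def followers(word):
    """Map each letter to the list of letters that follow it in word.
    The final letter maps to None only if it never occurs earlier."""
    word_map = {}
    for current, nxt in zip(word, word[1:]):
        word_map.setdefault(current, []).append(nxt)
    if word:
        # setdefault leaves an existing follower list untouched.
        word_map.setdefault(word[-1], None)
    return word_map


print(followers('google'))
print(followers('gag'))   # 'g' is last but also occurs earlier, so it keeps ['a']
```

For 'google' this matches the desired_result from the question.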

Not working: indexing the words in a file in a dict by first letter

I have to write a function that takes an open file containing one lowercase word per line. It has to return a dictionary whose keys are single lowercase letters and whose values are lists of the words from the file that start with that letter. (The keys in the dictionary come only from the letters of the words that appear in the file.)
This is my code:
def words(file):
    line = file.readline()
    dict = {}
    list = []
    while (line != ""):
        list = line[:].split()
        if line[0] not in dict.keys():
            dict[line[0]] = list
        line = file.readline()
    return dict
However, when I was testing it myself, my function doesn't seem to return all the values. If there are more than two words that start with a certain letter, only the first one shows up as the values in the output. What am I doing wrong?
For example, the file should return:
{'a': ['apple'], 'p': ['peach', 'pear', 'pineapple'], \
'b': ['banana', 'blueberry'], 'o': ['orange']}, ...
... but returns ...
{'a': ['apple'], 'p': ['pear'], \
'b': ['banana'], 'o': ['orange']}, ...
Try this solution, it takes into account the case where there are words starting with the same character in more than one line, and it doesn't use defaultdict. I also simplified the function a bit:
def words(file):
    dict = {}
    for line in file:
        lst = line.split()
        dict.setdefault(line[0], []).extend(lst)
    return dict
You aren't adding to the list for each additional word that starts with the same letter. Try:
if line[0] not in dict.keys():
    dict[line[0]] = list
else:
    dict[line[0]] += list
The specific problem is that dict[line[0]] = list replaces the value for the new key. There are many ways to fix this... I'm happy to provide one, but you asked what was wrong and that's it. Welcome to Stack Overflow.
It seems like every dictionary entry should be a list. Use the append method on the list stored under the dictionary key.
Sacrificing performance (to a certain extent) for elegance:
with open(whatever) as f: words = f.read().split()
result = {
    first: [word for word in words if word.startswith(first)]
    for first in set(word[0] for word in words)
}
Something like this should work
def words(file):
    dct = {}
    for line in file:
        word = line.strip()
        try:
            dct[word[0]].append(word)
        except KeyError:
            dct[word[0]] = [word]
    return dct
The first time a new letter is found, there will be a KeyError; subsequent occurrences of the letter will cause the word to be appended to the existing list.
Another approach would be to prepopulate the dict with the keys you need
import string
def words(file):
    dct = dict.fromkeys(string.ascii_lowercase, [])
    for line in file:
        word = line.strip()
        dct[word[0]] = dct[word[0]] + [word]
    return dct
I'll leave it as an exercise to work out why dct[word[0]] += [word] won't work
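For readers who want to check their answer to that exercise, here is a small sketch of the underlying behaviour: dict.fromkeys stores the same list object under every key, and += mutates that list in place. The names shared and separate are made up for illustration.

```python
# dict.fromkeys reuses ONE list object as the value for every key.
shared = dict.fromkeys('ab', [])
shared['a'] += ['apple']               # in-place extend of the shared list
print(shared)                          # both keys now show ['apple']
print(shared['a'] is shared['b'])      # True: same object under both keys

# A dict comprehension creates a fresh list per key, so += is safe here.
separate = {letter: [] for letter in 'ab'}
separate['a'] += ['apple']
print(separate)                        # only 'a' holds the word
```

By contrast, dct[key] = dct[key] + [word] builds a new list each time, which is why the version in the answer works.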
Try this function
def words(file):
    dict = {}
    line = file.readline()
    while (line != ""):
        my_key = line[0].lower()
        dict.setdefault(my_key, []).extend(line.split() )
        line = file.readline()
    return dict
