Checking keys in a defaultdict - python

I have this code that should run through the keys in the python defaultdict, and if the key isn't in the defaultdict, it gets added.
I'm getting an error that I don't encounter with regular defined dictionaries, and I'm having a bit of trouble working it out:
The code:
from collections import defaultdict
def counts(line):
    for word in line.split():
        if word not in defaultdict.keys():
            word = "".join(c for c in word if c not in ('!', '.', ':', ','))
            defaultdict[word] = 0
        if word != "--":
            defaultdict[word] += 1
The error:
if word not in defaultdict.keys():
TypeError: descriptor 'keys' of 'dict' object needs an argument

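You did not construct a defaultdict object here; you simply refer to the defaultdict class. That is why defaultdict.keys() fails: it is called on the class itself, with no instance to operate on, hence the TypeError.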
You can create one, like:
from collections import defaultdict
def counts(line):
    dd = defaultdict(int)
    for word in line.split():
        word = ''.join(c for c in word if c not in ('!', '.', ':', ','))
        if word not in dd:
            dd[word] = 0
        if word != '--':
            dd[word] += 1
    return dd
That being said, you probably want to use a Counter here, like:
from collections import Counter
def counts(line):
    words = (
        ''.join(c for c in word if c not in ('!', '.', ':', ',')) for word in line.split()
    )
    return Counter(
        word for word in words if word != '--'
    )
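As a side note on the defaultdict version above: with an int factory, a missing key is created with the value 0 on first access, so the explicit membership check is optional. A minimal sketch, assuming the same punctuation set as in the question (the name count_words and the sample sentence are made up for illustration):

```python
from collections import defaultdict

def count_words(line):
    # defaultdict(int) calls int() for a missing key, which yields 0,
    # so "+= 1" works even the first time a word is seen.
    counts = defaultdict(int)
    for word in line.split():
        word = ''.join(c for c in word if c not in ('!', '.', ':', ','))
        # Skip the "--" token and anything that was pure punctuation.
        if word and word != '--':
            counts[word] += 1
    return counts

result = count_words("to be, or not to be.")
print(dict(result))
```

The extra `if word` guard is an addition of this sketch; it keeps empty strings out of the result when a token consists only of stripped characters.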

defaultdict is a class; you need an object:
from collections import defaultdict
def counts(line, my_dict):
    for word in line.split():
        # Strip punctuation before the membership test, so that "man." does not reset the count for "man"
        word = "".join(c for c in word if c not in ('!', '.', ':', ','))
        if word not in my_dict:
            my_dict[word] = 0
        if word != "--":
            my_dict[word] += 1
my_dict = defaultdict()
counts("Now is the time for all good parties to come to the aid of man.", my_dict)
print(my_dict)
Output:
defaultdict(None, {'Now': 1, 'is': 1, 'the': 2, 'time': 1, 'for': 1, 'all': 1, 'good': 1, 'parties': 1, 'to': 2, 'come': 1, 'aid': 1, 'of': 1, 'man': 1})
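One caveat worth keeping in mind about the output above: a defaultdict created without a factory (shown as defaultdict(None, ...)) behaves like a plain dict for missing keys, which is why the code has to assign 0 explicitly. A small sketch of the difference; the variable names are made up for illustration:

```python
from collections import defaultdict

no_factory = defaultdict()        # default_factory is None
with_factory = defaultdict(int)   # missing keys start at int() == 0

try:
    no_factory['missing'] += 1    # no factory, so this is a KeyError
    raised = False
except KeyError:
    raised = True

with_factory['missing'] += 1      # the key is created as 0, then incremented

print(raised, with_factory['missing'])
```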

Related

How to use list comprehensions for for-loop with additional operations

I want to simplify this construction with list comprehensions:
words = {}
counter = 0
for sentence in text:
    for word in sentence:
        if word not in words:
            words[word] = counter
            counter += 1
If there was something like post-increment, it could be written like:
words = {word: counter++ for sentence in text for word in sentence if word not in words}
How should I do it in a Pythonic way?
For example:
text = [
    ['aaa', 'bbb', 'ccc'],
    ['bbb', 'ddd'],
    ['aaa', 'ccc', 'eee']
]
Desired result:
words = {'aaa': 1, 'bbb': 2, 'ccc': 3, 'ddd': 4, 'eee': 5}
Order does not matter.
UPD:
I found an interesting solution:
words = {}
counter = (x for x in range(10**6))
[words.update({word: next(counter)}) for sentence in text for word in sentence if word not in words]
Calling update inside the comprehension lets the if clause see the words that were already added.
Maybe I should use len(words) instead of next(counter), but I thought that the counter would be faster (O(1) vs. O(dict_size)).
There are many ways to do this. Here is a one-liner that uses no external modules:
s = "a a a b b a a b a b a b"
d = [[(out, out.update([(v, out.get(v, 0) + 1)])) for v in s.split()] for out in [{}]][0][0][0]
print(d)
Prints:
{'a': 7, 'b': 5}
Since you are using a dictionary, you should use its setdefault method; it makes this kind of task trivial.
words = {}
for sentence in text:
    for word in sentence:
        words[word] = words.setdefault(word, 0) + 1
If you really want to use list comprehension, this one works:
def countWords(content):
    allWords = [word for words in content for word in words]
    return {word: allWords.count(word) for word in set(allWords)}
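Note that list.count rescans the whole list for every distinct word, so the comprehension above does quadratic work on large inputs. A single-pass sketch that swaps in collections.Counter for the dict comprehension (count_words is a hypothetical name, and the sample data is taken from the question):

```python
from collections import Counter
from itertools import chain

def count_words(content):
    # chain.from_iterable flattens the list of sentences into one stream
    # of words; Counter tallies them in a single pass.
    return dict(Counter(chain.from_iterable(content)))

text = [
    ['aaa', 'bbb', 'ccc'],
    ['bbb', 'ddd'],
    ['aaa', 'ccc', 'eee'],
]
word_counts = count_words(text)
print(word_counts)
```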
This was kinda fun to play with. You really can't do it in one line (and that's ok, 1 line solutions aren't always the best), but you can do it with all comprehensions.
d={}
s = "a a a b b a a b a b a b"
x = [(word, 1) for word in s.split()]
d = {word: sum(cnt for w,cnt in x if w == word) for word,_ in x if not word in d.keys()}
d is the destination dictionary that will hold the word counts. s is one of the sentences (you can expand this to extract at more than one level if you have a list of sentences). x is an intermediate list that holds a pair for each word that is ('word', 1), and then we use that to sum across the pairs to get the final count.
At the end, the values of x and d are:
>>> x
[('a', 1), ('a', 1), ('a', 1), ('b', 1), ('b', 1), ('a', 1), ('a', 1), ('b', 1), ('a', 1), ('b', 1), ('a', 1), ('b', 1)]
>>> d
{'a': 7, 'b': 5}
You can't initialise variables inside a list/dictionary comprehension. But you could always do it in two steps, using a list comprehension and then a dictionary comprehension:
import numpy as np
# We list all the words in the text (outer loop first, then inner loop)
list_words = [word for sentence in text for word in sentence]
# Using numpy's unique function and the count()
# function we use a dictionary comprehension
dict_words = {word : list_words.count(word) for word in np.unique(list_words)}
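The answers above count occurrences, whereas the desired result in the question is a running index per unique word. A minimal sketch of that, assuming Python 3.7 or later so that dict.fromkeys keeps first-seen order (the name unique_words is made up for illustration):

```python
text = [
    ['aaa', 'bbb', 'ccc'],
    ['bbb', 'ddd'],
    ['aaa', 'ccc', 'eee'],
]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_words = dict.fromkeys(word for sentence in text for word in sentence)

# enumerate supplies the counter that the question wanted as "counter++".
words = {word: index for index, word in enumerate(unique_words, start=1)}
print(words)
```

This avoids mutating a dictionary from inside a comprehension, at the cost of one intermediate dict.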

Counting the Vowels at the End of a Word

Write a function named vowelEndings that takes a string, text, as a parameter.
The function vowelEndings returns a dictionary d in which the keys are all the vowels that are the last letter of some word in text. The letters a, e, i, o and u are vowels. No other letter is a vowel. The value corresponding to each key in d is a list of all the words ending with that vowel. No word should appear more than once in a given list. All of the letters in text are lower case.
The following is an example of correct output:
>>> t = 'today you are you there is no one alive who is you-er than you'
>>> vowelEndings(t)
{'u': ['you'], 'o': ['no', 'who'], 'e': ['are', 'there', 'one', 'alive']}
This is what I have so far:
def vowelEndings(text):
    vowels = 'aeiouAEIOU'
    vowelCount = 0
    words = text.split()
    for word in words:
        if word[0] in vowels:
            vowelCount += 1
    return vowelCount
t = 'today you are you there is no one alive who is you-er than you'
print(vowelEndings(t))
Output:
5
What it is doing is counting the vowels at the beginning of each word, but it should be looking at the vowel at the end of each word. Also, it should print out each vowel and the words that end with that vowel, as in the question. I need help with that.
You are close. The missing aspects are:
To extract the last letter, use word[-1].
You need to create a dictionary with vowel keys.
The dictionary values should be sets, to avoid duplicates.
The classic Python solution is to use collections.defaultdict:
from collections import defaultdict
t = 'today you are you there is no one alive who is you-er than you'
def vowelEndings(text):
    vowels = set('aeiou')
    d = defaultdict(set)
    for word in text.split():
        final = word[-1]
        if final in vowels:
            d[final].add(word)
    return d
print(vowelEndings(t))
defaultdict(set,
{'e': {'alive', 'are', 'one', 'there'},
'o': {'no', 'who'},
'u': {'you'}})
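The assignment asks for lists rather than sets, and sets do not keep the order in which words first appeared. A sketch of a variant that returns a plain dict of lists, under the assumption that first-appearance order is wanted (vowel_endings is a hypothetical name, not part of the assignment):

```python
def vowel_endings(text):
    vowels = set('aeiou')
    d = {}
    for word in text.split():
        final = word[-1]
        if final in vowels:
            # setdefault creates the list the first time a vowel is seen.
            words = d.setdefault(final, [])
            # The membership test keeps each word from appearing twice.
            if word not in words:
                words.append(word)
    return d

t = 'today you are you there is no one alive who is you-er than you'
endings = vowel_endings(t)
print(endings)
```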

Removing punctuation and creating a dictionary Python

I'm trying to create a function that removes punctuation and lowercases every letter in a string. Then, it should return all this in the form of a dictionary that counts the word frequency in the string.
This is the code I wrote so far:
def word_dic(string):
    string = string.lower()
    new_string = string.split(' ')
    result = {}
    for key in new_string:
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    for c in result:
        "".join([ c if not c.isalpha() else "" for c in result])
    return result
But this is what I'm getting after executing it:
{'am': 3,
'god!': 1,
'god.': 1,
'i': 2,
'i?': 1,
'thanks': 1,
'to': 1,
'who': 2}
I just need to remove the punctuation at the end of the words.
Another option is to use Python's famous "batteries included" standard library.
>>> import re
>>> from collections import Counter
>>> sentence = 'Is this a test? It could be!'
>>> Counter(re.sub(r'\W', ' ', sentence.lower()).split())
Counter({'a': 1, 'be': 1, 'this': 1, 'is': 1, 'it': 1, 'test': 1, 'could': 1})
Leverages collections.Counter for counting words, and re.sub for replacing everything that's not a word character.
"".join([ c if not c.isalpha() else "" for c in result]) creates a new string without the punctuation, but it doesn't do anything with it; it's thrown away immediately, because you never store the result.
Really, the best way to do this is to normalize your keys before counting them in result. For example, you might do:
for key in new_string:
    # Keep only the alphabetic parts of each key, and replace key for future use
    key = "".join([c for c in key if c.isalpha()])
    if key in result:
        result[key] += 1
    else:
        result[key] = 1
Now result never has keys with punctuation (and the counts for "god." and "god!" are summed under the key "god" alone), and there is no need for another pass to strip the punctuation after the fact.
Alternatively, if you only care about trailing punctuation on each word (so "it's" should be preserved as is, not converted to "its"), you can simplify a lot further. Simply import string (and rename the function's string parameter so that it does not shadow the module), then change:
key = "".join([c for c in key if c.isalpha()])
to:
key = key.rstrip(string.punctuation)
This matches what you specifically asked for in your question (remove punctuation at the end of words, but not at the beginning or embedded within the word).
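Putting the advice above together, a complete version might look like the sketch below. It assumes only trailing punctuation should go, renames the parameter to text so the string module is not shadowed, and skips tokens that become empty (that last step is an addition of this sketch, not something the question asked for):

```python
from string import punctuation

def word_dic(text):
    result = {}
    for key in text.lower().split():
        # Normalize before counting: drop punctuation at the end only.
        key = key.rstrip(punctuation)
        if not key:
            continue  # the token was nothing but punctuation
        result[key] = result.get(key, 0) + 1
    return result

frequencies = word_dic("God! God. It's -- fine")
print(frequencies)
```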
You can use string.punctuation to recognize punctuation and collections.Counter to count occurrences once the string is correctly decomposed.
from collections import Counter
from string import punctuation
line = "It's a test and it's a good ol' one."
Counter(word.strip(punctuation) for word in line.casefold().split())
# Counter({"it's": 2, 'a': 2, 'test': 1, 'and': 1, 'good': 1, 'ol': 1, 'one': 1})
Using str.strip instead of str.replace preserves words such as It's.
The method str.casefold is a more aggressive form of str.lower, intended for caseless matching.
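To illustrate the difference between the two methods, here is a small sketch using the German word "Straße" (the sample word and the variable names are chosen here for illustration):

```python
word = "Straße"

lowered = word.lower()      # 'ß' is already lowercase, so it is kept
folded = word.casefold()    # casefold maps 'ß' to 'ss'

print(lowered)
print(folded)

# With casefold, the two spellings compare equal regardless of case.
same = "STRASSE".casefold() == word.casefold()
print(same)
```

For plain ASCII text, the two methods give the same result.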
If you want to reuse the words later, you can store each one in a sub-dictionary along with its number of occurrences, so that each word has its own entry in the result. We can write our own function to remove punctuation; it is pretty simple.
See if the code below serves your needs:
def remove_punctuation(word):
    for c in word:
        if not c.isalpha():
            word = word.replace(c, '')
    return word
def word_dic(s):
    words = s.lower().split(' ')
    result = {}
    for word in words:
        word = remove_punctuation(word)
        if not result.get(word, None):
            result[word] = {
                'word': word,
                'ocurrences': 1,
            }
            continue
        result[word]['ocurrences'] += 1
    return result
phrase = 'Who am I and who are you? Are we gods? Gods are we? We are what we are!'
print(word_dic(phrase))
and you'll have an output like this:
{
    'who': {'word': 'who', 'ocurrences': 2},
    'am': {'word': 'am', 'ocurrences': 1},
    'i': {'word': 'i', 'ocurrences': 1},
    'and': {'word': 'and', 'ocurrences': 1},
    'are': {'word': 'are', 'ocurrences': 5},
    'you': {'word': 'you', 'ocurrences': 1},
    'we': {'word': 'we', 'ocurrences': 4},
    'gods': {'word': 'gods', 'ocurrences': 2},
    'what': {'word': 'what', 'ocurrences': 1}
}
Then you can easily access each word and its occurrence count by doing:
word_dic(phrase)['are']['word'] # output: are
word_dic(phrase)['are']['ocurrences'] # output: 5

How to build a dict that maps each word to a list of the words which follow it in Python?

What I am trying to do:
Build a dict that maps each word that appears in the file to a list of all the words that immediately follow that word in the file. The list of words can be in any order and should include duplicates. So, for example, the key "and" might have the list ["then", "best", "then", "after", ...], listing all the words which came after "and" in the text.
f = open(filename,'r')
s = f.read().lower()
words = s.split()#list of words in the file
dict = {}
l = []
i = 0
for word in words:
    if i < (len(words)-1) and word == words[i]:
        dict[word] = l.append(words[i+1])
print(dict.items())
sys.exit(0)
collections.defaultdict is helpful for such iterations. For simplicity, I've invented a string rather than loading one from a file.
from collections import defaultdict
import string
x = '''This is a random string with some
string elements repeated. This is so
that, with someluck, we can solve a problem.'''
translator = str.maketrans('', '', string.punctuation)
# split() with no argument splits on any whitespace, including the newlines
y = x.lower().translate(translator).split()
result = defaultdict(list)
# Pair each word with the word that follows it
for i, j in zip(y, y[1:]):
    result[i].append(j)
# result
# defaultdict(list,
# {'a': ['random', 'problem'],
# 'can': ['solve'],
# 'elements': ['repeated'],
# 'is': ['a', 'so'],
# 'random': ['string'],
# 'repeated': ['this'],
# 'so': ['that'],
# 'solve': ['a'],
# 'some': ['string'],
# 'someluck': ['we'],
# 'string': ['with', 'elements'],
# 'that': ['with'],
# 'this': ['is', 'is'],
# 'we': ['can'],
# 'with': ['some', 'someluck']})
You can use defaultdict for this:
from collections import defaultdict
words = ["then", "best", "then", "after"]
words_dict = defaultdict(list)
for w1,w2 in zip(words, words[1:]):
    words_dict[w1].append(w2)
Results:
defaultdict(<class 'list'>, {'then': ['best', 'after'], 'best': ['then']})
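The same pairing idea also works with a plain dict and setdefault, if importing defaultdict is not wanted. A minimal sketch; follow_map and the sample sentence are made up for illustration, and reading the text from a file is left out:

```python
def follow_map(text):
    words = text.lower().split()
    result = {}
    # zip(words, words[1:]) yields each word with its immediate successor.
    for current, following in zip(words, words[1:]):
        # setdefault creates the list on first use; duplicates are kept.
        result.setdefault(current, []).append(following)
    return result

sample = "and then the best and then the rest"
followers = follow_map(sample)
print(followers)
```

The last word of the text only gets a key of its own if it also appears earlier, since nothing follows it.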

How to create a sentence from a dictionary

I am trying to make some code where the user inputs a sentence, the sentence is turned into a dict and then the dict is used to get the original sentence back.
Code:
import json
def code():
    sentence = input("Please write a sentence: ")
    dictionary = {v: k for k,v in enumerate(sentence.split(), start=1)}
    with open('Dict.txt', 'w') as fp:
        json.dump(dictionary, fp)
    print(dictionary)
    puncList = ["{","}",",",":","'","[","]","1","2","3","4","5"]
    for i in puncList:
        for sentence in dictionary:
            dictionary=[sentence.replace(i," ") for sentence in dictionary]
    print(' '.join(dictionary))
code()
Input:
Hello my name is Bob
Actual output:
{'Hello' : '1', 'name' : '3', 'Bob' : '5', 'my' : '2', 'is' : '4'}
Hello name Bob my is
Desired output:
{'Hello' : '1', 'name' : '3', 'Bob' : '5', 'my' : '2', 'is' : '4'}
Hello my name is Bob
This would be fine too:
{'Hello' : '1', 'my' : '2', 'name' : '3', 'is' : '4', 'Bob' : '5'}
Hello my name is Bob
For the part where I recreate the original sentence, it can't just print the sentence; it has to be rebuilt from the dict.
You need to either use OrderedDict to retain the element order, or sort the dictionary elements before you print them out. You've already got an OrderedDict answer, so here's how to use the dict you created:
print(' '.join(k for (k, v) in sorted(dictionary.items(), key=lambda x: x[1])))
Incidentally, your approach has a bug: If you apply it to a sentence with repeated words, e.g., "boys will be boys", you'll find that there's no element with index 1 in your dictionary since (boys, 4) will overwrite (boys, 1).
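The repeated-word problem is easy to reproduce. A short sketch using the "boys will be boys" example, with the dictionary built the same way as in the question and then rebuilt by sorting on the stored positions (the variable names are chosen here for illustration):

```python
sentence = "boys will be boys"

# Same construction as in the question: word -> position.
dictionary = {word: index for index, word in enumerate(sentence.split(), start=1)}
print(dictionary)

# Rebuild by sorting on the stored position.
rebuilt = ' '.join(k for k, v in sorted(dictionary.items(), key=lambda item: item[1]))
print(rebuilt)
```

The first "boys" is lost because a dict holds one value per key, so the rebuilt sentence has only three words.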
Use an OrderedDict on enumerate, like so:
from collections import OrderedDict
s = "Hello my name is Bob"
d = OrderedDict((v, i) for i, v in enumerate(s.split(), 1))
print(d)
# OrderedDict([('Hello', 1), ('my', 2), ('name', 3), ('is', 4), ('Bob', 5)])
s_rebuild = ' '.join(d)
print(s_rebuild)
# 'Hello my name is Bob'
Since the dictionary is already ordered, the values are not used for rebuilding the string.
Your logic is flawed in that it can't handle sentences with repeated words:
Hello Bob my name is Bob too
{'name': 4, 'Hello': 1, 'Bob': 6, 'is': 5, 'too': 7, 'my': 3}
name Hello Bob is too my
We can deal with that using a defaultdict, making each value a list of word positions instead of a single number. We can further improve things by dealing with your punch list up front via a split. Finally, we can reconstruct the original sentence using a pair of nested loops. We don't want/need an OrderedDict, or sorting, to do this:
import re
import json
from collections import defaultdict
PUNCH_LIST = r"[ {},:'[\]1-5]+"
def code():
    dictionary = defaultdict(list)
    sentence = input("Please write a sentence: ")
    for position, word in enumerate(re.split(PUNCH_LIST, sentence), start=1):
        dictionary[word].append(position)
    with open('Dict.txt', 'w') as fp:
        json.dump(dictionary, fp)
    print(dictionary)
    position = 1
    sentence = []
    while position:
        for word, positions in dictionary.items():
            if position in positions:
                sentence.append(word)
                position += 1
                break
        else:
            position = 0
    print(' '.join(sentence))
code()
EXAMPLE:
Please write a sentence: Hello Bob, my name is Bob too
defaultdict(<class 'list'>, {'is': [5], 'too': [7], 'Bob': [2, 6], 'Hello': [1], 'name': [4], 'my': [3]})
Hello Bob my name is Bob too
Where Dict.txt contains:
{"is": [5], "too": [7], "Bob": [2, 6], "Hello": [1], "name": [4], "my": [3]}
Note that the defaultdict is a convenience, not a requirement. A plain dictionary will do, but you'll have to initialize the lists for each key.
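As an illustration of the plain-dictionary variant, here is a sketch that uses setdefault to initialize the lists. Note that it rebuilds the sentence by sorting (position, word) pairs, which is a different technique from the nested loops above, and it leaves out the file output and punctuation handling; encode and decode are hypothetical names:

```python
def encode(sentence):
    positions = {}
    for position, word in enumerate(sentence.split(), start=1):
        # setdefault plays the role of the defaultdict(list) factory.
        positions.setdefault(word, []).append(position)
    return positions

def decode(positions):
    # Flatten to (position, word) pairs and sort by position.
    pairs = sorted((p, word) for word, ps in positions.items() for p in ps)
    return ' '.join(word for _, word in pairs)

original = "Hello Bob my name is Bob too"
encoded = encode(original)
print(encoded)
print(decode(encoded))
```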
