My function is supposed to take two parameters:
The first is a tweet (a string).
This tweet can contain numbers, words, hashtags, links and punctuation.
The second is a dictionary that counts the words in the tweet, disregarding any hashtags, mentions, links and punctuation.
The function should store each individual word in the dictionary in lowercase, without any punctuation.
For example, if the tweet contained Don't, the dictionary would count it as dont.
Here is my function:
def count_words(tweet, num_words):
    ''' (str, dict of {str: int}) -> None
    Return a NoneType that updates the count of words in the dictionary.
    >>> count_words('We have made too much progress', num_words)
    >>> num_words
    {'we': 1, 'have': 1, 'made': 1, 'too': 1, 'much': 1, 'progress': 1}
    >>> count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", num_words)
    >>> num_words
    {'dont': 1, 'wish': 1, 'you': 2, 'could': 1, 'vote': 1}
    >>> count_words('I am fighting for you! #FollowTheMoney', num_words)
    >>> num_words
    {'i': 1, 'am': 1, 'fighting': 1, 'for': 1, 'you': 1}
    >>> count_words('', num_words)
    >>> num_words
    {'': 0}
    '''
I might misunderstand your question, but if you want to update the dictionary you can do it in this manner:
d = {}
def update_dict(tweet):
    for i in tweet.split():
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d
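The loop above counts every whitespace-separated token, so hashtags, links and punctuation still end up in the dictionary. Here is a minimal sketch of how count_words from the question could skip them. The prefixes used to recognise hashtags, mentions and links are an assumption, and only ASCII punctuation (string.punctuation) is removed:

```python
import string

def count_words(tweet, num_words):
    """Update num_words in place with counts of the plain words in tweet."""
    for token in tweet.split():
        # Assumed markers for hashtags, mentions and links; adjust as needed.
        if token.startswith(('#', '@', 'http://', 'https://', 'www.')):
            continue
        # Lowercase and drop ASCII punctuation, so "Don't" becomes "dont".
        word = ''.join(ch for ch in token.lower() if ch not in string.punctuation)
        if word:  # a token made only of punctuation leaves nothing to count
            num_words[word] = num_words.get(word, 0) + 1

num_words = {}
count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", num_words)
print(num_words)
```

Because the dictionary is passed in and modified in place, the function returns None, and calling it again with the same dictionary keeps adding to the existing counts.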
I just started learning python and found this snippet. It's supposed to count how many times a word appears. I guess, for all of you this will seem very logical, but unfortunately for me, it doesn't make any sense.
str = "house is where you live, you don't leave the house."
dict = {}
list = str.split(" ")
for word in list:  # Loop over the list
    if word in dict:  # How can I loop over the dictionary if it's empty?
        dict[word] = dict[word] + 1
    else:
        dict[word] = 1
So, my question here is, how can I loop over the dictionary? Shouldn't the dictionary be empty because I didn't pass anything inside?
Maybe I am not smart enough, but I don't see the logic. Can anybody explain to me how it works?
Many thanks
As somebody else pointed out, str, dict, and list shouldn't be used as variable names, because they are the names of Python's built-in types. For example, str(33) turns the number 33 into the string "33". Python will let you reuse these names for your own variables, but doing so hides the built-ins for the rest of that scope (a later call to str(33) would fail), so to avoid confusion you really should use something else. So here's the same code with different variable names, plus some print statements at the end of the loop:
mystring = "house is where you live, you don't leave the house."
mydict = {}
mylist = mystring.split(" ")
for word in mylist:  # Loop over the list
    if word in mydict:
        mydict[word] = mydict[word] + 1
    else:
        mydict[word] = 1
    print("\nmydict is now:")
    print(mydict)
If you run this, you'll get the following output:
mydict is now:
{'house': 1}
mydict is now:
{'house': 1, 'is': 1}
mydict is now:
{'house': 1, 'is': 1, 'where': 1}
mydict is now:
{'house': 1, 'is': 1, 'where': 1, 'you': 1}
mydict is now:
{'house': 1, 'is': 1, 'live,': 1, 'where': 1, 'you': 1}
mydict is now:
{'house': 1, 'is': 1, 'live,': 1, 'where': 1, 'you': 2}
mydict is now:
{"don't": 1, 'house': 1, 'is': 1, 'live,': 1, 'you': 2, 'where': 1}
mydict is now:
{"don't": 1, 'house': 1, 'is': 1, 'live,': 1, 'leave': 1, 'you': 2, 'where': 1}
mydict is now:
{"don't": 1, 'house': 1, 'is': 1, 'live,': 1, 'leave': 1, 'you': 2, 'where': 1, 'the': 1}
mydict is now:
{"don't": 1, 'house': 1, 'is': 1, 'live,': 1, 'house.': 1, 'leave': 1, 'you': 2, 'where': 1, 'the': 1}
So mydict is indeed updating with every word it finds. This should also give you a better idea of how dictionaries work in Python.
To be clear, you're not "looping" over the dictionary. The for statement starts a loop; the if word in mydict: statement isn't a loop, but just a membership test. It checks whether any key in mydict matches the string in word. While the dictionary is empty that test is simply False, so the else branch runs and adds the word.
Also, note that since you only split your sentence on spaces, your list of words includes, for example, both "house" and "house.". Since these two don't exactly match, they're treated as two different words, which is why you see 'house': 1 and 'house.': 1 in your dictionary instead of 'house': 2.
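If you want "house" and "house." to count as the same word, one option is to strip punctuation from the ends of each word before the membership test. This is only a sketch of that idea: it uses str.strip with string.punctuation, which removes ASCII punctuation from both ends and leaves the apostrophe inside "don't" alone.

```python
import string

mystring = "house is where you live, you don't leave the house."
mydict = {}

# An empty dictionary has no keys, so the membership test is False.
print("house" in mydict)

for raw in mystring.split():
    # Strip punctuation from both ends only, e.g. "house." -> "house".
    word = raw.strip(string.punctuation)
    if word in mydict:
        mydict[word] += 1
    else:
        mydict[word] = 1

print(mydict["house"])
```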
So I am trying to get the position of each word in a list and store it in a dictionary that has the word as the key and, as the value, the set of indices of the list items in which the word appears.
list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x] += 1
        else:
            my_dict[x] = 1
print(my_dict)
This is the code I tried, but it gives me the total number of times each word appears in the list.
What I am trying to get is this format:
{'this': {0, 1}, 'is': {0, 1}, 'the': {0, 1}, 'first': {0}, 'second': {1}}
As you can see, "this" is a key, and it appears once in position 0 and once in position 1, and so on. Any idea how I might get to this point?
Fixed two lines:
list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x].append(i)
        else:
            my_dict[x] = [i]
print(my_dict)
Returns:
{'this': [0, 1], 'is': [0, 1], 'the': [0, 1], 'first': [0], 'second': [1]}
Rather than using integers in your dict, you should use a set:
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x].add(i)
        else:
            my_dict[x] = {i}
Or, more briefly,
for i in range(len(list_x)):
    for x in list_x[i].split():
        my_dict.setdefault(x, set()).add(i)
You can also do this with defaultdict and enumerate:
from collections import defaultdict
list_x = ["this is the first",
          "this is the second",
          "third is this"]
pos = defaultdict(set)
for i, sublist in enumerate(list_x):
    for word in sublist.split():
        pos[word].add(i)
Output:
>>> from pprint import pprint
>>> pprint(dict(pos))
{'first': {0},
 'is': {0, 1, 2},
 'second': {1},
 'the': {0, 1},
 'third': {2},
 'this': {0, 1, 2}}
The purpose of enumerate is to provide the index (position) of each string within list_x. For each word encountered, the position of its sentence within list_x will be added to the set for its corresponding key in the result, pos.
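To make the role of enumerate concrete, here is a small sketch using the two-sentence list from the question. It shows the (index, sentence) pairs that enumerate produces and converts the defaultdict to a plain dict at the end; the names pairs and result are just illustrative.

```python
from collections import defaultdict

list_x = ["this is the first", "this is the second"]

# enumerate pairs each sentence with its index in list_x:
# [(0, 'this is the first'), (1, 'this is the second')]
pairs = list(enumerate(list_x))

pos = defaultdict(set)
for i, sentence in pairs:
    for word in sentence.split():
        pos[word].add(i)  # a missing key starts out as an empty set

result = dict(pos)  # plain dict of word -> set of sentence indices
print(result)
```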
I have made a text string, removed all non-alphabetical symbols and added whitespace between the words, but when I add them to a dictionary to count the frequency of the words, it counts the letters instead. How do I count the words with a dictionary?
dictionary = {}
for item in text_string:
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
Change this
for item in text_string:
to this
for item in text_string.split():
The .split() method splits the string into words, using whitespace characters (including tabs and newlines) as delimiters.
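The difference between the two loops is easy to see on a short string. The sketch below (the sample text is made up) collects what each form of iteration yields:

```python
text_string = "to be or not to be"

# Iterating over the string itself yields single characters, spaces included.
chars = [item for item in text_string]

# Iterating over text_string.split() yields whole words.
words = [item for item in text_string.split()]

print(chars[:5])
print(words)

# With no argument, split() treats any run of whitespace as one separator.
mixed = "a  b\tc\nd".split()
print(mixed)
```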
You are very close. Since you state that your words are already whitespace separated, you need to use str.split to make a list of words.
An example is below:
dictionary = {}
text_string = 'there are repeated words in this sring with many words many are repeated'
for item in text_string.split():
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
{'there': 1, 'are': 2, 'repeated': 2, 'words': 2, 'in': 1,
'this': 1, 'sring': 1, 'with': 1, 'many': 2}
Another solution is to use collections.Counter, available in the standard library:
from collections import Counter
text_string = 'there are repeated words in this sring with many words many are repeated'
c = Counter(text_string.split())
print(c)
Counter({'are': 2, 'repeated': 2, 'words': 2, 'many': 2, 'there': 1,
'in': 1, 'this': 1, 'sring': 1, 'with': 1})
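A Counter behaves like a dictionary with a few extras. As a brief sketch of what you can do with the result: looking up a word that never occurred gives 0 instead of raising KeyError, and most_common returns the highest counts first.

```python
from collections import Counter

text_string = 'there are repeated words in this sring with many words many are repeated'
c = Counter(text_string.split())

print(c['words'])      # count for a word that occurs
print(c['missing'])    # a word that never occurs counts as 0, with no KeyError
top = c.most_common(2) # list of (word, count) pairs, highest counts first
print(top)
```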
I am trying to count every word in my text files and store each word and its count in a dictionary as key-value pairs. It throws this error at the line if key not in wordDict:
TypeError: unhashable type: 'list'
Also, I am wondering if .split() is good enough, because my text files contain various punctuation marks.
fileref = open(mypath + '/' + i, 'r')
wordDict = {}
for line in fileref.readlines():
    key = line.split()
    if key not in wordDict:
        wordDict[key] = 1
    else:
        wordDict[key] += 1
from collections import Counter
text = '''I am trying to count every word from text files and appending the word and count to a dictionary as the key-value pairs. It throws me this error: if key not in wordDict: TypeError: unhashable type: 'list' Also, I am wondering of .split() is good because my text files contain different punctuation marks. Thanks ahead for those who help!'''
split_text = text.split()
counter = Counter(split_text)
print(counter)
out:
Counter({'count': 2, 'and': 2, 'text': 2, 'to': 2, 'I': 2, 'files': 2, 'word': 2, 'am': 2, 'the': 2, 'dictionary': 1, 'a': 1, 'not': 1, 'in': 1, 'ahead': 1, 'me': 1, 'trying': 1, 'every': 1, '.split()': 1, 'type:': 1, 'my': 1, 'punctuation': 1, 'is': 1, 'key': 1, 'error:': 1, 'help!': 1, 'those': 1, 'different': 1, 'throws': 1, 'TypeError:': 1, 'contain': 1, 'wordDict:': 1, 'appending': 1, 'if': 1, 'It': 1, 'Also,': 1, 'unhashable': 1, 'from': 1, 'because': 1, 'marks.': 1, 'pairs.': 1, 'this': 1, 'key-value': 1, 'wondering': 1, 'Thanks': 1, 'of': 1, 'good': 1, "'list'": 1, 'for': 1, 'who': 1, 'as': 1})
key is a list of space-delimited words found in the current line. You would need to iterate over that list as well.
for line in fileref:
    keys = line.split()
    for key in keys:
        if key not in wordDict:
            wordDict[key] = 1
        else:
            wordDict[key] += 1
This can be cleaned up considerably by either using the setdefault method or a defaultdict from the collections module; both allow you to avoid explicitly checking for a key by automatically adding the key with an initial value if it isn't already in the dict.
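for key in keys:
    # setdefault returns the current count (0 for a new key);
    # a method call cannot be the target of +=, so assign explicitly.
    wordDict[key] = wordDict.setdefault(key, 0) + 1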
or
from collections import defaultdict
wordDict = defaultdict(int) # Default to 0, since int() == 0
...
for key in keys:
    wordDict[key] += 1
key is a list, and you're trying to see if a list is in a dictionary, which is equivalent to checking whether it is one of the keys. Dictionary keys cannot be lists, hence the "unhashable type" error.
str.split returns a list of words
>>> "hello world".split()
['hello', 'world']
>>>
and a list, like any other unhashable object (which includes the built-in mutable containers), cannot be used as a dictionary key; that is why you get the error TypeError: unhashable type: 'list'.
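The error is easy to reproduce in isolation. In this small sketch (the names counts and message are only for illustration), using the list itself as a key fails, while a tuple of the same words, or each word on its own, works:

```python
counts = {}
words = "hello world".split()

try:
    counts[words] = 1            # a list as a key raises TypeError
except TypeError as exc:
    message = str(exc)
    print(message)

counts[tuple(words)] = 1         # a tuple of strings is hashable
for word in words:               # and so is each individual string
    counts[word] = counts.get(word, 0) + 1

print(counts)
```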
You need to iterate over that list to count each of its words. Also, the recommended way to work with a file is the with statement:
wordDict = {}
with open(mypath + '/' + i, 'r') as fileref:
    for line in fileref:
        for word in line.split():
            if word not in wordDict:
                wordDict[word] = 1
            else:
                wordDict[word] += 1
The above can be shortened by using Counter with an appropriate call:
from collections import Counter
with open(mypath + '/' + i, 'r') as fileref:
    wordDict = Counter(word for line in fileref for word in line.split())
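On the question of whether .split() is good enough for text with punctuation: it only splits on whitespace, so "word," and "word" are counted separately. One possible approach, sketched here with a hypothetical helper count_words_in_lines, is to extract words with a regular expression instead. The pattern [a-z']+ is an assumption about what should count as a word; it keeps apostrophes, so quoted text such as 'list' would keep its quotes.

```python
import re
from collections import Counter

def count_words_in_lines(lines):
    """Count lowercase words in an iterable of lines, ignoring punctuation."""
    counts = Counter()
    for line in lines:
        # Runs of letters and apostrophes are treated as words (an assumption).
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts

sample = ["Hello, world! Hello again.", "Don't panic: it's fine."]
word_counts = count_words_in_lines(sample)
print(word_counts)
```

An open file object can be passed as lines, since iterating over a file yields one line at a time.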
I'm trying to compute word frequencies from nested lists using a dictionary. Each nested list is a sentence broken up into words. Also, I want to delete proper nouns and lowercase the words at the beginning of each sentence. Is it even possible to get rid of proper nouns?
x = [["Hey", "Kyle", "are", "you", "doing"], ["I", "am", "doing", "fine"], ["Kyle", "what", "time" "is", "it"]]
from collections import Counter
def computeFrequencies(x):
    count = Counter()
    for listofWords in L:
        for word in L:
            count[word] += 1
    return count
It is returning an error: unhashable type: 'list'
I want to return exactly this without the Counter() around the dictionary:
{"hey": 1, "how": 1, "are": 1, "you": 1, "doing": 2, "i": 1, "am": 1, "fine": 1, "what": 1, "time": 1, "is": 1, "it": 1}
Since your data is nested, you can flatten it with chain.from_iterable like this
from itertools import chain
from collections import Counter
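print(Counter(chain.from_iterable(x)))
# Counter({'doing': 2, 'Kyle': 2, 'what': 1, 'timeis': 1, 'am': 1, 'Hey': 1, 'I': 1, 'are': 1, 'it': 1, 'you': 1, 'fine': 1})
# ('timeis' appears because the list in the question has no comma between "time" and "is", so Python joins the two strings.)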
If you want to use generator expression, then you can do
from collections import Counter
print(Counter(item for items in x for item in items))
If you want to do this without using Counter, then you can use a normal dictionary like this
my_counter = {}
for line in x:
    for word in line:
        my_counter[word] = my_counter.get(word, 0) + 1
print(my_counter)
You can also use collections.defaultdict, like this
from collections import defaultdict
my_counter = defaultdict(int)
for line in x:
    for word in line:
        my_counter[word] += 1
print(my_counter)
Okay, if you simply want to convert the Counter object to a dict object (which I believe is not necessary at all, since Counter is a dict subclass: you can access key-values, iterate over, delete from and update a Counter object just like a normal dictionary), you can use bsoist's suggestion,
print(dict(Counter(chain.from_iterable(x))))
The problem is that you are iterating over L twice.
Replace the inner loop:
for word in L:
with:
for word in listofWords:
Though, if you want to go "pythonic", check out #thefourtheye's solution.
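None of the answers above lowercases the words or removes proper nouns. Python cannot tell on its own that "Kyle" is a name, so the sketch below takes the names to skip as an explicit argument. The function compute_frequencies and its proper_nouns parameter are hypothetical, and the list x is written here with the comma between "time" and "is" in place.

```python
from collections import Counter
from itertools import chain

def compute_frequencies(sentences, proper_nouns=()):
    """Return a plain dict of lowercase word counts, skipping proper_nouns."""
    # Compare in lowercase so "Kyle" and "kyle" are treated alike.
    skip = {name.lower() for name in proper_nouns}
    words = (word.lower() for word in chain.from_iterable(sentences))
    return dict(Counter(word for word in words if word not in skip))

x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]

freq = compute_frequencies(x, proper_nouns={"Kyle"})
print(freq)
```

Detecting proper nouns automatically would need something like a part-of-speech tagger, which is outside what the standard library offers.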