computing frequencies in a nested list - python

I'm trying to compute the frequencies of words in a nested list using a dictionary. Each nested list is a sentence broken up into its words. I also want to drop proper nouns and lowercase the words at the beginning of each sentence. Is it even possible to get rid of proper nouns?
x = [["Hey", "Kyle", "are", "you", "doing"], ["I", "am", "doing", "fine"], ["Kyle", "what", "time" "is", "it"]]
from collections import Counter
def computeFrequencies(x):
    count = Counter()
    for listofWords in L:
        for word in L:
            count[word] += 1
    return count
It is returning an error: unhashable type: 'list'
I want to return exactly this without the Counter() around the dictionary:
{"hey": 1, "how": 1, "are": 1, "you": 1, "doing": 2, "i": 1, "am": 1, "fine": 1, "what": 1, "time": 1, "is": 1, "it": 1}

Since your data is nested, you can flatten it with chain.from_iterable, like this:
from itertools import chain
from collections import Counter
print(Counter(chain.from_iterable(x)))
# Counter({'doing': 2, 'Kyle': 2, 'what': 1, 'timeis': 1, 'am': 1, 'Hey': 1, 'I': 1, 'are': 1, 'it': 1, 'you': 1, 'fine': 1})
# ('timeis' appears because the sample list has no comma between "time" and "is", so Python joins the two adjacent string literals.)
If you want to use a generator expression, you can do this:
from collections import Counter
print(Counter(item for items in x for item in items))
If you want to do this without using Counter, you can use a normal dictionary, like this:
my_counter = {}
for line in x:
    for word in line:
        my_counter[word] = my_counter.get(word, 0) + 1
print(my_counter)
You can also use collections.defaultdict, like this:
from collections import defaultdict
my_counter = defaultdict(int)
for line in x:
    for word in line:
        my_counter[word] += 1
print(my_counter)
If you simply want to convert the Counter object to a dict, you can use bsoist's suggestion, shown here. I believe this is not necessary at all, since Counter is actually a dictionary subclass: you can access key-value pairs, iterate over it, and delete or update entries just as with a normal dictionary.
print(dict(Counter(chain.from_iterable(x))))
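To illustrate the point that a Counter already behaves like a dictionary, here is a small sketch. The sample list is the one from the question, assuming a comma was intended between "time" and "is"; the variable name counts is only for illustration.

```python
from collections import Counter
from itertools import chain

# Sample data from the question, with the comma between "time" and "is" assumed.
x = [
    ["Hey", "Kyle", "are", "you", "doing"],
    ["I", "am", "doing", "fine"],
    ["Kyle", "what", "time", "is", "it"],
]

counts = Counter(chain.from_iterable(x))

print(isinstance(counts, dict))  # True: Counter is a dict subclass
print(counts["doing"])           # 2
print(counts["missing"])         # 0: a missing key counts as zero rather than raising KeyError
print(dict(counts) == counts)    # True: converting only changes the type, not the contents
```

So the conversion to dict mostly matters when the printed form has to look like a plain dictionary.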

The problem is that you are iterating over L twice, so word is bound to a whole inner list each time, and a list cannot be used as a key.
Replace the inner loop:
for word in L:
with:
for word in listofWords:
Though if you want to go "pythonic", check out @thefourtheye's solution.
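Neither answer covers the proper-noun part of the question, so here is a rough sketch of one way to approach it. It relies on a heuristic that is purely an assumption: a word that is capitalized anywhere other than the start of a sentence is treated as a proper noun. This is not real part-of-speech tagging, and it will wrongly drop words such as "I" when they appear mid-sentence. The function name compute_frequencies and the corrected sample list (with a comma between "time" and "is") are also assumptions.

```python
from collections import Counter


def compute_frequencies(sentences):
    """Count lowercased words, skipping words guessed to be proper nouns."""
    # Heuristic (an assumption): capitalized and not sentence-initial
    # means "proper noun".
    proper_nouns = {
        word
        for sentence in sentences
        for word in sentence[1:]
        if word[:1].isupper()
    }
    counts = Counter(
        word.lower()
        for sentence in sentences
        for word in sentence
        if word not in proper_nouns
    )
    return dict(counts)  # plain dict, as the question asks


sentences = [
    ["Hey", "Kyle", "are", "you", "doing"],
    ["I", "am", "doing", "fine"],
    ["Kyle", "what", "time", "is", "it"],
]
print(compute_frequencies(sentences))
```

With this sample, "Kyle" is dropped from both sentences because it shows up capitalized in a non-initial position in the first one.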

Related

Transform every word in string to a dict and pass how many times all the word occurred as value in python

I'm having trouble transforming every word of a string into a dictionary, with the number of times each word appears as the value.
For example
string = 'How many times times appeared in this many times'
The dict I want is:
dict = {'times':3, 'many':2, 'how':1 ...}
Using Counter
from collections import Counter
res = dict(Counter(string.split()))
#{'How': 1, 'many': 2, 'times': 3, 'appeared': 1, 'in': 1, 'this': 1}
You can loop through the words and increment the count like so:
d = {}
for word in string.split(" "):
    d.setdefault(word, 0)
    d[word] += 1
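The desired output in the question has 'how' in lowercase, while both answers keep 'How' as written. A minimal sketch that lowercases the text before splitting is below; the names text and word_counts are chosen here to avoid shadowing the built-in dict and the string module, and are not from the original post.

```python
from collections import Counter

text = 'How many times times appeared in this many times'

# Lowercase first so that 'How' and 'how' would be counted as the same word.
word_counts = dict(Counter(text.lower().split()))
print(word_counts)
```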

index word in dictionary

I have a text file, and I want to store each word from it in a dictionary and then print the index position of every occurrence of that word.
The code I have only gives me the number of times each word appears in the text file. How can I change this?
I have already converted the words to lowercase.
dicti = {}
for eachword in wordsintxt:
    freq = dicti.get(eachword, None)
    if freq == None:
        dicti[eachword] = 1
    else:
        dicti[eachword] = freq + 1
print(dicti)
Change your code to keep the indices themselves, rather than merely counting them:
for index, eachword in enumerate(wordsintxt):
    if eachword not in dicti:
        dicti[eachword] = []  # first occurrence: start an empty list of positions
    dicti[eachword].append(index)  # record every position, including the first
If you still need the word frequency, that's easy to recover:
freq = len(dicti[word])
Update per OP comment
Without enumerate, simply provide that functionality yourself:
for index in range(len(wordsintxt)):
    eachword = wordsintxt[index]
I'm not sure why you'd want to do that; the operation is idiomatic and common enough that Python developers created enumerate for exactly that purpose.
You can use this:
wordsintxt = ["hello", "world", "the", "a", "Hello", "my", "name", "is", "the"]
words_data = {}
for i, word in enumerate(wordsintxt):
    word = word.lower()
    words_data[word] = words_data.get(word, {'freq': 0, 'indexes': []})
    words_data[word]['freq'] += 1
    words_data[word]['indexes'].append(i)
for k, v in words_data.items():
    print(k, '\t', v)
Which prints:
hello {'freq': 2, 'indexes': [0, 4]}
world {'freq': 1, 'indexes': [1]}
the {'freq': 2, 'indexes': [2, 8]}
a {'freq': 1, 'indexes': [3]}
my {'freq': 1, 'indexes': [5]}
name {'freq': 1, 'indexes': [6]}
is {'freq': 1, 'indexes': [7]}
You can avoid checking if the value exists in your dictionary and then performing a custom action by just using data[key] = data.get(key, STARTING_VALUE)
Greetings!
Use collections.defaultdict with enumerate, and append each index that enumerate yields to the list for that word:
from collections import defaultdict
with open('test.txt') as f:
    content = f.read()
words = content.split()
dd = defaultdict(list)
for i, v in enumerate(words):
    dd[v.lower()].append(i)
print(dd)
# defaultdict(<class 'list'>, {'i': [0, 6, 35, 54, 57], 'have': [1, 36, 58],... 'lowercase.': [62]})
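Note that the output above contains the key 'lowercase.' with its trailing period, because split() only separates on whitespace. If that is not wanted, one possible variation strips leading and trailing punctuation from each token first. This is a sketch on an in-memory string rather than test.txt; the sample text and the name positions are assumptions for illustration.

```python
import string
from collections import defaultdict

# In-memory sample standing in for the contents of the text file.
text = "I have a text file. I have already converted to lowercase."

positions = defaultdict(list)
for i, token in enumerate(text.split()):
    # Remove punctuation at the edges only; "key-value" or "don't" stay intact.
    word = token.strip(string.punctuation).lower()
    if word:  # skip tokens that were nothing but punctuation
        positions[word].append(i)

print(dict(positions))
```

The index is still the position of the token in the whitespace-split list, so it matches what the defaultdict version above reports.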

My dictionary counts letter instead of words [duplicate]

I have made a text string, removed all non-alphabetical symbols, and added whitespace between the words, but when I add them to a dictionary to count the frequency of the words, it counts the letters instead. How do I count the words with a dictionary?
dictionary = {}
for item in text_string:
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
Change this
for item in text_string:
to this
for item in text_string.split():
The .split() method splits the string into words, using runs of whitespace characters (including tabs and newlines) as delimiters.
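The reason the original loop counts letters is that iterating over a string yields one character at a time. A short sketch of the difference, using a made-up sample string:

```python
text_string = "to be or not to be"

# Iterating over a str yields single characters, spaces included.
print(list(text_string)[:5])  # ['t', 'o', ' ', 'b', 'e']

# split() yields whole words instead.
print(text_string.split())    # ['to', 'be', 'or', 'not', 'to', 'be']
```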
You are very close. Since you state that your words are already whitespace separated, you need to use str.split to make a list of words.
An example is below:
dictionary = {}
text_string = 'there are repeated words in this sring with many words many are repeated'
for item in text_string.split():
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
{'there': 1, 'are': 2, 'repeated': 2, 'words': 2, 'in': 1,
'this': 1, 'sring': 1, 'with': 1, 'many': 2}
Another solution is to use collections.Counter, available in the standard library:
from collections import Counter
text_string = 'there are repeated words in this sring with many words many are repeated'
c = Counter(text_string.split())
print(c)
Counter({'are': 2, 'repeated': 2, 'words': 2, 'many': 2, 'there': 1,
'in': 1, 'this': 1, 'sring': 1, 'with': 1})

Python increment values in a dictionary

I am trying to count every word in my text files and store each word and its count in a dictionary as key-value pairs. It throws this error:
if key not in wordDict:
TypeError: unhashable type: 'list'
Also, I am wondering whether .split() is good enough, because my text files contain various punctuation marks.
fileref = open(mypath + '/' + i, 'r')
wordDict = {}
for line in fileref.readlines():
    key = line.split()
    if key not in wordDict:
        wordDict[key] = 1
    else:
        wordDict[key] += 1
from collections import Counter
text = '''I am trying to count every word from text files and appending the word and count to a dictionary as the key-value pairs. It throws me this error: if key not in wordDict: TypeError: unhashable type: 'list' Also, I am wondering of .split() is good because my text files contain different punctuation marks. Thanks ahead for those who help!'''
split_text = text.split()
counter = Counter(split_text)
print(counter)
out:
Counter({'count': 2, 'and': 2, 'text': 2, 'to': 2, 'I': 2, 'files': 2, 'word': 2, 'am': 2, 'the': 2, 'dictionary': 1, 'a': 1, 'not': 1, 'in': 1, 'ahead': 1, 'me': 1, 'trying': 1, 'every': 1, '.split()': 1, 'type:': 1, 'my': 1, 'punctuation': 1, 'is': 1, 'key': 1, 'error:': 1, 'help!': 1, 'those': 1, 'different': 1, 'throws': 1, 'TypeError:': 1, 'contain': 1, 'wordDict:': 1, 'appending': 1, 'if': 1, 'It': 1, 'Also,': 1, 'unhashable': 1, 'from': 1, 'because': 1, 'marks.': 1, 'pairs.': 1, 'this': 1, 'key-value': 1, 'wondering': 1, 'Thanks': 1, 'of': 1, 'good': 1, "'list'": 1, 'for': 1, 'who': 1, 'as': 1})
key is a list of space-delimited words found in the current line. You would need to iterate over that list as well.
for line in fileref:
    keys = line.split()
    for key in keys:
        if key not in wordDict:
            wordDict[key] = 1
        else:
            wordDict[key] += 1
This can be cleaned up considerably by either using the setdefault method or a defaultdict from the collections module; both allow you to avoid explicitly checking for a key by automatically adding the key with an initial value if it isn't already in the dict.
for key in keys:
    wordDict.setdefault(key, 0)  # insert the key with 0 if it is missing
    wordDict[key] += 1
or
from collections import defaultdict
wordDict = defaultdict(int) # Default to 0, since int() == 0
...
for key in keys:
    wordDict[key] += 1
key is a list, and you're checking whether that list is in a dictionary, which is equivalent to checking whether it is one of the keys. Dictionary keys cannot be lists, hence the "unhashable type" error.
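A small sketch that reproduces the error outside the file loop, and shows two ways around it. The sample string and the names word_counts and error_message are made up for illustration.

```python
word_counts = {}
key = "hello world".split()  # a list: ['hello', 'world']

try:
    word_counts[key] = 1  # a list is unhashable, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# A tuple of the same words is hashable and therefore a valid key...
word_counts[tuple(key)] = 1

# ...but for word counting you want one entry per word instead.
for word in key:
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
```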
str.split returns a list of words
>>> "hello world".split()
['hello', 'world']
>>>
and a list, like any other unhashable object (the built-in mutable containers included), cannot be used as a dictionary key, which is why you get the error TypeError: unhashable type: 'list'.
You need to iterate over the list to count each word individually. Also, the recommended way to work with a file is the with statement:
wordDict = {}
with open(mypath + '/' + i, 'r') as fileref:
    for line in fileref:
        for word in line.split():
            if word not in wordDict:
                wordDict[word] = 1
            else:
                wordDict[word] += 1
The above can be shortened by passing a generator expression to Counter:
from collections import Counter
with open(mypath + '/' + i, 'r') as fileref:
    wordDict = Counter(word for line in fileref for word in line.split())
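On the punctuation part of the question: split() alone leaves tokens such as 'Also,' and 'marks.' with their punctuation attached, as the Counter output earlier in this thread shows. One possible approach, sketched here under my own assumptions, is to pull words out with a regular expression instead. The pattern (letters and apostrophes only), the helper name count_words, and the sample lines are all hypothetical; adjust the pattern if digits or hyphenated words should count.

```python
import re
from collections import Counter

# Assumed definition of a "word": a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def count_words(lines):
    """Count lowercased words across an iterable of lines (e.g. an open file)."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


sample_lines = ["Hello, world! Hello again.", "Don't panic: it's fine."]
print(count_words(sample_lines))
```

Because count_words only needs an iterable of strings, an open file object can be passed in place of sample_lines.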

counting words from a dictionary?

My function is supposed to have:
One parameter, a tweet.
The tweet can contain numbers, words, hashtags, links and punctuation.
A second parameter, a dictionary that counts the words in tweets, disregarding any hashtags, mentions, links, and punctuation.
The function stores each individual word in the dictionary in lowercase, without any punctuation.
If the tweet contains Don't, the dictionary counts it as dont.
Here is my function:
def count_words(tweet, num_words):
    ''' (str, dict of {str: int}) -> None
    Return a NoneType that updates the count of words in the dictionary.
    >>> count_words('We have made too much progress', num_words)
    >>> num_words
    {'we': 1, 'have': 1, 'made': 1, 'too': 1, 'much': 1, 'progress': 1}
    >>> count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", num_words)
    >>> num_words
    {'dont': 1, 'wish': 1, 'you': 2, 'could': 1, 'vote': 1}
    >>> count_words('I am fighting for you! #FollowTheMoney', num_words)
    >>> num_words
    {'i': 1, 'am': 1, 'fighting': 1, 'for': 1, 'you': 1}
    >>> count_words('', num_words)
    >>> num_words
    {'': 0}
    '''
I may have misunderstood your question, but if you want to update the dictionary, you can do it like this:
d = {}
def update_dict(tweet):
    for i in tweet.split():
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d
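The answer above counts every whitespace-separated token, so hashtags and punctuation still end up in the dictionary. A rough sketch closer to the stated requirements follows. The rules for what counts as a hashtag, mention, or link (a token starting with #, @, http:// or https://) are my assumptions, since the question does not define them. Unlike the last docstring example, this sketch leaves the dictionary unchanged for an empty tweet rather than adding {'': 0}.

```python
def count_words(tweet, num_words):
    """Update num_words in place with the lowercased words of tweet."""
    for token in tweet.split():
        # Assumed rule: these prefixes mark hashtags, mentions, and links.
        if token.startswith(('#', '@', 'http://', 'https://')):
            continue
        # Keep letters and digits only, so "Don't" becomes "dont".
        word = ''.join(ch for ch in token.lower() if ch.isalnum())
        if word:
            num_words[word] = num_words.get(word, 0) + 1


num_words = {}
count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", num_words)
print(num_words)
```

Passing the dictionary in as a parameter, as the question's signature does, avoids the module-level d used in the answer above.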
