I have a text file where I want each word in the text file in a dictionary and then print out the index position each time the word is in the text file.
The code I have is only giving me the number of times the word is in the text file. How can I change this?
I have already converted to lowercase.
dicti = {}
for eachword in wordsintxt:
    freq = dicti.get(eachword, None)
    if freq == None:
        dicti[eachword] = 1
    else:
        dicti[eachword] = freq + 1
print(dicti)
Change your code to keep the indices themselves, rather than merely count them:
for index, eachword in enumerate(wordsintxt):
    if eachword not in dicti:
        dicti[eachword] = []  # first time we see this word
    dicti[eachword].append(index)  # record every position, including the first
If you still need the word frequency, it is easy to recover:
freq = len(dicti[word])
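As a minimal end-to-end sketch of the idea above (the sample list is made up for illustration; in the question, wordsintxt comes from the text file):

```python
# Hypothetical sample input standing in for the words read from the file.
wordsintxt = ["the", "cat", "saw", "the", "dog"]

dicti = {}
for index, eachword in enumerate(wordsintxt):
    if eachword not in dicti:
        dicti[eachword] = []
    dicti[eachword].append(index)

# The frequency of each word is just the length of its index list.
frequencies = {word: len(indices) for word, indices in dicti.items()}

print(dicti)        # {'the': [0, 3], 'cat': [1], 'saw': [2], 'dog': [4]}
print(frequencies)  # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```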
Update per OP comment
Without enumerate, simply provide that functionality yourself:
for index in range(len(wordsintxt)):
    eachword = wordsintxt[index]
I'm not sure why you'd want to do that; the operation is idiomatic and common enough that Python developers created enumerate for exactly that purpose.
You can use this:
wordsintxt = ["hello", "world", "the", "a", "Hello", "my", "name", "is", "the"]
words_data = {}
for i, word in enumerate(wordsintxt):
    word = word.lower()
    words_data[word] = words_data.get(word, {'freq': 0, 'indexes': []})
    words_data[word]['freq'] += 1
    words_data[word]['indexes'].append(i)
for k, v in words_data.items():
    print(k, '\t', v)
Which prints:
hello {'freq': 2, 'indexes': [0, 4]}
world {'freq': 1, 'indexes': [1]}
the {'freq': 2, 'indexes': [2, 8]}
a {'freq': 1, 'indexes': [3]}
my {'freq': 1, 'indexes': [5]}
name {'freq': 1, 'indexes': [6]}
is {'freq': 1, 'indexes': [7]}
You can avoid checking if the value exists in your dictionary and then performing a custom action by just using data[key] = data.get(key, STARTING_VALUE)
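A possible variation on the same idea, offered as a sketch rather than a correction: dict.setdefault performs the lookup and the assignment of the starting value in a single call. The short sample list here is made up for illustration.

```python
# Hypothetical sample input.
words = ["hello", "world", "Hello"]

words_data = {}
for i, word in enumerate(words):
    # setdefault returns the stored entry, inserting the starting value
    # first if the key is missing.
    entry = words_data.setdefault(word.lower(), {"freq": 0, "indexes": []})
    entry["freq"] += 1
    entry["indexes"].append(i)

print(words_data)
# {'hello': {'freq': 2, 'indexes': [0, 2]}, 'world': {'freq': 1, 'indexes': [1]}}
```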
Greetings!
Use collections.defaultdict with enumerate, and append each index that enumerate yields to the list for that word:
from collections import defaultdict
with open('test.txt') as f:
    content = f.read()
words = content.split()
dd = defaultdict(list)
for i, v in enumerate(words):
    dd[v.lower()].append(i)
print(dd)
# defaultdict(<class 'list'>, {'i': [0, 6, 35, 54, 57], 'have': [1, 36, 58],... 'lowercase.': [62]})
Related
I'd like to write a function that will take one argument (a text file) to use its contents as keys and assign values to the keys. But I'd like the keys to go from 1 to n:
{'A': 1, 'B': 2, 'C': 3, 'D': 4... }.
I tried to write something like this:
Base code which kind of works:
filename = 'words.txt'
with open(filename, 'r') as f:
    text = f.read()
ready_text = text.split()
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    for item in lst:
        if item not in dictionary:
            dictionary[item] = 1
        else:
            dictionary[item] += 1
    return dictionary
print(create_dict(ready_text))
The output: {'A': 1, 'B': 1, 'C': 1, 'D': 1... }.
Attempt to make the thing work:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    values = list(range(100)) # values
    for item in lst:
        if item not in dictionary:
            for value in values:
                dictionary[item] = values[value]
        else:
            dictionary[item] = values[value]
    return dictionary
The output: {'A': 99, 'B': 99, 'C': 99, 'D': 99... }.
My attempt doesn't work. It gives all the keys 99 as their value.
Bonus question: How can I optimize my code and make it look more elegant/cleaner?
Thank you in advance.
You can use dict comprehension with enumerate (note the start parameter):
words.txt:
colorless green ideas sleep furiously
Code:
with open('words.txt', 'r') as f:
    words = f.read().split()
dct = {word: i for i, word in enumerate(words, start=1)}
print(dct)
# {'colorless': 1, 'green': 2, 'ideas': 3, 'sleep': 4, 'furiously': 5}
Note that "to be or not to be" will result in {'to': 5, 'be': 6, 'or': 3, 'not': 4}, which may not be what you want. Keeping only one entry for a repeated word is not a quirk of this algorithm; it is inevitable as long as you use a dict keyed by word, because each key holds a single value and later positions overwrite earlier ones.
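If the first position of each repeated word is what you need instead of the last, one option is to skip words that are already present. This is only a sketch of that variation, using the same example sentence:

```python
words = "to be or not to be".split()

first_seen = {}
for i, word in enumerate(words, start=1):
    if word not in first_seen:  # keep the earliest position only
        first_seen[word] = i

print(first_seen)  # {'to': 1, 'be': 2, 'or': 3, 'not': 4}
```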
Your program sends a list of strings to create_dict. For each string in the list, if that string is not in the dictionary, then the dictionary value for that key is set to 1. If that string has been encountered before, then the value of that key is increased by 1. So, since every key is being set to 1, then that must mean there are no repeat keys anywhere, meaning you're sending a list of unique strings.
So, in order to have the numerical values increase with each new key, you just have to increment some number during your loop:
num = 0
for item in lst:
    num += 1
    dictionary[item] = num
There's an easier way to loop through both numbers and list items at the same time, via enumerate():
for num, item in enumerate(lst, start=1): # start at 1 and not 0
    dictionary[item] = num
You can use this code. If an item appears in lst more than once, it is numbered only once, at its first occurrence:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    idx = 1
    for item in lst:
        if item not in dictionary:
            dictionary[item] = idx
            idx += 1
    return dictionary
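A more compact way to get the same numbering, sketched here under the assumption that you are on Python 3.7 or later (where dicts preserve insertion order): dict.fromkeys removes duplicates while keeping first-seen order, and enumerate then numbers what is left.

```python
def create_dict(lst):
    """Number each distinct item from 1, in order of first appearance."""
    unique_items = dict.fromkeys(lst)  # drops repeats, keeps first-seen order
    return {item: idx for idx, item in enumerate(unique_items, start=1)}

print(create_dict(["A", "B", "A", "C"]))  # {'A': 1, 'B': 2, 'C': 3}
```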
My function is supposed to have:
One parameter as a tweet.
This tweet can include numbers, words, hashtags, links and punctuation.
A second parameter is a dictionary that counts the words in the tweets, disregarding any hashtags, mentions, links, and punctuation.
The function returns all individual words in the dictionary as lowercase letters without any punctuation.
If the tweet had Don't then the dictionary would count it as dont.
Here is my function:
def count_words(tweet, num_words):
    ''' (str, dict of {str: int}) -> None
    Return a NoneType that updates the count of words in the dictionary.
    >>> count_words('We have made too much progress', num_words)
    >>> num_words
    {'we': 1, 'have': 1, 'made': 1, 'too': 1, 'much': 1, 'progress': 1}
    >>> count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", num_words)
    >>> num_words
    {'dont': 1, 'wish': 1, 'you': 2, 'could': 1, 'vote': 1}
    >>> count_words('I am fighting for you! #FollowTheMoney', num_words)
    >>> num_words
    {'i': 1, 'am': 1, 'fighting': 1, 'for': 1, 'you': 1}
    >>> count_words('', num_words)
    >>> num_words
    {'': 0}
    '''
I might misunderstand your question, but if you want to update the dictionary you can do it in this manner:
d = {}
def update_dict(tweet):
    for i in tweet.split():
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d
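The snippet above counts every whitespace-separated token as-is. A rough sketch that also follows the question's rules might look like the following. It assumes hashtags start with '#', mentions with '@', and links with 'http' (the question does not define these), and it leaves the dictionary unchanged for an empty tweet rather than reproducing the {'': 0} entry from the docstring.

```python
import string

def count_words(tweet, num_words):
    """Update num_words in place with lowercase, punctuation-free word counts."""
    for token in tweet.split():
        # Assumption: these prefixes mark hashtags, mentions and links.
        if token.startswith(('#', '@', 'http')):
            continue
        # Lowercase and strip ASCII punctuation, so "Don't" becomes "dont".
        word = ''.join(ch for ch in token.lower() if ch not in string.punctuation)
        if word:
            num_words[word] = num_words.get(word, 0) + 1

counts = {}
count_words("#utmandrew Don't you wish you could vote? #MakeAmericaGreatAgain", counts)
print(counts)  # {'dont': 1, 'you': 2, 'wish': 1, 'could': 1, 'vote': 1}
```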
I have a word occurrence dictionary, and a synonym dictionary.
Word occurrence dictionary example:
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5}
Synonym dictionary example:
synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager']
}
I would like to combine/aggregate the word count dictionary under the synonym keys, as
new_word_count = {'bear': 6, 'beer': 8}
I thought I would try this:
new_dict = {}
for word_key, word_value in word_count.items():  # Loop through word count dict
    for syn_key, syn_value in synonyms.items():  # Loop through synonym dict
        if word_key in [x for y in syn_value for x in y]:  # Check if word in synonyms
            if syn_key in new_dict:  # If so:
                new_dict[syn_key] += word_value  # Increment count
            else:  # If not:
                new_dict[syn_key] = word_value  # Create key
But this isn't working, new_dict ends up empty. Also, is there an easier way to do this? Maybe using dictionary comprehension?
Using dict comprehension, sum and dict.get:
In [11]: {w: sum(word_count.get(x, 0) for x in ws) for w, ws in synonyms.items()}
Out[11]: {'bear': 6, 'beer': 8}
Using collections.Counter and dict.get:
from collections import Counter
ec = Counter()
for x, vs in synonyms.items():
    for v in vs:
        ec[x] += word_count.get(v, 0)
print(ec) # Counter({'beer': 8, 'bear': 6})
Let's change your synonym dictionary a little. Instead of mapping from a word to a list of all its synonyms, let's map from each synonym to its parent word (i.e. ale to beer). This should speed up lookups:
synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager']
}
synonyms = {syn:word for word,syns in synonyms.items() for syn in syns}
Now, let's make your aggregate dictionary:
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5}
new_word_count = {}
for word, count in word_count.items():  # .items() yields (key, value) pairs
    word = synonyms[word]
    if word not in new_word_count:
        new_word_count[word] = 0
    new_word_count[word] += count
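One caveat with the inverted mapping: a word that has no entry in it raises a KeyError. As a sketch of one way to handle that, the version below falls back to the word itself. The 'wine' entry is a made-up example of such a word, and the variable name parent is only illustrative.

```python
from collections import Counter

synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager'],
}
# 'wine' is a hypothetical word with no synonym entry.
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5, 'wine': 1}

# Invert the mapping: each synonym points at its parent word.
parent = {syn: word for word, syns in synonyms.items() for syn in syns}

new_word_count = Counter()
for word, count in word_count.items():
    # Words without a parent are counted under their own name.
    new_word_count[parent.get(word, word)] += count

print(dict(new_word_count))  # {'bear': 6, 'beer': 8, 'wine': 1}
```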
I have a dictionary like this
d = {1:'Bob', 2:'Joe', 3:'Bob', 4:'Bill', 5:'Bill'}
I want to keep a count of how many times each name occurs as a dictionary value. So, the output should be like this:
d = {1:['Bob', 1], 2:['Joe',1], 3:['Bob', 2], 4:['Bill',1] , 5:['Bill',2]}
One way of counting the values like you want, is shown below:
from collections import Counter
d = {1:'Bob',2:'Joe',3:'Bob', 4:'Bill', 5:'Bill'}
c = Counter()
new_d = {}
for k in sorted(d.keys()):
    name = d[k]
    c[name] += 1
    new_d[k] = [name, c[name]]
print(new_d)
# {1: ['Bob', 1], 2: ['Joe', 1], 3: ['Bob', 2], 4: ['Bill', 1], 5: ['Bill', 2]}
Here I use Counter to keep track of occurrences of names in the input dictionary. Hope this helps. Maybe not most elegant code, but it works.
To impose an order (which a dict per se doesn't have), let's say you're going in sorted order on the keys. Then, assuming the values are hashable, as in your example, you could do:
import collections
def enriched_by_count(somedict):
    countsofar = collections.defaultdict(int)
    result = {}
    for k in sorted(somedict):
        v = somedict[k]
        countsofar[v] += 1
        result[k] = [v, countsofar[v]]
    return result
Without using any modules, this is the code I came up with. Maybe not as short, but I am scared of modules.
def new_dict(d):
    check = []  # List for checking against
    new_dict = {}  # The new dictionary to be returned
    for i in sorted(d.keys()):  # Loop through all the dictionary items
        val = d[i]  # Store the dictionary item value in a variable just for clarity
        check.append(val)  # Add the current item to the array
        new_dict[i] = [d[i], check.count(val)]  # See how many of the items there are in the array
    return new_dict
Use like so:
d = {1:'Bob', 2:'Joe', 3:'Bob', 4:'Bill', 5:'Bill'}
d = new_dict(d)
print(d)
Output:
{1: ['Bob', 1], 2: ['Joe', 1], 3: ['Bob', 2], 4: ['Bill', 1], 5: ['Bill', 2]}
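A possible refinement that still avoids modules: check.count rescans the whole list on every iteration, so the work grows quadratically with the size of the dictionary. Keeping the running totals in a plain dict avoids the rescans. This is only a sketch, and the function name is made up for illustration.

```python
def with_running_count(d):
    """Pair each value with how many times it has occurred so far (keys in sorted order)."""
    seen = {}    # value -> number of occurrences so far
    result = {}
    for key in sorted(d):
        name = d[key]
        seen[name] = seen.get(name, 0) + 1
        result[key] = [name, seen[name]]
    return result

d = {1: 'Bob', 2: 'Joe', 3: 'Bob', 4: 'Bill', 5: 'Bill'}
print(with_running_count(d))
# {1: ['Bob', 1], 2: ['Joe', 1], 3: ['Bob', 2], 4: ['Bill', 1], 5: ['Bill', 2]}
```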
I'm trying to compute the frequencies of words in nested lists using a dictionary. Each nested list is a sentence broken up into its words. I also want to delete proper nouns and lowercase the word at the beginning of each sentence. Is it even possible to get rid of proper nouns?
x = [["Hey", "Kyle","are", "you", "doing"],["I", "am", "doing", "fine"],["Kyle", "what", "time" "is", "it"]]
from collections import Counter
def computeFrequencies(x):
    count = Counter()
    for listofWords in L:
        for word in L:
            count[word] += 1
    return count
It is returning an error: unhashable type: 'list'
I want to return exactly this without the Counter() around the dictionary:
{"hey": 1, "how": 1, "are": 1, "you": 1, "doing": 2, "i": 1, "am": 1, "fine": 1, "what": 1, "time": 1, "is": 1, "it": 1}
Since your data is nested, you can flatten it with chain.from_iterable like this
from itertools import chain
from collections import Counter
print(Counter(chain.from_iterable(x)))
# Counter({'doing': 2, 'Kyle': 2, 'what': 1, 'timeis': 1, 'am': 1, 'Hey': 1, 'I': 1, 'are': 1, 'it': 1, 'you': 1, 'fine': 1})
If you want to use generator expression, then you can do
from collections import Counter
print(Counter(item for items in x for item in items))
If you want to do this without using Counter, then you can use a normal dictionary like this
my_counter = {}
for line in x:
    for word in line:
        my_counter[word] = my_counter.get(word, 0) + 1
print(my_counter)
You can also use collections.defaultdict, like this
from collections import defaultdict
my_counter = defaultdict(int)
for line in x:
    for word in line:
        my_counter[word] += 1
print(my_counter)
Okay, I believe converting the Counter object to a dict object is not necessary at all, since Counter is a dictionary subclass: you can access key-value pairs, iterate, delete, and update a Counter just like a normal dictionary. If you still want the conversion, you can use bsoist's suggestion:
print(dict(Counter(chain.from_iterable(x))))
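The question also asks for lowercase keys. A minimal sketch of that part follows, assuming the intended input separates "time" and "is" as the expected output suggests. It does not attempt to remove proper nouns, which cannot be identified reliably from capitalization alone.

```python
from collections import Counter
from itertools import chain

# Assumed intended input, with "time" and "is" as separate words.
x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]

# Flatten the nested lists, lowercase each word, count, then convert to a plain dict.
counts = dict(Counter(word.lower() for word in chain.from_iterable(x)))
print(counts)
# {'hey': 1, 'kyle': 2, 'are': 1, 'you': 1, 'doing': 2, 'i': 1, 'am': 1,
#  'fine': 1, 'what': 1, 'time': 1, 'is': 1, 'it': 1}
```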
The problem is that you are iterating over L twice.
Replace the inner loop:
for word in L:
with:
for word in listofWords:
Though, if you want to go "pythonic", check out @thefourtheye's solution.