I'm in the process of learning python and I love how much can be accomplished in such a small amount of code but I'm getting confused about the syntax. I'm just trying to iterate through a dictionary and print out each item and value.
Here is my code:
words = {}
value = 1
for line in open("test.txt", 'r'):
    for word in line.split():
        print (word)
        try:
            words[word] += 1
        except KeyError:
            #wur you at key?
            print("no")
            words[word]=1
for item in words:
    print ("{",item, ": ", words[item][0], " }")
My current print statement doesn't work and I can't find a good example of a large print statement using multiple variables. How would I print this properly?
Your problem seems to be that you're trying to print words[item][0], but words[item] is always going to be a number, and numbers can't be indexed.
So, just… don't do that:
print ("{",item, ": ", words[item], " }")
That's enough to fix it, but there are ways you could improve this code:
print with multiple arguments puts a space between each pair of arguments, so you end up with extra spaces inside the braces when you probably wanted {item: 3}. You can fix that by passing the keyword argument sep='', but a better solution is to use string formatting, with either str.format or the % operator.
You can get the keys and values at the same time by iterating over words.items() instead of words.
You can simplify the whole "store a default value if one isn't already there" by using the setdefault method, or by using a defaultdict—or, even more simply, you can use a Counter.
You should always close files that you open—preferably by using a with statement.
Be consistent in your style—don't put spaces after some functions but not others.
So:
import collections
with open("test.txt") as f:
    words = collections.Counter(word for line in f for word in line.split())
for item, count in words.items():
    print("{%s: %d}" % (item, count))
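For comparison, the three ways of building the output line mentioned in this answer can be sketched side by side. The words mapping below is made-up sample data standing in for the counts read from the file.

```python
# Hypothetical sample counts standing in for the real word dictionary.
words = {"spam": 3, "eggs": 1}

for item, count in words.items():
    # 1. sep='' suppresses the automatic spaces between print arguments.
    print("{", item, ": ", count, "}", sep="")
    # 2. str.format; literal braces are escaped by doubling them.
    print("{{{}: {}}}".format(item, count))
    # 3. The % operator, as used in the answer.
    print("{%s: %d}" % (item, count))

# The same str.format call, collected into a list instead of printed.
formatted = ["{{{}: {}}}".format(item, count) for item, count in words.items()]
```

All three print calls produce the same text for a given item; which one to use is mostly a matter of taste.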
You can use dict.get to eliminate the try/except block:
words = {}
for line in open("test.txt", 'r'):
    for word in line.split():
        print(word)
        words[word] = words.get(word, 0) + 1
for word, count in words.items():
    print(word, count)
dict.get returns the value stored for the key if the key is present in the dictionary; otherwise it returns the default value (None if no default is given).
Syntax: dict.get(key[, default])
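A small sketch of how dict.get behaves for present and missing keys may help here. The keys and counts are invented for illustration.

```python
counts = {"apple": 2}

# get returns the stored value when the key exists...
present = counts.get("apple", 0)
# ...and the default when it does not, without modifying the dict.
missing = counts.get("pear", 0)
# With no default argument, a missing key gives None.
no_default = counts.get("pear")

# The counting idiom from the answer: read with a default, then store.
for word in ["pear", "apple", "pear"]:
    counts[word] = counts.get(word, 0) + 1
```

After the loop, counts holds 3 for "apple" and 2 for "pear".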
You can also override __missing__:
class my_dict(dict):
    def __missing__(self, key):
        return 0
words = my_dict()
for line in open("test.txt", 'r'):
    for word in line.split():
        print(word)
        words[word] += 1
for word, count in words.items():
    print(word, count)
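One difference from collections.defaultdict is worth knowing: a __missing__ method that only returns a value does not store the key, whereas defaultdict inserts the key as soon as it is read. The sketch below illustrates this; ZeroDict is a hypothetical name for the same idea as my_dict.

```python
from collections import defaultdict

class ZeroDict(dict):
    """Hypothetical name; same idea as my_dict in the answer."""
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        return 0

plain = ZeroDict()
auto = defaultdict(int)

# Reading a missing key returns 0 in both cases...
plain["ghost"]
auto["ghost"]

# ...but only defaultdict stores the key as a side effect of the read.
plain_keys = list(plain)
auto_keys = list(auto)

# Augmented assignment stores the key in both.
plain["word"] += 1
auto["word"] += 1
```

For plain word counting the two behave the same, since every key is written immediately after it is read.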
The best way to iterate through a dictionary as you're doing here is to loop by key AND value, unpacking the key-value tuple each time through the loop:
for item, count in words.items():
    print("{", item, ": ", count, "}")
And as a side note, you don't really need that exception handling logic in the loop where you build the dictionary. A dictionary's get() method can return a default value if the key isn't in the dictionary, simplifying your code to this:
words[word] = words.get(word, 0) + 1
For a current research project, I am planning to measure the relative occurrence of a unique word within a JSON file. Currently, I have an indicator for the number of unique words within the file and their corresponding number of occurrences (e.g. "technology":"325") but am still lacking a method for a full word count.
The code as I am using for a full word count (total = sum(d[key])) yields the following notification. I have checked some solutions for similar problems but not found an applicable answer yet. Is there any smart way to get this solved?
total = sum(d[key]) - TypeError: 'int' object is not iterable
The corresponding code section looks like this:
# Create an empty dictionary
d = dict()
# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
    # Count the total number of words
    total = sum(d[key])
    print(total)
https://docs.python.org/3/library/functions.html#sum
You are calling sum(iterable, /, start=0) on an integer. This doesn't work, because sum is meant to be called on an iterable. Briefly, an iterable is something you could use a for loop on, for example a list.
You could modify your "# Print the contents of dictionary" loop in one of the following two ways:
# Print the contents of dictionary
total = 0
for key in list(d.keys()):
    print(key, ":", d[key])
    # Count the total number of words
    total += d[key]
    print(total)
print("Actual total:", total)
Or, more condensed:
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
# Get the total word count
total = sum(d.values())
Python's built-in sum function takes an iterable as its argument, but you are trying to pass a single number to it. Your code is equivalent to
total = sum(1)
but sum needs something iterable to compute the sum from, e.g.
sum([1,2,3,4,5,6,7])
If you want to compute the total number of words, you can try:
sum(d.values())
d = dict()
d['A'] = 1
d['B'] = 2
d['C'] = 3
# Count the total number of words
total = sum(d.values())
print(total)
for key in list(d.keys()):
    print(key, ":", d[key], float(d[key]) / total)
d[key] is a single int.
d.values() is an iterable of all the counts (a list in Python 2, a view object in Python 3).
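The difference between the two expressions can be checked directly. The following is a minimal sketch with made-up counts; it reproduces the error from the question and then computes the relative occurrence of each word.

```python
d = {"technology": 325, "data": 75}   # made-up counts for illustration

try:
    sum(d["technology"])              # sum() over a single int fails
except TypeError as exc:
    error_message = str(exc)

# Summing the values view gives the full word count.
total = sum(d.values())

# Relative occurrence of each word as a fraction of the total.
relative = {word: count / total for word, count in d.items()}
```

Here relative maps each word to its share of all counted words, which is the measure the question is after.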
Write a program that reads the contents of a random text file. The program should create a dictionary in which the keys are individual words found in the file and the values are the number of times each word appears.
How would I go about doing this?
def main():
    c = 0
    dic = {}
    words = set()
    inFile = open('text2', 'r')
    for line in inFile:
        line = line.strip()
        line = line.replace('.', '')
        line = line.replace(',', '')
        line = line.replace("'", '') #strips the punctuation
        line = line.replace('"', '')
        line = line.replace(';', '')
        line = line.replace('?', '')
        line = line.replace(':', '')
        words = line.split()
        for x in words:
            for y in words:
                if x == y:
                    c += 1
                    dic[x] = c
    print(dic)
    print(words)
    inFile.close()
main()
Sorry for the vague question. Never asked any questions here before. This is what I have so far. Also, this is the first ever programming I've done so I expect it to be pretty terrible.
with open('path/to/file') as infile:
    # code goes here
That's how you open a file.
for line in infile:
    # code goes here
That's how you read a file line-by-line.
line.strip().split()
That's how you split a line into (whitespace-separated) words.
some_dictionary['abcd']
That's how you access the key 'abcd' in some_dictionary.
Questions for you:
What does it mean if you can't access the key in a dictionary?
What error does that give you? Can you catch it with a try/except block?
How do you increment a value?
Is there some function that GETS a default value from a dict if the key doesn't exist?
For what it's worth, there's also a function that does almost exactly this, but since this is pretty obviously homework it won't fulfill your assignment requirements anyway. It's in the collections module. If you're interested, try and figure out what it is :)
There are at least three different approaches to adding a new word to the dictionary and counting the number of occurrences in this file.
def add_element_check1(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 1
        else:
            my_dict[e] += 1
def add_element_check2(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 0
        my_dict[e] += 1
def add_element_except(my_dict, elements):
    for e in elements:
        try:
            my_dict[e] += 1
        except KeyError:
            my_dict[e] = 1
my_words = {}
with open('pathtomyfile.txt', 'r') as in_file:
    for line in in_file:
        words = [word.strip().lower() for word in line.strip().split()]
        add_element_check1(my_words, words)
        #or add_element_check2(my_words, words)
        #or add_element_except(my_words, words)
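If you are wondering which is the fastest, the answer is: it depends. It depends on how often a given word occurs in the file. A try block costs very little when no exception is raised, but actually catching an exception is expensive. So try-except tends to be the best choice when most words repeat many times and KeyError is raised only rarely; when most words occur only once or twice, the explicit membership check tends to be cheaper.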
I have done some simple benchmarks here
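A comparison of this kind can be set up with the standard timeit module. This is only a sketch: the function names and the two input lists are made up for illustration, and timings vary by machine and Python version, so no results are claimed here.

```python
import timeit

def count_check(elements):
    # Membership test before every increment.
    counts = {}
    for e in elements:
        if e not in counts:
            counts[e] = 0
        counts[e] += 1
    return counts

def count_except(elements):
    # Try the increment first and handle the missing key.
    counts = {}
    for e in elements:
        try:
            counts[e] += 1
        except KeyError:
            counts[e] = 1
    return counts

# Hypothetical inputs: every word new versus heavy repetition.
mostly_unique = [str(i) for i in range(10000)]
mostly_repeated = ["the", "a", "of", "and"] * 2500

for label, data in [("unique", mostly_unique), ("repeated", mostly_repeated)]:
    t_check = timeit.timeit(lambda: count_check(data), number=20)
    t_except = timeit.timeit(lambda: count_except(data), number=20)
    print(label, "check: %.4fs" % t_check, "try/except: %.4fs" % t_except)
```

Running it on your own data is the most reliable way to decide between the two approaches.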
This is a perfect job for Python's built-in collections module. From it, you can import Counter, which is a dictionary subclass made for just this.
How you want to process your data is up to you. One way to do this would be something like this:
from collections import Counter
# Open your file and read its contents
with open("yourfile.txt", "r") as infile:
    textData = infile.read()
# Replace characters you don't want with empty strings
textData = textData.replace(".", "")
textData = textData.replace(",", "")
# Split on any whitespace (spaces, tabs, newlines)
textList = textData.split()
# Put your data into the counter container datatype
dic = Counter(textList)
# Print out the results
for key, value in dic.items():
    print("Word: %s\n Count: %d\n" % (key, value))
Hope this helps!
Matt
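As a variation on the Counter approach, str.translate can strip all ASCII punctuation in one pass, and Counter.most_common returns the words sorted by count. The sample text below is made up and stands in for the contents of the file.

```python
import string
from collections import Counter

text = "The cat sat. The dog sat, too; the end!"   # stand-in for file contents

# Build a translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)
tokens = text.translate(strip_punct).lower().split()

counts = Counter(tokens)
# The two most frequent words with their counts.
top_two = counts.most_common(2)
```

Lowercasing before counting means "The" and "the" are treated as the same word; drop the lower() call if case matters for your data.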
I'm writing code that will go over each word in words, look it up in dictionary and then add the dictionary value to counter. However, if I print counter, I only get the last number from my if statement, if any. If I place print counter inside the loop, then I get all the numbers for each individual word, but no total value.
My code is the following:
dictionary = {word:2, other:5, string:10}
words = "this is a string of words you see and other things"
if word in dictionary.keys():
    number = dictionary[word]
    counter += number
    print counter
my example will give me:
[10]
[5]
while I want 15, preferably outside the loop, since in the real-life code words is not a single string but many strings which are being looped over.
Can anyone help me with this?
Here's a pretty straightforward example that prints 15:
dictionary = {'word': 2, 'other': 5, 'string': 10}
words = "this is a string of words you see and other things"
counter = 0
for word in words.split():
    if word in dictionary:
        counter += dictionary[word]
print(counter)
Note that you should declare counter = 0 before the loop and use word in dictionary instead of word in dictionary.keys().
You can also write the same thing in one line using sum():
print(sum(dictionary[word] for word in words.split() if word in dictionary))
or:
print(sum(dictionary.get(word, 0) for word in words.split()))
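Since the question mentions that the real input is many strings rather than one, the same idea can be extended by keeping a running total across them. This is a sketch only; the lines list is hypothetical and could equally be an open file object.

```python
dictionary = {'word': 2, 'other': 5, 'string': 10}

# Hypothetical input: several strings instead of a single one.
lines = [
    "this is a string of words",
    "you see and other things",
    "one more word",
]

total = 0
for line in lines:
    # Each line contributes the values of the words found in the dictionary.
    total += sum(dictionary.get(word, 0) for word in line.split())

print(total)
```

The total is initialized once, before the outer loop, so it accumulates over every line.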
You should declare the counter outside the loop, before it starts. The rest of the logic in your code is fine.
The corrected code (with the dictionary keys quoted and the loop over the words written out):
dictionary = {'word': 2, 'other': 5, 'string': 10}
words = "this is a string of words you see and other things"
counter = 0
for word in words.split():
    if word in dictionary.keys():
        number = dictionary[word]
        counter += number
print(counter)
I'm not sure what you're doing with that code, since I don't see any loop there. However, a way to do what you want would be the following:
sum(dictionary[word] for word in words.split() if word in dictionary)
I need to update a dictionary with the use of sets that I have. My program needs to essentially take a set and assign it to a value (in a dictionary). If the set already exists, i need to update its value (keep adding the values together).
Here is how my program works now:
for line in fd:
    new_line = line.split(' ')
    for word in new_line:
        new_word = ''.join(l for l in word if l.isalpha())
        new_word = new_word.lower()
        ind_count = 0
        for let in new_word:
            c_dict[let, ind_count] = new_word
            ind_count += 1
My fd file contains a list of words.
I want my result to look something like this:
print(c_dict)
{ (0, "h") : { "hello", "helps" } , (0, "c") : { "cow" } }
This essentially takes a letter from the word and its index, and sets the value to that word. My file will have hundreds of words that have the letter 'h' at position 0, and the key (0, 'h') should have a value that contains all of those words.
Right now, my program just replaces the values. Any help would be greatly appreciated.
Thanks!
dict.setdefault() is perfect for this:
for line in fd:
    new_line = line.split(' ')
    for word in new_line:
        new_word = ''.join(l for l in word if l.isalpha())
        new_word = new_word.lower()
        for ind_count, let in enumerate(new_word):
            c_dict.setdefault((let, ind_count), set()).add(new_word)
Note that I also changed the innermost for loop to use enumerate() rather than manually incrementing ind_count inside the loop.
c_dict.setdefault((let, ind_count), set()).add(new_word) is equivalent in behavior to the following code:
if (let, ind_count) in c_dict:
    c_dict[let, ind_count].add(new_word)
else:
    c_dict[let, ind_count] = set([new_word])
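An alternative, not part of the answer above, is collections.defaultdict with set as the factory, which creates the empty set automatically on first access. In this sketch sample_words is a made-up stand-in for the cleaned words read from fd.

```python
from collections import defaultdict

c_dict = defaultdict(set)
sample_words = ["hello", "helps", "cow"]   # stand-in for the words read from fd

for new_word in sample_words:
    for ind_count, let in enumerate(new_word):
        # A missing key is created as an empty set before add() runs.
        c_dict[let, ind_count].add(new_word)
```

Note that the keys here are (letter, index) tuples, matching the code in the question rather than the (index, letter) order shown in its desired output.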
I have a small python script I am working on for a class homework assignment. The script reads a file and prints the 10 most frequent and infrequent words and their frequencies. For this assignment, a word is defined as 2 letters or more. I have the word frequencies working just fine, however the third part of the assignment is to print the total number of unique words in the document. Unique words meaning count every word in the document, only once.
Without changing my current script too much, how can I count all the words in the document only one time?
p.s. I am using Python 2.6 so please don't mention the use of collections.Counter
from string import punctuation
from collections import defaultdict
import re
number = 10
words = {}
total_unique = 0
words_only = re.compile(r'^[a-z]{2,}$')
counter = defaultdict(int)
"""Define words as 2+ letters"""
def count_unique(s):
    count = 0
    if word in line:
        if len(word) >= 2:
            count += 1
    return count
"""Open text document, read it, strip it, then filter it"""
txt_file = open('charactermask.txt', 'r')
for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()
        if words_only.match(word):
            counter[word] += 1
# Most Frequent Words
top_words = sorted(counter.iteritems(),
                   key=lambda(word, count): (-count, word))[:number]
print "Most Frequent Words: "
for word, frequency in top_words:
    print "%s: %d" % (word, frequency)
# Least Frequent Words:
least_words = sorted(counter.iteritems(),
                     key=lambda (word, count): (count, word))[:number]
print " "
print "Least Frequent Words: "
for word, frequency in least_words:
    print "%s: %d" % (word, frequency)
# Total Unique Words:
print " "
print "Total Number of Unique Words: %s " % total_unique
Count the number of keys in your counter dictionary:
total_unique = len(counter.keys())
Or more simply:
total_unique = len(counter)
A defaultdict is great, but it might be more than what you need. You will need it for the part about most frequent words. But in the absence of that question, using a defaultdict is overkill. In such a situation, I would suggest using a set instead:
words = set()
for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()
        if words_only.match(word):
            words.add(word)
num_unique_words = len(words)
Now words contains only unique words.
I am only posting this because you say that you are new to Python, so I want to make sure that you are aware of sets as well. Again, for your purposes, a defaultdict works fine and is justified.
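To see that the two ways of counting unique words agree, here is a small sketch. It is written for Python 3, unlike the Python 2.6 script in the question, and the sample text is made up.

```python
from collections import defaultdict

sample = "the cat and the hat and the bat"   # made-up text
counter = defaultdict(int)
for word in sample.split():
    counter[word] += 1

# Number of distinct words: both expressions give the same result.
unique_from_dict = len(counter)
unique_from_set = len(set(sample.split()))

# Most and least frequent words, with ties broken alphabetically.
most = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
least = sorted(counter.items(), key=lambda kv: (kv[1], kv[0]))[:2]
```

Because the dictionary has exactly one key per distinct word, its length is the unique word count with no extra bookkeeping.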