'str' object has no attribute 'txt' - python

I'm trying to get this code to work and keep getting
AttributeError: 'str' object has no attribute 'txt'
My code is below. I am new to this, so any help would be greatly appreciated. I cannot for the life of me figure out what I am doing wrong.
def countFrequency(alice):
    # Open file for reading
    file = open(alice.txt, "r")
    # Create an empty dictionary to store the words and their frequency
    wordFreq = {}
    # Read file line by line
    for line in file:
        # Split the line into words
        words = line.strip().split()
        # Iterate through the list of words
        for i in range(len(words)):
            # Remove punctuations and special symbols from the word
            for ch in '!"#$%&()*+,-./:;<=>?<#[\\]^_`{|}~' :
                words[i] = words[i].replace(ch, "")
            # Convert the word to lowercase
            words[i] = words[i].lower()
            # Add the word to the dictionary with a frequency of 1 if it is not already in the dictionary
            if words[i] not in wordFreq:
                wordFreq[words[i]] = 1
            # Increase the frequency of the word by 1 in the dictionary if it is already in the dictionary
            else:
                wordFreq[words[i]] += 1
    # Close the file
    file.close()
    # Return the dictionary
    return wordFreq
if __name__ == "__main__":
    # Call the function to get frequency of the words in the file
    wordFreq = countFrequency("alice.txt")
    # Open file for writing
    outFile = open("most_frequent_alice.txt", "w")
    # Write the number of unique words to the file
    outFile.write("Total number of unique words in the file: " + str(len(wordFreq)) + "\n")
    # Write the top 20 most used words and their frequency to the file
    outFile.write("\nTop 20 most used words and their frequency:\n\n")
    outFile.write("{:<20} {}\n" .format("Word", "Frequency"))
    wordFreq = sorted(wordFreq.items(), key = lambda kv:(kv[1], kv[0]), reverse = True)
    for i in range(20):
        outFile.write("{:<20} {}\n" .format(wordFreq[i][0], str(wordFreq[i][1])))
    # Close the file
    outFile.close()

file = open("alice.txt", "r")
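You missed the quotation marks around the filename. Without them, alice.txt is read as the attribute txt of the variable alice, which is a string, hence the AttributeError. Since the function already receives the filename in alice, open(alice, "r") works as well. You might also need to give the full path to the text file.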
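For illustration, here is a minimal sketch of the counting function with the filename passed straight to open(). The names count_frequency and path are hypothetical, and string.punctuation is swapped in for the hand-written punctuation string, so treat it as one possible variant rather than the asker's exact code.

```python
import string


def count_frequency(path):
    """Return a dict mapping each lowercased word in the file to its count."""
    word_freq = {}
    strip_punctuation = str.maketrans("", "", string.punctuation)
    # The parameter already holds the filename string, so pass it to open() directly.
    with open(path, "r") as handle:
        for line in handle:
            for raw in line.split():
                # Remove punctuation, then normalise case.
                word = raw.translate(strip_punctuation).lower()
                # Assumption: tokens that were pure punctuation are skipped.
                if word:
                    word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq
```

Calling count_frequency("alice.txt") then works as long as alice.txt is in the current working directory; otherwise pass the full path.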

Related

count how many times the same word occurs in a txt file

Hello I am struggling with this:
I have a txt file that once open looks like this:
Jim, task1\n
Marc, task3\n
Tom, task4\n
Jim, task2\n
Jim, task6\n
And I want to check how many duplicate names there are. I am interested only in the first field (i.e. the person's name).
I tried to look for an answer on this website, but I could not find anything that helped, as in my case I don't know in advance which name is duplicated, since this txt file will be updated frequently.
As I am new to Python/programming, is there a simple way to solve this without using any dictionaries or list comprehensions and without importing modules?
Thank you
same_word_count = 0
with open('tasks.txt','r') as file2:
    content = file2.readlines()
    for line in content:
        split_data = line.split(', ')
        user = split_data[0]
        word = user
        if word == user:
            same_word_count -= 1
print(same_word_count)
You can do the following.
word = "Word" # word you want to count
count = 0
with open("temp.txt", 'r') as f:
    for line in f:
        words = line.split()
        for i in words:
            if(i==word):
                count=count+1
print("Occurrences of the word", word, ":", count)
Or you can count the occurrences of every word:
# Open the file in read mode
text = open("sample.txt", "r")
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
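Since the question asks for the first field only, and for a solution without dictionaries, list comprehensions, or imports, here is a rough sketch that uses two parallel lists instead. The function name count_names and the inline sample lines are assumptions made for illustration.

```python
def count_names(lines):
    """Count how often each first field (the name) appears, using two parallel lists."""
    names = []   # distinct names in order of first appearance
    counts = []  # counts[i] is the number of lines whose name is names[i]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first comma is the person's name.
        name = line.split(",")[0].strip()
        if name in names:
            counts[names.index(name)] += 1
        else:
            names.append(name)
            counts.append(1)
    return names, counts


# Sample lines mirroring the file shown in the question.
sample = ["Jim, task1\n", "Marc, task3\n", "Tom, task4\n", "Jim, task2\n", "Jim, task6\n"]
names, counts = count_names(sample)
for i in range(len(names)):
    if counts[i] > 1:
        print(names[i], "appears", counts[i], "times")
```

With a real file, the open file object can be passed in place of the sample list, since iterating over a file yields its lines.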

syntax errors on creating wordDictionary of word and occurrences

I am getting an AttributeError on line 32. I would appreciate some help figuring out how to display each word and its number of occurrences.
import re
file_object = open('dialog.txt')
# read the file content
fileContents = file_object.read()
# convert fileContents to lowercase
final_dialog = fileContents.lower()
# print(final_dialog)
# replace a-z and spaces with cleanText variable
a_string = final_dialog
cleanText = re.sub("[^0-9a-zA-Z]+", "1", a_string)
# print(cleanText)
# wordlist that contains all words found in cleanText
text_string = cleanText
wordList = re.sub("1"," ", text_string)
# print(wordList)
#wordDictionary to count occurrence of each word to list in wordList
wordDictionary = dict()
#loop through .txt
for line in list(wordList):
    # remove spaces and newline characters
    line = line.strip()
    # split the line into words
    words = line.split()
    #iterate over each word in line
    for word in words.split():
        if word not in wordDictionary:
            wordDictionary[word] = 1
        else:
            wordDictionary[word] += 1
# print contents of dictionary
print(word)
# print file content
# print(fileContents)
# close file
# file_object.close()
I think the error is
for word in words.split():
and should be replaced with
for word in words:
Explanation: words is already a list. A list has no split method, so you'll get an AttributeError when trying to call that method.
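Beyond the split() call, note that list(wordList) turns the cleaned string into a list of single characters, so the outer loop would visit characters rather than words. A small sketch of one way to count whole words follows; the function name count_words and the sample sentence are hypothetical, and the intermediate "1" placeholder from the question is replaced by a direct substitution with a space.

```python
import re


def count_words(text):
    """Map each lowercase alphanumeric word in text to its number of occurrences."""
    # Replace every run of non-alphanumeric characters with one space.
    clean_text = re.sub(r"[^0-9a-z]+", " ", text.lower())
    word_dictionary = {}
    # str.split() already returns a list of words; iterate over it directly.
    for word in clean_text.split():
        if word not in word_dictionary:
            word_dictionary[word] = 1
        else:
            word_dictionary[word] += 1
    return word_dictionary


counts = count_words("Hello, world! Hello again.")
# Display every word with its count, not just the last word seen.
for word, occurrences in counts.items():
    print(word, ":", occurrences)
```

For the file in the question, the text would come from file_object.read() instead of the sample sentence.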

Python 3: Trying to change a dictionary to a string and write to file, but it still won't write to file?

I'm trying to take text files, count the usage of each word as key-value pairs in a dictionary, and write each dictionary to its own file. Then I want to merge all of the dictionaries into one master dictionary and write that to its own text file. When I run the program, I keep getting a TypeError in the save_the_dictionary function, since it's getting passed a dictionary instead of a string. However, I thought that save_the_dictionary converts each key-value pair to strings before they are written to the file, but that doesn't seem to be the case. Any help with this would be greatly appreciated. Here is my code:
import os
from nltk.tokenize import sent_tokenize, word_tokenize
class Document:
    def tokenize(self, text):
        dictionary = {}
        for line in text:
            all_words = line.upper()
            words = word_tokenize(all_words)
            punctuation = '''!()-[]{};:'"\,<>./?##$%^&*_~'''
            cleaned_words = []
            for word in words:
                if word not in punctuation:
                    cleaned_words.append(word)
            for word in cleaned_words:
                if word in dictionary:
                    dictionary[word] += 1
                else:
                    dictionary[word] = 1
        return dictionary
    def save_the_dictionary(self, dictionary, filename): #This save function writes a new file, and turns each key and its corresponding value into strings and writes them into a text file-
        newfile = open(filename, "w") #, it also adds formatting by tabbing over after the key, writing the value, and then making a new line. Then it closes the file.
        for key, value in dictionary.items():
            newfile.write(str(key) + "/t" + str(value) + "/n")
        file.close()
# The main idea of this method is that it first converts all the text to uppercase strips all of the formatting from the file that it is reading, then it splits the text into a list,-
# using both whitespace and the characters above as delimiters. After that, it goes through the entire list pulled from the text file, and sees if it is in the dictionary variable. If-
# it is in there, it adds 1 to the value associated with that key. If it is not found within the dictionary variable, it adds to as a key to the dictionary variable, and sets its value-
# to 1.
#The above document class will only be used within the actual vectorize function.
def vectorize(filepath):
    all_files = os.listdir(filepath)
    full_dictionary = {}
    for file in all_files:
        doc = Document()
        full_path = filepath + "\\" + file
        textfile = open(full_path, "r", encoding="utf8")
        text = textfile.read()
        compiled_dictionary = doc.tokenize(text)
        final_path = filepath + "\\final" + file
        doc.save_the_dictionary(final_path, compiled_dictionary)
        for line in text:
            all_words = line.upper()
            words = word_tokenize(all_words)
            punctuation = '''!()-[]{};:'"\,<>./?##$%^&*_~'''
            cleaned_words = []
            for word in words:
                if word not in punctuation:
                    cleaned_words.append(word)
            for word in cleaned_words:
                if word in dictionary:
                    full_dictionary[word] += 1
                else:
                    full_dictionary[word] = 1
    Document().save_the_dictionary(filepath + "\\df.txt", full_dictionary)
vectorize("C:\\Users\\******\\Desktop\\*******\\*****\\*****\\Text files")
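Judging only from the code above, one likely cause is the argument order: save_the_dictionary is defined as (self, dictionary, filename), but both calls pass the path first and the dictionary second, so open() receives a dictionary as its filename and raises a TypeError. The function also writes the literal text "/t" and "/n" instead of the "\t" and "\n" escapes, and calls file.close() rather than newfile.close(). A minimal sketch of a corrected save step, written as a standalone function for brevity and using a made-up dictionary and a temporary file path as assumptions:

```python
import os
import tempfile


def save_the_dictionary(dictionary, filename):
    """Write one 'key<TAB>value' line per dictionary entry to filename."""
    # The with-block closes the file even if a write fails.
    with open(filename, "w", encoding="utf8") as newfile:
        for key, value in dictionary.items():
            # "\t" and "\n" (backslashes) are the tab and newline escapes.
            newfile.write(str(key) + "\t" + str(value) + "\n")


# Hypothetical usage: the dictionary comes first and the path second,
# matching the parameter order of the definition.
target = os.path.join(tempfile.gettempdir(), "word_counts_example.txt")
save_the_dictionary({"ALICE": 3, "RABBIT": 1}, target)
with open(target, encoding="utf8") as check:
    saved_text = check.read()
os.remove(target)
```

In the original class, the equivalent change would be to call doc.save_the_dictionary(compiled_dictionary, final_path), or to swap the parameters in the definition.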

JSON File: Separate Word Count for Different Objects with Python

For a current research project, I am planning to count the unique words of different objects in a JSON file. Ideally, the output file should show separate word count summaries (counting the occurrence of unique words) for the texts in "Text Main", "Text Pro" and "Text Con". Is there any smart tweak to make this happen?
At the moment, I am receiving the following error message:
File "index.py", line 10, in <module>
text = data["Text_Main"]
TypeError: list indices must be integers or slices, not str
The JSON file has the following structure:
[
{"Stock Symbol":"A",
"Date":"05/11/2017",
"Text Main":"Text sample 1",
"Text Pro":"Text sample 2",
"Text Con":"Text sample 3"}
]
And the corresponding code looks like this:
# Import relevant libraries
import string
import json
import csv
import textblob
# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
text = data["Text_Main"]
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
# Save results as CSV
with open('Glassdoor_A.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Word", "Occurences", "Percentage"])
    writer.writerows([key, d[key])
Well, firstly the key should be "Text Main" and secondly you need to access the first dict in the list. So just extract the text variable like this:
text = data[0]["Text Main"]
This should fix the error message.
Your JSON file has an object inside a list. To access the content you want, first access the object via data[0]; then you can access the string field. Note that the key in your file is "Text Main" (with a space), not "Text_Main". I would change the code to:
# Open JSON file and slice by object
file = open("Glassdoor_A.json", "r")
data = json.load(file)
json_obj = data[0]
text = json_obj["Text Main"]
or you can access that field in a single line with text = data[0]["Text Main"], as quamrana stated.
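For the separate summaries the question asks about, a possible sketch is to keep one counter per field and loop over every object in the list. Also note that text is a single string here, so a loop like for line in text would visit characters rather than lines. The inline raw string stands in for the file contents, and the use of collections.Counter is a swapped-in choice, not something from the original code.

```python
import json
import string
from collections import Counter

# Inline stand-in for the file contents, copied from the structure in the question.
raw = '''[
  {"Stock Symbol": "A",
   "Date": "05/11/2017",
   "Text Main": "Text sample 1",
   "Text Pro": "Text sample 2",
   "Text Con": "Text sample 3"}
]'''
data = json.loads(raw)

fields = ["Text Main", "Text Pro", "Text Con"]
counts = {field: Counter() for field in fields}
strip_punctuation = str.maketrans("", "", string.punctuation)

for record in data:  # data is a list of dicts
    for field in fields:
        text = record.get(field, "")
        # Lowercase, drop punctuation, and split on whitespace.
        words = text.lower().translate(strip_punctuation).split()
        counts[field].update(words)

for field in fields:
    print(field, dict(counts[field]))
```

With the real file, json.load(file) would replace json.loads(raw), and each counter's items could be written out with csv.writer, one row per word.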

TypeError: 'NoneType' object is not iterable for a word counting program

When I run my current code, I get a TypeError: 'NoneType' object is not iterable. Specifically, the traceback points to mostword=wordcount(wordlist) on line 62 in the main function and to for x in wordlist on line 19. Can you help me figure out where I'm going wrong?
def getwords():
    #function to get words in the input file
    try:
        fp=open("sample.txt",'r')
    except IOError:
        print('Unable to open file')
        return
    words=[]
    #read the file line by line
    for line in fp:
        #convert each line into words with space as delimiter
        words=words+line.split()
    return words
def wordcount(wordlist):
    #function to count words in the file
    #worddic is dictionary to store words frequency
    worddic=dict()
    for x in wordlist:
        #convert word to lowercase to ignorecase
        t=x.lower()
        if(t not in worddic):
            worddic[t]=0
        worddic[t]=worddic[t]+1
    max=-1
    t=''
    for x in worddic:
        if(worddic[x]>max):
            max=worddic[x]
            t=x
    return t
def letters(wordlist,lettercount):
    #function to count letters in the file
    for x in wordlist:
        #For each word in the list
        t=x.lower()
        for y in t:
            #for each letter in the word
            if(not (y in lettercount)):
                #if the letter is not in dictionary add it
                #and set frequency to zero
                lettercount[y]=0
            #increment the frequency of letter in dictionary
            lettercount[y] = lettercount[y]+1
def createoutput(lettercount,wordlist,mostword):
    #creates an empty file 'statistics.txt'
    try:
        fout=open("statistics.txt",'w+')
    except IOError:
        print('Unable to create file')
    fout.write('Number of words in the file are '+str(len(wordlist))+'\n')
    fout.write('Most repeated word in the file is '+mostword+'\n')
    for x in lettercount:
        #write to the file 'statistics.txt'
        fout.write(x+' appeared in the file for '+str(lettercount[x])+' times \n')
def main():
    wordlist=getwords()
    #lettercount is a dictionary with letters as keys
    #and their frequency in the input file as data
    lettercount=dict()
    mostword=wordcount(wordlist)
    letters(wordlist,lettercount)
    createoutput(lettercount,wordlist,mostword)
main()
Thanks in advance. Much appreciated.
You declare words inside the function getwords after the try/except block. If the IOError is raised, you simply return, and a bare return gives back None, so wordcount then tries to iterate over None. Move words = [] to before the try/except and return words from the except branch.
This code should work now; the except branch returns the empty words list instead of None.
def getwords():
    #function to get words in the input file
    words = []
    try:
        fp = open("sample.txt", "r")
    except IOError:
        return words
    #read the file line by line
    for line in fp:
        #convert each line into words with space as delimiter
        words=words+line.split()
    return words
def wordcount(wordlist):
    #function to count words in the file
    #worddic is dictionary to store words frequency
    worddic=dict()
    for x in wordlist:
        #convert word to lowercase to ignorecase
        t=x.lower()
        if(t not in worddic):
            worddic[t]=0
        worddic[t]=worddic[t]+1
    max=-1
    t=''
    for x in worddic:
        if(worddic[x]>max):
            max=worddic[x]
            t=x
    return t
def letters(wordlist,lettercount):
    #function to count letters in the file
    for x in wordlist:
        #For each word in the list
        t=x.lower()
        for y in t:
            #for each letter in the word
            if(not (y in lettercount)):
                #if the letter is not in dictionary add it
                #and set frequency to zero
                lettercount[y]=0
            #increment the frequency of letter in dictionary
            lettercount[y] = lettercount[y]+1
def createoutput(lettercount,wordlist,mostword):
    #creates an empty file 'statistics.txt'
    try:
        fout=open("statistics.txt",'w+')
    except IOError:
        print('Unable to create file')
    fout.write('Number of words in the file are '+str(len(wordlist))+'\n')
    fout.write('Most repeated word in the file is '+mostword+'\n')
    for x in lettercount:
        #write to the file 'statistics.txt'
        fout.write(x+' appeared in the file for '+str(lettercount[x])+' times \n')
def main():
    wordlist=getwords()
    #lettercount is a dictionary with letters as keys
    #and their frequency in the input file as data
    lettercount=dict()
    mostword=wordcount(wordlist)
    letters(wordlist,lettercount)
    createoutput(lettercount,wordlist,mostword)
main()
To check what getwords actually returns, add a print of words just before returning it. If the file cannot be opened, the function returns None, and wordcount then fails with 'NoneType' object is not iterable. Note that words = words + line.split() is a valid way to build the list, and words.extend(line.split()) does the same in place. Avoid words.append(line.split()), which would add each line's words as a nested list.
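As a rough illustration of both points (always returning a list, and extending rather than appending), here is a compact sketch. The names get_words and most_common_word, the path parameter, and the use of collections.Counter are assumptions introduced for the example, not part of the original program.

```python
from collections import Counter


def get_words(path):
    """Return all whitespace-separated words in the file, or [] if it cannot be opened."""
    words = []
    try:
        with open(path, "r") as fp:
            for line in fp:
                # extend() adds each word individually; append() would nest a list.
                words.extend(line.split())
    except IOError:
        print("Unable to open file")
    return words  # always a list, never None


def most_common_word(wordlist):
    """Return the most frequent word, ignoring case, or '' for an empty list."""
    counts = Counter(word.lower() for word in wordlist)
    if not counts:
        return ""
    return counts.most_common(1)[0][0]


# A missing file now yields an empty list instead of None.
missing = get_words("no_such_file_for_this_example.txt")
print(repr(most_common_word(missing)))
print(most_common_word(["a", "B", "b"]))
```

Because get_words returns a list on every path, the caller can iterate over the result without first checking for None.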
