I'm new to Python and attempting to do an exercise where I open a txt file and then read its contents (probably straightforward for most, but I will admit I am struggling a bit).
I opened my file and used .read() to read it. I then proceeded to remove any punctuation from the text.
Next I created a for loop. In this loop I began by using .split() and adding to an expression:
words = words + len(characters)
words being previously defined as 0 outside the loop and characters being what was split at the beginning of the loop.
Very long story short, the problem I'm having now is that instead of adding whole words to my counter, each individual character is being added. Is there anything I can do to fix that in my for loop?
my_document = open("book.txt")
readTheDocument = my_document.read
comma = readTheDocument.replace(",", "")
period = comma.replace(".", "")
stripDocument = period.strip()
numberOfWords = 0
for line in my_document:
    splitDocument = line.split()
    numberOfWords = numberOfWords + len(splitDocument)
print(numberOfWords)
A more Pythonic way is to use with:
with open("book.txt") as infile:
count = len(infile.read().split())
You've got to understand that by using .split() you are not really getting grammatical words; you are getting word-like fragments. If you want proper words, use the nltk module:
import nltk

with open("book.txt") as infile:
    count = len(nltk.word_tokenize(infile.read()))
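Note that nltk's tokenizers need their model data downloaded once before word_tokenize will run; assuming a fresh installation where that data is not yet present, something like this is needed first:
import nltk
nltk.download('punkt')  # one-time download of the tokenizer models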
Just open the file and split its contents to get the count of words.
file = open("path/to/file/name.txt", "r+")
count = 0
for word in file.read().split():
    count = count + 1
print(count)
I have a file that is like this:
word, number
word, number
[...]
and I want to keep just the words, again one word per line
word
word
[...]
My code so far
f = open("new_file.txt", "w")
with open("initial_file.txt" , "r+") as l:
for line in l:
word = line.split(", ")[0]
f.write(word)
print word # debugging purposes
This gives me all the words on one line in the new file:
wordwordwordword[...]
What is the Pythonic and most efficient way to do this?
I tried to use f.write("\n".join(word)) but what I got was
wordw
ordw
[...]
You can just use f.write(str(word) + "\n") to do this. Here str() is used to make sure we can concatenate "\n".
If you're on Windows, it's better to use "\r\n" instead.
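As a small illustration (reusing the file names from the question and opening both files with with), the loop with that fix applied might look like this:
with open("initial_file.txt") as source, open("new_file.txt", "w") as target:
    for line in source:
        word = line.split(", ")[0]
        target.write(word + "\n")  # one word per line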
I'm pretty new to this and I was trying to write a program which counts the words in txt files. There is probably a better way of doing this, but this was the idea I came up with, so I wanted to go through with it. I just don't understand why i, or any variable, doesn't work as an index for the string of the page that I'm counting on...
Do you guys have a solution, or should I just take a different approach?
page = open("venv\harrry_potter.txt", "r")
alphabet = "qwertzuiopüasdfghjklöäyxcvbnmßQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM"
# Counting the characters
list_of_lines = page.readlines()
characternum = 0
textstr = "" # to convert the .txt file to string
for line in list_of_lines:
    for character in line:
        characternum += 1
        textstr += character
# Counting the words
i = 0
wordnum = 1
while i <= characternum:
    if textstr[i] not in alphabet and textstr[i+1] in alphabet:
        wordnum += 1
    i += 1
print(wordnum)
page.close()
Counting the characters and converting the .txt file to a string is done a bit awkwardly, because I thought the other way could be the source of the problem...
Can you help me please?
Typically you want to use split() for simple word counting. The way you are doing it, you will count right-minded as two words, or don't as two words. If you can rely on spaces alone, then you can just use split() like this:
book = "Hello, my name is Inigo Montoya, you killed my father, prepare to die."
words = book.split()
print(f'word count = {len(words)}')
You can also pass arguments to split() for more options if the default behaviour doesn't suit you.
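For instance (a small added illustration, not part of the original answer), split() accepts a separator and a maxsplit count:
book = "Hello, my name is Inigo Montoya"
print(book.split())        # split on any whitespace: ['Hello,', 'my', 'name', 'is', 'Inigo', 'Montoya']
print(book.split(","))     # split on commas: ['Hello', ' my name is Inigo Montoya']
print(book.split(" ", 2))  # at most 2 splits: ['Hello,', 'my', 'name is Inigo Montoya']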
https://pythonexamples.org/python-count-number-of-words-in-text-file/
You want to get the word count of a text file.
The shortest code is this (that I could come up with):
with open('lorem.txt', 'r') as file:
    print(len(file.read().split()))
For smaller files this is fine, but it loads all of the data into memory, so it is not great for large files. First of all, use a context manager (with); it helps with error handling and other things. What happens here is that you print the length of the whole file read and split by whitespace: file.read() reads the whole file and returns a string, .split() splits that string on whitespace and returns a list of the words in between, and you take the length of that list.
A better approach would be this:
word_count = 0
with open('lorem.txt', 'r') as file:
    for line in file:
        word_count += len(line.split())
print(word_count)
This is better because the whole file is not loaded into memory; you read each line separately, and the previous one is overwritten in memory. Again, for each line you split it on whitespace and measure the length of the returned list, then add that to the total word count. At the end you simply print out the total.
Useful sources:
about with
Context Managers - Efficiently Managing Resources (to learn how they work a bit in detail) by Corey Schafer
.split() "docs"
I want to check how many times a word is repeated in the file. I have seen other code for finding words in a file, but it doesn't solve my problem. What I mean is: if I search for "Python is my favourite language", the program should split the text and tell me how many times each word is repeated in the file.
def search_tand_export():
    file = open("mine.txt")
    #targetlist = list()
    #targetList = [line.rstrip() for line in open("mine.txt")]
    contentlist = file.read().split(" ")
    string=input("search box").split(" ")
    print(string)
    fre={}
    outputfile=open("outputfile.txt",'w')
    for word in contentlist:
        print(word)
        for i in string:
            # print(i)
            if i == word:
                print(f"'{string}' is in text file ")
                outputfile.write(word)
                print(word)
                spl=tuple(string.split())
                for j in range(0,len(contentist)):
                    if spl in contentlist:
                        fre[spl]+=1
                    else:
                        fre[spl]=1
                sor_list=sorted(fre.items(),key =lambda x:x[1])
                for x,y in sor_list:
                    print(f"Word\tFrequency")
                    print(f"{x}\t{y}")
            else:
                continue
    print(f"The word or collection of word is not present")

search_tand_export()
I don't quite understand what you're trying to do.
But I suppose you are trying to find how many times every word from a given sentence is repeated in the file.
If this is the case, you can try something like this:
sentence = "Python is my favorite programming language"
words = sentence.split()
with open("file.txt") as fp:
file_data = fp.read()
for word in words:
print(f"{file_data.count(word)} occurence(s) of '{word}' found")
Note that the code above is case-sensitive (that is, "Python" and "python" are different words). To make it case-insensitive, you can convert file_data and each word to lowercase during the comparison using str.lower().
sentence = "Python is my favorite programming language"
words = sentence.split()
with open("file.txt") as fp:
file_data = fp.read().lower()
for word in words:
print(f"{file_data.count(word.lower())} occurence(s) of '{word}' found")
A couple of things to note:
You are opening a file and never closing it (although you should). It's better to use with open(...) as ... (a context manager), so the file is closed automatically.
Python strings (as well as lists, tuples, etc.) have a .count(what) method. It returns how many occurrences of what are found in the object; see the small example after this list.
Read about the PEP-8 coding style and give your variables better names. For example, it is not easy to understand what fre means in your code, but if you name it frequency, the code becomes more readable and easier to work with.
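A quick illustration of .count() (an added example, not from the original answer):
text = "python is fun and python is easy"
print(text.count("python"))  # 2
print(text.count("Python"))  # 0, because .count() is case-sensitive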
to be continued
Try this script. It looks for word in the file and counts how many times it is found, keeping the total in count:
file = open('hello.txt', 'r')
word = 'Python'
count = 0
for line in file:
    count += line.split().count(word)
file.close()
print('File contains ' + word + ' ' + str(count) + ' times')
I am stuck on a bit of code and I can't get it to work.
from random import randint
def random_song():
    global song
    linenum = randint(1,43)
    open('data.txt')
    band_song = readlines."data.txt"(1)
    global band
    band = band_song.readlines(linenum)
    song = band_song.split(" ,")
What I'm trying to do is generate a random number between the first and last line of a text file and then read that specific line, then split the line into two strings. E.g. line 26, "Iron Maiden,Phantom of the Opera", split into "Iron Maiden" and "Phantom of the Opera".
Also, how do I reduce the second string to the first letter of each word, in a way that works for any number of words and any number of letters per word?
Thank you,
MiniBitComputers
There's a space in your split string; you don't need it. Just split on ',' and use .strip() to get rid of the whitespace around each result.
There's some odd code around the reading of the file as well, and you're splitting the list of read lines, not just the line you want to read.
There's also no need to use globals; it's bad practice and best avoided in almost all cases.
All that fixed:
from random import randint
def random_song():
    with open('data.txt') as f:
        lines = f.readlines()
    artist, song = lines[randint(1,43)].split(',')
    return artist.strip(), song.strip()

print(random_song())
Note that using with ensures the file is closed once the with block ends.
As for getting the first letter of each word:
s = 'This is a bunch of words of varying length.'
first_letters = [word[0] for word in s.split(' ')]
print(first_letters)
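If you want those first letters as a single string of initials, you can join them (a small added example, not part of the original answer):
initials = ''.join(word[0] for word in s.split())
print(initials)  # 'Tiabowovl'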
I have multiple files, each with a single line containing, say, ~10M numbers. I want to check each file and print a 0 for each file that has repeated numbers and a 1 for each that doesn't.
I am using a list for counting frequency. Because of the large number of values per line, I want to update the frequency after reading each number and break as soon as I find a repeated one. While this is simple in C, I have no idea how to do this in Python.
How do I read a line word by word without storing (or taking as input) the whole line?
EDIT: I also need a way to do this from live input rather than a file.
Read the line, split the line, and copy the resulting list into a set. If the size of the set is less than the size of the list, the file contains repeated elements.
with open('filename', 'r') as f:
    for line in f:
        # Here is where you do what I said above
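A minimal sketch of that check (the has_duplicates helper is an added illustration, assuming each line holds whitespace-separated numbers; the 0/1 output follows the question):
def has_duplicates(line):
    # True if any whitespace-separated token on the line repeats
    numbers = line.split()
    return len(set(numbers)) < len(numbers)

with open('filename', 'r') as f:
    for line in f:
        # print 0 for a file with repeated numbers, 1 otherwise
        print(0 if has_duplicates(line) else 1)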
To read the file word by word, try this:
import itertools

def readWords(file_object):
    word = ""
    # Read one character at a time until read(1) returns an empty string (EOF)
    for ch in itertools.takewhile(bool, map(file_object.read, itertools.repeat(1))):
        if ch.isspace():
            if word:  # In case of multiple consecutive spaces
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word  # Handles the last word before EOF
Then you can do:
with open('filename', 'r') as f:
    seen = set()
    for num in map(int, readWords(f)):
        if num in seen:  # the set tells us this number already exists
            break
        seen.add(num)
This method should also work for streams, because it only reads one character at a time and yields one whitespace-delimited word at a time from the input stream.
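For the live-input case mentioned in the question's edit, the same generator can be pointed at sys.stdin (a sketch, assuming the numbers arrive whitespace-separated):
import sys

seen = set()
for num in map(int, readWords(sys.stdin)):
    if num in seen:
        print(0)  # a repeated number was found
        break
    seen.add(num)
else:
    print(1)      # the stream ended without any repeats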
After giving this answer, I've updated this method quite a bit. Have a look
https://gist.github.com/smac89/bddb27d975c59a5f053256c893630cdc
What you are asking for is not possible, I guess. You can't read word by word as such in Python. Something like this can be done:
f = open('words.txt')
for word in f.read().split():
    print(word)