Python - Calculating length of string is inaccurate for certain strings only


I'm new to programming and trying to make a basic hangman game. For some reason, when I calculate the length of a string read from a text file, some words get the wrong length: some values are too high and some too low. I can't figure out why. I have already made sure there are no spaces in the text file, so a space isn't being counted as a character.
import random
#chooses word from textfile for hangman
def choose_word():
    words = []
    with open("words.txt", "r") as file:
        words = file.readlines()
    #number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    #chooses the selected word from the text file
    chosen_word = (words[n-1])
    print(chosen_word)
    #calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()
#obama returns 5
#snake, turtle, hoodie, all return 7
#intentions returns 11
#racecar returns 8
words.txt
snake
racecar
turtle
cowboy
intentions
hoodie
obama

Use strip().
str.strip([chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in that string will be stripped from both ends of the string this method is called on.
Each line returned by readlines() still ends with its newline character ("\n"), and a newline counts as whitespace, so strip() removes it.
Example:
>>> ' Hello'.strip()
'Hello'
Try this:
import random
#chooses word from textfile for hangman
def choose_word():
    words = []
    with open("words.txt", "r") as file:
        words = file.readlines()
    #number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    #chooses the selected word from the text file
    chosen_word = (words[n-1].strip())
    print(chosen_word)
    #calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()
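A quick way to see the hidden character is to print repr() of the line instead of the line itself. The sketch below is only an illustration: raw_lines is a made-up stand-in for what file.readlines() would return, and load_words is a hypothetical helper name, not part of the original code.

```python
import random

def load_words(lines):
    """Strip surrounding whitespace (newlines included) and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]

# Stand-in for file.readlines(); the last line has no trailing newline.
raw_lines = ["snake\n", "racecar\n", "obama"]

print(repr(raw_lines[0]))  # repr() makes the trailing newline visible
print(len(raw_lines[0]))   # 6: five letters plus the newline

words = load_words(raw_lines)
chosen_word = random.choice(words)  # avoids the manual randint/index arithmetic
print(chosen_word, len(chosen_word))
```

random.choice is used here only to keep the sketch short; indexing with randint works just as well once the lines are stripped.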

You are reading a random line from a text file.
You probably have spaces after the words in some of those lines.
For example, if the word "snake" is written in the file as "snake " and followed by a newline, readlines() returns "snake \n", which has a length of 7.
To solve it you can either:
A) Remove the spaces in the file, manually or with a script
B) After you read a random line from the text, and before you check the length of the word, write: chosen_word = chosen_word.replace(" ", "").
This will remove the spaces from your word. Note that it does not remove the trailing newline; chosen_word.strip() removes both.

You need to strip the whitespace from each line. strip() removes leading and trailing whitespace, including the newline at the end of each line. Here is your corrected code.
import random
# chooses word from textfile for hangman
def choose_word():
    words = []
    with open("./words.txt", "r") as file:
        words = file.readlines()
    # number of words in text file
    num_words = sum(1 for line in open("words.txt"))
    n = random.randint(1, num_words)
    # chooses the selected word from the text file
    # Added strip() to remove spaces surrounding your words
    chosen_word = (words[n-1]).strip()
    print(chosen_word)
    # calculates the length of the word
    len_word = len(chosen_word)
    print(len_word)
choose_word()

I'm assuming that the .txt file contains one word per line, without commas.
Maybe try changing a few things here:
First, notice that the readlines() method returns a list of all the lines, and each line still includes the newline character "\n".
# This removes the newline from each line
# strip() also removes newlines, as Hampus Larsson suggested
words = [x.strip() for x in file.readlines()]
You can calculate the number of words from the length of the words list itself:
num_words = len(words)
You do not need parentheses to get the random word. Since n runs from 1 to num_words, keep the n-1 index:
chosen_word = words[n-1]
It should now work correctly!

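In the file, every word is followed by a \n that marks the end of the line.
To cut that out, replace:
chosen_word = (words[n-1])
with
chosen_word = (words[n-1][:-1])
This cuts off the last character of the chosen word, which is the newline. Note that if the last line of the file has no trailing newline, the slice removes a real letter from that word instead.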
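To make the difference between slicing and stripping concrete, here is a small sketch. The two helper names are made up for illustration; the sample lines mimic a file whose last line has no trailing newline.

```python
def drop_newline_by_slice(line):
    # Always removes the last character, whatever it is.
    return line[:-1]

def drop_newline_by_rstrip(line):
    # Removes trailing newline characters only if they are present.
    return line.rstrip("\n")

for line in ["snake\n", "obama"]:
    print(repr(drop_newline_by_slice(line)), repr(drop_newline_by_rstrip(line)))
# 'snake' 'snake'
# 'obam' 'obama'
```

The slice is fine for every line except one that lacks the newline, which is why rstrip("\n") or strip() is the safer choice.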

Related

syntax errors on creating wordDictionary of word and occurrences

I'm getting an AttributeError on line 32. I'd like some help figuring out how to display each word and its number of occurrences.
import re
file_object = open('dialog.txt')
# read the file content
fileContents = file_object.read()
# convert fileContents to lowercase
final_dialog = fileContents.lower()
# print(final_dialog)
# replace a-z and spaces with cleanText variable
a_string = final_dialog
cleanText = re.sub("[^0-9a-zA-Z]+", "1", a_string)
# print(cleanText)
# wordlist that contains all words found in cleanText
text_string = cleanText
wordList = re.sub("1"," ", text_string)
# print(wordList)
#wordDictionary to count occurrence of each word to list in wordList
wordDictionary = dict()
#loop through .txt
for line in list(wordList):
    # remove spaces and newline characters
    line = line.strip()
    # split the line into words
    words = line.split()
    #iterate over each word in line
    for word in words.split():
        if word not in wordDictionary:
            wordDictionary[word] = 1
        else:
            wordDictionary[word] += 1
# print contents of dictionary
print(word)
# print file content
# print(fileContents)
# close file
# file_object.close()

I think the error is
for word in words.split():
and should be replaced with
for word in words:
Explanation: words is already a list. A list has no split method, so you'll get an AttributeError when trying to call that method.
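Beyond the AttributeError, note that list(wordList) turns the string into a list of single characters, so the loop in the question walks over letters rather than lines. A minimal sketch of one way to count words and display the results, assuming the goal is a per-word tally of lowercase alphanumeric runs (count_words and sample are illustrative names, not from the question):

```python
import re

def count_words(text):
    """Return a dict mapping each lowercase word to its number of occurrences."""
    counts = {}
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "The cat saw the other cat.\nThe end!"
for word, n in sorted(count_words(sample).items()):
    print(word, n)
```

Printing inside a loop over items() shows every word with its count, rather than only the last word seen.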

How can I read a line from a file and split it

I am stuck on a bit of code and I can't get it to work.
from random import randint
def random_song():
    global song
    linenum = randint(1,43)
    open('data.txt')
    band_song = readlines."data.txt"(1)
    global band
    band = band_song.readlines(linenum)
    song = band_song.split(" ,")
What I'm trying to do is generate a random number between the first and last line of a text file and then read that specific line. Then split the line into 2 strings. E.g.: line 26, "Iron Maiden,Phantom of the Opera", split into "Iron Maiden" and "Phantom of the Opera".
Also, how do I reduce the second string to the first letter of each word, in a way that works for any word length and any number of words?
Thank you,
MiniBitComputers

There's a space in your split string that you don't need; just split on ',' and use .strip() to get rid of whitespace on the outside of each result.
There's some odd code around the reading of the file as well. And you're splitting the list of read lines, not just the line you want to read.
There's also no need for globals; they're bad practice and best avoided in almost all cases.
All that fixed:
from random import choice
def random_song():
    with open('data.txt') as f:
        lines = f.readlines()
    # choice() picks one line at random, so the line count is not hard-coded
    artist, song = choice(lines).split(',')
    return artist.strip(), song.strip()
print(random_song())
Note that using with ensures the file is closed once the with block ends.
As for getting the first letter of each word:
s = 'This is a bunch of words of varying length.'
first_letters = [word[0] for word in s.split()]
print(first_letters)
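Putting the two parts together on the example line from the question might look like the sketch below. The function names are made up for illustration, and the sketch assumes the band name itself contains no comma, so it splits on the first comma only.

```python
def split_band_song(line):
    """Split 'Band,Song' on the first comma and trim surrounding whitespace."""
    band, song = line.split(",", 1)
    return band.strip(), song.strip()

def initials(title):
    """Return the first letter of every whitespace-separated word."""
    return [word[0] for word in title.split()]

band, song = split_band_song("Iron Maiden,Phantom of the Opera\n")
print(band)            # Iron Maiden
print(song)            # Phantom of the Opera
print(initials(song))  # ['P', 'o', 't', 'O']
```

Because split() with no argument handles any amount of whitespace, this works for any number of words and any word length.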

python - interpreting two spaces as one from text file

I'm trying to make a program that translates Morse code from a text file. In theory it should be pretty easy, but I find the formatting of the text file a bit silly (it's school work, so I can't change it). In the file, one space separates two characters (like this -. ---), but two spaces mark the end of a word (so a space in the translated text). Like this: .--. .-.. . .- ... . .... . .-.. .--. .-.-.-
This is what I have, but it gives me translated text without the spaces.
translator = {} #alphabet and the equivalent code, which I got from another file
message = []
translated = ("")
msg_file = open("msg.txt","r")
for line in msg_file:
    line = line.rstrip()
    part = line.rsplit(" ")
    message.extend(part)
for i in message:
    if i in translator.keys():
        translated += (translator[i])
print(translated)
I also don't know how to handle the line break (\n).

Why don't you split on two spaces to get the words, then on a single space to get the characters? Something like:
translated = "" # store for the translated text
with open("msg.txt", "r") as f: # open your file for reading
    for line in f: # read the file line by line
        words = line.strip().split("  ") # split by two spaces to get our words
        parsed = [] # storage for our parsed words
        for word in words: # iterate over the words
            chars = [] # store for the translated characters of this word
            for char in word.split(" "): # get characters by splitting and iterate over them
                chars.append(translator.get(char, " ")) # translate the character
            parsed.append("".join(chars)) # join the characters and add the word to `parsed`
        translated += " ".join(parsed) # join the parsed words and add them to `translated`
        # uncomment if you want to add new line after each line of the file:
        # translated += "\n"
print(translated) # print the translated string
Of course, all this assumes your translator dict has the proper mapping.

Split on a double space first to get a list of the words in each line; then you can split each word on a single space to get the characters to feed your translator.
translator = {} #alphabet and the equivalent code, which I got from another file
message = []
translated = ("")
with open('msg.txt',"r") as msg_file:
    for line in msg_file:
        line = line.strip()
        words = line.split('  ')
        line = []
        for word in words:
            characters = word.split()
            word = []
            for char in characters:
                word.append(translator[char])
            line.append(''.join(word))
        message.append(' '.join(line))
print('\n'.join(message))
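The two-level split can also be packed into a small function and tried on a single line. In this sketch MORSE is a deliberately tiny, hypothetical stand-in for the asker's translator dict, covering only the letters needed for the sample, and unknown codes are shown as "?" (an assumption, not something the question specifies).

```python
# Partial Morse table, just enough for the sample line below.
MORSE = {".--.": "P", ".-..": "L", ".": "E", ".-": "A", "...": "S", "....": "H"}

def decode_line(line, table):
    """Decode one line: two spaces separate words, one space separates letters."""
    words = line.strip().split("  ")
    return " ".join(
        "".join(table.get(code, "?") for code in word.split())
        for word in words
    )

print(decode_line(".--. .-.. . .- ... .  .... . .-.. .--.\n", MORSE))
# PLEASE HELP
```

Calling decode_line on each line of the file and joining the results with "\n" keeps the original line breaks.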

Find the number of characters in a file using Python

Here is the question:
I have a file with these words:
hey how are you
I am fine and you
Yes I am fine
And it is asked to find the number of words, lines and characters.
Below is my program, but the number of counts for the characters without space is not correct.
The number of words is correct and the number of line is correct.
What is the mistake in the same loop?
fname = input("Enter the name of the file:")
infile = open(fname, 'r')
lines = 0
words = 0
characters = 0
for line in infile:
    wordslist = line.split()
    lines = lines + 1
    words = words + len(wordslist)
    characters = characters + len(line)
print(lines)
print(words)
print(characters)
The output is:
lines=3(Correct)
words=13(correct)
characters=47
I've looked at several answers on the site and I'm confused, because they use Python functions I haven't learned yet. How do I correct the code while keeping it as simple and basic as the loop I've written?
The number of characters without spaces should be 35, and with spaces 45.
If possible, I want to find the number of characters without spaces. Even if someone only knows the loop for the number of characters with spaces, that's fine.

Sum up the length of all words in a line:
characters += sum(len(word) for word in wordslist)
The whole program:
with open('my_words.txt') as infile:
    lines=0
    words=0
    characters=0
    for line in infile:
        wordslist=line.split()
        lines=lines+1
        words=words+len(wordslist)
        characters += sum(len(word) for word in wordslist)
print(lines)
print(words)
print(characters)
Output:
3
13
35
This:
(len(word) for word in wordslist)
is a generator expression. It is essentially a loop in one line that produces the length of each word. We feed these lengths directly to sum:
sum(len(word) for word in wordslist)
Improved version
This version takes advantage of enumerate, so you save two lines of code, while keeping the readability:
with open('my_words.txt') as infile:
    words = 0
    characters = 0
    for lineno, line in enumerate(infile, 1):
        wordslist = line.split()
        words += len(wordslist)
        characters += sum(len(word) for word in wordslist)
print(lineno)
print(words)
print(characters)
This line:
with open('my_words.txt') as infile:
opens the file with the promise to close it as soon as you leave the indented block.
It is always good practice to close a file after you are done using it.
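For a version that can be checked without a file on disk, the same loop can be run over an in-memory file. This is only a sketch: count_file is a made-up name, io.StringIO stands in for the opened file, and the sample text is the three lines from the question with no newline after the last one.

```python
import io

def count_file(infile):
    """Return (lines, words, chars without whitespace, chars with spaces)."""
    lines = words = chars_no_space = chars_with_space = 0
    for line in infile:
        wordslist = line.split()
        lines += 1
        words += len(wordslist)
        chars_no_space += sum(len(word) for word in wordslist)
        chars_with_space += len(line.rstrip("\n"))  # drop only the newline
    return lines, words, chars_no_space, chars_with_space

sample = io.StringIO("hey how are you\nI am fine and you\nYes I am fine")
print(count_file(sample))  # (3, 13, 35, 45)
```

The last two numbers match the 35 and 45 the asker expected; the 47 in the question comes from the two newline characters.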

Remember that each line (except possibly the last) ends with a line separator.
On disk that is "\r\n" on Windows or "\n" on Linux and Mac; when the file is opened in text mode, Python 3 presents both as "\n".
Thus, exactly two characters are added in this case, giving 47 and not 45.
A nice way to overcome this could be to use:
fname=input("enter the name of the file:")
infile=open(fname, 'r')
lines=0
words=0
characters=0
for line in infile:
    line = line.rstrip("\n")
    wordslist=line.split()
    lines=lines+1
    words=words+len(wordslist)
    characters=characters+ len(line)
print(lines)
print(words)
print(characters)

To count the characters, you should count each individual word. So you could have another loop that counts characters:
for word in wordslist:
    characters += len(word)
That ought to do it. Since line.split() with no arguments splits on any whitespace, the newline at the end of each line never ends up in wordslist, so no extra rstrip() is needed.

I found this solution very simple and readable:
with open("filename", 'r') as file:
    text = file.read().strip().split()
    len_chars = sum(len(word) for word in text)
    print(len_chars)

This is too long for a comment.
Python 2 or 3? Because it really matters. Try out the following in your REPL for both (in a UTF-8 terminal):
Python 2.7.12
>>> len("taña")
5
Python 3.5.2
>>> len("taña")
4
Huh? The answer lies in Unicode. That ñ is one character but not one byte: in UTF-8 it takes two bytes. A Python 2 str is a sequence of bytes, so len() counts 5 there, while a Python 3 str is a sequence of Unicode code points, so len() counts 4. So unless you're working with plain ASCII text, you'd better specify which version of Python your character-counting function is for.
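In Python 3 the same distinction can be seen by comparing the string with its encoded bytes. The sketch below also shows that the count depends on Unicode normalization: a decomposed ñ (an n followed by a combining tilde) counts as two code points. The escape \u00f1 is used only so the example does not depend on the file's encoding.

```python
import unicodedata

word = "ta\u00f1a"  # "taña" written with a single precomposed ñ

print(len(word))                  # 4 code points
print(len(word.encode("utf-8")))  # 5 bytes: ñ takes two bytes in UTF-8

decomposed = unicodedata.normalize("NFD", word)  # n + combining tilde
print(len(decomposed))            # 5 code points

recomposed = unicodedata.normalize("NFC", decomposed)
print(len(recomposed))            # 4 again
```

If a character count has to match what a reader sees, normalizing to NFC before calling len() is one reasonable approach, although it does not cover every script.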

How's this? It uses a regular expression to match all non-whitespace characters and returns the number of matches within a string.
import re
DATA="""
hey how are you
I am fine and you
Yes I am fine
"""
def get_char_count(s):
    return len(re.findall(r'\S', s))
if __name__ == '__main__':
    print(get_char_count(DATA))
Output
35

It is probably counting newline characters. Subtract the number of newlines, here (lines - 1), from characters.

Here is the code (Python 2; it counts every character, including spaces and newlines):
fp = open(fname, 'r+').read()
chars = fp.decode('utf8')
print len(chars)
Check the output. I just tested it.

A more Pythonic solution than the others:
with open('foo.txt') as f:
    text = f.read().splitlines() # list of lines
lines = len(text) # length of the list = number of lines
words = sum(len(line.split()) for line in text) # split each line on spaces, sum up the lengths of the lists of words
characters = sum(len(line) for line in text) # sum up the length of each line
print(lines)
print(words)
print(characters)
The other answers here are manually doing what str.splitlines() does. There's no reason to reinvent the wheel.

You do have the correct answer, and your code is completely correct. What I think is happening is that an end-of-line character is being counted on each line, which increases your character count by two (there isn't one on the last line, as there is no new line to go to). If you want to remove this, the simple fudge would be to do as Loaf suggested
characters = characters - (lines - 1)
See csl's answer for the second part...

Simply skip unwanted characters while calling len ('\n' is used rather than os.linesep because lines read in text mode always end in '\n'),
characters=characters+ len([c for c in line if c not in ('\n', ' ')])
or sum the count,
characters=characters+ sum(1 for c in line if c not in ('\n', ' '))
or build a str from wordslist and take len,
characters=characters+ len(''.join(wordslist))
or sum the characters in wordslist. I think this is the fastest.
characters=characters+ sum(1 for word in wordslist for char in word)

You have two problems. One is the line endings and the other is the spaces in between.
Now there are many people who posted pretty good answers, but I find this method easier to understand:
characters = characters + len(line.strip()) - line.strip().count(' ')
line.strip() removes the leading and trailing whitespace, including the newline. Then I'm subtracting the number of spaces from the total length.

It's very simple (note that this counts bytes, including spaces and line endings):
f = open('file.txt', 'rb')
f.seek(0) # Move to the start of file
print(len(f.read()))

Here is a short program for your problem:
with open('FileName.txt') as f:
    lines = f.readlines()
data = ''.join(lines)
print('lines =',len(lines))
print('Words = ',len(data.split()))
data = ''.join(data.split())
print('characters = ',len(data))
lines is the list of lines, so its length is the number of lines. Next, data contains the file contents as a single string, so splitting data gives the list of words in the file, and the length of that list is the number of words. Joining the words list again gives all the non-whitespace characters as a single string, so its length is the number of characters.

This takes the file name (e.g. files.txt) as input, counts the total number of characters in the file, and saves it in the variable char. Note that len(line) includes spaces and the newline at the end of each line.
fname = input("Enter the name of the file:")
infile = open(fname, 'r') # open the file
lines = 0
words = 0
char = 0 # init as zero integer
for line in infile:
    wordslist = line.split() # split the line into words
    lines = lines + 1 # count the line
    words = words + len(wordslist) # add the number of words in the line
    char = char + len(line) # add the number of characters in the line
print("lines are: " + str(lines))
print("words are: " + str(words))
print("chars are: " + str(char)) # print the totals

num_lines = sum(1 for line in open('filename.txt'))
num_words = sum(1 for word in open('filename.txt').read().split())
num_chars = sum(len(word) for word in open('filename.txt').read().split())

Displaying the Top 10 words in a string

I am writing a program that grabs a txt file off the internet and reads it. It then displays a bunch of data related to that txt file. Now, this all works well, until we get to the end. The last thing I want to do is display the top 10 most frequent words used in the txt file. The code I have right now only displays the most frequent word 10 times. Can someone look at this and tell me what the problem is? The only part you have to look at is the last part.
import urllib
open = urllib.urlopen("http://www.textfiles.com/etext/FICTION/alice30.txt").read()
v = str(open) # this variable makes the file a string
strip = v.replace(" ", "") # this trims spaces
char = len(strip) # this variable counts the number of characters in the string
ch = v.splitlines() # this variable seperates the lines
line = len(ch) # this counts the number of lines
print "Here's the number of lines in your file:", line
wordz = v.split()
print wordz
print "Here's the number of characters in your file:", char
spaces = v.count(' ')
words = ''.join(c if c.isalnum() else ' ' for c in v).split()
words = len(words)
print "Here's the number of words in your file:", words
topten = map(lambda x:filter(str.isalpha,x.lower()),v.split())
print "\n".join(sorted(words,key=words.count)[-10:][::-1])

Use collections.Counter to count all the words; Counter.most_common(10) will return the ten most common words and their counts:
wordz = v.split()
from collections import Counter
c = Counter(wordz)
print(c.most_common(10))
Using with to open the file and get a count of all the words in the txt file (open() only reads local files, so this assumes the text has been saved locally as alice30.txt):
from collections import Counter
with open("alice30.txt") as f:
    c = Counter()
    for line in f:
        c.update(line.split()) # Counter.update adds the values
print(c.most_common(10))
To get the total number of characters in the file (not counting whitespace), sum the length of each key multiplied by the number of times it appears:
print(sum(len(k)*v for k,v in c.items()))
To get the word count:
print(sum(c.values()))
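As for why the original printed one word ten times: sorting the full word list by count keeps every duplicate, so the last ten entries tend to be copies of the most frequent word. Counting distinct words avoids that. Here is a small sketch on an in-memory string; top_words and sample are illustrative names, and the lowercase regular expression is just one assumed way to define a word.

```python
from collections import Counter
import re

def top_words(text, n=10):
    """Return the n most common lowercase words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

sample = "the cat and the hat and the bat"
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```

To print only the words, one per line, loop over the returned pairs and print the first element of each.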
