Python: How to count average length of lines in a text file

I'm having trouble defining a function that takes a filename as a parameter and returns the average line length. Particularly having trouble removing the "\n" in the file.
This is what I have so far:
def averageLineLength(fn):
    fn = open("Ex_file.txt", "r")
    lines = fn.readlines()
    return (sum(len(line) for line in lines) / len(lines))

You can use strip('\n') to remove leading and trailing \n from each line; a minor modification to your own code should be sufficient:
def averageLineLength(fn):
    with open(fn, "r") as f:
        lines = f.readlines()
    return sum([len(line.strip('\n')) for line in lines]) / len(lines)
With the '\n' argument only newline characters are removed; strip() with no argument would clean out all leading and trailing whitespace. Another way to drop the \n from each line is replace:
sum(len(line.replace("\n", "")) for line in lines) / len(lines)

The common way to get rid of whitespace (such as newlines) in Python is strip(). There's also rstrip() if you want to preserve the left side, and you can give either of them an argument if you only want to target newlines specifically:
>>> ' Hello, world \n'.strip()
'Hello, world'
>>> ' Hello, world \n'.rstrip()
' Hello, world'
>>> ' Hello, world \n'.strip('\n')
' Hello, world '
Two other notes: The original function did not actually use the filename, and in Python 2 it also performs integer division (which may or may not be intentional). With these modifications:
def averageLineLength(fn):
    with open(fn) as f:
        lines = [line.strip() for line in f]
    return 1.0 * sum(map(len, lines)) / len(lines)

What you have already will tell you the average line length.
There are a few different methods for handling removal of the '\n'.
The simplest is to just use the strip() method. This will remove all leading and trailing whitespace from each line.
If you want to remove only the trailing '\n', you can write a simple list comprehension like this:
[l[:-1] if l[-1] == "\n" else l for l in lines]
Or simply remove the final character without checking, if you are sure every line ends in '\n'. Be careful: readlines() does not guarantee this, because the last line of a file may have no trailing newline, in which case a real character would be cut off.
[l[:-1] for l in lines]
You should also open the file in a "with" block to ensure it is closed when the program exits the block. Making these changes, and actually using the filename parameter, your function becomes the following:
def averageLineLength(fn):
    with open(fn, "r") as f:
        lines = [l.strip() for l in f.readlines()]
    return sum(len(line) for line in lines) / len(lines)
Or alternatively, if you want to preserve leading and trailing whitespace that is not '\n':
def averageLineLength(fn):
    with open(fn, "r") as f:
        lines = [l[:-1] for l in f.readlines()]
    return sum(len(line) for line in lines) / len(lines)

This solution should solve the problem as well:
def averageLineLength(fn):
    with open(fn, 'r') as f:
        lst = f.readlines()
    return sum([len(line.strip()) for line in lst]) / len(lst)
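Pulling the answers above together, a minimal sketch might look like the following. It assumes that only the trailing newline should be left out of each line's length, and that an empty file should give 0.0 instead of raising ZeroDivisionError; the question specifies neither, so both are assumptions. The name average_line_length is just an illustrative variant of the asker's function, and the code relies on Python 3 true division.

```python
def average_line_length(filename):
    """Return the mean line length of a text file, not counting newlines."""
    with open(filename) as f:
        # rstrip("\n") drops only the line terminator and keeps other whitespace
        lengths = [len(line.rstrip("\n")) for line in f]
    if not lengths:
        # Assumption: treat an empty file as having an average length of 0.0
        return 0.0
    return sum(lengths) / len(lengths)
```

Swapping rstrip("\n") for strip() would also exclude leading and trailing spaces from the count, which is the behaviour of several answers above.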

Related

Is there any shortcut in Python to remove all blanks at the end of each line in a file?

I've learned that we can easily remove blank lines in a file, or strip the blanks from a single string, but how about removing all blanks at the end of each line in a file?
One way would be to process each line of the file, like:
stripped = []
with open(file) as f:
    for line in f:
        stripped.append(line.strip())  # store the stripped line
Is this the only way to complete the task?
Possibly the ugliest implementation possible, but here's what I just scratched up :0
def strip_str(string):
    last_ind = 0
    split_string = string.split(' ')
    for ind, word in enumerate(split_string):
        if word == '\n':
            return ''.join([split_string[0]] + [' {} '.format(x) for x in split_string[1:last_ind]])
        last_ind += 1
Don't know if these count as different ways of accomplishing the task. The first is really just a variation on what you have. The second does the whole file at once, rather than line-by-line.
Use map to call the 'rstrip' method on each line of the file:
import operator
with open(filename) as f:
    # basically the same as (line.rstrip() for line in f)
    for line in map(operator.methodcaller('rstrip'), f):
        print(line)  # do something with the line
Or read the whole file and use re.sub():
import re
with open(filename) as f:
    text = f.read()
# [ \t]+$ with re.MULTILINE matches spaces and tabs at the end of every line
# without touching the newlines, so blank lines are kept
text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
If you just want to remove spaces, another option would be:
line.replace(" ", "")
Be aware that this removes every space in the line, not only the trailing ones.
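As a rough sketch of the per-line approach applied to a whole text at once, the helper below trims trailing whitespace from every line of a string. The name rstrip_lines is hypothetical, and the decision to keep one final newline when the input ended with a line break is an assumption; note also that all line endings are normalised to '\n'.

```python
def rstrip_lines(text):
    """Remove trailing whitespace from every line, keeping the line structure."""
    # splitlines() drops the line endings; rstrip() then trims spaces and tabs
    stripped = [line.rstrip() for line in text.splitlines()]
    result = "\n".join(stripped)
    # Assumption: keep a single final newline if the input ended with a line break
    if text.endswith(("\n", "\r")):
        result += "\n"
    return result
```

Because the lines are split first, blank lines in the middle of the text survive as empty strings and are written back unchanged.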

how to replace a line of two words in a file using python

I want to replace a line in a file but my code doesn't do what I want. The code doesn't change that line. It seems that the problem is the space between ALS and 4277 characters in the input.txt. I need to keep that space in the file. How can I fix my code?
Part of input.txt:
ALS 4277
Related part of the code:
for lines in fileinput.input('input.txt', inplace=True):
    print(lines.rstrip().replace("ALS"+str(4277), "KLM" + str(4945)))
Desired output:
KLM 4945
Using the same idea that other users have already pointed out, you could also reproduce the same spacing, by first matching the spacing and saving it in a variable (spacing in my code):
import re
with open('input.txt') as f:
    lines = f.read()
# re.search finds the pattern anywhere in the text (re.match only checks the start)
match = re.search(r'ALS(\s+)4277', lines)
if match is not None:
    spacing = match.group(1)
    lines = re.sub(r'ALS\s+4277', 'KLM%s4945' % spacing, lines.rstrip())
print(lines)
As the spaces vary you will need to use regex to account for the spaces.
import re
lines = "ALS 4277 "
line = re.sub(r"(ALS\s+4277)", "KLM 4945", lines.rstrip())
print(line)
Try:
with open('input.txt') as f:
    for line in f:
        parts = line.split()
        # only rewrite lines that consist of exactly the two expected words
        if parts == ['ALS', '4277']:
            line = line.replace('ALS', 'KLM').replace('4277', '4945')
        print(line, end='')  # as line has '\n'
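The regex answers above can also be written with a backreference, so the original spacing is carried over in a single substitution. This is only a sketch: the function name replace_pair is made up for illustration, and using [ \t]+ instead of \s+ (so a match cannot span a line break) is an assumption about what the asker wants.

```python
import re

def replace_pair(text):
    """Replace 'ALS<spaces>4277' with 'KLM<same spaces>4945'."""
    # ([ \t]+) captures the run of spaces or tabs; \g<1> re-inserts it unchanged.
    # \g<1> is used instead of \1 because \14945 would be read as a group number.
    return re.sub(r"ALS([ \t]+)4277", r"KLM\g<1>4945", text)
```

Lines that do not contain the pair, or that have no whitespace between the two words, are returned untouched.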

Trim whitespace from multiple lines

I tried to trim whitespace in python using s.strip() like this, but it's only working on the first line:
Input:
a
b
Output:
a
b
How do I get it to trim whitespace from multiple lines? Here's my code:
Code:
import sys
if __name__ == "__main__":
    text_file = open("input.txt", "r")
    s = text_file.read()
    s = s.strip()
    text_file.close()
    with open("Output.txt", "w") as text_file:
        text_file.write(s)
Split the lines, strip each, then re-join:
s = text_file.read()
s = '\n'.join([line.strip() for line in s.splitlines()])
This uses the str.splitlines() method, together with the str.join() method to put the lines together again with newlines in between.
Better still, read the file line by line, process and write out in one go; that way you need far less memory for the whole process:
with open("input.txt", "r") as infile, open("Output.txt", "w") as outfile:
    for line in infile:
        outfile.write(line.strip() + '\n')
The issue occurs because str.strip() only strips leading and trailing whitespace; it does not strip whitespace in the middle of the string.
For the input:
 a
 b
after text_file.read() the actual string representation would be:
' a\n b'
s.strip() strips the whitespace at the two ends of that string, but not the \n and the spaces in the middle. That is why you still get multiple lines and the spaces before the second line are not removed.
For your case to work, you should read the input line by line, strip each line, and write it back.
Example:
import sys
if __name__ == "__main__":
    with open("input.txt", "r") as text_file, open("Output.txt", "w") as out_file:
        for line in text_file:
            out_file.write(line.strip() + '\n')
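The difference described in this answer can be seen on a small in-memory string; the sample text below is an assumption standing in for the contents of input.txt.

```python
text = "  a\n  b"

# strip() on the whole string only trims its two ends
whole = text.strip()

# stripping line by line trims every line separately
per_line = "\n".join(line.strip() for line in text.splitlines())

print(repr(whole))     # 'a\n  b'
print(repr(per_line))  # 'a\nb'
```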
Use
for line in s.splitlines()
to iterate over each line and use strip() for them.
Just for completeness, there is also textwrap.dedent(),
which lets you write multi-line strings indented in code (for readability) while the resulting strings have no common left-hand whitespace.
For example, as given in https://docs.python.org/3/library/textwrap.html#textwrap.dedent
import textwrap
def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))                   # prints '    hello\n      world\n    '
    print(repr(textwrap.dedent(s)))  # prints 'hello\n  world\n'

python not writing when a space in text

I have working code which writes 'hi' in a text file if 'Welcome' is present in the next line.
But if the next line begins with whitespace before the word 'Welcome', then it does not write 'hi'.
Code:
with open('afile.txt', 'r+') as f:
    a = [x.rstrip() for x in f]
    index = 0
    for item in a:
        if item.startswith("Welcome"):
            a.insert(index, "hi")
            break
        index += 1
    # Go to start of file and clear it
    f.seek(0)
    f.truncate()
    # Write each line back
    for line in a:
        f.write(line + "\n")
Input: afile.txt
Welcome here
Good place
Expected output:
hi
Welcome here
Good place
I need to preserve my indentation also. How can I do that?
You are currently checking for Welcome directly. Instead, strip the whitespace from the line before testing it, and use the following condition:
if item.strip().startswith("Welcome"):
EDIT
I see you've already applied rstrip earlier in a = [x.rstrip() for x in f]. You could apply lstrip there instead to remove the whitespace on the left. However, if you do this, your indentation will not be preserved.
In the line:
a = [x.rstrip() for x in f]
replace rstrip with strip and the check will match, although the indentation is then discarded as well.
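To keep the indentation, the whitespace can be ignored for the comparison only, leaving the stored lines as they are. The sketch below works on a list of lines such as the list a in the question; the function name and its parameters are hypothetical, and inserting "hi" without any indentation is an assumption about the desired output.

```python
def insert_before_welcome(lines, marker="Welcome", new_line="hi"):
    """Return a copy of lines with new_line inserted before the first marker line."""
    result = list(lines)
    for index, item in enumerate(result):
        # lstrip() is applied only for the test; the stored line keeps its indentation
        if item.lstrip().startswith(marker):
            result.insert(index, new_line)
            break
    return result
```

The returned list can then be written back with the same seek, truncate and write steps used in the question.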

How to read a text file into a string variable and strip newlines?

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will normally close the file when the file object is garbage collected.
But other Python implementations may not do so promptly. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
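If the goal is the single string the question asks for ('ABCDEF'), the list returned by splitlines() can be joined straight away. The helper below is a sketch; its name and the separator parameter are illustrative additions, not something from the answers.

```python
def read_as_single_line(path, separator=""):
    """Read a text file and return its lines joined into one string."""
    with open(path) as f:
        # splitlines() removes the line endings, so nothing is left to strip
        return separator.join(f.read().splitlines())
```

Passing separator=" " keeps a space between what used to be separate lines.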
It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and prefer to use read() in combination with rstrip("\n"). Without rstrip("\n"), the string keeps the newline that ends the file, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)
Here are four options to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    # each line still ends in "\n" when iterating, so strip it before joining
    data = "".join([line.rstrip("\n") for line in file])
You can compress this into two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
with open("data.txt", "r") as myfile:
    data = ""
    lines = myfile.readlines()
    for line in lines:
        data = data + line.strip()
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt', 'r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line
f.close()
print(string)
Python 3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in list(f)]
One-liner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list is generally faster than a generator but heavier on memory; a generator is slower but lighter on memory, much like iterating over the lines directly. In the case of "".join(), I think both should work well. Drop the "".join() call to get the list or the generator itself.
Note: explicitly closing the file is probably not needed in a short script, although a with block is the safer habit.
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
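The same removal can be done in a single pass with str.translate instead of chained replace calls. This is just an alternative sketch; the function name remove_line_breaks is made up for illustration.

```python
# Translation table that deletes carriage returns and line feeds
_LINE_BREAKS = str.maketrans("", "", "\r\n")

def remove_line_breaks(text):
    """Return text with every '\\r' and '\\n' removed."""
    # Deleting both characters individually also covers the '\r\n' pair
    return text.translate(_LINE_BREAKS)
```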
I don't feel that anyone addressed the [ ] part of your question. Because the file had multiple lines, reading it line by line and replacing the \n in each one left you with a list of strings. If you have such a variable x and display it with
x
or print(x)
or str(x)
you will see the entire list, brackets included. If you index a single element of the list
x[0]
then the brackets are gone. To see just the data, without the surrounding '' as well, print the element:
print(x[0])
Maybe you could try this? I use this in my programs.
Data = open('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n', ' ', f.read()))[:-1]
print(l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
    data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
is, in my opinion, the best way to get all the lines of a file; the '\n' are already stripped by splitlines() (which smartly recognizes Windows/Mac/Unix line endings).
But if you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just a useful example; you can process each line as you please.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a different line
Or:
print(words[0] + ",", words[1])  # The "+" joins the strings without a space
# The comma between the two arguments makes print insert a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
    data = myfile.readline()
    words = data.split(" ")
    word = words[0]
This code reads the first line and splits it on spaces, so that the words of that line are stored in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
file.close()
text = ''  # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)
Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
    # split on the newline character itself ('\\n' would look for a literal backslash followed by n)
    sentences = data.split('\n')
    for sentence in sentences:
        print(sentence)
Caution: this does not remove the \n from data. It only prints the text line by line.
