file.readlines leaving blank lines [duplicate]


I have read that file.readlines reads the whole file line by line and stores it in a list.
If I have a file like so -
Sentence 1
Sentence 2
Sentence 3
and I use readlines to print each sentence like so -
file = open("test.txt")
for i in file.readlines():
    print i
The output is
Sentence 1

Sentence 2

Sentence 3
My question is why do I get the extra line between each sentence and how can I get rid of it?
UPDATE
I found that using i.strip() also removes the extra lines. Why does this happen? As far as I know, strip removes the whitespace at the beginning and end of a string.

file.readlines() returns a list of strings, and each string keeps its trailing newline. The print statement prints its argument followed by another newline; that's why you get extra lines.
To remove the extra newline, use str.rstrip:
print i.rstrip('\n')
or use sys.stdout.write (after import sys), which does not add a newline:
sys.stdout.write(i)
BTW, don't use file.readlines unless you need all lines at once. Just iterate over the file.
with open("test.txt") as f:
    for i in f:
        print i.rstrip('\n')
        ...
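A small Python 3 sketch of the behaviour described above (the answer itself uses Python 2). It assumes io.StringIO as a stand-in for the file opened with open("test.txt"); a real file object yields lines the same way.

```python
import io

# io.StringIO stands in for open("test.txt"); both are iterated the same way
fake_file = io.StringIO("Sentence 1\nSentence 2\nSentence 3\n")

# readlines() keeps the "\n" at the end of every element
raw_lines = fake_file.readlines()
print(raw_lines)

# Rewind, then iterate directly and strip only the trailing newline
fake_file.seek(0)
clean_lines = [line.rstrip("\n") for line in fake_file]
for line in clean_lines:
    print(line)  # no blank lines in between now
```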
UPDATE
In Python 3, to stop print from adding a trailing newline, you can use print(i, end='').
In Python 2, you can use the same feature after doing: from __future__ import print_function
Answer to UPDATE
Tabs and newlines are also considered whitespace.
>>> ' \r\n\t\v'.isspace()
True
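A Python 3 sketch contrasting the two methods; the sample string is made up for illustration. strip() treats tabs and spaces as removable whitespace too, which is why i.strip() in the question also gets rid of the newline.

```python
line = "\tindented text  \n"

# strip() removes every kind of leading/trailing whitespace (tabs, spaces, newlines)
print(repr(line.strip()))        # 'indented text'
# rstrip("\n") removes only trailing newline characters
print(repr(line.rstrip("\n")))   # '\tindented text  '

# The characters below all count as whitespace for strip()
print(" \r\n\t\v\f".isspace())   # True
```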

file.readlines() (and also file.readline()) includes the newlines.
Do
print i.replace('\n', '')
if you don't want them.
It may seem weird to include the newline at the end of the line, but this lets you tell, for example, whether the last line has a newline character or not. That case is tricky in many languages' I/O.
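To illustrate the point about the last line, here is a Python 3 sketch that checks whether a file ends with a newline. The helper name last_line_has_newline is hypothetical, io.StringIO stands in for a real file, and reporting an empty file as True is an arbitrary choice.

```python
import io

def last_line_has_newline(f):
    """Return True if the file's final line ends with a newline character.

    An empty file is reported as True, since no newline is missing.
    """
    lines = f.readlines()
    return not lines or lines[-1].endswith("\n")

print(last_line_has_newline(io.StringIO("a\nb\n")))  # True
print(last_line_has_newline(io.StringIO("a\nb")))    # False
```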

The following will strip the newline for you.
i = i.rstrip("\n") #strips newline
Hope this helps.

This worked for me with Python 3.
I found the rstrip() function really useful in these situations:
for i in readtext:  # readtext: an open file or a list of lines
    print(i.rstrip())

with open(txtname, 'r') as txtfile:
    lines = txtfile.readlines()
    lines = [j for j in lines if j != '\n']
with open(outname, 'w') as outfile:
    outfile.writelines(lines)
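The filter above only drops lines that are exactly '\n', so a line containing only spaces survives. A sketch of a slightly more tolerant filter, assuming whitespace-only lines should also count as blank; drop_blank_lines is a hypothetical name and io.StringIO stands in for both files.

```python
import io

def drop_blank_lines(lines):
    # line.strip() is "" (falsy) for lines holding only whitespace,
    # so "   \n" is dropped as well as "\n"
    return [line for line in lines if line.strip()]

src = io.StringIO("first\n\n   \nsecond\n")  # stands in for open(txtname, "r")
dst = io.StringIO()                          # stands in for open(outname, "w")
dst.writelines(drop_blank_lines(src.readlines()))
print(repr(dst.getvalue()))  # 'first\nsecond\n'
```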

Related

output gets printed but it has an extra empty line

I'm trying this simple code to print a txt file with an if condition.
The code works fine, but the output has an extra empty line. How do I fix that?
with open('test.txt') as file:
    for line in file:
        if 'Efficient AP Image Upgrade ..... Enabled' in line:
            break
        print(line)

The string in line contains a newline character at the end. To stop the print function from adding another newline (the default behaviour), call print(line, end='') to specify that you don't want the extra newline.

You are probably using print on strings that already have a final newline -- in fact now that the question has been tidied up we can see that this is the case because you are using an iterator over file, and this will produce a sequence of lines that end with newline characters (except possibly the last line if it does not have a newline in the input file).
Note that after the data items, the print function will write an additional newline by default (more specifically, it will write the value specified by the end parameter, which defaults to a newline).
Possible approaches:
Use sys.stdout.write (does not append newline):
sys.stdout.write(text)
Use print but set it to write empty string instead of newline at the end:
print(text, end='')
Remove any newlines before printing (in principle this may include newlines in the middle of the string but because your strings come from an iterator over file object, there shouldn't be any):
print(text.replace('\n', ''))
Remove any leading or trailing whitespace (including newlines) before printing - note that this may include other spaces:
print(text.strip())
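The approaches listed above can be compared side by side by capturing what they write. This sketch assumes Python 3 and uses contextlib.redirect_stdout and io.StringIO only to collect the output for inspection; with a real file, line would come from for line in file.

```python
import io
import sys
from contextlib import redirect_stdout

line = "Sentence 1\n"  # a line as yielded by iterating over a file

buf = io.StringIO()
with redirect_stdout(buf):
    print(line)                # the line's own "\n" plus print's "\n"
    print(line, end="")        # only the line's own "\n"
    sys.stdout.write(line)     # write() adds nothing
    print(line.rstrip("\n"))   # print supplies the only "\n"

output = buf.getvalue()
print(repr(output))  # only the first call produced a blank line
```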

By default, print() ends its output with a newline each time you call it. Instead, try using
print(line,end='')
or
print(line,end=' ')

To remove the trailing new line you can strip new lines from the right side of the line:
print(text.rstrip('\n'))

Python whitespace in file [duplicate]

I have a .txt file with some words in it, like "example". I also have the following code:
name = open("file.txt", "r")
print(name.read())
print("text")
input()
Why is there a whitespace in the output like
"example
text"
And how do I stop that from happening?

The reason you get the "extra" whitespace may be that file.txt ends with extra whitespace. You should check every byte of the file, especially the '\n' and '\r' characters.
To avoid the problem:
print(name.read().rstrip())
print("text")
str.rstrip removes whitespace from the end of the string. Although I am not sure what caused your problem, str.rstrip should stop it from happening.

See the article "Python Remove Character from String"; removing the newline characters will solve your problem:
Name = open("file.txt", "r")
print(Name.read().replace('\n', ''))
print("text")
input()

You can do something like this:
ans = ''
with open('test.txt', 'r') as f:
    for line in f:
        for word in line.split():
            ans += word + ' '
print(ans)
This separates the text word by word, so you can do whatever you want with each word.
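If the goal is simply to get the file's words onto one line, a shorter variant of the loop above is to split on any whitespace and join with a single space, which also avoids the trailing space that ans += word + ' ' leaves at the end. A Python 3 sketch, with file_to_single_line as a hypothetical helper and io.StringIO standing in for file.txt:

```python
import io

def file_to_single_line(f):
    # split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines), so joining with " " puts every word
    # on one line with no trailing space
    return " ".join(f.read().split())

print(file_to_single_line(io.StringIO("example\n")) + " text")  # example text
```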

How can I format a txt file in python so that extra paragraph lines are removed as well as extra blank spaces?

I'm trying to format a file similar to this: (random.txt)
Hi, im trying to format a new txt document so
that extra spaces between words and paragraphs are only 1.
This should make this txt document look like:
This is how it should look below: (randomoutput.txt)
Hi, I'm trying to format a new txt document so
that extra spaces between words and paragraphs are only 1.
This should make this txt document look like:
So far the code I've managed to make has only removed the spaces, but I'm having trouble making it recognize where a new paragraph starts so that it doesn't remove the blank lines between paragraphs. This is what I have so far.
def removespaces(input, output):
    ivar = open(input, 'r')
    ovar = open(output, 'w')
    n = ivar.read()
    ovar.write(' '.join(n.split()))
    ivar.close()
    ovar.close()
Edit:
I've also found a way to create spaces between paragraphs, but right now it just takes every line break and creates a space between the old line and new line using:
m = ivar.readlines()
m[:] = [i for i in m if i != '\n']
ovar.write('\n'.join(m))

You should process the input line by line. Not only will this make your program simpler, it will also be easier on the system's memory.
The logic for normalizing horizontal white space in a line stays the same (split words and join with a single space).
What you'll need to do for the paragraphs is test whether line.strip() is empty (just use it as a boolean expression) and keep a flag whether the previous line was empty too. You simply throw away the empty lines but if you encounter a non-empty line and the flag is set, print a single empty line before it.
with open('input.txt', 'r') as istr:
    new_par = False
    for line in istr:
        line = line.strip()
        if not line:  # blank
            new_par = True
            continue
        if new_par:
            print()  # print a single blank line
        print(' '.join(line.split()))
        new_par = False
If you want to suppress blank lines at the top of the file, you'll need an extra flag that you set only after encountering the first non-blank line.
If you want to get fancier, have a look at the textwrap module, but be aware that it has (or, at least, used to have, as far as I can tell) some bad worst-case performance issues.
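A sketch of the same line-by-line logic, wrapped in a function that returns the output lines instead of printing them, and including the extra flag mentioned above for blank lines at the top of the file. The name normalize_paragraphs is hypothetical and io.StringIO stands in for input.txt.

```python
import io

def normalize_paragraphs(lines):
    """Collapse spaces inside lines and blank-line runs between paragraphs.

    Returns output lines without newlines. Blank lines before the first
    paragraph and after the last one are dropped.
    """
    out = []
    seen_text = False  # the extra flag: set after the first non-blank line
    new_par = False    # a blank line was seen since the last text line
    for line in lines:
        words = line.split()
        if not words:  # blank or whitespace-only line
            new_par = True
            continue
        if new_par and seen_text:
            out.append("")  # a single separating blank line
        out.append(" ".join(words))
        seen_text = True
        new_par = False
    return out

sample = io.StringIO("\n\nHi,  im   here\n\n\n\nnext   para\n")
print("\n".join(normalize_paragraphs(sample)))
```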

The trick here is that you want to turn any sequence of 2 or more \n into exactly 2 \n characters. This is hard to write with just split and join, but it's dead simple to write with re.sub (after import re):
n = re.sub(r'\n\n+', r'\n\n', n)
If you want lines with nothing but spaces to be treated as blank lines, do this after stripping spaces; if you want them to be treated as non-blank, do it before.
You probably also want to change your space-stripping code to use split(' ') rather than just split(), so it doesn't screw up newlines. Note that split(' ') yields empty strings for runs of spaces, so drop those before joining. (You could also use re.sub for that, but it isn't really necessary, because turning 1 or more spaces into exactly 1 isn't hard to write with split and join.)
Alternatively, you could just go line by line, and keep track of the last line (either with an explicit variable inside the loop, or by writing a simple adjacent_pairs iterator with itertools.tee and itertools.zip_longest, like i1, i2 = tee(ivar); next(i2); return zip_longest(i1, i2, fillvalue='')) and if the current line and the previous line are both blank, don't write the current line.
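A sketch of the adjacent_pairs idea, assuming Python 3's itertools.tee and itertools.zip_longest. Instead of comparing each line with the previous one, it pairs each line with the next and drops a blank line when the following line is also blank, which leaves one blank line per run. Because the fillvalue '' counts as blank, blank lines at the very end of the input are dropped too. Both function names are hypothetical.

```python
from itertools import tee, zip_longest

def adjacent_pairs(iterable):
    """Yield (item, next_item) pairs; the last item is paired with ''."""
    i1, i2 = tee(iterable)
    next(i2, None)  # advance the second copy; the default avoids StopIteration on empty input
    return zip_longest(i1, i2, fillvalue="")

def squeeze_blank_lines(lines):
    out = []
    for line, next_line in adjacent_pairs(lines):
        # Drop a blank line when the following line is also blank,
        # so each run of blank lines collapses to one
        if not line.strip() and not next_line.strip():
            continue
        out.append(line)
    return out

print(squeeze_blank_lines(["a\n", "\n", "\n", "\n", "b\n"]))
```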

split without an argument will cut your string at each occurrence of whitespace (space, tab, newline, ...).
Write
n.split(" ")
and it will only split at spaces.
Instead of writing the output to a file, put it into a new variable and repeat the step, this time with
m.split("\n")
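A Python 3 sketch of the two-pass idea above: first split on " " (dropping the empty strings that runs of spaces produce), then split on "\n" and keep at most one empty piece in a row. The function names are hypothetical; tabs are not handled, and a line holding only spaces ends up as a single space rather than a blank line.

```python
text = "Hi,  im   trying\n\n\n\nthat  extra"

print(text.split())     # newlines are lost along with the spaces
print(text.split(" "))  # runs of spaces leave empty strings behind

def squash_spaces(text):
    # Drop the empty strings split(" ") produces for runs of spaces;
    # newlines are untouched because we never split on them
    return " ".join(word for word in text.split(" ") if word)

def squash_blank_lines(text):
    # Second pass on "\n": an empty piece is a blank line, so keep one
    # only when the previous kept piece was not empty
    out = []
    for piece in text.split("\n"):
        if piece or (out and out[-1]):
            out.append(piece)
    return "\n".join(out)

print(squash_blank_lines(squash_spaces(text)))
```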

Firstly, let's see what exactly the problem is...
You cannot have more than 1 consecutive space or more than 2 consecutive newlines.
You know how to handle runs of spaces.
That approach won't work on newlines, as there are 3 possible situations:
- 1 newline
- 2 newlines
- more than 2 newlines
Great, so... how do you solve this, then?
There are many solutions. I'll list 3 of them.
Regex based.
This problem is very easy to solve iff¹ you know how to use regex...
So, here's the code:
s = re.sub(r'\n{2,}', r'\n\n', in_file.read())
If you have memory constraints, this is not the best way, as we read the entire file into memory.
While loop based.
This code is really self-explanatory, but I wrote this line anyway...
s = in_file.read()
while "\n\n\n" in s:
    s = s.replace("\n\n\n", "\n\n")
Again, if you have memory constraints, note that we still read the entire file into memory.
State based.
Another way to approach this problem is line-by-line. By keeping track whether the last line we encountered was blank, we can decide what to do.
was_last_line_blank = False
for line in in_file:
    # A line holding only "\n" is blank; use `not line.strip()`
    # instead if lines with only spaces should count as blank too
    if not line.rstrip("\n"):
        was_last_line_blank = True
        continue
    if was_last_line_blank:
        # Write a single blank line to separate paragraphs
        out_file.write("\n")
    # Write contents of `line` (it keeps its own newline) in file
    out_file.write(line)
    was_last_line_blank = False
Now, 2 of them need you to load the entire file into memory, and the other one is a bit more complicated. My point is: all of these work, but since they differ slightly in how they work, what they need from the system varies...
¹ The "iff" is intentional.
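Building on the regex approach, a sketch that handles both spaces and newlines in one function. It assumes tabs should be treated like spaces and that whitespace-only lines count as blank; normalize_text is a hypothetical name. Like the first two approaches, it works on the whole file contents as one string.

```python
import re

def normalize_text(text):
    # Collapse runs of spaces/tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces left at the start or end of each line
    text = re.sub(r" *\n *", "\n", text)
    # Collapse 2+ newlines into exactly one blank line
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text

print(normalize_text("Hi,  im   trying\n\n\n\nthat  extra\n"))
```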

Basically, you want to keep only the non-empty lines (for a blank line, line.strip() returns an empty string, which is False in a boolean context). You can do this using a list/generator comprehension on the result of str.splitlines(), with an if clause to filter out empty lines.
Then for each line you want to ensure that all words are separated by a single space; for this you can use ' '.join() on the result of str.split().
So this should do the job for you:
compressed = '\n'.join(
    ' '.join(line.split()) for line in txt.splitlines()
    if line.strip()
)
or you can use filter and map with a helper function, which may be more readable:
def squash_line(line):
    return ' '.join(line.split())
non_empty_lines = filter(str.strip, txt.splitlines())
compressed = '\n'.join(map(squash_line, non_empty_lines))

To fix the paragraph issue:
import re
with open("data.txt") as f:
    data = f.read()
# collapse runs of 2 or more newlines into one blank line
result = re.sub(r"\n{2,}", "\n\n", data)
print(result)

Why doesn't .rstrip('\n') work?

Let's say doc.txt contains
a
b
c
d
and that my code is
f = open('doc.txt')
doc = f.read()
doc = doc.rstrip('\n')
print doc
why do I get the same values?

str.rstrip() removes the trailing newline, not all the newlines in the middle. You have one long string, after all.
Use str.splitlines() to split your document into lines without newlines; you can rejoin it if you want to:
doclines = doc.splitlines()
doc_rejoined = ''.join(doclines)
but now doc_rejoined will have all lines running together without a delimiter.
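A short Python 3 sketch of the difference between rstrip, split and splitlines on the doc.txt contents from the question; the last step rejoins with a delimiter (a space, chosen arbitrarily) so the lines do not run together.

```python
doc = "a\nb\nc\nd\n"

stripped = doc.rstrip("\n")    # only the trailing "\n" goes: 'a\nb\nc\nd'
pieces = doc.split("\n")       # leaves an empty string after the last "\n"
lines = doc.splitlines()       # no newlines and no trailing empty string

# Rejoin with an explicit delimiter instead of running the lines together
rejoined = " ".join(lines)
print(repr(stripped), pieces, lines, rejoined)
```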

Because you read the whole document into one string that looks like:
'a\nb\nc\nd\n'
When you do an rstrip('\n') on that string, only the trailing \n will be removed, leaving all the others untouched, so the string would look like:
'a\nb\nc\nd'
The solution would be to split the file into lines and then right strip every line. Or just replace all the newline characters with nothing: s.replace('\n', ''), which gives you 'abcd'.

rstrip strips trailing characters from the end of the whole string only. If you expected it to work on individual lines, you'd need to split the string into lines first, using something like doc.split('\n').

Try this instead:
with open('doc.txt') as f:
    for line in f:
        print line,
Explanation:
The recommended way to open a file is using with, which takes care of closing the file at the end.
You can iterate over each line in the file using for line in f.
There's no need to call rstrip() now, because we're printing one line at a time, and the trailing comma in print line, stops print from adding a second newline after the one each line already has.

Consider using replace and replacing each instance of '\n' with ''. This would get rid of all the new line characters in the input text.

Eliminating extra commas

I am having trouble replacing three commas with one comma in a text file of data.
I am processing a large text file to put it into comma delimited format so I can query it using a database.
I do the following at the command prompt and it works:
>>> import re
>>> line = 'one,,,two'
>>> line=re.sub(',+',',',line)
>>> print line
one,two
>>>
Below is my actual code:
import re

with open("dmis8.txt", "r") as ifp:
    with open("dmis7.txt", "w") as ofp:
        for line in ifp:
            #join lines by removing a line ending.
            line=re.sub('(?m)(MM/ANGDEC)[\r\n]+$','',line)
            #various replacements of text with nothing. This removes the text
            line=re.sub('IDENTIFIER','',line)
            line=re.sub('PART','50-1437',line)
            line=re.sub('Eval','',line)
            line=re.sub('Feat','',line)
            line=re.sub('=','',line)
            #line=re.sub('r"++++"','',line)
            line=re.sub('r"----|"',' ',line)
            line=re.sub('Nom','',line)
            line=re.sub('Act',' ',line)
            line=re.sub('Dev','',line)
            line=re.sub('LwTol','',line)
            line=re.sub('UpTol','',line)
            line=re.sub(':','',line)
            line=re.sub('(?m)(Trend)[\r\n]*$',' ',line)
            #Replace runs of spaces/tabs with a comma
            line=re.sub('[ \v\t\f]+', ',', line)
            #no worky line=re.sub(r",,,",',',line)
            line=re.sub(',+',',',line)
            #line=line.replace(",+", ",")
            #line=line.replace(",,,", ",")
            ofp.write(line)
This is what I get from the code above:
There are several commas together. I don't understand why they won't get replaced down to one comma.
Never mind that I don't see how the extra commas got there in the first place.
50-1437,d
2012/05/01
00/08/27
232_PD_1_DIA,PED_HL1_CR,,,12.482,12.478,-0.004,-0.021,0.020,----|++++
232_PD_2_DIA_TOP,PED_HL2_TOP,,12.482,12.483,0.001,-0.021,0.020,----|++++
232_PD_2_DIA,PED_HL2_CR,,12.482,12.477,-0.005,-0.021,0.020,----|++++
232_PD_2_DIA_BOT,PED_HL2_BOT,,12.482,12.470,-0.012,-0.021,0.020,--|--++++
Raw data for reference:
PART IDENTIFIER : d
2012/05/01
00/08/27
232_PD_1_DIA Eval Feat = PED_HL1_CR MM/ANGDEC
Nom Act Dev LwTol UpTol Trend
12.482 12.478 -0.004 -0.021 0.020 ----|++++
232_PD_2_DIA_TOP Eval Feat = PED_HL2_TOP MM/ANGDEC
12.482 12.483 0.001 -0.021 0.020 ----|++++
232_PD_2_DIA Eval Feat = PED_HL2_CR MM/ANGDEC
12.482 12.477 -0.005 -0.021 0.020 ----|++++
Can someone kindly point out what I am doing wrong?
Thanks in advance...

Your regex is working fine. The problem is that you concatenate the lines (by write()ing them) after you scrub them with your regex.
Instead, use "".join() on all of your lines, run re.sub() on the whole thing, and then write() it all to the file at once.
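A sketch reproducing the effect with made-up lines (hypothetical data, not the actual dmis8.txt content): scrubbing each line on its own and then removing its newline lets commas from different input lines meet only after the substitution has already run, while joining first and substituting afterwards collapses them.

```python
import re

# Hypothetical lines left over after the per-line replacements
lines = ["PED_HL1_CR,\n", ",\n", ",12.482,12.478\n"]

# Per-line approach: each line is scrubbed alone, then its newline removed
per_line = "".join(re.sub(",+", ",", line).rstrip("\n") for line in lines)
print(per_line)  # the commas only meet after re.sub has already run

# Join first, then substitute on the whole text
joined = "".join(line.rstrip("\n") for line in lines)
fixed = re.sub(",+", ",", joined)
print(fixed)
```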

I think your problem is caused by the fact that removing line endings does not join lines, in combination with the fact that write does not add a newline to the end of each string. So you have multiple input lines that look like a single line in the output.
Looking at the comments, you seem to think that just replacing the end of the line by an empty string will magically append the next line to it, but that doesn't actually work. So the three commas you're seeing are not replaced by your re.sub command because they're not in one line, they're multiple input lines (which after all the replacements are empty except for commas) which get printed to a single output line because you stripped their '\n' characters, and write doesn't automatically add '\n' to the end of each written string (unlike print).
To debug your code, just put print line after each line of code, to see what each "line" actually is - that should help you see what's going wrong.
In general, reading file formats where each "record" spans multiple lines requires more complicated methods than just a for line in file loop.
