I want to read a file and remove the spaces. I swear I've done this multiple times, but for some reason the method I used to use doesn't seem to be working. I must be making some small mistake somewhere, so I decided to make a small practice file (because the files I actually need to use are EXTREMELY LARGE) to find out.
The original file says:
abcdefg
(new line)
hijklmn
but I want it to say:
abcdefghijklmn
file = open('please work.txt', 'r')
for line in file:
    lines = line.strip()
    print(lines)
file.close()
However, it just says:
abcdefg
(new line)
hijklmn
and when I use line.strip('\n') it says:
abcdefg
(big new line)
hijklmn
Any help will be greatly appreciated, because this was the first thing I learned and suddenly I can't remember how to use it!
If what you want to do is to concatenate each line into a single line, you could utilize rstrip and concatenate to a result variable:
with open('test.txt', 'r') as fin:
    lines = ''
    for line in fin:
        stripped_line = line.rstrip()
        lines += stripped_line
print(lines)
From a text file looking like this:
abcdefg hijklmnop
this is a line
The result would be abcdefg hijklmnopthis is a line. If you also want to remove the whitespace inside the lines, you could call lines = lines.replace(' ','') after the loop, which would result in abcdefghijklmnopthisisaline.
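Since the question mentions extremely large files, here is a minimal sketch of the same idea using str.join instead of repeated +=. The function name join_lines and the remove_spaces flag are made up for illustration, and it assumes the joined result still fits in memory:

```python
import io

def join_lines(fin, remove_spaces=False):
    """Concatenate all lines of an open text file into one string."""
    # rstrip() drops the trailing newline (and any other trailing whitespace).
    joined = ''.join(line.rstrip() for line in fin)
    if remove_spaces:
        # Optionally drop the spaces that remain inside the lines as well.
        joined = joined.replace(' ', '')
    return joined

# io.StringIO stands in for a real file opened with open(path, 'r').
sample = io.StringIO('abcdefg hijklmnop\nthis is a line\n')
print(join_lines(sample))  # abcdefg hijklmnopthis is a line
```

With a real file, the call would be wrapped in with open(...) as fin: exactly as in the loop version.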
The (new line) in your output is from the print, which will output a \n. You can use print(lines, end='') to remove it.
strip() only removes leading and trailing whitespace.
You can use string.replace(' ', '') to remove all spaces.
'abcdefg (new line) hijklmn'.replace(' ', '')
If your file has tabs, newlines, or other forms of whitespace, the above will not work and you will need to use a regex to remove all forms of whitespace in the file.
import re
string = '''this is a \n
test \t\t\t
\r
\v
'''
re.sub(r'\s', '', string)
#'thisisatest'
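If you would rather not use a regex, joining the result of split() should give the same outcome, because split() with no arguments splits on any run of whitespace. A small sketch comparing the two (the sample text is only an illustration):

```python
import re

text = 'this is a \n test \t\t\t \r \v '

# Regex approach: \s matches spaces, tabs, newlines, \r, \v and \f.
no_ws_regex = re.sub(r'\s+', '', text)

# split() with no arguments splits on any run of whitespace,
# so joining the pieces back together removes all of it.
no_ws_split = ''.join(text.split())

print(no_ws_regex)  # thisisatest
print(no_ws_split)  # thisisatest
```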
I have a csv
"AA","AB","AC"
"BA","BB","BC"
"CA","CB","CC"
After removing a string, say ", the CSV format changes to
AA,AB,AC
BA,BB,BC
CA,CB,CC
What should I do to avoid the unwanted lines?
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.replace('"','')
    print (line)
Looks like you're printing it, looks like Python 3, and looks like your file content already includes the necessary newlines. Therefore, you need to tell the print() function not to add its own newlines:
print(line, end='')
When read, each line includes the terminating newline character. Furthermore, print() also adds a newline of its own, so you end up with two newlines.
But you are not using strip() as suggested by your question's title.
To get around that you can use rstrip() to remove any whitespace at the end of each line:
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.replace('"','').rstrip()
    print (line)
That will get rid of the extra new line characters, but note that it will also remove other whitespace at the end of the line.
An alternative is to prevent print() adding its own new line:
Python 2:
print(line), # comma prevents new line
Python 3:
print(line, end='')
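To see the difference between these options without touching a file, you can capture what print() writes. This is only a Python 3 illustration; the sample line is made up:

```python
import io

line = 'AA,AB,AC\n'  # a line as read from the file, newline included

doubled = io.StringIO()
print(line, file=doubled)               # print() appends its own '\n'

single = io.StringIO()
print(line, end='', file=single)        # end='' suppresses the extra newline

stripped = io.StringIO()
print(line.rstrip(), file=stripped)     # rstrip() first, then print() adds one

print(repr(doubled.getvalue()))   # 'AA,AB,AC\n\n'
print(repr(single.getvalue()))    # 'AA,AB,AC\n'
print(repr(stripped.getvalue()))  # 'AA,AB,AC\n'
```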
Why are you doing this by hand? You should use the csv module; it handles both the commas and the quotes for you. Example:
import csv
import fileinput
with fileinput.FileInput('test.csv',inplace=1) as f:
    reader = csv.reader(f)
    for row in reader:
        print (','.join(row))
Example/Demo -
>>> import csv
>>> import fileinput
>>> with fileinput.FileInput('test.csv',inplace=1) as f:
...     reader = csv.reader(f)
...     for row in reader:
...         print(','.join(row))
...
AA,AB,AC
BA,BB,BC
CA,CB,CC
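Note that ','.join(row) would produce a broken row if a field itself contained a comma. A possible variation, sketched here with in-memory files rather than fileinput, lets csv.writer decide when quotes are needed. The function name strip_quotes is made up for this example:

```python
import csv
import io

def strip_quotes(src, dst):
    """Copy CSV rows from src to dst, quoting a field only when required."""
    reader = csv.reader(src)
    # QUOTE_MINIMAL quotes a field only if it contains a delimiter, a quote
    # character or a line break; lineterminator='\n' avoids the default '\r\n'.
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for row in reader:
        writer.writerow(row)

# io.StringIO objects stand in for the real input and output files.
src = io.StringIO('"AA","AB","AC"\n"BA","BB","BC"\n"CA","CB","CC"\n')
dst = io.StringIO()
strip_quotes(src, dst)
print(dst.getvalue(), end='')
```

With real files, the csv documentation recommends opening them with newline='' so the module can handle line endings itself.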
You are seeing extra lines because the lines read from the file end with '\n' and the print(line) statement appends an extra newline.
You can use rstrip() to strip out the trailing newline:
import fileinput
for line in fileinput.FileInput("test.csv",inplace=1):
    line = line.rstrip().replace('"','')
    print (line)
I have always replaced \t\t with \t999999999\t using code like this:
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\t', '\t999999999\t'),
So I thought coding like the following will work for replacing \t\r with \t999999999\r
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\r', '\t999999999\r'),
But surprisingly it doesn't.
The input is tab-delimited txt.
Is \r something special that it can't be replaced in usual way? Then how can I replace it by python?
===Question edited====
I tried this.
for line in fileinput.input(input, inplace = 1):
    print line.replace('\t\n', '\t999999999\n'),
It works!
My input was separating lines by \r\n
Perhaps Python reads \r\n just as \n ?
Perhaps that's why it worked?
Does this code work if input separates lines by \r only?
\r is (part of) a line separator. Python normalises line separators when reading files in text mode, using only \n for lines; \r and \r\n are replaced by \n when reading.
Note: When using fileinput, strip the newline from line and use a plain print rather than print ..,; otherwise you end up with double newlines in your output:
for line in fileinput.input(input, inplace = 1):
    line = line.replace('\t\n', '\t999999999\n')
    print line.rstrip('\n')
print .., adds an extra space to all your lines.
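The normalisation can be checked directly. The sketch below is Python 3 (unlike the Python 2 code in the question) and uses a throwaway temporary file. It reads the same bytes once in default text mode and once with newline='', which turns the translation off:

```python
import os
import tempfile

# Write a small file with Windows-style \r\n line endings, as raw bytes.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'a\t\r\nb\t\r\n')
    path = tmp.name

# Default text mode: \r\n is translated to \n before the program sees it.
with open(path, 'r') as f:
    translated = f.readlines()

# newline='' turns translation off, so the \r is still there.
with open(path, 'r', newline='') as f:
    untranslated = f.readlines()

os.remove(path)

print(translated)    # ['a\t\n', 'b\t\n']
print(untranslated)  # ['a\t\r\n', 'b\t\r\n']
```

This is why replacing '\t\r' finds nothing in the default mode, while replacing '\t\n' works.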
The things I've googled haven't worked, so I'm turning to experts!
I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use "show all characters", I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can't seem to figure it out. Here's a snippet of the text file showing a line with the carriage return:
firstcolumn secondcolumn third fourth fifth sixth seventh
moreoftheseventh 8th 9th 10th 11th 12th 13th
Here's the code I'm trying to use to replace it, but it's not finding the return:
with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")
My script just doesn't find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.
Further information:
The goal here is to select two columns from the text file, so I split on \t characters and refer to the values as parts of an array. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.
vals = line.split("\t")
print(vals[0] + " " + vals[9])
So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don't have the [CR][LF], it works as expected.
Depending on the type of file (and the OS it comes from, etc.), your line ending might be '\r', '\n', or '\r\n'. The best way to get rid of them regardless of which one they are is to use line.rstrip().
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip() # strip out all trailing whitespace
If you want to get rid of ONLY the line-ending characters and not any other whitespace that might be at the end, you can supply the optional argument to rstrip:
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip('\r\n') # strip only trailing \r and \n characters
Hope this helps
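A quick way to see the difference between the two calls, using a made-up line that ends with spaces, a tab and a Windows line ending:

```python
line = 'third\tfourth  \t\r\n'

# No argument: every kind of trailing whitespace is removed.
stripped_all = line.rstrip()

# With '\r\n': only trailing carriage returns and newlines are removed,
# so the trailing spaces and the tab survive.
stripped_eol = line.rstrip('\r\n')

print(repr(stripped_all))  # 'third\tfourth'
print(repr(stripped_eol))  # 'third\tfourth  \t'
```

For tab-delimited data the second form is usually the safer one, since a trailing tab can mean an empty last column.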
Here's how to remove carriage returns without using a temporary file:
with open(file_name, 'r') as file:
    content = file.read()
with open(file_name, 'w', newline='\n') as file:
    file.write(content)
Python opens files in so-called universal newline mode, so newlines are always \n.
Python is usually built with universal newlines support; supplying 'U'
opens the file as a text file, but lines may be terminated by any of
the following: the Unix end-of-line convention '\n', the Macintosh
convention '\r', or the Windows convention '\r\n'. All of these
external representations are seen as '\n' by the Python program.
You iterate through the file line by line and replace \n in each line. But the iterator has already split the input at every \n, so replacing it only changes the end of that one line; it never joins the line with the one that follows, and the result is not written anywhere.
Instead, you can read the whole file with f.read() and then replace \n in it.
with open(infile, "r") as f:
    content = f.read()
    content = content.replace('\n', ' ')
    #do something with content
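Replacing every \n puts all records on one line, which may not be what you want if each record should stay separate. A rough alternative sketch, assuming every complete record has a known number of tab-separated fields (the helper name merge_broken_records and the expected_fields parameter are invented for this example):

```python
def merge_broken_records(lines, expected_fields):
    """Yield records, gluing lines together until each has enough fields."""
    buffer = ''
    for line in lines:
        line = line.rstrip('\r\n')
        # Join a continuation line to the pending partial record with a space.
        buffer = line if not buffer else buffer + ' ' + line
        if len(buffer.split('\t')) >= expected_fields:
            yield buffer
            buffer = ''
    if buffer:
        # Leftover partial record at the end of the input; emit it as-is.
        yield buffer

# Made-up data: the first record is broken across two lines.
broken = ['1\t2\t3', 'more\t4\t5\n', 'a\tb\tc\td\te\n']
for record in merge_broken_records(broken, expected_fields=5):
    print(record.split('\t'))
```

An open file can be passed in place of the list, since the function only iterates over its argument.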
Technically, there is an answer!
with open(filetoread, "rb") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)
The b in open(filetoread, "rb") apparently opens the file in such a way that I can access those line breaks and remove them. This answer actually came from Stack Overflow user Kenneth Reitz off the site.
Thanks everyone!
I've written some code to do it, and it works:
end1 = r'C:\...\file1.txt'
end2 = r'C:\...\file2.txt'
with open(end1, "rb") as inf:
    with open(end2, "wb") as fixed:
        for line in inf:
            # binary mode yields bytes, so replace with bytes literals
            line = line.replace(b"\n", b"")
            line = line.replace(b"\r", b"")
            fixed.write(line)
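If the aim is only to drop the carriage returns while keeping one record per line, a variation on the binary-mode idea could look like the sketch below. It is Python 3, the function name crlf_to_lf is made up, and the temporary files exist only to make the example self-contained:

```python
import os
import tempfile

def crlf_to_lf(src_path, dst_path):
    """Copy src to dst, turning every CRLF pair into a single LF."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:
            # In binary mode each line is a bytes object, so the
            # replacement arguments must be bytes literals too.
            dst.write(line.replace(b'\r\n', b'\n'))

# Small demonstration using temporary files.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'wb') as f:
    f.write(b'one\ttwo\r\nthree\tfour\r\n')
crlf_to_lf(src_path, dst_path)
with open(dst_path, 'rb') as f:
    result = f.read()
print(result)  # b'one\ttwo\nthree\tfour\n'
```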
I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from original file has \n (new line) character at the end, so do not add another one (right now you are adding one, which means you actually introduce empty lines).
My guess is that the variable "line" already has a newline in it, but you're writing an additional newline with g.write('%s\n' % line).
line has a newline at the end.
Remove the \n from your write, or rstrip line.
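Put together, the filter could be written as a small function that copies lines unchanged. This is just a sketch: the function name and the sample data are invented, and io.StringIO stands in for the two real files:

```python
import io

def drop_lines_starting_with(src, dst, prefix):
    """Copy lines from src to dst, skipping those that start with prefix."""
    for line in src:
        if not line.startswith(prefix):
            # line already ends with '\n', so write it unchanged.
            dst.write(line)

src = io.StringIO('Width=3\nheight=4\nWeight=5\ndepth=6\n')
dst = io.StringIO()
drop_lines_starting_with(src, dst, 'W')
print(dst.getvalue(), end='')
```

With real files, opening both in a with statement would also make sure they are closed, which the original snippet never does.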
I am trying to write Jython code for deleting spaces from a text file. I have the following scenario.
I have a text file like:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Each of these is a single line in the file.
But what I want is:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
So, all in all, I want to start checking from the second line until I encounter END, and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried this:
srcfile=open('d:/BUR001.txt','r')
trgtfile=open('d:/BUR002.txt','w')
readfile=srcfile.readline()
while readfile:
    trgtfile.write(readfile.replace('\s',''))
    readfile=srcfile.readline()
srcfile.close()
trgtfile.close()
Thanks,
Mahesh
You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ','')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that the strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of the line.
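If only the trailing spaces should go, rstrip() is the closer fit. A possible sketch follows; the function name is invented, io.StringIO stands in for the real files, and as written it targets Python 3 rather than Jython:

```python
import io

def strip_trailing_spaces(src, dst):
    """Copy src to dst, trimming trailing whitespace on the data lines only."""
    for line in src:
        if line.startswith('START') or line.startswith('END'):
            # Header and trailer lines are copied unchanged.
            dst.write(line)
        else:
            # rstrip() removes the trailing spaces and the newline,
            # so the newline is added back afterwards.
            dst.write(line.rstrip() + '\n')

src = io.StringIO('START  A \n20120416MES   \nEND  0000000202\n')
dst = io.StringIO()
strip_trailing_spaces(src, dst)
print(repr(dst.getvalue()))
```

Spaces inside a data line are left alone; only whitespace at the end is removed.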
If I understood your example correctly, all you want is to remove empty lines, so instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')
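The loop above leaves lines that contain only spaces, and a blank line at the very start, in place. A variation that filters on line.strip() would catch those too; this is a sketch with an invented function name, not a drop-in replacement:

```python
def remove_blank_lines(content):
    """Return content without empty or whitespace-only lines."""
    # splitlines() drops the line endings; strip() is empty for blank lines.
    kept = [line for line in content.splitlines() if line.strip()]
    # Re-join with '\n' and keep a final newline if anything is left.
    return '\n'.join(kept) + ('\n' if kept else '')

cleaned = remove_blank_lines('\nSTART\n\n\ndata 1\n   \ndata 2\nEND\n')
print(repr(cleaned))  # 'START\ndata 1\ndata 2\nEND\n'
```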