How to remove all whitespace and newlines? - python

Assuming I have a file that contains the following:
Assume <tab> is actually a tab and <space> is actually a space. (ignore quotes)
"
<tab><tab>
<space>
<tab>
The clothes at
the superstore are
at a discount today.
"
Assume this is in a text file. How do I remove all the whitespace so that the resulting text file is as follows (ignore the quotes):
"
The clothes at
the superstore are
at a discount today.
"

Try this, assuming you don't want to overwrite the old file. Easy to adapt if you do:
oldfile = open("EXISTINGFILENAME", "r")
data = oldfile.read()
oldfile.close()
stripped_data = data.lstrip()
newfile = open("NEWFILENAME", "w")
newfile.write(stripped_data)
newfile.close()
Note that this only removes leading whitespace; to remove trailing whitespace as well, use strip in place of lstrip.

Something like this perhaps (don't know if you need a python solution or if cmdline-tools are ok):
$ cat -t INPUT
^I^I
^I^I
"^I
^I^I^I
^I ghi
"
$ sed '/^[[:space:]]*$/d' INPUT
"
ghi
"
I.e. remove lines containing only spaces and/or tabs, as well as empty lines.
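If a Python solution is preferred, the same filter can be sketched as below. The function name drop_blank_lines and the sample string are made up for illustration; the sketch works on a string, so it assumes the file content has already been read.

```python
def drop_blank_lines(text):
    """Return text without lines that are empty or contain only whitespace."""
    # splitlines(keepends=True) keeps each line's own newline, so the
    # surviving lines are joined back unchanged.
    kept = [line for line in text.splitlines(keepends=True) if line.strip()]
    return "".join(kept)


# Hypothetical sample: whitespace-only lines before and between the text.
sample = "\t\t\n \n\t\nThe clothes at\nthe superstore are\n\nat a discount today.\n"
print(drop_blank_lines(sample))
```

Unlike lstrip, this also drops blank lines in the middle of the text, which is what the sed command does too.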

If you want to preserve indentation and trailing space on the lines in your output file, test the stripped line, but write the raw line.
This also uses context managers, and works in Python 2.7:
with open('EXISTINGFILE', 'r') as fin, open('NEWFILE', 'w') as fout:
    for line in fin:
        if line.strip():
            fout.write(line)
If you want to do other processing, I'd suggest defining that in its own function body, and calling that function:
def process_line(line):
    # for example
    return ''.join(('Payload:\t', line.strip().upper(), '\tEnd Payload\n'))
with open('EXISTINGFILE', 'r') as fin, open('NEWFILE', 'w') as fout:
    for line in fin:
        if line.strip():
            fout.write(process_line(line))
Rereading your question, I see that you only asked about removing whitespace at the beginning of your file. If you want to get EVERY line of your source file after a certain condition is met, you can set a flag for that condition, and switch your output based on the flag.
For example, if you want to remove initial lines of whitespace, process non-whitespace lines, and not remove or process all whitespace lines after you have at least one line of data, you could do this:
def process_line(line):
    # for example
    return ''.join(('Payload:\t', line.strip().upper(), '\tEnd Payload\n'))
with open('EXISTINGFILE', 'r') as fin, open('NEWFILE', 'w') as fout:
    have_paydata = False
    for line in fin:
        if line.strip():
            have_paydata = True  # data has started; keep later blank lines
            fout.write(process_line(line))
        elif have_paydata:
            fout.write(line)
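If no per-line processing is needed, the flag can probably be replaced with itertools.dropwhile. This is only a sketch: the helper name skip_leading_blank_lines is made up, and an io.StringIO object stands in for the open input file.

```python
import io
from itertools import dropwhile


def skip_leading_blank_lines(lines):
    """Yield lines starting from the first one that is not all whitespace."""
    # dropwhile discards items while the predicate is true, then passes
    # everything else through untouched, including later blank lines.
    return dropwhile(lambda line: not line.strip(), lines)


fin = io.StringIO("\t\t\n \nfirst\n\n  second\n")  # stands in for an open file
result = "".join(skip_leading_blank_lines(fin))
print(repr(result))
```

The indentation of "  second" and the blank line before it are kept, because only the leading run of whitespace-only lines is dropped.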

strip() removes all leading and trailing whitespace; after that, test whether any characters are left in the line:
with open("file.txt", "r") as f:
    for line in f:
        if line.strip():
            print(line, end="")  # line already ends with a newline

lstrip will remove all whitespace from the beginning of a string. If you need to keep the leading whitespace on the first text line, use a regex instead:
import re
data = '''\
\t\t
\t
The clothes at
the superstore are
at a discount today.
'''
# Remove ALL whitespace from the start of string
print(data.lstrip())
# Remove all whitespace from start of string up to and including a newline
print(re.sub(r'^\s*\n',r'',data))
Output:
The clothes at
the superstore are
at a discount today.
The clothes at
the superstore are
at a discount today.
To modify a file this way:
import re
# A with statement closes the file on exit from the block
with open('data.txt') as f:
    data = f.read()
data = re.sub(r'^\s*\n', r'', data)
with open('data.txt', 'w') as f:
    f.write(data)
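The difference between the two approaches is easiest to see when the first text line is itself indented. The sample string below is made up for illustration, with a four-space indent on the first text line.

```python
import re

# Hypothetical sample: three whitespace-only lines, then an indented line.
data = "\t\t\n \n\t\n    The clothes at\nthe superstore are\n"

# lstrip() also eats the four-space indent of the first text line.
via_lstrip = data.lstrip()

# The regex stops after the last newline of the leading whitespace run,
# so the indent of the first text line survives.
via_regex = re.sub(r'^\s*\n', '', data)

print(repr(via_lstrip))
print(repr(via_regex))
```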

Related

python doesn't append each line but skips some

I have a complete_list_of_records which has a length of 550
this list would look something like this:
Apples
Pears
Bananas
The issue is that when I use:
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i)
the resulting file is 393 lines long, and in some places the structure looks like this:
Apples
PearsBananas
Pineapples
I have tried "w" instead of "a" (append) and manually inserting "\n" for each item in the list, but this just creates a blank line on every second row, and some rows still have the same issue with two records on one line.
Has anyone encountered something similar?
From the comments seen so far, I think there are strings in the source list that contain newline characters in positions other than at the end. Also, it seems that some strings end with newline character(s) but not all.
I suggest replacing embedded newlines with some other character - e.g., underscore.
Therefore I suggest this:
with open("recordedlines.txt", "w") as recorded_lines:
    for line in complete_list_of_records:
        line = line.rstrip() # remove trailing whitespace
        line = line.replace('\n', '_') # replace any embedded newlines with underscore
        print(line, file=recorded_lines) # print function will add a newline
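To check that diagnosis before rewriting the file, a small report along these lines may help. The function name newline_report and the sample list are hypothetical; in practice the list would be complete_list_of_records.

```python
def newline_report(records):
    """Count records with embedded newlines and records lacking a trailing one."""
    # An embedded newline is one that remains after trailing newlines are removed.
    embedded = sum(1 for r in records if "\n" in r.rstrip("\n"))
    unterminated = sum(1 for r in records if not r.endswith("\n"))
    return embedded, unterminated


# Hypothetical sample standing in for complete_list_of_records.
sample_records = ["Apples\n", "Pears", "Bananas\n", "Pine\napples\n"]
print(newline_report(sample_records))
```

A nonzero second count would explain fused lines such as PearsBananas, and a nonzero first count would explain records that spill over several lines.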
You could simply strip all surrounding whitespace in any case and then add a newline by hand, like so:
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i.strip() + "\n")
You need to use
file.writelines(listOfRecords)
but the list values must each end with '\n':
f = open("demofile3.txt", "a")
li = ["See you soon!", "Over and out."]
li = [i+'\n' for i in li]
f.writelines(li)
f.close()
#open and read the file after the appending:
f = open("demofile3.txt", "r")
print(f.read())
output will be
See you soon!
Over and out.
You can also use a for loop with write(), adding '\n' at each iteration:
complete_list_of_records =['1.Apples','2.Pears','3.Bananas','4.Pineapples']
with open("recordedlines.txt", "w") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i+"\n")
I think it should work.
Make sure that what you write is a string.
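Despite its name, writelines() does not add line separators. A minimal sketch that shows this, using in-memory io.StringIO buffers in place of real files (the records list is made up):

```python
import io

records = ["Apples", "Pears", "Bananas"]

# writelines() writes the strings back to back, with no separator.
joined = io.StringIO()
joined.writelines(records)

# Adding "\n" to each item gives one record per line.
separated = io.StringIO()
separated.writelines(record + "\n" for record in records)

print(repr(joined.getvalue()))
print(repr(separated.getvalue()))
```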

how to replace a line of two words in a file using python

I want to replace a line in a file but my code doesn't do what I want. The code doesn't change that line. It seems that the problem is the space between ALS and 4277 characters in the input.txt. I need to keep that space in the file. How can I fix my code?
A part of input.txt:
ALS 4277
Related part of the code:
for lines in fileinput.input('input.txt', inplace=True):
    print(lines.rstrip().replace("ALS"+str(4277), "KLM" + str(4945)))
Desired output:
KLM 4945
Using the same idea that other users have already pointed out, you could also reproduce the same spacing by first matching the spacing and saving it in a variable (spacing in my code):
import re
with open('input.txt') as f:
    lines = f.read()
# re.search finds the pair anywhere in the text; re.match only tries the very start
match = re.search(r'ALS(\s+)4277', lines)
if match is not None:
    spacing = match.group(1)
    lines = re.sub(r'ALS\s+4277', 'KLM%s4945' % spacing, lines.rstrip())
print(lines)
As the spaces vary you will need to use regex to account for the spaces.
import re
lines = "ALS 4277 "
line = re.sub(r"(ALS\s+4277)", "KLM 4945", lines.rstrip())
print(line)
Try:
with open('input.txt') as f:
    for line in f:
        # compare the whitespace-separated fields, so any spacing matches
        if line.split() == ['ALS', '4277']:
            line = line.replace('ALS', 'KLM').replace('4277', '4945')
        print(line, end='') # as line has '\n'
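Another way to keep the original spacing is to capture it and put it back with a backreference in a single re.sub call. This is a sketch only; the function name replace_pair is made up for illustration.

```python
import re


def replace_pair(text):
    """Replace 'ALS<whitespace>4277' with 'KLM<same whitespace>4945'."""
    # \g<1> re-inserts whatever whitespace the first group matched.
    return re.sub(r'ALS(\s+)4277', r'KLM\g<1>4945', text)


print(replace_pair("ALS 4277"))
print(repr(replace_pair("ALS\t  4277")))
```

Text without whitespace between the two words, such as "ALS4277", is left unchanged because \s+ requires at least one whitespace character.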

Trim whitespace from multiple lines

I tried to trim whitespace in python using s.strip() like this, but it's only working on the first line:
Input:
a
b
Output:
a
b
How do I get it to trim whitespace from multiple lines? Here's my code:
Code:
import sys
if __name__ == "__main__":
    text_file = open("input.txt", "r")
    s = text_file.read()
    s = s.strip()
    text_file.close()
    with open("Output.txt", "w") as text_file:
        text_file.write(s)
Split the lines, strip each, then re-join:
s = text_file.read()
s = '\n'.join([line.strip() for line in s.splitlines()])
This uses the str.splitlines() method, together with the str.join() method to put the lines together again with newlines in between.
Better still, read the file line by line, process and write out in one go; that way you need far less memory for the whole process:
with open("input.txt", "r") as infile, open("Output.txt", "w") as outfile:
    for line in infile:
        outfile.write(line.strip() + '\n')
The issue occurs because str.strip() only strips leading and trailing whitespace; it does not strip whitespace in the middle.
For the input -
a
b
Calling text_file.read() on it gives a string whose representation would be -
' a\n b'
s.strip() strips the leading and trailing whitespace, but not the \n and spaces in the middle, so you still get multiple lines and the spaces in the middle are not removed.
For your case to work, you should read the input line by line, strip each line, and write it back.
Example -
import sys
if __name__ == "__main__":
    with open("input.txt", "r") as text_file, open("Output.txt", "w") as out_file:
        for line in text_file:
            out_file.write(line.strip() + '\n')
Use
for line in s.splitlines()
to iterate over each line and use strip() for them.
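The difference between stripping the whole string and stripping each line can be seen on a small made-up string:

```python
s = "  a\n  b  \n"

# strip() on the whole string only touches its two ends.
whole = s.strip()

# Stripping each line separately removes the inner indentation too.
per_line = "\n".join(line.strip() for line in s.splitlines())

print(repr(whole))
print(repr(per_line))
```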
Just for completeness, there is also textwrap.dedent(), which lets you write multi-line strings indented in code (for readability) while the resulting strings have no left-hand whitespace.
For example, as given in https://docs.python.org/3/library/textwrap.html#textwrap.dedent
import textwrap
def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))                   # prints '    hello\n      world\n    '
    print(repr(textwrap.dedent(s)))  # prints 'hello\n  world\n'

How to write lines from a input file to an output file in reversed order in python 3

What I want to do is take a series of lines from one text document, and put them in reverse in a second. For example text document a contains:
hi
there
people
So therefore I would want to write these same lines to text document b, except like this:
people
there
hi
So far I have:
def write_matching_lines(input_filename, output_filename):
    infile = open(input_filename)
    lines = infile.readlines()
    outfile = open(output_filename, 'w')
    for line in reversed(lines):
        outfile.write(line.rstrip())
    infile.close()
    outfile.close()
but this only returns:
peopletherehi
on one line. Any help would be appreciated.
One line will do:
open("out", "w").writelines(reversed(open("in").readlines()))
You just need to add + '\n', since .write does not do that for you. Alternatively, in Python 2 you can use
print >>f, line.rstrip()
or equivalently in Python 3:
print(line.rstrip(), file=f)
which will add a newline for you. Or do something like this:
with open('text.txt') as fin, open('out.txt', 'w') as fout:
    fout.writelines(reversed([line.rstrip() + '\n' for line in fin]))
This code assumes that you don't know whether the last line has a newline; if you know it does, you can just use
fout.writelines(reversed(fin.readlines()))
Why do you rstrip() your line before writing it? You're stripping off the newline at the end of each line as you write it. And yet you then notice that you don't have any newlines. Simply remove the rstrip() in your write.
Less is more.
Update
If I couldn't prove/verify that the last line has a terminating newline, I'd personally be inclined to mess with the one line where it mattered, up front. E.g.
....
outfile = open(output_filename, 'w')
lines[-1] = lines[-1].rstrip() + '\n' # make sure last line has a newline
for line in reversed(lines):
    outfile.write(line)
....
with open(your_filename) as h:
    print(''.join(reversed(h.readlines())))
or, if you want to write it to another stream:
with open(your_filename_out, 'w') as h_out:
    with open(your_filename_in) as h_in:
        h_out.write(''.join(reversed(h_in.readlines())))
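Putting the points from these answers together, a Python 3 sketch might look like the following. The function name reverse_lines is made up, and io.StringIO objects stand in for the two files; it assumes that only the last input line can be missing its newline.

```python
import io


def reverse_lines(infile, outfile):
    """Write the lines of infile to outfile in reverse order."""
    lines = infile.readlines()
    # Only the last line can lack a newline; normalise it so it does not
    # fuse with the line that follows it in the reversed output.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    outfile.writelines(reversed(lines))


src = io.StringIO("hi\nthere\npeople")  # stands in for the input file
dst = io.StringIO()                     # stands in for the output file
reverse_lines(src, dst)
print(dst.getvalue())
```

With real files, the same function could be called inside a with block that opens the input for reading and the output with mode 'w'.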

How do I write only certain lines to a file in Python?

I have a file that looks like this (I had to put it in a code box so it resembles the file):
text
(starts with parentheses)
tabbed info
text
(starts with parentheses)
tabbed info
...repeat
I want to grab only "text" lines from the file (or every fourth line) and copy them to another file. This is the code I have, but it copies everything to the new file:
import sys
def process_file(filename):
    output_file = open("data.txt", 'w')
    input_file = open(filename, "r")
    for line in input_file:
        line = line.strip()
        if not line.startswith("(") or line.startswith(""):
            output_file.write(line)
    output_file.close()
if __name__ == "__main__":
    process_file(sys.argv[1])
The reason why your script is copying every line is because line.startswith("") is True, no matter what line equals.
You might try using isspace to test if line begins with whitespace:
def process_file(filename):
    with open("data.txt", 'w') as output_file:
        with open(filename, "r") as input_file:
            for line in input_file:
                line = line.rstrip()
                # skip blank lines and lines that start with "(" or whitespace
                if line and not (line.startswith("(") or line[:1].isspace()):
                    output_file.write(line + "\n")
with open('data.txt','w') as of:
    of.write(''.join(textline
                     for textline in open(filename)
                     if textline[0] not in ' \t(')
             )
To write every fourth line use slice result[::4]
with open('data.txt','w') as of:
    of.write(''.join([textline
                      for textline in open(filename)
                      if textline[0] not in ' \t('][::4])
             )
I do not need to rstrip the newlines, as I reuse them in write.
In addition to line.startswith("") always being true, line.strip() removes the leading tab, so the tabbed data gets written as well. Change it to line.rstrip() and use \t to test for a tab. That part of your code should look like:
line = line.rstrip()
if not line.startswith(('(', '\t')):
    #....
In response to your question in the comments:
#edited in response to comments in post
for i, line in enumerate(input_file):
    if i % 4 == 0:
        output_file.write(line)
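Taking every fourth line can also be sketched with itertools.islice, which avoids the index arithmetic. The sample content below is hypothetical (it assumes a four-line repeating record), and an io.StringIO object stands in for input_file.

```python
import io
from itertools import islice

# Hypothetical input: a "text" line followed by three other lines, repeated.
input_file = io.StringIO(
    "text one\n(paren)\n\ttabbed\n\n"
    "text two\n(paren)\n\ttabbed\n\n"
)

# islice(iterable, start, stop, step): start at line 0, take every 4th line.
every_fourth = list(islice(input_file, 0, None, 4))
print(every_fourth)
```

This only works if the file really repeats in groups of exactly four lines; otherwise testing how each line starts is the safer approach.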
Try:
if not line.startswith("(") and not line.startswith("\t"):
without calling line.strip() first (that would strip the tabs).
So the issue is that (1) you are misusing boolean logic, and (2) every possible line starts with "".
First, the boolean logic:
The way the or operator works is that it returns True if either of its operands is True. The operands are "not line.startswith('(')" and "line.startswith('')". Note that the not only applies to one of the operands. If you want to apply it to the total result of the or expression, you will have to put the whole thing in parentheses.
The second issue is your use of the startswith() method with a zero-length string as an argument. This essentially says "match any string where the first zero characters are nothing". It matches any string you could give it.
See other answers for what you should be doing here.
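Both points can be checked on a few sample lines. The list below is made up to mirror the file layout in the question.

```python
lines = ["text\n", "(starts with parentheses)\n", "\ttabbed info\n"]

# "" is a prefix of every string, so this test can never be False.
print(all(line.startswith("") for line in lines))

# `not` binds tighter than `or`: this reads (not A) or B.
loose = [line for line in lines
         if not line.startswith("(") or line.startswith("\t")]

# Parentheses negate the whole expression: not (A or B).
strict = [line for line in lines
          if not (line.startswith("(") or line.startswith("\t"))]

print(loose)
print(strict)
```

Only the parenthesised version keeps just the "text" line; the unparenthesised one also lets the tabbed line through.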
