Script for deleting whitespace for multiple files


I have developed a script which deletes all whitespaces at the end of the file.
import sys
with open("/Users/XXXXX/Desktop/XXXXX.txt") as infile:
    lines = infile.read()
while lines.endswith("\n"):
    lines = lines[:-2]
with open("/Users/XXXXX/Desktop/XXXXX.txt", 'w') as outfile:
    for line in lines:
        outfile.write(line)
The script works fine, but I have two thousand small files in a folder that all need their trailing whitespace removed.
Can someone show me how to change my script so that it opens each file in a folder and runs the code above?
Thanks.

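Try the following code:
import os
def removeNewLines(file):
    with open(file, 'r') as infile:
        lines = infile.read()
    # Drop one trailing newline per pass; slicing with [:-2] would also
    # remove the character in front of each newline.
    while lines.endswith("\n"):
        lines = lines[:-1]
    with open(file, 'w') as outfile:
        outfile.write(lines)
folder = 'FOLDER PATH'
for name in os.listdir(folder):
    # os.listdir returns bare file names, so join each one with the folder.
    path = os.path.join(folder, name)
    if os.path.isfile(path):
        removeNewLines(path)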
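A variant of the same idea using pathlib might look like the sketch below. The function name `strip_trailing_whitespace`, the `*.txt` pattern, and the temporary demo folder are assumptions made for illustration and are not part of the original answer. Unlike the loop above, it strips all trailing whitespace (spaces, tabs, and newlines), which is what the question title asks for.

```python
import tempfile
from pathlib import Path


def strip_trailing_whitespace(folder, pattern="*.txt"):
    """Remove whitespace at the end of every matching file in `folder`.

    Returns the number of files that were rewritten.
    """
    changed = 0
    for path in Path(folder).glob(pattern):
        if not path.is_file():
            continue
        text = path.read_text()
        stripped = text.rstrip()  # drops trailing spaces, tabs and newlines
        if stripped != text:
            path.write_text(stripped)
            changed += 1
    return changed


# Demo on a throwaway folder so that no real files are touched.
with tempfile.TemporaryDirectory() as demo_dir:
    demo = Path(demo_dir)
    (demo / "a.txt").write_text("first line\nsecond line\n\n\n")
    (demo / "b.txt").write_text("already clean")
    rewritten = strip_trailing_whitespace(demo)
    a_after = (demo / "a.txt").read_text()
    b_after = (demo / "b.txt").read_text()
```

To use it on a real folder, call `strip_trailing_whitespace("FOLDER PATH")`. Only files whose content actually changes are rewritten.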

Related

Write() starts halfway through

I am new to Python and I really don't understand why this is happening: when I run my code, lower() is only applied to half (or less) of the text file. How can I fix this?
import glob, os, string, re
list_of_files = glob.glob("/Users/louis/Downloads/assignment/data2/**/*.txt")
for file_name in list_of_files:
    f = open(file_name, 'r+')
    for line in f:
        line = line.lower()
        f.write(line)

The problem is most likely that you are reading and writing through the same file object at the same time. Read the whole file first, then return to the start of the file and write the new content over the original. Try this:
for file_name in list_of_files:
    with open(file_name, 'r+') as f:
        content = f.read().lower()
        f.seek(0, 0) # returns to the start of the file
        f.write(content)
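The read, seek, write sequence can be wrapped in a small helper, sketched here under a few assumptions: the name `lowercase_in_place` and the temporary demo file are hypothetical, and the extra `truncate()` call is an addition that guards against leftover bytes in case the lower-cased text turns out shorter than the original.

```python
import os
import tempfile


def lowercase_in_place(path):
    """Rewrite the file at `path` with its text converted to lower case."""
    with open(path, "r+") as f:
        content = f.read().lower()  # read everything before writing anything
        f.seek(0)                   # go back to the start of the file
        f.write(content)
        f.truncate()                # cut off leftovers if the text got shorter


# Demo on a temporary file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w") as f:
    f.write("Hello World\nSECOND Line\n")
lowercase_in_place(demo_path)
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
```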

Is there a way to read and write on multiple text files in a folder using Python?

I am trying to process several text files in a folder: I want to read every file in the directory and add a blank line at the top of each one.
However, when I run the program it does not do what I want. This is the code:
import os
folderPath = "./textFiles"
def myFilesAddEmptyLine():
    for file in os.listdir(folderPath):
        if file.endswith(".txt"):
            with open(file, "r+") as myFile:
                # print(myFile)
                # ^ This returns "<_io.TextIOWrapper name='test.txt' mode='r+' encoding='cp1252'>" in the console.
                fileContent = myFile.read()
                myFile.seek(0, 0)
                myFile.write("\n" + fileContent)
myFilesAddEmptyLine()
On the other hand, if I open the file directly instead of automating the process with os, it does exactly what I want. The following code opens the file and adds a blank line at the top:
def myFilesAddEmptyLine():
    with open("test.txt", "r+") as myFile:
        fileContent = myFile.read()
        myFile.seek(0, 0)
        myFile.write("\n" + fileContent)
myFilesAddEmptyLine()
Could anyone kindly outline what the issue with the first piece of code is? Thanks in advance!

As user @asylumax pointed out in the comments, this:
import os
folderPath = "./textFiles"
def myFilesAddEmptyLine():
    for file in os.listdir(folderPath):
        if file.endswith(".txt"):
            with open(file, "r+") as myFile:
                fileContent = myFile.read()
                myFile.seek(0, 0)
                myFile.write("\n" + fileContent)
myFilesAddEmptyLine()
needed to be changed to this:
import os
folderPath = "./textFiles"
def myFilesAddEmptyLine():
    for file in os.listdir(folderPath):
        if file.endswith(".txt"):
            with open(os.path.join(folderPath, file), "r+") as myFile: # This is the line that needed changing.
                fileContent = myFile.read()
                myFile.seek(0, 0)
                myFile.write("\n" + fileContent)
                print(myFile)
myFilesAddEmptyLine()
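The reason the join is needed is that os.listdir returns bare file names, not paths, so open(file) looks in the current working directory rather than in the folder. A small sketch of the difference follows; the helper name `txt_paths` and the demo file names are made up for illustration.

```python
import os
import tempfile


def txt_paths(folder):
    """Return full paths of the .txt files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt")
    )


with tempfile.TemporaryDirectory() as folder:
    for name in ("one.txt", "two.txt", "notes.md"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("x\n")
    bare_names = sorted(os.listdir(folder))  # names only, no directory part
    full_paths = txt_paths(folder)           # usable from any working directory
    basenames = [os.path.basename(p) for p in full_paths]
    all_exist = all(os.path.isfile(p) for p in full_paths)
```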

Read txt files, results in empty lines

I have a problem opening and reading a txt file in Python. The txt file contains text (cat text.txt works fine in the terminal), but in Python I only get 5 empty lines.
print open('text.txt').read()
Do you know why?

I solved it: it was a UTF-16 file.
print open('text.txt').read().decode('utf-16-le')
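The snippets in this thread use Python 2 syntax. In Python 3 the encoding is normally passed to open() instead of decoding afterwards. The sketch below assumes a file written with the generic "utf-16" codec (which adds a byte order mark); the temporary demo file stands in for text.txt.

```python
import os
import tempfile

fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Write a small UTF-16 file (the "utf-16" codec adds a byte order mark).
with open(demo_path, "w", encoding="utf-16") as f:
    f.write("first\nsecond\n")

# Reading with the matching encoding decodes the text correctly.
with open(demo_path, encoding="utf-16") as f:
    text = f.read()

# The raw bytes show why a wrong codec gives odd output: every ASCII
# character is stored as two bytes, one of which is NUL.
with open(demo_path, "rb") as f:
    raw = f.read()

os.remove(demo_path)
```

If the file has no byte order mark, an explicit variant such as "utf-16-le" (as in the answer above) may be needed instead.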

If this prints the lines in your file, then perhaps the file your program is selecting is empty? I don't know, but try this:
import tkinter as tk
from tkinter import filedialog
import os
def fileopen():
    # Ask the user to pick a file, then close the temporary Tk window.
    GUI = tk.Tk()
    filepath = filedialog.askopenfilename(parent=GUI, title='Select file to print lines.')
    GUI.destroy()
    return filepath
filepath = fileopen()
filepath = os.path.normpath(filepath)
with open(filepath, 'r') as fh:
    print(fh.read())
Alternatively, print the file line by line:
fh = open(filepath, 'r')
for line in fh:
    line = line.rstrip('\n')
    print(line)
fh.close()
Or, if you want the lines loaded into a list of strings:
lines = []
fh = open(filepath, 'r')
for line in fh:
    line = line.rstrip('\n')
    lines.append(line)
fh.close()
for line in lines:
    print(line)

When you open a file, you can specify the mode explicitly. In your example, open it for reading (although "r" is already the default mode):
print open('text.txt',"r").read()
Hope this does the trick.

combining text files in python

I am trying to combine multiple text files in a directory into one file. I want to write a HEADER and an END statement in the combined file. The Python script I am currently using combines all the files into one, but I cannot figure out how to write a HEADER and an END statement for each file in the combined file.
filenames = ['pm.pdb.B10010001.txt', 'pm.pdb.B10020001.txt', ...]
with open('/pdb3c91.0/output.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Just write the two lines.
filenames = ['pm.pdb.B10010001.txt', 'pm.pdb.B10020001.txt', ...]
with open('/pdb3c91.0/output.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write("HEADER\n")
            for line in infile:
                outfile.write(line)
            outfile.write("END\n")
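The same approach can be packaged as a reusable function, sketched below. The name `combine_files`, its `header` and `footer` parameters, and the demo file names are assumptions for illustration. The sketch also adds one guard that the answer above does not have: if an input file does not end with a newline, one is written so that END stays on its own line.

```python
import os
import tempfile


def combine_files(filenames, output_path, header="HEADER\n", footer="END\n"):
    """Concatenate `filenames` into `output_path`, wrapping the content
    of each file in a header line and a footer line."""
    with open(output_path, "w") as outfile:
        for fname in filenames:
            outfile.write(header)
            with open(fname) as infile:
                content = infile.read()
            outfile.write(content)
            if content and not content.endswith("\n"):
                outfile.write("\n")  # keep the footer on its own line
            outfile.write(footer)


# Demo with two small temporary files; the second lacks a final newline.
with tempfile.TemporaryDirectory() as folder:
    parts = []
    for name, body in (("a.txt", "ATOM 1\n"), ("b.txt", "ATOM 2")):
        path = os.path.join(folder, name)
        with open(path, "w") as f:
            f.write(body)
        parts.append(path)
    out_path = os.path.join(folder, "output.txt")
    combine_files(parts, out_path)
    with open(out_path) as f:
        combined = f.read()
```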

Replace and overwrite instead of appending

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?

You need to seek to the beginning of the file before writing, and then call file.truncate() if you want to replace the content in place:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file and then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()
with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html
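To see why truncate() matters, the following sketch overwrites a file with shorter text twice, once without truncating and once with. The temporary file and the sample strings are made up for the demonstration.

```python
import os
import tempfile

fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(demo_path, "w") as f:
    f.write("0123456789")

# Overwrite from the start with shorter text, but do not truncate.
with open(demo_path, "r+") as f:
    f.read()
    f.seek(0)
    f.write("abc")
with open(demo_path) as f:
    without_truncate = f.read()  # the old tail is still there

# Same again, this time cutting the file off at the current position.
with open(demo_path, "r+") as f:
    f.read()
    f.seek(0)
    f.write("xy")
    f.truncate()
with open(demo_path) as f:
    with_truncate = f.read()

os.remove(demo_path)
```

Without truncate() the file ends up as "abc3456789"; with it, only "xy" remains.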

file = 'path/test.xml'
with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode: this discards the current text, so the file is saved with only the new contents.

Using truncate(), the solution could be:
import re
# open the xml file for reading and writing:
with open('path/test.xml', 'r+') as f:
    # convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()

import os  # must import this library
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem. Instead of overwriting my existing file using the different modes, I just deleted the file before using it again, so that each run of my code appended to a new file.

The answer from "How to Replace String in File" works in a simple way and uses replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
    fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()

In my case the following code did the trick:
import json
with open("output.json", "w+") as outfile:  # "w+" creates the file if it does not exist and overwrites any existing content
    json.dump(result_plot, outfile)  # result_plot is the data object from my own program

Using the Python 3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path
filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak') # /tmp/test.bak
filepath.rename(backup) # different approach to backups: move the original aside
content = backup.read_text() # the original path no longer exists after the rename
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
