Python: editing a series of txt files via loop - python

With Python I'm attempting to edit a series of text files to insert a series of strings. I can do so successfully with a single txt file. Here's my working code that appends messages before and after the main body within the txt file:
filenames = ['text_0.txt']
with open("text_0.txt", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            header1 = "Message 1:"
            lines = "\n\n\n\n"
            header2 = "Message 2:"
            contents = header1 + infile.read() + lines + header2
            outfile.write(contents)
I'm seeking some assistance in structuring a script to iteratively make the same edits to a series of similar txt files in the directory. There are 20 or so txt files, all structured the same way: text_0.txt, text_1.txt, text_2.txt, and so on. Any assistance is greatly appreciated.

To loop through a folder of text files, you can do it like this:
import os
YOURDIRECTORY = "TextFilesAreHere"  # this is the folder that holds your text files
for file in os.listdir(YOURDIRECTORY):
    filename = os.fsdecode(file)
    with open(os.path.join(YOURDIRECTORY, filename), "r") as infile:
        contents = infile.read()  # do what you want with the file here

If you already know the file naming, then you can simply loop over the names. Read each file before reopening it for writing, because mode "w" empties the file as soon as it is opened:
filenames = [f'text_{index}.txt' for index in range(21)]  # adjust the range to your file count
header1 = "Message 1:"
lines = "\n\n\n\n"
header2 = "Message 2:"
for file_name in filenames:
    with open(file_name) as infile:
        body = infile.read()
    with open(file_name, "w") as outfile:
        contents = header1 + body + lines + header2
        outfile.write(contents)
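Or loop over the directory:
import os
directory = "."  # placeholder: the folder that holds the txt files
for filename in os.listdir(directory):
    if filename in filenames:  # do something, like check the filename against the list
        print(filename)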
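Putting the two answers together, a minimal sketch for the original question might look like the following. It assumes the files sit in one folder and match the pattern text_*.txt; the function name add_headers and its parameters are illustrative and do not come from the original posts. The important detail is that each file is read completely before it is reopened in "w" mode.

```python
import glob
import os


def add_headers(folder, header1="Message 1:", header2="Message 2:", gap="\n\n\n\n"):
    """Wrap the body of every text_*.txt file in `folder` with two messages."""
    edited = []
    for path in sorted(glob.glob(os.path.join(folder, "text_*.txt"))):
        # Read first: opening the same path with "w" would empty it.
        with open(path) as infile:
            body = infile.read()
        with open(path, "w") as outfile:
            outfile.write(header1 + body + gap + header2)
        edited.append(path)
    return edited
```

Calling add_headers(".") would edit the files in the current directory. Note that running it twice adds the messages twice, so it may be worth testing on copies first.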

Related

Loop through files in a folder and create a new merged text file

I am working on merging a number of text files into a single text document. I am able to read all the file names and create a new output document.
However, when I write the output document, I only get the data from one file and not the rest. Overall it should be close to 1 million lines, but I am only getting the first 10k.
import os
projpath1 = 'PATH1'
projpath2 = 'PATH2'
for root, dirs, files in os.walk(f"{projpath1}", topdown=False):
    for name in files:
        if not name.startswith('.DS_Store'):
            split = name.split("/")
            title = split[0]
            filename = (os.path.join(root, name))
            inputf = os.path.expanduser(f'{projpath1}/{title}')
            updatedf = os.path.expanduser(f'{projpath2}/ENC_merged.txt')
            with open(inputf, "r") as text_file, open(updatedf, 'w') as outfile:
                for info in text_file:
                    for lines in info:
                        outfile.write(lines)
I really am stuck and can't figure it out :/
You are supposed to open the output file first, once, and write all the input files into it inside that block. Opening it with 'w' inside the loop overwrites it for every input file, which is why only one file's data survives. Something like this should work for you:
import os
projpath1 = 'PATH1'
projpath2 = 'PATH2'
updatedf = os.path.expanduser(f'{projpath2}/ENC_merged.txt')
with open(updatedf, 'w') as outfile:
    for root, dirs, files in os.walk(os.path.expanduser(projpath1), topdown=False):
        for name in files:
            if not name.startswith('.DS_Store'):
                # join with root so files in subfolders are found too
                inputf = os.path.join(root, name)
                with open(inputf, "r") as text_file:
                    for line in text_file:
                        outfile.write(line)
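The same idea can be wrapped in a small function. This is only a sketch: the name merge_tree and its arguments are made up for illustration, and it uses shutil.copyfileobj from the standard library to copy each input in chunks rather than line by line. It also skips the merged file itself in case it lives inside the folder being walked.

```python
import os
import shutil


def merge_tree(source_dir, merged_path):
    """Concatenate every file under `source_dir` into `merged_path`."""
    merged_abs = os.path.abspath(merged_path)
    count = 0
    # Open the output once; every input is written into the same handle.
    with open(merged_path, "w") as outfile:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # visit subfolders in a predictable order
            for name in sorted(files):
                input_path = os.path.join(root, name)
                if name.startswith(".DS_Store"):
                    continue
                if os.path.abspath(input_path) == merged_abs:
                    continue  # never read the output file back into itself
                with open(input_path, "r") as text_file:
                    shutil.copyfileobj(text_file, outfile)
                count += 1
    return count
```

The return value is the number of files merged, which makes it easy to check that all inputs were picked up.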
What about doing it with bash?
ls | xargs cat > merged_file

Python - Read Multiple Files & Write To Multiple New Files

I know there's a lot of content about reading & writing out there, but I'm still not quite finding what I need specifically.
I have 5 files (i.e. in1.txt, in2.txt, in3.txt....), and I want to open/read, run the data through a function I have, and then output the new returned value to corresponding new files (i.e. out1.txt, out2.txt, out3.txt....)
I want to do this in one program run. I'm not sure how to write the loop to process all the numbered files in one run.
If you want them to be processed serially, you can use a for loop as follows:
inpPrefix = "in"
outPrefix = "out"
for i in range(1, 6):
    inFile = inPrefix + str(i) + ".txt"
    with open(inFile, 'r') as f:
        fileLines = f.readlines()
    # process content of each file
    processedOutput = process(fileLines)
    # write to file
    outFile = outPrefix + str(i) + ".txt"
    with open(outFile, 'w') as f:
        f.write(processedOutput)
Note: This assumes that the input and output files are in the directory the script is run from (the current working directory).
If you just want to process the files one by one, you can do:
import os
count = 0
directory = "dir/where/your/files/are/"
for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        count += 1
        with open(directory + filename, "r") as read_file:
            # pass the file contents to your own function
            return_of_your_function = do_something_with_data(read_file.read())
        # count is an int, so convert it before building the new name
        with open(directory + str(count) + filename, "w") as write_file:
            write_file.write(return_of_your_function)
Here you go! I would do something like this:
(Assuming all the input .txt files are in the same input folder)
input_path = '/path/to/input/folder/'
output_path = '/path/to/output/folder/'
for count in range(1,6):
    input_file = input_path + 'in' + str(count) + '.txt'
    output_file = output_path + 'out' + str(count) + '.txt'
    with open(input_file, 'r') as f:
        content = f.readlines()
    output = process_input(content)
    with open(output_file, 'w') as f:
        f.write(output)
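For comparison, here is a hedged sketch of the same in/out pairing using pathlib. The function process_numbered_files and its parameters are hypothetical; `process` stands in for the asker's own function and is assumed to take the file text as a string and return a string.

```python
from pathlib import Path


def process_numbered_files(input_dir, output_dir, process, first=1, last=5):
    """Run `process` on in<N>.txt and write the result to out<N>.txt."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n in range(first, last + 1):
        source = input_dir / f"in{n}.txt"
        if not source.exists():
            continue  # skip gaps in the numbering
        target = output_dir / f"out{n}.txt"
        target.write_text(process(source.read_text()))
        written.append(target)
    return written
```

For example, process_numbered_files("inputs", "outputs", str.upper) would write an upper-cased copy of each input file; swap in the real processing function as needed.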

Concatenate multiple files' data into one file and also rename the file?

Using Python, how can I combine all the text files in a specified directory into one text file, and name the output file by joining the input filenames?
For example: Filea.txt and Fileb_2.txt are in the root directory, and the generated output file is Filea_Fileb_2.txt.
Filea.txt
123123
21321
Fileb_2.txt
2344
23432
Filea_Fileb_2.txt
123123
21321
2344
23432
my script:
import glob
PWD1 = '/home/jenkins/workspace'
files = glob.glob(PWD1 + '/' + '*.txt')
for f in files:
    with open(f, 'r') as file:
        for line in file:
            outputfile = open('outputfile.txt', 'a')
            outputfile.write(line)
            outputfile.close()
Here's another way to combine text files.
#! python3
from pathlib import Path
import glob
folder_File1 = r"C:\Users\Public\Documents\Python\CombineFIles"
txt_only = r"\*.txt"
files_File1 = glob.glob(f'{folder_File1}{txt_only}')
new_txt = f'{folder_File1}\\newtxt.txt'
newFile = []
for indx, file in enumerate(files_File1):
    if file == new_txt:
        pass  # skip the output file itself
    else:
        contents = Path(file).read_text()
        newFile.append(contents)
file = open(new_txt, 'w')
file.write("\n".join(newFile))
file.close()
This is a working solution that stores the file names and the file contents in two lists. It joins the names into a "combined" filename and then writes the contents of all the files to it. Because a list keeps items in the order they were appended, the contents come out in the order the files were read. (My example filenames are filea.txt and fileb.txt, but it will work for the filenames you've used.)
import os
import sys
path = sys.argv[1]
files = []
contents = []
for f in sorted(os.listdir(path)):  # os.listdir makes no promise about order, so sort
    if f.endswith('.txt'):  # in case there are other file types in there
        files.append(f.replace('.txt', ''))  # chops off .txt so we can join later
        with open(os.path.join(path, f)) as cat:  # open relative to the given directory
            for line in cat:
                contents.append(line)  # put file contents in list
outfile_name = '_'.join(files) + '.txt'  # create your output filename
with open(outfile_name, 'w') as outfile:
    for line in contents:
        outfile.write(line)
To run this on a specific directory, just pass it on the command line:
$ python3.6 catter.py /path/to/my_text_files/
output filename:
filea_fileb.txt
contents:
123123
21321
2344
23432
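A compact variant of the approach above, sketched with pathlib. The function name combine_txt_files is an illustrative choice, and the sketch assumes the files are small enough to read into memory and that each one ends with a newline.

```python
from pathlib import Path


def combine_txt_files(folder):
    """Join every .txt file in `folder` into one file named after all of them."""
    folder = Path(folder)
    sources = sorted(folder.glob("*.txt"))
    if not sources:
        return None
    # Filea.txt + Fileb_2.txt -> Filea_Fileb_2.txt
    target = folder / ("_".join(p.stem for p in sources) + ".txt")
    target.write_text("".join(p.read_text() for p in sources))
    return target
```

Because the combined file is written into the same folder, a second run would pick it up as an input; writing it elsewhere, or deleting it between runs, avoids that.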

Text file to csv with glob. Need to change delimiter depending on section of file being read

I have a text file that doesn't have a standard delimiter. I need to be able to check if the current line is equal to a certain phrase and, if it is, the code should use a certain delimiter until another phrase is found. The delimiters used are ',', '-', ':' and '='.
Please help me out :)
This is what my code is at the moment:
import csv
import glob
import os
directory = raw_input("INPUT Folder for Log Dump Files:")
output = raw_input("OUTPUT Folder for .csv files:")
txt_files = os.path.join(directory, '*.txt')
for txt_file in glob.glob(txt_files):
    with open(txt_file, "rb") as input_file:
        in_txt = csv.reader(input_file, delimiter=':')
        filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
        with open(os.path.join(output, filename), 'wb') as output_file:
            out_csv = csv.writer(output_file)
            out_csv.writerows(in_txt)
I cannot speak to the time efficiency of this method, but it might just get what you want done. The basic idea is to create a list to contain the lines of each text file, and then output the list to your new csv file. You save a 'delimiter' variable and then change it by checking each line as you go through the text files.
For example:
I created two text files on my Desktop. They read as follows:
delimiter_test_1.txt
test=delimiter=here
does-it-work
I'm:Not:Sure
delimiter_test_2.txt
This:File:Uses:Colons
Pretty:Much:The:Whole:Time
does-it-work
If-Written-Correctly-yes
I then ran this script on them:
import csv
import glob
import os
directory = raw_input("INPUT Folder for Log Dump Files:")
output = raw_input("OUTPUT Folder for .csv files:")
txt_files = os.path.join(directory, '*.txt')
delimiter = ':'
for txt_file in glob.glob(txt_files):
    SavingList = []
    with open(txt_file, 'r') as text:
        for line in text:
            line = line.rstrip('\n')  # compare and split without the trailing newline
            if line == 'test=delimiter=here':
                delimiter = '='
            elif line == 'does-it-work':
                delimiter = '-'
            elif line == "I'm:Not:Sure":
                delimiter = ':'
            SavingList.append(line.split(delimiter))
    # build the output name from the base name so it lands in the output folder
    filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
    with open(os.path.join(output, filename), 'wb') as output_file:
        writer = csv.writer(output_file)
        writer.writerows(SavingList)
And got two csv files with the text split based on the desired delimiter. Depending on how many different lines you have for changing the delimiter, you could set up a dictionary that maps each of those lines to its delimiter. Then your check becomes, for example:
if line in my_dictionary:
    delimiter = my_dictionary[line]
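The dictionary idea can be sketched as follows. Unlike the script above, this sketch is written for Python 3, and the names DELIMITER_MARKERS and split_with_markers are made up for illustration. The marker lines are the ones from the two test files; real log files would need their own entries.

```python
import csv
import io

# Hypothetical mapping: a marker line -> the delimiter to use from that line on.
DELIMITER_MARKERS = {
    "test=delimiter=here": "=",
    "does-it-work": "-",
    "I'm:Not:Sure": ":",
}


def split_with_markers(lines, default=":"):
    """Split each line on the delimiter selected by the latest marker line."""
    delimiter = default
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Keep the current delimiter unless this line is a marker.
        delimiter = DELIMITER_MARKERS.get(line, delimiter)
        rows.append(line.split(delimiter))
    return rows


sample = ["test=delimiter=here\n", "does-it-work\n", "I'm:Not:Sure\n"]
rows = split_with_markers(sample)

# Write the rows as CSV; a StringIO stands in for a real output file here.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
```

Using dict.get with the current delimiter as the fallback replaces the whole if/elif chain, so adding a new marker only means adding one dictionary entry.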

How do I apply my python code to all of the files in a folder at once, and how do I create a new name for each subsequent output file?

The code I am working with takes in a .pdf file and outputs a .txt file. My question is, how do I create a loop (probably a for loop) which runs the code over and over again on all files in a folder which end in ".pdf"? Furthermore, how do I change the output each time the loop runs so that I can write a new file each time that has the same name as the input file (i.e. 1_pet.pdf > 1_pet.txt, 2_pet.pdf > 2_pet.txt, etc.)?
Here is the code so far:
path="2_pet.pdf"
content = getPDFContent(path)
encoded = content.encode("utf-8")
text_file = open("Output.txt", "w")
text_file.write(encoded)
text_file.close()
The following script solves your problem:
import os
sourcedir = 'pdfdir'
dl = os.listdir(sourcedir)
for f in dl:
    # splitext copes with names that contain several dots or none at all
    root, ext = os.path.splitext(f)
    if ext == ".pdf":
        path_in = os.path.join(sourcedir, f)
        content = getPDFContent(path_in)
        encoded = content.encode("utf-8")
        path_out = os.path.join(sourcedir, root + ".txt")
        text_file = open(path_out, 'w')
        text_file.write(encoded)
        text_file.close()
Create a function that encapsulates what you want to do to each file.
import os.path
def parse_pdf(filename):
    "Parse a pdf into text"
    content = getPDFContent(filename)
    encoded = content.encode("utf-8")
    ## split off the pdf extension to add .txt instead.
    (root, _) = os.path.splitext(filename)
    text_file = open(root + ".txt", "w")
    text_file.write(encoded)
    text_file.close()
Then apply this function to a list of filenames, like so:
for f in files:
    parse_pdf(f)
One way to operate on all PDF files in a directory is to invoke glob.glob() and iterate over the results:
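import glob
import os
for path in glob.glob('*.pdf'):
    content = getPDFContent(path)
    encoded = content.encode("utf-8")
    # name the output after the input so each PDF gets its own .txt file
    root, _ = os.path.splitext(path)
    text_file = open(root + ".txt", "w")
    text_file.write(encoded)
    text_file.close()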
Another way is to allow the user to specify the files:
import sys
for path in sys.argv[1:]:
    ...
Then the user runs your script like python foo.py *.pdf.
You could use a recursive function to search the folder and all its subfolders for files that end with pdf, then create a text file for each of them.
It could be something like:
import os
def convert_PDF(path, func):
    d = os.path.basename(path)
    if os.path.isdir(path):
        # recurse into every entry of the folder
        for x in os.listdir(path):
            convert_PDF(os.path.join(path, x), func)
    elif d[-4:] == '.pdf':
        func(path)
# based entirely on your example code
def convert_to_txt(path):
    content = getPDFContent(path)
    encoded = content.encode("utf-8")
    file_path = os.path.dirname(path)
    # replace pdf with txt extension
    file_name = os.path.basename(path)[:-4]+'.txt'
    text_file = open(os.path.join(file_path, file_name), "w")
    text_file.write(encoded)
    text_file.close()
convert_PDF('path/to/files', convert_to_txt)
Because the actual operation is changeable, you can replace the function with whatever operation you need to perform (like using a different library, converting to a different type, etc.)
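As a hedged alternative to hand-written recursion, pathlib's rglob can walk the subfolders. In this sketch convert_tree is a made-up name, and extract_text is a placeholder for getPDFContent or any other function that takes a path and returns the text as a string. It is written for Python 3, so the encoding is handled by write_text instead of a separate encode call.

```python
from pathlib import Path


def convert_tree(folder, extract_text):
    """Write a .txt file next to every .pdf found under `folder`."""
    outputs = []
    for pdf_path in sorted(Path(folder).rglob("*.pdf")):
        # 1_pet.pdf -> 1_pet.txt, in the same directory as the source file
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
        outputs.append(txt_path)
    return outputs
```

Passing the extraction function as an argument keeps the same flexibility as the answer above: the traversal stays fixed while the per-file operation can be swapped.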
