This question already has answers here:
How to delete a specific line in a file?
(17 answers)
Closed 3 years ago.
I would like to read a lot of data in a folder, and want to delete lines that have "DT=(SINGLE SINGLE SINGLE)", and then write it as new data.
In that Data folder, there are 300 data files!
My code is
import os, sys
path = "/Users/xxx/Data/"
allFiles = os.listdir(path)
for fname in allFiles:
print(fname)
with open(fname, "r") as f:
with open(fname, "w") as w:
for line in f:
if "DT=(SINGLE SINGLE SINGLE)" not in line:
w.write(line)
FileNotFoundError: [Errno 2] No such file or directory: '1147.dat'
I want to do it for a bunch of dataset.
How can I automatically read and write to delete the lines?
and is there way to make a new dataset with a different name? e.g. 1147.dat -> 1147_new.dat
The below should do; code demos of what each annotated line does afterwards:
path = "/Users/xxx/Data/"
allFiles = [os.path.join(path, filename) for filename in os.listdir(path)] # [1]
del_keystring = "DT=(SINGLE SINGLE SINGLE)" # general case
for filepath in allFiles: # better longer var names for clarity
print(filepath)
with open(filepath,'r') as f_read: # [2]
loaded_txt = f_read.readlines()
new_txt = []
for line in loaded_txt:
if del_keystring not in line:
new_txt.append(line)
with open(filepath,'w') as f_write: # [2]
f_write.write(''.join([line for line in new_txt])) # [4]
with open(filepath,'r') as f_read: # [5]
assert(len(f_read.readlines()) <= len(loaded_txt))
1 os.listdir returns only the filenames, not the filepaths; os.path.join joins its inputs into a fullpath, with separators (e.g. \\): folderpath + '\\' + filename
[2] NOT same as doing with open(X,'r') as .., with open(X,'w') as ..:; the as 'w' empties the file, thus nothing for as 'r' to read
[3] If f_read.read() == "Abc\nDe\n12", then f_read.read().split('\n')==["Abc,"De","12"]
[4] Undoes [3]: if _ls==["a","bc","12"], then "\n".join([x for x in _ls])=="a\nbc\n12"
[5] Optional code to verify that saved file's # of lines is <= original file's
NOTE: you may see the saved filesize slightly bigger than original's, which may be due to original's better packing, compression, etc - which you can figure from its docs; [5] ensures it isn't due to more lines
# bonus code to explicitly verify intended lines were deleted
with open(original_file_path,'r') as txt:
print(''.join(txt.readlines()[:80])) # select small excerpt
with open(processed_file_path,'r') as txt:
print(''.join(txt.readlines()[:80])) # select small excerpt
# ''.join() since .readlines() returns a list, delimited by \n
NOTE: for more advanced caveats, see comments below answer; for a more compact alternative, see Torxed's version
Related
This question already has answers here:
How do I create an incrementing filename in Python?
(13 answers)
Closed 2 years ago.
Suppose that I have a textfile text.txt. In Python, I read it to lines (which is a str object), make some changes to lines, and after that I want to save lines to a new file in the same directory. The new filename should be
text.new.txt if it doesn't exist already,
text.new.2.txt if text.new.txt already exists,
text.new.3.txt if text.new.txt and text.new.2.txt already exist,
etc.
I came up with the following solution:
import itertools
import os
file_1 = r'C:\Documents\text.txt'
with open(file_1, mode='r', encoding='utf8') as f:
lines = f.read()
# Here I modify `lines`
# ...
postfix = '.new'
root, ext = os.path.splitext(file_1)
file_2 = f'{root}{postfix}{ext}'
if os.path.isfile(file_2):
for j in itertools.count(start=2):
file_2 = f'{root}{postfix}.{j}{ext}'
if not os.path.isfile(file_2):
break
with open(file_2, mode='w', encoding='utf8') as f:
f.write(lines)
I realize that I can use while True ... break instead of itertools.count() but it is essentially the same thing. I am wondering if there are any substantially better solutions to this problem.
If you want to keep your code and your format is filename.new.index.txt, when you check that filename.new.2.txt exists you can check the last file with glob:
from glob import glob
# path contains path
filelist = glob(path + "\\filename.new.*")
last_index = max([filename.split(".")[-2] for filename in filelist])
And assign last_index+1 as index for the new file.
For a more robust approach, consider applying str.zfill() to indices in order to sort them easily (i.e. 001, 002, 003...).
I have a large number of entries in a file. Let me call it file A.
File A:
('aaa.dat', 'aaa.dat', 'aaa.dat')
('aaa.dat', 'aaa.dat', 'bbb.dat')
('aaa.dat', 'aaa.dat', 'ccc.dat')
I want to use these entries, line by line, in a program that would iteratively pick an entry from file A, concatenate the files in this way:
filenames = ['aaa.dat', 'aaa.dat', 'ccc.dat'] ###entry number 3
with open('out.dat', 'w') as outfile: ###the name has to be aaa-aaa-ccc.dat
for fname in filenames:
with open(fname) as infile:
outfile.write(infile.read().strip())
All I need to do is to substitute the filenames iteratively and create an output in a "aaa-aaa-aaa.dat" format. I would appreciate any help-- feeling a bit lost!
Many thanks!!!
You can retrieve and modify the file names in the following way:
import re
pattern = re.compile('\W')
with open('fnames.txt', 'r') as infile:
for line in infile:
line = (re.sub(pattern, ' ', line)).split()
# Old filenames - to concatenate contents
content = [x + '.dat' for x in line[::2]];
# New filename
new_name = ('-').join(line[::2]) + '.dat'
# Write the concatenated content to the new
# file (first read the content all at once)
with open(new_name, 'w') as outfile:
for con in content:
with open(con, 'r') as old:
new_content = old.read()
outfile.write(new_content)
This program reads your input file, here named fnames.txt with the exact structure from your post, line by line. For each line it splits the entries using a precompiled regex (precompiling regex is suitable here and should make things faster). This assumes that your filenames are only alphanumeric characters, since the regex substitutes all non-alphanumeric characters with a space.
It retrieves only 'aaa' and dat entries as a list of strings for each line and forms a new name by joining every second entry starting from 0 and adding a .dat extension to it. It joins using a - as in the post.
It then retrieves the individual file names from which it will extract the content into a list content by selecting every second entry from line.
Finally, it reads each of the files in content and writes them to the common file new_name. It reads each of them all at ones which may be a problem if these files are big and in general there may be more efficient ways of doing all this. Also, if you are planning to do more things with the content from old files before writing, consider moving the old file-specific operations to a separate function for readability and any potential debugging.
Something like this:
with open(fname) as infile, open('out.dat', 'w') as outfile:
for line in infile:
line = line.strip()
if line: # not empty
filenames = eval(line.strip()) # read tuple
filenames = [f[:-4] for f in filenames] # remove extension
filename = '-'.join(filenames) + '.dat' # make filename
outfile.write(filename + '\n') # write
If your problem is just calculating the new filenames, how about using os.path.splitext?
'-'.join([
f[0] for f in [os.path.splitext(path) for path in filenames]
]) + '.dat'
Which can be probably better understood if you see it like this:
import os
clean_fnames = []
filenames = ['aaa.dat', 'aaa.dat', 'ccc.dat']
for fname in filenames:
name, extension = os.path.splitext(fname)
clean_fnames.append(name)
name_without_ext = '-'.join(clean_fnames)
name_with_ext = name_without_ext + '.dat'
print(name_with_ext)
HOWEVER: If your issue is that you can not get the filenames in a list by reading the file line by line, you must keep in mind that when you read files, you get text (strings) NOT Python structures. You need to rebuild a list from a text like: "('aaa.dat', 'aaa.dat', 'aaa.dat')\n".
You could take a look to ast.literal_eval or try to rebuild it yourself. The code below outputs a lot of messages to show what's happening:
import pprint
collected_fnames = []
with open('./fileA.txt') as f:
for line in f:
print("Read this (literal) line: %s" % repr(line))
line_without_whitespaces_on_the_sides = line.strip()
if not line_without_whitespaces_on_the_sides:
print("line is empty... skipping")
continue
else:
line_without_parenthesis = (
line_without_whitespaces_on_the_sides
.lstrip('(')
.rstrip(')')
)
print("Cleaned parenthesis: %s" % line_without_parenthesis)
chunks = line_without_parenthesis.split(', ')
print("Collected %s chunks in a %s: %s" % (len(chunks), type(chunks), chunks))
chunks_without_quotations = [chunk.replace("'", "") for chunk in chunks]
print("Now we don't have quotations: %s" % chunks_without_quotations)
collected_fnames.append(chunks_without_quotations)
print("collected %s lines with filenames:\n%s" %
(len(collected_fnames), pprint.pformat(collected_fnames)))
I am reading a folder with a specific file name. I am reading the content within a file, but how do I read specific lines or the last 6 lines within a file?
************************************
Test Scenario No. 1
TestcaseID = FB_71125_1
dpSettingScript = FB_71125_1_DP.txt
************************************
Setting Pre-Conditions (DP values, Sqlite DB):
cp /fs/images/nfs/FileRecogTest/MNT/test/Databases/FB_71125_1_device.sqlite $NUANCE_DB_DIR/device.sqlite
"sync" twice.
Starting the test:
0#00041511#0000000000# FILERECOGNITIONTEST: = testScenarioNo (int)1 =
0#00041514#0000000000# FILERECOGNITIONTEST: = TestcaseID (char*)FB_71125_1 =
0#00041518#0000000000# FILERECOGNITIONTEST: = dpSettingScript (char*)FB_71125_1_DP.txt =
0#00041520#0000000000# FILERECOGNITIONTEST: = UtteranceNo (char*)1 =
0#00041524#0000000000# FILERECOGNITIONTEST: = expectedEventData (char*)0||none|0||none =
0#00041528#0000000000# FILERECOGNITIONTEST: = expectedFollowUpDialog (char*) =
0#00041536#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/main_menu.wav#MEDIA_COND:PAS_MEDIA&MEDIA_NOT_BT#>main_menu.global<#<FS0000_Pos_Rec_Tone><FS1000_MainMenu_ini1>
0#00041789#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00043768#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/Framework.wav##>{any_device_name}<#<FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>
0#00044008#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00045426#0000000000# FILERECOGNITIONTESTWARNING: expected >{any_device_name}<, got >lowconfidence1#FS1000_MainMenu<
1900#00046452#0000000000# FILERECOGNITIONTESTERROR: expected <FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>, got <FS0000_Misrec_Tone><FS1000_MainMenu_nm1_004><pause300><FS1000_MainMenu_nm_001>
0#00046480#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00047026#0000000000# FILERECOGNITIONTEST: Stopping dialog immediately
[VCALogParser] Scenario 1 FAILED.
Can someone suggest me how to read specific lines, or the last 6 lines within a file ?
I can think of two methods. If your files are not too big, you can just read all lines, and keep only the last six ones:
f = open(some_path)
last_lines = f.readlines()[-6:]
But that's really brute-force. Something cleverer is to make a guess, using the seek() method of your file object:
file_size = os.stat(some_path).st_size # in _bytes_, so take care depending on encoding
f = open(some_path)
f.seek(file_size - 1000) # here's the guess. Adjust with expected line length
last_lines = f.readline()[-6:]
To read the last 6 lines of a single file, you could use Python's file.seek to move near to the end of the file and then read the remaining lines. You need to decide what the maximum line length could possibly be, e.g. 1024 characters.
The seek command is first used to move to the end of the file (without reading it in), tell is used to determine with position in the file (as we are at the end, this will be the length). It then goes backwards in the file and reads the lines in. If the file is very short, the whole file is read in.
import os
filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language\test.txt"
back_up = 6 * 1024 # Go back from the end more than 6 lines worth.
with open(filename, "r") as f_input:
f_input.seek(0, os.SEEK_END)
backup = min(back_up, f_input.tell())
f_input.seek(-backup, os.SEEK_END)
print f_input.readlines()[-6:]
Using with will ensure your file is automatically closed afterwards. Prefixing your file path with r avoids you needing to double backslash your file path.
So to then apply this to your directory walk and write your results to a separate output file, you could do the following:
import os
import re
back_up = 6 * 256 # Go back from the end more than 6 lines worth
directory = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language"
output_filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\output.txt"
with open(output_filename, 'w') as f_output:
for dirpath, dirnames, filenames in os.walk(directory):
for filename in filenames:
if filename.startswith('VCALogParser_output'):
cur_file = os.path.join(dirpath, filename)
with open(cur_file, "r") as f_input:
f_input.seek(0, os.SEEK_END)
backup = min(back_up , f_input.tell())
f_input.seek(-backup, os.SEEK_END)
last_lines = ''.join(f_input.readlines()[-6:])
try:
summary = ', '.join(re.search(r'(\d+ warning\(s\)).*?(\d+ error\(s\)).*?(\d+ scenarios\(s\))', last_lines, re.S).groups())
except AttributeError:
summary = "No summary"
f_output.write('{}: {}\n'.format(filename, summary))
Or, essentially, use a for loop to append lines to an array and then remove the nth number of items from the array like:
array=[]
f=open("file.txt","r")
for lines in f:
array.append(f.readlines())
f.close()
while len(array) > 5:
del array[0]
I compare two text files and print out the results to a 3rd file. I am trying to make it so the script i'm running would iterate over all of the folders that have two text files in them, in the CWD of the script.
What i have so far:
import os
import glob
path = './'
for infile in glob.glob( os.path.join(path, '*.*') ):
print('current file is: ' + infile)
with open (f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
Would this be a good way to start the iteration process?
It's not the most clear code but it gets the job done. However, i'm pretty sure i need to take the logic out of the read / write methods but i'm not sure where to start.
What i'm basically trying to do is have a script iterate over all of the folders in its CWD, open each folder, compare the two text files inside, write a 3rd text file to the same folder, then move on to the next.
Another method i have tried is as follows:
import os
rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
print('Found directory: %s' % dirName)
for fname in fileList:
print('\t%s' % fname)
And this outputs the following (to give you a better example of the file structure:
Found directory: C:\Python27\test
test.py
Found directory: C:\Python27\test\asdd
asd1.txt
asd2.txt
Found directory: C:\Python27\test\chro
ch1.txt
ch2.txt
Found directory: C:\Python27\test\hway
hw1.txt
hw2.txt
Would it be wise to put the compare logic under the for fname in fileList? How do i make sure it compares the two text files inside the specific folder and not with other fnames in the fileList?
This is the full code that i am trying to add this functionality into. I appologize for the Frankenstein nature of it but i am still working on a refined version but it does not work yet.
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os
class avs_auto:
def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
self.load(input_file1, input_file2, output_file1, output_file2)
self.compare(output_file1, output_file2)
self.final(result_file)
def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
frame_rects = defaultdict(list)
for row in (map(str, line.split()) for line in fin1):
id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
frame_rects[frame].append(id)
frame_rects[frame].append(rect)
frame_rects2 = defaultdict(list)
for row in (map(str, line.split()) for line in fin2):
id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
frame_rects2[frame].append(id)
frame_rects2[frame].append(rect)
with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
for frame, rects in sorted(frame_rects.iteritems()):
fout1.write('{{{}:{}}}\n'.format(frame, rects))
for frame, rects in sorted(frame_rects2.iteritems()):
fout2.write('{{{}:{}}}\n'.format(frame, rects))
def compare(self, fileOut1, fileOut2):
with open(fileOut1+'.txt', 'r') as fin1:
with open(fileOut2+'.txt', 'r') as fin2:
lines1 = fin1.readlines()
lines2 = fin2.readlines()
diff_lines = [l.strip() for l in lines1 if l not in lines2]
diffs = defaultdict(list)
with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
for line in diff_lines:
d = eval(line)
for k in d:
list_ids = d[k]
for i in range(0, len(d[k]), 2):
diffs[d[k][i]].append(k)
for id_ in diffs:
diffs[id_].sort()
for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
group = map(itemgetter(1), g)
result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))
def final(self, result_file):
with open(result_file+'.txt', 'r') as fin:
lines = (line.split() for line in fin)
for k, g in groupby(lines, itemgetter(0)):
fst = next(g)
lst = next(iter(deque(g, 1)), fst)
with open('final/{}.avs'.format(k), 'w') as fout:
fout.write('video0=ImageSource("old\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
fout.write('video1=ImageSource("new\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
fout.write('video0=BilinearResize(video0,640,480)\n')
fout.write('video1=BilinearResize(video1,640,480)\n')
fout.write('StackHorizontal(video0,video1)\n')
fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))
using the load_and_compare() function, i define two input text files, two output text files, a file for the comparison results and a final phase that writes many files for all of the differences.
What i am trying to do is have this whole class run on the current working directory and go through every sub folder, compare the two text files, and write everything into the same folder, specifically the final() results.
You can indeed use os.walk(), since that already separates the directories from the files. You only need the directories it returns, because that's where you're looking for your 2 specific files.
You could also use os.listdir() but that returns directories as well files in the same list, so you would have to check for directories yourself.
Either way, once you have the directories, you iterate over them (for subdir in dirnames) and join the various path components you have: The dirpath, the subdir name that you got from iterating over the list and your filename.
Assuming there are also some directories that don't have the specific 2 files, it's a good idea to wrap the open() calls in a try..except block and thus ignore the directories where one of the files (or both of them) doesn't exist.
Finally, if you used os.walk(), you can easily choose if you only want to go into directories one level deep or walk the whole depth of the tree. In the former case, you just clear the dirnames list by dirnames[:] = []. Note that dirnames = [] wouldn't work, since that would just create a new empty list and put that reference into the variable instead of clearing the old list.
Replace the print("do something ...") with your program logic.
#!/usr/bin/env python
import errno
import os
f1 = "test1"
f2 = "test2"
path = "."
for dirpath, dirnames, _ in os.walk(path):
for subdir in dirnames:
filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in f1, f2]
try:
with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
print("do something with " + str(fin1) + " and " + str(fin2))
except IOError as e:
# ignore directiories that don't contain the 2 files
if e.errno != errno.ENOENT:
# reraise exception if different from "file or directory doesn't exist"
raise
# comment the next line out if you want to traverse all subsubdirectories
dirnames[:] = []
Edit:
Based on your comments, I hope I understand your question better now.
Try the following code snippet instead. The overall structure stays the same, only now I'm using the returned filenames of os.walk(). Unfortunately, that would also make it harder to do something like "go only into the subdirectories 1 level deep", so I hope walking the tree recursively is fine with you. If not, I'll have to add a little code to later.
#!/usr/bin/env python
import fnmatch
import os
filter_pattern = "*.txt"
path = "."
for dirpath, dirnames, filenames in os.walk(path):
# comment this out if you don't want to filter
filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]
if len(filenames) == 2:
# comment this out if you don't want the 2 filenames to be sorted
filenames.sort(key=str.lower)
filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
print("do something with " + str(fin1) + " and " + str(fin2))
I'm still not really sure what your program logic does, so you will have to interface the two yourself.
However, I noticed that you're adding the ".txt" extension to the file name explicitly all over your code, so depending on how you are going to use the snippet, you might or might not need to remove the ".txt" extension first before handing the filenames over. That would be achieved by inserting the following line after or before the sort:
filenames = [os.path.splitext(fn)[0] for fn in filenames]
Also, I still don't understand why you're using eval(). Do the text files contain python code? In any case, eval() should be avoided and be replaced by code that's more specific to the task at hand.
If it's a list of comma separated strings, use line.split(",") instead.
If there might be whitespace before or after the comma, use [word.strip() for word in line.split(",")] instead.
If it's a list of comma separated integers, use [int(num) for num in line.split(",")] instead - for floats it works analogously.
etc.
I work with spatial data that is output to text files with the following format:
COMPANY NAME
P.O. BOX 999999
ZIP CODE , CITY
+99 999 9999
23 April 2013 09:27:55
PROJECT: Link Ref
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Design DTM is 30MB 2.5X2.5
Stripping applied to design is 0.000
Point Number Easting Northing R.L. Design R.L. Difference Tol Name
3224808 422092.700 6096059.380 2.520 -19.066 -21.586 --
3224809 422092.200 6096059.030 2.510 -19.065 -21.575 --
<Remainder of lines>
3273093 422698.920 6096372.550 1.240 -20.057 -21.297 --
Average height difference is -21.390
RMS is 21.596
0.00 % above tolerance
98.37 % below tolerance
End of Report
As shown, the files have a header and a footer. The data is delimited by spaces, but not an equal amount between the columns.
What I need, is comma delimited files with Easting, Northing and Difference.
I'd like to prevent having to modify several hundred large files by hand and am writing a small script to process the files. This is what I have so far:
#! /usr/bin/env python
import csv,glob,os
from itertools import islice
list_of_files = glob.glob('C:/test/*.txt')
for filename in list_of_files:
(short_filename, extension )= os.path.splitext(filename)
print short_filename
file_out_name = short_filename + '_ed' + extension
with open (filename, 'rb') as source:
reader = csv.reader( source)
for row in islice(reader, 10, None):
file_out= open (file_out_name, 'wb')
writer= csv.writer(file_out)
writer.writerows(reader)
print 'Created file: '+ file_out_name
file_out.close()
print 'All done!'
Questions:
How can I let the line starting with 'Point number' become the header in the output file? I'm trying to put DictReader in place of the reader/writer bit but can't get it to work.
Writing the output file with delimiter ',' does work but writes a comma in place of each space, giving way too much empty columns in my output file. How do I circumvent this?
How do I remove the footer?
I can see a problem with your code, you are creating a new writer for each row; so you will end up only with the last one.
Your code could be something like this, without the need of CSV readers or writers, as it's simple enough to be parsed as simple text (problem would arise if you had text columns, with escaped characters and so on).
def process_file(source, dest):
found_header = False
for line in source:
line = line.strip()
if not header_found:
#ignore everything until we find this text
header_found = line.starswith('Point Number')
elif not line:
return #we are done when we find an empty line, I guess
else:
#write the needed columns
columns = line.split()
dest.writeline(','.join(columns[i] for i in (1, 2, 5)))
for filename in list_of_files:
short_filename, extension = os.path.splitext(filename)
file_out_name = short_filename + '_ed' + extension
with open(filename, 'r') as source:
with open(file_out_name. 'w') as dest:
process_file(source, dest)
This worked:
#! /usr/bin/env python
import glob,os
list_of_files = glob.glob('C:/test/*.txt')
def process_file(source, dest):
header_found = False
for line in source:
line = line.strip()
if not header_found:
#ignore everything until we find this text
header_found = line.startswith('Stripping applied') #otherwise, header is lost
elif not line:
return #we are done when we find an empty line
else:
#write the needed columns
columns = line.split()
dest.writelines(','.join(columns[i] for i in (1, 2, 5))+"\n") #newline character adding was necessary
for filename in list_of_files:
short_filename, extension = os.path.splitext(filename)
file_out_name = short_filename + '_ed' + ".csv"
with open(filename, 'r') as source:
with open(file_out_name, 'wb') as dest:
process_file(source, dest)
To answer to your first and last question: it is simply about ignoring the corresponding lines, i.e. not to write them to output. This corresponds to if not header_found and else if not line: blocks of fortran proposal.
Second point is that there is no dedicated delimiter in your file: you have one or more spaces, which makes it hard to be parsed using csv module. Using split() will parse each line and return the list of non-blank characters, and will therefore only return useful values.