Change two lines in text - python

I have a Python script mostly coded for a project I'm currently working on and have hit a roadblock. I essentially run a program that spits out the following output file (called big.dmp):
)O+_05 Big-body initial data (WARNING: Do not delete this line!!)
) Lines beginning with `)' are ignored.
)---------------------------------------------------------------------
style (Cartesian, Asteroidal, Cometary) = Cartesian
epoch (in days) = 1365250.
)---------------------------------------------------------------------
COMPSTAR r=5.00000E-01 d=3.00000E+00 m= 0.160000000000000E+01
4.570923967127310E-01 1.841433531828977E+01 0.000000000000000E+00
-6.207379670518027E-03 1.540861575481520E-04 0.000000000000000E+00
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
Now with this file I need to edit both the epoch line and the line beginning with COMPSTAR while keeping the rest of the information constant from integration to integration, as the last 3 lines contain the Cartesian coordinates of my object and are essentially what the program outputs.
I know how to use f = open('big.dmp', 'w') and f.write('text here') to create the initial file, but how would one go about copying these final three lines into a new big.dmp file for the next integration?

Something like this, perhaps?
with open('big1.dmp') as infile, open('big2.dmp', 'w') as outfile:
    for line in infile:
        if line.startswith(')'):
            # comment lines are copied through unchanged
            pass
        elif 'epoch' in line:
            # do something with the epoch line
            line = line.replace('epoch', 'EPOCH')
        elif line.startswith('COMPSTAR'):
            # do something with the COMPSTAR line
            line = line.replace('COMPSTAR', 'comparison star')
        outfile.write(line)
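A minimal sketch of the same line-by-line idea that writes actual new values instead of placeholder replacements. The update_big_dmp name and its epoch/r/d/m parameters are illustrative assumptions, and it assumes the program reading big.dmp accepts standard E notation (Python writes 1.600000000000000E+00 rather than the Fortran-style 0.160000000000000E+01).

```python
def update_big_dmp(lines, epoch, r, d, m):
    """Return a new list of lines with the epoch and COMPSTAR lines rewritten.

    Every other line (comments, the style line and the three coordinate
    lines) is copied through unchanged.
    """
    out = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(')'):
            out.append(line)  # comment lines are kept verbatim
        elif stripped.startswith('epoch'):
            out.append('epoch (in days) = %d.\n' % epoch)
        elif stripped.startswith('COMPSTAR'):
            out.append('COMPSTAR r=%.5E d=%.5E m= %.15E\n' % (r, d, m))
        else:
            out.append(line)  # style line and coordinates are untouched
    return out
```

To use it, read the previous file with readlines(), pass the list through update_big_dmp, and write the result with writelines() to the big.dmp for the next integration.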

Here is a somewhat more change-tolerant version:
import re
reg_num = r'\d+'
reg_sci = r'[-+]?\d*\.?\d+([eE][+-]?\d+)?'
def update_config(s, finds=None, replaces=None, **kwargs):
    if finds is None: finds = update_config.finds
    if replaces is None: replaces = update_config.replaces
    # .items() works in both Python 2 and 3 (iteritems() is Python 2 only)
    for name, value in kwargs.items():
        s = re.sub(finds[name], replaces[name].format(value), s)
    return s
update_config.finds = {
    'epoch': r'epoch \(in days\) =\s*' + reg_num + r'\.',
    'r': r' r\s*=\s*' + reg_sci,
    'd': r' d\s*=\s*' + reg_sci,
    'm': r' m\s*=\s*' + reg_sci
}
update_config.replaces = {
    'epoch': 'epoch (in days) ={:>11d}.',
    'r': ' r={:1.5E}',
    'd': ' d={:1.5E}',
    'm': ' m= {:1.15E}'
}
def main():
    with open('big.dmp') as inf:
        s = inf.read()
    s = update_config(s, epoch=1365252, r=0.51, d=2.99, m=1.1)
    with open('big.dmp', 'w') as outf:
        outf.write(s)
if __name__=="__main__":
    main()
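If you also want the coordinates themselves as numbers (for example to inspect them between integrations), the last three non-blank lines can be parsed into floats. A small sketch; read_last_rows is a hypothetical helper name, and it assumes the coordinate block is always the last three non-blank lines of the file, as in the sample above.

```python
def read_last_rows(lines, count=3):
    """Parse the last `count` non-blank lines into lists of floats."""
    rows = [line.split() for line in lines if line.strip()]
    return [[float(value) for value in row] for row in rows[-count:]]
```

For example, passing the result of open('big.dmp').readlines() returns three lists of three floats each: the position, velocity and spin rows.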

On the off-chance that the format of your file is fixed with regard to line numbers, this solution will change only the two lines:
with open('big.dmp') as inf, open('out.txt', 'w') as outf:
    data = inf.readlines()
    data[4] = ' epoch (in days) = 9999.\n' # line with epoch
    data[6] = 'COMPSTAR r=2201 d=3330 m= 12\n' # line with COMPSTAR
    outf.writelines(data)
resulting in this output file:
)O+_05 Big-body initial data (WARNING: Do not delete this line!!)
) Lines beginning with `)' are ignored.
)---------------------------------------------------------------------
style (Cartesian, Asteroidal, Cometary) = Cartesian
epoch (in days) = 9999.
)---------------------------------------------------------------------
COMPSTAR r=2201 d=3330 m= 12
4.570923967127310E-01 1.841433531828977E+01 0.000000000000000E+00
-6.207379670518027E-03 1.540861575481520E-04 0.000000000000000E+00
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
Clearly this will not work if the line numbers aren't consistent, but I thought I'd offer it up in case your data format is consistent in terms of line numbers.
Also, since it reads the whole file into memory at once, it won't be an ideal solution for truly huge files.
The advantage of opening files using with is that they are automatically closed for you when you are done with them, or if you encounter an exception.
There are more flexible solutions (searching for the strings, processing the file line by line), but if your data is fixed and small, there's no downside to taking advantage of those factors. Somebody smart once said "Simple is better than complex." (The Zen of Python)

It's a little hard to understand what you want, but assuming that you only want to remove the lines starting with ) (that is, keep every other line):
with open(filename) as f:
    text = f.read()
lines = text.split("\n")
result = [line for line in lines if not line.startswith(")")]
or, as a one-liner:
[line for line in open(filename).read().split("\n") if not line.startswith(")")]

Related

Reading txt file into python

Python rookie here. I have multiple text files, each with the following format of floats: 1x10, 1x10 and 10x10
0.0551 1500.0 [273.639, 273.331, 273.021, 272.711, 272.399, 272.087, 271.773, 271.46, 271.145]
0.0553 1532.5 [272.422, 273.96, 273.021, 273.321, 272.494, 273.129, 271.12, 271.23, 271.889]
0.0555 1560.0 [273.234, 273.44, 273.133, 272.065, 272.234, 272.012, 271.942, 271.43, 271.145]
0.0558 1582.5 [272.45, 273.011, 273.45, 273.331, 272.321, 273.234, 271.34, 271.531, 271.932]
I would like to read them as a column as following to be able to plot them:
column1 = [0.0551,0.0553,0.0555,0.0558,....]
column2 = [1500.0,1532.5,1560.0,1582.5,....]
column3 = [[273.639, 273.331, 273.021, 272.711, 272.399, 272.087, 271.773, 271.46, 271.145],[272.422, 273.96, 273.021, 273.321, 272.494, 273.129, 271.12, 271.23, 271.889],[273.234, 273.44, 273.133, 272.065, 272.234, 272.012, 271.942, 271.43, 271.145],[272.45, 273.011, 273.45, 273.331, 272.321, 273.234, 271.34, 271.531, 271.932]]
I tried numpy loadtxt and numerous other functions but was never able to successfully read them in python. What is the best way to read the text file in the desired format?
Your file structure is kinda weird, you should clean it upstream.
Anyway here's the function to load your data. If the file structure changes too much, the function may not work.
def load_data(file):
    cols = [[] for _ in range(3)]
    to_remove = ['[', ']', '\n']
    with open(file, 'r') as f:
        for line in f:
            if len(line) > 1:
                split_line = line
                for x in to_remove: split_line = split_line.replace(x, '')
                split_line = split_line.split(' ', 2)
                cols[0].append(float(split_line[0]))
                cols[1].append(float(split_line[1]))
                cols[2].append([float(i) for i in split_line[2].split(',')])
    return cols
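Because the third column is already written as a Python list literal, an alternative is to split each line at the first [ and let ast.literal_eval parse the bracketed part. This is a sketch under the assumption that every non-blank line has exactly two numbers before the bracket; load_columns is an illustrative name.

```python
import ast


def load_columns(lines):
    """Split each 'a b [x, y, ...]' line into two floats and a list of floats."""
    col1, col2, col3 = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, bracket, rest = line.partition('[')
        first, second = head.split()  # the two scalar columns
        col1.append(float(first))
        col2.append(float(second))
        # literal_eval safely parses the "[273.639, ...]" part as a list
        col3.append([float(v) for v in ast.literal_eval(bracket + rest)])
    return col1, col2, col3
```

Since it accepts any iterable of lines, an open file object can be passed directly: with open(path) as f: column1, column2, column3 = load_columns(f). Extra spaces between the first two numbers are also tolerated, because split() with no argument splits on runs of whitespace.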

Python: Having trouble replacing lines from file

I'm trying to build a subtitle translator using DeepL, but it isn't working properly. I managed to translate the subtitles, but I'm having problems replacing the lines. I can see that the lines are translated because the program prints them, but it doesn't replace them: whenever I run the program, the output file is the same as the original.
This is the code responsible:
def translate(input, output, languagef, languaget):
    file = open(input, 'r').read()
    fileresp = open(output,'r+')
    subs = list(srt.parse(file))
    for sub in subs:
        try:
            linefromsub = sub.content
            translationSentence = pydeepl.translate(linefromsub, languaget.upper(), languagef.upper())
            print(str(sub.index) + ' ' + translationSentence)
            for line in fileresp.readlines():
                newline = fileresp.write(line.replace(linefromsub,translationSentence))
        except IndexError:
            print("Error parsing data from deepl")
This is the how the file looks:
1
00:00:02,470 --> 00:00:04,570
- Yes, I do.
- (laughs)
2
00:00:04,605 --> 00:00:07,906
My mom doesn't want
to babysit everyday
3
00:00:07,942 --> 00:00:09,274
or any day.
4
00:00:09,310 --> 00:00:11,977
But I need
my mom's help sometimes.
5
00:00:12,013 --> 00:00:14,046
She's just gonna
have to be grandma today.
Help will be appreciated :)
Thanks.
You are opening fileresp with r+ mode. When you call readlines(), the file's position will be set to the end of the file. Subsequent calls to write() will then append to the file. If you want to overwrite the original contents as opposed to append, you should try this instead:
allLines = fileresp.readlines()
fileresp.seek(0) # Set position to the beginning
fileresp.truncate() # Delete the contents
for line in allLines:
    fileresp.write(...)
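io.StringIO supports the same readlines/seek/truncate/write calls as a file opened in 'r+' mode, so it can stand in for one in a quick check of this read-then-overwrite pattern. A minimal sketch; rewrite_in_place and the transform callable are illustrative names, not part of the question's code.

```python
import io


def rewrite_in_place(f, transform):
    """Read every line of an open read/write file object, then overwrite it."""
    all_lines = f.readlines()  # the position is now at the end of the file
    f.seek(0)                  # go back to the start
    f.truncate()               # discard the old contents from here on
    for line in all_lines:
        f.write(transform(line))


buf = io.StringIO("Hello\nworld\n")
rewrite_in_place(buf, str.upper)
print(buf.getvalue())
```

The same function works unchanged on a real file opened with open(path, 'r+'), since it only uses methods that both objects provide.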
Update
It's difficult to see what you're trying to accomplish with r+ mode here, but it seems you have separate input and output files. If that's the case, consider:
import srt
import pydeepl
def translate(input, output, languagef, languaget):
    with open(input, 'r') as infile:
        file = infile.read()
    subs = list(srt.parse(file))
    with open(output, 'w') as fileresp: # Use w mode instead
        for sub in subs:
            try:
                linefromsub = sub.content
                translationSentence = pydeepl.translate(linefromsub, languaget.upper(), languagef.upper())
                print(str(sub.index) + ' ' + translationSentence)
                fileresp.write(translationSentence + '\n') # Write the translated sentence, one per line
            except IndexError:
                print("Error parsing data from deepl")
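Writing only the translated sentences drops each subtitle's number and timing line. A minimal sketch that keeps them, assuming the usual SRT layout of blank-line-separated blocks (number, timing line, then text) with \n line endings. translate_srt is a hypothetical name, and `translate` is any callable that takes and returns a string, for example a lambda wrapping the pydeepl.translate call from the question.

```python
def translate_srt(text, translate):
    """Translate the text of every SRT block, keeping index and timing lines."""
    out = []
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        index, timing = lines[0], lines[1]
        content = "\n".join(lines[2:])  # subtitle text may span several lines
        out.append("\n".join([index, timing, translate(content)]))
    return "\n\n".join(out) + "\n"
```

Since the question already parses the file with the srt package, setting each sub.content to its translation and writing the result of srt.compose(subs) should be another way to keep the structure intact.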

How do you make tables with previously stored strings?

So the question basically gives me 19 DNA sequences and wants me to make a basic text table. The first column has to be the sequence ID, the second the length of the sequence, the third the number of "A"s, the 4th the "G"s, the 5th the "C"s, the 6th the "T"s, the 7th the %GC, and the 8th whether or not the sequence contains "TGA". Then I take all these values and write a table to "dna_stats.txt".
Here is my code:
fh = open("dna.fasta","r")
Acount = 0
Ccount = 0
Gcount = 0
Tcount = 0
seq=0
alllines = fh.readlines()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        continue
    Acount+=line.count("A")
    Ccount+=line.count("C")
    Gcount+=line.count("G")
    Tcount+=line.count("T")
    genomeSize=Acount+Gcount+Ccount+Tcount
    percentGC=(Gcount+Ccount)*100.00/genomeSize
    print "sequence", seq
    print "Length of Sequence",len(line)
    print Acount,Ccount,Gcount,Tcount
    print "Percent of GC","%.2f"%(percentGC)
    if "TGA" in line:
        print "Yes"
    else:
        print "No"
    fh2 = open("dna_stats.txt","w")
    for line in alllines:
        splitlines = line.split()
        lenstr=str(len(line))
        seqstr = str(seq)
        fh2.write(seqstr+"\t"+lenstr+"\n")
I found that you have to convert the variables into strings. I have all of the values calculated correctly when I print them out in the terminal. However, I keep getting only 19 for the first column, when it should go 1,2,3,4,5,etc. to represent all of the sequences. I tried it with the other variables and it just got the total amounts of the whole file. I started trying to make the table but have not finished it.
So my biggest issue is that I don't know how to get the values for the variables for each specific line.
I am new to python and programming in general so any tips or tricks or anything at all will really help.
I am using python version 2.7
Well, your biggest issue:
for line in alllines: #1
    ...
    fh2 = open("dna_stats.txt","w")
    for line in alllines: #2
        ....
Indentation matters. This says "for every line (#1), open a file and then loop over every line again (#2)..."
De-indent those things.
This puts the info in a dictionary as you go and allows DNA sequences to span multiple lines.
from __future__ import division # ensure things like 1/2 is 0.5 rather than 0
from collections import defaultdict
fh = open("dna.fasta","r")
alllines = fh.readlines()
fh2 = open("dna_stats.txt","w")
seq=0
data = dict()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        data[seq]=defaultdict(int) #default value will be zero if key is not present hence we can do +=1 without originally initializing to zero
        data[seq]['seq']=seq
        previous_line_end = "" #TGA might be split across lines
        continue
    line = line.strip() # drop the newline so it is not part of the sequence
    data[seq]['Acount']+=line.count("A")
    data[seq]['Ccount']+=line.count("C")
    data[seq]['Gcount']+=line.count("G")
    data[seq]['Tcount']+=line.count("T")
    line_over = previous_line_end + line[:3]
    data[seq]['hasTGA']= data[seq]['hasTGA'] or ("TGA" in line) or ("TGA" in line_over)
    previous_line_end = line[-3:] #save the end of this line to check across the next line break
for seq in sorted(data):
    # compute the size once per sequence, after all of its lines have been counted
    data[seq]['genomeSize']=data[seq]['Acount']+data[seq]['Gcount']+data[seq]['Ccount']+data[seq]['Tcount']
    data[seq]['percentGC']=(data[seq]['Gcount']+data[seq]['Ccount'])*100.00/data[seq]['genomeSize']
    s = '%(seq)d, %(genomeSize)d, %(Acount)d, %(Gcount)d, %(Ccount)d, %(Tcount)d, %(percentGC).2f, %(hasTGA)s\n'
    fh2.write(s % data[seq])
fh.close()
fh2.close()
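A sketch of the same per-record idea that produces the eight columns in the order the question asks for, assuming a standard FASTA file where each record starts with a > header line. fasta_stats and format_row are illustrative names, and the text after > is used as the sequence ID. Joining a record's lines before counting means a TGA split across a line break is still found.

```python
def fasta_stats(lines):
    """Yield (id, length, A, G, C, T, percent_GC, has_TGA) for each FASTA record."""
    def summarize(seq_id, seq):
        counts = dict((base, seq.count(base)) for base in "AGCT")
        length = len(seq)
        gc = (counts["G"] + counts["C"]) * 100.0 / length if length else 0.0
        has_tga = "Yes" if "TGA" in seq else "No"
        return (seq_id, length, counts["A"], counts["G"], counts["C"],
                counts["T"], gc, has_tga)

    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:  # finish the previous record
                yield summarize(seq_id, "".join(chunks))
            seq_id, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if seq_id is not None:  # the last record has no following header
        yield summarize(seq_id, "".join(chunks))


def format_row(row):
    """Tab-separated table line with %GC to two decimal places."""
    return "%s\t%d\t%d\t%d\t%d\t%d\t%.2f\t%s\n" % row
```

Writing the table is then a single loop: open dna.fasta and dna_stats.txt, and call out.write(format_row(row)) for each row in fasta_stats(fh).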

Python: Simple script that parses metrics data

I have a small Python script that I need to modify because the format of the metrics file has changed slightly. I do not know Python at all and have made an honest effort to fix it myself. The changes make sense to me, but apparently there is still one issue with the script. Otherwise, everything else is working. Here's what the script looks like:
import sys
import datetime
##########################################################################
now = datetime.datetime.now();
logFile = now.strftime("%Y%m%d")+'.QE-Metric.log';
underlyingParse = True;
strParse = "UNDERLYING_TICK";
if (len(sys.argv) == 2):
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
elif (len(sys.argv) == 3):
    logFile = sys.argv[2];
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
else:
    print 'Incorrect number of arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()
##########################################################################
# Read the deployment file
FIput = open(logFile, 'r');
FOput = open('ParsedMetrics.txt', 'w');
##########################################################################
def ParseMetrics( file_lines ):
    ii = 0
    tokens = [];
    for ii in range(len(file_lines)):
        line = file_lines[ii].strip()
        if (line.find(strParse) != -1):
            tokens = line.split(",");
            currentTime = float(tokens[2])
            if (underlyingParse == True and ii != 0):
                newIndex = ii-1
                prevLine = file_lines[newIndex].strip()
                while (prevLine.find("ORDER_SHOOT") != -1 and newIndex > -1):
                    newIndex -= 1;
                    tokens = prevLine.split(",");
                    currentTime -= float(tokens[2]);
                    prevLine = file_lines[newIndex].strip();
            if currentTime > 0:
                FOput.write(str(currentTime) + '\n')
##########################################################################
file_lines = FIput.readlines()
ParseMetrics( file_lines );
print 'Metrics parsed and written to ParsedMetrics.txt'
Everything is working fine except for the logic that is supposed to reverse iterate through previous lines to add up the ORDER_SHOOT numbers since the last UNDERLYING_TICK event occurred (starting at the code: if (underlyingParse == True and ii != 0):...) and then subtract that total from the current UNDERLYING_TICK event line being processed. This is what a typical line in the file being parsed looks like:
08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89
Basically, I'm only interested in the last data element (1499.89) which is the time in micros. I know it has to be something stupid. I just need another pair of eyes. Thanks!
So, if command line option is 2, the function creates an output file where all the lines contain just the 'time' portion of the lines from the input file that had the "order_shoot" token in them?
And if the command line option is 1, the function creates an output file with a line for each line in input file that contained the 'underlying_tick' token, except that the number you want here is the underlying_tick time value minus all the order_shoot time values that occurred SINCE the preceding underlying_tick value (or from the start of file if this is the first one)?
If this is correct, and all lines are unique (there are no duplicates), then I would suggest the following re-written script:
#### Imports unchanged.
import sys
import datetime
#### Changing the error checking to be a little simpler.
#### If the number of args is wrong, or the "mode" arg is
#### not a valid option, it will print the error message
#### and exit.
if len(sys.argv) not in (2,3) or sys.argv[1] not in ('1','2'):
    print 'Incorrect arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()
#### Capture the current time, used for the default file name.
now = datetime.datetime.now()
#### Using ternary logic to set the input file to either
#### the file specified in argv[2] (if it exists), or to
#### the default previously specified in the original code.
FIput = open((sys.argv[2] if len(sys.argv)==3
              else now.strftime("%Y%m%d")+'.QE-Metric.log'), 'r');
#### Output file not changed.
FOput = open('ParsedMetrics.txt', 'w');
#### START RE-WRITTEN FUNCTION
def ParseMetrics(file_lines,mode):
    #### The function now takes two params - the lines from the
    #### input file, and the 'mode' - whichever the user selected
    #### at run-time. As you can see from the call down below, this
    #### is taken straight from argv[1].
    if mode == '1':
        #### So if we're doing underlying_tick mode, we want to find each tick,
        #### then for each tick, sum the preceding order_shoots since the last
        #### tick (or start of file for the first tick).
        ticks = [file_lines.index(line) for line in file_lines \
                 if 'UNDERLYING_TICK' in line]
        #### The above list comprehension iterates over file_lines, and creates
        #### a list of the indexes to file_lines elements that contain ticks.
        ####
        #### Then the following loop iterates over ticks, and for each tick,
        #### subtracts the sum of all times for order_shoots that occur between
        #### the previous tick and this one from the time value of the tick
        #### itself. Then that value is written to the outfile.
        prev_tick = -1
        for tick in ticks:
            sub_time = float(file_lines[tick].split(",")[2]) - \
                sum([float(line.split(",")[2]) \
                     for line in file_lines if "ORDER_SHOOT" in line \
                     and prev_tick < file_lines.index(line) < tick])
            FOput.write(str(sub_time) + '\n')
            prev_tick = tick
    #### if the mode is 2, then it just runs through file_lines and
    #### outputs all of the order_shoot time values.
    if mode == '2':
        for line in file_lines:
            if 'ORDER_SHOOT' in line:
                FOput.write(str(float(line.split(",")[2])) + '\n')
#### END OF REWRITTEN FUNCTION
#### As you can see immediately below, we pass sys.argv[1] for the
#### mode argument of the ParseMetrics function.
ParseMetrics(FIput.readlines(),sys.argv[1])
FIput.close()
FOput.close()
print 'Metrics parsed and written to ParsedMetrics.txt'
And that should do the trick. The main issue is that if any "UNDERLYING_TICK" or "ORDER_SHOOT" lines are exact duplicates of other lines, this will not work, because list.index() always returns the position of the first match. Different logic would need to be applied to get the correct indexes.
I am sure there is a way to make this much better, but this was my first thought.
It's also worth noting I added a lot of inline line breaks to the above source for readability, but you might want to pull them if you use this as written.
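Because list.index() returns the position of the first matching line, duplicated lines give wrong indexes in the approach above. A minimal alternative sketch, assuming the same comma-separated layout with the time as the third field, walks the lines once and keeps a running ORDER_SHOOT total; adjusted_ticks is a hypothetical name.

```python
def adjusted_ticks(file_lines):
    """Yield each UNDERLYING_TICK time minus the ORDER_SHOOT times since the previous tick."""
    shoot_total = 0.0
    for line in file_lines:  # file order replaces explicit index bookkeeping
        if 'ORDER_SHOOT' in line:
            shoot_total += float(line.split(",")[2])
        elif 'UNDERLYING_TICK' in line:
            yield float(line.split(",")[2]) - shoot_total
            shoot_total = 0.0  # start counting again after each tick
```

Each value can be written with FOput.write(str(value) + '\n'). Since no index lookups are involved, duplicate lines are handled correctly and the file is scanned only once.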
It's unclear what is wrong with your output because you don't show your output and we can't really understand your input.
I am assuming the following:
Lines are formatted as "absolutetime: TYPE, positiveinteger, float_time_duration_in_micros", where this last item is the amount of time the event took (in microseconds, per the question).
Lines are sorted by "absolutetime". As a consequence, the ORDER_SHOOTs that belong to an UNDERLYING_TICK are always on the lines since the last UNDERLYING_TICK (or the beginning of the file), and only those lines. If this assumption is not true, then you need to sort the file first. You can either do that with a separate program (e.g. pipe output from sort), or use the bisect module to store your lines sorted and easily extract the relevant lines.
If both these assumptions are true, take a look at the following script instead. (Untested because I don't have a big input sample or an output sample to compare against.)
This is a much more Pythonic style, much easier to read and understand, doesn't make use of global variables as function parameters, and should be much more efficient because it doesn't iterate backwards through lines or load the entire file into memory to parse it.
It also demonstrates use of the argparse module for your command line parsing. This isn't necessary, but if you have a lot of command-line Python scripts you should get familiar with it.
import sys
VALIDTYPES = ['UNDERLYING_TICK','ORDER_SHOOT']
def parseLine(line):
    # format of `tokens`:
    # 0 = absolute timestamp
    # 1 = event type
    # 2 = ???
    # 3 = timedelta (microseconds)
    tokens = [t.strip(':, \t') for t in line.strip().split()]
    if len(tokens) < 4 or tokens[1] not in VALIDTYPES:
        return None # blank, malformed or uninteresting line
    tokens[2] = int(tokens[2])
    tokens[3] = float(tokens[3])
    return tuple(tokens)
def parseMetrics(lines, parsetype):
    """Yield timedelta for each line of specified type
    If parsetype is 'UNDERLYING_TICK', subtract previous ORDER_SHOOT
    timedeltas from the current UNDERLYING_TICK delta before yielding
    """
    order_shoots_between_ticks = []
    for line in lines:
        tokens = parseLine(line)
        if tokens is None:
            continue # go home early
        if parsetype=='UNDERLYING_TICK':
            if tokens[1]=='ORDER_SHOOT':
                order_shoots_between_ticks.append(tokens)
            elif tokens[1]=='UNDERLYING_TICK':
                adjustedtick = tokens[3] - sum(t[3] for t in order_shoots_between_ticks)
                order_shoots_between_ticks = []
                yield adjustedtick
        elif parsetype==tokens[1]:
            yield tokens[3]
def parseFile(instream, outstream, parsetype):
    printablelines = ("{0:f}\n".format(time) for time in parseMetrics(instream, parsetype))
    outstream.writelines(printablelines)
def main(argv):
    import argparse, datetime
    parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
    parser.add_argument('mode', type=int, choices=range(1, len(VALIDTYPES)+1),
        help="the types to parse. Valid values are: 1 (Underlying), 2 (OrderShoot)")
    # nargs='?' makes a positional optional (positionals do not accept required=)
    parser.add_argument('infile', nargs='?',
        default='{0}.QE-Metric.log'.format(datetime.datetime.now().strftime('%Y%m%d')),
        help="the input file. Defaults to today's file: YYYYMMDD.QE-Metric.log. Use - for stdin.")
    parser.add_argument('outfile', nargs='?',
        default='ParsedMetrics.txt',
        help="the output file. Defaults to ParsedMetrics.txt. Use - for stdout.")
    parser.add_argument('--verbose', '-v', action='store_true')
    args = parser.parse_args(argv)
    args.mode = VALIDTYPES[args.mode-1]
    if args.infile=='-':
        instream = sys.stdin
    else:
        instream = open(args.infile, 'r')
    if args.outfile=='-':
        outstream = sys.stdout
    else:
        outstream = open(args.outfile, 'w')
    parseFile(instream, outstream, args.mode)
    instream.close()
    outstream.close()
    if args.verbose:
        sys.stderr.write('Metrics parsed and written to {0}\n'.format(args.outfile))
if __name__=='__main__':
    main(sys.argv[1:])
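In argparse, a positional argument is made optional with nargs='?', and its default is used when it is left out; passing required= to a positional raises an error instead. A stripped-down version of that command line, sketched with a hypothetical fixed default file name so it can be checked without depending on today's date:

```python
import argparse


def build_parser(default_infile):
    """Parser with one required positional and two optional positionals."""
    parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
    parser.add_argument('mode', type=int, choices=[1, 2])
    # nargs='?' lets the positional be omitted; `default` fills it in
    parser.add_argument('infile', nargs='?', default=default_infile)
    parser.add_argument('outfile', nargs='?', default='ParsedMetrics.txt')
    return parser


args = build_parser('20110101.QE-Metric.log').parse_args(['2'])
print(args.mode, args.infile, args.outfile)
```

Optional positionals are filled left to right, so a single file name on the command line is taken as the input file and the output file keeps its default.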

how to sort a list by the nth element in v2.3?

This is a simple script I wrote:
#!/usr/bin/env python
file = open('readFile.txt', 'r')
lines = file.readlines()
file.close()
del file
sortedList = sorted(lines, key=lambda lines: lines.split('\t')[-2])
file = open('outfile.txt', 'w')
for line in sortedList:
    file.write(line)
file.close()
del file
to rewrite a file like this:
161788 group_monitor.sgmops 4530 1293840320 1293840152
161789 group_atlas.atlas053 22350 1293840262 1293840152
161790 group_alice.alice017 210 1293840254 1293840159
161791 group_lhcb.pltlhc15 108277 1293949235 1293840159
161792 group_atlas.sgmatlas 35349 1293840251 1293840160
(where the last two fields are epoch time) ordered by the next to last field to this:
161792 group_atlas.sgmatlas 35349 1293840251 1293840160
161790 group_alice.alice017 210 1293840254 1293840159
161789 group_atlas.atlas053 22350 1293840262 1293840152
161788 group_monitor.sgmops 4530 1293840320 1293840152
161791 group_lhcb.pltlhc15 108277 1293949235 1293840159
As you can see, I used sorted(), which was introduced in v2.4. How can I rewrite the script for v2.3 so that it does the same thing?
In addition, I want to convert the epoch time to the human-readable format, so the resultant file looks like this:
161792 group_atlas.sgmatlas 35349 01/01/11 00:04:11 01/01/11 00:02:40
161790 group_alice.alice017 210 01/01/11 00:04:14 01/01/11 00:02:39
161789 group_atlas.atlas053 22350 01/01/11 00:04:22 01/01/11 00:02:32
I know that strftime("%d/%m/%y %H:%M:%S", gmtime()) can be used to convert the epoch time, but I just can't figure out how to apply it in the script to rewrite the file in that format.
Comments? Advice treasured!
@Mark: Update
In some cases, the epoch time comes as 3600, which is to indicate an unfinished business. I wanted to print aborted instead of 01/01/1970 for such a line. So, I changed the format_seconds_since_epoch() like this:
def format_seconds_since_epoch(t):
    if t == 3600:
        return "aborted"
    else:
        return strftime("%d/%m/%y %H:%M:%S",datetime.fromtimestamp(t).timetuple())
which solved the problem. Is it the best that can be done in this regard? Cheers!!
file = open('readFile.txt', 'r')
lines = file.readlines()
file.close()
del file
lines = [line.split('\t') for line in lines]
# sort numerically on the next-to-last field; list.sort(cmpfunc) works in 2.3
lines.sort(lambda x, y: cmp(int(x[-2]), int(y[-2])))
lines = ['\t'.join(line) for line in lines]
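Another option that needs neither sorted() nor the key= argument is the decorate-sort-undecorate idiom: pair each line with its sort key, sort the pairs, then strip the keys off again. It uses only list comprehensions and enumerate(), both available in Python 2.3, and it also runs unchanged on later versions. A sketch assuming tab-separated lines as in the question; sort_by_next_to_last is an illustrative name.

```python
def sort_by_next_to_last(lines):
    """Sort tab-separated lines numerically by their next-to-last field."""
    # decorate: (key, original position, line); the position keeps equal keys
    # in their original order and means the lines themselves are never compared
    decorated = [(int(line.split('\t')[-2]), i, line) for i, line in enumerate(lines)]
    decorated.sort()
    # undecorate: drop the key and position again
    return [line for key, i, line in decorated]
```

Calling it on the list returned by readlines() gives the lines in the order shown in the question, ready to be written out again.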
In reply to your final query, you can create a datetime object from a time_t-like "seconds since the epoch" value using datetime.fromtimestamp, e.g.
from datetime import datetime
from time import strftime
def format_seconds_since_epoch(t):
    return strftime("%d/%m/%y %H:%M:%S",datetime.fromtimestamp(t).timetuple())
print format_seconds_since_epoch(1293840160)
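datetime.fromtimestamp() converts to local time, while the expected output in the question (00:02:40 for 1293840160) corresponds to UTC. If UTC is what you want, time.gmtime() can be passed straight to strftime. A minimal sketch; the _utc suffix on the function name is only there to distinguish it from the local-time version.

```python
from time import gmtime, strftime


def format_seconds_since_epoch_utc(t):
    """Format a seconds-since-the-epoch value as dd/mm/yy HH:MM:SS in UTC."""
    return strftime("%d/%m/%y %H:%M:%S", gmtime(t))


print(format_seconds_since_epoch_utc(1293840160))  # prints 01/01/11 00:02:40
```

On a machine whose local timezone is UTC the two versions agree; elsewhere, only the gmtime() version reproduces the times shown in the question.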
So, putting that together with a slightly modified version of pynator's answer, your script might look like:
#!/usr/bin/env python
from datetime import datetime
from time import strftime
def format_seconds_since_epoch(t):
    return strftime("%d/%m/%y %H:%M:%S",datetime.fromtimestamp(t).timetuple())
fin = open('readFile.txt', 'r')
lines = fin.readlines()
fin.close()
del fin
split_lines = [ line.split("\t") for line in lines ]
split_lines.sort( lambda a, b: cmp(int(a[-2]),int(b[-2])) )
fout = open('outfile.txt', 'w')
for split_line in split_lines:
    for i in (-2,-1):
        split_line[i] = format_seconds_since_epoch(int(split_line[i]))
    # text mode already translates "\n" to the platform line ending;
    # writing os.linesep here would give "\r\r\n" on Windows
    fout.write("\t".join(split_line)+"\n")
fout.close()
del fout
Note that using file as a variable name is a bad idea, since it shadows the built-in file type, so I changed them to fin and fout. (Even though you are deling the variables afterwards, it's still good style to avoid the name file, I think.)
In reply to your further question about the special "3600" value, your solution is fine. Personally, I would probably keep the format_seconds_since_epoch function as it is, so that it doesn't have a surprising special case and is more generally useful. You could create an additional wrapper function with the special case, or just change the split_line[i] = format_seconds_since_epoch(int(split_line[i])) line to:
entry = int(split_line[i])
if entry == 3600:
    split_line[i] = "aborted"
else:
    split_line[i] = format_seconds_since_epoch(entry)
... however, I don't think there's much difference between the two approaches.
Incidentally, if this is more than a one-off task, I would see if you can use a later version of Python in the 2 series than 2.3, which is rather old now - they have lots of nice features that help one to write cleaner scripts.
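The additional wrapper function suggested above could look like this, keeping the general-purpose formatter free of the special case. A sketch: format_entry and ABORTED are illustrative names, and this version formats with time.gmtime(), so it reports UTC rather than local time.

```python
from time import gmtime, strftime

ABORTED = 3600  # the value the question says marks an unfinished job


def format_seconds_since_epoch(t):
    """General-purpose formatter with no special cases."""
    return strftime("%d/%m/%y %H:%M:%S", gmtime(t))


def format_entry(t):
    """Wrapper that handles the special 3600 marker before formatting."""
    if t == ABORTED:
        return "aborted"
    return format_seconds_since_epoch(t)
```

In the main loop, split_line[i] = format_entry(int(split_line[i])) then replaces the call to format_seconds_since_epoch.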
