Python: Simple script that parses metrics data

I have a small Python script that I need to modify because the format of the metrics file has changed slightly. I do not know Python at all and have made an honest effort to fix it myself. The changes make sense to me, but apparently there is still one issue with the script; otherwise, everything else is working. Here's what the script looks like:
import sys
import datetime
##########################################################################
now = datetime.datetime.now();
logFile = now.strftime("%Y%m%d")+'.QE-Metric.log';
underlyingParse = True;
strParse = "UNDERLYING_TICK";
if (len(sys.argv) == 2):
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
elif (len(sys.argv) == 3):
    logFile = sys.argv[2];
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
else:
    print 'Incorrect number of arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()
##########################################################################
# Read the deployment file
FIput = open(logFile, 'r');
FOput = open('ParsedMetrics.txt', 'w');
##########################################################################
def ParseMetrics( file_lines ):
    ii = 0
    tokens = [];
    for ii in range(len(file_lines)):
        line = file_lines[ii].strip()
        if (line.find(strParse) != -1):
            tokens = line.split(",");
            currentTime = float(tokens[2])
            if (underlyingParse == True and ii != 0):
                newIndex = ii-1
                prevLine = file_lines[newIndex].strip()
                while (prevLine.find("ORDER_SHOOT") != -1 and newIndex > -1):
                    newIndex -= 1;
                    tokens = prevLine.split(",");
                    currentTime -= float(tokens[2]);
                    prevLine = file_lines[newIndex].strip();
            if currentTime > 0:
                FOput.write(str(currentTime) + '\n')
##########################################################################
file_lines = FIput.readlines()
ParseMetrics( file_lines );
print 'Metrics parsed and written to ParsedMetrics.txt'
Everything is working fine except for the logic that is supposed to reverse iterate through previous lines to add up the ORDER_SHOOT numbers since the last UNDERLYING_TICK event occurred (starting at the code: if (underlyingParse == True and ii != 0):...) and then subtract that total from the current UNDERLYING_TICK event line being processed. This is what a typical line in the file being parsed looks like:
08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89
Basically, I'm only interested in the last data element (1499.89) which is the time in micros. I know it has to be something stupid. I just need another pair of eyes. Thanks!

So, if the command line option is 2, the function creates an output file where all the lines contain just the 'time' portion of the lines from the input file that had the "order_shoot" token in them?
And if the command line option is 1, the function creates an output file with a line for each line in input file that contained the 'underlying_tick' token, except that the number you want here is the underlying_tick time value minus all the order_shoot time values that occurred SINCE the preceding underlying_tick value (or from the start of file if this is the first one)?
If this is correct, and all lines are unique (there are no duplicates), then I would suggest the following re-written script:
#### Imports unchanged.
import sys
import datetime
#### Changing the error checking to be a little simpler.
#### If the number of args is wrong, or the "mode" arg is
#### not a valid option, it will print the error message
#### and exit.
if len(sys.argv) not in (2, 3) or sys.argv[1] not in ('1', '2'):
    print 'Incorrect arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()
#### Using ternary logic to set the input file to either
#### the file specified in argv[2] (if it exists), or to
#### the default previously specified in the original code.
now = datetime.datetime.now()
FIput = open((sys.argv[2] if len(sys.argv) == 3
              else now.strftime("%Y%m%d") + '.QE-Metric.log'), 'r')
#### Output file not changed.
FOput = open('ParsedMetrics.txt', 'w')
#### START RE-WRITTEN FUNCTION
def ParseMetrics(file_lines, mode):
    #### The function now takes two params - the lines from the
    #### input file, and the 'mode' - whichever the user selected
    #### at run-time. As you can see from the call down below, this
    #### is taken straight from argv[1].
    if mode == '1':
        #### So if we're doing underlying_tick mode, we want to find each tick,
        #### then for each tick, sum the preceding order_shoots since the last
        #### tick (or start of file for the first tick).
        ticks = [file_lines.index(line) for line in file_lines
                 if 'UNDERLYING_TICK' in line]
        #### The above list comprehension iterates over file_lines, and creates
        #### a list of the indexes of file_lines elements that contain ticks.
        ####
        #### Then the following loop iterates over the ticks, and for each tick,
        #### subtracts the sum of all times for the order_shoots that occur
        #### between the previous tick (or the start of file) and this tick,
        #### from the time value of the tick itself. Then that value is
        #### written to the outfile.
        for prev, tick in zip([-1] + ticks[:-1], ticks):
            sub_time = float(file_lines[tick].split(",")[2]) - \
                       sum(float(line.split(",")[2])
                           for line in file_lines[prev+1:tick]
                           if "ORDER_SHOOT" in line)
            FOput.write(str(sub_time) + '\n')
    #### if the mode is 2, then it just runs through file_lines and
    #### outputs all of the order_shoot time values.
    if mode == '2':
        for line in file_lines:
            if 'ORDER_SHOOT' in line:
                FOput.write(line.split(",")[2].strip() + '\n')
#### END OF REWRITTEN FUNCTION
#### As you can see immediately below, we pass sys.argv[1] for the
#### mode argument of the ParseMetrics function.
ParseMetrics(FIput.readlines(), sys.argv[1])
print 'Metrics parsed and written to ParsedMetrics.txt'
And that should do the trick. The main issue is that if you have any lines with "UNDERLYING_TICK" that are exact duplicates of any other such line, then this will not work. Different logic would need to be applied to get the correct indexes.
I am sure there is a way to make this much better, but this was my first thought.
It's also worth noting I added a lot of inline line breaks to the above source for readability, but you might want to pull them if you use this as written.

It's unclear what is wrong with your output because you don't show your output and we can't really understand your input.
I am assuming the following:
Lines are formatted as "absolutetime: TYPE, positiveinteger, float_time_duration_in_us", where this last item is the amount of time the thing took (in microseconds, per your description).
Lines are sorted by "absolutetime". As a consequence, the ORDER_SHOOTs that belong to an UNDERLYING_TICK are always on the lines since the last UNDERLYING_TICK (or the beginning of the file), and only those lines. If this assumption is not true, then you need to sort the file first. You can either do that with a separate program (e.g. pipe output from sort), or use the bisect module to store your lines sorted and easily extract the relevant lines.
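If the sorted-order assumption does not hold and you would rather sort in Python than pipe through sort, a minimal pre-sort sketch (my own illustration, assuming the leading absolute timestamps are zero-padded, so that string order matches chronological order) could be:

```python
def sort_log_lines(lines):
    # The absolute timestamp is everything before the '(' in e.g.
    # "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
    # and zero-padded timestamps sort correctly as strings.
    return sorted(lines, key=lambda line: line.split('(')[0])
```

After this, the "ORDER_SHOOTs since the last UNDERLYING_TICK" grouping holds line by line.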
If both these assumptions are true, take a look at the following script instead. (Untested because I don't have a big input sample or an output sample to compare against.)
This is a much more Pythonic style, much easier to read and understand, doesn't make use of global variables as function parameters, and should be much more efficient because it doesn't iterate backwards through lines or load the entire file into memory to parse it.
It also demonstrates use of the argparse module for your command line parsing. This isn't necessary, but if you have a lot of command-line Python scripts you should get familiar with it.
import sys

VALIDTYPES = ['UNDERLYING_TICK', 'ORDER_SHOOT']

def parseLine(line):
    # format of `tokens`:
    # 0 = absolute timestamp
    # 1 = event type
    # 2 = ???
    # 3 = timedelta (microseconds)
    tokens = [t.strip(':, \t') for t in line.strip().split()]
    if tokens[1] not in VALIDTYPES:
        return None
    tokens[2] = int(tokens[2])
    tokens[3] = float(tokens[3])
    return tuple(tokens)

def parseMetrics(lines, parsetype):
    """Yield timedelta for each line of specified type

    If parsetype is 'UNDERLYING_TICK', subtract previous ORDER_SHOOT
    timedeltas from the current UNDERLYING_TICK delta before yielding
    """
    order_shoots_between_ticks = []
    for line in lines:
        tokens = parseLine(line)
        if tokens is None:
            continue  # go home early
        if parsetype == 'UNDERLYING_TICK':
            if tokens[1] == 'ORDER_SHOOT':
                order_shoots_between_ticks.append(tokens)
            elif tokens[1] == 'UNDERLYING_TICK':
                adjustedtick = tokens[3] - sum(t[3] for t in order_shoots_between_ticks)
                order_shoots_between_ticks = []
                yield adjustedtick
        elif parsetype == tokens[1]:
            yield tokens[3]

def parseFile(instream, outstream, parsetype):
    printablelines = ("{0:f}\n".format(time) for time in parseMetrics(instream, parsetype))
    outstream.writelines(printablelines)

def main(argv):
    import argparse, datetime
    parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
    parser.add_argument('mode', type=int, choices=range(1, len(VALIDTYPES)+1),
        help="the types to parse. Valid values are: 1 (Underlying), 2 (OrderShoot)")
    parser.add_argument('infile', nargs='?',
        default='{0}.QE-Metric.log'.format(datetime.datetime.now().strftime('%Y%m%d')),
        help="the input file. Defaults to today's file: YYYYMMDD.QE-Metric.log. Use - for stdin.")
    parser.add_argument('outfile', nargs='?',
        default='ParsedMetrics.txt',
        help="the output file. Defaults to ParsedMetrics.txt. Use - for stdout.")
    parser.add_argument('--verbose', '-v', action='store_true')
    args = parser.parse_args(argv)
    args.mode = VALIDTYPES[args.mode - 1]
    if args.infile == '-':
        instream = sys.stdin
    else:
        instream = open(args.infile, 'r')
    if args.outfile == '-':
        outstream = sys.stdout
    else:
        outstream = open(args.outfile, 'w')
    parseFile(instream, outstream, args.mode)
    instream.close()
    outstream.close()
    if args.verbose:
        sys.stderr.write('Metrics parsed and written to {0}\n'.format(args.outfile))

if __name__ == '__main__':
    main(sys.argv[1:])

Related

How to read values one whitespace separated value at a time?

In C++ you can read one value at a time like this:
//from console
cin >> x;
//from file:
ifstream fin("file name");
fin >> x;
I would like to emulate this behaviour in Python. It seems, however, that the ordinary ways to get input in Python read either whole lines, the whole file, or a set number of characters.
I would like a function, let's call it one_read(), that reads from a file until it encounters either a white-space or a newline character, then stops. Also, on subsequent calls to one_read() the input should begin where it left off.
Examples of how it should work:
# file input.in is:
# 5 4
# 1 2 3 4 5
n = int(one_read())
k = int(one_read())
a = []
for i in range(n):
    a.append(int(one_read()))
# n = 5 , k = 4 , a = [1,2,3,4,5]
How can I do this?
I think the following should get you close. I admit I haven't tested the code carefully. It sounds like itertools.takewhile should be your friend, and a generator like yield_characters below will be useful.
from itertools import takewhile
import re

# this function yields characters from a file one at a time.
def yield_characters(file):
    with open(file, 'r') as f:
        for line in f:
            for char in line:
                yield char

# double check this. My python regex is weak.
def not_whitespace(char):
    return bool(re.match(r"\S", char))

# this uses takewhile to pull characters while they are not whitespace
def read_one(file):
    chars = yield_characters(file)
    word = ''.join(takewhile(not_whitespace, chars))
    while word:
        yield word
        word = ''.join(takewhile(not_whitespace, chars))
The read_one above is a generator, so you will need to do something like call list on it.
Normally you would just read a line at a time, then split this and work with each part. However if you can't do this for resource reasons, you can implement your own reader which will read one character at a time, and then yield a word each time it reaches a delimiter (or in this example also a newline or the end of the file).
This implementation uses a context manager to handle the file opening/reading, though this might be overkill:
from functools import partial

class Words():
    def __init__(self, fname, delim):
        self.delims = ['\n', delim]
        self.fname = fname
        self.fh = None

    def __enter__(self):
        self.fh = open(self.fname)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()

    def one_read(self):
        chars = []
        for char in iter(partial(self.fh.read, 1), ''):
            if char in self.delims:
                # delimiter signifies end of word
                word = ''.join(chars)
                chars = []
                yield word
            else:
                chars.append(char)

# Assuming x.txt contains 12 34 567 8910
with Words('/tmp/x.txt', ' ') as w:
    print(next(w.one_read()))
    # 12
    print(next(w.one_read()))
    # 34
    print(list(w.one_read()))
    # ['567', '8910']
More or less anything that operates on files in Python can operate on the standard input and standard output. The sys standard library module defines stdin and stdout which give you access to those streams as file-like objects.
Reading a line at a time is considered idiomatic in Python because the other way is quite error-prone (just one C++ example question on Stack Overflow). But if you insist: you will have to build it yourself.
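Before building it, note that the idiomatic line-at-a-time approach can already deliver one whitespace-separated value per call when wrapped in a small generator (a sketch of my own, not tied to any particular file):

```python
import io

def tokens(stream):
    # Read line by line (the idiomatic way), but hand back
    # one whitespace-separated value at a time.
    for line in stream:
        for tok in line.split():
            yield tok

# usage on an in-memory stream standing in for a file
t = tokens(io.StringIO("5 4\n1 2 3 4 5\n"))
n = int(next(t))                      # 5
k = int(next(t))                      # 4
a = [int(next(t)) for _ in range(n)]  # [1, 2, 3, 4, 5]
```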
As you've found, .read(n) will read at most n text characters (technically, Unicode code points) from a stream opened in text mode. You can't tell where the end of the word is until you read the whitespace, but you can .seek back one spot - though not on the standard input, which isn't seekable.
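A minimal sketch of that seek-back idea, shown here on an in-memory stream (any seekable stream works the same way):

```python
import io

def peek(f):
    # Look at the next character without consuming it, by
    # remembering the position and seeking back one spot.
    # Works on seekable streams, not on the standard input.
    pos = f.tell()
    c = f.read(1)
    f.seek(pos)
    return c

f = io.StringIO("5 4")
```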
You should also be aware that the built-in input will ignore any existing data on the standard input before prompting the user:
>>> sys.stdin.read(1) # blocks
foo
'f'
>>> # the `foo` is our input, the `'f'` is the result
>>> sys.stdin.read(1) # data is available; doesn't block
'o'
>>> input()
bar
'bar'
>>> # the second `o` from the first input was lost
Try creating a class to remember where the operation left off.
The __init__ function takes the filename, you could modify this to take a list or other iterable.
read_one checks if there is anything left to read, and if there is, removes and returns the item at index 0 in the list; that being everything until the first whitespace.
class Reader:
    def __init__(self, filename):
        self.file_contents = open(filename).read().split()

    def read_one(self):
        if self.file_contents != []:
            return self.file_contents.pop(0)
Initialise the class as follows and adapt to your liking:
reader = Reader(filepath)
reader.read_one()

Turtle module doesn't open window

I have an exercise at school where we have to use the sys module to read a script file that contains instructions for the turtle module.
The script file is a .trtl file.
It contains the following info, formatted as below:
Walk
100
Turn
90
Walk
50
Turn
90
Walk
100
Turn
90
Walk
50
I have tried this code:
import sys
import turtle

for idx, line in enumerate(sys.stdin):
    move = 0
    while (idx % 2) == 0:
        move = line
    while (idx % 2) != 0:
        if line == "Walk":
            forward(move)
        elif line == "Turn":
            left(move)
when I try running this code with stdin from the script file, my terminal just goes to the next line without doing anything. I can see, that the program is running, and can KeyboardInterrupt it, but no window appears.
Any help would be greatly appreciated!
Your issue stems from a few main problems:
Reading a line from a file will read the ENTIRE line, including the newline character at the end. Using the .rstrip() method will remove that.
Reading a line from a file reads a string. You have to coerce it to the type that you need. For example, when you read the line 100, you are reading in 4 characters: '1', '0', '0', '\n', not the number 100. You will need to add an int() coercion as well as strip the trailing \n on these lines.
Read up on the difference between an if and a while statement. An if statement checks the logical value of its argument once and decides whether or not to execute the following block of code. A while loop keeps executing the block as long as the logical statement stays truthy - which is why your script hangs: idx never changes inside your loops.
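To see the newline problem in isolation:

```python
line = "Walk\n"                 # a line as read from the file
assert line != "Walk"           # the trailing newline breaks the comparison
assert line.rstrip() == "Walk"  # stripped, it compares as expected
assert int("100\n".rstrip()) == 100
```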
There are several other issues with your code, which I will indicate in comments.
Fixing your code:
import sys
import turtle

for idx, line in enumerate(sys.stdin):
    # Strip trailing newline character
    line = line.rstrip()
    # Even lines hold the command name, odd lines hold the value
    # (note your original code had these the wrong way round)
    if (idx % 2) == 0:
        command = line
    # idx % 2 will either be 0 or not 0, no need to check twice.
    # If it is not 0, then this else statement will run
    else:
        # Coerce the value to int instead of string
        move = int(line)
        # Now that command has been stripped of trailing chars, we can check it
        if command == "Walk":
            turtle.forward(move)
        # Alternately, we could use
        # if command.startswith("Walk"):
        # and not have to do an rstrip
        elif command == "Turn":
            turtle.left(move)
For fun, an alternative to #blackbrandt's detailed solution (+1) that, although terse, is also more easily expanded to additional monadic operators:
import sys
import turtle

commands = {'Walk': turtle.forward, 'Turn': turtle.left}
for command, argument in zip(sys.stdin, sys.stdin):
    if method := commands.get(command.rstrip()):
        method(int(argument))
turtle.exitonclick()
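The zip(sys.stdin, sys.stdin) trick works because both arguments are the same iterator (a file object is its own iterator), so zip draws alternating lines into (command, argument) pairs. The same idea on a plain list needs an explicit iter() call:

```python
lines = ["Walk", "100", "Turn", "90"]
it = iter(lines)            # one shared iterator
pairs = list(zip(it, it))   # zip pulls from it twice per pair
# pairs == [("Walk", "100"), ("Turn", "90")]
```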

How to remove brackets and the contents inside from a file

I have a file named sample.txt which looks like below
ServiceProfile.SharediFCList[1].DefaultHandling=1
ServiceProfile.SharediFCList[1].ServiceInformation=
ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n
ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n
My requirement here is to remove the brackets and the integer, and then run OS commands built from the result:
ServiceProfile.SharediFCList.DefaultHandling=1
ServiceProfile.SharediFCList.ServiceInformation=
ServiceProfile.SharediFCList.IncludeRegisterRequest=n
ServiceProfile.SharediFCList.IncludeRegisterResponse=n
I am quite a newbie in Python. This is my first attempt. I have used this code to remove the brackets:
#!/usr/bin/python
import re
import os
import sys

f = os.open("sample.txt", os.O_RDWR)
ret = os.read(f, 10000)
os.close(f)
print ret
var1 = re.sub("[\(\[].*?[\)\]]", "", ret)
print var1
f = open("removed.cfg", "w+")
f.write(var1)
f.close()
After this using the file as input I want to form application specific commands which looks like this:
cmcli INS "DefaultHandling=1 ServiceInformation="
and the next set as
cmcli INS "IncludeRegisterRequest=n IncludeRegisterRequest=y"
So basically now I want all the output to be bunched into sets of two for me to execute the commands on the operating system.
Is there any way that I could bunch them up as set of two?
Reading 10,000 bytes of text into a string is really not necessary when your file is line-oriented text, and isn't scalable either. And you need a very good reason to be using os.open() instead of open().
So, treat your data as the lines of text that it is, and every two lines, compose a single line of output.
from __future__ import print_function
import re

command = [None, None]
cmd_id = 1
bracket_re = re.compile(r".+\[\d\]\.(.+)")
# This doesn't just remove the brackets: what you actually seem to want is
# to pick out everything after [1]. and ignore the rest.
with open("removed.cfg", "w") as outfile:
    with open("sample.txt") as infile:
        for line in infile:
            m = bracket_re.match(line)
            cmd_id = 1 - cmd_id  # gives 0, 1, 0, 1
            command[cmd_id] = m.group(1)
            if cmd_id == 1:  # we have a pair
                output_line = """cmcli INS "{0} {1}" """.format(*command)
                print(output_line, file=outfile)
This gives the output
cmcli INS "DefaultHandling=1 ServiceInformation="
cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
The second line doesn't correspond to your sample output. I don't know how the input IncludeRegisterResponse=n is supposed to become the output IncludeRegisterRequest=y. I assume that's a mistake.
Note that this code depends on your input data being precisely as you describe it and has no error checking whatsoever. So if the format of the input is in reality more variable than that, then you will need to add some validation.

Copy pieces of data from a .txt into another file for a spreadsheet

I have a bunch of data in .txt file and I need it in a format that I can use in fusion tables/spreadsheet. I assume that that format would be a csv that I can write into another file that I can then import into a spreadsheet to work with.
The data is in this format with multiple entries separated by a blank line.
Start Time
8/18/14, 11:59 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No
Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No
(etc.)
but I need it ultimately in this format (or whatever I can use to get it into a spreadsheet):
StartDate, StartTime, Duration, StartSide, FedOnBothSides
8/18/14, 11:59 AM, 15, Left, No
- , -, -, -, -
The problems I have come across are:
-I don't need all the info or every line, but I'm not sure how to automatically separate them. I don't even know if the way I am going about sorting each line is smart.
-I have been getting an error that says "argument 1 must be string or read-only character buffer, not list" when I use .read() or .readlines() sometimes (although it did work at first). Also, both of my arguments are .txt files.
-the dates and times are not in set formats with regular lengths (it has 8/4/14, 5:14 AM instead of 08/04/14, 05:14 AM), which I'm not sure how to deal with.
this is what I have tried so far
from sys import argv
from os.path import exists

def filework():
    script, from_file, to_file = argv
    print "copying from %s to %s" % (from_file, to_file)
    in_file = open(from_file)
    indata = in_file.readlines() #.read() .readline .readlines .read().splitline .xreadlines
    print "the input file is %d bytes long" % len(indata)
    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()
    #do stuff section----------------BEGIN
    for i in indata:
        if i == "Start Time":
            pass #do something
        elif i == '{date format}':
            pass #do something
        else:
            pass #do something
    #do stuff section----------------END
    out_file = open(to_file, 'w')
    out_file.write(indata)
    print "alright, all done."
    out_file.close()
    in_file.close()

filework()
So I'm relatively unversed in scripts like this that have multiple complex parts. Any help and suggestions would be greatly appreciated. Sorry if this is a jumble.
Thanks
This code should work, although its not exactly optimal, but I'm sure you'll figure out how to make it better!
What this code basically does is:
Get all the lines from the input data
Loop through all the lines, and try to recognize different keys (the start time etc)
If a key is recognized, get the line beneath it, and apply an appropriate function to it
If a new line is found, add the current entry to a list, so that other entries can be read
Write the data to a file
In case you haven't seen string formatting being done this way before:
"{0:} {1:}".format(arg0, arg1) - the {0:} is just a way of defining a placeholder for a variable (here: arg0), and the 0 just defines which argument to use.
Find out more here:
Python .format docs
Python OrderedDict docs
If you are using a version of python < 2.7, you might have to install another version of ordereddict by using pip install ordereddict. If that doesn't work, just change data = OrderedDict() to data = {}, and it should work. The output will then look somewhat different each time it is generated, but it will still be correct.
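A quick illustration of that placeholder syntax:

```python
# The number before the (optional) colon picks which argument fills the slot.
assert "{0:} {1:}".format("05:30", "AM") == "05:30 AM"
assert "{1}/{0}".format("18", "8") == "8/18"  # arguments can be reordered or reused
```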
from sys import argv
from os.path import exists
# since we want to have a somewhat standardized format
# and dicts are unordered by default
try:
    from collections import OrderedDict
except ImportError:
    # python 2.6 or earlier, use backport
    from ordereddict import OrderedDict

def get_time_and_date(time):
    date, time = time.split(",")
    time, time_indic = time.split()
    date = pad_time(date)
    time = "{0:} {1:}".format(pad_time(time), time_indic)
    return time, date

def pad_time(time):
    """
    Make all the time values look the same, ex turn 5:30 AM into 05:30 AM
    """
    # if its a time
    if ":" in time:
        separator = ":"
    # if its a date
    else:
        separator = "/"
    time = time.split(separator)
    for index, num in enumerate(time):
        if len(num) < 2:
            time[index] = "0" + time[index]
    return separator.join(time)

def filework():
    from_file, to_file = argv[1:]
    data = OrderedDict()
    print "copying from %s to %s" % (from_file, to_file)
    # by using with open(...) the file closes automatically
    with open(from_file, "r") as inputfile:
        indata = inputfile.readlines()
    entries = []
    print "the input file is %d bytes long" % len(indata)
    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()
    for line_num in xrange(len(indata)):
        # make the entire string lowercase to be more flexible,
        # and then remove whitespace
        line_lowered = indata[line_num].lower().strip()
        if "start time" == line_lowered:
            time, date = get_time_and_date(indata[line_num+1].strip())
            data["StartTime"] = time
            data["StartDate"] = date
        elif "duration" == line_lowered:
            duration = indata[line_num+1].strip().split()
            # only keep the amount of minutes
            data["Duration"] = duration[0]
        elif "start side" == line_lowered:
            data["StartSide"] = indata[line_num+1].strip()
        elif "fed on both sides" == line_lowered:
            data["FedOnBothSides"] = indata[line_num+1].strip()
        elif line_lowered == "":
            # if a blank line is found, prepare for reading a new entry
            entries.append(data)
            data = OrderedDict()
    entries.append(data)
    # create the outfile if it does not exist
    with open(to_file, "w+") as outfile:
        headers = entries[0].keys()
        outfile.write(", ".join(headers) + "\n")
        for entry in entries:
            outfile.write(", ".join(entry.values()) + "\n")

filework()

Change two lines in text

I have a python script mostly coded so far for a project I'm currently working on and have hit a road block. I essentially run a program that spits out the following output file (called big.dmp):
)O+_05 Big-body initial data (WARNING: Do not delete this line!!)
) Lines beginning with `)' are ignored.
)---------------------------------------------------------------------
style (Cartesian, Asteroidal, Cometary) = Cartesian
epoch (in days) = 1365250.
)---------------------------------------------------------------------
COMPSTAR r=5.00000E-01 d=3.00000E+00 m= 0.160000000000000E+01
4.570923967127310E-01 1.841433531828977E+01 0.000000000000000E+00
-6.207379670518027E-03 1.540861575481520E-04 0.000000000000000E+00
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
Now with this file I need to edit both the epoch line and the line beginning with COMPSTAR, while keeping the rest of the information constant from integration to integration, as the last 3 lines contain the Cartesian coordinates of my object and are essentially what the program is outputting.
I know how to use f = open('big.dmp', 'w') and f.write('text here') to create the initial file but how would one go about reading these final three lines into a new big.dmp file for the next integration?
Something like this perhaps?
infile = open('big1.dmp')
outfile = open('big2.dmp', 'w')

for line in infile:
    if line.startswith(')'):
        # ignore comments
        pass
    elif 'epoch' in line:
        # do something with line
        line = line.replace('epoch', 'EPOCH')
    elif line.startswith('COMPSTAR'):
        # do something with line
        line = line.replace('COMPSTAR', 'comparison star')
    outfile.write(line)
Here is a somewhat more change-tolerant version:
import re

reg_num = r'\d+'
reg_sci = r'[-+]?\d*\.?\d+([eE][+-]?\d+)?'

def update_config(s, finds=None, replaces=None, **kwargs):
    if finds is None: finds = update_config.finds
    if replaces is None: replaces = update_config.replaces
    for name, value in kwargs.iteritems():
        s = re.sub(finds[name], replaces[name].format(value), s)
    return s

update_config.finds = {
    'epoch': r'epoch \(in days\) =\s*' + reg_num + r'\.',
    'r': r' r\s*=\s*' + reg_sci,
    'd': r' d\s*=\s*' + reg_sci,
    'm': r' m\s*=\s*' + reg_sci
}
update_config.replaces = {
    'epoch': 'epoch (in days) ={:>11d}.',
    'r': ' r={:1.5E}',
    'd': ' d={:1.5E}',
    'm': ' m= {:1.15E}'
}

def main():
    with open('big.dmp') as inf:
        s = inf.read()
    s = update_config(s, epoch=1365252, r=0.51, d=2.99, m=1.1)
    with open('big.dmp', 'w') as outf:
        outf.write(s)

if __name__ == "__main__":
    main()
On the off-chance that the format of your file is fixed with regard to line numbers, this solution will change only the two lines:
with open('big.dmp') as inf, open('out.txt', 'w') as outf:
    data = inf.readlines()
    data[4] = ' epoch (in days) = 9999.\n'      # line with epoch
    data[6] = 'COMPSTAR r=2201 d=3330 m= 12\n'  # line with COMPSTAR
    outf.writelines(data)
resulting in this output file:
)O+_05 Big-body initial data (WARNING: Do not delete this line!!)
) Lines beginning with `)' are ignored.
)---------------------------------------------------------------------
style (Cartesian, Asteroidal, Cometary) = Cartesian
epoch (in days) = 9999.
)---------------------------------------------------------------------
COMPSTAR r=2201 d=3330 m= 12
4.570923967127310E-01 1.841433531828977E+01 0.000000000000000E+00
-6.207379670518027E-03 1.540861575481520E-04 0.000000000000000E+00
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
Clearly this will not work if the line numbers aren't consistent, but I thought I'd offer it up just in case your data format is consistent in terms of line numbers.
Also, since it reads the whole file into memory at once, it won't be an ideal solution for truly huge files.
The advantage of opening files using with is that they are automatically closed for you when you are done with them, or if you encounter an exception.
There are more flexible solution (searching for the strings, processing the file line-by-line) but if your data is fixed and small, there's no downside of taking advantage of those factors. Somebody smart once said "Simple is better than complex." (The Zen of Python)
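As one such more flexible variant, the same two-line edit can be done by searching for the lines instead of relying on their positions (a sketch; the replacement values are just the sample ones from above):

```python
def rewrite(lines):
    # Replace the epoch and COMPSTAR lines wherever they occur;
    # pass everything else (comments, coordinates) through unchanged.
    for line in lines:
        if line.lstrip().startswith('epoch'):
            yield ' epoch (in days) = 9999.\n'
        elif line.startswith('COMPSTAR'):
            yield 'COMPSTAR r=2201 d=3330 m= 12\n'
        else:
            yield line
```

This also processes the file one line at a time, so it stays friendly to large files.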
It's a little hard to understand what you want, but assuming that you only want to remove the lines starting with ):
text = open(filename).read()
lines = text.split("\n")
result = [line for line in lines if not line.startswith(")")]
or, the one liner:
[line for line in open(file_name).read().split("\n") if not line.startswith(")")]
