Python: using loops to save multiple files

I have the following code which collects certain strings in a log file using a loop:
skip_line = True
writefile = open("input_nn.txt", "a")
with open("input_n.txt", "r") as myfile:
    for line in myfile:
        if "Distance" in line:
            skip_line = True
        elif "Input" in line:
            skip_line = False
        else:
            pass
        if skip_line:
            continue
        writefile.write(line)
To this loop I want to add a line (or lines) that saves the output after the elif statement to a new .scr/.txt file each time it goes through the loop.
I've seen posts which do this for arrays of numbers, but those loops work because the values of the arrays can be accessed by index. I have no idea how to do it in this case, since I'll be saving lines of strings to different files.
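One possible approach (a sketch, not taken from any answer here: the function name and the "block_N.txt" naming scheme are made up for illustration) is to keep a counter and open a fresh output file each time an "Input" marker starts a new block:

```python
# Sketch: write each "Input"..."Distance" block to its own numbered file.
# Unlike the original code, the marker lines themselves are not copied.
def split_blocks(source_path):
    out_file = None
    count = 0
    with open(source_path) as src:
        for line in src:
            if "Distance" in line:           # end of a block: close current file
                if out_file is not None:
                    out_file.close()
                    out_file = None
            elif "Input" in line:            # start of a block: open a new file
                out_file = open("block_%d.txt" % count, "w")
                count += 1
            elif out_file is not None:       # inside a block: copy the line
                out_file.write(line)
    if out_file is not None:                 # file still open if input ended mid-block
        out_file.close()
    return count
```

The counter makes the filenames unique, so no indexing into an array of values is needed.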

Related

Using Python3 to search a file for a string, add the results on the next lines to an array before stopping at the next string

I am using Python 3 to process a results file. The structure of the file is a combination of string identifiers followed by lists of integer values in this format:
ENERGY_BOUNDS
1.964033E+07 1.733253E+07 1.491825E+07 1.384031E+07 1.161834E+07 1.000000E+07 8.187308E+06 6.703200E+06
6.065307E+06 5.488116E+06 4.493290E+06 3.678794E+06 3.011942E+06 2.465970E+06 2.231302E+06 2.018965E+06
EIGENVALUE
1.219034E+00
There are maybe 50 different sets of data with unique identifiers in this file. What I want to do is write a code that will search for a specific identifier (e.g. ENERGY_BOUNDS), then read the values that follow into a list, stopping at the next identifier (in this case EIGENVALUE). I then need to be able to manipulate the list (finding its length, printing its values, etc.).
I am writing this as a function so I can call it multiple times in my code when I want to search for different identifiers. So far what I have is:
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(line)
                list_of_results.append(nextValue.rstrip())
    return list_of_results
It works fine up until it comes to reading the next line after the identifier, and I am stuck on how to continue reading the results after that line and how to make it stop at the next identifier.
The following is a simple, tested answer.
You are making two mistakes:
line is a string, not an iterator, so calling next(line) raises an error.
You read only one line after the identifier is found, while you need to keep reading until the next identifier appears.
Below is your code after a small modification. It is also tested on your data:
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as read_obj:
        list_of_results = []
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(read_obj)
                while not nextValue.strip().isalpha(): # keep reading until the next identifier appears
                    list_of_results.extend(nextValue.split())
                    nextValue = next(read_obj)
        print(list_of_results)
I would suggest adding a variable that indicates whether you have found a line containing an identifier.
Afterwards, simply add the values into the array until the next identifier has been reached.
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    identifier_found = False
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                identifier_found = True
            elif identifier_found:
                if line.strip().isalpha(): # Next identifier reached, exit loop
                    break
                list_of_results += line.split() # Add values to result
    return list_of_results
Use booleans, continue, and break!
Try to implement logic as follows:
Set a boolean (I'll use in_range) to False
Look through the lines and see if they match the identifier.
If it does, set the boolean to True and continue
If it does not, continue
If the boolean is False AND the line begins with a space: continue
If the boolean is True AND the line begins with a space: Add the line to the list.
If the boolean is True AND the line doesn't begin with a space: break.
This ends the searching process once a new identifier has been started.
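The steps above can be sketched as follows (a sketch only: it assumes, as the steps do, that value lines begin with a space while identifier lines do not):

```python
# Implements the boolean/continue/break logic described above.
def collect_values(file_name, identifier):
    results = []
    in_range = False
    with open(file_name) as f:
        for line in f:
            if identifier in line:
                in_range = True          # matched our identifier: start collecting
                continue
            if not line.startswith(" "): # an identifier line
                if in_range:
                    break                # a new identifier: stop searching
                continue                 # some other identifier: keep looking
            if in_range:
                results.extend(line.split())  # a value line inside our range
    return results
```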
The other two answers are already helpful. Here is my method in case you need something else, with comments to explain.
If you don't want to use the end_identifier, you can use .isalpha(), which checks whether the string contains only letters.
def read_data_from_file(file_name, start_identifier, end_identifier):
    list_of_results = []
    with open(file_name, 'r') as read_obj:
        start_identifier_reached = False # variable to check if we reached the needed identifier
        for line in read_obj:
            if start_identifier in line:
                start_identifier_reached = True # now we reached the identifier
                continue # go back to the start so we don't write the identifier into the list
            if start_identifier_reached:
                if end_identifier in line: # stop once we reach the end_identifier
                    break
                list_of_results.append(line.rstrip()) # put the values into the list
    return list_of_results

Extract a part from a file between specific lines

I would like to know how I can extract some data from a specific range in a big data file. Is there a way to read the content between two "buzzwords"?
I would like to read line by line between *NODE and **:
*NODE
13021145, 2637.6073002472617, 55.011929824413045, 206.0394346892517
13021146, 2637.6051226039867, 55.21115693303926, 206.05686503802065
13021147, 2634.226986419154, 54.98263035830583, 205.9520084547658
13021148, 2634.224808775879, 55.181857466932044, 205.96943880353476
**
Before *NODE and after ** there are thousands of lines...
I know it should look something similar like:
a = []
with open('file.txt') as file:
    for line in file:
        if line.startswith('*NODE'):
            # NOW THERE SHOULD FOLLOW SOMETHING LIKE:
            # Go to next line and "a.append" till there comes the "magical"
            # "**"
Any idea? I am totally new to Python. Thanks for the help!
I hope you know what I mean.
You pretty much did it. The only thing missing is that once you find the beginning, you search for the sequence's end, and until that happens you append every line you're iterating over to your list, i.e.:
data = None # a placeholder to store your lines
with open("file.txt", "r") as f: # do not shadow the built-in `file`
    for line in f: # iterate over the lines
        if data is None: # we haven't found `*NODE` yet
            if line[:5] == "*NODE": # search for `*NODE` at the line beginning
                data = [] # make `data` an empty list to begin collecting
        elif line[:2] == "**": # data initialized, we look for the sequence's end
            break # no need to iterate over the file anymore
        else: # data initialized but not at the end...
            data.append(line) # append the line to our data
Now data will contain either a list of lines between *NODE and **, or None if the sequence was not found.
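Once collected, the lines can be parsed further. A small follow-up sketch (the sample values are the ones from the question) turns each comma-separated line into a row of floats:

```python
# Hypothetical follow-up: parse the collected lines into numeric rows.
data = [
    "13021145, 2637.6073002472617, 55.011929824413045, 206.0394346892517\n",
    "13021146, 2637.6051226039867, 55.21115693303926, 206.05686503802065\n",
]
# float() tolerates the surrounding whitespace and trailing newline
rows = [[float(x) for x in line.split(",")] for line in data]
```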
Try this:
with open('file.txt') as file:
    a = []
    running = False # avoid NameError when 'if' statement below isn't reached
    for line in file:
        if line.startswith('*NODE'):
            running = True # show that we are starting to add values
            continue # make sure we don't add '*NODE'
        if line.startswith('**'):
            running = False # show that we're done adding values
            continue # make sure we don't add '**'
        if running: # only add the values if 'running' is True
            a.extend([i.strip() for i in line.split(',')])
The output is a list containing the following as strings:
(I used print('\n'.join(a)))
13021145
2637.6073002472617
55.011929824413045
206.0394346892517
13021146
2637.6051226039867
55.21115693303926
206.05686503802065
13021147
2634.226986419154
54.98263035830583
205.9520084547658
13021148
2634.224808775879
55.181857466932044
205.96943880353476
We can iterate over lines until none are left or we've reached the end of the block, like
a = []
with open('file.txt') as file:
    for line in file:
        if line.startswith('*NODE'):
            # collect block-related lines
            while True:
                try:
                    line = next(file)
                except StopIteration:
                    # there are no lines left
                    break
                if line.startswith('**'):
                    # we've reached the end of the block
                    break
                a.append(line)
            # stop iterating over the file
            break
will give us
print(a)
['13021145, 2637.6073002472617, 55.011929824413045, 206.0394346892517\n',
'13021146, 2637.6051226039867, 55.21115693303926, 206.05686503802065\n',
'13021147, 2634.226986419154, 54.98263035830583, 205.9520084547658\n',
'13021148, 2634.224808775879, 55.181857466932044, 205.96943880353476\n']
Alternatively we can write helper predicates like
def not_a_block_start(line):
    return not line.startswith('*NODE')

def not_a_block_end(line):
    return not line.startswith('**')
and then use the brilliance of the itertools module like
from itertools import dropwhile, takewhile

with open('file.txt') as file:
    block_start = dropwhile(not_a_block_start, file)
    # skip the block start line
    next(block_start)
    a = list(takewhile(not_a_block_end, block_start))
this will give us the same value for a.

object.write() is not working as expected

I am new to Python. I want to read one file and copy data to another file. My code is below. When I open the files inside the for loop, I can write all the data into dst_file, but it takes 8 seconds to write dst_file.
for cnt, hex_num in enumerate(hex_data):
    with open(src_file, "r") as src_f, open(dst_file, "a") as dst_f:
        copy_flag = False
        for src_line in src_f:
            if r"SPI_frame_0" in src_line:
                src_line = src_line.replace('SPI_frame_0', 'SPI_frame_' + str(cnt))
                copy_flag = True
            if r"halt" in src_line:
                copy_flag = False
            if copy_flag:
                copy_mid_data += src_line
        updated_data = WriteHexData(copy_mid_data, hex_num, cnt, msb_lsb_flag)
        copy_mid_data = ""
        dst_f.write(updated_data)
To improve performance, I am trying to open the files outside of the for loop. But it is not working properly: it writes into dst_file only once (one iteration of the for loop), as shown below.
with open(src_file, "r") as src_f, open(dst_file, "a") as dst_f:
    for cnt, hex_num in enumerate(hex_data):
        copy_flag = False
        for src_line in src_f:
            if r"SPI_frame_0" in src_line:
                src_line = src_line.replace('SPI_frame_0', 'SPI_frame_' + str(cnt))
                copy_flag = True
            if r"halt" in src_line:
                copy_flag = False
            if copy_flag:
                copy_mid_data += src_line
        updated_data = WriteHexData(copy_mid_data, hex_num, cnt, msb_lsb_flag)
        copy_mid_data = ""
        dst_f.write(updated_data)
Can someone please help me find my mistake?
Files are iterators. Looping over them reads the file line by line until you reach the end; they don't just go back to the start when you try to read more. A new for loop over a file object does not 'reset' the file.
Either re-open the input file each time in the loop, seek back to the start explicitly, or read the file just once. You can seek back with src_f.seek(0); reopening means you need to use two with statements (one to open the output file once, the other inside the for loop to handle the src_f source file).
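As a quick illustration of that iterator behaviour (the file name "example.txt" and its contents are made up for the demonstration):

```python
# Create a tiny two-line file to iterate over.
with open("example.txt", "w") as f:
    f.write("a\nb\n")

with open("example.txt") as f:
    first_pass = list(f)    # consumes the iterator up to EOF
    second_pass = list(f)   # already at EOF, so this comes back empty
    f.seek(0)               # rewind the file position explicitly
    third_pass = list(f)    # reads all the lines again
```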
In this case, given that you build up the data to be written out to memory in one go anyway, I'd read the input file just once, keeping only the lines you need to copy.
You can use multiple for loops over the same file object, the file position will change accordingly. That makes reading a series of lines from a match on one key string to another very simple. The itertools.takewhile() function makes it even easier:
from itertools import takewhile

# read the correct lines (from SPI_frame_0 to halt) from the source file
lines = []
with open(src_file, "r") as src_f:
    for line in src_f:
        if r"SPI_frame_0" in line:
            lines.append(line)
            # read additional lines until we find 'halt'
            lines += takewhile(lambda l: 'halt' not in l, src_f)

# transform the source lines with a new counter
with open(dst_file, "a") as dst_f:
    for cnt, hex_num in enumerate(hex_data):
        copy_mid_data = []
        for line in lines:
            if "SPI_frame_0" in line:
                line = line.replace('SPI_frame_0', 'SPI_frame_{}'.format(cnt))
            copy_mid_data.append(line)
        updated_data = WriteHexData(''.join(copy_mid_data), hex_num, cnt, msb_lsb_flag)
        dst_f.write(updated_data)
Note that I changed copy_mid_data to a list to avoid quadratic string copying; it is far more efficient to join a list of strings just once.

Python - how to get last line in a loop

I have some CSV files that I have to modify which I do through a loop. The code loops through the source file, reads each line, makes some modifications and then saves the output to another CSV file. In order to check my work, I want the first line and the last line saved in another file so I can confirm that nothing was skipped.
What I've done is put all of the lines into a list then get the last one from the index minus 1. This works but I'm wondering if there is a more elegant way to accomplish this.
Code sample:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    check_count = 0
    check_list = []
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            check_list.append(line)
            check_count += 1
            if check_count == 1:
                check.write(line)
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
    final_check = check_list[len(check_list)-1]
    check.write(final_check)
    fb.close()
If you actually need check_list for something, then, as the other answers suggest, using check_list[-1] is equivalent to but better than check_list[len(check_list)-1].
But do you really need the list? If all you want to keep track of is the first and last lines, you don't. If you keep track of the first line specially, and keep track of the current line as you go along, then at the end, the first line and the current line are the ones you want.
In fact, since you appear to be writing the first line into check as soon as you see it, you don't need to keep track of anything but the current line. And the current line, you've already got that, it's line.
So, let's strip all the other stuff out:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    first_line = True
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            if first_line:
                check.write(line)
                first_line = False
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
        check.write(line)
    fb.close()
You can enumerate the CSV rows of the input file and remember the last one seen, like this:
def CVS1():
    with open('C:\\HP\\WS\\final-cir.csv','wb') as fb, open('C:\\HP\\WS\\check-all.csv','wb') as check, open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        last_line = None
        for idx, line in enumerate(skip_first_line):
            if idx == 0:
                check.write(line)
            last_line = line
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
        if last_line is not None: # after the loop, this still holds the final line
            check.write(last_line)
I've replaced the open statements with a with block, delegating the closing of the file handles to the interpreter.
you can access the index -1 directly:
final_check = check_list[-1]
which is nicer than what you have now:
final_check = check_list[len(check_list)-1]
If it's not an empty or one-line file, you can:
my_file = open(root_to_file, 'r')
my_lines = my_file.readlines()
first_line = my_lines[0]
last_line = my_lines[-1]
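For large files, readlines() holds everything in memory just to reach the last element. A more memory-friendly sketch (not from the answers above; the function name is made up) uses collections.deque with maxlen=1, which keeps only the final line while consuming the iterator:

```python
from collections import deque

def first_and_last(path):
    with open(path) as f:
        first = next(f, None)      # None if the file is empty
        rest = deque(f, maxlen=1)  # consumes the file, keeping only the last line
        last = rest[0] if rest else first  # a one-line file: first is also last
    return first, last
```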

Python I/O Index out of range, not an off-by-one error (I think)

I have this simple code which is really just to help me understand how Python I/O works:
inFile = open("inFile.txt",'r')
outFile = open("outFile.txt",'w')
lines = inFile.readlines()
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue
    nums = line.split()
    outFile.write(nums[3] + "\n") #print the 4th column of each row
outFile.close()
My input file is something like this:
#header
34.2 3.42 64.56 54.43 3.45
4.53 65.6 5.743 34.52 56.4
4.53 90.8 53.45 134.5 4.58
5.76 53.9 89.43 54.33 3.45
The output prints out into the file just as it should but I also get the error:
outFile.write(nums[3] + "\n")
IndexError: list index out of range
I'm assuming this is because it has continued to read the next line although there is no longer any data?
Others have already answered your question. Here is a better way to "always print out the file header", avoiding testing for first at every iteration:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
    outFile.write(inFile.readline()) #always print the header
    for line in inFile:
        nums = line.split()
        if len(nums) >= 4: #Checks to make sure a fourth column exists.
            outFile.write(nums[3] + "\n") #print the 4th column of each row
A couple things are going on here:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
The with expression is a convenient way to open files because it automatically closes the files even if an exception occurs and the with block exits early.
Note: In Python 2.6, you will need to use two with statements, as support for multiple contexts was not added until 2.7. e.g:
with open(somefile, 'r') as f:
    with open(someotherfile, 'w') as g:
        # code here
outFile.write(inFile.readline()) #always print the header
The file object is an iterator that gets consumed. When readline() is called, the buffer position advances forwards and the first line is returned.
for line in inFile:
As mentioned before, the file object is an iterator, so you can use it directly in a for loop.
The error shows that in your source code you have the following line:
outFile.write(nums[6] + "\n")
Note that the 6 there is different from the 3 you show in your question. You may have two different versions of the file.
It fails because nums is the result of splitting a line and in your case it contains only 5 elements:
for line in lines:
    # ...
    # line is for example "34.2 3.42 64.56 54.43 3.45"
    nums = line.split()
    print(len(nums))
You can't index past the end of a list.
You also may have an error in your code. You write the header, then split it and write one element from it. You probably want an if/else.
for line in lines:
    if first:
        ... # do something with the header
    else:
        ... # do something with the other lines
Or you could just handle the header separately before you enter the loop.
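A sketch of that last suggestion (the function name is made up; it also guards against blank trailing lines, the second problem noted below): consume the header with next() before the loop so the loop body only ever sees data rows.

```python
def copy_fourth_column(src_path, dst_path):
    with open(src_path) as in_f, open(dst_path, "w") as out_f:
        out_f.write(next(in_f))        # header handled before the loop
        for line in in_f:
            nums = line.split()
            if nums:                   # skip blank lines at the end of the file
                out_f.write(nums[3] + "\n")
```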
The problem is that you are processing the "header line" just like the rest of the data. I.e., even though you identify the header line, you don't skip its processing. I.e., you don't avoid split()'ing it further down in the loop which causes the run-time error.
To fix your problem simply insert a continue as shown:
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue ## skip the rest of the loop and start from the top
    nums = line.split()
    ...
that will bypass the rest of the loop and all will work as it should.
The output file outFile.txt will contain:
#header
54.43
34.52
134.5
54.33
And the second problem turned out to be blank lines at the end of the input file (see the discussion in the comments below).
Notes: You could restructure your code, but if you are not interested in doing that, the simple fix above lets you keep all of your present code, and only requires the addition of the one line. As mentioned in other posts, it's worth looking into using with to manage your open files as it will also close them for you when you are done or an exception is encountered.
