How to read the last few lines within a file using Python?

I am reading files with a specific name from a folder. I can read the content of a file, but how do I read specific lines, or the last 6 lines, of a file?
************************************
Test Scenario No. 1
TestcaseID = FB_71125_1
dpSettingScript = FB_71125_1_DP.txt
************************************
Setting Pre-Conditions (DP values, Sqlite DB):
cp /fs/images/nfs/FileRecogTest/MNT/test/Databases/FB_71125_1_device.sqlite $NUANCE_DB_DIR/device.sqlite
"sync" twice.
Starting the test:
0#00041511#0000000000# FILERECOGNITIONTEST: = testScenarioNo (int)1 =
0#00041514#0000000000# FILERECOGNITIONTEST: = TestcaseID (char*)FB_71125_1 =
0#00041518#0000000000# FILERECOGNITIONTEST: = dpSettingScript (char*)FB_71125_1_DP.txt =
0#00041520#0000000000# FILERECOGNITIONTEST: = UtteranceNo (char*)1 =
0#00041524#0000000000# FILERECOGNITIONTEST: = expectedEventData (char*)0||none|0||none =
0#00041528#0000000000# FILERECOGNITIONTEST: = expectedFollowUpDialog (char*) =
0#00041536#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/main_menu.wav#MEDIA_COND:PAS_MEDIA&MEDIA_NOT_BT#>main_menu.global<#<FS0000_Pos_Rec_Tone><FS1000_MainMenu_ini1>
0#00041789#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00043768#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/Framework.wav##>{any_device_name}<#<FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>
0#00044008#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00045426#0000000000# FILERECOGNITIONTESTWARNING: expected >{any_device_name}<, got >lowconfidence1#FS1000_MainMenu<
1900#00046452#0000000000# FILERECOGNITIONTESTERROR: expected <FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>, got <FS0000_Misrec_Tone><FS1000_MainMenu_nm1_004><pause300><FS1000_MainMenu_nm_001>
0#00046480#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00047026#0000000000# FILERECOGNITIONTEST: Stopping dialog immediately
[VCALogParser] Scenario 1 FAILED.
Can someone suggest how to read specific lines, or the last 6 lines, of a file?

I can think of two methods. If your files are not too big, you can just read all lines and keep only the last six:
f = open(some_path)
last_lines = f.readlines()[-6:]
But that's really brute force. Something cleverer is to make a guess, using the seek() method of your file object:
import os

file_size = os.stat(some_path).st_size # in _bytes_, so take care depending on encoding
f = open(some_path)
f.seek(max(file_size - 1000, 0)) # here's the guess; adjust for the expected line length
last_lines = f.readlines()[-6:]
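If the fixed guess can undershoot (very long lines), one way is to grow the backtrack window until it covers enough newlines. A sketch of that idea; the function name tail and its parameters are my own, not from the answer above:

```python
import os

def tail(path, n=6, guess=1024):
    # Grow the backtrack window until it spans at least n newlines,
    # or until the whole file has been read.
    size = os.stat(path).st_size
    with open(path, "rb") as f:  # binary mode allows end-relative reads
        back = guess
        while True:
            back = min(back, size)
            f.seek(size - back)
            data = f.read()
            if data.count(b"\n") > n or back == size:
                break
            back *= 2  # guess was too small; double it and retry
    return [line.decode() for line in data.splitlines(True)[-n:]]
```

The doubling keeps the number of re-reads logarithmic in the distance from the end, so even pathological line lengths cost only a few extra seeks.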

To read the last 6 lines of a single file, you could use Python's file.seek to move near the end of the file and then read the remaining lines. You need to decide what the maximum line length could reasonably be, e.g. 1024 characters.
seek is first used to move to the end of the file (without reading it in), then tell gives the current position in the file (as we are at the end, this is the length). The code then moves backwards in the file and reads the lines in. If the file is very short, the whole file is read in.
import os

filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language\test.txt"
back_up = 6 * 1024 # Go back from the end more than 6 lines' worth.

with open(filename, "rb") as f_input: # binary mode: text mode can't do end-relative seeks
    f_input.seek(0, os.SEEK_END)
    backup = min(back_up, f_input.tell())
    f_input.seek(-backup, os.SEEK_END)
    print([line.decode() for line in f_input.readlines()[-6:]])
Using with ensures your file is automatically closed afterwards. Prefixing the file path with r avoids having to double the backslashes.
So to then apply this to your directory walk and write your results to a separate output file, you could do the following:
import os
import re

back_up = 6 * 256 # Go back from the end more than 6 lines' worth
directory = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language"
output_filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\output.txt"

with open(output_filename, 'w') as f_output:
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            if filename.startswith('VCALogParser_output'):
                cur_file = os.path.join(dirpath, filename)
                with open(cur_file, "rb") as f_input: # binary mode for the end-relative seek
                    f_input.seek(0, os.SEEK_END)
                    backup = min(back_up, f_input.tell())
                    f_input.seek(-backup, os.SEEK_END)
                    last_lines = b''.join(f_input.readlines()[-6:]).decode(errors='replace')
                try:
                    summary = ', '.join(re.search(r'(\d+ warning\(s\)).*?(\d+ error\(s\)).*?(\d+ scenarios\(s\))', last_lines, re.S).groups())
                except AttributeError:
                    summary = "No summary"
                f_output.write('{}: {}\n'.format(filename, summary))

Or, essentially, use a for loop to append lines to a list, then delete items from the front until only the last six remain:
array = []
with open("file.txt", "r") as f:
    for line in f:
        array.append(line)
while len(array) > 6:
    del array[0]
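The append-then-trim loop above can be expressed more compactly with collections.deque, whose maxlen discards the oldest item as each new one arrives, so memory stays bounded at six lines. A sketch; last_six is my own name for it:

```python
from collections import deque

def last_six(path):
    # deque(f, maxlen=6) consumes the file line by line, keeping only
    # the six most recent lines at any moment.
    with open(path) as f:
        return list(deque(f, maxlen=6))
```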

Related

Pull specific line from every .txt file in folder and output lines to another .txt

I have a folder of 400 .txt files and am attempting to take the sixth line from every file in the directory, and output each line all into a new singular .txt file with the sixth line from each file listed one after the other in the new file. For example, the output I am attempting to create should look like:
**output.txt**
This is the sixth line from 1.txt
This is the sixth line from 2.txt
This is the sixth line from 3.txt
So far I'm able to print off all the files in the directory in a list to be acted upon with:
import os
entries = os.listdir(r'C:/Users/defaultuser/Desktop/UprocScripts')
for entry in entries:
print(entry)
I have researched and tried various combinations of the readlines() method, but I'm not sure exactly how to combine them over an entire directory of 400 files. I'm still trying to learn; any ideas on whether I'm on the right path, and how to combine them, are appreciated.
Here is another way, if you want to use a for loop to iterate over your text files and pick a specific line. In this code all the .txt files are fetched at the beginning.
import glob

list_of_txt = glob.glob(r"C:\Users\defaultuser\Desktop\UprocScripts\*.txt")
for textfiles in list_of_txt:
    with open(r"C:\Users\defaultuser\Desktop\UprocScripts\final.txt", 'a+') as final_text_file:
        with open(textfiles, 'r') as textFile:
            for n, line in enumerate(textFile):
                if n + 1 == 6: # if it's line no. 6, write it to the final txt file
                    final_text_file.writelines(line)
Also note that I am using the glob module here. In addition, if you want to add "from some.txt" after each line, just replace the last line with this:
final_text_file.write(line.strip() + " from " + textfiles.split('\\')[-1] + "\r\n")
You need to read each file, get the sixth line from each of them, then write that line to the output file.
Like so:
import os

directory = r'C:/Users/defaultuser/Desktop/UprocScripts'
entries = os.listdir(directory)
with open('output.txt', 'w') as out_file: # open once; opening with 'w' inside the loop would truncate it each pass
    for entry in entries:
        with open(os.path.join(directory, entry)) as text_file: # join the directory path to the filename
            lines = text_file.readlines()
            target_line = lines[5] # sixth line
            out_file.write(target_line)
Note this reads the complete file for each of the input files, which might be inefficient. You can try to get around that by using the hint parameter of readlines, which accepts an approximate number of bytes to read until. If you know the approximate size of each line (in bytes), you can pass 6 * line_size as hint to try to optimize the read part.
You don't need to read the whole file; you can read only the first 6 lines, like this:
import os

directory = r'C:/Users/defaultuser/Desktop/UprocScripts'
entries = os.listdir(directory)
final = []
for entry in entries:
    # Read the first 6 lines and keep the last one (you don't need to read everything):
    with open(os.path.join(directory, entry)) as f:
        lines = []
        for _ in range(6):
            lines.append(f.readline())
        final.append(lines[-1])
# And write
with open("final.txt", "w") as f:
    f.writelines(final)
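The explicit six-readline loop above can be written more compactly with itertools.islice, which stops consuming the file after the sixth line. A sketch; sixth_line is an illustrative name of my own:

```python
from itertools import islice

def sixth_line(path):
    # islice(f, 5, 6) yields only the sixth line; next(..., None) returns
    # None for files with fewer than six lines instead of raising StopIteration.
    with open(path) as f:
        return next(islice(f, 5, 6), None)
```

Because the file object is a lazy iterator, islice reads no further than it has to, so this matches the "don't read everything" goal of the answer above.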
import os

files_list = []
sixth_line_list = []
output_list = []
directory = 'C:\\Users\\defaultuser\\Desktop\\UprocScripts'
for file in os.listdir(directory):
    if file.endswith('.txt'):
        files_list.append(os.path.join(directory, file))
for file in files_list:
    with open(file, 'r') as file_:
        sixth_line_list.append({file: file_.readlines()[5]})
for entry in sixth_line_list:
    # dict views aren't indexable in Python 3, so convert to lists first
    line = list(entry.values())[0]
    name = list(entry.keys())[0]
    output_list.append(''.join([line, ' from ', name]))
with open(os.path.join(directory, 'output.txt'), 'w') as output:
    output.writelines(output_list)

How can I automate reading and writing files? [duplicate]

This question already has answers here:
How to delete a specific line in a file?
(17 answers)
Closed 3 years ago.
I would like to read a lot of data files in a folder, delete the lines that contain "DT=(SINGLE SINGLE SINGLE)", and then write the result out as new data.
In that Data folder, there are 300 data files!
My code is
import os, sys

path = "/Users/xxx/Data/"
allFiles = os.listdir(path)
for fname in allFiles:
    print(fname)
    with open(fname, "r") as f:
        with open(fname, "w") as w:
            for line in f:
                if "DT=(SINGLE SINGLE SINGLE)" not in line:
                    w.write(line)
FileNotFoundError: [Errno 2] No such file or directory: '1147.dat'
I want to do this for the whole bunch of data files.
How can I automatically read and write each file to delete those lines?
And is there a way to write the new dataset under a different name? e.g. 1147.dat -> 1147_new.dat
The below should do it; demos of what each annotated line does follow afterwards:
import os

path = "/Users/xxx/Data/"
allFiles = [os.path.join(path, filename) for filename in os.listdir(path)] # [1]
del_keystring = "DT=(SINGLE SINGLE SINGLE)" # general case

for filepath in allFiles: # longer variable names for clarity
    print(filepath)
    with open(filepath, 'r') as f_read: # [2]
        loaded_txt = f_read.readlines()
    new_txt = []
    for line in loaded_txt:
        if del_keystring not in line:
            new_txt.append(line)
    with open(filepath, 'w') as f_write: # [2]
        f_write.write(''.join(new_txt)) # [4]
    with open(filepath, 'r') as f_read: # [5]
        assert len(f_read.readlines()) <= len(loaded_txt)
[1] os.listdir returns only the filenames, not the filepaths; os.path.join joins its inputs into a full path, with separators (e.g. \\): folderpath + '\\' + filename
[2] NOT the same as doing with open(X,'r') as .., with open(X,'w') as ..: together; the 'w' empties the file, leaving nothing for the 'r' to read
[3] If f_read.read() == "Abc\nDe\n12", then f_read.read().split('\n') == ["Abc", "De", "12"]
[4] Undoes [3]: if _ls == ["a", "bc", "12"], then "\n".join(_ls) == "a\nbc\n12"
[5] Optional code to verify that the saved file's number of lines is <= the original file's
NOTE: you may see the saved filesize slightly bigger than the original's, which may be due to the original's better packing, compression, etc - which you can figure out from its docs; [5] ensures it isn't due to extra lines
# bonus code to explicitly verify intended lines were deleted
with open(original_file_path, 'r') as txt:
    print(''.join(txt.readlines()[:80])) # select a small excerpt
with open(processed_file_path, 'r') as txt:
    print(''.join(txt.readlines()[:80])) # select a small excerpt
# ''.join() since .readlines() returns a list of lines
NOTE: for more advanced caveats, see comments below answer; for a more compact alternative, see Torxed's version
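The answer above rewrites each file in place; to instead produce 1147.dat -> 1147_new.dat, as the question's last part asks, one option is to write the filtered lines to a sibling file. A minimal sketch under that assumption; filter_to_new is my own name, and the key string is the one from the question:

```python
import os

def filter_to_new(path, del_keystring="DT=(SINGLE SINGLE SINGLE)"):
    # For each file in path, write a *_new copy with the matching lines dropped.
    for filename in os.listdir(path):
        root, ext = os.path.splitext(filename)  # '1147', '.dat'
        if root.endswith("_new"):
            continue  # skip output files from a previous run
        src = os.path.join(path, filename)
        dst = os.path.join(path, root + "_new" + ext)  # '1147_new.dat'
        with open(src) as f_read, open(dst, "w") as f_write:
            for line in f_read:
                if del_keystring not in line:
                    f_write.write(line)
```

Streaming line by line like this also avoids the readlines-everything step, which matters if the data files are large.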

Split file when the string matches exactly

I have a huge text file that I need to split wherever the value 'EKYC' appears on its own. However, my script fails when other values with a similar pattern show up.
I am new to Python and it is wearing me out.
import sys;
import os;
MASTER_TEXT_FILE=sys.argv[1];
OUTPUT_FILE=sys.argv[2];
L = file(MASTER_TEXT_FILE, "r").read().strip().split("EKYC")
i = 0
for l in L:
    i = i + 1
    f = file(OUTPUT_FILE+"-%d.ekyc" % i, "w")
    print >>f, "EKYC" + l
The script breaks when there is EKYCSMRT or EKYCVDA or EKYCTIGO, so how can I guard against splitting at those points?
This is the content of all of the messages
EKYC
WIK 12
EKYC
WIK 12
EKYCTIGO
EKYC
WIK 13
TTL
EKYCVD
EKYC
WIK 14
TTL D
Thanks for the assistance.
If possible, you should avoid reading large files into memory all at once; instead, stream them a chunk at a time.
The sensible chunks of text files are usually lines. This can be done with .readline(), but simply iterating over the file yields its lines too.
After reading a line (which includes the newline), you can .write() it directly to the current output file.
import sys

master_filename = sys.argv[1]
output_filebase = sys.argv[2]

output = None
output_number = 0
for line in open(master_filename):
    if line.strip() == 'EKYC':
        if output is not None:
            output.close()
            output = None
    else:
        if output is None:
            output_number += 1
            output_filename = '%s-%d.ekyc' % (output_filebase, output_number)
            output = open(output_filename, 'w')
        output.write(line)
if output is not None:
    output.close()
The output file is closed and reset upon encountering 'EKYC' on its own line.
Here, you'll notice that the output file isn't (re)opened until right before there is a line to write to it: this avoids creating an empty output file in case there are no further lines to write to it. You'll have to re-order this slightly if you want the 'EKYC' line to appear in the output file also.
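The answer above notes you'd have to re-order things slightly to keep the 'EKYC' marker line in the output. One way to sketch that re-ordering, as a list-in/chunks-out function for clarity (split_on_marker is my own name, not from the answer):

```python
def split_on_marker(lines, marker="EKYC"):
    # Start a new chunk at each exact marker line; the marker itself becomes
    # the first line of its chunk. Similar-prefix values (EKYCTIGO, EKYCVDA,
    # ...) fail the exact comparison, so they stay inside the current chunk.
    chunks, current = [], None
    for line in lines:
        if line.strip() == marker:
            current = [line]
            chunks.append(current)
        elif current is not None:
            current.append(line)  # lines before the first marker are dropped
    return chunks
```

Each returned chunk can then be written to its own numbered output file, mirroring the streaming answer.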
Based on your sample input file, you need to split on '\nEKYC\n':
#!/usr/bin/env python
import sys

MASTER_TEXT_FILE = sys.argv[1]
OUTPUT_FILE = sys.argv[2]

with open(MASTER_TEXT_FILE) as f:
    fdata = f.read()

i = 0
for subset in fdata.split('\nEKYC\n'):
    i += 1
    with open(OUTPUT_FILE+"-%d.ekyc" % i, 'w') as output:
        output.write(subset)
Other comments:
Python doesn't use ;.
Your original code wasn't using os.
It's recommended to use with open(<filename>, <mode>) as f: ... since it handles possible errors and closes the file afterward.

Insert selected text from one file into multiple files -python

I have a file (let's call it file_A) from which I extract 5 lines (like so):
pppppppp
qqqqqqqq
rrrrrrrr
ssssssss
tttttttt
Now I want to insert these five lines at the beginning of 10 other files (let's call them 1..10).
I want to open file_A (one with the 5 lines) then open one by one the 10 files and insert these lines at the beginning of these files.
Right now I do the following:
1) open file_A and extract the first of the five lines (pppppppp), writing it to a new file
2) then the second line... and so on
3) open file 1, copy all its lines, and write them to the new file opened in step 1
4) close all files... repeat
So I end up with 10 new files, while I would rather insert the 5 extracted lines into the existing files, and also avoid opening and closing file_A 10 times!
What can I use to hold the 5 lines in memory while doing this, without distorting them?
Thank you
Put the files in a list, and iterate over the list:
file_list = ["foo.txt", "bar.txt"] # ...
with open("file_A.txt", "r") as f: # open once
    # extract the lines, e.g. the first five:
    extracted = f.readlines()[:5]
for name in file_list: # note: don't reuse f as the loop variable
    with open(name, "r+") as out:
        lines = out.readlines()
        lines[0:0] = extracted # add data at start
        out.seek(0)
        out.write("".join(lines)) # write new data
I'm not sure if you need the data rewritten in place, but if not, you can use (untested):
from itertools import islice, chain

with open('filea.txt') as fin:
    first5 = list(islice(fin, 5)) # or whatever criterion selects the 5
for filename in ['file1.txt', 'file2.txt', 'file3.txt']: # etc.
    with open(filename) as fin, open(filename + '.out', 'w') as fout:
        fout.writelines(chain(first5, fin))

Strip file names from files and open recursively? Saving previous strings? - PYTHON

I have a question about reading in a .txt file and using the string inside it later in the code.
If I have a file called 'file0.txt' and it contains:
file1.txt
file2.txt
The rest of the files either contain more string file names or are empty.
How can I save both of these strings for later use? What I attempted was:
infile = open(file, 'r')
line = infile.readline()
line.split('\n')
But that returned the following:
['file1.txt', '']
I understand that readline only reads one line, but I thought that by splitting on the newline it would also grab the next file string.
I am attempting to simulate a file tree or to show which files are connected together, but as it stands now it is only going through the first file string in each .txt file.
Currently my output is:
File 1 crawled.
File 3 crawled.
Dead end reached.
My hope was that instead of just recursively crawling the first file it would go through the entire web of files, but that goes back to my issue of not giving the program the second file name in the first place.
I'm not asking for a specific answer, just a push in the right direction on how to better handle the strings from the files and be able to store both of them instead of 1.
My current code is pretty ugly, but hopefully it gets the idea across, I will just post it for reference to what I'm trying to accomplish.
def crawl(file):
    infile = open(file, 'r')
    line = infile.readline()
    print(line.split('\n'))
    if 'file1.txt' in line:
        print('File 1 crawled.')
        return crawl('file1.txt')
    if 'file2.txt' in line:
        print('File 2 crawled.')
        return crawl('file2.txt')
    if 'file3.txt' in line:
        print('File 3 crawled.')
        return crawl('file3.txt')
    if 'file4.txt' in line:
        print('File 4 crawled.')
        return crawl('file4.txt')
    if 'file5.txt' in line:
        print('File 5 crawled.')
        return crawl('file5.txt')
    #etc...etc...
    else:
        print('Dead end reached.')
Outside the function:
file = 'file0.txt'
crawl(file)
Using read() or readlines() will help, e.g.
infile = open(file, 'r')
lines = infile.readlines()
print(lines)
gives
['file1.txt\n', 'file2.txt\n']
or
infile = open(file, 'r')
lines = infile.read()
print(lines.split('\n'))
gives
['file1.txt', 'file2.txt', ''] (note the trailing empty string when the file ends with a newline)
readline only gets one line from the file, so it has a newline at the end. What you want is file.read(), which will give you the whole file as a single string. Split that on the newline and you should have what you need. Also remember that you need to save the result as a new variable, i.e. assign the result of your line.split('\n') call. You could also just use readlines, which will get a list of lines from the file.
Change readline to readlines, and there's no need to split('\n'); the result is already a list.
I prepared file0.txt with two files in it and file1.txt with one file in it, plus file2.txt and file3.txt, which contained no data. Note this won't re-add values already in the list.
def get_files(current_file, files=[]):
    # Initialize file list with previous values, or the initial value
    new_files = []
    if not files:
        new_files = [current_file]
    else:
        new_files = files
    # Read files not already in the list, into the list
    with open(current_file, "r") as f_in:
        for new_file in f_in.read().splitlines():
            if new_file not in new_files:
                new_files.append(new_file.strip())
    # Do we need to recurse?
    cur_file_index = new_files.index(current_file)
    if cur_file_index < len(new_files) - 1:
        next_file = new_files[cur_file_index + 1]
        # Recurse
        get_files(next_file, new_files)
    # We're done
    return new_files

initial_file = "file0.txt"
files = get_files(initial_file)
print(files)
Returns: ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt']
file0.txt contained:
file1.txt
file2.txt
file1.txt contained:
file3.txt
file2.txt and file3.txt were blank.
Edits: Added .strip() for safety, and added the contents of the data files so this can be replicated.
