I have a file (let's call it file_A) from which I extract 5 lines, like so:
pppppppp
qqqqqqqq
rrrrrrrr
ssssssss
tttttttt
Now I want to insert these five lines at the beginning of 10 other files (let's call them 1..10).
I want to open file_A (the one with the 5 lines), then open the 10 files one by one and insert these lines at the beginning of each.
Right now I do the following:
1) open file_A, extract the first of the five lines (pppppppp), and write it to a new file
2) then the second line… and so on
3) open file 1, copy all its lines, and write them out to the new file opened in step 1
4) close all files… repeat
So I end up with 10 new files, while I would rather store the 5 extracted lines into the existing files and also avoid opening and closing file_A 10 times!
What tool can I use to store the 5 lines in memory while doing this, without distorting them?
Thank you
Put the files in a list, and iterate over the list:

file_list = ["foo.txt", "bar.txt"]  # ...and so on, the 10 target files

with open("file_A.txt") as f:  # open file_A once
    extracted = [next(f) for _ in range(5)]  # the 5 lines, kept in memory

for name in file_list:
    with open(name, "r+") as out:
        lines = out.readlines()
        lines[0:0] = extracted  # add data at start
        out.seek(0)
        out.write("".join(lines))  # write new data
I'm not sure if you need the data re-written in place, but if not, you can use (untested):
from itertools import islice, chain

with open('filea.txt') as fin:
    first5 = list(islice(fin, 5))  # or whatever criteria to get the 5

for filename in ['file1.txt', 'file2.txt', 'file3.txt']:  # etc.
    with open(filename) as fin, open(filename + '.out', 'w') as fout:
        fout.writelines(chain(first5, fin))
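If you do need the result under the original file names, you could rename each new file over the old one after writing it (a small sketch; os.replace overwrites the destination atomically on most platforms):

import os
os.replace(filename + '.out', filename)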
I have 5 files from which I want to take each line (24 lines in total) and save it to a new file. I managed to find code which will do that, but the way it is, every time I have to manually change the number of the appropriate original file and of the file I want to save it to, and also the number of each line.
The code:
import numpy as np

x1 = np.loadtxt("x_p2_40.txt")
x2 = np.loadtxt("x_p4_40.txt")
x3 = np.loadtxt("x_p6_40.txt")
x4 = np.loadtxt("x_p8_40.txt")
x5 = np.loadtxt("x_p1_40.txt")

with open("x_p1_40.txt", "r") as file:
    content = file.read()
first_line = content.split('\n', 1)[0]

with open("1_p_40_x.txt", "a") as f:
    f.write("\n")
with open("1_p_40_x.txt", "a") as fa:
    fa.write(first_line)
print(first_line)
I am a beginner at Python, and I'm not sure how to make a loop for this, because I assume I need a loop?
Thank you!
Since you have multiple files here, you could define their names in a list, and use a list comprehension to open file handles to them all:
input_files = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
file_handles = [open(f, "r") for f in input_files]
Since each of these file handles is an iterator that yields a single line every time you iterate over it, you could simply zip() all these file handles to iterate over them simultaneously. Also throw in an enumerate() to get the line numbers:
for line_num, files_lines in enumerate(zip(*file_handles), 1):
    out_file = f"{line_num}_p_40.txt"
    # Remove trailing whitespace on all lines, then add a newline
    files_lines = [line.rstrip() + "\n" for line in files_lines]
    with open(out_file, "w") as of:
        of.writelines(files_lines)
With three files:
x_p2_40.txt:
2_1
2_2
2_3
2_4
x_p4_40.txt:
4_1
4_2
4_3
4_4
x_p6_40.txt:
6_1
6_2
6_3
6_4
I get the following output:
1_p_40.txt:
2_1
4_1
6_1
2_p_40.txt:
2_2
4_2
6_2
3_p_40.txt:
2_3
4_3
6_3
4_p_40.txt:
2_4
4_4
6_4
Finally, since we didn't use a context manager to open the original file handles, remember to close them after we're done:
for fh in file_handles:
    fh.close()
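Alternatively, contextlib.ExitStack gives with-statement semantics to a variable number of files, so they are all closed automatically (a sketch reusing input_files from above):

from contextlib import ExitStack

with ExitStack() as stack:
    file_handles = [stack.enter_context(open(f)) for f in input_files]
    # ... the enumerate(zip(*file_handles), 1) loop from above goes here, indented ...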
If you have files with an unequal number of lines and you want to create output files for all lines, consider using itertools.zip_longest() instead of zip().
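A minimal sketch of that variant, run before closing the handles (fillvalue="" stands in for the missing lines of the shorter files, so their output files will contain blank lines):

from itertools import zip_longest

for line_num, files_lines in enumerate(zip_longest(*file_handles, fillvalue=""), 1):
    files_lines = [line.rstrip() + "\n" for line in files_lines]
    with open(f"{line_num}_p_40.txt", "w") as of:
        of.writelines(files_lines)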
In order to read each of your input files, you can store their names in a list and iterate over it with a for loop. Then we add every line to a single list with extend():
inputFiles = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
outputFile = "outputfile.txt"

lines = []
for filename in inputFiles:
    with open(filename, 'r') as f:
        lines.extend(f.readlines())
        if not lines[-1].endswith('\n'):
            lines[-1] += '\n'  # make sure each file's last line ends with a newline
Finally, you can write all the lines to your output file:

with open(outputFile, 'w') as f:
    f.write(''.join(lines))
I have a folder of 400 .txt files, and I am attempting to take the sixth line from every file in the directory and output them all into a single new .txt file, with the sixth line from each file listed one after the other. For example, the output I am attempting to create should look like:
**output.txt**
This is the sixth line from 1.txt
This is the sixth line from 2.txt
This is the sixth line from 3.txt
So far I'm able to print off all the files in the directory in a list to be acted upon with:
import os

entries = os.listdir(r'C:/Users/defaultuser/Desktop/UprocScripts')
for entry in entries:
    print(entry)
I have researched and tried various combinations of the readlines() method, but I'm not sure exactly how to combine them over an entire directory of 400 files. I'm still trying to learn; any ideas on whether I'm on the right path, and how to combine them, are appreciated.
Here is another way, if you want to use a for loop to iterate over your text file and pick a specific line. In this code all the .txt files are fetched at the beginning.
import glob

list_of_txt = glob.glob(r"C:\Users\defaultuser\Desktop\UprocScripts\*.txt")
for textfiles in list_of_txt:
    with open(r"C:\Users\defaultuser\Desktop\UprocScripts\final.txt", 'a+') as final_text_file:
        with open(textfiles, 'r') as textFile:
            for n, line in enumerate(textFile):
                if n + 1 == 6:  # if it's line no. 6, write it to your final txt file
                    final_text_file.write(line)
                    break  # the remaining lines aren't needed
Also note that I am using the glob module here. In addition, if you want to add "from some.txt" after each line, just replace the write line with this:

final_text_file.write(line.strip() + " from " + textfiles.split('\\')[-1] + "\n")
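(os.path.basename(textfiles) is a more portable way to get just the file name than splitting on backslashes.)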
You need to read each file, get the sixth line from each of them, then write that line to the output file.
Like so:
import os

directory = r'C:/Users/defaultuser/Desktop/UprocScripts'
with open('output.txt', 'w') as out_file:  # open the output once; opening it with 'w' inside the loop would truncate it on every pass
    for entry in os.listdir(directory):
        with open(os.path.join(directory, entry)) as text_file:
            lines = text_file.readlines()
            target_line = lines[5]  # sixth line
            out_file.write(target_line)
Note this does read the complete contents of each input file, which might be inefficient. You can get around that by using the hint parameter of readlines(), which accepts an approximate number of bytes to read up to. If you know the approximate size of each line (in bytes), you can pass 6 * line_size as the hint to optimize the read, as sketched below.
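A rough sketch of that idea (line_size is an assumed per-line estimate; readlines(hint) keeps reading whole lines until their total size exceeds the hint, so the estimate should be generous, and short files still need a guard):

line_size = 200  # assumed approximate bytes per line
with open(os.path.join(directory, entry)) as text_file:
    lines = text_file.readlines(6 * line_size)
    if len(lines) >= 6:
        out_file.write(lines[5])  # sixth line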
You don't need to read the whole file; you can read only the first 6 lines, like this:
import os

directory = r'C:/Users/defaultuser/Desktop/UprocScripts'
final = []
for entry in os.listdir(directory):
    # Read the first 6 lines and keep the last of them (you don't need to read everything):
    with open(os.path.join(directory, entry)) as f:
        lines = []
        for _ in range(6):
            lines.append(f.readline())
        final.append(lines[-1])

# And write
with open("final.txt", "w") as f:
    f.writelines(final)
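itertools.islice can express the same "just the sixth line" idea without the manual loop (a sketch; the None default covers files with fewer than 6 lines):

from itertools import islice

with open(os.path.join(directory, entry)) as f:
    sixth = next(islice(f, 5, 6), None)
    if sixth is not None:
        final.append(sixth)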
import os

files_list = []
sixth_line_list = []
output_list = []
directory = 'C:\\Users\\defaultuser\\Desktop\\UprocScripts'

# Collect the full paths of all .txt files in the directory
for file in os.listdir(directory):
    if file.endswith('.txt'):
        files_list.append(os.path.join(directory, file))

# Map each file name to its sixth line
for file in files_list:
    with open(file, 'r') as file_:
        sixth_line_list.append({file: file_.readlines()[5]})

# Build the output lines; strip the trailing newline before appending " from <file>"
for entry in sixth_line_list:
    for name, line in entry.items():
        output_list.append(''.join([line.rstrip(), ' from ', name, '\n']))

with open(os.path.join(directory, 'output.txt'), 'w') as output:
    output.writelines(output_list)
Let's say I have a file with 48,222 lines, and I give an index value, say 21,000.
Is there any way in Python to "move" the contents of the file starting from index 21,000, such that I end up with two files: the original one and a new one, where the original now has 21,000 lines and the new one 27,222 lines?
I read this post, which uses partition() and comes close to describing what I want:
with open("inputfile") as f:
contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
f.write(contents1)
with open("outputfile2", "w") as f:
f.write(contents2)
Except that (1) it uses "Sentinel text" as the separator, and (2) it creates two new files and requires me to delete the old file. As of now, the way I do it is like this:
for r in result.keys():  # the filenames are in my dictionary, don't bother with that
    f = open(r)
    lines = f.readlines()
    f.close()
    with open("outputfile1.txt", "w") as fn:
        for line in lines[0:21000]:
            fn.write(line)  # write each line
    with open("outputfile2.txt", "w") as fn:
        for line in lines[21000:]:
            fn.write(line)  # write each line

Which is quite a lot of manual work. Is there a built-in or more efficient way?
You can also use writelines() and dump the sliced list of lines from 0 to 20999 into one file and another sliced list from 21000 to the end into another file.
with open("inputfile") as f:
content = f.readlines()
content1 = content[:21000]
content2 = content[21000:]
with open("outputfile1.txt", "w") as fn1:
fn1.writelines(content1)
with open('outputfile2.txt','w') as fn2:
fn2.writelines(content2)
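If you would rather keep the first 21,000 lines in the original file instead of creating two new files, file.truncate() can cut the file at the current position (a sketch, assuming the whole file fits in memory):

with open("inputfile", "r+") as f:
    content = f.readlines()
    with open("outputfile2.txt", "w") as fn2:
        fn2.writelines(content[21000:])  # move the tail into the new file
    f.seek(0)
    f.writelines(content[:21000])  # rewrite the head...
    f.truncate()                   # ...and drop everything after it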
I am reading files with a specific name from a folder. I can read the content of a file, but how do I read specific lines, or the last 6 lines, within it?
************************************
Test Scenario No. 1
TestcaseID = FB_71125_1
dpSettingScript = FB_71125_1_DP.txt
************************************
Setting Pre-Conditions (DP values, Sqlite DB):
cp /fs/images/nfs/FileRecogTest/MNT/test/Databases/FB_71125_1_device.sqlite $NUANCE_DB_DIR/device.sqlite
"sync" twice.
Starting the test:
0#00041511#0000000000# FILERECOGNITIONTEST: = testScenarioNo (int)1 =
0#00041514#0000000000# FILERECOGNITIONTEST: = TestcaseID (char*)FB_71125_1 =
0#00041518#0000000000# FILERECOGNITIONTEST: = dpSettingScript (char*)FB_71125_1_DP.txt =
0#00041520#0000000000# FILERECOGNITIONTEST: = UtteranceNo (char*)1 =
0#00041524#0000000000# FILERECOGNITIONTEST: = expectedEventData (char*)0||none|0||none =
0#00041528#0000000000# FILERECOGNITIONTEST: = expectedFollowUpDialog (char*) =
0#00041536#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/main_menu.wav#MEDIA_COND:PAS_MEDIA&MEDIA_NOT_BT#>main_menu.global<#<FS0000_Pos_Rec_Tone><FS1000_MainMenu_ini1>
0#00041789#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00043768#0000000000# FILERECOGNITIONTEST: /fs/images/nfs/FileRecogTest/MNT/test/Framework.wav##>{any_device_name}<#<FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>
0#00044008#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00045426#0000000000# FILERECOGNITIONTESTWARNING: expected >{any_device_name}<, got >lowconfidence1#FS1000_MainMenu<
1900#00046452#0000000000# FILERECOGNITIONTESTERROR: expected <FS0000_Pos_Rec_Tone><FS1400_DeviceDisambig_<slot>_ini1>, got <FS0000_Misrec_Tone><FS1000_MainMenu_nm1_004><pause300><FS1000_MainMenu_nm_001>
0#00046480#0000000000# FILERECOGNITIONTEST: Preparing test data done
0#00047026#0000000000# FILERECOGNITIONTEST: Stopping dialog immediately
[VCALogParser] Scenario 1 FAILED.
Can someone suggest how to read specific lines, or the last 6 lines, within a file?
I can think of two methods. If your files are not too big, you can just read all the lines and keep only the last six:
f = open(some_path)
last_lines = f.readlines()[-6:]
But that's really brute-force. Something cleverer is to make a guess, using the seek() method of your file object:
import os

file_size = os.stat(some_path).st_size  # in _bytes_, so take care depending on the encoding
f = open(some_path)
f.seek(max(0, file_size - 1000))  # here's the guess; adjust for the expected line length
last_lines = f.readlines()[-6:]
To read the last 6 lines of a single file, you could use Python's file.seek() to move near the end of the file and then read the remaining lines. You need to decide what the maximum line length could possibly be, e.g. 1024 characters.
seek() is first used to move to the end of the file (without reading it in), and tell() is used to determine the position in the file (as we are at the end, this is the length). The code then goes backwards in the file and reads the lines in. If the file is very short, the whole file is read in.
import os

filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language\test.txt"
back_up = 6 * 1024  # Go back from the end more than 6 lines' worth.

with open(filename, "rb") as f_input:  # binary mode, since text-mode files can't seek relative to the end
    f_input.seek(0, os.SEEK_END)
    backup = min(back_up, f_input.tell())
    f_input.seek(-backup, os.SEEK_END)
    print(f_input.read().decode(errors="replace").splitlines(True)[-6:])
Using with ensures your file is automatically closed afterwards. Prefixing your file path with r avoids needing to double every backslash in the path.
So to then apply this to your directory walk and write your results to a separate output file, you could do the following:
import os
import re

back_up = 6 * 256  # Go back from the end more than 6 lines' worth
directory = r"C:\Users\hemanth_venkatappa\Desktop\TEST\Language"
output_filename = r"C:\Users\hemanth_venkatappa\Desktop\TEST\output.txt"

with open(output_filename, 'w') as f_output:
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            if filename.startswith('VCALogParser_output'):
                cur_file = os.path.join(dirpath, filename)
                with open(cur_file, "rb") as f_input:  # binary mode again, to allow seeking from the end
                    f_input.seek(0, os.SEEK_END)
                    backup = min(back_up, f_input.tell())
                    f_input.seek(-backup, os.SEEK_END)
                    last_lines = ''.join(f_input.read().decode(errors="replace").splitlines(True)[-6:])
                try:
                    summary = ', '.join(re.search(r'(\d+ warning\(s\)).*?(\d+ error\(s\)).*?(\d+ scenarios\(s\))', last_lines, re.S).groups())
                except AttributeError:
                    summary = "No summary"
                f_output.write('{}: {}\n'.format(filename, summary))
Or, essentially, use a for loop to append lines to a list, then delete items from the front until only the last few remain:

array = []
with open("file.txt", "r") as f:
    for line in f:
        array.append(line)

while len(array) > 6:  # keep only the last 6 lines
    del array[0]
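collections.deque will do the same "keep only the last N" bookkeeping for you (a sketch):

from collections import deque

with open("file.txt") as f:
    last_six = list(deque(f, maxlen=6))  # only the most recent 6 lines are kept while iterating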
I have a question about reading in a .txt file and keeping the strings inside it to be used later in the code.
If I have a file called 'file0.txt' and it contains:
file1.txt
file2.txt
The rest of the files either contain more file names or are empty.
How can I save both of these strings for later use? What I attempted to do was:
infile = open(file, 'r')
line = infile.readline()
line.split('\n')
But that returned the following:
['file1.txt', '']
I understand that readline() only reads one line, but I thought that by splitting on the newline it would also grab the next file name.
I am attempting to simulate a file tree or to show which files are connected together, but as it stands now it is only going through the first file string in each .txt file.
Currently my output is:
File 1 crawled.
File 3 crawled.
Dead end reached.
My hope was that instead of just recursively crawling the first file, it would go through the entire web, but that goes back to my issue of not giving the program the second file name in the first place.
I'm not asking for a specific answer, just a push in the right direction on how to better handle the strings from the files and store both of them instead of just one.
My current code is pretty ugly, but hopefully it gets the idea across; I will post it for reference to show what I'm trying to accomplish.
def crawl(file):
    infile = open(file, 'r')
    line = infile.readline()
    print(line.split('\n'))
    if 'file1.txt' in line:
        print('File 1 crawled.')
        return crawl('file1.txt')
    if 'file2.txt' in line:
        print('File 2 crawled.')
        return crawl('file2.txt')
    if 'file3.txt' in line:
        print('File 3 crawled.')
        return crawl('file3.txt')
    if 'file4.txt' in line:
        print('File 4 crawled.')
        return crawl('file4.txt')
    if 'file5.txt' in line:
        print('File 5 crawled.')
        return crawl('file5.txt')
    #etc...etc...
    else:
        print('Dead end reached.')
Outside the function:
file = 'file0.txt'
crawl(file)
Using read() or readlines() will help, e.g.

infile = open(file, 'r')
lines = infile.readlines()
print(lines)

gives

['file1.txt\n', 'file2.txt\n']

or

infile = open(file, 'r')
lines = infile.read()
print(lines.splitlines())

gives

['file1.txt', 'file2.txt']
readline() only gets one line from the file, so it has a newline at the end. What you want is file.read(), which gives you the whole file as a single string; split that on newlines and you should have what you need. Also remember that you need to save the result to a new variable, i.e. assign the result of line.split('\n') to something. You could also just use readlines(), which gets a list of lines from the file.
Change readline to readlines, and there is no need to split('\n'); it's already a list.
Here is a tutorial you should read.
I prepared file0.txt with two files in it and file1.txt with one file in it, plus file2.txt and file3.txt, which contained no data. Note, this won't extract values already in the list.
def get_files(current_file, files=[]):
    # Initialize the file list with previous values, or the initial value
    new_files = []
    if not files:
        new_files = [current_file]
    else:
        new_files = files

    # Read files not already in the list, into the list
    with open(current_file, "r") as f_in:
        for new_file in f_in.read().splitlines():
            if new_file not in new_files:
                new_files.append(new_file.strip())

    # Do we need to recurse?
    cur_file_index = new_files.index(current_file)
    if cur_file_index < len(new_files) - 1:
        next_file = new_files[cur_file_index + 1]
        # Recurse
        get_files(next_file, new_files)

    # We're done
    return new_files

initial_file = "file0.txt"
files = get_files(initial_file)
print(files)
Returns: ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt']
file0.txt contained:
file1.txt
file2.txt

file1.txt contained:
file3.txt

file2.txt and file3.txt were blank.
Edits: Added .strip() for safety, and added the contents of the data files so this can be replicated.