Better approach for reading/writing files in python? - python

Suppose I have a file (say file1.txt) containing around 3 MB of data or more. If I want to write this data to a second file (say file2.txt), which of the following approaches is better?
Language used: Python 2.7.3
Approach 1:
file1_handler = file("file1.txt", 'r')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler = file("file2.txt", 'a')
    file2_handler.write(line)
    file2_handler.write('\r\n')
    file2_handler.close()
file1_handler.close()
Approach 2:
file1_handler = file("file1.txt", 'r')
file2_handler = file("file2.txt", 'a')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler.write(line)
    file2_handler.write('\r\n')
file2_handler.close()
file1_handler.close()
I think approach two will be better because you just have to open and close file2.txt once. What do you say?

Use with; it will close the files automatically for you:
with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for lines in in_file:
        line = lines.strip()
        # Perform some operation
        out_file.write(line)
        out_file.write('\r\n')
Use open instead of file; file is deprecated (and was removed in Python 3).
Of course it's unreasonable to open file2 on every line of file1.

I was recently doing something similar (if I understood you well). How about:
file = open('file1.txt', 'r')
file2 = open('file2.txt', 'wt')
for line in file:
    newLine = line.strip()
    # You can do your operation here on newLine
    file2.write(newLine)
    file2.write('\r\n')
file.close()
file2.close()
This approach works like a charm!

My solution (derived from Pavel Anossov + buffering):
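dim = 1000
buffer = []
with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for i, lines in enumerate(in_file):
        line = lines.strip()
        # Perform some operation
        buffer.append(line)
        if i % dim == dim - 1:
            # Flush the buffer every dim lines
            for bline in buffer:
                out_file.write(bline)
                out_file.write('\r\n')
            buffer = []
    # Write whatever is left when the line count is not a multiple of dim
    for bline in buffer:
        out_file.write(bline)
        out_file.write('\r\n')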
Pavel Anossov gave the right solution first: this is just a suggestion ;)
There is probably a more elegant way to implement this. If anyone knows one, please tell us.
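One possibly simpler alternative, offered as a sketch rather than a definitive answer: file objects returned by open are already buffered by default, so a manual line buffer may not be needed. The helper below assumes Python 3, and copy_stripped is a hypothetical name, not something from the answers above. It passes a generator expression to writelines so no intermediate list is built:

```python
def copy_stripped(src_path, dst_path):
    """Append each stripped line of src_path to dst_path, ending it with CRLF."""
    # newline='' stops Python 3 from translating line endings on write,
    # so the explicit '\r\n' below is written exactly as given.
    with open(src_path, 'r') as in_file, \
         open(dst_path, 'a', newline='') as out_file:
        # Perform any per-line operation inside the generator expression.
        out_file.writelines(line.strip() + '\r\n' for line in in_file)
```

Calling copy_stripped("file1.txt", "file2.txt") would then do the whole copy in one pass, leaving the buffering to the file object.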

Related

How to ignore lines in a file starting with "##" and load the table in csv module?

So, I have a file which has some 40 lines starting with '##'. After those lines there is a TSV table structure which I want to read using csv.DictReader().
I am trying the following code:
f = open(file, 'r')
for line in f.readlines():
    if line.startswith('##'):
        next(line)
However, I am not sure how to load the data into csv.DictReader after ignoring these lines. Any suggestions as to how to go about this?
You can use a generator, which does not load the whole file into memory (this can be a concern if the file is big):
import csv
def read_fn():
    path = "./text.tsv"
    with open(path, "r") as f:
        for line in f:
            if line.startswith('##'):
                continue
            yield line
# The table is tab-separated, so tell DictReader about the delimiter
reader = csv.DictReader(read_fn(), delimiter='\t')
for row in reader:
    print(row)
Basically, you need to build an intermediate list of lines and pass it to DictReader. I am also adding a with statement, as this is the conventional, Pythonic way of making sure files are closed properly even if an exception occurs:
import csv
good_lines = []
with open(file, 'r') as f:
    for line in f:
        # Keep only the lines that are not '##' comments
        if not line.startswith('##'):
            good_lines.append(line)
# The table is tab-separated, so tell DictReader about the delimiter
dr = csv.DictReader(good_lines, delimiter='\t')
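Since the question says the '##' lines all sit at the top of the file, another option is itertools.dropwhile, which skips lines only until the first one that does not match. This is a minimal sketch under that assumption; read_table and its prefix parameter are hypothetical names chosen for illustration:

```python
import csv
from itertools import dropwhile


def read_table(path, prefix='##'):
    """Return the rows of a TSV file as dicts, skipping leading comment lines."""
    with open(path, newline='') as f:
        # dropwhile discards lines only until the first line that does not
        # start with the prefix; every later line is passed through untouched.
        body = dropwhile(lambda line: line.startswith(prefix), f)
        # Build the list while the file is still open.
        return list(csv.DictReader(body, delimiter='\t'))
```

Unlike the filtering versions above, this would keep a line starting with '##' if it appeared in the middle of the table, which may or may not be what you want.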

Copying content of file into another file in Python [duplicate]

I would like to copy certain lines of text from one text file to another. In my current script when I search for a string it copies everything afterwards, how can I copy just a certain part of the text? E.g. only copy lines when it has "tests/file/myword" in it?
current code:
#!/usr/bin/env python
f = open('list1.txt')
f1 = open('output.txt', 'a')
doIHaveToCopyTheLine=False
for line in f.readlines():
    if 'tests/file/myword' in line:
        doIHaveToCopyTheLine=True
    if doIHaveToCopyTheLine:
        f1.write(line)
f1.close()
f.close()
The oneliner:
open("out1.txt", "w").writelines([l for l in open("in.txt").readlines() if "tests/file/myword" in l])
Recommended, using with:
with open("in.txt") as f:
    lines = f.readlines()
    lines = [l for l in lines if "ROW" in l]
    with open("out.txt", "w") as f1:
        f1.writelines(lines)
Using less memory:
with open("in.txt") as f:
    with open("out.txt", "w") as f1:
        for line in f:
            if "ROW" in line:
                f1.write(line)
readlines() reads the entire input file into a list, which is wasteful for large files. Just iterate over the lines in the file. I used with on output.txt so that it is closed automatically when done. I did not use it on list1.txt: in CPython that file is closed once the loop ends and the file object is garbage-collected, although this is an implementation detail rather than a guarantee.
#!/usr/bin/env python
with open('output.txt', 'a') as f1:
    for line in open('list1.txt'):
        if 'tests/file/myword' in line:
            f1.write(line)
Just a slightly cleaned up way of doing this. This is no more or less performant than ATOzTOA's answer, but there's no reason to do two separate with statements.
with open(path_1, 'a') as file_1, open(path_2, 'r') as file_2:
    for line in file_2:
        if 'tests/file/myword' in line:
            file_1.write(line)
Safe and memory-saving:
with open("out1.txt", "w") as fw, open("in.txt","r") as fr:
    fw.writelines(l for l in fr if "tests/file/myword" in l)
It doesn't create temporary lists (which readlines() and a list comprehension would do, a non-starter if the file is huge); everything is done with a generator expression, and the with block ensures that the files are closed on exit.
f=open('list1.txt')
f1=open('output.txt','a')
for x in f.readlines():
    f1.write(x)
f.close()
f1.close()
This will work; give it a try.
In Python 3.10 and later, parenthesized context managers let you use multiple context managers in one with block:
with (open('list1.txt') as fin, open('output.txt', 'w') as fout):
    fout.write(fin.read())
f = open('list1.txt')
f1 = open('output.txt', 'a')
# doIHaveToCopyTheLine=False
for line in f.readlines():
    if 'tests/file/myword' in line:
        f1.write(line)
f1.close()
f.close()
Now your code will work. Try this one.

How to read one particular line from .txt file in python?

I know I can read the file line by line with
dataFile = open('myfile.txt', 'r')
firstLine = dataFile.readline()
secondLine = dataFile.readline()
...
I also know how to read all the lines in one go
dataFile = open('myfile.txt', 'r')
allLines = dataFile.read()
But my question is how to read one particular line from .txt file?
I wish to read that line by its index.
e.g. I want the 4th line, I expect something like
dataFile = open('myfile.txt', 'r')
allLines = dataFile.readLineByIndex(3)
Skip 3 lines:
with open('myfile.txt', 'r') as dataFile:
    for i in range(3):
        next(dataFile)
    the_4th_line = next(dataFile)
Or use linecache.getline (line numbers start at 1):
import linecache
the_4th_line = linecache.getline('myfile.txt', 4)
From another answer:
Use the Python Standard Library's linecache module:
line = linecache.getline(thefilename, 33)
should do exactly what you want. You don't even need to open the file -- linecache does it all for you!
You can do exactly as you wanted with this:
DataFile = open('mytext.txt', 'r')
content = DataFile.readlines()
oneline = content[5]
DataFile.close()
You could take this down to three lines by removing oneline = content[5] and using content[5] directly without creating another variable (print(content[5]), for example). I did this just to make it clear that content is a list and that indexing it gives a single line. Note that indices start at 0, so content[5] is the sixth line.
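If the file is large, readlines() loads every line just to keep one. A possible middle ground, sketched here with the hypothetical helper name read_line, is itertools.islice, which stops reading as soon as the requested line has been reached:

```python
from itertools import islice


def read_line(path, index):
    """Return the line at zero-based index, or None if the file is shorter."""
    with open(path) as f:
        # islice skips the first `index` lines and yields at most one line;
        # next(..., None) covers the case where the file ends too early.
        return next(islice(f, index, index + 1), None)
```

With this helper, read_line('myfile.txt', 3) would return the 4th line, matching the zero-based readLineByIndex(3) call the question wished for.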

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file, I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file in place like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish, but in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF '+line)
move(tmp.name, filename)
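A variation on the same temporary-file idea, given only as a sketch: it assumes Python 3.3 or later (for os.replace), and prefix_lines is a hypothetical name. The temporary file is created in the same directory as the original so that the final replacement stays on one filesystem, and it is removed again if anything fails along the way:

```python
import os
import tempfile


def prefix_lines(path, prefix):
    """Rewrite path so that every line starts with prefix."""
    # Create the temporary file next to the original so the final
    # os.replace does not have to cross filesystems.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as tmp, open(path) as src:
            for line in src:
                tmp.write(prefix + line)
        # Swap the finished copy into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, prefix_lines('./ampo.txt', 'EDF ') would do the job from the question. One caveat: mkstemp creates the file with restrictive permissions, so the original file's mode is not carried over; something like shutil.copymode could be called before the replace if that matters.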
For a file that is not too big:
with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0,0)
    f.writelines(('EDF ', x.replace('\n','\nEDF ')))
    f.truncate()
Note that in this case, where the content grows, f.truncate() is not strictly necessary: the new data overwrites the old content and extends past its end, so nothing stale is left behind. That's what I observed in my examples.
But I am cautious, and I think it's better to include this instruction anyway. Closing the file does not cut it off at the current position, so when the new content is shorter than the old, trailing characters from the original content remain in the file unless you call truncate().
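The effect of truncate() discussed above can be checked with a small experiment. This is a sketch assuming Python 3 text-mode files; overwrite and its truncate flag are hypothetical names used only for this demonstration:

```python
def overwrite(path, new_text, truncate=True):
    """Overwrite the start of an existing file with new_text."""
    with open(path, 'r+') as f:
        f.seek(0)
        f.write(new_text)
        if truncate:
            # Cut the file off at the current position so no old
            # content survives past the newly written text.
            f.truncate()
```

Writing a shorter text with truncate=False leaves the tail of the old content in place, while the default call leaves only the new text.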
For a big file, to avoid loading the whole content of the file into RAM at once:
import os
def addsomething(filepath, ss):
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a,_,c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    with open(filepath, 'rb') as f, open(tempi,'wb') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi,filepath)
addsomething('./ampo.txt','WZE')
f = open('./ampo.txt', 'r')
# Build the prefixed lines as a list (map is lazy in Python 3)
lines = ['EDF ' + l for l in f.readlines()]
f.close()
f = open('./ampo.txt', 'w')
f.writelines(lines)
f.close()

Eliminate part of a file in python

In the below file I have 3 occurrences of '.1'. I want to eliminate the last one and write the rest of file to a new file. Kindly suggest some way to do it in PYTHON and thank you all.
d1dlwa_ a.1.1.1 (A:) Protozoan/bacterial hemoglobin {Ciliate (Paramecium caudatum) [TaxId: 5885]}
slfeqlggqaavqavtaqfyaniqadatvatffngidmpnqtnktaaflcaalggpnawt
If the file's not too horrendously huge, by far the simplest approach is:
f = open('oldfile', 'r')
data = f.read()
f.close()
data = data.replace('.1.1.1', '.1.1')
f = open('newfile', 'w')
f.write(data)
f.close()
If the file IS horrendously huge, you'll need to read it and write it by pieces. For example, if each line ISN'T too horrendously huge:
inf = open('oldfile', 'r')
ouf = open('newfile', 'w')
for line in inf:
    line = line.replace('.1.1.1', '.1.1')
    ouf.write(line)
ouf.close()
inf.close()
Works with any size file:
open('newfile', 'w').writelines(line.replace('.1.1.1', '.1.1')
                                for line in open('oldfile'))
You can do something like this:
line = line.split(" ")
# In the sample, the classification (a.1.1.1) is the second field
line[1] = line[1][0:line[1].rindex(".")]
print " ".join(line)
Not the prettiest code, but from my console tests, it works.
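The same idea can be wrapped in a small function that leaves other lines (such as the sequence line) alone. This is only a sketch: drop_last_component is a hypothetical name, and it assumes the classification is always the second space-separated field, as in the sample header line from the question. It removes the last dot-separated component of that field, whatever its value:

```python
def drop_last_component(line):
    """Remove the last dot-separated component of the second field."""
    fields = line.split(' ')
    # Only header lines have a second field containing dots;
    # sequence lines are returned unchanged.
    if len(fields) > 1 and '.' in fields[1]:
        fields[1] = fields[1].rpartition('.')[0]
    return ' '.join(fields)
```

Applied to each line while copying from the old file to the new one, this avoids relying on the exact text '.1.1.1' appearing in the data.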
