Eliminate part of a file in Python

In the below file I have 3 occurrences of '.1'. I want to eliminate the last one and write the rest of file to a new file. Kindly suggest some way to do it in PYTHON and thank you all.
d1dlwa_ a.1.1.1 (A:) Protozoan/bacterial hemoglobin {Ciliate (Paramecium caudatum) [TaxId: 5885]}
slfeqlggqaavqavtaqfyaniqadatvatffngidmpnqtnktaaflcaalggpnawt

If the file's not too horrendously huge, by far the simplest approach is:
f = open('oldfile', 'r')
data = f.read()
f.close()
data = data.replace('.1.1.1', '.1.1')
f = open('newfile', 'w')
f.write(data)
f.close()
If the file IS horrendously huge, you'll need to read it and write it by pieces. For example, if each line ISN'T too horrendously huge:
inf = open('oldfile', 'r')
ouf = open('newfile', 'w')
for line in inf:
    line = line.replace('.1.1.1', '.1.1')
    ouf.write(line)
ouf.close()
inf.close()

Works with any size file:
open('newfile', 'w').writelines(line.replace('.1.1.1', '.1.1')
                                for line in open('oldfile'))

You can do something like this:
line = line.split(" ")
# In the sample line the classification code (a.1.1.1) is the second space-separated field
line[1] = line[1][0:line[1].rindex(".")]
print(" ".join(line))
Not the prettiest code, but from my console tests, it works.
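A minimal sketch of a more general variant, assuming the goal is to drop only the last '.1' wherever it sits in the text instead of hard-coding '.1.1.1'. The name drop_last_suffix is made up for illustration; it relies on str.rpartition, which splits at the last occurrence of the separator.

```python
def drop_last_suffix(text, suffix=".1"):
    """Return text with only its last occurrence of suffix removed."""
    head, sep, tail = text.rpartition(suffix)
    if not sep:
        # suffix not found: leave the text untouched
        return text
    return head + tail


# Header line shortened from the sample in the question
header = "d1dlwa_ a.1.1.1 (A:) Protozoan/bacterial hemoglobin"
shortened = drop_last_suffix(header)
```

Apply it to the header line only. Called on the whole file contents, it would remove just one occurrence in the entire file.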

Related

Python split and find specific string from a text file

I have raw data in a .txt file and would like to convert it to .csv format.
This is sample data from the txt file:
(L2-CR666 Reception Counter) L2-CR666 Reception Counter has been forced.
(L7-CR126 Handicapped Toilet) L7-CR126 Handicapped Toilet has been forced.
I would like to achieve the following result:
L2-CR666 Reception Counter, forced
L7-CR126 Handicapped Toilet, forced
I have tried the following code but was unable to achieve the stated result. Where did I go wrong?
import csv
with open('Converted Detection\\Testing 01\\2019-02-21.txt') as infile, open('Converted Detection\\Converted CSV\\log.csv', 'w') as outfile:
    for line in infile:
        outfile.write(infile.read().replace("(", ""))
    for line in infile:
        outfile.write(', '.join(infile.read().split(')')))
    outfile.close()
You can try this:
with open('Converted Detection\\Testing 01\\2019-02-21.txt') as infile, open('Converted Detection\\Converted CSV\\log.csv', 'w') as outfile:
    for line in infile:
        # Get text inside ()
        text = line[line.find("(")+1:line.find(")")]
        # Remove \r\n
        line = line.rstrip("\r\n")
        # Get last word
        forcedText = line.split(" ")[len(line.split(" "))-1]
        # Remove . char
        forcedText = forcedText[:len(forcedText)-1]
        outfile.write(text+", "+forcedText+"\n")
    outfile.close()
You could use .partition() to drop everything up to and including the first ), then replace the remaining parts you do not want. Also, you do not have to close the file when using the with statement, since it closes the file for you, and you do not need to import the csv library just to save a file with the .csv extension.
The following code outputs your wanted result:
infile_path = "Converted Detection\\Testing 01\\2019-02-21.txt"
outfile_path = "Converted Detection\\Converted CSV\\log.csv"
with open(infile_path, "r") as infile, open(outfile_path, "w") as outfile:
    for line in infile:
        line = line.partition(")")[2].replace(" has been forced.", ", forced").strip()
        outfile.write(line + "\n")
The first for loop already reads infile line by line, so there is no need to call infile.read() inside it or to add a second loop.
The with block also takes care of closing the files.
for line in infile:
    line = line.replace("(", "")
    outfile.write(', '.join(line.split(')')))
I would suggest using:
lineout = ', '.join(linein.replace('(','').replace(')','').split(' has been '))
where:
linein = line.strip()
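A regular expression is another way to pull out the two pieces the question asks for. This is only a sketch, assuming every line has the shape "(name) name has been state." as in the sample data. LINE_PATTERN and convert_line are hypothetical names, not part of any answer above.

```python
import re

# "(name) ... has been state." with the name taken from inside the parentheses
LINE_PATTERN = re.compile(r"^\((?P<name>[^)]*)\)\s.*\bhas been (?P<state>\w+)\.\s*$")


def convert_line(line):
    """Return 'name, state' for a matching log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return "{}, {}".format(match.group("name"), match.group("state"))


converted = convert_line("(L2-CR666 Reception Counter) L2-CR666 Reception Counter has been forced.\n")
```

Inside the with block you would write convert_line(line) + "\n" for every line where the result is not None.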

Writing a new text file in python

I'm writing code that goes over a text file counting how many words are in every line, and I'm having trouble putting the result (many lines that each consist of a number) into a new text file.
My code:
in_file = open("our_input.txt")
out_file = open("output.txt", "w")
for line in in_file:
    line = (str(line)).split()
    x = (len(line))
    x = str(x)
    out_file.write(x)
in_file.close()
out_file.close()
But the file I'm getting has all the numbers together in one line.
How do I separate them in the file I'm making?
You need to add a new line after each line :
out_file.write(x + '\n')
Also, a more Pythonic way of dealing with files is the with statement, which closes the files at the end of the block.
And instead of multiple assignments and converting the length to a string, you can use the str.format() method to do all of this in one line:
with open("our_input.txt") as in_file, open("output.txt", "w") as out_file:
    for line in in_file:
        out_file.write('{}\n'.format(len(line.split())))
Add a newline to the file while writing:
in_file = open("our_input.txt")
out_file = open("output.txt", "w")
for line in in_file:
    line = (str(line)).split()
    x = (len(line))
    x = str(x)
    out_file.write(x)
    # Write newline
    out_file.write('\n')
in_file.close()
out_file.close()
As the previous answers have pointed out, you need to write a newline to separate the output.
Here is yet another way to write the code
with open("our_input.txt") as in_file, open("output.txt", "w") as out_file:
    res = map(lambda line: len(line.split()), in_file)
    for r in res:
        out_file.write('%d\n' % r)
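The underlying point in all of these answers is that write() emits exactly the string it is given and never appends a line break. A small sketch that shows this without touching the disk, assuming io.StringIO objects as stand-ins for the two files; write_word_counts is a hypothetical helper name.

```python
import io


def write_word_counts(in_file, out_file):
    """Write the word count of each input line on its own output line."""
    for line in in_file:
        # write() adds nothing by itself, so the newline is part of the string
        out_file.write("{}\n".format(len(line.split())))


source = io.StringIO("one two three\n\nfour five\n")
target = io.StringIO()
write_word_counts(source, target)
result = target.getvalue()
```

The same function works unchanged with real files opened via open().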

How to read one particular line from .txt file in python?

I know I can read the file line by line with
dataFile = open('myfile.txt', 'r')
firstLine = dataFile.readline()
secondLine = dataFile.readline()
...
I also know how to read all the lines in one go
dataFile = open('myfile.txt', 'r')
allLines = dataFile.read()
But my question is how to read one particular line from .txt file?
I wish to read that line by its index.
e.g. I want the 4th line, I expect something like
dataFile = open('myfile.txt', 'r')
allLines = dataFile.readLineByIndex(3)
Skip 3 lines:
with open('myfile.txt', 'r') as dataFile:
    for i in range(3):
        next(dataFile)
    the_4th_line = next(dataFile)
Or use linecache.getline:
import linecache
the_4th_line = linecache.getline('myfile.txt', 4)
From another answer:
Use Python Standard Library's linecache module:
line = linecache.getline(thefilename, 33)
should do exactly what you want. You don't even need to open the file -- linecache does it all for you!
You can do exactly what you wanted with this:
DataFile = open('mytext.txt', 'r')
content = DataFile.readlines()
oneline = content[5]
DataFile.close()
You could take this down to three lines by dropping oneline = content[5] and using content[5] directly (print(content[5]), for example). I kept the variable to make it clear that content is a list and that indexing it gives you one line. Note that indexes start at 0, so content[5] is the sixth line and the 4th line is content[3].
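For large files, readlines() loads everything into memory just to pick one line. A minimal sketch of an index-based lookup that stops reading once the line is reached, assuming a 0-based index as in the question's readLineByIndex(3) wish. nth_line is a hypothetical name; itertools.islice is the standard-library function doing the work.

```python
from itertools import islice


def nth_line(lines, index):
    """Return the line at 0-based index from an iterable of lines, or None if it is too short."""
    return next(islice(lines, index, index + 1), None)


# Any iterable of lines works; an open file object is the usual case
sample = ["first\n", "second\n", "third\n", "fourth\n", "fifth\n"]
fourth = nth_line(sample, 3)
```

With a real file this becomes: with open('myfile.txt') as dataFile: line = nth_line(dataFile, 3).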

Better approach for reading/writing files in python?

Suppose I have a file (say file1.txt) with data around 3mb or more. If I want to write this data to a second file (say file2.txt), which one of the following approaches will be better?
Language used: Python 2.7.3
Approach 1:
file1_handler = file("file1.txt", 'r')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler = file("file2.txt", 'a')
    file2_handler.write(line)
    file2_handler.write('\r\n')
    file2_handler.close()
file1_handler.close()
Approach 2:
file1_handler = file("file1.txt", 'r')
file2_handler = file("file2.txt", 'a')
for lines in file1_handler:
    line = lines.strip()
    # Perform some operation
    file2_handler.write(line)
    file2_handler.write('\r\n')
file2_handler.close()
file1_handler.close()
I think approach two will be better because you just have to open and close file2.txt once. What do you say?
Use with, it will close the files automatically for you:
with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for lines in in_file:
        line = lines.strip()
        # Perform some operation
        out_file.write(line)
        out_file.write('\r\n')
Use open instead of file; the file built-in no longer exists in Python 3.
Of course it's unreasonable to open file2 on every line of file1.
I was recently doing something similar (if I understood you well). How about:
file = open('file1.txt', 'r')
file2 = open('file2.txt', 'wt')
for line in file:
    newLine = line.strip()
    # You can do your operation here on newLine
    file2.write(newLine)
    file2.write('\r\n')
file.close()
file2.close()
This approach works like a charm!
My solution (derived from Pavel Anossov + buffering):
dim = 1000
buffer = []
with open("file1.txt", 'r') as in_file, open("file2.txt", 'a') as out_file:
    for i, lines in enumerate(in_file):
        line = lines.strip()
        # Perform some operation
        buffer.append(line)
        if i%dim == dim-1:
            # Flush the buffer every dim lines
            for bline in buffer:
                out_file.write(bline)
                out_file.write('\r\n')
            buffer = []
    # Write whatever is left when the line count is not a multiple of dim
    for bline in buffer:
        out_file.write(bline)
        out_file.write('\r\n')
Pavel Anossov gave the right solution first; this is just a suggestion ;)
There is probably a more elegant way to implement this. If anyone knows it, please tell us.
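On the buffering question: file objects returned by open() are already buffered by default, so a hand-rolled buffer is usually not needed. A minimal sketch, assuming the per-line operation is just strip(). copy_stripped and its newline parameter are hypothetical names. It uses '\n' rather than '\r\n' because a file opened in text mode on Windows already translates '\n' to '\r\n' on write, so writing '\r\n' there would produce '\r\r\n'.

```python
import io


def copy_stripped(in_file, out_file, newline="\n"):
    """Copy lines from in_file to out_file, stripping surrounding whitespace."""
    # writelines consumes the generator lazily; the file object does the buffering
    out_file.writelines(line.strip() + newline for line in in_file)


source = io.StringIO("  alpha  \nbeta\n   gamma")
target = io.StringIO()
copy_stripped(source, target)
copied = target.getvalue()
```

With real files: with open("file1.txt") as in_file, open("file2.txt", "a") as out_file: copy_stripped(in_file, out_file).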

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file? I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file inplace like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF '+line)
move(tmp.name, filename)
For a file not too big:
# Written for Python 2; in Python 3, 'rb+' yields bytes, so the string literals would need a b prefix
with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0,0)
    f.writelines(('EDF ', x.replace('\n','\nEDF ')))
    f.truncate()
Note that in this case, where the content grows, f.truncate() is not strictly necessary: the new content overwrites every original byte, so the file simply ends where the last write ends.
That's what I observed on examples.
But I am prudent, and I think it's better to keep the call anyway. When the rewritten content is shorter than the original, closing the file does not cut it off at the new end, so the tail of the old content remains in the file unless you truncate.
For a big file, to avoid loading the whole content into RAM at once:
import os
def addsomething(filepath, ss):
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a,_,c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    with open(filepath, 'rb') as f, open(tempi,'wb') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi,filepath)
addsomething('./ampo.txt','WZE')
f = open('./ampo.txt', 'r')
lines = ['EDF ' + l for l in f.readlines()]
f.close()
f = open('./ampo.txt', 'w')
# writelines instead of map(): in Python 3 map is lazy, so the writes would never run
f.writelines(lines)
f.close()
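The temporary-file idea from the answers above can be combined with the concern about file permissions. This is a sketch under stated assumptions, not a definitive implementation: it assumes Python 3.3 or later (for os.replace), creates the temporary file in the same directory as the original so the final rename stays on one filesystem, and copies the permission bits with shutil.copymode. The name prefix_lines is hypothetical.

```python
import os
import shutil
import tempfile


def prefix_lines(path, prefix):
    """Rewrite the file at path so that every line starts with prefix."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = tempfile.NamedTemporaryFile("w", dir=directory, delete=False)
    try:
        with tmp, open(path) as source:
            for line in source:
                tmp.write(prefix + line)
        # Keep the original permission bits, then swap the new file into place
        shutil.copymode(path, tmp.name)
        os.replace(tmp.name, path)
    except BaseException:
        # On any failure the original file is untouched; discard the temp file
        os.remove(tmp.name)
        raise
```

Calling prefix_lines('./ampo.txt', 'EDF ') would then do what the question asks while reading only one line at a time.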
