I have a .txt file of amino acid sequences separated by ">NODE" header lines, like this:
Filename.txt :
>NODE_1
MSETLVLTRPDDWHVHLRDGAALQSVVPYTARQFARAIAMPNLKPPITTAEQAQAYRERI
KFFLGTDSAPHASVMKENSVCGAGCFTALSALELYAEAFEAAGALDKLEAFASFHGADFY
GLPRNTTQVTLRKTEWTLPESVPFGEAAQLKPLRGGEALRWKLD*
>NODE_2
MSTWHKVQGRPKAQARRPGRKSKDDFVTRVEHDAKNDALLQLVRAEWAMLRSDIATFRGD
MVERFGKVEGEITGIKGQIDGLKGEMQGVKGEVEGLRGSLTTTQWVVGTAMALLAVVTQV
PSIISAYRFPPAGSSAFPAPGSLPTVPGSPASAASAP*
I want to split this file into two files (or as many as there are nodes):
Filename1.txt :
>NODE
MSETLVLTRPDDWHVHLRDGAALQSVVPYTARQFARAIAMPNLKPPITTAEQAQAYRERI
KFFLGTDSAPHASVMKENSVCGAGCFTALSALELYAEAFEAAGALDKLEAFASFHGADFY
GLPRNTTQVTLRKTEWTLPESVPFGEAAQLKPLRGGEALRWKLD*
Filename2.txt :
>NODE
MSTWHKVQGRPKAQARRPGRKSKDDFVTRVEHDAKNDALLQLVRAEWAMLRSDIATFRGD
MVERFGKVEGEITGIKGQIDGLKGEMQGVKGEVEGLRGSLTTTQWVVGTAMALLAVVTQV
PSIISAYRFPPAGSSAFPAPGSLPTVPGSPASAASAP*
Each file gets a number appended to its name.
This code works; however, it drops the ">NODE" line and does not create a file for the last node (the one with no '>' line after it).
with open('FilePathway') as fo:
    op = ''
    start = 0
    cntr = 1
    for x in fo.read().split("\n"):
        if x.startswith('>'):
            if start == 1:
                with open(str(cntr) + '.fasta', 'w') as opf:
                    opf.write(op)
                    opf.close()
                op = ''
                cntr += 1
            else:
                start = 1
        else:
            if op == '':
                op = x
            else:
                op = op + '\n' + x
    fo.close()
I can't seem to find the mistake. I would be thankful if you could point it out to me.
Thank you for your help!
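Both symptoms come from the same loop: `op` is only written out when the *next* `>` line appears, so the final record is never flushed, and header lines go into the `x.startswith('>')` branch and are never added to `op`. A minimal sketch that avoids both problems by collecting the records first and writing them afterwards; `split_fasta` and its `prefix`/`ext` parameters are hypothetical names, and rewriting every header to a bare `>NODE` line is an assumption taken from the desired output above:

```python
def split_fasta(path, prefix="Filename", ext=".txt"):
    """Write each '>'-headed record in `path` to its own numbered file.

    Returns the list of file names written, e.g. ['Filename1.txt', ...].
    """
    records = []  # one list of lines per record, header line first
    with open(path) as fin:
        for line in fin:
            if line.startswith(">"):
                # A header starts a new record; keep it as a bare '>NODE'
                # line, matching the desired output in the question.
                records.append([">NODE\n"])
            elif records:
                records[-1].append(line)

    written = []
    # Writing after the loop means the last record is never forgotten.
    for n, record in enumerate(records, start=1):
        name = f"{prefix}{n}{ext}"
        with open(name, "w") as out:  # closed automatically by `with`
            out.writelines(record)
        written.append(name)
    return written
```

For example, `split_fasta('Filename.txt')` would produce `Filename1.txt` and `Filename2.txt` for the two-node file shown above. Holding all records in memory keeps the logic simple; for very large files the records could be written as they are completed instead.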
Hi again! Thank you for all the comments. With your help, I managed to get it to work perfectly. For anyone with similar problems, this is my final code:
import os
import glob

folder_path = 'FilePathway'

for filename in glob.glob(os.path.join(folder_path, '*.fasta')):
    content = []   # lines of the record being collected, header first
    fileno = 1
    y = filename.replace(".fasta", "_")   # output prefix: <name>_1.fasta, <name>_2.fasta, ...

    def writefasta():
        # Write the collected record if it has at least one sequence line,
        # then start a new record with the current header line.
        global content, fileno
        if len(content) > 1:
            with open(f'{y}{fileno}.fasta', 'w') as fout:
                fout.write(''.join(content))
            fileno += 1
        content = [line]

    with open(filename) as fin:   # read the current .fasta file, not the folder path
        for line in fin:
            if line.startswith('>NODE'):
                writefasta()
            else:
                content.append(line)
        writefasta()   # flush the last record, which has no '>NODE' after it
You could do it like this:
def writefasta(d):
    if len(d['content']) > 1:
        with open(f'Filename{d["fileno"]}.fasta', 'w') as fout:
            fout.write(''.join(d['content']))
        d['content'] = ['>NODE\n']
        d['fileno'] += 1

with open('test.fasta') as fin:
    D = {'content': ['>NODE\n'], 'fileno': 1}
    for line in fin:
        if line.startswith('>NODE'):
            writefasta(D)
        else:
            D['content'].append(line)
    writefasta(D)
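This would be a better way. It writes only on odd-numbered lines, so the ">NODE" lines are skipped and files are created only for the actual content. Note that this assumes each sequence fits on a single line; with the multi-line sequences shown above, it would write only fragments of each sequence.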
with open('filename.txt') as fo:
    cntr = 1
    for i, content in enumerate(fo.read().split("\n")):
        if i % 2 == 1:
            with open(str(cntr) + '.txt', 'w') as opf:
                opf.write(content)
            cntr += 1
By the way, since you are using a context manager, you don't need to close the file yourself.
Context managers allow you to allocate and release resources precisely
when you want to. It opens the file, writes some data to it and then
closes it.
Please check: https://book.pythontips.com/en/latest/context_managers.html
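A tiny sketch of that behaviour, using a throwaway file name (`example.txt` is only an illustration): the file object reports itself as closed as soon as the `with` block ends, without any `close()` call.

```python
with open("example.txt", "w") as f:
    f.write("hello\n")
    print(f.closed)  # False: still inside the with block

print(f.closed)      # True: the with block closed the file on exit
```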
with open('FileName') as fo:
    cntr = 1
    for line in fo:
        # the with block closes opf automatically; no close() needed
        with open(f'{cntr}.fasta', 'w') as opf:
            opf.write(line)
        cntr += 1
I am trying to remove multiple lines from a file where the lines begin with any of several specified strings.
I've tried using a list, as below, but each line is written to the file once per item in the list. Some of the lines are removed and some are not; I'm pretty sure it is because I'm not reading the next line at the right time.
trunklog = open('TrunkCleanedDaily.csv', 'r')
fh = open("TCDailyFinal.csv", "w")
firstletter = ['Queue,Completed', 'Outbound,', 'Red_Team_DM,', 'Sunshine,', 'Agent,','Disposition,', 'Unknown,']
while True:
    line = trunklog.readline()
    if not line:
        break;
    for i in firstletter:
        if line.startswith(i):
            print('del ' + line, end='')
            # line = trunklog.readline()
        else:
            fh.write(line)
            print('keep ' + line,end='')
    line = trunklog.readline()
Any help setting me straight about this is appreciated.
Some of the content I am trying to remove:
Queue,Completed,Abandons,Exits,Unique,Completed %,Not Completed %,Total Calls,
Green_Team_AMOne,93,0,0,0,100.00%,0.00%,8.04%,
Green_Team_DM,11,0,0,0,100.00%,0.00%,0.95%,
Green_Team_IVR,19,0,0,0,100.00%,0.00%,1.64%,
Outbound,846,131,0,0,86.59%,13.41%,84.44%,
Red_Team_AMOne,45,0,0,0,100.00%,0.00%,3.89%,
Red_Team_DM,3,0,0,0,100.00%,0.00%,0.26%,
Red_Team_IVR,5,0,0,0,100.00%,0.00%,0.43%,
Sunshine,4,0,0,0,100.00%,0.00%,0.35%,
Queue,Total Call Time,Average Call Time,Average Hold Time,Call Time %,None,
Green_Team_AMOne,32:29:06,20:57,00:10,42.92%,None,
Green_Team_DM,2:41:35,14:41,00:16,3.56%,None,
Green_Team_IVR,1:47:12,05:38,00:19,2.36%,None,
Try the code below:
trunklog = open('TrunkCleanedDaily.csv', 'r')
fh = open("TCDailyFinal.csv", "w")
firstletter = ['Queue,Completed', 'Outbound,', 'Red_Team_DM,', 'Sunshine,', 'Agent,', 'Disposition,', 'Unknown,']
for line in trunklog:
    cnt = 0
    for i in firstletter:
        if line.startswith(i):
            print('del ' + line, end='')
            cnt = 1
    if not cnt:
        fh.write(line)
        print('keep ' + line, end='')
trunklog.close()
fh.close()  # close (and flush) the output file
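I have modified your code a little and added a variable, cnt, which is set to 1 if the line starts with any entry in the firstletter list.
If cnt is 0, the line is written to the new file. Iterating with for line in trunklog also removes the extra readline() at the end of your loop, which was skipping every other line.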
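`str.startswith` also accepts a tuple of prefixes, which removes the need for both the inner loop and the flag. A minimal sketch of that variant; `filter_lines` is a hypothetical helper name, and the `with` statement closes both files automatically:

```python
def filter_lines(in_path, out_path, prefixes):
    """Copy in_path to out_path, dropping lines that start with any prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(prefixes):
                print("del " + line, end="")
            else:
                dst.write(line)
                print("keep " + line, end="")
```

Usage would be `filter_lines('TrunkCleanedDaily.csv', 'TCDailyFinal.csv', firstletter)`.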
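You just have to dedent the else statement so that it belongs to the for loop (a for...else clause, which runs only when the loop finishes without a break), and add a break when a line should be deleted.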
trunklog = open('TrunkCleanedDaily.csv', 'r')
fh = open("TCDailyFinal.csv", "w")
firstletter = ['Queue,Completed', 'Outbound,', 'Red_Team_DM,', 'Sunshine,', 'Agent,','Disposition,', 'Unknown,']
while True:
    line = trunklog.readline()
    if not line:
        break
    for i in firstletter:
        if line.startswith(i):
            print('del ' + line, end='')
            break
    else:
        # runs only if no prefix matched (the loop ended without break)
        fh.write(line)
        print('keep ' + line, end='')
trunklog.close()
fh.close()
Output file:
Green_Team_AMOne,93,0,0,0,100.00%,0.00%,8.04%,
Green_Team_DM,11,0,0,0,100.00%,0.00%,0.95%,
Green_Team_IVR,19,0,0,0,100.00%,0.00%,1.64%,
Red_Team_AMOne,45,0,0,0,100.00%,0.00%,3.89%,
Red_Team_IVR,5,0,0,0,100.00%,0.00%,0.43%,
Queue,Total Call Time,Average Call Time,Average Hold Time,Call Time %,None,
Green_Team_AMOne,32:29:06,20:57,00:10,42.92%,None,
Green_Team_DM,2:41:35,14:41,00:16,3.56%,None,
Green_Team_IVR,1:47:12,05:38,00:19,2.36%,None,
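The for...else used above can be isolated into a small function to see when the else clause runs. A sketch; `keep_line` is a hypothetical name:

```python
def keep_line(line, prefixes):
    """Return True if `line` starts with none of `prefixes`."""
    for prefix in prefixes:
        if line.startswith(prefix):
            break            # found a match: the else clause is skipped
    else:
        return True          # loop ended without break: no prefix matched
    return False
```

With the prefixes from the question, `keep_line('Queue,Total Call Time,...', firstletter)` is True, while `keep_line('Outbound,846,...', firstletter)` is False, which is why only the first of those two lines survives in the output file.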
I need to write my Python shell output to a text file. Some of it is already written to the output file; all I need now is to add the number of lines, and the count of numbers in each line, to the output file.
I have tried adding another for loop outside the first one, and I've tried putting it inside the loop, but that just got complicated.
Text file (list of numbers):
1.0, 1.12, 1.123
1.0,1.12,1.123
1
Code:
import re

index = 0
comma_string = ', '
outfile = "output2.txt"
wp_string = " White Space Detected"
tab_string = " tab detected"
mc_string = " Missing carriage return"
ne_string = " No Error"

baconFile = open(outfile, "wt")
with open("Version2_file.txt", 'r') as f:
    for line in f:
        flag = 0
        carrera = ""
        index = index + 1
        print("Line {}: ".format(index))
        baconFile.write("Line {}: ".format(index))
        if " " in line:  # checking for whitespace
            carrera = carrera + wp_string + comma_string + carrera
            flag = 1
            a = 1
        if "\t" in line:  # checking for tabs return
            carrera = carrera + tab_string + comma_string + carrera
            flag = 1
        if '\n' not in line:
            carrera = carrera + mc_string + ne_string + carrera
            flag = 1
        if flag == 0:  # checking if no error is true by setting flag equal to zero
            carrera = ne_string
        print('\t'.join(str(len(g)) for g in re.findall(r'\d+\.?(\d+)?', line)))
        print(carrera)
        baconFile.write('\t'.join(str(len(g)) for g in re.findall(r'\d+\.?(\d+)?', line)))
        baconFile.write(carrera + "\n")
with open("Version2_file.txt", 'r') as f:
    content = f.readlines()
    print('Number of Lines: {}'.format(len(content)))
    for i in range(len(content)):
        print('Numbers in Line {}: {}'.format(i+1, len(content[i].split(','))))
        baconFile.write('Number of lines: {}'.format(len(content)))
        baconFile.write('Numbers in Line {}: {}'.format(i+1, len(content[i].split(','))))
baconFile.close()
Expected output file:
Line 1: 1 2 3 Tab detected, whitespace detected
Line 2: 1 2 3 No error
Line 3: 1 Missing carriage return No error
Number of Lines: 3
Numbers in Line 1: 3
Numbers in Line 2: 3
Numbers in Line 3: 1
Actual output file:
Line 1: 1 3 2White Space Detected, tab detected, White Space Detected,
Line 2: 1 3 2No Error
Line 3: 0Missing carriage returnNo Error
Number of lines: 3Numbers in Line 1: 3Number of lines: 3Numbers in Line 2: 3Numb
You have closed baconFile in the first open block, but do not open it again in the second open block. Additionally, you never write to baconFile in the second open block, which makes sense considering you've not opened it there, but then you can't expect to have written to it. It seems you simply forgot to add some write statements. Perhaps you confused write with print. Add those write statements in and you should be golden.
baconFile = open(outfile, "wt")
with open("Version2_file.txt", 'r') as f:
    for line in f:
        # ... line processing ...
        baconFile.write(...)  # line format info here
    # baconFile.close()  ## <-- move this
with open("Version2_file.txt", 'r') as f:
    content = f.readlines()
    baconFile.write(...)  # number of lines info here
    for i in range(len(content)):
        baconFile.write(...)  # numbers in each line info here
baconFile.close()  # <-- over here
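A minimal sketch of just the summary part. Two details explain the run-together last line of the actual output: `write`, unlike `print`, adds no newline, so each string needs its own `"\n"`, and "Number of lines" belongs before the loop rather than inside it. `summarize_counts` is a hypothetical helper, and opening the output in append mode (`"a"`) is an assumption so that the summary lands after the per-line report:

```python
def summarize_counts(in_path, out_path):
    """Append a line count and per-line number counts to out_path."""
    with open(in_path) as f:
        content = f.readlines()
    with open(out_path, "a") as out:
        # written once, before the loop
        out.write("Number of Lines: {}\n".format(len(content)))
        for i, line in enumerate(content, start=1):
            count = len(line.split(","))  # comma-separated values on this line
            out.write("Numbers in Line {}: {}\n".format(i, count))
```

Called as `summarize_counts("Version2_file.txt", "output2.txt")` after `baconFile.close()`, it appends the last four lines of the expected output.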
Here's a useful trick you can use to make print statements send their output to a specified file instead of the screen (i.e. stdout):
from contextlib import contextmanager
import sys

@contextmanager
def redirect_stdout(target_file):
    save_stdout = sys.stdout
    sys.stdout = target_file
    try:
        yield
    finally:
        sys.stdout = save_stdout  # restore stdout even if the block raises

# Sample usage
with open('output2.txt', 'wt') as target_file:
    with redirect_stdout(target_file):
        print('hello world')
        print('testing', (1, 2, 3))
print('done')  # Won't be redirected.
Contents of output2.txt file after running the above:
hello world
testing (1, 2, 3)
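On Python 3.4 and later the standard library already provides this as `contextlib.redirect_stdout`, and `print` also accepts a `file=` argument for one-off writes. A minimal sketch reusing the `output2.txt` name from above:

```python
import contextlib

with open("output2.txt", "w") as target_file:
    with contextlib.redirect_stdout(target_file):
        print("hello world")
        print("testing", (1, 2, 3))
    print("also to the file", file=target_file)  # explicit file= argument
print("done")  # goes to the screen
```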
I want to replace a string in the line that contains patternB, something like this:
from:
some lines
line contain patternA
some lines
line contain patternB
more lines
to:
some lines
line contain patternA
some lines
line contain patternB xx oo
more lines
I have code like this:
inputfile = open("d:\myfile.abc", "r")
outputfile = open("d:\myfile_renew.abc", "w")
obj = "yaya"
dummy = ""
item = []
for line in inputfile:
    dummy += line
    if line.find("patternA") != -1:
        for line in inputfile:
            dummy += line
            if line.find("patternB") != -1:
                item = line.split()
                dummy += item[0] + " xx " + item[-1] + "\n"
                break
outputfile.write(dummy)
It does not replace the line containing "patternB" as expected; instead, it adds a new line below it, like this:
some lines
line contain patternA
some lines
line contain patternB
line contain patternB xx oo
more lines
What can I do with my code?
That is expected: you append line to dummy at the beginning of the for loop, and then append the modified version again in the "if" statement. Also, why check for patternA if you treat it the same way as everything else?
inputfile = open(r"d:\myfile.abc", "r")
outputfile = open(r"d:\myfile_renew.abc", "w")
dummy = ""
for line in inputfile:
    if line.find("patternB") != -1:
        # append " xx oo" to the matching line instead of adding a second copy
        dummy += line.rstrip("\n") + " xx oo\n"
    else:
        dummy += line
outputfile.write(dummy)
inputfile.close()
outputfile.close()
The simplest approach is:
1. Read the whole file into a string.
2. Call str.replace.
3. Write the string back to a file.
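The three steps above, as a minimal sketch; `replace_in_file` is a hypothetical helper name, and the whole file is assumed to fit in memory:

```python
def replace_in_file(in_path, out_path, old, new):
    # 1. Read the whole file into one string
    with open(in_path) as f:
        text = f.read()
    # 2. Replace every occurrence of `old`
    text = text.replace(old, new)
    # 3. Write the result out
    with open(out_path, "w") as f:
        f.write(text)
```

For the question, this would be `replace_in_file(r"d:\myfile.abc", r"d:\myfile_renew.abc", "patternB", "patternB xx oo")`.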
If you want to keep the line-by-line iterator (for a big file):
for line in inputfile:
    if line.find("patternB") != -1:
        dummy = line.replace('patternB', 'patternB xx oo')
        outputfile.write(dummy)
    else:
        outputfile.write(line)
This may be slower than reading the whole file at once, but it keeps only one line in memory at a time, so it can handle big files.
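A streaming variant that also replaces the original file, as a sketch: `replace_streaming` and the `.tmp` suffix are illustrative choices, and `os.replace` (Python 3.3+) swaps the rewritten file in for the original:

```python
import os

def replace_streaming(path, old, new):
    """Rewrite `path` line by line, replacing `old` with `new`."""
    tmp_path = path + ".tmp"  # illustrative temp-file name next to the original
    with open(path) as src, open(tmp_path, "w") as dst:
        for line in src:
            dst.write(line.replace(old, new))  # only one line in memory at a time
    os.replace(tmp_path, path)  # overwrite the original with the new file
```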
This should work
import os

def replace():
    f1 = open(r"d:\myfile.abc", "r")
    f2 = open(r"d:\myfile_renew.abc", "w")
    ow = input("Enter word you wish to replace:")
    nw = input("Enter new word:")
    for line in f1:
        templ = line.split()
        # swap matching words and keep the spaces between words
        f2.write(' '.join(nw if w == ow else w for w in templ))
        f2.write('\n')
    f1.close()
    f2.close()
    # replace the original file with the rewritten one
    os.remove(r"d:\myfile.abc")
    os.rename(r"d:\myfile_renew.abc", r"d:\myfile.abc")

replace()
You can use str.replace:
s = '''some lines
line contain patternA
some lines
line contain patternB
more lines'''
print(s.replace('patternB', 'patternB xx oo'))