Different output format than expected - python

I have written code to read the following text file:
Generated by trjconv : a bunch of waters t= 0.00000
3000
1SOL OW 1 -1.5040 2.7580 0.6820
1SOL HW1 2 1.4788 2.7853 0.7702
1SOL HW2 3 1.4640 2.8230 0.6243
2SOL OW 4 1.5210 0.9510 2.2050
2SOL HW1 5 -1.5960 0.9780 2.1520
2SOL HW2 6 1.4460 0.9940 2.1640
1000SOL OW 2998 1.5310 1.7952 2.1981
1000SOL HW1 2999 1.4560 1.7375 -2.1836
1000SOL HW2 3000 1.6006 1.7369 2.2286
3.12736 3.12736 3.12736
Generated by trjconv : a bunch of waters t= 9000.00000
3000
1SOL OW 1 1.1579 0.4255 2.1329
1SOL HW1 2 1.0743 0.3793 2.1385
Written Code:
F = open('Data.gro', 'r')
A = open('TTT.txt', 'w')
XO = []
I = range(1, 10)
with open('Data.gro') as F:
    for line in F:
        if line.split()[0] == '3000':
            A.write('Frame' + '\n')
            for R in I:
                line = next(F)
                P = line.split()
                x = float(P[3])
                XO.append(x)
                if line.split()[2] == '3000':
                   print('Oxygen atoms XYZ coordinates:')
                   A.write('Oxygen atoms XYZ coordinates:' + '\n')
                   A.write("%s\n" % (XO))
                   XO
                   XO[0] - XO[1]
                   XO = []
                else:
                    pass
        else:
            pass
A.close()
First problem:
My problem is that the output text file is written as one single line, like this:
FrameOxygen atoms XYZ coordinates:[-1.504, 1.4788, 1.464, 1.521, -1.596, 1.446, 1.531, 1.456, 1.6006]FrameOxygen atoms XYZ coordinates:[1.1579, 1.0743, 1.1514, 2.2976, 2.2161, 2.3118, 2.5927, -2.5927, 2.5365]
The output should look like this:
Frame
Oxygen atoms XYZ coordinates:
[-1.504, 1.4788, 1.464, 1.521, -1.596, 1.446, 1.531, 1.456, 1.6006]
Frame
Oxygen atoms XYZ coordinates:
[1.1579, 1.0743, 1.1514, 2.2976, 2.2161, 2.3118, 2.5927, -2.5927, 2.5365]
But when I read the file, the '\n' shows up at the end of each separated part instead of breaking the line.
Does anyone have an idea?
Second problem
The next problem is that the output is only generated when I copy-paste the code into a Python shell. If I double-click my 'code.py' file, no output file is generated, and there is no error when I paste the code into the shell.

1) Which platform and editor are you using?
'\n' should work as expected.
I suspect you are running the code on Windows and used Notepad to inspect the file. Try WordPad or another more capable editor to open TTT.txt; the result should be as expected.
2) If you're double-clicking the script in MS Windows, you are very likely to have missed some exceptions printed by Python. Try running it in a command prompt:
python code.py

Anthoney is correct.
Windows has this issue; use WordPad to open the file.

To answer your first question:
'\n', an escaped n, is the newline character.
To answer your second question:
A frequent problem when pasting into a shell is that the pasting occurs faster than the shell processes it, meaning that the lines could be ignored by the shell.
Another issue you might have, particularly if you're pasting the above code into a shell, is the inconsistent indentation.
Your if and else are not lined up, probably because you only indented 3 spaces from the preceding line.
if line.split()[2] == '3000':
   print('Oxygen atoms XYZ coordinates:')
   A.write('Oxygen atoms XYZ coordinates:' + '\n')
   A.write("%s\n" % (XO))
   XO
   XO[0] - XO[1]
   XO = []
else:
    pass
Also, you could nest your openings of files. In particular, this line is redundant and can be removed:
F = open('Data.gro', 'r')
And you can do this:
...
with open('Data.gro') as F:
    with open('TTT.txt', 'w') as A:
        ...
That way, if you hit an error while writing, the file will still be closed (which also means you can remove the A.close() at the end).
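Putting the advice together, a minimal sketch of the whole loop (the file names and the '3000' header check come from the question; the extract_frames helper name and the atoms_per_frame parameter are mine for illustration):

```python
# Sketch: write each frame header, label, and coordinate list on its own line.
# Assumes the Data.gro layout from the question, where an atom-count line
# '3000' starts each frame and the x coordinate is the 4th whitespace field.
def extract_frames(in_path, out_path, atoms_per_frame=9):
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            parts = line.split()
            if parts and parts[0] == '3000':
                xo = []
                for _ in range(atoms_per_frame):
                    xo.append(float(next(src).split()[3]))
                # Every write ends with '\n', so each item lands on its own line.
                dst.write('Frame\n')
                dst.write('Oxygen atoms XYZ coordinates:\n')
                dst.write('%s\n' % xo)
```

Because both files are opened in a single with statement, they are closed even if a parse error occurs mid-frame.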

Related

Save rows of Dataframe to Separate txt files

The output displayed on the terminal screen has to be saved into separate txt files. The terminal output looks like this:
A1001
2
B1001
6
C1001
4
which has been derived from a while loop via commands:
print(apartmentno)
print(df)
How can I print each pair of outputs into a text file named output1.txt, output2.txt, output3.txt, ..., outputn.txt?
Expected output:
output1.txt
A1001
2
output2.txt
B1001
6
output3.txt
C1001
4
So far I have tried the following code:
.
.
.
df = disp_date.ClassAttend.sum()
print(apartmentno)
print(final_df)
z = []
z.append(apartmentno)
z.append(df)
print(z)
ctr = 0
for i in z:
    ctr += 1
    f = open(('d:\\output' + str(ctr) + '.txt'), mode='w')
    f.write(apartmentno)
    f.write(df)
    f.close()
Also, I get an error FileNotFoundError: [Errno 2] No such file or directory although I am opening in 'w' mode.
How can I get my required output?
It is not entirely clear to me how you are constructing your loop, but hopefully the following information is helpful to you.
You can use enumerate for the index numbers in your loop, and use the with keyword when dealing with file objects. This way you can create your files. Regarding with:
The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.
As of Python 3.6, you can also use f-strings, which makes the construction of your filename more readable:
for index, value in enumerate(z):
    with open(f'output{index}.txt', 'w') as f:
        f.write(str(value))  # str() so numeric values can be written too
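If the intent is one file per (apartmentno, df) pair rather than one file per list element, a hedged sketch (the pairing and the write_pairs helper are my assumptions about the intended loop, not the asker's actual code):

```python
# Sketch: write each (label, value) pair into its own numbered file,
# matching the expected output1.txt, output2.txt, ... naming.
def write_pairs(pairs, prefix='output'):
    for index, (label, value) in enumerate(pairs, start=1):
        with open(f'{prefix}{index}.txt', 'w') as f:
            f.write(f'{label}\n{value}\n')
```

Calling write_pairs([('A1001', 2), ('B1001', 6), ('C1001', 4)]) would then produce three files, each holding one label/value pair.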

Concatenate multiple text files of DNA sequences in Python or R?

I was wondering how to concatenate exon/DNA fasta files using Python or R.
Example files:
So far I really liked using R ape package for the cbind method, solely because of the fill.with.gaps=TRUE attribute. I really need gaps inserted when a species is missing an exon.
My code:
ex1 <- read.dna("exon1.txt", format="fasta")
ex2 <- read.dna("exon2.txt", format="fasta")
output <- cbind(ex1, ex2, fill.with.gaps=TRUE)
write.dna(output, "Output.txt", format="fasta")
Example:
exon1.txt
>sp1
AAAA
>sp2
CCCC
exon2.txt
>sp1
AGG-G
>sp2
CTGAT
>sp3
CTTTT
Output file:
>sp1
AAAAAGG-G
>sp2
CCCCCTGAT
>sp3
----CTTTT
So far I am having trouble applying this technique when I have multiple exon files (I am trying to figure out a loop to open and cbind all files ending with .fa in the directory), and sometimes not all files have exons of identical length, so DNAbin stops working.
So far I have:
file_list <- list.files(pattern=".fa")
myFunc <- function(x) {
  for (file in file_list) {
    x <- read.dna(file, format="fasta")
    out <- cbind(x, fill.with.gaps=TRUE)
    write.dna(out, "Output.txt", format="fasta")
  }
}
However, when I run this and check my output text file, it misses many exons. I think that is because not all files have the same exon length, or my script is failing somewhere and I can't figure it out :(
Any ideas? I can also try Python.
If you prefer Linux one-liners, you can use:
cat exon1.txt exon2.txt > outfile
and if you want only the unique records from the outfile, use:
awk '/^>/{f=!d[$1];d[$1]=1}f' outfile > sorted_outfile
I just came out with this answer in Python 3:
def read_fasta(fasta):  # function that parses a FASTA string
    output = {}
    for line in fasta.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            active_sequence_name = line[1:]
            if active_sequence_name not in output:
                output[active_sequence_name] = []
            continue
        sequence = line
        output[active_sequence_name].append(sequence)
    return output

with open("exon1.txt", 'r') as file:  # read exon1.txt
    file1 = read_fasta(file.read())
with open("exon2.txt", 'r') as file:  # read exon2.txt
    file2 = read_fasta(file.read())

finaldict = {}  # concatenate the contents of both files
for i in list(file1.keys()) + list(file2.keys()):
    if i not in file1.keys():
        file1[i] = ["-" * len(file2[i][0])]
    if i not in file2.keys():
        file2[i] = ["-" * len(file1[i][0])]
    finaldict[i] = file1[i] + file2[i]

with open("output.txt", 'w') as file:  # write the result to output.txt
    for k, i in finaldict.items():
        file.write(">{}\n{}\n".format(k, "".join(i)))  # proper formatting
It's pretty hard to comment and explain it completely, and it might not help you, but this is better than nothing :P
I used Łukasz Rogalski's code from the answer to Reading a fasta file format into Python dict.
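For the asker's multi-file case, a hedged Python sketch that loops over every matching exon file and pads missing species with gaps (the function names and the glob pattern are mine; it assumes each file is an alignment in which all sequences have equal length, which is what makes the '-' padding line up):

```python
# Sketch: concatenate every FASTA file matching a pattern, padding species
# that are missing from a file with '-' gaps of that file's sequence length.
import glob

def read_fasta_file(path):
    # Minimal FASTA parser: {species_name: concatenated_sequence}
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith('>'):
                name = line[1:]
                seqs[name] = ''
            elif line and name is not None:
                seqs[name] += line
    return seqs

def concat_fasta(pattern, out_path):
    files = [read_fasta_file(p) for p in sorted(glob.glob(pattern))]
    species = sorted({sp for f in files for sp in f})
    with open(out_path, 'w') as out:
        for sp in species:
            parts = []
            for f in files:
                # Gap length taken from any sequence in this file (alignment assumption).
                length = len(next(iter(f.values()))) if f else 0
                parts.append(f.get(sp, '-' * length))
            out.write('>%s\n%s\n' % (sp, ''.join(parts)))
```

Run on the two example files, concat_fasta would produce the sp1/sp2/sp3 output shown in the question, with sp3 padded as ----CTTTT.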

Replacing a string in a file in python

What my text is
$TITLE = XXXX YYYY
1 $SUBTITLE= XXXX YYYY ANSA
2 $LABEL = first label
3 $DISPLACEMENTS
4 $MAGNITUDE-PHASE OUTPUT
5 $SUBCASE ID = 30411
What I want
$TITLE = XXXX YYYY
1 $SUBTITLE= XXXX YYYY ANSA
2 $LABEL = new label
3 $DISPLACEMENTS
4 $MAGNITUDE-PHASE OUTPUT
5 $SUBCASE ID = 30411
The code I am using:
import re

fo = open("test5.txt", "r+")
num_lines = sum(1 for line in open('test5.txt'))
count = 1
while (count <= num_lines):
    line1 = fo.readline()
    j = line1[17:72]
    j1 = re.findall('\d+', j)
    k = map(int, j1)
    if (k == [30411]):
        count1 = count - 4
        line2 = fo.readlines()[count1]
        r1 = line2[10:72]
        r11 = str(r1)
        r2 = "new label"
        r22 = str(r2)
        newdata = line2.replace(r11, r22)
        f1 = open("output7.txt", 'a')
        lines = f1.writelines(newdata)
    else:
        f1 = open("output7.txt", 'a')
        lines = f1.writelines(line1)
    count = count + 1
The problem is in the writing of the lines. Once 30411 is found, the code has to go 3 lines back and change the label to the new one. The output file should have all lines the same as before except the label line, but it is not written properly. Can anyone help?
Apart from many blood-curdling but noncritical problems, you are calling readlines() in the middle of an iteration using readline(), causing you to read lines not from the beginning of the file but from the current position of the fo handle, i.e. after the line containing 30411.
You need to open the input file again with a separate handle or (better) store the last 4 lines in memory instead of rereading the one you need to change.
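A sketch of the second suggestion, buffering the last few lines in memory (the 30411 trigger and the 3-lines-back offset come from the question; the relabel helper and the split on '=' are my assumptions about the file layout):

```python
# Sketch: keep the last 4 lines in a small buffer so the $LABEL line,
# 3 lines above the $SUBCASE trigger, can be rewritten before it is flushed.
from collections import deque

def relabel(in_path, out_path, trigger='30411', new_label='new label'):
    buf = deque()
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            buf.append(line)
            if trigger in line:
                idx = len(buf) - 4  # 3 lines above the trigger line
                if idx >= 0 and '$LABEL' in buf[idx]:
                    prefix, _, _ = buf[idx].partition('=')
                    buf[idx] = prefix + '= ' + new_label + '\n'
            if len(buf) > 4:
                dst.write(buf.popleft())  # flush lines that can no longer change
        dst.writelines(buf)  # flush whatever is left at end of file
```

This avoids mixing readline() and readlines() on the same handle, which was the root of the original bug.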

How do I print a range of lines after a specific pattern into separate files when this pattern appears several times in an input file

Sorry for my previous post; I had no idea what I was doing. I am trying to cut certain ranges of lines out of a given input file and print each range to a separate file. The input file looks like:
18
generated by VMD
C 1.514895 -3.887949 2.104134
C 2.371076 -2.780954 1.718424
C 3.561071 -3.004933 1.087316
C 4.080424 -4.331872 1.114878
C 3.289761 -5.434047 1.607808
C 2.018473 -5.142150 2.078551
C 3.997237 -6.725186 1.709355
C 5.235126 -6.905640 1.295296
C 5.923666 -5.844841 0.553037
O 6.955216 -5.826197 -0.042920
O 5.269004 -4.590026 0.590033
H 4.054002 -2.184680 0.654838
H 1.389704 -5.910354 2.488783
H 5.814723 -7.796634 1.451618
O 1.825325 -1.537706 1.986256
H 2.319215 -0.796042 1.550394
H 3.390707 -7.564847 2.136680
H 0.535358 -3.663175 2.483943
18
generated by VMD
C 1.519866 -3.892621 2.109595
I would like to print every 100th frame starting from the first frame into its own file named "snapshot0.xyz" (The first frame is frame 0).
For example, the above input shows two snapshots. I would like to print lines 1-20 into a file named snapshot0.xyz, then skip 100 snapshots (2000 lines) and print the 100th snapshot as snapshot1.xyz. My attempt was in Python, but you can choose either grep, awk, sed, or Python.
My input file: frames.dat
#!/usr/bin/Python


mest = open('frames.dat', 'r')
test = mest.read().strip().split('\n')

for i in range(len(test)):
    if test[i] == '18':
        f = open("out" + `i` + ".dat", "w")
        for j in range(19):
            print >> f, test[j]
        f.close()
I suggest using the csv module for this input.
import csv

def strip_empty_columns(line):
    # list() so len() also works on the filter result under Python 3
    return list(filter(lambda s: s.strip() != "", line))

def is_count(line):
    return len(line) == 1 and line[0].strip().isdigit()

def is_float(s):
    try:
        float(s.strip())
        return True
    except ValueError:
        return False

def is_data_line(line):
    return len(line) == 4 and is_float(line[1]) and is_float(line[2]) and is_float(line[3])

with open('frames.dat', 'r') as mest:
    r = csv.reader(mest, delimiter=' ')
    current_count = 0
    frame_nr = 0
    outfile = None
    for line in r:
        line = strip_empty_columns(line)
        if is_count(line):
            if frame_nr % 100 == 0:
                outfile = open("snapshot%d.xyz" % frame_nr, "w+")
            elif outfile:
                outfile.close()
                outfile = None
            frame_nr += 1  # increment the frame counter every time you see a header line like '18'
        elif is_data_line(line):
            if outfile:
                outfile.write(" ".join(line) + "\n")
The opening post mentions writing every 100th frame to an output file named snapshot0.xyz. I assume the 0 should be a counter, or you would continuously overwrite the file. I updated the code with a frame_nr counter and a few lines which open/close an output file depending on frame_nr and write data while an output file is open.
This might work for you (GNU sed and csplit):
sed -rn '/^18/{x;/x{100}/z;s/^/x/;x};G;/\nx$/P' file | csplit -f snapshot -b '%d.xyz' -z - '/^18/' '{*}'
Filter every 100th frame using sed and pass that file to csplit to create the individual files.
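Since each snapshot here is a fixed 20 lines (the '18' count line, the comment line, and 18 atom lines), a plain-Python alternative is to read fixed-size chunks and skip per-line parsing altogether (the split_frames name and its parameters are mine; this assumes every frame really has the same line count):

```python
# Sketch: read frames of a fixed number of lines and write every Nth
# frame to its own snapshot<k>.xyz file.
from itertools import islice

def split_frames(in_path, lines_per_frame=20, every=100, prefix='snapshot'):
    written = []
    with open(in_path) as src:
        frame = 0
        while True:
            chunk = list(islice(src, lines_per_frame))  # next frame, or [] at EOF
            if not chunk:
                break
            if frame % every == 0:
                name = '%s%d.xyz' % (prefix, frame // every)
                with open(name, 'w') as out:
                    out.writelines(chunk)
                written.append(name)
            frame += 1
    return written
```

islice never loads the whole trajectory into memory, so this also works for large .dat files, unlike the read().split('\n') attempt in the question.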

matching and displaying specific lines through python

I have 15 lines in a log file and I want to read, for example, the 4th and 10th lines through Python and display them with a message saying the string was found:
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
Please suggest code showing how to achieve this in Python.
To elaborate on this example: the first string (line) is unique and can be found easily in the log file. The next string, B, comes within 40 lines of the first one, but it occurs in lots of places in the log file, so I need to find it within the first 40 lines after finding string A and print that both strings were found.
Also, I can't use the with statement, as it gives me warnings like "'with' will become a reserved keyword in Python 2.6". I am using Python 2.5.
You can use this:
fp = open("file")
for i, line in enumerate(fp):
    if i == 3:
        print line
    elif i == 9:
        print line
        break
fp.close()
def bar(start, end, search_term):
    with open("foo.txt") as fil:
        if any(search_term in line for line in fil.readlines()[start:end]):
            print search_term + " was found"

>>> bar(4, 10, "dsfsfs")
"dsfsfs was found"
# list of random characters
from random import randint
a = list(chr(randint(0, 100)) for x in xrange(100))

# look for this
lookfor = 'b'
for element in xrange(100):
    if lookfor == a[element]:
        print a[element], 'on', element
# b on 33
# b on 34
is one easy to read and simple way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
after edits by author:
The easiest thing you can do then is:
looking_for = 'findthis'
i = 1
for line in open('filename.txt', 'r'):
    if looking_for == line.strip():  # strip the trailing newline before comparing
        print i, line
    i += 1
it's efficient and easy :)
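None of the answers address the follow-up about finding string B only within 40 lines after unique string A. A hedged sketch of that (the find_a_then_b name and return convention are mine; it uses with and print-free returns, so it targets modern Python rather than the asker's 2.5, where you would fall back to try/finally):

```python
# Sketch: locate unique string A, then look for string B only in the
# next `window` lines, ignoring B's occurrences elsewhere in the log.
def find_a_then_b(path, a, b, window=40):
    with open(path) as fp:
        lines = fp.readlines()
    for i, line in enumerate(lines):
        if a in line:
            for j in range(i + 1, min(i + 1 + window, len(lines))):
                if b in lines[j]:
                    return (i + 1, j + 1)  # 1-based line numbers of A and B
            return (i + 1, None)  # A found, B not within the window
    return (None, None)  # A not found at all
```

The caller can then print "these strings were found" along with the two line numbers, which matches what the question asks for.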
