How to add a Chain, Residues and atoms in pdb file?

I want to add a chain containing all the residues, with one alpha-carbon (CA) atom per residue, using OpenMM. The classes involved are described in the OpenMM documentation. My PDB file currently contains a single chain. Here is what I've tried:
from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
import numpy as np
import mdtraj as md
# Here I remove all other atoms except carbon-alpha and stored them in a new file 'CA_only.pdb'
main_pdb = md.load_pdb('1PGB.pdb')
atoms_to_keep = [atom.index for atom in main_pdb.topology.atoms if atom.name == 'CA']
new_pdb = main_pdb.atom_slice(atoms_to_keep)
new_pdb.save('CA_only.pdb')
# Then I try to add a chain and residue, but no results
pdb = PDBFile('CA_only.pdb')
top = topology.Topology
chain=top.addChain(id='A')
residue = top.addResidue(MET, 0)
I might be using the wrong residue names, but I don't know what is wrong. Does anyone have an idea how to resolve this? Thanks!

Related

Get all bond lengths from PDB

I want to get all bond lengths from a PDB file.
I tried Bio.PDB, but I don't understand the NeighborSearch class and its methods search() and search_all():
from Bio.PDB import *
import numpy as np
structure = PDBParser().get_structure('Kek', '1wba.pdb')
atom_list = [_ for _ in structure.get_atoms()]
kek = NeighborSearch(atom_list).search_all(2)
for atom_pair in kek:
    a = atom_pair[0]
    b = atom_pair[1]
    distance = np.linalg.norm(np.array(a.coord) - np.array(b.coord))
    print(distance)
How can I solve this? If another framework does it better, I'm open to any approach that works.

As I understand it, you want to calculate the distances between atoms in a PDB file. I adapted your code and a solution from Biostars. I hope it helps:
import Bio.PDB
parser = Bio.PDB.PDBParser(QUIET=True)
structures = parser.get_structure('2rdx', '2rdx.pdb')
structure = structures[0]
atom_list = [_ for _ in structure.get_atoms()]
ns = Bio.PDB.NeighborSearch(atom_list)
_cutoff_dist = 5
for target in atom_list:
    close_atoms = ns.search(target.coord, _cutoff_dist)
    for close_atom in close_atoms:
        print(target, close_atom, target - close_atom)
    print ("==========")
You can easily find distance between two Atom objects by using the - operator.
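For intuition, `NeighborSearch.search_all(radius)` returns every pair of atoms that lie within `radius` of each other. The brute-force sketch below does the same thing on plain coordinate tuples using only the standard library. The coordinates and the 2.0 Å cutoff are made-up values, and `pairs_within` is an illustrative name, not part of Bio.PDB.

```python
import math
from itertools import combinations

# Hypothetical coordinates in angstroms (not taken from any PDB entry).
coords = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.46, 0.0, 0.0),
    "C": (2.0, 1.4, 0.0),
    "O": (5.0, 5.0, 5.0),
}

def pairs_within(points, cutoff):
    """Return (name_a, name_b, distance) for every pair within cutoff."""
    found = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(points.items(), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            found.append((name_a, name_b, distance))
    return found

for name_a, name_b, distance in pairs_within(coords, cutoff=2.0):
    print(f"{name_a}-{name_b}: {distance:.2f}")
```

A distance cutoff finds atoms that are close together, not chemical bonds as such, so the cutoff has to be chosen with that in mind.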

Swap numbers from different lines (from fileinput library) without using temporary variables

I am aware that there are similar posts, but none of them seems to deal with input from the fileinput module. My biggest problem is that I don't know how to handle each line produced by the for loop without assigning a temporary variable.
I am doing a code practice question where the requirement is to "Write a function to swap two numbers without using any temporary variables."
And the code I was given to start the question is:
import fileinput
import sys
for line in fileinput.input():
I am aware that in python, swapping numbers can be as easy as x,y = y,x, or it can be:
x = x^y
y = x^y
x = x^y
But how can I do it when there is no x or y assigned here?
My solution so far is this:
import fileinput
import sys
l = []
for line in fileinput.input():
    l.append(line)
l = l[::-1]
for e in l:
    print(e)
But I'm not sure if I have violated the rules by introducing l.
Input example:
5
10
Expected output:
10
5

This works for me. They didn't say anything about not writing to a temporary file right? :)
import fileinput
import os
import sys
l = []
for line in fileinput.input():
    open('test.txt', 'a').write(line)
print('\n'.join(open('test.txt').read().split('\n')[-2::-1]))
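The two swap idioms mentioned in the question can be applied once the two lines are parsed into integers. The sketch below wraps them in functions (`swap_unpack`, `swap_xor` and `swap_lines` are illustrative names) and accepts any iterable of lines, so `fileinput.input()` could be passed in directly. It assumes the input has exactly two integer lines.

```python
def swap_unpack(x, y):
    """Swap via tuple unpacking; no named temporary is needed."""
    x, y = y, x
    return x, y

def swap_xor(x, y):
    """Swap two integers with XOR; this only works for ints."""
    x = x ^ y
    y = x ^ y   # now holds the original x
    x = x ^ y   # now holds the original y
    return x, y

def swap_lines(lines):
    """Parse two integer lines and return them swapped."""
    first, second = (int(line) for line in lines)
    return swap_xor(first, second)

if __name__ == "__main__":
    for number in swap_lines(["5\n", "10\n"]):
        print(number)
```

Whether binding the parsed values to names counts as "using a temporary" depends on how strictly the exercise is read; the swap step itself introduces none.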

Reading a file and printing a specific answer

I read a text file called CJ.txt containing 2 columns (z and mub) and 31 rows. (Only the relevant part of my program is shown.)
My question is how to define r so that I can look up any element. For example, I would like to print r[25], which should be r[25]=mub[25]*z[25].
The other r[i], for i from 0 to 31, should be obtained in the same way.
from math import *
import numpy as np
from scipy.integrate import quad
from scipy.integrate import odeint
min=l=m=n=b=t=chi=r=None
f=0
z,mub=np.genfromtxt('CJ.txt',unpack=True) # opening the text file
for i in range(len(z)): # This means from 0 to 31
    r[i]=mub[i]*z[i] # need a function similar to this
print(r[5],r[31],r[2],r[12]) #and other r
or creating an array
x=[[r[1],r[5],r[7]],
[r[31],r[26],r[20]],
[r[21],r[12],r[14]]]
I don't know whether this question is easy or hard, but it is very important to me.
I appreciate your time and attention.

Is this what you are looking for?
z,mub=np.genfromtxt('CJ.txt',unpack=True) # opening the text file
r = []
for i in range(len(z)): # one iteration per row of the file
    r.append(mub[i]*z[i]) # r[i] = mub[i]*z[i]
print(r[5],r[31],r[2],r[12]) #and other r
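Because `np.genfromtxt` returns NumPy arrays, the loop can also be replaced by a single elementwise product. This sketch uses a small in-memory stand-in for `CJ.txt` (the three rows are invented). Note that a file with 31 rows has valid indices 0 through 30, so `r[31]` would raise an IndexError in that case.

```python
import io
import numpy as np

# Stand-in for CJ.txt: two whitespace-separated columns (z, mub); values are made up.
fake_file = io.StringIO("0.1 10.0\n0.2 20.0\n0.3 30.0\n")

z, mub = np.genfromtxt(fake_file, unpack=True)
r = mub * z                          # elementwise: r[i] == mub[i] * z[i]

print(r[1])                          # a single element

# Several elements at once, arranged as a 2x2 array via an index array.
idx = np.array([[0, 2], [1, 0]])
x = r[idx]
print(x)
```

Indexing with a 2-D index array gives a result of the same shape, which is one way to build the kind of nested array shown in the question.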

Intramolecular protein residue contact map using biopython, KeyError: 'CA'

I am trying to identify amino acid residues in contact in the 3D protein structure. I am new to BioPython but found this helpful website http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/protein_contact_map/
Following their lead (which I reproduce here for completeness; note, however, that I am using a different protein):
import Bio.PDB
import numpy as np
pdb_code = "1QHW"
pdb_filename = "1qhw.pdb"
def calc_residue_dist(residue_one, residue_two) :
    """Returns the C-alpha distance between two residues"""
    diff_vector = residue_one["CA"].coord - residue_two["CA"].coord
    return np.sqrt(np.sum(diff_vector * diff_vector))
def calc_dist_matrix(chain_one, chain_two) :
    """Returns a matrix of C-alpha distances between two chains"""
    answer = np.zeros((len(chain_one), len(chain_two)), float)  # np.float was removed in NumPy 1.24
    for row, residue_one in enumerate(chain_one) :
        for col, residue_two in enumerate(chain_two) :
            answer[row, col] = calc_residue_dist(residue_one, residue_two)
    return answer
structure = Bio.PDB.PDBParser().get_structure(pdb_code, pdb_filename)
model = structure[0]
dist_matrix = calc_dist_matrix(model["A"], model["A"])
But when I run the above code, I get the following error message:
Traceback (most recent call last):
  File "<ipython-input-26-7239fb7ebe14>", line 4, in <module>
    dist_matrix = calc_dist_matrix(model["A"], model["A"])
  File "<ipython-input-3-730a11883f27>", line 15, in calc_dist_matrix
    answer[row, col] = calc_residue_dist(residue_one, residue_two)
  File "<ipython-input-3-730a11883f27>", line 6, in calc_residue_dist
    diff_vector = residue_one["CA"].coord - residue_two["CA"].coord
  File "/Users/anaconda/lib/python3.6/site-packages/Bio/PDB/Entity.py", line 39, in __getitem__
    return self.child_dict[id]
KeyError: 'CA'
Any suggestions on how to fix this issue?

You have heteroatoms (water, ions, etc.; anything that isn't an amino acid or nucleic acid) in your structure. Remove them with:
chain = model["A"]
for residue in list(chain):  # iterate over a copy, because detaching modifies the chain
    if residue.id[0] != ' ':
        chain.detach_child(residue.id)
This removes them from the chain in the loaded structure. You may want to modify this if you want to keep the heteroatoms for further analysis.

I believe the problem is that some of the elements in model["A"] are not amino acids and therefore do not contain "CA".
To get around this, I wrote a new function which returns only the amino acid residues:
from Bio.PDB import *
chain = model["A"]
def aa_residues(chain):
    aa_only = []
    for i in chain:
        if i.get_resname() in standard_aa_names:
            aa_only.append(i)
    return aa_only
AA_1 = aa_residues(model["A"])
dist_matrix = calc_dist_matrix(AA_1, AA_1)

So I've been testing (bear in mind I know very little about Bio), and it looks like whatever is in your 1qhw.pdb file is very different from the file in that example.
pdb_code = '1qhw'
structure = Bio.PDB.PDBParser().get_structure(pdb_code, pdb_filename)
model = structure[0]
Next, to see what is in it, I did:
print(list(model))
Which gave me:
[<Chain id=A>]
Exploring this, it appears the parsed PDB structure behaves like a dict of dicts. So, using this id,
test = model['A']
gives me the next dict. This level is the level being passed to your function that is causing the error. Printing this with:
print(list(test))
This gave me a huge list of the data inside, including lots of residues and related info. But crucially, no CA. Try using this to see what's inside and modify the line:
diff_vector = residue_one["CA"].coord - residue_two["CA"].coord
to reflect what you are after, replacing CA where appropriate.
I hope this helps; it's a little tricky to be much more specific.

Another solution to obtain the contact map for a protein chain is to use the PdbParser shipped with ConKit.
ConKit is a library specifically designed to work with predicted contacts but has the functionality to extract contacts from a PDB file:
>>> from conkit.io.PdbIO import PdbParser
>>> p = PdbParser()
>>> with open("1qhw.pdb", "r") as pdb_fhandle:
...     pdb = p.read(pdb_fhandle, f_id="1QHW", atom_type="CA")
>>> print(pdb)
ContactFile(id="1QHW_0" nmaps=1)
This reads your PDB file into the pdb variable, which stores an internal ContactFile hierarchy. In this example, two residues are considered to be in contact if the participating CA atoms are within 8Å of each other.
To access the information, you can then iterate through the ContactFile and access each ContactMap, which in your case corresponds to intra-molecular contacts for chain A.
>>> for cmap in pdb:
...     print(cmap)
ContactMap(id="A", ncontacts=1601)
If you had more than one chain, there would be a ContactMap for each chain, and additional ones for inter-molecular contacts between chains.
The ContactMap for chain A contains 1601 contact pairs. You can access the Contact instances in each ContactMap by either iterating or indexing. Both work fine.
>>> print(cmap[0])
Contact(id="(26, 27)" res1="S" res1_chain="A" res1_seq=26 res2="T" res2_chain="A" res2_seq=27 raw_score=0.961895)
Each level in the hierarchy has various functions with which you could manipulate contact maps. Examples can be found here.
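The 8 Å CA-CA criterion described above can also be applied directly to a distance matrix such as the one `calc_dist_matrix` returns. A minimal NumPy sketch, using invented coordinates in place of a real chain and excluding self-pairs (whether ConKit applies exactly these conventions is not checked here):

```python
import numpy as np

# Hypothetical CA coordinates in angstroms, one row per residue.
ca_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [7.6, 0.0, 0.0],
    [20.0, 0.0, 0.0],
])

# Pairwise distances via broadcasting: result has shape (n, n).
diff = ca_coords[:, None, :] - ca_coords[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

cutoff = 8.0
contact_map = dist_matrix < cutoff
np.fill_diagonal(contact_map, False)     # a residue is not its own contact

# Each contact once, as (i, j) index pairs with i < j.
contacts = np.argwhere(np.triu(contact_map, k=1))
print(contacts)
```

The indices here are positions in the coordinate array, not PDB residue numbers; mapping back to residue numbers would need the residue list used to build the matrix.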

Calling one script from another and importing its values

I have two scripts, main.py and get_number.py. The script get_number.py returns a random number whenever it's called. I want to call this script from main.py and print all the returned values. The script get_number.py is the following:
def get_random():
    return np.random.uniform(0,1)
Now I have the following code in main.py
import get_number
n_call = 4
values = np.zeros(n_call)
for i in range(n_call):
    values[i]= get_number.get_random()
print(values)
However, I get the error "No module named get_number". How would I go about accomplishing this task?

I believe you can import it just as you would import any other library (with file1 replaced by your module name):
from file1 import *
See also the similar question "Importing variables from another file?".

You are confusing get_number (the module) with get_random (the function it defines). Note also that both files need to import numpy.
main.py:
import numpy as np
from get_number import get_random
n_call = 4
values = np.zeros(n_call)
for i in range(n_call):
    values[i]= get_random()
print(values)
Out: [ 0.63433276 0.36541908 0.83485925 0.59532567]
get_number.py:
import numpy as np
def get_random():
    return np.random.uniform(0,1)

You have to import it this way.
In main.py:
import numpy as np
from get_number import get_random
n_call = 4
values = np.zeros(n_call)
for i in range(n_call):
    values[i]= get_random()
print(values)
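"No module named get_number" generally means Python could not find `get_number.py` on its module search path. The directory of the script being run is searched automatically, so keeping both files in the same folder and running `main.py` from there is usually enough. The self-contained sketch below reproduces that setup in a temporary directory. As a simplification it uses the standard library's `random.uniform` in place of `np.random.uniform`, and it adds the directory to `sys.path` by hand only because the module is created on the fly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a stand-in get_number.py (random.uniform replaces np.random.uniform).
MODULE_SOURCE = (
    "import random\n"
    "\n"
    "def get_random():\n"
    "    return random.uniform(0, 1)\n"
)

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "get_number.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, folder)            # make the folder importable
    try:
        importlib.invalidate_caches()
        get_number = importlib.import_module("get_number")
        values = [get_number.get_random() for _ in range(4)]
    finally:
        sys.path.remove(folder)           # leave sys.path as it was

print(values)
```

In a normal project neither the temporary directory nor the `sys.path` edit is needed; a plain `import get_number` works when the two files sit side by side.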
