Find chiral centers rdkit - python

Working with some molecules and reactions, it seems that chiral centers in smiles may not be found after applying reactions.
What I get after applying some reactions on a molecule is this smile: C[C](C)[C]1[CH+]/C=C(\\C)CC/C=C(\\C)CC1
which actually seems to a have a chiral center in carbon 3 [C]. If I use Chem.FindMolChiralCenters(n,force=True,includeUnassigned=True) I get an empty list which means that there is no chiral center.
The thing is that if I add H to that Carbon 3 so it becomes [CH] it is recognized as chiral center but with unassigned type (R or S). I tried adding Hs using Chem.AddHs(mol) and then try again Chem.FindMolChiralCenters() but didn't get any chiral center.
I was wondering if there is a way to recognize this chiral center even if they are not added H and to set the proper chiral tag following some kind of rules.
Afer applying two 1,2 hydride shift to my initial mol (Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')) I get the smiles mentioned before. So given that I had some initial chiral tag I want to know if there is a way to recover lost chirality after reactions.
smarts used for 1,2 hydride shift: [Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]
mol = Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')
rxn = AllChem.ReactionFromSmarts('[Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]')
products = list()
for product in rxn.RunReactant(mol, 0):
Chem.SanitizeMol(product[0])
products.append(product[0])
print(Chem.MolToSmiles(products[0]))
After applying this reaction twice to the product created I eventually get this smile.
Output:
'C[C](C)[C]1[CH+]/C=C(\\C)CC/C=C(\\C)CC1'
which actually is where it is supposed to be a chiral center in carbon 3
Any idea or should I report it as a bug?

This is not a bug. I think you don't specify that you want a canonical smiles in the MolToSmiles function. So when I try:
mol = Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')
rxn = AllChem.ReactionFromSmarts('[Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]')
products = list()
for product in rxn.RunReactant(mol, 0):
Chem.SanitizeMol(product[0])
products.append(product[0])
print(Chem.MolToSmiles(products[0]))
Chem.MolToSmiles(ps[0][0])
I obtained exactly the same result as you:
'C[C](C)[CH+]1CC=C(C)CCC=C(C)CC1'
'CC1=CC[CH](CCC(C)=CCC1)=C(C)C'
but when you use this one:
Chem.MolToSmiles(ps[0][0], True)
You can obtain this result:
'CC(C)=[C#H]1C/C=C(\\C)CC/C=C(\\C)CC1'

Related

Can't convert molecule to fingerprint with rdkit

I'm trying to convert molecular smiles into fingerprints using rdkit. I have two smiles:
Nc1cccc(N)n1 and Nc1cc(CSc2ccc(O)cc2)cc(N)n1. The first one was expanded into the second one. In other words, the second molecule contains the first one in its structure.
What I did was use rdkit to remove the common part to obtain smiles of a fragment that differs (CSC1=CC=C(O)C=C1 in kekulized form). I'm trying to convert that fragment into a molecule and then to a fingerprint to calculate similarity with a reference molecule.
Desired transformation
But I get an error: 'Can't kekulize atoms' with indices of those atoms. This is strange to me because all the smiles (the two input smiles and the resulting fragment smiles) can be easily visualized using MarvinSketch or Chemdraw (software for drawing molecules). I even had Marvin kekulize the fragment smiles and tried making a molecule from that but I still get the same error. Here is my code for removing the fragment:
def remove_initial_fragment(mol_smiles, fragment_smiles):
mol = Chem.MolFromSmiles(mol_smiles) #creates molecule from the longer smiles
fragment = Chem.MolFromSmiles(fragment_smiles) #the molecule I want to remove
rm = AllChem.DeleteSubstructs(mol, fragment) #creates new molecule
return Chem.MolToSmiles(rm) #converts the mol I want back into smiles
smiles_frags = [remove_initial_fragment(x, fragment_smiles) for x in smiles]
mols_frags = [Chem.MolFromSmiles(x) for x in smiles_frags]
In my case, the 'fragment_smiles' is the same for all selected smiles.
But then I get an error when trying to convert molecules from the 'mols_frags' list into fingerprints:
MFP_2 = [AllChem.GetMorganFingerprintAsBitVect(x, 2) for x in mols_frags]
I tried looking online for answers but nothing really helped. I even tried to create kekulized smiles separately and passing them directly as input for creating the fingerprints but I still get the same error.
It's super weird to me because when I try to do the same process with the same code for one set of smiles (fragment, longer smiles, resulting smiles), it works without a problem and I can create the fingerprint without any error. But it seems to me that once I input the smiles/molecules as a list, I get the error. Any idea why this could be? Or do you see any error in my code that I'm unaware of?
With fragment_smiles = 'Nc1cccc(N)n1' and a list like smiles = ['Nc1cc(CSc2ccc(O)cc2)cc(N)n1', 'Nc1cc(COc2ccc(O)cc2)cc(N)n1']. I have no problem getting a fingerprint.
It looks as if, after deleting the substructure, there are some smiles_frags that are not correct SMILES.
To prove wich SMILES in the list gives the problem you can use
from rdkit.Chem import AllChem as Chem
fragment = Chem.MolFromSmiles('Nc1cccc(N)n1')
smiles = ['Nc1cc(CSc2ccc(O)cc2)cc(N)n1', 'Nc1cc(COc2ccc(O)cc2)cc(N)n1', 'CC1=CC=Cc2c(N)nc(N)cc12']
for smi in smiles:
try:
mol = Chem.MolFromSmiles(smi)
f1 = Chem.DeleteSubstructs(mol, fragment)
f2 = Chem.MolFromSmiles(Chem.MolToSmiles(f1))
fp = Chem.GetMorganFingerprintAsBitVect(f2, 2)
except:
print('SMILES:', smi)
f = Chem.DeleteSubstructs(mol, fragment)
print('smiles_frag:', Chem.MolToSmiles(f1))
This will give:
SMILES: CC1=CC=Cc2c(N)nc(N)cc12
smiles_frag: ccccC

Random ultrametric trees

I've implemented a program on python which generates random binary trees. So now I'd like to assign to each internal node of the tree a distance to make it ultrametric. Then, the distance between the root and any leaves must be the same. If a node is a leaf then the distance is null. Here is a node :
class Node() :
def __init__(self, G = None , D = None) :
self.id = ""
self.distG = 0
self.distD = 0
self.G = G
self.D = D
self.parent = None
My idea is to set the distance h at the beginning and to decrease it as an internal node is found but its working only on the left side.
def lgBrancheRand(self, h) :
self.distD = h
self.distG = h
hrandomD = round(np.random.uniform(0,h),3)
hrandomG = round(np.random.uniform(0,h),3)
if self.D.D is not None :
self.D.distD = hrandomD
self.distD = round(h-hrandomD,3)
lgBrancheRand(self.D,hrandomD)
if self.G.G is not None :
self.G.distG = hrandomG
self.distG = round(h-hrandomG,3)
lgBrancheRand(self.G,hrandomG)
In summary, you would create random matrices and apply UPGMA to each.
More complete answer below
Simply use the UPGMA algorithm. This is a clustering algorithm used to resolve a pairwise matrix.
You take the total genetic distance between two pairs of "taxa" (technically OTUs) and divide it by two. You assign the closest members of the pairwise matrix as the first 'node'. Reformat the matrix so these two pairs are combined into a single group ('removed') and find the next 'nearest neighbor' ad infinitum. I suspect R 'ape' will have a ultrametric algorhithm which will save you from programming. I see that you are using Python, so BioPython MIGHT have this (big MIGHT), personally I would pipe this through a precompiled C program and collect the results via paup that sort of thing. I'm not going to write code, because I prefer Perl and get flamed if any Perl code appears in a Python question (the Empire has established).
Anyway you will find this algorhithm produces a perfect ultrametric tree. Purests do not like ultrametric trees derived throught this sort of algorithm. However, in your calculation it could be useful because you could find the phylogeny from real data , which is most "clock-like" against the null distribution you are producing. In this context it would be cool.
You might prefer to raise the question on bioinformatics stackexchange.

Huge Difference in result when calculating distance b/w two points using longitude-latitude

I am attempting to calculate the distance between two Positions using SRID(32148)
Here are my two points
Point-1:
Lat:46.489767 Long:-119.043221
Point-2:
Lat:47.610902 Long:-122.336422
This website states that the distance in miles b/w these two points is 173.388 however from the code below the result I am getting is 0.002161632093865483
This is the code that I am using
employeeloc = modelEmployee.objects.filter(user__first_name="adam")[0].location
employerloc = modelEmployer.objects.filter(user__first_name="adam")[0].location
meters = employerloc.distance(employeeloc)
#Caluclate in miles
dobj = Distance(m=meters)
mi = dobj.mi
This is a little more detail with debugging results attached
Any suggestions on why my result is so different ?
Update:
I tried transforming the position using the following code using SRID 4326. However the results are still incorrect
You appear to have used the lon / lat coordinates as SRID(32148) ones; you need to transform them.
This incorrect query gives your result 3.47m, because the coordinates don't match the SRID:
select
st_distance(
st_setsrid(st_point(-122.336422,47.610902),32148),
st_setsrid(st_point(-119.043221,46.489767),32148))
-- 3.47880964046985
This query gives you the 173.71 mi result you expect:
select
st_distance(
st_transform(st_setsrid(st_point(-122.336422,47.610902),4326),32148),
st_transform(st_setsrid(st_point(-119.043221,46.489767),4326),32148))
--279558.106935732m (=173.71mi)
And that is similar to the result of this query:
select
st_distance(
st_setsrid(st_point(-122.336422,47.610902),4326)::geography,
st_setsrid(st_point(-119.043221,46.489767),4326)::geography)
--279522.55326056 m (= 173.69 mi)

Batch-constraining objects (feathers to a wing)

really not long ago I had my first dumb question answered here so... there I am again, with a hopefully less dumb and more interesting headscratcher. Keep in my mind I am still making my baby steps in scripting !
There it is : I need to rig a feathered wing, and I already have all the feathers in place. I thought of mimicking another rig I animated recently that had the feathers point-constrained to the arm and forearm, and orient-constrained to three other controllers on the arm : each and every feather was constrained to two of those controllers at a time, and the constraint's weights would shift as you went down the forearm towards the wrist, so that one feather perfectly at mid-distance between the elbow and the forearm would be equally constrained by both controllers... you get the picture.
My reasoning was as follows : let's make a loop that iterates over every feather, gets its world position, finds the distance from that feather to each of the orient controllers (through Pythagoras), normalize that and feed the values into the weight attribute of an orient constraint. I could even go the extra mile and pass the normalized distance through a sine function to get a nice easing into the feathers' silhouette.
My pseudo-code is ugly and broken, but it's a try. My issues are inlined.
Second try !
It works now, but only on active object, instead of the whole selection. What could be happening ?
import maya.cmds as cmds
# find world space position of targets
base_pos = cmds.xform('base',q=1,ws=1,rp=1)
tip_pos = cmds.xform('tip',q=1,ws=1,rp=1)
def relative_dist_from_pos(pos, ref):
# vector substract to get relative pos
pos_from_ref = [m - n for m, n in zip(pos, ref)]
# pythagoras to get distance from vector
dist_from_ref = (pos_from_ref[0]**2 + pos_from_ref[1]**2 + pos_from_ref[2]**2)**.5
return dist_from_ref
def weight_from_dist(dist_from_base, dist_to_tip):
normalize_fac = (1/(dist_from_base + dist_to_tip))
dist_from_base *= normalize_fac
dist_to_tip *= normalize_fac
return dist_from_base, dist_to_tip
sel = cmds.ls(selection=True)
for obj in sel:
# find world space pos of feather
feather_pos = cmds.xform(obj, q=1, ws=1, rp=1)
# call relative_dist_from_pos
dist_from_base = relative_dist_from_pos(feather_pos, base_pos)
dist_to_tip = relative_dist_from_pos(feather_pos, tip_pos)
# normalize distances
weight_from_dist(dist_from_base, dist_to_tip)
# constrain the feather - weights are inverted
# because the smaller the distance, the stronger the constraint
cmds.orientConstraint('base', obj, w=dist_to_tip)
cmds.orientConstraint('tip', obj, w=dist_from_base)
There you are. Any pointers are appreciated.
Have a good night,
Hadriscus

2 different blocks of text are merging together. Can I separate them if i know what 1 is?

I've used a number of pdf-->text methods to extract text from pdf documents. For one particular type of PDF I have, neither pyPDF or pdfMiner are doing a good job extracting the text. However, http://www.convertpdftotext.net/ does it (almost) perfectly.
I discovered that the pdf I'm using has some transparent text in it, and it is getting merged into the other text.
Some examples of the blocks of text I get back are:
12324 35th Ed. 01-MAR-12 Last LNM: 14/12 NAD 83 14/12 Corrective Object of Corrective
ChartTitle: Intracoastal Waterway Sandy Hook to Little Egg Harbor Position
C HAActRionT N Y -NJ - S A N D Y H OO K ATcO tionLI T TLE EGG HARBOR. Page/Side: N/A
(Temp) indicates that the chart correction action is temporary in nature. Courses and bearings are givCGenD 0in 1 degrees clockwise from 000 true.
Bearings RoEf LlighOCtAT seEc tors aSrehre towwsbuardry th Re ivligher Ct fhroanmn seel Lawighartde.d B Theuoy 5no minal range of lights is expressedf roin mna 4u0tic-24al -mi46les.56 0(NNM ) unless othe0r7w4is-00e n-o05te.d8.8 0 W
to 40-24-48.585N 074-00-05.967W
and
12352 33rd Ed. 01-MAR-11 Last LNM: 03/12 NAD 83 04/12 . . l . . . . Corrective Object of Corrective ChartTitle: Shinnecock Bay to East Rockaway Inlet Position C HAActRionT S H IN N E C OC K B A Y TO AcEtionAS T ROCKAWAY INLET. Page/Side: N/A (Temp) indicates that the chart correction action is temporary in nature. Courses and bearings are givCGenD 0in 1 degrees clockwise from 000 true. (BTeeamringp) s DoEf LlighETtE s ectors aSretat toew Baoratd Ctheh anlighnet lf Droaym beseacoawanr 3d. The nominal range of lights is expressedf roin mna 4u0tic-37al -mi11les.52 0(NNM ) unless othe0r7w3is-29e n-5o3te.d76. 0 W
and I have discovered that the "ghost text" is ALWAYS the following:
Corrective Object of Corrective Position
Action Action
(Temp) indicates that the chart correction action is temporary in nature. Courses and bearings are given in degrees clockwise from 000 true.
Bearings of light sectors are toward the light from seaward. The nominal range of lights is expressed in nautical miles (NM) unless otherwise noted.
In the 2nd example I posted, the text I want (with the ghost text removed) is:
12352 33rd Ed. 01-Mar-11 Last LNM:03/12 NAD 83 04/12
Chart Title:Shinnecock Bay to East Rockaway Inlet. Page/Side:N/A
CGD01
(Temp) DELETE State Boat Channel Daybeacon 3 from 40-37-11.520N 073-29-53.760W
This problem occurs just once per document, and does not appear to be totally consistent (as seen above). I am wondering if one of you wizards could think of a way to remove the ghosted text (I don't need/want it) using python. If I had been using pyPDF, I would have used a regex to rip it out during the conversion to text. Unfortunately, since I'm starting out with a text file from the website listed above, the damage has already been done. I'm at a bit of a loss.
Thanks for reading.
EDIT:
The solution to this problem looks like it be more complex than the rest of the application, so I'm going to withdraw my request for help.
I very much appreciate the thought put into it by those who have contributed.
Given that the ghost text can be split up in seemingly unpredictable ways, I don't think there is a simple automatic way of removing it that would not have false positives. What you need is almost human-level pattern recognition. :-)
What you could try is exploiting the format of these kinds of messages. Roughly;
<number> <number>[rn]d Ed. <date> Last LNM:<mm>/<yy> NAD <date2>
Chart Title:<text>. Page/Side:<N/A or number(s)> CGD<number> <text>
<position>
Using this you could pluck out the nonsense from the predictable elements, and then if you have a list of chart names ('Shinnecock Bay to East Rockaway Inlet') and descriptive words (like 'State', 'Boat', 'Daybeacon') you might be able to reconstruct the original words by finding the smallest levenshtein distance between mangled words in the two text blocks and those in your word lists.
If you can install the poppler software, you could try and use pdftotext with the -layout option to keep the formatting from the original PDF as much as possible. That might make your problem disappear.
You could recursively find all possible ways that your Pattern
"Corrective Object of Corrective Position Action ..." can be contained within your mangled text,
Then you can unmangle the text for each of these possible paths, run some sort of spellcheck over them, and choose the one with the fewest spelling mistakes. Or since you know roughly where each substring should appear, you can use that as a heuristic.
Or you could simply use the first path.
some pseudocode (untested):
def findPaths(mangledText, pattern, path)
if len(pattern)==0: # end of pattern
return [path]
else:
nextLetter= pattern[0]
locations = findAllOccurences (mangledText, nextLetter) # get all indices in mangledText that contain nextLetter
allPaths = []
for loc in locations:
paths = findPaths( mangledText[loc+1:], pattern[1:], path + (loc,) )
allPaths.Extend(paths)
return allPaths # if no locations for the next letters exist, allPaths will be emtpy
Then you can call it like this (optionally remove all spaces from your search pattern, unless you are certain they are all included in the mangled text)
allPossiblePaths = findPaths ( YourMangledText, "Corrective Object...", () )
then allPossiblePaths should contain a list of all possible ways your pattern could be contained in your mangled text.
Each entry is a tuple with the same length as the pattern, containing the index at which the corresponding letter of the pattern occurs in the search text.

Categories

Resources