Converting an Isometric SMILE into its atoms and non-hydrogen neighbours


Hope whoever is reading this is well.
I have an issue with my code. I am trying to convert an isomeric SMILES string, a descriptor of a molecule, into its atomic groups and neighbours.
My code is below.
import rdkit
from rdkit import Chem
def collect_bonded_atoms(smile):
    mol = Chem.MolFromSmiles(smile)
    atom_counts = {}
    for atom in mol.GetAtoms():
        neighbors = [(neighbor.GetSymbol(), bond.GetBondType())
                     for neighbor, bond in zip(atom.GetNeighbors(), atom.GetBonds())]
        neighbors.sort()
        key = "{}-{}".format(atom.GetSymbol(), "".join(f"{symbol}{'-' if bond_order == 1 else '=' if bond_order == 2 else '#'}" for symbol, bond_order in neighbors))
        atom_counts[key] = atom_counts.get(key, 0) + 1
    return atom_counts
smile = "CC(C)(C)C(=O)O"
print(collect_bonded_atoms(smile))
And the output is:
{'C-C-': 3, 'C-C-C-C-C-': 1, 'C-C-O-O=': 1, 'O-C=': 1, 'O-C-': 1}
This works well for this molecule's SMILES, though I would prefer the output to be structured as:
{'C-C-': 3, 'C-C(-C)(-C)-C-': 1, 'C-C-O(=O)': 1, 'O=C': 1, 'O-C-': 1}
I can't figure out how to fix this. This is a side issue.
The main issue I have is when using this molecule:
smile = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"
My output is clearly wrong. This is what I get:
{'C-C-': 1, 'C-C-C-': 6, 'C-C-N-': 1, 'N-C-C#C#': 2, 'C-C#N#': 2, 'C-N#N#': 1, 'C-N-': 1, 'F-P-': 6, 'P-F-F-F-F-F-F-': 1}
First, double bonds (bond_order == 2) are shown as #. Second, the 1 in the SMILES is a ring-closure label: the atom written just before the first 1 is bonded to the atom written just before the second 1. In the output, the ring atoms seem to be all over the place.
Can I please have some guidance on this?
Thanks
I'd appreciate advice on how to fix it, or even better, a modified version of the code. The side issue isn't as important, but help with it would be welcome too.

You're getting the # because those bonds are part of an aromatic ring, so their bond type is AROMATIC rather than SINGLE or DOUBLE. GetBondType() then returns Chem.BondType.AROMATIC, which equals neither 1 nor 2 (RDKit reports the order of an aromatic bond as 1.5 through GetBondTypeAsDouble()), so all those bonds end up in the else branch.
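To see this directly, the sketch below (assuming only a standard RDKit install) inspects the bond between the two ring carbons that the question's SMILES writes as C=C. After parsing, RDKit has perceived the imidazolium ring as aromatic, so the bond type is AROMATIC even though the input said double.

```python
from rdkit import Chem

# The cation from the question: atoms 0-7 are the octyl chain, atom 8 is the
# ring N, and atoms 9 and 10 are the ring carbons written as "C=C"
mol = Chem.MolFromSmiles("CCCCCCCCN1C=C[N+](=C1)C")
bond = mol.GetBondBetweenAtoms(9, 10)

print(bond.GetBondType())          # AROMATIC
print(bond.GetBondTypeAsDouble())  # 1.5
# Neither SINGLE nor DOUBLE, so the original if/else falls through to '#'
print(bond.GetBondType() in (Chem.BondType.SINGLE, Chem.BondType.DOUBLE))  # False
```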
To fix this, you can do two things:
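Change your if-else condition to handle aromatic bonds, e.g. by checking for Chem.BondType.AROMATIC, or by switching to bond.GetBondTypeAsDouble() and testing for 1.5
Kekulize the mol object, which replaces the aromatic bonds with explicit alternating single and double bonds. You can do this by updating your code as follows: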
mol = Chem.MolFromSmiles(smile)
Chem.Kekulize(mol)
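Putting both fixes together, here is a minimal sketch of a modified collect_bonded_atoms. It kekulizes first, looks the bond symbol up in a dict (keeping ':' as a fallback for any bond that is still aromatic), and pairs each bond with its own partner atom via GetOtherAtom instead of zipping two separate lists. For the side issue it uses a SMILES-like key: the central atom, then every neighbour except the last in parentheses. That key format is an assumption about what you want and differs slightly from the one in the question (there is no trailing -).

```python
from collections import Counter

from rdkit import Chem

# Bond symbols; ':' only appears if a bond is still aromatic after kekulization
BOND_SYMBOLS = {
    Chem.BondType.SINGLE: "-",
    Chem.BondType.DOUBLE: "=",
    Chem.BondType.TRIPLE: "#",
    Chem.BondType.AROMATIC: ":",
}


def collect_bonded_atoms(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Turn aromatic bonds into explicit single/double bonds
    Chem.Kekulize(mol, clearAromaticFlags=True)
    counts = Counter()
    for atom in mol.GetAtoms():
        # Each bond is paired with the atom on its other end, so they always match
        neighbours = sorted(
            (bond.GetOtherAtom(atom).GetSymbol(),
             BOND_SYMBOLS.get(bond.GetBondType(), "~"))
            for bond in atom.GetBonds()
        )
        # SMILES-like layout: all neighbours but the last go in parentheses
        branches = "".join(f"({b}{s})" for s, b in neighbours[:-1])
        last = "".join(f"{b}{s}" for s, b in neighbours[-1:])
        counts[atom.GetSymbol() + branches + last] += 1
    return dict(counts)


print(collect_bonded_atoms("CC(C)(C)C(=O)O"))
# {'C-C': 3, 'C(-C)(-C)(-C)-C': 1, 'C(-C)(-O)=O': 1, 'O=C': 1, 'O-C': 1}
print(collect_bonded_atoms("CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"))
```

The ring-closure digit 1 never appears in the output, and that is expected: RDKit turns it into an ordinary bond between the two atoms it labels, so the ring nitrogen attached to the octyl chain simply lists its three carbon neighbours as N(-C)(-C)-C.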

Related

How do I get arrays to couple multiple numbers?

I'm making a game with a saving system in Python, but it doesn't work as intended.
thingcount = saveArray[0]
The code above is supposed to set thingcount to 5454, as shown in the saveArray:
[5454, 0, 1]
But it only sets thingcount to 5. Does anyone know how to do this?

As some of the comments have noted, if saveArray really equals [5454, 0, 1], then print(thingcount) will give your desired output of 5454.
If saveArray[0] gives you 5, then at some point in your code saveArray is being set to something like the string '5454' rather than the full list [5454, 0, 1]. Indexing a string returns its first character, whereas indexing an int would raise a TypeError, so a string is the most likely culprit.
Below are two code snippets for a comparison example:
Desired Output
saveArray = [5454, 0, 1]
thingcount = saveArray[0]
print (thingcount)
Console output:
5454
Code that will yield the output you're seeing
saveArray = '5454'
thingcount = saveArray[0]
print (thingcount)
Console output:
5
In any case, I would check what saveArray is being set to: at some point in your code it's being assigned a different value, not your full target list of [5454, 0, 1].
On getting the first element of a list in Python, here is a link to another thread that discusses it, if that is helpful:
Returning the first element python
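If saveArray comes from a save file, reading the file gives you text, and indexing text returns single characters. One way to avoid that is to store the list as JSON and parse it back when loading. This is a sketch; the save_game/load_game helpers and the file name are hypothetical, not from the question.

```python
import json
import tempfile
from pathlib import Path


def save_game(path, save_array):
    # json.dumps turns the list into text such as "[5454, 0, 1]"
    Path(path).write_text(json.dumps(save_array))


def load_game(path):
    # json.loads turns that text back into a real list of ints
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    save_file = Path(tmp) / "save.json"  # hypothetical file name
    save_game(save_file, [5454, 0, 1])
    raw_text = save_file.read_text()     # a plain read gives a string
    saveArray = load_game(save_file)     # the parsed list

print(raw_text[0])   # [   (indexing the string returns its first character)
thingcount = saveArray[0]
print(thingcount)    # 5454
```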

Python: MapMatching with OSRM

I'm looking for some help with OSRM.
First of all, I'm using PyCharm and installed the package osrm, which is a wrapper around the OSRM API.
I have a list of GPS points, but some of them are noisy, so I want to match all of them to the road they belong to (see picture).
I tried the match function as described in the documentation, with the following code:
import osrm
import polyline
chunk_size = 99
coordinates = [coordinates[i:i+chunk_size] for i in range(0, len(coordinates), chunk_size)]
osrm.RequestConfig.host = "https://router.project-osrm.org"
coord=[]
for i in coordinates:
    result = osrm.match(i, steps=False, overview="simplified")
    #print(result)
    routes = polyline.decode(result['matchings'][0]['geometry'], geojson=True)
    #print(routes)
    for j in routes:
        coord.append((j[0],j[1]))
First question: is this the right way of doing this, and is it possible to plot the result right away?
Because after that I put these coordinates into a DataFrame to plot them:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(coord, columns=['Lon','Lat']) # Lat=51.xxx Lon=12.xxx
print(df)
df = df.sort_values(by=['Lat'])
fig_route = px.line_mapbox(df, lon=df['Lon'], lat=df['Lat'],
                           width=1200, height=900, zoom=13)
fig_route.update_layout(mapbox_style='open-street-map')
fig_route.update_layout(margin={'r': 0, 't': 0, 'l': 0, 'b': 0})
fig_route.show()
And if I do that the following happens:
These points might have been matched, but look at the whole plot:
This whole plot is a mess :D
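On the messy plot: px.line_mapbox draws the line through the points in row order, and df.sort_values(by=['Lat']) reorders the rows by latitude, so any part of the route that doesn't run strictly northwards gets drawn as a zigzag. This is a likely cause rather than a confirmed one. The pandas-only sketch below uses made-up coordinates to show how the row order changes:

```python
import pandas as pd

# Hypothetical matched points in travel order: north, then back south-east
coord = [(12.370, 51.330), (12.372, 51.340), (12.380, 51.345), (12.385, 51.335)]
df = pd.DataFrame(coord, columns=["Lon", "Lat"])

in_route_order = df.index.tolist()
sorted_by_lat = df.sort_values(by=["Lat"]).index.tolist()

print(in_route_order)  # [0, 1, 2, 3]
print(sorted_by_lat)   # [0, 3, 1, 2] -> the line would jump 0 -> 3 -> 1 -> 2
```

If the matched points are already in travel order, dropping the sort_values line and calling px.line_mapbox(df, lon='Lon', lat='Lat', ...) keeps that order.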
And second question:
I have the feeling that it takes too long to run the whole code for such a "little task". It takes roughly 10 seconds for the whole code to read the points from Excel into a Parquet file (7800 GPS points), put them into a list, delete duplicates (list(set())) and make the requests. Are these 10 seconds OK, or is there a mistake in my code?
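Two notes on the second question. First, list(set(...)) does not preserve the order of the points, and the match service treats the coordinates as an ordered trace, so deduplicating with a set can itself scramble the route. dict.fromkeys keeps the first occurrence of each point in its original position (the points must be hashable, e.g. tuples). Second, with chunk_size = 99, 7800 points mean up to 79 sequential requests to the public demo server; this is an assumption, but those network round trips probably account for most of the 10 seconds. A sketch of both, with made-up points:

```python
import math


def dedupe_keep_order(points):
    # dict keys are unique and keep insertion order (Python 3.7+), unlike set()
    return list(dict.fromkeys(points))


points = [(12.37, 51.33), (12.38, 51.34), (12.37, 51.33), (12.39, 51.35)]
print(dedupe_keep_order(points))
# [(12.37, 51.33), (12.38, 51.34), (12.39, 51.35)]

# Number of match requests for 7800 points sent in chunks of 99
n_requests = math.ceil(7800 / 99)
print(n_requests)  # 79
```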
Thank you in advance for the help!

Find chiral centers rdkit

While working with some molecules and reactions, I noticed that chiral centers in SMILES may not be found after applying reactions.
What I get after applying some reactions to a molecule is this SMILES: C[C](C)[C]1[CH+]/C=C(\\C)CC/C=C(\\C)CC1
which seems to have a chiral center at carbon 3 ([C]). If I use Chem.FindMolChiralCenters(n,force=True,includeUnassigned=True), I get an empty list, which means that no chiral center is found.
The thing is that if I add an H to carbon 3 so it becomes [CH], it is recognized as a chiral center, but with an unassigned type (neither R nor S). I also tried adding Hs with Chem.AddHs(mol) and then calling Chem.FindMolChiralCenters() again, but didn't get any chiral center.
I was wondering if there is a way to recognize this chiral center even without adding an H, and to set the proper chiral tag following some kind of rules.
After applying two 1,2-hydride shifts to my initial mol (Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')) I get the SMILES mentioned above. Since I had an initial chiral tag, I want to know if there is a way to recover chirality that was lost during the reactions.
SMARTS used for the 1,2-hydride shift: [Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]
from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')
rxn = AllChem.ReactionFromSmarts('[Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]')
products = list()
for product in rxn.RunReactant(mol, 0):
    Chem.SanitizeMol(product[0])
    products.append(product[0])
print(Chem.MolToSmiles(products[0]))
After applying this reaction twice (the second time to the product of the first), I eventually get this SMILES.
Output:
'C[C](C)[C]1[CH+]/C=C(\\C)CC/C=C(\\C)CC1'
where carbon 3 is supposed to be a chiral center.
Any idea or should I report it as a bug?
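A likely explanation, not spelled out in the thread: in SMILES, a bracket atom has exactly the hydrogens written inside the brackets, so [C] with three neighbours has no hydrogen at all and RDKit treats it as a radical carbon with three substituents, which cannot be a tetrahedral stereocenter. Writing [CH] supplies the fourth substituent, which is why the atom is then flagged (as '?', since no chiral tag is set). It would also explain why Chem.AddHs does not help: there is no hydrogen on that atom to make explicit. The sketch below reproduces this on a small made-up molecule rather than the one from the question:

```python
from rdkit import Chem

# [C] with three neighbours: no hydrogen, one unpaired (radical) electron
radical = Chem.MolFromSmiles("C[C](N)O")
# [CH]: four different substituents (CH3, N, O, H)
with_h = Chem.MolFromSmiles("C[CH](N)O")

atom = radical.GetAtomWithIdx(1)
print(atom.GetTotalNumHs(), atom.GetNumRadicalElectrons())  # 0 1

# Only the [CH] version is reported as a (still unassigned) chiral center
radical_centers = Chem.FindMolChiralCenters(radical, includeUnassigned=True)
with_h_centers = Chem.FindMolChiralCenters(with_h, includeUnassigned=True)
print(radical_centers)
print(with_h_centers)  # [(1, '?')]
```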

This is not a bug. I think you don't ask for isomeric (stereo-aware) SMILES in the MolToSmiles call; in older RDKit versions its isomericSmiles argument defaulted to False. So when I try:
from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromSmiles('C/C1=C\\C[C#H]([C+](C)C)CC/C(C)=C/CC1')
rxn = AllChem.ReactionFromSmarts('[Ch:1]-[C+1:2]>>[C+1:1]-[Ch+0:2]')
products = list()
for product in rxn.RunReactant(mol, 0):
    Chem.SanitizeMol(product[0])
    products.append(product[0])
print(Chem.MolToSmiles(products[0]))
# note: ps is not defined in this snippet (presumably the products of another reaction run)
Chem.MolToSmiles(ps[0][0])
I obtained exactly the same result as you:
'C[C](C)[CH+]1CC=C(C)CCC=C(C)CC1'
'CC1=CC[CH](CCC(C)=CCC1)=C(C)C'
but when you pass True as the second argument (isomericSmiles):
Chem.MolToSmiles(ps[0][0], isomericSmiles=True)
you obtain this result:
'CC(C)=[C#H]1C/C=C(\\C)CC/C=C(\\C)CC1'
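For reference, the second positional argument of Chem.MolToSmiles is isomericSmiles, not canonical (canonical is a separate keyword that already defaults to True). A small sketch on L-alanine, a molecule not taken from the thread, shows what that flag controls:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")  # L-alanine, one stereocenter

iso = Chem.MolToSmiles(mol, isomericSmiles=True)       # keeps @/@@
flat = Chem.MolToSmiles(mol, isomericSmiles=False)     # drops stereo marks
print(iso)
print(flat)
print(Chem.FindMolChiralCenters(mol))  # [(1, 'S')]
```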

I'm trying to make a simple script that says two different two phrase lines(Python)

So, I'm just starting to program in Python and I wanted to make a very simple script that will say something like "Gabe- Hello, my name is Gabe" (just an example sentence) + "Jerry- Hello Gabe, I'm Jerry" OR "Gabe- Goodbye, Jerry" + "Jerry- Goodbye, Gabe". Here's roughly what I wrote.
answers1 = [
"James-Hello, my name is James!"
]
answers2 = [
"Jerry-Hello James, my name is Jerry!"
]
answers3 = [
"Gabe-Goodbye, Samuel."
]
answers4 = [
"Samuel-Goodbye, Gabe"
]
Jack1 = (answers1 + answers2)
Jack2 = (answers3 + answers4)
Jacks = ([Jack1,Jack2])
import random
for x in range(2):
    a = random.randint(0,2)
    print (random.sample([Jacks, a]))
I'm quite sure it's a very simple fix, but as I have just started Python (like, literally 2-3 days ago) I don't quite know what the problem is. Here's my error message:
Traceback (most recent call last):
File "C:/Users/Owner/Documents/Test Python 3.py", line 19, in <module>
print (random.sample([Jacks, a]))
TypeError: sample() missing 1 required positional argument: 'k'
If anyone could help me with this, I would very much appreciate it! Other than that, I shall be searching on ways that may be relevant to fixing this.

The problem is that sample requires a parameter k that indicates how many random samples you want to take. However, in this case it looks like you don't need sample at all, since you already have the random integer. Note that the integer should be in the range [0, 1], because the list Jacks has only two elements.
a = random.randint(0,1)
print (Jacks[a])
Or, for similar behaviour with sample (see here for an explanation), keeping in mind that it returns a one-element list rather than the element itself:
print (random.sample(Jacks,1))
Hope this helps!
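Putting that together, a complete version of the script could look like the sketch below. It swaps in random.choice, which returns one element directly, and keeps each exchange as a two-item list so both lines print in order; the variable names are taken from the question.

```python
import random

# Each conversation is a two-line exchange, using the strings from the question
Jack1 = ["James-Hello, my name is James!", "Jerry-Hello James, my name is Jerry!"]
Jack2 = ["Gabe-Goodbye, Samuel.", "Samuel-Goodbye, Gabe"]
Jacks = [Jack1, Jack2]

for _ in range(2):
    conversation = random.choice(Jacks)  # pick one exchange at random
    for line in conversation:
        print(line)
```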

random.sample([Jacks, a])
This sample call should look like
random.sample(Jacks, a)
However, I am concerned you also don't know how lists work. Can you explain why you are using lists of strings and then adding them together? I am losing you here.
If you are going to pick a pair of strings, use the method described by Florian (accessing the list by index).

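The k parameter tells random.sample how many samples you need, and it cannot be larger than the list you sample from. [Jacks, a] has only two elements, so random.sample([Jacks, a], 3) would raise ValueError: Sample larger than population or is negative. You could write:
print (random.sample([Jacks, a], 2))
which means you want 2 samples from your list. The output will be both elements in random order, i.e. either [Jacks, a] or [a, Jacks], printed with their actual values.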

How to get match result by given range using regular expression?

I'm stuck trying to get all matches within a given range. My data sample is:
comment
0 [intj74, you're, whipping, people, is, a, grea...
1 [home, near, kcil2, meniaga, who, intj47, a, l...
2 [thematic, budget, kasi, smooth, sweep]
3 [budget, 2, intj69, most, people, think, of, e...
I want to get this result (where the given range is intj1 to intj75):
comment
0 [intj74]
1 [intj47]
2 [nan]
3 [intj69]
My code is:
df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74'])
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]
I'm not sure how to use a regular expression to match a range instead of t=='intj74'. Or is there another way to do this?
Thanks in advance,
Pandas Python Newbie

You could replace [t for t in x if t=='intj74'] with, e.g.,
[t for t in x if re.match('intj[0-9]+$', t)]
or even
[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]
which also handles the case where there are no matches (so you don't need to check for it explicitly with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]). The "trick" here is that an empty list evaluates to False, so in that case the or returns its right operand.
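Note that intj[0-9]+$ accepts any intj number, not only intj1 to intj75. One way to enforce the range (a sketch, not the only option) is to capture the number with a group and compare it as an integer, which is easier to read than a regex that spells out 1 to 75 digit by digit. intj_in_range is a hypothetical helper, the sample rows are shortened from the question, and intj99 is a made-up token added to show the exclusion:

```python
import re

import numpy as np
import pandas as pd

# "intj" followed by digits; group 1 captures the number
TOKEN = re.compile(r"intj(\d+)")


def intj_in_range(tokens, low=1, high=75):
    """Keep tokens intj<low>..intj<high>; return [nan] when none match."""
    hits = []
    for t in tokens:
        m = TOKEN.fullmatch(t)
        if m and low <= int(m.group(1)) <= high:
            hits.append(t)
    return hits or [np.nan]


df = pd.DataFrame({"comment": [
    ["intj74", "you're", "whipping"],
    ["home", "near", "kcil2", "intj47"],
    ["thematic", "budget", "kasi"],
    ["budget", "2", "intj69", "intj99"],
]})
df["comment"] = df["comment"].apply(intj_in_range)
print(df)
```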

I am new to pandas as well. You may have initialized your DataFrame differently (here each comment is a single string rather than a list of words), but this is what I have:
import pandas as pd
data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
df = pd.DataFrame(data)
# str.extract returns the first intj<number> in each string, or NaN if there is none
print(df.comment.str.extract(r'(intj\d+)'))
