pandas to_numeric producing ValueError

I am unable to get to_numeric to work in the code below:
import pandas as pd
tt = ['123.00','10,614,163,994.00']
pd.to_numeric(tt)
I get the following error:
ValueError: Unable to parse string "10,614,163,994.00" at position 1
Please help.

to_numeric cannot handle , as a separator for thousands, millions, and so on.
You should preprocess tt with something like tt = [n.replace(',','') for n in tt]

The second value in tt is not a number under the strict definition of a number that most parsers use. Just remove the commas before doing the conversion.
import pandas as pd
tt = ['123.00','10,614,163,994.00']
tt = [x.replace(',','') for x in tt]  # drop the thousands separators
pd.to_numeric(tt)
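The cleanup step from the answers above can be sketched as a small helper using only the standard library. The name strip_thousands and its sep parameter are made up for this illustration and are not part of pandas.

```python
def strip_thousands(values, sep=","):
    """Return the strings with every thousands separator removed."""
    return [value.replace(sep, "") for value in values]


tt = ['123.00', '10,614,163,994.00']
cleaned = strip_thousands(tt)

# float() stands in for the numeric conversion here
numbers = [float(value) for value in cleaned]

print(cleaned)
print(numbers)
```

The cleaned list can be passed to pd.to_numeric unchanged. If the values already live in a pandas Series, Series.str.replace(',', '', regex=False) should do the same cleanup before the conversion.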

Related

rdkit.Chem.rdmolfiles.MolToMolFile(NoneType, str)

I am trying to convert SMILES (.smi) to SDF format using the RDKit Python library. I am running the following Python code:
from rdkit import Chem

def convertir_smi_sdf(file_smi):
    # each line of the input file holds "<SMILES> <name>"
    leer = [i for i in open(file_smi)]
    print(f"Total de smi: {len(leer)}")
    cont = 0
    cont_tot = []
    for i in leer:
        nom_mol = i.split()[1]
        smi_mol = i.split()[0]
        mol_smi = Chem.MolFromSmiles(smi_mol)
        Chem.MolToMolFile(mol_smi, f'{nom_mol}.sdf')
        cont += 1
        cont_tot.append(cont)
    print(f"Se ha convertido {cont_tot[-1]} smiles a SDF")
I need this to split the SMILES file into separate SDF files, one per molecule.
Any help is highly appreciated.

This kind of error usually means one thing: the SMILES you're passing in is invalid. Chem.MolFromSmiles returns None when it cannot parse or sanitize a SMILES string, and that None is what ends up in MolToMolFile(NoneType, str). In your case the culprit is the SMILES string Cl[Pt](Cl)([NH4])[NH4], which is invalid: both nitrogen atoms form 5 bonds without carrying a positive charge.
When you parse it in RDKit, it logs a warning about the valence of those nitrogen atoms.
To deal with this, either fix the SMILES manually or skip the check. To skip it, pass the argument sanitize=False as below:
mol_smi = Chem.MolFromSmiles(smi_mol, sanitize=False)
Just a warning: with sanitize=False, RDKit skips its chemistry checks for every molecule, so invalid SMILES like this one will be written out without any warning.
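A third option, not mentioned in the answer, is to keep sanitization on and skip the molecules that fail to parse. Below is a minimal sketch of that idea. The function convert_all and its parse and write parameters are hypothetical names for this illustration; with RDKit installed, Chem.MolFromSmiles and Chem.MolToMolFile would be passed in for them.

```python
def convert_all(lines, parse, write):
    """Convert each '<SMILES> <name>' line; return the names that failed.

    parse(smiles) must return None for an invalid SMILES, and
    write(mol, path) must write one molecule to the given path.
    """
    failed = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        smiles, name = parts[0], parts[1]
        mol = parse(smiles)
        if mol is None:
            # record the bad entry instead of passing None to the writer
            failed.append(name)
            continue
        write(mol, f"{name}.sdf")
    return failed


# Assumed usage with RDKit:
#   from rdkit import Chem
#   with open(file_smi) as f:
#       bad = convert_all(f, Chem.MolFromSmiles, Chem.MolToMolFile)
```

Returning the failed names makes it easy to see which SMILES need to be fixed by hand, rather than silently writing questionable structures.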

How do I store a large int in python without the newlines being counted by len()?

So I want to store a long integer which is too big for one line in Python. Do I just ignore PEP 8 and make the line longer than 120 characters? Because if I do it like this:
num="""7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843
8586156078911294949545950173795833195285320880551112540698747158523863050715693290963295227443043557
6689664895044524452316173185640309871112172238311362229893423380308135336276614282806444486645238749
3035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776
6572733300105336788122023542180975125454059475224352584907711670556013604839586446706324415722155397
5369781797784617406495514929086256932197846862248283972241375657056057490261407972968652414535100474
8216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586
1786645835912456652947654568284891288314260769004224219022671055626321111109370544217506941658960408
0719840385096245544436298123098787992724428490918884580156166097919133875499200524063689912560717606
0588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450"""
and try to access a specific index of that integer or use len() on it I get a length of 1009 instead of the 1000 digits the number actually has. And putting everything into one line would make that line 1004 characters long which doesn't seem that great either.

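I would split the number into adjacent string literals inside parentheses, for cleanliness. Python joins adjacent string literals into a single string, so no newlines end up in the value:
num = (
    '7316717653'
    '1330624919'
    '2251196744'
)
so that len(num) from the above example returns: 30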
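As a quick check of the approach above, the short example below confirms that the joined string contains no newlines and can be turned into an int directly. It uses only the three chunks shown in the answer, not the full 1000-digit number.

```python
num = (
    '7316717653'
    '1330624919'
    '2251196744'
)

print(len(num))      # 30
print('\n' in num)   # False

# The string converts to an int without any cleanup
value = int(num)
print(value % 1000)  # 744
```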

Another option you have is to put the number into another file (say number.txt) and read it at runtime:
number.txt
7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450
main.py
with open("number.txt", "r") as f:
    number = f.read().strip()  # strip() drops a trailing newline, if the file has one
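If the file itself is wrapped over several lines, the newlines come back with f.read(). A minimal sketch of a reader that drops all whitespace follows; read_digits is a hypothetical helper name, and a temporary file stands in for number.txt.

```python
import os
import tempfile


def read_digits(path):
    """Read a file of digits, dropping newlines and any other whitespace."""
    with open(path, "r") as f:
        return "".join(f.read().split())


# Demo: a temporary file stands in for number.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "number.txt")
    with open(path, "w") as f:
        f.write("7316717653\n1330624919\n")
    digits = read_digits(path)

print(len(digits))  # 20
```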

I wouldn't use this personally, but one option is to remove the newlines:
num = """
123
456
""".replace('\n', '')
print(repr(num)) # -> '123456'

There are lots of good answers already, but here's one that gives you a bit of extra convenience. You just put in a number and the size of the chunks per line, and you can reuse it for lots of long numbers if needed.
The script splits the number into chunks with a for loop and prints them as a parenthesized string literal:
x = str(7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450)
y = []
y.append("long_num = (")
chunksize = 10
for i in range(0, len(x), chunksize):
    # one quoted chunk per output line
    y.append("\t" + "\"" + x[i:i+chunksize] + "\"")
y.append(")")
for part in y:
    print(part)
This prints the following literal, which you can paste into your code as in @blhsing's answer:
long_num = (
    "7316717653"
    "1330624919"
    "2251196744"
    "2657474235"
    "5349194934"
    "9698352031"
    "2774506326"
    "2395783180"
    "1698480186"
    ...
)
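The same generator can be wrapped in a reusable function, sketched below. The name as_literal and its parameters are invented for this example. It returns the source text instead of printing it, and the exec call at the end is only there to show that the generated literal evaluates back to the original digits.

```python
def as_literal(number, name="long_num", chunksize=10):
    """Return Python source assigning the digits of `number` as a chunked string."""
    digits = str(number)
    lines = [f"{name} = ("]
    for start in range(0, len(digits), chunksize):
        lines.append(f'    "{digits[start:start + chunksize]}"')
    lines.append(")")
    return "\n".join(lines)


source = as_literal(731671765313306249192251196744)
print(source)

# Round-trip check: run the generated source and read the variable back
namespace = {}
exec(source, namespace)
print(namespace["long_num"])
```

Passing the digits as a string rather than an int keeps any leading zeros, since str() of a string returns it unchanged.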

You can take a look at this post: Is there a way to implement methods like __len__ or __eq__ as classmethods?
Simply make a class for your long integer and override its __len__ method so that it does not count \n.
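A minimal sketch of such a class is shown below. The class name DigitString is made up for illustration. Rather than skipping newlines on every call, it strips all whitespace once in the constructor, so len() and indexing both see only the digits.

```python
class DigitString:
    """Wrap a multi-line string of digits and ignore its whitespace."""

    def __init__(self, text):
        # split() with no arguments splits on any whitespace, newlines included
        self._digits = "".join(text.split())

    def __len__(self):
        return len(self._digits)

    def __getitem__(self, index):
        return self._digits[index]

    def __int__(self):
        return int(self._digits)


d = DigitString("""123
456
""")
print(len(d))  # 6
print(d[3])    # 4
print(int(d))  # 123456
```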

Written a little time delta function and want to capture results as a numpy array

I have written some code that takes two lists from data_dict, one containing opening times and one containing closing times.
The function finds the difference between these two times and returns a figure in hours (X.X hours).
If the opening and closing times in the lists are not in the correct format (00:00:00), the function returns '-1'.
It works perfectly; however, I want to be able to capture the results and save them as a numpy array.
The results print like this...
X
Y
Z
A
X
etc...
I am very very new to python and just need some guidance.
Thanks guys.
opening_time_arr = data_dict['Open']
closing_time_arr = data_dict['Close']
if len(opening_time_arr) == len(closing_time_arr):
    resultTime = []
    for idx, closing_time in enumerate(closing_time_arr):
        try:
            FORMAT = '%H:%M:%S'
            tdelta = datetime.strptime(closing_time, FORMAT) - datetime.strptime(opening_time_arr[idx], FORMAT)
            resultTime.append(tdelta)
            tdelta_h = tdelta.total_seconds()/3600
            print(tdelta_h)
        except ValueError:
            print('-1')
The function returns
8.0
8.5
6.5
7.5
and so on... there are about 250 entries.
How can I take these numbers and convert them to a numpy array, without printing the results like my code currently does?

Oliver - I think you were really close! If tdelta_h is your output in hours, then that is what you want to append to resultTime. After your for loop finishes, you can convert the list to a numpy array using np.array(), and then print the array if you want to make sure it looks OK.
Here's how I think it should look all together:
from datetime import datetime
import numpy as np
opening_time_arr = data_dict['Open']
closing_time_arr = data_dict['Close']
if len(opening_time_arr) == len(closing_time_arr):
    resultTime = []
    FORMAT = '%H:%M:%S'
    for idx, closing_time in enumerate(closing_time_arr):
        try:
            tdelta = (datetime.strptime(closing_time, FORMAT) - datetime.strptime(opening_time_arr[idx], FORMAT))
            tdelta_h = tdelta.total_seconds()/3600
            resultTime.append(tdelta_h)
        except ValueError:
            # times that are not in HH:MM:SS format are recorded as -1
            resultTime.append(-1)
    # np.array() returns a new array, so keep the result
    resultArr = np.array(resultTime)
    print(resultArr)
Hope this helps :)
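The same logic can be sketched as a function that pairs the two lists with zip instead of indexing. The name hours_between and the sample times are invented for this illustration, and the function returns a plain list so that it runs without numpy.

```python
from datetime import datetime

FORMAT = '%H:%M:%S'


def hours_between(opening_times, closing_times):
    """Return closing minus opening in hours, or -1 where a time is malformed."""
    hours = []
    for opened, closed in zip(opening_times, closing_times):
        try:
            delta = datetime.strptime(closed, FORMAT) - datetime.strptime(opened, FORMAT)
            hours.append(delta.total_seconds() / 3600)
        except ValueError:
            hours.append(-1)
    return hours


result = hours_between(
    ['09:00:00', '08:30:00', 'closed'],
    ['17:00:00', '17:00:00', '17:00:00'],
)
print(result)  # [8.0, 8.5, -1]
```

Wrapping the returned list in np.array(...) then gives the numpy array the question asks for. Note that zip stops at the shorter list, so the length check from the original code is still worth keeping if the lists might differ.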

Number formatting not working with tabulate after printing text

I'm writing this little program while learning Python and I have run into a problem. I use tabulate with number formatting set to 5 digits after the decimal separator, to make everything look nice, and it works until I put text in the table. Once text appears in a column (stating that you cannot divide by 0), the formatting on that column seems to be gone.
The code is:
if skaiciuoti == True:
    while bp <= bg:
        if (bp-a != 0):
            y = float(a / (bp - a))
            sk1.append(a)
            sk2.append(bp)
            sk3.append(y)
            bp = bp + bz
        elif (bp - a == 0):
            sk1.append(a)
            sk2.append(bp)
            sk3.append('Veiksmas negalimas (dalyba is 0)')
            bp = bp + bz
    lentele = ['A reiksme', 'B reiksme', 'Y reiksme']
    duomenys = zip(sk1, sk2, sk3)
    print(tabulate (duomenys, headers=lentele, floatfmt=".5f", tablefmt="grid"))
(The original post included two screenshots here: the table output when it works, and the output when the formatting is broken.)
I have tried formatting the number before appending it to the list, but it didn't work.
Any suggestions and ideas are welcome.

Okay, I've managed to fix it myself. It was really not that hard, I just didn't think of it.
The solution that worked for me was to format the number at the point where it is appended to the list (not in a separate step before that), like so:
sk3.append(format(y, '.5f'))
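The fix works because the column then holds only strings. As far as I can tell, tabulate applies floatfmt only to columns it detects as numeric, and a column that mixes numbers with text is treated as text. A small sketch of the same idea as a helper follows; format_cell is a hypothetical name and the sample values are made up.

```python
def format_cell(value, floatfmt=".5f"):
    """Format floats with a fixed precision and pass everything else through as text."""
    if isinstance(value, float):
        return format(value, floatfmt)
    return str(value)


sk3 = [0.5, 'Veiksmas negalimas (dalyba is 0)', -2.0]
formatted = [format_cell(v) for v in sk3]
print(formatted)
```

Applying the helper to the whole column just before calling tabulate keeps the original floats in sk3 for any later calculations.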

floatfmt can also be a list or a tuple, specifying one format for each column.
In your case, you could try e.g.:
floatfmts = ('.5f', '.5f', '.5f')
print(tabulate (duomenys, headers=lentele, floatfmt=floatfmts, tablefmt="grid"))
(See e.g. https://bitbucket.org/astanin/python-tabulate/issues/96/floatfmt-option-is-ignored-in-python3)

Python Syntax error on colon in List

I'm trying to make a time series plot, and I have data points every second for about 50 seconds of time (which in my case is in UTC). Python is yelling at me about my array of data in the x axis of my plot, which is as follows:
%run "C:/Users/Jeff/Desktop/Python/STEPS_data.py"
File "C:\Users\Jeff\Desktop\Python\STEPS_data.py", line 3
x = [23:13:51,23:13:52,23:13:53,23:13:54,23:13:55,23:13:56,23:13:57,23:13:58,23:13:59,23:14:00,23:14:01,23:14:02,23:14:03,23:14:04,23:14:05,23:14:06,23:14:07,23:14:08,23:14:09,23:14:10,23:14:11,23:14:12,23:14:13,23:14:14,23:14:15,23:14:16,23:14:17,23:14:18,23:14:19,23:14:20,23:14:21,23:14:22,23:14:23,23:14:24,23:14:25,23:14:26,23:14:27,23:14:28,23:14:29,23:14:30,23:14:31,23:14:32,23:14:33,23:14:34,23:14:35,23:14:36]
^
SyntaxError: invalid syntax
There's a bunch of other info about the plot after this, but it gets hung up on this line, where it says that I have an invalid syntax error at the first colon in the array element 23:14:23, which doesn't really make sense to me. I tried making the array its own variable x1 and just saying x = x1, but that only pushed the syntax error point back by one character.
This seems like a really stupid problem but I'm stumped.

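The problem is that : is only allowed in certain places (slices, dict literals, the end of statements like if or for), and 23:13:51 is not a valid literal. For example:
>>> a = 10:2
  File "<ipython-input-12-63c21fb7e990>", line 1
    a = 10:2
          ^
SyntaxError: invalid syntax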
I think you wanted them as strings (in strings the : are allowed):
l = ['23:13:51', '23:13:52', '23:13:53', '23:13:54', '23:13:55', '23:13:56',
'23:13:57', '23:13:58', '23:13:59', '23:14:00', '23:14:01', '23:14:02', '23:14:03',
'23:14:04', '23:14:05', '23:14:06', '23:14:07', '23:14:08', '23:14:09', '23:14:10',
'23:14:11', '23:14:12', '23:14:13', '23:14:14', '23:14:15', '23:14:16', '23:14:17',
'23:14:18', '23:14:19', '23:14:20', '23:14:21', '23:14:22', '23:14:23', '23:14:24',
'23:14:25', '23:14:26', '23:14:27', '23:14:28', '23:14:29', '23:14:30', '23:14:31',
'23:14:32', '23:14:33', '23:14:34', '23:14:35', '23:14:36']
In case you don't want to add all these '' manually just wrap the whole thing as a string and split it:
>>> l = "[23:13:51,23:13:52,23:13:53,23:13:54,23:13:55,23:13:56,23:13:57,23:13:58,23:13:59,23:14:00,23:14:01,23:14:02,23:14:03,23:14:04,23:14:05,23:14:06,23:14:07,23:14:08,23:14:09,23:14:10,23:14:11,23:14:12,23:14:13,23:14:14,23:14:15,23:14:16,23:14:17,23:14:18,23:14:19,23:14:20,23:14:21,23:14:22,23:14:23,23:14:24,23:14:25,23:14:26,23:14:27,23:14:28,23:14:29,23:14:30,23:14:31,23:14:32,23:14:33,23:14:34,23:14:35,23:14:36]"
>>> l[1:-1].split(',')
or did you want them as datetimes?
>>> import datetime
>>> [datetime.datetime.strptime(t, '%H:%M:%S') for t in l[1:-1].split(',')]
or times?
>>> [datetime.datetime.strptime(t, '%H:%M:%S').time() for t in l[1:-1].split(',')]
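Since the data points are one second apart, the list does not have to be typed out at all. The sketch below generates it from the first timestamp. The count of 46 is taken from the list in the question (23:13:51 through 23:14:36), and the variable names are my own.

```python
from datetime import datetime, timedelta

start = datetime.strptime("23:13:51", "%H:%M:%S")

# One time object per second, 46 points in total
times = [(start + timedelta(seconds=i)).time() for i in range(46)]

# The same values as strings, e.g. for axis tick labels
labels = [t.strftime("%H:%M:%S") for t in times]

print(labels[0], labels[-1])  # 23:13:51 23:14:36
```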
