Converting object to float loses too much precision - pandas - python

I'm trying to plot a DataFrame using pandas but it's not working (see this similar thread for details). I think part of the problem might be that my DataFrame seems to be made of objects:
>>> df.dtypes
Field object
Moment object
Temperature object
However, if I were to convert all the values to type float, I lose a lot of precision. All the values in column Moment are of the form -132.2036E-06 and converting to float with df1 = df.astype(float) changes it to -0.000132.
Anyone know how I can preserve the precision?

Converting to float does not actually lose precision here; float64 keeps roughly 15-17 significant digits, and only the default display truncates the value. You can change the displayed precision:
In [1]: df = pd.DataFrame(np.random.randn(5, 2))
In [2]: df
Out[2]:
0 1
0 0.943371 0.171686
1 1.508525 0.005589
2 -0.764565 0.259490
3 -1.059662 -0.837602
4 0.561804 -0.592487
[5 rows x 2 columns]
In [3]: pd.set_option('display.precision',12)
In [4]: df
Out[4]:
0 1
0 0.94337126946 0.17168604324
1 1.50852519105 0.00558907755
2 -0.76456509501 0.25948965731
3 -1.05966206139 -0.83760201886
4 0.56180449801 -0.59248656304
[5 rows x 2 columns]
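The point above can be checked directly with a value like the asker's: converting the object column to float keeps all the digits, and only the default display rounds it. A minimal sketch:

```python
import pandas as pd

# A value like the asker's, stored as a string (object dtype)
df = pd.DataFrame({"Moment": ["-132.2036E-06"]})
df = df.astype(float)

# The full value survives the conversion; only the display rounds it
print(df["Moment"].iloc[0])        # -0.0001322036
pd.set_option("display.precision", 12)
print(df)
```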

Related

DataFrame.astype() errors parameter

astype raises a ValueError when using a dict of columns.
I am trying to convert the type of a sparse column in a big DataFrame (from float to int). My problem is with the NaN values: they are not ignored when using a dict of columns, even if the errors parameter is set to 'ignore'.
Here is a toy example:
t = pd.DataFrame([[1.01, 2], [3.01, 10], [np.NaN, 20]])
t.astype({0: int}, errors='ignore')
ValueError: Cannot convert non-finite values (NA or inf) to integer
You can use the new nullable integer dtype in pandas 0.24.0+. You'll first need to convert any floats that aren't exactly equal to integers to be equal to integer values (e.g. rounding, truncating, etc.) before using astype:
In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.24.2'
In [2]: t = pd.DataFrame([[1.01, 2],[3.01, 10], [np.NaN, 20]])
In [3]: t.round().astype('Int64')
Out[3]:
0 1
0 1 2
1 3 10
2 NaN 20
Try this:
t.astype('int64', copy=False, errors='ignore')
Will output:
0 1
0 1.01 2
1 3.01 10
2 NaN 20
As per the docs, errors='ignore' returns the original object when the conversion raises, which is why the floats come back unchanged.
UPDATE:
t=pd.DataFrame([[1.01,2],[3.01, 10], [np.NaN,20]],
columns=['0', '1'])
t.astype({'0': 'int64', '1': 'int64'}, errors='ignore')
I also tried adding column names to your dataset, but without success. It may be a notation quirk, a bug, or a problem with the in-place copy.
Try this:
out = t.fillna(99999).astype(int)
final = out.replace(99999, 'Nan')
Output:
0 1
0 1 2
1 3 10
2 Nan 20
Try:
t_new = t.mask(t.notnull(), t.values.astype(int))
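If you want per-column control without sentinel values or masking, a sketch using the nullable Int64 dtype (pandas 0.24+) inside an astype dict, rounding first since non-integer floats cannot be cast safely:

```python
import numpy as np
import pandas as pd

t = pd.DataFrame([[1.01, 2], [3.01, 10], [np.nan, 20]])

# Round first: casting non-integer floats straight to Int64 raises.
# Column 0 becomes nullable Int64 (NaN -> <NA>); column 1 stays int64.
out = t.round().astype({0: "Int64"})
print(out.dtypes)
```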

How to edit display precision for only one dataframe pandas

I would like to edit the display precision for a specific dataframe.
Now I saw people stating that you could use something like this:
pd.set_option('precision', 5)
However, how do you make sure that only one specific dataframe uses this precision, and the other remain as they were?
Also, is it possible to alter this precision for specific columns only?
One way is to format the float column as strings:
np.random.seed(2019)
df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
print (df)
a b c
0 0.903482 0.393081 0.623970
1 0.637877 0.880499 0.299172
2 0.702198 0.903206 0.881382
3 0.405750 0.452447 0.267070
4 0.162865 0.889215 0.148476
df['a'] = df['a'].map("{:,.15f}".format)
print (df)
a b c
0 0.903482214419274 0.393081 0.623970
1 0.637877401022227 0.880499 0.299172
2 0.702198270186552 0.903206 0.881382
3 0.405749797979913 0.452447 0.267070
4 0.162864870291925 0.889215 0.148476
print (df.dtypes)
a object
b float64
c float64
dtype: object
This method will not affect the original dataframe.
Alternatively, the Styler changes only how that one DataFrame is displayed:
df.style.set_precision(5)
For one column you can use:
df.style.format({'Column_Name': "{:.5f}"})
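If you just need a one-off rendering at a given precision, without touching the data or the global option, to_string also accepts a float_format callable. A minimal sketch using the same seeded frame as above:

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

# Formats only this one rendering; df itself and the global
# display.precision option are left untouched.
print(df.to_string(float_format="{:.5f}".format))
```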

Pandas rounds number to 0

I'm trying to assign a value to a cell, yet Pandas rounds it to zero. (I'm using Python 3.6)
in: df['column1']['row1'] = 1 / 331616
in: print(df['column1']['row1'])
out: 0
But if I try to assign this value to a standard Python dictionary key, it works fine.
in: {'column1': {'row1': 1/331616}}
out: {'column1': {'row1': 3.0155360416867704e-06}}
I've already done this, but it didn't help:
pd.set_option('precision', 50)
pd.set_option('chop_threshold', .00000000005)
Please, help.
pandas appears to be presuming that your datatype is an integer (int).
There are several ways to address this, either by setting the datatype to a float when the DataFrame is constructed OR by changing (or casting) the datatype (also referred to as a dtype) to a float on the fly.
setting the datatype (dtype) during construction:
>>> import pandas as pd
In making this simple DataFrame, we provide a single example value (1) and the columns for the DataFrame are defined as containing floats during creation
>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=float)
>>> df['column1']['row1'] = 1 / 331616
>>> df
column1
row1 0.000003
converting the datatype on the fly:
>>> df = pd.DataFrame([[1]], columns=['column1'], index=['row1'], dtype=int)
>>> df['column1'] = df['column1'].astype(float)
>>> df['column1']['row1'] = 1 / 331616
>>> df
column1
row1 0.000003
Your column's datatype is most likely set to int. You'll need to convert it either to float or to the mixed-type object dtype before assigning the value:
df = pd.DataFrame([1,2,3,4,5,6])
df.dtypes
# 0 int64
# dtype: object
df[0][4] = 7/125
df
# 0
# 0 1
# 1 2
# 2 3
# 3 4
# 4 0
# 5 6
df[0] = df[0].astype('O')
df[0][4] = 7 / 22
df
# 0
# 0 1
# 1 2
# 2 3
# 3 4
# 4 0.318182
# 5 6
df.dtypes
# 0 object
# dtype: object
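Both answers assign via chained indexing (df['column1']['row1'] = ...), which can also trigger SettingWithCopyWarning. A sketch of the same fix using .loc together with an explicit float dtype:

```python
import pandas as pd

df = pd.DataFrame([[1]], columns=["column1"], index=["row1"], dtype=float)

# .loc writes to the frame directly, avoiding chained-indexing pitfalls
df.loc["row1", "column1"] = 1 / 331616
print(df.loc["row1", "column1"])   # 3.0155360416867704e-06
```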

Python pandas correlation corr() TypeError: Could not compare ['pearson'] with block values

one = pd.DataFrame(data=[1,2,3,4,5], index=[1,2,3,4,5])
two = pd.DataFrame(data=[5,4,3,2,1], index=[1,2,3,4,5])
one.corr(two)
I think it should return a float (-1.0), but instead it generates the following error:
TypeError: Could not compare ['pearson'] with block values
Thanks in advance for your help.
pandas.DataFrame.corr computes pairwise correlation between the columns of a single data frame. What you need here is pandas.DataFrame.corrwith:
>>> one.corrwith(two)
0 -1
dtype: float64
You are operating on a DataFrame when you should be operating on a Series.
In [1]: import pandas as pd
In [2]: one = pd.DataFrame(data=[1,2,3,4,5], index=[1,2,3,4,5])
In [3]: two = pd.DataFrame(data=[5,4,3,2,1], index=[1,2,3,4,5])
In [4]: one
Out[4]:
0
1 1
2 2
3 3
4 4
5 5
In [5]: two
Out[5]:
0
1 5
2 4
3 3
4 2
5 1
In [6]: one[0].corr(two[0])
Out[6]: -1.0
Why subscript with [0]? Because that is the name of the column in the DataFrame, since you didn't give it one. When you reference a column in a DataFrame, it will return a Series, which is 1-dimensional. The documentation for this function is here.
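To double-check the expected value, the same Pearson correlation can be cross-checked against NumPy directly; a quick sketch:

```python
import numpy as np
import pandas as pd

one = pd.DataFrame(data=[1, 2, 3, 4, 5], index=[1, 2, 3, 4, 5])
two = pd.DataFrame(data=[5, 4, 3, 2, 1], index=[1, 2, 3, 4, 5])

# Series.corr on the single columns, verified against np.corrcoef
r = one[0].corr(two[0])
r_np = np.corrcoef(one[0], two[0])[0, 1]
print(r, r_np)
```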

Pandas get_dummies to output dtype integer/bool instead of float

I would like to know if I can ask the get_dummies function in pandas to output the dummies DataFrame with a dtype lighter than the default float64.
So, for a sample dataframe with categorical columns:
In []: df = pd.DataFrame([('blue','wood'),('blue','metal'),('red','wood')],
columns=['C1','C2'])
In []: df
Out[]:
C1 C2
0 blue wood
1 blue metal
2 red wood
after getting the dummies, it looks like:
In []: df = pd.get_dummies(df)
In []: df
Out[]:
C1_blue C1_red C2_metal C2_wood
0 1 0 0 1
1 1 0 1 0
2 0 1 0 1
which is perfectly fine. However, by default the 1's and 0's are float64:
In []: df.dtypes
Out[]:
C1_blue float64
C1_red float64
C2_metal float64
C2_wood float64
dtype: object
I know I can change the dtype afterwards with astype:
In []: df = pd.get_dummies(df).astype(np.int8)
But I don't want to have the dataframe with floats in memory, because I am dealing with a big dataframe (from a csv of about ~5Gb). I would like to have the dummies directly as integers.
There is an open issue w.r.t. this, see here: https://github.com/pydata/pandas/issues/8725
The float issue is now solved. Since pandas version 0.19, pd.get_dummies returns the dummy-encoded columns as small integers.
See: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#get-dummies-now-returns-integer-dtypes
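In more recent pandas you can also request the dtype directly through get_dummies's dtype parameter (added in 0.23), so no intermediate float frame is materialized at all; a sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([("blue", "wood"), ("blue", "metal"), ("red", "wood")],
                  columns=["C1", "C2"])

# dtype= controls the dummy columns' type directly
dummies = pd.get_dummies(df, dtype=np.int8)
print(dummies.dtypes)
```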
