Output different precision by column with pandas.DataFrame.to_csv()?

Question
Is it possible to specify a float precision specifically for each column to be printed by the Python pandas package method pandas.DataFrame.to_csv?
Background
If I have a pandas dataframe that is arranged like this:
In [53]: df_data[:5]
Out[53]:
year month day lats lons vals
0 2012 6 16 81.862745 -29.834254 0.0
1 2012 6 16 81.862745 -29.502762 0.1
2 2012 6 16 81.862745 -29.171271 0.0
3 2012 6 16 81.862745 -28.839779 0.2
4 2012 6 16 81.862745 -28.508287 0.0
There is the float_format option that can be used to specify a precision, but it applies that precision to every float column of the dataframe when printed.
When I use that like so:
df_data.to_csv(outfile, index=False,
header=False, float_format='%11.6f')
I get the following, where vals is printed with more decimal places than it actually has:
2012,6,16, 81.862745, -29.834254, 0.000000
2012,6,16, 81.862745, -29.502762, 0.100000
2012,6,16, 81.862745, -29.171270, 0.000000
2012,6,16, 81.862745, -28.839779, 0.200000
2012,6,16, 81.862745, -28.508287, 0.000000

Change the type of column "vals" prior to exporting the data frame to a CSV file
df_data['vals'] = df_data['vals'].map(lambda x: '%2.1f' % x)
df_data.to_csv(outfile, index=False, header=False, float_format='%11.6f')

The more current version of hknust's first line would be:
df_data['vals'] = df_data['vals'].map(lambda x: '{0:.1}'.format(x))
To print without scientific notation:
df_data['vals'] = df_data['vals'].map(lambda x: '{0:.1f}'.format(x))
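The difference between the two format specs can be checked in isolation. This is a minimal sketch with made-up sample values: without a type character, `.1` asks for one significant digit and may switch to exponent notation, while `.1f` always gives fixed-point output with one decimal place.

```python
samples = [0.3, 12.5, 1234.5]  # made-up values, for illustration only

# No type character: precision counts significant digits,
# so larger values fall back to exponent notation.
general = ['{0:.1}'.format(x) for x in samples]

# 'f' type: precision counts digits after the decimal point.
fixed = ['{0:.1f}'.format(x) for x in samples]

print(general)
print(fixed)
```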

This question is a bit old, but I think there is a better answer:
formats = {'lats': '{:10.5f}', 'lons': '{:.3E}', 'vals': '{:2.1f}'}
for col, f in formats.items():
    df_data[col] = df_data[col].map(lambda x: f.format(x))
I tried the solution here, but it didn't work for me, so I experimented with the previous solutions given here, combined with the one from the link above.
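If the original dataframe should keep its numeric columns, the same idea can be applied to a copy. The following is only a sketch: the sample rows, the `out` name, and the chosen format strings are assumptions for illustration, not part of the answer above.

```python
import io

import pandas as pd

# Small made-up sample in the shape of the question's data.
df_data = pd.DataFrame({
    'year': [2012, 2012],
    'lats': [81.862745, 81.862745],
    'lons': [-29.834254, -29.502762],
    'vals': [0.0, 0.1],
})

# One format string per column that needs its own precision.
formats = {'lats': '{:.6f}', 'lons': '{:.6f}', 'vals': '{:.1f}'}

# Format a copy so df_data keeps its float columns.
out = df_data.copy()
for col, fmt in formats.items():
    out[col] = out[col].map(fmt.format)

# Write to an in-memory buffer instead of a file for the demo.
buf = io.StringIO()
out.to_csv(buf, index=False, header=False)
csv_text = buf.getvalue()
print(csv_text)
```

Because the formatted columns are strings, float_format no longer affects them; it would only apply to any float columns left unformatted.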

You can use round method for dataframe before saving the dataframe to the file.
df_data = df_data.round(6)
df_data.to_csv('myfile.dat')
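The round method also accepts a dict mapping column names to decimal places, which gives per-column precision. A minimal sketch, assuming made-up sample values; note that rounding changes the stored numbers but does not pad with trailing zeros, so 0.0 is still written as 0.0 rather than 0.000000.

```python
import pandas as pd

# Made-up sample with more digits than we want to write out.
df_data = pd.DataFrame({
    'year': [2012, 2012],
    'lats': [81.8627451, 81.8627451],
    'lons': [-29.8342541, -29.5027621],
    'vals': [0.12, 0.0],
})

# Per-column number of decimals; columns not listed are left alone.
rounded = df_data.round({'lats': 6, 'lons': 6, 'vals': 1})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = rounded.to_csv(index=False, header=False)
print(csv_text)
```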

You can do this with to_string. There is a formatters argument where you can provide a dict of columns names to formatters. Then you can use some regexp to replace the default column separators with your delimiter of choice.
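A rough sketch of what that could look like, assuming made-up sample data and a comma as the target delimiter. The whitespace replacement is a simplification: it only works when no field itself contains whitespace.

```python
import re

import pandas as pd

# Made-up sample in the shape of the question's data.
df_data = pd.DataFrame({
    'year': [2012, 2012],
    'lats': [81.862745, 81.862745],
    'lons': [-29.834254, -29.502762],
    'vals': [0.0, 0.1],
})

# Map column names to one-argument formatting functions.
formatters = {
    'lats': '{:.6f}'.format,
    'lons': '{:.6f}'.format,
    'vals': '{:.1f}'.format,
}

# Render the frame as text; the dataframe itself is not modified.
text = df_data.to_string(formatters=formatters, index=False, header=False)

# Collapse the whitespace padding between columns into a single comma.
csv_lines = [re.sub(r'\s+', ',', line.strip()) for line in text.splitlines()]
print('\n'.join(csv_lines))
```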

The to_string approach suggested by @mattexx looks better to me, since it doesn't modify the dataframe.
It also generalizes well when using Jupyter notebooks to get pretty HTML output, via the to_html method. Here we set a new default precision of 4, and override it to get 5 decimal places for a particular column named wider:
import pandas as pd
from IPython.display import HTML
from IPython.display import display
pd.set_option('display.precision', 4)
display(HTML(df.to_html(formatters={'wider': '{:,.5f}'.format})))

Related

pandas: convert column with multiple datatypes to int, ignore errors

I have a column with data that needs some massaging. The column may contain strings or floats, and some strings are in exponential form. I'd like to format all data in this column as whole numbers where possible, expanding any exponential notation to an integer. Here is an example:
df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].astype(int, errors = 'ignore')
The above code does not seem to do anything. I know I can convert the exponential notation and decimals by simply using the int function, and I would think the above astype would do the same, but it does not. For example, the following code works in Python:
int(1170E1), int(1.17E+04), int(11700.0)
> (11700, 11700, 11700)
Any help in solving this would be appreciated. What I'm expecting the output to look like is:
0 '11700'
1 '11700'
2 '11700'
3 '24477G'
4 '124601'
5 '247602'

You may check with pd.to_numeric
df.code = pd.to_numeric(df.code,errors='coerce').fillna(df.code)
Out[800]:
0 11700.0
1 11700.0
2 11700.0
3 24477G
4 124601.0
5 247602.0
Name: code, dtype: object
Update
df['code'] = df['code'].astype(object)
s = pd.to_numeric(df['code'],errors='coerce')
df.loc[s.notna(),'code'] = s.dropna().astype(int)
df
Out[829]:
code
0 11700
1 11700
2 11700
3 24477G
4 124601
5 247602

BENY's answer should work, although coerce-and-fill can silently swallow conversion errors you would rather see. The function below also does the integer conversion you are looking for.
def convert(x):
    try:
        return str(int(float(x)))
    except ValueError:
        return x
df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].apply(convert)
outputs
0 11700
1 11700
2 11700
3 24477G
4 124601
5 247602
where each element is a string.
I will be the first to say, I'm not proud of that triple cast.

Formatting numbers in multiindex array Pandas

I have a dataframe that looks like this:
Admin ... Unnamed: 14
Job Family Name Values ...
Dentist McDentistFace, Dentist UDS Encounters 0.000000 ... 1.000000
Actual FTE 0.000000 ... 1.000000
UDS Encounters2 NaN ... 1475.000000
Actual FTE2 NaN ... 7.589426
Where the Job Family, Name, and Values are all dimensions of a multiindex.
I'm trying to format the float values in the file, but can't seem to get it to work. I have been able to highlight certain rows with this line:
for i in flagged_providers:
    ind = flagged_providers.index(i) * 4
    for q in i.results.keys():
        style.apply(highlight_col, axis=0, subset=(style.index[ind: ind + 4], q))
        # style.apply(format_numbers, axis=0, subset=(style.index[ind: ind + 2], q))
where format_numbers is:
def format_numbers(s):
    return f'{s:,.2f}'
and I have also tried this:
for i in flagged_providers:
    format_dict[(i.jfam, i.name)] = '{:.2f}'
    format_dict[(i.jfam, i.name)] = '{:.2f}'
style.format(formatter=format_dict)
But I can't quite seem to get it to work. Does anyone have any ideas? I want to format the first two rows as percentages, then export to Excel using the to_excel function.

I finally figured it out. There is probably a better way to do this, but what worked was:
style.applymap(lambda x: 'number-format:0.00%;', subset=(style.index[ind: ind + 2], locations))

How to convert string into datetime?

I'm quite new to Python and I'm encountering a problem.
I have a dataframe where one of the columns is the departure time of flights. These hours are given in the following format : 1100.0, 525.0, 1640.0, etc.
This is a pandas series which I want to transform into a datetime series such as : S = [11.00, 5.25, 16.40,...]
What I have tried already :
Transforming my objects into string :
S = [str(x) for x in S]
Using datetime.strptime :
S = [datetime.strptime(x,'%H%M.%S') for x in S]
But since they are not all the same format it doesn't work
Using parser from dateutil :
S = [parser.parse(x) for x in S]
I got the error :
'Unknown string format'
Using pandas to_datetime :
S= pd.to_datetime(S)
This doesn't give me the expected result.
Thanks for your answers !

Since it's a column within a dataframe (a Series), keeping it that way while transforming should work just fine.
S = [1100.0, 525.0, 1640.0]
se = pd.Series(S) # Your column
# se:
0 1100.0
1 525.0
2 1640.0
dtype: float64
setime = se.astype(int).astype(str).apply(lambda x: x[:-2] + ":" + x[-2:])
This transforms the floats to correctly formatted strings:
0 11:00
1 5:25
2 16:40
dtype: object
And then you can simply do:
df["your_new_col"] = pd.to_datetime(setime)
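One thing to watch for with the slicing above: a value with fewer than three digits, such as 5.0 for 00:05, would leave nothing before the colon. A possible variant, sketched here with a made-up extra value and hypothetical variable names, zero-pads to four digits and passes an explicit format to pd.to_datetime.

```python
import pandas as pd

# The question's sample values plus a made-up short one (00:05).
se = pd.Series([1100.0, 525.0, 1640.0, 5.0])

# Drop the decimal part and left-pad to exactly four digits: HHMM.
hhmm = se.astype(int).astype(str).str.zfill(4)

# Parse with an explicit format; the date part defaults to 1900-01-01.
times = pd.to_datetime(hhmm, format='%H%M')

# Render just the time of day as text.
labels = times.dt.strftime('%H:%M')
print(labels.tolist())
```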

How about this?
(Added an if statement since some entries have 4 digits before decimal and some have 3. Added the use case of 125.0 to account for this)
from datetime import datetime
S = [1100.0, 525.0, 1640.0, 125.0]
for x in S:
    if str(x).find(".")==3:
        x="0"+str(x)
    print(datetime.strftime(datetime.strptime(str(x),"%H%M.%S"),"%H:%M:%S"))

You might give it a go as follows:
# Just initialising a state in line with your requirements
st = ["1100.0", "525.0", "1640.0"]
dfObj = pd.DataFrame(st)
# Casting the string column to float
dfObj_num = dfObj[0].astype(float)
# Getting the hour representation out of the number
df1 = dfObj_num.floordiv(100)
# Getting the minutes
df2 = dfObj_num.mod(100)
# Moving the minutes on the right-hand side of the decimal point
df3 = df2.mul(0.01)
# Combining the two Series
df4 = df1.add(df3)
# At this point can cast to other types
Result:
0 11.00
1 5.25
2 16.40
You can run this example to verify the steps for yourself, and you can also turn it into a function. Make slight variations if needed to match your precise requirements.
It might be useful to go through this article about Pandas Series.
https://www.geeksforgeeks.org/python-pandas-series/

There must be a better way to do this, but this works for me.
df=pd.DataFrame([1100.0, 525.0, 1640.0], columns=['hour'])
df['hour_dt']=((df['hour']/100).apply(str).str.split('.').str[0]+'.'+
df['hour'].apply((lambda x: '{:.2f}'.format(x/100).split('.')[1])).apply(str))
print(df)
hour hour_dt
0 1100.0 11.00
1 525.0 5.25
2 1640.0 16.40

Python Tabulate format only one float column

I'm using the tabulate module to print a fixed width file and I have one column that I need formatted in such a way that there are 19 places to the left of the decimal and 2 places to the right of the decimal.
import pandas as pd
from tabulate import tabulate
df = pd.DataFrame.from_dict({'A':['x','y','z'],
'B':[1,1.1,11.21],'C':[34.2334,81.1,11]})
df
Out[4]:
A B C
0 x 1.00 34.2334
1 y 1.10 81.1000
2 z 11.21 11.0000
df['C'] = df['C'].apply(lambda x: format(x,'0>22.2f'))
df
Out[6]:
A B C
0 x 1.00 0000000000000000034.23
1 y 1.10 0000000000000000081.10
2 z 11.21 0000000000000000011.00
print(tabulate(df))
- - ----- -----
0 x 1 34.23
1 y 1.1 81.1
2 z 11.21 11
- - ----- -----
Is there any way I can preserve the formatting in column C without affecting the formatting in column B? I know I could use floatfmt = '0>22.2f', but I don't need column B to look that way, just column C.
According to the tabulate documentation, strings that look like decimals will be automatically converted to numeric. If I could suppress this and then format my table before printing (as in the example above), that would solve it for me as well.

The documentation at GitHub is more up-to-date and it states that with floatfmt "every column may have different number formatting". Here is an example using your data:
import pandas as pd
from tabulate import tabulate
df = pd.DataFrame.from_dict({'A':['x','yy','zzz'],
'B':[1,1.1,11.21],'C':[34.2334,81.1,11]})
print(tabulate(df, floatfmt=(None, None, '.2f', '0>22.2f',)))
The result is:
- --- ----- ----------------------
0 x 1.00 0000000000000000034.23
1 yy 1.10 0000000000000000081.10
2 zzz 11.21 0000000000000000011.00
- --- ----- ----------------------
Additionally, as you suggested, you also have the option disable_numparse, which disables the automatic conversion from string to numeric. You can then format each field manually, but this requires more coding. The option colalign may come in handy in such a case, so that you can specify different column alignment for strings and numbers (which you would have converted to formatted strings, too).

Do you absolutely need tabulate for this? You can achieve a similar effect (minus the dashes) with:
In [18]: print(df.__repr__().split('\n',1)[1])
0 x 1.00 0000000000000000034.23
1 y 1.10 0000000000000000081.10
2 z 11.21 0000000000000000011.00
df.__repr__ is the representation of df, i.e. what you see when you just type df in a cell. I then remove the header line by splitting on the first newline character and taking the second half of the split.
Also, if you write it in a machine-readable form, you might want to use tabs:
In [8]: df.to_csv(sys.stdout, sep='\t', header=False)
0 x 1.0 0000000000000000034.23
1 y 1.1 0000000000000000081.10
2 z 11.21 0000000000000000011.00
How neatly this renders depends on your tab rendering settings, but if you write the output to a file, you get actual tab characters.

How can I remove extra digits of a float64 value?

I have a data frame column.
P08107 3.658940e-11
P62979 4.817399e-05
P16401 7.784275e-05
Q96B49 7.784275e-05
Q15637 2.099078e-04
P31689 1.274387e-03
P62258 1.662718e-03
P07437 3.029516e-03
O00410 3.029516e-03
P23381 3.029516e-03
P27348 5.733834e-03
P29590 9.559550e-03
P25685 9.957186e-03
P09429 1.181282e-02
P62937 1.260040e-02
P11021 1.396807e-02
P31946 1.409311e-02
P19338 1.503901e-02
Q14974 2.213431e-02
P11142 2.402201e-02
I want to keep one decimal place and remove the extra digits, so that it looks like
3.7e-11
instead of
3.658940e-11
and so on for all the others.
I know how to slice a string but it doesn't seem to work here.

If you have a pandas dataframe you could set the display options.
import pandas as pd
import numpy as np
pd.options.display.float_format = '{:.2f}'.format
pd.DataFrame(dict(randomvalues=np.random.random_sample((5,))))
Returns:
randomvalues
0 0.02
1 0.66
2 0.24
3 0.87
4 0.63

You could use str.format:
>>> '{:.2g}'.format(3.658940e-11)
'3.7e-11'
String slicing will not work here, because it does not round the values:
>>> s = '3.658940e-11'
>>> s[:3] + 'e' + s.split('e')[1]
'3.6e-11'
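To apply this to a whole column, the format function can be mapped over the Series. This sketch uses three values from the question; the variable names are made up. One caveat worth knowing: the 'g' type switches to plain decimal notation for larger values, whereas the 'e' type always keeps exponent notation.

```python
import pandas as pd

# Three values from the question's column, with their labels.
s = pd.Series(
    [3.658940e-11, 4.817399e-05, 1.274387e-03],
    index=['P08107', 'P62979', 'P31689'],
)

# 'g' keeps two significant digits but picks the notation per value.
as_g = s.map('{:.2g}'.format)

# 'e' always uses exponent notation with one digit after the point.
as_e = s.map('{:.1e}'.format)

print(as_g.tolist())
print(as_e.tolist())
```

Both results are columns of strings, so they are suited to display or export rather than further arithmetic.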
