Avoiding Excel's Scientific Notation Rounding when Parsing with Pandas - python

I have an Excel file, produced automatically, that occasionally contains very large numbers such as 135061808695. Clicking on the cell shows the full number 135061808695, but with the automatic "General" format the cell displays it as 1.35063E+11.
When I use ExcelFile in Pandas, it pulls the value in scientific notation, 1.350618e+11, instead of the full 135061808695. Is there any way to get Pandas to pull the full value without going in and messing with the Excel file?

Pandas might very well be pulling the full value but not showing it in its default output:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [135061808695.]})
df.x
0 1.350618e+11
Name: x, dtype: float64
Standard Python formatting of the single value:
print("%15.0f" % df.x[0])
   135061808695
Or in pandas, convert to an integer type to get integer formatting:
df.x.astype(np.int64)
0 135061808695
Name: x, dtype: int64
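To check that only the display is affected, here is a minimal sketch. The `display.float_format` option is standard pandas; the variable names are illustrative. It compares the stored value with the original integer and then prints the column under a temporary display format:

```python
import pandas as pd

df = pd.DataFrame({"x": [135061808695.0]})

# The float64 value is stored exactly; only the default repr abbreviates it.
stored_exactly = df["x"].iloc[0] == 135061808695

# Temporarily render floats without an exponent or decimal places.
with pd.option_context("display.float_format", "{:.0f}".format):
    shown = df["x"].to_string()

print(stored_exactly)
print(shown)
```

Whole numbers up to 2**53 are represented exactly in float64, so a 12-digit value like this one loses nothing when pandas reads it as a float.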

Related

Pandas df.style.bar while maintaining rounding

When I apply the bar styling to a pandas dataframe after rounding I lose the rounding formatting, and I can't figure out how to apply the rounding formatting after because df.style.bar doesn't return a dataframe but a "Styler" object.
df = pd.DataFrame({'A': [1.23456, 2.34567,3.45678], 'B':[2,3,4]})
df['A'] = df['A'].round(2)
df.style.bar(subset='A')
This renders the bars, but column A is displayed with extra trailing zeros that I don't want.
You will have to treat a styler as purely a rendering of the original dataframe. This means you can use a format to display the data rounded to 2 decimal places.
The basic idea behind styling is that a user will want to modify the way the data is presented but still preserve the underlying format for further manipulation.
f = {'A': '{:.2f}'}  # format column A to 2 decimal places
df.style.format(f).bar(subset='A')
Read this excellent tutorial for exploring what all you can do with it and how.
EDIT: Added a formatting dict to show general use and to only apply the format to a single column.

convert float64 to int (excel to pandas)

I have imported an Excel file into pandas, but when I display the customer numbers they appear in float64 format, i.e.
7.500505e+09 , 7.503004e+09
How do I convert the column containing these numbers?
int(yourVariable) will cast a single float64 value to an integer.
Is this what you are looking for?
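For a whole column rather than a single value, a sketch along these lines may help. The column name `customer` is made up for illustration, and the nullable `Int64` dtype is shown on the assumption that real data might contain blank cells:

```python
import pandas as pd

# Customer numbers that arrived as float64.
df = pd.DataFrame({"customer": [7.500505e+09, 7.503004e+09]})
df["customer"] = df["customer"].astype("int64")
print(df)

# A plain int64 cast raises on NaN; the nullable Int64 dtype keeps the gap.
with_gap = pd.Series([7.500505e+09, None])  # float64 with a missing value
with_gap = with_gap.astype("Int64")
print(with_gap)
```

The cast only changes the dtype; the numbers themselves were already whole values stored as floats.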
You can use the pandas DataFrame style.format function to apply formatting to a column, as in https://towardsdatascience.com/style-pandas-dataframe-like-a-master-6b02bf6468b0. If you want to round the numbers to, e.g., 2 decimal places, follow the accepted answer in Python float to Decimal conversion. Conversion to an int reduces accuracy a bit too much, I think.
For the concept of binary floats in general (which is what float64 is), see Python - round a float to 2 digits.
Please use pd.set_option to control the precision that is displayed:
>>> import pandas as pd
>>> a = pd.DataFrame({'sam':[7.500505e+09]})
>>> pd.set_option('float_format', '{:f}'.format)
>>> a
sam
0 7500505000.000000
>>>

Pandas seems to change the value when accessing the data in a specific column

When I'm trying to access a specific value in my pandas dataframe, the output provides me with a tiny number (0.0000000000000001) adding to my original value. Why is this happening and how can I stop it?
The data is read in from a csv to a pandas dataframe, which has the value 1.009 contained in it (the csv is exactly 1.009), but when I try and access the value from it, specifying the column, then it gives me 1.0090000000000001. I don't want to simply round the number to x decimal places as my values have varying amounts of decimal places.
print(data_final.iloc[328])
# gives:
# independent 1.009
# dependent 7.757
# Name: 328, dtype: float64
print(data_final.iloc[328,0])
#gives: 1.0090000000000001
print(data_final['independent'].iloc[328])
#gives: 1.0090000000000001
I expected the output to be 1.009 however it is 1.0090000000000001!
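A likely explanation, offered as an assumption since no answer is included here: 1.009 has no exact binary representation, and the faster float parser that older pandas versions use by default in read_csv can land one bit away from the nearest double. The Series printout rounds to about six significant digits, which hides the difference, while printing the scalar shows every digit. A minimal sketch using read_csv's `float_precision="round_trip"` option, with made-up in-memory data standing in for the csv:

```python
import io
import pandas as pd

# Stand-in for the csv file on disk (illustrative data).
csv_text = "independent,dependent\n1.009,7.757\n"

# The round-trip converter parses each number the same way Python's float() does.
data = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

value = data["independent"].iloc[0]
print(value)
```

This avoids rounding to a fixed number of decimal places, which matters when the columns carry varying precision.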

Parsing numbers stored as text with comma as decimal and dot as thousands

I have an Excel file to import with pandas whose columns are stored as text. The caveat is that this text is a number under the French/Latin convention for decimals (radix) and thousands, so when pandas infers the type it produces a text column, just as it is presented in the original file:
NUMBER
0 23.639.826,11
1 92.275,00
2 1.917.000,00
8 2.409,02
9 13.501,00
Name: NUMBER, dtype: object
How can I make pandas convert this text to the correct float format without having to do the conversion on the Excel file itself or applying string methods to replace the commas and dots?
NUMBER
0 23639826.11
1 92275.00
2 1917000.00
8 2409.02
9 13501.00
I have tried using the thousands='.' parameter when reading the file with pd.read_excel, as suggested by the docs, to no avail, and pd.to_numeric raises a ValueError because it is unable to parse the string.
Try df=pd.read_excel(filename, decimal=',', thousands='.')
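The effect of this pair of parameters can be tried without an Excel file. The sketch below uses pd.read_csv on in-memory text as a stand-in, on the assumption that decimal and thousands behave the same way for text cells in read_excel; the sample values are copied from the question:

```python
import io
import pandas as pd

# One text column in French/Latin number format; ';' is the field separator
# so that the decimal comma is not mistaken for a delimiter.
raw = "NUMBER\n23.639.826,11\n92.275,00\n1.917.000,00\n2.409,02\n13.501,00\n"

df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", thousands=".")

print(df["NUMBER"].dtype)
print(df)
```

If the installed pandas version rejects decimal in read_excel, that would suggest the parameter is not available there yet, and it is worth checking the read_excel documentation for that version.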

Suppress Scientific Format in a Dataframe Column

I have a column called accountnumber with values similar to 4.11889000e+11 in a pandas dataframe. I want to suppress the scientific notation and convert the values to 4118890000. I have tried the following method, but it did not work.
df = pd.read_csv('data.csv')
pd.options.display.float_format = '{:,.3f}'.format
Please recommend.
You don't need the thousand separators "," and the 3 decimals for the account numbers.
Use the following instead.
pd.options.display.float_format = '{:.0f}'.format
I assume the exponential notation for the account numbers must come from the data file. If I create a small csv with the full account numbers, pandas will interpret them as integers.
acct_num
0 4118890000
1 9876543210
df['acct_num'].dtype
Out[51]: dtype('int64')
However, if the account numbers in the csv are represented in exponential notation then pandas will read them as floats.
acct_num
0 4.118890e+11
1 9.876543e+11
df['acct_num'].dtype
Out[54]: dtype('float64')
You have 2 options. First, correct the process that creates the csv so the account numbers are written out correctly. The second is to change the data type of the acct_num column to integer.
df['acct_num'] = df['acct_num'].astype('int64')
df
Out[66]:
acct_num
0 411889000000
1 987654321000
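The second option can be reproduced end to end with an in-memory csv. The data below is made up to mirror the example above, and rounding before the cast is an added precaution, not part of the original answer:

```python
import io
import pandas as pd

# Account numbers written in exponential notation, as a faulty export might do.
csv_text = "acct_num\n4.11889e+11\n9.87654321e+11\n"

df = pd.read_csv(io.StringIO(csv_text))
as_read = df["acct_num"].dtype  # pandas infers float64 here

# Round first so a value a hair below a whole number is not truncated.
df["acct_num"] = df["acct_num"].round().astype("int64")
print(as_read)
print(df)
```

Any digits that the export dropped when it wrote the exponential form cannot be recovered by the cast, which is why fixing the process that creates the csv is the safer of the two options.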
