I have a column called accountnumber with values similar to 4.11889000e+11 in a pandas DataFrame. I want to suppress the scientific notation and convert the values to 411889000000. I tried the following method and it did not work.
df = pd.read_csv('data.csv')
pd.options.display.float_format = '{:,.3f}'.format
Please recommend.
You don't need the thousands separator "," or the 3 decimal places for the account numbers.
Use the following instead.
pd.options.display.float_format = '{:.0f}'.format
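For example, a quick check on a throwaway Series (using the value from your question rather than your actual file) shows the effect of the option; it should display roughly as:
import pandas as pd
pd.options.display.float_format = '{:.0f}'.format
pd.Series([4.11889000e+11], name='accountnumber')
0    411889000000
Name: accountnumber, dtype: float64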
I assume the exponential notation for the account numbers must come from the data file. If I create a small csv with the full account numbers, pandas will interpret them as integers.
acct_num
0 4118890000
1 9876543210
df['acct_num'].dtype
Out[51]: dtype('int64')
However, if the account numbers in the csv are represented in exponential notation then pandas will read them as floats.
acct_num
0 4.118890e+11
1 9.876543e+11
df['acct_num'].dtype
Out[54]: dtype('float64')
You have 2 options. The first is to correct the process that creates the csv so the account numbers are written out correctly. The second is to change the data type of the acct_num column to integer.
df['acct_num'] = df['acct_num'].astype('int64')
df
Out[66]:
acct_num
0 411889000000
1 987654321000
Related
How can I retain only 2 decimals for each value in a pandas Series? (I'm working with latitudes and longitudes.) The dtype is float64.
series = [-74.002568, -74.003085, -74.003546]
I tried using the round function, but as the name suggests, it rounds. I looked into trunc(), but that can only remove all decimals. Then I figured, why not try running a for loop? I tried the following:
for i in series:
    i = "{0:.2f}".format(i)
I was able to run the code without any errors but it didn't modify the data in any way.
Expected output would be the following:
[-74.00, -74.00, -74.00]
Anyone knows how to achieve this? Thanks!
series = [-74.002568, -74.003085, -74.003546]
["%0.2f" % (x,) for x in series]
['-74.00', '-74.00', '-74.00']
This converts your data to string/object dtype, which is fine for display purposes. If you want to use the values for calculations, you can cast them back to float, but then only one decimal digit will be visible (trailing zeros are dropped):
[float('{0:.2f}'.format(x)) for x in series]
[-74.0, -74.0, -74.0]
Here is one way to do it, assuming you meant a pandas.Series:
# you indicated it's a Series but defined only a list, so build one first
series = [-74.002568, -74.003085, -74.003546]
s = pd.Series(series)
# use regex extract to pick the number up to the first two decimal places
out = s.astype(str).str.extract(r"(.*\..{2})")[0]
out
0 -74.00
1 -74.00
2 -74.00
Name: 0, dtype: object
Change the display options. This shouldn't change your underlying data.
pd.options.display.float_format = "{:,.2f}".format
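As a quick illustration with the values from the question (the underlying floats are unchanged, only the printout):
import pandas as pd
pd.options.display.float_format = "{:,.2f}".format
pd.Series([-74.002568, -74.003085, -74.003546])
0   -74.00
1   -74.00
2   -74.00
dtype: float64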
I saw many similar questions, but none of them dealt with a csv file using Python. Basically I have a column with decimal numbers, and I want to write code that creates 2 new columns: one for just the whole number and the other for the decimals. I turned the column into numeric using the code below.
df['hour_num'] = pd.to_numeric(df['total_time'])
I already have the columns 'total_time' and 'hour_num'. I want to know how to get the columns 'Whole number' and 'Decimal'.
You can convert the numbers to strings, split on '.', convert the result to a DataFrame, and assign it back to the original DataFrame.
df = pd.DataFrame({'col1':[2.123, 3.557, 0.123456]})
df[['whole number', 'decimal']] = df['col1'].astype(str).str.split('.').apply(pd.Series)
df['decimal'] = ('0.' + df['decimal']).astype(float)
df['whole number'] = df['whole number'].astype(int)
Output:
col1 whole number decimal
0 2.123000 2 0.123000
1 3.557000 3 0.557000
2 0.123456 0 0.123456
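If you'd rather avoid the string round-trip, a numeric alternative is numpy's modf, which splits a float into its fractional and integral parts. This is only a sketch for the positive sample above; for negative values modf puts a minus sign on both parts, and the fractional part reflects binary floating-point representation (so you may see values like 0.12299999999999978 rather than an exact 0.123).
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2.123, 3.557, 0.123456]})
frac, whole = np.modf(df['col1'].to_numpy())   # fractional and integral parts
df['whole number'] = whole.astype(int)
df['decimal'] = frac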
I have a table, for example this input:
Energy1 Energy2
-966.463549649 -966.463549649
-966.463608088 -966.463585840
So I need a script that sums the two energies E1 and E2, converts the result with a factor of 627.51 (hartree to kcal/mol), and finally truncates the number to 4 digits.
I never attempted this with Python. I've always written this in Julia, but I think it should be simple.
Do you know how I can find an example of reading the table and then doing operations with the numbers in it?
something like:
import numpy
data = numpy.loadtxt('table.tab')
print(data[?:,?].sum())
You can use pandas for this if you convert the table to a csv file. Add the two columns directly, then use apply with a lambda to multiply each element by the conversion factor. To truncate to 4 digits, you can change pandas' global display settings to show 1 digit plus 3 decimals in scientific notation.
import pandas as pd
df = pd.read_csv('something.csv')
pd.set_option('display.float_format', '{:.3E}'.format)
df['Sum Energies'] = (df['Energy1'] + df['Energy2']).apply(lambda x: x*627.51)
print(df)
This outputs:
Energy1 Energy2 Sum Energies
0 -9.665E+02 -9.665E+02 -1.213E+06
1 -9.665E+02 -9.665E+02 -1.213E+06
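If you'd rather stay with numpy as in your sketch, a minimal version might look like the following. This assumes table.tab is whitespace separated and that the first line is the Energy1/Energy2 header, which loadtxt needs to skip; adjust skiprows and the delimiter to your actual file.
import numpy as np

data = np.loadtxt('table.tab', skiprows=1)    # skip the header line
kcal = (data[:, 0] + data[:, 1]) * 627.51     # sum the two columns, hartree -> kcal/mol
for value in kcal:
    print('{:.4g}'.format(value))             # 4 significant digits, e.g. -1.213e+06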
So I got a DataFrame with at least 2-3 columns with numbers running from 1 to 3000, and the numbers contain commas. I need to convert the numbers to float or int in all the relevant columns. This is an example of my DataFrame:
data = pd.read_csv('exampleData.csv')
data.head(10)
Out[179]:
Rank Total
1 2
20 40
1,200 1,400
NaN NaN
As you can see from the example, my DataFrame consists of numbers, numbers with commas, and some NaNs. I've read several posts here about converting to float or int, but I always get error messages such as: 'str' object has no attribute 'astype'.
My approach for several columns is as follows:
cols = ['Rank', 'Total']
data[cols] = data[cols].apply(lambda x: pd.to_numeric(x.astype(str)
.str.replace(',',''), errors='coerce'))
Use the thousands argument of read_csv.
pd.read_csv('exampleData.csv', thousands=',')
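A quick check with the sample above (assuming exampleData.csv matches it) would look roughly like this; the columns come back as float64 here because of the NaN rows:
data = pd.read_csv('exampleData.csv', thousands=',')
data.dtypes
Rank     float64
Total    float64
dtype: object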
John's solution won't work for numbers with multiple commas, like 1,384,496.
A more scalable solution would be to just do
data = data.replace({",":""}, regex=True)
Then convert the strings to numeric.
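For that conversion step, a minimal sketch (reusing the cols list from your question, with errors='coerce' so the NaN rows survive):
cols = ['Rank', 'Total']
data[cols] = data[cols].replace({',': ''}, regex=True).apply(pd.to_numeric, errors='coerce')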
Pandas read_csv() takes many arguments which allow you to control how fields are converted. From the documentation:
decimal : str, default ‘.’
Character to recognize as decimal point (e.g. use ‘,’ for European data).
So, here's a crazy idea: convert the numerical fields using the keyword argument decimal=','. Then multiply the numerical fields by 1000.
I have an Excel file, produced automatically, with occasional very large numbers like 135061808695. In the Excel file, when you click on the cell it shows the full number 135061808695; however, with the automatic "General" format the number visually appears as 1.35063E+11.
When I use ExcelFile in pandas, it pulls the value in scientific notation, 1.350618e+11, instead of the full 135061808695. Is there any way to get pandas to pull the full value without going in and messing with the Excel file?
Pandas might very well be pulling the full value but not showing it in its default output:
df = pd.DataFrame({ 'x':[135061808695.] })
df.x
0 1.350618e+11
Name: x, dtype: float64
Standard python format:
print "%15.0f" % df.x
135061808695
Or in pandas, convert to an integer type (this assumes import numpy as np) to get integer formatting:
df.x.astype(np.int64)
0 135061808695
Name: x, dtype: int64