When I apply the bar styling to a pandas dataframe after rounding I lose the rounding formatting, and I can't figure out how to apply the rounding formatting after because df.style.bar doesn't return a dataframe but a "Styler" object.
df = pd.DataFrame({'A': [1.23456, 2.34567,3.45678], 'B':[2,3,4]})
df['A'] = df['A'].round(2)
df.style.bar(subset='A')
This returns
but I don't want all of those extra zeros displayed.
You will have to treat a styler as purely a rendering of the original dataframe. This means you can use a format to display the data rounded to 2 decimal places.
The basic idea behind styling is that a user will want to modify the way the data is presented but still preserve the underlying format for further manipulation.
f = {'A': '{:.2f}'}  # format column A to 2 decimals
df.style.format(f).bar(subset='A')
Read this excellent tutorial to explore what else you can do with it, and how.
EDIT: Added a formatting dict to show general use and to only apply the format to a single column.
How can I retain only 2 decimals for each value in a Pandas series? (I'm working with latitudes and longitudes). dtype is float64.
series = [-74.002568, -74.003085, -74.003546]
I tried using the round function but, as the name suggests, it rounds. I looked into trunc(), but this can only remove all decimals. Then I figured, why not try running a for loop? I tried the following:
for i in series:
    i = "{0:.2f}".format(i)
I was able to run the code without any errors but it didn't modify the data in any way.
Expected output would be the following:
[-74.00, -74.00, -74.00]
Does anyone know how to achieve this? Thanks!
series = [-74.002568, -74.003085, -74.003546]
["%0.2f" % (x,) for x in series]
['-74.00', '-74.00', '-74.00']
This converts your data to strings (object dtype), so it is for display purposes only. If you need the values for calculations, you can cast them back to float. Only one decimal digit will then be visible, because floats do not keep trailing zeros.
[float('{0:.2f}'.format(x)) for x in series]
[-74.0, -74.0, -74.0]
Here is one way to do it.
import pandas as pd
# you described a Series but defined only a list,
# so this assumes you meant pandas.Series
series = [-74.002568, -74.003085, -74.003546]
s = pd.Series(series)
# use regex extract to keep everything up to the first two decimal places
out = s.astype(str).str.extract(r"(.*\..{2})")[0]
out
0 -74.00
1 -74.00
2 -74.00
Name: 0, dtype: object
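The regex approach truncates but leaves you with strings. If the values need to stay numeric, one possible sketch is to truncate arithmetically with numpy.trunc. The third value is a hypothetical one, added only to show where truncating and rounding differ. Binary floating point can occasionally push a scaled value just below a whole number, so treat this as an approximation rather than an exact decimal truncation:

```python
import numpy as np
import pandas as pd

s = pd.Series([-74.002568, -74.003085, 1.239])

# Shift two places left, drop the fraction toward zero, shift back.
truncated = np.trunc(s * 100) / 100

print(truncated.tolist())   # truncated to two decimals, still float64
print(s.round(2).tolist())  # rounding, for comparison
```

The result keeps the float64 dtype, so trailing zeros are still not shown; they are purely a display concern.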
Change the display options. This shouldn't change your underlying data.
pd.options.display.float_format = "{:,.2f}".format
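If changing the option globally is too broad, a scoped variant can be sketched with pd.option_context, which restores the previous setting when the block ends. The variable name shown is illustrative. Note that this formats by rounding for display and does not truncate:

```python
import pandas as pd

s = pd.Series([-74.002568, -74.003085, -74.003546])

# Apply the float format only inside this block instead of globally.
with pd.option_context('display.float_format', '{:,.2f}'.format):
    shown = s.to_string()

print(shown)
print(s.iloc[0])  # the stored value still has all six decimals
```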
I am trying to add two columns together using the Decimal module in Python but can't seem to get the syntax right for this. I have 2 columns called month1 and month2 and do not want these to become floats at any point in the outcome as division and then rounding will later be required.
The month1 and month2 columns are already to several decimals as they are averages and I need to preserve this accuracy in the addition.
I can see guidance online for how to add numbers together using Decimal but not how to apply it to columns in a pandas dataframe. I've tried things like:
df['MonthTotal'] = Decimal.decimal(df['Month1']) + Decimal.decimal(df['Month1'])
What is the solution?
from decimal import Decimal
def convert_decimal(row):
    # Decimal() accepts only scalar values, so it is applied one row at a time
    row["monthtotal"] = Decimal(row["month1"]) + Decimal(row["month2"])
    return row
df = df.apply(convert_decimal, axis=1)
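A column-wise variant might look like the sketch below. It assumes the values are available as strings (the data here is hypothetical), because building a Decimal from a float carries the float's binary error along. The HalfRounded column is only an illustration of the later division and rounding mentioned in the question:

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

# Hypothetical data: the averages are kept as strings so no float is involved.
df = pd.DataFrame({'month1': ['0.1', '1.25'], 'month2': ['0.2', '2.50']})

# Convert each column element-wise; the columns become object dtype.
for col in ['month1', 'month2']:
    df[col] = df[col].map(Decimal)

# Adding two object columns of Decimals adds them element-wise as Decimals.
df['MonthTotal'] = df['month1'] + df['month2']

# Later division and rounding can stay in Decimal too.
df['HalfRounded'] = (df['MonthTotal'] / 2).map(
    lambda d: d.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)
)

# Building a Decimal from a float captures the float's binary error,
# which is why going through a string matters.
print(Decimal(0.1) == Decimal('0.1'))
```

Object-dtype columns are slower than native floats, so this trade-off only makes sense when exact decimal arithmetic is really required.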
decimal.Decimal is designed to accept a single value, not a pandas.Series of them. Assuming that your columns hold strings representing numeric values, you can use .applymap to apply decimal.Decimal element-wise (in pandas 2.1 and later, .applymap is deprecated in favor of DataFrame.map), i.e.:
import decimal
import pandas as pd
df = pd.DataFrame({'x':['0.1','0.1','0.1'],'y':['0.1','0.1','0.1'],'z':['0.1','0.1','0.1']})
df_decimal = df.applymap(decimal.Decimal)
df_decimal["total"] = df_decimal.x + df_decimal.y + df_decimal.z
print(df_decimal.total[0])
print(type(df_decimal.total[0]))
output
0.3
<class 'decimal.Decimal'>
I am taking over a project that is built in a pandas data frame where there is a large amount of measurements in this format: 6x6 , 52x14
I need to go in and add a quote (") inches unit symbol after each number in two specific columns that have this type of measurement data, the desired outcomes in the above examples would look like this 6"x6" , 52"x14"
How could I concisely write a code segment to add these quotes after each numeric value in those two columns? Another challenging piece is that there is other measurement data in these columns like the word large, small etc. but the only thing I am concerned with is adding the inch mark after each number.
Here's how to do the string replacement for units with a regex (but depending on your use-case, it might make more sense to split them into separate (numeric) columns width, length; see below):
import pandas as pd
df = pd.DataFrame({'measurements': ['6x6', '52x14']})
df['measurements'].str.replace(r'(\d+)', r'\1"', regex=True)  # regex=True is required in pandas 2.0+
0 6"x6"
1 52"x14"
whereas if you want separate (numeric) length, width columns:
df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int)
measurements length width
0 6x6 6 6
1 52x14 52 14
Separate numeric columns are much cleaner if you'll be doing any calculations (e.g. df['area'] = df['length'] * df['width']).
You could then add your custom units formatting via:
globally, by setting pd.options.display.float_format = '{:.2f}"'.format (although your dimensions are ints, not floats). Note that this hack overrides the display of all float columns in all dataframes.
or on a column- and dataframe-specific basis, use the pandas Styling API (which uses CSS)
or here's a total hack to override and monkey-patch pandas.io.formats.format.IntArrayFormatter for floats, since pandas bizarrely doesn't have an equivalent of pd.options.display.float_format for ints
until pandas natively implements this enhancement for unit support, the 'right' way to do this is to use the pint package (see "How can I manage units in pandas data?").
Note:
in df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int), we had to do the [[0,2]] subscripting to exclude the 'x' symbol itself that partition returned. Also we had to do .astype(int) to cast from string/pandas 'object' to int.
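Since the question mentions that the columns also contain words such as large and small, here is a hedged sketch of both approaches on mixed data. The sample rows and the with_units column name are assumptions. It uses str.extract instead of partition so that non-matching rows become NaN rather than raising in the integer cast:

```python
import pandas as pd

# Hypothetical column mixing measurements with plain words.
df = pd.DataFrame({'measurements': ['6x6', '52x14', 'large']})

# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching.
df['with_units'] = df['measurements'].str.replace(r'(\d+)', r'\1"', regex=True)

# str.extract yields NaN where the pattern does not match, so 'large' does not raise.
dims = df['measurements'].str.extract(r'^(\d+)x(\d+)$')
df[['length', 'width']] = dims.apply(pd.to_numeric)

print(df)
```

The numeric columns come out as floats here because NaN cannot be stored in a plain int column.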
I have a dataframe, 11 columns 18k rows. The last column is either a 1 or 0, but when I use .describe() all I get is
count 19020
unique 2
top 1
freq 12332
Name: Class, dtype: int64
as opposed to an actual statistical analysis with mean, std, etc.
Is there a way to do this?
If your numeric (0, 1) column is not being picked up automatically by .describe(), it might be because it's not actually encoded as an int dtype. You can see this in the documentation of the .describe() method, which tells you that the default include parameter is only for numeric types:
None (default) : The result will include all numeric columns.
My suggestion would be the following:
df.dtypes # check datatypes
df['num'] = df['num'].astype(int) # if it's not integer, cast it as such
df.describe(include=['object', 'int64']) # explicitly state the data types you'd like to describe
That is, first check the datatypes (I'm assuming the column is called num and the dataframe df, but feel free to substitute with the right ones). If this indicator/(0,1) column is indeed not encoded as int/integer type, then cast it as such by using .astype(int). Then, you can freely use df.describe() and perhaps even specify columns of which data types you want to include in the description output, for more fine-grained control.
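To illustrate the difference, here is a small sketch with a hypothetical Class column that was read in as text. The data and the names before and after are assumptions for the example only:

```python
import pandas as pd

# Hypothetical indicator column that was read in as text rather than integers.
df = pd.DataFrame({'Class': ['1', '0', '1', '1']})

before = df['Class'].describe()            # count, unique, top, freq
df['Class'] = df['Class'].astype(int)      # cast the 0/1 strings to integers
after = df['Class'].describe()             # count, mean, std, min, quartiles, max

print(before.index.tolist())
print(after.index.tolist())
print(after['mean'])
```

For a 0/1 column the mean is simply the share of ones, which is often the most useful number in the summary.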
You could use
# percentile list
perc =[.20, .40, .60, .80]
# list of dtypes to include
include =['object', 'float', 'int']
data.describe(percentiles = perc, include = include)
where data is your dataframe (important point).
Since you are new to Stack Overflow, I suggest including some actual code (i.e. something showing how, and on what, you are using your methods). You'll get better answers.
I am trying to alter my dataframe with the following line of code:
df = df[df['P'] <= cutoff]
However, if for example I set cutoff to be 0.1, numbers such as 0.100496 make it through the filter.
My suspicion is that my initial dataframe has entries in both scientific notation and plain float format. Could this be affecting the rounding and precision? Is there a potential workaround for this issue?
Thank you in advance.
EDIT: I am reading from a file. Here is a sample of the total data.
2.29E-98
1.81E-42
2.19E-35
3.35E-30
0.0313755
0.0313817
0.03139
0.0313991
0.0314062
0.1003476
0.1003483
0.1003487
0.1003521
0.100496
Floating point comparison isn't perfect. For example
>>> 0.10000000000000000000000000000000000001 <= 0.1
True
Have a look at numpy.isclose. It allows you to compare floats and set a tolerance for the comparison.
Similar question here
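A rough sketch of both checks follows. It assumes the column may have arrived as text from the file, which is a guess and not something the question confirms, and it uses a few of the sample values. The first part makes the column numeric before filtering; the second shows how numpy.isclose adds a tolerance to a comparison:

```python
import numpy as np
import pandas as pd

# A few of the sample values, read as text the way a file might supply them.
raw = pd.Series(['2.29E-98', '0.0313755', '0.1003476', '0.100496'])

# Make sure the column is numeric before comparing; scientific notation parses fine.
p = pd.to_numeric(raw)

cutoff = 0.1
kept = p[p <= cutoff]

# Values that land within a tolerance of the cutoff can be kept explicitly with isclose.
total = 0.1 + 0.2                      # slightly above 0.3 in binary floating point
strict = total <= 0.3
tolerant = bool(total <= 0.3 or np.isclose(total, 0.3))

print(kept.tolist())
print(strict, tolerant)
```

Checking df['P'].dtype first is a cheap way to tell whether the comparison is being done on floats at all.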