Trailing zeros for rounded Pandas DataFrame after applying style [duplicate] - python

I created a DataFrame in pandas whose cells I want to colour using a colour index (low values red, high values green). I succeeded in doing so; however, the colouring prevents me from formatting the cells.
import pandas as pd
df = pd.DataFrame({'a': [0.5,1.5, 5],
'b': [2, 3.5, 7] })
df = df.style.background_gradient(cmap='RdYlGn')
df
which returns
However, when I try to use df.round(2) for example to format the numbers, the following error pops up:
AttributeError: 'Styler' object has no attribute 'round'
Is there anyone who can tell me what's going wrong?

Take a look at the pandas styling guide. The df.style property returns a Styler instance, not a dataframe. From the examples in the pandas styling guide, it seems like dataframe operations (like rounding) are done first, and styling is done last. There is a section on precision in the pandas styling guide. That section proposes three different options for displaying precision of values.
import pandas as pd
df = pd.DataFrame({'a': [0.5,1.5, 5],
'b': [2, 3.5, 7] })
# Option 1. Round before using style.
style = df.round(2).style.background_gradient(cmap='RdYlGn')
style
# Option 2. Use option_context to set precision.
with pd.option_context('display.precision', 2):
    style = df.style.background_gradient(cmap='RdYlGn')
style
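# Option 3. Use .set_precision() method of styler.
# Note: set_precision() was deprecated in pandas 1.3 and removed in 2.0; use .format(precision=2) there.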
style = df.style.background_gradient(cmap='RdYlGn').set_precision(2)
style

This works for me:
stylish_df = df.style.background_gradient(cmap='RdYlGn').format(precision=2)

Related

Calculate the mean in pandas while a column has a string

I am currently learning pandas and I am using an IMDb movies database in which one of the columns is the duration of the movies. However, one of the values is "None", so I can't calculate the mean because this string is in the middle of the data. I thought of changing the "None" to 0, but that would skew the results, as can be seen with the code below.
dur_temp = duration.replace("None", 0)
dur_temp = dur_temp.astype(float)
descricao_duration = dur_temp.mean()
Any ideas on what I should do in order not to skew the data? I also graphed it, which makes the skew even clearer.
You can replace "None" with numpy.nan instead of using 0.
Something like this should do the trick:
import numpy as np
dur_temp = duration.replace("None", np.nan).astype(float)  # cast so that mean() works on numbers
descricao_duration = dur_temp.mean()
If you want it to work for any string in your pandas Series, you could use pd.to_numeric:
pd.to_numeric(dur_temp, errors='coerce').mean()
This way, every value that cannot be converted to a float is replaced by NaN, whatever that value is.
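To make the difference between the two approaches concrete, here is a small sketch with made-up durations (the values and the variable names skewed and clean are assumptions for illustration, not data from the question). It compares replacing "None" with 0 against coercing it to NaN, which mean() skips by default.

```python
import pandas as pd

# Hypothetical durations stored as strings, with one "None" entry.
duration = pd.Series(['90', '120', 'None', '150'], dtype=object)

# Replacing "None" with 0 keeps the row in the average and drags the mean down.
skewed = duration.replace('None', 0).astype(float).mean()

# Coercing non-numeric values to NaN lets mean() ignore them.
clean = pd.to_numeric(duration, errors='coerce').mean()

print(skewed, clean)  # 90.0 120.0
```

The first mean divides by four rows, the second by the three rows that actually hold a duration.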
Just filter by condition like this
df[df['a']!='None'] #assuming your mean values are in column a
Make them np.nan values (I am writing this as an answer because I can't comment):
df = df.replace('None', np.nan)
or
df.replace('None', np.nan, inplace=True)
If the missing entries are actual None objects rather than the string "None", you can use fillna(value=np.nan) as shown below:
descricao_duration = dur_temp.fillna(value=np.nan).mean()
Demo:
import pandas as pd
import numpy as np
dur_temp = pd.DataFrame({'duration': [10, 20, None, 15, None]})
descricao_duration = dur_temp.fillna(value=np.nan).mean()
print(descricao_duration)
Output:
duration 15.0
dtype: float64

How to update StyleFrame values?

I have made a StyleFrame from some Excel data. I want to update the values in this StyleFrame (e.g. I want to change the value in the cell corresponding to A1 to 5).
For example, I have the following code, where sf is the StyleFrame object:
sf[0] = 5
This code sets an entire column after the StyleFrame's existing data to 5.
But I want to update the existing values in the StyleFrame. Is there any way to do this?
StyleFrame wraps the value of every cell in a Container, which holds a Styler and a value, so to change a cell's value you need to do something like this:
sf.loc[0, 'A'].value = 3 # change the value in cell 'A1' to 3
However, the purpose of StyleFrame is to add a layer of styling to a DataFrame.
It is recommended to first deal with your data using a DataFrame and, only when your data is set, wrap it with StyleFrame and style it as you wish.
import pandas as pd
from StyleFrame import StyleFrame
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [5, 6, 7]}, columns=['A', 'B'])
df.loc[0, 'A'] = 5 # change the value in cell 'A1' to 5
sf = StyleFrame(df)
# Do the styling here...

Applying pandas styles to arbitrary (non-product) subsets of a dataframe

How does one apply a style to an arbitrary subset of a pandas dataframe? Specifically, I have a dataframe df that contains some NaNs, and I want to apply a background gradient to it everywhere except where there are NaNs (with the same colormap applied to all cells).
I know that background_gradient (and applymap more generally) has a subset parameter, but I do not understand from the documentation how to use it to select an arbitrary subset of the dataframe.
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'A': [0, 1, np.nan], 'B': [.5, np.nan, 0], 'C': [np.nan, 1, 1]})
mask = ~pd.isnull(df)
Then if I try
df.style.background_gradient(subset=mask)
I get the error:
IndexingError: Too many indexers
I know how to apply a style to a subset of a dataframe in the specific case where that subset is a Cartesian product of indices and columns, using something like the solution here: How do I style a subset of a pandas dataframe?. So the question is what to do when the subset is not such a product, as in the example above.
One solution might be to loop through the columns and apply the style column-by-column (then each application is to a Cartesian product subset). In my case, I can pass low and high parameters to the background_gradient method to force the colormaps to match up between columns, but that fails when (as above) one or more of those columns contains a unique non-NaN value. This in turn could be bypassed by rewriting the background_gradient function, but that's clearly undesirable.
You can write a custom function for this:
from matplotlib.cm import get_cmap
cmap = get_cmap('PuBu')
# update with low-high option
def threshold(x, low=0, high=1, mid=0.5):
    # nan cell
    if np.isnan(x):
        return ''
    # non-nan cell
    x = (x - low) / (high - low)
    background = f'background-color: rgba{cmap(x, bytes=True)}'
    text_color = 'color: white' if x > mid else ''
    return background + ';' + text_color
# apply the style
df.style.applymap(threshold, low=-1, high=1, mid=0.3)
Output:

How to convert a dataframe column to a factor using rpy2?

I have a Pandas DataFrame in Python that I am converting to an R data.frame using rpy2. Some example setup code is as follows:
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import r, pandas2ri
df = pd.DataFrame({
'col_1': ['a', 'b', 'c'],
'col_2': [1, 2, 3],
'col_3': [2.3, 5.4, 3.8]
})
pandas2ri.activate()
r_df = pandas2ri.py2ri(df)
col_2 is full of integer values, and as expected, during conversion this is transformed into R's int atomic mode. I can check the classes (which I understand to dictate which functions can be applied to the underlying objects) using the following:
r.sapply(r_df, r['class'])
However, this variable is actually nominal (an unordered categorical). As such, I need to convert this column into a factor.
In R I could easily do this via reassignment using something like:
r_df$col_2 <- as.factor(r_df$col_2)
However, I am unsure of the correct syntax using rpy2. I can access the column using the rx2 accessor method and cast the column to a factor using FactorVector.
col2 = robjects.vectors.FactorVector(r_df.rx2('col_2'))
However, I can't seem to reassign this back to the original dataframe. What is the best way to reassign this back to the original dataframe? And is there a better way to do this conversion? Thanks
Appended
I've managed to convert col_2 to a factor using the code below, but it doesn't feel like an optimal answer, as I am having to look up all of the column names, find the index of the desired column using Python methods instead of R, and then use that for reassignment.
col_2_index = list(r_df.colnames).index('col_2')
col_2 = robjects.vectors.FactorVector(r_df.rx2('col_2'))
r_df[col_2_index] = col_2
Ideally, I'd like to see a reassignment method that doesn't rely on looking up the column index. However, my attempts before have thrown the following errors:
r_df['col_2'] = converted_col
TypeError: SexpVector indices must be integers, not str
or
r_df.rx2('col_2') = converted_col
SyntaxError: can't assign to function call
