How to update StyleFrame values? - python

I have made a StyleFrame from some Excel data. I want to update the values in this StyleFrame (e.g. change the value in the cell corresponding to A1 to 5).
For example, I have the following code, where sf is the StyleFrame object:
sf[0] = 5
Instead of updating the existing cell, this code sets an entire column, placed after the existing StyleFrame columns, to 5.
But I want to update the existing values in the StyleFrame. Is there any way to do this?

StyleFrame wraps the value of every cell in a Container object that holds both a Styler and the value, so to change a cell's value you need to do something like this:
sf.loc[0, 'A'].value = 3 # change the value in cell 'A1' to 3
However, the purpose of StyleFrame is to add a layer of styling to a DataFrame.
It is recommended to first deal with your data using a DataFrame and, only when your data is set, wrap it with StyleFrame and style it as you wish:
import pandas as pd
from StyleFrame import StyleFrame
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [5, 6, 7]}, columns=['A', 'B'])
df.loc[0, 'A'] = 5 # change the value in cell 'A1' to 5
sf = StyleFrame(df)
# Do the styling here...

Related

New DataFrame is being made when updating dataframe inside a loop

I am trying to make some changes to three dataframes in a loop in this manner.
for sheet in [f1, f2, f3]:
    sheet = preprocess_df(sheet)
The preprocess_df function looks like this
def preprocess_df(df):
    """Preprocess a single dataframe rather than all three together"""
    # make column names uniform
    columns = [
        "Reporting_Type",
        "AA_name",
        "Date_DD/MM/YYYY",
        "Time_HHMMSS",
        "Type",
        "Name",
        "FI_Type",
        "Count_linked",
        "Average_timelag_FI_Notification",
        "FI_Ready_to_FI_request_ratio",
        "Count_Consent_Raised",
        "Actioned_to_raised_ratio",
        "Approved_to_raised_ratio",
        "FI_Ready_to_FI_request_ratio(Daily)",
        "Daily_Consent_Requests_Data_Delivered",
        "Total_Consent_Requests_Data_Delivered",
        "Consent_Requests_Data_Delivered_To_Raised_Ratio",
        "Daily_Consent_Requests_Raised",
        "Daily Consent_Requests_Data_Delivered_To_Raised_Ratio",
    ]
    # Set the sheet size
    df = df.iloc[:, :19]
    # Set the column names
    df.columns = columns
    return df
I am basically updating the column names and fixing the dataframe size. The issue I face is that the sheet variable does get updated if I print the dataframe inside the loop, but the original f1, f2 and f3 dataframes don't get updated. I think this is because the sheet variable creates a copy of f1 etc. rather than using the same dataframe. This seems like an extension of the pass-by-reference vs. pass-by-value concept. Is there a way I could make in-place changes to all the sheets inside the loop?
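Indeed, df = df.iloc[:, :19] creates a new dataframe and rebinds the local name df to it, so the dataframe that was passed in is left unchanged. In the same way, sheet = preprocess_df(sheet) only rebinds the loop variable, not f1, f2 or f3.
However, you can get around this by using drop with inplace=True, which modifies the dataframe that was passed in: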
import pandas as pd
import numpy as np
def preprocess_df(df):
    columns = [
        "a",
        "b",
    ] # Swap this list with yours
    # Drop every column after the first 2, in place (replace 2 with 19 in your code)
    df.drop(columns=df.columns[2:], inplace=True)
    df.columns = columns
f1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D']) # Just an example
preprocess_df(f1) # You can put this in your for loop
print(f1)
The above code will output something like:
   a  b
0  0  1
1  4  5
2  8  9
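If you would rather keep preprocess_df returning a new dataframe, another option is to rebind the original names to the returned frames. This is only a minimal sketch: the three small frames are stand-ins for your sheets, and the two-column list stands in for your 19 names.

```python
import numpy as np
import pandas as pd

def preprocess_df(df):
    """Return a trimmed, renamed copy; the caller's frame is left untouched."""
    out = df.iloc[:, :2].copy()  # 2 stands in for 19
    out.columns = ["a", "b"]     # stand-in column names
    return out

# Stand-in data for the three sheets
f1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
f2 = f1 + 100
f3 = f1 + 200

# Rebind the original names to the processed frames in one step
f1, f2, f3 = [preprocess_df(sheet) for sheet in (f1, f2, f3)]
```

This avoids inplace=True altogether, at the cost of creating new objects; any other reference to the old frames will still see the unprocessed data.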

Trailing zeros for rounded Pandas DataFrame after applying style [duplicate]

I created a DataFrame in pandas for which I want to colour the cells using a colour index (low values red, high values green). I succeeded in doing so, but the colouring prevents me from formatting the cells.
import pandas as pd
df = pd.DataFrame({'a': [0.5, 1.5, 5],
                   'b': [2, 3.5, 7]})
df = df.style.background_gradient(cmap='RdYlGn')
df
which returns the dataframe with coloured cells.
However, when I try to use df.round(2) for example to format the numbers, the following error pops up:
AttributeError: 'Styler' object has no attribute 'round'
Is there anyone who can tell me what's going wrong?
Take a look at the pandas styling guide. The df.style property returns a Styler instance, not a dataframe. From the examples in the pandas styling guide, it seems like dataframe operations (like rounding) are done first, and styling is done last. There is a section on precision in the pandas styling guide. That section proposes three different options for displaying precision of values.
import pandas as pd
df = pd.DataFrame({'a': [0.5, 1.5, 5],
                   'b': [2, 3.5, 7]})
# Option 1. Round before using style.
style = df.round(2).style.background_gradient(cmap='RdYlGn')
style
# Option 2. Use option_context to set precision.
with pd.option_context('display.precision', 2):
    style = df.style.background_gradient(cmap='RdYlGn')
style
# Option 3. Use .set_precision() method of styler.
# (Deprecated in pandas 1.3 and removed in 2.0; use .format(precision=2) there.)
style = df.style.background_gradient(cmap='RdYlGn').set_precision(2)
style
This works for me:
stylish_df = df.style.background_gradient(cmap='RdYlGn').format(precision=2)

Creating multiindex header in Pandas

I have a data frame in the form of a time series, looking like this:
and a second table with additional information about the corresponding columns (names), like this:
Now, I want to combine the two, adding specific information from the second table into the header of the first one. With a result like this:
I have the feeling the solution to this is quite trivial, but somehow I just cannot get my head around it. Any help/suggestions/hints on how to approach this?
MWE to create the tables:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],columns=['a', 'b', 'c'])
df2 = pd.DataFrame([['a','b','c'],['a_desc','b_desc','c_desc'],['a_unit','b_unit','c_unit']]).T
df2.columns=['MSR','OBJDESC','UNIT']
You could build a metadata dict for each of the original column names and then update the columns of the original df:
# store the column metadata you want in the header here
header_metadata = {}
# loop through your second df
for i, row in df2.iterrows():
    # get the column name that this row corresponds to
    column_name = row['MSR']
    # keep everything except the key column; a `SCALE` column, if your table has one, is dropped too
    header_metadata[column_name] = dict(row.drop(labels=['MSR', 'SCALE'], errors='ignore'))
# rename the columns of your first df: name plus metadata values, one per line
df.columns = [
    '\n'.join((c, *header_metadata[c].values()))
    for c in df.columns
]
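Since the title asks for a MultiIndex header, here is a minimal sketch of that variant built on the MWE tables. It assumes every column name of df appears exactly once in df2['MSR']; the name meta is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame([['a', 'b', 'c'],
                    ['a_desc', 'b_desc', 'c_desc'],
                    ['a_unit', 'b_unit', 'c_unit']]).T
df2.columns = ['MSR', 'OBJDESC', 'UNIT']

# Put the metadata rows in the same order as the columns of df
meta = df2.set_index('MSR').loc[list(df.columns)]

# One header level per piece of metadata
df.columns = pd.MultiIndex.from_arrays(
    [meta.index, meta['OBJDESC'], meta['UNIT']],
    names=['MSR', 'OBJDESC', 'UNIT'],
)
```

A column is then selected with the full tuple, e.g. df[('a', 'a_desc', 'a_unit')], or by the first level alone with df['a'].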

Panda's DataFrame dup each row, apply changes to the duplicate and combine back into a dataframe

I need to create a duplicate of each row in a dataframe, apply some basic operations to the duplicate row and then combine these duplicated rows, along with the originals, back into a dataframe.
I'm trying to use apply for it, and the print shows that it's working correctly, but when I return these 2 rows from the function and the dataframe is assembled I get the error message "cannot copy sequence with size 7 to array axis with dimension 2". It is as if it's trying to fit these 2 new rows back into the original 1-row slot. Any insight on how I can achieve this within apply (and not by iterating over every row in a loop)?
def f(x):
    x_cpy = x.copy()
    x_cpy['A'] = x['B']
    print(pd.concat([x, x_cpy], axis=1).T.reset_index(drop=True))
    #return pd.concat([x,x_cpy],axis=1).T.reset_index(drop=True)
hld_pos.apply(f, axis=1)
The apply function of pandas operates along an axis. With axis=1, it operates along every row. To do something like what you're trying to do, think of how you would construct a new row from your existing row. Something like this should work:
import pandas as pd
my_df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6]})
def f(row):
    """Return a new row with the items of the old row squared"""
    return pd.Series({'a': row['a'] ** 2, 'b': row['b'] ** 2})
new_df = my_df.apply(f, axis=1)
# Stack the original rows and the new rows into one dataframe
combined = pd.concat([my_df, new_df], axis=0)
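If you want each duplicate to sit directly after its source row, as in the question, one possible approach skips apply entirely: modify a copy of the whole frame, concatenate, and interleave with a stable sort on the index. This is a sketch under assumptions: the sample data for hld_pos is made up, and the only "basic operation" shown is the A = B assignment from the question.

```python
import pandas as pd

# Made-up stand-in for the real hld_pos
hld_pos = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Build all the duplicates at once and apply the change to them
dup = hld_pos.copy()
dup['A'] = dup['B']

# Originals and duplicates share index labels, so a stable sort on the
# index puts each duplicate right after its original row.
combined = (
    pd.concat([hld_pos, dup])
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)
```

The stable sort matters here: with the default sort kind, the relative order of an original and its duplicate is not guaranteed.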
