Send a styled Dataframe in the body of an email - python

I have the function below which makes my dataframe pretty with a border and some highlighting. However, because I have used .style I can't use .to_html() to put the dataframe in the body of an email, so I use .render(). But when I use render() the border formatting changes slightly. There is a picture below of what it looks like in Python, which is how I want it, and another picture of how it looks in the email. Any idea how I can put the styled dataframe into the body of an email while keeping the formatting?
import win32com.client
import numpy as np
import pandas as pd
import datetime as dt

curr_date = dt.datetime.now().strftime("%Y%m%d")
df = pd.read_csv("Pipeline_Signals_" + curr_date + ".csv", delimiter=',')
df = df.replace(np.nan, '', regex=True)

def _color_red_or_green(val):
    color = 'red' if "*" in val else 'white'
    return 'background-color: %s' % color

df = (df.style
      .applymap(_color_red_or_green)
      .set_table_styles([{'selector': 'th',
                          'props': [('border-color', 'black'),
                                    ('background-color', 'white'),
                                    ('border-style', 'solid')]}])
      .hide_index()
      .set_properties(**{'color': 'black',
                         'border-style': 'solid',
                         'border-color': 'black'}))

df1 = df.render()

outlook = win32com.client.Dispatch("Outlook.Application")
mail = outlook.CreateItem(0x0)
mail.To = "test@test.co.uk"
mail.CC = "test@test.co.uk"
mail.Subject = "Test Signals " + curr_date
mail.HTMLBody = df1
mail.Display()
This is what the dataframe looks like in Python, and what I want it to look like.
This is what the dataframe looks like when I put it in the body of an email. For some reason, the borders change.

It seems that this comes from the CSS stylesheet applied by default when your table is displayed. You should try setting the CSS "border-collapse" attribute of your table to "collapse" when you style it. If that doesn't work, also try setting the "border-spacing" attribute to 0.

Related

Streamlit: Display Text after text, not all at once

I want to build a data annotation interface. I read in an Excel file, of which only a text column is relevant, so "df" could also be replaced by a list of texts. This is my code:
import streamlit as st
import pandas as pd

st.title('Text Annotation')
df = pd.read_excel('mini_annotation.xlsx')

text_list = []
label_list = []

for i in range(len(df)):
    text = df.iloc[i]['Text']
    st.write(f"Text {i} out of {len(df)}")
    st.write("Please classify the following text:")
    st.write("")
    st.write(text)
    label = st.selectbox("Classification:", ["HOF", "NOT", "Not Sure"])
    if st.button("Submit"):
        text_list.append(text)
        label_list.append(label)

df_annotated = pd.DataFrame(columns=['Text', 'Label'])
df_annotated["Text"] = text_list
df_annotated["Label"] = label_list
df_annotated.to_csv("annotated_file.csv", sep=";")
The interface looks like this:
However, I want the interface to display just one text at a time, e.g. the first text of my dataset. After the user has submitted their choice via the "Submit" button, the first text should disappear and the second text should be displayed. This process should continue until the last text of the dataset is reached.
How do I do this?
(I am aware of the error message; for that I just have to add a key to the selectbox, but I am not sure if this is needed in the end.)
This task is solved with the help of session state. You can read about it in the Streamlit docs on session state.
import streamlit as st
import pandas as pd

st.title('Text Annotation')
df = pd.DataFrame(['Text 1', 'Text 2', 'Text 3', 'Text 4'], columns=['Text'])

if st.session_state.get('count') is None:
    st.session_state.count = 0
    st.session_state.label_list = []

with st.form("my_form"):
    st.write("Progress: ", st.session_state.count, "/", len(df))
    if st.session_state.count < len(df):
        st.write(df.iloc[st.session_state.count]['Text'])
        label = st.selectbox("Classification:", ["HOF", "NOT", "Not Sure"])
        submitted = st.form_submit_button("Submit")
        if submitted:
            st.session_state.label_list.append(label)
            st.session_state.count += 1
    else:
        st.write("All done!")
        submitted = st.form_submit_button("Submit")
        if submitted:
            st.write(st.session_state.label_list)

Python streamlit: Updated cells jump to default using streamlit

I am trying to find a solution for the following issue.
I would like to upload an Excel file consisting of multiple sheets (two in this use case). I added tabs via Streamlit and used the AgGrid component to be able to change some cells. However, if I change cells in Sheet 1, jump to tab 2 and back, the changes are gone. This is not the desired behaviour: any changes made to a cell should remain.
I tried st.cache and st.experimental_memo, without success.
My code is below
import streamlit as st
import pandas as pd
from st_aggrid import GridOptionsBuilder, AgGrid, GridUpdateMode

excelfile = st.sidebar.file_uploader("Select Excel-File for cleansing", key="Raw_Data")
if excelfile is None:
    st.balloons()

tab1, tab2 = st.tabs(["Sheet 1", "Sheet 2"])

#st.cache()
def load_sheet1():
    sheet1 = pd.read_excel(excelfile, sheet_name="Sheet1")
    return sheet1

#st.cache()
def load_sheet2():
    sheet2 = pd.read_excel(excelfile, sheet_name="Sheet2")
    return sheet2

df = load_sheet1()
with tab1:
    gd = GridOptionsBuilder.from_dataframe(df)
    gd.configure_pagination(enabled=True)
    gd.configure_default_column(editable=True, groupable=True)
    gd.configure_selection(selection_mode="multiple", use_checkbox=True)
    gridoptions = gd.build()
    grid_table = AgGrid(
        df,
        gridOptions=gridoptions,
        update_mode=GridUpdateMode.SELECTION_CHANGED,
        theme="material",
    )

df1 = load_sheet2()
with tab2:
    gd = GridOptionsBuilder.from_dataframe(df1)
    gd.configure_pagination(enabled=True)
    gd.configure_default_column(editable=True, groupable=True)
    gd.configure_selection(selection_mode="multiple", use_checkbox=True)
    gridoptions = gd.build()
    grid_table = AgGrid(
        df1,
        gridOptions=gridoptions,
        update_mode=GridUpdateMode.SELECTION_CHANGED,
        theme="material",
    )
I can also share my test Excel file:
Sheet 1

    Col1    Col2
    A       C
    B       D

Sheet 2

    Col3    Col4
    E       G
    F       H
Any kind of support in eliminating this issue would be more than awesome.
EDIT: Here is a solution without the load button.
I couldn't find a way to do it without adding a button that reloads the page to apply the changes. Since Streamlit reruns the whole script every time you interact with it, it is a bit tricky to render elements the right way. Here is your code refactored; hope this helps!
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid, GridUpdateMode, GridOptionsBuilder

# Use session_state to keep a stack of changes
if 'df' not in st.session_state:
    st.session_state.df = pd.DataFrame()
if 'df1' not in st.session_state:
    st.session_state.df1 = pd.DataFrame()
if 'excelfile' not in st.session_state:
    st.session_state.excelfile = None

#st.cache()
def load_sheet1():
    sheet1 = pd.read_excel(excelfile, sheet_name="Sheet1")
    return sheet1

#st.cache()
def load_sheet2():
    sheet2 = pd.read_excel(excelfile, sheet_name="Sheet2")
    return sheet2

def show_table(data):
    if not data.empty:
        gd = GridOptionsBuilder.from_dataframe(data)
        gd.configure_pagination(enabled=True)
        gd.configure_default_column(editable=True, groupable=True)
        gd.configure_selection(selection_mode="multiple", use_checkbox=True)
        gridoptions = gd.build()
        grid_table = AgGrid(
            data,
            gridOptions=gridoptions,
            # Use MODEL_CHANGED instead of SELECTION_CHANGED so cell
            # edits are sent back, not just selections
            update_mode=GridUpdateMode.MODEL_CHANGED,
            theme="material",
        )
        # Return the edited table whenever changes are made
        edited_df = grid_table['data']
        return edited_df
    else:
        return pd.DataFrame()

excelfile = st.sidebar.file_uploader("Select Excel-File for cleansing", key="Raw_Data")
if st.session_state.excelfile != excelfile:
    st.session_state.excelfile = excelfile
    try:
        st.session_state.df = load_sheet1()
        st.session_state.df1 = load_sheet2()
    except Exception:
        st.session_state.df = pd.DataFrame()
        st.session_state.df1 = pd.DataFrame()

tab1, tab2 = st.tabs(["Sheet 1", "Sheet 2"])
with tab1:
    # Get the edited DataFrame from the AgGrid object
    df = show_table(st.session_state.df)
with tab2:
    # Same thing here...
    df1 = show_table(st.session_state.df1)

# You need to click the button to apply the changes and reload the
# page before you go to the next tab
if st.button('Apply changes'):
    # Store the newly edited DataFrames in session state
    st.session_state.df = df
    st.session_state.df1 = df1
    # Rerun the page so the changes apply and the new DataFrames are rendered
    st.experimental_rerun()
After loading your file and making your changes in the first tab hit the "apply changes" button to reload the page before moving to the second tab.

How to define a function to add cell colour for a table created by python-docx?

I want to define a function for filling the header colour; for now I am just hard-coding every object.
Here is the code I used. I tried to define a function to add cell colour, but it raises an error:
from docx import Document
from openpyxl import load_workbook
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from zmq import NULL

document = Document()
k = 1
for k in range(1, 5):
    table = document.add_table(rows=2, cols=3)
    table.cell(0, 0).text = 'Type'
    table.cell(0, 1).text = 'Value'
    table.cell(0, 2).text = 'Connections'

def set_table_header_bg_color(table.rows[row_ix].cell):  # <- this line raises a SyntaxError
    """
    set background shading for Header Rows
    """
    tblCell = cell._tc
    tblCellProperties = tc.get_or_add_tcPr()
    clShading = OxmlElement('w:shd')
    clShading.set(qn('w:fill'), "00519E")  # hex of dark blue shade {R:0x00, G:0x51, B:0x9E}
    tblCellProperties.append(clShading)
    return cell

for each_row in table.rows:
    for each_cell in each_row.cells:
        set_table_header_bg_color(each_cell)
Could you guys help me to solve this error?
I copied this function from https://stackoverflow.com/a/62079054/18540814, but it doesn't seem to work in my case.
Some things to note:
You need to be careful with indentation; 'def' should not be indented, and don't place it in the middle of a section of code.
There is no need to initialise a variable before assigning over a range; k = 1 isn't needed, as the range is assigned on the next line.
There is no need to return the cell; the def is a function that changes the shading on a cell, so there is nothing to return.
As you have written it, range(1, 5) creates four tables; I'm not sure if that is your intent.
from docx import Document
from docx.oxml.shared import qn
from docx.oxml.xmlchemy import OxmlElement

doc_name = 'doc_w_tables.docx'
document = Document()

def set_table_header_bg_color(tc):
    """
    set background shading for Header Rows
    """
    tblCellProperties = tc._element.tcPr
    clShading = OxmlElement('w:shd')
    clShading.set(qn('w:fill'), "00519E")  # hex of dark blue shade {R:0x00, G:0x51, B:0x9E}
    tblCellProperties.append(clShading)

for k in range(0, 3):
    table = document.add_table(rows=2, cols=3)
    table.cell(0, 0).text = 'Type'
    table.cell(0, 1).text = 'Value'
    table.cell(0, 2).text = 'Connections'
    for j in range(0, 3):
        set_table_header_bg_color(table.rows[0].cells[j])

document.save(doc_name)

How do I import various Data Frames from different python file?

I have a Python file called 'clean_data.py' which has all the data frames I need, and I want to import them into another Python file called 'main.py' to use in creating a dashboard.
Is it possible to create a class in clean_data.py, and if so, can someone direct me to an article (which I have struggled to find so far) so that I can figure it out?
The aim is to shift from CSV to an API over time, so I wanted to keep the data-wrangling side of things in a different file and the web app components in the main.py file.
Any help would be much appreciated.
The code from the clean_data.py is:
import pandas as pd
import os  # To access my file directory

print(os.getcwd())  # Shows the current working directory

fdi_data = pd.read_csv(r'Data/fdi_data.csv')
fdi_meta = pd.read_csv(r'Data/fdi_metadata.csv')
debt_data = pd.read_csv(r'Data/debt_data.csv')
debt_meta = pd.read_csv(r'Data/debt_metadata.csv')
gdp_percap_data = pd.read_csv(r'Data/gdp_percap_data.csv', header=2)
gdp_percap_meta = pd.read_csv(r'Data/gdp_percap_metadata.csv')
gov_exp_data = pd.read_csv(r'Data/gov_exp_data.csv', header=2)
gov_exp_meta = pd.read_csv(r'Data/gov_exp_metadata.csv')
pop_data = pd.read_csv(r'Data/pop_data.csv', header=2)
pop_meta = pd.read_csv(r'Data/pop_metadata.csv')

# 'wb' stands for World Bank
def wb_merge_data(data, metadata):
    merge = pd.merge(
        data,
        metadata,
        on='Country Code',
        how='inner'
    )
    return merge

fdi_merge = wb_merge_data(fdi_data, fdi_meta)
debt_merge = wb_merge_data(debt_data, debt_meta)
gdp_percap_merge = wb_merge_data(gdp_percap_data, gdp_percap_meta)
gov_exp_merge = wb_merge_data(gov_exp_data, gov_exp_meta)
pop_merge = wb_merge_data(pop_data, pop_meta)

def wb_drop_data(data):
    drop = data.drop(['Country Code', 'Indicator Name', 'Indicator Code',
                      'TableName', 'SpecialNotes', 'Unnamed: 5'], axis=1)
    return drop

fdi_merge = wb_drop_data(fdi_merge)
debt_merge = wb_drop_data(debt_merge)
gdp_percap_merge = wb_drop_data(gdp_percap_merge)
gov_exp_merge = wb_drop_data(gov_exp_merge)
pop_merge = wb_drop_data(pop_merge)

def wb_mr_data(data, value_name):
    # Melt to long format and rename the value column
    data = data.melt(['Country Name', 'Region', 'IncomeGroup']).reset_index()
    data = data.rename(columns={'variable': 'Year', 'value': value_name})
    data = data.drop('index', axis=1)
    return data

fdi_merge = wb_mr_data(fdi_merge, 'FDI')
debt_merge = wb_mr_data(debt_merge, 'Debt')
gdp_percap_merge = wb_mr_data(gdp_percap_merge, 'GDP per Cap')
gov_exp_merge = wb_mr_data(gov_exp_merge, 'Gov Expend.')
pop_merge = wb_mr_data(pop_merge, 'Population')

def avg_groupby(data, col_cal, cn=False, ig=False, rg=False):
    if cn:
        return data.groupby('Country Name')[col_cal].mean().reset_index()
    elif ig:
        return data.groupby('IncomeGroup')[col_cal].mean().reset_index()
    elif rg:
        return data.groupby('Region')[col_cal].mean().reset_index()

# avg_cn_... by country, avg_ig_... by income group, avg_rg_... by region
avg_cn_fdi = avg_groupby(fdi_merge, 'FDI', cn=True)
avg_ig_fdi = avg_groupby(fdi_merge, 'FDI', ig=True)
avg_rg_fdi = avg_groupby(fdi_merge, 'FDI', rg=True)
avg_cn_debt = avg_groupby(debt_merge, 'Debt', cn=True)
avg_ig_debt = avg_groupby(debt_merge, 'Debt', ig=True)
avg_rg_debt = avg_groupby(debt_merge, 'Debt', rg=True)
avg_cn_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', cn=True)
avg_ig_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', ig=True)
avg_rg_gdp_percap = avg_groupby(gdp_percap_merge, 'GDP per Cap', rg=True)
avg_cn_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', cn=True)
avg_ig_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', ig=True)
avg_rg_gexp = avg_groupby(gov_exp_merge, 'Gov Expend.', rg=True)
avg_cn_pop = avg_groupby(pop_merge, 'Population', cn=True)
avg_ig_pop = avg_groupby(pop_merge, 'Population', ig=True)
avg_rg_pop = avg_groupby(pop_merge, 'Population', rg=True)
In Python, every file is a module, so if you want to re-use your code you can simply import this module. For example:
# main.py
import clean_data
print(clean_data.avg_cn_fdi)
Maybe you needn't create a class for this.
You can import the whole python file like you'd import any other locally created files and have access to the DataFrames in them. Here's an example:
I created a file called temporary.py:
import pandas as pd
data = pd.read_csv("temp.csv")
And then in a separate file I was able to use data like so:
import temporary
print(temporary.data)
Or, you could also do:
from temporary import data
print(data)
All that being said, I don't believe that this would be the best way to handle your data.
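To sketch one alternative (the name `load_fdi` is hypothetical, and in-memory frames stand in for the real `pd.read_csv` calls): wrapping each load in a function means importing `clean_data` doesn't immediately run every read, and the planned switch from CSV to an API only changes the function bodies, not main.py:

```python
import pandas as pd

# clean_data.py (sketch) -- the real version would call pd.read_csv
# (and later an API client) instead of building the frames inline
def load_fdi():
    data = pd.DataFrame({'Country Code': ['ABW', 'AFG'], 'FDI': [1.0, 2.0]})
    meta = pd.DataFrame({'Country Code': ['ABW', 'AFG'], 'Region': ['LCN', 'SAS']})
    # Inner-join data with its metadata on the shared country code
    return pd.merge(data, meta, on='Country Code', how='inner')

# main.py would then do:
#     from clean_data import load_fdi
#     fdi_merge = load_fdi()
fdi_merge = load_fdi()
print(fdi_merge.columns.tolist())  # ['Country Code', 'FDI', 'Region']
```

The dashboard keeps calling `load_fdi()` regardless of where the data actually comes from.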

Simple Bokeh app: Chart does not update as expected

So I have been trying to build a little something from the Bokeh example here:
https://demo.bokeh.org/weather
My dataset is really similar and this should be very straightforward, yet I have a problem that I cannot explain.
import os
import pickle

import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import row, column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

base_path = '/Users/xxxxxx/Desktop/data/'
domain = 'IEM_Domain'
metric = 'total_area_burned'

def get_dataset(dic, selection, scenario):
    def _get_mean(thing):
        _df = pd.DataFrame(thing)
        _df = _df.mean(axis=1).cumsum(axis=0)
        return _df
    data = {model: _get_mean(dic[model]) for model in dic.keys()
            if all([scenario in model, selection in model])}
    df = pd.DataFrame(data)
    return ColumnDataSource(data=df)

def make_plot(source, title):
    plot = figure(x_axis_type="datetime", plot_width=800, tools="")
    plot.title.text = title
    for _df in source:
        for col in _df.to_df().columns:
            if 'index' not in col:
                plot.line(_df.to_df()['index'], _df.to_df()[col], source=_df)
    # fixed attributes
    plot.xaxis.axis_label = 'Year'
    plot.yaxis.axis_label = "Area burned (km)"
    plot.axis.axis_label_text_font_style = "bold"
    return plot

def update_plot(attrname, old, new):
    rcp45 = rcp45_select.value
    rcp85 = rcp85_select.value
    src45 = get_dataset(dic, rcp45, 'rcp45')
    src85 = get_dataset(dic, rcp85, 'rcp85')
    source45.data = src45.data
    source85.data = src85.data

rcp45 = 'CCSM4_rcp45'
rcp85 = 'CCSM4_rcp85'
dic = pickle.load(open(os.path.join(base_path, "_".join([domain, metric]) + '.p'), 'rb'), encoding='latin1')
rcp45_models = [i for i in dic.keys() if 'rcp45' in i]
rcp85_models = [i for i in dic.keys() if 'rcp85' in i]
rcp45_select = Select(value=rcp45, title='RCP 45', options=sorted(rcp45_models))
rcp85_select = Select(value=rcp85, title='RCP 85', options=sorted(rcp85_models))
source45 = get_dataset(dic, rcp45, 'rcp45')
source85 = get_dataset(dic, rcp85, 'rcp85')
print(source45.data)
plot = make_plot([source45, source85], "Total area burned ")
rcp45_select.on_change('value', update_plot)
rcp85_select.on_change('value', update_plot)
controls = column(rcp45_select, rcp85_select)
curdoc().add_root(row(plot, controls))
curdoc().title = "Total Area burned"
Everything runs fine until I try to change the value in the dropdown. I can see that update_plot() is doing its job, updating the data when the dropdown is used, but for some reason the plot doesn't change, even though the example works fine. I have been digging everywhere in the code but can't seem to find what I am doing wrong.
I tried to simplify make_plot() to see if the problem could come from there, but that didn't change anything, so I am out of ideas.
I found this but couldn't apply it: Bokeh: chart from pandas dataframe won't update on trigger
Edit after first answer
I tried to get rid of the ColumnDataSource and replaced it with a traditional dictionary, but I still run into the same issue.
Here is the updated code:
import os
import pickle

import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import row, column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

base_path = '/Users/julienschroder/Desktop/data/'
domain = 'IEM_Domain'
metric = 'total_area_burned'
scenarios = ['rcp45', 'rcp85']

def get_dataset(dic, selection, scenario=scenarios):
    # Takes the raw source as dic and a selection of models; returns a
    # dictionary like {scenario: pd.DataFrame(models)} so each scenario
    # can be plotted on its own
    def _get_mean_cumsum(df, name):
        # Extract, average and cumsum the raw data into a DataFrame
        _df = pd.DataFrame(df)
        _df = _df.mean(axis=1).cumsum(axis=0)
        _df = _df.to_frame(name=name)
        return _df
    # Just one model at a time for now, but hoping to get multiline and
    # so multi-model plotting in the future
    data = {scenario: pd.concat([_get_mean_cumsum(dic[model], model)
                                 for model in selection if scenario in model], axis=1)
            for scenario in scenarios}
    return data

def make_plot(source, title):
    plot = figure(x_axis_type="datetime", plot_width=800, tools="")
    plot.title.text = title
    # For now this deals with one model at a time, but in the future I hope
    # to have some multiline plotting, hence the for loops
    for col in source['rcp45']:
        plot.line(source['rcp45'].index, source['rcp45'][col])
    for col in source['rcp85']:
        plot.line(source['rcp85'].index, source['rcp85'][col])
    # fixed attributes
    plot.xaxis.axis_label = 'Year'
    plot.yaxis.axis_label = "Area burned (km)"
    plot.axis.axis_label_text_font_style = "bold"
    return plot

def update_plot(attrname, old, new):
    rcp45 = rcp45_select.value
    rcp85 = rcp85_select.value
    source = get_dataset(dic, [rcp45, rcp85])
    # Check to see if source gets updated
    print(source)  # <- gets updated properly after dropdown action

rcp45 = 'CCSM4_rcp45'
rcp85 = 'CCSM4_rcp85'
# dic = pickle.load(open(os.path.join(base_path, "_".join([domain, metric]) + '.p'), 'rb'), encoding='latin1')
# data available at https://github.com/julienschroder/Bokeh_app/tree/master/1
dic = pickle.load(open('IEM_Domain_total_area_burned.p', 'rb'), encoding='latin1')
rcp45_models = [i for i in dic.keys() if 'rcp45' in i]
rcp85_models = [i for i in dic.keys() if 'rcp85' in i]
rcp45_select = Select(value=rcp45, title='RCP 45', options=sorted(rcp45_models))
rcp85_select = Select(value=rcp85, title='RCP 85', options=sorted(rcp85_models))
source = get_dataset(dic, [rcp45, rcp85])
plot = make_plot(source, "Total area burned ")
rcp45_select.on_change('value', update_plot)
rcp85_select.on_change('value', update_plot)
controls = column(rcp45_select, rcp85_select)
curdoc().add_root(row(plot, controls))
curdoc().title = "Total Area burned"
I get my first two lines, but nothing happens when using the dropdown.
I uploaded a smaller dataset to this GitHub page if someone wants to try the data:
https://github.com/julienschroder/Bokeh_app/tree/master/1
Well, I can't say with 100% certainty, since I can't run the code without the data, but I have a decent idea what the problem might be. The .data attribute of a ColumnDataSource is not actually a simple Python dictionary:
In [4]: s = ColumnDataSource(data=dict(a=[1,2], b=[3,4]))
In [5]: type(s.data)
Out[5]: bokeh.core.property.containers.PropertyValueDict
It's actually a specially wrapped dictionary that can automatically emit event notifications when its contents are changed. This is part of the machinery that lets Bokeh respond and update things automatically in such a handy way. I'm guessing that setting the .data of one source from the .data of another source is what is somehow causing the problem: setting .data to something besides a real Python dict may prevent the event handlers from getting wired up correctly.
So, a suggestion for an immediate-term workaround: don't construct a ColumnDataSource in get_dataset. Only construct and return a plain Python dictionary. It's possible that df.to_dict will give you exactly the right sort of dict; otherwise you can build the dict by hand, putting in the columns you need.
And a request: it's possible that this limitation can be fixed, or if not, that a loud warning can be issued when a user does this. Please file a bug report on the GitHub issue tracker with all this information.
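A minimal sketch of that workaround (the helper name `get_dataset_as_dict` and the column name are illustrative, not from the original app): return a plain column-to-list dict, and assign that dict to `source.data` inside `update_plot`:

```python
import pandas as pd

def get_dataset_as_dict(df):
    # Build a plain Python dict of column -> list instead of wrapping the
    # frame in a ColumnDataSource; assigning a real dict to source.data
    # lets Bokeh's change notifications fire normally
    data = {col: df[col].tolist() for col in df.columns}
    data['index'] = df.index.tolist()
    return data

df = pd.DataFrame({'CCSM4_rcp45': [1.0, 2.0, 3.0]})
data = get_dataset_as_dict(df)
# In the app: source45 = ColumnDataSource(data=get_dataset_as_dict(df))
# and in update_plot: source45.data = get_dataset_as_dict(new_df)
```

Only the construction in get_dataset changes; make_plot can keep reading from the ColumnDataSource objects built once at startup.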
