Python Streamlit: updated cells jump back to default values

I am trying to find a solution to the following issue.
I would like to upload an Excel file consisting of multiple sheets (two in this use case). I then added tabs via Streamlit and used the AgGrid component so that some cells can be edited. However, if I change cells in Sheet 1, switch to tab 2 and come back, the changes are gone. This is not the desired behavior: any changes made to a cell should persist.
I tried st.cache and st.experimental_memo, but without success.
My code is below:
import numpy as np
import streamlit as st
import pandas as pd
from st_aggrid import GridOptionsBuilder, AgGrid, GridUpdateMode, DataReturnMode, JsCode
excelfile = st.sidebar.file_uploader("Select Excel-File for cleansing", key="Raw_Data")
if excelfile is None:
    st.balloons()
    st.stop()  # nothing to read until a file has been uploaded
tab1, tab2 = st.tabs(["Sheet 1", "Sheet 2"])
#st.cache()
def load_sheet1():
    sheet1 = pd.read_excel(excelfile, sheet_name="Sheet1")
    return sheet1
#st.cache()
def load_sheet2():
    sheet2 = pd.read_excel(excelfile, sheet_name="Sheet2")
    return sheet2
df = load_sheet1()
with tab1:
    gd = GridOptionsBuilder.from_dataframe(df)
    gd.configure_pagination(enabled=True)
    gd.configure_default_column(editable=True, groupable=True)
    gd.configure_selection(selection_mode="multiple", use_checkbox=True)
    gridoptions = gd.build()
    grid_table = AgGrid(
        df,
        gridOptions=gridoptions,
        update_mode=GridUpdateMode.SELECTION_CHANGED,
        theme="material",
    )
df1 = load_sheet2()
with tab2:
    gd = GridOptionsBuilder.from_dataframe(df1)
    gd.configure_pagination(enabled=True)
    gd.configure_default_column(editable=True, groupable=True)
    gd.configure_selection(selection_mode="multiple", use_checkbox=True)
    gridoptions = gd.build()
    grid_table = AgGrid(
        df1,
        gridOptions=gridoptions,
        update_mode=GridUpdateMode.SELECTION_CHANGED,
        theme="material",
    )
I can also share my test Excel file:
Sheet 1
Col1  Col2
A     C
B     D
Sheet 2
Col3  Col4
E     G
F     H
Any help eliminating this issue would be much appreciated.

EDIT: Here is a solution without the load button.
I couldn't find a way to do it without adding a button that reruns the page to apply the changes. Since Streamlit reruns the whole script every time you interact with it, it is a bit tricky to render elements the right way. Here is your code, refactored. Hope this helps!
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid, GridUpdateMode, GridOptionsBuilder
# Use session_state to keep track of changes across reruns
if 'df' not in st.session_state:
    st.session_state.df = pd.DataFrame()
if 'df1' not in st.session_state:
    st.session_state.df1 = pd.DataFrame()
if 'excelfile' not in st.session_state:
    st.session_state.excelfile = None
#st.cache()
def load_sheet1():
    sheet1 = pd.read_excel(excelfile, sheet_name="Sheet1")
    return sheet1
#st.cache()
def load_sheet2():
    sheet2 = pd.read_excel(excelfile, sheet_name="Sheet2")
    return sheet2
def show_table(data):
    if not data.empty:
        gd = GridOptionsBuilder.from_dataframe(data)
        gd.configure_pagination(enabled=True)
        gd.configure_default_column(editable=True, groupable=True)
        gd.configure_selection(selection_mode="multiple", use_checkbox=True)
        gridoptions = gd.build()
        grid_table = AgGrid(
            data,
            gridOptions=gridoptions,
            # Use MODEL_CHANGED instead of SELECTION_CHANGED
            update_mode=GridUpdateMode.MODEL_CHANGED,
            theme="material"
        )
        # Get the edited table when you make changes and return it
        edited_df = grid_table['data']
        return edited_df
    else:
        return pd.DataFrame()
excelfile = st.sidebar.file_uploader("Select Excel-File for cleansing", key="Raw_Data")
# Only (re)load the sheets when a different file is uploaded
if st.session_state.excelfile != excelfile:
    st.session_state.excelfile = excelfile
    try:
        st.session_state.df = load_sheet1()
        st.session_state.df1 = load_sheet2()
    except Exception:
        # No file (or unreadable file): show empty tables
        st.session_state.df = pd.DataFrame()
        st.session_state.df1 = pd.DataFrame()
tab1, tab2 = st.tabs(["Sheet 1", "Sheet 2"])
with tab1:
    # Get the edited DataFrame from the ag grid object
    df = show_table(st.session_state.df)
with tab2:
    # Same thing here...
    df1 = show_table(st.session_state.df1)
# Then you need to click a button to apply the changes and
# reload the page before you go to the next tab
if st.button('Apply changes'):
    # Store new edited DataFrames in session state
    st.session_state.df = df
    st.session_state.df1 = df1
    # Rerun the page so that changes apply and new DataFrames are rendered
    # (newer Streamlit releases call this st.rerun())
    st.experimental_rerun()
After loading your file and making your changes in the first tab, hit the "Apply changes" button to reload the page before moving to the second tab.
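The reason session state fixes this is the rerun model described above: every interaction re-executes the script from the top, so ordinary variables are rebuilt while values kept in st.session_state survive. The sketch below imitates that model without Streamlit. It assumes a plain dict (`session_state`) as a stand-in for st.session_state and a hypothetical `run_script` function that plays the role of one rerun; it is an illustration, not Streamlit's implementation.

```python
import pandas as pd

# Plain dict standing in for st.session_state (assumption: illustration only).
session_state = {}


def run_script(edited=None):
    """Simulate one top-to-bottom Streamlit rerun (hypothetical helper)."""
    # An ordinary variable is rebuilt on every rerun, so edits to it are lost.
    df_local = pd.DataFrame({"Col1": ["A", "B"], "Col2": ["C", "D"]})

    # The guard only fires on the first run; later runs keep the stored frame.
    if "df" not in session_state:
        session_state["df"] = df_local.copy()

    # Writing the grid's edited frame back is what "Apply changes" does.
    if edited is not None:
        session_state["df"] = edited

    return df_local, session_state["df"]


# First run: nothing edited yet.
run_script()

# The user edits a cell and the edited frame is stored in session state.
edited = session_state["df"].copy()
edited.loc[0, "Col1"] = "Z"
run_script(edited=edited)

# A later rerun (e.g. switching tabs) rebuilds the local frame but keeps the edit.
local, persisted = run_script()
print(local.loc[0, "Col1"], persisted.loc[0, "Col1"])  # A Z
```

The `if "df" not in session_state` guard is the same one the answer uses at the top of its script; without it, each rerun would overwrite the stored edits with the freshly loaded sheet.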

Related

Streamlit: Display Text after text, not all at once

I want to build a data annotation interface. I read in an Excel file in which only a text column is relevant, so "df" could also be replaced by a list of texts. This is my code:
import streamlit as st
import pandas as pd
import numpy as np
st.title('Text Annotation')
df = pd.read_excel('mini_annotation.xlsx')
for i in range(len(df)):
    text = df.iloc[i]['Text']
    st.write(f"Text {i} out of {len(df)}")
    st.write("Please classify the following text:")
    st.write("")
    st.write(text)
    text_list = []
    label_list = []
    label = st.selectbox("Classification:", ["HOF", "NOT","Not Sure"])
    if st.button("Submit"):
        text_list.append(text)
        label_list.append(label)
    df_annotated = pd.DataFrame(columns=['Text', 'Label'])
    df_annotated["Text"] = text_list
    df_annotated["Label"] = label_list
    df_annotated.to_csv("annotated_file.csv", sep=";")
The interface looks like this: [screenshot of the interface]
However, I want the interface to display just one text at a time, e.g. the first text of my dataset. After the user has submitted their choice via the "Submit" button, the first text should disappear and the second text should be displayed. This process should continue until the last text of the dataset is reached.
How do I do this?
(I am aware of the error message; to fix it I just have to add a key to the selectbox, but I am not sure whether that is needed in the end.)
This task can be solved with session state; you can read about it in the Streamlit documentation on session state.
import streamlit as st
import pandas as pd
import numpy as np
st.title('Text Annotation')
df = pd.DataFrame(['Text 1', 'Text 2', 'Text 3', 'Text 4'], columns=['Text'])
if st.session_state.get('count') is None:
    st.session_state.count = 0
    st.session_state.label_list = []
with st.form("my_form"):
    st.write("Progress: ", st.session_state.count, "/", len(df))
    if st.session_state.count < len(df):
        st.write(df.iloc[st.session_state.count]['Text'])
        label = st.selectbox("Classification:", ["HOF", "NOT","Not Sure"])
        submitted = st.form_submit_button("Submit")
        if submitted:
            st.session_state.label_list.append(label)
            st.session_state.count += 1
    else:
        st.write("All done!")
        submitted = st.form_submit_button("Submit")
        if submitted:
            st.write(st.session_state.label_list)
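The answer keeps the current position and the labels in session state, but it does not write the annotated file that the question produces. The bookkeeping can be sketched without Streamlit as follows, assuming a plain dict in place of st.session_state and using hypothetical helper names (`current_text`, `submit`, `to_frame`). Each call to `submit` corresponds to one form submission; once every text is labelled, the Text/Label frame could be written with the question's `to_csv` call.

```python
import pandas as pd

df = pd.DataFrame(["Text 1", "Text 2", "Text 3"], columns=["Text"])

# Plain dict standing in for st.session_state (assumption for illustration).
state = {"count": 0, "label_list": []}


def current_text(state, df):
    """Return the text to show, or None once every text is labelled."""
    if state["count"] < len(df):
        return df.iloc[state["count"]]["Text"]
    return None


def submit(state, df, label):
    """Record one label and move on to the next text (one form submit)."""
    if state["count"] < len(df):
        state["label_list"].append(label)
        state["count"] += 1


def to_frame(state, df):
    """Pair each labelled text with its label, as in the question's CSV."""
    n = len(state["label_list"])
    return pd.DataFrame({"Text": df["Text"].iloc[:n].tolist(),
                         "Label": state["label_list"]})


for label in ["HOF", "NOT", "Not Sure"]:
    print(current_text(state, df), "->", label)
    submit(state, df, label)

df_annotated = to_frame(state, df)
# In the app this would run once current_text(...) returns None:
# df_annotated.to_csv("annotated_file.csv", sep=";")
print(df_annotated)
```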

Any way to make my Dash app run faster for a huge amount of data?

I have created a Dash app that reads data from a .csv file and plots it, letting the user choose which variable to display.
The problem I'm facing is that the Dash app keeps freezing or is very slow, most likely due to the sheer amount of data I'm reading (the .csv files I need to read have more than 2 million lines).
Is there any way I can make it faster? Maybe optimizing my code in some way?
Any help is appreciated, thanks in advance.
import pandas as pd
import numpy as np
from matplotlib import lines
import plotly.express as px
from dash import Dash, html, dcc, Input, Output
from tkinter import Tk
from tkinter.filedialog import askopenfilename
import webbrowser
print("Checkpoint 1")
def open_file():
    global df, drop_list
    Tk().withdraw()  # we don't want a full GUI, so keep the root window from appearing
    filename = askopenfilename()  # show an "Open" dialog box and return the path to the selected file
    print(filename)
    newfilename = filename.replace('/', '\\\\')
    print(newfilename)
    df = pd.read_csv('' + newfilename, sep=";", skiprows=4, skipfooter=2, engine='python')  # Read csv file using pandas
    # Detect all the different signals in the csv
    signals = df["Prozesstext"].unique()
    signals = pd.DataFrame(signals)  # dataframe creation
    signals = signals.sort_values(by=0)  # sort_values returns a new DataFrame, so reassign it
    drop_list = []  # list used for the dropdown menu
    for each in signals[0]:
        drop_list.append(each)
app = Dash(__name__)
fig = px.line([])  # figure starts with an empty chart
open_file()
print("Checkpoint 2")
app.layout = html.Div([
    html.H1(id='H1', children='Reading Data from CSV', style={'textAlign': 'center', 'marginTop': 40, 'marginBottom': 40}),
    dcc.Dropdown(drop_list[:-1], id='selection_box'),
    html.Div(id='dd-output-container'),
    dcc.Graph(
        id='trend1',
        figure=fig
    )
])
webbrowser.open("http://127.0.0.1:8050", new=2, autoraise=True)
# FIRST CALLBACK
@app.callback(
    Output(component_id='trend1', component_property='figure'),
    Input('selection_box', 'value'),
    prevent_initial_call=True
)
def update_trend1(value):
    df2 = df[df['Prozesstext'].isin([value])]  # without empty spaces it can be just df.column_name
    return px.line(df2, x="Zeitstempel", y="Daten", title=value, markers=True)  # line chart
if __name__ == '__main__':
    app.run_server()
    #app.run_server(debug=True)
I suggest the following:
Build a multi-value dropdown menu listing the names of all columns in the CSV file.
Based on the columns the user selects, import only the corresponding data from the CSV file.
It is not clear from your question how you display the CSV data in Dash; I recommend dbc.Table.
By doing this, you avoid the cost of reading the entire CSV file.
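Applied to the question's code, the idea might look roughly like the sketch below. It assumes the file contains at least the Zeitstempel, Prozesstext and Daten columns used in the callback; the in-memory sample (including the extra Sonstiges column) is made up and stands in for the real file. `usecols` restricts parsing to the listed columns. Note also that pandas does not support `skipfooter` with its C parser, which is why the question has to pass engine='python'; the Python parser is typically much slower on files of this size, so handling the footer lines some other way may pay off. Grouping once by signal turns each callback into a dictionary lookup instead of a scan of the whole frame, at the cost of holding the grouped copies in memory.

```python
import io

import pandas as pd

# Hypothetical sample in the question's semicolon-separated format.
# The real file also has header/footer lines that need skipping.
sample = io.StringIO(
    "Zeitstempel;Prozesstext;Daten;Sonstiges\n"
    "2022-01-01 00:00;Temp;20.5;x\n"
    "2022-01-01 00:01;Druck;1.01;y\n"
    "2022-01-01 00:02;Temp;20.7;z\n"
)

# Parse only the columns the app actually plots.
df = pd.read_csv(sample, sep=";", usecols=["Zeitstempel", "Prozesstext", "Daten"])

# Group once at startup; the callback then only looks up a prepared frame.
by_signal = {name: group for name, group in df.groupby("Prozesstext")}
drop_list = sorted(by_signal)


def rows_for(value):
    """What update_trend1 would plot, without rescanning df on every call."""
    return by_signal.get(value, df.iloc[0:0])
```

Inside the callback, `px.line(rows_for(value), x="Zeitstempel", y="Daten", ...)` would then replace the `isin` filter.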

hvplot not refreshing properly when dependency changes

I am trying to build a dashboard using Jupyter, Panel and hvPlot. This is the code snippet:
import panel.widgets as pnw
import panel as pn
import hvplot.pandas  # registers the .hvplot accessor used below
pn.extension()
# some data cleansing yields a df containing the data of interest
products = df["product_name"].drop_duplicates()
def histogram_of_variant_revenue(selected_product=None):
    # .copy() avoids pandas' SettingWithCopyWarning when converting the column
    selection = df[df["product_name"] == selected_product].copy()
    selection["revenue"] = selection["revenue"].astype(float)
    print(selection.shape)
    return selection[["name", "revenue"]].drop_duplicates().set_index("name").hvplot.bar()
def histogram_of_variant_customers(selected_product=None):
    selection = df[df["product_name"] == selected_product].copy()
    selection["customers"] = selection["customers"].astype(float)
    print(selection.shape)
    return selection[["name", "customers"]].drop_duplicates().set_index("name").hvplot.bar()
selected_product = pnw.Select(name='Product', options=sorted(list(products)))
customers = pn.bind(histogram_of_variant_customers, selected_product)
revenue = pn.bind(histogram_of_variant_revenue, selected_product)
combined_panel = pn.Column(selected_product, customers, revenue)
combined_panel
[Screenshot: the charts at the default selection]
[Screenshot: the charts after the next drop-down selection] Notice that instead of a new chart appearing, the old one seems to have moved to the right and the new one has been placed into the same figure.
Any idea on how I can get a new histogram after selecting from the drop-down?
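Independently of the refresh problem, the two plotting functions differ only in the column they plot. A shared helper is sketched below; `variant_table` is a hypothetical name, the sample data is made up, and the column names come from the question. It filters on an explicit `.copy()`, so the float conversion never writes into a view of df (the pattern that triggers pandas' SettingWithCopyWarning). Each bound function could then return `variant_table(df, selected_product, "revenue").hvplot.bar()`, and likewise for "customers". This is only a tidy-up; it is not claimed to fix the chart behavior.

```python
import pandas as pd


def variant_table(df, selected_product, column):
    """Rows for one product, deduplicated and indexed by variant name."""
    # .copy() makes the conversion act on a new frame, not on a view of df.
    selection = df[df["product_name"] == selected_product].copy()
    selection[column] = selection[column].astype(float)
    return selection[["name", column]].drop_duplicates().set_index("name")


# Made-up sample with the question's column names.
df = pd.DataFrame({
    "product_name": ["P1", "P1", "P2"],
    "name": ["v1", "v2", "v3"],
    "revenue": ["10", "20", "5"],
    "customers": ["1", "2", "3"],
})
print(variant_table(df, "P1", "revenue"))
```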

Select Specific Sheet from Python Imported Excel File

The following code requests the user to select an excel file they'd like to import as a pandas data frame; however, it doesn't provide the ability to select which sheet (if multiple exist):
import pandas as pd
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
path = filedialog.askopenfilename()
x = pd.read_excel(path, sheet_name = 1)  # sheet_name is zero-based, so 1 is the second sheet
x
Conditions the new solution should meet:
If only one sheet exists, automatically select it and load it into a pandas DataFrame
If multiple sheets exist, let the user choose through a dialog box which sheet they'd like to import
The solution offered by @GordonAitchJay, implemented with Tkinter, is excellent. It is definitely the way to go if you're running the script directly with Python or in an IDE like Spyder.
However, the OP is working in Jupyter, and it turns out that Jupyter and Tkinter do not get along very well. The OP reported difficulties, and while I did get it to work at first, when I pushed the code harder I also noticed serious lags and hiccups. So I thought I would add a way to make the interaction work smoothly in Jupyter using the ipywidgets framework.
# Jupyter notebook
import pandas as pd
import ipywidgets as widgets
from IPython.display import clear_output, display
from ipyfilechooser import FileChooser
from ipywidgets import interact
from pathlib import Path
# get home dir of user
home = str(Path.home())
# initialize a dict for the excel file; this removes the need to set global values
dict_file = {}
# change to simply `home` if you want users to navigate through diff dirs
fc = FileChooser(f'{home}/excel')
# same here
fc.sandbox_path = f'{home}/excel'
# limit file extensions to '.xls, .xlsb, .xlsm, .xlsx'
fc.filter_pattern = ['*.xls*']
fc.title = '<b>Select Excel file</b>'
display(fc)
# create empty dropdown for sheet names
dropdown = widgets.Dropdown(options=[''], value='', description='Sheets:', disabled=False)
# create output frame for the df
out = widgets.Output(layout=widgets.Layout(display='flex', flex_flow='column', align_items='flex-start', width='100%'))
# callback func for FileChooser
def get_sheets(chooser):
    # (re)populate dict
    dict_file.clear()
    dict_file['file'] = pd.ExcelFile(fc.value)
    sheet_names = dict_file['file'].sheet_names
    # only 1 sheet, we'll print this one immediately (further below)
    if len(sheet_names) == 1:
        # set value of the dropdown to this sheet
        dropdown.options = sheet_names
        dropdown.value = sheet_names[0]
        # disable the dropdown; so it's just showing the selection to the user
        dropdown.disabled = True
    else:
        # append empty string and set this as default; this way the user must always make a deliberate choice
        # (build a new list rather than mutating the one returned by ExcelFile)
        sheet_names = sheet_names + ['']
        dropdown.options = sheet_names
        dropdown.value = sheet_names[-1]
        # allow selection by user
        dropdown.disabled = False
    return
# bind FileChooser to callback
fc.register_callback(get_sheets)
# prompt on selection sheet
def show_df(sheet):
    if sheet == '':
        if out is not None:
            # clear previous df, when user selects a new wb
            out.clear_output()
    else:
        # clear previous output 'out' frame before displaying new df, else they'll get stacked
        out.clear_output()
        with out:
            df = dict_file['file'].parse(sheet_name=sheet)
            if len(df) == 0:
                # if sheet is empty, let the user know
                display('empty sheet')
            else:
                display(df)
    return
# func show_df is called with input of widget as param on selection sheet
interact(show_df, sheet=dropdown)
# display 'out' (with df)
display(out)
[Screenshot: snippet of the interaction in the notebook]
If that's all you need Tkinter for, this will do.
It shows a simple combobox with the sheet names. In this case, the first sheet is named Orders.
As soon as you select an item, the window closes and the selected sheet is parsed.
import pandas as pd
import tkinter as tk
from tkinter import ttk, filedialog
root = tk.Tk()
root.withdraw()
# path = filedialog.askopenfilename()
# limit user input to Excel file (or path == '' in case of "Cancel")
path = filedialog.askopenfilename(filetypes = [('Excel files', '*.xls*')])
# if user didn't cancel, continue
if path != '':
    # Get the sheetnames first without parsing all the sheets
    excel_file = pd.ExcelFile(path)
    sheet_names = excel_file.sheet_names
    sheet_name = None
    if len(sheet_names) == 1:
        sheet_name = sheet_names[0]
    elif len(sheet_names) > 1:
        # Show the window again
        root.deiconify()
        root.minsize(280, 30)
        root.title('Select sheet to open')
        # Create a combobox with the sheetnames as options to select
        combotext = tk.StringVar(value=sheet_names[0])
        box = ttk.Combobox(root,
                           textvariable=combotext,
                           values=sheet_names,
                           state='readonly')
        box.pack()
        # This function gets called when you select an item in the combobox
        def callback_function(event):
            # Mark sheet_name as global so it doesn't just make a new local variable
            global sheet_name
            sheet_name = combotext.get()
            # Close tkinter so Python can continue execution after root.mainloop()
            root.destroy()
        root.bind('<<ComboboxSelected>>', callback_function)
        root.mainloop()
    # Finally, parse the selected sheet
    # This is equivalent to pd.read_excel
    df = excel_file.parse(sheet_name=sheet_name)
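Both answers implement the same two conditions from the question. Without the GUI, the decision reduces to the small sketch below; `pick_sheet`, `load_selected_sheet` and the `ask` callback are hypothetical names, with `ask` standing in for whatever dialog (the Tkinter combobox or the ipywidgets dropdown) lets the user choose. Keeping the rule separate from the dialog makes it testable without a display.

```python
import pandas as pd


def pick_sheet(sheet_names, ask):
    """Return the sheet to load: the only one, or the user's choice."""
    if not sheet_names:
        raise ValueError("workbook has no sheets")
    if len(sheet_names) == 1:
        return sheet_names[0]  # condition 1: no dialog needed
    return ask(sheet_names)  # condition 2: let the user choose


def load_selected_sheet(path, ask):
    """Read only the chosen sheet (needs an Excel engine such as openpyxl)."""
    excel_file = pd.ExcelFile(path)
    return excel_file.parse(sheet_name=pick_sheet(excel_file.sheet_names, ask))
```

For example, `pick_sheet(["Orders"], ask)` returns "Orders" without ever calling `ask`, which is the single-sheet shortcut both answers take.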

How to update existing plot with Panel?

I have a dashboard application that works with Bokeh. I am trying to change it to use Panel and Geoviews. I am using the Panel Callback API, as this seems most like my existing code with Bokeh. I am running a regular Python script with the Panel server.
When my callback creates a new plot for the widget selection, Panel displays an additional plot instead of updating the existing one. Using "servable" adds another plot to the existing browser window; using "show" opens an additional window. How do I update the existing plot?
Here is some test code. (The full application displays a choropleth map with Geo data, and has many more widgets with code that reads different data, but this code illustrates the problem.)
import census_read_data as crd
import census_read_geopandas as crg
import pandas as pd
import geopandas as gpd
import geoviews as gv
from bokeh.plotting import show
from bokeh.models import PrintfTickFormatter
import panel as pn
import hvplot.pandas
# Get Census Merged Ward and Local Authority Data
# Replaced by test DataFrame
geography = pd.DataFrame(data=[
    ['E36007378', 'Chiswick Riverside', 'E09000018', 'Hounslow'],
    ['E36007379', 'Cranford', 'E09000018', 'Hounslow'],
    ['E36007202', 'Ealing Broadway', 'E09000009', 'Ealing'],
    ['E36007203', 'Ealing Common', 'E09000009', 'Ealing'],
    ['E36007204', 'East Acton', 'E09000009', 'Ealing'],
    ['E09000018', 'Hounslow', 'E09000018', 'Hounslow'],
    ['E09000009', 'Ealing', 'E09000009', 'Ealing']
], columns=["GeographyCode", "Name", "LAD11CD", "LAD11NM"])
# Get London Ward GeoPandas DataFrame
# Replaced by test DataFrame
london_wards_data_gdf = pd.DataFrame(data=[
    ['E36007378', 'E09000018', 378],
    ['E36007379', 'E09000018', 379],
    ['E36007202', 'E09000009', 202],
    ['E36007203', 'E09000009', 203],
    ['E36007204', 'E09000009', 204]
], columns=["cmwd11cd", "lad11cd", "data"])
# Get LAD GeoPandas DataFrame
# Replaced by test DataFrame
london_lads_data_gdf = pd.DataFrame(data=[
    ['E09000018', 757],
    ['E09000009', 609]
], columns=["lad11cd", "data"])
locationcol = "GeographyCode"
namecol = "Name"
datacol = 'data'
# Panel
pn.extension('bokeh')
gv.extension('bokeh')
lad_max_value = london_lads_data_gdf[datacol].max()
ward_max_value = london_wards_data_gdf[datacol].max()
title = datacol + " by Local Authority"
local_authorities = geography['LAD11CD'].unique()
granularities = ['Local Authorities', 'Wards']
# Create Widgets
granularity_widget = pn.widgets.RadioButtonGroup(options=granularities)
local_authority_widget = pn.widgets.Select(name='Wards for Local Authority',
                                           options=['All'] +
                                           [geography[geography[locationcol] == lad][namecol].iat[0]
                                            for lad in local_authorities],
                                           value='All')
widgets = pn.Column(granularity_widget, local_authority_widget)
layout = widgets
def update_graph(event):
    # Callback recreates map when granularity or local_authority are changed
    global layout
    granularity = granularity_widget.value
    local_authority_name = local_authority_widget.value
    print(f'granularity={granularity}')
    if granularity == 'Local Authorities':
        gdf = london_lads_data_gdf
        max_value = lad_max_value
        title = datacol + " by Local Authority"
    else:
        max_value = ward_max_value
        if local_authority_name == 'All':
            gdf = london_wards_data_gdf
            title = datacol + " by Ward"
        else:
            local_authority_id = geography[geography['Name'] ==
                                           local_authority_name].iloc[0]['GeographyCode']
            gdf = london_wards_data_gdf[london_wards_data_gdf['lad11cd'].str.match(
                local_authority_id)]
            title = datacol + " by Ward for " + local_authority_name
    # Replace gv.Polygons with hvplot.bar for test purposes
    map = gdf.hvplot.bar(y=datacol, height=500)
    layout = pn.Column(widgets, map)
    # With servable, a new plot is added to the browser window each time the widgets are changed
    # layout.servable()
    # With show, a new browser window is opened each time the widgets are changed
    layout.show()
granularity_widget.param.watch(update_graph, 'value')
local_authority_widget.param.watch(update_graph, 'value')
update_graph(None)
# panel serve panel_test_script.py --show
# panel serve panel_test_script.py --show
I ended up implementing my solution using Param rather than callbacks, which worked great. However, I later came across "Dynamically updating a Holoviz Panel layout", which showed me a solution to my original question.
The callback should not show() a new layout (with the new map); it should simply update the existing layout, replacing the existing map with the new one. As I write this, it seems obvious!
This code fragment shows the solution:
...
widgets = pn.Column(granularity_widget, local_authority_widget)
empty_map = pn.pane.Markdown('### Map placeholder...')
layout = pn.Column(widgets, empty_map)
def update_graph(event):
    ...
    # Replace gv.Polygons with hvplot.bar for test purposes
    map = gdf.hvplot.bar(y=datacol, height=500)
    # Update existing layout with new map
    layout[1] = map
granularity_widget.param.watch(update_graph, 'value')
local_authority_widget.param.watch(update_graph, 'value')
# Invoke callback to show map for initial widget values
update_graph(None)
layout.show()
