I am trying to build an annotation interface using streamlit.
In my dataset, each data point may have multiple labels (i.e. labels in the code below). However, I can only select one label with st.multiselect() rather than the expected multiple selection. Specifically, every time I click one of the choices, the page updates and the next data point pops up.
I am not sure what went wrong after being stuck on this for hours. Could anyone give me some pointers?
import pandas as pd
import streamlit as st

df = pd.read_pickle("unlabeled.pkl")
records = df.to_dict("records")

if "annotations" not in st.session_state:
    st.session_state.records = records
    st.session_state.current_record = records[0]

annotated_data = list()

if st.session_state.records:
    labels = st.session_state.current_record["labels"]
    example = st.session_state.current_record["example"]
    text = st.session_state.current_record["text"]
    demo = "\n".join(["- {}".format(ee) for ee in example])
    text = "- {}".format(text)
    st.write(f"# Example\n{demo}\n# Output\n{text}")
    labels = st.multiselect(
        label="Select Labels",
        options=labels
    )
    st.write('You Selected:', labels)
    if st.button("Save"):
        st.session_state.records.remove(st.session_state.current_record)
        st.session_state.current_record = st.session_state.records[0]
        annotated_data.append(
            {
                **st.session_state.current_record,
                "label": labels
            }
        )
        if len(annotated_data) % 50 == 0:
            save_data(annotated_data)
save_data(annotated_data)
You can use a form to prevent the app from rerunning after interacting with the multiselect widget (instead, the app won't rerun until you hit "Submit").
I am well and truly baffled. First time using the dropdown widget so forgive me if this is obvious and thank you for any help you can provide.
Here is the dataframe I want to display and how it was built:
def top_10_venues(data) :
    num_top_venues = 10
    indicators = ['st', 'nd', 'rd']
    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except:
            columns.append('{}th Most Common Venue'.format(ind+1))
    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = data['Neighborhood']
    for ind in np.arange(denver_grouped.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(data.iloc[ind, :], num_top_venues)
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.set_index(['Neighborhood'])
top_10_venues(denver_grouped)
neighborhoods_venues_sorted
Here is my dropdown widget:
#Experimenting with Jupyter dropdown
filtered_df = None
dropdown = widgets.SelectMultiple(
    options=neighborhoods_venues_sorted.index,
    description='Venue',
    disabled=False,
    layout={'height':'100px', 'width':'40%'})
def max_density(widget):
    global filtered_df
    selection = list(widget['new'])
    with out:
        clear_output()
        display(neighborhoods_venues_sorted[selection])
        filtered_df = neighborhoods_venues_sorted[selection]
out = widgets.Output()
dropdown.observe(filter_dataframe, names='value')
display(dropdown)
display(out)
Here is what I end up seeing, the unformatted dataframe I ran the function on?
Booyah, figured it out!
It seems my issue was a misunderstanding of what was happening within the cell that created neighborhoods_venues_sorted. I thought I was creating a dataframe; instead, I had created a function.
First is the sort function
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
This is the new function instead of a block of code in a cell
#Function to create sorted data frame with top 10 most common venues
def top_ten_venues(df) :
    num_top_venues = 10
    indicators = ['st', 'nd', 'rd']
    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except:
            columns.append('{}th Most Common Venue'.format(ind+1))
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = df['Neighborhood']
    for ind in np.arange(denver_grouped.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df.iloc[ind, :], num_top_venues)
    #important to have a return in a function, this is the output that can be attached to a variable
    return neighborhoods_venues_sorted
Next I ran it on my targeted dataframe and assigned the result to a variable. This fixed my issue. I'm still too new to fully understand why this exact same code, when run in a cell, refused to assign a new dataframe.
#creating a variable to hold the df for later access
neighborhoods_venues_sorted = top_ten_venues(denver_grouped)
#reindexing because it's fun
neighborhoods_venues_sorted = neighborhoods_venues_sorted.set_index(['Neighborhood'])
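One plausible reading of why the earlier version did not produce a usable dataframe: in the question, the code was already wrapped in top_10_venues, which built the frame in a local variable and had no return, so calling it never assigned anything at notebook level. A minimal sketch of that scoping behaviour, with plain lists standing in for the dataframe (the function names `build_without_return` and `build_with_return` are hypothetical):

```python
def build_without_return(rows):
    # `result` is local to this call and is discarded when the function ends.
    result = sorted(rows)


def build_with_return(rows):
    # Returning the object lets the caller bind it to a name of its own.
    result = sorted(rows)
    return result


lost = build_without_return([3, 1, 2])
kept = build_with_return([3, 1, 2])

print(lost)  # None
print(kept)  # [1, 2, 3]
```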
The Problem
I wanted to create an interactive hbar plot, where you can switch between 3 different data sources, using a select widget, a python callback and a local bokeh serve. The plot with the default source renders fine, but when I switch to a different source, the y labels stay the same and the plot turns blank. Changing back to the original value on the select widget does not show the plot I started out with and stays blank.
When I hard-code the initial source to a different one in the code, it renders just fine until I switch it using the widget again, so each data source seems to work fine on its own.
Am I missing something? I read through many threads, docs and tutorials but can't find anything wrong with my code.
Here is what I have done so far:
I read a .csv file, create 3 separate dataframes and then convert them to ColumnDataSources. Every source has 10 data entries with the columns "species", "ci_lower" and "ci_upper".
Here is an example of one source (all three are built exactly the same way, with different taxon classes):
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "AZA_MLE_Jul2018_utf8.csv",), encoding='utf-8')
m_df = df[df["taxon_class"]=="Mammalia"]
m_df = m_df.sort_values(by="mle", ascending=False)
m_df = m_df.reset_index(drop=True)
m_df = m_df.head(10)
m_df = m_df.sort_values(by="species", ascending=False)
m_df = m_df.reset_index(drop=True)
m_source = bp.ColumnDataSource(m_df)
I saved all 3 sources in a dict:
sources_dict={
    "Mammalia": m_source,
    "Aves": a_source,
    "Reptilia": r_source
}
... and then created my variable called "source" that should change interactively with the "Mammalia" source as default:
source = sources_dict["Mammalia"]
Next I created a figure and added a hbar plot with the source variable as follows:
plot = bp.figure(x_range=(0, np.amax(source.data["ci_upper"])+5), y_range=source.data["species"])
plot.hbar(y="species", right="ci_lower", left="ci_upper", height=0.5, fill_color="#b3de69", source=source)
Then I added the select widget with a python callback:
def select_handler(attr, old, new):
    source.data["species"]=sources_dict[new].data["species"]
    source.data["ci_lower"]=sources_dict[new].data["ci_lower"]
    source.data["ci_upper"]=sources_dict[new].data["ci_upper"]
select = Select(title="Taxonomic Class:", value="Mammalia", options=list(sources_dict.keys()))
select.on_change("value", select_handler)
curdoc().add_root(bk.layouts.row(plot, select))
I tried this:
My suspicion was that the error lies within the callback function, so I tried many different variants, all with the same bad result. I will list some of them here:
I tried using a python native dictionary:
new_data= {
    'species': sources_dict[new].data["species"],
    'ci_lower': sources_dict[new].data["ci_lower"],
    'ci_upper': sources_dict[new].data["ci_upper"]
}
source.data=new_data
I tried assigning the whole data source, not just swapping the data
source=sources_dict[new]
I also tried using dict()
source.data = dict(species=sources_dict[new].data["species"], ci_lower=sources_dict[new].data["ci_lower"], ci_upper=sources_dict[new].data["ci_upper"])
Screenshots
Here is a screenshot of the initial plot, when I run the py file with bokeh serve --show file.py
And here one after changing the selected value:
I would greatly appreciate any hints that could help me figure this out.
Answering your question in the comment, changing data does not change the ranges because y_range=some_thing is just a convenience over creating a proper range class that's done behind the curtain.
Here's how you can do it manually. Notice that I don't touch x_range at all - by default it's DataRange1d that computes its start/end values automatically.
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import Select, ColumnDataSource
from bokeh.plotting import figure

d1 = dict(x=[0, 1], y=['a', 'b'])
d2 = dict(x=[8, 9], y=['x', 'y'])
ds = ColumnDataSource(d1)

def get_factors(data):
    return sorted(set(data['y']))

p = figure(y_range=get_factors(d1))
p.circle(x='x', y='y', source=ds)
s = Select(options=['1', '2'], value='1')

def update(attr, old, new):
    if new == '1':
        ds.data = d1
    else:
        ds.data = d2
    p.y_range.factors = get_factors(ds.data)

s.on_change('value', update)
curdoc().add_root(column(p, s))
I'm writing a function for Jupyter notebooks, where a user will be able to obtain data as Pandas Dataframe (irrelevant for this question) and display it with filters he could create when needed.
My problem is that I can't "link" the interact filter to the data itself. I had no problem defining the filters manually in the code, but I can't do it from the user side. I researched a lot of questions on Stack Overflow, Google, the project's GitHub and the docs before posting here.
Here is the POC:
import pandas as pd
import ipywidgets as widgets
from ipywidgets import *
from IPython.display import display
import numpy as np
np.random.seed(0)
# Data example
my_columns = list('ABCD')
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
# Encapsulate Table inside Output widget
table_out = widgets.Output()
with table_out:
    display(df)
# Filter for ints
def filter_int(column, x):
    return df.loc[df[column] > x]
# Our filter generator
def generate_filter(button):
    # Check if exist before creating
    if not select_definition.value in [ asd.children[0].description for asd in filters.children ]:
        # Not exist. Create this filter
        new_filter = interactive(
            filter_int, # Our filter for ints
            column=fixed(select_definition.value), # Which column as filter
            x=widgets.IntSlider(min=0, max=100, step=1, value=10, description=select_definition.value) # Value from the user
        )
        # Append created filter
        filters.children=tuple(list(filters.children) + [new_filter])
# Define button and event
button = widgets.Button(description="Add")
button.on_click(generate_filter)
# Define Dropdown
select_definition = widgets.Dropdown(options=my_columns, layout=Layout(width='10%'))
# Put Dropdown and button together
choose_filter = widgets.HBox([select_definition, button])
# Where we will put all our filters
filters = widgets.HBox()
display(choose_filter, filters, table_out)
Which will create this:
I'm able to create the filters for the columns dynamically, but I'm not sure how to make them update the table and link together (so the table will be updated based on multiple filters).
The expected result is to be able to create filters for columns A and B and update the table with the values defined by them, as shown in the image below:
Any help is appreciated!
Note: The last image was generated with df.loc[(df['A'] > 22) & (df['B'] > 92)]
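The combined condition in the note above can be built up generically from any number of per-column thresholds. The following is only a sketch of that masking step, not a full widget solution; `apply_filters` and the `thresholds` dict are hypothetical names, and the small `demo` frame is made-up data.

```python
import pandas as pd


def apply_filters(df, thresholds):
    """Keep rows where every listed column is strictly greater than its threshold."""
    # Start with an all-True mask and narrow it down one column at a time.
    mask = pd.Series(True, index=df.index)
    for column, minimum in thresholds.items():
        mask &= df[column] > minimum
    return df.loc[mask]


demo = pd.DataFrame({"A": [10, 30, 50], "B": [95, 90, 99]})

# Equivalent to demo.loc[(demo['A'] > 22) & (demo['B'] > 92)]
filtered = apply_filters(demo, {"A": 22, "B": 92})
print(filtered)
```

In the widget setting, each slider's callback could presumably rebuild the `thresholds` dict from the current slider values and redisplay the result inside `table_out`.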
I have HoloViews code that is meant to save its output as .html. The code below works in the sense that the HTML is generated and the tags are rendered, but the filters don't work. What am I doing wrong?
def load_data(country, lan_name, **kwargs):
    df = subset
    if country != 'ALL':
        df = df[(df.country == country)]
    if lan_name != 'ALL':
        df = df[(df.lan_name == lan_name)]
    table = format_chars(df['term'], df['hex'])
    #hv.Table(df, ['country', 'lan_name'], [], label='Data Table')
    layout = (table).opts(
        opts.Layout(merge_tools=False),
        opts.Div(width=700, height=400),
    )
    return layout
methods = ['ALL'] + sorted(list(subset['country'].unique()))
models = ['ALL'] + sorted(list(subset['lan_name'].unique()))
dmap = hv.DynamicMap(load_data, kdims=['country', 'lan_name']).redim.values(country=methods, lan_name=models)
hv.save(dmap, 'output.html', backend='bokeh')
By "filters" it sounds like you mean the widgets that select along the country and lan_name dimensions. Each time you select a new value of a widget, a DynamicMap calls the Python function that you provide it (load_data here) to calculate the display (which is what makes it "Dynamic"). There is no Python process available when you have a static HTML file, so the display will never get updated in that case.
To make some limited functionality available in a static HTML file, you can convert the DynamicMap to a HoloMap that contains all the displayed items for some specific combinations of widget values (http://holoviews.org/user_guide/Live_Data.html#Converting-from-DynamicMap-to-HoloMap). The resulting parameter space can quickly get quite large, so you will often need to select a feasible subset of values for this to be a practical option.
I am trying to build a heat map using Bokeh, but I keep getting the same error. I'll include both my code and the error below; please help me out!
I assumed that the error is mainly about NaNs in my data, so I added the necessary if statements to the code to address this. I even tried filling any possible NaNs with zero in the following lists: 'user', 'module', 'ratio', 'color', and 'alpha'. However, none of these changes helped.
colors = ['#ff0000','#ff1919','#ff4c4c','#ff7f7f','#99cc99','#7fbf7f','#4ca64c','#329932','#008000']
sorted_userlist = list(total_checks_sorted.index)
user = []
module = []
ratio = []
color = []
alpha = []
for m_id in ol_module_ids:
    pset = m_id.split('/')[-1]
    col_name1 = m_id + '_ratio'
    col_name2 = m_id + '_total'
    min_checks = min(check_matrix[col_name2].values)
    max_checks = max(check_matrix[col_name2].values)
    for i, u in enumerate(sorted_userlist):
        module.append(pset)
        user.append(str(i+1))
        ratio_value = check_matrix[col_name1][u]
        ratio.append(ratio_value)
        al= math.sqrt((check_matrix[col_name2][u]-min_checks+0.0001)/float(max_checks))
        if ratio_value>0.16:
            al = min(al*100,1)
        alpha.append(al)
        if np.isnan(ratio_value):
            color.append(colors[0])
        else:
            color.append(colors[int(ratio_value*8)])
#fill NAs in source lists with zeroes
pd.Series(ratio).fillna(0).tolist()
col_source = ColumnDataSource(data = dict(module = module, user = user, color=color, alpha=alpha, ratio = ratio))
#source = source.fillna('')
#TOOLS = "resize,hover,save,pan,box_zoom,wheel_zoom"
TOOLS = "reset,hover,save,pan,box_zoom,wheel_zoom"
p=figure(title="Ratio of Correct Checks Each Student Each Online Homework Problem",
         x_range=pset,
         #y_range = list(reversed(sorted_userlist)),
         y_range=list(reversed(list(map(str, range(1,475))))),
         x_axis_location="above", plot_width=900, plot_height=4000,
         toolbar_location="left", tools=TOOLS)
         #axis_line_color = None)
         #outline_line_color = None)#
p.rect("module", "user", 1, 1, source=col_source,
       color="color", alpha = 'alpha', line_color=None)
show(p)
NaN values are not JSON serializable (this is a glaring deficiency in the JSON standard). You mentioned there are NaN values in the ratio list, which you are putting in the ColumnDataSource here:
col_source = ColumnDataSource(data=dict(..., ratio=ratio))
Since it is in the CDS, Bokeh will try to serialize it, resulting in the error. You have two options:
If you don't actually need the numeric ratio values in the plot for some reason (e.g. to drive a hover tool or custom mapper or something), then just leave it out of the data source.
If you do need to send the ratio values, then you must put the data into a NumPy array. Bokeh serializes NumPy arrays using a different, non-JSON approach, so it is then possible to send NaNs successfully.
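The underlying JSON limitation can be reproduced with the standard library alone. This sketch does not involve Bokeh at all; it only shows that a strictly standard-compliant encoder rejects NaN, using a made-up `ratio` list in place of the asker's data.

```python
import json

# Hypothetical stand-in for the asker's ratio list, with one missing value.
ratio = [0.25, float("nan"), 0.75]

# By default Python emits the token NaN, which is not valid JSON.
print(json.dumps(ratio))

# With allow_nan=False the encoder follows the JSON standard and refuses NaN.
try:
    json.dumps(ratio, allow_nan=False)
    strict_ok = True
except ValueError as exc:
    strict_ok = False
    print("strict JSON encoding failed:", exc)
```

For the second option above, the asker's list could be converted with something like `np.array(ratio, dtype=float)` before it is passed to the ColumnDataSource.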