I'm trying to make an interactive plot with ipywidgets and plotly, but I'm afraid I'm missing something.
I have a dataframe with coordinates and some other columns. I want to plot it as a scatterplot with coord1 as x and coord2 as y, where each marker is colored by the value of a column that is selected interactively.
Additionally, when I change the column with the interactive menu, the color of every point should change to the column I selected, and the min and max of the colorbar should be rescaled to the min and max of the new column.
Furthermore, when I change another selector (selector2), I want the plot to display only the subset of my dataframe that matches a certain ID: big_grid[big_grid["id_col"]==selector2.value].
Lastly, there should be a range slider widget to adjust the color range of the colorbar.
So far I have this:
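import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import ipywidgets as widgets
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],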
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10]))
list_elem=["col1","col2"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list(list_elem))
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Range",
continuous_update=False)
fig = go.FigureWidget(data=px.scatter(big_grid,
x="coord1",
y="coord2",
color=big_grid[dropm_elem.value],
color_continuous_scale="Turbo",)
)
def handle_id_change(change):
    fig.data[0]['x']=big_grid[big_grid['id_col'].isin(dropm_id.value)]["coord1"]
    fig.data[0]['y']=big_grid[big_grid['id_col'].isin(dropm_id.value)]["coord2"]
    fig.data[0]['marker']['color']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value]
    fig.data[0]['marker']['cmin']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value].min()
    fig.data[0]['marker']['cmax']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value].max()
def handle_elem_change(change):
    fig.data[0]['marker']['color']=big_grid[big_grid['id_col'].isin(dropm_id.value)][dropm_elem.value]
dropm_elem.observe(handle_elem_change,names='value')
dropm_id.observe(handle_id_change,names='value')
right_box1 =widgets.HBox([fig])
right_box2=widgets.VBox([dropm_elem,dropm_id,rangewidg])
box=widgets.HBox([right_box1,right_box2])
box
With this, selecting the subset (from dropm_id) works, but the range widget and the hover labels are broken. When I change dropm_elem, the colors don't adjust as I expect; instead they become dark and uniform. Also, if I change the column and hover over the points, the tooltip lists the value of col2 but the label still says col1.
I'm afraid that I'm overcomplicating my life and there is surely an easier way, could someone enlighten me?
EDIT: If I use a different approach, with a global variable to define the subset to plot, a plotting function, and the widgets.interact function, I can make it work. The problem is that in this case the plot is not a widget, so I cannot put it into a VBox or HBox.
It also still feels wrong, and using global variables is not good practice. I'll provide the code anyway for reference:
def plot(elem,rang):
    fig = px.scatter(subset, x="coord1", y="coord2", color=elem,color_continuous_scale="Turbo",range_color=rang)
    fig.show()
def handle_elem_change(change):
    # Hold the notifications so that min, max and value are all set before any
    # validation error is reported; otherwise setting max first can leave max < min.
    with rangewidg.hold_trait_notifications():
        rangewidg.max=big_grid[dropm_elem.value].max()
        rangewidg.min=big_grid[dropm_elem.value].min()
        rangewidg.value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()]
def handle_id_change(change):
    global subset
    subset=big_grid[big_grid['id_col'].isin(dropm_id.value)]
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10]))
subset=big_grid
list_elem=["col1","col2"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list(list_elem))
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Range",
continuous_update=False)
dropm_elem.observe(handle_elem_change,names='value')
dropm_id.observe(handle_id_change,names='value')
display(dropm_id)
widgets.interact(plot,elem=dropm_elem,rang=rangewidg)
So I would like the behaviour of this second code, but in a widgets.HBox, and possibly without using global variables.
UPDATE: I managed to get a working version using the following code:
def handle_elem_change(change):
    # Hold the notifications so that min, max and value are all set before any
    # validation error is reported; otherwise setting max first can leave max < min.
    with rangewidg.hold_trait_notifications():
        rangewidg.max=big_grid[dropm_elem.value].max()
        rangewidg.min=big_grid[dropm_elem.value].min()
        rangewidg.value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()]
def plot_change(change):
    df=big_grid[big_grid['id_col'].isin(dropm_id.value)]
    output.clear_output(wait=True)
    with output:
        fig = px.scatter(df, x="coord1", y="coord2", color=dropm_elem.value,hover_data=["info"],
                         width=500,height=800, color_continuous_scale="Turbo",range_color=rangewidg.value)
        fig.show()
#define the widgets dropm_elem and rangewidg, which are the possible df.columns and the color range
#used in the function plot.
big_grid=pd.DataFrame(data=dict(id_col=[1,2,3,4,5],
col1=[0.1,0.2,0.3,0.4,0.5],
col2=[10,20,30,40,50],
coord1=[6,7,8,9,10],
coord2=[6,7,8,9,10],
info=["info1","info2","info3","info4","info5",]))
list_elem=["col1","col2","info"]
list_id=big_grid.id_col.values
dropm_elem=widgets.Dropdown(options=list_elem) #creates a widget dropdown with all the _ppms
dropm_id=widgets.SelectMultiple(
options=list_id,
description="Active Jobs",
disabled=False
)
rangewidg=widgets.FloatRangeSlider(value=[big_grid[dropm_elem.value].min(),big_grid[dropm_elem.value].max()],
min=big_grid[dropm_elem.value].min(),
max=big_grid[dropm_elem.value].max(),
step=0.001,
readout_format='.3f',
description="Color Scale Range",
continuous_update=False)
output=widgets.Output()
# this line is crucial, it basically says: Whenever you move the dropdown menu widget, call the function
# #handle_elem_change, which will in turn update the values of rangewidg
dropm_elem.observe(handle_elem_change,names='value')
dropm_elem.observe(plot_change,names='value')
dropm_id.observe(plot_change,names='value')
rangewidg.observe(plot_change,names='value')
# # #this line is also crucial, it links the widgets dropmenu and rangewidg with the function plot, assigning
# # #to elem and to rang (parameters of function plot) the values of dropmenu and rangewidg
left_box = widgets.VBox([output])
right_box =widgets.VBox([dropm_elem,rangewidg,dropm_id])
tbox=widgets.HBox([left_box,right_box])
# widgets.interact(plot,elem=dropm_elem,rang=rangewidg)
display(tbox)
This way everything works, but I basically need to create a new dataframe every time that I move anything. It might not be very efficient for big dataframes, but it runs.
Related
Here is the code of my plot.
input_dropdown = alt.binding_select(options=[None]+all_ids, name='Series ID', labels=["All"]+all_ids)
selection = alt.selection_single(fields=['ID'], bind=input_dropdown)
chart = alt.Chart(source_df).mark_line().encode(
x=alt.X('Date:T', title='Date'),
y=alt.Y('Value:Q', title='Value'),
color = alt.Color('ID:N', title='Series ID'),
strokeDash='Type:N'
).properties(
width=700
).add_selection(
selection
).transform_filter(
selection
)
st.altair_chart(chart)
Currently I can filter data displayed by choosing one value of ID column.
What should I do to filter by multiple ID values?
Something like: show me the data for both IDs '1' and '2'.
This is currently not possible via a widget like a dropdown because it is not implemented in the underlying Vega and Vega-Lite libraries. You could use another chart as a selection element, or maybe use a Streamlit component, since it looks like your code already uses that library.
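One way to read the Streamlit suggestion is to do the filtering in pandas before the data reaches Altair, with the selected IDs coming from a multi-select widget. A minimal sketch of just the filtering step, assuming an `ID` column as in the question; `filter_by_ids` and the sample data are hypothetical.

```python
import pandas as pd

def filter_by_ids(df, selected_ids, id_column="ID"):
    """Return rows whose ID is in selected_ids; an empty selection keeps all rows."""
    if not selected_ids:
        return df
    return df[df[id_column].isin(selected_ids)]

# Made-up data standing in for source_df from the question.
source_df = pd.DataFrame({
    "ID": ["1", "1", "2", "3"],
    "Value": [10, 11, 20, 30],
})
filtered = filter_by_ids(source_df, ["1", "2"])
print(filtered["ID"].tolist())  # ['1', '1', '2']
```

In the app, the list would presumably come from something like `st.multiselect('Series ID', all_ids)`, and `filtered` would be passed to `alt.Chart(...)` instead of `source_df`, so the Altair selection binding is no longer needed.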
I would like to find a way to modify the labels on holoviews Sankey diagrams so that they show the percentage values in addition to the numerical values.
For example:
import holoviews as hv
import pandas as pd
hv.extension('bokeh')
data = {'A':['XX','XY','YY','XY','XX','XX'],
'B':['RR','KK','KK','RR','RK','KK'],
'values':[10,5,8,15,19,1]}
df = pd.DataFrame(data, columns=['A','B','values'])
sankey = hv.Sankey(df)
For 'From' label 'YY' which is 'YY - 8' change this to 'YY - 8 (13.7%)' - add the additional percentage in there.
I have found ways to change from the absolute value to percentage by using something along the lines of:
value_dim = hv.Dimension('Percentage', unit='%')
But can't find a way to have both values in the label.
Additionally, I tried to modify the hover tag. In my search to find ways to modify this I found ways to reference and display various attributes in the hover information (through the bokeh tooltips) but it does not seem like you can manipulate this information.
This post explains two possible ways to achieve the wanted result. Let's start with the example DataFrame and the necessary imports.
import holoviews as hv
from holoviews import opts, dim # only needed for 2. solution
import pandas as pd
data = {'A':['XX','XY','YY','XY','XX','XX'],
'B':['RR','KK','KK','RR','RK','KK'],
'values':[10,5,8,15,19,1],
}
df = pd.DataFrame(data)
1. Option
Use hv.Dimension(spec, **params), which lets you apply a formatter to a column via the keyword value_format. This formatter simply combines the value with its percentage of the total.
total = df.groupby('A', sort=False)['values'].sum().sum()
def fmt(x):
    # multiply by 100 so the number shown is a percentage, not a fraction
    return f'{x} ({round(x/total*100,1)}%)'
hv.Sankey(df, vdims = hv.Dimension('values', value_format=fmt))
2. Option
Extend the DataFrame df by one column which stores the labels you want to use. This can later be reused inside the Sankey with opts(labels=dim('labels')). To check whether the calculations are correct, you can turn show_values on, but this will duplicate the value inside the labels. Therefore show_values is set to False in the final solution. Finding the correct order of the labels can sometimes be tricky.
labels = []
for item in ['A', 'B']:
    grouper = df.groupby(item, sort=False)['values']
    total_sum = grouper.sum().sum()
    for name, group in grouper:
        _sum = group.sum()
        # multiply by 100 so the number shown is a percentage, not a fraction
        _percent = round(_sum/total_sum*100,1)
        labels.append(f'{name} - {_sum} ({_percent}%)')
df['labels'] = labels
hv.Sankey(df).opts(show_values=False, labels=dim('labels'))
The downside of this solution is that we apply a groupby for both columns 'A' and 'B'. holoviews will do this too, so it is not very efficient.
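To check the label strings independently of holoviews (and of their order), the per-node totals can be collected into a dict keyed by node name. This is only a pandas sketch: `node_labels` is a hypothetical helper, and it assumes that no node name appears in both 'A' and 'B'.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['XX', 'XY', 'YY', 'XY', 'XX', 'XX'],
    'B': ['RR', 'KK', 'KK', 'RR', 'RK', 'KK'],
    'values': [10, 5, 8, 15, 19, 1],
})

def node_labels(frame, node_columns=('A', 'B'), value_column='values'):
    """Map each node name to 'name - sum (percent%)'."""
    total = float(frame[value_column].sum())
    labels = {}
    for column in node_columns:
        sums = frame.groupby(column, sort=False)[value_column].sum()
        for name, node_sum in sums.items():
            percent = round(float(node_sum) / total * 100, 1)
            labels[name] = f'{name} - {int(node_sum)} ({percent}%)'
    return labels

labels = node_labels(df)
print(labels['YY'])  # YY - 8 (13.8%)
```

Note that 8 out of 58 rounds to 13.8%, not the 13.7% written in the question.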
Output
Comment
Both solutions create nearly the same figure, except that the HoverTool output differs.
I have a DataFrame with the variables below. I am trying to find the relationship by plotting "profit" with other variables excluding "Date".
Date
Billable_Fixed Bid
Billable_Time_Material
Billable_Transaction_Based
Non_Billable
Indirect_Costs
Unbilled_CP_and_AM
Direct_Costs
Profit
Code:
cols = [
'Billable_Fixed Bid',
'Billable_Time_Material',
'Billable_Transaction_Based',
'Non_Billable',
'Unbilled_CP_and_AM',
'Direct_Costs'
]
sns.pairplot(data1,x_vars=cols,y_vars='Profit',size =5,kind='reg')
Problem is the plots are getting displayed in a single line, which is not clearly visible.
I want it to display 2 plots per line so that it is clearly visible.
Can anyone help?
As per the comment: Using FacetGrid with col_wrap=2 will solve your problem. Check the examples in the documentation.
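A FacetGrid wraps on a single categorical column, so the wide data has to be reshaped to long form first. Here is a rough sketch of that idea using sns.lmplot (which builds a FacetGrid internally); the sample numbers, `to_long` and `plot_wrapped` are made up for illustration.

```python
import pandas as pd

def to_long(data, x_cols, y_col="Profit"):
    """Reshape wide data so each row holds (Profit, variable name, value)."""
    return data.melt(id_vars=[y_col], value_vars=x_cols,
                     var_name="variable", value_name="value")

def plot_wrapped(long_df, y_col="Profit"):
    # Imported here so the reshaping step can be run without seaborn installed.
    import seaborn as sns
    # col_wrap=2 starts a new row of panels after every second panel.
    return sns.lmplot(data=long_df, x="value", y=y_col,
                      col="variable", col_wrap=2, height=5)

# Made-up numbers standing in for data1 from the question.
data1 = pd.DataFrame({
    "Non_Billable": [1.0, 2.0, 3.0],
    "Direct_Costs": [4.0, 5.0, 6.0],
    "Profit": [10.0, 12.0, 15.0],
})
long_df = to_long(data1, ["Non_Billable", "Direct_Costs"])
print(long_df.shape)  # (6, 3)
```

With the real data you would call `plot_wrapped(to_long(data1, cols))`, where `cols` is the list from the question.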
I love how easy it is to set up basic hover feedback with HoverTool, but I'm wrestling with a couple aspects of the display. I have time-series data, with measurements that represent amounts in US$. This data starts out life as a pandas.Series. Legible plotting is easy (following assumes jupyter notebook):
p = figure(title='Example currency', x_axis_type='datetime',
plot_height=200, plot_width=600, tools='')
p.line(my_data.index, my_data)
p.yaxis[0].formatter = NumeralTickFormatter(format='$0,0')
show(p)
This shows me the time-series, with date-formatting on the x-axis and y-axis values that look like "$150,000", "$200,000", "$250,000", etc. I have two questions about HoverTool behavior:
Controlling formatting for $x and $y.
Accessing the name of the dataset under the cursor.
Simply adding a HoverTool allows me to see values, but in unhelpful units:
p.add_tools(HoverTool())
The corresponding tooltip values with these defaults show "1.468e+5" rather than "$146,800" (or even "146800", the underlying Series value); similarly, the date value appears as "1459728000000" rather than (say) "2016-04-04". I can manually work around this display issue by making my pandas.Series into a ColumnDataSource and adding string columns with the desired formatting:
# Make sure Series and its index have `name`, before converting to DataFrame
my_data.name = 'revenue'
my_data.index.name = 'day'
df = my_data.reset_index()
# Add str columns for tooltip display
df['daystr'] = df['day'].dt.strftime('%m %b %Y')
df['revstr'] = df['revenue'].apply(lambda x: '${:,d}'.format(int(x)))
cds = ColumnDataSource(df)
p = figure(title='Example currency', x_axis_type='datetime',
plot_height=200, plot_width=600, tools='')
p.line('day', 'revenue', source=cds)
p.yaxis[0].formatter = NumeralTickFormatter(format='$0,0')
p.add_tools(HoverTool(tooltips=[('Amount', '@revstr'), ('Day', '@daystr')]))
show(p)
but is there a way to handle the formatting in the HoverTool configuration instead? That seems much more desirable than all the data-set transformation that's required above. I looked through the documentation and (quickly) scanned through the source, and didn't see anything obvious that would save me from building the "output" columns as above.
Related to that, when I have several lines in a single plot, is there any way for me to access the name (or perhaps legend value) of each line within HoverTool.tooltips? It would be extremely helpful to include something in the tooltip to differentiate which dataset values are coming from, rather than needing to rely on (say) line-color in conjunction with the tool-tip display. For now, I've added an additional column to the ColumnDataSource that's just the string value I want to show; that obviously only works for datasets that include a single measurement column. When multiple lines are sharing an underlying ColumnDataSource, it would be sufficient to access the column-name that's provided to y.
Hey, I know it's 2 years late, but this is for other people who come across this:
p.add_tools(HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Value', '@Value{int}')],
    # newer Bokeh versions expect these keys as '@Date' and '@Value'
    formatters={
        'Date':'datetime',
        'Value':'numeral'},
    mode='vline'
))
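For reference, the raw values seen in the question's default tooltip are milliseconds since the Unix epoch and a plain float. The plain-Python sketch below (not Bokeh code; `format_day` and `format_amount` are hypothetical helpers) shows the conversions the formatters are meant to perform on them.

```python
from datetime import datetime, timezone

def format_day(epoch_ms):
    """Turn milliseconds since the Unix epoch into a YYYY-MM-DD string (UTC)."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d')

def format_amount(value):
    """Format a number as whole US dollars with thousands separators."""
    return '${:,.0f}'.format(value)

print(format_day(1459728000000))  # 2016-04-04
print(format_amount(1.468e5))     # $146,800
```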
I have a dataframe called afplot:
apple_fplot = apple_f1.groupby(['Year','Domain Category'])['Value'].sum()
afplot = apple_fplot.unstack('Domain Category')
I now need to produce a plot for each column of afplot, and need to save each plot to a unique filename.
I've been trying to do this with a for loop (I know that's inefficient), but can't seem to get it right.
for index, column in afplot.iteritems():
    plt.figure(index); afplot[column].plot(figsize=(12,6))
    plt.xlabel('Year')
    plt.ylabel('Fungicide used / lb')
    plt.title('Amount of fungicides used on apples in the US')
    plt.legend()
    plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/apple_fplot{}'.format(index))
I'm not sure if I'm going about this the right way, but the whole idea is to have the plot be reset each time it goes through the iteration, plotting only the next column's line plot, and then saves it to a new filename.
The df.items() iterator (spelled df.iteritems() before pandas 2.0) returns (column name, series) pairs (see docs). So you can simplify:
for col, data in afplot.items():
    ax = data.plot(title='Amount of fungicides used on apples in the US')
    ax.set_ylabel('Fungicide used / lb')
    plt.gcf().savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/apple_fplot{}'.format(col))
    plt.close()
The xlabel should already be 'Year' as this seems to be the name of the index. Legend is True by default. See additional plot parameters.
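A self-contained variant of the loop, in case it helps to test the idea without the real data. This is a sketch under assumptions: the column names and numbers are made up, `save_column_plots` is a hypothetical helper, and it writes to a temporary directory instead of the path in the question. It also assumes the column names are valid in file names.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

def save_column_plots(frame, out_dir, prefix="apple_fplot"):
    """Save one line plot per column and return the written paths."""
    out_dir = Path(out_dir)
    saved = []
    for col, series in frame.items():
        # A fresh figure per column, so lines never pile up on one plot.
        fig, ax = plt.subplots(figsize=(12, 6))
        series.plot(ax=ax, title="Amount of fungicides used on apples in the US")
        ax.set_ylabel("Fungicide used / lb")
        path = out_dir / f"{prefix}{col}.png"
        fig.savefig(path)
        plt.close(fig)
        saved.append(path)
    return saved

# Made-up stand-in for afplot: years as index, one column per category.
afplot = pd.DataFrame({"Captan": [1.0, 2.0], "Sulfur": [3.0, 4.5]},
                      index=pd.Index([2015, 2016], name="Year"))
saved = save_column_plots(afplot, tempfile.mkdtemp())
print([p.name for p in saved])  # ['apple_fplotCaptan.png', 'apple_fplotSulfur.png']
```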