So I have a dataframe with 3 columns: date, price, text
import pandas as pd
from datetime import datetime
import random
columns = ('dates','prices','text')
datelist = pd.date_range(datetime.today(), periods=5).tolist()
prices = []
for i in range(0, 5):
    prices.append(random.randint(50, 60))
text =['AAA','BBB','CCC','DDD','EEE']
df = pd.DataFrame({'dates': datelist, 'price':prices, 'text':text})
                       dates  price text
0 2022-11-23 14:11:51.142574     51  AAA
1 2022-11-24 14:11:51.142574     57  BBB
2 2022-11-25 14:11:51.142574     52  CCC
3 2022-11-26 14:11:51.142574     51  DDD
4 2022-11-27 14:11:51.142574     59  EEE
I want to plot date and price on a line chart, but when I hover over the line I want it to show the text from the row corresponding to that date.
E.g. when I hover over the point corresponding to 2022-11-27, I want the text to show 'EEE'.
I've tried a few things in matplotlib etc., but I can only get data from the x and y axes to show; I can't figure out how to show data from a different column.
You could use Plotly.
import plotly.graph_objects as go
fig = go.Figure(data=go.Scatter(x=df['dates'], y=df['price'], mode='lines+markers', text=df['text']))
fig.show()
You should be aware that cursor and dataframe indexing will probably work well with points on a scatter plot, but a line plot is a little trickier to handle.
With a line plot, matplotlib draws the line between 2 data points (basically, linear interpolation), so some specific logic is needed to:
specify the intended behavior
implement the corresponding mouseover behavior when the cursor lands "between" 2 data points.
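One possible choice for that behavior (an assumption, since the question does not specify it) is to snap to the nearest real data point and show that row's text. The sketch below is independent of any plotting library; nearest_index, days and labels are hypothetical names used only for illustration.

```python
from bisect import bisect_left


def nearest_index(xs, x_cursor):
    """Return the index of the value in sorted `xs` closest to `x_cursor`."""
    pos = bisect_left(xs, x_cursor)
    if pos == 0:
        return 0
    if pos == len(xs):
        return len(xs) - 1
    # Pick whichever neighbour is closer; ties go to the left point.
    before, after = xs[pos - 1], xs[pos]
    return pos - 1 if (x_cursor - before) <= (after - x_cursor) else pos


# Hypothetical data mirroring the question: x positions (day offsets) and labels.
days = [0, 1, 2, 3, 4]
labels = ["AAA", "BBB", "CCC", "DDD", "EEE"]

# A cursor 70% of the way between day 3 and day 4 snaps to day 4.
print(labels[nearest_index(days, 3.7)])
```

In a real plot, the cursor position would come from the mouse event and the returned index would be used to look up the row in the dataframe.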
The libraries and links below may provide tools to handle both scatter plots and line plots, but I am not expert enough to point you to that specific part in either the SO link or the mplcursors link.
(Besides, the exact intended behavior was not clearly stated in your initial question; consider editing/clarifying.)
So, as an alternative to DankyKang's answer, have a look at this SO question and its answers, which cover a wide range of possibilities for mouseover: How to add hovering annotations to a plot
A library worth noting is this one: https://mplcursors.readthedocs.io/en/stable/
Quoting:
mplcursors provides interactive data selection cursors for Matplotlib. It is inspired from mpldatacursor, with a much simplified API.
mplcursors requires Python 3, and Matplotlib≥3.1.
Specifically this example based on dataframes: https://mplcursors.readthedocs.io/en/stable/examples/dataframe.html
Quoting:
DataFrames can be used similarly to any other kind of input. Here, we generate a scatter plot using two columns and label the points using all columns.
This example also applies a shadow effect to the hover panel.
Here is a copy-paste of the code example, in case this answer would otherwise be considered incomplete:
from matplotlib import pyplot as plt
from matplotlib.patheffects import withSimplePatchShadow
import mplcursors
from pandas import DataFrame
df = DataFrame(
    dict(
        Suburb=["Ames", "Somerset", "Sawyer"],
        Area=[1023, 2093, 723],
        SalePrice=[507500, 647000, 546999],
    )
)
df.plot.scatter(x="Area", y="SalePrice", s=100)
def show_hover_panel(get_text_func=None):
    cursor = mplcursors.cursor(
        hover=2,  # Transient
        annotation_kwargs=dict(
            bbox=dict(
                boxstyle="square,pad=0.5",
                facecolor="white",
                edgecolor="#ddd",
                linewidth=0.5,
                path_effects=[withSimplePatchShadow(offset=(1.5, -1.5))],
            ),
            linespacing=1.5,
            arrowprops=None,
        ),
        highlight=True,
        highlight_kwargs=dict(linewidth=2),
    )
    if get_text_func:
        cursor.connect(
            event="add",
            func=lambda sel: sel.annotation.set_text(get_text_func(sel.index)),
        )
    return cursor
def on_add(index):
    item = df.iloc[index]
    parts = [
        f"Suburb: {item.Suburb}",
        f"Area: {item.Area:,.0f}m²",
        f"Sale price: ${item.SalePrice:,.0f}",
    ]
    return "\n".join(parts)
show_hover_panel(on_add)
plt.show()
I'm exporting datasets from equipment logging software and am trying to use Bokeh (Python) as an interactive visual aide during analysis. Everything is working fine, except for the date/time which refuses to be imported in its current format (24/08/2022 01:40:32). I have data for every second for at least a month's worth (So dropping the date wouldn't work).
I've been playing about with Bokeh for a while now by simply ignoring the date/time and replacing it with a consecutive series (1, 2, 3...) and plotting it as such, but the time has come to fix my temporary solution, and I just can't seem to figure out how to define the formatting or how to convert it. (Bokeh documentation)
Example code:
from bokeh.io import output_file, show # OUTPUT_FILE FOR EXPORT (NOT USED)
from bokeh.layouts import gridplot # MULTIPLOT
from bokeh.plotting import figure
from bokeh.palettes import Spectral4 # COLOUR PALETTE
import pandas as pd
import external_tags as tags # TAG DEFINITIONS USED FOR CSV IMPORTING
# import csv
df = pd.read_csv("AUGUST_PS_1MIN.csv") # testset with 1 min intervals
# TOOLS
TOOLS = "box_zoom, box_select, crosshair, reset, hover"
Figure_Title = "TESTING AUTOMATING IMPORT WITHOUT MANUAL TWEAKING"
line_width = 1.5
alpha = 1
height = 500
x = df[tags.Date_Time_UTC[0]]
# These just redirect to my imported tag definitions TAG = ["column name", "friendly name"]
fig1a = tags.PS_MH_LOAD
fig1b = tags.PS_MH_WINCH_PWR
fig1c = tags.PS_PWR_MSB1
fig1d = tags.PS_PWR_MSB2
# FIGURE A (TOP LEFT)
s1 = figure(sizing_mode="stretch_width", height=height, title="LOAD", tools=TOOLS, x_axis_type='datetime')
s1.line(x, df[fig1a[0]], color=Spectral4[0], alpha=alpha, line_width=line_width, legend_label=fig1a[1])
s1.line(x, df[fig1b[0]], color=Spectral4[1], alpha=alpha, line_width=line_width, legend_label=fig1b[1])
s1.line(x, df[fig1c[0]], color=Spectral4[2], alpha=alpha, line_width=line_width, legend_label=fig1c[1])
s1.line(x, df[fig1d[0]], color=Spectral4[3], alpha=alpha, line_width=line_width, legend_label=fig1d[1])
#### some repetitive code has been omitted here for brevity
# Define the grid
# p = gridplot([[s1, s2],[s3, s4]])
# show the results
show(s1)
Example of a dataset
2022-08-26 04:03:52.000,0,30,30,894.70751953125,-63.785041809082,-0.497732371091843,2.14258599281311,0.0307948496192694,355.496154785156,0,0,0,2.38387619901914E-05,0,102.844131469727,0.040388036519289,0.703329265117645,0,0,0.0244150012731552,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106834815815091
2022-08-26 04:03:53.000,0,30,30,895.21142578125,-63.6380615234375,-0.550026297569275,2.14223098754883,0.0307948496192694,355.496154785156,0,0,0,1.45306594276917E-05,0,102.827079772949,0.0610153041779995,0.733967423439026,0,0,0.0245136469602585,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106870988383889
2022-08-26 04:03:54.000,0,30,30,895.726196289063,-63.6465072631836,-0.533430516719818,2.1423876285553,0.0307948496192694,355.496154785156,0,0,0,8.71746851771604E-06,0,102.834602355957,0.0816425681114197,0.764605581760406,0,0,0.0246122926473618,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106907160952687
2022-08-26 04:03:55.000,0,30,30,896.1552734375,-63.0882987976074,-0.534056782722473,2.14190745353699,0.0307948496192694,355.496154785156,0,0,0,5.21722904522903E-06,0,102.811561584473,0.10226983577013,0.795243740081787,0,0,0.024710938334465,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106943333521485
2022-08-26 04:03:56.000,0,30,30,895.727600097656,-63.0707931518555,-0.515181064605713,2.14224052429199,0.0307948496192694,355.496154785156,0,0,0,3.12787688017124E-06,0,102.827545166016,0.122897103428841,0.825881898403168,0,0,0.0248095821589231,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106979506090283
2022-08-26 04:03:57.000,0,30,30,895.690246582031,-63.511173248291,-0.49309903383255,2.14326453208923,0.0307948496192694,355.496154785156,0,0,0,7.10703216100228E-06,0,102.876693725586,0.143524378538132,0.856520056724548,0,0,0.0249082278460264,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0107015678659081
Any help would be appreciated. :)
tl;dr: how do I import/use the date and time in Bokeh when the source is formatted as follows: "2022-08-26 04:03:57"
UPDATE
I got it to be recognized as datetime! Still some kinks and formatting to figure out, but this is what did the trick for me:
x = df[tags.Date_Time_UTC[0]]
x = pd.to_datetime(x)
I also manually removed the trailing decimals from the seconds.
2022-08-26 04:03:56.000 -> 2022-08-26 04:03:56
Further answers and tips are, of course, welcome. But I can continue for now!
Thanks for the help!
Because you have imported pandas, the easiest way to parse a string to a datetime object is pd.to_datetime(). This function can also parse many different formats if you describe them with strftime-style format codes (%Y, %m, %d, %H, ...).
For example
pd.to_datetime('2022-01-01', format='%Y-%m-%d')
and
pd.to_datetime("01/01/2022 00:00:00", format='%d/%m/%Y %H:%M:%S')
will both result in the same datetime object.
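If you want to parse a complete column of a pandas DataFrame, you can select it by position with the .iloc indexer. Let's say you want to parse the first column (zero-based index).
df[df.columns[0]] = pd.to_datetime(df.iloc[:, 0], format="%Y-%m-%d")
should work. Assigning to the column by name replaces it, so it takes the datetime dtype; in recent pandas versions, assigning through df.iloc[:, 0] = ... writes into the existing column and may leave it with object dtype.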
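For the day-first format mentioned in the question (24/08/2022 01:40:32), a minimal sketch could look like the following. The Series raw is a hypothetical stand-in for the real timestamp column.

```python
import pandas as pd

# Hypothetical column holding the day-first timestamps described in the question.
raw = pd.Series(["24/08/2022 01:40:32", "24/08/2022 01:40:33"])

# An explicit format avoids any day/month ambiguity.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

print(parsed.dtype)
print(parsed.iloc[0])
```

The resulting datetime column can then be passed as the x values of a figure created with x_axis_type='datetime'.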
The example below is copied from here and if you want to read the bokeh tutorial, there is one which shows how to enable datetime axes.
Example
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
sample={'A':[pd.to_datetime(x, format='%Y-%m') for x in ['2012-01','2012-02','2012-03']],'B':[7,8,9]}
source = ColumnDataSource(sample)
p = figure(width=400, height=400, x_axis_type='datetime')
p.line(x='A', y='B', source=source, line_width=2)
output_notebook()
show(p)
FYI: The function pd.read_csv() has an argument parse_dates, which calls pd.to_datetime while parsing the CSV file. There are multiple options, and the right usage depends on the data, so please read the documentation; covering them all would make this post really long.
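As a rough illustration of parse_dates, here is a sketch that reads two shortened rows in the same layout as the sample dataset. The rows, the missing header and the in-memory file are assumptions made for the example; a real file with a header row would use the column name instead of the position.

```python
import io

import pandas as pd

# Two shortened, hypothetical rows in the same layout as the sample dataset
# (timestamp first, no header row).
csv_text = (
    "2022-08-26 04:03:52.000,0,30,894.7\n"
    "2022-08-26 04:03:53.000,0,30,895.2\n"
)

# parse_dates=[0] asks pandas to convert the first column while reading.
df = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0])

print(df[0].dtype)
```

The trailing .000 is read as fractional seconds, so with this approach it should not be necessary to strip it by hand.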
bokeh version 2.4.3 seems to parse your second example date: bokeh.core.properties.Datetime().is_valid("2022-08-26 04:03:57") returns True. However, it doesn't think your first example, "24/08/2022 01:40:32" is valid. This answer might help with that one, though? Using Bokeh datetime with Pandas
I have a pandas dataframe I am pulling data from and showing as a bar plot using Bokeh. What I want is to show the max value of each bar upon hover. This is the first day I'm using Bokeh, I have already changed the code a couple of times, and I'm really confused about how to set it up. I added the:
p.add_tools(HoverTool(tooltips=[("x_ax", "#x_ax"), ("y_ax", "#y_ax")]))
line, but just don't understand it.
Here's the code:
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, HoverTool, LabelSet, ranges
# prepare some data
# x = pd.Series(range(1,36))
x_ax = FAdf['SampleID']
y_ax = FAdf['First Run Au (ppm)']
# output to static HTML file
output_file("bars.html")
# create a new plot with a title and axis labels
p = figure(x_range=x_ax, title="Batch results", x_axis_label='sample', y_axis_label='Au (ppm)',
           toolbar_location="above", plot_width=1200, plot_height=800)
p.add_tools(HoverTool(tooltips=[("x_ax", "#x_ax"), ("y_ax", "#y_ax")]))
# setup for the bars
p.vbar(x=x_ax, top=y_ax, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
# turn bar tick labels 45 deg
p.xaxis.major_label_orientation = np.pi/3.5
# show the results
show(p)
Sample from the FAdf database:
SampleID:
0 KR-19 349
1 KR-19 351
2 Blank_2
3 KR-19 353
First Run Au (ppm):
0 0.019
1 0.002
2 0.000
3 0.117
If you pass actual literal data sequences to a glyph method like you have above, then Bokeh uses generic field names taken from the glyph's parameters (here "x" and "top", since the call is vbar(x=..., top=...)), because it has no way of knowing any other names to use. These are the columns you would need to configure the hover tool with:
tooltips=[("x_ax", "@x"), ("y_ax", "@top")])
Alternatively, you can pass a source argument to the vbar method so that the columns have the column names that you prefer. This is described in the Users Guide:
https://docs.bokeh.org/en/latest/docs/user_guide/data.html
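A minimal sketch of the source-based alternative, using the sample values from the question, might look like this. The column name au_ppm is a hypothetical short name chosen to avoid spaces in the tooltip reference; tooltip fields refer to columns with an @ prefix.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Values taken from the sample in the question; "au_ppm" is a hypothetical
# short column name chosen for this example.
source = ColumnDataSource(data=dict(
    SampleID=["KR-19 349", "KR-19 351", "Blank_2", "KR-19 353"],
    au_ppm=[0.019, 0.002, 0.000, 0.117],
))

p = figure(x_range=source.data["SampleID"], title="Batch results",
           x_axis_label="sample", y_axis_label="Au (ppm)")

# The glyph refers to columns by name, so the tooltips can use the same names.
p.vbar(x="SampleID", top="au_ppm", width=0.9, source=source)
p.add_tools(HoverTool(tooltips=[("sample", "@SampleID"), ("Au (ppm)", "@au_ppm")]))
```

Calling show(p) afterwards would render the chart; a column name that contains spaces would need braces in the tooltip, e.g. @{First Run Au (ppm)}.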
I'm making histograms of grades historically by term.
I want to make an interactive Bokeh bar chart with a slider that can cycle through the terms.
I have the bar chart working on a single term, but when I try to add additional terms I can't get the bar chart to select a single term and then update and slide through the terms.
I really just need to have some help getting the groupby object to select just one term.
import os
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import curdoc, output_file, show
from bokeh.layouts import widgetbox
from bokeh.models import Slider
from bokeh.transform import jitter
from bokeh.palettes import viridis
from bokeh.transform import factor_cmap
Importing necessary modules.
input_file = 'Alltime_grades.csv'
output_file('Grades.html')
df = pd.read_csv(input_file)
group = df.groupby(['Term','Grade'])
Here is the input file import code. The input file has 3 columns - "Term", "Grade", and "Grade_Count".
So say Spring 2019 - A - 5000
I got this to work on a single term by not grouping by term, even though the field was still there.
Grades = ['A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'D-', 'F']
Above, I made a list of grades so that they display in the correct order
source = ColumnDataSource(group)
grade_cmap = factor_cmap('Grade',palette=viridis(22) , factors=Grades)
###
p = figure(plot_height=500, plot_width=700, title='Grades Over Time', toolbar_location=None, tools="", x_range = Grades)
###
p.vbar(x='Grade', top='grade_count_max', width=.75, bottom=0, source=source, line_color=grade_cmap, fill_color=grade_cmap)
p.y_range.start = 0
p.xaxis.axis_label = 'Grade'
p.yaxis.axis_label = 'Count of Grade'
curdoc().add_root(p)
show(p)
Above is the code I used to make the single bar chart to display.
source = ColumnDataSource(data={
    'x' : group.loc['Spring 2019'].Grade,
    'top' : group.loc['Spring 2019'].grade_count_max})
For the multiple terms I tried this as the Column Data Source.
So for now I am getting an error saying that I can't use the 'loc' method on a GroupBy object.
I need some way to make it select a single term, so that I can then write an update function that follows the slider and updates the terms.
Beyond that I'm not even sure that I can cycle through non-numeric values on a slider but a good first step would be to be able to slice the group by at all.
Thanks for any help you can provide.
I think you should describe your data, because it is not clear what the main problem is. This is a problem with pandas and not with Bokeh. I ran into a similar problem, although I cannot fully understand your data issue. The way I solved it (in my case) was to use the pandas Grouper function and then apply the mean. The resulting DataFrame, for instance, allowed me to use .loc before passing it to the ColumnDataSource.
I finally figured this out by using the .get_group method on the GroupBy Object!
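A small sketch of that approach is shown below, with made-up rows in the three-column layout described in the question. Grouping by Term alone and driving the selection from an integer index are assumptions made for the example; slider_value is a stand-in for a Bokeh Slider's current value.

```python
import pandas as pd

# Hypothetical rows in the layout described in the question.
df = pd.DataFrame({
    "Term": ["Spring 2019", "Spring 2019", "Fall 2019", "Fall 2019"],
    "Grade": ["A", "B", "A", "B"],
    "Grade_Count": [5000, 3000, 4500, 3500],
})

by_term = df.groupby("Term")

# get_group returns the rows of one group as an ordinary DataFrame.
spring = by_term.get_group("Spring 2019")

# A slider is numeric, so one option is to let it pick an index into a term list.
terms = sorted(by_term.groups)
slider_value = 0  # stand-in for the slider's current value
selected = by_term.get_group(terms[slider_value])

print(spring["Grade_Count"].tolist())
print(terms[slider_value])
```

In an update callback, the selected DataFrame could then be used to replace the data of the ColumnDataSource that feeds the bars.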
I'm working on automating plotting functions for metabolomics data with bokeh. Currently, I'm trying to read in my dataframe from CSV and iterate through the columns generating box plots for each metabolite (column).
I have an example df that looks like this:
Sample Group AMP ADP ATP
1A A 239847 239084 987374
1B A 245098 241210 988950
2A B 238759 200554 921032
2B B 230029 215408 89980
Here is what my code looks like:
import pandas
from bokeh.plotting import figure, output_file, show, save
from bokeh.charts import BoxPlot
df = pandas.read_csv("testdata_2.csv")
for colname, col in df.iteritems():
    p = BoxPlot(df, values=df[colname], label='Group', xlabel='Group', ylabel='Peak Area',
                title=colname)
output_file("boxplot.html")
show(p)
This generates an error:
raise ValueError("expected an element of either %s, got %r" % (nice_join(self.type_params), value))
ValueError: expected an element of either Column Name or Column String or List(Column Name or Column String
It seems that setting values=df[colname] is the issue. If I replace it with values=df['colname'] it gives me a key error for colname. I can plot just fine if I specify a given column such as values='ATP', but I need to be able to loop through all columns.
Any guidance? Is this even the best approach?
Thanks in advance.
If you want to organize them horizontally, you can create different graphs, and then you could use for instance hplot from bokeh.io as follows:
import pandas
from bokeh.plotting import figure, output_file, show, save
from bokeh.charts import BoxPlot
from bokeh.io import hplot
df = pandas.read_csv("testdata_2.csv")
p = []
for colname in ['AMP','ADP','ATP']:
    p += [BoxPlot(df, values=colname, label='Group', xlabel='Group',
                  ylabel='Peak Area',title=colname, width=250,height=250)]
output_file("boxplot.html")
show(hplot(*p))
For your particular example I get the three box plots (AMP, ADP, ATP) in a single row; the output image is not reproduced here.
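As a separate note on the loop itself: the key point is that the loop variable should be the column name (a string), not the column's data. The sketch below shows one way to collect the metabolite column names without hard-coding them. It only uses pandas, since bokeh.charts is not available in recent Bokeh releases, and the per-group median is just a placeholder for whatever plot is built from each column.

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame({
    "Sample": ["1A", "1B", "2A", "2B"],
    "Group": ["A", "A", "B", "B"],
    "AMP": [239847, 245098, 238759, 230029],
    "ADP": [239084, 241210, 200554, 215408],
    "ATP": [987374, 988950, 921032, 89980],
})

# Every column except the identifiers is treated as a metabolite.
metabolites = [c for c in df.columns if c not in ("Sample", "Group")]

for colname in metabolites:
    # colname is a plain string, which is what an argument expecting a
    # column name needs (rather than the Series df[colname]).
    medians = df.groupby("Group")[colname].median()
    print(colname, medians.to_dict())
```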