Fixing Python Bokeh Datetime Import Formatting
I'm exporting datasets from equipment logging software and am trying to use Bokeh (Python) as an interactive visual aid during analysis. Everything works fine except for the date/time column, which refuses to be imported in its current format (24/08/2022 01:40:32). I have data for every second for at least a month (so dropping the date wouldn't work).
I've been playing about with Bokeh for a while by simply ignoring the date/time and replacing it with a consecutive series (1, 2, 3...) and plotting that, but the time has come to fix my temporary solution, and I just can't figure out how to define the format or how to convert it (see the Bokeh documentation).
Example code:
from bokeh.io import output_file, show # OUTPUT_FILE FOR EXPORT (NOT USED)
from bokeh.layouts import gridplot # MULTIPLOT
from bokeh.plotting import figure
from bokeh.palettes import Spectral4 # COLOUR PALETTE
import pandas as pd
import external_tags as tags # TAG DEFINITIONS USED FOR CSV IMPORTING
# import csv
df = pd.read_csv("AUGUST_PS_1MIN.csv") # test set with 1-minute intervals
# TOOLS
TOOLS = "box_zoom, box_select, crosshair, reset, hover"
Figure_Title = "TESTING AUTOMATING IMPORT WITHOUT MANUAL TWEAKING"
line_width = 1.5
alpha = 1
height = 500
x = df[tags.Date_Time_UTC[0]]
# These just redirect to my imported tag definitions TAG = ["column name", "friendly name"]
fig1a = tags.PS_MH_LOAD
fig1b = tags.PS_MH_WINCH_PWR
fig1c = tags.PS_PWR_MSB1
fig1d = tags.PS_PWR_MSB2
# FIGURE A (TOP LEFT)
s1 = figure(sizing_mode="stretch_width", height=height, title="LOAD", tools=TOOLS, x_axis_type='datetime')
s1.line(x, df[fig1a[0]], color=Spectral4[0], alpha=alpha, line_width=line_width, legend_label=fig1a[1])
s1.line(x, df[fig1b[0]], color=Spectral4[1], alpha=alpha, line_width=line_width, legend_label=fig1b[1])
s1.line(x, df[fig1c[0]], color=Spectral4[2], alpha=alpha, line_width=line_width, legend_label=fig1c[1])
s1.line(x, df[fig1d[0]], color=Spectral4[3], alpha=alpha, line_width=line_width, legend_label=fig1d[1])
#### some repetitive code has been omitted here for brevity
# Define the grid
# p = gridplot([[s1, s2],[s3, s4]])
# show the results
show(s1)
Example of a dataset
2022-08-26 04:03:52.000,0,30,30,894.70751953125,-63.785041809082,-0.497732371091843,2.14258599281311,0.0307948496192694,355.496154785156,0,0,0,2.38387619901914E-05,0,102.844131469727,0.040388036519289,0.703329265117645,0,0,0.0244150012731552,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106834815815091
2022-08-26 04:03:53.000,0,30,30,895.21142578125,-63.6380615234375,-0.550026297569275,2.14223098754883,0.0307948496192694,355.496154785156,0,0,0,1.45306594276917E-05,0,102.827079772949,0.0610153041779995,0.733967423439026,0,0,0.0245136469602585,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106870988383889
2022-08-26 04:03:54.000,0,30,30,895.726196289063,-63.6465072631836,-0.533430516719818,2.1423876285553,0.0307948496192694,355.496154785156,0,0,0,8.71746851771604E-06,0,102.834602355957,0.0816425681114197,0.764605581760406,0,0,0.0246122926473618,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106907160952687
2022-08-26 04:03:55.000,0,30,30,896.1552734375,-63.0882987976074,-0.534056782722473,2.14190745353699,0.0307948496192694,355.496154785156,0,0,0,5.21722904522903E-06,0,102.811561584473,0.10226983577013,0.795243740081787,0,0,0.024710938334465,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106943333521485
2022-08-26 04:03:56.000,0,30,30,895.727600097656,-63.0707931518555,-0.515181064605713,2.14224052429199,0.0307948496192694,355.496154785156,0,0,0,3.12787688017124E-06,0,102.827545166016,0.122897103428841,0.825881898403168,0,0,0.0248095821589231,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106979506090283
2022-08-26 04:03:57.000,0,30,30,895.690246582031,-63.511173248291,-0.49309903383255,2.14326453208923,0.0307948496192694,355.496154785156,0,0,0,7.10703216100228E-06,0,102.876693725586,0.143524378538132,0.856520056724548,0,0,0.0249082278460264,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0107015678659081
Any help would be appreciated. :)
tl;dr: how do I import/use the date and time in Bokeh when the source is formatted as follows: "2022-08-26 04:03:57"
UPDATE
I got it to be recognized as datetime! Still some kinks and formatting to figure out, but this is what did the trick for me:
x = df[tags.Date_Time_UTC[0]]
x = pd.to_datetime(x)
I also manually removed the trailing decimals from the seconds.
2022-08-26 04:03:56.000 -> 2022-08-26 04:03:56
Further answers and tips are, of course, welcome. But I can continue for now!
Thanks for the help!
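A minimal sketch (plain pandas, no Bokeh needed) suggesting that the manual trimming step is optional: pd.to_datetime also parses the ".000" fractional seconds in the exported strings. The two timestamps are taken from the sample dataset above.

```python
import pandas as pd

# Raw strings exactly as exported, including the ".000" milliseconds suffix.
raw = pd.Series(["2022-08-26 04:03:56.000", "2022-08-26 04:03:57.000"])

# pd.to_datetime infers the ISO-like layout, fractional seconds included.
parsed = pd.to_datetime(raw)

# The milliseconds are zero, so the result equals the manually trimmed value.
print(parsed.iloc[0] == pd.Timestamp("2022-08-26 04:03:56"))
```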
Because you have already imported pandas, the easiest way to parse a string into a datetime is pd.to_datetime(). It can parse many different formats, and you can state the exact one with strftime-style % directives in the format argument.
For example
pd.to_datetime('2022-01-01', format='%Y-%m-%d')
and
pd.to_datetime("01/01/2022 00:00:00", format='%d/%m/%Y %H:%M:%S')
both return the same pandas Timestamp (2022-01-01 00:00:00).
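A quick check of the claim above, as a sketch using only pandas: both calls return a pandas Timestamp (a subclass of datetime.datetime) for midnight on 1 January 2022.

```python
import pandas as pd

# ISO date only; the time defaults to midnight.
a = pd.to_datetime("2022-01-01", format="%Y-%m-%d")
# Day-first date with an explicit midnight time.
b = pd.to_datetime("01/01/2022 00:00:00", format="%d/%m/%Y %H:%M:%S")

# Both are pandas Timestamps for the same instant.
print(type(a).__name__, a == b)
```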
If you want to parse a complete column of a pandas DataFrame, you can select it by position. Let's say you want to parse the first column (zero-based index) of the data in the question:
df[df.columns[0]] = pd.to_datetime(df.iloc[:, 0], format="%Y-%m-%d %H:%M:%S.%f")
should work.
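A sketch applying this to rows shaped like the question's export, assuming the file has no header row (header=None, so the columns are numbered 0, 1, 2...). The format string has to match the data exactly, so it includes the time and the %f fractional seconds. Assigning through df[...] replaces the column outright, which gives it a datetime64 dtype.

```python
import io

import pandas as pd

# First three columns of two rows from the question's sample (no header row).
sample = io.StringIO(
    "2022-08-26 04:03:52.000,0,30\n"
    "2022-08-26 04:03:53.000,0,30\n"
)
df = pd.read_csv(sample, header=None)

# Parse the first column by position; %f consumes the ".000" part.
first_col = df.columns[0]
df[first_col] = pd.to_datetime(df[first_col], format="%Y-%m-%d %H:%M:%S.%f")

print(df.dtypes)
```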
The example below is copied from another answer. If you want to read the Bokeh tutorial, there is one section that shows how to enable datetime axes.
Example
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
sample={'A':[pd.to_datetime(x, format='%Y-%m') for x in ['2012-01','2012-02','2012-03']],'B':[7,8,9]}
source = ColumnDataSource(sample)
p = figure(width=400, height=400, x_axis_type='datetime')
p.line(x='A', y='B', source=source, line_width=2)
output_notebook()
show(p)
FYI: pd.read_csv() has an argument, parse_dates, which converts columns to datetimes while the CSV file is being read. There are several ways to use it, and the right one depends on your data, so check the documentation; covering them all would make this post very long.
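A minimal sketch of the parse_dates route feeding a Bokeh datetime axis, assuming Bokeh 2.4 or later. The column names timestamp and load are hypothetical stand-ins for the real header; the values are the first three readings of the fifth column in the sample dataset.

```python
import io

import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical header; the real export uses the names defined in external_tags.
csv_text = """timestamp,load
2022-08-26 04:03:52.000,894.70751953125
2022-08-26 04:03:53.000,895.21142578125
2022-08-26 04:03:54.000,895.726196289063
"""

# parse_dates converts the listed column to datetime64 while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

source = ColumnDataSource(df)
p = figure(height=300, x_axis_type="datetime", title="LOAD")
p.line(x="timestamp", y="load", source=source, line_width=1.5)
# show(p) would render the plot in a browser.
```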
Bokeh version 2.4.3 seems to parse your second example date: bokeh.core.properties.Datetime().is_valid("2022-08-26 04:03:57") returns True. However, it does not consider your first example, "24/08/2022 01:40:32", valid. The answer to "Using Bokeh datetime with Pandas" might help with that one.
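Another option is to convert the day-first strings in pandas before they reach Bokeh, so Bokeh only ever sees real datetime values. A sketch, assuming every value follows the DD/MM/YYYY HH:MM:SS layout of the first example; pd.to_datetime(..., dayfirst=True) is a looser alternative when the format is less regular.

```python
import pandas as pd

# Day-first export format from the question.
raw = pd.Series(["24/08/2022 01:40:32", "24/08/2022 01:40:33"])

# An explicit format removes any day/month ambiguity (e.g. for 01/08/2022).
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

# These datetime64 values can go straight into a Bokeh datetime axis.
print(parsed.iloc[0])
```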
Related
Unable to use Bokeh and pandas to read a CSV and plot it
I'm trying to plot a line graph from a simple CSV file with two columns, using Bokeh for data visualisation and pandas to read the CSV and handle the data. However, I can't seem to pass the data I've imported with pandas to Bokeh to plot my line graph. This is running locally on my computer. I've tried and debugged each section of the code, and the only problem seems to occur when I pass the data from pandas to Bokeh. I've also printed the columns I've selected from my CSV to check that the entire column has been selected.
#Requirements for App
from bokeh.plotting import figure, output_file, show
import pandas as pd
from bokeh.models import ColumnDataSource
#Import data-->Weight measurements over a period of time [ STUB ]
weight = pd.read_csv("weight.csv")
#Define parameters
x=weight["Date"]
y=weight["Weight"]
#Take data and present in a graph
output_file("test.html")
p = figure(plot_width=400, plot_height=400)
p.line(x,y,line_width=2)
show(p)
I expect to get a line graph that plots each day's weight entry, but I get a blank plot.
This should work. pandas doesn't know that it is working with dates, so you have to specify this with pd.to_datetime().
#!/usr/bin/python3
from bokeh.plotting import figure, output_file, show
import pandas as pd
from bokeh.models import DatetimeTickFormatter, ColumnDataSource
#Import data-->Weight measurements over a period of time [ STUB ]
weight = pd.read_csv("weight.csv")
#Define parameters
weight["Date"] = pd.to_datetime(weight['Date'])
weight["Weight"] = pd.to_numeric(weight['Weight'])
source = ColumnDataSource(weight)
#Take data and present in a graph
output_file("test.html")
p = figure(plot_width=400, plot_height=400, x_axis_type='datetime')
p.line(x='Date', y='Weight', line_width=2, source=source)
p.xaxis.formatter=DatetimeTickFormatter(
    minutes=["%M"],
    hours=["%H:%M"],
    days=["%d/%m/%Y"],
    months=["%m/%Y"],
    years=["%Y"]
)
show(p)
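The answer above targets the Bokeh API of its time. A self-contained variant for newer releases, as a sketch: plot_width/plot_height were replaced by width/height (the old names were removed in Bokeh 3.0), and DatetimeTickFormatter is given one format string per scale. The inline CSV is a hypothetical stand-in for weight.csv.

```python
import io

import pandas as pd
from bokeh.models import ColumnDataSource, DatetimeTickFormatter
from bokeh.plotting import figure

# Hypothetical stand-in for weight.csv.
csv_text = """Date,Weight
2019-03-01,80.2
2019-03-02,79.9
2019-03-03,80.1
"""
weight = pd.read_csv(io.StringIO(csv_text))
weight["Date"] = pd.to_datetime(weight["Date"])
weight["Weight"] = pd.to_numeric(weight["Weight"])
source = ColumnDataSource(weight)

# width/height instead of the removed plot_width/plot_height.
p = figure(width=400, height=400, x_axis_type="datetime")
p.line(x="Date", y="Weight", line_width=2, source=source)
# One format string per time scale.
p.xaxis.formatter = DatetimeTickFormatter(days="%d/%m/%Y", months="%m/%Y", years="%Y")
```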
Verbatim labels in legend in bokeh plots
I'm trying to use Bokeh in Python for interactive analysis of my plots. My data are stored in a pandas.DataFrame. I'd like to have a legend with column names as labels. However, Bokeh extracts values from the respective column instead.
import numpy as np
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
output_notebook()
BokehJS 0.12.13 successfully loaded.
df = pd.DataFrame({'accuracy': np.random.random(10)}, index=pd.Index(np.arange(10), name='iteration'))
df
Output:
           accuracy
iteration
0          0.977427
1          0.057319
2          0.307741
3          0.127390
4          0.662976
5          0.313618
6          0.214040
7          0.214274
8          0.864432
9          0.800101
Now plot:
p = figure(width=900, y_axis_type="log")
source = ColumnDataSource(df)
p.line(x='iteration', y='accuracy', source=source, legend='accuracy')
show(p)
The desired output can be obtained by adding a space: legend='accuracy'+' '. Although I've reached my goal, the method does not satisfy me. I think there should be a more elegant and official way to distinguish between a column name and a legend label.
There is. Bokeh tries to "do the right thing" in most situations, but doing so makes for a few corner cases where the behavior is less desirable, and this is one of them. Specifically in this instance, you can always be explicit about whether the string is to be interpreted as a value or as a field:
from bokeh.core.properties import value
p.line(x='iteration', y='accuracy', source=source, legend=value('accuracy'))
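In later Bokeh releases (1.4 onward) the ambiguous legend argument was split into legend_label, legend_field and legend_group, and legend_label is always taken as literal text. A sketch with a small, made-up accuracy series in place of the random data; reset_index() is used so the named index becomes an ordinary iteration column.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Small deterministic stand-in for the random accuracy data.
df = pd.DataFrame(
    {"accuracy": [0.98, 0.06, 0.31, 0.13, 0.66]},
    index=pd.Index(range(5), name="iteration"),
)
# reset_index() turns the named index into an ordinary "iteration" column.
source = ColumnDataSource(df.reset_index())

p = figure(width=900, y_axis_type="log")
# legend_label is literal text, never looked up as a column name.
p.line(x="iteration", y="accuracy", source=source, legend_label="accuracy")
```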
Bokeh Slider using Pandas datetime index
I'm trying to add a slider via Bokeh to my plot, which is connected to a pandas DataFrame. The plot uses the datetime index to show the Air Quality Index over one year. I would like to add a slider for each month, January to December 2016. I'm not able to find a clear example with code that connects the slider to a plot that is connected to a pandas DataFrame. Someone help please!
I was able to find the following code, but the plot was generated with random data. The output of this code is exactly what I'm looking for, but with time series data.
from bokeh.io import output_notebook, show, vform
from bokeh.plotting import figure, Figure
from bokeh.models import ColumnDataSource, Slider, CustomJS
import numpy as np
output_notebook()
x = np.sort(np.random.uniform(0, 100, 2000))
y = np.sin(x*10) + np.random.normal(scale=0.1, size=2000)
fig = Figure(plot_height=400, x_range=(0, 2))
source = ColumnDataSource(data={"x":x, "y":y})
line = fig.line(x="x", y="y", source=source)
callback = CustomJS(args=dict(x_range=fig.x_range), code="""
    var start = cb_obj.get("value");
    x_range.set("start", start);
    x_range.set("end", start+2);
""")
slider = Slider(start=0, end=100, step=2, callback=callback)
show(vform(slider, fig))
I also found the source code for this type of slider (below), but I am unsure how to use it. As you can probably tell, I'm fairly new to Bokeh. Please help!
class DateRangeSlider(AbstractSlider):
    """ Slider-based date range selection widget. """
    @property
    def value_as_datetime(self):
        ''' Convenience property to retrieve the value tuple as a tuple of
        datetime objects.
        '''
        if self.value is None:
            return None
        v1, v2 = self.value
        if isinstance(v1, numbers.Number):
            d1 = datetime.utcfromtimestamp(v1 / 1000)
        else:
            d1 = v1
        if isinstance(v2, numbers.Number):
            d2 = datetime.utcfromtimestamp(v2 / 1000)
        else:
            d2 = v2
        return d1, d2
    value = Tuple(Date, Date, help="""
    Initial or selected range.
    """)
    start = Date(help="""
    The minimum allowable value.
    """)
    end = Date(help="""
    The maximum allowable value.
    """)
    step = Int(default=1, help="""
    The step between consecutive values.
    """)
    format = Override(default="%d %b %G")
I just worked through a similar situation in my project. I did not use the pandas datetime functionality because my dates were in mixed formats, but it was easy to update once I cleaned my data. The important part is to have your callback function update the .data attribute of your ColumnDataSource. In the example you have, the callback function is written in JavaScript. I used the code from the example Iain references, but I had to do a small workaround for pandas DataFrames. In the example below, data is a list of pandas DataFrames.
def callback(attrname, old, new):
    month = slider.value
    source.data = ColumnDataSource(data[month]).data
This replaces the current data for the graph with data from a different pandas DataFrame. If, like me, all your data is in one DataFrame, you could also use pandas filtering to return the data that you want to display:
data_to_use = data[data['Month'] == month[slider.value]]
Again, when I did that, I had to convert data_to_use to a ColumnDataSource and then replace the .data attribute of the source for my graph.
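A sketch of the single-DataFrame filtering approach as a Python (Bokeh server) callback, assuming the data has one datetime column. The column names date and aqi and the daily values are hypothetical; ColumnDataSource.from_df converts the filtered frame into the plain dict that source.data takes.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

# Hypothetical daily AQI readings for 2016 (a leap year).
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"date": days, "aqi": range(len(days))})


def month_data(month):
    # Keep only the rows for the selected month (1 = January).
    return ColumnDataSource.from_df(df[df["date"].dt.month == month])


source = ColumnDataSource(data=month_data(1))
p = figure(height=300, x_axis_type="datetime")
p.line(x="date", y="aqi", source=source)

slider = Slider(start=1, end=12, value=1, step=1, title="Month")


def callback(attrname, old, new):
    # Replace the whole .data dict so the browser receives the new month.
    source.data = month_data(int(slider.value))


slider.on_change("value", callback)
# In a Bokeh server app: curdoc().add_root(column(slider, p))
```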
The Gapminder example from the Bokeh gallery does this using years; a similar approach should work for months in your dataset. As you are only concerned with months, you don't need to work with a datetime index; just get them as a list.
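For the original goal of scrolling through months, the DateRangeSlider shown in the question can also drive the plot without a server, by copying its two values into the x range in the browser. A sketch, assuming a Bokeh version whose js_link supports attr_selector; the monthly data points are hypothetical.

```python
from datetime import date

import pandas as pd
from bokeh.models import ColumnDataSource, DateRangeSlider
from bokeh.plotting import figure

# Hypothetical data: one point at the start of each month of 2016.
source = ColumnDataSource(
    data={"x": pd.date_range("2016-01-01", periods=12, freq="MS").to_numpy(), "y": list(range(12))}
)
p = figure(height=300, x_axis_type="datetime")
p.line(x="x", y="y", source=source)

slider = DateRangeSlider(
    title="Dates",
    start=date(2016, 1, 1),
    end=date(2016, 12, 31),
    value=(date(2016, 1, 1), date(2016, 1, 31)),
)
# Copy the slider's two ends (epoch milliseconds in the browser) into the
# plot's x range, entirely in JavaScript.
slider.js_link("value", p.x_range, "start", attr_selector=0)
slider.js_link("value", p.x_range, "end", attr_selector=1)
```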
Bokeh: chart from pandas dataframe won't update on trigger
I have a pandas DataFrame whose columns I want to show as lines in a plot using a Bokeh server. Additionally, I would like to have a slider for shifting one of the lines against the other. My problem is the update functionality when the slider value changes. I have tried the code from Bokeh's sliders example, but it does not work. Here is an example:
import pandas as pd
from bokeh.io import vform
from bokeh.plotting import Figure, output_file, show
from bokeh.models import CustomJS, ColumnDataSource, Slider
df = pd.DataFrame([[1,2,3],[3,4,5]])
df = df.transpose()
myindex = list(df.index.values)
mysource = ColumnDataSource(df)
plot = Figure(plot_width=400, plot_height=400)
for i in range(len(mysource.column_names) - 1):
    name = mysource.column_names[i]
    plot.line(x = myindex, y = str(name), source = mysource)
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1)
def update_data(attrname, old, new):
    # Get the current slider values
    a = offset.value
    temp = df[1].shift(a)
    #to finish#
offset.on_change('value', update_data)
layout = vform(offset, plot)
show(layout)
Inside the update_data function I have to update mysource, but I cannot figure out how to do that. Can anybody point me in the right direction?
Give this a try: change a=offset.value to a=cb_obj.get('value'). Then put source.trigger('change') after you do whatever you are trying to do in that update_data function, instead of offset.on_change('value', update_data). Also change the slider to:
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1, callback=CustomJS.from_py_func(update_data))
Note that this format works with flexx installed: https://github.com/zoofio/flexx. If you have Python 3.5, you'll have to download the zip file, extract it, and run python setup.py install, as it isn't yet published as a compiled package for this version.
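CustomJS.from_py_func and the flexx dependency have since been removed from Bokeh. A sketch of the server-side pattern the question started from instead (run with bokeh serve): the callback builds a new data dict with the shifted column and assigns it to source.data. The column names a and b are hypothetical, chosen because Bokeh data sources expect string column names.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
source = ColumnDataSource(
    data={"x": list(df.index), "a": df["a"].tolist(), "b": df["b"].tolist()}
)

plot = figure(width=400, height=400)
plot.line(x="x", y="a", source=source)
plot.line(x="x", y="b", source=source, color="red")

offset = Slider(title="offset", value=0, start=-1, end=1, step=1)


def update_data(attrname, old, new):
    # Series.shift needs an integer number of periods; slider values may be floats.
    shifted = df["b"].shift(int(offset.value))
    new_data = dict(source.data)
    new_data["b"] = shifted.tolist()  # positions shifted out become NaN
    source.data = new_data


offset.on_change("value", update_data)
# In a Bokeh server app: curdoc().add_root(column(offset, plot))
```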
Display only part of Y-axis on Bokeh
Using Bokeh 0.8.1, how can I display a long time series but start 'zoomed in' on one part, while keeping the rest of the data available for scrolling? For instance, considering the following time series (IBM stock price since 1980), how could I get my chart to initially display only the price since 01/01/2014?
Example code:
import pandas as pd
import bokeh.plotting as bk
from bokeh.models import ColumnDataSource
bk.output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,save"
# Quandl data, too lazy to generate some random data
df = pd.read_csv('https://www.quandl.com/api/v1/datasets/GOOG/NYSE_IBM.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
#Generating a bokeh source
source = ColumnDataSource()
dtest = {}
for col in df:
    dtest[col] = df[col]
source = ColumnDataSource(data=dtest)
# plotting stuff !
p = bk.figure(title='title', tools=TOOLS, x_axis_type="datetime", plot_width=600, plot_height=300)
p.line(y='Close', x='Date', source=source)
bk.show(p)
This outputs the full series, but I want the chart to start zoomed in on the period since 2014 (which you can achieve with the box-zoom tool, but I'd like it to start that way immediately).
So it looks like (as of 0.8.1) we need to add some more convenient ways to set ranges with datetime values. That said, although this is a bit ugly, it does currently work for me:
import time, datetime
x_range = (
    time.mktime(datetime.datetime(2014, 1, 1).timetuple())*1000,
    time.mktime(datetime.datetime(2016, 1, 1).timetuple())*1000
)
p = bk.figure(
    title='title', tools=TOOLS, x_axis_type="datetime",
    plot_width=600, plot_height=300, x_range=x_range
)
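One caveat with the approach above: time.mktime interprets a naive datetime as local time, while Bokeh treats datetime values as UTC, so the initial window can be shifted by the local UTC offset. A sketch that computes the epoch milliseconds in UTC instead; the to_epoch_ms helper is hypothetical, and Range1d is passed explicitly with no bounds so panning outside the initial window still works.

```python
from datetime import datetime, timezone

from bokeh.models import Range1d
from bokeh.plotting import figure


def to_epoch_ms(dt):
    # Milliseconds since 1970-01-01 UTC, the unit Bokeh's datetime axes use.
    return dt.replace(tzinfo=timezone.utc).timestamp() * 1000


start = to_epoch_ms(datetime(2014, 1, 1))
end = to_epoch_ms(datetime(2016, 1, 1))

p = figure(
    title="title",
    x_axis_type="datetime",
    width=600,
    height=300,
    x_range=Range1d(start=start, end=end),  # initial zoom only; no bounds set
)
```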