I've a time series (typically energy usage) recorded over a range of days. Since usage tends to be different over the weekend I want to highlight the weekends.
I've done what seems sensible:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import random
#Create dummy data.
start=datetime.datetime(2022,10,22,0,0)
finish=datetime.datetime(2022,11,7,0,0)
def randomWalk():
    i = 0
    while True:
        i = i + random.random() - 0.5
        yield i
walk = randomWalk()  # create the generator once so that successive values accumulate
s = pd.Series({i: next(walk) for i in pd.date_range(start, finish, freq='h')})
# Plot it.
plt.figure(figsize=[12, 8]);
s.plot();
# Color the labels according to the day of week.
for label, day in zip(plt.gca().xaxis.get_ticklabels(which='minor'),
                      pd.date_range(start,finish,freq='d')):
    label.set_color('red' if day.weekday() > 4 else 'black')
But what I get is wrong. Two weekends appear one off, and the third doesn't show at all.
I've explored the 'label' objects, but their X coordinate is just an integer, and doesn't seem meaningful. Using DateFormatter just gives nonsense.
How would be best to fix this, please?
OK - since matplotlib only provides the information we need to the Tick Label Formatter functions, that's what we have to use:
import numpy as np
minorLabels=plt.gca().xaxis.get_ticklabels(which='minor')
majorLabels=plt.gca().xaxis.get_ticklabels(which='major')
def MinorFormatter(dateInMinutes, index):
    # Formatter: first param is value (date in minutes, would you believe), second is which item in order.
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    minorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return day.day
def MajorFormatter(dateInMinutes, index):
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    majorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return "" if (index==0 or index==len(majorLabels)-1) else day.strftime("%d\n%b\n%Y")
plt.gca().xaxis.set_minor_formatter(MinorFormatter)
plt.gca().xaxis.set_major_formatter(MajorFormatter)
Pretty clunky, but it works. Could be fragile, though - anyone got a better answer?
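One possibly less fragile alternative, as a minimal sketch: plot with Matplotlib directly instead of Series.plot(), so that tick positions are ordinary Matplotlib date numbers, then convert each tick position back to a date with matplotlib.dates.num2date and colour its label. The seeded dummy data, the DayLocator, the label format and the weekend_days list are assumptions for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy hourly random walk over the same date range as the question.
idx = pd.date_range("2022-10-22", "2022-11-07", freq="h")
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(-0.5, 0.5, len(idx)).cumsum(), index=idx)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(s.index, s.values)  # plain Matplotlib plot: x values are date numbers
ax.xaxis.set_major_locator(mdates.DayLocator())  # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d\n%b"))

fig.canvas.draw()  # make sure ticks and labels are laid out

weekend_days = []  # collected only so the result can be inspected
for loc, label in zip(ax.get_xticks(), ax.get_xticklabels()):
    day = mdates.num2date(loc)  # tick position -> datetime
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        label.set_color("red")
        weekend_days.append(day.date())
```

Because each label's colour is derived from its own tick position, the result does not depend on how many ticks there are or the order in which they are formatted.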
Matplotlib is geared towards scientific plotting, and although this kind of styling is technically possible, it takes a fair amount of effort.
Consider using Plotly instead of Matplotlib, as below:
#pip install plotly in terminal
import plotly.express as px
# read plotly express provided sample dataframe
df = px.data.tips()
# create plotly figure with color_discrete_map property specifying color per day
fig = px.bar(df, x="day", y="total_bill", color='day',
             color_discrete_map={"Sat": "orange", "Sun": "orange", "Thur": "blue", "Fri": "blue"}
             )
# send to browser
fig.show()
This solves your problem in far fewer lines. The only catch is that your data needs to be in a pandas DataFrame rather than a Series, with named columns that you can pass to plotly.express.bar or plotly.express.scatter.
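For a time series like the one in the question, the DataFrame could be prepared roughly as follows. This is only a sketch of the data preparation step (pandas only); the column names date, usage and day_type, the daily aggregation and the dummy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Dummy hourly usage data over the date range from the question.
idx = pd.date_range("2022-10-22", "2022-11-07", freq="h")
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 1, len(idx)), index=idx, name="usage")

# Aggregate to one row per day and turn the Series into a DataFrame.
daily = s.resample("D").sum().reset_index()
daily.columns = ["date", "usage"]

# Label each day so the plotting library can colour by this column.
daily["day_type"] = np.where(daily["date"].dt.weekday >= 5, "weekend", "weekday")
```

A frame like daily could then be passed to px.bar with x="date", y="usage", color="day_type" and a color_discrete_map that maps "weekend" and "weekday" to the colours you want.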
I'm exporting datasets from equipment logging software and am trying to use Bokeh (Python) as an interactive visual aid during analysis. Everything is working fine, except for the date/time, which refuses to be imported in its current format (24/08/2022 01:40:32). I have data for every second for at least a month, so dropping the date wouldn't work.
I've been playing about with Bokeh for a while now by simply ignoring the date/time, replacing it with a consecutive series (1, 2, 3...) and plotting it as such, but the time has come to fix my temporary solution and I just can't seem to figure out how to define the formatting or how to convert it. (Bokeh documentation)
Example code:
from bokeh.io import output_file, show # OUTPUT_FILE FOR EXPORT (NOT USED)
from bokeh.layouts import gridplot # MULTIPLOT
from bokeh.plotting import figure
from bokeh.palettes import Spectral4 # COLOUR PALETTE
import pandas as pd
import external_tags as tags # TAG DEFINITIONS USED FOR CSV IMPORTING
# import csv
df = pd.read_csv("AUGUST_PS_1MIN.csv") # testset with 1 min intervals
# TOOLS
TOOLS = "box_zoom, box_select, crosshair, reset, hover"
Figure_Title = "TESTING AUTOMATING IMPORT WITHOUT MANUAL TWEAKING"
line_width = 1.5
alpha = 1
height = 500
x = df[tags.Date_Time_UTC[0]]
# These just redirect to my imported tag definitions TAG = ["column name", "friendly name"]
fig1a = tags.PS_MH_LOAD
fig1b = tags.PS_MH_WINCH_PWR
fig1c = tags.PS_PWR_MSB1
fig1d = tags.PS_PWR_MSB2
# FIGURE A (TOP LEFT)
s1 = figure(sizing_mode="stretch_width", height=height, title="LOAD", tools=TOOLS, x_axis_type='datetime')
s1.line(x, df[fig1a[0]], color=Spectral4[0], alpha=alpha, line_width=line_width, legend_label=fig1a[1])
s1.line(x, df[fig1b[0]], color=Spectral4[1], alpha=alpha, line_width=line_width, legend_label=fig1b[1])
s1.line(x, df[fig1c[0]], color=Spectral4[2], alpha=alpha, line_width=line_width, legend_label=fig1c[1])
s1.line(x, df[fig1d[0]], color=Spectral4[3], alpha=alpha, line_width=line_width, legend_label=fig1d[1])
#### some repetitive code has been omitted here for brevity
# Define the grid
# p = gridplot([[s1, s2],[s3, s4]])
# show the results
show(s1)
Example of a dataset
2022-08-26 04:03:52.000,0,30,30,894.70751953125,-63.785041809082,-0.497732371091843,2.14258599281311,0.0307948496192694,355.496154785156,0,0,0,2.38387619901914E-05,0,102.844131469727,0.040388036519289,0.703329265117645,0,0,0.0244150012731552,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106834815815091
2022-08-26 04:03:53.000,0,30,30,895.21142578125,-63.6380615234375,-0.550026297569275,2.14223098754883,0.0307948496192694,355.496154785156,0,0,0,1.45306594276917E-05,0,102.827079772949,0.0610153041779995,0.733967423439026,0,0,0.0245136469602585,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106870988383889
2022-08-26 04:03:54.000,0,30,30,895.726196289063,-63.6465072631836,-0.533430516719818,2.1423876285553,0.0307948496192694,355.496154785156,0,0,0,8.71746851771604E-06,0,102.834602355957,0.0816425681114197,0.764605581760406,0,0,0.0246122926473618,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106907160952687
2022-08-26 04:03:55.000,0,30,30,896.1552734375,-63.0882987976074,-0.534056782722473,2.14190745353699,0.0307948496192694,355.496154785156,0,0,0,5.21722904522903E-06,0,102.811561584473,0.10226983577013,0.795243740081787,0,0,0.024710938334465,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106943333521485
2022-08-26 04:03:56.000,0,30,30,895.727600097656,-63.0707931518555,-0.515181064605713,2.14224052429199,0.0307948496192694,355.496154785156,0,0,0,3.12787688017124E-06,0,102.827545166016,0.122897103428841,0.825881898403168,0,0,0.0248095821589231,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106979506090283
2022-08-26 04:03:57.000,0,30,30,895.690246582031,-63.511173248291,-0.49309903383255,2.14326453208923,0.0307948496192694,355.496154785156,0,0,0,7.10703216100228E-06,0,102.876693725586,0.143524378538132,0.856520056724548,0,0,0.0249082278460264,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0107015678659081
Any help would be appreciated. :)
tl;dr: how do I import/use the date and time in Bokeh when the source is formatted as follows: "2022-08-26 04:03:57"
UPDATE
I got it to be recognized as datetime! Still some kinks and formatting to figure out, but this is what did the trick for me:
x = df[tags.Date_Time_UTC[0]]
x = pd.to_datetime(x)
I also manually removed the trailing decimals from the seconds.
2022-08-26 04:03:56.000 -> 2022-08-26 04:03:56
Further answers and tips are, of course, welcome. But I can continue for now!
Thanks for the help!
Because you have imported pandas, the easiest way to parse a string to a datetime object is pd.to_datetime(). This function can also parse many different formats if you describe them with strftime-style format codes (%Y, %m, %d, and so on).
For example
pd.to_datetime('2022-01-01', format='%Y-%m-%d')
and
pd.to_datetime("01/01/2022 00:00:00", format='%d/%m/%Y %H:%M:%S')
will both result in the same datetime object.
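If you want to parse a complete column of a pandas DataFrame, you can select it by position with the .iloc indexer. Let's say you want to parse the first column (zero-based index).
df[df.columns[0]] = pd.to_datetime(df.iloc[:,0], format="%Y-%m-%d")
should work, provided the format string matches your data. Assigning to the column by name replaces it, so the result has a datetime dtype.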
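As a small sketch for the two timestamp layouts mentioned in the question (day-first with slashes, and ISO-like with fractional seconds), the explicit format strings could look like this. The column names stamp and load and the sample values are made up for illustration.

```python
import pandas as pd

# Day-first timestamps such as "24/08/2022 01:40:32".
df = pd.DataFrame({
    "stamp": ["24/08/2022 01:40:32", "25/08/2022 13:05:00"],
    "load": [1.0, 2.0],
})
df["stamp"] = pd.to_datetime(df["stamp"], format="%d/%m/%Y %H:%M:%S")

# ISO-like timestamps with fractional seconds; %f matches the ".000" part.
iso = pd.to_datetime("2022-08-26 04:03:52.000", format="%Y-%m-%d %H:%M:%S.%f")
```

Giving the format explicitly avoids any day/month ambiguity for dates like 01/02/2022.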
The example below is copied from here and if you want to read the bokeh tutorial, there is one which shows how to enable datetime axes.
Example
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
sample={'A':[pd.to_datetime(x, format='%Y-%m') for x in ['2012-01','2012-02','2012-03']],'B':[7,8,9]}
source = ColumnDataSource(sample)
p = figure(width=400, height=400, x_axis_type='datetime')
p.line(x='A', y='B', source=source, line_width=2)
output_notebook()
show(p)
FYI: The function pd.read_csv() has an argument parse_dates, which calls pd.to_datetime while parsing the CSV file. There are multiple options and the usage depends on the data, so you will have to read the documentation; covering them all here would make this post really long.
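A minimal sketch of the parse_dates route, assuming a file without a header row whose first column holds the timestamp, as in the sample dataset. The two rows below are shortened, made-up stand-ins for the real data, and io.StringIO stands in for the file on disk.

```python
import io

import pandas as pd

# Shortened stand-in for the exported CSV (no header row, timestamp first).
csv_text = (
    "2022-08-26 04:03:52.000,0,30,894.70\n"
    "2022-08-26 04:03:53.000,0,30,895.21\n"
)

# parse_dates=[0] asks pandas to parse column 0 as datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0])
```

With a real file you would pass the file name instead of the StringIO object, and a column name instead of 0 if the file has a header.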
bokeh version 2.4.3 seems to parse your second example date: bokeh.core.properties.Datetime().is_valid("2022-08-26 04:03:57") returns True. However, it doesn't think your first example, "24/08/2022 01:40:32" is valid. This answer might help with that one, though? Using Bokeh datetime with Pandas
I'm trying to get xy coordinates of points drawn by the user. I want to have them as a dictionary, a list or a pandas DataFrame.
I'm using Bokeh 2.0.2 in Jupyter. There'll be a background image (which is not the focus of this post) and on top, the user will create points that I could use further.
Below is where I've managed to get to (with some dummy data). And I've commented some lines which I believe are the direction in which I'd have to go. But I don't seem to get the grasp of it.
from bokeh.plotting import figure, show, Column, output_notebook
from bokeh.models import PointDrawTool, ColumnDataSource, TableColumn, DataTable
output_notebook()
my_tools = ["pan, wheel_zoom, box_zoom, reset"]
#create the figure object
p = figure(title= "my_title", match_aspect=True,
           toolbar_location = 'above', tools = my_tools)
seeds = ColumnDataSource({'x': [2,14,8], 'y': [-1,5,7]}) #dummy data
renderer = p.scatter(x='x', y='y', source = seeds, color='red', size=10)
columns = [TableColumn(field="x", title="x"),
           TableColumn(field="y", title="y")]
table = DataTable(source=seeds, columns=columns, editable=True, height=100)
#callback = CustomJS(args=dict(source=seeds), code="""
# var data = source.data;
# var x = data['x']
# var y = data['y']
# source.change.emit();
#""")
#
#seeds.x.js_on_change('change:x', callback)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
show(Column(p, table))
From the documentation at https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#pointdrawtool:
The tool will automatically modify the columns on the data source corresponding to the x and y values of the glyph. Any additional columns in the data source will be padded with the declared empty_value, when adding a new point. Any newly added points will be inserted on the ColumnDataSource of the first supplied renderer.
So, just check the corresponding data source, seeds in your case.
The only issue here is if you want to know exactly what point has been changed or added. In this case, the simplest solution would be to create a custom subclass of PointDrawTool that does just that. Alternatively, you can create an additional "original" data source and compare seeds to it each time it's updated.
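The comparison against an "original" copy can be sketched in plain Python. This assumes that new points are appended at the end of the columns and that existing points keep their position (deleting points would shift indices and is not handled here); diff_points and the sample values are hypothetical.

```python
def diff_points(original, current):
    """Compare two {'x': [...], 'y': [...]} dicts.

    Returns (moved, added): moved maps an index to its new (x, y),
    added lists the (x, y) pairs appended after the original points.
    """
    old = list(zip(original["x"], original["y"]))
    new = list(zip(current["x"], current["y"]))
    shared = min(len(old), len(new))
    moved = {i: new[i] for i in range(shared) if old[i] != new[i]}
    added = new[len(old):]
    return moved, added


original = {"x": [2, 14, 8], "y": [-1, 5, 7]}
current = {"x": [2, 15, 8, 3], "y": [-1, 5, 7, 4]}  # one point moved, one added
moved, added = diff_points(original, current)
```

After each comparison you would replace the stored original with a copy of the current data, so that the next comparison only reports the latest change.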
The problem is that you want to use the data in Python, but show(p) creates a static version of the plot with no connection back to Python. Here is a simple example that fixes it! I removed the table and such to make it a bit cleaner, but it will also work with it:
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import PointDrawTool
output_notebook()
#create the figure object
p = figure(width=400,height=400)
renderer = p.scatter(x=[0,1], y=[1,2],color='red', size=10)
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool
# This part is important
def app(doc):
    global p
    doc.add_root(p)
show(app) #<-- show app and not p!
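Once the plot runs as an app, a Python callback registered on the data source can pick up the edited points. A minimal sketch, assuming a source like seeds from the question; the snapshots list and the callback name are hypothetical, and the last line only simulates an edit so that the snippet can be run on its own.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

seeds = ColumnDataSource({"x": [2, 14, 8], "y": [-1, 5, 7]})
snapshots = []  # hypothetical place to keep the latest points as DataFrames


def on_points_changed(attr, old, new):
    # 'new' is the updated column data; copy it into a DataFrame.
    snapshots.append(pd.DataFrame({"x": list(new["x"]), "y": list(new["y"])}))


# Register the Python callback on the source's data property.
seeds.on_change("data", on_points_changed)

# Simulated edit (in the app, the PointDrawTool would change the data instead).
seeds.data = {"x": [2, 14, 8, 3], "y": [-1, 5, 7, 4]}
```

In the app, this source would be the one passed to p.scatter, and the registration would happen before doc.add_root(p).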
I have got a pandas dataframe whose columns I want to show as lines in a plot using a Bokeh server. Additionally, I would like to have a slider for shifting one of the lines against the other.
My problem is the update functionality when the slider value changes. I have tried the code from the sliders-example of bokeh, but it does not work.
Here is an example
import pandas as pd
from bokeh.io import vform
from bokeh.plotting import Figure, output_file, show
from bokeh.models import CustomJS, ColumnDataSource, Slider
df = pd.DataFrame([[1,2,3],[3,4,5]])
df = df.transpose()
myindex = list(df.index.values)
mysource = ColumnDataSource(df)
plot = Figure(plot_width=400, plot_height=400)
for i in range(len(mysource.column_names) - 1):
    name = mysource.column_names[i]
    plot.line(x = myindex, y = str(name), source = mysource)
offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1)
def update_data(attrname, old, new):
    # Get the current slider values
    a = offset.value
    temp = df[1].shift(a)
    #to finish#
offset.on_change('value', update_data)
layout = vform(offset, plot)
show(layout)
Inside the update_data-function I have to update mysource, but I cannot figure out how to do that. Can anybody point me in the right direction?
Give this a try: change a=offset.value to a=cb_obj.get('value').
Then, instead of calling offset.on_change('value', update_data), put source.trigger('change') at the end of update_data, after you have done whatever you need to do in that function.
Also change the slider definition (placing it after update_data, so that the function already exists) to offset = Slider(title="offset", value=0.0, start=-1.0, end=1.0, step=1, callback=CustomJS.from_py_func(update_data))
Note that the format I'm using here only works with flexx installed (https://github.com/zoofio/flexx). If you have Python 3.5 you'll have to download the zip file, extract it, and run python setup.py install, as a compiled version for that Python release hasn't been posted yet.
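If you stay with a Python callback on a Bokeh server, as in the question, the missing part of update_data is building the new column data. Here is a rough, pandas-only sketch of that step; shifted_data is a hypothetical helper, and the key names assume that the source was created from the DataFrame, with string column names and an "index" column.

```python
import pandas as pd

# Same small frame as in the question: columns 0 and 1, three rows.
df = pd.DataFrame([[1, 2, 3], [3, 4, 5]]).transpose()


def shifted_data(frame, column, offset):
    """Return a dict of lists with one column shifted by `offset` rows."""
    out = frame.copy()
    # The slider value is a float, while shift() expects a whole number of rows.
    out[column] = out[column].shift(int(offset))
    data = {"index": out.index.tolist()}
    for col in out.columns:
        data[str(col)] = out[col].tolist()
    return data


new_data = shifted_data(df, 1, 1.0)  # shift column 1 down by one row
```

Inside update_data you could then assign the result to mysource.data, using offset.value as the offset; rows that are shifted in are filled with NaN and show up as gaps in the line.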
Using Bokeh 0.8.1, how can I display a long time series, but start 'zoomed in' on one part, while keeping the rest of the data available for scrolling?
For instance, considering the following time series (IBM stock price since 1980), how could I get my chart to initially display only the price since 01/01/2014?
Example code :
import pandas as pd
import bokeh.plotting as bk
from bokeh.models import ColumnDataSource
bk.output_notebook()
TOOLS="pan,wheel_zoom,box_zoom,reset,save"
# Quandl data, too lazy to generate some random data
df = pd.read_csv('https://www.quandl.com/api/v1/datasets/GOOG/NYSE_IBM.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
#Generating a bokeh source
source = ColumnDataSource()
dtest = {}
for col in df:
    dtest[col] = df[col]
source = ColumnDataSource(data=dtest)
# plotting stuff !
p = bk.figure(title='title', tools=TOOLS,x_axis_type="datetime", plot_width=600, plot_height=300)
p.line(y='Close', x='Date', source=source)
bk.show(p)
outputs :
But I want to get this (which you can achieve with the box-zoom tool, but I'd like to start like this immediately).
So, it looks like (as of 0.8.1) we need to add some more convenient ways to set ranges with datetime values. That said, although this is a bit ugly, it does currently work for me:
import time, datetime
x_range = (
    time.mktime(datetime.datetime(2014, 1, 1).timetuple())*1000,
    time.mktime(datetime.datetime(2016, 1, 1).timetuple())*1000
)
p = bk.figure(
    title='title', tools=TOOLS,x_axis_type="datetime",
    plot_width=600, plot_height=300, x_range=x_range
)
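One caveat with the snippet above: time.mktime interprets the datetime in the machine's local time zone, so the range limits can end up offset by a few hours. A small sketch of a time-zone-independent variant, assuming the dates on the axis should be read as UTC; to_epoch_ms is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_epoch_ms(naive_utc):
    """Treat a naive datetime as UTC and return milliseconds since the epoch."""
    return naive_utc.replace(tzinfo=timezone.utc).timestamp() * 1000


# Initial view: 2014-01-01 to 2016-01-01, expressed in epoch milliseconds.
x_range = (
    to_epoch_ms(datetime(2014, 1, 1)),
    to_epoch_ms(datetime(2016, 1, 1)),
)
```

The resulting tuple can be passed as x_range in the same way as above.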