I can't get Bokeh to display my plot. This is my Python code.
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_file, show
if __name__ == '__main__':
    file = 'Overview Data.csv'
    overview_df = pd.read_csv(file)
    overview_ds = ColumnDataSource(overview_df)
    output_file('Wins across Seasons.html')
    print(overview_ds.data)
    p = figure(plot_width=400, plot_height=400)
    # add a circle renderer with a size, color, and alpha
    p.circle('Season', 'Wins', source=overview_ds, size=20, color="navy", alpha=0.5)
    # show the results
    show(p)
I checked my Chrome browser Inspect Element and the console shows the following.
Wins across Seasons.html:17 [bokeh] could not set initial ranges
e.set_initial_range # Wins across Seasons.html:17
This only seems to happen when I am reading from a file. Hard-coding the x and y coordinates works.
I have checked other posts but none of the fixes worked. All my packages are up to date.
This is the file I am reading
Season,Matches Played,Wins,Losses,Goals,Goals Conceded,Clean Sheets
2011-12,38,28,5,89,33,20
2010-11,38,23,4,78,37,15
2009-10,38,27,7,86,28,19
2008-09,38,28,4,68,24,24
2007-08,38,27,5,80,22,21
2006-07,38,28,5,83,27,16
This is the output of the print statement.
{'Season': array(['2011-12', '2010-11', '2009-10', '2008-09', '2007-08', '2006-07'],
dtype=object), 'Matches Played': array([38, 38, 38, 38, 38, 38], dtype=int64), 'Wins': array([28, 23, 27, 28, 27, 28], dtype=int64), 'Losses': array([5, 4, 7, 4, 5, 5], dtype=int64), 'Goals': array([89, 78, 86, 68, 80, 83], dtype=int64), 'Goals Conceded': array([33, 37, 28, 24, 22, 27], dtype=int64), 'Clean Sheets': array([20, 15, 19, 24, 21, 16], dtype=int64), 'index': array([0, 1, 2, 3, 4, 5], dtype=int64)}
Bokeh does not know what to do with those string dates unless you tell it. There are two basic possibilities:
Keep them as strings, and treat them as categorical factors. You can do that by telling Bokeh what the factors are when you create the plot:
p = figure(plot_width=400, plot_height=400,
x_range=list(overview_df.Season.unique()))
That results in this figure:
If you want a different order of categories you can re-order x_range however you like.
Convert them to real datetime values and use a datetime axis. You can do this by telling Pandas to parse column 0 as a date field:
overview_df = pd.read_csv(file, parse_dates=[0])
and telling Bokeh to use a datetime axis:
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime")
That results in this figure:
You can convert the 'Season' column to datetime to get an output:
overview_df = pd.read_csv(file)
overview_df.Season = pd.to_datetime(overview_df.Season)
overview_ds = ColumnDataSource(overview_df)
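One thing worth checking with the datetime route: a season label such as '2011-12' is parsed as a year and a month, not as a span of two years. The sketch below illustrates this with pandas only. The inline CSV string stands in for 'Overview Data.csv', and the 'Start year' column is a hypothetical alternative that is not part of the answers above.

```python
import io
import pandas as pd

# Inline stand-in for the first rows of 'Overview Data.csv' (assumption for the demo)
csv_text = """Season,Wins
2011-12,28
2010-11,23
2009-10,27
"""
overview_df = pd.read_csv(io.StringIO(csv_text))

# '2011-12' is read as December 2011, '2010-11' as November 2010, and so on
parsed = pd.to_datetime(overview_df['Season'])
print(parsed)

# Hypothetical alternative: plot against the starting year of each season
overview_df['Start year'] = overview_df['Season'].str[:4].astype(int)
print(overview_df)
```

If the month-based spacing on the datetime axis is not what you want, the categorical x_range approach or a numeric column like 'Start year' may be a better fit.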
I was wondering if there is a way to color part of a line so that it follows the curve over a user-specified range. For example, the user wants to color the line from x = 11 to x = 14 (see the image below for the desired result). I tried, for example, df.loc[..] to locate the closest points, but then the line is colored from x = 10 to x = 15 instead. Does anyone have an idea how to solve this? Do I need to add extra points between two existing points, and if so, how would I do that? The user might also choose a different range, such as x = 11 to x = 19.
I appreciate any help or guidance.
from bokeh.plotting import figure, output_file, show
import pandas as pd
p = figure(width=600, height=600, tools="pan,reset,save")
data = {'x': [1, 2, 3, 6, 10, 15, 20, 22],
'y': [2, 3, 6, 8, 18, 24, 50, 77]}
df = pd.DataFrame(data)
p.line(df.x, df.y)
show(p)
What the result should look like when user inputs x = 11 (start) and x = 14 (end):
With pandas you can create an interpolated DataFrame from the original.
With this you can add a new line in red.
from bokeh.plotting import figure, output_notebook, show
import pandas as pd
output_notebook()
p = figure(width=600, height=600, tools="pan,reset,save")
data = {'x': [1, 2, 3, 6, 10, 15, 20, 22],
'y': [2, 3, 6, 8, 18, 24, 50, 77]}
df = pd.DataFrame(data)
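df_interpolated = (df.copy()
    .set_index('x')
    # add every integer x from min to max (inclusive, hence the + 1)
    .reindex(index=range(df['x'].min(), df['x'].max() + 1))
    .reset_index()  # turn x back into a regular column
    .interpolate()  # fill the missing y values linearly
)
p.line(df.x, df.y)
# select the segment by x value, not by row position
segment = df_interpolated[df_interpolated.x.between(11, 14)]
p.line(segment.x, segment.y, color="red")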
show(p)
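If the start and end values are not integers, or you would rather not build a full interpolated DataFrame, np.interp can compute just the two end points. This is a minimal sketch of that alternative, assuming linear interpolation between the data points. The helper name segment_between is made up for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 6, 10, 15, 20, 22], dtype=float)
y = np.array([2, 3, 6, 8, 18, 24, 50, 77], dtype=float)

def segment_between(x, y, start, end):
    """Return the part of the polyline (x, y) that lies in [start, end]."""
    # original data points strictly inside the requested range
    inside = (x > start) & (x < end)
    seg_x = np.concatenate(([start], x[inside], [end]))
    # linear interpolation gives the y values at the two new end points
    seg_y = np.interp(seg_x, x, y)
    return seg_x, seg_y

seg_x, seg_y = segment_between(x, y, 11, 19)
print(seg_x, seg_y)
```

The resulting arrays can be passed straight to a second line call, for example p.line(seg_x, seg_y, color="red").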
I have coded a horizontal grouped bar plot using Python. My requirement is that I want to write the number associated with each bar alongside the bars. I have seen problems similar to this on the internet. But I am not sure how to carry out the task in my specific case where there are grouped bars. The following is my code:
# importing package
import matplotlib.pyplot as plt
import pandas as pd
# create data
df = pd.DataFrame([['A', 10, 20, 10, 30], ['B', 20, 25, 15, 25], ['C', 12, 15, 19, 6],
['D', 10, 29, 13, 19]],
columns=['Team', 'Round 1', 'Round 2', 'Round 3', 'Round 4'])
# view data
print(df)
# plot grouped bar chart
ax=df.plot.barh(x='Team',
stacked=False,
log=True,
title='Grouped Bar Graph with dataframe')
The required updates are added below. Note that I have increased the figure size so that the numbers are visible, and moved the legend box outside the plot. If you have a newer version of matplotlib (3.4 or later), you can also use the bar_label feature.
# importing package
import matplotlib.pyplot as plt
import pandas as pd
# create data
df = pd.DataFrame([['A', 10, 20, 10, 30], ['B', 20, 25, 15, 25], ['C', 12, 15, 19, 6],
['D', 10, 29, 13, 19]],
columns=['Team', 'Round 1', 'Round 2', 'Round 3', 'Round 4'])
# view data
print(df)
# plot grouped bar chart
ax=df.plot.barh(x='Team', stacked=False, figsize=(10,7), log = True,
title='Grouped Bar Graph with dataframe')
# Move legend outside the graph
ax.legend(bbox_to_anchor=(1.01, 1))
# Add labels
for p in ax.patches:
    ax.annotate(str(p.get_width()), (p.get_x() + p.get_width(), p.get_y()-0.075), xytext=(5, 10), textcoords='offset points')
Output
In the figure (see the link below the code), you can see that the bottom horizontal gridline is above the x-axis whereas I would prefer it to be overlapping the x-axis to make the graph look more accurate. Could anyone please tell me how to achieve that? Also, it would be amazing if someone could tell me how I can start my graph from 0 at the bottom left corner. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
x_coordinates = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
y_coordinates = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40,45 ])
plt.xlabel("extension/mm")
plt.ylabel("tension/ N")
plt.title("extension vs tension correlation")
plt.xticks([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
plt.minorticks_on()
plt.grid(True, which="minor", color="black")
plt.grid(True, which="major", color="black")
plt.plot(x_coordinates, y_coordinates)
plt.show()
The plot is drawing the minor gridlines, which look confusing next to the x-axis because the axis limits do not fall on a tick. If your plot range starts and ends on a major tick, it will look nicer. Here is one possible solution:
plt.ylim([min(y_coordinates),max(y_coordinates)])
plt.xlim([min(x_coordinates),max(x_coordinates)])
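Put together with the data from the question, the two limit calls might be used as in this sketch. It is written against the Axes interface, which is my choice here. The grid call passes True positionally because that form is accepted across matplotlib versions.

```python
import matplotlib.pyplot as plt
import numpy as np

x_coordinates = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
y_coordinates = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45])

fig, ax = plt.subplots()
ax.plot(x_coordinates, y_coordinates)
ax.set_xlabel("extension/mm")
ax.set_ylabel("tension/N")
ax.set_xticks(x_coordinates)
ax.minorticks_on()
ax.grid(True, which="both", color="black")

# Pin the axes to the data range so the plot starts at (0, 0)
# and the outermost gridlines coincide with the axes.
ax.set_xlim(x_coordinates.min(), x_coordinates.max())
ax.set_ylim(y_coordinates.min(), y_coordinates.max())
```

Call plt.show() to display the result.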
I'm trying to build an audio fingerprinting algorithm like Shazam.
I have a variable length array of frequency point data like so:
[[69, 90, 172],
[6, 18, 24],
[6, 18],
[6, 18, 24, 42],
[]
...
I would like to draw it as a dot plot, similar to a spectrogram, sort of like this. My data doesn't have an explicit time axis, but each row is a 0.1 s slice of time. I am aware of plt.specgram.
np.repeat can create an accompanying array of x's. It needs an array of sizes to be calculated from the input values.
Here is an example supposing the x's are .1 apart (like in the post's description, but unlike the example image).
import numpy as np
import matplotlib.pyplot as plt
# ys = [[69, 90, 172], [6, 18, 24], [6, 18], [6, 18, 24, 42]]
ys = [np.random.randint(50, 3500, np.random.randint(2, 6)) for _ in range(30)]
sizes = [len(y) for y in ys]
xs = np.repeat(np.arange(.1, (len(ys) + .99) / 10, .1), sizes)
plt.scatter(xs, np.concatenate(ys), marker='x', color='blueviolet')
plt.show()
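The same idea applied to the rows from the question, including the empty one, could look like this sketch. Building the time values as integers divided by 10 is my variation to avoid a floating-point step in np.arange.

```python
import matplotlib.pyplot as plt
import numpy as np

# rows from the question; an empty row simply contributes no points
rows = [[69, 90, 172], [6, 18, 24], [6, 18], [6, 18, 24, 42], []]

sizes = [len(row) for row in rows]
times = np.arange(1, len(rows) + 1) / 10   # 0.1, 0.2, ... one value per row
xs = np.repeat(times, sizes)               # repeat each time once per frequency
ys = np.concatenate([np.asarray(row, dtype=float) for row in rows])

fig, ax = plt.subplots()
ax.scatter(xs, ys, marker='x', color='blueviolet')
ax.set_xlabel("time / s")
ax.set_ylabel("frequency point")
```

Call plt.show() to display the scatter plot.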
I've recently started using the dark chesterish theme from dunovank, and I
love how good a simple pandas.DataFrame.plot() looks out of the box:
Snippet 1:
# imports
import pandas as pd
import numpy as np
# Theme from dunovank, exclude if not installed:
from jupyterthemes import jtplot
jtplot.style()
# snippet from pandas docs:
ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000)).cumsum()
ax = ts.plot()
Output 1:
But I'd like to add an alternating background color (seems to be all the rage with big news agencies). The post How can I set the background color on specific areas of a pyplot figure? gives a good description of how you can do it. And it's really easy for numeric x-values:
Snippet 2:
# imports
import pandas as pd
import numpy as np
from jupyterthemes import jtplot
# Sample data
np.random.seed(123)
rows = 50
dfx = pd.DataFrame(np.random.randint(90,110,size=(rows, 1)), columns=['Variable Y'])
dfy = pd.DataFrame(np.random.randint(25,68,size=(rows, 1)), columns=['Variable X'])
df = pd.concat([dfx,dfy], axis = 1)
jtplot.style()
ax = df.plot()
for i in range(0, 60, 20):
    ax.axvspan(i, i+10, facecolor='lightgrey', alpha=0.025)
Output 2:
But it gets a lot messier (for me at least) when the x-axis is of a time or date format. And that's because the axis in my two examples goes from this
# in:
ax.lines[0].get_data()
# out:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
dtype=int64)
To this (abbreviated):
# in:
ts.plot().lines[0].get_data()
# out:
.
.
Period('2002-09-15', 'D'), Period('2002-09-16', 'D'),
Period('2002-09-17', 'D'), Period('2002-09-18', 'D'),
Period('2002-09-19', 'D'), Period('2002-09-20', 'D'),
Period('2002-09-21', 'D'), Period('2002-09-22', 'D'),
Period('2002-09-23', 'D'), Period('2002-09-24', 'D'),
Period('2002-09-25', 'D'), Period('2002-09-26', 'D')], dtype=object)
ts.plot().lines[0].get_data() returns the data on the x-axis. But is there a way to find out where matplotlib renders the vertical lines for each 'Jan' observation, so I can more easily find decent intervals for the alternating black and grey background color?
Thank you for any suggestions!
Edit - Or is there a theme?
Or does anyone know if there exists a theme somewhere that is free to use?
I've checked all matplotlib themes (import matplotlib.pyplot as plt; print(plt.style.available)) and Seaborn, but with no success.
Edit 2 - Suggested solution from ImportanceOfBeingErnest with the chesterish theme activated:
In my humble opinion, this is a perfect setup for a time series chart (could maybe drop the spines though).
Gridlines are shown at the positions of the major ticks by default. You can get those ticks via ax.get_xticks(). The problem is that the edges of the plot are not guaranteed to coincide with those ticks; in fact, they usually do not. So, to get consistent shading over the whole range of the axes, the first shade should start at the edge of the plot and end at the first gridline, the following shades go between gridlines, and the last one again lies between the last gridline and the edge of the axes.
Another problem is that the limits of the plot and hence the automatically generated gridlines may change over the lifetime of the plot, e.g. because you decide to have different limits or zoom or pan the plot. So ideally one would recreate the shading each time the axis limits change. This is what the following does:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# time series
ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000)).cumsum()
# numeric series
#ts = pd.Series(np.random.randn(1000),index=np.linspace(25,800,1000)).cumsum()
ax = ts.plot(x_compat=True)
ax.grid()
class GridShader():
    def __init__(self, ax, first=True, **kwargs):
        self.spans = []
        self.sf = first
        self.ax = ax
        self.kw = kwargs
        self.ax.autoscale(False, axis="x")
        self.cid = self.ax.callbacks.connect('xlim_changed', self.shade)
        self.shade()
    def clear(self):
        for span in self.spans:
            try:
                span.remove()
            except:
                pass
    def shade(self, evt=None):
        self.clear()
        xticks = self.ax.get_xticks()
        xlim = self.ax.get_xlim()
        xticks = xticks[(xticks > xlim[0]) & (xticks < xlim[-1])]
        locs = np.concatenate(([[xlim[0]], xticks, [xlim[-1]]]))
        start = locs[1-int(self.sf)::2]
        end = locs[2-int(self.sf)::2]
        for s, e in zip(start, end):
            self.spans.append(self.ax.axvspan(s, e, zorder=0, **self.kw))
gs = GridShader(ax, facecolor="lightgrey", first=False, alpha=0.7)
plt.show()
Use an axis vertical span with datetime values for the x-values:
from jupyterthemes import jtplot
import pandas as pd
import numpy as np
from datetime import datetime
jtplot.style()
ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000)).cumsum()
ax = ts.plot()
# or an appropriate for-loop
ax.axvspan(datetime(1999, 12, 15), datetime(2000, 1, 15), facecolor='red', alpha=0.25)
ax.axvspan(datetime(2000, 12, 15), datetime(2001, 1, 15), facecolor='red', alpha=0.25)
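The "appropriate for-loop" hinted at in the comment might look like the sketch below, which shades every other calendar year. The yearly boundaries, the colors, and the use of plain matplotlib plotting (instead of ts.plot() with the jupyterthemes style) are my assumptions for a self-contained example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000),
               index=pd.date_range('1/1/2000', periods=1000)).cumsum()

fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)

# Year boundaries covering the data: 2000-01-01, 2001-01-01, 2002-01-01, 2003-01-01
edges = pd.date_range('2000-01-01', '2003-01-01', freq='YS')

# Pair every second boundary with the one after it, so alternate years get shaded
spans = []
for start, end in zip(edges[::2], edges[1::2]):
    spans.append(ax.axvspan(start, end, facecolor='lightgrey', alpha=0.25, zorder=0))
```

Call plt.show() to display the figure. Unlike the GridShader class above, these spans are fixed and are not recomputed when you zoom or pan.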