I am trying to plot my data with Bokeh's Scatter.
Here is my code:
from bokeh.plotting import Scatter, output_file, show
import pandas
df=pandas.Dataframe(colume["X","Y"])
df["X"]=[1,2,3,4,5,6,7]
df["Y"]=[23,43,32,12,34,54,33]
p=Scatter(df,x="X",y="Y", title="Day Temperature measurement", xlabel="Tempetature", ylabel="Day")
output_file("File.html")
show(p)
The Output should look like this:
Expected Output
The error is:
ImportError                               Traceback (most recent call last)
<ipython-input-14-1730ac6ad003> in <module>
----> 1 from bokeh.plotting import Scatter, output_file, show
      2 import pandas
      3
      4 df=pandas.Dataframe(colume["X","Y"])
      5
ImportError: cannot import name 'Scatter' from 'bokeh.plotting' (C:\Users\LENOVO\Anaconda3\lib\site-packages\bokeh\plotting\__init__.py)
I also found that Scatter is no longer maintained. Is there any way to still use it?
Which alternatives do I have for plotting the same data as Scatter, using any other Python library?
Would using an older version of Bokeh resolve this issue?
Scatter (with a capital S) has never been part of bokeh.plotting. It used to be part of the old bokeh.charts API, which was removed several years ago. However, it is not needed at all to create basic scatter plots, since the glyph methods in bokeh.plotting (e.g. circle, square) are all implicitly scatter-type functions to begin with:
from bokeh.plotting import figure, show
import pandas as pd
df = pd.DataFrame({"X" :[1,2,3,4,5,6,7],
"Y": [23,43,32,12,34,54,33]})
p = figure(x_axis_label="Temperature", y_axis_label="Day",
title="Day Temperature measurement")
p.circle("X", "Y", size=15, source=df)
show(p)
Which yields:
You can also just pass the data directly to circle as in the other answer.
If you want to do fancier things, like mapping the marker type based on a column, there is also a scatter method (lower case s) on the figure:
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark
SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']
p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Sepal Width'
p.scatter("petal_length", "sepal_width", source=flowers, legend_field="species", fill_alpha=0.4, size=12,
marker=factor_mark('species', MARKERS, SPECIES),
color=factor_cmap('species', 'Category10_3', SPECIES))
show(p)
which yields:
If you look up "scatter" in the docs, you'll find
Scatter Markers
To scatter circle markers on a plot, use the circle() method of Figure:
from bokeh.plotting import figure, output_file, show
# output to static HTML file
output_file("line.html")
p = figure(plot_width=400, plot_height=400)
# add a circle renderer with a size, color, and alpha
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20, color="navy", alpha=0.5)
# show the results
show(p)
To work with dataframes, just pass in the columns like df.X and df.Y to the x and y args.
from bokeh.plotting import figure, show, output_file
import pandas as pd
df = pd.DataFrame(columns=["X","Y"])
df["X"] = [1,2,3,4,5,6,7]
df["Y"] = [23,43,32,12,34,54,33]
p = figure()
p.scatter(df.X, df.Y, marker="circle")
#from bokeh.io import output_notebook
#output_notebook()
show(p) # or output to a file...
Related
I recently started using Bokeh for interactive network visualization. I'm plotting coordinates for 50 points, nodes that represent machines. Below is an image of how my data is represented, followed by my code. (I only included 14 machines to keep it simple.)
I've managed to plot the points, but I have a question for which I couldn't find a specific solution anywhere. For some machines I have the temperature, but for others I don't. How can I give the machines that have temperature information a different color?
For example, the ones with this information would be red and the ones without it would be blue. All the tutorials I found about changing node colors involve palettes, which wouldn't be of much use to me right now.
import pandas
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
output_notebook()
df = pandas.read_excel('Pasta1.xlsx', engine='openpyxl')
source = ColumnDataSource(data=dict(x = df['x'] ,y = df['y']))
TOOLTIPS = [("index", "$index"),("(x,y)", "($x, $y)")]
p = figure(width=1000, height=500, tooltips=TOOLTIPS,title="Redes")
p.circle('x', 'y', size=10,fill_color='red', source=source)
show(p)
The solution is to use the color keyword of p.circle() and pass a list (or array), or the name of a column in the source, instead of a static value. If you know the rules for your colors, it should be easy to pass this information to your plot.
The keyword fill_color also accepts lists (or arrays), if you prefer this.
Example
The example below first creates the color column in the pandas DataFrame using np.where(). This can also be done by hand or with other techniques.
import numpy as np
import pandas as pd
from bokeh.io import output_notebook, show, save
from bokeh.models import Range1d, Circle, ColumnDataSource, MultiLine
from bokeh.plotting import figure
output_notebook()
df = pd.DataFrame({
'machine':['J'+str(i) for i in range(13)],
'x':list(range(13)),
'y':list(range(13)),
'Temp' : [np.nan, 32, np.nan, 33, np.nan, np.nan, np.nan, np.nan, 35, np.nan, np.nan, 32, np.nan]
})
df['color'] = np.where(df['Temp'].isna(), 'blue', 'red')
source = ColumnDataSource(df)
TOOLTIPS = [("index", "$index"),("(x,y)", "($x, $y)"), ('name', "@machine")]
p = figure(width=500, height=500, tooltips=TOOLTIPS,title="Redes")
p.circle(x='x', y='y', size=10, color='color', source=source)
show(p)
Output
I have a dataframe that details sales of various product categories vs. time. I'd like to make a "line and marker" plot of sales vs. time, per category. To my surprise, this appears to be very difficult in Bokeh.
The scatter plot is easy. But then trying to overplot a line of sales vs. date with the same source (so I can update both scatter and line plots in one go when the source updates) and in such a way that the colors of the line match the colors of the scatter plot markers proves near impossible.
Minimal reproducible example with contrived data:
import pandas as pd
df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],\
'Product Category':['shoes','shoes','grocery','grocery'],\
'Sales':[100,180,21,22],'Colors':['red','red','green','green']})
df['Date'] = pd.to_datetime(df['Date'])
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
output_notebook()
source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", plot_width=800, toolbar_location=None)
plot.scatter(x="Date",y="Sales",size=15, source=source, fill_color="Colors", fill_alpha=0.5, \
line_color="Colors",legend="Product Category")
for cat in list(set(source.data['Product Category'])):
    tmp = source.to_df()
    col = tmp[tmp['Product Category']==cat]['Colors'].values[0]
    plot.line(x="Date",y="Sales",source=source, line_color=col)
show(plot)
Here's what it looks like, which is clearly wrong:
Here's what I want and don't know how to make:
Can Bokeh not make such plots, where scatter markers and lines have the same color per category, with a legend?
With Bokeh it is often helpful to first think about the visualisation you want and then structure the data source accordingly. You want two lines, one per category, where the x axis is time and the y axis is sales. A natural way to structure your data source is then the following:
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
output_notebook()
df = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'],
                   'Shoe Sales': [100, 180],
                   'Grocery Sales': [21, 22]})
# the datetime axis needs real datetimes, not strings
df['Date'] = pd.to_datetime(df['Date'])
source = ColumnDataSource(df)
plot = figure(x_axis_type="datetime", plot_width=800, toolbar_location=None)
categories = ["Shoe Sales", "Grocery Sales"]
colors = {"Shoe Sales": "red", "Grocery Sales": "green"}
# one scatter and one line per category, sharing the same source and color
for category in categories:
    # legend_label is used so the category name is shown as literal text
    plot.scatter(x="Date", y=category, size=15, source=source,
                 fill_color=colors[category], legend_label=category)
    plot.line(x="Date", y=category, source=source, line_color=colors[category])
show(plot)
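If the data starts out in the long format from the question (one row per date and category), the wide layout used above does not have to be typed by hand. A minimal sketch, assuming pandas' DataFrame.pivot is acceptable here and that each (Date, Product Category) pair occurs only once; the names long_df and wide_df are only illustrative:

```python
import pandas as pd

# Long format, as in the question: one row per (date, category) pair.
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]),
    "Product Category": ["shoes", "shoes", "grocery", "grocery"],
    "Sales": [100, 180, 21, 22],
})

# Wide format: one row per date, one column per category.
wide_df = long_df.pivot(index="Date", columns="Product Category", values="Sales")
wide_df.columns.name = None        # drop the leftover "Product Category" label
wide_df = wide_df.reset_index()    # turn Date back into a regular column

print(wide_df)
```

wide_df can then be handed to ColumnDataSource, with the category column names ("grocery" and "shoes" here) used as the y fields in the loop.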
The solution is to group your data. Then you can plot a line for each group.
Minimal Example
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
output_notebook()
df = pd.DataFrame({'Date':['2020-01-01','2020-01-02','2020-01-01','2020-01-02'],
'Product Category':['shoes','shoes','grocery','grocery'],
'Sales':[100,180,21,22],'Colors':['red','red','green','green']})
df['Date'] = pd.to_datetime(df['Date'])
plot = figure(x_axis_type="datetime",
plot_width=400,
plot_height=400,
toolbar_location=None
)
plot.scatter(x="Date",
y="Sales",
size=15,
source=df,
fill_color="Colors",
fill_alpha=0.5,
line_color="Colors",
legend_field="Product Category"
)
for color in df['Colors'].unique():
    plot.line(x="Date", y="Sales", source=df[df['Colors']==color], line_color=color)
show(plot)
Output
If I want to make a scatter plot in matplotlib I can do:
import pandas as pd
from bokeh.io import show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': range(1, 6), 'b': list('ABCDE')})
plt.scatter(df['a'], df['b'])
plt.show()
Which gives
How would I get the same output in bokeh?
I tried (same set-up as above):
source = ColumnDataSource(df)
p = figure(
title="Something great",
tools='save,pan,box_zoom,reset,wheel_zoom',
background_fill_color="#fafafa"
)
p.scatter(
'a',
'b',
source=source
)
show(p)
but that does not plot anything. If I plot column a against itself it works fine, suggesting that the code structure is fine, but that it only works for numerical values. Is there a quick fix to this?
Setting the y_range parameter fixed the issue for me.
I found it under "Handling Categorical Data" in the Bokeh documentation.
p = figure(
y_range=df['b'], # <-- what I added
title="Something great",
tools='save,pan,box_zoom,reset,wheel_zoom',
background_fill_color="#fafafa"
)
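For completeness, a minimal sketch of the whole example with the categorical axis in place. It assumes the factors should appear in the order they occur in column b and passes them as a plain list; show(p) is left out, so the snippet only builds the figure:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({'a': range(1, 6), 'b': list('ABCDE')})
source = ColumnDataSource(df)

# Passing a list of strings as y_range makes the y axis categorical.
p = figure(
    y_range=list(df['b']),
    title="Something great",
    tools='save,pan,box_zoom,reset,wheel_zoom',
    background_fill_color="#fafafa",
)
p.scatter('a', 'b', source=source)
```

Calling show(p) afterwards renders the plot. If the column contained repeated values, the list passed to y_range would, as far as I know, need to be de-duplicated first, since the factors of a categorical range are expected to be unique.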
I get two different results when I use Bokeh's circle (or diamond_cross) function and its line function. The line plot includes negative values and the circle plot does not.
Plot with line
and a plot with diamond_cross
I want to plot the temperatures for a certain place over a timespan. I have a lot of values, so I would like to make a scatter plot with Bokeh.
I also get the same problem when I use the x function.
In the code below, you can replace diamond_cross with line and remove fill_alpha and size; you will then probably also get two different graphs.
import pandas as pd
import numpy as np
import bokeh as bk
import scipy.special
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
from bokeh.models.glyphs import Quad
from bokeh.layouts import gridplot
df = pd.read_csv('KNMI2.csv', sep=';',
usecols= ['YYYYMMDD','Techt', 'YYYY', 'MM', 'D'])
jan= df[df['MM'].isin(['1'])]
source_jan = ColumnDataSource(jan)
p = figure(plot_width = 800, plot_height = 800,
x_range=(0,32), y_range=(-20,20))
p.diamond_cross(x='D', y='Techt', source=source_jan,
fill_alpha=0.2, size=2)
p.title.text = 'Temperatuur per uur vanaf 1951 tot 2019'
p.xaxis.axis_label = 'januari'
p.yaxis.axis_label = 'Temperatuur (C)'
show(p)
I would expect circle/diamond_cross to behave the same way as line, so their plots should also show the negative values.
I had a similar issue where the data type of the variable I was trying to plot was string instead of int.
Try using
jan = df[df['MM'].isin(['1'])]
jan['Techt'] = jan['Techt'].astype(int)
source_jan = ColumnDataSource(jan)
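To check whether this is the cause, it can help to look at the column's dtype before plotting. A minimal sketch with made-up data standing in for KNMI2.csv (the values below are hypothetical); it uses pd.to_numeric rather than astype(int), on the assumption that the temperatures may contain decimals or entries that cannot be parsed:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: temperatures read in as strings.
jan = pd.DataFrame({'D': [1, 2, 3], 'Techt': ['-5', '12', '-3']})
print(jan['Techt'].dtype)   # a string-like dtype, not a numeric one

jan = jan.copy()            # work on a copy rather than a slice of another frame
# Convert to numbers; anything that cannot be parsed becomes NaN.
jan['Techt'] = pd.to_numeric(jan['Techt'], errors='coerce')
print(jan['Techt'].dtype)   # now a numeric dtype
```

After the conversion, the frame can be passed to ColumnDataSource as before.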
I am running the following code to render a plot with dates in the x axis and floats in the y axis:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import DatetimeTickFormatter
from bokeh.charts import Bar, Line, show
def datetime(x):
    return pd.DataFrame(x, dtype='datetime64')
openxbids = pd.read_csv('data')
openxbids.sort_values('date')
output_file("lines.html")
p = figure(width=800, height=250, x_axis_type="datetime")
p.line(datetime(openxbids['date']), openxbids['bids'], color = 'navy', alpha=0.5)
show(p)
However, when I run this, I get a graph without any data plotted. The x and y axis ranges seem to be correctly detected. What am I missing?
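No diagnosis is confirmed here, but two things in the code above are worth checking: the datetime helper returns a DataFrame rather than a one-dimensional column, and the result of sort_values is discarded. A minimal sketch of preparing the data with pd.to_datetime instead, using hypothetical sample values in place of the 'data' file (only the column names date and bids come from the question):

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from pd.read_csv('data').
openxbids = pd.DataFrame({
    'date': ['2017-01-03', '2017-01-01', '2017-01-02'],
    'bids': [3.5, 1.0, 2.25],
})

# Parse the strings into a datetime64 column (a Series, not a DataFrame).
openxbids['date'] = pd.to_datetime(openxbids['date'])
# sort_values returns a new frame, so the result has to be kept.
openxbids = openxbids.sort_values('date')

print(openxbids)
```

The two columns could then be passed to the figure as p.line(openxbids['date'], openxbids['bids'], color='navy', alpha=0.5).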