I would like to plot a whole pandas DataFrame with Bokeh. I.e., I am looking for a Bokeh equivalent of the third line:
import pandas as pd
income_df = pd.read_csv("income_2013_dollars.csv", sep='\t', thousands=',')
income_df.plot(x="year")
Is there currently a way to do that, or do I have to pass each y-value separately?
Note from Bokeh project maintainers: This answer refers to an obsolete and deprecated API that was long since removed from Bokeh. For information about creating bar charts with modern and fully supported Bokeh APIs, see other Questions/Answers.
You may find the charts examples useful:
https://github.com/bokeh/bokeh/tree/master/examples/charts
If you wanted a bar chart it would be:
from bokeh.charts import Bar
Bar(income_df, notebook=True).show() # assuming the index is correctly set on your df
You may want a Line or TimeSeries, which work similarly. Just check out the examples for more details and configuration options, such as adding titles and labels.
Note that you can use other output methods - notebook, file, or server. See the documentation here:
http://docs.bokeh.org/en/latest/docs/user_guide/charts.html#generic-arguments
Update: (sorry for the confusion on how to display the output). An alternative way of specifying the display type of the chart is to use the methods output_notebook(), output_file("file.html"), output_server() and then use the show method. For example
from bokeh.charts import Bar
from bokeh.plotting import output_notebook, show
output_notebook()
bar = Bar(income_df)
show(bar)
However, you cannot do the following
from bokeh.charts import Bar
from bokeh.plotting import output_notebook
output_notebook()
Bar(income_df).show() # WILL GIVE YOU AN ERROR
The two show methods are different.
See this User's Guide Section for modern information on creating Bar charts with Pandas:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#pandas
For example:
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap
df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')
source = ColumnDataSource(group)
cyl_cmap = factor_cmap('cyl', palette="Spectral5", factors=sorted(df.cyl.unique()))
p = figure(x_range=group, title="MPG by # Cylinders",
toolbar_location=None, tools="")
p.vbar(x='cyl', top='mpg_mean', width=1, source=source,
line_color=cyl_cmap, fill_color=cyl_cmap)
p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "some stuff"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
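For the line plot the question actually asks about (the equivalent of income_df.plot(x="year")), I am not aware of a single-call equivalent in bokeh.plotting, but a short loop over the columns gets close. This is only a minimal sketch: the DataFrame contents and every column name other than year are made up for illustration, since the original CSV is not shown.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure, show

# Hypothetical stand-in for income_df; the real CSV is not available here.
income_df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013],
    "bottom_20": [11.0, 11.2, 11.4, 11.6],
    "top_20": [180.0, 185.0, 190.0, 195.0],
})

source = ColumnDataSource(income_df)

# Every column except the x column becomes its own line.
value_columns = [c for c in income_df.columns if c != "year"]

p = figure(title="All columns against year", x_axis_label="year")
palette = Category10[10]
for i, column in enumerate(value_columns):
    p.line(x="year", y=column, source=source, legend_label=column,
           color=palette[i % len(palette)], line_width=2)
```

Call show(p) afterwards (after output_notebook() or output_file(...), as in the examples above) to render the figure.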
Related
I am working with relatively large datasets (approximately 10 x 20,000,000 data points), for which Datashader is a useful visualisation tool. To give more information in these visualisations, I would like to add lines showing averages/standard deviations on top of the datashaded figure. Does anyone know how this would be possible?
My current code:
import holoviews as hv
import holoviews.operation.datashader as hd
from bokeh.plotting import figure
from bokeh.io import show
x = 'xcol'
y= 'ycol'
data = dataframe
fig = figure(x_axis_label=x, y_axis_label=y)
points = hv.Points(data[[x, y]], label=('Title'))
hd.datashade(points, cmap='crest')
What I would like to do is for example add the following line to the figure generated with the code above:
fig.line([1,10,20], [0, 1000,2000], line_width=4)
Thanks in advance.
Below is the code I use to plot a choropleth map with Plotly Express from a DataFrame read from a CSV file.
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.read_csv("//location.csv")
fig = px.choropleth(data_frame = df,
locations= df["location"],
locationmode='country names',
color=df["location"],
hover_name=df["location"],
title = "Location Data",
color_continuous_scale = px.colors.sequential.Oranges)
fig["layout"].pop("updatemenus")
fig.show()
However, when I run the above code in a Jupyter notebook in Visual Studio Code, the map is not rendered.
But when I run the same code in the Anaconda Jupyter Notebook, the map is rendered as expected.
Why isn't the map rendered in VS Code, and is there any way to resolve this issue in VS Code?
I was interested in this question because I usually work with JupyterLab. I ran it based on this answer, and when I ran it in VS Code, it displayed correctly in my default browser. The code I ran was based on the code in the official reference.
import plotly.express as px
from plotly.offline import plot
df = px.data.gapminder().query("year==2007")
fig = px.choropleth(df, locations="iso_alpha",
color="lifeExp", # lifeExp is a column of gapminder
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.Plasma)
# fig.show()
plot(fig)
I am trying to customize a plotly iplot that renders multiple time series, but iplot accepts only one parameter. I checked the plotly documentation, which mentions using go objects, but I am still not able to add custom fonts and a watermark to the plot. Can anyone help me out? Any ideas on how to make this work?
minimal data and demo code
Here is the code that I tried to use for adding custom fonts and a watermark. I am new to plotly, so some of the fancy built-in functions are not quite intuitive to me. Any help would be appreciated.
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from IPython.core.display import display, HTML
import matplotlib as mpl
import cufflinks as cf
import seaborn as sns
import pandas as pd
import numpy as np
# setup
display(HTML("<style>.container { width:35% !important; } .widget-select > select {background-color: gainsboro;}</style>"))
init_notebook_mode(connected=True)
np.random.seed(1)
mpl.rcParams['figure.dpi']= 440
# sample data from cufflinks
df = cf.datagen.lines()
# plotly
iplot([{
'x': df.index,
'y': df[col],
'name': col
} for col in df.columns])
In addition, I want to smooth the output of the above code (a plot of multiple time series). How can I do that? Thanks.
update
I have done this with matplotlib but don't know how to do the same thing in plotly. Here is my script for loading a customized font and adding a watermark:
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
fig, ax = plt.subplots(figsize=(10,6))
fname=r'C:\Users\Nunito-Black.ttf'
myfont=fm.FontProperties(fname=fname,size=50)
legend_fname=r'C:\Users\RobotoCondensed-Regular.ttf'
legend_font=fm.FontProperties(fname=legend_fname,size=20)
## some code for passing plot data to plotting function
ax.text(0.5, 0.5, 'mylogo',fontsize=60,fontproperties=myfont,color='black',
transform=ax.transAxes,ha='center', va='center', alpha=0.3)
plt.show()
How can I do the same thing in a plotly plot?
Note: This is not (yet) an answer.
I do not understand what you mean by "smooth" in the first part. In any case, I see some unnecessary imports, and it seems to me you are using plotly with an old syntax.
import plotly.graph_objs as go
import cufflinks as cf
import pandas as pd
df = cf.datagen.lines()
fig = go.Figure()
for col in df.columns:
fig.add_trace(
go.Scatter(x=df.index,
y=df[col],
name=col))
fig.show()
Note that in this case you could use pd.util.testing.makeTimeDataFrame() instead of importing cufflinks.
For the second part, I suggest reading the documentation for go.Layout.font (shown by go.Layout.font? in IPython), which is
Supported dict properties:
color
family
HTML font family - the typeface that will be
applied by the web browser. The web browser
will only be able to apply a font if it is
available on the system which it operates.
Provide multiple font families, separated by
commas, to indicate the preference in which to
apply fonts if they aren't available on the
system. The plotly service (at https://plot.ly
or on-premise) generates images on a server,
where only a select number of fonts are
installed and supported. These include "Arial",
"Balto", "Courier New", "Droid Sans",, "Droid
Serif", "Droid Sans Mono", "Gravitas One", "Old
Standard TT", "Open Sans", "Overpass", "PT Sans
Narrow", "Raleway", "Times New Roman".
size
The usage in Python is described in the plotly documentation, and apparently the JavaScript version is more flexible.
I had a look at Kaggle's univariate-plotting-with-pandas. There's this line which generates bar graph.
reviews['province'].value_counts().head(10).plot.bar()
I don't see any color scheme defined specifically.
I tried plotting it in a Jupyter notebook but got bars in a single color instead of the multiple colors shown on Kaggle.
I read the documentation and searched online but couldn't find any way to generate these colors with just the line above.
How do we do that? Is there a config option that enables these varied colors by default?
It seems like the multicoloured bars were the default behaviour in one of the former pandas versions and Kaggle must have used that one for their tutorial (you can read more here).
You can easily recreate the plot by defining a list of standard colours and then using it as an argument in bar.
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
'#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
reviews['province'].value_counts().head(10).plot.bar(color=colors)
Tested on pandas 0.24.1 and matplotlib 2.2.2.
In seaborn this is not a problem:
import seaborn as sns
sns.countplot(x='province', data=reviews)
With pandas and matplotlib it is also possible by converting the values to a one-row DataFrame, although the bars are then drawn without spaces between them:
reviews['province'].value_counts().head(10).to_frame(0).T.plot.bar()
Or use some qualitative colormap:
import numpy as np
import matplotlib.pyplot as plt
N = 10
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Paired(np.arange(N)))
reviews['province'].value_counts().head(N).plot.bar(color=plt.cm.Pastel1(np.arange(N)))
The colorful plot has been produced with an earlier version of pandas (<= 0.23). Since then, pandas has decided to make bar plots monochrome, because the color of the bars is pretty meaningless. If you still want to produce a bar chart with the default colors from the "tab10" colormap in pandas >= 0.24, and hence recreate the previous behaviour, it would look like
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 13
df = pd.Series(np.random.randint(10,50,N), index=np.arange(1,N+1))
cmap = plt.cm.tab10
colors = cmap(np.arange(len(df)) % cmap.N)
df.plot.bar(color=colors)
plt.show()
I am experimenting with Bokeh and mixing pieces of code. I created a graph from a Pandas DataFrame, and it displays correctly with all the tool elements I want. However, the tooltip only partially displays the data.
Here is my code:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import HoverTool
from collections import OrderedDict
x = yearly_DF.index
y0 = yearly_DF.weight.values
y1 = yearly_DF.muscle_weight.values
y2 = yearly_DF.bodyfat_p.values
#output_notebook()
p = figure(plot_width=1000, plot_height=600,
tools="pan,box_zoom,reset,resize,save,crosshair,hover",
title="Annual Weight Change",
x_axis_label='Year',
y_axis_label='Weight',
toolbar_location="left"
)
hover = p.select(dict(type=HoverTool))
hover.tooltips = OrderedDict([('Year', '#x'),('Total Weight', '#y0'), ('Muscle Mass', '$y1'), ('BodyFat','$y2')])
output_notebook()
p.line(x, y0, legend="Weight")
p.line(x, y1, legend="Muscle Mass", line_color="red")
show(p)
I have tested with Firefox 39.0, Chrome 43.0.2357.130 (64-bit) and Safari Version 8.0.7. I have cleared the cache and I get the same error in all browsers. Also I did pip install bokeh --upgrade to make sure I have the latest version running.
Try using ColumnDataSource.
Hover tool needs to have access to the data source so that it can display info.
$x and $y are the x-y values in data units. (The $ prefix is special and can only be followed by a limited set of variables; $y2 is not one of them.) To display a value from the data source, use @ + column_name, such as @weight. See the Bokeh documentation on HoverTool for more info.
Besides, I am surprised that the hover appears at all, as I thought HoverTool didn't work with line glyphs.
Try the following (I haven't tested it, so it might have typos):
from bokeh.models import ColumnDataSource
df = yearly_DF.reset_index() # move index to column.
source = ColumnDataSource(ColumnDataSource.from_df(df))
hover.tooltips = OrderedDict([('x', '$x'),('y', '$y'), ('year', '@index'), ('weight','@weight'), ('muscle_weight','@muscle_weight'), ('body_fat','@bodyfat_p')])
p.line(x='index', y='weight', source=source, legend="Weight")
p.line(x='index', y='muscle_weight', source=source, legend="Muscle Mass", line_color="red")
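With a recent Bokeh release the same idea might look like the sketch below. The contents of yearly_DF are invented for illustration, and the index is assumed to be named year; the point is only that every tooltip field references a column of the shared ColumnDataSource with the @ prefix.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical stand-in for yearly_DF, with the year as a named index.
yearly_DF = pd.DataFrame(
    {"weight": [80.0, 78.5, 77.0],
     "muscle_weight": [35.0, 35.5, 36.0],
     "bodyfat_p": [22.0, 20.5, 19.0]},
    index=pd.Index([2013, 2014, 2015], name="year"),
)

# reset_index() turns the named index into an ordinary "year" column.
source = ColumnDataSource(yearly_DF.reset_index())

p = figure(title="Annual Weight Change",
           x_axis_label="Year", y_axis_label="Weight",
           tools="pan,box_zoom,reset,save,crosshair",
           toolbar_location="left")
p.line(x="year", y="weight", source=source, legend_label="Weight")
p.line(x="year", y="muscle_weight", source=source,
       legend_label="Muscle Mass", line_color="red")

# Each @name is looked up in the data source for the hovered point.
hover = HoverTool(tooltips=[("Year", "@year"),
                            ("Total Weight", "@weight"),
                            ("Muscle Mass", "@muscle_weight"),
                            ("BodyFat", "@bodyfat_p")])
p.add_tools(hover)
```

Because both lines share one source, the tooltip can show bodyfat_p even though that column is never drawn.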
Are you using Firefox? This was a reported issue with some older versions of FF:
https://github.com/bokeh/bokeh/issues/1981
https://github.com/bokeh/bokeh/issues/2122
Upgrading FF resolved the issue.