Python / Pandas / Bokeh: plotting multiple lines with legends from dataframe - python

I have data in a Pandas dataframe that I am trying to plot to a time series line graph.
When plotting one single line, I have been able to do this quite successfully using the p.line function, ensuring I make the x_axis_type 'datetime'.
To plot multiple lines, I tried p.multi_line, which worked well, but I also need a legend. According to this post, it is not possible to add a legend to a multi_line glyph: Bokeh how to add legend to figure created by multi_line method?
Leo's answer to the question in the link above looks promising, but I can't seem to work out how to apply this when the data is sourced from a dataframe.
Does anyone have any tips?

OK, this seems to work:
from bokeh.plotting import figure, output_file, save
from bokeh.models import ColumnDataSource
import pandas as pd
from pandas import HDFStore
from bokeh.palettes import Spectral11
# imports data to dataframe from our storage hdf5 file
# our index column has no name, so this is assigned a name so it can be
# referenced to for plotting
store = pd.HDFStore('<file location>')
df = pd.DataFrame(store['d1'])
df = df.rename_axis('Time')
# remove unwanted columns
col_list = ['Column A', 'Column B']
df = df[col_list]
# make a list of our columns
col = list(df.columns)
# the number of columns is the number of lines that we will make
numlines = len(col)
# import colour palette (Spectral11 holds 11 colours, so at most 11 lines get a colour)
mypalette = Spectral11[0:numlines]
# make the figure,
p = figure(x_axis_type="datetime", title="<title>", width = 800, height = 450)
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = '<units>'
# loop through our columns and colours
for (columnnames, colore) in zip(col, mypalette):
    # legend_label is the keyword in Bokeh 1.4 and later; older releases used legend=
    p.line(df.index, df[columnnames], legend_label=columnnames, color=colore)
# creates an output file
output_file('<output location>')
#save the plot
save(p)

Related

Multiple consecutive bar plots with a time slider in Plotly, Python

I have a Pandas dataframe representing portfolio weights in multiple dates, such as the following contents in CSV format:
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
For each row in the Pandas dataframe resulting from this CSV, we can generate a bar chart with the portfolio composition at that day. I would like to have multiple bar charts, with a time slider, such that we can choose one of the dates and see the portfolio composition during that day.
Can this be achieved with Plotly?
I could not find a way to do it straight in the dataframe above, but it is possible to do it by "melting" the dataframe. The following code achieves what I was looking for, together with some beautification of the chart:
import pandas as pd
from io import StringIO
import plotly.express as px
string = """
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
"""
df = pd.read_csv(StringIO(string))
df = df.melt(id_vars=['DATE']).sort_values(by = 'DATE')
fig = px.bar(df, x="variable", y="value", animation_frame="DATE")
fig.update_layout(legend_title_text = None)
fig.update_xaxes(title = "Asset")
fig.update_yaxes(title = "Proportion")
fig.update_layout(autosize = True, height = 600)
fig.update_layout(hovermode="x")
fig.update_layout(plot_bgcolor="#F8F8F8")
fig.update_traces(
    hovertemplate='<i></i> %{y:.2%}'
)
fig.show()
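This produces a bar chart with a date slider underneath; moving the slider switches the chart to the portfolio composition on the selected date.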
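As a side note, the reshaping step can be checked on its own, without Plotly. The sketch below is only an illustration: it uses two rows and three of the asset columns from the CSV above, and the names `csv_text`, `wide` and `long` are made up for the example. It shows that `melt` turns each (date, asset) pair into its own row, which is the shape `px.bar` needs when `animation_frame` selects the rows for each frame.

```python
from io import StringIO

import pandas as pd

# Two rows and three assets taken from the CSV above, to keep the output small
csv_text = """DATE,ASSET1,ASSET2,ASSET4
2010-03-05,0.217195,0.0,0.032805
2010-04-06,0.139636,0.0,0.110364
"""
wide = pd.read_csv(StringIO(csv_text))

# One row per (DATE, asset) pair; melt names the new columns
# "variable" and "value" unless told otherwise
long = wide.melt(id_vars=['DATE']).sort_values(by='DATE')
print(long)
```

The default column names `variable` and `value` are the ones passed to `px.bar` as `x` and `y` in the code above.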

Is there a pandas way to automatically have nice names for columns to use in legends or similar graph items?

Is there any pandas way to "link" a dataframe column name with a nice description for that name?
See the following snippet, where I have a dataframe with two columns: the weight in kg and the height in meters of ten people.
When I create the dataframe I use this syntax
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
but, when creating the dataframe, I would like to "attach" a readable description to column name a and some LaTeX (for example $b_0$) to column name b, so that every graph item that automatically uses those names (legend, tick labels, axis labels and so on) looks good to the user.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'height_m':h,'weight_kg':w})
ax1 = df.plot.scatter(x='height_m',y='weight_kg')
plt.savefig('raw.png')
ax2 = df.plot.scatter(x='height_m',y='weight_kg')
ax2.set_xlabel('$h_0$, Altezza/m')
ax2.set_ylabel('$p_0$, Peso/kg')
plt.savefig('publishable.png')
plt.show()
This is the raw picture straight from pandas:
This is the picture I would like to get... but without modifying by myself the plot adding set_xlabel and set_ylabel and so on...
You can name your DataFrame correctly from the beginning and plot the dataframe accessing df.columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'$h_0$, Altezza/m':h,'$p_0$, Peso/kg':w})
df.plot.scatter(x=df.columns[0], y=df.columns[1])
plt.savefig('publishable.png')
plt.show()
Plus, if you are using Jupyter Notebook / Jupyter Lab, it will render the LaTeX correctly.
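If you prefer to keep short column names for calculations, one possible variation is to keep the display labels in a dictionary and rename only at plotting time. This is a sketch rather than a built-in pandas feature: the `LABELS` dictionary and the `pretty` variable are names invented for the example, and the plotting call is left as a comment so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from short column names to display labels
LABELS = {'height_m': '$h_0$, Altezza/m', 'weight_kg': '$p_0$, Peso/kg'}

rng = np.random.default_rng(0)
h = rng.normal(70 * 2.54 / 100, 4 * 2.54 / 100, 10)
w = rng.normal(25, 0.1, 10) * h ** 2
df = pd.DataFrame({'height_m': h, 'weight_kg': w})

# rename returns a new dataframe, so df keeps its short names
pretty = df.rename(columns=LABELS)

# With matplotlib available, the labelled copy can be plotted directly:
# pretty.plot.scatter(x=LABELS['height_m'], y=LABELS['weight_kg'])
```

The short names stay usable in code (`df['height_m']`), while every plot made from `pretty` picks up the long labels.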

Mark points in a scatter plot when unexpected values appear

I have a dataframe like this, and I want to make a scatter plot of it.
I want a scatter plot of Value1, but wherever Value2 drops below 0.6, the corresponding Value1 point should be marked in red; otherwise the default color is fine.
Any suggestions?
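Add another column with color information:
import matplotlib.cm as cm
# flag rows where Value2 is below 0.6 (1 = flagged, 0 = normal)
df['color'] = (df['Value2'] < 0.6).astype(int)
# plot.scatter needs a column name for x, so expose the index as a column
# (reset_index calls it 'index' when the index has no name)
df.reset_index().plot.scatter(x='index', y='Value1', c='color', cmap=cm.jet)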
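If you would rather pass colour names than colormap codes, the flag can be built with `numpy.where`. A minimal sketch, assuming a made-up dataframe (the real table is not shown here) and using 'tab:blue' as a stand-in for the default colour:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({'Value1': [10, 12, 9, 14, 11],
                   'Value2': [0.9, 0.5, 0.7, 0.3, 0.6]})

# 'red' where Value2 is strictly below 0.6, 'tab:blue' everywhere else
df['color'] = np.where(df['Value2'] < 0.6, 'red', 'tab:blue')
print(df)

# With matplotlib available, the colours can be handed to scatter as-is:
# import matplotlib.pyplot as plt
# plt.scatter(df.index, df['Value1'], c=df['color'])
```

Note that a value of exactly 0.6 is not flagged, because the comparison is strict.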
I use seaborn's lmplot (an advanced scatter plot tool) for that.
You can add a new column named "Category" to your spreadsheet file. It is easy to categorize values in Excel or OpenOffice
(something like: if cell_value < 0.6 then "low", if cell_value > 0.6 then "high").
So your test data should look like this:
Then you can import the data in Python (I use Anaconda 3.5 with Spyder and Python 3.6). I saved the file in .txt format, but any other format (.csv etc.) works too.
#Import libraries
import seaborn as sns
import pandas as pd
import numpy as np
import os
#Open data.txt which is stored in a repository
os.chdir(r'C:\Users\DarthVader\Desktop\Graph')
#Get data in a list splitting by semicolon
data = []
with open('data.txt') as f:
    for l in f:
        v = l.strip().split(';')
        data.append(v)
#Convert list as dataframe for plot purposes
df = pd.DataFrame(data, columns = ['ID', 'Value', 'Value2','Category'])
#pop out first row with header (copy so the assignment below does not hit a view)
df2 = df.iloc[1:].copy()
#Change variables to be plotted as numeric types
df2[['Value','Value2']] = df2[['Value','Value2']].apply(pd.to_numeric)
#Make plot with red color with values below 0.6 and green color with values above 0.6
sns.lmplot( x="Value", y="Value2", data=df2, fit_reg=False, hue='Category', legend=False, palette=dict(high="#2ecc71", low="#e74c3c"))
Your output should look like this.
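The spreadsheet step and the manual file parsing can probably both be done in pandas. The following is only a sketch under assumptions: the `raw` string stands in for a semicolon-separated data.txt with a header row, its values are invented, and the "low"/"high" labels follow the palette keys used above.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical file contents standing in for data.txt
raw = """ID;Value;Value2
1;10;0.9
2;12;0.5
3;9;0.7
"""

# read_csv handles the header row and the numeric conversion in one step
df = pd.read_csv(StringIO(raw), sep=';')

# Build the Category column in pandas instead of in the spreadsheet
df['Category'] = np.where(df['Value2'] < 0.6, 'low', 'high')
print(df)
```

With a real file, `StringIO(raw)` would be replaced by the file path, and the resulting dataframe can be passed to `sns.lmplot` with `hue='Category'` as before.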

Create a stacked graph or bar graph using plotly in python

I have data like this :
[ ('2018-04-09', '10:18:11',['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11',['s4',8],['s2',20],['s1',10]),
('2018-04-10', '10:30:11',['s4',10],['s5',6],['s6',3]) ]
I want to plot this data, preferably as a stacked graph.
The X-axis will be time;
it should look like this:
I created this image in Paint just to illustrate.
The X-axis will show time like a normal graph does (10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
Just to hard-code and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
    graph_data = []
    l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
    l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
    l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
    trace_temp = go.Bar(
        x='2018-04-09 10:18:11',
        y=l1[0],
        name = 'top',
    )
    graph_data.append(trace_temp)
    plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I am getting is: 'tuple' object is not callable.
Can someone please suggest some code for plotting such data?
If needed, I can store the data in some other form that makes plotting easier.
Edit :
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig=df.iplot(kind='bar',barmode='stack',asFigure=True)
plotly.offline.plot(fig,filename="stack1.html")
However, I ran into one problem:
1. When time intervals are very close, the data overlaps on the graph.
Is there a way to overcome this?
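On the "tuple object is not callable" error in the hard-coded attempt: it most likely comes from the missing comma between the second and third tuple in each list, because Python reads `('com.p2',2)('com.p3',3)` as a call on the first tuple. A small sketch of the effect and the fix follows; the variable names `pair`, `names` and `values` are made up for the illustration.

```python
pair = ('com.p2', 2)

# Writing ('com.p2', 2)('com.p3', 3) is the same as calling `pair` like a function
try:
    pair('com.p3', 3)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# With the comma restored, the list holds three separate tuples
l1 = [('com.p1', 1), ('com.p2', 2), ('com.p3', 3)]

# Bar traces take list-like x and y, so split the names and the values apart
names = [name for name, _ in l1]
values = [value for _, value in l1]
print(names, values)
```

Once the lists are valid, each name can become its own bar trace with a one-element `y` list, which is what stacking needs.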
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to build the table of category/value pairs that you have to generate anyway.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
Output: a stacked bar chart with one bar per day/time label and one coloured segment per category.
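To inspect the table that ends up being plotted, the reshaping can be run without matplotlib. The sketch below is a variation, not the code above: it flattens the records with a list comprehension instead of `melt`, and the names `records`, `long` and `table` are invented for the example.

```python
import pandas as pd

all_data = [('2018-04-09', '10:18:11', ['s1', 10], ['s2', 15], ['s3', 5]),
            ('2018-04-09', '10:20:11', ['s4', 8], ['s2', 20], ['s1', 10]),
            ('2018-04-10', '10:30:11', ['s4', 10], ['s5', 6], ['s6', 3])]

# Flatten every record into (day/time label, category, value) rows
records = [(f"{day}\n{time}", cat, val)
           for day, time, *pairs in all_data
           for cat, val in pairs]
long = pd.DataFrame(records, columns=['day/time', 'category', 'val'])

# One row per time stamp, one column per category; missing pairs become NaN
table = long.pivot(index='day/time', columns='category', values='val')
print(table)
```

Each row of `table` is one stacked bar, and each column is one coloured segment, so categories that do not occur at a given time simply contribute nothing to that bar.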

output table to left of horizontal bar chart with pandas

I am trying to get an output from a dataframe that shows a stacked horizontal bar chart with a table to the left of it. The relevant data is as follows:
import pandas as pd
import matplotlib.pyplot as plt
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([OutDict], columns = cols)
I'd like to get something similar to what's in the following: Python Matplotlib how to get table only. I can get the stacked bar chart:
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True)
Adding the keyword argument table=True puts a table below the chart. How do I get the axis to either display the df as a table or add one next to the chart? Also, the DataFrame will eventually have more than one row, but if I can get it to work for one row then I should be able to get it to work for n rows.
Thanks in advance.
Unfortunately, you won't be able to do this with the pandas plot method alone. The docs for the table parameter state:
If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
So you will have to use matplotlib directly to get this done. One option is to create 2 subplots; one for your table and one for your chart. Then you can add the table and modify it as you see fit.
import matplotlib.pyplot as plt
import pandas as pd
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([OutDict], columns = cols)
fig, (ax1, ax2) = plt.subplots(1, 2)
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True, ax=ax2)
ax1.table(cellText=df[['days_green','days_yellow','days_red']].values, colLabels=['days_green', 'days_yellow', 'days_red'], loc='center')
ax1.axis('off')
plt.show()
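For more than one row, the same two-axes layout should still apply. The following is a sketch under assumptions: the second row ('Errors') is made-up illustration data, the 1:2 width split and the bar colours are arbitrary choices, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

cols = ['metric', 'target', 'daily_avg', 'days_green', 'days_yellow', 'days_red']
# The second row is invented for illustration
rows = [['Volume', 338.65, 106.81, 63, 2, 1],
        ['Errors', 12.0, 9.4, 40, 20, 6]]
df = pd.DataFrame(rows, columns=cols)

info_cols = ['metric', 'target', 'daily_avg']
day_cols = ['days_green', 'days_yellow', 'days_red']

# Left axes holds the table, right axes (twice as wide) holds the bars
fig, (ax_table, ax_bars) = plt.subplots(
    1, 2, figsize=(10, 3), gridspec_kw={'width_ratios': [1, 2]})

# One stacked horizontal bar per metric
df.set_index('metric')[day_cols].plot.barh(
    stacked=True, ax=ax_bars, color=['green', 'gold', 'red'])

# Draw the descriptive columns as a table and hide the empty axes frame
table = ax_table.table(cellText=df[info_cols].values,
                       colLabels=info_cols, loc='center')
ax_table.axis('off')

# Render once; in real use call plt.show() or fig.savefig(...) instead
fig.canvas.draw()
```

The table rows and the bars are not aligned automatically; lining them up row by row would need further manual adjustment of the cell heights or the axes limits.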
