I am trying to get an output from a dataframe that shows a stacked horizontal bar chart with a table to the left of it. The relevant data is as follows:
import pandas as pd
import matplotlib.pyplot as plt
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([OutDict], columns=cols)
I'd like to get something similar to what's in the following: Python Matplotlib how to get table only. I can get the stacked bar chart:
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True)
Adding the keyword argument table=True puts a table below the chart. How do I get the axis to either display the df as a table or add one next to the chart? The DataFrame will eventually have more than one row, but if I can get it to work for one row, I should be able to get it to work for n rows.
Thanks in advance.
Unfortunately, you won't be able to do this with the pandas plot method alone. The docs for the table parameter state:
If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
So you will have to use matplotlib directly to get this done. One option is to create 2 subplots; one for your table and one for your chart. Then you can add the table and modify it as you see fit.
import matplotlib.pyplot as plt
import pandas as pd
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the one-row frame directly
df = pd.DataFrame([OutDict], columns=cols)
fig, (ax1, ax2) = plt.subplots(1, 2)
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True, ax=ax2)
ax1.table(cellText=df[['days_green','days_yellow','days_red']].values, colLabels=['days_green', 'days_yellow', 'days_red'], loc='center')
ax1.axis('off')
fig.show()
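Since the DataFrame will eventually have more than one row, here is a minimal sketch of the same two-subplot idea wrapped in a function. The function name table_with_bars, the second data row and the width_ratios values are assumptions made up for illustration, and the table rows are not guaranteed to line up exactly with the bars.

```python
import matplotlib.pyplot as plt
import pandas as pd


def table_with_bars(df, table_cols, bar_cols):
    """Draw table_cols as a table (left) and bar_cols as stacked bars (right)."""
    # give the chart twice the width of the table (an arbitrary choice)
    fig, (ax_table, ax_bars) = plt.subplots(
        1, 2, figsize=(10, 3), gridspec_kw={"width_ratios": [1, 2]}
    )
    df[bar_cols].plot.barh(stacked=True, ax=ax_bars)
    # barh draws the first row at the bottom; flip it to match the table order
    ax_bars.invert_yaxis()
    table = ax_table.table(
        cellText=df[table_cols].values,
        colLabels=table_cols,
        loc="center",
    )
    ax_table.axis("off")
    return fig, table


# the first row is the data from the question; the second row is made up
df = pd.DataFrame(
    [["Volume", 338.65, 106.81, 63, 2, 1],
     ["Errors", 10.0, 12.5, 40, 20, 6]],
    columns=["metric", "target", "daily_avg",
             "days_green", "days_yellow", "days_red"],
)
fig, table = table_with_bars(
    df,
    table_cols=["metric", "target", "daily_avg"],
    bar_cols=["days_green", "days_yellow", "days_red"],
)
```

Call plt.show() or fig.savefig(...) afterwards to see the result.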
I have a Pandas dataframe representing portfolio weights in multiple dates, such as the following contents in CSV format:
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
For each row in the Pandas dataframe resulting from this CSV, we can generate a bar chart with the portfolio composition at that day. I would like to have multiple bar charts, with a time slider, such that we can choose one of the dates and see the portfolio composition during that day.
Can this be achieved with Plotly?
I could not find a way to do it straight in the dataframe above, but it is possible to do it by "melting" the dataframe. The following code achieves what I was looking for, together with some beautification of the chart:
import pandas as pd
from io import StringIO
import plotly.express as px
string = """
DATE,ASSET1,ASSET2,ASSET3,ASSET4,ASSET5,ASSET6,ASSET7
2010-01-04,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-02-03,0.250000,0.0,0.250000,0.000000,0.25,0.000000,0.250000
2010-03-05,0.217195,0.0,0.250000,0.032805,0.25,0.000000,0.250000
2010-04-06,0.139636,0.0,0.250000,0.110364,0.25,0.000000,0.250000
2010-05-05,0.179569,0.0,0.218951,0.101480,0.25,0.000000,0.250000
2010-06-04,0.207270,0.0,0.211974,0.080756,0.25,0.000000,0.250000
2010-07-06,0.132468,0.0,0.250000,0.117532,0.25,0.000000,0.250000
2010-08-04,0.116353,0.0,0.250000,0.133647,0.25,0.000000,0.250000
2010-09-02,0.081677,0.0,0.250000,0.168323,0.25,0.000000,0.250000
2010-10-04,0.000000,0.0,0.250000,0.250000,0.25,0.009955,0.240045
"""
df = pd.read_csv(StringIO(string))
df = df.melt(id_vars=['DATE']).sort_values(by = 'DATE')
fig = px.bar(df, x="variable", y="value", animation_frame="DATE")
fig.update_layout(legend_title_text = None)
fig.update_xaxes(title = "Asset")
fig.update_yaxes(title = "Proportion")
fig.update_layout(autosize = True, height = 600)
fig.update_layout(hovermode="x")
fig.update_layout(plot_bgcolor="#F8F8F8")
fig.update_traces(
hovertemplate=
'<i></i> %{y:.2%}'
)
fig.show()
This produces the following:
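As a side note on the melt step used in the code above: it turns the wide table (one column per asset) into a long table with one row per date and asset, which is the shape px.bar needs for animation_frame. A minimal sketch on a cut-down copy of the data (only the first two dates and two assets are kept here for brevity):

```python
import pandas as pd

# first two dates and first two assets from the CSV above
wide = pd.DataFrame(
    {
        "DATE": ["2010-01-04", "2010-02-03"],
        "ASSET1": [0.25, 0.25],
        "ASSET2": [0.0, 0.0],
    }
)

# one row per (DATE, asset) pair; the asset name ends up in "variable"
# and the weight in "value"
long = wide.melt(id_vars=["DATE"]).sort_values(by="DATE")
print(long)
```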
I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values contained in the Observations column and colored with the Result column.
How can I do this?
I tried sns.barplot, but it combined the values of the Observations column into a single bar per period (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
y='Observations',
color=cmap[result_codes],
legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
If you would like the bars grouped by month and then stacked by result, use the following. Note that I changed your data so that one month has more than one status; I am not sure I understood your question completely.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
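To see what actually gets plotted, it can help to look at the intermediate table that groupby and unstack produce. A small sketch using the same data as above; the variable name table is my own:

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Under evaluation"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=["Period", "Observations", "Result"])

# rows are periods, columns are results, cells are summed observations;
# combinations that never occur show up as NaN and are simply not drawn
table = df.groupby(["Period", "Result"])["Observations"].sum().unstack("Result")
print(table)
```

Each row of this table becomes one stacked bar, and each column becomes one colour in the legend.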
I have a pandas dataframe:
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df1 = pd.DataFrame(data1)
and I want to plot a bar plot like this:
with
df1.plot(kind='barh',x='Date',y='Total', ax=ax0, color='#C0C0C0',
width=0.5)
df1.plot(kind='barh',x='Date',y='Arrived', ax=ax0, color='#C0FFFF',
width=0.5)
df1.plot(kind='barh',x='Date',y='Solved', ax=ax0, color='#C0C0FF',
width=0.5)
However, to avoid overlapping, I have to draw each column according to which has the bigger value (Total is greater than Arrived, which is greater than Solved).
How can I avoid doing this by hand and automate the process?
There must be a simpler, more direct approach in pandas, but I came up with this quick workaround. The idea is as follows:
Leave out the first column, Date, and sort the remaining columns by value.
Use the sorted indices to plot the columns in descending order, so the largest bar is drawn first and the smaller ones are drawn on top of it.
To keep the colors consistent, use a dictionary so that the plotting order doesn't affect which color each column gets.
import numpy as np
import matplotlib.pyplot as plt
fig, ax0 = plt.subplots()
# indices of the value columns, largest value first
ids = np.argsort(df1.values[0][1:])[::-1]
colors = {'Total': '#C0C0C0', 'Arrived': '#C0FFFF', 'Solved':'#C0C0FF'}
for col in np.array(df1.columns[1:].tolist())[ids]:
    df1.plot(kind='barh',x='Date',y=col, ax=ax0, color=colors[col], width=0.1)
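The same ordering can also be worked out with pandas alone. This is only a sketch of that one step, assuming a single-row DataFrame like the one in the question; the variable name order is my own:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Date": ["03-19-2019"], "Total": [35], "Solved": [19], "Arrived": [23]}
)

# take the first row without the Date column and sort it, largest value first
order = (
    df1.drop(columns="Date").iloc[0].sort_values(ascending=False).index.tolist()
)
print(order)
```

Looping over order instead of the NumPy-indexed column array draws the columns from largest to smallest in the same way.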
A stacked bar graph can be produced in pandas via the stacked=True option. To use this you need to make the "Date" the index first.
import matplotlib.pyplot as plt
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df = pd.DataFrame(data1)
df.set_index("Date").plot(kind="barh", stacked=True)
plt.show()
I have data like this:
[ ('2018-04-09', '10:18:11',['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11',['s4',8],['s2',20],['s1',10]),
('2018-04-10', '10:30:11',['s4',10],['s5',6],['s6',3]) ]
I want to plot this data, preferably as a stacked bar graph.
The x-axis will be time, and it should look like this:
I created this image in Paint just to illustrate.
The x-axis will show time as a normal graph does (10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
Just to hard-code something and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
    graph_data = []
    l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
    l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
    l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
    trace_temp = go.Bar(
        x='2018-04-09 10:18:11',
        y=l1[0],
        name = 'top',
    )
    graph_data.append(trace_temp)
    plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I am getting is: TypeError: 'tuple' object is not callable.
Can someone please suggest some code for plotting such data?
If needed, I can store the data in some other form that is easier to plot.
Edit:
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig=df.iplot(kind='bar',barmode='stack',asFigure=True)
plotly.offline.plot(fig,filename="stack1.html")
However, I ran into one problem: when the time intervals are very close, the data overlaps on the graph.
Is there a way to overcome this?
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to build the table of category/value pairs that you have to generate anyway.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
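Regarding the error message in the question: it comes from the missing comma between the second and third tuple in each list. Without the comma, Python reads ('com.p2',2)('com.p3',3) as a call on the first tuple. A minimal sketch reproducing and fixing it (the variable name error_message is my own):

```python
# missing comma between the second and third tuple, as in the question
try:
    l1 = [('com.p1', 1), ('com.p2', 2)('com.p3', 3)]
except TypeError as exc:
    error_message = str(exc)

# with the comma in place the list is built as intended
l1 = [('com.p1', 1), ('com.p2', 2), ('com.p3', 3)]
print(error_message)
```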
I have data in a Pandas dataframe that I am trying to plot to a time series line graph.
When plotting one single line, I have been able to do this quite successfully using the p.line function, ensuring I make the x_axis_type 'datetime'.
To plot multiple lines, I have tried using p.multi_line, which worked well but I also need a legend and, according to this post, it's not possible to add a legend to a multiline: Bokeh how to add legend to figure created by multi_line method?
Leo's answer to the question in the link above looks promising, but I can't seem to work out how to apply this when the data is sourced from a dataframe.
Does anyone have any tips?
OK, this seems to work:
from bokeh.plotting import figure, output_file, save
from bokeh.models import ColumnDataSource
import pandas as pd
from pandas import HDFStore
from bokeh.palettes import Spectral11
# imports data to dataframe from our storage hdf5 file
# our index column has no name, so this is assigned a name so it can be
# referenced to for plotting
store = pd.HDFStore('<file location>')
df = pd.DataFrame(store['d1'])
df = df.rename_axis('Time')
# remove unwanted columns
col_list = ['Column A', 'Column B']
df = df[col_list]
# the number of columns is the number of lines that we will make
numlines = len(df.columns)
# pick one palette colour per line
mypalette = Spectral11[0:numlines]
# make a list of our columns
col = list(df.columns)
# make the figure,
p = figure(x_axis_type="datetime", title="<title>", width = 800, height = 450)
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = '<units>'
# loop through our columns and colours
# (legend_label replaces the older legend keyword, which newer Bokeh versions no longer accept)
for (columnname, colour) in zip(col, mypalette):
    p.line(df.index, df[columnname], legend_label=columnname, color=colour)
# creates an output file
output_file('<output location>')
#save the plot
save(p)
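For anyone without the HDF5 file, here is a self-contained sketch of the same loop that goes through a ColumnDataSource built from the DataFrame. The data is made up, and legend_label assumes Bokeh 1.4 or newer.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral11
from bokeh.plotting import figure

# made-up data standing in for the contents of the HDF5 store
df = pd.DataFrame(
    {"Column A": [1.0, 2.0, 3.0], "Column B": [3.0, 1.0, 2.0]},
    index=pd.date_range("2018-01-01", periods=3, freq="D"),
).rename_axis("Time")

# the named index becomes a "Time" column in the data source
source = ColumnDataSource(df)
p = figure(x_axis_type="datetime", title="Lines from a DataFrame")

# one line per column, each with its own colour and legend entry
for name, colour in zip(df.columns, Spectral11):
    p.line(x="Time", y=name, source=source, legend_label=name, color=colour)
```

Passing p to show() or save() from bokeh.plotting then renders the figure as in the code above.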