Using Streamlit and matplotlib to display a pandas dataframe bar plot

At the moment I have:
fig, ax = plt.subplots()
ax = df.plot.barh(stacked=True)
st.pyplot(fig)
The dataframe for reference if necessary looks like:
        A    B     C    D    E
Cat1  5.3  NaN   NaN  NaN  NaN
Cat2  NaN  NaN  12.1  NaN  NaN
Cat3  NaN  NaN   NaN  3.4  4.5
Cat4  NaN  2.8   NaN  NaN  NaN
If I remove fig from st.pyplot(fig), which forces the function to render the global figure, it produces a nice stacked bar plot, but with a deprecation warning.
So I know it's not a problem with matplotlib producing the plot from my dataframe, but actually with streamlit displaying the plot.
Basically, what matplotlib syntax do I need to get streamlit to produce this horizontal stacked bar plot?
Thanks in advance

You can remove fig from st.pyplot() and Streamlit will show your plot.
Or you can render the horizontal bar plot with Altair.
Streamlit's built-in chart builder (a wrapper around Altair) will also produce your plot, but not with horizontal bars.
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import streamlit as st

# Silence the warning about calling st.pyplot() without a figure
st.set_option('deprecation.showPyplotGlobalUse', False)
st.set_page_config(page_title="Stacked Bar", layout="wide")

df = pd.read_clipboard()

# Long format for Altair: one row per (category, column) pair
df1 = df.reset_index().melt(id_vars='index')
chart = alt.Chart(df1).mark_bar().encode(
    y=alt.Y("index:N", title=""),
    x="value:Q",
    color="variable:N",
).properties(height=300)

# pandas draws on a new figure of its own, which becomes the global figure
fig, ax = plt.subplots()
ax = df.plot.barh(stacked=True)

col1, col2, col3 = st.columns(3)
with col1:
    st.title('Matplotlib plot')
    st.pyplot()  # no argument: renders the global figure
with col2:
    st.title('Altair plot')
    st.altair_chart(chart, use_container_width=True)
with col3:
    st.title('Streamlit plot')
    st.bar_chart(df)
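
The snippet in the question shows an empty figure because df.plot.barh() was not told which axes to use, so pandas drew on a new figure instead of on fig. A minimal sketch of another way around this, assuming the dataframe from the question (rebuilt by hand here), is to pass ax=ax so the bars land on the figure that is later given to st.pyplot(fig). The Streamlit call is left as a comment so the snippet also runs outside an app.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a GUI
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dataframe from the question, rebuilt by hand for illustration
df = pd.DataFrame(
    {
        "A": [5.3, np.nan, np.nan, np.nan],
        "B": [np.nan, np.nan, np.nan, 2.8],
        "C": [np.nan, 12.1, np.nan, np.nan],
        "D": [np.nan, np.nan, 3.4, np.nan],
        "E": [np.nan, np.nan, 4.5, np.nan],
    },
    index=["Cat1", "Cat2", "Cat3", "Cat4"],
)

fig, ax = plt.subplots()
# ax=ax makes pandas draw on the axes that belong to fig
df.plot.barh(stacked=True, ax=ax)

# In a Streamlit app, pass the figure explicitly:
# st.pyplot(fig)
```

Passing the figure explicitly also avoids the global-figure deprecation warning mentioned in the question.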

Related

Python Seaborn Lineplot

I am new to Python and have a question regarding a lineplot.
I have a data set which I would like to display as a Seaborn lineplot.
In this dataset I have 3 categories which should be on the Y axis. I have no data for an X axis, but I want to use the index.
Unfortunately I did not get it right. I would like to use it like the Excel picture.
The columns are also of different lengths.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("Testdata.csv", delimiter= ";")
df
Double Single Triple
0 50.579652 24.498143 60.954680
1 53.313919 24.497490 60.494626
2 54.174343 24.490651 60.052566
3 56.622435 24.485605 59.622501
4 59.656155 26.201791 59.199581
... ... ... ...
410 NaN NaN 75.478118
411 NaN NaN 73.780804
412 NaN NaN 72.716096
413 NaN NaN 72.468472
414 NaN NaN 71.179819
How do I do that?
I appreciate your help.

First reset the index and melt your columns into long format, then use the hue parameter to draw one line per column:
fig, ax = plt.subplots(figsize=(10, 10))
sns.lineplot(
    data=df.reset_index().melt(id_vars='index'),
    x='index',
    y='value',
    hue='variable',
    ax=ax,
)
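
To see what the melt step produces, here is a small sketch with a made-up three-row frame (the values are illustrative, not the asker's data). Each original column becomes a block of rows labelled in a variable column, which is the layout that hue needs.

```python
import pandas as pd

# Tiny made-up frame with columns of different lengths (NaN pads the short ones)
df = pd.DataFrame({
    "Double": [50.6, 53.3, None],
    "Single": [24.5, None, None],
    "Triple": [61.0, 60.5, 60.1],
})

# reset_index() exposes the index as a column named "index";
# melt() then stacks the remaining columns into "variable"/"value" pairs
long_df = df.reset_index().melt(id_vars="index")
print(long_df)
```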

How to change spacing between two ticks in matplotlib chart?

I'm plotting some data that requires Day 0 to not be shown on the x-axis. The dataframe has no column for Day 0, but Matplotlib creates a space for it between day -1 and 1. I've looked through the documentation, but can't find a way to adjust spacing between only two ticks. The dataframe is:
group stat -1.0 1.0 2.0 3.0 4.0 5.0
abc mean 8.362999 17.043362 3.526539 22.931884 10.835121 6.035011
abc sem 1.481135 5.029173 0.822778 13.768812 2.149704 0.840965
abc std 3.311919 11.245573 1.839788 30.787999 4.806885 1.880455
Code to plot:
import matplotlib.pyplot as plt

df.set_index(['stat'], inplace=True)  # index by statistic so df.loc['mean'] works
df.drop(['group'], axis=1, inplace=True)
x = df.columns.values
y = df.loc['mean'].values
sem = df.loc['sem'].values
plt.errorbar(x, y, sem, color='#0075d9', marker='o', clip_on=False)
This is an example of the chart (please ignore the shading):
You can see that it has more space between -1 and 1 than the other ticks. Is there a way to 'drop' the Day 0 tick from the X-axis?
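
One possible approach, sketched here rather than taken from the thread, is to plot against evenly spaced positions instead of the day values, and then label those positions with the days. The numbers are the mean and sem rows from the question; the names days and positions are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a GUI
import matplotlib.pyplot as plt

days = [-1, 1, 2, 3, 4, 5]
mean = [8.362999, 17.043362, 3.526539, 22.931884, 10.835121, 6.035011]
sem = [1.481135, 5.029173, 0.822778, 13.768812, 2.149704, 0.840965]

# Evenly spaced positions 0..5, so no slot is reserved for Day 0
positions = list(range(len(days)))

fig, ax = plt.subplots()
ax.errorbar(positions, mean, yerr=sem, color="#0075d9", marker="o", clip_on=False)

# Label each position with the day it stands for
ax.set_xticks(positions)
ax.set_xticklabels([str(d) for d in days])
```

The trade-off is that the x-axis is no longer numeric in days, so the spacing only reflects order, not elapsed time.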

Matplotlib: Stacked area chart for all the groups

I am trying to create a stacked area chart for all the groups in my data on a shared timeline x-axis. My data looks like the following:
dataDate name prediction
2018-09-30 A 2.309968
2018-10-01 A 1.516652
2018-10-02 A 2.086062
2018-10-03 A 1.827490
2018-09-30 B 0.965861
2018-10-01 B 6.521989
2018-10-02 B 9.219777
2018-10-03 B 17.434451
2018-09-30 C 6.890485
2018-10-01 C 6.106187
2018-10-02 C 5.535563
2018-10-03 C 1.913100
I am trying to create something like the following:
The x-axis will be the time series. Please help me recreate it. Thanks

Say your data is stored in a dataframe named df. Then you can pivot the dataframe and plot it directly. Make sure your dates are actual dates, not strings.
df["dataDate"] = pd.to_datetime(df["dataDate"])
df.pivot(index="dataDate", columns="name", values="prediction").plot.area()
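
For reference, this is roughly what the pivoted frame looks like for the first two dates of the sample data: one column per name, indexed by date, which is the wide layout that plot.area() stacks. The variable name wide is only for illustration.

```python
import pandas as pd

# First two dates of the sample data from the question
df = pd.DataFrame({
    "dataDate": ["2018-09-30", "2018-10-01"] * 3,
    "name": ["A", "A", "B", "B", "C", "C"],
    "prediction": [2.309968, 1.516652, 0.965861, 6.521989, 6.890485, 6.106187],
})
df["dataDate"] = pd.to_datetime(df["dataDate"])

# One row per date, one column per name
wide = df.pivot(index="dataDate", columns="name", values="prediction")
print(wide)
```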

You can copy your data to the clipboard and try something like this:
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_clipboard()
fig, ax = plt.subplots()
# Draw one area per group on the shared axes.
# Each area starts at zero, so the groups overlap rather than stack.
for label, sub_df in df.set_index('dataDate').groupby('name'):
    sub_df['prediction'].plot.area(ax=ax, label=label)
plt.legend()

Pandas Seaborn Heatmap Error

I have a DataFrame that looks like this when unstacked.
Start Date 2016-07-11 2016-07-12 2016-07-13
Period
0 1.000000 1.000000 1.0
1 0.684211 0.738095 NaN
2 0.592105 NaN NaN
I'm trying to plot it in Seaborn as a heatmap but it's giving me unintended results.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.array(data), columns=['Start Date', 'Period', 'Users'])
df = df.fillna(0)
df = df.set_index(['Start Date', 'Period'])
sizes = df['Users'].groupby(level=0).first()
df = df['Users'].unstack(0).divide(sizes, axis=1)
plt.title("Test")
sns.heatmap(df.T, mask=df.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.savefig(table._v_name + "fig.png")
I want it so that text doesn't overlap and there aren't 6 heat legends on the side. Also if possible, how do I fix the date so that it only displays %Y-%m-%d?

While exact reproducible data is not available, consider the example below, which uses the posted snippet data. It runs pivot_table() to get the posted structure, with the start dates across the columns. Your heatmap probably draws multiple color bars and overlapping figures because of the unstack() step, where you seem to be dividing by users (look into seaborn.FacetGrid if you want to split the plot). The example below passes the pivoted frame to heatmap() as is, and an apply() call reformats the datetime values to the requested format.
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data = '''Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13'''
df = pd.read_csv(StringIO(data))
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['StartDate'] = df['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
pvtdf = df.pivot_table(values='Value', index=['Period'],
columns='StartDate', aggfunc=sum)
print(pvtdf)
# StartDate 2016-07-11 2016-07-12 2016-07-13
# Period
# 0 1.000000 1.000000 1.0
# 1 0.684211 0.738095 NaN
# 2 0.592105 NaN NaN
sns.set()
plt.title("Test")
ax = sns.heatmap(pvtdf.T, mask=pvtdf.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.show()

Remove interpolation Time series plot for missing values

I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I don't want the interpolation between missing values!
How can I achieve this?

Since you are using pandas you could do something like this:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
# float values so that NaN can be assigned below
vals = pd.Series(np.random.randint(1, 10, size=idx.size), index=idx).astype(float)
vals.iloc[4:8] = np.nan
print(vals)
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
clean = vals.dropna()
fig, ax = plt.subplots()
ax.plot(range(clean.size), clean)
# one tick per remaining point, labelled with its actual date
ax.set_xticks(list(range(clean.size)))
ax.set_xticklabels(clean.index.date.tolist())
fig.autofmt_xdate()
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that do not trigger matplotlib's internal date processing when you call .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.
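
A different technique from the one described above applies if the goal is to keep the real date axis and simply leave a gap. matplotlib does not draw a line segment to or from a NaN point, so a series that holds explicit NaN values for the missing dates is plotted with a break. A minimal sketch, assuming the missing days are absent from the index and the data is daily (the values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a GUI
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily series in which four days are missing from the index
idx = pd.to_datetime([
    "2016-03-29", "2016-03-30", "2016-03-31", "2016-04-01",
    "2016-04-06", "2016-04-07",
])
vals = pd.Series([4.0, 7.0, 6.0, 5.0, 9.0, 1.0], index=idx)

# asfreq("D") reindexes to every calendar day; the added days hold NaN
filled = vals.asfreq("D")

fig, ax = plt.subplots()
# The line breaks wherever the values are NaN
ax.plot(filled.index, filled.values, "g", linewidth=2.0, label="Data")
fig.autofmt_xdate()
```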
