I am new to Python and have a question regarding a lineplot.
I have a data set which I would like to display as a Seaborn lineplot.
In this dataset I have 3 categories whose values should be on the Y axis. I have no data for the X axis, so I want to use the index instead.
Unfortunately I could not get it right. I would like the plot to look like the Excel picture.
The columns are also of different lengths.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("Testdata.csv", delimiter= ";")
df
Double Single Triple
0 50.579652 24.498143 60.954680
1 53.313919 24.497490 60.494626
2 54.174343 24.490651 60.052566
3 56.622435 24.485605 59.622501
4 59.656155 26.201791 59.199581
... ... ... ...
410 NaN NaN 75.478118
411 NaN NaN 73.780804
412 NaN NaN 72.716096
413 NaN NaN 72.468472
414 NaN NaN 71.179819
How do I do that?
I appreciate your help.
First reset the index so it becomes a column, melt the value columns into long format, and then use the hue parameter to draw one line per column:
fig, ax = plt.subplots(figsize=(10, 10))
sns.lineplot(
    data=df.reset_index().melt(id_vars='index'),  # columns: index, variable, value
    x='index',
    y='value',
    hue='variable',
    ax=ax,
)
At the moment I have:
fig, ax = plt.subplots()
ax = df.plot.barh(stacked=True)
st.pyplot(fig)
The dataframe for reference if necessary looks like:
A B C D E
Cat1 5.3 NaN NaN NaN NaN
Cat2 NaN NaN 12.1 NaN NaN
Cat3 NaN NaN NaN 3.4 4.5
Cat4 NaN 2.8 NaN NaN NaN
where if I get rid of the fig in st.pyplot(fig), forcing the function to render the global figure - it produces a nice stacked bar plot, but with the deprecation warning.
So I know it's not a problem with matplotlib producing the plot from my dataframe, but actually with streamlit displaying the plot.
Basically, what matplotlib syntax do I need to get streamlit to produce this horizontal stacked bar plot?
Thanks in advance
You can remove fig from st.pyplot() and streamlit will show your plot.
Or you can render the horizontal bar plot with altair.
Streamlit's built-in chart function (st.bar_chart, which is a wrapper around altair) will also produce your plot, but not with horizontal bars.
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import streamlit as st
st.set_option('deprecation.showPyplotGlobalUse', False)
st.set_page_config(page_title="Stacked Bar",layout="wide")
df=pd.read_clipboard()
df1=df.reset_index().melt(id_vars='index')
chart = alt.Chart(df1).mark_bar().encode(
    y=alt.Y("index:N", title=""),
    x="value:Q",
    color="variable:N",
).properties(height=300)
fig, ax = plt.subplots()
ax = df.plot.barh(stacked=True)
col1, col2, col3 = st.columns(3)
with col1:
    st.title('Matplotlib plot')
    st.pyplot()
with col2:
    st.title('Altair plot')
    st.altair_chart(chart, use_container_width=True)
with col3:
    st.title('Streamlit plot')
    st.bar_chart(df)
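Another option, sketched here as an assumption rather than as part of the answer above, is to keep the explicit figure and tell pandas to draw on its axes. df.plot.barh() creates a new figure of its own unless it is given ax=ax, which would explain why the fig from plt.subplots() stays empty in the question. The helper name build_stacked_barh is hypothetical; the data is the frame from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def build_stacked_barh(df):
    """Draw a horizontal stacked bar chart on an explicit figure and return it."""
    fig, ax = plt.subplots()
    # ax=ax makes pandas draw on our axes instead of opening a new figure
    df.plot.barh(stacked=True, ax=ax)
    return fig


df = pd.DataFrame(
    {
        "A": [5.3, np.nan, np.nan, np.nan],
        "B": [np.nan, np.nan, np.nan, 2.8],
        "C": [np.nan, 12.1, np.nan, np.nan],
        "D": [np.nan, np.nan, 3.4, np.nan],
        "E": [np.nan, np.nan, 4.5, np.nan],
    },
    index=["Cat1", "Cat2", "Cat3", "Cat4"],
)
fig = build_stacked_barh(df)
```

In a Streamlit script the returned figure would then be passed explicitly, as in st.pyplot(fig), which avoids relying on the global figure.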
I'm learning Python pandas and playing with some example data. I have a CSV file of a dataset with net worth by percentile of the US population, by quarter.
I've successfully subset the data by percentile to create three plots of net worth by year, one plot for each of three population sections. However, I'm trying to combine those three subsets into one data frame so I can draw all the lines on a single figure.
Data here:
https://www.federalreserve.gov/releases/z1/dataviz/download/dfa-income-levels.csv
Code thus far:
import pandas as pd
import matplotlib.pyplot as plt
# importing numpy as np
import numpy as np
df = pd.read_csv("dfa-income-levels.csv")
df99th = df.loc[df['Category']=="pct99to100"]
df99th.plot(x='Date',y='Net worth', title='Net worth by percentile')
dfmid = df.loc[df['Category']=="pct40to60"]
dfmid.plot(x='Date',y='Net worth')
dflow = df.loc[df['Category']=="pct00to20"]
dflow.plot(x='Date',y='Net worth')
data = dflow['Net worth'], dfmid['Net worth'], df99th['Net worth']
headers = ['low', 'mid', '99th']
newdf = pd.concat(data, axis=1, keys=headers)
And that yields a dataframe shown below, which is not what I want for plotting the data.
low mid 99th
0 NaN NaN 3514469.0
3 NaN 2503918.0 NaN
5 585550.0 NaN NaN
6 NaN NaN 3602196.0
9 NaN 2518238.0 NaN
... ... ... ...
747 NaN 8610343.0 NaN
749 3486198.0 NaN NaN
750 NaN NaN 32011671.0
753 NaN 8952933.0 NaN
755 3540306.0 NaN NaN
Any recommendations for other ways to approach this?
# filter your dataframe to only the categories you're interested in
filtered_df = df[df['Category'].isin(['pct99to100', 'pct00to20', 'pct40to60'])]
filtered_df = filtered_df[['Date', 'Category', 'Net worth']]
fig, ax = plt.subplots()  # ax is an Axes object; several lines can share it
# draw one line per category on the same axes, with Date on the x axis
for category, group in filtered_df.groupby('Category'):
    group.plot(x='Date', y='Net worth', ax=ax, label=category)
I don't see the categories mentioned in your code in the csv file you shared. To combine dataframes column-wise, you can use pd.concat with axis=1. It aligns rows by index label, which is why your filtered subsets (each keeping its own original row numbers) end up full of NaN. So first set the Date column as the index, then concat, and then bring Date back as a dataframe column.
To set the Date column as the index of each dataframe: df1 = df1.set_index('Date') and df2 = df2.set_index('Date')
Concat the dataframes df1 and df2 using df_merge = pd.concat([df1,df2],axis=1) or df_merge = pd.merge(df1,df2,on='Date')
Bring Date back as a column with df_merge = df_merge.reset_index()
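The steps above can be sketched as follows. The tiny frame is made up for illustration (the Date labels and net worth values are not from the real file), and the names wanted and pieces are hypothetical; only the column names and category codes are taken from the question.

```python
import pandas as pd

# Hypothetical long-format data in the same layout as the question's CSV
df = pd.DataFrame({
    "Date": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "Category": ["pct00to20", "pct40to60", "pct99to100"] * 2,
    "Net worth": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
})

# New column header -> category code to pull out
wanted = {"low": "pct00to20", "mid": "pct40to60", "99th": "pct99to100"}

# One Series per category, indexed by Date so that concat aligns on dates
pieces = [
    df.loc[df["Category"] == code].set_index("Date")["Net worth"].rename(name)
    for name, code in wanted.items()
]

# Align on the shared Date index, then turn Date back into a column
newdf = pd.concat(pieces, axis=1).reset_index()
print(newdf)
```

Because every piece shares the Date index, each row of newdf holds all three categories for one date, and newdf.plot(x='Date') would draw them as three lines on one figure.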
Using this code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.options.display.float_format = '{:.2f}'.format
a = pd.read_csv(r'C:\Users\Leonardo\Desktop\TRABALHO\dadosboias\MARINHA_TRATADO\Cabo Frio\boia_1\cabofrio.csv', na_values=['-9999.0'])
a.index = pd.to_datetime(a[['Year', 'Month', 'Day', 'Hour', 'Minute']])
pd.options.mode.chained_assignment = None
The output is something like this:
index wspd wdir gust hs
2009-06-24 15:21:00 1.4669884357700003 9.0 2.03121475722 nan
2009-06-24 16:21:00 1.4669884357700003 34.0 2.03121475722 nan
2009-06-24 17:21:00 0.677071585741 127.0 1.35414317148 nan
2009-06-24 18:21:00 0.22569052858000002 146.0 0.902762114322 nan
... ... ... ...
2013-02-10 17:21:00 nan nan nan nan
And doing a simple plotting with plt.plot(a.hs, 'r.') the output is this:
As can be seen, the dataframe has a lot of missing data in the "hs" column. The main objective is to plot just the periods with data. In the image you can see that 2012-03 to 2013-03 has a lot of good "hs" data, so the objective is to plot this period and get something like this:
I would be thankful if someone could help.
You can just select the relevant range, e.g.
a.loc['2012-03-01':'2013-03-01', 'hs'].plot()
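If the date range is not known in advance, one possible variation (an assumption, not part of the answer above) is to let pandas report where valid data starts and ends. The series below is synthetic, and the names hs, start, end and window are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the "hs" column: NaN at both ends and one gap inside
idx = pd.date_range("2012-01-01", periods=10, freq="D")
hs = pd.Series(
    [np.nan, np.nan, 1.2, 1.4, np.nan, 1.1, 0.9, np.nan, np.nan, np.nan],
    index=idx,
    name="hs",
)

# Labels of the first and last non-NaN entries
start, end = hs.first_valid_index(), hs.last_valid_index()

# Label-based slicing on a DatetimeIndex includes both endpoints
window = hs.loc[start:end]
print(window)
```

Calling window.plot() would then show only the span that contains data; gaps inside that span remain NaN.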
I have a DataFrame that looks like this when unstacked.
Start Date 2016-07-11 2016-07-12 2016-07-13
Period
0 1.000000 1.000000 1.0
1 0.684211 0.738095 NaN
2 0.592105 NaN NaN
I'm trying to plot it in Seaborn as a heatmap but it's giving me unintended results.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.array(data), columns=['Start Date', 'Period', 'Users'])
df = df.fillna(0)
df = df.set_index(['Start Date', 'Period'])
sizes = df['Users'].groupby(level=0).first()
df = df['Users'].unstack(0).divide(sizes, axis=1)
plt.title("Test")
sns.heatmap(df.T, mask=df.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.savefig(table._v_name + "fig.png")
I want it so that text doesn't overlap and there aren't 6 heat legends on the side. Also if possible, how do I fix the date so that it only displays %Y-%m-%d?
Since exact reproducible data is not available, the example below uses the posted snippet data. It runs pivot_table() to get the structure as posted, with the start dates across the columns. Your heatmap possibly draws multiple color bars and overlapping figures because of the unstack() processing, where you seem to be dividing by users (look into seaborn.FacetGrid if you want to split the plot). The example therefore passes the pivoted df as is to heatmap. The dates are also re-formatted to strings in the requested %Y-%m-%d form.
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data = '''Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13'''
df = pd.read_csv(StringIO(data))
df['StartDate'] = pd.to_datetime(df['StartDate'])
# format the dates as plain YYYY-MM-DD strings so the tick labels carry no time part
df['StartDate'] = df['StartDate'].dt.strftime('%Y-%m-%d')
pvtdf = df.pivot_table(values='Value', index=['Period'],
                       columns='StartDate', aggfunc='sum')
print(pvtdf)
# StartDate 2016-07-11 2016-07-12 2016-07-13
# Period
# 0 1.000000 1.000000 1.0
# 1 0.684211 0.738095 NaN
# 2 0.592105 NaN NaN
sns.set()
plt.title("Test")
ax = sns.heatmap(pvtdf.T, mask=pvtdf.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.show()
I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I don't want the interpolation between missing values!
How can I achieve this?
Since you are using pandas you could do something like this:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
# float dtype so that NaN can be assigned below
vals = pd.Series(np.random.randint(1, 10, size=idx.size), index=idx, dtype=float)
vals.iloc[4:8] = np.nan
print(vals)
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
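clean = vals.dropna()
fig, ax = plt.subplots()
ax.plot(range(clean.size), clean)
# put one tick at every plotted point so each label lines up with its value
ax.set_xticks(range(clean.size))
ax.set_xticklabels(clean.index.date.tolist())
fig.autofmt_xdate()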
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that do not trigger matplotlib's internal date processing when you call .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.
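If the goal is instead to keep the real date axis and just leave a visible gap, a different approach from the one above is to make sure the missing dates are present as NaN, because matplotlib does not connect points across NaN values. The sketch below assumes the rows for the missing dates are absent from the data, which may or may not match the question; the names sparse and regular are hypothetical and the values are the sample column shown above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2016-03-29", periods=10, freq="D")
vals = pd.Series([4, 7, 6, 5, np.nan, np.nan, np.nan, np.nan, 9, 1],
                 index=idx, dtype=float)

# Assumed situation: the rows for the missing dates are not in the data at all
sparse = vals.dropna()

# Reindex to a regular daily frequency; the missing days come back as NaN
regular = sparse.asfreq("D")

fig, ax = plt.subplots()
# NaN values break the line, so no segment is drawn across the gap
(line,) = ax.plot(regular.index, regular.values, "g", linewidth=2.0, label="Data")
fig.autofmt_xdate()
fig.savefig("gap.png")
```

The choice between the two depends on whether the empty period should take up space on the x axis.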