Why are bars missing in my stacked bar chart -- Python w/matplotlib

I am trying to create a stacked bar chart from time series data. If I plot the data as a time series (using lines), everything works and I get a (messy) graph with the correct dates. If I instead plot it as a stacked bar chart, the dates disappear and none of the bars appear.
I have tried changing the indexing, height, and width of the bars, with no luck.
Here is my code:
import pylab
import pandas as pd
import matplotlib.pyplot as plt
df1= pd.read_excel('pathway/filename.xls')
df1.set_index('TIME', inplace=True)
ax = df1.plot(kind="Bar", stacked=True)
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
df1.sum(axis=1).plot( ax=ax, color="k", title='Historical Decomposition -- 1 year -- One-Quarter Revision')
plt.axhline(y=0, color='r', linestyle='-')
plt.show()
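If I change
ax = df1.plot(kind="Bar", stacked=True)
to
ax = df1.plot(kind="line", stacked=False)
I get the line chart with the correct dates (image not preserved).
If I instead use ax = df1.plot(kind="Bar", stacked=True), I get a chart with no dates and no bars (image not preserved).
Any thoughts here?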

Without knowing what the data looks like, I'd try something like this:
#Import data here and generate DataFrame
print(df.head(5))
A B C D
DATE
2020-01-01 -0.01 0.06 0.40 0.45
2020-01-02 -0.02 0.05 0.39 0.42
2020-01-03 -0.03 0.04 0.38 0.39
2020-01-04 -0.04 0.03 0.37 0.36
2020-01-05 -0.05 0.02 0.36 0.33
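f, ax = plt.subplots()
# label each artist so that ax.legend() below has entries to show
ax.bar(df.index, df['A'], label='A')
ax.bar(df.index, df['B'], label='B')
ax.bar(df.index, df['C'], bottom=df['B'], label='C')
ax.plot(df.index, df['D'], color='black', linewidth=2, label='D')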
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.legend()
plt.show()
Edit: OK, I found a way by looking at this post:
Plot Pandas DataFrame as Bar and Line on the same one chart
Try resetting the index so that it is a separate column. In my example, it is called 'DATE'. Then try:
ax = df[['DATE','D']].plot(x='DATE',color='black')
df[['DATE','A','B','C']].plot(x='DATE', kind='bar',stacked=True,ax=ax)
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.legend()
plt.show()
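As a rough sketch of the same idea in plain matplotlib (not the asker's actual data: the values below are made up to resemble the sample above), the bars can be placed at integer positions and the dates attached afterwards as tick labels. Positive and negative columns get separate running bottoms so that they stack away from zero in both directions. The names x, pos_bottom and neg_bottom are just illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data shaped like the sample printed above (assumption, not the real file)
dates = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {
        "A": [-0.01, -0.02, -0.03, -0.04, -0.05],
        "B": [0.06, 0.05, 0.04, 0.03, 0.02],
        "C": [0.40, 0.39, 0.38, 0.37, 0.36],
    },
    index=dates,
)

x = np.arange(len(df))            # one integer position per row
pos_bottom = np.zeros(len(df))    # running top of the positive stack
neg_bottom = np.zeros(len(df))    # running bottom of the negative stack

fig, ax = plt.subplots()
for col in df.columns:
    vals = df[col].to_numpy()
    # positive values stack upwards, negative values stack downwards
    bottom = np.where(vals >= 0, pos_bottom, neg_bottom)
    ax.bar(x, vals, bottom=bottom, label=col)
    pos_bottom += np.clip(vals, 0, None)
    neg_bottom += np.clip(vals, None, 0)

# total line drawn against the same integer positions as the bars
ax.plot(x, df.sum(axis=1).to_numpy(), color="k", label="Total")
ax.axhline(y=0, color="r", linestyle="-")

# show the dates as tick labels instead of hiding the ticks
ax.set_xticks(x)
ax.set_xticklabels(df.index.strftime("%Y-%m-%d"), rotation=45, ha="right")
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
ax.legend()
fig.tight_layout()
```

Because the line and the bars share the same integer x positions, they line up; call plt.show() or fig.savefig(...) to look at the result.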

Related

Matplotlib line and bar in the same chart

I have a pandas dataframe with this simple structure:
Month      Energy  Percentage
Jan        10      0.5
Feb        11      0.6
March      13      0.71
April      15      0.73
May        18      0.81
June       20      0.85
July       24      0.91
August     28      0.93
September  24      0.81
November   17      0.71
December   15      0.6
I want to plot the energy as bars and the percentage as a line in the same chart, with two y-axes: one for Energy and one for Percentage. The final result should look like the picture below:
I would also like the x-axis to be fixed and show all the months, even if a month does not have values yet.
Hope you can help me.
Thanks!
The code below solves your problem. Using the twinx() function, you can create a second y-axis on the same plot. I copied the data from your question into an Excel file named Book1.xlsx.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("Book1.xlsx")
df['Percentage'] = df['Percentage'] * 100
# create figure and axis objects with subplots()
# (a separate plt.figure() call here would only open a second, empty figure)
fig, ax = plt.subplots()
# make a plot
ax.plot(df.Month, df.Percentage,
marker="o",
color="red")
# set x-axis label
ax.set_xlabel("Months", fontsize = 14)
# set y-axis label
ax.set_ylabel("Percentage",
color="red",
fontsize=14)
# twin object for two different y-axes on the same plot
ax2 = ax.twinx()
# draw the bars on the second axis so that Energy gets its own y-scale
ax2.bar(df.Month, df.Energy, color="blue", alpha=0.4)
ax2.set_ylabel("Energy", color="blue", fontsize=14)
plt.setp(ax.get_xticklabels(), rotation=40, horizontalalignment='right')
plt.show()
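The question also asks for a fixed x-axis showing every month, which the code above does not cover. One possible approach, sketched here under the assumption that the month names are spelled as in the question's table, is to reindex the data against a full list of months so that missing ones (October in the sample) become NaN and simply leave a gap. The names months and full are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# Data typed in from the question's table (October is absent there)
df = pd.DataFrame(
    {
        "Month": ["Jan", "Feb", "March", "April", "May", "June", "July",
                  "August", "September", "November", "December"],
        "Energy": [10, 11, 13, 15, 18, 20, 24, 28, 24, 17, 15],
        "Percentage": [0.5, 0.6, 0.71, 0.73, 0.81, 0.85, 0.91, 0.93, 0.81, 0.71, 0.6],
    }
)

# Full, fixed list of months for the x-axis (spelling assumed to match the data)
months = ["Jan", "Feb", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Reindexing inserts NaN rows for months that have no values yet
full = df.set_index("Month").reindex(months)
x = list(range(len(months)))

fig, ax = plt.subplots()
ax.bar(x, full["Energy"].to_numpy(), color="tab:blue")
ax.set_ylabel("Energy", color="tab:blue")

# second y-axis for the percentage line
ax2 = ax.twinx()
ax2.plot(x, (full["Percentage"] * 100).to_numpy(), color="red", marker="o")
ax2.set_ylabel("Percentage", color="red")

ax.set_xticks(x)
ax.set_xticklabels(months, rotation=40, ha="right")
fig.tight_layout()
```

Here the bars are drawn on the first axis and the line on the twin axis, so the line stays visible on top of the bars.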

How to do a histogram from 2 datasets (Bin problem)

I am trying to do a histogram like the one below but I am struggling with the bins. This is my code:
plt.subplots(figsize=(2, 1), dpi=400)
width = 0.005
plt.xticks(((density_1.index.unique()) | set(density_2.index.unique())), rotation=90, fontsize=1.5)
plt.yticks(list(set(density_1.unique()) | set(density_2.unique())), fontsize=2)
plt.hist(density_1.index, density_1, width, color='Green', label=condition_1,alpha=0.5)
plt.hist(density_2.index, density_2, width, color='Red', label=condition_2,alpha=0.5,bins=my_beans1)
plt.legend(loc="upper right", fontsize=2)
plt.show()
This is my pandas data:
1st Data sample:
Xticks Yticks
0.27 0.068182
0.58 0.045455
0.32 0.045455
0.47 0.045455
0.75 0.045455
0.17 0.045455
0.43 0.022727
0.66 0.022727
0.11 0.022727
0.68 0.022727
0.59 0.022727
2nd Data sample:
Xticks Yticks
0.94 0.058442
0.86 0.058442
0.74 0.045455
0.93 0.045455
0.99 0.045455
0.71 0.038961
0.63 0.019481
0.97 0.019481
0.87 0.019481
0.84 0.019481
0.75 0.019481
0.89 0.019481
0.80 0.012987
I made this picture using plt.bar(), but I need to do it with plt.hist. The picture is for the full dataset; I am only providing a sample of my dataframe to keep this short.
I have looked at forums and websites on how to use hist and bins, but I always get errors.
I tried something like this:
my_bins1=density_2.unique()
my_bins2=10
Assuming that you want to use a histogram to count the frequency of y-ticks, something like this might be what you are looking for:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
data1 = pd.read_csv(r"file1", sep='\t')
data2 = pd.read_csv(r"file2", sep='\t')
data1 = data1.set_index('x_ticks')
data2 = data2.set_index('x_ticks')
plt.figure()
bins=np.linspace(0, 0.1, num=100)
n, bins, rectangles = plt.hist(data1, bins, color = 'green', alpha=0.5, label='dataset 1')
plt.hist(data2, bins, color = 'red', alpha=0.5, label='dataset 2')
plt.legend(loc='upper right')
plt.title('frequency of y-ticks')
plt.show()
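If the goal is instead to reproduce the plt.bar() picture with plt.hist (a different reading of the question from the one assumed above), one option is to pass the x values as the data and the densities as weights, with one shared set of bin edges for both datasets. This is only a sketch: the arrays are typed in from the samples in the question, and the choice of 20 bins over [0, 1] is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np

# First data sample from the question: x positions and their densities
x1 = np.array([0.27, 0.58, 0.32, 0.47, 0.75, 0.17, 0.43, 0.66, 0.11, 0.68, 0.59])
w1 = np.array([0.068182, 0.045455, 0.045455, 0.045455, 0.045455, 0.045455,
               0.022727, 0.022727, 0.022727, 0.022727, 0.022727])

# Second data sample from the question
x2 = np.array([0.94, 0.86, 0.74, 0.93, 0.99, 0.71, 0.63, 0.97, 0.87, 0.84,
               0.75, 0.89, 0.80])
w2 = np.array([0.058442, 0.058442, 0.045455, 0.045455, 0.045455, 0.038961,
               0.019481, 0.019481, 0.019481, 0.019481, 0.019481, 0.019481,
               0.012987])

# Shared bin edges so both histograms are directly comparable (bin count is arbitrary)
bins = np.linspace(0.0, 1.0, 21)

fig, ax = plt.subplots()
# weights= makes each bar height the sum of the densities falling in that bin
n1, _, _ = ax.hist(x1, bins=bins, weights=w1, color="green", alpha=0.5, label="condition 1")
n2, _, _ = ax.hist(x2, bins=bins, weights=w2, color="red", alpha=0.5, label="condition 2")
ax.legend(loc="upper right")
```

Note that bins must be a number or a sequence of increasing edges; the result of density_2.unique() is generally not sorted, which may be one source of the errors mentioned in the question.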
Output looks like this:

seaborn is not plotting within defined subplots

I am trying to plot two displots side by side with this code
fig,(ax1,ax2) = plt.subplots(1,2)
sns.displot(x =X_train['Age'], hue=y_train, ax=ax1)
sns.displot(x =X_train['Fare'], hue=y_train, ax=ax2)
It returns the following result (two empty subplots followed by one displot each on two lines)-
If I try the same code with violinplot, it returns result as expected
fig,(ax1,ax2) = plt.subplots(1,2)
sns.violinplot(y_train, X_train['Age'], ax=ax1)
sns.violinplot(y_train, X_train['Fare'], ax=ax2)
Why is displot returning a different kind of output and what can I do to output two plots on the same line?
seaborn.distplot has been DEPRECATED in seaborn 0.11 and is replaced with the following:
displot(), a figure-level function with a similar flexibility over the kind of plot to draw. This is a FacetGrid, and does not have the ax parameter, so it will not work with matplotlib.pyplot.subplots.
histplot(), an axes-level function for plotting histograms, including with kernel density smoothing. This does have the ax parameter, so it will work with matplotlib.pyplot.subplots.
This applies to every seaborn figure-level (FacetGrid) plot: none of them has an ax parameter. Use the equivalent axes-level plot instead.
Look at the documentation for the figure-level plot to find the appropriate axes-level plot function for your needs.
See Figure-level vs. axes-level functions
Because the histogram of two different columns is desired, it's easier to use histplot.
See How to plot in multiple subplots for a number of different ways to plot into matplotlib.pyplot.subplots
Also review seaborn histplot and displot output doesn't match
Tested in seaborn 0.11.1 & matplotlib 3.4.2
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.histplot(x=X_train['Age'], hue=y_train, ax=ax1)
sns.histplot(x=X_train['Fare'], hue=y_train, ax=ax2)
Imports and DataFrame Sample
import seaborn as sns
import matplotlib.pyplot as plt
# load data
penguins = sns.load_dataset("penguins", cache=False)
# display(penguins.head())
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE
3 Adelie Torgersen NaN NaN NaN NaN NaN
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE
Axes Level Plot
With the data in a wide format, use sns.histplot
# select the columns to be plotted
cols = ['bill_length_mm', 'bill_depth_mm']
# create the figure and axes
fig, axes = plt.subplots(1, 2)
axes = axes.ravel() # flattening the array makes indexing easier
for col, ax in zip(cols, axes):
    sns.histplot(data=penguins[col], kde=True, stat='density', ax=ax)
fig.tight_layout()
plt.show()
Figure Level Plot
With the dataframe in a long format, use displot
# create a long dataframe
dfl = penguins.melt(id_vars='species', value_vars=['bill_length_mm', 'bill_depth_mm'], var_name='bill_size', value_name='vals')
# display(dfl.head())
species bill_size vals
0 Adelie bill_length_mm 39.1
1 Adelie bill_depth_mm 18.7
2 Adelie bill_length_mm 39.5
3 Adelie bill_depth_mm 17.4
4 Adelie bill_length_mm 40.3
# plot
sns.displot(data=dfl, x='vals', col='bill_size', kde=True, stat='density', common_bins=False, common_norm=False, height=4, facet_kws={'sharey': False, 'sharex': False})
Multiple DataFrames
If there are multiple dataframes, they can be combined with pd.concat, using .assign to create an identifying 'source' column that can then be passed to row=, col=, or hue=
# list of dataframes
lod = [df1, df2, df3]
# create one dataframe with a new 'source' column to use for row, col, or hue
df = pd.concat((d.assign(source=f'df{i}') for i, d in enumerate(lod, 1)), ignore_index=True)
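To make the concat/assign pattern concrete, here is a tiny self-contained run. The three frames and the 'vals' column are made up purely for illustration.

```python
import pandas as pd

# Three small, made-up dataframes standing in for df1, df2, df3
df1 = pd.DataFrame({"vals": [1.0, 2.0]})
df2 = pd.DataFrame({"vals": [3.0]})
df3 = pd.DataFrame({"vals": [4.0, 5.0]})
lod = [df1, df2, df3]

# Tag each frame with its origin, then stack them into one long dataframe
df = pd.concat(
    (d.assign(source=f"df{i}") for i, d in enumerate(lod, 1)),
    ignore_index=True,
)

print(df["source"].tolist())  # ['df1', 'df1', 'df2', 'df3', 'df3']
```

The resulting 'source' column is what would be passed as hue=, row=, or col= to a seaborn plot.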
See Import multiple csv files into pandas and concatenate into one DataFrame to read multiple files into a single dataframe with an identifying column.

Horizontal lines not appearing on matplotlib plot

Here is the sample data:
Datetime Price Data1 Data2 ShiftedPrice
0 2017-11-05 09:20:01.134 2123.0 12.23 34.12 300.0
1 2017-11-05 09:20:01.789 2133.0 32.43 45.62 330.0
2 2017-11-05 09:20:02.238 2423.0 35.43 55.62 NaN
3 2017-11-05 09:20:02.567 3423.0 65.43 56.62 NaN
4 2017-11-05 09:20:02.948 2463.0 45.43 58.62 NaN
I am trying to plot ShiftedPrice against Datetime, with horizontal lines for the mean and the confidence intervals of the ShiftedPrice column.
Have a look at the code below:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
df1 = df.dropna(subset=['ShiftedPrice'])
df1
fig = plt.figure(figsize=(20,10))
ax = fig.add_subplot(121)
ax = df1.plot(x='Datetime',y='ShiftedPrice')
# Plotting the mean
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
plt.show()
# Plotting Confidence Intervals
ax.axhline(y=df1['ShiftedPrice'].mean() + 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
ax.axhline(y=df1['ShiftedPrice'].mean() - 1.96*np.std(df1['ShiftedPrice'],ddof=1), color='g', linestyle=':', lw=2)
plt.show()
My problem is that the horizontal lines are not appearing. Instead, I get the following message:
ax.axhline(y=df1['ShiftedPrice'].mean(), color='r', linestyle='--', lw=2)
Out[22]: <matplotlib.lines.Line2D at 0xccc5c18>
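One possible explanation, offered as an assumption since the thread contains no answer: with %matplotlib inline, a figure is rendered once and later axhline calls (run after plt.show() or in a separate cell) return a Line2D object without redrawing anything. A minimal sketch that builds everything on one explicit axes before showing it is given below; the ShiftedPrice values are made up, because the sample has only two non-NaN rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# Timestamps from the sample; ShiftedPrice values are made up for illustration
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            [
                "2017-11-05 09:20:01.134",
                "2017-11-05 09:20:01.789",
                "2017-11-05 09:20:02.238",
                "2017-11-05 09:20:02.567",
                "2017-11-05 09:20:02.948",
            ]
        ),
        "ShiftedPrice": [300.0, 330.0, 310.0, 325.0, 305.0],
    }
)
df1 = df.dropna(subset=["ShiftedPrice"])

mean = df1["ShiftedPrice"].mean()
std = df1["ShiftedPrice"].std(ddof=1)  # sample standard deviation

# Create one axes and pass it explicitly so every artist lands on it
fig, ax = plt.subplots(figsize=(10, 5))
df1.plot(x="Datetime", y="ShiftedPrice", ax=ax)

# Add the mean and the +/- 1.96 standard deviation lines before displaying
ax.axhline(y=mean, color="r", linestyle="--", lw=2)
ax.axhline(y=mean + 1.96 * std, color="g", linestyle=":", lw=2)
ax.axhline(y=mean - 1.96 * std, color="g", linestyle=":", lw=2)
```

In a notebook, the whole block would go in a single cell with one plt.show() at the end.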
