Stacked area chart displays a line even though df contains 0 or NaN - python

I am using df.plot.area() and am very confused by the result. The dataframe has integers as index. The values to plot are in different columns. One column contains zeros from a specific integer onwards, however I can still see a thin line in the plot which isn't right.
After data processing this is the code I am using to actually plot:
# Start plotting
df.plot(kind='area', stacked=True, color=colors)
plt.legend(loc='best')
plt.xlabel('Year', fontsize=12)
plt.ylabel(mylabel, fontsize=12)
# Reverse Legend
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])
plt.title(filename[:-4])
plt.tight_layout()
plt.autoscale(enable=True, axis='x', tight=True)
And this is a snapshot of the result. The thin orange line shouldn't be visible, because the value in the dataframe is zero.
Thanks for your support!
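One possible cause, offered as an assumption since the data and the screenshot are not available here: pandas draws an outline on top of every filled area, and for a layer of height zero that outline still sits on the boundary of the layer below. A minimal sketch with made-up data (the column names and values are hypothetical) that passes linewidth=0 to suppress the outlines:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: column "b" drops to zero from 2002 onwards
df = pd.DataFrame(
    {"a": [3, 4, 5, 6], "b": [2, 1, 0, 0], "c": [1, 2, 3, 4]},
    index=[2000, 2001, 2002, 2003],
)

# linewidth=0 is forwarded to matplotlib and removes the outline of each area,
# so a zero-height layer no longer leaves a thin colored line
ax = df.plot(kind="area", stacked=True, linewidth=0)
ax.set_xlabel("Year", fontsize=12)
plt.tight_layout()
```

If the thin line persists with linewidth=0, the cause is something else, for example tiny non-zero values left over from the data processing.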

Related

How to reduce the blank area in a grouped boxplot with many missing hue categories

I have an issue when plotting a categorical grouped boxplot by seaborn in Python, especially using 'hue'.
My raw data is shown in the figure below. I want to plot the values in column 8, grouped by columns 1 and 4.
I used seaborn and my code is shown below:
ax = sns.boxplot(x=output[:,1], y=output[:,8], hue=output[:,4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.legend([],[])
However, the generated plot always contains a large blank area, as shown in the upper figure below. I tried adding 'dodge=False' to sns.boxplot, following a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), but that gives the lower figure below.
What I actually want Python to plot is a boxplot like the one I generated with JMP below.
It seems that if one of the second-level categories is empty, seaborn still reserves space for it within each first-level category, which causes the observed offset/blank area.
So I wonder if there is any way to solve this issue, for example by using another Python package?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying open spots. (When there would be only one box per x-value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value
The original hue will be used as x-value for each subplot. To avoid empty spots, the hue should be of string type. When the hue would be pd.Categorical, seaborn would still reserve a spot for each of the categories.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('') # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label) # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
Adding consistent coloring
A dictionary palette can color the boxes such that corresponding boxes in different subplots have the same color. hue= with the same column as the x= will do the coloring, and dodge=False will remove the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('') # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label) # use the label from the dataframe as xlabel
    # ax.tick_params(axis='x', labelrotation=90) # optionally rotate the tick labels
plt.tight_layout()
plt.show()

Plot pandas df into boxplot & histogram

Currently I am trying to plot a boxplot into a histogram (without using seaborn). I have tried many varieties but I always get skewed graphs.
This was my starting point:
#Histogram
df.hist(column="Q7", bins=20, figsize=(14,6))
#Boxplot
df.boxplot(column="Q7",vert=False,figsize=(14,6))
which resulted in the following graph:
As you can see the boxplot and outliers are at the bottom, but I want it to be on top.
Anyone an idea?
You can use subplots and set the height ratios so that the boxplot comes first and the histogram sits below it. In the example below, I use 30% of the height for the boxplot and 70% for the histogram. I also adjusted the spacing between the plots and used a common x-axis via sharex. Hope this is what you are looking for...
fig, ax = plt.subplots(2, figsize=(14, 6), sharex=True, # Common x-axis
                       gridspec_kw={"height_ratios": (.3, .7)}) # boxplot 30% of the vertical space
#Boxplot
df.boxplot(column="Q7",vert=False,figsize=(14,6), ax=ax[0])
#Histogram
df.hist(column="Q7", bins=20, figsize=(14,6), ax=ax[1])
ax[1].title.set_size(0)
plt.subplots_adjust(hspace=0.1) ##Adjust gap between the two plots

Combining a boxplot and a histogram into one plot

I'm trying to plot a boxplot and a histogram together, as you can see in the following image.
Boxplot and histogram combination
I have this for the moment:
fig,ax=plt.subplots()
fig.set_size_inches(10,7)
ax = fig.add_axes([0,0,1,1])
ax.set_title('Heating need [kWh/m^2]')
ax.set_xlabel('Cluster')
ax.set_ylabel('Heating need')
bp1=ax.boxplot([heating0,heating1945,heating1960,heating1970,heating1980,heating1990,heating2000],labels=['<1945', '1945-1959', '1960-1969', '1970-1979', '1980-1989', '1990-1999', '>=2000'],showfliers=False,patch_artist=True)
plt.setp(bp1['boxes'], color='blue')
ax.plot([200,200,220,230,230,170,130,100,30,30],label='underline for swiss energetic index') #underline for the norms
ax.plot([230,230,250,260,260,200,160,130,60,60],label='upperline for swiss energetic index') #upperline for the norms
#plt.yticks([0,200,400,600,800])
plt.legend(loc='upper right')
The result is:
and I want to replace the line plots with a histogram.
I think you're looking for plt.bar
A minimal example:
fig,ax=plt.subplots()
ax.boxplot([100,200,300,400,500],1)
data = [200,200]
ax.bar(range(0,len(data)*2,2),data)

How do I plot subplots with different labels from pandas dataframe columns using matplotlib

I have the following code to print out columns from a pandas dataframe as two histograms:
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

df = pd.read_csv('fairview_Procedure_combined.csv')
ax = df.hist(column=['precision', 'recall'], bins=25, grid=False, figsize=(12,8), color='#86bf91', zorder=2, rwidth=0.9)
ax = ax[0]
for x in ax:
    # Despine
    x.spines['right'].set_visible(False)
    x.spines['top'].set_visible(False)
    x.spines['left'].set_visible(False)
    # Switch off ticks (recent matplotlib versions expect booleans, not "on"/"off")
    x.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)
    # Draw horizontal axis lines
    vals = x.get_yticks()
    for tick in vals:
        x.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)
    # Remove title
    x.set_title("")
    # Set x-axis label
    x.set_xlabel("test", labelpad=20, weight='bold', size=12)
    # Set y-axis label
    x.set_ylabel("count", labelpad=20, weight='bold', size=12)
    # Format y-axis label
    x.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
which gives the attached output:
I would like however to have different labels on the x-axis (in particular, those listed in my column list, that is, precision and recall)
Also, I have a grouping column (semantic_type) that I would like to use to generate a set of paired graphs, but when I pass the by keyword to the hist method to group the histograms by semantic_type, I get the error "color kwarg must have one color per data set. 18 data sets and 1 colors were provided".
I figured it out using subplots... piece of cake.
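The answer above gives no code, so here is a minimal sketch of what a subplots-based approach might look like. It is not the original poster's solution. The random data stands in for the CSV file, and only the column names precision and recall come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from the CSV in the question
rng = np.random.default_rng(0)
df = pd.DataFrame({"precision": rng.random(200), "recall": rng.random(200)})

columns = ["precision", "recall"]
fig, axes = plt.subplots(1, len(columns), figsize=(12, 8))

# One histogram per column, each labelled with its own column name
for ax, col in zip(axes, columns):
    ax.hist(df[col], bins=25, color="#86bf91", rwidth=0.9, zorder=2)
    ax.set_xlabel(col, labelpad=20, weight="bold", size=12)
    ax.set_ylabel("count", labelpad=20, weight="bold", size=12)

plt.tight_layout()
```

Because each Axes is created explicitly, the x-axis label can be set per column instead of sharing one hard-coded string.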

Adding a marker to a plot at specific points

I have a dataframe(first few rows):
I can plot it with matplotlib.pyplot:
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Price')
df1[['Close']].plot(ax=ax1)
To get:
What I would like to do is add a down-triangle marker to the plot at the index 2018-09-10 04:00:00, which is indicated by the value -1 in the positions column of the dataframe.
I tried to do this:
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Price')
df1[['Close']].plot(ax=ax1)
ax1.plot(
    df1.loc[df1.positions == -1.0].index,
    df1.Close[df1.positions == -1.0],
    'v', markersize=5, color='k'
)
I get the plot like this:
So two things. One is that the index gets converted to something that shoots to year 2055, I don't understand why. Plus is there a way to add a marker at the specific position using just the first plot call? I tried to use markevery but with no success.
If you want to combine pandas plots and matplotlib datetime plots on the same axes, the pandas plot needs to be drawn in compatibility mode (x_compat=True), so that both use the same date units on the x-axis:
df1['Close'].plot(ax=ax1, x_compat=True)
That might give you the desired plot already.
If you don't want to call matplotlib directly, you can plot the filtered dataframe with pandas as well:
df1['Close'].plot(ax=ax1)
df1['Close'][df1.positions == -1.0].plot(ax=ax1, marker="v", markersize=5, color='k')
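The question also asks whether markevery can do this in a single plot call. A minimal sketch, assuming made-up prices and a positions column like the one described (the values below are hypothetical): markevery accepts a list of integer positions, so the rows where positions equals -1 can be turned into indices and passed along with the marker.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly data; row 4 corresponds to 2018-09-10 04:00:00
idx = pd.date_range("2018-09-10 00:00", periods=8, freq=pd.Timedelta(hours=1))
df1 = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 11.0, 9.0, 10.0, 12.0, 13.0],
     "positions": [0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]},
    index=idx,
)

# Integer positions of the rows that should get a marker
mark_at = np.flatnonzero(df1["positions"].to_numpy() == -1.0).tolist()

fig, ax1 = plt.subplots()
ax1.set_ylabel("Price")
# A single call: the line is drawn everywhere, the marker only at mark_at
(line,) = ax1.plot(
    df1.index, df1["Close"],
    marker="v", markersize=5,
    markerfacecolor="k", markeredgecolor="k",
    markevery=mark_at,
)
```

Setting markerfacecolor and markeredgecolor instead of color keeps the line in its default color while the marker is black.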
