Overlaying Pandas plot with Matplotlib is sensitive to the plotting order - python

I have the following problem: I'm trying to overlay two plots, a Pandas plot made via plot.area() for a dataframe, and a second plot that is a standard Matplotlib plot. The result depends on the order of the code: the Matplotlib plot is displayed only if its code comes before the Pandas plot.area() call on the same axes.
Example: I have a Pandas dataframe called revenue that has a DatetimeIndex and a single column with "revenue" values (float). Separately, I have a dataset called projection with data along the same index (revenue.index).
If the code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Pandas area plot
revenue.plot.area(ax = ax)
# Second -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
plt.tight_layout()
plt.show()
Then the only thing displayed is the pandas plot.area() like this:
[Figure: 1/ Pandas plot.area() and 2/ Matplotlib line plot]
However, if the order of the plotting is reversed:
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
# Second -- Pandas area plot
revenue.plot.area(ax = ax)
plt.tight_layout()
plt.show()
Then the plots are overlaid properly, like this:
[Figure: 1/ Matplotlib line plot and 2/ Pandas plot.area()]
Can someone please explain to me what I'm doing wrong, or what I need to do to make the code more robust? TIA.

The values on the x-axis are different in the two plots. I think DataFrame.plot.area() formats the DatetimeIndex in a pretty way, using x-axis units that are not compatible with those of pyplot.plot().
If you plot the projection first, plot.area() can still plot the data and does not reformat the x-axis.
Mixing the two seems tricky to me, so I would use either pyplot or DataFrame.plot for both the area and the line:
import pandas as pd
from matplotlib import pyplot as plt
projection = [1000, 2000, 3000, 4000]
datetime_series = pd.to_datetime(["2021-12","2022-01", "2022-02", "2022-03"])
datetime_index = pd.DatetimeIndex(datetime_series.values)
revenue = pd.DataFrame({"value": [1200, 2200, 2800, 4100]})
revenue = revenue.set_index(datetime_index)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Option 1: only pyplot
ax[0].fill_between(revenue.index, revenue.value)
ax[0].plot(revenue.index, projection, color='black', linewidth=3)
ax[0].set_title("Pyplot")
# Option 2: only DataFrame.plot
revenue["projection"] = projection
revenue.plot.area(y='value', ax=ax[1])
revenue.plot.line(y='projection', ax=ax[1], color='black', linewidth=3)
ax[1].set_title("DataFrame.plot")
The results then look like this, where DataFrame.plot gives a much cleaner looking result:
If you do not want the projection in the revenue DataFrame, you can put it in a separate DataFrame and set the index to match revenue:
projection_df = pd.DataFrame({"projection": projection})
projection_df = projection_df.set_index(datetime_index)
projection_df.plot.line(ax=ax[1], color='black', linewidth=3)
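To see the unit mismatch described in this answer, one can compare the x-limits that each library produces for the same dates. This is only a minimal sketch, assuming a recent pandas and Matplotlib. The sample values are taken from the example above, the names ax_pandas and ax_mpl are made up for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display needed here)
import matplotlib.pyplot as plt
import pandas as pd

index = pd.to_datetime(["2021-12", "2022-01", "2022-02", "2022-03"])
revenue = pd.DataFrame({"value": [1200, 2200, 2800, 4100]}, index=index)
projection = [1000, 2000, 3000, 4000]

fig, (ax_pandas, ax_mpl) = plt.subplots(1, 2, figsize=(10, 4))

# Same dates, drawn once by pandas and once by Matplotlib
revenue.plot.area(ax=ax_pandas)
ax_mpl.plot(revenue.index, projection, color="black", linewidth=3)

# Numeric x-limits that each Axes ends up with
pandas_xlim = ax_pandas.get_xlim()
mpl_xlim = ax_mpl.get_xlim()
print("pandas area x-limits:", pandas_xlim)
print("matplotlib x-limits: ", mpl_xlim)
```

If the two pairs of limits are far apart, a line drawn with ax.plot() after the pandas area plot lands outside the visible range, which would explain why it seems to disappear.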

Related

Set axis limits across faceted plot

How can I fix the x-axis on each of the plots in the following situation? Using xlim only affects the second plot axis, not both.
import pandas as pd
import matplotlib.pyplot as plt
sample = pd.DataFrame({'mean':[1,2,3,4,5], 'median':[10,20,30,40,50]})
sample.hist()
plt.xlim(0, 100)
Bonus, what is the correct pandas terminology for the two plots here? Subplots? Facets?
The correct terminology would be subplots or axes, since hist returns the matplotlib Axes instances:
axes = sample.hist()
for ax in axes.ravel():
    ax.set_xlim(0, 100)
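plt.xlim() acts only on the current Axes, which is why only the last subplot changed in the question. As an alternative to the loop, here is a small sketch that uses plt.setp to apply the same limits to every Axes in the array. The Agg backend and the limits list are there only so the result can be checked without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display needed here)
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({'mean': [1, 2, 3, 4, 5], 'median': [10, 20, 30, 40, 50]})

axes = sample.hist()             # 2-D array of Axes, one per column
plt.setp(axes, xlim=(0, 100))    # sets the x-limits on every Axes in the array

# Collect the limits to confirm that both subplots were updated
limits = [ax.get_xlim() for ax in axes.ravel()]
print(limits)
```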

How to plot a stacked area plot

I have a dataframe (df) with two columns: 'Foundation Type', which has 4 types of foundations (Shafts, Piles, Combination, Spread), and 'Vs30', which holds the values of the parameter Vs30. Each row represents a bridge, with a foundation type and a Vs30 value.
First, I create a new column 'binVs30' in df by cutting the 'Vs30' values into 5 bins ([0-200], [200-400], ..., [800-1000]).
df['binVs30'] = pd.cut(df.Vs30, bins=np.arange(0, 1100, 200))
Then I create a stacked area plot with the following code:
color_table = pd.crosstab(df['binVs30'], df['Foundation Type'], dropna=False)
ax = color_table.plot(kind='area', figsize=(8, 8), stacked=True, rot=0)
display(ax)
plt.xlabel('')
plt.ylabel('Frequency', fontsize=12)
plt.legend(title='Foundation Type', loc='upper right')
plt.title('Column Database', fontsize='20')
plt.show()
The resulting picture shows some extra bins that shouldn't be there, so I had to fix the xticks manually by adding the following code:
locs, labels = plt.xticks()
plt.xticks(locs, ['','0-200','','200-400','','400-600','','600-800','','800-1000'], fontsize=10, rotation=45)
Is there a reason why those extra bins are created? Is it a bug? If I change the plot to a stacked bar plot, the problem vanishes. Is there a way to fix it without manually setting the tick labels?
I have two other questions. First, how do I add an edgecolor to an area plot? Something like:
color_table.plot(kind='area', figsize=(8, 8), stacked=True, edgecolor='black', legend=None, rot=0)
The argument edgecolor='black' doesn't work in a stacked area plot.
Second, I want to create bins for 'Vs30' like ([0-200], [200-400], ..., [>800]). Is there a way to do that? The way I create the 'binVs30' column doesn't allow a bin for '>800'.
There are a couple of questions here. Firstly, about including an open-ended bin in your pd.cut(): you can use np.inf as the last edge to capture everything in the last bin, and assign it a custom label. Secondly, since you're already using matplotlib, I'd recommend using its stackplot directly rather than via pandas. Then you can use the edgecolor argument without any issues.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame(data={
    "foundation": np.random.choice(list("ABCD"), 1000),
    "binVs30": np.random.randint(0, 1200, 1000)
})
bins = [0, 200, 400, 600, 800, np.inf]
labels = ["0-199", "200-399", "400-599", "600-799", "800+"]
df["bins"] = pd.cut(
    df["binVs30"], bins=bins, labels=labels,
    right=False, include_lowest=True)
stack_data = pd.crosstab(df['bins'], df['foundation'], dropna=False)
stack_array = stack_data.values.T.tolist()
pal = sns.color_palette("Set1")
plt.figure(figsize=(8, 4))
plt.stackplot(
    labels, stack_array, labels=list("ABCD"),
    colors=pal, alpha=0.4, edgecolor="black")
plt.legend(loc='upper left')
plt.show()
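To check how the open-ended last bin behaves at the edges, here is a small sketch with hand-picked values. The values and the name vs30 are made up for illustration; the bins and labels are the ones from the answer.

```python
import numpy as np
import pandas as pd

# Hand-picked values sitting on or near the bin edges (illustrative only)
vs30 = pd.Series([0, 199, 200, 799, 800, 1500])

bins = [0, 200, 400, 600, 800, np.inf]
labels = ["0-199", "200-399", "400-599", "600-799", "800+"]

# right=False makes each bin closed on the left: [0, 200), [200, 400), ...
binned = pd.cut(vs30, bins=bins, labels=labels, right=False)
print(binned.tolist())
# ['0-199', '0-199', '200-399', '600-799', '800+', '800+']
```

Everything from 800 upwards ends up in the "800+" bin, because np.inf is the right edge of the last interval.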

Superimposing plots in seaborn causes x-axis to misalign

I am having an issue trying to superimpose plots with seaborn. I am able to generate the two plots separately as
fig, (ax1,ax2) = plt.subplots(ncols=2,figsize=(30, 7))
sns.lineplot(data=data1, y='MSE',x='pct_gc',ax=ax1)
sns.boxplot(x="pct_gc", y="MSE", data=data2,ax=ax2,width=0.4)
The output looks like this:
But I run into a problem when I try to superimpose the two plots by assigning both to the same ax object:
fig, (ax1,ax2) = plt.subplots(ncols=2,figsize=(30, 7))
sns.lineplot(data=data1, y='MSE',x='pct_gc',ax=ax1)
sns.boxplot(x="pct_gc", y="MSE", data=data2,ax=ax1,width=0.4)
I am not able to identify why the x-axis of the lineplot changes when superimposing both plots (in both separate plots, the x-axis goes from 0 to 0.069).
My goal is for both plots to be superimposed, while keeping the same x-axis range.
Seaborn's boxplot creates a categorical x-axis, with all boxes evenly spaced. Internally, the x-axis positions are numbered 0, 1, 2, ..., but externally they get the labels from 0 to 0.069.
To combine a line plot with a boxplot, matplotlib's boxplot can be called directly, so that positions and widths can be set explicitly. With patch_artist=True, a rectangle is created (instead of just lines), for which a facecolor can be given. manage_ticks=False prevents boxplot from changing the x ticks and their limits. Optionally, notch=True would accentuate the median a bit more, but depending on the data, the confidence interval might be too large and look weird.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data1 = pd.DataFrame({'pct_gc': np.linspace(0, 0.069, 200), 'MSE': np.random.normal(0.02, 0.1, 200).cumsum()})
data1['pct_range'] = pd.cut(data1['pct_gc'], 10)
fig, ax1 = plt.subplots(ncols=1, figsize=(20, 7))
sns.lineplot(data=data1, y='MSE', x='pct_gc', ax=ax1)
for interval, color in zip(np.unique(data1['pct_range']), plt.cm.tab10.colors):
    ax1.boxplot(data1[data1['pct_range'] == interval]['MSE'],
                positions=[interval.mid], widths=0.4 * interval.length,
                patch_artist=True, boxprops={'facecolor': color},
                notch=False, medianprops={'color': 'yellow', 'linewidth': 2},
                manage_ticks=False)
plt.show()
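The effect of manage_ticks=False can be checked in isolation with plain matplotlib. The following is only a sketch with made-up data: the line, the three groups and the box positions are assumptions chosen so that the boxes fall inside the x-range of the line, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display needed here)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 0.069, 50)

fig, ax = plt.subplots()
ax.plot(x, 10 * x)               # a line spanning the numeric x-range
xlim_before = ax.get_xlim()

# Three made-up groups, drawn at explicit numeric positions on the same axis
groups = [rng.normal(loc, 0.05, 30) for loc in (0.1, 0.3, 0.6)]
ax.boxplot(groups, positions=[0.01, 0.035, 0.06], widths=0.005,
           manage_ticks=False)   # leave the existing ticks and limits alone
xlim_after = ax.get_xlim()

print(xlim_before, xlim_after)
```

With manage_ticks=False and positions inside the existing range, the x-limits should be the same before and after the boxplot call, so the line and the boxes share one numeric axis.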

how to perform conditional area plotting with matplotlib?

I have created the following dataframe based on a range of data.
df['data_classification'] = df.myDatarange.apply(
    lambda a: 'Very good' if a >= -90
    else ('Good' if (a >= -100 and a <= -91)
    else ('Moderate' if (a >= -110 and a <= -101)
    else ('Poor' if (a >= -123 and a <= -111)
    else ('Bad' if (a >= -140 and a <= -124)
    else 'Off')))))
I am planning to plot myDatarange together with data_classification and show the relation between them using different colours. I am very confused about how to plot this.
I can plot myDatarange as a single lineplot, but how do I relate the two columns?
So far, I have tried the following:
x1 = df1.index
y1 = df1.myDatarange
f, (ax1,ax2) = plt.subplots(2,figsize=(5, 5))
ax1.plot(x1,y1,color='red', linewidth=1.9, alpha=0.9, label="myDataRange")
plt.show()
How can I plot the above range of data as an area plot based on the classification? Is there a better way than an area plot to express my data? There are examples on the net, but they are not very clear on the conditional side of it.
Seaborn's barplot can take a hue parameter to color each bar according to its 'data_classification'. The 'data_classification' column can be created more quickly, and in a way that is easier to modify, via pd.cut.
The barplot can be used as a background for the lineplot to show the classification of each value.
Here is an example to get you started:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'myDatarange': np.random.randint(-150, -50, size=50)})
ranges = [-10**6, -140, -123, -110, -100, -90, 10**6]
df['data_classification'] = pd.cut(df['myDatarange'], ranges, right=False,
                                   labels=['Off', 'Bad', 'Poor', 'Moderate', 'Good', 'Very Good'])
fig, ax1 = plt.subplots(figsize=(12, 4))
ax1.plot(df.index, df['myDatarange'], color='blue', linewidth=2, alpha=0.9, label="myDataRange")
sns.barplot(x=df.index, y=[df['myDatarange'].min()] * len(df),
            hue='data_classification', alpha=0.5, palette='inferno', dodge=False, data=df, ax=ax1)
# Optionally make the bars fill the complete background; by default seaborn sets the width to about 80%
for bar in ax1.patches:
    bar.set_width(1)
plt.legend(bbox_to_anchor=(1.02, 1.05) , loc='upper left')
plt.tight_layout()
plt.show()
PS: If you want the 0 at the bottom (now at the top due to the negative y-values), you could call ax1.invert_yaxis().
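To see which label pd.cut assigns at each boundary, here is a small sketch with hand-picked values. The values and the names values and classified are made up for illustration; the ranges and labels are the ones from the answer.

```python
import pandas as pd

# Hand-picked values sitting on or next to the class boundaries (illustrative only)
values = pd.Series([-150, -140, -124, -123, -111, -110, -101, -100, -91, -90, -50])

ranges = [-10**6, -140, -123, -110, -100, -90, 10**6]
labels = ['Off', 'Bad', 'Poor', 'Moderate', 'Good', 'Very Good']

# right=False: each class includes its lower edge and excludes its upper edge
classified = pd.cut(values, ranges, right=False, labels=labels)
for value, label in zip(values, classified):
    print(value, label)
```

For these integer inputs, -140 is the first value classified as 'Bad' and -90 the first classified as 'Very Good'. Changing a threshold only requires editing the ranges list.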

plot ellipse in a seaborn scatter plot

I have a data frame in pandas format (pd.DataFrame) with columns = [z1,z2,Digit], and I did a scatter plot in seaborn:
dataframe = dataFrame.apply(pd.to_numeric, errors='coerce')
sns.lmplot("z1", "z2", data=dataframe, hue='Digit', fit_reg=False, size=10)
plt.show()
What I want is to plot an ellipse around each of these points. But I can't seem to plot an ellipse in the same figure.
I know the normal way to plot an ellipse is like:
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
elps = Ellipse((0, 0), 4, 2,edgecolor='b',facecolor='none')
a = plt.subplot(111, aspect='equal')
a.add_artist(elps)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.show()
But because I have to do "a = plt.subplot(111, aspect='equal')", the plot will be on a different figure. And I also can't do:
a = sns.lmplot("z1", "z2", data=rect, hue='Digit', fit_reg=False, size=10)
a.add_artist(elps)
because the 'a' returned by sns.lmplot() is a "seaborn.axisgrid.FacetGrid" object. Any solutions? Is there any way I can plot an ellipse without having to do something like a.add_artist()?
Seaborn's lmplot() uses a FacetGrid object to do the plot, and therefore your variable a = sns.lmplot(...) is a reference to that FacetGrid object.
To add your ellipse, you need a reference to the Axes object. The problem is that a FacetGrid can contain multiple axes, depending on how you split your data. Thankfully, there is a method FacetGrid.facet_axis(row_i, col_j), which returns a reference to a specific Axes object.
In your case, you would do:
a = sns.lmplot("z1", "z2", data=rect, hue='Digit', fit_reg=False, size=10)
ax = a.facet_axis(0,0)
ax.add_artist(elps)
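For reference, here is a self-contained sketch of the same idea, assuming a recent seaborn release, where x and y are passed as keywords and the size parameter is called height. The sample data and the name g are made up for illustration, and add_patch is used here in place of add_artist, which is a swap on my part and not what the answer above uses.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display needed here)
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.patches import Ellipse

# Made-up sample data with the column names from the question
rect = pd.DataFrame({"z1": [-2, -1, 0, 1, 2, 3],
                     "z2": [-1, 0, 1, 0, -1, 1],
                     "Digit": [0, 0, 1, 1, 2, 2]})

# lmplot returns a FacetGrid, not an Axes
g = sns.lmplot(data=rect, x="z1", y="z2", hue="Digit", fit_reg=False, height=5)

# Get the single Axes of the grid and draw the ellipse on it
ax = g.facet_axis(0, 0)
elps = Ellipse((0, 0), 4, 2, edgecolor='b', facecolor='none')
ax.add_patch(elps)
```

One ellipse per point would need a loop over the rows that creates a new Ellipse for each (z1, z2) pair.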
