Set axis limits across faceted plot - python

How can I fix the x-axis on each of the plots in the following situation? Using xlim only affects the second plot axis, not both.
import pandas as pd
import matplotlib.pyplot as plt
sample = pd.DataFrame({'mean':[1,2,3,4,5], 'median':[10,20,30,40,50]})
sample.hist()
plt.xlim(0, 100)
Bonus, what is the correct pandas terminology for the two plots here? Subplots? Facets?

The correct term would be subplots or axes, since hist returns an array of matplotlib Axes instances:
axes = sample.hist()
for ax in axes.ravel():
    ax.set_xlim(0,100)
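For context, plt.xlim only acts on the current axes (the last one created), which is why only the second histogram changed. As an alternative sketch, plt.setp can apply the same limits to every axes in one call. The Agg backend line is my assumption so the snippet runs without a display; drop it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({"mean": [1, 2, 3, 4, 5], "median": [10, 20, 30, 40, 50]})

# DataFrame.hist returns a 2D NumPy array of Axes, one per column
axes = sample.hist()

# plt.setp sets the same property on every Axes in the flattened array
plt.setp(axes.ravel(), xlim=(0, 100))

# Collect the limits to confirm that both subplots were changed
limits = [ax.get_xlim() for ax in axes.ravel()]
```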

Related

Change the tick frequency on the x axis using a for loop [duplicate]

I have a question about matplotlib in Python. I create several figures, and every figure should have the same height so that they can be printed next to each other in a publication or poster.
If the y-axis has a tick label at the very top, this shrinks the height of the box around the plot. So I use MaxNLocator to remove the upper and lower y-ticks. In some plots I want 1.0 to appear as a number on the y-axis, because the data is normalized. So I need a solution that, in these cases, extends the y-axis and ensures 1.0 is a y-tick, but does not change the size of the figure when tight_layout() is used.
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0,1,num=11)
y = np.linspace(1,.42,num=11)
fig,axs = plt.subplots(1,1)
axs.plot(x,y)
locator=MaxNLocator(prune='both',nbins=5)
axs.yaxis.set_major_locator(locator)
plt.tight_layout()
fig.show()
Here is a link to an example PDF, which shows the problem with the height of the upper box line.
I tried to work with subplots_adjust(), but this is of no use to me, because I vary the size of the figures and want the same font size throughout, which changes the margins.
Question is:
How can I use MaxNLocator and specify a number which has to be in the y-axis?
Hopefully someone of you has some advice.
Greetings,
Laenan
Assuming that you know in advance how many plots there will be in one row on a page, one way to solve this would be to put all those plots into one figure; matplotlib will make sure their axes are aligned:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0, 1, num=11)
y = np.linspace(1, .42, num=11)
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(8,3), gridspec_kw={'wspace':.2})
ax1.plot(x,y)
ax2.plot(x,y)
locator=MaxNLocator(prune='both', nbins=5)
ax1.yaxis.set_major_locator(locator)
# You don't need to use tight_layout and using it might give an error
# plt.tight_layout()
fig.show()
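The part of the question about forcing 1.0 onto the y-axis is not covered by the code above. One possible sketch (my own, not the answer author's method): ask MaxNLocator for candidate ticks with tick_values, keep those inside the data range, add 1.0 explicitly, and give the axis a little headroom. The 5 % padding and the name forced_tick are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

x = np.linspace(0, 1, num=11)
y = np.linspace(1, 0.42, num=11)
forced_tick = 1.0  # hypothetical name: the value that must appear as a tick

fig, ax = plt.subplots()
ax.plot(x, y)

# Candidate tick positions chosen by MaxNLocator for the data range
candidates = MaxNLocator(nbins=5).tick_values(y.min(), y.max())

# Keep candidates strictly inside the data range, drop any that coincide
# with the forced tick, then add the forced tick itself
inside = candidates[(candidates > y.min()) & (candidates < y.max())]
inside = inside[~np.isclose(inside, forced_tick)]
ticks = np.sort(np.append(inside, forced_tick))
ax.set_yticks(ticks)

# Assumed headroom of 5 % of the data span above and below
pad = 0.05 * (y.max() - y.min())
ax.set_ylim(y.min() - pad, forced_tick + pad)
```

Note that set_yticks replaces the locator with a fixed one, so the ticks will not adapt if the limits change later.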

Overlaying Pandas plot with Matplotlib is sensitive to the plotting order

I have the following problem: I'm trying to overlay two plots: one Pandas plot via plot.area() for a dataframe, and a second plot that is a standard Matplotlib plot. Depending on the order of the code for those two, the Matplotlib plot is displayed only if its code comes before the Pandas plot.area() call on the same axes.
Example: I have a Pandas dataframe called revenue that has a DatetimeIndex and a single column with "revenue" values (float). Separately I have a dataset called projection with data along the same index (revenue.index).
If the code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Pandas area plot
revenue.plot.area(ax = ax)
# Second -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
plt.tight_layout()
plt.show()
Then the only thing displayed is the pandas plot.area() like this:
1/ Pandas plot.area() and 2/ Matplotlib line plot
However, if the order of the plotting is reversed:
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
# Second -- Pandas area plot
revenue.plot.area(ax = ax)
plt.tight_layout()
plt.show()
Then the plots are overlayed properly, like this:
1/ Matplotlib line plot and 2/ Pandas plot.area()
Can someone please explain to me what I'm doing wrong, or what I need to do to make the code more robust? Thanks in advance.
The values on the x-axis are different in both plots. I think DataFrame.plot.area() formats the DatetimeIndex in its own way, which is not compatible with pyplot.plot().
If you plot the projection first, plot.area() can still plot its data and does not reformat the x-axis.
Mixing the two seems tricky to me, so I would use either pyplot or DataFrame.plot for both the area and the line:
import pandas as pd
from matplotlib import pyplot as plt
projection = [1000, 2000, 3000, 4000]
datetime_series = pd.to_datetime(["2021-12","2022-01", "2022-02", "2022-03"])
datetime_index = pd.DatetimeIndex(datetime_series.values)
revenue = pd.DataFrame({"value": [1200, 2200, 2800, 4100]})
revenue = revenue.set_index(datetime_index)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Option 1: only pyplot
ax[0].fill_between(revenue.index, revenue.value)
ax[0].plot(revenue.index, projection, color='black', linewidth=3)
ax[0].set_title("Pyplot")
# Option 2: only DataFrame.plot
revenue["projection"] = projection
revenue.plot.area(y='value', ax=ax[1])
revenue.plot.line(y='projection', ax=ax[1], color='black', linewidth=3)
ax[1].set_title("DataFrame.plot")
The results then look like this, where DataFrame.plot gives a much cleaner looking result:
If you do not want the projection in the revenue DataFrame, you can put it in a separate DataFrame and set the index to match revenue:
projection_df = pd.DataFrame({"projection": projection})
projection_df = projection_df.set_index(datetime_index)
projection_df.plot.line(ax=ax[1], color='black', linewidth=3)
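DataFrame.plot.area stacks columns by default, so for a frame with more than one column the pure-matplotlib route would need Axes.stackplot rather than a single fill_between. A minimal sketch, assuming a second made-up column called other that is not in the original data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

index = pd.to_datetime(["2021-12", "2022-01", "2022-02", "2022-03"])
revenue = pd.DataFrame(
    {"value": [1200, 2200, 2800, 4100],
     "other": [300, 400, 350, 500]},  # "other" is hypothetical sample data
    index=index,
)
projection = [1000, 2000, 3000, 4000]

fig, ax = plt.subplots(figsize=(6, 4))

# Stacked areas, one per column, drawn directly by matplotlib
areas = ax.stackplot(revenue.index, revenue["value"], revenue["other"],
                     labels=["value", "other"])

# The line uses the same matplotlib date axis as the areas
ax.plot(revenue.index, projection, color="black", linewidth=3,
        label="projection")

ax.legend(loc="upper left")
fig.autofmt_xdate()
```

Because both artists go through matplotlib's own date handling, swapping the stackplot and plot calls should not change which of them is visible.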

Vertically align time series (plot and barplot) sharing same x-axis in matplotlib

Is there an easy way to align two subplots of a time series of different kinds (plot and barplot) in matplotlib? I use the pandas wrapper since I am dealing with pd.Series objects:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1)
series.head(3).plot(marker='o', ax=axes[0])
series.head(3).plot.bar(ax=axes[1])
plt.tight_layout()
The result is not visually great. It would be great to keep the code simple and:
1. Vertically align the data points in the top plot with the bars in the bottom plot.
2. Share the x-axis of the bar plot with the first plot and hide the x-axis labels of the top plot altogether (but keep grids whenever present).
Based on the ideas thrown in the comments, I think that this is the simplest solution (giving up the pandas API), which is exactly what I needed:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.head(3), marker='o')
axes[1].bar(series.head(3).index, series.head(3))
plt.tight_layout()
Cases with missing values, where the xticks are not plotted daily, may additionally need a fix for the xticks (e.g. plt.xticks(series.head(3).index)).
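Note that pd._testing.makeTimeSeries is a private helper and may not be available in your pandas version. A sketch of the same approach with stand-in data (the random values, the start date and the bar width are my assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed stand-in data: three business days of random values
rng = np.random.default_rng(0)
series = pd.Series(
    rng.standard_normal(3),
    index=pd.date_range("2000-01-03", periods=3, freq="B"),
)

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.index, series.values, marker="o")
axes[1].bar(series.index, series.values, width=0.5)  # width is in days on a date axis

# One tick per data point; with sharex=True this applies to both axes
axes[1].set_xticks(series.index)
fig.autofmt_xdate()
fig.tight_layout()
```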
Thanks for the help!

How to prevent overlapping x-axis labels in sns.countplot

For the plot
sns.countplot(x="HostRamSize",data=df)
I got the following graph with the x-axis labels overlapping. How do I avoid this? Should I change the size of the graph to solve this problem?
Having a DataFrame ds like this
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(136)
l = "1234567890123"
categories = [ l[i:i+5]+" - "+l[i+1:i+6] for i in range(6)]
x = np.random.choice(categories, size=1000,
                     p=np.diff(np.array([0,0.7,2.8,6.5,8.5,9.3,10])/10.))
# a DataFrame with a column named "Column", as used by countplot(x="Column", data=ds)
ds = pd.DataFrame({"Column" : x})
there are several options to make the axis labels more readable.
Change figure size
plt.figure(figsize=(8,4)) # this creates a figure 8 inch wide, 4 inch high
sns.countplot(x="Column", data=ds)
plt.show()
Rotate the ticklabels
ax = sns.countplot(x="Column", data=ds)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()
Decrease Fontsize
ax = sns.countplot(x="Column", data=ds)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=7)
plt.tight_layout()
plt.show()
Of course any combination of those would work equally well.
Setting rcParams
The figure size and the xlabel fontsize can be set globally using rcParams
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["xtick.labelsize"] = 7
It can be useful to put this at the top of a Jupyter notebook so that the settings apply to every figure generated within it. Unfortunately, rotating the xticklabels is not possible using rcParams.
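If changing rcParams globally is too broad, plt.rc_context scopes the same settings to a single block. A small sketch with made-up category counts and a plain matplotlib bar chart instead of countplot:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt

default_size = list(plt.rcParams["figure.figsize"])

# The settings only apply to figures created inside the with block
with plt.rc_context({"figure.figsize": (8, 4), "xtick.labelsize": 7}):
    fig, ax = plt.subplots()
    # Hypothetical labels and counts, for illustration only
    ax.bar(["12345 - 23456", "23456 - 34567", "34567 - 45678"], [70, 210, 370])
    fig.canvas.draw()  # render while the temporary settings are active

# Outside the block the previous defaults are back in place
restored_size = list(plt.rcParams["figure.figsize"])
```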
It's worth noting that the same strategies also apply to seaborn barplot, matplotlib bar plots and pandas plot.bar.
You can rotate the x-axis labels and increase their font size using the xticks function of matplotlib.pyplot.
For example:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10,5))
chart = sns.countplot(x="HostRamSize",data=df)
plt.xticks(
    rotation=45,
    horizontalalignment='right',
    fontweight='light',
    fontsize='x-large'
)
For more such modifications you can refer to this link:
Drawing from Data
If you just want to make sure xticks labels are not squeezed together, you can set a proper fig size and try fig.autofmt_xdate().
This function will automatically align and rotate the labels.
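A minimal sketch of the fig.autofmt_xdate() suggestion, using a plain matplotlib bar chart with made-up labels and counts in place of sns.countplot so that it has no seaborn dependency. The rotation and alignment values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical long category labels and counts, for illustration only
labels = ["12345 - 23456", "23456 - 34567", "34567 - 45678",
          "45678 - 56789", "56789 - 67890", "67890 - 78901"]
counts = [70, 210, 370, 200, 80, 70]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(labels, counts)

# Despite its name, autofmt_xdate works on any x tick labels:
# it rotates and aligns them and leaves room at the bottom of the figure
fig.autofmt_xdate(rotation=40, ha="right")

rotations = [label.get_rotation() for label in ax.get_xticklabels()]
```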
plt.figure(figsize=(15,10)) #adjust the size of plot
ax=sns.countplot(x=df['Location'],data=df,hue='label',palette='mako')
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right") #it will rotate text on x axis
plt.tight_layout()
plt.show()
You can try this code and change the size and rotation according to your needs.
I don't know whether it is an option for you but maybe turning the graphic could be a solution (instead of plotting on x=, do it on y=), such that:
sns.countplot(y="HostRamSize",data=df)

matplotlib - pandas - No xlabel and xticks for twinx axes in subploted figures

I had a similar question, which was answered previously. However, this one differs in that it uses the Pandas package.
Here is my previous question: matplotlib - No xlabel and xticks for twinx axes in subploted figures
So my question, like the last one, is why the xlabel and xticks are not shown for the first-row diagrams when using this Python code.
Two notes:
I also used subplots instead of gridspec, with the same result.
If you uncomment any of the commented lines in this code, which plot with Pandas on the axes of each diagram, the xlabel and xticks will disappear!
import matplotlib.pyplot as plt
import matplotlib.gridspec as gspec
import numpy as np
import pandas as pd
from math import sqrt
fig = plt.figure()
gs = gspec.GridSpec(2, 2)
gs.update(hspace=0.7, wspace=0.7)
ax1 = plt.subplot(gs[0, 0])
ax2 = plt.subplot(gs[0, 1])
ax3 = plt.subplot(gs[1, 0])
ax4 = plt.subplot(gs[1, 1])
x1 = np.linspace(1,10,10)
ax12 = ax1.twinx()
ax1.set_xlabel("Fig1")
ax12.set_xlabel("Fig1")
ax1.set_ylabel("Y1")
ax12.set_ylabel("Y2")
# pd.Series(range(10)).plot(ax=ax1)
ax12.plot(x1, x1**3)
ax22 = ax2.twinx()
ax2.set_xlabel("Fig2")
ax22.set_xlabel("Fig2")
ax2.set_ylabel("Y3")
ax22.set_ylabel("Y4")
# pd.Series(range(10)).plot(ax=ax2)
ax22.plot(x1, x1**0.5)
ax32 = ax3.twinx()
ax3.set_xlabel("Fig3")
ax32.set_xlabel("Fig3")
ax3.set_ylabel("Y5")
ax32.set_ylabel("Y6")
# pd.Series(range(200)).plot(ax=ax3)
ax42 = ax4.twinx()
ax4.set_xlabel("Fig4")
ax42.set_xlabel("Fig4")
ax4.set_ylabel("Y7")
ax42.set_ylabel("Y8")
# pd.Series(range(10)).plot(ax=ax42)
plt.subplots_adjust(wspace=0.8, hspace=0.8)
plt.show()
I just got the same issue because I was mixing plots made with matplotlib and made with Pandas.
Avoid plotting with Pandas here. This is how you could replace:
pd.Series(range(10)).plot(ax=ax42)
with
ax42.plot(pd.Series(range(10)))
As Scimonster mentioned above, for me it worked when I plotted all the pandas plots before creating the twinx axes.
I had a few plots on twinx axes which also came from Pandas DataFrame objects (x,t plots). I created separate lists before starting the plot and then used them after plotting the first pandas plots.
To summarize, my workflow was:
1. Create lists for the twinx plots.
2. Open the plot and draw all pandas plots on the normal axes.
3. Create the twinx axes.
4. Plot the lists on the twinx axes.
Fortunately, this flow works for me.
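The workflow above can be sketched roughly as follows. The data, the two-row layout and the colour are my assumptions; the point is only the order of the calls.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 1. Prepare the data for the twin axes up front
x1 = np.linspace(1, 10, 10)
twin_y = x1 ** 3

fig, (ax1, ax2) = plt.subplots(2, 1)

# 2. Draw the pandas plot on the normal axes first
pd.Series(range(10)).plot(ax=ax1)
ax1.set_xlabel("Fig1")
ax1.set_ylabel("Y1")

# 3. Only now create the twin axes
ax12 = ax1.twinx()

# 4. Plot the prepared data on the twin axes with matplotlib
ax12.plot(x1, twin_y, color="tab:red")
ax12.set_ylabel("Y2")

fig.tight_layout()
```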
