I have a function that plots a boxplot and a histogram side by side in two columns. I would like to change my code to make it 4 columns to shorten the output. I have played with the code and I am missing something. I can change it to 4 columns, but then the right 2 are blank and everything is in the two left columns.
I have tried changing the line to
ax_box, ax_hist, ax_box2, ax_hist2 = axs[i*ncols], axs[i*ncols+1], axs[i*ncols+2], axs[i*ncols+3]
instead of
ax_box, ax_hist = axs[i*ncols], axs[i*ncols+1]
among other iterations of changing the indexes on the columns.
I am new to Python and I suspect I am missing something that will be obvious to more experienced people.
My code is:
import matplotlib.pyplot as plt
import seaborn as sns

def hist_box_all1(data, bins):
    ncols = 2  # Number of columns for subplots
    nrows = len(data.columns)  # Number of rows for subplots
    height_ratios = [0.75, 0.75] * (nrows // 2) + [0.75] * (nrows % 2)
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15,4*nrows), gridspec_kw={'height_ratios': height_ratios})
    axs = axs.ravel()  # Flatten the array of axes
    for i, feature in enumerate(data.columns):
        ax_box, ax_hist = axs[i*ncols], axs[i*ncols+1]
        sns.set(font_scale=1)  # Set the size of the label
        x = data[feature]
        n = data[feature].mean()  # Get the mean for the legend
        m = data[feature].median()  # Get the median for the legend
        sns.boxplot(
            x=x,
            ax=ax_box,
            showmeans=True,
            meanprops={
                "marker": "o",
                "markerfacecolor": "white",
                "markeredgecolor": "black",
                "markersize": "7",
            },
            color="teal",
        )
        sns.histplot(
            x=x,
            bins=bins,
            kde=True,
            stat="density",
            ax=ax_hist,
            color="darkorchid",
            edgecolor="black",
        )
        ax_hist.axvline(
            data[feature].mean(), color="teal", label="mean=%f" % n
        )  # Draw the mean line
        ax_hist.axvline(
            data[feature].median(), color="red", label="median=%f" % m
        )  # Draw the median line
        ax_box.set(yticks=[])  # Format the y axis label
        #sns.despine(ax=ax_hist)  # Remove the axis lines on the hist plot
        #sns.despine(ax=ax_box, left=True)  # Remove the axis lines on the box plot
        ax_hist.legend(loc="upper right")  # Place the legend in the upper right corner
        plt.suptitle(feature)
    plt.tight_layout()
Here is a screen shot of the output
Here is a screen shot of the data
I find this way of using matplotlib.pyplot and add_subplot() more convenient to add multiple subplots in a plot.
import matplotlib.pyplot as plt
fig = plt.figure(figsize=[22, 6])
ax = fig.add_subplot(1, 2, 1) # (1,2) means 1 row 2 columns, '1' at the final index indicates the order of this subplot i.e., first subplot on left side
ax.hist(somedata_for_left_side)
ax.set_title('Title 01')
ax = fig.add_subplot(1, 2, 2) # '2' at the final index indicates the order of this subplot i.e., second subplot on the right side
ax.hist(somedata_for_right_side)
ax.set_title('Title 02')
Example output plot (different titles and plot type):
You can try to adapt this to your code. Let me know if this helps.
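For the four-column layout asked about in the question, one possible adaptation is to keep a flat array of axes but give each feature two consecutive slots, so that two box/histogram pairs share a row. The number of rows then becomes half the number of features (rounded up) rather than one row per feature. The sketch below is only an illustration under some assumptions: `hist_box_grid` and `pairs_per_row` are hypothetical names, the input is a plain dict of arrays rather than a DataFrame, and matplotlib's own `boxplot` and `hist` stand in for the seaborn calls.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def hist_box_grid(columns, bins=20, pairs_per_row=2):
    """Draw one boxplot/histogram pair per entry of `columns` (name -> 1-D data)."""
    names = list(columns)
    ncols = 2 * pairs_per_row                       # each pair needs two columns
    nrows = math.ceil(len(names) / pairs_per_row)   # rows needed for all pairs
    fig, axs = plt.subplots(nrows, ncols, figsize=(15, 3 * nrows), squeeze=False)
    axs = axs.ravel()
    for i, name in enumerate(names):
        # Feature i always occupies flat slots 2*i and 2*i + 1.
        ax_box, ax_hist = axs[2 * i], axs[2 * i + 1]
        x = np.asarray(columns[name], dtype=float)
        ax_box.boxplot(x, vert=False, showmeans=True)
        ax_box.set(yticks=[], title=name)
        ax_hist.hist(x, bins=bins, density=True, color="darkorchid", edgecolor="black")
        ax_hist.axvline(x.mean(), color="teal", label=f"mean={x.mean():.2f}")
        ax_hist.axvline(np.median(x), color="red", label=f"median={np.median(x):.2f}")
        ax_hist.legend(loc="upper right")
    # Hide any slots left over when the number of features is odd.
    for ax in axs[2 * len(names):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axs


rng = np.random.default_rng(0)
demo = {name: rng.normal(size=200) for name in ["a", "b", "c"]}
fig, axs = hist_box_grid(demo)
```

The original indexing `axs[i*ncols]` jumps a whole row per feature, which is why raising `ncols` to 4 left the right-hand columns empty; indexing by `2 * i` fills the grid slot by slot instead.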
Since it's a bit hard to see visually what your error is. Can you add some screenshots for easier understanding?
I have data with lots of x values around zero and only a few as you go up to around 950,
I want to create a plot with a non-linear x axis so that the relationship can be seen in a 'straight line' form. Like seen in this example,
I have tried using plt.xscale('log') but it does not achieve what I want.
I have not been able to use the log scale function with a scatter plot as it then only shows 3 values rather than the thousands that exist.
I have tried to work around it using
plt.plot(retper, aep_NW[y], marker='o', linewidth=0)
to replicate the scatter function which plots but does not show what I want.
plt.figure(1)
plt.scatter(rp,aep,label="SSI sum")
plt.show()
Image 3:
plt.figure(3)
plt.scatter(rp, aep)
plt.xscale('log')
plt.show()
Image 4:
plt.figure(4)
plt.plot(rp, aep, marker='o', linewidth=0)
plt.xscale('log')
plt.show()
ADDITION:
Hi thank you for the response.
I think you are right that my x axis is truncated but I'm not sure why or how...
I'm not really sure what to post code wise as the data is all large and coming from a server so can't really give you the data to see it with.
Basically aep_NW is a one dimensional array with 951 elements, values from 0-~140, with most values being small and only a few larger values. The data represents a storm severity index for 951 years.
Then I want the x axis to be the return period for these values, so basically I made an rp array of the same size, which is given values from 951 downwards, decreasing by a half each time.
I then sort the aep_NW values from lowest to highest, with the highest value being associated with the largest return period (951), then the second highest aep_NW value associated with the second largest return period (475.5), etc.
So when I plot it I need the x axis scale to be similar to the example you showed above, or to the first image I attached originally.
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1]/2
    i = i - 1
y = np.argsort(aep_NW)
fig, ax = plt.subplots()
ax.scatter(rp,aep_NW[y],label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
plt.title("AEP for NW Europe: total loss per entire extended winter season")
plt.show()
It looks like in your "Image 3" the x axis is truncated, so that you don't see the data you are interested in. It appears this is due to there being 0's in your 'rp' array. I updated the examples to show the error you are seeing, one way to exclude the zeros, and one way to clip them and show them on a different scale.
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (IPython/Jupyter magic; uncomment only when running in a notebook)
n = 100
numseas = np.logspace(-5, 3, n)
aep_NW = np.linspace(0, 140, n)
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1] /2
    i = i - 1
y = np.argsort(aep_NW)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.scatter(rp, aep_NW[y], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
ax = axes[1]
rp = np.array(rp)[y]
mask = rp > 0
ax.scatter(rp[mask], aep_NW[y][mask], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period (0 values excluded)")
ax = axes[2]
log2_clipped_rp = np.log2(rp.clip(2**-100, None))[y]
ax.scatter(log2_clipped_rp, aep_NW[y], label="SSI sum")
xticks = list(range(-110, 11, 20))
xticklabels = [f'$2^{{{i}}}$' for i in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.set_xlabel("log$_2$ Return period (values clipped to 2$^{-100}$)")
plt.show()
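To see where the zero in 'rp' comes from, the loop from the question can be run in isolation. Because the loop stops when `i` reaches 0, index 0 is never assigned and keeps its initial value of 0, which a log axis cannot display. The sketch below reproduces that and compares it with a vectorized variant that fills every slot; `rp_from_loop` and `rp_vectorized` are hypothetical helper names, and whether a pure halving sequence is the right definition of return period for this data is left as an open assumption.

```python
import numpy as np


def rp_from_loop(n):
    """Reproduce the question's loop: halve from n downwards, stopping before index 0."""
    rp = [0] * n
    i = n - 1
    rp[i] = n
    i = i - 1
    while i != 0:
        rp[i] = rp[i + 1] / 2
        i = i - 1
    return np.array(rp, dtype=float)


def rp_vectorized(n):
    """Same halving sequence, but every slot (including index 0) is filled."""
    return n * 0.5 ** np.arange(n - 1, -1, -1)


loop_rp = rp_from_loop(10)
vec_rp = rp_vectorized(10)
print(loop_rp[0], vec_rp[0])  # the first entry is the only place the two differ
```

With no zeros in the array, `ax.set_xscale('log')` has a strictly positive range to work with and no masking is needed.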
Currently my chart is showing only the main big chart on the left.
However, I now want to add the two smaller plots to the right-hand side of my main plot; with each individual set of data.
I am struggling with subplots to figure out how to do this. My photo below shows my desired output.
import os
from glob import glob
import pandas as pd
import matplotlib.pyplot as plt
filenamesK = glob("C:/Users/Ke*.csv")
filenamesZ = glob("C:/Users/Ze*.csv")
K_Z_Averages = {'K':[], 'Z':[]}
# We will create a function for plotting, instead of nesting lots of if statements within a long for-loop.
def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)  # Read in the csv.
    df.columns = ['sample','Time','ms','Temp1']  # Set the column names
    df = df.astype(str)  # Set the data type as a string.
    # Strip the literal "+ " prefix and any remaining spaces, then convert to float
    df["Temp1"] = df["Temp1"].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)
    # Take the average of the data from the Temp1 column, starting from sample 60 until sample 150.
    avg_Temp1 = df.iloc[60-1:150+1]["Temp1"].mean()
    # Append this average to K_Z_Averages, which holds one list of averages for the K files and one for the Z files.
    # glob returns the whole path, so take the basename; its first letter ('K' or 'Z') selects the list.
    K_Z_Averages[os.path.basename(filename)[0]].append(avg_Temp1)
    fig_ax.plot(df[["Temp1"]], color=color)
fig, ax = plt.subplots(figsize=(20, 15))
for f in filenamesK:
    plot_data(f, ax, 'blue')
for f in filenamesZ:
    plot_data(f, ax, 'red')
plt.show()
@max's answer is fine, but something you can also do with matplotlib>=3.3 is
import matplotlib.pyplot as plt
fig = plt.figure(constrained_layout=True)
axs = fig.subplot_mosaic([['Left', 'TopRight'],['Left', 'BottomRight']],
gridspec_kw={'width_ratios':[2, 1]})
axs['Left'].set_title('Plot on Left')
axs['TopRight'].set_title('Plot Top Right')
axs['BottomRight'].set_title('Plot Bottom Right')
Note how the name 'Left' appears twice to indicate that this subplot takes up two slots in the layout. Also note the use of width_ratios.
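As a rough illustration of how the mosaic could be filled for the K/Z case, the sketch below draws every trace on the large left axes and repeats each group on its own small axes. The traces are synthetic stand-ins (the CSV files are not available here), and the number of traces per group is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for the K and Z temperature traces (assumed: two files each).
k_traces = [20 + 0.1 * rng.normal(size=150).cumsum() for _ in range(2)]
z_traces = [22 + 0.1 * rng.normal(size=150).cumsum() for _ in range(2)]

fig = plt.figure(figsize=(12, 6), constrained_layout=True)
axs = fig.subplot_mosaic([['Left', 'TopRight'], ['Left', 'BottomRight']],
                         gridspec_kw={'width_ratios': [2, 1]})

# Every trace goes on the main plot; each group is repeated on its own small plot.
for trace in k_traces:
    axs['Left'].plot(trace, color='blue')
    axs['TopRight'].plot(trace, color='blue')
for trace in z_traces:
    axs['Left'].plot(trace, color='red')
    axs['BottomRight'].plot(trace, color='red')

axs['Left'].set_title('K and Z')
axs['TopRight'].set_title('K only')
axs['BottomRight'].set_title('Z only')
```

In the question's code, this would amount to passing two axes to `plot_data` (the main one and the group-specific one) instead of a single `fig_ax`.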
This is a tricky question. Essentially, you can place a grid on a figure (add_gridspec()) and then open subplots (add_subplot()) in and over different grid elements.
import matplotlib.pyplot as plt
# open figure
fig = plt.figure()
# add grid specifications
gs = fig.add_gridspec(2, 3)
# open axes/subplots
axs = []
axs.append( fig.add_subplot(gs[:,0:2]) ) # large subplot (2 rows, 2 columns)
axs.append( fig.add_subplot(gs[0,2]) ) # small subplot (1st row, 3rd column)
axs.append( fig.add_subplot(gs[1,2]) ) # small subplot (2nd row, 3rd column)
Edit: The graph is fixed now, but I am having trouble plotting the legend: it only shows an entry for one of the plots, as seen in the picture below.
I am trying to plot a double axis graph with twinx but I am facing some difficulties as seen in the picture below.
Any input is welcomed! If you require any additional information, I am happy to provide them to you.
Compare this with the original, before the secondary axis was plotted.
I am unsure why my graph looks like that: before I plotted my secondary y axis (the pink line), the closing value graph could be seen perfectly, but now it appears cut off.
It may be due to my data, which is provided below.
Link to testing1.csv: https://filebin.net/ou93iqiinss02l0g
Code I have currently:
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig = plt.figure()
fig, ax5 = plt.subplots()
ax6 = ax5.twinx()
x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]
curve1 = ax5.plot(x, y, label="Singapore", color = "c")
curve2 = ax6.plot(x, z, label = "Face Mask Compliance", color = "m")
curves = [curve1, curve2]
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
ax5.grid #not sure what this line does actually
# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show();
Initially, I thought it was due to my Excel file having entirely blank rows, but I have since removed those rows; the result can be found here
Also, I have tried to interpolate, but somehow it doesn't work. Any suggestions on this are very welcome.
Only rows that were entirely NaN were dropped. There are still a lot of rows containing NaN.
In order for matplotlib to draw connecting lines between two data points, the points must be consecutive.
The plot API isn't connecting the data between the NaN values
This can be dealt with by converting the pandas.Series to a DataFrame, and using .dropna.
See that x has been dropped, because it will not match the index length of y or z. They are shorter after .dropna.
y is now a separate dataframe, where .dropna is used.
z is also a separate dataframe, where .dropna is used.
The x-axis for the plot are the respective indices.
import pandas as pd
import matplotlib.pyplot as plt
# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)
# define figure
fig, ax5 = plt.subplots(figsize=(8, 6))
ax6 = ax5.twinx()
# select specific columns to plot and drop additional NaN
y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()
# add plots with markers
curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')
# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
# rotate xticks
ax5.xaxis.set_tick_params(rotation=45)
# add a grid to ax5
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
# create a legend for both axes
curves = curve1 + curve2
labels = [l.get_label() for l in curves]
ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show()
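The effect of the NaN gaps can be reproduced without the CSV. The sketch below uses a small made-up array in which every other value is NaN: plotted as-is, no two valid points are adjacent, so no line segments appear; after dropping the NaN entries (together with the matching x values) the points are consecutive and get connected. The data here is purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(8)
y = np.array([1.0, np.nan, 2.0, np.nan, 3.0, np.nan, 2.5, np.nan])

fig, (ax_raw, ax_clean) = plt.subplots(1, 2, figsize=(8, 3))

# With a NaN between every pair of values there are no consecutive valid points,
# so no line segments are drawn; only the markers show where the data is.
ax_raw.plot(x, y, marker='o')
ax_raw.set_title("with NaN")

# Dropping the NaN entries (and the matching x values) leaves consecutive
# points, which matplotlib connects with a line.
valid = ~np.isnan(y)
ax_clean.plot(x[valid], y[valid], marker='o')
ax_clean.set_title("NaN dropped")
```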
Because of the nature of what I am plotting, I want subplots akin to nested tables.
I'm not sure how to ask the question clearly, so I'll add some pictures instead, which I hope illustrate the problem.
What I have:
What I want:
Current (shortened) code looks something like this:
fig, axes = plt.subplots(nrows=5, ncols=4)
fig.suptitle(title, fontsize='x-large')
data0.plot(x=data0.x, y=data0.y, ax=axes[0,0],kind='scatter')
data1.plot(x=data1.x, y=data1.y, ax=axes[0,1],kind='scatter')
axes[0,0].set_title('title 0')
axes[0,1].set_title('title 1')
I can't figure out how to set a title for axes[0,0] and [0,1] together. I can't find anything in the documentation either. I am not fond of fussing around with tables in latex to achieve this. Any pointers?
Setting the figure title using fig.suptitle() and the axes (subplot) titles using ax.set_title() is rather straightforward. For an intermediate, column-spanning title, however, there is indeed no built-in option.
One way to solve this is to use plt.figtext() at the appropriate positions. One needs to leave some additional space for that title, e.g. by using fig.subplots_adjust, and then find appropriate positions for the figtext.
In the example below, we use the bounding boxes of the axes that the title should span to find a centered horizontal position. The vertical position is a best guess.
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
y = np.random.rand(10,8)
colors=["b", "g", "r", "violet"]
fig, axes = plt.subplots(nrows=2, ncols=4, sharex=True, sharey=True, figsize=(8,5))
#set a figure title on top
fig.suptitle("Very long figure title over the whole figure extent", fontsize='x-large')
# adjust the subplots, i.e. leave more space at the top to accommodate the additional titles
fig.subplots_adjust(top=0.78)
ext = []
# loop over the columns (j) and rows (i) to populate subplots
for j in range(4):
    for i in range(2):
        axes[i,j].scatter(x, y[:,4*i+j], c=colors[j], s=25)
    # each axes in the top row gets its own axes title
    axes[0,j].set_title('title {}'.format(j+1))
    # save the axes bounding boxes for later use
    ext.append([axes[0,j].get_window_extent().x0, axes[0,j].get_window_extent().width ])
# this is optional
# from the axes bounding boxes calculate the optimal position of the column spanning title
inv = fig.transFigure.inverted()
width_left = ext[0][0]+(ext[1][0]+ext[1][1]-ext[0][0])/2.
left_center = inv.transform( (width_left, 1) )
width_right = ext[2][0]+(ext[3][0]+ext[3][1]-ext[2][0])/2.
right_center = inv.transform( (width_right, 1) )
# set column spanning title
# the first two arguments to figtext are x and y coordinates in the figure system (0 to 1)
plt.figtext(left_center[0],0.88,"Left column spanning title", va="center", ha="center", size=15)
plt.figtext(right_center[0],0.88,"Right column spanning title", va="center", ha="center", size=15)
axes[0,0].set_ylim([0,1])
axes[0,0].set_xlim([0,10])
plt.show()
New in matplotlib 3.4.0
You can use subfigures if you have matplotlib version >= 3.4.0 (as mentioned in a comment by @ra0).
Once the subfigures are created, you can treat them exactly as you would a normal figure and create subplots and add suptitles.
Documentation and examples on subfigures.
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
y = np.random.rand(10, 8)
colors = ["b", "g", "r", "violet"]
fig = plt.figure(figsize=(8, 5), constrained_layout=True)
subfigs = fig.subfigures(1, 2)
titles = ["Left spanning title", "Right spanning title"]
for i, subfig in enumerate(subfigs):
    axes = subfig.subplots(2, 2)
    for j, row in enumerate(axes):
        for k, ax in enumerate(row):
            ax.scatter(x, y[:, i*4 + j*2 + k], color=colors[i*2 + k], s=25)
            ax.set_xlim([0, 10])
            ax.set_ylim([0, 1])
            if j == 0:
                ax.set_title(f"fig{i}, row{j}, col{k}")
    subfig.suptitle(titles[i])
fig.suptitle("Very long figure title over the whole figure extent", fontsize='x-large')
plt.show()
Is it possible to embed a changing number of plots in a matplotlib axis? For example, the inset_axes method is used to place inset axes inside parent axes:
However, I have several rows of plots and I want to include some inset axes inside the last axis object of each row.
fig, ax = plt.subplots(2,4, figsize=(15,15))
for i in range(2):
    ax[i][0].plot(np.random.random(40))
    ax[i][1].plot(np.random.random(40))
    ax[i][2].plot(np.random.random(40))
    # number of inset axes
    number_inset = 5
    for j in range(number_inset):
        ax[i][3].plot(np.random.random(40))
Here instead of the 5 plots drawn in the last column, I want several inset axes containing a plot. Something like this:
The reason for this is that every row refers to a different item to be plotted and the last column is supposed to contain the components of such item. Is there a way to do this in matplotlib or maybe an alternative way to visualize this?
Thanks
As @hitzg mentioned, the most common way to accomplish something like this is to use GridSpec. GridSpec creates an imaginary grid object that you can slice to produce subplots. It's an easy way to align fairly complex layouts that you want to follow a regular grid.
However, it may not be immediately obvious how to use it in this case. You'll need to create a GridSpec with numrows * numinsets rows by numcols columns and then create the "main" axes by slicing it with intervals of numinsets.
In the example below (2 rows, 4 columns, 3 insets), we'd slice by gs[:3, 0] to get the upper left "main" axes, gs[3:, 0] to get the lower left "main" axes, gs[:3, 1] to get the next upper axes, etc. For the insets, each one is gs[i, -1].
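The slicing arithmetic can be checked on its own before any figure is built. The sketch below only computes which grid rows and column each axes would occupy; `inset_grid_slices` is a hypothetical helper, not part of matplotlib.

```python
def inset_grid_slices(numrows, numcols, numinsets):
    """Return the GridSpec (rows, column) index pairs for the layout described above.

    `main` maps (row, col) of each "main" axes to a (row-slice, column) pair;
    `insets` maps each row to the (grid-row, column) pairs of its inset axes.
    """
    main, insets = {}, {}
    last_col = numcols - 1
    for i in range(numrows):
        # Each "main" axes spans numinsets consecutive grid rows.
        rows = slice(i * numinsets, (i + 1) * numinsets)
        for j in range(last_col):
            main[(i, j)] = (rows, j)
        # Each inset takes exactly one grid row in the last column.
        insets[i] = [(i * numinsets + k, last_col) for k in range(numinsets)]
    return main, insets


main, insets = inset_grid_slices(2, 4, 3)
print(main[(0, 0)], main[(1, 0)])
print(insets[1])
```

For 2 rows and 3 insets the grid has 6 rows, so the lower-left "main" axes covers rows 3 to 5, which is the same region as gs[3:, 0].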
As a complete example:
import numpy as np
import matplotlib.pyplot as plt
def build_axes_with_insets(numrows, numcols, numinsets, **kwargs):
    """
    Makes a *numrows* x *numcols* grid of subplots with *numinsets* subplots
    embedded as "sub-rows" in the last column of each row.
    Returns a figure object and a *numrows* x *numcols* object ndarray where
    all but the last column consists of axes objects, and the last column is a
    *numinsets* length object ndarray of axes objects.
    """
    fig = plt.figure(**kwargs)
    gs = plt.GridSpec(numrows*numinsets, numcols)
    axes = np.empty([numrows, numcols], dtype=object)
    for i in range(numrows):
        # Add "main" axes...
        for j in range(numcols - 1):
            axes[i, j] = fig.add_subplot(gs[i*numinsets:(i+1)*numinsets, j])
        # Add inset axes...
        # (the last column holds its own object array, which must exist before it is indexed)
        axes[i, -1] = np.empty(numinsets, dtype=object)
        for k in range(numinsets):
            m = k + i * numinsets
            axes[i, -1][k] = fig.add_subplot(gs[m, -1])
    return fig, axes
def plot(axes):
    """Recursive plotting function just to put something on each axes."""
    for ax in axes.flat:
        data = np.random.normal(0, 1, 100).cumsum()
        try:
            ax.plot(data)
            ax.set(xticklabels=[], yticklabels=[])
        except AttributeError:
            # ax is the nested array of inset axes, so recurse into it
            plot(ax)
fig, axes = build_axes_with_insets(2, 4, 3, figsize=(12, 6))
plot(axes)
fig.tight_layout()
plt.show()
This is what I did to obtain the same result without setting the number of inset plots in advance.
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
fig = plt.figure(figsize=(12,6))
nrows = 2
ncols = 4
# changing the shape of GridSpec's output
outer_grid = gridspec.GridSpec(nrows, ncols)
grid = []
for i in range(nrows*ncols):
    grid.append(outer_grid[i])
outer_grid = np.array(grid).reshape(nrows,ncols)
for i in range(nrows):
    inner_grid_1 = gridspec.GridSpecFromSubplotSpec(1, 1,
        subplot_spec=outer_grid[i][0])
    ax = plt.Subplot(fig, inner_grid_1[0])
    ax.plot(np.random.normal(0,1,50).cumsum())
    fig.add_subplot(ax)
    inner_grid_2 = gridspec.GridSpecFromSubplotSpec(1, 1,
        subplot_spec=outer_grid[i][1])
    ax2 = plt.Subplot(fig, inner_grid_2[0])
    ax2.plot(np.random.normal(0,1,50).cumsum())
    fig.add_subplot(ax2)
    inner_grid_3 = gridspec.GridSpecFromSubplotSpec(1, 1,
        subplot_spec=outer_grid[i][2])
    ax3 = plt.Subplot(fig, inner_grid_3[0])
    ax3.plot(np.random.normal(0,1,50).cumsum())
    fig.add_subplot(ax3)
    # this value can be set based on some other calculation depending
    # on each row
    numinsets = 3
    inner_grid_4 = gridspec.GridSpecFromSubplotSpec(numinsets, 1,
        subplot_spec=outer_grid[i][3])
    # Adding subplots to the last inner grid
    for j in range(inner_grid_4.get_geometry()[0]):
        ax4 = plt.Subplot(fig, inner_grid_4[j])
        ax4.plot(np.random.normal(0,1,50).cumsum())
        fig.add_subplot(ax4)
# Removing labels
for ax in fig.axes:
    ax.set(xticklabels=[], yticklabels=[])
fig.tight_layout()
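The same nested-grid idea can be written more compactly with `fig.add_gridspec` and `SubplotSpec.subgridspec`, which also makes it easy to use a different number of insets in each row. This is only a sketch of that variation: the `insets_per_row` list is an assumed input standing in for whatever per-row calculation decides the number of components.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
insets_per_row = [2, 4]  # assumed: a different number of components in each row
ncols = 4

fig = plt.figure(figsize=(12, 6))
outer = fig.add_gridspec(len(insets_per_row), ncols)

for i, numinsets in enumerate(insets_per_row):
    # One ordinary axes in every column except the last.
    for j in range(ncols - 1):
        ax = fig.add_subplot(outer[i, j])
        ax.plot(rng.normal(size=50).cumsum())
    # Split the last cell of this row into numinsets stacked axes.
    inner = outer[i, -1].subgridspec(numinsets, 1)
    for k in range(numinsets):
        ax = fig.add_subplot(inner[k, 0])
        ax.plot(rng.normal(size=50).cumsum())

# Hide tick labels so the small insets stay readable.
for ax in fig.axes:
    ax.tick_params(labelbottom=False, labelleft=False)
```

Because each row gets its own inner grid, the insets of one row do not have to line up with those of another.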