I have a histogram with four different sets of objects in each bin, which are currently drawn on top of each other. Instead, I need to plot them side by side within the same histogram bin (similar to the top-left plot in https://matplotlib.org/3.1.1/gallery/statistics/histogram_multihist.html):
bins=np.logspace(np.log10(0.01),np.log10(20), 11)
plt.hist(a[nosfr]/1e+11, bins, color='red', fill=True, linewidth=2, density=True, histtype='bar', edgecolor='k')
plt.hist(a[highsfr]/1e+11, bins, color='orange', fill=True, linewidth=2, density=True, histtype='bar', edgecolor='k')
plt.hist(b[mynosfr]/1e+11, bins, color='blue', edgecolor='k', fill=True, linewidth=2, density=True, alpha=0.7, histtype='bar')
plt.hist(b[myhighsfr]/1e+11, bins, color='cyan', edgecolor='k', fill=True, linewidth=2, density=True, alpha=0.7, histtype='bar')
plt.xscale('log')
plt.xlim(2e-2, 2e+1)
The masks [nosfr], [highsfr], etc. select objects that meet different criteria within the same sample (a or b). All the examples I've looked at are slightly different from what I need, and I can't find the right approach. Thanks!
Call the plot method on your DataFrame with the kind parameter set to 'bar'.
import numpy as np
import pandas as pd
x = np.random.random((10, 4))
df = pd.DataFrame(x, columns=['a', 'b', 'c', 'd'])
df.plot(kind='bar')
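Note that df.plot(kind='bar') draws one bar per value rather than binning the data. If binned counts are what is needed, a minimal sketch is to pass the four arrays as a list to a single hist call, which places the bars side by side within each bin, as in the gallery example linked in the question. The arrays below are random placeholders (an assumption) standing in for a[nosfr], a[highsfr], b[mynosfr] and b[myhighsfr].

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Placeholder samples (assumption): stand-ins for a[nosfr] / 1e11, etc.
samples = [rng.lognormal(mean=0.0, sigma=1.0, size=200) for _ in range(4)]
colors = ["red", "orange", "blue", "cyan"]

# Same logarithmic bin edges as in the question
bins = np.logspace(np.log10(0.01), np.log10(20), 11)

fig, ax = plt.subplots()
# A single hist call with a list of arrays draws one bar per dataset in each bin
counts, edges, patches = ax.hist(
    samples, bins=bins, color=colors, edgecolor="k", density=True, histtype="bar"
)
ax.set_xscale("log")
ax.set_xlim(2e-2, 2e1)
```

The colors list needs one entry per dataset; alpha is not needed here because the bars no longer overlap.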
I am trying to do EDA on a Kaggle dataset.
I made a figure with three subplots and drew three vertical lines at the mean, median and mode. Is there any way to show these three lines in a legend?
This is my code:
def plott(data):
    fig, axes = plt.subplots(3, sharex=True, figsize=(15, 15), gridspec_kw={"height_ratios": (1, 0.2, 0.6)})
    fig.suptitle('Spread of Data for ' + data.name, fontsize=20, fontweight='bold')
    sns.histplot(data, kde=True, binwidth=1, ax=axes[0])
    sns.boxplot(x=data, orient='h', ax=axes[1])
    sns.violinplot(x=data, ax=axes[2])
    axes[0].set_xlabel('')
    axes[1].set_xlabel('')
    axes[2].set_xlabel('')
    axes[0].axvline(data.mean(), color='r', linewidth=2, linestyle='solid')
    axes[0].axvline(data.median(), color='r', linewidth=2, linestyle='dashed')
    axes[0].axvline(data.mode()[0], color='r', linewidth=2, linestyle='dotted')
    axes[1].axvline(data.mean(), color='r', linewidth=2, linestyle='solid')
    axes[1].axvline(data.median(), color='r', linewidth=2, linestyle='dashed')
    axes[1].axvline(data.mode()[0], color='r', linewidth=2, linestyle='dotted')
    axes[2].axvline(data.mean(), color='r', linewidth=2, linestyle='solid')
    axes[2].axvline(data.median(), color='r', linewidth=2, linestyle='dashed')
    axes[2].axvline(data.mode()[0], color='r', linewidth=2, linestyle='dotted')
    axes[0].tick_params(axis='both', which='both', labelsize=10, labelbottom=True)
    axes[1].tick_params(axis='both', which='both', labelsize=10, labelbottom=True)
    axes[2].tick_params(axis='both', which='both', labelsize=10, labelbottom=True)

plott(df['Age'])
Is there a way to add a legend for the three vertical lines, with each line style labeled with its value?
Also, how can I show more tick values on the x-axis of all three graphs, for example at intervals of 2 or 5 years?
Thanks
Give each axvline a label argument, then call plt.legend() after plotting.
Example:
import matplotlib.pyplot as plt
plt.plot([1,2,3],[1,2,3],label="Test")
plt.axvline(x=0.22058956, label="Test2", color="red")
plt.legend()
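Applied to the three-subplot figure from the question, this might look like the sketch below. It assumes random placeholder data instead of df['Age'] and plain histograms instead of the seaborn plots; the stats dictionary is a hypothetical helper. For the tick spacing, MultipleLocator from matplotlib.ticker places a major tick at every multiple of the given step.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
import pandas as pd

# Placeholder data (assumption): stands in for df['Age'] from the question
rng = np.random.default_rng(0)
age = pd.Series(rng.integers(18, 80, size=300), name="Age")

# Hypothetical helper mapping: statistic name -> (value, line style)
stats = {
    "Mean": (float(age.mean()), "solid"),
    "Median": (float(age.median()), "dashed"),
    "Mode": (float(age.mode()[0]), "dotted"),
}

fig, axes = plt.subplots(3, sharex=True, figsize=(8, 8))
for ax in axes:
    ax.hist(age, bins=20, alpha=0.5)  # placeholder for the seaborn plots
    for name, (value, style) in stats.items():
        # The label is what the legend will display for this line
        ax.axvline(value, color="r", linewidth=2, linestyle=style,
                   label=f"{name}: {value:.1f}")
    ax.xaxis.set_major_locator(MultipleLocator(5))  # one tick every 5 years
    ax.tick_params(labelbottom=True)

# Only the labeled lines appear in the legend; one legend on the top axes is enough
axes[0].legend()
```

Calling legend on the Axes object rather than through plt makes it explicit which subplot receives the legend.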
I have a figure with two subplots in log-log scale. I would like to plot the minor ticks as well. Even though I have applied different solutions from Stack Overflow, my figure does not look as I want.
One of the solutions I have modified comes from ImportanceOfBeingErnest and the code looks like this:
fig, ((ax1, ax2)) = plt.subplots(1, 2, figsize=(8, 5), sharey=True)
# First plot
ax1.loglog(PLOT1['X'], PLOT1['Y'], 'o',
markerfacecolor='red', markeredgecolor='red', markeredgewidth=1,
markersize=1.5, alpha=0.2)
ax1.set(xlim=(1e-4, 1e4), ylim=(1e-8, 1e2))
ax1.set_xscale("log"); ax1.set_yscale("log")
ax1.xaxis.set_major_locator(matplotlib.ticker.LogLocator(base=10.0, numticks=25))
ax1.yaxis.set_major_locator(matplotlib.ticker.LogLocator(base=10.0, numticks=25))
locmaj = matplotlib.ticker.LogLocator(base=10,numticks=25)
ax1.xaxis.set_major_locator(locmaj)
locmin = matplotlib.ticker.LogLocator(base=10.0,subs=(0.2,0.4,0.6,0.8),numticks=25)
ax1.xaxis.set_minor_locator(locmin)
ax1.xaxis.set_minor_formatter(matplotlib.ticker.NullFormatter())
locmaj = matplotlib.ticker.LogLocator(base=10,numticks=25)
ax1.yaxis.set_major_locator(locmaj)
locmin = matplotlib.ticker.LogLocator(base=10.0,subs=(0.2,0.4,0.6,0.8),numticks=25)
ax1.yaxis.set_minor_locator(locmin)
ax1.yaxis.set_minor_formatter(matplotlib.ticker.NullFormatter())
ax1.set_xlabel('X values', fontsize=10, fontweight='bold')
ax1.set_ylabel('Y values', fontsize=10, fontweight='bold')
# Plot 2
ax2.loglog(PLOT2['X'], PLOT2['Y'], 'o',
markerfacecolor='blue', markeredgecolor='blue', markeredgewidth=1,
markersize=1.5, alpha=0.2)
ax2.set(xlim=(1e-4, 1e4), ylim=(1e-8, 1e2))
ax2.xaxis.set_major_locator(matplotlib.ticker.LogLocator(base=10.0, numticks=25))
ax2.yaxis.set_major_locator(matplotlib.ticker.LogLocator(base=10.0, numticks=25))
locmaj = matplotlib.ticker.LogLocator(base=10,numticks=25)
ax2.xaxis.set_major_locator(locmaj)
ax2.yaxis.set_major_locator(locmaj)
locmin = matplotlib.ticker.LogLocator(base=10.0,subs=(0.2,0.4,0.6,0.8),numticks=25)
ax2.xaxis.set_minor_locator(locmin)
ax2.yaxis.set_minor_locator(locmin)
ax2.xaxis.set_minor_formatter(matplotlib.ticker.NullFormatter())
ax2.yaxis.set_minor_formatter(matplotlib.ticker.NullFormatter())
ax2.set_xlabel('X values', fontsize=10, fontweight='bold')
ax2.set_ylabel('Y values', fontsize=10, fontweight='bold')
ax2.minorticks_on()
plt.show()
In the plot I get, the minor ticks only appear on the x-axis of ax1.
How can I set the minor ticks on both subplots and on both axes (x and y)?
Thank you so much.
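Two things may be worth checking in the code above: the same locmaj and locmin instances are passed to both the x- and y-axis of ax2, although a locator keeps a reference to a single axis, and ax2.minorticks_on() is called afterwards, which as far as I can tell installs new default minor locators on log-scaled axes. A minimal sketch that creates a separate locator per axis and skips minorticks_on(), assuming random placeholder data instead of PLOT1 and PLOT2 and a hypothetical helper named set_log_ticks:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LogLocator, NullFormatter


def set_log_ticks(axis, numticks=25):
    """Hypothetical helper: give one Axis its own major and minor LogLocator."""
    axis.set_major_locator(LogLocator(base=10.0, numticks=numticks))
    axis.set_minor_locator(
        LogLocator(base=10.0, subs=(0.2, 0.4, 0.6, 0.8), numticks=numticks)
    )
    axis.set_minor_formatter(NullFormatter())  # minor ticks without labels


# Placeholder data (assumption): log-uniform points covering the axis limits
rng = np.random.default_rng(0)
x = 10 ** rng.uniform(-4, 4, 500)
y = 10 ** rng.uniform(-8, 2, 500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 5), sharey=True)
for ax, color in ((ax1, "red"), (ax2, "blue")):
    ax.loglog(x, y, "o", color=color, markersize=1.5, alpha=0.2)
    ax.set(xlim=(1e-4, 1e4), ylim=(1e-8, 1e2))
    # Fresh locator instances for every axis; none are reused
    set_log_ticks(ax.xaxis)
    set_log_ticks(ax.yaxis)

fig.canvas.draw()
```

Because of sharey=True the two y-axes share their tick locators, so the second set_log_ticks call on the y-axis simply replaces the first.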
I have a DataFrame with three numerical variables: Porosity, Perm and AI. I would like to make one subplot per variable, each showing the histogram of that variable split by the categorical variable 'Facies'. Facies can take only two values: Sand and Shale.
In summary, each subplot needs a histogram, and each histogram must be drawn per category of Facies so that the facies can be compared.
So far I can make it work, but I cannot add axis labels or a title to each subplot.
plt.subplot(311)
plt.hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['Porosity'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='Porosity (fraction)', ylabel='Density', title='Porosity Histogram')
plt.legend()
plt.subplot(312)
plt.hist(df_sd['log10Perm'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['log10Perm'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='Permeability (mD)', ylabel='Density', title='Permeability Histogram')
plt.legend()
plt.subplot(313)
plt.hist(df_sd['AI'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['AI'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='AI (units)', ylabel='Density', title='Acoustic Impedance Histogram')
plt.legend()
plt.subplots_adjust(left=0.0, bottom=0.0, right=1.5, top=3.5, wspace=0.1, hspace=0.2)
I have also tried:
fig, axs = plt.subplots(2, 1)
but when I write
axs[0].hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
axs[0].hist(df_sd['Porosity'].values, label='Shale', bins=30, alpha=0.6)
the histogram for Shale overrides the histogram for Sand.
I would like to have this result, but with labels on both the x- and y-axes. A title for each subplot would also be helpful.
I just did a subplot with contours, but I think the framework will be very similar:
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax, extend in zip(axs.ravel(), extends):
    cs = ax.contourf(X, Y, Z, levels, cmap=cmap, extend=extend, origin=origin)
    fig.colorbar(cs, ax=ax, shrink=0.9)
    ax.set_title("extend = %s" % extend)
    ax.locator_params(nbins=4)
plt.show()
I think the main point (which I learned from the link below) is the use of zip(axs.ravel(), extends) in the for loop to get each ax in turn and then plot whatever you wish on that ax. I'm fairly certain you can adapt this to your case.
The full example is available at: https://matplotlib.org/gallery/images_contours_and_fields/contourf_demo.html#sphx-glr-gallery-images-contours-and-fields-contourf-demo-py
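Adapted to the histograms in the question, the same loop could look like the sketch below. The two DataFrames are random placeholders (an assumption) standing in for df_sd and df_sh, and panels is a hypothetical mapping from column name to the x label and title taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder frames (assumption): stand-ins for df_sd (Sand) and df_sh (Shale)
rng = np.random.default_rng(0)
cols = ["Porosity", "log10Perm", "AI"]
df_sd = pd.DataFrame(rng.normal(1.0, 0.2, size=(200, 3)), columns=cols)
df_sh = pd.DataFrame(rng.normal(0.7, 0.2, size=(200, 3)), columns=cols)

# Hypothetical mapping: column name -> (x label, title)
panels = {
    "Porosity": ("Porosity (fraction)", "Porosity Histogram"),
    "log10Perm": ("Permeability (mD)", "Permeability Histogram"),
    "AI": ("AI (units)", "Acoustic Impedance Histogram"),
}

fig, axs = plt.subplots(3, 1, figsize=(6, 10), constrained_layout=True)
for ax, (col, (xlabel, title)) in zip(axs.ravel(), panels.items()):
    # Both facies go on the same Axes so they can be compared directly
    ax.hist(df_sd[col].values, label="Sand", bins=30, alpha=0.6)
    ax.hist(df_sh[col].values, label="Shale", bins=30, alpha=0.6)
    # Labels and title are set on the Axes object, not through plt
    ax.set(xlabel=xlabel, ylabel="Density", title=title)
    ax.legend()
```

Since every call goes through an explicit ax, there is no ambiguity about which subplot receives the labels.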
I have found an answer:
fig = plt.figure()
# One Axes per row; labels and titles are then set on each Axes object
ax1 = fig.add_subplot(311)
ax2 = fig.add_subplot(312)
ax3 = fig.add_subplot(313)
ax1.hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
ax1.hist(df_sh['Porosity'].values, label='Shale', bins=30, alpha=0.6)
ax1.set(xlabel='Porosity (fraction)', ylabel='Density', title='Porosity Histogram')
ax1.legend()
I am working on a regression problem and I want to plot 3 DataFrames on the same axes. I don't know how to set the legend labels for the DataFrames. I want blue->ACTUAL, green->SVR, red->MLR.
What is wrong with the code?
ax1 = y_test[1800:1900].plot(color='blue', linewidth=3)
predicted_y[1800:1900].plot(color='green', linewidth=3, ax =ax1)
predicted_y1[1800:1900].plot(color='red', linewidth=3, ax=ax1)
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05), prop={'size':35})
plt.show()
When I plot this, the legend shows all three colors with the label 0.
I think it should work if you add labels to your plots:
ax1 = y_test[1800:1900].plot(color='blue', linewidth=3, label='ACTUAL')
predicted_y[1800:1900].plot(color='green', linewidth=3, ax=ax1, label='SVR')
predicted_y1[1800:1900].plot(color='red', linewidth=3, ax=ax1, label='MLR')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05), prop={'size':35})
plt.show()
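If the objects are DataFrames rather than Series, the legend may still show the column names (here 0) instead of the label argument. A sketch that sidesteps this by passing the line handles and the desired names to legend explicitly, assuming random single-column placeholder DataFrames in place of y_test, predicted_y and predicted_y1:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data (assumption): single-column DataFrames with column name 0
rng = np.random.default_rng(0)
y_test = pd.DataFrame(rng.normal(size=100))
predicted_y = y_test + rng.normal(scale=0.1, size=(100, 1))
predicted_y1 = y_test + rng.normal(scale=0.3, size=(100, 1))

ax1 = y_test.plot(color="blue", linewidth=3)
predicted_y.plot(color="green", linewidth=3, ax=ax1)
predicted_y1.plot(color="red", linewidth=3, ax=ax1)

# Name the three lines explicitly, in the order they were drawn
ax1.legend(ax1.get_lines(), ["ACTUAL", "SVR", "MLR"], loc="upper center")
```

Renaming the column before plotting, so that pandas picks up the desired name itself, is another option.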
I'm trying to plot both a line plot and scatter plot on the same figure. The scatter plot looks great, but the line is plotted at the incorrect indices. That is, the scatter plot data is along the correct indices, [0,4621], but the line plot is "bunched up" into indices [3750,4621].
plt.figure()
plt.plot(ii, values,
color='k', alpha=0.2)
plt.scatter(ii, scores,
color='g', s=20, alpha=0.3, marker="o")
plt.scatter(jj, scores[scores >= threshold],
color='r', s=20, alpha=0.7, marker="o")
plt.scatter(kk, labels[labels==1],
color='k', s=20, alpha=1.0, marker="+")
plt.axis([0, len(labels), 0, 1.1])
plt.title(relativePath)
plt.show()
The issue is the axis setting plt.axis([0, len(labels), 0, 1.1]): values does not fit within the y-axis bounds. Normalizing the values list keeps it within the specified bounds [0, 1.1]. This is done with norm_values = [float(v)/max(values) for v in values].
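The same normalization can be written with NumPy. This is a minimal sketch with made-up placeholder numbers, assuming values is non-negative and its maximum is greater than zero:

```python
import numpy as np

# Placeholder data (assumption): values on a much larger scale than [0, 1.1]
values = np.array([120.0, 450.0, 300.0, 900.0])

# Divide by the maximum so the largest value becomes exactly 1.0
norm_values = values / values.max()
print(norm_values)
```

Plotting norm_values instead of values then keeps the line inside the y-range set by plt.axis.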