I am plotting a correlation matrix of my dataset's features with this code:
#Correlation matrix/Heatmap
fig, ax = plt.subplots(figsize=(14, 8))
sns.heatmap(cdf.corr() , annot = True, vmin=-1, vmax=1, center= 0)
and then plotting two features against each other, with a grid, using:
plt.plot(cdf['BALANCE'], cdf['PAYMENTS'], marker='.', linewidth=0, color='#128128')
plt.grid(which='major', color='#cccccc', alpha=0.45)
plt.xlabel('Balance', fontsize=16)
plt.ylabel('Payment', fontsize=16)
plt.title('Balance vs payment', fontsize=20)
plt.show()
The problem is that the correlation matrix and the second plot end up drawn on top of each other. What is the reason for that?
Like this:
The two plots are drawn on the same axes. You can either clear the axes with plt.cla() after the heatmap (which discards the heatmap) or give each plot its own axes, either in different figures or in the same figure.
For different figures:
fig1 , ax1 = plt.subplots()
fig2 , ax2 = plt.subplots()
sns.heatmap(cdf.corr(), ax = ax1 )
ax2.plot( cdf['BALANCE'], cdf['PAYMENTS'] )
plt.show()
Or on the same figure:
fig , axs = plt.subplots(2)
sns.heatmap( cdf.corr() , ax = axs[0] )
axs[1].plot( cdf['BALANCE'], cdf['PAYMENTS'] )
plt.show()
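To make the "one axes per plot" idea concrete, here is a minimal self-contained sketch. It is not the asker's data: balance and payments are made-up arrays standing in for cdf['BALANCE'] and cdf['PAYMENTS'], and matplotlib's imshow stands in for sns.heatmap so the sketch only needs NumPy and matplotlib. With seaborn installed, passing ax=ax_corr to sns.heatmap plays the same role.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for cdf['BALANCE'] and cdf['PAYMENTS'] (assumption).
rng = np.random.default_rng(0)
balance = rng.gamma(2.0, 800.0, size=300)
payments = 0.4 * balance + rng.normal(0.0, 200.0, size=300)
corr = np.corrcoef(np.vstack([balance, payments]))  # 2x2 correlation matrix

# One figure, two axes: each plot is sent to its own axes explicitly,
# so nothing relies on whichever axes happens to be "current".
fig, (ax_corr, ax_scatter) = plt.subplots(1, 2, figsize=(14, 6))

# Stand-in for sns.heatmap(cdf.corr(), ax=ax_corr, vmin=-1, vmax=1, center=0).
im = ax_corr.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
fig.colorbar(im, ax=ax_corr)
ax_corr.set_xticks([0, 1])
ax_corr.set_xticklabels(["BALANCE", "PAYMENTS"])
ax_corr.set_yticks([0, 1])
ax_corr.set_yticklabels(["BALANCE", "PAYMENTS"])

# The scatter-style plot goes to the second axes via its methods, not via plt.*.
ax_scatter.plot(balance, payments, marker=".", linewidth=0, color="#128128")
ax_scatter.grid(which="major", color="#cccccc", alpha=0.45)
ax_scatter.set_xlabel("Balance", fontsize=16)
ax_scatter.set_ylabel("Payment", fontsize=16)
ax_scatter.set_title("Balance vs payment", fontsize=20)

# Call fig.savefig(...) or plt.show() here to look at the result.
```

The point of the design is that every drawing call names its target axes, so the order of the calls no longer decides where things land.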
I have a list of ratings for which I am plotting a histogram. The y-axis shows the count in each bin. Is there a way to make it show the percentage of the total (based on traffic) instead?
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.hist(item['ratings'], bins = 5)
ax.legend()
ax.set_title("Ratings Frequency")
ax.set_xlabel("Ratings")
ax.set_ylabel("frequency")
ax.axhline(y=0, linestyle='--', color='k')
You can use countplot from the seaborn library, which makes this kind of visualization easy. Note that by default it plots counts, not percentages:
import seaborn as sns
sns.countplot(x=item['ratings'])
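If the goal is a percentage on the y-axis with plain matplotlib, one common approach is to weight every value by 1/N and format the axis with PercentFormatter. A minimal sketch, assuming a small made-up ratings array in place of item['ratings']:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

# Made-up ratings standing in for item['ratings'] (assumption).
ratings = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])

fig, ax = plt.subplots()

# Weight each rating by 1/N so the bar heights sum to 1 instead of N.
weights = np.ones(len(ratings)) / len(ratings)
heights, edges, _ = ax.hist(ratings, bins=5, weights=weights)

# Show the 0..1 fractions as 0%..100% on the y-axis.
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))

ax.set_title("Ratings Frequency")
ax.set_xlabel("Ratings")
ax.set_ylabel("Share of ratings")

# Call fig.savefig(...) or plt.show() here to look at the result.
```

This differs from density=True, which normalizes the area under the histogram rather than the sum of the bar heights, so the two only agree when every bin has width 1.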
I have a DataFrame with three numerical variables Porosity, Perm and AI. I would like to make a subplot and in each plot, I would like the histogram of the three variables, by a categorical variable 'Facies'. Facies can take only two values: Sand and Shale.
In summary, each subplot needs a histogram, and each histogram must be split by the categorical variable Facies so that the two facies can be compared.
So far, I can make it work, but I cannot add the axis title to each subplot.
plt.subplot(311)
plt.hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['Porosity'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='Porosity (fraction)', ylabel='Density', title='Porosity Histogram')
plt.legend()
plt.subplot(312)
plt.hist(df_sd['log10Perm'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['log10Perm'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='Permeability (mD)', ylabel='Density', title='Permeability Histogram')
plt.legend()
plt.subplot(313)
plt.hist(df_sd['AI'].values, label='Sand', bins=30, alpha=0.6)
plt.hist(df_sh['AI'].values, label='Shale', bins=30, alpha=0.6)
ax.set(xlabel='AI (units)', ylabel='Density', title='Acoustic Impedance Histogram')
plt.legend()
plt.subplots_adjust(left=0.0, bottom=0.0, right=1.5, top=3.5, wspace=0.1, hspace=0.2)
I have also tried:
fig, axs = plt.subplots(2, 1)
but when I write
axs[0].hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
axs[0].hist(df_sd['Porosity'].values, label='Shale', bins=30, alpha=0.6)
the histogram for Shale covers the histogram for Sand.
I would like to have this result, but with labels on both the x and y axes. It would also be helpful to have a title for each subplot.
I just did a subplot with contours, but I think the framework will be very similar:
# X, Y, Z, levels, cmap, origin and extends are not defined in this excerpt;
# see the full example linked below.
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax, extend in zip(axs.ravel(), extends):
    cs = ax.contourf(X, Y, Z, levels, cmap=cmap, extend=extend, origin=origin)
    fig.colorbar(cs, ax=ax, shrink=0.9)
    ax.set_title("extend = %s" % extend)
    ax.locator_params(nbins=4)
plt.show()
I think the main point (which I learned from the link below) is the use of zip(axs.ravel(), ...) in the for loop: it hands you one ax per iteration, and you then plot whatever you wish on that ax. I'm fairly certain you can adapt this to your case.
The full example is available at: https://matplotlib.org/gallery/images_contours_and_fields/contourf_demo.html#sphx-glr-gallery-images-contours-and-fields-contourf-demo-py
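Adapted to the histogram question, the same loop might look like the sketch below. It is only an illustration: sand and shale are plain dicts of made-up NumPy arrays standing in for the df_sd and df_sh DataFrames, and the panels list is a helper introduced here, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for df_sd and df_sh (assumption).
rng = np.random.default_rng(1)
sand = {
    "Porosity": rng.normal(0.25, 0.03, 200),
    "log10Perm": rng.normal(2.0, 0.5, 200),
    "AI": rng.normal(6000.0, 400.0, 200),
}
shale = {
    "Porosity": rng.normal(0.12, 0.03, 200),
    "log10Perm": rng.normal(0.5, 0.5, 200),
    "AI": rng.normal(7500.0, 400.0, 200),
}

# Hypothetical helper list: (column, x label, title) for each subplot.
panels = [
    ("Porosity", "Porosity (fraction)", "Porosity Histogram"),
    ("log10Perm", "Permeability (mD)", "Permeability Histogram"),
    ("AI", "AI (units)", "Acoustic Impedance Histogram"),
]

fig, axs = plt.subplots(3, 1, figsize=(6, 10), constrained_layout=True)

# One ax per panel; both facies are drawn on the same ax with transparency.
for ax, (col, xlabel, title) in zip(axs.ravel(), panels):
    ax.hist(sand[col], bins=30, alpha=0.6, density=True, label="Sand")
    ax.hist(shale[col], bins=30, alpha=0.6, density=True, label="Shale")
    ax.set(xlabel=xlabel, ylabel="Density", title=title)
    ax.legend()

# Call fig.savefig(...) or plt.show() here to look at the result.
```

Because the labels and title are set through each ax rather than through plt, every subplot gets its own, and constrained_layout keeps them from overlapping.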
I have found an answer:
fig = plt.figure()
ax1 = fig.add_subplot(311)
ax2 = fig.add_subplot(312)
ax3 = fig.add_subplot(313)
ax1.hist(df_sd['Porosity'].values, label='Sand', bins=30, alpha=0.6)
ax1.hist(df_sh['Porosity'].values, label='Shale', bins=30, alpha=0.6)
ax1.set(xlabel='Porosity (fraction)', ylabel='Density', title='Porosity Histogram')
ax1.legend()
# ax2 and ax3 are filled the same way with 'log10Perm' and 'AI'
I am trying to represent data in a log-log plot, but I can't figure out the difference between the two plotting methods. In FIG4 the data is scattered; in FIG5 it is not. What is the interpretation?
Here is the code:
fig4, ax4 = plt.subplots()
ax4.scatter(t, sigma, marker='o', label='strain', color='red', s=0.5)
ax4.set_xlabel('log(t)')
ax4.set_ylabel('log(Sigma)')
ax4.set_title('FIG4:Log(t),log(sigma)')
ax4.set_yscale('log')
ax4.set_xscale('log')
plt.grid()
plt.show()
fig5, ax5 = plt.subplots()
ax5.set_xlabel('log(t)')
ax5.set_ylabel('log(Sigma)')
ax5.set_title('FIG5: Log(t),log(sigma)')
plt.loglog(t,sigma)
plt.grid()
plt.show()
Here are the two plots:
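One way to check what differs between the two calls is the sketch below, run on made-up t and sigma arrays (an assumption, not the asker's data). scatter draws one marker per point and nothing else, while loglog is plot with both axes set to log scale, so by default it joins consecutive points with a solid line. Asking loglog for markers only should give the same picture as the scatter version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up power-law data with a little noise, standing in for t and sigma (assumption).
rng = np.random.default_rng(2)
t = np.logspace(0, 3, 50)
sigma = 5.0 * t ** -0.3 * (1.0 + 0.05 * rng.normal(size=t.size))

fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(12, 4))

# FIG4 style: markers only, then switch both axes to log scale.
ax_a.scatter(t, sigma, color="red", s=4)
ax_a.set_xscale("log")
ax_a.set_yscale("log")
ax_a.set_title("scatter + log scales")

# FIG5 style: loglog joins the points in array order with a solid line by default.
(line_b,) = ax_b.loglog(t, sigma)
ax_b.set_title("loglog (default line)")

# Same call with the line switched off: markers only, like the scatter panel.
(line_c,) = ax_c.loglog(t, sigma, linestyle="none", marker="o", markersize=2, color="red")
ax_c.set_title("loglog, markers only")

# Call fig.savefig(...) or plt.show() here to look at the result.
```

In all three panels the data and the axis scaling are the same; only the drawing style changes.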
I have searched around SO and haven't been able to find how to format this text (I've also checked around google and the matplotlib docs)
I'm currently creating a figure and then adding 4 subplots in a 2x2 matrix format so I'm trying to scale down all the text:
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax1.tick_params(labelsize='xx-small')
ax1.set_title(v, fontdict={'fontsize':'small'})
ax1.hist(results[v], histtype='bar', label='data', bins=bins, alpha=0.5)
ax1.hist(results[v+'_sim'], histtype='bar', label='truth', bins=bins, alpha=0.8)
ax1.legend(loc='best', fontsize='x-small')
You can set the rcParams before plotting:
plt.rcParams['xtick.labelsize'] = "xx-small"
plt.rcParams['ytick.labelsize'] = "xx-small"
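The same idea can be scoped to a single figure with plt.rc_context, so the smaller text does not leak into later plots. A minimal sketch, assuming made-up data in place of results[v] and results[v+'_sim']; the axes.titlesize and legend.fontsize keys are added here to cover the title and legend sizes from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Text sizes applied only inside the with-block below.
small_text = {
    "xtick.labelsize": "xx-small",
    "ytick.labelsize": "xx-small",
    "axes.titlesize": "small",
    "legend.fontsize": "x-small",
}

rng = np.random.default_rng(3)

with plt.rc_context(small_text):
    fig, axs = plt.subplots(2, 2, constrained_layout=True)
    for i, ax in enumerate(axs.ravel()):
        # Made-up samples standing in for results[v] and results[v + '_sim'] (assumption).
        data = rng.normal(size=500)
        sim = rng.normal(size=500)
        ax.hist(data, bins=20, alpha=0.5, label="data")
        ax.hist(sim, bins=20, alpha=0.8, label="truth")
        ax.set_title(f"variable {i}")
        ax.legend(loc="best")

# Call fig.savefig(...) or plt.show() here to look at the result.
```

The axes are created inside the with-block because the tick label sizes are read from rcParams when the axes are built.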