Adding histogram and 1 std band to residual plot - python

I have plotted a graph in Python with a subplot of residuals, and I am trying to find a way to add a histogram of the residuals at the end of the residual plot. I would also like to add a grey band on the residual plot showing 1 standard deviation.
Also, is there a way to remove the top and right-hand borders of the plot?
Here is a copy of the code and the graph I currently have.
fig1 = pyplot.figure(figsize =(9.6,7.2))
plt.frame1 =fig1.add_axes((0.2,0.4,.75,.6))
pyplot.errorbar(xval, yval*1000, yerr=yerr*1000, xerr=xerr, marker='x', linestyle='None')
# Axis labels
pyplot.xlabel('Height (m)', fontsize = 12)
pyplot.ylabel('dM/dt (g $s^{-1}$)', fontsize = 12)
# Generate best fit line using model function and best fit parameters, and add to plot
fit_line=model_funct(xval, [a_soln, b_soln])
pyplot.plot(xval, fit_line*1000)
# Set suitable axis limits: you will probably need to change these...
#pyplot.xlim(-1, 61)
#pyplot.ylim(65, 105)
# pyplot.show()
plt.frame2 = fig1.add_axes((0.2,0.2,.75,.2)) #start frame1 at 0.2, 0.4
plt.xlabel("Height of Water (m)", fontsize = 12)
plt.ylabel("Normalised\nResiduals", fontsize = 12) #\n is used to start a new line
plt.plot(h,normalised_residuals,"x", color = "green")
plt.axhline(0, linewidth=1, linestyle="--", color="black")
plt.savefig("Final Graph.png", dpi = 500)

The naming in your code is a bit odd, so I am only posting snippets, since I can't easily run it myself. You sometimes use pyplot and sometimes plt, which should refer to the same module. You should also keep a reference to each axes, like this: ax = fig1.add_axes((0.2,0.4,.75,.6)). Then, when you plot, call the method on the axes directly, i.e. use ax.errorbar().
To hide the borders of the axis in the top plot use:
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
Adding an error band in the bottom plot is pretty easy to do. Just calculate the mean and standard deviation using np.mean() and np.std(). Afterwards, call
plt.fill_between(h, y1=np.mean(normalised_residuals) - np.std(normalised_residuals),
                 y2=np.mean(normalised_residuals) + np.std(normalised_residuals),
                 color='gray', alpha=.5)
and change the color and alpha however you want it to be.
For the histogram projection, just add another axes as you have already done twice (let's assume it is called ax_hist, to keep it apart from the ax above) and call
ax_hist.hist(normalised_residuals, bins=8, orientation="horizontal")
Here, bins should probably be set to a small value, since you don't have that many data points.
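Putting the three pieces together, a minimal sketch could look like the following. The data are synthetic stand-ins for h and normalised_residuals, the names ax_res and ax_hist and the axes rectangles are my own choices, and axhspan is used in place of fill_between so that the band spans the full width of the axes. Treat it as a starting point, not a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the question's h and normalised_residuals (assumption)
rng = np.random.default_rng(42)
h = np.linspace(0.1, 1.0, 30)
normalised_residuals = rng.normal(0.0, 1.0, h.size)

fig = plt.figure(figsize=(9.6, 4))
ax_res = fig.add_axes((0.12, 0.2, 0.63, 0.7))                  # residual panel
ax_hist = fig.add_axes((0.75, 0.2, 0.15, 0.7), sharey=ax_res)  # histogram on its right edge

# Residuals with a dashed zero line
ax_res.plot(h, normalised_residuals, "x", color="green")
ax_res.axhline(0, linewidth=1, linestyle="--", color="black")

# Grey band covering mean +/- 1 standard deviation
mean = np.mean(normalised_residuals)
std = np.std(normalised_residuals)
ax_res.axhspan(mean - std, mean + std, color="gray", alpha=0.5)

# Sideways histogram that shares the y-axis of the residual panel
counts, edges, bars = ax_hist.hist(normalised_residuals, bins=8,
                                   orientation="horizontal")
ax_hist.tick_params(labelleft=False)

# Hide the top and right borders on both panels
for ax in (ax_res, ax_hist):
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

ax_res.set_xlabel("Height of Water (m)")
ax_res.set_ylabel("Normalised\nResiduals")
```

Because the histogram axes is created with sharey=ax_res, its bars line up with the residual values without any extra limit handling. Call fig.savefig(...) or plt.show() at the end as needed.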

Related

Seaborn kde plot plotting probabilities instead of density (histplot without bars)

I have a question about seaborn kdeplot. In histplot one can choose which stat to show (count, frequency, density, probability), and if it is used with the kde argument, the choice also applies to the kde curve. However, I have not found a way to change it directly in kdeplot, for the case where I want just the kde estimate scaled to probabilities. Alternatively, the same result should come from histplot if the bars could be switched off, which I have not found a way to do either. So how can one do that?
To give some visual example, I would like to have just the red curve, ie. either pass an argument to kdeplot to use probabilities, or to remove the bars from histplot:
import matplotlib.pyplot as plt
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="probabilities")
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density")
plt.legend()
Thanks a lot.
The y-axis of a histplot with stat="probability" corresponds to the probability that a value belongs to a certain bar. The value of 0.23 for the highest bar, means that there is a probability of about 23% that a flipper length is between 189.7 and 195.6 mm (being the edges of that specific bin). Note that by default, 10 bins are spread out between the minimum and maximum value encountered.
The y-axis of a kdeplot is similar to a probability density function. The height of the curve is proportional to the approximate probability of a value being within a bin of width 1 of the corresponding x-value. A value of 0.031 for x=191 means there is a probability of about 3.1 % that the length is between 190.5 and 191.5.
Now, to directly get probability values next to a kdeplot, first a bin width needs to be chosen. Then the y-values can be multiplied by that bin width to correspond to the probability of an x-value being within a bin of that width. The PercentFormatter provides a way to set such a correspondence, using ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)).
The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
fig, ax1 = plt.subplots()
penguins = sns.load_dataset("penguins")
binwidth = 5
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
             binwidth=binwidth, ax=ax1)
ax2 = ax1.twinx()
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth) # similar limits on the y-axis to align the plots
ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
PS: To only show the kdeplot with a probability, the code could be:
binwidth = 5
ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax.set_ylabel(f'Probability for a bin width of {binwidth}')
Another option could be to draw a histplot with kde=True and remove the generated bars. To be interpretable, a binwidth should be set. With binwidth=1 you'd get the same y-axis as a density plot. (kde_kws={'cut': 3} lets the kde go smoothly to about zero; by default the kde curve is cut off at the minimum and maximum of the data.)
ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
ax.containers[0].remove() # remove the bars
ax.relim() # the axis limits need to be recalculated without the bars
ax.autoscale_view()
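The relation used above (density times bin width is approximately the probability of the bin) can be checked numerically. The sketch below is only an illustration under assumptions: the sample is synthetic rather than the penguins data, gaussian_kde_density is a hand-written helper with an arbitrary bandwidth (not seaborn's estimator), and PercentFormatter.convert_to_pct is called directly to show the scaling that the formatter applies to tick values.

```python
import numpy as np
from matplotlib.ticker import PercentFormatter

# Synthetic stand-in for the flipper lengths (assumption, not the penguins data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=200.0, scale=14.0, size=2000)


def gaussian_kde_density(x, data, bandwidth):
    """Evaluate a plain Gaussian kernel density estimate at the points x."""
    z = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(data) * bandwidth)


binwidth = 5.0
bandwidth = 4.0  # arbitrary smoothing width for this sketch
x0 = 200.0

# Density at x0 times the bin width approximates the probability of the bin
density_at_x0 = gaussian_kde_density([x0], sample, bandwidth)[0]
prob_approx = density_at_x0 * binwidth

# Reference value: average the density over the bin and multiply by its width
grid = np.linspace(x0 - binwidth / 2, x0 + binwidth / 2, 1001)
prob_integrated = gaussian_kde_density(grid, sample, bandwidth).mean() * binwidth

# PercentFormatter(xmax=1/binwidth) applies the same scaling to tick values
percent = PercentFormatter(xmax=1 / binwidth).convert_to_pct(density_at_x0)

print(f"density at x0:          {density_at_x0:.4f}")
print(f"density * binwidth:     {prob_approx:.4f}")
print(f"integrated over bin:    {prob_integrated:.4f}")
print(f"formatter value (in %): {percent:.2f}")
```

The approximation is good as long as the bin is narrow compared with the width of the features in the curve; for wide bins the integrated value is the one to trust.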

seaborn distplot different bar width on each figure

Sorry for giving an image however I think it is the best way to show my problem.
As you can see, the bin widths are all different. From my understanding, each bin shows a range of rent_hour. I am not sure why different figures have different bin widths even though I didn't set any.
My code is as follows:
figure, axes = plt.subplots(nrows=4, ncols=3)
figure.set_size_inches(18,14)
plt.subplots_adjust(hspace=0.5)
for ax, age_g in zip(axes.ravel(), age_cat):
    group = total_usage_df.loc[(total_usage_df.age_group == age_g) & (total_usage_df.day_of_week <= 4)]
    sns.distplot(group.rent_hour, ax=ax, kde=False)
    ax.set(title=age_g)
    ax.set_xlim([0, 24])
figure.suptitle("Weekday usage pattern", size=25)
additional question:
The question "Seaborn: How to get the count in y axis for distplot using PairGrid" says that kde=False makes the y-axis show counts. However, the documentation at http://seaborn.pydata.org/generated/seaborn.distplot.html uses kde=False and still seems to show something else. How can I set the y-axis to show counts?
I've tried
sns.distplot(group.rent_hour, ax=ax, norm_hist=True) and it still seems to give something other than a count.
sns.distplot(group.rent_hour, ax=ax, kde=False) gives me a count, but I don't know why.
Answer 1:
From the documentation:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count.
This is implied if a KDE or fitted density is plotted.
So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.
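To make the difference between counts and density concrete, here is a small NumPy-only sketch. The rent_hour values are synthetic (an assumption standing in for the real data frame), and np.histogram is used in place of distplot so that the numbers can be inspected directly.

```python
import numpy as np

# Synthetic stand-in for group.rent_hour: 500 integer hours between 0 and 23
rng = np.random.default_rng(1)
rent_hour = rng.integers(0, 24, size=500)

# Same 12 bins (each 2 hours wide) for both versions
counts, edges = np.histogram(rent_hour, bins=12, range=(0, 24))
density, _ = np.histogram(rent_hour, bins=12, range=(0, 24), density=True)
widths = np.diff(edges)

print("sum of counts:         ", counts.sum())              # number of samples
print("sum of density heights:", density.sum())             # 1 / bin width, not 1
print("area under histogram:  ", (density * widths).sum())  # this is what equals 1
```

With density heights, only the area (height times width, summed over bins) is normalised, so the bar heights themselves change whenever the bin width changes.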
Answer 2:
# Plotting hist without kde
ax = sns.distplot(your_data, kde=False)
# Creating another Y axis
second_ax = ax.twinx()
#Plotting kde without hist on the second Y axis
sns.distplot(your_data, ax=second_ax, kde=True, hist=False)
#Removing Y ticks from the second axis
second_ax.set_yticks([])

How to plot a Spectrogram with very small values? [duplicate]

I am using matplotlib.pyplot.specgram and matplotlib.pyplot.pcolormesh to make spectrogram plots of a seismic signal.
Background information: the reason for using pcolormesh is that I need to do arithmetic on the spectrogram data array and then replot the resulting spectrogram (for a three-component seismogram with east, north and vertical components, I need to work out the horizontal spectral magnitude and divide the vertical spectra by the horizontal spectra). It is easier to do this using the spectrogram array data than on individual amplitude spectra.
I have found that the plots of the spectrograms after doing my arithmetic have unexpected values. Upon further investigation it turns out that the spectrogram plot made using the pyplot.specgram method has different values compared to the spectrogram plot made using pyplot.pcolormesh and the returned data array from the pyplot.specgram method. Both plots/arrays should contain the same values, I cannot work out why they do not.
Example:
The plot of
plt.subplot(513)
PxN, freqsN, binsN, imN = plt.specgram(trN.data, NFFT = 20000, noverlap = 0, Fs = trN.stats.sampling_rate, detrend = 'mean', mode = 'magnitude')
plt.title('North')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.clim(0, 150)
plt.colorbar()
#np.savetxt('PxN.txt', PxN)
looks different to the plot of
plt.subplot(514)
plt.pcolormesh(binsZ, freqsZ, PxN)
plt.clim(0,150)
plt.colorbar()
even though the "PxN" data array (that is, the spectrogram data values for each segment) is generated by the first method and re-used in the second.
Is anyone aware why this is happening?
P.S. I realise that my value for NFFT is not a power of two, but it's not important at this stage of my coding.
P.P.S. I am not aware of what the "imN" array (fourth returned variable from pyplot.specgram) is and what it is used for....
First off, let's show an example of what you're describing so that other folks can reproduce it:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Brownian noise sequence
x = np.random.normal(0, 1, 10000).cumsum()
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(8, 10))
values, ybins, xbins, im = ax1.specgram(x, cmap='gist_earth')
ax1.set(title='Specgram')
fig.colorbar(im, ax=ax1)
mesh = ax2.pcolormesh(xbins, ybins, values, cmap='gist_earth')
ax2.axis('tight')
ax2.set(title='Raw Plot of Returned Values')
fig.colorbar(mesh, ax=ax2)
plt.show()
Magnitude Differences
You'll immediately notice the difference in magnitude of the plotted values.
By default, plt.specgram doesn't plot the "raw" values it returns. Instead, it scales them to decibels: with the default mode="psd" it plots 10 * log10 of the returned power values, and with mode="magnitude" (as in your code) it plots 20 * log10 of the returned amplitudes. If you'd like it not to scale things, you'll need to specify scale="linear". However, for looking at frequency composition, a log scale is going to make the most sense.
With that in mind, let's mimic what specgram does:
plotted = 10 * np.log10(values)
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, plotted, cmap='gist_earth')
ax.axis('tight')
ax.set(title='Plot of $10 * log_{10}(values)$')
fig.colorbar(mesh)
plt.show()
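As a quick check of the scaling, the array held by the image that specgram draws can be compared with the returned values. This is a sketch under assumptions: it relies on specgram flipping the rows before passing them to imshow (hence np.flipud), and it uses 10 * log10 for the default "psd" mode and 20 * log10 for mode="magnitude", as described in the Matplotlib documentation of the scale parameter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt

# Brownian noise sequence, as in the example above
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 10000).cumsum()

fig, (ax1, ax2) = plt.subplots(nrows=2)

# Default mode ("psd"): the image holds 10 * log10 of the returned values
psd, freqs, times, im_psd = ax1.specgram(x)
shown_psd = np.asarray(im_psd.get_array())

# mode="magnitude": the image holds 20 * log10 of the returned values
mag, _, _, im_mag = ax2.specgram(x, mode="magnitude")
shown_mag = np.asarray(im_mag.get_array())

# The image rows are stored top-down, so flip the returned array before comparing
psd_matches = np.allclose(shown_psd, np.flipud(10 * np.log10(psd)))
mag_matches = np.allclose(shown_mag, np.flipud(20 * np.log10(mag)))
print("psd image is 10*log10(values):      ", psd_matches)
print("magnitude image is 20*log10(values):", mag_matches)
```

This also shows what the fourth return value is: the image object drawn on the axes, which is what fig.colorbar needs.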
Using a Log Color Scale Instead
Alternatively, we could use a log norm on the image and get a similar result, but communicate that the color values are on a log scale more clearly:
from matplotlib.colors import LogNorm
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, values, cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()
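To see what the log norm does to the data before the colormap is applied, it can be called on a few values directly. A small sketch, with vmin and vmax picked arbitrarily for illustration:

```python
import numpy as np
from matplotlib.colors import LogNorm

# Fixed limits chosen for illustration; without them LogNorm autoscales to the data
norm = LogNorm(vmin=1.0, vmax=100.0)
values = np.array([1.0, 10.0, 100.0])

# Positions along the colormap, between 0 and 1
normalised = np.asarray(norm(values))

# The same positions computed by hand from the logarithms
by_hand = (np.log10(values) - np.log10(1.0)) / (np.log10(100.0) - np.log10(1.0))

print(normalised)
print(by_hand)
```

Each factor of ten in the data moves the same distance along the colormap. That is why the picture looks like the decibel plot while the colorbar still shows the original values.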
imshow vs pcolormesh
Finally, note that the examples we've shown have had no interpolation applied, while the original specgram plot did. specgram uses imshow, while we've been plotting with pcolormesh. In this case (regular grid spacing) we can use either.
Both imshow and pcolormesh are very good options in this case. However, imshow will have significantly better performance if you're working with a large array. Therefore, you might consider using it instead, even if you don't want interpolation (pass interpolation='nearest' to turn interpolation off).
As an example:
extent = [xbins.min(), xbins.max(), ybins.min(), ybins.max()]
fig, ax = plt.subplots()
mesh = ax.imshow(values, extent=extent, origin='lower', aspect='auto',
                 cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()

Python: subplots with different total sizes

Original Post
I need to make several subplots with different sizes.
I have simulation areas with sizes (x by y) from 35x6µm to 39x2µm and I want to plot them in one figure. All subplots have the same x-ticklabels (there is a grid line every 5µm on the x-axis).
When I plot the subplots into one figure, the graphs with the smaller x-range are stretched so that the full figure width is used. Therefore, the x-gridlines no longer line up.
How can I keep the subplots from being stretched and align them on the left?
Edit: Here is some code:
# np.float was removed in NumPy 1.24; the builtin float is equivalent
size=array([[3983,229],[3933,350],[3854,454],[3750,533],[3500,600]], dtype=float)
resolution=array([[1024,256],[1024,320],[1024,448],[1024,512],[1024,640]], dtype=float)
aspect_ratios=(resolution[:,0]/resolution[:,1])*(size[:,1]/size[:,0])
number_of_graphs=len(data)
fig, ax=plt.subplots(nrows=number_of_graphs, sharex=xshare)
fig.set_size_inches(12,figheight)
for i in range(number_of_graphs):
    temp=np.rot90(np.loadtxt(path+'/'+data[i]))
    img=ax[i].imshow(temp,
                     interpolation="none",
                     cmap=mapping,
                     norm=specific_norm,
                     aspect=aspect_ratios[i]
                     )
    ax[i].set_adjustable('box-forced')
    #Here I have to set some ticks and labels....
    ax[i].xaxis.set_ticks(np.arange(0,int(size[i,0]),stepwidth_width)*resolution[i,0]/size[i,0])
    ax[i].set_xticklabels((np.arange(0, int(size[i,0]), stepwidth_width)))
    ax[i].yaxis.set_ticks(np.arange(0,int(size[i,1]),stepwidth_height)*resolution[i,1]/size[i,1])
    ax[i].set_yticklabels((np.arange(0, int(size[i,1]), stepwidth_height)))
    ax[i].set_title(str(mag[i]))
    grid(True)
savefig(path+'/'+name+'all.pdf', bbox_inches='tight', pad_inches=0.05) #saves graph
Here are some examples:
If I plot the different matrices in a for loop, IPython generates an output which is pretty much what I want. The y-distance between the subplots is constant, and the size of each figure is correct. You can see that the x-labels line up with each other:
When I plot the matrices in one figure using subplots, this is not the case: the x-ticks do not line up, and every subplot has the same size on the canvas (which means that for thin subplots more white space is reserved on the canvas...).
I simply want the first result from IPython in one output file using subplots.
Using GridSpec
After the community told me to use GridSpec to set the size of my subplots directly, I wrote some code for automatic plotting:
# np.float was removed in NumPy 1.24; the builtin float is equivalent
size=array([[3983,229],[3933,350],[3854,454],[3750,533],[3500,600]], dtype=float)
#total size of the figure
total_height=int(sum(size[:,1]))
total_width=int(size.max())
#determines steps of ticks
stepwidth_width=500
stepwidth_height=200
fig, ax=plt.subplots(nrows=len(size))
fig.set_size_inches(size.max()/300., total_height/200)
gs = GridSpec(total_height, total_width)
gs.update(left=0, right=0.91, hspace=0.2)
height=0
for i in range (len(size)):
    ax[i] = plt.subplot(gs[int(height):int(height+size[i,1]), 0:int(size[i,0])])
    temp=np.rot90(np.loadtxt(path+'/'+FFTs[i]))
    img=ax[i].imshow(temp,
                     interpolation="none",
                     vmin=-100,
                     vmax=+100,
                     aspect=aspect_ratios[i],
                     )
    #Some rescaling
    ax[i].xaxis.set_ticks(np.arange(0,int(size[i,0]),stepwidth_width)*resolution[i,0]/size[i,0])
    ax[i].set_xticklabels((np.arange(0, int(size[i,0]), stepwidth_width)))
    ax[i].yaxis.set_ticks(np.arange(0,int(size[i,1]),stepwidth_height)*resolution[i,1]/size[i,1])
    ax[i].set_yticklabels((np.arange(0, int(size[i,1]), stepwidth_height)))
    ax[i].axvline(antenna[i]) #at the antenna position a vertical line is plotted
    grid(True)
    #colorbar
    cbaxes = fig.add_axes([0.93, 0.2, 0.01, 0.6]) #[left, bottom, width, height]
    cbar = plt.colorbar(img, cax = cbaxes, orientation='vertical')
    tick_locator = ticker.MaxNLocator(nbins=3)
    cbar.locator = tick_locator
    cbar.ax.yaxis.set_major_locator(matplotlib.ticker.AutoLocator())
    cbar.set_label('Intensity',
                   #fontsize=12
                   )
    cbar.update_ticks()
    height=height+size[i,1]
plt.show()
And here is the result....
Do you have any ideas?
What about using matplotlib.gridspec.GridSpec? Docs.
You could try something like
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
gs = GridSpec(8, 39)
ax1 = plt.subplot(gs[:6, :35])
ax2 = plt.subplot(gs[6:, :])
data1 = np.random.rand(6, 35)
data2 = np.random.rand(2, 39)
ax1.imshow(data1)
ax2.imshow(data2)
plt.show()
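Extending that idea to the five areas from the question, one possible sketch is below. The sizes are copied from the question's size array; everything else is an assumption made for illustration: the coarse grid with cells of 50 data units, the gap of 3 empty rows between panels, the random placeholder images, and the use of aspect="auto" so that each image fills its grid span.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# (width, height) of each simulation area, copied from the question's size array
sizes = [(3983, 229), (3933, 350), (3854, 454), (3750, 533), (3500, 600)]

unit = 50  # one grid cell covers 50 data units (assumed coarseness)
gap = 3    # empty grid rows between panels, leaving room for tick labels
cells = [(int(np.ceil(sx / unit)), int(np.ceil(sy / unit))) for sx, sy in sizes]
n_cols = max(w for w, _ in cells)
n_rows = sum(h for _, h in cells) + gap * (len(cells) - 1)

fig = plt.figure(figsize=(10, 9))
# No extra spacing, so a span of w columns is exactly w cell widths wide
gs = GridSpec(n_rows, n_cols, figure=fig, wspace=0, hspace=0)

rng = np.random.default_rng(0)
axes = []
row = 0
for (sx, sy), (w, h) in zip(sizes, cells):
    # Every panel starts in column 0, so all panels are left-aligned
    ax = fig.add_subplot(gs[row:row + h, 0:w])
    data = rng.random((32, 128))  # placeholder for the loaded matrices
    ax.imshow(data, extent=(0, sx, 0, sy), origin="lower", aspect="auto")
    # Limits in whole cells keep the same data-units-per-cell scale on every panel
    ax.set_xlim(0, w * unit)
    ax.set_ylim(0, h * unit)
    ax.set_xticks(np.arange(0, w * unit + 1, 500))
    ax.grid(True)
    axes.append(ax)
    row += h + gap
```

Because every panel uses the same number of data units per grid column, the x gridlines of different panels fall at the same horizontal positions. The x and y scales are not forced to be equal here; that would additionally require choosing the figure size to match the grid.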
