I'm trying to plot 3 figures of the normal distribution of publications but I am only getting one good figure (UK). The remaining two (USA and JAPAN) have a normal curve that is incomplete.
I fitted the curves to histograms so you could say that each figure needs to hold two graphs, i.e. a histogram and a Gaussian distribution.
Please take a look at a part of my code and let me know how to fix this.
I am very open to suggestions, thanks.
My Matplotlib figures: [three images, each captioned "fitted distribution"]
for item in totalIPs:
    USA=totalIPs[18]
    JAPAN=totalIPs[10]
    UK=totalIPs[17]
    AUSTRALIA=totalIPs[0]
#print(USA)
#print(JAPAN)
#print(UK)
#print(AUSTRALIA)
#print('done')
#print(country)
#print(ipFirmnames)
#print(totalIPs)
#print("done")
#Calculating mean and standard deviation
#from sublists in country list of lists
#i could write a function for this but dont know how
mu_USA=statistics.mean(USA)
mu_JAPAN=statistics.mean(JAPAN)
mu_UK=statistics.mean(UK)
std_USA=statistics.stdev(USA)
std_JAPAN=statistics.stdev(JAPAN)
std_UK=statistics.stdev(UK)
plt.figure(1)
plt.hist(USA, bins=10, normed=True, alpha=0.6, color='g')
plt.figure(2)
plt.hist(JAPAN,bins=10,normed=True,alpha=0.6, color ='g')
plt.figure(3)
plt.hist(UK, bins=10,normed=True, alpha=0.6, color = 'g')
standardize_USA=(np.array(USA)-mu_USA)/std_USA
standardize_JAPAN=(np.array(JAPAN)-mu_JAPAN)/std_JAPAN
standardize_UK=(np.array(UK)-mu_UK)/std_UK
xmin, xmax = plt.xlim()
x1=np.linspace(xmin, xmax, 100)
x2=np.linspace(xmin, xmax, 100)
x3=np.linspace(xmin, xmax, 100)
fitted_pdf_USA=ss.norm.pdf(x1,mu_USA, std_USA)
fitted_pdf_JAPAN=ss.norm.pdf(x3,mu_JAPAN, std_JAPAN)
fitted_pdf_UK=ss.norm.pdf(x3,mu_UK, std_UK)
plt.figure(1)
plt.plot(x1, fitted_pdf_USA, 'K', linewidth=2)
plt.figure(2)
plt.plot(x2, fitted_pdf_JAPAN,'K', linewidth=2)
fitted_pdf_JAPAN=ss.norm.pdf(x2,mu_JAPAN, std_JAPAN)
plt.figure(3)
plt.plot(x3, fitted_pdf_UK,'K', linewidth=2)
#plt.show()
print(standardize_USA)
print(standardize_JAPAN)
#print(USA)
print(UK)
print(JAPAN)
The problem is that the x-limits for all three curves come from a single call, which only reads the limits of whichever figure is current at that point (figure 3, the UK one):
xmin, xmax = plt.xlim()
Build the x-range for each plot from its own data rather than from the axis limits, and it will solve your issue. Use NumPy's min() and max() for this (USA is a plain list, so it has no .min() method of its own):
x1=np.linspace(np.min(USA),np.max(USA),100)
Do this for every plot with its respective data. This gives smooth curves whose range follows the data rather than the axis. If the range turns out too narrow, widen it by scaling (e.g. 1.1*max()) or by adding a margin (e.g. max()+10; the adjustment for min depends on the data).
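A minimal sketch of that advice, wrapped in a helper so it can be reused for each country. The function name fitted_curve, the pad fraction and the sample numbers are assumptions for illustration only; the normal density is written out with NumPy instead of calling ss.norm.pdf, which should give the same values.

```python
import statistics

import numpy as np


def fitted_curve(data, pad=0.1, n=100):
    """Return (x, pdf) for a normal curve fitted to `data`.

    The x-range comes from the data itself, widened by `pad` times the
    data span on each side, so it does not depend on any figure's limits.
    """
    mu = statistics.mean(data)
    std = statistics.stdev(data)
    span = max(data) - min(data)
    x = np.linspace(min(data) - pad * span, max(data) + pad * span, n)
    # Normal density written out explicitly (same formula as ss.norm.pdf(x, mu, std)).
    pdf = np.exp(-0.5 * ((x - mu) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    return x, pdf


# Made-up publication counts, only to exercise the helper.
sample = [3, 5, 8, 12, 12, 15, 20, 31]
x, pdf = fitted_curve(sample)
```

Each figure could then do x1, pdf1 = fitted_curve(USA) followed by plt.plot(x1, pdf1, 'k', linewidth=2) right after its own plt.hist(...) call.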
I have a question about seaborn kdeplot. In histplot one can set up which stats they want to have (counts, frequency, density, probability) and if used with the kde argument, it also applies to the kdeplot. However, I have not found a way how to change it directly in the kdeplot if I wanted to have just the kde plot estimation with probabilities. Alternatively, the same result should be coming from histplot if the bars were possible to be switched off, which I also have not found. So how can one do that?
To give some visual example, I would like to have just the red curve, ie. either pass an argument to kdeplot to use probabilities, or to remove the bars from histplot:
import matplotlib.pyplot as plt
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="probabilities")
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density")
plt.legend()
Thanks a lot.
The y-axis of a histplot with stat="probability" corresponds to the probability that a value belongs to a certain bar. The value of 0.23 for the highest bar, means that there is a probability of about 23% that a flipper length is between 189.7 and 195.6 mm (being the edges of that specific bin). Note that by default, 10 bins are spread out between the minimum and maximum value encountered.
The y-axis of a kdeplot is similar to a probability density function. The height of the curve is proportional to the approximate probability of a value being within a bin of width 1 of the corresponding x-value. A value of 0.031 for x=191 means there is a probability of about 3.1 % that the length is between 190.5 and 191.5.
Now, to directly get probability values next to a kdeplot, a bin width first needs to be chosen. The y-values can then be multiplied by that bin width to give the approximate probability of an x-value falling within a bin of that width. The PercentFormatter provides a way to show such a correspondence on the axis, using ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)).
The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
fig, ax1 = plt.subplots()
penguins = sns.load_dataset("penguins")
binwidth = 5
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
binwidth=binwidth, ax=ax1)
ax2 = ax1.twinx()
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth) # similar limits on the y-axis to align the plots
ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
PS: To only show the kdeplot with a probability, the code could be:
binwidth = 5
ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax.set_ylabel(f'Probability for a bin width of {binwidth}')
Another option could be to draw a histplot with kde=True, and remove the generated bars. To be interpretable, a binwidth should be set. With binwidth=1 you'd get the same y-axis as a density plot. (kde_kws={'cut': 3} lets the kde smoothly go to about zero; by default the kde curve is cut off at the minimum and maximum of the data.)
ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
ax.containers[0].remove() # remove the bars
ax.relim() # the axis limits need to be recalculated without the bars
ax.autoscale_view()
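The relation between density and per-bin probability used above can be checked numerically. This is a small sketch on synthetic data (the numbers are stand-ins, not the penguins dataset), using only NumPy histograms rather than seaborn:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "flipper lengths" in mm; placeholder data for illustration.
lengths = rng.normal(loc=200, scale=14, size=1000)

binwidth = 5
edges = np.arange(lengths.min(), lengths.max() + binwidth, binwidth)

# density=True gives heights whose *area* sums to 1 (like a KDE curve).
density, _ = np.histogram(lengths, bins=edges, density=True)
# Multiplying by the bin width turns each height into a per-bin probability.
probability = density * binwidth

# Reference: fraction of samples that fall in each bin.
counts, _ = np.histogram(lengths, bins=edges)
fraction = counts / counts.sum()
```

Here probability and fraction agree bin by bin, which is the same scaling that PercentFormatter(1 / binwidth) applies to the tick labels.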
I have plotted a graph in Python with a subplot of residuals and am trying to find a way to add a histogram of the residuals at the end of the residual plot. I would also like to add a grey band on the residual plot showing 1 standard deviation.
Also, is there a way to remove the top and right-hand borders of the plot?
Here is a copy of the code and the graph I currently have.
fig1 = pyplot.figure(figsize =(9.6,7.2))
plt.frame1 =fig1.add_axes((0.2,0.4,.75,.6))
pyplot.errorbar(xval, yval*1000, yerr=yerr*1000, xerr=xerr, marker='x', linestyle='None')
# Axis labels
pyplot.xlabel('Height (m)', fontsize = 12)
pyplot.ylabel('dM/dt (g $s^{-1}$)', fontsize = 12)
# Generate best fit line using model function and best fit parameters, and add to plot
fit_line=model_funct(xval, [a_soln, b_soln])
pyplot.plot(xval, fit_line*1000)
# Set suitable axis limits: you will probably need to change these...
#pyplot.xlim(-1, 61)
#pyplot.ylim(65, 105)
# pyplot.show()
plt.frame2 = fig1.add_axes((0.2,0.2,.75,.2)) #start frame1 at 0.2, 0.4
plt.xlabel("Height of Water (m)", fontsize = 12)
plt.ylabel("Normalised\nResiduals", fontsize = 12) #\n is used to start a new line
plt.plot(h,normalised_residuals,"x", color = "green")
plt.axhline(0, linewidth=1, linestyle="--", color="black")
plt.savefig("Final Graph.png", dpi = 500)
The naming in your code is a bit unusual and I can't run it myself, so I'm only posting snippets. Sometimes you use pyplot and sometimes plt, which should be the same module. You should also keep a reference to each axis, like this: ax = fig1.add_axes((0.2,0.4,.75,.6)). Then, when you plot, call the method on the axis directly, i.e. use ax.errorbar().
To hide the borders of the axis in the top plot use:
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
Adding an error band in the bottom plot is pretty easy to do. Just calculate the mean and standard deviation using np.mean() and np.std(). Afterwards, call
plt.fill_between(h, y1=np.mean(normalised_residuals) - np.std(normalised_residuals),
y2=np.mean(normalised_residuals) + np.std(normalised_residuals),
color='gray', alpha=.5)
and change the color and alpha however you want it to be.
For the histogram projection you just add another axis like you've done it two times before (let's assume it is called ax) and call
ax.hist(normalised_residuals, bins=8, orientation="horizontal")
Here, bins should probably be set to a small value, since you don't have that many data points.
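Putting the snippets together, a rough sketch of the whole layout might look like the following. The data is random placeholder data standing in for h and normalised_residuals, and the axis names (ax_main, ax_res, ax_hist) and rectangle sizes are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data standing in for the question's h and normalised_residuals.
h = np.linspace(0.1, 1.0, 20)
normalised_residuals = rng.normal(0, 1, h.size)

fig = plt.figure(figsize=(9.6, 7.2))
ax_main = fig.add_axes((0.2, 0.4, 0.6, 0.55))                  # data and fit would go here
ax_res = fig.add_axes((0.2, 0.2, 0.6, 0.2), sharex=ax_main)    # residuals
ax_hist = fig.add_axes((0.8, 0.2, 0.15, 0.2), sharey=ax_res)   # histogram at the end

# Hide the top and right borders of the main plot.
ax_main.spines["top"].set_visible(False)
ax_main.spines["right"].set_visible(False)

# Residuals with a grey band of one standard deviation around the mean.
mean, std = np.mean(normalised_residuals), np.std(normalised_residuals)
ax_res.plot(h, normalised_residuals, "x", color="green")
ax_res.axhline(0, linewidth=1, linestyle="--", color="black")
ax_res.fill_between(h, mean - std, mean + std, color="gray", alpha=0.5)

# Horizontal histogram that shares the residual axis.
ax_hist.hist(normalised_residuals, bins=8, orientation="horizontal")
ax_hist.tick_params(labelleft=False)
```

Sharing the y-axis between ax_res and ax_hist keeps the histogram bars lined up with the residual values; the figure can be written out with fig.savefig(...) as in the question.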
I am using matplotlib.pyplot.specgram and matplotlib.pyplot.pcolormesh to make spectrogram plots of a seismic signal.
Background information: the reason for using pcolormesh is that I need to do arithmetic on the spectrogram data array and then replot the resulting spectrogram (for a three-component seismogram - east, north and vertical - I need to work out the horizontal spectral magnitude and divide the vertical spectra by the horizontal spectra). It is easier to do this using the spectrogram array data than on individual amplitude spectra.
I have found that the plots of the spectrograms after doing my arithmetic have unexpected values. Upon further investigation it turns out that the spectrogram plot made using the pyplot.specgram method has different values compared to the spectrogram plot made using pyplot.pcolormesh and the returned data array from the pyplot.specgram method. Both plots/arrays should contain the same values, and I cannot work out why they do not.
Example:
The plot of
plt.subplot(513)
PxN, freqsN, binsN, imN = plt.specgram(trN.data, NFFT = 20000, noverlap = 0, Fs = trN.stats.sampling_rate, detrend = 'mean', mode = 'magnitude')
plt.title('North')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.clim(0, 150)
plt.colorbar()
#np.savetxt('PxN.txt', PxN)
looks different to the plot of
plt.subplot(514)
plt.pcolormesh(binsZ, freqsZ, PxN)
plt.clim(0,150)
plt.colorbar()
even though the "PxN" data array (that is, the spectrogram data values for each segment) is generated by the first method and re-used in the second.
Is anyone aware why this is happening?
P.S. I realise that my value for NFFT is not a square number, but it's not important at this stage of my coding.
P.P.S. I am not aware of what the "imN" array (fourth returned variable from pyplot.specgram) is and what it is used for....
First off, let's show an example of what you're describing:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Brownian noise sequence
x = np.random.normal(0, 1, 10000).cumsum()
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(8, 10))
values, ybins, xbins, im = ax1.specgram(x, cmap='gist_earth')
ax1.set(title='Specgram')
fig.colorbar(im, ax=ax1)
mesh = ax2.pcolormesh(xbins, ybins, values, cmap='gist_earth')
ax2.axis('tight')
ax2.set(title='Raw Plot of Returned Values')
fig.colorbar(mesh, ax=ax2)
plt.show()
Magnitude Differences
You'll immediately notice the difference in magnitude of the plotted values.
By default, plt.specgram doesn't plot the "raw" values it returns. Instead, it scales them to decibels (in other words, with the default mode it plots 10 * log10 of the returned values; with an amplitude mode such as mode='magnitude', as in the question, the dB scaling is 20 * log10 instead). If you'd like it not to scale things, you'll need to specify scale="linear". However, for looking at frequency composition, a log scale is going to make the most sense.
With that in mind, let's mimic what specgram does:
plotted = 10 * np.log10(values)
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, plotted, cmap='gist_earth')
ax.axis('tight')
ax.set(title='Plot of $10 * log_{10}(values)$')
fig.colorbar(mesh)
plt.show()
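The factor of 10 above matches specgram's default power-spectral-density mode. For amplitude-like output such as mode='magnitude' (which the question uses), Matplotlib's documentation describes the dB scaling as 20 * log10. A small sketch of both conversions; the helper name to_db is hypothetical and not part of Matplotlib:

```python
import numpy as np


def to_db(values, mode="psd"):
    """Convert linear spectrogram values to decibels.

    Power-like values ('psd') use 10 * log10; amplitude-like values
    (for example 'magnitude') use 20 * log10.
    """
    factor = 10.0 if mode == "psd" else 20.0
    return factor * np.log10(values)


linear = np.array([1.0, 10.0, 100.0])
db_power = to_db(linear, mode="psd")            # 0, 10 and 20 dB
db_amplitude = to_db(linear, mode="magnitude")  # 0, 20 and 40 dB
```

Passing to_db(values, mode=...) to pcolormesh in place of the raw array should then give colors comparable to what specgram itself draws for that mode.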
Using a Log Color Scale Instead
Alternatively, we could use a log norm on the image and get a similar result, but communicate that the color values are on a log scale more clearly:
from matplotlib.colors import LogNorm
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, values, cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()
imshow vs pcolormesh
Finally, note that the examples we've shown have had no interpolation applied, while the original specgram plot did. specgram uses imshow, while we've been plotting with pcolormesh. In this case (regular grid spacing) we can use either.
Both imshow and pcolormesh are very good options in this case. However, imshow will have significantly better performance if you're working with a large array. Therefore, you might consider using it instead, even if you don't want interpolation (e.g. pass interpolation='nearest' to turn off interpolation).
As an example:
extent = [xbins.min(), xbins.max(), ybins.min(), ybins.max()]
fig, ax = plt.subplots()
mesh = ax.imshow(values, extent=extent, origin='lower', aspect='auto',
cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()
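One detail when swapping pcolormesh for imshow: extent describes the outer edges of the image, so if the bin arrays are treated as cell centers, using their min and max directly shifts every cell by half a step. A minimal sketch of padding by half a cell, assuming evenly spaced centers (the helper name centers_to_extent and the demo arrays are hypothetical; whether to pad the frequency axis as well is a judgment call):

```python
import numpy as np


def centers_to_extent(x_centers, y_centers):
    """Return [xmin, xmax, ymin, ymax] covering whole cells.

    Assumes both coordinate arrays are evenly spaced cell centers with
    at least two entries each.
    """
    dx = x_centers[1] - x_centers[0]
    dy = y_centers[1] - y_centers[0]
    return [x_centers[0] - dx / 2, x_centers[-1] + dx / 2,
            y_centers[0] - dy / 2, y_centers[-1] + dy / 2]


# Small made-up grids, only to show the padding.
xbins_demo = np.array([0.5, 1.5, 2.5])
ybins_demo = np.array([0.0, 10.0, 20.0, 30.0])
extent_demo = centers_to_extent(xbins_demo, ybins_demo)
```

The result could be passed as extent=centers_to_extent(xbins, ybins) in the imshow call above.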
from mayavi import mlab
figure = mlab.figure('DensityPlot')
grid = mlab.pipeline.scalar_field(xi, yi, zi, density)
# Avoid shadowing the built-in min() and max()
vmin = density.min()
vmax = density.max()
mlab.pipeline.volume(grid, vmin=vmin, vmax=vmax)
mlab.axes()
mlab.show()
My code generates a 3d density plot. However the plot is very choppy. How does one increase the resolution? Am I supposed to increase the number of points, or change the interpolation procedure used by pipeline?