I am plotting a density map of ~40k points, but hist2d returns a uniform density map. This is my code:
hist2d(x, y, bins=(1000, 1000), cmap=plt.cm.jet)
Here is the scatter plot
Here is the histogram
I was expecting a red horizontal band in the center that gradually turns blue towards higher and lower y values.
EDIT:
@bb1 suggested decreasing the number of bins, but after setting bins=(100, 1000) I get this result:
I think you are specifying too many bins. Setting bins=(1000, 1000) gives you 1,000,000 bins. With 40,000 points, most of the bins will be empty, and they overwhelm the image.
You may also consider using seaborn's kdeplot() function instead of plt.hist2d(). It visualizes the density of the data without subdividing it into bins:
import seaborn as sns
sns.kdeplot(x=x, y=y, levels = 100, fill=True, cmap="mako", thresh=0)
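To make the "too many bins" point concrete, here is a minimal sketch that counts how many bins stay empty for a fine and a coarse grid. The data are synthetic stand-ins (uniform x, normally distributed y), not the asker's points, and the helper name empty_fraction is made up for this illustration.

```python
import numpy as np

# Synthetic stand-in for the ~40k points in the question (an assumption).
rng = np.random.default_rng(0)
n_points = 40_000
x = rng.uniform(0.0, 1.0, n_points)
y = rng.normal(0.0, 1.0, n_points)


def empty_fraction(x, y, bins):
    """Return the fraction of 2D histogram bins that contain no points."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return float(np.mean(counts == 0))


# 1000 x 1000 = 1,000,000 bins for 40,000 points: at least 96% must be empty.
fine_empty = empty_fraction(x, y, (1000, 1000))
# 50 x 50 = 2,500 bins: far fewer empty bins, so the density structure shows.
coarse_empty = empty_fraction(x, y, (50, 50))

print(f"empty bins at 1000x1000: {fine_empty:.1%}")
print(f"empty bins at 50x50:     {coarse_empty:.1%}")
```

The same bins tuple can be passed to plt.hist2d; a grid on the order of 50 to 100 bins per axis is a reasonable starting point for this many points, to be tuned by eye.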
I am using matplotlib.pyplot.specgram and matplotlib.pyplot.pcolormesh to make spectrogram plots of a seismic signal.
Background information: the reason for using pcolormesh is that I need to do arithmetic on the spectrogram data array and then replot the resulting spectrogram (for a three-component seismogram, i.e. east, north and vertical, I need to work out the horizontal spectral magnitude and divide the vertical spectra by the horizontal spectra). It is easier to do this using the spectrogram array data than on individual amplitude spectra.
I have found that the plots of the spectrograms after doing my arithmetic have unexpected values. Upon further investigation, it turns out that the spectrogram plot made using the pyplot.specgram method has different values compared to the spectrogram plot made using pyplot.pcolormesh and the data array returned by pyplot.specgram. Both plots/arrays should contain the same values, and I cannot work out why they do not.
Example:
The plot of
plt.subplot(513)
PxN, freqsN, binsN, imN = plt.specgram(trN.data, NFFT = 20000, noverlap = 0, Fs = trN.stats.sampling_rate, detrend = 'mean', mode = 'magnitude')
plt.title('North')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.clim(0, 150)
plt.colorbar()
#np.savetxt('PxN.txt', PxN)
looks different to the plot of
plt.subplot(514)
plt.pcolormesh(binsZ, freqsZ, PxN)
plt.clim(0,150)
plt.colorbar()
even though the "PxN" data array (that is, the spectrogram data values for each segment) is generated by the first method and re-used in the second.
Is anyone aware why this is happening?
P.S. I realise that my value for NFFT is not a square number, but it's not important at this stage of my coding.
P.P.S. I am not aware of what the "imN" array (fourth returned variable from pyplot.specgram) is and what it is used for....
First off, let's show a stand-alone example of what you're describing:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Brownian noise sequence
x = np.random.normal(0, 1, 10000).cumsum()
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(8, 10))
values, ybins, xbins, im = ax1.specgram(x, cmap='gist_earth')
ax1.set(title='Specgram')
fig.colorbar(im, ax=ax1)
mesh = ax2.pcolormesh(xbins, ybins, values, cmap='gist_earth')
ax2.axis('tight')
ax2.set(title='Raw Plot of Returned Values')
fig.colorbar(mesh, ax=ax2)
plt.show()
Magnitude Differences
You'll immediately notice the difference in magnitude of the plotted values.
By default, plt.specgram doesn't plot the "raw" values it returns. Instead, it scales them to decibels. In the default 'psd' mode used in this example, that means it plots 10 * log10 of the returned power values; with mode='magnitude', as in your code, the factor is 20 instead of 10. If you'd like it not to scale things, you'll need to specify scale="linear". However, for looking at frequency composition, a log scale is going to make the most sense.
With that in mind, let's mimic what specgram does:
plotted = 10 * np.log10(values)
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, plotted, cmap='gist_earth')
ax.axis('tight')
ax.set(title='Plot of $10 * log_{10}(values)$')
fig.colorbar(mesh)
plt.show()
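As a quick sanity check of the decibel explanation, the sketch below compares the data stored in the image that specgram draws against 10 * log10 of the returned values. It also touches on the P.P.S. in the question: the fourth return value is the image object created by the plot, which is what you pass to colorbar. The Agg backend and the random test signal are assumptions made so the snippet runs anywhere; only value ranges are compared, so the check does not depend on how the image rows are ordered.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Brownian-noise test signal (a stand-in, not the seismic trace).
rng = np.random.default_rng(1)
signal = rng.normal(0, 1, 10000).cumsum()

fig, ax = plt.subplots()
# Default mode is 'psd', so the default scaling is 10 * log10(values).
values, freqs, times, im = ax.specgram(signal)

# Data actually held by the plotted image versus the scaled return values.
plotted = np.asarray(im.get_array())
expected = 10 * np.log10(values)

same_range = np.allclose(
    [plotted.min(), plotted.max()],
    [expected.min(), expected.max()],
)
print(type(im).__name__, same_range)
```

For arithmetic on spectra (such as a vertical-to-horizontal ratio), work on the returned values array and apply the log scaling only when plotting.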
Using a Log Color Scale Instead
Alternatively, we could use a log norm on the image and get a similar result, but communicate that the color values are on a log scale more clearly:
from matplotlib.colors import LogNorm
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xbins, ybins, values, cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()
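To see what the log norm does to individual values, here is a small sketch. The vmin and vmax values are chosen purely for illustration; in the plot above LogNorm() picks them from the data.

```python
from matplotlib.colors import LogNorm

# Illustrative limits (assumed): two decades, from 1 to 100.
norm = LogNorm(vmin=1.0, vmax=100.0)

# A norm maps data values onto the 0..1 range that the colormap consumes.
# With a log norm, equal ratios in the data become equal steps in color.
low, mid, high = (float(norm(v)) for v in (1.0, 10.0, 100.0))
print(low, mid, high)
```

The value 10 lands halfway along the color scale because it sits halfway between 1 and 100 in log terms. The colorbar ticks still show the original data values, which is why this approach communicates the log scaling more clearly.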
imshow vs pcolormesh
Finally, note that the examples we've shown have had no interpolation applied, while the original specgram plot did. specgram uses imshow, while we've been plotting with pcolormesh. In this case (regular grid spacing) we can use either.
Both imshow and pcolormesh are very good options in this case. However, imshow will have significantly better performance if you're working with a large array. Therefore, you might consider using it instead, even if you don't want interpolation (pass interpolation='nearest' to turn interpolation off).
As an example:
extent = [xbins.min(), xbins.max(), ybins.min(), ybins.max()]
fig, ax = plt.subplots()
mesh = ax.imshow(values, extent=extent, origin='lower', aspect='auto',
                 cmap='gist_earth', norm=LogNorm())
ax.axis('tight')
ax.set(title='Log Normalized Plot of Values')
fig.colorbar(mesh)
plt.show()
I'm trying to plot 3 figures of the normal distribution of publications but I am only getting one good figure (UK). The remaining two (USA and JAPAN) have a normal curve that is incomplete.
I fitted the curves to histograms so you could say that each figure needs to hold two graphs, i.e. a histogram and a Gaussian distribution.
Please take a look at a part of my code and let me know how to fix this.
I am very open to suggestions, thanks.
My Matplotlib figures: fitted distribution, fitted distribution, fitted distribution
for item in totalIPs:
    USA=totalIPs[18]
    JAPAN=totalIPs[10]
    UK=totalIPs[17]
    AUSTRALIA=totalIPs[0]
#print(USA)
#print(JAPAN)
#print(UK)
#print(AUSTRALIA)
#print('done')
#print(country)
#print(ipFirmnames)
#print(totalIPs)
#print("done")
#Calculating mean and standard deviation
#from sublists in country list of lists
#i could write a function for this but dont know how
mu_USA=statistics.mean(USA)
mu_JAPAN=statistics.mean(JAPAN)
mu_UK=statistics.mean(UK)
std_USA=statistics.stdev(USA)
std_JAPAN=statistics.stdev(JAPAN)
std_UK=statistics.stdev(UK)
plt.figure(1)
plt.hist(USA, bins=10, normed=True, alpha=0.6, color='g')
plt.figure(2)
plt.hist(JAPAN,bins=10,normed=True,alpha=0.6, color ='g')
plt.figure(3)
plt.hist(UK, bins=10,normed=True, alpha=0.6, color = 'g')
standardize_USA=(np.array(USA)-mu_USA)/std_USA
standardize_JAPAN=(np.array(JAPAN)-mu_JAPAN)/std_JAPAN
standardize_UK=(np.array(UK)-mu_UK)/std_UK
xmin, xmax = plt.xlim()
x1=np.linspace(xmin, xmax, 100)
x2=np.linspace(xmin, xmax, 100)
x3=np.linspace(xmin, xmax, 100)
fitted_pdf_USA=ss.norm.pdf(x1,mu_USA, std_USA)
fitted_pdf_JAPAN=ss.norm.pdf(x3,mu_JAPAN, std_JAPAN)
fitted_pdf_UK=ss.norm.pdf(x3,mu_UK, std_UK)
plt.figure(1)
plt.plot(x1, fitted_pdf_USA, 'K', linewidth=2)
plt.figure(2)
plt.plot(x2, fitted_pdf_JAPAN,'K', linewidth=2)
fitted_pdf_JAPAN=ss.norm.pdf(x2,mu_JAPAN, std_JAPAN)
plt.figure(3)
plt.plot(x3, fitted_pdf_UK,'K', linewidth=2)
#plt.show()
print(standardize_USA)
print(standardize_JAPAN)
#print(USA)
print(UK)
print(JAPAN)
The problem is that the x-limits used for all three curves come from only one plot. The call
xmin, xmax = plt.xlim()
reads the limits of the current figure, which at that point is figure 3 (UK), so the USA and JAPAN curves are evaluated over the UK range.
Build the x-range for every plot from its own data rather than from the graph limits, and it will solve your issue. Use max() and min() from numpy (your data are plain lists, so call the functions rather than the array methods):
x1 = np.linspace(np.min(USA), np.max(USA), 100)
Do this for every plot with its respective data. This gives smooth curves whose limits follow the data, not the graph. If the range turns out too small, widen it by a multiplication (e.g. 1.1*max()) or a sum (max()+10; what to do with min depends on the data).
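The per-dataset limits can be wrapped in a small helper, which also addresses the "I could write a function for this" comment in the question. This is only a sketch: the function name fitted_normal_curve, the 10% padding, and the sample numbers are all invented for illustration, and the normal pdf is written out with numpy so the snippet has no other dependencies (ss.norm.pdf from the question gives the same curve).

```python
import numpy as np


def fitted_normal_curve(data, n_points=100, pad=0.1):
    """Return (x, pdf) for a normal curve fitted to data, on the data's own range."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    sigma = data.std(ddof=1)  # sample standard deviation, like statistics.stdev

    # Limits come from this dataset only, widened a little on both sides.
    span = data.max() - data.min()
    x = np.linspace(data.min() - pad * span, data.max() + pad * span, n_points)

    # Normal probability density with the fitted mean and standard deviation.
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return x, pdf


# Made-up publication counts standing in for the real totalIPs sublists.
samples = {
    "USA": [12, 15, 9, 22, 30, 18, 25, 14],
    "JAPAN": [40, 55, 38, 61, 47, 52, 44],
    "UK": [5, 7, 6, 9, 4, 8, 6],
}

curves = {name: fitted_normal_curve(values) for name, values in samples.items()}
for name, (x, pdf) in curves.items():
    print(name, round(x[0], 2), round(x[-1], 2))
```

Each (x, pdf) pair can then be drawn on its own figure with plt.plot(x, pdf, 'k', linewidth=2) on top of the matching histogram. Note that recent Matplotlib releases expect density=True in plt.hist where the question uses normed=True.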
I have only the angle values for a set of data. Now I need to plot an angle distribution curve, i.e. angle on the x axis versus the number of times (frequency) each angle occurs on the y axis.
These are the sorted angles for a set of data:
[98.1706427, 99.09896751, 99.10879006, 100.47518838, 101.22770381, 101.70374296,
103.15715294, 104.4653976,105.50441485, 106.82885361, 107.4605319, 108.93228646,
111.22463712, 112.23658018, 113.31223886, 113.4000603, 114.14565594, 114.79809084,
115.15788861, 115.42991416, 115.66216071, 115.69821092, 116.56319054, 117.09232139,
119.30835385, 119.31377834, 125.88278338, 127.80937901, 132.16187185, 132.61262906,
136.6751744, 138.34164387,]
How can I do this? How can I write a Python program that plots it as a distribution curve?
The hist function returns the bin counts and the bin edges. You can use these to prepare the data for the line plot:
y, x, _ = plt.hist(angles) # No need for the 3rd return value
xc = (x[:-1] + x[1:]) / 2 # Take centerpoints
# plt.clf()
plt.plot(xc, y)
plt.show() # Etc.
You will end up having both the histogram and the line plot. If this is not desirable, clean the canvas before plotting the line by uncommenting the call to clf().
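The reason for taking centerpoints is that there is always one more bin edge than there are bins, so counts and edges cannot be plotted against each other directly. A short sketch with np.histogram, which computes the same counts and edges without drawing anything, using the angles from the question:

```python
import numpy as np

angles = [98.1706427, 99.09896751, 99.10879006, 100.47518838, 101.22770381,
          101.70374296, 103.15715294, 104.4653976, 105.50441485, 106.82885361,
          107.4605319, 108.93228646, 111.22463712, 112.23658018, 113.31223886,
          113.4000603, 114.14565594, 114.79809084, 115.15788861, 115.42991416,
          115.66216071, 115.69821092, 116.56319054, 117.09232139, 119.30835385,
          119.31377834, 125.88278338, 127.80937901, 132.16187185, 132.61262906,
          136.6751744, 138.34164387]

# Default is 10 bins: 10 counts but 11 edges.
counts, edges = np.histogram(angles)

# Midpoint of each bin, so there is one x value per count.
centers = (edges[:-1] + edges[1:]) / 2

print(len(counts), len(edges), len(centers))
print(counts.sum())  # every angle falls into exactly one bin
```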
EDIT:
If you want a line plot as well, it is better to generate the histogram with numpy and then use that information also for the line:
from matplotlib import pyplot as plt
import numpy as np
angles = [98.1706427, 99.09896751, 99.10879006, 100.47518838, 101.22770381,
101.70374296, 103.15715294, 104.4653976, 105.50441485, 106.82885361,
107.4605319, 108.93228646, 111.22463712, 112.23658018, 113.31223886,
113.4000603, 114.14565594, 114.79809084, 115.15788861, 115.42991416,
115.66216071, 115.69821092, 116.56319054, 117.09232139, 119.30835385,
119.31377834, 125.88278338, 127.80937901, 132.16187185, 132.61262906,
136.6751744, 138.34164387, ]
hist,edges = np.histogram(angles, bins=20)
bin_centers = 0.5*(edges[:-1] + edges[1:])
bin_widths = (edges[1:]-edges[:-1])
plt.bar(bin_centers,hist,width=bin_widths)
plt.plot(bin_centers, hist,'r')
plt.xlabel(r'angle [$^\circ$]')  # raw string so the backslash reaches mathtext intact
plt.ylabel('frequency')
plt.show()
The result looks like this:
If you are not interested in the histogram itself, leave out the line plt.bar(bin_centers,hist,width=bin_widths).
EDIT2:
I don't really see the scientific value in a smoothed histogram. If you increase the resolution of the histogram (the bins parameter in the np.histogram command), it can change quite considerably. For instance, new peaks may occur if you increase the bin count, or two peaks may merge into one if you decrease the bin count. Keeping this in mind, smoothing the histogram curve suggests that you have more data than you do. However, if you really must, you can smooth a curve as explained in this answer, i.e.
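# scipy.interpolate.spline has been removed from SciPy; make_interp_spline replaces it
from scipy.interpolate import make_interp_spline
# evaluate between the first and last bin center to avoid extrapolating
x = np.linspace(bin_centers[0], bin_centers[-1], 500)
y = make_interp_spline(bin_centers, hist, k=3)(x)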
and then plot y over x.