How to plot pdf at the same graph as the histogram - python

I wrote a function to plot a histogram of the following data(shortened)
data_1 =
[0.68417915 0.53041328 0.05499373 0.32483917 0.30501979 0.12136537
0.22964997 0.5837272 0.06000122 0.69908738 0.15690346 0.20363323
0.10390346 0.98658757 0.98359924 0.29493355 0.72561782 0.75613625
0.69628136 0.71322217 0.63060554 0.91118187 0.14915375 0.70929528
0.42408604 0.35388851 0.62253336 0.63676291 0.44358184 0.45063505
0.36477958 0.15807182 0.714753 0.96713497 0.4094859 0.56495619
0.57509395 0.9355384 0.46284749 0.67779101 0.92363017 0.05682404
0.89631817 0.52587218 0.79428246 0.14486141 0.31300898 0.10176549
0.21841843 0.25688406 0.55415834 0.84957183 0.76246304 0.98489949
0.3936749 0.51460251 0.50138111 0.36060756 0.44854838 0.3919771
0.05113578 0.23980216 0.96111616 0.05969004 0.63652018 0.77869691
0.74565952 0.53789898 0.8876854 0.02370424 0.75647449 0.1494505
0.56362217 0.84942793 0.75265825 0.43319662 0.1012875 0.09946243
0.69463561 0.46931918 0.12913483 0.22142044 0.77253391 0.1691685
0.41114265 0.011321 0.41941435 0.28070956 0.65810948 0.58770776
0.68763623 0.36828773 0.70466821 0.8332811 0.12652526 0.16867114
0.59106388 0.56926637 0.87954323 0.62176163 7.35566843e-01
1.00146415e-01 6.68137620e-01 4.39246138e-01
3.75875260e-01 2.12544712e-02 3.68062161e-01 5.35692768e-01
6.50231419e-01 7.51573475e-01 1.43792206e-01 3.51057868e-01
1.77127799e-03 9.88480387e-01 8.73988015e-01 3.78791845e-01
5.89179323e-01 4.05978444e-01 6.88178816e-01 8.73515486e-01
3.66033185e-01 7.98291151e-01 2.30921252e-01 8.68201375e-04
4.92515713e-01 4.56100036e-01 5.66357689e-01 1.18801303e-01
8.15197293e-01 1.90998886e-02 4.91136435e-01 4.90613456e-01
1.31219088e-01 8.44170500e-01 1.72284226e-01 9.48296215e-01
7.36638954e-01 2.23674369e-01 7.46383520e-02 1.56815967e-01
6.14167905e-02 9.55175567e-01 1.74517808e-01 6.16529512e-01
7.02704931e-01 2.17204373e-01 6.78545848e-01 8.99756168e-01
5.28857712e-01 8.34009864e-01 5.87747412e-01 9.01901813e-02
9.94429960e-01 8.20847209e-01 3.88627889e-01 7.99302264e-01
1.19291073e-01 3.92748464e-01 4.84674232e-01 6.86047613e-01
9.09811416e-01 4.11619033e-01 5.22738580e-01 7.87679969e-01
8.31886542e-01 5.75564445e-01 7.03306890e-01 4.37121850e-01
2.17908948e-01 9.27734103e-01 1.69151398e-01 1.02815443e-01
8.86529746e-01 9.12471508e-01 3.62394360e-02 5.75760637e-01
9.02910130e-01 9.46808438e-01 5.22324825e-01 7.41599515e-02
1.67554744e-01 9.67044492e-01 6.41305316e-02 2.02375526e-01
7.87664750e-01 4.10928526e-01 3.75066800e-01 1.02825038e-01
7.99960722e-01 5.15931793e-01 6.07891990e-01 4.22650890e-01
2.50692729e-01 4.76696332e-01 3.42881458e-01 4.56350909e-01
2.21493003e-02 9.22045389e-01 4.31748031e-01 3.67451551e-01]
And the following code
def plot_histo(data_list, bin_count):
plt.hist(data_list, bins=bin_count, density= True)
return plt.show()
plot_1 = plot_histo(data_1, 100)
I also want to plot the pdf of this distribution on the same graph, but i don't really know how to do that since I'm new to python! Any tips?

There are several methods to estimate the pdf from the samples.
One way is to use kernel density estimation, which can be done easily with sns.kdeplot.
Another way is to fit parameters of a known distribution, e.g. with scikit-learn GaussianMixture if you have reasons to think your data is gaussian.
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
from sklearn.mixture import GaussianMixture
num_samples = 1_000
mu = 3.14
sigma = 1.27
data = np.random.randn(num_samples) * sigma + mu
fig, ax = plt.subplots()
# Histogram
ax.hist(data, bins=10, density=True, label="Histogram")
# Kernel Density Estimation (KDE)
sns.kdeplot(data, bw_method=0.2, ax=ax, label="KDE")
# Gaussian fitting
gm = GaussianMixture(n_components=1).fit(data.reshape(-1, 1))
mu_ = gm.means_.item()
sigma_ = gm.covariances_.item()
xx = np.linspace(*ax.get_xlim(), num=100)
plt.plot(xx, stats.norm.pdf(xx, mu_, sigma_), label="Gaussian")
ax.legend()
plt.show()

Related

Spline in 3D can not be differentiated due to an AttributeError

I am trying to fit a smoothing B-spline to some data and I found this very helpful post on here. However, I not only need the spline, but also its derivatives, so I tried to add the following code to the example:
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
For some reason this does not seem to work due to some data type issues. I get the following traceback:
Traceback (most recent call last):
File "interpolate_point_trace.py", line 31, in spline_example
tck_der = interpolate.splder(tck, n=1)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/fitpack.py", line 657, in splder
return _impl.splder(tck, n)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 1206, in splder
sh = (slice(None),) + ((None,)*len(c.shape[1:]))
AttributeError: 'list' object has no attribute 'shape'
The reason for this seems to be that the second argument of the tck tuple contains a list of numpy arrays. I thought turning the input data to be a numpy array as well would help, but it does not change the data types of tck.
Does this behavior reflect an error in scipy, or is the input malformed?
I tried manually turning the list into an array:
tck[1] = np.array(tck[1])
but this (which didn't surprise me) also gave an error:
ValueError: operands could not be broadcast together with shapes (0,8) (7,1)
Any ideas of what the problem could be? I have used scipy before and on 1D splines the splder function works just fine, so I assume it has something to do with the spline being a line in 3D.
------- edit --------
Here is a minimum working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from mpl_toolkits.mplot3d import Axes3D
total_rad = 10
z_factor = 3
noise = 0.1
num_true_pts = 200
s_true = np.linspace(0, total_rad, num_true_pts)
x_true = np.cos(s_true)
y_true = np.sin(s_true)
z_true = s_true / z_factor
num_sample_pts = 80
s_sample = np.linspace(0, total_rad, num_sample_pts)
x_sample = np.cos(s_sample) + noise * np.random.randn(num_sample_pts)
y_sample = np.sin(s_sample) + noise * np.random.randn(num_sample_pts)
z_sample = s_sample / z_factor + noise * np.random.randn(num_sample_pts)
tck, u = interpolate.splprep([x_sample, y_sample, z_sample], s=2)
x_knots, y_knots, z_knots = interpolate.splev(tck[0], tck)
u_fine = np.linspace(0, 1, num_true_pts)
x_fine, y_fine, z_fine = interpolate.splev(u_fine, tck)
# this is the part of the code I inserted: the line under this causes the crash
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
# end of the inserted code
fig2 = plt.figure(2)
ax3d = fig2.add_subplot(111, projection='3d')
ax3d.plot(x_true, y_true, z_true, 'b')
ax3d.plot(x_sample, y_sample, z_sample, 'r*')
ax3d.plot(x_knots, y_knots, z_knots, 'go')
ax3d.plot(x_fine, y_fine, z_fine, 'g')
fig2.show()
plt.show()
Stumbled into the same problem...
I circumvented the error by using interpolate.splder(tck, n=1) and instead used interpolate.splev(spline_ev, tck, der=1) which returns the derivatives at the points spline_ev (see Scipy Doku).
If you need the spline I think you can then use interpolate.splprep() again.
In total something like:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
spline_ev = np.linspace(0.0, 1.0, 100, endpoint=True)
spline_points = interpolate.splev(spline_ev, tck)
# Calculate derivative
spline_der_points = interpolate.splev(spline_ev, tck, der=1)
spline_der = interpolate.splprep(spline_der_points.T, s=0, k=3, full_output=True)
# Plot the data and derivative
fig = plt.figure()
plt.plot(points[:,0], points[:,1], '.-', label="points")
plt.plot(spline_points[0], spline_points[1], '.-', label="tck")
plt.plot(spline_der_points[0], spline_der_points[1], '.-', label="tck_der")
# Show tangent
plt.arrow(spline_points[0][23]-spline_der_points[0][23], spline_points[1][23]-spline_der_points[1][23], 2.0*spline_der_points[0][23], 2.0*spline_der_points[1][23])
plt.legend()
plt.show()
EDIT:
I also opened an Issue on Github and according to ev-br the usage of interpolate.splprep is depreciated and one should use make_interp_spline / BSpline instead.
As noted in other answers, splprep output is incompatible with splder, but is compatible with splev. And the latter can evaluate the derivatives.
However, for interpolation, there is an alternative approach, which avoids splprep altogether. I'm basically copying a reply on the SciPy issue tracker (https://github.com/scipy/scipy/issues/10389):
Here's an example of replicating the splprep outputs. First let's make sense out of the splprep output:
# start with the OP example
import numpy as np
from scipy import interpolate
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
# check the meaning of the `u` array: evaluation of the spline at `u`
# gives back the original points (up to a list/transpose)
xy = interpolate.splev(u, tck)
xy = np.asarray(xy)
np.allclose(xy.T, points)
Next, let's replicate it without splprep. First, build the u array: the curve is represented parametrically, and u is essentially an approximation for the arc length. Other parametrizations are possible, but here let's stick to what splprep does. Translating the pseudocode from the doc page, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html
vv = np.sum((points[1:, :] - points[:-1, :])**2, axis=1)
vv = np.sqrt(vv).cumsum()
vv/= vv[-1]
vv = np.r_[0, vv]
# check:
np.allclose(u, vv)
Now, interpolate along the parametric curve: points vs vv:
spl = interpolate.make_interp_spline(vv, points)
# check spl.t vs knots from splPrep
spl.t - tck[0]
The result, spl, is a BSpline object which you can evaluate, differentiate etc in a usual way:
np.allclose(points, spl(vv))
# differentiate
spl_derivative = spl.derivative(vv)

Colormap/color problems with bar3d plot

I have written some code to plot the longitude and latitude of towns/cities in the UK with the 3D bar height representing the population of these places. I am trying to also colour code the bars using a colormap so that the variation in population can be more easily seen. However, my code doesnt seem to follow the colormap and instead all the bars have very similar colours - I am not sure where I have gone wrong
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
pop = [189120, 91297, 107123, 107355, 94782, 87590, 142968, 1085810, 117963, 147663, 194189, 187503, 349561, 229700, 535907, 145818, 335145, 110507, 116447, 86011, 88483, 119441, 325949, 106943, 92363, 255394, 109805, 144170, 109185, 468720, 113507, 120046, 104157, 589900, 136362, 88243, 88134, 88855, 91053, 94932, 120256, 162949, 284321, 144957, 474632, 443760, 100160, 552267, 8173941, 211228, 107627, 510746, 174700, 171750, 268064, 128060, 215173, 186682, 289301, 86552, 96555, 159994, 161707, 234982, 154718, 238137, 97886, 95580, 218705, 107926, 109691, 134022, 103886, 518090, 155298, 123187, 253651, 175547, 91703, 102885, 89663, 105878, 270726, 174286, 109015, 179485, 182441, 142723, 99251, 165456, 131982, 91930, 218791, 83641, 103608, 105367, 265178, 100153, 109120, 152841]
lat = [57.14369, 53.55, 51.56844, 51.26249, 51.37795, 52.13459, 53.39337, 52.48142, 53.75, 53.81667, 53.58333, 50.72048, 53.79391, 50.82838, 51.45523, 52.2, 51.48, 51.73575, 51.9, 53.1905, 53.25, 51.88921, 52.40656, 51.11303, 54.52429, 52.92277, 53.52327, 56.5, 50.76871, 55.95206, 50.7236, 54.96209, 51.38914, 55.86515, 51.86568, 53.56539, 53.71667, 54.68611, 50.85519, 51.75369, 51.62907, 53.64904, 53.7446, 52.05917, 53.79648, 52.6386, 53.22683, 53.41058, 51.50853, 51.87967, 51.26667, 53.48095, 54.57623, 52.04172, 54.97328, 51.58774, 52.25, 52.62783, 52.9536, 52.52323, 53.54051, 51.75222, 52.57364, 50.37153, 50.71667, 50.79899, 53.76667, 51.58571, 51.45625, 53.61766, 53.43012, 53.42519, 53.48771, 53.38297, 51.50949, 52.41426, 50.90395, 51.53782, 53.64779, 53.45, 51.90224, 53.40979, 53.00415, 54.90465, 52.56667, 51.62079, 51.55797, 52.67659, 53.68331, 53.39254, 51.65531, 52.51868, 51.50853, 51.34603, 53.53333, 51.31903, 52.58547, 52.18935, 50.81448, 53.95763]
long = [-2.09814, -1.48333, 0.45782, -1.08708, -2.35907, -0.46632, -3.01479, -1.89983, -2.48333, -3.05, -2.43333, -1.8795, -1.75206, -0.13947, -2.59665, 0.11667, -3.18, 0.46958, -2.08333, -2.89189, -1.41667, 0.90421, -1.51217, -0.18312, -1.55039, -1.47663, -1.13691, -2.96667, 0.28453, -3.19648, -3.52751, -1.60168, 0.54863, -4.25763, -2.2431, -0.07553, -1.85, -1.2125, 0.57292, -0.47517, -0.74934, -1.78416, -0.33525, 1.15545, -1.54785, -1.13169, -0.53792, -2.97794, -0.12574, -0.41748, 0.51667, -2.23743, -1.23483, -0.75583, -1.61396, -2.99835, -0.88333, 1.29834, -1.15047, -1.46523, -2.1183, -1.25596, -0.24777, -4.14305, -2.0, -1.09125, -2.71667, 0.60459, -0.97113, -2.1552, -1.35678, -2.32443, -2.29042, -1.4659, -0.59541, -1.78094, -1.40428, 0.71433, -3.00648, -2.73333, -0.20256, -2.15761, -2.18538, -1.38222, -1.81667, -3.94323, -1.78116, -2.44926, -1.49768, -2.58024, -0.39602, -1.9945, -0.12574, -2.97665, -2.61667, -0.55893, -2.12296, -2.22001, -0.37126, -1.08271]
fig = plt.figure()
ax = Axes3D(fig)
X,Y,Z = np.array(long),np.array(lat),np.log10(np.array(pop))
colours = plt.cm.rainbow_r(Z/np.log10(max(pop)))
plot1 = ax.bar3d(X,Y,Z,dx=0.2,dy=0.2,dz=Z/3,color=colours)
ax.set_xlabel('\nLongitude (\u00B0)')
ax.set_ylabel('\nLatitude (\u00B0)')
ax.set_zlabel('\nlog\u2081\u2080(Population)')
ax.set_zlim3d(4,7)
ax.view_init(elev=70,azim=280)
colourMap = plt.cm.ScalarMappable(cmap=plt.cm.rainbow_r)
colourMap.set_array(Z)
colBar = plt.colorbar(colourMap).set_label('log\u2081\u2080(Population)')
plt.show()
The code produces this plot:
I think that the majority of the bars should be red/orange in colour, with only the major cities being yellow/green/blue...
(Z/np.log10(max(pop))).min() is 0.7. So all values are indeed in the upper range of the colormap.
You probably want to normalize your data before giving it to the colormap:
norm = plt.Normalize((Z/np.log10(max(pop))).min(), (Z/np.log10(max(pop))).max())
colours = plt.cm.rainbow_r(norm(Z/np.log10(max(pop))))

scipy.optimize.curvefit fails when using bounds

I'm trying to fit a set of data with a function (see the example below) using scipy.optimize.curvefit,
but when I use bounds (documentation) the fit fails and I simply get
the initial guess parameters as output.
As soon as I substitute -np.inf ad np.inf as bounds for the second parameter
(dt in the function), the fit works.
What am I doing wrong?
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
#Generate data
crc=np.array([-1.4e-14, 7.3e-14, 1.9e-13, 3.9e-13, 6.e-13, 8.0e-13, 9.2e-13, 9.9e-13,
1.e-12, 1.e-12, 1.e-12, 1.0e-12, 1.1e-12, 1.1e-12, 1.1e-12, 1.0e-12, 1.1e-12])
time=np.array([0., 368., 648., 960., 1520.,1864., 2248., 2655., 3031.,
3384., 3688., 4048., 4680., 5343., 6055., 6928., 8120.])
#Define the function for the fit
def testcurve(x, Dp, dt):
k = -Dp*(x+dt)*2e11
curve = 1e-12 * (1+2*(-np.exp(k) + np.exp(4*k) - np.exp(9*k) + np.exp(16*k)))
curve[0]= 0
return curve
#Set fit bounds
dtmax=time[2]
param_bounds = ((-np.inf, -dtmax),(np.inf, dtmax))
#Perform fit
(par, par_cov) = opt.curve_fit(testcurve, time, crc, p0 = (5e-15, 0), bounds = param_bounds)
#Print and plot output
print(par)
plt.plot(time, crc, 'o')
plt.plot(time, testcurve(time, par[0], par[1]), 'r-')
plt.show()
I encountered the same behavior today in a different fitting problem. After some searching online, I found this link quite helpful: Why does scipy.optimize.curve_fit not fit to the data?
The short answer is that: using extremely small (or large) numbers in numerical fitting is not robust and scale them leads to a much better fitting.
In your case, both crc and Dp are extremely small numbers which could be scaled up. You could play with the scale factors and within certain range the fitting looks quite robust. Full example:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
#Generate data
crc=np.array([-1.4e-14, 7.3e-14, 1.9e-13, 3.9e-13, 6.e-13, 8.0e-13, 9.2e-13, 9.9e-13,
1.e-12, 1.e-12, 1.e-12, 1.0e-12, 1.1e-12, 1.1e-12, 1.1e-12, 1.0e-12, 1.1e-12])
time=np.array([0., 368., 648., 960., 1520.,1864., 2248., 2655., 3031.,
3384., 3688., 4048., 4680., 5343., 6055., 6928., 8120.])
# add scale factors to the data as well as the fitting parameter
scale_factor_1 = 1e12 # 1./np.mean(crc) also works if you don't want to set the scale factor manually
scale_factor_2 = 1./2e11
#Define the function for the fit
def testcurve(x, Dp, dt):
k = -Dp*(x+dt)*2e11 * scale_factor_2
curve = 1e-12 * (1+2*(-np.exp(k) + np.exp(4*k) - np.exp(9*k) + np.exp(16*k))) * scale_factor_1
curve[0]= 0
return curve
#Set fit bounds
dtmax=time[2]
param_bounds = ((-np.inf, -dtmax),(np.inf, dtmax))
#Perform fit
(par, par_cov) = opt.curve_fit(testcurve, time, crc*scale_factor_1, p0 = (5e-15/scale_factor_2, 0), bounds = param_bounds)
#Print and plot output
print(par[0]*scale_factor_2, par[1])
plt.plot(time, crc*scale_factor_1, 'o')
plt.plot(time, testcurve(time, par[0], par[1]), 'r-')
plt.show()
Fitting results: [6.273102923176595e-15, -21.12202697564494], which gives a reasonable fitting and also is very close to the result without any bounds: [6.27312512e-15, -2.11307470e+01]

2d density contour plot with matplotlib

I'm attempting to plot my dataset, x and y (generated from a csv file via numpy.genfromtxt('/Users/.../somedata.csv', delimiter=',', unpack=True)) as a simple density plot. To ensure this is self containing I will define them here:
x = [ 0.2933215 0.2336305 0.2898058 0.2563835 0.1539951 0.1790058
0.1957057 0.5048573 0.3302402 0.2896122 0.4154893 0.4948401
0.4688092 0.4404935 0.2901995 0.3793949 0.6343423 0.6786809
0.5126349 0.4326627 0.2318232 0.538646 0.1351541 0.2044524
0.3063099 0.2760263 0.1577156 0.2980986 0.2507897 0.1445099
0.2279241 0.4229934 0.1657194 0.321832 0.2290785 0.2676585
0.2478505 0.3810182 0.2535708 0.157562 0.1618909 0.2194217
0.1888698 0.2614876 0.1894155 0.4802076 0.1059326 0.3837571
0.3609228 0.2827142 0.2705508 0.6498625 0.2392224 0.1541462
0.4540277 0.1624592 0.160438 0.109423 0.146836 0.4896905
0.2052707 0.2668798 0.2506224 0.5041728 0.201774 0.14907
0.21835 0.1609169 0.1609169 0.205676 0.4500787 0.2504743
0.1906289 0.3447547 0.1223678 0.112275 0.2269951 0.1616036
0.1532181 0.1940938 0.1457424 0.1094261 0.1636615 0.1622345
0.705272 0.3158471 0.1416916 0.1290324 0.3139713 0.2422002
0.1593835 0.08493619 0.08358301 0.09691083 0.2580497 0.1805554 ]
y = [ 1.395807 1.31553 1.333902 1.253527 1.292779 1.10401 1.42933
1.525589 1.274508 1.16183 1.403394 1.588711 1.346775 1.606438
1.296017 1.767366 1.460237 1.401834 1.172348 1.341594 1.3845
1.479691 1.484053 1.468544 1.405156 1.653604 1.648146 1.417261
1.311939 1.200763 1.647532 1.610222 1.355913 1.538724 1.319192
1.265142 1.494068 1.268721 1.411822 1.580606 1.622305 1.40986
1.529142 1.33644 1.37585 1.589704 1.563133 1.753167 1.382264
1.771445 1.425574 1.374936 1.147079 1.626975 1.351203 1.356176
1.534271 1.405485 1.266821 1.647927 1.28254 1.529214 1.586097
1.357731 1.530607 1.307063 1.432288 1.525117 1.525117 1.510123
1.653006 1.37388 1.247077 1.752948 1.396821 1.578571 1.546904
1.483029 1.441626 1.750374 1.498266 1.571477 1.659957 1.640285
1.599326 1.743292 1.225557 1.664379 1.787492 1.364079 1.53362
1.294213 1.831521 1.19443 1.726312 1.84324 ]
Now, I have used many attempts to plot my contours using variations on:
delta = 0.025
OII_OIII_sAGN_sorted = numpy.arange(numpy.min(OII_OIII_sAGN), numpy.max(OII_OIII_sAGN), delta)
Dn4000_sAGN_sorted = numpy.arange(numpy.min(Dn4000_sAGN), numpy.max(Dn4000_sAGN), delta)
OII_OIII_sAGN_X, Dn4000_sAGN_Y = np.meshgrid(OII_OIII_sAGN_sorted, Dn4000_sAGN_sorted)
Z1 = matplotlib.mlab.bivariate_normal(OII_OIII_sAGN_X, Dn4000_sAGN_Y, 1.0, 1.0, 0.0, 0.0)
Z2 = matplotlib.mlab.bivariate_normal(OII_OIII_sAGN_X, Dn4000_sAGN_Y, 0.5, 1.5, 1, 1)
# difference of Gaussians
Z = 0.2 * (Z2 - Z1)
pyplot_middle.contour(OII_OIII_sAGN_X, Dn4000_sAGN_Y, Z, 12, colors='k')
This doesn't seem to give the desired output.I have also tried:
H, xedges, yedges = np.histogram2d(OII_OIII_sAGN,Dn4000_sAGN)
extent = [xedges[0],xedges[-1],yedges[0],yedges[-1]]
ax.contour(H, extent=extent)
Not quite working as I wanted either. Essentially, I am looking for something similar to this:
If anyone could help me with this I would be very grateful, either by suggesting a totally new method or modifying my existing code. Please also attach images of your output if you have some useful techniques or ideas.
seaborn does density plots right out of the box:
import seaborn as sns
import matplotlib.pyplot as plt
sns.kdeplot(x, y)
plt.show()
It seems that histogram2d takes some fiddling to plot the contour in the right place. I took the transpose of the histogram matrix and also took the mean values of the elements in xedges and yedges instead of just removing one from the end.
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
h, xedges, yedges = np.histogram2d(x, y, bins=9)
xbins = xedges[:-1] + (xedges[1] - xedges[0]) / 2
ybins = yedges[:-1] + (yedges[1] - yedges[0]) / 2
h = h.T
CS = plt.contour(xbins, ybins, h)
plt.scatter(x, y)
plt.show()

Scipy curve_fit does not seem to change the initial parameters

I know that there are several similar questions such as Use of curve_fit to fit data but the answer there (specific float type) does not seem to work for me. I am wondering if anyone is able to explain to me why the following code (which is an adaptation from an answer -> https://stackoverflow.com/a/11507723/1093485) does not work.
sample code:
#! /usr/bin/env python
import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def gaussFunction(x, A, mu, sigma):
return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))
x_points = [4245.428, 4245.4378, 4245.4477, 4245.4575, 4245.4673, 4245.4772, 4245.487, 4245.4968, 4245.5066, 4245.5165, 4245.5263, 4245.5361, 4245.546, 4245.5558, 4245.5656, 4245.5755, 4245.5853, 4245.5951, 4245.605, 4245.6148, 4245.6246, 4245.6345, 4245.6443, 4245.6541, 4245.6639, 4245.6738, 4245.6836, 4245.6934, 4245.7033, 4245.7131, 4245.7229, 4245.7328, 4245.7426, 4245.7524, 4245.7623, 4245.7721, 4245.7819, 4245.7918, 4245.8016, 4245.8114, 4245.8213, 4245.8311, 4245.8409, 4245.8508, 4245.8606, 4245.8704, 4245.8803, 4245.8901, 4245.8999, 4245.9097, 4245.9196, 4245.9294, 4245.9392, 4245.9491, 4245.9589, 4245.9687, 4245.9786, 4245.9884, 4245.9982, 4246.0081, 4246.0179, 4246.0277, 4246.0376, 4246.0474, 4246.0572, 4246.0671, 4246.0769, 4246.0867, 4246.0966, 4246.1064, 4246.1162, 4246.1261, 4246.1359, 4246.1457, 4246.1556, 4246.1654, 4246.1752, 4246.1851, 4246.1949, 4246.2047, 4246.2146]
y_points = [978845.0, 1165115.0, 1255368.0, 1253901.0, 1199857.0, 1134135.0, 1065403.0, 977347.0, 866444.0, 759457.0, 693284.0, 679772.0, 696896.0, 706494.0, 668272.0, 555221.0, 374547.0, 189968.0, 161754.0, 216483.0, 181937.0, 73967.0, 146627.0, 263495.0, 284992.0, 240291.0, 327541.0, 555690.0, 758847.0, 848035.0, 800159.0, 645769.0, 444412.0, 249627.0, 125078.0, 254856.0, 498501.0, 757049.0, 977861.0, 1125316.0, 1202892.0, 1263220.0, 1366361.0, 1497071.0, 1559804.0, 1464012.0, 1196896.0, 848736.0, 584363.0, 478640.0, 392943.0, 312466.0, 355540.0, 320666.0, 114711.0, 690948.0, 1409389.0, 1825921.0, 1636486.0, 1730980.0, 4081179.0, 7754166.0, 11747734.0, 15158681.0, 17197366.0, 17388832.0, 15710571.0, 12593935.0, 8784968.0, 5115720.0, 2277684.0, 769734.0, 674437.0, 647250.0, 708156.0, 882759.0, 833756.0, 504655.0, 317790.0, 711106.0, 1011705.0]
# Try to fit the gaussian
trialX = numpy.linspace(x_points[0],x_points[-1],1000)
coeff, var_matrix = curve_fit(gaussFunction, x_points, y_points)
yEXP = gaussFunction(trialX, *coeff)
# Plot the data (just for visualization)
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x_points, y_points, 'b*')
plt.plot(trialX,yEXP, '--')
plt.show()
The output that it generates:
Without starting parameters they default to 1. Putting in
reasonable p0, it works:
p0 = [np.max(y_points), x_points[np.argmax(y_points)], 0.1]
coeff, var_matrix = curve_fit(gaussFunction, x_points, y_points, p0)
Leads to:

Categories

Resources