I am trying to fit a Gaussian Distribution over a histogram for a project.
I have read a csv file and produced a pandas data frame.
I have produced the histogram and tried a few methods to fit a distribution, but I always end up with a straight line along the bottom of my histogram. I think it may be because the column I am trying to fit is of type float64, but I'm not sure how to change that.
So far I have...
x=df1['rating']
plt.figure(figsize=(10,10))
plt.hist(x, bins=20,color='c',edgecolor='k', alpha=0.65, linewidth=2)
plt.axvline(x.mean(), color='k', linestyle='dashed', linewidth=2,label="mean")
plt.axvline(x.median(),color='r',linestyle='dashed',linewidth=2)
Rating Histogram
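One common cause of a flat line like this is a scale mismatch rather than the float64 dtype: plt.hist draws raw counts by default, while a Gaussian probability density usually has values below 1, so the curve hugs the x-axis. Below is a minimal sketch of normalizing the histogram with density=True and overlaying the fitted curve. The synthetic ratings array, gaussian_pdf and plot_hist_with_gaussian are made-up names for illustration, since the real df1['rating'] data is not shown.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def plot_hist_with_gaussian(values, bins=20):
    """Draw a density-normalized histogram with the fitted Gaussian on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std(ddof=1)
    grid = np.linspace(values.min(), values.max(), 200)

    fig, ax = plt.subplots(figsize=(10, 10))
    # density=True rescales the bars so the histogram area is 1,
    # which puts it on the same scale as the probability density.
    ax.hist(values, bins=bins, density=True, color='c', edgecolor='k', alpha=0.65)
    ax.plot(grid, gaussian_pdf(grid, mu, sigma), 'k-', linewidth=2, label='Gaussian fit')
    ax.axvline(mu, color='k', linestyle='dashed', linewidth=2, label='mean')
    ax.legend()
    return fig, ax

# Hypothetical stand-in data; replace with df1['rating'].dropna().
rng = np.random.default_rng(0)
ratings = rng.normal(loc=3.5, scale=0.8, size=1000)

# Sanity check: the density integrates to roughly 1 over a wide grid.
grid = np.linspace(-10.0, 10.0, 20001)
area = np.sum(gaussian_pdf(grid, 0.0, 1.0)) * (grid[1] - grid[0])

# Usage: fig, ax = plot_hist_with_gaussian(ratings); fig.savefig("rating_hist.png")
```

If the histogram has to stay in raw counts, the curve can instead be multiplied by len(values) times the bin width to bring it onto the count scale.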
I am plotting a density map of ~40k points, but hist2d returns a uniform density map. This is my code:
hist2d(x, y, bins=(1000, 1000), cmap=plt.cm.jet)
Here is the scatter plot
Here is the histogram
I was expecting a red horizontal band in the center that gradually turns blue towards higher and lower y values.
EDIT:
@bb1 suggested decreasing the number of bins, but after setting bins=(100, 1000) I get this result:
I think you are specifying too many bins. By setting bins=(1000, 1000) you get 1,000,000 bins. With 40,000 points, most of the bins will be empty and they overwhelm the image.
You may also consider using seaborn's kdeplot() function instead of plt.hist2d(). It visualizes the density of the data without subdividing it into bins:
import seaborn as sns
sns.kdeplot(x=x, y=y, levels = 100, fill=True, cmap="mako", thresh=0)
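The bin arithmetic above can be checked directly. This is a small sketch, assuming normally distributed stand-in points (the original x and y are not shown), that counts how many cells of the 2D histogram stay empty at two bin settings. empty_fraction is a hypothetical helper, not part of matplotlib or NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the ~40k points in the question.
x = rng.normal(size=40_000)
y = rng.normal(size=40_000)

def empty_fraction(x, y, bins):
    """Fraction of 2D histogram cells that contain no points."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return float(np.mean(counts == 0))

fine = empty_fraction(x, y, bins=(1000, 1000))  # 1,000,000 cells
coarse = empty_fraction(x, y, bins=(50, 50))    # 2,500 cells
print(f"empty cells at 1000x1000: {fine:.1%}, at 50x50: {coarse:.1%}")
```

With 40,000 points, at most 40,000 of the 1,000,000 cells can be occupied, so at least 96% of the fine grid is empty whatever the data looks like.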
I am plotting a histogram of observed values from a population against a normal distribution (derived from the mean and standard deviation of the sample). The sample has an unusually large number of observations with the value 0 (not to be confused with "NaN"). As a result, the graph of the two does not show clearly.
How can I best truncate the one bar in the histogram to allow the rest of the plot to fill the frame?
Why don't you set the y-limit to 0.00004? Then you can analyze the plot better.
axes = plt.gca()
axes.set_xlim([xmin,xmax])
axes.set_ylim([ymin,ymax])
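Rather than hard-coding the limit, ymax can be derived from the data. The sketch below, assuming a made-up sample with a large spike at 0, picks the upper limit from the second-tallest bar so that only the spike is cut off. clipped_ylim and the headroom factor are hypothetical names chosen for illustration.

```python
import numpy as np

def clipped_ylim(heights, headroom=1.1):
    """Upper y-limit based on the second-tallest bar, so one spike is cut off."""
    ordered = np.sort(np.asarray(heights, dtype=float))
    return float(ordered[-2] * headroom)

rng = np.random.default_rng(0)
# Hypothetical sample: many exact zeros plus a spread of positive values.
sample = np.concatenate([np.zeros(5000), rng.normal(50_000, 10_000, size=2000)])
heights, edges = np.histogram(sample, bins=50, density=True)
ymax = clipped_ylim(heights)

# With matplotlib the limit would then be applied as:
#   plt.hist(sample, bins=50, density=True)
#   plt.gca().set_ylim([0, ymax])
```

The truncated bar still reaches the top of the frame, so it is worth noting its true height in a caption or annotation.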
I am using the Python Seaborn package to plot the KDE of both the original and the sampled data. The issue is that the values on the y-axis are very small, with many leading zeros. Is it possible to normalize the values or make the axis look more elegant?
My implementation:
ax=sns.kdeplot(old_d,shade=True,label='Original kde')
ax=sns.kdeplot(new_d,shade=True, label='Sampled kde')
plt.legend(prop={'size': 12})
ax.set_xlabel('CPU time (in microsecond)',size=16)
ax.set_ylabel('Probability',size=16)
plt.show()
Example of this code,
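One possible explanation for the tiny y-values: a KDE is a density that integrates to 1 over the x-axis, so when x spans a wide range in microseconds the density per microsecond has to be small. Expressing the same data in a larger unit scales the y-values up by the same factor. A minimal sketch, assuming gamma-distributed stand-in data in place of old_d and using scipy.stats.gaussian_kde in place of seaborn to show the numbers:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical CPU times in microseconds, standing in for old_d.
cpu_us = rng.gamma(shape=2.0, scale=50_000.0, size=2000)
cpu_ms = cpu_us / 1000.0  # same data expressed in milliseconds

grid_us = np.linspace(cpu_us.min(), cpu_us.max(), 200)
density_us = gaussian_kde(cpu_us)(grid_us)           # per microsecond, tiny values
density_ms = gaussian_kde(cpu_ms)(grid_us / 1000.0)  # per millisecond, 1000x larger

print(density_us.max(), density_ms.max())
# Plotting sns.kdeplot(cpu_ms, ...) with an x-label in milliseconds
# would therefore show a y-axis with far fewer leading zeros.
```

Since these values are densities and not probabilities, a y-label such as 'Density' may describe the axis more accurately than 'Probability'.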
So I have a line plot, and I want to add markers on only some of the points along the plot (I have detected the peaks in the plot and want to mark them). When I plot without the peaks labelled it works as it should, and when I plot the peaks alone it seems to plot them properly, but when I try to plot them on the same plot, the line plot disappears over most of the graph and seems to maybe have become compressed to the side of the plot, if that makes any sense?
Here is my code without the peaks plotted and the resulting graph:
def plotPeaks(file):
    indices, powerSums, times=detectPeaks(file)
    plt.figure(figsize=(100, 10))
    plt.plot(times, powerSums)
Plot without peaks marked
Then when I add the code that should show the peaks, which occur at x-values corresponding to the values stored in the indices, I get this:
def plotPeaks(file):
    indices, powerSums, times=detectPeaks(file)
    plt.figure(figsize=(100, 10))
    plt.plot(times, powerSums)
    for i in indices:
        plt.scatter(i, powerSums[i], marker='o')
Plot with peaks marked
Am I missing something obvious, or is this a glitch that someone has a solution for?
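Assuming indices stores positions in times (not the time values themselves), the markers are being plotted at x = i while the line is plotted against times, so the two do not share the same x-range. The last line should be: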
plt.scatter(times[i], powerSums[i], marker='o')
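The same fix can be written without the loop by indexing both arrays at once. This is a sketch under the same assumption (indices holds integer positions into times); the data and the peak_coordinates helper are made up for illustration, since detectPeaks is not shown.

```python
import numpy as np

def peak_coordinates(times, power_sums, indices):
    """Return the (x, y) positions of the peaks given their integer indices."""
    times = np.asarray(times)
    power_sums = np.asarray(power_sums)
    indices = np.asarray(indices, dtype=int)
    return times[indices], power_sums[indices]

# Hypothetical data: timestamps that do not start at 0, so index != time.
times = np.linspace(1000.0, 1010.0, 101)
power_sums = np.sin(times - 1000.0)
indices = [16, 79]  # made-up peak positions for illustration

peak_t, peak_p = peak_coordinates(times, power_sums, indices)

# With matplotlib, line and markers then share the same x-axis:
#   plt.plot(times, power_sums)
#   plt.scatter(peak_t, peak_p, marker='o', color='r')
```

A single scatter call also gives all markers the same color, whereas calling plt.scatter once per peak cycles through colors.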
My Question:
How can I draw a curve through this data and obtain an equation that describes the plot?
I generated this scatter plot with the following code, but I can't figure out how to derive an equation for this data and draw the corresponding curve on the same plot. Please help!
def draw(data,xlabel,ylabel):
    print('length of data : ',len(data))
    x,y = [],[]
    for i in data:
        x.append((i[1]))
        y.append((i[0]))
    plt.scatter(x, y,marker=r'o',color='b')
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()
Basically I want something like this:
You have to perform a curve-fitting procedure, which is known as a regression problem in mathematics. In your case the data looks more or less exponential, but you can fit an arbitrary function with scipy.optimize.curve_fit:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
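A minimal sketch of such a fit, assuming an exponential model y = a * exp(b * x) and synthetic noisy data, since the original data is not shown. The exponential function, the starting guess p0 and the true parameters are all illustrative choices.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(x, a, b):
    """Assumed model y = a * exp(b * x); swap in whatever form suits the data."""
    return a * np.exp(b * x)

rng = np.random.default_rng(0)
# Hypothetical noisy data generated from a=2.0, b=0.5.
x = np.linspace(0.0, 5.0, 60)
y = exponential(x, 2.0, 0.5) * (1.0 + 0.02 * rng.normal(size=x.size))

# p0 is the starting guess; a sensible guess helps the optimizer converge.
(a_fit, b_fit), _ = curve_fit(exponential, x, y, p0=(1.0, 0.1))
print(f"y = {a_fit:.3f} * exp({b_fit:.3f} * x)")

# To draw the fitted curve over the scatter plot:
#   plt.scatter(x, y, color='b')
#   plt.plot(x, exponential(x, a_fit, b_fit), 'r-')
```

The fitted parameters give the equation directly, and the second return value of curve_fit (discarded here) is the covariance matrix, which can be used to estimate their uncertainty.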