Can someone explain how to make a scatter plot and linear regression from an Excel file?
I know how to import the file with pandas, and I know how to make a scatter plot in matplotlib by plugging in my own data, but I don't know how to make Python do all three from the file.
Ideally it would also give r value, p value, std error, slope and intercept.
I'm very new to all of this and any help would be great.
I've searched around Stack Overflow, Reddit, and elsewhere, but I haven't found anything recent.
SciPy has a basic linear regression function that fits your criteria: scipy.stats.linregress. Just use the appropriate columns from your DataFrame as x and y.
Pyplot's basic plt.plot(x, y) function will give you a line: matplotlib.pyplot.plot. You can compute a set of y values using the slope and intercept.
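A minimal sketch of the whole pipeline, assuming the sheet has numeric columns named "x" and "y" (hypothetical names) and is read with pandas.read_excel (reading .xlsx files needs the openpyxl package). The read_excel line is commented out and a small inline DataFrame stands in for the file so the sketch runs on its own. linregress returns the slope, intercept, r value, p value and the standard error of the slope in one result object.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# df = pd.read_excel("measurements.xlsx")  # hypothetical file name; .xlsx needs openpyxl
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2.1, 3.9, 6.2, 7.8, 10.1]})  # stand-in data

x = df["x"].to_numpy(dtype=float)
y = df["y"].to_numpy(dtype=float)

# Ordinary least-squares fit of y = slope * x + intercept
res = stats.linregress(x, y)
print(f"slope     = {res.slope:.4f}")
print(f"intercept = {res.intercept:.4f}")
print(f"r value   = {res.rvalue:.4f}")
print(f"p value   = {res.pvalue:.4g}")
print(f"std error = {res.stderr:.4f}")  # standard error of the slope estimate

# Scatter the data and draw the fitted line across the observed x range
x_line = np.linspace(x.min(), x.max(), 100)
plt.scatter(x, y, label="data")
plt.plot(x_line, res.slope * x_line + res.intercept, color="red", label="linear fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```

With a real file, only the read_excel line and the two column names need to change.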
Apologies in advance; I'm relatively new to Python.
I have the following data for a reaction describing the growth of a compound:
[Image: the reaction data]
The first derivative of this S-shaped curve describing the reaction seems to resemble an F-distribution density curve. As I understand it, a cumulative distribution function (CDF) is the integral of a distribution's density curve, so I was hoping to fit a CDF such as that of F(10, 10) to model reaction 1 (resembling the bottom-right function in the image below).
[Image: CDF curves of the F-distribution]
The formula describing this curve shape is:
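$$F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$$
where I is the regularized incomplete beta function.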
So my question is: how can I write this formula in a Pythonic way? NOTE: I've tried fitting different types of logistic functions, but none of them fit well. An F-like CDF, however, seems to describe reaction 1 properly.
Thanks a lot for the help!!
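A minimal sketch, assuming the target is the standard F(d1, d2) CDF, which can be written as I_z(d1/2, d2/2) with z = d1*x / (d1*x + d2). scipy.special.betainc(a, b, x) computes the regularized incomplete beta function I_x(a, b), so the formula translates almost directly, and scipy.stats.f.cdf gives the same values as a cross-check. To fit a growth curve, the sketch adds two hypothetical parameters, an amplitude (the plateau height) and a time scale; the data are synthetic because the real data are only shown as an image.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import betainc
from scipy.stats import f as f_dist


def f_cdf(x, d1, d2):
    """CDF of the F(d1, d2) distribution for x >= 0.

    F(x; d1, d2) = I_z(d1/2, d2/2) with z = d1*x / (d1*x + d2),
    where I is the regularized incomplete beta function (scipy.special.betainc).
    """
    x = np.asarray(x, dtype=float)
    z = d1 * x / (d1 * x + d2)
    return betainc(d1 / 2.0, d2 / 2.0, z)


def growth_model(t, amplitude, scale, d1, d2):
    """Hypothetical growth model: an F-distribution CDF stretched in time and scaled in height."""
    return amplitude * f_cdf(t / scale, d1, d2)


# Sanity check against SciPy's built-in F distribution
xs = np.linspace(0.0, 5.0, 11)
assert np.allclose(f_cdf(xs, 10, 10), f_dist.cdf(xs, 10, 10))

# Synthetic, noise-free "reaction" data standing in for the real measurements
t = np.linspace(0.0, 20.0, 60)
y = growth_model(t, 2.0, 3.0, 10.0, 10.0)

# Positive lower bounds keep the scale and degrees of freedom in their valid domain
popt, pcov = curve_fit(
    growth_model, t, y,
    p0=[1.5, 2.5, 8.0, 12.0],
    bounds=(1e-6, np.inf),
)
print("amplitude, scale, d1, d2 =", popt)
```

With real, noisy data the initial guess p0 matters more; a rough plateau height and the time at which the curve reaches about half of it are reasonable starting points for amplitude and scale.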
Out of the box, seaborn does a very good job of plotting a 2D KDE or a jointplot. However, it does not return anything like a function that I can evaluate to read off the values of the estimated density numerically.
How can I numerically evaluate the density that sns.kdeplot or jointplot has drawn in the plot?
Just for completeness: I found stats.gaussian_kde in the SciPy docs, which looks interesting, but the density plots I get with it are very clunky and, apparently because of a missing extent, are far off compared to the scatter plot.
So I would like to stay away from the SciPy KDE, at least until I figure out how to make it work and why pyplot is so much less "smart" than seaborn.
Anyhow, the evaluate method of scipy.stats.gaussian_kde does the job.
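A minimal sketch of evaluating a 2D gaussian_kde numerically, using synthetic data. gaussian_kde expects data shaped (n_dims, n_points), and evaluate (or calling the object directly) takes query points in the same layout. Evaluating on a meshgrid and drawing with contourf, which receives the grid coordinates directly, keeps the density aligned with the scatter plot without any extent argument. The values will not necessarily match seaborn's plot exactly, since bandwidth choices can differ.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic correlated 2D data standing in for the real scatter data
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.5, size=500)

# gaussian_kde expects an array of shape (n_dims, n_points)
kde = gaussian_kde(np.vstack([x, y]))

# Density at a single point (x=0, y=0); query points also have shape (n_dims, n_query)
density_at_origin = kde.evaluate([[0.0], [0.0]])[0]
print("density at (0, 0):", density_at_origin)

# Density on a 100 x 100 grid covering the data, padded by 1 unit on each side
pad = 1.0
xs = np.linspace(x.min() - pad, x.max() + pad, 100)
ys = np.linspace(y.min() - pad, y.max() + pad, 100)
xx, yy = np.meshgrid(xs, ys)
zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# contourf takes the grid coordinates directly, so no extent is needed
fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, levels=10, cmap="Blues")
ax.scatter(x, y, s=3, color="k")
plt.show()
```

If you prefer imshow, pass origin="lower" and extent=[xs[0], xs[-1], ys[0], ys[-1]]; without the extent, imshow places the image in pixel coordinates, which is one likely cause of the mismatch with the scatter plot.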
I also faced this issue with the jointplot() function. I opened the file distributions.py at anaconda3/lib/python3.7/site-packages/seaborn/ and added these lines to the _bivariate_kdeplot() function:
```python
print("xx=", xx[50])
print("yy=", yy[:, 50])
print("z=", z[50])
```
This prints the 100 values at index 50 of the xx, yy and z arrays. Here z is the density, and xx and yy are the grid coordinates in meshgrid form: their range is adjusted according to the bandwidth, cut and clip, and their spacing follows the grid size given by the user. This gave me some idea of the actual values behind the 2D KDE plot.
If you print the entire arrays, you get 100 x 100 values for each.
I am writing code to compute a linear regression of some measurements. I have tried different approaches, but all of them give a strange result: instead of a straight line with a constant slope, the line I get is first horizontal and only bends downward between the penultimate and the last point.
The code I am using is:
```python
import matplotlib.pyplot as plt
import numpy as np

x0=[0.00000001,0.000001,0.0001,0.01]
y0=[0.9974209723854539,0.9945196648709005,0.9914759279447916,0.9852556749265332]
x=np.array(x0)
y=np.array(y0)
# least-squares fit of y = m*x + b
m,b=np.polyfit(x,y,1)
print(m,b)
plt.scatter(x,y)
# the fitted line is evaluated only at the 4 measured x values
plt.plot(x,m*x+b,color='green')
plt.xscale('log')
d=['linear fit '+str(round(m,4))+'*x+'+str(round(b,4)),'real measure']
plt.legend(d,loc="upper right",borderaxespad=0.1,title="")
plt.show()
```
And I get the following graph:
[Image: Python plot]
Which is very different from what I should get, which I have been able to draw in Origin:
[Image: Origin plot]
I have tried various linear-fit methods, but all of them give this same wrong shape.
Hopefully you can help me find the error.
Thank you very much.
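One likely explanation, sketched below: np.polyfit(x, y, 1) fits a line in x itself, and a line that is straight in x is not straight once the x axis is logarithmic. For x between 1e-8 and 1e-4 the term m*x is negligible, so the curve looks flat, and it only bends in the last decade. Assuming the Origin fit was a linear fit against log10(x) (a straight line on a log x axis corresponds to exactly that), fitting against log10(x) and drawing the result on a dense grid gives a straight line on the log axis.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1e-8, 1e-6, 1e-4, 1e-2])
y = np.array([0.9974209723854539, 0.9945196648709005,
              0.9914759279447916, 0.9852556749265332])

# Fit y = m * log10(x) + b, i.e. linear in the quantity the log axis displays
m, b = np.polyfit(np.log10(x), y, 1)
print(m, b)

# Dense, log-spaced grid so the fitted line is drawn over the whole range
x_fit = np.logspace(np.log10(x.min()), np.log10(x.max()), 200)

plt.scatter(x, y, label="real measure")
plt.plot(x_fit, m * np.log10(x_fit) + b, color="green",
         label=f"linear fit {m:.4f}*log10(x)+{b:.4f}")
plt.xscale("log")
plt.legend(loc="upper right", borderaxespad=0.1)
plt.show()
```

Passing label= to each plotting call also guarantees that each legend entry is attached to the right artist, instead of relying on the order of a separate list of names.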
I have a list of counts (y in the code below) that I am using to plot a probability distribution. Note that this is not raw data: these are frequencies I have already calculated, which fall into various bins. A scatter plot, and even a histogram (plotted with the bar function), revealed that it is some kind of bimodal distribution. I wanted to fit a PDF to it, so I first tried a sum of Gaussians, but SciPy's curve-fitting algorithm failed to fit the curve. I then came across kernel density estimation, which from what I have read is the best way to do this. However, even after putting together code from an answer to a similar question here on Stack Overflow and from another website, both of which recommended the gaussian_kde function from scipy.stats, I have so far been unable to make it work. Am I wrong to assume that I can do this with what I have in the first place? If not, what can I do to get it right?
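```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# x_min, x_max, n_bins and normed_pdf come from earlier code that is not shown
x = np.linspace(x_min, x_max, n_bins)
y = np.array(normed_pdf)
plt.scatter(x,y,s=5, label='Sim Data')
# plt.hold() was removed in Matplotlib 3.0; later calls draw on the same axes by default
kde = gaussian_kde(y, bw_method=0.1 / y.std(ddof=1))
kde.covariance_factor = lambda : .25
kde._compute_covariance()
plt.plot(x, kde(x), 'r-', label='fit')
plt.grid(True)
plt.legend(prop={'size':10})
plt.show()
```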
I know that I could just use R, gnuplot or some other tool for this, but I want to be able to do it within Python. Call me a stickler for self-contained, consistent code.
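A minimal sketch, assuming y holds (normalized) frequencies at the bin centres x. gaussian_kde(y) treats the frequency values themselves as samples, so the resulting density lives on the scale of y, not x. Passing the bin centres as the dataset and the frequencies as weights (the weights parameter, available since SciPy 1.2) estimates a density over x instead. The bimodal data below are synthetic, and the bandwidth factor 0.2 is an arbitrary choice to tune.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic binned data: bin centres and a bimodal frequency for each bin
x = np.linspace(-5, 5, 50)
y = np.exp(-0.5 * ((x + 2) / 0.7) ** 2) + 0.6 * np.exp(-0.5 * ((x - 2) / 1.0) ** 2)
y = y / (y.sum() * (x[1] - x[0]))  # normalize so the frequencies integrate to about 1

# The bin centres are the "samples"; the frequencies say how much each one counts
kde = gaussian_kde(x, weights=y, bw_method=0.2)

x_grid = np.linspace(x.min(), x.max(), 400)
density = kde(x_grid)

plt.scatter(x, y, s=5, label="Sim Data")
plt.plot(x_grid, density, "r-", label="weighted KDE")
plt.grid(True)
plt.legend(prop={"size": 10})
plt.show()
```

A smaller bw_method follows the bins more closely; a larger one smooths the two modes together.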
I am looking for something in Python to fit a few 3D points to a paraboloid.
I think it can be done with SciPy's curve_fit(), but I have not been able to get it right.
Framing the problem more precisely:
I have a set of x, y and z coordinates and wish to fit a paraboloid to those points, as well as plot the entire picture (with the points and the fitted paraboloid).
I know there are a few questions related to this, but they are not very general and I was not convinced by their answers.
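A minimal sketch with scipy.optimize.curve_fit, assuming a general quadratic surface z = a*x**2 + b*y**2 + c*x*y + d*x + e*y + f (the parameter names are hypothetical). curve_fit accepts any xdata, so x and y are stacked into a 2 x N array and unpacked inside the model function. The points are synthetic and noise-free.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def paraboloid(xy, a, b, c, d, e, f):
    """Quadratic surface z = a x^2 + b y^2 + c xy + d x + e y + f."""
    x, y = xy
    return a * x**2 + b * y**2 + c * x * y + d * x + e * y + f


# Synthetic 3D points sampled from a known surface
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 40)
y = rng.uniform(-2, 2, 40)
true_params = (1.0, 0.5, 0.2, -0.3, 0.4, 2.0)
z = paraboloid((x, y), *true_params)

# xdata is a (2, N) array; curve_fit passes it to the model unchanged
popt, pcov = curve_fit(paraboloid, np.vstack([x, y]), z)
print("fitted a, b, c, d, e, f:", popt)

# Evaluate the fitted surface on a grid for plotting
gx, gy = np.meshgrid(np.linspace(x.min(), x.max(), 50),
                     np.linspace(y.min(), y.max(), 50))
gz = paraboloid((gx, gy), *popt)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z, color="red", label="points")
ax.plot_surface(gx, gy, gz, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.legend()
plt.show()
```

Because this model is linear in its coefficients, np.linalg.lstsq on the columns [x**2, y**2, x*y, x, y, 1] would give the same result without an initial guess. For the fit to be a bowl-shaped (elliptic) paraboloid rather than a saddle, the fitted coefficients should satisfy 4*a*b - c**2 > 0.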