Python: log-lin plot with negative values

I am attempting to make a log-lin plot where my y-axis has negative values. I also want to fit a best-fit line to it.
Here's the lin-lin plot:
Here's the plot with the (wrong) code:
plt.scatter(x, y, c='indianred', alpha=0.5)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x,p(x),"-", color="grey", alpha=0.5)
plt.yscale('log')
plt.show()
Even though the fitted line is wrong, this plots the y-axis the way I want it (I believe). However, if I comment out plt.plot(x,p(x),"-", color="grey", alpha=0.5), I get the following plot instead:
This is essentially the same as what I get when writing the code the proper way:
plt.scatter(x, y, c='indianred', alpha=0.5)
p = np.polyfit(x, np.log(y), 1)
plt.semilogy(x, np.exp(p[0] * x + p[1]), 'g--')
plt.yscale('log')
plt.show()
but I get the following warning because of my negative y-values:
RuntimeWarning: invalid value encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
What alternative do I have so my y-values can be better seen? A best-fit line is not a must. I just want to better observe the relationship between these variables (which should be correlated theoretically).
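One possible alternative, offered as a sketch rather than a definitive answer, is to compress the magnitudes logarithmically while keeping the sign, so that negative values stay on the plot. The helper name signed_log10, its linthresh parameter, and the sample data are all assumptions made for illustration.

```python
import numpy as np

def signed_log10(values, linthresh=1.0):
    """Compress magnitudes logarithmically while keeping the sign.

    `linthresh` (a hypothetical parameter name) sets the scale below which
    the transform is approximately linear.
    """
    values = np.asarray(values, dtype=float)
    return np.sign(values) * np.log10(1.0 + np.abs(values) / linthresh)

# Made-up data spanning several orders of magnitude on both sides of zero.
y = np.array([-1000.0, -10.0, -0.1, 0.0, 0.1, 10.0, 1000.0])
y_t = signed_log10(y)

# The transform maps 0 to 0, is symmetric about zero, and preserves ordering.
print(y_t)
```

Scattering x against signed_log10(y) spreads out both the small and the large values, and np.polyfit can be applied to the transformed values if a trend line is wanted. matplotlib also provides plt.yscale('symlog'), which applies a comparable scaling (linear near zero, logarithmic elsewhere) directly on the axis and keeps the tick labels in the original units.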

Related

Polynomial regression plot looking weird

I'm trying to plot a fitted polynomial using matplotlib:
my code:
x = data['LSTAT'].values.reshape(-1,1).copy()
y = data['MEDV'].values.reshape(-1,1).copy()
plt.figure(figsize=(8,5))
from sklearn.preprocessing import PolynomialFeatures
polynomial_features= PolynomialFeatures(degree=2)
xp = polynomial_features.fit_transform(x)
#xp.sort(axis=0)
model = LinearRegression().fit(xp,y)
y_pred = model.predict(xp)
plt.scatter(x,y)
plt.plot(x, y_pred, color='r')
plt.show()
my resulting plot:
Now, I have tried the fixes proposed in these two posts:
1) wrong polynomial regression plot
2) why is my draw of 3-degree polymonial so weird?
If I uncomment the xp.sort(axis=0), which is the proposed solution of 1), I get the following plot:
Which is not correct.
If I try the proposed solution of 2):
plt.plot(np.sort(x),y_pred[np.argsort(x)], color='r')
I get the following error:
ValueError: x and y can be no greater than 2D, but have shapes (506, 1) and (506, 1, 1)
I'm not sure what is going on...
The problem is the order in which matplotlib draws the points: plt.plot connects them in array order, and x is not sorted.
I fixed it this way, but I'm sure there are easier fixes:
#fixing indexes and sorting
x_pd = pd.Series(x.flatten())
y_pred_pd = pd.Series(y_pred.flatten())
x_sorted = x_pd.sort_values()
Y_pred = np.array(y_pred_pd[x_sorted.index])
x_arr = np.array(x_sorted)
#plotting
plt.scatter(x,y)
plt.plot(x,y_pred, color='r', alpha=0.5)
plt.plot(x_arr, Y_pred, color='g')
plt.show()
plt.close()
The resulting plot:
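The pandas round-trip above can probably be shortened with np.argsort on flattened arrays. This is a minimal sketch with made-up stand-ins for x and y_pred (same (n, 1) shapes as in the question), not the original data. It also shows where the ValueError came from: np.argsort on an (n, 1) array returns an (n, 1) index array, and indexing y_pred with it adds a third dimension.

```python
import numpy as np

# Made-up stand-ins for the question's arrays, both with shape (n, 1).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 40.0, size=(50, 1))
y_pred = 0.04 * x**2 - 2.3 * x + 42.0  # any smooth function of x

# Flatten first so argsort returns a 1-D index array; indexing an (n, 1)
# array with an (n, 1) index array is what produced the (n, 1, 1) shape.
order = np.argsort(x.ravel())
x_sorted = x.ravel()[order]
y_sorted = y_pred.ravel()[order]

print(x_sorted.shape, y_sorted.shape)
```

With these arrays, plt.plot(x_sorted, y_sorted, color='r') draws the fitted curve as a single left-to-right line.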

Difficult to plot linear regression line on scatter plot with log scale

I have an example dataframe like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'a':[0.05, 0.11, 0.18, 0.20, 0.22, 0.27],
'b':[3.14, 1.56, 33.10, 430.00, 239.10, 2600.22]})
I would like to plot these properties as a scatter plot and then show the linear trend line of these samples. I also need to put the y-axis data (df['b']) on a log scale.
However, when I try to do that with np.polyfit, I get a strange line.
# Coefficients for polynomial function (degree 1)
coefs = np.polyfit(df['a'], df['b'], 1)
fit_coefs = np.poly1d(coefs)
plt.figure()
plt.scatter(df['a'], df['b'], s = 50, edgecolors = 'black')
plt.plot(df['a'], fit_coefs(df['a']), color='red',linestyle='--')
plt.xlabel('a')
plt.ylabel('b')
plt.yscale('log')
If I convert df['b'] to log values before plotting, I get the right linear trend, but I would like the y-axis to show the values of the previous plot rather than the converted log values shown below:
df['b_log'] = np.log10(df['b'])
coefs = np.polyfit(df['a'], df['b_log'], 1)
fit_coefs = np.poly1d(coefs)
plt.figure()
plt.scatter(df['a'], df['b_log'], s = 50, edgecolors = 'black')
plt.plot(df['a'], fit_coefs(df['a']), color='red', linestyle='--')
plt.xlabel('a')
plt.ylabel('b_log')
So basically, I need a plot like the last one, but with the y-axis values shown as in the previous plot, while still getting the right linear trend. Could anyone help me?
You are doing two different things there: First, you are fitting a linear curve to your exponential data (which is presumably not what you want), then you are fitting a linear curve to your log data, which is ok.
In order to get the linear curve from the linear coefficients in the logarithmic plot, you can just do 10**fit_coefs(df['a']):
df['b_log'] = np.log10(df['b'])
coefs = np.polyfit(df['a'], df['b_log'], 1)
fit_coefs = np.poly1d(coefs)
plt.figure()
plt.scatter(df['a'], df['b'], s = 50, edgecolors = 'black')
plt.plot(df['a'], 10**fit_coefs(df['a']), color='red', linestyle='--')
plt.xlabel('a')
plt.ylabel('b')
plt.yscale("log")
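To make the difference between the two fits concrete, the sketch below evaluates both on the question's data without plotting. The evaluation grid a_grid and the variable names are assumptions added for illustration. It suggests why the first attempt looks strange on a log axis: the straight line fitted to the raw values goes negative at small a, and a log scale cannot display negative values.

```python
import numpy as np

a = np.array([0.05, 0.11, 0.18, 0.20, 0.22, 0.27])
b = np.array([3.14, 1.56, 33.10, 430.00, 239.10, 2600.22])

# Straight line fitted to the raw values: b ~ m*a + c
linear_fit = np.poly1d(np.polyfit(a, b, 1))

# Straight line fitted to log10(b): log10(b) ~ m*a + c, i.e. b ~ 10**(m*a + c)
log_fit = np.poly1d(np.polyfit(a, np.log10(b), 1))

a_grid = np.linspace(a.min(), a.max(), 50)  # hypothetical evaluation grid
b_linear = linear_fit(a_grid)
b_log = 10 ** log_fit(a_grid)

print("linear fit at smallest a:", b_linear[0])  # negative for this data
print("log-space fit stays positive:", bool(np.all(b_log > 0)))
```

Plotting a_grid against b_log with plt.yscale("log") gives a straight line, because the back-transformed fit is an exponential in a.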

How to plot integration equation using Python?

I have a few integral equations that I need to implement in Python. The problem is that when I plot a graph from the equations, parts of the plot do not match the original.
The first equation is the error probability of authentication in normal operation:
The second equation is the error probability of authentication under MIM attack:
The error probability can be calculated by:
It is noted that:
Supposedly, the graph (original) will be shown like this:
Pe^normal = blue lines
Pe^MIM = red lines
Differences between two error probabilities = green lines
I tried to code it in Python, and this is my full code:
import matplotlib.pyplot as plt
import math
import numpy as np
from scipy.special import iv,modstruve
x=np.arange(0.1,21,1)
x = np.array(x)
t = 0.9
y = (np.exp(t*x/2)*(iv(0, t*x/2) - modstruve(0,t*x/2))-1)/(np.exp(t*x)-1)
z = (np.exp((1-t**2)*x/2)*(iv(0, (1-t**2)*x/2) - modstruve(0,(1-t**2)*x/2))-1)/(np.exp((1-t**2)*x)-1)
z2= y+z
plt.plot(x, y,'o', color='red',label='Normal')
plt.plot(x, z2, '-', color='black', label='MIM')
plt.plot(x, z, marker='s', linestyle='--', color='g', label='DIFF')
plt.xlabel('Mean photon number N')
plt.ylabel('Error probability')
plt.scatter(x,y)
plt.text(10, 0.4, 't=0.9', size=12, ha='center', va='center')
plt.ylim([0, 0.5])
plt.xlim([0, 20])
plt.legend()
plt.show()
The graph produced by the code is:
It looks like my plot does not match the original at N=0 for Pe^MIM (red line) and for the difference between the two error probabilities (green line).
I hope someone can help me solve this problem.
Thank you.

How to plot a trendline on scatter-plot matplotlib based on KDE?

I am currently trying to plot a trend-line plot on my scatter-plot in MatPlotLib.
I am aware of numpy polyfit function. It does not do what I want.
Here is what I have so far:
plot = plt.figure(figsize=(10,10)) #Set up the size of the figure
cmap = "viridis" #Set up the color map
plt.scatter(samples[1], samples[0], s=0.1, c=density_sm, cmap=cmap) #Plot the Cross-Plot
plt.colorbar().set_label('Density of points')
plt.axis('scaled')
plt.xlim(-0.3,0.3)
plt.ylim(-0.3,0.3)
plt.xlabel("Intercept")
plt.ylabel("Gradient")
plt.axhline(0, color='green', alpha=0.5, linestyle="--")
plt.axvline(0, color='green', alpha=0.5, linestyle="--")
#Trend-line_1
z = np.polyfit(samples[1], samples[0], 1)
p = np.poly1d(z)
plt.plot(samples[0],p(samples[0]),color="#CC3333", linewidth=0.5)
#Trend-line_2
reg = sm.WLS(samples[0], samples[1]).fit()
plt.plot(samples[1], reg.fittedvalues)
And here is the result:
Scatter-plot with trends
What I want is:
Scatter-Plot_desired
Trend can easily be seen, but the question is what function to use?
The behaviour of polyfit is as expected and the result is correct. The problem is that polyfit does not do what you expect. All (typical) fitting routines minimize the vertical (y-axis) distance between the fit and the data points. What you seem to expect, however, is that it minimizes the Euclidean distance between the fit and the data. See the difference in this figure:
See also the code below, which illustrates this with random data. Note that the linear relationship of the data (parameter a) is recovered by the fit, which would not be the case for the Euclidean fit. Therefore the seemingly off fit is to be preferred.
N = 10000
a = -1
b = 0.1
datax = 0.3*b*np.random.randn(N)
datay = a*datax+b*np.random.randn(N)
plot = plt.figure(1,figsize=(10,10)) #Set up the size of the figure
plot.clf()
plt.scatter(datax,datay) #Plot the Cross-Plot
popt = np.polyfit(datax,datay,1)
print("Result is {0:1.2f} and should be {1:1.2f}".format(popt[-2],a))
xplot = np.linspace(-1,1,1000)
def pol(x,popt):
    popt = popt[::-1]
    res = 0
    for i,p in enumerate(popt):
        res += p*x**i
    return res
plt.plot(xplot,pol(xplot,popt))
plt.xlim(-0.3,0.3)
plt.ylim(-0.3,0.3)
plt.xlabel("Intercept")
plt.ylabel("Gradient")
plt.tight_layout()
plt.show()
samples[0] is your "y" and samples[1] is your "x". In the trend line plot use samples[1].
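The distinction between vertical and Euclidean distance can also be checked numerically. The sketch below reuses the random-data setup from the answer and compares the polyfit slope with the slope of an orthogonal-distance line, computed here from the first right singular vector of the centered points. The SVD approach and the variable names are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000
a, b = -1.0, 0.1
datax = 0.3 * b * rng.standard_normal(N)
datay = a * datax + b * rng.standard_normal(N)

# Ordinary least squares (what np.polyfit does): minimizes vertical residuals.
ols_slope = np.polyfit(datax, datay, 1)[0]

# Orthogonal (total least squares) line: direction of largest variance of
# the centered point cloud, taken from the first right singular vector.
centered = np.column_stack([datax - datax.mean(), datay - datay.mean()])
_, _, vt = np.linalg.svd(centered, full_matrices=False)
tls_slope = vt[0, 1] / vt[0, 0]

print(f"true slope {a:.2f}, polyfit {ols_slope:.2f}, orthogonal {tls_slope:.2f}")
```

For this setup the orthogonal line follows the long axis of the point cloud and comes out much steeper than a, while polyfit stays close to a. The orthogonal line is the one that "looks right" to the eye, but it does not recover the generating relationship.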

how to fit data and then sample from the fitted function to draw curve

Given two arrays x and y, I was trying to fit the data with np.polyfit in the following way:
z = np.polyfit(x, y, 20)
f = np.poly1d(z)
Since I want to plot a line chart instead of a smooth curve, I then use this function f to sample an array of points for plotting the line:
x_new = np.linspace(x[0], x[-1], fitting_size)
y_new = np.zeros(fitting_size)
for t in range(fitting_size):
    y_new[t] = f(x_new[t])
plt.plot(x_new, y_new, marker='v', ms=1)
The problem is that the above code segment still gives me a smooth curve. How can I fix it? Thanks.
Unfortunately the intention behind the question is a bit unclear. However, if you want to perform a linear fit, you need to provide the degree deg=1 to polyfit. There is then no reason to sample from the fit; one can simply use the same input array and apply the fitting function to it.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-1,5,20)
y = 3*x**2+np.random.rand(len(x))*10
z = np.polyfit(x, y, 1)
f = np.poly1d(z)
z2 = np.polyfit(x, y, 2)
f2 = np.poly1d(z2)
plt.plot(x,y, marker=".", ls="", c="k", label="data")
plt.plot(x, f(x), label="linear fit")
plt.plot(x, f2(x), label="quadratic fit")
plt.legend()
plt.show()
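If the goal is instead to sample the fitted polynomial at a handful of points, the loop in the question is not needed, since np.poly1d objects accept arrays. The sketch below assumes a small fitting_size and a degree-2 fit purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 5, 20)
y = 3 * x**2 + rng.random(len(x)) * 10

f = np.poly1d(np.polyfit(x, y, 2))

# poly1d objects accept arrays, so no explicit loop is needed.
fitting_size = 6  # hypothetical: few samples give visibly straight segments
x_new = np.linspace(x[0], x[-1], fitting_size)
y_new = f(x_new)

print(x_new.shape, y_new.shape)
```

plt.plot(x_new, y_new, marker='v') connects these points with straight segments. With a large fitting_size the segments become so short that the chart looks smooth, which may be why the original code appears to draw a curve.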
