I have a simple plot containing two datasets in arrays, and am trying to use regression to calculate a best fit line through the points.
However the line I am getting is way off to the left and up of the data points.
How can I get the line to be in the right place, and are there any other tips or suggestions for my code?
from pylab import *
Is = array([-13.74,-13.86,-13.32,-18.41,-23.83])
gra = array([31.98,29.41,28.12,34.28,40.09])
plot(gra,Is,'kx')
(m,b) = polyfit(Is,gra,1)
print(b)
print(m)
z = polyval([m,b],Is)
plot(Is,z,'k--')
If anyone is curious, the data is the bandgap of a silicon transistor at various temperatures.
You have to be careful about which array you pass as the x coordinates and which as the y coordinates. If you have data values y at positions x, then you must fit y against x and also evaluate the polynomial at x.
from pylab import *
Is = array([-13.74,-13.86,-13.32,-18.41,-23.83])
gra = array([31.98,29.41,28.12,34.28,40.09])
# rename the variables for clarity
x = gra
y = Is
plot(x, y, 'kx')
(m,b) = polyfit(x, y, 1)
print(b)
print(m)
z = polyval([m,b], x)
plot(x, z, 'k--')
show()
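The same fit can be sketched with plain numpy instead of the pylab star import. This is only an illustration, assuming gra is the independent variable; the names slope, intercept, x_line and y_line are introduced here and do not appear in the original code.

```python
import numpy as np

Is = np.array([-13.74, -13.86, -13.32, -18.41, -23.83])
gra = np.array([31.98, 29.41, 28.12, 34.28, 40.09])

# Fit Is (y) as a linear function of gra (x): the x array comes first.
slope, intercept = np.polyfit(gra, Is, 1)

# Evaluate the line on a sorted grid spanning the x data, so it plots cleanly.
x_line = np.linspace(gra.min(), gra.max(), 50)
y_line = np.polyval([slope, intercept], x_line)

print("slope =", slope, "intercept =", intercept)
```

Plotting x_line against y_line then puts the line on top of the points drawn with plot(gra, Is, 'kx').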
I want to be able to find the intersection between a line and a three-dimensional surface.
Mathematically, I have done this by taking the following steps:
Define the (x, y, z) coordinates of the line in a parametric manner. e.g. (x, y, z) = (1+t, 2+3t, 1-t)
Define the surface as a function. e.g. z = f(x, y)
Substitute the values of x, y, and z from the line into the surface function.
By solving, I would be able to get the intersection of the surface and the line
I want to know if there is a method for doing this in Python. I am also open to suggestions for simpler ways to solve for the intersection.
You can use the following code:
import numpy as np
import scipy as sc
import scipy.optimize
from matplotlib import pyplot as plt
def f(x, y):
    """Function of the surface."""
    # example equation
    z = x**2 + y**2 - 10
    return z
p0 = np.array([1, 2, 1]) # starting point for the line
direction = np.array( [1, 3, -1]) # direction vector
def line_func(t):
    """Function of the straight line.
    :param t: curve-parameter of the line
    :returns xyz-value as array"""
    return p0 + t*direction
def target_func(t):
    """Function that will be minimized by fmin
    :param t: curve parameter of the straight line
    :returns: (z_line(t) - z_surface(t))**2; this is zero
    at intersection points"""
    p_line = line_func(t)
    z_surface = f(*p_line[:2])
    return np.sum((p_line[2] - z_surface)**2)
t_opt = sc.optimize.fmin(target_func, x0=-10)
intersection_point = line_func(t_opt)
The main idea is to reformulate the algebraic equation point_of_line = point_of_surface (the condition for intersection) as a minimization problem: |point_of_line - point_of_surface| → min. Because the surface is represented as z_surface = f(x, y), it is convenient to compute the distance for a given t-value on the basis of the z-values alone. This is done in target_func(t). The optimal t-value is then found by fmin.
The correctness and plausibility of the result can be checked with some plotting:
from mpl_toolkits.mplot3d import Axes3D
ax = plt.subplot(projection='3d')
X = np.linspace(-5, 5, 10)
Y = np.linspace(-5, 5, 10)
tt = np.linspace(-5, 5, 100)
XX, YY = np.meshgrid(X, Y)
ZZ = f(XX, YY)
ax.plot_wireframe(XX, YY, ZZ, zorder=0)
LL = np.array([line_func(t) for t in tt])
ax.plot(*LL.T, color="orange", zorder=10)
x, y, z = intersection_point  # coordinates found by fmin above
ax.plot([x], [y], [z], "o", color="red", ms=10, zorder=20)
plt.show()
Note that this combination of wireframe and line plots does not handle occlusion well: matplotlib cannot reliably decide which parts of the orange line should be hidden below the blue wireframe of the surface.
Also note that this type of problem can have any number of solutions, from zero to infinitely many, depending on the actual surface. fmin finds a local optimum, which may be a global optimum with target_func(t_opt) = 0 (a true intersection) or may not. Changing the initial guess x0 can change which local optimum fmin finds.
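An alternative that avoids the local-minimum ambiguity is to keep the sign of the difference and use a bracketing root finder. The sketch below uses scipy.optimize.brentq on the same example line and surface. The function name gap and the brackets are assumptions made for this illustration; for another surface the brackets have to be found by inspecting where the sign changes.

```python
import numpy as np
from scipy.optimize import brentq

def f(x, y):
    """Example surface from the answer above: z = x**2 + y**2 - 10."""
    return x**2 + y**2 - 10

p0 = np.array([1.0, 2.0, 1.0])          # starting point of the line
direction = np.array([1.0, 3.0, -1.0])  # direction vector of the line

def gap(t):
    """Signed height of the line above the surface at parameter t."""
    x, y, z = p0 + t * direction
    return z - f(x, y)

# Hypothetical brackets, chosen by checking that gap changes sign across each.
brackets = [(-3.0, 0.0), (0.0, 1.0)]
t_roots = [brentq(gap, a, b) for a, b in brackets]
points = [p0 + t * direction for t in t_roots]

for t, p in zip(t_roots, points):
    print("t =", t, "point =", p)
```

Each bracket yields exactly one intersection, so both solutions of this example are found without depending on an initial guess.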
I am trying to draw a best-fit curve for my data. It is a terribly bad sample of data, but for simplicity's sake let's say I expect a straight line as the best fit on a log-log scale.
I think I already did that with regression, and it returns a reasonable fit line. But I want to double-check it with the curve_fit function in scipy, and I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.optimize as optimization
x = np.array([ 1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08
])
y = np.array([ 4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12, 7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12
])
# Regression Function
from scipy import stats

def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)
    return y_pred, p
# plotting z
allx, ally = x, y # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally)) # change here # transformed input
plt.loglog(allx, ally, marker='$\\star$',color ='g', markersize=5,linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression") # transformed output
# Let's fit an exponential function.
# This looks like a line on a log-log plot.
def myExpFunc(x, a, b):
    return a * np.power(x, b)
popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
         label="({0:.3f}*x**{1:.3f})".format(*popt))
print("Exponential Fit: y = (a*(x**b))")
print("\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt))
plt.show()
Again, I apologize for the bad dataset. Your help would be much appreciated.
My plot looks like this:
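One way to read off the equation of a straight line in log-log space is to note that log(y) = b*log(x) + log(a) is the same as y = a*x**b. The sketch below illustrates this with synthetic, noise-free data rather than the data from the question; a_true and b_true are made-up values, and np.polyfit stands in for the regression step.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration only).
a_true, b_true = 3.0e15, 0.4
x = np.logspace(-8, -7, 20)
y = a_true * x**b_true

# A straight line in log-log space: log(y) = b*log(x) + log(a).
b_fit, log_a_fit = np.polyfit(np.log(x), np.log(y), 1)
a_fit = np.exp(log_a_fit)

print("y = {0:.3e} * x**{1:.3f}".format(a_fit, b_fit))
```

The recovered pair (a_fit, b_fit) could also be passed to curve_fit through its p0 argument, which may help when x and y differ by many orders of magnitude as they do here.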
I am trying to run a least-squares fit using numpy and am having trouble. Can someone please tell me what I am doing wrong in the code below? When I set y = np.power(X, 1) + np.random.rand(20)*3, or some other reasonable function of X, everything works fine. But for the particular y values given below, the plot I get makes no sense.
Is this some kind of numerical problem?
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
X = np.arange(1,21)
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
-0.00411987, -0.00378418, -0.003479 , -0.00314331, -0.00259399,
-0.00213623, -0.00146484, -0.00082397, -0.00030518, 0.00027466,
0.00076294, 0.00146484, 0.00192261, 0.00247192, 0.00314331])
#y = np.power(X, 1) + np.random.rand(20)*3
w = np.linalg.lstsq(X.reshape(20, 1), y)[0]
plt.plot(X, y, 'red')
plt.plot(X, X*w[0], 'blue')
plt.show()
Are you sure there is a linear relationship between what you are fitting and the y variable data?
Using the code (y = np.power(X, 1) + np.random.rand(20)*3) from your example, you have a linear relationship built into the y variable itself (with some noise) which allows your plot to track relatively well with the linear equation.
X = np.arange(1,21)
y = np.power(X, 1) + np.random.rand(20)*3
w = np.linalg.lstsq(X.reshape(20, 1), y)[0]
plt.plot(X, y, 'red')
plt.plot(X, X*w[0], 'blue')
plt.show()
However, when you switch to something like your y variable
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
-0.00411987, -0.00378418, -0.003479 , -0.00314331, -0.00259399,
-0.00213623, -0.00146484, -0.00082397, -0.00030518, 0.00027466,
0.00076294, 0.00146484, 0.00192261, 0.00247192, 0.00314331])
You end up with something that is harder to fit.
Looking at the documentation, if you want a line that fits this set of values, you need to build in a constant (intercept) term, which lstsq does not do by default.
The docs for lstsq state:
Return the least-squares solution to a linear matrix equation.
Solves the equation a x = b
If you really want to fit the data to a linear equation, code like the following gives a line that roughly follows your original data. However, the data seem to have a polynomial or exponential trend, for which polyfit would be a better choice.
X = np.arange(1,21)
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
-0.00411987, -0.00378418, -0.003479 , -0.00314331, -0.00259399,
-0.00213623, -0.00146484, -0.00082397, -0.00030518, 0.00027466,
0.00076294, 0.00146484, 0.00192261, 0.00247192, 0.00314331])
#y = np.power(X, 1) + np.random.rand(20)*3
X2 = np.vstack([X, np.ones(len(X))]).T
w = np.linalg.lstsq(X2, y)[0]
plt.plot(X, y, 'red')
plt.plot(X, X.dot(w[0])+w[1], 'blue')
plt.show()
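To make the comparison concrete, the sketch below fits the same y values three ways: lstsq with a column of ones, polyfit of degree 1 (which should give the same line), and polyfit of degree 2. The helper rss and the choice of degree 2 are assumptions made for this illustration, not something stated in the question.

```python
import numpy as np

X = np.arange(1, 21)
y = np.array([-0.00454712, -0.00457764, -0.0045166, -0.00442505, -0.00427246,
              -0.00411987, -0.00378418, -0.003479, -0.00314331, -0.00259399,
              -0.00213623, -0.00146484, -0.00082397, -0.00030518, 0.00027466,
              0.00076294, 0.00146484, 0.00192261, 0.00247192, 0.00314331])

# Design matrix with a column of ones so that lstsq also fits an intercept.
A = np.vstack([X, np.ones(len(X))]).T
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# polyfit with degree 1 solves the same problem; degree 2 adds curvature.
p1 = np.polyfit(X, y, 1)
p2 = np.polyfit(X, y, 2)

def rss(pred):
    """Residual sum of squares against y."""
    return float(np.sum((y - pred) ** 2))

rss_line = rss(np.polyval(p1, X))
rss_quad = rss(np.polyval(p2, X))
print("line:", rss_line, "quadratic:", rss_quad)
```

Comparing the two residual sums gives a rough indication of how much a curved model improves on the straight line for this data.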
I'm new to Python so please be patient. I appreciate any help!
What I have: three 1D lists (xr, yr, zr), one containing x-values, the other two y- and z-values
What I want to do: create a 3D contour plot in matplotlib
I realized that I need to convert the three 1D lists into three 2D lists, by using the meshgrid function.
Here's what I have so far:
xr = np.asarray(xr)
yr = np.asarray(yr)
zr = np.asarray(zr)
X, Y = np.meshgrid(xr,yr)
znew = np.array([zr for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = znew.reshape(X.shape)
Running this gives me the following error (for the last line I entered above):
total size of new array must be unchanged
I went digging around Stack Overflow and tried suggestions from people having similar problems. Here are the errors I get from each of those suggestions:
Changing the last line to:
Z = znew.reshape(X.shape[0])
Gives the same error.
Changing the last line to:
Z = znew.reshape(X.shape[0], len(znew))
Gives the error:
Shape of x does not match that of z: found (294, 294) instead of (294, 86436).
Changing it to:
Z = znew.reshape(X.shape, len(znew))
Gives the error:
an integer is required
Any ideas?
Well, the sample code below works for me:
import numpy as np
import matplotlib.pyplot as plt
xr = np.linspace(-20, 20, 100)
yr = np.linspace(-25, 25, 110)
X, Y = np.meshgrid(xr, yr)
#Z = 4*X**2 + Y**2
zr = []
for i in range(0, 110):
    y = -25.0 + (50./110.)*float(i)
    for k in range(0, 100):
        x = -20.0 + (40./100.)*float(k)
        v = 4.0*x*x + y*y
        zr.append(v)
Z = np.reshape(zr, X.shape)
print(X.shape)
print(Y.shape)
print(Z.shape)
plt.contour(X, Y, Z)
plt.show()
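The reshape only succeeds when the flat list holds exactly X.size values, ordered with y in the outer loop and x in the inner loop. In the question, znew repeated the whole of zr once per grid point, which is why its size no longer matched X.shape. A small sketch of the size bookkeeping, using a tiny grid and an analytic function as stand-ins (both are assumptions for illustration):

```python
import numpy as np

xr = np.linspace(-2.0, 2.0, 4)   # 4 x-values
yr = np.linspace(-3.0, 3.0, 5)   # 5 y-values
X, Y = np.meshgrid(xr, yr)       # both have shape (len(yr), len(xr)) = (5, 4)

# Flat list of z-values, y in the outer loop and x in the inner loop,
# matching the row-major layout of X and Y.
zr = [4.0 * x * x + y * y for y in yr for x in xr]

Z = np.reshape(zr, X.shape)
print(X.shape, Z.shape)
```

If zr instead holds one value per entry of xr (as with three equally long 1D lists of scattered samples), there is nothing to reshape, and the data first has to be placed on a grid.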
TL;DR
import matplotlib.pyplot as plt
import numpy as np
def get_data_for_mpl(X, Y, Z):
    result_x = np.unique(X)
    result_y = np.unique(Y)
    result_z = np.zeros((len(result_x), len(result_y)))
    # result_z[:] = np.nan
    for x, y, z in zip(X, Y, Z):
        i = np.searchsorted(result_x, x)
        j = np.searchsorted(result_y, y)
        result_z[i, j] = z
    return result_x, result_y, result_z
xr, yr, zr = np.genfromtxt('data.txt', unpack=True)
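x_vals, y_vals, z_grid = get_data_for_mpl(xr, yr, zr)
# contourf expects Z with shape (len(y), len(x)), hence the transpose
plt.contourf(x_vals, y_vals, z_grid.T, 100)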
plt.show()
Detailed answer
At the beginning, you need to find out for which values of x and y the graph is being plotted. This can be done using the numpy.unique function:
result_x = numpy.unique(X)
result_y = numpy.unique(Y)
Next, you need to create a numpy.ndarray with function values for each point (x, y) from zip(X, Y):
result_z = numpy.zeros((len(result_x), len(result_y)))
for x, y, z in zip(X, Y, Z):
    i = search(result_x, x)
    j = search(result_y, y)
    result_z[i, j] = z
If the array is sorted, the search can be performed in logarithmic rather than linear time, so numpy.searchsorted is sufficient. To use it, the arrays result_x and result_y must be sorted. Fortunately, numpy.unique returns its result sorted, so no additional work is needed. It is enough to replace the search function (which is not implemented anywhere and is shown only as an intermediate step) with np.searchsorted.
Finally, to get the desired image, call matplotlib.pyplot.contour or matplotlib.pyplot.contourf. Both expect Z with shape (len(y), len(x)), so pass result_z.T when result_z is indexed as [i, j] as above.
If no function value exists for some (x, y) pairs with x in result_x and y in result_y, and you simply want nothing drawn there, replace the missing values with NaN. More simply, create result_z as a numpy.ndarray filled with NaN and then fill it in:
result_z = numpy.zeros((len(result_x), len(result_y)))
result_z[:] = numpy.nan
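The steps above can be combined into a vectorized sketch that avoids the Python loop and stores the grid with rows indexed by y, so no transpose is needed later. The function name scattered_to_grid and the sample values are made up for this illustration.

```python
import numpy as np

def scattered_to_grid(X, Y, Z):
    """Place scattered (x, y, z) samples on a regular grid; gaps stay NaN."""
    xs = np.unique(X)                           # sorted unique x-values
    ys = np.unique(Y)                           # sorted unique y-values
    grid = np.full((len(ys), len(xs)), np.nan)  # rows = y, columns = x
    i = np.searchsorted(xs, X)                  # column index of each sample
    j = np.searchsorted(ys, Y)                  # row index of each sample
    grid[j, i] = Z
    return xs, ys, grid

# Made-up samples; the point (1, 1) is deliberately missing.
X = np.array([0.0, 1.0, 0.0, 2.0, 2.0])
Y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
Z = np.array([10.0, 11.0, 20.0, 12.0, 22.0])

xs, ys, grid = scattered_to_grid(X, Y, Z)
print(grid)
```

With this layout, plt.contourf(xs, ys, grid) can be called directly, and cells left as NaN are simply not drawn.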
I have pieced together the following code to plot a triangular mesh with the colors specified by an additional scalar function:
#! /usr/bin/env python
import numpy as np
from mayavi import mlab
# Create cone
n = 8
t = np.linspace(-np.pi, np.pi, n)
z = np.exp(1j*t)
x = z.real.copy()
y = z.imag.copy()
z = np.zeros_like(x)
triangles = [(0, i, i+1) for i in range(n)]
x = np.r_[0, x]
y = np.r_[0, y]
z = np.r_[1, z]
t = np.r_[0, t]
# These are the scalar values for each triangle
f = np.mean(t[np.array(triangles)], axis=1)
# Plot it
mesh = mlab.triangular_mesh(x, y, z, triangles,
representation='wireframe',
opacity=0)
cell_data = mesh.mlab_source.dataset.cell_data
cell_data.scalars = f
cell_data.scalars.name = 'Cell data'
cell_data.update()
mesh2 = mlab.pipeline.set_active_attribute(mesh,
cell_scalars='Cell data')
mlab.pipeline.surface(mesh2)
mlab.show()
This works reasonably well. However, instead of having every triangle with a uniform color and sharp transitions between the triangles, I'd much rather have a smooth interpolation over the entire surface.
Is there a way to do that?
I think you want to use point data instead of cell data. With cell data, a single scalar value is not localized to any point. It is assigned to the entire face. It looks like you just want to assign the t data to the vertices instead. The default rendering of point scalars will smoothly interpolate across each face.
point_data = mesh.mlab_source.dataset.point_data
point_data.scalars = t
point_data.scalars.name = 'Point data'
point_data.update()
mesh2 = mlab.pipeline.set_active_attribute(mesh,
point_scalars='Point data')