Extrapolating with a single data point - python

Is there a function for extrapolating in NumPy?
I tried np.interp, but of course that only interpolates within the range of my x-values, not outside it.
For example, I have x-values from 1 to 7, inclusive, each with a corresponding y-value, and I want to find the y-value at x = 0.
import numpy as np
x = np.arange(1,8,1)
y = np.array((10,20,30,40,50,60,70))
np.interp(0,x,y)
Is there a function like interp that extrapolates?

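scipy.interpolate.interp1d allows extrapolation when you pass fill_value='extrapolate'. It returns a function, which you then call at the new x-value.
import numpy as np
from scipy import interpolate
x = np.arange(1,8,1)
y = np.array((10,20,30,40,50,60,70))
# interp1d returns a callable; the default kind is linear
f = interpolate.interp1d(x, y, fill_value='extrapolate')
# Evaluate outside the original x range
print(f(0))  # 0.0 for this data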
hope this answers your question
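If you would rather stay within NumPy, one option is to fit a straight line with np.polyfit and evaluate it outside the data range. This is only a sketch, and it assumes the data are well described by a single line (true for the sample data in the question). Note that it is a least-squares fit, not the piecewise-linear extrapolation that interp1d performs.

```python
import numpy as np

x = np.arange(1, 8, 1)
y = np.array((10, 20, 30, 40, 50, 60, 70))

# Fit a degree-1 polynomial (a straight line) to all the points
coeffs = np.polyfit(x, y, 1)

# Evaluate the fitted line at x = 0, which lies outside the original range
y_at_zero = np.polyval(coeffs, 0)
print(y_at_zero)
```

For data that are not close to linear, a higher degree can be passed to np.polyfit, but extrapolating high-degree polynomials far from the data tends to be unreliable.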

Related

How to get the second derivative/dip from the graph or generate the best eps value

Dataset is below
,id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
Code is below
import pandas as pd
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn.neighbors import NearestNeighbors
df = pd.read_csv('1.csv',index_col=None)
df1 = StandardScaler().fit_transform(df)
dbsc = DBSCAN(eps = 2.5, min_samples = 20).fit(df1)
labels = dbsc.labels_
My df has 1999 rows.
I got the eps value from the method below; from the graph it is clear that the dip is at eps=2.5.
This is the method I used to find the best eps value:
ns = 5
nbrs = NearestNeighbors(n_neighbors=ns).fit(df1)
distances, indices = nbrs.kneighbors(df1)
distanceDec = sorted(distances[:,ns-1], reverse=True)
plt.plot(indices[:,0], distanceDec)
#plt.plot(list(range(1,2000)), distanceDec)
How can the dip in the graph be found automatically, so that the system reports the best eps without my having to look at the graph?
If I understand correctly, you are looking for the precise y value of the inflection point appearing in your ε(x) plot (it should be around 2.0), right?
If so, with ε(x) being your curve, the problem reduces to:
1. Compute the second derivative of your curve: ε''(x).
2. Find the zero (or zeros) of that second derivative: x0.
3. Recover the optimized ε value by plugging the zero into your curve: ε(x0).
Here is my answer, based on these two other Stack Overflow answers:
https://stackoverflow.com/a/26042315/10489040 (Compute derivative of an array)
https://stackoverflow.com/a/3843124/10489040 (Find zero in array)
import numpy as np
import matplotlib.pyplot as plt
# Generating x data range from -1 to 4 with a step of 0.01
x = np.arange(-1, 4, 0.01)
# Simulating y data with an inflection point as y(x) = x³ - 5x² + 2x
y = x**3 - 5*x**2 + 2*x
# Plotting your curve
plt.plot(x, y, label="y(x)")
# Computing y 1st derivative of your curve with a step of 0.01 and plotting it
y_1prime = np.gradient(y, 0.01)
plt.plot(x, y_1prime, label="y'(x)")
# Computing y 2nd derivative of your curve with a step of 0.01 and plotting it
y_2prime = np.gradient(y_1prime, 0.01)
plt.plot(x, y_2prime, label="y''(x)")
# Finding the index (or indices) where the second derivative changes sign
x_zero_index = np.where(np.diff(np.sign(y_2prime)))[0]
# Finding the x value of the first zero of the second derivative
x_zero_value = x[x_zero_index][0]
# Finding the y value of the curve at that x value
y_zero_value = y[x_zero_index][0]
# Reporting
print(f'The inflection point of your curve is at y = {y_zero_value:.3f}.')
# Showing the three labelled curves
plt.legend()
plt.show()
In any case, keep in mind that the inflection point (around 2.0) does not match with the "dip" point appearing around 2.5.
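Because the inflection point and the visual "dip" can differ, another heuristic sometimes used on k-distance plots is to take the point farthest from the straight line joining the first and last points of the sorted curve. The following is a minimal sketch of that idea, not the second-derivative method described above. The function name knee_by_chord_distance and the synthetic k_dist curve are made up for illustration; real k-distance curves are noisier and may need smoothing first.

```python
import numpy as np

def knee_by_chord_distance(values):
    """Return the index of the point farthest from the chord joining
    the first and last points of the curve (hypothetical helper)."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Normalise both axes to [0, 1] so that neither dominates the distance
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance of every point from the first-to-last chord
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))

# Synthetic stand-in for distances sorted in descending order:
# a steep drop from 10 to 2.5, then a long, almost flat tail down to 2.0
k_dist = np.concatenate([
    np.linspace(10.0, 2.5, 21),
    np.linspace(2.5, 2.0, 180)[1:],
])

knee_index = knee_by_chord_distance(k_dist)
eps_estimate = k_dist[knee_index]
print(knee_index, eps_estimate)
```

With the variables from the question, the sorted distanceDec list would take the place of k_dist.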

Python generate random right skewed gaussian with constraints

I need to generate a unit curve that is going to look like a right skewed gaussian and I have the following constraints:
The X axis is Days (variable but usually 45+)
All values on the Y axis sum to 1
The peak will always occur around day 4 or 5
Example:
Is there a way to do this programmatically in python?
As noted by @Severin, a gamma distribution looks like a reasonable fit, e.g.:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sps
x = np.linspace(0, 75, 76)  # days 0..75, one point per day
plt.plot(x, sps.gamma.pdf(x, 4), '.-')
plt.show()
If the values really need to sum to 1, rather than integrate to 1, I'd evaluate the cdf at the day boundaries and then use np.diff on the result.
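The cdf-plus-np.diff idea can be sketched as follows. The shape and scale values are assumptions chosen so that the gamma mode, (shape - 1) * scale, falls at 4.5; they are not taken from the question. The final renormalisation only removes the small tail mass beyond the last day.

```python
import numpy as np
import scipy.stats as sps

n_days = 45   # assumed number of days
shape = 4.0   # assumed gamma shape parameter
scale = 1.5   # assumed scale; mode = (shape - 1) * scale = 4.5

# Day boundaries 0, 1, ..., n_days
edges = np.arange(n_days + 1)

# Probability mass falling in each one-day interval [i, i + 1)
cdf = sps.gamma.cdf(edges, shape, scale=scale)
weights = np.diff(cdf)

# Renormalise so the daily values sum to exactly 1
weights /= weights.sum()

print(weights.sum(), np.argmax(weights))
```

Here weights[i] is the share assigned to the interval from day i to day i + 1, so the largest value lands in the interval between days 4 and 5.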

Include sigma in polyfit- python

I have 3 arrays. One array contains x-values, the second array contains y-values, and the third array contains values for sigma (errors).
How can I use the numpy.polyfit function to fit for x, y, and sigma? I have figured out how to fit the x and y values but not sigma.
import numpy as np
p = np.polyfit(x,y,2)
xp = np.linspace(0.4,1,40)
y = np.polyval(p,xp)
Use the w parameter of numpy.polyfit. For Gaussian uncertainties the weights should be 1/sigma (not 1/sigma**2):
p = np.polyfit(x,y,2,w=1/sigma)
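To see what the weights do, here is a small sketch with made-up data: an exact quadratic in which one point is deliberately corrupted and given a very large sigma. The coefficients, the outlier, and the sigma values are all assumptions for illustration only.

```python
import numpy as np

# Synthetic data from y = 2x^2 - 3x + 1 (assumed coefficients)
true_coeffs = np.array([2.0, -3.0, 1.0])
x = np.linspace(0.4, 1.0, 10)
y = np.polyval(true_coeffs, x)

# Corrupt one point and mark it as very uncertain
y[5] += 5.0
sigma = np.full_like(x, 0.1)
sigma[5] = 1000.0

# The weighted fit nearly ignores the uncertain point
p_weighted = np.polyfit(x, y, 2, w=1 / sigma)

# The unweighted fit is pulled away by it
p_unweighted = np.polyfit(x, y, 2)

print(p_weighted)
print(p_unweighted)
```

The weighted coefficients stay close to the true ones, while the unweighted ones are visibly distorted by the single bad point.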

Interpolation without specifying indices in Python

I have two arrays of the same length, say array x and array y. I want to find the value of y corresponding to x=0.56. This is not a value present in array x.
I would like Python to find by itself the closest value larger than 0.56 (and its corresponding y value) and the closest value smaller than 0.56 (and its corresponding y value), then simply interpolate to find the value of y when x = 0.56.
This is easily done when I find the indices of the two x values and corresponding y values by myself and input them into Python (see following bit of code).
But is there any way for python to find the indices by itself?
#interpolation:
def effective_height(h1,h2,g1,g2):
    return (h1 + (((0.56-g1)/(g2-g1))*(h2-h1)))
eff_alt1 = effective_height(x[12],x[13],y[12],y[13])
In this bit of code, I had to find the indices [12] and [13] corresponding to the closest smaller value to 0.56 and the closest larger value to 0.56.
Now I am looking for a similar technique where I would just tell python to interpolate between the two values of x for x=0.56 and print the corresponding value of y when x=0.56.
I have looked at scipy's interpolate but don't think it would help in this case, although further clarification on how I can use it in my case would be helpful too.
Does Numpy interp do what you want?:
import numpy as np
x = [0,1,2]
y = [2,3,4]
np.interp(0.56, x, y)
Out[81]: 2.56
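If you also want the bracketing indices themselves, np.searchsorted can find them. The sketch below assumes x is sorted in ascending order and that the target lies strictly inside its range; the sample arrays are made up for illustration.

```python
import numpy as np

# Made-up sample data; x must be sorted in ascending order
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
target = 0.56

# Index of the first x value that is >= target
hi = int(np.searchsorted(x, target))
# Index of the closest smaller x value
lo = hi - 1

# Linear interpolation between the two bracketing points
y_manual = y[lo] + (target - x[lo]) / (x[hi] - x[lo]) * (y[hi] - y[lo])

print(lo, hi, y_manual)
print(np.interp(target, x, y))
```

The manual result agrees with np.interp, which performs the same lookup and interpolation internally.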
Given your two arrays, x and y, you can do something like the following using SciPy.
from scipy.interpolate import InterpolatedUnivariateSpline
spline = InterpolatedUnivariateSpline(x, y, k=5)
spline(0.56)
The keyword k must be between 1 and 5, and controls the degree of the spline.
Example:
>>> x = range(10)
>>> y = range(0, 100, 10)
>>> spline = InterpolatedUnivariateSpline(x, y, k=5)
>>> spline(0.56)
array(5.6000000000000017)

Fourier transform of a Gaussian is not a Gaussian, but that's wrong! - Python

I am trying to use NumPy's fft function. However, when I give it a simple Gaussian function, the FFT of that Gaussian is not a Gaussian. It is close, but it is split in half so that each half sits at either end of the x axis.
The Gaussian function I'm calculating is
y = exp(-x^2)
Here is my code:
from cmath import *
from numpy import multiply
from numpy.fft import fft
from pylab import plot, show
""" Basically the standard range() function but with float support """
def frange (min_value, max_value, step):
    value = float(min_value)
    array = []
    while value < float(max_value):
        array.append(value)
        value += float(step)
    return array
N = 256.0 # number of steps
y = []
x = frange(-5, 5, 10/N)
# fill array y with values of the Gaussian function
cache = -multiply(x, x)
for i in cache: y.append(exp(i))
Y = fft(y)
# plot the fft of the gausian function
plot(x, abs(Y))
show()
The result is not quite right, because the FFT of a Gaussian function should itself be a Gaussian function...
np.fft.fft returns a result in so-called "standard order": (from the docs)
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
The function np.fft.fftshift rearranges the result into the order most humans expect (and which is good for plotting):
The routine np.fft.fftshift(A) shifts transforms and their frequencies to put the zero-frequency components in the middle...
So using np.fft.fftshift:
import matplotlib.pyplot as plt
import numpy as np
N = 128
x = np.arange(-5, 5, 10./(2 * N))
y = np.exp(-x * x)
y_fft = np.fft.fftshift(np.abs(np.fft.fft(y))) / np.sqrt(len(y))
plt.plot(x,y)
plt.plot(x,y_fft)
plt.show()
Your result is not even close to a Gaussian, not even one split into two halves.
To get the result you expect, you will have to position your own Gaussian with the center at index 0, and the result will also be positioned that way. Try the following code:
from pylab import *
N = 128
x = r_[arange(0, 5, 5./N), arange(-5, 0, 5./N)]
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(r_[y[N:], y[:N]])
plot(r_[y_fft[N:], y_fft[:N]])
show()
The plot commands split the arrays into two halves and swap them to get a nicer picture.
It is being displayed with the center (i.e. mean) at coefficient index zero. That is why it appears that the right half is on the left, and vice versa.
EDIT: Explore the following code:
import numpy as np
import pylab
from scipy.signal.windows import gaussian
x = gaussian(2048, 10)
X = np.abs(np.fft.fft(x))
pylab.plot(x)
pylab.plot(X)
pylab.plot(X[list(range(1024, 2048)) + list(range(0, 1024))])
The last line will plot X starting from the center of the vector, then wrap around to the beginning.
A discrete Fourier transform is implicitly periodic, as it is the transform of a signal that implicitly repeats indefinitely. Note that when you pass y to be transformed, the x values are not supplied, so in fact the Gaussian that is transformed is one centred on the median value between 0 and 256, that is, 128.
Remember also that a translation of f(x) corresponds to a phase change in its transform F.
Following on from Sven Marnach's answer, a simpler version would be this:
from pylab import *
N = 128
x = ifftshift(arange(-5,5,5./N))
y = exp(-x*x)
y_fft = fft(y) / sqrt(2 * N)
plot(fftshift(y))
plot(fftshift(y_fft))
show()
This yields a plot identical to the above one.
The key (and this seems strange to me) is that NumPy's assumed data ordering, in both the frequency and time domains, is to have the "zero" value first. This is not what I'd expect from other implementations of FFT, such as the FFTW3 libraries in C.
This was slightly fudged in the answers from unutbu and Steve Tjoa above, because they're taking the absolute value of the FFT before plotting it, thus wiping away the phase issues resulting from not using the "standard order" in time.
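To make the ordering point concrete, the sketch below moves x = 0 to index 0 with ifftshift before the transform and re-centres the result with fftshift afterwards. It then compares the output with the analytic transform of exp(-x^2), which is sqrt(pi) * exp(-pi^2 f^2) under the convention F(f) = ∫ f(x) exp(-2πi f x) dx. The grid size and range are assumptions taken from the examples above, and multiplying by dx is one way to approximate the continuous integral.

```python
import numpy as np

N = 256
# Grid on [-5, 5) that contains x = 0 exactly, at index N // 2
x = np.linspace(-5, 5, N, endpoint=False)
dx = x[1] - x[0]
y = np.exp(-x**2)

# Put x = 0 first, transform, then centre the zero frequency again
Y = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(y))) * dx

# Matching frequency axis, also centred on zero
freqs = np.fft.fftshift(np.fft.fftfreq(N, d=dx))

# Analytic Fourier transform of exp(-x^2)
expected = np.sqrt(np.pi) * np.exp(-(np.pi * freqs) ** 2)

print(np.max(np.abs(Y.real - expected)))
print(np.max(np.abs(Y.imag)))
```

With the input ordered this way the imaginary part is at rounding-error level, so no absolute value is needed to recover the Gaussian shape.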
