linear interpolation between two data points

linear interpolation between two data points - python

I have two data points x and y:
x = 5 (value corresponding to 95%)
y = 17 (value corresponding to 102.5%)
No I would like to calculate the value for xi which should correspond to 100%.
x = 5 (value corresponding to 95%)
xi = ?? (value corresponding to 100%)
y = 17 (value corresponding to 102.5%)
How should I do this using python?

is that what you want?
In [145]: s = pd.Series([5, np.nan, 17], index=[95, 100, 102.5])
In [146]: s
Out[146]:
95.0 5.0
100.0 NaN
102.5 17.0
dtype: float64
In [147]: s.interpolate(method='index')
Out[147]:
95.0 5.0
100.0 13.0
102.5 17.0
dtype: float64

You can use numpy.interp function to interpolate a value
import numpy as np
import matplotlib.pyplot as plt
x = [95, 102.5]
y = [5, 17]
x_new = 100
y_new = np.interp(x_new, x, y)
print(y_new)
# 13.0
plt.plot(x, y, "og-", x_new, y_new, "or");

We can easily plot this on a graph without Python:
This shows us what the answer should be (13).
But how do we calculate this? First, we find the gradient with this:
The numbers substituted into the equation give this:
So we know for 0.625 we increase the Y value by, we increase the X value by 1.
We've been given that Y is 100. We know that 102.5 relates to 17. 100 - 102.5 = -2.5. -2.5 / 0.625 = -4 and then 17 + -4 = 13.
This also works with the other numbers: 100 - 95 = 5, 5 / 0.625 = 8, 5 + 8 = 13.
We can also go backwards using the reciprocal of the gradient (1 / m).
We've been given that X is 13. We know that 102.5 relates to 17. 13 - 17 = -4. -4 / 0.625 = -2.5 and then 102.5 + -2.5 = 100.
How do we do this in python?
def findXPoint(xa,xb,ya,yb,yc):
m = (xa - xb) / (ya - yb)
xc = (yc - yb) * m + xb
return
And to find a Y point given the X point:
def findYPoint(xa,xb,ya,yb,xc):
m = (ya - yb) / (xa - xb)
yc = (xc - xb) * m + yb
return yc
This function will also extrapolate from the data points.

Related

numpy sorting and removing top values

I don't know if there is a name for this algorithm, but basically for a given y, I want to find the maximum x such that:
import numpy as np
np_array = np.random.rand(1000, 1)
np.sum(np_array[np_array > x] - x) >= y
Of course, a search algo would be to find the top value n_1, reduce it to the second largest value, n_2. Stop if n_1 - n-2 > y; else reduce both n_1 and n_2 to n_3, stop if (n_1 - n_3) + (n_2 - n_3) > y ...
But I feel there must be an algo to generate a sequence of {xs} that converges to its true value.

Let's use your example from the comments:
a = np.array([0.1, 0.3, 0.2, 0.6, 0.1, 0.4, 0.5, 0.2])
y = 0.5
First let's sort the data in descending order:
s = np.sort(a)[::-1] # 0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1,
Let's take a look at how the choice of x affects the possible values of the sum r = np.sum(np_array[np_array > x] - x):
If x ≥ 0.6, then r = 0.0 - x ⇒ -∞ < r ≤ -0.6
If 0.6 > x ≥ 0.5, then r = 0.6 - x ⇒ 0.0 < r ≤ 0.1 (where 0.1 = 0.6 - 0.5 × 1)
If 0.5 > x ≥ 0.4, then r = 0.6 - x + 0.5 - x = 1.1 - 2 * x ⇒ 0.1 < r ≤ 0.3 (where 0.3 = 1.1 - 0.4 × 2)
If 0.4 > x ≥ 0.3, then r = 0.6 - x + 0.5 - x + 0.4 - x = 1.5 - 3 * x ⇒ 0.3 < r ≤ 0.6 (where 0.6 = 1.5 - 0.3 × 3)
If 0.3 > x ≥ 0.2, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x = 1.8 - 4 * x ⇒ 0.6 < r ≤ 1.0 (where 1.0 = 1.8 - 0.2 × 4)
If 0.2 > x ≥ 0.1, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x + 0.2 - x + 0.2 - x = 2.2 - 6 * x ⇒ 1.0 < r ≤ 1.6 (where 1.6 = 2.2 - 0.1 × 6)
If 0.1 > x, then r = 0.6 - x + 0.5 - x + 0.4 - x + 0.3 - x + 0.2 - x + 0.2 - x + 0.1 - x + 0.1 - x = 2.4 - 8 * x ⇒ 1.6 < r ≤ ∞
The range of r is continuous except for the portion a[0] < r ≤ 0.0. Duplicate elements affect the range of available r values for each value in a, but otherwise are nothing special. We can remove, but also account for the duplicates by using np.unique instead of np.sort:
s, t = np.unique(a, return_counts=True)
s, t = s[::-1], t[::-1]
w = np.cumsum(t)
If your data can reasonably be expected not to contain duplicates, then use the sorted s shown in the beginning, and set t = np.ones(s.size, dtype=int) and therefore w = np.arange(s.size) + 1.
For s[i] > x ≥ s[i + 1], the bounds of r are given by c[i] - w[i] * s[i] < r ≤ c[i] - w[i] * s[i + 1], where
c = np.cumsum(s * t) # You can use just `np.cumsum(s)` if no duplicates
So finding where y ends up is a matter of placing it between the correct bounds. This can be done with a binary search, e.g., np.searchsorted:
# Left bound. Sum is strictly greater than this
bounds = c - w * s
i = np.searchsorted(bounds[1:], y, 'right')
The first element of bounds is always 0.0, and the resulting index i will point to the upper bound. By truncating off the first element, we shift the result to the lower bound, and ignore the zero.
The solution is found by solving for the location of x in the selected bin:
y = c[i] - w[i] * x
So you have:
x = (c[i] - y) / w[i]
You can write a function:
def dm(a, y, duplicates=False):
if duplicates:
s, t = np.unique(a, return_counts=True)
s, t = s[::-1], t[::-1]
w = np.cumsum(t)
c = np.cumsum(s * t)
i = np.searchsorted((c - w * s)[1:], y, 'right')
x = (c[i] - y) / w[i]
else:
s = np.sort(a)[::-1]
c = np.cumsum(s)
i = np.searchsorted((c - s)[1:], y, 'right')
x = (c[i] - y) / (i + 1)
return x
This does not handle the case where y < 0, but it does allow you to enter many y values simultaneously, since searchsorted is pretty well vectorized.
Here is a usage sample:
>>> dm(a, 0.5, True)
Out[247]: 0.3333333333333333
>>> dm(a, 0.6, True)
0.3
>>> dm(a, [0.1, 0.2, 0.3, 0.4, 0.5], True)
array([0.5 , 0.45 , 0.4 , 0.36666667, 0.33333333])
As for whether this algorithm has a name: I am not aware of any. Since I wrote this, I feel that "discrete madness" is an appropriate name. Slips off the tongue nicely too: "Ah yes, I computed the threshold using discrete madness".

This is an answer to the original question, where we find the maximum x s.t. np.sum(np_array[np_array > x]) >= y:
You can accomplish this with sorting and cumulative sum:
s = np.sort(np_array)[::-1]
c = np.cumsum(s)
i = np.argmax(c > y)
result = s[i]
s is the candidates for x in descending order. Comparing the cumulative sum c to y tells you exactly where the sum will exceed y. np.argmax returns the index of the first place that happens. The result is that index extracted from s.
This computation in numpy is slower than it needs to be because we can short circuit the sum immediately without computing a separate mask. The complexity is the same, however. You could speed up the following with numba or cython:
s = np.sort(np_array)[::-1]
c = 0
for i in range(len(s)):
c += s[i]
if c > y:
break
result = s[i]

How to add new variables for an xarray dataset using groupby and apply?

I am facing serious difficulties in understanding how the xarray.groupby really works. I am trying to apply a given function "f" over each group of a xarray DatasetGroupBy collection, such that "f" should add new variables to each of the applied groups of the original xr.DataSet.
Here is a Brief Introduction:
My problem is commonly found in geoscience, remote sensing, etc.
The aim is to apply a given function over an Array, pixel by pixel (or gridcell by gridcell).
Example
Let's assume that I want to evaluate the wind speed components (u,v) of a wind-field for a given region in respect to a new direction. Therefore, I whish to evaluate rotated version of the 'u' and 'v components, namely: u_rotated and v_rotated.
Let's assume that this new direction is 30° rotated anti-clockwise in respect to each pixel position in the wind-field. So the new wind components would be (u_30_degrees and v_30_degrees).
My first attempt was to stack each of the x and y coordinates (or longitudes and latitudes) into a new dimension called pixel, and later groupby by this new dimension ("pixel") and apply a function which would do the vector-wind rotation.
Here is a snippet of my initial attempt:
# First, let's create some functions for vector rotation:
def rotate_2D_vector_per_given_degrees(array2D, angle=30):
'''
Parameters
----------
array2D : 1D length 2 numpy array
angle : float angle in degrees (optional)
DESCRIPTION. The default is 30.
Returns
-------
Rotated_2D_Vector : 1D of length 2 numpy array
'''
R = get_rotation_matrix(rotation = angle)
Rotated_2D_Vector = np.dot(R, array2D)
return Rotated_2D_Vector
def get_rotation_matrix(rotation=90):
'''
Description:
This function creates a rotation matrix given a defined rotation angle (in degrees)
Parameters:
rotation: in degrees
Returns:
rotation matrix
'''
theta = np.radians(rotation) # degrees
c, s = np.cos(theta), np.sin(theta)
R = np.array(((c, -s), (s, c)))
return R
# Then let's create a reproducible dataset for analysis:
u_wind = xr.DataArray(np.ones( shape=(20, 30)),
dims=('x', 'y'),
coords={'x': np.arange(0, 20),
'y': np.arange(0, 30)},
name='u')
v_wind = xr.DataArray(np.ones( shape=(20, 30))*0.3,
dims=('x', 'y'),
coords={'x': np.arange(0, 20),
'y': np.arange(0, 30)},
name='v')
data = xr.merge([u_wind, v_wind])
# Let's create the given function that will be applied per each group in the dataset:
def rotate_wind(array, degrees=30):
# This next line, I create a 1-dimension vector of length 2,
# with wind speed of the u and v components, respectively.
# The best solution I found has been conver the dataset into a single xr.DataArray
# by stacking the 'u' and 'v' components into a single variable named 'wind'.
vector = array.to_array(dim='wind').values
# Now, I rotate the wind vector given a rotation angle in degrees
Rotated = rotate_2D_vector_per_given_degrees(vector, degrees)
# Ensuring numerical division problems as 1e-17 == 0.
Rotated = np.where( np.abs(Rotated - 6.123234e-15) < 1e-15, 0, Rotated)
# sanity check for each point position:
print('Coords: ', array['point'].values,
'Wind Speed: ', vector,
'Response :', Rotated,
end='\n\n'+'-'*20+'\n')
components = [a for a in data.variables if a not in data.dims]
for dim, value in zip(components, Rotated):
array['{0}_rotated_{1}'.format(dim, degrees)] = value
return array
# Finally, lets stack our dataset per grid-point, groupby this new dimension, and apply the desired function:
stacked = data.stack(point = ['x', 'y'])
stacked = stacked.groupby('point').apply(rotate_wind)
# lets unstack the data to recover the original dataset:
data = stacked.unstack('point')
# Let's check if the function worked correctly
data.to_dataframe().head(30)
Though the above example is apparently working, I am still unsure if its results are correct, or even if the groupby-apply function implementation is efficient (clean, non-redundant, fast, etc.).
Any insights are most welcome!
Sincerely,

You can merely multiply the whole array by the rotation matrice, something like np.dot(R, da).
So, if you have the following Dataset:
>>> dims = ("x", "y")
>>> sizes = (20, 30)
>>> ds = xr.Dataset(
data_vars=dict(u=(dims, np.ones(sizes)), v=(dims, np.ones(sizes) * 0.3)),
coords={d: np.arange(s) for d, s in zip(dims, sizes)},
)
>>> ds
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 4 ... 16 17 18 19
* y (y) int64 0 1 2 3 4 ... 26 27 28 29
Data variables:
u (x, y) float64 1.0 1.0 ... 1.0 1.0
v (x, y) float64 0.3 0.3 ... 0.3 0.3
Converted, like you did, to the following DataArray:
>>> da = ds.stack(point=["x", "y"]).to_array(dim="wind")
>>> da
<xarray.DataArray (wind: 2, point: 600)>
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.3, 0.3, 0.3, ..., 0.3, 0.3, 0.3]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U1 'u' 'v'
Then, you have your rotated values thanks to np.dot(R, da):
>>> np.dot(R, da).shape
(2, 600)
>>> type(np.dot(R, da))
<class 'numpy.ndarray'>
But it is a numpy ndarray. So if you want to go back to a xarray DataArray, you can use a trick like that (other solutions may exist):
def rotate(da, dim, angle):
# Put dim first
dims_orig = da.dims
da = da.transpose(dim, ...)
# Rotate
R = rotation_matrix(angle)
rotated = da.copy(data=np.dot(R, da), deep=True)
# Rename values of "dim" coord according to rotation
rotated[dim] = [f"{orig}_rotated_{angle}" for orig in da[dim].values]
# Transpose back to orig
return rotated.transpose(*dims_orig)
And use it like:
>>> da_rotated = rotate(da, dim="wind", angle=30)
>>> da_rotated
<xarray.DataArray (wind: 2, point: 600)>
array([[0.7160254 , 0.7160254 , 0.7160254 , ..., 0.7160254 , 0.7160254 ,
0.7160254 ],
[0.75980762, 0.75980762, 0.75980762, ..., 0.75980762, 0.75980762,
0.75980762]])
Coordinates:
* point (point) MultiIndex
- x (point) int64 0 0 0 0 ... 19 19 19 19
- y (point) int64 0 1 2 3 ... 26 27 28 29
* wind (wind) <U12 'u_rotated_30' 'v_rotated_30'
Eventually, you can go back to the original Dataset structure like that:
>>> ds_rotated = da_rotated.to_dataset(dim="wind").unstack(dim="point")
>>> ds_rotated
<xarray.Dataset>
Dimensions: (x: 20, y: 30)
Coordinates:
* x (x) int64 0 1 2 3 ... 17 18 19
* y (y) int64 0 1 2 3 ... 27 28 29
Data variables:
u_rotated_30 (x, y) float64 0.716 ... 0.716
v_rotated_30 (x, y) float64 0.7598 ... 0.7598

How can I bin a Pandas Series setting the bin size to a preset value of max/min for each bin

I have a pd.Series of floats and I would like to bin it into n bins where the bin size for each bin is set so that max/min is a preset value (e.g. 1.20)?
The requirement means that the size of the bins is not constant. For example:
data = pd.Series(np.arange(1, 11.0))
print(data)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
dtype: float64
I would like the bin sizes to be:
1.00 <= bin 1 < 1.20
1.20 <= bin 2 < 1.20 x 1.20 = 1.44
1.44 <= bin 3 < 1.44 x 1.20 = 1.73
...
etc
Thanks

Here's one with pd.cut, where the bins can be computed taking the np.cumprod of an array filled with 1.2:
data = pd.Series(list(range(11)))
import numpy as np
n = 20 # set accordingly
bins= np.r_[0,np.cumprod(np.full(n, 1.2))]
# array([ 0. , 1.2 , 1.44 , 1.728 ...
pd.cut(data, bins)
0 NaN
1 (0.0, 1.2]
2 (1.728, 2.074]
3 (2.986, 3.583]
4 (3.583, 4.3]
5 (4.3, 5.16]
6 (5.16, 6.192]
7 (6.192, 7.43]
8 (7.43, 8.916]
9 (8.916, 10.699]
10 (8.916, 10.699]
dtype: category
Where bins in this case goes up to:
np.r_[0,np.cumprod(np.full(20, 1.2))]
array([ 0. , 1.2 , 1.44 , 1.728 , 2.0736 ,
2.48832 , 2.985984 , 3.5831808 , 4.29981696, 5.15978035,
6.19173642, 7.43008371, 8.91610045, 10.69932054, 12.83918465,
15.40702157, 18.48842589, 22.18611107, 26.62333328, 31.94799994,
38.33759992])
So you'll have to set that according to the range of values of the actual data

This is I believe the best way to do it because you are considering the max and min values from your array. Therefore you won't need to worry about what values are you using, only the multiplier or step_size for your bins (of course you'd need to add a column name or some additional information if you will be working with a DataFrame):
data = pd.Series(np.arange(1, 11.0))
bins = []
i = min(data)
while i < max(data):
bins.append(i)
i = i*1.2
bins.append(i)
bins = list(set(bins))
bins.sort()
df = pd.cut(data,bins,include_lowest=True)
print(df)
Output:
0 (0.999, 1.2]
1 (1.728, 2.074]
2 (2.986, 3.583]
3 (3.583, 4.3]
4 (4.3, 5.16]
5 (5.16, 6.192]
6 (6.192, 7.43]
7 (7.43, 8.916]
8 (8.916, 10.699]
9 (8.916, 10.699]
Bins output:
Categories (13, interval[float64]): [(0.999, 1.2] < (1.2, 1.44] < (1.44, 1.728] < (1.728, 2.074] < ... <
(5.16, 6.192] < (6.192, 7.43] < (7.43, 8.916] <
(8.916, 10.699]]

Thanks everyone for all the suggestions. None does quite what I was after (probably because my original question wasn't clear enough) but they really helped me figure out what to do so I have decided to post my own answer (I hope this is what I am supposed to do as I am relatively new at being an active member of stackoverflow...)
I liked #yatu's vectorised suggestion best because it will scale better with large data sets but I am after the means to not only automatically calculate the bins but also figure out the minimum number of bins needed to cover the data set.
This is my proposed algorithm:
The bin size is defined so that bin_max_i/bin_min_i is constant:
bin_max_i / bin_min_i = bin_ratio
Figure out the number of bins for the required bin size (bin_ratio):
data_ratio = data_max / data_min
n_bins = math.ceil( math.log(data_ratio) / math.log(bin_ratio) )
Set the lower boundary for the smallest bin so that the smallest data point fits in it:
bin_min_0 = data_min
Create n non-overlapping bins meeting the conditions:
bin_min_i+1 = bin_max_i
bin_max_i+1 = bin_min_i+1 * bin_ratio
Stop creating further bins once all dataset can be split between the bins already created. In other words, stop once:
bin_max_last > data_max
Here is a code snippet:
import math
import pandas as pd
bin_ratio = 1.20
data = pd.Series(np.arange(2,12))
data_ratio = max(data) / min(data)
n_bins = math.ceil( math.log(data_ratio) / math.log(bin_ratio) )
n_bins = n_bins + 1 # bin ranges are defined as [min, max)
bins = np.full(n_bins, bin_ratio) # initialise the ratios for the bins limits
bins[0] = bin_min_0 # initialise the lower limit for the 1st bin
bins = np.cumprod(bins) # generate bins
print(bins)
[ 2. 2.4 2.88 3.456 4.1472 4.97664
5.971968 7.1663616 8.59963392 10.3195607 12.38347284]
I am now set to build a histogram of the data:
data.hist(bins=bins)

Update function based on categorical values python

MatchId ExpectedGoals_Team1 ExpectedGoals_Team2 Timestamp Stages Home Away
0 698085 0.8585339288573895 1.4819072820614578 2016-08-13 11:30:00 0 [92, 112] [94]
1 698086 1.097064295289673 1.0923520385902274 2016-09-12 14:00:00 0 [] [164]
2 698087 1.2752442136224664 0.8687263006179976 2016-11-25 14:00:00 1 [90] [147]
3 698088 1.0571269856980154 1.4323522262211752 2016-02-16 14:00:00 2 [10, 66, 101] [50, 118]
4 698089 1.2680212913301165 0.918961072480616 2016-05-10 14:00:00 2 [21] [134, 167]
Here is the function that needs to be updating the outcomes based on the categorized column 'Stages'.
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196
m=1
def squared_diff(row):
ssd = []
Home = row.Home
Away = row.Away
y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
for k in range(total_timeslot):
if k in Home:
ssd.append(sum((x2 - y) ** 2))
elif k in Away:
ssd.append(sum((x3 - y) ** 2))
else:
ssd.append(sum((x1 - y) ** 2))
return sum(ssd)
sum(df.apply(squared_diff, axis=1))
For m=1, Out[400]: 7636.305551658377
By assigning an arbitrary value of m for each category in Stages I want to test a cost function. Let m1 = 2, m2 = 3.
Here is how I attempted.
def stages(row):
Stages = row.Stages
if Stages == 0:
return np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
elif Stages == 1:
return np.array([1 - (row.ExpectedGoals_Team1*m1 + row.ExpectedGoals_Team2*m1), row.ExpectedGoals_Team1*m1, row.ExpectedGoals_Team2*m1])
else:
return np.array([1 - (row.ExpectedGoals_Team1*m2 + row.ExpectedGoals_Team2*m2), row.ExpectedGoals_Team1*m2, row.ExpectedGoals_Team2*m2])
df.apply(squared_diff, Stages, axis=1)
TypeError: apply() got multiple values for argument 'axis'

df.apply(squared_diff, Stages, axis=1) got error because the second parameter is for axis so it thought axis=Stages, but then the third parameter is again axis=1.
To address the problem, you can first store desired m into a separate column
df['m'] = df.Stages.apply(lambda x: 1 if x == 0 else 2 if x == 1 else 3)
Then replace this line in your squared_diff function
y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
with
y = np.array([1 - (row.ExpectedGoals_Team1*row.m + row.ExpectedGoals_Team2*row.m), row.ExpectedGoals_Team1*row.m, row.ExpectedGoals_Team2*row.m])

Fitting an exponential decay using a convolution integral -

I'm fitting the following data where t: time (s), G: counts, f: impulse function:
t G f
-7200 4.7 0
-6300 5.17 0
-5400 4.93 0
-4500 4.38 0
-3600 4.47 0
-2700 4.4 0
-1800 3.36 0
-900 3.68 0
0 4.58 0
900 11.73 11
1800 18.23 8.25
2700 19.33 3
3600 19.04 0.5
4500 17.21 0
5400 12.98 0
6300 11.59 0
7200 9.26 0
8100 7.66 0
9000 6.59 0
9900 5.68 0
10800 5.1 0
Using the following convolution integral:
And more specifically:
Where: lambda_1 = 0.000431062 and lambda_2 = 0.000580525.
The code used to perform that fitting is:
#Extract data into numpy arrays
t=df['t'].as_matrix()
g=df['G'].as_matrix()
f=df['f'].as_matrix()
#Definition of the function
def convol(x,A,B,C):
dx=x[1]-x[0]
return A*np.convolve(f, np.exp(-lambda_1*x))[:len(x)]*dx+B*np.convolve(f, np.exp(-lambda_2*x))[:len(x)]*dx+C
#Determination of fit parameters A,B,C
popt, pcov = curve_fit(convol, t, g)
A,B,C= popt
perr = np.sqrt(np.diag(pcov))
#Plot fit
fit = convol(t,A,B,C)
plt.plot(t, fit)
plt.scatter(t, g,s=50, color='black')
plt.show()
The problem is that my fit parameters, A, and B are too low and have no physical meaning. I think my problem is related to the step width dx. It should tend to 0 in order to approximate my sum (np.convolve() corresponds a discrete sum of the convolution product) into an integral.

While this is not an answer, I cannot format code in a comment and so post it here. This code shows how to add bounds to curve_fit. Note that if parameter values are returned at or extremely near the bounds there is likely some other problem.
#Determination of fit parameters A,B,C
lowerBounds = [0.0, 0.0, 0.0] # A, B, C lower bounds
upperBounds = [10.0, 10.0, 10.0] # A, B, C upper bounds
popt, pcov = curve_fit(convol, t, g, bounds=[lowerBounds, upperBounds])

I think the problem is that the convolution calculation is incorrect.
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
t = np.array([ -7200, -6300, -5400, -4500, -3600, -2700, -1800, -900, 0, 900, 1800, 2700, 3600, 4500, 5400, 6300, 7200, 8100, 9000, 9900, 10800])
g = np.array([ 4.7, 5.17, 4.93, 4.38, 4.47, 4.4, 3.36, 3.68, 4.58, 11.73, 18.23, 19.33, 19.04, 17.21, 12.98, 11.59, 9.26, 7.66, 6.59, 5.68, 5.1])
f = np.array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 8.25, 3, 0.5, 0, 0, 0, 0, 0, 0, 0, 0])
lambda_1 = 0.000431062
lambda_2 = 0.000580525
delta_t = 900
# Define the exponential parts of the integrals
x_1 = np.exp(-lambda_1 * t)
x_2 = np.exp(-lambda_2 * t)
# Define the convolution for a given 't' (in this case, using the index of 't')
def convolution(n, x):
return np.dot(f[:n], x[:n][::-1])
# The integrals do not vary as part of the optimization, so calculate them now
integral_1 = delta_t * np.array([convolution(i, x_1) for i in range(len(t))])
integral_2 = delta_t * np.array([convolution(i, x_2) for i in range(len(t))])
#Definition of the function
def convol(n,A,B,C):
return A * integral_1[n] + B * integral_2[n] + C
#Determination of fit parameters A,B,C
popt, pcov = scipy.optimize.curve_fit(convol, range(len(t)), g)
A,B,C= popt
perr = np.sqrt(np.diag(pcov))
# Print out the coefficients determined by the optimization
print(A, B, C)
#Plot fit
fit = convol(range(len(t)),A,B,C)
plt.plot(t, fit)
plt.scatter(t, g,s=50, color='black')
plt.show()
The values that I get for the coefficients are:
A = 7.9742184468342304e-05
B = -1.0441976351760864e-05
C = 5.1089841502260178
I don't know if the negative value for B is reasonable or not so I have left it as-is. If you want coefficients that are positive, you can constrain them as shown by James.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

linear interpolation between two data points - python

is that what you want? In [145]: s = pd.Series([5, np.nan, 17], index=[95, 100, 102.5]) In [146]: s Out[146]: 95.0 5.0 100.0 NaN 102.5 17.0 dtype: float64 In [147]: s.interpolate(method='index') Out[147]: 95.0 5.0 100.0 13.0 102.5 17.0 dtype: float64

You can use numpy.interp function to interpolate a value import numpy as np import matplotlib.pyplot as plt x = [95, 102.5] y = [5, 17] x_new = 100 y_new = np.interp(x_new, x, y) print(y_new) # 13.0 plt.plot(x, y, "og-", x_new, y_new, "or");

Related

numpy sorting and removing top values

How to add new variables for an xarray dataset using groupby and apply?

How can I bin a Pandas Series setting the bin size to a preset value of max/min for each bin

Update function based on categorical values python

Fitting an exponential decay using a convolution integral -

Categories

Resources