Hi I have two numpy arrays (in this case representing depth and percentage depth dose data) as follows:
depth = np.array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ,
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.2,
2.4, 2.6, 2.8, 3. , 3.5, 4. , 4.5, 5. , 5.5])
pdd = np.array([ 80.40649399, 80.35692155, 81.94323956, 83.78981286,
85.58681373, 87.47056637, 89.39149833, 91.33721651,
93.35729334, 95.25343909, 97.06283306, 98.53761309,
99.56624117, 100. , 99.62820672, 98.47564754,
96.33163961, 93.12182427, 89.0940637 , 83.82699219,
77.75436857, 63.15528566, 46.62287768, 29.9665386 ,
16.11104226, 6.92774817, 0.69401413, 0.58247614,
0.55768992, 0.53290371, 0.5205106 ])
which when plotted give the following curve:
I need to find the depth at which the pdd falls to a given value (initially 50%). I have tried slicing the arrays at the point where the pdd reaches 100% as I'm only interested in the points after this.
Unfortunately np.interp only appears to work where both the x and y values are increasing.
Could anyone suggest where I should go next?
If I understand you correctly, you want to interpolate the function depth = f(pdd) at pdd = 50.0. For the purposes of the interpolation, it might help for you to think of pdd as corresponding to your "x" values, and depth as corresponding to your "y" values.
You can use np.argsort to sort your "x" and "y" by ascending order of "x" (i.e. ascending pdd), then use np.interp as usual:
import numpy as np
import matplotlib.pyplot as plt

# `idx` is an array of integer indices that sorts `pdd` in ascending order
idx = np.argsort(pdd)
depth_itp = np.interp([50.0], pdd[idx], depth[idx])

plt.plot(depth, pdd)
plt.plot(depth_itp, 50, 'xr', ms=20, mew=2)
plt.show()
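One caveat worth noting: since the pdd curve rises and then falls, sorting the full arrays mixes points from the build-up and fall-off regions. As you mention, slicing at the 100% peak first keeps the interpolation on the descending branch only. A sketch, using depth and pdd from the question:

import numpy as np

peak = np.argmax(pdd)           # index of the 100% point
tail_pdd = pdd[peak:]           # descending branch only
tail_depth = depth[peak:]
idx = np.argsort(tail_pdd)      # ascending pdd, as np.interp requires
depth_50 = np.interp(50.0, tail_pdd[idx], tail_depth[idx])
print(depth_50)                 # ~2.359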
This isn't really a programming solution, but it's how you can find the depth. I'm taking the liberty of renaming your variables, so x(i) = depth(i) and y(i) = pdd(i).
In a given interval [x(i),x(i+1)], your linear interpolant is
p_1(X) = y(i) + (X - x(i))*(y(i+1) - y(i))/(x(i+1) - x(i))
You want to find X such that p_1(X) = 50. First find i such that y(i) > 50 and y(i+1) < 50 (on the descending part of the curve); then the above equation can be rearranged to give
X = x(i) + (50 - y(i))*((x(i+1) - x(i))/(y(i+1) - y(i)))
For your data (with MATLAB; sorry, no Python code) I make it approximately 2.359. This can then be verified with np.interp(X, depth, pdd), which should return approximately 50.
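For completeness, here is the same hand calculation as a short Python sketch (assuming, as in the question's data, that the curve is strictly decreasing after its peak):

import numpy as np

# Restrict to the descending branch after the 100% peak
peak = np.argmax(pdd)
x, y = depth[peak:], pdd[peak:]

# Find i such that y[i] > 50 >= y[i+1]; y is decreasing, so search on -y
i = np.searchsorted(-y, -50.0) - 1

# The linear interpolant p_1 rearranged to solve p_1(X) = 50 for X
X = x[i] + (50.0 - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
print(X)  # approximately 2.359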
There are several methods to carry out interpolation. In your case, you are basically looking for the depth at 50%, which is not available in your data. The simplest approach is linear interpolation. I'm using the Numerical Recipes library in C++ to obtain the interpolated value via several techniques:
Linear Interpolation: see page 117
interpolated value depth(50%): 2.35915
Polynomial Interpolation: see page 117
interpolated value depth(50%): 2.36017
Cubic Spline Interpolation: see page 120
interpolated value depth(50%): 2.19401
Rational Function Interpolation: see page 124
interpolated value depth(50%): 2.35986
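If you want to stay in Python rather than C++, numpy/scipy offer equivalents for the linear and cubic-spline cases. A sketch, using depth and pdd from the question (the spline figure may differ slightly from the Numerical Recipes one depending on boundary conditions and which points are included):

import numpy as np
from scipy.interpolate import CubicSpline

# Interpolate depth = f(pdd) on the descending branch, reversed so that
# pdd is ascending, as np.interp and CubicSpline require
peak = np.argmax(pdd)
p, d = pdd[peak:][::-1], depth[peak:][::-1]

print(np.interp(50.0, p, d))    # linear interpolation, ~2.359
print(CubicSpline(p, d)(50.0))  # cubic spline interpolation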
How do you use the new Polynomials sub-package in numpy to give it new x values and get an output of y values?
https://numpy.org/doc/stable/reference/routines.polynomials.package.html
In prior versions of numpy it went something like this:
poly = np.poly1d(np.polyfit(x, y, 3))
new_x = np.linspace(0, 100)
new_y = poly(new_x)
With the new version I am struggling to work out how to give it x values that return the corresponding y values. I tried:
from numpy.polynomial import Polynomial
poly = Polynomial(Polynomial.fit(x, y, 3))
When I give it an array of x it just returns the coefficients.
You can directly call the resulting series to evaluate it:
from numpy.polynomial import Polynomial
poly = Polynomial.fit(x, y, 3)
new_y = poly(new_x)
Check this page of the documentation; it has several examples.
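A minimal end-to-end sketch (the quadratic data below is made up purely for illustration):

import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0, 100, 20)
y = 0.01 * x**2 + 2 * x + 3      # made-up data for illustration
poly = Polynomial.fit(x, y, 3)   # fit once; no extra wrapping needed
new_x = np.linspace(0, 100)
new_y = poly(new_x)              # evaluate by calling the series directly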
Unfortunately, the answer by @Joan Charmant and the supportive comment by @rh109019 do not work.
The intuitive way suggested by @Joan Charmant is, basically, what the question is about: it doesn't work.
Evidently, there is a new method introduced in numpy.polynomial.polynomial devoted specifically to evaluating polynomials: polyval. See here.
Here's my code where I'm comparing the two approaches.
import numpy as np
Pgauge = np.asarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
NIST = np.asarray([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1])
calibrationCurve = np.polynomial.polynomial.Polynomial.fit(Pgauge,
NIST,
deg=1
)
print("The polynomial: {}".format(calibrationCurve))
x = np.asarray([0, 1]) # values of x to evaluate the polynomial at
c = calibrationCurve.coef # coefficients of the polynomial
print("The intuitive (wrong) way: {}".format(calibrationCurve(x)))
print("The correct way: {}".format(np.polynomial.polynomial.polyval(x, c)))
The first print command prints out the polynomial: 4.6 + 3.5x.
If we want to evaluate it at the points 0 and 1 (x = np.asarray([0, 1])), we expect to get 4.6 and 8.1 respectively.
The second print command (the one that reads "The intuitive (wrong) way") uses the method suggested by @Joan Charmant. It gives [0.1, 1.1] as the result, which is wrong. Seemingly it looks OK: it gives two numbers, as expected. But the numbers themselves are wrong. I don't know how these numbers were calculated. And if I had a bigger series of data, I wouldn't go through it with a calculator and simply assume I'd got a correct result.
The last print command makes use of the polyval method suggested in the user manual that I cited above. And it works perfectly well. It gives [4.6, 8.1] as the result.
It so happens that my answer is wrong as well (see all the comments below by @user2357112 supports Monica).
But still, I'll leave it here for the folks who, like me, fell victim to the confusing new numpy.polynomial library.
FIRST: why is my code wrong?
Everything is OK with it. But the line print("The polynomial: {}".format(calibrationCurve)) doesn't give me what, I thought, it must give me. It takes the correct polynomial and prints it with different coefficients: fit() stores the series mapped onto its default window [-1, 1], so printing shows the scaled coefficients. Still, the object does store the correct polynomial in its memory, and the thing suggested by @Joan Charmant gives the correct answer if you ask for it properly.
SECOND: how to use the new numpy.polynomial library in order to get a correct result?
Due to that peculiarity, you have to introduce one more line of code: do the Polynomial.fit() and immediately afterwards call the .convert() method. Then work with the converted polynomial only.
Here's my code that works correctly now.
import numpy as np
Pgauge = np.asarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
NIST = np.asarray([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1])
calibrationCurveMessedUp = np.polynomial.polynomial.Polynomial.fit(Pgauge,
NIST,
deg=1
)
calibrationCurve = calibrationCurveMessedUp.convert()
print("The polynomial: {}".format(calibrationCurve))
print("The rounded polynomial coefficients: {}".format(calibrationCurve.coef))
x = np.asarray([0, 1]) # values of x to evaluate the polynomial at
print(calibrationCurve(x))
THIRD: a little note.
Apparently, there is a way to get the correct polynomial without the additional line of code. Probably you have to give suitable window and domain parameters to the Polynomial.fit() function. Or maybe there is another way.
If anybody knows such a way, you're welcome to edit my current answer and add your code.
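For instance, one such way (my own sketch, not verified against every numpy version): pass a window equal to the data's domain to Polynomial.fit(), so that no internal rescaling happens and the stored coefficients are already in terms of x.

import numpy as np
from numpy.polynomial import Polynomial

Pgauge = np.asarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
NIST = np.asarray([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1])

# Setting window equal to domain makes the internal mapping the identity,
# so no .convert() step should be needed (an assumption worth testing)
win = (Pgauge.min(), Pgauge.max())
calibrationCurve = Polynomial.fit(Pgauge, NIST, deg=1, domain=win, window=win)

print(calibrationCurve.coef)     # ~[0.1, 1.0], i.e. 0.1 + 1.0*x
print(calibrationCurve([0, 1]))  # ~[0.1, 1.1]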
I need to generate numbers in a logarithmic space from .5 to 1.
This code accomplishes that:
IN: np.geomspace(.5, 1, num=10)
OUT: [0.5, 0.540029869446153, 0.5832645197880583, 0.6299605249474366, 0.6803950000871885, 0.7348672461377994, 0.7937005259840997, 0.8572439828530728, 0.9258747122872905, 1.0]
However, the smaller increments occur closer to .5. I'd like them to occur closer to 1 (hence "backwards"; I'm just not entirely sure what the right term would be).
I've tried np.geomspace(1, .5, num=10) but it just gives me the same output in reverse order.
IIUC you can do:
import numpy as np
1.5 - np.geomspace(1, .5, num=10)
array([0.5 , 0.57412529, 0.64275602, 0.70629947, 0.76513275,
0.819605 , 0.87003948, 0.91673548, 0.95997013, 1. ])
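More generally, the same trick works for any positive interval: reflect the reversed sequence about the sum of the endpoints. A small sketch (the helper name is mine):

import numpy as np

def geomspace_dense_at_end(a, b, num):
    """Log-spaced points on [a, b] with the small increments near b.

    Assumes a and b are positive, as np.geomspace requires.
    """
    return a + b - np.geomspace(b, a, num=num)

print(geomspace_dense_at_end(0.5, 1.0, 10))
# [0.5 0.57412529 ... 0.95997013 1.]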
I have various dataframes, each has a different depth range.
For a more complex computation (this is a fragment of a question posted here: Curve fitting for each column in Pandas + extrapolate values),
I need to write a function that expands the depth column/array in equal increments dz (in this case 0.5) towards zero (the surface).
Here the missing depths are 0.15, 0.65 and 1.15:
import numpy as np
depth = np.array([1.65, 2.15, 2.65, 3.15, 3.65, 4.15, 4.65, 5.15, 5.65, 6.15, 6.65, 7.15, 7.65, 8.15, 8.65])
Any ideas how to write a function so it does it each time for a different depth range ( i.e. depending on the varying minimum value)?
A very simple solution I came up with:
depth_min = np.min(depth)
step = 0.5
missing_vals = np.arange(depth_min - step, 0, -step)[::-1]
depth_tot = np.concatenate((missing_vals, depth), axis=0)
I'm sure there are better ways, though.
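Wrapped as a function, as the question asks (a sketch; it assumes a uniform step and positive depths measured down from the surface):

import numpy as np

def expand_to_surface(depth, step=0.5):
    """Prepend equally spaced values between 0 and depth.min()."""
    missing_vals = np.arange(depth.min() - step, 0, -step)[::-1]
    return np.concatenate((missing_vals, depth))

depth = np.array([1.65, 2.15, 2.65, 3.15])
print(expand_to_surface(depth))  # [0.15 0.65 1.15 1.65 2.15 2.65 3.15]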
I have two arrays: one with 30 years of observations, and one with 30 years of historical model runs. I want to calculate the standard deviation between observations and model results, to see how much the model deviates from observations. How do I go about doing this?
Edit
Here are the two arrays (each number represents a year, 1971-2000):
obs = [2790.90283203, 2871.02514648, 2641.31738281, 2721.64453125,
       2554.19384766, 2773.7746582, 2500.95825195, 3238.41186523,
       2571.62133789, 2421.93017578, 2615.80395508, 2271.70654297,
       2703.82275391, 3062.25366211, 2656.18359375, 2593.62231445,
       2547.87182617, 2846.01245117, 2530.37573242, 2535.79931641,
       2237.58032227, 2890.19067383, 2406.27587891, 2294.24975586,
       2510.43847656, 2395.32055664, 2378.36157227, 2361.31689453,
       2410.75, 2593.62915039]
model = [2976.01928711, 3353.92114258, 3000.92700195, 3116.5078125,
         2935.31787109, 2799.75805664, 3328.06225586, 3344.66333008,
         3318.31689453, 3348.85302734, 3578.70800781, 2791.78198242,
         4187.99902344, 3610.77124023, 2991.984375, 3112.97412109,
         4223.96826172, 3590.92724609, 3284.6015625, 3846.34936523,
         3955.84350586, 3034.26074219, 3574.46362305, 3674.80175781,
         3047.98144531, 3209.56616211, 2654.86547852, 2780.55053711,
         3117.91699219, 2737.67626953]
You want to compare two signals, e.g. A and B in the following example:
import numpy as np
A = np.random.rand(5)
B = np.random.rand(5)
print("A:", A)
print("B:", B)
Output:
A: [ 0.66926369 0.63547359 0.5294013 0.65333154 0.63912645]
B: [ 0.17207719 0.26638423 0.55176735 0.05251388 0.90012135]
Analyzing individual signals
The standard deviation of each single signal is not what you need:
print "standard deviation of A:", np.std(A)
print "standard deviation of B:", np.std(B)
Output:
standard deviation of A: 0.0494162021651
standard deviation of B: 0.304319034639
Analyzing the difference
Instead you might compute the difference and apply some common measure like the sum of absolute differences (SAD), the sum of squared differences (SSD) or the correlation coefficient:
print "difference:", A - B
print "SAD:", np.sum(np.abs(A - B))
print "SSD:", np.sum(np.square(A - B))
print "correlation:", np.corrcoef(np.array((A, B)))[0, 1]
Output:
difference: [ 0.4971865 0.36908937 -0.02236605 0.60081766 -0.2609949 ]
SAD: 1.75045448355
SSD: 0.813021824351
correlation: -0.38247081
Use numpy.
import numpy as np
data = [1.2, 2.3, 1.3, 1.2, 5.4]
np.std(data)
Or you could try this:
import numpy as np
obs = np.array([1.2, 2.3, 1.3, 1.2, 5.4])
model = np.array([1.1, 2.4, 1.2, 1.2, 5.3])
np.std(obs-model)
The standard deviation at the same index across multiple lists (e.g. comparing model vs. measurement, multiple measurement series, etc.), such as
import numpy as np
obs = np.array([0,1,2,3,4])
model = np.array([2,4,6,8,10])
can be calculated by stacking the data into one array:
arr = np.vstack((obs,model))
Now the standard deviation is calculated using np.std() with a specific axis
std = np.std(arr,axis=0)
Alternative one line solution:
std = np.std((model,obs),axis=0)
Output:
[1.0, 1.5, 2.0, 2.5, 3.0]
If you're doing anything more complicated than just finding the standard deviation and/or mean, use numpy/scipy. If that's all you need to do, use the statistics package from the Python Standard Library.
>>> import statistics
>>> statistics.stdev([1, 2, 3])
1.0
It was added in Python 3.4 (see PEP-450) as a lightweight alternative to Numpy for basic stats equations.
I want to develop some python code to align datasets obtained by different instruments recording the same event.
As an example, say I have two sets of measurements:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Define some data
data1 = pd.DataFrame({'TIME':[1.1, 2.4, 3.2, 4.1, 5.3],\
'VALUE':[10.3, 10.5, 11.0, 10.9, 10.7],\
'ERROR':[0.2, 0.1, 0.4, 0.3, 0.2]})
data2 = pd.DataFrame({'TIME':[0.9, 2.1, 2.9, 4.2],\
'VALUE':[18.4, 18.7, 18.9, 18.8],\
'ERROR':[0.3, 0.2, 0.5, 0.4]})
# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE, yerr=data2.ERROR, fmt='bo')
plt.show()
The result is plotted here:
What I would like to do now is to align the second dataset (data2) to the first one (data1). i.e. to get this:
The second dataset must be shifted to match the first one by subtracting a constant (to be determined) from all its values. All I know is that the datasets are correlated since the two instruments are measuring the same event but with different sampling rates.
At this stage I do not want to make any assumptions about what function best describes the data (fitting will be done after alignment).
I am cautious about using means to perform shifts since it may produce bad results, depending on how the data is sampled. I was considering taking each data2[TIME_i] and working out the shortest distance to data1[~TIME_i]. Then minimizing the sum of those. But I am not sure that would work well either.
Does anyone have any suggestions on a good method to use? I looked at mlpy but it seems to only work on 1D arrays.
Thanks.
You can subtract the mean of the difference: data2.VALUE - (data2.VALUE - data1.VALUE).mean(). (Pandas aligns the two series on their index here, so the unmatched fifth element of data1 drops out of the mean.)
import pandas as pd
import matplotlib.pyplot as plt
# Define some data
data1 = pd.DataFrame({
'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2],
})
data2 = pd.DataFrame({
'TIME': [0.9, 2.1, 2.9, 4.2],
'VALUE': [18.4, 18.7, 18.9, 18.8],
'ERROR': [0.3, 0.2, 0.5, 0.4],
})
# Plot the data
plt.errorbar(data1.TIME, data1.VALUE, yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE-(data2.VALUE - data1.VALUE).mean(),
yerr=data2.ERROR, fmt='bo')
plt.show()
Another possibility is to subtract the mean of each series
You can calculate each dataset's average and subtract it from every value in that dataset. If you do this for both datasets they should align relatively well. This assumes the two datasets have a similar shape, so it might not always work best.
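A minimal sketch of that idea, reusing the example data from the earlier answer (centring each series on its own mean overlays the curves even though the two series have different lengths):

import pandas as pd
import matplotlib.pyplot as plt

data1 = pd.DataFrame({'TIME': [1.1, 2.4, 3.2, 4.1, 5.3],
                      'VALUE': [10.3, 10.5, 11.0, 10.9, 10.7],
                      'ERROR': [0.2, 0.1, 0.4, 0.3, 0.2]})
data2 = pd.DataFrame({'TIME': [0.9, 2.1, 2.9, 4.2],
                      'VALUE': [18.4, 18.7, 18.9, 18.8],
                      'ERROR': [0.3, 0.2, 0.5, 0.4]})

# Centre each series on its own mean; unlike differencing element-wise,
# this does not require the two series to be the same length
plt.errorbar(data1.TIME, data1.VALUE - data1.VALUE.mean(),
             yerr=data1.ERROR, fmt='ro')
plt.errorbar(data2.TIME, data2.VALUE - data2.VALUE.mean(),
             yerr=data2.ERROR, fmt='bo')
plt.show()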
Although this question is not Matlab related, you might still be interested in this:
Remove unknown DC Offset from a non-periodic discrete time signal