How to get interpolated array values in numpy / scipy - python

I was wondering how I could get the interpolated value of a 3D array. For example, I am trying to read the value at position (1.4, 2.3, 4.2) of a 3D array. How can I get the interpolated value?
counterX = 1.5
counterY = 1.5
counterZ = 1.5
for x in range(0, length):
    for y in range(0, length):
        for z in range(0, length):
            value = img[counterX, counterY, counterZ]
        counterZ = 0
    counterY = 0
counterX, counterY and counterZ are float values rather than integers. However, I cannot cast them with int(...) since my results need to be very exact. Therefore I thought interpolation would be the best solution.

Just go for trilinear interpolation as described here:
https://en.wikipedia.org/wiki/Trilinear_interpolation
For your example point (1.4, 2.3, 4.2), writing (i,j,k) for the array value img[i,j,k], this would be:
C00 = (1,2,4)*0.6 + (2,2,4)*0.4
C01 = (1,3,4)*0.6 + (2,3,4)*0.4
C10 = (1,2,5)*0.6 + (2,2,5)*0.4
C11 = (1,3,5)*0.6 + (2,3,5)*0.4
C0 = C00*0.8 + C10*0.2
C1 = C01*0.8 + C11*0.2
C = C0*0.7 + C1*0.3
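If you'd rather not code the weights by hand, SciPy's map_coordinates does the same trilinear interpolation when order=1. A minimal sketch (the random img is a stand-in for your data):
import numpy as np
from scipy.ndimage import map_coordinates

img = np.random.rand(5, 5, 6)            # stand-in for your 3-D array
point = np.array([[1.4], [2.3], [4.2]])  # shape (3, npoints): one column per query
value = map_coordinates(img, point, order=1)  # order=1 -> linear in each axis
print(value)                             # one interpolated value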

I am not sure what exactly your problem is.
Would you like to create an interpolated array from some observed values? Then I would personally recommend using a kriging model; pyKriging seems to do that, but I have never used it personally.
You could then create a function (using the prediction model built through kriging) that takes the three arguments counterX, counterY and counterZ and evaluates the prediction at any position, roughly as sketched below.
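An untested sketch of that idea (it assumes pyKriging's kriging class exposes train() and predict() as shown in the project's examples; verify against its docs):
import numpy as np
from pyKriging.krige import kriging

X = np.random.rand(30, 3)          # 30 observed positions (x, y, z)
y = np.random.rand(30)             # observed values at those positions
k = kriging(X, y)                  # build the kriging model
k.train()                          # fit the model hyperparameters
print(k.predict([1.4, 2.3, 4.2]))  # prediction at an arbitrary position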

Related

Fit a time series in python with a mean value as boundary condition

I have the following boundary conditions for a time series in python.
The notation I use here is t_x, where x describes the time in milliseconds (this is not my code, I just thought this notation was good for explaining my issue).
t_0 = 0
t_440 = -1.6
t_830 = 0
mean_value = -0.6
I want to create a list that contains 84 values (so the spacing is 10 ms between values).
The list should describe a "curve" that starts at zero, has its minimum value of -1.6 at 440 ms (so index 44 in the list), ends with 0 at 830 ms (so index 83 in the list), and the overall mean value of the list should be -0.6.
I absolutely could not come up with an idea of how to "fit" the boundaries to create such a list.
I would really appreciate help.
It is a quick and dirty approach, but it works:
X = list(range(0, 830 + 1, 10))
Y = [0.0 for x in X]
Y[44] = -1.6
b = 12.3486
for x in range(44):
    Y[x] = -1.6*(b*x + x**2)/(b*44 + 44**2)
for x in range(83, 44, -1):
    Y[x] = -1.6*(b*(83-x) + (83-x)**2)/(b*38 + 38**2)
print(f'{sum(Y)/len(Y)=:8.6f}, {Y[0]=}, {Y[44]=}, {Y[83]=}')
from matplotlib import pyplot as plt
plt.plot(X, Y)
plt.show()
The code gives the following output:
sum(Y)/len(Y)=-0.600000, Y[0]=-0.0, Y[44]=-1.6, Y[83]=-0.0
and shows the following diagram:
The first step in coming up with the above approach was to create a linear sloping 'curve' from the minimum up to the zeroes. It turned out that the linear approach gives too large a mean Y value here, which means the 'curve' must have a sharp peak at its minimum and needs to be approximated by a polynomial. To keep things simple I decided to use a quadratic polynomial and to approach the minimum from the left and right sides separately, as the curve isn't symmetric. The b-value was found by trial and error; its precision can be increased manually or by writing a small function finding it in an iterative way.
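For instance, a minimal sketch (my addition, not part of the original answer) of such an iterative search could bisect on b, exploiting that the mean of Y grows more negative as b increases:
def mean_for(b):
    # rebuild Y exactly as above for a given b and return its mean
    Y = [0.0]*84
    Y[44] = -1.6
    for x in range(44):
        Y[x] = -1.6*(b*x + x**2)/(b*44 + 44**2)
    for x in range(83, 44, -1):
        Y[x] = -1.6*(b*(83-x) + (83-x)**2)/(b*38 + 38**2)
    return sum(Y)/len(Y)

def find_b(target=-0.6, lo=0.0, hi=1000.0, tol=1e-12):
    while hi - lo > tol:
        mid = (lo + hi)/2
        if mean_for(mid) > target:   # mean not negative enough yet -> larger b
            lo = mid
        else:
            hi = mid
    return (lo + hi)/2

print(find_b())   # ~12.3486, matching the hand-tuned value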
Update providing a generic solution as requested in a comment
The code below provides a
meanYboundaryXY(lbc=[(0,0), (440,-1.6), (830,0), -0.6], shape='saw')
function returning the X and Y lists of the time-series data calculated from the passed boundary values:
def meanYboundaryXY(lbc=[(0,0), (440,-1.6), (830,0), -0.6], shape='saw'):
    lbcXY = lbc[0:3]; meanY_boundary = lbc[3]
    minX = min(x for x, y in lbcXY)
    maxX = max(x for x, y in lbcXY)
    minY = lbc[1][1]
    step = 10
    X = list(range(minX, maxX + 1, step))
    lenX = len(X)
    Y = [None for x in X]
    sumY = 0
    for x, y in lbcXY:
        Y[x//step] = y
        sumY += y
    target_sumY = meanY_boundary*lenX
    if shape == 'rect':
        subY = (target_sumY - sumY)/(lenX - 3)
        for i, y in enumerate(Y):
            if y is None:
                Y[i] = subY
    elif shape == 'saw':
        peakNextY = 2*(target_sumY - sumY)/(lenX - 1)
        iYleft = lbc[1][0]//step - 1
        iYrght = iYleft + 2
        iYstart = lbc[0][0] // step
        iYend = lbc[2][0] // step
        for i in range(iYstart, iYleft + 1, 1):
            Y[i] = peakNextY * i / iYleft
        for i in range(iYend, iYrght - 1, -1):
            Y[i] = peakNextY * (iYend - i)/(iYend - iYrght)
    else:
        raise ValueError(f'meanYboundaryXY() EXIT, {shape=} not in ["saw","rect"]')
    return (X, Y)
X, Y = meanYboundaryXY()
print(f'{sum(Y)/len(Y)=:8.6f}, {Y[0]=}, {Y[44]=}, {Y[83]=}')
from matplotlib import pyplot as plt
plt.plot(X,Y)
plt.show()
The code outputs:
sum(Y)/len(Y)=-0.600000, Y[0]=0, Y[44]=-1.6, Y[83]=0
and creates the following two diagrams for shape='rect' and shape='saw':
As an old geek, I tried to solve the question with a simple algorithm.
First calculate the points as two symmetric lines, from 0 up to 44 and from 44 down to 88 (orange on the graph).
Then calculate the sum excluding the middle point, and its ratio to the sum the points must have for the mean to be -0.6 (again excluding the middle point).
Apply that ratio to the previous points, except the middle point (blue curve on the graph).
This yields the curve which Claudio called "saw".
For my part, I think Claudio's quadratic interpolation gives a better curve, but it needs trial-and-error loops.
import matplotlib

# define goals
nbPoints = 89
msPerPoint = 10
midPoint = nbPoints//2
valueMidPoint = -1.6
meanGoal = -0.6

def createSerieLinear():
    # two lines: 0 up to 44, 44 down to 88 (89 values centered on 44)
    serie = [0 for i in range(0, nbPoints)]
    interval = valueMidPoint/midPoint
    for i in range(0, midPoint + 1):
        serie[i] = i*interval
        serie[nbPoints - 1 - i] = i*interval
    return serie

# keep an original to plot
orange = createSerieLinear()
# work on a base
base = createSerieLinear()
# total except midPoint
totalBase = sum(base) - valueMidPoint
# total goal except midPoint
totalGoal = meanGoal*nbPoints - valueMidPoint
# apply ratio to reduce
reduceRatio = totalGoal/totalBase
for i in range(0, midPoint):
    base[i] *= reduceRatio
    base[nbPoints - 1 - i] *= reduceRatio
# verify
meanBase = sum(base)/nbPoints
print("new mean:", meanBase)
# draw
from matplotlib import pyplot as plt
X = [i*msPerPoint for i in range(0, nbPoints)]
plt.plot(X, base)
plt.plot(X, orange)
plt.show()
new mean: -0.5999999999999998
Hope you enjoy simple things :)

Exponential fit in pandas

I have this data:
puf = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                    'val': [850, 1889, 3289, 6083, 10349, 17860, 28180, 41236]})
The data seems to follow an exponential curve. Let's see the plot:
puf.plot('id','val')
I want to fit an exponential curve $y = Ae^{Bx}$ (A times e to the B times x) and add it as a column in Pandas. First I tried to log the values:
puf['log_val'] = np.log(puf['val'])
And then to use Numpy to fit the equation:
puf['fit'] = np.polyfit(puf['id'],puf['log_val'],1)
But I get an error:
ValueError: Length of values (2) does not match length of index (8)
My expected result is the fitted values as a new column in Pandas. I attach an image with the column of fitted values I want (in orange):
I'm stuck on this code. I'm not sure what I'm doing wrong. How can I create a new column with my fitted values?
Note that you asked for an exponential model, yet you have the results for a log-linear model.
Check out the work below:
For log-linear, we are fitting E(log(Y)), i.e. minimizing the residual log(y) - (log(b[0]) + b[1]*x):
import numpy as np
from scipy.optimize import least_squares

least_squares(lambda b: np.log(puf['val']) - (np.log(b[0]) + b[1]*puf['id']),
              [1, 1])['x']
# array([5.99531305e+02, 5.51106793e-01])
These are the values that Excel gives.
On the other hand, to fit an exponential curve the randomness is on Y and not on its logarithm: E(Y) = b[0]*exp(b[1]*x). Hence we have:
least_squares(lambda b: puf['val'] - b[0]*np.exp(b[1]*puf['id']), [0, 1])['x']
# array([1.08047304e+03, 4.58116127e-01])  # correct results for the exponential fit
Depending on your model choice, the values are a little different.
Better model? Since both have the same number of parameters, prefer the one that gives you the lower deviance or the better out-of-sample prediction.
Note that the ideal exponential model is E(Y) = A'B'^X, which for comparison can be written as log(E(Y)) = A + XB, while the log-linear model is E(log(Y)) = A + XB. Note the difference in where the expectation is taken.
From the two models we have:
Notice how the log-linear fit overestimates at the higher numbers, while the exponential fit overestimates at the lower numbers.
Code for image:
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import least_squares

log_lin = least_squares(lambda b: np.log(puf['val']) - (np.log(b[0]) + b[1]*puf['id']),
                        [1, 1])['x']
expo = least_squares(lambda b: puf['val'] - b[0]*np.exp(b[1]*puf['id']), [0, 1])['x']
exp_fun = lambda x: expo[0]*np.exp(expo[1]*x)
log_lin_fun = lambda x: log_lin[0]*np.exp(log_lin[1]*x)
plt.plot(puf.id, puf.val, label='original')
plt.plot(puf.id, exp_fun(puf.id), label='exponential')
plt.plot(puf.id, log_lin_fun(puf.id), label='log-linear')
plt.legend()
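To act on the "lower deviance" suggestion above, here is a small sketch comparing the in-sample squared error of both fits on the original scale (it reuses exp_fun and log_lin_fun from the block above):
sse_exp = np.sum((puf['val'] - exp_fun(puf['id']))**2)
sse_loglin = np.sum((puf['val'] - log_lin_fun(puf['id']))**2)
print(f'{sse_exp=:.1f}, {sse_loglin=:.1f}')  # lower is better on this scale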
You're getting that error because np.polyfit(puf['id'], puf['log_val'], 1) returns two values, array([0.55110679, 6.39614819]), which isn't the shape of your dataframe.
This is what you want:
y = a*exp(b*x)  ->  ln(y) = ln(a) + b*x
f = np.polyfit(puf['id'], np.log(puf['val']), 1)
where
a = np.exp(f[1])  ->  599.5313046712091
b = f[0]  ->  0.5511067934637022
Giving
puf['fit'] = a * np.exp(b * puf['id'])
id val fit
0 1 850 1040.290193
1 2 1889 1805.082864
2 3 3289 3132.130026
3 4 6083 5434.785677
4 5 10349 9430.290286
5 6 17860 16363.179739
6 7 28180 28392.938399
7 8 41236 49266.644002

Splitting integrated probability density into two spatial regions

I have some probability density function:
T = 10000
tmin = 0
tmax = 10**20
t = np.linspace(tmin, tmax, T)
time = np.asarray(t)  # this line may be redundant
for j in range(T):
    timedep_PD[j] = probdensity_func(x, time[j], initial_state)
I want to integrate it over two distinct regions of x. I tried the following to split the timedep_PD array into two spatial regions and then proceeded to integrate:
step = abs(xmin - xmax) / T
l1 = int(np.floor((abs(ab - xmin) * T) / abs(xmin - xmax)))
l2 = int(np.floor((abs(bd - ab) * T) / abs(xmin - xmax)))

# For spatial region 1
Pd1 = np.empty([T, l1])
R1 = x[:l1]
for i in range(T):
    Pd1[i] = Pd[i][:l1]

# For spatial region 2
Pd2 = np.empty([T, l2])
R2 = x[l1:l1+l2]
for i in range(T):
    Pd2[i] = Pd[i][l1:l1+l2]

# Integrating over each spatial region
for i in range(T):
    P[0][i] = np.trapz(Pd1[i], R1)
    P[1][i] = np.trapz(Pd2[i], R2)
Is there an easier/more clear way to go about splitting up a probability density function into two spatial regions and then integrating within each spatial region at each time-step?
The loops can be eliminated by using vectorized operations instead. It's not clear whether Pd is a 2D NumPy array; if it's something else (e.g., a list of lists), it should be converted to a 2D NumPy array with np.array(...). After that you can do this:
Pd1 = Pd[:, :l1]
Pd2 = Pd[:, l1:l1+l2]
No need to loop over the time index; the slicing happens for all times at once (having : in place of an index means "all valid indices").
Similarly, np.trapz can integrate all time slices at once:
P1 = np.trapz(Pd1, R1, axis=1)
P2 = np.trapz(Pd2, R2, axis=1)
Each of P1 and P2 is now a time series of integrals. The axis parameter determines along which axis Pd1 gets integrated; here it's the second axis, i.e., space.
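A quick toy check (my example, not from the question) that the vectorized slicing and np.trapz behave as expected: a flat density of 1 over [0, 1] split at x = 0.4 should integrate to roughly 0.4 and 0.6 at every time step.
import numpy as np

T, N = 5, 1001
x = np.linspace(0, 1, N)
Pd = np.ones((T, N))                  # T time slices of a uniform "density"
l1 = int(0.4*N)
P1 = np.trapz(Pd[:, :l1], x[:l1], axis=1)
P2 = np.trapz(Pd[:, l1:], x[l1:], axis=1)
print(P1, P2)                         # entries close to 0.4 and 0.6 respectively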

numpy contour plot with cost function

a = np.array(x)
b = np.array(y)
a_transpose = a.transpose()
a_trans_times_a = np.dot(a_transpose, a)
a_trans_times_b = np.dot(a_transpose, b)

def cost(theta):
    x_times_theta = np.dot(a, theta)
    _y_minus_x_theta = b - x_times_theta
    _y_minus_x_theta_transpose = _y_minus_x_theta.transpose()
    return np.dot(_y_minus_x_theta_transpose, _y_minus_x_theta)

n = 256
p = np.linspace(-100, 100, n)
q = np.linspace(-100, 100, n)
P, Q = np.meshgrid(p, q)
pl.contourf(P, Q, cost(np.array([P, Q])), 8, alpha=0.75, cmap='jet')
C = pl.contour(P, Q, cost(np.array([P, Q])), 8, colors='black', linewidth=0.5)
Hi, I'm trying to make a contour plot of a cost function of two parameters, involving matrix multiplication. I've tested the cost function and it works properly in an interactive session. However, running it on a linspace grid raises "ValueError: objects are not aligned". I understand now that it has to do with how I structure P and Q. Would the solution involve writing a for loop to explicitly build an array of outputs? How would I write this?
EDIT: a and b are matrices of the correct size. The cost function takes a 2-vector and outputs a number.
It's hard to know exactly without having the shapes of a and b at hand, but this error is probably caused by np.array([P, Q]) being a 3-dimensional array. It seems you expect it to be 2-dimensional so that np.dot(a, theta) performs matrix multiplication.
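A quick demonstration of that shape problem:
import numpy as np

P, Q = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4))
print(np.array([P, Q]).shape)   # (2, 4, 4): 3-D, not the 2-vector cost() expects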
Presumably you want theta to be the angular coordinate at a particular x and y value. In this case you should do:
theta = np.arctan2(Q, P)  # a 2D array of theta coordinates
costarray = cost(theta)
pl.contourf(P, Q, costarray, 8, alpha=0.75, cmap='jet')

Is a Fuzzy C-Means algorithm available for Python?

I have some dots in a 3-dimensional space and would like to cluster them. I know Python's module "cluster", but it only has K-Means. Do you know of a module that has FCM (Fuzzy C-Means)?
(If you know some other Python modules related to clustering, you could name them as a bonus. But the important question is the one about an FCM algorithm in Python.)
Matlab
It seems to be quite easy to use FCM in Matlab (example). Isn't something like this available for Python?
NumPy, SciPy and Sage
I didn't find FCM in NumPy, SciPy or Sage. I downloaded the documentation and searched for it. No results.
Python-cluster
It seems the cluster module will add fuzzy C-Means with the next version (see the Roadmap). But I need it now.
PEACH provides some Fuzzy C-Means functionality:
http://code.google.com/p/peach/
However, there doesn't seem to be any usable documentation, as the wiki is empty. An example of using FCM with PEACH can be found on its website.
Have a look at the scikit-fuzzy package. It has basic fuzzy logic functionality, including fuzzy c-means clustering; a short sketch follows.
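A minimal sketch (parameter names per the skfuzzy docs as I recall them; verify against the current API). Note that cmeans expects one feature per row, hence the transpose:
import numpy as np
import skfuzzy as fuzz

points = np.random.rand(100, 3)      # 100 dots in 3-D space
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
    points.T, c=4, m=2.0, error=0.005, maxiter=1000, seed=42)
labels = np.argmax(u, axis=0)        # harden the fuzzy memberships
print(cntr.shape, labels[:10], fpc)  # (4, 3) centers, labels, partition coefficient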
Python
There is a fuzzy-c-means package on PyPI. Check out the link: fuzzy-c-means Python.
This is the simplest way to use FCM in Python; a rough sketch of its API is below. Hope it helps.
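A sketch of its scikit-learn-like interface (class and attribute names taken from the package's README; double-check against its current docs):
import numpy as np
from fcmeans import FCM

X = np.random.rand(100, 3)   # dots in 3-D space
fcm = FCM(n_clusters=4)
fcm.fit(X)
print(fcm.centers)           # cluster centers
print(fcm.predict(X)[:10])   # hardened labels for the first 10 points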
I have done it from scratch, using K++ initialization (with fixed seeds and 5 centroids; it shouldn't be too difficult to adapt it to your desired number of centroids):
# K++ initialization algorithm:
import numpy as np

def initialize(X, K):
    C = [X[0]]
    for k in range(1, K):
        D2 = np.array([min([np.inner(c - x, c - x) for c in C]) for x in X])
        probs = D2/D2.sum()
        cumprobs = probs.cumsum()
        np.random.seed(20)  # fixing seeds
        r = np.random.rand()
        for j, p in enumerate(cumprobs):
            if r < p:
                i = j
                break
        C.append(X[i])
    return C

a = initialize(data2, 5)  # "a" is the initial array of centroids; I used 5 centroids
# Now the fuzzy c-means algorithm:
from numpy import linalg as LA

m = 1.5         # fuzziness parameter (it can be tuned)
r = 2/(m - 1)
# initial centroids:
c1, c2, c3, c4, c5 = a[0], a[1], a[2], a[3], a[4]
# empty lists to record the centroid history:
cc1, cc2, cc3, cc4, cc5 = [], [], [], [], []
n_iterations = 10000
for j in range(n_iterations):
    u1, u2, u3, u4, u5 = [], [], [], [], []
    for i in range(len(data2)):
        # distances of this point to each centroid:
        a = LA.norm(data2[i] - c1)
        b = LA.norm(data2[i] - c2)
        c = LA.norm(data2[i] - c3)
        d = LA.norm(data2[i] - c4)
        e = LA.norm(data2[i] - c5)
        # membership matrix vectors:
        U1 = 1/(1 + (a/b)**r + (a/c)**r + (a/d)**r + (a/e)**r)
        U2 = 1/((b/a)**r + 1 + (b/c)**r + (b/d)**r + (b/e)**r)
        U3 = 1/((c/a)**r + (c/b)**r + 1 + (c/d)**r + (c/e)**r)
        U4 = 1/((d/a)**r + (d/b)**r + (d/c)**r + 1 + (d/e)**r)
        U5 = 1/((e/a)**r + (e/b)**r + (e/c)**r + (e/d)**r + 1)
        # this builds an array of n row points x K centroids with their degrees of membership
        u1.append(U1)
        u2.append(U2)
        u3.append(U3)
        u4.append(U4)
        u5.append(U5)
    # now we calculate the new centers:
    c1 = (np.array(u1)**2).dot(data2) / np.sum(np.array(u1)**2)
    c2 = (np.array(u2)**2).dot(data2) / np.sum(np.array(u2)**2)
    c3 = (np.array(u3)**2).dot(data2) / np.sum(np.array(u3)**2)
    c4 = (np.array(u4)**2).dot(data2) / np.sum(np.array(u4)**2)
    c5 = (np.array(u5)**2).dot(data2) / np.sum(np.array(u5)**2)
    cc1.append(c1)
    cc2.append(c2)
    cc3.append(c3)
    cc4.append(c4)
    cc5.append(c5)
    if j > 5:
        change_rate1 = np.sum(3*cc1[j] - cc1[j-1] - cc1[j-2] - cc1[j-3])/3
        change_rate2 = np.sum(3*cc2[j] - cc2[j-1] - cc2[j-2] - cc2[j-3])/3
        change_rate3 = np.sum(3*cc3[j] - cc3[j-1] - cc3[j-2] - cc3[j-3])/3
        change_rate4 = np.sum(3*cc4[j] - cc4[j-1] - cc4[j-2] - cc4[j-3])/3
        change_rate5 = np.sum(3*cc5[j] - cc5[j-1] - cc5[j-2] - cc5[j-3])/3
        change_rate = np.array([change_rate1, change_rate2, change_rate3,
                                change_rate4, change_rate5])
        changed = np.sum(change_rate > 0.0000001)
        if changed == 0:
            break

U = np.column_stack([u1, u2, u3, u4, u5])  # membership matrix: n row points x K centroid columns
print(c1)  # check centroid coordinates; c1 - c5 are the last centroids calculated, so they presumably converged
print(U)   # the degree of membership of each point in each centroid
I know it is not very pythonic, but I hope it can be a starting point for your complete fuzzy C-means algorithm. I think "soft clustering" is the way to go when data is not easily separable (for example, when a t-SNE visualization shows all the data together instead of clearly separated groups; in such cases, forcing each point to belong strictly to a single cluster can be dangerous). I would try m = 1.1 through m = 2.0, so you can see how the fuzzy parameter affects the membership matrix.
