I am trying to fit Residence Time Distribution (RTD) data. RTD data typically follow a skewed distribution. I have built a simple code that takes this non-equally spaced time data set from the RTD.
Data set
timeArray = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 12.0, 14.0]
concArray = [0.0, 0.6, 1.4, 5.0, 8.0, 10.0, 8.0, 6.0, 4.0, 3.0, 2.2, 1.5, 0.6, 0.0]
To fit the data I have been using Python's curve_fit function
parameters, covariance = curve_fit(nCSTR, time, conc, p0=guess)
and different models (e.g. CSTR, sine, Gauss). However, no success so far.
The RTD data that I have corresponds to a CSTR, and there is an equation that models this type of behavior very accurately.
# Generalized nCSTR model
y = (( (np.power(x/tau,n-1)) * np.power(n,n) ) / (tau * math.gamma(n)) ) * np.exp(-n*x/tau)
As a separate note: in the generalized nCSTR model I am using the gamma function instead of the (n-1)! factorial term, because of the complexity of dealing with non-integer values in factorial terms.
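For the record, the two agree for integer n, since gamma(n) = (n-1)!, while the gamma function also handles non-integer n; a quick check:
import math
# Gamma generalizes the factorial: gamma(n) == factorial(n - 1) for integer n,
# and it is also defined for non-integer n, which the factorial is not.
print(math.gamma(4))    # 6.0  == 3!
print(math.gamma(5))    # 24.0 == 4!
print(math.gamma(4.5))  # 11.631... (no factorial equivalent)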
This CSTR model should fit the data without problems, but for some reason it is not able to do so. The outcome after executing my code:
timeArray = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0]
concArray = [0.0, 0.6, 1.4, 2.6, 5.0, 6.5, 8.0, 9.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.5, 2.2, 1.8, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5, 0.3, 0.1, 0.0]
#Recast time and conc into numpy arrays
time = np.asarray(timeArray)
conc = np.asarray(concArray)
plt.plot(time, conc, 'o')
def nCSTR(x, tau, n):
    y = (( (np.power(x/tau,n-1)) * np.power(n,n) ) / (tau * math.gamma(n)) ) * np.exp(-n*x/tau)
    return y
guess = [1, 12]
parameters, covariance = curve_fit(nCSTR, time, conc, p0=guess)
tau = parameters[0]
n = parameters[1]
y = np.arange(0.0, len(time), 1.0)
for i in range(len(timeArray)):
    y[i] = (( (np.power(time[i]/tau,n-1)) * np.power(n,n) ) / (tau * math.gamma(n)) ) * np.exp(-n*time[i]/tau)
plt.plot(time,y)
is this plot: Fitting Output (image)
I know I am missing something and any help will be appreciated. The model has been well known for decades, so the problem should not be the equation itself. I made some dummy data to confirm that the equation is written correctly, and the output was the same type of profile that I am looking for. To that end, the equation is fine.
import numpy as np
import math
import matplotlib.pyplot as plt
t = np.arange(0.0, 10.5, 0.5)
tau = 2
n = 5
y = np.arange(0.0, len(t), 1.0)
for i in range(len(t)):
    y[i] = (( (np.power(t[i]/tau,n-1)) * np.power(n,n) ) / (tau * math.gamma(n)) ) * np.exp(-n*t[i]/tau)
print(y)
plt.plot(t,y)
CSTR profile with Dummy Data (image)
If anyone is interested in the theory behind it, I recommend any reading related to the Tanks-in-Series model (specifically CSTRs); Fogler has a great book on this topic.
I think that the main problem is that your model does not allow for an overall scale factor or that your data may not be normalized as you expect.
If you'll permit me to convert your curve-fitting program to use lmfit (I am a lead author), you might do:
import numpy as np
from scipy.special import gamma
import matplotlib.pyplot as plt
from lmfit import Model
timeArray = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0]
concArray = [0.0, 0.6, 1.4, 2.6, 5.0, 6.5, 8.0, 9.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.5, 2.2, 1.8, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5, 0.3, 0.1, 0.0]
#Recast time and conc into numpy arrays
time = np.asarray(timeArray)
conc = np.asarray(concArray)
plt.plot(time, conc, 'o', label='data')
def nCSTR(x, scale, tau, n):
    """scaled CSTR model"""
    z = n*x/tau
    return scale * np.exp(-z) * z**(n-1) * (n/(tau*gamma(n)))
# create a Model for your model function
cmodel = Model(nCSTR)
# now create a set of Parameters for your model (note that parameters
# are named using your function arguments), and give initial values
params = cmodel.make_params(tau=3, scale=10, n=10)
# since you have `xxx**(n-1)`, setting a lower bound of 1 on `n`
# is wise, otherwise you would have to handle complex values
params['n'].min = 1
# now fit the model to your `conc` data with those parameters
# (and also passing in independent variables using `x`: the argument
# name from the signature of the model function)
result = cmodel.fit(conc, params, x=time)
# print out a report of the results
print(result.fit_report())
# you do not need to construct the best fit yourself, it is in `result`:
plt.plot(time, result.best_fit, label='fit')
plt.legend()
plt.show()
This will print out a report that includes statistics and uncertainties:
[[Model]]
    Model(nCSTR)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 29
    # data points      = 29
    # variables        = 3
    chi-square         = 2.84348862
    reduced chi-square = 0.10936495
    Akaike info crit   = -61.3456602
    Bayesian info crit = -57.2437727
    R-squared          = 0.98989860
[[Variables]]
    scale: 49.7615649 +/- 0.81616118 (1.64%) (init = 10)
    tau:   5.06327482 +/- 0.05267918 (1.04%) (init = 3)
    n:     4.33771512 +/- 0.14012112 (3.23%) (init = 10)
[[Correlations]] (unreported correlations are < 0.100)
    C(scale, n)   = -0.521
    C(scale, tau) = 0.477
    C(tau, n)     = -0.406
and generate a plot of the data and best fit: (image)
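For reference, the same fix (an explicit scale parameter plus a lower bound on n) should also work with plain scipy.optimize.curve_fit; a minimal sketch reusing the time and conc arrays from above, with my own initial guesses:
import numpy as np
from scipy.special import gamma
from scipy.optimize import curve_fit

def nCSTR_scaled(x, scale, tau, n):
    """Same scaled tanks-in-series model as above."""
    z = n * x / tau
    return scale * np.exp(-z) * z**(n - 1) * (n / (tau * gamma(n)))

# bounds keep n >= 1 so z**(n - 1) stays real-valued near x = 0
popt, pcov = curve_fit(nCSTR_scaled, time, conc, p0=[10, 3, 10],
                       bounds=([0, 1e-2, 1], [np.inf, np.inf, np.inf]))
print(popt)  # [scale, tau, n]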
I have an empty numpy array, a list of indices, and a list of values associated with the indices. The issue is that there may be duplicates in the indices. In all these "collision" cases, I'd like the smallest value to be picked. Just wondering what is the best way to go about it.
Eg:
array = [0,0,0,0,0,0,0]
indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
Result:
out = [1.0, 0.0, 2.5, 1.5, 8.0, 0.0, 0.0]
You can always implement something manually like:
import numpy as np
def index_reduce(arr, indices, out, reducer=min):
    touched = np.zeros_like(out, dtype=np.bool_)
    for i, x in enumerate(indices):
        if not touched[x]:
            out[x] = arr[i]
            touched[x] = True
        else:
            out[x] = reducer(out[x], arr[i])
    return out
which essentially loops through the indices and assigns the values of arr to out if that position has not been touched yet (keeping track of this with the touched array), and otherwise reduces the current output with the specified reducer.
NOTE: The reducer must work as a running pairwise reduction, i.e. the result for a position may only depend on the value accumulated so far and the current value (min, max, sum, etc. all qualify).
The usage of this would be:
indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
array = np.zeros(7)
index_reduce(values, indices, array)
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])
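For completeness, NumPy itself can perform this reduction without an explicit Python loop via the unbuffered np.minimum.at, at the cost of a temporary array; a sketch for the min case:
import numpy as np

indices = np.array([0, 0, 2, 3, 2, 4])
values = np.array([1.0, 3.0, 3.5, 1.5, 2.5, 8.0])

# start from +inf so the minimum at touched positions ignores the initial value
tmp = np.full(7, np.inf)
np.minimum.at(tmp, indices, values)  # unbuffered, so repeated indices are safe
out = np.where(np.isinf(tmp), 0.0, tmp)
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])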
If performance is a concern, you can also accelerate the index_reduce code above with Numba with a simple decoration, provided that the values and indices inputs are also NumPy arrays:
import numba as nb
index_reduce_nb = nb.njit(index_reduce)
indices = np.array([0, 0, 2, 3, 2, 4])
values = np.array([1.0, 3.0, 3.5, 1.5, 2.5, 8.0])
array = np.zeros(7)
index_reduce_nb(values, indices, array)
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])
Benchmarks
The above solutions can be compared to a Torch-based solution (reworked from @Shai's answer):
import torch
def index_reduce_torch(arr, indices, out, reduce_="amin"):
    arr = torch.from_numpy(arr)
    indices = torch.from_numpy(indices)
    out = torch.from_numpy(out)
    return out.index_reduce_(dim=0, index=indices, source=arr,
                             reduce=reduce_, include_self=False).numpy()
or, with additional skipping of Torch gradients:
index_reduce_torch_ng = torch.no_grad()(index_reduce_torch)
index_reduce_torch_ng.__name__ = "index_reduce_torch_ng"
and a Pandas-based solution (reworked from @bpfrd's answer):
import pandas as pd
def index_reduce_pd(arr, indices, out, reducer=min):
    df = pd.DataFrame(data=zip(indices, arr))
    df1 = df.groupby(0, as_index=False).agg(reducer)
    out[df1[0]] = df1[1]
    return out
using the following code (note that the %timeit line requires IPython/Jupyter):
funcs = index_reduce, index_reduce_nb, index_reduce_pd, index_reduce_torch, index_reduce_torch_ng
timings = {}
for i in range(4, 18):
    n = 2 ** i
    print(f"n = {n}, i = {i}")
    extrema = 0, 2 * n
    indices = np.random.randint(*extrema, n)
    values = np.random.random(n)
    out = np.zeros(extrema[1] + 1)
    timings[n] = []
    base = funcs[0](values, indices, out)
    for func in funcs:
        res = func(values, indices, out)
        is_good = np.allclose(base, res)
        timed = %timeit -r 16 -n 16 -q -o func(values, indices, out)
        timing = timed.best * 1e6
        timings[n].append(timing if is_good else None)
        print(f"{func.__name__:>24} {is_good} {timing:10.3f} µs")
to produce, with the additional lines:
import matplotlib.pyplot as plt
df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', figsize=(6, 4))
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 500], figsize=(6, 4))
fig = plt.gcf()
fig.patch.set_facecolor('white')
these plots:
(the second is a zoomed-in version of the first).
These indicate that the Numba-accelerated solution is likely the fastest, closely followed by the Torch-based solution, while the Pandas approach is likely the slowest, even slower than the explicit solution without acceleration.
You are looking for index_reduce_, which was introduced in PyTorch 1.12.
import torch
array = torch.zeros(7)
indices = torch.tensor([0, 0, 2, 3, 2, 4])
values = torch.tensor([1.0, 3.0, 3.5, 1.5, 2.5, 8.0])
out = array.index_reduce_(dim=0, index=indices, source=values, reduce='amin', include_self=False)
You'll get your desired output:
tensor([1.0000, 0.0000, 2.5000, 1.5000, 8.0000, 0.0000, 0.0000])
Note that this method is in "beta" and its API may change in future PyTorch versions.
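Also note the role of include_self=False: with include_self=True the initial contents of the tensor take part in the reduction, so every touched position would become min(0, value) = 0 for positive values. A quick illustration, reusing indices and values from above:
# with include_self=True the initial zeros join the reduction and win at
# every touched position (untouched positions keep their values either way)
out2 = torch.zeros(7).index_reduce_(
    dim=0, index=indices, source=values, reduce='amin', include_self=True)
# tensor([0., 0., 0., 0., 0., 0., 0.])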
You can use pandas groupby + agg as follows:
import pandas as pd

indices = [0, 0, 2, 3, 2, 4]
values = [1.0, 3.0, 3.5, 1.5, 2.5, 8.0]
array = [0, 0, 0, 0, 0, 0, 0]
df = pd.DataFrame(zip(indices, values), columns=['indices', 'values'])
df1 = df.groupby('indices', as_index=False).agg(values=('values', min))
for i, j in zip(df1['indices'].tolist(), df1['values'].tolist()):
    array[i] = j
output:
array
>[1.0, 0, 2.5, 1.5, 8.0, 0, 0]
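If array is a NumPy array rather than a plain list, the final loop can also be replaced with a single fancy-indexing assignment; a small variation on the above:
import numpy as np

array = np.zeros(7)
# assign each group minimum to its index in one vectorized step
array[df1['indices'].to_numpy()] = df1['values'].to_numpy()
# array([1. , 0. , 2.5, 1.5, 8. , 0. , 0. ])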
I have input Tensors of 0-3 dimensions and always want to output a 3D Tensor (for use with a tf.einsum call where I can't use broadcasting), with the axes being filled from the inside out. Is there a better way to do this than the following (ugly) nested conditional? I read through tf.expand_dims, tf.reshape, and tf.broadcast_to but couldn't find anything that allows a dynamic shape based on input Tensors of varying dimensions.
import tensorflow as tf
def broadcast_cash_flows(x):
    shape = tf.shape(x)
    dimensions = len(shape)
    return tf.cond(dimensions == 0,
                   lambda: cf_0d(x),
                   lambda: tf.cond(dimensions == 1,
                                   lambda: cf_1d(x),
                                   lambda: tf.cond(dimensions == 2,
                                                   lambda: cf_2d(x),
                                                   lambda: x)))

def cf_0d(x):
    return tf.expand_dims(tf.expand_dims(tf.expand_dims(x, 0), 0), 0)

def cf_1d(x):
    return tf.expand_dims(tf.expand_dims(x, 0), 0)

def cf_2d(x):
    return tf.expand_dims(x, 0)
cf0 = tf.constant(2.0)
print(broadcast_cash_flows(cf0))
cf1 = tf.constant([2.0, 1.0, 3.0])
print(broadcast_cash_flows(cf1))
cf2 = tf.constant([[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]])
print(broadcast_cash_flows(cf2))
cf3 = tf.constant([[[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]],
[[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]]])
print(broadcast_cash_flows(cf3))
tf.expand_dims is convenient when you want to add one dimension.
tf.newaxis is convenient when you want to add multiple dimensions in one operation (instead of calling tf.expand_dims multiple times).
Modified Code -
import tensorflow as tf
def broadcast_cash_flows(x):
    shape = tf.shape(x)
    dimensions = len(shape)
    if dimensions == 0:
        return x[tf.newaxis, tf.newaxis, tf.newaxis]
    elif dimensions == 1:
        return x[tf.newaxis, tf.newaxis, :]
    elif dimensions == 2:
        return x[tf.newaxis, :, :]
    else:
        return x
cf0 = tf.constant(2.0)
print(broadcast_cash_flows(cf0))
cf1 = tf.constant([2.0, 1.0, 3.0])
print(broadcast_cash_flows(cf1))
cf2 = tf.constant([[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]])
print(broadcast_cash_flows(cf2))
cf3 = tf.constant([[[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]],
[[2.0, 1.0, 3.0],
[3.0, 2.0, 4.0]]])
print(cf3.shape)
print(broadcast_cash_flows(cf3))
Output -
tf.Tensor([[[2.]]], shape=(1, 1, 1), dtype=float32)
tf.Tensor([[[2. 1. 3.]]], shape=(1, 1, 3), dtype=float32)
tf.Tensor(
[[[2. 1. 3.]
[3. 2. 4.]]], shape=(1, 2, 3), dtype=float32)
(2, 2, 3)
tf.Tensor(
[[[2. 1. 3.]
[3. 2. 4.]]
[[2. 1. 3.]
[3. 2. 4.]]], shape=(2, 2, 3), dtype=float32)
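As a side note, if you prefer a single rank-generic code path over the if/elif chain, one possible sketch (my own variation, not required by the question) pads the shape vector with leading ones and reshapes:
import tensorflow as tf

def broadcast_cash_flows_generic(x):
    # prepend enough 1s to the shape so the result always has rank 3
    pad = tf.ones([3 - tf.rank(x)], dtype=tf.int32)
    return tf.reshape(x, tf.concat([pad, tf.shape(x)], axis=0))

print(broadcast_cash_flows_generic(tf.constant(2.0)).shape)         # (1, 1, 1)
print(broadcast_cash_flows_generic(tf.constant([2.0, 1.0])).shape)  # (1, 1, 2)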
I have the following code:
import numpy as np
import tensorflow as tf
a = np.array([0.5, 0.5])
b = np.array([0.2, 0.2, 0.0, 0.0])
non_zeros = ~tf.equal(b, 0.)
cast_op = tf.cast(non_zeros, tf.float64)
new_vec = tf.multiply(a, cast_op) # won't work
# the required output is [0.5, 0.5, 0.0, 0.0]
I am trying to obtain the vector [0.5, 0.5, 0.0, 0.0] as explained in the code. Does anyone know how to do this? I also looked at tf.fill but that takes a scalar value, so won't work for me.
You get an error because tf.multiply requires its inputs to have broadcast-compatible shapes, and shapes (2,) and (4,) are not. What you could do, however, is simply this:
a = np.array([0.5, 0.5])
b = np.array([0.2, 0.2, 0.0, 0.0])
b = np.logical_and(b, np.ones(b.shape)).astype(float)
a = np.concatenate((a, np.zeros(b.shape[0] - a.shape[0])))
new_vec = a * b
You can exploit the broadcasting capability of the tf.multiply op.
I've added the shape of each tensor as a comment next to every line: note the use of tf.expand_dims to add a dimension of size 1 to the a tensor in order to get, after the multiplication, a tensor of shape (2, 4).
Since this tensor consists of repeated values (2 equal rows of 4 columns), we can just take the first row:
import numpy as np
import tensorflow as tf
a = np.array([0.5, 0.5]) #(2)
b = np.array([0.2, 0.2, 0.0, 0.0]) #(4)
non_zeros = ~tf.equal(b, 0.) #(4)
cast_op = tf.cast(non_zeros, tf.float64) # (4)
new_vec = tf.multiply(tf.expand_dims(a, axis=1),
                      cast_op) # (2, 1) * (4) = (2, 4)
new_vec = new_vec[0, :] # (4)
print(new_vec)
sess = tf.InteractiveSession()
print(sess.run(new_vec))
This code produces [0.5 0.5 0. 0.]
I would like to convert a vector to a symmetric Toeplitz matrix using Tensorflow operations like this:
a = tf.placeholder(tf.float32, shape=[vector_size])
A = some_tensorflow_operation(a)
where the shape of A is [vector_size, vector_size]. The relation between the two variables is as below.
a = [a1,a2,a3]
A = [[a1,a2,a3],[a2,a1,a2],[a3,a2,a1]]
What is the simplest way to do it?
In case vector_size=3:
>>> a = tf.placeholder(tf.float32, shape=[vector_size])
>>> A = [[a[0],a[1],a[2]],[a[1],a[0],a[1]],[a[2],a[1],a[0]]]
>>> sess = tf.Session()
>>> sess.run(A, {a: [1, 2, 3]})
[[1.0, 2.0, 3.0], [2.0, 1.0, 2.0], [3.0, 2.0, 1.0]]
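For arbitrary vector_size, a general sketch (not part of the answer above) builds the index matrix |i - j| once with NumPy and gathers from a:
import numpy as np
import tensorflow as tf

vector_size = 3
a = tf.placeholder(tf.float32, shape=[vector_size])

# idx[i, j] = |i - j| selects the right element of `a` for each position
idx = np.abs(np.arange(vector_size)[:, None] - np.arange(vector_size)[None, :])
A = tf.gather(a, idx)  # shape (vector_size, vector_size)

with tf.Session() as sess:
    print(sess.run(A, {a: [1, 2, 3]}))
# [[1. 2. 3.]
#  [2. 1. 2.]
#  [3. 2. 1.]]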