I'd like to ask some questions about running lmer (Linear Mixed Effects Regression) models in Python.
Here are the two lines (formulas) that I ran with the lme4 package in R. Is there any way I could fit these models in Python?
TEST1 <- lmer(score ~ p1 + p2 + p3 + (1|v1) + (1|v2), data = df, control = lmerControl(boundary.tol = 1e-4, calc.derivs = FALSE))
TEST2 <- lmer(score ~ (1|v1) + (1|v2), data = df, control = lmerControl(boundary.tol = 1e-4, calc.derivs = FALSE))
If you aren't required to actually run the model in Python, you could call and run the LMER models in R directly from your Python environment.
You could do this through rpy2 and rmagic, or through pymer4. Both options let you use the lme4 package in R while calling it from a Python environment such as Jupyter notebooks.
I wrote a tutorial, with examples, on how to do this: https://towardsdatascience.com/how-to-run-linear-mixed-effects-models-in-python-jupyter-notebooks-4f8079c4b589
As EJJ noted, there are implementations of LMER in Python, such as in statsmodels and TensorFlow, but they appear less intuitive to use than the approach above.
Is there a way to define the contrasts for an ANOVA in Python using ols().fit()?
I've been trying to translate the R code Contrast("Contr.sum", "Contr.sum") to Python, without success.
Results=ols('score ~ C(Var3) + C(Var1) + C(Var2)', data=Dataset).fit()
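A sketch assuming statsmodels' formula interface, which is built on patsy: a contrast can be set per factor inside C(), and Sum requests sum-to-zero (deviation) coding, the counterpart of R's contr.sum. The data frame is synthetic; only the column names come from the question.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in for the question's Dataset
rng = np.random.default_rng(0)
n = 120
Dataset = pd.DataFrame({
    "Var1": rng.choice(["a", "b", "c"], size=n),
    "Var2": rng.choice(["x", "y"], size=n),
    "Var3": rng.choice(["p", "q", "r", "s"], size=n),
    "score": rng.normal(size=n),
})

# C(factor, Sum) asks patsy for sum-to-zero coding, like R's contr.sum
results = ols("score ~ C(Var3, Sum) + C(Var1, Sum) + C(Var2, Sum)",
              data=Dataset).fit()

# Type III sums of squares are the usual companion of sum-to-zero coding
table = anova_lm(results, typ=3)
print(table)
```

The coefficient names in results.params then carry an [S.level] suffix, which is a quick way to confirm that the sum coding was applied.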
I'm creating a time-series forecasting model with external, controllable features, similar to the "Forecasting Demand for Electricity" example found at https://medium.com/tensorflow/structural-time-series-modeling-in-tensorflow-probability-344edac24083. To model the influence of the external factors, I am using sts.LinearRegression() as a component of my model, but those external factors are highly non-linear, and this is causing unwanted negative predictions.
I've tried building simpler forecasts outside of TFP STS and found that a RandomForestRegressor works much better than a LinearRegressor for these external features. What I'd LIKE to do is replace sts.LinearRegression() with something like an sts.RandomForestRegressor(), but no such component exists in the sts library. In fact, there are hardly any options available in the sts library: https://www.tensorflow.org/probability/api_docs/python/tfp/sts/LinearRegression
I've also tried log-transforming my target variable, but it contains many zeros (log(0) is -inf), so this doesn't turn out to be a useful transformation.
My model architecture for TFP STS looks something like this:
import numpy as np
import tensorflow as tf
from tensorflow_probability import sts
# recent_publicity and active_ad are the external feature arrays, defined elsewhere
def build_model(observed_time_series):
    season_effect = sts.Seasonal(
        num_seasons=4, num_steps_per_season=13,
        observed_time_series=observed_time_series,
        name='season_effect')
    # Mean-centred external features enter linearly through a design matrix
    marketing_effect = sts.LinearRegression(
        design_matrix=tf.stack([recent_publicity - np.mean(recent_publicity),
                                active_ad - np.mean(active_ad)], axis=-1),
        name='marketing_effect')
    autoregressive = sts.Autoregressive(
        order=1,
        observed_time_series=observed_time_series,
        name='autoregressive')
    model = sts.Sum([season_effect,
                     marketing_effect,
                     autoregressive],
                    observed_time_series=observed_time_series)
    return model
I want to change the "marketing_effect" component of this model to something non-linear.
Is my only option here to clone the TFP STS library and create a custom function to handle non-linear data with something like a Random Forest Regressor? Does anyone know of a better option?
I'm not familiar with the use of random forests in STS models. Can you point to a system where this exists? The trick with tfp.sts is that the math all works out nicely and analytically because everything is marginally Gaussian. If we can make that work, I think we're definitely open to bringing in other models.
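To make the "marginally Gaussian" remark concrete, here is a sketch in my own notation (not TFP's): tfp.sts components are assembled into a linear Gaussian state-space model, written below with the regression term as a mean offset on the observations. Conditional on the parameters, every observation is an affine function of Gaussian variables, and affine maps of Gaussians stay Gaussian, so the likelihood can be computed exactly with the Kalman filter.

$$
\begin{aligned}
z_t &= F_t\, z_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, Q_t),\\
y_t &= H_t\, z_t + x_t^\top \beta + \delta_t, \qquad \delta_t \sim \mathcal{N}(0, \sigma^2),\\
z &\sim \mathcal{N}(\mu, \Sigma) \;\Longrightarrow\; A z + b \sim \mathcal{N}\!\left(A\mu + b,\; A \Sigma A^\top\right).
\end{aligned}
$$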
I would really appreciate some help with running code written in Python 3 from Matlab.
My Python code loads various libraries and uses them to numerically integrate a differential equation (the solution is stored in the NumPy vector e_array).
The Python code which I would like to call from Matlab is the following:
from numba import jit
from scipy.integrate import quad
import numpy as np
#jit(nopython = True)
def integrand1(x,e,delta,r):
    return (-2*np.sqrt(e*r)/np.pi)*(x/np.sqrt(1-x**2))/(1+(delta+2*x*np.sqrt(e*r))**2)
#jit(nopython = True)
def f1(e,delta,r):
    return quad(integrand1, -1, 1, args=(e,delta,r))[0]
#jit(nopython = True)
def runge1(e,dtau,delta,r):
    k1 = f1(e,delta,r)
    k2 = f1((e+k1*dtau/2),delta,r)
    k3 = f1((e+k2*dtau/2),delta,r)
    k4 = f1((e+k3*dtau),delta,r)
    return e + (dtau/6)*(k1+2*k2+2*k3+k4)
time_steps = 60
e = 10
dtau=1
r=1
delta=-1
e_array = np.zeros(time_steps)
time = np.zeros(time_steps)
for i in range(time_steps):
    e_array[i] = e
    time[i] = i*dtau
    e = runge1(e,dtau,delta,r)
Ideally, I would like to be able to call this Python code (pythoncode.py) in Matlab as if it were a Matlab function and feed it the parameters: time_steps, e, dtau, r and delta. I would be very happy with a solution which looks like this:
e_array = pythoncode.py(time_steps = 60, e = 10, dtau = 1, r = 1, delta = -1)
where pythoncode.py is treated as a Matlab function which takes said parameters, feeds them into the Python code and returns the Matlab vector e_array.
There are several other Python scripts that I'd like to call from Matlab, so I'm hoping to get a general idea of how to do this from your answers regarding this specific script.
A related question concerns the Python libraries used in the code: is there a way to "compile" the Python code so that I can call it from Matlab without installing the libraries it uses (e.g., numba) on the computer running the Matlab code?
Thanks very much for helping,
Asaf
You'll probably need to shell escape out of Matlab to invoke Python: prefix the command you would run in the shell with !.
Matlab Shell Escape Functions suggests saving a .mat file and then opening it in your Python code; see Read .mat files in Python.
As for compiling the Python code, take a look at How to compile a Python file and see if that helps.
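A sketch of the shell-escape plus .mat-file route, assuming the script is reorganised into a function with a small command-line entry point. The function name solve_e_array, the argument order and the default output name e_array.mat are hypothetical choices; the integrand and RK4 step are the question's own (the commented-out numba lines and the unused time array are dropped), and scipy.io.savemat writes a file MATLAB can load.

```python
"""pythoncode.py: hypothetical layout for calling the RK4 solver from MATLAB.

Usage from a shell (or from MATLAB with the ! prefix):
    python pythoncode.py time_steps e dtau r delta [out.mat]
"""
import sys

import numpy as np
from scipy.integrate import quad
from scipy.io import savemat


def integrand1(x, e, delta, r):
    return (-2 * np.sqrt(e * r) / np.pi) * (x / np.sqrt(1 - x**2)) / (
        1 + (delta + 2 * x * np.sqrt(e * r)) ** 2)


def f1(e, delta, r):
    return quad(integrand1, -1, 1, args=(e, delta, r))[0]


def runge1(e, dtau, delta, r):
    # One classical fourth-order Runge-Kutta step
    k1 = f1(e, delta, r)
    k2 = f1(e + k1 * dtau / 2, delta, r)
    k3 = f1(e + k2 * dtau / 2, delta, r)
    k4 = f1(e + k3 * dtau, delta, r)
    return e + (dtau / 6) * (k1 + 2 * k2 + 2 * k3 + k4)


def solve_e_array(time_steps, e, dtau, r, delta):
    """Return e at each of time_steps steps, starting from the initial e."""
    e_array = np.zeros(time_steps)
    for i in range(time_steps):
        e_array[i] = e
        e = runge1(e, dtau, delta, r)
    return e_array


def main(argv):
    if len(argv) < 6:
        print(__doc__)
        return 1
    time_steps = int(argv[1])
    e, dtau, r, delta = (float(v) for v in argv[2:6])
    out_file = argv[6] if len(argv) > 6 else "e_array.mat"
    # Save the result under the variable name e_array for MATLAB's load()
    savemat(out_file, {"e_array": solve_e_array(time_steps, e, dtau, r, delta)})
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

From Matlab, something like !python pythoncode.py 60 10 1 1 -1 followed by load('e_array.mat') should then bring e_array into the workspace; the Python executable name depends on your installation, and NumPy and SciPy still have to be installed on that machine.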
I'm trying to do Dirichlet regression in Python. Unfortunately, I cannot find a Python package that does the job, so I tried to call the R library DirichletReg through rpy2. However, it is not obvious to me how to call a regression function such as DirichReg(Y ~ X1 + X2 + X3, data=predictorData), where Y = DR_data(compositionalData). I saw an example of calling the linear regression function lm in the rpy2 documentation, but my case is slightly different, because Y is not a column name in the table but an R object created by DR_data.
I'm wondering what the proper way is to do this, or whether there is a Python package for Dirichlet Regression.
You can send objects into the environment of a Formula from Python. This example is from the rpy2 docs:
import array
from rpy2.robjects import IntVector, Formula
from rpy2.robjects.packages import importr
stats = importr('stats')
x = IntVector(range(1, 11))
y = x.ro + stats.rnorm(10, sd=0.2)
fmla = Formula('y ~ x')
env = fmla.environment
env['x'] = x
env['y'] = y
fit = stats.lm(fmla)
You can also create named variables in the R environment (outside the Formula). See here. In the worst case, you can move some of your Python data into R through rpy2 and then issue the commands directly in R through the rpy2 bridge, as described here.
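If a pure-Python route is acceptable, the "common" parametrization of Dirichlet regression (component k has concentration alpha_k = exp(X beta_k), with one coefficient vector per component) can be fitted by maximum likelihood with SciPy. This is a swapped-in technique rather than DirichletReg's own code: it returns point estimates only (no standard errors, no alternative parametrization), and the function name and simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import digamma, gammaln


def fit_dirichlet_regression(X, Y):
    """Maximum-likelihood Dirichlet regression, 'common' parametrization.

    X: (n, p) design matrix (include a column of ones for an intercept).
    Y: (n, K) compositions; rows strictly positive and summing to 1.
    Returns B of shape (p, K) with alpha_ik = exp(X[i] @ B[:, k]).
    """
    n, p = X.shape
    K = Y.shape[1]
    log_y = np.log(Y)

    def neg_mean_loglik(theta):
        B = theta.reshape(p, K)
        alpha = np.exp(X @ B)          # concentrations, shape (n, K)
        alpha0 = alpha.sum(axis=1)     # row totals, shape (n,)
        ll = np.sum(gammaln(alpha0)
                    - gammaln(alpha).sum(axis=1)
                    + ((alpha - 1.0) * log_y).sum(axis=1))
        # d ll / d alpha_ik, then chain rule through alpha = exp(X @ B)
        d_alpha = digamma(alpha0)[:, None] - digamma(alpha) + log_y
        grad = X.T @ (d_alpha * alpha)
        # Averaging over rows keeps the optimizer's step sizes well scaled
        return -ll / n, -grad.ravel() / n

    result = minimize(neg_mean_loglik, np.zeros(p * K), jac=True, method="BFGS")
    return result.x.reshape(p, K)


# Simulated example: intercept plus two predictors, three components
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
B_true = np.array([[1.0, 0.5, 0.8],
                   [0.3, -0.4, 0.0],
                   [-0.2, 0.2, 0.5]])
alpha_true = np.exp(X @ B_true)
Y = np.array([rng.dirichlet(a) for a in alpha_true])

B_hat = fit_dirichlet_regression(X, Y)
print(np.round(B_hat, 2))
```

The gradient is analytic: the derivative of the log-likelihood with respect to alpha_ik is digamma(alpha0_i) - digamma(alpha_ik) + log y_ik, multiplied by alpha_ik X_i because of the exponential link.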
I want to calculate logistic regression parameters using R's glm function. I'm working in Python and using rpy2 for that.
For some reason, running glm directly in R gives much faster results than running it through rpy2. Do you know why the calculations through rpy2 are much slower?
I'm using R 2.13.1 and rpy2 2.0.8.
Here is the code I'm using:
import numpy
from rpy2 import robjects as ro
import rpy2.rlike.container as rlc
def train(self, x_values, y_values, weights):
    # One R float vector per predictor column, plus the response and the weights
    x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
    y_float_vector = ro.FloatVector(y_values)
    weights_float_vector = ro.FloatVector(weights)
    names = ['v' + str(i) for i in xrange(len(x_float_vector))]
    d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
    data = ro.RDataFrame(d)
    # Builds 'y ~ v0+v1+...+vN'
    formula = 'y ~ ' + '+'.join(names)
    fit_res = ro.r.glm(formula=ro.r(formula), data=data, weights=weights_float_vector, family=ro.r('binomial(link="logit")'))
    return fit_res
Without the full R code you are benchmarking against, it is difficult to precisely point out where the problem might be.
You might want to run this through a Python profiler to see where the bottlenecks are.
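A minimal profiling sketch with the standard library's cProfile and pstats. The functions build_vectors, fit and train below are hypothetical pure-Python stand-ins for the conversion work and the glm call; in practice you would profile your real train() loop the same way and look at which entries dominate the cumulative time.

```python
import cProfile
import io
import pstats


def build_vectors(rows):
    # Hypothetical stand-in for the per-call conversion work (e.g. FloatVector building)
    return [[float(v) for v in row] for row in rows]


def fit(columns):
    # Hypothetical stand-in for the model-fitting call itself
    return sum(sum(col) for col in columns)


def train(rows):
    return fit(build_vectors(rows))


rows = [[i, i + 1, i + 2] for i in range(20000)]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):  # mimic calling train() repeatedly in a loop
    train(rows)
profiler.disable()

# Print the ten most expensive entries, sorted by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```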
Finally, the current release of rpy2 is 2.2.6. Besides API changes, it runs faster and has (presumably) fewer bugs than 2.0.8.
Edit: From your comments, I now suspect that you are calling your function in a loop and that a large fraction of the time is spent building R vectors (which might only need to be built once).