Set ANOVA contrast in Python OLS

Is there a way to define the ANOVA contrasts in Python when using statsmodels' ols(...).fit()?
I am trying to port the R setting contrasts = c("contr.sum", "contr.sum") to Python, without success.
results = ols('score ~ C(Var3) + C(Var1) + C(Var2)', data=Dataset).fit()
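One way to get sum-to-zero ("effects") coding, which is what R's contr.sum produces, is to request it in the patsy formula itself. A minimal sketch, assuming Dataset is a pandas DataFrame with the columns from the question:
import statsmodels.api as sm
import statsmodels.formula.api as smf
# C(..., Sum) asks patsy for sum-to-zero coding, the analogue of R's contr.sum
results = smf.ols('score ~ C(Var3, Sum) + C(Var1, Sum) + C(Var2, Sum)', data=Dataset).fit()
# with sum contrasts in place, a Type III ANOVA table is meaningful
print(sm.stats.anova_lm(results, typ=3))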

Related

Control for a variable in regression (Python)

I am trying to fit a multiple linear regression using statsmodels while controlling for one variable, but I am not sure how to control for a variable in Python.
import statsmodels.formula.api as smf
results = smf.ols('y ~ Var1 + Var2', data=df).fit()
print(results.summary())
How can I control for Var2 in this example? Thank you.
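For context, in OLS "controlling for" a covariate simply means including it as a regressor, so the formula above already controls for Var2: the Var1 coefficient is then its partial effect with Var2 held fixed. A minimal sketch of reading that off (same hypothetical column names):
import statsmodels.formula.api as smf
# Var2 is on the right-hand side, so it is already "controlled for";
# the Var1 coefficient is the effect of Var1 holding Var2 constant
results = smf.ols('y ~ Var1 + Var2', data=df).fit()
print(results.params['Var1'])  # partial effect of Var1, net of Var2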

Running lmer (Linear Mixed Effects Regression) in Python

I'd like to ask some questions about running lmer (Linear Mixed Effects Regression) models in Python.
Here are the two models (formulas) that I ran with the lme4 package in R. Is there any way to fit the models below in Python?
TEST1 <- lmer(score ~ p1 + p2 + p3 + (1|v1) + (1|v2), data = df, control = lmerControl(boundary.tol = 1e-4, calc.derivs = FALSE))
TEST2 <- lmer(score ~ (1|v1) + (1|v2), data = df, control = lmerControl(boundary.tol = 1e-4, calc.derivs = FALSE))
If you aren't required to actually run the model in Python, you could call and run the LMER models in R directly from your Python environment.
You could do this through Rpy2 & rmagic or through Pymer4. Both options let you use the lme4 package in R while calling it from a Python environment such as Jupyter notebooks.
I wrote a tutorial on how you could do this with examples that is available here: https://towardsdatascience.com/how-to-run-linear-mixed-effects-models-in-python-jupyter-notebooks-4f8079c4b589
As EJJ noted, there are implementations of LMER in Python, such as in statsmodels and TensorFlow, but they appear less intuitive to use than the method above.
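A rough sketch of the Pymer4 route (untested; it assumes pymer4 plus a working R/lme4 installation, and reuses the data frame and column names from the question):
from pymer4.models import Lmer
# Pymer4 hands the formula to lme4 in R, so the lmer syntax carries over
model = Lmer('score ~ p1 + p2 + p3 + (1|v1) + (1|v2)', data=df)
model.fit()
print(model.coefs)  # fixed-effect estimates, as reported by lme4
For the pure-Python route, statsmodels' MixedLM can express two crossed random intercepts through variance components. One hedged pattern uses a single dummy grouping column spanning all rows, so v1 and v2 are crossed:
import statsmodels.formula.api as smf
df['g'] = 1  # one group containing every row
model = smf.mixedlm('score ~ p1 + p2 + p3', data=df, groups='g',
                    vc_formula={'v1': '0 + C(v1)', 'v2': '0 + C(v2)'})
result = model.fit()
print(result.summary())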

Create Normal Curve using TabPy

I have previously made normal curves following the excellent blog post by Robin Kennedy:
https://public.tableau.com/en-us/s/blog/2013/11/fitting-normal-curve-histogram
But when it comes to TabPy, I am failing to do so. Running Python code inside Tableau requires some tweaking, and some basic Python functions fail in Tableau.
Even when I follow the blog step by step and adapt the code the way TabPy requires, the final formula that draws the curve,
(1 / ([St Dev] * SQRT(2*PI()))) * EXP(-((ATTR([Profit Bin]) - [Mean])^2 / (2*[St Dev]^2))) * [Profit Bin Size] * TOTAL(SUM([Number of Records]))
is proving difficult for me to recreate.
What I have written so far is:
SCRIPT_REAL("
import numpy as np
import matplotlib.mlab as mlab
import math
sigma = maths.sqrt(_arg1)
x = np.linspace(_arg2 - 3*_arg3, _arg2 + 3*_arg3, 100)
", FLOAT([Variance]), FLOAT([Mean]), FLOAT([Std Dev]) )
I am not sure how to proceed from here, that is, how to plot the curve. Any advice on this?
I have made the histogram using Tableau bins, but I need to create the curve using Python.
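For what it's worth, a SCRIPT_REAL calculation has to return one real per row of the partition, so one hedged, untested sketch (field names taken from the blog post and treated as assumptions) is to evaluate the scaled normal density at each bin value and return the whole list:
SCRIPT_REAL("
import math
mean = _arg1[0]
sd = _arg2[0]
bin_size = _arg3[0]
total = _arg4[0]
# one scaled density value per bin, mirroring the Tableau formula above
return [(1.0 / (sd * math.sqrt(2 * math.pi)))
        * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))
        * bin_size * total
        for x in _arg5]
",
MIN([Mean]), MIN([Std Dev]), MIN([Profit Bin Size]),
TOTAL(SUM([Number of Records])), ATTR([Profit Bin]) )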

Call R library DirichletReg from Python using rpy2

I'm trying to do Dirichlet regression in Python. Unfortunately I cannot find a Python package that does the job, so I tried to call the R library DirichletReg using rpy2. However, it is not obvious to me how to call a regression function such as DirichReg(Y ~ X1 + X2 + X3, data=predictorData), where Y = DR_data(compositionalData). I saw an example of calling the linear regression function lm in the rpy2 documentation, but my case is slightly different: Y is not a column name in the table but an R object created by DR_data.
I'm wondering what the proper way is to do this, or whether there is a Python package for Dirichlet Regression.
You can send objects into the Formula's environment from Python. This example is adapted from the rpy2 docs:
from rpy2.robjects import IntVector, Formula
from rpy2.robjects.packages import importr
stats = importr('stats')
x = IntVector(range(1, 11))
y = x.ro + stats.rnorm(10, sd=0.2)  # .ro gives element-wise R arithmetic
fmla = Formula('y ~ x')
env = fmla.environment  # the R environment attached to the formula
env['x'] = x            # bind the variables the formula refers to
env['y'] = y
fit = stats.lm(fmla)    # lm() resolves x and y from the formula's environment
You can also create named variables in the R global environment (outside the Formula). See here. Worst case, you move some of your Python data into R through rpy2 and then issue the commands directly in R through the rpy2 bridge, as described here.
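Applying the same trick to DirichletReg might look like the following untested sketch (toy data; it assumes the DirichletReg package is installed and that DirichReg resolves variables from the formula's environment the way lm does):
from rpy2 import robjects as ro
from rpy2.robjects import Formula
from rpy2.robjects.packages import importr
dreg = importr('DirichletReg')
# toy compositional responses: each row sums to 1
comp = ro.r.matrix(ro.FloatVector([0.2, 0.3, 0.5,
                                   0.1, 0.6, 0.3]),
                   nrow=2, byrow=True)
Y = dreg.DR_data(comp)  # wrap the matrix as a DR_data object
fmla = Formula('Y ~ X1')
env = fmla.environment
env['Y'] = Y  # Y lives in the formula's environment,
env['X1'] = ro.FloatVector([1.0, 2.0])  # so it need not be a data-frame column
fit = dreg.DirichReg(fmla)
print(ro.r.summary(fit))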

Slow glm calculation when using rpy2

I want to calculate logistic regression parameters using R's glm function. I'm working in Python and using rpy2 for that.
For some reason, when I run glm directly in R I get much faster results than through rpy2. Do you know why the calculations using rpy2 are much slower?
I'm using R 2.13.1 and rpy2 2.0.8.
Here is the code I'm using:
import numpy
from rpy2 import robjects as ro
import rpy2.rlike.container as rlc

def train(self, x_values, y_values, weights):
    x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
    y_float_vector = ro.FloatVector(y_values)
    weights_float_vector = ro.FloatVector(weights)
    names = ['v' + str(i) for i in xrange(len(x_float_vector))]
    d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
    data = ro.RDataFrame(d)
    formula = 'y ~ ' + ' + '.join(names)
    fit_res = ro.r.glm(formula=ro.r(formula), data=data,
                       weights=weights_float_vector,
                       family=ro.r('binomial(link="logit")'))
Without the full R code you are benchmarking against, it is difficult to precisely point out where the problem might be.
You might want to run this through a Python profiler to see where the bottlenecks are.
Finally, the current release of rpy2 is 2.2.6. Besides API changes, it runs faster and (presumably) has fewer bugs than 2.0.8.
Edit: from your comments I now suspect that you are calling your function in a loop, and that a large fraction of the time is spent building R vectors (which might only need to be built once).
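If that is the case, a hedged sketch of the fix (make_r_dataframe and weight_sets are hypothetical stand-ins; the former would wrap the TaggedList/RDataFrame steps above) is to build the data frame and formula once and rebuild only the weights inside the loop:
from rpy2 import robjects as ro
# converting Python data to R objects is the expensive part, so do it once
data = make_r_dataframe(x_values, y_values)  # hypothetical helper
r_formula = ro.r('y ~ v0 + v1 + v2')  # assumes three predictor columns
for weights in weight_sets:  # hypothetical iterable of weight vectors
    w = ro.FloatVector(weights)  # only the weights change per fit
    fit = ro.r.glm(formula=r_formula, data=data, weights=w,
                   family=ro.r('binomial(link="logit")'))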
