lmerTest in R versus lme in statsmodels/Python - python

I do have a complex linear mixed effect model.
model_lme = MixedLM.from_formula("Ratings ~ LCDev455*ST0LT1*TargetType*Female0Male1", groups=data["ResponseID"], data=data)
model_lme = model_lme.fit()
model_lme.summary()
This perfectly works in R, lmerTest. Not even a convergence warning. But in python statsmodels, it gives the error: Singular matrix.
I thought that these two packages (lmerTest in R, lme in Python) were giving the same output.
Does anyone have any idea why this might happen? I am more of a beginner so I cannot really simulate my data here.
Thank you!

Related

IV2SLS output summary much different than previous OLS estimation

I ran a different version of Mincer´s equation to estimate salary. Firstly, I ran an OLS version without considering endogeneity and the results are the following:Output summary
Salario is actually the natural log of it.
After that, I wrote next code to get an estimation with 2SLS method to solve endogeneity in variables No_Feliz and Años_Edu using No_Dep and max_edu_padres as instrumental variables for each one. However, the output is a bit confusing and I don´t know how to deal with it.
from statsmodels.sandbox.regression.gmm import IV2SLS
resultIV = IV2SLS(_dfb['Salario'], _dfb[['No_Feliz','Años_Edu']], _dfb[['No_Dep','Estatura','Años_Expe','Edad','Edad_2','Peso','Peso_2','DI','D_Hombre']]).fit()
resultIV.summary()
Results: Output summary IV2SLS
It´s clear the output has some issues (R2 too high compared to ols result, no result for F-statistic, coef of No_Feliz has a positive sign while it's negative in OLS estimation, the other exogenous variables are not taken into account despite the fact I included them)
I´d appreciate if someone could help me to fix it or at least make things a bit more clear to me. Thank you very much!
Don't use statsmodels's sandbox version which is not well tested. Instead use linearmodels. It is fully tested and correct.
Start by installing using pip install linearmodels
Then you need to be explicit about which variables are endogenous and need instrumenting, which are exogenous so in the model but not instrumented, and which are instruments only.
from linearmodels import IV2SLS
dep = _dfb['Salario']
exog = None
endog = _dfb[['No_Feliz','Años_Edu']]
instr = _dfb[['No_Dep','Estatura','Años_Expe','Edad','Edad_2','Peso','Peso_2','DI','D_Hombre']]
resultIV = IV2SLS(dep, exog, endog, instr).fit()
resultIV.summary # Note: this is a property so no ()
You can find the help which has more details. There are also some examples.

How to change arrays of shape (60,58) (60,59) to be equal

I'm writing a program that is for a friend of mine that is currently studying Aeronautical Engineering. I'm trying to test if the math I've implemented works. For those who know, I'm trying to calculate the divergence (I think I'm not an engineer and I'm not going to pretend that I am).
He sent me a Stack overflow link to a how he thinks this should be done. (The thread can be found here. His version doesn't work for me as it gives me a Numpy error as seen below:
numpy.core._internal.AxisError: axis 1 is out of bounds for array of
dimension 1
Now I've tried a different method that gives me a different error as seen below:
ValueError: operands could not be broadcast together with shapes (60,58)
(60,59)
This method gives me the error above and I'm not entirely sure how to fix it. I've put the code that gives me the above error.
velocity = np.diff(c_flow)/np.diff(zex)
ucom = velocity.real
vcom = -(velocity.imag)
deltau = np.divide((np.diff(ucom)),(np.diff(x)))
deltav = np.divide((np.diff(vcom)),np.diff(y))
print(deltau + deltav)
Note: C_flow is defined earlier in the program and is the complex potential. zex is also defined earlier as an early form of the complex variable. x and y are two coordinate matrices from coordinate vectors.
The expected results from the print statement should be zero or a value that is very close to zero. (I'm not entirely sure what the value should be but as I've said, I'm not an engineer)
Thank you in advance
EDIT:
After following BenT's advice I used np.gradient and np.sum but this was adding the axis in the wrong direction so to counteract this the I separated the two functions as seen below:
velocity = np.diff(c_flow)/np.diff(z)
grad = (np.gradient(velocity))
divergence = np.sum(grad, axis=0)
print(np.average(divergence))
print(np.average(velocity))

Parallel Scipy COO Matrix Computations

I am trying to calculate sparse matrix calculations using scipy for an algorithm that require intensive dependent computations(PageRank) on very large RDF datasets. I want to use multiple cores for the scipy calculation within the following code
F = sparse.coo_matrix((y['data'],(y['row'],y['col'])),shape=y['shape'])
W = sparse.coo_matrix((y['data'],(y['row'],y['col'])),shape=y['shape'])
P = sparse.bmat([[None, W], [F, None]])
previous = np.ones(n)/n
ones = np.ones(n)/n
while error > epsilon:
tmp = np.array(previous)
previous = damping*P.T.dot(previous) + (1-damping)*ones
error = np.linalg.norm(tmp - previous)
if(printerror):
print(error)
I have searched every possible answer I could find and I tried integrating the mkl(anaconda build) within the code but the performance on multiple cores does not seem to scale up. I have come to an understanding that the scipy call csr.h does not make use of BLAS call, I am wondering whether I need to make changes and replace the call to csr_matvec in from scipy/sparsetools with an appropriate Sparse BLAS call since MKL has those and then link scipy to mkl. Am I understanding something wrong or missing something. I would really appreciate some help in the matter. One similar question is here Thanks!!

Optimising python function with numba

I am trying to speed up a python function using numba, however I cannot seem to make it compile.
The input for my function is a 27x4 array of type np.int32.
My function is:
#nb.jit(nopython=True)
def edge_profile(input):
pos = input[:,:3]
val = input[:,3]
centre = np.mean(pos,axis=0).astype(np.int32)
diff = np.absolute(pos-centre).sum(axis=1)
cell_edge = np.zeros(3)
for i in range(3):
idx = np.where(diff==i+1)[0]
idy = np.where(val[idx]==1)[0]
cell_edge[i] = len(idy)
return cell_edge.astype(np.int32)
However this produces an extremely large error message which I have unable to use to diagnose the problem. I have tried specifying the input types as follows:
#nb.jit(nb.int32[:](nb.int32[:,:]))
def ...
however this produces an equally large error message.
I fell that I am probably using some function/feature that is not supported in numba, but I do not know enough about it to identify the problem. Any help would be greatly appreciated.
Numba should work ok so long as you stick to basic lists and arrays in the function you want to speed up. It appears that you are already using functions from numpy that are probably already well optimized. So its unlikely you will see a speed up even if you did get it to work. You haven't mentioned what your OS is. Under ubuntu 14.04 you can get it to work through some steps outlined here.

Theano matrix multiplication

I have a piece of code that is supposed to calculate a simple
matrix product, in python (using theano). The matrix that I intend to multiply with is a shared variable.
The example is the smallest example that demonstrates my problem.
I have made use of two helper-functions. floatX converts its input to something of type theano.config.floatX
init_weights generates a random matrix (in type floatX), of given dimensions.
The last line causes the code to crash. In fact, this forces so much output on the commandline that I can't even scroll to the top of it anymore.
So, can anyone tell me what I'm doing wrong?
def floatX(x):
return numpy.asarray(x,dtype=theano.config.floatX)
def init_weights(shape):
return floatX(numpy.random.randn(*shape))
a = init_weights([3,3])
b = theano.shared(value=a,name="b")
x = T.matrix()
y = T.dot(x,b)
f = theano.function([x],y)
This work for me. So my guess is that you have a problem with your blas installation. Make sure to use Theano development version:
http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
It have better default for some configuration. If that do not fix the problem, look at the error message. There is main part that is after the code dump. After the stack trace. This is what is the most useful normally.
You can disable direct linking by Theano to blas with this Theano flag: blas.ldflags=
This can cause slowdown. But it is a quick check to confirm the problem is blas.
If you want more help, dump the error message to a text file and put it on the web and link to it from here.

Categories

Resources