I am trying to define a splunk dashboard equivalent to this code in python
from scipy import stats
stats.beta.cdf(x, T, F) - stats.beta.cdf(y, T, F)
Where x and y are splunk expressions (defined with splunk's eval).
I saw a lot of complex stuff (classifiers, anomaly detection, etc...) when looking at the splunk docs, but I couldn't find any reference to known distribution functions such as Beta and Gamma.
Could someone refer me to any statistics package for splunk ?
I discovered the | script directive in splunk.
Now, This is the python code I wrote, and it runs directly from splunk
from scipy import stats
import splunk.Intersplunk
src_cols = ["s1","s2"]
new_cols = ["n1"]
print (",".join(src_cols+new_cols))
for row in splunk.Intersplunk.readResults():
output = map(lambda c: row[c], src_cols)
output += [stats.beta.cdf(row["s1"],0, 1) - row["s2"],0, 1)]
print (",".join(output))
Related
When calling a function in the survival package in R from within python with the rpy2 interface I get the following error:
RRuntimeError: Error in formula[[2]] : subscript out of bounds
Any pointer to solve the issue please?
Thanks
Code:
import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
R = ro.r
from rpy2.robjects import pandas2ri
pandas2ri.activate()
## install the survival package
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
utils.install_packages(StrVector('survival'))
#Load the library and example data set
survival=importr('survival')
infert = R('infert')
## Linear model works fine
reslm=R.lm('case~spontaneous+induced',data=infert)
#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)
After trying around, I found out, there is a difference, whether you offer the R instance of rpy2 the full R-code string to execute, or not.
Thus, you can make your function run, by giving as much as possible as R code:
#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)
#But give the R code to be executed as one complete string - this works:
rescl=R('clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')
If you capture the return value to a variable within R, you can inspect the data and get out the critical information of the model
by the usual functions in R.
E.g.
R('rescl.in.R <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')
R('str(rescl.in.R)')
# or:
R('coef(rescl.in.R)')
## array([1.98587552, 1.40901163])
R('names(rescl.in.R)')
## array(['coefficients', 'var', 'loglik', 'score', 'iter',
## 'linear.predictors', 'residuals', 'means', 'method', 'n', 'nevent',
## 'terms', 'assign', 'wald.test', 'y', 'formula', 'xlevels', 'call',
## 'userCall'], dtype='<U17')
It helps a lot - at least in this first phase of using rpy2 (for me, too), to have your r instance open and trying the code in parallel which you do, since the output in R is far more readable and you know and see what you are doing and what you could address.
In Python, the output is stripped off of important informations (like the name etc) - and in addition, it is not pretty-printed.
This fails when including the strata() function within the formula because it's not evaluated in the right environment. In R, formulas are special language constructs and so they need to be treated separately by rpy2.
So, for your example, this would look like:
rescl = R.clogit(ro.Formula('case ~ spontaneous + induced + strata(stratum)'),
data = infert)
See the documentation for rpy2.robjects.Formula for more details. That documentation also discusses the pros & cons of this approach vs that provided by #Gwang-jin-kim
I would really appreciate some help with running code written in Python 3 from Matlab.
My Python code loads various libraries and uses them to perform numerical integration of a differential equation (for the numpy vector: e_array ).
The Python code which I would like to call from Matlab is the following:
from numba import jit
from scipy.integrate import quad
import numpy as np
#jit(nopython = True)
def integrand1(x,e,delta,r):
return (-2*np.sqrt(e*r)/np.pi)*(x/np.sqrt(1-x**2))/(1+(delta+2*x*np.sqrt(e*r))**2)
#jit(nopython = True)
def f1(e,delta,r):
return quad(integrand1, -1, 1, args=(e,delta,r))[0]
#jit(nopython = True)
def runge1(e,dtau,delta,r):
k1 = f1(e,delta,r)
k2 = f1((e+k1*dtau/2),delta,r)
k3 = f1((e+k2*dtau/2),delta,r)
k4 = f1((e+k3*dtau),delta,r)
return e + (dtau/6)*(k1+2*k2+2*k3+k4)
time_steps = 60
e = 10
dtau=1
r=1
delta=-1
e_array = np.zeros(time_steps)
time = np.zeros(time_steps)
for i in range(time_steps):
e_array[i] = e
time[i] = i*dtau
e = runge1(e,dtau,delta,r)
Ideally, I would like to be able to call this Python code (pythoncode.py) in Matlab as if it were a Matlab function and feed it the parameters: time_steps, e, dtau, r and delta. I would be very happy with a solution which looks like this:
e_array = pythoncode.py(time_steps = 60, e = 10, dtau = 1, r = 1, delta = -1)
where pythoncode.py is treated as a Matlab function which takes said parameters, feeds them into the Python code and returns the Matlab vector e_array.
I want to point out that there are several additional Python codes that I'd like to be able to call from Matlab and I'm hope to get an idea of how to do this from your answers regarding this specific Python code.
A related question involves the Python libraries which I use in the Python code: Is there a way to "compile" the Python code such that I can call it in Matlab without installing the libraries it uses (f.e the numba library) on the computer running the Matlab code?
Thanks very much for helping,
Asaf
You'll probably need to shell escape out of Matlab to invoke python -- prefix the command you'd run on the shell with !.
Matlab Shell Escape Functions suggests saving a mat file and then opening it in your python code -- see Read .mat files in Python .
In terms of compiling the python, you could take a look at How to compile a Python file and see if that helps you.
I'm trying to do Dirichlet Regression using Python. Unfortunately I cannot find a Python package that does the job. So I tried to call R library DirichletReg using rpy2. However, it is not very intuitive to me how to call a regression function such as DirichReg(Y ~ X1 + X2 + X3, data=predictorData) where Y = DR_data(compositionalData). I saw an example of calling linear regression function lm in the documentation of rpy2. But my case is slightly different as Y is not a column name in the table but an R object DR_data.
I'm wondering what the proper way is to do this, or whether there is a Python package for Dirichlet Regression.
You can send objects into the "Formula" environment from python. This example is from the rpy2 docs:
import array
from rpy2.robjects import IntVector, Formula
from rpy2.robjects.packages import importr
stats = importr('stats')
x = IntVector(range(1, 11))
y = x.ro + stats.rnorm(10, sd=0.2)
fmla = Formula('y ~ x')
env = fmla.environment
env['x'] = x
env['y'] = y
fit = stats.lm(fmla)
You can also create named variables in the R environment (outside the Formula). See here. Worst case scenario, you move some your python data into R through rpy2, then issue the commands directly in R through the rpy2 bridge as described here.
I am new to Python. I am trying to run MATLAB from inside Python using the mlab package. I was following the guide on the website, and I entered this in the Python command line:
from mlab.releases import latest_release
The error I got was:
cannot import name find_available_releases
It seems that under matlabcom.py there was no find_available_releases function.
May I know if anyone knows how to resolve this? Thank you!
PS: I am using Windows 7, MATLAB 2012a and Python 2.7
I skimmed through the code, and I don't think all of the README file and its documentation match what's actually implemented. It appears to be mostly copied from the original mlabwrap project.
This is confusing because mlabwrap is implemented using a C extension module to interact with the MATLAB Engine API. However the mlab code seems to have replaced that part with a pure Python implementation as the MATLAB-bridge backend. It comes from "Dana Pe'er Lab" and it uses two different methods to interact with MATLAB depending on the platform (COM/ActiveX on Windows and pipes on Linux/Mac).
Now that we understand how the backend is implemented, you can start looking at the import error.
Note: the Linux/Mac part of the code tries to find the MATLAB executable in some hardcoded fixed locations, and allows to choose between different versions.
However you are working on Windows, and the code doesn't really implement any way of picking between MATLAB releases for this platform (so all of the methods like discover_location and find_available_releases are useless on Windows). In the end, the COM object is created as:
self.client = win32com.client.Dispatch('matlab.application')
As explained here, the ProgID matlab.application is not version-specific, and will simply use whatever was registered as the default MATLAB COM server. We can explicitly specify what MATLAB version we want (assuming you have multiple installations), for instance matlab.application.8.3 will pick MATLAB R2014a.
So to fix the code, IMO the easiest way would be to get rid of all that logic about multiple MATLAB versions (in the Windows part of the code), and just let it create the MATLAB COM object as is. I haven't attempted it, but I don't think it's too involved... Good luck!
EDIT:
I download the module and I managed to get it to work on Windows (I'm using Python 2.7.6 and MATLAB R2014a). Here are the changes:
$ git diff
diff --git a/src/mlab/matlabcom.py b/src/mlab/matlabcom.py
index 93f075c..da1c6fa 100644
--- a/src/mlab/matlabcom.py
+++ b/src/mlab/matlabcom.py
## -21,6 +21,11 ## except:
print 'win32com in missing, please install it'
raise
+def find_available_releases():
+ # report we have all versions
+ return [('R%d%s' % (y,v), '')
+ for y in range(2006,2015) for v in ('a','b')]
+
def discover_location(matlab_release):
pass
## -62,7 +67,7 ## class MatlabCom(object):
"""
self._check_open()
try:
- self.eval('quit();')
+ pass #self.eval('quit();')
except:
pass
del self.client
diff --git a/src/mlab/mlabraw.py b/src/mlab/mlabraw.py
index 3471362..16e0e2b 100644
--- a/src/mlab/mlabraw.py
+++ b/src/mlab/mlabraw.py
## -42,6 +42,7 ## def open():
if is_win:
ret = MatlabConnection()
ret.open()
+ return ret
else:
if settings.MATLAB_PATH != 'guess':
matlab_path = settings.MATLAB_PATH + '/bin/matlab'
diff --git a/src/mlab/releases.py b/src/mlab/releases.py
index d792b12..9d6cf5d 100644
--- a/src/mlab/releases.py
+++ b/src/mlab/releases.py
## -88,7 +88,7 ## class MatlabVersions(dict):
# Make it a module
sys.modules['mlab.releases.' + matlab_release] = instance
sys.modules['matlab'] = instance
- return MlabWrap()
+ return instance
def pick_latest_release(self):
return get_latest_release(self._available_releases)
First I added the missing find_available_releases function. I made it so that it reports that all MATLAB versions are available (like I explained above, it doesn't really matter because of the way the COM object is created). An even better fix would be to detect the installed/registered MATLAB versions using the Windows registry (check the keys HKCR\Matlab.Application.X.Y and follow their CLSID in HKCR\CLSID). That way you can truly choose and pick which version to run.
I also fixed two unrelated bugs (one where the author forgot the function return value, and the other unnecessarily creating the wrapper object twice).
Note: During testing, it might be faster NOT to start/shutdown a MATLAB instance each time the script is called. This is why I commented self.eval('quit();') in the close function. That way you can start MATLAB using matlab.exe -automation (do this only once), and then repeatedly re-use the session without shutting it down. Just kill the process when you're done :)
Here is a Python example to test the module (I also show a comparison against NumPy/SciPy/Matplotlib):
test_mlab.py
# could be anything from: latest_release, R2014b, ..., R2006a
# makes no difference :)
from mlab.releases import R2014a as matlab
# show MATLAB version
print "MATLAB version: ", matlab.version()
print matlab.matlabroot()
# compute SVD of a NumPy array
import numpy as np
A = np.random.rand(5, 5)
U, S, V = matlab.svd(A, nout=3)
print "S = \n", matlab.diag(S)
# compare MATLAB's SVD against Scipy's SVD
U, S, V = np.linalg.svd(A)
print S
# 3d plot in MATLAB
X, Y, Z = matlab.peaks(nout=3)
matlab.figure(1)
matlab.surf(X, Y, Z)
matlab.title('Peaks')
matlab.xlabel('X')
matlab.ylabel('Y')
matlab.zlabel('Z')
# compare against matplotlib surface plot
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='jet')
ax.view_init(30.0, 232.5)
plt.title('Peaks')
plt.xlabel('X')
plt.ylabel('Y')
ax.set_zlabel('Z')
plt.show()
Here is the output I get:
C:\>python test_mlab.py
MATLAB version: 8.3.0.532 (R2014a)
C:\Program Files\MATLAB\R2014a
S =
[[ 2.41632007]
[ 0.78527851]
[ 0.44582117]
[ 0.29086795]
[ 0.00552422]]
[ 2.41632007 0.78527851 0.44582117 0.29086795 0.00552422]
EDIT2:
The above changes have been accepted and merged into mlab.
You are right in saying that the find_available_releases() is not written. 2 ways to work this out
Check out the code in linux and work on it (You are working on
windows !)
Change the Code as below
Add the following function in matlabcom.py as in matlabpipe.py
def find_available_releases():
global _RELEASES
if not _RELEASES:
_RELEASES = list(_list_releases())
return _RELEASES
If you see mlabraw.py file, the following code will give you a clear idea why I am saying this !
import sys
is_win = 'win' in sys.platform
if is_win:
from matlabcom import MatlabCom as MatlabConnection
from matlabcom import MatlabError as error
from matlabcom import discover_location, find_available_releases
from matlabcom import WindowsMatlabReleaseNotFound as MatlabReleaseNotFound
else:
from matlabpipe import MatlabPipe as MatlabConnection
from matlabpipe import MatlabError as error
from matlabpipe import discover_location, find_available_releases
from matlabpipe import UnixMatlabReleaseNotFound as MatlabReleaseNotFound
I want to calculate logistic regression parameters using R's glm package. I'm working with python and using rpy2 for that.
For some reason, when I'm running the glm function using R I get much faster results than by using rpy2. Do you know why the calculations using rpy2 is much slower?
I'm using R - V2.13.1 and rpy2 - V2.0.8
Here is the code I'm using:
import numpy
from rpy2 import robjects as ro
import rpy2.rlike.container as rlc
def train(self, x_values, y_values, weights):
x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
y_float_vector = ro.FloatVector(y_values)
weights_float_vector = ro.FloatVector(weights)
names = ['v' + str(i) for i in xrange(len(x_float_vector))]
d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
data = ro.RDataFrame(d)
formula = 'y ~ '
for x in names:
formula += x + '+'
formula = formula[:-1]
fit_res = ro.r.glm(formula=ro.r(formula), data=data, weights=weights_float_vector, family=ro.r('binomial(link="logit")'))
Without the full R code you are benchmarking against, it is difficult to precisely point out where the problem might be.
You might want to run this through a Python profiler to see where the bottleneck(s) is (are).
Finally, the current release for rpy2 is 2.2.6. Beside API changes, it is running faster and has (presumably) less bugs than 2.0.8.
Edit: From your comments I am now suspecting that you are calling your function
in a loop, and a large fraction of the time is spent building R vectors (that might only have to be built once).