Calling R functions in rpy2 error - "argument is missing" - python

I'm facing some issues in using rpy2 package in Python.
Actually, I am trying to call a function called upliftRF (of the library "uplift" in R) by passing some arguments.
As stated on page 27 of https://cran.r-project.org/web/packages/uplift/uplift.pdf, one of the arguments of the function can be x or a formula that describes the model to fit based on a dataframe ("data" parameter in arguments).
When executing the code of page 29 in R, everything is running without any problems. However, I have some issues in rpy2. Here is my code :
import numpy as np
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
uplift = importr('uplift')
kwargs = {'n': 1000, 'p' : 20, 'rho' : 0, 'sigma' : np.sqrt(2), 'beta.den': 4}
dd = uplift.sim_pte(**kwargs)
ddPD = pandas2ri.ri2py(dd)
ddPD['treat'] = [1 if x==1 else 0 for x in ddPD['treat']]
dd = com.convert_to_r_dataframe(ddPD)
kwargs2 = {'formula':'y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat)',
'mtry':3,'ntree':200,'split_method':'KL','minsplit':200,'data':dd}
fit1 = uplift.upliftRF(**kwargs2)
Then, I get this error :
RRuntimeError: Error in is.data.frame(x) : argument "x" is missing, with no default
However, "x" is not a mandatory parameter of the function.
I guess that the error will be the same for any other R function that has one argument which is not mandatory at all.
Thank you for your help !

import pandas.rpy.common as com
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
uplift = importr('uplift')
Next, you should be able to call the R functions the way you would normally call Python functions, because importr "translates" the named parameters in the definition of the R function into syntactically valid Python names (beta.den becomes beta_den, for example).
dd = uplift.sim_pte(n = 1000, p = 20, rho = 0,
sigma = np.sqrt(2), beta_den = 4)
At this point you appear to have an R data.frame. Going to pandas to add a column, then back to R, is definitely possible:
ddPD = pandas2ri.ri2py(dd)
ddPD['treat'] = [1 if x==1 else 0 for x in ddPD['treat']]
dd = com.convert_to_r_dataframe(ddPD)
However, unless there is a good reason, I'd recommend sticking to one conversion scheme when shuttling between pandas and rpy2: either the one defined in pandas or the one defined in rpy2, as consistency across the two is presumably less tested. The error RRuntimeError: Error: $ operator is invalid for atomic vectors might come from this.
The alternative to going through pandas is to use the eminently expressive R package dplyr. rpy2 has provided a tailored interface to it since version 2.7.0:
from rpy2.robjects.lib import dplyr
dd = (dplyr.DataFrame(dd)
      .mutate(treat = 'ifelse(treat==1, 1, 0)'))
It was already pointed out in your answer that the formula should be declared as such (formulas are language objects in R, but there is no equivalent at the language level in Python). Written as an ordinary Python call:
from rpy2 import robjects
fit1 = uplift.upliftRF(formula = robjects.Formula('y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat)'),
                       mtry = 3,
                       ntree = 200,
                       split_method = 'KL',
                       minsplit = 200,
                       data = dd)
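As a rough picture of the name translation that importr performs (beta.den becoming beta_den in the sim_pte call above), here is a simplified stand-in. The helper r_name_to_python is hypothetical and is not rpy2's actual implementation; it only mimics the dot-to-underscore renaming.

```python
def r_name_to_python(r_name):
    """Hypothetical helper: mimic the dot-to-underscore renaming only."""
    return r_name.replace('.', '_')

# Argument names as they are written on the R side
r_kwargs = {'n': 1000, 'p': 20, 'rho': 0, 'beta.den': 4}

# Same arguments, renamed so they can be typed as ordinary Python keywords
py_kwargs = {r_name_to_python(name): value for name, value in r_kwargs.items()}
print(py_kwargs)
```

With names in this form, the call can be written with plain keyword arguments instead of unpacking a dictionary.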

Step by step time integrators in Python

I am solving a first order initial value problem of the form:
dy/dt = f(t,y(t)), y(0)=y0
I would like to obtain y(n+1) from a given numerical scheme. For example, using the explicit Euler scheme, we have
y(i) = y(i-1) + f(t(i-1), y(i-1)) * dt
Example code:
# Test code to evaluate different time integrators for the following equation:
# y' = (1/2) y + 2sin(3t) ; y(0) = -24/37
def dy_dt(y,t):
    func = (1/2)*y + 2*np.sin(3*t)
    return func
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
tmin = 0
tmax = 50
delt= 1e-2
t = np.arange(tmin,tmax,delt)
total_steps = len(t)
y_explicit=np.zeros(total_steps)
#y_ODEint=np.zeros(total_steps)
y0 = -24/37
y_explicit[0]=y0
#y_ODEint[0]=y0
# exact solution
y_exact = -(24/37)*np.cos(3*t)- (4/37)*np.sin(3*t) + (y0+24/37)*np.exp(0.5*t)
# Solution using ODEint Python
y_ODEint = odeint(dy_dt,y0,t)
for i in range(1,total_steps):
    # Explicit scheme
    y_explicit[i] = y_explicit[i-1] + (dy_dt(y_explicit[i-1],t[i-1]))*delt
    # Update using ODEint
    # y_ODEint[i] = odeint(dy_dt,y_ODEint[i-1],[0,delt])[-1]
plt.figure()
plt.plot(t,y_exact)
plt.plot(t,y_explicit)
# plt.plot(t,y_ODEint)
The issue I am having is that functions like odeint in Python return the entire y(t) rather than a single y(i), as in the line "y_ODEint = odeint(dy_dt,y0,t)".
See in the code how I have implemented the explicit scheme, which gives y(i) at every time step. I want to do the same with odeint. I tried something, but it didn't work (all the commented lines).
I want to obtain y(i) rather than all the ys using odeint. Is that possible?
Your system is time variant so you cannot translate the time step from (t[i-1], t[i]) to (0, delt).
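To see why the shift matters, here is a small sketch that uses a hand-written classical Runge-Kutta step as a stand-in for odeint (an assumption made only to keep the example dependency-free). It advances the same equation twice: once over the true interval (t[i-1], t[i]) and once over (0, delt) at every step.

```python
import math

def dy_dt(y, t):
    # Same right-hand side as in the question; it depends explicitly on t
    return 0.5 * y + 2.0 * math.sin(3.0 * t)

def rk4_step(f, y, t0, t1):
    """Advance y from t0 to t1 with one classical Runge-Kutta step."""
    h = t1 - t0
    k1 = f(y, t0)
    k2 = f(y + 0.5 * h * k1, t0 + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t0 + 0.5 * h)
    k4 = f(y + h * k3, t1)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def exact(t, y0=-24.0 / 37.0):
    # Exact solution quoted in the question
    return (-(24.0 / 37.0) * math.cos(3.0 * t)
            - (4.0 / 37.0) * math.sin(3.0 * t)
            + (y0 + 24.0 / 37.0) * math.exp(0.5 * t))

delt, n_steps = 1e-2, 200
y_right = y_wrong = -24.0 / 37.0
for i in range(1, n_steps + 1):
    y_right = rk4_step(dy_dt, y_right, (i - 1) * delt, i * delt)  # true interval
    y_wrong = rk4_step(dy_dt, y_wrong, 0.0, delt)                 # shifted to zero

t_end = n_steps * delt
err_right = abs(y_right - exact(t_end))
err_wrong = abs(y_wrong - exact(t_end))
print(err_right, err_wrong)
```

The shifted version only ever sees the forcing term near t = 0, so it effectively solves a different equation and drifts away from the exact solution.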
The step-by-step integration is unstable for your differential equation, though.
Here is what I get:
def dy_dt(y,t):
    func = (1/2)*y + 2*np.sin(3*t)
    return func
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
tmin = 0
tmax = 40
delt= 1e-2
t = np.arange(tmin,tmax,delt)
total_steps = len(t)
y_explicit=np.zeros(total_steps)
#y_ODEint=np.zeros(total_steps)
y0 = -24/37
y_explicit[0]=y0
# exact solution
y_exact = -(24/37)*np.cos(3*t)- (4/37)*np.sin(3*t) + (y0+24/37)*np.exp(0.5*t)
# Solution using ODEint Python
y_ODEint = odeint(dy_dt,y0,t)
# To be filled step by step
y_ODEint_2 = np.zeros_like(y_ODEint)
y_ODEint_2[0] = y0
for i in range(1,len(y_ODEint_2)):
    # update your code to run with the correct time interval
    y_ODEint_2[i] = odeint(dy_dt,y_ODEint_2[i-1],[tmin+(i-1)*delt,tmin+i*delt])[-1]
plt.figure()
plt.plot(t,y_ODEint, label='single run')
plt.plot(t,y_ODEint_2, label='step-by-step')
plt.plot(t, y_exact, label='exact')
plt.legend()
plt.ylim([-20, 20])
plt.grid()
It is important to notice that both methods are unstable, but the step-by-step version blows up slightly earlier than the single odeint call.
With, for example, dy_dt(y,t) returning -(1/2)*y + 2*np.sin(3*t), the integration becomes more stable: there is no noticeable error after integrating from zero to 200.
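A minimal sketch of that stable variant, comparing a single odeint call with the step-by-step update. The shorter time range and the closed-form reference solution (worked out by hand for this y0) are assumptions added for the check; they are not taken from the runs above.

```python
import numpy as np
from scipy.integrate import odeint

def dy_dt_stable(y, t):
    # Sign of the linear term flipped, as suggested above
    return -0.5 * y + 2.0 * np.sin(3.0 * t)

tmin, tmax, delt = 0.0, 20.0, 1e-2
t = np.arange(tmin, tmax, delt)
y0 = -24.0 / 37.0

# Reference solution of the modified equation for this particular y0
y_exact = -(24.0 / 37.0) * np.cos(3.0 * t) + (4.0 / 37.0) * np.sin(3.0 * t)

# One call over the whole time grid
y_single = odeint(dy_dt_stable, y0, t)[:, 0]

# One call per step, always over the true interval (t[i-1], t[i])
y_steps = np.zeros_like(t)
y_steps[0] = y0
for i in range(1, len(t)):
    y_steps[i] = odeint(dy_dt_stable, y_steps[i - 1], [t[i - 1], t[i]])[-1, 0]

err_single = np.max(np.abs(y_single - y_exact))
err_steps = np.max(np.abs(y_steps - y_exact))
print(err_single, err_steps)
```

Because the homogeneous part now decays, small errors introduced at each restart are damped instead of amplified, so both approaches stay close to the reference.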

What's wrong in this code? Unknown attribute 'array' of type Module(<module 'numpy' from filename __init__.py'>

I'm trying to create an array inside a function using @vectorize, but I don't know why I keep receiving this error:
Unknown attribute 'array' of type Module( < module 'numpy' from 'filename.... /lib/python3.6/site-packages/numpy/ __ init __ .py'>)
Code:
import math
from numba import vectorize, float32
import numpy as np
@vectorize([float32(float32[:,:], float32[:])], target='cuda')
def fitness(vrp_data, individual):
    # The first distance is from depot to the first node of the first route
    depot = np.array([0.0, 0.0, 30.0, 40.0], dtype=np.float32)
    firstnode = np.array([0.0, 0.0, 0.0, 0.0], dtype=np.float32)
    firstnode = vrp_data[vrp_data[:,0] == individual[0]][0] if individual[0] !=0 else depot
    x1 = depot[2]
    x2 = firstnode[2]
    y1 = depot[3]
    y2 = firstnode[3]
    dx = x1 - x2
    dy = y1 - y2
    totaldist = math.sqrt(dx * dx + dy * dy)
    return totaldist
The code works fine without the function decoration.
The problem
numpy.array is not supported by Numba. Numba only supports a subset of the Numpy top-level functions (i.e., any function you call like numpy.foo). Here's an identical issue from the Numba bug tracker.
The "solution"
Here's the list of Numpy functions that Numba actually supports. numpy.zeros is supported, so in an ideal world you could just change the lines in your code that use np.array to:
depot = np.zeros(4, dtype=np.float32)
depot[2:] = [30, 40]
firstnode = np.zeros(4, dtype=np.float32)
and it would work. However, when targeting cuda all Numpy functions that allocate memory (including np.zeros) are disabled. So you'll have to come up with a solution that doesn't involve any array allocation.
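One way to avoid allocation altogether is to keep the depot coordinates as scalars and look the node up with an explicit loop. The sketch below is plain Python and has not been run under Numba's CUDA target; the function name first_leg_distance and the ValueError for a missing node are assumptions made for illustration.

```python
import math

DEPOT_X, DEPOT_Y = 30.0, 40.0  # same values as depot[2] and depot[3] above

def first_leg_distance(vrp_data, individual):
    """Distance from the depot to the first node, using scalars only."""
    x2, y2 = DEPOT_X, DEPOT_Y  # default: the first node is the depot itself
    if individual[0] != 0:
        found = False
        for row in range(len(vrp_data)):
            # First row whose id column matches the first node of the route
            if vrp_data[row][0] == individual[0]:
                x2, y2 = vrp_data[row][2], vrp_data[row][3]
                found = True
                break
        if not found:
            raise ValueError("node id not present in vrp_data")
    dx = DEPOT_X - x2
    dy = DEPOT_Y - y2
    return math.sqrt(dx * dx + dy * dy)

# 4x4 table holding the values 100..115, one node per row
vrp_data = [[100.0 + 4 * r + c for c in range(4)] for r in range(4)]
individual = [100.0, 101.0, 102.0, 103.0]
print(first_leg_distance(vrp_data, individual))
```

The loop replaces the boolean-mask indexing, which itself creates a temporary array, so no intermediate arrays are needed.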
Issues with use of vectorize
Also, it looks like vectorize is not the wrapper function you should be using. Instead, a function like the one you've written requires the use of guvectorize. Here's the closest thing to your original code that I was able to get to work:
import math
from numba import guvectorize, float32
import numpy as np
@guvectorize([(float32[:,:], float32[:], float32[:])], '(m,n),(p)->()')
def fitness(vrp_data, individual, totaldist):
    # The first distance is from depot to the first node of the first route
    depot = np.zeros(4, dtype=np.float32)
    depot[2:] = [30, 40]
    firstnode = np.zeros(4, dtype=np.float32)
    firstnode = vrp_data[vrp_data[:,0] == individual[0]][0] if individual[0] !=0 else depot
    x1 = depot[2]
    x2 = firstnode[2]
    y1 = depot[3]
    y2 = firstnode[3]
    dx = x1 - x2
    dy = y1 - y2
    totaldist[0] = math.sqrt(dx * dx + dy * dy)
The third argument in the signature is actually the return value, so you call the function like:
vrp_data = np.arange(100, 100 + 4*4, dtype=np.float32).reshape(4,4)
individual = np.arange(100, 104, dtype=np.float32)
fitness(vrp_data, individual)
Output:
95.67131
Better error message in latest Numba
You should probably upgrade your version of Numba. In the current version, your original code raises a somewhat more specific error message:
TypingError: Failed in nopython mode pipeline (step: nopython frontend). Use of unsupported NumPy function 'numpy.array' or unsupported use of the function.

mcint module Python-Monte Carlo integration

I am trying to run code that outputs a Gaussian distribution by integrating the 1-D Gaussian distribution equation using Monte Carlo integration. I am trying to use the mcint module. I defined the Gaussian equation and the sampler function that is used by the mcint module. I am not sure what the 'measure' argument of the mcint function does or what it should be set to. Does anyone know what measure is supposed to be, and how do I know what to set it to?
from matplotlib import pyplot as mp
import numpy as np
import mcint
import random
#f equation
def gaussian(x,x0,sig0,time,var):
    [velocity,diffussion_coeffient] = var
    mu = x0 + (velocity*time)
    sig = sig0 + np.sqrt(2.0*diffussion_coeffient*time)
    return (1/(np.sqrt(2.0*np.pi*(sig**2.0))))*(np.exp((-(x-mu)**2.0)/(2.0*(sig**2.0))))
#random variables that are generated during the integration
def sampler(varinterval):
    while True:
        velocity = random.uniform(varinterval[0][0],varinterval[0][1])
        diffussion_coeffient = random.uniform(varinterval[1][0],varinterval[1][1])
        yield (velocity,diffussion_coeffient)
if __name__ == "__main__":
    x0 = 0
    #ranges for integration
    velocitymin = -3.0
    velocitymax = 3.0
    diffussion_coeffientmin = 0.01
    diffussion_coeffientmax = 0.89
    varinterval = [[velocitymin,velocitymax],[diffussion_coeffientmin,diffussion_coeffientmax]]
    time = 1
    sig0 = 0.05
    x = np.linspace(-20, 20, 120)
    res = []
    for i in np.linspace(-10, 10, 120):
        result, error = mcint.integrate(lambda v: gaussian(i,x0,sig0,time,v), sampler(varinterval), measure=1, n=1000)
        res.append(result)
    mp.plot(x,res)
    mp.show()
Is this the module you are talking about? If that's the case, the whole source is only 17 lines long (at the time of writing). The relevant line is the last one, which reads:
return (measure*sample_mean, measure*math.sqrt(sample_var/n))
As you can see, the measure argument (whose default value is unity) is used to scale the values returned by the integrate method.
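To make the role of that scaling concrete, here is a small re-implementation sketch built around the return line quoted above. It is not the module's actual source, and the variance estimate is a simplification. With a sampler that draws uniformly from a region, the sample mean estimates the average of the integrand, so setting measure to the size of the region turns the average into an integral.

```python
import math
import random

def integrate(integrand, sampler, measure=1.0, n=100):
    """Sketch of a sample-mean Monte Carlo estimate scaled by `measure`."""
    total = 0.0
    total_sq = 0.0
    for _, x in zip(range(n), sampler):   # draw exactly n samples
        f = integrand(x)
        total += f
        total_sq += f * f
    sample_mean = total / n
    sample_var = max(total_sq / n - sample_mean ** 2, 0.0)
    return (measure * sample_mean, measure * math.sqrt(sample_var / n))

def uniform_sampler(a, b):
    # Endless stream of uniform samples on [a, b]
    while True:
        yield random.uniform(a, b)

random.seed(0)
a, b = 0.0, 3.0
# Integral of x^2 over [0, 3]; the exact value is 9
result, error = integrate(lambda x: x * x, uniform_sampler(a, b),
                          measure=b - a, n=20000)
# Same samples-based estimate without scaling: the average of x^2, exactly 3
average, _ = integrate(lambda x: x * x, uniform_sampler(a, b),
                       measure=1.0, n=20000)
print(result, error, average)
```

By the same reasoning, if the goal in the question is the integral over the sampled velocity and diffusion ranges rather than the average over them, measure would be the area of that box, (3.0 - (-3.0)) * (0.89 - 0.01) = 5.28.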

MINRES implementation in Python

Is there any python implementation of MINRES pseudoinversion algorithm that can deal with Hermitian matrices?
I have found a few sources, but all of them are only capable of working with real matrices and do not seem to be easily generalizable onto the complex case:
https://searchcode.com/codesearch/view/89958680/
https://github.com/pascanur/theano_optimize
(there are a couple of other links, but my reputation does not allow me to post them)
A Hermitian system of size $n$
$$\mathbf y = \mathbf H^{-1}\mathbf v$$
can be embedded in a real, symmetric system of size $2n$:
\begin{equation}
\begin{bmatrix}
\Re(\mathbf y)\\ \Im(\mathbf y)
\end{bmatrix}=
\begin{bmatrix}
\Re(\mathbf H)&-\Im(\mathbf H)\\ \Im(\mathbf H)&\Re(\mathbf H)
\end{bmatrix}^{-1}
\begin{bmatrix}
\Re(\mathbf v)\\ \Im(\mathbf v)
\end{bmatrix}.
\end{equation}
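As a quick check of the symmetry claim: a Hermitian matrix satisfies $\mathbf H = \overline{\mathbf H}^{\top}$, so its real part is symmetric and its imaginary part is antisymmetric,
$$\Re(\mathbf H)^{\top} = \Re(\mathbf H), \qquad \Im(\mathbf H)^{\top} = -\Im(\mathbf H).$$
Transposing the block matrix swaps the off-diagonal blocks and transposes every block, which gives
\begin{equation}
\begin{bmatrix}
\Re(\mathbf H)&-\Im(\mathbf H)\\ \Im(\mathbf H)&\Re(\mathbf H)
\end{bmatrix}^{\top}
=
\begin{bmatrix}
\Re(\mathbf H)^{\top}&\Im(\mathbf H)^{\top}\\ -\Im(\mathbf H)^{\top}&\Re(\mathbf H)^{\top}
\end{bmatrix}
=
\begin{bmatrix}
\Re(\mathbf H)&-\Im(\mathbf H)\\ \Im(\mathbf H)&\Re(\mathbf H)
\end{bmatrix},
\end{equation}
so the embedded $2n \times 2n$ matrix is real and symmetric, which is what a real-valued MINRES routine expects.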
Minimum-residual methods are often used for large problems, where constructing $\mathbf H$ is impractical. In that case we may only have an operation that computes a matrix-vector product, $f: \mathbb C^n \to \mathbb C^n;\,\, f(\mathbf v) = \mathbf H\mathbf v.$ This function can be wrapped to operate on $\mathbf x \in \mathbb R^{2n}$ by converting $\mathbf x$ back to a complex vector, applying $f$, and then embedding the result back in $\mathbb R^{2n}$.
Here is an example in python / numpy / scipy:
from scipy.sparse.linalg import minres, LinearOperator
from pylab import *
# Problem size
N = 100
# error helper
er = lambda t,a,b:print('%s error:'%t,mean(abs(a-b)))
# random Hermitian matrix
Q = randn(N,N) + 1j*randn(N,N)
H = Q@conj(Q.T)
# random complex vector
v = randn(N) + 1j*randn(N)
# ground-truth solution
x0 = inv(H)@v
# Pack/unpack complex vector as stacked real vector
c2r = lambda v:block([real(v),imag(v)])
r2c = lambda v:kron([1,1j],eye(N))@v
# Verify that we can embed C^n in R^(2N)
Hr = real(H)
Hi = imag(H)
Hs = block([[Hr,-Hi],[Hi,Hr]])
vs = c2r(v)
xs = inv(Hs)@vs
x1 = r2c(xs)
er('Embed',x0,x1)
# Verify that minres works as expected in R-embed
x2 = r2c(minres(Hs,vs,tol=1e-12)[0])
er('Minres 1',x0,x2)
# Demonstrate using operators
Av = lambda u:c2r( H @ r2c(u) )
A = LinearOperator((N*2,)*2,Av,Av)
# Minres, converting input/output to/from complex/real
x3 = r2c(minres(A,vs,tol=1e-12)[0])
er('Minres 2',x0,x3)
>>> Embed error: 5.317184726020268e-12
>>> Minres 1 error: 6.641342200989796e-11
>>> Minres 2 error: 6.641342200989796e-11

Converting MATLAB's interp1 to Python interp1d

I'm converting a MATLAB code into a Python code.
The code uses the function interp1 in MATLAB. I found that the scipy function interp1d should be what I'm after, but I'm not sure. Could you tell me if the code I implemented is correct?
My Python version is 3.4.1 and my MATLAB version is R2013a. However, the code was implemented around 2010.
MATLAB:
S_T = [0.0, 2.181716948, 4.363766232, 6.546480392, 8.730192373, ...
10.91523573, 13.10194482, 15.29065504, 17.48170299, 19.67542671, ...
21.87216588, 24.07226205, 26.27605882, 28.48390208; ...
1.0, 1.000382662968538, 1.0020234819906781, 1.0040560245904753, ...
1.0055690037530718, 1.0046180687475195, 1.000824223678225, ...
0.9954866694014762, 0.9891408937764872, 0.9822543350571298, ...
0.97480163751874, 0.9666158376141503, 0.9571711322843011, ...
0.9460998105962408; ...
1.0, 0.9992731388936672, 0.9995093132493109, 0.9997021748479805, ...
0.9982835412406582, 0.9926319477117723, 0.9833685776596993, ...
0.9730725288209638, 0.9626092685176822, 0.9525234896714959, ...
0.9426698515488858, 0.9326788630704709, 0.9218100196936996, ...
0.9095717918978693];
S = transpose(S_T);
dist = 0.00137;
old = 15.61;
ll = 125;
ref = 250;
start = 225;
high = 7500;
low = 2;
U = zeros(low,low,high);
for ii=1:high
g0= start-ref*dist*ii;
g1= g0+ll;
if(g0 <=0.0 && g1 >= 0.0)
temp= old/2*(1-cos(2*pi*g0/ll));
for jj=1:low
U(jj,jj,ii)= temp;
end
end
end
for ii=1:low
S_mod(ii,1,:)=interp1(S(:,1),S(:,ii+1),U(ii,ii,:),'linear');
end
Python:
import numpy
import os
from scipy import interpolate
S = [[0.0, 2.181716948, 4.363766232, 6.546480392, 8.730192373, 10.91523573, 13.10194482, 15.29065504, \
17.48170299, 19.67542671, 21.87216588, 24.07226205, 26.27605882, 28.48390208], \
[1.0, 1.000382662968538, 1.0020234819906781, 1.0040560245904753, 1.0055690037530718, 1.0046180687475195, \
1.000824223678225, 0.9954866694014762, 0.9891408937764872, 0.9822543350571298, 0.97480163751874, \
0.9666158376141503, 0.9571711322843011, 0.9460998105962408], \
[1.0, 0.9992731388936672, 0.9995093132493109, 0.9997021748479805, 0.9982835412406582, 0.9926319477117723, \
0.9833685776596993, 0.9730725288209638, 0.9626092685176822, 0.9525234896714959, 0.9426698515488858, \
0.9326788630704709, 0.9218100196936996, 0.9095717918978693]]
dist = 0.00137
old = 15.61
ll = 125
ref = 250
start = 225
high = 7500
low = 2
U = [numpy.zeros( [low, low] ) for _ in range(high)]
for ii in range(high):
    g0 = start - ref * dist * (ii+1)
    g1 = g0 + ll
    if g0 <=0.0 and g1 >= 0.0:
        for jj in range(low):
            U[ii][jj,jj] = old / 2 * (1 - numpy.cos( 2 * numpy.pi * g0 / ll) )
S_mod = []
for jj in range(high):
    temp = []
    for ii in range(low):
        temp.append(interpolate.interp1d( S[0], S[ii+1], U[jj][ii,ii]))
    S_mod.append(temp)
Ok so I've solved my own problem (thanks to the explanation on the MATLAB interp1 from Alex!).
The python interp1d doesn't have query points in itself, but instead creates a function which you then use to get your new data points. Thus, it should be:
f = interpolate.interp1d( S[0], S[ii+1])
temp.append(f(U[jj][ii,ii]))
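A self-contained sketch of that two-step pattern, using the first few entries of the S table. The query points xq are made up for illustration. Note that the returned function accepts a whole array of query points, so the inner loop over query points is not strictly needed.

```python
import numpy as np
from scipy import interpolate

# First few sample points and values from the S table above
x = np.array([0.0, 2.181716948, 4.363766232, 6.546480392])
v = np.array([1.0, 1.000382662968538, 1.0020234819906781, 1.0040560245904753])

# Step 1: build the interpolating function (linear is the default kind)
f = interpolate.interp1d(x, v)

# Step 2: evaluate it at the query points, all in one call
xq = np.array([0.0, 1.090858474, 4.363766232])
vq = f(xq)
print(vq)
```

By default interp1d raises a ValueError for query points outside the sampled range. Passing bounds_error=False together with fill_value=numpy.nan returns NaN for them instead.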
There is a Python library that lets you use MATLAB functions through wrappers: mlabwrap. If you don't need to change the code of the functions themselves, this could save you some time.
I don't know scipy, but I can tell you what the interp1 call in MATLAB is doing:
http://www.mathworks.com/help/matlab/ref/interp1.html
You are using the syntax:
vq = interp1(x,v,xq,method)
"Vector x contains the sample points, and v contains the corresponding values, v(x). Vector xq contains the coordinates of the query points."
So, in your code, S(:,1) contains the sample points where your grid is defined, S(:,ii+1) contains your sampled values for your 1-D function, and U(ii,ii,:) contains the query points where you want to interpolate to find new functional values between known values in your grid. You are using linear interpolation.
1-D interpolation is an extremely well defined operation, and interp1 is a relatively straightforward interface for this operation. What exactly do you not understand? Are you clear what interpolation is?
Essentially, you have a discretely defined function f[x]. The first argument to interp1 is x, the second argument is f[x], and the third argument is a set of arbitrarily chosen query points Xq at which you want to find new function values f[Xq]. Since these values are not known, you have to choose an interpolation method to approximate f[Xq]. 'linear' means you will use a linearly weighted average of the two known sampled neighbors (left and right) nearest to Xq.
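The weighted average described above can be written out by hand. This is a minimal sketch for a single query point, assuming sorted sample points and a query inside the sampled range; the function name linear_interp and the sample data are made up for illustration.

```python
from bisect import bisect_right

def linear_interp(xs, fs, xq):
    """Linearly interpolate the samples (xs, fs) at one query point xq.

    Assumes xs is sorted in increasing order and xs[0] <= xq <= xs[-1].
    """
    if not xs[0] <= xq <= xs[-1]:
        raise ValueError("query point outside the sampled range")
    # Index of the right neighbour, clamped so a left neighbour always exists
    hi = min(max(bisect_right(xs, xq), 1), len(xs) - 1)
    lo = hi - 1
    w = (xq - xs[lo]) / (xs[hi] - xs[lo])  # weight of the right neighbour
    return (1.0 - w) * fs[lo] + w * fs[hi]

xs = [0.0, 2.0, 4.0]
fs = [1.0, 3.0, 2.0]
print(linear_interp(xs, fs, 1.0))  # halfway between 1.0 and 3.0
print(linear_interp(xs, fs, 3.5))  # three quarters of the way from 3.0 to 2.0
```

The closer the query point is to a neighbour, the larger that neighbour's weight, and at a sample point the result is the sampled value itself.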
