How can I partition pyspark RDDs holding R functions

import rpy2.robjects as robjects
dffunc = sc.parallelize([(0,robjects.r.rnorm),(1,robjects.r.runif)])
dffunc.collect()
Outputs
[(0, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc28618 / R:0x26abd18>), (1, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc283d8 / R:0x26aad28>)]
While the partitioned version results in an error:
dffuncpart = dffunc.partitionBy(2)
dffuncpart.collect()
RuntimeError: ('R cannot evaluate code before being initialized.', <built-in function unserialize>
It seems like this error means that R wasn't loaded on one of the partitions, which I assume implies that the first import step was not performed there. Is there any way around this?
EDIT 1 This second example causes me to think there's a bug in the timing of pyspark or rpy2.
dffunc = sc.parallelize([(0, robjects.r.rnorm), (1, robjects.r.runif)]).partitionBy(2)
def loadmodel(model):
    import rpy2.robjects as robjects
    return model[1](2)
dffunc.map(loadmodel).collect()
Produces the same error R cannot evaluate code before being initialized.
import pickle
dffuncpickle = sc.parallelize([(0, pickle.dumps(robjects.r.rnorm)),
                               (1, pickle.dumps(robjects.r.runif))]).partitionBy(2)
def loadmodelpickle(model):
    import rpy2.robjects as robjects
    import pickle
    return pickle.loads(model[1])(2)
dffuncpickle.map(loadmodelpickle).collect()
Works just as expected.

I'd like to say that "this is not a bug in rpy2, this is a feature" but I'll realistically have to settle for "this is a limitation".
What is happening is that rpy2 has two interface levels. One is a low-level interface (closer to R's C API), available through rpy2.rinterface; the other is a high-level interface with more bells and whistles, more "pythonic", whose classes for R objects inherit from the rinterface-level ones (that last part is important for the point about pickling below). Importing the high-level interface initializes (starts) the embedded R with default parameters if necessary. Importing the low-level interface rinterface does not have this side effect, and the initialization of the embedded R must be performed explicitly (with the function initr). rpy2 was designed this way because the initialization of the embedded R can take parameters: importing rpy2.rinterface first, setting the initialization options, then importing rpy2.robjects makes this possible.
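For illustration, a minimal sketch of that import-then-initialize order (set_initoptions is from the rpy2 2.x API of this era; newer rpy2 versions configure initialization differently):
import rpy2.rinterface as rinterface
# set options for the embedded R *before* it is started (rpy2 2.x API)
rinterface.set_initoptions(('rpy2', '--no-save', '--quiet'))
rinterface.initr()                 # start embedded R explicitly
import rpy2.robjects as robjects   # safe now; R is already initialized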
In addition to that, the serialization (pickling) of R objects wrapped by rpy2 is currently only defined at the rinterface level (see the documentation). Pickling robjects-level (high-level) rpy2 objects uses the rinterface-level code, and when unpickling them they will remain at that lower level (the Python pickle contains the module in which the class of the object is defined and will import that module - here rinterface, which does not imply the initialization of the embedded R). The reason for things being this way is simply that it was "good enough for now": at the time this was implemented I had to simultaneously think of a good way to bridge two somewhat different languages and learn my way through the Python C API and the pickling/unpickling of Python objects. Given the ease with which one can write something like
import rpy2.robjects
or
import rpy2.rinterface
rpy2.rinterface.initr()
before unpickling, this was never revisited. The uses of rpy2's pickling I know about are through Python's multiprocessing (and adding something similar to the import statements to the code initializing a child process was a cheap and sufficient fix). Maybe this is the time to look at this again. File a bug report for rpy2 if that's the case.
edit: this is undoubtedly an issue with rpy2. Pickled robjects-level objects should unpickle back to robjects-level, not rinterface-level. I have opened an issue in the rpy2 tracker (and already pushed a rudimentary patch to the default/dev branch).
2nd edit: The patch is part of released rpy2 starting with version 2.7.7 (latest release at the time of writing is 2.7.8).
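For rpy2 versions predating that fix, a hedged workaround sketch in the PySpark setting is to force initialization of the embedded R in each executor before the pickled R objects are used, for instance with mapPartitions (whether unpickling is deferred until iteration depends on PySpark's serializer, so treat this as a sketch to verify):
def eval_part(iterator):
    # Hedged workaround for rpy2 < 2.7.7: start embedded R explicitly
    # in the worker before touching unpickled rinterface-level objects.
    import rpy2.rinterface as rinterface
    rinterface.initr()
    import rpy2.robjects   # high-level layer, now safe to import
    for key, rfunc in iterator:
        yield (key, list(rfunc(2)))   # e.g. rnorm(2) / runif(2)

dffuncpart.mapPartitions(eval_part).collect()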

Related

Using R packages in Python using rpy2

There is a package in R that I need to use on my data. All my data preprocessing has already been done in python and all the modelling as well. The package in R is 'PMA'. I have used rpy2 before with R's PLS package as follows:
import numpy as np
from rpy2.robjects.numpy2ri import numpy2ri
import rpy2.robjects as ro

def Rpcr(X_train,Y_train,X_test):
    ro.r('''source('R_pls.R')''')
    r_pls=ro.globalenv['R_pls']
    r_x_train=numpy2ri(X_train)
    r_y_train=numpy2ri(Y_train)
    r_x_test=numpy2ri(X_test)
    p_res=r_pls(r_x_train,r_y_train,r_x_test)
    yp_test=np.array(p_res[0])
    yp_test=yp_test.reshape((yp_test.size,))
    yp_train=np.array(p_res[1])
    yp_train=yp_train.reshape((yp_train.size,))
    ncomps=np.array(p_res[2])
    ncomps=ncomps.reshape((ncomps.size,))
    return yp_test,yp_train,ncomps
When I followed this format it gave an error that the function numpy2ri does not exist.
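(For context: in later rpy2 releases numpy2ri is a conversion module rather than a callable function, which would explain that error. A hedged sketch of the module-level usage, with a made-up matrix example:)
import numpy as np
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri
numpy2ri.activate()   # register the NumPy <-> R converters globally
# NumPy arrays now convert automatically when passed to R functions:
r_mat = ro.r['matrix'](np.arange(6.0), nrow=2)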
So I have been working off of the rpy2 manual and have tried a number of things with no success. The package I am working with in R is implemented like so:
library('PMA')
cspa=CCA(X,Z,typex="standard", typez="standard", K=1, penaltyx=0.25, penaltyz=0.25)
# X and Z are dataframes with dimensions pXm and pXq
# cspa returns an R object from which I need two attributes, u and v
U<-cspa$u
V<-cspa$v
So, trying to implement something like what I was seeing in the rpy2 docs, I tried to load the package in python and use it like so:
import rpy2.robjects as ro
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
base=importr('base')
scca=importr('PMA')
numpy2ri.activate() # To turn NumPy arrays X1 and X2 to r objects
out=scca.CCA(X1,X2,typex="standard",typez="standard", K=1, penaltyx=0.25,penaltyz=0.25)
and got the following error
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
Abort trap: 6
I also tried using R code directly using an example they had
string='''SCCA<-function(X,Z,K,alpha){
    library("PMA")
    scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
    u<-scca$u
    v<-scca$v
    out<-list(U=u,V=v)
    return(out)}'''
scca=STAP(string,"scca")
which as I understand can be used like an R function directly:
numpy2ri.activate()
scca(X,Z,1,0.25)
this results in the same error as above.
So I do not know exactly how to fix it and have been unable to find anything similar.
The error for some reason is a macOS issue: https://stackoverflow.com/a/53014308/1628393
Thus all you have to do is set this environment variable before R gets loaded, and it works well:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
string='''SCCA<-function(X,Z,K,alpha){
    library("PMA")
    scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
    u<-scca$u
    v<-scca$v
    out<-list(U=u,V=v)
    return(out)}'''
scca=STAP(string,"scca")
then the function is called by
scca.SCCA(X,Z,1,0.25)
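Putting it all together, a minimal hedged sketch of the full call sequence (assuming X and Z are NumPy arrays already in scope):
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'   # macOS OpenMP workaround; set before R loads

from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP

string = '''SCCA<-function(X,Z,K,alpha){
    library("PMA")
    scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
    list(U=scca$u,V=scca$v)}'''
scca = STAP(string, "scca")

numpy2ri.activate()             # convert the NumPy arrays X and Z automatically
res = scca.SCCA(X, Z, 1, 0.25)  # res.rx2('U') and res.rx2('V') hold u and v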

Running rpy2 in parallel using multiprocessing raises weird exception that cannot be caught

So this is a problem that I have not been able to solve, and neither do I know of a good way to make an MCVE out of it. Essentially, it has been briefly discussed here, but as the comments show, there was some disagreement, and the final verdict is still out. Hence I am posting a similar question again, hoping to get a better answer.
Background
I have sensor data from a couple of thousand sensors, that I get every minute. My interest lies in forecasting the data. For this I am using the ARIMA family of forecasting models. Long story short, after discussion with the rest of my research group, we decided to use the Arima function available in the R package forecast, instead of the statsmodels implementation of the same.
Problem Definition
Since I have data from a few thousand sensors, and I would like to analyse at least a whole week's worth of data (to begin with), and since a week has 7 days, I have 7 times the number of sensors' worth of data: essentially some 14k sensor-day combinations. Finding the best ARIMA order (which minimizes BIC) and forecasting the next day's data takes about 1 minute for each sensor-day combination, which means upwards of 11 days to process just one week of data on a single core!
This is obviously a waste, when I have 15 more cores just idling away the whole time. So, obviously, this is a problem for parallel processing. Note that each sensor-day combination does not influence any other sensor-day combination. Also, the rest of my code is fairly well profiled, and optimized.
Issue
The issue is that I get this weird error that I cannot catch anywhere. Here is the error reproduced:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/pool.py", line 429, in _handle_results
    task = get()
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/connection.py", line 251, in recv
    return ForkingPickler.loads(buf.getbuffer())
  File "/home/kartik/miniconda3/lib/python3.5/site-packages/rpy2/robjects/robject.py", line 55, in _reduce_robjectmixin
    rinterface_level=rinterface_factory(rdumps, rtypeof)
ValueError: Mismatch between the serialized object and the expected R type (expected 6 but got 24)
Here are a few characteristics of this error that I have discovered:
It is raised in the rpy2 package
It has something to do with Thread 3. Since Python is zero-indexed, I am guessing this is the fourth thread. Therefore, 4x6 = 24, which adds up to the numbers shown in the final error statement
rpy2 is being used in only one place in my code where it might have to recode returned values into Python types. Protecting that line in try: ... except: ... does not catch that exception
The exception is not raised when I ditch the multiprocessing and call the function within a loop
The exception does not crash the program, just suspends it forever (till I Ctrl+C it into terminating)
All that I tried till now, have had no effect in resolving the error
Things Tried
I have tried everything from extreme procedural coding, with functions to deal with the least cases (that is only one function to be called in parallel), to extreme encapsulation, where the executable block in the if __name__ == '__main__': calls a function which reads in the data, does the necessary grouping, then passes the groups to another function, which imports multiprocessing and calls another function in parallel, which imports the processing module that imports rpy2, and passes the data to the Arima function in R.
Basically, it doesn't matter if rpy2 is called and initialized deep inside function nests, such that it has no idea another instance might be initialized, or if it is called and initialized once, globally, the error is raised if multiprocessing is involved.
Pseudo Code
Here is an attempt to present at least some basic pseudo code such that the error might be reproduced.
import numpy as np
import pandas as pd

def arima_select(y, order):
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    forecast = importr('forecast')
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

def arima_wrapper(data):
    data = data[['tstamp', 'val']]
    data.set_index('tstamp', inplace=True)
    return arima_select(data, (1,1,1))

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def wrapper():
    df = pd.read_csv('file.csv', parse_dates=[1], infer_datetime_format=True)
    df['day'] = df['tstamp'].dt.day
    res = applyParallel(df.groupby(['sensor', 'day']), arima_wrapper)
    print(res)
Obviously, the above code can be encapsulated further, but I think it should reproduce the error quite accurately.
Data Sample
A sample of the data can be inspected via print(data.head(6)) placed immediately below data.set_index('tstamp', inplace=True) in arima_wrapper from the pseudo code above. Alternatively, data for a sensor, for a whole week, can be generated simply with:
def data_gen(start_day):
    r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                periods=24*60, freq='T'),
                  name='tstamp')
    d = pd.Series(np.random.randint(10, 80, 1440), name='val')
    s = pd.Series(['sensor1']*1440, name='sensor')
    return pd.concat([s, r, d], axis=1)

df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
Observations and Questions
The first observation is that this error is only raised when multiprocessing is involved, not when the function (arima_wrapper) is called in a loop. Therefore, it must be associated somehow with multiprocessing issues. R is not very multiprocess friendly, but when written in the way shown in the pseudo code, each instance of R should not know about the existence of the other instances.
The way the pseudo code is structured, there must be an initialization of rpy2 for each call inside the multiple subprocesses spawned by multiprocessing. If that were true, each instance of rpy2 should have spawned its own instance of R, which should just execute one function, and terminate. That would not raise any errors, because it would be similar to the single threaded operation. Is my understanding here accurate, or am I completely or partially missing the point?
Were all instances of rpy2 to somehow share an instance of R, then I might reasonably expect the error. What is true: is R shared among all instances of rpy2, or is there an instance of R for each instance of rpy2?
How might this issue be overcome?
Since SO hates question threads with multiple questions in them, I will prioritize my questions such that partial answers will be accepted. Here is my priority list:
How might this issue be overcome? A working code example that does not raise the issue will be accepted as answer even if it does not answer any other question, provided no other answer does better, or was posted earlier.
Is my understanding of Python imports accurate, or am I missing the point about multiple instances of R? If I am wrong, how should I edit the import statements such that a new instance is created within each subprocess? Answers to this question are likely to point me towards a probable solution, and will be accepted, provided no answer does better, or was posted earlier
Is R shared among all instances of rpy2 or is there an instance of R for each instance of rpy2? Answers to this question will be accepted only if they lead to a resolution of the problem.
(...) Long story short (...)
Really?
How might this issue be overcome? A working code example that does not raise the issue will be accepted as answer even if it does not answer any other question, provided no other answer does better, or was posted earlier.
Answers may leave quite a bit of work on your end...
Is my understanding of Python imports accurate, or am I missing the point about multiple instances of R? If I am wrong, how should I edit the import statements such that a new instance is created within each subprocess? Answers to this question are likely to point me towards a probable solution, and will be accepted, provided no answer does better, or was posted earlier
Python packages/modules are "uniquely" imported across your process which means that all code using the package/module within the process is using the same single import (you don't have a copy per import in a given block).
Because of this, I'd recommend using an initialization function when creating your Pool rather than repeatedly importing rpy2 and setting up the conversion each time a task is sent to a worker. You may also gain in performance if each task is short.
def arima_select(y, order):
    # FIXME: check whether the rpy2.robjects package
    # should be (re)imported as ro to be visible
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

forecast = None

def worker_init():
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    global forecast
    forecast = importr('forecast')

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])
Is R shared among all instances of rpy2 or is there an instance of R for each instance of rpy2? Answers to this question will be accepted only if they lead to a resolution of the problem.
rpy2 is making R available by linking its C shared library. There is one such library per Python process, and it is stateful (R is not able to handle concurrency). I think that your issue has more to do with object serialization (see http://rpy2.readthedocs.io/en/version_2.8.x/robjects_serialization.html#object-serialization) than with concurrency.
What is happening is some apparent confusion when reconstructing the R objects after Python pickled the rpy2 object. More specifically, when looking at the R object types mentioned in the error message:
>>> from rpy2.rinterface import str_typeint
>>> str_typeint(6)
'LANGSXP'
>>> str_typeint(24)
'RAWSXP'
I am guessing that the R object returned by forecast.Arima contains an unevaluated R expression (for example the call that led to that result object) and when serializing and unserializing it comes back as something different (a raw vector of bytes). This is possibly a bug in R's own serialization mechanism (which rpy2 is using under the hood). For now, to solve your issue, you may want to extract from the result of forecast.Arima what you care most about and only return that from the function call run by the worker.
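For example, a hedged sketch of that "extract and return only what you need" approach, applied to arima_wrapper from the question's pseudo code (the component names 'aic' and 'fitted' follow R's Arima fit object and should be verified):
import numpy as np

def arima_wrapper(data):
    data = data[['tstamp', 'val']]
    data.set_index('tstamp', inplace=True)
    res = arima_select(data, (1, 1, 1))     # rpy2 proxy for the R fit object
    aic = res.rx2('aic')[0]                 # pull plain values out while still
    fitted = np.asarray(res.rx2('fitted'))  # inside the worker process
    return aic, fitted                      # only picklable Python objects cross back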
The following changes to the arima_select function in the pseudo code presented in the question work:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri

ri.initr()

def arima_select(y, order):
    def rimport(packname):
        as_environment = ri.baseenv['as.environment']
        require = ri.baseenv['require']
        require(ri.StrSexpVector([packname]),
                quiet = ri.BoolSexpVector((True, )))
        packname = ri.StrSexpVector(['package:' + str(packname)])
        pack_env = as_environment(packname)
        return pack_env
    frcst = rimport("forecast")
    # NB: 'const' is not defined in this snippet; it must come from the
    # enclosing scope (e.g. const = ['TRUE'])
    args = (('y', ri.FloatSexpVector(y)),
            ('order', ri.FloatSexpVector(order)),
            ('include.constant', ri.StrSexpVector(const)))
    return frcst['Arima'].rcall(args, ri.globalenv)
Keeping the rest of the pseudo code the same. Note that I have since optimized the code further, and it does not require all the functions presented in the question. Basically, the following is necessary and sufficient:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri

ri.initr()

def arima(y, order=(1,1,1)):
    # This is the same as arima_select above, just renamed to arima
    ...

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    # worker_init as defined in the answer above (or None if no
    # per-worker setup is needed)
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def main():
    # Create your df in your favorite way:
    def data_gen(start_day):
        r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                    periods=24*60, freq='T'),
                      name='tstamp')
        d = pd.Series(np.random.randint(10, 80, 1440), name='val')
        s = pd.Series(['sensor1']*1440, name='sensor')
        return pd.concat([s, r, d], axis=1)
    df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
    applyParallel(df.groupby(['sensor', pd.Grouper(key='tstamp', freq='D')]),
                  arima)  # Note one may use partial from functools to pass order to arima
Note that I also do not call arima directly from applyParallel since my goal is to find the best model for the given series (for a sensor and day). I use a function arima_wrapper to iterate through the order combinations, and call arima at each iteration.
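For completeness, a hedged sketch of what such an order-search wrapper could look like (the small grid, the extraction of the BIC via R's `[[` at the rinterface level, and the 'bic' component name are all assumptions to verify):
import itertools
from rpy2 import rinterface as ri

def arima_wrapper(data):
    # try a small grid of (p,d,q) orders and keep the fit with lowest BIC
    extract = ri.baseenv['[[']            # R's list extraction operator
    best_bic, best_fit = float('inf'), None
    for p, d, q in itertools.product(range(3), range(2), range(3)):
        fit = arima(data, order=(p, d, q))
        bic = extract(fit, ri.StrSexpVector(['bic']))[0]
        if bic < best_bic:
            best_bic, best_fit = bic, fit
    return best_fit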

Converting a script from rpy to rpy2 using rpy2.rpy_classic

I'm a bit of a novice when it comes to python, but I want to convert a python script using rpy into one using rpy2. We do have rpy installed somewhere (for python 2.6.x), but it's not playing nicely with the current version of R (3.2.0). We do however have rpy2 installed for the version of python being used in these scripts (python 2.7[.5])
As far as I can tell, these are the lines which need to change (I've simplified the function a bit):
from rpy import r
r.library('<libname>', quietly=True)
r("""\
func <- function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
and later in the script, there's a line which calls this function:
out = r.func(<python expression>)['o']
I can do the first half as follows:
import rpy2.rpy_classic as rpy
rpy.set_default_mode(rpy.NO_CONVERSION)
rpy.r.library('<libname>', quietly=True)
rpy.r("""\
func <- function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
Trying the above at an interactive prompt (with some fake data), the output is:
<rpy2.rpy_classic.Robj object at 0x2b9e48481510>
but I need the output value of the function rpy.r.func rather than this non-converted Robj (as I need to obtain the func(<expression>)$o value)
Am I moving on the right track? And how do I rewrite the rpy (v1) code so that I get what I want (from rpy2)?
rpy_classic was mostly there in the early days to demonstrate that the lower-level interface in rpy2 could be used to implement any higher-level interface, including the one in rpy. It is not meant to be an ultimate compatibility tool.
With rpy2's high-level interface robjects, your rpy code would look like:
from rpy2.robjects.packages import importr
from rpy2.robjects import r
lib=importr('<libname>')
rfunc=r("""
function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
out = rfunc(<python expression>).rx2('o')
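As a side note on the rx2 call above, a small hedged illustration of rx versus rx2 (they mirror R's `[` and `[[` respectively):
from rpy2.robjects import r
l = r('list(o=1:3, md=4)')
sub_list = l.rx('o')    # like l["o"] in R: a one-element list containing 'o'
o_vector = l.rx2('o')   # like l[["o"]] in R: the 'o' element itself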

Creating an R data.frame in python with low level rpy2

I am using the rpy2 package to bring some R functionality to python.
The functions I'm using in R need a data.frame object, and by using rlike.TaggedList and then robjects.DataFrame I am able to make this work.
However I'm having performance issues, when comparing to the exact same R functions with the exact same data, which led me to try and use the rpy2 low level interface as mentioned here - http://rpy.sourceforge.net/rpy2/doc-2.3/html/performances.html
So far I have tried:
Using TaggedList with FloatSexpVector objects instead of numpy arrays, and the DataFrame object.
Dumping the TaggedList and DataFrame classes by using a dictionary like this:
d = dict((var_name, var_sexp_vector) for ...)
dataframe = robjects.r('data.frame')(**d)
Both did not get me any noticeable speedup.
I have noticed that DataFrame objects can take a rinterface.SexpVector in their constructor, so I have thought of creating such a named vector, but I have no idea how to put in the names (in R I know it's just names(vec) <- c('a','b',...)).
How do I do that? Is there another way?
And is there an easy way to profile rpy itself, so I could know where the bottleneck is?
EDIT:
The following code seems to work great (4x faster) on a newer rpy2 (2.2.3):
data = ro.r('list')([ri.FloatSexpVector(x) for x in vectors])[0]
data.names = ri.StrSexpVector(vector_names)
However it doesn't on version 2.0.8 (the last one supported on Windows), since R can't seem to be able to use the names: "Error in eval(expr, envir, enclos) : object 'y' not found"
Ideas?
EDIT #2:
Someone did the fine job of building an rpy2 2.3 binary for Windows (Python 2.7); the code mentioned above works great with it (almost 6x faster for my code).
link: https://bitbucket.org/breisfeld/rpy2_w32_fix/issue/1/binary-installer-for-win32
Python can be several times faster than R (even byte-compiled R), and I have managed to perform operations on R data structures with rpy2 faster than R would have. Sharing the relevant R and rpy2 code would help in giving more specific advice (and in improving rpy2 if needed).
In the meantime, SexpVector might not be what you want; it is little more than an abstract class for all R vectors (see class diagram for rpy2.rinterface). ListSexpVector might be more appropriate:
import rpy2.rinterface as ri
ri.initr()
l = ri.ListSexpVector([ri.IntSexpVector((1,2,3)),
                       ri.StrSexpVector(("a","b","c")),])
An important detail is that R lists are recursive data structures, and R avoids a catch-22 type of situation by having the operator "[[" (in addition to "["). Python does not have that, and I have not (yet?) implemented "[[" as a method at the low level.
Profiling in Python can be done with the stdlib module cProfile, for example.
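For instance, a minimal sketch (build_dataframe here is a hypothetical stand-in for the rpy2-heavy code being measured):
import cProfile
# profile a hypothetical function that builds the R data.frame via rpy2,
# sorting the report by cumulative time to surface the bottleneck
cProfile.run('build_dataframe(vectors, vector_names)', sort='cumtime')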

Python c_types .dll functions (pari library)

Alright, so a couple days ago I decided to try and write a primitive wrapper for the PARI library. Ever since then I've been playing with the ctypes library, loading the dll and accessing the functions contained using code similar to the following:
from ctypes import *
libcyg=CDLL("<path>/cygwin1.dll") #It needs cygwin to be loaded. Not sure why.
pari=CDLL("<path>/libpari-gmp-2.4.dll")
print pari.fibo #fibonacci function
#prints something like "<_FuncPtr object at 0x00BA5828>"
So the functions are there and they can potentially be accessed, but I always receive an access violation no matter what I try. For example:
pari.fibo(5) #access violation
pari.fibo(c_int(5)) #access violation
pari.fibo.argtypes = [c_long] #setting arguments manually
pari.fibo.restype = long #set the return type
pari.fibo(byref(c_int(5))) #access violation reading 0x04 consistently
and any variation on that, including setting argtypes to receive pointers.
The Pari .dll is written in C and the fibonacci function's syntax within the library is GEN fibo(long x).
Could it be the return type that's causing these errors, as it is not a standard int or long but a GEN type, which is unique to the PARI library? Any help would be appreciated. If anyone is able to successfully load the library and use ANY function from within python, please tell; I've been at this for hours now.
EDIT: Seems as though I was simply forgetting to initialize the library. After a quick pari.pari_init(4000000,500000) it stopped erroring. Now my problem lies in the fact that it returns a GEN object; which is fine, but whenever I try to reference the address to which it points, it's always 33554435, which I presume is still an address. I'm trying further commands and I'll update if I succeed in getting the correct value of something.
You have two problems here: one, giving fibo the correct return type, and two, converting the GEN return value to the value you are looking for.
Poking around the source code a bit, you'll find that GEN is defined as a pointer to a long. Also, it looks like the library provides some functions for converting/printing GENs. I focused in on GENtostr since it would probably be safest for all the pari functions.
import ctypes
pari = ctypes.CDLL("./libpari.so.2.3.5") #I did this under linux
pari.fibo.restype = ctypes.POINTER(ctypes.c_long)
pari.GENtostr.restype = ctypes.POINTER(ctypes.c_char)
pari.pari_init(4000000,500000)
x = pari.fibo(100)
y = pari.GENtostr(x)
ctypes.string_at(y)
Results in:
'354224848179261915075'
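If a Python integer is wanted rather than a string, the decoded result converts directly (a small follow-up to the snippet above):
# turn the GENtostr output into an arbitrary-precision Python int
n = int(ctypes.string_at(y))  # 354224848179261915075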
