I am trying to generate a plot from the robCompositions package in R. I am working with the example shown here. Details about the package are found here.
This plot is produced following some clustering analysis with the robCompositions package.
My system is Ubuntu 16.10 64-bit. I set my R working directory using:
setwd('/home/UserRob/Downloads')
I installed the package and dependencies using:
install.packages(c('data.table','pls','robCompositions'))
I can execute the code in this example, in R, using the following rtest.R file:
library(robCompositions)
data(expenditures)
x <- expenditures
rr3 <- clustCoDa(x, k=6, distMethod = "Aitchison", method = "single",
transformation = "identity", scale = "none")
plot(rr3)
dev.off()
When I execute this in Ubuntu with Rscript /home/UserRob/Downloads/rtest.R, I get a plot saved to /home/UserRob/Downloads/Rplots.pdf.
I tried to run this code in rPy2 with the following code in rtest.py:
from rpy2.robjects import r
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
When I execute this using python /home/UserRob/Downloads/rtest.py, the plot is not produced. The Python variable rr3 is the same as that from the R usage above. So, the code runs correctly except for the last line - r('plot')(rr3). It appears that the r('plot') command is not running at all. I also tried the following in rPy2:
from rpy2 import robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import r
grdevices = importr("grDevices")
graphics = importr("graphics")
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
grdevices.png("rtest.png")
graphics.plot(rr3)
grdevices.dev_off()
However, this also gave me the same result - the plot is not saved to rtest.png. All the code before the line graphics.plot(rr3) appears to run correctly.
Going by the documentation for R's OOPs, I think I am using plot() the same way as the plot.lm() example shown there. It seems that plot() is an S3 method for the class clustCoDa (as documented here), so I just need to call plot(); line 6 of the doc actually just links to the usual plot() function.
Question:
Is there a problem with the plot() function in rPy2?
EDIT:
I have duplicated the problem in Windows 7 64-bit. Here are the details:
Python details:
Python 2.7.8
R details:
R x64 3.3.2
personal library folder: C:\Users\UserRob\Documents\R\win-library\3.3
Windows Environment Variables:
R_USER: C:\Python27\Lib\site-packages\rpy2
R_LIBS: C:\Users\UserRob\Documents\R\win-library\3.3
R_HOME: C:\Program Files\R\R-3.3.2
My rtest_win.py file:
from rpy2.robjects import r
r('.libPaths( c( .libPaths(), "C:/Users/UserRob/Documents/R/win-library/3.3") )')
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
As with Linux, the code runs but the plot() command does not produce a file. This leads me to suspect that the problem is with rPy2, i.e. that rPy2 cannot generate the plot with the robCompositions package's plot() S3 method.
Related
I plan on making a chart with ggplot in a python script. These are details about the project:
I have a script that runs on a remote machine and I can install anything within reason on the machine
The script runs in python and has data that I want to visualize stored as a dictionary
The script runs daily and the data always has the same structure
I think my best bet is to do this...
Write an R script that takes the data and creates the ggplot visualization
Use plumber to create a REST API for my script
Send a call to the REST API and get a PNG of my plot in return
I'm also familiar with ggpy by yhat and I'm even wondering if I can install R on the machine and just send code directly to the machine to process it without having RStudio.
Would plumber be a recommended and secure implementation?
This is a reproducible example-
my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
**{this is the part that's missing}**
library(ggplot2)
p <- ggplot(my_data, aes(city_name, value)) + geom_col()
ggsave("my_bar_chart.png", p)
As mentioned in the comments, most of your question should be answered here: Using R in Python with Rpy2: how to ggplot2?.
You can load ggplot with:
import rpy2.robjects.packages as packages
import rpy2.robjects.lib.ggplot2 as ggp2
assuming of course you have ggplot2 + dependencies available.
Then you can almost use R syntax, except that you put ggp2. in front of every command.
E.g. the Python equivalent of ggplot(mtcars) would be ggp2.ggplot(mtcars).
Your example: (not tested)
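import rpy2.robjects as ro
import rpy2.robjects.lib.ggplot2 as ggp2
# ggplot needs a data frame, so the cities and values go into two columns
my_data = ro.DataFrame({
    'city_name': ro.StrVector(['Chicago', 'New York', 'Cincinatti', 'Green Bay']),
    'value': ro.IntVector([30, 50, 70, 95]),
})
# parentheses let the expression span several lines
plot = (ggp2.ggplot(my_data) +
        ggp2.aes_string(x='city_name', y='value') +
        ggp2.geom_col())
plot.plot()
ro.r("dev.copy(png, 'my_data.png')")
ro.r("dev.off()")  # close the png device so the file is written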
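The example assumes the data already has `city_name` and `value` columns, which the nested dictionaries in the question do not. A minimal pure-Python sketch of that reshaping step follows; `to_columns` is a hypothetical helper name, and converting the values to integers is an assumption about what the chart should show.

```python
def to_columns(nested):
    """Flatten nested one-entry dicts into two parallel column lists."""
    cities, values = [], []
    for group in nested:          # each group is a list of dicts
        for entry in group:       # each dict maps one city to one value
            for city, value in entry.items():
                cities.append(city)
                values.append(int(value))  # values arrive as strings
    return {"city_name": cities, "value": values}


my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
columns = to_columns(my_data)
print(columns)
```

The two lists could then be wrapped in rpy2's `StrVector` and `IntVector` to build the R data frame that ggplot expects.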
There is a package in R that I need to use on my data. All my data preprocessing has already been done in Python, and all the modelling as well. The package in R is 'PMA'. I have used rpy2 before with R's pls package, as follows:
import numpy as np
from rpy2.robjects.numpy2ri import numpy2ri
import rpy2.robjects as ro
def Rpcr(X_train,Y_train,X_test):
    ro.r('''source('R_pls.R')''')
    r_pls=ro.globalenv['R_pls']
    r_x_train=numpy2ri(X_train)
    r_y_train=numpy2ri(Y_train)
    r_x_test=numpy2ri(X_test)
    p_res=r_pls(r_x_train,r_y_train,r_x_test)
    yp_test=np.array(p_res[0])
    yp_test=yp_test.reshape((yp_test.size,))
    yp_train=np.array(p_res[1])
    yp_train=yp_train.reshape((yp_train.size,))
    ncomps=np.array(p_res[2])
    ncomps=ncomps.reshape((ncomps.size,))
    return yp_test,yp_train,ncomps
When I followed this format, it gave an error that the function numpy2ri does not exist.
So I have been working off of rpy2 manual and have tried a number of things with no success. The package I am working with in R is implemented like so:
library('PMA')
cspa=CCA(X,Z,typex="standard", typez="standard", K=1, penaltyx=0.25, penaltyz=0.25)
# X and Z are dataframes with dimension ppm and pXq
# cspa returns an R object which I need two attributes u and v
U<-cspa$u
V<-cspa$v
So, following what I saw in the rpy2 documentation, I tried to load the package and use it from Python like so:
import rpy2.robjects as ro
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
base=importr('base')
scca=importr('PMA')
numpy2ri.activate() # To turn NumPy arrays X1 and X2 to r objects
out=scca.CCA(X1,X2,typex="standard",typez="standard", K=1, penaltyx=0.25,penaltyz=0.25)
and got the following error
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
Abort trap: 6
I also tried using R code directly using an example they had
string = '''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
which as I understand can be used like an r function directly
numpy2ri.activate()
scca(X,Z,1,0.25)
this results in the same error as above.
So I do not know exactly how to fix it and have been unable to find anything similar.
The error is, for some reason, a macOS issue; see https://stackoverflow.com/a/53014308/1628393
Thus all you have to do is set this environment variable before calling the R code, and it works well:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
string = '''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
then the function is called by
scca.SCCA(X,Z,1,0.25)
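The flag is only useful if it is already in the environment when the second OpenMP runtime gets loaded, so the safest place for it is at the very top of the script, before the heavy imports. A minimal sketch of that ordering, with the NumPy and rpy2 imports left as comments so the snippet runs on its own:

```python
import os

# Set the flag first: it only has an effect if it is in the environment
# before a second OpenMP runtime gets loaded into the process.
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"

# Only now import the packages that pull in OpenMP (shown as comments here):
# import numpy as np
# import rpy2.robjects as ro

print(os.environ["KMP_DUPLICATE_LIB_OK"])
```

As the OpenMP message itself warns, this is an unsupported workaround that may cause crashes or silently produce incorrect results.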
Through rpy2 in jupyter, you may plot your data directly from Python using R objects. How can you set par(mfrow=c(1,2)) in Python?
For instance, I want to automatically feed a matrix with variable size from python and plot it (among other statistical analyses) using rpy2. But instead of plotting a single boxplot, I want all of them to be output.
Here's some sample code
import numpy
import rpy2.ipython
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
import scipy as sp
import re #python for regex
from rpy2.robjects.packages import importr
rpy2.robjects.numpy2ri.activate()
%load_ext rpy2.ipython
%R
test=[[1,3,2],[6,5,7,8,9]]
def funtoanalyze(grouparray):
    a={}
    data=numpy.array(test, dtype=object) # rows have different lengths
    for ig in range(len(grouparray)):
        key=grouparray[ig]
        value=data[ig]
        a[key]=value
    rbox=ro.r('boxplot')
    for gro in a:
        datar=a[gro]
        ro.r('dev.new()')
        rbox(ro.FloatVector(datar[:]),xlab="",main=gro)
    return
funtoanalyze(["group33","group2"]) #only plots last group
Your use of %load_ext rpy2.ipython suggests that you want to have your figure in the jupyter notebook.
R uses "graphical devices" to output figures, and calling par(mfrow=c(...)) will either apply the setting to an open graphical device or open a new default device and set the parameter there.
The "magic" %%R checks whether figures were generated on default devices and displays them in the notebook. The following should work:
%%R
par(mfrow=c(1,2))
plot(0, 0)
plot(0, 0)
If you do not want to use the R magic, there are other utilities for the jupyter notebook in rpy2. For plotting there is a context manager (see https://bitbucket.org/rpy2/rpy2/issues/330/ipython-plotting-wrapper - I don't remember if there is more documentation), but the most advanced utilities are tailored for ggplot2. Check for example these slides and the following ones:
https://lgautier.github.io/odsc-ppda-slides/#/5/13
The full notebook is here:
https://github.com/lgautier/odsc-ppda-slides/blob/master/notebooks/slides.ipynb
There is a docker container shipping with everything needed to run the notebook:
https://github.com/lgautier/pragmatic-polyglot-data-analysis
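For a variable number of groups, the layout string can be computed in Python and evaluated in R before the boxplots are drawn. This is a rough sketch, not tested against a live R session: `mfrow_command` and `boxplot_groups` are hypothetical helper names, the three-column cap is an arbitrary choice, and rpy2 is imported lazily so the layout helper can be tried without R installed.

```python
import math


def mfrow_command(n_plots, max_cols=3):
    """Return the R call that lays out n_plots panels in a grid."""
    cols = max(1, min(n_plots, max_cols))
    rows = max(1, math.ceil(n_plots / cols))
    return "par(mfrow=c({}, {}))".format(rows, cols)


def boxplot_groups(groups):
    """Draw one boxplot per group on a single device (needs rpy2 and R)."""
    import rpy2.robjects as ro  # imported here so the helper above runs without R

    ro.r(mfrow_command(len(groups)))  # set the layout once, on one device
    rbox = ro.r("boxplot")
    for name, values in groups.items():
        rbox(ro.FloatVector(values), xlab="", main=name)


print(mfrow_command(2))
print(mfrow_command(5))
```

The main difference from the loop in the question is that no `dev.new()` is issued per group, so all panels are meant to land on the same device.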
I'm learning how to use rpy2, and I would like to create formatted regression output using the stargazer package. My best guess of how to do this is the following code:
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
stargazer = importr('stargazer')
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r = robjects.r
df = pd.DataFrame({'x': [1,2,3,4,5],
'y': [2,1,3,5,4]})
fit = r.lm('y~x', data=df)
print fit
print r.stargazer(fit)
However, when I run it, I get the following output:
Coefficients:
(Intercept) x
0.6 0.8
[1] "\n"
[2] "% Error: Unrecognized object type.\n"
So the fit is being generated, and prints fine. But stargazer doesn't seem to recognize the fit object as something it can parse.
Any suggestions? Am I calling stargazer incorrectly in this context?
I should mention that I am running this in Python 2.7.5 on a windows 10 machine, with R 3.3.2, and rpy2 version 2.7.8 from the unofficial windows binary. So it could just be a problem with the windows build, but it seems odd that everything except stargazer would work.
I am not familiar with the R package stargazer but from a quick look at the documentation this seems to be the correct usage.
Before anything else, you may want to check whether the issue is with execution or with printing. At which of the two lines is this failing?
p = r.stargazer(fit)
print(p)
If the failure is with the execution, you may want to move more code to R and see if you reach a point where you get it to work. If not, this is likely an issue with the R code and/or stargazer. If you get it to work the issue is on the rpy2/conversion side.
rcode = """
df <- data.frame(x = c(1,2,3,4,5),
y = c(2,1,3,5,4))
fit <- lm('y~x', data=df)
p <- stargazer(fit)
"""
# parse and evaluate the R code
r(rcode)
# intermediate objects can be retrieved from the `globalenv` to
# investigate where they differ from the ones obtained earlier.
# For example:
print(robjects.globalenv["p"])
Now that we showed that it is likely an issue on the stargazer side, we can make the use of arbitrary data frames a matter of binding it to a symbol in R's globalenv:
robjects.globalenv["df"] = df
rcode = """
fit <- lm('y~x', data=df)
p <- stargazer(fit)
"""
# parse and evaluate the R code
r(rcode)
print(robjects.globalenv["p"])
I'm a bit of a novice when it comes to python, but I want to convert a python script using rpy into one using rpy2. We do have rpy installed somewhere (for python 2.6.x), but it's not playing nicely with the current version of R (3.2.0). We do however have rpy2 installed for the version of python being used in these scripts (python 2.7[.5])
As far as I can tell, these are the lines which need to change (I've simplified the function a bit):
from rpy import r
r.library('<libname>', quietly=True)
r("""\
func <- function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
and later in the script, there's a line which calls this function:
out = r.func(<python expression>)['o']
I can do the first half as follows:
import rpy2.rpy_classic as rpy
rpy.set_default_mode(rpy.NO_CONVERSION)
rpy.r.library('<libname>', quietly=True)
rpy.r("""\
func <- function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
Trying the above at an interactive prompt (with some fake data), the output is:
<rpy2.rpy_classic.Robj object at 0x2b9e48481510>
but I need the output value of the function rpy.r.func rather than its unconverted value (as I need to obtain the func(<expression>)$o value).
Am I moving on the right track? And how do I rewrite the rpy (v1) code so that I get what I want (from rpy2)?
rpy_classic was mostly there in the early days to demonstrate that the lower-level interface in rpy2 could be used to implement any higher-level interface, including the one in rpy. It is not meant to be an ultimate compatibility tool.
With rpy2's high-level interface robjects, your rpy code would look like:
from rpy2.robjects.packages import importr
from rpy2.robjects import r
lib=importr('<libname>')
rfunc=r("""
function(x,a={options.a},b={options.b}) {{
...
*R code here*
...
l<-list(o=o,md=a+b)
l
}}""".format(options=options))
out = rfunc(<python expression>).rx2('o')
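The doubled braces in the R source are there because of `str.format`: `{{` and `}}` come out as literal braces, while `{options.a}` is replaced by an attribute lookup. A small standalone sketch of just that templating step, using a made-up `options` object and a made-up function body in place of the elided R code:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the script's parsed command-line options.
options = SimpleNamespace(a=2, b=5)

# Illustrative R body only; the real R code from the script is not shown.
template = """
function(x, a={options.a}, b={options.b}) {{
  o <- x * a
  l <- list(o=o, md=a+b)
  l
}}"""

r_source = template.format(options=options)
print(r_source)
```

Passing `r_source` to `r(...)` from `rpy2.robjects` would then give back the R function as a callable Python object, as in the answer above.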