Through rpy2 in Jupyter, you can plot your data directly from Python using R objects. How can you set par(mfrow=c(1,2)) in Python?
For instance, I want to automatically feed a matrix with variable size from python and plot it (among other statistical analyses) using rpy2. But instead of plotting a single boxplot, I want all of them to be output.
Here's some sample code:
import numpy
import rpy2.ipython
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
import scipy as sp
import re #python for regex
from rpy2.robjects.packages import importr
rpy2.robjects.numpy2ri.activate()
%load_ext rpy2.ipython
%R
test=[[1,3,2],[6,5,7,8,9]]
def funtoanalyze(grouparray):
    a={}
    data=numpy.array(test, dtype=object) # rows have different lengths
    for ig in range(len(grouparray)):
        key=grouparray[ig]
        value=data[ig]
        a[key]=value
    rbox=ro.r('boxplot')
    for gro in a:
        datar=a[gro]
        ro.r('dev.new()')
        rbox(ro.FloatVector(datar[:]),xlab="",main=gro)
    return
funtoanalyze(["group33","group2"]) #only plots last group
Your use of %load_ext rpy2.ipython suggests that you want the figure to appear in the Jupyter notebook.
R uses "graphical devices" to output figures, and calling par(mfrow=c(...)) either applies the setting to an open graphical device or opens a new default device and sets the parameter there.
The %%R "magic" checks whether figures were generated on the default devices and displays them in the notebook. The following should work:
%%R
par(mfrow=c(1,2))
plot(0, 0)
plot(0, 0)
If you do not want to use the R magic, there are other utilities for the Jupyter notebook in rpy2. For plotting there is a context manager (see https://bitbucket.org/rpy2/rpy2/issues/330/ipython-plotting-wrapper - I don't remember if there is more documentation), but the most advanced utilities are tailored for ggplot2. See for example these slides and the following ones:
https://lgautier.github.io/odsc-ppda-slides/#/5/13
The full notebook is here:
https://github.com/lgautier/odsc-ppda-slides/blob/master/notebooks/slides.ipynb
There is a docker container shipping with everything needed to run the notebook:
https://github.com/lgautier/pragmatic-polyglot-data-analysis
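For the variable-size case in the question, one possible approach without the R magic is to generate the R code from Python, sizing par(mfrow=...) to the number of groups and drawing into an explicit png() device. This is only a sketch: boxplot_grid_script, run_in_r and the file name boxplots.png are made-up names for illustration, and group names and values are assumed to be plain strings and numbers.

```python
def boxplot_grid_script(groups, png_path="boxplots.png"):
    """Build R code that draws one boxplot per group, side by side, into a PNG."""
    n = len(groups)
    lines = [
        f'png("{png_path}", width={300 * n}, height=300)',
        f"par(mfrow=c(1, {n}))",  # one row, one column per group
    ]
    for name, values in groups.items():
        vec = ", ".join(str(float(v)) for v in values)
        lines.append(f'boxplot(c({vec}), xlab="", main="{name}")')
    lines.append("invisible(dev.off())")  # close the device so the file is written
    return "\n".join(lines)


def run_in_r(script):
    """Evaluate the generated code with rpy2 (requires rpy2 and R to be installed)."""
    import rpy2.robjects as ro
    ro.r(script)


groups = {"group33": [1, 3, 2], "group2": [6, 5, 7, 8, 9]}
script = boxplot_grid_script(groups)
print(script)
# run_in_r(script)  # uncomment where rpy2 is available
```

In a notebook, the resulting file could then be shown with IPython.display.Image(filename="boxplots.png").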
Related
I plan on making a chart with ggplot in a python script. These are details about the project:
I have a script that runs on a remote machine and I can install anything within reason on the machine
The script runs in python and has data that I want to visualize stored as a dictionary
The script runs daily and the data always has the same structure
I think my best bet is to do this...
Write an R script that takes the data and creates the ggplot visualization
Use plumber to create a REST API for my script
Send a call to the REST API and get a PNG of my plot in return
I'm also familiar with ggpy by yhat and I'm even wondering if I can install R on the machine and just send code directly to the machine to process it without having RStudio.
Would plumber be a recommended and secure implementation?
This is a reproducible example:
my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
**{this is the part that's missing}**
library(ggplot2)
library(magrittr) # provides %>%
my_data %>% ggplot(aes(city_name, value)) + geom_col()
ggsave("my_bar_chart.png")
As mentioned in the comments, most of your question should be answered here: Using R in Python with Rpy2: how to ggplot2?.
You can load ggplot with:
import rpy2.robjects.packages as packages
import rpy2.robjects.lib.ggplot2 as ggp2
assuming of course you have ggplot2 + dependencies available.
Then you can use almost the same syntax as in R, except that you put ggp2. in front of every command.
E.g. the Python equivalent of ggplot(mtcars) would be ggp2.ggplot(mtcars).
Your example (not tested):
my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
import rpy2.robjects as ro
import rpy2.robjects.packages as packages
import rpy2.robjects.lib.ggplot2 as ggp2
# note: ggplot() expects an R data frame, so my_data must be converted first
plot = (ggp2.ggplot(my_data) +
        ggp2.aes_string(x='city_name', y='value') +
        ggp2.geom_col())
plot.plot()
ro.r("dev.copy(png, 'my_data.png')")
ro.r("dev.off()")
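The "missing part" of the question is getting the nested Python structure into columns that ggplot can use. A minimal sketch, assuming every inner dict maps a city name to a numeric string: flatten_city_data and to_r_dataframe are hypothetical helper names, and the column names city_name and value are taken from the question's ggplot call.

```python
def flatten_city_data(nested):
    """Turn groups of {city: "value"} dicts into two parallel lists."""
    cities, values = [], []
    for group in nested:
        for entry in group:
            for city, value in entry.items():
                cities.append(city)
                values.append(float(value))  # values arrive as strings
    return cities, values


def to_r_dataframe(cities, values):
    """Build an R data frame with rpy2 (requires rpy2 and R to be installed)."""
    import rpy2.robjects as ro
    return ro.DataFrame({
        "city_name": ro.StrVector(cities),
        "value": ro.FloatVector(values),
    })


my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
cities, values = flatten_city_data(my_data)
print(cities, values)
# df = to_r_dataframe(cities, values)  # then pass df to ggp2.ggplot(df)
```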
There is a package in R that I need to use on my data. All my data preprocessing and all the modelling have already been done in Python. The package in R is 'PMA'. I have used rpy2 before with R's PLS package, as follows:
import numpy as np
from rpy2.robjects.numpy2ri import numpy2ri
import rpy2.robjects as ro
def Rpcr(X_train,Y_train,X_test):
    ro.r('''source('R_pls.R')''')
    r_pls=ro.globalenv['R_pls']
    r_x_train=numpy2ri(X_train)
    r_y_train=numpy2ri(Y_train)
    r_x_test=numpy2ri(X_test)
    p_res=r_pls(r_x_train,r_y_train,r_x_test)
    yp_test=np.array(p_res[0])
    yp_test=yp_test.reshape((yp_test.size,))
    yp_train=np.array(p_res[1])
    yp_train=yp_train.reshape((yp_train.size,))
    ncomps=np.array(p_res[2])
    ncomps=ncomps.reshape((ncomps.size,))
    return yp_test,yp_train,ncomps
When I followed this format, it gave an error saying that the function numpy2ri does not exist.
So I have been working from the rpy2 manual and have tried a number of things with no success. The package I am working with in R is used like so:
library('PMA')
cspa=CCA(X,Z,typex="standard", typez="standard", K=1, penaltyx=0.25, penaltyz=0.25)
# X and Z are dataframes with dimension ppm and pXq
# cspa returns an R object from which I need two attributes, u and v
U<-cspa$u
V<-cspa$v
So, following what I saw in the rpy2 documentation, I tried to load the package and use it from Python like so:
import rpy2.robjects as ro
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
base=importr('base')
scca=importr('PMA')
numpy2ri.activate() # To turn NumPy arrays X1 and X2 to r objects
out=scca.CCA(X1,X2,typex="standard",typez="standard", K=1, penaltyx=0.25,penaltyz=0.25)
and got the following error
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
Abort trap: 6
I also tried using R code directly using an example they had
string = '''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
which as I understand can be used like an r function directly
numpy2ri.activate()
scca(X,Z,1,0.25)
this results in the same error as above.
So I do not know exactly how to fix it and have been unable to find anything similar.
For some reason, the error is a macOS issue: https://stackoverflow.com/a/53014308/1628393
So all you have to do is set this environment variable first, and it works well:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
string = '''SCCA<-function(X,Z,K,alpha){
library("PMA")
scca<-CCA(X,Z,typex="standard",typez="standard",K=K,penaltyx=alpha,penaltyz=alpha)
u<-scca$u
v<-scca$v
out<-list(U=u,V=v)
return(out)}'''
scca=STAP(string,"scca")
then the function is called by
scca.SCCA(X,Z,1,0.25)
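As far as I understand, the variable is only consulted when an OpenMP runtime initializes, so the assignment should run before the libraries that load OpenMP are imported. A small sketch of that ordering, where allow_duplicate_openmp is a hypothetical helper name and the value 'True' is the one used above:

```python
import os


def allow_duplicate_openmp():
    """Set the workaround variable named in the OMP hint; return its previous value."""
    previous = os.environ.get("KMP_DUPLICATE_LIB_OK")
    os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
    return previous


# Call this before importing anything that may load an OpenMP runtime.
previous = allow_duplicate_openmp()

# Only afterwards import the heavy libraries, for example:
# import numpy as np
# import rpy2.robjects as ro
```

Returning the previous value makes it possible to restore the environment once the R call has finished.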
I am trying to generate a plot from the robCompositions package in R. I am working with the example shown here. Details about the package are found here.
This plot is produced following some clustering analysis with the robCompositions package.
My system is Ubuntu 16.10 64-bit. I set my R working directory using:
setwd('/home/UserRob/Downloads')
I installed the package and dependencies using:
install.packages(c('data.table','pls','robCompositions'))
I can execute the code in this example, in R, using the following rtest.R file:
library(robCompositions)
data(expenditures)
x <- expenditures
rr3 <- clustCoDa(x, k=6, distMethod = "Aitchison", method = "single",
transformation = "identity", scale = "none")
plot(rr3)
dev.off()
When I execute this in Ubuntu with Rscript /home/UserRob/Downloads/rtest.R, I get a plot saved to /home/UserRob/Downloads/Rplots.pdf.
I tried to run this code in rPy2 with the following code in rtest.py:
from rpy2.robjects import r
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
When I execute this using python /home/UserRob/Downloads/rtest.py, the plot is not produced. The Python variable rr3 is the same as that from the R usage above. So, the code runs correctly except for the last line - r('plot')(rr3). It appears that the r('plot') command is not running at all. I also tried the following in rPy2:
from rpy2 import robjects
from rpy2.robjects.packages import importr
grdevices = importr("grDevices")
graphics = importr("graphics")
from rpy2.robjects import r
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
                     transformation='identity')
grdevices.png("rtest.png")
graphics.plot(rr3)
grdevices.dev_off()
However, this also gave me the same result - the plot is not saved to rtest.png. All the code before the line graphics.plot(rr3) appears to run correctly.
In the documentation for R's OOP, I think I am using plot() the same way as the plot.lm() shown there. It seems that plot() is an S3 method for the class clustCoDa (as documented here), so I just need to use plot(); line 6 of the doc actually just links to the usual plot() function.
Question:
Is there a problem with the plot() function in rPy2?
EDIT:
I have duplicated the problem in Windows 7 64-bit. Here are the details:
Python details:
Python 2.7.8
R details:
R x64 3.3.2
personal library folder: C:\Users\UserRob\Documents\R\win-library\3.3
Windows Environment Variables:
R_USER: C:\Python27\Lib\site-packages\rpy2
R_LIBS: C:\Users\UserRob\Documents\R\win-library\3.3
R_HOME: C:\Program Files\R\R-3.3.2
My rtest_win.py file:
from rpy2.robjects import r
r('.libPaths( c( .libPaths(), "C:/Users/UserRob/Documents/R/win-library/3.3") )')
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
As with Linux, the code runs but the plot() command does not produce a file. This leads me to suspect that the problem is with rpy2, i.e. rpy2 cannot generate the plot with the robCompositions package's plot() S3 method.
I am writing a custom client (a web-based graph representation of an IPython Notebook) for an IPython application and the easiest way to manage IPython programmatically seems to be using the IPython.core.InteractiveShell instance.
Consider this example: when a Jupyter Notebook cell that uses rich output is executed with inline magic, Notebook shows the appropriate rich representation (a plotted image) of plt.show() inline:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2*np.pi*t)
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
plt.show()
I want to be able to retrieve the image when using IPython programmatically via its API, namely InteractiveShell, like this:
from IPython.core.interactiveshell import InteractiveShell
shell = InteractiveShell()
result = shell.run_cell("...above code of the cell here...")
# result either gives an error when using %matplotlib inline or retrieves
# no useful info if no line magic is present
The problem is that the InteractiveShell instance will not accept the %matplotlib inline magic, raising a NotImplementedError in its enable_gui method, which is, well, not implemented. I found very little information about this apart from a single issue on IPython's GitHub.
I know I can do this manually by using plt.savefig(), but that doesn't seem right to me, as I don't want to write manual interpretations each time I need another rich representation of the result. I feel like I'm missing a lot here in the way IPython works, so I'm asking for help in retrieving the results. What exactly does Jupyter Notebook do to retrieve the rich representation, and can it perhaps be done painlessly via other means? I'm looking at using jupyter_client, but for now that seems to be even more confusing.
UPDATE:
The io.capture_output context manager seems to be the way to go but I've been able to capture string outputs only (pretty much the same as using %%capture cell magic):
with io.capture_output() as captured:
    result = shell.run_cell(cell)
# captures strings only:
# captured.stdout = {str} '<matplotlib.figure.Figure at 0x4f486d8>'
I am an avid Python user. I have been programming and performing a lot of my statistics using R. Recently, I tried to go back into one of my notebooks to perform some statistical analysis. I have written over 5000 lines of code, with R functions scattered throughout my program. Unfortunately, I am unable to use any of the functions I have written before.
This is what I have done before:
%load_ext rmagic
import rpy2.robjects as R
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
import scipy.stats as sp
stats=importr('stats')
TSA = importr('TSA')
forecast = importr('forecast')
fUnitRoots = importr('fUnitRoots')
tseries = importr('tseries')
urca = importr('urca')
VARS = importr('vars')
zoo = importr('zoo')
aod = importr('aod')
Now, I can't even run any of this any more, as I get an import error: "r_magic extension has been moved".
Also, I have called R functions by doing the following:
%R acf(x)
The above statement no longer works.
But if I do...
R.r('acf(x)')
it works. This seems like an annoying change to have to incorporate into my large program. Is there a workaround for this?
Thanks
The rmagic is now in rpy2. Do:
%load_ext rpy2.ipython
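If the old extension name appears in many cells, the rename could be applied mechanically. A rough sketch, assuming the cell sources are available as plain strings: migrate_rmagic is a made-up helper name, and it only rewrites %load_ext / %reload_ext lines, leaving %R calls untouched.

```python
import re

# Matches a line such as "%load_ext rmagic" (optionally indented).
_OLD_EXT = re.compile(r"^(\s*%(?:load_ext|reload_ext)\s+)rmagic\s*$")


def migrate_rmagic(source):
    """Replace the old rmagic extension name with rpy2.ipython, line by line."""
    return "\n".join(
        _OLD_EXT.sub(r"\1rpy2.ipython", line) for line in source.split("\n")
    )


cell = "%load_ext rmagic\n%R acf(x)"
print(migrate_rmagic(cell))
```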