Rpy2 & ggplot2: LookupError 'print.ggplot' - python

Unhindered by any pre-existing knowledge of R, Rpy2 and ggplot2, I would nevertheless like to create a scatterplot of a trivial table from Python.
To set this up I've just installed:
Ubuntu 11.10 64 bit
R version 2.14.2 (from r-cran mirror)
ggplot2 (through R> install.packages('ggplot2'))
rpy2-2.2.5 (through easy_install)
Following this I am able to plot some example dataframes from an interactive R session using ggplot2.
However, when I merely try to import ggplot2 as I've seen in an example I found online, I get the following error:
from rpy2.robjects.lib import ggplot2
File ".../rpy2/robjects/lib/ggplot2.py", line 23, in <module>
class GGPlot(robjects.RObject):
File ".../rpy2/robjects/lib/ggplot2.py", line 26, in GGPlot
_rprint = ggplot2_env['print.ggplot']
File ".../rpy2/robjects/environments.py", line 14, in __getitem__
res = super(Environment, self).__getitem__(item)
LookupError: 'print.ggplot' not found
Can anyone tell me what I am doing wrong? As I said, the offending import comes from an online example, so it might well be that there is some other way I should be using ggplot2 through rpy2.
For reference, and unrelated to the problem above, here's an example of the dataframe I would like to plot once I get the import to work (which should not be a problem, looking at the examples). The idea is to create a scatter plot with the lengths on the x axis and the percentages on the y axis, with the boolean used to color the dots, which I would then like to save to a file (either image or PDF). Given that these requirements are very limited, alternative solutions are welcome as well.
  original.length row.retained percentage.retained
1            1875        FALSE               11.00
2            1143        FALSE               23.00
3             960        FALSE               44.00
4            1302        FALSE               66.00
5            2016         TRUE               87.00

There were changes in the R package ggplot2 that broke the rpy2 layer.
Try with a recent (I just fixed this) snapshot of the "default" branch (rpy2-2.3.0-dev) for the rpy2 code on bitbucket.
Edit: rpy2-2.3.0 is a couple of months behind schedule. I just pushed a bugfix release rpy2-2.2.6 that should address the problem.

Although I can't help you with a fix for the import error you're seeing, there is a similar example using lattice here: lattice with rpy2.
Also, the standard R plot function accepts coloring through the factor function (which you can feed the row.retained column). Example:
plot(original.length, percentage.retained, type="p", col=factor(row.retained))
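To illustrate why col=factor(...) works: as far as I understand it, factor() maps each distinct value to an integer level code, and plot() then uses those codes as indices into the color palette. Below is a minimal pure-Python sketch of that mapping. The helper name factor_codes is made up for illustration and is not part of R or rpy2.

```python
def factor_codes(values):
    """Return (levels, codes): the sorted distinct values and a 1-based
    code per input value, roughly what R's factor() stores internally."""
    levels = sorted(set(values))
    index = {level: position + 1 for position, level in enumerate(levels)}
    return levels, [index[value] for value in values]


# The row.retained column from the question
row_retained = [False, False, False, False, True]
levels, codes = factor_codes(row_retained)
print(levels)  # [False, True]
print(codes)   # [1, 1, 1, 1, 2]
```

With the default R palette, code 1 and code 2 would be drawn in two different colors, which is what separates the TRUE row from the FALSE rows.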

Based on fucitol's answer I've instead implemented the plot using both the default plot & lattice. Here are both the implementations:
from rpy2 import robjects
#Convert to R objects
original_lengths = robjects.IntVector(original_lengths)
percentages_retained = robjects.FloatVector(percentages_retained)
row_retained = robjects.StrVector(row_retained)
#Plot using standard plot
r = robjects.r
r.plot(x=percentages_retained,
       y=original_lengths,
       col=row_retained,
       main='Title',
       xlab='Percentage retained',
       ylab='Original length',
       sub='subtitle',
       pch=18)
#Plot using lattice
from rpy2.robjects import Formula
from rpy2.robjects.packages import importr
lattice = importr('lattice')
formula = Formula('lengths ~ percentages')
formula.getenvironment()['lengths'] = original_lengths
formula.getenvironment()['percentages'] = percentages_retained
p = lattice.xyplot(formula,
                   col=row_retained,
                   main='Title',
                   xlab='Percentage retained',
                   ylab='Original length',
                   sub='subtitle',
                   pch=18)
rprint = robjects.globalenv.get("print")
rprint(p)
It's a shame I can't get ggplot2 to work, as it produces nicer graphs by default and I regard working with dataframes as more explicit. Any help in that direction is still welcome!

If you have experience with Python but not with R, you can use numpy or pandas for data analysis and matplotlib for plotting.
Here is a small example of what this feels like:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'original_length': [1875, 1143, 960, 1302, 2016],
                   'row_retained': [False, False, False, False, True],
                   'percentage_retained': [11.0, 23.0, 44.0, 66.0, 87.0]})
fig, ax = plt.subplots()
# color by the boolean column; the marker sizes are random, just for show
ax.scatter(df.original_length, df.percentage_retained,
           c=np.where(df.row_retained, 'green', 'red'),
           s=np.random.randint(50, 500, 5)
           )
# take the single row where row_retained is True, so xy gets scalars
true_value = df[df.row_retained].iloc[0]
ax.annotate('This one is True',
            xy=(true_value.original_length, true_value.percentage_retained),
            xytext=(0.1, 0.001), textcoords='figure fraction',
            arrowprops=dict(arrowstyle="->"))
ax.grid()
ax.set_xlabel('Original Length')
ax.set_ylabel('Percentage Retained')
ax.margins(0.04)
plt.tight_layout()
plt.savefig('alternative.png')
pandas also has an experimental rpy2 interface.

The problem is caused by the latest ggplot2 version, which is 0.9.0. This version doesn't have the function print.ggplot(), which is found in ggplot2 version 0.8.9.
I tried to tinker with the rpy2 code to make it work with the newest ggplot2, but the extent of the changes seems to be quite large.
Meanwhile, just downgrade your ggplot2 version to 0.8.9.
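Putting the version numbers from the answers together, here is a rough sketch of a check that tells you whether a given combination is likely to hit the LookupError. It assumes the thresholds reported above (ggplot2 0.9.0 dropped print.ggplot, rpy2 2.2.6 should address it). The function names are hypothetical, and the version strings have to be looked up separately, for example with packageVersion("ggplot2") in R.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.9.0' into a tuple of ints.
    Only plain numeric versions are handled (no '-dev' suffixes)."""
    return tuple(int(part) for part in text.split("."))


def is_affected(rpy2_version, ggplot2_version):
    """True if this pair is expected to fail on import, based on the
    thresholds reported in this thread (an assumption, not verified)."""
    old_rpy2 = parse_version(rpy2_version) < (2, 2, 6)
    new_ggplot2 = parse_version(ggplot2_version) >= (0, 9, 0)
    return old_rpy2 and new_ggplot2


print(is_affected("2.2.5", "0.9.0"))  # True: the setup from the question
print(is_affected("2.2.5", "0.8.9"))  # False: ggplot2 downgraded
print(is_affected("2.2.6", "0.9.0"))  # False: rpy2 bugfix release
```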

Related

Unexpected behavior of pyplot in the seaborn library. Bug?

I'm trying to understand the pointplot function (Link to pointplot doc) to plot error bars.
Setting the 'errorbar' argument to 'sd' should plot the standard deviation along with the mean. But calculating the standard deviation manually results in a different value.
I used the example provided in the documentation:
import seaborn as sns
df = sns.load_dataset("penguins")
ax = sns.pointplot(data=df, x="island", y="body_mass_g", errorbar="sd")
data = ax.lines[1].get_ydata()
print(data[1] - data[0]) # prints 248.57843137254895
sd = df[df['island'] == 'Torgersen']['body_mass_g'].std()
print(sd) # prints 445.10794020256765
I expected both printed values to be the same, since both data[1] - data[0] and sd should be equal to the standard deviation of the variable 'body_mass_g' for the category 'Torgersen'. The other standard deviations plotted by sns.pointplot are also not as expected.
I must be missing something obvious here but for the life of me I can't figure it out.
Appreciate any help. I tested the code locally and in Google Colab with the same results.
My PC had an outdated version of seaborn (0.11.2), in which the 'errorbar' argument was still named 'ci'. Using the correct argument for that version resolves the problem. Strangely, Google Colab also uses version 0.11.2, contrary to their claim that they auto-update their packages.
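If the same script has to run on both old and new seaborn installs, one option is to pick the keyword from the version string. This is only a sketch: the helper name sd_errorbar_kwargs is made up, and the 0.12 cut-off is my understanding of when 'errorbar' replaced 'ci', so check it against the release notes of your version.

```python
def sd_errorbar_kwargs(seaborn_version):
    """Return the keyword argument that requests standard-deviation
    error bars for the given seaborn version string."""
    major, minor = (int(part) for part in seaborn_version.split(".")[:2])
    if (major, minor) >= (0, 12):
        # assumed: newer releases use the 'errorbar' parameter
        return {"errorbar": "sd"}
    # older releases such as 0.11.2 use 'ci' instead
    return {"ci": "sd"}


print(sd_errorbar_kwargs("0.11.2"))  # {'ci': 'sd'}
print(sd_errorbar_kwargs("0.12.0"))  # {'errorbar': 'sd'}

# Intended use (not executed here, needs seaborn installed):
# sns.pointplot(data=df, x="island", y="body_mass_g",
#               **sd_errorbar_kwargs(sns.__version__))
```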

Plotnine : Secondary y-axis (dual axes)

I am using python's wonderful plotnine package. I would like to make a plot with dual y-axis, let's say Celsius on the left axis and Fahrenheit on the right.
I have installed the latest version of plotnine, v0.10.1.
This says the feature was added in v0.10.0.
I tried to follow the syntax on how one might do this in R's ggplot (replacing 'dot' notation with underscores) as follows:
import pandas as pd
from plotnine import *
df = pd.DataFrame({
    'month':('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'),
    'temperature':(26.0,25.8,23.9,20.3,16.7,14.1,13.5,15.0,17.3,19.7,22.0,24.2),
})
df['month'] = pd.Categorical(df['month'], categories=('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'), ordered=True)
p = (ggplot(df, aes(x='month', y='temperature'))
    + theme_light()
    + geom_line(group=1)
    + scale_y_continuous(
        name='Celsius',
        sec_axis=sec_axis(trans=~.*1.8+32, name='Fahrenheit')
    )
)
p
This didn't like the specification of the transformation, so I tried a few different options. Removing this altogether produces the error:
NameError: name 'sec_axis' is not defined
The documentation does not contain a reference for sec_axis, and searching for 'secondary axis' doesn't help either.
How do you implement a secondary axis in plotnine?
The GitHub issue thread mentioned in the question does not say that the secondary axis feature has been implemented. The feature was added to the v0.10.0 milestone list before that version was released; a milestone list is a to-do list of what is planned for a release. The changelog of the actual release does not mention a secondary axis, which means the feature was planned but not implemented. Long story short, it didn't make it into the release.
So, I'm sorry to say that currently as of v0.10.0 and now v0.10.1 it seems that this feature isn't there yet in plotnine.
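This does not give you a second axis, but the conversion the question tries to pass as trans (times 1.8 plus 32) can at least be computed up front in plain Python, for example to add a Fahrenheit column to the data frame and plot or label it separately. A minimal sketch; the names celsius_to_fahrenheit and temperature_f are made up for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Same linear transformation the question attempts: C * 1.8 + 32."""
    return celsius * 1.8 + 32


# Monthly temperatures in Celsius, taken from the question
temperature_c = [26.0, 25.8, 23.9, 20.3, 16.7, 14.1,
                 13.5, 15.0, 17.3, 19.7, 22.0, 24.2]

# Rounded to one decimal to hide floating-point noise
temperature_f = [round(celsius_to_fahrenheit(c), 1) for c in temperature_c]
print(temperature_f[0])  # 78.8

# With the data frame from the question this could become a new column:
# df['temperature_f'] = df['temperature'] * 1.8 + 32
```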

rPy2 S3 method plot not working for robCompositions package

I am trying to generate a plot from the robCompositions package in R. I am working with the example shown here. Details about the package are found here.
This plot is produced following some clustering analysis with the robCompositions package.
My system is Ubuntu 16.10 64-bit. I set my R working directory using:
setwd('/home/UserRob/Downloads')
I installed the package and dependencies using:
install.packages(c('data.table','pls','robCompositions'))
I can execute the code in this example, in R, using the following rtest.R file:
library(robCompositions)
data(expenditures)
x <- expenditures
rr3 <- clustCoDa(x, k=6, distMethod = "Aitchison", method = "single",
transformation = "identity", scale = "none")
plot(rr3)
dev.off()
When I execute this in Ubuntu with Rscript /home/UserRob/Downloads/rtest.R, I get a plot saved to /home/UserRob/Downloads/Rplots.pdf.
I tried to run this code in rPy2 with the following code in rtest.py:
from rpy2.robjects import r
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
When I execute this using python /home/UserRob/Downloads/rtest.py, the plot is not produced. The Python variable rr3 is the same as that from the R usage above. So, the code runs correctly except for the last line - r('plot')(rr3). It appears that the r('plot') command is not running at all. I also tried the following in rPy2:
from rpy2 import robjects
from rpy2.robjects.packages import importr
grdevices = importr("grDevices")
graphics = importr("graphics")
from rpy2.robjects import r
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
grdevices.png("rtest.png")
graphics.plot(rr3)
grdevices.dev_off()
However, this also gave me the same result - the plot is not saved to rtest.png. All the code before the line graphics.plot(rr3) appears to run correctly.
Going by the documentation for R's OOP, I think I am using plot() the same way as the plot.lm() example shown there. It seems that plot() is an S3 method for the class clustCoDa (as documented here) and I just need to use plot(); line 6 of the doc actually just links to the usual plot() function.
Question:
Is there a problem with the plot() function in rPy2?
EDIT:
I have duplicated the problem in Windows 7 64-bit. Here are the details:
Python details:
Python 2.7.8
R details:
R x64 3.3.2
personal library folder: C:\Users\UserRob\Documents\R\win-library\3.3
Windows Environment Variables:
R_USER: C:\Python27\Lib\site-packages\rpy2
R_LIBS: C:\Users\UserRob\Documents\R\win-library\3.3
R_HOME: C:\Program Files\R\R-3.3.2
My rtest_win.py file:
from rpy2.robjects import r
r('.libPaths( c( .libPaths(), "C:/Users/UserRob/Documents/R/win-library/3.3") )')
r.library('robCompositions')
r('data')('expenditures')
x = r('expenditures')
rr3 = r('clustCoDa')(x,k=6,distMethod='Aitchison',method='single',scale='none',
transformation='identity')
r('plot')(rr3)
As with Linux, the code runs but the plot() command does not produce a file. This leads me to suspect that the problem is with rPy2, i.e. rPy2 cannot generate the plot with the robCompositions package's plot() S3 method.

How to use Stargazer to print fit in rpy2

I'm learning how to use rpy2, and I would like to create formatted regression output using the stargazer package. My best guess of how to do this is the following code:
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
stargazer = importr('stargazer')
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r = robjects.r
df = pd.DataFrame({'x': [1,2,3,4,5],
'y': [2,1,3,5,4]})
fit = r.lm('y~x', data=df)
print fit
print r.stargazer(fit)
However, when I run it, I get the following output:
Coefficients:
(Intercept) x
0.6 0.8
[1] "\n"
[2] "% Error: Unrecognized object type.\n"
So the fit is being generated, and prints fine. But stargazer doesn't seem to recognize the fit object as something it can parse.
Any suggestions? Am I calling stargazer incorrectly in this context?
I should mention that I am running this in Python 2.7.5 on a windows 10 machine, with R 3.3.2, and rpy2 version 2.7.8 from the unofficial windows binary. So it could just be a problem with the windows build, but it seems odd that everything except stargazer would work.
I am not familiar with the R package stargazer but from a quick look at the documentation this seems to be the correct usage.
Before anything else, you may want to check whether the issue is with execution or with printing. At which of these two lines does it fail?
p = r.stargazer(fit)
print(p)
If the failure is with the execution, you may want to move more code to R and see if you reach a point where you get it to work. If not, this is likely an issue with the R code and/or stargazer. If you get it to work the issue is on the rpy2/conversion side.
rcode = """
df <- data.frame(x = c(1,2,3,4,5),
y = c(2,1,3,5,4))
fit <- lm('y~x', data=df)
p <- stargazer(fit)
"""
# parse and evaluate the R code
r(rcode)
# intermediate objects can be retrieved from the `globalenv` to
# investigate where they differ from the ones obtained earlier.
# For example:
print(robjects.globalenv["p"])
Now that we showed that it is likely an issue on the stargazer side, we can make the use of arbitrary data frames a matter of binding it to a symbol in R's globalenv:
robjects.globalenv["df"] = df
rcode = """
fit <- lm('y~x', data=df)
p <- stargazer(fit)
"""
# parse and evaluate the R code
r(rcode)
print(robjects.globalenv["p"])

Getting ggplot for Python to make a bar chart

Following this simple example, I am trying to make a dirt simple bar chart using yhat's ggplot python module. Here is the code suggested previously on StackOverflow:
In [1]:
from ggplot import *
import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4], "y":[1,3,4,2]})
ggplot(aes(x="x", weight="y"), df) + geom_bar()
But I get an error:
Out[1]:
<repr(<ggplot.ggplot.ggplot at 0x104b18ad0>) failed: UnboundLocalError: local variable 'ax' referenced before assignment>
This works with a newer version of ggplot-python. It's not that pretty (x-axis labels); we really have to work on that :-(
