Pass variables between R and Python in an R Notebook

When working in an R Notebook:
If I define a variable in an R chunk it is added to the global environment and is accessible to all other R chunks.
```{r}
a = 1 + 4
a
```
However, I haven't been able to pass the variable into a Python chunk or to access R's Global Environment from Python, even using rpy2.
```{python, engine.path="/anaconda/bin/python"}
import rpy2.robjects as robjects
a = robjects.r['a']
print(a[0])
```
Is there a way to do this? If not, I don't see the point of using a non-R language in an R Notebook. I could use magics in a Jupyter Notebook, but that doesn't seem as easy.
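A minimal sketch, assuming a recent rmarkdown/knitr setup where the reticulate package provides the Python engine instead of rpy2: R objects from earlier chunks are then exposed to Python through an `r` proxy object (and Python objects are visible from R via `py$`).
```{python}
# Assumes knitr's reticulate-based Python engine, not rpy2:
print(r.a)  # the R variable `a` defined in the earlier R chunk
```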

Related

Calling functions from within R packages in Python using importr

I am using a feature selection algorithm called mRMRe in R, but I need to call it from Python. I have successfully installed the package and am able to call it from Python. Now I need to access some functions within the R mRMRe package, like mRMR.data, to convert the dataframe into the format needed by the algorithm.
from rpy2.robjects.packages import importr
utils = importr('utils') #-- Only once.
utils.install_packages('mRMRe')
# Now we begin by loading in the R packages
pymRMR = importr('mRMRe')
pymRMR
Out[53]: rpy2.robjects.packages.Package as a <module 'mRMRe'>
However, when I try to call its function mRMR.data I get an error:
AttributeError: module 'mRMRe' has no attribute 'mRMR'
The same happens if I try with a different library:
datasets = importr('datasets')
datasets.data.fetch('mtcars')
Traceback (most recent call last):
File "<ipython-input-56-b036c6da58e1>", line 2, in <module>
datasets.data.fetch('mtcars')
AttributeError: module 'datasets' has no attribute 'data'
I got this datasets example from a link I found.
I am not sure what I am doing wrong. I had earlier imported and used R's medcouple function from mrfDepth as below:
import rpy2.robjects as ro
#now import the importr() method
from rpy2.robjects.packages import importr
utils = importr('utils') #-- Only once.
utils.install_packages('mrfDepth')
# Now we begin by loading in the R packages
mrfdepth = importr('mrfDepth')
mc = mrfdepth.medcouple(yr)[0]
return mc
Can someone please help me to resolve this?
You're only importing the base module; you need to import it entirely. You'd think Python would do that automatically, but apparently it doesn't. See this SO answer.
from mRMRe import *
from datasets import *
Edit: Ah, yeah, that applies to explicit Python modules. I think the syntax for calling functions of sub-packages is possibly different. Try this:
import rpy2.robjects.packages as packages
datasets = packages.importr('datasets')
mtcars = packages.data(datasets).fetch('mtcars')['mtcars']
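One more detail worth noting, grounded in rpy2's documented name translation: importr exposes R names with dots replaced by underscores, since dots are not valid in Python identifiers. A sketch of what the call would look like (the arguments are omitted, as they depend on the package):
from rpy2.robjects.packages import importr

mrmre = importr('mRMRe')
# importr translates '.' in R symbol names to '_' by default,
# so R's mRMR.data is exposed to Python as mRMR_data.
# mrmre.mRMR_data(...)  # hypothetical call; real arguments omitted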
I used to import R packages and use them inside my Python code, but recently I improvised a method where you can simply use your R code and hand the required tasks to it. Have a look at https://stackoverflow.com/a/55900840/5350311; it can be useful for your case.

import a function from another .ipynb file

I defined a hello world function in a file called 'functions.ipynb'. Now, I would like to import functions in another file by using "import functions". I am sure that they are in the same folder. However, it still shows "ImportError: No module named functions". By the way, I am using Jupyter Notebook. Thanks a lot!
You'll want to use the ipynb package/module importer. You'll need to install it: pip install ipynb.
Create a Notebook named my_functions.ipynb. Add a simple function to it.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
Then, create a second IPython Notebook and import this function with:
from ipynb.fs.full.my_functions import factorial
Then you can use it as if it was in the same IPython Notebook:
testing = factorial(5)
See the documentation for more details.
For my use case the ipynb import didn't work for some reason. I had to use a Jupyter Notebook magic to import my function:
%run MyOtherNotebook.ipynb  # this is where my function was stored
function(df)  # then simply run the function
@David Rinck's answer solved the problem, but I'd like to recommend adding the boilerplate __name__ == "__main__" guard to scripts you don't want to accidentally invoke. It works the same way as in a usual Python file.
If a notebook a.ipynb is imported by another notebook b.ipynb with
from ipynb.fs.full.a import factorial
the __name__ in a.ipynb would be ipynb.fs.full.a rather than "__main__".
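A minimal sketch of the guard inside the hypothetical a.ipynb:
# a.ipynb -- this block runs when the notebook is executed directly,
# but not when it is imported via ipynb.fs.full.a
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

if __name__ == "__main__":
    print(factorial(5))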
You can save functions.ipynb as functions.py and then import the file with import functions. You can then use any function defined in the functions file as functions.function_name.
For example, if add is a function,
functions.add(5, 3)
will work after importing.

grDevices holding file open

I'm working on a proof of concept using rpy2 to tie an existing R package to a web service. I do have the source to the package, if that is needed to fix this issue. I'm also currently developing on Windows, but if this problem is solved by using Linux instead, that's fine, as that's my planned environment.
For my first point in this POC, I'm trying to capture a chart made by this package, and serve it up to a web request using Flask. The complete code:
from flask import Flask, Response
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
from tempfile import TemporaryDirectory
from os import path

app = Flask(__name__)

null = ro.r("NULL")
numeric = ro.r("numeric")
grdevices = importr("grDevices")
efm = importr('euroformix')

@app.route('/')
def index():
    table = efm.tableReader('stain.txt')
    list = efm.sample_tableToList(table)
    with TemporaryDirectory() as dir_name:
        print("Working in {0}".format(dir_name))
        png_path = path.join(dir_name, "epg_mix.png")
        print("png path {0}".format(png_path))
        grdevices.png(file=png_path, width=512, height=512)
        # Do Data Science Stuff Here
        grdevices.dev_off()
        with open(png_path, 'rb') as f:
            png = f.read()
        return Response(png, "image/png")

if __name__ == '__main__':
    app.run(debug=True)
When hitting the service, I get back PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\matt\\AppData\\Local\\Temp\\tmpgg65cagq\\epg_mix.png'
Looking at the call stack, it happens when TemporaryDirectory() goes to clean up. Using the Flask debugger, I can see the png variable is also empty.
So, how do I make grDevices close the file? Or do I need to go about my POC a different way?
rpy2 is not fully supported on Windows, and what works on Linux (or OS X) might not work there. Since you are developing a PoC with Flask, I'd encourage you to try using Docker (with docker-machine on Windows). You could use rpy2's Docker image as a base image.
However, this is just using the R functions png() and dev.off(), so it "should" work.
I have 3 suggestions:
1. Does your "Do Data Science Stuff Here" block make any R plot? If not, this would explain why your Python object png is empty.
2. If you are using R's grid system (e.g., through lattice or ggplot2) and evaluating strings as R code, it is preferable to explicitly ask R to print the figure. For example:
p <- ggplot(mydata) + geom_point(aes(x=x, y=y))
print(p)
rather than
ggplot(mydata) + geom_point(aes(x=x, y=y))
3. Try moving return Response(png, "image/png") outside the context manager block for TemporaryDirectory, as sketched below.
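A minimal sketch of suggestion 3, reusing the handler from the question (with the question's list variable renamed to sample_list to avoid shadowing the builtin): the bytes are read while the temporary directory still exists, but the Response is built after the context manager has cleaned up.
@app.route('/')
def index():
    table = efm.tableReader('stain.txt')
    sample_list = efm.sample_tableToList(table)
    with TemporaryDirectory() as dir_name:
        png_path = path.join(dir_name, "epg_mix.png")
        grdevices.png(file=png_path, width=512, height=512)
        # ... plotting code ...
        grdevices.dev_off()  # close the device so R releases its handle on the file
        with open(png_path, 'rb') as f:
            png = f.read()
    # The temporary directory is removed here; the bytes live on in png.
    return Response(png, "image/png")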

Configure a first cell by default in Jupyter notebooks

Is there a way to configure a default first cell for a specific Python kernel in the Jupyter notebook? I agree that default Python imports go against good coding practice.
So, can I configure the notebook such that the first cell of a new python notebook is always
import numpy as np
for instance?
Creating an IPython profile as mentioned above is a good first solution, but IMO it isn't totally satisfying, especially when it comes to code sharing.
The names of the libraries imported through the exec_lines option do not appear in the notebook, so you can easily forget them. And running the code under another profile or on another machine would raise an error.
Therefore I would recommend using a Jupyter notebook extension, because the imported libraries are displayed. It saves you from always importing the same libraries at the beginning of a notebook.
First you need to install Jupyter's nbextensions.
You can either clone the repo: https://github.com/ipython-contrib/jupyter_contrib_nbextensions
or use pip: pip install jupyter_contrib_nbextensions
Then you can create an nbextension by adding a folder 'default_cells' to the path where the nbextensions are installed. For example, on Ubuntu it's /usr/local/share/jupyter/nbextensions/; on Windows it's probably C:\Users\xxx.xxx\AppData\Roaming\jupyter\nbextensions\
You have to create 3 files in this folder:
main.js, which contains the JS code of the extension
default_cells.yaml, the description for the API in Jupyter
README.md, the usual description for the reader, which appears in the API
I used the code from https://github.com/jupyter/notebook/issues/1451. My main.js is:
define([
    'base/js/namespace'
], function(
    Jupyter
) {
    function load_ipython_extension() {
        if (Jupyter.notebook.get_cells().length === 1) {
            // do your thing
            Jupyter.notebook.insert_cell_above('code', 0).set_text("# Scientific libraries\nimport numpy as np\nimport scipy\n\n# import Pandas\n\nimport pandas as pd\n\n# Graphic libraries\n\nimport matplotlib as plt\n%matplotlib inline\nimport seaborn as sns\nfrom plotly.offline import init_notebook_mode, iplot, download_plotlyjs\ninit_notebook_mode()\nimport plotly.graph_objs as go\n\n# Extra options \n\npd.options.display.max_rows = 10\npd.set_option('max_columns', 50)\nsns.set(style='ticks', context='talk')\n\n# Creating alias for magic commands\n%alias_magic t time");
        }
    }
    return {
        load_ipython_extension: load_ipython_extension
    };
});
The .yaml file has to be formatted like this:
Type: IPython Notebook Extension
Compatibility: 3.x, 4.x
Name: Default cells
Main: main.js
Link: README.md
Description: |
    Add a default cell for each new notebook. Useful when you always import the same libraries.
Parameters:
- none
and the README.md
default_cells
=========
Add default cells to each new notebook. You have to modify this line in the main.js file to change your default cell. For example:
`Jupyter.notebook.insert_cell_above('code', 0).set_text("import numpy as np\nimport pandas as pd")`
You can also add another default cell by creating a new line just below:
`Jupyter.notebook.insert_cell_above('code', 1).set_text("from sklearn.metrics import mean_squared_error")`
**Don't forget to increment the cell index if you want more than one extra cell.**
Then you just have to enable the 'Default cells' extension in the new 'Nbextensions' tab that appears in Jupyter.
The only issue is that it detects whether the notebook is new by looking at the number of cells in it. If you wrote all your code in one cell, it will treat the notebook as new and still add the default cells.
Another half-solution: keep the default code in a file, and manually type and execute a %load command in your first cell.
I keep my standard imports in firstcell.py:
%reload_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
...
Then in each new notebook, I type and run %load firstcell.py in the first cell, and jupyter changes the first cell contents to
# %load firstcell.py
%reload_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
If you really just want a single import statement, this doesn't get you anything, but if you have several you always want to use, this might help.
Go to this directory:
~/.ipython/profile_default/startup/
You can read the README:
This is the IPython startup directory
.py and .ipy files in this directory will be run prior to any code
or files specified via the exec_lines or exec_files configurables
whenever you load this profile.
Files will be run in lexicographical order, so you can control the
execution order of files with a prefix, e.g.::
00-first.py
50-middle.py
99-last.ipy
So you just need to create a file there like 00_imports.py which contains:
import numpy as np
If you want to add magics like %matplotlib inline, use a .ipy file, which accepts them directly as well.
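For example, a hypothetical ~/.ipython/profile_default/startup/99-magics.ipy (the name is illustrative) could contain both magics and plain Python:
# 99-magics.ipy -- .ipy startup files may mix IPython magics with ordinary Python
%matplotlib inline
import numpy as np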
Alternatively, there seems to exist another solution with notebook extension, but I don't know how it works, see here for the github issue of the topic:
https://github.com/jupyter/notebook/issues/640
HTH
An alternative which I find quite handy is to use the %load command in your notebook.
As an example, I stored the following one-line snippet in a Python file __JN_init.py:
import numpy as np
Then whenever you need it, you could just type in:
%load __JN_init.py
and run the cell. The intended packages will be loaded. The advantage is that you can keep a set of commonly used initialization code around with almost no setup time.
I came up with this:
1 - Create a startup script that will check for a .jupyternotebookrc file:
# ~/.ipython/profile_default/startup/run_jupyternotebookrc.py
import os
import sys

if 'ipykernel' in sys.modules:  # hackish way to check whether it's running a notebook
    path = os.getcwd()
    while path != "/" and ".jupyternotebookrc" not in os.listdir(path):
        path = os.path.abspath(path + "/../")
    full_path = os.path.join(path, ".jupyternotebookrc")
    if os.path.exists(full_path):
        get_ipython().run_cell(open(full_path).read(), store_history=False)
2 - Create a configuration file in your project with the code you'd like to run:
# .jupyternotebookrc in any folder or parent folder of the notebook
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
You could commit and share your .jupyternotebookrc with others, but they'll also need the startup script that checks for it.
A quick and also flexible solution is to create template notebooks, e.g.
One notebook with specific imports for a python 2.7 kernel:
a_template_for_python27.ipynb
Another notebook with different imports:
a_template_for_python36.ipynb
The a_ prefix has the advantage that your notebook shows up at the top of the file list.
Now you can duplicate the notebook whenever you need it. The advantage over %load firstcell.py is that you don't need another file.
However, the problem with this approach is that the imports do not change dynamically when you want to start an existing notebook with another kernel.
While looking into the same question I found the jupytemplate project, a pretty good lightweight solution for this kind of problem. Jupytemplate copies a template notebook on top of the notebook you are working on when you initialise the template or press a button. Afterwards, the inserted cells are a completely normal part of your notebook and can be edited/converted/downloaded/exported/imported like any other notebook.
GitHub project: jupytemplate

Calling R script from python using rpy2

I'm very new to rpy2, as well as R.
I basically have an R script, script.R, which contains functions such as rfunc(folder). It is located in the same directory as my Python script. I want to call it from Python and then launch one of its functions. I do not need any output from the R function. I know it must be very basic, but I cannot find examples of Python code calling an R script.
What I am currently doing, in Python:
import rpy2.robjects as robjects

def pyFunction(folder):
    # do python stuff
    r = robjects.r
    r[r.source("script.R")]
    r["rfunc(folder)"]
    # do python stuff

pyFunction(folder)
I am getting an error on the line with source:
r[r.source("script.R")]
File "/usr/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 226, in __getitem__
res = _globalenv.get(item)
TypeError: argument 1 must be string, not ListVector
I don't quite understand how the argument I give it is not a string, and I guess the same problem will then happen on the next line, with folder being a Python string and not an R object.
So, how can I properly call my script?
source is an R function that runs an R source file. Therefore, in rpy2 there are two ways to call it, either:
import rpy2.robjects as robjects
r = robjects.r
r['source']('script.R')
or
import rpy2.robjects as robjects
r = robjects.r
r.source('script.R')
r[r.source("script.R")] is the wrong way to do it: r.source(...) already runs the script, and the outer r[...] then tries to look up its return value (a ListVector) as a name, which is what causes the TypeError. The same idea applies to the next line.
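Once the script has been sourced, the function can be looked up by name and called directly; rpy2 converts the Python string argument to an R character vector. A minimal sketch using the question's names:
import rpy2.robjects as robjects

r = robjects.r
r['source']('script.R')              # load script.R into R's global environment
rfunc = robjects.globalenv['rfunc']  # look the R function up by name
rfunc(folder)                        # a Python str is converted to an R character vector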
