Integrating R code via rpy2 into a Python package

I'm attempting to build a Python package, and use rpy2 and a handful of R scripts to integrate R seamlessly into that package.
This is code that I've prototyped previously in a Jupyter notebook. What this usually looks like is:
import rpy2.robjects
# load in R script containing some useful functions
rpy2.robjects.r("source('feature.R')")
# generate a python binding for 'useful_func' described in the R script
useful_func = rpy2.robjects.globalenv['useful_func']
result = useful_func(data)
This has worked well in Jupyter, as long as all my R scripts are in the same directory as the notebook I'm working with.
The package I'm trying to build looks something like:
package/
  -__init__.py
  -package.py
  -lib/
    -__init__.py
    -feature1.py
    -feature1.R
I can import feature1 easily, but when it tries to source feature1.R, R can't find the file. I can fix this by providing an absolute path to feature1.R but obviously this won't work when I attempt to distribute the package. How can I generate an absolute path to a resource file within a package in a way that is zip-safe?

...and I figured it out. Answering in case other folks have a similar form of this issue.
In feature1.py:
import importlib.resources as pkg_resources
import rpy2.robjects
# resolve a real filesystem path to the packaged R script, then source it
with pkg_resources.path('lib', 'feature1.R') as filepath:
    rpy2.robjects.r("source('" + str(filepath) + "')")
useful_func = rpy2.robjects.globalenv['useful_func']
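
On Python 3.9 and later, importlib.resources.files() combined with as_file() is another way to obtain a real filesystem path for a packaged resource, including when the package is imported from a zip. The following is only a minimal sketch, assuming the package is importable as lib as in the answer above. The helper names resource_path, r_source_command and load_useful_func are hypothetical, and the rpy2 import is deferred so that the path handling can be tried without R installed.

```python
from contextlib import contextmanager
from importlib.resources import as_file, files


def r_source_command(path):
    """Build an R source() call for a pathlib path.

    Forward slashes are used so that Windows backslashes are not read as
    escape sequences inside the R string literal.
    """
    return "source('{}')".format(path.as_posix())


@contextmanager
def resource_path(package, name):
    """Yield a real filesystem path for a resource bundled in `package`.

    as_file() extracts the resource to a temporary file when the package
    lives inside a zip archive and cleans it up again on exit.
    """
    with as_file(files(package).joinpath(name)) as path:
        yield path


def load_useful_func(package="lib", script="feature1.R"):
    """Source the bundled R script and return its `useful_func` binding."""
    import rpy2.robjects as robjects  # deferred: needs R and rpy2 installed

    # The script must be sourced while the path is still valid.
    with resource_path(package, script) as path:
        robjects.r(r_source_command(path))
    return robjects.globalenv["useful_func"]
```

In feature1.py this would be used as useful_func = load_useful_func(). A path that itself contains a single quote would still break the R string, so this sketch assumes ordinary install locations.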

You have already resolved the path issue in your package yourself. The following only points out convenience code in rpy2 that lets you automagically map your R source file to a Python module (just like rpy2's importr() does, but without needing to have the R code in an R package):
https://rpy2.github.io/doc/v3.1.x/html/robjects_rpackages.html#importing-arbitrary-r-code-as-a-package
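
The class described at that link is SignatureTranslatedAnonymousPackage, which takes a string of R code and a name. Because it works from a string, the R script can be read directly as a package resource without needing a filesystem path at all. A minimal sketch, assuming Python 3.9+ and a package importable as lib; the helper names read_r_source and import_r_script are hypothetical, and the rpy2 import is deferred so the resource-reading part can be exercised without R:

```python
from importlib.resources import files


def read_r_source(package, script):
    """Return the text of an R script bundled inside `package`."""
    return files(package).joinpath(script).read_text(encoding="utf-8")


def import_r_script(package, script, name):
    """Expose the functions defined in a bundled R script as attributes
    of a Python object, similar to what importr() returns."""
    # Deferred import: needs R and rpy2 installed.
    from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage

    return SignatureTranslatedAnonymousPackage(read_r_source(package, script), name)
```

With this, feature1.py could contain feature1 = import_r_script("lib", "feature1.R", "feature1") and then call feature1.useful_func(data).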

Related

Load file from same folder as Python 3.x script inside package

Opening and loading data from a file that is situated in the same folder as the currently executing Python 3.x script can be done like this:
import os
mydata_path = os.path.join(os.path.dirname(__file__), "mydata.txt")
with open(mydata_path, 'r') as file:
    data = file.read()
However, once the script and mydata.txt become part of a Python package, this is not as straightforward anymore. I have managed to do this using a concoction of functions from the pkg_resources module such as resource_exists(), resource_listdir(), resource_isdir() and resource_string(). I won't put my code here because it is horrible and broken (but it sort of works).
Anyhow, my question is: is there no way to load a file from the same folder as the currently executing Python script that works regardless of whether the files are in a package or not?

You can use importlib.resources.read_text in order to read a file that's located relative to a package:
from importlib.resources import read_text
data = read_text('mypkg.foo', 'mydata.txt')
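
To cover both situations from the question (inside a package or as a plain script), one option is a small helper that reads the file as a package resource when a package name is available and falls back to the script's own directory otherwise. This is only a sketch, assuming Python 3.9+ for importlib.resources.files(); read_sibling_text is a hypothetical name, not part of any library.

```python
from importlib.resources import files
from pathlib import Path


def read_sibling_text(name, package=None, script_file=None):
    """Read a text file that sits next to the calling module.

    If `package` is a non-empty package name, the file is read as a
    package resource. Otherwise it is read from the directory that
    contains `script_file`.
    """
    if package:
        return files(package).joinpath(name).read_text(encoding="utf-8")
    if script_file is None:
        raise ValueError("need either a package name or a script path")
    folder = Path(script_file).resolve().parent
    return folder.joinpath(name).read_text(encoding="utf-8")
```

A module would call it as data = read_sibling_text("mydata.txt", __package__, __file__). When the file is run directly as a script, __package__ is empty or None, so the fallback branch is taken.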

Correct way to link to a python file

I am writing code that is organized across several files. To keep the folders and the CMakeLists.txt organized, a pythonlibs folder is created during the build process, containing links to the Python libraries in the /build/src/XXXX/ folder.
In the python file, I add to the python path:
import sys
sys.path.insert(1, '/opt/hpc/softwares/erfe/erfe/build/pythonlibs')
import libmsym as msym
When I run the main python file, this one library, libmsym, fails with:
    import libmsym as msym
  File "/opt/hpc/softwares/erfe/erfe/build/pythonlibs/libmsym.py", line 15, in <module>
    from . import _libmsym_install_location, export
ImportError: attempted relative import with no known parent package
I created the link using cmake, but I believe it uses the ln command (I tried both hard and symbolic links). Is there a way to prevent this behavior without changing the library itself, for example by creating the link in another way?
Thanks.

how to use rpy2 within a packrat environment?

I am trying to use an R package that I installed with the R package 'packrat', which lets you create a virtual environment similar to virtualenv in Python, but I have not succeeded.
Within a console using R I can run successfully the following code:
cd /path/to/packrat/environment
R # this launches an R console in the packrat environment
library(mycustompackage)
result = mycustompackage::myfunc()
q()
I would like to do the same using rpy2, but I'm unable to activate the packrat environment. Here is what I have tried without success:
from rpy2.robjects import r
from rpy2.robjects.packages import importr
packrat_dir = r.setwd('/path/to/packrat/environment')
importr('mycustompackage')
result = r.mycustompackage.myfunc()
But it fails at 'importr' because it cannot find the package 'mycustompackage'. This was also unsuccessful:
importr('mycustompackage', lib_loc='/path/to/packrat/environment')
And so was this:
os.environ['R_HOME'] = '/path/to/packrat/environment'
importr('mycustompackage', lib_loc ='/path/to/packrat/environment')
Any suggestion on how to use rpy2 with packrat environments?

I am not familiar with the R package packrat, but the bash + R and Python/rpy2 versions differ in a subtle way that might matter a lot: in the bash + R case, R starts up already inside your packrat project directory, whereas in the Python/rpy2 case R starts in a different directory and is only moved to the packrat project directory afterwards with setwd().
I read that packrat relies on a .Rprofile file (https://rstudio.github.io/packrat/limitations.html), which R evaluates at startup if it is in the current directory. I suspect the issue comes down to how packrat is used rather than to rpy2 itself.

Very good remark (hidden file = forgotten file). I found out how to make it run:
from rpy2.robjects import r
from rpy2.robjects.packages import importr
# Init the packrat environment
r.setwd('/path/to/packrat/environment')
r.source('.Rprofile')
# use the packages it contains
importr('mycustompackage')
result = r.myfunc()
lgautier, you made my day, thanks a lot.

pdoc can't import the function from other module

I'm using Python 2.7 and trying to generate documentation for our testing project using pdoc.
pdoc is located here: D:\dev\Python27\Scripts
the regression project here: C:\views\md_LDB_RegressionTests_v03.1_laptop\mts\Tests\LDB\Regression\Tests
We are using proboscis for our tests and I'm trying to create HTML documentation for a separate group of tests, which is a separate Python file in my case.
I run this command:
D:\dev\Python27\Scripts>python pdoc --html "C:\views\md_LDB_RegressionTests_v03.1_laptop\mts\Tests\LDB\Regression\Tests\tests\check_system_management\check_capabilities_encoding_problems.py"
and get this output:
Traceback (most recent call last):
  File "pdoc", line 458, in <module>
    module = imp.load_source('__pdoc_file_module__', fp, f)
  File "C:\views\md_LDB_RegressionTests_v03.1_laptop\mts\Tests\LDB\Regression\Tests\tests\check_system_management\check_capabilities_encoding_problems.py", line 4, in <module>
    from common.builders.system_request import default_create_system, create_capability
ImportError: No module named common.builders.system_request
pdoc can't import functions from other modules in our regression project...
The structure of our project looks like this:
-Tests (C:\views\md_LDB_RegressionTests_v03.1_laptop\mts\Tests\LDB\Regression\Tests)
  -"common" package (with init file)
    -"builders" package
      -system_request.py
  -"test" package
    -check_system_management package
      -check_capabilities_encoding_problems.py - this is the file I want to generate documentation for
Of course there are lots of other packages, but I'm not sure it makes sense to describe the whole structure now.
The import part of the check_capabilities_encoding_problems.py looks like this:
import urllib
from hamcrest import assert_that, all_of
from proboscis import test, before_class, after_class
from common.builders.system_request import default_create_system, create_capability
from common.entity.LDBChecks import LDBChecks
How can I tell pdoc where to look for the functions from other modules?
Thank you!

You can set the PYTHONPATH environment variable. It lists the directories where Python looks for modules and packages, both third-party ones and your own.
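
As an illustration of that suggestion, the variable can also be set for a single child process from Python instead of changing it system-wide. This is a minimal sketch; env_with_pythonpath is a hypothetical helper name, and the project root shown in the usage note is the Tests directory from the question.

```python
import os


def env_with_pythonpath(root, base_env=None):
    """Return a copy of the environment with `root` prepended to PYTHONPATH.

    `base_env` defaults to the current process environment. Any existing
    PYTHONPATH entries are kept after `root`.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = root + os.pathsep + existing
    else:
        env["PYTHONPATH"] = root
    return env
```

The returned dictionary can be passed as the env argument of subprocess.call() when launching the pdoc command from the question, with root set to the Tests directory that contains the common package, so that from common.builders.system_request import ... can be resolved.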

When using pdoc with my Spyder IDE, I use the following script to add a directory to pdoc path
import pdoc
libpath = r'C:\Path\To\Module'
pdoc.import_path.append(libpath)
mod = pdoc.import_module('ModuleName')
doc = pdoc.Module(mod)
string = doc.html()
pdoc.import_path is the list of paths pdoc searches for your module; by default it equals sys.path. More info can be found in the pdoc documentation.

pydoc and pdoc have to import your code to document it!
If you run pdoc3 --html . or pydoc -w . from the directory that contains all the modules, it should work. If the modules are not all in the same directory:
make sure the main module in each directory appends the full path of that directory to sys.path.
sys.path.append("D:/Coding/project/....")
A relative path will not do the trick!

Markdown to LaTeX in Python

There are a few versions of the Markdown-Latex package for Python, but I can't get any of them to work with the current version of the Markdown package. Does anyone have a working example in Python like:
lines = markdown.markdown(lines, extensions=[MarkdownLatex()])
Thanks!

You can rename the extension's file to "mdx_latex.py" and then, from the same directory, run the following code:
import markdown
md = markdown.Markdown(extensions=['latex'])
out = md.convert(text)
