Problems with reticulate in R studio and importing python modules

I'm trying to run reticulate and import Python modules within RStudio (specifically in R Markdown). The R code chunk seems to do what is expected (i.e. install the Python modules) and does not seem to produce any errors, but the Python code chunk does not do what is expected (i.e. import the installed packages). It produces no output and no errors, which is somewhat strange.
I've tried a fresh install of reticulate, the devtools version of reticulate, a fresh install of RStudio, and the full conda path rather than the environment name, but none of these works. I'm at a loss to figure out what is going wrong. I've also searched Stack Overflow and tried various suggestions, but nothing seems to work. Additionally, I have Miniconda and Python installed (and Python packages and scripts run perfectly fine). If anyone is able to help, that would be fantastic.
(Apologies for formatting, the last backticks indicating the end of the code chunks aren't showing up properly)
## R code chunk
```
```{r}
library("reticulate")
#devtools::install_github("rstudio/reticulate")
conda_create("my_project_env")
py_install(packages = c("numpy","pandas","scikit-learn","matplotlib","seaborn","statsmodels"))
py_install(packages = c("IPython"))
# Either of these seem to "work" for installation
#conda_install(packages = c("numpy","pandas","scikit-learn","matplotlib","seaborn","statsmodels"))
#conda_install(packages = c("IPython"))
conda_list()
use_condaenv("my_project_env")
```
The python code chunk below seems to "run" but does not produce any output or errors (such as the python module could not be found) and I am unable to use the modules.
## Python code chunk
```
```{python}
# Main packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

It seems the solution was to run the imports in an R code chunk in RStudio, in the following manner:
library("reticulate")
conda_create("my_project_env")
py_install(packages = c("numpy","pandas","scikit-learn","matplotlib","seaborn","statsmodels"))
conda_list()
use_condaenv("full_path_to_python_for my_project_env")
py_run_string('import numpy as np')
py_run_string('import pandas as pd')
py_run_string('import matplotlib.pyplot as plt')
py_run_string('import seaborn as sns')
From there, I was able to run the python functions in the python code chunks without issue.
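To see what a Python chunk is actually bound to, a small check like the one below can be pasted into a `{python}` chunk. This is only a sketch: the helper name `missing_modules` and the `MODULES` list are made up for illustration, and it relies on nothing beyond the standard library. It prints the interpreter the session is using and lists which of the modules from the question that interpreter cannot locate.

```python
import importlib.util
import sys

# Modules the question tries to import; adjust as needed.
MODULES = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter this Python session is running on.
print("Interpreter:", sys.executable)
print("Not importable here:", missing_modules(MODULES))
```

If the printed interpreter is not the one inside my_project_env, the packages were probably installed into a different environment from the one the chunk is running in.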

Related

netCDF4 has no module Dataset using Anaconda

I am using Anaconda to manage my environments, and I have a strange problem with netCDF4.
I have several Jupyter notebooks in my environment that I've been using with netCDF4, no problem at all. I'm only interested in reading NetCDF files, so I'm only really using the Dataset.
Now I'm implementing the algorithm from my Jupyter notebooks in a Python package, and I get this error (in VS Code):
No name 'Dataset' in module 'netCDF4'
I can see that it's installed in Anaconda Navigator and if I try to do a pip install it reports that netcdf4 is already installed and all dependencies are met.
I've read similar-sounding posts here and they do not solve my problem.
In response to a comment, the error is where I import Dataset:
from netCDF4 import Dataset
This also gives the error:
import netCDF4 as nc
salinity_data = nc.Dataset(<file name etc...>)
The code completion does not show anything in the netCDF4 package other than some "_" prefixed variables.
I'm using Python 3.8.12 and I am using the correct virtual environment that I set up with Anaconda.

The error message is coming from pylint, not the Python interpreter (see comments above).
The code will run fine, so the issue is with pylint and the configuration. I can suppress the error by:
from netCDF4 import Dataset #pylint: disable=no-name-in-module
This will do for now, but at some point, I'd like to figure out why pylint is reporting this.
I have also found a package that works better for what I want to do with the netCDF files:
https://github.com/h5netcdf/h5netcdf
This doesn't have all the hidden dependencies that netCDF4 does, and has a "legacyapi" that is a drop-in replacement for the netCDF4 package:
import h5netcdf.legacyapi as nc
my_data = nc.Dataset('my_data_file.nc', 'r')
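On the open question of why pylint reports this: one plausible explanation is that pylint analyses source code statically and, by default, does not load compiled extension modules, so names defined there are invisible to it. The sketch below checks whether a module resolves to a compiled extension file. The helper name `is_compiled_extension` is made up for illustration, and the submodule name `netCDF4._netCDF4` is an assumption about where `Dataset` is implemented.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if module_name resolves to a compiled extension file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False
    if spec is None or not spec.origin:
        return False
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# Assumption: Dataset lives in the compiled submodule netCDF4._netCDF4.
print(is_compiled_extension("netCDF4._netCDF4"))
```

If this prints True, pylint's `extension-pkg-allow-list` option (named `extension-pkg-whitelist` in older releases) is the setting to look at. Whether it fully removes the warning here has not been verified.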

Inconsistent results on Jupyter notebook and IntelliJ IDE: Python

I am trying to compute medcouple using robustats module in python.
https://github.com/FilippoBovo/robustats
The results in the Jupyter notebook and in the IntelliJ IDE do not match.
Result in my Jupyter notebook:
from robustats import medcouple
import numpy as np
x = np.array([9325.06, 6206.00, 10000.00, 9569.78])
print(medcouple(x))
-0.6442066420664204
Result in the IntelliJ IDE:
from robustats import medcouple
import numpy as np
x = np.array([9325.06, 6206.00, 10000.00, 9569.78])
print(medcouple(x))
nan
Has anyone come across this strange behaviour? Please let me know if I have to change any setting in the IDE.
I have made sure both are running against the same virtual env.

It is not reproducible for me in 2021.1.3, with robustats 0.1.7, Python 3.6, and a venv.
Could you please provide me with your setup details?
Please submit an issue at https://youtrack.jetbrains.com/issues/PY
with logs folder zipped from Help | Collect logs and Diagnostic Data and a screencast presenting the behaviour.
Information on how to use YouTrack: https://intellij-support.jetbrains.com/hc/en-us/articles/207241135-How-to-follow-YouTrack-issues-and-receive-notifications
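When collecting setup details, running the same snippet in both the notebook and the IDE makes the comparison easier. The following is a rough sketch rather than an official diagnostic. The function name `describe_environment` is hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter and package details worth comparing across tools."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed for this interpreter
    return {
        "executable": sys.executable,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


for key, value in describe_environment(["robustats", "numpy"]).items():
    print(f"{key}: {value}")
```

Any difference in the executable path or in the package versions between the two outputs would be the first thing to report.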

Python - Can't Get Pandas and Numpy Working in Visual Studio Code or Eclipse

I'm fairly new to IDEs and I'm trying to take courses in Python. No matter what I try, I cannot successfully run a Python script that contains import pandas and import numpy in either Visual Studio Code or Eclipse (running on Windows 10). I have Python 3.8 installed, and when I run those commands in the shell they work fine. I suspect that when I execute an actual Python script instead of using the console, a different interpreter might be used; I only get errors in that case, saying numpy is not defined. I also get the error "cannot import name 'numpy' from partially initialized module 'pandas' (most likely due to a circular import)" when I specify "from pandas import numpy" rather than "from pandas import *".
I am very frustrated and don't know how to fix this problem. I've tried searching for help, but without a programming background I don't know where or how to resolve this.
I also cannot get pip or pip3 to work at all to install packages. Those commands don't get recognized.
Please help!

I recommend using Jupyter Notebooks or PyCharm (an IDE). Both are very useful for learning Python and for working with data, including data manipulation and visualization.
PyCharm offers intelligent code completion, on-the-fly error checking, quick-fixes, and easy project navigation.
Jupyter Notebooks, meanwhile, let you run code cell by cell and rerun specific cells after making changes, and their inline output is very useful for debugging and visualizations. You can get Jupyter from https://jupyter.org.
Zeppelin notebooks can also serve as an alternative.
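As for the "partially initialized module 'pandas'" error quoted in the question, one common cause is a file in the project folder named pandas.py (or numpy.py) that shadows the real library. That may or may not be what happened here. A minimal sketch for checking a folder follows; the helper `find_shadowing_files` and the `LIBRARY_NAMES` tuple are illustrative names, not part of any library.

```python
import os

# Library names that a course script is likely to import.
LIBRARY_NAMES = ("numpy", "pandas")


def find_shadowing_files(directory, names=LIBRARY_NAMES):
    """Return entries in directory that would shadow the given libraries."""
    shadows = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        is_module_file = ext == ".py" and stem in names
        is_package_dir = entry in names and os.path.isdir(
            os.path.join(directory, entry)
        )
        if is_module_file or is_package_dir:
            shadows.append(entry)
    return shadows


# Check the folder the script is run from.
print(find_shadowing_files(os.getcwd()))
```

An empty list means nothing in that folder shadows the libraries, in which case the interpreter selected in the IDE is the next thing to check.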

How to import pandas using R studio

So, just to be clear, I'm very new to python coding... so I'm not exactly sure what's going wrong.
Yesterday, whilst following a tutorial on calling python from R, I successfully installed and used several python packages (e.g., NumPy, pandas, matplotlib etc).
But today, when trying to run the exact same code, I'm getting an error when trying to import pandas (NumPy is importing without any errors). The error states:
ModuleNotFoundError: No module named 'pandas'
I'm not sure what's going on!?
I'm using R-Studio (running on a Mac)... here's a code snippet of how I'm doing it:
library(reticulate)
os <- import("os") # Setting directory
os$getcwd()
repl_python() #used to make it interactive
import numpy as np # Load numpy package
import pandas as pd # Load pandas package
At this point, it's throwing me an error. I've tried googling the answer and searching here, but to no avail.
Any suggestions as to how I'd fix this problem, or what is going on?
Thanks

Possibly your Python path for reticulate changed upon reloading RStudio. Here is how to set the path manually (filepath for Linux or Mac):
library(reticulate)
path_to_python <- "~/anaconda3/bin/python"
use_python(path_to_python)
https://stackoverflow.com/a/45891929/4549682
You can check your Python path with py_config(): https://rstudio.github.io/reticulate/articles/versions.html#configuration-info
I recommend using Anaconda for your Python distribution (you might have to use Anaconda anyway for reticulate, not sure). Download it from here: https://www.anaconda.com/distribution/#download-section
Then you can create the environment for reticulate to use:
conda_create('r-reticulate', packages = "python=3.5")
I use Python 3.5 for some specific packages, but you can change that version or leave it as just 'python' for the latest version.
https://www.rdocumentation.org/packages/reticulate/versions/1.10/topics/conda-tools
Then you want to install the packages you need (if they aren't already) with
conda_install('r-reticulate', packages = 'numpy')
The way I use something like numpy is
np <- import('numpy')
np$arange(10)

You need to set the second argument of use_python, required, to TRUE. For example:
use_python("/users/my_user/Anaconda3/python.exe", required = TRUE)
Don't forget required = TRUE.
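Once inside repl_python(), it can help to confirm from the Python side which installation was picked up. The sketch below is a heuristic, not a reticulate feature. It assumes that conda environments contain a conda-meta folder at their root, and the helper name `looks_like_conda_env` is made up for illustration.

```python
import os
import sys


def looks_like_conda_env(prefix):
    """Heuristic: conda environments keep package records in conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print("Executable:", sys.executable)
print("Prefix:", sys.prefix)
print("Conda environment:", looks_like_conda_env(sys.prefix))
```

If the prefix differs from the one used on the day pandas was installed, that would be consistent with the path having changed between sessions.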

Not able to import pandas in R

I am calling a python script from R/shiny as:
system("python /Users/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'")
It is not able to import pandas.
But when I straight call the script from the terminal as:
python /Users/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'
It is able to import pandas. I guess it is a Python version issue. I have Anaconda installed. Can anyone please help me fix this?
Although it may not be needed, the script starts as follows:
import pandas as pd
import numpy as np
import sys
from difflib import SequenceMatcher
##### More code#########

Problem
You have the default system python and then the anaconda distribution as well.
Merely running the command that you are running from R calls the default system python that doesn't have the required packages.
Fix
Assuming you have anaconda installed at /Users/<username>/anaconda/bin/python (that's the default mac installation folder),
the R command that you should run is -
system("/Users/<username>/anaconda/bin/python /Users/Downloads/Untitled3.py EMEA regulatory '10% productivity saves SOW'")
This ensures that you are explicitly using anaconda's python binaries which will pick up on the pandas and other relevant libraries installed there.
Hope that helps!
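To see which interpreter a bare python command resolves to in a given context, a quick sketch using only the standard library is shown below. The helper name `resolve_command` is hypothetical. Run from the terminal and from a script started by R, it may print different paths if the two contexts see a different PATH.

```python
import shutil
import sys


def resolve_command(name, search_path=None):
    """Return the full path a bare command name resolves to, or None."""
    return shutil.which(name, path=search_path)


# What a bare "python" means for this process, versus the interpreter running now.
print("python on PATH:", resolve_command("python"))
print("this interpreter:", sys.executable)
```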
