How to access the global environment in R using rpy2 in Python?

I am trying to access a data frame from the R global environment and import it into Python in the PyCharm IDE, but I can't figure out how to do it.
I tried the following:
Since I don't know how to access the global environment where my target data.frame is stored, I created another R script (myscript.R) in which I saved the data.frame to an .rds file and read it back in:
save(dfcast, file = "forecast.rds")
my_data <- readRDS(file = "forecast.rds")
However, when I try to read the .rds file in Python with the following code:
import os
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
cwd = os.getcwd()
pandas2ri.activate()
os.chdir('C:/Users/xx/myscript.R')
readRDS = robjects.r['readRDS']
df = readRDS('forecast.rds')
df = pandas2ri.ri2py(df)
df.head()
I get the following error:
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file 'forecast.rds', probable reason 'No such file or directory'
How can I deal with this? I just want to access an R data.frame from Python.
The data.frame is actually a forecast generated by another R script that takes about 7-8 minutes to run. Instead of running it again from Python, I want R to do the processing and then import the resulting forecast data frame into Python for further analysis. I am in the middle of building that analysis module, and I don't want the R forecast function to run again and again while I am debugging it. That is why I want to access the result directly from R.
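A few things stand out in the attempt above.

- os.chdir() expects a directory. It should point at the folder containing forecast.rds, or, more simply, readRDS should be given an absolute path.
- save() writes the RData format, which readRDS() cannot read. The R script would need saveRDS(dfcast, "forecast.rds") instead.
- rpy2 runs its own embedded R session. This means robjects.globalenv is that session's global environment, not the one of an R session running elsewhere.

Writing the result to a file once is therefore one way to avoid re-running the 7-8 minute forecast while debugging. The sketch below assumes Rscript is on the PATH and that myscript.R writes forecast.rds into its own folder. cached_r_output is a hypothetical helper, not part of rpy2.

```python
import subprocess
from pathlib import Path


def cached_r_output(r_script, output_name="forecast.rds", rscript="Rscript", force=False):
    """Return the absolute path of a file produced by an R script.

    Hypothetical helper: the (slow) script is only run when the output
    file is missing or force=True. Assumes the script writes output_name
    into its own directory, e.g. by ending with
    saveRDS(dfcast, "forecast.rds").
    """
    script = Path(r_script).resolve()
    output = script.parent / output_name
    if force or not output.exists():
        # Use the script's folder as working directory so relative paths
        # inside the R script resolve next to the script itself.
        subprocess.run([rscript, script.name], cwd=script.parent, check=True)
    if not output.exists():
        raise FileNotFoundError(f"{script.name} did not create {output}")
    return output
```

While debugging, you could call rds = cached_r_output("C:/Users/xx/myscript.R") and then read the file through its absolute path. With rpy2 that is robjects.r['readRDS'](str(rds)). With pyreadr it is pyreadr.read_r(str(rds))[None], since pyreadr stores a single .rds object under the key None. Pass force=True when the forecast itself needs refreshing.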

Related

Create a new folder on HSDS server with Python

How is a new folder created using h5pyd module in Python?
For example, I have the domain /home/user/ and I want to create a folder /home/user/data1/.
From the command line I can use the following command:
hstouch /home/user/data1/
What is the equivalent in h5pyd?
See below for a simplified example of what I am trying to do.
import h5pyd
import numpy as np
with h5pyd.File("/home/user/data1/myfile.h5", "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype='i')
However, because /home/user/data1/ does not exist, I get a 404 error.
You would just do:
h5pyd.Folder("/home/user/data1/", mode="w")
There's a test case for this at:
https://github.com/HDFGroup/h5pyd/blob/master/test/hl/test_folder.py#L178

Import csv from Kaggle url into a pandas DataFrame

I want to import a public dataset from Kaggle (https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv) into a local Jupyter notebook. I don't want to use any credentials in the process.
I have seen various solutions, including pd.read_html, pd.read_csv and pd.read_table (pd = pandas).
I also found solutions that require a login.
The first set of solutions is the one I am interested in, but they seem to work on other websites only because those sites provide a link to the raw data.
I have been clicking everywhere in the Kaggle interface but can't find a direct URL to the raw data.
Bottom line: is it possible to use, say, pd.read_csv to get data directly from the website into a local notebook? If so, how?
You can automate kaggle.cli.
Follow the instructions at https://github.com/Kaggle/kaggle-api to download and save kaggle.json for authentication.
import kaggle.cli
import sys
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
# download data set
# https://www.kaggle.com/unsdsn/world-happiness?select=2017.csv
dataset = "unsdsn/world-happiness"
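# run kaggle.cli as if called with: kaggle datasets download unsdsn/world-happiness
sys.argv = [sys.argv[0]] + f"datasets download {dataset}".split(" ")
kaggle.cli.main()
# read every CSV in the downloaded zip into a dict keyed by file name
zfile = ZipFile(f"{dataset.split('/')[1]}.zip")
dfs = {f.filename: pd.read_csv(zfile.open(f)) for f in zfile.infolist()}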
dfs["2017.csv"]

How to load R's .rdata files into Python?

I am trying to convert part of my R code into Python, and I am running into some problems along the way.
I have the R code shown below, which saves my R output in .rdata format.
nms <- names(mtcars)
save(nms,file="mtcars_nms.rdata")
Now I need to load mtcars_nms.rdata into Python.
I imported the rpy2 module and tried to load the file into the Python workspace, but I could not see the actual contents.
I used the following Python code to import the .rdata file:
import pandas as pd
from rpy2.robjects import r,pandas2ri
pandas2ri.activate()
robj = r.load('mtcars_nms.rdata')
robj
My Python output is:
R object with classes: ('character',) mapped to:
<StrVector - Python:0x000001A5B9E5A288 / R:0x000001A5B9E91678>
['mtcars_nms']
Now my objective is to extract the information from mtcars_nms.
In R, we can do this by using
load("mtcars_nms.rdata")
get('mtcars_nms')
Now I want to do the same thing in Python.
There is a new Python package, pyreadr, that makes it very easy to import RData and Rds files into Python:
import pyreadr
# read_r returns a dict-like object mapping each R object name to a pandas DataFrame
result = pyreadr.read_r('mtcars_nms.rdata')
mtcars = result['mtcars_nms']
It does not depend on having R or other external dependencies installed.
It is a wrapper around the C library librdata, so it is very fast.
You can install it very easily with pip:
pip install pyreadr
The repo is here: https://github.com/ofajardo/pyreadr
Disclaimer: I am the developer.
Rather than using the .rdata format, I would recommend using feather, which lets you share data efficiently between R and Python.
In R, you would run something like this:
library(feather)
# write_feather() expects a data frame, so wrap the character vector first
write_feather(data.frame(nms = nms), "mtcars_nms.feather")
In Python, to load the data into a pandas dataframe, you can then simply run:
import pandas as pd
nms = pd.read_feather("mtcars_nms.feather")
The R function load will return an R vector of names for the objects that were loaded (into GlobalEnv).
You'll have to do in rpy2 pretty much what you are doing in R:
R:
get('mtcars_nms')
Python/rpy2:
import rpy2.robjects as robjects
robjects.globalenv['mtcars_nms']
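The same idea extends to .rdata files that hold several objects. load() returns their names, and each name can then be looked up in the environment the objects were loaded into.

This is a minimal sketch, assuming rpy2 is installed. load_rdata is a hypothetical helper. It takes R's load function and the target environment as parameters, so the lookup logic can be exercised without R.

```python
def load_rdata(path, r_load, env):
    """Load an .rdata file and return {object name: R object}.

    Hypothetical helper. r_load is R's load() (e.g. rpy2.robjects.r['load'])
    and env is the environment to load into (e.g. rpy2.robjects.globalenv).
    load() returns the names of the objects it created, which are then
    looked up in env, just like get('mtcars_nms') in R.
    """
    names = r_load(path, envir=env)
    return {str(name): env[name] for name in names}
```

With rpy2 this would be called as objs = load_rdata('mtcars_nms.rdata', robjects.r['load'], robjects.globalenv). After that, objs['mtcars_nms'] holds the same object as get('mtcars_nms') in R.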

TabPy - Invalid file path or buffer object type

I started using TabPy recently.
I have noticed that Python code which runs fine in a normal Python environment doesn't work directly inside Tableau, or maybe I am doing something wrong.
Here is what I am facing:
I wrote the following code:
FLOAT(SCRIPT_REAL('
import pandas as pd
import numpy as np
from scipy import stats
# In[152]:
# Reading input file
data_file = pd.read_csv(_arg1)
a1 = data_file([Actualmax])
return a1' , '/User/****/caution new/7S.csv
# In[153]:
# Calculate Mean
mn = np.mean(a1)
return mn
'))
I am using this to find the mean of the column Actualmax in the file 7S.csv.
The same code runs fine in Python, but in Tableau I get the error message "Invalid file path or buffer object type".
After that, I tried passing the column as an argument instead of importing the file from the local system, since the data is already inside Tableau:
INT(SCRIPT_STR("
import pandas as pd
import numpy as np
from scipy import stats
# In[152]:
# Reading input file
data_file = pd.read_csv(_arg1)
# In[153]:
# Calculate Mean
mn = np.mean(_arg1)
return mn
",SUM([Actualmax])))
There are no syntax errors but the error remains the same.
I do get a result when I write something like this:
SCRIPT_INT("
import pandas as pd
import numpy as np
from scipy import stats
# In[152]:
# Reading input file
#data_file = pd.read_csv(arg)
# In[153]:
# Calculate Mean
mn = np.mean(arg)
return mn
",AVG([Actualmax]))
But this isn't what I want, as it uses Tableau's AVG function rather than the power of Python.
What am I doing wrong in here? How should I proceed?
Apparently the answer was pretty simple. I followed Bora Beran's post at the link below,
https://community.tableau.com/docs/DOC-10856
specifically the section "Using Every Row of Data - Disaggregated Data".
The new code is:
(SCRIPT_REAL("
import numpy as np
# _arg1 arrives as a list with one Actualmax value per row
mn = np.mean(_arg1)
return mn
",ATTR([Actualmax])))
Hope this helps anyone else who was going through this issue.
Happy Tableau'ing.
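The earlier attempts failed because TabPy passes each argument to the script as a Python list of values (_arg1, _arg2, ...), not as a file path. So pd.read_csv(_arg1) raises "Invalid file path or buffer object type".

One way to debug such a script before pasting it into SCRIPT_REAL is to wrap the body in an ordinary function and call it with a list. This is a minimal sketch: script_body is a hypothetical name, and statistics.fmean stands in for np.mean so the sketch needs only the standard library.

```python
from statistics import fmean


def script_body(_arg1):
    """Local stand-in for the SCRIPT_REAL body (hypothetical name).

    TabPy supplies _arg1 as a list with one value per row
    (here, ATTR([Actualmax])), so no file reading is needed.
    """
    return fmean(_arg1)
```

Once the function returns what you expect for a small test list, its body is what goes between the quotes in SCRIPT_REAL.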

Import csv Python with Spyder

I am trying to import a csv file into Python but it doesn't seem to work unless I use the Import Data icon.
I've never used Python before, so apologies if I am doing something obviously wrong. I use R, and I am trying to replicate the tasks I do in R in Python.
Here is some sample code:
import pandas as pd
import os
Main_Path = "C:/Users/fan0ia/Documents/Python_Files"
Area = "Pricing"
Project = "Elasticity"
Path = os.path.join(Main_Path, Area, Project)
os.chdir(Path)
#Read in the data
Seasons = pd.read_csv("seasons.csv")
Dep_Sec_Key = pd.read_csv("DepSecKey.csv")
These files import without any issues, but when I execute the following:
UOM = pd.read_csv("FINAL_UOM.csv")
Nothing shows in the variable explorer panel and I get this in the IPython console:
In [3]: UOM = pd.read_csv("FINAL_UOM.csv")
If I use the Import Data icon and use the wizard selecting DataFrame on the preview tab it works fine.
The same file imports into R with the same kind of command, so I don't know what I am doing wrong. Is there any way to see the code generated by the wizard so I can compare it to mine?
It turns out the data had imported; it just wasn't showing in the Variable Explorer.
