Passing a pandas data frame through an R function using rpy2

I am trying to reproduce R results in Python. The following R code works:
library("TTR")
library("zoo")
library("xts")
library("quantmod")
getSymbols("^GSPC",from = "2014-01-01", to = "2015-01-01")
dataf = GSPC[,c("GSPC.High", "GSPC.Low", "GSPC.Close")]
result = CCI(dataf, n=20, c=0.015)
But not the following Python code:
from datetime import datetime
from rpy2.robjects.packages import importr
TTR = importr('TTR')
import pandas_datareader as pdr
from rpy2.robjects import pandas2ri
pandas2ri.activate()
GSPC = pdr.get_data_yahoo(symbols='^GSPC', start=datetime(2014, 1, 1), end=datetime(2015, 1, 1))
dataf = GSPC[['High', 'Low', 'Close']]
result = TTR.CCI(dataf, n=20, c=0.015)
The error occurs on the last line, at the call to TTR.CCI. The traceback is:
Traceback (most recent call last):
File "svm_strat_test_oliver.py", line 30, in <module>
result = TTR.CCI(dataf, n=20, c=0.015)
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in `[.data.frame`(center, beg:NROW(x)) : undefined columns selected

The object you pass to CCI in the R code is not a plain data.frame; it is an "xts"/"zoo" object. You just need to convert the pandas data frame to one in the Python code before calling CCI:
rzoo = importr('zoo')
datazoo = rzoo.as_zoo_xts(dataf)
result = TTR.CCI(datazoo, n=20, c=0.015)
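As a cross-check on the rpy2 result, the same indicator can also be computed directly in pandas. The sketch below follows the textbook definition of the Commodity Channel Index (typical price, simple moving average, mean absolute deviation). It is an assumption, not something verified here, that this matches TTR's defaults. The function name `cci` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def cci(hlc, n=20, c=0.015):
    """Commodity Channel Index from a DataFrame with High, Low, Close columns."""
    # Typical price: row-wise mean of high, low and close.
    typical_price = hlc[['High', 'Low', 'Close']].mean(axis=1)
    moving_average = typical_price.rolling(n).mean()
    # Mean absolute deviation of each window around that window's own mean.
    mean_deviation = typical_price.rolling(n).apply(
        lambda window: np.abs(window - window.mean()).mean(), raw=True
    )
    return (typical_price - moving_average) / (c * mean_deviation)


# Toy data (hypothetical) with a short window so the result is easy to inspect.
toy = pd.DataFrame({
    'High': [2.0, 3.0, 4.0, 5.0],
    'Low': [0.0, 1.0, 2.0, 3.0],
    'Close': [1.0, 2.0, 3.0, 4.0],
})
result = cci(toy, n=3)
```

The first `n - 1` values are NaN because the rolling window is not yet full. Comparing the remaining values with the output of `TTR.CCI` is a quick way to confirm that the conversion on the R side behaved as expected.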

Related

"String indices must be integers" error in pandas & yfinance - Python

Here is my code that I am using:
import warnings
import datetime
import numpy as np
import pandas as pd
import pandas_datareader.data as pdr
import matplotlib.pyplot as plt, mpld3
import matplotlib.ticker as mtick
date_from = datetime.date(2020, 1, 1)
date_to = datetime.date(2022, 12, 30)
tickerL = 'BTC-USD'
print("Comparing " + tickerL +" to...")
#tickerL2 = ['AAPL']
tickerL2 = input('Now enter your comparison ticker:\n')
tickerList = [tickerL, tickerL2]
print(tickerList)
#tickerList = ['BTC-USD', 'AMZN', 'AAPL', 'CL=F', '^GSPC', '^DJI', 'GC=F']
#fetch multiple asset data
def getMultiAssetData(tickerList, date_from, date_to):
    def getData(ticker):
        data = pdr.DataReader(ticker, "yahoo", date_from, date_to)
        return data
    datas = map(getData, tickerList)
    return pd.concat(datas, keys=tickerList, names=['Ticker', 'Date'])
sort=False
multiData = getMultiAssetData(tickerList, date_from, date_to)
df = multiData.copy()
#print(df)
df = df.loc[tickerL, :]
df.tail()
Now I keep getting this error and I don't know how to move forward:
Traceback (most recent call last):
File "main.py", line 51, in <module>
multiData = getMultiAssetData(tickerList, date_from, date_to)
File "main.py", line 45, in getMultiAssetData
datas = list(map(getData, tickerList))
File "main.py", line 42, in getData
data = pdr.DataReader(ticker, "yahoo", date_from, date_to)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 207, in wrapper
return func(*args, **kwargs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas_datareader/data.py", line 370, in DataReader
return YahooDailyReader(
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas_datareader/base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas_datareader/yahoo/daily.py", line 153, in _read_one_data
data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
TypeError: string indices must be integers
This script worked fine a couple of months ago, so I assume some of the packages were updated and now require a different format.

Using rpy2 with streamlit

I am trying to build an app using Python and rpy2. I read a .csv file (see below, table_mean_plain) and I would like to plot a histogram of its percentage of explained variances. The code works fine in a Jupyter notebook, but when I run it through streamlit it loads forever and I receive this message:
2022-10-25 17:08:05.231 Uncaught app exception
Traceback (most recent call last):
File "/opt/anaconda3/envs/XX/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 562, in _run_script
exec(code, module.__dict__)
File "/Users/XX/Documents/XX/pages/4_Visualisations.py", line 69, in <module>
res_pca = FactoMineR.PCA(my_data)
File "/opt/anaconda3/envs/XX/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 201, in __call__
return (super(SignatureTranslatedFunction, self)
File "/opt/anaconda3/envs/XX/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 124, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
File "/opt/anaconda3/envs/XX/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
cdata = function(*args, **kwargs)
File "/opt/anaconda3/envs/XX/lib/python3.9/site-packages/rpy2/rinterface.py", line 810, in __call__
raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in (function (title, width, height, pointsize, family, antialias, :
unable to create quartz() device target, given type may not be supported
Assertion failed: (NSViewIsCurrentlyBuildingLayerTreeForDisplay() != currentlyBuildingLayerTree), function NSViewSetCurrentlyBuildingLayerTreeForDisplay, file NSView.m, line 13477.
[9] 59974 illegal hardware instruction streamlit run 1_Workbench.py
Here is my code:
from rpy2.robjects.packages import importr, data
from rpy2.robjects.vectors import DataFrame
from rpy2.ipython.ggplot import image_png
import streamlit as st
import matplotlib.pyplot as plt
from rpy2 import robjects
import matplotlib.image as mpimg
utils = importr('utils')
corrplot = importr('corrplot')
FactoMineR = importr('FactoMineR')
factoextra = importr('factoextra')
my_datafile = 'table/plain/table_mean_plain.csv'
#my_data = utils.read_csv(my_datafile)
my_data = DataFrame.from_csvfile(my_datafile, row_names="X")
#rownames(my_data = my_data$X
#my_dataX = NULL
# Do PCA
# --------
res_pca = FactoMineR.PCA(my_data)
# Eigen values / Variance
# --------
eig_val = factoextra.get_eigenvalue(res_pca)
st.pyplot(display(image_png(factoextra.fviz_eig(res_pca, addlabels = True))))
The corresponding .csv file is
,logistic-regression,svm-linear,svm-rbf,Gnb,decision-tree,random-forest,XGBoost,MLP,Ensemble,GEV,iForest,DevNet
news,0.827,0.566,0.724,0.681,0.687,0.852,0.865,0.859,0.866,0.87,0.497,0.666
telE,0.761,0.552,0.623,0.756,0.86,0.944,0.957,0.915,0.95,0.941,0.399,0.717
bank,0.832,0.62,0.772,0.817,0.687,0.848,0.841,0.848,0.855,0.873,0.625,0.793
member,0.648,0.488,0.514,0.625,0.574,0.696,0.695,0.669,0.707,0.667,0.545,0.598
dsn,0.735,0.708,0.785,0.707,0.761,0.884,0.897,0.754,0.884,0.766,0.568,0.661
mobile,0.889,0.432,0.393,0.842,0.776,0.9,0.906,0.9,0.908,0.907,0.7,0.852
campaign,0.907,0.543,0.668,0.809,0.709,0.928,0.933,0.923,0.935,0.924,0.65,0.832
HR,0.85,0.766,0.841,0.762,0.615,0.809,0.797,0.825,0.838,0.827,0.589,0.72
sato,0.794,0.747,0.8,0.729,0.651,0.808,0.809,0.799,0.825,0.82,0.5,0.739
uci,0.854,0.586,0.897,0.859,0.818,0.901,0.905,0.847,0.913,0.91,0.663,0.793
TelC,0.844,0.632,0.797,0.814,0.658,0.819,0.825,0.842,0.841,0.85,0.295,0.784
median_AUC,0.832,0.586,0.772,0.762,0.687,0.852,0.866,0.847,0.866,0.87,0.568,0.739
Rank,5.67,10.67,7.5,8.08,9.0,4.0,3.25,5.0,1.83,2.67,11.42,8.92

An error occurred in my code: "argument of type 'method' is not iterable"

I would like to predict future stock prices, and I tried to create a function for the calculation, but when I run the code below I get an error. I am not sure whether I am missing a () somewhere. Could you please advise me?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
Gold_train_data = pd.read_csv('Gold Data Last Year.csv', index_col=False)
Gold_test_data = pd.read_csv('Gold Data Last Month.csv', index_col=False)
current_train_data = Gold_train_data
current_test_data = Gold_test_data
NUM_train_data = 266
NUM_test_data = 22
def load_stock_data(stock_name, num_data_points):
    data = pd.read_csv(stock_name,
                       skiprows=0,
                       nrows=num_data_points,
                       usecols=['Price', 'Open', 'Vol.'])
    final_prices = data['Price'].astype(str).str.replace(',','').astype(np.float)
    opening_prices = data['Open'].astype(str).str.replace(',', '').astype(np.float)
    volumes = data['Vol.'].str.strip('MK').astype(np.float)
    return final_prices, opening_prices, volumes
def calculate_price_differences(final_prices, opening_prices):
    price_differences = []
    for d_i in range(len(final_prices) - 1):
        price_difference = opening_prices[d_i + 1] - final_prices[d_i]
        price_differences.append(price_difference)
    return price_differences
print(load_stock_data(current_test_data, NUM_test_data))
That is the code; the error is shown below:
Traceback (most recent call last):
Input In [6] in <cell line: 1>
print(load_stock_data(current_test_data, NUM_test_data))
Input In [4] in load_stock_data
data = pd.read_csv(stock_name,
File ~\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311 in wrapper
return func(*args, **kwargs)
File ~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py:680 in read_csv
return _read(filepath_or_buffer, kwds)
File ~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py:575 in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File ~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py:933 in __init__
self._engine = self._make_engine(f, self.engine)
File ~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py:1217 in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File ~\Anaconda3\lib\site-packages\pandas\io\common.py:661 in get_handle
if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
File ~\Anaconda3\lib\site-packages\pandas\io\common.py:1128 in _is_binary_mode
return isinstance(handle, _get_binary_io_classes()) or "b" in getattr(
TypeError: argument of type 'method' is not iterable

The issue is with the function call:
print(load_stock_data(current_test_data, NUM_test_data))
current_test_data is the pandas DataFrame you loaded earlier. Passing that variable to load_stock_data() makes the function try to execute
data = pd.read_csv(current_test_data,
                   skiprows=0,
                   nrows=num_data_points,
                   usecols=['Price', 'Open', 'Vol.'])
Note that the first argument to pd.read_csv() is expected to be the pathname of the csv file (or a file-like object). The error arises because read_csv treats current_test_data as that pathname when it is actually a pandas.DataFrame.
What I think you want can be achieved by passing the pathname to the function instead of the DataFrame itself:
print(load_stock_data('Gold Data Last Month.csv', NUM_test_data))
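To make the difference concrete, here is a minimal, self-contained sketch of the path-based call. The sample rows and the temporary file name `gold_sample.csv` are hypothetical stand-ins for 'Gold Data Last Month.csv'. The builtin `float` is used in place of `np.float`, an alias that was removed in NumPy 1.24.

```python
import os
import tempfile

import pandas as pd


def load_stock_data(csv_path, num_data_points):
    """Read the first num_data_points rows of a CSV file, given its path."""
    data = pd.read_csv(csv_path,
                       nrows=num_data_points,
                       usecols=['Price', 'Open', 'Vol.'])
    # Remove thousands separators, then convert the text to floats.
    final_prices = data['Price'].astype(str).str.replace(',', '', regex=False).astype(float)
    opening_prices = data['Open'].astype(str).str.replace(',', '', regex=False).astype(float)
    # Drop a trailing 'M' or 'K' unit suffix; this does not rescale the value.
    volumes = data['Vol.'].str.strip('MK').astype(float)
    return final_prices, opening_prices, volumes


# Hypothetical sample rows in the same column layout as the question's files.
sample_csv = (
    'Date,Price,Open,Vol.\n'
    '"Jan 03, 2022","1,800.10","1,795.00",150.5K\n'
    '"Jan 04, 2022","1,810.20","1,801.30",1.2M\n'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'gold_sample.csv')
    with open(csv_path, 'w') as handle:
        handle.write(sample_csv)
    # Pass the path (a string), not a DataFrame that was already loaded.
    final_prices, opening_prices, volumes = load_stock_data(csv_path, 2)
```

If the data is already in a DataFrame, an alternative design is to have the function accept that DataFrame and skip `pd.read_csv` entirely; the key point is that each argument has the type the function expects.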

Unable to read from object of type: <class 'numpy.ndarray'>

I have the following python code which I am trying to output to a directory based on timestamp.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import uuid
data = {'date': ['2018-03-04T14:12:15.653Z', '2018-03-03T14:12:15.653Z', '2018-03-02T14:12:15.653Z', '2018-03-05T14:12:15.653Z'],
'battles': [34, 25, 26, 57],
'citys': ['london', 'newyork', 'boston', 'boston']}
df = pd.DataFrame(data, columns=['date', 'battles', 'citys'])
df['date'] = df['date'].map(lambda t: pd.to_datetime(t, format="%Y-%m-%dT%H:%M:%S.%fZ"))
df.groupby(by=['citys'])
dst_path = "logs/year=" + df['date'].dt.year.astype('str').unique() + "/month=" + df['date'].dt.month.astype('str').unique() + "/day=" + df['date'].dt.day.astype('str').unique() + "/" + str(uuid.uuid4()) + ".parq"
table = pa.Table.from_pandas(df)
pq.write_table(table, dst_path)
But I am seeing the following error:
python3 test.py
Traceback (most recent call last):
File "test.py", line 15, in <module>
pq.write_table(table, dst_path)
File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 943, in write_table
**kwargs)
File "/usr/local/lib/python3.6/site-packages/pyarrow/parquet.py", line 286, in __init__
**options)
File "pyarrow/_parquet.pyx", line 837, in pyarrow._parquet.ParquetWriter.__cinit__ (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/_parquet.cxx:14606)
File "pyarrow/io.pxi", line 835, in pyarrow.lib.get_writer (/Users/travis/build/BryanCutler/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:59078)
TypeError: Unable to read from object of type: <class 'numpy.ndarray'>
How can I create a directory from a pandas timestamp?

Your dst_path is a numpy array, not a string: .unique() returns an array, and concatenating strings with it produces another array.
print(type(dst_path))
Output:
<class 'numpy.ndarray'>
It should be a string, so I added the following line below the dst_path assignment and it worked. It's not elegant, and it keeps only the first path in the array, so you can investigate a better way to do it. The point here is that you need a string.
dst_path = str(dst_path[0])
Note that the directory must already exist or you will get an error, so you can add the following before the call to write_table:
import os
dir_name, file_name = os.path.split(dst_path)
if not os.path.exists(dir_name):
    os.makedirs(dir_name)
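One way to avoid the array altogether is to build the path from a single timestamp. The sketch below uses only the standard library; the helper name `build_partition_path` is hypothetical, and a `pd.Timestamp` would work in place of the `datetime` because it exposes the same `year`, `month` and `day` attributes.

```python
import os
import tempfile
import uuid
from datetime import datetime


def build_partition_path(root, timestamp, extension=".parq"):
    """Return one 'year=/month=/day=' file path string for a single timestamp."""
    return os.path.join(
        root,
        f"year={timestamp.year}",
        f"month={timestamp.month}",
        f"day={timestamp.day}",
        f"{uuid.uuid4()}{extension}",
    )


with tempfile.TemporaryDirectory() as tmp_root:
    ts = datetime(2018, 3, 4, 14, 12, 15)
    dst_path = build_partition_path(os.path.join(tmp_root, "logs"), ts)
    # Create the parent directory first; exist_ok removes the need for an exists() check.
    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
    parent_exists = os.path.isdir(os.path.dirname(dst_path))
    relative_parent = os.path.relpath(os.path.dirname(dst_path), tmp_root)
```

For a frame that spans several days, as in the question, the same helper could be called once per group of rows sharing a date, with each group written to its own file.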

Unexpected keyword 'auth_token' using python gdata GetContacts()

Why do I get the keyword error shown below when running the following code? It is failing on the last line: feed = gc.GetContacts().
Code
from oauth2client.client import OAuth2Credentials
import gdata.contacts.client
authfn = '/home/ms/gcontactsback.oauth'
f = open(authfn, "r")
credentials = OAuth2Credentials.from_json(f.read())
f.close()
gc = gdata.contacts.client.ContactsClient(source='gback')
gc = credentials.authorize(gc)
feed = gc.GetContacts()
Output
(py)ms#ny:~/py$ ./gcontacts.py
Traceback (most recent call last):
File "./gcontacts.py", line 19, in <module>
feed = gc.GetContacts()
File "/home/ms/py/local/lib/python2.7/site-packages/gdata/contacts/client.py", line 201, in get_contacts
desired_class=desired_class, **kwargs)
File "/home/ms/py/local/lib/python2.7/site-packages/gdata/client.py", line 640, in get_feed
**kwargs)
File "/home/ms/py/local/lib/python2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
TypeError: new_request() got an unexpected keyword argument 'auth_token'
(py)ms#ny:~/py$

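I got it working by wrapping the oauth2client credentials in a gdata token and authorizing the client with that token, instead of calling credentials.authorize(gc):
import gdata.gauth
gc = gdata.contacts.client.ContactsClient(source='gback')
auth2token = gdata.gauth.OAuth2TokenFromCredentials(credentials)
gc = auth2token.authorize(gc)
feed = gc.GetContacts()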
