I'm trying to follow the basic tutorial for fbprophet and am getting an error that doesn't really make sense on the Prophet.predict() method. My code follows the tutorial exactly:
import pandas as pd
import numpy as np
from fbprophet import Prophet
df = pd.read_csv("example_wp_peyton_manning.csv")
df['y'] = np.log(df['y'])
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods = 365)
forecast = m.predict(future)
On the predict call, I get:
ValueError: If using all scalar values, you must pass an index
I've seen this before when trying to use DataFrame constructors improperly, but this seems to be happening under the hood in the fbprophet code, which is strange because the passed dataframe comes from the package's own make_future_dataframe method. Has anyone else experienced this/know a work-around?
For context, I'm using Python 3.6.0, with Visual C++ 14.0, Numpy 1.13.1, Pandas 0.21.0, pystan 2.17.0.0 and fbprophet 0.2
There doesn't seem to be a tag for fbprophet and I don't have the reputation to make one
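For reference, the message in the traceback is the one pandas raises when a DataFrame is built from a dict whose values are all scalars and no index is given. Below is a minimal sketch of that situation, independent of fbprophet. The column names and the `build_frame` helper are made up for illustration, and the sketch does not show where fbprophet itself triggers the error.

```python
import pandas as pd


def build_frame(values, index=None):
    """Build a DataFrame from a dict (hypothetical helper, for illustration only)."""
    return pd.DataFrame(values, index=index)


scalars = {"trend": 1.0, "yhat": 2.0}  # made-up column names

try:
    build_frame(scalars)  # every value is a scalar and no index is given
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Supplying an index tells pandas how many rows to create.
fixed = build_frame(scalars, index=[0])
print(fixed.shape)
```

If the same dict is passed with an explicit index, the constructor succeeds, which is why the error usually points at a DataFrame being assembled from single values somewhere in the call chain.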
I got some other error but it works after adding:
...
m = Prophet()
m.daily_seasonality=True
...
Maybe you should try python 2.
I'm trying to convert a pandas dataframe into an R dataframe using the guide here. The code I have so far is:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
pd_df = pd.DataFrame({'int_values': [1, 2, 3],
                      'str_values': ['abc', 'def', 'ghi']})
with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_df = ro.conversion.py2rpy(pd_df)
However, this is giving me the following error:
Traceback (most recent call last):
File <my_file_ref>, line 13, in <module>
r_from_pd_df = ro.conversion.py2rpy(pd_df)
AttributeError: module 'rpy2.robjects.conversion' has no attribute 'py2rpy'
I have found this relevant question, where the OP mentions that function names were changed but doesn't explain what the changes are. I've tried looking at the module, but I'm not advanced enough to make sense of it.
The accepted answer suggests checking versions, which I've done: I'm definitely using rpy2 v3.3.3, the same version as the guide I'm following.
Has anyone encountered this error and found a solution?
The section of the rpy2 documentation you are pointing to is built by running the code example. This means that the example did work with the corresponding version of rpy2.
Are you sure you are using that version of rpy2 at runtime? For example, add print(rpy2.__version__) to check that this is the case.
For what it's worth, the latest release in the rpy2 3.3.x series is 3.3.6, and there is probably no good reason to stay with 3.3.3. Also, rpy2 3.4.0 was just released; if you are using R 4.0 or greater, or the latest release of the R packages dplyr or ggplot2 together with their rpy2 wrappers, I would recommend using that release.
[PS: I just tried your example with rpy2-3.4.0 and it runs without error]
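The runtime check suggested above can be sketched as follows. This is only an illustration: the `installed_version` helper is a made-up name, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library. Printing `sys.executable` as well shows which interpreter is running, which helps when several Python installations exist side by side.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if this interpreter cannot find it.

    Hypothetical helper for illustration.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("interpreter:", sys.executable)
print("rpy2:", installed_version("rpy2"))
```

If the version printed here differs from the one reported by pip on the command line, the script is most likely running under a different environment than the one that was checked.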
I'm studying Spark 3.0.1 with pyspark, and have set up some data for a simple OLS regression using:
data = results.select('OrderMonthYear', 'SaleAmount').rdd.map(lambda row: LabeledPoint(row[1], [row[0]])).toDF()
The OrderMonthYear is my feature column (int), and SaleAmount is the response (float). The LabeledPoint method was imported from pyspark.mllib.regression. I then try to fit the regression model with
from pyspark.ml.regression import LinearRegression
lr = LinearRegression()
modelA = lr.fit(data, {lr.regParam:0.0})
to get this exception
IllegalArgumentException: requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
This is clearly not very helpful, as the required and the actual feature types appear to be the same struct. I've searched online and only found answers to this problem for Java, or for cases where someone builds the struct themselves. The exception was raised from a util function that just re-throws a Java exception (its comment reads "# Hide where the exception came from that shows a non-Pythonic JVM exception message."), so I can't debug further.
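The two types in the message print identically, but they are different classes: LabeledPoint from pyspark.mllib produces pyspark.mllib.linalg vectors, while the pyspark.ml estimators expect pyspark.ml.linalg vectors. The RDD-based pyspark.mllib API is in maintenance mode, so I suggest using the VectorAssembler from pyspark.ml instead: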
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
data = spark.createDataFrame([[0,1],[1,2],[2,3]]).toDF('OrderMonthYear', 'SaleAmount')
# OrderMonthYear is the feature, SaleAmount is the response (as in the question)
va = VectorAssembler(inputCols=['OrderMonthYear'], outputCol='features')
data2 = va.transform(data)
lr = LinearRegression(labelCol='SaleAmount')
model = lr.fit(data2)
For anyone else following the same LI Learning course, based on some modifications to the accepted answer above to align more with what I was seeing in the course, here's what Cmd 4 cell should look like:
# convenience for specifying schema
from pyspark.ml.feature import VectorAssembler
data = VectorAssembler(inputCols=['OrderMonthYear'], outputCol='features').transform(results.select("OrderMonthYear", "SaleAmount")).drop('OrderMonthYear').withColumnRenamed('SaleAmount', 'label')
display(data)
Alternatively, you can use the following which also works:
from pyspark.ml.linalg import Vectors
data = results.rdd.map(lambda r: (Vectors.dense(r[0]), r[1])).toDF(["features","label"])
display(data)
Then you should be good to go. Note that you'll want to make the same changes to Cmd 4 in notebooks 4.4 and 4.5 as well. Hope this helps!
I am trying a simple prediction using ARIMA. The code below produces all NaNs as output for an order argument of (1,1,3), but for order arguments of (1,1,2) and (1,1,4) I get proper (numerical) output. The same function works fine in some other installations with older or newer versions of pandas, statsmodels and pmdarima. I checked related questions here on Stack Overflow, but since the same function with the same argument works with other library versions, I assume there is nothing wrong with the order argument of (1,1,3) and that the bug is probably in the library versions or some other configuration. Any help is appreciated.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA

def testarima():
    trainseries = pd.Series([600.00,10.00,405.00,900.00,500.00,500.00,500.00,500.00,500.00,
                             500.00,1000.00,533.00,2784.11,1775.00,300.00,4289.42,1270.00,
                             500.00,2145.00,1650.00,1750.00,785.00,4137.50,2450.00,2194.00,
                             1750.00,1000.00,2250.00,1000.00,1055.98,1000.00,250.00,450.00,
                             540.00,2247.50,200.00,820.00,570.00,555.00])
    model = ARIMA(trainseries, order=(1, 1, 3))
    # print("train: " + str(train))
    try:
        model_fit = model.fit(disp=0)
        fc, se, conf = model_fit.forecast(24, alpha=0.05)
        print('result: ' + str(fc))
        return fc
    except Exception:
        # Fall back to an all-zero forecast if fitting or forecasting fails
        return np.zeros(24)
statsmodels v 0.10.2
pmdarima v 1.5.1
pandas 0.25.3
python 3.7.5
There is also warning output:
C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\tsa\kalmanf\kalmanfilter.py:225: RuntimeWarning: invalid value encountered in log
Z_mat.astype(complex), R_mat, T_mat)
C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\tsa\kalmanf\kalmanfilter.py:225: RuntimeWarning: invalid value encountered in true_divide
Z_mat.astype(complex), R_mat, T_mat)
C:\Users\AppData\Local\Programs\Python\Python37\lib\site-packages\statsmodels\base\model.py:492: HessianInversionWarning: Inverting hessian failed, no bse or cov_params available
'available', HessianInversionWarning)
These warnings also appear in the other installations where this works, but there I get proper numerical output.
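Since the function above only falls back to zeros when an exception is raised, an all-NaN forecast passes through silently. A small guard can make that case visible. This is only a sketch: the `check_forecast` name and the error messages are made up for illustration, and it assumes the forecast is a 1-D array with one value per forecast step.

```python
import numpy as np


def check_forecast(fc, horizon):
    """Return the forecast as a float array, or raise if it is unusable.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    fc = np.asarray(fc, dtype=float)
    if fc.shape != (horizon,):
        raise ValueError("expected %d forecast values, got shape %s" % (horizon, fc.shape))
    if not np.isfinite(fc).all():
        # NaN or infinite values usually mean the fit did not converge
        raise ValueError("forecast contains NaN or infinite values")
    return fc


print(check_forecast([520.0, 510.5, 498.2], 3))
```

Calling this on the result of forecast() inside the try block would route the NaN case into the same fallback as any other failure.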
Use pandas version 0.22.0.
If that does not resolve it, avoid importing all of statsmodels and import only ARIMA. If you are using TensorFlow in Colab, note that it does not support TensorFlow 2.
I am trying to do score matching with pymatch. Unfortunately, I am getting the following error:
Fitting Models on Balanced Samples: 1\200Error: Unable to coerce to Series, length must be 1: given 1898
Here is my code
from sklearn.datasets import make_blobs
from pymatch.Matcher import Matcher
import pandas as pd
import numpy as np
X, y = make_blobs(n_samples=5000, centers=2, n_features=2, cluster_std=3.5)
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
df['population'] = np.random.choice([1, 0], size=len(df), p=[0.8, 0.2])
control = df[df.label == 1]
test = df[df.label == 0]
m = Matcher(test, control, yvar="population", exclude=['label'])
m.fit_scores(balance=True, nmodels=200)
If I run this code, I get the error. I am quite sure that I was able to run this before, but after changing some versions it doesn't work anymore. Unfortunately, I wasn't able to fix it by going back to previous versions, so I'm not sure what's going on here...
Downgrading pandas did not work for me, but I found where the problem is.
It is an error in the method _scores_to_accuracy() of Matcher.py. I downloaded the source file, edited the function on my local machine, and now it works fine.
https://github.com/benmiroglio/pymatch/issues/23
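As far as I can tell, the message in the question is what recent pandas versions raise when a one-column DataFrame is compared with a 1-D array whose length differs from the number of columns. The sketch below assumes that this is what happens inside _scores_to_accuracy(); the linked issue is the authoritative source. It is not pymatch's actual code, and the `scores_to_accuracy` name, the 0.5 threshold and the toy data are assumptions for illustration. Flattening both sides to 1-D arrays before comparing avoids the alignment step:

```python
import numpy as np
import pandas as pd


def scores_to_accuracy(y, scores, threshold=0.5):
    """Share of rows whose thresholded score matches the label.

    Hypothetical stand-in for illustration; not pymatch's implementation.
    """
    actual = np.asarray(y).ravel()  # one-column DataFrame -> 1-D array
    predicted = (np.asarray(scores).ravel() >= threshold).astype(int)
    return float((actual == predicted).mean())


y = pd.DataFrame({"population": [1, 0, 1]})  # toy labels
scores = np.array([0.9, 0.2, 0.4])           # toy predicted probabilities
accuracy = scores_to_accuracy(y, scores)
print(accuracy)
```

Because both operands are plain arrays of the same length, the comparison is element-wise and no longer depends on how a given pandas version aligns a DataFrame with an array.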
Please downgrade your pandas to version 0.23.4.
Use the code:
pip install pandas==0.23.4
Simple code like this won't work anymore on my python shell:
import pandas as pd
df=pd.read_csv("K:/01. Personal/04. Models/10. Location/output.csv",index_col=None)
df.sample(3000)
The error I get is:
AttributeError: 'DataFrame' object has no attribute 'sample'
DataFrames definitely have a sample function, and this used to work.
I recently had some trouble installing and then uninstalling another distribution of python. I don't know if this could be related.
I've previously had a similar problem when trying to execute a script that had the same name as a module I was importing. That is not the case here, and pandas.read_csv is actually working.
What could cause this?
As given in the documentation of DataFrame.sample -
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Returns a random sample of items from an axis of object.
New in version 0.16.1.
(Emphasis mine).
DataFrame.sample was added in 0.16.1, so you can either:
Upgrade your pandas version to the latest. You can use pip for that, for example:
pip install pandas --upgrade
Or, if you don't want to upgrade and just want to sample a few rows from the DataFrame, you can use random.sample(), for example:
import random
num = 100 #number of samples
sampleddata = df.loc[random.sample(list(df.index),num)]
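The two options can also be combined into one helper that uses DataFrame.sample when it exists and falls back to random.sample otherwise. This is a sketch under assumptions: the `sample_rows` name and the toy data are made up, and the fallback samples row positions rather than index labels, so that duplicate index labels do not return extra rows.

```python
import random

import pandas as pd


def sample_rows(df, n, seed=None):
    """Return n rows drawn without replacement (hypothetical helper)."""
    if hasattr(df, "sample"):
        # Available in pandas 0.16.1 and later
        return df.sample(n=n, random_state=seed)
    # Fallback for older pandas: pick row positions with the standard library
    rng = random.Random(seed)
    positions = rng.sample(range(len(df)), n)
    return df.iloc[positions]


df = pd.DataFrame({"value": range(10)}, index=list("aabbccddee"))
subset = sample_rows(df, 4, seed=0)
print(subset)
```

Note that random.sample raises a ValueError if n is larger than the number of rows, so the fallback cannot oversample.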