random subspace ensemble classifier python - python

i am in python i search for implementation for random subspace ensemble classifier , and i found the following code in github
https://github.com/mwygoda/randomSubspaceImplementation/blob/master/solution.py
author depend in this two lib
from utils import prepare_data_from_file
from utils import plot_results
i try to install utils using pip3, it installed and worked when i run import utils as ut put still get error cannot import name 'plot_results' or'prepare_data_from_file'
any one help me how can i fix it

That file is not in the repo. You will have to implement it yourself.
from the code it looks like it returns a feature vector and target labels.
e.g
def prepare_data_from_file(file):
import pandas as pd
df = pd.read_csv(file)
return df['A'], df['B']
But this is mere speculation. Now get off stackoverflow and go do your assisgnment.

Related

Failed to Importing Adida from statsforecast.models in Python

I was trying to replicate this code for stat forecasting in python, I came across the issue of not being able to load this model 'adida' form statsforecast library,
Here is the link for reference : https://towardsdatascience.com/time-series-forecasting-with-statistical-models-f08dcd1d24d1
import random
from itertools import product
from IPython.display import display, Markdown
from multiprocessing import cpu_count
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nixtlats.data.datasets.m4 import M4, M4Info
from statsforecast import StatsForecast
from statsforecast.models import (
adida,
croston_classic,
croston_sba,
croston_optimized,
historic_average,
imapa,
naive,
random_walk_with_drift,
seasonal_exponential_smoothing,
seasonal_naive,
seasonal_window_average,
ses,
tsb,
window_average
)
Attached is the error message, Can you please have a look at this and let me know why is there an issue in importing this?
Given below is the error image:
I did some research and figured out the issue is probably with the version, try installing this specific version of statsforecast
pip install statsforecasts==0.6.0
Trying loading these models after that, hopefully this should work.
As of v1.0.0 of StatsForecast, the API changed to be more like sklearn, using classes instead of functions. You can find an example of the new syntax here: https://nixtla.github.io/statsforecast/examples/IntermittentData.html.
The new code would be
from statsforecast import StatsForecast
from statsforecast.models import ADIDA, IMAPA
model = StatsForecast(df=Y_train_df, # your data
models=[ADIDA(), IMAPA()],
freq=freq, # frequency of your data
n_jobs=-1)
If you want to use the old syntax, setting the version as suggested should work.
If you have updated the package ..use ADIDA it will work
see the model list name with new packages
ADIDA(),
IMAPA(),
(SimpleExponentialSmoothing(0.1)),
(TSB(0.3,0.2)),
(WindowAverage( 6))

Import Error: cannot import name 'tree' from 'sklearn.tree'

I am on my second day of re-taking Python for the gazillionth time!
I am doing a tutorial on ML in Python, using the following code:
import sklearn.tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import tree
music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(x,y)
tree.export_graphviz(model, out_file='music-recommender.dot',
feature_names=['age','gender'],
class_names= sorted(y.unique()),
label='all',
rounded=True,
filled=True)
I keep getting the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13088/3820271611.py in <module>
2 import pandas as pd
3 from sklearn.tree import DecisionTreeClassifier
----> 4 from sklearn.tree import tree
5
6 music_data = pd.read_csv('music.csv')
ImportError: cannot import name 'tree' from 'sklearn.tree' (C:\Anaconda\lib\site-packages\sklearn\tree\__init__.py)
I've tried to find a solution online, but I don't think it's the version of Python/Anaconda because I literally just installed both. I also don't think it's the sklearn.tree since I was able to import DecisionClassifer.
As this answer indicates, you're looking at some older code; this is always a risk with programming. But there's another thing you need to know about your code.
First off, scikit-learn contains several modules, and almost everything you need from it is in one of those. In my experience, most people import things like this:
from sklearn.tree import DecisionTreeRegressor # A regressor class.
from sklearn.tree import plot_tree # A helpful function.
from sklearn.metrics import mean_squared_error # An evaluation function.
It looks like the tutorial wants something similar to plot_tree(). This new-ish function is much easier to use than the older Graphviz visualization. So unless you really need the DOT file for some reasons, you should be able to do this:
from sklearn.tree import plot_tree
sklearn.tree.plot_tree(model)
Bottom line: there will probably be more broken things in that material. So if I were you I'd either make a new environment with a version of sklearn matching whatever material you're using... or ditch that material and look for something newer.
from sklearn.tree import tree looks wrong. Did you mean from sklearn import tree ?
According to the official Scikit Learn Decision Trees Documentation you really do not need too much of importing.
It can be done simply as follows:
from sklearn import tree
import pandas as pd
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = tree.DecisionTreeClassifier()
model.fit(X,y)

Using xgboost in Pyspark gives ImportError: cannot import name 'JavaPredictionModel'

This is my first attempt to use xgboost in pyspark so my experience with Java and Pyspark is still in learning phase.
I saw an awesome article in towards datascience with title PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset where the author goes through use case of xgboost in pyspark.
I tried to follow the steps but was hit with ImportError.
Installation
I have downloaded two jar files from maven and put them in the same directory where my notebook is.
xgboost4j version 0.72
xgboost4j-spark version 0.72
I have also downloaded the xgboost wrapper file sparkxgb.zip to the path ~/Softwares/sparkxgb.zip.
My jupyter notebook first cell
import xgboost
print(xgboost.__version__) # 1.2.0
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars xgboost4j-spark-0.72.jar,xgboost4j-0.72.jar pyspark-shell'
HOME = os.path.expanduser('~')
import findspark
findspark.init(HOME + "/Softwares/spark-3.0.0-bin-hadoop2.7")
import pyspark
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import col
spark = SparkSession\
.builder\
.appName("PySpark XGBOOST Titanic")\
.getOrCreate()
spark.sparkContext.addPyFile(HOME + "/Softwares/sparkxgb.zip")
print(pyspark.__version__) # 3.0.0
# this does not give any error
# Computer: MacOS
This cell gives errror
from sparkxgb import XGBoostEstimator
Error
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-7-cf2ff39c26f4> in <module>
----> 1 from sparkxgb import XGBoostEstimator
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/__init__.py in <module>
18
19 from sparkxgb.pipeline import XGBoostPipeline, XGBoostPipelineModel
---> 20 from sparkxgb.xgboost import XGBoostEstimator, XGBoostClassificationModel, XGBoostRegressionModel
21
22 __all__ = ["XGBoostEstimator", "XGBoostClassificationModel", "XGBoostRegressionModel",
/private/var/folders/tb/7xdk9scs79j9hxzcl3l_s6k00000gn/T/spark-1cf282a4-f3f2-42b3-a064-6bbd8751489e/userFiles-abca5e59-5af3-4b3d-a3bc-edc2973e9995/sparkxgb.zip/sparkxgb/xgboost.py in <module>
19 from pyspark.ml.param import Param
20 from pyspark.ml.param.shared import HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol, HasCheckpointInterval
---> 21 from pyspark.ml.util import JavaMLWritable, JavaPredictionModel
22 from pyspark.ml.wrapper import JavaEstimator, JavaModel
23 from sparkxgb.util import XGBoostReadable
ImportError: cannot import name 'JavaPredictionModel' from 'pyspark.ml.util' (/Users/poudel/Softwares/spark-3.0.0-bin-hadoop2.7/python/pyspark/ml/util.py)
Questions
How to fix the error and run xgboost in pyspark?
Maybe I have not placed downloaded jar files to correct path. (I have them placed in my working directory where I have jupyter notebook file). Do I need to place these files somewhere else? I assume jupyter automatically loads the path . and sees these jar files but I may be wrong.
If any good samaritan has already ran xgboost in pyspark, their help is much appreciated.
This question was asked almost 6 months ago but still no solution were provided.
Even I was facing the same issue for past few days, and finally I got solution so would like to share with all my folks.
By now you might have also got solution but I thought it would be better if I share so that you or in future any one can get benefits from this solution.
You can get rid of this error in two ways,
JavaPredictionModel has been removed from latest version of pyspark, so your can downgrad pyspark to let say version 2.4.0, then error will resolve.
But by doing this you might have to follow all the structure of old pyspark version only like OneHotEncoder can not be used for multiple features at same time you have to do that one-by-one.
!pip install pyspark==2.4.0
The second and best solution is to modify sparkxgb codes, like you can import JavaPredictionModel from pyspark.ml.wrapper, so you don't need to downgrad your pyspark.
from pyspark.ml.wrapper import JavaPredictionModel
P.S. Pardon me for not following the answer standards.
There are some problem with your versions. I know decision of similar problem fot catboost_spark. So i had a problem with versions (catboost_spark_version)
You need to go to https://catboost.ai/en/docs/installation/spark-installation-pyspark
Get the appropriate catboost_spark_version (see available versions at Maven central).
Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).
Just add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to spark.jar.packages Spark config parameter and import the catboost_spark package:
So you go to https://search.maven.org/search?q=catboost-spark and
choose version (for example catboost-spark_3.3_2.12)
Then copy "Artifact ID". In this case is "catboost-spark_3.3_2.12:1.1.1"
Then paste it to your config parameter
And you will get something like this:
sparkSession = (SparkSession.builder
.master('local[*]')
.config("spark.jars.packages", "ai.catboost:catboost-spark_3.3_2.12:1.1.1")
.getOrCreate())
import catboost_spark
and it will works :)

plot calibration curve for machine learning

I have the code below and this code work only with the binary class so how can I use with three classes.
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import scikitplot as skp
orgnal_data = pd.read_excel("movie.xls")
# Program extracting first column
text = orgnal_data.iloc[:,0]
lable = orgnal_data.iloc[:,1]
x_train,x_test,y_train,y_test=train_test_split(fe,lable,test_size=0.30,random_state=40)
DT = DecisionTreeClassifier()
DT_y = DT.fit(x_train,y_train).predict(x_test)
clf_names = ['Decision Tree']
skp.metrics.plot_calibration_curve(y_test,DT_y,clf_names)
plt.show()
Since you use scikit-plot module, there is no function for a multiclass problem.
Read the source code here:
This function currently only works for binary classification.
So you can either 1) modify the source code or 2) open a github issue and request a function for multiclass problems.
EDIT 1:
Using scikit-learn you have some ML models that can handle multiclass problems. For example for the LinearSVC function here, the multiclass support is handled according to a one-vs-the-rest scheme.
So you can actually have models like this and then use the plot_calibration_curve function for each case (one VS rest) separately.

Why can't I import statsmodels directly?

I'm certainly missing something very obvious here, but why does this work:
a = [0.2635,0.654654,0.365,0.4545,1.5465,3.545]
import statsmodels.robust as rb
print rb.scale.mad(a)
0.356309343367
but this doesn't:
import statsmodels as sm
print sm.robust.scale.mad(a)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-1ce0c872b0be> in <module>()
----> 1 print statsmodels.robust.scale.mad(a)
AttributeError: 'module' object has no attribute 'robust'
Long answer see http://www.statsmodels.org/stable/importpaths.html
Statsmodels has intentionally mostly empty __init__.py but has a parallel import collection through the api.py.
The recommended import for interactive work import statsmodels.api as sm imports almost all of statsmodels, numpy, pandas and patsy, and large parts of scipy. This is slooow on cold start.
If we want to import just a specific part of statsmodels, then we don't need to import all these extras. Having empty __init__.py means that we can import just a single module (which of course imports the dependencies of that module).
e.g. from statsmodels.robust.scale import mad or
import statmodels.robust scale as smscale
smscale.mad(...)
(Small caveat: Some of the very low level imports might not remain always backwards compatible if the internal structure changes. However, the general policy is to deprecate functions over one or two releases while maintaining the old access structure.)
You can, you just have to import robust as well:
import statsmodels as sm
import statsmodels.robust
Then:
>>> sm.robust.scale.mad(a)
0.35630934336679576
robust is a subpackage of statsmodels, and importing a package does not in general automatically import subpackages (unless the package is written to do so explicitly).

Categories

Resources