I've been experiencing some odd issues with randomness in scikit-learn on my MacBook (OS X 10.12.6, conda environment with Python 2.7). As a test, I've set up the following script:
import numpy.random as npr
import numpy.testing as npt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
def test_randomness_one():
    npr.seed(54)
    rand_ints_one = npr.randint(500, size=50)
    npr.seed(54)
    rand_ints_two = npr.randint(500, size=50)
    npt.assert_array_equal(rand_ints_one, rand_ints_two)
def test_logit_one():
    data = load_breast_cancer()
    preds_one = LogisticRegression(random_state=2)\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    preds_two = LogisticRegression(random_state=2)\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    npt.assert_array_equal(preds_one, preds_two)
def test_logit_two():
    data = load_breast_cancer()
    preds_one = LogisticRegression()\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    preds_two = LogisticRegression()\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    npt.assert_array_equal(preds_one, preds_two)
# A note: main used for testing with the interpreter directly
# Executed with pytest *without* the below lines.
if __name__ == "__main__":
    test_randomness_one()
    test_logit_one()
    test_logit_two()
In theory, all of these results should be identical, and coworkers running Ubuntu and Windows boxes have verified this. On my box, all of these tests pass if executed in a REPL or run via python toy_test.py. If run via pytest toy_test.py, however, test_logit_one fails consistently and test_logit_two fails often but not always. Where is the randomness coming from in this situation? Is it OS-level? conda-level? pytest? Or something else?
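One way to narrow this down is to separate draws that come from NumPy's global random state from draws that come from a privately seeded generator. When random_state is None (as in test_logit_two), scikit-learn falls back to the global numpy.random state, which any other code in the same process may already have advanced. The NumPy-only sketch below illustrates that difference; noisy_neighbour is a hypothetical stand-in for other code (a plugin, another test) that consumes the global generator. It does not by itself explain why test_logit_one, which passes an integer seed, fails under pytest.

```python
import numpy as np


def noisy_neighbour():
    # Hypothetical stand-in for unrelated code that draws from NumPy's
    # global generator without the caller knowing about it.
    np.random.random_sample(3)


def draws_from_global():
    # Mirrors random_state=None: the result depends on whatever state the
    # global generator is in, including draws made by other code.
    noisy_neighbour()
    return np.random.randint(500, size=5)


def draws_from_local(seed):
    # Mirrors random_state=<int>: a private RandomState is not affected by
    # anything else touching the global generator.
    rng = np.random.RandomState(seed)
    noisy_neighbour()
    return rng.randint(500, size=5)
```

Calling draws_from_local(54) twice gives the same array no matter how much the global state has been consumed in between, while draws_from_global() changes from call to call unless np.random.seed is called immediately before it.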
I am running a model in Python 3.8 and using tqdm.notebook to track its progress. However, the progress bar does not advance, even though the generation itself works.
It just keeps showing this.
Here is my code and the model I'm running:
import numpy as np
import tensorflow as tf
from midi_ddsp.utils.midi_synthesis_utils import synthesize_mono_midi, conditioning_df_to_audio
from midi_ddsp.utils.inference_utils import get_process_group
from midi_ddsp.midi_ddsp_synthesize import load_pretrained_model
from midi_ddsp.data_handling.instrument_name_utils import INST_NAME_TO_ID_DICT
from tqdm.notebook import tqdm
# -----MIDI Synthesis-----
midi_file = '/Users/midi-ddsp/midi_example/ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize with violin:
instrument_name = 'violin'
instrument_id = INST_NAME_TO_ID_DICT[instrument_name]
# Run model prediction
midi_audio, midi_control_params, midi_synth_params, conditioning_df = synthesize_mono_midi(
    synthesis_generator, expression_generator, midi_file, instrument_id, output_dir=None)
synthesized_audio = midi_audio # The synthesized audio
conditioning_df_changed = conditioning_df.copy()
I don't know what the problem is. I hope someone can tell me. Thanks in advance!
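For reference, a tqdm bar only advances while it wraps an iterable that is being consumed, or when update() is called on it explicitly. In the code above, tqdm is imported but never used, and synthesize_mono_midi is a single call. The sketch below shows both patterns, assuming the work can be split into steps; process_chunks and its doubling step are hypothetical placeholders, and the plain tqdm import is used so the sketch also runs outside Jupyter.

```python
from tqdm import tqdm  # in Jupyter, `from tqdm.notebook import tqdm` draws a widget instead


def process_chunks(chunks):
    """Hypothetical per-chunk work; doubling stands in for a real step."""
    results = []
    # Wrapping the iterable is what makes the bar tick once per item.
    for chunk in tqdm(chunks, desc="processing"):
        results.append(chunk * 2)
    return results


def count_with_manual_updates(n_steps):
    # When there is no loop to wrap, advance the bar explicitly.
    with tqdm(total=n_steps, desc="manual") as pbar:
        for _ in range(n_steps):
            pbar.update(1)
        return pbar.n  # number of steps the bar has recorded
```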
I'm trying to learn how to create a machine learning API with Flask. However, while following this tutorial, I get the following error when I run python app.py:
Traceback (most recent call last):
  File "C:\Users\Breno\Desktop\flask-api\app.py", line 24, in <module>
    model = p.load(open(modelfile, 'rb'))
ModuleNotFoundError: No module named 'sklearn.tree.tree'
My code:
from flask import Flask, request, redirect, url_for, flash, jsonify
import numpy as np
import pickle as p
import pandas as pd
import json
#from sklearn.tree import DecisionTreeClassifier
app = Flask(__name__)
@app.route('/api/', methods=['POST'])
def makecalc():
    j_data = request.get_json()
    prediction = np.array2string(model.predict(j_data))
    return jsonify(prediction)
if __name__ == '__main__':
    modelfile = 'models/final_prediction.pickle'
    model = p.load(open(modelfile, 'rb'))
    app.run(debug=True, host='0.0.0.0')
Could someone help me please?
Pickles are not necessarily compatible across scikit-learn versions, so this behavior is expected (and the use case is not supported). For more details, see https://scikit-learn.org/dev/modules/model_persistence.html#model-persistence. Replace pickle with joblib.
For example:
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> X, y= datasets.load_iris(return_X_y=True)
>>> clf.fit(X, y)
SVC()
>>> from joblib import dump, load
>>> dump(clf, 'filename.joblib')
>>> clf2 = load('filename.joblib')
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0
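Since the underlying problem is a version mismatch, it can also help to record the library version next to the model and check it before the model itself is unpickled, so that a mismatch produces a clear message instead of a ModuleNotFoundError. The sketch below uses plain pickle and two hypothetical helper names (dumps_with_version, loads_with_version_check); it is not a scikit-learn or joblib API. The version string is pickled first, so it can always be read back even when the model that follows cannot.

```python
import io
import pickle


def dumps_with_version(model, library_version):
    """Hypothetical helper: serialize the version string, then the model."""
    buf = io.BytesIO()
    pickle.dump(library_version, buf)  # a plain str loads under any version
    pickle.dump(model, buf)
    return buf.getvalue()


def loads_with_version_check(data, library_version):
    """Hypothetical helper: compare versions before unpickling the model."""
    buf = io.BytesIO(data)
    saved_version = pickle.load(buf)
    if saved_version != library_version:
        raise RuntimeError(
            f"model was saved with version {saved_version}, "
            f"but version {library_version} is installed"
        )
    return pickle.load(buf)
```

In practice you would pass sklearn.__version__ as library_version, write the returned bytes to a file opened in 'wb' mode, and read them back in 'rb' mode before calling loads_with_version_check.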
For anyone coming across this issue (perhaps dealing with code written long ago): the contents of sklearn.tree.tree are now under sklearn.tree (the old module was deprecated in v0.22 and removed in v0.24). This can be seen from the deprecation warning raised by the old import:
from sklearn.tree.tree import BaseDecisionTree
/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
warnings.warn(message, FutureWarning)
Instead, use:
from sklearn.tree import BaseDecisionTree
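Changing your own imports does not help with a pickle that was written while the old module existed, because the module path is stored inside the pickle. A commonly used workaround, shown here as a sketch rather than a supported scikit-learn mechanism, is to subclass pickle.Unpickler and override find_class so that old module paths are redirected before import. load_with_renames and the rename mapping are illustrative names. Be aware that this only fixes the import path: if the object's internal attributes changed between versions, the loaded model may still fail or behave incorrectly, so retraining or pinning the old version is the safer option.

```python
import io
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects old module paths to new ones before import."""

    def __init__(self, file, renames):
        super().__init__(file)
        self.renames = renames  # e.g. {"sklearn.tree.tree": "sklearn.tree"}

    def find_class(self, module, name):
        # Swap in the new module path, then let pickle import it as usual.
        return super().find_class(self.renames.get(module, module), name)


def load_with_renames(data, renames):
    """Unpickle bytes, applying the module renames."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()
```

For the error above, that would be something like load_with_renames(f.read(), {"sklearn.tree.tree": "sklearn.tree"}) with f opened in 'rb' mode.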
The problem is the version of sklearn. The sklearn.tree.tree module was removed in version 0.24, and your model was most likely saved with an older version. Try installing an older version of sklearn:
pip uninstall scikit-learn
pip install scikit-learn==0.20.4
I need to perform inference for a cuml.dask.ensemble.RandomForestClassifier on a GPU-less Windows virtual machine where rapids/cuml can't be installed.
I thought of using Treelite: import the model into Treelite, generate a shared library (a .dll file on Windows), and then use treelite_runtime.Predictor to load that shared library and perform inference on the target machine.
The problem is that I have no idea how to import the RandomForestClassifier model into Treelite to create a Treelite model.
I have tried using convert_to_treelite_model, but the resulting object isn't a Treelite model and I don't know how to use it.
See the attached code (executed under Linux, so I try to use the gcc toolchain and generate a .so file).
I get the exception "'cuml.fil.fil.TreeliteModel' object has no attribute 'export_lib'" when I try to call export_lib:
import numpy as np
import pandas as pd
import cudf
from sklearn import model_selection, datasets
from cuml.dask.common import utils as dask_utils
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask_cudf
from cuml.dask.ensemble import RandomForestClassifier as cumlDaskRF
import treelite
import treelite_runtime
if __name__ == '__main__':
    # This will use all GPUs on the local host by default
    cluster = LocalCUDACluster(threads_per_worker=1)
    c = Client(cluster)
    # Query the client for all connected workers
    workers = c.has_what().keys()
    n_workers = len(workers)
    n_streams = 8  # Performance optimization
    # Data parameters
    train_size = 10000
    test_size = 100
    n_samples = train_size + test_size
    n_features = 10
    # Random Forest building parameters
    max_depth = 6
    n_bins = 16
    n_trees = 100
    X, y = datasets.make_classification(n_samples=n_samples, n_features=n_features,
                                        n_clusters_per_class=1, n_informative=int(n_features / 3),
                                        random_state=123, n_classes=5)
    X = X.astype(np.float32)
    y = y.astype(np.int32)
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=test_size)
    n_partitions = n_workers
    # First convert to cudf (with real data, you would likely load in cuDF format to start)
    X_train_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_train))
    y_train_cudf = cudf.Series(y_train)
    X_test_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_test))
    # Partition with Dask
    # In this case, each worker will train on 1/n_partitions fraction of the data
    X_train_dask = dask_cudf.from_cudf(X_train_cudf, npartitions=n_partitions)
    y_train_dask = dask_cudf.from_cudf(y_train_cudf, npartitions=n_partitions)
    x_test_dask = dask_cudf.from_cudf(X_test_cudf, npartitions=n_partitions)
    # Persist to cache the data in active memory
    X_train_dask, y_train_dask, x_test_dask = dask_utils.persist_across_workers(
        c, [X_train_dask, y_train_dask, x_test_dask], workers=workers)
    cuml_model = cumlDaskRF(max_depth=max_depth, n_estimators=n_trees, n_bins=n_bins, n_streams=n_streams)
    cuml_model.fit(X_train_dask, y_train_dask)
    wait(cuml_model.rfs)  # Allow asynchronous training tasks to finish
    # HACK: comb_model is None if a prediction isn't performed before calling 'get_combined_model'.
    # I don't know why...
    cuml_y_pred = cuml_model.predict(x_test_dask).compute()
    cuml_y_pred = cuml_y_pred.to_array()
    del cuml_y_pred
    comb_model = cuml_model.get_combined_model()
    treelite_model = comb_model.convert_to_treelite_model()
    toolchain = 'gcc'
    treelite_model.export_lib(toolchain=toolchain, libpath='./mymodel.so', verbose=True)  # <----- EXCEPTION!
    del cuml_model
    del treelite_model
    predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)
    y_pred = predictor.predict(X_test)
    # ......
Notes: I'm trying to run the code on an Ubuntu box with 2 NVIDIA RTX2080ti GPUs, using the following library versions:
cudatoolkit 10.1.243
cudnn 7.6.0
cudf 0.15.0
cuml 0.15.0
dask 2.30.0
dask-core 2.30.0
dask-cuda 0.15.0
dask-cudf 0.15.0
rapids 0.15.1
treelite 0.92
treelite-runtime 0.92
At the moment Treelite does not have a serialization method that can be directly used. We have an internal serialization method that we use to pickle cuML's RF model.
I would recommend opening a feature request in Treelite's GitHub repo (https://github.com/dmlc/treelite) asking for a way to serialize and deserialize Treelite models.
Furthermore, the output of the convert_to_treelite_model function is a Treelite model. It is displayed as:
In [2]: treelite_model
Out[2]: <cuml.fil.fil.TreeliteModel at 0x7f11ceeca840>
It is shown this way because we expose the C++ Treelite code through Cython to get direct access to Treelite's C++ handle.
While using the Python rpy2 library to work with R, I get the following error message when trying to import a function from the bnlearn package:
# Using R inside python
import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
# Install packages
packnames = ('visNetwork', 'bnlearn')
utils.install_packages(StrVector(packnames))
# Load packages
visNetwork = importr('visNetwork')
bnlearn = importr('bnlearn')
tabu = bnlearn.tabu
fit = bnlearn.bn.fit
With the error:
AttributeError: module 'bnlearn' has no attribute 'bn'
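Checking the bnlearn documentation, one finds that bn is a class structure and that bn.fit is the name of a single R function, not an attribute of a bn object. rpy2 exposes R names that contain dots under Python-friendly names, so check all the attributes of the imported package by running: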
bnlearn.__dict__['_rpy2r']
You should get output similar to the following, which shows the Python name under which each attribute of bnlearn is exposed:
...
...
'bn_boot': 'bn.boot',
'bn_cv': 'bn.cv',
'bn_cv_algorithm': 'bn.cv.algorithm',
'bn_cv_structure': 'bn.cv.structure',
'bn_fit': 'bn.fit',
'bn_fit_backend': 'bn.fit.backend',
'bn_fit_backend_continuous': 'bn.fit.backend.continuous',
'bn_fit_backend_discrete': 'bn.fit.backend.discrete',
'bn_fit_backend_mixedcg': 'bn.fit.backend.mixedcg',
'bn_fit_barchart': 'bn.fit.barchart',
'bn_fit_dotplot': 'bn.fit.dotplot',
...
...
Then, running the following will solve the issue:
bn_fit = bnlearn.bn_fit
Now you can, for example, fit a Bayesian network:
structure = tabu(datos, score = "loglik-g")
bn_mod = bn_fit(structure, data = datos, method = "mle")
In general, this approach solves the issue of importing any function from an R package into Python through the rpy2 package.
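If you would rather not scan the printed dict by eye, you can look up the Python name for a given R name programmatically. The sketch below assumes, as in the output above, that the dict maps Python-side names to the original R names; _rpy2r is a private rpy2 attribute, so treat this as a convenience that may change between rpy2 versions. python_name_for is a hypothetical helper name.

```python
def python_name_for(rpy2r, r_name):
    """Return the Python attribute name rpy2 assigned to an R object.

    rpy2r is the dict printed above (package.__dict__['_rpy2r']), mapping
    Python-side names to the original R names.
    """
    matches = [py_name for py_name, original in rpy2r.items() if original == r_name]
    if not matches:
        raise KeyError(f"{r_name!r} is not exported by this package")
    return matches[0]
```

With the packages loaded as above, getattr(bnlearn, python_name_for(bnlearn.__dict__['_rpy2r'], 'bn.fit')) should return the same object as bnlearn.bn_fit.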
I'm trying to write my own chatbot with the RASA framework.
Right now I'm just playing around with it and I have the following piece of code for training purposes.
from rasa.nlu.training_data import load_data
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.model import Trainer
from rasa.nlu import config
training_data = load_data("./data/nlu.md")
trainer = Trainer(config.load("config.yml"))
interpreter = trainer.train(training_data)
model_directory = trainer.persist("./models/nlu",fixed_model_name="current")
Now, I read that to test it I should do something like this:
from rasa.nlu.evaluate import run_evaluation
run_evaluation("nlu.md", model_directory)
But this function is no longer available in rasa.nlu.evaluate or in rasa.nlu.test!
What's the way, then, of testing a RASA model?
The module was renamed. Import it with:
from rasa.nlu.test import run_evaluation
Alternatively, you can now also do:
from rasa.nlu import test
test_result = test(path_to_test_data, unpacked_model)
intent_evaluation_report = test_result["intent_evaluation"]["report"]
print(intent_evaluation_report)
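To see what the per-intent numbers in such a report mean, here is a minimal pure-Python sketch that computes precision and recall for each intent from parallel lists of true and predicted intent labels. It does not call Rasa, per_intent_scores is a hypothetical helper, and the labels in the test are made up; a full classification report usually also lists F1 score and support.

```python
from collections import Counter


def per_intent_scores(true_intents, predicted_intents):
    """Return {intent: (precision, recall)} for parallel label lists."""
    if len(true_intents) != len(predicted_intents):
        raise ValueError("label lists must have the same length")
    # Correct predictions per intent, and how often each intent was
    # predicted versus how often it actually occurred.
    true_positives = Counter(t for t, p in zip(true_intents, predicted_intents) if t == p)
    predicted_counts = Counter(predicted_intents)
    actual_counts = Counter(true_intents)
    scores = {}
    for intent in set(true_intents) | set(predicted_intents):
        tp = true_positives[intent]
        precision = tp / predicted_counts[intent] if predicted_counts[intent] else 0.0
        recall = tp / actual_counts[intent] if actual_counts[intent] else 0.0
        scores[intent] = (precision, recall)
    return scores
```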