Error when loading UMAP. Cannot load set_parallel_chunksize - python

This is my code. I keep getting an error saying that set_parallel_chunksize cannot be imported:
import streamlit as st
import umap.umap_ as umap
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import euclidean_distances
import warnings
warnings.filterwarnings('ignore')
# Get list of numeric columns
numeric_cols = data.select_dtypes(include=['float64', 'int64']).columns
# Set default columns to be pre-selected in the multiselect menu
default_cols = ['Minutes_played']
# Create multiselect menu to select features
selected_cols = st.multiselect('Select columns to use as features', numeric_cols, default=default_cols)
# Create a list of selected columns in the specified format
features = list(data[selected_cols])
### APPLY MODEL ###
#Standardize the data
X = data[features]
z = StandardScaler()
X[features] = z.fit_transform(X)
#Reduce the size of the features to 2 components using UMAP
fit = umap.UMAP(n_components=2,random_state=42)
u = fit.fit_transform(X)
I keep getting this error:
ImportError: cannot import name 'set_parallel_chunksize' from 'numba.np.ufunc' (/Users/omar/opt/anaconda3/lib/python3.9/site-packages/numba/np/ufunc/__init__.py)
I'm on a Mac and have already tried:
pip uninstall umap
pip install umap-learn
Any clue what the issue might be?
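The path in the error message points inside numba's own package, not umap. One plausible cause (an assumption here, not something the post confirms) is a numba installation that is outdated or whose files come from mixed versions, for example after combining conda and pip. A minimal diagnostic sketch that lists the installed versions of the packages involved; the helper name installed_versions and the choice of packages to check are illustrative:

```python
from importlib import metadata


def installed_versions(distributions):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that umap-learn relies on at import time (illustrative selection).
report = installed_versions(["numba", "llvmlite", "umap-learn", "pynndescent"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If numba turns out to be old or missing, reinstalling it with the same tool that manages the rest of the environment (either conda or pip, not both) is a common next step.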

Related

Duplicated feature and criteria from sklearn RandomForest when examining the decision path

I'm getting duplicated feature and threshold (CO2) when examining the decision tree from a random forest model. The code to visualize the tree is the following:
estimator = model.estimators_[10]
from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
feature_names = ['pdo', 'pna', 'lat', 'lon', 'ele', 'co2'],
class_names = 'disWY',
rounded = False, proportion = False,
precision = 3, filled = True)
# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=300'])
# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')
It is clear that CO2 and -0.69 are used twice. I don't understand how this is possible. Does anyone have any idea?
[Screenshot of the decision tree]
Shouldn't the threshold be different for the same feature?
This is probably a rounding error.
It's a little contrived, but here's a minimal way to reproduce this with RandomForestRegressor
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz
X = np.array([[-0.6901, 4.123],
[-0.6902, 5.456],
[-0.6903, 6.789],
[-0.6904, 7.012]])
y = np.array([0.0, 1.0, 1.0, 0.0])
reg = RandomForestRegressor(random_state=42).fit(X, y)
export_graphviz(reg.estimators_[6], out_file="tree6.dot", precision=3, filled=True)
# dot -Tpng tree6.dot -o tree6.png
If we instead passed a higher precision (e.g. precision=8) when calling export_graphviz(), the two thresholds would show up as distinct values.
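To illustrate why rounding can make two different splits look identical, here is a small pure-Python sketch. The two thresholds are hypothetical: they are the midpoints between neighbouring values of the first feature in the example above, and the thresholds in the actual tree may differ. The helper shown() only mimics rounding a value to a given number of decimals before printing it.

```python
# Hypothetical split thresholds: midpoints between neighbouring feature values.
thresholds = [
    (-0.6901 + -0.6902) / 2,
    (-0.6902 + -0.6903) / 2,
]


def shown(value, precision):
    """Mimic displaying a threshold rounded to `precision` decimals."""
    return str(round(value, precision))


labels_p3 = [shown(t, 3) for t in thresholds]  # low precision: labels collide
labels_p8 = [shown(t, 8) for t in thresholds]  # high precision: labels differ

print("precision=3:", labels_p3)
print("precision=8:", labels_p8)
```

At three decimals both thresholds print as the same label, even though the underlying values are different; at eight decimals they are distinguishable.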

2D output on Linear regression model

I'm getting the following error from my code:
ValueError: Expected 2D array, got scalar array instead:
array=99.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Here is the code used:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
Physical_activity_df = pd.read_excel('C:/Users/Usuario/Desktop/LW_docs/Physical_activity_nopass.xlsx')
prediction_df = Physical_activity_df[['Activity_Score','Calories']]
prediction_df.plot(kind='scatter', x= 'Activity_Score', y= 'Calories')
plt.show()
#change to df variables
activity_score = pd.DataFrame(prediction_df['Activity_Score'])
calories = pd.DataFrame(prediction_df['Calories'])
lm = linear_model.LinearRegression()
model = lm.fit(activity_score,calories)
#predict new values for calories (FROM HERE COMES THE ERROR)
activity_score_new = 99
calories_predict = model.predict(activity_score_new)
calories_predict
Any idea about how to fix this issue? Thanks!
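The error message itself suggests the fix: predict() expects a 2D array of shape (n_samples, n_features), while 99 is a scalar. A minimal sketch on synthetic data (the numbers below are made up and stand in for the spreadsheet from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data: calories = 2 * activity_score + 1.
activity_score = np.array([[10.0], [20.0], [30.0], [40.0]])  # shape (4, 1)
calories = 2 * activity_score.ravel() + 1

model = LinearRegression().fit(activity_score, calories)

# One sample with one feature: reshape the scalar to shape (1, 1).
activity_score_new = np.array(99).reshape(1, -1)
calories_predict = model.predict(activity_score_new)
print(calories_predict)
```

Writing the new value as [[99]] would work as well, since that is already a 2D structure with one row and one column.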

dataset is not callable problems

I'm trying to impute NaN values, but first I want to check which method is best for estimating them. I'm new to these methods, so I want to use some code I found to compare different regressors and choose the best one. The original code is this:
import numpy as np
from sklearn.experimental import enable_iterative_imputer # noqa
from sklearn.datasets import fetch_california_housing
from sklearn.impute import SimpleImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
N_SPLITS = 5
rng = np.random.RandomState(0)
X_full, y_full = fetch_california_housing(return_X_y=True)
# ~2k samples is enough for the purpose of the example.
# Remove the following two lines for a slower run with different error bars.
X_full = X_full[::10]
y_full = y_full[::10]
n_samples, n_features = X_full.shape
fetch_california_housing is the dataset used in the original example.
When I tried to adapt this code to my case, I wrote this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy import genfromtxt
data = genfromtxt('documents/datasets/df.csv', delimiter=',')
features = data[:, :2]
targets = data[:, 2]
N_SPLITS = 5
rng = np.random.RandomState(0)
X_full, y_full = data(return_X_y= True)
# ~2k samples is enough for the purpose of the example.
# Remove the following two lines for a slower run with different error bars.
X_full = X_full[::10]
y_full = y_full[::10]
n_samples, n_features = X_full.shape
I always get the same error:
AttributeError: 'numpy.ndarray' object is not callable
Earlier, when I used my DataFrame instead of the CSV file (df.csv), I got essentially the same error:
AttributeError: 'Dataset' object is not callable
The complete error is this:
TypeError Traceback (most recent call last)
<ipython-input-8-3b63ca34361e> in <module>
3 rng = np.random.RandomState(0)
4
----> 5 X_full, y_full = df(return_X_y=True)
6 # ~2k samples is enough for the purpose of the example.
7 # Remove the following two lines for a slower run with different error bars.
TypeError: 'DataFrame' object is not callable
I don't know how to make either of these errors go away.
I hope I have explained my problem well, as my English is not very good.
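In the original example, fetch_california_housing(return_X_y=True) is a function call that returns the pair (X, y). An array or DataFrame is not a function, so writing data(...) or df(...) raises "object is not callable". Since the adapted code already slices the features and targets out of the array, those slices can be used directly. A minimal sketch, with a small made-up CSV held in memory in place of documents/datasets/df.csv:

```python
import io

import numpy as np

# Made-up CSV content standing in for df.csv: two feature columns, one target.
csv_text = "1.0,2.0,10.0\n3.0,4.0,20.0\n5.0,6.0,30.0\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Index the array instead of calling it.
X_full = data[:, :2]  # first two columns as features
y_full = data[:, 2]   # third column as target

n_samples, n_features = X_full.shape
print(n_samples, n_features)
```

With a pandas DataFrame the same idea applies: select the columns (for example with .iloc) rather than calling the object.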

keep getting error "Input contains NaN, infinity or a value too large for dtype('float32')."

I'm trying to create a decision tree for this data. I'm not sure exactly what is wrong, but I keep getting the same error.
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
train, test = train_test_split(sns.load_dataset('titanic').drop(columns=['alive']), random_state=0)
target = 'survived'
!pip install dtreeviz
!apt install graphviz
!apt install xdg-utils
sex = {'male': 0, 'female': 1,}
train.sex = train.sex.map(sex)
test.sex = test.sex.map(sex)
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=2)
features = ['sex', 'age']
target = 'survived'
model.fit(train[features], train[target])
model.fit(test[features], test[target])
from dtreeviz.trees import *
dtreeviz(model,
train[features],
train[target],
target_name=target,
feature_names=features,
class_names=['deceased', 'survived'])
I'm new to Python; any help is appreciated!
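The age column of the Titanic dataset contains missing values, which is a likely source of the "Input contains NaN" error when the tree is fitted. A minimal sketch, using a small made-up frame instead of the real dataset, that counts the missing values and fills them with the median before fitting; median imputation is just one option chosen for illustration (dropping the rows is another):

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up rows that mimic the relevant Titanic columns, with gaps in 'age'.
train = pd.DataFrame({
    "sex": [0, 1, 1, 0, 1, 0],
    "age": [22.0, np.nan, 26.0, 35.0, np.nan, 54.0],
    "survived": [0, 1, 1, 0, 1, 0],
})
features = ["sex", "age"]
target = "survived"

# Check where the NaNs are before fitting.
missing_per_column = train[features].isna().sum()
print(missing_per_column)

# Fill missing ages with the median age of the training data.
train_filled = train.copy()
train_filled["age"] = train_filled["age"].fillna(train_filled["age"].median())

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(train_filled[features], train_filled[target])
```

The same median computed on the training data should then be used to fill the test set, so that no information from the test rows leaks into the preprocessing.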

Type error while using scikit-learns SimpleImputer

This code is for data preprocessing that I am learning in an online course of ML.
import numpy as np
import matplotlib.pyplot as plt #pyplot is a sublibrary of matplotlib
import pandas as pd
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1]
Y = dataset.iloc[:,-1]
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan,strategy = 'mean',verbose = 0)
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
But it gives this error: TypeError: unhashable type: 'slice'.
Please help me with this.
X is a DataFrame, so you can't index it like X[:,1:3]; you should use iloc instead.
Try this:
imputer = imputer.fit(X.iloc[:,1:3])
X.iloc[:,1:3] = imputer.transform(X.iloc[:,1:3])
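For a self-contained check of the iloc approach, here is a small sketch with a made-up frame in place of Data.csv (the column names and values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up stand-in for the feature columns of Data.csv.
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Positional indexing with iloc selects columns 1 and 2 (Age and Salary).
X.iloc[:, 1:3] = imputer.fit_transform(X.iloc[:, 1:3])
print(X)
```

Each missing entry is replaced by the mean of the non-missing values in its column.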
I would also advise using sklearn.pipeline.Pipeline and sklearn.compose.ColumnTransformer for these preprocessing transformations if your final goal is to predict: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py
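A rough sketch of what that could look like, assuming made-up column names and values in place of Data.csv and a decision tree as a placeholder model (none of these choices come from the question):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for Data.csv.
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
})
y = pd.Series(["No", "Yes", "No", "Yes"])

# Impute numeric columns with the mean and one-hot encode the categorical one.
preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="mean"), ["Age", "Salary"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Country"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

clf = Pipeline([
    ("preprocess", preprocess),
    ("model", DecisionTreeClassifier(random_state=0)),  # placeholder model
])
clf.fit(X, y)

predictions = clf.predict(X)
transformed = clf.named_steps["preprocess"].transform(X)
print(transformed.shape)
```

Keeping the imputer inside the pipeline means it is fitted only on the data passed to fit(), and the same learned means are reused at prediction time.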
