I'm following a tutorial on implementing a recommendation system, and I ran into a problem when importing train_test_split from sklearn.model_selection to do the train/test split.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#create columns name
header = ['user_id', 'item_id', 'rating', 'timestamp']
#read data containing the full dataset of ratings
df = pd.read_csv('ml-100k/u.data', sep='\t', names=header)
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print 'Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items)
#train_data, test_data = train_test_split(df,test_size=0.25)
#print 'train shape = ' + str(train_data.shape)
Log error:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/recommendation_system_trials/engine.py", line 3, in <module>
from sklearn.model_selection import train_test_split
File "C:\Users\hello2\lib\site-packages\sklearn\__init__.py", line 57, in <module>
from .base import clone
File "C:\Users\hello2\lib\site-packages\sklearn\base.py", line 10, in <module>
from scipy import sparse
ImportError: No module named scipy
Why am I getting this error? I'm not using scipy; I just want to import train_test_split.
Thank you for your help.
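The traceback shows that sklearn\base.py itself runs `from scipy import sparse`, so scikit-learn needs scipy to be installed even when your own script never imports it. Below is a minimal sketch for checking which packages are missing from the interpreter that runs the script. The helper name `missing_packages` is made up for this example.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages that the script and scikit-learn itself rely on
required = ["numpy", "scipy", "pandas", "sklearn"]
print("Missing:", missing_packages(required))
```

If scipy shows up in the list, installing it into that same interpreter (for example with pip install scipy) should clear the ImportError.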
I am trying to convert a pandas DataFrame into a TensorFlow dataset to build a model on. However, from_tensor_slices raises an error. Any idea how to fix it, or is there another way to use a pandas DataFrame in a TensorFlow model?
Thanks in advance.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
tf.compat.v1.disable_eager_execution()
df = pd.read_csv('insurance.csv')
X = pd.get_dummies(df, columns = ['sex', 'smoker', 'region'])
y = X.pop('charges')
ds = tf.data.Dataset.from_tensor_slices((X.values, y.values))
Error:
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\tracking\tracking.py", line 269, in __del__
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 4011, in as_default
AttributeError: 'NoneType' object has no attribute 'get_controller'
I have solved it now, and I am posting the answer for anyone who gets the same error.
You need to add the line below before the from_tensor_slices() call:
tf.compat.v1.enable_eager_execution()
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from google.colab import files
df = files.upload()
df='Dataset.csv'
df=df.dropna()
AttributeError Traceback (most recent call last)
in ()
----> 1 df=df.dropna()
AttributeError: 'str' object has no attribute 'dropna'
You are not loading the file as a DataFrame; you are just assigning the file name to df, so df is a plain string. Use this instead:
df = pd.read_csv('Dataset.csv')
df = df.dropna()
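In Colab, files.upload() returns a dictionary that maps each uploaded file name to its raw bytes, so the upload can also be parsed directly instead of being read back from disk. Here is a small sketch of that idea. The `uploaded` dictionary is a stand-in with made-up contents, since files.upload() only works inside Colab.

```python
import io
import pandas as pd

# Stand-in for `uploaded = files.upload()` (hypothetical file contents)
uploaded = {"Dataset.csv": b"age,income\n25,50000\n,42000\n31,\n40,61000\n"}

# Parse the bytes into a DataFrame; a plain string has no dropna()
df = pd.read_csv(io.BytesIO(uploaded["Dataset.csv"]))
clean = df.dropna()
print(clean)
```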
How can I use metrics.silhouette_score on a dataset of 1300 images for which I have ResNet50 feature vectors (each of length 2048) and a discrete class label between 1 and 9?
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(-1,1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
I get this error:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(-1,1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
For this other code:
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(1,-1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
I get this error:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(1,-1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
If I run this other code:
import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels,
metric='cosine'))
I get this as an output: https://pastebin.com/raw/hk2axdWL
How can I fix this code so that I can print the single silhouette score?
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Process finished with exit code 1
I have pasted one line of my feature vector file (a .txt file) here: https://pastebin.com/raw/hk2axdWL (consists of 2048 numbers separated by space)
I was eventually able to figure this out. I needed to build the feature vectors in exactly the format sklearn requires, one row of numbers per image:
import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
# Build one row of floats per image instead of passing the raw file text
fv = []
with open('entire_dataset__resnet50_feature_vectors.txt') as X:
    for line in X:
        values = line.split()  # also drops the trailing newline
        if values:  # skip blank lines
            fv.append([float(v) for v in values])
print('Silhouette Score:', metrics.silhouette_score(fv, labels,
metric='cosine'))
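Two details explain the earlier errors. np.ndarray(labels) treats its argument as a shape rather than as data, so np.array(labels) (or the plain list) is what is wanted. In addition, silhouette_score expects a 2-D numeric array of shape (n_samples, n_features) together with a 1-D sequence of labels. The sketch below uses made-up data, with four-dimensional vectors standing in for the 2048-dimensional ResNet50 features.

```python
import io
import numpy as np
from sklearn import metrics

# Made-up stand-in for the feature file: one space-separated vector per line
text = io.StringIO(
    "1.0 0.1 0.0 0.0\n"
    "0.9 0.2 0.0 0.1\n"
    "0.0 0.1 1.0 0.9\n"
    "0.1 0.0 0.9 1.0\n"
)
features = np.loadtxt(text)          # shape (4, 4), dtype float
labels = np.array([1, 1, 2, 2])      # 1-D, one label per row

score = metrics.silhouette_score(features, labels, metric="cosine")
print("Silhouette Score:", score)
```

With a real feature file, np.loadtxt can be given the file path directly, assuming every line holds the same number of values.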
I am trying to implement a nearest neighbor classifier in Turi Create, but I am unsure about the error I am getting. It occurs when I create the actual model. I am using Python 3.6, if that helps.
Error:
Traceback (most recent call last):
File "/Users/PycharmProjects/turi/turi.py", line 51, in <module>
iris_cross()
File "/Users/PycharmProjects/turi/turi.py", line 37, in iris_cross
clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
TypeError: 'module' object is not callable
Code:
import turicreate as tc
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
import time
import numpy as np
#Iris Classification Cross Validation
def iris_cross():
    iris = datasets.load_iris()
    features = ['0','1','2','3']
    target = iris.target_names
    x = iris.data
    y = iris.target.astype(int)
    undata = np.column_stack((x,y))
    data = tc.SFrame(pd.DataFrame(undata))
    print(data)
    train_data, test_data = data.random_split(.8)
    clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
    print('done')
iris_cross()
As the error message says, tc.nearest_neighbor_classifier is a module, not a function, so you have to call its create() method. See the library API.
Run the following line of code instead:
clf = tc.nearest_neighbor_classifier.create(train_data, target='4', features=features)
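The same kind of error can be reproduced with a standard-library module, which may make the cause easier to see. This is only an analogy and not Turi Create code: `datetime` is a module that contains a class of the same name, much as nearest_neighbor_classifier is a module that contains create().

```python
import datetime

try:
    # `datetime` here is the module, so calling it fails
    datetime(2020, 1, 1)
except TypeError as err:
    message = str(err)
    print(message)

# Calling the class inside the module works
day = datetime.datetime(2020, 1, 1)
print(day.year)
```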
I've been attempting to fit this data with a linear regression, following a tutorial on bigdataexaminer. Everything was working fine up to this point. I imported LinearRegression from sklearn and printed the number of coefficients just fine. This was the code before I attempted to grab the coefficients from the console.
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
boston = load_boston()
bos = pd.DataFrame(boston.data)
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
X = bos.drop('PRICE', axis = 1)
lm = LinearRegression()
After I had all this set up I ran the following command, and it returned the proper output:
In [68]: print('Number of coefficients:', len(lm.coef_))
Number of coefficients: 13
However, now if I ever try to print this same line again, or use 'lm.coef_', it tells me coef_ isn't an attribute of LinearRegression, right after I JUST used it successfully, and I didn't touch any of the code before I tried it again.
In [70]: print('Number of coefficients:', len(lm.coef_))
Traceback (most recent call last):
File "<ipython-input-70-5ad192630df3>", line 1, in <module>
print('Number of coefficients:', len(lm.coef_))
AttributeError: 'LinearRegression' object has no attribute 'coef_'
The coef_ attribute is created when the fit() method is called. Before that, it will be undefined:
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import LinearRegression
>>> boston = load_boston()
>>> lm = LinearRegression()
>>> lm.coef_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-975676802622> in <module>()
7
8 lm = LinearRegression()
----> 9 lm.coef_
AttributeError: 'LinearRegression' object has no attribute 'coef_'
If we call fit(), the coefficients will be defined:
>>> lm.fit(boston.data, boston.target)
>>> lm.coef_
array([ -1.07170557e-01, 4.63952195e-02, 2.08602395e-02,
2.68856140e+00, -1.77957587e+01, 3.80475246e+00,
7.51061703e-04, -1.47575880e+00, 3.05655038e-01,
-1.23293463e-02, -9.53463555e-01, 9.39251272e-03,
-5.25466633e-01])
My guess is that somehow you forgot to call fit() when you ran the problematic line.
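The behaviour can be checked on any data. Here is a minimal sketch with made-up numbers rather than the Boston data (load_boston has been removed from recent scikit-learn releases, so it may not be importable in your environment):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: 20 samples, 3 features, exact linear target
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

lm = LinearRegression()
fitted_before = hasattr(lm, "coef_")   # no coef_ until fit() runs

lm.fit(X, y)
fitted_after = hasattr(lm, "coef_")

print(fitted_before, fitted_after)
print("Number of coefficients:", len(lm.coef_))
```

Re-creating the estimator with lm = LinearRegression() discards any earlier fit, which is one plausible way for coef_ to disappear between two console commands.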
I ran into the same problem with linear regression: the object has no attribute 'coef_'.
Only a small change in the code is needed: call fit() before reading the attributes.
linreg = LinearRegression()
linreg.fit(X,y) # fit the linear model to the data
print(linreg.intercept_)
print(linreg.coef_)
I hope this helps. Thanks.