Error with NaN using sklearn in Python

My Code:
### Working with NaN using sklearn
import numpy as np
from sklearn.preprocessing import Imputer
### Mean strategy
imp = Imputer(missing_values='NaN', strategy='mean', axis=1)
imp.fit([1,5,9,np.NaN])
X = [1,5,9,np.NaN]
y = imp.transform(X)
print (y)
After running it, I get the warning message below:
C:\Users\Admin\Anaconda3\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
How can I solve it? I tried reshape, but it gives an error message saying:
'list' object has no attribute 'reshape'
Please help.

I ran your code and changed X to a 2D list. The warning appeared because you were passing a 1D array to transform, so I made it a 2D list:
import numpy as np
from sklearn.preprocessing import Imputer
### Mean strategy
imp = Imputer(missing_values='NaN', strategy='mean', axis=1)
imp.fit([[1,5,9,np.NaN]]) # fit needs 2D input too, otherwise the same warning appears
X = [[1,5,9,np.NaN]] # < =========== The change that I made
y = imp.transform(X)
print(y)
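The "'list' object has no attribute 'reshape'" error from the question comes up because reshape is a method of NumPy arrays, not of Python lists. A minimal sketch of the conversion, using only NumPy (the variable names are made up for illustration):

```python
import numpy as np

X_list = [1, 5, 9, np.nan]

# A plain list has no reshape method, hence the AttributeError.
print(hasattr(X_list, "reshape"))  # False

# Convert to an array first, then pick the orientation you need.
X_arr = np.array(X_list)               # shape (4,)
as_one_sample = X_arr.reshape(1, -1)   # shape (1, 4): one row, four features
as_one_feature = X_arr.reshape(-1, 1)  # shape (4, 1): four rows, one feature

print(X_arr.shape, as_one_sample.shape, as_one_feature.shape)
```

Which orientation is right depends on whether the four values are meant as one sample or as one feature; the 2D list in the answer above corresponds to the (1, 4) case.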

Related

Why am I getting an IndexError on this one-hot encoding?

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('netflixprice.csv')
x = dataset.iloc[:,0].values
y = dataset.iloc[:, 1:6].values
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
IndexError Traceback (most recent call last)
Input In [8], in <cell line: 4>()
2 from sklearn.preprocessing import OneHotEncoder
3 ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
----> 4 x = np.array(ct.fit_transform(x))
data structure
I'm new to this. Also, is there anywhere I can learn more about data processing?
It's hard to tell anything without knowing the structure of your data. However, it seems like you may want to reshape your x:
x = dataset.iloc[:, 0].values.reshape(-1, 1)
I found a dataset that might be similar to yours and tried this on it; it worked.
As for learning how to process data: I personally refer to the documentation of the method I want to apply. In your case it's here. However, I found a clue to the problem in the error message:
def _get_column_indices(X, key):
    """Get feature column indices for input data X and key.
    For accepted values of `key`, see the docstring of
    :func:`_safe_indexing_column`.
    """
--> n_columns = X.shape[1] # this is where the problem is
    key_dtype = _determine_key_type(key)
    if isinstance(key, (list, tuple)) and not key:
        # we get an empty list
IndexError: tuple index out of range
That made me suspect that slicing x gave you an ndarray of shape (n,), which lacks the column dimension that ColumnTransformer requires.
It also seems that you intended x to be the target rather than the only feature. With the other columns (indices 1 through 5) assigned to y, you may want to swap x and y. You can still encode your target as you planned.
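To illustrate the shape issue described above, here is a small sketch. The category values are invented stand-ins for the first column of the CSV, and sparse_threshold=0 is added only so that the result is always a dense array:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the first column of the asker's CSV.
x_1d = np.array(["US", "UK", "US", "IN"])
print(x_1d.shape)  # (4,): there is no second axis, so shape[1] fails

# reshape(-1, 1) turns the values into a single column.
x_2d = x_1d.reshape(-1, 1)
print(x_2d.shape)  # (4, 1)

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
encoded = np.array(ct.fit_transform(x_2d))
print(encoded.shape)  # (4, 3): one column per distinct category
```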

2D output on Linear regression model

I'm getting the following error from my code:
ValueError: Expected 2D array, got scalar array instead:
array=99.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Here is the code used:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
Physical_activity_df = pd.read_excel('C:/Users/Usuario/Desktop/LW_docs/Physical_activity_nopass.xlsx')
prediction_df = Physical_activity_df[['Activity_Score','Calories']]
prediction_df.plot(kind='scatter', x= 'Activity_Score', y= 'Calories')
plt.show()
#change to df variables
activity_score = pd.DataFrame(prediction_df['Activity_Score'])
calories = pd.DataFrame(prediction_df['Calories'])
lm = linear_model.LinearRegression()
model = lm.fit(activity_score,calories)
#predict new values for calories (FROM HERE COMES THE ERROR)
activity_score_new = 99
calories_predict = model.predict(activity_score_new)
calories_predict
Any idea about how to fix this issue? Thanks!
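One possible fix, following the hint in the error message: predict expects a 2D input of shape (n_samples, n_features), so the scalar 99 has to be wrapped as one sample with one feature. The sketch below uses made-up training data instead of the Excel file, so the numbers are illustrative only:

```python
import numpy as np
from sklearn import linear_model

# Made-up training data following calories = 2 * activity_score + 1.
activity_score = np.array([[10], [20], [30], [40]])  # shape (4, 1)
calories = np.array([21, 41, 61, 81])                # shape (4,)

model = linear_model.LinearRegression().fit(activity_score, calories)

# Wrap the scalar as a 2D array: one sample, one feature.
activity_score_new = np.array([[99]])
calories_predict = model.predict(activity_score_new)
print(calories_predict)
```

If the model was fitted on a DataFrame, as in the question, passing a one-row DataFrame with the same column name (for example pd.DataFrame({'Activity_Score': [99]})) should work the same way.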

Sklearn.linear_model : ValueError: Found input variables with inconsistent numbers of samples: [1, 20]

I am trying to implement linear regression, but when I run the code I get this error: ValueError: Found input variables with inconsistent numbers of samples: [1, 20] at the line linear.fit(x_train1,y_train1) (x and x_train1 are Series, and y is a Series).
When I change to x=dataset.iloc[:,:-1], x and x_train1 become DataFrames (y is still a Series) and it works correctly.
So why does it only work when x is a DataFrame, even though y is still a Series?
import pandas as pd
import numpy as np
import matplotlib.pyplot
dataset=pd.read_csv('Salary_Data.csv')
x=dataset.iloc[:,0]
y=dataset.iloc[:,1]
from sklearn.model_selection import train_test_split
x_train1,x_test1,y_train1,y_test1=train_test_split(x,y,test_size=1/3,random_state=0)
#implementing simple linear regression
from sklearn.linear_model import LinearRegression
linear=LinearRegression()
linear.fit(x_train1,y_train1)
y_pred=linear.predict(x_test1)
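Scikit-learn does not accept a rank-1 (one-dimensional) array as the feature matrix X, although the target y may be 1D, which is why it works when only x is a DataFrame. If you check the shape attribute of your x:
x.shape
it will return something like (23,), 23 being the number of rows, where it should be (23, 1).
To make it work, reshape the underlying NumPy array (calling reshape directly on a pandas Series is deprecated):
x = dataset.iloc[:,0].values
x = x.reshape((len(x),1))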
...
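The difference between the Series and the DataFrame selection can be seen in a short sketch. The data below is invented in place of Salary_Data.csv, and the column names are assumptions:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data: Salary = 1000 * YearsExperience + 30000.
dataset = pd.DataFrame({
    "YearsExperience": [1, 2, 3, 4, 5, 6],
    "Salary": [31000, 32000, 33000, 34000, 35000, 36000],
})

x_series = dataset.iloc[:, 0]    # Series, shape (6,): rejected as X
x_frame = dataset.iloc[:, [0]]   # DataFrame, shape (6, 1): accepted
x_array = x_series.to_numpy().reshape(-1, 1)  # ndarray, shape (6, 1)
y = dataset.iloc[:, 1]           # a 1D target is fine

linear = LinearRegression().fit(x_array, y)
print(x_series.shape, x_frame.shape, x_array.shape)
print(linear.coef_, linear.intercept_)
```

Selecting with a list of column positions (iloc[:, [0]]) keeps the result two-dimensional, which avoids the reshape step altogether.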

Tried Linear Regression But Reshape Error

I am trying to put out a linear regression but am getting this error:
ValueError: cannot reshape array of size 2246 into shape (2,2246)
and
C:\Users\Brian\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
This is my code.
import pandas as pd
import matplotlib.pyplot as plt
% matplotlib inline
df = pd.read_csv(r'C:\Users\Brian\Desktop\GOOGTICKER.CSV')
df
times = pd.DatetimeIndex(df['Date'])
grouped= df.groupby([times.year]).mean()
from sklearn import linear_model
x_val= times
y_val= df['GOOGL']
body_reg =linear_model.LinearRegression()
body_reg.fit(x_val, y_val)
I have imported numpy as py and have tried reshaping, but I still get an error. Any advice would be greatly appreciated. Thank you for your time.
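A possible direction, sketched under assumptions: an array of 2246 values can only be reshaped into shapes whose sizes multiply to 2246, such as (2246, 1), and LinearRegression needs numeric features rather than timestamps. The example below uses a few invented rows in place of GOOGTICKER.CSV and converts each date to an ordinal day number, which is only one of several ways to make dates numeric:

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Made-up data: the price rises by 1 per day.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "GOOGL": [100.0, 101.0, 102.0, 103.0],
})
times = pd.DatetimeIndex(df["Date"])

# Turn each timestamp into an integer day count, then into one column.
x_val = np.array([t.toordinal() for t in times]).reshape(-1, 1)
y_val = df["GOOGL"]

body_reg = linear_model.LinearRegression()
body_reg.fit(x_val, y_val)
print(x_val.shape, body_reg.coef_)
```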

Deprecation warning when I use sklearn Imputer

My code is quite simple, but it always shows a warning like this:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will
raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if
your data has a single feature or X.reshape(1, -1) if it contains a single sample.
(DeprecationWarning)
I don't know why it does not work even if I add s.reshape(-1,1) inside the parentheses of fit_transform.
The code is as follows:
import numpy as np
import pandas as pd
from sklearn.preprocessing import Imputer
s = pd.Series([1,2,3,np.nan,5,np.nan,7,8])
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
x = pd.Series(imp.fit_transform(s).tolist()[0])
x
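A hedged sketch of how this could look with a recent scikit-learn, where SimpleImputer from sklearn.impute replaces the older Imputer class. It assumes the Series is meant as a single feature, so the values are reshaped into one column before imputing and flattened again afterwards:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

s = pd.Series([1, 2, 3, np.nan, 5, np.nan, 7, 8])

# reshape returns a new array, so its result is what must be passed on.
column = s.to_numpy().reshape(-1, 1)  # shape (8, 1): one feature

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imp.fit_transform(column)    # still shape (8, 1)

# Flatten back to 1D to rebuild a Series with the original index.
x = pd.Series(filled.ravel(), index=s.index)
print(x)
```

The two missing entries are replaced by the mean of the six observed values.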
