2D output on Lineal regression model

2D output on Lineal regression model - python

I'm getting the following error from my code:
ValueError: Expected 2D array, got scalar array instead:
array=99.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Here is the code used:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
Physical_activity_df = pd.read_excel('C:/Users/Usuario/Desktop/LW_docs/Physical_activity_nopass.xlsx')
prediction_df = Physical_activity_df[['Activity_Score','Calories']]
prediction_df.plot(kind='scatter', x= 'Activity_Score', y= 'Calories')
plt.show()
#change to df variables
activity_score = pd.DataFrame(prediction_df['Activity_Score'])
calories = pd.DataFrame(prediction_df['Calories'])
lm = linear_model.LinearRegression()
model = lm.fit(activity_score,calories)
#predict new values for calories (FROM HERE COMES THE ERROR)
activity_score_new = 99
calories_predict = model.predict(activity_score_new)
calories_predict
Any idea about how to fix this issue? Thanks!

Related

Python logit regression matrix shape error "ValueError: endog and exog matrices are different sizes"

Basic setup: I'm trying to run a logit regression in python on the probability of founding a business (founder variable) the exogenous variables are year, age, edu_cat (education category), and sex.
The X matrix is (4, 650), and the y matrix(1, 650). All of the variables within the x matrix have 650 non-NaN observations.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
x=np.array ([ df_all['Year'], df_all['Age'], df_all['Edu_cat'], df_all['sex']])
y= np.array([df_all['founder']])
logit_model = sm.Logit(y, x)
result = logit_model.fit()
print(result)
So I'm tracking that the shape is good, but python is telling me otherwise. Am I missing something basic?

I believe the issue is with the Y array, being [650,1], when it should be [650,], which it defaults to. Additionally I needed to make the x array [650,4] through a transpose.

I am trying to get kmeans to plot 5 clusters, but I'm only get 1 cluster plotted

I found some code on SO which seems to work quite well.
This code, directly below, produces the plot, also below.
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
iris = datasets.load_iris()
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(iris.data[:,0:1])
data = [plotly.graph_objs.Scatter(x=iris.data[:,0],
y=iris.data[:,1],
mode='markers',
marker=dict(color=kmeans.labels_)
)]
plotly.offline.iplot(data)
Now, I make a simple substitution in the code, to point to my own data, like this.
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
x = df[['Spend']]
y = df[['Revenue']]
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x,y)
data = [plotly.graph_objs.Scatter(x=df[['Spend']],
y=df[['Revenue']],
mode='markers',
marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)
That gives me this plot.
Here is my data frame.
# Import pandas library
import pandas as pd
# initialize list of lists
data = [[110,'CHASE CENTER',53901,8904,44997,4], [541,'METS STADIUM',57999,4921,53078,1], [538,'DEN BRONCOS',91015,9945,81070,1], [640,'LAMBEAU WI',76214,5773,70441,3], [619,'SAL AIRPORT',93000,8278,84722,5]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Location', 'Location_Description', 'Revenue','Spend','Profit_Or_Loss','cluster_number'])
# print dataframe.
df
I must be missing something silly, but I don't see what it is.

You have a problem with the dimension:
# In the iris dataset
>>> iris.data[:,0].shape
(150,)
# Your data
>>> x.shape
(5, 1)
# You need to flatter your array
x.values.flatten().shape
(5,)
For example:
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
x = df[['Spend']]
y = df[['Revenue']]
x_flat = x.values.flatten()
y_flat = y.values.flatten()
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x)
data = [plotly.graph_objs.Scatter(x=x_flat,
y=y_flat,
mode='markers',
marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)
On the other hand cluster.KMeans.fit accepts an array (and not two as you are passing). You're going to have to convert them to something of of shape (n_samples, n_features):
X = np.zeros((x_flat.shape[0], 2))
X[:, 0] = x_flat
X[:, 1] = y_flat
# X.shape -> (5, 2)
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(X)

Issue in trying to import CSV data using iloc

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('c:Documents/windpowerlib-dev/example/Wind_Power.csv')
dataset.plot()
X = np.array(range(1,731))
y = dataset.iloc[2:731, 1].values
I am trying to input Wind power data as Y for regression plotting with X as mere numbers from 1 until 730 but somehow I get this error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
The data I use is 730 in numbers at the first column. I do not understand what I did wrong.

Type error while using scikit-learns SimpleImputer

This code is for data preprocessing that I am learning in an online course of ML.
import numpy as np
import matplotlib.pyplot as plt #pyplot is a sublibrary of matplotlib
import pandas as pd
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1]
Y = dataset.iloc[:,-1]
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan,strategy = 'mean',verbose = 0)
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
But it is giving this Type error: unhashable type: 'slice' .
Please help me with this.

X is a dataframe and you can't access like X[:,1:3].you should use iloc.
Try this
imputer = imputer.fit(X.iloc[:,1:3])
X.iloc[:,1:3] = imputer.transform(X.iloc[:,1:3])

I would also advise to make use of sklearn.pipeline.Pipeline and sklearn.compose .ColumnTransformer make these preprocessing transformation if your final goal is to predict: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py

How to change 2D array into 1D array in Python

Getting an error message,
Expected 2D array, got 1D array instead:
array=[0.00127552 0.00286695 0.00135289 ... 0.00611554 0.02416051 0.00977264].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Tried tstArray.reshape(1,-1) but no luck.
import numpy as np
import matplotlib.pyplot as plt
import skimage.feature
from sklearn.decomposition import PCA
trnImages = np.load('trnImage.npy')
tstImages = np.load('tstImage.npy')
trnLabels = np.load('trnLabel.npy')
tstLabels = np.load('tstLabel.npy')
trnidx = 20
trnImages.shape
from sklearn.svm import SVC
import tensorflow.keras as keras
def computeFeatures(image):
# This function computes the HOG features with the parsed hyperparameters and returns the features as hog_feature.
# By setting visualize=True we obtain an image, hog_as_image, which can be plotted for insight into extracted HOG features.
hog_feature, hog_as_image = skimage.feature.hog(image, visualize=True, block_norm='L2-Hys')
return hog_feature
rnArray = np.zeros([10000,324])
tstArray = np.zeros([1000,324])
for i in range (0, 10000 ):
trnFeatures = computeFeatures(trnImages[:,:,:,i])
trnArray[i,:] = trnFeatures
ValueError: Expected 2D array, got 1D array instead:
array=[0.00127552 0.00286695 0.00135289 ... 0.00611554 0.02416051 0.00977264].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

2D output on Lineal regression model - python

Related

Python logit regression matrix shape error "ValueError: endog and exog matrices are different sizes"

I am trying to get kmeans to plot 5 clusters, but I'm only get 1 cluster plotted

Issue in trying to import CSV data using iloc

Type error while using scikit-learns SimpleImputer

How to change 2D array into 1D array in Python

Categories

Resources