import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('C:/Documents/windpowerlib-dev/example/Wind_Power.csv')
dataset.plot()
X = np.array(range(1,731))
y = dataset.iloc[2:731, 1].values
I am trying to use the wind power data as y for regression, with X simply being the numbers 1 through 730, but somehow I get this error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
The data consists of 730 values in the first column. I do not understand what I did wrong.
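A minimal sketch of a likely fix, assuming the wind power values are in the column selected by iloc above: scikit-learn expects the feature array X to be 2D with shape (n_samples, n_features), and the slice iloc[2:731, 1] also skips the first rows, so X and y end up with different lengths:
y = dataset.iloc[:, 1].values                 # all 730 wind power values
X = np.arange(1, len(y) + 1).reshape(-1, 1)   # shape (730, 1): one feature per sample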
I have two CSV files, one called training_data and another called target_data, and I've read both of them in. The training data contains around 30 columns and the target data has one; I'm trying to correlate the single column in the target data with all the columns of the training data.
import pandas as pd
import tarfile
import numpy as np
import csv
#reading in the data
training_data = pd.read_csv(training_data_path)
training_target = pd.read_csv(training_targets_path)
%matplotlib inline
import matplotlib.pyplot as plt
#plotting histogram
training_data.hist(bins=60,figsize=(30,25))
#after reviewing the histograms, it can be seen in the histogram of average household sizes that around 50 counties have an AvgHousehold size of almost 0
#PctSomeCol18_24, PctEmployed16_Over, and PctPrivateCoverageAlone all have missing data
display(training_data)
display(training_target)
TARGET_deathRate = training_target["TARGET_deathRate"]
corr_matrix = training_data.corr(training_target)
I've tried using the corr function, but it is not working.
It is better to compute the correlation within a single dataset, so first you have to join the two datasets and then use the correlation function. For joining you can use concat, append, or join; I prefer join:
df = training_data.join(training_target)      # joining the datasets
corr_matrix = df.corr()['TARGET_deathRate']   # correlation of every feature with the target
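As a follow-up, sorting the resulting Series makes the strongest relationships easy to spot (assuming the join above succeeded and the columns are numeric):
print(corr_matrix.sort_values(ascending=False))   # strongest positive correlations first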
I'm getting the following error from my code:
ValueError: Expected 2D array, got scalar array instead:
array=99.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Here is the code used:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
Physical_activity_df = pd.read_excel('C:/Users/Usuario/Desktop/LW_docs/Physical_activity_nopass.xlsx')
prediction_df = Physical_activity_df[['Activity_Score','Calories']]
prediction_df.plot(kind='scatter', x= 'Activity_Score', y= 'Calories')
plt.show()
#change to df variables
activity_score = pd.DataFrame(prediction_df['Activity_Score'])
calories = pd.DataFrame(prediction_df['Calories'])
lm = linear_model.LinearRegression()
model = lm.fit(activity_score,calories)
#predict new values for calories (FROM HERE COMES THE ERROR)
activity_score_new = 99
calories_predict = model.predict(activity_score_new)
calories_predict
Any idea about how to fix this issue? Thanks!
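A minimal sketch of a fix: model.predict expects a 2D array of shape (n_samples, n_features), so the single value has to be passed as one row with one column rather than as a bare scalar:
activity_score_new = [[99]]                    # 2D: one sample with one feature
calories_predict = model.predict(activity_score_new)
calories_predict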
I have a pandas DataFrame built from the sklearn.datasets Boston house price data, and I am trying to convert it to a numpy array while keeping the column names. Here is the code I tried:
from sklearn import datasets ## imports datasets from scikit-learn
import numpy as np
import pandas as pd
data = datasets.load_boston() ## loads Boston dataset from datasets library
df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
print(X.dtype.names)
However, this returns None, so the column names are not kept. Does anyone understand why?
Thanks
Try this:
w = data.feature_names.reshape(13, 1)  # the 13 feature names as a column vector
X = np.vstack((w.T, data.data))        # stack the names as the first row above the data; note this casts the whole array to strings
print(X)
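For context, X.dtype.names is only populated for structured arrays; df.to_numpy() returns a plain homogeneous ndarray, which carries no column names, so it prints None. A minimal alternative sketch that keeps the data numeric and the names alongside it:
X = df.to_numpy()                # plain float array, shape (506, 13)
column_names = list(df.columns)  # the feature names, kept separately
print(column_names)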
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
# Read the data.
data = np.asarray(pd.read_csv('data.csv', header=None))
# Assign the features to the variable X, and the labels to the variable y.
X = data[:,0:2]
y = data[:,2]
# TODO: Create the model and assign it to the variable model.
# Find the right parameters for this model to achieve 100% accuracy on the dataset.
model = SVC()
model.fit(X,y)
Two questions:
1. The data goes into a numpy array from a pandas DataFrame (via pd.read_csv). Is that better? Is there a good reason for that? Why not stay with the DataFrame?
2. I do not understand this notation:
X = data[:,0:2]
y = data[:,2]
What does it do?
Thank you.
The data consists of a CSV file with many rows like this:
0.28917,0.65643,0
It includes three columns: the first two contain the coordinates of the points, and the third the label.
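On the first question: scikit-learn estimators accept DataFrames directly, so the np.asarray conversion is not strictly required; it simply hands the model a plain numeric array. On the second: that is numpy slicing. data[:, 0:2] takes every row and columns 0 and 1 (the two coordinates), while data[:, 2] takes every row and column 2 (the label). A small illustrative sketch:
import numpy as np
data = np.array([[0.28917, 0.65643, 0],
                 [0.41000, 0.12000, 1]])
X = data[:, 0:2]  # all rows, columns 0-1 -> the coordinate pairs
y = data[:, 2]    # all rows, column 2   -> the labels [0., 1.]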
I am getting an error message:
Expected 2D array, got 1D array instead:
array=[0.00127552 0.00286695 0.00135289 ... 0.00611554 0.02416051 0.00977264].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I tried tstArray.reshape(1, -1), but no luck.
import numpy as np
import matplotlib.pyplot as plt
import skimage.feature
from sklearn.decomposition import PCA
trnImages = np.load('trnImage.npy')
tstImages = np.load('tstImage.npy')
trnLabels = np.load('trnLabel.npy')
tstLabels = np.load('tstLabel.npy')
trnidx = 20
trnImages.shape
from sklearn.svm import SVC
import tensorflow.keras as keras
def computeFeatures(image):
    # This function computes the HOG features with the parsed hyperparameters and returns the features as hog_feature.
    # By setting visualize=True we obtain an image, hog_as_image, which can be plotted for insight into extracted HOG features.
    hog_feature, hog_as_image = skimage.feature.hog(image, visualize=True, block_norm='L2-Hys')
    return hog_feature
trnArray = np.zeros([10000, 324])
tstArray = np.zeros([1000, 324])
for i in range(0, 10000):
    trnFeatures = computeFeatures(trnImages[:, :, :, i])
    trnArray[i, :] = trnFeatures
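A minimal sketch of the likely fix: this error appears when a single 1D HOG vector (shape (324,)) is handed to an estimator that expects 2D input, for example when predicting on one test image, so reshape the individual sample rather than the whole tstArray. Here clf is a placeholder for whatever SVC was fitted on trnArray:
sample = computeFeatures(tstImages[:, :, :, 0])  # one 1D HOG vector, shape (324,)
prediction = clf.predict(sample.reshape(1, -1))  # clf is the assumed fitted SVC; (1, 324) = one sample, 324 features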