Dimensionality Reduction ValueError - python

I'm new to this subject. I am trying to use PCA from sklearn to reduce the dimensionality of my data. Since I don't know another method, I am using PCA to estimate how many dimensions I should keep.
My data is an ndarray with shape (51, 2928). I try to fit it with the following code:
pca = PCA(n_components='mle', svd_solver='full')
pca.fit(data)
But I get the following error when trying to fit the data:
ValueError: n_components='mle' is only supported if n_samples >= n_features
What am I doing wrong?
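The error message states the constraint directly: with 51 samples and 2928 features, n_samples < n_features, so the 'mle' option is not available. A minimal sketch of one possible workaround, assuming that picking the number of components by retained variance is an acceptable substitute for the MLE estimate. The 0.95 threshold and the random stand-in data are assumptions for illustration, not values from the question:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data with the same shape as in the question (assumption: random values).
rng = np.random.default_rng(0)
data = rng.normal(size=(51, 2928))

# Reproduce the failure: 'mle' needs n_samples >= n_features.
try:
    PCA(n_components="mle", svd_solver="full").fit(data)
    mle_failed = False
except ValueError:
    mle_failed = True

# A float in (0, 1) asks PCA to keep enough components to explain
# that fraction of the variance (0.95 is an arbitrary choice here).
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(data)
print(pca.n_components_, reduced.shape)
```

With fewer samples than features, PCA can return at most n_samples - 1 meaningful components after centering, so the result here is bounded by 50.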

Related

fit and transform error on Cross validation and test data

I need help with the code below. I am trying to fit and transform the training data, and then transform the cross-validation and test data. When I do that, I get this error: ValueError: X has 24155 features, but Normalizer is expecting 49041 features as input.
Can someone please help me solve this issue?
My code snippet:
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
X_train_price_norm = normalizer.fit_transform(X_train['price'].values.reshape(1,-1))
X_cv_price_norm = normalizer.transform(X_cv['price'].values.reshape(1,-1))
X_test_price_norm = normalizer.transform(X_test['price'].values.reshape(1,-1))
print("After vectorizations")
print(X_train_price_norm.shape, y_train.shape)
print(X_cv_price_norm.shape, y_cv.shape)
print(X_test_price_norm.shape, y_test.shape)
print("="*100)
The transform function expects a 2D array of shape (samples, features).
The error indicates that the second dimension of the reshaped X_train['price'] differs from that of X_cv['price'] or X_test['price']: reshape(1,-1) turns each column into a single row, so the number of samples ends up as the number of features.
As the code shows, you have 1 feature (price) and many samples. So, following the (samples, features) convention above, your input shape should be (n_samples, 1). Consider changing the reshape to (-1,1) instead of (1,-1):
X_train_price_norm = normalizer.fit_transform(X_train['price'].values.reshape(-1,1))
X_cv_price_norm = normalizer.transform(X_cv['price'].values.reshape(-1,1))
X_test_price_norm = normalizer.transform(X_test['price'].values.reshape(-1,1))
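A small sketch of the shape fix on toy data. The price arrays are made-up values for illustration, not data from the question:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical price columns of different lengths (train vs. cross-validation).
train_price = np.array([10.0, 20.0, 30.0, 40.0])
cv_price = np.array([15.0, 25.0])

normalizer = Normalizer()

# reshape(-1, 1) gives (n_samples, 1): one feature, many samples.
train_norm = normalizer.fit_transform(train_price.reshape(-1, 1))
cv_norm = normalizer.transform(cv_price.reshape(-1, 1))

print(train_norm.shape, cv_norm.shape)
print(train_norm.ravel())
```

Note that Normalizer rescales each row to unit norm, so with a single column every nonzero value becomes 1 or -1. If column-wise scaling of price is the goal (an assumption about intent), a column-wise scaler such as MinMaxScaler or StandardScaler may be closer to what is wanted.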

How can I solve inverse_transform with shape problem?

Here is my code:
scaler = MinMaxScaler() #default set 0~1
dataset= scaler.fit_transform(dataset)
...
make model
...
predicted = model.predict(X_test) #shape : (5, 1)
When I run predict = scaler.inverse_transform(predicted), a ValueError occurs:
ValueError: non-broadcastable output operand with shape (5,1) doesn't match the broadcast shape (5,2)
My model has 2 features as input.
I tried scaler.inverse_transform(predict)[:, [0]] and reshaping in several directions, but the same ValueError occurs.
How can I solve this problem? Any advice would be very much appreciated.
You are using inverse_transform in the wrong way: you applied fit_transform to your features, but you are applying inverse_transform to your predictions, which have a different shape, hence the error.
This is not the intended usage of inverse_transform; have a look at the docs for more:
inverse_transform(self, X)
Undo the scaling of X according to feature_range.
Parameters: X : array-like, shape [n_samples, n_features]
Input data that will be transformed. It cannot be sparse.
It is not clear from your post why you attempt to "transform back" your predictions; this only makes sense if you have already transformed your labels (your post does not say whether you have) and you want, say, to report measures like MSE in the original scale of the labels. In such a case, you should use a separate scaler for your labels; see my own answer in How to interpret MSE in Keras Regressor for details (the example there is with StandardScaler, but the rationale is the same).
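A minimal sketch of the separate-scaler idea. The synthetic data and the LinearRegression model are stand-ins chosen for illustration (the question does not show the actual model), and the names x_scaler and y_scaler are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic data: 2 input features, 1 label column (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.random((50, 2)) * 100
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0).reshape(-1, 1)
X_train, X_test = X[:45], X[45:]
y_train, y_test = y[:45], y[45:]

# One scaler per array: features have 2 columns, labels have 1.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_train_scaled = x_scaler.fit_transform(X_train)
y_train_scaled = y_scaler.fit_transform(y_train)

model = LinearRegression().fit(X_train_scaled, y_train_scaled)

# Predictions have shape (5, 1), matching what y_scaler was fitted on.
predicted_scaled = model.predict(x_scaler.transform(X_test))
predicted = y_scaler.inverse_transform(predicted_scaled)
print(predicted.shape)
```

Because y_scaler was fitted on a single-column array, its inverse_transform accepts the (5, 1) predictions without a broadcasting error.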

Keras Input Shape Issue

I can find many questions and answers related to my question, but somehow they did not solve my problem. I have data with shape (10584, 56) and specified input_shape=(10584,56) in the code, but I am getting the following error:
ValueError: Error when checking input: expected dense_1_input to have 3 dimensions, but got array with shape (10584, 56).
I have some idea that I have to reshape my input data frame, but I am not sure how. Here is my code:
y = df['Target']
x_train, x_test, y_train, y_test = train_test_split(df, y, test_size=0.2)
model = keras.models.Sequential()
model.add(keras.layers.Dense(64,input_shape=(10584,56),activation='relu'))
Any help/suggestion will be much appreciated.
Keras always puts an additional batch dimension in front of input_shape, even if you want to use a batch size of 1. By specifying input_shape=(10584,56) you tell Keras that each single sample has shape (10584, 56), so it expects a 3-dimensional array of shape (batch_size, 10584, 56).
Another possibility: if in fact your samples are not 2D arrays but 1D vectors of size 56, and 10584 is the number of samples you have, then the number of samples is not part of the input shape. You only provide the size of a single sample, i.e. input_shape=(56,). Keras will take care of splitting your data into batches and setting the network up the right way.
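To illustrate why only the per-sample size matters, here is a NumPy stand-in for what a dense layer computes. This is not Keras code; it assumes 56 features per sample and 64 units as in the question, and dense_relu is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_units = 56, 64

# The layer's parameters depend only on the per-sample size (56),
# never on how many samples are passed through it.
weights = rng.normal(size=(n_features, n_units))
bias = np.zeros(n_units)

def dense_relu(batch):
    """Apply a dense layer with ReLU to a batch of shape (n_samples, 56)."""
    return np.maximum(batch @ weights + bias, 0.0)

# The leading (batch) dimension can be anything, including 1.
full = dense_relu(rng.normal(size=(10584, n_features)))
single = dense_relu(rng.normal(size=(1, n_features)))
print(full.shape, single.shape)
```

The same weights handle 10584 samples or a single one, which is why the number of samples is left out of the input shape.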

Linear Discriminant Analysis transform function

x = data.values
y = target.values
lda = LDA(solver='eigen', shrinkage='auto',n_components=2)
df_lda = lda.fit(x,y).transform(x)
df_lda.shape
This is a small part of the code. I am trying to reduce the dimensionality to the most discriminative directions. To my understanding, the transform() function projects the data to maximize class separation and should return an array of shape (n_samples, n_components).
But my df_lda has shape (614, 1).
What am I missing here? Or is my data not linearly separable?
For K distinct classes in target.values, there are at most K-1 components in the transformed data (without further dimensionality reduction). Since you only have two classes in your data set, there is only one transformed component, so you cannot get more components than that.
I suppose it might be helpful for sklearn to issue a warning when you request more components than are available.
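A short sketch of the K-1 limit on synthetic data. The arrays below are made up for illustration, and n_components is left at its default so that the behaviour does not depend on how a given sklearn version handles an oversized request:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.default_rng(0)
y_two = np.repeat([0, 1], 45)        # K = 2 classes
y_three = np.repeat([0, 1, 2], 30)   # K = 3 classes

# Shift each class so the groups actually differ (synthetic data).
X_two = rng.normal(size=(90, 5)) + y_two[:, None]
X_three = rng.normal(size=(90, 5)) + y_three[:, None]

proj_two = LDA().fit(X_two, y_two).transform(X_two)
proj_three = LDA().fit(X_three, y_three).transform(X_three)

# Two classes give 1 component; three classes give 2.
print(proj_two.shape, proj_three.shape)
```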

Python, ValueError, BroadCast Error with SKLearn Preproccesing

I am trying to run the sklearn preprocessing StandardScaler and I receive the following error:
from sklearn import preprocessing as pre
scaler = pre.StandardScaler().fit(t_train)
t_train_scale = scaler.transform(t_train)
t_test_scale = scaler.transform(t_test)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-149-c0133b7e399b> in <module>()
4 scaler = pre.StandardScaler().fit(t_train)
5 t_train_scale = scaler.transform(t_train)
----> 6 t_test_scale = scaler.transform(t_test)
C:\Users\****\Anaconda\lib\site-packages\sklearn\preprocessing\data.pyc in transform(self, X, y, copy)
356 else:
357 if self.with_mean:
--> 358 X -= self.mean_
359 if self.with_std:
360 X /= self.std_
ValueError: operands could not be broadcast together with shapes (40000,59) (119,) (40000,59)
I understand the shapes do not match. The train and test data set are different lengths so how would I transform the data?
Please print the output of t_train.shape[1] and t_test.shape[1].
StandardScaler expects any two datasets to have the same number of columns. I suspect earlier pre-processing (dropping columns, adding dummy columns, etc) is the source of your problem. Whatever transformations you make to the t_train also need to be made to t_test.
The error is telling you the information that I'm asking for:
ValueError: operands could not be broadcast together with shapes (40000,59) (119,) (40000,59)
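I expect you'll find that t_train.shape[1] is 119 and t_test.shape[1] is 59: the (119,) operand is the mean_ learned from t_train, and (40000,59) is the t_test array being transformed.
So you have 119 columns in your training dataset and 59 in your test dataset.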
Did you remove any columns from the training set prior to attempting to use StandardScaler?
What do you mean by "train and test data set are different lengths"?? How did you obtain your training data?
If your testing data have more features than your training data, then to reduce the dimensionality of your testing data correctly you need to know how your training data were produced, for example with a dimensionality reduction technique (PCA, SVD, etc.). If that is the case, you have to multiply each testing vector by the same matrix that was used to reduce the dimensionality of your training data.
The time series was in the format with time as the columns and data in the rows. I did the following before the original posted code:
t_train.transpose()
t_test.transpose()
Just a reminder: I had to run the cell twice before the change 'stuck' for some reason...
t_train shape is (x, 119), whereas t_test shape is (40000,59).
If you want to use the same scaler object for the transformation, your data must always have the same number of columns.
Since you fit the scaler on t_train, you get this error when you try to transform t_test.
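A minimal sketch of the column check on synthetic arrays. The shapes are made up for illustration, with 59 series stored as rows and time as columns, loosely following the description above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
t_train = rng.normal(size=(59, 200))  # hypothetical: 59 series, 200 time steps
t_test = rng.normal(size=(59, 80))    # hypothetical: same 59 series, 80 time steps

# transpose() returns a new array; the result must be assigned to take effect.
t_train = t_train.transpose()
t_test = t_test.transpose()

# Both arrays need the same number of columns before sharing one scaler.
if t_train.shape[1] != t_test.shape[1]:
    raise ValueError(
        f"column mismatch: train has {t_train.shape[1]}, test has {t_test.shape[1]}"
    )

scaler = StandardScaler().fit(t_train)
t_train_scale = scaler.transform(t_train)
t_test_scale = scaler.transform(t_test)
print(t_train_scale.shape, t_test_scale.shape)
```

Checking shape[1] up front turns the broadcasting error into a clearer message about which array has the wrong number of columns.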
