I'm running the following code from GitHub, but I'm getting an error. What's wrong?
https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Time%20Series%20ANN%20%26%20LSTM%20VIX.ipynb
Cell:
# scale train and test data to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)
Error:
ValueError: Expected 2D array, got 1D array instead:
array=[17.24 18.190001 19.219999 ... 10.47 10.18 11.04 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
The person who made that notebook was using a really old version of sklearn that still accepted 1D input here. In short, your features were of the form [row_1, row_2, ..., row_n], when they should have been of the form [[row_1], [row_2], ..., [row_n]].
Accordingly, use this:
import numpy as np

new_shape = (len(train), 1)
train_sc = scaler.fit_transform(np.reshape(train, new_shape))
test_sc = scaler.transform(np.reshape(test, new_shape))
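For reference, a minimal self-contained sketch of the same fix (the numbers are made up; train and test are assumed to be 1-D arrays holding a single feature):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([17.24, 18.19, 19.22, 15.30])
test = np.array([10.47, 10.18, 11.04])

scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train.reshape(-1, 1))  # shape (4, 1): one column per feature
test_sc = scaler.transform(test.reshape(-1, 1))        # shape (3, 1)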
I solved the problem by adding the methods below, which apparently convert the train and test objects to NumPy arrays before reshaping. Is that correct?
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train.values.reshape(-1, 1))
test_sc = scaler.transform(test.values.reshape(-1, 1))
Related
I am trying to normalise the pixel values of all the images contained in a folder at once, but this error shows up:
import os
import cv2

def resize():
    data = []
    img_size = 244
    data_dir = r'C:\technocolab project2\archive\img'
    for img in os.listdir(data_dir):
        try:
            imgPath = os.path.join(data_dir, img)
            images = cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE)
            image_resized = cv2.resize(images, (img_size, img_size))
            data.append(image_resized)
        except Exception as e:
            pass  # or print(e) to see why a file failed to load
    return data
data = resize()
print(len(data))
sample = data[1]
print(sample.shape)
It prints (244, 244).
training = data[:int(0.7*len(data))]
validation = data[int(0.7*len(data)):int(0.9*len(data))]
testing = data[int(0.9*len(data)):]
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_normalised = sc.fit_transform(training)
valid_normalised = sc.transform(validation)
test_normalised = sc.transform(testing)
ValueError: Found array with dim 3. StandardScaler expected <= 2.
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
train_normalised = pca.fit_transform(training)
valid_normalised = pca.transform(validation)
test_normalised = pca.transform(testing)
MemoryError: Unable to allocate 15.5 GiB for an array with shape (34997, 244, 244) and data type float64
norm = sc.fit_transform(image_resized)
NameError: name 'image_resized' is not defined
First, StandardScaler only works on 2D arrays, so you need to reshape your data (for example, flatten each image into a single row) before calling it.
Second, it is casting your data from np.uint8 to np.float64. Do the conversion yourself and make sure everything is np.float32, which is usually enough and takes half the memory.
Another point to consider is loading one dataset at a time into memory (train, validation and test), and loading the images straight into a NumPy array instead of building a list: just create an empty NumPy array and fill it with the images as you read them.
Last, but not least, a side note: the way you are splitting your dataset is fragile. os.listdir does not guarantee any order, so each run of this code may give you different splits. In addition, you should shuffle your data before splitting, otherwise you may introduce bias into the splits. Take a look at train_test_split from sklearn, as in the sketch below.
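A rough sketch putting these points together (assuming every file in the folder is a readable grayscale image; the path and sizes are taken from the question, and depending on how many images you have you may still need to process one split at a time):

import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

img_size = 244
data_dir = r'C:\technocolab project2\archive\img'
files = os.listdir(data_dir)

# preallocate a float32 array instead of growing a Python list
data = np.empty((len(files), img_size, img_size), dtype=np.float32)
for i, img in enumerate(files):
    gray = cv2.imread(os.path.join(data_dir, img), cv2.IMREAD_GRAYSCALE)
    data[i] = cv2.resize(gray, (img_size, img_size))

# StandardScaler needs 2D input: flatten each image into one row
flat = data.reshape(len(data), -1)

# shuffle and split 70/20/10 instead of slicing the listdir order
train, test = train_test_split(flat, test_size=0.1, random_state=42)
train, valid = train_test_split(train, test_size=2/9, random_state=42)

sc = StandardScaler()
train_normalised = sc.fit_transform(train)
valid_normalised = sc.transform(valid)
test_normalised = sc.transform(test)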
I have images in NumPy format; I downloaded the data from the internet (https://github.com/ichatnun/spatiospectral-densenet-rice-classification/blob/master/x.npy). An example shape is (1, 34, 23, 100), where 1 is the image index, 34x23 are the pixel dimensions, and 100 is the number of channels.
I want to load the data for training a machine learning model. I looked at other sources, but their data is only in the 34x23 format.
#my code till now
dataset1 = np.load('x.npy', encoding='bytes')
print("shape of dataset1")
print(dataset1.shape, dataset1.dtype)
#data shape
shape of dataset1
(3, 50, 170, 110) float64
#my code
data1 = dataset1[:, :, :, -1]
data1.shape
If I use SVM like,
from sklearn.svm import SVC
clf = SVC(gamma='auto')
clf.fit(dataset1, y)
I got the error
ValueError: Found array with dim 4. Estimator expected <= 2
I wanted to load the data as a DataFrame or in another format for training and splitting, but I am not able to remove the first dimension.
Sample data
print(dataset1)
[[[[0.17807601 0.15946769 0.20311266 ... 0.48133529 0.48742528
0.47095974]
[0.18518101 0.18394045 0.19093267 ... 0.45889252 0.44987031
0.46464419]
[0.19600767 0.18845156 0.18506823 ... 0.47558362 0.47738807
0.45821586]
...
What I am after is how to pass the data to the SVM for classification.
The issue is that the SVM accepts only a 2D array, while your data is in the format (number of samples, rows, columns, channels).
Try this, it works for me:
import numpy as np
from sklearn import svm

dataset1 = np.load('x.npy', encoding='bytes')
dataset2 = np.load('labels.npy', encoding='bytes')
# flatten each (nx, ny, nz) datacube into a single feature vector
nsamples, nx, ny, nz = dataset1.shape
X = dataset1.reshape((nsamples, nx*ny*nz))
y = np.argmax(dataset2, axis=1)
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)
# replace X with your test data
print(clf.predict(X))
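As a usage note, it is usually better to hold out part of the data for evaluation rather than predicting on the training set. A rough sketch with sklearn's train_test_split, using the same assumed file names (and assuming you have enough samples for a meaningful split):

import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

dataset1 = np.load('x.npy', encoding='bytes')
dataset2 = np.load('labels.npy', encoding='bytes')

nsamples, nx, ny, nz = dataset1.shape
X = dataset1.reshape((nsamples, nx * ny * nz))  # flatten each datacube to one row
y = np.argmax(dataset2, axis=1)                 # one-hot labels -> class indices

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))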
Pay attention to your data source: your x.npy doesn't contain images. The repository describes it as follows:
x.npy contains example datacubes of the processed rice dataset that
can be used for training/testing. Each datacube is a three-dimensional
50x170x110 tensor: two spatial dimensions and one spectral dimension.
I'm trying to train a classifier via PyTorch. However, I am experiencing problems when I feed the training data to the model.
I get this error on y_pred = model(X_trainTensor):
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'
Here are key parts of my code:
# Hyper-parameters
D_in = 47 # there are 47 parameters I investigate
H = 33
D_out = 2 # output should be either 1 or 0
# Format and load the data
y = np.array( df['target'] )
X = np.array( df.drop(columns = ['target'], axis = 1) )
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8) # split training/test data
X_trainTensor = torch.from_numpy(X_train) # convert to tensors
y_trainTensor = torch.from_numpy(y_train)
X_testTensor = torch.from_numpy(X_test)
y_testTensor = torch.from_numpy(y_test)
# Define the model
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
    torch.nn.LogSoftmax(dim=1)
)
# Define the loss function
loss_fn = torch.nn.NLLLoss()
for i in range(50):
    y_pred = model(X_trainTensor)
    loss = loss_fn(y_pred, y_trainTensor)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
The reference is from this GitHub issue.
When the error is RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1', you would need to use the .float() function since it says Expected object of scalar type Float.
Therefore, the solution is changing y_pred = model(X_trainTensor) to y_pred = model(X_trainTensor.float()).
Likewise, when you get another error for loss = loss_fn(y_pred, y_trainTensor), you need y_trainTensor.long() since the error message says Expected object of scalar type Long.
You could also do model.double(), as suggested by @Paddy.
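Putting those pieces together, a minimal sketch of the dtype fixes applied to the training loop from the question (learning_rate is assumed to be defined elsewhere):

X_trainTensor = torch.from_numpy(X_train).float()  # model weights are float32 by default
y_trainTensor = torch.from_numpy(y_train).long()   # NLLLoss expects integer class labels

for i in range(50):
    y_pred = model(X_trainTensor)
    loss = loss_fn(y_pred, y_trainTensor)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad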
Before converting to Tensor, try this
X_train = X_train.astype(np.float32)
The issue can be fixed by setting the datatype of the input to Float, i.e. torch.float32.
The issue most likely arises because your data's datatype is torch.float64 (Double).
You can avoid such situations either by converting the data when you set it up, as explained in one of the other answers, or by converting the model to the same dtype as your data, i.e. use float64 or float32 consistently.
For debugging, print obj.dtype and check for consistency.
Let's do that:
df['target'] = df['target'].astype(np.float32)
and do the same for the x features.
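For instance, a rough sketch for the features (column name as in the question):

X = np.array(df.drop(columns=['target'], axis=1)).astype(np.float32)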
This issue can also occur if the wrong loss function is selected. For example, if you have a regression problem but you are trying to use cross-entropy loss, it can be fixed by switching your loss function to MSE.
try to use:
target = target.float()  # target is the tensor named in the error message
I'm new to PyTorch. Calling torch.set_default_dtype() with the needed datatype was what worked for me on Google Colab; network.double()/network.float() and tensor.double()/tensor.float() didn't have any effect, for some reason.
Try this example:
from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch
a = np.array([0, 1, 2])
b = [[0, 1, 2], [4, 5, 6], [7, 8, 9]]
bb = np.zeros((3, 3))
for i in range(len(b)):
    bb[i, :] = np.array(b[i])
a = torch.from_numpy(a)
b = torch.from_numpy(bb)
a = a.float()
b = b.float()
cosine_scores = util.pytorch_cos_sim(b, a)
print(cosine_scores)
I want to reshape the MNIST dataset from shape (70000, 784) to (70000, 28, 28). I tried the following code, but it raises a TypeError:
TypeError: only integer scalar arrays can be converted to a scalar index
df = pd.read_csv('images.csv', sep=',', header=None)
x_data = np.array(df)
x_data = x_data.reshape(x_data[0], 28, 28)
This works, but is slow
data = np.array(df)
x_data = []
for d in data:
x_data.append(d.reshape(28,28))
x_data = np.array(x_data)
How can this be done with numpy.reshape() and without looping?
Many thanks!
I think the problem with the second approach is that the for loop makes it slow. So I would suggest you try this:
import tensorflow as tf
#load the data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0)
#considering only first 2 data points
img = mnist.train.images[:2]
x = tf.reshape(img, shape=[-1, 28, 28, 1])  # -1 lets TensorFlow infer the number of examples; each row becomes a 28x28x1 image
Here x ends up with shape (2, 28, 28, 1). Hope this helps!
For the MNIST dataset, you may use the following to convert your data into image form (4D, with a trailing channel axis):
data = pd.read_csv("images.csv")
data = data.values.reshape(-1, 28, 28, 1)
assuming you have the data as a pandas DataFrame and the label column has already been dropped.
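As an aside on the original TypeError: x_data[0] is the first row of the array (itself an array), not the number of rows, which is why reshape complains about a non-integer index. A minimal sketch of the direct fix, assuming x_data has shape (70000, 784):

x_data = np.array(df)
x_data = x_data.reshape(x_data.shape[0], 28, 28)  # or simply x_data.reshape(-1, 28, 28)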
datasets.fetch_openml returns a pair of values containing the features and the target of the MNIST data.
We then reshape one row of the features into a (28, 28) 2-D array.
Since these features are pixel intensities, we can plot the 2-D array to visualise the digit.
from sklearn import datasets
import matplotlib.pyplot as plt

pixel_values, targets = datasets.fetch_openml(
    'mnist_784',
    version=1,
    return_X_y=True
)
single_image = pixel_values[1:2].values.reshape(28, 28)
plt.imshow(single_image, cmap='gray')
Getting output classification with Lasagne/Theano
I am migrating my code from pure Theano to Lasagne.
I had some code from a tutorial that ran a prediction on test data and generated a CSV file to submit to Kaggle.
But with Lasagne it doesn't work.
I have tried several things, but they all give errors.
I would love it if anyone could help me figure out what's wrong!
I pasted the whole code here:
http://pastebin.com/e7ry3280
test_data = np.loadtxt("../inputData/test.csv", dtype=np.uint8, delimiter=',', skiprows=1)
# The inputs are vectors now, we reshape them to monochrome 2D images,
# following the shape convention: (examples, channels, rows, columns)
data = data.reshape(-1, 1, 28, 28)
test_data = test_data.reshape(-1, 1, 28, 28)
index = T.lscalar()  # index to a [mini]batch

preds = []
for it in range(len(test_data)):
    test_data = test_data[it]
    N = len(test_data)
    # print "N : ", N

    test_data = theano.shared(np.asarray(test_data, dtype=theano.config.floatX))
    test_labels = T.cast(theano.shared(np.asarray(np.zeros(batch_size), dtype=theano.config.floatX)), 'uint8')

    ###target_var
    #y = T.ivector('y')  # the labels are presented as a 1D vector of [int] labels
    #index = T.lscalar()  # index to a [mini]batch

    ppm = theano.function([index], lasagne.layers.get_output(network, deterministic=True),
                          givens={
                              input_var: test_data[index * batch_size: (index + 1) * batch_size],
                              target_var: test_labels
                          }, on_unused_input='warn')

    p = [ppm(ii) for ii in range(N // batch_size)]
    p = np.array(p).reshape((N, 10))
    print(p)
    p = np.argmax(p, axis=1)
    p = p.astype(int)
    preds.append(p)

subm = np.empty((len(preds), 2))
subm[:, 0] = np.arange(1, len(preds) + 1)
subm[:, 1] = preds
np.savetxt('submission.csv', subm, fmt='%d', delimiter=',', header='ImageId,Label', comments='')
return preds
The code fails on the line that starts with ppm = theano.function...:
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{int64:int64:}.0) into Type TensorType(float32, 4D). You can try to manually convert Subtensor{int64:int64:}.0 into a TensorType(float32, 4D).
I'm just trying to feed the test data to the CNN and write the results to a CSV file. How can I do it? I know I must use minibatches because the whole test set won't fit on the GPU.
As pointed out by the error message and Daniel Renshaw in the comments, the problem is a mismatch of dimensions between test_data and input_var. On the first line of the loop, you write:
test_data = test_data[it]
This turns the 4D array test_data into a 3D array with the same name (which is why reusing the same variable name for different kinds of data is never recommended :) ). After that you wrap it in a shared variable, which doesn't change the number of dimensions, and then you slice it to assign it to input_var, which again doesn't change the number of dimensions.
If I understand your code, I think you should just remove that first line. That way test_data remains a list of examples, and you can slice it to make a batch.
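A rough sketch of what the prediction part could then look like (batch_size, network, input_var and the imports are assumed to come from the pasted code, and N is assumed to be divisible by batch_size; this is a sketch of the idea, not a drop-in fix):

# test_data was already reshaped to (-1, 1, 28, 28) above, so keep it 4D
N = len(test_data)
test_data_shared = theano.shared(np.asarray(test_data, dtype=theano.config.floatX))

index = T.lscalar()
ppm = theano.function(
    [index],
    lasagne.layers.get_output(network, deterministic=True),
    givens={input_var: test_data_shared[index * batch_size: (index + 1) * batch_size]},
    on_unused_input='warn')

# predict one minibatch at a time, then stack the results
p = np.concatenate([ppm(ii) for ii in range(N // batch_size)], axis=0)
preds = np.argmax(p, axis=1)

subm = np.empty((len(preds), 2))
subm[:, 0] = np.arange(1, len(preds) + 1)
subm[:, 1] = preds
np.savetxt('submission.csv', subm, fmt='%d', delimiter=',', header='ImageId,Label', comments='')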