Unable to load data from numpy array for SVM Classification - python

I have images in numpy format, downloaded from the internet (https://github.com/ichatnun/spatiospectral-densenet-rice-classification/blob/master/x.npy). Example data shape: (1, 34, 23, 100), where 1 is the image index, 34x23 are the pixel dimensions, and 100 is the number of channels.
I want to load the data for training a machine learning model; the other sources I looked at have their data only in the 34x23 format.
# my code till now
import numpy as np

dataset1 = np.load('x.npy', encoding='bytes')
print("shape of dataset1")
print(dataset1.shape, dataset1.dtype)

# printed data shape
shape of dataset1
(3, 50, 170, 110) float64

# my code: take only the last channel
data1 = dataset1[:, :, :, -1]
data1.shape
If I use an SVM like this:
from sklearn.svm import SVC
clf = SVC(gamma='auto')
clf.fit(dataset1, y)
I got the error
ValueError: Found array with dim 4. Estimator expected <= 2
I want to load the data as a dataframe or another format for train/test splitting, but I am not able to remove the first dimension.
Sample data
print(dataset1)
[[[[0.17807601 0.15946769 0.20311266 ... 0.48133529 0.48742528
0.47095974]
[0.18518101 0.18394045 0.19093267 ... 0.45889252 0.44987031
0.46464419]
[0.19600767 0.18845156 0.18506823 ... 0.47558362 0.47738807
0.45821586]
...
The answer I expect is how to pass this data to the SVM for classification.

The issue is that the SVM accepts only a 2D array, while your data is in the format (number of samples, rows, columns, channels).
Try this; it works for me:
import numpy as np
from sklearn import svm

dataset1 = np.load('x.npy', encoding='bytes')
dataset2 = np.load('labels.npy', encoding='bytes')

# flatten each sample into a single feature vector
nsamples, nx, ny, nz = dataset1.shape
X = dataset1.reshape((nsamples, nx * ny * nz))

# convert one-hot labels to class indices
y = np.argmax(dataset2, axis=1)

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# replace X with your test data
print(clf.predict(X))
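If you also want a train/test split (the question mentions it), a minimal sketch on top of the flattened X and y above could look like this; the test_size and random_state values are arbitrary:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# hold out part of the flattened samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out samples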

Pay attention to your data source: your x.npy doesn't contain images. From the repository's description:
x.npy contains example datacubes of the processed rice dataset that
can be used for training/testing. Each datacube is a three-dimensional
50x170x110 tensor: two spatial dimensions and one spectral dimension.

Related

Found array with dim 3. StandardScaler expected <= 2 / Unable to allocate 15.5 GiB for an array with shape (34997, 244, 244) and data type float64

I am trying to normalise the pixel values of all the images contained in a folder at once, but the errors below show up.
def resize():
    data = []
    img_size = 244
    data_dir = r'C:\technocolab project2\archive\img'
    for img in os.listdir(data_dir):
        try:
            imgPath = os.path.join(data_dir, img)
            images = cv2.imread(imgPath, cv2.IMREAD_GRAYSCALE)
            image_resized = cv2.resize(images, (img_size, img_size))
            data.append(image_resized)
        # except Exception as e:
        #     print(e)
        except:
            pass
    return data
data = resize()
print(len(data))
sample = data[1]
print(sample.shape)
it prints (244, 244)
training = data[:int(0.7*len(data))]
validation = data[int(0.7*len(data)):int(0.9*len(data))]
testing = data[int(0.9*len(data)):]
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_normalised = sc.fit_transform(training)
valid_normalised = sc.transform(validation)
test_normalised = sc.transform(testing)
ValueError: Found array with dim 3. StandardScaler expected <= 2.
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
train_normalised = pca.fit_transform(training)
valid_normalised = pca.transform(validation)
test_normalised = pca.transform(testing)
MemoryError: Unable to allocate 15.5 GiB for an array with shape (34997, 244, 244) and data type float64
norm = sc.fit_transform(image_resized)
NameError: name 'image_resized' is not defined
First, StandardScaler only works on 2D arrays, so you need to reshape your array before calling it.
Second, it is casting your data from np.uint8 to np.float64. Do the cast yourself and make sure everything is in np.float32, which is usually enough.
Another point to consider is loading one dataset into memory at a time (train, validation and test), and loading the images straight into a numpy array instead of building a list: just create an empty numpy array and fill it with the images as you read them.
Last, but not least, a side note: the way you are splitting your dataset looks fragile. os.listdir does not guarantee any order, so each time you run this code you may get different splits. In addition, you should shuffle your data before splitting, otherwise you may be adding bias to your dataset. Take a look at train_test_split from sklearn.
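Putting those points together, a minimal sketch might look like this, assuming data is the list of 244x244 grayscale images returned by resize():
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# cast once to float32 (half the memory of float64) and flatten each image into a row
X = np.asarray(data, dtype=np.float32).reshape(len(data), -1)

# shuffled split instead of slicing the unordered os.listdir result
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42, shuffle=True)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit only on the training images
X_test = sc.transform(X_test)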

Problem in reshaping train and validation data for 1D CNN

I want to train a 1D CNN on PhysioNet 2017 ECG data. Each row in the training data is of variable length, i.e., some rows are 9000 columns long and some are 18286 columns long. To make them the same length I have padded each row with zeros up to the maximum length of 18286.
Now I have 20200 rows and each row is 18286 columns long, so the data shape is (20200, 18286). Now I want to reshape this data in order to train the 1D CNN. I have used the following code for splitting the data into training and validation sets:
Xt, Xv, Yt, Yv = train_test_split(trainX_bal, trainY_bal, random_state=42, test_size=0.2)
print("Train shape: ", Xt.shape)
print("Valdation shape: ", Xv.shape)
and I have the output:
Train shape: (16160, 18286)
Validation shape: (4040, 18286)
Now I have reshaped the training and validation data using the following code:
samples_train = list()
samples_val = list()
samples_test = list()
length = 8
for i in range(0, Xt.shape[0], length):
    sample = Xt[i:i+length]
    samples_train.append(sample)
for i in range(0, Xv.shape[0], length):
    sample_val = Xv[i:i+length]
    samples_val.append(sample_val)
data = np.array(samples_train).astype(np.float32)
data_val = np.array(samples_val).astype(np.float32)
print("Training new shape: ", data.shape)
print("Validation new shape: ", data_val.shape)
Xt_cnn = data.reshape((len(samples_train), length, data.shape[2]))
Xv_cnn = data_val.reshape((len(samples_val), length, data_val.shape[2]))
Yt = to_categorical(Yt, num_classes=4)
Yv = to_categorical(Yv, num_classes=4)
the output is:
Training new shape: (2020, 8, 18286)
Validation new shape: (505, 8, 18286)
Now I fit this data to the CNN model using the following code:
mod = cnn_model(Xt_cnn)
cnn_history = mod.fit(Xt_cnn, Yt, batch_size=64,
                      validation_data=(Xv_cnn, Yv), epochs=20)
I get this error:
Error
Your reshaping is wrong. You are altering the number of samples, so your data becomes incompatible with your labels. As I understand it, you are trying to reshape each (1, 18286) row into (8, 18286/8) values, which is impossible since 18286/8 = 2285.75. If you increase your padding and make the shape 18288, then it becomes possible, since 18288/8 = 2286 (an integer).
You can do this reshaping as in the following pseudo-code:
Arr = []
for samp in range(number_of_samples):
    new_array = Xt[samp, :].reshape(8, 2286)
    Arr.append(new_array)
Arr = np.array(Arr)
Arr's shape becomes (number_of_samples, 8, 2286)
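A vectorised version of the same idea, as a sketch assuming the rows are first re-padded from 18286 to 18288 columns (array names follow the question):
import numpy as np

# pad two extra zero columns so each row length (18288) is divisible by 8
Xt_padded = np.pad(Xt, ((0, 0), (0, 2)), mode='constant')

# split every row into 8 windows of 2286 samples; the sample count is unchanged
Xt_cnn = Xt_padded.reshape(Xt_padded.shape[0], 8, 2286)
print(Xt_cnn.shape)  # (16160, 8, 2286)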

X has 232 features, but StandardScaler is expecting 241 features as input

I want to make a prediction using KNN and I have the following lines of code:
def knn(trainImages, trainLabels, testImages, testLabels):
    max = 0
    for i in range(len(trainImages)):
        if len(trainImages[i]) > max:
            max = len(trainImages[i])
    for i in range(len(trainImages)):
        aux = np.array(trainImages[i])
        aux.resize(max)
        trainImages[i] = aux
    max = 0
    for i in range(len(testImages)):
        if len(testImages[i]) > max:
            max = len(testImages[i])
    for i in range(len(testImages)):
        aux = np.array(testImages[i])
        aux.resize(max)
        testImages[i] = aux
    scaler = StandardScaler()
    scaler.fit(list(trainImages))
    trainImages = scaler.transform(list(trainImages))
    testImages = scaler.transform(list(testImages))
    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(trainImages, trainLabels)
    pred = classifier.predict(testImages)
    print(classification_report(testLabels, pred))
I get the error at testImages = scaler.transform(list(testImages)). I understand that it's a mismatch in the number of features between the arrays. How can I solve it?
The scaler in scikit-learn expects input of shape (n_samples, n_features).
If the second dimension of your train and test sets is not equal, it is not just incorrect for sklearn and the cause of the error, it also makes no sense in theory. The n_features dimension of the test and train sets must be equal, but the first dimension can differ, since it is the number of samples and you can have any number of samples in the train and test sets.
When you execute scaler.transform(test), it expects test to have the same number of features as the data on which you executed scaler.fit(train). So all your images should be the same size.
For example, if you have 100 images, train_images' shape should be something like (90, 224, 224, 3) and test_images' shape should be like (10, 224, 224, 3) (only the first dimension differs).
So try resizing your images like this:
import cv2
resized_image = cv2.resize(image, (224, 224))  # target size only; don't include the channel dimension
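A minimal sketch of how that fits together, assuming trainImages and testImages hold 2D grayscale images and using 224x224 as an arbitrary common size:
import cv2
import numpy as np
from sklearn.preprocessing import StandardScaler

target_size = (224, 224)  # one fixed size for every image, train and test

# resize every image to the same shape, then flatten it into one feature vector
train_X = np.array([cv2.resize(img, target_size).ravel() for img in trainImages])
test_X = np.array([cv2.resize(img, target_size).ravel() for img in testImages])

scaler = StandardScaler()
train_X = scaler.fit_transform(train_X)  # fit on train only
test_X = scaler.transform(test_X)        # now both sides have the same n_features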

Import and reshape MNIST data, numpy

I want to reshape the MNIST dataset from shape (70000, 784) to (70000, 28, 28). The following code was tried, but it raises a TypeError:
TypeError: only integer scalar arrays can be converted to a scalar index
df = pd.read_csv('images.csv', sep=',', header=None)
x_data = np.array(df)
x_data = x_data.reshape(x_data[0], 28, 28)
This works, but it is slow:
data = np.array(df)
x_data = []
for d in data:
    x_data.append(d.reshape(28, 28))
x_data = np.array(x_data)
How should this be done with numpy.reshape() and without looping?
Many thanks!
I think the problem with the second one is that using a for loop takes more time. So I would suggest you try this:
import tensorflow as tf

# load the data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0)

# considering only the first 2 data points
img = mnist.train.images[:2]
x = tf.reshape(img, shape=[-1, 28, 28, 1])  # -1 infers the number of samples from the remaining 28*28*1 values
I got the shape of x as (2, 28, 28, 1). Hope this helps!
For the MNIST dataset, you may use the following to convert your dataset into 3D (plus a channel axis):
train = pd.read_csv("images.csv")
data = train.values.reshape(-1, 28, 28, 1)
assuming you have the data as a pandas dataframe and the label column has already been dropped.
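For the plain numpy case in the question (no channel axis), the same reshape applies directly; as a side note, the TypeError in the original snippet comes from passing the array x_data[0] instead of the integer x_data.shape[0] (or -1) as the first argument. A minimal sketch:
import numpy as np
import pandas as pd

df = pd.read_csv('images.csv', sep=',', header=None)
x_data = np.array(df)

# -1 (or x_data.shape[0]) lets numpy infer the number of images
x_data = x_data.reshape(-1, 28, 28)
print(x_data.shape)  # (70000, 28, 28)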
datasets.fetch_openml returns a pair of values containing the features and the target of the MNIST data.
Then we reshape a certain row of the features into a (28, 28) 2-D array.
And since these features are pixel intensities, we can plot this 2-D array to visualise them.
import matplotlib.pyplot as plt
from sklearn import datasets

pixel_values, targets = datasets.fetch_openml(
    'mnist_784',
    version=1,
    return_X_y=True
)
single_image = pixel_values[1:2].values.reshape(28, 28)
plt.imshow(single_image, cmap='gray')

How to use tf.data.Dataset.apply() for reshaping the dataset

I am working with time series models in TensorFlow. My dataset contains physics signals. I need to divide these signals into windows and give the sliced windows as input to my model.
Here is how I am reading the data and slicing it:
import tensorflow as tf
import numpy as np

def _ds_slicer(data):
    win_len = 768
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix": np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576])
})
dataset = dataset.map(_ds_slicer)
print(dataset.output_shapes)
# {'mix': TensorShape([Dimension(768), Dimension(32)]), 'pure': TensorShape([Dimension(768), Dimension(32)])}
I want to reshape this dataset to {'mix': TensorShape([Dimension(32)]), 'pure': TensorShape([Dimension(32)])}.
The equivalent transformation in numpy would be something like the following:
signal = np.random.uniform(0, 1, [1000, 24576])
sliced_sig = np.stack(np.split(signal, 768, axis=1), axis=1)
print(sliced_sig.shape)  # (1000, 768, 32)
sliced_sig = sliced_sig.reshape(-1, sliced_sig.shape[-1])
print(sliced_sig.shape)  # (768000, 32)
I thought of using tf.contrib.data.group_by_window as an input to dataset.apply() but couldn't figure out exactly how to use it. Is there a way I can use any custom transformation to reshape the dataset?
I think you're just looking for the transformation tf.contrib.data.unbatch. This does exactly what you want:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes) # (768, 32)
dataset = dataset.apply(tf.contrib.data.unbatch())
print(dataset.output_shapes) # (32,)
From the documentation:
If elements of the dataset are shaped [B, a0, a1, ...], where B may vary from element to element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].
Edit for TF 2.0
(Thanks @DavidParks)
From TF 2.0, you can use tf.data.Dataset.unbatch directly:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes) # (768, 32)
dataset = dataset.unbatch()
print(dataset.output_shapes) # (32,)
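Applied to the mix/pure pipeline from the question, a TF 2.x sketch (dimensions as in the question) might look like this:
import numpy as np
import tensorflow as tf

def _ds_slicer(data):
    win_len = 768
    # split each 24576-sample signal into 768 windows of 32 samples
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix": np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576]),
})

# map yields elements of shape {'mix': (768, 32), 'pure': (768, 32)};
# unbatch turns each of them into 768 consecutive {'mix': (32,), 'pure': (32,)} elements
dataset = dataset.map(_ds_slicer).unbatch()
print(dataset.element_spec)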
