Import and reshape MNIST data with NumPy (Python)

I want to reshape the MNIST dataset from shape (70000, 784) to (70000, 28, 28). I tried the following code, but it raises a TypeError:
TypeError: only integer scalar arrays can be converted to a scalar index
df = pd.read_csv('images.csv', sep=',', header=None)
x_data = np.array(df)
x_data = x_data.reshape(x_data[0], 28, 28)
The following works, but it is slow:
data = np.array(df)
x_data = []
for d in data:
    x_data.append(d.reshape(28, 28))
x_data = np.array(x_data)
How can I do this with numpy.reshape() and without looping?
Many thanks!

I think the problem with the second approach is that the for loop makes it slow. I would suggest trying this instead:
import tensorflow as tf
#load the data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0)
#considering only first 2 data points
img = mnist.train.images[:2]
x = tf.reshape(img, shape=[-1, 28, 28, 1]) # -1 lets TensorFlow infer the batch dimension from the total number of elements
With this I got the shape of x as (2, 28, 28, 1). Hope this helps!
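If you just want the plain NumPy solution the question asks for: x_data[0] in the original snippet is the first image (a row of 784 pixel values), not the number of images, which is why reshape raises the TypeError; x_data.shape[0] (or -1) is what it needs. A minimal sketch, assuming images.csv holds only pixel columns:
import numpy as np
import pandas as pd

df = pd.read_csv('images.csv', sep=',', header=None)
x_data = np.array(df)                              # shape (70000, 784)
x_data = x_data.reshape(x_data.shape[0], 28, 28)   # or x_data.reshape(-1, 28, 28)
print(x_data.shape)                                # (70000, 28, 28)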

For the MNIST dataset, you may use the following to convert your data into a 4D array with a trailing channel dimension,
train = pd.read_csv("images.csv")
data = train.values.reshape(-1, 28, 28, 1)
assuming you have the data as a pandas DataFrame and the label column has already been dropped.
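If the label column has not been dropped yet, a minimal sketch of that step (assuming the label sits in the first column of the CSV):
import pandas as pd

train = pd.read_csv("images.csv")
labels = train.iloc[:, 0].values                         # assumed label column
data = train.iloc[:, 1:].values.reshape(-1, 28, 28, 1)   # remaining pixel columns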

sklearn's datasets.fetch_openml returns a pair of values containing the features and targets of the MNIST data.
We then reshape a given row of the features into a (28, 28) 2-D array.
Since these features are pixel intensities, we can plot this 2-D array to visualise the digit.
from sklearn import datasets
import matplotlib.pyplot as plt

pixel_values, targets = datasets.fetch_openml(
    'mnist_784',
    version=1,
    return_X_y=True
)
single_image = pixel_values[1:2].values.reshape(28, 28)
plt.imshow(single_image, cmap='gray')
plt.show()

Related

Unable to load data from numpy array for SVM Classification

I have images in NumPy format. I downloaded the data from the internet (https://github.com/ichatnun/spatiospectral-densenet-rice-classification/blob/master/x.npy). An example data shape is (1, 34, 23, 100): here 1 is the image number, 34x23 are the pixel dimensions, and 100 is the number of channels.
I want to load the data to train a machine learning model. I looked at other sources, and their data is only in the 34x23 format.
#my code till now
dataset1 = np.load('x.npy', encoding='bytes')
print("shape of dataset1")
print(dataset1.shape, dataset1.dtype)
#data shape
shape of dataset1
(3, 50, 170, 110) float64
#my code
data1 = dataset1[:, :, :, -1]
data1.shape
If I use SVM like,
from sklearn.svm import SVC
clf = SVC(gamma='auto')
clf.fit(dataset1, y)
I got the error
ValueError: Found array with dim 4. Estimator expected <= 2
I wanted to load the data as a dataframe or in another format for training and splitting, but I am not able to remove the first dimension.
Sample data
print(dataset1)
[[[[0.17807601 0.15946769 0.20311266 ... 0.48133529 0.48742528
0.47095974]
[0.18518101 0.18394045 0.19093267 ... 0.45889252 0.44987031
0.46464419]
[0.19600767 0.18845156 0.18506823 ... 0.47558362 0.47738807
0.45821586]
...
What I want to know is how to pass the data to the SVM for classification.
The issue is that the SVM accepts only a 2D array, while your data is in the format (number of samples, rows, columns, channels).
Try this, it works for me:
import numpy as np
from sklearn import svm

dataset1 = np.load('x.npy', encoding='bytes')
dataset2 = np.load('labels.npy', encoding='bytes')

# flatten each (nx, ny, nz) datacube into a single feature vector
nsamples, nx, ny, nz = dataset1.shape
X = dataset1.reshape((nsamples, nx * ny * nz))
y = np.argmax(dataset2, axis=1)

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# replace X with your test data
print(clf.predict(X))
Pay attention to your data source: your x.npy doesn't contain images. From the repository description:
x.npy contains example datacubes of the processed rice dataset that
can be used for training/testing. Each datacube is a three-dimensional
50x170x110 tensor: two spatial dimensions and one spectral dimension.
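Since the question also mentions splitting for training and testing, here is a minimal sketch built on the flattened X and y from the snippet above (the split ratio is just an example):
from sklearn import svm
from sklearn.model_selection import train_test_split

# hold out 20% of the flattened datacubes for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))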

Expected 2D array, got 1D array

I'm running the following code from github, but I'm getting an error. What's wrong?
https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Time%20Series%20ANN%20%26%20LSTM%20VIX.ipynb
Cell:
# scale train and test data to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)
Error:
ValueError: Expected 2D array, got 1D array instead:
array=[17.24 18.190001 19.219999 ... 10.47 10.18 11.04 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
The person who made that notebook was using a really old version of sklearn. In short, your features were of the form [row_1, row_2...row_n], when they should have been of the form [[row_1], [row_2]...[row_n]].
Accordingly, use this:
new_shape = (len(train), 1)
train_sc = scaler.fit_transform(np.reshape(train, new_shape))
test_sc = scaler.transform(np.reshape(test, new_shape))
I solved the problem by adding the methods below, which apparently transform the train and test objects into NumPy arrays. Is that correct?
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train.values.reshape(-1, 1))
test_sc = scaler.transform(test.values.reshape(-1,1))

ValueError when concatenating NumPy arrays

I'm loading the MNIST dataset as follows:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
However, since I need to load and train on my own dataset, I wrote the little script below, which gives the equivalent train and test values:
def load_train(path):
    X_train = []
    y_train = []
    print('Read train images')
    for j in range(10):
        files = glob(path + "*.jpeg")
        for fl in files:
            img = get_im(fl)
            print(fl)
            X_train.append(img)
            y_train.append(j)
    return np.asarray(X_train), np.asarray(y_train)
The pretrained model generates a NumPy array of shape (64, 28, 28, 1) while training. I'm concatenating image_batch with the generated images as follows:
X = np.concatenate((image_batch, generated_images))
However, I'm getting the following error:
ValueError: all the input arrays must have same number of dimensions
img_batch is of size (64, 28, 28)
generated_images is of size (64, 28, 28, 1)
How do I expand the dimensions of img_batch in X_train so that I can concatenate it with generated_images? Or is there any other way to load custom images in place of mnist.load_data()?
There is a NumPy function called np.expand_dims() which can expand the dimensions of an array along the axis given in its arguments. In your case, use img_batch = np.expand_dims(img_batch, axis=3).
Another approach would be to use the reshape function, as suggested by @Ioannis Nasios: img_batch = img_batch.reshape(64, 28, 28, 1)
image_batch = image_batch.reshape(64, 28, 28, 1)
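For completeness, a minimal sketch of both options side by side, using random arrays as stand-ins for the real batches (the shapes match those in the question):
import numpy as np

image_batch = np.random.rand(64, 28, 28)            # stand-in for the MNIST batch
generated_images = np.random.rand(64, 28, 28, 1)    # stand-in for the generator output

# option 1: add a trailing channel axis explicitly
image_batch_4d = np.expand_dims(image_batch, axis=3)    # (64, 28, 28, 1)
# option 2: reshape to the same 4D shape
# image_batch_4d = image_batch.reshape(64, 28, 28, 1)

X = np.concatenate((image_batch_4d, generated_images))  # (128, 28, 28, 1)
print(X.shape)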

How to use tf.data.Dataset.apply() for reshaping the dataset

I am working with time series models in TensorFlow. My dataset contains physics signals. I need to divide these signals into windows and give the sliced windows as input to my model.
Here is how I am reading the data and slicing it:
import tensorflow as tf
import numpy as np

def _ds_slicer(data):
    win_len = 768
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix": np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576])
})
dataset = dataset.map(_ds_slicer)
print(dataset.output_shapes)
# {'mix': TensorShape([Dimension(768), Dimension(32)]), 'pure': TensorShape([Dimension(768), Dimension(32)])}
I want to reshape this dataset to # {'mix': TensorShape([Dimension(32)]), 'pure': TensorShape([Dimension(32)])}
The equivalent transformation in NumPy would be something like the following:
signal = np.random.uniform(0, 1, [1000, 24576])
sliced_sig = np.stack(np.split(signal, 768, axis=1), axis=1)
print(sliced_sig.shape)  # (1000, 768, 32)
sliced_sig = sliced_sig.reshape(-1, sliced_sig.shape[-1])
print(sliced_sig.shape)  # (768000, 32)
I thought of using tf.contrib.data.group_by_window as an input to dataset.apply(), but couldn't figure out exactly how to use it. Is there a way I can use a custom transformation to reshape the dataset?
I think you're just looking for the transformation tf.contrib.data.unbatch. This does exactly what you want:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes) # (768, 32)
dataset = dataset.apply(tf.contrib.data.unbatch())
print(dataset.output_shapes) # (32,)
From the documentation:
If elements of the dataset are shaped [B, a0, a1, ...], where B may vary from element to element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].
Edit for TF 2.0
(Thanks @DavidParks)
From TF 2.0, you can directly use tf.data.Dataset.unbatch:
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes) # (768, 32)
dataset = dataset.unbatch()
print(dataset.output_shapes) # (32,)
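Putting the pieces together on TF 2.x, here is a minimal sketch of the full map + unbatch pipeline (element_spec replaces output_shapes there; the slicer is the same as in the question):
import numpy as np
import tensorflow as tf

def _ds_slicer(data):
    win_len = 768  # number of windows per signal; each window ends up with 32 samples
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix": np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576])
})
dataset = dataset.map(_ds_slicer).unbatch()
print(dataset.element_spec)  # {'mix': TensorSpec(shape=(32,), ...), 'pure': TensorSpec(shape=(32,), ...)}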

Getting output classification with lasagne

I am migrating my code from pure Theano to Lasagne.
I had some code from a tutorial that produces predictions on given data and generates a CSV file to send to Kaggle.
But with Lasagne, it doesn't work.
I have tried several things, but they all give errors.
I would love it if anyone could help me figure out what's wrong!
I pasted the whole code here:
http://pastebin.com/e7ry3280
test_data = np.loadtxt("../inputData/test.csv", dtype=np.uint8, delimiter=',', skiprows=1)
# The inputs are vectors now, we reshape them to monochrome 2D images,
# following the shape convention: (examples, channels, rows, columns)
data = data.reshape(-1, 1, 28, 28)
test_data = test_data.reshape(-1, 1, 28, 28)

index = T.lscalar()  # index to a [mini]batch

preds = []
for it in range(len(test_data)):
    test_data = test_data[it]
    N = len(test_data)
    # print "N : ", N
    test_data = theano.shared(np.asarray(test_data, dtype=theano.config.floatX))
    test_labels = T.cast(theano.shared(np.asarray(np.zeros(batch_size), dtype=theano.config.floatX)), 'uint8')

    ###target_var
    #y = T.ivector('y')  # the labels are presented as 1D vector of [int] labels
    #index = T.lscalar()  # index to a [mini]batch

    ppm = theano.function([index], lasagne.layers.get_output(network, deterministic=True),
                          givens={
                              input_var: test_data[index * batch_size: (index + 1) * batch_size],
                              target_var: test_labels
                          }, on_unused_input='warn')

    p = [ppm(ii) for ii in range(N // batch_size)]
    p = np.array(p).reshape((N, 10))
    print(p)
    p = np.argmax(p, axis=1)
    p = p.astype(int)
    preds.append(p)

subm = np.empty((len(preds), 2))
subm[:, 0] = np.arange(1, len(preds) + 1)
subm[:, 1] = preds
np.savetxt('submission.csv', subm, fmt='%d', delimiter=',', header='ImageId,Label', comments='')
return preds
The code fails on the line that starts with ppm = theano.function...:
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{int64:int64:}.0) into Type TensorType(float32, 4D). You can try to manually convert Subtensor{int64:int64:}.0 into a TensorType(float32, 4D).
I'm just trying to feed the test data to the CNN and write the results to a CSV file. How can I do it? I know I must use minibatches because the whole test set won't fit on the GPU.
As pointed out by the error message and Daniel Renshaw in the comments, the problem is a mismatch of dimensions between test_data and input_var. On the first line of the loop, you write:
test_data = test_data[it]
which turns the 4D array test_data into a 3D array with the same name (that is why using the same variable name for different things is never recommended :) ). After that you encapsulate it in a shared variable, which doesn't change the dimensionality, and then you slice it to assign it to input_var, which again doesn't change the dimensionality.
If I understand your code correctly, I think you should just remove that first line. That way test_data remains a list of examples, and you can slice it to make a batch.
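A minimal sketch of that corrected structure (assuming network, input_var, and batch_size are defined as in the pasted code, and that the number of test examples is a multiple of batch_size): keep test_data 4D, push it into a shared variable once, compile the prediction function once, and loop over minibatch indices.
import numpy as np
import theano
import theano.tensor as T
import lasagne

test_data = test_data.reshape(-1, 1, 28, 28)        # keep all examples, 4D
N = len(test_data)

shared_test = theano.shared(np.asarray(test_data, dtype=theano.config.floatX))
index = T.lscalar()                                 # minibatch index

ppm = theano.function(
    [index],
    lasagne.layers.get_output(network, deterministic=True),
    givens={input_var: shared_test[index * batch_size: (index + 1) * batch_size]})

p = np.concatenate([ppm(i) for i in range(N // batch_size)])  # (N, 10)
preds = np.argmax(p, axis=1)

subm = np.column_stack((np.arange(1, N + 1), preds))
np.savetxt('submission.csv', subm, fmt='%d', delimiter=',',
           header='ImageId,Label', comments='')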
