I'm trying to run voice recognition code from GitHub (HERE) that analyzes voice. There is an example in final_results_gender_test.ipynb that illustrates the steps for both training and inference. So I copied and adjusted the inference part and came up with the following code, which uses the trained model for inference only. But I'm not sure why I get this error complaining that "This LabelEncoder instance is not fitted yet."
How can I fix this? I'm just doing inference, so why do I need the fit?
Traceback (most recent call last):
File "C:\Users\myname\Documents\Speech-Emotion-Analyzer-master\audio.py", line 53, in <module>
livepredictions = (lb.inverse_transform((liveabc)))
File "C:\Users\myname\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\label.py", line 272, in inverse_transform
check_is_fitted(self, 'classes_')
File "C:\Users\myname\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 914, in check_is_fitted
raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
Here is my copied/adjusted code from the notebook:
import os
from keras import regularizers
import keras
from keras.callbacks import ModelCheckpoint
from keras.layers import Conv1D, MaxPooling1D, AveragePooling1D, Dense, Embedding, Input, Flatten, Dropout, Activation, LSTM
from keras.models import Model, Sequential, model_from_json
from keras.preprocessing import sequence
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
import librosa
import librosa.display
from matplotlib.pyplot import specgram
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
opt = keras.optimizers.rmsprop(lr=0.00001, decay=1e-6)
lb = LabelEncoder()
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")
print("Loaded model from disk")
X, sample_rate = librosa.load('h04.wav', res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
sample_rate = np.array(sample_rate)
mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),axis=0)
featurelive = mfccs
livedf2 = featurelive
livedf2= pd.DataFrame(data=livedf2)
livedf2 = livedf2.stack().to_frame().T
twodim= np.expand_dims(livedf2, axis=2)
livepreds = loaded_model.predict(twodim, batch_size=32, verbose=1)
livepreds1=livepreds.argmax(axis=1)
liveabc = livepreds1.astype(int).flatten()
livepredictions = (lb.inverse_transform((liveabc)))
print(livepredictions)
I was facing the same problem. It's probably too late for you, but I want to share a solution for anyone who still runs into this error.
I was using this code from GitHub.
In the README file you can see this note:
NOTE: If you are using the model directly and want to decode the output ranging from 0 to 9 then the following list will help you.
Since he already gives that list, just delete this part from your code:
livepredictions = (lb.inverse_transform((liveabc)))
print(livepredictions)
Since the class labels are already given in that list, we don't need to fit or transform a LabelEncoder.
So instead of those lines, add the following. I prefer to use a dictionary and then print from it.
Sentiments = { 0 : "Female_angry",
1 : "Female Calm",
2 : "Female Fearful",
3 : "Female Happy",
4 : "Female Sad",
5 : "Male Angry",
6 : "Male calm",
7 : "Male Fearful",
8 : "Male Happy",
9 : "Male sad"
}
Way 1: Use a list comprehension to get the value from the dictionary.
Result = [emotions for (number,emotions) in Sentiments.items() if liveabc == number]
print(Result)
Way 2: Or simply loop over the dictionary and print the matching value.
for number, emotions in Sentiments.items():
    if liveabc == number:
        print(emotions)
If you use Way 1, it will show ['Male Angry'].
If you use Way 2, it will print Male Angry.
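Note that liveabc is a one-element array (for example array([5])), which is why comparing it to number works here. A slightly simpler alternative, just as a sketch under the same single-prediction assumption, is to index the dictionary directly:
predicted_label = Sentiments[int(liveabc[0])]  # take the scalar class index and look it up
print(predicted_label)                         # e.g. Male Angry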
So the full code will be like this:
from keras.models import model_from_json
import librosa
import numpy as np
import pandas as pd
Sentiments = { 0 : "Female_angry",
1 : "Female Calm",
2 : "Female Fearful",
3 : "Female Happy",
4 : "Female Sad",
5 : "Male Angry",
6 : "Male calm",
7 : "Male Fearful",
8 : "Male Happy",
9 : "Male sad"
}
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("saved_models/Emotion_Voice_Detection_Model.h5")
print("Loaded model from disk")
X, sample_rate = librosa.load('output10.wav', res_type='kaiser_fast',duration=2.5,sr=22050*2,offset=0.5)
sample_rate = np.array(sample_rate)
mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),axis=0)
featurelive = mfccs
livedf2 = featurelive
livedf2= pd.DataFrame(data=livedf2)
livedf2 = livedf2.stack().to_frame().T
twodim= np.expand_dims(livedf2, axis=2)
livepreds = loaded_model.predict(twodim,
batch_size=32,
verbose=1)
livepreds1=livepreds.argmax(axis=1)
liveabc = livepreds1.astype(int).flatten()
Result = [emotions for (number,emotions) in Sentiments.items() if liveabc == number]
print(Result)
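If you would rather keep LabelEncoder, the root cause of the error is that fit() was only ever called in the training notebook; the encoder's classes_ are not stored in model.json or the .h5 weights, so a freshly constructed LabelEncoder at inference time knows nothing about the labels. A minimal sketch of that approach (assuming you can re-run the training step once; 'label_encoder.pkl' is just a filename I chose) is to persist the fitted encoder and reload it before calling inverse_transform:
import joblib
from sklearn.preprocessing import LabelEncoder

# during training, right after lb has been fitted on the training labels:
# joblib.dump(lb, 'label_encoder.pkl')

# during inference, restore the fitted encoder instead of creating a new one:
lb = joblib.load('label_encoder.pkl')            # lb.classes_ is available again
livepredictions = lb.inverse_transform(liveabc)  # no NotFittedError
print(livepredictions)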
Related
I got a ValueError when using TensorFlow to create a model. I have tried debugging and checked the shapes of my model. I don't understand what this error means, but I've narrowed the problem down to the Conv2D layer as the cause. I also tried changing hyperparameters (e.g., batch size, microbatches, etc.).
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
!pip install tensorflow-privacy
import numpy as np
import tensorflow as tf
from tensorflow_privacy import *
import tensorflow_privacy
from matplotlib import pyplot as plt
import pylab as pl
import numpy as np
import pandas as pd
from tensorflow.keras.models import Model
from tensorflow.keras import datasets, layers, models, losses
from tensorflow.keras import backend as bke
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1, l2, l1_l2 #meaning of norm
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
batch_size = 8
epochs = 4
microbatches = 8
inChannel = 1
kr = 0#1e-5
num_kernels=8
drop_perc=0.25
dim = 1
l2_norm_clip = 1.5
noise_multiplier = 1.3
learning_rate = 0.25
latent_dim = 0
def print_datashape():
    print('genotype data: ', genotype_data.shape)
    print('phenotype data: ', single_pheno.shape)
genotype_data = tf.random.uniform([4276, 28220],0,255,)
phenotype_data = tf.random.uniform([4276, 20],0,255,)
genotype_data = genotype_data.numpy()
phenotype_data = phenotype_data.numpy()
small_geno = genotype_data
single_pheno = phenotype_data[:, 1]
print_datashape()
df = small_geno
min_max_scaler = preprocessing.MinMaxScaler()
df = min_max_scaler.fit_transform(df)
scaled_pheno = min_max_scaler.fit_transform(single_pheno.reshape(-1,1)).reshape(-1)
feature_size= df.shape[1]
df = df.reshape(-1, feature_size, 1, 1)
print("df: ", df.shape)
print("scaled: ", scaled_pheno.shape)
# split train to train and valid
train_data,test_data,train_Y,test_Y = train_test_split(df, scaled_pheno, test_size=0.2, random_state=13)
train_X,valid_X,train_Y,valid_Y = train_test_split(train_data, train_Y, test_size=0.2, random_state=13)
def print_shapes():
    print('train_X: {}'.format(train_X.shape))
    print('train_Y: {}'.format(train_Y.shape))
    print('valid_X: {}'.format(valid_X.shape))
    print('valid_Y: {}'.format(valid_Y.shape))
input_shape= (feature_size, dim, inChannel)
predictor = tf.keras.Sequential()
predictor.add(layers.Conv2D(num_kernels, (5,1), padding='same', strides=(12, 1), activation='relu',input_shape= input_shape))
predictor.add(layers.AveragePooling2D(pool_size=(2,1)))
predictor.add(layers.Dropout(drop_perc))
predictor.add(layers.Flatten())
predictor.add(layers.Dense(int(feature_size / 4), activation='relu'))
predictor.add(layers.Dropout(drop_perc))
predictor.add(layers.Dense(int(feature_size / 10), activation='relu'))
predictor.add(layers.Dropout(drop_perc))
predictor.add(layers.Dense(1))
optimizer = DPKerasAdamOptimizer(learning_rate=learning_rate, l2_norm_clip=l2_norm_clip, noise_multiplier=noise_multiplier, num_microbatches=microbatches)
# compile
predictor.compile(loss='mse', optimizer=optimizer, metrics=['mse'])
#summary
predictor.summary()
print_shapes()
predictor.fit(train_X, train_Y,batch_size=batch_size,epochs=epochs,verbose=1, validation_data=(valid_X, valid_Y))
ValueError: Dimension size must be evenly divisible by 8 but is 1 for '{{node Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](mean_squared_error/weighted_loss/value, Reshape/shape)' with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [8,?].
I'm trying to build my GraphSAGE model using Keras but I get the following error:
/Users/name/anaconda3/envs/tf/bin/python /Users/name/PycharmProjects/keras_autoencoder/NodeEmbeddings.py
Using TensorFlow backend.
2020-03-26 22:35:08.640725: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-26 22:35:08.655308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9aa4872710 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-26 22:35:08.655323: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
link_classification: using 'ip' method to combine node embeddings into edge embeddings
/Users/name/PycharmProjects/keras_autoencoder/NodeEmbeddings.py:65: UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=[<tf.Tenso..., outputs=Tensor("re...)`
model = Model(input=x_inp, output=prediction)
Traceback (most recent call last):
File "/Users/name/PycharmProjects/keras_autoencoder/NodeEmbeddings.py", line 65, in <module>
model = Model(input=x_inp, output=prediction)
File "/Users/name/anaconda3/envs/tf/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/Users/name/anaconda3/envs/tf/lib/python3.6/site-packages/keras/engine/network.py", line 94, in __init__
self._init_graph_network(*args, **kwargs)
File "/Users/name/anaconda3/envs/tf/lib/python3.6/site-packages/keras/engine/network.py", line 241, in _init_graph_network
self.inputs, self.outputs)
File "/Users/name/anaconda3/envs/tf/lib/python3.6/site-packages/keras/engine/network.py", line 1434, in _map_graph_network
tensor_index=tensor_index)
File "/Users/name/anaconda3/envs/tf/lib/python3.6/site-packages/keras/engine/network.py", line 1415, in build_map
for i in range(len(node.inbound_layers)):
TypeError: object of type 'Activation' has no len()
Here is my code:
import networkx as nx
import stellargraph as sg
import pandas as pd
import numpy as np
from keras import layers, optimizers, losses, metrics, Model
from keras import optimizers
from stellargraph.mapper import GraphSAGENodeGenerator, GraphSAGELinkGenerator
from stellargraph.layer import GraphSAGE, link_classification
from stellargraph.data import UnsupervisedSampler
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
# Loading Data --------------
# Define Edges and Nodes (from_pandas_edgelist creates Nodes automatically from the parsed edgelist)
edgelist= pd.read_csv("./data/cora/cora.cites", sep='\t', header=None, names=['target', 'source'])
edgelist['label'] = 'cites'
Gnx = nx.from_pandas_edgelist(edgelist, edge_attr='label')
nx.set_node_attributes(Gnx, 'paper', 'label')
# Define Node features
feature_names = ["w_{}".format(ii) for ii in range(1433)]
column_names = feature_names + ['subject']
node_data = pd.read_csv("./data/cora/cora.content", sep='\t', header=None, names=column_names)
node_with_features = node_data[feature_names]
# Create StellarGraph object
G = sg.StellarGraph(Gnx, node_features=node_with_features)
# Specify model and training parameter
nodes = list(G.nodes())
number_of_walks = 1
length = 5
batch_size = 50
epochs = 4
num_samples = [10, 5]
unsupervised_samples = UnsupervisedSampler(G, nodes=nodes, length=length, number_of_walks=number_of_walks)
train_gen = GraphSAGELinkGenerator(G,batch_size, num_samples)#.flow(unsupervised_samples)
# Creating GraphSAGE model
layer_sizes =[50,50]
graphsage = GraphSAGE(layer_sizes=layer_sizes, generator=train_gen, bias=True, dropout=0.0, normalize='l2')
x_inp, x_out = graphsage.build()
prediction = link_classification(output_dim=1, output_act='hard_sigmoid', edge_embedding_method='ip')(x_out)
model = Model(input=x_inp, output=prediction)
model.compile(
optimizers=optimizers.Adam(lr=1e-3),
loss=losses.binary_crossentropy,
metrics=[metrics.binary_accuracy],
)
history = model.fit_generator(
train_gen,
epochs=epochs,
verbose=1,
use_multiprocessing=False,
workers=4,
shuffle=True,
)
# Node Embedding
x_inp_src = x_inp[0::2]
x_out_src = x_out[0]
embedding_model = Model(inputs=x_inp_src, outputs=x_out_src)
node_ids = node_data.index
node_gen = GraphSAGENodeGenerator(G, batch_size,num_samples).flow(node_ids)
node_embeddings = embedding_model.predict_generator(node_gen, workers=4, verbose=1)
I'm not sure what this error is telling me, since the Activation layers in the Keras API don't implement len(). I've read a couple of other topics on this error, but they didn't help either. Please help.
The exact problem/solution would depend on what version of stellargraph you're using, but if it's not a problem for you to use the latest version (0.11.0 at the time of writing), I've made some adjustments to make it work:
import networkx as nx
import stellargraph as sg
import pandas as pd
import numpy as np
# UPDATED: import from tensorflow.keras instead of keras
from tensorflow.keras import layers, optimizers, losses, metrics, Model
from stellargraph.mapper import GraphSAGENodeGenerator, GraphSAGELinkGenerator
from stellargraph.layer import GraphSAGE, link_classification
from stellargraph.data import UnsupervisedSampler
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
# Loading Data --------------
# Define Edges and Nodes (from_pandas_edgelist creates Nodes automatically from the parsed edgelist)
edgelist= pd.read_csv("./data/cora/cora.cites", sep='\t', header=None, names=['target', 'source'])
edgelist['label'] = 'cites'
Gnx = nx.from_pandas_edgelist(edgelist, edge_attr='label')
nx.set_node_attributes(Gnx, 'paper', 'label')
# Define Node features
feature_names = ["w_{}".format(ii) for ii in range(1433)]
column_names = feature_names + ['subject']
node_data = pd.read_csv("./data/cora/cora.content", sep='\t', header=None, names=column_names)
node_with_features = node_data[feature_names]
# Create StellarGraph object
G = sg.StellarGraph(Gnx, node_features=node_with_features)
# Specify model and training parameter
nodes = list(G.nodes())
number_of_walks = 1
length = 5
batch_size = 50
epochs = 4
num_samples = [10, 5]
unsupervised_samples = UnsupervisedSampler(G, nodes=nodes, length=length, number_of_walks=number_of_walks)
train_gen = GraphSAGELinkGenerator(G,batch_size, num_samples)
# Creating GraphSAGE model
layer_sizes =[50,50]
graphsage = GraphSAGE(layer_sizes=layer_sizes, generator=train_gen, bias=True, dropout=0.0, normalize='l2')
x_inp, x_out = graphsage.build()
prediction = link_classification(output_dim=1, output_act='hard_sigmoid', edge_embedding_method='ip')(x_out)
# UPDATED: `inputs` and `outputs` instead of `input` and `output`
model = Model(inputs=x_inp, outputs=prediction)
model.compile(
# UPDATED: parameter name `optimizer` instead of `optimizers`
optimizer=optimizers.Adam(lr=1e-3),
loss=losses.binary_crossentropy,
metrics=[metrics.binary_accuracy],
)
history = model.fit_generator(
# UPDATED: we need to call .flow before passing it to `fit_generator`
train_gen.flow(unsupervised_samples),
epochs=epochs,
verbose=1,
use_multiprocessing=False,
workers=4,
shuffle=True,
)
# Node Embedding
x_inp_src = x_inp[0::2]
x_out_src = x_out[0]
embedding_model = Model(inputs=x_inp_src, outputs=x_out_src)
node_ids = node_data.index
node_gen = GraphSAGENodeGenerator(G, batch_size,num_samples).flow(node_ids)
node_embeddings = embedding_model.predict_generator(node_gen, workers=4, verbose=1)
print(node_embeddings)
I wrote a comment for each line I updated (the UPDATED ones). Other than some minor typos, I suspect the main issue came from importing keras instead of tensorflow.keras. With the release of tensorflow >= 2.0, stellargraph uses the Keras API that is part of TensorFlow's core API, and it's recommended that users who use Keras with the TensorFlow backend switch to tensorflow.keras:
At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features (eager execution, distribution support and other).
Hope that helps!
As a side note, some of the other methods being used are deprecated in 0.11.0 - they should work as is for now, but will produce deprecation warnings and will be removed in the future:
Constructing a stellargraph from networkx:
G = sg.StellarGraph(Gnx, node_features=node_with_features)
# switch to
G = sg.StellarGraph.from_networkx(Gnx, node_features=node_with_features)
Getting input and output tensors from a stellargraph model:
x_inp, x_out = graphsage.build()
# switch to
x_inp, x_out = graphsage.in_out_tensors()
@enumaris, thank you for your answer. I'll try to explain my approach a bit:
I pushed the video frames through a ResNet model and got feature shapes of (k, 2048). I have split the data into train/validation/test folders. Then I wrote this script:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Activation, Dropout, Dense
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import cv2
import os
dataTrain = []
labelsTrain = []
# Prepare the training data. The .txt files contain the name of the .npy file
# and the label, which is 0, 1, or 2 based on which class the video belongs to
# (e.g. "nameVideo.npy 0").
with open('D:...\Data\/train_files.txt') as f:
    trainingList = f.readlines()
    for line in trainingList:
        npyFiles = line.split( )
        loadTrainingData = np.load(npyFiles[0])
        dataTrain.append(loadTrainingData)
        labelsTrain.append(npyFiles[1])
    dataNp = np.array(dataTrain, dtype=object)
    labelsNp = np.array(labelsTrain, dtype=object)
    f.close()
dataVal = []
labelsVal = []
# Prepare the Validation Data
with open('D:\...\Data\/val_files.txt') as f:
    valList = f.readlines()
    for line in valList:
        npyValFiles = line.split( )
        loadValData = np.load(npyValFiles[0])
        dataVal.append(loadValData)
        labelsVal.append(npyValFiles[1])
    f.close()
print(len(dataVal))
model = Sequential()
model.add(LSTM(32,
batch_input_shape=(None, None, 1),
return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(10, activation='softmax'))
model.compile(loss='mean_absolute_error',
optimizer='adam',
metrics=['accuracy'])
model.summary()
history = model.fit(dataTrain, labelsTrain,
epochs=10,
validation_data=(dataVal, labelsVal))
Which results in the following error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 3521 arrays.
I am following the pretrained_word_embeddings example and am saving the model using the following piece of code:
print('Saving model to disk ...')
model.save('/home/data/pretrained-model.h5')
I am then loading the pretrained model using
pretrained_model = load_model('/home/data/pretrained-model.h5')
Later, I use the following piece of code to predict on a different text altogether:
predict_texts = [] # list of text samples
for predict_name in sorted(os.listdir(PREDICT_TEXT_DATA_DIR)):
    predict_path = os.path.join(PREDICT_TEXT_DATA_DIR, predict_name)
    if os.path.isdir(predict_path):
        for predict_fname in sorted(os.listdir(predict_path)):
            if predict_fname.isdigit():
                predict_fpath = os.path.join(predict_path, predict_fname)
                if sys.version_info < (3,):
                    f = open(predict_fpath)
                else:
                    f = open(predict_fpath, encoding='latin-1')
                predict_text = f.read()
                i = predict_text.find('\n\n')  # skip header
                if 0 < i:
                    predict_text = predict_text[i:]
                predict_texts.append(predict_text)
                f.close()
print('Found %s texts.' % len(predict_texts))
tokenizer.fit_on_texts(predict_texts)
predict_sequences = tokenizer.texts_to_sequences(predict_texts)
predict_data = pad_sequences(predict_sequences, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of predict data tensor:', predict_data.shape)
x_predict = predict_data
y_predict = pretrained_model.predict(x_predict)
max_val = np.argmax(y_predict)
print('Category it belongs to : ',max_val)
The problem I am facing now is that each time I run the above piece of code, max_val is a different value.
How do I make the predictions consistent?
I think you should predict one text at a time rather than merging all the texts from all the files.
The following code, which I tested, works:
from __future__ import print_function
import os
import sys
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.layers import Dense, Input, GlobalMaxPooling1D
from keras.layers import Conv1D, MaxPooling1D, Embedding
from keras.models import Model
from keras.models import load_model
from keras.preprocessing.text import text_to_word_sequence
MAX_SEQUENCE_LENGTH = 1000
MAX_NB_WORDS = 20000
EMBEDDING_DIM = 100
model = load_model('embedding.h5')
PREDICT_TEXT_DATA_DIR = 'predict_data'
predict_path = os.path.join(PREDICT_TEXT_DATA_DIR, '1.txt')
f = open(predict_path, encoding='utf-8')
predict_text = f.read()
f.close()
texts=[predict_text]
# finally, vectorize the text samples into a 2D integer tensor
tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
x_predict = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
print('Shape of predict data tensor:', x_predict.shape)
y_predict = model.predict(x_predict)
max_val = np.argmax(y_predict)
print('Category it belongs to : ',max_val)
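One extra caveat from my side (an assumption, not part of the answer above): the word indices produced by a Tokenizer depend on the texts it was fitted on, so for predictions to line up with what the embedding layer saw during training, you would ideally reuse the Tokenizer fitted on the training texts instead of fitting a new one on the prediction text. A minimal sketch, where 'tokenizer.pkl' is a filename I chose and train_texts stands in for your training corpus:
import pickle

# at training time, after tokenizer.fit_on_texts(train_texts):
# with open('tokenizer.pkl', 'wb') as handle:
#     pickle.dump(tokenizer, handle)

# at prediction time, reuse the same word index instead of refitting:
with open('tokenizer.pkl', 'rb') as handle:
    tokenizer = pickle.load(handle)
sequences = tokenizer.texts_to_sequences(texts)
x_predict = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)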
Good morning. I'm trying to train an LSTM to classify spam vs. non-spam, and I came across the following error:
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim = 3, found ndim = 4
Can someone help me understand where the problem is?
My code:
import sys
import pandas as pd
import numpy as np
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.feature_extraction.text import CountVectorizer
if __name__ == "__main__":
    np.random.seed(7)
    with open('SMSSpamCollection') as file:
        dataset = [[x.split('\t')[0],x.split('\t')[1]] for x in [line.strip() for line in file]]
    data = np.array([dat[1] for dat in dataset])
    labels = np.array([dat[0] for dat in dataset])
    dataVectorizer = CountVectorizer(analyzer = "word",
                                     tokenizer = None,
                                     preprocessor = None,
                                     stop_words = None,
                                     max_features = 5000)
    labelVectorizer = CountVectorizer(analyzer = "word",
                                      tokenizer = None,
                                      preprocessor = None,
                                      stop_words = None,
                                      max_features = 5000)
    data = dataVectorizer.fit_transform(data).toarray()
    labels = labelVectorizer.fit_transform(labels).toarray()
    vocab = labelVectorizer.get_feature_names()
    print(vocab)
    print(data)
    print(labels)
    data = np.reshape(data, (data.shape[0], 1, data.shape[1]))
    input_dim = data.shape
    tam = len(data[0])
    print(data.shape)
    print(tam)
    model = Sequential()
    model.add(LSTM(tam, input_shape=input_dim))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(data, labels, epochs=100, batch_size=1, verbose=2)
I tried adding another dimension to the data array, but that didn't help either.
My SMSSpamCollection file:
ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
ham Ok lar... Joking wif u oni...
spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
ham U dun say so early hor... U c already then say...
ham Nah I don't think he goes to usf, he lives around here though
spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
ham Even my brother is not like to speak with me. They treat me like aids patent.
...
thanks
The problem lies in the fact that you are including the samples dimension in input_shape as well. Try:
input_dim = (data.shape[1], data.shape[2])
This should work.
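To make the reasoning concrete (a sketch based on the shapes in this question, where data was reshaped to (num_samples, 1, num_features)): input_shape describes a single sample without the batch axis, so the first layer only needs (timesteps, features):
# data.shape is (num_samples, 1, num_features) after the reshape above;
# pass only the per-sample shape (timesteps, features) to the first layer:
input_dim = (data.shape[1], data.shape[2])    # e.g. (1, 5000)
model.add(LSTM(tam, input_shape=input_dim))   # batches still arrive as (batch, 1, num_features)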