I am working on a text classification project, and I would like to use keras to rank the importance of each word (token). My intuition is that I should be able to sort weights from the Keras model to rank the words.
Possibly I am having a simple issue using argsort or tf.math.top_k.
The complete code is from Packt
I start by using sklearn to compute TF-IDF using the 10,000 most frequent words.
vectorizer = TfidfVectorizer(min_df=2, ngram_range=(1, 2), stop_words='english',
max_features=10000, strip_accents='unicode', norm='l2')
x_train_2 = vectorizer.fit_transform(x_train_preprocessed).todense()
x_test_2 = vectorizer.transform(x_test_preprocessed).todense()
I can view the list of words like this:
print(vectorizer.get_feature_names()[:10])
I then build and fit a model using Keras. Keras is using the tensorflow backend.
# Deep Learning modules
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adadelta, Adam, RMSprop
from keras.utils import np_utils
# Definiting hyper parameters
np.random.seed(1337)
nb_classes = 20
batch_size = 64
nb_epochs = 20
Y_train = np_utils.to_categorical(y_train, nb_classes)
model = Sequential()
model.add(Dense(1000, input_shape=(10000,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(500))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
print(model.summary())
# Model Training
model.fit(x_train_2, Y_train, batch_size=batch_size, epochs=nb_epochs, verbose=1)
I can then get weights like this:
weight = model.weights[0]
# Returns <tf.Variable 'dense_1/kernel:0' shape=(10000, 1000) dtype=float32_ref>
Since the number of rows (10,000) is equal to the number of features, I think I am on the right track. I need to get a list of indices I can use to get feature names: informative_features = vectorizer.get_feature_names()[sorted_indices].
I have tried to build a list using two different techniques:
tf.nn.top_k
sorted_indices = tf.nn.top_k(weight)
# Returns TopKV2(values=<tf.Tensor 'TopKV2_2:0' shape=(10000, 1) dtype=float32>, indices=<tf.Tensor 'TopKV2_2:1' shape=(10000, 1) dtype=int32>)
I have not determined how to get a list from this result.
argsort
sorted_indices = model.get_weights()[0].argsort(axis=0)
print(sorted_indices.shape)
# Returns (10000, 1000)
Function argsort returns a matrix, but what I need is a one-dimensional list.
How can I use weights to rank text features?
i think it is not possible
first layer outputs 1000 value
each value binded with each feature with some weight value
and same thing continues to end of network
if input directly binded classification layer and if it is trained then
tfidf = Input(shape=(10000,))
output = Dense(nb_classes, activation='softmax')(tfidf)
model = Model(tfidf,output)
model.summary()
# train model ...
last_layer = model.layers[-1]
weights = last_layer.get_weights()[0]
for i in range(nb_classes):
print('class : ',i,' -> Feature : ',np.argmax(weights[:,i]) )
Related
I've read Tensorflow's tutorial(https://www.tensorflow.org/tutorials/structured_data/preprocessing_layers) for binary classification for structured data but I can't manage to modify it to produce a model that will be able to do a multiclass classification.
first, you should add a Dense layer as the last layer that the numbers of a neuron are equal to the class numbers.
if each input can be members of more than one class you should use the sigmoid activation function for the last layer and binary_crossentropy loss function.
the reason is that in the softmax function sum of the all outputs is equal to 1. so it isn't good for multi-class. but when you use sigmoid, each class would be as a binary classification problem.
also if each input can be a member of only one class. you should use the softmax activation function and categorical_crossentrop loss function.
this is a simple project for the Reuters dataset with 46 class numbers and 10000 feature inputs. I used Keras but TensorFlow is the same.
from keras.datasets import reuters
from keras import models
from keras import layers
import numpy as np
import matplotlib.pyplot as plt
def binery_vectorize(sequences, dimension):
results = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
results[i, sequence] = 1.
return results
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_newswire = ' '.join([reverse_word_index.get(i - 3, '(?)') for i in train_data[0]])
x_train = binery_vectorize(train_data,10000)
x_test = binery_vectorize(test_data,10000)
y_train = binery_vectorize(train_labels,46)
y_test = binery_vectorize(test_labels,46)
if the inputs can be in only one class the model would be like below
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])
if the inputs can be in more than one class the model would be like below
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='sigmoid'))
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
In the end fit model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = y_train[:1000]
partial_y_train = y_train[1000:]
history = model.fit(partial_x_train,partial_y_train,epochs=20,batch_size=512,validation_data=(x_val, y_val))
Okay so I'm pretty new to deep learning and have a very basic doubt. I have an input data with an array containing 255 data (Araay shape (255,)) in epochs_data and their corresponding labels in new_labels (Array shape (255,)).
I split the data using the following code:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(epochs_data, new_labels, test_size = 0.2, random_state=30)
I'm using a sequential model:
from keras.models import Sequential
from keras import layers
from keras.layers import Dense, Activation, Flatten
model = Sequential()
I know how to code for the hidden layers and output layer:
model.add(Dense(500, activation='relu')) #Hidden Layer
model.add(Dense(2, activation='softmax')) #Output Layer
But I don't know how to code layer for input with the input_shape specified. The X_train is the input.It's an array of shape (180,). Also tell me how to code the model.fit() for the same. Any help is appreciated.
You have to copy this line before the hidden layer. You can add the activation function that you want. Finally, as you can see this line represent both the input layer and the 1° hidden layer (you have to choose the n° of neuron (I put 100) )
model.add(Dense(100, input_shape = (X_train.shape[1],))
EDIT:
Before fitting your model you have to configure your model with this line:
model.compile(loss = 'mse', optimizer = 'Adam', metrics = ['mse'])
So you have to choose a metric that in this case is Mean Squarred Error and an optimizer like Adam, Adamax, ect.
Then you can fit your model choosing the data (X,Y), n° epochs, val_split and the batch size.
history = model.fit(X_train, y_train, epochs = 200,
validation_split = 0.1, batch_size=250)
I want to change my model architecture a bit on the LSTM so it accepts the same exact flattened inputs the full connected approach does.
Working Dnn model from Keras examples
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical
# import the data
from keras.datasets import mnist
# read the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_pixels = x_train.shape[1] * x_train.shape[2] # find size of one-dimensional vector
x_train = x_train.reshape(x_train.shape[0], num_pixels).astype('float32') # flatten training images
x_test = x_test.reshape(x_test.shape[0], num_pixels).astype('float32') # flatten test images
# normalize inputs from 0-255 to 0-1
x_train = x_train / 255
x_test = x_test / 255
# one hot encode outputs
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]
print(num_classes)
# define classification model
def classification_model():
# create model
model = Sequential()
model.add(Dense(num_pixels, activation='relu', input_shape=(num_pixels,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
return model
# build the model
model = classification_model()
# fit the model
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, verbose=2)
# evaluate the model
scores = model.evaluate(x_test, y_test, verbose=0)
Same problem but trying LSTM (syntax error still)
def kaggle_LSTM_model():
model = Sequential()
model.add(LSTM(128, input_shape=(x_train.shape[1:]), activation='relu', return_sequences=True))
# What does return_sequences=True do?
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-5)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
metrics=['accuracy'])
return model
model_kaggle_LSTM = kaggle_LSTM_model()
# fit the model
model_kaggle_LSTM.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, verbose=2)
# evaluate the model
scores = model_kaggle_LSTM.evaluate(x_test, y_test, verbose=0)
Problem is here:
model.add(LSTM(128, input_shape=(x_train.shape[1:]), activation='relu', return_sequences=True))
ValueError: Input 0 is incompatible with layer lstm_17: expected
ndim=3, found ndim=2
If I go back and don't flatten x_train and y_train, it works. However, I'd like this to be "just another model choice" that feeds off the same pre-processed input. I thought passing shape[1:] would work as that it the real flattened input_shape. I'm sure it's something easy I'm missing about the dimensionality, but I couldn't get it after an hour of twiddling and debugging, although did figure out not flattening the 28x28 to 784 works, but I don't understand why it works. Thanks a lot!
For bonus points, an example of how to do either DNN or LSTM in either 1D (784,) or 2D (28, 28) would be the best.
RNN layers such as LSTM are meant for sequence processing (i.e. a series of vectors which their order of appearance matters). You can look at an image from top to bottom, and consider each row of pixels as a vector. Therefore, the image would be a sequence of vectors and can be fed to the RNN layer. Therefore, according to this description, you should expect that the RNN layer take an input of shape (sequence_length, number_of_features). That's why when you feed the images to the LSTM network in their original shape, i.e. (28,28), it works.
Now if you insist on feeding the LSTM model the flattened image, i.e. with shape (784,), you have at least two options: either you can consider this as a sequence of length one, i.e. (1, 748), which does not make much sense; or you can add a Reshape layer to your model to reshape back the input to its original shape suitable for the input shape of a LSTM layer, like this:
from keras.layers import Reshape
def kaggle_LSTM_model():
model = Sequential()
model.add(Reshape((28,28), input_shape=x_train.shape[1:]))
# the rest is the same...
I am kind of new to deep learning and I have been trying to create a simple sentiment analyzer using deep learning methods for natural language processing and using the Reuters dataset. Here is my code:
import numpy as np
from keras.datasets import reuters
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, Dropout, GRU
from keras.utils import np_utils
max_length=3000
vocab_size=100000
epochs=10
batch_size=32
validation_split=0.2
(x_train, y_train), (x_test, y_test) = reuters.load_data(path="reuters.npz",
num_words=vocab_size,
skip_top=5,
maxlen=None,
test_split=0.2,
seed=113,
start_char=1,
oov_char=2,
index_from=3)
tokenizer = Tokenizer(num_words=max_length)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
y_train = np_utils.to_categorical(y_train, 50)
y_test = np_utils.to_categorical(y_test, 50)
model = Sequential()
model.add(GRU(50, input_shape = (49,1), return_sequences = True))
model.add(Dropout(0.2))
model.add(Dense(256, input_shape=(max_length,), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.summary()
history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split)
score = model.evaluate(x_test, y_test)
print('Test Accuracy:', round(score[1]*100,2))
What I do not understand is why, every time I try to use a GRU or LSTM cell instead of a Dense one, I get this error:
ValueError: Error when checking input: expected gru_1_input to have 3
dimensions, but got array with shape (8982, 3000)
I have seen online that adding return_sequences = True could solve the issue, but as you can see, the issue remains in my case.
What should I do in this case?
The problem is that the shape of x_train is (8982, 3000) so it means that (considering the preprocessing stage) there are 8982 sentences encoded as one-hot vectors with vocab size of 3000. On the other hand a GRU (or LSTM) layer accepts a sequence as input and therefore its input shape should be (batch_size, num_timesteps or sequence_length, feature_size). Currently the features you have are the presence (1) or absence (0) of a particular word in a sentence. So to make it work with GRU you need to add a third dimension to x_train and x_test:
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
and then remove that return_sequences=True and change the input shape of GRU to input_shape=(3000,1). This way you are telling the GRU layer that you are processing sequences of length 3000 where each element consists of one single feature. (As a side note I think you should pass the vocab_size to num_words argument of Tokenizer. That indicates the number of words in vocabulary. Instead, pass max_length to maxlen argument of load_data which limits the length of a sentence.)
However, I think you may get better results if you use an Embedding layer as the first layer and before the GRU layer. That's because currently the way you encode sentences does not take into account the order of words in a sentence (it just cares about their existence). Therefore, feeding GRU or LSTM layers, which relies on the order of elements in a sequence, with this representation does not make sense.
I am trying to implement CNN for a classification task. I want to see the how the weights are being optimized at each epoch. To do so, I need the values of penultimate layer. Also, I will hard code the last layer and backpropagation myself. Please recommend APIs also which which will be helpful.
Edit: I have added a code from keras examples. Looking forward to edit it.
This link provide some hint. I have mentioned the layer after which I require the output.
from __future__ import print_function
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb
# set parameters:
max_features = 5000
maxlen = 400
batch_size = 100
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250
epochs = 100
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()
# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
embedding_dims,
input_length=maxlen))
model.add(Dropout(0.2))
# we add a Convolution1D, which will learn filters
# word group filters of size filter_length:
model.add(Conv1D(filters,
kernel_size,
padding='valid',
activation='relu',
strides=1))
# we use max pooling:
model.add(GlobalMaxPooling1D())
# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid')) #<======== I need output after this.
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test))
You can get the individual layers of your model like this:
num_layer = 7 # Dense(1) layer
layer = model.layers[num_layer]
I want to see the how the weights are being optimized at each epoch.
To get the weights of the layer use layer.get_weights() like this:
w, b = layer.get_weights() # weights and bias of Dense(1)
I need the values of penultimate layer.
To get the value of the evaluation of the last layer use model.predict():
prediction = model.predict(x_test)
To get the evaluation of any other layer do it with tensorflow like this:
input = tf.placeholder(tf.float32) # Create input placeholder
layer_output = layer(input) # create layer output operation
init_op = tf.global_variables_initializer() # initialize variables
with tf.Session() as sess:
sess.run(init_op)
# evaluate layer output
output = sess.run(layer_output, feed_dict = {input: x_test})
print(output)