How to do Topic Detection in Unsupervised Aspect Based Sentiment Analysis - python

I want to make an ABSA using Python where the sentiment of pre-defined aspects (e.g. delivery, quality, service) is analyzed from online reviews. I want to do it unsupervised because this will save me from manually labeling reviews and I can analyze a lot more review data (looking at around 100k reviews). Therefore, my datasets consists of only reviews and no ratings. I would like to have a model that can first detect the aspect category and then assign the sentiment polarity. E.g. when the review says "The shipment went smoothly, but the product is broken" I want the model to assign the word "shipment" to the aspect category "delivery" and "smoothly" relates to a positive sentiment.
I have searched for approaches to take and I would like to know if anyone has experience with this and could guide me into a direction that could help me. It will be highly appreciated!

Aspect Based Sentiment Analysis (ABSA), where the task is first to
extract aspects or features of an entity (i.e. Aspect Term Extraction
or ATE1 ) from a given text, and second to determine the sentiment
polarity (SP), if any, towards each aspect of that entity. The
importance of ABSA led to the creation of the ABSA task
B-LSTM & CRF classifier will be used for feature extraction and aspect
term detection for both supervised and unsupervised ATE.
https://www.researchgate.net/profile/Andreea_Hossmann/publication/319875533_Unsupervised_Aspect_Term_Extraction_with_B-LSTM_and_CRF_using_Automatically_Labelled_Datasets/links/5a3436a70f7e9b10d842b0eb/Unsupervised-Aspect-Term-Extraction-with-B-LSTM-and-CRF-using-Automatically-Labelled-Datasets.pdf
https://github.com/songyouwei/ABSA-PyTorch/blob/master/infer_example.py
# -*- coding: utf-8 -*-
# file: infer.py
# author: songyouwei <youwei0314#gmail.com>
# Copyright (C) 2019. All Rights Reserved.
import torch
import torch.nn.functional as F
import argparse
from data_utils import build_tokenizer, build_embedding_matrix
from models import IAN, MemNet, ATAE_LSTM, AOA
class Inferer:
"""A simple inference example"""
def __init__(self, opt):
self.opt = opt
self.tokenizer = build_tokenizer(
fnames=[opt.dataset_file['train'], opt.dataset_file['test']],
max_seq_len=opt.max_seq_len,
dat_fname='{0}_tokenizer.dat'.format(opt.dataset))
embedding_matrix = build_embedding_matrix(
word2idx=self.tokenizer.word2idx,
embed_dim=opt.embed_dim,
dat_fname='{0}_{1}_embedding_matrix.dat'.format(str(opt.embed_dim), opt.dataset))
self.model = opt.model_class(embedding_matrix, opt)
print('loading model {0} ...'.format(opt.model_name))
self.model.load_state_dict(torch.load(opt.state_dict_path))
self.model = self.model.to(opt.device)
# switch model to evaluation mode
self.model.eval()
torch.autograd.set_grad_enabled(False)
def evaluate(self, raw_texts):
context_seqs = [self.tokenizer.text_to_sequence(raw_text.lower().strip()) for raw_text in raw_texts]
aspect_seqs = [self.tokenizer.text_to_sequence('null')] * len(raw_texts)
context_indices = torch.tensor(context_seqs, dtype=torch.int64).to(self.opt.device)
aspect_indices = torch.tensor(aspect_seqs, dtype=torch.int64).to(self.opt.device)
t_inputs = [context_indices, aspect_indices]
t_outputs = self.model(t_inputs)
t_probs = F.softmax(t_outputs, dim=-1).cpu().numpy()
return t_probs
if __name__ == '__main__':
model_classes = {
'atae_lstm': ATAE_LSTM,
'ian': IAN,
'memnet': MemNet,
'aoa': AOA,
}
# set your trained models here
model_state_dict_paths = {
'atae_lstm': 'state_dict/atae_lstm_restaurant_acc0.7786',
'ian': 'state_dict/ian_restaurant_acc0.7911',
'memnet': 'state_dict/memnet_restaurant_acc0.7911',
'aoa': 'state_dict/aoa_restaurant_acc0.8063',
}
class Option(object): pass
opt = Option()
opt.model_name = 'ian'
opt.model_class = model_classes[opt.model_name]
opt.dataset = 'restaurant'
opt.dataset_file = {
'train': './datasets/semeval14/Restaurants_Train.xml.seg',
'test': './datasets/semeval14/Restaurants_Test_Gold.xml.seg'
}
opt.state_dict_path = model_state_dict_paths[opt.model_name]
opt.embed_dim = 300
opt.hidden_dim = 300
opt.max_seq_len = 80
opt.polarities_dim = 3
opt.hops = 3
opt.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
inf = Inferer(opt)
t_probs = inf.evaluate(['happy memory', 'the service is terrible', 'just normal food'])
print(t_probs.argmax(axis=-1) - 1)

Related

Pytorch model weights change when put on GPU

I noticed a very strange behaviour regarding the 3D Resnet by Facebookresearch. Using their sample code from the website, I receive different results, when putting the model on GPU. While on cpu the correct class (archery) is predicted, the model fails to predict it on GPU. Can anyone replicate this and confirm that this is indeed the case? Does anyone know, why this is happening and how to prevent it? Following, you will find some code to quickly test it out:
import torch
import json
import urllib
from pytorchvideo.data.encoded_video import EncodedVideo
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
CenterCropVideo,
NormalizeVideo,
)
from pytorchvideo.transforms import (
ApplyTransformToKey,
ShortSideScale,
UniformTemporalSubsample
)
def predict_archery(model, device):
json_url = "https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json"
json_filename = "kinetics_classnames.json"
try:
urllib.URLopener().retrieve(json_url, json_filename)
except:
urllib.request.urlretrieve(json_url, json_filename)
with open(json_filename, "r") as f:
kinetics_classnames = json.load(f)
# Create an id to label name mapping
kinetics_id_to_classname = {}
for k, v in kinetics_classnames.items():
kinetics_id_to_classname[v] = str(k).replace('"', "")
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 8
sampling_rate = 8
frames_per_second = 30
# Note that this transform is specific to the slow_R50 model.
transform = ApplyTransformToKey(
key="video",
transform=Compose(
[
UniformTemporalSubsample(num_frames),
Lambda(lambda x: x / 255.0),
NormalizeVideo(mean, std),
ShortSideScale(
size=side_size
),
CenterCropVideo(crop_size=(crop_size, crop_size))
]
),
)
# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate) / frames_per_second
url_link = "https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4"
video_path = 'archery.mp4'
try:
urllib.URLopener().retrieve(url_link, video_path)
except:
urllib.request.urlretrieve(url_link, video_path)
# Select the duration of the clip to load by specifying the start and end duration
# The start_sec should correspond to where the action occurs in the video
start_sec = 0
end_sec = start_sec + clip_duration
# Initialize an EncodedVideo helper class and load the video
video = EncodedVideo.from_path(video_path)
# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
# Apply a transform to normalize the video input
video_data = transform(video_data)
# Move the inputs to the desired device
inputs = video_data["video"]
inputs = inputs.to(device)
# Pass the input clip through the model
preds = model(inputs[None, ...])
# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices[0]
# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes]
print("Top 5 predicted labels: %s" % ", ".join(pred_class_names))
if __name__ == '__main__':
# Choose device
# device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device = torch.device("cpu")
# Choose the `slow_r50` model
model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True).to(device)
model = model.eval()
predict_archery(model, device)
Results on cpu:
Top 5 predicted labels: archery, throwing axe, playing paintball,
stretching arm, riding or walking with horse
Results on GPU:
Top 5 predicted labels: flying kite, air drumming, beatboxing,
smoking, reading book
Edit:
Apparently, this issue cannot be reproduced on google colab. I therefore assume that the issue is related to the specific hardware / cuda version. I am using a NVIDIA TITAN Xp and cuda version 11.4.

changing name of tensorflow log in detectron2 model

I want to change the name of my tensorflow logs (for ex. events.out.tfevents.1649248617.AlienwareArea51R5.51093) to a custom name. The logs are are saved in the cfg.OUTPUT_DIR location but I cannot find where to change the name.. Should I change it somewhere in the cfg settings or is it with setup_logger()?
thank you in advance!
my current function:
'''
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.engine.MyTrainer import MyTrainer
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("kernel_train_data",)
cfg.DATASETS.TEST = ("kernel_test_data",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-
Detection/faster_rcnn_R_101_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR
cfg.SOLVER.MAX_ITER = 6000 # 300 iterations seems good enough for this toy dataset; you
will need to train longer for a practical dataset
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64 # faster, and good enough for this toy
dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 # only has one class (ballon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
cfg.TEST.EVAL_PERIOD= 500
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = MyTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()'''

Using CIFAR-100 datased with VGG19 model in simple_fedavg example

I'm using the Simple fedavg example from the github of tensorflow federated, i was trying to change the dataset and the model, but i can't get any positive feedback, the accuracy is always at 1%.
This is the code, i just changed the model part and the dataset from the simple_fedavg example. Any idea? I tried with different optimizers, but still no luck.
# Copyright 2020, The TensorFlow Federated Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Simple FedAvg to train EMNIST.
This is intended to be a minimal stand-alone experiment script built on top of
core TFF.
"""
import collections
import functools
from absl import app
from absl import flags
import numpy as np
import tensorflow as tf
import tensorflow_federated as tff
import random
import simple_fedavg_tf
import simple_fedavg_tff
import os
np.set_printoptions(precision=None, suppress=None)
# Training hyperparameters
flags.DEFINE_integer('total_rounds', 50, 'Number of total training rounds.')
flags.DEFINE_integer('rounds_per_eval', 1, 'How often to evaluate')
flags.DEFINE_integer('train_clients_per_round', 4,
'How many clients to sample per round.')
flags.DEFINE_integer('client_epochs_per_round', 5,
'Number of epochs in the client to take per round.')
flags.DEFINE_integer('batch_size', 16, 'Batch size used on the client.')
flags.DEFINE_integer('test_batch_size', 128, 'Minibatch size of test data.')
# Optimizer configuration (this defines one or more flags per optimizer).
flags.DEFINE_float('server_learning_rate', 1, 'Server learning rate.')
flags.DEFINE_float('client_learning_rate', 0.0005, 'Client learning rate.')
FLAGS = flags.FLAGS
def create_vgg19_model():
model = tf.keras.applications.VGG19(include_top=True,
weights=None,
input_shape=(32, 32, 3),
classes=100)
return model
def get_cifar100_dataset():
cifar100_train, cifar100_test = tff.simulation.datasets.cifar100.load_data()
def element_fn(element):
return collections.OrderedDict(
x=tf.expand_dims(element['image'], -1), y=element['label'])
def preprocess_train_dataset(dataset):
# Use buffer_size same as the maximum client dataset size,
# 418 for Federated EMNIST
return dataset.map(element_fn).shuffle(buffer_size=418).repeat(
count=FLAGS.client_epochs_per_round) # .batch(
# FLAGS.batch_size, drop_remainder=False)
def preprocess_test_dataset(dataset):
return dataset.map(element_fn).batch(
FLAGS.test_batch_size, drop_remainder=False)
cifar100_train = cifar100_train.preprocess(preprocess_train_dataset)
cifar100_test = preprocess_test_dataset(
cifar100_test.create_tf_dataset_from_all_clients())
return cifar100_train, cifar100_test
def server_optimizer_fn():
return tf.keras.optimizers.SGD(learning_rate=FLAGS.server_learning_rate)
def client_optimizer_fn():
return tf.keras.optimizers.Adam(learning_rate=FLAGS.client_learning_rate)
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
client_devices = tf.config.list_logical_devices('GPU')
print(client_devices)
server_device = tf.config.list_logical_devices('CPU')[0]
tff.backends.native.set_local_execution_context(
server_tf_device=server_device, client_tf_devices=client_devices)
train_data, test_data = get_cifar100_dataset()
def tff_model_fn():
"""Constructs a fully initialized model for use in federated averaging."""
# keras_model = create_original_fedavg_cnn_model(only_digits=False)
keras_model = create_vgg19_model()
# keras_model.summary()
loss = tf.keras.losses.SparseCategoricalCrossentropy()
return simple_fedavg_tf.KerasModelWrapper(keras_model,
test_data.element_spec, loss)
iterative_process = simple_fedavg_tff.build_federated_averaging_process(
tff_model_fn, tff_model_fn, tff_model_fn, tff_model_fn, server_optimizer_fn, client_optimizer_fn)
server_state = iterative_process.initialize()
metric = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
model = tff_model_fn()
for round_num in range(FLAGS.total_rounds):
sampled_clients = np.random.choice( train_data.client_ids, size=FLAGS.train_clients_per_round, replace=False)
sampled_train_data = [ train_data.create_tf_dataset_for_client(client).batch( FLAGS.batch_size, drop_remainder=False)
for client in sampled_clients
]
server_state, train_metrics = iterative_process.next(
server_state, sampled_train_data)
print(f'Round {round_num} training loss: {train_metrics}')
if round_num % FLAGS.rounds_per_eval == 0:
model.from_weights(server_state.model_weights)
accuracy = simple_fedavg_tf.keras_evaluate(model.keras_model, test_data,
metric)
print(f'Round {round_num} validation accuracy: {accuracy * 100.0}')
if __name__ == '__main__':
app.run(main)

Training Spacy NER on custom dataset gives error

I am trying to train spacy NER model on custom dataset. Basically I want to use this model to extract Name, Organization, Email, phone number etc from resume.
Below is the code I am using.
import json
import random
import spacy
import sys
import logging
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support
from spacy.gold import GoldParse
from spacy.scorer import Scorer
from sklearn.metrics import accuracy_score
from spacy.gold import biluo_tags_from_offsets
def convert_dataturks_to_spacy(dataturks_JSON_FilePath):
try:
training_data = []
lines=[]
with open(dataturks_JSON_FilePath, encoding='utf-8') as f:
lines = f.readlines()
for line in lines:
data = json.loads(line)
text = data['content']
entities = []
for annotation in data['annotation']:
#only a single point in text annotation.
point = annotation['points'][0]
labels = annotation['label']
if not isinstance(labels, list):
labels = [labels]
for label in labels:
entities.append((point['start'], point['end'] + 1 ,label))
training_data.append((text, {"entities" : entities}))
return training_data
except Exception as e:
logging.exception("Unable to process " + dataturks_JSON_FilePath + "\n" + "error = " + str(e))
return None
def reformat_train_data(tokenizer, examples):
output = []
for i, (text, entity_offsets) in enumerate(examples):
doc = tokenizer(text.strip())
ner_tags = biluo_tags_from_offsets(tokenizer(text), entity_offsets['entities'])
words = [w.text for w in doc]
tags = ['-'] * len(doc)
heads = [0] * len(doc)
deps = [''] * len(doc)
sentence = (range(len(doc)), words, tags, heads, deps, ner_tags)
output.append((text, [(sentence, [])]))
print("output",output)
return output
################### Train Spacy NER.###########
def train_spacy():
TRAIN_DATA = convert_dataturks_to_spacy("C:\\Users\\akjain\\Downloads\\Entity-Recognition-In-Resumes-SpaCy-master\\traindata.json")
nlp = spacy.blank("en")
if 'ner' not in nlp.pipe_names:
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner, last=True)
# add labels
for _, annotations in TRAIN_DATA:
for ent in annotations.get('entities'):
ner.add_label(ent[2])
def get_data(): return reformat_train_data(nlp.tokenizer, TRAIN_DATA)
optimizer = nlp.begin_training(get_data)
for itn in range(10):
print("Starting iteration " + str(itn))
random.shuffle(TRAIN_DATA)
losses = {}
for text, annotations in TRAIN_DATA:
nlp.update(
[text], # batch of texts
[annotations], # batch of annotations
drop=0.2, # dropout - make it harder to memorise data
sgd=optimizer, # callable to update weights
losses=losses)
print(losses)
train_spacy()
I am getting the below error. Also, I came across a link (https://github.com/explosion/spaCy/issues/3558) with some suggestion to fix this code. But even after implementing that I am still getting error.
I am using Python 3.6.5 and Spacy 2.2.3
Dataset:
{"content": "Nida Khan\nTech Support Executive - Teleperformance for Microsoft\n\nJaipur, Rajasthan - Email me on Indeed: indeed.com/r/Nida-Khan/6c9160696f57efd8\n\n• To be an integral part of the organization and enhance my knowledge to utilize it in a productive\nmanner for the growth of the company and the global.\n\nINDUSTRIAL TRAINING\n\n• BHEL, (HEEP) HARIDWAR\nOn CNC System& PLC Programming.\n\nWORK EXPERIENCE\n\nTech Support Executive\n\nTeleperformance for Microsoft -\n\nSeptember 2017 to Present\n\nprocess.\n• 21 months of experience in ADFC as Phone Banker.\n\nEDUCATION\n\nBachelor of Technology in Electronics & communication Engg\n\nGNIT institute of Technology - Lucknow, Uttar Pradesh\n\n2008 to 2012\n\nClass XII\n\nU.P. Board - Bareilly, Uttar Pradesh\n\n2007\n\nClass X\n\nU.P. Board - Bareilly, Uttar Pradesh\n\n2005\n\nSKILLS\n\nMicrosoft office, excel, cisco, c language, cbs. (4 years)\n\nhttps://www.indeed.com/r/Nida-Khan/6c9160696f57efd8?isid=rex-download&ikw=download-top&co=IN","annotation":[{"label":["Email Address"],"points":[{"start":872,"end":910,"text":"indeed.com/r/Nida-Khan/6c9160696f57efd8"}]},{"label":["Skills"],"points":[{"start":800,"end":857,"text":"Microsoft office, excel, cisco, c language, cbs. (4 years)"}]},{"label":["Graduation Year"],"points":[{"start":676,"end":679,"text":"2012"}]},{"label":["College Name"],"points":[{"start":612,"end":640,"text":"GNIT institute of Technology "}]},{"label":["Degree"],"points":[{"start":552,"end":609,"text":"Bachelor of Technology in Electronics & communication Engg"}]},{"label":["Companies worked at"],"points":[{"start":420,"end":448,"text":"Teleperformance for Microsoft"}]},{"label":["Designation"],"points":[{"start":395,"end":417,"text":"\nTech Support Executive"}]},{"label":["Email Address"],"points":[{"start":106,"end":144,"text":"indeed.com/r/Nida-Khan/6c9160696f57efd8"}]},{"label":["Location"],"points":[{"start":66,"end":71,"text":"Jaipur"}]},{"label":["Companies worked at"],"points":[{"start":35,"end":63,"text":"Teleperformance for Microsoft"}]},{"label":["Designation"],"points":[{"start":10,"end":32,"text":"Tech Support Executive "}]},{"label":["Designation"],"points":[{"start":9,"end":31,"text":"\nTech Support Executive"}]},{"label":["Name"],"points":[{"start":0,"end":8,"text":"Nida Khan"}]}]}
The problem is you are feeding training data to model optimizer.
As mentioned in https://github.com/explosion/spaCy/issues/3558, use the following function to remove leading and trailing white spaces from entity spans.
def trim_entity_spans(data: list) -> list:
"""Removes leading and trailing white spaces from entity spans.
Args:
data (list): The data to be cleaned in spaCy JSON format.
Returns:
list: The cleaned data.
"""
invalid_span_tokens = re.compile(r'\s')
cleaned_data = []
for text, annotations in data:
entities = annotations['entities']
valid_entities = []
for start, end, label in entities:
valid_start = start
valid_end = end
# if there's preceding spaces, move the start position to nearest character
while valid_start < len(text) and invalid_span_tokens.match(
text[valid_start]):
valid_start += 1
while valid_end > 1 and invalid_span_tokens.match(
text[valid_end - 1]):
valid_end -= 1
valid_entities.append([valid_start, valid_end, label])
cleaned_data.append([text, {'entities': valid_entities}])
return cleaned_data
Then use the following function for training:
def train_spacy():
TRAIN_DATA = convert_dataturks_to_spacy("C:\\Users\\akjain\\Downloads\\Entity-Recognition-In-Resumes-SpaCy-master\\traindata.json")
TRAIN_DATA = trim_entity_spans(TRAIN_DATA)
nlp = spacy.blank('en') # create blank Language class
# create the built-in pipeline components and add them to the pipeline
# nlp.create_pipe works for built-ins that are registered with spaCy
if 'ner' not in nlp.pipe_names:
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner, last=True)
# add labels
for _, annotations in TRAIN_DATA:
for ent in annotations.get('entities'):
ner.add_label(ent[2])
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes): # only train NER
optimizer = nlp.begin_training()
for itn in range(10):
print("Statring iteration " + str(itn))
random.shuffle(TRAIN_DATA)
losses = {}
for text, annotations in TRAIN_DATA:
nlp.update(
[text], # batch of texts
[annotations], # batch of annotations
drop=0.2, # dropout - make it harder to memorise data
sgd=optimizer, # callable to update weights
losses=losses)
print(losses)

Class weights for balancing data in TensorFlow Object Detection API

I'm fine-tuning SSD object detector using TensorFlow object detection API on Open Images Dataset. My training data contains imbalanced classes, e.g.
top (5K images)
dress (50K images)
etc...
I would like to add class weights to classification loss to improve performance. How do I do that? The following section of the config file seems relevant:
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
...
classification_weight: 1.0
localization_weight: 1.0
}
How can I change the config file to add classification loss weights per class? If not through a config file, what's a recommended way of doing that?
the API expects a weight for each object (bbox) directly in the annotation files. Due to this requirement the solutions to use class weights seem to be:
1) If you have a custom dataset you can modify the annotations of each object (bbox) to include the weight field as 'object/weight'.
2) If you don't want to modify the annotations you can recreate only the tf_records file in order to include the weights of the bboxes.
3) Modify the code of the API (seemed to me quite tricky)
I decided to go for #2, so I put here the code to generate such weighted tf records file for a custom dataset with two classes ("top", "dress") with weights (1.0, 0.1) given a folder of xml annotations as:
import os
import io
import glob
import hashlib
import pandas as pd
import xml.etree.ElementTree as ET
import tensorflow as tf
import random
from PIL import Image
from object_detection.utils import dataset_util
# Define the class names and their weight
class_names = ['top', 'dress', ...]
class_weights = [1.0, 0.1, ...]
def create_example(xml_file):
tree = ET.parse(xml_file)
root = tree.getroot()
image_name = root.find('filename').text
image_path = root.find('path').text
file_name = image_name.encode('utf8')
size=root.find('size')
width = int(size[0].text)
height = int(size[1].text)
xmin = []
ymin = []
xmax = []
ymax = []
classes = []
classes_text = []
truncated = []
poses = []
difficult_obj = []
weights = [] # Important line
for member in root.findall('object'):
xmin.append(float(member[4][0].text) / width)
ymin.append(float(member[4][1].text) / height)
xmax.append(float(member[4][2].text) / width)
ymax.append(float(member[4][3].text) / height)
difficult_obj.append(0)
class_name = member[0].text
class_id = class_names.index(class_name)
weights.append(class_weights[class_id])
if class_name == 'top':
classes_text.append('top'.encode('utf8'))
classes.append(1)
elif class_name == 'dress':
classes_text.append('dress'.encode('utf8'))
classes.append(2)
else:
print('E: class not recognized!')
truncated.append(0)
poses.append('Unspecified'.encode('utf8'))
full_path = image_path
with tf.gfile.GFile(full_path, 'rb') as fid:
encoded_jpg = fid.read()
encoded_jpg_io = io.BytesIO(encoded_jpg)
image = Image.open(encoded_jpg_io)
if image.format != 'JPEG':
raise ValueError('Image format not JPEG')
key = hashlib.sha256(encoded_jpg).hexdigest()
#create TFRecord Example
example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(file_name),
'image/source_id': dataset_util.bytes_feature(file_name),
'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
'image/encoded': dataset_util.bytes_feature(encoded_jpg),
'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
'image/object/truncated': dataset_util.int64_list_feature(truncated),
'image/object/view': dataset_util.bytes_list_feature(poses),
'image/object/weight': dataset_util.float_list_feature(weights) # Important line
}))
return example
def main(_):
weighted_tf_records_output = 'name_of_records_file.record' # output file
annotations_path = '/path/to/annotations/folder/*.xml' # input annotations
writer_train = tf.python_io.TFRecordWriter(weighted_tf_records_output)
filename_list=tf.train.match_filenames_once(annotations_path)
init = (tf.global_variables_initializer(), tf.local_variables_initializer())
sess=tf.Session()
sess.run(init)
list = sess.run(filename_list)
random.shuffle(list)
for xml_file in list:
print('-> Processing {}'.format(xml_file))
example = create_example(xml_file)
writer_train.write(example.SerializeToString())
writer_train.close()
print('-> Successfully converted dataset to TFRecord.')
if __name__ == '__main__':
tf.app.run()
If you have other kinds of annotations the code will be very similar but this one unfortunately will not work.
The Object Detection API losses are defined in: https://github.com/tensorflow/models/blob/master/research/object_detection/core/losses.py
In particular, the following loss classes have been implemented:
Classification losses:
WeightedSigmoidClassificationLoss
SigmoidFocalClassificationLoss
WeightedSoftmaxClassificationLoss
WeightedSoftmaxClassificationAgainstLogitsLoss
BootstrappedSigmoidClassificationLoss
Localization losses:
WeightedL2LocalizationLoss
WeightedSmoothL1LocalizationLoss
WeightedIOULocalizationLoss
The weight parameters are used to balance anchors (prior boxes) and are of size [batch_size, num_anchors] in addition to hard negative mining. Alternatively, the focal loss down weighs well classified examples and focusses on the hard examples.
The primary class imbalance is due to many more negative examples (bounding boxes without objects of interest) in comparison to very few positive examples (bounding boxes with object classes). That seems to be the reason why class imbalance within positive examples (i.e. unequal distribution of positive class labels) is not implemented as part of object detection losses.

Categories

Resources