X.shape[1] size doesn't fit the expected value - python

I'm currently working on my final degree project in robotics, and I decided to create an open-source robot capable of replicating human emotions. The robot is all set up and ready to receive orders, but I'm still busy coding it. I'm currently basing my code on this method. The idea is to extract 68 facial landmarks from a low-FPS video feed (using an RPi Camera V2), feed those landmarks to a trained SVM classifier, and have it return a number from 0-6 depending on the expression it detects (Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral). I'm testing the capabilities of my model with some pictures I took using the RPi Camera, and this is what I've managed to put together so far in terms of code:
# import the necessary packages
from imutils import face_utils
import dlib
import cv2
import numpy as np
import time
import argparse
import os
import sys
if sys.version_info >= (3, 0):
    import _pickle as cPickle
else:
    import cPickle
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from data_loader import load_data
from parameters import DATASET, TRAINING, HYPERPARAMS
def get_landmarks(image, rects):
    if len(rects) > 1:
        raise BaseException("TooManyFaces")
    if len(rects) == 0:
        raise BaseException("NoFaces")
    return np.matrix([[p.x, p.y] for p in predictor(image, rects[0]).parts()])
# initialize dlib's face detector (HOG-based) and then create
# the facial landmark predictor
print("Initializing variables...")
p = "shape_predictor_68_face_landmarks.dat"
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(p)
# path to pretrained model
path = "saved_model.bin"
# load pretrained model
print("Loading model...")
model = cPickle.load(open(path, 'rb'))
# initialize final image height & width
height = 48
width = 48
# initialize landmarks variable as empty array
landmarks = []
# load the input image
print("Loading image...")
gray = cv2.imread("foo.jpg")
# detect faces in the loaded image
print("Detecting faces in loaded image...")
rects = detector(gray, 0)
# loop over the face detections
print("Looping over detections...")
for (i, rect) in enumerate(rects):
    # determine the facial landmarks for the face region, then
    # convert the facial landmark (x, y)-coordinates to a NumPy
    # array
    shape = predictor(gray, rect)
    shape = face_utils.shape_to_np(shape)
    # loop over the (x, y)-coordinates for the facial landmarks
    # and draw them on the image
    for (x, y) in shape:
        cv2.circle(gray, (x, y), 2, (0, 255, 0), -1)
# show the output image with the face detections + facial landmarks
print("Storing saved image...")
cv2.imwrite("output.jpg", gray)
print("Image stored as /'output.jpg/'")
# arrange landmarks in array
print("Collecting and arranging landmarks...")
# scipy.misc.imsave('temp.jpg', image)
# image2 = cv2.imread('temp.jpg')
face_rects = [dlib.rectangle(left=1, top=1, right=47, bottom=47)]
landmarks = get_landmarks(gray, face_rects)
# load data
print("Loading collected data into predictor...")
print("Extracted landmarks: ", landmarks)
landmarks = np.array(landmarks.flatten())
# predict expression
print("Making prediction")
predicted = model.predict(landmarks)
However, after running the code everything seems to be fine up until this point:
Making prediction
Traceback (most recent call last):
File "face.py", line 97, in <module>
predicted = model.predict(landmarks)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 576, in predict
y = super(BaseSVC, self).predict(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 325, in predict
X = self._validate_for_predict(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 478, in _validate_for_predict
(n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 136 should be equal to 2728, the number of features at training time
I searched for similar issues on this website, but the purpose is so specific that I didn't quite find what I needed. I've been working on the design and research for quite some time, but finding all the snippets needed to make the code work has taken most of my time, and I'd love to polish this concept as soon as possible since the presentation date is approaching quickly. Any and all contributions are greatly welcome!
Here's the trained model I'm currently using, by the way.

I am probably being silly, but it looks like you define path after you use it to load your model.
Also, path seems like a very bad name for a variable containing a file location; perhaps modelFileLocation is less likely to already be defined.

Solved it! It turns out my model was trained on a combination of HOG features and dlib landmarks, but I was only feeding the landmarks to the predictor, which caused the size discrepancy.
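For anyone hitting the same thing: 68 landmarks × 2 coordinates = 136 features, while the model expected 2728, so the remaining values were presumably the HOG block. Below is a minimal sketch of combining the two feature sets, assuming the training pipeline used skimage's hog on the 48×48 grayscale face crop; the HOG parameters and the face_crop variable are placeholders and must mirror the training configuration.
import numpy as np
from skimage.feature import hog

def build_feature_vector(face_gray_48x48, landmarks):
    # HOG descriptor of the face crop; orientations/cell/block sizes are
    # hypothetical here and must match the values used at training time
    hog_features = hog(face_gray_48x48,
                       orientations=8,
                       pixels_per_cell=(8, 8),
                       cells_per_block=(3, 3))
    # flatten the 68 (x, y) landmark pairs into 136 values
    landmark_features = np.asarray(landmarks).flatten()
    # concatenate both blocks in the same order used at training time
    return np.concatenate([hog_features, landmark_features])

# face_crop is assumed to be the 48x48 grayscale face region;
# scikit-learn expects a 2-D array of shape (n_samples, n_features)
features = build_feature_vector(face_crop, landmarks)
predicted = model.predict(features.reshape(1, -1))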

Related

extracting coordinates from computer vision inference

I converted this computer vision model (YOLOv7) to an ONNX model that can be used with the OpenVINO toolkit. This model has good characteristics for what I am after, based on how it is used in other applications I have read about.
I think my question is super basic and comes from not understanding computer vision well enough; I'm just curious if someone can give me some tips on the basics of looping through the model output for "bounding boxes" to draw with OpenCV.
Using this on CPU with pip-installed OpenVINO:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from openvino.runtime import Core
model_path = "./yolov7.xml"
ie_core = Core()
def model_init(model_path):
    model = ie_core.read_model(model=model_path)
    compiled_model = ie_core.compile_model(model=model, device_name="CPU")
    input_keys = compiled_model.input(0)
    output_keys = compiled_model.output(0)
    return input_keys, output_keys, compiled_model
input_key, output_keys, compiled_model = model_init(model_path)
# resize the image so it works with the model dimensions
image = cv2.resize(image, (width, height))
image = image.transpose((2,0,1))
image = image.reshape(1,3, height,width)
# Run inference on image, trying .output(1) first
boxes = compiled_model([image])[compiled_model.output(1)]
The code works and outputs an array, but what does this data contain? I thought there would be a confidence value I could use to filter out bad predictions, as well as bounding box coordinates.
If I print(compiled_model), it outputs what I think is the model architecture:
<CompiledModel:
inputs[
<ConstOutput: names[input.1] shape{1,3,640,640} type: f32>
]
outputs[
<ConstOutput: names[812] shape{1,25200,85} type: f32>,
<ConstOutput: names[588] shape{1,3,80,80,85} type: f32>,
<ConstOutput: names[669] shape{1,3,40,40,85} type: f32>,
<ConstOutput: names[750] shape{1,3,20,20,85} type: f32>
]>
Does this tell me anything about the model output, like what the data would contain? Or boxes.shape, which returns:
(1, 3, 80, 80, 85)
Looping over it with:
for box in boxes:
    print(box)
just prints NumPy arrays full of float data. I'm curious what I need to learn, at a high level, to draw bounding boxes around features inside the image.
From my replication, your code does not work, failing with a "NameError: name 'image' is not defined" error. In your output, each ConstOutput entry only represents a port/node of your model. To make sure the model itself works, run your yolov7.xml file with the OpenVINO Benchmark Python Tool; you should not receive any errors.
Among the OpenVINO samples, you may refer to the Object Detection Python Demo source code to learn the OpenVINO Inference Engine API usage for creating bounding boxes and how to handle the model. Here is another example of creating bounding boxes:
for box in boxes:
    # Pick a confidence factor from the last place in an array.
    conf = box[-1]
    if conf > threshold:
        # Convert float to int and multiply the corner position of each box by the x and y ratio.
        # If the bounding box is found at the top of the image,
        # position the upper box bar a little lower to make it visible on the image.
        (x_min, y_min, x_max, y_max) = [
            int(max(corner_position * ratio_y, 10)) if idx % 2
            else int(corner_position * ratio_x)
            for idx, corner_position in enumerate(box[:-1])
        ]
        # Draw a box based on the position; the parameters of the rectangle function are:
        # image, start_point, end_point, color, thickness.
        rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max),
                                  colors["green"], 3)

running a python script from php gives a null result on macOS

I'm trying to execute a Python 3.9 file from PHP, via VS Code and the Safari browser. I've spent hours trying to figure out what the problem is, because it keeps giving me a null result.
Here is the code in my PHP file:
<?php
$result = shell_exec("/usr/bin/python3 fr/fr_load.py");
var_dump($result);
?>
I tried other Python files with only a print command and they worked just fine, so my connection is fine, but I can't figure out what the problem is.
My Python code is:
#!/usr/bin/python3
#import the required libraries
import cv2
import joblib
from sys import exit
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
# function to detect face from image
def face_detection(image_to_detect):
    # converting the image to grayscale since it's required for eigenfaces and fisherfaces
    image_to_detect_gray = cv2.cvtColor(image_to_detect, cv2.COLOR_BGR2GRAY)
    # load the pretrained model for face detection
    # haarcascade is recommended for Eigenface
    face_detection_classifier = cv2.CascadeClassifier('php/fr/models/haarcascade_frontalface_default.xml')
    # detect all face locations in the image using the classifier
    all_face_locations = face_detection_classifier.detectMultiScale(image_to_detect_gray)
    # if no faces are detected
    if len(all_face_locations) == 0:
        return None, None
    # splitting the tuple to get the four face positions
    # [0] because we assume we have only one face per image in our dataset
    x, y, width, height = all_face_locations[0]
    # cropping the face region (rows are y, columns are x)
    face_coordinates = image_to_detect_gray[y:y+height, x:x+width]
    # for training and testing, all images should be the same size (required for Eigenface)
    face_coordinates = cv2.resize(face_coordinates, (500, 500))  # any size that fits works
    # return the face detected and the face location
    return face_coordinates, all_face_locations[0]  # [0] because there is only one face in each image
names =[]
names.append("steve jobs")
names.append("ali alramadan")
names.append("hillary clinton")
#########load recognition model for later use ###########
face_classifier = cv2.face.EigenFaceRecognizer_create()
face_classifier.read("php/fr/models/our model.yml")
######## prediction ##############
#path of the image we want to test and predict
image_to_classify = cv2.imread("php/fr/dataset/testing/load.jpg")
#make a copy of the image because we don't want to ruin the original by drawing on it
image_to_classify_copy = image_to_classify.copy()
#get the face from the image
face_coordinates_classify, box_locations = face_detection(image_to_classify_copy)
#if no faces are returned
if face_coordinates_classify is None:
    print("There are no faces in the image to classify")
    exit()
#if face is returned then predict the face
name_index, distance = face_classifier.predict(face_coordinates_classify)  # a large distance means a big difference between the training image and the test image
name = names[name_index]
distance = abs(distance)
print(name)
#draw bounding box and text for the face detected
(x,y,w,h) = box_locations
cv2.rectangle(image_to_classify,(x,y),(x+w, y+h),(0,255,0),2)
cv2.putText(image_to_classify,name,(x,y-5),cv2.FONT_HERSHEY_PLAIN,2.5,(0,255,0),2)
#show the image in window
cv2.imshow("Prediction "+name, cv2.resize(image_to_classify, (500,500)))
cv2.waitKey(0)
cv2.destroyAllWindows()
Any suggestions, please?
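Not a confirmed fix, but shell_exec() only captures stdout, so if the script dies with an exception (for example, cv2.imshow cannot open a window under a web server, or the relative php/fr/... paths don't resolve against PHP's working directory), PHP just sees null. Here is a debugging sketch that surfaces the Python traceback into var_dump($result); main is a hypothetical wrapper around the script's body:
import sys
import traceback

def main():
    # ... the face-detection and prediction code above ...
    pass

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # print the traceback to stdout so shell_exec() can capture it
        traceback.print_exc(file=sys.stdout)
        sys.exit(1)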

Python face_recognition dataset quality

I'm constructing a dataset with more than one image per person for the Python face_recognition package. It adds a classifier on top of the built-in model; see also this example: face_recognition_knn.py. Here is my code:
# import the necessary packages
from imutils import paths
import face_recognition
import pickle
import cv2
import os
# grab the paths to the input images in our dataset
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images('dataset'))
# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []
# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    print(f"[INFO] processing image {i+1}/{len(imagePaths)} -> {imagePath}")
    name = imagePath.split(os.path.sep)[-2]
    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb,
        model='hog') # can be cnn
    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)
    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and
        # encodings
        knownEncodings.append(encoding)
        knownNames.append(name)
# dump the facial encodings + names to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
f = open('encodings.pickle', "wb")
f.write(pickle.dumps(data))
f.close()
Then, I try to identify these people with this code:
import face_recognition
import pickle
import cv2
import numpy as np
import requests
from datetime import datetime
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open("encodings.pickle", "rb").read())
def processa_imagem(url):
    # load the input image, convert it from BGR to RGB, and return a file with confidence
    image = cv2.imread(url)
    if image is None:
        print(f'Image not found: {url}')
    #image = np.array(image, dtype=np.uint8)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # detect the (x, y)-coordinates of the bounding boxes corresponding
    # to each face in the input image, then compute the facial embeddings
    # for each face
    print("[INFO] recognizing faces...")
    boxes = face_recognition.face_locations(rgb,
        model='hog')
    encodings = face_recognition.face_encodings(rgb, boxes)
    # initialize the list of names for each face detected
    names = []
    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known
        # encodings
        ## ATTENTION! the ideal is face_recognition.api.batch_face_locations but I don't have a GPU
        matches = face_recognition.face_distance(data["encodings"],
            encoding)
        name = "unknown"
        # check to see if we have found a match
        if max(matches) > 0.7:
            # find the indexes of all matched faces, then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}
            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1
            # determine the recognized face with the largest number of
            # votes (note: in the event of an unlikely tie, Python will
            # select the first entry in the dictionary)
            name = max(counts, key=counts.get)
        # update the list of names
        names.append(name)
    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # draw the predicted face name on the image
        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
            0.75, (255, 0, 0), 2)
    now = datetime.now()
    current_time = now.strftime("%H%M%S%f")
    #file_path = f'static/face-{current_time}.jpg'
    file_path = f'face-{current_time}.jpg'
    cv2.imwrite(file_path, image)
    return (file_path, ', '.join(names))
On my dataset, I've added, on average, about 10 photos of each individual. The script uses face_recognition.face_distance and it works well for recognizing someone who is in the dataset.
The problem is when I use it with someone who is NOT in the dataset: for these people I sometimes still get false positives with a confidence of about 0.90 or higher.
Some of the pictures in the dataset are low quality. Maybe that's the reason? Should I change my approach, using fewer but more detailed photos (2 or 3) and maybe encoding them with jitters?
Thanks for any input!
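One thing worth checking before blaming picture quality: face_recognition.face_distance returns distances, where smaller means more similar, while the voting code above treats the values like the booleans returned by compare_faces (the "if b" filter and the "max(matches) > 0.7" test), so almost every known face can end up counted as a match. Here is a sketch of distance-based matching; 0.6 is the package's conventional tolerance, treat it as a starting point to tune:
import numpy as np
import face_recognition

# distances to every known encoding; lower means more similar
distances = face_recognition.face_distance(data["encodings"], encoding)
best_idx = int(np.argmin(distances))
name = "unknown"
if distances[best_idx] <= 0.6:  # accept only sufficiently close matches
    name = data["names"][best_idx]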

How to solve: internal_error: 'numpy.ndarray' object is not callable?

I have this problem: I run this code in a Flask API.
# face verification with the VGGFace2 model
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from scipy.spatial.distance import cosine
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input
# extract a single face from a given photograph
def extract_face(filename, required_size=(254, 254)):
    # load image from file
    pixels = pyplot.imread(filename)
    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    x2, y2 = x1 + width, y1 + height
    # extract the face
    face = pixels[y1:y2, x1:x2]
    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize(required_size)
    face_array = asarray(image)
    # print(face_array)
    return face_array

# extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
    # extract faces
    faces = [extract_face(f) for f in filenames]
    # convert into an array of samples
    samples = asarray(faces, 'float32')
    # prepare the face for the model, e.g. center pixels
    samples = preprocess_input(samples, version=2)
    # create a vggface model
    model = VGGFace(model='vgg16', include_top=False, input_shape=(254, 254, 3), pooling='max')
    # perform prediction
    yhat = model.predict(samples)
    return yhat

# determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, thresh=0.45):
    # calculate distance between embeddings
    score = cosine(known_embedding, candidate_embedding)
    print('Match percentage (%.3f)' % (100 - (100 * score)))
    print('>face is a Match (%.3f <= %.3f)' % (score, thresh))
# define filenames
filenames = ['audacious.jpg', 'face-20190717050545949130_123.jpg']
# get embeddings file filenames
embeddings = get_embeddings(filenames)
# define sharon stone
sharon_id = embeddings[0]
# verify known photos of sharon
print('Positive Tests')
is_match(embeddings[0], embeddings[1])
On the first hit, the process works well, but the second hit gives this error:
'numpy.ndarray' object is not callable
'Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.'
If I don't run it through the API, but just as a file with python3 file.py, it never gives any errors no matter how many times I run it.
Any clue?
Check this line:
samples = asarray(faces, 'float32')
and try to replace it with (note that this requires import numpy as np):
samples = asarray(faces, dtype=np.float32)
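The second message ("... is not an element of this graph") is the classic symptom of building a Keras (TF1-era) model in one Flask request or thread and calling predict from another, where a different default graph is active. A common workaround, sketched here for TF1-era Keras with a hypothetical embed helper, is to build the model once at startup, capture its graph, and run every prediction inside it:
import tensorflow as tf
from keras_vggface.vggface import VGGFace

# build the model once, at module import time, and keep a handle to its graph
model = VGGFace(model='vgg16', include_top=False,
                input_shape=(254, 254, 3), pooling='max')
graph = tf.get_default_graph()

def embed(samples):
    # run predictions inside the graph the model was created in
    with graph.as_default():
        return model.predict(samples)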

Incorrect facenet recognition

I've been working on a face recognition attendance management system. I've built the pipeline from scratch, but in the end the script recognizes the wrong face among a group of 10 classes.
I've implemented the following pipeline using TensorFlow and Python:
1. Capture images, resize and align them using dlib's shape predictor, and store them in named folders for later comparison while performing recognition.
2. Pickle the images into a data.pickle file for later deserialization.
3. Use OpenCV to implement the MTCNN algorithm to detect faces in a frame captured by the webcam.
4. Pass these frames into a FaceNet network to create 128-D embeddings and compare them with the embeddings present in the pickle database.
Given below is the main file, which runs steps 3 and 4:
from keras import backend as K
import time
from multiprocessing.dummy import Pool
K.set_image_data_format('channels_first')
import cv2
import os
import glob
import numpy as np
from numpy import genfromtxt
import tensorflow as tf
from keras.models import load_model
from fr_utils import *
from inception_blocks_v2 import *
from mtcnn.mtcnn import MTCNN
import dlib
from imutils import face_utils
import imutils
import pickle
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
FRmodel = load_model('face-rec_Google.h5')
# detector = dlib.get_frontal_face_detector()
detector = MTCNN()
# FRmodel = faceRecoModel(input_shape=(3, 96, 96))
#
# # detector = dlib.get_frontal_face_detector()
# # predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# def triplet_loss(y_true, y_pred, alpha = 0.3):
# """
# Implementation of the triplet loss as defined by formula (3)
#
# Arguments:
# y_pred -- python list containing three objects:
# anchor -- the encodings for the anchor images, of shape (None, 128)
# positive -- the encodings for the positive images, of shape (None, 128)
# negative -- the encodings for the negative images, of shape (None, 128)
#
# Returns:
# loss -- real number, value of the loss
# """
#
# anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
#
# pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
# neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
# basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
# loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
#
# return loss
#
# FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
# load_weights_from_FaceNet(FRmodel)
def ret_model():
    return FRmodel

def prepare_database():
    pickle_in = open("data.pickle", "rb")
    database = pickle.load(pickle_in)
    return database

def unpickle_something(pickle_file):
    pickle_in = open(pickle_file, "rb")
    unpickled_file = pickle.load(pickle_in)
    return unpickled_file

def webcam_face_recognizer(database):
    cv2.namedWindow("preview")
    vc = cv2.VideoCapture(0)
    while vc.isOpened():
        ret, frame = vc.read()
        img_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        img = frame
        # We do not want to detect a new identity while the program is in the process of identifying another person
        img = process_frame(img, img)
        cv2.imshow("Preview", img)
        cv2.waitKey(1)
    vc.release()

def process_frame(img, frame):
    """
    Determine whether the current frame contains the faces of people from our database
    """
    # rects = detector(img)
    rects = detector.detect_faces(img)
    # Loop through all the faces detected and determine whether or not they are in the database
    identities = []
    for (i, rect) in enumerate(rects):
        (x, y, w, h) = rect['box'][0], rect['box'][1], rect['box'][2], rect['box'][3]
        img = cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        identity = find_identity(frame, x-50, y-50, x+w+50, y+h+50)
        cv2.putText(img, identity, (10, 500), cv2.FONT_HERSHEY_SIMPLEX, 4, (255, 255, 255), 2, cv2.LINE_AA)
        if identity is not None:
            identities.append(identity)
    if identities != []:
        cv2.imwrite('example.png', img)
    return img

def find_identity(frame, x, y, w, h):
    """
    Determine whether the face contained within the bounding box exists in our database
    x1,y1_____________
    |                 |
    |                 |
    |_________________x2,y2
    """
    height, width, channels = frame.shape
    # The padding is necessary since the OpenCV face detector creates the bounding box around the face and not the head
    part_image = frame[y:y+h, x:x+w]
    return who_is_it(part_image, database, FRmodel)

def who_is_it(image, database, model):
    encoding = img_to_encoding(image, model)
    min_dist = 100
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        # Compute L2 distance between the target "encoding" and the current "emb" from the database.
        dist = np.linalg.norm(db_enc.flatten() - encoding.flatten())
        print('distance for %s is %s' % (name, dist))
        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name
        if dist < min_dist:
            min_dist = dist
            identity = name
    if min_dist > 0.1:
        print('Unknown person')
    else:
        print(identity)
    return identity

if __name__ == "__main__":
    database = prepare_database()
    webcam_face_recognizer(database)
What am I doing wrong here?
Here, FRmodel is the trained FaceNet model.
A few points:
1. I don't see resizing, aligning, and whitening of the input face image that is fed into the network (see the sketch after this list for one way to do it).
2. You cannot add a fixed margin of 50 pixels to a variable-sized face. There has to be scaling such that the face region fills almost the same region in every input image.
3. I am not sure about the model you are using, but if you are using FaceNet, your accepted matching threshold, 0.1, seems very low. It will not accept any matches unless it is the exact same image (with a distance of 0.0), or one with very minimal variation from the gallery image.
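On the first point, here is a preprocessing sketch. It assumes a 96×96 channels-first input like the commented-out faceRecoModel above; the whitening shown is standard per-image normalization for FaceNet-style networks, not necessarily what this exact model was trained with:
import cv2
import numpy as np

def preprocess_face(face_bgr, size=96):
    # scale every crop to the same size so faces fill a consistent region
    face = cv2.resize(face_bgr, (size, size)).astype(np.float32)
    # per-image whitening: zero mean, unit variance
    face = (face - face.mean()) / max(face.std(), 1e-6)
    # HWC -> CHW to match K.set_image_data_format('channels_first'),
    # plus a leading batch dimension
    return np.transpose(face, (2, 0, 1))[np.newaxis, ...]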
