How can I update the face recognition reference image encoding daily or weekly to keep recognition accurate? As time goes by, people's faces change, so I'm thinking of comparing face_distance against a threshold value like 0.40: if the distance is greater than 0.40, I take the new face encodings as the reference. Is that right, or is there a better way to do this?
How about:
Make two dataset folders.
Name one of them face_recognition and the second one face_recognition_updated.
Write a function to update the main dataset from the new one (a sketch follows below).
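A minimal sketch of such an update function, assuming one reference image per person with the person's name as the filename; the folder names and layout here are just placeholders, not a prescribed structure:

```python
import os
import shutil

MAIN_DIR = "face_recognition"             # current reference images (hypothetical path)
UPDATED_DIR = "face_recognition_updated"  # newly captured candidates (hypothetical path)

def update_dataset():
    """Replace each person's reference image with the newer capture, if one exists."""
    for filename in os.listdir(UPDATED_DIR):
        src = os.path.join(UPDATED_DIR, filename)
        dst = os.path.join(MAIN_DIR, filename)
        shutil.copy2(src, dst)   # overwrite the old reference image
        os.remove(src)           # clear the staging folder
```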
Yes, you are right, but you need to be careful about two things:
Make sure the first/initial enrolled image (or the corresponding embedding) is of high quality.
Make sure there are no false positives: if one day your software finds a false positive (perhaps a look-alike face), it will enroll the wrong person forever.
A better option is to simply re-enroll the person every 5 years.
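If you do automate the update, one defensive variant is to refresh the stored encoding only when the new capture is a very confident match, which limits the look-alike risk mentioned above. A minimal sketch using the face_recognition library; the 0.30 threshold, the blending rule, and the function name are my own illustrative choices, not tuned values:

```python
import face_recognition

UPDATE_THRESHOLD = 0.30  # only refresh the reference on a very confident match (assumed value)

def maybe_update_encoding(stored_encoding, new_image_path):
    """Return an updated reference encoding only when the new face clearly matches."""
    image = face_recognition.load_image_file(new_image_path)
    encodings = face_recognition.face_encodings(image)
    if len(encodings) != 1:
        return stored_encoding   # skip captures with zero or multiple faces
    new_encoding = encodings[0]
    distance = face_recognition.face_distance([stored_encoding], new_encoding)[0]
    if distance < UPDATE_THRESHOLD:
        # confident it is the same person: blend old and new to track gradual change
        return 0.5 * stored_encoding + 0.5 * new_encoding
    return stored_encoding       # too far off (possible look-alike): keep the old reference
```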
I'm working on a project that aims to detect each person's face on entering a public space and store the entry time and the person's image (in array format) in Elasticsearch. On exit, it detects each leaving face, loops over the Elasticsearch index of people who entered that day, passes two images to my model (the detected exiting face and a face stored in Elasticsearch), matches the two faces, and returns the entry time, exit time, and total duration.
For face matching/Face re-identification I'm using a VGG model that takes ~1sec to compare two faces.
This model takes two parameters and returns a value between 0 and 1.
I loop over all stored faces, append each score to a list, and the matching face is the one with the minimum returned value.
So if 100 people have entered that day, looping to find one face takes more than 100 seconds, but in my use case the program needs to run in real time.
Any suggestions for that?
If you have too many images, I would suggest looking at a method like FAISS. It is more efficient than computing distances between the new image and every saved image one by one. Also, you can try a small 4-layer conv net or EfficientNet instead of VGG (but check for accuracy degradation), as VGG is more computationally expensive.
Another approach: if the list of people is fixed, you can compute the embeddings of all saved images once and store them. At run time, you only run the feature extractor on the new face and compare its embedding with all the stored embeddings; this will definitely save time.
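An illustrative sketch of that idea combined with FAISS: embeddings are computed once when people enter, stored in an index, and each exit query becomes a single nearest-neighbour search. The embedding dimension and array shapes are assumptions; the embeddings themselves would come from whatever feature extractor you use (VGG in the question).

```python
import numpy as np
import faiss

EMBEDDING_DIM = 512  # depends on the feature extractor; placeholder value

def build_index(entry_embeddings):
    """entry_embeddings: (n_people, EMBEDDING_DIM) array computed when people enter."""
    index = faiss.IndexFlatL2(EMBEDDING_DIM)
    index.add(np.ascontiguousarray(entry_embeddings, dtype=np.float32))
    return index

def find_best_match(index, exit_embedding):
    """Return (row_id, distance) of the closest enrolled face for an exiting person."""
    query = np.ascontiguousarray(exit_embedding.reshape(1, -1), dtype=np.float32)
    distances, ids = index.search(query, 1)
    return int(ids[0][0]), float(distances[0][0])
```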
Adding to #Rambo_john - here is a nice image search demo that uses VGG and a managed Faiss service.
I aim to design an app that recognizes a certain type of object (let's say, a book) and can say whether the input actually is a book or not (binary classification).
For a better user experience, I would like the input to be a video rather than a picture: that way, the user won't have to deal with issues such as sharpness or centering of the object. They'll just have to make a "scan" of the object, without much concern for the quality of any single image.
And here comes my problem: since I intend to create my training dataset from scratch (the actual object I want to detect is absent from existing datasets such as ImageNet), I was wondering whether videos are unsuitable for this type of binary classification and whether I should instead ask the user to take a good picture of the object.
On one hand, videos have the advantage of yielding a larger dataset than one created only from photos (though I could expand a photo dataset with data augmentation), as it is easier to take a 10 s video of an object than to take 10x24 (more or less…) pictures of it.
But on the other hand I fear the result will be less precise, as in a video many frames are redundant and the average quality might not be as good as in a single, proper image.
Moreover, I do not intend to use the temporal aspect of the video (in a scan, temporality is useless) but rather to work one frame at a time (as depicted in this article).
What is the proper way to build my dataset? I would really like to keep this "scan" for the user's comfort, so if single images are more precise than videos for such a classification, would it be possible to automatically extract a single good image from a "scan" and work directly on that?
Good question! The answer is: you should train your model on how you plan to use it. So if you ask the user to take photos, train it on photos. If you ask the user to film the object, train on frames extracted from video.
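For example, a small sketch (using OpenCV) of turning user "scans" into training images by saving one frame every few frames, so near-duplicates are skipped; the sampling step of 10 is an arbitrary choice:

```python
import os
import cv2

def extract_frames(video_path, out_dir, step=10):
    """Save roughly every `step`-th frame of a scan video as a training image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```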
The images might seem blurry to you, but that isn't a problem for the computer: it will just learn to detect "blurry books", and that's OK, that's exactly what you want.
Of course this is not always the case. The image might become so blurry that the information whether or not there is a book in the frame is no longer there. Where is the line? A general rule of thumb: if you can see it's a book, the computer will also see it. As I think blurry images of books will still be recognizable as books, I think you could totally do it.
Creating "photos (single image, sharp)" from "scan (more blurry, frames from video)" can be done, it's called super-resolution. But those models are pretty beefy, not something you would want to run on a mobile device.
On a completely unrelated note: try googling Transfer Learning! It will benefit you for sure :D.
In general, is there any "best practice" for using video as input to deep learning models? How can we annotate video most efficiently?
Also, I have some videos of ducks walking through a passage. I want to count the number of grey ducks and the number of yellow ducks passing through the passage. A duck can pass straight through (the easiest case), stay in the passage for a while before passing through, or go halfway through the passage and turn back (in which case it should not be counted).
I plan to use Mask R-CNN to segment the ducks in each frame, then compare the masks from frame i with the masks from frame i+1 and apply rules to count the number of distinct ducks that actually pass through the passage.
This does not seem optimal to me.
Any ideas/help/hints?
I guess it depends on the video, but a good option was to:
Annotate some 'not too similar' frames with: http://www.robots.ox.ac.uk/~vgg/software/via/
Use a model like YOLO or Mask R-CNN to find a bounding box around each object and classify it. Optical flow is also an option instead of deep learning, but I eventually decided not to use it because of several possible outcomes that, from my point of view, make it less automatic:
*an object that moves, stops, and starts moving again would require special attention
*objects with one main colour might be split into two pieces (the middle pixels might be seen as not moving)
*a group of objects passing together would probably be seen as a single object
Then, using a tracking algorithm, you can give each object a specific ID and hence count when it passes a certain line (a rough sketch follows below).
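A rough sketch of the counting step, assuming a detector (YOLO or Mask R-CNN) already gives you one centroid per duck per frame. The nearest-centroid matcher and the pixel thresholds are deliberately naive and only illustrate the "ID plus counting line" idea, not a production tracker:

```python
import math

LINE_X = 300          # x-coordinate of the virtual counting line, in pixels (assumed)
MAX_MATCH_DIST = 50   # max pixel distance to treat two centroids as the same duck

def count_crossings(frames_of_centroids):
    """frames_of_centroids: list of per-frame lists of (x, y) duck centroids."""
    tracks = {}      # track_id -> last known centroid
    next_id = 0
    counted = set()  # track_ids already counted, so each duck is counted once
    crossings = 0
    for centroids in frames_of_centroids:
        new_tracks = {}
        for c in centroids:
            # match the detection to the closest previous centroid, if close enough
            best_id, best_dist = None, MAX_MATCH_DIST
            for tid, prev in tracks.items():
                d = math.dist(prev, c)
                if d < best_dist:
                    best_id, best_dist = tid, d
            if best_id is None:        # no nearby track: start a new one
                best_id, prev = next_id, c
                next_id += 1
            else:
                prev = tracks[best_id]
            # count the duck the first time it crosses the line from left to right
            if best_id not in counted and prev[0] < LINE_X <= c[0]:
                crossings += 1
                counted.add(best_id)
            new_tracks[best_id] = c
        tracks = new_tracks            # ducks that disappeared drop their tracks
    return crossings
```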
I'm building a grading system for crabs. In this system, the animals (crabs) are placed on a moving conveyor and I need to identify dead or alive animals by detecting their motion in images captured by a camera over this conveyor.
The color of conveyor belt is black.
As the conveyor is always moving, I can't apply methods that assume a stationary scene like here. Does anyone have a suggestion for detecting the animals' motion in this case using OpenCV? I can use more than one camera if necessary. Thanks.
Well, the most obvious answer is:
1) Align the pictures of the conveyor taken at different times so that they show the same area.
2) Check which crabs have different poses (i.e. "subtract the images"): regions (pixels) that differ mean that motion happened.
If you go with tracking instead, you would need to train a classifier to detect the crabs and then compare the crab regions in the same way, but I think that is too complicated for your particular issue.
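A loose sketch of that "align, then subtract" idea with OpenCV; the belt displacement and the two thresholds are made-up constants you would have to measure and tune for your camera and belt speed:

```python
import cv2
import numpy as np

BELT_SHIFT_PX = 15       # how far the belt moves between the two frames (assumed)
MOTION_THRESHOLD = 25    # per-pixel intensity difference treated as "movement"

def crab_moved(frame_a, frame_b):
    """Return True if a crab changed pose between two belt-aligned frames."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    # shift the later frame back by the belt displacement so the same belt area lines up
    rows, cols = gray_b.shape
    shift = np.float32([[1, 0, -BELT_SHIFT_PX], [0, 1, 0]])
    aligned_b = cv2.warpAffine(gray_b, shift, (cols, rows))
    diff = cv2.absdiff(gray_a, aligned_b)
    moving_pixels = np.count_nonzero(diff > MOTION_THRESHOLD)
    return moving_pixels > 500   # crude area threshold, to be tuned per setup
```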
Well, this is an interesting question. While weighing different solutions to the problem, I learned that crabs are ectothermic animals, i.e. they cannot control their body temperature; their body temperature equals the temperature of the environment they are in. So using a remote thermometer is out of the question. (But I learned something new, thank you for that :] )
A different, but slightly cruel, method would be to take a shot of a crab on the belt, give it a tiny electric pulse (very low voltage, just enough to make it react, similar to a static discharge for us), and take another shot of the crab immediately. Compare the two images to see if there is a difference in the crab's posture. If so, it should be alive; if not, RIP crab.
There are downsides to this solution too:
I really do not like the idea of giving electric shocks to crabs, even at low voltage. It sounds very cruel to me, and I am not sure it is even legally allowed where you live.
It requires adding another step to the process.
I have absolutely no idea what voltage such a system would need. Would it pose any danger to the employees around the conveyor belt?
[I hope I don't get stoned for suggesting giving electric shocks to crabs here]
I am working on an image processing and computer vision project. The project is to count the number of people entering a conference. This needs to be done in OpenCV or Python.
I have already tried the Haar Cascade that is available in OpenCV for Upper body: Detect upper body portion using OpenCV
However, it does not address the requirement. The link to the videos is as follows:
https://drive.google.com/open?id=0B3LatSCwKo2benZyVXhKLXV6R0U
If you view the sample1 file, at 0:16 a person enters the room; that will always be the way people enter. The camera is mounted above the door.
Identifying People from this Aerial Video Stream
I think there is a simple way of approaching this problem. Background subtraction methods for detecting moving objects are just what you need because the video you provided seems to only have one moving object at any point: the person walking through the door. Thus, if you follow this tutorial in Python, you should be able to implement a satisfying solution for your problem.
Counting People Entering / Exiting
Now, the first question that pops to my mind is what might I do to count if multiple people are walking through the door at separate time intervals (one person walks in 10 seconds into the video and a second person walks in 20 seconds into the video)? Here's the simplest solution to this consideration that I can think of. Once you've detected the blob(s) via background subtraction, you only have to track the blob until it goes off the frame. Once it leaves the frame, the next blob you detect must be a new person entering the room and thus you can continue counting. If you aren't familiar with how to track objects once they have been detected, give this tutorial a read. In this manner, you'd avoid counting the same blob (i.e., the same person) entering too many times.
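A minimal sketch of that pipeline with OpenCV: MOG2 background subtraction plus a "new blob after an empty frame" rule. The area threshold and subtractor parameters are illustrative, and the findContours call assumes OpenCV 4:

```python
import cv2
import numpy as np

MIN_BLOB_AREA = 5000   # pixels; depends on camera height, purely illustrative

def count_entries(video_path):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25,
                                                    detectShadows=False)
    kernel = np.ones((5, 5), np.uint8)
    count = 0
    person_in_frame = False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blob_present = any(cv2.contourArea(c) > MIN_BLOB_AREA for c in contours)
        if blob_present and not person_in_frame:
            count += 1        # a new large blob appeared: one more person entered
        person_in_frame = blob_present
    cap.release()
    return count
```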
The Difficulties in Processing Complex Dynamic Environments
If you think that there is a high level of traffic through that doorway, then the problem becomes much more difficult. This is because in that case there may not be much stationary background to subtract at any given moment, and further there may be a lot of overlap between detected blobs. There is a lot of active research in the area of autonomous pedestrian tracking and identification - so, in short, it's a difficult question that doesn't have a straightforward easy-to-implement solution. However, if you're interested in reading about some of the potential approaches you could take to solving these more challenging problems in pedestrian detection from an aerial view, I'd recommend reading the answers to this question.
I hope this helps, good luck coding!