At Meerkat, we improved our facial recognition by 40% at 10k distractors, with real-time performance and an easy interface.
This article is also available on our Medium: https://firstname.lastname@example.org
Facial recognition (FR) technology has come a long way in recent years in terms of applicability. However, a standard FR deployment still presents several difficulties, ranging from accuracy and performance, through the need for specific setups, to ease of integration and mobile support. With version 5.0 of our facial recognition API (frAPI), we aim to address all of these problems together, so that our clients can have their FR system up and running within fifteen minutes.
Accuracy and Setup Dependency
Facial recognition, like other areas of Computer Vision, recently had a breakthrough with the use of Deep Learning. Deep networks allow the creation of highly accurate recognition systems; however, they usually require several million images to train and have a high computational cost, usually requiring a GPU for decent performance.
Our previous frAPI technology did not use Deep Learning, yet it offered good accuracy and low computational requirements, which allowed us to process a video stream at up to 40 frames per second. About a year ago, we started diving into Deep Learning and CNNs (Convolutional Neural Networks), inspired by the explosive improvements in results on problems such as object detection in the ImageNet and Pascal-VOC challenges. However, running a network on CPU while keeping real-time performance proved quite challenging.
By carefully choosing which parts of the system are based on neural networks, we boosted our accuracy by ~40% for large databases while maintaining real-time performance on CPU.
In the image below you can see an example of the robustness of the new Deep Learning version compared to the previous one. Remarkably, with only one image the system was able to detect both Iron Man wearing shades and Captain America, who is partially occluded by Scarlet Witch (none of the heroes are wearing their suits, of course).
Clearly, the new CNN-based method improves recall and gives recognitions with higher confidence values.
The importance of this recognition robustness is twofold: the system becomes much more reliable, and there is no need for a more controlled setup, such as using several images to train each person or heavily constraining the camera setup.
For a quantitative evaluation we used the LFW (Labeled Faces in the Wild) dataset, which is a standard dataset for comparing facial recognition techniques. On the standard face verification protocol we achieved 98.5% accuracy, which is quite awesome. The graphs below show how we stack up against other facial recognition solutions.
We have great results on LFW; however, it is becoming a “saturated dataset”, i.e. it no longer presents a difficult enough test to rank the most recent and most accurate facial recognition systems. Also, its evaluation protocol is better suited to the verification problem (one to one) than to the recognition problem (one to many).
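To make the difference between the two problems concrete, here is a minimal Python sketch. It assumes faces are already represented as embedding vectors and compared by cosine similarity; the function names, the toy vectors and the 0.5 threshold are all hypothetical and are not taken from frAPI.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe, claimed, threshold=0.5):
    """One to one: is the probe the same person as the claimed identity?"""
    return cosine_similarity(probe, claimed) >= threshold

def identify(probe, gallery):
    """One to many: return the gallery label most similar to the probe."""
    return max(gallery, key=lambda label: cosine_similarity(probe, gallery[label]))

# Toy 3-D "embeddings" (assumed values, for illustration only).
gallery = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.0, 0.0, 1.0],
}
probe = [0.9, 0.1, 0.0]

same_as_alice = verify(probe, gallery["alice"])  # a single comparison
best_match = identify(probe, gallery)            # one comparison per enrolled person
```

Verification makes a single yes/no decision, while recognition has to rank the probe against every enrolled person, so the chance of a wrong answer grows with the size of the gallery.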
In real-world problems the N in one to many (1:N) can be quite big. So, to make our test harder and closer to real life, we used the protocol proposed by the MegaFace challenge: among N images, find the single image containing a given person. The graph below shows how our system does with N up to 10,000, i.e. among those 10,000 images there is only one of the person we are searching for, and the system must be able to find it.
Here, we plot the recognition accuracy with 10,000 distractors for different ranks. A rank of X indicates that the correct person was found within the first X results. For example, a correct recognition at rank 10 indicates that the correct person was among the first 10 people returned by the recognition process.
What the graph shows is that, given a single input image and 10k images from 10k people, we are able to identify the person 82% of the time.
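A rank-based score of this kind can be computed along the following lines. This is only a sketch under stated assumptions: embeddings are L2-normalised and compared by dot product, and the helper names and toy numbers are hypothetical. It is neither the MegaFace evaluation code nor frAPI's.

```python
import numpy as np

def rank_of_match(probe, match, distractors):
    """1-based rank of the true match among the match plus all distractors.

    Inputs are assumed to be L2-normalised embeddings, so the dot product
    is the cosine similarity.
    """
    match_score = float(probe @ match)
    distractor_scores = distractors @ probe
    # Each distractor scoring strictly higher pushes the true match down one place.
    return 1 + int(np.sum(distractor_scores > match_score))

def rank_k_accuracy(ranks, k):
    """Fraction of probes whose true match appears within the first k results."""
    return float(np.mean(np.asarray(ranks) <= k))

# Toy 2-D example: one distractor looks more like the probe than the true match.
probe = np.array([1.0, 0.0])
match = np.array([0.8, 0.6])
distractors = np.array([[0.6, 0.8], [0.96, 0.28], [0.0, 1.0]])
rank = rank_of_match(probe, match, distractors)

# Ranks collected over four hypothetical probes.
ranks = [1, 2, 5, 11]
acc_at_1 = rank_k_accuracy(ranks, 1)
acc_at_10 = rank_k_accuracy(ranks, 10)
```

Plotting `rank_k_accuracy` for increasing k gives a curve of the same kind as the one described above: it can only grow with k, and the value at rank 1 is the strictest measure.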
Given all of the above, we have a pretty amazing technology for facial recognition. But, as we stated at the beginning, we are also interested in providing a complete system to our clients. That means several things, such as real-time performance for IP cameras, an easy and intuitive interface, and porting our technology to mobile.
A common difficulty in implementing Computer Vision systems lies in the enrollment phase, and face recognition is no exception. Usually this is done in an active manner, with the person being enrolled in a controlled environment where some pictures can be taken. This is clearly a burden, especially if the person to be enrolled is a client. Setting up a database for hundreds of people can be almost impractical; yet the quality of acquisition directly influences the results at inference time.
To tackle this problem we created a “Smart Enroll” option that makes the enrollment process completely passive, with no need for a special setup. You just need a camera in front of which several people will appear; the system extracts the faces of every person and automatically clusters them apart. This allows people to be enrolled in real use-case scenarios, with several people in the video. Take a look at this process running in our system:
Clustering can also be used for other purposes, such as assigning temporary labels to people who are not present in the database; our RESTful API offers this option as well.
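For readers curious about what clustering faces apart can look like, here is a minimal Python sketch of one simple approach: greedy clustering of face embeddings by cosine similarity. It is an illustration only. The text above does not describe the algorithm behind Smart Enroll, and the function name, the 0.8 threshold and the toy vectors are assumptions.

```python
import numpy as np

def cluster_faces(embeddings, threshold=0.8):
    """Greedy online clustering of L2-normalised face embeddings.

    Each embedding joins the cluster whose mean direction it is most similar
    to, provided the cosine similarity reaches `threshold`; otherwise it
    starts a new cluster. Returns one cluster index per embedding.
    """
    sums = []    # running sum of the member embeddings of each cluster
    labels = []
    for emb in np.asarray(embeddings, dtype=float):
        best, best_sim = -1, threshold
        for idx, total in enumerate(sums):
            # emb has unit norm, so this is the cosine with the cluster mean.
            sim = float(emb @ total / np.linalg.norm(total))
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best < 0:
            sums.append(emb.copy())
            best = len(sums) - 1
        else:
            sums[best] += emb
        labels.append(best)
    return labels

# Toy 2-D "embeddings" of faces seen in a video: two people, interleaved.
faces = np.array([[1.0, 0.0], [0.98, 0.199], [0.0, 1.0], [0.1, 0.995], [0.995, 0.1]])
faces = faces / np.linalg.norm(faces, axis=1, keepdims=True)
labels = cluster_faces(faces)
```

Because a new cluster is opened whenever no existing one is similar enough, the same idea also covers the temporary-label case: an unknown person simply gets the index of a fresh cluster.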
In passive face recognition, speed matters a lot. Take the case of using a camera at a store’s entrance to detect returning customers. If the recognition system has low throughput, many will go undetected, since the gap between two consecutive recognitions is too large to collect enough evidence from a person walking in.
We already talked about switching our core technology to a Deep Learning method, and a first concern one might have is its speed on CPU. We are glad to announce that our hand-crafted neural network is able to process a video stream at up to 25 frames per second on a common machine (i5 3.2GHz, 4GB RAM).
With this performance it is possible to run facial recognition directly on a video stream, such as the one from a common IP camera. In the near future we plan to launch a GPU version, which should be much faster and able to process a large number of cameras in real time on a single machine.
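A quick back-of-the-envelope calculation shows why throughput matters at a store entrance. The 25 frames per second figure is the one quoted above; the 2 seconds a person stays in view and the slower 2 frames per second system are assumptions made purely for illustration.

```python
def frames_available(seconds_in_view, fps):
    """Number of whole frames processed while a person is in view."""
    return int(seconds_in_view * fps)

SECONDS_IN_VIEW = 2.0  # assumed time a customer spends in front of the camera

fast = frames_available(SECONDS_IN_VIEW, 25)  # throughput quoted above
slow = frames_available(SECONDS_IN_VIEW, 2)   # hypothetical low-throughput system
```

Under these assumptions the faster system gets 50 chances to see the face and the slower one only 4, which leaves little room for frames where the face is blurred, turned away or occluded.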
Depending on how facial recognition is used, relying on IP cameras can be a problem. They are indeed useful for many applications; however, they usually require a power connection and are not portable. So we decided to use a high-quality camera that most of us already have in our pockets: the cell phone.
We developed a small (< 2MB) app called frAPI Eye that turns your cellphone into an IP camera that connects directly to frAPI. With this app you can launch a recognition stream from the cellphone camera. And since we use H264 encoding for the video transfer, the image quality is really high while using little bandwidth on your network. For the more tech-savvy, we implemented the data stream using websockets: the code is publicly available at https://github.com/meerkat-cv/h264_decoder .
Finally, our new face verification system was ported to smartphones, where a biometric process can be performed without an internet connection. Everything is done locally, and the Android/iOS SDK is much smaller than those of current competitors (< 30MB). Note that this port has exactly the same accuracy as the on-premise version of the system (i.e. 98.5% on LFW).
This huge set of improvements places Meerkat as a top provider of face recognition, yet we are still looking for new ways to improve our customers' experience, so let us know in the comments if you have any ideas or requirements.
If this got you interested, contact us at email@example.com and we can arrange a trial in no time. Also, follow us on Medium; interesting things will be launched soon.