Boost Your Website’s Capabilities with Real-Time Facial Recognition Using Face-API.js and Vanilla JavaScript


Following our first article about facial detection (link here), in this new step-by-step tutorial we will build another AI-capable application: one that can recognize multiple people’s faces in a live streaming video (e.g. webcam, phone camera, …), using only vanilla JavaScript and the face-api library.

Before starting

Before starting, I suggest you take a look at our first two articles in this AI series to learn more about the Face-api library and how to use it with JavaScript.

What are we building?

In this step-by-step tutorial, we will build a web application that recognizes people’s faces from a streaming video source (webcam, external camera, phone camera), all from the browser, with no back-end or API calls.

  1. The application loads the different models used by the Face-api library.

  2. It loads face descriptors (imprints) from images; it is preferable to load multiple images for each person.

  3. It requests access to the camera using the native JavaScript API (navigator.mediaDevices).

  4. It starts the recognition process.

List of Face-api models

SSD Mobilenet V1

SSD (Single Shot Multibox Detector) is a face detector based on MobileNetV1. The neural net computes the location of each face in an image and returns the bounding boxes together with a probability for each face. This detector aims for high accuracy in detecting face bounding boxes rather than low inference time. The size of the quantized model is about 5.4 MB (ssd_mobilenetv1_model).
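If you need to, you can tune this detector. As a rough sketch (not part of the final app.js; video here is the HTML video element used later in this tutorial), face-api lets you pass an SsdMobilenetv1Options object to the detection call to adjust the minimum confidence:

// Sketch: tune the SSD MobileNet V1 detector (run inside an async function).
// A higher minConfidence keeps only the most certain detections.
const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.6 });
const detections = await faceapi.detectAllFaces(video, options);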

68 Point Face Landmark Detection Models

This package implements a very lightweight and fast, yet accurate, 68 point face landmark detector. The default model has a size of only 350kb (face_landmark_68_model) and the tiny model is only 80kb (face_landmark_68_tiny_model). Both models employ the ideas of depthwise separable convolutions and densely connected blocks. The models have been trained on a dataset of ~35k face images labelled with 68 face landmark points.
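As a small, hedged sketch: if you prefer the tiny model, you would load faceLandmark68TinyNet instead of the default net and pass true to withFaceLandmarks() (the rest of this tutorial sticks with the default model):

// Sketch: use the 80kb tiny landmark model instead of the 350kb default one.
await faceapi.nets.faceLandmark68TinyNet.loadFromUri("/models");
const result = await faceapi
  .detectSingleFace(image) // `image` is any HTMLImageElement or canvas
  .withFaceLandmarks(true) // true => use the tiny landmark model
  .withFaceDescriptor();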

Face Recognition

For face recognition, a ResNet-34 like architecture is implemented to compute a face descriptor (a feature vector with 128 values) from any given face image, which is used to describe the characteristics of a person's face. The model is not limited to the set of faces used for training, meaning you can use it for face recognition of any person, for example yourself. You can determine the similarity of two arbitrary faces by comparing their face descriptors, for example by computing the Euclidean distance or using any other classifier of your choice.

The neural net is equivalent to the FaceRecognizerNet used in face-recognition.js and the net used in the dlib face recognition example. The weights have been trained by davisking and the model achieves a prediction accuracy of 99.38% on the LFW (Labelled Faces in the Wild) benchmark for face recognition.

The size of the quantized model is roughly 6.2 MB (face_recognition_model).
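To make the descriptor comparison above concrete, here is a small sketch (the image paths are placeholders taken from the folder structure used later) that measures how similar two faces are with the library’s euclideanDistance helper:

// Sketch: compare two face descriptors; a distance below ~0.6
// usually indicates the same person.
const img1 = await faceapi.fetchImage("./faces/person-one/1.jpg");
const img2 = await faceapi.fetchImage("./faces/person-one/2.jpg");

const face1 = await faceapi.detectSingleFace(img1).withFaceLandmarks().withFaceDescriptor();
const face2 = await faceapi.detectSingleFace(img2).withFaceLandmarks().withFaceDescriptor();

const distance = faceapi.euclideanDistance(face1.descriptor, face2.descriptor);
console.log(`Distance between the two faces: ${distance}`);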

Read more on the official Face-api repository

Prerequisites

To follow along, you will need a basic understanding of JavaScript, a code editor, and your browser. Nothing more. And a camera, of course.

Let’s code

Our application structure will look like this:

/Root
 ├─ faces
 │   ├─ person-one
 │   │   ├─ 1.jpg
 │   │   └─ 2.jpg
 │   └─ person-two
 │       ├─ 1.jpg
 │       └─ 2.jpg
 ├─ models
 ├─ app.js
 ├─ face-api.min.js
 ├─ styles.css
 └─ index.html

./faces

This folder contains the images of the people that we want to recognize, with a sub-folder for each person.

./models

This folder contains the files of all the trained models that we will use in our face detection process.

face-api-recognition/models at main · adelpro/face-api-recognition

face-api.min.js

Contains the code of the face-api library (minified); you can download this file from here:

face-api-recognition/face-api.min.js at main · adelpro/face-api-recognition

styles.css

A simple CSS file to style our application.

.container {
  display: flex;
  justify-content: center;
  flex-direction: column;
  align-items: center;
  height: 100vh;
  text-align: center;
  flex: 1;
}

.video-container {
  position: relative;
}

.logs {
  width: 500px;
  border: 1px solid black;
  background-color: #cccc;
  margin-bottom: 30px;
  height: 200px;
  overflow-y: scroll;
}

.logs li {
  width: 100%;
  text-align: left;
  list-style: none;
  padding: 0;
}

.logs li:before {
  content: "\2192";
  margin-right: 10px;
}

canvas {
  position: absolute;
  top: 0;
  left: 0;
}

It’s the same as in the previous article, with basic styling added for a UI list with the ID logs.

index.html

This file contains all the necessary HTML code for our application; we will do all the logic in app.js.

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Face-API AI Real-Time Facial Recognition</title>
  <link rel="stylesheet" type="text/css" href="styles.css">
  <script defer src="face-api.min.js"></script>
  <script defer src="app.js"></script>
</head>

<body>
  <main class="container">
    <h1>Real-Time facial recognition</h1>
    <p>We will use JavaScript and Face-API to create a real-time facial recognition system</p>
    <div id="video-container" class="video-container">
      <video id="video" autoplay muted width="500" height="500"></video>
    </div>
    <div>
      <ul id="logs" class="logs">
        <!--Logs will be displayed here-->
      </ul>
    </div>
  </main>
</body>

</html>

app.js

This file is the core of our application; it contains all the logic. We will go through it and explain each part.

const video = document.getElementById("video");
const videoContainer = document.getElementById("video-container");
const logs = document.getElementById("logs");
const MODEL_URI = "/models";

log("Start loading the models");
Promise.all([
  faceapi.nets.ssdMobilenetv1.loadFromUri(MODEL_URI),
  faceapi.nets.faceLandmark68Net.loadFromUri(MODEL_URI),
  faceapi.nets.faceRecognitionNet.loadFromUri(MODEL_URI),
])
  .then(() => log("All models have been loaded."))
  .then(playVideo)
  .catch((err) => {
    console.log(err);
  });

function playVideo() {
  if (!navigator.mediaDevices) {
    console.error("mediaDevices not supported");
    return;
  }
  navigator.mediaDevices
    .getUserMedia({
      video: {
        width: { min: 640, ideal: 1280, max: 1920 },
        height: { min: 360, ideal: 720, max: 1080 },
      },
      audio: false,
    })
    .then(function (stream) {
      video.srcObject = stream;
    })
    .catch(function (err) {
      console.log(err);
    });
}

video.addEventListener("play", async () => {
  log("The video is starting to play.");
  log("Loading the faces from the database");
  const labeledFaceDescriptors = await loadLabeledFaceDescriptors();
  log("All faces have been loaded");
  const faceMatcher = new faceapi.FaceMatcher(labeledFaceDescriptors);

  // Creating the canvas
  const canvas = faceapi.createCanvasFromMedia(video);

  // This will force software rendering (instead of hardware-accelerated)
  // Enable it only on low-end configurations
  canvas.willReadFrequently = true;

  videoContainer.appendChild(canvas);

  // Resizing the canvas to cover the video element
  const canvasSize = { width: video.width, height: video.height };
  faceapi.matchDimensions(canvas, canvasSize);
  log("Done.");

  setInterval(async () => {
    const detections = await faceapi
      .detectAllFaces(video)
      .withFaceLandmarks()
      .withFaceDescriptors();

    // Set detections size to the canvas size
    const detectionsArray = faceapi.resizeResults(detections, canvasSize);
    canvas.getContext("2d").clearRect(0, 0, canvas.width, canvas.height);

    detectionsDraw(canvas, faceMatcher, detectionsArray);
  }, 10000);
});

async function loadLabeledFaceDescriptors() {
  const faces = [
    {
      id: 1,
      label: "mahrez",
      images: ["./faces/mahrez/1.jpg", "./faces/mahrez/2.jpg"],
    },
    {
      id: 2,
      label: "mohamed",
      images: ["./faces/mohamed/1.jpg", "./faces/mohamed/2.jpg"],
    },
  ];
  const results = [];
  for (const face of faces) {
    const descriptions = [];
    for (let i = 0; i < face.images.length; i++) {
      const img = await faceapi.fetchImage(face.images[i]);
      log(`Processing image: ${face.images[i]}`);

      const detections = await faceapi
        .detectSingleFace(img)
        .withFaceLandmarks()
        .withFaceDescriptor();
      if (!detections) {
        log(`No face detected in ${face.label + ": " + face.images[i]}`);
        continue;
      }
      descriptions.push(detections.descriptor);
    }
    const result = new faceapi.LabeledFaceDescriptors(face.label, descriptions);
    results.push(result);
  }
  return results;
}

// Drawing our detections above the video
function detectionsDraw(canvas, faceMatcher, detectionsArray) {
  detectionsArray.forEach((detection) => {
    const faceMatches = faceMatcher.findBestMatch(detection.descriptor);
    const box = detection.detection.box;
    const drawOptions = {
      label: faceMatches.label,
      lineWidth: 2,
      boxColor: "#FF0015",
    };
    const drawBox = new faceapi.draw.DrawBox(box, drawOptions);
    drawBox.draw(canvas);
  });
}

function log(msg) {
  const message = document.createTextNode(msg);
  const li = document.createElement("li");
  li.appendChild(message);
  logs.appendChild(li);
  // Scroll down
  logs.scrollTop = logs.scrollHeight;
}

We have the function log(), which displays any errors or information in a styled UI element:

const logs = document.getElementById("logs");
.....

function log(msg) {
  const message = document.createTextNode(msg);
  const li = document.createElement("li");
  li.appendChild(message);
  logs.appendChild(li);
  // Scroll down to the end of the list
  logs.scrollTop = logs.scrollHeight;
}

The first thing is loading the face-api models:

Promise.all([
  faceapi.nets.ssdMobilenetv1.loadFromUri(MODEL_URI),
  faceapi.nets.faceLandmark68Net.loadFromUri(MODEL_URI),
  faceapi.nets.faceRecognitionNet.loadFromUri(MODEL_URI),
])
  .then(() => log("All models have been loaded."))
  .then(playVideo)
  .catch((err) => {
    console.log(err);
  });

In this code block, we load the model files from the “./models” folder using Promise.all(), which allows us to wait for all the models to load (this will take some time, depending on your hardware).

Then, we call the function playVideo() and of course catch any errors and show them in the console.

Our playVideo() function

function playVideo() {
  if (!navigator.mediaDevices) {
    console.error("mediaDevices not supported");
    return;
  }
  navigator.mediaDevices
    .getUserMedia({
      video: {
        width: { min: 640, ideal: 1280, max: 1920 },
        height: { min: 360, ideal: 720, max: 1080 },
      },
      audio: false,
    })
    .then(function (stream) {
      video.srcObject = stream;
    })
    .catch(function (err) {
      console.log(err);
    });
}

This function first checks whether the browser supports the mediaDevices API; if not, we stop the execution of the function and show an error message in the console: “mediaDevices not supported”.

If it is supported, we call navigator.mediaDevices.getUserMedia(): this shows a prompt asking the user for permission to access the camera (no audio is needed in our case). If the user refuses, the promise rejects and we log the error.

NB: The mediaDevices API only runs in a secure context (HTTPS) or on localhost.
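If you want to check this condition from code before requesting the camera (an optional sketch, not part of the final app.js), the browser exposes window.isSecureContext:

// Optional sketch: warn early when the page is not served over HTTPS or localhost.
if (!window.isSecureContext) {
  log("Not a secure context: getUserMedia will not be available.");
}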

Now we will add an event listener for the video’s play event:

video.addEventListener("play", async () => {
  log("The video is starting to play.");
  log("Loading the faces from the database");
  const labeledFaceDescriptors = await loadLabeledFaceDescriptors();
  log("All faces have been loaded");
  const faceMatcher = new faceapi.FaceMatcher(labeledFaceDescriptors);

  // Creating the canvas
  const canvas = faceapi.createCanvasFromMedia(video);

  // This will force software rendering (instead of hardware-accelerated)
  // Enable it only on low-end configurations
  canvas.willReadFrequently = true;

  videoContainer.appendChild(canvas);

  // Resizing the canvas to cover the video element
  const canvasSize = { width: video.width, height: video.height };
  faceapi.matchDimensions(canvas, canvasSize);
  log("Done.");

  setInterval(async () => {
    const detections = await faceapi
      .detectAllFaces(video)
      .withFaceLandmarks()
      .withFaceDescriptors();

    // Set detections size to the canvas size
    const detectionsArray = faceapi.resizeResults(detections, canvasSize);
    canvas.getContext("2d").clearRect(0, 0, canvas.width, canvas.height);

    detectionsDraw(canvas, faceMatcher, detectionsArray);
  }, 10000);
});

In this listener, we first load the face descriptors from our faces folder. What does this mean?

const labeledFaceDescriptors = await loadLabeledFaceDescriptors();

The loadLabeledFaceDescriptors() function will load all images from this folder and, for each sub-folder (or person), will create:

  • A label: the name of the person.

  • A descriptor: a feature vector specific to this person, which we will use for the recognition mechanism.

  • The function uses the following array to describe the sub-folder structure.

const faces = [
  {
    id: 1,
    label: "mahrez",
    images: ["./faces/mahrez/1.jpg", "./faces/mahrez/2.jpg"],
  },
  {
    id: 2,
    label: "mohamed",
    images: ["./faces/mohamed/1.jpg", "./faces/mohamed/2.jpg"],
  },
];
  • You could use a database instead; this array is just for demonstration and to keep things simple.

After loading the descriptors, we search for matches in the video using faceapi.FaceMatcher:

const faceMatcher = new faceapi.FaceMatcher(labeledFaceDescriptors);
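FaceMatcher also accepts an optional distance threshold as its second argument (the library’s default is 0.6). As a sketch, a lower threshold makes the matcher stricter, so borderline faces are labelled ‘unknown’ more often:

// Sketch: a stricter matcher; any face further than 0.5 from every
// labeled descriptor will be reported as "unknown".
const strictMatcher = new faceapi.FaceMatcher(labeledFaceDescriptors, 0.5);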

In this part, we create a canvas, which will be used to draw our detection boxes:

// Creating the canvas
const canvas = faceapi.createCanvasFromMedia(video);

// This will force software rendering (instead of hardware-accelerated)
// Enable it only on low-end configurations
canvas.willReadFrequently = true;

videoContainer.appendChild(canvas);

// Resizing the canvas to cover the video element
const canvasSize = { width: video.width, height: video.height };
faceapi.matchDimensions(canvas, canvasSize);
log("Done.");

Now we start a timer to scan the video every 10 seconds, check if there is any match, and draw a box with a label:

— containing the name of the recognized person,

or

— containing ‘unknown’ if the person isn’t recognized.

setInterval(async () => {
  const detections = await faceapi
    .detectAllFaces(video)
    .withFaceLandmarks()
    .withFaceDescriptors();

  // Set detections size to the canvas size
  const detectionsArray = faceapi.resizeResults(detections, canvasSize);
  canvas.getContext("2d").clearRect(0, 0, canvas.width, canvas.height);

  detectionsDraw(canvas, faceMatcher, detectionsArray);
}, 10000);

Full source code

Modifying this application

To add a person to the recognition system:

  • Add the photos to a sub-folder in the “./faces” folder.

  • Add an object with this structure to the faces[] array:

{
  id: id,
  label: "_person_",
  images: ["./faces/_person_/1.jpg", "./faces/_person_/2.jpg"],
},

You can modify the interval in the setInterval(…, 10000) call.
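If a fixed interval is too slow for your use case, one alternative (a hedged sketch, not what the tutorial code does) is to replace the setInterval() block inside the play listener with a requestAnimationFrame loop that detects as fast as your hardware allows:

// Sketch: detect continuously instead of every 10 seconds.
// Uses the canvas, canvasSize and faceMatcher variables from the play listener.
async function detectLoop() {
  const detections = await faceapi
    .detectAllFaces(video)
    .withFaceLandmarks()
    .withFaceDescriptors();
  const detectionsArray = faceapi.resizeResults(detections, canvasSize);
  canvas.getContext("2d").clearRect(0, 0, canvas.width, canvas.height);
  detectionsDraw(canvas, faceMatcher, detectionsArray);
  requestAnimationFrame(detectLoop); // schedule the next pass
}
requestAnimationFrame(detectLoop);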

You can enable or disable canvas.willReadFrequently as needed, by setting it to true or false.

// This will force the use of a software (instead of hardware accelerated)

// Enable only for low configurations

canvas.willReadFrequently = true;

Conclusion

In this tutorial, we have built an AI-capable application for face recognition using only vanilla JavaScript: no back-end, no Python, no extra API calls, all from your browser.

We have tried to keep things as simple as possible; you can extend the application by adding database calls or a better UI.

For example, you could add a notification system: when a person is detected, it shows more information, not just a simple label containing the name.
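As a rough sketch of that idea (using the browser Notification API; the helper name and where you call it are only illustrative):

// Sketch: notify once when a known (non-"unknown") person is recognized.
let notified = false;
function notifyRecognized(label) {
  if (notified || label === "unknown") return;
  notified = true;
  Notification.requestPermission().then((permission) => {
    if (permission === "granted") {
      new Notification("Face recognized", { body: `Hello, ${label}!` });
    }
  });
}

// Inside detectionsDraw(), after findBestMatch():
// notifyRecognized(faceMatches.label);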

You can use this application to:

  • Create an authentication / login system to replace the user/password inputs.

  • Create an alarm system using your CCTV cameras.

  • Portal opening system.

  • Employee attendance (clock-in) system.

  • ….
