Understanding the Real Time Person Removal project – TensorFlow.js

Most of us interested in TensorFlow.js are probably familiar with the popular real-time disappearing-people demo by Jason Mayes. It is a fantastic example of what is possible with TensorFlow.js in the browser, and Jason has graciously shared the code so that others can learn from it and build their own versions of such applications.

In this article, I will walk through and document this code, so that it helps anyone trying to understand how the app works.

At a high level, the app uses the TensorFlow.js pre-trained BodyPix model to identify the person in each frame (using its pixel-wise classification output) and remove them from the screen in real time.

Let’s dive into the code now.

Index.html file

Let’s discuss the two key things in the index.html file.

1. Load the JavaScript packages required for this app

This app requires the TensorFlow.js library and the BodyPix package, which is used to identify the pixels that correspond to a person in the image.

The script.js file is the JavaScript file where the code for this app resides.

<!-- Import TensorFlow.js library -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js" type="text/javascript"></script>
<!-- Load the bodypix model to recognize body parts in images -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-pix@2.0"></script>
<!-- Import the page's JavaScript to do some stuff -->
<script src="script.js" defer></script>

2. The video element

There is a div element (id="liveView") defined in index.html that wraps the video element. The video element displays the live video from the webcam, whose frames are then processed through the BodyPix model.

<div id="liveView" class="webcam">
<button id="webcamButton">Enable Webcam</button>
<video id="webcam" autoplay></video>
</div>

We are now ready to walk through the code in the script.js file, where all the action happens.

Script.js file

Let’s go through this code step by step to understand it in detail.

1. Declare the required objects and configs

To start with, declare the variables and the configurations required as input for the BodyPix model. The object “bodyPixProperties” contains the parameters that will be passed in while loading the BodyPix model, and the object “segmentationProperties” holds the parameters that will be passed in during person segmentation. Please refer to the BodyPix model’s GitHub page for a better understanding of the configurations in these two objects.

const video = document.getElementById('webcam');
const liveView = document.getElementById('liveView');
const demosSection = document.getElementById('demos');

const DEBUG = false;

// An object to configure parameters to set for the bodypix model.
// See github docs for explanations.
const bodyPixProperties = {
  architecture: 'MobileNetV1',
  outputStride: 16,
  multiplier: 0.75,
  quantBytes: 4
};

// An object to configure parameters for detection. I have raised
// the segmentation threshold to 90% confidence to reduce the
// number of false positives.
const segmentationProperties = {
  flipHorizontal: false,
  internalResolution: 'high',
  segmentationThreshold: 0.9,
  scoreThreshold: 0.2
};
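
For reference, the BodyPix documentation lists other supported values for these parameters. A hypothetical lighter configuration (my own illustration, not part of this project) that trades some accuracy for speed might look like this:

// Hypothetical lighter-weight configuration (illustration only);
// see the BodyPix GitHub docs for all supported values.
const fasterBodyPixProperties = {
  architecture: 'MobileNetV1',
  outputStride: 16,
  multiplier: 0.5,   // smaller MobileNet, faster but less accurate
  quantBytes: 2      // smaller weight files to download
};

const fasterSegmentationProperties = {
  flipHorizontal: false,
  internalResolution: 'medium',  // lower internal resolution speeds up segmentation
  segmentationThreshold: 0.9
};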

2. Load the BodyPix model

The object bodyPix is available because we loaded the body-pix package using a script tag in the index.html file. Invoke the load method, passing the “bodyPixProperties” object as input. This method returns a promise which resolves to the BodyPix model object. Once the model is loaded, the variable “modelHasLoaded” is set to true and the “section” element (id=demos) which wraps the div element (liveView) is made visible, which displays the “Enable Webcam” button on the screen.

// Let's load the model with our parameters defined above.
// Before we can use bodypix class we must wait for it to finish
// loading. Machine Learning models can be large and take a moment to
// get everything needed to run.
var modelHasLoaded = false;
var model = undefined;

model = bodyPix.load(bodyPixProperties).then(function (loadedModel) {
  model = loadedModel;
  modelHasLoaded = true;
  // Show demo section now model is ready to use.
  demosSection.classList.remove('invisible');
});
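
Since bodyPix.load() returns a promise, the same loading step could equally be written with async/await; here is a minimal sketch of that alternative (my own variation, not code from the original project):

// Equivalent loading logic written with async/await (sketch only).
async function loadBodyPixModel() {
  model = await bodyPix.load(bodyPixProperties);
  modelHasLoaded = true;
  // Show demo section now model is ready to use.
  demosSection.classList.remove('invisible');
}

loadBodyPixModel();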

3. Declare the canvas elements

There are two canvas elements created

  • The “videoRenderCanvas” is used here as an in-memory canvas that will temporarily hold the video frames during classification of the image on each frame. This canvas is not added to the DOM and will not be displayed on the screen. 
  • The “webcamCanvas” element is the one that is used to render the output frames after removing the person. This is the one displayed at the bottom of the screen (below the live video) where the person is invisible. 

Please note that there is a third canvas element named “bodyPixCanvas” that is commented out, as it is not used.

// We will create a temporary canvas to render to store frames from
// the webcam stream for classification.
var videoRenderCanvas = document.createElement('canvas');
var videoRenderCanvasCtx = videoRenderCanvas.getContext('2d');
// Lets create a canvas to render our findings to the DOM.
var webcamCanvas = document.createElement('canvas');
webcamCanvas.setAttribute('class', 'overlay');
liveView.appendChild(webcamCanvas);
// Create a canvas to render ML findings from to manipulate.
/* var bodyPixCanvas = document.createElement('canvas');
bodyPixCanvas.setAttribute('class', 'overlay');
var bodyPixCanvasCtx = bodyPixCanvas.getContext('2d');
bodyPixCanvasCtx.fillStyle = '#FF0000';
liveView.appendChild(bodyPixCanvas); */

4. Add event listener to the “Enable Webcam” button

Check whether the webcam is supported, and then add an event listener to the “Enable Webcam” button’s “click” event, passing the enableCam() function as the callback.

// Check if webcam access is supported.
function hasGetUserMedia() {
  return !!(navigator.mediaDevices &&
    navigator.mediaDevices.getUserMedia);
}

// If webcam supported, add event listener to button for when user
// wants to activate it.
if (hasGetUserMedia()) {
  const enableWebcamButton = document.getElementById('webcamButton');
  enableWebcamButton.addEventListener('click', enableCam);
} else {
  console.warn('getUserMedia() is not supported by your browser');
}

5. enableCam() function

Let’s break down the code in this function and look at it in detail.

5.1 Check that the BodyPix model is loaded

  • If the BodyPix model is not loaded, then exit from the function. 
  • Hide the “Enable Webcam” button after it is clicked once as the video will start playing and classification will be done continuously on the live feed.

function enableCam(event) {
  if (!modelHasLoaded) {
    return;
  }

  // Hide the button.
  event.target.classList.add('removed');
  ...
}

5.2 Activate the webcam

Declare the parameters (the “constraints” object) that will be passed to the getUserMedia() function. Since this app only requires the video feed, only the video parameter is set to “true”.

Invoke the getUserMedia() function which returns a Promise that resolves to the live webcam video stream. 

Add an event listener for the “loadedmetadata” event. In the callback function of this event listener, first update the width and height of the two canvas elements that we declared earlier to the video element’s width and height. The width and height of the video and canvas elements should match for the BodyPix model to process the frame and output the results accurately.

Draw the first frame of the video onto the “webcamCanvas”, which is displayed below the live video on the screen.

Assign the video stream as the source for the video element, so that the live view from the webcam is displayed on the screen.

Finally, add an event listener to the “loadeddata” event of the video object, passing the predictWebcam() function as the callback.

// Enable the live webcam view and start classification.
function enableCam(event) {
  ...
  // getUsermedia parameters.
  const constraints = {
    video: true
  };

  // Activate the webcam stream.
  navigator.mediaDevices.getUserMedia(constraints).then(function(stream) {
    video.addEventListener('loadedmetadata', function() {
      // Update widths and heights once video is successfully played
      // otherwise it will have width and height of zero initially causing
      // classification to fail.
      webcamCanvas.width = video.videoWidth;
      webcamCanvas.height = video.videoHeight;
      videoRenderCanvas.width = video.videoWidth;
      videoRenderCanvas.height = video.videoHeight;
      /* bodyPixCanvas.width = video.videoWidth;
      bodyPixCanvas.height = video.videoHeight; */
      let webcamCanvasCtx = webcamCanvas.getContext('2d');
      webcamCanvasCtx.drawImage(video, 0, 0);
    });

    video.srcObject = stream;
    video.addEventListener('loadeddata', predictWebcam);
  });
}

6. predictWebcam() function

This function is invoked repeatedly; each frame is passed to the BodyPix model for classification, and the output of the model is then processed to remove the person from the frame.

Copy the video frame to the “videoRenderCanvas”, which exists only in memory and is not displayed on the screen.

Invoke model.segmentPerson(), passing the “videoRenderCanvas” (which holds the current frame of the video) and the “segmentationProperties” object that was declared earlier. This method returns a Promise that resolves to the model’s output object, “segmentation”. Once the output from the model is available, invoke the processSegmentation() function to process the model’s output and remove the person from the image. The processed image is then displayed in the “webcamCanvas” element.

Finally, pass this function to window.requestAnimationFrame() so that it is called again when the browser is ready to process the next frame of the video.

var previousSegmentationComplete = true;

// This function will repeatedly call itself when the browser is ready to
// process the next frame from the webcam.
function predictWebcam() {
  if (previousSegmentationComplete) {
    // Copy the video frame from the webcam to a temporary canvas in memory only (not in the DOM).
    videoRenderCanvasCtx.drawImage(video, 0, 0);
    previousSegmentationComplete = false;
    // Now classify the canvas image we have available.
    model.segmentPerson(videoRenderCanvas, segmentationProperties).then(function(segmentation) {
      processSegmentation(webcamCanvas, segmentation);
      previousSegmentationComplete = true;
    });
  }
  // Call this function again to keep predicting when the browser is ready.
  window.requestAnimationFrame(predictWebcam);
}

7. processSegmentation() function

The BodyPix model’s output object contains a “width”, a “height”, an “allPoses” array, and a “data” array with one entry (0 or 1) for every pixel in the image. Pixels that belong to a person’s body are marked with 1s in the “data” array, all others with 0s.

{
  width: 640,
  height: 480,
  data: Uint8Array(307200) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, …],
  allPoses: [{"score": 0.4, "keypoints": […]}, …]
}
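
As a small illustration of how this output can be consumed (a hypothetical helper, not part of the original project), the number of pixels classified as a person can be counted directly from the data array:

// Hypothetical helper: count how many pixels were classified as "person".
function countPersonPixels(segmentation) {
  let count = 0;
  for (let i = 0; i < segmentation.data.length; i++) {
    if (segmentation.data[i] === 1) {
      count++;
    }
  }
  return count;  // 0 means no person pixels were found in this frame
}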

At this point, the “webcamCanvas” element holds the previous frame that was drawn onto it, while the video element holds the live frame from the webcam, and we have the pixel-wise classification output for that live frame.

If the model’s output detects a person’s body, the idea is to loop through all the pixels in the image and update the “webcamCanvas” pixels with the live image data only where no person is present, leaving the region where the person’s body was found untouched so that it keeps showing the previous frame’s data. (As we will see in section 7.4, the code actually skips an expanded bounding box around the detected body rather than checking the per-pixel mask directly.)

If the model’s output does not detect a person’s body, all the pixels of the “webcamCanvas” element are updated with the live image data from the video element.

Let’s look at the code in detail.

7.1 Get the image data

The “webcamCanvas” element currently holds the previous frame image. Get that image by invoking the ctx.getImageData() function. 

The “videoRenderCanvas” element currently holds the live video frame. Get that image by invoking the videoRenderCanvasCtx.getImageData() function.

Get the data property (array) from the ImageData of both “webcamCanvas” and “videoRenderCanvas” elements.

// Render returned segmentation data to a given canvas context.
function processSegmentation(canvas, segmentation) {
  var ctx = canvas.getContext('2d');
  console.log(segmentation);

  // Get data from our overlay canvas which is attempting to estimate background.
  var imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  var data = imageData.data;
  // Get data from the live webcam view which has all data.
  var liveData = videoRenderCanvasCtx.getImageData(0, 0, canvas.width, canvas.height);
  var dataL = liveData.data;
  ...
}

7.2 Identify the bounding box coordinates for the person’s body

As a next step, the rectangular bounding box coordinates for the person’s body in the image are identified. Loop through all pixels of the image and compute the bounding box coordinates from those pixel indexes where a 1 is present in the “data” array of the model’s output.

In the code below, the outer for loop iterates over the entire width of the image and the inner for loop iterates over the entire height, so for each x value every y value is visited. The variable “n” is the index position of the pixel, in other words the nth pixel of the image in row-major order. The short example below illustrates this index calculation.
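
As a quick illustration of the index calculation (using a hypothetical 640 × 480 frame, matching the sample output shown earlier):

// Hypothetical 640x480 frame: locate pixel (x = 2, y = 1) in row-major order.
const width = 640;
const x = 2;
const y = 1;
const n = y * width + x;  // 1 * 640 + 2 = 642
// segmentation.data[642] is 1 if this pixel belongs to a person, 0 otherwise.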

function processSegmentation(canvas, segmentation) {
  ...
  var minX = 100000;
  var minY = 100000;
  var maxX = 0;
  var maxY = 0;
  var foundBody = false;

  // Go through pixels and figure out the bounding box of body pixels.
  for (let x = 0; x < canvas.width; x++) {
    for (let y = 0; y < canvas.height; y++) {
      let n = y * canvas.width + x;
      // Human pixel found. Update bounds.
      if (segmentation.data[n] !== 0) {
        if (x < minX) {
          minX = x;
        }
        if (y < minY) {
          minY = y;
        }
        if (x > maxX) {
          maxX = x;
        }
        if (y > maxY) {
          maxY = y;
        }
        foundBody = true;
      }
    }
  }
  ...
}

7.3 Calculate the dimensions of the bounding box

Get the width and height of the bounding box by subtracting the minimum X and Y coordinates from the maximum X and Y coordinates.

Increase the width and height by a factor of 1.3 to account for false negatives around the region where the body is identified. Since the width and height are scaled up, recalculate the top-left X and Y coordinates so that the scaled bounding box stays centred on the original one.

function processSegmentation(canvas, segmentation) {
  ...
  // Calculate dimensions of bounding box.
  var width = maxX - minX;
  var height = maxY - minY;

  // Define scale factor to use to allow for false negatives around this region.
  var scale = 1.3;

  // Define scaled dimensions.
  var newWidth = width * scale;
  var newHeight = height * scale;

  // Calculate the offset to place a new bounding box so scaled from the center of the current bounding box.
  var offsetX = (newWidth - width) / 2;
  var offsetY = (newHeight - height) / 2;
  var newXMin = minX - offsetX;
  var newYMin = minY - offsetY;
  ...
}
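
To make the arithmetic concrete, here is a small worked example with hypothetical bounding box values (not taken from the project):

var minX = 200, maxX = 300;           // hypothetical values
var width = maxX - minX;              // 100
var newWidth = width * 1.3;           // 130
var offsetX = (newWidth - width) / 2; // 15
var newXMin = minX - offsetX;         // 185 -> the box grows by 15px on each side
// The same calculation applies to the Y axis.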

7.4 Update the background of the canvas outside of the bounding box

Loop through all the pixels; if a person’s body was identified, update only those pixels of the “webcamCanvas” element that lie outside the bounding box coordinates with the live video data.

If no person’s body was identified in the current frame, update all the pixels of the “webcamCanvas” element with the live video data.

7.4.1 How the image data is updated

The “ImageData” object returned by the getImageData() function contains information about every pixel of the image. The “data” property of the “ImageData” object is an array holding the color/alpha information for every pixel in the image.

Please refer to the W3Schools page on the getImageData() function (reference 3 below) to understand more about the function and its data property.

To quote from that link,
“For every pixel in an ImageData object there are four pieces of information, the RGBA values:

R - The color red (from 0-255)
G - The color green (from 0-255)
B - The color blue (from 0-255)
A - The alpha channel (from 0-255; 0 is transparent and 255 is fully visible)

The color/alpha information is held in an array, and is stored in the data property of the ImageData object.”

The “data” property of “ImageData” will contain an array of size width * height * 4, where width and height are the dimensions of the image. 

So, for every pixel, 4 consecutive items in the “ImageData.data” array hold the color/alpha values of that pixel.
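
Concretely, for the nth pixel (the same row-major index n used in the bounding box loop above), the four channel values sit at consecutive offsets:

// Channel values of the nth pixel in an ImageData.data array:
// data[n * 4]     -> red   (0-255)
// data[n * 4 + 1] -> green (0-255)
// data[n * 4 + 2] -> blue  (0-255)
// data[n * 4 + 3] -> alpha (0-255; 255 = fully opaque)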

Finally, draw the ImageData onto the “webcamCanvas” element by invoking the putImageData() function on its context.

function processSegmentation(canvas, segmentation) {
  ...
  // Now loop through update background understanding with new data
  // if not inside a bounding box.
  for (let x = 0; x < canvas.width; x++) {
    for (let y = 0; y < canvas.height; y++) {
      // If outside the bounding box and we found a body, update the background.
      if (foundBody && (x < newXMin || x > newXMin + newWidth) || (y < newYMin || y > newYMin + newHeight)) {
        // Convert xy co-ords to array offset.
        let n = y * canvas.width + x;
        data[n * 4] = dataL[n * 4];
        data[n * 4 + 1] = dataL[n * 4 + 1];
        data[n * 4 + 2] = dataL[n * 4 + 2];
        data[n * 4 + 3] = 255;
      } else if (!foundBody) {
        // No body found at all, update all pixels.
        let n = y * canvas.width + x;
        data[n * 4] = dataL[n * 4];
        data[n * 4 + 1] = dataL[n * 4 + 1];
        data[n * 4 + 2] = dataL[n * 4 + 2];
        data[n * 4 + 3] = 255;
      }
    }
  }
  ctx.putImageData(imageData, 0, 0);
}

That’s it. I hope this helps you understand how the app works a little more quickly. Please leave your comments below if you have any questions.

References

  1. GitHub repository for the Real-Time-Person-Removal project by Jason Mayes.
  2. GitHub repository for the TensorFlow.js pre-trained BodyPix model.
  3. HTML canvas getImageData() method from W3Schools.
  4. Real-time Human Pose Estimation in the Browser with TensorFlow.js using PoseNet model.
