Scala and OpenCV Ep 2: Akka Face Detector

In Episode 1 of this series on Scala and computer vision, we created a basic Akka-Streams-powered webcam feed app. To bring it to the next level, we will dig a little deeper into the OpenCV toolset and bring in feature detection as well as video stream editing.

'Beachape face detected'

We will build on the foundations from the previous post and continue with the usage of Akka Streams, modeling our application as a series of small transformations that are run asynchronously, with backpressure handled automatically.

Flow chart

Previously, our app could be represented by a somewhat trivial flow chart that nonetheless had all the elements of a useful Akka stream: a Source, multiple transformations, and controlled side-effecting.

To build our face detector, we will add the following:

Conversion to grey scale: Many image analysis tools need to be run on greyscale images, both for simplicity and efficiency.
Facial features detector: We will make use of OpenCV’s Haar Cascade feature detection API to detect and identify faces in our video feed.
Video editing: We want to draw rectangles onto the image around the faces that have been identified.

Our updated flow chart is as follows (new transformations are highlighted by a light green rectangle):

Greyscale

To convert a given Mat to a greyscale Mat, we can make use of the OpenCV method cvtColor. The only slight niggle is that the method isn’t idempotent: if you try to convert a greyscale image to greyscale, the method will throw. No matter, we can handle that scenario ourselves by checking the number of channels in the matrix.

def toGreyScale(mat: Mat): Mat = {
  if (mat.channels() == 1) {
    mat // just hand back the matrix as is; it is already grey
  } else {
    // allocate a new Matrix with the same dimensions
    val greyMat = {
      val (rows, cols) = (mat.rows(), mat.cols())
      new Mat(rows, cols, CV_8U)
    }
    opencv_imgproc.cvtColor(mat, greyMat, COLOR_BGR2GRAY, 1)
    greyMat
  }
}

However, since we want to pass both the original colour image and the new greyscale image down the pipeline, we’ll make things a bit easier for ourselves by defining a simple WithGrey case class to hold both:

object WithGrey {

  /**
   * Simple transformer method that produces a [[WithGrey]]
   */
  def build(orig: Mat): WithGrey = {
    val grey = toGreyScale(orig)
    WithGrey(orig = orig, grey = grey)
  }

  // toGreyScale is in here too
}

/**
 * Original Matrix with a Grey image. Useful because almost all analysis processing requires a greyscale image instead of
 * a colour image.
 *
 * The constructor is private to make sure we don't mix up the two references
 *
 * Passing [[WithGrey]] images along with the original saves us from having to process to grey scale over and over again.
 */
final case class WithGrey private (orig: Mat, grey: Mat)
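
To get a feel for what the greyscale step computes, here is a minimal, OpenCV-free sketch. It assumes 8-bit pixels in BGR order and uses the standard luma weights (0.299 R, 0.587 G, 0.114 B) that OpenCV documents for its BGR-to-grey conversion. The Image class and GreySketch object are hypothetical stand-ins for Mat and are not part of this tutorial’s code. Like toGreyScale, the sketch hands single-channel input back untouched.

```scala
object GreySketch {

  /** Hypothetical stand-in for a Mat: each pixel is a Vector of channel values (0-255). */
  final case class Image(channels: Int, pixels: Vector[Vector[Int]])

  /** Weighted sum of one BGR pixel, rounded to the nearest integer. */
  private def luma(px: Vector[Int]): Int =
    math.round(0.114 * px(0) + 0.587 * px(1) + 0.299 * px(2)).toInt

  /** Converts a 3-channel BGR image to 1 channel; 1-channel input is returned as is. */
  def toGrey(img: Image): Image =
    if (img.channels == 1) img // already grey, nothing to do
    else Image(channels = 1, pixels = img.pixels.map(px => Vector(luma(px))))
}
```

Calling toGrey on its own output gives back the same image, which is exactly the property the channel check buys us.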

Face detection

To find faces in the images in our video feed, we will make use of Haar feature-based cascade classifiers, which are supported directly by OpenCV. Haar Cascade classifiers define how to look at an image and quickly identify any areas in it that are of interest to us. A given classifier definition will usually contain multiple stages, and a region is considered to test positive only if all features in all stages of the definition return positive (hence “cascade”).
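
The cascade idea can be sketched in a few lines of plain Scala. This is a deliberately simplified toy that follows the description above rather than OpenCV’s actual implementation; Region, Stage and Cascade are hypothetical names, and a “feature” here is just a predicate on some numbers.

```scala
object CascadeSketch {

  /** Hypothetical stand-in for an image region: just a few numeric measurements. */
  type Region = Vector[Double]

  /** A stage passes only when every one of its feature checks returns true. */
  final case class Stage(features: Seq[Region => Boolean]) {
    def passes(region: Region): Boolean = features.forall(f => f(region))
  }

  /** A cascade accepts a region only when every stage passes. */
  final case class Cascade(stages: Seq[Stage]) {
    // forall stops at the first failing stage, so regions that are
    // obviously uninteresting are rejected cheaply by the early stages
    def accepts(region: Region): Boolean = stages.forall(_.passes(region))
  }
}
```

The early exit is what makes scanning many regions affordable: most of them never reach the later stages.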

In actual usage, this relies on careful training and tuning of classifier definitions, as well as a combination of clever mathematics and pragmatic optimisation for detection. I will not cover exactly how they work in this tutorial (my understanding is dubious and there is a wealth of information online about them), but the following are a couple of links that really helped me understand the theory behind them and how they work in practice:

OpenCV’s Haar Classifier API (or perhaps JavaCV’s wrapping of it) is fairly straightforward and boils down to:

Instantiating a CascadeClassifier, passing in a path to a classifier definition (you can find some here) as a constructor argument
Instantiating a RectVector, which is aptly named because it is a wrapper for a native vector of rectangles
Passing the allocated RectVector to the CascadeClassifier’s detectMultiScale method along with a greyscale image and some other options (yes, OpenCV will mutate the RectVector you pass in by adding Rects to it)

In our implementation of a face detector, we’ll wrap a few raw (but aliased) primitives that serve as option flags in OpenCV, just for our own sanity. We’ll also create a delegator class that has a detect(withGrey: WithGrey): (WithGrey, Seq[Face]) method and wraps the classifier to hold constant values for the classifier options because for our purposes, those won’t be changing on the fly.

Tuple-like class for holding width and height in pixels (Dimensions.scala)

/**
 * Tuple-like class for holding width and height in pixels
 */
case class Dimensions(width: Int, height: Int)

Nothing face-specific in this class per se; it can hold ids and Rects for any detected object (Face.scala)

/**
  * Holds an id and an OpenCV Rect defining the corners of a rectangle.
  *
  * There is nothing *face* specific in this class per se; it can hold ids and Rects for any detected
  * object
  */
case class Face(id: Long, faceRect: Rect)

Haar classifier option wrapper class (HaarDetectorFlag.scala)

sealed abstract class HaarDetectorFlag(val flag: Int)

case object HaarDetectorFlag {

  case object DoCannyPruning extends HaarDetectorFlag(CV_HAAR_DO_CANNY_PRUNING)
  case object ScaleImage extends HaarDetectorFlag(CV_HAAR_SCALE_IMAGE)
  case object FindBiggestObject extends HaarDetectorFlag(CV_HAAR_FIND_BIGGEST_OBJECT)
  case object DoRoughSearch extends HaarDetectorFlag(CV_HAAR_DO_ROUGH_SEARCH)

}

Face detector class that holds a Haar classifier (FaceDetector.scala)

object FaceDetector {

  /**
   * Builds a FaceDetector with the default Haar Cascade classifier in the resource directory
   */
  def defaultCascadeFile(
    dimensions: Dimensions,
    scaleFactor: Double = 1.3,
    minNeighbours: Int = 3,
    detectorFlag: HaarDetectorFlag = HaarDetectorFlag.DoCannyPruning,
    minSize: Dimensions = Dimensions(width = 30, height = 30),
    maxSize: Option[Dimensions] = None
  ): FaceDetector = {
    val classLoader = this.getClass.getClassLoader
    val faceXml = classLoader.getResource("haarcascade_frontalface_alt.xml").getPath
    new FaceDetector(
      dimensions = dimensions,
      classifierPath = faceXml,
      scaleFactor = scaleFactor,
      minNeighbours = minNeighbours,
      detectorFlag = detectorFlag,
      minSize = minSize,
      maxSize = maxSize
    )
  }
}

class FaceDetector(
    val dimensions: Dimensions,
    classifierPath: String,
    scaleFactor: Double = 1.3,
    minNeighbours: Int = 3,
    detectorFlag: HaarDetectorFlag = HaarDetectorFlag.ScaleImage,
    minSize: Dimensions = Dimensions(width = 30, height = 30),
    maxSize: Option[Dimensions] = None
) {

  private val faceCascade = new CascadeClassifier(classifierPath)

  private val minSizeOpenCV = new Size(minSize.width, minSize.height)
  private val maxSizeOpenCV = maxSize.map(d => new Size(d.width, d.height)).getOrElse(new Size())

  /**
   * Given a frame matrix, returns it along with a sequence of the faces detected in it
   */
  def detect(frameMatWithGrey: WithGrey): (WithGrey, Seq[Face]) = {
    val currentGreyMat = frameMatWithGrey.grey
    val faceRects = findFaces(currentGreyMat)
    val faces = for {
      i <- 0L until faceRects.size()
      faceRect = faceRects.get(i)
    } yield Face(i, faceRect)
    (frameMatWithGrey, faces)
  }

  private def findFaces(greyMat: Mat): RectVector = {
    val faceRects = new RectVector()
    faceCascade.detectMultiScale(greyMat, faceRects, scaleFactor, minNeighbours, detectorFlag.flag, minSizeOpenCV, maxSizeOpenCV)
    faceRects
  }

}

To be clear, there is really nothing face-specific in our classifier because what it detects is entirely dependent on the Haar cascade XML file passed to it on construction.

Drawing rectangles

Once we have a list of rectangles that denote where our objects are in the image matrix, the last thing we need to do is draw the rectangles on the original image matrix. OpenCV provides a rectangle method that takes a Mat and two points denoting the top left and bottom right corners of a rectangle, and draws the rectangle onto the matrix in place. Here again, our implementation will clone the matrix first before calling the OpenCV method so as to keep our code easy to reason about.

(FaceDrawer.scala)

class FaceDrawer(fontScale: Float = 0.6f) {

  private val RedColour = new Scalar(AbstractCvScalar.RED)

  /**
   * Clones the Mat, draws squares around the faces on it using the provided [[Face]] sequence and returns the new Mat
   */
  def drawFaces(withGrey: WithGrey, faces: Seq[Face]): Mat = {
    val clonedMat = withGrey.orig.clone()
    for (f <- faces) drawFace(clonedMat, f)
    clonedMat
  }

  private def drawFace(clonedMat: Mat, f: Face): Unit = {
    rectangle(
      clonedMat,
      new Point(f.faceRect.x, f.faceRect.y),
      new Point(f.faceRect.x + f.faceRect.width, f.faceRect.y + f.faceRect.height),
      RedColour,
      1,
      CV_AA,
      0
    )

    // draw the face number
    val cvPoint = new Point(f.faceRect.x, f.faceRect.y - 20)
    putText(clonedMat, s"Face ${f.id}", cvPoint, FONT_HERSHEY_SIMPLEX, fontScale, RedColour)
  }

}

Our FaceDrawer exposes a drawFaces method that takes a WithGrey along with a list of detected Faces and uses OpenCV’s rectangle method to draw a rectangle around each face. We also make use of OpenCV’s putText method to write the word “Face” along with a number right on top of the rectangle.
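
The corner arithmetic is easy to check in isolation. The sketch below uses hypothetical Box and Pt classes in place of OpenCV’s Rect and Point. It also adds one thing FaceDrawer itself does not do: it clamps the label position so that it cannot end up above the top edge of the image when a face sits very close to it (putText treats the point it is given as the bottom-left corner of the text). Treat that clamp as an optional refinement.

```scala
object RectSketch {

  /** Hypothetical stand-ins for OpenCV's Rect and Point. */
  final case class Box(x: Int, y: Int, width: Int, height: Int)
  final case class Pt(x: Int, y: Int)

  /** Top-left and bottom-right corners, as expected by a rectangle-drawing call. */
  def corners(b: Box): (Pt, Pt) =
    (Pt(b.x, b.y), Pt(b.x + b.width, b.y + b.height))

  /**
   * Where to anchor the label: `offset` pixels above the box, but never
   * closer than `offset` pixels to the top of the image (the clamp is our
   * own addition, not something FaceDrawer does).
   */
  def labelOrigin(b: Box, offset: Int = 20): Pt =
    Pt(b.x, math.max(offset, b.y - offset))
}
```

For a box at (10, 100) that is 30 wide and 40 high, the corners come out as (10, 100) and (40, 140), and the label is anchored at (10, 80).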

UI

We’ll hook up all our components in a simple Swing app. To make things a little more interesting, the app will consist of 2 frames:

An initial frame to allow the user to choose between loading a custom Haar cascade classifier file or to load the default one that’s packaged in resources
The actual CanvasFrame that shows our feed along with rectangles around detected objects

WebcamFaceDetector UI (WebcamFaceDetector.scala)

object WebcamFaceDetector extends SimpleSwingApplication {

  def top: Frame = new OptionsFrame

  /**
   * This is the initial frame, which presents two simple options, to load a custom Haar cascade file for face detection,
   * or to use the default one
   */
  private class OptionsFrame extends Frame { currentFrame =>

    peer.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE)

    val imageDimensions = Dimensions(width = 640, height = 480)
    val chooseCascadeBtn = Button("Load custom Haar cascade file") {
      val filePath = openChooser()
      filePath.foreach { path =>
        val detector = new FaceDetector(dimensions = imageDimensions, classifierPath = path)
        openFaceDetectionWindow(detector)
      }
    }
    val defaultCascadeBtn = Button("Use default face Haar cascade file") {
      val detector = FaceDetector.defaultCascadeFile(imageDimensions)
      openFaceDetectionWindow(detector)
    }

    val mainPanel = new GridPanel(rows0 = 0, cols0 = 1) {
      preferredSize = new Dimension(300, 200)
      contents ++= Seq(chooseCascadeBtn, defaultCascadeBtn)
    }

    contents = mainPanel

    private def openChooser(): Option[String] = {
      val chooser = new FileChooser(new java.io.File("."))
      chooser.fileSelectionMode = FileChooser.SelectionMode.FilesOnly
      chooser.showOpenDialog(currentFrame) match {
        case FileChooser.Result.Approve => Some(chooser.selectedFile.toPath.toAbsolutePath.toString)
        case _ => None
      }
    }

    private def openFaceDetectionWindow(faceDetector: FaceDetector): Unit = {
      new DetectionFrame(faceDetector)
      peer.setDefaultCloseOperation(javax.swing.WindowConstants.DO_NOTHING_ON_CLOSE)
      currentFrame.close()
    }

  }

  /**
   * Our detection window; opened by Initial Frame
   */
  private class DetectionFrame(faceDetector: FaceDetector) {

    implicit val system = ActorSystem()
    implicit val materializer = ActorMaterializer()

    val webcamSource = Webcam.source(deviceId = 0, dimensions = faceDetector.dimensions)

    val canvas = new CanvasFrame("Webcam")
    // Exit the application when the canvas frame is closed
    canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE)

    val faceDrawer = new FaceDrawer()

    val flow = webcamSource
      .map(MediaConversion.toMat) // most OpenCV manipulations require a Matrix
      .map(Flip.horizontal)
      .map(WithGrey.build)
      .map(faceDetector.detect)
      .map((faceDrawer.drawFaces _).tupled)
      .map(MediaConversion.toFrame) // convert back to a frame
      .map(canvas.showImage)
      .to(Sink.ignore)

    flow.run()

  }
}

Notice that once again, the code defining the Akka Flow Graph maps almost one to one to our flow chart.
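
One detail in that chain is worth a second look: detect returns a tuple, while drawFaces takes two separate arguments, and (faceDrawer.drawFaces _).tupled is what bridges the two. Here is a minimal sketch of the same trick using plain collections instead of a stream; Frame, detect and draw are hypothetical stand-ins for the real types and methods.

```scala
object TupledSketch {

  /** Hypothetical stand-in for a video frame. */
  final case class Frame(id: Int)

  /** Like FaceDetector.detect: returns the input together with what was found. */
  def detect(frame: Frame): (Frame, Seq[String]) =
    (frame, Seq(s"face-in-${frame.id}"))

  /** Like FaceDrawer.drawFaces: takes two separate arguments. */
  def draw(frame: Frame, faces: Seq[String]): String =
    s"frame ${frame.id} with ${faces.size} face(s)"

  /** `draw _` turns the method into a Function2; `.tupled` makes it accept a pair. */
  val pipeline: Seq[Frame] => Seq[String] =
    frames => frames.map(detect).map((draw _).tupled)
}
```

Without tupled, we would have to write the second step as a pattern-matching lambda, `{ case (frame, faces) => draw(frame, faces) }`, which does the same thing with a bit more noise.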

Conclusion

We now have a face detector that uses OpenCV’s Haar cascade classifier toolbelt and draws rectangles around any identified faces, and we made it by expanding on the Akka Stream foundations laid in the previous post. As before, the code for this tutorial can be found on GitHub.

In the next post, we’ll expand this further by classifying the faces that we’ve detected as smiling or not using a supervised machine-learning model. We could of course continue to use Haar cascades to identify smiles in our feed (we can simply choose to load a smile Haar cascade classifier file), but what would be the fun in that? :)

Credits

Playing with OpenCV in Scala to do face detection with Haarcascade classifier using a webcam