In Episode 1 of this series on Scala and computer vision, we created a basic Akka-Streams-powered webcam feed app. To bring it to the next level, we will dig a little deeper into the OpenCV toolset and bring in feature detection as well as video stream editing.
We will build on the foundations from the previous post and continue with the usage of Akka Streams, modeling our application as a series of small transformations that are run asynchronously, with backpressure handled automatically.
Flow chart
Previously, our app could be represented by a somewhat trivial flow chart that nonetheless had all the elements of a useful Akka stream: a Source, multiple transformations, and controlled side-effecting.
To build our face detector, we will add the following:
Conversion to greyscale: Many image analysis tools need to be run on greyscale images, both for simplicity and efficiency.
Facial features detector: We will make use of OpenCV’s Haar Cascade feature detection API to detect and identify faces in our video feed.
Video editing: We want to draw rectangles on the image around the faces that have been identified.
Our updated flow chart is as follows (new transformations are highlighted by a light green rectangle):
Greyscale
To convert a given Mat to a greyscale Mat, we can make use of the OpenCV method cvtColor. The only slight niggle is that the method isn’t idempotent: if you try to convert a greyscale image to greyscale, the method will throw. No matter, we can handle that scenario ourselves by checking the number of channels in the matrix.
    def toGreyScale(mat: Mat): Mat = {
      if (mat.channels() == 1) {
        mat // just hand back the matrix as is; it is already grey
      } else {
        // allocate a new Matrix with the same dimensions
        val greyMat = {
          val (rows, cols) = (mat.rows(), mat.cols())
          new Mat(rows, cols, CV_8U)
        }
        opencv_imgproc.cvtColor(mat, greyMat, COLOR_BGR2GRAY, 1)
        greyMat
      }
    }
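For intuition about what the conversion produces per pixel, here is a minimal sketch in plain Scala with no OpenCV dependency. It assumes the standard luma weights that OpenCV documents for its colour-to-grey conversion (0.299 R + 0.587 G + 0.114 B); `GreySketch`, `Bgr` and `bgrToGrey` are hypothetical names used only for illustration, and the real cvtColor works on native matrices rather than Scala collections.

```scala
object GreySketch {

  /** One pixel in OpenCV's default channel order: blue, green, red (each 0 to 255). */
  final case class Bgr(b: Int, g: Int, r: Int)

  /** Weighted sum of the three channels, rounded to the nearest integer. */
  def bgrToGrey(p: Bgr): Int =
    math.round(0.299 * p.r + 0.587 * p.g + 0.114 * p.b).toInt

  /** Converts a whole image, represented here as rows of pixels. */
  def toGrey(image: Vector[Vector[Bgr]]): Vector[Vector[Int]] =
    image.map(_.map(bgrToGrey))
}
```

Green contributes the most to the result and blue the least, which is why a pure green pixel comes out much lighter than a pure blue one.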
However, since we want to pass both the original colour image and the new greyscale image down the pipeline, we’ll make things a bit easier for ourselves by defining a simple WithGrey case class to hold both:
    object WithGrey {

      /**
       * Simple transformer method that produces a [[WithGrey]]
       */
      def build(orig: Mat): WithGrey = {
        val grey = toGreyScale(orig)
        WithGrey(orig = orig, grey = grey)
      }

      // toGreyScale is in here too
    }

    /**
     * Original Matrix with a Grey image. Useful because almost all analysis processing requires a greyscale image instead of
     * a colour image.
     *
     * The constructor is private to make sure we don't mix up the two references
     *
     * Passing [[WithGrey]] images along with the original saves us from having to process to grey scale over and over again.
     */
    final case class WithGrey private (orig: Mat, grey: Mat)
Face detection
To find faces in the images in our video feed, we will make use of Haar feature-based cascade classifiers, which are supported directly by OpenCV. Haar cascade classifiers define how to look at an image and quickly identify any areas in it that are of interest to us. A given classifier definition will usually contain multiple stages, and a region is considered to test positive only if it passes every stage of the definition (hence “cascade”); a region that fails any stage is rejected straight away, which keeps detection fast.
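The staged evaluation described above can be sketched in a few lines of plain Scala. This is a toy model under stated assumptions, not OpenCV’s implementation: real stages are trained classifiers built from many Haar features, whereas `Window`, `Stage` and the two checks in `toyStages` are hypothetical stand-ins chosen only to show the early-rejection logic.

```scala
object CascadeSketch {

  /** A candidate image region, reduced here to two made-up summary numbers. */
  final case class Window(eyeBandDarkness: Double, noseBridgeBrightness: Double)

  /** A stage is a named yes/no test on a window. */
  final case class Stage(name: String, passes: Window => Boolean)

  /** Two invented stages, ordered from first to last. */
  val toyStages: Seq[Stage] = Seq(
    Stage("eye-band", w => w.eyeBandDarkness > 0.5),
    Stage("nose-bridge", w => w.noseBridgeBrightness > 0.5)
  )

  /** A window is positive only if every stage passes; forall stops at the first failure. */
  def isPositive(stages: Seq[Stage], w: Window): Boolean =
    stages.forall(_.passes(w))

  /** Name of the first stage that rejects the window, or None if all stages pass. */
  def firstRejection(stages: Seq[Stage], w: Window): Option[String] =
    stages.find(s => !s.passes(w)).map(_.name)
}
```

Because most regions of a typical frame contain no face, rejecting them at an early stage means the later stages only run on a small number of promising candidates.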
In actual usage, this relies on careful training and tuning of classifier definitions, as well as a combination of clever mathematics and pragmatic optimisation for detection. I will not cover exactly how they work in this tutorial (my understanding is dubious and there is a wealth of information online about them), but the following are a couple of links that really helped me understand the theory behind them and how they work in practice:
OpenCV’s Haar Classifier API (or perhaps JavaCV’s wrapping of it) is fairly straightforward and boils down to:
Instantiating a CascadeClassifier, passing in a path to a classifier definition (you can find some here) as a constructor argument
Creating an instance of RectVector, which is aptly named because it is a wrapper for a native vector of rectangles
Passing the allocated instance of RectVector to the CascadeClassifier’s detectMultiScale method along with a greyscale image and some other options (yes, OpenCV will mutate the RectVector you pass in by adding Rects to it)
In our implementation of a face detector, we’ll wrap a few raw (but aliased) primitives that serve as option flags in OpenCV, just for our own sanity. We’ll also create a delegator class that has a detect(withGrey: WithGrey): (WithGrey, Seq[Face]) method and wraps the classifier to hold constant values for the classifier options because for our purposes, those won’t be changing on the fly.
Tuple-like class for holding width and height in pixels (Dimensions.scala)
    /**
     * Tuple-like class for holding width and height in pixels
     */
    case class Dimensions(width: Int, height: Int)
Nothing face-specific in this class per se; it can hold ids and Rects for any detected object (Face.scala)
    /**
     * Holds an id and an OpenCV Rect defining the corners of a rectangle.
     *
     * There is nothing *face* specific in this class per se; it can hold ids and Rects for any detected
     * object
     */
    case class Face(id: Long, faceRect: Rect)
Haar classifier option wrapper class (HaarDetectorFlag.scala)
    object FaceDetector {

      /**
       * Builds a FaceDetector with the default Haar Cascade classifier in the resource directory
       */
      def defaultCascadeFile(
          dimensions: Dimensions,
          scaleFactor: Double = 1.3,
          minNeighbours: Int = 3,
          detectorFlag: HaarDetectorFlag = HaarDetectorFlag.DoCannyPruning,
          minSize: Dimensions = Dimensions(width = 30, height = 30),
          maxSize: Option[Dimensions] = None
      ): FaceDetector = {
        val classLoader = this.getClass.getClassLoader
        val faceXml = classLoader.getResource("haarcascade_frontalface_alt.xml").getPath
        new FaceDetector(
          dimensions = dimensions,
          classifierPath = faceXml,
          scaleFactor = scaleFactor,
          minNeighbours = minNeighbours,
          detectorFlag = detectorFlag,
          minSize = minSize,
          maxSize = maxSize
        )
      }
    }

    class FaceDetector(
        val dimensions: Dimensions,
        classifierPath: String,
        scaleFactor: Double = 1.3,
        minNeighbours: Int = 3,
        detectorFlag: HaarDetectorFlag = HaarDetectorFlag.ScaleImage,
        minSize: Dimensions = Dimensions(width = 30, height = 30),
        maxSize: Option[Dimensions] = None
    ) {

      private val faceCascade = new CascadeClassifier(classifierPath)
      private val minSizeOpenCV = new Size(minSize.width, minSize.height)
      // an empty Size means "no upper bound" when no maxSize is given
      private val maxSizeOpenCV = maxSize.map(d => new Size(d.width, d.height)).getOrElse(new Size())

      /**
       * Given a frame matrix, returns the frame along with a series of detected faces
       */
      def detect(frameMatWithGrey: WithGrey): (WithGrey, Seq[Face]) = {
        val currentGreyMat = frameMatWithGrey.grey
        val faceRects = findFaces(currentGreyMat)
        val faces = for {
          i <- 0L until faceRects.size()
          faceRect = faceRects.get(i)
        } yield Face(i, faceRect)
        (frameMatWithGrey, faces)
      }

      private def findFaces(greyMat: Mat): RectVector = {
        // OpenCV fills this vector in place with one Rect per detection
        val faceRects = new RectVector()
        faceCascade.detectMultiScale(greyMat, faceRects, scaleFactor, minNeighbours, detectorFlag.flag, minSizeOpenCV, maxSizeOpenCV)
        faceRects
      }
    }
To be clear, there is really nothing face-specific in our classifier because what it detects is entirely dependent on the Haar cascade XML file passed to it on construction.
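The FaceDetector listing refers to a HaarDetectorFlag type through `HaarDetectorFlag.DoCannyPruning`, `HaarDetectorFlag.ScaleImage` and `detectorFlag.flag`. A minimal sketch of such a wrapper, assuming it is a sealed hierarchy over OpenCV’s integer `CASCADE_*` option flags, might look like the following. The numeric values are written as literals to keep the sketch dependency-free (they mirror OpenCV’s `CASCADE_DO_CANNY_PRUNING`, `CASCADE_SCALE_IMAGE`, `CASCADE_FIND_BIGGEST_OBJECT` and `CASCADE_DO_ROUGH_SEARCH`); the names `FindBiggestObject` and `DoRoughSearch` and the enclosing `HaarFlagSketch` object are hypothetical, and the actual file in the project may well differ.

```scala
object HaarFlagSketch {

  /** Type-safe wrapper around the raw Int flags accepted by detectMultiScale. */
  sealed abstract class HaarDetectorFlag(val flag: Int)

  object HaarDetectorFlag {
    // In a real project these would reference the JavaCV constants rather than literals.
    case object DoCannyPruning    extends HaarDetectorFlag(1) // CASCADE_DO_CANNY_PRUNING
    case object ScaleImage        extends HaarDetectorFlag(2) // CASCADE_SCALE_IMAGE
    case object FindBiggestObject extends HaarDetectorFlag(4) // CASCADE_FIND_BIGGEST_OBJECT
    case object DoRoughSearch     extends HaarDetectorFlag(8) // CASCADE_DO_ROUGH_SEARCH
  }
}
```

Wrapping the flags this way means a caller cannot accidentally pass an unrelated Int (such as a pixel size) where an option flag is expected.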
Drawing rectangles
Once we have a list of rectangles that denote where our objects are in the image matrix, the last thing we need to do is draw the rectangles on the original image matrix. OpenCV provides a rectangle method that takes a Mat and two points denoting the top left and bottom right corners of a rectangle and draws the rectangle onto the matrix in place. Our implementation will clone the matrix first before calling the OpenCV method so as to keep our code easy to reason about.
    class FaceDrawer(fontScale: Float = 0.6f) {

      private val RedColour = new Scalar(AbstractCvScalar.RED)

      /**
       * Clones the Mat, draws squares around the faces on it using the provided [[Face]] sequence and returns the new Mat
       */
      def drawFaces(withGrey: WithGrey, faces: Seq[Face]): Mat = {
        val clonedMat = withGrey.orig.clone()
        for (f <- faces) drawFace(clonedMat, f)
        clonedMat
      }

      private def drawFace(clonedMat: Mat, f: Face): Unit = {
        rectangle(
          clonedMat,
          new Point(f.faceRect.x, f.faceRect.y),
          new Point(f.faceRect.x + f.faceRect.width, f.faceRect.y + f.faceRect.height),
          RedColour,
          1,
          CV_AA,
          0
        )

        // draw the face number
        val cvPoint = new Point(f.faceRect.x, f.faceRect.y - 20)
        putText(clonedMat, s"Face ${f.id}", cvPoint, FONT_HERSHEY_SIMPLEX, fontScale, RedColour)
      }
    }
Our FaceDrawer exposes a drawFaces method that takes a WithGrey along with a list of detected Faces and uses the private drawFace method to draw rectangles around each face. We also make use of OpenCV’s putText method to write the word “Face” along with a number just above the rectangle.
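The clone-then-draw approach and the corner arithmetic (top left at x, y and bottom right at x + width, y + height) can be tried out without OpenCV. The following is a minimal sketch in plain Scala that uses a nested array of Ints as a stand-in for a Mat; `DrawSketch`, `Box` and `drawOutline` are hypothetical names, and the out-of-bounds check is an assumption of this sketch, not a description of OpenCV’s own clipping.

```scala
object DrawSketch {

  /** Stand-in for an OpenCV Rect: top-left corner plus size. */
  final case class Box(x: Int, y: Int, width: Int, height: Int) {
    def topLeft: (Int, Int) = (x, y)
    def bottomRight: (Int, Int) = (x + width, y + height)
  }

  /** Returns a copy of `image` with a 1-pixel outline drawn; the input is left untouched. */
  def drawOutline(image: Array[Array[Int]], box: Box, colour: Int): Array[Array[Int]] = {
    val copy = image.map(_.clone()) // deep copy, row by row
    val (x0, y0) = box.topLeft
    val (x1, y1) = box.bottomRight
    for (y <- y0 to y1; x <- x0 to x1) {
      val onEdge = y == y0 || y == y1 || x == x0 || x == x1
      val inBounds = y >= 0 && y < copy.length && x >= 0 && x < copy(y).length
      if (onEdge && inBounds) copy(y)(x) = colour
    }
    copy
  }
}
```

Returning a fresh copy means a caller holding the original image never sees it change underneath them, at the cost of one extra allocation per frame.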
UI
We’ll hook up all our components in a simple Swing app. To make things a little more interesting, the app will consist of 2 frames:
An initial frame to allow the user to choose between loading a custom Haar cascade classifier file or to load the default one that’s packaged in resources
The actual CanvasFrame, which shows our feed along with rectangles around detected objects
    object WebcamFaceDetector extends SimpleSwingApplication {

      def top: Frame = new OptionsFrame

      /**
       * This is the initial frame, which presents two simple options, to load a custom Haar cascade file for face detection,
       * or to use the default one
       */
      private class OptionsFrame extends Frame { currentFrame =>

        peer.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE)

        val imageDimensions = Dimensions(width = 640, height = 480)

        val chooseCascadeBtn = Button("Load custom Haar cascade file") {
          val filePath = openChooser()
          filePath.foreach { path =>
            val detector = new FaceDetector(dimensions = imageDimensions, classifierPath = path)
            openFaceDetectionWindow(detector)
          }
        }

        val defaultCascadeBtn = Button("Use default face Haar cascade file") {
          val detector = FaceDetector.defaultCascadeFile(imageDimensions)
          openFaceDetectionWindow(detector)
        }

        val mainPanel = new GridPanel(rows0 = 0, cols0 = 1) {
          preferredSize = new Dimension(300, 200)
          contents ++= Seq(chooseCascadeBtn, defaultCascadeBtn)
        }

        contents = mainPanel

        private def openChooser(): Option[String] = {
          val chooser = new FileChooser(new java.io.File("."))
          chooser.fileSelectionMode = FileChooser.SelectionMode.FilesOnly
          chooser.showOpenDialog(currentFrame) match {
            case FileChooser.Result.Approve => Some(chooser.selectedFile.toPath.toAbsolutePath.toString)
            case _ => None
          }
        }

        private def openFaceDetectionWindow(faceDetector: FaceDetector): Unit = {
          new DetectionFrame(faceDetector)
          peer.setDefaultCloseOperation(javax.swing.WindowConstants.DO_NOTHING_ON_CLOSE)
          currentFrame.close()
        }
      }

      /**
       * Our detection window; opened by Initial Frame
       */
      private class DetectionFrame(faceDetector: FaceDetector) {

        implicit val system = ActorSystem()
        implicit val materializer = ActorMaterializer()

        val webcamSource = Webcam.source(deviceId = 0, dimensions = faceDetector.dimensions)

        val canvas = new CanvasFrame("Webcam")
        // Set Canvas frame to close on exit
        canvas.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE)

        val faceDrawer = new FaceDrawer()

        val flow = webcamSource
          .map(MediaConversion.toMat) // most OpenCV manipulations require a Matrix
          .map(Flip.horizontal)
          .map(WithGrey.build)
          .map(faceDetector.detect)
          .map((faceDrawer.drawFaces _).tupled)
          .map(MediaConversion.toFrame) // convert back to a frame
          .map(canvas.showImage)
          .to(Sink.ignore)

        flow.run()
      }
    }
Notice that once again, the code defining the Akka Flow Graph maps almost one to one to our flow chart.
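One detail of the flow that is worth a closer look is `(faceDrawer.drawFaces _).tupled`: detect emits a pair, while drawFaces takes two separate arguments, so the method is converted to a function and then tupled so that it accepts the pair as a single argument. The following is a minimal Scala 2 sketch of that step using only the standard library, with an Iterator standing in for the Akka Source; `TupledSketch`, its `Frame` and `Face` types and the fake detect logic are hypothetical stand-ins.

```scala
object TupledSketch {

  final case class Frame(id: Int)
  final case class Face(id: Long)

  // Stand-in for FaceDetector.detect: returns the input together with its findings.
  // Purely for illustration, a frame with id n is given n fake faces.
  def detect(frame: Frame): (Frame, Seq[Face]) =
    (frame, (0L until frame.id.toLong).map(i => Face(i)))

  // Stand-in for FaceDrawer.drawFaces: takes two separate arguments.
  def drawFaces(frame: Frame, faces: Seq[Face]): String =
    s"frame ${frame.id} with ${faces.size} face(s)"

  // Each stage is a plain function, chained in the same order as in the flow chart.
  def run(frames: Iterator[Frame]): List[String] =
    frames
      .map(detect)
      .map((drawFaces _).tupled) // (Frame, Seq[Face]) => String
      .toList
}
```

Returning the input alongside the result from detect is what lets the next stage draw on the original colour image without any shared mutable state between stages.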
Conclusion
We now have a face detector that uses OpenCV’s Haar cascade classifier toolbelt and draws rectangles around any identified faces, and we made it by expanding on the Akka Streams foundations laid in the previous post. As before, the code for this tutorial can be found on GitHub.
In the next post, we’ll expand this further by classifying the faces that we’ve detected as smiling or not using a supervised machine-learning model. We could of course continue to use Haar cascades to identify smiles in our feed (we can simply choose to load a smile Haar cascade classifier file), but what would be the fun in that? :)