Hello! My name is Vladimir Shalkov, I am an Android developer at Surf.

Not so long ago, we needed to implement a face recognition system with anti-spoofing protection on Android. In this article, I will share the most interesting aspects of the implementation, with code examples and links. I'm sure you will find something new and interesting here, so settle in and let's start.


Face recognition systems are now becoming more and more popular: the number of devices with face unlock feature is growing, as well as the number of tools for developers.

Apple uses FaceID in its products and has also taken care of developers by providing an API to access this functionality. FaceID is considered secure enough to be used to unlock banking applications. The Android SDK, until recently, had no ready-made solution: although device manufacturers added face unlock to their firmware, developers could not use that functionality in applications, and the security of this unlocking method left much to be desired.

Recently, the FingerprintManager class, which was used to unlock applications with a fingerprint, was deprecated starting from API 28, and developers are encouraged to use BiometricPrompt instead. This class contains the logic related to biometrics, including face identification. However, you won't be able to use it on every smartphone, because according to information from Google, the device must have a sufficiently high security rating.

Some devices have even dropped the built-in fingerprint scanner, relying instead on the high level of anti-spoofing protection in face recognition, made possible by a front-facing ToF (time-of-flight) sensor. Using it, you can build a depth map, which makes the system much more resistant to attacks.


The application we implemented is, by its functionality, an access control system in which the face serves as the means of identifying a person. Special algorithms check whether the face belongs to a real, live person. A new user can be added to the database directly from the device by taking a picture and entering a name. To determine whether a person is present in the database, the search is performed against a photograph taken in real time on the device. The algorithms measure similarity to faces in the database; if a match is found, information about that person is returned.

Our main goal was to ensure the highest level of security: we had to minimize the possibility of bypassing the face recognition system, for example with a photograph held up to the viewfinder. For this, we decided to use the Intel RealSense 3D camera (model D435i), which has a built-in ToF sensor and provides all the data needed to build depth maps.


As the working device, we had to use a tablet with a large screen diagonal, which had no built-in battery and required a constant mains connection.

Another equally important constraint was offline operation: because of it, we could not use cloud face recognition services. At the same time, writing face recognition algorithms from scratch was unreasonable given the time and labor constraints, and there was no point reinventing the wheel when ready-made solutions exist. Based on all this, we decided to use the Face SDK from 3DiVi.

Getting Image from Intel RealSense Camera

At the first stage of implementation, we needed to obtain two images from the 3D camera: one in color and one with a depth map. The Face SDK library will then use them for further calculations.

To get started with the Intel RealSense camera in an Android project, you need to add the RealSense SDK for Android OS dependency: it is a wrapper over the official C++ library. The official samples show how to initialize the camera and display its picture; we won't dwell on that, since it is quite straightforward. Let's go straight to the image acquisition code:

private val pipeline = Pipeline()
private val streamingHandler = Handler()

private var streamRunnable: Runnable = object : Runnable {
    override fun run() {
        try {
            FrameReleaser().use { fr ->
                val frames = pipeline.waitForFrames(1000).releaseWith(fr)
                val orgFrameSet = frames.releaseWith(fr)
                val processedFrameSet = frames.applyFilter(align).releaseWith(fr)
                // Get the color image frame
                val orgFrame: Frame = orgFrameSet.first(StreamType.COLOR, StreamFormat.RGB8).releaseWith(fr)
                val videoFrame: VideoFrame = orgFrame.`as`(Extension.VIDEO_FRAME)
                // Get the depth frame
                val processedDepth: Frame = processedFrameSet.first(StreamType.DEPTH, StreamFormat.Z16).releaseWith(fr)
                val depthFrame: DepthFrame = processedDepth.`as`(Extension.DEPTH_FRAME)
                // Render the color image on screen
                upload(orgFrame)
            }
            streamingHandler.post(this)
        } catch (e: Exception) {
            Logger.d("Streaming, error: " + e.message)
        }
    }
}

streamingHandler.post(streamRunnable) // Start streaming

Using FrameReleaser() we get individual frames of type Frame from the video stream. Various filters can be applied to frames via applyFilter().

To get a frame in the desired format, it must be cast to the appropriate type. In our case, the first frame is of type VideoFrame, the second is DepthFrame.

If we want to display the image on the device's screen, there is the upload() method; the frame to be displayed is passed as a parameter. In our case, these are the frames from the color camera.

Converting frames to images

The next step is to get images in the formats we need from the VideoFrame and DepthFrame. We will use these pictures to determine whether the face in the image belongs to a real person and to add information to the database.

Image formats:

  • A color image with the .bmp extension, obtained from VideoFrame
  • A depth-map image with the .tiff extension, obtained from DepthFrame

To implement the conversion, we need the OpenCV open-source computer vision library. All the work comes down to forming a Mat object and converting it to the desired format:

fun videoFrameToMat(videoFrame: VideoFrame): Mat {
    val colorMat = Mat(videoFrame.height, videoFrame.width, CvType.CV_8UC3)
    val returnBuff = ByteArray(videoFrame.dataSize)
    videoFrame.getData(returnBuff)
    colorMat.put(0, 0, returnBuff)
    val colorMatNew = Mat()
    Imgproc.cvtColor(colorMat, colorMatNew, Imgproc.COLOR_RGB2BGR)
    return colorMatNew
}

To save the color image, you need to form a matrix of type CvType.CV_8UC3 and then convert it to BGR so that the colors have the correct hue.
Then save it to the device using the Imgcodecs.imwrite method:

fun VideoFrame.saveToFile(path: String): Boolean {
    val colorMat = videoFrameToMat(this)
    return Imgcodecs.imwrite(path + COLOR_IMAGE_FORMAT, colorMat)
}

The same needs to be done for DepthFrame, with the only difference that the matrix must be of type CvType.CV_16UC1, since the image is built from a frame containing data from the depth sensor:

fun depthFrameToMat(depthFrame: DepthFrame): Mat {
    val depthMat = Mat(depthFrame.height, depthFrame.width, CvType.CV_16UC1)
    val size = (depthMat.total() * depthMat.elemSize()).toInt()
    val returnBuff = ByteArray(size)
    depthFrame.getData(returnBuff)
    val shorts = ShortArray(size / 2)
    ByteBuffer.wrap(returnBuff).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    depthMat.put(0, 0, shorts)
    return depthMat
}
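The byte-order step above is easy to get wrong: each depth value is a 16-bit integer stored low byte first, so the raw bytes must be reassembled as little-endian shorts. A minimal, self-contained sketch of that same conversion (no camera or OpenCV required; the function name is ours, for illustration only):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Reassemble a little-endian byte stream (as delivered by DepthFrame.getData)
// into 16-bit depth values, mirroring the ByteBuffer logic in depthFrameToMat().
fun bytesToDepthValues(raw: ByteArray): ShortArray {
    val shorts = ShortArray(raw.size / 2)
    ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    return shorts
}
```

For example, the byte pair 0x34, 0x12 becomes the single depth value 0x1234 (4660), not 0x3412 as it would in big-endian order.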

Saving the image with a depth map:

fun DepthFrame.saveToFile(path: String): Boolean {
    val depthMat = depthFrameToMat(this)
    return Imgcodecs.imwrite(path + DEPTH_IMAGE_FORMAT, depthMat)
}

Working with the Face SDK

The Face SDK contains a large number of software components, but we don't need most of them. Like the RealSense SDK, the library is written in C++ and has a wrapper for convenient use on Android. Face SDK is not free, but as a developer you can get a trial license.

Most library components are configured using XML configuration files; depending on the configuration, one algorithm or another will be applied.
To get started, you need to create an instance of the FacerecService class, which is used to initialize other components; the paths to the DLLs, configuration files, and license are passed as parameters.

Next, using this service, you need to create objects of the FacerecService.Config and Capturer classes:

private val service: FacerecService = FacerecService.createService(
    dllPath,
    confDirPath,
    onlineLicenseDir
)
private val confManual: FacerecService.Config = service.Config("manual_capturer.xml")
private val capturerManual: Capturer = service.createCapturer(confManual)

The Capturer class is used for face detection. The manual_capturer.xml configuration means we will use algorithms from the OpenCV library: the Viola-Jones frontal face detector, with Haar features used for detection. The library provides a ready-made set of XML configuration files that differ in recognition quality and runtime; slower methods give better recognition. To detect faces in profile, a different XML configuration file should be used: common_lprofile_capturer.xml. There are many configs; they are described in detail in the documentation. In our case, we needed the common_capturer4_singleface.xml config: a configuration with a lowered quality threshold that always returns at most one face.

To find a face in an image, the capturerSingleFace.capture() method is used; the byte array of the image containing the person's face is passed to it:

fun createRawSample(imagePath: String): RawSample? {
    val imageColorFile = File(imagePath)
    val originalColorByteArray = ImageUtil.readImage(imageColorFile)
    return capturerSingleFace.capture(originalColorByteArray).getOrNull(0)
}

The RawSample object stores information about the found face and provides a set of methods: for example, calling getLandmarks() returns the anthropometric points of the face.

Checking that the face belongs to a real person

To determine whether there is a real person in front of the camera rather than a photograph held up to it, the Face SDK library provides the DepthLivenessEstimator module. It returns an enum with one of four values:

  • NOT_ENOUGH_DATA - too many missing values in the depth map
  • REAL - the observed face belongs to a living person
  • FAKE - the observed face is a photograph
  • NOT_COMPUTED - the calculation failed
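Only REAL should let a user through; everything else is a failed check (though NOT_ENOUGH_DATA may simply mean "try again with a better frame"). A sketch of how the application might branch on these values; note that the enum below is a local stand-in mirroring the SDK's DepthLivenessEstimator.Liveness, defined here only so the example is self-contained:

```kotlin
// Local stand-in for DepthLivenessEstimator.Liveness from the Face SDK,
// declared here only to keep the illustration self-contained.
enum class Liveness { NOT_ENOUGH_DATA, REAL, FAKE, NOT_COMPUTED }

// Only REAL grants access; all other states are treated as a failed check.
fun isLive(state: Liveness): Boolean = state == Liveness.REAL

// A possible user-facing message for each state (wording is ours).
fun livenessMessage(state: Liveness): String = when (state) {
    Liveness.REAL -> "Live face detected"
    Liveness.FAKE -> "Spoofing attempt: the face looks like a photograph"
    Liveness.NOT_ENOUGH_DATA -> "Depth map too sparse, please try again"
    Liveness.NOT_COMPUTED -> "Liveness could not be computed"
}
```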

Module initialization:

val depthLivenessEstimator: DepthLivenessEstimator = service.createDepthLivenessEstimator(
    "depth_liveness_estimator_cnn.xml"
)

Determining whether the face belongs to a real person:

fun getLivenessState(
    rgbPath: String,
    depthPath: String
): DepthLivenessEstimator.Liveness {
    val imageColorFile = File(rgbPath + COLOR_IMAGE_FORMAT)
    val originalColorByteArray = readImage(imageColorFile)
    val originalRawSimple = capturerSingleFace.capture(originalColorByteArray).getOrNull(0)
    val originalRawImage = RawImage(
        SCREEN_RESOLUTION_WIDTH,
        SCREEN_RESOLUTION_HEIGHT,
        RawImage.Format.FORMAT_BGR,
        originalColorByteArray
    )
    val originalDepthPtr = Natives().readDepthMap(depthPath + DEPTH_IMAGE_FORMAT)

    // Camera parameters: horizontal and vertical field of view
    val hFov = 69.4f
    val vFov = 42.5f
    val depthMapRaw = DepthMapRaw()
    with(depthMapRaw) {
        depth_map_rows = originalRawImage.height
        depth_map_cols = originalRawImage.width
        depth_map_2_image_offset_x = 0f
        depth_map_2_image_offset_y = 0f
        depth_map_2_image_scale_x = 1f
        depth_map_2_image_scale_y = 1f
        horizontal_fov = hFov
        vertical_fov = vFov
        depth_unit_in_millimeters = 1f
        depth_data_ptr = originalDepthPtr
        depth_data_stride_in_bytes = (2 * originalRawImage.width)
    }
    return depthLivenessEstimator.estimateLiveness(originalRawSimple, depthMapRaw)
}

The getLivenessState() method takes paths to the two images as parameters: the color image and the depth map. From the color image we form a RawImage object; this class provides raw image data and optional cropping information. From the depth map we form a DepthMapRaw: a depth map registered against the original color image. This is required in order to call estimateLiveness(originalRawSimple, depthMapRaw), which returns an enum telling us whether a real person was in the frame.

It is worth paying attention to how the DepthMapRaw object is formed. One of its fields is called depth_data_ptr, a pointer to the depth data, but as you know, Java has no pointers. To get the pointer, you need to use a JNI function that takes the path to the depth-map image as an argument:

extern "C" JNIEXPORT jlong JNICALL
Java_ru_face_detect_Natives_readDepthMap(JNIEnv *env, jobject obj, jstring jfilename) {
    const char *buf = env->GetStringUTFChars(jfilename, NULL);
    std::string filename = buf;
    env->ReleaseStringUTFChars(jfilename, buf);

    // Read the image unchanged (-1 == cv::IMREAD_UNCHANGED) to keep 16-bit depth
    cv::Mat depth_map = cv::imread(filename, -1);
    size_t size = depth_map.rows * depth_map.cols * depth_map.elemSize();
    unsigned char *data = new unsigned char[size];
    memcpy(data, depth_map.data, size);
    // The buffer outlives this call; the caller is responsible for freeing it
    return (jlong) data;
}

To call this C++ code from Kotlin, you need to create a class like this:

class Natives {

    init {
        System.loadLibrary("native-lib")
    }

    external fun readDepthMap(fileName: String): Long
}

System.loadLibrary() is passed the name of the native library that contains the readDepthMap() method, in our case native-lib. You also need to add the external modifier, which means the method is not implemented in Kotlin.

Person Identification

An equally important function is identifying the face found in the frame. Face SDK allows you to implement this using the Recognizer module. Initialization:

val recognizer: Recognizer = service.createRecognizer(
    "method8v7_recognizer.xml",
    true,
    true,
    true
)

We use the configuration file method8v7_recognizer.xml, which has the highest recognition speed, though its recognition quality is lower than that of methods 6v7 and 7v7.

Before identifying a person, you need to build a list of faces against which the sample photograph will be matched. To do this, create a Vector of Template objects:

var templates = Vector<Template>()
val rawSample = createRawSample(imageUrl)
val template = recognizer.processing(rawSample)
templates.add(template)

To create a Template, the recognizer.processing() method is used, with RawSample passed as a parameter. Once the list of face templates is formed, add it to the Recognizer and save the resulting TemplatesIndex, which enables fast search in large databases:

val templatesIndex=recognizer.createIndex(templates, SEARCH_THREAD_COUNT) 

At this point we have formed the Recognizer object containing all the information needed for identification:

fun detectFaceSearchResult(rgbPath: String): Recognizer.SearchResult {
    val rawSample = createRawSample(rgbPath + COLOR_IMAGE_FORMAT)
    val template = recognizer.processing(rawSample)
    val searchResult = recognizer.search(
        template,
        templateIndex,
        searchResultCount,
        Recognizer.SearchAccelerationType.SEARCH_ACCELERATION_1
    ).firstElement()
    return searchResult
}

The recognizer.search() function returns a result from which we can get the index of the found element, match it against the list of people in the database, and identify the person. We can also obtain the similarity value, a real number from 0 to 1. This information is provided in the score field of the Recognizer.MatchResult class:

val detectResult = detectFaceSearchResult(rgbPath)
// Template similarity: a real number from 0 to 1
val scoreResult = detectResult.matchResult.score
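The SDK only reports the similarity; deciding whether two faces belong to the same person is up to the application, which compares the score against a threshold of its own choosing. A self-contained sketch of that decision (the 0.9 default is an arbitrary illustration, not a value from the Face SDK documentation):

```kotlin
// score is the template similarity from MatchResult, in [0.0, 1.0].
// The threshold is application-specific: stricter values reduce false
// accepts at the cost of more false rejects. 0.9 here is a made-up example.
fun isSamePerson(score: Double, threshold: Double = 0.9): Boolean {
    require(score in 0.0..1.0) { "score must be in [0, 1]" }
    return score >= threshold
}
```

In an access control system like ours, it makes sense to tune this threshold empirically on real enrollment photos, since lighting and camera placement affect the score distribution.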


There is no doubt that in the future such systems will be used everywhere: the door will open automatically as you approach the entrance, and the office coffee machine will pick your favorite grind.

The Android SDK is gradually gaining APIs that let developers work with face identification, but for now everything is at an early stage. As for an access control system built on an Android tablet, the Face SDK library, and an Intel RealSense 3D camera, I would like to note its great flexibility and extensibility. There is no binding to a specific device: the camera can be connected to any modern smartphone. The range of supported 3D cameras can be expanded, and several cameras can be connected to one device. The application could also be adapted to Android Things and used in a smart home. And looking further at the capabilities of the Face SDK, it can identify faces in a continuous video stream and determine gender, age, and emotions. These possibilities open up room for many experiments. From our own experience we can say: don't be afraid of experiments and challenge yourself!