I think this tech is too complex to run on a mobile device in 8 seconds, transferring a face into a video from a single selfie.
What I think Zao does is preprocess the videos (manually, or with highly accurate facial-keypoint detection). They pre-calculate the transforms (standard face-morphing algorithms with opacity/alpha tweaks) and the shading for each scene's lighting. Then they just need a good frontal selfie (or do some frontalization), plus keypoint detection on it, and the rest can be rendered/computed without many resources, following a pre-defined script.
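To make the idea concrete, here's a minimal numpy sketch of that kind of pipeline, assuming each scene ships with a precomputed per-frame affine transform (from selfie keypoints to the face in the frame) and a precomputed alpha mask. All names and the exact blending scheme are my guesses, not anything confirmed about Zao:

```python
import numpy as np

def warp_affine(img, M, out_shape):
    """Warp img by a 2x3 affine M via inverse mapping (nearest neighbour)."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    A, t = M[:, :2], M[:, 2]
    Ainv = np.linalg.inv(A)
    # For each destination pixel d, find the source pixel s = A^-1 (d - t).
    src = (np.stack([xs, ys], axis=-1) - t) @ Ainv.T
    sx = np.clip(np.round(src[..., 0]).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.round(src[..., 1]).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx]

def render_frame(scene_frame, selfie, M, alpha):
    """Blend the warped selfie into one pre-processed scene frame.

    scene_frame, selfie: HxWx3 float images.
    M: precomputed 2x3 affine taking selfie coords to this frame's face.
    alpha: precomputed HxW opacity mask for the face region.
    """
    warped = warp_affine(selfie, M, scene_frame.shape[:2])
    a = alpha[..., None]
    return a * warped + (1 - a) * scene_frame
```

On-device, only `render_frame` would need to run per frame; everything expensive (keypoint tracking, transform fitting, mask/shading estimation) happens server-side once per scene, which is why an 8-second turnaround becomes plausible.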
Yeah, I think the big question is whether Zao only allows pre-selected "scenes" they have already processed, or whether they let you upload any video.
From the results, I think you're exactly right about how they're accomplishing those videos.
> What I think Zao does is preprocess the videos (manually or highly accurate facepoint detection). They pre-calculate the transforms (standard facemorphing algorithms with opacity/alpha tweaks), and shading depending on the scene lighting. Then they just need a good frontal selfie, or do some frontalization, and keypoint detection, and the rest can be rendered/computed without much resources, following a pre-defined script.
If it's more advanced than face morphing, then perhaps something more like https://github.com/facebookresearch/supervision-by-registrat... (pre-fitting a 3D face mask, then texturing it with your selfie).
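The texturing step in that 3D variant could be as simple as back-projecting the pre-fitted mask's vertices into the selfie and sampling colours there. A toy numpy sketch, with the vertex layout, intrinsics, and nearest-neighbour sampling all my own assumptions:

```python
import numpy as np

def texture_from_selfie(vertices, K, selfie):
    """Assign each 3D mask vertex the selfie colour it projects onto.

    vertices: Nx3 points of a pre-fitted face mask, already expressed in
    the selfie camera's coordinate frame.
    K: 3x3 pinhole intrinsics matrix.
    selfie: HxWx3 image.
    Returns Nx3 per-vertex colours (nearest-neighbour sampling).
    """
    proj = vertices @ K.T               # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3]     # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, selfie.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, selfie.shape[0] - 1)
    return selfie[v, u]
```

Once the mask is textured, each output frame just re-projects the same coloured mesh with that frame's precomputed head pose, which again keeps the per-frame, on-device work cheap.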