I think this tech is too complex to run on a mobile device in 8 seconds, transferring a face into a video from a single selfie.
What I think Zao does is preprocess the videos (manually, or with highly accurate facial-keypoint detection). They pre-calculate the transforms (standard face-morphing algorithms with opacity/alpha tweaks) and the shading for each scene's lighting. Then they just need a good frontal selfie (or do some frontalization), plus keypoint detection on it, and the rest can be rendered/computed without many resources, following a pre-defined script.
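To make the idea concrete, here's a minimal numpy sketch of that kind of pipeline, assuming each scene ships with a precomputed per-frame affine transform (from selfie keypoints to the face in the frame) and a precomputed alpha mask. All names and the exact blending scheme are my guesses, not anything confirmed about Zao:

```python
import numpy as np

def warp_affine(img, M, out_shape):
    """Warp img by a 2x3 affine M via inverse mapping (nearest neighbour)."""
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W]
    A, t = M[:, :2], M[:, 2]
    Ainv = np.linalg.inv(A)
    # For each destination pixel d, find the source pixel s = A^-1 (d - t).
    src = (np.stack([xs, ys], axis=-1) - t) @ Ainv.T
    sx = np.clip(np.round(src[..., 0]).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.round(src[..., 1]).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx]

def render_frame(scene_frame, selfie, M, alpha):
    """Blend the warped selfie into one pre-processed scene frame.

    scene_frame, selfie: HxWx3 float images.
    M: precomputed 2x3 affine taking selfie coords to this frame's face.
    alpha: precomputed HxW opacity mask for the face region.
    """
    warped = warp_affine(selfie, M, scene_frame.shape[:2])
    a = alpha[..., None]
    return a * warped + (1 - a) * scene_frame
```

On-device, only `render_frame` would need to run per frame; everything expensive (keypoint tracking, transform fitting, mask/shading estimation) happens server-side once per scene, which is why an 8-second turnaround becomes plausible.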
Yeah, I think the big question is whether Zao only allows pre-selected "scenes" they have already processed, or whether they let you upload any video.
From the results, I think you're exactly right about how they're accomplishing those videos.
> What I think Zao does is preprocess the videos (manually or highly accurate facepoint detection). They pre-calculate the transforms (standard facemorphing algorithms with opacity/alpha tweaks), and shading depending on the scene lighting. Then they just need a good frontal selfie, or do some frontalization, and keypoint detection, and the rest can be rendered/computed without much resources, following a pre-defined script.
If it's more advanced than face morphing, then perhaps something more like https://github.com/facebookresearch/supervision-by-registrat... (pre-fitting a 3D face mask, then texturing it with your selfie).
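The texturing step in that 3D variant could be as simple as back-projecting the pre-fitted mask's vertices into the selfie and sampling colours there. A toy numpy sketch, with the vertex layout, intrinsics, and nearest-neighbour sampling all my own assumptions:

```python
import numpy as np

def texture_from_selfie(vertices, K, selfie):
    """Assign each 3D mask vertex the selfie colour it projects onto.

    vertices: Nx3 points of a pre-fitted face mask, already expressed in
    the selfie camera's coordinate frame.
    K: 3x3 pinhole intrinsics matrix.
    selfie: HxWx3 image.
    Returns Nx3 per-vertex colours (nearest-neighbour sampling).
    """
    proj = vertices @ K.T               # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3]     # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, selfie.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, selfie.shape[0] - 1)
    return selfie[v, u]
```

Once the mask is textured, each output frame just re-projects the same coloured mesh with that frame's precomputed head pose, which again keeps the per-frame, on-device work cheap.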