I don't see benchmarks for this particular implementation yet, but the readme cites the Gentry-Halevi-Smart optimizations as an inspiration. That paper, Homomorphic Evaluation of the AES Circuit (http://eprint.iacr.org/2012/099.pdf), discusses three FHE methods of computing AES-128. The fastest disclosed implementation has an amortized run time (on a beast of a machine: 18MB cache, 256GB RAM) of around 19 seconds per byte. In contrast, some CPUs have native instructions that can run AES-128 on the order of a few cycles per byte. So we are talking about a slowdown of roughly 10 billion-fold.
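
For anyone who wants to check that back-of-the-envelope figure, here is a rough sketch of the arithmetic. Only the 19 seconds per byte comes from the paper as quoted above; the 3 GHz clock and the 5 cycles per byte for native AES are my own illustrative assumptions, not measurements, so treat the result as an order-of-magnitude estimate.

```python
# Order-of-magnitude estimate of the FHE-vs-native AES-128 slowdown.

FHE_SECONDS_PER_BYTE = 19.0            # amortized figure quoted above
ASSUMED_CLOCK_HZ = 3.0e9               # assumption: a ~3 GHz CPU
ASSUMED_NATIVE_CYCLES_PER_BYTE = 5.0   # assumption: "a few" cycles per byte


def slowdown(fhe_seconds_per_byte, clock_hz, native_cycles_per_byte):
    """Return how many times slower the FHE evaluation is than native AES."""
    # Convert the native cost from cycles per byte to seconds per byte.
    native_seconds_per_byte = native_cycles_per_byte / clock_hz
    return fhe_seconds_per_byte / native_seconds_per_byte


ratio = slowdown(
    FHE_SECONDS_PER_BYTE, ASSUMED_CLOCK_HZ, ASSUMED_NATIVE_CYCLES_PER_BYTE
)
print(f"Estimated slowdown: {ratio:.1e}x")
```

Under those assumptions the ratio lands a little above 10^10. Assuming 1 cycle per byte instead pushes it several times higher, so "roughly 10 billion" is, if anything, on the conservative side.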