
RC_ITR on May 15, 2023 | on: Brex’s Prompt Engineering Guide

My understanding of both this and Apple's AFT (Attention Free Transformer) is that they are trained in parallel, attention-style, over the whole sequence, but inference is then run as an RNN. Is your understanding different?
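For what it's worth, here is a minimal sketch of why that works, assuming the causal "AFT-simple" variant: the output at step t is sigmoid(q_t) times a weighted average of v_i over i <= t, with weights exp(k_i). I also added an optional per-channel exponential time decay; that decay is my own assumption for illustration and is not part of AFT-simple. The same numbers come out whether you compute every position at once (the training-style form) or carry a fixed-size running numerator/denominator (the RNN-style form). Function names are hypothetical, and there's no max-subtraction for numerical stability, so keep inputs small.

```python
import math
from typing import List

Matrix = List[List[float]]  # T rows (time steps) x d columns (channels)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def aft_parallel(q: Matrix, k: Matrix, v: Matrix, decay: float = 0.0) -> Matrix:
    """Training-style form: each position t looks back over all i <= t at once."""
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        row = []
        for c in range(d):
            # Weight for position i: exp(k_i) scaled by exp(-decay * distance).
            # decay is a hypothetical addition; decay=0.0 gives plain causal AFT-simple.
            weights = [math.exp(k[i][c] - (t - i) * decay) for i in range(t + 1)]
            num = sum(w * v[i][c] for i, w in enumerate(weights))
            den = sum(weights)
            row.append(sigmoid(q[t][c]) * num / den)
        out.append(row)
    return out


def aft_recurrent(q: Matrix, k: Matrix, v: Matrix, decay: float = 0.0) -> Matrix:
    """Inference-style form: O(d) state (num, den) updated once per token."""
    d = len(q[0])
    num = [0.0] * d
    den = [0.0] * d
    keep = math.exp(-decay)  # per-step decay factor applied to the old state
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        row = []
        for c in range(d):
            w = math.exp(k_t[c])
            num[c] = keep * num[c] + w * v_t[c]
            den[c] = keep * den[c] + w
            row.append(sigmoid(q_t[c]) * num[c] / den[c])
        out.append(row)
    return out


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]]
    k = [[0.2, 0.1], [-0.1, 0.5], [0.0, -0.3]]
    v = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    for decay in (0.0, 0.5):
        a = aft_parallel(q, k, v, decay)
        b = aft_recurrent(q, k, v, decay)
        # Both forms agree up to floating-point rounding.
        print(decay, max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

The recurrent version only works because the weight on each past position either stays the same or shrinks by a fixed factor per step; an arbitrary learned pairwise position bias would not collapse into a fixed-size state like this.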