This is a really strange take. Encoder-only transformers like BERT and RoBERTa have been wildly popular in NLP for years now, replacing pretty much every model that came before them and setting the state of the art on pretty much all the traditional NLP benchmarks (tagging, parsing, etc.).