How language model-based tokenizers fare on domain-specific documents, since language models lack context for unknown tokens.
Are language model-based tokenizers any better at identifying abbreviations than rule-based ones?
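As a starting point for the abbreviation question, here is a minimal sketch comparing the two tokenizer families on abbreviation-heavy text. NLTK's Punkt tokenizer and spaCy's en_core_web_sm pipeline are assumptions standing in for the rule-based/statistical and model-based approaches, and the sample sentence is invented for illustration.

```python
# A minimal sketch, assuming NLTK's Punkt (rule/statistical) and
# spaCy's en_core_web_sm (model-based) represent the two tokenizer families.
import nltk
import spacy

nltk.download("punkt", quiet=True)   # Punkt sentence boundary models
nlp = spacy.load("en_core_web_sm")   # model-based pipeline (parser drives sents)

# Invented example: abbreviations that look like sentence boundaries
text = "Dr. Smith stopped by the A.T.M. on Elm St. Then she left."

# Rule/statistical segmentation via Punkt
for sent in nltk.sent_tokenize(text):
    print("punkt:", sent)

# Model-based segmentation via spaCy's dependency parse
for sent in nlp(text).sents:
    print("spacy:", sent.text)
```

Running both over the same abbreviation-heavy sample and diffing the boundaries would give a first, rough read on whether the model-based tokenizer actually handles abbreviations any better.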