Fill-mask (masked language modeling, MLM) is the original BERT pretraining objective: mask about 15% of the input tokens and train the model to predict the originals. It powered the encoder revolution that dominated NLP from 2018 to 2022 and remains the training signal behind models like RoBERTa, DeBERTa, and XLM-RoBERTa, which still run many production classification and NER systems. As a standalone task it has limited direct applications, but probing what a model predicts for masked slots became a key technique for analyzing the bias, factual knowledge, and linguistic competence stored in model weights. The task has faded from the research spotlight as decoder-only (GPT-style) pretraining proved more scalable, but encoder models trained with MLM remain among the most cost-efficient options for tasks that need fast inference, such as classification and structured prediction.
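The masking step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the exact BERT recipe: it works on whitespace-split words rather than subword tokens, it replaces every selected position with `[MASK]` (BERT's published procedure also leaves some selected tokens unchanged or swaps in random ones), and the names `mask_tokens`, `mask_prob`, and `MASK_TOKEN` are hypothetical helpers introduced here for illustration.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder string; real tokenizers define their own mask token


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Mask roughly `mask_prob` of the tokens (simplified sketch).

    Returns the masked sequence plus a dict mapping each masked
    position to the original token the model would be asked to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so short inputs still yield a training target.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))

    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]   # prediction target
        masked[pos] = MASK_TOKEN    # what the model sees
    return masked, labels


tokens = ("the quick brown fox jumps over the lazy dog "
          "near the quiet river bank today").split()
masked, labels = mask_tokens(tokens, seed=0)
print(" ".join(masked))
print(labels)
```

During pretraining, the loss is computed only at the positions recorded in `labels`; at inference time, a fill-mask model ranks candidate tokens for each `[MASK]` slot.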
General Language Understanding Evaluation for masked language models