Hacker News
Code Walkthrough of Bert with PyTorch for a Multilabel Classification in NLP (wootric.com)
13 points by prabhatjha on Aug 14, 2019 | 2 comments



For fine-tuning BERT on a specific domain, how much text data would you recommend training on?


Since the training is done on the tasks of masked word prediction and next-sentence prediction, I'd suggest about a million sentences (from the same domain), with an average of 7 tokens per sentence. Longer sentences would definitely help: BERT uses the transformer encoder architecture with multi-head attention, which lets the model learn better contextual representations in the embedding layers.
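For concreteness, here is a minimal sketch of that continued-pretraining step, assuming the Hugging Face transformers library on top of PyTorch and showing only the masked-word-prediction objective. The file path, batch size, and learning rate are illustrative assumptions, not something from this thread:

    # Sketch of domain-adaptive pretraining with the masked-word-prediction objective,
    # assuming the Hugging Face `transformers` library on top of PyTorch.
    # The file path and hyperparameters are illustrative, not from the thread.
    import torch
    from torch.utils.data import DataLoader, Dataset
    from transformers import (BertForMaskedLM, BertTokenizerFast,
                              DataCollatorForLanguageModeling)

    class DomainSentences(Dataset):
        # Expects one in-domain sentence per line in a plain-text file (hypothetical path).
        def __init__(self, path, tokenizer, max_length=64):
            with open(path) as f:
                lines = [line.strip() for line in f if line.strip()]
            self.encodings = tokenizer(lines, truncation=True, max_length=max_length)

        def __len__(self):
            return len(self.encodings["input_ids"])

        def __getitem__(self, idx):
            return {k: v[idx] for k, v in self.encodings.items()}

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    # Randomly masks 15% of tokens and builds the labels for masked word prediction.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                               mlm_probability=0.15)

    dataset = DomainSentences("domain_sentences.txt", tokenizer)  # ~1M sentences as suggested above
    loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collator)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    for batch in loader:  # one pass over the domain corpus
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # masked-word-prediction loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # The adapted encoder can then be loaded under a classification head.
    model.save_pretrained("bert-domain-adapted")

The saved weights could then be loaded into a classification model with a sigmoid/BCE head for the multilabel fine-tuning described in the article.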



