From Word2Vec to BERT: A Review on Language Representations

Authors

Anshul Vashisth
Department of Computer Engineering, J.C. Bose University of Science & Technology, YMCA, Faridabad, Haryana, India.
Vedpal
Department of Computer Applications, J.C. Bose University of Science & Technology, YMCA, Faridabad, Haryana, India.
Piyush Gupta
Department of Information Technology, J.C. Bose University of Science & Technology, YMCA, Faridabad, Haryana, India.

Abstract

Transfer learning in the field of natural language processing rests on the fundamental idea of pre-training a model on a large corpus and then fine-tuning it for a particular downstream task. Traditional approaches were based on static word embeddings such as word2vec and GloVe, which were used for downstream Natural Language Processing (NLP) tasks. The major limitation of these models was the limited amount of information they could capture: they did not take the context of a word into account, which resulted in the loss of valuable information. These limitations encouraged the development of new language representation models. One such model, Bidirectional Encoder Representations from Transformers (BERT), was introduced in 2018; it is a deep bidirectional model that delivered state-of-the-art results to the NLP community. In this paper, we review the conventional models and the new bidirectional models that brought a revolutionary change to NLP.