NMT Zeroshot

Daniel Chen

May 17, 2020 Deep Learning

Code

Neural Machine Translation(NMT) is a relatively new approach towards the translation of text between languages. Older methods in machine translation were limited in nature and often performed poorly especially with east asian languages. With the advent of deep learning based models however, translation quality vastly improved. We are able to much more freely encode word embeddings and directionality of sequences and more precisely translate those word embeddings with deep learning models using Recurrent Neural Networks. However, as many translation models used word vectors to handle word embeddings, it is possible to for models to fail due to unseen words such as with popular culture. Additionally, the training dataset for one language might have the the translation for one text but another language has the translation for another but not directly between each other. With zero shot learning, we can potentially use one target language and translate into another language that we did not train against. For example, we might be able to encode slang in one language and ideally, decode it to another language where the direct translation has not been trained upon.

In this project, I attempt to replicate the work that Google has published in the fields of NMT. In the Google Neural Machine Translation(GNMT) paper, the authors employ numerous methods to improve upon their models such as using a word piece model to better handle unseen words. In addition to their baseline model, they also developed a method in order to allow their GNMT to perform zero shot translation which essentially allows the network to learn associations between word phrases and to find translations that it has not explictly learned using language targeting tags.

This repository is an attempt to implement a NMT model using a wordpiece model and the additional features needed to create a simplified zero shot translation model. The datasets used are Spanish to English, and French to English sentence pairs. The model was built using a bidirectional LSTMs, ADAM for the optimizer, dropout, and softmax output.

Deep Learning