Babelnet chat

11/30/2022

While neural machine translation has made great strides over the years, one consistent struggle has been correctly disambiguating the different meanings a word can have. Unless provided with enough examples of all contexts of a word, a neural machine translation model can be inclined to translate a word like "mole" the same way regardless of whether the mole is being removed by a doctor or by pest control. This is particularly difficult when some meanings, or "senses," of a word are much rarer than others. A disambiguation error like this, where only one word is incorrect, can render a translation incomprehensible. In today's blog we discuss a framework proposed by Hangya et al., 2021 that contextually mines back-translations of multi-sense words to increase the frequency of rare word senses in the training data and improve their translation.

Hangya et al., 2021 rely on three main resources for their framework:

- BabelNet, a multilingual resource in which words are labeled not only with the senses they can carry but also with potential translations of those senses.
- The MuCoW datasets (Raganato et al., 2019), which were designed specifically for word sense disambiguation and provide parallel training corpora and a test set.
- A random sample of 2 million German Wikipedia sentences to mine from.

The authors work with English-to-German translation, although all of these resources are available in multiple languages. They particularly emphasize that the framework is language- and resource-flexible, since it doesn't require any specific datasets or models: they choose XLM-R large for cross-lingual contextual word representations (CCWRs) because it consistently outperformed mBERT in their experiments, but any pre-trained multilingual model and monolingual corpus could recreate this process.

The authors first collect all the nouns used in the MuCoW test set and filter out any that have only one general sense registered on BabelNet, ending up with 3,732 multi-sense nouns. To find German sentences that use a target word with the same sense as in the test corpus, they use XLM-R large (Conneau et al., 2020) to create a CCWR of the multi-sense word in its English sentence and measure its cosine similarity against each word in each sentence of the German Wikipedia data. The top 5 scoring contextualized words and their sentences are then mined. To create parallel data, they mark out the mined multi-sense German word, back-translate the sentence, and then replace the marker with the original English word, ensuring that the intended multi-sense English word is present in the source sentence. With this, they have created a synthetic parallel corpus of multi-sense word examples.

In order to directly test cases where there were originally few or no training examples for a given sense of a word, the authors split the MuCoW test set into "rare" and "unseen" subsets. For the rare set, they remove a random 10% of each word's training data (named "sample-10"), effectively lowering the frequency while maintaining the sense distribution. For the unseen set, they remove all training data for the least common sense of each word. The authors train base transformer NMT models on each of these parallel datasets to serve as their baselines.
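To make the mining step concrete, here is a minimal sketch of how one might compute a CCWR for an English multi-sense word with XLM-R large and rank German candidate words by cosine similarity. The `xlm-roberta-large` checkpoint is the Hugging Face release of the model the authors name; the helper names and the `(sentence, word)` candidate format are my own illustrative assumptions, not the authors' code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Minimal sketch (not the authors' code): embed a target word in context with
# XLM-R large and rank German candidate words by cosine similarity.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModel.from_pretrained("xlm-roberta-large")
model.eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean-pool the hidden states of the subword tokens covering `word`."""
    start = sentence.index(word)   # first occurrence; a real pipeline would
    end = start + len(word)        # track the exact occurrence of interest
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
    # Keep the subword positions whose character spans overlap the word.
    idx = [i for i, (s, e) in enumerate(offsets) if s < end and e > start]
    return hidden[idx].mean(dim=0)

def mine_top_k(en_sentence, en_word, de_candidates, k=5):
    """Rank (German sentence, German word) pairs against the English CCWR."""
    query = word_embedding(en_sentence, en_word)
    scored = []
    for de_sentence, de_word in de_candidates:
        sim = torch.nn.functional.cosine_similarity(
            query, word_embedding(de_sentence, de_word), dim=0
        ).item()
        scored.append((sim, de_sentence, de_word))
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]
```

A real pipeline would batch the encoder calls rather than embedding one candidate word at a time, since scoring every word of 2 million Wikipedia sentences this way would be far too slow.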
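The back-translation marker trick can be sketched just as briefly. The `<WORD>` placeholder and the `backtranslate_de_to_en` callback are hypothetical stand-ins for whatever marker token and German-to-English model are actually used; the point is only the mask, translate, and restore sequence.

```python
MARKER = "<WORD>"  # assumed placeholder token; the paper's exact marker may differ

def make_parallel_example(de_sentence, de_word, en_word, backtranslate_de_to_en):
    """Back-translate a mined German sentence while shielding the target word,
    then restore the intended English multi-sense word on the source side."""
    # Mask the mined German word so the back-translation copies the marker through.
    # (First occurrence only; a real pipeline would track the exact token span.)
    masked = de_sentence.replace(de_word, MARKER, 1)
    en_side = backtranslate_de_to_en(masked)        # any German->English NMT system
    en_source = en_side.replace(MARKER, en_word, 1) # restore the intended English word
    return en_source, de_sentence  # (synthetic English source, original German target)
```

This guarantees that the intended multi-sense English word appears in the synthetic source sentence even when the back-translation model would otherwise have picked a different rendering.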
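Finally, the "sample-10" filtering used to build the rare subset is, as described above, a simple per-word subsampling of the training data. A sketch under that description (removing a random 10% of the sentence pairs containing each word) might look like this; the function name and seed handling are assumptions.

```python
import random

def sample_10(pairs_for_word, drop_fraction=0.10, seed=0):
    """Remove a random `drop_fraction` of the sentence pairs containing a word,
    lowering its frequency while leaving its sense distribution untouched."""
    rng = random.Random(seed)
    pairs = list(pairs_for_word)
    rng.shuffle(pairs)
    n_drop = int(len(pairs) * drop_fraction)
    return pairs[n_drop:]
```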