National Repository of Grey Literature 5 records found  Search took 0.00 seconds. 
Aligning pre-trained models for spoken language translation
Sedláček, Šimon ; Beneš, Karel (referee) ; Kesiraju, Santosh (advisor)
Tato práce zkoumá nový end-to-end přístup k překladu mluveného jazyka (ST) využívající předtrénovaných modelů pro přepis řeči (ASR) a strojový překlad (MT), propojené malým spojovacím modulem (Q-Former, STE). Ten má za úkol překlenout mezeru mezi modalitami řeči a textu mapováním embedding reprezentací ASR enkodéru do latentního prostoru reprezentací MT modelu. Během trénování jsou zvolené ASR a MT model zmrazeny, laděny jsou pouze parametry spojovacího modulu. Trénování a evaluace jsou prováděny na datasetu How2, obsahujícím ST data z Angličtiny do Portugalštiny. V našich experimentech zjišťujeme, že většina sladěných systémů překonává referenční kaskádový ST systém, přičemž využívají stejné základní modely. Navíc, při zachování konstantní a ve srovnání malé (10M parametrů) velikosti spojovacího modulu, větší a silnější ASR a MT modely univerzálně zlepšují výsledky překladu. Zjišťujeme, že spojovací moduly mohou také sloužit jako doménové adaptéry pro zvolené základní systémy, kdy významně zlepšují výsledky překladu ve sladěném ST prostředí, a to i oproti holému MT výkonu daného MT modelu. Nakonec navrhujeme proceduru pro předtrénování spojovacího modulu s potenciálem snížit množství ST dat potřebných pro trénink obdobných sladěných systémů.
Machine Translation Using Syntactic Analysis
Popel, Martin ; Žabokrtský, Zdeněk (advisor) ; Ircing, Pavel (referee) ; Čmejrek, Martin (referee)
Machine Translation Using Syntactic Analysis Martin Popel This thesis describes our improvement of machine translation (MT), with a special focus on the English-Czech language pair, but using techniques ap- plicable also to other languages. First, we present multiple improvements of the deep-syntactic system TectoMT. For instance, we implemented a novel context-sensitive translation model, comparing several machine learning ap- proaches. We also adapted TectoMT to other domains and languages. Sec- ond, we present Transformer - a state-of-the-art end-to-end neural MT sys- tem. We analyzed in detail the effect of several training hyper-parameters. With our optimized training, the system outperformed the best result on the WMT2017 test set by +1.0 BLEU. We further extended this system by uti- lization of monolingual training data and by a new type of backtranslation (+2.8 BLEU compared to the baseline system). In addition, we leveraged domain adaptation and the effect of "translationese" (i.e which language in parallel data is the original and which is the translation) to optimize MT systems for original-language and translated-language data (gaining further +0.2 BLEU). Our improved neural MT system significantly (p¡0.05) out- performed all other systems in English-Czech and Czech-English WMT2018 shared tasks,...
Robust Parsing of Noisy Content
Daiber, Joachim ; Zeman, Daniel (advisor) ; Mareček, David (referee)
While parsing performance on in-domain text has developed steadily in recent years, out-of-domain text and grammatically noisy text remain an obstacle and often lead to significant decreases in parsing accuracy. In this thesis, we focus on the parsing of noisy content, such as user-generated content in services like Twitter. We investigate the question whether a preprocessing step based on machine translation techniques and unsupervised models for text-normalization can improve parsing performance on noisy data. Existing data sets are evaluated and a new data set for dependency parsing of grammatically noisy Twitter data is introduced. We show that text-normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy. Powered by TCPDF (www.tcpdf.org)
Machine Translation Using Syntactic Analysis
Popel, Martin ; Žabokrtský, Zdeněk (advisor) ; Ircing, Pavel (referee) ; Čmejrek, Martin (referee)
Machine Translation Using Syntactic Analysis Martin Popel This thesis describes our improvement of machine translation (MT), with a special focus on the English-Czech language pair, but using techniques ap- plicable also to other languages. First, we present multiple improvements of the deep-syntactic system TectoMT. For instance, we implemented a novel context-sensitive translation model, comparing several machine learning ap- proaches. We also adapted TectoMT to other domains and languages. Sec- ond, we present Transformer - a state-of-the-art end-to-end neural MT sys- tem. We analyzed in detail the effect of several training hyper-parameters. With our optimized training, the system outperformed the best result on the WMT2017 test set by +1.0 BLEU. We further extended this system by uti- lization of monolingual training data and by a new type of backtranslation (+2.8 BLEU compared to the baseline system). In addition, we leveraged domain adaptation and the effect of "translationese" (i.e which language in parallel data is the original and which is the translation) to optimize MT systems for original-language and translated-language data (gaining further +0.2 BLEU). Our improved neural MT system significantly (p¡0.05) out- performed all other systems in English-Czech and Czech-English WMT2018 shared tasks,...
Robust Parsing of Noisy Content
Daiber, Joachim ; Zeman, Daniel (advisor) ; Mareček, David (referee)
While parsing performance on in-domain text has developed steadily in recent years, out-of-domain text and grammatically noisy text remain an obstacle and often lead to significant decreases in parsing accuracy. In this thesis, we focus on the parsing of noisy content, such as user-generated content in services like Twitter. We investigate the question whether a preprocessing step based on machine translation techniques and unsupervised models for text-normalization can improve parsing performance on noisy data. Existing data sets are evaluated and a new data set for dependency parsing of grammatically noisy Twitter data is introduced. We show that text-normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy. Powered by TCPDF (www.tcpdf.org)

Interested in being notified about new results for this query?
Subscribe to the RSS feed.