Národní úložiště šedé literatury
Aligning pre-trained models for spoken language translation
Sedláček, Šimon ; Beneš, Karel (reviewer) ; Kesiraju, Santosh (supervisor)
In this work, we investigate a novel approach to end-to-end speech translation (ST) that leverages pre-trained models for automatic speech recognition (ASR) and machine translation (MT), connecting them with a small connector module (Q-Former, STE). The connector bridges the gap between the speech and text modalities by transforming the ASR encoder embeddings into the latent representation space of the MT encoder. During training, the foundation ASR and MT models are frozen and only the connector parameters are tuned, optimizing for the ST objective. We train and evaluate our models on the How2 English-to-Portuguese ST dataset. In our experiments, the aligned systems outperform our cascade ST baseline while utilizing the same foundation models. Additionally, while keeping the connector module constant and small relative to the foundation models (10M parameters), increasing the size and capability of the ASR encoder and MT decoder universally improves translation results. We find that the connectors can also serve as domain adapters for the foundation models, significantly improving translation performance in the aligned ST setting, even when compared to the base MT scenario. Lastly, we propose a pre-training procedure for the connector with the potential to reduce the amount of ST data required for training similar aligned systems.
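The abstract describes the aligned architecture only in prose. Purely as an illustration, the following is a minimal PyTorch sketch of a Q-Former-style connector in the spirit described above: a fixed set of learned queries cross-attends to frozen ASR encoder features and emits a short sequence in the MT encoder's embedding space. All names, dimensions, and layer counts (QFormerConnector, asr_dim, mt_dim, num_queries, etc.) are assumptions for the sketch, not details taken from the thesis.

```python
import torch
import torch.nn as nn


class QFormerConnector(nn.Module):
    """Q-Former-style connector (illustrative sketch).

    A small, fixed set of learned query vectors cross-attends to the
    ASR encoder's output frames and produces a fixed-length sequence
    intended to live in the MT encoder's representation space.
    """

    def __init__(self, asr_dim=512, mt_dim=1024,
                 num_queries=64, num_layers=2, num_heads=8):
        super().__init__()
        # Learned queries; their count fixes the connector output length.
        self.queries = nn.Parameter(torch.randn(num_queries, mt_dim) * 0.02)
        # Project ASR features into the connector/MT width.
        self.in_proj = nn.Linear(asr_dim, mt_dim)
        layer = nn.TransformerDecoderLayer(
            d_model=mt_dim, nhead=num_heads, batch_first=True
        )
        self.blocks = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, asr_embeddings):
        # asr_embeddings: (batch, asr_frames, asr_dim)
        memory = self.in_proj(asr_embeddings)
        q = self.queries.unsqueeze(0).expand(asr_embeddings.size(0), -1, -1)
        # Queries attend to the projected ASR frames via cross-attention.
        return self.blocks(tgt=q, memory=memory)  # (batch, num_queries, mt_dim)


def connector_parameters(asr_encoder, connector, mt_model):
    """Freeze both foundation models; only the connector is trained."""
    for p in asr_encoder.parameters():
        p.requires_grad = False
    for p in mt_model.parameters():
        p.requires_grad = False
    return list(connector.parameters())
```

In this setup the optimizer would be built only over `connector_parameters(...)`, matching the abstract's statement that the foundation ASR and MT models stay frozen while the connector is tuned on the ST objective.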
