National Repository of Grey Literature (Národní úložiště šedé literatury): 2 records found.
Aligning pre-trained models for spoken language translation
Sedláček, Šimon ; Beneš, Karel (reviewer) ; Kesiraju, Santosh (supervisor)
In this work, we investigate a novel approach to end-to-end speech translation (ST) by leveraging pre-trained models for automatic speech recognition (ASR) and machine translation (MT) and connecting them with a small connector module (Q-Former, STE). The connector bridges the gap between the speech and text modalities, transforming the ASR encoder embeddings into the latent representation space of the MT encoder. During training, the foundation ASR and MT models are frozen, and only the connector parameters are tuned, optimizing for the ST objective. We train and evaluate our models on the How2 English-to-Portuguese ST dataset. In our experiments, aligned systems outperform our cascade ST baseline while utilizing the same foundation models. Additionally, while keeping the size of the connector module constant and small in comparison (10M parameters), increasing the size and capability of the ASR encoder and MT decoder universally improves translation results. We find that the connectors can also serve as domain adapters for the foundation models, significantly improving translation performance in the aligned ST setting, compared even to the base MT scenario. Lastly, we propose a pre-training procedure for the connector, with the potential for reducing the amount of ST data required for training similar aligned systems.
Deep Learning for 3D Mesh Registration
Pukanec, Dávid ; Beran, Vítězslav (reviewer) ; Španěl, Michal (supervisor)
The problem of mesh alignment is often solved through point cloud registration. Numerous deep learning-based registration methods are published every year, achieving state-of-the-art results. Based on their core concepts, the methods can loosely be divided into correspondence-based and correspondence-free. Even though comparisons of individual methods exist, cross-evaluations of the two categories are lacking. In this work, a deeper evaluation of the Lepard and FINet models is presented. For this purpose, the ModelNet40 and Teeth3DS datasets are used. The experiments show that FINet is able to align unseen shapes obscured by partiality and noise with a translation error of 4.16% of model size and a rotation error of 3.640 degrees, while Lepard manages this with a translation error of 6.73% of model size and a rotation error of 7.265 degrees.
