National Repository of Grey Literature
Comparison of Czech-to-Russian Machine Translation Methods (Porovnání metod česko-ruského automatického překladu)
Bílek, Karel ; Kuboň, Vladislav (advisor) ; Bojar, Ondřej (referee)
In this thesis, I present several methods of Czech-to-Russian machine translation, including both historical and more modern approaches, and both phrase-based and rule-based systems. I first briefly describe the linguistic background of Czech and Russian, their common history, and their differences. Then, I describe automating, building, and improving some of the machine translation systems, and compare them using both an automated metric and a limited human annotation. I also describe the creation of several corpora of Czech-Russian parallel data and Russian monolingual data.
News Topics Tracking
Bílek, Karel ; Bojar, Ondřej (advisor) ; Holan, Tomáš (referee)
In this thesis, I try to find a definition of a news topic that makes topic detection implementable and its quality measurable. I describe various methods, starting with "simple" word counting, optionally with stopwords. I also describe TF-IDF and the text categorization problem, and touch on text clustering. Then I briefly describe the approaches called latent semantic indexing and latent Dirichlet allocation. The thesis includes my experiments with "simple" word counting, TF-IDF, and text categorization on a database of articles from several online news websites; I also describe the creation of this database. Precision and recall are used as metrics for the text categorization approach.
