Original title: Porovnání přístupů ke klasifikaci textu
Translated title: Comparison of approaches to text classification
Authors: Knížek, Jan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
Document type: Bachelor's theses
Year: 2019
Language: eng
Abstract: The focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms such as Yelp, Tripadvisor or Heureka. As online communication grows in popularity, it is becoming infeasible for users to filter information manually, and it is therefore increasingly important to recognise the relevant information in text. Classification of reviews is especially challenging because they have limited structure, use informal language, contain many errors and rely heavily on context and common knowledge. One possible application of machine learning is to filter data automatically and show users only the relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness. Most restaurants have relatively many reviews, yet only a few are truly useful. Our objective is to compare machine learning methods for predicting usefulness.
Keywords: machine learning; NLP; review classification; text classification

Institution: Charles University Faculties (theses) (web)
Document availability information: Available in the Charles University Digital Repository.
Original record: http://hdl.handle.net/20.500.11956/117016

Permalink: http://www.nusl.cz/ntk/nusl-410996


 Record created 2020-03-19, last modified 2022-03-04


No fulltext