keywords:"text classification" - Search Results - Digital Repository


	Porovnání open-source nástrojů pro strojové učení (Comparison of open-source tools for machine learning)
	Poliakova, Yevheniia
	Poliakova, Y. Comparison of open-source tools for machine learning. Thesis. Brno: Mendel University in Brno, 2022.
	This work is devoted to research on accessible open-source artificial intelligence. The thesis describes a selected list of available artificial intelligence tools and the use of these tools for specific tasks. The main contribution of the work is a comparison of open-source tools through experiments focused on supervised inductive learning (classification) from large volumes of text and data. These experiments will be performed using selected open-source tools. The result of the work will be a conclusion about the advantages and disadvantages of these platforms, their characteristics in solving specific problems, and recommendations for choosing a platform according to the assigned task or data.
	Assessment and implementation of text data preprocessing in neural network models
	Ratnasari, Febiyanti
	In the realm of text data processing, text preprocessing has traditionally played a significant role. However, with the growing prominence of neural network models and novel representations of textual data, the importance of text preprocessing has been relatively understated. To address this, the present research investigates the potential benefits of combining multiple text data preprocessing techniques with a neural network-based text processing model.
	Binární klasifikace zákaznických incidentů pomocí metod NLP (Binary classification of customer incidents using NLP methods)
	Pokorný, Jiří
	This bachelor thesis focuses on building a model for binary classification of customer incidents within the SAP system. The final category of an incident is predicted by classifying its individual sentences. The text used is in English. To compare traditional and modern approaches to text classification and to obtain optimal results, a series of experiments is carried out using different methods of dataset balancing, vector representation and classification. Finally, the results are analyzed and a recommendation is formulated for further development, including the application of the knowledge gained within the SAP environment.
	Crude Oil Price Forecast based on Text News
	Skalický, Jan ; Bojar, Ondřej (advisor) ; Žabokrtský, Zdeněk (referee)
	There is a whole range of algorithms for crude oil price forecasting. In this thesis we offer a new perspective on this issue and introduce our project COPF. Using a maximum entropy classifier, we try to predict the change in crude oil price from text information available on the Internet. We take advantage of the knowledge of experts in the field. As a part of the thesis, we tested and improved the precision of COPF. We have found that this approach poses a lot of interesting problems. In its current state, the precision of our prediction surpasses the baseline, but further development requires more data sources. Our algorithm was never intended as a stand-alone method, but it may nicely complement numerical algorithms.
	Popularity Meter
	Hajič, Jan ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
	Having the possibility of automatically tracking a person's popularity in the newspapers is an idea appealing not just to those in the media spotlight. While sentiment (subjectivity) analysis is a rapidly growing subfield of computational linguistics, no data from the news domain are yet available for Czech. We have therefore started building a manually annotated polarity corpus of sentences from Czech news texts; however, these texts have proven rather unwieldy for such processing. We have also designed a classifier which should be able to track popularity based on this corpus; the classifier has been tested on a corpus of product reviews of domestic appliances, and some introductory testing has been done on the nascent news corpus. As a model, we simply extract a unigram polarity lexicon from the data. We then use three related methods for identifying lemma polarity and a number of simple filters for feature selection. On the domestic appliance data, our simplest model has achieved results comparable to the state of the art; however, the properties of Czech news texts and preliminary results hint that a more linguistically oriented approach might be preferable.
	Email spam filtering using artificial intelligence
	Safonov, Yehor ; Uher, Václav (referee) ; Kolařík, Martin (advisor)
	In the modern world, email is the most widely used technology for exchanging messages between users. It rests on three pillars which contribute to its popularity and stimulate its rapid growth: free availability, efficiency, and intuitiveness in the exchange of information. All of them constitute a significant advantage in the provision of communication services. On the other hand, the growing popularity of email technologies poses considerable security risks and turns email into a universal tool for spreading unsolicited content. Potential attacks may be aimed at either specific endpoints or whole computer infrastructures. Despite achieving high accuracy in spam filtering, traditional techniques often fail to keep up with the rapid growth and evolution of spam techniques. These approaches are affected by overfitting, convergence to poor local minima, inefficiency in high-dimensional data processing, and long-term maintainability issues. One of the main goals of this master's thesis is to develop and train deep neural networks using the latest machine learning techniques to solve the text-based spam classification problem, which belongs to the Natural Language Processing (NLP) domain. From a theoretical point of view, the master's thesis focuses on the area of e-mail communication with an emphasis on spam filtering. The next parts of the thesis turn to the domain of machine learning and artificial neural networks and discuss the principles of their operation and their basic properties. The theoretical part also covers possible ways of applying the described techniques to text analysis and NLP tasks. One of the key aspects of the study is a detailed comparison of current machine learning methods, their specifics and their accuracy when applied to spam filtering.
	At the beginning of the practical part, the focus is on processing the e-mail dataset. This phase was divided into five stages with the aim of preserving key features of the raw data and increasing the final quality of the dataset. The created dataset was used for training, testing and validation of the chosen types of deep neural networks. The selected models ULMFiT, BERT and XLNet have been successfully implemented. The master's thesis includes a description of the final data adaptation and of the learning process, testing and validation of the neural networks. At the end of the work, the implemented models are compared using a confusion matrix, and possible improvements and a concise conclusion are outlined.
	Extraction of Semantic Relations from Text
	Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
	Today there are many semi-structured documents that we want to convert to a structured form. The goal of this work is to create a system that makes this task more automated. This can be a difficult problem, because most of these documents are not generated by a computer, so the system has to tolerate differences between them. We also need some semantic understanding, which is why we restrict ourselves to the domain of meeting minutes.
	Comparison of approaches to text classification
	Knížek, Jan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
	The focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms, such as Yelp, Tripadvisor or Heureka. As the popularity of online communication increases, it is becoming infeasible for users to filter information manually. It is therefore becoming more and more important to recognise the relevant information in text. Classification of reviews is especially challenging, because they have limited structure, use informal language, contain a high number of errors and rely heavily on context and common knowledge. One of the possible applications of machine learning is to automatically filter data and show users only relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness. Most restaurants have relatively many reviews, yet only a few are truly useful. Our objective is to compare machine learning methods for predicting usefulness.
	Artificial Intelligence Document Classification
	Molnár, Ondřej ; Kačic, Matej (referee) ; Třeštíková, Lenka (advisor)
	This paper deals with document classification using artificial intelligence. It describes the principles of classification and machine learning. It also introduces AI methods and presents the Naive Bayes classification method in detail. It provides a practical implementation of the classifier in MS Office and discusses possible extensions.
	Scala Programming Language and Its Use for Data Analysis
	Kohout, Tomáš ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
	This thesis compares the Scala programming language with other languages commonly used for data analysis. These languages are evaluated in the following categories: data manipulation and visualization, machine learning, and concurrent processing capabilities. The evaluation then shows the strengths and weaknesses of Scala. The strengths are demonstrated on an application for email categorization.
