keywords:"TF-IDF" - Search Results - Digital Repository


	Methods of Web Page Classification Nachtnebl, Viktor ; Burget, Radek (referee) ; Bartík, Vladimír (advisor) This work deals with methods of web page classification. It explains the concept of classification and the features of web pages used to classify them. It then analyses page representation and describes in detail a classification method that works with a hierarchical category model and can create new categories dynamically. The second half presents an implementation of the chosen method and describes the results.
	Semantic Similarity of Articles Veselovský, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor) This bachelor's thesis deals with modelling the structure of semantic relationships among English-language articles. It introduces existing methods for representing articles and computing their similarity. The baseline method is the vector space model, which represents a document as a vector of words, with each word weighted by importance using the TF-IDF method. It then describes more advanced modelling methods, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The thesis also deals with semantically annotated articles, for which the weights of annotation words are computed by the Stochastic Gradient Descent method. The results are evaluated on a prepared test corpus of documents for which reference similarity ratings are available.
	Derivation of Dictionary for Process Inspector Tool on SharePoint Platform Pavlín, Václav ; Masařík, Karel (referee) ; Kreslíková, Jitka (advisor) This master's thesis presents methods for mining important pieces of information from text. It analyses the problem of term extraction from a large document collection and describes an implementation using the C# language and Microsoft SQL Server. The system uses stemming and a number of statistical methods for term extraction. The project also compares the methods used and proposes a process for deriving the dictionary.
	Improved Prediction of Social Tags Using Data Mining Harár, Pavol ; Galáž, Zoltán (referee) ; Kříž, Jiří (advisor) This master’s thesis deals with using text mining to predict tags for articles. It describes an iterative approach to handling large data files, parsing and cleaning the data, and scoring the terms in an article using TF-IDF. It describes in detail the flow of a program written in Python 3.4.3. The result of processing more than 1 million articles from the Wikipedia database is a dictionary of English terms. With this dictionary, one can determine the most important terms of an article within a corpus of articles. The relevance of the resulting tags supports the method used.
	Actual Events Tracker Odstrčilík, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor) The goal of this master's thesis project was to develop an application for tracking current events in the users' surrounding area. The application should allow users to view events, create new events and comment on existing ones. Beyond the implementation of the application, the project analyses the problem, including a comparison with existing solutions and a survey of technologies and frameworks suitable for the implementation. The work also describes the theory behind the data classification that is used internally for event and comment analysis. It further covers the design of the application, including the user interface, software architecture, database, communication protocol and data classifiers. The main part of the project, the implementation, is described afterwards. The work closes with a summary of the whole process and some ideas for enhancing the application in the future.
	Binary Classification of Customer Incidents Using NLP Methods (Binární klasifikace zákaznických incidentů pomocí metod NLP) Pokorný, Jiří This bachelor thesis focuses on building a model for binary classification of customer incidents within the SAP system. The final category of an incident is predicted by classifying its individual sentences. The text used is in English. To compare traditional and modern approaches to text classification and to obtain optimal results, a series of experiments is carried out using different methods of dataset balancing, vector representation and classification. Finally, the results are analyzed and recommendations are formulated for further development, including the application of the knowledge gained within the SAP environment.
	Detection of Web Page Content Category Using Machine Learning Methods (Detekce kategorie obsahu webové stránky prostřednictvím metod strojového učení) DOHNAL, Patrik This bachelor thesis focuses on the design and implementation of an algorithm for classifying websites into several categories. The software is written in Python. For classification I use machine learning models such as the Naive Bayes classifier, K-Nearest Neighbors and Support Vector Machines. The work involves collecting my own dataset, which is used for training and testing. The thesis also includes a detailed description of the methods I used.
	Data Mining Methods for Text Analysis Kozák, Ondřej ; Marcoň, Petr (referee) ; Dohnal, Přemysl (advisor) This bachelor thesis explores the current methodology and possibilities of text mining and applies a selection of the methods. The thesis describes methods for preprocessing, methods for converting text to a vector space and methods for text analysis, and discusses their possible applications. Different preprocessing methods were applied to the text, and the conversion to a vector space was then demonstrated using simple methods such as BOW, bag of n-grams and TF-IDF, as well as the machine learning methods FastText and GloVe. LSA, LDA, TextRank and cosine similarity were applied to the resulting vectors to extract information from the text.
	Searching relevant articles in extensive collections Vojt, Ján ; Novák, Jiří (advisor) ; Bartoš, Tomáš (referee) Text search in articles is usually implemented as full-text search. With more advanced techniques, however, it is possible to achieve significantly better results. The subject of this work is to create a universal library for searching extensive collections, specialized in the Czech language. The library uses tools capable of working with morphology while considering the importance of words. It also conducts an experiment with word pairs, which adds context to the search process. The success rate of this experiment is tested on an extensive collection of data. The resulting library is a unique tool for processing extensive collections of Czech text, and it is ready to be extended with new languages and methods.
	Analysis of Mobile Devices Network Communication Data Abraham, Lukáš ; Bartík, Vladimír (referee) ; Burgetová, Ivana (advisor) The work first describes the DNS and SSL/TLS protocols, focusing mainly on communication between devices using these protocols. It then covers data preprocessing and data cleaning, followed by basic data mining techniques such as data classification, association rules, information retrieval, regression analysis and cluster analysis. The next chapter discusses how to identify mobile devices on a network. The work evaluates data sets containing data collected from communication over the above-mentioned protocols, which are used in the practical part. It then presents the design of a system for analyzing network communication data and describes the libraries used and the entire system implementation. Finally, a large number of experiments are performed and evaluated.
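
Several of the abstracts above rely on TF-IDF term weighting and on comparing documents in a vector space. The following is a minimal Python sketch of that general idea, not the implementation of any listed thesis. The whitespace tokenizer, the function names (`tokenize`, `tf_idf_vectors`, `cosine_similarity`), the choice of relative term frequency with an unsmoothed logarithmic IDF, and the three sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real
    # preprocessing (stemming, stop-word removal, and so on).
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Unsmoothed IDF; a term present in every document gets weight 0.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency multiplied by IDF.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from any thesis corpus.
docs = [
    "web page classification methods",
    "classification of web pages with machine learning",
    "dns and tls network traffic analysis",
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

With these sample documents, the first two share the terms "web" and "classification", so `sim_related` is positive, while the first and third share no terms, so `sim_unrelated` is zero. Production systems typically use a library implementation with smoothing and normalization options instead of this hand-rolled variant.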
