Národní úložiště šedé literatury (National Repository of Grey Literature): 4 records found. The search took 0.02 seconds.
Classification on unbalanced data
Hlosta, Martin ; Popelínský, Lubomír (reviewer) ; Štěpánková, Olga (reviewer) ; Zendulka, Jaroslav (supervisor)
This thesis focuses on classification on unbalanced data, an important area of machine learning that addresses problems in which one class is significantly underrepresented compared with the other. The minority class is usually the more important one, yet traditional algorithms favour the majority class and may ignore the minority class. Two application domains motivated the research and led to the identification of two specific problems of imbalanced data. First, a constraint on the performance on the minority class in the computer security domain led to the formulation of the constrained classification problem. I proposed a solution that combines cost-sensitive logistic regression with stochastic algorithms; in the conducted experiments, it always improved the performance of logistic regression. Second, the domain of Learning Analytics motivated me to define a general prediction problem: whether a goal has been achieved within the deadline. I designed the Self-Learning framework, in which models are trained by analysing the attributes of objects that achieved the goal early in the investigated period. Because only a few objects satisfy the goal at the beginning, the problem is inherently imbalanced, with the imbalance decreasing over time. The evaluation, performed on the task of identifying at-risk students in distance higher education, showed (1) its predictive power compared with the specified baseline models and (2) that methods for tackling class imbalance without domain information did not lead to significant improvements. When domain information was utilised in the extended version of Self-Learning, the evaluation showed an increase in performance. Understanding and exploiting the source of imbalance can also lead to better results.
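The abstract does not give the exact formulation, so the following is only an illustrative sketch of the two ingredients it names. The first is cost-sensitive logistic regression, where each minority-class example's log loss is multiplied by a larger cost. The second is a performance constraint on the minority class. The thesis combines the cost-sensitive model with stochastic algorithms; this sketch does not reproduce that. Instead, as a simplifying assumption, it afterwards picks the largest decision threshold whose minority-class recall reaches a required minimum. All function names, the choice of recall as the constrained metric, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_cost_sensitive_lr(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    cost_pos: float,
    cost_neg: float = 1.0,
    lr: float = 0.5,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the cost-weighted log loss.

    Each minority (y == 1) example's log loss is multiplied by cost_pos and
    each majority (y == 0) example's by cost_neg, so errors on the minority
    class become more expensive.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total_cost = sum(cost_pos if t == 1 else cost_neg for t in y)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, t in zip(X, y):
            c = cost_pos if t == 1 else cost_neg
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = c * (p - t)  # derivative of the weighted log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Normalise by the total cost so the step size does not depend on the cost scale.
        w = [wj - lr * g / total_cost for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total_cost
    return w, b


def predict_proba(w: List[float], b: float, X: Sequence[Sequence[float]]) -> List[float]:
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def threshold_for_min_recall(probs: Sequence[float], y: Sequence[int], min_recall: float) -> float:
    """Largest threshold at which minority-class recall is at least min_recall.

    Predicting positive when prob >= threshold keeps the k highest-scored
    positives, where k is the smallest count with k / n_pos >= min_recall.
    """
    if not 0.0 < min_recall <= 1.0:
        raise ValueError("min_recall must be in (0, 1]")
    pos_scores = sorted((p for p, t in zip(probs, y) if t == 1), reverse=True)
    n_pos = len(pos_scores)
    for k in range(1, n_pos + 1):
        if k / n_pos >= min_recall:
            return pos_scores[k - 1]
    raise ValueError("y must contain at least one positive example")


# Toy imbalanced data (hypothetical): 8 majority examples, 2 minority examples.
X = [[0.0], [0.1], [0.15], [0.2], [0.3], [0.35], [0.4], [0.5], [1.8], [2.2]]
y = [0] * 8 + [1] * 2
w, b = fit_cost_sensitive_lr(X, y, cost_pos=4.0)  # cost ratio = class ratio 8 / 2
probs = predict_proba(w, b, X)
thr = threshold_for_min_recall(probs, y, min_recall=1.0)
```

Setting `cost_pos` to the inverse class ratio is a common heuristic, not a value taken from the thesis. In practice the threshold would be chosen on held-out data rather than on the training set used here.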
Mining Multi-Level Sequential Patterns
Šebek, Michal ; Platoš, Jan (reviewer) ; Popelínský, Lubomír (reviewer) ; Zendulka, Jaroslav (supervisor)
Mining sequential patterns is an important area of data mining. Many industrial and business applications store sequential data in which the ordering of transactions is defined; such data can be used, for example, to analyse consecutive shopping transactions. This thesis deals with the use of concept hierarchies of items for mining sequential patterns. It focuses on two basic approaches: mining level-crossing sequential patterns and mining multi-level sequential patterns. The approaches for both data mining tasks are formalized, and the data mining algorithms hGSP and MLSP are proposed to solve them. Experiments verified that MLSP in particular has good performance and stability. The usability of the newly obtained patterns is demonstrated on a real-world data mining task.
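The abstract does not describe how hGSP and MLSP work internally. For orientation only, here is a sketch of the basic test that GSP-style miners build on when items have a concept hierarchy. It decides whether a data sequence supports a pattern whose items may sit at different levels of the hierarchy, and it computes the pattern's support. A transaction is assumed to contain a concept if it contains that concept or any item below it. This is not the thesis's hGSP or MLSP, which additionally generate candidate patterns and prune them by minimum support. The hierarchy, item names, and function names are hypothetical.

```python
from typing import Dict, Sequence, Set

# Hypothetical concept hierarchy: item or concept -> its direct parent concept.
Hierarchy = Dict[str, str]
# A data sequence is a time-ordered list of transactions (itemsets).
DataSequence = Sequence[Set[str]]


def generalize(transaction: Set[str], parent: Hierarchy) -> Set[str]:
    """Extend a transaction with every ancestor of each of its items."""
    result: Set[str] = set()
    for item in transaction:
        result.add(item)
        while item in parent:
            item = parent[item]
            result.add(item)
    return result


def contains(sequence: DataSequence, pattern: Sequence[Set[str]], parent: Hierarchy) -> bool:
    """True if every pattern element matches a later transaction, in order.

    A pattern item matches a transaction item that is equal to it or lies
    below it in the hierarchy, so one pattern may mix levels ("fruit" and
    "milk"). Greedy leftmost matching suffices because no gap or window
    constraints are assumed.
    """
    pos = 0
    for element in pattern:
        while pos < len(sequence):
            matched = element <= generalize(sequence[pos], parent)
            pos += 1
            if matched:
                break
        else:
            return False  # ran out of transactions before matching this element
    return True


def support(database: Sequence[DataSequence], pattern: Sequence[Set[str]], parent: Hierarchy) -> float:
    """Fraction of data sequences that contain the pattern."""
    return sum(contains(s, pattern, parent) for s in database) / len(database)


# Hypothetical shopping data: each inner list is one customer's transactions in time order.
hierarchy = {"apple": "fruit", "banana": "fruit", "milk": "dairy", "cheese": "dairy",
             "fruit": "food", "dairy": "food"}
database = [
    [{"apple"}, {"milk"}],
    [{"banana"}, {"cheese"}],
    [{"milk"}, {"apple"}],
]
```

On this toy data, the leaf-level pattern `[{"apple"}, {"milk"}]` is supported by one of three sequences, while the more general `[{"fruit"}, {"dairy"}]` is supported by two. This illustrates why higher hierarchy levels can reveal patterns that fall below a support threshold at the leaf level.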