keywords:"imbalanced data" - Search results - Digital Repository


	Classification on unbalanced data Hlosta, Martin ; Popelínský, Lubomír (reviewer) ; Štěpánková, Olga (reviewer) ; Zendulka, Jaroslav (supervisor) This thesis focuses on classification on unbalanced data, an important part of machine learning that addresses the issues arising when one class is significantly underrepresented compared to the other. The minority class is usually the more important one, and traditional algorithms that favour the majority class may ignore it. Two application domains motivated the research and the identification of two specific problems of imbalanced data. First, the presence of a constraint on the performance of the minority class in the computer security domain resulted in the formulation of the constrained classification problem. I proposed a solution that combines cost-sensitive logistic regression with stochastic algorithms, which in the conducted experiments always improved the performance of the logistic regression. Second, the domain of Learning Analytics motivated me to define a general prediction problem: whether a goal has been achieved within the deadline. I designed the Self-Learning framework, in which models are trained by analysing attributes of objects that achieved the goal early in the investigated period. Because only a few objects satisfy the goal at the beginning, the problem is by its nature imbalanced, with the imbalance decreasing over time. The evaluation, performed on the task of identifying at-risk students in distance higher education, showed (1) predictive power compared with the specified baseline models and (2) that methods for tackling class imbalance without domain information did not lead to significant improvements. When domain information is utilised in the extended version of Self-Learning, the evaluation showed a performance increase. Understanding and exploiting the source of imbalance can also lead to better results. Full record
	Machine Learning Methods in Payment Card Fraud Detection Sinčák, Jan ; Baruník, Jozef (supervisor) ; Vácha, Lukáš (reviewer) Protecting clients from fraudulent transactions is a demanding task. Banks usually rely on rule-based systems, which require the rules for identifying fraud to be created manually. These rules must be set by bank employees, who have to look for trends in fraudulent transactions themselves. This thesis addresses the problem of detecting fraudulent card transactions and compares several machine learning models for fraud detection. These models can find complex relationships in the data and potentially outperform classical fraud detection systems. Logistic regression, a neural network, random forest, and extreme gradient boosting (XGBoost) are trained on a simulated dataset that closely mimics the properties of real card transactions. Model performance is measured by sensitivity, specificity, precision, AUC, and prediction time on the test dataset. XGBoost shows the highest performance among the tested models. It is then compared with a standard fraud detection system used in a Czech bank. The bank's system achieves higher specificity, but XGBoost still shows promising results. It is possible that some machine learning models could outperform current fraud detection systems if they are well tuned. JEL Classification G21, K42 Keywords machine learning, card fraud,... Full record
	A machine learning method for incomplete and imbalanced medical data Salman, I. ; Vomlel, Jiří The research reported in this paper is twofold. In the first part of the paper we use standard statistical methods to analyze medical records of patients suffering from myocardial infarction in a developing country, Syria, and a developed country, the Czech Republic. One of our goals is to find whether there are statistically significant differences between the two countries. In the second part of the paper we present an idea for dealing with incomplete and imbalanced data for the tree-augmented naive Bayes (TAN) classifier. All results presented in this paper are based on real data about 603 patients from a hospital in the Czech Republic and about 184 patients from two hospitals in Syria. Full record
	Trends in random forests parameters for classification of imbalanced data Robnik-Šikonja, M. ; Savický, Petr Full text: v1153-12 - PDF Full text: content.csg - PDF Full record
