Evaluating the stability of individual feature selection methods, and of groups of such methods, that optimize feature subset cardinality
Somol, Petr ; Novovičová, Jana
Stability (robustness) of feature selection methods is a topic of recent interest whose importance is nevertheless often neglected, even though it directly affects the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes that yield subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures in a unifying framework that offers broad insight into the stability problem. We study in detail the properties of the considered measures and demonstrate on various examples what information about the feature selection process can be gained. We also introduce an alternative approach to feature selection evaluation in the form of measures that compare the similarity of two feature selection processes. These measures make it possible to compare, e.g., the output of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) as such.
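To make the notion of stability over subsets of varying size concrete, the sketch below averages the pairwise Jaccard (Tanimoto) similarity of the subsets selected in repeated runs. This is a generic illustration and not necessarily one of the measures introduced in the paper; the function names and the example subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity of two feature subsets; 1.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets (feature indices) selected on three resampled
# versions of one data set; the subsets differ in size.
runs = [{0, 1, 2}, {0, 1, 3, 4}, {0, 1}]
stability = average_pairwise_stability(runs)
```

A value near 1 means the runs select nearly the same features, a value near 0 means they rarely agree. Because the Jaccard index is defined for sets of different sizes, it does not require every run to return the same number of features.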
|
|
Does it make sense to develop new feature selection methods?
Somol, Petr ; Novovičová, Jana
One of the hot topics discussed recently in relation to pattern recognition techniques is the question of the actual performance of modern feature selection methods. Feature selection has been a highly active area of research in recent years due to its potential to improve both the performance and the economy of automatic decision systems in various application fields, with medical diagnosis being among the most prominent. Feature selection may also improve the performance of classifiers learned from limited data, or contribute to model interpretability. The number of available methods and methodologies has grown rapidly, promising important improvements. Yet recently many authors have called this development into question, claiming that simpler, older tools actually perform better than complex modern ones, which, despite their promises, are claimed to fail in real-world applications.
|
|
Application of finite mixtures to text document classification
Novovičová, Jana ; Malík, Antonín
Finite mixture modelling of class-conditional distributions is a standard method in statistical pattern recognition. We propose using a mixture of multinomial distributions as the model of the class-conditional distribution for the text document classification task. Documents are represented as vectors using a bag-of-words or unigram approach. The proposed model was compared experimentally with standard models on the Reuters-21578 database.
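As an illustration of the model class described above, the sketch below scores a bag-of-words count vector under a mixture of multinomials for each class and picks the most probable class. It is a minimal sketch, not the authors' implementation: the parameters are assumed to be already estimated (no fitting is shown), all word probabilities are assumed to be strictly positive, and the class names, priors, and parameter values are hypothetical.

```python
import math


def log_multinomial(counts, probs):
    """Log-probability of a word-count vector under one multinomial component."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def log_mixture(counts, weights, components):
    """Log-likelihood under a finite mixture, computed with log-sum-exp."""
    terms = [math.log(w) + log_multinomial(counts, p)
             for w, p in zip(weights, components)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(counts, priors, class_models):
    """Pick the class maximizing log prior + mixture log-likelihood."""
    scores = {c: math.log(priors[c]) + log_mixture(counts, *class_models[c])
              for c in priors}
    return max(scores, key=scores.get)


# Hypothetical three-word vocabulary and two hypothetical classes.
# Each class model is (mixture weights, per-component word probabilities).
priors = {"grain": 0.5, "trade": 0.5}
class_models = {
    "grain": ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "trade": ([1.0], [[0.1, 0.2, 0.7]]),
}
label = classify([3, 1, 0], priors, class_models)
```

A class with a single component reduces to the ordinary multinomial model, so the same code covers both the mixture and the one-component case.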