
The power of two sample tests
Rózsahegyi, Dominik ; Maciak, Matúš (advisor) ; Nagy, Stanislav (referee)
Two-sample tests are commonly used in practice, for example in the sciences or in the financial sector. The power of a test is an important feature: it expresses the probability that the test will reject an invalid null hypothesis. In this work we introduce four basic tests that compare selected parameters of two populations. The reader is introduced to the basic terms of hypothesis testing that are necessary for presenting the tests. For each test we use simulation to estimate the power and observe its behavior under different sample distributions, sample sizes, and choices of null and alternative hypotheses. Based on the results obtained, we compare the chosen tests and discuss their suitability in different cases.
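The simulation-based estimation of power described in this abstract can be sketched as follows. This is only an illustration, not the thesis's code: the abstract does not name its four tests, so the Welch-type statistic with a normal critical value, the normal samples, and the names `welch_statistic` and `estimate_power` are all assumptions made here.

```python
import random
import statistics
from statistics import NormalDist


def welch_statistic(x, y):
    """Difference in sample means divided by its estimated standard error."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def estimate_power(n, shift, alpha=0.05, reps=500, seed=0):
    """Monte Carlo estimate of the rejection probability of a two-sided test.

    Assumption: both samples are normal with unit variance and size n;
    the second sample's mean is moved by `shift`. The critical value comes
    from the normal approximation, which is reasonable for moderate n.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(shift, 1.0) for _ in range(n)]
        if abs(welch_statistic(x, y)) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # shift = 0 estimates the actual significance level; shift > 0 estimates power.
    for shift in (0.0, 0.25, 0.5, 1.0):
        print(shift, estimate_power(n=50, shift=shift))
```

With `shift=0` the estimate approximates the actual significance level of the test; increasing `shift` or `n` should push the estimated power toward 1.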


Generalized Wilcoxon Test for Censored Data
Vařejková, Michaela ; Maciak, Matúš (advisor) ; Komárek, Arnošt (referee)
This paper deals with the generalized Wilcoxon test and its use for censored data. The introduction describes the standard one-sample and two-sample Wilcoxon tests and their basic properties, censored data, and methods of censoring. The main part of the paper is devoted to the introduction of the generalized Wilcoxon test and its properties. First, a test for singly censored data is described; the description of a test for doubly censored data follows. The paper concludes with a simulation section in which the statistical properties of the test are demonstrated. The first example compares the generalized test with the standard two-sample Wilcoxon test. The second example shows how the censoring rate affects the power and significance level of the generalized test.


Multiple testing problems
Turzová, Kristína ; Maciak, Matúš (advisor) ; Komárek, Arnošt (referee)
Statistical hypothesis testing is used when analyzing experimental data. This thesis focuses on multiple testing, that is, testing many hypotheses simultaneously, and on the problems that arise when running multiple hypothesis tests. These multiple testing problems are described and two error rates, FWER (Family-Wise Error Rate) and FDR (False Discovery Rate), are defined. Selected multiple testing corrections are introduced and compared in detail using simulations with regard to significance level and power. All of the discussed corrections control for the problem of multiple testing.
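As a hedged illustration of what such corrections look like, the sketch below implements two standard procedures: Bonferroni, which controls the FWER, and Benjamini-Hochberg, which controls the FDR under independence. The abstract does not say which corrections the thesis selects, so the choice of these two, the function names, and the example p-values are assumptions.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure controlling the false discovery rate.

    Find the largest rank k with p_(k) <= k * alpha / m and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

On the made-up p-values above, Bonferroni rejects only the smallest one while Benjamini-Hochberg rejects the two smallest, which reflects the usual trade-off: stricter error control costs power.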


Continuity correction
Štěpán, Marek ; Omelka, Marek (advisor) ; Maciak, Matúš (referee)
To approximate a discrete random variable that is the sum of n independent, identically distributed discrete random variables, we can use the central limit theorem. However, it turns out that this approximation can be refined by applying a continuity correction. This term is explained in the thesis, and several ways of deriving the continuity correction are illustrated. There is also a numerical comparison of the approximation error when the binomial distribution is approximated by the normal distribution with and without the continuity correction. Confidence intervals and the χ² test of independence in contingency tables that use the continuity correction are also described. In simulations for various parameter values, we examine the properties of these intervals (true confidence level and length) and tests (actual significance level and power).
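The numerical comparison mentioned in this abstract can be illustrated with a minimal sketch for the binomial case. The functions `binom_cdf` and `normal_approx` and the parameters n = 20, p = 0.5, k = 8 are assumptions chosen here for illustration; they are not taken from the thesis.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx(k, n, p, correction):
    """Normal approximation of P(X <= k), optionally evaluated at k + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    shift = 0.5 if correction else 0.0
    return NormalDist(mu, sigma).cdf(k + shift)


if __name__ == "__main__":
    n, p, k = 20, 0.5, 8
    exact = binom_cdf(k, n, p)
    plain = normal_approx(k, n, p, correction=False)
    corrected = normal_approx(k, n, p, correction=True)
    print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
```

For these illustrative parameters the corrected approximation lies much closer to the exact binomial probability than the uncorrected one.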


Neighborhood components analysis and machine learning
Hanousek, Jan ; Antoch, Jaromír (advisor) ; Maciak, Matúš (referee)
In this thesis we focus on the NCA algorithm, which is a modification of the k-nearest neighbors algorithm. Following a brief introduction to classification algorithms, we give an overview of the KNN algorithm, its strengths and flaws, and what led to the creation of NCA. Then we discuss two of the most widely used modifications of NCA, called Fast NCA and Kernel (fast) NCA, the latter of which implements the so-called kernel trick. An integral part of this thesis is also a proposed algorithm based on KNN (/NCA) and linear discriminant analysis, titled TSKNN (/TSNCA), respectively. We conclude the thesis with a detailed study of two real-life financial problems and compare all the algorithms introduced in this thesis based on their performance in these tasks.


Bayesian factor analysis
Vávra, Jan ; Komárek, Arnošt (advisor) ; Maciak, Matúš (referee)
Factor analysis is a method that allows a high-dimensional random vector of measurements to be approximated by linear combinations of a much smaller number of hidden factors. The classical estimation procedure for this model relies on the choice of the number of factors, the decomposition of the variance matrix subject to identification conditions, and an appropriate choice of rotation for better interpretation of the model. This model is transferred into the Bayesian framework, which, unlike the classical approach, allows the use of prior information. The number of hidden factors can be treated as a random parameter, and the dependence of each measurement on at most one factor can be enforced by a suitable specification of the prior distribution. Estimates of the model parameters are based on the posterior distribution, which is approximated by Markov chain Monte Carlo methods. The Bayesian approach addresses the selection of the number of factors, model estimation, identifiability, and interpretability at the same time. The ability to estimate the true number of hidden factors is tested in a simulation study.


Joinpoint Regression
Lain, Michal ; Maciak, Matúš (advisor) ; Hlávka, Zdeněk (referee)
The topic of this thesis is joinpoint regression: the description of the model, its properties, and its construction. We are interested in methods for estimating its parameters and show practical use of the model. In the first chapter we define the model and describe its alternative forms and properties. In the second chapter we focus on estimating the model parameters; we briefly mention Hudson's method, profile likelihood, grid search, and LASSO, as well as the likelihood ratio for testing hypotheses about parameter values. The third chapter deals with comparing models with different numbers of break points using permutation tests and information criteria. In the fourth chapter we work through practical examples showing diverse applications of the model, and we compare the methods using simulations.


Structural Equation Models with Application in Social Sciences
Veselý, Václav ; Pešta, Michal (advisor) ; Maciak, Matúš (referee)
We investigate the possible use of the Errors-in-Variables (EIV) estimator when estimating structural equation models (SEM). Structural equation modelling provides a framework for analysing complex relations among a set of random variables where, for example, the response variable in one equation plays the role of a predictor in another equation. First, an overview of SEM and some common covariance-based estimators is provided. The special case of the linear regression model is investigated, showing that the covariance-based estimators yield the same results as ordinary least squares. A compact review of EIV models follows. Errors-in-Variables models are regression models in which not only the response but also the predictors are assumed to be measured with error. The main contribution of this paper lies in defining modifications of the EIV estimator that fit the SEM framework. A general optimization problem for estimating the parameters of a structural equation model with errors-in-variables is postulated. Several modifications of two-stage least squares are also proposed for future research. An equation-wise Errors-in-Variables estimator is proposed to estimate the coefficients of a structural equation model. The coefficients of every structural equation are estimated separately using the EIV estimator. Some theoretical conditions...


Statistical inference in varying coefficient models
Splítek, Martin ; Maciak, Matúš (advisor) ; Pešta, Michal (referee)
This thesis deals with varying coefficient models, with a focus on statistical inference. The main idea of these models is to use regression coefficients that change depending on some effect modifier, instead of the constant coefficients of classical linear regression. First we define these models and their estimation procedures, several variants of which have been published so far. Estimation uses local regression or various kinds of splines: smoothing, polynomial, or penalized. The statistical inference follows from the estimation method; for it we present the derived bias, variance, asymptotic normality, confidence bands, and hypothesis testing. The main goal of our work is to compactly summarize selected methods and their inference. Finally, a procedure for variable selection is proposed.


Regularization and variable selection in regression models
Lahodová, Kateřina ; Komárek, Arnošt (advisor) ; Maciak, Matúš (referee)
This diploma thesis focuses on regularization and variable selection in regression models. The basics of penalised likelihood and generalized linear models, and their evaluation and comparison based on prediction quality and variable selection, are described. The LASSO and LARS methods for variable selection in normal linear regression are briefly introduced. The main topic of this thesis is a method called Boosting. The general Boosting algorithm is introduced, including functional gradient descent, followed by the selection of a base procedure, especially the componentwise linear least squares method. Two specific applications of the general Boosting algorithm are introduced, with derivations of some important characteristics. These methods are AdaBoost for data with a conditional binomial distribution and L2Boosting for a conditional normal distribution. Finally, a simulation study comparing the LASSO, LARS and L2Boosting methods was conducted. It shows that LASSO and LARS are more suitable for variable selection, whereas L2Boosting is better suited to prediction on new data.
