National Repository of Grey Literature 3 records found
Empirical comparison of imputation methods for missing values in data
Ostrenska, Alona ; Holý, Vladimír (advisor) ; Zouhar, Jan (referee)
Missing values occur in all types of data, such as surveys and social-science datasets. In many applications, missing observations must be replaced to maintain the dataset size needed for statistical analysis. This bachelor thesis first introduces the categories of causes of missing data and the problems connected with them. It then presents common methods for imputing missing values and explains how to apply them to real data in the context of linear regression. Next, the assumptions of linear regression models built on data with artificially created missing observations are verified. The observations are removed using the mechanisms mentioned above and in different proportions, and the gaps are then filled with seven imputation methods. Regression models built on the imputed datasets are then statistically verified. Finally, the imputation models are compared using various statistics and visualizations, and particular methods are suggested as a possible solution for real problems of incomplete data.
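The experimental loop described in this abstract (delete values artificially, impute, refit a regression, compare) can be sketched roughly as follows. This is a minimal illustration and not the thesis's code: the synthetic data, the function names `make_mcar`, `mean_impute`, and `ols_fit`, the use of a completely-at-random deletion mechanism, and mean imputation as the single imputation method are all assumptions chosen for brevity. The thesis itself compares seven methods and several mechanisms.

```python
import numpy as np


def make_mcar(x, proportion, rng):
    """Return a copy of x with a given proportion of entries set to NaN.

    Entries are chosen uniformly at random, i.e. missing completely at
    random (an assumed mechanism for this sketch).
    """
    x_missing = x.astype(float).copy()
    n_missing = int(round(proportion * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x_missing[idx] = np.nan
    return x_missing


def mean_impute(x):
    """Replace every NaN with the mean of the observed values."""
    filled = x.copy()
    filled[np.isnan(filled)] = np.nanmean(x)
    return filled


def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


# Hypothetical synthetic data standing in for the real dataset.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

# Reference fit on the complete data.
full_fit = ols_fit(x, y)

# Delete, impute, and refit for several proportions of missing values.
results = {}
for proportion in (0.1, 0.3, 0.5):
    x_missing = make_mcar(x, proportion, rng)
    x_imputed = mean_impute(x_missing)
    results[proportion] = ols_fit(x_imputed, y)
```

Comparing each entry of `results` with `full_fit` gives one simple measure of how much a given imputation method distorts the regression; other methods could be swapped in for `mean_impute` in the same loop.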
Using data mining methods for demographic survey data processing
Fišer, David ; Šídlo, Luděk (advisor) ; Kraus, Jaroslav (referee)
The goal of the thesis was to describe and demonstrate the principles of the process of knowledge discovery in databases, or data mining (DM). The theoretical part describes selected data mining methods and the basic principles of those DM techniques. In the second part, a DM task is carried out in accordance with the CRISP-DM methodology. The practical part is divided into two parts and uses data from the American Community Survey. The first part contains a classification task whose goal was to determine whether the selected DM techniques can be used to handle missing data in surveys. The success rate of the classifications, and of the subsequent prediction of data values in selected attributes, was in the 55-80% range. The second part focused on discovering knowledge of interest using association rules and the GUHA method. Keywords: data mining, knowledge discovery in databases, statistical surveys, missing values, classification, association rules, GUHA method, ACS
A comparison of statistical packages such as SPSS, STATISTICA, and SAS Enterprise Guide.
Torboněnko, Natalja ; Řezanková, Hana (advisor) ; Pecáková, Iva (referee)
The purpose of this bachelor thesis is to compare three statistical software packages applied to the analysis of categorical data: SPSS, STATISTICA, and SAS Enterprise Guide. They provide a wide variety of statistical and graphical techniques and enable users to obtain the results of statistical procedures without programming. The comparison is based on features such as data input, the ability to define missing values, value labels, producing frequency tables (including multiple response), creating bar graphs, descriptive statistics, the binomial test, and the chi-square goodness-of-fit test.
