
A Robustified Metalearning Procedure for Regression Estimators
Kalina, Jan ; Neoral, A.
Metalearning represents a useful methodology for selecting and recommending a suitable algorithm or method for a new dataset by exploiting a database of training datasets. While metalearning is potentially beneficial for the analysis of economic data, we must be aware of its instability and sensitivity to outlying measurements (outliers) as well as measurement errors. The aim of this paper is to robustify the metalearning process. First, we prepare some useful theoretical tools exploiting the idea of implicit weighting, inspired by the least weighted squares estimator. These include a robust coefficient of determination, a robust version of the mean square error, and a simple rule for outlier detection in linear regression. We then perform a metalearning study for recommending the best linear regression estimator for a new dataset (not included in the training database). The prediction of the optimal estimator is learned over a set of 20 real datasets with economic motivation, and the least squares estimator is compared with several (highly) robust estimators. We also investigate the effect of variable selection on the metalearning results. If both the training and validation data are considered after a proper robust variable selection, the metalearning performance improves remarkably, especially if a robust prediction error is used.
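The implicit-weighting idea behind the least weighted squares estimator can be sketched as follows: squared residuals are sorted, and weights decrease with rank, so the largest (potentially outlying) residuals are downweighted or ignored. This is an illustrative sketch only; the linearly decreasing weight function and the trimming constant are our own assumptions, not the paper's exact choices.

```python
import numpy as np

def implicit_weighted_mse(residuals, trim=0.75):
    """Robust MSE in the spirit of least weighted squares:
    squared residuals are sorted in ascending order and weighted by
    rank, so the largest residuals receive weight zero.
    The linearly decreasing weight function is an illustrative choice."""
    r2 = np.sort(np.asarray(residuals, dtype=float) ** 2)  # ascending squared residuals
    n = len(r2)
    h = int(np.ceil(trim * n))                    # number of observations effectively kept
    w = np.maximum(1.0 - np.arange(n) / h, 0.0)   # weights decrease linearly, hit zero at rank h
    return float(np.sum(w * r2) / np.sum(w))

# An outlier inflates the classical MSE but barely affects the robust version.
res = np.array([0.1, -0.2, 0.15, -0.1, 8.0])      # last residual is an outlier
classical = float(np.mean(res ** 2))
robust = implicit_weighted_mse(res)
```

Here `classical` is dominated by the single outlying residual, while `robust` stays close to the scale of the clean residuals, which is precisely why a robust prediction error changes the metalearning results.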


Bayesian Networks for the Analysis of Subjective Well-Being
Švorc, Jan ; Vomlel, Jiří
We use Bayesian networks to model the influence of diverse socio-economic factors on subjective well-being and their interrelations. Classical statistical analysis aims at finding significant explanatory variables, while Bayesian networks can also help sociologists to explain and visualize the problem in its complexity. Using Bayesian networks, sociologists may gain a deeper insight into the interplay of all measured factors and their influence on the variable of special interest. In the paper we present several Bayesian network models, each being optimal from a different perspective. We show how important it is to pay special attention to the local structure of conditional probability tables. Finally, we present results of an experimental evaluation of the suggested approaches based on real data from a large international survey. We believe that the suggested approach is well applicable to other sociological problems and that Bayesian networks represent a valuable new tool for sociological research.


Application of the Cox regression model with time-dependent parameters to unemployment data
Volf, Petr
The contribution deals with the application of statistical survival analysis with the intensity described by a generalized version of the Cox regression model with time-dependent parameters. A method of nonparametric estimation of the model components is recalled, and the flexibility of the result is assessed with a goodness-of-fit test based on martingale residuals. The application concerns real data representing the development and reduction of job opportunities during a given period. The risk of leaving the company changes in time and depends also on the age of employees and their time with the company. Both of these covariates are considered and their impact on the risk is analyzed.
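The intensity structure behind such a generalized model can be written in the standard form of a Cox model with time-dependent regression parameters (the notation here is illustrative, not taken from the paper):

```latex
\lambda(t \mid x) \;=\; \lambda_0(t)\,\exp\!\bigl\{\beta(t)^{\top} x\bigr\},
```

where $\lambda_0(t)$ is the baseline hazard, $x$ is the vector of covariates (here the employee's age and time with the company), and $\beta(t)$ collects the time-dependent regression parameters; the classical Cox model is recovered when $\beta(t) \equiv \beta$ is constant in time.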


Detection of weak signals in noise
Tichavský, Petr
In this report, a bound on the amplitude of an impulse signal detectable in stationary background noise is computed. It was evaluated for an experiment with acoustic emission from a mechanically strained sample.


Discrete Dynamic Endogenous Growth Model: Derivation, Calibration and Simulation
Kodera, J. ; Van Tran, Q. ; Vošvrda, Miloslav
Endogenous economic growth models were developed to improve traditional growth models with exogenous technological changes. There are several approaches to incorporating technological progress into a growth model. Romer was the first author to introduce it by expanding the variety of intermediate goods. Growth models are, however, often formulated in continuous time. In our paper we formulate a discrete version of Romer's model with endogenous technological change based on an expanding variety of intermediates, both in the final good sector and in the research and development sector, where the target is to maximize the present value of the returns from discovering intermediate goods, which should exceed the introduction costs. This discrete version is then calibrated on a numerical example. Our aim is to find the solution and analyse the development of economic variables with respect to external changes.


How to downweight observations in robust regression: A metalearning study
Kalina, Jan ; Pitra, Z.
Metalearning is becoming an increasingly important methodology for transferring knowledge from a database of available training datasets to a new (independent) dataset. The concept of metalearning is becoming popular in statistical learning, and there is an increasing number of metalearning applications also in the analysis of economic datasets. Still, not much attention has been paid to its limitations and disadvantages. To investigate these, we apply various linear regression estimators (including highly robust ones) over a set of 30 datasets with an economic background and perform a metalearning study over them, as well as over the same datasets after an artificial contamination.


Nonparametric Bootstrap Techniques for Implicitly Weighted Robust Estimators
Kalina, Jan
The paper is devoted to highly robust statistical estimators based on implicit weighting, which have the potential to find econometric applications. Two particular methods include a robust correlation coefficient based on the least weighted squares regression and the minimum weighted covariance determinant estimator, where the latter allows estimating the mean and covariance matrix of multivariate data. New tools are proposed that allow testing hypotheses about these robust estimators or estimating their variance. The techniques considered in the paper include resampling approaches with or without replacement, i.e. permutation tests, bootstrap variance estimation, and bootstrap confidence intervals. The performance of the newly described tools is illustrated on numerical examples. They reveal the suitability of the robust procedures also for non-contaminated data, as their confidence intervals are not much wider compared to those for standard maximum likelihood estimators. While resampling without replacement turns out to be more suitable for hypothesis testing, bootstrapping with replacement yields reliable confidence intervals but not corresponding hypothesis tests.
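The nonparametric bootstrap with replacement can be sketched as follows for a bivariate statistic. This is a generic sketch, not the paper's implementation: as a stand-in statistic we use the ordinary correlation coefficient, where the paper would plug in its robust correlation based on least weighted squares.

```python
import numpy as np

def bootstrap_variance(x, y, statistic, n_boot=1000, seed=0):
    """Nonparametric bootstrap: resample (x_i, y_i) pairs with
    replacement, recompute the statistic on each resample, and return
    its bootstrap variance and a 95% percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample indices with replacement
        stats[b] = statistic(x[idx], y[idx])
    return float(stats.var(ddof=1)), np.percentile(stats, [2.5, 97.5])

# Stand-in statistic (illustrative): the classical correlation coefficient.
corr = lambda a, b: np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)     # strongly correlated synthetic data
var, ci = bootstrap_variance(x, y, corr)
```

Replacing `corr` with a robust estimator leaves the resampling loop unchanged, which is what makes the bootstrap attractive for estimators whose sampling distribution is analytically intractable.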


Experimental comparison of traffic flow models on traffic data
Přikryl, Jan ; Horňák, Ivan
Despite their deficiencies, continuous second-order traffic flow models are still commonly used to derive discrete-time models that help traffic engineers to model and predict traffic flow behaviour on highways. We briefly overview the development of traffic flow theory based on continuous flow-density models of the Lighthill-Whitham-Richards (LWR) type, which lead to the second-order model of Aw-Rascle. We then concentrate on the widely adopted discrete approximation to the LWR model, Daganzo's Cell Transmission Model. The behaviour of the discussed models is demonstrated by comparing the traffic flow prediction based on these models with real traffic data from the southern highway ring of Prague.
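One update step of the Cell Transmission Model can be sketched as follows: the road is divided into cells, and the flow across each cell boundary is the minimum of the upstream demand and the downstream supply implied by a triangular fundamental diagram. The parameter values and boundary conditions below are illustrative assumptions, not those of the Prague case study.

```python
import numpy as np

def ctm_step(rho, v=1.0, w=0.5, rho_jam=1.0, q_max=0.25, dt=0.5, dx=1.0):
    """One update of a Cell Transmission Model with a triangular
    fundamental diagram: v = free-flow speed, w = backward wave speed,
    rho_jam = jam density, q_max = capacity. Requires v*dt/dx <= 1 (CFL)."""
    demand = np.minimum(v * rho, q_max)              # what each cell can send downstream
    supply = np.minimum(w * (rho_jam - rho), q_max)  # what each cell can receive
    q = np.minimum(demand[:-1], supply[1:])          # flows across interior boundaries
    inflow = np.concatenate(([0.0], q))              # closed upstream boundary (no entry)
    outflow = np.concatenate((q, [demand[-1]]))      # free downstream boundary (free exit)
    return rho + (dt / dx) * (inflow - outflow)

# A dense platoon in the first cell followed by light traffic.
rho = np.array([0.8, 0.2, 0.2, 0.2])
for _ in range(10):
    rho = ctm_step(rho)
```

After a few steps the platoon disperses downstream, and densities always stay within the physically admissible range `[0, rho_jam]`, which is the main practical appeal of the scheme.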


Question Selection Methods for Adaptive Testing with Bayesian Networks
Plajner, Martin ; Magauina, A. ; Vomlel, Jiří
The performance of Computerized Adaptive Testing systems, which are used for testing human knowledge, relies heavily on the methods selecting appropriate questions for the tested students. In this article we propose three different question-selection methods that use Bayesian networks as student models. We present the motivation for these methods and their mathematical description. Two empirical datasets, paper tests of specific topics in mathematics and in Czech language for foreigners, were collected for the purpose of testing the methods. All three methods were tested using a simulated testing procedure, and the results are compared across the individual methods. The comparison also includes the sequential selection of questions, to provide a relation to the classical way of testing. The proposed methods behave much better than the sequential selection, which confirms the need for a better selection method. Individually, our methods behave differently, i.e., they select different questions, but the success rate of the model's predictions is very similar for all of them. This motivates further research on this topic to find an ordering between the methods and to find the method which would provide the best possible selections in computerized adaptive tests.
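The abstract does not spell out the three selection criteria, so the following is only an illustrative sketch of one common choice in this setting: greedy selection by expected information gain over a minimal student model with a single binary skill variable. All probabilities are invented for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_question(prior, p_correct):
    """Greedily pick the question whose answer is expected to reduce
    the entropy of the skill variable the most.
    prior: P(skill); p_correct[q, s]: P(question q correct | skill = s)."""
    gains = []
    for pc in p_correct:
        p_yes = float(prior @ pc)                     # P(answer correct)
        post_yes = prior * pc / p_yes                 # P(skill | correct)
        post_no = prior * (1 - pc) / (1 - p_yes)      # P(skill | incorrect)
        expected_h = p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no)
        gains.append(entropy(prior) - expected_h)     # expected information gain
    return int(np.argmax(gains)), gains

prior = np.array([0.5, 0.5])       # P(skill = low), P(skill = high)
p_correct = np.array([
    [0.2, 0.9],   # highly discriminative question
    [0.5, 0.6],   # weakly informative question
    [0.05, 0.1],  # hard for everyone, nearly uninformative
])
best, gains = select_question(prior, p_correct)
```

The discriminative question wins by a wide margin; a sequential selection would ask questions in a fixed order regardless of what is already known about the student, which is why adaptive criteria of this kind outperform it.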
