Extracting Information from Medical Texts
Zvára, Karel ; Svátek, Vojtěch (advisor) ; Veselý, Arnošt (referee) ; Skalská, Hana (referee)
The aim of my work was to identify the specific features of Czech medical reports with respect to the possibility of extracting specific information from them. I worked with a total of 268 anonymized narrative medical reports from two outpatient departments. I studied standards for storing electronic health records and for transferring clinical information between healthcare information systems. I also participated in the implementation of an electronic medical record in the field of dentistry. First, I tried to process the narrative medical reports using natural language processing (NLP) tools. I concluded that narrative medical reports in Czech differ greatly from typical Czech text, mainly because they consist largely of short telegraphic phrases and lack typical Czech sentence structure. They also contain many misspellings, acronyms and abbreviations. Another problem was the absence of Czech translations of the main international classification systems. I therefore decided to continue the research by developing a method for pre-processing the input text for translation and for its semantic annotation. The main objective of this part of the research was to propose a method and support software for interactive correction...
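The pre-processing problem described above (misspellings, acronyms and abbreviations in telegraphic text) can be illustrated with a minimal sketch. It is not the method proposed in the thesis: it assumes a hypothetical abbreviation table (`ABBREVIATIONS`) and a hypothetical lexicon (`LEXICON`), and it uses Python's standard `difflib.get_close_matches` as a stand-in similarity measure. Each token is paired with a proposed replacement so that a user can accept or reject it, in the spirit of interactive correction.

```python
import difflib
import re

# HYPOTHETICAL abbreviation table for illustration only; a real system
# would need a curated, domain-specific list.
ABBREVIATIONS = {"pac.": "pacient", "st.p.": "status post"}

# HYPOTHETICAL lexicon of correctly spelled word forms.
LEXICON = ["pacient", "udává", "bolest", "hlavy", "status", "post"]


def tokenize(text):
    """Lower-case the text and split it into tokens, keeping dotted
    abbreviations such as "st.p." together as one token."""
    return re.findall(r"\w+(?:\.\w+)*\.?", text.lower())


def suggest(token, lexicon=LEXICON, cutoff=0.8):
    """Propose a replacement for one token."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]          # expand a known abbreviation
    word = token.rstrip(".")                 # drop a sentence-final period
    if word in lexicon:
        return word                          # already a known word form
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    # If nothing is similar enough, keep the token for a human to review.
    return matches[0] if matches else word


def preprocess(text):
    """Pair each token with its proposed replacement so that a user can
    accept or reject every change interactively."""
    return [(token, suggest(token)) for token in tokenize(text)]


if __name__ == "__main__":
    for original, proposal in preprocess("Pac. udává bolset hlavy"):
        print(original, "->", proposal)
```

For example, `preprocess("Pac. udává bolset hlavy")` pairs `pac.` with `pacient` and the misspelled `bolset` with `bolest`, and leaves the correctly spelled tokens unchanged. A realistic tool would need a far larger lexicon and a morphological analyser, because Czech inflection produces many valid word forms that a plain similarity match could wrongly "correct".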
The language of medical reports and its information-lexical analysis
Přečková, Petra ; Zvárová, Jana (advisor) ; Hanzlíček, Petr (referee) ; Skalská, Hana (referee)
The objective of the dissertation thesis has been the information-lexical analysis of Czech medical reports and an assessment of the usability of international classification systems in the Czech healthcare environment. The analysis of medical reports has been based on the attributes of the Minimal Data Model for Cardiology (MDMC). Both narrative medical reports and structured medical reports from the ADAMEK software application have been used, together with the SNOMED CT and ICD-10 classification systems. The thesis compares how well the MDMC attributes are recorded in narrative and in structured medical reports, and it presents a linguistic analysis of Czech narrative medical reports. A new application for measuring diversity in medical reports written in any language is proposed. The application is based on general concepts of diversity derived from f-diversity, relative f-diversity, self f-diversity and marginal f-diversity. The thesis concludes that the use of free text in medical reports is inconsistent and non-standardized. Standardized terminology would benefit physicians, patients, administrators, software developers and payers, and it would help healthcare providers by providing complete and easily accessible information that belongs to the process of...
Quality measures of classification models and their conversion
Hanusek, Lubomír ; Hebák, Petr (advisor) ; Řezanková, Hana (referee) ; Skalská, Hana (referee)
The predictive power of classification models can be evaluated by various measures. The most popular measures in data mining (DM) are the Gini coefficient, the Kolmogorov-Smirnov statistic and lift. Each of these measures is calculated in a completely different way. An analyst who is used to one of them may find it difficult to assess the predictive power of a model evaluated by another. The aim of this thesis is to develop a method for converting one performance measure into another. Although the thesis focuses mainly on the above-mentioned measures, it also deals with other measures such as sensitivity, specificity, total accuracy and the area under the ROC curve. When developing DM models, you may need to work with a sample stratified by the values of the target variable Y instead of the whole population, which may contain millions of observations. If you evaluate a model developed on stratified data, you may need to convert these measures to the whole population. This thesis describes how to carry out this conversion. A software application (CPM) that performs all these conversions is part of this thesis. With this application you can not only convert one performance measure into another, but also convert measures calculated on a stratified sample to the whole population. Besides the above-mentioned performance measures (sensitivity, specificity, total accuracy, Gini coefficient, Kolmogorov-Smirnov statistic), CPM also generates a confusion matrix and performance charts (lift chart, gains chart, ROC chart and KS chart). The thesis includes the user manual for this application as well as the web address from which it can be downloaded. The theory described in this thesis was verified on real data.
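Some of the relations between these measures can be sketched in a few lines. The sketch is not the CPM application or the conversion method of the thesis; it only uses standard definitions: AUC by the trapezoidal rule, the Gini coefficient as 2*AUC - 1, KS as the largest gap between TPR and FPR (assuming higher scores indicate the target class), and a prevalence reweighting for measures that depend on the share of positives. The function names and the binary label coding (1 = target class) are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the cutoff from the highest score.

    Labels are 1 for the target class and 0 otherwise; observations with
    tied scores are added together so the curve does not depend on input order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_gini_ks(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Area under the ROC curve, Gini coefficient (2*AUC - 1) and KS statistic."""
    pts = roc_points(scores, labels)
    # Trapezoidal rule over consecutive ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    gini = 2 * auc - 1
    # KS: largest gap between the class-conditional score distributions,
    # assuming higher scores indicate the target class.
    ks = max(tpr - fpr for fpr, tpr in pts)
    return auc, gini, ks


def measures_at_prevalence(tpr: float, fpr: float, prevalence: float) -> Dict[str, float]:
    """Prevalence-dependent measures at one cutoff.

    TPR (sensitivity) and FPR (1 - specificity) are conditional on Y, so they
    are the same in a Y-stratified sample and in the population; only the
    share of positives (prevalence) has to be swapped in.
    """
    selected = prevalence * tpr + (1 - prevalence) * fpr  # share above the cutoff
    if selected == 0:
        raise ValueError("no observations above the cutoff")
    precision = prevalence * tpr / selected
    return {
        "selected": selected,
        "precision": precision,
        "lift": precision / prevalence,
        "accuracy": prevalence * tpr + (1 - prevalence) * (1 - fpr),
    }
```

For instance, with TPR = 0.7 and FPR = 0.1 at some cutoff, `measures_at_prevalence` gives a lift of 1.75 in a sample stratified to 50 % positives but about 5.38 at a population prevalence of 5 %, and total accuracy moves from 0.80 to 0.89. TPR and FPR, and therefore the ROC curve, AUC, Gini and KS, stay the same, which is why only the prevalence-dependent measures need converting.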

