Original title:
Automatické přiřazení diagnoz lékařským zprávám
Translated title:
Automatic assignment of diagnosis to medical reports
Authors:
Lachata, Adrián; Hana, Jiří (advisor); Vidová Hladká, Barbora (referee)
Document type:
Bachelor's theses
Year:
2014
Language:
Slovak
Abstract:
The goal of the thesis is to examine the success rate of automatically assigning diagnosis codes (ICD-10) to medical reports written in Czech. We used machine learning and text classification algorithms such as Naive Bayes and decision trees. The WEKA program was used for classification. Feature selection and data preprocessing were performed by our own program, created exclusively for this purpose. The key features of the program are feature selection based on information gain (IG) or pointwise mutual information (PMI), text lemmatization, and stopword generation based on inverse document frequency (IDF). We examined the I10 diagnosis most closely, but results were also processed for H660, J00, K30 and Z001. As a point of interest, we include a comparison of automatic assignment of I10 versus manual assignment directly by doctors on a sample of 100 reports. In total, our data set comprised about one million medical reports.
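The abstract names the preprocessing steps but not their formulas. The Python sketch below shows one standard way to compute IDF-based stopwords and IG or PMI feature scores for a single binary label, assuming each report has already been lemmatized into a set of tokens and is labelled 1 if it carries the target code (e.g. I10). The function names (`idf_scores`, `idf_stopwords`, `information_gain`, `pmi`, `select_features`), the IDF threshold and the toy reports are hypothetical illustrations, not the thesis's own program; the classification step itself (Naive Bayes and decision trees in WEKA) is not shown.

```python
import math
from collections import Counter

# Hypothetical helpers: a sketch of IDF-based stopword generation and
# IG / PMI feature scoring for one binary ICD-10 label (e.g. "coded I10").
# Each document is a set of (already lemmatized) tokens.

def idf_scores(docs):
    """IDF(t) = ln(N / df(t)) for every term in the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t: math.log(n / c) for t, c in df.items()}

def idf_stopwords(docs, threshold):
    """Terms whose IDF is below `threshold`, i.e. terms found in almost every report."""
    return {t for t, v in idf_scores(docs).items() if v < threshold}

def _entropy(*counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """IG = H(C) - H(C | term present/absent), in bits."""
    n = len(docs)
    pos = sum(labels)
    h_cond = 0.0
    for present in (True, False):
        part = [y for d, y in zip(docs, labels) if (term in d) == present]
        if part:
            p = sum(part)
            h_cond += len(part) / n * _entropy(p, len(part) - p)
    return _entropy(pos, n - pos) - h_cond

def pmi(docs, labels, term):
    """PMI(term, positive class) = log2(P(t, c) / (P(t) * P(c)))."""
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(labels)
    n_tc = sum(1 for d, y in zip(docs, labels) if y and term in d)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log2(n_tc * n / (n_t * n_c))

def select_features(docs, labels, k, score=information_gain):
    """Top-k terms by `score`; ties broken alphabetically for determinism."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-score(docs, labels, t), t))[:k]

# Toy data (invented for illustration, not from the thesis data set).
docs = [
    {"hypertenze", "tlak", "pacient"},
    {"hypertenze", "pacient", "lek"},
    {"kasel", "pacient", "ryma"},
    {"ryma", "pacient", "horecka"},
]
labels = [1, 1, 0, 0]  # 1 = report coded I10 (hypothetical)

print(idf_stopwords(docs, 0.1))                     # {'pacient'}
print(select_features(docs, labels, 2))             # ['hypertenze', 'ryma']
print(select_features(docs, labels, 3, score=pmi))  # ['hypertenze', 'lek', 'tlak']
```

On this toy data the two scores behave differently: IG also rewards a term that is informative by its absence (`ryma` never occurs in an I10 report), while PMI ranks terms seen only once, and only with the class (`lek`, `tlak`), as high as `hypertenze`, reflecting PMI's known tendency to favor low-frequency terms.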
Keywords:
ICD-10; machine learning; text classification
Institution: Charles University Faculties (theses)
Document availability information: Available in the Charles University Digital Repository. Original record: http://hdl.handle.net/20.500.11956/71541