Title: Slabiková komprese
Translated title: Syllable-Based Compression
Authors: Lánský, Jan ; Pokorný, Jaroslav (supervisor) ; Dvorský, Jiří (opponent) ; Grabowski, Szymon (opponent)
Document type: Doctoral dissertation
Year: 2009
Language: eng
Abstract: Classic textual compression methods work over an alphabet of characters or an alphabet of words. For languages with rich morphology, as well as for the compression of smaller files, it can be advantageous to use an alphabet of syllables. For some compression methods, such as those based on the Burrows-Wheeler transform, the syllable is a reasonable choice for large files too, even for languages with quite simple morphology. Although the main goal of our research is compression over the alphabet of syllables, all implemented methods can also compress over the alphabet of words. For small files we use the LZW method and Huffman coding. These methods were improved by using an initialized dictionary containing characteristic syllables specific to the given language. For the compression of very large files we implemented the XBW project, which allows combining the compression methods BWT, MTF, RLE, PPM, LZC, and LZSS. We have also tried to compress XML files that are not well-formed. When compressing over a large alphabet, it is also necessary to compress the alphabet itself. We have proposed two solutions. The first one works well especially for small documents: we initialize the compression method with a set of characteristic syllables, whereas other syllables are coded character by character when necessary. The...
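The first solution mentioned in the abstract (a seeded set of characteristic syllables, with other syllables spelled out character by character) might be sketched roughly as below. This is an illustration only, not the thesis implementation: the `ESCAPE` code, the length-prefixed spelling, the integer output format, and the function names `encode` and `decode` are all assumptions, and the input is assumed to be already split into syllables.

```python
# Illustrative sketch only; names and output format are assumptions,
# not taken from the thesis.
ESCAPE = 0  # reserved code: the next syllable is spelled out by characters


def encode(syllables, seed):
    """Encode a list of syllables as integer codes using a seeded table."""
    table = {s: i + 1 for i, s in enumerate(seed)}  # code 0 is ESCAPE
    out = []
    for syl in syllables:
        if syl in table:
            out.append(table[syl])
        else:
            # Unknown syllable: ESCAPE, its length, then one code point per char.
            out.append(ESCAPE)
            out.append(len(syl))
            out.extend(ord(ch) for ch in syl)
            table[syl] = len(table) + 1  # learn it for later occurrences
    return out


def decode(codes, seed):
    """Invert encode(); the seed must be the same list used for encoding."""
    table = [None] + list(seed)  # index 0 is unused because it means ESCAPE
    out = []
    i = 0
    while i < len(codes):
        code = codes[i]
        i += 1
        if code == ESCAPE:
            n = codes[i]
            i += 1
            syl = "".join(chr(c) for c in codes[i:i + n])
            i += n
            table.append(syl)  # mirrors the encoder's table growth
            out.append(syl)
        else:
            out.append(table[code])
    return out


seed = ["com", "pres", "sion"]  # hypothetical characteristic syllables
text = ["com", "pres", "sion", "xbw", "com", "xbw"]
codes = encode(text, seed)
restored = decode(codes, seed)
```

In this sketch the second occurrence of `xbw` costs a single code, because both sides add the spelled-out syllable to their tables in the same order; a real coder would pass the integer stream on to an entropy coder such as Huffman coding.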

Institution: Faculties of Charles University (VŠKP)
Document availability: Available in the Charles University digital repository.
Original record: http://hdl.handle.net/20.500.11956/84670

NUŠL permanent link: http://www.nusl.cz/ntk/nusl-354464


The record is included in these collections:
Education > Public universities > Charles University > Faculties of Charles University (VŠKP)
University theses > Doctoral dissertations
Record created on 2017-06-20, last modified on 2022-03-04.


No document attached