National Repository of Grey Literature: 12 records found
High-performance exploration and querying of selected multi-dimensional spaces in life sciences
Kratochvíl, Miroslav ; Bednárek, David (advisor) ; Glaab, Enrico (referee) ; Svozil, Daniel (referee)
This thesis studies, implements and experiments with specific application-oriented approaches for exploring and querying multi-dimensional datasets. The first part of the thesis scrutinizes indexing of the complex space of chemical compounds, and details the design of a high-performance retrieval system for small molecules. The resulting system is then utilized within a wider context of federated search in heterogeneous data and metadata related to the chemical datasets. In the second part, the thesis focuses on fast visualization and exploration of many-dimensional data that originate from single-cell cytometry. Self-organizing maps are used to derive fast methods for analysis of the datasets, and as a base for a novel data visualization algorithm. Finally, a similar approach is utilized for highly interactive exploration of multimedia datasets. The main contributions of the thesis comprise the advancement in optimization and methods for querying the chemical data implemented in the Sachem database cartridge, the federated, SPARQL-based interface to Sachem that provides the heterogeneous search support, the dimensionality reduction algorithm EmbedSOM, design and implementation of the specific EmbedSOM-backed analysis tool for flow and mass cytometry, and design and implementation of the multimedia...
Quantitative structure-activity relationship and machine learning
Nierostek, Jakub ; Uhlík, Filip (advisor) ; Svozil, Daniel (referee)
Quantitative structure-activity relationship (QSAR) computational methods allow us to examine the relationship between the chemical structure of molecules and their chemical or biological properties. Widely used machine learning methods, such as deep learning models, can be applied to QSAR calculations. In this work, we construct a pipeline for training QSAR machine-learning models that can predict molecular toxicity. Furthermore, we investigate the effect of molecular representation on model performance. Both our deep learning models and traditional machine learning models are applied to the Tox21 and Ames Mutagenicity datasets. Their performance is evaluated against recently published models for toxicity prediction using the AUC-ROC metric and, for certain toxicity targets, shows improvement over these models. Keywords: QSAR, machine learning, deep learning, molecular descriptors
Similarity Search in Protein Databases
Hoksza, David ; Skopal, Tomáš (advisor) ; Navarro, Gonzalo (referee) ; Svozil, Daniel (referee)
One of the principal operations in the area of bioinformatics is similarity assessment at the levels of protein sequence (string of characters) and protein structure (3D shape). It is employed in a wide range of applications such as protein structure prediction, protein function assessment, automatic classification, etc. The protein databases have been growing exponentially in recent years, thus making the existing methods for similarity retrieval inappropriate concerning the volume of the protein-related data. In this thesis, we focus on similarity retrieval on protein sequence and structure levels. At both levels, we propose improvements to the existing methods, as well as novel methods for managing proteins from the similarity perspective. In the first part of the thesis we approach the problem of similarity retrieval at protein sequence level. First, we evaluate the possibilities of utilizing metric access methods for efficient storing and retrieval of protein sequences. Then, we focus on the protein similarity measure itself. Since the similarity computation of protein sequences is based on dynamic programming, we introduce an improvement for increasing efficiency (response time) of the retrieval by reusing parts of the dynamic programming matrix, while maintaining original effectiveness (quality of...
Similarity search in Mass Spectra Databases
Novák, Jiří ; Skopal, Tomáš (advisor) ; Svozil, Daniel (referee) ; Nahnsen, Sven (referee)
Shotgun proteomics is a widely known technique for identification of protein and peptide sequences from an "in vitro" sample. A tandem mass spectrometer generates tens of thousands of mass spectra which must be annotated with peptide sequences. For this purpose, similarity search in a database of theoretical spectra generated from a database of known protein sequences can be utilized. Since the sizes of databases have grown rapidly in recent years, there is a demand for utilization of various database indexing techniques. We investigate the capabilities of (non)metric access methods as database indexing techniques for fast and approximate similarity retrieval in mass spectra databases. We show that the method for peptide sequence identification is more than 100x faster than a sequential scan over the entire database while more than 90% of spectra are correctly annotated with peptide sequences. Since the method is currently suitable for small mixtures of proteins, we also utilize a precursor mass filter as the database indexing technique for complex mixtures of proteins. The precursor mass filter followed by ranking of spectra by a modification of the parametrized Hausdorff distance outperforms state-of-the-art tools in the number of identified peptide sequences and the speed of search. The...
Similarity Search in Protein Structure Databases
Galgonek, Jakub ; Skopal, Tomáš (advisor) ; Porto, Markus (referee) ; Svozil, Daniel (referee)
Proteins are one of the most important biopolymers, having a wide range of functions in living organisms. Their huge functional diversity is achieved by their ability to fold into various 3D structures. Moreover, it has been shown that proteins sharing similar structure often also share other properties (e.g., a biological function, an evolutionary origin, etc.). Therefore, protein structures and methods to identify their similarities are widely studied. In this thesis, we introduce a system allowing similarity search in protein structure databases. The system retrieves, given a query structure, all database structures that are similar to the query structure. It employs several key components. We have introduced a novel similarity measure assigning similarity scores to pairs of protein structures. We have designed a specific access method based on LAESA metric indexing and using the proposed measure. The access method allows searching for similar structures more efficiently than a sequential scan of the database. To achieve further speedup, the measure and the access method have been parallelized, resulting in almost linear speedup with respect to the number of available cores. The last component is a web user interface that accepts a query structure and presents a list of...
Computational study of short peptides and miniproteins in different environments
Vymětal, Jiří ; Vondrášek, Jiří (advisor) ; Svozil, Daniel (referee) ; Berka, Karel (referee)
Apart from their biological functions, peptides are of utmost importance as models for the unfolded, denatured or disordered states of proteins. Similarly, miniproteins such as Trp-cage have proven their role as simple models in both experimental and theoretical studies of protein folding. Molecular dynamics and computer simulations can provide a unique insight into processes at the atomic level. However, simulations of peptides and miniproteins face two cardinal problems: inaccuracy of force fields and inadequate conformational sampling. Both principal issues were tackled in this thesis. Firstly, the differences among several force fields for peptides and proteins were examined. We demonstrated the inability of the force fields used to consistently predict the intrinsic conformational preferences of individual amino acids in the form of dipeptides, and the source of the discrepancies was traced. In order to shed light on the nature of conformational ensembles under various denaturing conditions, we studied host-guest AAXAA peptides. The simulations revealed that thermal denaturation and chemical denaturation by urea produce qualitatively different ensembles and shift the propensities of individual amino acids to particular conformers. The problem of insufficient conformational sampling was dealt with by introducing gyration- and...
