National Repository of Grey Literature: 15 records found
Quantitative analysis of networked environments to improve performance of information systems
Petříček, Václav ; Pokorný, Jaroslav (advisor) ; Cox, Ingemar J. (referee) ; Snášel, Václav (referee)
In this thesis we encounter networks in three contexts: (i) as the citation networks between documents in the citation databases CiteSeer and DBLP, (ii) as the structure of e-government websites navigated by users, and (iii) as the social networks of users of the photo-sharing site Flickr and the social networking site Yahoo!360. We study the properties of networks present in real datasets, the effects of their structure, and how this structure can be exploited. We analyze the citation networks between computer science publications and compare them to those described by the physics community. We also demonstrate the bias of autonomously collected citation databases and present mathematical models of this bias. We then analyze the link structure of three websites extracted by exhaustive crawls and perform a lab-based user study with 134 participants on these websites. We discuss the structure of the link networks and the performance of participants in locating information on these websites. Finally, we exploit knowledge of users' social networks to provide higher-quality recommendations than current collaborative filtering techniques, and we demonstrate the performance benefit on two real datasets.
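To make the last point concrete, here is a minimal sketch of one simple way a social network can feed a recommender: score each item by how many of the user's contacts liked it, skipping items the user already has. This is not the method from the thesis; the function name `recommend_from_friends`, the dictionary layout, and the toy data are hypothetical.

```python
from collections import Counter


def recommend_from_friends(user, friends, likes, k=3):
    """Rank items liked by a user's social contacts (hypothetical helper).

    friends: dict mapping a user to the set of users they are connected to.
    likes:   dict mapping a user to the set of items they have favourited.
    Returns up to k items the user does not already have, ordered by the
    number of contacts who liked them (ties broken alphabetically).
    """
    owned = likes.get(user, set())
    scores = Counter()
    for friend in friends.get(user, set()):
        for item in likes.get(friend, set()):
            if item not in owned:
                scores[item] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Toy data: "ana" follows "bo" and "cy".
friends = {"ana": {"bo", "cy"}}
likes = {"ana": {"p1"}, "bo": {"p1", "p2", "p3"}, "cy": {"p2", "p4"}}
print(recommend_from_friends("ana", friends, likes))  # ['p2', 'p3', 'p4']
```

A real system would blend such a social signal with collaborative filtering scores rather than use it alone; the pure friend count is kept here only to show where the network enters.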
Information extraction from web pages using extraction ontologies
Labský, Martin ; Berka, Petr (advisor) ; Strossa, Petr (referee) ; Vojtáš, Peter (referee) ; Snášel, Václav (referee)
Automatic information extraction (IE) from various types of text became very popular during the last decade. Owing to information overload, many practical applications can use semantically labelled data extracted from textual sources such as the Internet, e-mails, intranet documents, and even conventional sources such as newspapers and magazines. Applications of IE exist in many areas of computer science, including information retrieval systems, question answering, and website quality assessment. This work focuses on developing IE methods and tools that are particularly suited to extraction from semi-structured documents such as web pages, and to situations where available training data is limited. The main contribution of this thesis is the proposed approach of extended extraction ontologies, which combines extraction evidence from three distinct sources: (1) manually specified extraction knowledge, (2) existing training data, and (3) formatting regularities that are often present in online documents. The underlying hypothesis is that an extraction algorithm using all three types of extraction evidence can achieve better extraction accuracy and robustness. The motivation for this work has been the lack of described methods and tools that exploit these evidence types simultaneously. The thesis first describes a statistically trained approach to IE based on Hidden Markov Models, integrated with an image classification algorithm in order to extract product offers from the Internet, including both textual items and images. This approach is evaluated on a bicycle-sale domain. Several image classification methods using various feature sets are also described and evaluated. These trained approaches are then integrated into the proposed novel approach of extended extraction ontologies, which builds on the work of Embley [21] by exploiting manual, trained, and formatting evidence at the same time.
The intended benefits of extraction ontologies are quick development of a functional IE prototype, a smooth transition to a deployed IE application, and the ability to leverage each of the three extraction evidence types. Also, since extraction ontologies are typically developed by adapting suitable domain ontologies, and the ontology remains at the center of the extraction process, the work of converting extracted results back to a domain ontology or schema is minimized. The described approach is evaluated on several distinct real-world datasets.
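The HMM-based extraction step mentioned in this abstract labels each token with a hidden state, and decoding picks the most probable label sequence. The sketch below shows that decoding (the Viterbi algorithm) on a toy two-state model that tags tokens as price or other text. The states, the coarse token features, and every probability are illustrative assumptions, not the trained models from the thesis.

```python
import math

# Toy two-state HMM: "O" = other text, "P" = part of a price.
# All probabilities are illustrative assumptions, not trained values.
STATES = ("O", "P")
START = {"O": 0.8, "P": 0.2}
TRANS = {"O": {"O": 0.8, "P": 0.2}, "P": {"O": 0.4, "P": 0.6}}
EMIT = {
    "O": {"word": 0.8, "num": 0.1, "cur": 0.1},
    "P": {"word": 0.1, "num": 0.5, "cur": 0.4},
}
CURRENCIES = {"EUR", "USD", "CZK", "$"}


def token_feature(token):
    """Map a raw token to a coarse observation symbol."""
    if token.isdigit():
        return "num"
    if token in CURRENCIES:
        return "cur"
    return "word"


def viterbi(tokens):
    """Return the most probable state sequence for tokens (log-space Viterbi)."""
    if not tokens:
        return []
    obs = [token_feature(t) for t in tokens]
    # best[s] = log-probability of the best path ending in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = predecessor of state s on the best path at step t+1
    for o in obs[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: prev_best[q] + math.log(TRANS[q][s]))
            best[s] = prev_best[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
            pointers[s] = p
        back.append(pointers)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["Bike", "for", "sale", "350", "EUR"]))  # ['O', 'O', 'O', 'P', 'P']
```

Working in log space avoids numeric underflow on long documents; in practice the start, transition, and emission tables would be estimated from labelled training data.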
Binary factor analysis based on neural networks as a tool for clustering large datasets
Frolov, A. A. ; Húsek, Dušan ; Snášel, Václav ; Řezanková, H. ; Polyakov, P. Y.
Feature space transformation is a widely used method for data compression: the original patterns are mapped into a space of features, or factors, of reduced dimensionality. In this paper we demonstrate that Hebbian learning in a Hopfield-like neural network is a natural procedure for binary factorization. The paper is dedicated to estimating the size of the attraction basins around factors. Two global spurious attractors are shown to prevent the network activity from converging to the factors, which defeats any procedure for finding them. These global attractors can be completely removed from the network dynamics by introducing a single inhibitory neuron with bidirectional Hebbian synapses. With this additional inhibition, the attraction basins around factors become the same size as those around stored patterns in a standard Hopfield network.
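For readers unfamiliar with the mechanism, the sketch below shows Hebbian storage and attractor recall in a small Hopfield-like network of binary neurons. It is not the paper's factor-search procedure. The covariance learning rule is a common choice for sparse binary patterns, and the k-winners-take-all step, which keeps the number of active units fixed, is our crude stand-in for global inhibition; both, along with the toy patterns and function names, are assumptions.

```python
def hebbian_weights(patterns, n, p):
    """Covariance Hebbian rule: W[i][j] = sum_m (x_i - p)(x_j - p), no self-links."""
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += (x[i] - p) * (x[j] - p)
    return w


def recall(w, cue, k, max_steps=10):
    """Iterate the network, keeping exactly k units active (k-winners-take-all).

    The fixed activity level plays the role of global inhibition here.
    """
    n = len(cue)
    state = list(cue)
    for _ in range(max_steps):
        field = [sum(w[i][j] * state[j] for j in range(n)) for i in range(n)]
        winners = set(sorted(range(n), key=lambda i: (-field[i], i))[:k])
        new_state = [1 if i in winners else 0 for i in range(n)]
        if new_state == state:  # reached a fixed point (attractor)
            break
        state = new_state
    return state


N, K = 20, 5


def block(start):
    """Toy binary pattern with K consecutive active units."""
    return [1 if start <= i < start + K else 0 for i in range(N)]


patterns = [block(0), block(5), block(10)]
W = hebbian_weights(patterns, N, K / N)
cue = block(0)
cue[4], cue[7] = 0, 1  # corrupt the cue: drop one unit of pattern 0, add one of pattern 1
print(recall(W, cue, K) == patterns[0])  # True
```

The corrupted cue lies inside the attraction basin of the first stored pattern, so a single update restores it; the paper's question is how large such basins are around factors rather than around stored patterns.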
Cluster Analysis and Textual Data
Húsek, Dušan ; Řezanková, H. ; Snášel, Václav
The applicability of cluster analysis to large textual databases is studied. The main principles of clustering algorithms are discussed and compared from the point of view of their applicability in this field.
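As an illustration of one classic algorithm in this family, the sketch below applies single-linkage agglomerative clustering to documents represented as bag-of-words term-frequency vectors, cut at a cosine-similarity threshold (which is equivalent to taking connected components of the thresholded similarity graph). The representation, the threshold, and the function names are illustrative assumptions; the paper compares several algorithms rather than prescribing this one.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def single_link_clusters(docs, threshold):
    """Single-linkage clustering cut at `threshold`.

    Two documents share a cluster if a chain of document pairs, each with
    cosine similarity >= threshold, connects them.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "neural network learning",
    "neural network training",
    "database query index",
    "database index structure",
]
print(single_link_clusters(docs, 0.5))  # [[0, 1], [2, 3]]
```

The all-pairs loop costs O(n^2) similarity computations, which is exactly the scalability concern that arises for large textual databases; real systems typically add TF-IDF weighting and an index to avoid comparing every pair.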
National Conference of Knowledge
Húsek, Dušan ; Pokorný, J. ; Snášel, Václav
This special issue contains selected papers from the 3rd National Conference on Knowledge (Znalosti 2004), devoted to the following topics: knowledge discovery, textual and multimedia information, knowledge engineering, and knowledge management.
