National Repository of Grey Literature: 4 records found
Data Lineage Analysis of Frameworks with Complex Interaction Patterns
Hýbl, Oskar ; Parízek, Pavel (advisor) ; Hnětynka, Petr (referee)
Manta Flow is a tool for analyzing data flow in enterprise environments. It features the Java scanner, a module that uses static analysis to determine the flows through Java applications. To analyze an application that uses a framework, the scanner requires a dedicated plugin for that framework. Although the Java scanner provides plugins for several frameworks, to be usable for real applications it must support as many frameworks as possible, which requires implementing new plugins. Applications using Apache Spark, a framework for cluster computing, are increasingly popular. Therefore, we designed and implemented a Java scanner plugin that allows the scanner to analyze Spark applications. As Spark focuses on data processing, this presented several challenges that were not encountered in other frameworks. In particular, it was necessary to resolve the data schema in various scenarios and track the schema changes throughout any operations invoked on the data. Of the multiple APIs Spark provides for data processing, we focused on the Spark SQL module, notably on Dataset, omitting the legacy RDD. We also implemented support for data access, covering JDBC and selected file formats. The implementation has been thoroughly tested and is proven to work correctly as a part of Manta Flow, which features the plugin in...
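The idea of tracking schema changes through operations invoked on the data can be illustrated with a toy model. The following is a minimal sketch, not the plugin described in the abstract: the `Column` class and the functions `read_source`, `select`, `with_column_renamed` and `with_column` are hypothetical names that only loosely mirror Dataset-style operations, and the source name `jdbc:orders` is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column in the current schema plus the source columns it derives from."""
    name: str
    origins: frozenset


def read_source(source, column_names):
    # Hypothetical data-access step: every column originates from itself.
    return [Column(n, frozenset({f"{source}.{n}"})) for n in column_names]


def select(schema, *names):
    # Projection keeps only the requested columns, in the requested order.
    by_name = {c.name: c for c in schema}
    return [by_name[n] for n in names]


def with_column_renamed(schema, old, new):
    # Renaming changes the name but preserves the lineage.
    return [Column(new, c.origins) if c.name == old else c for c in schema]


def with_column(schema, name, *inputs):
    # A derived column inherits the origins of all of its input columns.
    # Simplification: an existing column with the same name is dropped and
    # the derived column is appended at the end.
    by_name = {c.name: c for c in schema}
    origins = frozenset().union(*(by_name[i].origins for i in inputs))
    kept = [c for c in schema if c.name != name]
    return kept + [Column(name, origins)]


# Follow the schema through a short chain of operations.
schema = read_source("jdbc:orders", ["id", "price", "qty"])
schema = with_column(schema, "total", "price", "qty")
schema = with_column_renamed(schema, "id", "order_id")
schema = select(schema, "order_id", "total")

for column in schema:
    print(column.name, "<-", sorted(column.origins))
```

A real static analysis would have to recover such a chain from program code without running it; the sketch only shows the bookkeeping that each operation implies for the schema.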
Data Lineage Analysis for Qlik Sense
Jurčo, Andrej ; Parízek, Pavel (advisor) ; Blicha, Martin (referee)
Business Intelligence has become essential for all companies and organizations in the world over the past few years when it comes to decision-making and observing long-term trends. The Business Intelligence tools in use often become very complex over time, and it can then be very difficult to make any changes. Data lineage solves this problem by visualizing data flows and showing relative dependencies. Manta Flow is a platform that creates such lineage and supports programming languages (Java, C), databases (Oracle, MS SQL) and Business Intelligence tools (Cognos, Qlik Sense). The goal of this thesis was to implement a prototype of a scanner module for the Manta Flow platform that analyzes data flows in Qlik Sense and creates a data lineage graph from data sources to the presentation layer. This module extracts the metadata necessary for the analysis, resolves the objects that are present in the Qlik Sense applications, and analyzes data flows in them. The resulting data lineage graph is then visualized by other components of the Manta Flow platform.
Analyzing Data Lineage in Database Frameworks
Eliáš, Richard ; Parízek, Pavel (advisor) ; Hnětynka, Petr (referee)
Large information systems are typically implemented using frameworks and libraries. An important property of such systems is data lineage, that is, the flow of data loaded from one system (e.g. a database), through the program code, and back to another system. We implemented the Java Resolver tool for data lineage analysis of Java programs, based on the Symbolic analysis library for computing data lineage of simple Java applications. The library supports only the JDBC and I/O APIs to identify the sources and sinks of data flow. We proposed some architecture changes to the library to make it easily extensible by plugins that can add support for new data processing frameworks. We implemented such plugins for a few frameworks with different approaches to accessing the data, including Spring JDBC, MyBatis and Kafka. Our tests show that this approach works and can be used in practice.
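The plugin idea in this abstract, a core that recognizes only a few source and sink APIs and plugins that contribute more, can be sketched as a small registry. This is an illustrative sketch only, not the Java Resolver or the Symbolic analysis library: the `LineageAnalyzer` class, its methods, and the plugin dictionaries are hypothetical, and the method names are used as plain strings chosen for illustration.

```python
class LineageAnalyzer:
    """Toy core that knows which API calls act as data sources or sinks."""

    def __init__(self):
        # Built-in knowledge, standing in for the JDBC support of the core.
        self._roles = {
            "java.sql.Statement.executeQuery": "source",
            "java.sql.Statement.executeUpdate": "sink",
        }

    def register_plugin(self, roles):
        # A plugin contributes additional call-to-role mappings.
        self._roles.update(roles)

    def classify(self, calls):
        # Keep only the calls that some plugin (or the core) recognizes.
        return [(call, self._roles[call]) for call in calls if call in self._roles]


# Hypothetical plugins for two of the frameworks named in the abstract.
SPRING_JDBC_PLUGIN = {
    "org.springframework.jdbc.core.JdbcTemplate.query": "source",
    "org.springframework.jdbc.core.JdbcTemplate.update": "sink",
}
KAFKA_PLUGIN = {
    "org.apache.kafka.clients.consumer.KafkaConsumer.poll": "source",
    "org.apache.kafka.clients.producer.KafkaProducer.send": "sink",
}

# Calls observed in an analyzed program (invented for this example).
observed = [
    "org.springframework.jdbc.core.JdbcTemplate.query",
    "java.lang.String.trim",
    "org.apache.kafka.clients.producer.KafkaProducer.send",
]

analyzer = LineageAnalyzer()
before = analyzer.classify(observed)   # the core alone recognizes nothing here
analyzer.register_plugin(SPRING_JDBC_PLUGIN)
analyzer.register_plugin(KAFKA_PLUGIN)
after = analyzer.classify(observed)    # plugins add one source and one sink
print(before)
print(after)
```

Keeping framework knowledge in plugins means the core analysis does not change when support for another framework is added, which is the extensibility property the abstract describes.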
Extracting Information from Database Modeling Tools
Drobný, Denis ; Parízek, Pavel (advisor) ; Kopecký, Michal (referee)
Data lineage is a way of showing how information flows through complicated software systems. If the given system is a database, tables and columns are visualized along with transformations of the stored data. However, this picture may be difficult to understand for people with a weaker technical background, as database objects usually obey naming conventions and do not necessarily represent something tangible. To improve lineage comprehension, we developed software called Metadata Extractor that both provides further description of the database objects and introduces a whole new perspective on data in a system through business lineage aimed at non-technical users. The additional metadata enriching data lineage is extracted from data modeling tools, such as ER/Studio and PowerDesigner, that are widely used in the database design process. The solution extends the Manta Flow lineage tool while taking advantage of its features at the same time.
