National Repository of Grey Literature: 112 records found, showing 1 - 10
Extending Data Lineage Analysis for Python with Runtime Types
Luňák, Václav ; Parízek, Pavel (advisor) ; Petříček, Tomáš (referee)
There is an increasing demand in the domain of data science for automated tools analyzing the data lineage of software systems. In situations where general-purpose programming languages are used, Python is among the most popular choices. It is also one of the most challenging to analyze. Manta Flow is an automated data lineage analysis platform that contains a scanner for Python. In this work, we developed an extension of this scanner. Its purpose is to statically determine the types of expressions in an analyzed application. We achieved this by expanding the concept of data flows to carry type information and we appropriately refactored the internals of the scanner. This information was then used to implement an improved method for finding the targets of function invocations during the analysis of data flows.
Performance and Usability Improvements for Data Lineage Analysis of C# Programs
Kleprlík, Jan ; Parízek, Pavel (advisor) ; Bednárek, David (referee)
Manta Flow is a highly automated static analysis platform that produces data lineage for its input and represents it as a graph. The platform analyses various technologies and programming languages via specialised scanners. One of the scanners analyses C# code, or rather its compiled form, Common Intermediate Language (CIL). While the scanner was already capable of analysing non-trivial scenarios, it was lacking in some aspects that held back its broader adoption by customers. The main issues were limited support for real-life scenarios such as web applications or code embedded in other technologies, sub-optimal performance and imprecise lineage output. As a part of this thesis, we improved the precision, scalability and performance of the scanner on multiple levels of abstraction, from analysis of the CIL to modifications of core high-level analysis algorithms. We added support for analysis of ASP.NET web endpoints and enabled the C# scanner to be used as a service for analysis of code embedded in other technologies. We improved the precision of the resulting lineage for existing scenarios by modifying the core algorithms used throughout the analysis, and we optimized the analysis process to improve its performance.
Fuzz testing of network subsystem in PikeOS
Piroutek, Jan ; Parízek, Pavel (advisor) ; Yaghob, Jakub (referee)
Stability under every possible circumstance is a goal for many applications. This problem applies to ANIS, the network stack of the real-time operating system PikeOS developed by SYSGO. PikeOS requires security and stability because it is used in areas such as airborne systems, where unstable software could cause severe damage. A proven way to ensure the stability and security of software is testing. Fuzzing is an automated testing technique that generates randomized inputs for the application to find bugs, vulnerabilities, or crashes within the application. Another testing technique is long-run testing, which exposes an application to some input for longer periods. Because ANIS is a product usually shipped with PikeOS, it must follow the same security standards. We have developed a testing tool for the ANIS network stack that uses the two mentioned techniques and emphasizes the option to configure such a test. This testing tool exposes ANIS to various scenarios that could stress the stack and uses fuzzing to automatically create combinations of these scenarios that could crash the network stack. The developed test is implemented with a small set of scenarios that expose ANIS to various network traffic. The test can be extended to work with more scenarios. All scenarios have a predefined set of...
Data Lineage Analysis for Databricks platform
Potočeková, Natália ; Parízek, Pavel (advisor) ; Škoda, Petr (referee)
Notebook-based technologies, like Databricks and Jupyter notebooks, have gained popularity in recent years due to their adaptability and convenience. A notebook is an interactive computational environment that allows users to create documents that contain code, visualizations, and explanatory text in one place. Notebooks provide a space for data exploration, analysis, and documentation, enabling users to easily develop and present their work. The ability to combine code execution with explanations and visualizations within a single document promotes reproducibility, enhances collaboration among team members, and motivates data scientists to efficiently work with data. In this work, we analyzed the Databricks technology in order to extend the Manta Flow platform, a highly automated data lineage analysis tool, to support this technology. We designed and implemented a new scanner that provides basic support for analyzing Databricks notebooks written in Python and Databricks SQL languages. We also provide an implementation of a so-called shared context that can be used for passing information between different scanners in the Manta Flow platform. To visualize the interactions between languages and scanners we extended the Manta graph with a new node type that represents the shared context. Alongside this, we...
Vulnerabilities of web applications
Žák, Vojtěch ; Mareš, Martin (advisor) ; Parízek, Pavel (referee)
As internet connectivity becomes ever more accessible, the number of web applications and their users grows as well. With this growth, as is usually the case, the number of people trying to exploit flaws in these applications also increases. In this thesis, we show how the most common web application vulnerabilities work and how we, as developers, can avoid them. We also focus on how to discover vulnerabilities in applications as users and how to carry out attacks that exploit these vulnerabilities. This bachelor's thesis also includes the project Vulnerability Presentation Server (Vulpes), which was developed in cooperation with CZ.NIC. It is a set of web applications in which the vulnerabilities mentioned in this text are deliberately left in place for demonstration purposes.
Data Lineage Analysis Service for Embedded Code
Jurčo, Michal ; Parízek, Pavel (advisor) ; Bednárek, David (referee)
Data integration tools often use embedded code for data manipulation tasks. Popular examples of such tools include AWS Glue data integration service, Databricks platform, Snowflake data cloud or SQL Server Integration Services (SSIS). Embedded code is typically written in programming languages such as Python, Java, C# or JavaScript. Manta Flow is an automated platform that can analyze data lineage in database models, data pipelines of data integration tools, and in application source code, but it lacks the ability to analyze embedded code. In this work, we discussed potential ways to extend the capabilities of Manta Flow with the ability to analyze data lineage in embedded code. We created a general design of a reusable Embedded Code Service that leverages the existing potential of data flow analysis of source code, and uses it to analyze embedded code. We implemented a specialization of this service for the Python programming language, and to demonstrate its usefulness, we designed and implemented a prototype of data lineage scanner for AWS Glue data integration service. This scanner extensively uses the service to analyze data lineage in embedded Python scripts, which we demonstrated on a realistic example.
Visualization of SMT solvers results
Bobeničová, Michaela ; Kofroň, Jan (advisor) ; Parízek, Pavel (referee)
Nowadays, SMT solvers are used for solving various problems in multiple fields. Therefore, it makes sense to optimize their speed depending on the size and type of the problems. However, going over and comparing results in tables containing information about the runs of the solvers is impractical. The goal of this thesis is to create a graphical interface that simplifies the analysis of solver performance. It takes the form of a web application. It allows a solver developer to compare the performance of multiple solvers on given problem sets. The interface contains visualizations of the performance in the form of interactive plots and tables.
Data Lineage Analysis for PySpark and Python ORM Libraries
Jurčo, Andrej ; Parízek, Pavel (advisor) ; Škoda, Petr (referee)
In the world of ETL tools and data processing, Python is one of the main languages used in practice. Python scripts that define data manipulations usually use the same Python framework, PySpark, which is the Python API for the Spark framework, alongside database libraries, using their ORM features. These ORM features usually work in a similar way in most of the relevant libraries. Recently, MANTA Flow, a highly automated data lineage analysis tool, was extended with a Python language scanner, and it is now being extended to support more commonly used frameworks. In this work, we analyzed the PySpark library and the SQLAlchemy ORM technology in order to extend MANTA's Python scanner with support for these two frequently used tools. In the case of the PySpark library, we designed and implemented the core of a plugin to the Python scanner which supports elementary functionality. The plugin is capable of analyzing various DataFrame input and output options available in PySpark for both file and database data sources, and it is able to propagate data flows during transformations with a reasonable level of overapproximation, as demonstrated in the work. In the case of the SQLAlchemy ORM, we designed a solution that would allow the scanner to analyze the ORM source code and its core could be used to...
Rubik's cube
Bošániová, Monika ; Majerech, Vladan (advisor) ; Parízek, Pavel (referee)
The main goal of this thesis is to simplify the beginners' experience with learning and independently solving the Rubik's cube. We provide different perspectives on how to find the solution for this puzzle. The implementation of all components used, the chosen solving process for beginners, the appearance of the application environment and the interactivity of different elements are explained and described in an easily understandable way. We included insights on the teaching process and an analysis of its effectiveness in comparison to similar existing solutions. The user has the option of adding their own solving algorithm in text format. The text contains user documentation and suggests possible improvements for future development.
Machine-learning-based self-adaptation of component ensembles
Töpfer, Michal ; Bureš, Tomáš (advisor) ; Parízek, Pavel (referee)
In the area of distributed self-adaptive smart systems (such as applications of Internet of Things and Cyber-Physical Systems), machine learning has been successfully used in several applications including the prediction of metrics regarding the components in the system (e.g., battery consumption), and pruning of the space of possible adaptations. It is clear that machine learning can be a useful tool in self-adaptive systems. Most of the research works focus on using the machine learning algorithms for a specific task, yet they are (at least partially) lacking in providing a systematic approach to the introduction of machine learning into the architecture of the system. In this thesis, we propose ML-DEECo - a machine-learning-enabled component model for adaptive component architectures. It is based on the concepts of autonomous components and their ensembles (coalitions) from the DEECo component model. We enrich DEECo with abstractions for specifying machine-learning-based estimates directly in the architecture of the system. The architect can thus focus on the business logic of the application while all the tasks necessary to provide the estimates (such as collecting the data and training the model) are provided by our runtime framework. We provide an implementation of the ML-DEECo runtime in Python and...
