Nikl, Vojtěch - Search Results - Digital Repository


	Debugging Information in Linker Nikl, Vojtěch ; Křoustek, Jakub (referee) ; Masařík, Karel (advisor) This thesis describes the conversion between the CCOFF object file format and the ELF file format. We start with a general object file format and its debugging information, then focus closely on ELF, CCOFF and DWARF debugging information. The functionality of the CCOFF format is encapsulated in the ObjectFile class library. A description follows of how an ELF object file is created, filled with the proper data and converted back to the CCOFF format.
	Automatization of Analysis of Performance and Power Consumption Rudolf, Tomáš ; Jaroš, Jiří (referee) ; Nikl, Vojtěch (advisor) This thesis deals with increasing the efficiency of supercomputers. Higher efficiency can be achieved by reducing the processor frequency, provided the algorithm does not slow down significantly. The thesis presents a set of scripts that monitor the power consumption of the processor, along with scripts that visualize the measured values. The scripts also allow easy control of the processor frequency. The resulting solution lets the user measure the efficiency of a given algorithm and tune the computing power of a specific computer for that algorithm. As a result, the user learns whether it is advantageous to run the algorithm at one processor frequency or another.
	Assisted Code Vectorization and Parallelization Using the OpenMP 4.0 Standard Slouka, Lukáš ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) The subject of this bachelor's thesis is code optimization using the OpenMP 4.0 standard, which provides tools for assisted parallelization and vectorization. In addition to describing the OpenMP 4.0 standard, the thesis gives an insight into the architecture of modern computers, specifically the cache memory hierarchy and the SSE/AVX units, which play a major role in optimization. The thesis demonstrates the advantages of optimized code over an unoptimized version on a set of benchmarks aimed at various aspects of optimization.
	Dynamic Load-Balancing in Parallel Applications Dvořáček, Vojtěch ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) This thesis aims to implement a dynamic load balancing mechanism in a parallel simulation model of heat distribution in a CPU cooler. The first part introduces the theoretical foundations of dynamic load balancing and describes current approaches. The second part covers the heat distribution model and related topics such as the MPI communication library and the HDF library for data storage. It then proceeds to the implementation of the simulation model with a dynamic 2D decomposition of the square model domain. A custom geometry-based dynamic load balancing algorithm that works with this decomposition is introduced. An important part of the implementation is the Zoltan library, used especially for data migration. Finally, a set of experiments demonstrates the load balancing abilities of the designed model, followed by conclusions and motivation for future research.
	Overclocking of Modern Processors with an Emphasis on Performance, Power Consumption and Temperature Kelečéni, Jakub ; Vaverka, Filip (referee) ; Nikl, Vojtěch (advisor) This thesis analyzes how performance, power consumption and temperature depend on processor frequency. The theoretical part discusses processor architecture, benchmarks and algorithm types. The experimental part focuses on the benchmarks: matrix multiplication, Quicksort, pi calculation, the Ackermann function, LAMMPS, PMBW and Linpack. This set includes both single-threaded and multi-threaded algorithms. Testing consists of three different processor frequency settings, and the multi-threaded benchmarks are run with different numbers of threads. The power consumption of the CPU and RAM was recorded during these tests, and each test logs its running time. The impact of parallelization on power consumption and runtime is also considered. The results are shown in charts and tables. The conclusion analyzes the proper CPU configuration for each algorithm.
	Development and Programming of Low Power Cluster Hradecký, Michal ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) This thesis deals with building and programming a low-power cluster composed of Hardkernel Odroid XU4 kits based on ARM Cortex A15 and Cortex A7 chips. The goal was to design a simple cluster composed of multiple kits and run a set of benchmarks to analyze performance and power consumption. The test set consisted of the HPL and Stream benchmarks and various tests of the MPI interface. The overall performance of a cluster of four kits in the HPL benchmark was measured at 23 GFLOP/s in double precision. During this test, the cluster showed a power efficiency of about 0.58 GFLOP/W. The work also describes the installation of the PBS Torque scheduler and of EasyBuild, a build and installation framework for HPC software, on the 32-bit ARM platform. A comparison with the Anselm supercomputer showed that the Odroid cluster is as efficient as a large supercomputer, but at a slightly higher price.
	Modern Programming Language Julia Fojtík, Pavel ; Grochol, David (referee) ; Nikl, Vojtěch (advisor) This work describes the dynamic programming language Julia. First, the reader is introduced to the syntax and implementation of the language. Next comes advice on writing efficient code and optimizing it. Some examples of using Julia in scientific projects are also described. The experimental part compares Julia, C and Python. C and Python were chosen as examples of statically and dynamically typed languages, respectively.
	Analysis of Operational Data and Detection of Anomalies during Supercomputer Job Execution Stehlík, Petr ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor) In recent years, supercomputers have grown ever larger and more complex, which makes it harder to exploit the full potential of the system. The problem is aggravated by a lack of monitoring tools tailored specifically to the users of these systems. The goal of this thesis is to create a tool, called Examon Web, for analyzing and visualizing supercomputer operational data, and to perform an in-depth analysis of these data using neural networks. The networks determine whether a given job ran correctly or showed signs of suspicious and undesirable behavior, such as unaligned memory access or low utilization of allocated resources. The user is informed of these findings through the GUI. Examon Web is built on the Examon framework, which collects and processes metric data from the supercomputer and stores them in the KairosDB database. The implementation spans disciplines from GUI design and implementation, through data analysis, data mining and neural networks, to the implementation of the server-side interface. Examon Web is aimed primarily at users but can also be used by administrators. The GUI is built in the Angular framework with the Dygraphs and Bootstrap libraries. This lets users analyze time series of various metrics of their jobs and, like administrators, check the current state of the supercomputer. This state is shown either as several globally aggregated metrics over the last 30 minutes or as a 3D (or 2D) model of the supercomputer, which obtains data from the nodes themselves via the MQTT protocol. For continuous data retrieval, a WebSocket interface was used with a custom mechanism for subscribing to and unsubscribing from the specific metrics displayed in the model. When analyzing an executed job, the user has three different views of it. The first offers an overall summary of the job, reporting the resources used, the run time and the load on the part of the supercomputer the job used, together with information from the neural networks on how suspicious the job is. The other two views display metrics from the performance and energy perspectives. To train the neural networks, a new data set had to be created from the Galileo supercomputer. The set contains over 1100 jobs monitored on this supercomputer, of which 500 were manually annotated and then used to train the networks. The neural networks use the back-propagation model, which is suitable for annotating fixed-length time series. In total, 12 networks were created for metrics including the load on the processor, memory and other components, and also, for example, the share of total processor time spent in the C6 power-saving state. These networks are independent of one another, and after experiments their final configuration of 80-20-4-3-1 (80 input neurons down to 1 output neuron) gave the best results. A final network (in a 12-4-3-1 configuration) annotated the results of the preceding networks. The overall accuracy of the system for classification into 2 classes is 84%, which is very good for the model used. This work has two outputs. The first is the user interface and its server side, Examon Web, which as an extension layer of the Examon system will help spread the system to further users or directly to other supercomputing centers. The second is a partially annotated data set, which can help others in their research and is the result of cooperation between VUT, UNIBO and CINECA. Both outputs will be released as open source. Examon Web was presented at the 1st Users' Conference in Ostrava, organized by IT4Innovations. Future extensions may include further annotation of the data set and extending Examon Web with decision trees that determine the exact reason for a job's bad behavior.
	Parallelization of Ultrasound Simulations Using 2D Decomposition Nikl, Vojtěch ; Dvořák, Václav (referee) ; Jaroš, Jiří (advisor) This thesis is part of the k-Wave project, a toolbox for the simulation and reconstruction of acoustic wave fields; one of its main contributions is the planning of focused ultrasound surgeries (HIFU). One simulation can take tens of hours, and about 60% of the simulation time is spent computing 3D fast Fourier transforms. Until now, the 3D FFT has been calculated purely by the FFTW library and its 1D decomposition, whose major limitation is the maximum number of employable cores. We therefore introduce a new approach, called the 2D hybrid decomposition of the 3D FFT (HybridFFT), which combines MPI processes and OpenMP threads to reach the best possible performance. On a low number of cores, on the order of a few hundred, we are about as fast as or slightly faster than FFTW and the pure-MPI 2D decomposition libraries (PFFT and P3DFFT). One of the best results was achieved on a 512^3 FFT using 512 cores, where our hybrid version ran in 31 ms, FFTW in 39 ms and PFFT in 44 ms. The most significant performance advantage should appear when employing around 8-16 thousand cores; however, we have not had access to a machine with such resources. Almost linear scalability has been demonstrated for up to 2048 cores.
