
Simplified Multiplication in Convolutional Neural Networks
Juhaňák, Pavel ; Jaroš, Jiří (referee) ; Sekanina, Lukáš (advisor)
This thesis provides an introduction to classical and convolutional neural networks. It describes how hardware multiplication is conventionally performed and optimized. A simplified multiplication method is proposed, namely multiplierless multiplication. This method is implemented and integrated into the TypeCNN library. The cost of the hardware solution of both conventional and simplified multipliers is estimated. The thesis also introduces software tools developed to work with convolutional neural networks and datasets used to test them in the image classification task. Test architectures and experimentation methodology are proposed. The results are evaluated, and both the classification accuracy and cost of the hardware solution are discussed.
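
A minimal sketch of the general idea behind multiplierless multiplication: a fixed integer weight is approximated by a short signed sum of powers of two, so that a product can be formed with shifts and additions only. This is an illustration under stated assumptions and not necessarily the scheme used in the thesis or in TypeCNN; the function names `to_shift_terms` and `shift_add_multiply` and the `max_terms` parameter are hypothetical.

```python
def to_shift_terms(weight: int, max_terms: int = 2):
    """Greedily approximate an integer weight as a signed sum of powers of two.

    Returns a list of (sign, shift) pairs; hypothetical helper for illustration.
    """
    terms = []
    residual = weight
    for _ in range(max_terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        mag = abs(residual)
        low = mag.bit_length() - 1  # largest power of two not exceeding mag
        # Pick whichever neighbouring power of two is closer (ties go to the lower one).
        shift = low + 1 if (mag - (1 << low)) > ((1 << (low + 1)) - mag) else low
        terms.append((sign, shift))
        residual -= sign * (1 << shift)
    return terms


def shift_add_multiply(x: int, terms):
    """Multiply x by the approximated weight using only shifts and adds."""
    acc = 0
    for sign, shift in terms:
        acc += sign * (x << shift)
    return acc


# 7 is represented exactly as 8 - 1, so 5 * 7 becomes (5 << 3) - (5 << 0).
terms_for_7 = to_shift_terms(7)
product = shift_add_multiply(5, terms_for_7)
```

With fewer terms the weight is only approximated (for example, one term turns 6 into 4), which is the kind of trade-off between hardware cost and classification accuracy that the abstract says is evaluated.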


The Automatic Update Mechanism of Software Packages in Cloud Distribution System
Willaschek, Tomáš ; Jaroš, Jiří (referee) ; Crha, Adam (advisor)
This bachelor thesis deals with web services used to distribute software components. NXP is a manufacturer of semiconductor devices that also produces software components and tools for developing embedded software applications for those devices. The aim of this work is to create a process that deploys the software components on the distribution website, and to create tests that verify the functionality of this process. The solution includes an analysis of the original process, with the intention of preventing known problems and thus building a more reliable and efficient solution.


Embedded Device for Control of Digital Audio Workstation
Svoboda, Tomáš ; Jaroš, Jiří (referee) ; Šimek, Václav (advisor)
The aim of this work is to design the architecture of an embedded device for controlling DAW software in a recording studio. First, the thesis briefly summarizes the knowledge needed to design such a device. A short survey of existing solutions follows, together with a description of the protocols that can be used for communication with the recording software. The next part of the thesis builds on these foundations and elaborates the device architecture by decomposing it into several modules. Two hardware modules are designed and manufactured, each on a separate PCB with its own microcontroller, and control firmware is implemented for each of them. Finally, an aluminium enclosure that holds both modules is designed. The result of this work is a functional prototype of the assembled controller, which can be used to control DAW software.


Automatization of Analysis of Performance and Power Consumption
Rudolf, Tomáš ; Jaroš, Jiří (referee) ; Nikl, Vojtěch (advisor)
This thesis deals with increasing the efficiency of supercomputers. Higher efficiency can be achieved by reducing the processor frequency, provided that the algorithm does not slow down significantly. The thesis presents a set of scripts that monitor the power consumption of the processor, along with scripts that visualize the measured values. The scripts also allow easy control of the processor frequency. The resulting solution lets the user measure the efficiency of a given algorithm and tune the computing power of a specific computer to that algorithm. As a result, the user learns whether it is advantageous to run the algorithm at one processor frequency or another.


Acceleration of Axisymetric Ultrasound Simulations
Kukliš, Filip ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
The simulation of ultrasound propagation through soft biological tissue has a wide range of practical applications. These include the design of transducers for diagnostic and therapeutic ultrasound, the development of new signal processing methods and imaging techniques, the study of aberrations of ultrasound beams in heterogeneous media, ultrasonic tissue classification, training radiologists to use ultrasound equipment and to interpret ultrasound images, model-based layering of medical images, and treatment planning for high-intensity ultrasound. Ultrasound simulation is, however, a computationally demanding problem, because the simulation domains are very large compared to the acoustic wavelengths of interest. If the problem is axisymmetric, it can be solved in 2D. This makes it possible to run simulations on grids with more points, using fewer computational resources, in a shorter time. This thesis designs and implements an acceleration of the nonlinear ultrasound wave simulation in an axisymmetric coordinate system, which was originally realized in Matlab using MEX files for the discrete sine and cosine transforms. The axisymmetric simulation was implemented in C++ as an open-source extension of the k-Wave toolbox. The code is optimized to run on a single node of the Salomon supercomputer (IT4Innovations, Ostrava, Czech Republic) with two twelve-core Intel Xeon E5-2680v3 processors. Several code optimizations were carried out to maximize computational efficiency. First, the Fourier transforms were computed using the real-to-complex FFT from the FFTW library. Compared to the complex-to-complex FFT, this reduced the computation time and the memory associated with the FFT by almost 50%. The discrete sine and cosine transforms were also computed with the FFTW library; in the Matlab version they had to be invoked from dynamically loaded MEX files. Second, to reduce the load on memory bandwidth, all operations were computed in single-precision floating-point arithmetic. Third, element-wise operations were parallelized using OpenMP and then vectorized using SIMD extensions (SSE). Overall, the C++ version is up to 34 times faster and uses less than a third of the memory of the Matlab version of the simulation. A simulation that would take almost two days can thus be computed in an hour and a half. All of this makes it possible to compute simulations on a computational grid of 16384 × 8192 points in a reasonable time.
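
The roughly 50% saving attributed to the real-to-complex transform follows from the Hermitian symmetry of the spectrum of a real signal: for n real samples, only the first n/2 + 1 complex coefficients need to be computed and stored, and the rest are their complex conjugates. The sketch below demonstrates this property with a naive O(n²) DFT in pure Python; it does not use FFTW and is not the thesis code, and the helper names are hypothetical.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def half_spectrum(signal):
    """Keep only the n//2 + 1 coefficients a real-to-complex transform stores."""
    n = len(signal)
    return dft(signal)[: n // 2 + 1]


def restore_full_spectrum(half, n):
    """Rebuild the remaining coefficients from the symmetry X[n-k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]  # arbitrary real samples
half = half_spectrum(signal)
restored = restore_full_spectrum(half, len(signal))
```

For the eight real samples above, five complex coefficients are enough to recover all eight, which is why a real-to-complex transform needs only about half the storage of a complex-to-complex one.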


Implementation of 2D Ultrasound Simulations
Šimek, Dominik ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
This work deals with the design and implementation of a 2D ultrasound simulation. Applications of ultrasound simulation can be found in medicine, biophysics, and image reconstruction. One example is High Intensity Focused Ultrasound, which is used for diagnosing and treating cancer. The program is part of the k-Wave toolbox designed for supercomputer systems, specifically for machines with a shared-memory architecture. The program is implemented in C++ and accelerated using OpenMP. With the designed solution, it is possible to solve large-scale simulations in 2D space. The work also deals with merging and unifying the 2D and 3D simulations using modern C++. A realistic use case is ultrasound simulation for transcranial neuromodulation and neurostimulation in large domains with more than 16384 × 16384 grid points. A simulation of this size may take several days with the original MATLAB 2D k-Wave. The new implementation achieves a speedup of up to 8 on the Anselm and Salomon supercomputers.


Efficient Communication in Multi-GPU Systems
Špeťko, Matej ; Jaroš, Jiří (referee) ; Vaverka, Filip (advisor)
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating general-purpose computation. GPUs are designed as parallel processors that possess huge computational power, and modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application, and the application needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computational overhead, so it is important to research methods for efficient communication between GPUs. This means less CPU involvement, lower latency, and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-process Communication.

 

Acceleration of Applications on a Supercomputer Using Python
Čelka, Marek ; Jaroš, Jiří (referee) ; Jaroš, Marta (advisor)
Nowadays, all computers we use are capable of parallel processing, which saves time in compute-intensive tasks such as scientific computations, simulations, and predictions. The theme of this thesis is the acceleration of compute-intensive tasks on a supercomputer, achieved by parallelizing the problem. To make the topic more accessible to scientists from diverse fields, the Python programming language was chosen, as it is both powerful and easy to use. The first part of the thesis deals with parallel processing techniques. A set of microtests was designed and implemented for this purpose, and the results are discussed and used in the subsequent work. The second part of the thesis deals with the problem of parallel image reconstruction. For comparison, a sequential version of the problem was also implemented. Both versions, sequential and parallel, were tested on a set of images of different sizes. The experiments focus on speedup, elapsed time, memory bandwidth, and latency. These outcomes are also presented and discussed.


Acceleration of Python Applications on GPU
Turcel, Matej ; Jaroš, Jiří (referee) ; Jaroš, Marta (advisor)
Compiled languages, such as C++, are conventionally used in the field of high-performance computing (HPC). However, scripting languages like Python are more convenient, and application development in them is quicker and simpler. This work compares C++ and Python in terms of the possibilities for accelerating computation on a graphics card. Its aim is to show that scripting languages are also suitable for implementing HPC applications, and to point out their advantages and disadvantages compared to compiled languages. For this purpose, a number of programs were implemented: several smaller programs for testing purposes and a larger one that implements a computationally intensive problem. The C++ and Python implementations of these programs are compared in terms of performance as well as difficulty of implementation.
