National Repository of Grey Literature: 29 records found (showing records 11 - 20)
Overclocking of Modern Processors with an Emphasis on Performance, Power Consumption and Temperature
Kelečéni, Jakub ; Vaverka, Filip (referee) ; Nikl, Vojtěch (advisor)
This thesis analyzes how performance, power consumption and temperature depend on processor frequency. The theoretical part discusses processor architecture, benchmarks and algorithm types. The experimental part focuses on a set of benchmarks: matrix multiplication, Quicksort, computation of pi, the Ackermann function, LAMMPS, PMBW and Linpack. The set includes both single-threaded and multi-threaded algorithms. Testing consists of three different processor frequency settings, and the multi-threaded benchmarks are run with different numbers of threads. Information on CPU and RAM power consumption was recorded during the tests, and every test logs its running time. The impact of parallelization on power consumption and runtime is also examined. The results are presented in charts and tables. The conclusion analyzes the appropriate CPU configuration for each algorithm.
Efficient Communication in Multi-GPU Systems
Špeťko, Matej ; Jaroš, Jiří (referee) ; Vaverka, Filip (advisor)
After Nvidia introduced CUDA, GPUs became devices capable of accelerating general-purpose computation. GPUs are designed as parallel processors that possess huge computational power, and modern supercomputers are often equipped with GPU accelerators. Sometimes the performance of a single GPU is not enough for a scientific application, and the application needs to scale over multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication is overhead on top of the computation, so it is important to research methods for efficient communication between GPUs, which means less CPU involvement, lower latency and shared system buffers. This thesis focuses on inter-node and intra-node GPU-to-GPU communication using GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI. Peer-to-peer transfer support is also integrated into k-Wave using CUDA Inter-process Communication.
Efficient Communication in Multi-GPU Systems
Špeťko, Matej ; Jaroš, Jiří (referee) ; Vaverka, Filip (advisor)
After Nvidia introduced CUDA, GPUs became devices capable of accelerating general-purpose computation. GPUs are designed as parallel processors that possess huge computational power, and modern supercomputers are often equipped with GPU accelerators. Sometimes the performance or the memory capacity of a single GPU is not enough for a scientific application, and the application needs to be scaled to multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication is overhead on top of the computation, so it is important to research methods for efficient communication between GPUs, which means less CPU involvement, lower latency and shared system buffers. Both inter-node and intra-node communication are examined, with the main focus on GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated using CUDA-Aware MPI.
Large-scale Ultrasound Simulations using Accelerated Clusters
Vaverka, Filip ; Boehm, Christian (referee) ; Říha, Lubomír (referee) ; Jaroš, Jiří (advisor)
The efficient use of accelerated HPC clusters depends particularly on the communication efficiency of the algorithms used. This thesis therefore re-examines the pseudo-spectral algorithms used to solve wave problems, mainly in the field of medical ultrasound, with the aim of enabling them to run on accelerated machines. It is shown that domain decomposition is the preferred way to achieve this goal, since many alternative approaches exhibit significantly worse numerical properties. Based on this approach and on the k-Wave ultrasound model, which is widely used in medicine, a new simulation algorithm is designed. Subsequent experiments show that this approach achieves a speedup of up to 7.5x and nearly perfect weak scaling up to 512 GPU-accelerated nodes. The solution also enables full utilization of compute nodes with multiple GPU accelerators and an advanced interconnect, such as the NVIDIA DGX-2 with NVLink. The method further offers a flexible choice between accuracy and efficiency: by choosing the depth of the subdomain overlap, one can either reach accuracy comparable to the original k-Space method or maximize performance while maintaining sufficient accuracy.
GPU-Accelerated Design of Optically Generated Ultrasound Using Binary Amplitude Holograms
Knotek, Martin ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
This thesis deals with the acceleration of scientific computations on the graphics processing unit. In this context, scientific computation means an algorithm that computes binary holograms used to generate ultrasound. We concentrate specifically on the design of the hologram, focusing on the speed that can be achieved when computing the hologram surface. For this purpose, we use two popular parallel data processing platforms, CUDA and OpenMP. The surface pattern of the hologram is important because it determines the hologram's specific physical characteristics.
Acceleration of Ultrasound Simulations on Multi-GPU Systems
Stodůlka, Martin ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
The main focus of this project is the use of multi-GPU systems and CUDA unified memory. Its goal is to accelerate the computation of 2D and 3D FFTs, which form the main part of simulations in the k-Wave library. k-Wave is a C++/Matlab library used for simulating the propagation of ultrasonic waves in 1D, 2D or 3D space. Accelerating these functions is necessary because the simulations are computationally intensive.
Non-Blocking Input/Output for the k-Wave Toolbox
Kondula, Václav ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
This thesis deals with the implementation of a non-blocking I/O interface for the k-Wave project, which is designed for time-domain simulation of ultrasound propagation. The main focus is on large-domain simulations that, due to their high computing power requirements, must run on supercomputers and produce tens of GB of data in a single simulation step. In this thesis, I have designed and implemented a non-blocking interface for storing data using dedicated threads, which allows simulation calculations to overlap with disk operations in order to speed up the simulation. A speedup of up to 33% was achieved compared to the current implementation of k-Wave, which, among other things, also reduces the cost of the simulation.
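The idea of overlapping computation with disk operations through a dedicated thread can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function `run_simulation`, its parameters `compute_step`, `write_step` and `max_pending`, and the use of a bounded queue are all assumptions made for illustration only.

```python
import queue
import threading


def run_simulation(num_steps, compute_step, write_step, max_pending=2):
    """Overlap per-step computation with output using one writer thread.

    Hypothetical helper: compute_step(step) returns the data of one step,
    write_step(step, data) stores it (for example to disk).
    """
    # Bounded queue: the main thread blocks only when the writer
    # has fallen max_pending steps behind.
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def writer():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more data will arrive
                break
            step, data = item
            try:
                write_step(step, data)
            except Exception as exc:  # remember the failure, keep draining
                errors.append(exc)

    thread = threading.Thread(target=writer)
    thread.start()
    for step in range(num_steps):
        data = compute_step(step)   # simulation work on the main thread
        pending.put((step, data))   # hand the result over and continue
    pending.put(None)
    thread.join()
    if errors:
        raise errors[0]
```

With a single writer thread and a FIFO queue, the steps are stored in the order in which they were computed. The bound on the queue limits how many step buffers are held in memory at once, which matters when a single step produces a large amount of data.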
Deep Neural Networks Approximation
Stodůlka, Martin ; Mrázek, Vojtěch (referee) ; Vaverka, Filip (advisor)
The goal of this work is to determine the impact of approximate computing on the accuracy of deep neural networks, specifically neural networks for image classification. A version of the Caffe framework called Ristretto-caffe was chosen for the neural network implementation and was extended to support approximate operations. Approximate computing was used for the multiplications in the forward pass of convolution. Approximate components from EvoApproxLib were chosen for this work.
Acceleration of Axisymetric Ultrasound Simulations
Kukliš, Filip ; Vaverka, Filip (referee) ; Jaroš, Jiří (advisor)
Simulating the propagation of ultrasound through soft biological tissue has a wide range of practical applications. These include the design of transducers for diagnostic and therapeutic ultrasound, the development of new signal processing methods and imaging techniques, the study of ultrasound beam aberrations in heterogeneous media, ultrasonic tissue classification, training radiologists to use ultrasound equipment and interpret ultrasound images, model-based medical image registration, and treatment planning for high-intensity ultrasound. However, ultrasound simulation is a computationally demanding problem, because the simulation domains are very large compared with the acoustic wavelengths of interest. If the problem is axisymmetric, however, it can be solved in 2D. This makes it possible to run simulations on grids with more points, using fewer computational resources, in a shorter time. This thesis models and implements the acceleration of a nonlinear ultrasound wave simulation in an axisymmetric coordinate system, originally realized in Matlab using MEX files for the discrete sine and cosine transforms. The axisymmetric simulation was implemented in C++ as an open-source extension of the k-Wave toolbox. The code is optimized to run on a single node of the Salomon supercomputer (IT4Innovations, Ostrava, Czech Republic) with two twelve-core Intel Xeon E5-2680v3 processors. Several code optimizations were made to maximize computational efficiency. First, the Fourier transforms were computed using the real-to-complex FFT from the FFTW library. Compared with the complex-to-complex FFT, this reduced the computation time and memory associated with the FFT by almost 50%. The discrete sine and cosine transforms, which in the Matlab version had to be invoked from dynamically loaded MEX files, were likewise computed using the FFTW library. Second, to reduce the load on memory bandwidth, all operations were computed in single-precision floating point. Third, element-wise operations were parallelized using OpenMP and then vectorized using SIMD extensions (SSE). Overall, the C++ version is up to 34 times faster and uses less than one third of the memory of the Matlab version of the simulation. A simulation that would take almost two days can thus be computed in an hour and a half. All this makes it possible to compute simulations on a computational grid of 16384 × 8192 points in a reasonable time.
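The saving of almost 50% from a real-to-complex transform follows from the Hermitian symmetry of the spectrum of a real signal: X[n - k] is the complex conjugate of X[k], so only n/2 + 1 coefficients need to be computed and stored. The sketch below illustrates this with a naive O(n^2) DFT in plain Python. It does not use FFTW and is not the thesis's code, and the function names `dft`, `real_dft_half` and `expand_half_spectrum` are hypothetical.

```python
import cmath


def dft(x):
    """Naive complex-to-complex DFT: returns all n coefficients."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_dft_half(x):
    """DFT of a real signal: computes only coefficients 0 .. n // 2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def expand_half_spectrum(half, n):
    """Rebuild the full spectrum using X[n - k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full
```

For a real signal of length 8, `real_dft_half` returns 5 coefficients instead of 8, and `expand_half_spectrum` recovers the remaining three without further computation.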
