National Repository of Grey Literature: 8 records found
Development and Programming of Low Power Cluster
Hradecký, Michal ; Nikl, Vojtěch (referee) ; Jaroš, Jiří (advisor)
This thesis deals with building and programming a low-power cluster composed of Hardkernel Odroid XU4 kits based on ARM Cortex-A15 and Cortex-A7 chips. The goal was to design a simple cluster composed of multiple kits and run a set of benchmarks to analyze performance and power consumption. The test set consisted of the HPL and Stream benchmarks and various tests of the MPI interface. The overall performance of a cluster of four kits in the HPL benchmark was measured at 23 GFLOP/s in double precision. During this test, the cluster showed a power efficiency of about 0.58 GFLOP/W. The work also describes the installation of the PBS Torque scheduler and of EasyBuild, a framework for building and installing HPC software, on the 32-bit ARM platform. A comparison with the Anselm supercomputer showed that the Odroid cluster is as efficient as a large supercomputer, but at a slightly higher price.
Neuroevolution Principles and Applications
Herec, Jan ; Strnadel, Josef (referee) ; Bidlo, Michal (advisor)
The theoretical part of this work deals with evolutionary algorithms (EA), neural networks (NN), and their synthesis in the form of neuroevolution. On the practical side, the aim is to demonstrate the application of neuroevolution to two different tasks. The first task is the evolutionary design of a convolutional neural network (CNN) architecture able to classify handwritten digits (from the MNIST dataset) with high accuracy. The second task is the evolutionary optimization of a neurocontroller for a simulated Falcon 9 rocket landing. Both tasks are computationally demanding and were therefore solved on a supercomputer. In the first task, it was possible to design architectures that, when properly trained, achieve an accuracy of 99.49%, which shows that the design of high-quality architectures can be automated with neuroevolution. In the second task, the neurocontroller weights were optimized so that, for defined initial conditions, the model of the Falcon booster can land successfully. Neuroevolution succeeded in both tasks.
Interest-Point Detection on CUDA
Ryba, Jan ; Řezníček, Ivo (referee) ; Herout, Adam (advisor)
Corner point detection is one of many functions used in computer vision for tasks such as tracking, object detection, image comparison, and more. Many of the algorithms are complex and require a lot of CPU time. This is where the CUDA platform comes in. CUDA kernels running in parallel on graphics accelerators can substantially decrease the time needed for execution, allowing even these complex calculations to run in real time or faster. The text focuses on the Moravec and Harris corner detection algorithms and their efficient implementation on CUDA. It also examines the potential and performance of the CUDA platform.
Large-scale Ultrasound Simulations using Accelerated Clusters
Vaverka, Filip ; Boehm, Christian (referee) ; Říha, Lubomír (referee) ; Jaroš, Jiří (advisor)
Efficient use of accelerated HPC clusters depends particularly on the communication efficiency of the algorithms used. This thesis therefore revisits the pseudo-spectral algorithms used to solve wave problems, primarily in medical ultrasound, with the aim of enabling them to run on accelerated machines. It is shown that domain decomposition is the preferred way to achieve this goal, since a number of alternative approaches exhibit significantly worse numerical properties. Based on this approach and on the k-Wave ultrasound model, widely used in medicine, a new simulation algorithm is proposed. Subsequent experiments show that this approach achieves up to a 7.5x speedup and nearly perfect weak scaling up to 512 GPU-accelerated nodes. The solution also allows full utilization of compute nodes with several GPU accelerators and advanced interconnects, such as the NVIDIA DGX-2 with NVLink. The method further offers a flexible trade-off between accuracy and efficiency: by choosing the depth of the subdomain overlap, one can either reach accuracy comparable to the original k-Space method or maximize performance while maintaining sufficient accuracy.
Generating Complex Procedural Terrains Using the GPU
Ryba, Jan ; Bartoň, Radek (referee) ; Herout, Adam (advisor)
Generating fully 3D terrains is a difficult task: we need to store a lot of data, do a lot of computation, or both. We can reduce or completely eliminate the data storage by using a procedural approach, but this makes the problem computationally costly, and this is where the CUDA platform comes in. CUDA kernels running in parallel on graphics accelerators can substantially decrease the time needed for computation, allowing even these complex calculations to run in real time or faster. This finds use in the game and movie industries.
