National Repository of Grey Literature 6 records found  Search took 0.01 seconds. 
High Performance Applications on Intel Xeon Phi Cluster
Kačurik, Tomáš ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
The main topic of this thesis is the implementation and subsequent optimization of high performance applications on a cluster of Intel Xeon Phi coprocessors. Using two approaches to solve the N-Body problem, the possibilities of the program execution on a cluster of processors, coprocessors or both device types have been demonstrated. Two particular versions of the N-Body problem have been chosen - the naive and Barnes-hut. Both problems have been implemented and optimized. For better comparison of the achieved results, we only considered achieved acceleration against single node runs using processors only. In the case of the naive version a 15-fold increase has been achieved when using combination of processors and coprocessors on 8 computational nodes. The performance in this case was 9 TFLOP/s. Based on the obtained results we concluded the advantages and disadvantages of the program execution in the distributed environments using processors, coprocessors or both.
Efficient Implementation of High Performance Algorithms on Intel Xeon Phi
Šimek, Dominik ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
This thesis is dedicated to the implementation of high performance algorithms on the Intel Xeon Phi coprocessor. The Xeon phi was introduced by Intel as a new MIC (Many Integrated Core) architecture in 2012. The theoretical part of the thesis is focused on the architecture of the coprocessor (with peak performance of 2 tFLOPS for a single precision data) and on the procedure of algorithms implementation and optimization. The theoretical knowledge is then applied to a practical examples with demonstration of the implementation and  the optimization of algorithms and work with the coprocessor. In the practical part of the thesis, simple benchmarks such as a vector matrix multiplication and a matrix multiplication are explained and implemented. In the first benchmark 6.5% of theoretical coprocessor performance was achieved, in the second it was much more. In following chapter a more complex benchmark - simulation of a particles system (N-Body), that reached more than 35% of coprocessor performance (725 gFLOPS), is discussed. The following section is dedicated to some interesting problems such as optimization of a MATLAB module k-Wave (propagation  of the ultrasound waves), extraction of I-vector (speech processing), cross-compilation of existing libraries, modules and programs. In the conclusion of the thesis the usage the potential of the Intel Xeon Phi is evaluated.
Parallelisation of Ultrasound Simulations on Intel Xeon Phi Accelerator
Vrbenský, Andrej ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
Nowadays, the simulation of ultrasound acoustic waves has a wide range of practical usage. As one of them we can name the simulation in realistic tissue media, which is successfully used in medicine. There are several software applications dedicated to perform such simulations. k-Wave is one of them. The computational difficulty of the simulation itself is very high, and this leaves a space to explore new speed-up methods. In this master's thesis, we proposed a way to speed-up the simulation based on parallelization using Intel Xeon Phi accelerator. The accelerator contains large amount of cores and an extra-wide vector unit, and therefore, is ideal for purpose of parallelization and vectorization. The implementation is using OpenMP version 4.0, which brings some new options such as explicit vectorization. Results were measured during extensive experiments.
Parallelisation of Ultrasound Simulations on Intel Xeon Phi Accelerator
Vrbenský, Andrej ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
Nowadays, the simulation of ultrasound acoustic waves has a wide range of practical usage. As one of them we can name the simulation in realistic tissue media, which is successfully used in medicine. There are several software applications dedicated to perform such simulations. k-Wave is one of them. The computational difficulty of the simulation itself is very high, and this leaves a space to explore new speed-up methods. In this master's thesis, we proposed a way to speed-up the simulation based on parallelization using Intel Xeon Phi accelerator. The accelerator contains large amount of cores and an extra-wide vector unit, and therefore, is ideal for purpose of parallelization and vectorization. The implementation is using OpenMP version 4.0, which brings some new options such as explicit vectorization. Results were measured during extensive experiments.
High Performance Applications on Intel Xeon Phi Cluster
Kačurik, Tomáš ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
The main topic of this thesis is the implementation and subsequent optimization of high performance applications on a cluster of Intel Xeon Phi coprocessors. Using two approaches to solve the N-Body problem, the possibilities of the program execution on a cluster of processors, coprocessors or both device types have been demonstrated. Two particular versions of the N-Body problem have been chosen - the naive and Barnes-hut. Both problems have been implemented and optimized. For better comparison of the achieved results, we only considered achieved acceleration against single node runs using processors only. In the case of the naive version a 15-fold increase has been achieved when using combination of processors and coprocessors on 8 computational nodes. The performance in this case was 9 TFLOP/s. Based on the obtained results we concluded the advantages and disadvantages of the program execution in the distributed environments using processors, coprocessors or both.
Efficient Implementation of High Performance Algorithms on Intel Xeon Phi
Šimek, Dominik ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
This thesis is dedicated to the implementation of high performance algorithms on the Intel Xeon Phi coprocessor. The Xeon phi was introduced by Intel as a new MIC (Many Integrated Core) architecture in 2012. The theoretical part of the thesis is focused on the architecture of the coprocessor (with peak performance of 2 tFLOPS for a single precision data) and on the procedure of algorithms implementation and optimization. The theoretical knowledge is then applied to a practical examples with demonstration of the implementation and  the optimization of algorithms and work with the coprocessor. In the practical part of the thesis, simple benchmarks such as a vector matrix multiplication and a matrix multiplication are explained and implemented. In the first benchmark 6.5% of theoretical coprocessor performance was achieved, in the second it was much more. In following chapter a more complex benchmark - simulation of a particles system (N-Body), that reached more than 35% of coprocessor performance (725 gFLOPS), is discussed. The following section is dedicated to some interesting problems such as optimization of a MATLAB module k-Wave (propagation  of the ultrasound waves), extraction of I-vector (speech processing), cross-compilation of existing libraries, modules and programs. In the conclusion of the thesis the usage the potential of the Intel Xeon Phi is evaluated.

Interested in being notified about new results for this query?
Subscribe to the RSS feed.