National Repository of Grey Literature
Parallel Application Development with Intel Threading Tools
Vadkerti, Ladislav ; Jaroš, Jiří (referee) ; Dvořák, Václav (advisor)
Today's trend in microprocessor design is to increase the number of execution cores on a single chip, because raising the processor's clock speed has reached its limit due to growing power consumption. This trend brings new opportunities to software developers, who can take advantage of true multithreading in their applications. Compared with sequential programming, however, threading also introduces many new problems. With proper design, threading can improve performance by making better use of hardware resources; improper use of threading, on the other hand, can lead to performance degradation, unpredictable behavior, or errors that are difficult to resolve. For this reason, Intel developed a suite of tools that help software developers analyze performance and detect coding errors in thread interactions. This thesis examines the ways in which these tools can be used in multithreaded application development.
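A typical "coding error in thread interactions" is a data race: several threads perform an unsynchronized read-modify-write on shared data, and some updates are silently lost. The Intel tools target native code, so the following Python sketch only illustrates the class of defect and the conventional fix (a lock around the shared update); the function names `increment_shared` and `run_workers` are hypothetical and not taken from the thesis.

```python
import threading


def increment_shared(counter, lock, iterations):
    """Increment counter["value"] `iterations` times, holding the lock for each update."""
    for _ in range(iterations):
        # The read-modify-write below is not atomic. Without the lock, two threads
        # could read the same old value and one of the increments would be lost.
        with lock:
            counter["value"] += 1


def run_workers(num_threads=4, iterations=100_000):
    """Start num_threads workers that share one counter and return its final value."""
    counter = {"value": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=increment_shared, args=(counter, lock, iterations))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

With the lock in place the result is always `num_threads * iterations`; removing the `with lock:` line turns the loop into exactly the kind of nondeterministic bug that thread-checking tools are designed to report.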
Neural Network Implementation without Multiplication
Slouka, Lukáš ; Baskar, Murali Karthick (referee) ; Szőke, Igor (advisor)
The subject of this thesis is neural network acceleration with the goal of reducing the number of floating-point multiplications. The theoretical part surveys current trends and methods in the field of neural network acceleration, focusing on binarization techniques, which allow multiplications to be replaced with logical operations. The theory is put into practice in two ways: first, a GPU implementation of the crucial binary operators in the TensorFlow framework, together with a performance benchmark; second, an application of these operators in a simple image classifier. The results are encouraging: the implemented operators achieve a speed-up by a factor of 2.5 compared with the highly optimized cuBLAS operators. The last chapter compares the accuracies achieved by binarized models and their full-precision counterparts on various architectures.
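The core idea behind replacing multiplications with logical operations can be shown on a dot product. The sketch below assumes the common formulation for vectors with values in {+1, -1}, each packed into the bits of an integer (bit = 1 for +1). Matching bits contribute +1 and differing bits contribute -1, so the dot product equals n - 2·popcount(a XOR b). The helpers `pack_signs` and `binary_dot` are hypothetical illustrations, not the thesis's TensorFlow operators; a GPU kernel would instead pack bits into 32- or 64-bit machine words and use a hardware population-count instruction.

```python
def pack_signs(values):
    """Pack a +1/-1 vector into an int: bit i is set where values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized vectors must contain only +1 and -1")
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +/-1 vectors of length n, using XOR and popcount only.

    If d positions differ, the n - d matching positions contribute +1 and the
    d differing ones contribute -1, giving (n - d) - d = n - 2*d.
    """
    mask = (1 << n) - 1
    d = bin((a_bits ^ b_bits) & mask).count("1")  # popcount of the differing bits
    return n - (d << 1)  # a shift instead of a multiplication by 2


if __name__ == "__main__":
    a = [1, -1, -1, 1, 1]
    b = [1, 1, -1, -1, 1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # prints 1
```

For the example vectors, the ordinary dot product is 1 - 1 + 1 - 1 + 1 = 1, and the XOR/popcount version returns the same value without any multiplication.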
Efficient Implementation of High Performance Algorithms on Intel Xeon Phi
Šimek, Dominik ; Hrbáček, Radek (referee) ; Jaroš, Jiří (advisor)
This thesis is dedicated to the implementation of high-performance algorithms on the Intel Xeon Phi coprocessor, which Intel introduced in 2012 as a new MIC (Many Integrated Core) architecture. The theoretical part of the thesis focuses on the architecture of the coprocessor (with a peak performance of 2 TFLOPS in single precision) and on the procedure for implementing and optimizing algorithms. This knowledge is then applied to practical examples that demonstrate the implementation and optimization of algorithms and work with the coprocessor. In the practical part, simple benchmarks, namely vector-matrix multiplication and matrix multiplication, are explained and implemented. The first benchmark achieved 6.5% of the coprocessor's theoretical performance; the second achieved considerably more. The following chapter discusses a more complex benchmark, a simulation of a particle system (N-body), which reached more than 35% of the coprocessor's performance (725 GFLOPS). The next section is dedicated to further problems: optimization of the MATLAB module k-Wave (propagation of ultrasound waves), extraction of i-vectors (speech processing), and cross-compilation of existing libraries, modules, and programs. The thesis concludes with an evaluation of the potential of the Intel Xeon Phi.
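The N-body benchmark is a direct all-pairs force computation, which performs O(N²) arithmetic on O(N) data; for reference, 725 GFLOPS out of a 2 TFLOPS peak is 725 / 2000 ≈ 36%. The abstract does not state the force model or integrator used, so this NumPy sketch assumes Newtonian gravity with a softening term and a kick-drift-kick leapfrog step; `nbody_accelerations`, `leapfrog_step`, `G`, and `softening` are hypothetical names chosen for illustration.

```python
import numpy as np


def nbody_accelerations(pos, mass, G=1.0, softening=1e-3):
    """All-pairs (O(N^2)) gravitational accelerations.

    pos: (N, 3) array of positions, mass: (N,) array of masses. The softening
    term keeps the distance nonzero when two bodies coincide.
    """
    # diff[i, j] = pos[j] - pos[i], shape (N, N, 3)
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    dist2 = np.sum(diff * diff, axis=-1) + softening ** 2
    inv_dist3 = dist2 ** -1.5
    # a_i = G * sum_j m_j * diff[i, j] / |r_ij|^3; the self term vanishes since diff[i, i] = 0.
    return G * np.einsum("ij,ijk->ik", mass[np.newaxis, :] * inv_dist3, diff)


def leapfrog_step(pos, vel, mass, dt, **kw):
    """Advance the system by one kick-drift-kick leapfrog step."""
    vel = vel + 0.5 * dt * nbody_accelerations(pos, mass, **kw)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * nbody_accelerations(pos, mass, **kw)
    return pos, vel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.standard_normal((256, 3))
    vel = np.zeros((256, 3))
    mass = np.full(256, 1.0 / 256)
    pos, vel = leapfrog_step(pos, vel, mass, dt=1e-3)
    print(pos.shape, vel.shape)
```

Because every body interacts with every other body, each position loaded from memory is reused N times, which is why such a kernel can keep the many cores and wide vector units of a coprocessor busier than a simple vector-matrix product.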
Using GPU for HPC
Máček, Branislav ; Szőke, Igor (referee) ; Kašpárek, Tomáš (advisor)
Recently there has been significant growth in building HPC systems, which are nowadays built from mainstream computer components. One such component is the GPU-based graphics accelerator. This thesis describes graphics accelerators and examines the possibilities of their use. A GPU chip contains hundreds of simple processors, and the thesis examines how to benefit from these parallel processors. It describes several test applications, discusses the experimental results, and compares them with other components used for HPC.
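One common way to benefit from hundreds of simple processors is data parallelism: the same small kernel runs once per data element, and each thread finds its element from its position in a grid of thread blocks. The abstract does not name a programming model, so this sketch assumes a CUDA-style 1-D launch (block index, block size, thread index) and emulates it sequentially in Python; `saxpy_kernel`, `launch`, and `saxpy` are hypothetical names, and on real hardware the loop iterations in `launch` would run in parallel.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Work done by one emulated GPU thread: compute a single element of a*x + y."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(x):  # the last block may contain threads past the end of the data
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch of grid_dim blocks of block_dim threads."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)


def saxpy(a, x, y, block_dim=128):
    """Compute a*x + y element-wise by launching one emulated thread per element."""
    n = len(x)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    launch(saxpy_kernel, grid_dim, block_dim, a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], block_dim=2))
```

The bounds check in the kernel matters because the grid is rounded up to whole blocks: with 3 elements and 2 threads per block, 4 threads are launched and the last one must do nothing.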
