National Repository of Grey Literature: 44 records found, showing 1 - 10
Intel Integrated Performance Primitives and their use in application development
Machač, Jiří ; Přinosil, Jiří (referee) ; Malý, Jan (advisor)
The aim of this work is to demonstrate and evaluate the contribution of SIMD computing, in particular Intel's MMX, SSE, SSE2, SSE3, SSSE3 and SSE4 units, by creating demonstration applications with the Intel Integrated Performance Primitives library. First, the options for SIMD programming using intrinsic functions, vectorization and the Intel Integrated Performance Primitives library are presented; next, ways of evaluating particular algorithms are described. Finally, the procedure for programming with the Intel Integrated Performance Primitives library is illustrated.
GPU Image Processing Library
Čermák, Michal ; Španěl, Michal (referee) ; Smrž, Pavel (advisor)
This work is concerned with the architecture of recent Nvidia graphics cards and the CUDA application programming interface, which is used to create an accelerated image processing library. It places emphasis on testing the performance gain in comparison with the highly optimized and widely used OpenCV library.
Generating Code of Optimised Mathematical Operations
Beneš, Vojtěch ; Horáček, Petr (referee) ; Čermák, Martin (advisor)
This bachelor's thesis deals with creating a simple programming language for working with mathematical operations. The main point of the thesis is to create a compiler for this language that uses MMX technology to generate assembly code instructions. The optimized code generation is based on a modified context generation algorithm.
The Efficient Implementation of the Genetic Algorithm Using Multicore Processors
Kouřil, Miroslav ; Žaloudek, Luděk (referee) ; Jaroš, Jiří (advisor)
This diploma thesis deals with the acceleration of an advanced genetic algorithm. Discrete and continuous versions of the UMDA genetic algorithm were chosen for implementation. The main part of the acceleration is the use of the SSE instruction set. With it, the functions for calculating fitness and sampling the new population in particular were accelerated. A pseudorandom number generator that also uses the SSE instruction set was then implemented. After this implementation, the discrete algorithm reached a speedup of up to 4.6. Finally, the algorithms were modified to use OpenMP, which enables blocks of code to run in multiple threads. The continuous version of the algorithm is not well suited to parallelization, because its computational complexity is low. In contrast, the discrete versions of the algorithm are well suited to parallelization. The implemented versions reached total speedups of up to 4.9 and 7.2.
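For readers unfamiliar with UMDA, the two steps the abstract singles out (fitness calculation and sampling of the new population) can be sketched as follows. This is only an illustrative sketch, not the thesis code: the OneMax fitness function, the truncation selection, the probability clamping and all names and parameter values are assumptions made for the example, and it uses plain Python rather than SSE or OpenMP.

```python
import random


def onemax(bits):
    """Hypothetical stand-in fitness: the number of ones in the bit string."""
    return sum(bits)


def umda_discrete(n_bits=20, pop_size=50, n_select=25, generations=30, seed=0):
    """Minimal discrete UMDA loop (illustrative parameters, not from the thesis)."""
    rng = random.Random(seed)
    # One marginal probability per bit position, initially unbiased.
    probs = [0.5] * n_bits
    # Clamp bounds so that no bit value ever becomes impossible to sample.
    lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Sampling: every bit is drawn independently from its marginal.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Fitness evaluation followed by truncation selection.
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Model update: re-estimate each marginal from the selected individuals.
        probs = [
            min(hi, max(lo, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best, probs
```

Calling `umda_discrete()` returns the best bit string found and the final marginal probabilities. The sampling and fitness loops are the parts that the abstract reports as accelerated with SSE.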
.NET's LINQ Optimization
Šerý, Daniel ; Ryšavý, Ondřej (referee) ; Pluskal, Jan (advisor)
This thesis deals with LINQ (Language Integrated Query) and investigates the possibilities of its implementation and optimization in the C# language. A method of rewriting queries into procedural code is chosen and implemented. The goal is to provide a LINQ that can be used in code that requires high speed. With the program created for rewriting LINQ queries, the performance of most operators increased to 1.2x to 20x the speed of System.Linq, depending on the rewritten algorithm, the data source and the information provided to the rewriting program.
SIMD Instructions Support in LLVM Compiler
Šnobl, Pavel ; Hynek, Jiří (referee) ; Masařík, Karel (advisor)
This bachelor's thesis deals with support for automatic vectorization of code in the LLVM compilation framework and with the extension of the Codix processor model with SIMD instructions. As a result, LLVM is able to create reports about the auto-vectorization process, and special pragma directives can be used to provide the compiler with additional information for program optimization. A way of providing information about the architectures of processors created in the Codasip Framework development environment, needed for more effective vectorization, is also introduced and implemented. Finally, a set of integer vector instructions and related new registers for Codix is chosen and added to the model.
Compilation of OpenCL Applications for Embedded Systems
Šnobl, Pavel ; Čekan, Ondřej (referee) ; Hruška, Tomáš (advisor)
This master's thesis deals with support for compiling and executing programs written using the OpenCL framework on embedded systems. OpenCL is a system for programming heterogeneous systems comprising processors, graphics accelerators and other computing devices. It is also used on systems composed of just one computing unit, where it allows writing parallel programs (task and data parallelism) and working with a hierarchical system of memories. In this thesis, various available open source OpenCL implementations are compared, and the selected one is then integrated into the LLVM compiler infrastructure. This compiler is generated as part of the toolchain provided by Codasip Studio, a development environment for application-specific instruction set processors. Optimizations for architectures with SIMD instructions and for VLIW architectures are also designed and implemented. The result is tested and demonstrated on a set of test applications.
Parallelized image processing library
Fuksa, Tomáš ; Macho, Tomáš (referee) ; Petyovský, Petr (advisor)
This work deals with parallel computing on modern processors: multi-core CPUs and GPUs. The goal is to learn about computing on these devices, which are suitable for parallelization, define their advantages and disadvantages, test their properties on examples and select appropriate tools to implement a library for parallel image processing. This library is to be used for vanishing point estimation in a path-finding mobile robot.
Optimization of Voice Recognition for Mobile Devices
Tomec, Martin ; Zbořil, František (referee) ; Hanáček, Petr (advisor)
This work deals with the optimization of keyword spotting algorithms on the ARM Cortex-A8 processor architecture. First, it describes this architecture, especially the NEON unit for vector computing. It then briefly describes keyword spotting algorithms and proposes optimizations of these algorithms for the described architecture. The main part of the work is the implementation of these optimizations and an analysis of their impact on performance.
Efficient Implementation of High Performance Algorithms on Multi-Core Processors
Tomečko, Lukáš ; Bidlo, Michal (referee) ; Jaroš, Jiří (advisor)
This thesis describes the process of parallelizing and vectorizing a fluid simulation using the OpenMP library and the Intel compiler. Various approaches were tried, e.g. cache blocking, data sorting and data reorganization. By combining the best of them, the final application performed 11.4 times faster than the original one, using 16 cores. Benchmarks show that the algorithms used are not suitable for vectorization.
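Cache blocking, one of the approaches named in the abstract, can be illustrated by the loop structure below. This is a generic sketch on a matrix transpose, not the fluid simulation from the thesis; the function name, the block size and the choice of problem are assumptions. Pure Python lists will not show the cache benefit, so the example is meant only to show how the iteration space is split into tiles.

```python
def transpose_blocked(a, block=4):
    """Transpose a rectangular list-of-lists matrix tile by tile.

    The two outer loops walk over block x block tiles; the two inner loops
    stay inside one tile, so in a compiled language the reads and writes of a
    tile would touch only a small, cache-sized region of memory at a time.
    """
    n_rows, n_cols = len(a), len(a[0])
    out = [[0] * n_rows for _ in range(n_cols)]
    for i0 in range(0, n_rows, block):
        for j0 in range(0, n_cols, block):
            # min() handles edge tiles when the size is not a multiple of block.
            for i in range(i0, min(i0 + block, n_rows)):
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j][i] = a[i][j]
    return out
```

The result is identical to an unblocked transpose; only the order in which elements are visited changes.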
