keywords:"Posilované učení" - Search Results - Digital Repository

Methods for Playing the Game 'Liar's Dice' Using Dynamic Programming
Lohn, Marek ; Šátek, Václav (referee) ; Zbořil, František (advisor)
This project examines methods of playing the game Liar's Dice using dynamic programming. The algorithm chosen for the study is SARSA (State-Action-Reward-State-Action), a modified version of the Q-Learning algorithm. SARSA is compared with other algorithms by letting them play against each other in an application made in Unity Engine. The algorithms compared with SARSA are Q-Learning and Counterfactual Regret Minimization. SARSA achieved a 69.147 % win ratio in games against Q-Learning. In games against Counterfactual Regret Minimization, its win ratio was only 25 %. The main outcome of this study is that the modified SARSA is effective against the Q-Learning algorithm in Liar's Dice. On the other hand, SARSA was very ineffective against the Counterfactual Regret Minimization algorithm.

Heuristics for the Scotland Yard Board Game
Cejpek, Michal ; Zbořil, František (referee) ; Zbořil, František (advisor)
This thesis explores the possibility of using deep learning and reinforcement learning algorithms to solve problems with incomplete information. The main algorithm under investigation is PPO (Proximal Policy Optimization). To test the suitability of PPO, a simplified implementation of the Scotland Yard game was created, together with an environment for training and testing the algorithms. The experiments showed that PPO is very suitable for solving problems with incomplete information. Through training, the agents very quickly gained a sense of the game's goals and built appropriate strategies to meet them.

Using Reinforcement learning and inductive synthesis for designing robust controllers in POMDPs
Hudák, David ; Holík, Lukáš (referee) ; Češka, Milan (advisor)
One of the current challenges in sequential decision-making is dealing with uncertainty caused by imprecise sensors or by incomplete information about the environments in which decisions are to be made. This uncertainty is formally described by partially observable Markov decision processes (POMDPs), which, unlike Markov decision processes (MDPs), replace information about the concrete state with an imprecise observation. Decision-making in such environments requires estimating the current state in some way, and in general the construction of optimal policies in such environments is undecidable. Two quite different approaches exist to meet this challenge: the problem can be tackled with complete formal methods, either by computing beliefs or by synthesizing finite-state controllers, or with methods based on an imprecise approximation of the current state, represented mainly by deep reinforcement learning. While formal approaches can make verifiable and robust decisions for small environments, reinforcement learning can scale to real-world problems. This thesis focuses on combining these two approaches and proposes various methods both for interpreting the results and for passing hints between them. The experiments show that both approaches can benefit from this symbiosis, and also that the chosen approach to training agents by itself outperforms current systems for training agents on similar tasks by orders of magnitude.

Methods for Playing the Game 'Liar's Dice' Using Dynamic Programming
Lohn, Marek ; Šátek, Václav (referee) ; Zbořil, František (advisor)
This project examines methods of playing the game Liar's Dice using dynamic programming. The algorithm I chose for my study is SARSA (State-Action-Reward-State-Action), a modified version of the Q-Learning algorithm. I compared SARSA with other algorithms by letting them play against each other in an application that I made in Unity Engine. The algorithms I compared with SARSA are Q-Learning and Counterfactual Regret Minimization. SARSA achieved a 69.147 % win ratio in games against Q-Learning. In games against Counterfactual Regret Minimization, its win ratio was only 29.84 %. The main outcome of this study is that SARSA, a modified version of Q-Learning, is effective against the Q-Learning algorithm. On the other hand, SARSA was very ineffective against the Counterfactual Regret Minimization algorithm.

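The abstracts above describe SARSA as a modified version of Q-Learning. As a rough illustration of that relationship (a minimal tabular sketch, not the thesis code), the two update rules differ only in the value they bootstrap from: Q-Learning uses the best next action, SARSA uses the action actually taken next. The state names, action names, learning rate `alpha`, and discount `gamma` below are assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action the agent actually takes next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


# Hypothetical two-action toy setting (labels are illustrative only).
actions = ["bid", "challenge"]
q_sarsa = defaultdict(float)
q_ql = defaultdict(float)
for q in (q_sarsa, q_ql):
    q[("s1", "bid")] = 1.0        # pretend "bid" looks good in state s1
    q[("s1", "challenge")] = 0.0  # and "challenge" looks neutral

# Same transition for both learners; the agent happens to explore
# "challenge" in s1, so SARSA bootstraps from the lower value.
sarsa_update(q_sarsa, "s0", "bid", 0.0, "s1", "challenge")
q_learning_update(q_ql, "s0", "bid", 0.0, "s1", actions)

print(q_sarsa[("s0", "bid")], q_ql[("s0", "bid")])
```

In this toy transition the SARSA estimate stays at zero while the Q-Learning estimate moves toward the greedy value, which is the on-policy versus off-policy distinction in miniature.
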
Multi-Agent System Learning to Maximize User Comfort in a Smart Home
Čábela, Radek ; Zbořil, František (referee) ; Janoušek, Vladimír (advisor)
This thesis presents a way of working with feedback, Smart Home devices, and "agents" that minimizes the direct changes to Smart Home parameters made by the inhabitants and thereby increases their comfort. The resulting simulation, which demonstrates the functionality of the system design, focuses on changes of temperature inside a house.

Vehicle Control via Reinforcement Learning
Maslowski, Petr ; Uhlíř, Václav (referee) ; Šůstek, Martin (advisor)
The goal of this thesis is to create an autonomous agent that can control a vehicle. The agent uses reinforcement learning with neural networks. It interprets images from the vehicle's front camera and selects appropriate actions to control the vehicle. I designed and created reward functions and then experimented with the hyperparameter setup. The trained agent simulates driving on the road. The result of this thesis shows a possible approach to controlling an autonomous vehicle agent using machine learning methods in the CARLA simulator.

Strategic Game Based on Multiagent Systems
Knapek, Petr ; Kočí, Radek (referee) ; Zbořil, František (advisor)
This thesis focuses on designing and implementing a system that adds learning and planning capabilities to agents designed for playing real-time strategy games such as StarCraft. It explains the problems of computer control of game entities and bots and introduces some commonly used solutions. Based on this analysis, a new system has been designed and implemented. It uses multi-agent systems to control the game, utilizes machine learning methods, and is capable of overcoming opponents and adapting to new challenges.

Using of Reinforcement Learning for Four Legged Robot Control
Ondroušek, Vít ; Maga, Dušan (referee) ; Maňas, Pavel (referee) ; Singule, Vladislav (referee) ; Březina, Tomáš (advisor)
This Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot that is able to plan the walking gait through the Q-learning algorithm. This aim is achieved by designing a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution. A set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on flat terrain as well as on rugged terrain. The model of all possible behaviors that can be achieved by activating these controllers is designed using an appropriate discretization of the continuous state space. This model is used by the Q-learning algorithm to find optimal strategies of robot control. The capabilities of the control unit are shown on three complex tasks: rotation of the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of a four-legged robot with three degrees of freedom on each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. The video files, which show the elementary and composite controllers in action as well as the resulting walking gaits of the robot, are an integral part of this thesis.

Reinforcement Learning for RoboCup
Bočán, Hynek ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to create an artificial intelligence capable of controlling a robotic soccer player simulated in the SimSpark environment. The created agent extends the capabilities of an existing third-party agent, which provides a set of basic skills such as localization on the field, dribbling with the ball, and omnidirectional walking. The created agent is responsible for picking the best action based on the current state of the game. This decision-making was implemented using reinforcement learning, specifically the Q-learning method. The state of the game is transformed into a 2D picture with several planes. This picture is then analyzed using a deep convolutional neural network implemented in C++ with the DeepCL library.

Improving Bots Playing Starcraft II Game in PySC2 Environment
Krušina, Jan ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to create an automated system for playing the real-time strategy game Starcraft II. Learning from replays via supervised learning, together with reinforcement learning techniques, is used to improve the bot's behavior. The proposed system should be capable of playing the whole game using the PySC2 framework for machine learning. The performance of the bot is evaluated against the built-in scripted AI in the game.
