Original title: Second Order Optimality in Transient and Discounted Markov Decision Chains
Authors: Sladký, Karel
Document type: Papers
Conference/Event: Mathematical Methods in Economics 2015 /33./, Cheb (CZ), 2015-09-09 / 2015-09-11
Year: 2015
Language: eng
Abstract: The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that, in the class of policies maximizing (or minimizing) the total expected discounted reward (or the undiscounted reward for the transient model), we choose the policy minimizing the total variance. Explicit formulae for calculating the variances for transient and discounted models are reported, along with sketches of algorithmic procedures for finding second order optimal policies.
Keywords: discounted and transient Markov reward chains; dynamic programming; reward-variance optimality
Project no.: GA13-14445S (CEP), GA15-10331S (CEP)
Funding provider: GA ČR, GA ČR
Host item entry: Proceedings of the 33rd International Conference Mathematical Methods in Economics MME 2015, ISBN 978-80-261-0539-8

Institution: Institute of Information Theory and Automation AS ČR
Document availability information: Fulltext is available at external website.
External URL: http://library.utia.cas.cz/separaty/2015/E/sladky-0448938.pdf
Original record: http://hdl.handle.net/11104/0250633

Permalink: http://www.nusl.cz/ntk/nusl-200860


 Record created 2015-11-04, last modified 2022-09-29

