==== Markov Decision Processes Toolbox ====

[[http://www7.inra.fr/mia/T/MDPtoolbox/|Markov Decision Processes toolbox]] provides packages for [[http://www.mathworks.com/|MATLAB]], [[http://www.gnu.org/software/octave/|GNU Octave]], [[https://www.scilab.org|Scilab]], and [[http://www.r-project.org/|R]].

== Installation ==

MDPtoolbox can be installed with the following command. When a dialog or message asks you to choose a CRAN mirror, pick any site near you.

  install.packages(c("MDPtoolbox"), dependencies = TRUE)

You may need to set a proxy before running the command above.

  Sys.setenv("http_proxy"="http://130.153.8.66:8080/")

== Preparation ==

Run the following command every time you start R and want to use this package; it loads the MDPtoolbox package into the session.

  library(MDPtoolbox)

=== Example ===

A textbook case from Hillier and Lieberman (2005).

== Case ==

There are 4 states.

|State|Condition|
|0|Good as new|
|1|Operable --- minor deterioration|
|2|Operable --- major deterioration|
|3|Inoperable --- output of unacceptable quality|

There are 3 actions.

|Decision|Action|Relevant States|
|1|Do nothing|0, 1, 2|
|2|Overhaul (return system to state 1)|2|
|3|Replace (return system to state 0)|1, 2, 3|

There are 3 transition matrices, each of size 4 by 4.

Theoretical transition probability matrix for the "Do nothing" action:

P^{(1)}=\left(\begin{array}{cccc} 0 & 7/8 & 1/16 & 1/16 \\ 0 & 3/4 & 1/8 & 1/8 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1 \end{array}\right)

Theoretical transition probability matrix for the "Overhaul" action (rows for states where the action is not available are left as zeros):

P^{(2)}=\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)

Theoretical transition probability matrix for the "Replace" action:

P^{(3)}=\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right)

Cost table (cost = reward * -1):

|State/Action|1|2|3|
|0|0| | |
|1|1000| |6000|
|2|3000|4000|6000|
|3| | |6000|

Reward table:

|State/Action|1|2|3|
|0|0| | |
|1|-1000| |-6000|
|2|-3000|-4000|-6000|
|3| | |-6000|

== Extension for practice ==

The solvers require every action to be defined in every state, so the "Overhaul" matrix and the reward table are extended for practice. Extended transition probability matrix (the last row is made consistent with the R definition below, so that every row sums to 1):

P^{(2)}=\left(\begin{array}{cccc} 0 & 7/8 & 1/16 & 1/16 \\ 0 & 3/4 & 1/8 & 1/8 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right)

Extended reward table:

|State/Action|1|2|3|
|0|0|-4000|-6000|
|1|-1000|-4000|-6000|
|2|-3000|-4000|-6000|
|3|-8000|-8000|-6000|

== Setting for R ==

Configuration:

  n.state <- 4
  n.action <- 3
  P <- array(0, c(n.state, n.state, n.action))
  R <- array(0, c(n.state, n.action))

Definition of the set of transition probability matrices:

  P[,,1] <- matrix(c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 0,   1/2,  1/2,
    0, 0,   0,    1),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  P[,,2] <- matrix(c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 1,   0,    0,
    0, 0,   0,    1),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  P[,,3] <- matrix(c(
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  dimnames(P)[[1]] <- c("new", "minor det", "major det", "inoperable")
  dimnames(P)[[2]] <- c("new", "minor det", "major det", "inoperable")
  dimnames(P)[[3]] <- c("do nothing", "overhaul", "replace")
  P

Definition of the reward matrix:

  R <- array(0, c(n.state, n.action))
  R[, 1] <- -c(0, 1000, 3000, 8000)
  R[, 2] <- -c(4000, 4000, 4000, 8000)
  R[, 3] <- -c(6000, 6000, 6000, 6000)
  rownames(R) <- c("new", "minor det", "major det", "inoperable")
  colnames(R) <- c("do nothing", "overhaul", "replace")
  R
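Before running any solver, it can help to confirm that the model is well formed. The short sketch below is an addition for illustration, not part of the textbook example: it uses base R to check that every row of each transition matrix sums to 1, and then calls the toolbox's mdp_check function, which is expected to return an empty string when P and R are consistent.

  # Every row of each transition matrix should sum to 1 (stochastic matrices).
  apply(P, 3, rowSums)

  # Toolbox consistency check: an empty string means no problem was detected,
  # otherwise an error message describing the inconsistency is returned.
  mdp_check(P, R)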
== Policy Optimization ==

MDP solvers for infinite-horizon problems, where 0.9 is the discount factor:

  mdp_LP(P, R, 0.9)
  mdp_policy_iteration(P, R, 0.9)
  mdp_policy_iteration_modified(P, R, 0.9)
  mdp_Q_learning(P, R, 0.9)
  mdp_value_iteration(P, R, 0.9)
  mdp_value_iterationGS(P, R, 0.9)

== Policy Evaluation ==

Evaluation of a given policy for infinite-horizon problems:

  mdp_eval_policy_iterative(P, R, 0.99, policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
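As a closing illustration of how the solver output might be used (a sketch, not part of the original example): mdp_policy_iteration returns a list whose components include the optimal value function V and the optimal policy policy, a vector of action indices with one entry per state. The action names attached to P above can translate those indices back into decisions.

  # A sketch: solve the extended problem with policy iteration and read off
  # the optimal decision for each state.
  res <- mdp_policy_iteration(P, R, 0.9)
  res$policy                    # action index chosen in each state
  res$V                         # expected total discounted reward per state
  dimnames(P)[[3]][res$policy]  # corresponding action names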