==== Markov Decision Processes Toolbox ====

[[http://www7.inra.fr/mia/T/MDPtoolbox/|Markov Decision Processes toolbox]] provides packages for [[http://www.mathworks.com/|MATLAB]], [[http://www.gnu.org/software/octave/|GNU Octave]], [[https://www.scilab.org|Scilab]], and [[http://www.r-project.org/|R]].

== Installation ==

MDPtoolbox can be installed with the following command. When a dialog or message asks you to choose a CRAN mirror, pick any site near you.

  install.packages(c("MDPtoolbox"), dependencies = TRUE)

You may need to set a proxy before running the command above.

  Sys.setenv("http_proxy"="http://130.153.8.66:8080/")

== Preparation ==

Run the following command every time you start R and want to use this package; it loads the MDPtoolbox package into the session.

  library(MDPtoolbox)

=== Example ===

A textbook case from Hillier and Lieberman (2005).

== Case ==

There are 4 states.

|State|Condition|
|0|Good as new|
|1|Operable --- minor deterioration|
|2|Operable --- major deterioration|
|3|Inoperable --- output of unacceptable quality|

There are 3 actions.

|Decision|Action|Relevant States|
|1|Do nothing|0, 1, 2|
|2|Overhaul (return system to state 1)|2|
|3|Replace (return system to state 0)|1, 2, 3|

There are 3 transition matrices, each of size 4 by 4.

Theoretical transition probability matrix for the "Do nothing" action:

P^{(1)}=\left(\begin{array}{cccc} 0 & 7/8 & 1/16 & 1/16 \\ 0 & 3/4 & 1/8 & 1/8 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1 \end{array}\right)

Theoretical transition probability matrix for the "Overhaul" action (rows for states where the action is not available are left as zeros):

P^{(2)}=\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)

Theoretical transition probability matrix for the "Replace" action:

P^{(3)}=\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right)

Cost table (cost = reward * -1):

|State/Action|1|2|3|
|0|0| | |
|1|1000| |6000|
|2|3000|4000|6000|
|3| | |6000|

Reward table:

|State/Action|1|2|3|
|0|0| | |
|1|-1000| |-6000|
|2|-3000|-4000|-6000|
|3| | |-6000|

== Extension for practice ==

The solvers require every action to be defined in every state, so the "Overhaul" matrix and the reward table are extended for practice. Extended transition probability matrix (the last row is made consistent with the R definition below, so that every row sums to 1):

P^{(2)}=\left(\begin{array}{cccc} 0 & 7/8 & 1/16 & 1/16 \\ 0 & 3/4 & 1/8 & 1/8 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right)

Extended reward table:

|State/Action|1|2|3|
|0|0|-4000|-6000|
|1|-1000|-4000|-6000|
|2|-3000|-4000|-6000|
|3|-8000|-8000|-6000|

== Setting for R ==

Configuration:

  n.state <- 4
  n.action <- 3
  P <- array(0, c(n.state, n.state, n.action))
  R <- array(0, c(n.state, n.action))

Definition of the set of transition probability matrices:

  P[,,1] <- matrix(c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 0,   1/2,  1/2,
    0, 0,   0,    1),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  P[,,2] <- matrix(c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 1,   0,    0,
    0, 0,   0,    1),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  P[,,3] <- matrix(c(
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0),
    nrow=n.state, ncol=n.state, byrow=TRUE)
  dimnames(P)[[1]] <- c("new", "minor det", "major det", "inoperable")
  dimnames(P)[[2]] <- c("new", "minor det", "major det", "inoperable")
  dimnames(P)[[3]] <- c("do nothing", "overhaul", "replace")
  P

Definition of the reward matrix:

  R <- array(0, c(n.state, n.action))
  R[, 1] <- -c(0, 1000, 3000, 8000)
  R[, 2] <- -c(4000, 4000, 4000, 8000)
  R[, 3] <- -c(6000, 6000, 6000, 6000)
  rownames(R) <- c("new", "minor det", "major det", "inoperable")
  colnames(R) <- c("do nothing", "overhaul", "replace")
  R
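Before running any solver, it can help to confirm that the model is well formed. The short sketch below is an addition for illustration, not part of the textbook example: it uses base R to check that every row of each transition matrix sums to 1, and then calls the toolbox's mdp_check function, which is expected to return an empty string when P and R are consistent.

  # Every row of each transition matrix should sum to 1 (stochastic matrices).
  apply(P, 3, rowSums)

  # Toolbox consistency check: an empty string means no problem was detected,
  # otherwise an error message describing the inconsistency is returned.
  mdp_check(P, R)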
== Policy Optimization ==

MDP solvers for infinite-horizon problems, where 0.9 is the discount factor:

  mdp_LP(P, R, 0.9)
  mdp_policy_iteration(P, R, 0.9)
  mdp_policy_iteration_modified(P, R, 0.9)
  mdp_Q_learning(P, R, 0.9)
  mdp_value_iteration(P, R, 0.9)
  mdp_value_iterationGS(P, R, 0.9)

== Policy Evaluation ==

Evaluation of a given policy for infinite-horizon problems:

  mdp_eval_policy_iterative(P, R, 0.99, policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
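As a closing illustration of how the solver output might be used (a sketch, not part of the original example): mdp_policy_iteration returns a list whose components include the optimal value function V and the optimal policy policy, a vector of action indices with one entry per state. The action names attached to P above can translate those indices back into decisions.

  # A sketch: solve the extended problem with policy iteration and read off
  # the optimal decision for each state.
  res <- mdp_policy_iteration(P, R, 0.9)
  res$policy                    # action index chosen in each state
  res$V                         # expected total discounted reward per state
  dimnames(P)[[3]][res$policy]  # corresponding action names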