==== Markov Decision Processes Toolbox ====
[[http://www7.inra.fr/mia/T/MDPtoolbox/|Markov Decision Processes toolbox]] provides packages for [[http://www.mathworks.com/|MATLAB]], [[http://www.gnu.org/software/octave/|GNU Octave]], [[https://www.scilab.org|Scilab]], and [[http://www.r-project.org/|R]].
== Installation ==
MDPtoolbox can be installed with the following command. When a dialog or message asks you to choose a CRAN mirror, pick any mirror near you.
install.packages(c("MDPtoolbox"), dependencies = TRUE)
You may need to set a proxy before running the command above, for example:
Sys.setenv("http_proxy"="http://130.153.8.66:8080/")
== Preparation ==
You need to run the following line every time you start R in order to use this package.
library(MDPtoolbox)
This line loads the MDPtoolbox package into the current R session.
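If loading fails, the following checks (not part of the original recipe, base R only) help confirm whether the package is actually installed:
"MDPtoolbox" %in% rownames(installed.packages())  # TRUE when the package is installed
packageVersion("MDPtoolbox")                      # prints the installed version, errors if the package is missing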
=== Example ===
A textbook case from Hillier and Lieberman (2005).
== Case ==
The system has 4 states.
|State|Condition|
|0|Good as new|
|1|Operable --- minor deterioration|
|2|Operable --- major deterioration|
|3|Inoperable --- output of unacceptable quality|
There are 3 possible actions (decisions).
|Decision|Action|Relevant States|
|1|Do nothing|0, 1, 2|
|2|Overhaul (return system to state 1)|2|
|3|Replace (return system to state 0)|1, 2, 3|
There are 3 transition matrices, one per action, each of size 4 by 4.
Theoretical transition probability matrix for the "Do nothing" action:
P^{(1)}=\left(\begin{array}{cccc}
0 & 7/8 & 1/16 & 1/16 \\
0 & 3/4 & 1/8 & 1/8 \\
0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1
\end{array}\right)
Theoretical transition probability matrix for the "Overhaul" action (rows of zeros mark states where this action is not applicable):
P^{(2)}=\left(\begin{array}{cccc}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0
\end{array}\right)
Theoretical transition probability matrix for the "Replace" action:
P^{(3)}=\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0
\end{array}\right)
Cost table (cost = -1 * reward); blank cells mark state/action pairs where the action is not relevant.
|State/Action|1|2|3|
|0|0| | |
|1|1000| |6000|
|2|3000|4000|6000|
|3| | |6000|
Reward table
|State/Action|1|2|3|
|0|0| | |
|1|-1000| |-6000|
|2|-3000|-4000|-6000|
|3| | |-6000|
== Extension for practice ==
Extended transition probability matrix for the "Overhaul" action, now defined for every state:
P^{(2)}=\left(\begin{array}{cccc}
0 & 7/8 & 1/16 & 1/16 \\
0 & 3/4 & 1/8 & 1/8 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{array}\right)
Extended reward table, with every action defined in every state:
|State/Action|1|2|3|
|0|0|-4000|-6000|
|1|-1000|-4000|-6000|
|2|-3000|-4000|-6000|
|3|-8000|-8000|-6000|
== Setup in R ==
Configuration of the problem dimensions:
n.state <- 4
n.action <- 3
P <- array(0, c(n.state, n.state, n.action))
R <- array(0, c(n.state, n.action))
Definition of the set of transition probability matrices
P[,,1] <- matrix(
c(
0, 7/8, 1/16, 1/16,
0, 3/4, 1/8, 1/8,
0, 0, 1/2, 1/2,
0, 0, 0, 1),
nrow=n.state, ncol=n.state, byrow=TRUE)
P[,,2] <- matrix(c(
0, 7/8, 1/16, 1/16,
0, 3/4, 1/8, 1/8,
0, 1, 0, 0,
0, 0, 0, 1),
nrow=n.state, ncol=n.state, byrow=TRUE)
P[,,3] <- matrix(c(
1, 0, 0, 0,
1, 0, 0, 0,
1, 0, 0, 0,
1, 0, 0, 0),
nrow=n.state, ncol=n.state, byrow=TRUE)
# State names label the rows and columns; action names label the third dimension.
dimnames(P) <- list(
  c("new", "minor det", "major det", "inoperable"),
  c("new", "minor det", "major det", "inoperable"),
  c("do nothing", "overhaul", "replace"))
P
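As an optional sanity check (not part of the original recipe, base R only), every row of each transition matrix should sum to 1:
apply(P, 3, rowSums)  # 4 x 3 matrix of row sums; every entry should be 1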
Definition of a reward matrix
R <- array(0, c(n.state, n.action))
R[, 1] <- -c(0, 1000, 3000, 8000)
R[, 2] <- -c(4000, 4000, 4000, 8000)
R[, 3] <- -c(6000, 6000, 6000, 6000)
rownames(R) <- c("new","minor det","major det", "inoperable")
colnames(R) <- c("do nothing", "overhaul", "replace")
R
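MDPtoolbox also provides mdp_check() for validating the problem description; as documented, it should return an empty string when P and R are consistent:
mdp_check(P, R)  # "" expected; a non-empty string describes the problem found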
== Policy Optimization ==
MDP solvers for infinite-horizon problems (any one of the following can be used):
mdp_LP(P, R, 0.9)
mdp_policy_iteration(P, R, 0.9)
mdp_policy_iteration_modified(P, R, 0.9)
mdp_Q_learning(P, R, 0.9)
mdp_value_iteration(P, R, 0.9)
mdp_value_iterationGS(P, R, 0.9)
Here 0.9 is the discount factor.
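Each solver returns a list from which the optimal policy and its value function can be read. The sketch below assumes the $policy and $V components documented for the package and maps the numeric policy back to action names:
res <- mdp_policy_iteration(P, R, 0.9)
res$policy                                           # optimal action index for each state
c("do nothing", "overhaul", "replace")[res$policy]   # the same policy, by action name
res$V                                                # expected discounted reward per state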
== Policy Evaluation ==
Evaluation of a given policy for infinite-horizon problems:
mdp_eval_policy_iterative(P, R, 0.99,
policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
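The call above should return the value function of the supplied policy, so two candidate policies can be compared state by state. The alternative policy below (replace whenever the machine is not as good as new) is only an illustration, not part of the textbook case:
V.textbook <- mdp_eval_policy_iterative(P, R, 0.99,
  policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
V.alternative <- mdp_eval_policy_iterative(P, R, 0.99,
  policy=c(1,3,3,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
cbind(V.textbook, V.alternative)  # larger (less negative) values indicate the better policy in that state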