The Markov Decision Processes (MDP) Toolbox provides packages for MATLAB, GNU Octave, Scilab, and R.
The MDPtoolbox package can be installed with the following code. When a dialog or message asks you to choose a CRAN mirror, pick any mirror near you.
install.packages("MDPtoolbox", dependencies = TRUE)
You may need to set a proxy before running the code above, for example:
Sys.setenv("http_proxy"="http://130.153.8.66:8080/")
You need to run the following code every time you open R to use this package.
library(MDPtoolbox)
This command loads the MDPtoolbox package into the current R session.
The running example is a textbook case from Hillier and Lieberman (2005): a production system that deteriorates over time and can be overhauled or replaced.
4 states.
State | Condition |
0 | Good as new |
1 | Operable — minor deterioration |
2 | Operable — major deterioration |
3 | Inoperable — output of unacceptable quality |
3 actions.
Decision | Action | Relevant States |
1 | Do nothing | 0, 1, 2 |
2 | Overhaul (return system to state 1) | 2 |
3 | Replace (return system to state 0) | 1, 2, 3 |
3 transition matrices with size 4 by 4.
The theoretical transition probabilities for the “Do nothing”, “Overhaul”, and “Replace” actions are defined only for the states in which each action is relevant.
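As a minimal sketch in R, using the textbook values that also appear in the P array defined later (the object names P.do.nothing, P.overhaul, and P.replace are chosen here only for illustration; columns correspond to the next state 0, 1, 2, 3):
# "Do nothing" is relevant in states 0, 1, 2
P.do.nothing <- matrix(c(0, 7/8, 1/16, 1/16,
                         0, 3/4, 1/8,  1/8,
                         0, 0,   1/2,  1/2),
                       nrow = 3, byrow = TRUE)
# "Overhaul" is relevant only in state 2 and returns the system to state 1
P.overhaul <- matrix(c(0, 1, 0, 0), nrow = 1, byrow = TRUE)
# "Replace" is relevant in states 1, 2, 3 and returns the system to state 0
P.replace <- matrix(rep(c(1, 0, 0, 0), 3), nrow = 3, byrow = TRUE)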
Cost (= Reward * -1) table; blank cells mark actions that are not relevant in that state:
State/Action | 1 | 2 | 3 |
0 | 0 | | |
1 | 1000 | | 6000 |
2 | 3000 | 4000 | 6000 |
3 | | | 6000 |
Reward table:
State/Action | 1 | 2 | 3 |
0 | 0 | | |
1 | -1000 | | -6000 |
2 | -3000 | -4000 | -6000 |
3 | | | -6000 |
Extended transition probability matrices: the MDPtoolbox solvers expect every action to be defined in every state, so the transition probabilities and rewards are extended to cover the state/action pairs that are not relevant. The extended matrices are the P[,,1], P[,,2], and P[,,3] slices defined in the R code below.
Extended reward table
State/Action | 1 | 2 | 3 |
0 | 0 | -4000 | -6000 |
1 | -1000 | -4000 | -6000 |
2 | -3000 | -4000 | -6000 |
3 | -8000 | -8000 | -6000 |
Configuration
n.state <- 4                                  # number of states
n.action <- 3                                 # number of actions
P <- array(0, c(n.state, n.state, n.action))  # transition probabilities P[from state, to state, action]
R <- array(0, c(n.state, n.action))           # rewards R[state, action]
Definition of the set of transition probability matrices
# P[,,1]: "do nothing"
P[,,1] <- matrix(c(0, 7/8, 1/16, 1/16,
                   0, 3/4, 1/8,  1/8,
                   0, 0,   1/2,  1/2,
                   0, 0,   0,    1),
                 nrow=n.state, ncol=n.state, byrow=TRUE)
# P[,,2]: "overhaul"
P[,,2] <- matrix(c(0, 7/8, 1/16, 1/16,
                   0, 3/4, 1/8,  1/8,
                   0, 1,   0,    0,
                   0, 0,   0,    1),
                 nrow=n.state, ncol=n.state, byrow=TRUE)
# P[,,3]: "replace"
P[,,3] <- matrix(c(1, 0, 0, 0,
                   1, 0, 0, 0,
                   1, 0, 0, 0,
                   1, 0, 0, 0),
                 nrow=n.state, ncol=n.state, byrow=TRUE)
# Name the dimensions: rows = current state, columns = next state, slices = action
dimnames(P)[[1]] <- c("new", "minor det", "major det", "inoperable")
dimnames(P)[[2]] <- c("new", "minor det", "major det", "inoperable")
dimnames(P)[[3]] <- c("do nothing", "overhaul", "replace")
P
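Because the dimensions are named, individual transition rows can be inspected by name, for example:
P["major det", , "overhaul"]   # next-state distribution after overhauling a machine with major deterioration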
Definition of a reward matrix
# Rewards are the negated costs from the extended reward table
R <- array(0, c(n.state, n.action))
R[, 1] <- -c(0, 1000, 3000, 8000)       # do nothing
R[, 2] <- -c(4000, 4000, 4000, 8000)    # overhaul
R[, 3] <- -c(6000, 6000, 6000, 6000)    # replace
rownames(R) <- c("new", "minor det", "major det", "inoperable")
colnames(R) <- c("do nothing", "overhaul", "replace")
R
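Before calling a solver it is worth confirming that P and R describe a valid MDP. A minimal sketch, combining a plain row-sum check with the mdp_check() utility from MDPtoolbox:
apply(P, 3, rowSums)   # each row of every transition matrix should sum to 1
mdp_check(P, R)        # an empty string means no error was detected in P and R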
MDP solvers for infinite horizon problems:
mdp_LP(P, R, 0.9)                         # linear programming
mdp_policy_iteration(P, R, 0.9)           # policy iteration
mdp_policy_iteration_modified(P, R, 0.9)  # modified policy iteration
mdp_Q_learning(P, R, 0.9)                 # Q-learning (simulation based)
mdp_value_iteration(P, R, 0.9)            # value iteration
mdp_value_iterationGS(P, R, 0.9)          # value iteration with Gauss-Seidel updates
The third argument, 0.9, is the discount factor.
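Each solver returns its solution, typically as a list that contains the value function V and the policy. A minimal sketch of running one solver and inspecting the result:
sol <- mdp_policy_iteration(P, R, 0.9)
names(sol)    # components of the returned list
sol$policy    # recommended action in each state (1 = do nothing, 2 = overhaul, 3 = replace)
sol$V         # expected total discounted reward starting from each state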
Evaluation of a policy for infinite horizon problems
mdp_eval_policy_iterative(P, R, 0.99, policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
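In this call, policy=c(1,1,2,3) means do nothing in states 0 and 1, overhaul in state 2, and replace in state 3, and c(0,0,0,0) is the initial guess for the value function. The same function can be used to compare candidate policies; in the sketch below, an illustrative alternative policy that replaces the machine in every deteriorated state is evaluated alongside the original one:
V.keep <- mdp_eval_policy_iterative(P, R, 0.99, policy=c(1,1,2,3), c(0,0,0,0),
                                    epsilon=0.0001, max_iter=4000)
V.alt  <- mdp_eval_policy_iterative(P, R, 0.99, policy=c(1,3,3,3), c(0,0,0,0),
                                    epsilon=0.0001, max_iter=4000)
V.keep   # value of each state under the original policy
V.alt    # value of each state under the alternative policy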