Markov Decision Processes Toolbox

Installation

MDPtoolbox can be installed with the following code. When a dialog or message asks you to choose a CRAN mirror, pick any mirror near you.

install.packages(c("MDPtoolbox"), dependencies = TRUE)

You may need to set a proxy before running the code above, for example:

Sys.setenv("http_proxy"="http://130.153.8.66:8080/")
Preparation

You need to run the following code every time you open R to use this package.

library(MDPtoolbox)

This code loads the MDPtoolbox package into the current R session.

Example

A textbook case from Hillier and Lieberman (2005).

Case

4 states.

State   Condition
0       Good as new
1       Operable — minor deterioration
2       Operable — major deterioration
3       Inoperable — output of unacceptable quality

3 actions.

Decision   Action                                Relevant states
1          Do nothing                            0, 1, 2
2          Overhaul (return system to state 1)   2
3          Replace (return system to state 0)    1, 2, 3

3 transition matrices, one for each action, each of size 4 by 4.

Theoretical transition probability for “Do nothing” action:
P^{(1)}=\left(\begin{array}{cccc}
    0 & 7/8 & 1/16 & 1/16 \\
    0 & 3/4 & 1/8  & 1/8  \\
    0 & 0   & 1/2  & 1/2  \\
    0 & 0   & 0    & 1
  \end{array}\right)

Theoretical transition probability for “Overhaul” action: 
P^{(2)}=\left(\begin{array}{cccc}
    0 & 0   & 0    & 0    \\
    0 & 0   & 0    & 0   \\
    0 & 1   & 0    & 0    \\
    0 & 0   & 0    & 0
  \end{array}\right)

Theoretical transition probability for “Replace” action: 
P^{(3)}=\left(\begin{array}{cccc}
    1 & 0   & 0    & 0    \\
    1 & 0   & 0    & 0   \\
    1 & 0   & 0    & 0    \\
    1 & 0   & 0    & 0
  \end{array}\right)
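The “Overhaul” action is only relevant to state 2, so the theoretical P^{(2)} above is not a complete stochastic matrix: the rows for states 0, 1 and 3 sum to 0 rather than 1. MDPtoolbox expects every action's transition matrix to be row-stochastic, which is why the “Extension for practice” section below fills in those rows. A minimal check (P2.theoretical is just an illustrative name for the matrix above typed into R):

P2.theoretical <- matrix(c(
    0, 0, 0, 0,
    0, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 0, 0),
  nrow=4, ncol=4, byrow=TRUE)
rowSums(P2.theoretical)   # 0 0 1 0 : only the row for state 2 sums to 1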

Cost (= Reward * -1) table

State / Action     1       2       3
0                  0       -       -
1                  1000    -       6000
2                  3000    4000    6000
3                  -       -       6000

(a dash marks an action that is not available in that state)

Reward table

State / Action     1       2       3
0                  0       -       -
1                  -1000   -       -6000
2                  -3000   -4000   -6000
3                  -       -       -6000
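MDPtoolbox works with rewards to be maximized, so the textbook costs are entered with their signs flipped, exactly as in the reward table above. A small sketch of this sign convention (cost.state2 is an illustrative name; the values come from the cost table for state 2):

cost.state2   <- c(3000, 4000, 6000)   # costs of actions 1..3 in state 2
reward.state2 <- -cost.state2          # rewards as used by the toolbox
reward.state2                          # -3000 -4000 -6000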
Extension for practice

Extended transition probability matrix for the “Overhaul” action, filled in so that the action is defined in every state (states 0 and 1 behave as under “Do nothing”, and an inoperable system stays in state 3):

P^{(2)}=\left(\begin{array}{cccc}
    0 & 7/8 & 1/16 & 1/16 \\
    0 & 3/4 & 1/8  & 1/8  \\
    0 & 1   & 0    & 0    \\
    0 & 0   & 0    & 1
  \end{array}\right)

Extended reward table

State / Action     1       2       3
0                  0       -4000   -6000
1                  -1000   -4000   -6000
2                  -3000   -4000   -6000
3                  -8000   -8000   -6000
Setting for R

Configuration

n.state  <- 4   # number of states (0: new, 1: minor det, 2: major det, 3: inoperable)
n.action <- 3   # number of actions (1: do nothing, 2: overhaul, 3: replace)
P <- array(0, c(n.state, n.state, n.action))   # transition probability array
R <- array(0, c(n.state, n.action))            # reward matrix

Definition of the set of transition probability matrices

P[,,1] <- matrix(  # action 1: do nothing
  c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 0,   1/2,  1/2,
    0, 0,   0,    1),
  nrow=n.state, ncol=n.state, byrow=TRUE)
P[,,2] <- matrix(  # action 2: overhaul (extended)
  c(
    0, 7/8, 1/16, 1/16,
    0, 3/4, 1/8,  1/8,
    0, 1,   0,    0,
    0, 0,   0,    1),
  nrow=n.state, ncol=n.state, byrow=TRUE)
P[,,3] <- matrix(  # action 3: replace
  c(
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0,
    1, 0, 0, 0),
  nrow=n.state, ncol=n.state, byrow=TRUE)
dimnames(P)[[1]] <- c("new","minor det","major det", "inoperable")
dimnames(P)[[2]] <- c("new","minor det","major det", "inoperable")
dimnames(P)[[3]] <- c("do nothing", "overhaul", "replace")
P

Definition of a reward matrix

R <- array(0, c(n.state, n.action))
R[, 1] <- -c(0, 1000, 3000, 8000)     # do nothing
R[, 2] <- -c(4000, 4000, 4000, 8000)  # overhaul
R[, 3] <- -c(6000, 6000, 6000, 6000)  # replace
rownames(R) <- c("new","minor det","major det", "inoperable")
colnames(R) <- c("do nothing", "overhaul", "replace")
R
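Before running a solver it is worth validating the inputs. MDPtoolbox provides mdp_check(P, R), which (as I read the package documentation) returns an empty character string when the arrays are consistent and an error message otherwise. A minimal sketch:

# Check that P and R have matching dimensions and that each transition
# matrix is row-stochastic; an empty string means no problem was found.
mdp_check(P, R)

# The row sums of every transition matrix should all equal 1.
apply(P, 3, rowSums)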
Policy Optimization

MDP solvers for infinite horizon problems:

mdp_LP(P, R, 0.9)                         # linear programming
mdp_policy_iteration(P, R, 0.9)           # policy iteration
mdp_policy_iteration_modified(P, R, 0.9)  # modified policy iteration
mdp_Q_learning(P, R, 0.9)                 # Q-learning (simulation based)
mdp_value_iteration(P, R, 0.9)            # value iteration
mdp_value_iterationGS(P, R, 0.9)          # value iteration, Gauss-Seidel variant

Here, 0.9 is the discount factor.
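Each solver returns its result as a list; mdp_policy_iteration, for example, returns the optimal value function, the optimal policy and the number of iterations (components V, policy and iter in the package documentation). A short sketch of running one solver and reading its result:

# Solve the extended problem with policy iteration at discount factor 0.9.
result <- mdp_policy_iteration(P, R, 0.9)
result$policy   # optimal action per state (1 = do nothing, 2 = overhaul, 3 = replace)
result$V        # expected total discounted reward of each state under that policy
result$iter     # number of policy improvement steps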

Policy Evaluation

Evaluation of a policy for infinite horizon problems

# policy=c(1,1,2,3) is the policy to evaluate; c(0,0,0,0) is the initial value vector.
mdp_eval_policy_iterative(P, R, 0.99,
  policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)
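mdp_eval_policy_iterative returns the value of each state under the supplied policy, which makes it possible to compare a hand-picked policy with the one found by a solver. A sketch under that assumption (the object names are illustrative):

# Value of the candidate policy (do nothing, do nothing, overhaul, replace).
V.candidate <- mdp_eval_policy_iterative(P, R, 0.99,
  policy=c(1,1,2,3), c(0,0,0,0), epsilon=0.0001, max_iter=4000)

# Value of the policy found by policy iteration, evaluated with the same settings.
opt <- mdp_policy_iteration(P, R, 0.99)
V.optimal <- mdp_eval_policy_iterative(P, R, 0.99,
  policy=opt$policy, c(0,0,0,0), epsilon=0.0001, max_iter=4000)

V.candidate - V.optimal   # no entry should be positive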