MIE1615: Markov Decision Processes
Assignment 1
Question 1: A driver is looking for a parking space on the way to his destination. Each
parking space is free with probability p independently of whether other parking spaces are
free or not. The driver cannot observe whether a parking space is free until he reaches
it. If he parks s spaces from the destination, he incurs cost s, s = 0, 1, · · · . If he passes
the destination without having parked, the cost is D. Formulate the parking problem as a
dynamic programming problem.
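One way to see the structure of this problem is a short numerical sketch (not the requested formulation itself). Assume the driver starts S spaces from the destination and let J(s) denote the expected cost-to-go upon arriving at space s, before observing whether it is free; the starting distance S and the helper name `parking_values` are illustrative choices, not part of the question.

```python
def parking_values(S, p, D):
    """Expected cost-to-go J[s] for the parking problem (illustrative sketch).

    Space s is free with probability p; parking there costs s; passing the
    destination unparked costs D. J[s] assumes the optimal policy from s on.
    """
    J = [0.0] * (S + 1)
    # At the destination (s = 0): park for free if open, otherwise pay D.
    J[0] = p * 0 + (1 - p) * D
    for s in range(1, S + 1):
        # If space s is free (prob p): choose min(park now = s, keep driving).
        # If occupied (prob 1 - p): forced to continue to space s - 1.
        J[s] = p * min(s, J[s - 1]) + (1 - p) * J[s - 1]
    return J
```

Computing J from s = 0 upward reflects that the driver approaches the destination, so the "future" of space s is space s - 1.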
Question 2: Requisitions arrive daily at a data processing center. With probability
Pj, j ≥ 0, there will be j requisitions in a day. Right after the daily requisitions have
arrived, a decision has to be made as to whether or not to process. The cost of processing
any number of requisitions is K (i.e. if we decide to process, then all of the requisitions still
awaiting processing are simultaneously processed at a fixed cost K), and the processing time
is negligible. A penalty cost of c per day per requisition is incurred while requisitions await
processing. All requisitions must be processed by time N , and the objective is to minimize
the expected sum of costs.
1. Write the optimality equation.
2. Show that the optimal policy has the following form: it processes at time n if and only
if the number of requisitions accumulated just after time n is greater than or equal to
some value sn. Show how to determine these critical numbers s1, · · · , sN .
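The threshold structure in part 2 can be explored numerically. The sketch below runs backward induction on the backlog x just after arrivals; the state cap `max_x` and the helper name `batch_thresholds` are assumptions made for the sketch, and V_N(x) = K for x > 0 encodes the requirement that everything is processed by time N.

```python
def batch_thresholds(N, K, c, pmf, max_x=60):
    """Backward induction for the batch-processing problem (illustrative sketch).

    pmf maps a daily arrival count j to its probability P_j.  Returns (V, s):
    V[n][x] is the min expected cost from day n with backlog x just after
    arrivals, and s[n] is the critical number s_n (process iff x >= s[n]).
    """
    V = [[0.0] * (max_x + 1) for _ in range(N + 1)]
    for x in range(1, max_x + 1):
        V[N][x] = K                      # deadline: any backlog must be processed
    s = {N: 1}
    for n in range(N - 1, 0, -1):
        # Cost of processing now: pay K, backlog resets to tomorrow's arrivals.
        proc = K + sum(p * V[n + 1][min(j, max_x)] for j, p in pmf.items())
        s[n] = max_x + 1                 # sentinel: "never process"
        for x in range(max_x + 1):
            # Cost of waiting: pay c per pending requisition, backlog grows.
            wait = c * x + sum(p * V[n + 1][min(x + j, max_x)]
                               for j, p in pmf.items())
            V[n][x] = min(proc, wait)
            if proc <= wait and x > 0 and s[n] > max_x:
                s[n] = x                 # smallest backlog where processing wins
    return V, s
```

The threshold exists because the processing cost `proc` does not depend on x, while the waiting cost is increasing in x; this is exactly the monotonicity the proof in part 2 exploits.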
Question 3: You have invested I0 in a stock market and the value of your investment
evolves as follows:
It = λIt−1 + ϵt, t = 1, 2, . . . , T
where ϵt is independently and identically distributed discrete random noise and λ ∈ (0, 1).
You can sell your assets at any time and start earning interest at a fixed rate r until T .
1. Formulate the problem as a discrete-time Markov decision process in which the objective
is to maximize your wealth at time T .
2. Show that there exists a point in time τ such that it is optimal to sell for t ≥ τ , and
optimal to hold for t < τ .
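A numerical sketch of the sell-or-hold recursion V_t(I) = max{ I(1+r)^(T−t), E[V_{t+1}(λI + ϵ)] } can be obtained by backward induction on a grid of investment values. The grid, the linear interpolation of the continuation value, and the function name `sell_or_hold_values` are all approximation choices made for illustration, not part of the problem.

```python
import bisect

def sell_or_hold_values(T, lam, r, eps_pmf, grid):
    """Grid-based backward induction for the sell-or-hold problem (sketch).

    eps_pmf maps a discrete noise value to its probability.  Selling at t
    yields I * (1+r)**(T-t); holding transitions to lam * I + eps.
    """
    def interp(vals, x):
        # Piecewise-linear lookup of vals on grid, clamped at the ends
        # (an approximation: off-grid continuation values are interpolated).
        if x <= grid[0]:
            return vals[0]
        if x >= grid[-1]:
            return vals[-1]
        i = bisect.bisect_left(grid, x)
        w = (x - grid[i - 1]) / (grid[i] - grid[i - 1])
        return (1 - w) * vals[i - 1] + w * vals[i]

    V = list(grid)                       # at t = T you simply collect I_T
    for t in range(T - 1, -1, -1):
        sell = [I * (1 + r) ** (T - t) for I in grid]
        hold = [sum(p * interp(V, lam * I + e) for e, p in eps_pmf.items())
                for I in grid]
        V = [max(s, h) for s, h in zip(sell, hold)]
    return V
```

With λ < 1 the asset shrinks in expectation while sold money compounds at rate r, which is the intuition behind the cutoff time τ in part 2.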
1
Question 4: An airline has to decide when to bring an aircraft in for a major engine
overhaul. Let st represent the state of the engine in terms of engine wear, and let dt be a
nonnegative amount by which the engine deteriorates during period t. At the beginning of
period t the airline may decide to continue operating the aircraft (zt = 0) or to repair the
aircraft (zt = 1) at a cost of cR, which has the effect of returning the aircraft to st+1 = 0. If the
airline does not repair the aircraft, the cost of operation is co(st), which is a nondecreasing,
convex function in st.
1. Formulate the one-period reward function Ct(st, zt), and show that it is submodular.
2. Show that the decision rule is monotone in st.
Question 5: A machine is observed to be in some state i at the beginning of each of T
periods. Let α denote the one-period discount factor. State 1 corresponds to a new machine,
while state n corresponds to a failed machine. Intermediate states represent gradual stages
of breakdown and inefficiency. At the beginning of each period, there are two possible
decisions: decision "0" corresponds to "doing nothing" and just continuing to operate the
existing machine. Decision "1" corresponds to replacing the machine, which costs CR, and
the machine is replaced "immediately" (i.e. the state becomes 1). The resulting, possibly
unchanged, machine is then operated for the period, incurring a cost of c(i) during that
period as a function of the state. Assume that c(i) increases in i and that the state space
is the set of integers 1, 2, · · · , n. Let pij denote the probability that a machine that starts a
period in state i makes a transition to state j by the end of the period. Your objective is to
minimize the expected costs over the finite time horizon, assuming VT+1(i) = 0 for each i.
1. Assume that in this problem ∑_{j=k}^{n} pij is an increasing function of i, for each
k ≤ n. Interpret this assumption from a managerial perspective.
2. Write the optimality equation.
3. Prove that the optimal value function in each period is increasing in the state (i.e.
Vt(i) is increasing in i, ∀t).
4. Show that the following control policy πt is optimal: for each period t the decision rule
can be characterized by a single critical number at such that if the state of the machine
is above at, then replace the machine; otherwise, do nothing.
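The critical-number structure in part 4 can be illustrated with a backward-induction sketch. For convenience the code indexes states 0..n−1 (state 0 playing the role of the text's state 1, the new machine); the function name and the way the threshold is extracted are implementation choices, not part of the question.

```python
def machine_replacement(T, alpha, CR, c, P):
    """Backward induction for the replacement problem (illustrative sketch).

    c[i] is the one-period operating cost, P[i][j] the transition probability,
    alpha the discount factor; V_{T+1}(i) = 0.  Returns the period-1 values
    and a list of (t, a_t) critical numbers: replace iff state >= a_t.
    """
    n = len(c)
    V_next = [0.0] * n                          # terminal condition V_{T+1} = 0
    policy = []
    for t in range(T, 0, -1):
        # Cost of doing nothing in state i: operate, then transition from i.
        keep = [c[i] + alpha * sum(P[i][j] * V_next[j] for j in range(n))
                for i in range(n)]
        # Cost of replacing: pay CR, machine becomes new, then operate from 0.
        replace = CR + keep[0]
        V = [min(keep[i], replace) for i in range(n)]
        # Critical number a_t: smallest state at which replacing is optimal.
        a_t = next((i for i in range(n) if replace <= keep[i]), n)
        policy.append((t, a_t))
        V_next = V
    return V_next, policy[::-1]
```

Since c(i) increases in i and the stochastic-ordering assumption in part 1 makes `keep[i]` increasing in i while `replace` is constant, the two curves cross at most once, which is exactly the single-crossing argument behind the threshold at.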