WeChat:lovexc60
ECE 517: Reinforcement Learning in Artificial Intelligence (Fall 2020)
Your goal is to find the optimal policy for a driver to park their car at the ice-cream shop as
shown in the diagram below. The shop has M parking spaces, at different distances to the
shop itself.
1. There are M parking spots at different distances to the store (as shown in the diagram).
2. The driver enters the lot from the far side of the store.
3. When the driver passes a parking spot he can decide if to take the spot, to continue
to the next spot, or to wait and see if the spot becomes available.
4. When making a decision the driver is not able to see if the next spots are available
(can only see the current spot).
5. The driver would like to get to the store in minimum time.
6. If the driver is not able to find a spot he must leave and will not get ice cream, which
will be very disappointing.
7. If you attempt to take a spot which is already taken that results in an accident and no
ice cream (very bad).
1
Figure 1: A diagram of the problem.
You are tasked with writing a computer program which will guide the driver’s behavior.
More specifically you are asked to model the problem as an MDP, and find the optimal policy
using DP. You can make the following assumptions:
1. Each parking spot i has a probability pi of being taken. This can be modeled as
p1 = pmin and pM = pmax with the rest of the probabilities evenly spaced between
them.
2. As your goal is to get the ice cream as quickly as possible, there are three different
times needed to be considered: walking from a spot (twalk), waiting for a spot (twait),
and driving form one spot to the next (tdrive) .
3. Assume parking itself does not take any time.
4. The probability of a space staying in the state (taken/free) after each waiting time
(twait) is psame.
5. There should be a way to quantify how bad it is to not get any ice cream (rno).
6. The reward should depend on the amount time until the ice cream store has been
reached.
3 Example Setup
For the report you will be required to show results on this specific setup:
1. 12 parking spots
2. It takes 1 second to drive to the next spot (tdrive)
3. It takes 3 seconds to walk across each spot (twalk)
4. It takes 10 seconds to wait for a spot (twait)
2
5. Not getting ice cream is as bad as waiting 100 seconds (rno)
6. pmin = 0.1, pmax = 0.9, psame = 0.9.