You are welcome to use more advanced algorithms that utilise deep learning, such as DQNs [6] or Proximal Policy Optimisation (PPO), but these are not covered in the unit and lab facilitators may not be able to assist with your implementations. These algorithms will also have to be referenced in your project report.
You are allowed to use existing implementations such as those from Stable Baselines; however, you are required to implement at least one of the algorithms yourself. For example, the following combinations of algorithms would be allowed (this list is not exhaustive):
• Hand-implemented rule-based agent and PPO from Stable Baselines
• Hand-implemented TD(λ) and hand-implemented DQN
but comparing PPO and DQN, both taken from Stable Baselines, would not be allowed.
You are welcome to use utilities and libraries from Stable Baselines in your own implementations; just make note of them in your report. An algorithm from Stable Baselines can be counted as hand-implemented if sufficient fine-tuning, adjustments or optimisations have been made for the Super Mario Bros. environment, but you will have to note these in your report and you may be required to explain them in an interview (see Section 2).
For example, if you were to take the DQN from Stable Baselines, define your own custom policy, add custom image preprocessing and your own internal replay configuration, it could count as hand-implemented; a sketch of this kind of customisation is shown below.
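To make that bar concrete, the following is a minimal sketch of what such a setup might look like, assuming Stable Baselines3 together with the gym-super-mario-bros environment: a custom feature extractor behind the policy network, custom image preprocessing via wrappers, and an explicitly configured replay buffer. The class name MarioCNN and all hyperparameter values are illustrative assumptions, not a required design.

```python
# A minimal sketch, not a definitive implementation. Assumes Stable
# Baselines3 and gym-super-mario-bros; these packages use the older Gym
# step API, so compatible versions of gym, nes-py and stable-baselines3
# may need to be pinned for this to run.
import gym_super_mario_bros
import torch as th
import torch.nn as nn
from gym.wrappers import GrayScaleObservation, ResizeObservation
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace
from stable_baselines3 import DQN
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class MarioCNN(BaseFeaturesExtractor):
    """Hypothetical custom feature extractor backing a custom policy."""

    def __init__(self, observation_space, features_dim=256):
        super().__init__(observation_space, features_dim)
        # Observations are channel-first once SB3 transposes image inputs.
        n_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size from a sample observation.
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations):
        return self.linear(self.cnn(observations))


# Custom image preprocessing: grayscale, then downsample to 84x84.
env = gym_super_mario_bros.make("SuperMarioBros-v0")
env = JoypadSpace(env, SIMPLE_MOVEMENT)
env = GrayScaleObservation(env, keep_dim=True)
env = ResizeObservation(env, shape=(84, 84))

model = DQN(
    "CnnPolicy",
    env,
    policy_kwargs=dict(features_extractor_class=MarioCNN),
    buffer_size=100_000,   # size of DQN's internal replay buffer
    learning_rate=1e-4,    # illustrative value; tune for your setup
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```

Whether any particular set of changes clears the hand-implemented bar is a judgement made at marking, so document every modification in your report as described above.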
Projects that use two hand-implemented algorithms are likely to score higher on the implementation section of the marking criteria.