batra98/Monte_Carlo-TD-Function_Approximation


Monte-Carlo, TD Methods and Function Approximation

Introduction

In this assignment, we will use Monte-Carlo (MC) Methods and Temporal Difference (TD) Learning on a couple of games and toy problems. The problems are given below:

  1. Train an agent that plays the Tic-Tac-Toe using Monte-Carlo Methods.
  2. Train an agent that generates the optimal policy through TD-Methods in the Frozen-Lake Environment.
  3. Build a Deep Q-Learning Network (DQN) that can play Atari Breakout and achieve the best possible score. I was not able to implement this component of the assignment, so instead I built a DQN that can play the cart-pole game.

Details of the problems are included in the respective folders.
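To give a flavour of the TD approach used for the Frozen-Lake problem, here is a minimal, hypothetical sketch of a tabular Q-learning update with an epsilon-greedy policy. The function names and hyperparameters are illustrative only, not the exact ones used in `Q_Learning.py`:

```python
# Illustrative sketch of tabular Q-learning (TD control), not the repo's exact code.
import random
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, n_actions=4):
    """One TD(0) step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in range(n_actions))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def epsilon_greedy(Q, state, n_actions=4, eps=0.1):
    """Explore with probability eps, otherwise act greedily w.r.t. Q."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

# A single update on an empty table: all Q-values start at 0, so the
# TD target is just the reward, scaled by the learning rate alpha.
Q = defaultdict(float)
q_learning_update(Q, state=0, action=1, reward=1.0, next_state=2, alpha=0.5)
# Q[(0, 1)] is now 0.5 * (1.0 + 0.99 * 0 - 0) = 0.5
```

In an actual training loop these two functions would be called once per environment step; Sarsa and Expected Sarsa (also in `Q_2`) differ only in how the target value for the next state is computed.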

πŸ“ File Structure

.
├── Q_1
│   ├── Mc_OffPolicy_agent.dat
│   ├── Mc_OnPolicy_agent
│   ├── Monte-Carlo_Methods(3).html
│   ├── Monte-Carlo_Methods.ipynb
│   ├── __pycache__
│   ├── base_agent.py
│   ├── best_td_agent.dat
│   ├── gym-tictactoe
│   ├── human_agent.py
│   ├── mc_agents.py
│   └── td_agent.py
├── Q_2
│   ├── Expected_Sarsa.py
│   ├── Frozen_Lake_Through_TD_Methods.html
│   ├── Frozen_Lake_Through_TD_Methods.ipynb
│   ├── Q_Learning.py
│   ├── Sarsa.py
│   ├── __pycache__
│   └── frozen_lake.py
├── Q_3
│   ├── DQN_Agent.py
│   ├── Function_Approximation_DQN.html
│   ├── Function_Approximation_DQN.ipynb
│   ├── __pycache__
│   └── cartpole-dqn.h5
├── README.md
└── assignment.pdf

7 directories, 21 files
  • Q_* - Contains the files for the respective problem along with trained models.
  • assignment.pdf - Contains all the problem statements of the assignment.

Future Work

At the time of doing the assignment, I didn't have sufficient knowledge of deep learning to implement the last part (the Atari Breakout DQN). I would like to complete this part of the assignment now.
