- Introduction
- Q-learning
- Sarsa
- Deep Q Network
- Policy Gradient
- Actor Critic
  - 6.1 What is Actor Critic
  - 6.2 Actor Critic (Tensorflow)
  - 6.3 What is Deep Deterministic Policy Gradient (DDPG)
  - 6.4 Deep Deterministic Policy Gradient (DDPG) (Tensorflow)
  - 6.5 What is Asynchronous Advantage Actor-Critic (A3C)
  - 6.6 Asynchronous Advantage Actor-Critic (A3C) (Tensorflow)
  - 6.7 Distributed Proximal Policy Optimization (DPPO) (Tensorflow)
- Model Based RL