Study summary, December 1
Code:
Deep reinforcement learning course: https://github.com/simoninithomas/Deep_reinforcement_learning_Course/tree/master/PPO with Sonic the Hedgehog
Deep reinforcement learning with PyTorch: https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch, https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
REINFORCE explained in detail: https://blog.csdn.net/lrt366/article/details/91359230
Related code: https://github.com/chingyaoc/pytorch-REINFORCE/blob/master/assets/algo.png
Algorithm blog posts:
DDPG algorithm explained in detail: https://blog.csdn.net/kenneth_yu/article/details/78478356
Policy gradient: https://developer.ibm.com/zh/articles/ba-lo-deep-introduce-policy-gradient/
Difference between on-policy and off-policy: https://www.zhihu.com/question/57159315
TD (temporal-difference) algorithm explained in detail: https://zhuanlan.zhihu.com/p/25913410
DQN algorithm: https://blog.csdn.net/qq_30615903/article/details/80744083, https://zhuanlan.zhihu.com/p/21421729
Fix for "No module named ..." errors: https://github.com/openai/spinningup/issues/60
Courses:
Applied Stochastic Processes: Introduction to Probability Models
Probability in Electrical Engineering and Computer Science: An Application-Driven Course
Convex optimization and stochastic processes; CS285.
Math background: https://www.msra.cn/zh-cn/news/features/book-recommendation-machine-learning-math