https://www.zhihu.com/question/277325426
https://github.com/jinglescode/reinforcement-learning-tic-tac-toe/blob/master/README.md
Intuition
After a long day at work, you are deciding between two choices: head home and write a Medium article, or hang out with friends at a bar. If you hang out with friends, you will feel happy; if you head home to write an article, you will end up feeling tired after a long day at work. In this example, enjoying yourself is a reward and feeling tired is a negative reward, so why write articles?
Because in life, we don’t just think about immediate rewards; we plan a course of action based on the future rewards that may follow. Perhaps writing an article sharpens your understanding of a particular topic, gets you recognised, and ultimately lands you that dream job you’ve always wanted. In this scenario, getting your dream job is a delayed reward that results from a sequence of actions you took, so we want to assign some value to being in the states along the way (for example, “going home and writing an article”). The function that assigns a value to each state is called the “value function”.
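To make the delayed-reward idea concrete, here is a minimal sketch that scores the two evenings by their discounted return, which is one common way to put a number on "value". The reward sequences, the discount factor `gamma`, and the function name `discounted_return` are all made up for illustration; none of them come from the article.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where a reward k steps in the future is scaled by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences (illustrative numbers only).
bar_rewards = [1.0, 0.0, 0.0, 0.0]        # fun tonight, nothing later
article_rewards = [-1.0, 0.0, 0.0, 10.0]  # tired tonight, dream job later

bar_value = discounted_return(bar_rewards)
article_value = discounted_return(article_rewards)

print(f"bar: {bar_value:.2f}, article: {article_value:.2f}")
```

With these assumed numbers, writing the article scores higher even though its immediate reward is negative, because the large delayed reward outweighs the discounting.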
So how do we learn from our past? Let’s say you made some great decisions and are in the best state of your life. Now look back at the various decisions you made to reach this stage: what do you attribute your success to? Which previous states led you to this success? Which past actions led you to the state where you received this reward? How is the action you are taking now related to the reward you may receive in the future?
Reinforcement Learning — Implement TicTacToe
How to use reinforcement learning to play tic-tac-toe
https://github.com/MJeremy2017/reinforcement-learning-implementation/tree/master/TicTacToe
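Read this tic-tac-toe code directly and keep rereading these articles alongside it, to gradually understand what Q-Learning actually is and what each parameter means.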
https://github.com/ZuzooVn/machine-learning-for-software-engineers
https://machinelearningmastery.com/machine-learning-for-programmers/#comment-358985
https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56
What’s ‘Q’?
The ‘q’ in q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.
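As a rough sketch of how that "quality" number is learned, the standard tabular Q-learning update nudges Q(s, a) toward the observed reward plus the discounted best Q-value of the next state. The function name `q_update`, the state and action labels, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best quality reachable from the next state (0.0 if there are no actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
q[("s1", "right")] = 2.0

new_value = q_update(q, "s0", "right", reward=1.0,
                     next_state="s1", next_actions=["left", "right"])
print(round(new_value, 2))
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and with `alpha=0.1` the estimate moves from 0.0 to about 0.28.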
How Does Learning Rate Decay Help Modern Neural Networks?
https://smartlabai.medium.com/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc
RL methods fall into two categories: model-free and model-based.
Q-Learning is a model-free method.
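A small sketch of what "model-free" means in practice: the learner below never reads the environment's transition rule, it only calls `step` and learns from the sampled (state, action, reward, next state) tuples. The 5-cell corridor environment, the function names, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical corridor with states 0..4, reward only at the goal


def step(state, action):
    """Environment dynamics; the learner only ever sees the returned sample."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q, state, rng, epsilon):
    """Epsilon-greedy choice, breaking ties between equal Q-values at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng, epsilon)
            next_state, reward, done = step(state, action)
            # Terminal states have no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
greedy_policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy_policy)
```

A model-based method would instead learn or be given the rule inside `step` and plan with it; here the Q-table is the only thing the agent builds.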
http://incompleteideas.net/book/the-book-2nd.html
http://incompleteideas.net/book/code/code2nd.html