• (6) Value Function Approximation: LSPI code (5)


    This post covers sample.py.

    # -*- coding: utf-8 -*-
    """Contains class representing an LSPI sample."""


    class Sample(object):

        """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.
        # Note: one LSPI sample, expressed as a tuple.
        Parameters  # input parameters
        ----------

        state : numpy.array  # state vector
            State of the environment at the start of the sample.
            ``s`` in the sample tuple.
            (The usual type is a numpy array.)
        action : int  # index of the action that was executed
            Index of action that was executed.
            ``a`` in the sample tuple
        reward : float  # reward received from the environment
            Reward received from the environment.
            ``r`` in the sample tuple
        next_state : numpy.array  # next environment state after taking the sample's action
            State of the environment after executing the sample's action.
            ``s'`` in the sample tuple
            (The type should match that of state.)
        absorb : bool, optional  # True if this sample ended the episode
            True if this sample ended the episode. False otherwise.
            ``absorb`` in the sample tuple
            (The default is False, which implies that this is a
            non-episode-ending sample)


        Assumes that this is a non-absorbing sample (as the vast majority
        of samples will be non-absorbing).
        # Note: by default the sample is assumed not to end the episode.
        # Note: wrapping the fields in a class makes the sample convenient
        # to use from different callers.
        This class is just a dumb data holder so the types of the different
        fields can be anything convenient for the problem domain.

        For states represented by vectors a numpy array works well.

        """

        def __init__(self, state, action, reward, next_state, absorb=False):  # initialization
            """Initialize Sample instance."""
            self.state = state
            self.action = action
            self.reward = reward
            self.next_state = next_state
            self.absorb = absorb

        def __repr__(self):  # called when the object is printed
            """Create string representation of tuple."""
            return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                                   self.action,
                                                   self.reward,
                                                   self.next_state,
                                                   self.absorb)
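
As a rough illustration of how such a sample might be created and consumed, here is a minimal sketch. It repeats a compact stand-in for the class so the snippet runs on its own, and it uses plain tuples for the states to avoid a numpy dependency (the docstring allows any convenient type). The helper `td_target` and its `discount` parameter are hypothetical and are not part of sample.py; they only show one common way the `absorb` flag could be used, namely dropping the bootstrap term when the episode has ended.

```python
class Sample(object):
    """Compact stand-in for the Sample class listed above."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (
            self.state, self.action, self.reward,
            self.next_state, self.absorb)


def td_target(sample, next_value, discount=0.9):
    """Hypothetical helper (not in sample.py): one-step target.

    Returns r + discount * V(s') for ordinary samples and just r for
    absorbing samples, since no further reward follows an episode end.
    """
    if sample.absorb:
        return sample.reward
    return sample.reward + discount * next_value


# An ordinary transition: absorb keeps its default value of False.
step = Sample((0.0, 1.0), 1, -1.0, (1.0, 1.0))
# An episode-ending transition must set absorb explicitly.
last = Sample((1.0, 1.0), 0, 10.0, (1.0, 1.0), absorb=True)

print(step)  # uses __repr__, because no __str__ is defined
print(td_target(step, next_value=5.0))
print(td_target(last, next_value=5.0))
```

Because `absorb` defaults to False, only the episode-ending transitions need to pass it, which matches the docstring's remark that the vast majority of samples are non-absorbing.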
  • Original article: https://www.cnblogs.com/lijiajun/p/5490109.html