• TensorLayer Official Chinese Documentation 1.7.4: API – Reinforcement Learning


    API - Reinforcement Learning

    Reinforcement learning (RL) related functions.

    discount_episode_rewards([rewards, gamma, mode]) Take a 1D float array of rewards and compute the discounted rewards for an episode.
    cross_entropy_reward_loss(logits, actions, ...) Calculate the loss for a Policy Gradient Network.
    log_weight(probs, weights[, name]) Log weight.
    choice_action_by_probs([probs, action_list]) Choose and return an action given an action probability distribution.

    Reward Functions

    tensorlayer.rein.discount_episode_rewards(rewards=[], gamma=0.99, mode=0)[source]

    Take a 1D float array of rewards and compute the discounted rewards for an
    episode. When a non-zero value is encountered, it is treated as the end of an episode.

    Parameters:

    rewards : numpy list

    a list of rewards

    gamma : float

    discount factor

    mode : int

    if mode == 0, reset the discount process when a non-zero reward is encountered (Ping-pong game).
    if mode == 1, do not reset the discount process.

    Examples

    >>> rewards = np.asarray([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1])
    >>> gamma = 0.9
    >>> discount_rewards = tl.rein.discount_episode_rewards(rewards, gamma)
    >>> print(discount_rewards)
    ... [ 0.72899997  0.81        0.89999998  1.          0.72899997  0.81
    ... 0.89999998  1.          0.72899997  0.81        0.89999998  1.        ]
    >>> discount_rewards = tl.rein.discount_episode_rewards(rewards, gamma, mode=1)
    >>> print(discount_rewards)
    ... [ 1.52110755  1.69011939  1.87791049  2.08656716  1.20729685  1.34144104
    ... 1.49048996  1.65610003  0.72899997  0.81        0.89999998  1.        ]
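
    The recurrence behind these numbers can be written down directly. Below is a minimal NumPy sketch of the documented behaviour (an illustration only, not the library's source; the function name is made up for the sketch):

    import numpy as np

    def discount_episode_rewards_sketch(rewards, gamma=0.99, mode=0):
        # Walk backwards through the rewards, accumulating
        # running_add = running_add * gamma + r at each step.
        # In mode 0 the accumulator is reset whenever a non-zero reward is
        # seen, i.e. each non-zero reward ends an episode (Ping-pong style).
        rewards = np.asarray(rewards, dtype=np.float32)
        discounted = np.zeros_like(rewards)
        running_add = 0.0
        for t in reversed(range(len(rewards))):
            if mode == 0 and rewards[t] != 0:
                running_add = 0.0
            running_add = running_add * gamma + rewards[t]
            discounted[t] = running_add
        return discounted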
    

    Loss Functions

    Weighted Cross Entropy

    tensorlayer.rein.cross_entropy_reward_loss(logits, actions, rewards, name=None)[source]

    Calculate the loss for a Policy Gradient Network.

    Parameters:

    logits : tensor

    The network outputs without softmax. This function implements softmax
    inside.

    actions : tensor / placeholder

    The agent actions.

    rewards : tensor / placeholder

    The rewards.

    Examples

    >>> states_batch_pl = tf.placeholder(tf.float32, shape=[None, D])
    >>> network = InputLayer(states_batch_pl, name='input')
    >>> network = DenseLayer(network, n_units=H, act=tf.nn.relu, name='relu1')
    >>> network = DenseLayer(network, n_units=3, name='out')
    >>> probs = network.outputs
    >>> sampling_prob = tf.nn.softmax(probs)
    >>> actions_batch_pl = tf.placeholder(tf.int32, shape=[None])
    >>> discount_rewards_batch_pl = tf.placeholder(tf.float32, shape=[None])
    >>> loss = tl.rein.cross_entropy_reward_loss(probs, actions_batch_pl, discount_rewards_batch_pl)
    >>> train_op = tf.train.RMSPropOptimizer(learning_rate, decay_rate).minimize(loss)
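
    Reading the description above, the loss amounts to the per-step softmax cross-entropy between the raw logits and the chosen actions, weighted by the (discounted) rewards and summed over the batch. A minimal TensorFlow 1.x sketch under that reading (an assumption about the composition, not the library's source):

    import tensorflow as tf

    def cross_entropy_reward_loss_sketch(logits, actions, rewards):
        # Softmax is applied to the raw logits inside this op, matching the
        # note that the network outputs should be passed without softmax.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits)
        # Weight each step's cross entropy by its reward and sum over the batch.
        return tf.reduce_sum(cross_entropy * rewards)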
    

    Log weight

    tensorlayer.rein.log_weight(probs, weights, name='log_weight')[source]

    Log weight.

    Parameters:

    probs : tensor

    If this is a network output, it should usually be scaled to [0, 1] via softmax.

    weights : tensor
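
    The one-line description leaves the formula open; one common reading is a reward-weighted log-likelihood, i.e. the mean of log(probs) multiplied element-wise by the weights. A hedged TensorFlow 1.x sketch of that reading (an assumption, not the library's source):

    import tensorflow as tf

    def log_weight_sketch(probs, weights, name='log_weight'):
        # probs are assumed to already lie in [0, 1] (e.g. a softmax output);
        # weights are typically discounted rewards or advantages.
        with tf.variable_scope(name):
            return tf.reduce_mean(tf.log(probs) * weights)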

    Sampling Functions

    tensorlayer.rein.choice_action_by_probs(probs=[0.5, 0.5], action_list=None)[source]

    Choose and return an action given the action probability distribution.

    Parameters:

    probs : a list of float.

    The probability distribution of all actions.

    action_list : None or a list of actions (integers, strings, or other objects).

    If None, an integer in the range 0 to len(probs)-1 is returned.

    Examples

    >>> for _ in range(5):
    >>>     a = choice_action_by_probs([0.2, 0.4, 0.4])
    >>>     print(a)
    ... 0
    ... 1
    ... 1
    ... 2
    ... 1
    >>> for _ in range(3):
    >>>     a = choice_action_by_probs([0.5, 0.5], ['a', 'b'])
    >>>     print(a)
    ... a
    ... b
    ... b
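
    Behaviourally this sampling is what numpy.random.choice provides; a minimal sketch for reference (illustrative only, not the library's source):

    import numpy as np

    def choice_action_by_probs_sketch(probs=(0.5, 0.5), action_list=None):
        # Without an action_list, return an index in [0, len(probs) - 1];
        # otherwise return the sampled element of action_list.
        if action_list is None:
            return np.random.choice(len(probs), p=probs)
        assert len(action_list) == len(probs), "action_list must match probs in length"
        return np.random.choice(action_list, p=probs)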
    
