• Notes on understanding train.py in the Transformer

    1. Define the batching scheme; the returned dict ret contains a batch_sizes array:
    {'min_length': 8, 'window_size': 720,
    'shuffle_queue_size': 270,
    'boundaries': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94, 103, 113, 124, 136, 149, 163, 179, 196, 215, 236],
    'max_length': 256,
    'batch_sizes': [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90, 90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24, 24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]}
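
    One way to arrive at a dict like the one above is sketched here. It is not necessarily how train.py builds it: the token budget of 2048, the growth factor of 1.1, the factor of 3 and the list of highly composite numbers are assumptions, chosen only because they reproduce the listed values.

```python
def batching_scheme(batch_size=2048, min_length=8, max_length=256, step=1.1):
    """Sketch of a length-bucketed batching scheme.

    batch_size is a token budget per batch and step is the growth factor of
    the bucket boundaries; both defaults are assumptions, not from the notes.
    """
    # Bucket boundaries grow roughly geometrically from min_length.
    boundaries = []
    x = min_length
    while x < max_length:
        boundaries.append(x)
        x = max(x + 1, int(x * step))

    # One batch size per bucket: len(boundaries) + 1 buckets in total.
    raw_sizes = [max(1, batch_size // n) for n in boundaries + [max_length]]

    # Pick a highly composite window size, then round every batch size down
    # to a divisor of it so a window always splits evenly into batches.
    highly_composite = [1, 2, 4, 6, 12, 24, 36, 48, 60, 120, 180, 240,
                        360, 720, 840, 1260, 1680, 2520, 5040]
    window_size = max(n for n in highly_composite if n <= 3 * max(raw_sizes))
    divisors = [d for d in range(1, window_size + 1) if window_size % d == 0]
    batch_sizes = [max(d for d in divisors if d <= size) for size in raw_sizes]

    return {
        "min_length": min_length,
        "max_length": max_length,
        "boundaries": boundaries,
        "batch_sizes": batch_sizes,
        "window_size": window_size,
        "shuffle_queue_size": 3 * (window_size // min(batch_sizes)),
    }


scheme = batching_scheme()
print(scheme["window_size"], scheme["shuffle_queue_size"])  # 720 270
```

    Short sentences get large batches and long sentences get small ones, so every batch holds a similar number of tokens.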
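    2. input_pipeline: read the 10 input files, decode each record with decode_record,
    and assemble a dict-style dataset with the keys "src_id" and "target_id".
    (1) Filter by length: keep only examples whose length lies within [min_length, max_length]:
    def example_valid_size(example, min_length, max_length):
        length = _example_length(example)
        return tf.logical_and(length >= min_length, length <= max_length)
    dataset = dataset.filter(functools.partial(example_valid_size, min_length=batching_scheme["min_length"], max_length=batching_scheme["max_length"]))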
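    (2) Choose a bucket id by length: given a dataset example (keys "src_id" and "target_id") and the boundaries list, compare the sentence length against the lower and upper bound of every bucket:
    conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length), tf.less(seq_length, buckets_max))
    Return the index of the bucket, i.e. the position within boundaries, whose range contains the length.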
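
    The comparison above can be sketched in plain Python. How buckets_min and buckets_max are built (boundaries shifted by one, with open ends) is an assumption; the notes only show the condition itself.

```python
def example_to_bucket_id(seq_length, boundaries):
    """Return the index of the bucket whose range [min, max) holds seq_length."""
    # Assumed construction: bucket i spans [buckets_min[i], buckets_max[i]).
    buckets_min = [float("-inf")] + boundaries
    buckets_max = boundaries + [float("inf")]
    for bucket_id, (low, high) in enumerate(zip(buckets_min, buckets_max)):
        # Same test as logical_and(buckets_min <= seq_length, seq_length < buckets_max).
        if low <= seq_length < high:
            return bucket_id
    raise ValueError("unreachable: the buckets cover every length")


boundaries = [8, 9, 10, 12]
bucket_ids = [example_to_bucket_id(n, boundaries) for n in (5, 8, 9, 11, 12, 100)]
print(bucket_ids)  # [0, 1, 2, 3, 4, 4]
```

    With this construction there are len(boundaries) + 1 buckets, which matches batch_sizes being one entry longer than boundaries.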

    tf.contrib.data.group_by_window
    Defined in tensorflow/contrib/data/python/ops/grouping.py.

    A transformation that groups windows of elements by key and reduces them.

    This transformation maps each consecutive element in a dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.

    You may provide either a constant window_size or a window size determined by the key through window_size_func.

    Args:
    key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
    reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
    window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
    window_size_func: A function mapping a key to a tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size.

    Returns:
    A Dataset transformation function, which can be passed to tf.data.Dataset.apply.

    Raises:
    ValueError: if neither or both of {window_size, window_size_func} are passed.
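
    To make the grouping semantics concrete, here is a pure-Python emulation on a list. It is not the TensorFlow implementation: it only supports a constant window_size, and the order in which leftover windows are flushed is an assumption.

```python
from collections import defaultdict


def group_by_window(elements, key_func, reduce_func, window_size):
    """Emulate the grouping: buffer elements per key and hand each full
    window to reduce_func; partial windows are flushed at the end."""
    pending = defaultdict(list)
    results = []
    for element in elements:
        key = key_func(element)
        pending[key].append(element)
        if len(pending[key]) == window_size:
            # A full window for this key: reduce it and start a new one.
            results.extend(reduce_func(key, pending.pop(key)))
    for key, window in pending.items():
        # Final windows may hold fewer than window_size elements.
        results.extend(reduce_func(key, window))
    return results


windows = group_by_window(
    [1, 2, 3, 4, 5, 6, 7],
    key_func=lambda x: x % 2,
    reduce_func=lambda key, window: [(key, window)],
    window_size=2,
)
print(windows)  # [(1, [1, 3]), (0, [2, 4]), (1, [5, 7]), (0, [6])]
```

    In the input pipeline, key_func plays the role of the bucket id function and reduce_func is the place where each window is padded and batched.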
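    (3) Pad: grouped_dataset.padded_batch(batch_size, padded_shapes). Here grouped_dataset is the dataset that reduce_func receives, i.e. up to window_size consecutive elements that share one bucket id; batch_size is the number of sentences per batch; padded_shapes gives the shape each component is padded to.
    Putting it together, the id sequences are turned into matrices: dataset.apply(tf.contrib.data.group_by_window(example_to_bucket_id, batching_fn, None, )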
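
    The padding step can be sketched without TensorFlow. This version assumes padding id 0 and pads each batch to the length of its longest sentence, which is what an unknown dimension in padded_shapes gives; the function below is a stand-in, not the tf.data method.

```python
def padded_batch(sequences, batch_size, pad_id=0):
    """Split sequences into batches of batch_size and right-pad every
    sequence to the length of the longest one in its batch."""
    batches = []
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches


batches = padded_batch([[5, 6], [7], [8, 9, 10]], batch_size=2)
print(batches)  # [[[5, 6], [7, 0]], [[8, 9, 10]]]
```

    Because bucketing puts sentences of similar length into the same window, little padding is wasted per batch.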

    expand_dims: https://blog.csdn.net/qq_31780525/article/details/72280284
    tf.concat and tf.split: https://blog.csdn.net/momaojia/article/details/77603322 https://blog.csdn.net/UESTC_C2_403/article/details/73350457
    (1) normalization: obtain the mean and variance of the inputs, then compute
    normalized = (inputs - mean) / ( (variance + epsilon) ** (.5) )
    outputs = gamma * normalized + beta
    '''Applies layer normalization.

    Args:
    inputs: A tensor with 2 or more dimensions, where the first dimension has
    epsilon: A floating number. A very small number for preventing ZeroDivision Error.
    scope: Optional scope for `variable_scope`.
    reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.

    Returns:
    A tensor with the same shape and data dtype as `inputs`.
    '''
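
    A small numeric sketch of the two formulas, applied to a single feature vector. Taking the statistics over the last (feature) axis and using scalar gamma and beta are simplifying assumptions; in a real layer gamma and beta are learned per-feature vectors.

```python
import math


def normalize(inputs, gamma=1.0, beta=0.0, epsilon=1e-8):
    """Layer-normalize one feature vector given as a list of floats."""
    mean = sum(inputs) / len(inputs)
    variance = sum((x - mean) ** 2 for x in inputs) / len(inputs)
    # normalized = (inputs - mean) / sqrt(variance + epsilon)
    normalized = [(x - mean) / math.sqrt(variance + epsilon) for x in inputs]
    # outputs = gamma * normalized + beta
    return [gamma * n + beta for n in normalized]


outputs = normalize([1.0, 2.0, 3.0, 4.0])
out_mean = sum(outputs) / len(outputs)
out_var = sum((x - out_mean) ** 2 for x in outputs) / len(outputs)
```

    After normalization the vector has mean close to 0 and variance close to 1; epsilon only matters when the variance is nearly zero.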

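    (2) embedding: this uses TensorFlow's embedding mechanism, which maps the input ids to vectors so that the inputs are distributed more evenly and relationships between words can be represented.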
    '''Embeds a given tensor.

    Args:
    inputs: A `Tensor` with type `int32` or `int64` containing the ids
    to be looked up in `lookup table`.
    vocab_size: An int. Vocabulary size.
    num_units: An int. Number of embedding hidden units.
    zero_pad: A boolean. If True, all the values of the first row (id 0)
    should be constant zeros.
    scale: A boolean. If True, the outputs are multiplied by sqrt num_units.
    scope: Optional scope for `variable_scope`.
    reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.

    Returns:
    A `Tensor` with one more rank than inputs's. The last dimensionality
    should be `num_units`.
    '''
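    It relies on a lookup function whose role is comparable to a Chinese-to-English correspondence table: the ids in the input tensor are looked up as in a dictionary. One blog post explains this quite well:
    Link: https://www.jianshu.com/p/677e71364c8e It uses one-hot encoding, see https://blog.csdn.net/pipisorry/article/details/61193868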
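
    A plain-Python sketch of the lookup described in the docstring, including zero_pad and scale, plus a check that selecting row i is the same as multiplying a one-hot vector with the table. The helper names make_lookup_table and one_hot_lookup and the random initialization are assumptions for illustration only.

```python
import math
import random


def make_lookup_table(vocab_size, num_units, zero_pad=True, seed=0):
    """Hypothetical helper: a random vocab_size x num_units table."""
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 0.1) for _ in range(num_units)]
             for _ in range(vocab_size)]
    if zero_pad:
        table[0] = [0.0] * num_units  # id 0 (padding) maps to all zeros
    return table


def embedding(ids, table, scale=True):
    """Look up one row per id; optionally multiply by sqrt(num_units)."""
    factor = math.sqrt(len(table[0])) if scale else 1.0
    return [[value * factor for value in table[i]] for i in ids]


def one_hot_lookup(i, table):
    """Same row, computed as one_hot(i) times the table."""
    one_hot = [1.0 if j == i else 0.0 for j in range(len(table))]
    return [sum(h * row[k] for h, row in zip(one_hot, table))
            for k in range(len(table[0]))]


table = make_lookup_table(vocab_size=5, num_units=4)
vectors = embedding([3, 0, 1], table)
```

    The output has one more dimension than the ids: three ids in, three vectors of length num_units out.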
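    (3) multi-head attention:
    a. Dense projections of Q, K and V. dense is a fully connected layer that changes the last dimension to num_units,
    with outputs = activation(inputs * kernel + bias)
    b. Masking: use reduce_sum to find the positions whose vectors sum to 0 (padding) and mask them by setting their attention_score to a very large negative value, so that those positions get practically zero weight after the softmax.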
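
    Both steps can be sketched in plain Python. The value -2**32 + 1 used for masked scores and the rule "a key vector that sums to 0 is padding" are assumptions based on the description above; note that such a rule would also hide a real vector whose components happen to cancel.

```python
import math


def dense(inputs, kernel, bias, activation=None):
    """outputs = activation(inputs * kernel + bias).

    inputs: [n, d_in], kernel: [d_in, num_units], bias: [num_units].
    """
    columns = list(zip(*kernel))  # one tuple per output unit
    outputs = [[sum(x * w for x, w in zip(row, col)) + b
                for col, b in zip(columns, bias)] for row in inputs]
    if activation is not None:
        outputs = [[activation(v) for v in row] for row in outputs]
    return outputs


def masked_attention_weights(scores, keys, masked_value=-2.0 ** 32 + 1):
    """Softmax over keys, with padding keys pushed to masked_value first."""
    key_is_pad = [sum(key) == 0 for key in keys]
    weights = []
    for row in scores:
        masked = [masked_value if pad else s for s, pad in zip(row, key_is_pad)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


projected = dense([[1.0, 2.0]], [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [0.0, 0.0, 0.5])
weights = masked_attention_weights(
    scores=[[1.0, 2.0, 3.0]],
    keys=[[0.5, 0.5], [1.0, -0.2], [0.0, 0.0]],  # the last key is padding
)
```

    The padded key has the highest raw score here, yet it ends up with weight 0 and the remaining weights still sum to 1.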

    (6) Positional encoding: something is off here; still to be resolved.
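
    The notes do not say what the problem is. For reference only, here is a sketch of the sinusoidal encoding from the original Transformer paper; that train.py uses this variant is an assumption.

```python
import math


def positional_encoding(max_len, num_units):
    """PE[pos][2i] = sin(pos / 10000^(2i/num_units)),
    PE[pos][2i+1] = cos(pos / 10000^(2i/num_units))."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(num_units):
            # Dimensions 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / num_units))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(max_len=3, num_units=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

    Each position gets a distinct vector of length num_units that is added to the word embedding, so the model can tell positions apart without recurrence.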

