• A Deep Dive into the tensorflow.nn Core Module


    If you have worked through the earlier examples, you will have noticed that building a deep neural network relies heavily on the core module tensorflow.nn. Let's go through its source code to see exactly what it provides.
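
    Before diving into the source, here is a minimal sketch of how a few of the ops exported by this module are typically wired together. It is only an illustration against the TF 0.x-era API listed below; the layer sizes, placeholder shapes, and variable names are made up:

        import tensorflow as tf

        # A toy two-layer classifier: 20 input features, 64 hidden units, 10 classes.
        x = tf.placeholder(tf.float32, [None, 20])
        labels = tf.placeholder(tf.float32, [None, 10])

        w1 = tf.Variable(tf.truncated_normal([20, 64], stddev=0.1))
        b1 = tf.Variable(tf.zeros([64]))
        w2 = tf.Variable(tf.truncated_normal([64, 10], stddev=0.1))
        b2 = tf.Variable(tf.zeros([10]))

        hidden = tf.nn.relu_layer(x, w1, b1)      # relu(matmul(x, w1) + b1), defined in this file
        logits = tf.nn.xw_plus_b(hidden, w2, b2)  # matmul(hidden, w2) + b2
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits, labels))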

       1 # Copyright 2015 Google Inc. All Rights Reserved.
       2 #
       3 # Licensed under the Apache License, Version 2.0 (the "License");
       4 # you may not use this file except in compliance with the License.
       5 # You may obtain a copy of the License at
       6 #
       7 #     http://www.apache.org/licenses/LICENSE-2.0
       8 #
       9 # Unless required by applicable law or agreed to in writing, software
      10 # distributed under the License is distributed on an "AS IS" BASIS,
      11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      12 # See the License for the specific language governing permissions and
      13 # limitations under the License.
      14 # ==============================================================================
      15 
      16 # pylint: disable=unused-import,g-bad-import-order
      17 """## Activation Functions
      18 
      19 The activation ops provide different types of nonlinearities for use in neural
      20 networks.  These include smooth nonlinearities (`sigmoid`, `tanh`, `elu`,
      21 `softplus`, and `softsign`), continuous but not everywhere differentiable
      22 functions (`relu`, `relu6`, and `relu_x`), and random regularization
      23 (`dropout`).
      24 
      25 All activation ops apply componentwise, and produce a tensor of the same
      26 shape as the input tensor.
      27 
      28 @@relu
      29 @@relu6
      30 @@elu
      31 @@softplus
      32 @@softsign
      33 @@dropout
      34 @@bias_add
      35 @@sigmoid
      36 @@tanh
      37 
      38 ## Convolution
      39 
      40 The convolution ops sweep a 2-D filter over a batch of images, applying the
      41 filter to each window of each image of the appropriate size.  The different
      42 ops trade off between generic vs. specific filters:
      43 
      44 * `conv2d`: Arbitrary filters that can mix channels together.
      45 * `depthwise_conv2d`: Filters that operate on each channel independently.
      46 * `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter.
      47 
      48 Note that although these ops are called "convolution", they are strictly
      49 speaking "cross-correlation" since the filter is combined with an input window
      50 without reversing the filter.  For details, see [the properties of
      51 cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation#Properties).
      52 
      53 The filter is applied to image patches of the same size as the filter and
      54 strided according to the `strides` argument.  `strides = [1, 1, 1, 1]` applies
      55 the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the
      56 filter to every other image patch in each dimension, etc.
      57 
      58 Ignoring channels for the moment, and assume that the 4-D `input` has shape
      59 `[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
      60 `[filter_height, filter_width, ...]`, then the spatial semantics of the
      61 convolution ops are as follows: first, according to the padding scheme chosen
      62 as `'SAME'` or `'VALID'`, the output size and the padding pixels are computed.
      63 For the `'SAME'` padding, the output height and width are computed as:
      64 
      65     out_height = ceil(float(in_height) / float(strides[1]))
      66     out_width  = ceil(float(in_width) / float(strides[2]))
      67 
      68 and the padding on the top and left are computed as:
      69 
      70     pad_along_height = ((out_height - 1) * strides[1] +
      71                         filter_height - in_height)
      72     pad_along_width = ((out_width - 1) * strides[2] +
      73                        filter_width - in_width)
      74     pad_top = pad_along_height / 2
      75     pad_left = pad_along_width / 2
      76 
      77 Note that the division by 2 means that there might be cases when the padding on
      78 both sides (top vs bottom, right vs left) are off by one. In this case, the
      79 bottom and right sides always get the one additional padded pixel. For example,
      80 when `pad_along_height` is 5, we pad 2 pixels at the top and 3 pixels at the
      81 bottom. Note that this is different from existing libraries such as cuDNN and
      82 Caffe, which explicitly specify the number of padded pixels and always pad the
      83 same number of pixels on both sides.
      84 
      85 For the `'VALID'` padding, the output height and width are computed as:
      86 
      87     out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
      88     out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
      89 
      90 and the padding values are always zero. The output is then computed as
      91 
      92     output[b, i, j, :] =
      93         sum_{di, dj} input[b, strides[1] * i + di - pad_top,
      94                            strides[2] * j + dj - pad_left, ...] *
      95                      filter[di, dj, ...]
      96 
      97 where any value outside the original input image region are considered zero (
      98 i.e. we pad zero values around the border of the image).
      99 
     100 Since `input` is 4-D, each `input[b, i, j, :]` is a vector.  For `conv2d`, these
     101 vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new
     102 vectors.  For `depthwise_conv_2d`, each scalar component `input[b, i, j, k]`
     103 is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
     104 concatenated.
     105 
     106 @@conv2d
     107 @@depthwise_conv2d
     108 @@separable_conv2d
     109 @@conv2d_transpose
     110 
     111 ## Pooling
     112 
     113 The pooling ops sweep a rectangular window over the input tensor, computing a
     114 reduction operation for each window (average, max, or max with argmax).  Each
     115 pooling op uses rectangular windows of size `ksize` separated by offset
     116 `strides`.  For example, if `strides` is all ones every window is used, if
     117 `strides` is all twos every other window is used in each dimension, etc.
     118 
     119 In detail, the output is
     120 
     121     output[i] = reduce(value[strides * i:strides * i + ksize])
     122 
     123 where the indices also take into consideration the padding values. Please refer
     124 to the `Convolution` section for details about the padding calculation.
     125 
     126 @@avg_pool
     127 @@max_pool
     128 @@max_pool_with_argmax
     129 
     130 ## Normalization
     131 
     132 Normalization is useful to prevent neurons from saturating when inputs may
     133 have varying scale, and to aid generalization.
     134 
     135 @@l2_normalize
     136 @@local_response_normalization
     137 @@sufficient_statistics
     138 @@normalize_moments
     139 @@moments
     140 
     141 ## Losses
     142 
     143 The loss ops measure error between two tensors, or between a tensor and zero.
     144 These can be used for measuring accuracy of a network in a regression task
     145 or for regularization purposes (weight decay).
     146 
     147 @@l2_loss
     148 
     149 ## Classification
     150 
     151 TensorFlow provides several operations that help you perform classification.
     152 
     153 @@sigmoid_cross_entropy_with_logits
     154 @@softmax
     155 @@log_softmax
     156 @@softmax_cross_entropy_with_logits
     157 @@sparse_softmax_cross_entropy_with_logits
     158 @@weighted_cross_entropy_with_logits
     159 
     160 ## Embeddings
     161 
     162 TensorFlow provides library support for looking up values in embedding
     163 tensors.
     164 
     165 @@embedding_lookup
     166 @@embedding_lookup_sparse
     167 
     168 ## Evaluation
     169 
     170 The evaluation ops are useful for measuring the performance of a network.
     171 Since they are nondifferentiable, they are typically used at evaluation time.
     172 
     173 @@top_k
     174 @@in_top_k
     175 
     176 ## Candidate Sampling
     177 
     178 Do you want to train a multiclass or multilabel model with thousands
     179 or millions of output classes (for example, a language model with a
     180 large vocabulary)?  Training with a full Softmax is slow in this case,
     181 since all of the classes are evaluated for every training example.
     182 Candidate Sampling training algorithms can speed up your step times by
     183 only considering a small randomly-chosen subset of contrastive classes
     184 (called candidates) for each batch of training examples.
     185 
     186 See our [Candidate Sampling Algorithms Reference]
     187 (../../extras/candidate_sampling.pdf)
     188 
     189 ### Sampled Loss Functions
     190 
     191 TensorFlow provides the following sampled loss functions for faster training.
     192 
     193 @@nce_loss
     194 @@sampled_softmax_loss
     195 
     196 ### Candidate Samplers
     197 
     198 TensorFlow provides the following samplers for randomly sampling candidate
     199 classes when using one of the sampled loss functions above.
     200 
     201 @@uniform_candidate_sampler
     202 @@log_uniform_candidate_sampler
     203 @@learned_unigram_candidate_sampler
     204 @@fixed_unigram_candidate_sampler
     205 
     206 ### Miscellaneous candidate sampling utilities
     207 
     208 @@compute_accidental_hits
     209 
     210 """
     211 from __future__ import absolute_import
     212 from __future__ import division
     213 from __future__ import print_function
     214 
     215 from six.moves import xrange  # pylint: disable=redefined-builtin
     216 
     217 from tensorflow.python.framework import dtypes
     218 from tensorflow.python.framework import ops
     219 from tensorflow.python.framework import tensor_shape
     220 from tensorflow.python.ops import array_ops
     221 from tensorflow.python.ops import candidate_sampling_ops
     222 from tensorflow.python.ops import constant_op
     223 from tensorflow.python.ops import control_flow_ops
     224 from tensorflow.python.ops import embedding_ops
     225 from tensorflow.python.ops import init_ops
     226 from tensorflow.python.ops import math_ops
     227 from tensorflow.python.ops import nn_grad
     228 from tensorflow.python.ops import nn_ops
     229 from tensorflow.python.ops import numerics
     230 from tensorflow.python.ops import random_ops
     231 from tensorflow.python.ops import rnn_cell
     232 from tensorflow.python.ops import seq2seq
     233 from tensorflow.python.ops import sparse_ops
     234 from tensorflow.python.ops import variable_scope as vs
     235 from tensorflow.python.ops.math_ops import sigmoid
     236 from tensorflow.python.ops.math_ops import tanh
     237 from tensorflow.python.util.all_util import make_all
     238 
     239 # Bring more nn-associated functionality into this package.
     240 # go/tf-wildcard-import
     241 # pylint: disable=wildcard-import
     242 from tensorflow.python.ops.nn_ops import *
     243 from tensorflow.python.ops.candidate_sampling_ops import *
     244 from tensorflow.python.ops.embedding_ops import *
     245 from tensorflow.python.ops.rnn import *
     246 # pylint: enable=wildcard-import
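      # A standalone plain-Python sketch of the 'SAME' padding arithmetic from the
      # module docstring above, with made-up sizes (the helper name is hypothetical):
      import math

      def _same_padding_demo(in_height=7, filter_height=3, stride=2):
        out_height = int(math.ceil(float(in_height) / float(stride)))
        pad_along_height = (out_height - 1) * stride + filter_height - in_height
        pad_top = pad_along_height // 2             # any extra pixel goes to the bottom
        pad_bottom = pad_along_height - pad_top
        return out_height, pad_top, pad_bottom      # here: (4, 1, 1)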
     247 
     248 
     249 def sigmoid_cross_entropy_with_logits(logits, targets, name=None):
     250   """Computes sigmoid cross entropy given `logits`.
     251 
     252   Measures the probability error in discrete classification tasks in which each
     253   class is independent and not mutually exclusive.  For instance, one could
     254   perform multilabel classification where a picture can contain both an elephant
     255   and a dog at the same time.
     256 
     257   For brevity, let `x = logits`, `z = targets`.  The logistic loss is
     258 
     259         z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
     260       = z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
     261       = z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
     262       = z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x))
     263       = (1 - z) * x + log(1 + exp(-x))
     264       = x - x * z + log(1 + exp(-x))
     265 
     266   To ensure stability and avoid overflow, the implementation uses
     267 
     268       max(x, 0) - x * z + log(1 + exp(-abs(x)))
     269 
     270   `logits` and `targets` must have the same type and shape.
     271 
     272   Args:
     273     logits: A `Tensor` of type `float32` or `float64`.
     274     targets: A `Tensor` of the same type and shape as `logits`.
     275     name: A name for the operation (optional).
     276 
     277   Returns:
     278     A `Tensor` of the same shape as `logits` with the componentwise
     279     logistic losses.
     280 
     281   Raises:
     282     ValueError: If `logits` and `targets` do not have the same shape.
     283   """
     284   with ops.op_scope([logits, targets], name, "logistic_loss") as name:
     285     logits = ops.convert_to_tensor(logits, name="logits")
     286     targets = ops.convert_to_tensor(targets, name="targets")
     287     try:
     288       targets.get_shape().merge_with(logits.get_shape())
     289     except ValueError:
     290       raise ValueError(
     291           "logits and targets must have the same shape (%s vs %s)"
     292           % (logits.get_shape(), targets.get_shape()))
     293 
     294     # The logistic loss formula from above is
     295     #   x - x * z + log(1 + exp(-x))
     296     # For x < 0, a more numerically stable formula is
     297     #   -x * z + log(1 + exp(x))
     298     # To avoid branching, we use the combined version
     299     #   max(x, 0) - x * z + log(1 + exp(-abs(x)))
     300     return math_ops.add(nn_ops.relu(logits) - logits * targets,
     301                         math_ops.log(1 + math_ops.exp(-math_ops.abs(logits))),
     302                         name=name)
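      # A standalone NumPy sketch (made-up values) of why the combined formula is
      # used: the naive logistic loss overflows for large negative logits, while
      # the stable form stays finite.
      import numpy as np
      _x = np.array([-800.0, -2.0, 0.0, 2.0, 800.0])   # logits
      _z = np.array([   1.0,  0.0, 1.0, 0.0,   1.0])   # targets
      _stable = np.maximum(_x, 0) - _x * _z + np.log1p(np.exp(-np.abs(_x)))
      _naive = (_z * -np.log(1.0 / (1.0 + np.exp(-_x))) +
                (1 - _z) * -np.log(1.0 - 1.0 / (1.0 + np.exp(-_x))))
      # _naive[0] overflows to inf, while _stable[0] is exactly 800.0; the two
      # agree on the moderate entries in the middle.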
     303 
     304 
     305 def weighted_cross_entropy_with_logits(logits, targets, pos_weight,
     306                                        name=None):
     307   """Computes a weighted cross entropy.
     308 
     309   This is like `sigmoid_cross_entropy_with_logits()` except that `pos_weight`,
     310   allows one to trade off recall and precision by up- or down-weighting the
     311   cost of a positive error relative to a negative error.
     312 
     313   The usual cross-entropy cost is defined as:
     314 
     315     targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))
     316 
     317   The argument `pos_weight` is used as a multiplier for the positive targets:
     318 
     319     targets * -log(sigmoid(logits)) * pos_weight +
     320         (1 - targets) * -log(1 - sigmoid(logits))
     321 
     322   For brevity, let `x = logits`, `z = targets`, `q = pos_weight`.
     323   The loss is:
     324 
     325         qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
     326       = qz * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
     327       = qz * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
     328       = qz * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x))
     329       = (1 - z) * x + (qz +  1 - z) * log(1 + exp(-x))
     330       = (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x))
     331 
     332   Setting `l = (1 + (q - 1) * z)`, to ensure stability and avoid overflow,
     333   the implementation uses
     334 
     335       (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))
     336 
     337   `logits` and `targets` must have the same type and shape.
     338 
     339   Args:
     340     logits: A `Tensor` of type `float32` or `float64`.
     341     targets: A `Tensor` of the same type and shape as `logits`.
     342     pos_weight: A coefficient to use on the positive examples.
     343     name: A name for the operation (optional).
     344 
     345   Returns:
     346     A `Tensor` of the same shape as `logits` with the componentwise
     347     weighted logistic losses.
     348 
     349   Raises:
     350     ValueError: If `logits` and `targets` do not have the same shape.
     351   """
     352   with ops.op_scope([logits, targets], name, "logistic_loss") as name:
     353     logits = ops.convert_to_tensor(logits, name="logits")
     354     targets = ops.convert_to_tensor(targets, name="targets")
     355     try:
     356       targets.get_shape().merge_with(logits.get_shape())
     357     except ValueError:
     358       raise ValueError(
     359           "logits and targets must have the same shape (%s vs %s)"
     360           % (logits.get_shape(), targets.get_shape()))
     361 
     362     # The logistic loss formula from above is
     363     #   (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x))
     364     # For x < 0, a more numerically stable formula is
     365     #   (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(x)) - l * x
     366     # To avoid branching, we use the combined version
     367     #   (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))
     368     log_weight = 1 + (pos_weight - 1) * targets
     369     return math_ops.add(
     370         (1 - targets) * logits,
     371         log_weight * (math_ops.log(1 + math_ops.exp(-math_ops.abs(logits))) +
     372                       nn_ops.relu(-logits)),
     373         name=name)
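      # A standalone NumPy sketch (made-up values): with q = pos_weight, the
      # stable form above multiplies the per-example loss by q exactly where the
      # target is 1 and leaves it unchanged where the target is 0.
      import numpy as np
      _xw = np.array([-3.0, 0.5, 4.0])                 # logits
      _zw = np.array([ 1.0, 0.0, 1.0])                 # targets
      _q = 5.0
      _lw = 1 + (_q - 1) * _zw
      _weighted = (1 - _zw) * _xw + _lw * (np.log1p(np.exp(-np.abs(_xw))) +
                                           np.maximum(-_xw, 0))
      _plain = np.maximum(_xw, 0) - _xw * _zw + np.log1p(np.exp(-np.abs(_xw)))
      # _weighted equals _q * _plain at indices 0 and 2, and equals _plain at index 1.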
     374 
     375 
     376 def relu_layer(x, weights, biases, name=None):
     377   """Computes Relu(x * weight + biases).
     378 
     379   Args:
     380     x: a 2D tensor.  Dimensions typically: batch, in_units
     381     weights: a 2D tensor.  Dimensions typically: in_units, out_units
     382     biases: a 1D tensor.  Dimensions: out_units
     383     name: A name for the operation (optional).  If not specified
     384       "nn_relu_layer" is used.
     385 
     386   Returns:
     387     A 2-D Tensor computing relu(matmul(x, weights) + biases).
     388     Dimensions typically: batch, out_units.
     389   """
     390   with ops.op_scope([x, weights, biases], name, "relu_layer") as name:
     391     x = ops.convert_to_tensor(x, name="x")
     392     weights = ops.convert_to_tensor(weights, name="weights")
     393     biases = ops.convert_to_tensor(biases, name="biases")
     394     xw_plus_b = nn_ops.bias_add(math_ops.matmul(x, weights), biases)
     395     return nn_ops.relu(xw_plus_b, name=name)
     396 
     397 
     398 def l2_normalize(x, dim, epsilon=1e-12, name=None):
     399   """Normalizes along dimension `dim` using an L2 norm.
     400 
     401   For a 1-D tensor with `dim = 0`, computes
     402 
     403       output = x / sqrt(max(sum(x**2), epsilon))
     404 
     405   For `x` with more dimensions, independently normalizes each 1-D slice along
     406   dimension `dim`.
     407 
     408   Args:
     409     x: A `Tensor`.
     410     dim: Dimension along which to normalize.
     411     epsilon: A lower bound value for the norm. Will use `sqrt(epsilon)` as the
     412       divisor if `norm < sqrt(epsilon)`.
     413     name: A name for this operation (optional).
     414 
     415   Returns:
     416     A `Tensor` with the same shape as `x`.
     417   """
     418   with ops.op_scope([x], name, "l2_normalize") as name:
     419     x = ops.convert_to_tensor(x, name="x")
     420     square_sum = math_ops.reduce_sum(math_ops.square(x), [dim], keep_dims=True)
     421     x_inv_norm = math_ops.rsqrt(math_ops.maximum(square_sum, epsilon))
     422     return math_ops.mul(x, x_inv_norm, name=name)
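      # A standalone NumPy sketch of the same computation (made-up values):
      # every nonzero row ends up with unit L2 norm, and an all-zero row stays
      # zero instead of dividing by zero, thanks to the epsilon floor.
      import numpy as np
      _v = np.array([[3.0, 4.0], [0.0, 0.0]])
      _sq = np.sum(np.square(_v), axis=1, keepdims=True)
      _vn = _v * (1.0 / np.sqrt(np.maximum(_sq, 1e-12)))   # rsqrt(max(sum_sq, eps))
      # _vn[0] == [0.6, 0.8]; _vn[1] == [0.0, 0.0].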
     423 
     424 
     425 def zero_fraction(value, name=None):
     426   """Returns the fraction of zeros in `value`.
     427 
     428   If `value` is empty, the result is `nan`.
     429 
     430   This is useful in summaries to measure and report sparsity.  For example,
     431 
     432       z = tf.nn.relu(...)
     433       summ = tf.scalar_summary('sparsity', tf.nn.zero_fraction(z))
     434 
     435   Args:
     436     value: A tensor of numeric type.
     437     name: A name for the operation (optional).
     438 
     439   Returns:
     440     The fraction of zeros in `value`, with type `float32`.
     441   """
     442   with ops.op_scope([value], name, "zero_fraction"):
     443     value = ops.convert_to_tensor(value, name="value")
     444     zero = constant_op.constant(0, dtype=value.dtype, name="zero")
     445     return math_ops.reduce_mean(math_ops.cast(math_ops.equal(value, zero),
     446                                               dtypes.float32))
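      # Standalone NumPy equivalent (made-up values): the fraction of elements
      # that are exactly zero, as a float32 scalar.
      import numpy as np
      _zv = np.array([0.0, 1.5, 0.0, 3.0])
      _frac = np.mean((_zv == 0).astype(np.float32))       # 0.5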
     447 
     448 
     449 def depthwise_conv2d(input, filter, strides, padding, name=None):
     450   """Depthwise 2-D convolution.
     451 
     452   Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
     453   and a filter tensor of shape
     454   `[filter_height, filter_width, in_channels, channel_multiplier]`
     455   containing `in_channels` convolutional filters of depth 1, `depthwise_conv2d`
     456   applies a different filter to each input channel (expanding from 1 channel
     457   to `channel_multiplier` channels for each), then concatenates the results
     458   together.  The output has `in_channels * channel_multiplier` channels.
     459 
     460   In detail,
     461 
     462       output[b, i, j, k * channel_multiplier + q] =
     463           sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, k] *
     464                        filter[di, dj, k, q]
     465 
     466   Must have `strides[0] = strides[3] = 1`.  For the most common case of the
     467   same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
     468 
     469   Args:
     470     input: 4-D with shape `[batch, in_height, in_width, in_channels]`.
     471     filter: 4-D with shape
     472       `[filter_height, filter_width, in_channels, channel_multiplier]`.
     473     strides: 1-D of size 4.  The stride of the sliding window for each
     474       dimension of `input`.
     475     padding: A string, either `'VALID'` or `'SAME'`.  The padding algorithm.
     476     name: A name for this operation (optional).
     477 
     478   Returns:
     479     A 4-D `Tensor` of shape
     480     `[batch, out_height, out_width, in_channels * channel_multiplier].`
     481   """
     482   with ops.op_scope([input, filter], name, "depthwise") as name:
     483     input = ops.convert_to_tensor(input, name="tensor_in")
     484     filter = ops.convert_to_tensor(filter, name="filter_in")
     485     # A shape is required to statically compute the number of separable filters.
     486     if filter.get_shape().ndims is not None:
     487       assert len(filter.get_shape()) == 4
     488       in_channels = filter.get_shape()[2]
     489       # Sanity checks, if shape information is available for the inputs.
     490       if input.get_shape().ndims is not None:
     491         assert len(input.get_shape()) == 4
     492         assert input.get_shape()[3] == in_channels, (
     493             "Mismatched input depth %d and number of depthwise filters %d." % (
     494                 input.get_shape()[3].value, in_channels))
     495     else:
     496       assert input.get_shape().ndims is not None, (
     497           "Either tensor must provide static shape information.")
     498       assert input.get_shape().ndims == 4
     499       in_channels = input.get_shape()[3]
     500 
     501     if in_channels == 1:
     502       return nn_ops.conv2d(input, filter, strides, padding, name=name)
     503     else:
     504       # Create one separate convolution per channel.
     505       convs = []
     506       for channel in xrange(in_channels):
     507         with ops.name_scope("depth%d" % channel) as channel_scope:
     508           t_in = array_ops.slice(input, [0, 0, 0, channel], [-1, -1, -1, 1],
     509                                  name="slice_inputs")
     510           f_in = array_ops.slice(filter, [0, 0, channel, 0], [-1, -1, 1, -1],
     511                                  name="slice_params")
     512           convs.append(nn_ops.conv2d(t_in, f_in,
     513                                      strides, padding, name=channel_scope))
     514       # Concatenate the per-channel convolutions along the channel dimension.
     515       return array_ops.concat(3, convs, name=name)
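      # A standalone NumPy sketch of the channel layout described above, using a
      # 1x1 filter so the spatial sum drops out (made-up sizes: in_channels=3,
      # channel_multiplier=2, so the output depth is 3 * 2 = 6):
      import numpy as np
      _img = np.random.rand(1, 4, 4, 3)
      _dw = np.random.rand(1, 1, 3, 2)
      _dw_out = np.zeros((1, 4, 4, 6))
      for _k in range(3):
        for _q in range(2):
          _dw_out[..., _k * 2 + _q] = _img[..., _k] * _dw[0, 0, _k, _q]
      # Channel k of the input only ever contributes to output channels
      # k * channel_multiplier + q, exactly as in the formula above.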
     516 
     517 
     518 def separable_conv2d(input, depthwise_filter, pointwise_filter, strides,
     519                      padding,
     520                      name=None):
     521   """2-D convolution with separable filters.
     522 
     523   Performs a depthwise convolution that acts separately on channels followed by
     524   a pointwise convolution that mixes channels.  Note that this is separability
     525   between dimensions `[1, 2]` and `3`, not spatial separability between
     526   dimensions `1` and `2`.
     527 
     528   In detail,
     529 
     530       output[b, i, j, k] = sum_{di, dj, q, r}
     531           input[b, strides[1] * i + di, strides[2] * j + dj, q] *
     532           depthwise_filter[di, dj, q, r] *
     533           pointwise_filter[0, 0, q * channel_multiplier + r, k]
     534 
     535   `strides` controls the strides for the depthwise convolution only, since
     536   the pointwise convolution has implicit strides of `[1, 1, 1, 1]`.  Must have
     537   `strides[0] = strides[3] = 1`.  For the most common case of the same
     538   horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
     539 
     540   Args:
     541     input: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`.
     542     depthwise_filter: 4-D `Tensor` with shape
     543       `[filter_height, filter_width, in_channels, channel_multiplier]`.
     544       Contains `in_channels` convolutional filters of depth 1.
     545     pointwise_filter: 4-D `Tensor` with shape
     546       `[1, 1, channel_multiplier * in_channels, out_channels]`.  Pointwise
     547       filter to mix channels after `depthwise_filter` has convolved spatially.
     548     strides: 1-D of size 4.  The strides for the depthwise convolution for
     549       each dimension of `input`.
     550     padding: A string, either `'VALID'` or `'SAME'`.  The padding algorithm.
     551     name: A name for this operation (optional).
     552 
     553   Returns:
     554     A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`.
     555   """
     556   with ops.op_scope([input, depthwise_filter, pointwise_filter],
     557                    name, "separable_conv2d") as name:
     558     input = ops.convert_to_tensor(input, name="tensor_in")
     559     depthwise_filter = ops.convert_to_tensor(depthwise_filter,
     560                                              name="depthwise_filter")
     561     pointwise_filter = ops.convert_to_tensor(pointwise_filter,
     562                                              name="pointwise_filter")
     563 
     564     if pointwise_filter.get_shape().ndims is not None:
     565       assert len(pointwise_filter.get_shape()) == 4
     566       assert pointwise_filter.get_shape()[0] == 1
     567       assert pointwise_filter.get_shape()[1] == 1
     568       if depthwise_filter.get_shape().ndims and input.get_shape().ndims:
     569         channel_multiplier = depthwise_filter.get_shape()[3]
     570         in_channels = input.get_shape()[3]
     571         out_channels = pointwise_filter.get_shape()[3]
     572         # This would mean the separable convolutions is over-parametrized.
     573         assert channel_multiplier * in_channels < out_channels
     574     # The layout of the ops in the graph are expected to be as follows:
     575     # separable_conv2d  // Conv2D op corresponding to the pointwise conv.
     576     # separable_conv2d/depthwise  // Concat op for the depthwise outputs.
     577     # separable_conv2d/depthwise/depth0  // Conv2D op for depth 0
     578     # separable_conv2d/depthwise/depth1  // Conv2D op for depth 1
     579     # separable_conv2d/depthwise/depth2  // Conv2D op for depth 2
     580     depthwise = depthwise_conv2d(input, depthwise_filter, strides,
     581                                  padding, name="depthwise")
     582     return nn_ops.conv2d(depthwise, pointwise_filter, [1, 1, 1, 1],
     583                          padding="VALID", name=name)
     584 
     585 
     586 def sufficient_statistics(x, axes, shift=True, keep_dims=False, name=None):
     587   """Calculate the sufficient statistics for the mean and variance of `x`.
     588 
     589   These sufficient statistics are computed using the one pass algorithm on
     590   an input that's optionally shifted using the value of the 1st element in `x`.
     591   See:
     592   https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data
     593 
     594   Args:
     595     x: A `Tensor`.
     596     axes: Array of ints. Axes along which to compute mean and variance.
     597     shift: If true, shift the data to provide more numerically stable results.
     598     keep_dims: produce statistics with the same dimensionality as the input.
     599     name: Name used to scope the operations that compute the sufficient stats.
     600 
     601   Returns:
     602     Four `Tensor` objects of the same type as `x`:
     603     * the count (number of elements to average over).
     604     * the (possibly shifted) sum of the elements in the array.
     605     * the (possibly shifted) sum of squares of the elements in the array.
     606     * the shift by which the mean must be corrected or None if `shift` is False.
     607   """
     608   with ops.op_scope([x, axes], name, "sufficient_statistics"):
     609     x = ops.convert_to_tensor(x, name="x")
     610     x_shape = x.get_shape()
     611     if x_shape.is_fully_defined():
     612       counts = 1
     613       m_shape = []
     614       for d in xrange(x_shape.ndims):
     615         dim = x_shape[d].value
     616         if d in set(axes):
     617           counts *= dim
     618           dim = 1
     619         m_shape.append(dim)
     620       counts = constant_op.constant(counts, dtype=x.dtype)
     621     else:  # shape needs to be inferred at runtime.
     622       x_shape = array_ops.shape(x)
     623       select_axes = sparse_ops.sparse_to_dense(axes, array_ops.shape(x_shape),
     624                                                True, False)
     625       m_shape = math_ops.select(select_axes, array_ops.ones_like(x_shape),
     626                                 x_shape)
     627       counts = math_ops.cast(
     628           math_ops.reduce_prod(x_shape / m_shape),
     629           x.dtype,
     630           name="count")
     631     if shift:
     632       shift_value = array_ops.slice(x, array_ops.zeros_like(m_shape), m_shape)
     633       m_ss = math_ops.sub(x, shift_value)
     634       v_ss = math_ops.squared_difference(x, shift_value)
     635       if keep_dims:
     636         shift_value = array_ops.identity(shift_value, name="shift")
     637       else:
     638         shift_value = array_ops.squeeze(shift_value,
     639                                         squeeze_dims=axes,
     640                                         name="shift")
     641     else:  # not shift.
     642       m_ss = x
     643       v_ss = math_ops.square(x)
     644       shift_value = None
     645     m_ss = math_ops.reduce_sum(m_ss, axes, keep_dims=keep_dims, name="mean_ss")
     646     v_ss = math_ops.reduce_sum(v_ss, axes, keep_dims=keep_dims, name="var_ss")
     647   return counts, m_ss, v_ss, shift_value
     648 
     649 
     650 def normalize_moments(counts, mean_ss, variance_ss, shift, name=None):
     651   """Calculate the mean and variance based on the sufficient statistics.
     652 
     653   Args:
     654     counts: A `Tensor` containing a the total count of the data (one value).
     655     mean_ss: A `Tensor` containing the mean sufficient statistics: the (possibly
     656       shifted) sum of the elements to average over.
     657     variance_ss: A `Tensor` containing the variance sufficient statistics: the
     658       (possibly shifted) squared sum of the data to compute the variance over.
     659     shift: A `Tensor` containing the value by which the data is shifted for
     660       numerical stability, or `None` if no shift was performed.
     661     name: Name used to scope the operations that compute the moments.
     662 
     663   Returns:
     664     Two `Tensor` objects: `mean` and `variance`.
     665   """
     666   with ops.op_scope([counts, mean_ss, variance_ss, shift], name, "normalize"):
     667     divisor = math_ops.inv(counts, name="divisor")
     668     if shift is not None:
     669       shifted_mean = math_ops.mul(mean_ss, divisor, name="shifted_mean")
     670       mean = math_ops.add(shifted_mean, shift, name="mean")
     671     else:  # no shift.
     672       shifted_mean = math_ops.mul(mean_ss, divisor, name="mean")
     673       mean = shifted_mean
     674     variance = math_ops.sub(
     675         math_ops.mul(variance_ss, divisor),
     676         math_ops.square(shifted_mean),
     677         name="variance")
     678   return (mean, variance)
     679 
     680 
     681 def moments(x, axes, name=None, keep_dims=False):
     682   """Calculate the mean and variance of `x`.
     683 
     684   The mean and variance are calculated by aggregating the contents of `x`
     685   across `axes`.  If `x` is 1-D and `axes = [0]` this is just the mean
     686   and variance of a vector.
     687 
     688   When using these moments for batch normalization (see
     689   `tf.nn.batch_normalization`):
     690     * for so-called "global normalization", used with convolutional filters with
     691       shape `[batch, height, width, depth]`, pass `axes=[0, 1, 2]`.
     692     * for simple batch normalization pass `axes=[0]` (batch only).
     693 
     694   Args:
     695     x: A `Tensor`.
     696     axes: array of ints.  Axes along which to compute mean and
     697       variance.
     698     keep_dims: produce moments with the same dimensionality as the input.
     699     name: Name used to scope the operations that compute the moments.
     700 
     701   Returns:
     702     Two `Tensor` objects: `mean` and `variance`.
     703   """
     704   with ops.op_scope([x, axes], name, "moments"):
     705     counts, m_ss, v_ss, shift = sufficient_statistics(x,
     706                                                       axes,
     707                                                       keep_dims=keep_dims,
     708                                                       name=name)
     709     return normalize_moments(counts, m_ss, v_ss, shift, name=name)
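      # A standalone NumPy sketch (made-up values): for axes=[0] the two values
      # returned above are just the per-column mean and (biased) variance.
      import numpy as np
      _mx = np.array([[1.0, 2.0], [3.0, 4.0]])
      _mean, _var = np.mean(_mx, axis=0), np.var(_mx, axis=0)
      # _mean == [2.0, 3.0], _var == [1.0, 1.0]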
     710 
     711 
     712 def batch_normalization(x,
     713                         mean,
     714                         variance,
     715                         offset,
     716                         scale,
     717                         variance_epsilon,
     718                         name=None):
     719   """Batch normalization.
     720 
     721   As described in http://arxiv.org/abs/1502.03167.
     722   Normalizes a tensor by `mean` and `variance`, and applies (optionally) a
     723   `scale` \\(\gamma\\) to it, as well as an `offset` \\(\beta\\):
     724 
     725   \\(\frac{\gamma(x-\mu)}{\sigma}+\beta\\)
     726 
     727   `mean`, `variance`, `offset` and `scale` are all expected to be of one of two
     728   shapes:
     729     * In all generality, they can have the same number of dimensions as the
     730       input `x`, with identical sizes as `x` for the dimensions that are not
     731       normalized over (the 'depth' dimension(s)), and dimension 1 for the
     732       others which are being normalized over.
     733       `mean` and `variance` in this case would typically be the outputs of
     734       `tf.nn.moments(..., keep_dims=True)` during training, or running averages
     735       thereof during inference.
     736     * In the common case where the 'depth' dimension is the last dimension in
     737       the input tensor `x`, they may be one dimensional tensors of the same
     738       size as the 'depth' dimension.
     739       This is the case for example for the common `[batch, depth]` layout of
     740       fully-connected layers, and `[batch, height, width, depth]` for
     741       convolutions.
     742       `mean` and `variance` in this case would typically be the outputs of
     743       `tf.nn.moments(..., keep_dims=False)` during training, or running averages
     744       thereof during inference.
     745 
     746   Args:
     747     x: Input `Tensor` of arbitrary dimensionality.
     748     mean: A mean `Tensor`.
     749     variance: A variance `Tensor`.
     750     offset: An offset `Tensor`, often denoted \\(\beta\\) in equations, or
     751       None. If present, will be added to the normalized tensor.
     752     scale: A scale `Tensor`, often denoted \\(\gamma\\) in equations, or
     753       `None`. If present, the scale is applied to the normalized tensor.
     754     variance_epsilon: A small float number to avoid dividing by 0.
     755     name: A name for this operation (optional).
     756 
     757   Returns:
     758     the normalized, scaled, offset tensor.
     759   """
     760   with ops.op_scope([x, mean, variance, scale, offset], name, "batchnorm"):
     761     inv = math_ops.rsqrt(variance + variance_epsilon)
     762     if scale is not None:
     763       inv *= scale
     764     return x * inv + (
     765         offset - mean * inv if offset is not None else -mean * inv)
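      # A standalone NumPy sketch of the formula above (made-up values), written
      # as (x - mean) * rsqrt(var + eps) * gamma + beta, which is what the
      # factored expression in the return statement computes:
      import numpy as np
      _bx = np.array([[1.0, 10.0], [3.0, 30.0]])
      _bm, _bv = _bx.mean(axis=0), _bx.var(axis=0)
      _eps, _gamma, _beta = 1e-3, 1.0, 0.0
      _bn = (_bx - _bm) / np.sqrt(_bv + _eps) * _gamma + _beta
      # Each column of _bn now has zero mean and (nearly) unit variance.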
     766 
     767 
     768 def batch_norm_with_global_normalization(t,
     769                                          m,
     770                                          v,
     771                                          beta,
     772                                          gamma,
     773                                          variance_epsilon,
     774                                          scale_after_normalization,
     775                                          name=None):
     776   """Batch normalization.
     777 
     778   This op is deprecated. See `tf.nn.batch_normalization`.
     779 
     780   Args:
     781     t: A 4D input Tensor.
     782     m: A 1D mean Tensor with size matching the last dimension of t.
     783       This is the first output from tf.nn.moments,
     784       or a saved moving average thereof.
     785     v: A 1D variance Tensor with size matching the last dimension of t.
     786       This is the second output from tf.nn.moments,
     787       or a saved moving average thereof.
     788     beta: A 1D beta Tensor with size matching the last dimension of t.
     789       An offset to be added to the normalized tensor.
     790     gamma: A 1D gamma Tensor with size matching the last dimension of t.
     791       If "scale_after_normalization" is true, this tensor will be multiplied
     792       with the normalized tensor.
     793     variance_epsilon: A small float number to avoid dividing by 0.
     794     scale_after_normalization: A bool indicating whether the resulted tensor
     795       needs to be multiplied with gamma.
     796     name: A name for this operation (optional).
     797 
     798    Returns:
     799      A batch-normalized `t`.
     800   """
     801   return batch_normalization(t, m, v, beta, gamma if scale_after_normalization
     802                              else None, variance_epsilon, name)
     803 
     804 
     805 def _sum_rows(x):
     806   """Returns a vector summing up each row of the matrix x."""
     807   # _sum_rows(x) is equivalent to math_ops.reduce_sum(x, 1) when x is
     808   # a matrix.  The gradient of _sum_rows(x) is more efficient than
     809   # reduce_sum(x, 1)'s gradient in today's implementation. Therefore,
     810   # we use _sum_rows(x) in the nce_loss() computation since the loss
     811   # is mostly used for training.
     812   cols = array_ops.shape(x)[1]
     813   ones_shape = array_ops.pack([cols, 1])
     814   ones = array_ops.ones(ones_shape, x.dtype)
     815   return array_ops.reshape(math_ops.matmul(x, ones), [-1])
     816 
     817 
     818 def _compute_sampled_logits(weights, biases, inputs, labels, num_sampled,
     819                             num_classes, num_true=1,
     820                             sampled_values=None,
     821                             subtract_log_q=True,
     822                             remove_accidental_hits=False,
     823                             partition_strategy="mod",
     824                             name=None):
     825   """Helper function for nce_loss and sampled_softmax_loss functions.
     826 
     827   Computes sampled output training logits and labels suitable for implementing
     828   e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see
     829   sampled_softmax_loss).
     830 
     831   Note: In the case where num_true > 1, we assign to each target class
     832   the target probability 1 / num_true so that the target probabilities
     833   sum to 1 per-example.
     834 
     835   Args:
     836     weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
     837         objects whose concatenation along dimension 0 has shape
     838         `[num_classes, dim]`.  The (possibly-partitioned) class embeddings.
     839     biases: A `Tensor` of shape `[num_classes]`.  The class biases.
     840     inputs: A `Tensor` of shape `[batch_size, dim]`.  The forward
     841         activations of the input network.
     842     labels: A `Tensor` of type `int64` and shape `[batch_size,
     843         num_true]`. The target classes.  Note that this format differs from
     844         the `labels` argument of `nn.softmax_cross_entropy_with_logits`.
     845     num_sampled: An `int`.  The number of classes to randomly sample per batch.
     846     num_classes: An `int`. The number of possible classes.
     847     num_true: An `int`.  The number of target classes per training example.
     848     sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`,
     849         `sampled_expected_count`) returned by a `*_candidate_sampler` function.
     850         (if None, we default to `log_uniform_candidate_sampler`)
     851     subtract_log_q: A `bool`.  whether to subtract the log expected count of
     852         the labels in the sample to get the logits of the true labels.
     853         Default is True.  Turn off for Negative Sampling.
     854     remove_accidental_hits:  A `bool`.  whether to remove "accidental hits"
     855         where a sampled class equals one of the target classes.  Default is
     856         False.
     857     partition_strategy: A string specifying the partitioning strategy, relevant
     858         if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported.
     859         Default is `"mod"`. See `tf.nn.embedding_lookup` for more details.
     860     name: A name for the operation (optional).
     861   Returns:
     862     out_logits, out_labels: `Tensor` objects each with shape
     863         `[batch_size, num_true + num_sampled]`, for passing to either
     864         `nn.sigmoid_cross_entropy_with_logits` (NCE) or
     865         `nn.softmax_cross_entropy_with_logits` (sampled softmax).
     866   """
     867 
     868   if not isinstance(weights, list):
     869     weights = [weights]
     870 
     871   with ops.op_scope(
     872       weights + [biases, inputs, labels], name, "compute_sampled_logits"):
     873     if labels.dtype != dtypes.int64:
     874       labels = math_ops.cast(labels, dtypes.int64)
     875     labels_flat = array_ops.reshape(labels, [-1])
     876 
     877     # Sample the negative labels.
     878     #   sampled shape: [num_sampled] tensor
     879     #   true_expected_count shape = [batch_size, 1] tensor
     880     #   sampled_expected_count shape = [num_sampled] tensor
     881     if sampled_values is None:
     882       sampled_values = candidate_sampling_ops.log_uniform_candidate_sampler(
     883           true_classes=labels,
     884           num_true=num_true,
     885           num_sampled=num_sampled,
     886           unique=True,
     887           range_max=num_classes)
     888     # NOTE: pylint cannot tell that 'sampled_values' is a sequence
     889     # pylint: disable=unpacking-non-sequence
     890     sampled, true_expected_count, sampled_expected_count = sampled_values
     891     # pylint: enable=unpacking-non-sequence
     892 
     893     # labels_flat is a [batch_size * num_true] tensor
     894     # sampled is a [num_sampled] int tensor
     895     all_ids = array_ops.concat(0, [labels_flat, sampled])
     896 
     897     # weights shape is [num_classes, dim]
     898     all_w = embedding_ops.embedding_lookup(
     899         weights, all_ids, partition_strategy=partition_strategy)
     900     all_b = embedding_ops.embedding_lookup(biases, all_ids)
     901     # true_w shape is [batch_size * num_true, dim]
     902     # true_b is a [batch_size * num_true] tensor
     903     true_w = array_ops.slice(
     904         all_w, [0, 0], array_ops.pack([array_ops.shape(labels_flat)[0], -1]))
     905     true_b = array_ops.slice(all_b, [0], array_ops.shape(labels_flat))
     906 
     907     # inputs shape is [batch_size, dim]
     908     # true_w shape is [batch_size * num_true, dim]
     909     # row_wise_dots is [batch_size, num_true, dim]
     910     dim = array_ops.shape(true_w)[1:2]
     911     new_true_w_shape = array_ops.concat(0, [[-1, num_true], dim])
     912     row_wise_dots = math_ops.mul(
     913         array_ops.expand_dims(inputs, 1),
     914         array_ops.reshape(true_w, new_true_w_shape))
     915     # We want the row-wise dot plus biases which yields a
     916     # [batch_size, num_true] tensor of true_logits.
     917     dots_as_matrix = array_ops.reshape(row_wise_dots,
     918                                        array_ops.concat(0, [[-1], dim]))
     919     true_logits = array_ops.reshape(_sum_rows(dots_as_matrix), [-1, num_true])
     920     true_b = array_ops.reshape(true_b, [-1, num_true])
     921     true_logits += true_b
     922 
     923     # Lookup weights and biases for sampled labels.
     924     #   sampled_w shape is [num_sampled, dim]
     925     #   sampled_b is a [num_sampled] float tensor
     926     sampled_w = array_ops.slice(
     927         all_w, array_ops.pack([array_ops.shape(labels_flat)[0], 0]), [-1, -1])
     928     sampled_b = array_ops.slice(all_b, array_ops.shape(labels_flat), [-1])
     929 
     930     # inputs has shape [batch_size, dim]
     931     # sampled_w has shape [num_sampled, dim]
     932     # sampled_b has shape [num_sampled]
     933     # Apply X*W'+B, which yields [batch_size, num_sampled]
     934     sampled_logits = math_ops.matmul(inputs,
     935                                      sampled_w,
     936                                      transpose_b=True) + sampled_b
     937 
     938     if remove_accidental_hits:
     939       acc_hits = candidate_sampling_ops.compute_accidental_hits(
     940           labels, sampled, num_true=num_true)
     941       acc_indices, acc_ids, acc_weights = acc_hits
     942 
     943       # This is how SparseToDense expects the indices.
     944       acc_indices_2d = array_ops.reshape(acc_indices, [-1, 1])
     945       acc_ids_2d_int32 = array_ops.reshape(math_ops.cast(
     946           acc_ids, dtypes.int32), [-1, 1])
     947       sparse_indices = array_ops.concat(
     948           1, [acc_indices_2d, acc_ids_2d_int32], "sparse_indices")
     949       # Create sampled_logits_shape = [batch_size, num_sampled]
     950       sampled_logits_shape = array_ops.concat(
     951           0,
     952           [array_ops.shape(labels)[:1], array_ops.expand_dims(num_sampled, 0)])
     953       if sampled_logits.dtype != acc_weights.dtype:
     954         acc_weights = math_ops.cast(acc_weights, sampled_logits.dtype)
     955       sampled_logits += sparse_ops.sparse_to_dense(
     956           sparse_indices, sampled_logits_shape, acc_weights,
     957           default_value=0.0, validate_indices=False)
     958 
     959     if subtract_log_q:
     960       # Subtract log of Q(l), prior probability that l appears in sampled.
     961       true_logits -= math_ops.log(true_expected_count)
     962       sampled_logits -= math_ops.log(sampled_expected_count)
     963 
     964     # Construct output logits and labels. The true labels/logits start at col 0.
     965     out_logits = array_ops.concat(1, [true_logits, sampled_logits])
     966     # true_logits is a float tensor, ones_like(true_logits) is a float tensor
     967     # of ones. We then divide by num_true to ensure the per-example labels sum
     968     # to 1.0, i.e. form a proper probability distribution.
     969     out_labels = array_ops.concat(
     970         1, [array_ops.ones_like(true_logits) / num_true,
     971             array_ops.zeros_like(sampled_logits)])
     972 
     973   return out_logits, out_labels
     974 
     975 
     976 def nce_loss(weights, biases, inputs, labels, num_sampled, num_classes,
     977              num_true=1,
     978              sampled_values=None,
     979              remove_accidental_hits=False,
     980              partition_strategy="mod",
     981              name="nce_loss"):
     982   """Computes and returns the noise-contrastive estimation training loss.
     983 
     984   See [Noise-contrastive estimation: A new estimation principle for
     985   unnormalized statistical models]
     986   (http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf).
     987   Also see our [Candidate Sampling Algorithms Reference]
     988   (../../extras/candidate_sampling.pdf)
     989 
     990   Note: In the case where `num_true` > 1, we assign to each target class
     991   the target probability 1 / `num_true` so that the target probabilities
     992   sum to 1 per-example.
     993 
     994   Note: It would be useful to allow a variable number of target classes per
     995   example.  We hope to provide this functionality in a future release.
     996   For now, if you have a variable number of target classes, you can pad them
     997   out to a constant number by either repeating them or by padding
     998   with an otherwise unused class.
     999 
    1000   Args:
    1001     weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
    1002         objects whose concatenation along dimension 0 has shape
    1003         [num_classes, dim].  The (possibly-partitioned) class embeddings.
    1004     biases: A `Tensor` of shape `[num_classes]`.  The class biases.
    1005     inputs: A `Tensor` of shape `[batch_size, dim]`.  The forward
    1006         activations of the input network.
    1007     labels: A `Tensor` of type `int64` and shape `[batch_size,
    1008         num_true]`. The target classes.
    1009     num_sampled: An `int`.  The number of classes to randomly sample per batch.
    1010     num_classes: An `int`. The number of possible classes.
    1011     num_true: An `int`.  The number of target classes per training example.
    1012     sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`,
    1013         `sampled_expected_count`) returned by a `*_candidate_sampler` function.
    1014         (if None, we default to `log_uniform_candidate_sampler`)
    1015     remove_accidental_hits:  A `bool`.  Whether to remove "accidental hits"
    1016         where a sampled class equals one of the target classes.  If set to
    1017         `True`, this is a "Sampled Logistic" loss instead of NCE, and we are
    1018         learning to generate log-odds instead of log probabilities.  See
    1019         our [Candidate Sampling Algorithms Reference]
    1020         (../../extras/candidate_sampling.pdf).
    1021         Default is False.
    1022     partition_strategy: A string specifying the partitioning strategy, relevant
    1023         if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported.
    1024         Default is `"mod"`. See `tf.nn.embedding_lookup` for more details.
    1025     name: A name for the operation (optional).
    1026 
    1027   Returns:
    1028     A `batch_size` 1-D tensor of per-example NCE losses.
    1029   """
    1030   logits, labels = _compute_sampled_logits(
    1031       weights, biases, inputs, labels, num_sampled, num_classes,
    1032       num_true=num_true,
    1033       sampled_values=sampled_values,
    1034       subtract_log_q=True,
    1035       remove_accidental_hits=remove_accidental_hits,
    1036       partition_strategy=partition_strategy,
    1037       name=name)
    1038   sampled_losses = sigmoid_cross_entropy_with_logits(logits,
    1039                                                      labels,
    1040                                                      name="sampled_losses")
    1041   # sampled_losses is batch_size x {true_loss, sampled_losses...}
    1042   # We sum out true and sampled losses.
    1043   return _sum_rows(sampled_losses)
    1044 
    1045 
    1046 def sampled_softmax_loss(weights, biases, inputs, labels, num_sampled,
    1047                          num_classes, num_true=1,
    1048                          sampled_values=None,
    1049                          remove_accidental_hits=True,
    1050                          partition_strategy="mod",
    1051                          name="sampled_softmax_loss"):
    1052   """Computes and returns the sampled softmax training loss.
    1053 
    1054   This is a faster way to train a softmax classifier over a huge number of
    1055   classes.
    1056 
    1057   This operation is for training only.  It is generally an underestimate of
    1058   the full softmax loss.
    1059 
    1060   At inference time, you can compute full softmax probabilities with the
    1061   expression `tf.nn.softmax(tf.matmul(inputs, weights) + biases)`.
    1062 
    1063   See our [Candidate Sampling Algorithms Reference]
    1064   (../../extras/candidate_sampling.pdf)
    1065 
    1066   Also see Section 3 of [Jean et al., 2014](http://arxiv.org/abs/1412.2007)
    1067   ([pdf](http://arxiv.org/pdf/1412.2007.pdf)) for the math.
    1068 
    1069   Args:
    1070     weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
    1071         objects whose concatenation along dimension 0 has shape
    1072         [num_classes, dim].  The (possibly-sharded) class embeddings.
    1073     biases: A `Tensor` of shape `[num_classes]`.  The class biases.
    1074     inputs: A `Tensor` of shape `[batch_size, dim]`.  The forward
    1075         activations of the input network.
    1076     labels: A `Tensor` of type `int64` and shape `[batch_size,
    1077         num_true]`. The target classes.  Note that this format differs from
    1078         the `labels` argument of `nn.softmax_cross_entropy_with_logits`.
    1079     num_sampled: An `int`.  The number of classes to randomly sample per batch.
    1080     num_classes: An `int`. The number of possible classes.
    1081     num_true: An `int`.  The number of target classes per training example.
    1082     sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`,
    1083         `sampled_expected_count`) returned by a `*_candidate_sampler` function.
    1084         (if None, we default to `log_uniform_candidate_sampler`)
    1085     remove_accidental_hits:  A `bool`.  whether to remove "accidental hits"
    1086         where a sampled class equals one of the target classes.  Default is
    1087         True.
    1088     partition_strategy: A string specifying the partitioning strategy, relevant
    1089         if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported.
    1090         Default is `"mod"`. See `tf.nn.embedding_lookup` for more details.
    1091     name: A name for the operation (optional).
    1092 
    1093   Returns:
    1094     A `batch_size` 1-D tensor of per-example sampled softmax losses.
      """
      logits, labels = _compute_sampled_logits(
          weights, biases, inputs, labels, num_sampled, num_classes,
          num_true=num_true,
          sampled_values=sampled_values,
          subtract_log_q=True,
          remove_accidental_hits=remove_accidental_hits,
          partition_strategy=partition_strategy,
          name=name)
      sampled_losses = nn_ops.softmax_cross_entropy_with_logits(logits, labels)
      # sampled_losses is a [batch_size] tensor.
      return sampled_losses
    
    
    # TODO(cwhipkey): sigmoid and tanh should not be exposed from tf.nn.
    __all__ = make_all(__name__)
    __all__.append("zero_fraction")  # documented in training.py
    
    # Modules whitelisted for reference through tf.nn.
    # TODO(cwhipkey): migrate callers to use the submodule directly.
    __all__.extend(["nn_ops", "rnn_cell", "seq2seq"])
    
    # Symbols whitelisted for export without documentation.
    # TODO(cwhipkey): review these and move to contrib or expose through
    # documentation.
    __all__.extend([
        "all_candidate_sampler",
        "batch_norm_with_global_normalization",
        "batch_normalization",
        "bidirectional_rnn",
        "conv2d_backprop_filter",
        "conv2d_backprop_input",
        "depthwise_conv2d_native",
        "dynamic_rnn",
        "lrn",
        "relu_layer",
        "rnn",
        "state_saving_rnn",
        "xw_plus_b",
    ])
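
    To close, here is a hedged usage sketch of the candidate-sampling losses defined at the end of the file, in the style of a word2vec-like model. The vocabulary size, embedding dimension, and variable names are all made up, and it targets the TF 0.x-era signatures shown above (weights, biases, inputs, labels, num_sampled, num_classes):

        import tensorflow as tf

        vocab_size, dim, batch_size, num_sampled = 10000, 128, 32, 64

        softmax_w = tf.Variable(tf.truncated_normal([vocab_size, dim], stddev=0.05))
        softmax_b = tf.Variable(tf.zeros([vocab_size]))

        inputs = tf.placeholder(tf.float32, [batch_size, dim])   # e.g. context embeddings
        labels = tf.placeholder(tf.int64, [batch_size, 1])       # target class ids

        # Training: per-example sampled softmax loss (tf.nn.nce_loss is the NCE analogue).
        train_loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
            softmax_w, softmax_b, inputs, labels, num_sampled, vocab_size))

        # Inference: the full softmax over all classes; note the transpose, since
        # the weights are stored as [num_classes, dim].
        full_probs = tf.nn.softmax(
            tf.matmul(inputs, softmax_w, transpose_b=True) + softmax_b)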

      Thanks for reading!
     