看过前面的例子,会发现实现深度神经网络需要使用 tensorflow.nn 这个核心模块。我们通过源码来一探究竟。
1 # Copyright 2015 Google Inc. All Rights Reserved. 2 # 3 # Licensed under the Apache License, Version 2.0 (the "License"); 4 # you may not use this file except in compliance with the License. 5 # You may obtain a copy of the License at 6 # 7 # http://www.apache.org/licenses/LICENSE-2.0 8 # 9 # Unless required by applicable law or agreed to in writing, software 10 # distributed under the License is distributed on an "AS IS" BASIS, 11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 # See the License for the specific language governing permissions and 13 # limitations under the License. 14 # ============================================================================== 15 16 # pylint: disable=unused-import,g-bad-import-order 17 """## Activation Functions 18 19 The activation ops provide different types of nonlinearities for use in neural 20 networks. These include smooth nonlinearities (`sigmoid`, `tanh`, `elu`, 21 `softplus`, and `softsign`), continuous but not everywhere differentiable 22 functions (`relu`, `relu6`, and `relu_x`), and random regularization 23 (`dropout`). 24 25 All activation ops apply componentwise, and produce a tensor of the same 26 shape as the input tensor. 27 28 @@relu 29 @@relu6 30 @@elu 31 @@softplus 32 @@softsign 33 @@dropout 34 @@bias_add 35 @@sigmoid 36 @@tanh 37 38 ## Convolution 39 40 The convolution ops sweep a 2-D filter over a batch of images, applying the 41 filter to each window of each image of the appropriate size. The different 42 ops trade off between generic vs. specific filters: 43 44 * `conv2d`: Arbitrary filters that can mix channels together. 45 * `depthwise_conv2d`: Filters that operate on each channel independently. 46 * `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter. 47 48 Note that although these ops are called "convolution", they are strictly 49 speaking "cross-correlation" since the filter is combined with an input window 50 without reversing the filter. For details, see [the properties of 51 cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation#Properties). 52 53 The filter is applied to image patches of the same size as the filter and 54 strided according to the `strides` argument. `strides = [1, 1, 1, 1]` applies 55 the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the 56 filter to every other image patch in each dimension, etc. 57 58 Ignoring channels for the moment, and assume that the 4-D `input` has shape 59 `[batch, in_height, in_width, ...]` and the 4-D `filter` has shape 60 `[filter_height, filter_width, ...]`, then the spatial semantics of the 61 convolution ops are as follows: first, according to the padding scheme chosen 62 as `'SAME'` or `'VALID'`, the output size and the padding pixels are computed. 63 For the `'SAME'` padding, the output height and width are computed as: 64 65 out_height = ceil(float(in_height) / float(strides[1])) 66 out_width = ceil(float(in_width) / float(strides[2])) 67 68 and the padding on the top and left are computed as: 69 70 pad_along_height = ((out_height - 1) * strides[1] + 71 filter_height - in_height) 72 pad_along_width = ((out_width - 1) * strides[2] + 73 filter_width - in_width) 74 pad_top = pad_along_height / 2 75 pad_left = pad_along_width / 2 76 77 Note that the division by 2 means that there might be cases when the padding on 78 both sides (top vs bottom, right vs left) are off by one. In this case, the 79 bottom and right sides always get the one additional padded pixel. For example, 80 when `pad_along_height` is 5, we pad 2 pixels at the top and 3 pixels at the 81 bottom. Note that this is different from existing libraries such as cuDNN and 82 Caffe, which explicitly specify the number of padded pixels and always pad the 83 same number of pixels on both sides. 84 85 For the `'VALID`' padding, the output height and width are computed as: 86 87 out_height = ceil(float(in_height - filter_height + 1) / float(strides[1])) 88 out_width = ceil(float(in_width - filter_width + 1) / float(strides[2])) 89 90 and the padding values are always zero. The output is then computed as 91 92 output[b, i, j, :] = 93 sum_{di, dj} input[b, strides[1] * i + di - pad_top, 94 strides[2] * j + dj - pad_left, ...] * 95 filter[di, dj, ...] 96 97 where any value outside the original input image region are considered zero ( 98 i.e. we pad zero values around the border of the image). 99 100 Since `input` is 4-D, each `input[b, i, j, :]` is a vector. For `conv2d`, these 101 vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new 102 vectors. For `depthwise_conv_2d`, each scalar component `input[b, i, j, k]` 103 is multiplied by a vector `filter[di, dj, k]`, and all the vectors are 104 concatenated. 105 106 @@conv2d 107 @@depthwise_conv2d 108 @@separable_conv2d 109 @@conv2d_transpose 110 111 ## Pooling 112 113 The pooling ops sweep a rectangular window over the input tensor, computing a 114 reduction operation for each window (average, max, or max with argmax). Each 115 pooling op uses rectangular windows of size `ksize` separated by offset 116 `strides`. For example, if `strides` is all ones every window is used, if 117 `strides` is all twos every other window is used in each dimension, etc. 118 119 In detail, the output is 120 121 output[i] = reduce(value[strides * i:strides * i + ksize]) 122 123 where the indices also take into consideration the padding values. Please refer 124 to the `Convolution` section for details about the padding calculation. 125 126 @@avg_pool 127 @@max_pool 128 @@max_pool_with_argmax 129 130 ## Normalization 131 132 Normalization is useful to prevent neurons from saturating when inputs may 133 have varying scale, and to aid generalization. 134 135 @@l2_normalize 136 @@local_response_normalization 137 @@sufficient_statistics 138 @@normalize_moments 139 @@moments 140 141 ## Losses 142 143 The loss ops measure error between two tensors, or between a tensor and zero. 144 These can be used for measuring accuracy of a network in a regression task 145 or for regularization purposes (weight decay). 146 147 @@l2_loss 148 149 ## Classification 150 151 TensorFlow provides several operations that help you perform classification. 152 153 @@sigmoid_cross_entropy_with_logits 154 @@softmax 155 @@log_softmax 156 @@softmax_cross_entropy_with_logits 157 @@sparse_softmax_cross_entropy_with_logits 158 @@weighted_cross_entropy_with_logits 159 160 ## Embeddings 161 162 TensorFlow provides library support for looking up values in embedding 163 tensors. 164 165 @@embedding_lookup 166 @@embedding_lookup_sparse 167 168 ## Evaluation 169 170 The evaluation ops are useful for measuring the performance of a network. 171 Since they are nondifferentiable, they are typically used at evaluation time. 172 173 @@top_k 174 @@in_top_k 175 176 ## Candidate Sampling 177 178 Do you want to train a multiclass or multilabel model with thousands 179 or millions of output classes (for example, a language model with a 180 large vocabulary)? Training with a full Softmax is slow in this case, 181 since all of the classes are evaluated for every training example. 182 Candidate Sampling training algorithms can speed up your step times by 183 only considering a small randomly-chosen subset of contrastive classes 184 (called candidates) for each batch of training examples. 185 186 See our [Candidate Sampling Algorithms Reference] 187 (../../extras/candidate_sampling.pdf) 188 189 ### Sampled Loss Functions 190 191 TensorFlow provides the following sampled loss functions for faster training. 192 193 @@nce_loss 194 @@sampled_softmax_loss 195 196 ### Candidate Samplers 197 198 TensorFlow provides the following samplers for randomly sampling candidate 199 classes when using one of the sampled loss functions above. 200 201 @@uniform_candidate_sampler 202 @@log_uniform_candidate_sampler 203 @@learned_unigram_candidate_sampler 204 @@fixed_unigram_candidate_sampler 205 206 ### Miscellaneous candidate sampling utilities 207 208 @@compute_accidental_hits 209 210 """ 211 from __future__ import absolute_import 212 from __future__ import division 213 from __future__ import print_function 214 215 from six.moves import xrange # pylint: disable=redefined-builtin 216 217 from tensorflow.python.framework import dtypes 218 from tensorflow.python.framework import ops 219 from tensorflow.python.framework import tensor_shape 220 from tensorflow.python.ops import array_ops 221 from tensorflow.python.ops import candidate_sampling_ops 222 from tensorflow.python.ops import constant_op 223 from tensorflow.python.ops import control_flow_ops 224 from tensorflow.python.ops import embedding_ops 225 from tensorflow.python.ops import init_ops 226 from tensorflow.python.ops import math_ops 227 from tensorflow.python.ops import nn_grad 228 from tensorflow.python.ops import nn_ops 229 from tensorflow.python.ops import numerics 230 from tensorflow.python.ops import random_ops 231 from tensorflow.python.ops import rnn_cell 232 from tensorflow.python.ops import seq2seq 233 from tensorflow.python.ops import sparse_ops 234 from tensorflow.python.ops import variable_scope as vs 235 from tensorflow.python.ops.math_ops import sigmoid 236 from tensorflow.python.ops.math_ops import tanh 237 from tensorflow.python.util.all_util import make_all 238 239 # Bring more nn-associated functionality into this package. 240 # go/tf-wildcard-import 241 # pylint: disable=wildcard-import 242 from tensorflow.python.ops.nn_ops import * 243 from tensorflow.python.ops.candidate_sampling_ops import * 244 from tensorflow.python.ops.embedding_ops import * 245 from tensorflow.python.ops.rnn import * 246 # pylint: enable=wildcard-import 247 248 249 def sigmoid_cross_entropy_with_logits(logits, targets, name=None): 250 """Computes sigmoid cross entropy given `logits`. 251 252 Measures the probability error in discrete classification tasks in which each 253 class is independent and not mutually exclusive. For instance, one could 254 perform multilabel classification where a picture can contain both an elephant 255 and a dog at the same time. 256 257 For brevity, let `x = logits`, `z = targets`. The logistic loss is 258 259 z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)) 260 = z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x))) 261 = z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x))) 262 = z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)) 263 = (1 - z) * x + log(1 + exp(-x)) 264 = x - x * z + log(1 + exp(-x)) 265 266 To ensure stability and avoid overflow, the implementation uses 267 268 max(x, 0) - x * z + log(1 + exp(-abs(x))) 269 270 `logits` and `targets` must have the same type and shape. 271 272 Args: 273 logits: A `Tensor` of type `float32` or `float64`. 274 targets: A `Tensor` of the same type and shape as `logits`. 275 name: A name for the operation (optional). 276 277 Returns: 278 A `Tensor` of the same shape as `logits` with the componentwise 279 logistic losses. 280 281 Raises: 282 ValueError: If `logits` and `targets` do not have the same shape. 283 """ 284 with ops.op_scope([logits, targets], name, "logistic_loss") as name: 285 logits = ops.convert_to_tensor(logits, name="logits") 286 targets = ops.convert_to_tensor(targets, name="targets") 287 try: 288 targets.get_shape().merge_with(logits.get_shape()) 289 except ValueError: 290 raise ValueError( 291 "logits and targets must have the same shape (%s vs %s)" 292 % (logits.get_shape(), targets.get_shape())) 293 294 # The logistic loss formula from above is 295 # x - x * z + log(1 + exp(-x)) 296 # For x < 0, a more numerically stable formula is 297 # -x * z + log(1 + exp(x)) 298 # To avoid branching, we use the combined version 299 # max(x, 0) - x * z + log(1 + exp(-abs(x))) 300 return math_ops.add(nn_ops.relu(logits) - logits * targets, 301 math_ops.log(1 + math_ops.exp(-math_ops.abs(logits))), 302 name=name) 303 304 305 def weighted_cross_entropy_with_logits(logits, targets, pos_weight, 306 name=None): 307 """Computes a weighted cross entropy. 308 309 This is like `sigmoid_cross_entropy_with_logits()` except that `pos_weight`, 310 allows one to trade off recall and precision by up- or down-weighting the 311 cost of a positive error relative to a negative error. 312 313 The usual cross-entropy cost is defined as: 314 315 targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits)) 316 317 The argument `pos_weight` is used as a multiplier for the positive targets: 318 319 targets * -log(sigmoid(logits)) * pos_weight + 320 (1 - targets) * -log(1 - sigmoid(logits)) 321 322 For brevity, let `x = logits`, `z = targets`, `q = pos_weight`. 323 The loss is: 324 325 qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)) 326 = qz * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x))) 327 = qz * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x))) 328 = qz * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)) 329 = (1 - z) * x + (qz + 1 - z) * log(1 + exp(-x)) 330 = (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x)) 331 332 Setting `l = (1 + (q - 1) * z)`, to ensure stability and avoid overflow, 333 the implementation uses 334 335 (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0)) 336 337 `logits` and `targets` must have the same type and shape. 338 339 Args: 340 logits: A `Tensor` of type `float32` or `float64`. 341 targets: A `Tensor` of the same type and shape as `logits`. 342 pos_weight: A coefficient to use on the positive examples. 343 name: A name for the operation (optional). 344 345 Returns: 346 A `Tensor` of the same shape as `logits` with the componentwise 347 weightedlogistic losses. 348 349 Raises: 350 ValueError: If `logits` and `targets` do not have the same shape. 351 """ 352 with ops.op_scope([logits, targets], name, "logistic_loss") as name: 353 logits = ops.convert_to_tensor(logits, name="logits") 354 targets = ops.convert_to_tensor(targets, name="targets") 355 try: 356 targets.get_shape().merge_with(logits.get_shape()) 357 except ValueError: 358 raise ValueError( 359 "logits and targets must have the same shape (%s vs %s)" 360 % (logits.get_shape(), targets.get_shape())) 361 362 # The logistic loss formula from above is 363 # (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x)) 364 # For x < 0, a more numerically stable formula is 365 # (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(x)) - l * x 366 # To avoid branching, we use the combined version 367 # (1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0)) 368 log_weight = 1 + (pos_weight - 1) * targets 369 return math_ops.add( 370 (1 - targets) * logits, 371 log_weight * (math_ops.log(1 + math_ops.exp(-math_ops.abs(logits))) + 372 nn_ops.relu(-logits)), 373 name=name) 374 375 376 def relu_layer(x, weights, biases, name=None): 377 """Computes Relu(x * weight + biases). 378 379 Args: 380 x: a 2D tensor. Dimensions typically: batch, in_units 381 weights: a 2D tensor. Dimensions typically: in_units, out_units 382 biases: a 1D tensor. Dimensions: out_units 383 name: A name for the operation (optional). If not specified 384 "nn_relu_layer" is used. 385 386 Returns: 387 A 2-D Tensor computing relu(matmul(x, weights) + biases). 388 Dimensions typically: batch, out_units. 389 """ 390 with ops.op_scope([x, weights, biases], name, "relu_layer") as name: 391 x = ops.convert_to_tensor(x, name="x") 392 weights = ops.convert_to_tensor(weights, name="weights") 393 biases = ops.convert_to_tensor(biases, name="biases") 394 xw_plus_b = nn_ops.bias_add(math_ops.matmul(x, weights), biases) 395 return nn_ops.relu(xw_plus_b, name=name) 396 397 398 def l2_normalize(x, dim, epsilon=1e-12, name=None): 399 """Normalizes along dimension `dim` using an L2 norm. 400 401 For a 1-D tensor with `dim = 0`, computes 402 403 output = x / sqrt(max(sum(x**2), epsilon)) 404 405 For `x` with more dimensions, independently normalizes each 1-D slice along 406 dimension `dim`. 407 408 Args: 409 x: A `Tensor`. 410 dim: Dimension along which to normalize. 411 epsilon: A lower bound value for the norm. Will use `sqrt(epsilon)` as the 412 divisor if `norm < sqrt(epsilon)`. 413 name: A name for this operation (optional). 414 415 Returns: 416 A `Tensor` with the same shape as `x`. 417 """ 418 with ops.op_scope([x], name, "l2_normalize") as name: 419 x = ops.convert_to_tensor(x, name="x") 420 square_sum = math_ops.reduce_sum(math_ops.square(x), [dim], keep_dims=True) 421 x_inv_norm = math_ops.rsqrt(math_ops.maximum(square_sum, epsilon)) 422 return math_ops.mul(x, x_inv_norm, name=name) 423 424 425 def zero_fraction(value, name=None): 426 """Returns the fraction of zeros in `value`. 427 428 If `value` is empty, the result is `nan`. 429 430 This is useful in summaries to measure and report sparsity. For example, 431 432 z = tf.Relu(...) 433 summ = tf.scalar_summary('sparsity', tf.nn.zero_fraction(z)) 434 435 Args: 436 value: A tensor of numeric type. 437 name: A name for the operation (optional). 438 439 Returns: 440 The fraction of zeros in `value`, with type `float32`. 441 """ 442 with ops.op_scope([value], name, "zero_fraction"): 443 value = ops.convert_to_tensor(value, name="value") 444 zero = constant_op.constant(0, dtype=value.dtype, name="zero") 445 return math_ops.reduce_mean(math_ops.cast(math_ops.equal(value, zero), 446 dtypes.float32)) 447 448 449 def depthwise_conv2d(input, filter, strides, padding, name=None): 450 """Depthwise 2-D convolution. 451 452 Given an input tensor of shape `[batch, in_height, in_width, in_channels]` 453 and a filter tensor of shape 454 `[filter_height, filter_width, in_channels, channel_multiplier]` 455 containing `in_channels` convolutional filters of depth 1, `depthwise_conv2d` 456 applies a different filter to each input channel (expanding from 1 channel 457 to `channel_multiplier` channels for each), then concatenates the results 458 together. The output has `in_channels * channel_multiplier` channels. 459 460 In detail, 461 462 output[b, i, j, k * channel_multiplier + q] = 463 sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, k] * 464 filter[di, dj, k, q] 465 466 Must have `strides[0] = strides[3] = 1`. For the most common case of the 467 same horizontal and vertical strides, `strides = [1, stride, stride, 1]`. 468 469 Args: 470 input: 4-D with shape `[batch, in_height, in_width, in_channels]`. 471 filter: 4-D with shape 472 `[filter_height, filter_width, in_channels, channel_multiplier]`. 473 strides: 1-D of size 4. The stride of the sliding window for each 474 dimension of `input`. 475 padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm. 476 name: A name for this operation (optional). 477 478 Returns: 479 A 4-D `Tensor` of shape 480 `[batch, out_height, out_width, in_channels * channel_multiplier].` 481 """ 482 with ops.op_scope([input, filter], name, "depthwise") as name: 483 input = ops.convert_to_tensor(input, name="tensor_in") 484 filter = ops.convert_to_tensor(filter, name="filter_in") 485 # A shape is required to statically compute the number of separable filters. 486 if filter.get_shape().ndims is not None: 487 assert len(filter.get_shape()) == 4 488 in_channels = filter.get_shape()[2] 489 # Sanity checks, if shape information is available for the inputs. 490 if input.get_shape().ndims is not None: 491 assert len(input.get_shape()) == 4 492 assert input.get_shape()[3] == in_channels, ( 493 "Mismatched input depth %d and number of depthwise filters %d." % ( 494 input.get_shape()[3].value, in_channels)) 495 else: 496 assert input.get_shape().ndims is not None, ( 497 "Either tensor must provide static shape information.") 498 assert input.get_shape().ndims == 4 499 in_channels = input.get_shape()[3] 500 501 if in_channels == 1: 502 return nn_ops.conv2d(input, filter, strides, padding, name=name) 503 else: 504 # Create one separate convolution per channel. 505 convs = [] 506 for channel in xrange(in_channels): 507 with ops.name_scope("depth%d" % channel) as channel_scope: 508 t_in = array_ops.slice(input, [0, 0, 0, channel], [-1, -1, -1, 1], 509 name="slice_inputs") 510 f_in = array_ops.slice(filter, [0, 0, channel, 0], [-1, -1, 1, -1], 511 name="slice_params") 512 convs.append(nn_ops.conv2d(t_in, f_in, 513 strides, padding, name=channel_scope)) 514 # Concatenate the per-channel convolutions along the channel dimension. 515 return array_ops.concat(3, convs, name=name) 516 517 518 def separable_conv2d(input, depthwise_filter, pointwise_filter, strides, 519 padding, 520 name=None): 521 """2-D convolution with separable filters. 522 523 Performs a depthwise convolution that acts separately on channels followed by 524 a pointwise convolution that mixes channels. Note that this is separability 525 between dimensions `[1, 2]` and `3`, not spatial separability between 526 dimensions `1` and `2`. 527 528 In detail, 529 530 output[b, i, j, k] = sum_{di, dj, q, r] 531 input[b, strides[1] * i + di, strides[2] * j + dj, q] * 532 depthwise_filter[di, dj, q, r] * 533 pointwise_filter[0, 0, q * channel_multiplier + r, k] 534 535 `strides` controls the strides for the depthwise convolution only, since 536 the pointwise convolution has implicit strides of `[1, 1, 1, 1]`. Must have 537 `strides[0] = strides[3] = 1`. For the most common case of the same 538 horizontal and vertical strides, `strides = [1, stride, stride, 1]`. 539 540 Args: 541 input: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`. 542 depthwise_filter: 4-D `Tensor` with shape 543 `[filter_height, filter_width, in_channels, channel_multiplier]`. 544 Contains `in_channels` convolutional filters of depth 1. 545 pointwise_filter: 4-D `Tensor` with shape 546 `[1, 1, channel_multiplier * in_channels, out_channels]`. Pointwise 547 filter to mix channels after `depthwise_filter` has convolved spatially. 548 strides: 1-D of size 4. The strides for the depthwise convolution for 549 each dimension of `input`. 550 padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm. 551 name: A name for this operation (optional). 552 553 Returns: 554 A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`. 555 """ 556 with ops.op_scope([input, depthwise_filter, pointwise_filter], 557 name, "separable_conv2d") as name: 558 input = ops.convert_to_tensor(input, name="tensor_in") 559 depthwise_filter = ops.convert_to_tensor(depthwise_filter, 560 name="depthwise_filter") 561 pointwise_filter = ops.convert_to_tensor(pointwise_filter, 562 name="pointwise_filter") 563 564 if pointwise_filter.get_shape().ndims is not None: 565 assert len(pointwise_filter.get_shape()) == 4 566 assert pointwise_filter.get_shape()[0] == 1 567 assert pointwise_filter.get_shape()[1] == 1 568 if depthwise_filter.get_shape().ndims and input.get_shape().ndims: 569 channel_multiplier = depthwise_filter.get_shape()[3] 570 in_channels = input.get_shape()[3] 571 out_channels = pointwise_filter.get_shape()[3] 572 # This would mean the separable convolutions is over-parametrized. 573 assert channel_multiplier * in_channels < out_channels 574 # The layout of the ops in the graph are expected to be as follows: 575 # separable_conv2d // Conv2D op corresponding to the pointwise conv. 576 # separable_conv2d/depthwise // Concat op for the deptwise outputs. 577 # separable_conv2d/depthwise/depth0 // Conv2D op for depth 0 578 # separable_conv2d/depthwise/depth1 // Conv2D op for depth 1 579 # separable_conv2d/depthwise/depth2 // Conv2D op for depth 2 580 depthwise = depthwise_conv2d(input, depthwise_filter, strides, 581 padding, name="depthwise") 582 return nn_ops.conv2d(depthwise, pointwise_filter, [1, 1, 1, 1], 583 padding="VALID", name=name) 584 585 586 def sufficient_statistics(x, axes, shift=True, keep_dims=False, name=None): 587 """Calculate the sufficient statistics for the mean and variance of `x`. 588 589 These sufficient statistics are computed using the one pass algorithm on 590 an input that's optionally shifted using the value of the 1st element in `x`. 591 See: 592 https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data 593 594 Args: 595 x: A `Tensor`. 596 axes: Array of ints. Axes along which to compute mean and variance. 597 shift: If true, shift the data to provide more numerically stable results. 598 keep_dims: produce statistics with the same dimensionality as the input. 599 name: Name used to scope the operations that compute the sufficient stats. 600 601 Returns: 602 Four `Tensor` objects of the same type as `x`: 603 * the count (number of elements to average over). 604 * the (possibly shifted) sum of the elements in the array. 605 * the (possibly shifted) sum of squares of the elements in the array. 606 * the shift by which the mean must be corrected or None if `shift` is False. 607 """ 608 with ops.op_scope([x, axes], name, "sufficient_statistics"): 609 x = ops.convert_to_tensor(x, name="x") 610 x_shape = x.get_shape() 611 if x_shape.is_fully_defined(): 612 counts = 1 613 m_shape = [] 614 for d in xrange(x_shape.ndims): 615 dim = x_shape[d].value 616 if d in set(axes): 617 counts *= dim 618 dim = 1 619 m_shape.append(dim) 620 counts = constant_op.constant(counts, dtype=x.dtype) 621 else: # shape needs to be inferred at runtime. 622 x_shape = array_ops.shape(x) 623 select_axes = sparse_ops.sparse_to_dense(axes, array_ops.shape(x_shape), 624 True, False) 625 m_shape = math_ops.select(select_axes, array_ops.ones_like(x_shape), 626 x_shape) 627 counts = math_ops.cast( 628 math_ops.reduce_prod(x_shape / m_shape), 629 x.dtype, 630 name="count") 631 if shift: 632 shift_value = array_ops.slice(x, array_ops.zeros_like(m_shape), m_shape) 633 m_ss = math_ops.sub(x, shift_value) 634 v_ss = math_ops.squared_difference(x, shift_value) 635 if keep_dims: 636 shift_value = array_ops.identity(shift_value, name="shift") 637 else: 638 shift_value = array_ops.squeeze(shift_value, 639 squeeze_dims=axes, 640 name="shift") 641 else: # not shift. 642 m_ss = x 643 v_ss = math_ops.square(x) 644 shift_value = None 645 m_ss = math_ops.reduce_sum(m_ss, axes, keep_dims=keep_dims, name="mean_ss") 646 v_ss = math_ops.reduce_sum(v_ss, axes, keep_dims=keep_dims, name="var_ss") 647 return counts, m_ss, v_ss, shift_value 648 649 650 def normalize_moments(counts, mean_ss, variance_ss, shift, name=None): 651 """Calculate the mean and variance of based on the sufficient statistics. 652 653 Args: 654 counts: A `Tensor` containing a the total count of the data (one value). 655 mean_ss: A `Tensor` containing the mean sufficient statistics: the (possibly 656 shifted) sum of the elements to average over. 657 variance_ss: A `Tensor` containing the variance sufficient statistics: the 658 (possibly shifted) squared sum of the data to compute the variance over. 659 shift: A `Tensor` containing the value by which the data is shifted for 660 numerical stability, or `None` if no shift was performed. 661 name: Name used to scope the operations that compute the moments. 662 663 Returns: 664 Two `Tensor` objects: `mean` and `variance`. 665 """ 666 with ops.op_scope([counts, mean_ss, variance_ss, shift], name, "normalize"): 667 divisor = math_ops.inv(counts, name="divisor") 668 if shift is not None: 669 shifted_mean = math_ops.mul(mean_ss, divisor, name="shifted_mean") 670 mean = math_ops.add(shifted_mean, shift, name="mean") 671 else: # no shift. 672 shifted_mean = math_ops.mul(mean_ss, divisor, name="mean") 673 mean = shifted_mean 674 variance = math_ops.sub( 675 math_ops.mul(variance_ss, divisor), 676 math_ops.square(shifted_mean), 677 name="variance") 678 return (mean, variance) 679 680 681 def moments(x, axes, name=None, keep_dims=False): 682 """Calculate the mean and variance of `x`. 683 684 The mean and variance are calculated by aggregating the contents of `x` 685 across `axes`. If `x` is 1-D and `axes = [0]` this is just the mean 686 and variance of a vector. 687 688 When using these moments for batch normalization (see 689 `tf.nn.batch_normalization`): 690 * for so-called "global normalization", used with convolutional filters with 691 shape `[batch, height, width, depth]`, pass `axes=[0, 1, 2]`. 692 * for simple batch normalization pass `axes=[0]` (batch only). 693 694 Args: 695 x: A `Tensor`. 696 axes: array of ints. Axes along which to compute mean and 697 variance. 698 keep_dims: produce moments with the same dimensionality as the input. 699 name: Name used to scope the operations that compute the moments. 700 701 Returns: 702 Two `Tensor` objects: `mean` and `variance`. 703 """ 704 with ops.op_scope([x, axes], name, "moments"): 705 counts, m_ss, v_ss, shift = sufficient_statistics(x, 706 axes, 707 keep_dims=keep_dims, 708 name=name) 709 return normalize_moments(counts, m_ss, v_ss, shift, name=name) 710 711 712 def batch_normalization(x, 713 mean, 714 variance, 715 offset, 716 scale, 717 variance_epsilon, 718 name=None): 719 """Batch normalization. 720 721 As described in http://arxiv.org/abs/1502.03167. 722 Normalizes a tensor by `mean` and `variance`, and applies (optionally) a 723 `scale` \\(gamma\\) to it, as well as an `offset` \\(\beta\\): 724 725 \\(\frac{gamma(x-mu)}{sigma}+\beta\\) 726 727 `mean`, `variance`, `offset` and `scale` are all expected to be of one of two 728 shapes: 729 * In all generality, they can have the same number of dimensions as the 730 input `x`, with identical sizes as `x` for the dimensions that are not 731 normalized over (the 'depth' dimension(s)), and dimension 1 for the 732 others which are being normalized over. 733 `mean` and `variance` in this case would typically be the outputs of 734 `tf.nn.moments(..., keep_dims=True)` during training, or running averages 735 thereof during inference. 736 * In the common case where the 'depth' dimension is the last dimension in 737 the input tensor `x`, they may be one dimensional tensors of the same 738 size as the 'depth' dimension. 739 This is the case for example for the common `[batch, depth]` layout of 740 fully-connected layers, and `[batch, height, width, depth]` for 741 convolutions. 742 `mean` and `variance` in this case would typically be the outputs of 743 `tf.nn.moments(..., keep_dims=False)` during training, or running averages 744 thereof during inference. 745 746 Args: 747 x: Input `Tensor` of arbitrary dimensionality. 748 mean: A mean `Tensor`. 749 variance: A variance `Tensor`. 750 offset: An offset `Tensor`, often denoted \\(\beta\\) in equations, or 751 None. If present, will be added to the normalized tensor. 752 scale: A scale `Tensor`, often denoted \\(gamma\\) in equations, or 753 `None`. If present, the scale is applied to the normalized tensor. 754 variance_epsilon: A small float number to avoid dividing by 0. 755 name: A name for this operation (optional). 756 757 Returns: 758 the normalized, scaled, offset tensor. 759 """ 760 with ops.op_scope([x, mean, variance, scale, offset], name, "batchnorm"): 761 inv = math_ops.rsqrt(variance + variance_epsilon) 762 if scale is not None: 763 inv *= scale 764 return x * inv + ( 765 offset - mean * inv if offset is not None else -mean * inv) 766 767 768 def batch_norm_with_global_normalization(t, 769 m, 770 v, 771 beta, 772 gamma, 773 variance_epsilon, 774 scale_after_normalization, 775 name=None): 776 """Batch normalization. 777 778 This op is deprecated. See `tf.nn.batch_normalization`. 779 780 Args: 781 t: A 4D input Tensor. 782 m: A 1D mean Tensor with size matching the last dimension of t. 783 This is the first output from tf.nn.moments, 784 or a saved moving average thereof. 785 v: A 1D variance Tensor with size matching the last dimension of t. 786 This is the second output from tf.nn.moments, 787 or a saved moving average thereof. 788 beta: A 1D beta Tensor with size matching the last dimension of t. 789 An offset to be added to the normalized tensor. 790 gamma: A 1D gamma Tensor with size matching the last dimension of t. 791 If "scale_after_normalization" is true, this tensor will be multiplied 792 with the normalized tensor. 793 variance_epsilon: A small float number to avoid dividing by 0. 794 scale_after_normalization: A bool indicating whether the resulted tensor 795 needs to be multiplied with gamma. 796 name: A name for this operation (optional). 797 798 Returns: 799 A batch-normalized `t`. 800 """ 801 return batch_normalization(t, m, v, beta, gamma if scale_after_normalization 802 else None, variance_epsilon, name) 803 804 805 def _sum_rows(x): 806 """Returns a vector summing up each row of the matrix x.""" 807 # _sum_rows(x) is equivalent to math_ops.reduce_sum(x, 1) when x is 808 # a matrix. The gradient of _sum_rows(x) is more efficient than 809 # reduce_sum(x, 1)'s gradient in today's implementation. Therefore, 810 # we use _sum_rows(x) in the nce_loss() computation since the loss 811 # is mostly used for training. 812 cols = array_ops.shape(x)[1] 813 ones_shape = array_ops.pack([cols, 1]) 814 ones = array_ops.ones(ones_shape, x.dtype) 815 return array_ops.reshape(math_ops.matmul(x, ones), [-1]) 816 817 818 def _compute_sampled_logits(weights, biases, inputs, labels, num_sampled, 819 num_classes, num_true=1, 820 sampled_values=None, 821 subtract_log_q=True, 822 remove_accidental_hits=False, 823 partition_strategy="mod", 824 name=None): 825 """Helper function for nce_loss and sampled_softmax_loss functions. 826 827 Computes sampled output training logits and labels suitable for implementing 828 e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see 829 sampled_softmax_loss). 830 831 Note: In the case where num_true > 1, we assign to each target class 832 the target probability 1 / num_true so that the target probabilities 833 sum to 1 per-example. 834 835 Args: 836 weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor` 837 objects whose concatenation along dimension 0 has shape 838 `[num_classes, dim]`. The (possibly-partitioned) class embeddings. 839 biases: A `Tensor` of shape `[num_classes]`. The class biases. 840 inputs: A `Tensor` of shape `[batch_size, dim]`. The forward 841 activations of the input network. 842 labels: A `Tensor` of type `int64` and shape `[batch_size, 843 num_true]`. The target classes. Note that this format differs from 844 the `labels` argument of `nn.softmax_cross_entropy_with_logits`. 845 num_sampled: An `int`. The number of classes to randomly sample per batch. 846 num_classes: An `int`. The number of possible classes. 847 num_true: An `int`. The number of target classes per training example. 848 sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`, 849 `sampled_expected_count`) returned by a `*_candidate_sampler` function. 850 (if None, we default to `log_uniform_candidate_sampler`) 851 subtract_log_q: A `bool`. whether to subtract the log expected count of 852 the labels in the sample to get the logits of the true labels. 853 Default is True. Turn off for Negative Sampling. 854 remove_accidental_hits: A `bool`. whether to remove "accidental hits" 855 where a sampled class equals one of the target classes. Default is 856 False. 857 partition_strategy: A string specifying the partitioning strategy, relevant 858 if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported. 859 Default is `"mod"`. See `tf.nn.embedding_lookup` for more details. 860 name: A name for the operation (optional). 861 Returns: 862 out_logits, out_labels: `Tensor` objects each with shape 863 `[batch_size, num_true + num_sampled]`, for passing to either 864 `nn.sigmoid_cross_entropy_with_logits` (NCE) or 865 `nn.softmax_cross_entropy_with_logits` (sampled softmax). 866 """ 867 868 if not isinstance(weights, list): 869 weights = [weights] 870 871 with ops.op_scope( 872 weights + [biases, inputs, labels], name, "compute_sampled_logits"): 873 if labels.dtype != dtypes.int64: 874 labels = math_ops.cast(labels, dtypes.int64) 875 labels_flat = array_ops.reshape(labels, [-1]) 876 877 # Sample the negative labels. 878 # sampled shape: [num_sampled] tensor 879 # true_expected_count shape = [batch_size, 1] tensor 880 # sampled_expected_count shape = [num_sampled] tensor 881 if sampled_values is None: 882 sampled_values = candidate_sampling_ops.log_uniform_candidate_sampler( 883 true_classes=labels, 884 num_true=num_true, 885 num_sampled=num_sampled, 886 unique=True, 887 range_max=num_classes) 888 # NOTE: pylint cannot tell that 'sampled_values' is a sequence 889 # pylint: disable=unpacking-non-sequence 890 sampled, true_expected_count, sampled_expected_count = sampled_values 891 # pylint: enable=unpacking-non-sequence 892 893 # labels_flat is a [batch_size * num_true] tensor 894 # sampled is a [num_sampled] int tensor 895 all_ids = array_ops.concat(0, [labels_flat, sampled]) 896 897 # weights shape is [num_classes, dim] 898 all_w = embedding_ops.embedding_lookup( 899 weights, all_ids, partition_strategy=partition_strategy) 900 all_b = embedding_ops.embedding_lookup(biases, all_ids) 901 # true_w shape is [batch_size * num_true, dim] 902 # true_b is a [batch_size * num_true] tensor 903 true_w = array_ops.slice( 904 all_w, [0, 0], array_ops.pack([array_ops.shape(labels_flat)[0], -1])) 905 true_b = array_ops.slice(all_b, [0], array_ops.shape(labels_flat)) 906 907 # inputs shape is [batch_size, dim] 908 # true_w shape is [batch_size * num_true, dim] 909 # row_wise_dots is [batch_size, num_true, dim] 910 dim = array_ops.shape(true_w)[1:2] 911 new_true_w_shape = array_ops.concat(0, [[-1, num_true], dim]) 912 row_wise_dots = math_ops.mul( 913 array_ops.expand_dims(inputs, 1), 914 array_ops.reshape(true_w, new_true_w_shape)) 915 # We want the row-wise dot plus biases which yields a 916 # [batch_size, num_true] tensor of true_logits. 917 dots_as_matrix = array_ops.reshape(row_wise_dots, 918 array_ops.concat(0, [[-1], dim])) 919 true_logits = array_ops.reshape(_sum_rows(dots_as_matrix), [-1, num_true]) 920 true_b = array_ops.reshape(true_b, [-1, num_true]) 921 true_logits += true_b 922 923 # Lookup weights and biases for sampled labels. 924 # sampled_w shape is [num_sampled, dim] 925 # sampled_b is a [num_sampled] float tensor 926 sampled_w = array_ops.slice( 927 all_w, array_ops.pack([array_ops.shape(labels_flat)[0], 0]), [-1, -1]) 928 sampled_b = array_ops.slice(all_b, array_ops.shape(labels_flat), [-1]) 929 930 # inputs has shape [batch_size, dim] 931 # sampled_w has shape [num_sampled, dim] 932 # sampled_b has shape [num_sampled] 933 # Apply X*W'+B, which yields [batch_size, num_sampled] 934 sampled_logits = math_ops.matmul(inputs, 935 sampled_w, 936 transpose_b=True) + sampled_b 937 938 if remove_accidental_hits: 939 acc_hits = candidate_sampling_ops.compute_accidental_hits( 940 labels, sampled, num_true=num_true) 941 acc_indices, acc_ids, acc_weights = acc_hits 942 943 # This is how SparseToDense expects the indices. 944 acc_indices_2d = array_ops.reshape(acc_indices, [-1, 1]) 945 acc_ids_2d_int32 = array_ops.reshape(math_ops.cast( 946 acc_ids, dtypes.int32), [-1, 1]) 947 sparse_indices = array_ops.concat( 948 1, [acc_indices_2d, acc_ids_2d_int32], "sparse_indices") 949 # Create sampled_logits_shape = [batch_size, num_sampled] 950 sampled_logits_shape = array_ops.concat( 951 0, 952 [array_ops.shape(labels)[:1], array_ops.expand_dims(num_sampled, 0)]) 953 if sampled_logits.dtype != acc_weights.dtype: 954 acc_weights = math_ops.cast(acc_weights, sampled_logits.dtype) 955 sampled_logits += sparse_ops.sparse_to_dense( 956 sparse_indices, sampled_logits_shape, acc_weights, 957 default_value=0.0, validate_indices=False) 958 959 if subtract_log_q: 960 # Subtract log of Q(l), prior probability that l appears in sampled. 961 true_logits -= math_ops.log(true_expected_count) 962 sampled_logits -= math_ops.log(sampled_expected_count) 963 964 # Construct output logits and labels. The true labels/logits start at col 0. 965 out_logits = array_ops.concat(1, [true_logits, sampled_logits]) 966 # true_logits is a float tensor, ones_like(true_logits) is a float tensor 967 # of ones. We then divide by num_true to ensure the per-example labels sum 968 # to 1.0, i.e. form a proper probability distribution. 969 out_labels = array_ops.concat( 970 1, [array_ops.ones_like(true_logits) / num_true, 971 array_ops.zeros_like(sampled_logits)]) 972 973 return out_logits, out_labels 974 975 976 def nce_loss(weights, biases, inputs, labels, num_sampled, num_classes, 977 num_true=1, 978 sampled_values=None, 979 remove_accidental_hits=False, 980 partition_strategy="mod", 981 name="nce_loss"): 982 """Computes and returns the noise-contrastive estimation training loss. 983 984 See [Noise-contrastive estimation: A new estimation principle for 985 unnormalized statistical models] 986 (http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf). 987 Also see our [Candidate Sampling Algorithms Reference] 988 (../../extras/candidate_sampling.pdf) 989 990 Note: In the case where `num_true` > 1, we assign to each target class 991 the target probability 1 / `num_true` so that the target probabilities 992 sum to 1 per-example. 993 994 Note: It would be useful to allow a variable number of target classes per 995 example. We hope to provide this functionality in a future release. 996 For now, if you have a variable number of target classes, you can pad them 997 out to a constant number by either repeating them or by padding 998 with an otherwise unused class. 999 1000 Args: 1001 weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor` 1002 objects whose concatenation along dimension 0 has shape 1003 [num_classes, dim]. The (possibly-partitioned) class embeddings. 1004 biases: A `Tensor` of shape `[num_classes]`. The class biases. 1005 inputs: A `Tensor` of shape `[batch_size, dim]`. The forward 1006 activations of the input network. 1007 labels: A `Tensor` of type `int64` and shape `[batch_size, 1008 num_true]`. The target classes. 1009 num_sampled: An `int`. The number of classes to randomly sample per batch. 1010 num_classes: An `int`. The number of possible classes. 1011 num_true: An `int`. The number of target classes per training example. 1012 sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`, 1013 `sampled_expected_count`) returned by a `*_candidate_sampler` function. 1014 (if None, we default to `log_uniform_candidate_sampler`) 1015 remove_accidental_hits: A `bool`. Whether to remove "accidental hits" 1016 where a sampled class equals one of the target classes. If set to 1017 `True`, this is a "Sampled Logistic" loss instead of NCE, and we are 1018 learning to generate log-odds instead of log probabilities. See 1019 our [Candidate Sampling Algorithms Reference] 1020 (../../extras/candidate_sampling.pdf). 1021 Default is False. 1022 partition_strategy: A string specifying the partitioning strategy, relevant 1023 if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported. 1024 Default is `"mod"`. See `tf.nn.embedding_lookup` for more details. 1025 name: A name for the operation (optional). 1026 1027 Returns: 1028 A `batch_size` 1-D tensor of per-example NCE losses. 1029 """ 1030 logits, labels = _compute_sampled_logits( 1031 weights, biases, inputs, labels, num_sampled, num_classes, 1032 num_true=num_true, 1033 sampled_values=sampled_values, 1034 subtract_log_q=True, 1035 remove_accidental_hits=remove_accidental_hits, 1036 partition_strategy=partition_strategy, 1037 name=name) 1038 sampled_losses = sigmoid_cross_entropy_with_logits(logits, 1039 labels, 1040 name="sampled_losses") 1041 # sampled_losses is batch_size x {true_loss, sampled_losses...} 1042 # We sum out true and sampled losses. 1043 return _sum_rows(sampled_losses) 1044 1045 1046 def sampled_softmax_loss(weights, biases, inputs, labels, num_sampled, 1047 num_classes, num_true=1, 1048 sampled_values=None, 1049 remove_accidental_hits=True, 1050 partition_strategy="mod", 1051 name="sampled_softmax_loss"): 1052 """Computes and returns the sampled softmax training loss. 1053 1054 This is a faster way to train a softmax classifier over a huge number of 1055 classes. 1056 1057 This operation is for training only. It is generally an underestimate of 1058 the full softmax loss. 1059 1060 At inference time, you can compute full softmax probabilities with the 1061 expression `tf.nn.softmax(tf.matmul(inputs, weights) + biases)`. 1062 1063 See our [Candidate Sampling Algorithms Reference] 1064 (../../extras/candidate_sampling.pdf) 1065 1066 Also see Section 3 of [Jean et al., 2014](http://arxiv.org/abs/1412.2007) 1067 ([pdf](http://arxiv.org/pdf/1412.2007.pdf)) for the math. 1068 1069 Args: 1070 weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor` 1071 objects whose concatenation along dimension 0 has shape 1072 [num_classes, dim]. The (possibly-sharded) class embeddings. 1073 biases: A `Tensor` of shape `[num_classes]`. The class biases. 1074 inputs: A `Tensor` of shape `[batch_size, dim]`. The forward 1075 activations of the input network. 1076 labels: A `Tensor` of type `int64` and shape `[batch_size, 1077 num_true]`. The target classes. Note that this format differs from 1078 the `labels` argument of `nn.softmax_cross_entropy_with_logits`. 1079 num_sampled: An `int`. The number of classes to randomly sample per batch. 1080 num_classes: An `int`. The number of possible classes. 1081 num_true: An `int`. The number of target classes per training example. 1082 sampled_values: a tuple of (`sampled_candidates`, `true_expected_count`, 1083 `sampled_expected_count`) returned by a `*_candidate_sampler` function. 1084 (if None, we default to `log_uniform_candidate_sampler`) 1085 remove_accidental_hits: A `bool`. whether to remove "accidental hits" 1086 where a sampled class equals one of the target classes. Default is 1087 True. 1088 partition_strategy: A string specifying the partitioning strategy, relevant 1089 if `len(weights) > 1`. Currently `"div"` and `"mod"` are supported. 1090 Default is `"mod"`. See `tf.nn.embedding_lookup` for more details. 1091 name: A name for the operation (optional). 1092 1093 Returns: 1094 A `batch_size` 1-D tensor of per-example sampled softmax losses.
logits, labels = _compute_sampled_logits( weights, biases, inputs, labels, num_sampled, num_classes, num_true=num_true, sampled_values=sampled_values, subtract_log_q=True, remove_accidental_hits=remove_accidental_hits, partition_strategy=partition_strategy, name=name) sampled_losses = nn_ops.softmax_cross_entropy_with_logits(logits, labels) # sampled_losses is a [batch_size] tensor. return sampled_losses # TODO(cwhipkey): sigmoid and tanh should not be exposed from tf.nn. __all__ = make_all(__name__) __all__.append("zero_fraction") # documented in training.py # Modules whitelisted for reference through tf.nn. # TODO(cwhipkey): migrate callers to use the submodule directly. __all__.extend(["nn_ops", "rnn_cell", "seq2seq"]) # Symbols whitelisted for export without documentation. # TODO(cwhipkey): review these and move to contrib or expose through # documentation. __all__.extend([ "all_candidate_sampler", "batch_norm_with_global_normalization", "batch_normalization", "bidirectional_rnn", "conv2d_backprop_filter", "conv2d_backprop_input", "depthwise_conv2d_native", "dynamic_rnn", "lrn", "relu_layer", "rnn", "state_saving_rnn", "xw_plus_b", ])
如果还有问题未能得到解决,搜索887934385交流群,进入后下载资料工具安装包等。最后,感谢观看!