• #mxnet# Backpropagation through a model with multiple grouped symbol outputs


    The original problem

    I need to run a custom backward pass on a model whose outputs have been grouped with mx.sym.Group. The grouped symbol contains both outputs wrapped in MakeLoss and raw outputs (e.g. the output of a conv layer), schematically:

    modG = Module( mx.sym.Group( [data_A, data_B, loss_A, loss_B] ) )
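
    For reference, a minimal sketch of how such a grouped module might be put together (my own reconstruction with a recent MXNet, not the original network; the layers and shapes are assumptions chosen only to match the output shapes that show up further down):

    import mxnet as mx

    data  = mx.sym.Variable('data')
    label = mx.sym.Variable('label')

    # two raw heads (plain conv outputs) ...
    data_A = mx.sym.Convolution(data, num_filter=3, kernel=(3, 3), pad=(1, 1), name='data_A')
    data_B = mx.sym.Convolution(data, num_filter=1, kernel=(3, 3), pad=(1, 1), name='data_B')
    # ... and two scalar losses wrapped in MakeLoss
    loss_A = mx.sym.MakeLoss(mx.sym.mean(mx.sym.square(data_A - label)), name='loss_A')
    loss_B = mx.sym.MakeLoss(mx.sym.mean(mx.sym.abs(data_B)), name='loss_B')

    modG = mx.mod.Module(mx.sym.Group([data_A, data_B, loss_A, loss_B]),
                         data_names=('data',), label_names=('label',), context=mx.cpu())
    modG.bind(data_shapes=[('data', (2, 3, 64, 64))],
              label_shapes=[('label', (2, 3, 64, 64))],
              for_training=True)        # backward() needs for_training=True
    modG.init_params()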
    

    Now I need to call backward on it. Clearly gradients have to be supplied for data_A and data_B, but how, and in what form?

    Looking at the implementation:

    #python/mxnet/module/module.py
    #  --->
    # python/build/lib.linux-i686-2.7/mxnet/module/executor_group.py
        def backward(self, out_grads=None):
            assert self.for_training, 're-bind with for_training=True to run backward'
            if out_grads is None:
                out_grads = []
    
            for i, (exec_, islice) in enumerate(zip(self.execs, self.slices)):                                                                    
                out_grads_slice = []
                for grad, axis in zip(out_grads, self.output_layouts):
                    if axis >= 0:
                        # pylint: disable=no-member
                        og_my_slice = nd.slice_axis(grad, axis=axis, begin=islice.start,
                                                    end=islice.stop)
                        # pylint: enable=no-member
                        out_grads_slice.append(og_my_slice.as_in_context(self.contexts[i]))
                    else:
                        out_grads_slice.append(grad.copyto(self.contexts[i]))
    
                exec_.backward(out_grads=out_grads_slice)
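
    The loop above just cuts out each device's share of the batch from every head gradient with nd.slice_axis, along the axis recorded in output_layouts. A toy illustration of that single step (mine, not from the original notes):

    import mxnet as mx

    grad = mx.nd.arange(12).reshape((4, 3))      # pretend head gradient, batch size 4
    islice = slice(0, 2)                         # this device owns samples 0 and 1
    og_my_slice = mx.nd.slice_axis(grad, axis=0, begin=islice.start, end=islice.stop)
    print(og_my_slice.shape)                     # (2, 3)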
    

    The relevant variables to inspect:

    modG._exec_group.slices
    #[slice(0, 2, None)]
    
    modG._exec_group.output_layouts
    #[0, 0, 0, 0]
    
    modG._exec_group.execs
    #[<mxnet.executor.Executor object at 0xb130d8c>]
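
    To get oriented (a check of my own, not part of the original notes): output_layouts holds one entry per output of the Group, namely the batch axis along which that output, and its head gradient, is split across devices; slices and execs hold one entry per device. With a single context and a batch of 2, that is exactly the lone slice(0, 2, None) above.

    # my own sanity check for the single-device bind with batch size 2
    assert len(modG._exec_group.output_layouts) == len(modG.symbol.list_outputs())   # one batch axis per output
    assert len(modG._exec_group.slices) == len(modG._exec_group.execs) == 1          # one batch slice per device
    print(modG._exec_group.slices[0])     # slice(0, 2, None): this (only) device owns samples 0 and 1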
    

    Some of this is still murky, but the idea seems to be that all required gradients go into out_grads. Do loss_A and loss_B need placeholder entries as well? I gave it a try, and the following goes through:

    outG = modG.get_outputs()
    grad_1=mx.nd.zeros(outG[1].shape)
    grad_for_G=diffD+[grad_1]         #   diffD ~ mx.nd.zeros(outG[0].shape)
    grad_for_G
    #[<NDArray 2x3x64x64 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>]
    modG.backward(grad_for_G)
    

    So I dug further into what exec_.backward(out_grads=out_grads_slice) actually does:

    //src/executor/graph_executor.cc
    void GraphExecutor::Backward(const std::vector<NDArray>& head_grads) {
      const auto& idx = graph_.indexed_graph();
      if (num_forward_inputs_ != idx.input_nodes().size()) {
        for (size_t i = 0; i < head_grad_array_.size(); ++i) {
          if (!head_grad_array_[i].is_none()) {
            CHECK(i < head_grads.size() && !head_grads[i].is_none())
                << "Because the last operator is not Loss function, "
                << "head_gradient is required in calling backward.";
            CopyFromTo(head_grads[i], &(head_grad_array_[i]));
          }   
        }   
      }
      RunOps(true, num_forward_nodes_, idx.num_nodes());
    }
    

    So the decision hinges on whether gradient storage was allocated for that output: a head gradient is only demanded when head_grad_array_[i] is not none. This topic came up earlier as well. In other words, something in MakeLoss must be what ties it to grad_req. I went through src/operator/make_loss-inl.h without spotting anything special at first, and then remembered that an operator declares its backward dependencies when it is registered:

    // src/operator/make_loss-inl.h
      std::vector<int> DeclareBackwardDependency(
          const std::vector<int> &out_grad,
          const std::vector<int> &in_data,
          const std::vector<int> &out_data) const override {
        if (param_.normalization == make_loss_enum::kValid) {
          return {in_data[make_loss_enum::kData]};
        }   
        return {}; 
      }
    

    This has to be the place: MakeLoss never declares a dependency on out_grad, so no head-gradient storage is allocated for its output and the check above simply skips it.
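
    A quick way to see the consequence (a standalone sketch of my own, with a recent MXNet, not from the original notes): a head wrapped in MakeLoss can be backpropagated with no head gradient at all, while a raw head trips exactly the check from GraphExecutor::Backward unless one is supplied.

    import mxnet as mx

    x = mx.sym.Variable('x')

    # MakeLoss head: no head gradient needed
    loss = mx.sym.MakeLoss(mx.sym.mean(mx.sym.square(x)))
    exe = loss.simple_bind(ctx=mx.cpu(), x=(2, 3))
    exe.forward(is_train=True)
    exe.backward()                                   # fine
    print(exe.grad_dict['x'].shape)                  # (2, 3)

    # raw head: calling exe2.backward() with no out_grads would raise
    # "Because the last operator is not Loss function, head_gradient is required ..."
    raw = 2 * x
    exe2 = raw.simple_bind(ctx=mx.cpu(), x=(2, 3))
    exe2.forward(is_train=True)
    exe2.backward(out_grads=[mx.nd.ones((2, 3))])    # head gradient supplied explicitly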

    Followup

    The immediate follow-up question: what if the symbols in modG are grouped in a different order?

    modG = Module( mx.sym.Group( [data_A, loss_A, data_B, loss_B] ) )
    

    src/executor/graph_executor.cc的程序来看,应该使两者对齐,但如何通过python/mxnet/module/module.py?
    进一步,slicesoutput_layouts的变化应该被理解。

    outG=modG.get_outputs()
    outG
    # [<NDArray 2x3x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>]
    modG._exec_group.output_layouts
    # [0, 0, 0, 0]
    modG._exec_group.slices
    # [slice(0, 2, None)]
    

    No change at all... (⊙﹏⊙)b Awkward. How do we get past this?

    grad_2=mx.nd.zeros(outG[2].shape)
    grad_for_G=diffD+[grad_2] 
    modG.backward(grad_for_G)
    [17:24:14] /home/chen-k/mxnet/dmlc-core/include/dmlc/./logging.h:300: [17:24:14] src/executor/graph_executor.cc:44: Check failed: i < head_grads.size() && !head_grads[i].is_none() Because the last operator is not Loss function, head_gradient is required in calling backward.
    ...
    

    Clearly the check noticed that head gradient #2 (data_B's slot in the new ordering) was never supplied. Fill the gap with None, then:

    grad_for_G=diffD+[None,grad_2] 
    modG.backward(grad_for_G)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/module.py", line 465, in backward
        self._exec_group.backward(out_grads=out_grads)
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/executor_group.py", line 405, in backward
        end=islice.stop)
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 131, in generic_ndarray_function
        c_array(ctypes.c_char_p, [c_str(str(i)) for i in kwargs.values()])))
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/base.py", line 75, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: Invalid Parameter format for axis expect int but value='None', in operator slice_axis(name="", axis="None", end="2", begin="0")
    

    That fails too: the None travels all the way into the slice_axis call in executor_group.py and blows up there. Let's try padding with an arbitrary array instead:

    grad_for_G=diffD+[grad_2,grad_2] 
    modG.backward(grad_for_G)
    

    It goes through... In hindsight this fits the C++ code above: GraphExecutor::Backward only checks and copies head_grads[i] at positions where head_grad_array_[i] was allocated, i.e. the non-loss heads. Whatever sits at a loss position is never read; it just has to keep the positions aligned and survive the per-device slice_axis, which is why a batch-sized dummy like grad_2 works while None does not.

    Unsolved

    What still nags me is that python/mxnet/module/module.py apparently already has a mechanism for this (presumably by putting axis < 0 into output_layouts), yet output_layouts and slices were identical in both experiments. Why? Frustrating. I'll leave it here for now and get the immediate problem solved first.
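
    As a stopgap, the experiments above suggest a recipe: pass one entry per output, in exactly the order of get_outputs(), and put a dummy at every loss position. A helper sketch along those lines (my own, based on the reading of GraphExecutor::Backward above; make_head_grads and its arguments are hypothetical, not an MXNet API):

    import mxnet as mx

    def make_head_grads(mod, real_grads, loss_positions, batch_size):
        """Build a full-length out_grads list for mod.backward().

        real_grads     : dict {output index: NDArray} with gradients for the non-loss heads
        loss_positions : indices of the MakeLoss outputs; GraphExecutor::Backward never reads
                         those entries, but executor_group still slices them along the batch
                         axis, so the dummy needs batch_size rows.
        """
        dummy = mx.nd.zeros((batch_size, 1))
        return [dummy if i in loss_positions else real_grads[i]
                for i in range(len(mod.get_outputs()))]

    # with the reordered Group [data_A, loss_A, data_B, loss_B] and batch size 2:
    # modG.backward(make_head_grads(modG, {0: diffD[0], 2: grad_2}, {1, 3}, batch_size=2))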
