Exception stack traces in Dubbo

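1. Symptom:

When a Dubbo provider throws an exception, the stack trace that the Dubbo consumer prints contains only provider-side frames. This is because an exception's stackTrace is captured when the exception is constructed, that is, when new Throwable() runs.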

    /**
     * Constructs a new throwable with {@code null} as its detail message.
     * The cause is not initialized, and may subsequently be initialized by a
     * call to {@link #initCause}.
     *
     * <p>The {@link #fillInStackTrace()} method is called to initialize
     * the stack trace data in the newly created throwable.
     */
    public Throwable() {
        fillInStackTrace();
    }

    /**
     * Fills in the execution stack trace. This method records within this
     * {@code Throwable} object information about the current state of
     * the stack frames for the current thread.
     *
     * <p>If the stack trace of this {@code Throwable} {@linkplain
     * Throwable#Throwable(String, Throwable, boolean, boolean) is not
     * writable}, calling this method has no effect.
     *
     * @return  a reference to this {@code Throwable} instance.
     * @see     java.lang.Throwable#printStackTrace()
     */
    public synchronized Throwable fillInStackTrace() {
        if (stackTrace != null ||
            backtrace != null /* Out of protocol state */ ) {
            fillInStackTrace(0);
            stackTrace = UNASSIGNED_STACK;
        }
        return this;
    }
    // Delegates to the native method
    private native Throwable fillInStackTrace(int dummy);

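The key is the native fillInStackTrace method, which fills stackTrace with the stack of the current thread. The services in a distributed call chain run in different processes, and therefore in different threads, so the stackTrace holds only the stack of the provider thread in which the exception occurred. When the consumer receives the exception and logs it, the log shows only provider-side frames; the consumer-side stack is lost. In a complex Dubbo call chain this is not enough for a developer to analyze the failure, because nothing in the trace shows which consumer call triggered it.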
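The effect can be reproduced without Dubbo. The following is a minimal sketch, not Dubbo's actual transport: plain Java serialization stands in for the RPC hop, and the names StackTraceOriginDemo, providerSide and consumerSide are made up for illustration. It shows that the trace is fixed at construction time and travels with the exception unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class StackTraceOriginDemo {

    // Hypothetical stand-in for a provider method: the exception is created
    // here, so fillInStackTrace() records this method's frames.
    static byte[] providerSide() {
        RuntimeException failure = new RuntimeException("provider failure");
        return serialize(failure);
    }

    // Java serialization is used here only as a stand-in for the wire format.
    static byte[] serialize(Throwable t) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for the consumer: it only deserializes the
    // exception, so none of its own frames are added to the trace.
    static Throwable consumerSide(byte[] payload) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (Throwable) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if any frame of the trace belongs to a method with this name.
    static boolean traceMentions(Throwable t, String methodName) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable received = consumerSide(providerSide());
        System.out.println("has providerSide frame: " + traceMentions(received, "providerSide"));
        System.out.println("has consumerSide frame: " + traceMentions(received, "consumerSide"));
    }
}
```

The received exception carries the providerSide frame but no consumerSide frame, which is the same gap the consumer log shows.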

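2. Solution:

The first thing that comes to mind is ExceptionFilter. Dubbo's built-in ExceptionFilter takes effect only on the provider side, where it decides whether an exception needs to be wrapped in a RuntimeException. We can write a consumer-side counterpart, ConsumerExceptionFilter: when the provider returns an exception, the filter appends the current consumer-side stack to the exception's stack trace. In effect this gives us a stack trace that spans threads (and processes).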
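Stripped of Dubbo, the appending step looks roughly like this. It is a sketch under assumptions: AppendCallerFrameDemo, enrich and callSite are hypothetical names, and the "remote" exception is simulated by setting a made-up provider frame by hand.

```java
import java.util.Arrays;

final class AppendCallerFrameDemo {

    // Returns a copy of the remote trace with one local frame appended.
    static StackTraceElement[] appendFrame(StackTraceElement[] remote,
                                           StackTraceElement local) {
        StackTraceElement[] merged = Arrays.copyOf(remote, remote.length + 1);
        merged[remote.length] = local;
        return merged;
    }

    // Hypothetical consumer-side hook: captures the local stack and appends
    // the frame of whoever called this method to the remote exception.
    static void enrich(Throwable remote) {
        StackTraceElement[] here = new Throwable().getStackTrace();
        // here[0] is enrich() itself; here[1] is its caller.
        if (here.length > 1) {
            remote.setStackTrace(appendFrame(remote.getStackTrace(), here[1]));
        }
    }

    // Simulates receiving an exception whose trace was built elsewhere.
    static Throwable callSite() {
        Throwable remote = new IllegalStateException("remote failure");
        remote.setStackTrace(new StackTraceElement[] {
            // Made-up provider frame, for illustration only.
            new StackTraceElement("com.example.provider.OrderServiceImpl",
                                  "create", "OrderServiceImpl.java", 42)
        });
        enrich(remote);
        return remote;
    }

    public static void main(String[] args) {
        callSite().printStackTrace();
    }
}
```

The printed trace lists the simulated provider frame first and the callSite frame after it, so both sides of the call are visible in one trace.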

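Here is the filter implementation. To keep the stack trace from growing too large, it appends only one frame: the location in the current service where the API call is made. (For the filter to take effect, it must be registered in the org.apache.dubbo.rpc.Filter SPI file.)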

import java.util.Arrays;
import java.util.Objects;

// The Logger imports assume Dubbo's own logging facade, as used by the built-in ExceptionFilter.
import org.apache.dubbo.common.extension.Activate;
import org.apache.dubbo.common.logger.Logger;
import org.apache.dubbo.common.logger.LoggerFactory;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.ListenableFilter;
import org.apache.dubbo.rpc.Result;
import org.apache.dubbo.rpc.RpcContext;
import org.apache.dubbo.rpc.RpcException;
import org.apache.dubbo.rpc.service.GenericService;

@Activate(group = "consumer")
public class ConsumerExceptionFilter extends ListenableFilter {

    public ConsumerExceptionFilter() {
        super.listener = new ExceptionListener();
    }

    @Override
    public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
        return invoker.invoke(invocation);
    }

    static class ExceptionListener implements Listener {

        private static final Logger logger = LoggerFactory.getLogger(ExceptionListener.class);

        @Override
        public void onResponse(Result appResponse, Invoker<?> invoker, Invocation invocation) {
            if (!appResponse.hasException() || GenericService.class == invoker.getInterface()) {
                return;
            }
            try {
                Throwable exception = appResponse.getException();

                // Leave checked exceptions untouched
                if (!(exception instanceof RuntimeException) && (exception instanceof Exception)) {
                    return;
                }

                // Append part of the consumer's stackTrace to the stackTrace of the exception
                // thrown by the provider, so the failure can be traced in a complex call chain.
                StackTraceElement[] consumerStackTrace = new RuntimeException().getStackTrace();
                StackTraceElement callerElement = null;
                boolean meetProxyElement = false;
                for (StackTraceElement consumerStackTraceElement : consumerStackTrace) {
                    if (meetProxyElement) {
                        // To save resources, append only one frame: the place where the API is called
                        callerElement = consumerStackTraceElement;
                        break;
                    }
                    // Dubbo calls go through a dynamic proxy, so the frame's className is expected to
                    // start with com.sun.proxy ($Proxy...); the next frame is the real call site.
                    // Adjust the prefix if your proxies are generated under a different package.
                    if (Objects.equals(consumerStackTraceElement.getMethodName(), invocation.getMethodName())
                            && consumerStackTraceElement.getClassName().startsWith("com.sun.proxy")) {
                        meetProxyElement = true;
                    }
                }

                // Only modify the trace if a caller frame was found; setStackTrace rejects null elements
                if (callerElement != null) {
                    StackTraceElement[] stackTrace = exception.getStackTrace();
                    StackTraceElement[] newStackTrace = Arrays.copyOf(stackTrace, stackTrace.length + 1);
                    newStackTrace[newStackTrace.length - 1] = callerElement;
                    exception.setStackTrace(newStackTrace);
                }
            } catch (Throwable e) {
                logger.warn("ConsumerExceptionFilter failed while handling a response from " + RpcContext.getContext().getRemoteHost() + ". service: " + invoker.getInterface().getName() + ", method: " + invocation.getMethodName() + ", exception: " + e.getClass().getName() + ": " + e.getMessage(), e);
            }
        }

        @Override
        public void onError(Throwable e, Invoker<?> invoker, Invocation invocation) {
            logger.error("Got unchecked and undeclared exception thrown by " + RpcContext.getContext().getRemoteHost() + ". service: " + invoker.getInterface().getName() + ", method: " + invocation.getMethodName() + ", exception: " + e.getClass().getName() + ": " + e.getMessage(), e);
        }
    }
}
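
Registering the filter goes through Dubbo's SPI mechanism. As a sketch, assuming the class lives in a package named com.example.filter (a made-up name; use your own), the file src/main/resources/META-INF/dubbo/org.apache.dubbo.rpc.Filter would contain one line:

```properties
# key = extension name (any unique name), value = fully qualified class name
consumerExceptionFilter=com.example.filter.ConsumerExceptionFilter
```

Because the class is annotated with @Activate(group = "consumer"), the filter should then be picked up for consumer-side invocations without further configuration.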

Original post: https://www.cnblogs.com/IC1101/p/13498041.html