Android: Watchdog Analysis

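In microcomputer systems built around a microcontroller (MCU), external electromagnetic interference can make the program run away and get stuck in an infinite loop. Normal execution is interrupted, the system the MCU controls can no longer work, and the whole system stalls with unpredictable consequences. To monitor the MCU's running state in real time, a dedicated chip was introduced that watches the program's execution; it is commonly called a "watchdog".

Android likewise needs to guard the doors of several important services: when one of them is found to be in trouble, the watchdog kills the SystemServer process. It is therefore worth understanding how this mechanism works.

Which services are monitored?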

ActivityManagerService.java : frameworks/base/services/java/com/android/server/am
PowerManagerService.java    : frameworks/base/services/java/com/android/server
WindowManagerService.java   : frameworks/base/services/java/com/android/server

The sections below walk through the whole flow in order:

1. Initialization
run @ SystemServer.java
   Slog.i(TAG, "Init Watchdog");
   Watchdog.getInstance().init(context, battery, power, alarm,
           ActivityManagerService.self());

The instance is created with the singleton pattern:
   public static Watchdog getInstance() {
       if (sWatchdog == null) {
           sWatchdog = new Watchdog();
       }
       return sWatchdog;
   }
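
The getInstance() shown above is lazy and unsynchronized, which is only safe as long as the first call happens on a single thread. As a point of comparison, here is a minimal sketch of a thread-safe lazy singleton using the holder idiom. The class name LazySingleton is hypothetical and stands in for Watchdog; this is not how the AOSP source is written.

```java
// Hypothetical stand-in for the Watchdog class; not the AOSP source.
final class LazySingleton {
    private LazySingleton() { }

    // The JVM initializes Holder on first access only, and class
    // initialization is thread-safe, so no explicit locking is needed.
    private static final class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```

Every call to LazySingleton.getInstance() returns the same object, regardless of which thread calls it first.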

   public void init(Context context, BatteryService battery,
           PowerManagerService power, AlarmManagerService alarm,
           ActivityManagerService activity) {
       // Keep references to the context resolver and the services
       mResolver = context.getContentResolver();
       mBattery = battery;
       mPower = power;
       mAlarm = alarm;
       mActivity = activity;

       // Register RebootReceiver to receive the reboot broadcast
       context.registerReceiver(new RebootReceiver(),
               new IntentFilter(REBOOT_ACTION));

       ...

       // System boot time
       mBootTime = System.currentTimeMillis();
   }

At this point the call to init() has completed the setup.

2. Running
run @ SystemServer.java
Watchdog.getInstance().start(); is called to start the watchdog.

First, look at the definition of the Watchdog class:
/** This class calls its monitor every minute. Killing this process if they don't return **/
public class Watchdog extends Thread {
}

Because Watchdog extends Thread, it runs on its own thread: calling thread.start() causes the run() function in Watchdog.java to be executed:

   public void run() {
       boolean waitedHalf = false;

       while (true) {
           mCompleted = false;

           // 1. Send a MONITOR message to mHandler to request a check
           //    that the services are working normally
           mHandler.sendEmptyMessage(MONITOR);

           synchronized (this) {
               // 2. Wait for up to `timeout` before looking at the result
               long timeout = TIME_TO_WAIT;
               // NOTE: We use uptimeMillis() here because we do not want to increment the time we
               // wait while asleep. If the device is asleep then the thing that we are waiting
               // to timeout on is asleep as well and won't have a chance to run, causing a false
               // positive on when to kill things.
               long start = SystemClock.uptimeMillis();
               while (timeout > 0 && !mForceKillSystem) {
                   try {
                       wait(timeout); // notifyAll() is called when mForceKillSystem is set
                   } catch (InterruptedException e) {
                       Log.wtf(TAG, e);
                   }
                   timeout = TIME_TO_WAIT - (SystemClock.uptimeMillis() - start);
               }

               // 3. mCompleted == true means every service responded
               //    (it is set by the handler, discussed later)
               if (mCompleted && !mForceKillSystem) {
                   // The monitors have returned.
                   waitedHalf = false;
                   continue;
               }

               // 4. The monitors did not return in time, so a deadlock is suspected:
               //    use dumpStackTraces to record the stack information
               if (!waitedHalf) {
                   // We've waited half the deadlock-detection interval. Pull a stack
                   // trace and wait another half.
                   ArrayList<Integer> pids = new ArrayList<Integer>();
                   pids.add(Process.myPid());
                   ActivityManagerService.dumpStackTraces(true, pids, null, null);
                   waitedHalf = true;
                   continue; // but run the check one more time before killing
               }
           }

           SystemClock.sleep(2000);

           // 5. Dump the kernel stacks as well
           // Pull our own kernel thread stacks as well if we're configured for that
           if (RECORD_KERNEL_THREADS) {
               dumpKernelStackTraces();
           }

           // 6. Something is wrong: a service appears to be deadlocked,
           //    so kill the SystemServer process.
           //    (`name` is assigned in code omitted from this excerpt.)
           // Only kill the process if the debugger is not attached.
           if (!Debug.isDebuggerConnected()) {
               Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + name);
               Process.killProcess(Process.myPid());
               System.exit(10);
           } else {
               Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
           }

           waitedHalf = false;
       }
   }

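Main logic: at a fixed interval the watchdog thread sends a MONITOR message to another thread, which checks whether each service is running normally. The watchdog keeps requesting the check and waiting for the result. The first missed deadline only dumps stack traces; a second consecutive miss kills SystemServer.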
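
The two-thread handshake can be illustrated outside Android with a minimal sketch in plain Java. Everything here is an assumption made for illustration: the class SimpleWatchdog, its Monitor interface, and the checkOnce method are hypothetical names, and a single-thread executor stands in for the Handler thread. Unlike the code above, which always waits the full TIME_TO_WAIT, this sketch returns as soon as the check completes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the watchdog handshake; not the AOSP implementation.
final class SimpleWatchdog {
    interface Monitor { void monitor(); }

    private final List<Monitor> monitors = new CopyOnWriteArrayList<>();

    // Stand-in for the Handler thread: one daemon thread runs all checks.
    private final ExecutorService checker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "checker");
        t.setDaemon(true); // a hung monitor must not keep the JVM alive
        return t;
    });

    private boolean completed; // guarded by "this"

    void addMonitor(Monitor m) { monitors.add(m); }

    /** Runs one check round; returns true if every monitor returned in time. */
    boolean checkOnce(long timeoutMillis) {
        synchronized (this) { completed = false; }

        // Equivalent of sending MONITOR: run every monitor on the checker thread.
        checker.execute(() -> {
            for (Monitor m : monitors) m.monitor();
            synchronized (SimpleWatchdog.this) {
                completed = true;
                SimpleWatchdog.this.notifyAll();
            }
        });

        synchronized (this) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            long remaining = timeoutMillis;
            try {
                // Loop to tolerate spurious wakeups, recomputing the time left.
                while (!completed && remaining > 0) {
                    wait(remaining);
                    remaining = (deadline - System.nanoTime()) / 1_000_000L;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            return completed;
        }
    }
}
```

A caller would register monitors with addMonitor and call checkOnce in a loop, reacting when it returns false. As in the original, a monitor that never returns blocks the checker thread, so later rounds also time out.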

3. Service check thread

   /**
    * Used for scheduling monitor callbacks and checking memory usage.
    */
   final class HeartbeatHandler extends Handler {
       @Override
       public void handleMessage(Message msg) { // Looper message handler
           switch (msg.what) {
               case MONITOR: {
                   // Check each service in turn by calling its monitor() function
                   final int size = mMonitors.size();
                   for (int i = 0 ; i < size ; i++) {
                       mCurrentMonitor = mMonitors.get(i);
                       mCurrentMonitor.monitor();
                   }

                   // Every monitor() returned, so set mCompleted to true
                   synchronized (Watchdog.this) {
                       mCompleted = true;
                       mCurrentMonitor = null;
                   }
               } break;
               // (other cases omitted)
           }
       }
   }

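How does each service confirm that it is running correctly? Take ActivityManagerService as an example.

First it adds itself to the list of monitors:
   private ActivityManagerService() {
       // Add ourself to the Watchdog monitors.
       Watchdog.getInstance().addMonitor(this);
   }

Then it implements the monitor() function:
   /** In this method we try to acquire our lock to make sure that we have not deadlocked */
   public void monitor() {
       synchronized (this) { }
   }

As you can see, this simply checks whether the service has deadlocked; if it has, the only option is to kill SystemServer. Deadlocks have many possible causes, but one case deserves attention: a Java-level stall can originate in a native call. The native function may interact with hardware and take too long to return, so the lock is held for a long time and the watchdog fires.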
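
To see why an empty synchronized block works as a health check, here is a small plain-Java sketch. The class LockProbe and its canAcquire method are hypothetical names introduced for illustration: a probe thread tries to enter the lock, and the caller waits a bounded time for it to succeed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the monitor() trick; not part of Android.
final class LockProbe {
    /**
     * Returns true if an empty synchronized block on `lock` could be
     * entered within timeoutMillis, i.e. nobody held the lock for that long.
     */
    static boolean canAcquire(Object lock, long timeoutMillis) {
        CountDownLatch entered = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            synchronized (lock) { } // blocks for as long as another thread holds the lock
            entered.countDown();
        }, "lock-probe");
        probe.setDaemon(true); // a stuck probe must not keep the JVM alive
        probe.start();
        try {
            return entered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If some thread holds the lock while stuck in a long native call, canAcquire returns false after the timeout, which is the same signal the watchdog acts on.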

4. Memory usage check

Sending the message:

   final class GlobalPssCollected implements Runnable {
       public void run() {
           mHandler.sendEmptyMessage(GLOBAL_PSS);
       }
   }

The handler that processes the memory check:
   final class HeartbeatHandler extends Handler {
       @Override
       public void handleMessage(Message msg) {
           switch (msg.what) {
               case GLOBAL_PSS: {
                   if (mHaveGlobalPss) {
                       // During the last pass we collected pss information, so
                       // now it is time to report it.
                       mHaveGlobalPss = false;
                       if (localLOGV) Slog.v(TAG, "Received global pss, logging.");
                       logGlobalMemory();
                   }
               } break;
               // (other cases omitted)
           }
       }
   }


Its main job is to collect PSS statistics and read memory information from the Linux kernel:
   void logGlobalMemory() {
       mActivity.collectPss(stats);

       Process.readProcLines("/proc/meminfo", mMemInfoFields, mMemInfoSizes);

       Process.readProcLines("/proc/vmstat", mVMStatFields, mVMStatSizes);
   }
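
For readers who want to experiment outside the framework, the following is a minimal plain-Java sketch of parsing text in the /proc/meminfo layout ("Key:   value kB"). The class MemInfoParser is a hypothetical name, and this is not how Process.readProcLines is implemented. It only handles the colon-separated meminfo layout, not the space-separated /proc/vmstat layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for meminfo-style text; not the Android implementation.
final class MemInfoParser {
    /** Parses lines such as "MemTotal:  1024 kB" into a map of key -> number. */
    static Map<String, Long> parse(String text) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // skip lines without a "Key:" prefix
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts.length == 0 || parts[0].isEmpty()) continue;
            try {
                out.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // skip lines whose value is not a number
            }
        }
        return out;
    }
}
```

On a Linux or Android device the input could come from reading /proc/meminfo as a string; the values are reported in kB as written in that file.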

Shared from the blog of 王冬冬: http://www.cnblogs.com/dongdong230/

Original article: https://www.cnblogs.com/dongdong230/p/4183081.html