转自:http://blog.sina.com.cn/s/blog_c0de2be70102wd1k.html
1、ANR
basic knowledge
ANR分类:
Key Dispatch Timeout
Service Timeout
Broadcast Timeout
关键词:
event log: “am_anr”
trace.txt -java
-native
CPU usage ago
CPU usage later
iowait
“Early ANR” = The first ANR
2、How
to read Event Log
一些信息没有在logcat log中而是在event log中,查看event log对与解决ANR问题很有帮助。
1) event log定义:
event.logtags
( in code path /system/core/logcat/event.logtags )
( in phone path /system/etc/event-log-tags )
event.logtags:
30008 am_anr (pid|1|5),(Package Name|3),(Flags|1|5),(reason|3)
Source Code ActivityManagerService.java:
EventLog.writeEvent(EventLogTags.AM_ANR, app.pid, app.processName...))
2) Log Example:
I/am_anr ( 1776): [3274,com.sonyericsson.home,572997,keyDispatchingTimedOut]
3) Event: am_anr
system/core/logcat/event.logtags
52004 binder_sample (descriptor|3),(method_num|1|5),(time|1|3),(blocking_package|3),(sample_percent|1|6)
# Application Not Responding
30008 am_anr (pid|1|5),(Package Name|3),(Flags|1|5),(reason|3)
I/am_anr ( 1776): [3274,com.sonyericsson.home,572997,keyDispatchingTimedOut]
4) 重要的event log标志:
db_sample
content_query_sample
content_update_sample
binder sample
dvm_lock_sample
Activity Event Series:
am_proc_start
am_restart_activity
am_on_resume_called
am_finish_activity
am_pause_activity
am_on_paused_called
am_destroy_activity
5) Database related:
定义:
52000 db_sample (db|3),(sql|3),(time|1|3),(blocking_package|3), (sample_percent|1|6)
# Database operation samples.
# db: 数据库文件名
# sql: the executed query (without query args)
# time: cpu time millis (not wall time), including lock acquisition
# blocking_package: if this is on a main thread, the package name, otherwise ""
# sample_percent: the percent likelihood this query was logged (based on 500ms)
sample_percent? If time >= 500ms, then sample_percent show 100.
例子1:
I/db_sample(27898): [/data/data/com.facebook.katana/databases/fb.db,INSERT INTO cache(value, timestamp, name) VALUES(?, ?, ?);,99,com.facebook.katana,20]
信息解析:
Database: /data/data/com.facebook.katana/databases/fb.db
Sql: INSERT INTO cache(value, timestamp, name) VALUES(?, ?, ?)
Time spent: 99ms
This database operation runs in com.facebook.katana main thread.
例子2:
I/db_sample(26092): [/data/data/com.google.android.gm/databases/XX@YY,COMMIT;DELETE FROM messages WHERE conversation = ? AND synced = ,797,,100]
6) Events : binder_sample
定义:
52004 binder_sample (descriptor|3),(method_num|1|5),(time|1|3),(blocking_package|3),(sample_percent|1|6)
descriptor:接口描述符(任何一个binder都实现一个接口,IPC调用用的API,任何一个接口都有一个描述符,即一个字符串)
method_num:调用的方法的序列号
time:该方法调用耗用的时间 (从client端发起到server端完成并返回)
blocking_package:从哪个进程发起的调用
例子:
I/binder_sample( 520): [com.android.internal.telephony.ITelephonyRegistry,7,12518,com.android.phone,100]
含义:从com.android.phone主线程发起了一个ITelephonyRegistry接口的第7个方法的binder调用,耗时12s。
(从后面的源代码可知,系统只记录从主线程发起的binder通讯)
查可知有ITelephonyRegistry.aidl,系统编译后会自动生成ITelephonyRegistry.java文件,里面定义了方法的序号。
out/target/common/obj/JAVA_LIBRARIES/framework_intermediates/src/telephony/java/com/android/internal/telephony/ITelephonyRegistry.java
private static final java.lang.String DESCRIPTOR = "com.android.internal.telephony.ITelephonyRegistry";
static final int TRANSACTION_listen = (android.os.IBinder.FIRST_CALL_TRANSACTION + 0);
static final int TRANSACTION_notifyCallState = (android.os.IBinder.FIRST_CALL_TRANSACTION + 1);
......
static final int TRANSACTION_notifyDataConnection = (android.os.IBinder.FIRST_CALL_TRANSACTION + 7);
由此知第7个方法是notifyDataConnection()
7) Events : dvm_lock_sample
定义:
dvm_lock_sample (process|3),(main|1|5),(thread|3),(time|1|3),(file|3),(line|1|5),(ownerfile|3),(ownerline|1|5),(sample_percent|1|6)
dvm_lock_sample会记录所有线程在java层(即dvm层)获取锁的记录。该log在获取到锁之后输出。
包括:所属进程,是否为主线程,线程名称,获取锁所用的时间,获取锁代码所在的源文件,行号,锁被占有的代码所在的源文件,行号,耗用时间所占比例。
如果ownerfile与file相同,用"-"表示。
Example:
11-25 14:14:24.140 I/dvm_lock_sample( 296): [system_server,0,Binder Thread #10,12515,
PackageManagerService.java,1719,-,682,100]
含义:Binder Thread #10 ,非主线程,属于system_server进程,获取锁耗用了12s, PackageManagerService.java, line 1719获取锁,PackageManagerService.java,line 682占用了锁。
3、OOM
Memory leak will cause ANR. Because when memory is low, normal operations will be blocked for lack of memory or wait GC to collect memory, this will make normal operations taking much more time than before. Low memory will cause system into very bad performance, ANR occurrence rate will increase greatly in this condition. You will see many out of memory issues are along with ANR issues.
If you see frequent ANR occurs in short time and system is in OOM condition. These ANRs often can be ignored. We should put effert on fixing memory leak. When the memory leak is fixed, these ANRs should disappear.
There are 2 kinds of OOM. One is OOM in one dvm process. Another is system overall memory is not enough.
1) OOM in one dvm process
One process, one dvm, one GC.
Dvm heap limit in ICS: Change from dalvik.vm.heapsize to dalvik.vm.heapgrowthlimit (64Mb now in ginger fuji ICS)
D/dalvikvm( 216): GC_CONCURRENT freed 4443K, 54% free 28546K/61347K, paused 4ms+20ms
系统为进程216分配了61347K的dvm heap,有54% free,分母代表了历史上对Heap的最大需求。
D/dalvikvm( 3142): GC_BEFORE_OOM freed 10K, 2% free 64421K/65571K, paused 258msE/dalvikvm-heap( 3142): Out of memory on a 335888-byte allocation.
2) System overall OOM
Kernel oom killer – when there is no enough memory for allocating.
内核kernel
log:
init invoked oom-killer: gfp_mask=0xd0, order=2, oom_adj=-16, oom_score_adj=-941
[] (>] (pgd_alloc+0x14/0xe0)
...... unwind_backtrace+0x0/0x12c) from [] (dump_header.clone.1+0xd8/0x22c)
] (dump_header.clone.1+0xd8/0x22c) from [] (oom_kill_process.clone.0+0x40/0x33c)
[] (oom_kill_process.clone.0+0x40/0x33c) from [] (out_of_memory+0x328/0x458)
[] (out_of_memory+0x328/0x458) from [] (__alloc_pages_nodemask+0x4a0/0x608)
[] (__alloc_pages_nodemask+0x4a0/0x608) from [] (__get_free_pages+0x10/0x24)
[] (__get_free_pages+0x10/0x24) from [
[] (sys_fork+0x28/0x2c) from [] (ret_fast_syscall+0x0/0x30)
……
Out of memory: Kill process 22591 (ckscreen.uxpnxt) score 754 or sacrifice child
Killed process 22591 (ckscreen.uxpnxt) total-vm:466932kB, anon-rss:23168kB, file-rss:8396kB
Android Low memory killer – when memory decrease to some value. So, it doesn’t mean memory is used up.
select 1381 (android.browser), adj 7, size 22194, to kill
select 18759 (ndroid.crashsms), adj 8, size 7129, to kill