1. Understanding Oracle Processes
The first thing we need to do is understand the three types of Oracle processes: background processes, server processes (also called foreground processes), and user processes. When we start an Oracle instance, it is the background processes that are summoned first; a set of background processes plus a set of memory components makes up an Oracle instance. The background processes include the log writer LGWR, the database writer DBWR, the system monitor SMON, the process monitor PMON, the distributed recovery process RECO, and the checkpoint process CKPT; from 11g onward there are many more, more than I can remember. On UNIX, the args of these processes always take the form ora_functionname_sid, where functionname is the functional name of the background process and sid is the value given by $ORACLE_SID.
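As a quick check, a minimal sketch (assuming $ORACLE_SID is set in the environment; the exact process list varies by release):

# list the background processes of the instance named by $ORACLE_SID
ps -ef | grep "ora_.*_${ORACLE_SID}" | grep -v grep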
The second type is the user process. It might be a sqlplus command line, an imp/exp utility, or a Java program a user has written. When user processes start locally they do not operate on the SGA or PGA directly, but they unquestionably still consume a certain amount of virtual memory.
The third type is what we call the server process. Starting a sqlplus connection (whether it connects locally or remotely makes no difference to this memory discussion) requires a server process, which answers directly to our sqlplus terminal. We sometimes also call server processes shadow processes. A shadow process always maps one-to-one to a user process, unless MTS (Multi-Threaded Server) is in use. A shadow process typically appears as oracleSID, where SID means the same as above.
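A similarly hedged one-liner for spotting the shadow processes on a dedicated-server configuration:

# shadow (server) processes appear as oracleSID, e.g. oracleDEC when $ORACLE_SID=DEC
ps -ef | grep "oracle${ORACLE_SID}" | grep -v grep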
2. Understanding Oracle's Memory Usage
Oracle's memory usage falls into two broad categories: private and shared. Private memory is used by a single process only. Shared memory, by contrast, can be used by many processes, and accounting for it is considerably more complicated. When totaling shared memory, we only need to count each segment shared by all the processes once (at the OS level the Oracle SGA may map to one or more shared memory segments; we simply add the sizes of those segments together).
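To see what the SGA looks like at the OS level, here is a minimal sketch (ipcs is standard on UNIX; the exact columns differ slightly across platforms, and the owner name orauser is borrowed from the ps examples later in this article):

# list all shared memory segments together with owner and size (SEGSZ) information
ipcs -ma
# add up the SEGSZ values of the segments owned by the oracle software
# owner (orauser here); that total is the OS-level footprint of the SGA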
The largest shared memory segment we will ever deal with is, without question, the SGA (System Global Area). The SGA is mapped into virtual addresses and attached by every background and foreground process, so that it is available to them at any time. Many performance tools, such as 'top' and 'ps -lf', can report this memory usage, but none of them can separate the private memory from the shared memory used by foreground and background processes (we usually end up with nothing better than the conclusion that Oracle uses a lot of memory, with no idea whether the PGA or the SGA consumes more of it). If we add up the per-process memory figures obtained this way, the total comes out to dozens of times SGA+PGA, which defies common sense; no such amount of memory could actually be allocated. To really understand Oracle memory usage, the memory-inspection command you use must be able to separate the private memory Oracle uses from the shared memory. On AIX there is such a tool, svmon (on other UNIX platforms there is a tool I consider even better, pmap; the corresponding AIX command is procmap, but it cannot inspect Oracle's private or shared memory usage, so we have to settle for second best). You can obtain svmon from the AIX installation media by installing the fileset "perfagent.tools"; the command "smit install_latest" will deploy it. For svmon, speaking as someone who is no AIX expert, I recommend reading the document I quote below:
The svmon Command
The svmon command provides a more in-depth analysis of memory usage. It is more informative, but also more intrusive, than the vmstat and ps commands. The svmon command captures a snapshot of the current state of memory. However, it is not a true snapshot because it runs at the user level with interrupts enabled.
To determine whether svmon is installed and available, run the following command:
# lslpp -lI perfagent.tools
The svmon command can only be executed by the root user.
If an interval is used (-i option), statistics will be displayed until the command is killed or until the number of intervals, which can be specified right after the interval, is reached.
You can use four different reports to analyze the displayed information:
Global (-G)
Displays statistics describing the real memory and paging space in use for the whole system.
Process (-P)
Displays memory usage statistics for active processes.
Segment (-S)
Displays memory usage for a specified number of segments or the top ten highest memory-usage segments in descending order.
Detailed Segment (-D)
Displays detailed information on specified segments.
Additional reports are available in AIX 4.3.3 and later, as follows:
User (-U)
Displays memory usage statistics for the specified login names. If no list of login names is supplied, memory usage statistics display all defined login names.
Command (-C)
Displays memory usage statistics for the processes specified by command name.
Workload Management Class (-W)
Displays memory usage statistics for the specified workload management classes. If no classes are supplied, memory usage statistics display all defined classes.
To support 64-bit applications, the output format of the svmon command was modified in AIX 4.3.3 and later.
Additional reports are available in operating system versions later than 4.3.3, as follows:
Frame (-F)
Displays information about frames. When no frame number is specified, the percentage of used memory is reported. When a frame number is specified, information about that frame is reported.
Tier (-T)
Displays information about tiers, such as the tier number, the superclass name when the -a flag is used, and the total number of pages in real memory from segments belonging to the tier.
How Much Memory is in Use
To print out global statistics, use the -G flag. In this example, we will repeat it five times at two-second intervals.
# svmon -G -i 2 5
     m e m o r y                     i n  u s e            p i n        p g  s p a c e
     size  inuse  free   pin    work  pers  clnt    work  pers  clnt    size  inuse
    16384  16250   134  2006   10675  2939  2636    2006     0     0   40960  12674
    16384  16254   130  2006   10679  2939  2636    2006     0     0   40960  12676
    16384  16254   130  2006   10679  2939  2636    2006     0     0   40960  12676
    16384  16254   130  2006   10679  2939  2636    2006     0     0   40960  12676
    16384  16254   130  2006   10679  2939  2636    2006     0     0   40960  12676
The columns on the resulting svmon report are described as follows:
memory
Statistics describing the use of real memory, shown in 4 K pages.
size
Total size of memory in 4 K pages.
inuse
Number of pages in RAM that are in use by a process plus the number of persistent pages that belonged to a terminated process and are still resident in RAM. This value is the total size of memory minus the number of pages on the free list.
free
Number of pages on the free list.
pin
Number of pages pinned in RAM (a pinned page is a page that is always resident in RAM and cannot be paged out).
in use
Detailed statistics on the subset of real memory in use, shown in 4 K frames.
work
Number of working pages in RAM.
pers
Number of persistent pages in RAM.
clnt
Number of client pages in RAM (client page is a remote file page).
pin
Detailed statistics on the subset of real memory containing pinned pages, shown in 4 K frames.
work
Number of working pages pinned in RAM.
pers
Number of persistent pages pinned in RAM.
clnt
Number of client pages pinned in RAM.
pg space
Statistics describing the use of paging space, shown in 4 K pages. This data is reported only if the -r flag is not used. The value reported starting with AIX 4.3.2 is the actual number of paging-space pages used (which indicates that these pages were paged out to the paging space). This differs from the vmstat command, whose avm column shows the virtual memory accessed but not necessarily paged out.
size
Total size of paging space in 4 K pages.
inuse
Total number of allocated pages.
In our example, the total size of memory is 16384 pages. Multiply this number by 4096 to see the total real memory size (64 MB). While 16250 pages are in use, there are 134 pages on the free list and 2006 pages are pinned in RAM. Of the total pages in use, there are 10675 working pages in RAM, 2939 persistent pages in RAM, and 2636 client pages in RAM. The sum of these three parts is equal to the inuse column of the memory part. The pin part divides the pinned memory size into working, persistent, and client categories. The sum of them is equal to the pin column of the memory part. There are 40960 pages (160 MB) of total paging space, and 12676 pages are in use. The inuse column of memory is usually greater than the inuse column of pg space because memory for file pages is not freed when a program completes, while paging-space allocation is.
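To make that arithmetic explicit with the numbers from the first sample line:

work + pers + clnt        = 10675 + 2939 + 2636 = 16250 = memory inuse
inuse + free              = 16250 + 134         = 16384 = memory size
pinned work + pers + clnt = 2006 + 0 + 0        = 2006  = memory pin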
In AIX 4.3.3 and later systems, the output of the same command looks similar to the following:
# svmon -G -i 2 5
               size      inuse       free        pin    virtual
memory        65527      64087       1440       5909      81136
pg space     131072      55824

               work       pers       clnt
pin            5918          0          0
in use        47554      13838       2695

               size      inuse       free        pin    virtual
memory        65527      64091       1436       5909      81137
pg space     131072      55824

               work       pers       clnt
pin            5918          0          0
in use        47558      13838       2695

               size      inuse       free        pin    virtual
memory        65527      64091       1436       5909      81137
pg space     131072      55824

               work       pers       clnt
pin            5918          0          0
in use        47558      13838       2695

               size      inuse       free        pin    virtual
memory        65527      64090       1437       5909      81137
pg space     131072      55824

               work       pers       clnt
pin            5918          0          0
in use        47558      13837       2695

               size      inuse       free        pin    virtual
memory        65527      64168       1359       5912      81206
pg space     131072      55824

               work       pers       clnt
pin            5921          0          0
in use        47636      13837       2695
The additional output field is the virtual field, which shows the number of pages allocated in the system virtual space.
Who is Using Memory?
The following command displays the memory usage statistics for the top ten processes. If you do not specify a number, it will display all the processes currently running in this system.
# svmon -Pau 10
    Pid  Command          Inuse    Pin  Pgspace
  15012  maker4X.exe       4783   1174     4781
   2750  X                 4353   1178     5544
  15706  dtwm              3257   1174     4003
  17172  dtsession         2986   1174     3827
  21150  dtterm            2941   1174     3697
  17764  aixterm           2862   1174     3644
   2910  dtterm            2813   1174     3705
  19334  dtterm            2813   1174     3704
  13664  dtterm            2804   1174     3706
  17520  aixterm           2801   1174     3619
Pid: 15012
Command: maker4X.exe
Segid  Type  Description            Inuse   Pin  Pgspace  Address Range
 1572  pers  /dev/hd3:62                0     0        0  0..-1
  142  pers  /dev/hd3:51                0     0        0  0..-1
 1bde  pers  /dev/hd3:50                0     0        0  0..-1
  2c1  pers  /dev/hd3:49                1     0        0  0..7
  9ab  pers  /dev/hd2:53289             1     0        0  0..0
  404  work  kernel extension          27    27        0  0..24580
 1d9b  work  lib data                  39     0       23  0..607
  909  work  shared library text      864     0        7  0..65535
  5a3  work  sreg[4]                    9     0       12  0..32768
 1096  work  sreg[3]                   32     0       32  0..32783
 1b9d  work  private                 1057     1     1219  0..1306 : 65307..65535
 1af8  clnt                           961     0        0  0..1716
    0  work  kernel                  1792  1146     3488  0..32767 : 32768..65535
...
The output is divided into summary and detail sections. The summary section lists the top ten highest memory-usage processes in descending order.
Pid 15012 is the process ID that has the highest memory usage. The Command indicates the command name, in this case maker4X.exe. The Inuse column (total number of pages in real memory from segments that are used by the process) shows 4783 pages (each page is 4 KB). The Pin column (total number of pages pinned from segments that are used by the process) shows 1174 pages. The Pgspace column (total number of paging-space pages that are used by the process) shows 4781 pages.
The detailed section displays information about each segment for each process that is shown in the summary section. This includes the segment ID, the type of the segment, description (a textual description of the segment, including the volume name and i-node of the file for persistent segments), number of pages in RAM, number of pinned pages in RAM, number of pages in paging space, and address range.
The Address Range specifies one range for a persistent or client segment and two ranges for a working segment. The range for a persistent or a client segment takes the form '0..x,' where x is the maximum number of virtual pages that have been used. The range field for a working segment can be '0..x : y..65535', where 0..x contains global data and grows upward, and y..65535 contains stack area and grows downward. For the address range, in a working segment, space is allocated starting from both ends and working towards the middle. If the working segment is non-private (kernel or shared library), space is allocated differently. In this example, the segment ID 1b9d is a private working segment; its address range is 0..1306 : 65307..65535. The segment ID 909 is a shared library text working segment; its address range is 0..65535.
A segment can be used by multiple processes. Each page in real memory from such a segment is accounted for in the Inuse field for each process using that segment. Thus, the total for Inuse may exceed the total number of pages in real memory. The same is true for the Pgspace and Pin fields. The sum of Inuse, Pin, and Pgspace of all segments of a process is equal to the numbers in the summary section.
You can use one of the following commands to display the file name associated with the i-node:
* ncheck -i i-node_number volume_name
* find file_system_associated_with_lv_name -xdev -inum inode_number -print
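For instance, to resolve the persistent segment /dev/hd2:53289 listed in the detail section above (a hedged example; on AIX, /dev/hd2 normally holds the /usr file system):

# map i-node 53289 in /dev/hd2 back to a file name
ncheck -i 53289 /dev/hd2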
To get a similar output in AIX 4.3.3 and later, use the following command:
# svmon -Put 10
------------------------------------------------------------------------------
    Pid  Command        Inuse    Pin   Pgsp  Virtual  64-bit  Mthrd
   2164  X              15535   1461  34577    37869       N      N

   Vsid  Esid  Type  Description            Inuse    Pin   Pgsp  Virtual  Addr Range
   1966     2  work  process private         9984      4  31892    32234  0..32272 : 65309..65535
   4411     d  work  shared library text     3165      0   1264     1315  0..65535
      0     0  work  kernel seg              2044   1455   1370     4170  0..32767 : 65475..65535
   396e     1  pers  code,/dev/hd2:18950      200      0      -        -  0..706
   2ca3     -  work                            32      0      0       32  0..32783
   43d5     -  work                            31      0      6       32  0..32783
   2661     -  work                            29      0      0       29  0..32783
   681f     -  work                            29      0     25       29  0..32783
   356d     f  work  shared library data       18      0     18       24  0..310
   34e8     3  work  shmat/mmap                 2      2      2        4  0..32767
   5c97     -  pers  /dev/hd4:2                 1      0      -        -  0..0
   5575     -  pers  /dev/hd2:19315             0      0      -        -  0..0
   4972     -  pers  /dev/hd2:19316             0      0      -        -  0..5
   4170     -  pers  /dev/hd3:28                0      0      -        -  0..0
   755d     -  pers  /dev/hd9var:94             0      0      -        -  0..0
   6158     -  pers  /dev/hd9var:90             0      0      -        -  0..0
------------------------------------------------------------------------------
    Pid  Command        Inuse    Pin   Pgsp  Virtual  64-bit  Mthrd
  25336  austin.ibm.    12466   1456   2797    11638       N      N

   Vsid  Esid  Type  Description            Inuse    Pin   Pgsp  Virtual  Addr Range
   14c3     2  work  process private         5644      1    161     5993  0..6550 : 65293..65535
   4411     d  work  shared library text     3165      0   1264     1315  0..65535
      0     0  work  kernel seg              2044   1455   1370     4170  0..32767 : 65475..65535
   13c5     1  clnt  code                     735      0      -        -  0..4424
    d21     -  pers  /dev/andy:563            603      0      -        -  0..618
    9e6     f  work  shared library data      190      0      2      128  0..3303
    942     -  pers  /dev/cache:16             43      0      -        -  0..42
   2ca3     -  work                            32      0      0       32  0..32783
   49f0     -  clnt                            10      0      -        -  0..471
   1b07     -  pers  /dev/andy:8568             0      0      -        -  0..0
    623     -  pers  /dev/hd2:22539             0      0      -        -  0..1
   2de9     -  clnt                             0      0      -        -  0..0
   1541     5  mmap  mapped to sid 761b         0      0      -        -
   5d15     -  pers  /dev/andy:487              0      0      -        -  0..3
   4513     -  pers  /dev/andy:486              0      0      -        -  0..45
    cc4     4  mmap  mapped to sid 803          0      0      -        -
   242a     -  pers  /dev/andy:485              0      0      -        -  0..0
...
The Vsid column is the virtual segment ID, and the Esid column is the effective segment ID. The effective segment ID reflects the segment register that is used to access the corresponding pages.
Detailed Information on a Specific Segment ID
The -D option displays detailed memory-usage statistics for segments.
# svmon -D 404
Segid: 404
Type: working
Description: kernel extension
Address Range: 0..24580
Size of page space allocation: 0 pages ( 0.0 Mb)
Inuse: 28 frames ( 0.1 Mb)
     Page      Frame  Pin  Ref  Mod
    12294       3320  pin  ref  mod
    24580       1052  pin  ref  mod
    12293      52774  pin  ref  mod
    24579      20109  pin  ref  mod
    12292      19494  pin  ref  mod
    12291      52108  pin  ref  mod
    24578      50685  pin  ref  mod
    12290      51024  pin  ref  mod
    24577       1598  pin  ref  mod
    12289      35007  pin  ref  mod
    24576        204  pin  ref  mod
    12288        206  pin  ref  mod
     4112      53007  pin       mod
     4111      53006  pin       mod
     4110      53005  pin       mod
     4109      53004  pin       mod
     4108      53003  pin       mod
     4107      53002  pin       mod
     4106      53001  pin       mod
     4105      53000  pin       mod
     4104      52999  pin       mod
     4103      52998  pin       mod
     4102      52997  pin       mod
     4101      52996  pin       mod
     4100      52995  pin       mod
     4099      52994  pin       mod
     4098      52993  pin       mod
     4097      52992  pin  ref  mod
The detail columns are explained as follows:
Page
Specifies the index of the page within the segment.
Frame
Specifies the index of the real memory frame that the page resides in.
Pin
Specifies a flag indicating whether the page is pinned.
Ref
Specifies a flag indicating whether the page's reference bit is on.
Mod
Specifies a flag indicating whether the page is modified.
The size of page space allocation is 0 because all the pages are pinned in real memory.
An example output from AIX 4.3.3 and later is very similar to the following:
# svmon -D 629 -b
Segid: 629
Type: working
Address Range: 0..77
Size of page space allocation: 7 pages ( 0.0 Mb)
Virtual: 11 frames ( 0.0 Mb)
Inuse: 7 frames ( 0.0 Mb)
     Page      Frame  Pin  Ref  Mod
        0      32304    N    Y    Y
        3      32167    N    Y    Y
        7      32321    N    Y    Y
        8      32320    N    Y    Y
        5      32941    N    Y    Y
        1      48357    N    N    Y
       77      47897    N    N    Y
The -b flag shows the status of the reference and modified bits of all the displayed frames. After it is shown, the reference bit of the frame is reset. When used with the -i flag, it detects which frames are accessed between each interval.
Note: Use this flag with caution because of its performance impacts.
List of Top Memory Usage of Segments
The -S option is used to sort segments by memory usage and to display the memory-usage statistics for the top memory-usage segments. If count is not specified, then a count of 10 is implicit. The following command sorts system and non-system segments by the number of pages in real memory and prints out the top 10 segments of the resulting list.
# svmon -Sau
Segid  Type  Description            Inuse   Pin  Pgspace  Address Range
    0  work  kernel                  1990  1408     3722  0..32767 : 32768..65535
    1  work  private, pid=4042       1553     1     1497  0..1907 : 65307..65535
 1435  work  private, pid=3006       1391     3     1800  0..4565 : 65309..65535
 11f5  work  private, pid=14248      1049     1     1081  0..1104 : 65307..65535
 11f3  clnt                           991     0        0  0..1716
  681  clnt                           960     0        0  0..1880
  909  work  shared library text      900     0        8  0..65535
  101  work  vmm data                 497   496        1  0..27115 : 43464..65535
  a0a  work  shared library data      247     0      718  0..65535
 1bf9  work  private, pid=21094       221     1      320  0..290 : 65277..65535
All output fields are described in the previous examples.
An example output from AIX 4.3.3 and later is similar to the following:
# svmon -Sut 10
Vsid  Esid  Type  Description           Inuse    Pin   Pgsp  Virtual  Addr Range
1966     -  work                         9985      4  31892    32234  0..32272 : 65309..65535
14c3     -  work                         5644      1    161     5993  0..6550 : 65293..65535
5453     -  work                         3437      1   2971     4187  0..4141 : 65303..65535
4411     -  work                         3165      0   1264     1315  0..65535
5a1e     -  work                         2986      1     13     2994  0..3036 : 65295..65535
340d     -  work  misc kernel tables     2643      0    993     2645  0..15038 : 63488..65535
380e     -  work  kernel pinned heap     2183   1055   1416     2936  0..65535
   0     -  work  kernel seg             2044   1455   1370     4170  0..32767 : 65475..65535
6afb     -  pers  /dev/notes:92          1522      0      -        -  0..10295
2faa     -  clnt                         1189      0      -        -  0..2324
Correlating svmon and vmstat Outputs
There are some relationships between the svmon and vmstat outputs. The svmon report of AIX 4.3.2 follows (the example is the same with AIX 4.3.3 and later, although the output format is different):
# svmon -G
     m e m o r y                     i n  u s e            p i n        p g  s p a c e
     size  inuse  free   pin    work  pers  clnt    work  pers  clnt    size  inuse
    16384  16254   130  2016   11198  2537  2519    2016     0     0   40960  13392
The vmstat command was run in a separate window while the svmon command was running. The vmstat report follows:
# vmstat 5
kthr     memory             page                    faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b    avm   fre  re  pi  po  fr  sr  cy   in    sy  cs  us sy id wa
 0  0  13392   130   0   0   0   0   2   0  125   140  36   2  1 97  0
 0  0  13336   199   0   0   0   0   0   0  145 14028  38  11 22 67  0
 0  0  13336   199   0   0   0   0   0   0  141    49  31   1  1 98  0
 0  0  13336   199   0   0   0   0   0   0  142    49  32   1  1 98  0
 0  0  13336   199   0   0   0   0   0   0  145    49  32   1  1 99  0
 0  0  13336   199   0   0   0   0   0   0  163    49  33   1  1 92  6
 0  0  13336   199   0   0   0   0   0   0  142    49  32   0  1 98  0
The global svmon report shows related numbers. The vmstat fre column relates to the svmon memory free column. The number that the vmstat command reports as Active Virtual Memory (avm) is reported by the svmon command as pg space inuse (13392).
The vmstat avm column provides the same figures as the pg space inuse column of the svmon command except starting with AIX 4.3.2 where Deferred Page Space Allocation is used. In that case, the svmon command shows the number of pages actually paged out to paging space whereas the vmstat command shows the number of virtual pages accessed but not necessarily paged out (see Looking at Paging Space and Virtual Memory).
Correlating svmon and ps Outputs
There are some relationships between the svmon and ps outputs. The svmon report of AIX 4.3.2 follows (the example is the same with AIX 4.3.3 and later, although the output format is different):
# svmon -P 7226
   Pid  Command   Inuse  Pin  Pgspace
  7226  telnetd     936    1       69
Pid: 7226
Command: telnetd
Segid  Type  Description            Inuse  Pin  Pgspace  Address Range
  828  pers  /dev/hd2:15333             0    0        0  0..0
 1d3e  work  lib data                   0    0       28  0..559
  909  work  shared library text      930    0        8  0..65535
 1cbb  work  sreg[3]                    0    0        1  0..0
 1694  work  private                    6    1       32  0..24 : 65310..65535
 12f6  pers  code,/dev/hd2:69914        0    0        0  0..11
Compare with the ps report, which follows:
# ps v 7226
    PID TTY STAT   TIME PGIN  SIZE   RSS    LIM  TSIZ  TRS %CPU %MEM COMMAND
   7226  -  A      0:00   51   240    24  32768    33    0  0.0  0.0 telnetd
SIZE refers to the virtual size in KB of the data section of the process (in paging space). This number is equal to the number of working segment pages of the process that have been touched (that is, the number of paging-space pages that have been allocated) times 4. It must be multiplied by 4 because pages are in 4 K units and SIZE is in 1 K units. If some working segment pages are currently paged out, this number is larger than the amount of real memory being used. The SIZE value (240) correlates with the Pgspace number from the svmon command for private (32) plus lib data (28) in 1 K units.
RSS refers to the real memory (resident set) size in KB of the process. This number is equal to the sum of the number of working segment and code segment pages in memory times 4. Remember that code segment pages are shared among all of the currently running instances of the program. If 26 ksh processes are running, only one copy of any given page of the ksh executable program would be in memory, but the ps command would report that code segment size as part of the RSS of each instance of the ksh program. The RSS value (24) correlates with the Inuse numbers from the svmon command for private (6) working-storage segments, for code (0) segments, and for lib data (0) of the process in 1-K units.
TRS refers to the size of the resident set (real memory) of text. This is the number of code segment pages times four. As was noted earlier, this number exaggerates memory use for programs of which multiple instances are running. This does not include the shared text of the process. The TRS value (0) correlates with the number of the svmon pages in the code segment (0) of the Inuse column in 1 K units. The TRS value can be higher than the TSIZ value because other pages, such as the XCOFF header and the loader section, may be included in the code segment.
The following calculations can be made for the values mentioned:
SIZE = 4 * Pgspace of (work lib data + work private)
RSS = 4 * Inuse of (work lib data + work private + pers code)
TRS = 4 * Inuse of (pers code)
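Plugging the telnetd numbers from the svmon report above into these formulas confirms the correlation:

SIZE = 4 * (28 + 32)   = 240  (matches the ps SIZE column)
RSS  = 4 * (0 + 6 + 0) = 24   (matches the ps RSS column)
TRS  = 4 * 0           = 0    (matches the ps TRS column)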
Calculating the Minimum Memory Requirement of a Program
To calculate the minimum memory requirement of a program, the formula would be:
Total memory pages (4 KB units) = T + ( N * ( PD + LD ) ) + F
where:
T
= Number of pages for text (shared by all users)
N
= Number of copies of this program running simultaneously
PD
= Number of working segment pages in process private segment
LD
= Number of shared library data pages used by the process
F
= Number of file pages (shared by all users)
Multiply the result by 4 to obtain the number of kilobytes required. You may want to add in the kernel, kernel extension, and shared library text segment values to this as well even though they are shared by all processes on the system. For example, some applications like CATIA and databases use very large shared library modules. Note that because we have only used statistics from a single snapshot of the process, there is no guarantee that the value we get from the formula will be the correct value for the minimum working set size of a process. To get working set size, one would need to run a tool such as the rmss command or take many snapshots during the life of the process and determine the average values from these snapshots (see Assessing Memory Requirements Through the rmss Command).
If we estimate the minimum memory requirement for the program pacman, shown in Finding Memory-Leaking Programs, the formula would be:
T
= 2 (Inuse of code,/dev/lv01:12302 of pers)
PD
= 1632 (Inuse of private of work)
LD
= 12 (Inuse of lib data of work)
F
= 1 (Inuse of /dev/hd2:53289 of pers)
That is: 2 + (N * (1632 + 12)) + 1, equal to 1644 * N + 3 in 4 KB units.
One point to note: svmon attributes the UNIX file system cache to whichever processes have requested those file pages. Ironically, this file system cache is not controlled by Oracle at all: it is neither PGA nor SGA; it is allocated by the AIX operating system and controlled exclusively by it. Memory used for caching files is outside the scope of our Oracle memory question, because it is governed by AIX and has no connection to the PGA/SGA we are discussing; if our environment used raw devices throughout (unlikely, of course), there would be no large file system cache to worry about. That does not mean this memory should be ignored or dismissed when we consider total memory usage, because the file system cache also consumes a large amount of physical memory and can trigger unnecessary paging. We can examine this usage with "svmon -Pau 10". The well-known AIX performance tuning tool, the virtual memory optimizer (formerly vmtune, now the vmo utility), can help us adjust the file system cache thresholds such as maxperm, minperm, and strict_maxperm (not expanded on here). If you are interested, refer to the document quoted below:
Tuning VMM Page Replacement with the vmtune Command
The memory management algorithm, discussed in Real-Memory Management, tries to keep the size of the free list and the percentage of real memory occupied by persistent segment pages within specified bounds. These bounds can be altered with the vmtune command, which can only be run by the root user. Changes made by this tool remain in effect until the next reboot of the system. To determine whether the vmtune command is installed and available, run the following command:
# lslpp -lI bos.adt.samples
Note: The vmtune command is in the samples directory because it is very VMM-implementation dependent. The vmtune code that accompanies each release of the operating system is tailored specifically to the VMM in that release. Running the vmtune command from one release on a different release might result in an operating-system failure. It is also possible that the functions of vmtune may change from release to release. Do not propagate shell scripts or /etc/inittab entries that include the vmtune command to a new release without checking the vmtune documentation for the new release to make sure that the scripts will still have the desired effect.
Executing the vmtune command on AIX 4.3.3 with no options results in the following output:
# /usr/samples/kernel/vmtune
vmtune: current values:
  -p       -P        -r          -R         -f       -F        -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
 52190   208760       2           8         120      128     524288        0

  -M       -w       -k       -c        -b          -B           -u          -l      -d
maxpin  npswarn  npskill  numclust  numfsbufs  hd_pbuf_cnt  lvm_bufcnt  lrubucket  defps
209581    4096     1024      1         93          96           9        131072      1

        -s           -n          -S           -h
sync_release_ilock  nokillroot  v_pinshm  strict_maxperm
        0               0          0            0

number of valid memory pages = 261976    maxperm=79.7% of real memory
maximum pinable=80.0% of real memory     minperm=19.9% of real memory
number of file memory pages = 19772      numperm=7.5% of real memory
The output shows the current settings for all the parameters.
Choosing minfree and maxfree Settings
The purpose of the free list is to keep track of real-memory page frames released by terminating processes and to supply page frames to requestors immediately, without forcing them to wait for page steals and the accompanying I/O to complete. The minfree limit specifies the free-list size below which page stealing to replenish the free list is to be started. The maxfree parameter is the size above which stealing will end.
The objectives in tuning these limits are to ensure that:
* Any activity that has critical response-time objectives can always get the page frames it needs from the free list.
* The system does not experience unnecessarily high levels of I/O because of premature stealing of pages to expand the free list.
The default values of minfree and maxfree depend on the memory size of the machine. The default value of maxfree is determined by this formula:
maxfree = minimum (# of memory pages/128, 128)
By default the minfree value is the value of maxfree - 8. However, the difference between minfree and maxfree should always be equal to or greater than maxpgahead. Or in other words, the value of maxfree should always be greater than or equal to minfree plus the size of maxpgahead. The minfree/maxfree values will be different if there is more than one memory pool. Memory pools were introduced in AIX 4.3.3 for MP systems with large amounts of RAM. Each memory pool will have its own minfree/maxfree which are determined by the previous formulas, but the minfree/maxfree values shown by the vmtune command will be the sum of the minfree/maxfree for all memory pools.
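As a worked example, the 32 MB system examined in the vmstat discussion below has 32 MB / 4 KB = 8192 memory pages, so maxfree = minimum(8192/128, 128) = 64 and minfree = 64 - 8 = 56, exactly the values quoted there.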
Remember that minfree pages in some sense are wasted, because they are available, but not in use. If you have a short list of the programs you want to run fast, you can investigate their memory requirements with the svmon command (see Determining How Much Memory Is Being Used), and set minfree to the size of the largest. This technique risks being too conservative because not all of the pages that a process uses are acquired in one burst. At the same time, you might be missing dynamic demands that come from programs not on your list that may lower the average size of the free list when your critical programs run.
A less precise but more comprehensive tool for investigating an appropriate size for minfree is the vmstat command. The following is a portion of a vmstat command output obtained while running a C compilation on an otherwise idle system.
# vmstat 1
kthr     memory             page                    faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b    avm   fre  re  pi  po   fr   sr  cy   in   sy  cs  us sy id wa
 0  0   3085   118   0   0   0    0    0   0  115    2  19   0  0 99  0
 0  0   3086   117   0   0   0    0    0   0  119  134  24   1  3 96  0
 2  0   3141    55   2   0   6   24   98   0  175  223  60   3  9 54 34
 0  1   3254    57   0   0   6  176  814   0  205  219 110  22 14  0 64
 0  1   3342    59   0   0  42  104  249   0  163  314  57  43 16  0 42
 1  0   3411    78   0   0  49  104  169   0  176  306  51  30 15  0 55
 1  0   3528   160   1   0  10  216  487   0  143  387  54  50 22  0 27
 1  0   3627    94   0   0   0   72  160   0  148  292  79  57  9  0 34
 1  0   3444   327   0   0   0   64  102   0  132  150  41  82  8  0 11
 1  0   3505   251   0   0   0    0    0   0  128  189  50  79 11  0 11
 1  0   3550   206   0   0   0    0    0   0  124  150  22  94  6  0  0
 1  0   3576   180   0   0   0    0    0   0  121  145  30  96  4  0  0
 0  1   3654   100   0   0   0    0    0   0  124  145  28  91  8  0  1
 1  0   3586   208   0   0   0   40   68   0  123  139  24  91  9  0  0
Because the compiler has not been run recently, the code of the compiler itself must be read in. All told, the compiler acquires about 2 MB in about 6 seconds. On this 32 MB system, maxfree is 64 and minfree is 56. The compiler almost instantly drives the free list size below minfree, and several seconds of rapid page-stealing activity take place. Some of the steals require that dirty working segment pages be written to paging space, which shows up in the po column. If the steals cause the writing of dirty permanent segment pages, that I/O does not appear in the vmstat report (unless you have directed the vmstat command to report on the I/O activity of the physical volumes to which the permanent pages are being written).
This example describes a fork() and exec() environment (not an environment where a process is long lived, such as in a database) and is not intended to suggest that you set minfree to 500 to accommodate large compiles. It suggests how to use the vmstat command to identify situations in which the free list has to be replenished while a program is waiting for space. In this case, about 2 seconds were added to the compiler execution time because there were not enough page frames immediately available. If you observe the page frame consumption of your program, either during initialization or during normal processing, you will soon have an idea of the number of page frames that need to be in the free list to keep the program from waiting for memory.
If we concluded from the example above that minfree needed to be 128, and we had set maxpgahead to 16 to improve sequential performance, we would use the following vmtune command:
# /usr/samples/kernel/vmtune -f 128 -F 144
Tuning Memory Pools
In operating system versions later than AIX 4.3.3, the vmtune -m number_of_memory_pools command allows you to change the number of memory pools that are configured at system boot time. The -m flag is therefore not a dynamic change. The change is written to the kernel file if it is an MP kernel (the change is not allowed on a UP kernel). A value of 0 restores the default number of memory pools.
By default, the vmtune -m command writes to the file /usr/lib/boot/unix_mp, but this can be changed with the command vmtune -U path_to_unix_file. Before changing the kernel file, the vmtune command saves the original file as name_of_original_file.sav.
Tuning lrubucket to Reduce Memory Scanning Overhead
Tuning lrubucket can reduce scanning overhead on large memory systems. In AIX 4.3, a new parameter lrubucket was added. The page-replacement algorithm scans memory frames looking for a free frame. During this scan, reference bits of pages are reset, and if a free frame has not been found, a second scan is done. In the second scan, if the reference bit is still off, the frame will be used for a new page (page replacement).
On large memory systems, there may be too many frames to scan, so now memory is divided up into buckets of frames. The page-replacement algorithm will scan the frames in the bucket and then start over on that bucket for the second scan before moving on to the next bucket. The default number of frames in this bucket is 131072 or 512 MB of RAM. The number of frames is tunable with the command vmtune -l, and the value is in 4 K frames.
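For example, to double the bucket size to 1 GB of RAM (a hedged illustration: 262144 frames * 4 KB = 1 GB):

# /usr/samples/kernel/vmtune -l 262144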
Choosing minperm and maxperm Settings
The operating system takes advantage of the varying requirements for real memory by leaving in memory pages of files that have been read or written. If the file pages are requested again before their page frames are reassigned, this technique saves an I/O operation. These file pages may be from local or remote (for example, NFS) file systems.
The ratio of page frames used for files versus those used for computational (working or program text) segments is loosely controlled by the minperm and maxperm values:
* If percentage of RAM occupied by file pages rises above maxperm, page-replacement steals only file pages.
* If percentage of RAM occupied by file pages falls below minperm, page-replacement steals both file and computational pages.
* If percentage of RAM occupied by file pages is between minperm and maxperm, page-replacement steals only file pages unless the number of file repages is higher than the number of computational repages.
In a particular workload, it might be worthwhile to emphasize the avoidance of file I/O. In another workload, keeping computational segment pages in memory might be more important. To understand what the ratio is in the untuned state, we use the vmtune command with no arguments.
# /usr/samples/kernel/vmtune
vmtune: current values:
  -p       -P        -r          -R         -f       -F        -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
 52190   208760       2           8         120      128     524288        0

  -M       -w       -k       -c        -b          -B           -u          -l      -d
maxpin  npswarn  npskill  numclust  numfsbufs  hd_pbuf_cnt  lvm_bufcnt  lrubucket  defps
209581    4096     1024      1         93          96           9        131072      1

        -s           -n          -S           -h
sync_release_ilock  nokillroot  v_pinshm  strict_maxperm
        0               0          0            0

number of valid memory pages = 261976    maxperm=79.7% of real memory
maximum pinable=80.0% of real memory     minperm=19.9% of real memory
number of file memory pages = 19772      numperm=7.5% of real memory
The default values are calculated by the following algorithm:
minperm (in pages) = ((number of memory frames) - 1024) * .2
maxperm (in pages) = ((number of memory frames) - 1024) * .8
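Checking this against the sample output above: (261976 - 1024) * .2 ≈ 52190 and (261976 - 1024) * .8 ≈ 208762, which line up with the minperm (52190) and maxperm (208760) values shown, give or take rounding.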
The numperm value gives the number of file pages in memory, 19772. This is 7.5 percent of real memory.
If we know that our workload makes little use of recently read or written files, we may want to constrain the amount of memory used for that purpose. The following command:
# /usr/samples/kernel/vmtune -p 15 -P 50
sets minperm to 15 percent and maxperm to 50 percent of real memory. This would ensure that the VMM would steal page frames only from file pages when the ratio of file pages to total memory pages exceeded 50 percent. This should reduce the paging to page space with no detrimental effect on the persistent storage. The maxperm value is not a strict limit, it is only considered when the VMM needs to perform page replacement. Because of this, it is usually safe to reduce the maxperm value on most systems.
On the other hand, if our application frequently references a small set of existing files (especially if those files are in an NFS-mounted file system), we might want to allow more space for local caching of the file pages by using the following command:
# /usr/samples/kernel/vmtune -p 30 -P 90
NFS servers that are used mostly for reads with large amounts of RAM can benefit from increasing the value of maxperm. This allows more pages to reside in RAM so that NFS clients can access them without forcing the NFS server to retrieve the pages from disk again.
Another example would be a program that reads 1.5 GB of sequential file data into the working storage of a system with 2 GB of real memory. You may want to set maxperm to 50 percent or less, because you do not need to keep the file data in memory.
Placing a Hard Limit on Persistent File Cache with strict_maxperm
Starting with AIX 4.3.3, a new vmtune option (-h) called strict_maxperm has been added. This option, when set to 1, places a hard limit on how much memory is used for a persistent file cache by making the maxperm value the upper limit for this file cache. When the upper limit is reached, least-recently-used (LRU) page replacement is performed on persistent pages.
Another tool worth trying is the "ps v" command ("ps vg" prints this format for every process); the "ps" command itself is present by default on every AIX release. Typing "ps v" followed by a process ID displays fairly detailed memory usage for that process; note that there is no "-" before the "v". Below is a comparison of the "ps -lf" and "ps v" commands:
$ ps -lfp 5029994
      F S     UID      PID PPID  C PRI NI      ADDR    SZ WCHAN   STIME TTY    TIME CMD
 240001 A orauser  5029994    1  0  60 20 1d2e7b510 98000        Apr 15   -  190:34 ora_pmon_DEC
$ ps v 5029994
     PID TTY STAT    TIME PGIN  SIZE    RSS LIM  TSIZ    TRS %CPU %MEM COMMAND
 5029994  -  A     190:34    4  9152 144536  xx 88849 135384  0.0  0.0 ora_pm
"ps v"命令显示了我们感兴趣的RSS和TRS值,RSS也就是我们说的驻留集,其等于工作段页数(working-segment)*4 + 代码段(code segment) *4,单位为kbytes,而TRS值则仅等于代码段(code segment)*4 kbytes。
请注意AIX平台上内存页的单位为4096 bytes即4k一页,这就是为什么以上RSS和TRS值需要乘以四,举例来说在实际内存使用中代码段占用了2页内存(2 * 4096bytes= 8k),则显示的TRS值应为8。由于RSS既包含了work_segment又包含了code_segment,则RSS-TRS所仅余为工作段内存(work_segment),或曰私有内存段(private memory)。以上例而言,pmon后台进程所用内存:
144536(RSS)-135384(TRS)=9152
9152*1024=9371648 bytes
则pmon后台进程所用私有内存为9152k(9371648 bytes),而非"ps -lf"命令所显示的95MB(98000k)。
TRS, the code-segment memory, is roughly the size of the $ORACLE_HOME/bin/oracle binary; every Oracle process, background or foreground alike, maps that oracle binary. The code segment here is the same concept as the text segment in UNIX C.
If you really have the patience to total up the memory used by the Oracle background processes, you can estimate it with the following formula:
(P1.RSS - P1.TRS) + (P2.RSS - P2.TRS) + (P3.RSS - P3.TRS) + ... + (Pn.RSS - Pn.TRS) + TRS + SGA
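A rough shell sketch of that estimate (an illustration rather than an audit: it assumes the "ps v" column order shown above, with RSS in column 7 and TRS in column 10, and it leaves adding the SGA size, e.g. from 'show sga' in sqlplus, to you):

#!/bin/sh
# Sum (RSS - TRS), i.e. private memory in KB, over all background
# processes of the instance named by $ORACLE_SID.
total=0
for pid in `ps -ef | grep "ora_.*_${ORACLE_SID}" | grep -v grep | awk '{print $2}'`
do
    # ps v output: PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS ...
    priv=`ps v $pid | awk 'NR == 2 { print $7 - $10 }'`
    total=`expr $total + $priv`
done
echo "sum of (RSS - TRS) over background processes: $total KB"
# to complete the formula, add one copy of TRS (the shared oracle code)
# and the SGA size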
Calculating the private memory used by foreground processes is somewhat more involved, because foreground processes use private memory more heavily and Oracle tries to reclaim part of it, so the figure fluctuates more. You can run "ps v" several times to see whether the memory usage of the foreground process you are watching is thrashing.
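A tiny loop for that kind of repeated sampling (the interval is arbitrary; substitute the PID of the process you are watching):

# sample the memory columns of one process every 10 seconds; Ctrl-C to stop
pid=5029994
while :
do
    ps v $pid | tail -1
    sleep 10
done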
Heh, on a black box like AIX, getting at the details of Oracle's memory usage really is a bit of a challenge; if all else fails, we can always guess!