windows memory: Memory: Cache Bytes 是系统的working set, 也就是系统使用的物理内存数目。 可以观察Windows用了多少物理内存。
1. System Cache Resident Bytes
2. System Driver Resident Bytes
3. System Code Resident Bytes
4. Pool Paged Resident Bytes
SQL Server 动态管理内存:
SQL Server 是通过以下的API去感知windows是否有内存压力:
APIQueryMemoryResourceNotification -》 windows memory -》 decrease target server memory.
Total server memory : SQL Server启动账户拥有 Lock pages in memory 权限, 锁定内存,避免windows抢夺内存
Target server memory: SQL在启动时候, 比较AWE, max server memory, physical memory 三者选一个最小的值作为Target server memory.
comparison |
Result |
Remark |
Total < target |
Windows has enough memory, SQL can allocate new memory for new data |
Total is increasing |
Total = target |
SQL used all of memory, SQL don’t allocate new memory for new data |
SQL clean up memory for new data, such as Lazy writer |
Total > target |
Windows has memory pressure, SQL decrease Target |
SQL clean up memory for new data, such as Lazy writer to release buffer pool and cached plan |
SQL server 内存管理概念: DBCC memorystatus -- 查看内存使用情况
1. Reserved memory:
2. Committed memory: = Physical memory + Page file = Shared memory + Private Bytes
3. Shared memory:
4. Private Bytes:
5. Working Set: = shared memory + Private Bytes - Page file
6. Page Fault(soft/hard)
32位SQL:
memToLeave: 256MB+256thread*521KB=384MB (SQL 启动的时候就计算好了,不能变大): extended stored procedure, third party dirver, and linked server : 启动参数 -g512 = 512MB+256thread*521KB
BufferPool = 2G-384MB=1664MB (包括 database cache + stolen memory)
AWE开启以后, 只能给先reserve再commit的部分使用,即只能给database cache来扩展使用(Physical memory-2GB就是AWE扩展后database cache所能用的内存数量),stolen memory只能用1664MB里的内存。
SQL内存使用情况分析:
1. SQL性能计数器:
memory manager:
Total server memory,
target server memory,
Optimizer memory
SQL cache memory
Lock memory
Connection memory
Granted workspace memory: hash, sort, bulk-insert, create index...
Memory grants pending: 等待工作空间内存授权的进程总数, 不等于0, 意味着比较严重的内存瓶颈
Buffer manager:
Buffer Cache Hit Ratio: 应该>99%, 如果<95%, 通常就是内存不足的问题
Checkpoint pages/sec: 这个和内存压力没有关系,和用户的行为有关。用户做很多insert/update/delete, 这个值就会很大,脏数据多。
database pages: 就是 database cache
free pages:
Lazy writes/sec: 如果SQL 内存压力不大,不会经常触发lazy writer。 如果经常触发, 那么就应该是有内存的瓶颈
page life expectancy: 页不被引用,将在缓冲池中停留的秒数。只有Lazy writer 被触发, Page life expectancy 才会突然下降。如果总是高高低低的, 应该是有内存压力
page reads/sec: 这个值正常情况下应该很低。 如果比较高, 一般page life expectancy 会下降, Lazy writes/sec 会上升。
stolen pages: 所有非database pages, 包括执行计划缓存。
内存DMV:sys.dm_os_memory_clerk
select type,
sum(virtual_memory_reserved_kb) as [vm reserved],
sum(virtual_memory_committed_kb) as [vm committed],
sum(awe_allocated_kb) as [AWE allocated],
sum(shared_memory_reserved_kb) as [SM reserved],
sum(shared_memory_committed_kb) as [SM committed],
sum(multi_pages_kb) as [Multipage allocator],
sum(single_pages_kb) as [Singlepage allocator]
from sys.dm_os_memory_clerks
group by type
order by type
内存中的数据页面由哪些表格组成,各占多少: sys.dm_os_buffer_descriptors
dbcc dropcleanbuffers --clean up data buffer
go
declare @name varchar(100)
declare @cmd varchar(5000)
declare c1 cursor for select name from sys.sysdatabases
open c1
fetch next from c1 into @name
while @@FETCH_STATUS=0
begin
print @name
set @cmd= 'select b.database_id, db=db_name(b.database_id), object_name(p.object_id), p.index_id, buffer_count_kb=count(*)*8 from ' +@name+'.sys.allocation_units a, '+@name+'.sys.dm_os_buffer_descriptors b,'+@name +'.sys.partitions p where a.allocation_unit_id=b.allocation_unit_id and a.container_id=p.hobt_id ' +' and b.database_id=db_id('''+@name+''')' +' group by b.database_id, p.object_id, p.index_id ' +' order by b.database_id, buffer_count_kb desc '
print @cmd
exec(@cmd)
fetch next from c1 into @name
end
close c1
deallocate c1
go
执行计划都缓存了些什么? 哪些比较占内存?
select objtype, sum(size_in_bytes) as sum_size_in_bytes, count(bucketid) as cache_counts
from sys.dm_exec_cached_plans
group by objtype
select usecounts,refcounts, size_in_bytes, cacheobjtype, objtype, text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(plan_handle)
order by objtype desc;
go
-----------------华丽的分割线: 下面开始分析 数据页面(database page), buffer pool 里的stolen, multi-page 这三部分------------------------------------
数据缓冲区压力分析:
Lazy writes/sec 高
Page life expectancy 低
Page reads/sec 高: 正常这个值应该接近0, 指从数据文件读取的数据量
Stolen pages 降低
sys.sysprocesses 中出现一些连接等待I/O完成的现象: PAGEIOLATCH_SH
使用DMV分析启动以来做read最多的语句:
--按照物理读的页面数排序,前50名:
select top 50
qs.total_physical_reads, qs.execution_count,
qs.total_physical_reads / qs.execution_count as [avg IO],
substring(qt.text, qs.statement_start_offset/2,
(case when qs.statement_end_offset = -1 then len(convert(nvarchar(max), qt.text)) * 2
else qs.statement_end_offset end - qs.statement_start_offset)/2) as query_text,
qt.dbid, dbname=db_name(qt.dbid),
qt.objectid, qs.sql_handle, qs.plan_handle
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as qt
order by qs.total_physical_reads desc
--按照逻辑读的页面数排序,前50名
select top 50
qs.total_logical_reads, qs.execution_count,
qs.total_logical_reads / qs.execution_count as [avg IO],
substring(qt.text, qs.statement_start_offset/2,
(case when qs.statement_end_offset = -1 then len(convert(nvarchar(max), qt.text)) * 2
else qs.statement_end_offset end - qs.statement_start_offset)/2) as query_text,
qt.dbid, dbname=db_name(qt.dbid),
qt.objectid, qs.sql_handle, qs.plan_handle
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as qt
order by qs.total_logical_reads desc
--使用trace 文件分析做read最多的语句
select * into sample
from fn_trace_gettable('c:sample.trc',default)
where eventclass in (10, 12) -- 10: RPC completed 12: SQL batch completed
select top 1000 textdata, databaseid, hostname, applicationname, loginname, spid
starttime, endtime,duration,reads,writes,cpu
from sample
order by reads desc
--使用ReadTrace工具分析trace文件,找出使用大量系统资源的语句
使用方法见稍后发出。。。
Stolen Memory缓存压力分析:
Stolen memory 凡是以8K为分配单位的,保存在buffer pool里。 大于8K的,保存在MemToLeave里。
缓存: 执行计划,用户安全上下文,连接的数据结构和输入/输出缓冲区
没有缓存: 语义分析,优化,排序,Hash,计算
Stolen memory 在不同的SQL版本,最大的限制是不同的,见下表:
表征与解决方法:查看 sys.sysprocesses 里面的连接等待waittype字段不等于0x0000: (可以手动 DBCC FREEPROCCACHE)
1. CMEMTHREAD (0x00B9)
2. SOS_RESERVEDMEMBLOCKLIST (0x007B)
3. RESOURCE_SEMAPHORE_QUERY_COMPILE(0x011A)