今天发现线上数据库主从延迟严重:
SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, dr_state.database_id as database_id, dr_state.redo_queue_size, is_ag_replica_local = CASE WHEN ar_state.is_local = 1 THEN N'LOCAL' ELSE 'REMOTE' END , ag_replica_role = CASE WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED' ELSE ar_state.role_desc END FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id ) JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id) JOIN sys.dm_hadr_database_replica_states dr_state on ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;
从库大量日志没有做,当时就想到可能是从库有事物没有执行完毕,查看了一下未结束的事物和锁信息,发现并不是这个原因,查看错误日志:
消息
Timeout occurred while waiting for latch: class 'COLUMNSTORE_ROWGROUP_COLLECTION', id 00000030F72767A0, type 4, Task 0x00000030FF058CA8 : 0, waittime 44400 seconds, flags 0x1a, owning task 0x00000030FF058CA8. Continuing to wait.
我擦,这不是上次碰到的问题吗,假定您启用 Microsoft SQL Server 2014年的 AlwaysOn 可用性组功能。在 rowgroups 转换为压缩的状态将导致在主站点上聚集 Columnstore 索引 (CCI) 为插入数据时,在辅助站点上的重做线程可能会遇到闩锁超时错误。COLUMNSTORE_ROWGROUP_COLLECTION内部锁争用,这里说明下,这是SQL Server2014 12.0.2000的一个BUG,列存储锁争用,我们需要升级补丁或者重启数据库才能够解决,我是碰到两次了,最终解决办法就是给从库打了一个补丁,完美解决。