原来文地址:http://blog.csdn.net/emili/archive/2008/03/17/2192081.aspx
------------------------
大概记录下先,有时间再整理,最后优化的代码如下:
表结构猜测
FACT(用户号,操作时间,动作)
FACT应该是跟踪用户操作的一张表,
现在需要根据这张表记录的数据来统计,给定时间(今天)一共有多少个用户活动过(上线过)以及这些用户中新用户有多少个.
这个存储过程每天定时远行,将昨天的的统计数据插入到表PRT_IM_USERINFO_DAILY中
========优化前的存储过程
CREATE PROCEDURE [dbo].[pro_ImAnalysis_daily]
@THEDATE VARCHAR(30)
AS
BEGIN
IF @THEDATE IS NULL
BEGIN
SET @THEDATE=CONVERT(VARCHAR(30),GETDATE()-1,112);
END
DELETE FROM RPT_IM_USERINFO_DAILY WHERE THEDATE=@THEDATE;
INSERT RPT_IM_USERINFO_DAILY (THEDATE,ALLUSER,NEWUSER)
SELECT AA.THEDATE,ALLUSER,NEWUSER
FROM
( ( SELECT THEDATE,COUNT(DISTINCT USERID) ALLUSER
FROM FACT
WHERE THEDATE=@THEDATE
GROUP BY THEDATE
) AA
LEFT JOIN
(SELECT THEDATE,COUNT(DISTINCT USERID) NEWUSER
FROM FACT T1
WHERE NOT EXISTS(
SELECT 1
FROM FACT T2
WHERE T2.THEDATE<@THEDATE
AND T1.USERID=T2.USERID)
AND T1.THEDATE=@THEDATE
GROUP BY THEDATE
) BB
ON AA.THEDATE=BB.THEDATE);
GO
由于“Parameter sniffing”的特性,导致上面的存储过程执行缓慢,具体请看上面给出的原文地址,下面分析新存储过程:
========优化后的存储过程
ALTER PROCEDURE [dbo].[pro_ImAnalysis_daily]
@var_thedate VARCHAR(30)
AS
BEGIN
declare @THEDATE VARCHAR(30)
IF @var_thedate IS NULL
BEGIN
SET @var_thedate=CONVERT(VARCHAR(30),GETDATE()-1,112);
END
SET @THEDATE=@var_thedate;
DELETE FROM RPT_IM_USERINFO_DAILY WHERE THEDATE=@THEDATE;
INSERT RPT_IM_USERINFO_DAILY(THEDATE,ALLUSER,NEWUSER)
select @thedate as thedate,
count(distinct case when today>0 then userid else null end) as alluser,
count(distinct case when dates=0 then userid else null end) as newuser
from
(
select userid,
count(CASE WHEN thedate>=@thedate then null else thedate end) as dates,
count(case when thedate=@thedate then thedate else null end) as today
from FACT
group by userid
)as fact
GO
根据用户号分组, 统计thedate>=@thedate 等价于(thedate<@thedate then thedate else null)即过去登陆(活动过)的统计进来,thedate=@thedate 则表示今天活动过统计进来,那么下面的分组统计就形成这样的数据集:
T1 (用户ID,过去活动次数,今天活动次数)
用户要么过去登录过要么今天登录过,因此T1.[过去活动次数]与T1.[今天活动次数],绝对不能同时为零,于是外围的count 第二句,当 dates=0时,那么就绝对是新用户了.
注意:这里成立的条件是现在是当前时间,如果我今天统计昨天的,全部登录用户跟最新用户,那么如果今天有Userid新进来那么就有可能导致T1.[过去活动次数]与T1.[今天活动次数]都是零的情况,那么dates=0就不能作为判断新用户的依据了