年初时,写过一篇去重的,在小表中还能用用,在大表中真的是效率低下,现在给了一次优化
https://www.cnblogs.com/jarjune/p/8328013.html
继上一篇文章
方法三:
DELIMITER //
DROP PROCEDURE IF EXISTS delete_rows_2;
CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN
DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
SET DELETE_TABLE_ROWS_SQL = CONCAT('
DELETE
FROM
', TABLENAME ,'
WHERE
(', FIELDNAMES ,') IN (
SELECT ', FIELDNAMES ,'
FROM (
SELECT
', FIELDNAMES ,'
FROM
', TABLENAME ,'
GROUP BY
', FIELDNAMES ,'
HAVING
COUNT(1) > 1
) t1
)
AND ', AUTOFIELD ,' NOT IN (
SELECT ', AUTOFIELD ,'
FROM (
SELECT
MAX(', AUTOFIELD ,') ', AUTOFIELD ,'
FROM
', TABLENAME ,'
GROUP BY
', FIELDNAMES ,'
HAVING
COUNT(1) > 1
) t2
)
');
SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
END//
DELIMITER ;
CALL delete_rows_1('表名', '字段1,字段2,字段3...', '主键(唯一)字段');
之后发现删除的效率还是挺低,又优化成
方法三(优化):
DELIMITER //
DROP PROCEDURE IF EXISTS delete_rows_2;
CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN
DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);
SET DELETE_TABLE_ROWS_SQL = CONCAT('
DELETE
FROM
', TABLENAME ,'
WHERE
', AUTOFIELD ,' IN (
SELECT
', AUTOFIELD ,'
FROM
(
SELECT
', AUTOFIELD ,'
FROM
', TABLENAME ,'
WHERE
(', FIELDNAMES ,') IN (
SELECT
', FIELDNAMES ,'
FROM
', TABLENAME ,'
GROUP BY
', FIELDNAMES ,'
HAVING
COUNT(1) > 1
)
AND ', AUTOFIELD ,' NOT IN (
SELECT
MAX(', AUTOFIELD ,')
FROM
', TABLENAME ,'
GROUP BY
', FIELDNAMES ,'
HAVING
COUNT(1) > 1
)
) t2
)
');
SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;
PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;
END//
DELIMITER ;
CALL delete_rows_2('表名', '字段1,字段2,字段3...', '主键字段');
由于上述都要group by 两次,又换了一种思路
方法四
DELETE t1
FROM
l_weijij_47 t1,
(
SELECT
f01,
f02,
f03,
MAX(seq_value) seq_value
FROM
l_weijij_47
GROUP BY
f01,
f02,
f03
HAVING
COUNT(1) > 1
ORDER BY NULL
) t2
where
t1.f01 = t2.f01
AND t1.f02 = t2.f02
AND t1.f03 = t2.f03
and t1.seq_value < t2.seq_value
注:group by默认会进行排序,所以要加上order by NULL就避免了排序
group by a,b,c的时候,a,b,c一定要加索引才快
综上,方法四是目前在用的去重。