• Mysql删除重复的数据 hello*


    最近在做一个多线程的爬虫程序,由于队列中有重复的数据,尽管程序中有判断不存在则插入,但由于多个线程并发,导致数据库中存在部分重复的数据。

     程序中的bug已经修复,但重新爬一遍耗时耗力,于是就选择删除重复的数据,只保留一条有效数据

    解决的思路就是根据确定其数据唯一的聚合字段进行分组,然后只保留一条有效数据

    1.查询重复数据

    select * FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)

     2.只保留Id最小的1条数据,过滤出要被删除的数据

    select * FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)
        AND id NOT IN(SELECT
                        id
                      FROM (SELECT
                              MIN(id) AS id
                            FROM ZYZBBData
                            GROUP BY code,year,report_type
                            HAVING COUNT( * ) > 1) b)

    3.删除重复的数据

    DELETE
    FROM ZYZBBData
    WHERE (code,year,report_type) IN (SELECT
                              code,
                              year,
                              report_type
                            FROM (SELECT
                                    code,
                                    year,
                                    report_type
                                  FROM ZYZBBData
                                  GROUP BY code,year,report_type
                                  HAVING COUNT( * ) > 1) a)
        AND id NOT IN(SELECT
                       id
                      FROM (SELECT
                              MIN(id) AS id
                            FROM ZYZBBData
                            GROUP BY code,year,report_type
                            HAVING COUNT( * ) > 1) b)

     数据正常

  • 相关阅读:
    开源操作系统发行版特性学习
    ssar
    OpenEuler特性学习 —— 统一共享内存 (share_pool)
    OpenEuler特性学习 —— Limit Pagecache
    oomd、lmkd与PSI
    使用git rebase onto一例
    Alibaba Cloud Linux 资源隔离及混部技术
    sysAK
    walk_tg_tree_from的图解
    编译ubuntu内核
  • 原文地址:https://www.cnblogs.com/HTLucky/p/15516231.html
Copyright © 2020-2023  润新知