Practical Notes on Deduplicating Data in MySQL
On a production server, a bug in the application imported a large number of duplicate rows. My first idea was to deduplicate directly with SQL. For example, to deduplicate on the login_name column, keeping the row with the smallest id for each duplicated value:
DELETE FROM `oldTable`
WHERE login_name IN (
    SELECT a.login_name
    FROM (
        SELECT login_name
        FROM `oldTable`
        GROUP BY login_name
        HAVING COUNT(login_name) > 1
    ) AS a
)
AND id NOT IN (
    SELECT b.aa
    FROM (
        SELECT MIN(id) AS aa
        FROM `oldTable`
        GROUP BY login_name
        HAVING COUNT(login_name) > 1
    ) AS b
);

The extra derived tables (a and b) are not decoration: MySQL does not allow a subquery in a DELETE to read directly from the table being deleted (error 1093), and wrapping each subquery in a derived table works around that restriction.
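Before running a destructive DELETE on production data, it is worth previewing exactly which rows it would remove. The same predicates can be reused in a SELECT (a sketch using the same table and column names as above):

```sql
-- Preview the rows the DELETE above would remove:
-- duplicated login_name values, excluding the MIN(id) row that is kept
SELECT id, login_name
FROM `oldTable`
WHERE login_name IN (
    SELECT a.login_name
    FROM (
        SELECT login_name
        FROM `oldTable`
        GROUP BY login_name
        HAVING COUNT(login_name) > 1
    ) AS a
)
AND id NOT IN (
    SELECT b.aa
    FROM (
        SELECT MIN(id) AS aa
        FROM `oldTable`
        GROUP BY login_name
        HAVING COUNT(login_name) > 1
    ) AS b
);
```

If the row count this returns matches expectations, the SELECT can be switched back to the DELETE with confidence.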
On a large table, the approach above is very slow. A better option is to use a unique index: create a new table, newTable, with the same structure as oldTable, and add a unique index on login_name.
ALTER TABLE newTable ADD UNIQUE INDEX(login_name);
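Putting the whole approach together, a minimal sketch (assuming the oldTable/newTable names used above):

```sql
-- Create an empty table with the same structure as the original
CREATE TABLE newTable LIKE oldTable;

-- A UNIQUE index on login_name makes MySQL reject any second row
-- with a login_name that is already present
ALTER TABLE newTable ADD UNIQUE INDEX(login_name);

-- INSERT IGNORE silently skips rows that would violate the unique
-- index, so only the first row seen per login_name is copied over
INSERT IGNORE INTO newTable SELECT * FROM oldTable;
```

Note that `CREATE TABLE ... LIKE` copies the column definitions and indexes of oldTable but not its data, which is exactly what is needed here.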
If the data volume is very large, the copy can be done in batches, filtered by a timestamp column or by primary-key id ranges.
INSERT IGNORE INTO newTable SELECT * FROM oldTable WHERE START_TIME >= '2021-08-20 00:00:00' AND START_TIME < '2021-08-23 00:00:00';
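When the table has no usable timestamp column, batching by primary key works the same way. A sketch, assuming id is an auto-increment primary key (the batch size of 100000 is an arbitrary example):

```sql
-- Copy one id range at a time; repeat with the next range
-- (100001-200000, 200001-300000, ...) until the max id is reached
INSERT IGNORE INTO newTable
SELECT * FROM oldTable
WHERE id >= 1 AND id <= 100000;
```

Once newTable holds the deduplicated data, the tables can be swapped with `RENAME TABLE oldTable TO oldTable_bak, newTable TO oldTable;`, keeping the original around as a backup until the result is verified.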