mysql - correct way to handle select + insert to avoid duplicates errors -
hello have simple mysql innodb table 2 fields:
- id - autoincrement primary index
- name - unique index
i importing data various sources in parallel , need make sure data not duplicated @ insert doing following:
select `id` `table` `name` = <name>; if `id` <= 0 insert `table` set `name` = "<name>"; return auto_increment else return `id`
this works 99.9999% of times can happen (and has happened me) 2 or more different scripts inserting same data because both select returned id
<=0 both insert , 1 of them raises error.
i have 2 possible solutions in mind not sure work best.
one more piece of information: import not find elements in table more of elements inserted probability found grows. final table, after rough calculation, have approximately 7-10 million records :
select `id` `table` `name` = <name>; if `id` <= 0 insert ignore `table` set `name` = "<name>"; auto_increment if auto_increment <=0 select `id` `table` `name` = <name>; return `id` else return auto_increment else return `id`
or
insert ignore `table` set `name` = "<name>"; auto_increment if auto_increment <=0 select `id` `table` `name` = <name>; return `id` else return auto_increment
you're hitting race condition. when code detects new insertion necessary, 2 of clients racing first insert value. it's winner-take-all. need write code avoid race condition. fortunately, sql designed it's possible that.
you have couple of choices here, both specific mysql's dialect of sql.
one use built-in function last_insert_id()
. serves purpose believe mean get auto_increment
.
the other use insert ... on duplicate key update
.
it looks logic intended 2 things:
- make sure
name
value in table, putting there if isn't already. - return
id
value associated name value.
you can so.
insert ignore `table` (name) values (<name>); select id `table` name = <name>;
notice insert ignore
operation doesn't caught race conditions between different programs hitting database, because it's single sql statement.
you can, possibly, optimize using last_insert_id()
.
insert ignore `table` (name) values (<name>); if (last_insert_id()=0) select.
Comments
Post a Comment