mysql - correct way to handle select + insert to avoid duplicates errors -


hello have simple mysql innodb table 2 fields:

  • id - autoincrement primary index
  • name - unique index

i importing data various sources in parallel , need make sure data not duplicated @ insert doing following:

select `id` `table` `name` = <name>;  if `id` <= 0   insert `table` set `name` = "<name>";   return auto_increment  else return `id` 

this works 99.9999% of times can happen (and has happened me) 2 or more different scripts inserting same data because both select returned id <=0 both insert , 1 of them raises error.

i have 2 possible solutions in mind not sure work best.

one more piece of information: import not find elements in table more of elements inserted probability found grows. final table, after rough calculation, have approximately 7-10 million records :

select `id` `table` `name` = <name>; if `id` <= 0  insert ignore `table` set `name` = "<name>";  auto_increment   if auto_increment <=0    select `id` `table` `name` = <name>;    return `id`   else return auto_increment else return `id` 

or

insert ignore `table` set `name` = "<name>"; auto_increment  if auto_increment <=0   select `id` `table` `name` = <name>;   return `id`  else return auto_increment 

you're hitting race condition. when code detects new insertion necessary, 2 of clients racing first insert value. it's winner-take-all. need write code avoid race condition. fortunately, sql designed it's possible that.

you have couple of choices here, both specific mysql's dialect of sql.

one use built-in function last_insert_id(). serves purpose believe mean get auto_increment.

the other use insert ... on duplicate key update.

it looks logic intended 2 things:

  1. make sure name value in table, putting there if isn't already.
  2. return id value associated name value.

you can so.

insert ignore `table` (name) values (<name>); select id `table` name = <name>; 

notice insert ignore operation doesn't caught race conditions between different programs hitting database, because it's single sql statement.

you can, possibly, optimize using last_insert_id().

insert ignore `table` (name) values (<name>); if (last_insert_id()=0) select. 

Comments

Popular posts from this blog

google chrome - Developer tools - How to inspect the elements which are added momentarily (by JQuery)? -

angularjs - Showing an empty as first option in select tag -

php - Cloud9 cloud IDE and CakePHP -