dp.cx blog

Posted on

Filed under linux, mysql, software-raid, and migrations

Is it possible software raid 1 to not actually record data writes until some time later? I know it's possible with hardware raid, but I've never heard of that with software raid.

Yesterday, a database migration ran in production. The table that we were inserting data to does not have an auto incrementing column; we're forced to use SELECT MAX(column)+1. Once the migration finished, we ended up with 2 instances of ~34k records with the same data, and same id, and 6-8 instances of ~600 records with the same data and same id.

We have triggers on the table for inserts, updates, and deletes, that log to another table, for our data warehouse project.

In each of the 8-10 instances above, all of the records with the same values were inserted at the same second.

The only thing I can think is that the raid didn't sync, even though it suggested it had written, and caused the SELECT MAX(column)+1 to return the same value.

A friend thinks that I've got "concurrency issues, probably several threads that do a "SELECT MAX(column)+1", buffer it while several others get the same result, then they all insert rows with the same id". But my migration runs in a loop, doing a new SELECT MAX(column)+1 each time, and a new insert each time. I can't think of a way where this would queue up in that fashion.

I'm annoyed enough by a missing auto incrementing column, and being forced to SELECT MAX(column)+1; but it's something we'll have to live with for a while longer. But I'm stymied by the fact that our triggers all show the records being inserted at exactly the same time, across (in two cases) ~34k records.