Anton Vinogradov
Company: Apache Software Foundation
Imagine you were editing a document, but deleted it by mistake. Rolling back to Report3_release2FinalLast-Fixed!!!4.txt.bak.bak, saved on a flash drive, and a couple of memory additions would fix the problem.
Now imagine that several people were editing a document online and the server went down. A server backup and coordinated work by the authors of the document would solve the problem.
Finally, imagine that thousands of people edited millions of documents on hundreds of servers with asynchronous replication to a backup cluster, but a bug in the code caused every million changes within each cluster to be lost. Is there a solution to such a problem?
The speaker will tell you what to do when code-review, failover, and certification did not help avoid a distributed database crash.
Company: Apache Software Foundation