This trilogy is dedicated to Data Compare SQL and is meant to show you what we’ve done so far while the product is under development. Data comparison and transfer comprises three general steps: set-up, comparison, and synchronization. In the first step, the user chooses the servers and databases to compare and adjusts schema/object mapping and comparison options. In the second step, the program compares the data in the tables according to the user’s settings. The last step is about selecting what to synchronize, adjusting synchronization options, and performing the data synchronization itself. At each of these steps, performance is crucial for large databases (I’m talking about thousands of tables and terabytes of record data), because, for example, saving 1 microsecond (10^-6 s) on each record comparison cuts the total comparison time by about 17 minutes (1,000 seconds) for 1 billion records. We have the knowledge required to put the best algorithms and methods together, so we promised to create the fastest product ever, and we really will.
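The per-record arithmetic is easy to check. Here is a minimal sketch in Python; the function and constant names are ours, chosen only for illustration:

```python
# Back-of-the-envelope estimate of the time saved over a whole comparison run.
SAVED_PER_RECORD_S = 1e-6      # 1 microsecond saved on each record comparison
RECORDS = 1_000_000_000        # 1 billion records


def total_saving_seconds(saved_per_record_s: float, records: int) -> float:
    """Total time saved when every record comparison gets faster by the same amount."""
    return saved_per_record_s * records


saved_s = total_saving_seconds(SAVED_PER_RECORD_S, RECORDS)
print(f"{saved_s:.0f} s = {saved_s / 60:.1f} min")  # prints "1000 s = 16.7 min"
```

In other words, a single microsecond per record adds up to roughly a quarter of an hour at this scale.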
Today we finished the first step, set-up, and got astonishing performance! But before showing you the results, let me clarify what we mean by ‘set-up’. Let’s start from the very beginning: you select a source server, then a target server, connect to both of them, and choose a pair of databases to compare. At this point the program must read the databases’ metadata to construct an object model of each database, i.e. describe tables and views, their columns, and so on. Once the program has the two object models, it finds matches between objects by their names. We call this step ‘mapping’. Moreover, any good software performs a deep analysis of every matched pair of columns to find out whether their data types are compatible for comparison and synchronization, so that it can warn the user about possible data loss when the source database schema differs from the target one. This analysis is considered part of mapping.
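To make the idea of mapping more concrete, here is a minimal sketch in Python. It is not the code of Data Compare SQL: the `Table` and `Column` classes, the case-insensitive name matching, and the tiny integer-only compatibility rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str  # e.g. "int", "bigint", "varchar"


@dataclass
class Table:
    schema: str
    name: str
    columns: list[Column] = field(default_factory=list)


# Toy rule (an assumption): integer types ordered by how large a value they can hold.
INT_RANK = {"tinyint": 1, "smallint": 2, "int": 3, "bigint": 4}


def compare_types(source: str, target: str) -> str:
    """Classify a source -> target column type pair."""
    if source == target:
        return "ok"
    if source in INT_RANK and target in INT_RANK:
        # Copying a wider integer into a narrower one may not fit.
        return "ok" if INT_RANK[source] <= INT_RANK[target] else "possible data loss"
    return "incompatible"


def map_by_name(source_items, target_items, key):
    """Pair up items whose keys are equal; unmatched items are skipped."""
    targets = {key(item): item for item in target_items}
    return [(s, targets[key(s)]) for s in source_items if key(s) in targets]


def map_databases(source_tables, target_tables):
    """Match tables, then columns, by name and check each matched column pair."""
    report = []

    def table_key(t):
        return (t.schema.lower(), t.name.lower())

    def column_key(c):
        return c.name.lower()

    for src, tgt in map_by_name(source_tables, target_tables, table_key):
        for sc, tc in map_by_name(src.columns, tgt.columns, column_key):
            verdict = compare_types(sc.data_type, tc.data_type)
            report.append((f"{src.schema}.{src.name}", sc.name, verdict))
    return report


source = [Table("dbo", "Orders", [Column("Id", "bigint"), Column("Qty", "int")])]
target = [Table("dbo", "orders", [Column("id", "int"), Column("Qty", "int"),
                                  Column("Note", "varchar")])]
for row in map_databases(source, target):
    print(row)
```

With this sample data, `Id` is flagged as possible data loss (a `bigint` source going into an `int` target), `Qty` is fine, and `Note` is ignored because it has no counterpart in the source. The dictionary lookup keeps matching roughly linear in the number of objects, which matters once there are thousands of tables.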
Latency chart on constructing database object models and mapping of 22,000 tables
To test the performance of database model construction and object mapping, we took a really huge SQL Server database with approximately 22,000 tables (thanks to our friend Mike). And, of course, we picked two competing products. We won’t name them or their vendors, but in our opinion they are the only two adequate products that pass Optillect’s internal standards of quality and performance. Then we did 10 runs for each product, measured the time of each run, and took the best time for each product. As you can see, the results on the chart speak for themselves. After three weeks of designing Data Compare SQL’s architecture and two more weeks of implementing step 1, we can proudly claim that our approach works 4-5 times faster than the best competing solutions available.
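The measurement procedure itself (several runs, keep the best time) can be sketched in a few lines of Python. This is only an illustration of the method, not our actual test harness: `best_of` and `fake_setup_step` are hypothetical names, and the workload is a stand-in for reading metadata and mapping objects.

```python
import time


def best_of(func, runs=10):
    """Run func `runs` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def fake_setup_step():
    # Hypothetical stand-in for the real set-up work being measured.
    sum(i * i for i in range(100_000))


best = best_of(fake_setup_step, runs=10)
print(f"best of 10 runs: {best * 1000:.2f} ms")
```

Taking the minimum rather than the average filters out runs that were slowed down by unrelated activity on the machine, such as cold caches or background processes.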
Finally, I’d like to say that we spend a lot of time writing unit tests to ensure the product’s quality, and at the moment we have more than 3,000 tests that cover every possible case. When we finish step 2 (data comparison), I will certainly share the results with you in ‘At speed of light – Part 2’. We sincerely believe you’ll enjoy our solutions :) Thanks for reading!
At speed of light - Part 1 (current article)