After careful examination and internal discussion, we have decided to re-architect Golconde. One of the primary reasons is that, while great tools like Slony and Londiste exist for replicating data, our goal is slightly different. Our primary goal is still to create a BASE-like data distribution system, but the emphasis is shifting away from PostgreSQL and trigger-based queueing.
One of our reasons for doing this is that, as our database has evolved, the schema has changed in many areas while legacy schema still exists at core points. By engineering Golconde to treat data distribution channels, or targets, in a generic sense, we can take different actions depending on the destination of our data. One such action is the “autoSQL” feature, which marries like-named columns and actions such as delete and upsert with the appropriate PostgreSQL commands. We can also create destination-specific distribution handlers for different schemas or entirely different workflows. Consider the following diagram illustrating a possible Golconde workflow:
In a scenario where the first Destination Table has a different schema than the others, the golcondeDistribution.py application routes a different set of data distribution and management instructions to it than it does to Destination Tables #2 and #3. Combined with a simplified enqueue / insertion point, this lets us manage our different database architectures, and the data they have in common, in one streamlined, scalable location.
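To make the routing idea concrete, here is a minimal sketch of how per-destination handlers and an autoSQL-style default might fit together. It is not Golconde's actual code: the message shape, the table and column names, and the functions `auto_sql`, `legacy_handler`, `make_auto_handler` and `distribute` are all hypothetical. It also assumes `INSERT ... ON CONFLICT` for the upsert, which requires PostgreSQL 9.5 or later, and it only builds statements rather than executing them.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Message = Dict[str, Any]           # assumed shape: {"action": ..., "data": {...}}
Statement = Tuple[str, List[Any]]  # SQL text with %s placeholders, plus parameters


def auto_sql(table: str, columns: Sequence[str], keys: Sequence[str],
             message: Message) -> Statement:
    """Build one statement by matching message fields to like-named columns.

    Table and column names are interpolated directly, so they must come
    from trusted configuration, never from the message itself.
    """
    data = message["data"]
    # Keep only the fields whose names match a column in the destination.
    matched = [c for c in columns if c in data]
    missing = [k for k in keys if k not in data]
    if missing:
        raise ValueError(f"message lacks key column(s): {missing}")

    if message["action"] == "delete":
        where = " AND ".join(f"{k} = %s" for k in keys)
        return f"DELETE FROM {table} WHERE {where}", [data[k] for k in keys]

    if message["action"] == "upsert":
        cols = ", ".join(matched)
        marks = ", ".join(["%s"] * len(matched))
        updates = ", ".join(f"{c} = EXCLUDED.{c}"
                            for c in matched if c not in keys)
        conflict = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
        sql = (f"INSERT INTO {table} ({cols}) VALUES ({marks}) "
               f"ON CONFLICT ({', '.join(keys)}) {conflict}")
        return sql, [data[c] for c in matched]

    raise ValueError(f"unsupported action: {message['action']}")


# Hypothetical mapping from common field names to a legacy schema.
LEGACY_RENAMES = {"id": "user_id", "name": "user_name"}


def legacy_handler(message: Message) -> Statement:
    """Destination-specific handler: rename fields, then reuse auto_sql."""
    data = {LEGACY_RENAMES[k]: v
            for k, v in message["data"].items() if k in LEGACY_RENAMES}
    return auto_sql("legacy_users", list(LEGACY_RENAMES.values()),
                    ["user_id"], {"action": message["action"], "data": data})


def make_auto_handler(table: str, columns: Sequence[str],
                      keys: Sequence[str]) -> Callable[[Message], Statement]:
    """Default handler for destinations whose columns match the message."""
    return lambda message: auto_sql(table, columns, keys, message)


# Destination 1 has a different schema; 2 and 3 share the common one.
DESTINATIONS: Dict[str, Callable[[Message], Statement]] = {
    "destination_1": legacy_handler,
    "destination_2": make_auto_handler("users", ["id", "name", "email"], ["id"]),
    "destination_3": make_auto_handler("users", ["id", "name", "email"], ["id"]),
}


def distribute(message: Message) -> Dict[str, Statement]:
    """Route one enqueued message to every destination's handler."""
    return {name: handler(message) for name, handler in DESTINATIONS.items()}
```

Calling `distribute({"action": "delete", "data": {"id": 7}})` yields a `DELETE` against `legacy_users` keyed on `user_id` for the first destination and a `DELETE` against `users` keyed on `id` for the other two. In a real deployment each statement would be passed, with its parameters, to that destination's database driver.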
Decoupling Golconde from PostgreSQL also differentiates it from existing replication solutions: the workload moves from the database tier, where CPU, RAM and IO overhead can be very expensive, to a commodity layer where the operational cost of performing the data distribution work is much lower. In a typical Golconde scenario in our environment, the operational overhead on PostgreSQL is similar to that of a client write workload.
While we’re still early on the re-engineering road, it is my belief that we can build a true BASE implementation that, with proper exception handling, can ensure data consistency across multiple architectures, databases and data centers with minimal overhead.