Countdown to pgCon

As you may be aware, pgCon 2010, the PostgreSQL conference, is just a few months away.  It is an important year for pgCon, with the release of PostgreSQL 9.0 coming up.  pgCon's roots are in the PostgreSQL Anniversary Summit, where hackers from all over the world flew in to celebrate PostgreSQL's 10 years as open source.  pgCon has kept the tradition of catering to the core of the PostgreSQL community, hackers and DBAs alike.

As the largest PostgreSQL conference, pgCon is where you can always expect to see new features and the latest thinking on PostgreSQL performance and scaling, not to mention the many talks aimed at all levels of PostgreSQL professionals and enthusiasts.  To get ready for pgCon this year, I thought I might point out the talks not to miss.

One of the biggest talks will have to be Heikki Linnakangas's introduction of the new built-in replication features in PostgreSQL, Hot Standby and Streaming Replication.  If you're not familiar with these features, they are truly game changing for Postgres.  Building on the conceptual foundation of Warm Standby and Point-in-Time Recovery log shipping, Hot Standby and Streaming Replication turn idle Warm Standby boxes into active read-only slaves.  Married with an HA solution, Hot Standby becomes an immediate failover option: the standby can disconnect from its master server and start accepting writes.  To speed up the whole log-shipping paradigm, Streaming Replication does away with external commands for copying log segments across the wire to Warm Standby servers.  By combining these two features, one of the most sought-after features for PostgreSQL, native replication, becomes a reality.  Heikki brings to this talk an intimate knowledge of the implementation of these two new features, making it a must-see.
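For a rough sense of what turning this on involves, here is a small Python sketch that renders the relevant settings for a primary and a standby.  The setting names are the ones used in the released PostgreSQL 9.0 and may differ in pre-release builds; the host name, user and trigger file path are placeholders I made up, and `render` is just a helper for this example.

```python
# Sketch only: setting names follow the released PostgreSQL 9.0.
# Host, user and file paths below are made-up placeholders.

PRIMARY_CONF = {                  # postgresql.conf on the primary
    "wal_level": "hot_standby",   # write enough WAL for read-only standbys
    "max_wal_senders": 3,         # allow up to 3 streaming connections
}

STANDBY_CONF = {                  # postgresql.conf on the standby
    "hot_standby": "on",          # accept read-only queries during recovery
}

STANDBY_RECOVERY = {              # recovery.conf on the standby
    "standby_mode": "on",
    "primary_conninfo": "host=primary.example.com port=5432 user=replicator",
    "trigger_file": "/tmp/promote_standby",  # creating this file ends recovery
}


def render(settings):
    """Render a dict as `name = value` lines, quoting string values."""
    lines = []
    for name in sorted(settings):
        value = settings[name]
        if isinstance(value, str):
            value = "'%s'" % value.replace("'", "''")
        lines.append("%s = %s" % (name, value))
    return "\n".join(lines)


if __name__ == "__main__":
    for title, conf in [("primary postgresql.conf", PRIMARY_CONF),
                        ("standby postgresql.conf", STANDBY_CONF),
                        ("standby recovery.conf", STANDBY_RECOVERY)]:
        print("# " + title)
        print(render(conf))
```

With settings along these lines the standby streams WAL directly from the primary and serves read-only queries while it does so; creating the trigger file is what promotes it to accept writes.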

If you haven't been to pgCon before and are a PostgreSQL user, DBA or enthusiast like myself, this is the year not to miss it.  As for the other significant talks at pgCon 2010, stay tuned: I will highlight my must-see list of talks in the coming weeks.

Golconde 0.4 Released

I am pleased to announce the first beta release of Golconde, 0.4.

Golconde is a queue-based replication solution for PostgreSQL written in Python 2.6.

It is designed to be loosely coupled and to rely upon existing enterprise messaging systems that support the STOMP protocol. Designed to scale easily and with multi-data center implementations in mind, the application and the message queues used for distribution live outside of the database. Decoupling Golconde from PostgreSQL differentiates it from existing replication solutions: the workload moves from the database tier, where CPU, RAM and IO overhead can be very expensive, to a commodity layer where the operational cost of performing the data distribution work is much lower. In a typical Golconde target database, the PostgreSQL operational overhead is similar to that of a normal client write workload.
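To make the queueing side a little more concrete, here is a minimal sketch of what handing one change to a STOMP broker looks like at the protocol level.  The frame layout (command line, headers, blank line, body, NUL terminator) is plain STOMP; the JSON message shape, the queue name and the `build_send_frame` helper are assumptions made up for this example, not Golconde's actual wire format.

```python
import json


def build_send_frame(destination, action, table, row):
    """Build a STOMP SEND frame carrying one change as a JSON body.

    The {"action", "table", "data"} message shape is hypothetical.
    """
    body = json.dumps({"action": action, "table": table, "data": row},
                      sort_keys=True)
    payload = body.encode("utf-8")
    headers = [
        "destination:%s" % destination,
        "content-length:%d" % len(payload),
    ]
    # A frame is: COMMAND, header lines, a blank line, the body, then NUL.
    head = "SEND\n" + "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + payload + b"\x00"


if __name__ == "__main__":
    frame = build_send_frame("/queue/golconde.example", "upsert", "users",
                             {"id": 1, "email": "a@example.com"})
    print(repr(frame))
```

Because the frame is just bytes on a socket, any broker that speaks STOMP can carry it, which is what keeps the distribution work out of the database tier.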

For more information, including downloads, please visit http://code.google.com/p/golconde/

Golconde 0.3 Released

Today I posted 0.3, dubbed a test release.  It contains the fully functioning golconde daemon, configuration and usage examples in the test directory, and the ability to use triggers to enqueue messages for distribution.

It can be downloaded here.

I consider this the first stable test release and will be using it as a foundation for subsequent releases.  The roadmap is as follows:

0.4 - Client classes to abstract the enqueue process from the protocol level.
0.5 - AMQP support in addition to STOMP support, and possibly 0MQ support.
0.6 - Two-phase-commit-like behavior for non-trigger application flows via rollback commands.

In addition, I have already added more documentation to the Golconde wiki and will keep adding to it as time permits.  If you have an opportunity to play with or test it, I’d love to hear your thoughts.

Retooling Golconde

After careful examination and internal discussions about Golconde, we have decided to re-architect it.  One of the primary reasons is that, while great tools like Slony and Londiste exist for replicating data, our goal is slightly different.  Our primary goal is still to create a BASE-like data distribution system, but the emphasis is shifting away from PostgreSQL and trigger-based queueing.

One of our reasons for doing this is that, as our database has evolved, the schema has changed in many areas, yet legacy schema still exists at core points in our database.  By engineering Golconde to look at data distribution channels or targets in a generic sense, we can take different actions based upon the various destinations for our data.  One such action is the “autoSQL” feature, which marries like-named columns and actions such as delete and upsert with the appropriate PostgreSQL commands.  We are also able to create destination-specific distribution handlers for different schemas or workflows altogether.  Consider the following diagram illustrating a possible Golconde workflow:

[Diagram: a possible Golconde workflow]

In a scenario where the first Destination Table has a different schema than the others, the golcondeDistribution.py application will route a different set of instructions for distributing data and management commands than it will for Destination Tables #2 and #3.  With a simplified enqueue / insertion point, this allows us to manage our different database architectures with common data in one streamlined, scalable location.
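As a rough illustration of the idea (not Golconde's actual code), the sketch below pairs an autoSQL-style generator with per-destination handlers.  All table names, column names and function names here are hypothetical.  Since PostgreSQL currently has no native upsert command, the sketch assumes an upsert is carried out as an UPDATE followed by an INSERT that the caller runs only if the UPDATE touched no rows.

```python
def auto_sql(action, table, key, row, target_columns):
    """Return a list of (sql, params) for one message.

    Only columns present in both the message and the target table are used.
    Table and column names must come from trusted configuration, since they
    are interpolated into the SQL text; values are passed as parameters.
    """
    cols = [c for c in sorted(row) if c in target_columns]
    if key not in cols:
        raise ValueError("message lacks key column %r" % key)
    if action == "delete":
        return [("DELETE FROM %s WHERE %s = %%s" % (table, key), [row[key]])]
    if action == "upsert":
        others = [c for c in cols if c != key]
        statements = []
        if others:
            assignments = ", ".join("%s = %%s" % c for c in others)
            statements.append((
                "UPDATE %s SET %s WHERE %s = %%s" % (table, assignments, key),
                [row[c] for c in others] + [row[key]],
            ))
        # The caller runs this INSERT only when the UPDATE matched no rows.
        placeholders = ", ".join(["%s"] * len(cols))
        statements.append((
            "INSERT INTO %s (%s) VALUES (%s)"
            % (table, ", ".join(cols), placeholders),
            [row[c] for c in cols],
        ))
        return statements
    raise ValueError("unsupported action %r" % action)


def legacy_users_handler(action, row):
    """Destination-specific handler: the legacy table names its columns differently."""
    renamed = {"user_id": row["id"], "email_address": row.get("email")}
    return auto_sql(action, "legacy_users", "user_id", renamed,
                    set(["user_id", "email_address"]))


def users_handler(action, row):
    """Default handler: column names already match, so autoSQL is enough."""
    return auto_sql(action, "users", "id", row, set(["id", "email"]))


# One handler per destination; destinations 2 and 3 share the same schema.
DESTINATIONS = {
    "destination_1": legacy_users_handler,
    "destination_2": users_handler,
    "destination_3": users_handler,
}


def distribute(action, row):
    """Build the statements each destination should run for one message."""
    return dict((name, handler(action, row))
                for name, handler in DESTINATIONS.items())
```

A single enqueued message then fans out into a different set of statements per destination, while the application only ever deals with the one enqueue point.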

By decoupling Golconde from PostgreSQL we also differentiate it from existing replication solutions, moving the workload from the database tier, where CPU, RAM and IO overhead can be very expensive, to a commodity layer where the operational cost for performing the data distribution work is much less expensive.  In a typical Golconde scenario in our environment, the PostgreSQL operational overhead is similar to a client write workload.

While we’re early down the re-engineering road, it is my belief that we can build a true BASE implementation that, with proper exception handling, can ensure data consistency across multiple architectures, databases and data centers with minimal overhead.

Highload++ Presentation

I have uploaded my slides from Highload++, a web scaling conference in Moscow.  It was a great trip and a great conference, and I was impressed with the size of the PostgreSQL community.  You can view the slides here and the video here.

Worth noting outside of the slide deck is that PostgreSQL has proven to scale to 32 cores for us at MyYearbook.com.  We are currently running the 8.2 line of Postgres on HP 785G5 boxes with 32 cores and 256GB of RAM.  We found this gave us significant headroom compared with servers that were eating CPU at 16 cores.

I’d like to thank Nikolay Samokhvalov and the Highload++ Conference for having me come out and speak at the conference.  Nikolay is doing great things in Russia for Postgres and the Postgres user group in Moscow is truly an impressive sight.

Golconde Update

I’m in the middle of moving the Golconde code around to use stomp.py instead of libamqcpp.  In production testing, I found that a memory bug in libamqcpp was causing PostgreSQL to dump core, which I figure wouldn’t do in a hardcore production environment.  I’ve converted the triggers, which are the PostgreSQL side, and still need to do the client apps.  In addition, because stomp.py uses a different connection scheme than libamqcpp, I have to create new configuration tables.

The goal is to release 0.4 sometime next week.