Saturday, April 2, 2011

Vertica @ NEDB Summit - Crescando, SharedDB, SwissBox

[This post is waaay late -- I wrote it back in January but I am just putting it up now]

Last Friday a crew from Vertica attended the New England Database Summit at MIT (Many thanks to Sam Madden for organizing such a good event). Despite Mingsheng walking off with my backpack (by accident) and almost giving me a heart attack it was a fun day.

I was especially impressed with the keynotes.

The first was Donald Kossmann, a professor at ETH Zurich entitled Predictable Performance for Unpredictable Workloads about a system they had co-developed with Amadeus IT Sophia-Antipolis for their airline reservation tracking system – think the computer that the boarding attendant uses at the gate to find out who is eligible for first class upgrades, who is a vegetarian, etc.

Overall I thought it was an excellent example of one-size-doesn't-fit-all: by designing a system for the specific workload, their result was both an order of magnitude less expensive and could actually meet the SLA promised by Amadeus to its customers.

The SLA promised: no query will exceed 2 seconds. Ever. Regardless of load on the system. It is totally fine if the system returns results in 2 seconds when it is running a single query, but it also must return results in 2 seconds if it is running thousands queries simultaneously. Stewardesses can smile at you for 2 seconds and you won't even notice, but waiting minutes under heavy load, was unacceptable. Think about the recent Icelandic volcano eruption which hosed air travel for months.

Their system was built around a storage manager called Crescando which is a word play on how it is structured: a sharded, distributed, in-memory table which is scanned continuously. They choose an amount of memory (~1GB) that can be scanned in ~1 second by a single modern CPU core and shard the table across cores and nodes into enough chunks to store the entire database (~1000 nodes). They put a query processor on top called SharedDB to plans queries and instruct each Crescando scan node what rows to return (what predicates). Since the table is completely scanned each second, voila you have your 2 second guarantee. Since it is an in memory database, it also supports arbitrary updates by applying them in batch during the sweep.

Professor Kossmann concluded with a short bit about the SwissBox project which would package the software into an appliance form factor and add some custom hardware (maybe the networking layer?) I have to admit that I find it amusing that even though a parade of companies have proven time and time again that custom FPGAs and ASICs can't keep pace with well written software for x86 processors people keep trying to add custom hardware anyways. Of course as a software guy I am biased, but I can't imagine being able to make hardware in small batches to compete with the economies of scale and R&D budget that Intel can throw at x86 development.

In another post, I will hopefully write up the other keynote: Renee Miller who I thought presented clearly and concisely the theoretical underpinnings to automatically determine functional dependencies from data (it is fascinating, I swear!), which I have discovered is a classic Data Mining technique. That one got me thinking of one of the directions we are headed at Vertica.

Sunday, March 27, 2011

Emacs + Vertica = vsql-mode




The other day it occurred to me that Vertica is grown up and therefore deserve its own emacs mode. Thus, I went and hacked together a bit of elisp (thank you 6.001) and voila! the vsql mode mode was born.


The emacs SQL mode lets a developer/vertica user connect and interact with a database from within emacs. Emacs ships with specializations for all the big databases (Oracle, SQLServer, Postgres, etc.) -- the vsql mode is a specialization for the Vertica client tool.

As a little back story: I am an avid (addicted?) emacs user, and as I get more and more entrenched in the editor I find it does more and more. I am often embarrassed, in fact, when someone asks me “how did you do x”, and I don't know. Typically I need to watch my hands as they recall the pattern from muscle memory to tell them. One mode I particularly like and use daily is the SQL mode and I have used the postgres mode for the last three years at Vertica (killer feature: C-c C-r) while developing the Vertica Optimizer. I can get away by symlinking vsql to psql because they behave similarly enough, but it was time for native vsql support.

To use:

Download vsql.el from github (https://github.com/alamb/Emacs-VSQL-Mode)

Add the following to your .emacs file (pointing at wherever you downloaded vsql.el):

;; vsql specific emacs SQL mode:
;; run via M-x sql-vertica
(load “vsql")

Invoke the mode with M-x sql-vertica

Tip: you can send sql text from other buffers to the *SQL* buffer via C-c C-r.

Things left to do: update the keywords list for the vertica specific SQL dialect, and maybe other stuff. Feel free to help out or tell me what would make it more useful

Andrew