Programming

The big old reliable elephant: talking about Postgres with Craig Kerstiens

In this opportunity I interviewed Craig Kerstiens. Craig works for citusdata, a company that helps customers scale databases beyond a…

LambdaClass

26 Apr 2017 • 4 min read

In this opportunity I interviewed Craig Kerstiens. Craig works for citusdata, a company that helps customers scale databases beyond a single node. Over the last few years, after building systems that used Redis, Cassandra, Riak, Elasticsearch and even Mongo, I rediscovered my love for PostgreSQL. The documentation is excellent, its development pace is astonishing and it is a good old reliable beast. However its feature set can be quite overwhelming. To mention just a few of its special features: you got a big option of index types (B-tree, Hash, GiST, SP-GiST, GIN and BRIN), many data types, extensions, window functions, common table expressions, foreign data wrappers. As Craig said in one of this talks “Postgres is the Emacs of databases”. If you have ever used Emacs you know that learning it can take some time due to the number of choices available.

Craig helped me decide what postgres features to use and how, thanks to his blog, postgresguide.com and postgresweekly.com. He has been really helpful in sharpening my postgres knife.

Discuss and vote at lobsters, reddit and hn.

Why do you prefer PostgreSQL over other SQL databases?

It was actually a little ways into my career before I came around to Postgres. I started on Oracle in college. When I found Postgres it was a bit stodgey and just correct. There was nothing wrong with it, but it wasn’t anything to write home about. What was interesting about it was the license which made it very favorable to fork and extend and add any more value. Since that time though it’s really come a long way from a user experience perspective, and really expanding beyond the “SQL” world.

Why do you think a lot of developers moved away from SQL databases a few years ago?

To be right to the point… user experience. SQL is a great language for querying lots of data. The idea that you have to setup your schema ahead of time and define your data model is painful though. Over the longterm is pays off, but there is nothing inherint in SQL that says you can’t model things in documents and have that transformed to relational. The ease and promise of document databases is huge for getting started, but it’s not a perfect long term solution…

Why did many of them return to use SQL databases?

The ability to get up and running quickly is always at odds with long term maintainability. Thats not to say that you should pre-maturely optimize, but SQL has stood a test of time of querying and accessing data and we’re seeing that in a return to systems that re-implement SQL on top of other datastores.

In which cases wouldn’t you use PostgreSQL?

The only one that really comes to mind is graph databases. Over time PostgreSQL has gained support XML, key value, full text search, document storage, the only thing it hasn’t covered yet really is the graph arena.

What is the most misused feature of PostgreSQL or SQL?

Wow, thats a tough one. I’m going to completely side step the question and say that it’s not leveraging Postgres specific features. I see a lot of developers say “I want to be able to migrate away so I’m going to do the most generic bare bones thing possible” Whether that’s not using the array datatype, JSONB, GIN or GiST indexes, at that point why even pick Postgres? The biggest mistake I see is not mis-using a feature, it’s not using it for the idea you might one day migrate.

What are your favorite PostgreSQL extensions?

Obviously I’m biased towards Citus. I’ve actually known the Citus team for over 4 years now, before they were an extension to Postgres. At Heroku what Iwould hear and see over and over is people running into this ceiling with Postgres. SQL databases typically work really well on a single node, and there’s this notion that SQL can’t be distributed. Citus actually solves a problem for hundreds of customers I saw outgrowing single node Postgres whether at 100 GB or 1 TB, that makes it trivial to shard out while still maintaining transactional semantics.

Though I have to mention my second favorite: HyperLogLog. This is one I personally have not got to put into production, but is super awesome. It’s essentially a sketch algorithm for probabilistic distincts across really large datasets.

Is there any feature that another database has that you would like PostgreSQL to have?

I think the biggest would be easier onboarding or not having to define your schema ahead of time. ToroDB aims to do some of this for Postgres when coming from Mongo, but a native Postgres experience would be huge here.

What do you recommend reading to understand the implementation of PostgreSQL?

Subscribe to the hackers mailing list.
https://www.postgresql.org/list/pgsql-hackers/

The best engineers (even one that don’t actively develop for Postgres subscribe here as it’s just great discussion and development).

And for a lighter read, postgresweekly.com

What do you recommend reading to get better at SQL and PostgreSQL?

I actually got my foundation in relational algebra. At the time it seemed overly academic like a lot of other CS, when it comes to databases though and in particular SQL it gives you a huge leg up. I’d very much encourage people to spend some time on the academic side of relational algebra, then move to SQL syntax, over the long term it’ll really pay off.

Sign up for more like this.