Choosing PostgreSQL Over MySQL, MariaDB

shutterstock_313785986

Check out this recent chart from Google Trends, which shows interest in PostgreSQL largely holding steady over the past ten years, even as MySQL faces fairly significant declines. It’s a good jumping-off point for a discussion about whether it’s time to switch to PostgreSQL for your database-management needs.

PostgreSQL, MySQL, and MariaDB are the three “main” open-source relational databases available today (there are four if you count FireBird, but for brevity I’m excluding it). If you don’t know MariaDB, it’s a drop-in replacement for MySQL, forked not long after Oracle bought Sun (which had previously bought and owned MySQL). MariaDB is more open than Oracle’s MySQL, but technically maintains very high compatibility.

There are other forks out there. Drizzle’s roadmap doesn’t go past 2012, so it seems dead, whereas WebScaleSQL is an ongoing collaboration by engineers from Alibaba, Facebook, Google, LinkedIn, and Twitter. The latter aims at large-scale users with a fork of MySQL 5.6 with a GPL 2 license. With commits in the last couple of weeks, this project is clearly alive and kicking.

For years, MySQL had a reputation of being faster than PostgreSQL, but much of that was due to the MyISAM database engine, which didn’t support transactions. On the flip side of things, PostgreSQL had a reputation for being slower but more reliable. In recent versions of both platforms, speed has been less of a problem for PostgreSQL, while MySQL now defaults to the InnoDB engine, which does handle transactions.

Perception Bias

These assumptions have long biased developers in selecting a database. But as the databases continue to improve, the choice gets a little more complicated. MySQL is less standardized that PostgreSQL, and it lacks schemas. (A schema is similar to a namespace, so you can have two tables called MyTable in the same PostgreSQL database; they might be named database1.schema1.Mytable and database1.schema2.MyTable. In MySQL, without schemas, you’d probably use two databases.)

For a simple web application that needs a relational database, MySQL is probably going to be your main choice. It’s mostly SQL 99 compliant, is easy to install (if not already installed by your hosting company) and there are plenty of good (and also free) client tools. I particularly like Heidi  SQL, which works well for both MySQL and PostgreSQL and Microsoft SQL Server. Generally, there are more client tools available for MySQL, which reached 20 years old in 2015.

PostgreSQL is actually based on Ingres, a database from the 1970’s. Ironically, PostgreSQL’s sheer number of advanced features harms it in the comparison with MySQL; sometimes simpler is better. Here are some other things that MySQL lacks:

Indexing

All relational databases let you add indexes to speed up searches, but PostgreSQL’s indexes go further. It’s in good company with both Oracle and Firebird in having expression indexes. These let you apply a function (say lowercase or some kind of hash) rather than just indexing on one or more columns. Partial indexes only apply to rows which match a ‘where’ clause. This keeps the index smaller.

BRIN (Block Range Index) breaks the data from very large tables into ranges, then stores those pages with extra data showing the minimum and maximum values. This avoids creating and maintaining a B-Tree index, which is what non-BRIN indexes use. It’s slower than a B-Tree index but takes up less space. You can read more about it on Michael Otacoo’s post.

For text searches there are GiST and GIN indexes. The GiST (Generalized Search Tree) lets you create custom indexes. You provide the customization to index your content and the GiST infrastructure code manages the index pages layout on disk, the search algorithms to search and delete from indexes, plus complex transactional details for page-level locking for high concurrency and write-ahead logging for crash recovery.

Generalized Inverted Indexes (aka GIN) offer an alternative to GiST by storing words, but it’s slower to index than GiST. According to the PostgreSQL documentation, GIN index lookups are three times faster than GiST.

Table Inheritance

PostgreSQL is described as object relational. There’s always been something of an Impedance Mismatch between the worlds of relational databases and object oriented programming. They have different paradigms and this is why Object Relation Mapping (ORM) exists. PostgreSQL tries to bridge the two worlds. For instance, table inheritance lets you define a base table then create a new table derived from it, just like in object oriented programming. Tables can inherit from multiple tables with identical names and types of columns merged into one. As you’d expect, you can select, update or delete but not insert into an inherited table. More details are listed in the PostgreSQL documentation.

JSON and NoSQL

Although MySQL has JSON support, PostgreSQL goes further with both JSON  and jsonb document support. This lets you do NoSQL-type operations, making PostgreSQL extra useful.

The previous versions (9.3, 9.4) have been adding to this documentation, increasing its usefulness. The jsonb format, which appeared in the 9.4 release, stores JSON data in a binary form and enables indexing of data in a JSON document. You can read more about JSON and jsonb.

Conclusion

PostgreSQL is a pretty complicated beast, but is very reliable and has features typically only found in Oracle and Informix. You can see these and other comparisons on Wikipedia.

Whether you need these functions, or prefer to use MySQL/MariaDB etc. depends on your application and its future database requirements. But it’s certainly worth checking out what’s possible. To get the best performance out of PostgreSQL, you’ll need to tune the config files.

Image Credit: PureSolution/Shutterstock.com

Comments

7 Responses to “Choosing PostgreSQL Over MySQL, MariaDB”

December 03, 2015 at 4:05 pm, Rob Andrews said:

How is PostgreSQL at replication, clustering and columnar representation of data?

Reply

December 03, 2015 at 6:19 pm, Patrick Griffin said:

Starting with PostgreSQL 9, it has excellent support for replication, via it’s Streaming Replication feature, which puts replication in PostgreSQL on-par with Oracle (via Oracle’s DataGuard).

As far as clustering goes, there’s no built-in support for Active-Master clustering, so you’re limited to having a single Master with many read-only slaves, or using a 3rd party add-on for such functionality (assuming any exist). This is, in my opinion, what makes database systems like Oracle worth the money spent on it. If, however, you’re ok with only being able to write to one master and reading from many slaves, then you can take advantage of Streaming Replication to make a pretty performant cluster, but again it’s not anything as fancy as Oracle’s RAC technology, which would let you write to any node and have the changes available immediately to all other nodes.

Reply

December 04, 2015 at 3:44 am, e70838 said:

I am a programmer. SQL is a good language that has evolved a lot since its creation. When using oracle, I always suffer from its very poor support of standard. Look at examples at https://modern-sql.com/standard to see how rich the syntax is. All this works perfectly fine with postgres, but not in oracle or mysql.

Reply

December 04, 2015 at 8:23 am, Carlos Ferreyra said:

I always loved PostgreSQL over MySQL, but now that I’ve seen that you can get synchronized replication writing on any node using a MariaDB Galera cluster, I can’t wait for PostgreSQL to keep up and come up with the same functionality.

Reply

December 04, 2015 at 2:37 pm, Richard Saunders said:

“For brevity” you do not compare Firebird? If you are trying to alert people to a possible good alternative to MySQL how is omitting Firebird helpful?

Reply

December 10, 2015 at 8:07 pm, Michael Rasmussen said:

“As you’d expect, you can select, update or delete but not insert into an inherited table.”

Incorrect. You can insert into an inherited table. All CRUD operations are possible against inherited tables. If you do them against the parent, they automatically are run against the children. Running an operation against a child does not propagate up to the parent.

Inheritance is frequently used for data sharding.

Reply

December 11, 2015 at 1:24 am, Anonymous Coward said:

About the google trends chart, I don’t see a sharply different pattern for postgres vs mysql. Proportionally, they both seem to loose market share at about the same speed.

My personal reason to always go with mariadb instead of postgres is a very simple one. Microservices are where the world goes, and they are inherently of little complexity, given their size. Sure all the features that Postgres has are very nice, but not of much use with a microservices-based architecture. Why carry the elephant along when you can use an embedded funny little seal instead?

More, in a world dominated by web-based applications, regardless if they are native mobile clients or html5-based, read speed has typically the highest contribution to app speed, compared to write speed. Mariadb and mysql before it are highly optimized for fast reads. Add this to the typical low complexity of modern applications, and you get the picture.

Postgres’ superior feature-richness and support for complex SQL is only useful when you move much of your app logic to the database. This is an anti-pattern, in today’s world. This was useful at a time when most of the app logic was residing in the database, packed as stored procedures and functions and triggers and whatever else the particular database was supporting, and clients and app servers were a thin layer on top of the database. At that time, postgres was ages ahead of anything else, including most commercial offerings.

But that’s no longer the case. Typically, much of the application logic is code meant to run outside of the database. Having logic split across both Java or C# or C++ or PHP or whatever other language you use and also packed as SQL inside the database is very difficult to manage, eventually getting your architecture to crumble, leading to an unmaintainable mess. One more reason to go with mariadb – the database’s limitations will act towards your best interest, which is to keep all related logic in one place – the source code.

Reply

Post a Comment

Your email address will not be published.