Why I Choose PostgreSQL Over MySQL/MariaDB

shutterstock_172084607

shutterstock_172084607

For the past ten years, developers and tech pros have made a game of comparing MySQL and PostgreSQL, with the latter seen by many as technically superior.

Those who support PostgreSQL argue that its standards support and ACID compliance outweighs MySQL’s speed. MySQL remains popular thanks to its inclusion in every Linux Web hosting package, meaning that many Web developers have used it; but ever since Oracle bought Sun, which owned the MySQL copyright and trademark, there have been widespread concerns that the platform isn’t quite as open-source as before. Meanwhile, PostgreSQL not only got faster, it also added JSON, making it one of the few relational databases to also support NoSQL.

(Just to go into a little more depth on the Oracle saga: Monty Widenius, the creator of MySQL, forked MySQL to create MariaDB, which is drop-in compatible with MySQL and not under Oracle’s control.)

Check out the latest PostgreSQL-related jobs.

The current version of MySQL/MariaDB is 5.7.6 (released to the community March 9, 2015), while PostgreSQL is 9.4.1. How do these latest versions match up? Let’s take a look.

ANSI Standard Compatible?

Compared to earlier versions, MySQL has made progress in the standards area; the philosophy behind MySQL is that they’ll support non-standard extensions if the customers like them. But with PostgreSQL, the standards were built into the platform from the start, whereas MySQL added them later.

Verdict: Tie.

Fully ACID Compliant?

PostgreSQL has one storage engine; MySQL has nine, but only two of those really matter to most users: MyIsam and InnoDB. MyIsam was the original engine, built for speed, but it lacked transactions; InnoDB has transactions and is speedier than MyIsam, which is why it’s the default storage engine. Both MySQL’s InnoDB and PostgreSQL are fully ACID compliant, so there’s really no difference between the platforms.

Verdict: Tie

Table Changes Without Locking

MyIsam uses table-locking to gain speed. That’s fine if many sessions involve reading; but when writing to a table, the writing session gets exclusive access and other sessions must wait until its finished. But PostgreSQl and InnoDB both use row-level locking…so again, it’s much less of an issue.

Verdict: Tie

Subqueries Can Be Problematic

Subqueries were one of MySQL’s major weaknesses for a long time; it was notorious for losing its way with two or more levels of sub-queries. Since 5.6.5, though, there have been major improvements—but PostgreSQL is still considered better for joins especially as MySQL doesn’t support Full Outer Joins.

Verdict: PostgreSQL

JSON Support and NoSQL

This is a recent addition to PostgreSQL, and it does make the platform more appealing to anyone who wants to try out NoSQL and store JSON (JavaScript Object Notation) files in the database. It gives greater flexibility on how data is stored compared to traditional relational databases; with PostgreSQL, you can have the best of all worlds.

Verdict: PostgreSQL

So has MySQL caught up with PostgreSQL? In many ways, yes; but if you’re still looking for other reasons to jump ship to PostgreSQL, here are some more:

Better Licensing

PostgreSQL has a MIT-style license that allows you to do anything, including commercial use in open or closed source. With MySQL, on the other hand, the client library is GPL, so you must pay a commercial fee to Oracle or supply the source code of your application. (That’s less of an issue when using MySQL in websites; MariaDB uses the GPL 2 license but also has a less restrictive LGPL license for MySQL Client libraries.)

Better Data Integrity

PostgreSQL has always been strict about ensuring data is valid before inserting or updating it. Whereas with MySQL, you need to set the server to a strict SQL mode (STRICT_ALL_TABLES or STRICT_TRANS_TABLES), otherwise adjusted values will be inserted or updated.

Extending the Server

Both databases allow you to extend the server. MySQL has a plugin API for C/C++ or any C-compatible language. You can even do full text search in InnoDB from MySQL 5.7.3 onwards. PostgreSQL has a similar system but also supports a wide variety of languages including C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC and others; it’s even possible to run user-supplied code in separate processes, i.e. running as background workers.

PostgreSQL also supports information about data types, functions and access methods in the system catalogs along with the usual information about databases, tables, and columns that all relational databases hold. The user can modify these system-catalog tables to extend them.

Conclusion

I’ve worked with both databases for over ten years and think MySQL has done a great job of improving itself to keep relevant, but I have to confess to favoring PostgreSQL. There’s less hassle with licensing, custom data types, table inheritance, a rules systems, and database events.

Related Articles

Upload Your ResumeEmployers want candidates like you. Upload your resume. Show them you’re awesome.

Image: scyther5/Shutterstock.com

Comments

39 Responses to “Why I Choose PostgreSQL Over MySQL/MariaDB”

March 19, 2015 at 9:12 am, Chip Lynch said:

FWIW, PostgreSQL also has far better geospatial data support. It’s not a hugely common requirement, but as soon as you need it, MySQL falls short. They (or MariaDB) are probably working on it, but as of a year ago there was no contest. That’s what made me change.

Reply

March 19, 2015 at 3:12 pm, Scott Simpson said:

You didn’t specify the MOST important reason for using Postgres. It has object ids (OIDs). This allows true referential integrity. You can have a row point at another row and this reference stays the same REGARDLESS of whether you change the primary key of the referenced row. This allows true object orientation.

Reply

March 19, 2015 at 7:14 pm, Jamie said:

OIDs have nothing to do with referential integrity. They also don’t work they way you are indicating; you seem to think they are some sort of automatic surrogate key, and they are not. They only sort-of relate to object orientation, which again has nothing to do with referential integrity.

OIDs are simply a unique number that can point to an object in the DB – a column, a table, a function, etc. They are not used for referential integrity. OIDs have been a non-default option on non-system tables since (I think) 8.1, so one can’t count on them being available.

I strongly suggest you read up a bit on this, as you’re conflating several different concepts and coming up with a nonsensical notion of how things work that is likely to cause severe problems in anything developed with that thinking.

Reply

March 19, 2015 at 3:44 pm, Phil Turmel said:

I stopped waffling when PostgreSQL added windowing functions and support for recursive CTEs. I learned to like the Oracle equivalents when that was the only game around, but $$.

Reply

March 19, 2015 at 3:45 pm, Bianchetti Marcello said:

It would be nice to have a deep look at the clustering features offered by MySQL, MariaDB & PostgreSQL (i.e. MySQL Cluster, MariaDB Galera Cluster, Pgpool-II, etc…). I think this the field where the real game will be played in the very near future…

Reply

March 20, 2015 at 5:36 pm, Terry Erisman said:

pg_shard is a free, open source extension for PostgreSQL which allows users to scale out by creating a cluster across commodity servers which can handle hundreds of billions of events. More nodes can be added to the cluster at any time. The database is sharded and the shards are replicated across the cluster nodes so the failure of any individual node has no impact on data availability. The latest version, pg_shard 1.1, was released this week and can be found on the Citus Data website or on GitHub.

Reply

March 19, 2015 at 4:04 pm, Cellar said:

“Both MySQL’s InnoDB and PostgreSQL are fully ACID compliant, so there’s really no difference between the platforms.”
This is the politically correct (and fanboi-appeasing) answer, but it is factually incorrect.

I’ll leave out the practicalities of having so many backends to choose from that it’s often quite confusing yet insidiously necessary to work out just what backend you’re talking to now, though it is not unimportant. No, I’ll just refer to the manual that even for innodb doesn’t claim “full ACID compliance”, it just says “follows ACID”, whatever that may mean.

A little reading is enlightening. Both in the comments to various features, and in the manual itself, for example how CHECK constraints are handled. Pretty soon you’ve shed at least three letters of the acronym. Unless you consider that not doing anything wrong, ever, being a hard requirement for having a robust system is a valid approach to software engineering.

PostgreSQL didn’t have its ACID compliance bolted on as an afterthought. Since this idea is a rather central and important feature of Relational Database Management Systems, you can safely say that PostgreSQL is one of those, and MySQL, not so much. But it isn’t exactly rare for authors to fail to discern the difference.

Reply

March 19, 2015 at 4:25 pm, Brandon M. Petty said:

Another substantial enhancement has been PostgreSQL’s new locking capabilities. MySQL still has a very naive implementation which will cause needless deadlocking. If you are using MySQL for backing a personal web blog, you’ll never see this. If you have a highly relational schema being hit by numerous simultaneous transactions (read multi-threaded enterprise software) you will definitely want to think twice about MySQL.

https://bugs.mysql.com/bug.php?id=48652

Reply

March 19, 2015 at 4:41 pm, antxon said:

What about clustering? What about clientless managment? phppgadmin can’t be compared to phpmyadmin…

Reply

March 20, 2015 at 7:03 am, Josh said:

It’s been a while since I’ve used phpmyadmin, but there are better choices than phppgadmin for web-based database administration. Namely, PostgreSQL Studio.

http://www.postgresqlstudio.org/

Reply

March 19, 2015 at 5:59 pm, Mitch said:

Hmm, an interesting article, but I think the author fails to take in to account some of the considerations that might actually make MariaDB a better choice than Postgres, intentional because of the author’s preference towards Postgres? As a previous commenter mentioned, clustering, high availability, multi master replication are all possible on MariaDB. What about options for taking database backups, xtradbbackup is super fast and very helpfully stores your binlog position so it can be used to bring up a slave that can be quickly joined to a master and used. FULLTEXT indexes were available in InnoDB in 5.6 BTW. I do admit that being able to serialize data to a JSON format and store it in the DB is a helpful feature that I do use in one of my Rails apps. There are ways around the licensing of the MySQL client library, the path Atlassian choose to take is telling you where to download the JDBC connector from for MySQL and where to drop the jar in your application if you want to connect your Atlassian apps to a MySQL database. What about sharding, from my understanding it is built right in to MySQL cluster, and yet you require a third party extension pg_shard to do it in Postgres, not that I’m advocating sharding as a good idea if you want to keep your DB per formant. I’m not saying that Postgres is a bad DB, I use it all the time, particularly for really large data sets, I’m just saying don’t write such a one sided article without considering other use cases that perhaps MySQL is better suited for.

Reply

March 21, 2015 at 5:57 am, Chreo said:

How’s not pg_basebackup relevant to that? Can My/ariaDB do Point in TIme recovery? Lagging replication (like dont apply transactions unless they’re like 20 minutes old)?

And if licensing is an issue wou’ve chosen the wrong DB

Postgres has multi-master too (BDR and Bucardo).

I’ve simply not yet seen a case where MySQL could do something PostgreSQL couldn’t be at least as good at (not that it doesn’t exist) and if PostgreSQL can be used it really should not be a choice given how badly MySQL is with data consistency

Reply

March 19, 2015 at 6:45 pm, burne said:

So, along comes this government customer demanding a serious SLA. However, their java application uses Hibernate allowing us the choice between any of the supported SQL backends. We’ve experimented and after four years and quite a bit of monies, I have something of a conclusion.

If you want capable, experienced staff, useable solutions and available on-call support without paying serious money (DB2 or Oracle-level spending) Percona (and MySQL) is your friend.

Competent, affordable PgSQL support is something like a secure, intelligent PHP programmer. You’re sure they exist, and you still look forward to meeting one. It just didn’t happen yet.

Reply

March 21, 2015 at 5:58 am, Chreo said:

I’m quite sure PostgreSQL Experts, 2nd Quadrant, Citus, Redpill Linpro and EnterpriseDB (to mention a few) would disagree with you

Reply

March 23, 2015 at 7:33 am, Jeff Dafoe said:

No, they’d agree. We ran into this problem as well, the pricing for commercial support packages is astronomical and is on-par with Oracle. We were finally able to find a competent provider, Endpoint, who had a workable pricing structure with a modest per-month monitoring fee and then we purchase hours as-needed. But we first went through nearly all of the companies you mention above. EnterpriseDB, in particular, was as much as our entire Oracle license structure that we were replacing.

Reply

March 19, 2015 at 7:14 pm, mike said:

I am really curious to find out how many people really do use FULL JOIN? I bet facebook and twitter engineers are not, since they decided to use MySQL, as well as Wikipedia. I guess their main reasons were irrelevant things, like resource utilization, money, scalability or I don’t know what… but it was something irrelevant, that’s for sure….

Reply

March 21, 2015 at 7:46 am, Chreo said:

Perhaps because they were started by PHP coders and they seem oblivious to MySQLs shortcomings.

I’ve rarely used FULL JOIN but I almost daily use CTE’s a lot and other modersn SQL MySQL does not support

http://use-the-index-luke.com/

Reply

March 19, 2015 at 9:49 pm, K. L. Estes said:

How does adding JSON automatically make it NoSQL capable ? Is JSON used in lieu of SQL somehow ?

Reply

March 20, 2015 at 8:17 am, BWM said:

We were using MySQL and paying for their “embedded licenses” as we deployed systems at customer sites. The licensing policies were obtuse under the original MySQL management (“oh, for that usage, it’s not open source”. ???), with lots of gotchas around which MySQL “engine” you happened to use. Once acquired by Oracle, the licensing policies were either unobtainable or incomprehensible, and our calls to purchase more of the inexpensive embedded licenses were not returned.

We started a rush project to move to Postgres. Some developers had to be kicked for putting too much logic in their SQL queries. The only show-stopping problem was bulk data ingest, which our systems run daily, and for which Postgres was much slower. A lot of tuning work only provided small gains. One of our developers had the idea of pre-sorting the data with command line utilities. Massive performance improvement, even with the pre-sort time included.

MySQL has been obliterated from all our deployments. No looking back, no regrets.

Haven’t looked at the MariaDB licensing terms… but it’s the same crowd as the original MySQL management. And do they still have the multiple engines to chose from? That was a big gotcha when talking about MySQL features and performance. They each had different characteristics and advantages. It’s odd that reviews never seem to mention which engine is used.

Reply

March 20, 2015 at 10:32 am, Jeff Dafoe said:

One comparison point that is significant and that I never see discussed is the connection model. PG uses one process per connection. MySQL and Oracle use multiple connections per process. PG’s model can cause a noteworthy amount of wasted memory. Worse still, the PG backend is not very well optimized for high fork loads, quite a bit of startup work is done when the new backend connection is launched.

Reply

March 20, 2015 at 3:44 pm, Jonathan Briggs said:

If PostgreSQL’s connection handling is a problem there are ways to handle it with connection pools and proxies. These have problems of their own but what doesn’t?

It seems to me that memory use per connection is insignificant compared to the power of the For … I mean, the total RAM of your database server. If it is like most then 16 GB RAM is a minimum these days. 256 GB is not out of reach. Your server probably runs out of available slots in its default-configured Netfilter firewall before it runs out of RAM.

Reply

March 21, 2015 at 12:26 pm, Jeff Dafoe said:

The RAM per connection isn’t insignificant, though. The work memory as well as other memory settings that I can’t remember offhand are per-backend. Envision a desktop app that establishes its PG connection when the app starts and closes the connection when the user exits. Now imagine a call center where hundreds of users are using this app all day. 200-500 PG backends. You really have to turn the memory settings way down and it does impact performance. Too much of the working memory allocation is done per-backend and the backend doesn’t die until the connection ends. In MySQL and Oracle, the working memory is shared more across connections since multiple connections are serviced by one process.

I’m pro-PG but the connection model is definitely something that should be thought of when doing a comparison of PG versus others.

Reply

March 21, 2015 at 3:28 pm, Steven Christy said:

Hey Jeff, I was thinking about your comment and it seems to me that copy on write should significantly reduce the wasted memory for multiple processes. Also, if one process crashes other processes will not be affected. I realize that it looks strange to see one process per connection, but I think you should consider re-evaluating whether this is a bad thing.

Reply

March 23, 2015 at 7:41 am, Jeff Dafoe said:

It’s cool that you mention copy-on-write, I was originally going to go back and add something about it. I’m not sure it helps much with PG. It does mitigate the memory use by the base forked process, but as soon as an individual process allocates and starts using that large working memory segment, I’d guess it pretty much ends up with a unique memory footprint specific to that process. Also, with PG, if one process backend crashes, all backends are intentionally terminated and restarted. So you don’t really gain the full benefit of process isolation like you would with Apache.

March 25, 2015 at 2:11 am, Hans said:

@Jeff:

> Also, with PG, if one process backend crashes,
> all backends are intentionally terminated and restarted.

No, that’s definitely not true. If the backend process that handles a specific connection crashes, everything else keeps running. That is one of the advantages of having the multi-process model. There is even an administrative function that will terminate a backend forcefully.

If one of the “management” processes dies (e.g. the checkpoint process or the writer process) then yes, everything else is restarted as well because those are vital and essential system processes. But those are very different to “regular” backends that handle client connections

March 25, 2015 at 7:39 am, Jeff Dafoe said:

Hans, you can easily repro this by issuing kill -9 to a single backend. When PG detects that a single backend has terminated abnormally, it restarts all backends. This causes connection loss on all backends.

March 25, 2015 at 8:54 am, Hans said:

@Jeff:

You are indeed right.

But you can safely kill such a backend using kill -15 or through “select pg_terminate_backend()” without any other process being affected

March 23, 2015 at 4:12 am, Karl said:

Oracle is also using a process-per-connection model on Linux. This can be changed however since Oracle 12, but the default is still one OS process per connection

Reply

March 23, 2015 at 7:43 am, Jeff Dafoe said:

I think it’s different under Oracle, though, because the connections don’t seem to be doing any work. I believe they just proxy data back and forth from the main DB processes and the client connection socket. I’m basing this on looking at the memory utilization of the individual connection processes. It’s so trivial that I even forgot there was one connection per client.

Reply

March 23, 2015 at 1:27 pm, Karl said:

The processes _are_ doing the work in Oracle. You can see that when looking at CPU intensive queries: the CPU is attributed to that process, not to the main Oracle process. As far as the memory consumption is concerned: the Postgres “connection processes” use the same shared memory as the main Postgres process. So the memory usage per process as reported by the OS is much higher than it is in reality because the shared memory (which is only allocated once) is counted for each child process. Oracle is maybe using shared memory differently.

Reply

March 23, 2015 at 2:17 pm, Jeff Dafoe said:

You’re right about the Oracle processes, I was so focused on memory utilization I forgot that the processes are clearly doing work. But I believe that, under Oracle, most of that work is done through a shared memory structure. Under PG, much of that work is done in a memory structure (work_mem) localized to the individual client process. It’s not as shared as you might think.

March 20, 2015 at 1:24 pm, Kyle Harrison said:

Just throwing these in there

MariaDB NoSQL: https://mariadb.com/kb/en/mariadb/handlersocket/ , though it IS a plugin, nevertheless it’s MariaDB’s official answer to PostgreSQL’s NoSQL capabilities as well.

Also, MariaDB’s “Dynamic Columns” https://mariadb.com/kb/en/mariadb/dynamic-columns/ and COLUMN_JSON https://mariadb.com/kb/en/mariadb/column_json/

So, MySQL drop-in replacement, that can be further enhanced with all the benefits of Postgre? Dunno about you, but sign me up for that

Reply

March 21, 2015 at 11:04 pm, Eric Wadsworth said:

A few months back, we had completed initial development on a new persistence layer on a demanding application. We’d put it all into PostgreSQL, and were enjoying the easy JSON and other features. It worked great.

So we got it up and running on high-end hardware in our five data centers, then we turned on the pipes for all the writes. But our systems team members were going insane, trying to get High Availability working right. It turns out that there is just no good way to accomplish this in PostgreSQL. It could fail over to the slave if the master stopped responding, but fail-back was basically impossible. It had to do an rsync on the file system level, which was expected to fail. When it failed, the docs said, just do it again. It took almost a full day to run, each time.

And it failed with alarming regularity! When under load, every couple of days the database would just freeze for ten to fifteen minutes, choking on some non-scary query. It would just sit there, stuck. Calls to it would just block, and eventually timeout. When this happened, it would fail over to the slave, and we’re days away from getting back to a sane state.

Don’t think we didn’t do our best to solve this issue. We spent many thousands of dollars on two different highly recommended consulting companies, who specialized in PostgreSQL. They came onsite and looked at everything, and recommended a number of configuration adjustments, but nothing helped.

In desperation, the project now seriously behind schedule, we worked over Christmas, and branched the code to use MySQL as the database, instead of PostgreSQL. Then we set up two parallel systems. Both on identical high-end hardware ($50,000 machines), one for each database, and turned on all the pipes.

The result? MySQL answered its queries in 50% less time than PostgreSQL. Plus we already knew that it did HA quite well, and it never just froze up like PostgreSQL would.

We have since completely obliterated all traces of PostgreSQL from our code base.

Reply

March 22, 2015 at 8:12 pm, Michael Paquier said:

What is the version of PostgreSQL that has been used? Mentioning the version with the cool JSON features of a couple of months back that has been released before winter, then I would think that it is 9.4.0. And you should know that there has been a bug in this area that has been fixed in 9.4.0 (fix released quickly in 9.4.1) after a somewhat similar locking issue reported here:
http://www.postgresql.org/message-id/20140731233051.GN17765@andrew-ThinkPad-X230
That has been fixed consequently here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0e3a1f71dfb31527c07a80ccba815b2928d70d8f

Reply

March 22, 2015 at 11:53 pm, Eric Wadsworth said:

That looks like it is probably the bug that was biting us. Glad it’s fixed! So that’s one of the three issues that moved us from PostgreSQL to MySQL. The other two remain: Lack of HA, and plain-ole performance under load.

Reply

March 23, 2015 at 2:30 am, Hans said:

MySQL is not “fully” acid compatible – at least in terms of read-consistency.

The statement:

update some_table
set foo = bar,
bar = foo;

will not swap the values correctly

Also constraints aren’t evaluated correctly on a per-statement level:

create table foo
(
id integer primary key,
parent_id integer,
foreign key (parent_id) references foo (id)
);
insert into foo values (1,null), (2,1),(3,2),(4,1);
delete from foo where id in (2,3,4);

Is failing in MySQL because the foreign key constraint is evaluated row-by-row rather than at the end of the statement

Reply

September 14, 2015 at 3:33 pm, schu said:

Some other reasons to avoid mysql:

1. It doesn’t support a timestamp data type with offset. So you can’t store data in a local timezone with offset and have it convert it for you if you change timezones.

2. You can’t use functions or procedures as default column values. The only dynamic default column value is the constant timestamp. Yea, it’s that dumb.

3. Last I looked you couldn’t have more than one auto_increment column. Please don’t tell me I don’t need more than one, just give me a proper sequence.

4. People argue that mysql is much easier to setup, but by the time you turn on strict data typing, and tune memory a little it’s just as much work if not more.

In my opinion the only reason to use mysql is if you want one dumb insert after the next, each in it’s own transaction, without a care about the integrity of the data. In other words it makes a great logging server.

Reply

December 19, 2015 at 11:08 am, ptgugo said:

We are evaluating an Open Source solution, comparing Postgres, MySQL and MariaDB.
Testing uses 3 sets of real data, from 40M-60M rows each, and e2e is files to facts. SQL was built in MS SQL Server Enterprise system, time to adapt is also being considered on purpose. We are not optimizing SQL in the Open Source solution at this point.
Test #1 was pgsql, and a surprise it was, loads data fast and was 4.5 times slower than production process (Azure D13) in an outdated laptop (CentOS 7). SQL was converted in half a day with google help. Stuff like isdate or isnumeric had to be built but code can be found with Google, easy. Another good surprise was datetime conversion, better than MSSRV, we can convert time specifying a city and that’s a really great feature when we live in WET area (two timezones time per year). String functions are not so rich (position vs charindex) so again we googled an instr function (and I still miss instr, truncate partition, and decode from Oracle days), and we got an e2e process running. A (negative) mention to partitioning, it’s weird not to be able to insert to the master table natively, but partitions work natively as tables (we can live with it).
Test #2 was mysql, not finished. Nasty surprises when converting to varchar, we are adding an empty string to the result instead of casting/converting, and the crash on the wall, no row number (!?), we are converting to variables, let’s see. I haven’t checked partitioning, but string functions are richer than pgsql, locate is just like charindex, and tops position from pgsql (and instr tops the three).
Test #3 is going to be MariaDB, i tried to install alongside mysql but without success, I did the manual installation, changed the conf files, followed the instructions in mariadb site (for an older version), but no success. So tests will go after finishing and removing mysql.

Reply

June 29, 2016 at 6:19 pm, Joshua Gollaher said:

After reading this article, I think I will use PostgreSQL.

Reply

Post a Comment

Your email address will not be published.