JSON Datatypes in Relational Databases: a Match Made in Heaven

I once used a NoSQL solution for a production application which stored tens of thousands of records. I eventually ported it over to a relational database solution. This is my there and back again story.

A highly-productive sequencing facility with a wide variety of customers fields a lot of questions and takes on a wide variety of projects. To do that effectively, it’s necessary to be able to answer these questions efficiently so valuable time isn’t lost while trying to keep up with the many other demands. Part of being able to quickly answer questions is knowing about the quality of data being produced and released. By storing the data in a sensible thorough fashion, it’s possible to catch mistakes and identify problematic trends before they become major headaches for everyone involved.

But how good was JSON support in the world of Postgres? As I would soon found out, it was better than I had dared to imagine.

Before I built a LIMS to replace the old one being used, I made a stats database to make it easy and quick to assess the performance of our sample preparation and sequencing instrumentation. I chose a NoSQL approach for its flexibility. At first it was bliss. Using NoSQL left plenty of room for experimentation, and the actual web-development was mainly front-end work. For the most part, the express.js backend was just pulling JSON from the database and sending it along to the client. I used backbone.js for the client and life was good for a time.

After we input a few tens of thousands of datasets we found that the database was running much more slowly, especially when performing joins. I did some research and wrote some pretty hairy indexing functions that went two array-levels deep. This sped things up noticeably, but overall performance was still degrading as time went on and the number of datasets increased.

After I finished writing the new LIMS, I had a good opportunity to back and redo the stats database as part of the LIMS. All of the problems inherent in performing the joins in a NoSQL context would be resolved since we were now in the world of relational databases. But how good was JSON support in the world of Postgres? As I would soon found out, it was better than I had dared to imagine. Technically relational databases have always supported JSON–you could just store it as text. But having features that allowed working with JSON data in a SQL context was valuable.

For example, one type of request I frequently received was to “find some x number of samples with coverage between Y and Z and do this operation on them.” With RethinkDB (the NoSQL solution), it was easy. If I wanted the IDs of thirty samples having coverage between 40X and 45X, it was as simple as this:

What I didn’t realize at the time was that Postgres has an impressive operator set for querying JSON. For example, if my stats are in a column named stats , I can do the following query:

The performance increase in going to a relational database was refreshing. I rolled my own client-side rendering scheme for the LIMS, so I wasn’t locked into backbone, and that gave me the flexibility to make some really cool interfaces for retrieving stats data:

Sparklines allow users to quickly assess relative coverages and base qualities for a group of samples.

Sparklines allow users to quickly assess relative coverages and base qualities for a group of samples.

Here I use d3.js to render the sparklines from the same JSON data that I use to fill in the table values. When users click on a table row they get larger plots, also courtesy of d3.js:

Depth of Coverage Plot

Depth of Coverage Plot

Quality Score Plot

Quality Score Plot

If you look on the left-hand panel of the screenshot, you’ll see that some of the icons are blue while others are gray. The blue-colored icons indicate which types of stats are available for the selected samples. This information is retrieved instantly allows the user to see at-a-glance what’s available without waiting for the actual underlying data to load.

The ease, speed, and sophistication of working with JSON in a relational database context strongly limits the reasons to use NoSQL in a production environment, though I still have found good use cases. If you’re frustrated with your NoSQL-powered solution, maybe it’s time to consider using the JSON datatype in a relational database.

Leave a Reply

Your email address will not be published. Required fields are marked *