

(Tee11/Shutterstock)
How we store and serve data are critical factors in what we can do with data, and today we want to do oh-so much. That big data necessity is the mother of all invention, and over the past 20 years it has spurred an immense amount of database creativity, from MapReduce and array databases to NoSQL and vector DBs. It all seems so promising…and then Mike Stonebraker enters the room.
For half a century, Stonebraker has been churning out database designs at a furious pace. The Turing Award winner made his early mark with Ingres and Postgres. Apparently not content with having created what would become the world's most popular database (PostgreSQL), he also created Vertica, Tamr, and VoltDB, among others. His latest endeavor: inverting the entire computing paradigm with the Database-Oriented Operating System (DBOS).
Stonebraker is also known for his frank assessments of databases and the data processing industry. He has been known to pop some bubbles and slay a sacred cow or two. When Hadoop was at the peak of its popularity in 2014, Stonebraker took clear pleasure in pointing out that Google (the source of the tech) had already moved away from MapReduce to something else: BigTable.
That's not to say Stonebraker is a big supporter of NoSQL tech. In fact, he has been a relentless champion of the power of the relational data model and SQL, the two core tenets of relational database management systems, for many years.

Mike Stonebraker
Back in 2005, Stonebraker and two of his students, Peter Bailis and Joe Hellerstein (members of the 2021 Datanami People to Watch class), analyzed the previous 40 years of database design and shared their findings in a paper called "Readings in Database Systems." In it, they concluded that the relational model and SQL emerged as the best choice for a database management system, having out-battled other ideas, including hierarchical file systems, object-oriented databases, and XML databases, among others.
In his new paper, "What Goes Around Comes Around… And Around…," published in the June 2024 edition of the SIGMOD Record, the legendary MIT computer scientist and his writing partner, Carnegie Mellon University's Andrew Pavlo, analyze the past 20 years of database design. As they note, "A lot has happened in the world of databases since our 2005 survey."
While some of the database tech invented since 2005 is good and useful and will last for some time, according to Stonebraker and Pavlo, much of the new stuff is not useful, is not good, and will only survive in niche markets.
20 Years of Database Dev
Right here’s what the duo wrote about new database innovations of the previous 20 years:
MapReduce: MapReduce programs, of which Hadoop was essentially the most seen and (for a time) most profitable implementation, are lifeless. “They died years in the past and are, at greatest, a legacy expertise at current.”

Hadoop…er, MapReduce…is dead, Stonebraker says
Key-value stores: These systems (Redis, RocksDB) have either "matured into RM [relational model] systems or are only used for specific problems."
Document stores: NoSQL databases that store data as JSON documents, such as MongoDB and Couchbase, benefited from developer enthusiasm for denormalized data structures, a lower-level API, and horizontal scalability at the cost of ACID transactions. However, document stores "are on a collision course with RDBMSs," the authors write, as the document stores have adopted SQL and the relational databases have added horizontal scalability and JSON support.
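To make that convergence concrete, here is a minimal, illustrative sketch (not from the paper) of a relational engine querying JSON documents, using Python's built-in sqlite3 module and SQLite's JSON functions (available in recent SQLite builds); the table and fields are hypothetical.

```python
import sqlite3

# An ordinary relational table whose 'doc' column holds JSON documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO orders (doc) VALUES (?)",
    ('{"customer": "Acme", "items": [{"sku": "A1", "qty": 2}], "total": 99.5}',),
)

# A relational predicate over a field nested inside the JSON document.
rows = conn.execute(
    "SELECT id, json_extract(doc, '$.customer') AS customer "
    "FROM orders WHERE json_extract(doc, '$.total') > 50"
).fetchall()
print(rows)  # [(1, 'Acme')]
```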
Columnar databases: This family of NoSQL databases (BigTable, Cassandra, HBase) is similar to document stores but with only one level of nesting instead of an arbitrary amount. However, this family of stores is already obsolete, according to the authors. "Without Google, this paper would not be talking about this category," they wrote.
Text search engines: Search engines have been around for 70 years, and today's engines (such as Elasticsearch and Solr) continue to be popular. They will likely remain separate from relational databases because conducting search operations in SQL "is often clunky and differs between DBMSs," the authors write.
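As a rough illustration of that clunkiness, here is a hypothetical sketch using SQLite's FTS5 virtual tables (assuming an SQLite build with FTS5 enabled); other DBMSs expose entirely different constructs, such as MATCH…AGAINST or tsvector/tsquery, for the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Full-text search needs a dedicated virtual table, not a plain CREATE TABLE.
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("MapReduce is dead", "Stonebraker and Pavlo survey 20 years of databases."),
        ("Lakehouses rise", "Iceberg, Hudi, and Delta Lake bring order to data lakes."),
    ],
)

# The MATCH operator and the hidden 'rank' column are FTS5-specific syntax.
hits = conn.execute(
    "SELECT title FROM articles WHERE articles MATCH 'databases' ORDER BY rank"
).fetchall()
print(hits)  # [('MapReduce is dead',)]
```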

The cloud is mandatory for commercial databases
Array databases: Databases such as Rasdaman, kdb+, and SciDB (a Stonebraker creation) that store data as two-dimensional matrices or as tensors (three or more dimensions) are popular in the scientific community, and will likely remain so "because RDBMSs cannot efficiently store and analyze arrays despite new SQL/MDA enhancements," the authors write.
Vector databases: Dedicated vector databases such as Pinecone, Milvus, and Weaviate (among others) are "essentially document-oriented DBMSs with specialized ANN [approximate nearest neighbor] indexes," the authors write. One advantage is that they integrate with AI tools, such as LangChain, better than relational databases do. However, the long-term outlook for vector DBs isn't good, as RDBMSs will likely adopt all of their features, "render[ing] such specialized databases pointless."
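For context, the core operation a vector database accelerates is nearest-neighbor search over embeddings. The brute-force version below (plain NumPy, with made-up data) is exact but scans every vector; specialized ANN indexes such as HNSW approximate this result much faster at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 384))            # e.g. sentence embeddings
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact cosine-similarity search; an ANN index trades accuracy for speed."""
    q = query / np.linalg.norm(query)
    scores = embeddings @ q                            # cosine similarity per row
    return np.argsort(-scores)[:k]                     # indices of the k best matches

print(top_k(rng.normal(size=384)))
```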
Graph databases: Property graph databases (Neo4j, TigerGraph) have carved out a comfortable niche thanks to their efficiency with certain types of OLTP and OLAP workloads on connected data, where executing joins in a relational database would lead to an inefficient use of compute resources. "But their potential market success comes down to whether there are enough 'long chain' scenarios that merit forgoing an RDBMS," the authors write.
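The "long chain" scenario is a multi-hop traversal: in SQL, each hop is effectively another self-join, as in this hypothetical friends-of-friends sketch using SQLite's WITH RECURSIVE (a purpose-built graph engine would walk adjacency lists directly instead); the data and names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("ann", "bob"), ("bob", "carl"), ("carl", "dina"), ("ann", "eve")],
)

# Everyone reachable from 'ann' within 3 hops; each hop adds a join over 'follows'.
rows = conn.execute("""
    WITH RECURSIVE reach(person, depth) AS (
        SELECT dst, 1 FROM follows WHERE src = 'ann'
        UNION
        SELECT f.dst, r.depth + 1
        FROM follows f JOIN reach r ON f.src = r.person
        WHERE r.depth < 3
    )
    SELECT person, MIN(depth) FROM reach GROUP BY person ORDER BY 2
""").fetchall()
print(rows)  # e.g. [('bob', 1), ('eve', 1), ('carl', 2), ('dina', 3)]
```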
Trends in Database Architecture
Beyond the "relational or non-relational" argument, Stonebraker and Pavlo offered their thoughts on the latest trends in database architecture.
Column stores: Relational databases that store data in columns (as opposed to rows), such as Google Cloud BigQuery, AWS' Redshift, and Snowflake, have grown to dominate the data warehouse/OLAP market "thanks to their superior performance."
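The performance claim comes down to data layout: an analytic query that touches one column out of many reads far less data when values sit in a contiguous column. The small, illustrative comparison below (plain Python vs. NumPy, synthetic data) gestures at the idea; real column stores add compression and vectorized execution on top.

```python
import numpy as np

n = 200_000
# Row-oriented: every record carries all fields, so an aggregate over one field
# still walks every whole record.
rows = [{"order_id": i, "region": i % 50, "amount": float(i % 997)} for i in range(n)]
row_total = sum(r["amount"] for r in rows)

# Column-oriented: the 'amount' column is one contiguous array, scanned in a
# tight vectorized loop that touches only the bytes it needs.
amount_col = np.arange(n, dtype=np.float64) % 997
col_total = amount_col.sum()

assert abs(row_total - col_total) < 1e-6  # same answer, very different work
```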

Lakehouses are a bright spot in the not-strictly-relational-at-all-times world
Cloud databases: The biggest revolution in database design over the past 20 years has occurred in the cloud, the authors write. Thanks to the massive jump in networking bandwidth relative to disk bandwidth, storing data in object stores via network-attached storage (NAS) has become very attractive. That in turn pushed the separation of compute and storage, and the rise of serverless computing. The move to the cloud created a "once-in-a-lifetime opportunity for enterprises to refactor codebases and remove bad historical technology decisions," they write. "Other than embedded DBMSs, any product not starting with a cloud offering will likely fail."
Data Lakes / Lakehouses: Building on the rise of cloud object stores (see above), these systems "are the successor to the 'Big Data' movement from the early 2010s," the authors write. Table formats like Apache Iceberg, Apache Hudi, and Databricks Delta Lake have smoothed over what "seems like a terrible idea," i.e. letting any application write any arbitrary data into a centralized store, the authors write. The ability to support non-SQL workloads, such as data scientists crunching data in a notebook via a Pandas DataFrame API, is another advantage of the lakehouse architecture. This will "be the OLAP DBMS archetype for the next ten years," they write.
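As a small example of that non-SQL access pattern, a data scientist can point a DataFrame library straight at the Parquet files that make up a lakehouse table, with no SQL engine in the loop. The sketch below is hypothetical and self-contained (it writes a tiny file first); pandas needs pyarrow or fastparquet installed for Parquet I/O, and in a real lake the path would be an object-store URI.

```python
import pandas as pd

# Hypothetical slice of a lakehouse table, written out as a Parquet file.
pd.DataFrame(
    {"city": ["Boston", "Boston", "Chicago"], "fare": [12.5, 8.0, 15.0]}
).to_parquet("trips_2024-06-01.parquet")

# Read the same files straight into a DataFrame and aggregate with ordinary
# DataFrame operations instead of SQL.
trips = pd.read_parquet("trips_2024-06-01.parquet")
by_city = trips.groupby("city", as_index=False)["fare"].sum()
print(by_city)
```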
NewSQL systems: The rise of new relational (or SQL) databases that scaled horizontally like NoSQL databases without giving up ACID guarantees may have seemed like a good idea. But this category of databases, such as SingleStore, NuoDB (now owned by Dassault Systèmes), and VoltDB (a Stonebraker creation), never caught on, largely because existing databases were "good enough" and didn't warrant taking the risk of migrating to a new database.
Hardware accelerators: The past 20 years have seen a smattering of hardware accelerators for OLAP workloads, using both FPGAs (Netezza, Swarm64) and GPUs (Kinetica, Sqream, Brytlyt, and HeavyDB [formerly OmniSci]). Few companies outside the cloud giants can justify the expense of building custom hardware for databases these days, the authors write. But hope springs eternal in data. "Despite the long odds, we predict that there will be many attempts in this space over the next 20 years," they write.

GPUs are popular database accelerators owing to the availability of Nvidia's CUDA, the authors write
Blockchain databases: Once promoted as the future data store for a trustless society, blockchain databases are now "a waning database technology fad," the authors write. It's not that the technology doesn't work, but there just aren't any applications outside of the Dark Web. "Legitimate businesses are unwilling to pay the performance cost (about five orders of magnitude) to use a blockchain DBMS," they write. "An inefficient technology in search of an application. History has shown this is the wrong way to approach systems development."
Looking Forward: It's All Relative
At the end of the paper, the reader is left with the indelible impression that "what goes around" is the relational model and SQL. The combination of the two will be tough to beat, but people will keep trying anyway, Stonebraker and Pavlo write.
"Another wave of developers will claim that SQL and the RM are insufficient for emerging application domains," they write. "People will then propose new query languages and data models to overcome these problems. There is tremendous value in exploring new ideas and concepts for DBMSs (it is where we get new features for SQL). The database research community and marketplace are more robust because of it. However, we do not expect these new data models to supplant the RM."
So, what will the future of database development hold? The pair encourage the database community to "foster the development of open-source reusable components and services. There are some efforts toward this goal, including for file formats [Iceberg, Hudi, Delta], query optimization (e.g., Calcite, Orca), and execution engines (e.g., DataFusion, Velox). We contend that the database community should strive for a POSIX-like standard of DBMS internals to accelerate interoperability."
"We caution developers to learn from history," they conclude. "In other words, stand on the shoulders of those who came before and not on their toes. One of us will likely still be alive and out on bail in 20 years, and thus fully expects to write a follow-up to this paper in 2044."
You’ll be able to entry the Stonebraker/Pavlo paper right here.
Related Items:
Stonebraker Seeks to Invert the Computing Paradigm with DBOS
Cloud Databases Are Maturing Quickly, Gartner Says
The Future of Databases Is Now
AWS, Brytlyt, Couchbase, Databricks, Elastic, Google Cloud, HeavyDB, Kinetica, KX, Milvus, MongoDB, Neo4j, NuoDB, Pinecone, Redis, SingleStore, Snowflake, Tamr, Teradata, TigerGraph, VoltDB, Weaviate
acid, Andrew Pavlo, array database, BigTable, cassandra, cloud database, database, document store, graph database, json, KV store, lakehouse, mapreduce, Michael Stonebraker, NoSQL, relational database, relational model, sql, text search, vector database