Mark Zuckerberg and company open sourced Cassandra in the summer of 2008, and it helped kick off the now enormous NoSQL movement, along with other databases like CouchDB and MongoDB. Rackspace hired Ellis that very year to evaluate options for a next-generation database, and he tried all the various NoSQL databases available at the time. None, he says, could top Cassandra. "Facebook open sourced it, but weren't moving it forward," he says. "But the technical foundations were ahead of everyone else."
Facebook hadn't built a community around Cassandra, which was both a liability and an opportunity. Ellis could tailor the open source project to meet Rackspace's needs, building and guiding the community himself. But the idea to start his own Cassandra company didn't come until 2010. Cassandra was already gaining traction outside of Facebook and Rackspace, but when an engineer at another company told Ellis it had decided to use a competing NoSQL database because there was a startup that would provide technical support for the software, he knew he had to act.
Keep On Chuggin'
Even as Cassandra grew behind the scenes, the initial buzz wore off. Today, there are too many NoSQL databases to keep track of. And when Facebook decided to use HBase instead of Cassandra for its messaging system, it took a little sheen off the database. But even as the NoSQL hype faded, Cassandra kept chugging along, picking up new users along the way. According to data compiled by Austrian consulting firm Solid IT, Cassandra is the second most popular NoSQL database in the world, after MongoDB, and the third fastest growing database overall.
DataStax is a big part of this, offering service and support for a proprietary version of Cassandra called DataStax Enterprise. "A lot of companies have more time than money, so they use the open source Cassandra and contribute back," Ellis says. "But other companies prefer to trade money for time, and they pay for the enterprise version. Personally, though the sales team would disagree, I'm happy to work with people from either camp."
At the same time, the larger Cassandra community has continued to grow, with many other companies supporting its development. Apple is now the second largest contributor to the project, though it's quiet about how it uses the database. Ellis couldn't confirm whether Apple is a DataStax customer, but three Apple engineers are speaking at the annual Cassandra Summit in September. And Cassandra has found its way back into Facebook thanks to the company's acquisition of Instagram, which is a heavy user of the database.
Chasing the Future
The tech community has reached a point where one database product from one company will no longer dominate the market. From now on, there will be multiple approaches to storing and working with data. But the big data landscape has evolved since 2008. Since then, Google has unveiled numerous new tools, such as Dremel, which it uses to query data at insanely fast speeds, and Spanner, its internal replacement for the database that inspired Cassandra.
The open source community is trying to keep up with these advances. MapR started building a Dremel clone called Drill in 2012, and a startup called Databricks has been developing an analytics system called Spark that is now in use by Yahoo. More recently, a team of ex-Google engineers began building a Spanner clone called CockroachDB.
Ellis says the strategy for Cassandra and DataStax is to ensure that their technology can work with any new technology that comes along. For example, DataStax recently released a connector for Spark that will enable developers to easily use Spark to analyze data stored in Cassandra. "We're trying to be the database that drives our application, not necessarily the analytics," he says. "There's nothing that marries us to one of those platforms."
Correction 8/4/2014 7:15 PM EST: An earlier version of this story said that Yahoo was developing Spark, but it's actually being developed by a company called Databricks.