3/13/2021

Out in the Open: The Abandoned Facebook Tech That Now Helps Power Apple

by John Doe

Mark Zuckerberg and company open sourced Cassandra in the summer of 2008, and it helped kick off the now enormous NoSQL movement, along with other databases like CouchDB and MongoDB. Rackspace hired Ellis that very year to evaluate options for a next-generation database, and he tried all the various NoSQL databases available at the time. None, he says, could top Cassandra. "Facebook open sourced it, but weren't moving it forward," he says. "But the technical foundations were ahead of everyone else."

Facebook hadn't built a community around Cassandra, which was both a liability and an opportunity. Ellis could tailor the open source project to meet Rackspace's needs, building and guiding the community himself. But the idea to start his own Cassandra company didn't come until 2010. Cassandra was already gaining traction outside of Facebook and Rackspace, but when an engineer at another company told Ellis it had decided to use a competing NoSQL database because there was a startup that would provide technical support for the software, he knew he had to act.

Keep On Chuggin'

Even as Cassandra grew behind the scenes, the initial buzz wore off. Today, there are too many NoSQL databases to keep track of. And when Facebook decided to use HBase instead of Cassandra for its messaging system, it took a little sheen off the database. But even as the NoSQL hype faded, Cassandra kept chugging along, picking up new users along the way. According to data compiled by Austrian consulting firm Solid IT, Cassandra is the second most popular NoSQL database in the world, after MongoDB, and the third fastest growing database overall.

Matt Pfeil. DataStax

DataStax is a big part of this, offering service and support for a proprietary version of Cassandra called DataStax Enterprise. "A lot of companies have more time than money, so they use the open source Cassandra and contribute back," Ellis says. "But other companies prefer to trade money for time, and they pay for the enterprise version. Personally, though the sales team would disagree, I'm happy to work with people from either camp."

At the same time, the larger Cassandra community has continued to grow, with many other companies supporting its development. Apple is now the second largest contributor to the project, though it's quiet about how it uses the database. Ellis couldn't confirm whether Apple is a DataStax customer, but three Apple engineers are speaking at the annual Cassandra Summit in September. And Cassandra has found its way back into Facebook thanks to the company's acquisition of Instagram, which is a heavy user of the database.

Chasing the Future

The tech community has reached a point where one database product from one company will no longer dominate the market. From now on, there will be multiple different approaches to storing and working with data. But the big data landscape has evolved since 2008. Since then Google has unveiled numerous new tools, such as Dremel, which it uses to query data at insanely fast speeds, and Spanner, its internal replacement for the database that inspired Cassandra.

The open source community is trying to keep up with these advances. MapR started building a Dremel clone called Drill in 2012, and a startup called Databricks has been developing an analytics system called Spark that is now in use by Yahoo. More recently, a team of ex-Google engineers began building a Spanner clone called CockroachDB.

Ellis says the strategy for Cassandra and DataStax will be to ensure that their technology can work with any new technology that comes along. For example, DataStax recently released a connector for Spark that will enable developers to easily use Spark to analyze data stored in Cassandra. "We're trying to be the database that drives our application, not necessarily the analytics," he says. "There's nothing that marries us to one of those platforms."

Correction 8/4/2014 7:15 PM EST: An earlier version of this story said that Yahoo was developing Spark, but it's actually being developed by a company called Databricks.
