The term SMACK Stack was widely popularized in the San Francisco/Dublin Scala/Spark/Reactive Systems meetups and the By the Bay series of conferences (Scala and Data). Since it took on a life of its own, this is an abridged chronology of how it came about. I’m surely missing something, since it’s a view from (by) the Bay, so please comment with any corrections, suggestions, or updates. I’m not elaborating on the technical merits of the SMACK Stack here, as they are covered at NoETL.org.
I first saw the phrase SMACK Stack in a tweet by Jamie Allen, attributed to Oliver White, on June 25, 2015. Jamie is a veteran Typesafe/Lightbend executive and Scala/Akka/Reactive practitioner, and Oliver is their Chief Storyteller. By that time, our long-established plan to run the first end-to-end data pipeline training at Scala By the Bay and Big Data Scala 2015 was in full smack. We brought together the actual five companies comprising the SMACK Stack with their own trainers, training materials, and top OSS contributors, and worked relentlessly to create the actual SMACK Stack training, defining the SMACK Stack by doing and teaching.
The original SF Scala announcement went out on December 23, 2014. Here it is:
We are starting two new conferences in 2015: Big Data Scala for complete data pipelines and “big” data science, and Text By the Bay for applied NLP/text mining. Our flagship Scala By the Bay conference continues into its third year, growing to 500, and directly followed by Big Data Scala.
text.bythebay.io
Both previous installments of SBTB sold out completely.
You can find the talks from the previous conferences and meetups at Functional.TV. The 2014 conference is now at
2014.scala.bythebay.io (photos from 2014 SBTB)
We’re also creating a new kind of all-day tutorial, Complete Pipeline Training, where in one day we’ll go through an end-to-end data pipeline in Scala, running Akka APIs on Mesos and pumping data through Kafka into Spark. Each segment will be taught by the respective company driving the component: Mesosphere, Typesafe, Confluent, and Databricks.
The acronym SMACK was used publicly in the official announcement of the training. Here it is in the context of collecting questions for the closing panel at Big Data Scala with Martin Odersky, creator of Scala and cofounder of Typesafe (now Lightbend), who keynoted it.
The C for Cassandra was added when we invited Ryan Knight and Evan Chan. Ryan transitioned from Typesafe to DataStax, and Evan combined Spark and Cassandra for fast OLAP.
Here’s the final lineup as delivered on the Complete Pipeline Training day:
— S: Scala and Spark => Typesafe, Databricks; every trainer was a fairly well-known Scala developer. Chris Fregly represented Databricks and Advanced Spark. Nilanjan Raychaudhuri helped with the Akka presentation. For Spark, we used Spark Notebook as the GUI, presented by its creator Andy Petrella.
— M: Mesos => Mesosphere. Jason Swartz
— A: Akka => Typesafe, Duncan De Vore, Nilanjan with slides, Ryan Knight emeritus.
— C: Cassandra: Ryan Knight, for a while of Typesafe and then of DataStax, helped across the stack. Evan Chan (of Spark Job Server and FiloDB fame) is recognized for marrying Spark and Cassandra and was also instrumental.
— K: Kafka => Confluent. Jesse Anderson was fielded by Confluent as its official training provider. Ewen Cheslack-Postava helped with the Kafka segment in the Docker setup.
Here’s the github repo used in the training: https://github.com/bythebay/pipeline.
It’s important to note that Helena Edelson had earlier written killrweather, a project implementing the SMACK Stack, and presented it at PNWScala 2014 (among other venues). When Ryan and Evan joined the training we considered using killrweather and would have loved to have Helena join us (but she deferred to Ryan). We ended up with a different codebase to simplify setup and allow for Spark Notebook as a Spark GUI. Helena is a pioneer, and now a lead innovator, of the SMACK Stack.
Originally, we worked with Mesosphere to spin up a training cluster for every student. (Mesosphere was able to secure GCE credits, but it was not ready to run on GCE.) At that point, Mesosphere’s messaging was mostly about fully utilizing the datacenter. Immediately after Big Data Scala, Mesosphere announced dcos infinity, inspired by the SMACK training By the Bay. Infinity spins up Kafka, Spark and Cassandra working together. The SMACK Stack training By the Bay was the catalyst that led to dcos infinity. I pitched it to Flo directly (my good friend Jason Swartz was a dev manager at Mesosphere and we discussed it over lunch on the deck), and Flo put us together with Matt Trifiro, the CMO, with whom we continue to collaborate. It is important to observe that the data pipeline offering encapsulated by dcos infinity heralded a new direction for Mesosphere, from data center utilization to data pipeline integration as a product. They did not call it “dcos smack”, to allow for database vendor neutrality. (We’ve also discussed “dcos khrabrov” as an easily googlable alias. :)
The training was an incredible success: we had 100 people at Galvanize going over the whole stack in one day, using a fully dockerized setup we provided on USB. It took five sweaty guys in an AirBnB that Andy Petrella rented to finalize it the night before, after which I took it home to replicate on a device running under Windows. Of course it failed, and I manually copied the USBs into the night. But thanks to Nitro Docker genius Ben Rizkowski, it worked great. Ben even popped into a Slack support channel, on a Sunday, while house-hunting in the East Bay.
Big Data Scala was held for the first time together with the training. It was keynoted by Martin Odersky, the creator of Scala, Matei Zaharia, the creator of Spark, Jay Kreps, the creator of Kafka, and Mike Olson, the CSO of Cloudera. Here we had most of the creators of the SMACK Stack in person, as well as its key proponents and integrators. We unveiled NoETL.org there as a way to popularize the reactive, streaming approach to data, as opposed to batching and dumping/reparsing text from HDFS. NoETL is in fact another way to describe what SMACK is good for, as a new way to write data pipelines.
Since all of the original SMACK Stack trainers are public speakers and OSS leads, the notion spread fast. Notably, Evan Chan moved forward with FiloDB, combining Cassandra and Spark to support OLAP. Evan presented his work at SF Scala and SF Spark, as well as Scala By the Bay and Big Data Scala, and subsequently teamed up with Helena Edelson to advance FiloDB and the SMACK Stack in industry. Evan ran a Q&A with O’Reilly in the fall of 2015. At the same time we took the SMACK Stack to Europe via the Nitro office, the home of the Dublin Spark meetup. We also regularly host Andy Petrella there, who further spreads the word. Dean Wampler, the Lightbend Big Data architect, will present the SMACK Stack at the O’Reilly Architecture conference this fall (2016). See how the SMACKing has already begun!
The SMACK Stack concept has become such a hit that other Nitro practitioners and I were approached by publishers to put together a book describing it. I’m helping drive such a project, and we’re looking for co-authors. Ping me if you’re interested.
It’s a good time to remind all SMACK lovers that Scala By the Bay and Big Data Scala are held at Twitter this year, which employs its own SMACK Stack that you can see in action every day, e.g. when you like a tweet and someone gets a notification immediately, for millions of tweets. We’re calling the whole conference Scale+Scala By the Bay, or simply Scalæ By the Bay, and running three tracks over three days, all joined in harmony. The tracks are as follows:
— Thoughtful Software Engineering with Types
— Reactive Systems and Streaming
— Data Pipelines for Data Engineering and Machine Learning. This is SMACK Stack in action.
SMACK is a registered trademark of By the Bay, LLC (pending). It was first used commercially in our SMACK training, and nowhere else before it. We trained a hundred people to build a whole big data backend in a day, and they took this knowledge to the world. It’s important that this history, however brief, is acknowledged when using the term SMACK Stack.