Awesome Cassandra

A curated list of awesome Apache Cassandra packages and resources. Maintained by Rahul Singh of Anant. Feel free to contact me if you'd like to collaborate on this and other awesome lists: Awesome Cassandra, Awesome Solr, Awesome Lucene.

Contents

General

Cassandra

Cassandra History

Cassandra Use Cases

Cassandra Distributions

  • DataStax Enterprise - Most widely used commercial distribution of Apache Cassandra, integrated with Apache Spark (for SparkSQL, analytics), Apache Solr (for secondary indexes), an Apache TinkerPop-based graph stored in Cassandra, and OpsCenter.
  • DDACS - DataStax Distribution of Apache Cassandra, a production-ready distribution with a bulk loader, supported by DataStax.
  • Elassandra - Elassandra = Elasticsearch as a Cassandra secondary index.
  • ScyllaDB - NoSQL data store built on the Seastar framework, compatible with Apache Cassandra.
  • YugaByte Database - YugaByteDB is a transactional, high-performance database for building distributed cloud services. It supports Cassandra-compatible and Redis-compatible APIs, with PostgreSQL in Beta.
  • Microsoft Azure Cosmos DB: Apache Cassandra API - Azure Cosmos DB provides the Cassandra API (preview) for applications that are written for Apache Cassandra that need premium capabilities.

Using Cassandra

Cassandra from Relational

Cassandra Data Modeling

Cassandra Architecture

Cassandra Monitoring

Cassandra Maintenance

Cassandra Performance Tuning

Cassandra Security

Deploying Cassandra

Integrating with Cassandra

Spark

  • DataStax Spark Cassandra Connector: This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
  • Stratio Deep (deprecated): Deep is a thin integration layer between Apache Spark and several NoSQL datastores. We currently support Apache Cassandra and MongoDB, but in the near future we will add support for several other datastores.
  • sample Spark Job Server Cassandra - Simple sample job illustrating the use of Spark Jobserver to execute Apache Spark analytics with Cassandra.
  • fluxcapacitor/pipeline: End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark ML, GraphX, Spark Streaming, Kafka, NiFi, Cassandra, ElasticSearch, Redis, Tachyon, HDFS, Zeppelin, iPython/Jupyter Notebook, Tableau, Twitter Algebird.

Search / Secondary Indexes

  • Tuning DSE Search: Indexing latency and query latency
  • Elassandra: Elassandra = Elasticsearch as a Cassandra secondary index.
  • Cassandra Lucene Index: Lucene based secondary indexes for Cassandra
  • OLD - Solandra: Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra.
  • cassandra-trigger - Cassandra trigger to push real-time updates to Elasticsearch

Packages

Libraries

  • express-cassandra - Cassandra ORM/ODM/OGM for Node.js with optional support for Elassandra & JanusGraph
  • DataStax Java Driver: A Java client driver for Apache Cassandra.
  • DataStax C++ Driver: A modern, feature-rich, and highly tunable C/C++ client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's native protocol and Cassandra Query Language v3. http://datastax.github.io/cpp-driver/
  • DataStax Python Driver: A modern, feature-rich and highly-tunable Python client library for Apache Cassandra (2.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
  • DataStax Ruby Driver: A Ruby client driver for Apache Cassandra. This driver works exclusively with the Cassandra Query Language version 3 (CQL3) and Cassandra's native protocol.
  • DataStax NodeJS Driver: A modern, feature-rich and highly tunable Node.js client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
  • DataStax C# Driver: A modern, feature-rich and highly tunable C# client library for Apache Cassandra (1.2+) and DataStax Enterprise (3.1+) using exclusively Cassandra's binary protocol and Cassandra Query Language v3.
  • DataStax PHP Driver: DataStax PHP Driver for Apache Cassandra http://datastax.github.io/php-driver/
  • Achilles: Achilles is an open source persistence manager for Apache Cassandra, with features such as advanced bean mapping (compound primary key, composite partition key, timeUUID...), native collections and map support, and so on.
  • phpcassa: PHP client library for Apache Cassandra
  • Caffinitas: Caffinitas is an advanced object mapper for Apache Cassandra which has been especially designed to work with Datastax Java Driver 2.1+ against Apache Cassandra 2.1, 2.0 or 1.2.
  • Spring Data for Apache Cassandra - Spring Data for Apache Cassandra offers a familiar interface to those who have used other Spring Data modules in the past.
  • gocql - Package gocql implements a fast and robust Cassandra client for the Go programming language.
  • OLD - Netflix Astyanax: Astyanax is a high level Java client for Apache Cassandra, based on Thrift protocol. Not maintained.

Tools

  • DbSchema - Cassandra Designer - A diagram designer and GUI admin tool that supports Cassandra among other databases.
  • DBeaver - Free Universal Database Tool - A third-party tool for working with all sorts of databases, including Cassandra.
  • RazorSQL - Multi DB Manager Tool - A multi-db tool for Linux, Mac, and Windows that works with Apache Cassandra.
  • KDM - The Kashlev Data Modeler - An automated big data modeling tool for Apache Cassandra
  • Cassandra Reaper: Automated repairs for Apache Cassandra. Supports all versions.
  • cstar perf: Apache Cassandra performance testing platform.
  • Spark Cassandra Stress: A tool for testing the DataStax Spark Connector against Apache Cassandra or DSE.
  • trireme: Migration tool providing support for Apache Cassandra, DataStax Enterprise Cassandra, & DataStax Enterprise Solr.
  • cqlmigrate - Cassandra CQL migration tool. cqlmigrate is a library for performing schema migrations on a Cassandra cluster.
  • cassandra-migration-tool-java - Cassandra migration tool for Java is a lightweight tool used to execute schema and data migrations on a Cassandra database.
  • cassalog - Cassalog is a schema change management library and tool for Apache Cassandra that can be used with applications running on the JVM.
  • cdeploy - cdeploy is a simple tool to manage your Cassandra schema migrations in the style of dbdeploy.
  • Web: Cassandra Calculator: A simple calculator to see how size / replication factor affect the system's consistency.
  • Cassandra-web - A web interface for Apache Cassandra.
  • CassandraRestfulAPI - The CassandraRestfulAPI project exposes Cassandra data tables through a RESTful API. https://github.com/rohitsakala/CassandraRestfulAPI
  • Netflix: Staash - A language-agnostic and storage-agnostic web interface for storing data in persistent storage systems. The metadata layer abstracts many storage details, and the pattern automation APIs automate common data access patterns.
  • cql-vim - Cassandra CQL Syntax Highlighter for Vim
  • Presto - Distributed SQL Query Engine for Big Data. Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores.
  • Sstable Tools - A toolkit for parsing, creating and doing other fun stuff with Cassandra 3.x SSTables.
  • cassandra-exporter - Simple Tool to Export / Import Cassandra Tables into JSON
  • Cassandra SSTable Tools - A few different tools combined into one that help admins get summaries, metadata, partition info, and cell info.
  • Cassandra-Client - A simple GUI tool for browsing tables and data in Cassandra.
  • CQL Data Modeler - A very useful tool to test out a CQL schema and visualize what the partition would look like in relation to the columns and rows.
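The Cassandra Calculator entry above relates replication factor and consistency levels to the consistency a cluster provides. The commonly cited rule is that reads see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The following is a minimal sketch of that arithmetic, not the calculator's actual implementation; the function names and the restriction to the ONE, TWO, THREE, QUORUM, and ALL levels are assumptions made for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority of RF."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Map a consistency level name to the replicas that must respond.

    Hypothetical helper: only a handful of single-datacenter levels
    are modeled here.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when R + W > RF, so read and write replica sets must overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


def nodes_tolerated(level: str, replication_factor: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, is_strongly_consistent(read, write, rf))
```

With RF = 3, QUORUM reads and writes overlap on at least one replica while still tolerating one replica being down, which is why that pairing is a common default.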

Admin / Monitor

  • DataStax OpsCenter: Simplified management for DataStax Enterprise and Cassandra database clusters.
  • Cassandra Cluster Admin: Cassandra Cluster Admin is a GUI tool to help people administrate their Apache Cassandra cluster.
  • Cassandra StatsD Agent: Java agent for Cassandra integration with StatsD.
  • Cassandra Scripts: Python-based Cassandra ops scripts to monitor cfstats.
  • Cassandra-Tools: Python Fabric scripts to help automate the launching and managing of cluster testing on AWS.
  • Cassandra Opstools: Generic scripts to review and monitor Cassandra, from Spotify.
  • CCM (Cassandra Cluster Manager): A script/library to create, launch and remove an Apache Cassandra cluster on localhost.
  • Cassandra Nagios: Perl-based scripts to get metrics for monitoring using Jolokia.
  • Cassandra Log Tools: Simple scripts for working with Apache Cassandra logs.
  • Cassandra CFStats to CSV Parser: Converts the output of CFStats to CSV.
  • Netflix Priam: Co-process for backup/recovery, token management, and centralized configuration management for Cassandra.
  • CStar - Apache Cassandra cluster orchestration tool for the command line.
  • ctop - A very simple console tool for monitoring column family read/write activity on a remote Cassandra host.

Queues / Schedulers

  • CMB: A highly available, horizontally scalable queuing and notification service compatible with AWS SQS and SNS
  • CassieQ: A distributed queue built on Cassandra.
  • Cherami: Distributed, scalable, durable, and highly available message queue system.
  • scheduler: A Scala library for scheduling arbitrary code to run at an arbitrary time.

Logging

Open Source Applications

  • Twissandra - Twissandra is an example project, created to learn and demonstrate how to use Cassandra. Running the project will present a website that has similar functionality to Twitter.
  • FiloDB - High-performance distributed analytical database + Spark SQL queries + built for streaming.
  • ChronoServer - A test server for sampling how long it takes mobile & web clients to make various types of requests to a server doing common request patterns.

Resources

Documentation

Books

Courses

Communities

Blogs

Videos

Slides