7/18/2018

strapdata/elassandra

by John Doe

README.md

Elassandra is a fork of Elasticsearch modified to run as a plugin for Apache Cassandra in a scalable and resilient peer-to-peer architecture. Elasticsearch code is embedded in Cassandra nodes, providing advanced search features on Cassandra tables, and Cassandra serves as the Elasticsearch data and configuration store.

Elassandra supports Cassandra vnodes and scales horizontally by adding more nodes.

Project documentation is available at doc.elassandra.io.

Benefits of Elassandra

For Cassandra users, Elassandra provides Elasticsearch features:

Cassandra updates are indexed in Elasticsearch.
Full-text and spatial search on your Cassandra data.
Real-time aggregation (no Spark or Hadoop needed to GROUP BY).
Search across multiple keyspaces and tables in one query.
Automatic schema creation, with support for nested documents through User Defined Types.
Read/write JSON REST access to Cassandra data.
Numerous Elasticsearch plugins and products such as Kibana.

For Elasticsearch users, Elassandra provides useful features:

Elassandra is masterless: cluster state is managed through Cassandra lightweight transactions.
Elassandra is a sharded multi-master database, whereas Elasticsearch is sharded master-slave. Elassandra therefore has no single point of write, which helps achieve high availability.
Elassandra inherits Cassandra's data repair mechanisms (hinted handoff, read repair and nodetool repair), which makes cross-datacenter replication possible.
When a node is added to an Elassandra cluster, only the data pulled from existing nodes is re-indexed in Elasticsearch.
Cassandra can be your single datastore for indexed and non-indexed data, which is easier to manage and secure. Source documents are stored in Cassandra, which reduces disk usage if you need both a NoSQL database and Elasticsearch.
Write operations are no longer restricted to one primary shard but are distributed across all Cassandra nodes in a virtual datacenter. The number of shards does not limit write throughput; adding Elassandra nodes increases both read and write throughput.
Elasticsearch indices can be replicated between many Cassandra datacenters, so you can write to the closest datacenter and search globally.
The Cassandra driver is datacenter- and token-aware, providing automatic load balancing and failover.

Quick start

Elasticsearch 6.x changes

Elasticsearch now supports only one document type per index, backed by one Cassandra table. Unless you specify an Elasticsearch type name in your mapping, data is stored in a Cassandra table named "_doc". To search across many Cassandra tables, you now need to create and search many indices.
Elasticsearch 6.x manages shard consistency through several metadata fields (_primary_term, _seq_no, _version) that are no longer used in Elassandra because replication is fully managed by Cassandra.
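
Since each index is backed by a single table, a query that spans several tables has to name several indices. Elasticsearch accepts a comma-separated list of indices in the search URL. The following is a minimal sketch that only builds such a URL; the search_url helper and the second index name "blog" are hypothetical and used for illustration only.

```shell
#!/bin/sh
# Sketch: build a URI-search URL that targets several indices at once.
# search_url is a hypothetical helper, not part of Elassandra.
search_url() {
    host=$1; query=$2; shift 2
    # Join the remaining arguments (index names) with commas.
    indices=$(IFS=,; printf '%s' "$*")
    printf 'http://%s/%s/_search?q=%s&pretty\n' "$host" "$indices" "$query"
}
```

With a running node, it could be used as curl "$(search_url localhost:9200 user:Jimmy twitter blog)", assuming both indices exist.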

Requirements

Ensure Java 8 is installed and JAVA_HOME points to the correct location.
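
One way to check this requirement before starting a node is sketched below. The check_java_home function is a hypothetical helper, not something shipped with Elassandra; it only verifies that the given directory contains an executable bin/java.

```shell
#!/bin/sh
# Sketch: verify that a directory looks like a usable JAVA_HOME.
# check_java_home is a hypothetical helper, not part of Elassandra.
check_java_home() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable found at $dir/bin/java" >&2
        return 1
    fi
    return 0
}
```

For example, check_java_home "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version should report a version string starting with 1.8 on a Java 8 installation.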

Installation

Download and extract the distribution tarball.
Define the CASSANDRA_HOME environment variable: export CASSANDRA_HOME=<extracted_directory>
Run bin/cassandra -e
Run bin/nodetool status
Run curl -XGET localhost:9200/_cluster/state
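
The node may need some time after startup before the REST endpoint on port 9200 answers, so the last step can fail if it is run too early. A minimal sketch of a retry loop follows; retry_until_ok is a hypothetical helper, and the attempt count and delay are arbitrary values to adjust for your environment.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.
# retry_until_ok is a hypothetical helper, not part of Elassandra.
retry_until_ok() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the probe quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, retry_until_ok 30 2 curl -sf localhost:9200/_cluster/state probes the cluster state up to 30 times, two seconds apart, and returns success once the request succeeds.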

Example

Try indexing a document into a non-existent index:

curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "Poulpy",
    "post_date": "2017/10/4 13:12:00",
    "message": "Elassandra adds dynamic mapping to Cassandra"
}'

Then look it up in Cassandra:

bin/cqlsh -e "SELECT * from twitter.\"_doc\""

Behind the scenes, Elassandra has created a new keyspace twitter and a table _doc.

Now, insert a row with CQL:

INSERT INTO twitter."_doc" ("_id", user, post_date, message)
VALUES ( '2', ['Jimmy'], [dateof(now())], ['New data is indexed automatically']);

Then search for it with the Elasticsearch API:

curl "localhost:9200/twitter/_search?q=user:Jimmy&pretty"

Here is a sample response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "post_date" : "2017/10/04 13:20:00",
          "message" : "New data is indexed automatically",
          "user" : "Jimmy"
        }
      }
    ]
  }
}

Support

Commercial support is available through Strapdata.
Community support is available via the Elassandra Google group.
Post feature requests and bugs on https://github.com/strapdata/elassandra/issues

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.
Copyright 2015-2018, Strapdata (contact@strapdata.com).
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
    http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

Acknowledgments

Elasticsearch and Kibana are trademarks of Elasticsearch BV, registered in the U.S. and in other countries.
Apache Cassandra, Apache Lucene, Apache, Lucene and Cassandra are trademarks of the Apache Software Foundation.
Elassandra is a trademark of Strapdata SAS.