Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

10/30/2020

Reading time: 2 min

Apache Cassandra Lunch #23: Lucene Based Indexes on Cassandra - Business Platform Team

by John Doe

In Apache Cassandra Lunch #23, we cover Lucene based indexes on Cassandra, including packaged and DIY methods and some pros and cons of using them. The live webinar recording, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register at this link now!

Also, check out the rest of the Cassandra Lunches you may have missed on our YouTube page linked here. Don’t forget to subscribe while you’re there so you can keep up to date with all of the upcoming Cassandra Lunches and our other content!

Packaged

  • DSE Search / Solr
    • DSE Search | DSE 6.0 Dev guide
  • Cassandra Lucene Index
    • instaclustr/cassandra-lucene-index: Lucene based secondary indexes for Cassandra
    • Mutations to Cassandra -> Mutations on Disk (for that node)
  • Elassandra / Elasticsearch
    • strapdata/elassandra: Elassandra = Elasticsearch + Apache Cassandra
    • Cassandra + Elasticsearch
    • Mutations to Cassandra -> Mutations to Elasticsearch

DIY

  • Event -> CQRS -> Cassandra + Index
    • Write
      • Event / Command goes into an Event Source repository (Kafka, SQL table, etc.)
      • Command Processor processes it into CQL / Elasticsearch or Solr or Amazon … Algolia
    • Request
      • Event / Command goes into an Event Source repository (Kafka, SQL table, etc.)
      • Command Processor goes to the index, finds the data, goes to Cassandra, gets the data, and returns it.
      • Query the index —
  • Cassandra -> Batch -> Index
    • Writes to Cassandra
    • Every now and then – Index to ???
    • Cassandra + Spark
  • Cassandra Triggers -> Index
  • Cassandra CDC -> Index
    • CDC -> Kafka Connect -> Lucene Index (Elastic/Solr/etc.)
    • CDC -> Kafka -> Indexer / Kafka Consumer -> Lucene Index (Elastic/Solr/etc.)
  • Serverless Function -> Cassandra + Index
  • Apache NiFi (Lucene) -> NiFi Processor -> Cassandra + Index
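
The Event -> CQRS -> Cassandra + Index flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `InMemoryTable` and `InMemoryIndex` are hypothetical stand-ins for a Cassandra table and a Lucene-style index, the `event_log` list stands in for the event source repository, and `CommandProcessor` is a made-up name for the component that handles both the write path and the request path.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTable:
    """Hypothetical stand-in for a Cassandra table keyed by primary key."""
    rows: dict = field(default_factory=dict)

    def upsert(self, key, row):
        self.rows[key] = row

    def get(self, key):
        return self.rows.get(key)


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a Lucene-style inverted index: term -> keys."""
    postings: dict = field(default_factory=dict)

    def index(self, key, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(key)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class CommandProcessor:
    """Applies events to both stores and answers requests via the index."""

    def __init__(self, table, index):
        self.table = table
        self.index = index

    def handle_write(self, event):
        # Write path: persist the row, then make it searchable.
        self.table.upsert(event["id"], event)
        self.index.index(event["id"], event["body"])

    def handle_request(self, term):
        # Request path: find keys in the index, then fetch rows from the table.
        keys = self.index.search(term)
        return [self.table.get(key) for key in keys]


# The event source repository (Kafka, SQL table, etc.) is modelled as a list.
event_log = [
    {"id": 1, "body": "Lucene index on Cassandra"},
    {"id": 2, "body": "Kafka event source"},
]

processor = CommandProcessor(InMemoryTable(), InMemoryIndex())
for event in event_log:
    processor.handle_write(event)

results = processor.handle_request("cassandra")
print(results)
```

Note that the sketch never removes stale terms when a row is updated; a real indexer would need to handle updates and deletes as well.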

Pros

  • extremely rich search capabilities
    • geospatial
    • synonym
    • fuzzy logic search
    • typos
    • stemming
  • packaged elastic/Solr/Lucene -> shorter latency
  • separate index -> better separation of concerns and speed
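
To give a feel for what stemming and typo tolerance mean in practice, here is a toy sketch in plain Python. It is not how Lucene implements these features: the `stem` function is a deliberately naive suffix stripper, and fuzzy matching is approximated with the standard library's `difflib.get_close_matches`. All function names, documents, and the `cutoff` value are assumptions made for illustration.

```python
import difflib


def stem(word):
    """Very naive suffix stripping; real analyzers are far more careful."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Build an inverted index from stemmed term to document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


def search(index, query, cutoff=0.8):
    """Stem the query, then accept index terms that are close enough to it."""
    term = stem(query.lower())
    matches = difflib.get_close_matches(term, list(index), n=3, cutoff=cutoff)
    hits = set()
    for match in matches:
        hits |= index[match]
    return sorted(hits)


docs = {
    1: "indexing cassandra tables",
    2: "streaming kafka topics",
}
index = build_index(docs)

print(search(index, "indexes"))   # stemming: "indexes" and "indexing" share a stem
print(search(index, "casandra"))  # typo tolerance: one missing letter still matches
```

A plain equality lookup on a Cassandra column would miss both of these queries, which is the kind of gap a Lucene based index is meant to close.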

Cons

  • pure Lucene -> reinventing the wheel of what Solr/Elasticsearch already provide
  • external elastic/solr/ ?? -> longer latency between finding the data and getting the data
  • packaged elastic/Solr/Lucene -> don’t expect it to solve all your problems
  • consistency issues (if DIY)
  • Lucene is memory heavy
  • Lucene is disk heavy
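
The consistency issue with DIY approaches usually comes from writing to two systems without a shared transaction. The sketch below shows, under simplifying assumptions, how a failed index write leaves the table and the index out of step, and how a periodic batch job can detect and repair the drift. The dictionaries stand in for Cassandra and the index (the index is simplified to a key -> terms map), and `dual_write`, `find_drift`, and `reconcile` are hypothetical helper names.

```python
def dual_write(table, index, key, text, index_available=True):
    """Write to the table first, then to the index; the second write may fail."""
    table[key] = text
    if not index_available:
        raise ConnectionError("index unavailable; table and index now disagree")
    index[key] = set(text.lower().split())


def find_drift(table, index):
    """Return keys whose indexed terms are missing or stale."""
    return sorted(
        key for key, text in table.items()
        if index.get(key) != set(text.lower().split())
    )


def reconcile(table, index):
    """Batch repair: re-index drifted rows and drop orphaned index entries."""
    repaired = find_drift(table, index)
    for key in repaired:
        index[key] = set(table[key].lower().split())
    for key in set(index) - set(table):
        del index[key]
    return repaired


table, index = {}, {}
dual_write(table, index, "a", "lucene on cassandra")
try:
    dual_write(table, index, "b", "solr on cassandra", index_available=False)
except ConnectionError:
    pass  # the row is in the table but was never indexed

drift_before = find_drift(table, index)
repaired = reconcile(table, index)
drift_after = find_drift(table, index)
print(drift_before, repaired, drift_after)
```

Until the batch job runs, searches will not find row "b" even though Cassandra has it, so how often to reconcile is a trade-off between index freshness and load on both systems.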

As mentioned above, the live recording of Apache Cassandra Lunch #23: Lucene Based Indexes on Cassandra is embedded below. Also, check out our YouTube page for more videos and the Cassandra Lunch playlist here! If you want to attend Cassandra Lunch live, it is hosted weekly on Wednesdays at 12 PM EST. You can register at this link now!

Additional Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
