
8/2/2018

Reading time: 7 min

spotify/heroic

by John Doe



Join the chat at https://gitter.im/spotify/heroic

A scalable time series database based on Bigtable, Cassandra, and Elasticsearch. Go to https://spotify.github.io/heroic/ for documentation, and join #heroic on Freenode if you need help or want to chat.

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

Stability Disclaimer: Heroic is an evolving project, and should in its current state be considered unstable. Do not use in production unless you are willing to spend time with it, experiment and contribute. Not doing so might result in losing your data to goblins. It is currently not on a release schedule and is not versioned. At Spotify we rely on multiple release forks that we actively maintain and flip between.

Building

Java 8 is required.

There are some repackaged dependencies that you have to make available; you do this by running tools/install-repackaged.

$ tools/install-repackaged
Installing repackaged/x
...

After this, the project is built using Maven:

$ mvn package

This will cause the heroic-dist module to produce a shaded jar that contains all required dependencies.

Running

After building, the entry point of the service is com.spotify.heroic.HeroicService. The following is an example of how this can be run:

$ java -cp $PWD/heroic-dist/target/heroic-dist-0.0.1-SNAPSHOT-shaded.jar com.spotify.heroic.HeroicService <config>

For help on how to write a configuration file, see the Configuration Section of the official documentation.

Heroic has been tested with the following services:

Logging

Logging is captured using SLF4J, and forwarded to Log4j.

To configure logging, define the -Dlog4j.configurationFile=<path> parameter. You can use docs/log4j2-file.xml as a base.
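For example, a run combining this parameter with the shaded jar built earlier might look like the following (the Log4j file here is the bundled docs/log4j2-file.xml; adjust the path and jar version to your build):

$ java -Dlog4j.configurationFile=docs/log4j2-file.xml \
    -cp $PWD/heroic-dist/target/heroic-dist-0.0.1-SNAPSHOT-shaded.jar \
    com.spotify.heroic.HeroicService <config>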

Testing

We run unit tests with Maven:

$ mvn test

A more comprehensive test suite is enabled with the environment=test property.

$ mvn -D environment=test verify

This adds:

Checkstyle
FindBugs
Integration Tests with Maven Failsafe Plugin
Coverage Reporting with Jacoco

It is strongly recommended that you run the full test suite before setting up a pull request; otherwise it will be rejected by Travis.

Remote Integration Tests

Integration tests are configured to run remotely depending on a set of system properties.

Elasticsearch
Property Description
-D elasticsearch.version=<version> Use the given client version when building the project
-D it.elasticsearch.remote=true Run Elasticsearch tests against a remote database
-D it.elasticsearch.seed=<seed> Use the given seed (default: localhost)
-D it.elasticsearch.clusterName=<clusterName> Use the given cluster name (default: elasticsearch)
Datastax
Property Description
-D datastax.version=<version> Use the given client version when building the project
-D it.datastax.remote=true Run Datastax tests against a remote database
-D it.datastax.seed=<seed> Use the given seed (default: localhost)
Bigtable
Property Description
-D bigtable.version=<version> Use the given client version when building the project
-D it.bigtable.remote=true Run Bigtable tests against a remote database
-D it.bigtable.project=<project> Use the given project
-D it.bigtable.zone=<zone> Use the given zone
-D it.bigtable.instance=<instance> Use the given instance
-D it.bigtable.credentials=<credentials> Use the given credentials file

The following is an example Elasticsearch remote integration test:

$> mvn -P integration-tests \
    -D elasticsearch.version=1.7.5 \
    -D it.elasticsearch.remote=true \
    clean verify
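
A similar invocation can target a remote Cassandra node through the Datastax properties listed above; this sketch assumes a node reachable on localhost:

$> mvn -P integration-tests \
    -D it.datastax.remote=true \
    -D it.datastax.seed=localhost \
    clean verify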

Full Cluster Tests

Full cluster tests are defined in heroic-dist/src/test/java.

This way, they have access to all the modules and parts of Heroic.

The JVM RPC module is specifically designed to allow for rapid execution of integration tests. It allows multiple cores to be defined and communicate with each other in the same JVM instance.
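As a sketch, assuming these tests are bound to the heroic-dist module's verify phase under the test environment (the exact invocation may differ), they could be run with:

$ mvn -pl heroic-dist -am -D environment=test verify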

Coverage

There's an ongoing project to improve test coverage. The coverage graph on codecov.io shows which areas to focus on.

Speedy Building

For a speedy build without tests and checks, you can run:

$ mvn -D maven.test.skip=true package

Building a Debian Package

This project does not provide a single Debian package, primarily because the current nature of the service (alpha state) does not mesh well with stable releases.

Instead, you are encouraged to build your own using the provided scripts in this project.

First run the prepare-sources script:

$ debian/bin/prepare-sources myrel 1

myrel will be the name of your release. It will be part of your package name (debian-myrel) and will also be suffixed to all helper tools (e.g. heroic-myrel).

For the next step you'll need a Debian environment:

$ dpkg-buildpackage -uc -us

If you encounter problems, you can troubleshoot the build with DH_VERBOSE:

$ env DH_VERBOSE=1 dpkg-buildpackage -uc -us

Contributing

Fork the code at https://github.com/spotify/heroic

Make sure you format the code using the provided IDEA formatter. Even if you disagree with the way it is formatted, consistency is more important. For special cases, see Bypassing Validation.

If possible, limit your changes to one module per commit. If you add new classes or modify existing ones, keep that change to a single commit while maintaining backwards-compatible behaviour. Deprecate any old APIs as appropriate with @Deprecated and add documentation for how to use the new API.

The first line of the commit should be formatted with [module1,module2] my message.

module1 and module2 are paths to the modules affected with any heroic- prefix stripped. So if your change affects heroic-core and metric/bigtable, the message should say [core,metric/bigtable] did x to y.

If more than 3 modules are affected by a commit, use [all]. For other cases, adapt to the format of existing commit messages.
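For example, committing a change that touches heroic-core and metric/bigtable:

$ git commit -m "[core,metric/bigtable] did x to y"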

Before setting up a pull request, run the comprehensive test suite as specified in Testing.

Module Orientation

The Heroic project is split into a couple of modules.

The most critical one is heroic-component. It contains interfaces, value objects, and the basic set of dependencies necessary to glue different components together.

Submodules include metric, suggest, metadata, and aggregation. The first three contain various implementations of the given backend type, while aggregation provides aggregation methods.

heroic-core contains the com.spotify.heroic.HeroicCore class which is the central building block for setting up a Heroic instance.

heroic-elasticsearch-utils is a collection of utilities for interacting with Elasticsearch. This is separate since we have more than one backend that needs to talk with Elasticsearch.

heroic-parser provides an Antlr4 implementation of com.spotify.heroic.grammar.QueryParser, which is used to parse the Heroic DSL.

heroic-shell contains com.spotify.heroic.HeroicShell, a shell capable of either running standalone or connecting to an existing Heroic instance for administration.

heroic-all contains dependencies and references to all modules that make up a Heroic distribution. This is also where profiles are defined, since they need to have access to all dependencies.

Anything in the repackaged directory is a dependency that includes one or more Java packages which must be relocated to avoid conflicts. These are exported under the com.spotify.heroic.repackaged groupId.

Finally there is heroic-dist, a small project that depends on heroic-all, heroic-shell, and a logging implementation. Here is where everything is bound together into a distribution — a shaded jar. It also provides the entry-point for services, namely com.spotify.heroic.HeroicService.

Bypassing Validation

To bypass automatic formatting and checkstyle validation you can use the following stanza:

// @formatter:off
final List<String> list = ImmutableList.of(
   "Welcome to...",
   "... The Wild West"
);
// @formatter:on

To bypass a FindBugs error, you should use the @SuppressFBWarnings annotation.

@SuppressFBWarnings(value = "FINDBUGS_ERROR_CODE", justification = "I Know Better Than FindBugs")
public class IKnowBetterThanFindbugs {
    // ...
}

HeroicShell

Heroic comes with a shell that contains many useful tasks. These can either be run in a readline-based shell with some basic completions and history, or standalone.

You can use the following helper script to run the shell directly from the project.

$ tools/heroic-shell [opts]

There are a few interesting options available; most notable is --connect, which allows the shell to connect to a remote Heroic instance.

See -h for a full listing of options.
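For example, connecting to a remote instance might look like the following; the host is hypothetical, and the exact argument format accepted by --connect should be checked with -h:

$ tools/heroic-shell --connect heroic.example.com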

You can run individual tasks in standalone mode, which gives you a few more options (like redirecting output), as follows:

$ tools/heroic-shell <heroic-options> -- com.spotify.heroic.shell.task.<task-name> <task-options>

There are also profiles that can be activated with the -P <profile> switch; available profiles are listed in --help.
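For example, activating a profile (the profile name here is hypothetical; list the real ones with --help):

$ tools/heroic-shell -P memory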

Repackaged Dependencies

These are third-party dependencies that have to be repackaged to avoid binary incompatibilities with other dependencies.

Every time these are upgraded, they must be inspected for new conflicts. The easiest way to do this is to build the project and look at the warnings for the shaded jar.

$> mvn clean package -D maven.test.skip=true
...
[WARNING] foo-3.5.jar, foo-4.5.jar define 10 overlapping classes:
[WARNING]   - com.foo.ConflictingClass
...

This would indicate that there is a package called foo with overlapping classes.

You can find the culprit using the dependency plugin.

$> mvn package dependency:tree
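
To narrow the output to the offending group, the dependency plugin's includes filter can help (com.foo is the hypothetical group from the warning above):

$> mvn dependency:tree -D includes=com.foo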
