Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

11/8/2021

Reading time:6 min

The Journey of Building a Scalable API

by John Doe

Nitesh Kumar

Nov 1 · 7 min read

APIs are an essential tool to allow partners, developers, and applications to consume, communicate, or build on top of the various capabilities your microservices provide.

Building a high-quality API that can scale and perform with the business ecosystem is not easy. It requires thought and planning at every step, from choosing an execution environment to deciding which API technology to use.

So how did we do it? In this blog post, I will share my experience of building the API for Activity Platform at Salesforce as a guide to writing an API for your own needs. Activity Platform is a big data event processing engine that ingests and analyzes over 100 million customer interactions every day to automatically capture data and to generate insights, recommendations, and feeds. Activity Platform provides APIs to serve these to our clients.

Depending on the requirement, an execution environment could be bare metal, a virtual machine (VM), or an application container. We chose application containers, as these can run on a physical machine or in a VM, and a single operating system instance can support multiple containers, each running within its own, separate execution environment. In a nutshell, containers are lightweight, portable, fast, and easy to deploy and scale, so they are a natural fit for microservices.

If you decide to go with containers, like we did, container orchestration will help you automate the deployment, management, scaling, and networking of containers. There are many container orchestration tools to consider: Kubernetes, Apache Mesos or DC/OS (with Marathon), Amazon EKS, Google Kubernetes Engine (GKE), and others.

We use Nomad clusters from HashiCorp. Nomad is simple and lightweight, and it can orchestrate applications of any type, not just containers. It integrates seamlessly with Consul and Vault for service discovery and secrets management. You can describe the resources a task needs to execute, such as memory, network, and CPU, and specify the number of instances you need to scale your service horizontally.

To build an API, we chose GraphQL. If you haven’t heard of it, it is a popular alternative to other available options like REST, SOAP, Apache Thrift, OpenAPI/Swagger or gRPC.

Why did we choose GraphQL?

We wanted to build an API that can serve a variety of clients, from web to mobile apps. It needed to be efficient, powerful, and flexible.

GraphQL was the best fit for our needs for a few reasons:

1). GraphQL is database agnostic and can serve data from anywhere you want for your defined business domain. This means that underneath you can use Cassandra, Elasticsearch, or an existing API from other modules for a single query.

2). It allows clients to request exactly what they need, avoiding overfetching or underfetching. If an API returns more than what a client needs, there is a performance hit, and if it returns less, multiple network calls will slow the rendering time. GraphQL avoids both of these outcomes.

3). While most APIs do versioning, GraphQL serves a versionless API, as it only returns the data that’s explicitly requested, so new capabilities can be added via new types and new fields on those types without creating a breaking change.

4). GraphQL uses a strong type system in which all types are written in a schema using the GraphQL Schema Definition Language (SDL). The schema serves as the contract between the client and the server, with no confusion about the request/response structure.

5). GraphQL supports introspection, so the schema definition can easily be shared or downloaded using tools like GraphiQL, GraphQL Playground, or CLI tools.
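To make points 2 and 3 concrete, here is a minimal Kotlin sketch of the idea behind field selection. It is not how graphql-java works internally; `selectFields` and the sample record `storedInsight` are hypothetical names with made-up values, used only for illustration.

```kotlin
// Hypothetical stand-in for a stored record; the values are sample data,
// not the real Activity Platform model.
val storedInsight: Map<String, Any> = mapOf(
    "organizationId" to "org-1",
    "userId" to "user-42",
    "emailAddress" to "test1@gmail.com",
    "title" to "Engineering Manager"
)

// Return only the fields the client asked for, in the order requested.
// Unknown field names are rejected, much as a typed schema would reject them.
fun selectFields(record: Map<String, Any>, requested: List<String>): Map<String, Any> {
    val unknown = requested.filterNot { it in record }
    require(unknown.isEmpty()) { "Unknown fields: $unknown" }
    return requested.associateWith { record.getValue(it) }
}
```

Calling `selectFields(storedInsight, listOf("userId", "title"))` returns a map with just those two entries. Adding a new key to the record later does not change what existing callers receive, which is the intuition behind evolving the API without versions.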

We used GraphQL in our Classification Insight API. Classification Insight offers information about a user and helps meeting participants know the titles and roles of other people present at the meeting. For this API, we used Kotlin and graphql-java, a Java implementation of GraphQL.

Step 1: Define your schema (e.g. schema.graphqls). Every GraphQL service defines a set of types. The most basic components of a GraphQL schema are object types, which represent a kind of object you can fetch from your service. The Query type defines the entry point of every GraphQL query.

In the schema below, I have defined a query, "getClassificationInsightsByUser", which can be called later by posting this payload to your running API (e.g. localhost:8080/api):
{ getClassificationInsightsByUser(emailAddresses: ["test1@gmail.com", "test2@gmail.com"]) { userId, title } }
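Note that the servlet in Step 4 reads the query from a JSON field named "query", so the GraphQL text has to be wrapped in a JSON object, with its double quotes escaped, before it is posted. The following is a minimal sketch of that wrapping in Kotlin; `jsonEscape` and `toRequestBody` are hypothetical helper names, and in practice a JSON library would normally do this for you.

```kotlin
// Escape the characters that must not appear raw inside a JSON string.
fun jsonEscape(text: String): String = buildString {
    for (ch in text) {
        when {
            ch == '"' -> append("\\\"")
            ch == '\\' -> append("\\\\")
            ch == '\n' -> append("\\n")
            ch == '\r' -> append("\\r")
            ch == '\t' -> append("\\t")
            // Remaining control characters use the \uXXXX form.
            ch < ' ' -> append("\\u%04x".format(ch.code))
            else -> append(ch)
        }
    }
}

// Wrap a GraphQL query in the {"query": "..."} envelope the servlet expects.
fun toRequestBody(query: String): String =
    "{\"query\": \"${jsonEscape(query)}\"}"
```

The resulting string can be sent as the body of a POST request to the /api endpoint with any HTTP client.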

schema.graphqls

# Object type describing what you can fetch
type ClassificationInsightByUser {
  organizationId: ID!
  userId: String!
  emailAddress: String!
  title: String!
}

# Query type defining all your queries
type Query {
  getClassificationInsightsByUser(
    emailAddresses: [String!]!
  ): [ClassificationInsightByUser]
}

schema {
  query: Query
}

Step 2: Implement a DataFetcher (also known as a resolver) to resolve the field getClassificationInsightsByUser. A resolver is a function provided by the developer that resolves a field of a type defined in the schema and returns its value from a configured source such as a database, another API, or a cache.

In this example, our Query type provides a field called getClassificationInsightsByUser, which accepts the argument emailAddresses. The resolver function for this field would typically access a database and then construct and return a list of ClassificationInsightByUser objects.

import graphql.schema.DataFetcher
import graphql.schema.DataFetchingEnvironment

// Assuming you already have your data class
// (e.g. ClassificationInsightByUser) defined to hold the data

// Name of the query argument declared in schema.graphqls
private const val EMAIL_ADDRESSES = "emailAddresses"

// Write your data fetcher class
class ClassificationInsightByUserDataFetcher :
    DataFetcher<List<ClassificationInsightByUser>?> {

    // Override DataFetcher's get function.
    override fun get(env: DataFetchingEnvironment): List<ClassificationInsightByUser>? {
        // Get the argument passed in the submitted query
        val emailAddresses = env.getArgument<List<String>>(EMAIL_ADDRESSES) ?: emptyList()
        // Write logic to get data from another API, or
        // from your business layer by calling your controller/service.
        // Here, we just return static data to keep it simple.
        return EntityData.getClassificationInsightByUser(emailAddresses)
    }
}
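The data fetcher above refers to ClassificationInsightByUser and EntityData without showing them. A minimal sketch of what they might look like follows; the in-memory list, the sample values, and the case-insensitive email match are assumptions made for illustration, not the actual Activity Platform data source.

```kotlin
// Mirrors the ClassificationInsightByUser type in schema.graphqls.
data class ClassificationInsightByUser(
    val organizationId: String,
    val userId: String,
    val emailAddress: String,
    val title: String
)

// Hypothetical in-memory data source standing in for a database or another API.
object EntityData {
    // Made-up sample rows.
    private val insights = listOf(
        ClassificationInsightByUser("org-1", "user-1", "test1@gmail.com", "Account Executive"),
        ClassificationInsightByUser("org-1", "user-2", "test2@gmail.com", "Solutions Engineer")
    )

    // Return the insights whose email address is in the requested list.
    fun getClassificationInsightByUser(emailAddresses: List<String>): List<ClassificationInsightByUser> {
        val wanted = emailAddresses.map { it.lowercase() }.toSet()
        return insights.filter { it.emailAddress.lowercase() in wanted }
    }
}
```

Because the property names match the field names in the schema, only the top-level query field needs a hand-written data fetcher in this example; the nested fields can be read from the object's properties.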

Step 3: Initialize GraphQLSchema and GraphQL Object (using graphql-java) to help execute the query.

import graphql.GraphQL
import graphql.schema.idl.RuntimeWiring
import graphql.schema.idl.SchemaGenerator
import graphql.schema.idl.SchemaParser
import graphql.schema.idl.TypeDefinitionRegistry

// Load all your schema files as strings using your own utility function
val schema: String = getResourceFileAsString("schema.graphqls")

// Create the type registry from all your schema files
val schemaParser = SchemaParser()
val typeDefinitionRegistry = TypeDefinitionRegistry()
typeDefinitionRegistry.merge(schemaParser.parse(schema))

// Runtime wiring, where you wire your Query type to its resolver
val runtimeWiring = RuntimeWiring.newRuntimeWiring()
    .type("Query") { builder ->
        builder.dataFetcher(
            "getClassificationInsightsByUser",
            ClassificationInsightByUserDataFetcher()
        )
    }
    .build()

// Create the GraphQL schema
val schemaGenerator = SchemaGenerator()
val graphQLSchema = schemaGenerator
    .makeExecutableSchema(typeDefinitionRegistry, runtimeWiring)

// Create the GraphQL object
val graphQL = GraphQL.newGraphQL(graphQLSchema).build()

Step 4: Write a servlet (MyAppServlet) to handle incoming requests.

override fun doPost(req: HttpServletRequest, resp: HttpServletResponse) {
    // Read the JSON request body, e.g. {"query": "{ ... }"}
    val payloadString = req.reader.readText()
    val jsonRequest = JSONObject(payloadString)
    val executionInput = ExecutionInput.newExecutionInput()
        .query(jsonRequest.getString("query"))
        .build()
    // Execute your query using graphQL.
    // It will call your resolvers to get the data
    // and only return the data that was requested.
    val executionResult = graphQL.execute(executionInput)

    // Send the response. `mapper` is assumed to be a Jackson ObjectMapper
    // held by the servlet.
    resp.characterEncoding = "UTF-8"
    resp.writer.println(mapper.writeValueAsString(executionResult.toSpecification()))
    resp.writer.close()
}

Step 5: Embed the web server (Jetty in this case) in your application.

// The server
val server = Server()

// HTTP connector; use HTTPS in production
val http = ServerConnector(server)
http.host = "localhost"
http.port = 8080
http.idleTimeout = 30000
// Register the connector so the server actually listens on it
server.addConnector(http)

// Set up the handler
val servletContextHandler = ServletContextHandler()
servletContextHandler.contextPath = "/"
servletContextHandler.addServlet(ServletHolder(MyAppServlet()), "/api")
server.handler = servletContextHandler

// Start the Jetty server and listen for requests
server.start()
server.join()

Step 6: Build and start your application. Use your CI/CD tool to create, publish, and deploy your Docker images to your cluster.

At Salesforce, security is our top priority. Our APIs are accessible only to registered users, who can access only the data they have permissions for. You may want to explore OAuth 2.0 (JWT grant type and role-based access control) and Open Policy Agent (OPA) for your access control needs.

As a best practice, place your authentication middleware before GraphQL, and keep a single source of truth for authorization in the business logic layer so that you don't need to check permissions in multiple places. In addition to authentication and authorization, consider rate limiting, data masking, and payload scanning when designing your API.
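One way to picture "authentication before GraphQL" is a small gate that rejects unauthenticated requests before any query is executed. The sketch below is illustrative only: `Principal`, `GatewayResponse`, `bearerToken`, and `handleRequest` are hypothetical names, and token verification is left as an injected function rather than a real OAuth 2.0 or JWT check.

```kotlin
// Hypothetical authenticated caller; a real system would derive this from a verified token.
data class Principal(val userId: String, val roles: Set<String>)

// Result of handling a request: an HTTP-style status plus a body.
data class GatewayResponse(val status: Int, val body: String)

// Extract the bearer token from an Authorization header value,
// or return null if the header is absent or malformed.
fun bearerToken(authorizationHeader: String?): String? {
    val prefix = "Bearer "
    if (authorizationHeader == null || !authorizationHeader.startsWith(prefix)) return null
    return authorizationHeader.removePrefix(prefix).trim().ifEmpty { null }
}

// Authenticate first; only authenticated requests reach the GraphQL executor.
// `verifyToken` and `executeQuery` are injected so that this gate stays independent
// of the token format and of the GraphQL engine.
fun handleRequest(
    authorizationHeader: String?,
    query: String,
    verifyToken: (String) -> Principal?,
    executeQuery: (Principal, String) -> String
): GatewayResponse {
    val token = bearerToken(authorizationHeader)
        ?: return GatewayResponse(401, "missing bearer token")
    val principal = verifyToken(token)
        ?: return GatewayResponse(401, "invalid token")
    return GatewayResponse(200, executeQuery(principal, query))
}
```

The executor receives the Principal, so the business logic layer can make authorization decisions in one place instead of repeating checks in each resolver.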

We have demonstrated how to build a scalable, efficient, secure API. We used application containers to scale, GraphQL and embedded Jetty to make it efficient and lightweight, and prioritized the security aspects of our API. We will discuss other aspects of API development, such as security and deployment, in more detail in upcoming posts.

© 2023 Anant Corporation

Apache, the Apache feather logo, Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation.