Spark 4.1.1 on AKS + Cosmos DB Cassandra API: ClassNotFound without connector, ClosedConnectionException with spark-cassandra-connector_2.13-3.5.1
Author: akshay kadam
Originally Sourced from: https://stackoverflow.com/questions/79904052/spark-4-1-1-on-aks-cosmos-db-cassandra-api-classnotfound-without-connector-c
We are upgrading a Spark job running on AKS (Kubernetes) from Spark 3.5.3 to Spark 4.1.1.
Current working setup (Spark 3.5.3):
Connector:
com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.5.1
Target: Azure Cosmos DB Cassandra API
Reads/writes using the Cassandra data source (org.apache.spark.sql.cassandra)
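For reference, the connection settings the working 3.5.3 job passes to the connector look roughly like this (account name and key are placeholders; Cosmos DB's Cassandra API endpoint uses port 10350 with SSL):

```
spark.cassandra.connection.host         <account>.cassandra.cosmos.azure.com
spark.cassandra.connection.port         10350
spark.cassandra.connection.ssl.enabled  true
spark.cassandra.auth.username           <account>
spark.cassandra.auth.password           <primary-key>
```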
After upgrading to Spark 4.1.1 (on AKS), we see:
Case A – no connector jar provided
SparkClassNotFoundException when calling .format("org.apache.spark.sql.cassandra")
Case B – with connector jar spark-cassandra-connector-assembly_2.13:3.5.1
- ClassNotFound is resolved, but the connection fails during read/write with java.io.IOException / ClosedConnectionException
Question:
What is the correct connector/artifact and configuration for Spark 4.1.1 on AKS to read/write Cosmos DB Cassandra API?
Are there known changes needed?
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "<ks>", "table" -> "<tbl>"))
  .load()
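The write side follows the same pattern; a minimal sketch of what our job does, with the same placeholder keyspace/table (df is the DataFrame to persist):

```scala
// Sketch of the write path; fails with ClosedConnectionException on 4.1.1,
// same as the read shown above.
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "<ks>", "table" -> "<tbl>"))
  .mode("append")
  .save()
```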
Environment:
AKS + Spark on Kubernetes (spark-submit)
Spark: 3.5.3 (works) → 4.1.1 (fails)
Connector tried:
spark-cassandra-connector-assembly_2.13:3.5.1
Target: Azure Cosmos DB Cassandra API
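The failing submission looks roughly like this (a sketch: the AKS API server address, image, jar path, and account values are placeholders, not our exact command):

```shell
# Hypothetical spark-submit against Spark 4.1.1 on AKS; placeholders throughout.
spark-submit \
  --master k8s://https://<aks-api-server>:443 \
  --deploy-mode cluster \
  --packages com.datastax.spark:spark-cassandra-connector_2.13:3.5.1 \
  --conf spark.cassandra.connection.host=<account>.cassandra.cosmos.azure.com \
  --conf spark.cassandra.connection.port=10350 \
  --conf spark.cassandra.connection.ssl.enabled=true \
  --conf spark.cassandra.auth.username=<account> \
  --conf spark.cassandra.auth.password=<primary-key> \
  local:///opt/spark/app/<job>.jar
```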