Scala: get data from Scylla using Spark


Author: ahajib

Originally Sourced from: https://stackoverflow.com/questions/67204889/scala-get-data-from-scylla-using-spark

Scala/Spark newbie here. I have inherited some old code, which I have refactored and am now trying to use to retrieve data from Scylla. The code looks like this:

val TEST_QUERY = s"SELECT user_id FROM test_table WHERE name = ? AND id_type = 'test_type';"

var selectData = List[Row]()
dataRdd.foreachPartition {
  iter => {
    // Build up a cluster that we can connect to
    // Start a session with the cluster by connecting to it.
    val cluster = ScyllaConnector.getCluster(clusterIpString, scyllaPreferredDc, scyllaUsername, scyllaPassword)
    var batchCounter = 0

    val session = cluster.connect(tableConfig.keySpace)
    val preparedStatement: PreparedStatement = session.prepare(TEST_QUERY)

    iter.foreach {
      case (test_name: String) => {
        // Get results
        val testResults = session.execute(preparedStatement.bind(test_name))
        if (testResults != null){
          val testResult = testResults.one()
          if(testResult != null){
            val user_id = testResult.getString("user_id")
            selectData ::= Row(user_id, test_name)
          }
        }
      }
    }
    session.close()
    cluster.close()
  }
}

println("Head is =======> ")
println(selectData.head)

The above does not return any data and fails with a null pointer exception because the selectData list is empty, even though the table definitely contains data that matches the SELECT statement. I suspect my approach is wrong, but I can't figure out what needs to change to fix it, so any help is much appreciated.
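A likely explanation, assuming dataRdd is an ordinary Spark RDD: the function passed to foreachPartition is serialized and run on the executors, so each task appends to its own deserialized copy of the list, and the copy on the driver is never updated. The sketch below tries to reproduce that without Scylla. The object name ClosureDemo, the local[2] master and the sample values are all made up for illustration. Spark does not guarantee what the mutated list contains in local mode, so no output is claimed for it.

```scala
import org.apache.spark.sql.SparkSession

object ClosureDemo {
  // Returns (list mutated inside foreachPartition, values returned via collect).
  def run(spark: SparkSession): (List[String], List[String]) = {
    val names = spark.sparkContext.parallelize(Seq("a", "b", "c"), 2)

    // Same pattern as in the question: a driver-side var mutated in the closure.
    // The closure is serialized per task, so the driver's copy is not reliably updated.
    var sideEffect = List[String]()
    names.foreachPartition { iter =>
      iter.foreach(n => sideEffect ::= n)
    }

    // Values that are *returned* from a transformation do travel back to the driver.
    val returned = names.map(_.toUpperCase).collect().toList

    (sideEffect, returned)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("closure-demo")
      .getOrCreate()
    try {
      val (sideEffect, returned) = run(spark)
      println(s"mutated inside foreachPartition: $sideEffect")
      println(s"returned via collect:            $returned")
    } finally spark.stop()
  }
}
```

If that is indeed the cause, the results have to be returned from the partition function (or written somewhere from the executors) rather than appended to a variable that lives on the driver.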

PS: The reason I keep the results in a list is so that I can use that list to create a DataFrame. I'd be grateful if you could point me in the right direction here.
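One possible way to reach a DataFrame without a driver-side list is to return the rows from mapPartitions and pass the resulting RDD to createDataFrame. The following is only a sketch under several assumptions: the input is an RDD[String], the classes come from the DataStax Java driver 3.x (com.datastax.driver.core), and user_id is a text column. The names ScyllaLookup, lookupUserIds and openCluster are hypothetical; openCluster stands in for however the cluster is built in the question (for example a function wrapping ScyllaConnector.getCluster) and must be serializable.

```scala
import com.datastax.driver.core.Cluster
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ScyllaLookup {
  // Same query as in the question (no string interpolation is needed).
  val TestQuery: String =
    "SELECT user_id FROM test_table WHERE name = ? AND id_type = 'test_type';"

  // Schema of the resulting DataFrame; column names are illustrative.
  val Schema: StructType = StructType(Seq(
    StructField("user_id", StringType, nullable = true),
    StructField("name", StringType, nullable = false)
  ))

  // Hypothetical helper: looks up one user_id per name and returns a DataFrame.
  def lookupUserIds(spark: SparkSession, names: RDD[String], keyspace: String)
                   (openCluster: () => Cluster): DataFrame = {
    val query = TestQuery
    val rows: RDD[Row] = names.mapPartitions { iter =>
      // One cluster and session per partition, as in the original code.
      val cluster = openCluster()
      try {
        val session = cluster.connect(keyspace)
        try {
          val prepared = session.prepare(query)
          // Iterators are lazy, so materialize the results before the
          // session is closed in the finally block.
          iter.flatMap { name =>
            Option(session.execute(prepared.bind(name)).one())
              .map(r => Row(r.getString("user_id"), name))
          }.toList.iterator
        } finally session.close()
      } finally cluster.close()
    }
    spark.createDataFrame(rows, Schema)
  }
}
```

With the names from the question, a call might look like ScyllaLookup.lookupUserIds(spark, dataRdd, tableConfig.keySpace)(() => ScyllaConnector.getCluster(clusterIpString, scyllaPreferredDc, scyllaUsername, scyllaPassword)), where spark is assumed to be an existing SparkSession. Calling .toList keeps each partition's results in memory; for large partitions, a different cleanup strategy for the session would be needed.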