spark.sql (hive) schema doesn't match Cassandra schema

Author: ChiMo

So I'm trying to run a simple SELECT statement through spark.sql, but it fails with an error even though the column clearly exists in the Cassandra table:

// Spark ------------------------------------    
spark.sql("SELECT value2 FROM myschema.mytable").show()

>> org.apache.spark.sql.AnalysisException: cannot resolve '`value2`' 
   given input columns: [key, value1]

// Cassandra --------------------------------
DESCRIBE myschema.mytable;

>> CREATE TABLE myschema.mytable (
>>    key int,
>>    value1 text,
>>    value2 text,
>>    PRIMARY KEY (key)
>> ) WITH ...;

I assume Hive just isn't synced properly, but running a table refresh command does NOT work:

spark.sql("REFRESH TABLE myschema.mytable")
See https://spark.apache.org/docs/2.1.2/sql-programming-guide.html#metadata-refreshing
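
For reference, the catalog API variants look like this (a minimal sketch; I'd expect them to report and behave the same as the SQL statement, since they go through the same Hive metastore):

// Programmatic equivalents of REFRESH TABLE -- assumes an existing
// SparkSession `spark` backed by the Hive metastore.
spark.catalog.refreshTable("myschema.mytable")

// One way to inspect which columns the catalog currently reports:
spark.catalog.listColumns("myschema", "mytable").show()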

The only way I could get it to properly refresh was to (rough code sketch after the list):

  1. Move all data out of the table
  2. Drop the table
  3. Delete the hive metadata row
    DELETE FROM "HiveMetaStore".sparkmetastore WHERE key='_2_myschema' AND entity='org.apache.hadoop.hive.metastore.api.Table::mytable';
  4. Recreate table
  5. Copy all data back
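
A rough sketch of how the data-staging steps (1 and 5) can be scripted with the Spark Cassandra connector; the backup path and table names are examples only, this assumes spark-cassandra-connector is on the classpath, and steps 2-4 are plain CQL / metastore operations run outside Spark:

// Sketch of steps 1 and 5 -- assumes an existing SparkSession `spark`.

// 1. Move all data out of the table (stage it somewhere durable, e.g. Parquet)
spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "myschema", "table" -> "mytable"))
  .load()
  .write.parquet("/tmp/mytable_backup")

// 2-4. Drop the table, delete the stale HiveMetaStore row, recreate the table
//      (done in cqlsh as shown above, not through Spark)

// 5. Copy all data back into the recreated table
spark.read.parquet("/tmp/mytable_backup")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "myschema", "table" -> "mytable"))
  .mode("append")
  .save()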

Surely there is a better way?

Originally sourced from: https://stackoverflow.com/questions/57104200/spark-sql-hive-schema-doesnt-match-cassandra-schema