Is there a way to delete some records based on a select query?
I have this query,
Select min(id) from ID having count(*)>1
which will show the duplicates. I need to get those ids and delete them. How can I do this in Spark SQL?
Spark SQL Cassandra delete records
by John Doe
Spark SQL does not support DELETE.
If the number of ids to delete is small, you can delete them with the DataStax Cassandra Java driver directly instead of going through Spark:
// Uses the DataStax Java driver 3.x API (com.datastax.driver.core)
import scala.collection.JavaConverters._
import com.datastax.driver.core.{BatchStatement, Cluster}
import com.datastax.driver.core.querybuilder.QueryBuilder

val cluster = Cluster.builder().addContactPoint(host_ip).build()
val session = cluster.connect(keyspace)

val idsToDelete = ... // perform your query and collect the ids

// Build one "DELETE FROM keyspace.table WHERE id = ?" statement per id
val queries = idsToDelete.toSeq.map { id =>
  QueryBuilder.delete().from(keyspace, table).where(QueryBuilder.eq("id", id))
}

// Send all deletes in a single batch (only sensible for a small number of ids)
val batch = new BatchStatement().addAll(queries.asJava)
session.execute(batch)

cluster.close()
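Note that the query in the question has no GROUP BY, so as written it treats the whole table as a single group and returns at most one id. Presumably a GROUP BY on whatever column defines a duplicate is intended. The stdlib-only sketch below illustrates what that grouped query computes, i.e. the ids that would end up in `idsToDelete`. The `Record` type and its `key` column are hypothetical stand-ins for your actual table and duplicate-defining column; in practice you would run the SQL through Spark and collect the result instead.

```scala
// Hypothetical row type: `key` stands in for the column that defines a duplicate.
final case class Record(id: Int, key: String)

object DuplicateIds {
  // Mirrors: SELECT min(id) FROM t GROUP BY key HAVING count(*) > 1
  def minIdPerDuplicateGroup(rows: Seq[Record]): List[Int] =
    rows
      .groupBy(_.key)                    // GROUP BY key
      .values
      .filter(_.size > 1)                // HAVING count(*) > 1
      .map(group => group.map(_.id).min) // min(id) within each duplicate group
      .toList
      .sorted

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Record(1, "a"), Record(2, "a"),
      Record(3, "b"),
      Record(4, "c"), Record(5, "c"), Record(6, "c")
    )
    println(minIdPerDuplicateGroup(rows)) // prints List(1, 4)
  }
}
```

Keep in mind that deleting min(id) removes only one row per duplicate group; a group with three or more rows will still contain duplicates afterwards.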
