Is there a way to delete some records based on a select query?
I have this query,
Select min(id) from ID having count(*)>1
which shows the duplicates. I need to get those ids and delete them. How can I do this in Spark SQL?
10/15/2018
Spark SQL Cassandra delete records
by John Doe
Asked Apr 28 '16 at 4:48 by ashK. Answered Oct 27 '16 at 15:18 by Didier (edited Oct 27 '16 at 16:34).
Spark SQL does not support DELETE.
If the number of ids to delete is small, you can do it using the Cassandra driver instead of through Spark:
    import scala.collection.JavaConverters._
    import com.datastax.driver.core.{BatchStatement, Cluster}
    import com.datastax.driver.core.querybuilder.QueryBuilder

    // host_ip, keyspace and table are placeholders for your own values
    val cluster = Cluster.builder().addContactPoint(host_ip).build()
    val session = cluster.connect(keyspace)

    val idsToDelete = ... // perform your query and collect the ids

    // Build one DELETE statement per id
    val queries = idsToDelete.map { id =>
      QueryBuilder.delete().from(keyspace, table).where(QueryBuilder.eq("id", id))
    }

    // Send all deletes in a single batch, then release the connection
    val batch = new BatchStatement().addAll(queries.asJava)
    session.execute(batch)
    cluster.close()
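The answer leaves the step that produces `idsToDelete` open. As a rough sketch only, assuming the rows have already been collected to the driver and that duplicates are defined by a hypothetical `name` column (the question's query does not say what it groups by), the ids could be picked with plain Scala collections. `Record`, `DuplicateIds`, `minIdPerDuplicateGroup` and `chunk` are illustrative names, not part of Spark or the Cassandra driver.

```scala
// Hypothetical row shape: `name` stands in for whatever column defines a duplicate.
final case class Record(id: Long, name: String)

object DuplicateIds {
  // For each group of records that share a name and contains more than one
  // record, return the smallest id (the min(id) ... having count(*) > 1 idea).
  def minIdPerDuplicateGroup(records: Seq[Record]): Seq[Long] =
    records
      .groupBy(_.name)          // name -> all records with that name
      .values
      .filter(_.size > 1)       // keep only groups that contain duplicates
      .map(_.map(_.id).min)     // smallest id in each duplicated group
      .toList
      .sorted

  // Split the ids into chunks of at most `size`, so that each chunk can be
  // sent as its own small batch instead of one very large one.
  def chunk(ids: Seq[Long], size: Int): Seq[Seq[Long]] =
    ids.grouped(size).toList
}
```

Each chunk returned by `chunk` could then be mapped to delete statements and executed as its own batch, which keeps to the answer's caveat that this approach suits a small number of ids.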