Illustration Image

Cassandra.Link

The best knowledge base on Apache Cassandra®

Helping platform leaders, architects, engineers, and operators build scalable real time data platforms.

10/13/2021

Reading time:4 min

How to Mask Cassandra with IRI FieldShield - IRI

by John Doe

How to Mask Cassandra with IRI FieldShield by Nathan Dymora UPDATE: Q2’17: Included JDBC and ODBC drivers, and pending native JSON handling can make the connections faster and more seamless than the approach shown below. You can also use the Hadoop edition of IRI Voracity to mask data in HDFS directly. Data exported in the manner below to a flat-file in HDFS can be masked at the field level from the FieldShield module Voracity in the same way shown below via MapReduce2, Spark, Spark Stream, Storm or Tez without additional coding.UPDATE: Q3’19: Now available in IRI DarkShield are native out-of-the-box connectors for Cassandra and MongoDB that do not require export/mask/import steps, and can handle semi- and un-structured collection data. Like FieldShield, DarkShield is also available (included) in subscriptions to the Voracity platform. See this how-to-article for finding and masking PII floating in Cassandra NoSQL DBs. If you are interested in populating Cassandra with synthetic test data, see this article.DataStax / Cassandra cannot mask specific columns to stop unauthorized users from seeing personally identifiable information (PII). In this article, we will show how to export data from a Cassandra database to a CSV file, and use the IRI FieldShield product (also available in the IRI Voracity platform) to randomize, mask, and encrypt the data in that CSV file. Then we will import that data back into Cassandra, updated and protected. Another article is planned upon release of IRI’s more direct connection to the DB.In our example we have a table with student information in DataStax. The columns are: id_num (the primary key), honors, semester_hours, socsecnum (social security number), and user_id.To de-identify each student, we will randomize their semester_hours, mask their socsecnum, and encrypt the user_id value so that each student’s PII is protected.Unprotected data in CassandraFor our command line examples, we are using the DataStax Cassandra CQL shell.Exporting a Cassandra table to a CSV fileTo export a Cassandra table to a CSV file, run the following command from the Cassandra CQL shell:COPY <Exporting Table> TO '<File Path to the CSV you exporting to>' WITH HEADER=TRUE;Opening and Defining the CSV SourceFrom the top toolbar of the IRI Workbench GUI for FieldShield (built on Eclipse), find the FieldShield icon and select ‘New Protect Job’ to run the wizard. Name your job file, “CassShieldJob.fcl” and click ‘Next’.Click ‘Add Data Source’, and then ‘Browse’ to the CSV file you created with your export. Then click ‘OK’.Click ‘Discover Metadata’ and name the (re-usable data definition file) “StudentsMeta.ddf” and click ‘Next’. Click ‘Yes’ to auto-detect and “Get field names from header row’ before clicking ‘OK’, then ‘Finish’.You should now see your fields in the Data Sources list. Click ‘Next’.Applying the ProtectionsClick ‘Add Data Target’ and browse to create a new CSV file target to hold the protected version of the data (or to overwrite the source, use the same path/file. Then click ‘Target Field Layout’, to review the layout and apply the protections to the output fields. There are two list boxes; the top shows the source fields, and the bottom shows the target fields to which you will apply the field “shields.”Randomize semester_hours:  Select the semester_hours field name in the target pane and click ‘Field Protection.” Select Random Value Generation, and for type, select Digit. Set the random minimum and maximum sizes to 2 and click OK.Mask socsecnum: Select the socsecnum field and click ‘Field Protection’ again. Select Masking, and choose the default USA SSN option to expose only the last four digits of the SSNs. Click ‘OK’.Encrypt user_id: Select the user_id and  click ‘Field Protection’ again. Select Encryption and Decryption and choose enc_fp_aes256_alphanum for format preserving encryption. Enter a passphrase (the decryption key) and in the Exclude box type “User_” to prevent the User_ part of our username from being encrypted, Then click ‘Finish’.The fields to be protected should appear as follows:Click ‘OK’, and then ‘Finish’ to see the actual (executable job) script created for you:Run the job in the IRI Workbench or on the command line,fieldshield /spec=CassShieldJob.fclThe CSV file you created will contain the new masked and encrypted data.Importing the protected data into CassandraTo import the changed data back into Cassandra, use the following command in the Cassandra CQL shell:COPY <Table you are importing data to> (field1fromCSV, field2fromCSV, ...) FROM '<Path to CSV>';Protected data in CassandraFor details on FieldShield capabilities, see http://www.iri.com/products/fieldshield/technical-details.

Illustration Image

How to Mask Cassandra with IRI FieldShield

UPDATE: Q2’17: Included JDBC and ODBC drivers, and pending native JSON handling can make the connections faster and more seamless than the approach shown below. You can also use the Hadoop edition of IRI Voracity to mask data in HDFS directly. Data exported in the manner below to a flat-file in HDFS can be masked at the field level from the FieldShield module Voracity in the same way shown below via MapReduce2, Spark, Spark Stream, Storm or Tez without additional coding.

UPDATE: Q3’19: Now available in IRI DarkShield are native out-of-the-box connectors for Cassandra and MongoDB that do not require export/mask/import steps, and can handle semi- and un-structured collection data. Like FieldShield, DarkShield is also available (included) in subscriptions to the Voracity platform. See this how-to-article for finding and masking PII floating in Cassandra NoSQL DBs. If you are interested in populating Cassandra with synthetic test data, see this article.

DataStax / Cassandra cannot mask specific columns to stop unauthorized users from seeing personally identifiable information (PII). In this article, we will show how to export data from a Cassandra database to a CSV file, and use the IRI FieldShield product (also available in the IRI Voracity platform) to randomize, mask, and encrypt the data in that CSV file. Then we will import that data back into Cassandra, updated and protected. Another article is planned upon release of IRI’s more direct connection to the DB.

In our example we have a table with student information in DataStax. The columns are: id_num (the primary key), honors, semester_hours, socsecnum (social security number), and user_id.

To de-identify each student, we will randomize their semester_hours, mask their socsecnum, and encrypt the user_id value so that each student’s PII is protected.

Unprotected data in Cassandra

Cassandra_FieldShield_InCassBefore

For our command line examples, we are using the DataStax Cassandra CQL shell.

Exporting a Cassandra table to a CSV file

To export a Cassandra table to a CSV file, run the following command from the Cassandra CQL shell:

COPY <Exporting Table> TO '<File Path to the CSV you exporting to>' WITH HEADER=TRUE;

Cassandra_FieldShield_CassandraExport

Opening and Defining the CSV Source

  1. From the top toolbar of the IRI Workbench GUI for FieldShield (built on Eclipse), find the FieldShield icon and select ‘New Protect Job’ to run the wizard. Name your job file, “CassShieldJob.fcl” and click ‘Next’.
  2. Click ‘Add Data Source’, and then ‘Browse’ to the CSV file you created with your export. Then click ‘OK’.
  3. Click ‘Discover Metadata’ and name the (re-usable data definition file) “StudentsMeta.ddf” and click ‘Next’. Click ‘Yes’ to auto-detect and “Get field names from header row’ before clicking ‘OK’, then ‘Finish’.
  4. You should now see your fields in the Data Sources list. Click ‘Next’.

Cassandra_FieldShield_DetectLayout

Applying the Protections

  1. Click ‘Add Data Target’ and browse to create a new CSV file target to hold the protected version of the data (or to overwrite the source, use the same path/file. Then click ‘Target Field Layout’, to review the layout and apply the protections to the output fields. There are two list boxes; the top shows the source fields, and the bottom shows the target fields to which you will apply the field “shields.”
  2. Randomize semester_hours:  Select the semester_hours field name in the target pane and click ‘Field Protection.” Select Random Value Generation, and for type, select Digit. Set the random minimum and maximum sizes to 2 and click OK.
  3. Mask socsecnum: Select the socsecnum field and click ‘Field Protection’ again. Select Masking, and choose the default USA SSN option to expose only the last four digits of the SSNs. Click ‘OK’.
  4. Encrypt user_id: Select the user_id and  click ‘Field Protection’ again. Select Encryption and Decryption and choose enc_fp_aes256_alphanum for format preserving encryption. Enter a passphrase (the decryption key) and in the Exclude box type “User_” to prevent the User_ part of our username from being encrypted, Then click ‘Finish’.
  5. The fields to be protected should appear as follows:

Cassandra_FieldShield_EncryptDataWindow

Click ‘OK’, and then ‘Finish’ to see the actual (executable job) script created for you:

Cassandra_FieldShield_FinalScript

Run the job in the IRI Workbench or on the command line,

fieldshield /spec=CassShieldJob.fcl

The CSV file you created will contain the new masked and encrypted data.

Importing the protected data into Cassandra

To import the changed data back into Cassandra, use the following command in the Cassandra CQL shell:

COPY <Table you are importing data to> (field1fromCSV, field2fromCSV, ...) FROM '<Path to CSV>';

Cassandra_FieldShield_CassandraImport

Protected data in Cassandra

Cassandra_FieldShield_InCassAfter

For details on FieldShield capabilities, see http://www.iri.com/products/fieldshield/technical-details.

Related Articles

cluster
troubleshooting
datastax

GitHub - arodrime/Montecristo: Datastax Cluster Health Check Tooling

arodrime

4/3/2024

Checkout Planet Cassandra

Claim Your Free Planet Cassandra Contributor T-shirt!

Make your contribution and score a FREE Planet Cassandra Contributor T-Shirt! 
We value our incredible Cassandra community, and we want to express our gratitude by sending an exclusive Planet Cassandra Contributor T-Shirt you can wear with pride.

Join Our Newsletter!

Sign up below to receive email updates and see what's going on with our company

Explore Related Topics

AllKafkaSparkScyllaSStableKubernetesApiGithubGraphQl

Explore Further

cassandra