Cassandra.Link

4/25/2021

The premier open source Data Quality solution

by John Doe

What is DataCleaner?

Data profiling

The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values.

Interrogating and profiling your data is an essential activity of any Data Quality, Master Data Management or Data Governance program. If you don’t know what you’re up against, you have poor chances of fixing it.
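To make the idea of profiling concrete, here is a minimal sketch in plain Python. It is not DataCleaner's API or implementation; the function names `pattern_of` and `profile_column` and the pattern alphabet (`A`, `a`, `9`) are assumptions chosen for illustration. It counts missing values, groups the remaining values by a coarse character pattern, and flags values that contain non-ASCII characters.

```python
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to a coarse pattern: uppercase -> 'A', lowercase -> 'a', digit -> '9'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values):
    """Return simple quality statistics for one column (hypothetical helper)."""
    present = [str(v) for v in values if v is not None and str(v).strip() != ""]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "patterns": Counter(pattern_of(v) for v in present),
        "non_ascii": sum(1 for v in present if not v.isascii()),
    }


if __name__ == "__main__":
    sample = ["AB-123", "CD-456", "", None, "ef-7", "Zoë-1"]
    report = profile_column(sample)
    print(report["missing"], report["non_ascii"])
    print(report["patterns"].most_common())
```

A skewed pattern distribution (one dominant pattern plus a few outliers) is usually the first hint of where cleansing effort is needed.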

Data wrangling

DataCleaner is built to handle data both big and small. Give everything from CSV files and Excel spreadsheets to relational databases (RDBMS) and NoSQL databases a spin!

Use reference data, external and internal, to verify that the data values you have correspond to the real world. DataCleaner allows you to build your own cleansing rules and combine them for different usage scenarios or target databases. Whether it is simple search/replace rules, regular expressions, pattern matching or completely custom transformations, it’s all possible.
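The rule types mentioned above can be sketched as small composable functions. This is a generic Python illustration, not DataCleaner's own rule format or API; the synonym table, the reference set and the names `cleanse` and `is_valid` are hypothetical. It chains a regular-expression rule, a case rule and a search/replace rule, then checks the result against reference data.

```python
import re

# Hypothetical search/replace table and reference data, for illustration only.
COUNTRY_SYNONYMS = {"UK": "GB", "U.K.": "GB", "USA": "US"}
VALID_COUNTRIES = {"GB", "US", "DK"}


def normalize_whitespace(value: str) -> str:
    """Regular-expression rule: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def replace_synonym(value: str) -> str:
    """Search/replace rule: map known synonyms to their canonical code."""
    return COUNTRY_SYNONYMS.get(value, value)


def cleanse(value: str, rules) -> str:
    """Apply the rules in order; each rule is a function from str to str."""
    for rule in rules:
        value = rule(value)
    return value


def is_valid(value: str, reference) -> bool:
    """Reference-data check: the cleansed value must exist in the reference set."""
    return value in reference


if __name__ == "__main__":
    rules = [normalize_whitespace, str.upper, replace_synonym]
    for raw in ["  UK ", "dk", "Mars"]:
        cleaned = cleanse(raw, rules)
        print(repr(raw), cleaned, is_valid(cleaned, VALID_COUNTRIES))
```

Keeping each rule as an independent function means the same rules can be reordered or reused for a different scenario or target database without rewriting them.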

A Data Quality eco-system

Pluggability and connectivity are keywords for the open source design philosophy of DataCleaner. The application not only delivers out-of-the-box functionality, but also hosts an eco-system of community-driven extensions, integrations, shared content and more.

Developers can embed DataCleaner into other applications, build plug-ins for a specific use case, or use adaptors that make DataCleaner work with Apache Hadoop and Apache Spark. Other prominent integrations include Pentaho Data Integration, as well as support for custom data source definitions via the Apache MetaModel framework.
