At a very high level, the following characteristics define the quality of data:
Completeness: how complete the data is
Accuracy: whether it is accurate and reliable
Timeliness: whether it is available when needed and up to date
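Here is a minimal Python sketch of the first and third characteristics. It scores completeness as the share of required field values that are present, and it checks timeliness against a maximum allowed age. The function names `completeness` and `is_timely`, the sample records, and the choice to treat `None` and empty strings as missing are all illustrative assumptions. Accuracy is left out because it usually requires comparison against a trusted reference source.

```python
from datetime import datetime, timedelta, timezone


def completeness(records, required_fields):
    """Fraction of required field values that are present (hypothetical helper).

    A value counts as missing if it is None or an empty string (an assumption).
    """
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def is_timely(last_updated, max_age, now=None):
    """True if the data was refreshed no more than max_age before now (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age


records = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "DE"},
    {"id": 3, "email": "c@example.com", "country": None},
]
print(completeness(records, ["id", "email", "country"]))  # 7 of 9 values present

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(is_timely(datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(hours=36), now=now))
```

Which fields count as required and what `max_age` is acceptable depend on how the data is used. That is one reason organizations weigh these characteristics differently.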
Organizations prioritize these requirements differently, depending on their needs, how the data is used, and the life cycle of the processes that consume it.
Data is an important asset used to make crucial decisions. If business-critical decision-making processes rely on data that is inherently poor in quality, the problems ripple through every process that consumes it. By the time the issues surface, triaging and cleaning the data is time-consuming, costly, and frustrating, and it typically yields a low return on investment (ROI).
Many factors can contribute to poor data quality, including:
Human error during data entry
Anomalies introduced when data is collated from different sources
Data can also become corrupted for many other reasons. With the necessary tools and processes in place, many of these problems can be pre-empted at the beginning of the data life cycle. Otherwise they surface during late-stage troubleshooting, which adds time and cost. Some of the ways to improve data quality are:
Implementing a data anomaly detection tool to catch issues that could break the system
Unit testing data processes
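Here is a small Python sketch that combines both practices. As a stand-in for a dedicated anomaly detection tool, it assumes a simple statistical check: a metric such as a daily row count is flagged when it falls more than three standard deviations from its recent history. It also uses Python's built-in `unittest` module to test a hypothetical `clean_record` transformation. All function names, the threshold, and the sample numbers are illustrative assumptions.

```python
import statistics
import unittest


def is_anomalous(history, value, threshold=3.0):
    """Flag value if it lies more than `threshold` standard deviations
    from the mean of history (a simple stand-in for an anomaly detector)."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # any change from a constant history is unusual
    return abs(value - mean) / stdev > threshold


def clean_record(record):
    """Hypothetical transformation: trim and upper-case the country code,
    mapping blank values to None."""
    country = (record.get("country") or "").strip().upper()
    return {**record, "country": country or None}


class DataQualityTests(unittest.TestCase):
    def test_clean_record_normalizes_country(self):
        self.assertEqual(clean_record({"id": 1, "country": " us "})["country"], "US")

    def test_clean_record_marks_blank_country_as_missing(self):
        self.assertIsNone(clean_record({"id": 2, "country": "  "})["country"])

    def test_row_count_drop_is_flagged(self):
        # Hypothetical daily row counts for the past week.
        history = [1000, 1020, 990, 1010, 1005, 995, 1015]
        self.assertTrue(is_anomalous(history, 400))
        self.assertFalse(is_anomalous(history, 1012))
```

If this is saved as, say, `test_data_quality.py` (a hypothetical file name), `python -m unittest` discovers and runs the tests, because the default discovery pattern is `test*.py`. A fixed three-sigma threshold is a simplification. Real data often has weekly or seasonal patterns that a dedicated tool would need to account for.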