Data Wiki

| Data Quality

01 | What are the characteristics of data quality?

At a very high level, the following characteristics define the quality of data:

  • Completeness: how complete the data is

  • Accuracy and reliability: whether the data is accurate and can be trusted

  • Timeliness: whether the data is available when needed and up to date

Different organizations prioritize these characteristics differently, depending on their needs, how the data is used, and the life cycle of the processes that consume it.
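
As a rough illustration, completeness and timeliness can each be reduced to a simple measurable check. The sketch below is a minimal example, not a standard definition: the function names `completeness` and `is_timely`, the sample records, and the 24-hour freshness window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def completeness(records, required_fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def is_timely(last_updated, now, max_age):
    """True if the data was refreshed within the allowed age."""
    return now - last_updated <= max_age


# Hypothetical sample data: two of the nine required values are missing.
records = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "c@example.com", "country": ""},
]
score = completeness(records, ["id", "email", "country"])

# Hypothetical freshness check: data loaded 9 hours ago, 24 hours allowed.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
fresh = is_timely(last_load, now, max_age=timedelta(hours=24))
```

What counts as an acceptable score or age depends on the organization's own requirements, which is why both thresholds are left as parameters.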

02 | What downstream impact does poor data quality have?

Data is an important asset used to make crucial decisions. If important business decisions rely on data that is inherently poor in quality, the effect ripples through every process that consumes it. Triaging and cleaning the data at that point is frustrating and yields a low return on the time, effort, and cost involved.

03 | What are some of the reasons for poor data quality?

There could be many different factors that contribute to poor data quality:

  • Human error during data entry

  • Data collated from different sources, which introduces anomalies

  • Missing values

  • Erroneous data
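
To make the multi-source case concrete, the sketch below compares the same records as they appear in two sources and reports the fields on which they disagree. It is only one possible approach; the function name `find_conflicts` and the `crm` and `billing` sample sources are hypothetical.

```python
def find_conflicts(source_a, source_b, key):
    """Return {key value: {field: (a_value, b_value)}} where the sources disagree."""
    by_key_b = {row[key]: row for row in source_b}
    conflicts = {}
    for row_a in source_a:
        row_b = by_key_b.get(row_a[key])
        if row_b is None:
            continue  # record exists in only one source; nothing to compare
        diff = {
            field: (row_a[field], row_b[field])
            for field in row_a.keys() & row_b.keys()
            if field != key and row_a[field] != row_b[field]
        }
        if diff:
            conflicts[row_a[key]] = diff
    return conflicts


# Hypothetical sources: the same customer is entered differently in each.
crm = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "Paris"}]
billing = [{"id": 1, "city": "Berlin"}, {"id": 2, "city": "paris "}]
conflicts = find_conflicts(crm, billing, "id")
```

Here the mismatch for record 2 comes from casing and a trailing space, the kind of anomaly that manual entry and merged sources tend to produce.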

04 | What are some ways data quality can be improved?

Data can be corrupted by many different factors. With the right tools and processes in place, these problems can be caught at the beginning of the data lifecycle rather than troubleshot late, which adds time and cost. Some of the ways data quality can be improved are:

  • Implementing a data anomaly detection tool to catch issues that could break the system

  • Unit testing processes

  • Business rules

  • Profiling
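
The techniques listed above can be sketched in a few lines each. The example below is a simplified illustration using only the Python standard library, not a replacement for a dedicated tool. The rule names, the `violations`, `profile`, and `anomalies` helpers, and the sample data are assumptions, and the z-score test stands in for whatever method a real anomaly detection tool would use.

```python
import statistics

# Hypothetical business rules: each maps a rule name to a check on one record.
RULES = {
    "amount_is_non_negative": lambda r: r["amount"] is not None and r["amount"] >= 0,
    "currency_is_known": lambda r: r["currency"] in {"USD", "EUR"},
}


def violations(records, rules):
    """Return (record index, rule name) pairs for every failed rule."""
    return [
        (i, name)
        for i, record in enumerate(records)
        for name, rule in rules.items()
        if not rule(record)
    ]


def profile(values):
    """Summarize a numeric column: row count, missing count, min, max, mean."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": statistics.mean(present) if present else None,
    }


def anomalies(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    mean = statistics.mean(present)
    stdev = statistics.stdev(present)
    if stdev == 0:
        return []
    return [v for v in present if abs(v - mean) / stdev > threshold]


# Hypothetical sample data.
orders = [
    {"amount": 10, "currency": "USD"},
    {"amount": -5, "currency": "EUR"},
    {"amount": None, "currency": "GBP"},
]
failed = violations(orders, RULES)
summary = profile([10, None, 12])

# With only ten values a z-score cannot exceed about 2.85,
# so this small example uses a lower threshold than the default.
daily_counts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
outliers = anomalies(daily_counts, threshold=2.5)
```

Checks like these can be wrapped in unit tests and run at the start of a pipeline, so that bad data is rejected before downstream processes consume it.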