Tuesday, May 22, 2007

Defining Data Quality

Ever tried to define data quality?  David Loshin gives it a go over at DataFlux.  [I spent this winter leading data quality assessments for a packaged goods manufacturer, so David's article is relevant to me.]


He defines it from the process perspective, which is probably the best approach. It did get me thinking: how would we define data quality if we defined it in terms of data characteristics?


 The usual characteristics mentioned in the literature include accuracy, correctness, completeness, currency, and relevance.  The challenge, though, is that each organization has unique needs and will (very appropriately) prioritize those characteristics differently.  You have to know the business requirements of the organization to develop a good data quality program.


If I had to pull together a definition it'd be something like this:  "Data quality is the unique combination of data accuracy, completeness, correctness, currency, and relevance that best meets the specific needs of the organization for a given period of time." (Implicit in that "given period of time" clause is a need for data governance and ongoing data quality auditing.)
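
One way to make that definition a little more concrete is to treat it as a weighted score. Here's a minimal sketch in Python, assuming each characteristic has already been measured on a 0-to-1 scale. The function name, the weights, and the sample numbers are all made up for illustration; they aren't from David's article or from any real assessment.

```python
# Hypothetical sketch: combine per-characteristic scores using weights
# that reflect one organization's priorities.
CHARACTERISTICS = ("accuracy", "completeness", "correctness", "currency", "relevance")


def quality_score(scores, weights):
    """Return the weighted average of the characteristic scores (each 0..1)."""
    total_weight = sum(weights[c] for c in CHARACTERISTICS)
    if total_weight <= 0:
        raise ValueError("at least one characteristic needs a positive weight")
    weighted_sum = sum(scores[c] * weights[c] for c in CHARACTERISTICS)
    return weighted_sum / total_weight


# The same (made-up) measurements for one data set...
measured = {
    "accuracy": 1.0,
    "completeness": 0.5,
    "correctness": 0.5,
    "currency": 0.5,
    "relevance": 0.5,
}

# ...scored by two organizations with different (made-up) priorities.
equal_priorities = {c: 1.0 for c in CHARACTERISTICS}
accuracy_first = dict(equal_priorities, accuracy=2.0)

print(quality_score(measured, equal_priorities))
print(quality_score(measured, accuracy_first))
```

The same data set scores differently depending on whose priorities you plug in, which is the point: the weights come from the business requirements, and they'd need revisiting as those requirements change.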


I'm sure there are some flaws in this definition -- just take it as a starting point.

