Contrary to perception, data does not sit static in a data repository. Data flows through an organization like blood through the circulatory system, and each day, each hour, that “static” data is touched in myriad ways. To the modern business, data is the vital fluid that carries nutrients (information) to the business functions that consume it.
The movement of data imposes another dimension on a data quality strategy: the data is a moving target. The question becomes: where is the best place to intercept data in transit so that it can be cleansed and validated? The human body has its own answer, the liver, but those of us building data systems have many more options.
A data flow or system architecture diagram (shown above) is created as part of a data quality strategy and indicates where data is captured, manipulated, and stored. Knowing these locations lets the strategist select the best points at which to cleanse and monitor the data, given the project objectives. Evaluating the data flow also allows the strategist to refine the results compiled in the connectivity and subject area aspects, since both are examined when building a data flow diagram. The diagram depicts access options to the data and catalogs the locations in a networked environment where data is staged and manipulated. Each of these locations can be thought of as an opportunity to cleanse the data.
These opportunities fall into the following categories:
- Transactional Updates
- Operational Feeds
- Third Party Data
- Data Migrations
- Regular Maintenance
The movement of data is spawned by two general kinds of operation: automated processes, such as a nightly ETL script, and manual processes, such as a salesperson entering data into a mobile CRM application. Data flow analysis must consider both automated and manual initiators of data movement. For manual activities, data flow analysis becomes the task of work flow analysis.
Work flow and data flow are closely related. A work flow, such as entering a new product code, immediately spawns a data flow. It is important to inventory these work flow touch points because they represent points of capture, and they are opportunities to validate and cleanse data as it is created. The highest incidence of data quality errors, other than data aging, occurs in the manual entry of data, and manual entry therefore merits significant attention in the strategy.
Drop-down fields in the user interface, where a value is selected rather than entered free form, are a common tactic for ensuring data integrity at the point of capture. There are numerous other tactics, such as back-end business rule checking. An entry that is valid against a given domain of possibilities is not necessarily valid in the context of all the other data. For example, a California ZIP code may be valid on its own, but entering it for a Michigan address invalidates it. Back-end business rule checking can catch these types of work flow generated errors.
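The ZIP/state example above can be sketched as a simple back-end business rule. This is a minimal illustration, not a production validator: the prefix table is a small, hand-picked subset (Michigan and California prefixes), and the function name and structure are assumptions for the sake of the example.

```python
# Cross-field business rule check: a ZIP code may be valid in isolation,
# yet inconsistent with the state on the same record. The table below maps
# a few two-digit ZIP prefixes to states; it is an illustrative subset,
# not a complete ZIP-to-state mapping.
ZIP_PREFIX_TO_STATE = {
    "48": "MI", "49": "MI",                        # Michigan
    "90": "CA", "91": "CA", "92": "CA",
    "93": "CA", "94": "CA", "95": "CA",            # California
}

def check_zip_state(zip_code: str, state: str) -> list:
    """Return a list of rule violations for a (ZIP, state) pair."""
    errors = []
    # Domain check: the value must at least be a valid 5-digit ZIP.
    if not (zip_code.isdigit() and len(zip_code) == 5):
        errors.append("ZIP %r is not a valid 5-digit code" % zip_code)
        return errors
    # Context check: the ZIP must agree with the record's state.
    expected = ZIP_PREFIX_TO_STATE.get(zip_code[:2])
    if expected is not None and expected != state:
        errors.append("ZIP %s belongs to %s, not %s" % (zip_code, expected, state))
    return errors
```

A California ZIP such as 90210 passes the domain check, but `check_zip_state("90210", "MI")` flags it because the record's state is Michigan; `check_zip_state("48201", "MI")` returns no errors. Hooks like this, run server-side after entry, catch errors that drop-down constraints on individual fields cannot.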
Learn the most important questions to ask before you spend a single dollar by downloading our free Utopia whitepaper Creating a Data Quality Strategy.