Friday, July 04, 2008

Data Masking: Case for a Reverse Firewall

The phrase Data Quality Firewall has been getting press for a couple of years now, most recently from Mike Ferguson at DataFlux. The idea is to sanitize the data at runtime, preventing poor-quality data from ever entering corporate systems. Most DQ software vendors have enabled SOA access to their routines specifically for the "data quality firewall" purpose.

It seems to me a similar approach applies when scrubbing outbound data. There's a class of use cases for which a Reverse Firewall approach to data masking makes good sense.

As with early data quality initiatives, data masking is often seen as a batch-processing task. We build tapes or files en masse to send off for some outsourced (or partnered) processing. Similarly, we generate large quantities of sanitized test data for internal development projects. And honestly, it's easier for me to work through inter-table relationships in this approach.

That said, the interconnected economy demands real-time communication with our partners and vendors. Masking sensitive fields, or pieces of data within fields, as needed calls for an on-demand approach -- a "reverse firewall" of reusable masking routines applied to data on the outbound pipe.

The same set of routines used in batch-level masking should also be usable via real-time calls. Yes, I realize much of the masking code "in the wild" wasn't written with an eye to reusability. Neither were the early data quality routines, nor a slew of other programs for that matter.
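To make the reuse point concrete, here is a minimal sketch in Python of one rule table serving both modes. Everything named here is hypothetical and chosen for illustration: the field names `ssn` and `email`, the helpers `mask_ssn` and `mask_email`, and the `MASKING_RULES` table. The unsalted hash is only a stand-in for a real tokenization scheme and should not be read as a secure choice.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]


def mask_ssn(value: str) -> str:
    """Mask a piece of a field: keep the last four digits, X out the rest."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Only the final four digits survive; separators pass through.
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def mask_email(value: str) -> str:
    """Replace the local part with a repeatable token; keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        return "***"  # not a recognizable address, so hide it entirely
    # Illustrative only: an unsalted hash is NOT a secure tokenization scheme.
    token = hashlib.sha256(local.encode("utf-8")).hexdigest()[:8]
    return f"user_{token}@{domain}"


# One rule table (hypothetical field names) shared by both entry points.
MASKING_RULES: Dict[str, Callable[[str], str]] = {
    "ssn": mask_ssn,
    "email": mask_email,
}


def mask_record(record: Record) -> Record:
    """Real-time path: mask a single outbound record."""
    return {
        field: MASKING_RULES[field](value) if field in MASKING_RULES else value
        for field, value in record.items()
    }


def mask_batch(records: Iterable[Record]) -> Iterator[Record]:
    """Batch path: the same routine, applied lazily across a whole extract."""
    for record in records:
        yield mask_record(record)
```

A service endpoint on the outbound pipe would call `mask_record` per message, while a nightly extract would stream rows through `mask_batch`. Because the email token is repeatable, the same input masks to the same output in every table, which is one way to keep inter-table relationships intact.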

It's an evolutionary reality: masking routines will be needed more often, with stronger benefits gained from reusable logic. Organizations able to quickly leverage the routines on a "plug and play" basis will have a competitive advantage -- they'll be able to partner more easily while simultaneously reducing their data risk.

Many of the same SOA-enabled best practices used to clean data on the way in will be used to filter it on the way out: a Data Masking Reverse Firewall.

1 comment:

Anonymous said...

Good post, Beth.

The concept of a DQ firewall has been around for some time, I feel, just under different terminology.

One of the first things I try to look at in a DQ assessment is setting up a consumer/supplier SLA (Service Level Agreement) for the data flowing across that relationship. These relationships can be between departments in an organisation, between systems, or even with external suppliers/consumers.

This can often be tough to introduce but the benefits are huge. I've seen lead times drop from 4 months to 4 weeks just from vetting supplier feeds.

The issue I have with the firewall approach is that it implies one side does all the work. To really make things work, you need both sides of the fence to continuously improve their DQ processes.

For launching a firewall initiative, I think some of the Lean tools are priceless. Calculating the non-value-added time incurred along the information chain will identify hot-spots of delay and cost where processes need to change and DQ "hand-shaking" processes need to be created.

This doesn't need a high-end DQ platform, though, so don't be put off by the big DQ vendors flying the firewall flag. I first started creating DQ "firewalls" in 1992 with Informix and Unix shell scripts; fostering the support of the supplier is far more important than high-end technology.

Dylan Jones
DataMigrationPro.com
Global Professional Community