What’s Wrong with the Single Version of Truth

As @tonyrcollins reports, a confidential report currently in preparation on the NHS Summary Care Records (SCR) database will reveal serious flaws in a massively expensive database (Computer Weekly, March 2010). Well knock me down with a superbug, whoever would have guessed this might happen?

“The final report may conclude that the success of SCRs will depend on whether the NHS, Connecting for Health and the Department of Health can bridge the deep cultural and institutional divides that have so far characterised the NPfIT. It may also ask whether the government founded the SCR on an unrealistic assumption: that the centralised database could ever be a single source of truth.”

There are several reasons to be ambivalent about the twin principles Single Version of Truth (SVOT) and Single Source of Truth (SSOT), and this kind of massive failure must worry even the most fervent advocates of these principles.

Don’t get me wrong, I have served my time in countless projects trying to reduce the proliferation and fragmentation of data and information in large organizations, and I am well aware of the technical costs and business risks associated with data duplication. However, I have some serious concerns about the dogmatic way these principles are often interpreted and implemented, especially when this dogmatism results (as seems to be the case here) in a costly and embarrassing failure.

The first problem is that Single-Truth only works if you have absolute confidence in the quality of the data. In the SCR example, there is evidence that doctors simply don’t trust the new system – and with good reason. There are errors and omissions in the summary records, and doctors prefer to double-check details of medications and allergies, rather than take the risk of relying on a single source.

The technical answer to this data quality problem is to implement rigorous data validation and cleansing routines, to make sure that the records are complete and accurate. But this would create more work for the GP practices uploading the data. Officials at the Department of Health fear that setting the standards of data quality too high would kill the scheme altogether.

There is a fundamental conflict of interest here between the providers of data and the consumers – even though these may be the same people – and between quality and quantity. If you measure the success of the scheme in terms of the number of records uploaded, then you are obviously going to get quantity at the expense of quality.

So the pusillanimous way out is to build a database with imperfect data, and defer the quality problem until later. That’s what people have always done, and will continue to do, and the poor quality data will never ever get fixed.

The second problem is that even if perfectly complete and accurate data are possible, the validation and data cleansing step generally introduces some latency into the process. This matters especially if you are operating a post-before-processing system, which is particularly relevant to environments such as military and healthcare where, for some strange reason, matters of life and death seem to take precedence over getting the paperwork right. So there is a design trade-off between two dimensions of quality: timeliness and accuracy.
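To make the trade-off concrete, here is a minimal sketch of one way to read post-before-processing: the record becomes visible straight away, flagged as unchecked, and validation catches up later. The names (`Posted`, `post`, `process`, the `patient_id` field) are my own illustrative assumptions, not anything taken from the SCR design.

```python
from dataclasses import dataclass

@dataclass
class Posted:
    payload: dict
    status: str = "unchecked"   # hypothetical states: unchecked, ok, rejected

store = []  # stands in for the shared database

def post(payload):
    """Publish immediately: timeliness first, accuracy unknown."""
    entry = Posted(payload)
    store.append(entry)
    return entry

def process(entry, required=("patient_id",)):
    """Validate afterwards; consumers can see the status change."""
    complete = all(key in entry.payload for key in required)
    entry.status = "ok" if complete else "rejected"
    return entry.status

entry = post({"patient_id": "P-001", "allergies": ["penicillin"]})
status_when_posted = entry.status   # readers see the record before any checks
status_after_check = process(entry)
```

The point of the status flag is that the consumer, not the database designer, decides how much to trust a record that has not yet been checked.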

The third problem is complexity. Data cleansing generally works by comparing each record with a fixed schema, which defines the expected structure and rules (metadata) to which each record must conform, so that any information that doesn’t fit into this fixed schema will be barred or adjusted. Thus the richness of information will be attenuated, and useful and meaningful information may be filtered out. (See Jon Udell’s piece on Object Data and the Procrustean Bed from March 2000. See also my presentation on SOA for Data Management.)
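A small sketch shows how this attenuation happens. The schema, the field names and the `cleanse` function below are hypothetical, chosen only to illustrate the general pattern of checking each record against a fixed structure.

```python
# Hypothetical fixed schema: field name -> expected type.
SCHEMA = {"patient_id": str, "allergies": list, "medications": list}

def cleanse(record):
    """Keep only the fields that fit the schema; return (clean, discarded)."""
    clean, discarded = {}, {}
    for field, value in record.items():
        expected = SCHEMA.get(field)
        if expected is not None and isinstance(value, expected):
            clean[field] = value
        else:
            discarded[field] = value   # no slot, or wrong shape
    return clean, discarded

record = {
    "patient_id": "P-001",
    "allergies": ["penicillin"],
    "medications": "see handwritten note",        # wrong type for the schema
    "gp_comment": "reaction may be misattributed", # no slot in the schema
}
clean, discarded = cleanse(record)
```

Here the cleansed record is tidy and conformant, but the two items a doctor might most want to read have ended up in `discarded`.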

The final problem is that a single source of information represents a single point of failure. If something is really important, it is better to have two independent sources of information or intelligence, as I pointed out in my piece on Information Algebra. This follows Bateson’s slogan that “two descriptions are better than one”. Doctors using the SCR database appear to understand this aspect of real-world information better than the database designers.
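As a rough illustration of what the doctors are doing when they double-check, the sketch below compares two independent descriptions of the same patient and reports where they differ instead of silently picking one. The function name and the sample data are assumptions made up for the example.

```python
def compare_sources(first, second):
    """Return (agreed, disputed) items from two independent lists."""
    a, b = set(first), set(second)
    agreed = sorted(a & b)     # confirmed by both sources
    disputed = sorted(a ^ b)   # present in only one source: needs a human look
    return agreed, disputed

# Hypothetical allergy lists from the central summary and the local GP record.
summary_record = ["penicillin", "latex"]
gp_record = ["penicillin", "aspirin"]

agreed, disputed = compare_sources(summary_record, gp_record)
```

With a single source, the `disputed` list cannot exist, so the disagreement that ought to prompt a question is never seen.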

It may be a very good idea to build an information service that provides improved access to patient information, for those who need this information. But if this information service is designed and implemented according to some simplistic dogma, then it isn’t going to work properly.