I’ve been thinking a lot lately about ‘Smart Data’, which is the UK Government/ policy naming and framing for the general freeing up of data to move beyond the current siloed approach into more of a networked economy running on trust frameworks. There is a published strategy, an enabling bill, and a big number attached (a targeted £27.8bn contribution to UK GDP). I happen to think that we can actually make this work in the UK; the conditions are right, and we have more than a decade of learning on trying to make data portability happen to tap into.
That being the case, I’m going to write up my own thinking over a series of posts here. There will be at least 10 of them, as the issues are multi-faceted and span at least 7 industry sectors. So here goes.
First off, definitions; there are quite a few takes on ‘smart data’. Here’s one from the Competition and Markets Authority, who lead on making data portable as an enabler of more competitive markets/ reducing monopoly powers. (Thanks to Liz Brandt et al for the recent Smart Data Forum session, the write-up of which provides a number of key inputs to my logic.)
"Smart Data is the secure sharing of customer data, upon the customer's request, with Authorised Third-party Providers (ATPs). The ATPs can then enhance the customer data with broader, contextual 'business' data, which may be provided directly to ATPs or may have been made' open'. Customers may be individual consumers or business customers."
The CMA further characterises Smart Data as follows:
- Relates to customer data & business data (e.g. related to products, transactions, service usage)
- Is shared in specific formats (e.g. data standards, time frames)
- Is accessible by ATPs with permission of customers
- Enables customer data to be actionable by ATPs (e.g. payments can be made on behalf of the customer)
That’s a good start. It mirrors/ borrows the logic from Open Banking, which we have had in the UK since 2017, and there is much to learn from that.
Here’s the definition I’d use from DataPal, which plans to be one of the Authorised Third-party Providers for Smart Data across all the sectors. We will have a very specific take on that provider role: we believe there is a necessary branch of these third-party providers that acts on a fiduciary basis for the individual when it is personal data that is being made portable.
DataPal definition: Smart Data is that which flows in secure, transparent pipelines from its master source to downstream data-using services under clear and fair terms that generate value for the data providers. Different data sources can be merged into the pipelines, much as tributaries merge into a river. Smart data will be well documented, will have associated metadata, and the rights of data providers will be enabled and actionable in all parts of its journey. It will be augmented wherever relevant, and will typically operate within a trust framework or network ecosystem.
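To make that definition a little more concrete, here is a minimal sketch (field names are purely illustrative, not a DataPal specification) of how a single record moving through such a pipeline might carry its master source, metadata and the data provider’s rights alongside the payload itself.

```python
# A minimal, hypothetical sketch of a record moving through a "smart data"
# pipeline: the payload travels together with its master source, metadata,
# and the data provider's rights. Field names are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProviderRights:
    # Rights the data provider retains at every hop of the journey.
    can_revoke: bool = True
    can_correct: bool = True
    allowed_purposes: list[str] = field(default_factory=list)

@dataclass
class SmartDataRecord:
    payload: dict          # the customer or business data itself
    master_source: str     # the originator / best source for this data
    schema_ref: str        # pointer to the agreed data standard
    provenance: list[str]  # every pipeline/tributary it has passed through
    rights: ProviderRights # actionable rights, not just a policy document
    captured_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: a home energy reading mastered by the supplier, shared only for switching quotes.
reading = SmartDataRecord(
    payload={"kwh_used": 212, "tariff": "standard-variable"},
    master_source="energy-supplier-api",
    schema_ref="https://example.org/schemas/energy-usage/v1",
    provenance=["energy-supplier-api"],
    rights=ProviderRights(allowed_purposes=["switching-quote"]),
)
```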
So, that’s the big picture, top-down logic. But in order to make things work in practice we also need to look from the bottom up - what technical changes are required, at what level, in which technology stack(s)?
In technical terms, data will always reside on a silicon (or similar) chip in the form of 1s and 0s, managed via transistors and accessed by computer programmes/ applications through an operating system. Critically, those chips and the data on them will be owned and governed by entities; access to that data is thus governed by the nature of the entity and the access terms they have set. And, by definition, data exchange always requires two or more parties. In practice, every data exchange on The Internet is bilateral, even if there are a million parties connecting to the same end point. In legal and technical terms, that is a million data exchanges/ connections.
The crunch point is that when smart data requires flow/ movement from one chip to another there are four types of challenge to address:
1. Technical portability of data, which requires the understanding and management of create/ read/ update/ delete permissions across multiple operating systems, multiple methods through which data are structured and documented, and bilateral outcomes (changes in state on the chip). A sketch of what such a permission grant might look like follows this list.
2. Contractual portability of data, which requires the parties involved to agree the rules and incentives that lead them to enact the technical movement.
3. Cultural support for portability of data. Organisations and their leadership have been trained over many years that ‘data is valuable’ and intuitively believe that if they gather and manage the data then it belongs to them. Data portability is thus problematic for organisations at a gut level. That’s why projects like the UK Government MiData programme delivered very slowly to the lowest common denominator, and projects like The Pensions Dashboard take vastly more time than they should. Neither project posed any major technical difficulty; the delays reflect the absence of a genuine will among all stakeholders to make them happen.
4. Consumer acceptance and adoption of the data portability methods, and the services built on them.
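To make the first of those challenges a little more tangible, here is a minimal sketch, with entirely hypothetical names, of how create/ read/ update/ delete permissions might be expressed as a single portability grant spanning more than one data source.

```python
# Hypothetical sketch: expressing create/read/update/delete permissions as a
# single portability grant spanning several differently-structured sources.
from dataclasses import dataclass
from enum import Flag, auto

class Permission(Flag):
    CREATE = auto()
    READ = auto()
    UPDATE = auto()
    DELETE = auto()

@dataclass(frozen=True)
class PortabilityGrant:
    granted_to: str          # the Authorised Third-party Provider
    source: str              # which system ("chip") the data lives on
    dataset: str             # e.g. "transactions", "meter-readings"
    permissions: Permission  # which changes in state are allowed

grants = [
    PortabilityGrant("datapal-agent", "bank-core-ledger", "transactions", Permission.READ),
    PortabilityGrant("datapal-agent", "energy-supplier-crm", "contact-details",
                     Permission.READ | Permission.UPDATE),
]

def allowed(grant: PortabilityGrant, action: Permission) -> bool:
    """Check whether a requested state change is covered by the grant."""
    return action in grant.permissions

assert allowed(grants[0], Permission.READ)
assert not allowed(grants[0], Permission.DELETE)
```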
Critically, I think this is an ‘all or nothing’ scenario. That is to say, all four challenges must be addressed to at least some significant extent before data will flow at scale.
So, there is very little smart data flowing at present (*). The barriers are too high, the incentives too low; there is no common set of data standards or rules to make things easier, there are few quantified outcomes and there is a perception that demand for data portability does not exist or is limited.
(*) Open Banking is clearly a forerunner in addressing the above in The United Kingdom. There are many good things about that programme, and lessons learned that can lead to improvement. But I, and others, would contend that it has reached nothing like the impact levels hoped for at the start of the programme.
For me, there are two significant reasons why Open Banking has achieved less than it was originally targeted to do.
- The consent journey is complex and has too many clicks (this is probably not resolvable so long as consent is seen as the best legal basis)
- The quality of the data being ported is sufficiently poor that outcomes are held back. More on that when we get to the deep-dive post on Open Banking going forward.
But now the good news: Smart Data and the Smart Data Roadmap were conceived prior to three significant innovations, each of which will contribute to addressing those four challenges noted above.
- The massive wave of innovation that emerged, and continues to emerge, through generative AI, and now the rise of AI-powered agents.
- Very robust personal digital identifiers that are genuinely owned and controlled by individuals (digital identifiers are the ‘end points’ for all data exchange on The Internet).
- The availability of standardised privacy policies written from the individual perspective (via IEEE 7012), again a significant enabler of fair exchange. This standard will enable organisations to build contract-based data exchanges, which set a higher bar than the current, very broken consent-based approach.
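To give a flavour of that third innovation (without reproducing anything from the standard itself), here is an illustrative sketch, with hypothetical field names, of the rough shape a machine-readable, individual-first set of terms might take, and how an organisation could check programmatically whether it can honour them.

```python
# Illustrative only: the rough shape a machine-readable, individual-first set
# of terms might take, so an organisation can accept (or decline) them
# programmatically rather than asking the individual to click through consent.
# Field names are hypothetical and are not drawn from the IEEE 7012 text.
MY_TERMS = {
    "terms_id": "personal-terms-v1",
    "offered_by": "the individual (data subject)",
    "permitted_purposes": ["service-delivery", "switching-quotes"],
    "prohibited": ["resale", "third-party-advertising"],
    "retention_days": 90,
    "exchange_basis": "contract",  # contract, not consent
}

def organisation_accepts(terms: dict, org_needs: set[str]) -> bool:
    """An organisation checks its intended uses against the individual's terms."""
    return org_needs <= set(terms["permitted_purposes"])

print(organisation_accepts(MY_TERMS, {"service-delivery"}))           # True
print(organisation_accepts(MY_TERMS, {"third-party-advertising"}))    # False
```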
All that being said, my own view is that Smart Data will best emerge with an architecture that is human-centric, or at a minimum has a ‘human in the loop’. In practical terms that will mean two main changes from the model deployed in Open Banking.
Firstly, I think we will see the emergence of that particular branch of Authorised Third Parties that act on a strict fiduciary basis for the individual (historically called the data subject). Neutral and product-centric third parties will continue to exist, but the fiduciary branch will be the real game changer that makes smart data flow. Then secondly, those AI agents mentioned above will show up and drive data portability to scale. That will be the case because they will tackle thousands of mundane tasks around data management and use that individuals would rarely do on their own.
Let’s conclude this introduction to the subject with a quick illustration of what becomes possible when those personal fiduciary agents are deployed. The visual below illustrates data co-management in a home energy use case, with the new and game-changing capability that the individual has very robust data management capabilities on their side.
In this well-evolved case, the data types in blue are ‘master data’, that is to say the originator and thus the best source for data exchange scenarios to connect to. As we can see, when the customer has the necessary capabilities, they become the logical master for significant aspects of the data exchange. The ‘person’ is almost always the best positioned to provide master data about themselves. Where the text is in black, that signifies that this data is available for use, but mastered elsewhere. And where the data is in red, that signifies that there is a logical reference data-set/ master outside of the customer-supplier relationship that either or both parties can use.
Once we get into the sector-specific use cases we will see how the same pattern plays out across all B2C sectors: the best provider of master data on a person is the person (via their agent, in practical terms), and the best providers of master data on products/ services/ service delivery, prices, specifications etc. are the organisation(s) in question. And the best overall outcome for all parties, including regulators, is when the above operates in a co-managed, mutually-beneficial, transparent relationship.
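The colour-coded visual does not translate directly to text, but a rough, hypothetical rendering of the underlying mapping might look like this: each data type in the home energy relationship is assigned to the party best placed to master it. The data types and assignments below are illustrative, not taken from the actual diagram.

```python
# A rough, hypothetical rendering of the co-management pattern in the home
# energy use case: each data type mapped to the party best placed to master it
# ("blue" = the person/agent, "black" = mastered elsewhere, "red" = reference).
MASTER_MAP = {
    # mastered by the person / their fiduciary agent ("blue")
    "occupants": "person",
    "contact-details": "person",
    "moving-date": "person",
    # mastered by the supplier or another organisation ("black")
    "tariff": "energy-supplier",
    "meter-readings": "energy-supplier",
    "billing-history": "energy-supplier",
    # mastered by a reference data-set outside the relationship ("red")
    "property-epc-rating": "reference-dataset",
    "postcode": "reference-dataset",
}

def best_source(data_type: str) -> str:
    """Return the logical master a data exchange should connect to."""
    return MASTER_MAP.get(data_type, "unknown")

print(best_source("contact-details"))  # person
print(best_source("tariff"))           # energy-supplier
```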
So hopefully that sets the scene for the smart data story as I see it. Next up - the many problems in the current model, to make it clear that retaining the status quo and current architectural approaches is not really a valid option given the goals and aspirations that have been set.