
Why data quality and data mesh are the future of data

By Madalina Tanasie, CTO of Collibra

Wednesday, 30th March 2022 | Posted by Phil Alsop

Driven by technological advancement and the increasing digitalisation of services across all sectors, our society produces more data than ever. Data has become crucial for enabling valuable business and economic insights, powering vital services, and allowing more strategic decision-making, especially now, as the world looks to fight some of the biggest challenges we have faced in decades.

The problem with data, however, is how to manage it so that it becomes as useful as possible. Until recently, organisations approached this by bringing data into one place, structuring it, and mining it through data lakes and warehouses. But what do you do with this big pool of data once you have it? How do you filter and extract the exact insight you need? People tend to talk about big data as if it were a silver bullet, so companies often expect ‘big data’ to deliver more value than it actually does. The main reason for this gap between expectation and reality is not a lack of potential in the data – far from it. It is that there is much more to data than managing its volume: it is about ensuring data quality.

Quality over quantity

One of the biggest misconceptions about big data is that volume is everything. The reality is different: in many cases you might not even need ‘big’ data to gather great insights. Data comes in many shapes and sizes and can be structured in different ways, but the ability to analyse it and make connections between different data sets to enhance your understanding is what really sets apart those who make good use of it from those who don’t.

Focusing on data quality is key here. It is one of the biggest contributing factors determining whether big data will deliver big results for the business. It is the fundamental element that ensures the data organisations use is reliable, accurate, unbiased, and generally fit to steer organisations’ efforts in the right direction and enable informed decision-making. What’s more, having that trust and confidence in data across the organisation is vital for using data collaboratively, and good data quality is an indicator of how quickly one can achieve data-to-value.

Recognising and acknowledging the problem is the first step towards solving data quality issues. The challenge is that data quality can be undermined by many factors, from inconsistency, inaccuracy, and bias to data duplication, the inability to understand the data, or even the lack of a data-driven culture. It is vital that organisations take a comprehensive approach to understanding and overcoming each of these causes, from the moment data is created to the moment it is used. The whole idea behind data quality management is to connect those two moments in time and ensure the entire chain is designed and managed so that data is created correctly and everything goes well at the moment of use.

The healthcare sector is probably the most prominent example of the significance of data quality. What happens when a patient is taken into hospital for an emergency procedure? In those cases, healthcare staff must quickly access digital patient records, which provide an overview of any conditions the person might have and help medical staff assess potential risks. If the patient data fails to indicate certain allergies or give an accurate picture of any medications the patient has recently been prescribed, the consequences can be critical. Good-quality patient data ensures that healthcare staff are equipped with vital, potentially life-saving information that helps them assess the unique healthcare needs of each individual.
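To make this concrete, the sketch below shows what a few automated quality checks on such records might look like. It is illustrative only: the column names, sample data, and pandas-based approach are assumptions made for the example, not a real hospital schema or a Collibra product feature.

    import pandas as pd

    def check_patient_records(df: pd.DataFrame) -> dict:
        """Run a few basic quality checks and return counts of issues found."""
        findings = {}

        # Completeness: critical fields should not be missing.
        for col in ["patient_id", "allergies", "current_medications"]:
            findings[f"missing_{col}"] = int(df[col].isna().sum())

        # Uniqueness: duplicated patient IDs suggest duplicate records.
        findings["duplicate_patient_ids"] = int(df["patient_id"].duplicated().sum())

        # Validity: dates of birth in the future are clearly wrong.
        dob = pd.to_datetime(df["date_of_birth"], errors="coerce")
        findings["future_dates_of_birth"] = int((dob > pd.Timestamp.today()).sum())

        # Consistency: discharge should never precede admission.
        admitted = pd.to_datetime(df["admitted_at"], errors="coerce")
        discharged = pd.to_datetime(df["discharged_at"], errors="coerce")
        findings["discharge_before_admission"] = int((discharged < admitted).sum())

        return findings

    # Hypothetical sample data with deliberate issues: a duplicate ID, a missing
    # allergy entry, a future date of birth, and a discharge before admission.
    sample = pd.DataFrame({
        "patient_id": [1, 2, 2],
        "allergies": ["penicillin", None, "none known"],
        "current_medications": ["warfarin", "ibuprofen", None],
        "date_of_birth": ["1980-05-01", "2090-01-01", "1975-11-23"],
        "admitted_at": ["2022-03-01", "2022-03-02", "2022-03-05"],
        "discharged_at": ["2022-03-03", "2022-03-01", None],
    })
    print(check_patient_records(sample))

Checks like these connect the moment of data creation to the moment of use: issues are surfaced where the data is produced, before anyone has to rely on it.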

Big data’s natural evolution

The industry has been talking about the explosion of data and the inability to find and analyse it for quite a while now. About twenty years ago, it was considered revolutionary for businesses to put all their data in a data warehouse so that people across the organisation could find it more easily. Businesses undertook long, expensive programmes lasting two to three years, by the end of which the data was no longer of any use because it was already out of date. Fast-forward ten years and the focus had shifted towards data analytics, and data lakes became the answer to everything. While the technology behind data lakes improved over time, the philosophy remained the same: bring all the data into one place and structure it so that people could mine it. This approach worked well enough until the centralised data model began to fall apart, as it failed to respond to the growing number of data sources and the variety of ways in which data was being used.

Over time, a more decentralised model has proven to be a far more effective way of managing data, and one that helps prioritise quality. A decentralised data architecture offers a more intelligent approach to managing, governing, tracking, and securing data, so that businesses can derive the right insights from the data they have. Following this natural evolution, we are now seeing a huge shift across large enterprises towards something called ‘data mesh’. While data mesh is a relatively new concept, it offers businesses a more scalable way of managing data that avoids the bottlenecks of previous methods of storing data in warehouses and lakes. More and more organisations are adopting a distributed data architecture in which teams take ownership of the data they generate, store, and distribute, and treat it “as a product” which other domains can consume through a central data infrastructure. Treating data as a product across the organisation also encourages those responsible for it to monitor and ensure its quality and reliability for downstream users.
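As an illustration, a domain team might describe the dataset it owns with a lightweight “data product” contract and check its quality against agreed targets before publishing it to the shared platform. The sketch below is a minimal, hypothetical example: the fields, metrics, and thresholds are assumptions for illustration, not a formal data mesh standard or a Collibra feature.

    from dataclasses import dataclass, field

    @dataclass
    class DataProduct:
        """A hypothetical contract a domain team publishes alongside its data."""
        name: str
        owner_domain: str        # the team accountable for this data
        schema: dict             # column name -> type: the published interface
        quality_slos: dict       # metric -> minimum acceptable value
        tags: list = field(default_factory=list)

        def meets_slos(self, measured: dict) -> bool:
            # Gate publication: every measured metric must meet its agreed target.
            return all(measured.get(metric, 0.0) >= target
                       for metric, target in self.quality_slos.items())

    # A domain team declares its product and verifies measured quality before
    # publishing to the shared data platform for other domains to consume.
    orders = DataProduct(
        name="orders.daily_summary",
        owner_domain="e-commerce",
        schema={"order_date": "date", "region": "string", "total_orders": "int"},
        quality_slos={"completeness": 0.99, "valid_rows_ratio": 0.98},
        tags=["daily", "aggregated"],
    )

    measured = {"completeness": 0.995, "valid_rows_ratio": 0.99}
    print(orders.meets_slos(measured))  # True -> safe to publish downstream

The point of the contract is ownership: the domain that knows the data best is the one accountable for its schema and its quality, while the central platform simply provides the infrastructure to share it.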

Ultimately, those who want to “go big” on big data must focus on an approach that enables the distribution of quality, trusted data across the organisation. It is about taking the next step in your data maturity journey to make data more accessible, available, discoverable, secure, and interoperable, and so unlock new levels of business intelligence.