Data Products: Build vs Buy

Adam Brookes

28 April 2025 - 9 min read

Digital Transformation

What is a “Data Product”?

In recent years, the term “data product” has emerged in data strategy circles, especially with the rise of data mesh architecture. A data product is essentially a curated dataset or data service that is treated as a product – meaning it’s designed to be easily consumed, has a clear purpose, and is managed through a lifecycle (with owners, versioning, improvements, etc.).

Data products can be operational, e.g. feeding real-time processes, or analytical, e.g. feeding human analysis or models. For example, an operational data product might be an API providing customer credit scores to be used in loan applications in real-time, whereas an analytical data product could be a cleaned and enriched customer 360 dataset that analysts use to generate marketing insights.

Crucially, data products are owned by cross-functional teams (not just IT) and serve a defined customer, such as an internal user or an application. This approach marks a shift from seeing data as a by-product of applications to seeing data as a first-class product in its own right. In practice, treating data as a product means things like:

documenting what the data contains (metadata)
ensuring its quality and freshness
providing it via convenient interfaces (SQL, API, etc.),
and iterating on it based on user feedback.

Build vs Buy – Developing data products:

Organisations often face the question of whether to build their own data products in-house or to leverage third-party data products (or vendor solutions). Building a data product internally means your team defines the data set, gathers and transforms the data, and provides it to consumers. This gives maximum control and provides a custom fit to your needs.

For example, a UK retailer might build an internal data product of “store footfall and sales forecast” combining CCTV counters, point-of-sale data, and weather data – something unique to their context. On the other hand, buying a data product could mean subscribing to an external data service or purchasing a packaged dataset. For instance, many enterprises subscribe to data products from credit bureaus (for credit scores), to market data feeds (for finance), or to analytics platforms that come with pre-built data models.

For large enterprises, “buy” might also refer to using packaged analytics solutions that include data – e.g., a Customer 360 platform that provides a model of customer data out-of-the-box. The trade-off often comes down to core competencies and differentiation: if the data product represents proprietary insight or competitive advantage, building in-house makes sense. If it’s a commodity or a common need (like address validation data, or benchmark industry data), buying can save time. Many organisations do a mix: build the internal unique combinations, but enrich with bought data.

Governance and lifecycle of data products:

Whether built or bought, data products require governance akin to software products.

This means assigning ownership, typically through a data product owner role similar to a product manager. The owner is often someone in the business who understands both the data and user needs. It also implies lifecycle management, from:

initial design (where requirements of the “users” of the data are gathered), to
development (data engineering to create the pipelines), to
deployment (publishing the data product for consumption), and
continuous improvement (adding new attributes, improving quality, etc.).

For example, an analytical data product “Customer Segmentation Data” might start with basic demographic attributes and later incorporate social media data as the product evolves.

Governance also covers access control (who can use the data product), compliance checks (does it contain personal data and if so, is that handled properly?), and ensuring consistency if multiple data products overlap.

In a data mesh approach, each domain, such as Marketing, Finance, or Supply Chain, might produce its own data products, but there needs to be federated governance to ensure, for instance, that the definition of “customer” is consistent or that data products interoperate.

One approach is the use of data product catalogues – essentially an organised inventory of all data products with meaningful descriptions, so users can discover them and trust them.

Instead of a technical data catalogue that might overwhelm users, a data product catalogue lists products like “Sales Dashboard Dataset – updated daily, owner: Analytics Team, quality SLA 99% complete” and so on, making it clear what is available. This approach has been observed in organisations adopting data mesh, where they present data in a marketplace-style portal internally.
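As a rough illustration of what such a catalogue entry could look like in code, here is a minimal Python sketch. The `CatalogueEntry` fields and the `search` and `summarise` helpers are assumptions made up for this example, not the schema of any particular catalogue tool; the sample values come from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One business-facing catalogue entry (hypothetical schema)."""
    name: str
    description: str
    owner: str
    refresh: str              # how often the product is updated, e.g. "daily"
    completeness_sla: float   # promised fraction of complete records, e.g. 0.99


def search(catalogue: list[CatalogueEntry], term: str) -> list[CatalogueEntry]:
    """Return entries whose name or description mentions the term (case-insensitive)."""
    needle = term.lower()
    return [
        entry for entry in catalogue
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


def summarise(entry: CatalogueEntry) -> str:
    """Render an entry as the kind of one-line listing a portal might show."""
    return (
        f"{entry.name} - updated {entry.refresh}, owner: {entry.owner}, "
        f"quality SLA {entry.completeness_sla:.0%} complete"
    )


catalogue = [
    CatalogueEntry(
        name="Sales Dashboard Dataset",
        description="Key sales metrics prepared for dashboards",  # illustrative text
        owner="Analytics Team",
        refresh="daily",
        completeness_sla=0.99,
    ),
]

for entry in search(catalogue, "sales"):
    print(summarise(entry))
```

The point of the sketch is that each entry carries the owner, refresh cadence, and quality promise alongside the name, so a user can judge a product without reading technical metadata.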

Operational vs Analytical data products:

To clarify the difference, consider a large UK bank. An operational data product could be something like “Fraud scores API” – a real-time service that gives a fraud risk score for a transaction. It’s a data product because it’s based on data and models, packaged behind an API, and has an owner (the fraud analytics team) responsible for keeping it working efficiently.

An analytical data product example is “Monthly Customer Profitability Dataset” – a compiled dataset that finance and marketing analysts download or query to do their analysis. It might not be real-time but it’s produced with each month’s data, with known definitions and quality checks, and it’s serving the analytical community.

Both types need reliability, but operational data products often need higher uptime and responsiveness (SLAs on latency), whereas analytical data products emphasise correctness and richness of context, with good documentation.

Examples in practice:

For example, a global consumer goods company could implement a data product approach for its sales and marketing data. Instead of each region doing its own data extraction and report building, it could create a standardised data product such as “Global Sales Snapshot”: a data table updated daily containing key metrics by region, channel, and product.

It would “productise” the dataset by assigning a product owner from the central analytics team, automating the pipeline, and setting up a help channel for users. Users would then no longer have to wrangle data themselves; they would have a ready “product” to consume. This reflects a wider trend: a well-governed data product can greatly increase data re-use and efficiency, reducing duplicated work.

On the “buy” side, consider regulated sectors: many UK insurance companies buy data products such as vehicle telematics data or flood risk data to integrate into underwriting. They treat these external datasets as part of their data product ecosystem – for instance, an underwriting data product that merges internal claims history with an external flood risk score per postcode. The interplay of build vs buy is evident here: they build the integration and custom dataset, but buy the specialist data.
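The “build the integration, buy the specialist data” pattern can be sketched in a few lines of Python. Everything here is illustrative: the records, field names, postcodes, and the 0 to 1 flood score scale are made-up assumptions, not a real vendor feed.

```python
# Internal claims history per policy (made-up illustrative records).
claims_history = [
    {"policy_id": "P-001", "postcode": "LS1 4AP", "claims_last_5y": 2},
    {"policy_id": "P-002", "postcode": "YO1 7HH", "claims_last_5y": 0},
    {"policy_id": "P-003", "postcode": "M1 1AE", "claims_last_5y": 1},
]

# Bought flood risk scores keyed by postcode (made-up values, hypothetical 0-1 scale).
flood_risk_by_postcode = {
    "LS1 4AP": 0.12,
    "YO1 7HH": 0.67,
}


def build_underwriting_product(claims, flood_scores):
    """Left-join internal claims rows to the external flood score by postcode.

    Rows with no external score are kept with flood_risk set to None, so gaps
    in the bought data stay visible to consumers rather than dropping policies.
    """
    return [
        {**row, "flood_risk": flood_scores.get(row["postcode"])}
        for row in claims
    ]


underwriting_product = build_underwriting_product(claims_history, flood_risk_by_postcode)
for row in underwriting_product:
    print(row)
```

Keeping unmatched rows is a design choice: it lets the product owner see how well the bought dataset covers the internal portfolio.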

Data products and vendor solutions:

Some vendors market “data products” or pre-built analytics solutions. For example, a vendor might offer a Customer Analytics data model that an enterprise can adopt rather than designing their own from scratch.

Large enterprises often evaluate these to accelerate their analytics projects. The key is to ensure alignment with internal definitions and to avoid vendor lock-in on a critical asset. In some cases, buying a data product like a curated dataset (e.g. a market share database for your industry from a research firm) is a better choice, since building it yourself is impossible. In other cases, if it’s your proprietary operational data, you likely need to build, or at least heavily customise, the data product internally.

Governance best practices for data products:

Each data product should have clearly defined SLAs/SLOs (service level agreements/objectives) that cover factors such as:

data latency (e.g. data will be no more than 24 hours old),
quality metrics (e.g. 98% of records have complete values on critical fields), and
support procedures (who to contact if something looks wrong).
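The latency and quality objectives above can be checked automatically. The following Python sketch uses the 24-hour and 98% thresholds from the list; the field names in `CRITICAL_FIELDS` and the sample records are hypothetical, and a real check would read the refresh timestamp from the pipeline rather than take it as an argument.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)                   # latency objective from the text
MIN_COMPLETENESS = 0.98                         # quality objective from the text
CRITICAL_FIELDS = ("customer_id", "postcode")   # hypothetical field names


def completeness(records, critical_fields):
    """Fraction of records where every critical field has a non-empty value."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in critical_fields)
    )
    return complete / len(records)


def check_slos(records, last_refreshed, now):
    """Return a pass/fail flag for each objective."""
    return {
        "fresh": now - last_refreshed <= MAX_AGE,
        "complete": completeness(records, CRITICAL_FIELDS) >= MIN_COMPLETENESS,
    }


now = datetime(2025, 4, 28, 9, 0, tzinfo=timezone.utc)
records = [
    {"customer_id": "C1", "postcode": "LS1 4AP"},
    {"customer_id": "C2", "postcode": ""},   # missing a critical field
]
result = check_slos(records, last_refreshed=now - timedelta(hours=6), now=now)
print(result)
```

In this made-up sample the data is fresh but only half the records are complete, so the quality objective fails and the product owner would be the person to contact.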

Many organisations incorporate data products into their Data Governance Councils, meaning that any new data product proposed is reviewed for compliance and value, and its performance is periodically reviewed.

Data products also tie closely to data ownership culture: rather than IT owning all data, the business domain that knows the data best owns the product. For example, HR owns the “Employee Master Data Product”, Finance owns “Financial Actuals Data Product”, etc., with IT providing the tooling and platform support.

Build vs Buy decision factors:

Time to value: Buying an external data product or pre-built solution can be faster, but may not fit all needs; building takes longer but can be more precisely tailored.
Uniqueness: If the data or logic is a source of competitive advantage (e.g. a unique algorithm using your data), build it. If it’s generic (everyone uses it, like compliance data), consider buying.
Cost and maintenance: Building in-house means ongoing maintenance costs, whereas bought products externalise some of that effort in exchange for subscription fees.
Integration: An internal build can integrate better with your existing architecture. An external product might come with integration adapters but could introduce silos if not careful.
Expertise: Do you have the skills? If not, buying or partnering might be better to ensure quality. Conversely, building can grow internal expertise in important data domains.
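One way to make these factors explicit is a simple weighted score. The sketch below is an assumption for illustration, not an established method: the -2 to +2 scale, the equal default weights, and the example scores are all invented, and the result should prompt discussion rather than decide anything.

```python
FACTORS = (
    "time_to_value",
    "uniqueness",
    "cost_and_maintenance",
    "integration",
    "expertise",
)


def build_vs_buy(scores, weights=None):
    """Combine per-factor scores into an overall leaning.

    scores:  factor -> integer from -2 (strongly favours buy)
             to +2 (strongly favours build).
    weights: factor -> relative importance; any factor not listed counts as 1.0.
    """
    weights = weights or {}
    missing = [factor for factor in FACTORS if factor not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total = sum(weights.get(factor, 1.0) * scores[factor] for factor in FACTORS)
    if total > 0:
        return total, "lean build"
    if total < 0:
        return total, "lean buy"
    return total, "no clear leaning"


# Hypothetical scoring for a commodity need such as address validation data.
address_validation = {
    "time_to_value": -2,         # buying is much faster
    "uniqueness": -2,            # generic, no competitive advantage
    "cost_and_maintenance": -1,  # vendor absorbs most upkeep
    "integration": 1,            # an in-house build would fit slightly better
    "expertise": -1,             # limited internal skills in this area
}
print(build_vs_buy(address_validation))
```

Raising the weight on a single factor, for example uniqueness for a proprietary algorithm, is how a team could express that one consideration outweighs the rest.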

Overall, effective data product strategies often involve starting small – identifying a high-value dataset, productising it, demonstrating success, and then scaling the approach to more domains.

The cultural change (people thinking in terms of products and “customers” of data) is as important as the technical change. By having data products, organisations avoid each analyst or project repeating the same data wrangling. This fosters a “one source of truth” mentality for each important data domain.

Data products can make data easier to use and trust, by packaging it with the user in mind. And with greater ownership in place, quality and reliability tend to improve (because domain teams ensure their data product is up to scratch).

Adam is Head of Consulting at Audacia, specialising in delivering advice and strategic roadmaps for the delivery of technology projects across engineering, data, AI and cloud.