From Data Lake to Data Mesh

At Bungee Tech, we collect, cleanse, enrich, and index retail data using our patented data collection platform.

Our state-of-the-art ML system creates a retail market map, provides rich insights, and performs autonomous actions. These are exposed using our well-architected SaaS platforms (embracing 6 pillars of architecture and clean code architecture).

We host a wide array of key datasets in the retail vertical:

  • Analytics/actions for global pricing and promotion
  • Global product catalog and assortment
  • Global brand catalog
  • Global category catalog
  • Inventory and availability

These retail domain datasets are rich in business value and are truly big data in nature.

What is a data lake?

A data lake is a centralized, domain-agnostic data persistence architecture that allows you to store structured and unstructured data at scale. It separates storage from compute so that each can scale independently for huge volumes, and it accommodates varied load and access patterns – all at a reduced cost.

What is data mesh?

Data mesh is an industry-leading approach to data management. It defines a clear domain-based design paradigm to group datasets and manage their ownership. Data mesh treats datasets as products, powered by a self-serve data platform and governed by a federated governance mechanism, to scale the data operations of an analytics organization effectively.

Our Data Journey: The Beginning

We started by hosting datasets in a data lake, which provided immediate benefits:

» Flexibility – hosting structured, unstructured, and/or semi-structured datasets in a centralized lake.

» Viability – separating storage and compute to accommodate different usage and load patterns across the organization.

» Availability – executing as a fast-paced startup with incredible cost benefits compared to the previous-generation enterprise data warehouse architecture, solutions, and tools.

We started with a highly decentralized execution model that helped us move fast and roll out many advanced capabilities in a short period of time.

But it came with a few problems: data duplication, source proliferation, divergence in data quality and integrity between related sources, and domain-agnostic data ownership. We quickly identified these issues and consciously created focus groups that followed a loose domain ownership model. The split of these focus groups was based on the “data pipeline architecture/organization model”.

As highlighted in the data mesh paper, the above pipeline architecture/org structure might appear to be an effective ownership model initially. However, in practice, all the focus groups had to work together to launch even very small new functionality. This created a siloed, hyper-specialized data platform team with very little understanding of the source domains that generate the data. The team also lacked the domain expertise of the analytics consumption teams it served. This limited our ability to achieve our ideal speed and scale.

Data lakes are no longer the centerpiece of the overall architecture of a mature analytics ecosystem. The data lake architecture fails to gracefully accommodate changes in the data landscape. This leads to a proliferation of dataset sources within the organization and slows the response to change.

The Present & The Future

To provide a truly decentralized architecture that avoids the above-mentioned issues, we concluded that data mesh is the right data architectural and organizational pattern. Data mesh fits Bungee Tech’s needs in the short and long term.

“At Bungee Tech, we’re happy to have business and tech alignment in our core operating model. We have a data-oriented strategy where we are convinced beyond doubt that quality data, ML, and advanced analytics form our strategic differentiator in the market,” explains the company’s CTO, Venkat PK. He continues that the company’s executives are “spearheading data maturity models within the organization and have a long-term commitment to invest in advanced architectural/organization transformations like data mesh in the right form and shape.”

4 Pillars of Data Mesh @ Bungee Tech

1. Domain – Domain-oriented data decomposition and ownership

The entire data ecosystem is grouped and tagged as source-oriented domain data, consumer-oriented domain data, or shared domain data. In the process, we established domain-based data ownership. There are clear rules on who should own any new dataset requirement in the organization. This process stops inefficient dataset proliferation in the organization.
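The ownership rule described above can be sketched as a small registry. This is a minimal illustration, not Bungee Tech's implementation: the names `DomainKind`, `Domain`, and `DatasetRegistry`, and the example domains and teams, are all hypothetical. The sketch assumes each dataset has exactly one owning domain and that a second domain trying to claim it is rejected, which is one way to prevent dataset proliferation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class DomainKind(Enum):
    """The three groupings named in the text."""
    SOURCE = "source-oriented"
    CONSUMER = "consumer-oriented"
    SHARED = "shared"


@dataclass(frozen=True)
class Domain:
    name: str
    kind: DomainKind
    owner: str  # hypothetical owning team


@dataclass
class DatasetRegistry:
    """Maps each dataset name to exactly one owning domain."""
    _owners: Dict[str, Domain] = field(default_factory=dict)

    def register(self, dataset: str, domain: Domain) -> None:
        existing = self._owners.get(dataset)
        # Reject a second domain claiming a dataset that is already owned.
        if existing is not None and existing != domain:
            raise ValueError(
                f"{dataset!r} is already owned by domain {existing.name!r}"
            )
        self._owners[dataset] = domain

    def owner_of(self, dataset: str) -> Domain:
        return self._owners[dataset]


# Hypothetical domains, for illustration only.
catalog = Domain("catalog", DomainKind.SOURCE, "catalog-team")
pricing = Domain("pricing", DomainKind.CONSUMER, "pricing-analytics")

registry = DatasetRegistry()
registry.register("global_product_catalog", catalog)
```

Registering `global_product_catalog` again under `pricing` would raise `ValueError`, so a duplicate copy of the dataset cannot quietly acquire a second owner.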

2. Data as a Product – Data and product thinking convergence

Data is treated as a product and given the respect that implies. Each dataset is assigned to the proper domain using a clear rule and is addressable and discoverable. Its structure, both logical and physical, must be defined with utmost discipline. Lineage of data and transformation rules must be defined and maintained. Quality rules, thresholds for breaches of these rules, and related alarms become first-class citizens to make the data truthful and trustworthy. End-to-end operational aspects, such as changes in the statistical shape of the data, observability, freshness, retention, and DevOps, must be defined and structured for every data product. Security is also mandated: data must be properly classified and treated accordingly, for example with encryption and global access control.

With this, the key focus at Bungee Tech shifts to the data within a domain. The pipelines become the data product’s internal implementation.
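To make "quality rules, thresholds, and alarms" concrete, here is a minimal sketch of how a data product might declare them. Everything in it is an assumption for illustration: the `QualityRule` type, the `evaluate` function, the rule names, and the sample rows are hypothetical and do not describe Bungee Tech's platform. Each rule carries a predicate and a maximum tolerated failure rate, and a rule is reported as breached when the observed failure rate exceeds that threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class QualityRule:
    name: str
    predicate: Callable[[Row], bool]  # returns True when a row passes
    max_failure_rate: float           # breach threshold, from 0.0 to 1.0


def evaluate(rules: List[QualityRule], rows: List[Row]) -> List[str]:
    """Return the names of rules whose failure rate exceeds their threshold."""
    breached = []
    for rule in rules:
        failures = sum(1 for row in rows if not rule.predicate(row))
        rate = failures / len(rows) if rows else 0.0
        if rate > rule.max_failure_rate:
            breached.append(rule.name)
    return breached


# Hypothetical rules for a pricing data product.
rules = [
    QualityRule("price_present", lambda r: r.get("price") is not None, 0.0),
    QualityRule("price_positive", lambda r: (r.get("price") or 0) > 0, 0.5),
]
rows = [{"sku": "A", "price": 1.99}, {"sku": "B", "price": None}]

alarms = evaluate(rules, rows)  # names of breached rules, to be routed to alerting
```

Keeping the threshold next to the rule means the domain owner decides how strict each check is, while the evaluation logic stays generic.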

3. Data Platform – Data and self-serve platform design convergence

At the physical layer, data mesh’s self-serve data platform provides access to scalable polyglot data storage, data product schemas, data pipeline declaration and orchestration, data product lineage, compute and data locality, etc.

At the logical level, the data mesh paper proposes a multi-plane architecture that includes planes such as the data infrastructure provisioning plane, the data product developer experience plane, and the data mesh supervision plane, to name a few.

At Bungee Tech, we use our existing cloud service capabilities to drive the platform aspect of this transformation for now. But, in the months to come, based on our experience in the transformation process, we are motivated to make the right investments at a platform level to facilitate this transformation without any friction.

4. Federated computational governance – Make decentralization work efficiently

Data mesh completely decentralizes the governance aspect of data as a product. It relies on federated custodianship of data governance by domain owners. The domain owners define how to model data quality, data security/monitoring, polysemes, reliability, and operational excellence of data as a product.

Despite such localized decision-making and autonomy, domain owners need to comply with the standards defined by the global federated governance team and automated by the platform.
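One way to picture a global standard that the platform automates is a check run against every data product descriptor. The sketch below is purely illustrative: the required fields, the `restricted` classification, the encryption rule, and the `violations` function are assumptions, not a standard published by Bungee Tech or by the data mesh paper.

```python
from typing import Dict, List

# Hypothetical global standard: fields every data product must declare.
REQUIRED_FIELDS = ("owner", "classification", "retention_days")


def violations(product: Dict[str, object]) -> List[str]:
    """List the ways a data product descriptor breaks the global standard."""
    found = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in product
    ]
    # Assumed global rule: restricted data must be encrypted.
    if product.get("classification") == "restricted" and not product.get("encrypted", False):
        found.append("restricted data must be encrypted")
    return found


# A descriptor authored by a (hypothetical) domain owner.
pricing_product = {
    "owner": "pricing-analytics",
    "classification": "restricted",
    "retention_days": 90,
}

problems = violations(pricing_product)
```

Domain owners stay free to choose values such as the retention period. The check only enforces that the globally agreed fields and rules are present.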

Bungee Tech has created domain-based ownership and key points of contact in each of these domains to put together a global federated governance team.

We’ll keep maturing and transforming these pillars.

The Data Mesh Pitfall to Address

Data mesh tries to address most of the pitfalls associated with decentralized architectures via the power of a mature data platform. But building and embracing such platform capabilities can take some time. Meanwhile, decentralizing specialized roles (data engineers, data scientists, etc.) by domain limits communication and coordination within those specialized job families.

It also reduces opportunities for collaborative learning and for structuring a proper growth path for these specialized roles. Without organizational maturity, this could eventually lead to poor data standards and slow the pace of execution on data-related problems. Bungee Tech is cognizant of these key issues with data mesh when it’s not backed by a full-fledged data platform and is working on an operating model to address them.
