What I Learned Building DataOps for a Fortune 20 Retailer
Strategy decisions, execution details, and why DataOps maturity is your AI readiness. From cloud platform commitment and transformation layers to cross-department visibility and the BI layer as an AI launchpad.
TL;DR
Strategic bet: One platform (GCP), one future — BigQuery, Vertex AI, Dataplex, Looker. Migration from Databricks to align data and ML on shared primitives.
Transformation layer: Version-controlled, tested, AI-ready pipelines so the same curated data powers dashboards and ML feature stores.
DataOps → AI Ops: SLO-based observability for pipelines became the blueprint for ML monitoring and feature drift. Your DataOps maturity is your AI readiness.
Cross-department visibility: Enterprise catalog, semantic layer, and lineage so teams can discover, trust, and build on each other’s data — the enabler for real enterprise AI.
BI as AI launchpad: Looker’s semantic model gives the organization one governed language; that’s what makes natural-language analytics and GenAI answers trustworthy.
Data platform architecture: source systems through governance to BI and AI/ML outcomes, with SLO-based observability and cross-system lineage.
There's a moment early in every large platform engagement where you see the full picture for the first time. The ambition of the organization becomes clear, and so does the path to get there. For me, that moment came in week two. I was sitting in a virtual war room with a dozen engineers, and someone walked us through how the supply chain team had built their own ingestion pipeline to keep up with distribution center volume during peak season. It was resourceful. It was fast. And it told me exactly what this organization was capable of when given the right platform underneath them.
Data platforms at Fortune 20 scale are fascinating because the talent is already there. The business knowledge runs deep. The transaction volumes are massive: millions of events daily across thousands of locations. What these organizations need is a foundation that matches their ambition. A foundation that serves today's dashboards while positioning every team for the AI and machine learning capabilities that are already reshaping retail.
I led this engagement end to end. Cloud migration architecture, enterprise cataloging, observability, lineage, BI, and the AI/ML enablement layer that tied it all together. I want to share what the journey actually looked like. The strategy decisions, the execution details, and the moments where getting the foundation right made everything downstream possible.
🎯 The Strategic Bet: One Platform, One Future
The first and most consequential decision was the cloud platform commitment. This retailer had made a deliberate, enterprise-wide investment in Google Cloud. Compute, storage, AI/ML, everything. It was a bold, clear-eyed move, and it shaped every architectural decision that followed.
When your entire infrastructure direction converges on GCP, there's a tremendous opportunity to simplify. BigQuery as the analytical engine. Vertex AI as the machine learning platform. Dataflow for stream processing. Pub/Sub for event-driven architecture. Dataplex for governance. Looker for BI. Each component designed to work together natively, which means less integration overhead and more time spent on what actually matters: building AI and analytics capabilities that move the business forward.
The migration from Databricks was part of this consolidation. Databricks is a strong product, and the team had done impressive work on it. But the strategic direction called for a unified gravity well. One ecosystem where the data engineering team, the ML engineering team, the analytics team, and the governance team were all building on shared primitives and speaking the same architectural language. That alignment is what makes AI operations possible at scale. You simply cannot operationalize machine learning models when the training data lives in one world and the serving infrastructure lives in another.
Strategic clarity first. Execution to match.
Bottom line: One cloud, one stack — GCP end to end. That alignment is what makes AI operations possible at scale.
🔧 Building the Transformation Layer for AI-Ready Data
Something that shaped the entire engagement was the opportunity to build a modern transformation layer from the ground up. The existing SQL-based pipelines had served the organization well for years. POS data flowed, back-office reconciliation ran, supply chain numbers landed where they needed to. That foundation gave us something to build on.
The evolution was about making data AI-ready. Machine learning models are only as good as the features they consume, and generative AI applications need clean, well-documented, trustworthy data to produce meaningful results. That meant building a transformation layer with version control, automated testing, modular design, and full documentation, so that every dataset flowing into a model training pipeline or a retrieval-augmented generation workflow could be traced, validated, and trusted.
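To make the "version control plus automated testing" point concrete, here is a minimal sketch of the kind of pipeline-level quality check such a transformation layer enforces. The table and column names (`member_id`, `channel`) are hypothetical illustrations, not the client's schema; in practice these checks ran as part of the deployment pipeline rather than as standalone functions.

```python
# Minimal sketch of transformation-layer data quality checks.
# Column names and allowed values are invented for illustration.

def check_not_null(rows, column):
    """Fail if any row is missing a value in `column` (e.g. a feature key)."""
    bad = [r for r in rows if r.get(column) is None]
    return {"check": f"not_null:{column}", "passed": not bad, "failures": len(bad)}

def check_accepted_values(rows, column, allowed):
    """Fail if `column` contains a value outside the governed vocabulary."""
    bad = [r for r in rows if r.get(column) not in allowed]
    return {"check": f"accepted_values:{column}", "passed": not bad, "failures": len(bad)}

def run_suite(rows):
    """Run the full check suite; a curated dataset ships only if all pass."""
    return [
        check_not_null(rows, "member_id"),
        check_accepted_values(rows, "channel", {"pos", "online", "dc"}),
    ]
```

The design point is that the same suite gates both the executive dashboard refresh and the feature-store load, which is what makes the curated data trustworthy for ML training downstream.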
We designed the transformation architecture as part of the broader BigQuery migration, establishing patterns that served both traditional analytics and ML feature engineering simultaneously. The same curated datasets that powered executive dashboards could feed feature stores for demand forecasting models. The same data quality tests that ensured accurate financial reporting also guaranteed the integrity of training data for AI applications.
This is what I mean by foundations. When you build the transformation layer with AI in mind from day one, you create optionality. Every new ML use case, whether it's pricing optimization, inventory prediction, or personalized member experiences, becomes an execution challenge rather than a data infrastructure challenge. That's exactly where you want to be.
Insight: One transformation layer for both dashboards and ML. Version control, testing, and documentation from day one so every dataset can be traced and trusted.
🌉 Why DataOps Is the Bridge to AI Operations
This is the part I care most about, because it's where the industry is heading and where this engagement crystallized the thinking for me.
DataOps and AI operations are not separate disciplines. They are the same discipline at different points on a maturity curve. The observability practices you build for data pipelines (freshness monitoring, quality checks, SLO-based alerting) are the same practices you need for ML model monitoring, feature drift detection, and inference pipeline reliability. The lineage you build for regulatory compliance is the same lineage you need for model explainability. The catalog you build for data discovery is the same catalog that helps ML engineers find the right training datasets.
We designed SLO-based DataOps observability across every major retail function: point-of-sale, back-office operations, supply chain, and distribution center integrations. Each function had its own data freshness requirements, its own quality thresholds, and its own definition of "trustworthy." The POS team needed near-real-time reconciliation. The supply chain team needed daily accuracy within tight tolerances. The distribution center integrations had to handle volume spikes during seasonal peaks gracefully.
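A per-function freshness SLO of the sort described above can be sketched as follows. The dataset names and thresholds here are illustrative assumptions, not the engagement's actual values; the real implementation sat on the platform's monitoring stack rather than in application code.

```python
# Hedged sketch: SLO-style freshness checks with per-function thresholds.
# Dataset names and SLO values are illustrative, not the client's real numbers.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = {
    "pos_transactions": timedelta(minutes=15),   # near-real-time reconciliation
    "supply_chain_daily": timedelta(hours=26),   # daily load plus a small buffer
}

def freshness_status(dataset, last_loaded_at, now=None):
    """Return PASS/BREACH for one dataset against its freshness SLO."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_loaded_at
    slo = FRESHNESS_SLO[dataset]
    return {
        "dataset": dataset,
        "lag_minutes": round(lag.total_seconds() / 60, 1),
        "slo_minutes": slo.total_seconds() / 60,
        "status": "PASS" if lag <= slo else "BREACH",
    }
```

Keeping the SLO table separate from the check logic is what lets each retail function own its own definition of "fresh enough" without forking the observability code.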
The insight that changed the game: those same SLO patterns became the blueprint for AI operations. When we later scoped ML model deployment for demand forecasting, the observability framework was already in place. We could monitor feature freshness with the same tools we used for pipeline freshness. We could alert on data drift using the same thresholds we had calibrated for data quality. The DataOps investment paid forward into every AI workload that followed.
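The "same thresholds, new workload" idea can be illustrated with a standard drift statistic. This sketch uses the Population Stability Index with common rule-of-thumb cutoffs (0.1 warn, 0.25 breach); those numbers are textbook defaults, not the thresholds we calibrated on the engagement.

```python
# Hedged sketch: reusing the DataOps alerting pattern for feature drift.
# PSI cutoffs below are common rules of thumb, not the engagement's values.
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

def drift_alert(expected_fracs, actual_fracs, warn=0.1, breach=0.25):
    """Grade drift the same way a data quality SLO is graded: OK/WARN/BREACH."""
    value = psi(expected_fracs, actual_fracs)
    level = "BREACH" if value >= breach else "WARN" if value >= warn else "OK"
    return {"psi": round(value, 4), "level": level}
```

The point is structural: the alert shape (metric, threshold, severity) is identical to a pipeline quality check, so the feature drift monitor slots into the same paging and dashboarding the DataOps team already runs.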
If you are a CDO thinking about AI strategy, my strongest advice is this: your DataOps maturity is your AI readiness. Full stop.
Why it matters: Your DataOps maturity is your AI readiness. SLO-based observability for pipelines becomes the same framework for ML monitoring and feature drift.
👁️ Seeing Data Across Departments: The Hardest and Most Important Problem
Let me talk about the challenge that sits at the center of every enterprise data transformation, and especially every AI initiative: achieving true visibility across departments.
A Fortune 20 retailer has data flowing through dozens of functional areas. Merchandising sees products. Supply chain sees movement. Finance sees margins. Store operations sees labor and throughput. Distribution centers see logistics. Each department has built deep expertise in its own data, its own metrics, its own definitions. That specialization is a genuine strength. It is what makes each function excellent at what it does.
The AI opportunity, though, lives in the connections between those domains. Demand forecasting gets dramatically better when it can see supply chain constraints alongside POS velocity alongside promotional calendars. Shrinkage analysis becomes predictive when it can correlate inventory movement patterns with distribution center throughput and store-level receiving data. Generative AI applications (things like natural language interfaces to business data, automated insight generation, and intelligent alerting) need to draw from multiple domains simultaneously to produce answers that are actually useful.
This is exactly where strategy has to come first. Cross-departmental visibility requires an intentional operating model: shared metric definitions, agreed-upon data ownership, governed semantic layers, and executive sponsorship that makes cross-functional data sharing a priority.
We built that operating model alongside the technology. Dataplex served as the enterprise catalog, with every dataset registered, every table owned, every PII column tagged, every quality expectation codified. Looker's semantic modeling layer provided shared business definitions that every department consumed. Cross-system lineage connected the dots from source to dashboard to ML model. And the governance framework made participation the path of least resistance, because cataloging and quality checks were built into the deployment pipeline itself.
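Cross-system lineage, at its core, is a dependency graph you can walk in either direction. This sketch shows the impact-analysis half of that idea; the asset names are invented, and in production the graph was populated from the platform's metadata rather than hand-wired like this.

```python
# Hedged sketch: lineage as a dependency graph, asset names invented.
# "What breaks downstream if this source changes?" is a graph traversal.

LINEAGE = {
    "pos_raw": ["pos_curated"],
    "pos_curated": ["exec_dashboard", "demand_features"],
    "demand_features": ["demand_forecast_model"],
}

def downstream(asset, graph=LINEAGE):
    """Return every asset reachable downstream of `asset` (impact analysis)."""
    seen, stack = set(), list(graph.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return sorted(seen)
```

Walking the same graph upstream answers the explainability question instead: which sources, and which quality checks, stand behind this model's predictions.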
For the first time, teams across functions could see each other's data. Discover it, understand it, trust it, and build on it. That visibility is what makes enterprise AI possible. It is what makes generative AI useful rather than superficial. And it only happens when you lead with strategy and follow with execution.
Insight: Cross-department visibility is the enabler for real enterprise AI. Catalog, semantic layer, lineage, and governance make it the path of least resistance.
📊 The BI Layer as an AI Launchpad
We deployed Looker as the primary BI layer, and I want to reframe how people think about that decision, because it goes well beyond a reporting tool migration.
The work was deeply collaborative. We sat with business users across every retail function (POS analytics, supply chain visibility, distribution center throughput, back-office operations) and mapped how they thought about their data. What decisions were they making? What metrics drove action? What time horizons mattered? That understanding became the blueprint for Looker's semantic model, which gave the entire organization a shared, governed language for business metrics.
A well-built semantic layer is the single most important enabler of generative AI in the enterprise. When an LLM can query a curated, governed semantic model, where "shrinkage" means one thing and one thing only, where "distribution center throughput" has a precise, agreed-upon definition, you can build natural language interfaces that produce trustworthy answers. The business user asks a question in plain English, the AI translates it into a query against the semantic layer, and the result is grounded in the same definitions the CFO relies on.
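A toy sketch makes the mechanism visible. The metric definitions and the naive keyword matcher below are illustrative assumptions only; in a real deployment the LLM targets the governed semantic model (LookML, in this case) rather than hand-written SQL, and the matcher is the model itself. What matters is the shape: the AI resolves a question to exactly one governed definition or asks for clarification, and never invents its own metric.

```python
# Hedged sketch: NL analytics grounded in a governed semantic layer.
# Metric names, SQL, and the keyword matcher are invented for illustration.

SEMANTIC_LAYER = {
    "shrinkage": {
        "sql": "SELECT SUM(book_qty - counted_qty) FROM inventory_counts",
        "definition": "Book inventory minus counted inventory, in units",
    },
    "dc throughput": {
        "sql": "SELECT COUNT(*) FROM dc_shipments WHERE shipped_date = CURRENT_DATE",
        "definition": "Cartons shipped per day per distribution center",
    },
}

def resolve_metric(question):
    """Map a question to exactly one governed metric, or refuse and clarify."""
    q = question.lower()
    hits = [name for name in SEMANTIC_LAYER if name in q]
    if len(hits) != 1:
        return {"status": "clarify", "candidates": hits}
    metric = SEMANTIC_LAYER[hits[0]]
    return {"status": "ok", "metric": hits[0], "sql": metric["sql"]}
```

The refusal path is the governance: an answer grounded in the semantic layer uses the CFO's definitions, and an ambiguous question gets a clarifying follow-up instead of a hallucinated metric.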
That is the future of enterprise analytics. And it is an architecture decision you make two years before the generative AI use case arrives. The organizations that will move fastest into AI-powered decision-making are the ones building clean semantic foundations right now. This engagement was a proof point.
Takeaway: A governed semantic layer is the single most important enabler of GenAI in the enterprise. Build it before the use case arrives.
👥 The Human Side of Platform Evolution
I want to close with something that matters deeply to me, because it is the part of every engagement that determines whether the technology investment actually lands.
When a Fortune 20 retailer evolves from one cloud platform to another, the technology migration follows a playbook. We ran this one using the same methodology I have applied across AWS and Azure programs. Workload assessment, dependency mapping, cutover sequencing, validation. It is rigorous, it is repeatable, and it works.
The part that requires the most care is the people side. Engineers who have built deep expertise on one platform are learning a new one. Analysts who knew exactly where to find every metric are navigating a fresh interface. Data governance leads who had built hard-won adoption for one catalog's taxonomy are now championing a new system.
That transition deserves respect and real investment. The engineers who know the existing systems most deeply carry the most valuable institutional knowledge, and they are the ones who will ultimately make the new platform sing. Giving people room to learn, celebrating early wins on the new stack, pairing experienced engineers with cloud-native specialists: those investments in people pay back tenfold.
The technology converges faster than the culture does. The organizations that succeed are the ones where leadership treats the skills evolution with the same strategic seriousness as the architecture evolution. Strategy and execution, always together.
Why it matters: Invest in people as seriously as in architecture. Technology converges faster than culture; leadership has to treat both as strategy.
🧱 The Foundation Is the Strategy
Building DataOps at Fortune 20 retail scale taught me that every AI ambition lives or dies on the data foundation underneath it. The generative AI wave is real. The ML operations opportunity is enormous. The potential for AI-powered retail, from smarter demand forecasting to personalized member experiences to intelligent supply chain optimization to natural language analytics, is within reach for every major retailer.
But only if the foundation is right. Only if data is visible across departments, trustworthy by design, observable at every layer, and governed in a way that enables rather than constrains. Only if DataOps and AI operations are treated as one continuous discipline rather than separate initiatives on separate roadmaps.
Every architectural decision we made (Dataplex as the enterprise catalog, Looker as the BI and semantic layer, cross-system lineage, SLO-based observability, BigQuery and Vertex AI as the unified compute and ML platform) was in service of that foundation. The strategy came first. The execution followed. And the AI capabilities that the organization is now building are possible because the foundation was designed to support them from day one.
The most important AI investment a CDO can make today is not a model. It is the data platform that model will depend on tomorrow.
The specifics of this engagement are generalized to respect confidentiality. The principles are universal.
The path from DataOps to AI operations follows the same arc at every enterprise bold enough to walk it.