The Other Catalog War: Governance Platforms and the Two-Layer Architecture

Polaris and Unity Catalog are fighting over the technical catalog. But the governance layer above them is a separate battle, and the one where most enterprises are actually spending money. This part covers Atlan, Alation, Collibra, OpenMetadata, and DataHub, plus a framework for deciding how much to invest.
Nidhi Vichare, April 16, 2026
The Catalog Wars, Part 3 of 4

TL;DR. You Are Making Two Catalog Decisions, Not One.

The technical catalog war (Polaris vs. Unity vs. Glue) and the governance catalog war (Atlan vs. Alation vs. Collibra vs. OpenMetadata vs. DataHub) are entirely different markets with different tools, different budgets, and different buying decisions.

Polaris does not answer "who owns this table?" Unity Catalog partially does. Glue does not even try. The governance layer sits above the technical catalog and answers the questions your business users, compliance teams, and data stewards actually ask: What data do we have? Is it trustworthy? Where did it come from? Does it comply with GDPR? If you are making a catalog decision in 2026, you are actually making two decisions. Treating them as one is how you end up with a governance gap that satisfies no one.

  • $5.25B: Collibra valuation
  • 11.8K: DataHub GitHub stars
  • 30%: active metadata adoption by 2026

The Two-Layer Architecture

The modern enterprise data stack has two distinct catalog layers. Confusing them is the most common architectural mistake I see in enterprise data platform reviews.

[Figure: The Two-Layer Catalog Architecture]

Layer 1: The Technical Catalog handles physical metadata. Where is this table? What is its schema? What are its partition statistics? Which engines can access it? This is Polaris, Unity Catalog, Glue, BigLake Metastore, Hive Metastore. Most of these implement or expose the Iceberg REST Catalog spec. They serve query engines. Their users are machines and data engineers.

Layer 2: The Governance Catalog handles business metadata. Who owns this table? What does the "revenue" column actually mean? Where did this data come from, through which transformations, to which dashboards? Is it fresh? Is it accurate? Does accessing it require approval? This is Atlan, Alation, Collibra, OpenMetadata, DataHub. Their users are analysts, stewards, compliance officers, and business teams.

The relationship between the two layers is bidirectional:

  • Upward flow: Technical catalogs feed raw metadata (schemas, tables, partitions, statistics) into governance catalogs via connectors
  • Downward flow: Governance catalogs push enriched context (business descriptions, classification tags, quality scores, access policies) back into technical catalogs
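
The two flows can be sketched in a few lines. This is a minimal illustration, not any vendor's connector: the `TechnicalTable` and `GovernedAsset` shapes, the `sync_up` and `sync_down` helpers, and the sample table are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TechnicalTable:
    """Physical metadata as a technical catalog might hold it (hypothetical shape)."""
    name: str
    columns: Dict[str, str]                              # column name -> physical type
    tags: Dict[str, str] = field(default_factory=dict)   # context pushed down from governance


@dataclass
class GovernedAsset:
    """Business metadata as a governance catalog might hold it (hypothetical shape)."""
    name: str
    columns: Dict[str, str]
    owner: Optional[str] = None
    classification: Optional[str] = None


def sync_up(table: TechnicalTable, governed: Dict[str, GovernedAsset]) -> GovernedAsset:
    """Upward flow: copy the schema into the governance catalog, keeping existing curation."""
    asset = governed.get(table.name) or GovernedAsset(table.name, {})
    asset.columns = dict(table.columns)
    governed[table.name] = asset
    return asset


def sync_down(asset: GovernedAsset, table: TechnicalTable) -> None:
    """Downward flow: push enriched context back as tags on the technical table."""
    if asset.owner:
        table.tags["owner"] = asset.owner
    if asset.classification:
        table.tags["classification"] = asset.classification


orders = TechnicalTable("sales.orders", {"order_id": "bigint", "email": "string"})
governed: Dict[str, GovernedAsset] = {}

asset = sync_up(orders, governed)      # schema flows up
asset.owner = "finance-data-team"      # a steward curates it in the governance layer
asset.classification = "PII"
sync_down(asset, orders)               # context flows back down
print(orders.tags)
```

Real connectors also have to handle schema drift, deletions, and conflicting edits on both sides, which this sketch leaves out.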

Atlan syncs business context into Unity Catalog. Collibra pushes access policies to AWS Lake Formation. DataHub can even function as an Iceberg REST Catalog itself (beta). The two layers are complementary infrastructure, not competing products.

The vendors that sit in both layers (Databricks with Unity Catalog's Business Semantics and lineage features, Snowflake with Horizon plus Polaris) are attempting to collapse the two layers into one. For some organizations, that works. For most large enterprises with multi-platform, multi-cloud estates, the governance layer remains a separate tool, a separate buying decision, and a separate budget line.


The Governance Catalog Contenders

The Commercial Three

01 Atlan

The cloud-native challenger. Founded in 2019, Atlan describes itself as an "active metadata platform." Metadata does not just sit in a catalog but actively drives automation, policies, and workflows. Atlan earned Gartner Magic Quadrant Leader status in both Metadata Management (2025) and Data & Analytics Governance (2026, advancing from Visionary to Leader in one year). Forrester named it a Leader with highest possible scores in 15 of 24 criteria.

It deploys in roughly 3 months, connects to 100+ data sources, and runs as cloud-agnostic SaaS across AWS, Azure, and GCP. Funding: $206 million at a $750 million valuation.

Key differentiator: Speed and automation. Its weakness: smaller revenue base than competitors and less battle-tested in heavily regulated industries.

02 Alation

The discovery champion. Founded in 2012, Alation pioneered behavioral intelligence: tracking which datasets analysts actually query to identify trusted, certified data assets. Gartner has named it a Leader five consecutive times. It recently pivoted aggressively toward agentic AI, acquiring Numbers Station AI in May 2025 to build autonomous agents for structured data workflows.

Enterprise AI outputs reportedly became 30-60% more accurate when backed by Alation's governed metadata context. It supports 70+ connectors and offers bidirectional metadata sync with Unity Catalog. Pricing starts at roughly $198,000/year for 25 users.

Key differentiator: The most intuitive discovery UX in the market. Its weakness: deployment takes about 5 months, and deep governance and end-to-end lineage lag behind competitors.

03 Collibra

The governance heavyweight. Founded in 2008, Collibra is the longest-tenured commercial data catalog, valued at $5.25 billion, with roughly $596 million in total funding. Forrester says Collibra "sets the standard" for data marketplace capabilities. It acquired Raito in June 2025 for data access governance, enabling unified access policy enforcement across Snowflake, Databricks, AWS, Azure, and GCP.

It offers the most mature stewardship workflow engine in the market: configurable role-based responsibilities, classification rules, and compliance policies in a single system. Its AI governance features include ISO 42001 compliance and EU AI Act assessment tooling.

Key differentiator: Deepest compliance and stewardship capabilities. Its weakness: complexity. Deployments take 6-12 months. TCO runs significantly higher than base licensing when you include professional services and customization. It is the right tool for regulated enterprises and the wrong tool for agile teams.

The Open-Source Two

04 OpenMetadata

The all-in-one open-source option. Created by the team behind Apache Atlas, Apache Kafka, and Uber's data platform (Suresh Srinivas and Sriharsha Chintalapani). Open-sourced in 2021 under Apache 2.0. Its key differentiator is built-in native data quality: close to 30 table and column tests, no-code test creation, alerting, and resolution workflows. No other open-source catalog offers this without external tools.

Architecture is deliberately simplified: MySQL/PostgreSQL plus Elasticsearch, no graph database, no Kafka dependency. That simplicity makes it operationally lighter than DataHub. Native connectors to Unity Catalog, AWS Glue, Iceberg, and 80+ other sources. Roughly 10,800 GitHub stars, 433 contributors. The commercial company Collate offers a managed version and raised a $10 million Series A in July 2025.

Key differentiator: Built-in data quality and simpler operations. Best for mid-market teams that want an all-in-one architecture without the operational overhead of Kafka and graph databases.

05 DataHub

The largest open-source catalog community. Originally built at LinkedIn, open-sourced in 2020, commercially backed by Acryl Data ($65 million total funding). It has 11,800 GitHub stars, 733 contributors, and a 10,400-member Slack community. Its landmark feature: DataHub can function as an Iceberg REST Catalog itself (beta), blurring the boundary between governance catalog and technical catalog.

It supports push-based real-time metadata capture via Kafka, enabling event-driven automation that pull-based tools cannot match. The trade-off: operational complexity. DataHub requires Kafka, a graph database (Neo4j or JanusGraph), Elasticsearch, and MySQL/PostgreSQL. That is 4-5 infrastructure components versus OpenMetadata's 2-3.

Key differentiator: Real-time event-driven metadata and the Iceberg REST Catalog capability that blurs the governance/technical catalog boundary. Best for organizations that need the largest community and real-time metadata flows.

The Legacy and Niche

Apache Atlas survives in Hadoop environments (roughly 5.7% market mindshare) but is not recommended for greenfield deployments. Amundsen (Lyft) is effectively dormant; last meaningful release was August 2024. Marquez (WeWork/Datakin) lives on as the reference implementation of the OpenLineage standard, which has become the de facto industry standard for lineage interoperability, adopted by Airflow, Spark, dbt, Google Dataproc, and Microsoft.


What Governance Catalogs Provide That Technical Catalogs Do Not

This is the section that answers the "why invest?" question.

Business Glossary

"Revenue" should mean the same thing whether it lives in Snowflake, Databricks, or a CSV file. Technical catalogs do not have business glossaries. Governance catalogs map standardized business terms to technical assets across platforms.
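
At its simplest, that mapping is one definition pointing at many physical locations. The sketch below is illustrative only: the `GLOSSARY` structure, the `assets_for` helper, the definition text, and every asset path are made up for the example.

```python
# One business term, one definition, many physical assets (all names hypothetical).
GLOSSARY = {
    "revenue": {
        "definition": "Recognized revenue net of refunds, in USD.",
        "assets": [
            ("snowflake", "marts.fct_revenue.amount_usd"),
            ("databricks", "finance.gold.revenue.net_amount"),
            ("file", "exports/revenue_2025.csv#net_revenue"),
        ],
    },
}


def assets_for(term, platform=None):
    """Return the asset paths mapped to a term, optionally filtered by platform."""
    entry = GLOSSARY.get(term.lower())
    if entry is None:
        return []
    return [path for plat, path in entry["assets"] if platform in (None, plat)]


print(assets_for("Revenue"))               # every platform
print(assets_for("Revenue", "snowflake"))  # one platform
```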

Cross-Platform Lineage

Technical catalogs track lineage within their ecosystem. Governance catalogs stitch lineage end-to-end: S3 to Airflow to Snowflake to dbt to Tableau to Slack. That full lineage is what compliance teams need for GDPR data subject requests and what analysts need for impact analysis before changing a schema.
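
Impact analysis over stitched lineage is a graph walk. In this rough sketch the events are a simplified shape loosely modeled on OpenLineage's job, inputs, and outputs fields (real OpenLineage run events carry more, such as run IDs and facets), and the job and dataset names are hypothetical.

```python
from collections import defaultdict, deque

# Simplified per-tool lineage events; names are hypothetical.
EVENTS = [
    {"job": "airflow.load_orders",
     "inputs": ["s3://raw/orders"], "outputs": ["snowflake.raw.orders"]},
    {"job": "dbt.fct_revenue",
     "inputs": ["snowflake.raw.orders"], "outputs": ["snowflake.marts.fct_revenue"]},
    {"job": "tableau.extract",
     "inputs": ["snowflake.marts.fct_revenue"], "outputs": ["tableau.revenue_dashboard"]},
]


def build_graph(events):
    """Stitch events from different tools into one graph: input -> set of outputs."""
    graph = defaultdict(set)
    for event in events:
        for src in event["inputs"]:
            graph[src].update(event["outputs"])
    return graph


def downstream(graph, start):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


graph = build_graph(EVENTS)
# What breaks if the raw S3 data changes?
print(downstream(graph, "s3://raw/orders"))
```

No single tool in the chain emits the whole path. The end-to-end view only exists once events from Airflow, dbt, and the BI layer are joined on shared dataset names, which is the part governance catalogs take on.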

Data Quality and Observability

Governance catalogs monitor freshness, completeness, and validity, and run anomaly detection, across all data sources, not just one platform. OpenMetadata has this built in. DataHub aggregates quality signals from Monte Carlo, Soda, and Great Expectations. Commercial tools integrate with all of the above.
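
Two of those dimensions, completeness and freshness, reduce to very small checks. The sketch below is a generic illustration and not the test syntax of OpenMetadata or any other tool; the function names, sample rows, and thresholds are assumptions.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(last_loaded_at, now, max_age):
    """True when the most recent load is within the allowed age."""
    return now - last_loaded_at <= max_age


rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 3, "email": "c@example.com"},
    {"order_id": 4, "email": "d@example.com"},
]
now = datetime(2026, 4, 16, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2026, 4, 16, 3, 0, tzinfo=timezone.utc)   # 9 hours earlier

print(completeness(rows, "email"))                     # 0.75
print(is_fresh(last_load, now, timedelta(hours=24)))   # True
print(is_fresh(last_load, now, timedelta(hours=6)))    # False
```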

Data Marketplace

Curated, discoverable, self-service access to trusted data products with request-and-approval workflows. Collibra leads here. AWS DataZone offers a basic version. The marketplace model is how large enterprises scale data access without creating a bottleneck at the data engineering team.

Collaboration and Tribal Knowledge

Annotations, certifications, data owner contacts, Slack and Teams integration, Q&A threads on datasets. None of the technical catalogs offer this. It is the difference between a catalog that machines use and a catalog that humans use.

Compliance Automation

GDPR, CCPA, HIPAA, SOX, EU AI Act compliance with automated classification, PII detection, audit trails, and policy enforcement. Eight new US state privacy laws took effect in 2025, with three more in 2026. The regulatory surface area is expanding faster than most enterprises can manually govern.
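
Automated classification usually starts with pattern matching over sampled column values. The following is a deliberately naive sketch: the two regular expressions, the 0.8 threshold, and the `classify_column` helper are assumptions for illustration, and production classifiers combine many more signals.

```python
import re

# Naive patterns for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
US_SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
PATTERNS = (("EMAIL", EMAIL), ("US_SSN", US_SSN))


def classify_column(values, threshold=0.8):
    """Return a PII label when at least `threshold` of sampled values match a pattern."""
    samples = [v for v in values if isinstance(v, str) and v]
    if not samples:
        return None
    for label, pattern in PATTERNS:
        hits = sum(1 for v in samples if pattern.match(v))
        if hits / len(samples) >= threshold:
            return label
    return None


print(classify_column(["a@example.com", "b@example.org", None]))   # EMAIL
print(classify_column(["123-45-6789", "987-65-4321"]))             # US_SSN
print(classify_column(["widget", "gadget"]))                       # None
```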


The Cloud-Specific Reality

Each cloud has its own native discovery and governance layer that sits between the technical catalog and third-party governance tools. Understanding these layers is essential before deciding what to invest in.

On AWS: Glue Data Catalog is the base. Lake Formation adds fine-grained access control. Amazon DataZone adds business discovery, data products, and automated access provisioning. But Glue has no lineage, no quality profiling, no collaboration features, no business glossary, and no cross-cloud anything. DataZone fills some gaps but remains AWS-only. Third-party tools fill the rest.

On Azure: Microsoft Purview Unified Catalog (rebranded from Azure Purview in 2022, with a major Unified Catalog experience shipping September 2024) is the most integrated cloud-native governance tool. New pricing model (January 2025) charges only for governed assets, not scanned assets, making it significantly cheaper than competitors for organizations that scan broadly but govern selectively. Deep Fabric integration. Uses OpenLineage for Databricks lineage. But column-level lineage is limited for many sources, and non-Microsoft ecosystems require supplementary tools.

On GCP: Google Dataplex Universal Catalog (renamed Knowledge Catalog in April 2026) provides discovery, glossary, lineage, and quality for BigQuery, Cloud Storage, Spanner, and Vertex AI. BigLake Metastore handles the Iceberg REST technical layer. But lineage retention is only 30 days, policies do not propagate to non-GCP systems, and multi-cloud is not supported.

The pattern is consistent: cloud-native tools govern their own cloud well and nothing else. If you are single-cloud with low regulatory burden, cloud-native tools may be sufficient. The moment you are multi-cloud, multi-platform, or heavily regulated, you need a third-party governance layer.


The Investment Decision: A Framework

[Figure: Governance Catalog Investment Matrix]

01 When Cloud-Native Tools Are Enough

You are on a single cloud. Your data team is under 50 people. Your regulatory burden is moderate. Your governance requirements are met by platform-native access controls and basic discovery. You do not need cross-platform lineage.

Cost: Near-zero on the governance layer. Deploy in weeks. This is the right answer for many mid-market organizations and startups.

02 When Open-Source Is the Right Investment

You have strong engineering talent and a platform team that can maintain Kubernetes infrastructure. You need lineage, quality monitoring, and discovery beyond what cloud-native tools provide, but you are cost-sensitive or value control over vendor dependency.

Choose OpenMetadata if you want built-in data quality, simpler ops (no Kafka or graph database), and a unified all-in-one architecture. Best for mid-market teams.

Choose DataHub if you need real-time event-driven metadata, the largest open-source community for support, or the Iceberg REST Catalog capability that blurs the governance/technical catalog boundary.

The honest TCO: Open-source catalog tools are free to license but not free to operate. A team of 2-3 engineers maintaining a DataHub deployment (Kafka + graph database + Elasticsearch + MySQL) can cost $300-500K/year in loaded salary. OpenMetadata's simpler architecture reduces this but does not eliminate it. Factor in infrastructure costs ($200-500/month for mid-sized deployments), upgrade maintenance, and connector development for unsupported sources.
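
The arithmetic is worth writing down. This sketch uses only ranges quoted in this post: the $300-500K loaded salary for the team, the $200-500 monthly infrastructure cost, and, for comparison, the Collibra figures from the commercial section ($170K base with TCO at 2-3x). Upgrade maintenance and connector development are not priced here, so the open-source range is a floor.

```python
def annual_range(team_low, team_high, infra_month_low, infra_month_high):
    """Annual cost range: team salary plus twelve months of infrastructure."""
    return (team_low + 12 * infra_month_low, team_high + 12 * infra_month_high)


# Self-hosted DataHub: 2-3 engineers at $300-500K total, infra at $200-500/month.
open_source = annual_range(300_000, 500_000, 200, 500)

# Collibra: $170K base licence, TCO quoted at 2-3x with professional services.
collibra = (170_000 * 2, 170_000 * 3)

print(f"Open-source (self-run): ${open_source[0]:,} - ${open_source[1]:,}")
print(f"Collibra (with services): ${collibra[0]:,} - ${collibra[1]:,}")
```

On these numbers the two ranges overlap almost entirely, roughly $302K-$506K against $340K-$510K, and salary dominates the open-source side.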

03 When Commercial Is the Right Investment

You are multi-cloud. You have more than 50 data users. You operate in a regulated industry. Your compliance requirements include automated classification, audit trails, and stewardship workflows. Your business users need polished UX and onboarding, not a platform team-maintained tool.

Choose Atlan if speed-to-value matters (3-month deployment), you are on a modern data stack, and you want active metadata automation. Best for agile, cloud-native teams. Starting cost: $20-50K/year for small teams, scaling to $200K+ for enterprise.

Choose Alation if data discovery and search are your primary use case, you have analytics-heavy teams, or you want agentic AI for natural language data workflows. Starting cost: $198K/year.

Choose Collibra if you are in financial services, healthcare, insurance, or pharmaceuticals; if your compliance requirements include SOX, HIPAA, GDPR Article 30 records, or EU AI Act assessment; or if you need a data marketplace or the most mature stewardship workflow engine in the market. Starting cost: $170K/year base, but expect TCO of 2-3x that once professional services are included.
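
The three profiles above can be read as a rough decision rule. The thresholds in this sketch (single versus multi-cloud, 50 data users, regulatory burden, a platform team) come from the framework. The order of the checks, the `Org` fields, and the "needs-review" fallback for mixed profiles are my assumptions, so treat it as a conversation starter and not a scoring model.

```python
from dataclasses import dataclass


@dataclass
class Org:
    clouds: int
    data_users: int
    heavily_regulated: bool
    has_platform_team: bool
    needs_cross_platform_lineage: bool


def recommend_tier(org: Org) -> str:
    """Map an organization profile to one of the three investment tiers."""
    # Commercial: multi-cloud, more than 50 data users, regulated industry.
    if org.clouds > 1 and org.data_users > 50 and org.heavily_regulated:
        return "commercial"
    # Open-source: needs go beyond cloud-native and a platform team can run it.
    beyond_native = org.needs_cross_platform_lineage or org.clouds > 1
    if beyond_native and org.has_platform_team:
        return "open-source"
    # Cloud-native: single cloud, small team, moderate regulation, no cross-platform lineage.
    if (org.clouds == 1 and org.data_users < 50
            and not org.heavily_regulated
            and not org.needs_cross_platform_lineage):
        return "cloud-native"
    # Mixed profiles do not fit a single tier cleanly.
    return "needs-review"


startup = Org(1, 20, False, False, False)
bank = Org(2, 200, True, True, True)
scaleup = Org(1, 30, False, True, True)

print(recommend_tier(startup), recommend_tier(bank), recommend_tier(scaleup))
```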

The Multi-Cloud-Specific Play

For organizations running workloads across two or more clouds, the governance catalog decision is separate from the cloud-native and technical catalog decisions. The pattern:

  1. Technical layer: Polaris or Unity Catalog as the strategic technical catalog (per the catalog wars analysis)
  2. Cloud-native layer: Glue/Lake Formation on AWS, Purview on Azure, Dataplex on GCP for platform-specific governance
  3. Federation layer: Apache Gravitino to federate metadata across technical catalogs
  4. Governance layer: Atlan, Collibra, or DataHub (managed) as the unified business catalog across all clouds

This is a four-layer stack, and yes, it is complex. But it is the honest architecture for a large enterprise with multi-cloud workloads, multiple query engines, regulatory compliance requirements, and hundreds of data consumers who need to find and trust data without asking an engineer.
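
Written down as data, the stack is easier to review with a platform team. In this sketch the tool names come from the list above, while the `Stack` class, its field names, and the example selection (Polaris, Gravitino, Atlan on AWS and GCP) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Cloud-native governance tool per cloud, as listed in layer 2.
CLOUD_NATIVE = {
    "aws": "Glue/Lake Formation",
    "azure": "Purview",
    "gcp": "Dataplex",
}


@dataclass(frozen=True)
class Stack:
    technical: str            # layer 1: strategic technical catalog
    federation: str           # layer 3: catalog of catalogs
    governance: str           # layer 4: unified business catalog
    clouds: Tuple[str, ...]   # clouds in use, which determine layer 2

    def layers(self):
        """Return the four layers in order as (layer name, tools) pairs."""
        return [
            ("technical", [self.technical]),
            ("cloud-native", [CLOUD_NATIVE[c] for c in self.clouds]),
            ("federation", [self.federation]),
            ("governance", [self.governance]),
        ]


stack = Stack("Polaris", "Apache Gravitino", "Atlan", ("aws", "gcp"))
for name, tools in stack.layers():
    print(f"{name:>12}: {', '.join(tools)}")
```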


Five Predictions for the Governance Catalog Market

1. The governance catalog market consolidates to three commercial vendors and two open-source projects by 2028. Atlan, Alation, and Collibra survive and grow. OpenMetadata and DataHub are the open-source survivors. Smaller players (Stemma/Amundsen, Apache Atlas for new deployments, Marquez as a standalone tool) fade. The market is too mature for new entrants and too competitive for subscale players.

2. DataHub's Iceberg REST Catalog capability is the most strategically important open-source feature shipped in 2025. If a governance catalog can also function as a technical catalog, the two-layer architecture collapses for smaller organizations. This does not threaten Polaris or Unity at enterprise scale, but it changes the economics for mid-market companies who do not want to maintain both layers.

3. The governance catalog market exceeds $3.5 billion by 2030. Gartner's return to the Metadata Management Magic Quadrant in 2025 after a 5-year hiatus, with the framing "no metadata, no AI," signals that metadata management has moved from nice-to-have to core infrastructure. Active metadata adoption will reach 30% of organizations by 2026, per Gartner.

4. Agentic AI capabilities become the primary differentiator by 2027. Alation's acquisition of Numbers Station, Atlan's MCP Server and AI Governance Studio, and Collibra's GenAI-powered descriptions are early moves. By 2027, the governance catalog that best enables AI agents to discover, reason about, and govern data access will be the one that wins enterprise adoption.

5. Open-source governance catalogs will be acquired by technical catalog vendors. Databricks acquiring OpenMetadata or Acryl Data (DataHub), or Snowflake doing the same, would be the natural endgame for collapsing the two-layer architecture. The Tabular acquisition precedent ($2 billion for a 40-person company) suggests these are not implausible price points.


The Bottom Line

The catalog war you have been reading about (Polaris versus Unity versus Glue) is the technical catalog war. It determines where your tables live and which engines can read them.

The governance catalog war (Atlan versus Alation versus Collibra versus OpenMetadata versus DataHub) determines whether your business users can find data, your compliance team can audit it, and your AI agents can trust it.

You need both layers. The question is how much to invest in each.

If you are single-cloud with moderate governance needs, cloud-native tools plus an open-source catalog may be enough. If you are multi-cloud with regulatory complexity, you are spending $170K-$500K+ per year on a commercial governance catalog whether you planned to or not, because the alternative is manual stewardship at a scale that does not work.

Treating the technical catalog and governance catalog as one decision is how you end up with a technical catalog that satisfies your data engineers and a governance gap that satisfies no one else.

The architecture decision your team should be making right now: which technical catalog is your strategic layer (Polaris or Unity), and which governance catalog sits above it (commercial or open-source, and which one). These are two decisions, not one.


In 2026, every enterprise is making a catalog decision. The ones who recognize it is two decisions will build the right architecture. The ones who do not will be back at the drawing board in eighteen months.

Pick your technical catalog. Pick your governance catalog. They are different tools for different problems, and they belong on different budget lines.


Technology Reference

A quick reference for every technology, project, and standard mentioned in this post.

Governance Catalogs (Commercial)

Technology | What It Is | Link
Atlan | Cloud-native active metadata platform. Gartner Leader in Metadata Management (2025) and Data & Analytics Governance (2026). 3-month deployment, 100+ connectors. $750M valuation. | atlan.com
Alation | Data intelligence platform. Pioneered behavioral intelligence for data discovery. Gartner Leader five consecutive years. Acquired Numbers Station AI (May 2025) for agentic data workflows. | alation.com
Collibra | Enterprise data intelligence platform. Longest-tenured commercial catalog (founded 2008). $5.25B valuation. Most mature stewardship workflow engine. ISO 42001 and EU AI Act assessment tooling. | collibra.com

Governance Catalogs (Open-Source)

Technology | What It Is | Link
OpenMetadata | All-in-one open-source data catalog (Apache 2.0). Built-in native data quality tests. Simplified architecture (MySQL/PostgreSQL + Elasticsearch). Created by the team behind Apache Atlas and Uber's data platform. Commercial offering via Collate. | open-metadata.org
DataHub | Open-source metadata platform (Apache 2.0). Originally built at LinkedIn. Largest open-source catalog community (11.8K GitHub stars, 733 contributors). Can function as an Iceberg REST Catalog (beta). Commercial offering via Acryl Data. | datahubproject.io
Apache Atlas | Legacy open-source catalog for Hadoop environments. ~5.7% market mindshare. Not recommended for greenfield deployments. | atlas.apache.org

Cloud-Native Discovery and Governance

Technology | What It Is | Link
Amazon DataZone | AWS service for business data discovery, data products, and automated access provisioning. Sits above Glue Data Catalog. | aws.amazon.com/datazone
AWS Lake Formation | Fine-grained access control layer on top of Glue Data Catalog. Row-level and column-level security for S3 data. | aws.amazon.com/lake-formation
Microsoft Purview | Unified data governance for Azure and multi-cloud. Discovery, glossary, lineage, classification. Deep Fabric integration. | learn.microsoft.com/purview
Google Dataplex | GCP data governance and management. Universal Catalog (renamed Knowledge Catalog, April 2026) provides discovery, glossary, lineage, and quality for BigQuery and Cloud Storage. | cloud.google.com/dataplex

Technical Catalogs (referenced from Part 1)

Technology | What It Is | Link
Apache Polaris | Open-source Iceberg REST catalog (Apache TLP, Feb 2026). Vendor-neutral standard. | polaris.apache.org
Unity Catalog | Databricks catalog. Multi-format governance with AI asset management. | unitycatalog.io
AWS Glue | AWS-native metastore. 39.3% adoption (market leader). | aws.amazon.com/glue
Apache Gravitino | Federated "catalog of catalogs" (Apache TLP, June 2025). | gravitino.apache.org

Standards

Technology | What It Is | Link
OpenLineage | Open standard for lineage interoperability. Adopted by Airflow, Spark, dbt. LF AI & Data Graduate project. | openlineage.io
Iceberg REST Catalog API | Universal catalog interface spec. De facto standard for all catalog interoperability. | iceberg.apache.org/spec