Picking Your Catalog: The Contenders, the Bet, and the Road Ahead

Three contenders for your enterprise catalog decision: Polaris, Unity Catalog, and Glue. Each is structurally compromised in a different way. This part covers the convergence boundary where lock-in actually lives, a defensible bet, four decisions, and a three-year prediction timeline.
Nidhi Vichare, April 16, 2026
Tags: Apache Polaris, Unity Catalog, AWS Glue, Data Architecture, CDO, Enterprise AI, Data Strategy
The Catalog Wars, Part 2 of 4

TL;DR. The Catalog Decision Framework.

Three contenders. One convergence boundary. A defensible bet with caveats. Four decisions every architect must make. A three-year timeline that will reshape enterprise data infrastructure.

Polaris, Unity Catalog, and Glue are the three serious options. Each is structurally compromised differently. The technology converges where vendors cannot monetize; it diverges where they can. The bet: Polaris for the standard, with caveats. Its Apache TLP governance structure is harder to replicate than Unity Catalog's feature lead is to close. But there is a scenario where that bet is wrong, and it should be named. Four architectural decisions must be made in the next twelve months regardless of which catalog you choose.

3 contenders, 4 decisions, 10 predictions.

In Part 1, we established that the catalog is where lock-in lives. The format war is over. Both creators said so. The convergence is shipped code. The catalog decision carries 10x the strategic weight of the format decision because it controls governance, engine compatibility, semantics, and AI context.

Now the harder question: which catalog?


The Three Contenders, Honestly

There are three serious options on the table for an enterprise catalog decision in 2026. Here is each one, characterized as honestly as I can, including where it is strong and where it is structurally compromised.

The Catalog Ecosystem in 2026

The Technical Catalog Contenders

01 Apache Polaris

Apache Polaris graduated from the Apache Incubator to Top-Level Project status on February 19, 2026, with 27 +1 votes and zero objections. That is the most important fact to internalize about Polaris. It is no longer "Snowflake's catalog." It is an Apache TLP with independent governance, monthly release cadence, and a contributor base that includes Dremio (whose Project Nessie is being merged in), AWS, Confluent, Google Cloud, Microsoft Azure, Salesforce, Alation, Atlan, Collibra, dbt Labs, Immuta, and Fivetran. Community contributors include engineers from Stripe, IBM, Bloomberg, and Uber.

In 18 months of incubation, Polaris shipped 6 releases, attracted roughly 100 contributors, and closed over 2,800 pull requests. The PMC includes engineers from 8+ companies. The 1.0 release (July 2025) was the first production-ready version, with full Iceberg REST Catalog API, fine-grained RBAC, credential vending for S3/GCS/Azure, built-in table maintenance (compaction, snapshot expiry), and Kubernetes-native deployment.
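
To make "full Iceberg REST Catalog API" concrete, here is a minimal sketch of how a client addresses a table through that interface. The route shapes (/v1/config, /v1/{prefix}/namespaces/{namespace}/tables/{table}) and the 0x1F separator for multi-level namespaces follow the Iceberg REST OpenAPI spec as I understand it. The host, prefix, namespace, and table names in the usage note are hypothetical, and a real client would also handle authentication, which is omitted here.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the 0x1F unit separator in URL paths
# (per the Iceberg REST OpenAPI spec; verify against the version you target).
NAMESPACE_SEPARATOR = "\x1f"


def namespace_path(levels: list[str]) -> str:
    """Encode a multi-level namespace as a single URL path segment."""
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def config_url(base_uri: str) -> str:
    """Route a client calls first to discover catalog defaults and the prefix."""
    return f"{base_uri.rstrip('/')}/v1/config"


def table_url(base_uri: str, prefix: str, namespace: list[str], table: str) -> str:
    """Build the load-table route for one table in one namespace."""
    base = base_uri.rstrip("/")
    prefix_part = f"/{quote(prefix, safe='')}" if prefix else ""
    ns = namespace_path(namespace)
    return f"{base}/v1{prefix_part}/namespaces/{ns}/tables/{quote(table, safe='')}"
```

Calling table_url("https://catalog.example.com/api/catalog", "prod", ["analytics", "sales"], "orders") yields a path ending in /v1/prod/namespaces/analytics%1Fsales/tables/orders. The point is that the same addressing scheme works against any catalog that implements the spec, which is what makes the API the convergence point.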

Its strongest argument is the governance structure of the project itself. The April 8, 2026 Snowflake announcement of governance portability via Polaris is structurally important because it commits Snowflake to making access policies portable across systems through Polaris rather than locked inside Snowflake's proprietary Horizon Catalog. The technical mechanism (policy exchange standards, governance federation, read restriction APIs) is designed to let one system enforce policies authored by another. That is closer to true vendor neutrality than anything Unity Catalog currently offers.

Its structural weakness is the AI governance and semantic gap. Polaris does not yet have an answer that matches Unity Catalog Business Semantics. Polaris does not yet have a coherent story for governing AI agents and tools. Generic Tables went GA in version 1.3, which is a meaningful step toward format neutrality, but the AI governance gap is real.

Strongest argument: Apache TLP governance with contributors from every major cloud, BI vendor, and governance vendor. Governance portability via policy exchange standards.

02 Unity Catalog (Databricks)

Unity Catalog is the most operationally mature option. First announced in May 2021, GA since mid-2022, and open-sourced under Apache 2.0 in June 2024 (donated to LF AI & Data Foundation). Over 10,000 enterprises run it in production.

It supports multiple formats (Delta, Iceberg via UniForm and the native REST API, Hudi, Parquet, CSV, JSON), implements the Iceberg REST Catalog API (read GA, write in Public Preview as of mid-2025), and has been battle-tested at Databricks customer scale for years.

Its strongest argument is that it is genuinely multimodal. It governs tables, files, ML models, notebooks, AI tools and functions in a single namespace. Business Semantics is GA. Tool Catalogs serve generative AI agents. Column-level lineage is automated. If your organization is building agentic AI on top of a lakehouse, Unity Catalog has the most coherent end-to-end story right now.

The Tabular acquisition (June 4, 2024, reportedly $2 billion) is the subplot that matters. Databricks now employs the original creators of both Delta Lake and Apache Iceberg. No other company has this depth across both formats. This is not trivia. The convergence thesis (driving Delta and Iceberg toward a shared specification) is only possible because Databricks has the credibility and expertise across both communities.

Its structural weakness is provenance. Unity Catalog OSS exists, but Databricks remains the dominant maintainer and the commercial managed service is what most organizations actually consume. Choosing Unity Catalog means betting that Databricks will remain a benevolent steward of an open standard while also being the dominant commercial implementation. That bet has worked in the past for Databricks. It may continue to work. But it is a bet, and it should be named as one.

Strongest argument: Genuinely multimodal governance across tables, ML models, AI tools, and business semantics. 10,000+ enterprise deployments and operational maturity no competitor matches.

03 AWS Glue

AWS Glue is the catalog most organizations are actually running today, often by accident. Launched in August 2017, it leads catalog adoption at 39.3% per the 2025 ecosystem survey. If your data lives in S3, Glue is what your AWS-native services already talk to: Athena, EMR, Redshift Spectrum, QuickSight, SageMaker, Glue ETL.

It supports Iceberg, Delta, and Hudi. It now exposes an Iceberg REST endpoint. Automated Iceberg maintenance (compaction, storage optimization, statistics) is built in. The free tier covers the first 1 million objects and 1 million requests per month, making it essentially free for many organizations.
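
A quick way to check whether "essentially free" holds for your estate is to run the arithmetic. The sketch below uses the free-tier thresholds stated above. The overage rates (1.00 USD per 100,000 objects and 1.00 USD per million requests) and the pro-rata billing are my assumptions, not figures from this article, so verify them against current AWS pricing before relying on the result.

```python
FREE_OBJECTS = 1_000_000       # free tier: first 1 million objects per month
FREE_REQUESTS = 1_000_000      # free tier: first 1 million requests per month
USD_PER_100K_OBJECTS = 1.00    # ASSUMPTION: overage rate, check current pricing
USD_PER_1M_REQUESTS = 1.00     # ASSUMPTION: overage rate, check current pricing


def monthly_catalog_cost(objects: int, requests: int) -> float:
    """Estimate the monthly catalog bill in USD beyond the free tier (pro-rata)."""
    billable_objects = max(0, objects - FREE_OBJECTS)
    billable_requests = max(0, requests - FREE_REQUESTS)
    object_cost = billable_objects / 100_000 * USD_PER_100K_OBJECTS
    request_cost = billable_requests / 1_000_000 * USD_PER_1M_REQUESTS
    return object_cost + request_cost
```

Under these assumed rates, a catalog with 1.5 million objects and 3 million requests a month comes to 7.00 USD, which is why cost is rarely the reason anyone leaves Glue.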

Its strongest argument is gravity. It is already there. Migrating off it is real work. For organizations whose workloads are AWS-native with relatively coarse-grained governance, Glue is the path of least resistance.

Its structural weakness is everything else. Glue is AWS-only, with zero deployment options outside AWS. Even the REST endpoint requires SigV4 authentication tied to AWS IAM. No built-in lineage (requires DataZone, a separate service). No AI asset governance. No semantic layer. No business glossary. Single-level namespaces only. The catalog federation play (November 2025) lets Glue access tables from Unity and Snowflake Horizon, but it positions Glue as a consumer of other catalogs, not a strategic governance layer.
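
The SigV4 requirement is worth seeing in code, because it explains why a generic REST catalog client with a bearer token cannot simply point at Glue. Below is a minimal sketch of the standard AWS Signature Version 4 signing-key derivation using only the Python standard library. The service name "glue" and the credentials in the usage note are assumptions for illustration; a real request also needs the canonical request and string-to-sign steps, which are omitted.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 step in the SigV4 key derivation chain."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

For example, sigv4_signing_key("EXAMPLE-SECRET", "20260416", "us-east-1", "glue") returns a 32-byte key scoped to one day, one region, and one service. Every input to that chain is an AWS IAM concept, which is the structural point: the endpoint speaks an open protocol, but the door only opens with AWS credentials.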

Amazon S3 Tables (December 2024) is AWS's boldest move: embedding Iceberg directly into the storage layer, claiming 3x query performance and 10x higher TPS. But it is AWS-only, not open-sourced, and does not support Delta Lake.

The honest framing: Glue was not designed to be the strategic governance layer for a multi-engine, multi-cloud, multi-modal future. Treating Glue as the strategic answer is treating a tool as a strategy.

Strongest argument: Already deployed. 39.3% adoption. Free tier. Deep AWS-native integration. For AWS-only shops with simple governance, it works.

The Adjacent Contenders

Apache Gravitino graduated as an Apache Top-Level Project in June 2025 and positions itself as the "Catalog of Catalogs," a federated metadata lake that manages metadata in place across heterogeneous sources. Its 2026 roadmap includes Model Catalog for AI, MCP Server for AI agents, and unified RBAC across catalogs. If the catalog future is federated, Gravitino is the federation layer to watch.

Lakekeeper is a Rust-native Iceberg REST catalog implementation: single-binary, fast startup, low memory footprint. Recognized by ClickHouse documentation as a supported catalog. For organizations wanting a lightweight, self-hosted REST catalog without the weight of Polaris or Unity, it is a viable option.

Microsoft OneLake provides Iceberg REST read APIs within Microsoft Fabric. Google BigLake Metastore is GA with full Iceberg REST support. Both are cloud-native options tightly coupled to their respective platforms.

But the realistic strategic choice for most enterprise architects in 2026 is between Unity Catalog and Polaris, with Glue as the incumbent that gets quietly preserved or quietly migrated off.


The Convergence Boundary: Where Lock-In Actually Lives

This is the section no vendor wants you to read.

The technology converges where vendors cannot monetize. It diverges where they can. Understanding this boundary is the single most important insight for making a catalog decision.

Convergence Map: What Is Converging and What Is Not

What Has Converged

On-disk data representation: CONVERGED. Both formats store data as Apache Parquet files. The shared VARIANT type, deletion vectors, and geospatial types are all standardized at the Parquet level. The data files are becoming identical.

Schema evolution: CONVERGED. Delta adopted field IDs from Iceberg. Both support add, drop, rename, reorder without breaking readers.

Catalog API: CONVERGING. The Iceberg REST Catalog spec is the de facto universal interface. Polaris is the reference implementation. Unity Catalog implements it. Glue implements it. BigLake implements it. Even non-Iceberg catalogs adopt it for interoperability.

Metadata architecture: CONVERGING SLOWLY. Delta's log-replay and Iceberg's metadata tree are meeting in the middle. Full convergence is 2-3 years away, pending Iceberg v4's adaptive metadata tree proposal.
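
The schema evolution point above is easier to trust once you see why field IDs make renames safe. This is a toy sketch, not how either format's readers are implemented: it assumes data files key values by a stable field ID and that the current schema maps each ID to its current name. The column names and values are made up.

```python
def project(row_by_field_id: dict, schema: dict) -> dict:
    """Resolve current column names to stored values through stable field IDs.

    Fields that did not exist when the row was written come back as None.
    """
    return {name: row_by_field_id.get(field_id) for field_id, name in schema.items()}


# Schema as it was when the file was written (field ID -> column name).
schema_v1 = {1: "id", 2: "cust_name"}

# Later schema: field 2 renamed, field 3 added. The IDs never change.
schema_v2 = {1: "id", 2: "customer_name", 3: "region"}

# A row written under schema_v1, stored by field ID rather than by name.
old_row = {1: 42, 2: "Acme"}
```

Reading old_row with schema_v2 gives {"id": 42, "customer_name": "Acme", "region": None}. The rename costs nothing because no reader ever matched on the name. A name-based reader would have lost the column.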

What Is Not Converging

Governance models: NOT CONVERGING. Every catalog has its own RBAC implementation, its own policy model, its own credential vending approach. Snowflake's April 2026 governance portability announcement is the first serious attempt at making governance portable, but it is early-stage and unproven. There is no industry standard for catalog governance interoperability. This is the deepest lock-in vector.

AI asset governance: NOT CONVERGING. Unity Catalog governs ML models, feature tables, AI tools, and agent functions. Polaris does not. Gravitino is building model catalog support. There is no standard for how AI assets should be cataloged, governed, or discovered across platforms. This is a greenfield where the first mover with broad adoption will set the de facto standard.

Semantic layers: NOT CONVERGING. Unity Catalog Business Semantics, Snowflake's semantic interoperability, dbt's semantic layer, and BI-tool-native semantics are completely different approaches. There is no open standard for catalog-level semantic definitions. Whoever wins this layer controls where business logic lives and owns the relationship with the business, not just with IT.

Operational maintenance: ACTIVELY DIVERGING. AWS S3 Tables embeds maintenance into the storage layer. Polaris TMS runs maintenance as a catalog service. Unity Catalog's predictive optimization runs as a Databricks-managed service. These are fundamentally different architectures, and they are diverging because each vendor sees maintenance as a monetization opportunity.

There is a fifth divergence that cuts across all of these: the spec itself is diverging from its own purpose. Iceberg v4 is attempting to absorb indexing, security, and semantic context into the format specification. Firebolt's Benjamin Wagner argued at Iceberg Summit 2026 that this path leads to more compliance surface area, engines falling further behind on spec support, and eventually someone building something simpler. That "someone" already exists: DuckLake uses a SQL database such as Postgres for metadata and Parquet for data, and nothing else. The pattern: more spec features lead to more fragmentation, not less. A spec that widens too fast does not unify. It fractures. The strongest argument for keeping Iceberg lean is that engine-level innovation on indexing, security, and semantics moves at a pace no spec committee can match.

The pattern is clear: formats converge because there is no lock-in value in format differentiation. Governance, AI governance, semantics, and maintenance diverge because there is enormous lock-in value in each. And the spec itself risks fragmenting if it tries to absorb what belongs at the engine and catalog layers.
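
If it helps to take the convergence map into an architecture review, it can be written down as data. The sketch below only encodes the statuses listed in this section. The layer keys and the helper name are my own labels, not an established taxonomy.

```python
# Status of each layer, as characterized in this section.
CONVERGENCE_MAP = {
    "on_disk_data": "converged",
    "schema_evolution": "converged",
    "catalog_api": "converging",
    "metadata_architecture": "converging_slowly",
    "governance_models": "not_converging",
    "ai_asset_governance": "not_converging",
    "semantic_layers": "not_converging",
    "operational_maintenance": "diverging",
}

# Layers with these statuses are where switching catalogs gets expensive.
LOCK_IN_STATUSES = {"not_converging", "diverging"}


def lock_in_layers(layers_in_use: list[str]) -> list[str]:
    """Return the layers you depend on that sit on the lock-in side of the boundary."""
    return sorted(
        layer for layer in layers_in_use
        if CONVERGENCE_MAP.get(layer) in LOCK_IN_STATUSES
    )
```

A team that relies on on_disk_data, governance_models, and semantic_layers gets back the last two. That list, not the format, is the exposure to price.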


The Bet, the Framework, and the Four Decisions

The Bet I Would Make

If a CDAO asked me in a private architecture review which catalog to bet on for the next three to five years, I would say Polaris, with caveats.

The strongest argument for Polaris is the governance structure of the project itself. Apache TLP status with a contributor base spanning every major cloud, every major BI vendor, and every major governance vendor is structurally different from open-source-but-vendor-controlled. The governance portability announcement is a credible signal of intent.

The strongest argument against Polaris is the operational gap on AI governance and semantics. Unity Catalog is genuinely ahead on Business Semantics, Tool Catalogs for AI agents, and the operational maturity that comes from 10,000+ enterprise deployments.

The reason I would still bet on Polaris: the AI governance gap is closeable through community contribution, while the project governance gap of Unity Catalog is structurally harder to close as long as Databricks is the dominant commercial implementation. Specs catch up. Foundations are harder to retrofit.

There is a scenario where this bet is wrong: if Databricks ships AI governance features at its current pace, and if Polaris cannot close that gap before the agentic AI buildout phase peaks in 2027-2028, the practical center of gravity could lock in around Unity Catalog before the structural advantages of Polaris materialize. Markets do not always reward the architecturally cleaner choice on the timeline you expect.

The Decision Framework

Catalog Decision Framework

Most large enterprises will end up with both Unity Catalog and Polaris for some period, federated. The catalog that wins the federation layer will be the catalog that wins.

The Four Decisions You Have to Make Regardless

Whichever catalog you bet on, there are four architectural decisions you have to make in the next twelve months. These are the questions I would put on an architecture review agenda right now, before any vendor presentation.

01 Where does your governance live?

Catalog-level governance and engine-level governance are not the same thing. If your row-level security and column masking rules live inside Snowflake's Horizon Catalog, they do not automatically apply when Spark reads the same table through Polaris. The right answer for most enterprises is to push governance as close to the catalog layer as possible and minimize engine-specific policy.
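
Here is a toy model of that gap, assuming a deliberately simplified world in which each policy is enforced either at the catalog or inside exactly one engine. The Policy class, the policy names, and the engine names are hypothetical, and real catalog-level enforcement still depends on each engine honoring what the catalog hands it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    table: str
    enforced_at: str  # "catalog", or the name of the single engine that enforces it


def policies_applied(policies: list[Policy], table: str, engine: str) -> list[str]:
    """Policies that take effect when `engine` reads `table`."""
    return [
        p.name for p in policies
        if p.table == table and p.enforced_at in ("catalog", engine)
    ]


def policies_bypassed(policies: list[Policy], table: str, engine: str) -> list[str]:
    """Policies defined for `table` that `engine` never sees."""
    return [
        p.name for p in policies
        if p.table == table and p.enforced_at not in ("catalog", engine)
    ]


policies = [
    Policy("mask_ssn", "customers", enforced_at="snowflake"),
    Policy("region_row_filter", "customers", enforced_at="catalog"),
]
```

With these two policies, a Spark read of customers applies region_row_filter and bypasses mask_ssn. Running policies_bypassed for every engine you operate is a crude but useful audit of how much governance is engine-specific today.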

02 What is your interop guarantee, contractually and operationally?

It is easy to say "we support the Iceberg REST API." It is harder to specify which version, which extensions, which engines have been tested in production, and what the recovery path looks like when a vendor extension breaks compatibility. If your vendor cannot commit to this in writing, the interop story is marketing.
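
One way to keep that conversation honest is to treat the commitment as a record with required fields. This is a sketch of such a checklist. The field names and question wording are mine, not part of any vendor contract or standard, and the vendor and engine names in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InteropCommitment:
    vendor: str
    rest_spec_version: str = ""              # which spec version, in writing
    extensions: Optional[list[str]] = None   # None means "not disclosed"
    engines_tested_in_production: list[str] = field(default_factory=list)
    recovery_path: str = ""                  # what happens when an extension breaks


def open_questions(c: InteropCommitment, required_engines: list[str]) -> list[str]:
    """List what the vendor has not yet committed to."""
    questions = []
    if not c.rest_spec_version:
        questions.append("Which Iceberg REST spec version is supported?")
    if c.extensions is None:
        questions.append("Which vendor extensions are in use?")
    untested = sorted(set(required_engines) - set(c.engines_tested_in_production))
    if untested:
        questions.append("Not production-tested: " + ", ".join(untested))
    if not c.recovery_path:
        questions.append("What is the recovery path when an extension breaks compatibility?")
    return questions
```

A vendor that has only named Spark as production-tested, against a requirement of Spark and Trino, returns four open questions. An empty list is the bar for calling the interop story contractual.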

03 What is your catalog portability plan?

If you choose Catalog A today and decide in 2028 you need Catalog B, what is the actual migration cost? How much governance transfers? How much semantic layer transfers? How much AI tool registry transfers? The honest answer today is that catalog-to-catalog migration is closer to a re-implementation than a migration. That cost should be priced into the original decision.
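
A back-of-the-envelope way to price this in is to estimate what share of your catalog assets would have to be rebuilt rather than moved. The sketch below assumes you can count assets by category and assign each category a transfer rate between 0 and 1. The categories, counts, and rates are hypothetical placeholders, and unknown categories default to a rate of zero on purpose.

```python
def reimplementation_share(assets: dict[str, int], transfer_rate: dict[str, float]) -> float:
    """Fraction of catalog assets that must be rebuilt in the target catalog.

    assets: count of assets per category.
    transfer_rate: fraction of each category that migrates as-is (0.0 to 1.0).
    Categories missing from transfer_rate are treated as not transferable.
    """
    total = sum(assets.values())
    if total == 0:
        return 0.0
    rebuilt = sum(
        count * (1.0 - transfer_rate.get(category, 0.0))
        for category, count in assets.items()
    )
    return rebuilt / total


# Hypothetical estate: table definitions move cleanly, everything else does not.
assets = {
    "table_definitions": 200,
    "access_policies": 100,
    "semantic_definitions": 60,
    "ai_tool_registrations": 40,
}
transfer_rate = {"table_definitions": 1.0, "access_policies": 0.0}
```

For this made-up estate the result is 0.5: half of what the catalog holds is re-implementation work. The number itself matters less than the exercise of filling in the transfer rates honestly.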

04 What is your engine independence horizon?

Catalog choice constrains engine choice. The question is which engines you will need over the next five years, including engines that do not exist yet. The Polaris contributor list, which includes the major engine vendors, is informative here. If a catalog's contributor list is dominated by a single engine vendor, that is information about how engine-neutral it will remain.
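
Contributor concentration can be measured rather than eyeballed. The sketch below assumes you have a list with one employer label per commit or merged pull request, which you would have to assemble yourself from the project's history. The vendor labels in the usage note are placeholders, not real contribution data for any catalog.

```python
from collections import Counter


def vendor_shares(contribution_employers: list[str]) -> dict[str, float]:
    """Share of contributions attributed to each employer."""
    counts = Counter(contribution_employers)
    total = sum(counts.values())
    return {vendor: n / total for vendor, n in counts.items()}


def top_vendor_share(contribution_employers: list[str]) -> tuple[str, float]:
    """The single largest contributor and its share."""
    shares = vendor_shares(contribution_employers)
    vendor = max(shares, key=shares.get)
    return vendor, shares[vendor]


def concentration_index(contribution_employers: list[str]) -> float:
    """Herfindahl-style index: sum of squared shares, 1.0 means one vendor does it all."""
    return sum(share ** 2 for share in vendor_shares(contribution_employers).values())
```

For a project where vendor A makes 6 of 10 contributions and vendors B and C make 2 each, the top share is 0.6 and the index is 0.44. Tracking that index release over release tells you whether engine neutrality is improving or eroding.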


What the Next Three Years Look Like

This is where I put my predictions on record. These are informed by the research, the market dynamics, and twenty years of watching enterprise data infrastructure decisions. They are predictions, not certainties.

Prediction Timeline: The Road Ahead

The Next 6 Months (Through October 2026)

Summit season will reveal positioning. Snowflake Summit (June 2-5) will showcase Polaris governance federation demos; watch for whether they are live multi-engine demos or slides. Databricks Data + AI Summit (June 15-18) will push Unity Catalog Iceberg write to GA; watch for whether the OSS project governance evolves to match the technical openness.

Iceberg v3 adoption goes mainstream across all major engines. The Nessie merge into Polaris completes. Smaller catalog implementations start losing ground to the big three. Just past this window, AWS re:Invent (November-December) will be AWS's response: expanded Glue federation and deeper S3 Tables integration.

One Year Out (April 2027)

The AI governance battleground heats up. Unity Catalog's lead either solidifies or Polaris closes the gap through community contribution. Gravitino's model catalog and MCP server gain traction as a federation layer for AI assets.

Governance portability becomes testable. Snowflake's Policy Exchange Standards reach production maturity. The first public case studies of catalog-to-catalog governance migration appear.

Enterprise catalog selection becomes a C-suite decision. Analyst firms publish dedicated catalog comparison frameworks. CDAOs treat this as strategic architecture, not platform team choice.

Two Years Out (April 2028)

Catalog consolidation in full swing. Most enterprises have committed to a primary catalog with Glue in maintenance mode. Hive Metastore deployments decline sharply.

AI-native catalog extensions emerge. The catalog governs not just data but agent tools, prompt libraries, model registries, and function catalogs. Format convergence reaches practical completion; the "which format" question is genuinely irrelevant for new workloads.

The semantic layer wars begin. With format and basic catalog questions resolved, the battle shifts to who owns business definitions: revenue, customer, churn, risk. This will be a bloodier fight than the catalog war because business definitions touch every consumer of data, not just engineers.

Three Years Out (April 2029)

The catalog becomes the data operating system: governance layer, semantic layer, AI asset layer, and federation layer for the entire data estate. Catalog selection determines cloud strategy, engine strategy, AI strategy, and BI strategy.

The market consolidates to 2-3 major catalogs. Format is invisible; catalogs handle translation transparently. Governance is portable. "Lakehouse Format 1.0" is the understood reality: one format with two names, the format distinction meaningless for new workloads.

Ten Predictions, Ranked by Confidence

# | Prediction | Confidence
1 | Polaris wins the standard; Unity wins the product | High
2 | The Tabular acquisition was the most consequential move of the catalog wars | High
3 | Hive Metastore will be effectively dead by 2028 | High
4 | The real winner is the customer, but only if they force the issue | High (principle)
5 | Governance portability defines the 2027 catalog selection cycle | Medium-High
6 | AWS embraces Polaris within 18 months | Medium-High
7 | The semantic layer war will be bloodier than the catalog war | Medium-High
8 | Catalog-as-a-service becomes a standalone infrastructure category | Medium
9 | AI asset governance creates a new catalog tier | Medium
10 | Format convergence produces a de facto "Lakehouse Format 1.0" by 2029 | Medium

What I Will Be Watching For at Summit

Two specific things from each Summit will tell us which way the catalog war is actually trending. Neither will be on the keynote slides.

From Snowflake Summit (June 2-5): Watch for the specific commitments around Polaris governance federation in production. The April 2026 announcement was directional. The June detail will tell us whether the policy exchange standards are real engineering or roadmap aspiration. Watch also for which third-party engines demonstrate live read-and-write to Polaris on stage. Demos with Spark and Trino are table stakes. Demos with Databricks SQL or DuckDB or Snowflake itself reading from Polaris-managed tables created by another engine would be the more meaningful signal.

From Databricks Data + AI Summit (June 15-18): Watch for the specific roadmap on Unity Catalog as a true open standard versus Unity Catalog as a Databricks-led project. The Tabular acquisition, the open sourcing of Business Semantics, and the AI tool catalog work all point in promising directions. The question is whether the project governance evolves to match the technical openness. Watch also for the specific federation story with Polaris and Glue. If Unity Catalog can credibly federate with the catalogs you are likely to encounter elsewhere in your environment, that changes the calculation significantly.


The Bottom Line

The format war is over. Both creators have said so. The numbers confirm it: 78.6% exclusive Iceberg use, every cloud provider integrated, deletion vectors and type systems converging at the Parquet layer.

The catalog war is the architecture decision of 2026. The data catalog market is heading toward $10 billion. Gartner says active metadata adoption will grow 70-75% by 2027. The average enterprise draws from over 400 data sources. 66% of business data sits unused in silos. The catalog is the infrastructure that solves all of these problems, or creates the next generation of lock-in, depending on which one you choose and how you choose it.

The technology converges where vendors cannot monetize. It diverges where they can. Formats converge. Governance, AI governance, semantics, and maintenance diverge. Understand that boundary and you understand where lock-in actually lives.

Most CDAOs will let this decision happen to them rather than make it deliberately. The ones who frame it correctly, ask the four questions, and place a defensible bet will be the ones whose data architecture survives the agentic AI buildout intact.

The vendors will not frame this for you in June. That is what your architecture team is for.


The format war gave you a decade of paralysis. The catalog war will give you a decade of lock-in, unless you make the decision deliberately before June.

Specs catch up. Foundations are harder to retrofit. Bet on the governance structure, not the feature list.


Technology Reference

A quick reference for technologies, projects, and standards discussed in this post.

Catalogs (Technical)

Technology | What It Is | Link
Apache Polaris | Apache Top-Level Project (Feb 2026). Open-source Iceberg REST catalog. Vendor-neutral governance, credential vending, table maintenance. Reference implementation of the Iceberg REST spec. Originally contributed by Snowflake. | polaris.apache.org
Unity Catalog | Databricks catalog. Governs tables, ML models, AI tools, notebooks in a single namespace. Open-sourced under Apache 2.0 (June 2024). Over 10,000 enterprises in production. | unitycatalog.io
AWS Glue Data Catalog | AWS-native metastore. Default catalog for Athena, EMR, Redshift Spectrum. 39.3% adoption (market leader). AWS-only. | aws.amazon.com/glue
Apache Gravitino | Apache TLP (June 2025). "Catalog of catalogs." Federated metadata management across heterogeneous catalog sources. | gravitino.apache.org
Lakekeeper | Rust-native Iceberg REST catalog. Single binary, fast startup, low memory. Lightweight alternative to Polaris. | lakekeeper.io
Snowflake Horizon | Snowflake's proprietary governance catalog. Integrates with Polaris for portability. | snowflake.com
Google BigLake Metastore | GCP-native Iceberg REST catalog. Full REST support, tightly coupled to BigQuery and Dataproc. | cloud.google.com/bigquery/docs/biglake-intro
Microsoft OneLake | Microsoft Fabric's unified data lake. Provides Iceberg REST read APIs within the Fabric ecosystem. | learn.microsoft.com/fabric/onelake
Amazon S3 Tables | AWS service (Dec 2024) embedding Iceberg directly into the S3 storage layer. Claims 3x query performance. AWS-only, not open-sourced. | aws.amazon.com/s3/features/tables
Hive Metastore | Legacy catalog that both Iceberg and Delta were originally built to replace. Still widely deployed. | hive.apache.org

Query Engines

Technology | What It Is | Link
Snowflake | Cloud data warehouse. Polaris sponsor and contributor. | snowflake.com
Databricks | Unified analytics and AI platform. Creator of Delta Lake, acquirer of Tabular ($2B). Unity Catalog steward. | databricks.com
Firebolt | Cloud-native analytics engine for sub-second Iceberg queries. Native aggregating indexes, text search, vector search. Native OSI (semantic views) support. | firebolt.io
Dremio | Lakehouse query engine. Project Nessie (Git-like catalog semantics) merging into Polaris. | dremio.com
DuckDB | Embedded analytical database. In-process SQL, REST catalog support. Creator of DuckLake. | duckdb.org
Trino / Presto | Distributed SQL query engines for federated analytics across data sources. | trino.io
Apache Spark | Unified engine for large-scale data processing. Native Iceberg and Delta support. | spark.apache.org
Apache Flink | Stream and batch processing engine. Iceberg connector for streaming writes. | flink.apache.org
ClickHouse | Column-oriented OLAP database. Recognizes Lakekeeper as a supported Iceberg catalog. | clickhouse.com

Table Formats and Storage

Technology | What It Is | Link
Apache Iceberg | Open table format for huge analytic datasets. Engine-agnostic, ACID transactions, schema evolution, hidden partitioning. Created at Netflix by Ryan Blue. | iceberg.apache.org
Delta Lake | Open table format from Databricks. Transaction log-based, optimized for streaming and high-frequency writes. First open-sourced in 2019; the remaining proprietary features were open-sourced in 2022. | delta.io
Apache Parquet | Columnar storage file format. The on-disk layer underneath Iceberg, Delta, and Hudi. Both formats store data as Parquet files. | parquet.apache.org
DuckLake | Lightweight open table format by DuckDB Labs. Uses a SQL database such as PostgreSQL for metadata, Parquet for data. Created as a simpler alternative to Iceberg's complexity. | ducklake.select

Interoperability Standards and Tools

Technology | What It Is | Link
Iceberg REST Catalog API | OpenAPI specification for catalog operations. The de facto universal interface for all catalogs. | iceberg.apache.org/spec
Delta UniForm | Databricks feature exposing Delta tables as Iceberg metadata. External engines read them as Iceberg without copying data. | docs.databricks.com
Apache XTable | Cross-format translation (Iceberg, Delta, Hudi) without data copying. Formerly OneTable. | xtable.apache.org
OpenLineage | Open standard for lineage interoperability. Adopted by Airflow, Spark, dbt, Dataproc. LF AI & Data Graduate project. | openlineage.io
OSI (Open Semantic Interchange) | Semantic model specification in YAML. Defines business measures, dimensions, and filters consumable by any agent or engine. | osi.dev
MCP (Model Context Protocol) | Protocol for AI agents to discover and access tools, data, and context. Databricks is positioning Unity Catalog as the governance layer for MCP. | modelcontextprotocol.io