Monday, June 15, 2026
Google Cloud Open Knowledge Format: Standardizing Knowledge for AI Agents
Posted by

What Is OKF?
The Open Knowledge Format (OKF) is Google Cloud's open specification for representing organizational knowledge in a portable, agent-readable format. Announced June 12, 2026 by Sam McVeety and Amir Hormati on the Google Cloud Blog, v0.1 is intentionally minimal: a directory of markdown files with YAML frontmatter.
That's it. No schema registry, no central authority, no required tooling. If you can cat a file you can read OKF; if you can git clone a repo you can ship it.
OKF formalizes what Andrej Karpathy called the "LLM Wiki" pattern — the practice of storing curated knowledge as a markdown library that both humans and AI agents can read. What was an emergent convention across Obsidian vaults, AGENTS.md files, and project wikis now has a spec.
The Problem It Solves
Organizational knowledge lives in too many places:
- Schema registries (Dataplex, Unity Catalog, Collibra) know column types but not what columns mean.
- Runbooks and wikis (Notion, Confluence) have context but no structure an agent can parse.
- Code comments and READMEs are close to the data but inconsistent.
- Human heads contain the most valuable knowledge and walk out the door.
Every team building AI agents spends the first sprint writing a custom parser for their internal wiki, a scraper for their metadata catalog, and a prompt template that glues it all together. That's the context-assembly problem — and it's entirely redundant across every organization that builds agents.
OKF says: standardize the format, leave the tooling to the ecosystem.
Technical Reference
Bundle Structure
An OKF bundle is a directory tree of markdown files. The directory structure is domain-independent — producers organize concepts however makes sense.
path/to/bundle/
├── index.md # Optional. Directory listing.
├── log.md # Optional. Update history.
├── <concept>.md # A concept at the bundle root.
└── <subdirectory>/
├── index.md
├── <concept>.md
└── <subdirectory>/
└── …
A bundle can be distributed as a git repository (recommended), a tarball, a zip archive, or a subdirectory within a larger repo.
Reserved Files
| Filename | Purpose |
|---|---|
index.md | Directory listing for progressive disclosure. No frontmatter (except optional okf_version at root). |
log.md | Chronological change history. ISO 8601 date headings. |
Frontmatter Fields
Every concept document has a YAML frontmatter block delimited by ---.
---
type: <Type name> # REQUIRED — consumer routing key
title: <Optional display name>
description: <Optional summary>
resource: <Optional canonical URI>
tags: [<tag>, <tag>, …]
timestamp: <ISO 8601 datetime>
# … any additional key/value pairs
---
Only type is required. Everything else is optional. Producers can add arbitrary keys; consumers must tolerate unknown ones.
The type field is not registered centrally. Example values: BigQuery Table, API Endpoint, Metric, Playbook, Reference. Consumers must handle unknown types gracefully (treat as generic concept).
Conventional Body Sections
The body is standard markdown. These headings have conventional meaning:
| Heading | Purpose |
|---|---|
# Schema | Structured description of columns/fields. |
# Examples | Concrete usage examples, often fenced code blocks. |
# Citations | External sources backing claims in the body. |
Cross-linking
Concepts link to each other with standard markdown links. Two forms:
- Bundle-relative (recommended):
/tables/customers.md— stable when files move within subdirectories. - Relative:
./other.md— standard markdown paths.
Link semantics are untyped — the relationship is conveyed by the surrounding prose. Consumers must tolerate broken links (they represent not-yet-written knowledge).
Example Concept
---
type: BigQuery Table
title: Customer Orders
description: One row per completed customer order across all channels.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, orders, revenue]
timestamp: 2026-05-28T14:30:00Z
---
# Schema
| Column | Type | Description |
|---------------|-----------|------------------------------------------|
| `order_id` | STRING | Globally unique order identifier. |
| `customer_id` | STRING | Foreign key into [customers](/tables/customers.md). |
| `total_usd` | NUMERIC | Order total in US dollars. |
| `placed_at` | TIMESTAMP | When the customer submitted the order. |
# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.
# Citations
[1] [BigQuery table schema](https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders)
Conformance Rules
A bundle is conformant with OKF v0.1 if:
- Every non-reserved
.mdfile contains parseable YAML frontmatter. - Every frontmatter block has a non-empty
typefield. - Reserved files (
index.md,log.md) follow their defined structure.
Consumers must not reject a bundle because of:
- Missing optional frontmatter fields.
- Unknown
typevalues. - Unknown extra frontmatter keys.
- Broken cross-links.
- Missing
index.mdfiles.
Three Design Principles
OKF is built on three principles that constrain what it standardizes and — just as importantly — what it does not.
1. Minimally Opinionated
Only type is required. The content model is left entirely to the producer. A "Playbook" document about incident response and a "Metric" document describing weekly active users are both valid OKF. There is no enforced taxonomy, no schema registry, no required sections.
Why this matters: Organizations that enforce rigid schemas end up with empty databases. OKF lets teams start with whatever structure makes sense and evolve it. The only cost of entry is picking a type string.
2. Producer/Consumer Independence
The tool that writes an OKF bundle does not need to be the same tool that reads it. Google's BigQuery enrichment agent produces OKF; its HTML visualizer consumes it. But nothing ties you to those tools — you can author OKF by hand in VS Code and consume it from a custom Python script or an Obsidian vault.
Decoupling means the ecosystem grows without coordination. Any producer that outputs OKF automatically feeds every consumer.
3. Format, Not Platform
OKF is not tied to Google Cloud, BigQuery, Knowledge Catalog, or any service. It is a file format — a directory of markdown files. You can host it on GitHub, serve it from S3, mount it in a Docker container, or check it into your monorepo.
There is no required SDK, no API key, no cloud account. The spec lives at GoogleCloudPlatform/knowledge-catalog/okf/SPEC.md under an open license.
OKF vs MCP vs Other Knowledge Standards
A common question: how does OKF relate to the Model Context Protocol (MCP) and other patterns?
Knowledge Standards Comparison
Values: Complementary. MCP is for *finding and acting* on live systems. OKF is for *curated static knowledge* about those systems. An MCP server could serve OKF documents; an agent could use MCP to write OKF and OKF to inform its MCP tool choices.
Values: Different scope. CLAUDE.md is about agent *behavior* in a project. OKF is about the *domain knowledge* that project works with. They complement each other — CLAUDE.md tells the agent how to act; OKF tells it what it needs to know.
Values: OKF formalizes this pattern. The LLM Wiki proved the concept; OKF adds the small set of interoperability rules (frontmatter, reserved files, conformance) that let different wikis cooperate without tool-specific parsing.
Values: OKF is a vendor-neutral alternative. A catalog can *export* to OKF (Google already updated Knowledge Catalog to do this), but OKF itself is a file format you own, not a service you subscribe to.
Values: OKF is structurally similar but emphasizes agent consumption and cross-org portability. Obsidian vaults are the closest non-AI parallel to what OKF standardizes for agents.
Values: OKF bundles can be rendered by these tools with minimal adaptation. The difference is audience: docs tools optimize for human reading; OKF optimizes for agent traversal.
OKF + MCP = Full Stack
The most interesting combination is OKF and MCP working together:
| Layer | Protocol | Job |
|---|---|---|
| Knowledge | OKF | Curated static context about your data and systems |
| Tools | MCP | Live access to query, mutate, and act on systems |
| Memory | AGENTS.md / CLAUDE.md | Project-specific behavior instructions |
An agent uses AGENTS.md to understand the project, OKF to understand the domain, and MCP to take action. No single standard covers all three.
Reference Implementations
Google published three things alongside the spec to make it tangible:
BigQuery Enrichment Agent
An LLM-based tool that crawls BigQuery datasets and auto-generates OKF documents. Runs in two passes:
- BQ pass — Reads BigQuery metadata (schemas, descriptions, foreign keys) and writes one OKF document per table, view, or dataset.
- Web pass — Uses an LLM to crawl documentation URLs and enrich the generated documents with context from external sources.
The agent is built on Google's Agent Development Kit (ADK) with Gemini as the model backend. Source interface is pluggable — BigQuery is the first implementation.
Static HTML Visualizer
A self-contained HTML file (zero dependencies, no backend) that renders any OKF bundle as an interactive force-directed graph. Built on Cytoscape.js with an in-browser markdown renderer. Features:
- Colored node types (datasets, tables, references, playbooks)
- Directed edges from cross-links
- Detail panel with rendered markdown body
- Backlinks ("Cited by") view
- Search, type filter, switchable graph layouts
You can commit viz.html next to your bundle and share it as a static artifact.
Sample Bundles
Three ready-to-browse bundles checked into the GitHub repo:
- GA4 e-commerce — Google Merchandise Store public dataset
- Stack Overflow — Public dataset from the Stack Exchange Data Dump
- Bitcoin (crypto) — Blocks, transactions, inputs, outputs from bitcoin-etl
Each comes with a viz.html you can open in any browser.
How to Structure Agent-Readable Knowledge
OKF gives you the container. What you put in it determines whether your agent actually uses it well.
Organize by Domain, Not by Source
Bad:
bigquery_tables.md
confluence_playbooks.md
datadog_dashboards.md
Good:
sales/
datasets/
sales_data.md
tables/
orders.md
customers.md
metrics/
weekly_active_users.md
playbooks/
freshness_alert.md
Group by what the concepts mean in your domain, not where they came from. Agents navigate by concept relationships, not by origin system.
Write Dense, Structured Bodies
Agents (especially LLMs) extract more reliable information from structured markdown than from prose paragraphs. Prefer:
# Schema
| Column | Type | Description |
|--------|------|-------------|
# Joins
Joined with [customers](/tables/customers.md) on customer_id.
Over:
The orders table has various columns and is often joined with the customers table.
Link Aggressively
Every link is a relationship the agent can follow. If a playbook references a table, link to it. If a metric depends on a dataset, link it. The directory hierarchy gives you one dimension of organization; cross-links give you the graph.
Use Tags for Cross-Cutting Concerns
The tags frontmatter field is your secondary indexing mechanism. Tag by team (sales, finance), by domain (compliance, real-time), by lifecycle (deprecated, experimental). Consumers can filter on tags for scoped retrieval.
Keep Concept Files Narrow
One concept = one file. If a document covers two unrelated topics, split it. Agents traverse the bundle by following links from narrow, focused documents — a monolithic file defeats progressive disclosure.
Pitfalls and Open Questions
Contradiction Handling
OKF v0.1 has no semantics for when two documents disagree. If one concept says the data latency is 30 minutes and another says 5 minutes, which does the agent trust? Future versions may introduce version pinning or provenance fields.
Name Collisions
The acronym "OKF" already has meaning in supply-chain security (Open Knowledge Format for Supply Chain Information Security). Google's OKF is unrelated. Expect confusion in search results and documentation for at least another release cycle.
Trust and Provenance
OKF is a file format, not a trust system. In a multi-writer environment — where humans, enrichment agents, and CI pipelines all produce documents — there is no built-in mechanism for authority or review status. Current best practice: rely on git commit history for attribution and treat the repo's PR workflow as the trust layer.
No Version for Bundle Content
The okf_version field declares which spec version the bundle targets, not the version of the knowledge itself. Consumers that need to cache or invalidate bundles must implement their own versioning scheme (tagged git releases, content hashes, etc.).
The "Just Markdown" Trap
OKF's minimalism is its biggest strength and its biggest risk. Without schema enforcement, teams can produce inconsistent, low-quality bundles that are technically conformant but practically useless. The cure: treat OKF bundles like code — review them, lint them, test them.
Getting Started
- Read the spec —
okf/SPEC.mdin the knowledge-catalog repo. - Clone a sample bundle —
bundles/ga4/shows a complete example. - Write three concept files by hand — Pick three related things your team knows about (a table, a metric, a runbook), write each as an OKF document, link them together.
- Commit the bundle to your repo — Next to your code, not in a separate system.
- Point your agent at it — Load the bundle directory into context, or write a retrieval step that searches it.
If you're on Google Cloud, the Enrichment Agent can auto-generate the first pass from BigQuery metadata. If you're not, a 30-line script that walks your schema and writes markdown files will get you 80% of the value.
OKF is not a platform. It's not a service. It's a file format that makes organizational knowledge portable, diffable, and agent-readable. The value is proportional to how many producers write it and how many consumers read it — so the best time to start is before the ecosystem arrives.