Why this exists

The Case Against
the Translation Layer

On search, AI, and what gets lost when someone else summarizes the record for you.

The problem

The most consequential information about your daily life is public. The government records it, indexes it, and publishes it. The DOT records every flight delay with tail number, cause code, and minute-level precision. NHTSA maintains crash test data for every vehicle sold in the United States, plus a database of every active recall and every fatality investigation. The National Library of Medicine indexes virtually every peer-reviewed biomedical study published in English. CMS publishes patient outcome scores for every hospital in the country.

Almost no one reads it. Not because it's hidden. Because it's not readable.

A flight delay report is a 47-column CSV. A crash test filing is a structured XML document. A clinical trial abstract assumes you understand what a hazard ratio is. The record exists. The interface to the record has never been built for a person making a decision.
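The 47-column file is tractable with nothing more than the standard library. A minimal sketch, using a three-row excerpt whose column names follow the published BTS on-time schema but whose rows are invented:

```python
import csv
import io

# A three-row excerpt in the shape of the BTS on-time performance file.
# Column names follow the published schema; the rows are invented.
SAMPLE = """FL_DATE,OP_UNIQUE_CARRIER,TAIL_NUM,ORIGIN,DEST,DEP_DELAY,ARR_DELAY,CANCELLED
2024-03-01,AS,N123AS,SEA,SFO,5.0,12.0,0
2024-03-01,AS,N456AS,SEA,LAX,-3.0,-8.0,0
2024-03-01,AS,N789AS,SEA,DEN,,,1
"""

def arrival_delay_stats(csv_text):
    """Return (flights, cancelled, mean arrival delay in minutes)."""
    flights = cancelled = 0
    delays = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flights += 1
        if row["CANCELLED"] == "1":
            cancelled += 1  # cancelled flights carry no delay value
            continue
        delays.append(float(row["ARR_DELAY"]))
    mean = sum(delays) / len(delays) if delays else 0.0
    return flights, cancelled, mean

print(arrival_delay_stats(SAMPLE))  # (3, 1, 2.0)
```

Twenty lines turn the filing into an answer. The gap was never the data; it was the twenty lines.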

Two industries promised to fix this. Neither did.

What search did

In 1998, Brin and Page published "The Anatomy of a Large-Scale Hypertextual Web Search Engine," introducing PageRank — the insight that a link from one document to another was an implicit citation, and that a document cited by many authoritative sources was likely authoritative itself.1 The algorithm was elegant. The mission was genuine: organize the world's information and make it universally accessible.

Then they discovered advertising.

The search index that exists today is not optimized for accuracy. It is optimized for engagement, which correlates with accuracy only loosely and degrades over time as publishers learn to optimize for the index instead of the reader. A search for "is vitamin D effective" returns manufacturer pages, supplement retailers, health media that depends on supplement advertising, and a small number of actual studies buried behind pagination. The signal and the noise are presented with identical authority.

Search doesn't show you what is true. It shows you what has been linked to. Those are not the same thing.

The deeper problem is structural. Search returns documents. The federal record is not a document — it is a database, a filing system, an XML feed. It does not have inbound links. It does not rank. The DOT's on-time performance data is not findable through search in any useful sense. You can find articles about it. You cannot find it.

What AI did

AI is a useful tool, and that is worth saying plainly. Large language models can synthesize vast amounts of text, explain complex topics accessibly, and reduce the time it takes to get oriented on an unfamiliar subject. These are genuine capabilities, and this project has no argument with them.

The specific problem is different. Fewer people are reading primary sources than they were ten years ago. More are outsourcing judgment — not just information retrieval, but the act of deciding what something means — to an interface that was trained to sound confident. The concern isn't that AI is wrong. It's that it's right enough, often enough, that people have stopped checking.

The mechanism is worth understanding. Large language models trained on human feedback — RLHF, as described by Ouyang et al. in 20222 — are optimized to generate responses that humans rate as helpful and authoritative. In practice, humans rate confident answers higher than uncertain ones, even when the uncertainty is warranted. The training signal rewards fluency and completion, not citation.

The result is a system that presents conclusions without showing its work. Ask an AI whether creatine supplementation improves athletic performance. It will tell you yes, with some qualifications. It will not show you the 53,000 studies in PubMed, the meta-analyses that found effect sizes ranging from negligible to significant depending on the population, the three large RCTs that contradict the headline finding, or the funding sources of the most-cited papers.

It cannot show you these things. It has no access to the primary record. It was trained on text that described the primary record, filtered through the same engagement-optimized search index described above, then further filtered by the preferences of human raters who had no more access to the original data than you do.
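That primary record is directly queryable. NCBI exposes PubMed through its E-utilities API; the sketch below only constructs an `esearch` URL (no request is made), and the query string is an illustrative example, not a recommended search strategy:

```python
from urllib.parse import urlencode

# NCBI E-utilities: the esearch endpoint returns PubMed IDs for a query.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, retmax=20):
    """Build an E-utilities esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}?{urlencode(params)}"

# Restrict to meta-analyses with a publication-type filter.
url = pubmed_search_url('creatine supplementation AND "meta-analysis"[pt]')
print(url)
```

The point is not that everyone should write this. It is that the record sits behind one HTTP call, not behind a paywall or a model.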

An AI translation isn't neutral. Something in the original doesn't survive the trip.

This is not a criticism of the technology. It is a description of what the technology is optimized for. RLHF-trained language models are extraordinarily good at synthesizing text into fluent, structured summaries. That capability is the right tool for many tasks. It is the wrong tool when the requirement is fidelity to primary sources, because synthesis by definition involves loss — and because the habit of outsourcing judgment is easier to build than it is to break.

The translational gap

Consider the specific case of a family deciding whether to purchase a vehicle. The available data is substantial. NHTSA's 5-Star Safety Ratings program generates frontal crash, side crash, and rollover scores for every rated vehicle, expressed as a star rating derived from physical tests with instrumented dummies.3 The agency also maintains a database of every Technical Service Bulletin and Recall Campaign, searchable by make, model, and year. Complaints from owners — more than 10,000 filed monthly — are publicly accessible and full-text searchable.
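The NHTSA record described above is reachable without a salesperson in the loop. The sketch builds lookup URLs against NHTSA's public recall and complaint endpoints. The paths `recallsByVehicle` and `complaintsByVehicle` match the public API as documented at the time of writing; treat them as an assumption and verify before depending on them. No request is made here:

```python
from urllib.parse import urlencode

# NHTSA publishes recall and complaint lookups as simple HTTP endpoints,
# keyed by make, model, and model year.
NHTSA = "https://api.nhtsa.gov"

def recalls_url(make, model, year):
    """URL listing every active recall campaign for one vehicle."""
    q = urlencode({"make": make, "model": model, "modelYear": year})
    return f"{NHTSA}/recalls/recallsByVehicle?{q}"

def complaints_url(make, model, year):
    """URL listing every owner-submitted complaint for one vehicle."""
    q = urlencode({"make": make, "model": model, "modelYear": year})
    return f"{NHTSA}/complaints/complaintsByVehicle?{q}"

print(recalls_url("honda", "odyssey", 2020))
```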

The typical consumer interaction with this data is to ask a salesperson, read a magazine review, or — increasingly — ask an AI assistant. None of these channels surfaces the primary record. The salesperson has an incentive. The review is written by a journalist who did not read the crash test methodology. The AI summarizes text that described summaries of the original data.

At each step in that chain, something is lost. The specific star ratings. The raw injury probability scores underlying the stars. The open recall status. The pattern of complaints that appears in the database before a recall is issued.

The data that would change the decision is available. The interface that would surface it has never been built.

The sources

Source | What it contains | Update frequency
BTS Form 41 / ATADS | Flight-level on-time, cancellation, and cause data for all US carriers, reported monthly and publicly available via the DOT Bureau of Transportation Statistics. | Monthly
FAA NASSTATUS | Real-time National Airspace System status: ground stops, ground delay programs, and arrival/departure delays at every US airport, updated continuously. | Live
NHTSA 5-Star Ratings | Frontal, side, and rollover crash test scores for every tested vehicle, expressed as stars derived from probability of serious injury in a standardized test. | Per model year
NHTSA Complaints & Recalls | Every owner-submitted complaint and every active recall campaign, searchable by vehicle; roughly 10,000 complaints filed per month. | Continuous
PubMed / MEDLINE | 35+ million citations to peer-reviewed biomedical literature, indexed by the National Library of Medicine and searchable by MeSH term, title, abstract, author, and publication type. | Daily

Why these seven

The federal government records a great deal of data. Choosing seven interfaces required a selection principle. The principle was not what was easiest to access. It was what humans have always needed to know before making consequential decisions.

Before every significant move, every tribe has gathered intel. The oldest categories of human intelligence map to three questions: Can I move safely? Is this safe to consume? Can I be healed? Every significant personal risk in modern life falls into one of these three territories — and the federal record, by structural accident, covers them almost exactly.

Territory | The primal question | Primitives
Move | Is this journey safe? | Flying, Cars
Sustain | Is this safe to consume? | Food, Water
Heal | Can I trust what heals me? | Health, Hospitals, Drugs

The seven were chosen because these domains share three properties no other category of federal data has in combination: the stakes are high enough to change a decision, the gap between published record and public awareness is wide enough to matter, and the data is specific enough to actually answer the question.

The other criterion was independence. These interfaces have no incentive other than the record. No airline benefits from you reading the DOT delay database. No pharmaceutical company benefits from you reading the FAERS adverse event reports. No hospital system benefits from you reading the CMS outcomes data. The commercial incentive runs in exactly the opposite direction — which is precisely why the interface hasn't been built.

What Open Primitive does

Open Primitive is not an AI. It does not summarize. It does not interpret. It connects to primary-source federal databases and renders their output in a form a person can read and act on.

The design constraint is strict: every number shown must be traceable to a specific filing, a specific database record, a specific row in a government dataset. If the underlying data is ambiguous, the interface shows the ambiguity. If the data is missing, it says so.

The current tools surface three categories of federal data where the gap between the published record and public awareness is widest and the stakes of the decisions involved are highest: transportation safety, vehicle safety, and health evidence.

More will follow. Hospital quality scores from CMS HCAHPS. Water quality data from EPA ECHO. Drug interaction data from the FDA Adverse Event Reporting System. The pattern is the same in each case: the record exists, the interface doesn't, and the decisions people make without it are worse than they need to be.

What it doesn't do

Open Primitive does not tell you what to decide. It shows you what the record says. The difference matters.

A tool that tells you "Alaska Airlines is the safest choice" has made a judgment about how to weight on-time performance against cancellation rates against your specific route and departure time. That judgment belongs to you. A tool that shows you the DOT data, the FAA live status, and the BTS track record for each carrier gives you what you need to make it.

The goal is not to replace judgment. It is to restore the precondition for it: access to the actual record, in a form you can read.

Open questions

Several design problems remain unsolved. Federal databases are not designed for real-time consumer queries — rate limits, schema changes, and data gaps are operational realities, not edge cases. The line between "making data readable" and "making data interpretive" is not always clear; a composite score that ranks airlines involves weighting choices that embed assumptions. As more tools are added, the question of what counts as "primary source" becomes harder to answer cleanly.
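The rate-limit problem named above has a standard mitigation. A minimal sketch of exponential backoff with jitter, assuming a caller-supplied `fetch` function; the retry counts and delays are illustrative defaults, not tuned values:

```python
import random
import time

def fetch_with_backoff(fetch, retries=4, base_delay=0.5):
    """Call fetch(); on failure, wait base_delay * 2^attempt (with jitter) and retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # exhausted: surface the failure rather than hide it
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Backoff handles transient failures; it does nothing for schema changes or data gaps, which is why the interface must be able to say "no data" instead of guessing.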

These problems are worth naming because they are not decorative. They are the reasons no one has built this before. The commercial incentive runs in the opposite direction: AI answers are cheaper to produce, easier to scale, and more satisfying to users in the short term than raw data interfaces. The correct answer to "is this car safe" is not a star rating. It is a 90-minute reading session. Most people will not do that. Open Primitive tries to make it possible in five minutes.

Whether that is enough is an open question. The goal is to make the primary record accessible enough that more people use it, more often, for the decisions where it matters most.

References
  1. Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
  2. Ouyang, L. et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155.
  3. NHTSA (2024). New Car Assessment Program (NCAP) 5-Star Safety Ratings Methodology. US Department of Transportation.
  4. Bureau of Transportation Statistics (2024). Airline On-Time Statistics and Delay Causes. Form 41 / ATADS. US Department of Transportation.
  5. National Library of Medicine (2024). PubMed Overview. US National Institutes of Health.
The primitives

The argument is only as good as the interfaces it produced. These are the seven.