Why this exists

The Case Against
the Translation Layer

On search, AI, and what gets lost when someone else summarizes the record for you.

The problem

The most consequential information about your daily life is public. The government records it, indexes it, and publishes it. The DOT records every flight delay with tail number, cause code, and minute-level precision. NHTSA maintains crash test data for every vehicle sold in the United States, plus a database of every active recall and every fatality investigation. The National Library of Medicine indexes virtually every peer-reviewed biomedical study published in English. CMS publishes patient outcome scores for every hospital in the country.

Almost no one reads it. Not because it's hidden. Because it's not readable.

A flight delay report is a 47-column CSV. A crash test filing is a structured XML document. A clinical trial abstract assumes you understand what a hazard ratio is. The record exists. The interface to the record has never been built for a person making a decision.
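The 47-column file is tractable with nothing more than the standard library. A minimal sketch, using a three-row excerpt whose column names follow the published BTS on-time schema but whose rows are invented:

```python
import csv
import io

# A three-row excerpt in the shape of the BTS on-time performance file.
# Column names follow the published schema; the rows are invented.
SAMPLE = """FL_DATE,OP_UNIQUE_CARRIER,TAIL_NUM,ORIGIN,DEST,DEP_DELAY,ARR_DELAY,CANCELLED
2024-03-01,AS,N123AS,SEA,SFO,5.0,12.0,0
2024-03-01,AS,N456AS,SEA,LAX,-3.0,-8.0,0
2024-03-01,AS,N789AS,SEA,DEN,,,1
"""

def arrival_delay_stats(csv_text):
    """Return (flights, cancelled, mean arrival delay in minutes)."""
    flights = cancelled = 0
    delays = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flights += 1
        if row["CANCELLED"] == "1":
            cancelled += 1  # cancelled flights carry no delay value
            continue
        delays.append(float(row["ARR_DELAY"]))
    mean = sum(delays) / len(delays) if delays else 0.0
    return flights, cancelled, mean

print(arrival_delay_stats(SAMPLE))  # (3, 1, 2.0)
```

Twenty lines turn the filing into an answer. The gap was never the data; it was the twenty lines.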

Two industries promised to fix this. Neither did.

What search did

In 1998, Brin and Page published "The Anatomy of a Large-Scale Hypertextual Web Search Engine," introducing PageRank — the insight that a link from one document to another was an implicit citation, and that a document cited by many authoritative sources was likely authoritative itself.1 The algorithm was elegant. The mission was genuine: organize the world's information and make it universally accessible.

Then they discovered advertising.

The search index that exists today is not optimized for accuracy. It is optimized for engagement, which correlates with accuracy only loosely and degrades over time as publishers learn to optimize for the index instead of the reader. A search for "is vitamin D effective" returns manufacturer pages, supplement retailers, health media that depends on supplement advertising, and a small number of actual studies buried behind pagination. The signal and the noise are presented with identical authority.

Search doesn't show you what is true. It shows you what has been linked to. Those are not the same thing.

The deeper problem is structural. Search returns documents. The federal record is not a document — it is a database, a filing system, an XML feed. It does not have inbound links. It does not rank. The DOT's on-time performance data is not findable through search in any useful sense. You can find articles about it. You cannot find it.

What AI did

AI is a useful tool, and that is worth saying plainly. Large language models can synthesize vast amounts of text, explain complex topics accessibly, and reduce the time it takes to get oriented on an unfamiliar subject. These are genuine capabilities, and this project has no argument with them.

The specific problem is different. Fewer people are reading primary sources than they were ten years ago. More are outsourcing judgment — not just information retrieval, but the act of deciding what something means — to an interface that was trained to sound confident. The concern isn't that AI is wrong. It's that it's right enough, often enough, that people have stopped checking.

The mechanism is worth understanding. Large language models trained on human feedback — RLHF, as described by Ouyang et al. in 20222 — are optimized to generate responses that humans rate as helpful and authoritative. In practice, humans rate confident answers higher than uncertain ones, even when the uncertainty is warranted. The training signal rewards fluency and completion, not citation.

The result is a system that presents conclusions without showing its work. Ask an AI whether creatine supplementation improves athletic performance. It will tell you yes, with some qualifications. It will not show you the 53,000 studies in PubMed, the meta-analyses that found effect sizes ranging from negligible to significant depending on the population, the three large RCTs that contradict the headline finding, or the funding sources of the most-cited papers.

It cannot show you these things. It has no access to the primary record. It was trained on text that described the primary record, filtered through the same engagement-optimized search index described above, then further filtered by the preferences of human raters who had no more access to the original data than you do.
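That primary record is directly queryable. NCBI exposes PubMed through its E-utilities API; the sketch below only constructs an `esearch` URL (no request is made), and the query string is an illustrative example, not a recommended search strategy:

```python
from urllib.parse import urlencode

# NCBI E-utilities: the esearch endpoint returns PubMed IDs for a query.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, retmax=20):
    """Build an E-utilities esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}?{urlencode(params)}"

# Restrict to meta-analyses with a publication-type filter.
url = pubmed_search_url('creatine supplementation AND "meta-analysis"[pt]')
print(url)
```

The point is not that everyone should write this. It is that the record sits behind one HTTP call, not behind a paywall or a model.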

An AI translation isn't neutral. Something in the original doesn't survive the trip.

This is not a criticism of the technology. It is a description of what the technology is optimized for. RLHF-trained language models are extraordinarily good at synthesizing text into fluent, structured summaries. That capability is the right tool for many tasks. It is the wrong tool when the requirement is fidelity to primary sources, because synthesis by definition involves loss — and because the habit of outsourcing judgment is easier to build than it is to break.

The translational gap

Consider the specific case of a family deciding whether to purchase a vehicle. The available data is substantial. NHTSA's 5-Star Safety Ratings program generates frontal crash, side crash, and rollover scores for every rated vehicle, expressed as a star rating derived from physical tests with instrumented dummies.3 The agency also maintains a database of every Technical Service Bulletin and Recall Campaign, searchable by make, model, and year. Complaints from owners — more than 10,000 filed monthly — are publicly accessible and full-text searchable.
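The NHTSA record described above is reachable without a salesperson in the loop. The sketch builds lookup URLs against NHTSA's public recall and complaint endpoints. The paths `recallsByVehicle` and `complaintsByVehicle` match the public API as documented at the time of writing; treat them as an assumption and verify before depending on them. No request is made here:

```python
from urllib.parse import urlencode

# NHTSA publishes recall and complaint lookups as simple HTTP endpoints,
# keyed by make, model, and model year.
NHTSA = "https://api.nhtsa.gov"

def recalls_url(make, model, year):
    """URL listing every active recall campaign for one vehicle."""
    q = urlencode({"make": make, "model": model, "modelYear": year})
    return f"{NHTSA}/recalls/recallsByVehicle?{q}"

def complaints_url(make, model, year):
    """URL listing every owner-submitted complaint for one vehicle."""
    q = urlencode({"make": make, "model": model, "modelYear": year})
    return f"{NHTSA}/complaints/complaintsByVehicle?{q}"

print(recalls_url("honda", "odyssey", 2020))
```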

The typical consumer interaction with this data is to ask a salesperson, read a magazine review, or — increasingly — ask an AI assistant. None of these channels surfaces the primary record. The salesperson has an incentive. The review is written by a journalist who did not read the crash test methodology. The AI summarizes text that described summaries of the original data.

At each step in that chain, something is lost. The specific star ratings. The raw injury probability scores underlying the stars. The open recall status. The pattern of complaints that appears in the database before a recall is issued.

The data that would change the decision is available. The interface that would surface it has never been built.

The sources

Source | What it contains | Update frequency
BTS Form 41 / ATADS | Flight-level on-time, cancellation, and cause data for all US carriers, reported monthly and publicly available via the DOT Bureau of Transportation Statistics. | Monthly
FAA NASSTATUS | Real-time National Airspace System status: ground stops, ground delay programs, and arrival/departure delays at every US airport, updated continuously. | Live
NHTSA 5-Star Ratings | Frontal, side, and rollover crash test scores for every tested vehicle, expressed as stars derived from probability of serious injury in a standardized test. | Per model year
NHTSA Complaints & Recalls | Every owner-submitted complaint and every active recall campaign, searchable by vehicle; roughly 10,000 complaints filed per month. | Continuous
PubMed / MEDLINE | 35+ million citations to peer-reviewed biomedical literature, indexed by the National Library of Medicine and searchable by MeSH term, title, abstract, author, and publication type. | Daily

Why these seven

The federal government records a great deal of data. Choosing seven interfaces required a selection principle. The principle was not what was easiest to access. It was what humans have always needed to know before making consequential decisions.

Before every significant move, every tribe has gathered intel. The oldest categories of human intelligence map to three questions: Can I move safely? Is this safe to consume? Can I be healed? Every significant personal risk in modern life falls into one of these three territories — and the federal record, by structural accident, covers them almost exactly.

Territory | The primal question | Primitives
Move | Is this journey safe? | Flying, Cars
Sustain | Is this safe to consume? | Food, Water
Heal | Can I trust what heals me? | Health, Hospitals, Drugs

The seven were chosen because these domains share three properties no other category of federal data has in combination: the stakes are high enough to change a decision, the gap between published record and public awareness is wide enough to matter, and the data is specific enough to actually answer the question.

The other criterion was independence. These interfaces have no incentive other than the record. No airline benefits from you reading the DOT delay database. No pharmaceutical company benefits from you reading the FAERS adverse event reports. No hospital system benefits from you reading the CMS outcomes data. The commercial incentive runs in exactly the opposite direction — which is precisely why the interface hasn't been built.

What Open Primitive does

Open Primitive is not an AI. It does not summarize. It does not interpret. It connects to primary-source federal databases and renders their output in a form a person can read and act on.

The design constraint is strict: every number shown must be traceable to a specific filing, a specific database record, a specific row in a government dataset. If the underlying data is ambiguous, the interface shows the ambiguity. If the data is missing, it says so.

The current tools surface three categories of federal data where the gap between the published record and public awareness is widest and the stakes of the decisions involved are highest: transportation safety, vehicle safety, and health evidence.

More will follow. Hospital quality scores from CMS HCAHPS. Water quality data from EPA ECHO. Drug interaction data from the FDA Adverse Event Reporting System. The pattern is the same in each case: the record exists, the interface doesn't, and the decisions people make without it are worse than they need to be.

What it doesn't do

Open Primitive does not tell you what to decide. It shows you what the record says. The difference matters.

A tool that tells you "Alaska Airlines is the safest choice" has made a judgment about how to weight on-time performance against cancellation rates against your specific route and departure time. That judgment belongs to you. A tool that shows you the DOT data, the FAA live status, and the BTS track record for each carrier gives you what you need to make it.

The goal is not to replace judgment. It is to restore the precondition for it: access to the actual record, in a form you can read.

Open questions

Several design problems remain unsolved. Federal databases are not designed for real-time consumer queries — rate limits, schema changes, and data gaps are operational realities, not edge cases. The line between "making data readable" and "making data interpretive" is not always clear; a composite score that ranks airlines involves weighting choices that embed assumptions. As more tools are added, the question of what counts as "primary source" becomes harder to answer cleanly.
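The rate-limit problem named above has a standard mitigation. A minimal sketch of exponential backoff with jitter, assuming a caller-supplied `fetch` function; the retry counts and delays are illustrative defaults, not tuned values:

```python
import random
import time

def fetch_with_backoff(fetch, retries=4, base_delay=0.5):
    """Call fetch(); on failure, wait base_delay * 2^attempt (with jitter) and retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # exhausted: surface the failure rather than hide it
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Backoff handles transient failures; it does nothing for schema changes or data gaps, which is why the interface must be able to say "no data" instead of guessing.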

These problems are worth naming because they are not decorative. They are the reasons no one has built this before. The commercial incentive runs in the opposite direction: AI answers are cheaper to produce, easier to scale, and more satisfying to users in the short term than raw data interfaces. The correct answer to "is this car safe" is not a star rating. It is a 90-minute reading session. Most people will not do that. Open Primitive tries to make it possible in five minutes.

Whether that is enough is an open question. The goal is to make the primary record accessible enough that more people use it, more often, for the decisions where it matters most.

References
  1. Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
  2. Ouyang, L. et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155.
  3. NHTSA (2024). New Car Assessment Program (NCAP) 5-Star Safety Ratings Methodology. US Department of Transportation.
  4. Bureau of Transportation Statistics (2024). Airline On-Time Statistics and Delay Causes. Form 41 / ATADS. US Department of Transportation.
  5. National Library of Medicine (2024). PubMed Overview. US National Institutes of Health.
The primitives

The argument is only as good as the interfaces it produced. These are the seven.