Citations are the most visible layer, but they are not enough by themselves. The stronger test is whether a reviewer can move from a claim to the exact supporting material and check it.
OpenAI’s clearest provenance requirement in these sources appears in the Deep Research documentation: when web results, or information from web results, are shown to end users, inline citations should be clearly visible and clickable . That matters because provenance is weaker when links are hidden in metadata or detached from the claims they support.
OpenAI also provides citation-formatting guidance for preparing citable material and instructing a model to format citations effectively . Its Deep Research API example says responses include a structured final answer with inline citations, summaries of reasoning steps, and source information
. OpenAI’s Help Center similarly says Deep Research outputs include citations or source links so users can verify information
.
That supports a limited but important conclusion: OpenAI is explicit in these documents about citation presentation for web-research workflows. It does not prove that every citation is accurate, and it does not establish anything model-specific about GPT-5.5 Spud.
Anthropic’s documentation is strongest here on Claude Opus 4.7 positioning and document-based citation mechanics. Anthropic describes Claude Opus 4.7 as part of the latest Claude generation and recommends it for the most complex tasks as the company’s most capable generally available model .
For provenance, the key Anthropic source is its citations documentation. It says Claude can provide detailed citations when answering questions about documents, helping users track and verify information sources, when documents are provided and citations are enabled . It also describes citation granularity: plain-text and PDF documents are automatically chunked into sentences by default, while custom content documents can be used when developers need finer control
.
Anthropic’s PDF support documentation adds another provenance-related detail: visual PDF analysis in the Converse API requires citations to be enabled . Anthropic also documents a Files API that lets developers upload and manage files for Claude API use without re-uploading the same content on each request
. File handling is not proof of citation accuracy, but it can support a stronger audit trail when paired with stored sources and claim-level citations.
The biggest trap in evaluating “research provenance” is treating a model’s thinking artifacts as evidence. They are not the same thing.
OpenAI’s reasoning best-practices page says reasoning models perform reasoning internally and advises developers not to prompt them to think step by step or explain their chain of thought . OpenAI’s reasoning-models guide focuses on controls such as reasoning effort, reasoning tokens, and keeping reasoning state across turns
.
Anthropic exposes more terminology around thinking mechanics. Its prompt-caching documentation says thinking blocks have special behavior when extended thinking is used with prompt caching . Its extended-thinking documentation distinguishes full thinking tokens from summarized output in Claude 4 and later models
. Anthropic release notes describe a display field that can omit thinking content from responses, and Claude Code docs say adding
ultrathink to a skill enables extended thinking in that skill .
Those features can help developers tune complex workflows. But a scratchpad, hidden chain of thought, or summarized reasoning trail does not establish that a factual claim came from a specific URL, document, or file. Treat reasoning artifacts as secondary context, not as a source audit trail.
Instead of choosing by model name alone, evaluate whether the whole research workflow can survive review.
The reviewed documents support a nuanced comparison, not a leaderboard. OpenAI is better evidenced here for user-facing web-citation requirements because Deep Research explicitly calls for visible, clickable inline citations when web-derived information is shown to users . Anthropic is better evidenced here for document-grounded Claude citations because its docs describe enabling citations on supplied documents and controlling citation granularity through sentence chunking and custom content
.
Claude Opus 4.7 is documented as Anthropic’s most capable generally available model for complex tasks, but the OpenAI model-specific source reviewed here is GPT-5.4, not GPT-5.5 Spud . If the goal is auditable AI research, compare source capture, citation granularity, and validation practices before comparing model names.
Comments
0 comments