05. Conformance And Proof

This workstream exists to make the project's claims provable, not just plausible.

Primary inspiration:

TheFellow-fkyeah for black-box conformance discipline
smartcomputer-ai-forge for conformance decomposition and gap-management rigor
aliciapaz-attractor-rb for readable divergence tracking

Goal

Build and maintain a benchmark-grade black-box conformance suite with a visible scoreboard tied to executable evidence.

Implemented Surface

The repository now exposes a dedicated conformance harness in test/attractor_ex/conformance/ with one suite per domain:

parsing_conformance_test.exs
runtime_conformance_test.exs
state_conformance_test.exs
transport_conformance_test.exs
agent_loop_conformance_test.exs
unified_llm_conformance_test.exs

Shared fixture data for those suites lives in test/support/attractor_ex_conformance_fixtures.ex.

The published scoreboard is maintained in AttractorPhoenix.Conformance and rendered on the LiveView benchmark page at /benchmark.

Benchmark Matrix

Current black-box scorecard:

| Domain | Score | Evidence |
| --- | --- | --- |
| Parsing | `4.5` | `mix test test/attractor_ex/conformance/parsing_conformance_test.exs` |
| Runtime | `4.0` | `mix test test/attractor_ex/conformance/runtime_conformance_test.exs` |
| State | `3.5` | `mix test test/attractor_ex/conformance/state_conformance_test.exs` |
| Transport | `4.0` | `mix test test/attractor_ex/conformance/transport_conformance_test.exs` |
| Agent loop | `4.0` | `mix test test/attractor_ex/conformance/agent_loop_conformance_test.exs` |
| Unified LLM | `4.0` | `mix test test/attractor_ex/conformance/unified_llm_conformance_test.exs` |

Composite conformance score: 4.0
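
The source does not say how the composite is computed. One reading that reproduces the published value, assuming an unweighted arithmetic mean over the six domain scores in the matrix, is:

```latex
\[
\text{composite}
  = \frac{4.5 + 4.0 + 3.5 + 4.0 + 4.0 + 4.0}{6}
  = \frac{24.0}{6}
  = 4.0
\]
```

Under that assumption every domain carries equal weight, so a half-point change in any single domain would move the composite by about 0.08.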

Verification Commands

Run the public proof surface with:

mix test test/attractor_ex/conformance
mix test test/attractor_ex/http_test.exs
mix test test/attractor_ex/agent/session_test.exs
mix test test/attractor_ex/llm_client_test.exs

The first command is the benchmark-facing harness. The remaining focused suites provide deeper supporting evidence for areas still marked partial.
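
To see which domain a failure belongs to, the benchmark-facing harness can also be run one suite at a time. The following is a minimal sketch, not part of the repository: the function name `run_conformance` and the `RUNNER` override are hypothetical, while the suite paths are the ones listed in the benchmark matrix.

```shell
#!/bin/sh
# Sketch: run each conformance suite on its own and print one result line per domain.
# RUNNER is a hypothetical override (defaults to `mix test`) so the loop can be
# exercised on a machine without the Elixir toolchain.
RUNNER="${RUNNER:-mix test}"
SUITE_DIR="test/attractor_ex/conformance"

run_conformance() {
  failed=0
  for domain in parsing runtime state transport agent_loop unified_llm; do
    suite="$SUITE_DIR/${domain}_conformance_test.exs"
    # $RUNNER is intentionally unquoted so "mix test" splits into command and argument.
    if $RUNNER "$suite" >/dev/null 2>&1; then
      echo "PASS $domain"
    else
      echo "FAIL $domain"
      failed=1
    fi
  done
  # Non-zero when at least one suite failed, so callers can gate on the result.
  return "$failed"
}
```

Sourcing this file and calling `run_conformance` from the repository root prints six lines, one per domain, and returns a non-zero status if any suite fails. Running the suites separately trades speed for a result that maps directly onto the rows of the scorecard.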

Gap Ledger

Known proof gaps remain public and explicit:

CONF-STATE-001: The file-backed HTTP manager proves restart persistence, but a wider cold-boot durability benchmark is still a runtime-foundation item.
    Evidence: test/attractor_ex/conformance/state_conformance_test.exs
    Roadmap: docs/plan/02-runtime-foundation.md

CONF-TRANSPORT-001: The benchmark harness covers create/status/questions/answers, but SSE replay remains covered by focused transport tests rather than this compact scoreboard suite.
    Evidence: test/attractor_ex/http_test.exs
    Roadmap: docs/plan/03-operator-surface-and-debugger.md

CONF-AGENT-001: The benchmark harness proves the default provider preset and event surface, but deeper multi-provider/subagent matrices still live in focused session tests.
    Evidence: test/attractor_ex/agent/session_test.exs
    Roadmap: docs/plan/06-unified-llm-and-agent-platform.md

CONF-LLM-001: The scoreboard proves provider-agnostic JSON and stream normalization, while provider-native parity gaps remain tracked in the unified LLM compliance matrix.
    Evidence: test/attractor_ex/conformance/unified_llm_conformance_test.exs
    Roadmap: docs/plan/06-unified-llm-and-agent-platform.md

Success Criteria

This workstream is considered implemented when:

implementation claims can be traced to executable tests
the repo exposes a public benchmark or conformance scorecard
partial or missing areas are documented explicitly, not implied away
the proof surface is maintained alongside the benchmark page and published docs
