AI-EXECUTABLE UI REVIEW STANDARD NINE DIMENSIONS · 98 POINTS

A standard
for evaluating UI.

A spec for AI agents to audit web interfaces against nine evidence-grounded dimensions, with severity, confidence, and tolerance baked into the math. Every page that publishes against it — including this one — should be able to defend its own score.

↳ 01

Comprehensibility over decoration

UI should help users understand structure, hierarchy, state, and action with minimal cognitive effort.

↳ 02

Systems over isolated decisions

Good UI emerges from consistent type, color, spacing, state, and component systems.

↳ 03

Perceptual clarity over mathematical purity

Optical correctness and user perception take priority over rigid numeric purity.

↳ 04

Objective defects over stylistic preference

The Agent must distinguish real usability/quality failures from design taste differences.

↳ 05

High-confidence reporting over over-reporting

When evidence is weak, lower confidence, reduce score impact, or output an advisory instead of a hard defect.

§ 02
THE BUDGET

98 points. Allocated.

Eight scored dimensions divide a fixed hard-score budget. Two points remain reserved for future rule expansion. The shape of the bar tells you where the spec believes risk concentrates — visual quality, consistency, and color outweigh information architecture.

SCORE BUDGET 98 / 100 pts RESERVED · 2 pts
D1 · 11
D2 · 9
D3 · 9
D4 · 14
D5 · 10
D6 · 14
D7 · 17
D8 · 14
— · 2
0
25
50
75
100
§ 03
THE MATH

How a deduction computes.

Every observed defect runs through the same formula: a rule's max deduction, scaled by severity, scaled by the agent's confidence. A low-confidence advisory still leaves a faint signal — never zero, never overwhelming.

↳ MAX DEDUCTION

3.0

↳ SEVERITY

↳ CONFIDENCE

↳ FORMULA
3.0 × 1.0 × 1.0 =
↳ DEDUCTION
3.00 pts

A P0 critical defect detected with high confidence consumes the rule's full max. Lower the confidence to Low and the same observation falls to 0.45 pts — visible, never decisive.

§ 04
PAGE ARCHETYPES

Six kinds of page.

Before any rule fires, the agent classifies the page into one primary archetype. The rules apply differently across archetypes — what counts as "first-screen clarity" on a marketing page is not what counts on a dashboard.

ARCHETYPE 01

Marketing / Landing

Value proposition, narrative sections, hero, CTA-driven.

D1.6 CTA D3.3 Emphasis D7.5 Media
ARCHETYPE 02

Dashboard / Analytics

KPI cards, charts, tables, filters, status-rich UI.

D6.2 Rows D8.6 Charts D7.2 Empty
ARCHETYPE 03

Form / Workflow

Data entry, setup, wizard, transactional flow.

D4.5 States D8.4 Click D8.7 Loading
ARCHETYPE 04

Content / Docs

Reading-heavy, article-like, documentation, knowledge pages.

D3.2 Body D1.5 Chunking D5.5 Whitespace
ARCHETYPE 05

Settings / CRUD / Admin

Dense controls, management UI, structured lists/tables.

D6.1 Same-role D6.3 States D5.2 Hierarchy
ARCHETYPE 06

Data Visualization-heavy

Chart-first, exploration-first, visual analytic environments.

D4.6b Color-only D8.6 Charts D7.4 Theme
§ 05
THE NINE DIMENSIONS

What gets measured, and how.

Every rule declares its class — OBJECTIVE defects, HEURISTIC guidance, or ADVISORY observations — alongside its evidence type and a confidence band. The right column shows max deduction; advisory rules are dashed.

§ 06
WORKFLOW

From page to report.

A six-step procedure the agent follows to build a structured issue report, with a sandbox layer that keeps the audit from breaking the page it's measuring.

STEP 0
Semantic Mapping

Classify archetype, identify CTAs, repeated groups, comparable cards, chart regions, semantic actions.

DOMSCREENSHOT
STEP 1
Screenshot Collection

Capture 1440 / 1024 / 375 px, both themes, plus state shots: hover, focus, active, loading, empty.

VIEWPORTSSTATES
STEP 2
DOM & Style Extraction

Computed styles, bounding rects, overflow metrics, pseudo-state availability, theme-aware tokens.

COMPUTEDRECTS
STEP 3
Interaction Sandbox

Intercept navigation; prefer state-triggering over route-changing; capture before/after; revert when possible.

SANDBOXBEFORE/AFTER
STEP 4
Rule Evaluation

For each applicable rule: result, severity, confidence, class, evidence, context, systemic vs local.

SEVERITYCONFIDENCE
STEP 5
Output Generation

Issue table, summary, systemic issues, coverage & confidence meta-indicators, advisory notes, fixes.

REPORTFIXES
§ 07
REFUSALS

What this spec refuses to enforce.

A standard is also a list of laws it declines to write. This spec explicitly avoids treating the following as universal hard defects — they may still surface as advisories in the right context, never as global tests.

×01

"A fixed F-pattern or Z-pattern is required."

SCAN PATH · D1.3
×02

"Closer must always be lighter."

FOREGROUND · D2.3
×03

"Alpha-based text is inherently wrong."

TEXT COLOR · D4.1
×04

"A 12-column grid is mandatory."

ALIGNMENT · D5.4
×05

"Primary CTAs must be pill-shaped."

RADIUS · D6.4
×06

"All native buttons require cursor: pointer."

FEEDBACK · D8.1
×07

"Every LIVE label must animate."

ANIMATION · D8.3
×08

"Pure black in dark mode is a fatal failure."

THEME · D7.4
×09

"Dense optical-centering analysis on every icon."

ALIGNMENT · D5.4
×10

"Language-specific micro-typography rules apply globally."

TYPOGRAPHY · D3
§ 08
RULE INDEX

Fifty rules, one table.

The full rule register, by dimension. Filter by class to see how the spec is built — objective defects anchor the score, heuristics guide it, advisories shape the conversation.

ID Rule Dimension Class Confidence Max
§ 09
PASS THRESHOLD

Six grade bands. One default.

The spec recommends 70 as the default pass threshold — a deliberately ambitious bar that absorbs minor drift while disqualifying clearly broken pages.

A
90 — 98
Excellent
B+
80 — 89
Good
B
70 — 79
Acceptable
C
60 — 69
Below standard
D
40 — 59
Poor
F
00 — 39
Failing
Built to its own standard. Audited continuously. Available to be challenged.

UI DESIGN SPEC · AI-EXECUTABLE STANDARD
9 DIMENSIONS · 50 RULES · 98 HARD POINTS · 3-TIER CONFIDENCE
VISUALIZED AS A SINGLE-PAGE ARTIFACT

SELF-AUDIT
98 / 98
GRADE A