ahmido b0a724acef feat: harden canonical run viewer and onboarding draft state (#173)

## Summary
- harden the canonical operation run viewer so mismatched, missing, archived, onboarding, and selector-excluded tenant context no longer invalidates authorized canonical run viewing
- extend canonical route, header-context, deep-link, and presentation coverage for Spec 144 and add the full spec artifact set under `specs/144-canonical-operation-viewer-context-decoupling/`
- harden onboarding draft provider-connection resume logic so stale persisted provider connections fall back to the connect-provider step instead of resuming invalid state
- add architecture-audit follow-up candidate material and prompt assets for the next governance hardening wave

## Testing
- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact tests/Feature/144/CanonicalOperationViewerContextMismatchTest.php tests/Feature/144/CanonicalOperationViewerDeepLinkTrustTest.php tests/Feature/Operations/TenantlessOperationRunViewerTest.php tests/Feature/OpsUx/OperateHubShellTest.php tests/Feature/Monitoring/OperationsTenantScopeTest.php tests/Feature/RunAuthorizationTenantIsolationTest.php tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php tests/Feature/Monitoring/HeaderContextBarTest.php tests/Feature/Monitoring/OperationRunResolvedReferencePresentationTest.php tests/Feature/Monitoring/OperationsCanonicalUrlsTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/ManagedTenantOnboardingWizardTest.php tests/Unit/Onboarding/OnboardingDraftStageResolverTest.php tests/Unit/Onboarding/OnboardingLifecycleServiceTest.php`

## Notes
- branch: `144-canonical-operation-viewer-context-decoupling`
- base: `dev`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #173

2026-03-15 18:32:04 +00:00


Spec Candidates

Concrete future specs waiting for prioritization. Each entry has enough structure to become a real spec when the time comes.

Flow: Inbox → Qualified → Planned → Spec created → removed from this file

Last reviewed: 2026-03-15

Inbox

Unfiltered. A short note is enough. Review weekly.

Dashboard trend visualizations (sparklines, compliance gauge, drift-over-time chart)
Dashboard "Needs Attention" should be visually louder (alert color, icon, severity weighting)
Operations table should show duration + affected policy count
Density control / comfortable view toggle for admin tables
Inventory landing page may be redundant — consider pure navigation section
Settings change history → explainable change tracking
Workspace chooser v2: search, sort, favorites, pins, environment badges, last activity

Qualified

Problem and benefit are clear. Scope is still open. Still needs prioritization.

Governance Architecture Hardening Wave

Type: hardening
Source: architecture audit 2026-03-15
Problem: The architecture audit surfaced four cross-cutting governance gaps that are too structural for isolated bugfixes: queued execution trust, tenant-owned query canon drift, findings workflow enforcement softness, and convention-driven Livewire trust boundaries.
Why it matters: These are enterprise trust issues, not cosmetic cleanup. Left unresolved, they increase the probability of scope drift, authorization decay across async boundaries, workflow bypass, and mutable UI-state trust in a product that manages tenant-sensitive governance flows.
Proposed direction: Treat the audit as a candidate wave, not as one umbrella mega-spec. Promote the four candidates individually when slots are available:
- Queued execution reauthorization and scope continuity
- Tenant-owned query canon and wrong-tenant guards
- Findings workflow enforcement and audit backstop
- Livewire context locking and trusted-state reduction
Dependencies: Audit constitution and candidate detail document in ../audits/tenantpilot-architecture-audit-constitution.md and ../audits/2026-03-15-audit-spec-candidates.md
Priority: high

Queued Execution Reauthorization and Scope Continuity

Type: hardening
Source: architecture audit 2026-03-15
Problem: Queued work still relies too heavily on dispatch-time actor and tenant state. Execution-time scope continuity and capability revalidation are not yet hardened as a canonical backend contract.
Why it matters: This is a backend trust-gap on the mutation path. It creates the class of failure where a UI action was valid at dispatch time but the queued execution is no longer legitimate when it runs.
Proposed direction: Define execution-time reauthorization, tenant operability rechecks, denial semantics, and audit visibility as a dedicated spec instead of scattering local authorize() patches.
Dependencies: Existing operations semantics, audit log foundation, queued job execution paths
Priority: high
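
A minimal sketch of the execution-time recheck described above, written in Python as a language-neutral illustration rather than in the project's own stack. Every name here (`QueuedJob`, `Decision`, `reauthorize`, `run`, the `can` and `tenant_operable` callbacks, and the denial reason strings) is hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class QueuedJob:
    """Dispatch-time snapshot; treated as a claim, not as proof (hypothetical shape)."""
    actor_id: int
    tenant_id: int
    capability: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # written to the audit trail whether the job runs or not


def reauthorize(
    job: QueuedJob,
    can: Callable[[int, int, str], bool],
    tenant_operable: Callable[[int], bool],
) -> Decision:
    """Re-evaluate tenant operability and capability when the job runs."""
    if not tenant_operable(job.tenant_id):
        return Decision(False, "tenant_not_operable")
    if not can(job.actor_id, job.tenant_id, job.capability):
        return Decision(False, "capability_revoked")
    return Decision(True, "ok")


def run(
    job: QueuedJob,
    work: Callable[[], None],
    can: Callable[[int, int, str], bool],
    tenant_operable: Callable[[int], bool],
    audit: List[Tuple[int, int, str, str]],
) -> Decision:
    """Single entry point: recheck, record the outcome, then execute only if allowed."""
    decision = reauthorize(job, can, tenant_operable)
    audit.append((job.actor_id, job.tenant_id, job.capability, decision.reason))
    if decision.allowed:
        work()
    return decision
```

The point of the single `run` entry point is that a denial becomes a recorded outcome with a stable reason, instead of an exception raised from a scattered local check.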

Tenant-Owned Query Canon and Wrong-Tenant Guards

Type: hardening
Source: architecture audit 2026-03-15
Problem: Tenant isolation exists, but many reads still depend on local tenant_id filters instead of a reusable canonical query path. Wrong-tenant regression coverage is also uneven.
Why it matters: This is isolation drift. Repeated local filtering increases the chance of future cross-tenant mistakes across resources, widgets, actions, and detail pages.
Proposed direction: Define a canonical query entry pattern for tenant-owned models plus a required wrong-tenant regression matrix for tier-1 surfaces.
Dependencies: Canonical tenant context work in Specs 135 and 136
Priority: high
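
One possible shape for a canonical query entry, sketched in Python over an in-memory list so it stays self-contained. `Policy`, `TenantScopedPolicies`, and `WrongTenantError` are hypothetical names; a real version would wrap the ORM query builder instead of a list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Policy:
    id: int
    tenant_id: int
    name: str


class WrongTenantError(LookupError):
    """The record exists but belongs to another tenant."""


class TenantScopedPolicies:
    """The only read path callers use; the tenant filter is not repeated per call site."""

    def __init__(self, rows: Iterable[Policy], tenant_id: int) -> None:
        self._rows = list(rows)
        self._tenant_id = tenant_id

    def all(self) -> List[Policy]:
        return [r for r in self._rows if r.tenant_id == self._tenant_id]

    def find(self, policy_id: int) -> Policy:
        for row in self._rows:
            if row.id == policy_id:
                if row.tenant_id != self._tenant_id:
                    raise WrongTenantError(policy_id)
                return row
        raise LookupError(policy_id)
```

Because `WrongTenantError` subclasses `LookupError`, an outer handler can answer both cases with the same not-found response while a wrong-tenant regression test can still assert on the specific guard.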

Findings Workflow Enforcement and Audit Backstop

Type: hardening
Source: architecture audit 2026-03-15
Problem: Findings lifecycle semantics are strong in the spec, but enforcement still depends too much on service-path discipline. Direct or bypassing state mutation remains too plausible.
Why it matters: This is workflow-truth debt in a governance domain. If findings state can drift outside the canonical workflow path, auditability and operator trust degrade together.
Proposed direction: Formalize transition enforcement and add an audit backstop so meaningful lifecycle changes cannot silently bypass the intended workflow.
Dependencies: Findings workflow SLA (Spec 111), audit log foundation (Spec 134)
Priority: high
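
To make "transition enforcement plus audit backstop" concrete, here is a small Python sketch. Only `risk_accepted` is a status named in this file; the other statuses, the transition table, and the class names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical transition table; the real lifecycle is defined by Spec 111.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "new": frozenset({"triaged", "risk_accepted"}),
    "triaged": frozenset({"in_progress", "risk_accepted"}),
    "in_progress": frozenset({"resolved"}),
    "risk_accepted": frozenset({"triaged"}),
    "resolved": frozenset(),
}


class IllegalTransition(ValueError):
    """Raised when a status change is not in the transition table."""


class Finding:
    def __init__(self, status: str = "new") -> None:
        self._status = status
        self.audit: List[Tuple[str, str, str]] = []  # (actor, from, to)

    @property
    def status(self) -> str:
        # Read-only: there is no setter, so direct assignment fails.
        return self._status

    def transition(self, to: str, actor: str) -> None:
        """The only mutation path; every accepted change leaves an audit row."""
        if to not in ALLOWED.get(self._status, frozenset()):
            raise IllegalTransition(f"{self._status} -> {to}")
        self.audit.append((actor, self._status, to))
        self._status = to
```

The read-only property stands in for whatever mechanism blocks direct writes in the real model; the audit append inside `transition` is the backstop.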

Livewire Context Locking and Trusted-State Reduction

Type: hardening
Source: architecture audit 2026-03-15
Problem: Complex Livewire and Filament flows still expose ownership-relevant context in public component state without one explicit repo-wide hardening standard.
Why it matters: This is a trust-boundary problem. Even without a known exploit, mutable client-visible identifiers and workflow context make future authorization and isolation mistakes more likely.
Proposed direction: Define a reusable hardening pattern for locked identifiers, server-derived workflow truth, and forged-state regression tests on tier-1 component families.
Dependencies: Managed tenant onboarding draft identity (Spec 138), onboarding lifecycle checkpoint work (Spec 140)
Priority: medium
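
A rough Python sketch of the locked-identifier idea, including the kind of forged-state check a regression test would exercise. It does not use Livewire; `LockedContext`, `apply_update`, and `ForgedStateError` are hypothetical names, and the locked keys are assumed to be `draft_id` and `tenant_id`.

```python
from dataclasses import dataclass
from typing import Any, Dict

LOCKED_KEYS = ("draft_id", "tenant_id")  # assumed ownership-relevant identifiers


class ForgedStateError(PermissionError):
    """The client payload tried to change a server-owned identifier."""


@dataclass(frozen=True)
class LockedContext:
    """Server-derived truth, resolved from the session or database, never from the client."""
    draft_id: int
    tenant_id: int


def apply_update(locked: LockedContext, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Merge editable fields only; reject any attempt to alter a locked identifier."""
    for key in LOCKED_KEYS:
        if key in payload and payload[key] != getattr(locked, key):
            raise ForgedStateError(key)
    editable = {k: v for k, v in payload.items() if k not in LOCKED_KEYS}
    return {"draft_id": locked.draft_id, "tenant_id": locked.tenant_id, **editable}
```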

Exception / Risk-Acceptance Workflow for Findings

Type: feature
Source: HANDOVER gap analysis, Spec 111 follow-up
Problem: A Finding can carry the risk_accepted status, but there is no formal exception entity and no workflow to accept risk, record the justification, or expire the acceptance.
Why it matters: Enterprise compliance requires documented risk acceptance. Auditors ask "who accepted this and when?"
Proposed direction: Exception entity linked to Finding, approval flow, expiry tracking, audit trail
Dependencies: Findings workflow (Spec 111) complete
Priority: high
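
As an illustration of what an exception entity with expiry tracking might hold, a minimal Python sketch follows. The field names and the inclusive expiry rule are assumptions, not decisions recorded in this file.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RiskException:
    """Hypothetical exception record linked to a Finding."""
    finding_id: int
    accepted_by: str      # answers "who accepted this?"
    justification: str
    accepted_on: date     # answers "and when?"
    expires_on: date

    def is_active(self, today: date) -> bool:
        # Assumed rule: the acceptance is valid up to and including the expiry date.
        return self.accepted_on <= today <= self.expires_on
```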

Evidence Pack Entity

Type: feature
Source: HANDOVER gap, R2 theme completion
Problem: Review pack export (Spec 109) exists, permission posture (104/105) exists, but no formal "evidence pack" that bundles these for external audit/compliance submission.
Why it matters: Enterprise customers need a single deliverable for auditors — not separate exports.
Proposed direction: Evidence pack = curated bundle of review pack + posture report + findings summary + baseline governance state
Dependencies: Review pack export (109), permission posture (104)
Priority: high

Policy Lifecycle / Ghost Policies (Spec 900 refresh)

Type: feature
Source: Spec 900 draft (2025-12-22), HANDOVER risk #9
Problem: Policies deleted in Intune remain in TenantAtlas indefinitely. No deletion indicators. Backup items reference "ghost" policies.
Why it matters: Data integrity, user confusion, backup reliability
Proposed direction: Soft delete detection during sync, auto-restore on reappear, "Deleted" badge, restore from backup. Draft in Spec 900.
Dependencies: Inventory sync stable
Priority: medium
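
The detection and auto-restore rule can be sketched as a pure reconciliation step. This Python example assumes the local state is a map from policy id to a "deleted" flag and that the sync yields the set of ids currently present remotely; both shapes and the `reconcile` name are hypothetical.

```python
from typing import Dict, Set


def reconcile(local: Dict[str, bool], remote_ids: Set[str]) -> Dict[str, bool]:
    """Return the new id -> deleted map after one sync pass.

    - known id missing remotely  -> flagged deleted (soft delete, row is kept)
    - flagged id seen again      -> flag cleared (auto-restore on reappear)
    - unknown id seen remotely   -> added as not deleted
    """
    result = {pid: pid not in remote_ids for pid in local}
    for pid in remote_ids - set(local):
        result[pid] = False
    return result
```

Keeping the row and flipping a flag is what lets backup items keep a valid reference and lets the UI show a "Deleted" badge.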

Schema-driven Secret Classification

Type: hardening
Source: Spec 120 deferred follow-up
Problem: Secret redaction currently uses pattern-based detection. A schema-driven approach via GraphContractRegistry metadata would be more reliable.
Why it matters: Reduces false negatives in secret redaction
Proposed direction: Central classifier in GraphContractRegistry, regression corpus
Dependencies: Secret redaction (120) stable, registry completeness (095)
Priority: medium
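
A small Python sketch of schema-driven redaction, as opposed to pattern matching on values. The `SECRET_FIELDS` table stands in for metadata that would come from GraphContractRegistry; the resource type and field names in it are invented for the example.

```python
from typing import Any, Dict, Set

# Hypothetical registry metadata: resource type -> field names known to hold secrets.
SECRET_FIELDS: Dict[str, Set[str]] = {
    "example_wifi": {"pre_shared_key"},
    "example_vpn": {"shared_secret", "password"},
}
REDACTED = "[REDACTED]"


def redact(resource_type: str, payload: Any) -> Any:
    """Recursively replace values of fields the schema marks as secret."""
    secret = SECRET_FIELDS.get(resource_type, set())
    if isinstance(payload, dict):
        return {
            key: REDACTED if key in secret else redact(resource_type, value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact(resource_type, item) for item in payload]
    return payload
```

A resource type missing from the table is passed through untouched in this sketch, which is why registry completeness is listed as a dependency and why a regression corpus is part of the proposal.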

Cross-Tenant Compare & Promotion

Type: feature
Source: Spec 043 draft, 0800-future-features
Problem: No way to compare policies between tenants or promote configurations from staging to production.
Why it matters: Core MSP/enterprise workflow. Identified as top revenue lever in brainstorming.
Proposed direction: Compare/diff UI, group/scope-tag mapping, promotion plan (preview → dry-run → cutover → verify)
Dependencies: Inventory sync, backup/restore mature
Priority: medium (high value, high effort)

System Console Multi-Workspace Operator

Type: feature
Source: Spec 113 deferred
Problem: System console (/system) currently can't select/filter across workspaces for platform operators.
Why it matters: Platform ops need cross-workspace visibility for troubleshooting and monitoring.
Proposed direction: New UX + entitlement model for system-level operators
Dependencies: System console (114) stable
Priority: low

Workspace Chooser v2

Type: polish
Source: Spec 107 deferred backlog
Problem: Current chooser is functional but basic. Missing search, sort, favorites, environment badges, last activity display.
Why it matters: MSPs with 10+ workspaces need fast navigation.
Proposed direction: Search + sort + pins, environment badge (Prod/Test/Staging), last activity per workspace, dropdown switcher in header
Dependencies: Workspace chooser v1 (107) stable
Priority: low

Dashboard Polish (Enterprise-grade)

Type: polish
Source: Product review 2026-03-08
Problem: Current dashboard shows raw numbers without context. No trend indicators, no severity weighting, governance card too small.
Why it matters: First impression for evaluators. Enterprise admins compare with Datadog/Vanta/Drata/Intune Portal.
Proposed direction: Trend sparklines, compliance gauge, severity-weighted drift table, actionable alert buttons, progressive disclosure
Dependencies: Baseline governance (101), alerts (099), drift engine (119) stable
Priority: medium

Operations Naming Harmonization Across Run Types, Catalog, UI, and Audit

Type: hardening
Source: coding discovery, operations UX consistency review
Why it matters: Strategically important for enterprise UX, auditability, and long-term platform consistency. OperationRun is becoming a cross-domain execution and monitoring backbone, and the current naming drift will get more expensive as new run types and provider domains are added. This should reduce future naming drift, but it is not a blocker-critical refactor and should not be pulled in as a side quest during small UI changes.
Problem: Naming around operations appears historically grown and not consistent enough across OperationRunType values, visible run labels, OperationCatalog mappings, notifications, audit events, filters, badges, and related UI copy. Internal type names and operator-facing language are not cleanly separated, domain/object/verb ordering is uneven, and small UX fixes risk reinforcing an already inconsistent scheme. If left as-is, new run types for baseline, review, alerts, and additional provider domains will extend the inconsistency instead of converging it.
Desired outcome: A later spec should define a clear naming standard for OperationRunType, establish an explicit distinction between internal type identifiers and operator-facing labels, and align terminology across runs, notifications, audit text, monitoring views, and operations UI. New run types should have documented naming rules so they can be added without re-opening the vocabulary debate.
In scope: Inventory of current operation-related naming surfaces; naming taxonomy for internal identifiers versus visible operator language; conventions for verb/object/domain ordering; alignment rules for OperationCatalog, run labels, notifications, audit events, filters, badges, and monitoring UI; forward-looking rules for adding new run types and provider/domain families; a pragmatic migration plan that minimizes churn and preserves audit clarity.
Out of scope: Opportunistic mass-refactors during unrelated feature work; immediate renaming of all historical values without a compatibility plan; using a small UI wording issue such as "Sync from Intune" versus "Sync policies" as justification for broad churn; a full operations-domain rearchitecture unless later analysis proves it necessary.
Trigger / Best time to do this: Best tackled when multiple new run types are about to land, when OperationCatalog / monitoring / operations hub work is already active, when new domains such as Entra or Teams are being integrated, or when a broader UI naming constitution is ready to be enforced technically. This is a good candidate for a planned cleanup window, not an ad hoc refactor.
Risks if ignored: Continued terminology drift across UI and audit layers, higher cognitive load for operators, weaker enterprise polish, more brittle label mapping, and more expensive cleanup once additional domains and execution types are established. Audit/event language may diverge further from monitoring language, making cross-surface reasoning harder.
Suggested direction: Define stable internal run-type identifiers separately from visible operator labels. Standardize a single naming grammar for operation concepts, including when to lead with verb, object, or domain, and when provider-specific wording is allowed. Apply changes incrementally with compatibility-minded mapping rather than a brute-force rename of every historical string. Prefer a staged migration that first defines rules and mapping layers, then updates high-value operator surfaces, and only later addresses legacy internals where justified.
Readiness level: Qualified and strategically important, but intentionally deferred. This should be specified before substantially more run types and provider domains are introduced, yet it should not become an immediate side-track or be bundled into minor UI wording fixes.
Candidate quality:
- Clearly identified cross-cutting problem with architectural and UX impact
- Strong future-facing trigger conditions instead of vague "sometime later"
- Explicit boundaries to prevent opportunistic churn
- Concrete desired outcome without overdesigning the solution
- Easy to promote into a full spec once operations-domain work is prioritized
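
The separation between internal identifiers, operator-facing labels, and a compatibility mapping for historical values could look roughly like this. The sketch is in Python, and every identifier, label, and legacy alias in it is made up for illustration; none is an actual OperationRunType value.

```python
from enum import Enum
from typing import Dict


class RunType(str, Enum):
    """Stable internal identifiers (hypothetical values)."""
    INVENTORY_SYNC = "inventory.sync"
    BACKUP_CREATE = "backup.create"


# Operator-facing wording lives in one place and can change without touching identifiers.
LABELS: Dict[RunType, str] = {
    RunType.INVENTORY_SYNC: "Sync inventory",
    RunType.BACKUP_CREATE: "Create backup",
}

# Compatibility layer: historical strings keep resolving instead of being mass-renamed.
LEGACY_ALIASES: Dict[str, RunType] = {
    "sync_from_intune": RunType.INVENTORY_SYNC,
}


def resolve(raw: str) -> RunType:
    """Map a stored value, current or legacy, to its canonical run type."""
    if raw in LEGACY_ALIASES:
        return LEGACY_ALIASES[raw]
    return RunType(raw)  # raises ValueError for unknown values


def label(raw: str) -> str:
    return LABELS[resolve(raw)]
```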

Support Intake with Context (MVP)

Type: feature
Source: Product design, operator feedback
Problem: Users have no structured way to report problems directly from within the product. For technical errors, run, tenant, and provider details are missing; for access and UX problems, route and RBAC context is missing. The result is inefficient support loops and follow-up questions. A full ticketing system would be the wrong priority.
Why it matters: Reduces support friction, improves intake quality, and raises perceived product maturity. Creates a typed intake layer for later webhook, PSA, and ticketing extensions without introducing a helpdesk now.
Proposed direction: New SupportRequest model (not a Ticket/Case) with source_type (operation_run, provider_connection, access_denied, generic) and issue_kind (technical_problem, access_problem, ux_feedback, other). Three entry paths: (1) context-bound from a failed OperationRun, (2) access-denied/403 context, (3) generic feedback entry point (user menu). Automatic context snapshot via SupportRequestContextBuilder per source_type. Persistence before delivery. Email delivery to a configured support address. Fingerprint-based spam guard. Audit events. RBAC via the support.request.create capability. Scope isolation. Secret redaction in context_jsonb.
Dependencies: OperationRun domain stable, RBAC/capability system (066+), workspace/tenant scoping
Priority: medium
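
A possible reading of the fingerprint-based spam guard, sketched in Python. The `source_type` and `issue_kind` values come from the proposal above; the fingerprint inputs, the hashing choice, and the `fingerprint` and `accept` helpers are assumptions, and a real guard would likely add a time window.

```python
import hashlib
import json
from typing import Dict, Set

SOURCE_TYPES = {"operation_run", "provider_connection", "access_denied", "generic"}
ISSUE_KINDS = {"technical_problem", "access_problem", "ux_feedback", "other"}


def fingerprint(user_id: int, source_type: str, issue_kind: str, context: Dict[str, str]) -> str:
    """Stable hash over who reported what about which context."""
    if source_type not in SOURCE_TYPES or issue_kind not in ISSUE_KINDS:
        raise ValueError("unknown source_type or issue_kind")
    # sort_keys makes the hash independent of dict key order.
    canonical = json.dumps([user_id, source_type, issue_kind, context], sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def accept(seen: Set[str], fp: str) -> bool:
    """Return False for a repeat submission; otherwise remember it and accept."""
    if fp in seen:
        return False
    seen.add(fp)
    return True
```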

Planned

Ready for spec creation. Waiting for slot in active work.

(empty — move items here when prioritized for next sprint)

Template

### Title
- **Type**: feature | polish | hardening | bug | research
- **Source**: chat | audit | coding discovery | customer feedback | spec N follow-up
- **Problem**:
- **Why it matters**:
- **Proposed direction**:
- **Dependencies**:
- **Priority**: low | medium | high
