ahmido 3ce1cae71e feat: implement restore high risk operation reconciliation (#435)
Implemented restore high risk operation reconciliation.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #435
2026-06-07 14:10:34 +00:00

Implementation Plan: Spec 364 - Restore and High-Risk Operation Reconciliation

Branch: 364-restore-high-risk-operation-reconciliation | Date: 2026-06-07 | Spec: specs/364-restore-high-risk-operation-reconciliation/spec.md
Input: Feature specification from /specs/364-restore-high-risk-operation-reconciliation/spec.md

Summary

Harden the existing restore.execute OperationRun reconciliation path so that success is proof-gated. The implementation adjusts the current restore adapter and its visible fallout on the existing Operations and Restore Run detail surfaces. It must not create new restore operation types, new outcomes, new persistence, or a generic high-risk operation framework.

Technical Context

  • Language/Version: PHP 8.4.15
  • Primary Dependencies: Laravel 12.52, Filament 5.2.1, Livewire 4.1.4
  • Storage: PostgreSQL; no schema change expected
  • Testing: Pest 4.3 / PHPUnit 12; Filament/Livewire action tests where UI proof fallout changes
  • Validation Lanes: fast-feedback + confidence; browser only if visible hierarchy changes; pgsql only if query/index/lock behavior changes
  • Target Platform: Laravel Sail locally, Dokploy/container deployment for staging/production
  • Project Type: Laravel monolith under apps/platform
  • Performance Goals: adapter reconciliation remains DB-local; no provider calls; no material page-render query growth
  • Constraints: no Graph calls outside GraphClientInterface, no Graph calls during render, no new package/dependency, no new migration
  • Scale/Scope: exactly restore.execute; unsupported restore/high-risk operation families remain out of scope

UI / Surface Guardrail Plan

  • Guardrail scope: changed existing high-risk workflow/detail and monitoring surfaces
  • Affected routes/pages/actions/states/navigation/panel/provider surfaces:
    • /admin/workspaces/{workspace}/operations
    • /admin/workspaces/{workspace}/operations/{run}
    • existing Restore Run resource list/create/detail surfaces
    • no panel/provider registration change
  • No-impact class, if applicable: N/A
  • Native vs custom classification summary: mixed existing Filament pages plus existing restore custom Blade/infolist entries
  • Shared-family relevance: OperationRun monitoring, restore proof/detail, dangerous action proof wording
  • State layers in scope: page, detail, action feedback, reconciliation metadata
  • Audience modes in scope: operator-MSP, support-platform
  • Decision/diagnostic/raw hierarchy plan: outcome/reason/impact first, proof/evidence second, raw context and provider detail collapsed or support-only
  • Raw/support gating plan: preserve existing detail/diagnostic disclosure; no raw provider payload by default
  • One-primary-action / duplicate-truth control: an Operations row/detail points to either the restore run or the proof gap; Restore detail owns the recovery-proof question and states it once
  • Handling modes by drift class or surface: review-mandatory for restore proof and OperationRun outcome wording
  • Repository-signal treatment: report-only unless implementation creates new visible hierarchy, then update page report or screenshot artifacts
  • Special surface test profiles: shared-detail-family, monitoring-state-page, dangerous-workflow
  • Required tests or manual smoke: Unit + Feature; Browser smoke only if visible restore/operations hierarchy materially changes
  • Exception path and spread control: none; no new UI exception is expected
  • Active feature PR close-out entry: Guardrail / Exception / Smoke Coverage
  • UI/Productization coverage decision: existing surface coverage remains valid unless implementation creates hierarchy drift
  • Coverage artifacts to update: none expected; screenshots under this spec only if browser smoke is added
  • No-impact rationale: N/A
  • Navigation / Filament provider-panel handling: unchanged; Laravel 12 panel providers remain in apps/platform/bootstrap/providers.php
  • Screenshot or page-report need: no by default; yes only if visible proof hierarchy materially changes
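The decision/diagnostic/raw hierarchy and raw/support gating listed above reduce to a small ordering rule. The sketch below is illustrative only: `Audience`, `restoreDetailSections()`, and the section names are hypothetical and do not correspond to existing presenters or Filament components. It assumes one reading of "collapsed or support-only", in which raw context is collapsed for every audience and provider detail is shown, still collapsed, only to the support-platform audience.

```php
<?php

// Hypothetical audience/section model. The real Filament infolists and Blade
// entries decide layout; this only encodes the ordering and gating rules.
enum Audience: string
{
    case OperatorMsp = 'operator-msp';
    case SupportPlatform = 'support-platform';
}

/**
 * @return list<array{section: string, collapsed: bool}>
 */
function restoreDetailSections(Audience $audience, bool $hasRawContext, bool $hasProviderDetail): array
{
    // 1. Decision first: outcome, reason, impact.
    $sections = [['section' => 'decision', 'collapsed' => false]];

    // 2. Proof/evidence second.
    $sections[] = ['section' => 'proof', 'collapsed' => false];

    // 3. Raw context is available but collapsed by default.
    if ($hasRawContext) {
        $sections[] = ['section' => 'raw_context', 'collapsed' => true];
    }

    // 4. Provider detail is support-only and also collapsed.
    if ($hasProviderDetail && $audience === Audience::SupportPlatform) {
        $sections[] = ['section' => 'provider_detail', 'collapsed' => true];
    }

    return $sections;
}
```

Keeping the rule in one place makes it easy to review that the decision section always leads and that no audience sees provider detail expanded by default.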

Shared Pattern & System Fit

  • Cross-cutting feature marker: yes
  • Systems touched:
    • apps/platform/app/Support/Operations/Reconciliation/RestoreExecuteReconciliationAdapter.php
    • apps/platform/app/Support/Operations/Reconciliation/ReconciliationResult.php only if existing result shape needs derived reason metadata support without new outcomes
    • apps/platform/app/Services/AdapterRunReconciler.php only if restore lifecycle timestamp syncing needs proof-aware adjustment
    • apps/platform/app/Services/OperationRunService.php only if service-owned reconciliation writes need safe metadata merge support
    • apps/platform/app/Filament/Resources/RestoreRunResource/Presenters/RestoreRunDetailPresenter.php
    • apps/platform/app/Support/RestoreSafety/RestoreSafetyResolver.php
    • current OperationRun monitoring/detail presenters only as needed
  • Shared abstractions reused: OperationRunService, OperationRunReconciliationRegistry, OperationRunLinks, RestoreRunDetailPresenter, RestoreSafetyResolver, BadgeCatalog / BadgeRenderer
  • New abstraction introduced? why?: avoid by default; introduce only a local derived restore proof evaluator if it removes duplicated proof decisions and stays restore-only
  • Why the existing abstraction was sufficient or insufficient: existing adapter registry and service write seam are sufficient; current restore adapter criteria are insufficient for high-risk success
  • Bounded deviation / spread control: all changes remain restore-only and must not introduce high-risk operation framework machinery
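If OperationRunService does need safe metadata merge support, the intent is that a reconciliation write touches only `context.reconciliation` and leaves every other context key, including the flat summary counts, intact. A minimal sketch in plain PHP, assuming the context is a nested array and that summary counts are integers; `mergeReconciliationContext()` and `assertFlatSummaryCounts()` are hypothetical helpers, not existing service methods.

```php
<?php

/**
 * Writes reconciliation metadata under context['reconciliation'] only.
 * Other top-level keys, such as summary counts, are left untouched.
 */
function mergeReconciliationContext(array $context, array $reconciliation): array
{
    $existing = $context['reconciliation'] ?? [];
    if (! is_array($existing)) {
        $existing = []; // a malformed previous value is replaced, not merged into
    }

    // array_replace keeps existing keys (including integer keys) and
    // overwrites only the keys passed in.
    $context['reconciliation'] = array_replace($existing, $reconciliation);

    return $context;
}

/** Summary counts must stay flat: string key => integer count, no nesting. */
function assertFlatSummaryCounts(array $counts): void
{
    foreach ($counts as $key => $value) {
        if (! is_string($key) || ! is_int($value)) {
            throw new InvalidArgumentException("summary count '{$key}' must be a flat integer");
        }
    }
}
```

`array_replace()` is used rather than `array_merge()` because `array_merge()` renumbers integer keys, which would silently reshape previously stored metadata.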

OperationRun UX Impact

  • Touches OperationRun start/completion/link UX?: yes
  • Central contract reused: current OperationRun service, link, presenter, and monitoring/detail paths
  • Delegated UX behaviors: existing queued toast, run link, run-enqueued event, terminal notification path, and URL resolution remain on shared paths
  • Surface-owned behavior kept local: restore confirmation copy, restore-specific proof detail, restore result decision model
  • Queued DB-notification policy: unchanged / no opt-in change
  • Terminal notification path: unchanged central lifecycle mechanism
  • Exception path: none

Provider Boundary & Portability Fit

  • Shared provider/platform boundary touched?: yes
  • Provider-owned seams: restore.execute, write gate, provider capability evaluation, provider result/failure details in restore services
  • Platform-core seams: OperationRun, OperationRunOutcome, context.reconciliation, audit-safe metadata, Operations UI
  • Neutral platform terms / contracts preserved: operation, execution proof, provider acceptance, verification evidence, scope safety, managed environment
  • Retained provider-specific semantics and why: restore execution remains Microsoft/Intune-specific in current release; this spec does not pretend multi-provider restore exists
  • Bounded extraction or follow-up path: no extraction expected; future restore verification operation family is a follow-up if product truth appears

Constitution Check

GATE: Must pass before implementation. Re-check after design.

  • Inventory-first / snapshots-second: no inventory or snapshot source-of-truth change.
  • Read/write separation: existing restore write action remains preview/confirmation/audit protected; this spec only tightens proof after execution.
  • Graph contract path: no new Graph call or contract expected; any existing restore Graph behavior remains behind current services and GraphClientInterface.
  • Deterministic capabilities: no new capability; existing restore capability and provider operation start gate remain authoritative.
  • RBAC-UX: server-side authorization remains required; non-member and wrong-scope access is 404, member missing capability is 403.
  • Workspace isolation: all restore run, operation, and evidence links must stay within one workspace.
  • Tenant isolation: all RestoreRun and OperationRun joins must match on the same managed environment.
  • Run observability: OperationRun.status / outcome transitions remain service-owned through OperationRunService.
  • Ops-UX summary counts: all summary counts remain flat numeric values.
  • Test governance: Unit/Feature proof is required, browser only when visible hierarchy changes.
  • Proportionality: no new persistence, no new outcome, no generic framework; any local helper must be justified by proof-rule duplication.
  • No premature abstraction: no high-risk registry/framework; restore-only hardening.
  • Persisted truth: no new table or persisted mirror.
  • Behavioral state: no new verification_required outcome; verification gaps use existing outcomes plus reason/evidence metadata.
  • Reconciliation decision semantics: not_reconciled is a non-final ReconciliationResult decision, not an OperationRun outcome, and must not hide same-scope proof gaps that operators need to see.
  • UI/Productization coverage: existing surfaces only; screenshot/page report proportional to visible change.
  • Filament v5 / Livewire v4: Filament 5.2.1 and Livewire 4.1.4 are already installed and compliant; no version change is needed.
  • Filament provider registration: no provider change; Laravel 12 providers remain in apps/platform/bootstrap/providers.php.
  • Global search: no globally searchable resource change is expected; if RestoreRun/OperationRun resources are touched, do not enable global search.
  • Destructive/high-impact actions: no new destructive action; existing restore execute path must keep ->action(...), ->requiresConfirmation(), server authorization, audit, and tests.
  • Asset strategy: no new Filament assets expected; deployment filament:assets remains unchanged and only needed for registered asset changes, which are out of scope.
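The RBAC-UX and isolation rules above imply a fixed check order: membership and scope first, capability last, so that a record from another workspace or managed environment is indistinguishable from a missing one. A sketch of that order in plain PHP; `Actor`, `ScopedRun`, `restoreDetailAccess()`, and the `restore.view` capability name are hypothetical, and the real application enforces these rules through its existing authorization and scoping.

```php
<?php

// Hypothetical value objects standing in for the authenticated member and the
// workspace/managed-environment scope of each record.
final class Actor
{
    /**
     * @param list<int>    $workspaceIds workspaces the user is a member of
     * @param list<string> $capabilities capabilities granted in the route workspace
     */
    public function __construct(
        public readonly array $workspaceIds,
        public readonly array $capabilities,
    ) {}
}

final class ScopedRun
{
    public function __construct(
        public readonly int $workspaceId,
        public readonly int $managedEnvironmentId,
    ) {}
}

/**
 * Returns the HTTP status for viewing a restore-linked OperationRun:
 * 404 for non-members and wrong-scope records, 403 for in-scope members
 * missing the capability, 200 otherwise.
 */
function restoreDetailAccess(Actor $actor, int $routeWorkspaceId, ScopedRun $operationRun, ScopedRun $restoreRun, string $capability): int
{
    if (! in_array($routeWorkspaceId, $actor->workspaceIds, true)) {
        return 404; // non-member: deny as not found
    }

    foreach ([$operationRun, $restoreRun] as $run) {
        if ($run->workspaceId !== $routeWorkspaceId) {
            return 404; // record belongs to another workspace
        }
    }

    if ($operationRun->managedEnvironmentId !== $restoreRun->managedEnvironmentId) {
        return 404; // linkage crosses managed environments
    }

    // Only now is it safe to reveal that the record exists.
    return in_array($capability, $actor->capabilities, true) ? 200 : 403;
}
```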

Test Governance Check

  • Test purpose / classification by changed surface:
    • Unit: restore proof decision and adapter branch mapping
    • Feature: adapter reconciliation writes, scope safety, Operations/Restore detail fallout
    • Browser: conditional focused high-risk proof hierarchy smoke only if visible hierarchy changes
  • Affected validation lanes: fast-feedback, confidence, optional browser
  • Why this lane mix is the narrowest sufficient proof: schema and query semantics are unchanged; proof logic and UI fallout are business behavior
  • Narrowest proving command(s):
    • cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=Spec364
    • cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=RestoreRun
    • cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=OperationRun
    • optional browser smoke command if browser file is added
  • Fixture / helper / factory / seed / context cost risks: keep restore/backup/operation/evidence fixtures local to tests
  • Expensive defaults or shared helper growth introduced?: no
  • Heavy-family additions, promotions, or visibility changes: none
  • Surface-class relief / special coverage rule: high-risk restore is not standard relief; add explicit proof tests
  • Closing validation and reviewer handoff: verify no success from preview-only, missing-proof, wrong-scope, or mixed-result fixtures
  • Budget / baseline / trend follow-up: none expected
  • Review-stop questions: proof completeness, scope safety, no new outcome, no Graph calls, no raw provider data
  • Escalation path: document-in-feature
  • Active feature PR close-out entry: Guardrail / Exception / Smoke Coverage
  • Why no dedicated follow-up spec is needed: Spec 364 is the bounded follow-up for restore execution truth; future verification/rollback families remain explicitly deferred

Project Structure

Documentation (this feature)

specs/364-restore-high-risk-operation-reconciliation/
├── spec.md
├── plan.md
├── tasks.md
└── checklists/
    └── requirements.md

Source Code (likely affected; no code changed during preparation)

apps/platform/app/Support/Operations/Reconciliation/
├── RestoreExecuteReconciliationAdapter.php
├── ReconciliationResult.php
└── OperationRunReconciliationRegistry.php

apps/platform/app/Services/
├── AdapterRunReconciler.php
└── OperationRunService.php

apps/platform/app/Support/RestoreSafety/
├── RestoreSafetyResolver.php
└── RestoreResultAttention.php

apps/platform/app/Filament/Resources/
├── OperationRunResource.php
└── RestoreRunResource.php

apps/platform/app/Filament/Pages/
├── Monitoring/Operations.php
└── Operations/TenantlessOperationRunViewer.php

apps/platform/tests/
├── Unit/Support/Operations/Reconciliation/
├── Unit/Support/RestoreSafety/
├── Feature/Operations/
├── Feature/Restore/
└── Browser/ (optional)

Structure Decision: Use existing Laravel app surfaces under apps/platform; add no new top-level folders and no dependencies.

Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| Restore-specific proof rule hardening | High-risk tenant-changing restore.execute needs stricter success proof than read-only/domain-output runs | Status-only mapping is already the problem and can overclaim recovery |
| Optional small local proof evaluator | Only if adapter/detail would duplicate the same proof bundle rules | A generic high-risk framework or new outcome family is too broad |

Proportionality Review

  • Current operator problem: a restore operation can appear successful without complete recovery proof.
  • Existing structure is insufficient because: terminal RestoreRun status alone is weaker than the proof required for tenant-changing success.
  • Narrowest correct implementation: harden the existing restore adapter and presentation fallout over existing records and outcomes.
  • Ownership cost created: focused restore proof rules and regression tests.
  • Alternative intentionally rejected: new verification_required OperationRun outcome, new restore.verify operation type, new restore verification table, and generic high-risk framework.
  • Release truth: current-release truth.

Implementation Phases

  1. Re-verify current restore proof truth and existing test fixtures.
  2. Add failing Unit/Feature tests for restore proof mapping, audit continuity, soft-deleted RestoreRun handling, and wrong-scope safety.
  3. Harden RestoreExecuteReconciliationAdapter to require the spec's Success Proof Bundle Matrix and map partial/blocked/failed/proof-gap cases to existing outcomes.
  4. Adjust OperationRun and Restore detail presentation only if needed to display proof-gap reasons without duplicate default-visible truth.
  5. Add optional browser smoke only if visible hierarchy changes.
  6. Run focused validation and record close-out notes.
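Phase 3 can be sketched as a fail-closed mapping from a proof snapshot to an existing outcome plus a reason code. Everything in this sketch is an assumption for illustration: `RestoreProof`, `reconcileRestoreExecute()`, the outcome string values, the reason codes, and the branch order stand in for the spec's Success Proof Bundle Matrix, which this plan does not reproduce. It also assumes one reading of the reconciliation rules: wrong-scope evidence yields the non-final `not_reconciled` decision, while same-scope proof gaps are finalized to a visible non-success outcome.

```php
<?php

// Assumed to mirror existing OperationRunOutcome values; no new outcome
// such as verification_required is introduced.
enum Outcome: string
{
    case Succeeded = 'succeeded';
    case PartiallySucceeded = 'partially_succeeded';
    case Blocked = 'blocked';
    case Failed = 'failed';
}

// Hypothetical proof snapshot derived from existing RestoreRun records.
// A null proof flag means "unknown" and is treated as not proven (fail closed).
final class RestoreProof
{
    public function __construct(
        public readonly bool $terminal,
        public readonly bool $scopeSafe,
        public readonly bool $blocked = false,
        public readonly bool $previewOnly = false,
        public readonly ?bool $executionProof = null,
        public readonly ?bool $providerAccepted = null,
        public readonly ?bool $verificationEvidence = null,
        public readonly int $itemsSucceeded = 0,
        public readonly int $itemsFailed = 0,
    ) {}
}

/** @return array{decision: string, outcome: ?string, reason: ?string} */
function reconcileRestoreExecute(RestoreProof $p): array
{
    $final = fn (Outcome $o, ?string $reason): array
        => ['decision' => 'reconciled', 'outcome' => $o->value, 'reason' => $reason];

    // Non-final decisions: nothing is written to the OperationRun outcome.
    if (! $p->terminal) {
        return ['decision' => 'not_reconciled', 'outcome' => null, 'reason' => 'restore_not_terminal'];
    }
    if (! $p->scopeSafe) {
        // Never finalize an OperationRun from another scope's records.
        return ['decision' => 'not_reconciled', 'outcome' => null, 'reason' => 'scope_mismatch'];
    }

    // Same-scope gaps map to an existing non-success outcome so operators see them.
    return match (true) {
        $p->blocked => $final(Outcome::Blocked, 'restore_blocked'),
        $p->previewOnly => $final(Outcome::Failed, 'preview_only_no_execution'),
        $p->executionProof !== true => $final(Outcome::Failed, 'execution_proof_missing'),
        $p->itemsFailed > 0 && $p->itemsSucceeded > 0 => $final(Outcome::PartiallySucceeded, 'mixed_item_results'),
        $p->itemsFailed > 0 => $final(Outcome::Failed, 'all_items_failed'),
        $p->providerAccepted !== true, $p->verificationEvidence !== true
            => $final(Outcome::PartiallySucceeded, 'verification_proof_gap'),
        default => $final(Outcome::Succeeded, null),
    };
}
```

Treating a `null` proof flag as unproven is what makes the mapping fail closed: a restore reaches `succeeded` only when every proof element is explicitly present.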

Rollout Considerations

  • Environment variables: none expected.
  • Migrations: none expected.
  • Queues/workers: no new queue family; existing restore jobs remain queued and observable.
  • Scheduler: no scheduler change expected.
  • Storage: no storage change expected.
  • Deployment assets: no new Filament assets expected; no new filament:assets requirement beyond existing deploy process.
  • Staging/production: validate in staging before production because restore is high-risk.

Risk Controls

  • Fail closed on ambiguous proof.
  • Do not add new outcomes or operation types.
  • Keep reconciliation service-owned.
  • Do not call provider APIs from reconciliation or UI render.
  • Sanitize failure/reason metadata.
  • Preserve RBAC and deny-as-not-found boundaries.
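Sanitizing failure/reason metadata can be approached as an allowlist rather than a denylist, so unknown provider fields are dropped by default. A minimal sketch, assuming hypothetical key names (`reason_code`, `reason_message`, `failed_item_count`, `proof_gaps`) and an arbitrary 500-character limit; neither comes from the spec. `mb_substr()` needs the mbstring extension, which Laravel already requires.

```php
<?php

// Hypothetical allowlist and limit; real reason keys are defined elsewhere.
const ALLOWED_REASON_KEYS = ['reason_code', 'reason_message', 'failed_item_count', 'proof_gaps'];
const MAX_MESSAGE_LENGTH = 500;

/**
 * Keeps only allowlisted keys, drops nested provider payloads, replaces
 * control characters, and truncates long strings before persisting.
 */
function sanitizeReasonMetadata(array $metadata): array
{
    $clean = [];

    foreach (ALLOWED_REASON_KEYS as $key) {
        if (! array_key_exists($key, $metadata)) {
            continue;
        }
        $value = $metadata[$key];

        if (is_string($value)) {
            // ASCII control bytes never occur inside multibyte UTF-8 sequences.
            $value = preg_replace('/[\x00-\x1F\x7F]/', ' ', $value);
            $clean[$key] = mb_substr($value, 0, MAX_MESSAGE_LENGTH);
        } elseif (is_int($value) || is_bool($value)) {
            $clean[$key] = $value;
        } elseif ($key === 'proof_gaps' && is_array($value)) {
            // Keep only plain string gap codes; nested arrays are dropped.
            $clean[$key] = array_values(array_filter($value, 'is_string'));
        }
        // Anything else (objects, nested provider payloads) is dropped.
    }

    return $clean;
}
```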