ahmido 3ce1cae71e feat: implement restore high risk operation reconciliation (#435)

Implemented restore high risk operation reconciliation.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #435

2026-06-07 14:10:34 +00:00


Implementation Plan: Spec 364 - Restore and High-Risk Operation Reconciliation

Branch: 364-restore-high-risk-operation-reconciliation | Date: 2026-06-07 | Spec: specs/364-restore-high-risk-operation-reconciliation/spec.md
Input: Feature specification from /specs/364-restore-high-risk-operation-reconciliation/spec.md

Summary

Harden the existing restore.execute OperationRun reconciliation path so that success is proof-gated: a restore operation may reconcile as successful only when its recovery proof is complete. The implementation should adjust the current restore adapter and the resulting visible changes on the existing Operations and Restore Run detail surfaces. It must not create new restore operation types, new outcomes, new persistence, or a generic high-risk operation framework.
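A minimal sketch of what proof-gated success could look like, assuming a simplified proof bundle. The names below (`SketchOutcome`, `SketchProofBundle`, `proofGatedOutcome`, and reason codes such as `preview_only`) are hypothetical stand-ins: the real proof elements come from the spec's Success Proof Bundle Matrix and the real outcomes from the existing OperationRunOutcome, neither of which is reproduced here. What it illustrates is the shape of the rule: success is returned only when no proof element is missing, and every gap travels as reason metadata on an existing outcome rather than a new one.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for existing OperationRunOutcome cases (real names not shown in this plan).
enum SketchOutcome: string
{
    case Succeeded = 'succeeded';
    case PartiallySucceeded = 'partially_succeeded';
    case Failed = 'failed';
}

// Hypothetical proof elements; the real ones come from the spec's Success Proof Bundle Matrix.
final class SketchProofBundle
{
    public function __construct(
        public readonly bool $executed,         // false for preview-only runs
        public readonly bool $providerAccepted, // provider acceptance recorded
        public readonly bool $verified,         // verification evidence present
        public readonly bool $scopeMatches,     // workspace and managed environment match
        public readonly int $failedItems = 0,   // items that failed inside a mixed result
    ) {}

    /** @return list<string> one reason code per missing or contradicted proof element */
    public function gaps(): array
    {
        $gaps = [];
        if (! $this->executed) {
            $gaps[] = 'preview_only';
        }
        if (! $this->providerAccepted) {
            $gaps[] = 'provider_acceptance_missing';
        }
        if (! $this->verified) {
            $gaps[] = 'verification_evidence_missing';
        }
        if (! $this->scopeMatches) {
            $gaps[] = 'scope_mismatch';
        }
        if ($this->failedItems > 0) {
            $gaps[] = 'items_failed';
        }

        return $gaps;
    }
}

/** @return array{outcome: SketchOutcome, reasons: list<string>} */
function proofGatedOutcome(SketchProofBundle $proof): array
{
    $reasons = $proof->gaps();
    if ($reasons === []) {
        return ['outcome' => SketchOutcome::Succeeded, 'reasons' => []];
    }

    // Fail closed: preview-only or wrong-scope runs can never count as partial success.
    $hardGaps = array_intersect($reasons, ['preview_only', 'scope_mismatch']);
    $outcome = ($hardGaps === [] && $proof->providerAccepted)
        ? SketchOutcome::PartiallySucceeded
        : SketchOutcome::Failed;

    return ['outcome' => $outcome, 'reasons' => $reasons];
}
```

Mapping a provider-accepted but unverified restore to partial success, rather than to failure, is an illustrative assumption; the spec decides which existing outcome each gap maps to.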

Technical Context

Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12.52, Filament 5.2.1, Livewire 4.1.4
Storage: PostgreSQL; no schema change expected
Testing: Pest 4.3 / PHPUnit 12; Filament/Livewire action tests where UI proof fallout changes
Validation Lanes: fast-feedback + confidence; browser only if visible hierarchy changes; pgsql only if query/index/lock behavior changes
Target Platform: Laravel Sail locally, Dokploy/container deployment for staging/production
Project Type: Laravel monolith under apps/platform
Performance Goals: adapter reconciliation remains DB-local; no provider calls; no material page-render query growth
Constraints: no Graph calls outside GraphClientInterface, no Graph calls during render, no new package/dependency, no new migration
Scale/Scope: exactly restore.execute; unsupported restore/high-risk operation families remain out of scope

UI / Surface Guardrail Plan

Guardrail scope: changed existing high-risk workflow/detail and monitoring surfaces
Affected routes/pages/actions/states/navigation/panel/provider surfaces:
- /admin/workspaces/{workspace}/operations
- /admin/workspaces/{workspace}/operations/{run}
- existing Restore Run resource list/create/detail surfaces
- no panel/provider registration change
No-impact class, if applicable: N/A
Native vs custom classification summary: mixed existing Filament pages plus existing restore custom Blade/infolist entries
Shared-family relevance: OperationRun monitoring, restore proof/detail, dangerous action proof wording
State layers in scope: page, detail, action feedback, reconciliation metadata
Audience modes in scope: operator-MSP, support-platform
Decision/diagnostic/raw hierarchy plan: outcome/reason/impact first, proof/evidence second, raw context and provider detail collapsed or support-only
Raw/support gating plan: preserve existing detail/diagnostic disclosure; no raw provider payload by default
One-primary-action / duplicate-truth control: Operations row/detail points to either restore run or proof gap; Restore detail owns the recovery-proof question once
Handling modes by drift class or surface: review-mandatory for restore proof and OperationRun outcome wording
Repository-signal treatment: report-only unless implementation creates new visible hierarchy, then update page report or screenshot artifacts
Special surface test profiles: shared-detail-family, monitoring-state-page, dangerous-workflow
Required tests or manual smoke: Unit + Feature; Browser smoke only if visible restore/operations hierarchy materially changes
Exception path and spread control: none; no new UI exception is expected
Active feature PR close-out entry: Guardrail / Exception / Smoke Coverage
UI/Productization coverage decision: existing surface coverage remains valid unless implementation creates hierarchy drift
Coverage artifacts to update: none expected; screenshots under this spec only if browser smoke is added
No-impact rationale: N/A
Navigation / Filament provider-panel handling: unchanged; Laravel 12 panel providers remain in apps/platform/bootstrap/providers.php
Screenshot or page-report need: no by default; yes only if visible proof hierarchy materially changes

Shared Pattern & System Fit

Cross-cutting feature marker: yes
Systems touched:
- apps/platform/app/Support/Operations/Reconciliation/RestoreExecuteReconciliationAdapter.php
- apps/platform/app/Support/Operations/Reconciliation/ReconciliationResult.php only if existing result shape needs derived reason metadata support without new outcomes
- apps/platform/app/Services/AdapterRunReconciler.php only if restore lifecycle timestamp syncing needs proof-aware adjustment
- apps/platform/app/Services/OperationRunService.php only if service-owned reconciliation writes need safe metadata merge support
- apps/platform/app/Filament/Resources/RestoreRunResource/Presenters/RestoreRunDetailPresenter.php
- apps/platform/app/Support/RestoreSafety/RestoreSafetyResolver.php
- current OperationRun monitoring/detail presenters only as needed
Shared abstractions reused: OperationRunService, OperationRunReconciliationRegistry, OperationRunLinks, RestoreRunDetailPresenter, RestoreSafetyResolver, BadgeCatalog / BadgeRenderer
New abstraction introduced? why?: avoid by default; introduce only a local derived restore proof evaluator if it removes duplicated proof decisions and stays restore-only
Why the existing abstraction was sufficient or insufficient: existing adapter registry and service write seam are sufficient; current restore adapter criteria are insufficient for high-risk success
Bounded deviation / spread control: all changes remain restore-only and must not introduce high-risk operation framework machinery

OperationRun UX Impact

Touches OperationRun start/completion/link UX?: yes
Central contract reused: current OperationRun service, link, presenter, and monitoring/detail paths
Delegated UX behaviors: existing queued toast, run link, run-enqueued event, terminal notification path, and URL resolution remain on shared paths
Surface-owned behavior kept local: restore confirmation copy, restore-specific proof detail, restore result decision model
Queued DB-notification policy: unchanged / no opt-in change
Terminal notification path: unchanged central lifecycle mechanism
Exception path: none

Provider Boundary & Portability Fit

Shared provider/platform boundary touched?: yes
Provider-owned seams: restore.execute, write gate, provider capability evaluation, provider result/failure details in restore services
Platform-core seams: OperationRun, OperationRunOutcome, context.reconciliation, audit-safe metadata, Operations UI
Neutral platform terms / contracts preserved: operation, execution proof, provider acceptance, verification evidence, scope safety, managed environment
Retained provider-specific semantics and why: restore execution remains Microsoft/Intune-specific in current release; this spec does not pretend multi-provider restore exists
Bounded extraction or follow-up path: no extraction expected; future restore verification operation family is a follow-up if product truth appears

Constitution Check

GATE: Must pass before implementation. Re-check after design.

Inventory-first / snapshots-second: no inventory or snapshot source-of-truth change.
Read/write separation: existing restore write action remains preview/confirmation/audit protected; this spec only tightens proof after execution.
Graph contract path: no new Graph call or contract expected; any existing restore Graph behavior remains behind current services and GraphClientInterface.
Deterministic capabilities: no new capability; existing restore capability and provider operation start gate remain authoritative.
RBAC-UX: server-side authorization remains required; non-members and wrong-scope requests receive 404, and members missing the capability receive 403.
Workspace isolation: all restore run, operation, and evidence linkage must match the workspace.
Tenant isolation: all RestoreRun and OperationRun joins must match the managed environment.
Run observability: OperationRun.status / outcome transitions remain service-owned through OperationRunService.
Ops-UX summary counts: all summary counts remain flat numeric values.
Test governance: Unit/Feature proof is required, browser only when visible hierarchy changes.
Proportionality: no new persistence, no new outcome, no generic framework; any local helper must be justified by proof-rule duplication.
No premature abstraction: no high-risk registry/framework; restore-only hardening.
Persisted truth: no new table or persisted mirror.
Behavioral state: no new verification_required outcome; verification gaps use existing outcomes plus reason/evidence metadata.
Reconciliation decision semantics: not_reconciled is a non-final ReconciliationResult decision, not an OperationRun outcome, and must not hide same-scope proof gaps that operators need to see.
UI/Productization coverage: existing surfaces only; screenshot/page report proportional to visible change.
Filament v5 / Livewire v4: Livewire v4.1.4 is already installed and compliant.
Filament provider registration: no provider change; Laravel 12 providers remain in apps/platform/bootstrap/providers.php.
Global search: no globally searchable resource change is expected; if RestoreRun/OperationRun resources are touched, do not enable global search.
Destructive/high-impact actions: no new destructive action; existing restore execute path must keep ->action(...), ->requiresConfirmation(), server authorization, audit, and tests.
Asset strategy: no new Filament assets expected; the deployment filament:assets step remains unchanged and is only needed for registered asset changes, which are out of scope.
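The RBAC-UX and isolation rules above can be sketched framework-free as a single status resolver. This is illustrative only: in the app the check runs through the existing server-side authorization, and `SketchActor`, `restoreRunAccessStatus`, and the `restore.view` capability string are hypothetical. Scope is evaluated before capability so that a non-member or wrong-scope request cannot distinguish an existing record from a missing one.

```php
<?php

declare(strict_types=1);

// Hypothetical actor shape: membership modelled as workspace id => managed environment ids.
final class SketchActor
{
    /**
     * @param array<int, list<int>> $environmentsByWorkspace
     * @param list<string>          $capabilities
     */
    public function __construct(
        public readonly array $environmentsByWorkspace,
        public readonly array $capabilities,
    ) {}
}

// Returns the HTTP status the server-side check would produce for a RestoreRun or OperationRun view.
function restoreRunAccessStatus(SketchActor $actor, int $workspaceId, int $environmentId, string $capability): int
{
    $environments = $actor->environmentsByWorkspace[$workspaceId] ?? null;

    // Non-member or wrong scope: deny as not found so the record's existence is not revealed.
    if ($environments === null || ! in_array($environmentId, $environments, true)) {
        return 404;
    }

    // In-scope member without the capability: explicit forbidden.
    if (! in_array($capability, $actor->capabilities, true)) {
        return 403;
    }

    return 200;
}
```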

Test Governance Check

Test purpose / classification by changed surface:
- Unit: restore proof decision and adapter branch mapping
- Feature: adapter reconciliation writes, scope safety, Operations/Restore detail fallout
- Browser: conditional focused high-risk proof hierarchy smoke only if visible hierarchy changes
Affected validation lanes: fast-feedback, confidence, optional browser
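Why this lane mix is the narrowest sufficient proof: schema and query semantics are unchanged, so the pgsql lane adds no proof; proof logic and UI fallout are business behavior covered by Unit/Feature tests, with browser smoke reserved for visible hierarchy changes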
Narrowest proving command(s):
- cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=Spec364
- cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=RestoreRun
- cd apps/platform && ./vendor/bin/sail artisan test --compact --filter=OperationRun
- optional browser smoke command, if a browser test file is added
Fixture / helper / factory / seed / context cost risks: keep restore/backup/operation/evidence fixtures local to tests
Expensive defaults or shared helper growth introduced?: no
Heavy-family additions, promotions, or visibility changes: none
Surface-class relief / special coverage rule: high-risk restore is not standard relief; add explicit proof tests
Closing validation and reviewer handoff: verify that preview-only, missing-proof, wrong-scope, and mixed-result fixtures never reconcile to success
Budget / baseline / trend follow-up: none expected
Review-stop questions: proof completeness, scope safety, no new outcome, no Graph calls, no raw provider data
Escalation path: document-in-feature
Active feature PR close-out entry: Guardrail / Exception / Smoke Coverage
Why no dedicated follow-up spec is needed: Spec 364 is the bounded follow-up for restore execution truth; future verification/rollback families remain explicitly deferred

Project Structure

Documentation (this feature)

specs/364-restore-high-risk-operation-reconciliation/
├── spec.md
├── plan.md
├── tasks.md
└── checklists/
    └── requirements.md

Source Code (likely affected; no code changed during preparation)

apps/platform/app/Support/Operations/Reconciliation/
├── RestoreExecuteReconciliationAdapter.php
├── ReconciliationResult.php
└── OperationRunReconciliationRegistry.php

apps/platform/app/Services/
├── AdapterRunReconciler.php
└── OperationRunService.php

apps/platform/app/Support/RestoreSafety/
├── RestoreSafetyResolver.php
└── RestoreResultAttention.php

apps/platform/app/Filament/Resources/
├── OperationRunResource.php
└── RestoreRunResource.php

apps/platform/app/Filament/Pages/
├── Monitoring/Operations.php
└── Operations/TenantlessOperationRunViewer.php

apps/platform/tests/
├── Unit/Support/Operations/Reconciliation/
├── Unit/Support/RestoreSafety/
├── Feature/Operations/
├── Feature/Restore/
└── Browser/ (optional)

Structure Decision: Use existing Laravel app surfaces under apps/platform; add no new top-level folders and no dependencies.

Complexity Tracking

Violation	Why Needed	Simpler Alternative Rejected Because
Restore-specific proof rule hardening	High-risk tenant-changing `restore.execute` needs stricter success proof than read-only/domain-output runs	Status-only mapping is already the problem and can overclaim recovery
Optional small local proof evaluator	Only if adapter/detail would duplicate the same proof bundle rules	A generic high-risk framework or new outcome family is too broad

Proportionality Review

Current operator problem: a restore operation can appear successful without complete recovery proof.
Existing structure is insufficient because: terminal RestoreRun status alone is weaker than the proof required for tenant-changing success.
Narrowest correct implementation: harden the existing restore adapter and presentation fallout over existing records and outcomes.
Ownership cost created: focused restore proof rules and regression tests.
Alternative intentionally rejected: new verification_required OperationRun outcome, new restore.verify operation type, new restore verification table, and generic high-risk framework.
Release truth: current-release truth.

Implementation Phases

Re-verify current restore proof truth and existing test fixtures.
Add failing Unit/Feature tests for restore proof mapping, audit continuity, soft-deleted RestoreRun handling, and wrong-scope safety.
Harden RestoreExecuteReconciliationAdapter to require the spec's Success Proof Bundle Matrix and map partial/blocked/failed/proof-gap cases to existing outcomes.
Adjust OperationRun and Restore detail presentation only if needed to display proof-gap reasons without duplicate default-visible truth.
Add optional browser smoke only if visible hierarchy changes.
Run focused validation and record close-out notes.
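For the adapter hardening phase, a sketch of how the reconciliation decision might keep the non-final `not_reconciled` decision separate from a reconciled outcome, assuming simplified string statuses. `SketchReconciliation`, `reconcileRestoreExecute`, the status strings, and the reason codes are hypothetical; treating a missing or wrong-scope RestoreRun as `not_reconciled` is also an assumption, and soft-deleted RestoreRun handling is left to the spec and the tests planned above. It illustrates the Constitution Check rule: once a same-scope restore is terminal, a proof gap surfaces as a non-success outcome with reasons instead of being hidden behind `not_reconciled`.

```php
<?php

declare(strict_types=1);

// Hypothetical decision object: 'not_reconciled' is non-final and never carries an outcome.
final class SketchReconciliation
{
    /** @param list<string> $reasons */
    private function __construct(
        public readonly string $decision,
        public readonly ?string $outcome,
        public readonly array $reasons,
    ) {}

    public static function notReconciled(string $reason): self
    {
        return new self('not_reconciled', null, [$reason]);
    }

    /** @param list<string> $reasons */
    public static function reconciled(string $outcome, array $reasons = []): self
    {
        return new self('reconciled', $outcome, $reasons);
    }
}

/**
 * @param ?string      $restoreStatus linked RestoreRun status, or null if none is linked (assumed strings)
 * @param bool         $sameScope     RestoreRun matches the OperationRun's workspace and managed environment
 * @param list<string> $proofGaps     reason codes for missing proof elements
 */
function reconcileRestoreExecute(?string $restoreStatus, bool $sameScope, array $proofGaps): SketchReconciliation
{
    // Assumption: without a same-scope RestoreRun there is nothing to reconcile against yet.
    if ($restoreStatus === null) {
        return SketchReconciliation::notReconciled('restore_run_missing');
    }
    if (! $sameScope) {
        return SketchReconciliation::notReconciled('scope_mismatch');
    }

    return match ($restoreStatus) {
        'queued', 'running' => SketchReconciliation::notReconciled('restore_in_progress'),
        // Terminal and same scope: gaps become reasons on a non-success outcome, never hidden.
        'failed' => SketchReconciliation::reconciled('failed', ['restore_failed', ...$proofGaps]),
        'partial' => SketchReconciliation::reconciled('partially_succeeded', ['restore_partial', ...$proofGaps]),
        'completed' => $proofGaps === []
            ? SketchReconciliation::reconciled('succeeded')
            : SketchReconciliation::reconciled('partially_succeeded', $proofGaps),
        // Fail closed: an unrecognized status never reconciles to success.
        default => SketchReconciliation::notReconciled('unknown_restore_status'),
    };
}
```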

Rollout Considerations

Environment variables: none expected.
Migrations: none expected.
Queues/workers: no new queue family; existing restore jobs remain queued and observable.
Scheduler: no scheduler change expected.
Storage: no storage change expected.
Deployment assets: no new Filament assets expected; no new filament:assets requirement beyond existing deploy process.
Staging/production: validate in staging before production because restore is high-risk.

Risk Controls

Fail closed on ambiguous proof.
Do not add new outcomes or operation types.
Keep reconciliation service-owned.
Do not call provider APIs from reconciliation or UI render.
Sanitize failure/reason metadata.
Preserve RBAC and deny-as-not-found boundaries.
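A minimal sketch of the metadata controls, assuming reconciliation metadata is stored as an array under `context['reconciliation']` (the plan names `context.reconciliation` as a platform-core seam but not its shape). The allowlisted keys, the 200-character cap, and the function names are hypothetical. It shows three behaviors the plan asks for: raw provider payloads are dropped by allowlisting rather than blocklisting, the merge leaves sibling context keys untouched, and summary counts stay flat numeric values.

```php
<?php

declare(strict_types=1);

// Hypothetical allowlist and cap; the real metadata keys are not defined in this plan.
const SKETCH_ALLOWED_KEYS = ['reason_code', 'reason_message', 'restore_run_id', 'proof_gaps'];
const SKETCH_MAX_MESSAGE_CHARS = 200;

/** Keep only allowlisted keys; anything else (e.g. raw provider payloads) is dropped. */
function sanitizeReconciliationMetadata(array $raw): array
{
    $clean = array_intersect_key($raw, array_flip(SKETCH_ALLOWED_KEYS));

    if (isset($clean['reason_message'])) {
        // Replace control characters and cap length so provider text stays display-safe.
        $message = (string) preg_replace('/[\x00-\x1F\x7F]+/', ' ', (string) $clean['reason_message']);
        $clean['reason_message'] = mb_substr($message, 0, SKETCH_MAX_MESSAGE_CHARS);
    }

    if (isset($clean['proof_gaps'])) {
        // Reason codes only: drop any non-string entries.
        $clean['proof_gaps'] = array_values(array_filter((array) $clean['proof_gaps'], 'is_string'));
    }

    return $clean;
}

/** Summary counts stay flat: integer (or digit-string) scalars only; nested values are dropped. */
function flattenSummaryCounts(array $counts): array
{
    $flat = array_filter($counts, fn (mixed $v): bool => is_int($v) || (is_string($v) && ctype_digit($v)));

    return array_map('intval', $flat);
}

/** Merge sanitized metadata under context['reconciliation'] without touching sibling keys. */
function mergeReconciliationContext(array $context, array $rawMetadata): array
{
    $existing = is_array($context['reconciliation'] ?? null) ? $context['reconciliation'] : [];
    $context['reconciliation'] = array_replace($existing, sanitizeReconciliationMetadata($rawMetadata));

    return $context;
}
```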
