TenantAtlas/specs/086-retire-legacy-runs-into-operation-runs/spec.md

Feature Specification: Retire Legacy Runs Into Operation Runs

Feature Branch: 086-retire-legacy-runs-into-operation-runs
Created: 2026-02-09
Status: Draft
Input: User description: "Retire legacy run tracking into canonical operation runs, with DB-only rendering and dispatch-time run creation. Legacy run tables remain read-only history."

Clarifications

Session 2026-02-10

  • Q: For manual backup schedule runs (backup_schedule.run_now) and retries (backup_schedule.retry), should the system dedupe while a run is active, or always create a new run per click? → A: Always create a new run per click (no dedupe).
  • Q: Who may view the canonical run detail page (“View run”)? → A: Workspace members may view runs only if they also have the required capability for that operation type; non-members get 404, members without capability get 403.
  • Q: Which capability should be required to view a run (“View run”)? → A: Use the same capability as starting that operation type.
  • Q: For backup_schedule.scheduled, how should dedupe work? → A: Strict dedupe per schedule and intended fire-time (at most one run).
  • Q: For the role definitions cache “Sync now” operation, should it use a new dedicated operation type or reuse an existing one? → A: Use a new dedicated operation type.
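The dedupe answers above can be illustrated with a run-identity function. This is a minimal Python sketch under stated assumptions, not the project's implementation: `run_identity`, its parameters, and the SHA-256 keying are hypothetical; only the `backup_schedule.*` operation type names come from this spec.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Optional


def run_identity(op_type: str, schedule_id: int,
                 fire_time: Optional[datetime] = None) -> str:
    """Return the identity used to decide whether a run already exists."""
    if op_type == "backup_schedule.scheduled":
        if fire_time is None:
            raise ValueError("scheduled runs need an intended fire-time")
        # Strict dedupe: same schedule + same intended fire-time -> same identity.
        stamp = fire_time.astimezone(timezone.utc).isoformat()
        key = f"{op_type}:{schedule_id}:{stamp}"
        return hashlib.sha256(key.encode()).hexdigest()
    if op_type in ("backup_schedule.run_now", "backup_schedule.retry"):
        # No dedupe: every click gets a fresh, unique identity.
        return uuid.uuid4().hex
    raise ValueError(f"unknown operation type: {op_type}")
```

With a unique constraint on the stored identity (an assumption about the storage layer), a scheduler double-fire for the same schedule and fire-time would resolve to the existing run, while two manual clicks would always yield two runs.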

User Scenarios & Testing (mandatory)

User Story 1 - Start an operation with an immediate canonical run link (Priority: P1)

As a workspace member, I can start long-running operations (inventory sync, directory groups sync, scheduled backups, restore execution, directory role definitions sync) and immediately receive a stable “View run” link that I can open and share.

Why this priority: This removes the “run link appears later / changes” ambiguity, improves auditability, and prevents duplicate tracking paths.

Independent Test: Trigger each supported operation start surface and verify a canonical run record exists before work begins, and that the canonical viewer loads from persisted state.

Acceptance Scenarios:

  1. Given a workspace member with the required capability, When they start an inventory sync, Then a canonical run exists immediately and the UI shows a stable “View run” link.
  2. Given a scheduled backup fire event, When the scheduler dispatches work, Then a canonical run exists immediately and the same fire event cannot create duplicates.
  3. Given a workspace member without the required capability, When they attempt to start the operation, Then the request is rejected with a capability error (403) and no run is created.
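Scenarios 1 and 3 imply a fixed order of steps at every start surface: check the capability, persist the run, then dispatch. The Python sketch below shows that order with an in-memory store. `RunStore`, `start_operation`, the capability string, and the `/operations/runs/{id}` path are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, Set


class Forbidden(Exception):
    """Maps to an HTTP 403 response."""


@dataclass
class RunStore:
    runs: Dict[int, dict] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, op_type: str, initiator: str) -> int:
        run_id = next(self._ids)
        self.runs[run_id] = {"type": op_type, "status": "queued",
                             "initiator": initiator}
        return run_id


def start_operation(store: RunStore, user_caps: Set[str], user: str,
                    op_type: str, required_cap: str,
                    dispatch: Callable[[int], None]) -> str:
    # 1. Enforce the capability server-side; no run is created on failure.
    if required_cap not in user_caps:
        raise Forbidden(required_cap)
    # 2. Persist the canonical run before any asynchronous work is dispatched.
    run_id = store.create(op_type, user)
    # 3. Hand the run id to the job; the link is stable from this point on.
    dispatch(run_id)
    return f"/operations/runs/{run_id}"
```

Because the run is stored before `dispatch` is called, the returned link can be shown in the same response that confirms the start.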

User Story 2 - Monitor executions from a single canonical viewer (Priority: P2)

As a workspace member, I can open an operations viewer link for any run and see status, progress, results, and errors without the page triggering outbound calls.

Legacy “run history” pages remain available for older historical rows but cannot start or retry anything.

Why this priority: A single viewer reduces support load, enables consistent deep linking, and avoids UI latency and rate-limiting from outbound calls.

Independent Test: Load the canonical viewer and legacy history pages using outbound client fakes/mocks and assert no outbound calls occur during rendering/search.

Acceptance Scenarios:

  1. Given a run exists, When a user opens its canonical operations link, Then the page renders only from persisted state and performs no outbound calls.
  2. Given a legacy run history record that has a known canonical mapping, When a user opens the legacy “view” page, Then they are redirected to the canonical operations viewer.
  3. Given a legacy run history record without a canonical mapping, When a user opens the legacy “view” page, Then they see a read-only historical record and no new canonical run is created.
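Scenarios 2 and 3 reduce to a single decision on the legacy "view" page. A minimal Python sketch of that decision follows; `LegacyRun`, `resolve_legacy_view`, and both URL paths are hypothetical and only illustrate the branching.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LegacyRun:
    id: int
    canonical_run_id: Optional[int]  # None when no canonical mapping is known


def resolve_legacy_view(record: LegacyRun) -> Tuple[str, str]:
    """Decide how the legacy "view" page responds; never creates a run."""
    if record.canonical_run_id is not None:
        # Known mapping: always redirect to the canonical viewer.
        return ("redirect", f"/operations/runs/{record.canonical_run_id}")
    # No mapping: show the historical record read-only.
    return ("render_readonly", f"/legacy/runs/{record.id}")
```

The function is pure, so the same legacy record always resolves to the same target, which is what makes the redirect deterministic.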

User Story 3 - Use cached directory data in forms without blocking calls (Priority: P3)

As a workspace member configuring tenant-related settings, I can search/select directory groups and role definitions using cached data. If cached data is missing or stale, I can trigger an asynchronous sync (“Sync now”) without the form making outbound calls.

Why this priority: Prevents a slow, flaky UI and rate limiting caused by inline lookups, while keeping the configuration flow usable.

Independent Test: Render the configuration form and exercise search/label rendering while asserting outbound clients are not called.

Acceptance Scenarios:

  1. Given cached directory groups exist, When the user searches for groups, Then results and labels come from cached data.
  2. Given cached role definitions are missing, When the user opens the role definition selector, Then the UI indicates “data not available yet” and offers a non-destructive “Sync now” action.
  3. Given the user triggers “Sync now”, When the sync starts, Then a canonical run is created immediately and the user can open its canonical “View run” link.
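The selector behaviour in scenarios 1 and 2 can be sketched as a lookup that reads only from the cache and, when the cache is empty, returns a hint plus the sync action instead of calling out. The function name, the response shape, and the `sync_now` action key are assumptions made for this Python illustration.

```python
from typing import Dict, Optional


def search_cached(cache: Optional[Dict[str, str]], term: str) -> dict:
    """Search cached entries (id -> display name) without outbound calls."""
    if not cache:
        # Nothing cached yet: the UI shows a hint and offers "Sync now".
        return {"results": [], "notice": "data not available yet",
                "actions": ["sync_now"]}
    needle = term.casefold()
    matches = [
        {"id": entry_id, "label": name}
        for entry_id, name in sorted(cache.items(),
                                     key=lambda kv: kv[1].casefold())
        if needle in name.casefold()
    ]
    return {"results": matches, "notice": None, "actions": []}
```

The same function would serve both directory groups and role definitions, since each is described here as cached metadata used for search and labels.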

Edge Cases

  • A scheduler fires the same scheduled backup more than once for the same intended time.
  • A user triggers the same sync while an identical sync is still active (dedupe/while-active semantics).
  • A job fails before writing progress; the canonical run still exists and shows a clear failure state.
  • A legacy history row exists but has no canonical mapping; it must remain viewable without creating new canonical runs.
  • A non-member attempts to access a canonical operations link; response must be deny-as-not-found (404).
  • A member lacks capability: start surfaces must reject (403) and the UI must reflect disabled affordances.
  • Cached directory data is empty or stale; UI must not block on outbound calls and must provide a safe way to sync.
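The "job fails before writing progress" case can be handled by wrapping the job body so that any exception still lands on the run. The Python sketch below assumes hypothetical status names (`queued`, `running`, `failed`, `succeeded`) and a hypothetical error-entry shape; it also shows the worker refusing to proceed when no run exists, rather than creating one.

```python
from typing import Callable, Dict, Optional


class MissingRunError(RuntimeError):
    """Raised when a job is dispatched without a canonical run."""


def execute_job(runs: Dict[int, dict], run_id: Optional[int],
                work: Callable[[dict], None]) -> None:
    # The worker never creates a run as a fallback; a missing run is fatal.
    if run_id is None or run_id not in runs:
        raise MissingRunError(f"no canonical run for id {run_id!r}")
    run = runs[run_id]
    run["status"] = "running"
    try:
        work(run)
    except Exception as exc:
        # Even if no progress was written, the run ends in a clear failure state.
        run["status"] = "failed"
        run["errors"] = [{"code": type(exc).__name__, "message": str(exc)}]
        return
    run["status"] = "succeeded"
```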

Requirements (mandatory)

Constitution alignment (required): This feature includes long-running/queued/scheduled work. The spec MUST describe tenant isolation, run observability (type/identity/visibility), and tests.

Constitution alignment (RBAC-UX): This feature changes authorization behavior and navigation paths. It MUST define 404 vs 403 semantics and ensure server-side enforcement for operation-start flows.

Constitution alignment (OPS-EX-AUTH-001): Outbound HTTP without a canonical run is not allowed on Monitoring/Operations pages.

Constitution alignment (BADGE-001): Any new/changed status presentation for runs MUST remain centralized and covered by tests.

Constitution alignment (Admin UI Action Surfaces): This feature changes multiple admin UI surfaces and MUST satisfy the UI Action Surface Contract (see matrix below).

Functional Requirements

  • FR-001 (Canonical tracking): The system MUST treat the canonical run record as the single source of truth for execution tracking (status, progress, results, errors) for the in-scope operations.

  • FR-002 (Dispatch-time creation): Every start surface (UI action, console command, scheduler, internal service) MUST create the canonical run record before dispatching any asynchronous work.

  • FR-003 (No job fallback-create): Background workers MUST NOT create canonical run records as a fallback; missing run identifiers are treated as a fatal contract violation.

  • FR-004 (Canonical deep-link): The system MUST support exactly one canonical deep-link format for viewing runs; that format MUST be tenantless and stable.

  • FR-005 (Membership + capability rules): Access to operation runs MUST follow these rules:

    • Non-members of the workspace scope MUST receive deny-as-not-found (404).
    • Workspace members who lack the required capability for the operation type MUST receive 403.
  • FR-005a (View capability mapping): “View run” MUST require the same capability as “Start” for the corresponding operation type.

  • FR-006 (DB-only rendering): Operations/monitoring and run viewer pages MUST render solely from persisted data and MUST NOT perform outbound calls during rendering/search/label resolution.

  • FR-007 (Legacy history read-only): Legacy run history records MUST remain viewable as historical data, but MUST be strictly read-only (no start/retry/execute actions).

  • FR-008 (Legacy redirects): If a legacy history record includes a canonical mapping, the legacy “view” page MUST redirect deterministically to the canonical viewer; otherwise it MUST display legacy-only history.

  • FR-009 (No new legacy rows): For the in-scope operations, the system MUST stop writing new legacy run history rows. Existing legacy history remains unchanged.

  • FR-010 (Scheduled backup classification): Scheduled backup executions MUST be represented with a distinct operation type (not conflated with manual runs).

  • FR-011 (Run identity & dedupe): The system MUST compute deterministic run identities for dedupe and scheduler double-fire protection, and MUST define whether each type dedupes “while active” or is strictly unique.

  • FR-011a (Backup manual runs are unique): Manual backup schedule runs (“run now”) and retries MUST be unique per user action (no while-active dedupe).

  • FR-011b (Scheduled backups are strict): Scheduled backup executions MUST use strict dedupe per schedule and intended fire-time (at most one canonical run ever per schedule per intended fire-time).

  • FR-012 (Inputs & provenance): The system MUST store operation inputs and provenance (target tenant/schedule, trigger source, optional initiating user) on the canonical run record.

  • FR-013 (Structured results): The system MUST store a standard, structured summary of results (counts) and failures (structured error entries) on the canonical run record.

  • FR-014 (Restore domain vs execution): Restore workflow domain records may remain as domain entities, but execution tracking and “View run” affordances MUST use the canonical run record exclusively.

  • FR-015 (Cached directory data): The system MUST provide cached directory group data and cached role definition data to support search and label rendering in configuration forms without outbound calls.

  • FR-015a (Role definitions sync type): The role definitions cache sync MUST use a dedicated operation type (e.g., directory_role_definitions.sync) to keep identities, results, and auditability distinct from other sync operations.

  • FR-016 (Safe “Sync now”): When cached directory data is missing or stale, the UI MUST provide a non-destructive “Sync now” action that starts an asynchronous sync and immediately exposes the canonical run link.
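The access rules in FR-005 and FR-005a amount to two ordered checks: membership first, then capability. A minimal Python sketch is shown below; the function name and parameters are hypothetical, and the capability passed in is assumed to be the one required to start the run's operation type.

```python
from typing import Set


def view_run_status(is_workspace_member: bool, user_caps: Set[str],
                    start_capability: str) -> int:
    """Return the HTTP status for a "View run" request."""
    if not is_workspace_member:
        # Deny-as-not-found: do not reveal that the run exists.
        return 404
    if start_capability not in user_caps:
        # Member, but lacks the capability required to start this type.
        return 403
    return 200
```

Checking membership before capability matters: a non-member must get 404 even if they happen to hold a matching capability elsewhere.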

Assumptions

  • A canonical run model/viewer already exists and is suitable for monitoring long-running operations.
  • Outbound calls to external services are permitted only in asynchronous execution paths and are observable via the canonical run record.

Out of Scope

  • Backfilling legacy history into canonical runs.
  • Dropping/removing legacy run history tables.
  • Introducing new cross-workspace analytics.

UI Action Matrix (mandatory when admin UI is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Operations viewer | Canonical run viewer route | None | Open by canonical link | None | None | None | None | N/A | Yes (canonical run record metadata) | Must be DB-only rendering; non-member is 404 |
| Inventory sync start | Inventory admin UI | Start sync | View run link appears after start | View run | None | None | N/A | N/A | Yes | Capability-gated; creates canonical run before dispatch |
| Directory groups sync start | Directory groups admin UI & console | Sync now | View run link appears after start | View run | None | Sync now (when cache empty) | N/A | N/A | Yes | Single dispatcher entry; legacy start actions removed |
| Backup schedule runs list | Backup schedule detail | None | List links open canonical viewer | View run | None | None | N/A | N/A | Yes | Includes scheduled/manual/retry runs; scheduled has distinct type |
| Tenant configuration selectors | Tenant settings forms | Sync now (when cache empty) | Search from cached data | None | None | Sync now | N/A | Save/Cancel | Yes | No outbound calls in search/label resolution |
| Legacy run history pages | Archive/history areas | None | View (read-only) | View only | None | None | None | N/A | Yes (historical) | No Start/Retry; redirect only if canonical mapping exists |

Key Entities (include if feature involves data)

  • Canonical Run: A single, shareable execution record containing type, identity, provenance, status, progress, results, and errors.
  • Legacy Run History Record: A historical record for prior run-tracking paths; viewable but not mutable.
  • Managed Tenant: The tenant context targeted by operations.
  • Backup Schedule: A schedule configuration that can trigger executions automatically.
  • Restore Run (Domain Record): The domain workflow record for restore; links to canonical execution runs.
  • Directory Group Cache: Cached group metadata used for searching/label rendering in forms.
  • Role Definition Cache: Cached role definition metadata used for searching/label rendering in forms.
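As an illustration of the Canonical Run entity, the fields named above could be grouped as follows. This Python dataclass is a sketch only: every field name, default, and the split of provenance into separate attributes are assumptions, not the existing model's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CanonicalRun:
    type: str                       # e.g. "directory_role_definitions.sync"
    identity: str                   # deterministic or unique, depending on type
    trigger: str                    # provenance: "ui", "console", "scheduler", ...
    tenant_id: Optional[int] = None     # provenance: target tenant, if any
    schedule_id: Optional[int] = None   # provenance: target schedule, if any
    initiator_id: Optional[int] = None  # provenance: optional initiating user
    status: str = "queued"
    progress: int = 0
    summary_counts: Dict[str, int] = field(default_factory=dict)
    failures: List[Dict[str, str]] = field(default_factory=list)
```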

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: 100% of newly started in-scope operations create a canonical run record before any asynchronous work is dispatched.
  • SC-002: Over a 30-day staging observation window, 0 new legacy run history rows are created for in-scope operations.
  • SC-003: Operations viewer and monitoring pages perform 0 outbound calls during rendering/search/label resolution (verified by automated tests).
  • SC-004: For scheduled backups, duplicate scheduler fires for the same schedule and intended fire-time result in at most 1 canonical run.
  • SC-005: Users can open a canonical “View run” link and see status/progress within 2 seconds in typical conditions.
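SC-003 can be verified with a fake outbound client that records, and fails on, any call made while a page renders. The Python sketch below shows the pattern; `RecordingClient`, `render_run_page`, and the sample run values are hypothetical stand-ins for the real client fake and viewer.

```python
from typing import List


class RecordingClient:
    """Fake outbound client that records every call instead of sending it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def get(self, url: str) -> None:
        self.calls.append(url)
        raise AssertionError(f"unexpected outbound call: {url}")


def render_run_page(run: dict, client: RecordingClient) -> str:
    # Renders only from the persisted run; the client is deliberately unused.
    return f"{run['type']}: {run['status']} ({run['progress']}%)"


client = RecordingClient()
page = render_run_page(
    {"type": "inventory.sync", "status": "running", "progress": 40}, client
)
assert client.calls == []  # zero outbound calls during rendering
```

Injecting the fake into every rendering, search, and label-resolution path lets one assertion (`client.calls == []`) cover the whole criterion.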