TenantAtlas/specs/056-remove-legacy-bulkops/spec.md
ahmido a97beefda3 056-remove-legacy-bulkops (#65)
Summary

Hides the rerun row action for archived (soft-deleted) RestoreRuns, preventing faulty restarts from the archive; adds a regression test.
Changes

Code: RestoreRunResource.php — the rerun action's visibility is now gated on ! $record->trashed(), with a defensive abort check in the action handler (sketched below).
Tests: RestoreRunRerunTest.php — new test: rerun action is hidden for archived restore runs.
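A minimal sketch of this gate, assuming a Filament v3 table action; the actual structure in RestoreRunResource.php may differ:

```php
use Filament\Tables\Actions\Action;

// Hide the rerun action for archived (soft-deleted) records, and abort defensively
// if the handler is ever invoked for a trashed record anyway.
Action::make('rerun')
    ->visible(fn ($record): bool => ! $record->trashed())
    ->action(function ($record): void {
        if ($record->trashed()) {
            return; // defensive abort: never re-enqueue from the archive
        }

        // ...enqueue the new RestoreRun as before
    });
```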
Why

Archived RestoreRuns must not be restarted, yet the UI still offered the option. This led to confusing behavior and potential errors during enqueueing.
Verification / QA

Unit/Feature:
./vendor/bin/sail artisan test tests/Feature/RestoreRunRerunTest.php
Style/format:
./vendor/bin/pint --dirty
Manual (UI):
As a tenant admin, open Filament → Restore Runs.
Enable the Archived filter (or select the Trashed filter).
Verify that the rerun action is not visible for archived entries.
On an active (non-archived) run, verify that rerun remains visible and creates a new RestoreRun as expected.
Important notes

No DB migration is required.
This PR contains only the UI/Filament fix; the earlier operational fixes for queue/adapter reconciliation also remain on the branch (e.g., earlier commits from the debugging session).
T055 (schema squash) was deliberately deferred and is not part of this PR.
Merge checklist

Tests pass locally (RestoreRunRerunTest is green)
Pint runs clean (no unfixed style issues)
Branch pushed: 056-remove-legacy-bulkops (PR URL: https://git.cloudarix.de/ahmido/TenantAtlas/compare/dev...056-remove-legacy-bulkops)

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #65

Feature Specification: Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)

Feature Branch: 056-remove-legacy-bulkops
Created: 2026-01-18
Status: Draft
Input: User description: "Feature 056 — Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)"

Clarifications

Session 2026-01-18

  • Q: What should be the default max concurrency per target scope (entra_tenant_id / directory_context_id) for bulk operations? → A: Config-driven, default=1
  • Q: How should Selection Identity be determined for idempotency fingerprinting? → A: Hybrid (IDs-hash for explicit selection; query-hash for “select all via filter/query”)
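Taken together, these clarifications could translate into a Laravel-flavored sketch like the one below; the config key, environment variable, and hashing details are illustrative assumptions, not prescribed by this spec.

```php
// config/operations.php — config-driven per-target concurrency (illustrative key names)
return [
    'bulk_max_concurrency_per_target' => env('OPERATIONS_BULK_MAX_CONCURRENCY_PER_TARGET', 1),
];
```

```php
// Hybrid selection identity: hash explicit IDs when items were picked directly,
// or hash the normalized filter/query definition for "select all via filter/query".
function selectionFingerprint(string $operationType, string $targetScope, array $selection): string
{
    $identity = array_key_exists('ids', $selection)
        ? ['ids' => collect($selection['ids'])->map(fn ($id) => (string) $id)->sort()->values()->all()]
        : ['query' => $selection['query']];

    return hash('sha256', json_encode([$operationType, $targetScope, $identity]));
}
```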

User Scenarios & Testing (mandatory)

User Story 1 - Run-backed bulk actions are always observable (Priority: P1)

An admin performs a bulk action (e.g., apply/ignore/restore/prune across many records). The system records a single canonical run that can be monitored end-to-end, including partial failures, and provides consistent user feedback.

Why this priority: Bulk changes are operationally significant and must be traceable, support partial outcomes, and have a consistent mental model for admins.

Independent Test: Trigger a representative bulk action and verify that a run record exists, appears in the Monitoring list, has a detail view, and emits the correct feedback surfaces.

Acceptance Scenarios:

  1. Given an admin selects multiple items for a bulk action, When the action is confirmed and submitted, Then a canonical run record is created or reused and the UI confirms the enqueue/queued state via a toast.
  2. Given a bulk run is queued or running, When the admin opens Monitoring → Operations, Then the run appears in the list and can be opened via a canonical “View run” link.
  3. Given a bulk run completes with a mix of successes and failures, When the run reaches a terminal state, Then the initiator receives a terminal notification and the run detail shows a summary of outcomes.

User Story 2 - Monitoring is the single source of run history (Priority: P2)

An admin (or operator) relies on Monitoring → Operations to see the full history of operational work (including bulk). There are no separate legacy run surfaces; links from anywhere in the app point to the canonical run detail.

Why this priority: Multiple run systems lead to missed incidents, inconsistent retention, and developer confusion. One canonical surface improves operational clarity and reduces support overhead.

Independent Test: Navigate from a bulk action result to “View run” and confirm it lands in Monitoring's run detail; confirm there are no legacy “bulk runs” navigation entries or pages.

Acceptance Scenarios:

  1. Given any UI element offers a “View run” link, When it is clicked, Then it opens the canonical Monitoring → Operations → Run Detail page for that run.
  2. Given the app navigation, When an admin searches for legacy bulk-run screens, Then no legacy bulk-run navigation or pages exist.

User Story 3 - Developers can't accidentally reintroduce legacy patterns (Priority: P3)

A developer adds or modifies an admin action. They can clearly determine whether it is an audit-only action or a run-backed operation, and the repository enforces the single-run model by preventing legacy references and UX drift.

Why this priority: Preventing regression is essential for suite readiness and long-term maintainability.

Independent Test: Introduce a legacy reference or a bulk action without a run-backed record and confirm CI/automated checks fail.

Acceptance Scenarios:

  1. Given a change introduces any reference to the legacy bulk-run system, When tests/CI run, Then the pipeline fails with a clear message.
  2. Given a security-relevant DB-only action that is eligible for audit-only classification, When the action runs, Then an audit log entry is recorded and no run record is created.

Edge Cases

  • Bulk selection is empty or resolves to zero items: the system does not start work and provides a clear non-destructive result.
  • A bulk selection is very large: the system remains responsive and continues to show progress via run summary metrics.
  • Target scope is required but missing: the system fails safely, records a terminal run with a stable reason code, and does not execute remote/bulk mutations.
  • Remote calls experience throttling: the system applies bounded retries with jittered backoff and records failures without losing overall run visibility.
  • Duplicate submissions (double click / retry / re-run): idempotency prevents duplicate processing and preserves a single canonical outcome per selection identity.
  • Tenant isolation: no run, selection, summary, or notifications leak across tenants.

Requirements (mandatory)

Constitution alignment (required): This feature consolidates operational work onto a single canonical run model and a single monitoring surface. It must preserve the defined user feedback surfaces (queued toast, active widget, terminal notification), ensure tenant-scoped observability, and maintain stable, sanitized messages and reason codes.

Functional Requirements

  • FR-001 Single run model: The system MUST use a single canonical run model (OperationRun) for all run-backed operations; the legacy bulk-run model MUST not exist after this feature.
  • FR-002 Bulk actions are run-backed: Any bulk action (apply to N records, chunked work, mass ignore/restore/prune/delete) MUST create or reuse an OperationRun and MUST be visible in Monitoring → Operations.
  • FR-003 Action taxonomy: Every admin action MUST be classified as exactly one of:
    • Audit-only DB action: DB-only, no remote/external calls, no queued work, and bounded DB work; typically completes within ~2 seconds (guidance, not a hard rule). MUST write an audit log for security/ops-relevant state changes; MUST NOT create an OperationRun.
    • Run-backed operation: queued/long-running/remote/bulk/scheduled or otherwise operationally significant; MUST create or reuse an OperationRun.

Decision rule: If classification is uncertain, default to Run-backed operation.

  • FR-004 Canonical UX surfaces: For run-backed operations, the system MUST use only these feedback surfaces:
    • Queued: toast-only
    • Active: tenant-wide active widget
    • Terminal: database-backed notification to the initiator only
  • FR-005 Canonical routing: All “View run” links MUST route to Monitoring → Operations → Run Detail.
  • FR-006 Legacy removal: The system MUST remove legacy bulk-run tables/models/services/routes/widgets/navigation and MUST prevent any new legacy writes.
  • FR-007 Canonical summary metrics: The run's summary metrics MUST use a single canonical set of keys and MUST be presented consistently in the run detail view.
  • FR-008 Target scope recording: For operations targeting a directory/remote tenant, the run context MUST record the target scope (directory identifier) and Monitoring/Run Detail MUST display it in a human-friendly way when available.
  • FR-009 Per-target throttling: Bulk orchestration MUST enforce concurrency limits per target scope to reduce throttling risk and provide predictable execution; the limit MUST be configuration-driven with a default of 1 per target scope.
  • FR-010 Idempotency for bulk: Bulk operations MUST be idempotent using a deterministic fingerprint that includes operation type, target scope, and selection identity; retries MUST NOT duplicate work.
  • FR-011 Discovery completeness: The implementation MUST include a repo-wide discovery sweep of legacy references and bulk-like actions; findings MUST be recorded in a discovery report with classification and migration/deferral decisions.
  • FR-012 Regression guardrails: Automated checks MUST fail if legacy bulk-run references reappear or if bulk actions bypass the canonical run-backed model.
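One possible shape for the FR-012 guardrail is a Pest-style test that greps application code for the legacy model name from this feature's title; the file name and scanned path are assumptions:

```php
// tests/Architecture/LegacyBulkOpsGuardTest.php (illustrative)
use Illuminate\Support\Facades\File;

it('contains no references to the legacy bulk-run system', function () {
    $offenders = [];

    foreach (File::allFiles(base_path('app')) as $file) {
        if (str_contains($file->getContents(), 'BulkOperationRun')) {
            $offenders[] = $file->getRelativePathname();
        }
    }

    expect($offenders)->toBeEmpty();
});
```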

Non-Functional Requirements (NFR)

NFR-01 Monitoring is DB-only at render time (Constitution Gate)

All Monitoring → Operations pages (index and run detail) MUST be DB-only at render time:

  • No Graph/remote calls during initial render or reactive renders.
  • No side-effectful work triggered by view rendering.

Verification:

  • Add a regression test/guard that mocks the Graph client (or equivalent remote client) and asserts it is not called during Monitoring renders.
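A sketch of such a guard, assuming a Laravel feature test, an injectable remote-client abstraction (here called GraphClientInterface), and an assumed Monitoring route; the helper and route names are illustrative:

```php
// tests/Feature/MonitoringIsDbOnlyTest.php (illustrative)
it('renders Monitoring → Operations without touching the remote Graph client', function () {
    // A strict Mockery mock with no expectations: any method call on it fails the test.
    $this->instance(GraphClientInterface::class, Mockery::mock(GraphClientInterface::class));

    $this->actingAs(makeTenantAdmin()) // hypothetical helper for an authenticated tenant admin
        ->get('/admin/monitoring/operations')
        ->assertOk();
});
```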

NFR-02 Failure reason codes and message sanitization

Run-backed operations MUST store failures as stable, machine-readable reason_code values plus a sanitized, user-facing message.

Minimal required reason_code set (baseline):

  • graph_throttled: Remote service throttled (e.g., rate limited)
  • graph_timeout: Remote call timed out
  • permission_denied: Missing/insufficient permissions
  • validation_error: Input/selection validation failure
  • conflict_detected: Conflict detected (concurrency/version/resource state)
  • unknown_error: Fallback when no specific code applies

Rules:

  • reason_code is stable over time and safe to use in programmatic filters/alerts.
  • Failure messages are sanitized and bounded in length; failures/notifications MUST NOT persist secrets/tokens/PII or raw payload dumps.
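The baseline set above could be pinned down as a string-backed PHP enum so the codes stay stable and machine-filterable; the class name and location are assumptions:

```php
// app/Enums/OperationFailureReason.php (illustrative name and location)
enum OperationFailureReason: string
{
    case GraphThrottled   = 'graph_throttled';
    case GraphTimeout     = 'graph_timeout';
    case PermissionDenied = 'permission_denied';
    case ValidationError  = 'validation_error';
    case ConflictDetected = 'conflict_detected';
    case UnknownError     = 'unknown_error';
}
```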

NFR-03 Retry/backoff/jitter for remote throttling

When worker jobs perform remote calls, they MUST handle transient failures (including 429/503) via a shared policy:

  • bounded retries
  • exponential backoff with jitter
  • no hand-rolled sleep() loops or ad-hoc random retry logic in feature code
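In a Laravel queue worker this policy is usually expressed through the job's own retry hooks rather than hand-rolled loops; a sketch with illustrative retry counts and delays:

```php
use Illuminate\Contracts\Queue\ShouldQueue;

class ProcessOperationItem implements ShouldQueue // illustrative worker job
{
    public int $tries = 5;

    /** Exponential backoff with jitter between retry attempts, in seconds. */
    public function backoff(): array
    {
        return collect([5, 15, 60, 180])
            ->map(fn (int $seconds) => $seconds + random_int(0, 10)) // add jitter
            ->all();
    }

    // ... handle() performs the remote call for a single item
}
```

Laravel's ThrottlesExceptions job middleware is another built-in option worth considering for throttling-heavy workloads, keeping the policy out of feature code.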

Implementation Shape (decision)

Decision: standard orchestrator + item workers

  • 1 orchestrator job per run:
    • resolves selection deterministically
    • chunks work
    • dispatches item worker jobs (idempotent per item)
  • Worker jobs update operation_runs.summary_counts via canonical normalization.
  • Finalization sets terminal status once.
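A condensed sketch of the orchestrator's handle() method using Laravel job batching; the job class names and chunk size are assumptions, not part of the spec:

```php
// handle() of the orchestrator job (illustrative; uses Illuminate\Support\Facades\Bus)
public function handle(): void
{
    $runId   = $this->run->id;
    $itemIds = $this->resolveSelection();    // deterministic selection resolution

    $workers = collect($itemIds)
        ->chunk(100)                          // chunked work
        ->map(fn ($chunk) => new ProcessOperationItems($runId, $chunk->values()->all()))
        ->all();

    Bus::batch($workers)
        ->finally(function () use ($runId) {
            // Finalization sets the terminal status exactly once, after item
            // workers have updated operation_runs.summary_counts.
            FinalizeOperationRun::dispatch($runId);
        })
        ->dispatch();
}
```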

Target Scope (canonical keys)

Canonical context keys:

  • entra_tenant_id (Azure AD tenant GUID)
  • optional entra_tenant_name (human-friendly; if available)
  • optional directory_context_id (internal directory context identifier, if/when introduced)

For operations targeting a directory/remote tenant, the run context MUST record target scope using the canonical keys above, and Monitoring/Run Detail MUST display the target scope (human-friendly name if available).
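Assuming the run context is persisted as a JSON column, the recorded target scope might look roughly like this (the nesting and example values are illustrative):

```php
$context = [
    'target_scope' => [
        'entra_tenant_id'      => '00000000-0000-0000-0000-000000000000', // Azure AD tenant GUID (placeholder)
        'entra_tenant_name'    => 'Contoso GmbH',                          // optional, human-friendly
        'directory_context_id' => null,                                    // optional internal identifier, if/when introduced
    ],
];
```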

Assumptions

  • Existing run status semantics remain unchanged (queued/running/succeeded/partial/failed).
  • Existing monitoring experience is not redesigned; it is aligned so that all operational work is represented consistently.

Dependencies

  • Prior consolidation work establishing OperationRun as the canonical run model and Monitoring → Operations as the canonical surface.
  • Existing audit logging conventions for security/ops-relevant DB-only actions.

Legacy History Decision (recorded)

  • Default path: legacy bulk-run history is not migrated into the canonical run model. The legacy tables are removed after cutover, relying on database backups/exports if historical investigation is needed.

Key Entities (include if feature involves data)

  • OperationRun: A tenant-scoped record of operational work with status, timestamps, sanitized user-facing message/reason code, summary metrics, and context.
  • Operation Type: A stable identifier describing the kind of operation (used for categorization, labeling, and governance).
  • Target Scope: The directory / remote tenant scope that the operation targets (when applicable).
  • Selection Identity: The deterministic definition of “what the bulk action applies to” used for idempotency and traceability.
  • Audit Log Entry: A record of security/ops-relevant state changes for audit-only DB actions.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: 100% of bulk actions in the admin UI create or reuse a canonical run record and appear in Monitoring → Operations.
  • SC-002: Repository contains 0 references to the legacy bulk-run system after completion, enforced by automated checks.
  • SC-003: For directory-targeted operations, 100% of run records display a target scope in Monitoring/Run Detail.
  • SC-004: For bulk operations, duplicate submissions do not increase processed item count beyond one idempotent execution per selection identity.
  • SC-005: Admins can locate a completed bulk run in Monitoring within 30 seconds using standard navigation and filters, without relying on legacy pages.