spec: refine 057 + extend 058 #67
@ -669,3 +669,11 @@ ### Replaced Utilities
| decoration-slice | box-decoration-slice |
| decoration-clone | box-decoration-clone |
</laravel-boost-guidelines>

## Recent Changes
- 054-unify-runs-suitewide: Added PHP 8.4 + Filament v4, Laravel v12, Livewire v3
- 054-unify-runs-suitewide: Added PostgreSQL (`operation_runs` table + JSONB)

## Active Technologies
- PostgreSQL (`operation_runs` table + JSONB) (054-unify-runs-suitewide)

30
specs/054-unify-runs-suitewide/checklists/requirements.md
Normal file
@ -0,0 +1,30 @@
# Requirements Checklist: Unified Operations Runs

## Phase 1 Adoption Set
- [x] `inventory.sync` (Inventory “Sync now”) covered in spec
- [x] `policy.sync` (Policies “Sync now”) covered in spec
- [x] `directory_groups.sync` (Directory → Groups “Sync groups”) covered in spec
- [x] `drift.generate` (Drift “Generate drift now”) covered in spec
- [x] `backup_set.add_policies` (Backup Sets “Add selected”) covered in spec
- [x] `restore.execute` (adapter mode) covered in spec

## Critical Clarifications (Pinned)
- [x] Retention policy defined (90 days default)
- [x] Transition strategy defined (Parallel write: Canonical + Legacy)
- [x] Concurrency enforcement defined (Partial unique index on active runs)
- [x] Initiator model defined (Nullable FK + Name Snapshot)
- [x] Restore integration defined (Physical adapter row pointing to Restore Domain record)

## Functional Requirements (Spec Coverage)
- [x] FR-001 Canonical Operation Run schema defined (see `data-model.md`)
- [x] FR-004 Monitoring List UI specified (filters/sort defined in Spec FR-004)
- [x] FR-005 Monitoring Detail UI specified (content defined in Spec FR-005)
- [x] FR-007 Start surfaces behavior specified (Spec FR-007)
- [x] FR-009 Idempotency (Partial Unique Index) strategy defined (Spec FR-009, Plan)
- [x] FR-015 Notifications for queued/terminal states specified (Spec FR-015)
- [x] FR-016 Tenant isolation rules specified (Spec FR-016)

## Non-Functional (Spec Coverage)
- [x] SC-002 Start confirmation < 2s target defined (Spec SC-002)
- [x] SC-003 Deduplication rate > 99% strategy defined (Spec SC-003)
- [x] SC-004 No secrets in failure logs rule defined (Spec SC-004)
@ -0,0 +1,55 @@
openapi: 3.0.3
info:
  title: TenantPilot Admin Operations Contracts (Feature 054)
  version: 0.1.0
  description: |
    Minimal page-render contracts for the Monitoring/Operations hub.

    These pages must render from the database only (no external tenant calls)
    and display only sanitized failure detail (no secrets/tokens/raw payload dumps).

servers:
  - url: /

paths:
  /admin/t/{tenantExternalId}/bulk-operation-runs:
    get:
      operationId: monitoringOperationsIndex
      summary: Monitoring → Operations (tenant-scoped)
      parameters:
        - name: tenantExternalId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Page renders successfully.
        '302':
          description: Redirect to login when unauthenticated.

  /admin/t/{tenantExternalId}/bulk-operation-runs/{bulkOperationRunId}:
    get:
      operationId: monitoringOperationsView
      summary: Operation run detail (tenant-scoped)
      parameters:
        - name: tenantExternalId
          in: path
          required: true
          schema:
            type: string
        - name: bulkOperationRunId
          in: path
          required: true
          schema:
            type: integer
      responses:
        '200':
          description: Page renders successfully.
        '302':
          description: Redirect to login when unauthenticated.
        '403':
          description: Forbidden when attempting cross-tenant access.

components: {}
18
specs/054-unify-runs-suitewide/contracts/routes.md
Normal file
@ -0,0 +1,18 @@
# Routes & URLs

## Monitoring UI

### List Operations
- **Route**: `tenant.monitoring.operations.index`
- **URL**: `/tenants/{tenant}/monitoring/operations`
- **Controller**: Livewire Component (`App\Livewire\Monitoring\OperationsList`)

### View Operation
- **Route**: `tenant.monitoring.operations.show`
- **URL**: `/tenants/{tenant}/monitoring/operations/{run}`
- **Controller**: Livewire Component (`App\Livewire\Monitoring\OperationsDetail`)

## Deep Links
- **Drift**: `/tenants/{tenant}/drift/history/{id}`
- **Inventory**: `/tenants/{tenant}/inventory` (General, or specific timestamp if supported)
- **Restore**: `/tenants/{tenant}/restore/{id}`
||||
@ -0,0 +1,48 @@
# Service Interface: Operation Runs

## `App\Services\OperationRunService`

### `ensureRun`
Idempotently creates or retrieves an active run.

```php
public function ensureRun(
    Tenant $tenant,
    string $type,
    array $inputs,
    ?User $initiator = null
): OperationRun
```

- **Logic**:
  1. Compute `hash = sha256(tenant_id + type + sorted_json(inputs))`.
  2. Try finding active run (`queued` or `running`) with this hash.
  3. If found, return it.
  4. If not found, create new `queued` run.
  5. Return run.
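
The hashing step in the logic above can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not the service's actual code: the function names `canonicalize` and `runIdentityHash` and the `|` delimiter between the parts are hypothetical; the contract only states `sha256(tenant_id + type + sorted_json(inputs))`.

```php
<?php

/**
 * Recursively sort associative keys so that logically equal inputs
 * serialize to the same JSON string. The order of list items is preserved.
 */
function canonicalize(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }

    // array_map with a single array keeps the original keys.
    $value = array_map('canonicalize', $value);

    if (!array_is_list($value)) {
        ksort($value);
    }

    return $value;
}

/**
 * Hypothetical helper: the "|" delimiter is an assumption made so that
 * the three parts cannot run into each other.
 */
function runIdentityHash(int $tenantId, string $type, array $inputs): string
{
    $json = json_encode(canonicalize($inputs), JSON_THROW_ON_ERROR);

    return hash('sha256', $tenantId . '|' . $type . '|' . $json);
}
```

Sorting only associative keys means `['scope' => 'policy', 'policy_id' => 1]` and `['policy_id' => 1, 'scope' => 'policy']` hash identically, while the order of a selection list still matters. Whether list order should be normalized as well is a per-run-type decision this sketch does not make.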

### `updateRun`
Updates the status/outcome of a run.

```php
public function updateRun(
    OperationRun $run,
    string $status,
    ?string $outcome = null,
    array $summaryCounts = [],
    array $failures = []
): OperationRun
```

### `failRun`
Helper to fail a run immediately.

```php
public function failRun(OperationRun $run, Throwable $e): OperationRun
```
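
Since `failure_summary` must not carry secrets, `failRun` presumably has to reduce the exception to a safe entry before persisting it. The following is one possible sketch of that reduction, not the actual implementation: the function name `sanitizeFailure`, the redaction patterns, and the 200-character cap are all assumptions, and `mb_substr` assumes the mbstring extension is loaded.

```php
<?php

/**
 * Build one failure_summary entry ({ code, message, count }) from an exception.
 * Hypothetical helper: the redaction rules and length cap are illustrative only.
 */
function sanitizeFailure(Throwable $e): array
{
    $message = $e->getMessage();

    // Redact bearer tokens and key=value style secrets.
    $message = preg_replace('/Bearer\s+[A-Za-z0-9._~+\/=-]+/i', 'Bearer [redacted]', $message);
    $message = preg_replace('/\b(token|secret|password|client_secret)=\S+/i', '$1=[redacted]', $message);

    // Keep messages short so raw payload dumps cannot leak through.
    $message = mb_substr($message, 0, 200);

    // Use the short class name as a stable reason code.
    $parts = explode('\\', get_class($e));

    return [
        'code' => end($parts),
        'message' => $message,
        'count' => 1,
    ];
}
```

A pattern-based redactor like this is only a safety net; mapping known exception types to fixed reason codes and fixed messages would be the stricter option.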

## `App\Jobs\Middleware\TrackOperationRun`
Middleware for Jobs that automatically handles the `running` -> `completed` transition (with a `succeeded` or `failed` outcome) when the job is bound to a run.
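
A rough sketch of what such a middleware could look like follows. It uses the `handle($job, $next)` shape that Laravel job middleware expects, but runs without the framework: `FakeOperationRun` is a stand-in for the real model, the class name `TrackOperationRunSketch` is hypothetical, and the convention that a job exposes its run as a public `$run` property is an assumption. Status and outcome follow the data model, where `failed` is an outcome of a `completed` run.

```php
<?php

/** Stand-in for the OperationRun model; only the fields this sketch needs. */
final class FakeOperationRun
{
    public string $status = 'queued';
    public string $outcome = 'pending';
}

/** Sketch only: a real middleware would persist changes through the service. */
final class TrackOperationRunSketch
{
    public function handle(object $job, callable $next): void
    {
        $run = $job->run ?? null;

        // Jobs that are not bound to a run pass through untouched.
        if ($run === null) {
            $next($job);

            return;
        }

        $run->status = 'running';

        try {
            $next($job);
            $run->status = 'completed';

            // Leave the outcome alone if the job already set a more specific one.
            if ($run->outcome === 'pending') {
                $run->outcome = 'succeeded';
            }
        } catch (Throwable $e) {
            $run->status = 'completed';
            $run->outcome = 'failed';

            // Rethrow so the queue worker can apply its own failure handling.
            throw $e;
        }
    }
}
```

Rethrowing keeps the queue's retry and failed-job behaviour intact; whether a retried job should reopen the same run is left open here.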

## `App\Listeners\SyncRestoreRunToOperation`
Listener for `RestoreRun` events to update the shadow `OperationRun`.
|
||||
52
specs/054-unify-runs-suitewide/data-model.md
Normal file
@ -0,0 +1,52 @@
# Data Model: Unified Operations Runs

## Entities

### `OperationRun`
Canonical record for all long-running tenant operations.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | BigInt | Yes | Primary Key |
| `tenant_id` | BigInt | Yes | FK to Tenants |
| `user_id` | BigInt | No | FK to Users (Initiator). Null for system/scheduler. |
| `initiator_name` | String | Yes | Snapshot of user name or "System". |
| `type` | String | Yes | Stable taxonomy, e.g. `inventory.sync`. |
| `status` | String | Yes | Lifecycle state: `queued`, `running`, `completed`. |
| `outcome` | String | Yes | Result bucket: `pending`, `succeeded`, `partially_succeeded`, `failed`, `cancelled`. |
| `run_identity_hash` | String | Yes | Deterministic hash for idempotency. |
| `summary_counts` | JSONB | No | `{ "total": 10, "success": 8, "failed": 2, "skipped": 0 }` |
| `failure_summary` | JSONB | No | List of sanitized errors: `[{ "code": "GraphError", "message": "Throttled", "count": 1 }]` |
| `context` | JSONB | No | Run-specific metadata, e.g. `{ "restore_run_id": 123, "selection": [...] }` |
| `started_at` | Timestamp | No | When execution began. |
| `completed_at` | Timestamp | No | When execution finished. |
| `created_at` | Timestamp | Yes | |
| `updated_at` | Timestamp | Yes | |

**Indexes**:
- `(tenant_id, run_identity_hash)` UNIQUE WHERE status IN ('queued', 'running')
- `(tenant_id, type, created_at)` for filtering/sorting
- `(tenant_id, created_at)` for default sort

### `RestoreRun` (Existing)
Remains the domain source of truth for Restore.
- Linked via `OperationRun.context['restore_run_id']`.
- `OperationRun` mirrors `RestoreRun` status/outcome.

## Enums

### `OperationRunStatus`
- `queued`
- `running`
- `completed`

### `OperationRunOutcome`
- `pending` (default when running/queued)
- `succeeded`
- `partially_succeeded`
- `failed`
- `cancelled`
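
The two enums above map naturally onto PHP backed enums. The sketch below is one way they might be declared; the case names and the helper methods `isActive()` and `isTerminal()` are assumptions added for illustration, only the string values come from the lists above.

```php
<?php

enum OperationRunStatus: string
{
    case Queued = 'queued';
    case Running = 'running';
    case Completed = 'completed';

    /** Active runs are the ones covered by the partial unique index. */
    public function isActive(): bool
    {
        return $this !== self::Completed;
    }
}

enum OperationRunOutcome: string
{
    case Pending = 'pending';
    case Succeeded = 'succeeded';
    case PartiallySucceeded = 'partially_succeeded';
    case Failed = 'failed';
    case Cancelled = 'cancelled';

    /** Every outcome except the default `pending` is final. */
    public function isTerminal(): bool
    {
        return $this !== self::Pending;
    }
}
```

Backed enums keep the stored strings stable while giving call sites `from()` and `tryFrom()` for validating values read from the database.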

## Relationships
- `OperationRun` belongs to `Tenant`.
- `OperationRun` belongs to `User` (optional).
|
||||
76
specs/054-unify-runs-suitewide/plan.md
Normal file
@ -0,0 +1,76 @@
# Implementation Plan: Unified Operations Runs Suitewide

**Branch**: `feat/054-unify-operations-runs-suitewide` | **Date**: 2026-01-16 | **Spec**: [Spec Link](spec.md)
**Input**: Feature specification from `specs/054-unify-runs-suitewide/spec.md`

## Summary

This feature unifies long-running tenant operations (e.g., Inventory Sync, Drift Generation) into a single canonical `operation_runs` table. This enables a consistent "Monitoring -> Operations" view for all tenant activities. Legacy run tables will be maintained in parallel for now (Parallel Write Transition). `RestoreRun` remains a domain-specific record but will be mirrored into `operation_runs` via an adapter pattern.

## Technical Context

**Language/Version**: PHP 8.4
**Primary Dependencies**: Filament v4, Laravel v12, Livewire v3
**Storage**: PostgreSQL (`operation_runs` table + JSONB)
**Testing**: Pest v4 (Feature tests for Service, Livewire tests for UI)
**Target Platform**: Linux server (Docker/Dokploy)
**Project Type**: Web Application (Laravel Monolith)
**Performance Goals**: Start operation < 2s. List runs < 200ms.
**Constraints**: Tenant isolation is paramount. No cross-tenant data leakage.
**Scale/Scope**: ~50-100 runs/day per tenant. Retention 90 days.

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- [x] Inventory-first: N/A (this is about tracking operations, not inventory state itself)
- [x] Read/write separation: Monitoring is read-only. Starts are explicit writes.
- [x] Graph contract path: N/A (this feature tracks runs, doesn't call Graph directly)
- [x] Deterministic capabilities: N/A
- [x] Tenant isolation: `operation_runs` has `tenant_id`. Policies ensure scope.
- [x] Automation: Idempotency enforced via DB index.
- [x] Data minimization: No secrets in `failure_summary`.

## Project Structure

### Documentation (this feature)

```text
specs/054-unify-runs-suitewide/
├── plan.md              # This file
├── research.md          # Research findings
├── data-model.md        # Database schema
├── quickstart.md        # Dev guide
├── contracts/           # Service interfaces & routes
└── tasks.md             # Task breakdown
```

### Source Code (repository root)

```text
app/
├── Models/
│   └── OperationRun.php
├── Services/
│   └── OperationRunService.php
├── Livewire/
│   └── Monitoring/
│       ├── OperationsList.php
│       └── OperationsDetail.php
├── Jobs/
│   └── Middleware/
│       └── TrackOperationRun.php
└── Listeners/
    └── SyncRestoreRunToOperation.php

database/migrations/
└── YYYY_MM_DD_create_operation_runs_table.php
```

**Structure Decision**: Standard Laravel Service/Model/Livewire pattern.

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| None | | |
|
||||
49
specs/054-unify-runs-suitewide/quickstart.md
Normal file
@ -0,0 +1,49 @@
# Quickstart: Adding a New Operation

## 1. Register Run Type
Add your new type constant to `App\Enums\OperationRunType` (if using Enums) or just use the string convention `resource.action`.

## 2. Implement Idempotency Inputs
Define what makes a run "unique" for your feature.
- Example: `['scope' => 'full']` vs `['scope' => 'policy', 'policy_id' => 1]`.

## 3. Use `OperationRunService`
In your Start Action (Controller/Livewire):

```php
// 1. Ensure Run (returns the existing active run if one matches)
$run = $service->ensureRun($tenant, 'my_resource.action', $inputs, auth()->user());

// 2. Dispatch Job (only if the run was newly created)
if ($run->wasRecentlyCreated) {
    MyJob::dispatch($run, $inputs);
}

// 3. Return View Link
return redirect()->route('tenant.monitoring.operations.show', [$tenant, $run]);
```

## 4. Instrument Job
In your Job, update the run through `OperationRunService` so the calls match the service contract (`updateRun`, `failRun`):

```php
public function handle(OperationRunService $service): void
{
    // Update to Running
    $service->updateRun($this->run, status: 'running');

    try {
        // ... do work ...

        // Success
        $service->updateRun(
            $this->run,
            status: 'completed',
            outcome: 'succeeded',
            summaryCounts: ['total' => 100, 'success' => 100]
        );
    } catch (\Throwable $e) {
        // Failure: records a sanitized failure and marks the run as failed
        $service->failRun($this->run, $e);
    }
}
```
|
||||
65
specs/054-unify-runs-suitewide/research.md
Normal file
@ -0,0 +1,65 @@
# Research: Unified Operations Runs Suitewide

## 1. Technical Context & Unknowns

**Unknowns Resolved**:
- **Transition Strategy**: Parallel write. We will maintain existing legacy tables (e.g., `inventory_sync_runs`, `restore_runs`) for now but strictly use `operation_runs` for the Monitoring UI.
- **Restore Adapter**: `RestoreRun` remains the domain source of truth. An `OperationRun` record will be created as a "shadow" or "adapter" record. This requires hooking into `RestoreRun` lifecycle events or the service layer to keep them in sync.
- **Run Logic Location**: Existing jobs like `RunInventorySyncJob` will be updated to manage the `OperationRun` state.
- **Concurrency**: Enforced by partial unique index on `(tenant_id, run_identity_hash)` where status is active (`queued`, `running`).

## 2. Technology Choices

| Area | Decision | Rationale | Alternatives |
|------|----------|-----------|--------------|
| **Schema** | `operation_runs` table | Centralized table allows simple, performant Monitoring queries without complex UNIONs across disparate legacy tables. | Virtual UNION view (Complex, harder to paginate/sort efficiently). |
| **Restore Integration** | Physical Adapter Row | Decouples Monitoring from Restore domain specifics. Allows uniform "list all runs" queries. The `context` JSON column will store `{ "restore_run_id": ... }`. | Polymorphic relation (Overhead for a single exception). |
| **Idempotency** | DB Partial Unique Index | Hard guarantee against race conditions. Simpler than distributed locks (Redis) which can expire or fail. | Redis Lock (Soft guarantee), Application check (Race prone). |
| **Initiator** | Nullable FK + Name | Handles both Users (FK) and System/Scheduler (Name "System") uniformly. | Polymorphic relation (Overkill for simple auditing). |

## 3. Implementation Patterns

### Canonical Run Lifecycle
1. **Start Request**:
   - Compute `run_identity_hash` from inputs.
   - Attempt `INSERT` into `operation_runs` (ignore conflict if active).
   - If active run exists, return it (Idempotency).
   - If new, dispatch Job.
2. **Job Execution**:
   - Update status to `running`.
   - Perform work.
   - Update status to `completed` with outcome `succeeded`/`partially_succeeded`/`failed`.
3. **Restore Adapter**:
   - When `RestoreRun` is created, create `OperationRun` (queued/running).
   - When `RestoreRun` updates (status change), update `OperationRun`.

### Data Model
```sql
CREATE TABLE operation_runs (
    id BIGSERIAL PRIMARY KEY,
    tenant_id BIGINT NOT NULL REFERENCES tenants(id),
    user_id BIGINT NULL REFERENCES users(id), -- Initiator
    initiator_name VARCHAR(255) NOT NULL, -- "John Doe" or "System"
    type VARCHAR(255) NOT NULL, -- "inventory.sync"
    status VARCHAR(50) NOT NULL, -- queued, running, completed
    outcome VARCHAR(50) NOT NULL, -- pending, succeeded, partially_succeeded, failed, cancelled
    run_identity_hash VARCHAR(64) NOT NULL, -- SHA256(tenant_id + type + sorted_json(inputs))
    summary_counts JSONB DEFAULT '{}', -- { success: 10, failed: 2 }
    failure_summary JSONB DEFAULT '[]', -- [{ code: "ERR_TIMEOUT", message: "..." }]
    context JSONB DEFAULT '{}', -- { selection: [...], restore_run_id: 123 }
    started_at TIMESTAMP NULL,
    completed_at TIMESTAMP NULL,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE UNIQUE INDEX operation_runs_active_unique
    ON operation_runs (tenant_id, run_identity_hash)
    WHERE status IN ('queued', 'running');
```
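
To see how the partial unique index and the "insert, ignore conflict, then read back" step interact, here is a small runnable sketch. It makes several assumptions: an in-memory SQLite database (via the `pdo_sqlite` extension) stands in for PostgreSQL, the schema is reduced to the columns that matter for dedupe, and the function name `ensureRunRow` is hypothetical. `INSERT ... ON CONFLICT DO NOTHING` without a conflict target is accepted by both SQLite and PostgreSQL.

```php
<?php

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Reduced schema: only the columns needed to show the dedupe behaviour.
$pdo->exec("
    CREATE TABLE operation_runs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        tenant_id INTEGER NOT NULL,
        type TEXT NOT NULL,
        status TEXT NOT NULL,
        outcome TEXT NOT NULL,
        run_identity_hash TEXT NOT NULL
    )
");
$pdo->exec("
    CREATE UNIQUE INDEX operation_runs_active_unique
    ON operation_runs (tenant_id, run_identity_hash)
    WHERE status IN ('queued', 'running')
");

/** Returns [runId, wasCreated]. Hypothetical helper for illustration. */
function ensureRunRow(PDO $pdo, int $tenantId, string $type, string $hash): array
{
    // The index rejects a second active row; DO NOTHING turns that into a no-op.
    $insert = $pdo->prepare("
        INSERT INTO operation_runs (tenant_id, type, status, outcome, run_identity_hash)
        VALUES (?, ?, 'queued', 'pending', ?)
        ON CONFLICT DO NOTHING
    ");
    $insert->execute([$tenantId, $type, $hash]);
    $created = $insert->rowCount() === 1;

    // Read back whichever active row now exists for this identity.
    $select = $pdo->prepare("
        SELECT id FROM operation_runs
        WHERE tenant_id = ? AND run_identity_hash = ? AND status IN ('queued', 'running')
        ORDER BY id DESC LIMIT 1
    ");
    $select->execute([$tenantId, $hash]);

    return [(int) $select->fetchColumn(), $created];
}
```

Because the index only covers active rows, a completed run does not block a new start with the same identity, so history accumulates while at most one active run exists per hash.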

## 4. Risks & Mitigations
- **Risk**: Desync between `RestoreRun` and `OperationRun`.
  - **Mitigation**: Use model observers or service-layer wrapping to ensure atomic-like updates, or accept slight eventual consistency (Monitoring might lag a few milliseconds behind the Restore UI).
- **Risk**: Legacy runs not appearing.
  - **Mitigation**: We are NOT backfilling legacy runs. Only new runs after deployment will appear in the new Monitoring UI. This is acceptable for "Phase 1".
|
||||
148
specs/054-unify-runs-suitewide/spec.md
Normal file
@ -0,0 +1,148 @@
# Feature Specification: Unified Operations Runs Suitewide (Except Restore Domain Model) (054)

**Feature Branch**: `feat/054-unify-operations-runs-suitewide`
**Created**: 2026-01-16
**Status**: Draft
**Input**: User description: "Eliminate run sprawl by adopting one canonical tenant-scoped operation run record for long-running actions across the product, surfaced consistently in Monitoring → Operations, while keeping restore as a separate domain workflow that is still visible via an adapter entry."

## Clarifications

### Session 2026-01-16

- Q: Which default retention should 054 set for canonical Operation Runs? → A: 90 days
- Q: Transition strategy in 054: do we write canonical runs in parallel with the legacy run tables, or replace them immediately? → A: Parallel write (canonical + legacy)
- Q: For `restore.execute`, the spec mentions it acts as an "adapter entry" linking to the restore domain record. How should this be implemented? → A: Physical Row (Create a physical row in `operation_runs` that points to the restore record).
- Q: How should concurrency and deduplication (FR-009) be enforced at the database level? → A: Partial Unique Index (unique constraint on `tenant_id, run_identity_hash` where status is `queued` or `running`).
- Q: How should the `initiator` be modeled to support both users and system processes (FR-001)? → A: Nullable FK + Name Snapshot (`user_id` nullable FK + required `initiator_name` string).

## User Scenarios & Testing *(mandatory)*

### User Story 1 - See Every Supported Operation in Monitoring (Priority: P1)

As an operator, I want Monitoring → Operations to show all supported long-running operations for my tenant in one consistent list and detail view, so I can quickly answer what ran, who started it, whether it succeeded/partially succeeded/failed, and where to look next.

**Why this priority**: This is the core value: a single, tenant-scoped source of truth for operational visibility.

**Independent Test**: Trigger at least one run of each Phase 1 run producer, then verify each appears in Monitoring with consistent status/outcome semantics, safe failure summaries, and context links.

**Acceptance Scenarios**:

1. **Given** I am signed into tenant A, **When** I open Monitoring → Operations, **Then** I see only tenant A runs and can filter by run type, outcome bucket, time range, and initiator.
2. **Given** multiple run types exist, **When** I filter to `inventory.sync`, **Then** only inventory sync runs are shown.
3. **Given** a run exists, **When** I open its detail view, **Then** I can see initiator, run type, outcome bucket, timestamps, summary counts (if applicable), sanitized failures (if any), and links to relevant feature context/results.
4. **Given** restore execution exists, **When** I open Monitoring → Operations, **Then** I can see a `restore.execute` entry that links to the existing restore record (restore history remains owned by the restore domain record).
5. **Given** I am a `Readonly` user in tenant A, **When** I view Monitoring → Operations, **Then** I can view runs and details but I do not see any start/rerun/cancel/delete controls.
6. **Given** I attempt to access a run from another tenant (direct link or list), **When** I request it, **Then** access is denied and no run details are disclosed.

---

### User Story 2 - Start Operations Without Blocking (Priority: P2)

As an operator, when I start a supported operation, I want immediate confirmation and a “View run” link so I can continue working while the operation runs in the background.

**Why this priority**: Removes long-running requests/timeouts and standardizes how operations are started and observed.

**Independent Test**: Start each Phase 1 operation from its owning UI and confirm the start returns quickly, includes “View run”, and the run progresses through queued/running into a terminal outcome.

**Acceptance Scenarios**:

1. **Given** I have permission to start a Phase 1 operation in tenant A, **When** I start it, **Then** I receive immediate confirmation with a “View run” link and the run is visible as queued or running.
2. **Given** I am a `Readonly` user in tenant A, **When** I attempt to start any Phase 1 operation, **Then** the system denies the request and does not create a new run.
3. **Given** the run reaches a terminal outcome, **When** that occurs, **Then** the initiating user receives an in-app notification including a short summary and a “View run” link.
4. **Given** background processing is unavailable, **When** I attempt to start an operation, **Then** I receive a clear message and the system MUST NOT claim it was queued.

---

### User Story 3 - Duplicate Starts Reuse the Same Active Run (Priority: P3)

As an operator, I want accidental double-starts (double clicks, two admins, retries) to reuse the same active run so duplicate background work is avoided and results remain auditable.

**Why this priority**: Reduces load, prevents confusing duplicate outcomes, and makes operations safer under concurrency.

**Independent Test**: Start the same operation twice with identical effective inputs while the first is queued/running and verify the system reuses the active run.

**Acceptance Scenarios**:

1. **Given** an identical run is queued/running for a tenant, **When** another start request is made with the same effective inputs, **Then** the system reuses the existing run and does not start a second one.
2. **Given** two starts happen at nearly the same time, **When** the system resolves the race, **Then** at most one active run exists for that identity and both users are directed to it.

### Edge Cases

- Background execution unavailable: start fails fast with a clear message; the system MUST NOT create misleading “queued” runs.
- Partial processing: at least one success and at least one failure yields “partially succeeded”, with per-item failures when applicable.
- Large run history: Monitoring remains usable with filters and defaults (recent runs, last 30 days).
- Permissions revoked mid-run: the run continues; visibility is evaluated at time of access.

## Requirements *(mandatory)*

**Constitution alignment (required):** If this feature introduces any external tenant API calls or any write/change behavior,
the spec MUST describe contract registry updates, safety gates (preview/confirmation/audit), tenant isolation, and tests.

### Scope & Assumptions

**Phase 1 adoption set (must be implemented):**

- `inventory.sync` (Inventory “Sync now”)
- `policy.sync` (Policies “Sync now”)
- `directory_groups.sync` (Directory → Groups “Sync groups”)
- `drift.generate` (Drift “Generate drift now” / auto-on-open when eligible)
- `backup_set.add_policies` (Backup Sets “Add selected” / “Add policies”)

**Restore visibility (adapter only):**

- `restore.execute` appears as a canonical run entry that links to an existing restore domain record.
- Restore execution history remains owned by the restore domain record (not replaced in Phase 1).

**Out of scope for 054 (explicit):**

- Cross-tenant compare/promotion
- UI redesign/styling polish (separate UI polish work)
- Cancel/rerun/delete controls inside Monitoring hub (hub stays view-only)
- Replacing restore domain records with canonical runs
- A full settings UI for retention/notifications/etc.

**Assumptions (defaults to remove ambiguity in Phase 1):**

- Canonical run history retention defaults to 90 days, with no user-facing retention configuration in 054.
- System-initiated runs (if any) do not notify users by default in Phase 1.
- Transition strategy: write canonical runs in parallel with any existing legacy per-module run tables (where they exist); Monitoring uses canonical runs as the source of truth immediately.

### Functional Requirements

- **FR-001 Canonical Operation Run**: System MUST represent each supported operation execution as a canonical, tenant-scoped operation run record that captures initiator (nullable `user_id` FK + `initiator_name` string), run type, lifecycle status/timestamps, outcome bucket, summary counts (when applicable), safe failure summaries, an idempotency identity for dedupe, and a safe context payload referencing “what this run was about”.
- **FR-002 Run taxonomy**: Run type MUST be stable and follow `"<resource>.<action>"`.
- **FR-003 Phase 1 run types**: Phase 1 run types MUST include `inventory.sync`, `policy.sync`, `directory_groups.sync`, `drift.generate`, `backup_set.add_policies`, plus `restore.execute` implemented as a physical `operation_runs` record (adapter) pointing to the domain entity.
- **FR-004 Monitoring lists all canonical runs**: Monitoring → Operations MUST list canonical runs for the active tenant with filters for run type, outcome bucket, time range, and initiator; default sort is most recent first; default time window is last 30 days.
- **FR-005 Run detail**: Run detail MUST show initiator, run type, outcome bucket, timestamps (created/started/finished), summary counts (when applicable), sanitized failures (including per-item failures when applicable), and contextual links to owning feature surfaces/results.
- **FR-006 View-only hub**: Monitoring hub MUST be view-only (no start/rerun/cancel/delete controls) and MUST link back to owning feature surfaces.
- **FR-007 Start surfaces always enqueue**: Every Phase 1 start surface MUST authorize start, create/reuse a canonical run (dedupe), dispatch background execution, and return immediately with confirmation + “View run”.
- **FR-008 No remote work in interactive request**: Start surfaces MUST NOT perform remote work inline; long-running work happens in background execution.
- **FR-009 Deterministic idempotency**: For each run type, the system MUST define a deterministic identity for “identical run” based on tenant + effective inputs; initiator MUST NOT be part of identity. **Enforcement**: Uniqueness MUST be enforced via a partial unique index on `(tenant_id, run_identity_hash)` where status is `queued` or `running`.
- **FR-010 Phase 1 identity rules**: Identity rules MUST be defined at least as follows:
  - `inventory.sync`: tenant + selection scope
  - `policy.sync`: tenant + effective policy scope
  - `directory_groups.sync`: tenant + selection (Phase 1 default: “all groups”)
  - `backup_set.add_policies`: tenant + backup set + selected policies + option flags (if exposed)
  - `drift.generate`: tenant + scope key + baseline/current comparison inputs
- **FR-011 Outcome buckets**: Monitoring MUST present consistent outcome buckets: `queued`, `running`, `succeeded`, `partially succeeded`, `failed`.
- **FR-012 Partial vs failed**: “Partially succeeded” means at least one success and at least one failure; “Failed” means zero successes or cannot proceed.
- **FR-013 Failure details are safe + useful**: Failures MUST be persisted and displayed as stable reason codes and short sanitized messages; failures MUST NOT include secrets/tokens/credentials/PII or full external payload dumps.
- **FR-014 Related links**: Run detail MUST include contextual links where applicable (e.g., drift findings, backup set, inventory results, directory groups, restore detail for `restore.execute`).
- **FR-015 Notifications**: System MUST emit in-app notifications for “queued” (after start) and terminal outcomes for Phase 1 runs; notifications MUST include a short summary and a “View run” link; recipients are the initiating user only.
- **FR-016 Tenant isolation**: All run list/detail access MUST be tenant-scoped; cross-tenant access MUST be denied without disclosing run details.
- **FR-017 No render-time remote calls**: Monitoring pages MUST be render-safe and MUST NOT depend on external service calls during render.
- **FR-018 Roles & permissions**: Roles `Owner`, `Manager`, `Operator`, and `Readonly` MUST be able to view runs; only `Owner`, `Manager`, `Operator` may start operations; `Readonly` is strictly view-only.
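
The rule in FR-012 can be written down as a small pure function. This is a sketch under a literal reading of the requirement, with a hypothetical function name `deriveOutcome`; it returns the stored outcome strings from the data model. Note that a run with zero successes and zero failures lands in `failed` here because FR-012 says "zero successes"; whether an empty run should instead count as succeeded is a decision the spec leaves open.

```php
<?php

/**
 * Derive the outcome bucket from per-item counts, following FR-012:
 * at least one success and at least one failure => partially_succeeded,
 * zero successes => failed.
 */
function deriveOutcome(int $success, int $failed): string
{
    if ($success > 0 && $failed === 0) {
        return 'succeeded';
    }

    if ($success > 0 && $failed > 0) {
        return 'partially_succeeded';
    }

    // Zero successes, including the empty case (literal reading of FR-012).
    return 'failed';
}
```

Keeping this logic in one function means every run producer maps counts to buckets the same way, which is what the "consistent outcome buckets" requirement asks for.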

### Key Entities *(include if feature involves data)*

- **Canonical Operation Run**: A tenant-scoped record representing the lifecycle of a long-running operation, including run type, initiator (nullable `user_id` FK + `initiator_name` string), lifecycle state/timestamps, outcome bucket, summary counts, safe failure summaries, idempotency identity (uniqueness enforced by DB index on active runs), and safe context references.
- **Restore domain record (exception)**: Restore remains a domain workflow record with richer semantics and history. Monitoring shows restore activity through a physical `operation_runs` row (adapter) that links back to the restore record, without replacing it.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Operators can answer “what ran, when, and did it succeed?” for any Phase 1 run in under 1 minute using Monitoring → Operations.
- **SC-002**: Starting a Phase 1 operation returns confirmation + “View run” link within 2 seconds under normal conditions.
- **SC-003**: Duplicate starts reuse the same active run in at least 99% of attempts under normal conditions.
- **SC-004**: No secrets/tokens/credentials/PII appear in persisted failures or notifications (verified by tests).
|
||||
64
specs/054-unify-runs-suitewide/tasks.md
Normal file
@ -0,0 +1,64 @@
|
||||
# Tasks: Unified Operations Runs Suitewide
|
||||
|
||||
**Feature**: `054-unify-runs-suitewide`
|
||||
**Spec**: `specs/054-unify-runs-suitewide/spec.md`
|
||||
|
||||
## Phase 1: Foundation (DB & Service)
|
||||
|
||||
- [ ] **Migration**: Create `operation_runs` table with a partial unique index on `(tenant_id, run_identity_hash)` where `status` is `queued` or `running`.
|
||||
- [ ] **Model**: Create `OperationRun` model with casts (JSONB for summaries/context), relationship to `Tenant` and `User`.
|
||||
- [ ] **Service**: Implement `OperationRunService::ensureRun()` (idempotent creation) and `updateRun()` methods.
|
||||
- [ ] **Test**: Feature test for `ensureRun` verifying idempotency (same hash = same run) and concurrency safety (simulated).
|
||||
- [ ] **Test**: Feature test for `updateRun` verifying status transitions and history logging (if any).
|
||||
- [ ] **Job Middleware**: Create `TrackOperationRun` middleware to automatically handle job success/failure updates for jobs using this system.
|
||||
- [ ] **Retention**: Create a daily scheduled job to prune `operation_runs` older than 90 days.
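
To illustrate how the partial unique index and an idempotent `ensureRun()` could work together, here is a minimal sketch. It uses Python with an in-memory SQLite database as a stand-in for PostgreSQL, since both support partial unique indexes. The `type` column, the index name, the terminal status value used in the usage note, and the retry count are assumptions, not part of the planned schema.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the operation_runs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE operation_runs (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER NOT NULL,
            type TEXT NOT NULL,
            run_identity_hash TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'queued'
        )
        """
    )
    # Uniqueness applies only to active rows, so finished runs never block new ones.
    conn.execute(
        """
        CREATE UNIQUE INDEX operation_runs_active_identity
        ON operation_runs (tenant_id, run_identity_hash)
        WHERE status IN ('queued', 'running')
        """
    )
    return conn


def ensure_run(conn: sqlite3.Connection, tenant_id: int, run_type: str, identity_hash: str) -> int:
    """Return the id of the active run for this identity, creating it if needed."""
    for _ in range(3):
        try:
            with conn:  # commits on success, rolls back on error
                cur = conn.execute(
                    "INSERT INTO operation_runs (tenant_id, type, run_identity_hash) "
                    "VALUES (?, ?, ?)",
                    (tenant_id, run_type, identity_hash),
                )
                return cur.lastrowid
        except sqlite3.IntegrityError:
            # The index rejected a duplicate active run: reuse the existing one.
            row = conn.execute(
                "SELECT id FROM operation_runs "
                "WHERE tenant_id = ? AND run_identity_hash = ? "
                "AND status IN ('queued', 'running')",
                (tenant_id, identity_hash),
            ).fetchone()
            if row is not None:
                return row[0]
            # The other run finished in between; loop and try inserting again.
    raise RuntimeError("could not create or find an active run")
```

Calling `ensure_run` twice with the same tenant and hash returns the same id while the run is active. Once the row's status moves to a terminal value, the same identity can start a fresh run. Letting the database reject the duplicate, instead of checking first and then inserting, is what keeps concurrent starts safe.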
|
||||
|
||||
## Phase 2: Monitoring UI (Read-Only)
|
||||
|
||||
- [ ] **Page**: Create Filament Page `Monitoring/Operations` (List) strictly scoped to current tenant.
|
||||
- [ ] **Table**: Implement `OperationRun` table with columns: Status (Badge), Operation Type, Initiator, Started At, Duration, Outcome.
|
||||
- [ ] **Filters**: Add table filters for `Type`, `Outcome`, `Date Range`, `Initiator`.
|
||||
- [ ] **Detail View**: Create "View Run" modal or separate page showing:
|
||||
- Summary counts (Success/Fail/Total)
|
||||
- Failure list (Sanitized codes/messages)
|
||||
- Context JSON (Debug info)
|
||||
- Timeline (Created/Started/Finished)
|
||||
- [ ] **Test**: Livewire test verifying `Readonly` users can see table but no actions.
|
||||
- [ ] **Test**: Verify cross-tenant access is blocked.
|
||||
|
||||
## Phase 3: Producer Migration (Parallel Write)
|
||||
|
||||
### Inventory Sync (`inventory.sync`)
|
||||
- [ ] **Refactor**: Update `RunInventorySyncJob` dispatch logic to call `OperationRunService::ensureRun()` first.
|
||||
- [ ] **Refactor**: Update Job to use `TrackOperationRun` middleware (or manual updates) to sync status to `operation_runs`.
|
||||
- [ ] **Verify**: Ensure legacy `inventory_sync_runs` is still written to (if legacy UI depends on it) OR confirm legacy UI is replaced. *Decision: Parallel write as per spec.*
|
||||
|
||||
### Policy Sync (`policy.sync`)
|
||||
- [ ] **Refactor**: Update Policy Sync start logic to use `OperationRunService`.
|
||||
- [ ] **Refactor**: Instrument Policy Sync job to update `operation_runs`.
|
||||
|
||||
### Directory Groups Sync (`directory_groups.sync`)
|
||||
- [ ] **Refactor**: Update Group Sync start logic to use `OperationRunService`.
|
||||
- [ ] **Refactor**: Instrument Group Sync job to update `operation_runs`.
|
||||
|
||||
### Drift Generation (`drift.generate`)
|
||||
- [ ] **Refactor**: Update Drift Generation start logic to use `OperationRunService`.
|
||||
- [ ] **Refactor**: Instrument Drift job to update `operation_runs`.
|
||||
|
||||
### Backup Set (`backup_set.add_policies`)
|
||||
- [ ] **Refactor**: Update "Add Policies" action to use `OperationRunService`.
|
||||
|
||||
## Phase 4: Restore Adapter
|
||||
|
||||
- [ ] **Listener**: Create `SyncRestoreRunToOperation` listener observing `RestoreRun` events (`created`, `updated`).
|
||||
- [ ] **Logic**: Map `RestoreRun` status/outcomes to `OperationRun` schema.
|
||||
- `RestoreRun` created -> `OperationRun` created (queued/running).
|
||||
- `RestoreRun` updated -> `OperationRun` updated.
|
||||
- [ ] **Context**: Store `{"restore_run_id": <id>}` in `OperationRun.context`.
|
||||
- [ ] **Test**: Verify creating a `RestoreRun` automatically spawns a shadow `OperationRun`.
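
The adapter mapping above could look roughly like the following sketch. It is written in Python with in-memory state rather than as a Laravel listener. The `RestoreRun` status names, the `completed` status, the outcome strings, and the class name `RestoreAdapter` are all hypothetical, since this task list does not enumerate them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical RestoreRun status -> (OperationRun status, outcome bucket).
RESTORE_STATUS_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "pending": ("queued", None),
    "running": ("running", None),
    "completed": ("completed", "succeeded"),
    "partial": ("completed", "partially_succeeded"),
    "failed": ("completed", "failed"),
}


@dataclass
class OperationRun:
    tenant_id: int
    type: str
    status: str
    outcome: Optional[str] = None
    context: dict = field(default_factory=dict)


class RestoreAdapter:
    """Keeps one shadow OperationRun per RestoreRun, keyed by restore_run_id."""

    def __init__(self) -> None:
        self._runs: Dict[int, OperationRun] = {}

    def sync(self, tenant_id: int, restore_run_id: int, restore_status: str) -> OperationRun:
        # An unmapped status raises KeyError, so new restore states are not silently dropped.
        status, outcome = RESTORE_STATUS_MAP[restore_status]
        run = self._runs.get(restore_run_id)
        if run is None:
            # RestoreRun created -> shadow OperationRun created.
            run = OperationRun(
                tenant_id=tenant_id,
                type="restore.execute",
                status=status,
                outcome=outcome,
                context={"restore_run_id": restore_run_id},
            )
            self._runs[restore_run_id] = run
        else:
            # RestoreRun updated -> the same shadow row is updated in place.
            run.status, run.outcome = status, outcome
        return run
```

Handling both `created` and `updated` through a single `sync` entry point keeps the adapter idempotent: replaying an event leaves exactly one shadow row per restore record.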
|
||||
|
||||
## Phase 5: Notifications & Polish
|
||||
|
||||
- [ ] **Notifications**: Implement Database Notifications for "Run Started" (with link) and "Run Completed" (with outcome).
|
||||
- [ ] **Frontend**: Ensure "View Run" link in Toast notifications correctly opens the Monitoring Detail view.
|
||||
- [ ] **Final Verify**: Run through the `requirements.md` checklist manually.
|
||||