Research: Audit Log Foundation

Feature: 134-audit-log-foundation | Date: 2026-03-11

R1: Whether to create a new audit table or evolve the existing one

Decision: Evolve the existing audit_logs table in place and treat it as the canonical first-class audit store, adding the richer semantics required by the spec through compatibility-safe migrations and backfills.

Rationale: The repo already has a durable audit_logs table, a model, and multiple active readers and writers. Creating a second audit table would split history, duplicate migration effort, and force existing consumers to choose between two incompatible sources of truth. The current shape is too narrow for the new product requirement, but it is close enough to extend safely.

Alternatives considered:

  • Create a brand-new audit_events table and leave audit_logs as legacy: rejected because it would fragment the evidence trail and multiply compatibility work.
  • Leave the current table shape unchanged and store everything in metadata: rejected because that would bury primary audit semantics in opaque JSON and fail the summary-first requirement.

R2: How to handle the current split between tenant and workspace audit writers

Decision: Consolidate audit creation behind one shared recorder foundation, with the existing App\Services\Intune\AuditLogger, App\Services\Audit\WorkspaceAuditLogger, and App\Services\SystemConsole\SystemConsoleAuditLogger becoming wrappers or compatibility layers over the shared recorder.

Rationale: The codebase already writes audit entries from both tenant-scoped and workspace-scoped services, but the writer API is inconsistent and does not expose the richer actor or target semantics the new feature needs. A shared recorder avoids a third parallel logger and enables gradual migration of existing call sites.

Alternatives considered:

  • Keep both current loggers and add conventions only in documentation: rejected because the current drift is already visible in action naming and payload shape.
  • Replace every call site directly with model inserts: rejected because it would increase inconsistency and reduce redaction safety.
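
The wrapper arrangement described in R2 could look roughly like the following plain-PHP sketch. Every name in it (AuditRecorder, AuditEntry, the two wrapper classes, the redacted-key list) is hypothetical and chosen for illustration; the real signatures of AuditLogger and WorkspaceAuditLogger are not shown in this document, and an in-memory array stands in for the audit_logs table.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout: nothing here is the repository's actual API.

final class AuditEntry
{
    public function __construct(
        public readonly string $action,
        public readonly ?int $workspaceId,
        public readonly ?int $tenantId,
        public readonly array $metadata = [],
    ) {}
}

final class AuditRecorder
{
    /** @var list<AuditEntry> In-memory stand-in for the audit_logs table. */
    public array $entries = [];

    /** Keys masked before an entry is stored (assumed list). */
    private const REDACTED_KEYS = ['password', 'token', 'secret'];

    public function record(string $action, ?int $workspaceId, ?int $tenantId, array $metadata = []): AuditEntry
    {
        // Redaction lives in one place, so every writer gets it for free.
        foreach (self::REDACTED_KEYS as $key) {
            if (array_key_exists($key, $metadata)) {
                $metadata[$key] = '[redacted]';
            }
        }

        $entry = new AuditEntry($action, $workspaceId, $tenantId, $metadata);
        $this->entries[] = $entry;

        return $entry;
    }
}

// Tenant-scoped wrapper: keeps a tenant-oriented call shape, delegates to the recorder.
final class TenantAuditLoggerWrapper
{
    public function __construct(private AuditRecorder $recorder) {}

    public function log(int $workspaceId, int $tenantId, string $action, array $context = []): AuditEntry
    {
        return $this->recorder->record($action, $workspaceId, $tenantId, $context);
    }
}

// Workspace-scoped wrapper: same recorder, no tenant attached.
final class WorkspaceAuditLoggerWrapper
{
    public function __construct(private AuditRecorder $recorder) {}

    public function log(int $workspaceId, string $action, array $context = []): AuditEntry
    {
        return $this->recorder->record($action, $workspaceId, null, $context);
    }
}
```

Under this shape, existing call sites could keep calling their current logger while redaction and payload normalization happen once, inside the shared recorder.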

R3: How to define event taxonomy without starting from zero

Decision: Use App\Support\Audit\AuditActionId as the seed of the v1 audit taxonomy, expand it to cover the missing event families, and migrate existing free-form strings toward the shared registry.

Rationale: The repo already centralizes several workspace and baseline-related audit action IDs, which proves the direction is accepted by the codebase. At the same time, important workflows such as FindingWorkflowService still emit strings like finding.triaged directly. Expanding the registry creates one authoritative naming contract without throwing away existing intent.

Alternatives considered:

  • Keep free-form action strings and document naming guidelines only: rejected because the spec explicitly forbids event-name drift.
  • Introduce a second registry separate from AuditActionId: rejected because the codebase already has a recognizable naming anchor.
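
As a rough illustration of migrating free-form strings toward a shared registry, the sketch below uses a PHP 8.1 backed enum. It is an assumption that the registry can be modeled as an enum at all; the enum name AuditAction and every case other than the finding.triaged string mentioned above are invented for the example and are not taken from AuditActionId.

```php
<?php
declare(strict_types=1);

// Hypothetical registry: only 'finding.triaged' comes from the text above.
enum AuditAction: string
{
    case FindingTriaged = 'finding.triaged';
    case FindingRiskAccepted = 'finding.risk_accepted';
    case WorkspaceSettingUpdated = 'workspace.setting.updated';

    /** Resolve a legacy free-form string, or null when it is not registered yet. */
    public static function fromLegacy(string $raw): ?self
    {
        return self::tryFrom(strtolower(trim($raw)));
    }
}

/**
 * Splits legacy action strings into those the registry already covers
 * and those that still need a registry entry.
 *
 * @param list<string> $legacyActions
 * @return array{known: list<string>, unknown: list<string>}
 */
function partitionLegacyActions(array $legacyActions): array
{
    $result = ['known' => [], 'unknown' => []];

    foreach ($legacyActions as $raw) {
        $action = AuditAction::fromLegacy($raw);

        if ($action !== null) {
            $result['known'][] = $action->value;
        } else {
            $result['unknown'][] = $raw;
        }
    }

    return $result;
}
```

Running a helper like partitionLegacyActions over the distinct action values already stored in audit_logs would give a concrete list of event families the registry still has to absorb.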

R4: Where first-wave event writes should occur

Decision: Instrument meaningful service and job boundaries rather than generic model observers or blanket saved hooks.

Rationale: The spec prioritizes high-signal actions and state transitions, not every low-level field write. Existing domain services already define the right mutation boundaries: FindingWorkflowService for findings and risk acceptance, baseline capture and compare jobs for baseline outcomes, backup and restore services and jobs for operational events, SettingsWriter and WorkspaceMembershipManager for workspace-admin changes, and existing operation jobs for high-value failures and completions.

Alternatives considered:

  • Audit every model save generically: rejected because it would create noisy, low-signal history and make human-readable summaries harder.
  • Rely only on UI-layer actions: rejected because queued jobs and background outcomes would be missed.

R5: How to implement the canonical audit UI without changing route semantics

Decision: Keep App\Filament\Pages\Monitoring\AuditLog at /admin/audit-log as the canonical workspace Monitoring surface and implement its list, filters, and detail inspection directly on that page.

Rationale: The route and navigation already exist, but the page is only a placeholder. Keeping the custom page preserves canonical routing and Monitoring placement while avoiding the overhead of introducing a new Resource solely for immutable historical records.

Alternatives considered:

  • Replace the page with a new Resource: rejected because the current route is already canonical and the feature is a Monitoring work surface, not CRUD.
  • Create a tenant-specific audit page under /admin/t/{tenant}: rejected because the spec requires a workspace-first canonical model with tenant-aware filtering, not split route ownership.

R6: Which existing Filament patterns best fit the audit page

Decision: Reuse the table and filter conventions from OperationRunResource and AlertDeliveryResource, plus existing FilterPresets date-range helpers and RelatedNavigationResolver for permission-aware drill-down links.

Rationale: These are already workspace-safe Monitoring surfaces with reverse-chronological ordering, date-range filters, scoped tenant filters, and linked inspection patterns. The audit page has very similar operator needs.

Alternatives considered:

  • Build all filters ad hoc inside the page: rejected because the repo already has reusable filter and navigation patterns.
  • Use raw JSON-first detail rendering: rejected because the spec explicitly requires readable structured context before raw payload views.

R7: How to handle outcome and actor presentation consistently

Decision: Add audit-specific badge domains to the existing BadgeCatalog / BadgeRenderer system rather than mapping audit outcome colors inside the page.

Rationale: The constitution's BADGE-001 rule requires centralized status-like semantics. The repo already centralizes operation, backup, restore, finding, and alert badge states through BadgeCatalog.

Alternatives considered:

  • Render audit outcome chips with inline color closures: rejected because it would violate BADGE-001 and create drift.
  • Reuse operation outcome badges directly without an audit domain: rejected because audit outcomes include informational and blocked semantics that are not identical to operation-run outcomes.
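
The idea of a dedicated audit badge domain can be sketched as a single mapping that the page consults instead of defining colors inline. The real BadgeCatalog / BadgeRenderer API is not shown in this document, so the classes, outcome values, and color names below are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for one entry in a centralized badge catalog.
final class BadgeSpec
{
    public function __construct(
        public readonly string $label,
        public readonly string $color,
    ) {}
}

// Hypothetical audit-outcome domain: one place that owns label and color.
final class AuditOutcomeBadges
{
    /** Outcome keys and colors are assumed, not taken from the spec. */
    public static function forOutcome(string $outcome): BadgeSpec
    {
        return match ($outcome) {
            'success' => new BadgeSpec('Success', 'success'),
            'failed' => new BadgeSpec('Failed', 'danger'),
            'blocked' => new BadgeSpec('Blocked', 'warning'),
            'info' => new BadgeSpec('Info', 'gray'),
            // Unrecognized values degrade to a neutral badge instead of throwing.
            default => new BadgeSpec('Unknown', 'gray'),
        };
    }
}
```

Keeping blocked and info as their own entries reflects the point above that audit outcomes are not identical to operation-run outcomes.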

R8: What to do about existing retention behavior that conflicts with the spec

Decision: Treat the current deletion of audit_logs in tenantpilot:purge-nonpersistent as a behavior that must change as part of this foundation, so durable audit history is no longer handled as ephemeral tenant noise.

Rationale: The new spec explicitly requires a durable retention posture. The purge command currently deletes audit logs alongside regeneratable tenant data, which directly conflicts with that posture.

Alternatives considered:

  • Leave purge behavior unchanged and document retention as “best effort”: rejected because it would make the audit feature untrustworthy.
  • Remove all purge access to audit logs without documenting why: rejected because operators still need a clear explicit retention stance.

R9: How to represent risk acceptance in the first release

Decision: Treat risk acceptance as part of the existing findings workflow in v1, because the current domain model represents it primarily through Finding::STATUS_RISK_ACCEPTED and related workflow actions rather than a separate persistent RiskAcceptance model.

Rationale: The spec wants risk acceptance lifecycle visibility, but the codebase today appears to model at least the core acceptance action through finding status transitions. The first plan should cover the current domain truth instead of inventing a richer separate entity before the implementation proves it exists.

Alternatives considered:

  • Introduce a separate risk-acceptance data model during this foundation: rejected because it broadens scope beyond the audit foundation and is not required to begin auditable coverage.
  • Omit risk-acceptance coverage entirely: rejected because it is explicitly called out as a high-value governance action.

R10: How to preserve compatibility for existing audit readers

Decision: Preserve existing readers such as the system-console access logs page while migrating the richer audit schema, using backward-compatible field fallbacks where needed.

Rationale: app/Filament/System/Pages/Security/AccessLogs.php already queries AuditLog for platform login and break-glass events. The audit foundation should not silently break that consumer while focusing on the new workspace Monitoring page.

Alternatives considered:

  • Rewrite all readers immediately to the new final schema: rejected because it increases rollout risk unnecessarily.
  • Ignore existing readers during migration: rejected because it would create regressions outside the feature's main surface.
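
One way to picture the backward-compatible field fallbacks from R10 is a small normalizer that prefers the richer columns and falls back to legacy ones. This is a minimal sketch under stated assumptions: the column names (summary, actor_label, outcome, status) and the metadata key actor_email are hypothetical, since the actual schema is not listed in this document.

```php
<?php
declare(strict_types=1);

/**
 * Normalizes an audit row for display, preferring richer columns and
 * falling back to legacy fields. All column names are assumptions.
 *
 * @param array<string, mixed> $row
 * @return array{summary: mixed, actor: mixed, outcome: mixed}
 */
function normalizeAuditRow(array $row): array
{
    // Legacy rows may carry actor details only inside the metadata payload.
    $metadata = is_array($row['metadata'] ?? null) ? $row['metadata'] : [];

    return [
        'summary' => $row['summary'] ?? $row['action'] ?? 'unknown',
        'actor' => $row['actor_label'] ?? $metadata['actor_email'] ?? 'system',
        'outcome' => $row['outcome'] ?? $row['status'] ?? 'info',
    ];
}
```

A reader such as the access logs page could route rows through a helper like this during rollout, so rows written before and after the backfill render the same way.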