TenantAtlas/specs/243-product-usage-adoption-telemetry/plan.md
Ahmed Darrazi f38b8884ff
feat(product-telemetry): implement spec 243 - product usage adoption telemetry
2026-04-26 22:46:32 +02:00

Implementation Plan: Product Usage & Adoption Telemetry

Branch: 243-product-usage-adoption-telemetry | Date: 2026-04-26 | Spec: /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/243-product-usage-adoption-telemetry/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/243-product-usage-adoption-telemetry/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

  • Add one tenant-owned telemetry ledger for a bounded set of user-initiated product milestones only: onboarding checkpoint completion, support diagnostics opened, tenant-bound operation started, stored report created, and review-pack generation requested.
  • Reuse existing trustworthy source seams instead of inventing passive page tracking or scraping domain tables later: OnboardingLifecycleService, support-diagnostics actions, OperationRunService, EntraAdminRolesReportService, PermissionPostureFindingGenerator, and ReviewPackService become the only v1 write paths.
  • Surface only one read-only adoption summary on the existing system dashboard through a native widget that follows the current SystemConsoleWindow filter semantics, renders five visible event families in v1, and includes active-workspace participation for the selected window. No raw event browser, no customer-facing analytics, and no AuditLog or OperationRun overloading are allowed.
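As an illustration of the bounded set of milestones listed above, a code-owned catalog could be a backed enum with exactly five cases. This is a minimal sketch: the enum name `ProductUsageEventName`, the string values, and the feature-area labels are assumptions made for illustration; the plan fixes only the five milestones, not their identifiers.

```php
<?php

declare(strict_types=1);

// Hypothetical identifiers: the plan names five milestones but not their string values.
enum ProductUsageEventName: string
{
    case OnboardingCheckpointCompleted = 'onboarding.checkpoint_completed';
    case SupportDiagnosticsOpened = 'support_diagnostics.opened';
    case OperationStarted = 'operation.started';
    case StoredReportCreated = 'report.created';
    case ReviewPackGenerationRequested = 'review_pack.generation_requested';

    // Assumed feature-area label used to group events on the dashboard widget.
    public function featureArea(): string
    {
        return match ($this) {
            self::OnboardingCheckpointCompleted => 'onboarding',
            self::SupportDiagnosticsOpened => 'support',
            self::OperationStarted => 'operations',
            self::StoredReportCreated => 'reports',
            self::ReviewPackGenerationRequested => 'review_packs',
        };
    }
}
```

Because the cases live in code, `ProductUsageEventName::tryFrom('page.viewed')` returns `null`, so a passive page-view name cannot enter the ledger without a reviewed change to the catalog.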

Technical Context

Language/Version: PHP 8.4 (Laravel 12)
Primary Dependencies: Laravel 12 + Filament v5 + Livewire v4 + Pest; existing OnboardingLifecycleService, OperationRunService, SupportDiagnosticBundleBuilder, ReviewPackService, EntraAdminRolesReportService, PermissionPostureFindingGenerator, system dashboard widgets
Storage: PostgreSQL via one new tenant-owned product_usage_events table; source truth stays on existing onboarding, operation, report, and review-pack tables
Testing: Pest unit + feature tests only
Validation Lanes: fast-feedback, confidence
Target Platform: Sail-backed Laravel admin and system panels under /admin and /system
Project Type: web
Performance Goals: one cheap insert per eligible source milestone, no passive page-view chatter, and one indexed aggregate query for the system dashboard time window without scanning arbitrary logs
Constraints: tenant-bound rows only, no pre-tenant onboarding events, no initiator-null operation telemetry, no raw payloads or free text in metadata, no third-party analytics, no raw event browser, no customer-facing analytics, and no new panel or provider registration changes
Scale/Scope: 5 code-owned event names, 1 dashboard widget, 1 recorder, 1 summary query, 1 prune command, 1 config-backed 90-day retention rule, and focused source-seam instrumentation only
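The "no raw payloads or free text in metadata" constraint can be sketched as an allow-list filter. The function name, the allowed keys, and the identifier pattern below are assumptions for illustration only; the plan does not define the safe metadata schema at this level.

```php
<?php

declare(strict_types=1);

/**
 * Keeps only allow-listed keys whose values look like stable identifiers.
 * Keys and pattern are hypothetical; anything else is dropped, never stored.
 *
 * @param array<string, mixed> $metadata
 * @return array<string, string>
 */
function safeTelemetryMetadata(array $metadata): array
{
    $allowedKeys = ['operation_type', 'report_type', 'checkpoint'];
    $safe = [];

    foreach ($allowedKeys as $key) {
        $value = $metadata[$key] ?? null;

        // Lowercase identifier of at most 64 characters: no spaces, no nested payloads.
        if (is_string($value) && preg_match('/\A[a-z0-9][a-z0-9._-]{0,63}\z/', $value) === 1) {
            $safe[$key] = $value;
        }
    }

    return $safe;
}
```

This sketch drops unsafe values silently. Throwing on an unexpected key would be an equally valid choice if the team prefers source seams to fail loudly in tests.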

UI / Surface Guardrail Plan

  • Guardrail scope: changed surfaces
  • Native vs custom classification summary: native Filament + shared stats widget
  • Shared-family relevance: dashboard signals/cards
  • State layers in scope: page, widget, URL query
  • Handling modes by drift class or surface: review-mandatory
  • Repository-signal treatment: review-mandatory
  • Special surface test profiles: standard-native-filament
  • Required tests or manual smoke: functional-core, state-contract
  • Exception path and spread control: none
  • Active feature PR close-out entry: Guardrail

Shared Pattern & System Fit

  • Cross-cutting feature marker: yes
  • Systems touched: App\Filament\System\Pages\Dashboard, App\Filament\System\Widgets\ControlTowerKpis, App\Services\Onboarding\OnboardingLifecycleService, App\Support\SupportDiagnostics\SupportDiagnosticBundleBuilder, App\Services\OperationRunService, App\Services\EntraAdminRoles\EntraAdminRolesReportService, App\Services\PermissionPosture\PermissionPostureFindingGenerator, App\Services\ReviewPackService, and the support-diagnostics page actions on TenantDashboard and TenantlessOperationRunViewer
  • Shared abstractions reused: existing system dashboard widget conventions, existing source-owned service/action seams, and current workspace/tenant context resolution before writes
  • New abstraction introduced? why?: one bounded ProductTelemetryRecorder, one code-owned event catalog, and one summary query are justified because telemetry semantics do not belong on the existing audit, operation, or user-preference models
  • Why the existing abstraction was sufficient or insufficient: existing source seams know when a trustworthy milestone happened, but there is no shared telemetry contract or aggregate read path today
  • Bounded deviation / spread control: no page-local counters, no direct writes from Blade or Livewire render hooks, and no domain-table-specific telemetry sidecar fields
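The shared recorder contract described above can be sketched without Laravel as a single guarded write path. The class name, the method signature, and the in-memory `$rows` array (standing in for the product_usage_events table) are assumptions; the actual ProductTelemetryRecorder may look different.

```php
<?php

declare(strict_types=1);

final class ProductTelemetryRecorderSketch
{
    /** @var list<array{event: string, workspace_id: int, tenant_id: int, user_id: int}> */
    public array $rows = []; // in-memory stand-in for product_usage_events

    /** @param list<string> $knownEvents code-owned event names */
    public function __construct(private array $knownEvents)
    {
    }

    public function record(string $event, ?int $workspaceId, ?int $tenantId, ?int $initiatorUserId): bool
    {
        // Only catalog events are legal; unknown names are refused.
        if (! in_array($event, $this->knownEvents, true)) {
            return false;
        }

        // Tenant-bound, user-initiated rows only: no pre-tenant or initiator-null capture.
        if ($workspaceId === null || $tenantId === null || $initiatorUserId === null) {
            return false;
        }

        $this->rows[] = [
            'event' => $event,
            'workspace_id' => $workspaceId,
            'tenant_id' => $tenantId,
            'user_id' => $initiatorUserId,
        ];

        return true;
    }
}
```

Routing every source seam through one method like this is what makes a check such as NoAdHocTelemetryBypassTest feasible: any direct write to the table elsewhere is, by construction, a bypass.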

OperationRun UX Impact

  • Touches OperationRun start/completion/link UX?: no
  • Central contract reused: N/A
  • Delegated UX behaviors: N/A
  • Surface-owned behavior kept local: N/A
  • Queued DB-notification policy: N/A
  • Terminal notification path: N/A
  • Exception path: none

Provider Boundary & Portability Fit

  • Shared provider/platform boundary touched?: yes
  • Provider-owned seams: provider-backed operation types, report generation sources, support-diagnostic provider context
  • Platform-core seams: telemetry event names, feature-area labels, safe metadata schema, system dashboard widget labels
  • Neutral platform terms / contracts preserved: product telemetry, usage event, feature area, subject reference, active workspaces, recent signals
  • Retained provider-specific semantics and why: stable canonical operation and report type identifiers may appear in safe metadata because they are already product-owned identifiers used across the repo
  • Bounded extraction or follow-up path: no multi-provider telemetry abstraction beyond the bounded event catalog; later customer-health work reuses this shape rather than adding a parallel one

Constitution Check

GATE: Must pass before implementation begins. Re-check after design changes.

  • Inventory-first / snapshots-second: PASS - telemetry observes product usage only and does not become an external source of truth for tenant configuration, inventory, or backup state
  • Read/write separation: PASS - telemetry writes are bounded product-observability writes triggered after existing source actions succeed; no tenant-changing behavior is added
  • Graph contract path: PASS - the feature adds no new Graph calls
  • RBAC-UX plane separation: PASS - writes originate in existing admin-plane flows after authorization; reads remain system-plane only via the existing dashboard gate
  • Workspace isolation / tenant isolation: PASS - telemetry rows are tenant-owned with workspace_id and tenant_id required; no cross-tenant raw event viewer is introduced
  • Run observability / Ops-UX: PASS - OperationRun remains execution truth only; telemetry observes a successful tenant-bound user start without altering run UX or lifecycle
  • Shared pattern reuse / XCUT-001: PASS - widget reuse and source-seam reuse are explicit; no page-local or model-local side ledgers are planned
  • Provider boundary / PROV-001: PASS - telemetry stores platform-neutral event names and only stable canonical type identifiers, not provider payload or provider transport truth
  • Proportionality / PROP-001 and ABSTR-001: PASS - the new structure is justified by a concrete operator need and kept to one bounded ledger, one recorder, one summary query, and one widget
  • Persisted truth / PERSIST-001: PASS - telemetry rows represent independent product-observability truth with their own retention lifecycle and later reuse by Customer Health Score
  • Behavioral state / STATE-001: PASS - the event catalog changes later operator visibility and product-health workflows; it is not presentation-only decoration
  • Filament-native UI / UI-FIL-001: PASS - visibility stays on a native system widget only
  • Global search rule: N/A - no new global-searchable resource is introduced
  • Panel/provider registration: PASS - no panel or provider registration changes are planned; Livewire remains v4-compatible and provider registration stays in bootstrap/providers.php
  • Test governance / TEST-GOV-001: PASS - proof stays in focused unit + feature coverage only

Test Governance Check

  • Test purpose / classification by changed surface: Unit for event-catalog legality, safe metadata, and summary-query behavior; Feature for source capture from real service/action seams plus dashboard access and visibility
  • Affected validation lanes: fast-feedback, confidence
  • Why this lane mix is the narrowest sufficient proof: the feature is server-driven and data-focused; unit tests prove the bounded contract, while feature tests prove the real write and read seams without browser duplication
  • Narrowest proving command(s):
    • cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/ProductTelemetry/ProductUsageEventCatalogTest.php tests/Unit/Support/ProductTelemetry/ProductTelemetryRecorderTest.php tests/Unit/Support/ProductTelemetry/ProductTelemetrySafeMetadataTest.php tests/Unit/Support/ProductTelemetry/ProductTelemetrySummaryQueryTest.php
    • cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Onboarding/ProductTelemetryOnboardingCaptureTest.php tests/Feature/SupportDiagnostics/ProductTelemetrySupportDiagnosticsCaptureTest.php tests/Feature/Operations/ProductTelemetryOperationStartCaptureTest.php tests/Feature/Reports/ProductTelemetryReportCaptureTest.php
    • cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/System/ProductTelemetry/ProductTelemetryDashboardWidgetTest.php tests/Feature/System/ProductTelemetry/ProductTelemetryAuthorizationTest.php tests/Feature/System/ProductTelemetry/ProductTelemetryRetentionTest.php tests/Feature/System/ProductTelemetry/NoAdHocTelemetryBypassTest.php
  • Fixture / helper / factory / seed / context cost risks: reuse existing workspace, tenant, user, onboarding session, operation-run, stored-report, and review-pack fixtures; keep any telemetry helper local to this family only
  • Expensive defaults or shared helper growth introduced?: no
  • Heavy-family additions, promotions, or visibility changes: none
  • Surface-class relief / special coverage rule: standard-native-filament relief is sufficient for the system widget; no browser harness is required
  • Closing validation and reviewer handoff: reviewers should verify tenant-bound rows only, safe metadata only, no AuditLog or OperationRun overload, no passive page-view events, no initiator-null capture, and no raw event browser
  • Budget / baseline / trend follow-up: none expected beyond ordinary feature-local upkeep
  • Review-stop questions: did the implementation add passive page views, a raw event list, or a second telemetry store; did any metadata accept free text or raw payloads; did any read surface leave the system plane?
  • Escalation path: reject-or-split if implementation widens into broad analytics or customer-facing dashboards; document-in-feature for small source-seam additions that stay bounded to the first-slice catalog
  • Active feature PR close-out entry: Guardrail

Project Structure

Documentation (this feature)

specs/243-product-usage-adoption-telemetry/
├── checklists/
│   └── requirements.md
├── spec.md
├── plan.md
└── tasks.md

Source Code (repository root)

apps/platform/
├── app/
│   ├── Filament/Pages/Operations/TenantlessOperationRunViewer.php
│   ├── Filament/Pages/TenantDashboard.php
│   ├── Filament/System/Pages/Dashboard.php
│   ├── Filament/System/Widgets/
│   │   └── ProductTelemetryKpis.php
│   ├── Models/
│   │   └── ProductUsageEvent.php
│   ├── Support/ProductTelemetry/
│   │   ├── ProductTelemetryRecorder.php
│   │   ├── ProductTelemetrySummaryQuery.php
│   │   └── ProductUsageEventCatalog.php
│   ├── Services/Onboarding/OnboardingLifecycleService.php
│   ├── Services/EntraAdminRoles/EntraAdminRolesReportService.php
│   ├── Services/PermissionPosture/PermissionPostureFindingGenerator.php
│   ├── Services/ReviewPackService.php
│   ├── Services/OperationRunService.php
│   ├── Support/SupportDiagnostics/SupportDiagnosticBundleBuilder.php
│   └── Console/Commands/
│       └── PruneProductUsageEventsCommand.php
├── config/
│   └── tenantpilot.php
├── database/
│   ├── factories/
│   │   └── ProductUsageEventFactory.php
│   └── migrations/
│       └── *_create_product_usage_events_table.php
├── routes/
│   └── console.php
└── tests/
    ├── Unit/Support/ProductTelemetry/
    │   ├── ProductUsageEventCatalogTest.php
    │   ├── ProductTelemetryRecorderTest.php
    │   ├── ProductTelemetrySafeMetadataTest.php
    │   └── ProductTelemetrySummaryQueryTest.php
    └── Feature/
        ├── Onboarding/ProductTelemetryOnboardingCaptureTest.php
        ├── Operations/ProductTelemetryOperationStartCaptureTest.php
        ├── Reports/ProductTelemetryReportCaptureTest.php
        ├── SupportDiagnostics/ProductTelemetrySupportDiagnosticsCaptureTest.php
        └── System/ProductTelemetry/
            ├── ProductTelemetryAuthorizationTest.php
            ├── ProductTelemetryDashboardWidgetTest.php
            ├── ProductTelemetryRetentionTest.php
            └── NoAdHocTelemetryBypassTest.php

Structure Decision: Single Laravel web application. The feature adds one bounded telemetry support namespace and one system widget while reusing existing domain services and support-diagnostics page actions as source seams.

Complexity Tracking

No constitution violations need justification. The only new persisted truth and abstractions are the explicitly justified tenant-owned telemetry ledger plus its bounded recorder and summary query.

Proportionality Review

  • Current operator problem: product adoption and usage still require anecdotal inference or log inspection
  • Existing structure is insufficient because: audit, operation, report, review-pack, and tenant-preference models each describe different truths and cannot safely stand in for adoption telemetry
  • Narrowest correct implementation: one tenant-owned event table, one bounded event catalog, one recorder, one summary query, and one aggregate system widget
  • Ownership cost created: migration, model, recorder, query, prune command, widget, config key, scheduler entry, and focused tests
  • Alternatives intentionally rejected: AuditLog piggyback, OperationRun-context piggyback, UserTenantPreference counters, passive page-view tracking, third-party analytics
  • Release truth: current-release truth

Rollout & Risk Controls

  • Start with five code-owned event names only. Adding more events requires revisiting the spec scope, not silent catalog growth.
  • Keep the first slice tenant-bound and user-initiated only. Pre-tenant onboarding and system-initiated signals are explicit non-goals.
  • Keep the read surface aggregate-only on /system. A raw event list or customer-facing reporting requires a later spec.
  • Use a config-backed 90-day retention window via tenantpilot.product_usage_event_retention_days and schedule tenantpilot:product-usage:prune daily in apps/platform/routes/console.php so telemetry does not become an unbounded side history.
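The retention rule above reduces to a cutoff computation plus a delete of older rows. The sketch below shows that logic in plain PHP; the function names and the `occurred_at` field are assumptions, and in the real prune command the day count would presumably come from the tenantpilot.product_usage_event_retention_days config value rather than a default argument.

```php
<?php

declare(strict_types=1);

// Returns the oldest timestamp that is still retained.
function pruneCutoff(DateTimeImmutable $now, int $retentionDays = 90): DateTimeImmutable
{
    if ($retentionDays < 1) {
        throw new InvalidArgumentException('Retention must be at least one day.');
    }

    return $now->sub(new DateInterval('P' . $retentionDays . 'D'));
}

/**
 * Keeps rows at or after the cutoff; older rows are pruned.
 * `occurred_at` is a hypothetical column name.
 *
 * @param list<array{id: int, occurred_at: DateTimeImmutable}> $rows
 * @return list<array{id: int, occurred_at: DateTimeImmutable}>
 */
function pruneExpired(array $rows, DateTimeImmutable $cutoff): array
{
    return array_values(array_filter(
        $rows,
        static fn (array $row): bool => $row['occurred_at'] >= $cutoff,
    ));
}
```

Passing the current time in as an argument keeps the cutoff deterministic under test, which suits the retention feature test listed in the plan.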

Implementation Outline

  • Add the product_usage_events table, model, factory, bounded catalog, recorder, summary query, config-backed retention rule, and prune command.
  • Instrument the five declared source seams only: onboarding checkpoint completion, support diagnostics opened, tenant-bound user-started operation, stored-report creation, and review-pack generation request.
  • Add a native system dashboard widget that reuses the existing SystemConsoleWindow selection and shows aggregate counts only.
  • Add unit and feature tests that prove safe metadata, tenant-bound scope, source capture, system access, and retention.
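The aggregate-only read path in the outline above can be illustrated with a small in-memory summary: counts per event name plus distinct active workspaces for a selected window. The function name, row shape, and half-open window semantics are assumptions; the real ProductTelemetrySummaryQuery would run as an indexed aggregate query in PostgreSQL rather than a PHP loop.

```php
<?php

declare(strict_types=1);

/**
 * Summarizes rows inside the half-open window [$from, $to).
 *
 * @param list<array{event: string, workspace_id: int, occurred_at: DateTimeImmutable}> $rows
 * @return array{counts: array<string, int>, active_workspaces: int}
 */
function summarizeUsage(array $rows, DateTimeImmutable $from, DateTimeImmutable $to): array
{
    $counts = [];
    $workspaces = [];

    foreach ($rows as $row) {
        if ($row['occurred_at'] < $from || $row['occurred_at'] >= $to) {
            continue; // outside the selected window
        }

        $counts[$row['event']] = ($counts[$row['event']] ?? 0) + 1;
        $workspaces[$row['workspace_id']] = true; // set semantics via array keys
    }

    ksort($counts); // stable ordering for rendering and assertions

    return ['counts' => $counts, 'active_workspaces' => count($workspaces)];
}
```

The return value carries only totals, never row identifiers or metadata, which mirrors the plan's rule that the system dashboard shows aggregate counts and no raw event browser.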

Constitution Check (Post-Design)

Re-check result: PASS. The plan stays bounded to one tenant-owned observability ledger, reuses existing source seams and native system widgets, keeps provider specifics out of the platform-core contract, leaves OperationRun UX unchanged, fixes retention to one explicit config-backed 90-day rule with a daily scheduler anchor in apps/platform/routes/console.php, and limits proof to unit + feature coverage.