Ahmed Darrazi 75979e7995 chore(worker): add structured logging, job events, worker health endpoint and health-check script

2025-12-09 12:22:16 +01:00


Tasks: Backend Architecture Pivot

Feature: 005-backend-arch-pivot
Generated: 2025-12-09
Total Tasks: 66 (T001-T066)
Spec: spec.md | Plan: plan.md

Phase 1: Setup (no story label)

T001 Confirm Dokploy-provided REDIS_URL and record connection string in specs/005-backend-arch-pivot/notes.md
T002 Add REDIS_URL to local .env.example and project .env (if used) (.env.example)
T003 Update lib/env.mjs to validate REDIS_URL (lib/env.mjs)
T004 [P] Add npm dependencies bullmq, ioredis and @azure/identity, plus tsx as a dev dependency (package.json)
T005 [P] Add npm script worker:start to package.json to run tsx ./worker/index.ts (package.json)
T006 [P] Create lib/queue/redis.ts - Redis connection wrapper reading process.env.REDIS_URL (lib/queue/redis.ts)
T007 [P] Create lib/queue/syncQueue.ts - Export BullMQ Queue('intune-sync-queue') (lib/queue/syncQueue.ts)
T008 Test connectivity: add a dummy job from a Node REPL/script and verify connection to provided Redis (scripts/test-queue-connection.js)

Phase 2: Worker Skeleton (no story label)

T009 Create worker/index.ts - minimal BullMQ Worker entry point (concurrency:1) (worker/index.ts)
T010 Create worker/logging.ts - structured JSON logger used by worker (worker/logging.ts)
T011 Create worker/events.ts - job lifecycle event handlers (completed/failed) (worker/events.ts)
T012 [P] Add npm run worker:start integration to README.md with run instructions (README.md)
T013 Create worker/health.ts - minimal health check handlers (used in docs) (worker/health.ts)
T014 Smoke test: start npm run worker:start and verify worker connects and logs idle state (no file)

Phase 3: US1 — Manual Policy Sync via Queue [US1]

T015 [US1] Update lib/actions/policySettings.ts → implement triggerPolicySync() to call syncQueue.add(...) and return jobId (lib/actions/policySettings.ts)
T016 [US1] Create server action wrapper if needed app/actions/triggerPolicySync.ts (app/actions/triggerPolicySync.ts)
T017 [US1] Update /app/search/SyncButton.tsx to call server action and show queued toast with jobId (components/search/SyncButton.tsx)
T018 [US1] Add API route /api/policy-sync/status (optional) to report job status using BullMQ Job API (app/api/policy-sync/status/route.ts)
T019 [US1] Add simple job payload typing types/syncJob.ts (types/syncJob.ts)
T020 [US1] Add unit test for triggerPolicySync() mocking syncQueue.add (tests/unit/triggerPolicySync.test.ts)
T021 [US1] End-to-end test: UI → triggerPolicySync → job queued (integration test) (tests/e2e/sync-button.test.ts)
T022 [US1] OPTIONAL [P] Document MVP scope for job status endpoint (FR-022) in specs/005-backend-arch-pivot/notes.md (specs/005-backend-arch-pivot/notes.md)

Phase 4: US2 — Microsoft Graph Data Fetching [US2]

T023 [US2] Create worker/jobs/graphAuth.ts - getGraphAccessToken() using @azure/identity (worker/jobs/graphAuth.ts)
T024 [US2] Create worker/jobs/graphFetch.ts - fetchFromGraph(endpoint) with pagination following @odata.nextLink (worker/jobs/graphFetch.ts)
T025 [US2] Implement worker/utils/retry.ts - exponential backoff retry helper (worker/utils/retry.ts)
T026 [US2] Create integration tests mocking Graph endpoints for paginated responses (tests/integration/graphFetch.test.ts)
T027 [US2] Implement rate limit handling and transient error classification in graphFetch.ts (worker/jobs/graphFetch.ts)
T028 [US2] Add logging for Graph fetch metrics (requests, pages, duration) (worker/logging.ts)
T029 [US2] Test: run syncPolicies job locally against mocked Graph responses (tests/e2e/sync-with-mock-graph.test.ts)

Phase 5: US3 — Deep Flattening & Transformation [US3]

T030 [US3] Create worker/jobs/policyParser.ts - top-level router and parsePolicySettings() (worker/jobs/policyParser.ts)
T031 [US3] Implement Settings Catalog parser in policyParser.ts (worker/jobs/policyParser.ts)
T032 [US3] Implement OMA-URI parser in policyParser.ts (worker/jobs/policyParser.ts)
T033 [US3] Create worker/utils/humanizer.ts - humanizeSettingId() function (worker/utils/humanizer.ts)
T034 [US3] Create normalization function worker/jobs/normalizer.ts to produce PolicyInsertData[] (worker/jobs/normalizer.ts)
T035 [US3] Unit tests for parsers + humanizer with representative Graph samples (tests/unit/policyParser.test.ts)

Phase 6: US3 — Database Persistence (shared, assign to US3) [US3]

T036 [US3] Create worker/jobs/dbUpsert.ts - batch upsert function using Drizzle (worker/jobs/dbUpsert.ts)
T037 [US3] Implement transactional upsert logic and ON CONFLICT DO UPDATE behavior (worker/jobs/dbUpsert.ts)
T038 [US3] Add performance tuning: batch size config and bulk insert strategy (worker/jobs/dbUpsert.ts)
T039 [US3] Add tests for upsert correctness (duplicates / conflict resolution) (tests/integration/dbUpsert.test.ts)
T040 [US3] Add lastSyncedAt update on upsert (worker/jobs/dbUpsert.ts)
T041 [US3] Load test: upsert 500+ policies and measure duration (scripts/load-tests/upsert-benchmark.js)
T042 [US3] Instrument metrics for DB operations (timings, rows inserted/updated) (worker/logging.ts)
T043 [US3] Validate data integrity end-to-end (Graph → transform → DB) (tests/e2e/full-sync.test.ts)

Phase 7: US4 — Frontend Integration & Legacy Cleanup [US4]

[X] T044 [US4] Update lib/actions/policySettings.ts to remove n8n webhook calls and call triggerPolicySync() (lib/actions/policySettings.ts)
[X] T045 [US4] Delete app/api/policy-settings/route.ts or archive its behavior (app/api/policy-settings/route.ts)
[X] T046 [US4] Delete app/api/admin/tenants/route.ts (n8n polling) (app/api/admin/tenants/route.ts)
[X] T047 [US4] Remove POLICY_API_SECRET and N8N_SYNC_WEBHOOK_URL from .env and lib/env.mjs (.env, lib/env.mjs)
[X] T048 [US4] Grep-check: verify no remaining n8n references (repo-wide) (no file)

T049 [US4] Update docs: remove n8n setup instructions and add worker notes (docs/worker-deployment.md)
T050 [US4] Add migration note to specs/002-manual-policy-sync/README.md marking it superseded (specs/002-manual-policy-sync/README.md)
T051 [US4] End-to-end QA: trigger sync from UI and confirm policies saved after cleanup (tests/e2e/post-cleanup-sync.test.ts)

Phase 8: Testing & Validation (no story label)

T052 Add unit-test coverage for worker/utils/humanizer.ts and worker/jobs/policyParser.ts (tests/unit/*.test.ts)
T053 Add integration tests for worker jobs processing (tests/integration/worker.test.ts)
T054 Run load tests for large tenant (1000+ policies) and record results (scripts/load-tests/large-tenant.js)
T055 Test worker stability (run 1+ hour with multiple jobs) and check memory usage (local script)
T056 Validate all Success Criteria (SC-001 to SC-008) and document results (specs/005-backend-arch-pivot/validation.md)

Phase 9: Deployment & Documentation (no story label)

T057 Create docs/worker-deployment.md with production steps (docs/worker-deployment.md)
T058 Add deployment config for worker (Dockerfile or PM2 config) (deploy/worker/Dockerfile)
T059 Ensure REDIS_URL is set in production Dokploy config and documented (deploy/README.md)
T060 Add monitoring & alerting for worker failures (Sentry / logs / email) (deploy/monitoring.md)
T061 Run canary production sync and verify (scripts/canary-sync.js)
T062 Final cleanup: remove unused n8n-related code paths and feature flags (grep and code edits)
T063 Update README.md and DEPLOYMENT.md with worker instructions (README.md, DEPLOYMENT.md)
T064 Tag release branch 005-backend-arch-pivot and create PR template (.github/)
T065 Merge PR after review and monitor first production sync (GitHub workflow)
T066 Post-deploy: run post-mortem checklist and close feature ticket (specs/005-backend-arch-pivot/closure.md)

Notes

Tasks labeled [P] are safe to run in parallel across different files or developers.
Story labels map to spec user stories: US1 = Manual Sync, US2 = Graph Fetching, US3 = Transformation & DB, US4 = Cleanup & Frontend.
Each task lists a suggested file path for the work; adjust it as needed to match the repo layout.

Tasks: Backend Architecture Pivot

Feature: 005-backend-arch-pivot
Generated: 2025-12-09
Total Tasks: 66 (T001-T066)
Spec: spec.md | Plan: plan.md

Phase 1: Setup & Infrastructure (8 tasks)

Goal: Prepare environment, install dependencies, set up Redis and BullMQ queue infrastructure

Environment Setup

T001 Install Redis via Docker Compose (add redis service to docker-compose.yml)
T002 [P] Add REDIS_URL to .env file (REDIS_URL=redis://localhost:6379)
T003 [P] Update lib/env.mjs - Add REDIS_URL: z.string().url() to server schema
T004 [P] Update lib/env.mjs - Add REDIS_URL to runtimeEnv object
T005 Install npm packages: bullmq, ioredis, @azure/identity, tsx

BullMQ Queue Infrastructure

T006 [P] Create lib/queue/redis.ts - Redis connection wrapper with IORedis
T007 [P] Create lib/queue/syncQueue.ts - BullMQ Queue definition for "intune-sync-queue"
T008 Test Redis connection and queue creation (add dummy job, verify in Redis CLI)

Phase 2: Worker Process Skeleton (6 tasks)

Goal: Set up worker process entry point and basic job processing infrastructure

Worker Setup

T009 Create worker/index.ts - BullMQ Worker entry point with job processor
T010 [P] Add worker:start script to package.json ("tsx watch worker/index.ts")
T011 [P] Implement worker event handlers (completed, failed, error)
T012 [P] Add structured logging for worker events (JSON format)
T013 Create worker/jobs/syncPolicies.ts - Main sync orchestration function (empty skeleton)
T014 Test worker starts successfully and listens on intune-sync-queue

Phase 3: Microsoft Graph Integration (9 tasks)

Goal: Implement Azure AD authentication and Microsoft Graph API data fetching with pagination

Authentication

T015 Create worker/jobs/graphAuth.ts - ClientSecretCredential token acquisition
T016 [P] Implement getGraphAccessToken() using @azure/identity
T017 Test token acquisition returns valid access token

Graph API Fetching

T018 Create worker/jobs/graphFetch.ts - Microsoft Graph API client
T019 [P] Implement fetchWithPagination() for handling @odata.nextLink
T020 [P] Create fetchAllPolicies() to fetch from 4 endpoints in parallel
T021 [P] Add Graph API endpoint constants (deviceConfigurations, deviceCompliancePolicies, configurationPolicies, intents)

Error Handling

T022 Create worker/utils/retry.ts - Exponential backoff retry logic
T023 Test Graph API calls with real tenant, verify pagination works for 100+ policies
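Here is a short sketch of the retry helper from T022. It is a minimal version and rests on one assumption: the Graph client throws errors that carry a numeric `status` and, for throttled responses, an optional `retryAfterMs` already parsed from the `Retry-After` header. Both field names are hypothetical. Only 429 and 5xx responses are retried. The `sleep` function is injectable so the T057 unit tests can assert the delay sequence.

```typescript
// Hypothetical error shape: HTTP status plus an optional delay taken from Retry-After.
interface HttpErrorLike {
  status?: number;
  retryAfterMs?: number;
}

interface RetryOptions {
  retries: number; // extra attempts after the first call
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any computed backoff delay
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// 429 (throttled) and 5xx are treated as transient; other failures are rethrown at once.
function isTransient(err: unknown): boolean {
  const status = (err as HttpErrorLike | null | undefined)?.status;
  return status === 429 || (typeof status === 'number' && status >= 500);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries || !isTransient(err)) throw err;
      // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
      const backoff = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      // A server-provided Retry-After takes precedence over the computed delay.
      const retryAfter = (err as HttpErrorLike).retryAfterMs;
      await sleep(retryAfter ?? backoff);
    }
  }
}
```

A common refinement is to add random jitter to the computed delay, which helps when several jobs are throttled at the same time. It is left out here to keep the delays deterministic.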

Phase 4: Data Transformation (12 tasks)

Goal: Port n8n flattening logic to TypeScript, implement parsers for all policy types

Policy Parser Core

T024 Create worker/jobs/policyParser.ts - Main policy parsing router
T025 [P] Implement detectPolicyType() based on @odata.type
T026 [P] Implement parsePolicySettings() router function

Settings Catalog Parser

T027 Implement parseSettingsCatalog() for #microsoft.graph.deviceManagementConfigurationPolicy
T028 [P] Implement extractValue() for different value types (simple, choice, group collection)
T029 Handle nested settings with dot-notation path building
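T027-T029 can be sketched under three assumptions:

- Each policy's settings have already been loaded into a `settings` array, for example from `configurationPolicies/{id}/settings`.
- Every entry has the subset of the Graph `settingInstance` shape declared below.
- The `FlattenedSetting` fields `path`/`value` are placeholders for whatever the real type uses.

Paths are built by joining `settingDefinitionId`s with dots. Repetitions inside a group collection get an `[index]` suffix so their paths do not collide.

```typescript
// Subset of the Graph Settings Catalog instance shape used by this sketch.
interface SettingInstance {
  '@odata.type': string;
  settingDefinitionId: string;
  simpleSettingValue?: { value: unknown };
  simpleSettingCollectionValue?: { value: unknown }[];
  choiceSettingValue?: { value: string; children?: SettingInstance[] };
  groupSettingCollectionValue?: { children?: SettingInstance[] }[];
}

// Hypothetical output row; the real FlattenedSetting type may use other field names.
interface FlattenedSetting {
  path: string;
  value: string;
}

// extractValue: the displayable value of one instance, or undefined for pure containers.
function extractValue(inst: SettingInstance): string | undefined {
  if (inst.simpleSettingValue) return String(inst.simpleSettingValue.value);
  if (inst.simpleSettingCollectionValue) {
    return inst.simpleSettingCollectionValue.map((v) => String(v.value)).join(', ');
  }
  if (inst.choiceSettingValue) return inst.choiceSettingValue.value;
  return undefined; // group collections only hold children
}

// Depth-first walk that builds dot-notation paths from settingDefinitionIds.
function flattenInstance(inst: SettingInstance, parentPath: string, out: FlattenedSetting[]): void {
  const path = parentPath ? `${parentPath}.${inst.settingDefinitionId}` : inst.settingDefinitionId;
  const value = extractValue(inst);
  if (value !== undefined) out.push({ path, value });

  // A chosen option can enable nested child settings.
  for (const child of inst.choiceSettingValue?.children ?? []) {
    flattenInstance(child, path, out);
  }
  // Each repetition of a group collection gets an index so paths stay unique.
  inst.groupSettingCollectionValue?.forEach((group, i) => {
    for (const child of group.children ?? []) flattenInstance(child, `${path}[${i}]`, out);
  });
}

function parseSettingsCatalog(policy: { settings?: { settingInstance: SettingInstance }[] }): FlattenedSetting[] {
  const out: FlattenedSetting[] = [];
  for (const setting of policy.settings ?? []) flattenInstance(setting.settingInstance, '', out);
  return out;
}
```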

OMA-URI Parser

T030 [P] Implement parseOmaUri() for omaSettings[] arrays
T031 [P] Handle valueType mapping (string, int, boolean)
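For T030-T031, here is a minimal OMA-URI parser. It assumes each custom profile exposes an `omaSettings` array whose items carry `@odata.type`, `omaUri` and `value`, as in Graph's `windows10CustomConfiguration` resource. Only the three value kinds listed above are mapped; any other kind falls back to `'string'` in this sketch. The output row shape is hypothetical.

```typescript
type OmaValueType = 'string' | 'int' | 'boolean';

// @odata.type values for the three OMA-URI value kinds handled here.
const OMA_VALUE_TYPES: Record<string, OmaValueType> = {
  '#microsoft.graph.omaSettingString': 'string',
  '#microsoft.graph.omaSettingInteger': 'int',
  '#microsoft.graph.omaSettingBoolean': 'boolean',
};

interface OmaSetting {
  '@odata.type': string;
  displayName?: string;
  omaUri: string;
  value: unknown;
}

// Hypothetical output row: the OMA-URI is the path, the value is stored as text.
interface FlattenedOmaSetting {
  path: string;
  value: string;
  valueType: OmaValueType;
}

function parseOmaUri(policy: { omaSettings?: OmaSetting[] }): FlattenedOmaSetting[] {
  return (policy.omaSettings ?? []).map((s) => ({
    path: s.omaUri,
    value: String(s.value),
    // Unmapped kinds (XML, Base64, ...) fall back to 'string' in this sketch.
    valueType: OMA_VALUE_TYPES[s['@odata.type']] ?? 'string',
  }));
}
```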

Humanizer & Utilities

T032 Create worker/utils/humanizer.ts - Setting ID humanization
T033 [P] Implement humanizeSettingId() to remove technical prefixes and format names
T034 [P] Implement defaultEmptySetting() for policies with no settings
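Here is one possible shape for `humanizeSettingId()`. It assumes setting IDs look like either Settings Catalog definition IDs (`device_vendor_msft_policy_config_...`) or camelCase names. The prefix list is illustrative, not exhaustive. Run-together lowercase words such as `allowarchivescanning` cannot be split without a dictionary, so they are only capitalized.

```typescript
// Illustrative prefix list (assumption); longer prefixes come first so they win.
const TECHNICAL_PREFIXES = [
  'device_vendor_msft_policy_config_',
  'user_vendor_msft_policy_config_',
  'device_vendor_msft_',
  'user_vendor_msft_',
];

function humanizeSettingId(settingId: string): string {
  let id = settingId;
  const lower = id.toLowerCase();
  // Strip the first matching technical prefix, if any.
  for (const prefix of TECHNICAL_PREFIXES) {
    if (lower.startsWith(prefix)) {
      id = id.slice(prefix.length);
      break;
    }
  }
  return id
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // split camelCase: "allowCamera" -> "allow Camera"
    .split(/[_.\s]+/) // split on underscores, dots and whitespace
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join(' ');
}
```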

Validation

T035 Test parser with sample Graph API responses, verify >95% extraction rate

Phase 5: Database Persistence (7 tasks)

Goal: Implement Drizzle ORM upsert logic with conflict resolution

Database Operations

T036 Create worker/jobs/dbUpsert.ts - Drizzle ORM upsert function
T037 [P] Implement upsertPolicySettings() with batch insert
T038 [P] Configure onConflictDoUpdate with policy_settings_upsert_unique constraint
T039 [P] Update lastSyncedAt timestamp on every sync
T040 Map FlattenedSetting[] to PolicySetting insert format

Integration

T041 Connect syncPolicies() orchestrator: auth → fetch → parse → upsert
T042 Test full sync with real tenant data, verify database updates correctly

Phase 6: Frontend Integration (4 tasks)

Goal: Replace n8n webhook with BullMQ job creation in Server Action

Server Action Update

T043 Modify lib/actions/policySettings.ts - triggerPolicySync() function
T044 Remove n8n webhook call (fetch to N8N_SYNC_WEBHOOK_URL)
T045 Add BullMQ job creation (syncQueue.add('sync-tenant', { tenantId }))
T046 Test end-to-end: UI click "Sync Now" → job created → worker processes → database updated

Phase 7: Legacy Cleanup (9 tasks)

Goal: Remove all n8n-related code, files, and environment variables

File Deletion

T047 Delete app/api/policy-settings/route.ts (n8n ingestion API)
T048 Delete app/api/admin/tenants/route.ts (n8n polling API)

Environment Variable Cleanup

T049 Remove POLICY_API_SECRET from .env file
T050 Remove N8N_SYNC_WEBHOOK_URL from .env file
T051 Remove POLICY_API_SECRET from lib/env.mjs server schema
T052 Remove N8N_SYNC_WEBHOOK_URL from lib/env.mjs server schema
T053 Remove POLICY_API_SECRET from lib/env.mjs runtimeEnv
T054 Remove N8N_SYNC_WEBHOOK_URL from lib/env.mjs runtimeEnv

Verification

T055 Run grep search for n8n references: grep -rE "POLICY_API_SECRET|N8N_SYNC_WEBHOOK_URL" --exclude-dir=specs . → should return 0 results (-E is needed so that | acts as alternation)

Phase 8: Testing & Validation (6 tasks)

Goal: Comprehensive testing of new architecture

Unit Tests

T056 [P] Write unit tests for humanizer.ts
T057 [P] Write unit tests for retry.ts
T058 [P] Write unit tests for policyParser.ts

Integration Tests

T059 Write integration test for full syncPolicies() flow with mocked Graph API
T060 Write integration test for database upsert with conflict resolution

End-to-End Test

T061 E2E test: Start Redis + Worker, trigger sync from UI, verify database updates

Phase 9: Deployment (5 tasks)

Goal: Deploy worker process to production environment

Docker & Infrastructure

T062 Update docker-compose.yml for production (Redis service with persistence)
T063 Create Dockerfile for worker process (if separate container)
T064 Configure worker as background service (PM2, systemd, or Docker Compose)

Production Deployment

T065 Set REDIS_URL in production environment variables
T066 Deploy worker, monitor logs for first production sync

Dependencies Visualization

Phase 1 (Setup)
  ↓
Phase 2 (Worker Skeleton)
  ↓
Phase 3 (Graph Integration) ←─┐
  ↓                            │
Phase 4 (Transformation) ──────┤
  ↓                            │
Phase 5 (Database) ────────────┘
  ↓
Phase 6 (Frontend)
  ↓
Phase 7 (Cleanup)
  ↓
Phase 8 (Testing)
  ↓
Phase 9 (Deployment)

Parallel Opportunities:

Phase 3 & 4 can overlap (Graph integration while building parsers)
T002-T004 (env var updates) can be done in parallel
T006-T007 (Redis & Queue files) can be done in parallel
T015-T017 (auth) independent from T018-T021 (fetch)
T056-T058 (unit tests) can be done in parallel

Task Details

T001: Install Redis via Docker Compose

File: docker-compose.yml

Action: Add Redis service

```yaml
services:
  redis:
    image: redis:alpine
    ports:
      - '6379:6379'
    volumes:
      - redis-data:/data
    restart: unless-stopped

volumes:
  redis-data:
```

Verification: docker-compose up -d redis && docker-compose exec redis redis-cli ping returns PONG (running redis-cli inside the container avoids needing it on the host)

T002-T004: Environment Variable Setup

Files: .env, lib/env.mjs

Changes:

Add REDIS_URL=redis://localhost:6379 to .env
Add REDIS_URL: z.string().url() to server schema
Add REDIS_URL: process.env.REDIS_URL to runtimeEnv

Verification: npm run dev starts without env validation errors

T005: Install npm Dependencies

Command:

```shell
npm install bullmq ioredis @azure/identity
npm install -D tsx
```

Verification: Check package.json for new dependencies

T006: Create Redis Connection Wrapper

File: lib/queue/redis.ts

Implementation: See technical-notes.md section "BullMQ Setup"

Exports: redisConnection
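The `maxRetriesPerRequest: null` requirement from Common Pitfalls is easiest to enforce in one place: derive the connection options from `REDIS_URL` inside this wrapper. The sketch below assumes the wrapper passes a plain options object to `new IORedis(...)` or to BullMQ's `connection` setting. The helper name `redisOptionsFromUrl` is hypothetical. ioredis also accepts the URL string directly, so parsing mainly adds an early, explicit validation error.

```typescript
// Subset of ioredis connection options produced by this sketch.
interface RedisConnectionOptions {
  host: string;
  port: number;
  username?: string;
  password?: string;
  db: number;
  tls?: Record<string, never>; // an empty object enables TLS for rediss:// URLs
  maxRetriesPerRequest: null; // required by BullMQ workers
}

// Hypothetical helper: validate REDIS_URL and turn it into connection options.
function redisOptionsFromUrl(redisUrl: string): RedisConnectionOptions {
  const url = new URL(redisUrl); // throws on malformed input
  if (url.protocol !== 'redis:' && url.protocol !== 'rediss:') {
    throw new Error(`REDIS_URL must use redis:// or rediss://, got ${url.protocol}`);
  }
  const dbSegment = url.pathname.replace(/^\//, '');
  const db = dbSegment ? Number(dbSegment) : 0;
  if (!Number.isInteger(db) || db < 0) {
    throw new Error(`Invalid Redis database index in REDIS_URL: ${dbSegment}`);
  }
  return {
    host: url.hostname,
    port: url.port ? Number(url.port) : 6379,
    username: url.username ? decodeURIComponent(url.username) : undefined,
    password: url.password ? decodeURIComponent(url.password) : undefined,
    db,
    tls: url.protocol === 'rediss:' ? {} : undefined,
    maxRetriesPerRequest: null,
  };
}
```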

T007: Create BullMQ Queue

File: lib/queue/syncQueue.ts

Implementation: See technical-notes.md section "BullMQ Setup"

Exports: syncQueue

T009: Create Worker Entry Point

File: worker/index.ts

Implementation: See technical-notes.md section "Worker Implementation"

Features:

Worker listens on intune-sync-queue
Concurrency: 1 (sequential processing)
Event handlers for completed, failed, error
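For the structured logging and event handlers (T010-T012), here is a dependency-free sketch. Each log entry is a single JSON object on its own line. The handlers take a minimal job shape (`id`, `name`, `attemptsMade`, all of which BullMQ jobs expose) so they can be unit-tested without Redis. `createLogger`, `onCompleted`, `onFailed` and `onWorkerError` are hypothetical names. The wiring to a BullMQ `Worker` appears only as a comment.

```typescript
type LogLevel = 'info' | 'warn' | 'error';
type LogFields = Record<string, unknown>;

interface Logger {
  info(msg: string, fields?: LogFields): void;
  warn(msg: string, fields?: LogFields): void;
  error(msg: string, fields?: LogFields): void;
  child(extra: LogFields): Logger;
}

// Emits one JSON object per line; `write` is injectable so tests can capture output.
function createLogger(base: LogFields = {}, write: (line: string) => void = (line) => console.log(line)): Logger {
  // Core keys are spread last so context fields cannot overwrite them.
  const log = (level: LogLevel, msg: string, fields: LogFields = {}): void =>
    write(JSON.stringify({ ...base, ...fields, time: new Date().toISOString(), level, msg }));
  return {
    info: (msg, fields) => log('info', msg, fields),
    warn: (msg, fields) => log('warn', msg, fields),
    error: (msg, fields) => log('error', msg, fields),
    child: (extra) => createLogger({ ...base, ...extra }, write),
  };
}

// Minimal job shape; BullMQ jobs expose id, name and attemptsMade.
interface JobLike {
  id?: string;
  name: string;
  attemptsMade: number;
}

function onCompleted(log: Logger, job: JobLike): void {
  log.info('job completed', { jobId: job.id, jobName: job.name, attempts: job.attemptsMade });
}

// The failed event may arrive without a job object, hence the optional parameter.
function onFailed(log: Logger, job: JobLike | undefined, err: Error): void {
  log.error('job failed', { jobId: job?.id, jobName: job?.name, attempts: job?.attemptsMade, error: err.message });
}

function onWorkerError(log: Logger, err: Error): void {
  log.error('worker error', { error: err.message, stack: err.stack });
}

// Wiring inside worker/index.ts (not executed here):
//   worker.on('completed', (job) => onCompleted(log, job));
//   worker.on('failed', (job, err) => onFailed(log, job, err));
//   worker.on('error', (err) => onWorkerError(log, err));
```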

T015-T016: Azure AD Token Acquisition

File: worker/jobs/graphAuth.ts

Implementation: See technical-notes.md section "Authentication"

Function: getGraphAccessToken(): Promise<string>

Uses: @azure/identity ClientSecretCredential

T018-T021: Graph API Fetching

File: worker/jobs/graphFetch.ts

Functions:

fetchWithPagination<T>(url, token): Promise<T[]>
fetchAllPolicies(token): Promise<Policy[]>

Endpoints:

deviceManagement/deviceConfigurations
deviceManagement/deviceCompliancePolicies
deviceManagement/configurationPolicies
deviceManagement/intents
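Here is a sketch of `fetchWithPagination()` and `fetchAllPolicies()`. It assumes Node 18+ (global `fetch`) and uses the Graph beta base URL, on the assumption that the Settings Catalog and intents endpoints are only available under beta. The page fetcher is a parameter so tests can feed it canned pages. In the worker, each page request would also go through the T022 retry helper; the thrown error carries `status` for that purpose.

```typescript
// Assumption: Settings Catalog and intents are only exposed under the beta endpoint.
const GRAPH_BASE = 'https://graph.microsoft.com/beta';

const POLICY_ENDPOINTS = [
  'deviceManagement/deviceConfigurations',
  'deviceManagement/deviceCompliancePolicies',
  'deviceManagement/configurationPolicies',
  'deviceManagement/intents',
] as const;

interface GraphPage<T> {
  value: T[];
  '@odata.nextLink'?: string;
}

interface GraphPolicy {
  id: string;
  '@odata.type'?: string;
  [key: string]: unknown;
}

// Fetches one page; injectable so tests can serve canned responses.
type PageFetcher = (url: string, token: string) => Promise<GraphPage<unknown>>;

const httpGetJson: PageFetcher = async (url, token) => {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw Object.assign(new Error(`Graph request failed: ${url}`), { status: res.status });
  return (await res.json()) as GraphPage<unknown>;
};

// Follows @odata.nextLink until Graph stops returning one.
async function fetchWithPagination<T>(url: string, token: string, getPage: PageFetcher = httpGetJson): Promise<T[]> {
  const items: T[] = [];
  let next: string | undefined = url;
  while (next) {
    const page = await getPage(next, token);
    items.push(...(page.value as T[]));
    next = page['@odata.nextLink'];
  }
  return items;
}

// Queries the four policy endpoints concurrently and merges the results.
async function fetchAllPolicies(token: string, getPage: PageFetcher = httpGetJson): Promise<GraphPolicy[]> {
  const perEndpoint = await Promise.all(
    POLICY_ENDPOINTS.map((path) => fetchWithPagination<GraphPolicy>(`${GRAPH_BASE}/${path}`, token, getPage)),
  );
  return perEndpoint.reduce<GraphPolicy[]>((all, items) => all.concat(items), []);
}
```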

T024-T034: Policy Parser Implementation

File: worker/jobs/policyParser.ts

Functions:

detectPolicyType(odataType: string): string
parsePolicySettings(policy: any): FlattenedSetting[]
parseSettingsCatalog(policy: any): FlattenedSetting[]
parseOmaUri(policy: any): FlattenedSetting[]
extractValue(settingInstance: any): any

Reference: See technical-notes.md section "Flattening Strategy"
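The router can be sketched as a type check plus a parser registry. This sketch assumes:

- The `@odata.type` strings listed in `detectPolicyType` are the ones that matter; any other type falls through to `'other'`.
- `defaultEmptySetting()` is a parameterless placeholder row.

Some items may arrive without `@odata.type`. For those, the caller would have to supply the type, for example from the endpoint the item came from.

```typescript
type PolicyType = 'settingsCatalog' | 'omaUri' | 'compliance' | 'intent' | 'other';

interface Policy {
  id: string;
  '@odata.type'?: string;
  [key: string]: unknown;
}

// Hypothetical row shape shared by all parsers.
interface FlattenedSetting {
  path: string;
  value: string;
}

type Parser = (policy: Policy) => FlattenedSetting[];

// The type strings below are assumptions covering the fetched endpoints; extend as needed.
function detectPolicyType(odataType: string): PolicyType {
  if (odataType === '#microsoft.graph.deviceManagementConfigurationPolicy') return 'settingsCatalog';
  if (odataType === '#microsoft.graph.windows10CustomConfiguration') return 'omaUri';
  if (odataType === '#microsoft.graph.deviceManagementIntent') return 'intent';
  if (/CompliancePolicy$/.test(odataType)) return 'compliance';
  return 'other';
}

// Hypothetical placeholder row so policies without settings still show up after a sync.
function defaultEmptySetting(): FlattenedSetting {
  return { path: '(none)', value: 'No settings configured' };
}

// Routes a policy to the parser registered for its type.
function parsePolicySettings(policy: Policy, parsers: Partial<Record<PolicyType, Parser>>): FlattenedSetting[] {
  const parser = parsers[detectPolicyType(policy['@odata.type'] ?? '')];
  const settings = parser ? parser(policy) : [];
  return settings.length > 0 ? settings : [defaultEmptySetting()];
}
```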

T036-T040: Database Upsert

File: worker/jobs/dbUpsert.ts

Function: upsertPolicySettings(tenantId: string, settings: FlattenedSetting[])

Features:

Batch insert with Drizzle ORM
Conflict resolution on policy_settings_upsert_unique
Update lastSyncedAt timestamp

Reference: See technical-notes.md section "Database Upsert"
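Here is a sketch of the batching part of T037-T040. It assumes the conflict key behind `policy_settings_upsert_unique` is (tenantId, policyId, settingName) and treats rows as plain objects; the real insert type comes from the Drizzle schema. PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` that would update the same row twice in one statement, so duplicates are collapsed first and the last value wins. The default batch size of 500 is an arbitrary starting point for T038 tuning. Each resulting batch would then go to one Drizzle `insert(...).values(batch).onConflictDoUpdate(...)` call.

```typescript
// Hypothetical row shape; the real insert type comes from the Drizzle schema.
interface PolicySettingRow {
  tenantId: string;
  policyId: string;
  settingName: string;
  settingValue: string;
  lastSyncedAt: Date;
}

type NewPolicySetting = Omit<PolicySettingRow, 'lastSyncedAt'>;

// Assumed to mirror the columns of policy_settings_upsert_unique.
const conflictKey = (r: NewPolicySetting): string => [r.tenantId, r.policyId, r.settingName].join('\u0000');

// Collapse duplicates so no statement updates the same row twice (the last value wins).
function dedupeByConflictKey<T extends NewPolicySetting>(rows: T[]): T[] {
  const byKey = new Map<string, T>();
  for (const row of rows) byKey.set(conflictKey(row), row);
  return Array.from(byKey.values());
}

function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error('batch size must be a positive integer');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Stamps lastSyncedAt, dedupes, and splits the rows into one batch per upsert statement.
function planUpsertBatches(settings: NewPolicySetting[], syncedAt: Date, batchSize = 500): PolicySettingRow[][] {
  const stamped: PolicySettingRow[] = settings.map((s) => ({ ...s, lastSyncedAt: syncedAt }));
  return chunk(dedupeByConflictKey(stamped), batchSize);
}
```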

T043-T045: Frontend Integration

File: lib/actions/policySettings.ts

Function: triggerPolicySync(tenantId: string)

Before:

```typescript
const response = await fetch(env.N8N_SYNC_WEBHOOK_URL, {
  method: 'POST',
  body: JSON.stringify({ tenantId }),
});
```

After:

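```typescript
import { syncQueue } from '@/lib/queue/syncQueue';

export async function triggerPolicySync(tenantId: string) {
  // BullMQ serializes job data to JSON, so send the timestamp as an ISO string
  const job = await syncQueue.add('sync-tenant', {
    tenantId,
    triggeredAt: new Date().toISOString(),
  });
  return { jobId: job.id };
}
```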
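The T020 unit test is easier to write if the queue is injected. This sketch assumes the action only needs `add(name, data)` resolving to an object with an optional `id`; BullMQ's `Job.id` is optional. `makeTriggerPolicySync`, `SyncQueueLike` and the `SyncJobData` shape are hypothetical names. The app would pass the real `syncQueue`, and tests would pass a fake that records calls.

```typescript
// Hypothetical payload shape for types/syncJob.ts.
interface SyncJobData {
  tenantId: string;
  triggeredAt: string; // ISO-8601, since job data is stored as JSON
}

// Hypothetical minimal slice of the queue API the action depends on.
interface SyncQueueLike {
  add(name: 'sync-tenant', data: SyncJobData): Promise<{ id?: string }>;
}

// Factory so tests can supply a fake queue and a fixed clock.
function makeTriggerPolicySync(queue: SyncQueueLike, now: () => Date = () => new Date()) {
  return async function triggerPolicySync(tenantId: string): Promise<{ jobId: string | undefined }> {
    if (!tenantId) throw new Error('tenantId is required');
    const job = await queue.add('sync-tenant', { tenantId, triggeredAt: now().toISOString() });
    return { jobId: job.id };
  };
}
```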

Success Criteria Mapping

| Task(s) | Success Criterion |
|---|---|
| T001-T008 | SC-001: Job creation <200ms |
| T041-T042 | SC-002: Sync 50 policies in <30s |
| T019-T021 | SC-003: Pagination handles 100+ policies |
| T024-T035 | SC-004: >95% setting extraction |
| T022-T023 | SC-005: Automatic retry on 429 |
| T047-T055 | SC-006: Zero n8n references |
| T061, T066 | SC-007: Worker stable 1+ hour |
| T041-T042 | SC-008: No data loss on re-sync |

Estimated Effort

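| Phase | Tasks | Hours | Priority |
|---|---|---|---|
| 1. Setup | 8 | 1-2h | P1 |
| 2. Worker Skeleton | 6 | 2h | P1 |
| 3. Graph Integration | 9 | 4h | P1 |
| 4. Transformation | 12 | 6h | P1 |
| 5. Database | 7 | 3h | P1 |
| 6. Frontend | 4 | 2h | P1 |
| 7. Cleanup | 9 | 2h | P1 |
| 8. Testing | 6 | 4h | P1 |
| 9. Deployment | 5 | 3h | P1 |
| Total | 66 | 27-28h | |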

Implementation Notes

Task Execution Order

Sequential Tasks (blocking):

T001 → T002-T004 → T005 (setup before queue)
T006-T007 → T008 (Redis before queue test)
T009 → T013 (worker before sync skeleton)
T041 → T042 (integration before test)
T043-T045 → T046 (implementation before E2E test)

Parallel Tasks (can be done simultaneously):

T002, T003, T004 (env var updates)
T006, T007 (Redis + Queue files)
T010, T011, T012 (worker event handlers)
T015-T017, T018-T021 (auth independent from fetch)
T027-T029, T030-T031 (different parser types)
T047, T048 (file deletions)
T049-T054 (env var removals)
T056, T057, T058 (unit tests)

Common Pitfalls

Redis Connection: Ensure maxRetriesPerRequest: null for BullMQ compatibility
Graph API: Handle 429 rate limiting with exponential backoff
Pagination: Always follow @odata.nextLink until undefined
Upsert: Use correct constraint name policy_settings_upsert_unique
Worker Deployment: Don't forget concurrency: 1 for sequential processing

Testing Checkpoints

After T008: Redis + Queue working
After T014: Worker starts successfully
After T017: Token acquisition works
After T023: Graph API fetch with pagination works
After T035: Parser extracts >95% of settings
After T042: Full sync updates database
After T046: UI → Worker → DB flow complete
After T055: No n8n references remain
After T061: E2E test passes

Task Status: Ready for Implementation
Next Action: Start with Phase 1 (T001-T008) - Setup & Infrastructure
