# Implementation Plan: Backend Architecture Pivot

**Feature Branch**: `005-backend-arch-pivot`
**Created**: 2025-12-09
**Spec**: [spec.md](./spec.md)
**Status**: Ready for Implementation

---
## Executive Summary

**Goal**: Migrate policy synchronization from the n8n low-code backend to a code-first TypeScript backend with a BullMQ job queue.

**Impact**: Removes the external n8n dependency, improves maintainability, enables AI-assisted refactoring, and provides a foundation for future scheduled sync features.

**Complexity**: HIGH - Requires new infrastructure (Redis, BullMQ), deployment of a worker process, and careful porting of the data transformation logic.

---
## Technical Context

### Current Architecture (n8n-based)

```
User clicks "Sync Now"
    ↓
Server Action: triggerPolicySync()
    ↓
HTTP POST → n8n Webhook (N8N_SYNC_WEBHOOK_URL)
    ↓
n8n Workflow:
  1. Microsoft Graph Authentication
  2. Fetch Policies (4 endpoints with pagination)
  3. JavaScript Code Node: Deep Flattening Logic
  4. HTTP POST → TenantPilot Ingestion API
    ↓
API Route: /api/policy-settings (validates POLICY_API_SECRET)
    ↓
Drizzle ORM: Insert/Update policy_settings table
```

**Problems**:
- External dependency (n8n instance required)
- Complex transformation logic hidden in n8n Code Node
- Hard to test, version control, and refactor
- No AI assistance for n8n code
- Additional API security layer needed (POLICY_API_SECRET)

### Target Architecture (BullMQ-based)

```
User clicks "Sync Now"
    ↓
Server Action: triggerPolicySync()
    ↓
BullMQ: Add job to Redis queue "intune-sync-queue"
    ↓
Worker Process (TypeScript):
  1. Microsoft Graph Authentication (@azure/identity)
  2. Fetch Policies (4 endpoints with pagination)
  3. TypeScript: Deep Flattening Logic
  4. Drizzle ORM: Direct Insert/Update
    ↓
Database: policy_settings table
```

**Benefits**:
- No external services besides Redis
- All logic in TypeScript (version-controlled, testable)
- AI-assisted refactoring possible
- Simpler security model (no API bridge)
- Foundation for scheduled syncs

---
## Constitution Check *(mandatory)*

### Compliance Verification

| Principle | Status | Notes |
|-----------|--------|-------|
| **I. Server-First Architecture** | ✅ COMPLIANT | Sync is triggered from a Server Action and processed server-side by the background worker; no client fetches |
| **II. TypeScript Strict Mode** | ✅ COMPLIANT | All worker code in TypeScript strict mode, fully typed Graph API responses |
| **III. Drizzle ORM Integration** | ✅ COMPLIANT | Worker uses Drizzle for all DB operations, no raw SQL |
| **IV. Shadcn UI Components** | ✅ COMPLIANT | No UI changes (frontend only triggers job, uses existing components) |
| **V. Azure AD Multi-Tenancy** | ✅ COMPLIANT | Uses existing Azure AD Client Credentials for Graph API access |

### Risk Assessment

**HIGH RISK**: Worker deployment as a separate process (requires deployment configuration for a second service: Docker, PM2, or Systemd)

**MEDIUM RISK**: Graph API rate limiting handling (needs robust retry logic)

**LOW RISK**: BullMQ integration (well-documented library, standard Redis setup)

### Justification

The architecture pivot is necessary to:
1. Remove external n8n dependency (reduces operational complexity)
2. Enable AI-assisted development (TypeScript vs. n8n visual flows)
3. Improve testability (unit/integration tests for worker logic)
4. Prepare for Phase 2 features (scheduled syncs, multi-tenant parallel processing)

**Approved**: Constitution compliance verified, complexity justified by maintainability gains.

---
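The medium-risk item in the Risk Assessment (Graph API rate limiting) depends on retry logic, which the plan assigns to `worker/utils/retry.ts`. The following is a minimal sketch of exponential backoff under stated assumptions: the names `withRetry`, `backoffDelay`, and `RetryOptions` are hypothetical and not taken from the spec, and the delay values are placeholders.

```typescript
// Hypothetical sketch of worker/utils/retry.ts. All names and option fields
// here are illustrative assumptions, not part of the spec.
export interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

// Delay before retry number `retryIndex` (0-based): base * 2^retryIndex, capped.
export function backoffDelay(
  retryIndex: number,
  baseDelayMs: number,
  maxDelayMs: number,
): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** retryIndex);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds, a non-retryable error occurs,
// or `maxAttempts` is reached. The last error is rethrown.
export async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
  isRetryable: (error: unknown) => boolean = () => true,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt || !isRetryable(error)) break;
      await sleep(backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

Microsoft Graph signals throttling with HTTP 429 and a `Retry-After` header, so a production version would likely prefer that header over the computed delay when it is present. The `isRetryable` callback is where such status-code checks could go.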
## File Tree & Changes

```
tenantpilot/
├── .env                            # [MODIFIED] Add REDIS_URL, remove POLICY_API_SECRET + N8N_SYNC_WEBHOOK_URL
├── (Redis provided by deployment)  # No `docker-compose.yml` required; ensure `REDIS_URL` is set by Dokploy
├── package.json                    # [MODIFIED] Add bullmq, ioredis, @azure/identity, tsx dependencies
│
├── lib/
│   ├── env.mjs                     # [MODIFIED] Add REDIS_URL validation, remove POLICY_API_SECRET + N8N_SYNC_WEBHOOK_URL
│   ├── queue/
│   │   ├── redis.ts                # [NEW] Redis connection for BullMQ
│   │   └── syncQueue.ts            # [NEW] BullMQ Queue definition for "intune-sync-queue"
│   └── actions/
│       └── policySettings.ts       # [MODIFIED] Replace n8n webhook call with BullMQ job creation
│
├── worker/
│   ├── index.ts                    # [NEW] BullMQ Worker entry point
│   ├── jobs/
│   │   ├── syncPolicies.ts         # [NEW] Main sync orchestration logic
│   │   ├── graphAuth.ts            # [NEW] Azure AD token acquisition
│   │   ├── graphFetch.ts           # [NEW] Microsoft Graph API calls with pagination
│   │   ├── policyParser.ts         # [NEW] Deep flattening & transformation logic
│   │   └── dbUpsert.ts             # [NEW] Drizzle ORM upsert operations
│   └── utils/
│       ├── humanizer.ts            # [NEW] Setting ID humanization
│       └── retry.ts                # [NEW] Exponential backoff retry logic
│
├── app/api/
│   ├── policy-settings/
│   │   └── route.ts                # [DELETED] n8n ingestion API no longer needed
│   └── admin/
│       └── tenants/
│           └── route.ts            # [DELETED] n8n polling API no longer needed
│
└── specs/005-backend-arch-pivot/
    ├── spec.md                     # ✅ Complete
    ├── plan.md                     # 📝 This file
    ├── technical-notes.md          # ✅ Complete (implementation reference)
    └── tasks.md                    # 🔜 Generated next
```

---
## Phase Breakdown

### Phase 1: Setup & Infrastructure (T001-T008)

**Goal**: Prepare environment, install dependencies, and wire the app to the provisioned Redis instance

**Tasks**:
- T001: Confirm `REDIS_URL` is provided by Dokploy and obtain connection details
- T002-T004: Add `REDIS_URL` to local `.env` (for development) and to `lib/env.mjs` runtime validation
- T005: Install npm packages: `bullmq`, `ioredis`, `@azure/identity`, `tsx`
- T006-T007: Create Redis connection and BullMQ Queue
- T008: Test infrastructure (connect to provided Redis from local/dev environment)

**Deliverables**:
- Connection details for Redis from Dokploy documented
- Environment variables validated (local + deploy)
- Dependencies in `package.json`
- Queue operational using the provided Redis

**Exit Criteria**: `npm run dev` starts without env validation errors and the queue accepts jobs against the provided Redis

---
### Phase 2: Worker Process Skeleton (T009-T014)

**Goal**: Create the worker process entry point and basic job processing infrastructure

**Tasks**:
- T009: Create `worker/index.ts` - BullMQ Worker entry point
- T010-T012: Add npm script, event handlers, structured logging
- T013: Create sync orchestration skeleton
- T014: Test worker startup

**Deliverables**:
- Worker process can be started via `npm run worker:start`
- Jobs flow from queue → worker
- Event logging operational

**Exit Criteria**: Worker logs "Processing job X" when a job is added to the queue

---
### Phase 3: Microsoft Graph Integration (T015-T023)

**Goal**: Implement Azure AD authentication and Microsoft Graph API data fetching with pagination

**Tasks**:
- T015-T017: Create `worker/jobs/graphAuth.ts` - Azure AD token acquisition
- T018-T021: Create `worker/jobs/graphFetch.ts` - Fetch from 4 endpoints with pagination
- T022: Create `worker/utils/retry.ts` - Exponential backoff
- T023: Test with real tenant data

**Deliverables**:
- `getGraphAccessToken()` returns valid token
- `fetchAllPolicies()` returns all policies from 4 endpoints
- Pagination handled correctly (follows `@odata.nextLink`)
- Rate limiting handled with retry

**Exit Criteria**: Worker successfully fetches >50 policies for test tenant

---
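The pagination deliverable in Phase 3 (following `@odata.nextLink`) can be illustrated with a small loop. This is a sketch only: `fetchAllPages`, `GraphPage`, and the injected `fetchPage` function are hypothetical names, and the real `fetchAllPolicies()` would additionally handle authentication headers and the four endpoints.

```typescript
// Shape of a Microsoft Graph collection response; only the fields used here.
interface GraphPage<T> {
  value: T[];
  "@odata.nextLink"?: string;
}

// Hypothetical injected fetcher: given a URL, returns the parsed JSON page.
// In the worker this would wrap fetch() plus the bearer token.
type FetchPage<T> = (url: string) => Promise<GraphPage<T>>;

// Collects items from every page by following @odata.nextLink until absent.
// `maxPages` is an assumed safety limit against a looping nextLink.
export async function fetchAllPages<T>(
  firstUrl: string,
  fetchPage: FetchPage<T>,
  maxPages = 1000,
): Promise<T[]> {
  const items: T[] = [];
  let url: string | undefined = firstUrl;
  let pages = 0;
  while (url !== undefined) {
    pages += 1;
    if (pages > maxPages) {
      throw new Error(`Pagination exceeded ${maxPages} pages`);
    }
    const page: GraphPage<T> = await fetchPage(url);
    items.push(...page.value);
    url = page["@odata.nextLink"];
  }
  return items;
}
```

Injecting `fetchPage` keeps the loop testable without network access, which fits the integration-test scenario of mocking Graph API responses.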
### Phase 4: Data Transformation (T024-T035)

**Goal**: Port n8n flattening logic to TypeScript

**Tasks**:
1. Create `worker/jobs/policyParser.ts` - Policy type detection & routing
2. Implement Settings Catalog parser (`settings[]` → flat key-value)
3. Implement OMA-URI parser (`omaSettings[]` → flat key-value)
4. Create `worker/utils/humanizer.ts` - Setting ID humanization
5. Handle empty policies (default placeholder setting)
6. Test: Parse sample policies, verify output structure

**Deliverables**:
- `parsePolicySettings()` converts Graph response → FlattenedSetting[]
- Humanizer converts technical IDs → readable names
- Empty policies get "(No settings configured)" entry

**Exit Criteria**: 95%+ of sample settings are correctly extracted and formatted

---
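To make the Phase 4 deliverables concrete, here is a simplified sketch of the parser and humanizer. It is not the n8n logic being ported: the `settingValue` field, the prefix pattern, and the exact Graph field paths read below are assumptions for illustration, and nested child settings (the "deep" part of the flattening) are deliberately left out.

```typescript
// Assumed output shape; `settingName` matches the upsert key in this plan,
// `settingValue` is a hypothetical field name.
export interface FlattenedSetting {
  settingName: string;
  settingValue: string;
}

// Assumed prefix pattern for technical setting IDs.
const VENDOR_PREFIX = /^(device|user)_vendor_msft_(policy_config_)?/i;

// Strips the vendor prefix and title-cases the remaining underscore-separated parts.
export function humanizeSettingId(settingId: string): string {
  return settingId
    .replace(VENDOR_PREFIX, "")
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join(" ");
}

// Partial, assumed view of the two policy shapes named in the task list.
interface RawPolicy {
  settings?: Array<{
    settingInstance?: {
      settingDefinitionId?: string;
      simpleSettingValue?: { value?: unknown };
      choiceSettingValue?: { value?: string };
    };
  }>;
  omaSettings?: Array<{ displayName?: string; omaUri?: string; value?: unknown }>;
}

export function parsePolicySettings(policy: RawPolicy): FlattenedSetting[] {
  const flattened: FlattenedSetting[] = [];

  // Settings Catalog: settings[] -> flat key-value (top level only in this sketch).
  for (const entry of policy.settings ?? []) {
    const instance = entry.settingInstance;
    if (!instance?.settingDefinitionId) continue;
    const raw = instance.simpleSettingValue?.value ?? instance.choiceSettingValue?.value ?? "";
    flattened.push({
      settingName: humanizeSettingId(instance.settingDefinitionId),
      settingValue: String(raw),
    });
  }

  // OMA-URI: omaSettings[] -> flat key-value.
  for (const oma of policy.omaSettings ?? []) {
    const name = oma.displayName ?? oma.omaUri;
    if (!name) continue;
    flattened.push({ settingName: name, settingValue: String(oma.value ?? "") });
  }

  // Empty policies get a placeholder entry so they remain visible.
  if (flattened.length === 0) {
    flattened.push({ settingName: "(No settings configured)", settingValue: "" });
  }
  return flattened;
}
```

Keeping the parser a pure function of the Graph response makes it straightforward to cover with the unit tests listed under Testing Strategy.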
### Phase 5: Database Persistence (T036-T043)

**Goal**: Implement Drizzle ORM upsert logic

**Tasks**:
1. Create `worker/jobs/dbUpsert.ts` - Batch upsert with conflict resolution
2. Use existing `policy_settings` table schema
3. Leverage `policy_settings_upsert_unique` constraint (tenantId + graphPolicyId + settingName)
4. Update `lastSyncedAt` on every sync
5. Test: Run full sync, verify data in DB

**Deliverables**:
- `upsertPolicySettings()` inserts new & updates existing settings
- No duplicate settings created
- `lastSyncedAt` updated correctly

**Exit Criteria**: Full sync for test tenant completes successfully, data visible in DB

---
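One detail worth planning for in the Phase 5 batch upsert: PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` statement that would touch the same row twice, so a batch must not contain two rows with the same (tenantId, graphPolicyId, settingName) key. The sketch below shows a pre-processing step that could run before the rows are handed to Drizzle's `onConflictDoUpdate`. The helper names, the `settingValue` field, and the batch size are assumptions, not part of the spec.

```typescript
// Assumed row shape; the three key fields come from the unique constraint
// named in the plan, `settingValue` is a hypothetical payload field.
export interface PolicySettingRow {
  tenantId: string;
  graphPolicyId: string;
  settingName: string;
  settingValue: string;
}

// Keeps one row per upsert key. A later row replaces the value of an earlier
// one, while the key keeps its first-seen position in the output.
export function dedupeByUpsertKey(rows: PolicySettingRow[]): PolicySettingRow[] {
  const byKey = new Map<string, PolicySettingRow>();
  for (const row of rows) {
    const key = JSON.stringify([row.tenantId, row.graphPolicyId, row.settingName]);
    byKey.set(key, row);
  }
  return Array.from(byKey.values());
}

// Splits rows into batches so a single INSERT stays at a manageable size.
export function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A possible use inside `upsertPolicySettings()` would be to iterate over `chunk(dedupeByUpsertKey(rows), 500)` and issue one upsert per batch, setting `lastSyncedAt` in the update clause.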
### Phase 6: Frontend Integration (T044-T051)

**Goal**: Replace n8n webhook with BullMQ job creation

**Tasks**:
1. Modify `lib/actions/policySettings.ts` → `triggerPolicySync()`
2. Remove n8n webhook call (`fetch(env.N8N_SYNC_WEBHOOK_URL)`)
3. Replace with BullMQ job creation (`syncQueue.add(...)`)
4. Return job ID to frontend
5. Test: Click "Sync Now", verify job created & processed

**Deliverables**:
- "Sync Now" button triggers BullMQ job
- User sees immediate feedback (no blocking)
- Worker processes job in background

**Exit Criteria**: End-to-end sync works from UI → Queue → Worker → DB

---
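The core of the Phase 6 change is small: enqueue a job and return its ID. The sketch below isolates that step behind a minimal `QueueLike` interface so it can be exercised without Redis. BullMQ's `Queue.add(name, data, opts)` should be structurally compatible with it, but `enqueuePolicySync`, the result type, and the retry settings are assumptions for illustration; the job name `sync-tenant` is taken from the target flow diagram.

```typescript
interface SyncJobData {
  tenantId: string;
}

// Subset of BullMQ job options used here (attempts and backoff exist in BullMQ).
interface JobOptionsLike {
  attempts?: number;
  backoff?: { type: "exponential" | "fixed"; delay: number };
}

// Minimal structural view of the queue; a BullMQ Queue instance is expected to fit.
interface QueueLike {
  add(name: string, data: SyncJobData, opts?: JobOptionsLike): Promise<{ id?: string }>;
}

// Hypothetical result shape returned to the frontend.
type TriggerResult =
  | { success: true; jobId: string }
  | { success: false; error: string };

export async function enqueuePolicySync(
  queue: QueueLike,
  tenantId: string,
): Promise<TriggerResult> {
  if (tenantId.trim() === "") {
    return { success: false, error: "Missing tenantId" };
  }
  try {
    // Assumed retry policy: 3 attempts with exponential backoff starting at 5s.
    const job = await queue.add(
      "sync-tenant",
      { tenantId },
      { attempts: 3, backoff: { type: "exponential", delay: 5000 } },
    );
    if (!job.id) {
      return { success: false, error: "Queue did not return a job ID" };
    }
    return { success: true, jobId: job.id };
  } catch (error) {
    // Typically Redis being unreachable; report instead of throwing to the UI.
    const message = error instanceof Error ? error.message : "Failed to enqueue sync job";
    return { success: false, error: message };
  }
}
```

In `triggerPolicySync()` this would be called with the queue exported from `lib/queue/syncQueue.ts`, after the existing authentication and tenant checks.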
### Phase 7: Legacy Cleanup (T052-T056)

**Goal**: Remove all n8n-related code and configuration

**Tasks**:
1. Delete `app/api/policy-settings/route.ts` (n8n ingestion API)
2. Delete `app/api/admin/tenants/route.ts` (n8n polling API)
3. Remove `POLICY_API_SECRET` from `.env` and `lib/env.mjs`
4. Remove `N8N_SYNC_WEBHOOK_URL` from `.env` and `lib/env.mjs`
5. Grep search for remaining references (should be 0)
6. Update documentation (remove n8n setup instructions)

**Deliverables**:
- No n8n-related files in codebase
- No n8n-related env vars
- Clean grep search results

**Exit Criteria**: `grep -rE "POLICY_API_SECRET|N8N_SYNC_WEBHOOK_URL" . --exclude-dir=specs --exclude-dir=node_modules --exclude-dir=.git` returns no matches

---
### Phase 8: Testing & Validation (T057-T061)

**Goal**: Comprehensive testing of new architecture

**Tasks**:
1. Unit tests for flattening logic
2. Integration tests for worker jobs
3. End-to-end test: UI → Queue → Worker → DB
4. Load test: 100+ policies sync
5. Error handling test: Graph API failures, Redis unavailable
6. Memory leak test: Worker runs 1+ hour with 10+ jobs

**Deliverables**:
- Test suite with >80% coverage for worker code
- All edge cases verified
- Performance benchmarks met (SC-001 to SC-008)

**Exit Criteria**: All tests pass, no regressions in existing features

---
### Phase 9: Deployment (T062-T066)

**Goal**: Deploy worker process to production

**Tasks**:
1. Ensure `REDIS_URL` is set in the production environment (provided by Dokploy); no Docker Compose Redis is required
2. Configure worker as background service (PM2, Systemd, or Docker)
3. Monitor worker logs for first production sync
4. Verify sync completes successfully
5. Document worker deployment process

**Deliverables**:
- Worker running as persistent service
- Redis accessible from worker
- Production sync successful

**Exit Criteria**: Production sync works end-to-end, no errors in logs

---
## Key Technical Decisions

### 1. BullMQ vs. Other Queue Libraries

**Decision**: Use BullMQ

**Rationale**:
- Modern, actively maintained (vs. Kue, Bull)
- TypeScript-first design
- Built-in retry, rate limiting, priority queues
- Excellent documentation
- Redis-based (simpler than RabbitMQ/Kafka)

**Alternatives Considered**:
- **Bee-Queue**: Lighter but fewer features
- **Agenda**: MongoDB-based (adds extra dependency)
- **AWS SQS**: Vendor lock-in, requires AWS setup

---

### 2. Worker Process Architecture

**Decision**: Single worker process, sequential job processing (concurrency: 1)

**Rationale**:
- Simpler implementation (no race conditions)
- Microsoft Graph applies rate limits per tenant
- Database upsert logic easier without concurrency
- Can scale later if needed (multiple workers)

**Alternatives Considered**:
- **Parallel Processing**: Higher complexity, potential conflicts
- **Lambda/Serverless**: Cold starts, harder debugging

---
### 3. Token Acquisition Strategy

**Decision**: Use `@azure/identity` ClientSecretCredential

**Rationale**:
- Official Microsoft library
- Handles token refresh automatically
- TypeScript support
- Simpler than manual OAuth flow

**Alternatives Considered**:
- **Manual fetch()**: More code, error-prone
- **MSAL Node**: Overkill for server-side client credentials

---

### 4. Flattening Algorithm

**Decision**: Port n8n logic 1:1 initially, refactor later

**Rationale**:
- Minimize risk (proven logic)
- Faster migration (no re-design needed)
- Can optimize in Phase 2 after validation

**Alternatives Considered**:
- **Re-design from scratch**: Higher risk, longer timeline

---

### 5. Database Schema Changes

**Decision**: No schema changes needed

**Rationale**:
- Existing `policy_settings` table has required fields
- UNIQUE constraint already supports upsert logic
- `lastSyncedAt` field exists for tracking

**Alternatives Considered**:
- **Add job tracking table**: Overkill for MVP (BullMQ handles this)

---
## Data Flow Diagrams

### Current Flow (n8n)

```mermaid
sequenceDiagram
    participant User
    participant UI as Next.js UI
    participant SA as Server Action
    participant n8n as n8n Webhook
    participant API as Ingestion API
    participant DB as PostgreSQL

    User->>UI: Click "Sync Now"
    UI->>SA: triggerPolicySync(tenantId)
    SA->>n8n: POST /webhook
    n8n->>n8n: Fetch Graph API
    n8n->>n8n: Transform Data
    n8n->>API: POST /api/policy-settings
    API->>API: Validate API Secret
    API->>DB: Insert/Update
    DB-->>API: Success
    API-->>n8n: 200 OK
    n8n-->>SA: 200 OK
    SA-->>UI: Success
    UI-->>User: Toast "Sync started"
```

### Target Flow (BullMQ)

```mermaid
sequenceDiagram
    participant User
    participant UI as Next.js UI
    participant SA as Server Action
    participant Queue as Redis Queue
    participant Worker as Worker Process
    participant Graph as MS Graph API
    participant DB as PostgreSQL

    User->>UI: Click "Sync Now"
    UI->>SA: triggerPolicySync(tenantId)
    SA->>Queue: Add job "sync-tenant"
    Queue-->>SA: Job ID
    SA-->>UI: Success (immediate)
    UI-->>User: Toast "Sync started"

    Note over Worker: Background Processing
    Worker->>Queue: Pick job
    Worker->>Graph: Fetch policies
    Graph-->>Worker: Policy data
    Worker->>Worker: Transform data
    Worker->>DB: Upsert settings
    DB-->>Worker: Success
    Worker->>Queue: Mark job complete
```

---
## Environment Variables

### Changes Required

**Add**:
```bash
REDIS_URL=redis://localhost:6379
```

**Remove**:
```bash
# Delete these lines:
POLICY_API_SECRET=...
N8N_SYNC_WEBHOOK_URL=...
```

### Updated `lib/env.mjs`

```typescript
// Imports assumed (not shown in the original snippet): `createEnv` from
// @t3-oss/env-nextjs and `z` from zod. Keep whatever the file already imports.
import { createEnv } from "@t3-oss/env-nextjs";
import { z } from "zod";

export const env = createEnv({
  server: {
    DATABASE_URL: z.string().url(),
    NEXTAUTH_SECRET: z.string().min(1),
    NEXTAUTH_URL: z.string().url(),
    AZURE_AD_CLIENT_ID: z.string().min(1),
    AZURE_AD_CLIENT_SECRET: z.string().min(1),
    REDIS_URL: z.string().url(), // ADD THIS
    RESEND_API_KEY: z.string().optional(),
    STRIPE_SECRET_KEY: z.string().optional(),
    // ... other Stripe vars
    // REMOVE: POLICY_API_SECRET
    // REMOVE: N8N_SYNC_WEBHOOK_URL
  },
  client: {},
  runtimeEnv: {
    DATABASE_URL: process.env.DATABASE_URL,
    NEXTAUTH_SECRET: process.env.NEXTAUTH_SECRET,
    NEXTAUTH_URL: process.env.NEXTAUTH_URL,
    AZURE_AD_CLIENT_ID: process.env.AZURE_AD_CLIENT_ID,
    AZURE_AD_CLIENT_SECRET: process.env.AZURE_AD_CLIENT_SECRET,
    REDIS_URL: process.env.REDIS_URL, // ADD THIS
    RESEND_API_KEY: process.env.RESEND_API_KEY,
    STRIPE_SECRET_KEY: process.env.STRIPE_SECRET_KEY,
    // ... other vars
  },
});
```

---
## Testing Strategy

### Unit Tests

**Target Coverage**: 80%+ for worker code

**Files to Test**:
- `worker/utils/humanizer.ts` - Setting ID transformation
- `worker/jobs/policyParser.ts` - Flattening logic
- `worker/utils/retry.ts` - Backoff algorithm

**Example**:
```typescript
describe('humanizeSettingId', () => {
  it('removes vendor prefix', () => {
    expect(humanizeSettingId('device_vendor_msft_policy_config_wifi'))
      .toBe('Wifi');
  });
});
```

---

### Integration Tests

**Target**: Full worker job processing

**Scenario**:
1. Mock Microsoft Graph API responses
2. Add job to queue
3. Verify worker processes job
4. Check database for inserted settings

**Example**:
```typescript
describe('syncPolicies', () => {
  it('fetches and stores policies', async () => {
    await syncPolicies('test-tenant-123');
    const settings = await db.query.policySettings.findMany({
      where: eq(policySettings.tenantId, 'test-tenant-123'),
    });
    expect(settings.length).toBeGreaterThan(0);
  });
});
```

---

### End-to-End Test

**Scenario**:
1. Start Redis + Worker
2. Login to UI
3. Navigate to `/search`
4. Click "Sync Now"
5. Verify:
   - Job created in Redis
   - Worker picks up job
   - Database updated
   - UI shows success message

---
## Rollback Plan

**If migration fails in production**:

1. **Immediate**: Revert to previous Docker image (with n8n integration)
2. **Restore env vars**: Re-add `POLICY_API_SECRET` and `N8N_SYNC_WEBHOOK_URL`
3. **Verify**: n8n webhook accessible, sync works
4. **Post-mortem**: Document failure reason, plan fixes

**Data Safety**: No data loss risk (upsert logic preserves existing data)

---
## Performance Targets

Based on Success Criteria (SC-001 to SC-008):

| Metric | Target | Measurement |
|--------|--------|-------------|
| Job Creation | <200ms | Server Action response time |
| Sync Duration (50 policies) | <30s | Worker job duration |
| Setting Extraction | >95% | Manual validation with sample data |
| Worker Stability | 1+ hour, 10+ jobs | Memory profiling |
| Pagination | 100% | Test with 100+ policies tenant |

---
## Dependencies

### npm Packages

```json
{
  "dependencies": {
    "bullmq": "^5.0.0",
    "ioredis": "^5.3.0",
    "@azure/identity": "^4.0.0"
  },
  "devDependencies": {
    "tsx": "^4.0.0"
  }
}
```

### Infrastructure

- **Redis**: 7.x (via Docker or external service)
- **Node.js**: 20+ (for worker process)

---
## Monitoring & Observability

### Worker Logs

**Format**: Structured JSON logs

**Key Events**:
- Job started: `{ event: "job_start", jobId, tenantId, timestamp }`
- Job completed: `{ event: "job_complete", jobId, duration, settingsCount }`
- Job failed: `{ event: "job_failed", jobId, error, stack }`

**Storage**: Write to file or stdout (captured by Docker/PM2)

---
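The three worker log events listed under Worker Logs map naturally onto a discriminated union, which lets the compiler check that each event carries its required fields. This is a sketch with hypothetical helper names (`formatLogLine`, `logEvent`); adding `timestamp` to every event, rather than only `job_start`, is an assumption.

```typescript
// One variant per key event from the plan.
type WorkerLogEvent =
  | { event: "job_start"; jobId: string; tenantId: string }
  | { event: "job_complete"; jobId: string; duration: number; settingsCount: number }
  | { event: "job_failed"; jobId: string; error: string; stack?: string };

// Serializes an event as a single JSON line with an ISO-8601 timestamp.
// `now` is injectable so the output is deterministic in tests.
export function formatLogLine(entry: WorkerLogEvent, now: Date = new Date()): string {
  return JSON.stringify({ ...entry, timestamp: now.toISOString() });
}

// Writes one JSON object per line to stdout, where Docker or PM2 can capture it.
export function logEvent(entry: WorkerLogEvent): void {
  console.log(formatLogLine(entry));
}

// Converts an unknown thrown value into a job_failed event.
export function failureEvent(jobId: string, err: unknown): WorkerLogEvent {
  if (err instanceof Error) {
    return { event: "job_failed", jobId, error: err.message, stack: err.stack };
  }
  return { event: "job_failed", jobId, error: String(err) };
}
```

These helpers could be called from the worker's event handlers, for example `logEvent(failureEvent(jobId, err))` when a job fails.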
### Health Check Endpoint

**Path**: `/api/worker-health`

**Response**:
```json
{
  "status": "healthy",
  "queue": {
    "waiting": 2,
    "active": 1,
    "completed": 45,
    "failed": 3
  }
}
```

**Use Case**: Monitoring dashboard, uptime checks

---
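The health check response shown above can be assembled from queue counters. The sketch below builds that payload from a plain counts object, such as the result of BullMQ's `queue.getJobCounts("waiting", "active", "completed", "failed")`. The plan does not define when the status should flip to unhealthy, so the `maxWaiting` threshold and the `"unhealthy"` value are assumptions, as is the function name.

```typescript
interface QueueCounts {
  waiting: number;
  active: number;
  completed: number;
  failed: number;
}

interface WorkerHealth {
  status: "healthy" | "unhealthy";
  queue: QueueCounts;
}

// Builds the response body for /api/worker-health from raw job counts.
// Missing counters default to 0. A backlog larger than `maxWaiting`
// (assumed threshold) is reported as unhealthy.
export function buildHealthResponse(
  counts: Partial<QueueCounts>,
  maxWaiting = 50,
): WorkerHealth {
  const queue: QueueCounts = {
    waiting: counts.waiting ?? 0,
    active: counts.active ?? 0,
    completed: counts.completed ?? 0,
    failed: counts.failed ?? 0,
  };
  return {
    status: queue.waiting > maxWaiting ? "unhealthy" : "healthy",
    queue,
  };
}
```

A route handler would fetch the counts, pass them through this function, and return the result as JSON. If Redis is unreachable, the count lookup itself fails, which the handler could report as a 503.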
## Documentation Updates

**Files to Update**:
1. `README.md` - Add worker deployment instructions
2. `DEPLOYMENT.md` - Document Redis setup, worker config
3. `specs/002-manual-policy-sync/` - Mark as superseded by 005

**New Documentation**:
1. `docs/worker-deployment.md` - Step-by-step worker setup
2. `docs/troubleshooting.md` - Common worker issues & fixes

---
## Open Questions & Risks

### Q1: Redis Hosting Strategy

**Question**: Self-hosted Redis or managed service (e.g., Upstash, Redis Cloud)?

**Options**:
- Docker Compose (simple, dev-friendly)
- Upstash (serverless, paid but simple)
- Self-hosted on VPS (more control, more ops)

**Recommendation**: Start with Docker Compose, migrate to managed service if scaling needed

---

### Q2: Worker Deployment Method

**Question**: How to deploy worker in production?

**Options**:
- PM2 (Node process manager)
- Systemd (Linux service)
- Docker container (consistent with app)

**Recommendation**: Docker container (matches Next.js deployment strategy)

---

### Q3: Job Failure Notifications

**Question**: How to notify admins when sync jobs fail?

**Options**:
- Email via Resend (already integrated)
- In-app notification system (Phase 2)
- External monitoring (e.g., Sentry)

**Recommendation**: Start with logs only, add notifications in Phase 2

---
## Success Metrics

| Metric | Target | Status |
|--------|--------|--------|
| n8n dependency removed | Yes | 🔜 |
| All tests passing | 100% | 🔜 |
| Production sync successful | Yes | 🔜 |
| Worker uptime | >99% | 🔜 |
| Zero data loss | Yes | 🔜 |

---
## Timeline Estimate

| Phase | Duration | Dependencies |
|-------|----------|--------------|
| 0. Pre-Implementation | 1h | None |
| 1. Queue Infrastructure | 2h | Phase 0 |
| 2. Graph Integration | 4h | Phase 1 |
| 3. Data Transformation | 6h | Phase 2 |
| 4. Database Persistence | 3h | Phase 3 |
| 5. Frontend Integration | 2h | Phase 4 |
| 6. Legacy Cleanup | 2h | Phase 5 |
| 7. Testing & Validation | 4h | Phases 1-6 |
| 8. Deployment | 3h | Phase 7 |
| **Total** | **~27h** | **~3-4 days** |

---
## Next Steps

1. ✅ Generate `tasks.md` with detailed task breakdown
2. 🔜 Start Phase 0: Install Redis, update env vars
3. 🔜 Implement Phase 1: Queue infrastructure
4. 🔜 Continue through Phase 8: Deployment

---

**Plan Status**: ✅ Ready for Task Generation
**Approved by**: Technical Lead (pending)
**Last Updated**: 2025-12-09