# Replymate Chrome Extension - Design Document

## Architecture Overview

### Current Architecture (v1.0)

```
Chrome Extension
    ↓
Direct call to OpenAI API
    ↓
User's OpenAI API key
```

**Issues:**
- Every user needs their own OpenAI API key
- No centralized cost monitoring
- No usage analytics
- Users can exceed quota unexpectedly
- No rate limiting control

### Proposed Architecture (v2.0)

```
Chrome Extension
    ↓
Cloudflare Worker API (with AI Gateway)
    ↓
OpenAI API
```

**Benefits:**
- Single shared API key (managed by Robomate)
- Centralized cost monitoring via Cloudflare AI Gateway
- Usage analytics per user/team
- Rate limiting and quota controls
- Better security
- Easier onboarding (no API key setup)

## Cloudflare Worker API Design

### Endpoint Structure

```
POST https://replymate-api.robomate.workers.dev/generate-reply

Headers:
  Authorization: Bearer <extension-token>
  Content-Type: application/json

Body:
{
  "customerName": "John Doe",
  "customerMessage": "I need help with...",
  "ticketSubject": "Order #12345",
  "agentName": "Sarah"
}

Response:
{
  "reply": "Hi John,\n\nThank you for reaching out...",
  "model": "gpt-4o-mini",
  "usage": {
    "input_tokens": 320,
    "output_tokens": 180
  }
}
```

### Cloudflare AI Gateway Integration

```javascript
// Worker routes requests through AI Gateway
const response = await fetch(
  'https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/replymate/openai/chat/completions',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.OPENAI_API_KEY}`, // secret bound to the Worker
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  }
);
```

**AI Gateway Features:**
- Real-time analytics dashboard
- Cost tracking per request
- Rate limiting
- Caching for duplicate requests
- Logs for debugging
- Usage alerts

## Security Architecture

### Problem: Preventing Unauthorized API Usage

Without security, anyone could:
1. Reverse engineer the Worker URL
2. Send unlimited requests
3. Drain your OpenAI credits
4. Sell access to your API

### Security Layers

#### Layer 1: Extension Token Authentication

**How it works:**
```
Extension → Attaches its unique token to each request → Worker validates

Token format: JWT (JSON Web Token)
Payload: {
  "extensionId": "chrome-extension-123",
  "userId": "user@company.com",
  "domain": "company.freshdesk.com",
  "exp": 1234567890
}
```

**Implementation:**
- Each extension installation gets a unique token
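- Token stored in the extension's `chrome.storage` area (not encrypted at rest, so tokens stay short-lived and revocable)
- Token signed by the Worker with a secret key (stored as a Worker secret, never shipped in the extension)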
- Worker validates signature before processing request
- Tokens expire after 30 days (auto-refresh)

**Protection:**
- Can't forge tokens without secret key
- Expired tokens rejected
- Revocable if compromised
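
The claim checks that would follow signature verification can be sketched as below. This is a minimal sketch, not the actual Worker code: `validateClaims` and its return shape are hypothetical, and it assumes the signature has already been verified by a JWT library or Web Crypto before the payload is trusted.

```javascript
// Hypothetical helper: checks the claims of an already signature-verified token.
// Claim names follow the payload shown above; the return shape is an assumption.
function validateClaims(payload, nowSeconds = Math.floor(Date.now() / 1000)) {
  if (!payload || typeof payload !== 'object') {
    return { ok: false, reason: 'missing payload' };
  }
  // `exp` is in seconds since the Unix epoch, as in standard JWTs.
  if (typeof payload.exp !== 'number' || payload.exp <= nowSeconds) {
    return { ok: false, reason: 'expired' };
  }
  // Every request must be attributable to an installation, a user and a domain.
  for (const claim of ['extensionId', 'userId', 'domain']) {
    if (typeof payload[claim] !== 'string' || payload[claim] === '') {
      return { ok: false, reason: `missing ${claim}` };
    }
  }
  return { ok: true };
}
```

A Worker would typically answer 401 when `ok` is false and only then move on to the rate-limit and allowlist layers.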

#### Layer 2: Origin Validation

**How it works:**
```javascript
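// Worker validates the request origin (Referer is only a fallback)
const source = request.headers.get('Origin') || request.headers.get('Referer');

// Only accept requests whose hostname is a Freshdesk subdomain.
// A substring check would also accept "x.freshdesk.com.evil.example".
let hostname = '';
try {
  hostname = new URL(source).hostname;
} catch {
  // Missing or malformed header: hostname stays empty and is rejected below
}
if (!hostname.endsWith('.freshdesk.com')) {
  return new Response('Forbidden', { status: 403 });
}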
```

**Protection:**
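- Browser requests must come from Freshdesk pages
- Raises the bar for casual curl/Postman use (non-browser clients can forge these headers, so this layer does not replace token validation)
- Prevents third-party sites from calling the API from a browser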

#### Layer 3: Rate Limiting

**Per-user rate limits:**
```javascript
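// Cloudflare KV for rate limiting (KV is eventually consistent, so counts are approximate)
const userId = jwt.payload.userId;
const currentMinute = Math.floor(Date.now() / 60000);
const key = `ratelimit:${userId}:${currentMinute}`;

// KV returns a string or null, so parse before comparing
const count = parseInt(await env.KV.get(key), 10) || 0;
if (count >= 20) { // 20 requests per minute max
  return new Response('Rate limit exceeded', { status: 429 });
}

// KV values must be strings; 60 seconds is the minimum TTL KV accepts
await env.KV.put(key, String(count + 1), { expirationTtl: 60 });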
```

**Limits:**
- Per user: 20 requests/minute, 500/day
- Per company: 5000 requests/day
- Adjustable via config

**Protection:**
- Prevents API abuse
- Limits damage from compromised tokens
- Fair usage enforcement
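
The three limits listed above could share one set of counter keys. The sketch below is illustrative only: the key layout, `rateLimitKeys`, and `exceededLimit` are assumptions, and the counters themselves would still live in KV or another store.

```javascript
// Limits taken from the list above; the key layout is an assumption.
const LIMITS = { userPerMinute: 20, userPerDay: 500, companyPerDay: 5000 };

// Build the counter keys for the current minute and UTC day.
function rateLimitKeys(userId, domain, now = new Date()) {
  const iso = now.toISOString();   // e.g. 2025-01-15T10:30:45.000Z
  const day = iso.slice(0, 10);    // 2025-01-15
  const minute = iso.slice(0, 16); // 2025-01-15T10:30
  return {
    userMinute: `ratelimit:user:${userId}:${minute}`,
    userDay: `ratelimit:user:${userId}:${day}`,
    companyDay: `ratelimit:company:${domain}:${day}`
  };
}

// Given the current counter values, return the first exhausted limit, or null.
function exceededLimit(counts) {
  if (counts.userMinute >= LIMITS.userPerMinute) return 'userPerMinute';
  if (counts.userDay >= LIMITS.userPerDay) return 'userPerDay';
  if (counts.companyDay >= LIMITS.companyPerDay) return 'companyPerDay';
  return null;
}
```

Keeping the limits in one object matches the "adjustable via config" requirement: the values could later be loaded per company instead of being hard-coded.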

#### Layer 4: Domain Allowlist

**How it works:**
```javascript
// Only allow registered Freshdesk domains
const ALLOWED_DOMAINS = [
  'robomate.freshdesk.com',
  'company1.freshdesk.com',
  'company2.freshdesk.com'
];

const domain = jwt.payload.domain;
if (!ALLOWED_DOMAINS.includes(domain)) {
  return new Response('Domain not authorized', { status: 403 });
}
```

**Stored in:**
- Cloudflare KV (key-value store)
- Admin dashboard to add/remove domains
- Automatic during customer onboarding

**Protection:**
- Only paying customers can use API
- Easy to revoke access
- Prevents competitors from using your API
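
Since the allowlist is meant to live in KV rather than in a hard-coded array, the lookup might look like the sketch below. The `allowlist:` key prefix and the `isDomainAllowed` helper are assumptions; the only KV behavior relied on is that `get` resolves to `null` for a missing key.

```javascript
// `kv` is any object with an async get(key) that resolves to null for a
// missing key, which is how Workers KV behaves. The key prefix is an assumption.
async function isDomainAllowed(kv, domain) {
  const normalized = String(domain ?? '').trim().toLowerCase();
  // Reject anything that is not a single-label Freshdesk subdomain.
  if (!/^[a-z0-9-]+\.freshdesk\.com$/.test(normalized)) return false;
  return (await kv.get(`allowlist:${normalized}`)) !== null;
}
```

Onboarding would then write one `allowlist:<domain>` key per customer, and revoking access is a single key deletion.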

#### Layer 5: Request Fingerprinting

**How it works:**
```javascript
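// Generate fingerprint from request metadata.
// The timestamp is bucketed into 5-second windows so duplicates hash identically.
const windowId = Math.floor(Date.now() / 5000);
const digest = await crypto.subtle.digest('SHA-256',
  new TextEncoder().encode(`${userId}-${ticketId}-${windowId}-${userAgent}`)
);
const fingerprint = [...new Uint8Array(digest)]
  .map((b) => b.toString(16).padStart(2, '0')).join('');

// Check if duplicate request in the same 5-second window
const key = `fingerprint:${fingerprint}`;
const cachedBody = await env.KV.get(key);
if (cachedBody !== null) {
  // Return cached body, don't call OpenAI again
  return new Response(cachedBody, { headers: { 'Content-Type': 'application/json' } });
}

// ... call OpenAI and build `responseBody` (a JSON string) ...
// KV rejects TTLs under 60 seconds; the key itself changes every 5 seconds
await env.KV.put(key, responseBody, { expirationTtl: 60 });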
```

**Protection:**
- Limits the cost of replayed requests within the dedup window
- Reduces duplicate API calls
- Cost optimization

#### Layer 6: Usage Monitoring & Alerts

**Real-time monitoring:**
```javascript
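// Log every request to Analytics Engine (the timestamp is recorded automatically).
// writeDataPoint accepts only `indexes`, `blobs` (strings) and `doubles` (numbers).
env.ANALYTICS.writeDataPoint({
  indexes: [userId],
  blobs: [domain, request.headers.get('CF-Connecting-IP')],
  doubles: [usage.total_tokens, calculateCost(usage)]
});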

// Alert on suspicious patterns
if (requestsInLastMinute > 100) {
  await sendAlert({
    type: 'SUSPICIOUS_ACTIVITY',
    userId,
    message: 'Abnormal request rate detected'
  });
}
```

**Protection:**
- Detect anomalies early
- Auto-block suspicious users
- Forensics for security incidents

### Token Distribution Strategy

#### Option A: Pre-generated Tokens (Recommended)

**Process:**
1. Admin generates tokens via dashboard
2. Each token tied to specific:
   - Email address
   - Freshdesk domain
   - Company/team
3. User installs extension
4. Extension prompts for token (one-time setup)
5. Token stored in Chrome sync storage

**Pros:**
- Full control over who gets access
- Easy to revoke
- Audit trail of all tokens
- Can tie to billing

**Cons:**
- Requires token distribution process
- Users must enter token once

#### Option B: OAuth-style Flow

**Process:**
1. User installs extension
2. Extension redirects to Worker auth page
3. User logs in with email
4. Worker generates token and redirects back
5. Extension stores token

**Pros:**
- Better UX (no manual token entry)
- Can integrate with SSO

**Cons:**
- More complex implementation
- Requires auth page

#### Option C: Domain-based Auto-provisioning

**Process:**
1. User installs extension
2. Extension detects Freshdesk domain
3. Sends domain to Worker
4. Worker checks if domain is allowlisted
5. If yes, auto-generates token
6. Token tied to that domain

**Pros:**
- Zero-config for users
- Easy onboarding

**Cons:**
- Less granular control
- Anyone on that Freshdesk can use it

**Recommendation:** Start with Option A for MVP, migrate to Option B for production.

## Cost Monitoring Dashboard

### Cloudflare AI Gateway Analytics

**Built-in metrics:**
- Requests per day/week/month
- Tokens consumed
- Cost breakdown by user/domain
- Response times
- Error rates
- Cache hit rates

**Custom tracking:**
```javascript
// Worker tracks additional metadata
{
  userId: "user@company.com",
  domain: "company.freshdesk.com",
  ticketId: "12345",
  agentName: "Sarah",
  timestamp: "2025-01-15T10:30:00Z",
  model: "gpt-4o-mini",
  inputTokens: 320,
  outputTokens: 180,
  cost: 0.00092,
  cached: false
}
```

**Visualization:**
- Grafana dashboard showing real-time usage
- Slack alerts for daily cost summaries
- Email alerts when approaching budget limits

### Budget Controls

**Hard limits:**
```javascript
// Check daily budget before processing
const todaySpend = await getTodaySpend(userId);
const userLimit = await getUserLimit(userId);

if (todaySpend >= userLimit) {
  return new Response(JSON.stringify({
    error: 'Daily budget exceeded',
    spent: todaySpend,
    limit: userLimit
  }), {
    status: 429,
    headers: { 'Content-Type': 'application/json' }
  });
}
```

**Soft limits:**
- Warning at 80% of daily limit
- Auto-throttle at 90% (slower responses, caching)
- Hard stop at 100%
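
The soft-limit thresholds above reduce to a small classification step that could run before the hard-limit check. This is a sketch under assumptions: `budgetState` and its state names are hypothetical, and how the Worker throttles is left open.

```javascript
// Thresholds mirror the soft-limit list above; the state names are assumptions.
function budgetState(todaySpend, dailyLimit) {
  if (!(dailyLimit > 0)) return 'blocked'; // no budget configured
  const ratio = todaySpend / dailyLimit;
  if (ratio >= 1.0) return 'blocked';      // hard stop at 100%
  if (ratio >= 0.9) return 'throttled';    // auto-throttle at 90%
  if (ratio >= 0.8) return 'warning';      // warn at 80%
  return 'ok';
}
```

For example, a user who has spent 85 of a 100-unit limit is in the `warning` state and would still be served at full speed.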

## Privacy & Data Handling

### What We Store

**Temporary (for rate limiting/caching):**
- Request fingerprints (5 seconds)
- Rate limit counters (60 seconds)
- Cached responses (5 minutes)

**Long-term (for analytics):**
- User ID (hashed)
- Domain
- Token count
- Cost
- Timestamp

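**Never stored by the core reply API** (the Data Collection & Learning System described later stores some of these fields under its own retention rules):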
- Customer names
- Customer messages
- Ticket content
- Generated replies
- PII/sensitive data

### Compliance

- GDPR: the core reply API stores no PII; data kept by the learning system falls under its 90-day retention and deletion rules
- SOC 2 ready (audit logs)
- Data residency via Cloudflare regions
- Encryption in transit (TLS 1.3)
- Encryption at rest (Cloudflare KV)

## Migration Plan (v1.0 → v2.0)

### Phase 1: Worker Setup (Week 1)
- [ ] Create Cloudflare Worker
- [ ] Set up AI Gateway
- [ ] Implement basic authentication
- [ ] Deploy to staging

### Phase 2: Security Implementation (Week 2)
- [ ] JWT token generation system
- [ ] Rate limiting with KV
- [ ] Domain allowlist
- [ ] Analytics tracking

### Phase 3: Extension Update (Week 3)
- [ ] Add token configuration UI
- [ ] Update API calls to Worker endpoint
- [ ] Fallback to direct OpenAI (if Worker down)
- [ ] Testing with team

### Phase 4: Rollout (Week 4)
- [ ] Generate tokens for existing users
- [ ] Distribute tokens via email
- [ ] Monitor usage/errors
- [ ] Gather feedback

### Backward Compatibility

**Support both modes:**
```javascript
// Extension checks for token first
if (hasWorkerToken()) {
  // Use Cloudflare Worker API
  response = await callWorkerAPI(context);
} else if (hasOpenAIKey()) {
  // Fallback to direct OpenAI (v1.0 behavior)
  response = await callOpenAIDirect(context);
} else {
  // Prompt for setup
  showConfigPrompt();
}
```

**Sunset timeline:**
- Week 1-4: Both modes supported
- Week 5-8: Deprecation warnings for direct OpenAI
- Week 9+: Worker-only mode

## Cost Projections

### Current (v1.0) - Direct OpenAI
```
User A: 100 replies/day = $3/mo (their account)
User B: 50 replies/day = $1.50/mo (their account)
User C: 200 replies/day = $6/mo (their account)

Total: Users pay individually, no visibility
```

### Proposed (v2.0) - Cloudflare Worker
```
500 total replies/day
= ~15,000 replies/month
= ~$15/month OpenAI costs

Cloudflare costs:
- Worker requests: ~450k/mo = $0.15
- KV reads: ~900k/mo = $0.45
- KV writes: ~450k/mo = $2.25
- AI Gateway: Free tier (500k requests/mo)
- Analytics Engine: ~$5/mo

Total monthly cost: ~$23
Cost per reply: $0.0015

Benefits:
- 10% cheaper per request (caching)
- Full visibility into usage
- Better rate limiting
- Professional service
```
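
The totals in the v2.0 projection can be reproduced with a few lines of arithmetic. The figures are copied from the block above and are not independently verified prices.

```javascript
// Figures copied from the projection above (USD per month).
const monthly = {
  openai: 15,
  workerRequests: 0.15,
  kvReads: 0.45,
  kvWrites: 2.25,
  analyticsEngine: 5
};
const repliesPerMonth = 500 * 30; // 15,000

const total = Object.values(monthly).reduce((sum, cost) => sum + cost, 0);
const perReply = total / repliesPerMonth;

console.log(total.toFixed(2));    // "22.85", rounded to ~$23 above
console.log(perReply.toFixed(4)); // "0.0015"
```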

## Alternative Security Approaches

### What NOT to do:

❌ **API key in extension code**
- Anyone can extract it
- Impossible to rotate without updating extension

❌ **Obfuscation only**
- Security through obscurity fails
- Easy to reverse engineer

❌ **IP allowlisting**
- Users have dynamic IPs
- VPNs would break it

❌ **Simple API keys (like `sk-xxx`)**
- No metadata (who, when, where)
- Can't rate limit per user
- Hard to revoke

### What TO do:

✅ **JWT tokens with claims**
- Contains user/domain metadata
- Cryptographically signed
- Signature verifiable without a database lookup (revocation still requires a denylist check)

✅ **Multiple security layers**
- Defense in depth
- If one layer fails, others protect

✅ **Rate limiting + monitoring**
- Limits blast radius
- Early detection

✅ **AI Gateway caching**
- Reduces costs
- Faster responses
- Less API abuse impact

## Data Collection & Learning System

### Problem Statement

Currently, we generate replies but have no visibility into:
- Which replies users actually send vs edit heavily
- What edits users make (deletions, dropdown changes)
- Which types of tickets generate good vs poor replies
- How to improve our prompts over time

**Solution:** Store all generations, edits, and outcomes to create a learning feedback loop.

### Architecture: Generation Storage & Analytics

```
Chrome Extension
    ↓
Cloudflare Worker API
    ↓
┌─────────────────────────────────┐
│  Cache Layer (Cloudflare KV)    │  ← Check for existing generation
└─────────────────────────────────┘
    ↓ (if cache miss)
┌─────────────────────────────────┐
│  OpenAI via AI Gateway          │
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  Storage (Cloudflare D1)        │  ← Store everything
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  Analytics Dashboard            │  ← Learn & improve
└─────────────────────────────────┘
```

### Database Schema (Cloudflare D1)

**D1 = Cloudflare's serverless SQLite database**
- Serverless, no connection management
- SQL queries via Worker
- Free tier: 100k reads/day, 100MB storage
- Perfect for storing generations

```sql
-- Table 1: Generations (every AI reply we create)
CREATE TABLE generations (
  id TEXT PRIMARY KEY,                    -- UUID
  created_at INTEGER NOT NULL,            -- Unix timestamp in milliseconds (Date.now())

  -- Input context (what we sent to AI)
  ticket_id TEXT,                         -- Freshdesk ticket ID
  customer_name TEXT,
  customer_message TEXT,
  ticket_subject TEXT,
  agent_name TEXT,

  -- Context hash for caching
  context_hash TEXT NOT NULL,             -- SHA256 of input context

  -- AI response
  raw_reply TEXT NOT NULL,                -- Original AI generation
  model TEXT NOT NULL,                    -- gpt-4o-mini, etc.

  -- Metadata
  user_id TEXT NOT NULL,                  -- Who generated it
  domain TEXT NOT NULL,                   -- company.freshdesk.com

  -- Costs
  input_tokens INTEGER,
  output_tokens INTEGER,
  cost_usd REAL,

  -- Cache hit?
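  from_cache INTEGER DEFAULT 0            -- 1 if served from cache
);

-- SQLite (and therefore D1) does not accept inline INDEX clauses,
-- so indexes are created with separate statements
CREATE INDEX idx_generations_context_hash ON generations (context_hash);
CREATE INDEX idx_generations_ticket_id ON generations (ticket_id);
CREATE INDEX idx_generations_user_domain ON generations (user_id, domain);
CREATE INDEX idx_generations_created_at ON generations (created_at);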

-- Table 2: User Edits (what users changed)
CREATE TABLE generation_edits (
  id TEXT PRIMARY KEY,
  generation_id TEXT NOT NULL,
  created_at INTEGER NOT NULL,

  -- Edit type
  edit_type TEXT NOT NULL,                -- 'sentence_delete', 'dropdown_change', 'manual_edit'

  -- Edit details (JSON)
  edit_data TEXT NOT NULL,                -- JSON blob with specifics

  -- Examples:
  -- { "type": "sentence_delete", "sentence_id": 2, "sentence_text": "I apologize..." }
  -- { "type": "dropdown_change", "placeholder": "DELIVERY TIME", "from": "[DELIVERY TIME]", "to": "3-5 business days" }

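  FOREIGN KEY (generation_id) REFERENCES generations(id)
);

-- Index names must be unique across the database
CREATE INDEX idx_edits_generation ON generation_edits (generation_id);
CREATE INDEX idx_edits_type ON generation_edits (edit_type);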

-- Table 3: Final Outcomes (what actually got sent)
CREATE TABLE generation_outcomes (
  id TEXT PRIMARY KEY,
  generation_id TEXT NOT NULL,
  created_at INTEGER NOT NULL,

  -- What was sent
  final_text TEXT NOT NULL,               -- After all edits

  -- How much did they edit?
  edit_distance INTEGER,                  -- Levenshtein distance from original
  sentences_deleted INTEGER DEFAULT 0,
  dropdowns_changed INTEGER DEFAULT 0,
  manual_edits INTEGER DEFAULT 0,         -- Did they type new text?

  -- Outcome
  action TEXT NOT NULL,                   -- 'sent', 'copied', 'discarded'

  -- User feedback (optional)
  user_rating INTEGER,                    -- 1-5 stars (future feature)
  user_comment TEXT,                      -- Manual feedback

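  FOREIGN KEY (generation_id) REFERENCES generations(id)
);

-- Index names must be unique across the database
CREATE INDEX idx_outcomes_generation ON generation_outcomes (generation_id);
CREATE INDEX idx_outcomes_action ON generation_outcomes (action);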

-- Table 4: Learning Insights (aggregated metrics)
CREATE TABLE prompt_performance (
  id TEXT PRIMARY KEY,
  created_at INTEGER NOT NULL,

  -- Time window
  date TEXT NOT NULL,                     -- YYYY-MM-DD

  -- Metrics
  total_generations INTEGER DEFAULT 0,
  cache_hit_rate REAL DEFAULT 0,
  avg_edit_distance REAL DEFAULT 0,
  avg_sentences_deleted REAL DEFAULT 0,

  -- Success indicators
  low_edit_count INTEGER DEFAULT 0,       -- Edit distance < 50 chars
  high_edit_count INTEGER DEFAULT 0,      -- Edit distance > 200 chars
  discarded_count INTEGER DEFAULT 0,      -- User didn't use it

  -- Cost
  total_cost_usd REAL DEFAULT 0,
  cost_saved_cache_usd REAL DEFAULT 0
);

CREATE INDEX idx_performance_date ON prompt_performance (date);
```

### API Endpoints

#### 1. Generate Reply (with caching)

```javascript
POST /api/generate-reply

Request:
{
  "customerName": "John Doe",
  "customerMessage": "Where is my order?",
  "ticketSubject": "Order #12345",
  "ticketId": "67890",
  "agentName": "Sarah"
}

Response (cache hit):
{
  "generationId": "gen_abc123",
  "reply": "Hi John,\n\nThank you for reaching out...",
  "fromCache": true,
  "cacheAge": 3600,  // seconds
  "usage": {
    "inputTokens": 320,
    "outputTokens": 180,
    "cost": 0.00092
  }
}

Response (cache miss):
{
  "generationId": "gen_abc124",
  "reply": "Hi John,\n\nThank you for reaching out...",
  "fromCache": false,
  "usage": { ... }
}
```

**Worker Implementation:**
```javascript
export default {
  async fetch(request, env) {
    const body = await request.json();

    // 1. Calculate context hash for caching
    const contextHash = await hashContext({
      customerMessage: body.customerMessage,
      ticketSubject: body.ticketSubject
      // NOTE: Don't include customerName/agentName in hash
      // so "Where is my order?" works for any customer
    });

    // 2. Check cache (KV for speed, D1 for history)
    const cached = await env.GENERATION_CACHE.get(contextHash);
    if (cached) {
      const generation = JSON.parse(cached);

      // Log cache hit to D1
      await logCacheHit(env.DB, generation.id, body);

      return new Response(JSON.stringify({
        generationId: generation.id,
        reply: personalize(generation.reply, body), // Swap in names
        fromCache: true,
        cacheAge: Math.floor((Date.now() - generation.created_at) / 1000), // seconds
        usage: generation.usage
      }), { headers: { 'Content-Type': 'application/json' } });
    }

    // 3. Cache miss - generate new reply
    const generationId = crypto.randomUUID();
    const reply = await callOpenAI(body, env);

    // 4. Store in D1
    await env.DB.prepare(`
      INSERT INTO generations (
        id, created_at, context_hash, ticket_id,
        customer_name, customer_message, ticket_subject, agent_name,
        raw_reply, model, user_id, domain,
        input_tokens, output_tokens, cost_usd, from_cache
      ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    `).bind(
      generationId,
      Date.now(),
      contextHash,
      body.ticketId,
      body.customerName,
      body.customerMessage,
      body.ticketSubject,
      body.agentName,
      reply.text,
      'gpt-4o-mini',
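      body.userId,  // in production, take these two values from the verified token
      body.domain,  // claims, not from the request body, which the client controls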
      reply.usage.input_tokens,
      reply.usage.output_tokens,
      reply.usage.cost,
      0
    ).run();

    // 5. Store in KV cache (30 day TTL)
    await env.GENERATION_CACHE.put(contextHash, JSON.stringify({
      id: generationId,
      reply: reply.text,
      created_at: Date.now(),
      usage: reply.usage
    }), { expirationTtl: 30 * 24 * 60 * 60 });

    return new Response(JSON.stringify({
      generationId,
      reply: reply.text,
      fromCache: false,
      usage: reply.usage
    }), { headers: { 'Content-Type': 'application/json' } });
  }
};

// Personalize cached replies with actual names
function personalize(template, context) {
  return template
    // Function replacers avoid special handling of "$" sequences in names
    .replace(/\[CUSTOMER_NAME\]/g, () => context.customerName)
    .replace(/\[AGENT_NAME\]/g, () => context.agentName);
}

// Smart prompt to use placeholders
const SYSTEM_PROMPT = `
You are a customer support assistant. Generate professional replies.

IMPORTANT:
- Use [CUSTOMER_NAME] for the customer's name
- Use [AGENT_NAME] for the agent's name
- Use placeholders for unknown info: [AVAILABILITY], [DELIVERY TIME], [SHIP STATUS]

This allows reply caching while personalizing names.
`;
```

#### 2. Track Edits

```javascript
POST /api/track-edit

Request:
{
  "generationId": "gen_abc123",
  "editType": "sentence_delete",
  "editData": {
    "sentenceId": 2,
    "sentenceText": "I apologize for any inconvenience."
  }
}

Response:
{
  "success": true,
  "editId": "edit_xyz789"
}
```

**Extension sends edits in real-time:**
```javascript
// When user deletes a sentence
const deleteLine = () => {
  // ... delete DOM element ...

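  // Track the edit (fire-and-forget; a tracking failure must not break the UI)
  fetch(API_URL + '/track-edit', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      generationId: currentGenerationId,
      editType: 'sentence_delete',
      editData: {
        sentenceId: selectedLineId,
        sentenceText: lineElement.textContent
      }
    })
  }).catch((err) => console.warn('track-edit failed', err));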
};
```

#### 3. Submit Final Outcome

```javascript
POST /api/submit-outcome

Request:
{
  "generationId": "gen_abc123",
  "finalText": "Hi John,\n\nYour order has shipped...",
  "action": "sent",  // or "copied", "discarded"
  "userRating": 4,   // optional
  "userComment": "Needed to add tracking number manually"
}

Response:
{
  "success": true,
  "outcomeId": "out_def456"
}
```

**Extension calls when reply is sent:**
```javascript
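// The handler must be async because it awaits the tracking call
document.getElementById('robomate-insert-btn').onclick = async () => {
  const finalText = getFinalReplyText();

  // Insert first so a slow or failed tracking call never blocks the agent
  injectTextIntoEditor(editor, finalText);

  // Submit outcome
  try {
    await fetch(API_URL + '/submit-outcome', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        generationId: currentGenerationId,
        finalText: finalText,
        action: 'sent'
      })
    });
  } catch (err) {
    console.warn('submit-outcome failed', err);
  }
};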
```
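
The `edit_distance` column in `generation_outcomes` is defined as the Levenshtein distance between the original generation and the final text. A minimal sketch of that calculation follows; where it runs (extension or Worker) is not specified by this document, and the function name `editDistance` is an assumption.

```javascript
// Levenshtein distance using two rolling rows, so memory stays O(length of finalText).
function editDistance(original, finalText) {
  const a = Array.from(original);  // split by code point, not UTF-16 unit
  const b = Array.from(finalText);
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, substitution);
    }
    prev = curr;
  }
  return prev[b.length];
}
```

Runtime grows with the product of both lengths, which is acceptable for reply-sized texts but worth keeping in mind if it runs on every keystroke.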

### Data Collection Philosophy

**Priority: Learning > Efficiency**

We're optimizing for data quality, not cost savings, so the KV cache path in the Worker implementation above stays disabled during the learning phase. Every generation is unique and tracked:
- Store every request, even if similar to previous ones
- No smart caching (yet) - want to see natural patterns first
- Each ticket gets a fresh generation to capture context nuances
- Focus on making feedback frictionless, not reducing API calls

Once we have 1000+ feedback samples, we can decide whether enabling the cache makes sense.

### Analytics Dashboard

**Built with Cloudflare Pages + D1 + Chart.js**

#### Key Metrics

**1. Cache Efficiency**
```sql
SELECT
  DATE(created_at / 1000, 'unixepoch') as date,
  COUNT(*) as total,
  SUM(from_cache) as cache_hits,
  ROUND(100.0 * SUM(from_cache) / COUNT(*), 2) as hit_rate
FROM generations
GROUP BY date
ORDER BY date DESC
LIMIT 30;
```

**2. Edit Analysis**
```sql
-- Which sentences get deleted most?
SELECT
  json_extract(edit_data, '$.sentence_text') as sentence,
  COUNT(*) as delete_count
FROM generation_edits
WHERE edit_type = 'sentence_delete'
GROUP BY sentence
ORDER BY delete_count DESC
LIMIT 20;
```

**3. Success Rate**
```sql
-- How often do users send without heavy edits?
SELECT
  CASE
    WHEN edit_distance < 20 THEN 'Excellent (minimal edits)'
    WHEN edit_distance < 100 THEN 'Good (light edits)'
    WHEN edit_distance < 300 THEN 'Fair (moderate edits)'
    ELSE 'Poor (heavy edits)'
  END as quality,
  COUNT(*) as count,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) as percentage
FROM generation_outcomes
WHERE action = 'sent'
GROUP BY quality;
```

**4. Common Patterns**
```sql
-- What types of tickets generate the best replies?
SELECT
  substr(ticket_subject, 1, 30) as subject_pattern,
  AVG(o.edit_distance) as avg_edits,
  COUNT(*) as count
FROM generations g
JOIN generation_outcomes o ON g.id = o.generation_id
GROUP BY subject_pattern
HAVING count >= 5
ORDER BY avg_edits ASC
LIMIT 20;
```

**5. Prompt Performance Over Time**
```sql
-- Did our prompt changes improve things?
SELECT
  date,
  avg_edit_distance,
  cache_hit_rate,
  total_cost_usd
FROM prompt_performance
ORDER BY date DESC
LIMIT 90;  -- 3 months
```

### Learning Loop: Improving Prompts

**Process:**

1. **Collect Data** (automatic)
   - Every generation stored
   - Every edit tracked
   - Every outcome recorded

2. **Analyze Patterns** (weekly)
   ```sql
   -- Find sentences that get deleted >50% of the time.
   -- SQLite cannot reference a column alias inside the same SELECT list,
   -- so the counts are computed in CTEs first.
   WITH deleted AS (
     SELECT
       json_extract(e.edit_data, '$.sentence_text') AS sentence_text,
       COUNT(DISTINCT e.generation_id) AS deletes
     FROM generation_edits e
     WHERE e.edit_type = 'sentence_delete'
     GROUP BY sentence_text
   ),
   counted AS (
     SELECT
       d.sentence_text,
       d.deletes,
       (SELECT COUNT(*) FROM generations g
        WHERE instr(g.raw_reply, d.sentence_text) > 0) AS total
     FROM deleted d
   )
   SELECT sentence_text, ROUND(100.0 * deletes / total, 1) AS delete_rate
   FROM counted
   WHERE total >= 10  -- Only sentences that appeared 10+ times
     AND 100.0 * deletes / total > 50
   ORDER BY delete_rate DESC;
   ```

   **Example output:**
   ```
   "I apologize for any inconvenience." → 78% deletion rate
   "Thank you for your patience." → 65% deletion rate
   "Please let me know if you need anything else." → 12% deletion rate
   ```

3. **Update Prompt** (manual)
   ```javascript
   // OLD PROMPT
   const SYSTEM_PROMPT_OLD = `
   Generate professional, empathetic support replies.
   Always apologize for inconvenience.
   `;

   // NEW PROMPT (based on data)
   const SYSTEM_PROMPT_NEW = `
   Generate professional support replies.
   Be direct and solution-focused.
   Only apologize if there's an actual problem (delay, error, etc.).
   Avoid generic pleasantries unless contextually appropriate.
   `;
   ```

4. **A/B Test** (if advanced)
   - Run old prompt for 50% of requests
   - Run new prompt for 50% of requests
   - Compare edit_distance metrics
   - Deploy winner

5. **Monitor Impact**
   - Check if avg_edit_distance decreases
   - Check if cache_hit_rate increases
   - Check if user_rating improves
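
For the A/B step, a deterministic split keeps each key on the same prompt across requests without storing any assignment. This is a sketch under assumptions: the FNV-1a-style hash, the `promptVariant` helper, and the choice of key (a user ID or a generation ID) are illustrative, not part of the current design.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units: small and dependency-free.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Assign `key` to a prompt variant. `experiment` salts the hash so separate
// experiments split independently; `newShare` is the percentage on the new prompt.
function promptVariant(key, experiment = 'prompt-v2', newShare = 50) {
  return fnv1a(`${experiment}:${key}`) % 100 < newShare ? 'new' : 'old';
}
```

Storing the chosen variant alongside each generation would let the `edit_distance` comparison be grouped by variant.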

### Feedback UI (Future Enhancement)

**Quick Feedback Widget:**
```javascript
// Extension shows after user sends reply
showFeedbackModal({
  generationId,
  question: "How was this AI suggestion?",
  options: [
    { label: "Perfect, sent as-is", rating: 5 },
    { label: "Good, minor tweaks", rating: 4 },
    { label: "Okay, needed edits", rating: 3 },
    { label: "Poor, rewrote most", rating: 2 },
    { label: "Useless, wrote from scratch", rating: 1 }
  ],
  commentPrompt: "What could be better? (optional)"
});
```

Stores in `generation_outcomes` table.

### Cost Impact

**Learning Phase (no caching):**
- 500 tickets/day = 15,000/month
- All unique generations
- OpenAI cost: ~$15/month (gpt-4o-mini)
- Cloudflare D1: ~$0.50/month (storage + writes)
- **Total: ~$16/month**

**Value:**
- Complete dataset of every generation + feedback
- Identify which 20% of replies cause 80% of edits
- Optimize prompts based on real usage data
- ROI: Save hours of agent time with better AI suggestions

The ~$16/month is an investment in understanding what works. Once we optimize prompts based on data, we can reduce edit time by 50%+, easily paying back the AI costs in saved labor.

### Privacy & Compliance

**PII Handling:**
- Customer names/messages stored encrypted
- Retention policy: 90 days, then purge
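- GDPR right to deletion: delete the matching rows from `generation_edits` and `generation_outcomes` first, then run `DELETE FROM generations WHERE ticket_id = ?`, and evict the corresponding KV cache entry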
- Data residency: Use Cloudflare region controls
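
The 90-day retention policy implies a scheduled purge. The sketch below shows one way to compute the cutoff and order the deletes; `retentionCutoff` and `PURGE_STATEMENTS` are hypothetical names, and it assumes `created_at` holds millisecond timestamps as written by `Date.now()` in the Worker.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// `created_at` is stored in milliseconds (Date.now()), so the cutoff is too.
function retentionCutoff(nowMs, retentionDays = 90) {
  return nowMs - retentionDays * DAY_MS;
}

// Child tables first so the foreign keys to `generations` stay valid.
// Each statement takes the cutoff as its single bound parameter.
const PURGE_STATEMENTS = [
  'DELETE FROM generation_edits WHERE generation_id IN (SELECT id FROM generations WHERE created_at < ?)',
  'DELETE FROM generation_outcomes WHERE generation_id IN (SELECT id FROM generations WHERE created_at < ?)',
  'DELETE FROM generations WHERE created_at < ?'
];
```

In a Worker this would likely run from a cron trigger, preparing each statement against the D1 binding with the cutoff bound and executing them as one batch.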

**What to store:**
- ✅ Ticket subject (generic)
- ✅ Message content (for learning)
- ✅ Generated replies
- ✅ Edit patterns
- ❌ Customer email addresses
- ❌ Payment info
- ❌ Personal identifiers beyond name

### Migration Path

**Phase 1: Add tracking (Week 1)**
- Deploy tracking endpoints
- Extension sends edits/outcomes
- Start collecting data
- No changes to generation yet

**Phase 2: Enable caching (Week 2)**
- Implement context hashing
- Add KV cache layer
- Monitor cache hit rate
- Cost savings begin

**Phase 3: Build dashboard (Week 3)**
- Create analytics page
- Visualize edit patterns
- Identify improvement areas

**Phase 4: Learning loop (Week 4+)**
- Analyze data weekly
- Iterate on prompts
- Measure improvements
- Continuous optimization

### Open Questions

**Technical:**
- Cache TTL: 30 days or dynamic based on ticket volume?
- Context hash: Include ticket metadata (category, priority)?
- Storage: D1 or Durable Objects for high write volume?

**Business:**
- Show cache-hit indicator to users?
- Charge less for cached replies?
- Share cache across companies (anonymized)?

**Learning:**
- Automate prompt optimization with GPT-4?
- Manual review of high-edit replies?
- Feedback incentives (gamification)?

## Open Questions

1. **Token rotation frequency?**
   - Recommendation: 30 days with auto-refresh

2. **What happens if token is leaked?**
   - Revoke via admin dashboard
   - Issue new token to user
   - Monitor for abuse

3. **Multi-tenant support?**
   - Separate token pools per company
   - Per-company rate limits
   - Billing by company

4. **Offline support?**
   - Cache common responses
   - Graceful degradation
   - Queue requests for retry

5. **Custom prompts per company?**
   - Store in KV per domain
   - Admin UI to manage
   - Override default system prompt

## References

- [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
- [Cloudflare AI Gateway](https://developers.cloudflare.com/ai-gateway/)
- [JWT Best Practices](https://datatracker.ietf.org/doc/html/rfc8725)
- [OWASP API Security](https://owasp.org/www-project-api-security/)
- [Rate Limiting Patterns](https://cloud.google.com/architecture/rate-limiting-strategies-techniques)

---

**Last Updated:** 2025-01-15
**Status:** Proposed (pending approval)
**Owner:** Development Team
