{
  "accuracy": "83",
  "results": [
    {
      "brief": "make a poster for a climbing festival",
      "expected": "poster",
      "got": "poster",
      "correct": true
    },
    {
      "brief": "create a tech startup pitch deck",
      "expected": "presentation",
      "got": "presentation",
      "correct": true
    },
    {
      "brief": "design business cards for a law firm",
      "expected": "brand-collateral",
      "got": "poster",
      "correct": false
    },
    {
      "brief": "make a social media banner for a music event",
      "expected": "poster",
      "got": "poster",
      "correct": true
    },
    {
      "brief": "create an animated promo for a product launch",
      "expected": "motion",
      "got": "motion",
      "correct": true
    },
    {
      "brief": "design a restaurant menu header",
      "expected": "poster",
      "got": "poster",
      "correct": true
    }
  ],
  "calibrationAnalysis": "### Analysis of Routing Results\n\n**Current Accuracy:** 83% (5/6 correct)\n**Primary Failure Mode:** **Over-generalization to the \"poster\" category.**\n\nThe system is currently exhibiting a \"gravity\" toward the `poster` label. When the model encounters a design task that doesn't clearly fit a specialized category (like `motion` or `presentation`), it defaults to `poster` rather than identifying it as `brand-collateral` or a more generic category.\n\n#### Specific Error Breakdown:\n*   **The Error:** `\"design business cards for a law firm\"` $\\rightarrow$ `poster` (Expected: `brand-collateral`).\n*   **The Root Cause:** The model is likely using keyword-based heuristics or shallow semantic matching. It sees \"design\" and \"for a [company]\" and maps it to the most frequent/prominent category in its training set (`poster`), failing to recognize the specific medium (business cards) as part of a broader brand identity suite.\n\n---\n\n### Suggested Improvements\n\n#### 1. Refine the Taxonomy (Schema Level)\nThe gap between `poster` and `brand-collateral` is currently causing confusion.\n*   **Introduce a \"Print/Stationery\" category:** If `brand-collateral` is too broad, create a middle-tier category.\n*   **Define \"Poster\" strictly:** Update the system prompt to define `poster` exclusively as \"single-sheet large-format displays\" to prevent it from capturing small-format items like business cards or flyers.\n\n#### 2. Implement Few-Shot Prompting (Prompt Level)\nThe model needs explicit examples of what constitutes `brand-collateral` vs. `poster`. Add examples to your routing prompt:\n*   *Input:* \"Design letterheads and business cards\" $\\rightarrow$ `brand-collateral`\n*   *Input:* \"Create a large event flyer\" $\\rightarrow$ `poster`\n*   *Input:* \"Make a banner for a street protest\" $\\rightarrow$ `poster`\n\n#### 3. Feature Engineering: Entity Extraction (Logic Level)\nBefore routing, implement a lightweight preprocessing step to extract the **\"Medium\"** from the brief.\n*   **Current logic:** `Brief` $\\rightarrow$ `Router` $\\rightarrow$ `Label`\n*   **Improved logic:** `Brief` $\\rightarrow$ `Extract Medium (e.g., \"business cards\")` $\\rightarrow$ `Router` $\\rightarrow$ `Label`\n*   If the extracted medium is \"business card,\" \"letterhead,\" or \"envelope,\" force the route to `brand-collateral`.\n\n#### 4. Strengthen Semantic Boundaries (Instruction Level)\nUpdate the Router's instructions to include a **\"Negative Constraint\"** section:\n*   *\"Do NOT classify items as 'poster' if the brief mentions business cards, stationery, or corporate identity elements. These must be routed to 'brand-collateral'.\"*\n\n#### 5. Confidence Thresholding (System Level)\nIf the model's confidence score for a classification is low (e.g., $<0.7$), instead of forcing a route to `poster`, route it to a `general-design` or `unclassified` bucket. This prevents the \"false positive\" errors that skew your accuracy.",
  "timestamp": "2026-04-23T03:52:37.768Z"
}