Monitor AI Quality Drift with GPT-4o-mini Evaluations and Slack Alerts

By Elvis Sarvia (@elvissaravia) on n8n.io

Catch AI quality drift before your users do. This template ties scheduled evaluation, LLM-as-a-Judge scoring, and threshold-based alerts into a continuous monitoring loop that fires a Slack alert the moment a response drops below your quality bar.
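
The per-response check described above can be sketched outside n8n as follows. This is a minimal illustration, not the template's exact code: the function names `parseJudgeReply` and `checkCase` are hypothetical, the 3.5 default is taken from the template's Check Threshold node, and the fence stripping uses string replacement where the template uses a regular expression.

```javascript
// Sketch only: function names are hypothetical, not part of the template.
const THRESHOLD = 3.5; // default used by the template's Check Threshold node
const FENCE = "`".repeat(3); // Markdown code fence the judge model may wrap around its JSON

// Remove an optional Markdown fence, then parse the judge's JSON reply.
function parseJudgeReply(content) {
  const cleaned = content.replaceAll(FENCE + "json", "").replaceAll(FENCE, "").trim();
  return JSON.parse(cleaned);
}

// Average the two 1-5 scores for one test case and compare to the threshold.
function checkCase(judgeReply, threshold = THRESHOLD) {
  const { correctness, helpfulness } = parseJudgeReply(judgeReply);
  const average = (correctness + helpfulness) / 2;
  return { average, threshold, belowThreshold: average < threshold };
}

const result = checkCase(
  '{"correctness": 4, "helpfulness": 2, "justification": "missing next steps"}'
);
console.log(result); // average is 3, so belowThreshold is true
```

A case that averages exactly 3.5 does not alert, because the comparison is strictly "less than".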

Category: AI & RAG · Trigger: Cron / scheduled · Nodes: 19 · Complexity: ★★★★☆ · AI nodes: yes · Integrations: Evaluation Trigger, Agent, OpenAI Chat, Evaluation, OpenAI, Slack

This workflow corresponds to n8n.io template #15135 — we link there as the canonical source.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "a074a6d5-5412-4187-a46c-89e4cb4262f8",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -480,
        -128
      ],
      "parameters": {
        "width": 700,
        "height": 816,
        "content": "# Ongoing Monitoring with Alerts\n\n### How it works\n1. **Production path:** A Daily Schedule trigger kicks off the AI Agent on its normal cadence and Production logic handles real traffic.\n2. **Evaluation path:** The Evaluation Trigger reads test cases (question + expected answer) from a Data Table and feeds each one through the same AI Agent as a separate execution.\n3. **Judging:** A separate judge model (Score Response) scores each AI response on correctness (1-5) and helpfulness (1-5). Evaluation - Set Outputs and Set Metrics record results in the Evaluations tab.\n4. **Per-case threshold check:** The Check Threshold Code node averages the two scores per test case and the Below Threshold? IF node compares against the configured threshold (default 3.5/5).\n5. **Alerting:** When a test case falls below the threshold, a Slack Alert fires immediately. When scores are healthy, the flow routes to All Clear. Aggregate metrics live in the Evaluations tab for trend-level tracking.\n\n### Setup\n1. Add credentials for the OpenAI Chat Model, the judge model, and the Slack node.\n2. Create the Data Table with your golden dataset (question + expected answer pairs, ideally seeded from production traffic).\n3. Configure the Slack channel and message template for the Slack Alert node.\n4. Adjust the Daily Schedule trigger to your desired cadence (daily, hourly, weekly).\n\n### Customization\n- Tune the threshold in Check Threshold to match the stakes of your workflow (e.g., 4.0 for customer-facing, 3.0 for internal drafts).\n- Swap Slack for email, PagerDuty, or any other notification channel.\n- Add per-metric thresholds so a low correctness score alerts a different channel than a low helpfulness score.\n- Grow the golden dataset over time by adding any production failures back into the Data Table as new test cases.\n\n---\nThis template is a learning companion to the **Production AI Playbook**, a series that explores strategies, shares best practices, and provides practical examples for building reliable AI systems in n8n."
      },
      "typeVersion": 1
    },
    {
      "id": "707c7560-d36c-4ddd-9808-31146936602e",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2496,
        -128
      ],
      "parameters": {
        "color": 7,
        "width": 300,
        "height": 572,
        "content": "Alert fires when average score drops below threshold"
      },
      "typeVersion": 1
    },
    {
      "id": "a4ce1fbc-763a-4e01-a8ce-390f378a7e48",
      "name": "Daily Schedule",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        528,
        288
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours",
              "hoursInterval": 24
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "ced4cc61-2490-45cc-b091-888ad66d70fd",
      "name": "When fetching a dataset row",
      "type": "n8n-nodes-base.evaluationTrigger",
      "position": [
        304,
        96
      ],
      "parameters": {
        "source": "dataTable",
        "dataTableId": {
          "__rl": true,
          "mode": "list",
          "value": "VPCxS9mO1gPbvyRa",
          "cachedResultUrl": "/projects/5xhYaLjYeyMka6t9/datatables/VPCxS9mO1gPbvyRa",
          "cachedResultName": "Customer Support QA Test Cases"
        }
      },
      "typeVersion": 4.6
    },
    {
      "id": "a3b9889d-2516-4a13-8666-cc6f065500fe",
      "name": "Format eval input",
      "type": "n8n-nodes-base.code",
      "position": [
        528,
        96
      ],
      "parameters": {
        "jsCode": "const row = $input.first().json;\nreturn [{ json: { chatInput: row.input || row.question } }];"
      },
      "typeVersion": 2
    },
    {
      "id": "763fb99f-1b06-4fe2-bfdb-a504e318c230",
      "name": "AI Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        752,
        192
      ],
      "parameters": {
        "text": "={{ $json.chatInput }}",
        "options": {
          "systemMessage": "You are a customer support AI assistant. Your job is to help customers with their inquiries about billing, technical issues, sales questions, and general support.\n\nWhen responding to a customer ticket:\n1. Identify the type of issue (billing, technical, sales, or general)\n2. Provide a clear, helpful, and accurate response\n3. Include specific next steps or solutions when possible\n4. Be professional, empathetic, and concise\n\nAlways aim to resolve the customer's issue in a single response when possible."
        },
        "promptType": "define"
      },
      "typeVersion": 1.9
    },
    {
      "id": "50a96d78-7e71-43b0-abdb-7d3da34da77e",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        832,
        416
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4o-mini",
          "cachedResultName": "GPT-4O-MINI"
        },
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "8e6eb1bb-8c62-46b9-b975-29ec66563302",
      "name": "Evaluating?",
      "type": "n8n-nodes-base.evaluation",
      "position": [
        1104,
        192
      ],
      "parameters": {
        "operation": "checkIfEvaluating"
      },
      "typeVersion": 4.6
    },
    {
      "id": "a391933d-6a01-4630-9ce1-10c91cf65ec3",
      "name": "Production logic",
      "type": "n8n-nodes-base.noOp",
      "position": [
        1392,
        288
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "f7a4dad7-ddbd-48eb-9e57-135281985369",
      "name": "Score Response",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "position": [
        1328,
        96
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4o-mini",
          "cachedResultName": "GPT-4O-MINI"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "content": "=You are evaluating AI customer support responses for correctness and helpfulness.\n\nScore the following response on two dimensions, each on a 1-5 scale:\n\n**Correctness** (does the response match the expected answer?):\n- 5: Fully correct and complete\n- 4: Mostly correct, minor issues\n- 3: Partially correct\n- 2: Mostly incorrect\n- 1: Completely wrong\n\n**Helpfulness** (is the response clear, actionable, and useful?):\n- 5: Extremely helpful, clear next steps\n- 4: Helpful with minor gaps\n- 3: Somewhat helpful\n- 2: Barely helpful\n- 1: Not helpful at all\n\n**Expected:** {{  $('When fetching a dataset row').first().json.expected_output }}\n**AI Response:** {{  $json.output }}\n\nRespond with ONLY valid JSON: {\"correctness\": <1-5>, \"helpfulness\": <1-5>, \"justification\": \"<brief reason>\"}"
            }
          ]
        }
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.8
    },
    {
      "id": "ac3214a6-2ac9-4b09-9e7e-a484aa4f19d6",
      "name": "Evaluation - Set Outputs",
      "type": "n8n-nodes-base.evaluation",
      "position": [
        1680,
        96
      ],
      "parameters": {
        "source": "dataTable",
        "outputs": {
          "values": [
            {
              "outputName": "response",
              "outputValue": "={{ $('AI Agent').item.json.output }}"
            }
          ]
        },
        "dataTableId": {
          "__rl": true,
          "mode": "list",
          "value": "VPCxS9mO1gPbvyRa",
          "cachedResultUrl": "/projects/5xhYaLjYeyMka6t9/datatables/VPCxS9mO1gPbvyRa",
          "cachedResultName": "Customer Support QA Test Cases"
        }
      },
      "typeVersion": 4.6
    },
    {
      "id": "7624cc6f-4252-4e3d-8118-b8a4c719a5a5",
      "name": "Set Metrics",
      "type": "n8n-nodes-base.evaluation",
      "position": [
        1904,
        96
      ],
      "parameters": {
        "metrics": {
          "assignments": [
            {
              "id": "m1",
              "name": "correctness_score",
              "type": "number",
              "value": "={{ JSON.parse($json.message.content.replace(/```json\\n?|\\n?```/g, '').trim()).correctness }}"
            },
            {
              "id": "3025c8aa-7fc4-4ee0-bc94-5befa946f85d",
              "name": "helpfulness_score",
              "type": "number",
              "value": "={{ JSON.parse($json.message.content.replace(/```json\\n?|\\n?```/g, '').trim()).helpfulness }}"
            }
          ]
        },
        "operation": "setMetrics"
      },
      "typeVersion": 4.6
    },
    {
      "id": "060eebc6-f857-41a3-ad8c-e77df3f761da",
      "name": "Check Threshold",
      "type": "n8n-nodes-base.code",
      "position": [
        2128,
        96
      ],
      "parameters": {
        "jsCode": "// Calculate average score from Set Metrics output\nconst items = $input.all();\nconst scores = items.map(i => {\n    const correctness = i.json.correctness_score || 0;\n    const helpfulness = i.json.helpfulness_score || 0;\n    return (correctness + helpfulness) / 2;\n});\nconst avgScore = scores.reduce((a, b) => a + b, 0) / scores.length;\nconst threshold = 3.5;\n\nreturn [{\n    json: {\n          average_score: Math.round(avgScore * 100) / 100,\n          total_cases: scores.length,\n          below_threshold: avgScore < threshold,\n          threshold: threshold,\n          scores: scores\n    }\n}];"
      },
      "typeVersion": 2
    },
    {
      "id": "0ab6fc91-0aac-4ef4-8439-11788cb87d54",
      "name": "Below Threshold?",
      "type": "n8n-nodes-base.if",
      "position": [
        2352,
        96
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "loose"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "1",
              "operator": {
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.below_threshold }}",
              "rightValue": "true"
            }
          ]
        },
        "looseTypeValidation": true
      },
      "typeVersion": 2.2
    },
    {
      "id": "8c88b554-8b07-4734-b581-dd5069b61114",
      "name": "Slack Alert",
      "type": "n8n-nodes-base.slack",
      "position": [
        2576,
        0
      ],
      "parameters": {
        "text": "=:warning: *AI Quality Alert*\n\nThe daily evaluation detected a quality drop.\n\n*Average Score:* {{ $json.average_score }}/5\n*Threshold:* {{ $json.threshold }}/5\n*Test Cases:* {{ $json.total_cases }}\n\nPlease investigate recent model or prompt changes.",
        "select": "channel",
        "channelId": {
          "__rl": true,
          "mode": "list",
          "value": "C0AFUTD89K5",
          "cachedResultName": "n8n-tests"
        },
        "otherOptions": {},
        "authentication": "oAuth2"
      },
      "credentials": {
        "slackOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "758a87c7-82f2-474d-9a75-81b212393c20",
      "name": "All Clear",
      "type": "n8n-nodes-base.noOp",
      "position": [
        2576,
        192
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "107c5c3a-673c-4867-a536-33c981a1ab8c",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        256,
        -48
      ],
      "parameters": {
        "color": 7,
        "width": 768,
        "height": 640,
        "content": "## Customer Support Agent"
      },
      "typeVersion": 1
    },
    {
      "id": "2d323441-4a2e-40b9-9b55-509dc1016d46",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1040,
        -48
      ],
      "parameters": {
        "color": 7,
        "width": 992,
        "height": 640,
        "content": "## LLM-as-a-Judge"
      },
      "typeVersion": 1
    },
    {
      "id": "7782730e-7cd6-4735-8c3d-01b2b71ea0fb",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2064,
        -176
      ],
      "parameters": {
        "color": 7,
        "width": 832,
        "height": 768,
        "content": "## Monitor Degradations"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "AI Agent": {
      "main": [
        [
          {
            "node": "Evaluating?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Evaluating?": {
      "main": [
        [
          {
            "node": "Score Response",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Production logic",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set Metrics": {
      "main": [
        [
          {
            "node": "Check Threshold",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Daily Schedule": {
      "main": [
        [
          {
            "node": "AI Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Score Response": {
      "main": [
        [
          {
            "node": "Evaluation - Set Outputs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Threshold": {
      "main": [
        [
          {
            "node": "Below Threshold?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Below Threshold?": {
      "main": [
        [
          {
            "node": "Slack Alert",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "All Clear",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Format eval input": {
      "main": [
        [
          {
            "node": "AI Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "AI Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Evaluation - Set Outputs": {
      "main": [
        [
          {
            "node": "Set Metrics",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When fetching a dataset row": {
      "main": [
        [
          {
            "node": "Format eval input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
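
The sticky note's customization idea of per-metric thresholds, where a low correctness score alerts a different channel than a low helpfulness score, could look roughly like the sketch below. Everything here is an assumption for illustration: the `LIMITS` values, the channel names, and the `alertsFor` function do not appear in the template, and in n8n the routing would more likely be done with IF or Switch nodes feeding separate Slack nodes.

```javascript
// Hypothetical per-metric limits and channel names; neither appears in the template.
const LIMITS = {
  correctness: { min: 4.0, channel: "#ai-correctness-alerts" },
  helpfulness: { min: 3.0, channel: "#ai-helpfulness-alerts" },
};

// Return one alert per metric whose score falls below its own limit.
// Metrics that are missing or non-numeric are skipped rather than treated as failures.
function alertsFor(scores, limits = LIMITS) {
  return Object.entries(limits)
    .filter(([metric, { min }]) => typeof scores[metric] === "number" && scores[metric] < min)
    .map(([metric, { min, channel }]) => ({ metric, score: scores[metric], min, channel }));
}

const alerts = alertsFor({ correctness: 3, helpfulness: 4 });
console.log(alerts); // one alert, for correctness only
```

Keeping the limits in a single object makes it easy to add further judge dimensions later without touching the routing logic.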

Credentials you'll need

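Each integration node prompts for credentials when you import the workflow: an OpenAI API credential for the OpenAI Chat Model and Score Response nodes, and a Slack OAuth2 credential for the Slack Alert node. Credential IDs are stripped before publishing, so you add your own.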

Source: https://n8n.io/workflows/15135/ (original creator credit).
