Most-used Evaluation workflows
- Evaluate AI Workflows Using Google Sheets, Gemini, Claude, GPT, and Perplexity (64 nodes)
- My Solution for the "Agentic Arena Community Contest" (RAG, Qdrant, Mistral OCR) — n8n Evaluation workflow (41 nodes)
- FAQs Embeddings (35 nodes)
- Route and Qualify Email Leads with Gmail, Gemini, Slack, Sheets and Salesforce — n8n Evaluation workflow (35 nodes)
- Route Event Sales Leads with Gmail, Google Gemini, Sheets and Salesforce (35 nodes)
- FAQs Embeddings (Google Docs) — n8n Evaluation workflow (35 nodes)
- Automate Reddit Replies with F5Bot Alerts & GPT-5 Personalized Comments (31 nodes)
- Custom Discord Notifications for Radarr, Sonarr, Bazarr Etc. — n8n Evaluation workflow (28 nodes)
- Evaluate AI Agent Response Correctness with OpenAI and RAGAS Methodology (27 nodes)
- Evaluation Metric Example: RAG Document Relevance — n8n Evaluation workflow (26 nodes)
This template and YouTube video go over five different implementations of evaluations within n8n: categorization, correctness, tools used, string similarity, and helpfulness.
🤖📈 This workflow is my personal solution for the Agentic Arena Community Contest, where the goal is to build a Retrieval-Augmented Generation (RAG) AI agent capable of answering questions based on a p
FAQs Embeddings. Uses googleDocs, openAi, supabase, httpRequest. Event-driven trigger; 35 nodes.
Who is this for? Event sales teams & conference organizers processing 100+ sponsor/partner emails weekly who need instant lead qualification, Salesforce automation, & pipeline analytics.
Email Sentiment Router for Event Sales Leads
FAQs Embeddings. Uses googleDocs, openAi, supabase, httpRequest. Event-driven trigger; 35 nodes.
Automate how you reply to Reddit posts using AI-generated, first-person comments that sound human, follow subreddit rules, and (optionally) promote your own links or products.
This is a simple template that will allow you to customise the notifications in Radarr, Sonarr, Bazarr and similar. By default the notifications are configured to be sent to Discord and look similar to
The scoring approach is adapted from the open-source evaluations project RAGAS and you can see the source here https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/answercorre
This is a template for n8n's evaluation feature.
The scoring approach is adapted from https://cloud.google.com/vertex-ai/generative-ai/docs/models/metrics-templates#pointwise_groundedness This evaluation works best for an agent that requires documen
Developers building AI-powered workflows who want to ensure their agents work reliably. If you need to validate AI outputs, test agent behavior systematically, or build maintainable automation, this t
The scoring approach is adapted from the open-source evaluations project RAGAS and you can see the source here https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/answersimil
This n8n template demonstrates how to deploy an AI workflow in production while simultaneously running a robust, data-driven Evaluation Framework to ensure quality and optimize costs.
The scoring approach is adapted from the open-source evaluations project RAGAS and you can see the source here https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/answerrelev
Catch AI quality drift before your users do. This template ties scheduled evaluation, LLM-as-a-Judge scoring, and threshold-based alerts into a continuous monitoring loop that fires a Slack alert the
The scoring approach is adapted from https://cloud.google.com/vertex-ai/generative-ai/docs/models/metrics-templates#pointwisesummarizationquality This evaluation works best for an AI summarization wor
Who's it for
This is a template for n8n's evaluation feature.
This workflow is a beginner-friendly tutorial demonstrating how to use the Evaluation tool to automatically score the AI’s output against a known correct answer (“ground truth”) stored in a Google She
Score open-ended AI responses with a judge model. This template shows how to evaluate a customer support agent using a separate LLM that rates each response on correctness and helpfulness, going beyon
This is a template for n8n's evaluation feature.
This is a template for n8n's evaluation feature.
Measure how well your AI classifier actually performs. This template shows how to evaluate a support ticket classifier using n8n's built-in evaluation system, comparing AI predictions against expected
This is a template for n8n's evaluation feature.
evaluation. Uses formTrigger, lmChatOpenAi, agent, form. Event-driven trigger; 10 nodes.
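Several of the descriptions above share one idea: compare AI output with an expected ("ground truth") answer, turn the comparison into a score, and alert when the score drops below a threshold. The Python sketch below illustrates that idea outside n8n. It is not taken from any of the listed workflows; the `EvalRow` type, the sample tickets, and the 0.8 threshold are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalRow:
    """One row of a hypothetical evaluation dataset."""
    input_text: str
    expected: str   # ground-truth label, e.g. from a spreadsheet column
    predicted: str  # label produced by the AI step under test


def normalize(label: str) -> str:
    # Ignore case and surrounding whitespace so "Billing " matches "billing".
    return label.strip().lower()


def accuracy(rows: list[EvalRow]) -> float:
    """Fraction of rows whose prediction matches the expected label."""
    if not rows:
        raise ValueError("no evaluation rows")
    hits = sum(normalize(r.expected) == normalize(r.predicted) for r in rows)
    return hits / len(rows)


def below_threshold(score: float, threshold: float = 0.8) -> bool:
    """True when the score is low enough to warrant an alert (assumed cutoff)."""
    return score < threshold


# Made-up sample data: three of the four predictions match.
rows = [
    EvalRow("Refund not received", "billing", "Billing"),
    EvalRow("App crashes on login", "bug", "bug"),
    EvalRow("How do I export data?", "how-to", "billing"),
    EvalRow("Cancel my plan", "billing", "billing "),
]

score = accuracy(rows)
alert = below_threshold(score)
print(f"accuracy={score:.2f} alert={alert}")
```

Exact-match accuracy only suits closed label sets such as ticket categories. Open-ended answers usually need a similarity measure or a judge model, as several of the templates describe.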
FAQ
How many n8n Evaluation workflows are in the catalog?
26 n8n workflows in AutomationFlows currently use the Evaluation integration — triggers, actions, or both.
How do I connect Evaluation in n8n?
After importing the workflow JSON, n8n will prompt for Evaluation credentials on the relevant nodes. AutomationFlows strips credential IDs before publishing — you'll add your own.
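Before importing, it can help to see which nodes will ask for credentials. The sketch below assumes the exported workflow JSON has a top-level `nodes` list and that each node needing credentials carries a `credentials` object keyed by credential type. That layout is an assumption about the export, and the sample workflow is invented for illustration, so check it against the file you actually download.

```python
import json


def nodes_needing_credentials(workflow_json: str) -> dict[str, list[str]]:
    """Map node name -> credential types referenced by that node.

    Assumes a top-level "nodes" list whose entries may have a
    "credentials" mapping; nodes without one are skipped.
    """
    workflow = json.loads(workflow_json)
    needed: dict[str, list[str]] = {}
    for node in workflow.get("nodes", []):
        creds = node.get("credentials") or {}
        if creds:
            needed[node.get("name", "<unnamed>")] = sorted(creds)
    return needed


# Invented sample: one node references a credential (no ID), one does not.
sample = json.dumps({
    "nodes": [
        {
            "name": "OpenAI Chat Model",
            "credentials": {"openAiApi": {"name": "OpenAI account"}},
        },
        {"name": "Set Metrics", "parameters": {}},
    ]
})

print(nodes_needing_credentials(sample))
```

Each node name in the result is a place where you would attach your own credential after import.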
Can I combine these with other integrations?
Yes — most Evaluation workflows pair with adjacent tools (Slack alerts, Google Sheets logging, OpenAI summarisation). Browse the integration tags on each workflow page to discover pairings.