Extract Specific Website Data with Form Input, Gemini 2.5 Flash and Gmail

By Billy Christi (@billy) on n8n.io

This workflow creates an automated web scraper that accepts form submissions, extracts specific data from any website using AI, and emails the results back to you.
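The stages in that description can be sketched outside n8n in plain Python. This is only an illustration of the data flow, not how n8n implements its nodes: `extract_body`, `build_prompt`, and `parse_result` are hypothetical names, skipping script and style content is a simplification chosen here, the prompt is abridged from the template's, and the fallback for malformed model output is an assumption (the template only defines "No data found" for the no-match case). The HTTP fetch and the Gemini call are left out so the sketch runs offline.

```python
import json
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect visible text inside <body>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_body(html: str) -> str:
    """Rough stand-in for an HTML extraction step with the CSS selector "body"."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_prompt(request: str, body: str) -> str:
    """Abridged version of the template's prompt: request first, source data last."""
    return (
        "Your task is to extract the exact information specified by the user.\n\n"
        f'User\'s extraction request:\n"{request}"\n\n'
        "Always return the response in this format:\n"
        '{ "result": "extracted value(s)" }\n\n'
        f"Here is the source data:\n{body}\n"
    )


def parse_result(raw: str) -> str:
    """Return the "result" field; fall back to "No data found" (assumed behavior)."""
    try:
        value = json.loads(raw).get("result")
    except (json.JSONDecodeError, AttributeError):
        return "No data found"
    return value if isinstance(value, str) and value else "No data found"


page = "<html><head><title>T</title></head><body><h1>Price</h1><p>$19.99</p></body></html>"
prompt = build_prompt("the price", extract_body(page))
print(parse_result('{"result": "$19.99"}'))
```

In the real workflow the prompt goes to Gemini 2.5 Flash and the parsed result is placed in the email body.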

Category: AI & RAG · Trigger: Event · Nodes: 13 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Output Parser Structured, HTTP Request, Chain Llm, Gmail, Form Trigger, Google Gemini Chat

This workflow corresponds to n8n.io template #7754 — we link there as the canonical source.

This workflow follows the Chain Llm → Form Trigger recipe pattern, which pairs these two integrations.

The workflow JSON

Copy the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and activate it.

{
  "nodes": [
    {
      "id": "0f86ab8c-2915-4d0a-b2fd-802c3740053b",
      "name": "Structured Output Parser",
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "position": [
        1080,
        700
      ],
      "parameters": {
        "jsonSchemaExample": "{\n    \"result\": \"extracted value(s)\"\n}"
      },
      "typeVersion": 1.2
    },
    {
      "id": "bb283c5b-77b7-4b19-8834-8d67a3d93cb9",
      "name": "Get HTML from source url",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        480,
        500
      ],
      "parameters": {
        "url": "={{ $json['Source URL'] }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "6d2fe6d0-8d0a-466e-8550-b3a7fa79ab06",
      "name": "Data Extractor LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        900,
        500
      ],
      "parameters": {
        "text": "=Your task is to extract the exact information specified by the user.\n\nUser\u2019s extraction request:\n\"{{ $('Web Scraper form submission').item.json['Data to extract'] }}\"\n\nRules:\n1. Extract ONLY the requested information.\n2. If multiple matches exist, combine them into a single string separated by commas.\n3. Do NOT add explanations or extra text\u2014output only the extracted data.\n4. Maintain the original values unless formatting is requested.\n5. If no matches are found, return: { \"result\": \"No data found\" }.\n6. Always return the response in this format:\n{\n    \"result\": \"extracted value(s)\"\n}\n\nHere is the source data:\n{{ $json.body }}\n",
        "promptType": "define",
        "hasOutputParser": true
      },
      "typeVersion": 1.6
    },
    {
      "id": "1913ba31-fd5a-44ee-baa7-98d4346d6dd4",
      "name": "Gmail - Send Result",
      "type": "n8n-nodes-base.gmail",
      "position": [
        1320,
        500
      ],
      "parameters": {
        "sendTo": "user@example.com",
        "message": "=Your web scraping task has been completed.\n\nSource URL:\n{{ $('Web Scraper form submission').item.json['Source URL'] }}\n\nData Requested:\n{{ $('Web Scraper form submission').item.json['Data to extract'] }}\n\nExtracted Result:\n{{ $json.output.result }}\n\nThank you for using our web scraping automation.",
        "options": {
          "appendAttribution": false
        },
        "subject": "=\u2705 Web Scraping Result for {{ $('Web Scraper form submission').item.json['Source URL'] }}",
        "emailType": "text"
      },
      "credentials": {
        "gmailOAuth2": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "acb22cbf-9f79-49c4-8f5f-05a951b27f9c",
      "name": "Web Scraper form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        280,
        500
      ],
      "parameters": {
        "options": {},
        "formTitle": "Web Scraper Form",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Source URL"
            },
            {
              "fieldLabel": "Data to extract"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4e206df0-7b3f-4303-a6a4-ddcd73fb6bf9",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -320,
        420
      ],
      "parameters": {
        "color": 4,
        "width": 500,
        "height": 360,
        "content": "## SETUP REQUIRED\n\nWorkflow Configurations:\n- Update the email recipient in the Gmail node (currently set to user@example.com)\n- Adjust the JSON schema in the Structured Output Parser if you need different output formats\n- Modify the LLM prompt in the Data Extractor LLM Chain based on your specific extraction requirements\n\nRequired Credentials:\n- Google Gemini API Key (Google PaLM API account)\n- Gmail Credential for sending result emails"
      },
      "typeVersion": 1
    },
    {
      "id": "d2a7d845-47dc-4986-b897-26c56232751f",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -420,
        920
      ],
      "parameters": {
        "color": 4,
        "width": 600,
        "height": 400,
        "content": "## \ud83d\udd0dExtract Specific Website Data with Form Input, Gemini 2.5 flash and Gmail Delivery\n\nWhat This Template Does:\n\n- Provides a web form interface for users to submit scraping requests\n- Accepts any website URL and custom data extraction requirements\n- Fetches HTML content from the specified source URL\n- Uses Google Gemini AI to intelligently extract only the requested information\n- Processes raw HTML content and returns structured JSON results\n- Automatically sends extraction results via Gmail with detailed reporting\n- Handles various data types and formats while maintaining original values unless formatting is requested\n"
      },
      "typeVersion": 1
    },
    {
      "id": "261e3806-c514-4bbc-934f-cbade5550f88",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        220,
        920
      ],
      "parameters": {
        "color": 4,
        "width": 1000,
        "height": 300,
        "content": "## \ud83d\udccb WORKFLOW PROCESS OVERVIEW\n\nStep 1: \ud83d\udcdd Web Scraper Form Submission triggers the workflow when users submit URL and extraction requirements\nStep 2: \ud83c\udf10 Get HTML from Source URL fetches the complete HTML content from the provided website\nStep 3: \ud83d\udd27 HTML Extractor processes the raw HTML and extracts the body content for analysis\nStep 4: \ud83e\udd16 Data Extractor LLM Chain uses Google Gemini AI to analyze content and extract only the specific data requested by the user\nStep 5: \ud83d\udcca Structured Output Parser formats the AI response into clean JSON structure with standardized format\nStep 6: \ud83d\udce7 Gmail Send Result delivers the extraction results via email including:\n  - Original source URL\n  - Data extraction request details  \n  - Clean extracted results\n  - Professional formatting with success confirmation"
      },
      "typeVersion": 1
    },
    {
      "id": "690951dd-5be4-4e54-b86f-669c2ee51de8",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        820,
        320
      ],
      "parameters": {
        "color": 4,
        "width": 400,
        "height": 560,
        "content": "## Data Extractor LLM Chain  \nThis is where we extract the content based on the user request  \n\nConfiguration:  \nYou can update the prompt and the model here to adjust to your use case.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "1240b386-bbce-4d44-afe9-748783e457ab",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1240,
        340
      ],
      "parameters": {
        "color": 4,
        "width": 260,
        "height": 340,
        "content": "## Gmail - Send Results  \n\nConfiguration:  \nUpdate the target email  \nUpdate the email subject and body  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "ddc340d6-9e3c-4326-90cc-e1b06aa3c752",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        880,
        700
      ],
      "parameters": {
        "options": {},
        "modelName": "models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "abb31cf4-2d1a-426b-a42d-821729c15f7b",
      "name": "HTML Extractor",
      "type": "n8n-nodes-base.html",
      "position": [
        660,
        500
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "body",
              "cssSelector": "body"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "f08e9aaa-344c-4791-8a34-2df6aadcbf83",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1540,
        140
      ],
      "parameters": {
        "color": 4,
        "width": 380,
        "height": 760,
        "content": "# \ud83d\udc4b Hi, I\u2019m Billy\n![My Photo](https://i.ibb.co/Gvn63Bzc/Billy-Christi-AI-Automation.jpg)\nI help businesses build **n8n workflows** & **AI automation projects**.  \nNeed help with n8n or AI Automation projects? \nContact me and let\u2019s build your automation together.\n\n\ud83d\udce9 **Email:** billychartanto@gmail.com  \n\ud83e\udd1d **n8n Creator:** [n8n.io/creators/billy](https://n8n.io/creators/billy/)\n\ud83c\udf10 **My n8n Projects:** [billychristi.com/n8n](https://www.billychristi.com/n8n)  \n\n\n\n---\n\ud83d\udca1 Feel free to get in touch if you\u2019d like help on your next automation project or if you have any feedback or thoughts to share.\n"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "HTML Extractor": {
      "main": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Extractor LLM Chain": {
      "main": [
        [
          {
            "node": "Gmail - Send Result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get HTML from source url": {
      "main": [
        [
          {
            "node": "HTML Extractor",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    },
    "Web Scraper form submission": {
      "main": [
        [
          {
            "node": "Get HTML from source url",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
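To see the execution order, you can walk the "connections" object from the trigger node. The sketch below assumes the export format shown above (each output maps to a list of groups of `{"node", "type", "index"}` targets) and follows only the first "main" target of each node, which is enough for a linear workflow like this one. `link`, `main_chain`, and `sub_nodes` are hypothetical helper names, and `CONNECTIONS` is a hand-copied equivalent of the connections above.

```python
def link(node: str, kind: str = "main") -> dict:
    """Build one connection entry in the same shape as the export above."""
    return {kind: [[{"node": node, "type": kind, "index": 0}]]}


CONNECTIONS = {
    "HTML Extractor": link("Data Extractor LLM Chain"),
    "Data Extractor LLM Chain": link("Gmail - Send Result"),
    "Get HTML from source url": link("HTML Extractor"),
    "Google Gemini Chat Model": link("Data Extractor LLM Chain", "ai_languageModel"),
    "Structured Output Parser": link("Data Extractor LLM Chain", "ai_outputParser"),
    "Web Scraper form submission": link("Get HTML from source url"),
}


def main_chain(connections: dict, start: str) -> list[str]:
    """Follow the first "main" output of each node, starting at `start`."""
    order = [start]
    seen = {start}
    current = start
    while True:
        outputs = connections.get(current, {}).get("main", [])
        targets = outputs[0] if outputs else []
        if not targets:
            return order
        current = targets[0]["node"]
        if current in seen:  # guard against accidental loops
            return order
        seen.add(current)
        order.append(current)


def sub_nodes(connections: dict, parent: str) -> dict[str, str]:
    """Map each non-"main" connection type to the sub-node attached to `parent`."""
    found = {}
    for name, outputs in connections.items():
        for conn_type, groups in outputs.items():
            if conn_type == "main":
                continue
            for group in groups:
                for target in group:
                    if target["node"] == parent:
                        found[conn_type] = name
    return found


for step, name in enumerate(main_chain(CONNECTIONS, "Web Scraper form submission"), 1):
    print(step, name)
print(sub_nodes(CONNECTIONS, "Data Extractor LLM Chain"))
```

The main chain runs from the form trigger through the HTTP request, the HTML Extractor, and the LLM chain to Gmail. The Gemini model and the Structured Output Parser attach to the LLM chain as sub-nodes and are not steps in the chain.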

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
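If you want to know up front which credentials to prepare, you can read them from the JSON itself. This is a minimal sketch assuming the export shape shown above, where a node that needs credentials carries a "credentials" object keyed by credential type. `required_credentials` is a hypothetical helper, and `WORKFLOW` is a trimmed stand-in for the full export.

```python
def required_credentials(workflow: dict) -> dict[str, list[str]]:
    """Map each credential type to the names of the nodes that ask for it."""
    needed: dict[str, list[str]] = {}
    for node in workflow.get("nodes", []):
        for cred_type in node.get("credentials", {}):
            needed.setdefault(cred_type, []).append(node["name"])
    return needed


# Trimmed stand-in for the workflow JSON: only the fields this check reads.
WORKFLOW = {
    "nodes": [
        {"name": "Get HTML from source url"},
        {
            "name": "Gmail - Send Result",
            "credentials": {"gmailOAuth2": {"name": "<your credential>"}},
        },
        {
            "name": "Google Gemini Chat Model",
            "credentials": {"googlePalmApi": {"name": "<your credential>"}},
        },
    ]
}

for cred_type, nodes in required_credentials(WORKFLOW).items():
    print(f"{cred_type}: {', '.join(nodes)}")
```

For this workflow that means a Gmail OAuth2 credential for the Gmail node and a Google Gemini (PaLM) API credential for the chat model node. To run it on the real export, load the file with `json.load` and pass the result in place of `WORKFLOW`.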

About this workflow

Source: https://n8n.io/workflows/7754/ (original creator credit)
