{
  "nodes": [
    {
      "id": "0f86ab8c-2915-4d0a-b2fd-802c3740053b",
      "name": "Structured Output Parser",
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "position": [
        1080,
        700
      ],
      "parameters": {
        "jsonSchemaExample": "{\n    \"result\": \"extracted value(s)\"\n}"
      },
      "typeVersion": 1.2
    },
    {
      "id": "bb283c5b-77b7-4b19-8834-8d67a3d93cb9",
      "name": "Get HTML from source url",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        480,
        500
      ],
      "parameters": {
        "url": "={{ $json['Source URL'] }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "6d2fe6d0-8d0a-466e-8550-b3a7fa79ab06",
      "name": "Data Extractor LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        900,
        500
      ],
      "parameters": {
        "text": "=Your task is to extract the exact information specified by the user.\n\nUser\u2019s extraction request:\n\"{{ $('Web Scraper form submission').item.json['Data to extract'] }}\"\n\nRules:\n1. Extract ONLY the requested information.\n2. If multiple matches exist, combine them into a single string separated by commas.\n3. Do NOT add explanations or extra text\u2014output only the extracted data.\n4. Maintain the original values unless formatting is requested.\n5. If no matches are found, return: { \"result\": \"No data found\" }.\n6. Always return the response in this format:\n{\n    \"result\": \"extracted value(s)\"\n}\n\nHere is the source data:\n{{ $json.body }}\n",
        "promptType": "define",
        "hasOutputParser": true
      },
      "typeVersion": 1.6
    },
    {
      "id": "1913ba31-fd5a-44ee-baa7-98d4346d6dd4",
      "name": "Gmail - Send Result",
      "type": "n8n-nodes-base.gmail",
      "position": [
        1320,
        500
      ],
      "parameters": {
        "sendTo": "user@example.com",
        "message": "=Your web scraping task has been completed.\n\nSource URL:\n{{ $('Web Scraper form submission').item.json['Source URL'] }}\n\nData Requested:\n{{ $('Web Scraper form submission').item.json['Data to extract'] }}\n\nExtracted Result:\n{{ $json.output.result }}\n\nThank you for using our web scraping automation.",
        "options": {
          "appendAttribution": false
        },
        "subject": "=\u2705 Web Scraping Result for {{ $('Web Scraper form submission').item.json['Source URL'] }}",
        "emailType": "text"
      },
      "credentials": {
        "gmailOAuth2": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "acb22cbf-9f79-49c4-8f5f-05a951b27f9c",
      "name": "Web Scraper form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        280,
        500
      ],
      "parameters": {
        "options": {},
        "formTitle": "Web Scraper Form",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Source URL"
            },
            {
              "fieldLabel": "Data to extract"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4e206df0-7b3f-4303-a6a4-ddcd73fb6bf9",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -320,
        420
      ],
      "parameters": {
        "color": 4,
        "width": 500,
        "height": 360,
        "content": "## SETUP REQUIRED\n\nWorkflow Configuration:\n- Update the email recipient in the Gmail node (currently set to the placeholder user@example.com)\n- Adjust the JSON schema in the Structured Output Parser if you need a different output format\n- Modify the LLM prompt in the Data Extractor LLM Chain to match your extraction requirements\n\nRequired Credentials:\n- Google Gemini API key (Google PaLM API account)\n- Gmail OAuth2 credential for sending result emails"
      },
      "typeVersion": 1
    },
    {
      "id": "d2a7d845-47dc-4986-b897-26c56232751f",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -420,
        920
      ],
      "parameters": {
        "color": 4,
        "width": 600,
        "height": 400,
        "content": "## \ud83d\udd0d Extract Specific Website Data with Form Input, Gemini 2.5 Flash and Gmail Delivery\n\nWhat This Template Does:\n\n- Provides a web form where users submit scraping requests\n- Accepts any website URL and a free-text description of the data to extract\n- Fetches the HTML content from the specified source URL\n- Uses Google Gemini to extract only the requested information\n- Processes the page's body content and returns structured JSON results\n- Sends the extraction result via Gmail, together with the source URL and the original request\n- Keeps original values unchanged unless the request asks for formatting\n"
      },
      "typeVersion": 1
    },
    {
      "id": "261e3806-c514-4bbc-934f-cbade5550f88",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        220,
        920
      ],
      "parameters": {
        "color": 4,
        "width": 1000,
        "height": 300,
        "content": "## \ud83d\udccb WORKFLOW PROCESS OVERVIEW\n\nStep 1: \ud83d\udcdd Web Scraper form submission triggers the workflow when a user submits a source URL and a description of the data to extract\nStep 2: \ud83c\udf10 Get HTML from source url fetches the HTML content of the provided page\nStep 3: \ud83d\udd27 HTML Extractor selects the `body` element from the raw HTML for analysis\nStep 4: \ud83e\udd16 Data Extractor LLM Chain uses Google Gemini to analyze the content and extract only the data the user requested\nStep 5: \ud83d\udcca Structured Output Parser parses the AI response into a JSON object with a single `result` field\nStep 6: \ud83d\udce7 Gmail - Send Result emails the extraction result, including:\n  - Original source URL\n  - Data extraction request\n  - Extracted result\n  - A completion confirmation"
      },
      "typeVersion": 1
    },
    {
      "id": "690951dd-5be4-4e54-b86f-669c2ee51de8",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        820,
        320
      ],
      "parameters": {
        "color": 4,
        "width": 400,
        "height": 560,
        "content": "## Data Extractor LLM Chain  \nThis is where the requested data is extracted from the page content.  \n\nConfiguration:  \nUpdate the prompt and the model here to fit your use case.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "1240b386-bbce-4d44-afe9-748783e457ab",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1240,
        340
      ],
      "parameters": {
        "color": 4,
        "width": 260,
        "height": 340,
        "content": "## Gmail - Send Result  \n\nConfiguration:  \nUpdate the recipient email address  \nUpdate the email subject and body  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "ddc340d6-9e3c-4326-90cc-e1b06aa3c752",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        880,
        700
      ],
      "parameters": {
        "options": {},
        "modelName": "models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "abb31cf4-2d1a-426b-a42d-821729c15f7b",
      "name": "HTML Extractor",
      "type": "n8n-nodes-base.html",
      "position": [
        660,
        500
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "body",
              "cssSelector": "body"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "f08e9aaa-344c-4791-8a34-2df6aadcbf83",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1540,
        140
      ],
      "parameters": {
        "color": 4,
        "width": 380,
        "height": 760,
        "content": "# \ud83d\udc4b Hi, I\u2019m Billy\n![My Photo](https://i.ibb.co/Gvn63Bzc/Billy-Christi-AI-Automation.jpg)\nI help businesses build **n8n workflows** & **AI automation projects**.  \nNeed help with n8n or AI Automation projects? \nContact me and let\u2019s build your automation together.\n\n\ud83d\udce9 **Email:** billychartanto@gmail.com  \n\ud83e\udd1d **n8n Creator:** [n8n.io/creators/billy](https://n8n.io/creators/billy/)\n\ud83c\udf10 **My n8n Projects:** [billychristi.com/n8n](https://www.billychristi.com/n8n)  \n\n\n\n---\n\ud83d\udca1 Feel free to get in touch if you\u2019d like help on your next automation project or if you have any feedback or thoughts to share.\n"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "HTML Extractor": {
      "main": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Extractor LLM Chain": {
      "main": [
        [
          {
            "node": "Gmail - Send Result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get HTML from source url": {
      "main": [
        [
          {
            "node": "HTML Extractor",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    },
    "Web Scraper form submission": {
      "main": [
        [
          {
            "node": "Get HTML from source url",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}