From Zero to Published: My Daily AI News-to-Blog Pipeline with n8n

How I built a fully automated system that scrapes AI news, asks me what to publish over Telegram, writes the post, generates the image, and pushes it to my CMS — every morning at 8am, without writing a backend.

🧠 Why n8n for This Kind of Pipeline

I've been running a Proxmox homelab for a while — n8n, Ollama, Cloudflare Tunnels, all self-hosted. When I decided I wanted a daily pipeline that monitors AI news and turns the best stories into published blog posts, I had a choice: write a backend, or wire it visually in n8n.

I went visual. And the honest reason isn't laziness — it's debugging speed. Every step in the pipeline is a node. When something breaks at 8am while I'm making coffee, I can open n8n, click the failed node, and see exactly what came in and what went wrong. In a code-based backend, that same investigation might take 20 minutes of log spelunking.

🏠 Homelab context: n8n runs as a self-hosted LXC container on Proxmox, exposed externally via a Cloudflare Tunnel. All API keys (Claude, Telegram, image generation, CMS) live as n8n environment variables. The workflow itself contains zero credentials.

The other reason: this pipeline has a human-in-the-loop step. I want to decide what gets published. n8n's Telegram node and webhook handling make that kind of interactive, approval-based flow genuinely easy to build. Doing that in a traditional backend would require a small web app, session state, a UI. In n8n it's three nodes.

🏗️ The Full Architecture at a Glance

Here's the shape of the whole system before we go into each part:

CODE

⏰ Schedule Trigger — 08:00 daily
   │
   ├──────────────────┬──────────────────┐
   ▼                  ▼                  ▼
🤖 Branch A         📰 Branch B         🦾 Branch C
AI Model News       AI Social Impact    Fun Tech / Robots
─────────────       ────────────────    ─────────────────
Scrape sitemaps     Scrape news sites   Scrape tech blogs
Claude/Gemini/OAI   Traditional + X     Gadgets/Hardware
Detect new posts    Filter AI stories   Broad interest
   │                  │                  │
   └──────────────────┴──────────────────┘
                       │
                       ▼
          📊 Google Sheets — append all stories
                       │
                       ▼
          ✉️  Telegram → send digest to me
          ← I reply: "all"  or  "1,3,5"
                       │
                       ▼
          🔁 Loop — for each selected story:
              🤖 Claude     → write blog post (JSON)
              🎨 Image API  → generate featured image
              🚀 CMS API    → publish post
              ✉️  Telegram  → notify me: post is live
              ↺  repeat until all stories published

The key principle: split → merge → decide → loop. Three independent scrapers feed one stream, I approve what gets published, then the workflow fans out into a sequential publishing loop. The visual canvas makes every branch legible at a glance — no mental model required.
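As a rough illustration of the merge step, the sketch below combines the output of the three branches into one list and drops stories that more than one branch surfaced. The helper name `mergeStories` and the `url`, `title`, and `branch` fields are assumptions made for the example, not names taken from the actual workflow.

```javascript
// Hypothetical helper: merge story lists from several branches, keeping the
// first occurrence of each URL. The url/title/branch fields are assumed names.
function mergeStories(...branches) {
  const seen = new Set();
  const merged = [];
  for (const stories of branches) {
    for (const story of stories) {
      const key = story.url.trim();
      if (seen.has(key)) continue; // already surfaced by an earlier branch
      seen.add(key);
      merged.push(story);
    }
  }
  return merged;
}

const mergedStories = mergeStories(
  [{ url: "https://example.com/a", title: "A", branch: "models" }],
  [{ url: "https://example.com/b", title: "B", branch: "impact" }],
  [{ url: "https://example.com/a", title: "A again", branch: "fun" }]
);
console.log(mergedStories.map(s => s.title)); // [ 'A', 'B' ]
```

Inside an n8n Code node the same function could be fed from the merged input items; the example uses literal arrays so it runs on its own.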

🔍 Three Searches, One Brain

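The pipeline fans out into three independent scraping branches. They sit side by side on the canvas, though n8n works through them one after another within a single execution. Each targets different content, and the scraping strategy for each is deliberately different.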

🤖 Branch A — AI Model News

Rather than scraping news aggregators for secondhand coverage, I go directly to the source — the official blogs of Anthropic, Google DeepMind, and OpenAI.

The trick is sitemap monitoring. Each company publishes an XML sitemap. I fetch it, extract every URL, filter for paths that match blog/news/research/announcements patterns, and diff against a list of known URLs stored in Google Sheets. If a URL is new, I pull the page content and pass it downstream.

Because the trigger fires once a day, I pick up a new model announcement on the first 08:00 run after it goes live, straight from the source rather than from secondhand coverage.

JAVASCRIPT

// n8n Code Node — sitemap diff checker
// Runs for each AI company: openai.com, anthropic.com, deepmind.google

const sitemapXml = $input.first().json.data; // raw sitemap XML response
const company    = $input.first().json.company; // passed from Set node

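// ─── Parse sitemap URLs ──────────────────────────────────────
// <loc> values may be padded with whitespace and use XML entities (&amp;)
const urlMatches = [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)];
const allUrls    = urlMatches.map(m => m[1].replace(/&amp;/g, "&"));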

// ─── Filter to likely blog/news posts ───────────────────────
const blogPatterns = [/\/blog\//, /\/news\//, /\/research\//,
                      /\/updates\//, /\/announcements\//];

const blogUrls = allUrls.filter(url =>
  blogPatterns.some(p => p.test(url))
);

// ─── Compare against known URLs stored in Google Sheets ─────
const knownUrls = $node["Load Known URLs"].json.urls ?? [];
const newUrls   = blogUrls.filter(u => !knownUrls.includes(u));

if (newUrls.length === 0) {
  return []; // nothing new — n8n skips downstream nodes
}

return newUrls.map(url => ({ json: { url, company, source: "sitemap" } }));

💡 Why sitemap monitoring beats news aggregators: Google News and TechCrunch typically lag the source by 2–6 hours. By monitoring the official sitemaps directly, I get new Anthropic or OpenAI posts from the source itself on the next scheduled run, without depending on anyone else's coverage. For a content pipeline, that edge matters.

📰 Branch B — AI Social Impact

This branch scrapes traditional news sources and X for stories about AI's broader social impact — jobs, regulation, ethics, economics. The intent is editorial: what is the world actually making of all this?

I use a mix of RSS feeds from publications that cover AI seriously, and a filtered X search for high-signal accounts. Claude scores each article for relevance before it hits the merge node — anything below a 6/10 relevance score gets dropped.
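The relevance gate can be sketched as a small filter. This assumes an earlier step has already attached a numeric `score` from 0 to 10 to each article; the field name, the helper name `filterByRelevance`, and the sample titles are made up for illustration.

```javascript
// Hypothetical relevance gate. Assumes an earlier step attached a numeric
// `score` (0 to 10) to each article; the field name is an assumption.
function filterByRelevance(articles, threshold = 6) {
  return articles.filter(
    a => Number.isFinite(a.score) && a.score >= threshold
  );
}

const scored = [
  { title: "EU passes AI liability rules", score: 8 },
  { title: "Celebrity buys a smart fridge", score: 3 },
  { title: "Survey on AI and hiring", score: 6 },
  { title: "Scoring step returned nothing", score: null },
];
console.log(filterByRelevance(scored).map(a => a.title));
// [ 'EU passes AI liability rules', 'Survey on AI and hiring' ]
```

Articles without a usable score are dropped rather than passed through, which errs on the side of a shorter digest.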

🦾 Branch C — Fun Tech & Robots

The lightest branch. Robots, weird gadgets, AI-powered hardware, things that make people go "that's wild". The goal is one or two genuinely interesting posts per week that aren't just another LLM announcement.

Less strict filtering — if it's vaguely AI-adjacent and interesting, it passes through.

✉️ Telegram: The Human-in-the-Loop Layer

This is the part of the workflow I'm most pleased with. Fully automated content pipelines that publish without human review make me nervous — especially for AI-generated blog posts. I want a checkpoint. But I didn't want to build a dashboard or a review UI.

Telegram solved it.

After the three branches merge and stories are written to Google Sheets, the workflow sends me a digest message listing each story with a number. I reply either all to publish everything, or 1,3,5 to pick specific stories. A Telegram webhook node catches the reply and routes it back into the workflow.
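A minimal sketch of how the numbered digest text might be assembled before it goes to the Telegram node. The helper name `buildDigest` and the `title` and `source` fields are assumptions for the example; the real message format may differ.

```javascript
// Hypothetical digest builder. The title/source fields are assumed names.
function buildDigest(stories) {
  // Number stories from 1 so the reply "1,3,5" maps onto what is shown
  const lines = stories.map((s, i) => `${i + 1}. ${s.title} (${s.source})`);
  return [
    "Today's stories:",
    ...lines,
    "",
    'Reply "all" or pick numbers, e.g. "1,3,5".',
  ].join("\n");
}

const digest = buildDigest([
  { title: "New model release", source: "sitemap" },
  { title: "AI and the job market", source: "rss" },
]);
console.log(digest);
```

The numbering has to stay in the same order as the stored story list, since the reply parser resolves each number back to a position in that list.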

✉️ Why Telegram and not email: Email introduces a context switch, so I'd likely batch-review it later and lose the moment. Telegram pings my phone, I glance at the list while still in morning mode, reply in five seconds, and the pipeline continues. The whole approval flow takes less time than opening a web app.

JAVASCRIPT

// n8n Code Node — parse Telegram reply into story list
// Handles "all" or comma-separated indices like "1,3,5"

// Non-text replies (stickers, photos) carry no message.text, so default to ""
const reply      = ($input.first().json.message?.text ?? "").trim().toLowerCase();
const allStories = $node["Merged Stories"].json.stories;

let selected;

if (reply === "all") {
  selected = allStories;
} else {
  // Parse "1,3,5" → pick those indices (1-based for human readability)
  const indices = reply
    .split(",")
    .map(s => parseInt(s.trim(), 10) - 1)
    .filter(i => i >= 0 && i < allStories.length); // NaN fails both tests

  // De-duplicate so a reply like "1,1,3" cannot publish a story twice
  selected = [...new Set(indices)].map(i => allStories[i]);
}

if (selected.length === 0) {
  throw new Error("No valid stories selected from reply: " + reply);
}

// Output each story as a separate item for the loop
return selected.map(story => ({ json: story }));

The output of this node is one item per story I approved. n8n's Split In Batches node (labelled Loop Over Items in recent n8n releases) then feeds them one at a time into the publishing loop.

🔁 The Content Loop: Blog → Image → Published

For each approved story, the workflow runs four steps in sequence. Then it moves to the next story and repeats.

Step 1 — 🤖 Blog Generation

Claude receives the raw story data and returns a structured JSON object containing the full post: title, slug, intro, HTML body, meta description, and — critically — an image generation prompt. The post and the image prompt are produced in the same call so they're coherent with each other.

JSON

{
  "system": "You are a tech blogger. Write in a direct, opinionated voice. Return ONLY valid JSON. No markdown fences. No prose before or after.",

  "user": "Write a blog post based on this story. Return JSON with these exact keys:\n  title        — punchy headline\n  slug         — url-safe slug, hyphens only\n  intro        — 2-sentence hook paragraph\n  body         — full HTML body using <h2>, <p>, <ul> tags\n  metaDesc     — 160-char SEO meta description\n  imagePrompt  — vivid scene description for image generation\n\nStory:\n{{ JSON.stringify($json) }}"
}

Step 2 — 🎨 Image Generation

The imagePrompt field from Claude's output gets passed to an image generation API. The result is an image URL that gets attached to the post object for the next step.

JSON

{
  "prompt": "{{ $json.imagePrompt }}, tech blog illustration, flat design, no text",
  "size": "1200x630",
  "style": "digital art"
}

Step 3 — 🚀 CMS Publish

A single HTTP Request node posts the assembled content to the CMS REST API. Every field is mapped from the validated JSON.

JSON

{
  "title":         "{{ $json.title }}",
  "slug":          "{{ $json.slug }}",
  "content":       "{{ $json.body }}",
  "excerpt":       "{{ $json.metaDesc }}",
  "featuredImage": "{{ $json.imageUrl }}",
  "status":        "published"
}
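One thing to watch with a quoted template like this: an HTML body usually contains double quotes and newlines, and dropping those straight into a quoted JSON string produces an invalid request body. A possible alternative, sketched here under the assumption that the payload is built in a Code node before the HTTP step, is to assemble a plain object and let `JSON.stringify` do the escaping. The helper name `buildCmsPayload` and the sample post are illustrative only.

```javascript
// Hypothetical payload builder using the same keys as the template above.
function buildCmsPayload(post) {
  return {
    title: post.title,
    slug: post.slug,
    content: post.body,
    excerpt: post.metaDesc,
    featuredImage: post.imageUrl,
    status: "published",
  };
}

const samplePost = {
  title: 'The "wild" robot demo',
  slug: "the-wild-robot-demo",
  body: '<p>She said "hello".</p>\n<p>Second paragraph.</p>',
  metaDesc: "A short description.",
  imageUrl: "https://example.com/image.png",
};

// JSON.stringify escapes quotes and newlines, so the body survives intact
const requestBody = JSON.stringify(buildCmsPayload(samplePost));
console.log(JSON.parse(requestBody).content === samplePost.body); // true
```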

Step 4 — ✉️ Telegram Confirmation

Once the CMS API returns a success response, a Telegram message fires with the post title and live URL. I get a ping for every publish. If I don't get a ping, I know something failed.

🔁 The loop pattern in n8n: Use Split In Batches with batch size 1 for sequential loops. Each iteration processes one story fully before the next starts. This keeps Telegram notifications in order and isolates failures cleanly: with the publishing nodes set to continue on error rather than stop the workflow, a failure on story 3 doesn't prevent stories 1, 2, 4, and 5 from publishing.
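Outside the canvas, the same idea looks roughly like the sketch below: process one story at a time and record a failure per story instead of aborting the run. `publishSequentially` and `fakePublish` are hypothetical names; `fakePublish` stands in for the generate, image, and CMS steps and is rigged to fail on one story.

```javascript
// Hypothetical sequential loop: one story at a time, failures recorded per
// story instead of aborting the whole run.
async function publishSequentially(stories, publishOne) {
  const results = [];
  for (const story of stories) {
    try {
      const url = await publishOne(story); // finishes before the next story starts
      results.push({ title: story.title, ok: true, url });
    } catch (err) {
      results.push({ title: story.title, ok: false, error: err.message });
    }
  }
  return results;
}

// Fake publisher for the demo: the story titled "three" always fails
async function fakePublish(story) {
  if (story.title === "three") throw new Error("CMS returned 422");
  return `https://example.com/${story.title}`;
}

const demoStories = ["one", "two", "three", "four", "five"].map(title => ({ title }));
const run = publishSequentially(demoStories, fakePublish);
run.then(results => console.log(results.map(r => `${r.title}:${r.ok}`).join(" ")));
// one:true two:true three:false four:true five:true
```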

🔥 When Things Broke: The JSON Problem

This section is the honest one.

Everything looked perfect in testing. The first production run, on real-world articles, was different. About 20% of Claude's responses came back malformed: usually JSON wrapped in markdown code fences, but sometimes valid JSON with invented or missing field values.

The Failure Modes

Status	Failure	What Happened
🔴	Markdown fences	Claude returned ```json\n{...}``` — the backticks made `JSON.parse()` throw
🟡	Hallucinated slugs	Slugs with spaces, slashes, or emoji — valid JSON, broken URLs
🟡	Bad image prompts	Claude referenced real people or brand logos — unusable with image APIs
🔴	Missing fields	`metaDesc` or `intro` omitted on low-quality source articles — CMS returned a silent 422

💡 The core insight: Claude most likely wraps output in markdown fences because it is tuned to be helpful to humans reading a chat. In an API pipeline, that helpfulness is a bug. The fix is explicit: "Return ONLY valid JSON. No markdown fences. No prose before or after. No comments inside the JSON." Put it in the system prompt, not the user prompt, because it needs to be a standing instruction.

The most dangerous failure was the missing fields one. The Code Node passed, the image generated fine, and then the CMS POST returned a 422 with a vague error. I only caught it by adding a status code check to the HTTP Request node and wiring the error path to Telegram.
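The status check itself can be very small. The sketch below assumes the HTTP step hands over the numeric status code and response body instead of throwing on a 4xx or 5xx; the helper name `summarizeCmsResponse`, the message wording, and the sample error body are illustrative, not the workflow's actual values.

```javascript
// Hypothetical status check. Assumes the HTTP step passes the numeric status
// code and response body through instead of throwing on 4xx/5xx.
function summarizeCmsResponse(statusCode, title, responseBody = "") {
  const ok = statusCode >= 200 && statusCode < 300;
  const detail = String(responseBody).slice(0, 200); // keep the alert short
  return {
    ok,
    telegramText: ok
      ? `Published: ${title}`
      : `Publish FAILED (${statusCode}) for "${title}": ${detail}`,
  };
}

const failed = summarizeCmsResponse(
  422,
  "Robots that fold laundry",
  '{"error":"excerpt is required"}'
);
console.log(failed.ok); // false
console.log(failed.telegramText);
```

Routing on the `ok` flag gives the error path something concrete to send, so a rejected post produces a message rather than silence.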

The Fix — JSON Sanitiser Code Node

Drop this immediately after every Claude AI node in the pipeline:

JAVASCRIPT

// n8n Code Node — JSON fence stripper + field validator
// Place after EVERY Claude AI node, no exceptions

const raw = $input.first().json.message?.content?.[0]?.text
         ?? $input.first().json.claudeOutput
         ?? "";

// ─── 1. Strip markdown fences ───────────────────────────────
// Trim first: leading whitespace would stop the ^ anchor from matching
let cleaned = raw
  .trim()
  .replace(/^```(?:json)?\s*/i, "")
  .replace(/\s*```$/, "");

// ─── 2. Parse ───────────────────────────────────────────────
let parsed;
try {
  parsed = JSON.parse(cleaned);
} catch (e) {
  throw new Error(`JSON parse failed: ${e.message}\n\n${cleaned.slice(0, 400)}`);
}

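// ─── 3. Enforce required fields ─────────────────────────────
// Each field must be a non-empty string; a number or object would break later steps
const required = ["title", "slug", "intro", "body", "metaDesc", "imagePrompt"];
const missing  = required.filter(k => typeof parsed[k] !== "string" || !parsed[k].trim());
if (missing.length) {
  throw new Error(`Missing or invalid required fields: ${missing.join(", ")}`);
}

// ─── 4. Sanitise slug ───────────────────────────────────────
parsed.slug = parsed.slug
  .toLowerCase()
  .replace(/[^a-z0-9-]/g, "-")
  .replace(/-+/g, "-")
  .replace(/^-|-$/g, "");

// An all-emoji or all-punctuation slug sanitises down to nothing
if (!parsed.slug) {
  throw new Error("Slug is empty after sanitising");
}

return [{ json: parsed }];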

After adding this node plus the explicit system prompt instruction, bad runs dropped from ~20% to under 2%. The remaining failures are almost always genuinely empty source articles — they route to a dead-letter tab in Google Sheets for manual review.
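For the dead-letter tab, a row might be shaped along these lines. The column names (`timestamp`, `url`, `stage`, `reason`) and the helper name `toDeadLetterRow` are assumptions for the sketch; the actual sheet layout isn't described here.

```javascript
// Hypothetical dead-letter row. Column names (timestamp, url, stage, reason)
// are assumptions; adapt them to the actual sheet.
function toDeadLetterRow(story, stage, error, now = new Date()) {
  return {
    timestamp: now.toISOString(),
    url: story.url ?? "",
    stage,
    // Accept either an Error object or a plain string, and cap the length
    reason: String(error?.message ?? error).slice(0, 300),
  };
}

const row = toDeadLetterRow(
  { url: "https://example.com/empty-article" },
  "sanitiser",
  new Error("Missing required fields: intro, metaDesc"),
  new Date("2024-01-01T08:00:00Z")
);
console.log(row);
```

Recording the stage alongside the reason makes it quicker to tell an empty source article from a CMS rejection when reviewing the tab later.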

📐 Patterns I Now Build Into Every Workflow

After several months running this in production, here's the checklist I apply to every new n8n AI workflow:

⚙️ Drop a JSON sanitiser Code Node after every Claude AI node — always, no exceptions
🧠 Put output constraints in the system prompt, not the user prompt — they need to be standing instructions
✉️ Use Telegram webhooks for human-in-the-loop checkpoints — faster than email, simpler than building a UI
🔀 Dead-letter every error path — failed runs write to a Google Sheets tab, nothing disappears silently
🔁 Use Split In Batches (size 1) for sequential loops — easier to debug, isolates failures cleanly
📊 Google Sheets for storage — boring, accessible, zero infrastructure, opens on your phone
🔒 All credentials as n8n environment variables — the workflow JSON contains zero secrets
📣 Wire Telegram to every error path — if a publish fails, you hear about it immediately

The deepest lesson from building this: the visual canvas forces honesty. Every error path has to go somewhere visible. Every data transformation is a discrete node you can inspect. You can't accidentally hide complexity in a utility function you'll forget about in six months.

If you're sitting on a backlog of "I should automate that" ideas and you've been putting them off because spinning up a backend feels heavy — n8n is worth a serious look. The JSON fragility is real, but it's solvable. Once you've built the sanitiser pattern and the Telegram approval loop once, they take about 10 minutes to drop into any new workflow.

Brendan Tack is a Product Manager and AI Strategist. He runs Valdris Consulting — a GTM engineering consultancy for SMBs — and writes about building AI systems that actually work in production.
