The 'Thought-to-Draft' Stack: Why Karpathy's Voice Workflow is a Cheat Code for Creators

Brendan Tack · 4 min read

A few weeks ago, Andrej Karpathy casually dropped a detail about his writing stack that completely changed how I think about content creation.

The guy who led Tesla's Autopilot AI team and was a founding member of OpenAI isn't hunched over a keyboard agonizing over his first drafts. He's using SuperWhisper to dictate his thoughts, and then dropping that raw text into an AI composer to structure it.

He essentially built a pipeline that removes the keyboard from the thinking process.

If you are a founder or creator who constantly says, "I have so many ideas, but I just don't have time to write," this is massive. You don't have a time problem. You have a friction problem.

The Blank Page Bottleneck

When you sit down to write, you are actually trying to do three completely different things at the exact same time:

  1. Ideation (coming up with the point)
  2. Drafting (getting the words out)
  3. Editing (making it sound smart)

Doing all three simultaneously is exhausting. It's why you stare at a blinking cursor for twenty minutes, type a sentence, delete it, and then close the tab to check Twitter.

Here is the brutal truth about writing online: Your brain moves at something like 400 words per minute. Your fingers manage maybe 60. That gap is where your best ideas die.

Karpathy's stack solves this by unbundling the process. You speak to capture the idea. The AI handles the drafting and structuring. You step in at the very end to edit and polish.

Building the 'Thought-to-Draft' Stack

You don't need to be an AI researcher to set this up. The tools are off-the-shelf and incredibly cheap.

Here is how you build the exact pipeline:

1. The Capture Layer (SuperWhisper)

SuperWhisper is a macOS menu bar app that runs locally on your machine. You hit a hotkey, start talking, and hit the hotkey again. It uses OpenAI's Whisper model to transcribe your audio, punctuation included, and copies the text to your clipboard. It doesn't matter if you say "um," lose your train of thought, or repeat yourself. Just ramble.

2. The Formatting Layer (Claude / Custom Agent)

This is where the magic happens. You don't just paste that raw transcript into a CMS. You pass it through a specific prompt designed to clean up the formatting without killing your voice.

If you want to fully automate this, you can set up a simple n8n webhook that takes your raw text and pings the Anthropic API. Here is the exact payload structure I use:

JAVASCRIPT
// n8n Code Node: Prepares the SuperWhisper raw text for the Claude API
// Trigger this via a macOS Shortcut that grabs your clipboard
const rawTranscript = $input.item.json.body.text;

const systemPrompt = `You are an expert editor. 
Your job is to take the following raw, spoken transcript and turn it into a structured first draft.
RULES:
1. Keep my exact tone, opinions, and vocabulary.
2. Break the text into short, punchy paragraphs.
3. Add markdown headings (##) for logical section breaks.
4. Do NOT add an introduction or conclusion if I didn't speak one.
5. Fix verbal tics, but keep the conversational edge. Do not use AI jargon like "delve" or "tapestry".`;

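// Anthropic's Messages API takes the system prompt as a top-level `system`
// field; `messages` may only contain "user" and "assistant" roles.
return {
  json: {
    model: "YOUR_CLAUDE_MODEL_ID", // placeholder: set to a current Claude model ID
    max_tokens: 4096, // required by the API; 4096 is an assumed cap, adjust as needed
    system: systemPrompt,
    messages: [
      { role: "user", content: rawTranscript }
    ]
  }
};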

The most important part of this code is the set of negative constraints in the system prompt. By explicitly banning words like "delve" and forbidding the AI from writing its own introductions, you push the model to act as a formatter rather than a ghostwriter.
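If you would rather make the API call yourself than rely on n8n's HTTP node, it might look roughly like the sketch below. It assumes Node 18+ (for the global `fetch`) and an API key you supply. The helper names (`buildRequest`, `extractDraft`, `requestDraft`) and the 4096-token cap are my own choices for illustration, not part of any library. One detail worth knowing: Anthropic's Messages API expects the system prompt in a top-level `system` field and requires both `model` and `max_tokens`.

```javascript
// A sketch only: helper names and the token cap are illustrative assumptions.
const ANTHROPIC_URL = "https://api.anthropic.com/v1/messages";

// Build the fetch options for one formatting request.
function buildRequest({ apiKey, model, systemPrompt, transcript }) {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,                // supply a current Claude model ID
      max_tokens: 4096,     // assumed cap for a first draft
      system: systemPrompt, // system prompt goes here, not in `messages`
      messages: [{ role: "user", content: transcript }],
    }),
  };
}

// The response carries a `content` array of blocks; join the text ones.
function extractDraft(responseJson) {
  return (responseJson.content ?? [])
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Send the transcript and return the structured draft as a string.
async function requestDraft(options) {
  const res = await fetch(ANTHROPIC_URL, buildRequest(options));
  if (!res.ok) {
    throw new Error(`Anthropic API returned HTTP ${res.status}`);
  }
  return extractDraft(await res.json());
}
```

Keeping `buildRequest` and `extractDraft` separate from the network call means you can check the payload shape without spending any API credits.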

Why This Playbook Works

Once you adopt this workflow, a few things happen almost immediately.

Motion creates better ideas. You aren't chained to a desk. You can put your AirPods in, go for a 20-minute walk, and just talk out a concept. Walking naturally lowers your filter. You end up capturing the raw, opinionated takes you'd normally share over a beer, rather than the sanitized corporate speak you type into Google Docs.

You stop over-editing. Because you are speaking, you can't hit backspace. You are forced to push through the idea until it's finished. The AI deals with the messy syntax later.

Your output skyrockets. A 10-minute ramble translates to roughly 1,500 words. Even if 80% of that is garbage, you are left with a 300-word core insight that is already structured, formatted, and ready for your final polish. You just turned a two-hour writing block into a 15-minute editing session.
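The back-of-the-envelope math here is easy to rerun with your own numbers. This sketch assumes a conversational speaking pace of about 150 words per minute, which is the rate implied by 1,500 words in 10 minutes; the function name and defaults are hypothetical, and your own pace and keep rate will differ.

```javascript
// Assumed speaking pace, implied by "1,500 words in 10 minutes".
const ASSUMED_SPEAKING_WPM = 150;

// Estimate how many words you speak and how many survive the edit.
function estimateYield(minutes, keepRatio, wpm = ASSUMED_SPEAKING_WPM) {
  const spokenWords = Math.round(minutes * wpm);
  const keptWords = Math.round(spokenWords * keepRatio);
  return { spokenWords, keptWords };
}

// 10 minutes of talking, keeping 20% of it:
console.log(estimateYield(10, 0.2)); // { spokenWords: 1500, keptWords: 300 }
```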

The Final Polish

The goal of the Thought-to-Draft stack isn't to let AI write your content. It's to let AI get you to the 80% mark instantly.

When you get the draft back from the agent, you still need to get your hands dirty. Punch up the hook. Add a specific example. Make sure it actually sounds like you. But editing a structured draft is infinitely easier than conjuring one from nothing.
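One small part of that review can be automated: checking whether the draft slipped in any of the words the system prompt banned. The sketch below is a minimal version of that check. The word list mirrors the two examples from the prompt, the function names are hypothetical, and the match is a simple prefix match on word boundaries, so "delves" is caught but "delving" is not.

```javascript
// Words banned in the system prompt; extend with your own pet peeves.
const BANNED_WORDS = ["delve", "tapestry"];

// Escape regex metacharacters so any word can be searched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the banned words that appear in the draft (case-insensitive).
function findBannedWords(draft, banned = BANNED_WORDS) {
  return banned.filter((word) =>
    new RegExp("\\b" + escapeRegExp(word), "i").test(draft)
  );
}

// Example: flag a draft before you start the manual polish.
const hits = findBannedWords("Let's delve into the rich tapestry of ideas.");
if (hits.length > 0) {
  console.log("Draft contains banned words:", hits.join(", "));
}
```

An empty result doesn't mean the draft sounds like you. It only tells you the model followed the rule you can check mechanically.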

Stop typing your first drafts. Tomorrow, when you have an idea for a thread or a newsletter, don't open a blank document. Put your shoes on, go outside, and just start talking.
