Agent Skills – About Things | A Hans Scharler Blog

Create the “Best Agent Skill” with SkillOpt from Microsoft Research

Hans Scharler — Fri, 19 Jun 2026 01:15:49 +0000

For two years the move was simple: pick a better model. Then the models got good, and stayed good, and the gaps between them got boring. So the interesting lever is the other thing you hand the agent now. The skill file.

Microsoft Research just shipped a tool called SkillOpt that takes that idea literally.

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

It treats the skill markdown you hand an agent (the instructions, the system prompt, the SKILL.md) as trainable state. It runs epochs. It has a batch size. It has a learning rate. It just never touches the model weights. The weights stay frozen behind the API. The thing that gets trained is the text.

The loop is short. Run the agent on a batch of tasks. Score each one. A second model reads the failures and proposes small edits to the skill file. Keep an edit only if it improves a held-out score. Repeat. What you get at the end is a best_skill.md you drop in front of the same unchanged model.
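
A minimal sketch of that loop, to make the gate concrete. This is not SkillOpt's code: every name here (optimize_skill, toy_score, toy_propose) is made up for illustration, and the toy reuses its training tasks as the held-out set, which a real run would keep separate.

```python
from typing import Callable, List, Sequence


def optimize_skill(
    seed_skill: str,
    train_batches: Sequence[Sequence[str]],
    heldout_tasks: Sequence[str],
    run_and_score: Callable[[str, str], float],
    propose_edits: Callable[[str, List[str]], List[str]],
    epochs: int = 2,
) -> str:
    """Return the best skill text found; the model itself is never changed."""

    def heldout_score(skill: str) -> float:
        return sum(run_and_score(skill, t) for t in heldout_tasks) / len(heldout_tasks)

    best_skill, best_score = seed_skill, heldout_score(seed_skill)
    for _ in range(epochs):
        for batch in train_batches:
            # Run the agent on the batch and keep the tasks it failed.
            failures = [t for t in batch if run_and_score(best_skill, t) < 1.0]
            if not failures:
                continue  # nothing to learn from this batch
            # In a real run a second model would propose these candidates.
            for candidate in propose_edits(best_skill, failures):
                score = heldout_score(candidate)
                if score > best_score:  # the validation gate
                    best_skill, best_score = candidate, score
    return best_skill


# Toy stand-ins (hypothetical): a task "passes" only if the skill names its phrase.
def toy_score(skill: str, task: str) -> float:
    return 1.0 if task in skill else 0.0


def toy_propose(skill: str, failures: List[str]) -> List[str]:
    return [f"{skill}\nAvoid: {phrase}" for phrase in failures]


tasks = ["furthermore", "in conclusion"]
final_skill = optimize_skill("Improve the passage.", [tasks], tasks, toy_score, toy_propose)
print(final_skill)
```

In the toy run the second candidate of the first epoch is rejected because it only ties the held-out score. The gate keeps improvements, not sideways moves.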

I wanted to see it actually work, so I gave it a job…

The setup

I had OpenAI Codex do the writing. The task: take a paragraph stuffed with AI slop and clean it up. The skill it started from was short:

Improve the passage so it reads a little better. Keep the meaning and roughly the length.

No mention of slop. No list of banned words. Nothing to go on.

The verifier was the part that mattered. I wrote a dumb little counter that scans the output for the tells (the canned phrases, the em-dashes) and returns how many are left. Fewer is better. That is the whole eval. Objective, cheap, no judgment calls.
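
A counter like that fits in a dozen lines. The sketch below is an assumption about its shape, not the script that was actually used: the phrase list is illustrative, each phrase counts once no matter how often it appears, and it also returns which tells it found, since that costs nothing extra.

```python
from typing import List

# Illustrative list only; a real one would be longer.
SLOP_PHRASES = [
    "let's dive in",
    "have you ever wondered",
    "it's worth noting",
    "furthermore",
    "in conclusion",
    "great question",
]

EM_DASH = "\u2014"


def find_slop(text: str) -> List[str]:
    """Return every tell still present in the text."""
    # Fold case and curly apostrophes so "It\u2019s" matches "it's".
    normalized = text.lower().replace("\u2019", "'")
    found = [phrase for phrase in SLOP_PHRASES if phrase in normalized]
    # Each em-dash counts as its own tell.
    found += ["em-dash"] * text.count(EM_DASH)
    return found


def slop_count(text: str) -> int:
    """The whole eval: how many tells are left. Fewer is better."""
    return len(find_slop(text))


sample = "Furthermore, it\u2019s worth noting this \u2014 really."
print(slop_count(sample), find_slop(sample))
```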

Then I let SkillOpt run.

What it wrote

The seed turned into a thirty-line deslop skill. SkillOpt wrote it.

It worked out, on its own, that “remove AI slop” is a removal constraint and not a tone nudge. It named the exact phrases that kept leaking through (“let’s dive in,” “have you ever wondered,” “it’s worth noting,” “furthermore,” “in conclusion,” “great question”). It flagged em-dashes. And it caught the sneaky failure mode I never told it about: the model dodging a banned word by swapping in a synonym. That one became its own rule, the last line of the file:

Before answering, do a quick scan to ensure none of the original flagged words remain, including close synonyms you may have introduced.

The score went from passing half the held-out passages to passing all of them. The skill it wrote is one I would actually keep.

The part worth keeping

The optimizer is the easy part.

I went in assuming the clever bit was the model proposing edits. It isn’t. The clever bit is the verifier. Give SkillOpt a score and nothing else and it does nothing. I tried that first. It saw the failures, shrugged, and changed not a single line, because “you got a 0.6” tells it nothing about what to fix. The run that worked was the one where the verifier also said which words leaked. Same model, same loop. The difference was the signal.

So the rule of thumb is less “use the self-improving optimizer” and more: can you score this cheaply, and can you tell it why it failed. If yes, the skill mostly writes itself. If no, there is nothing to train and the fancy loop sits there idle.

That is also the catch, and it is the same catch hiding under every self-improving-agent demo. A slop counter is a clean verifier. Most of what you do in a day is not. “Was this brief any good.” “Did this post land.” No cheap score, no training. The optimizer was never the bottleneck. The verifier is.

So build the verifier first. Make it cheap, make it say why a run failed, and the optimizer has something to work with.

Bonus: a taste of running it

The repo is github.com/microsoft/SkillOpt. MIT licensed, Python.

Get it:

pip install skillopt
# or, to poke at the internals:
git clone https://github.com/microsoft/SkillOpt && cd SkillOpt && pip install -e .

The mental model is one small folder per task. A loader that hands over your examples, a rollout that runs your agent and scores it, and a seed skill to start from. SkillOpt’s whole job is to grow that seed.

The config reads like a training run, on purpose:

train:
  num_epochs: 4
  batch_size: 40
optimizer:
  learning_rate: 4        # max edits to the skill per step
  lr_scheduler: cosine
evaluation:
  use_gate: true          # keep an edit only if it beats the held-out score
model:
  optimizer: gpt-5.4      # the model that proposes the edits

Then it is one command:

python scripts/train.py --config configs/yourtask/default.yaml

The only part that is really on you is the scoring. Your rollout returns, per task, a pass/fail and a number between 0 and 1:

return {"id": task_id, "hard": passed, "soft": fraction_correct}

That is the whole contract. Give it tasks, a way to score them, and a skill to start from. It runs the epochs.
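
For the deslop task, a scorer that fills in that dictionary might look like the sketch below. Only the three keys come from the contract above. The function name, its arguments, and the definition of "soft" as the fraction of banned phrases removed are assumptions, and how the "why it failed" text reaches the optimizer is not shown here.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, bool, float]]


def score_task(task_id: str, output: str, banned: List[str]) -> Result:
    """Score one agent output: hard is pass/fail, soft is a number from 0 to 1."""
    lowered = output.lower()
    leaked = [phrase for phrase in banned if phrase in lowered]
    # soft = share of banned phrases that are gone; 1.0 if nothing was banned.
    soft = 1.0 - len(leaked) / len(banned) if banned else 1.0
    return {"id": task_id, "hard": not leaked, "soft": soft}


banned = ["furthermore", "in conclusion"]
partial = score_task("t1", "Furthermore, this is fine.", banned)
clean = score_task("t2", "This is fine.", banned)
print(partial, clean)
```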

One side note: keep the tasks hard enough that the agent fails some of them. If it aces everything on the seed skill, there is nothing to learn and SkillOpt politely does nothing. Ask me how I know.
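
A quick pre-flight check catches that before a run is wasted. This sketch assumes you have already scored the seed skill once and hold a list of result dictionaries in the shape above. Both helper names are hypothetical.

```python
from typing import Dict, List


def seed_pass_rate(results: List[Dict]) -> float:
    """Fraction of tasks the seed skill already passes."""
    return sum(1 for r in results if r["hard"]) / len(results)


def has_headroom(results: List[Dict]) -> bool:
    """True if the seed fails at least one task, so there is something to learn."""
    return seed_pass_rate(results) < 1.0


seed_results = [
    {"id": "a", "hard": True, "soft": 1.0},
    {"id": "b", "hard": False, "soft": 0.5},
]
print(seed_pass_rate(seed_results), has_headroom(seed_results))
```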

What I Learned from Garry Tan and gstack

Hans Scharler — Sat, 06 Jun 2026 15:28:44 +0000

Garry Tan, the guy who runs Y Combinator, just open-sourced a toolkit called gstack. It runs on top of Claude Code and turns it into a virtual engineering team. Twenty-three specialized skills, eight power tools, each one playing a role you’d normally have to hire for. A CEO that rethinks your product. An engineering manager that locks down the architecture. A designer that catches AI slop. A QA lead that drives a real browser and files bugs against you.

I read through it expecting to roll my eyes. Instead I sat there nodding, because I’d accidentally built a tiny version of the same thing for this blog.

The pitch

gstack’s whole argument is that solo builders don’t lose to big teams because they’re worse engineers. They lose because they skip process. One person at a keyboard, prompting Claude into existence one vibe at a time, eventually ships something that works. Until it doesn’t. No review. No second opinion. No one asking “wait, should we even build this?”

So gstack bakes the process in. Every skill is a slash command, and each command plays a role in a pipeline:

Think → Plan → Build → Review → Test → Ship → Reflect

You start a project with /office-hours, which hits you with six forcing questions before you write a line of code. Then /plan-ceo-review challenges your scope. /plan-eng-review draws the data-flow diagrams and enumerates the edge cases. You build. /review does a staff-engineer pass and auto-fixes the obvious stuff. /qa opens an actual Chromium browser, clicks through your app, finds bugs, and writes a regression test for each one it fixes. /ship runs the suite and opens the PR.

Each stage feeds the next. The CEO’s decisions constrain the engineer. The engineer’s plan constrains the build. Nothing happens in a vacuum. That’s exactly the part solo work gets wrong.

There’s a whole security wing too. A /cso skill that runs OWASP Top 10 and STRIDE threat modeling. A sidebar Claude that watches for prompt injection. A /careful mode that warns you before rm -rf ruins your afternoon. It’s a lot. The point isn’t that you use all of it. The point is that the roles exist, named and on call.

Why this looked familiar

Here’s the thing. I don’t ship code through a blog. I ship words. But the moment I saw that pipeline I recognized it, because the posts you’re reading go through the same kind of relay.

When I write here, an idea doesn’t go straight from my head to WordPress. It moves through stages, and each stage has one job:

Research does the digging and writes up a structured report. Pitch, key facts, angle, the competition check.
Refine tightens the structure and pacing without touching the voice.
De-slop strips out the AI tells. The em-dashes, the “it’s not just X, it’s Y,” the forced enthusiasm, the closer that tries to sound profound.
Voice rewrites it to sound like me. Short sentences. Plain words. No throat-clearing.
Format turns it into clean WordPress blocks.

Each one hands off to the next. Same shape as gstack. A CEO who reframes the product is doing what my research stage does for a post. A designer who catches AI slop is, almost word for word, my de-slop pass. Garry Tan built an org chart for shipping software. I built a smaller one for shipping writing, and I didn’t even notice the resemblance until his showed up.
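
The relay shape is simple enough to sketch. The stage bodies below are trivial placeholders standing in for agent calls, and the names are illustrative. The point is only the structure: one narrow job per stage, and each stage receives exactly what the previous one returned.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(draft: str, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Pass the draft through each stage in order and record the handoffs."""
    handoffs = []
    for name, stage in stages:
        draft = stage(draft)  # the next stage only sees this stage's output
        handoffs.append(name)
    return draft, handoffs


# Placeholder stages; a real pipeline would call an agent with a skill per stage.
stages: List[Stage] = [
    ("refine", lambda text: " ".join(text.split())),
    ("de-slop", lambda text: text.replace("Furthermore, ", "")),
    ("format", lambda text: f"<p>{text}</p>"),
]

result, handoffs = run_pipeline("Furthermore,   this  works.", stages)
print(result, handoffs)
```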

I’m not claiming I invented anything. The idea is in the water. Once you spend enough time working alongside an agent, you stop treating it like one genius who does everything and start treating it like a team you have to manage. You give each role a narrow job and a clean handoff. That’s not a trick. It’s just how good teams have always worked, now running in a terminal.

The number everyone’s going to argue about

gstack ships with a doc making a bold claim: Garry measured his own output and found he’s writing about 810 times more code per day than he did back in 2013 when he was coding part-time around a day job. Eleven thousand logical lines a day across his first 108 days.

Your skepticism alarm should be going off. So was his. He concedes up front that lines of code is a garbage quality metric. His own line: measuring programming progress by lines of code is like measuring aircraft-building progress by weight. So he deflates the number hard. Strips comments and blanks, applies a 2x penalty for AI being verbose and defensive, and lands at something like 5,700 lines a day. Still a wild figure. Still self-reported, so take it with salt.

But the interesting claim isn’t the speed. It’s the quality holding steady while the speed goes up. A 2 percent revert rate, right in line with the open-source baseline. Two thousand-plus automated tests with CI. A 95 percent success rate across 305,000 skill runs. His actual thesis is buried under the headline number:

“Testing at multiple levels is what makes AI-assisted coding actually work.”

That one I believe without reservation, because it’s the same thing the pipeline is really about. The process isn’t there to make the agent faster. It’s there to catch the agent when it’s confidently wrong, which it will be, several times a day. Every stage is a checkpoint, a chance for something to get caught before it ships. The payoff isn’t speed. It’s fewer mistakes reaching the end.

What I did with it

I’m not installing the whole thing. gstack is TypeScript and Bun and Playwright, built for shipping web apps, and Windows is a second-class citizen in it. Not my stack. Not my use case.

So I stole the front of it instead. The skill I keep coming back to is /office-hours: six forcing questions you have to answer before you’re allowed to build anything. Who is this for. What’s the sharpest version of it. What are you quietly avoiding. It’s a CEO review for an idea that doesn’t exist yet.

That maps straight onto the worst part of blogging, which is deciding what’s even worth writing. So I pointed it at this blog. Before a topic earns a draft, I run it through the same kind of interrogation in Claude Code. What’s the one sentence. Who already wrote it better. What’s the angle only I have. Why now. Most ideas die right there, which is the point. The ones that live show up to the draft stage already knowing what they are.

This post is one of them. gstack was the raw idea. The forcing questions are what turned “gstack exists, that’s neat” into something with an angle worth your time. The tool I borrowed from is the tool that helped me write about borrowing from it.

So here’s what you should know today. gstack is out there, it’s free, and even if you never run a line of it, the shape is worth stealing. Garry Tan built that discipline for code. I run a lighter version of it on words. You can point it at whatever you make. The pipeline is the product. The agent’s just the labor.

Claude Code and Agent Skills for Electron App Development: Your Desktop App Just Got a Cheat Code

Hans Scharler — Mon, 02 Mar 2026 00:14:19 +0000

I’ve been thinking about Compound Engineering a lot lately. This is the idea that every project should make the next one easier. And right now, there’s no better example of that than what’s happening with Claude Code, Agent Skills, and Electron app development.

Here’s the irony that got me started down this rabbit hole. Anthropic’s own Claude desktop app? It’s an Electron app. Boris Cherny from the Claude Code team confirmed it on Hacker News. The framework that everyone loves to hate is still the pragmatic choice. That tension tells you something important about where we actually are with AI-assisted development.

The Groundhog Day Problem (Electron Edition)

Every Electron project starts the same way. You configure BrowserWindow with contextIsolation: true and nodeIntegration: false. You write a preload script with contextBridge.exposeInMainWorld. You set up IPC channels. You configure Content Security Policy headers. You wrestle with electron-builder.yml. You set up code signing. You do this from memory, or you copy-paste from your last project, or you spend an hour on Stack Overflow re-finding the patterns you already know.

I called this the Groundhog Day Problem in my Compound Engineering post. Sixty to eighty percent of what you do on a new project, you’ve already done before. And yet, every time, you start from scratch.

Agent Skills fix this. Not like templates — templates are dead things. Skills are living context that Claude Code loads on demand when it recognizes you’re doing Electron work.

What Are Agent Skills? (The 60-Second Version)

If you haven’t been following the Agent Skills story, here’s the short version.

A skill is a folder with a SKILL.md file. It contains YAML frontmatter (name, description) and markdown instructions that Claude follows when the skill activates. Anthropic released Agent Skills as an open standard in December 2025, and it’s been adopted by over 26 platforms — not just Claude Code, but also OpenAI Codex, Gemini CLI, GitHub Copilot, Cursor, VS Code, and more.

The key design principle is progressive disclosure. Only the skill’s name and description load at startup — roughly 30 to 50 tokens per skill. The full SKILL.md loads only when triggered. Reference files and scripts load only when needed during execution. This means you can have dozens of skills installed without bloating your context window.
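
To make the first tier concrete, here is a sketch of a loader that reads only each skill's frontmatter and never touches the body. It illustrates the idea and is not how Claude Code implements it. The parser is deliberately naive (plain "key: value" lines plus indented continuations, no real YAML), and the function names are made up.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_frontmatter(skill_md: Path) -> Dict[str, str]:
    """Read only the block between the first two '---' lines."""
    meta: Dict[str, str] = {}
    key = None
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no frontmatter at all
        for line in fh:
            if line.strip() == "---":
                break  # stop before the body, which stays unread
            if not line.strip():
                continue
            if line[0].isspace() and key:
                meta[key] += " " + line.strip()  # indented continuation line
            elif ":" in line:
                key, value = line.split(":", 1)
                key = key.strip()
                meta[key] = value.strip()
    return meta


def build_index(skills_dir: Path) -> Dict[str, str]:
    """Startup view: skill name -> description, nothing else."""
    index = {}
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta:
            index[meta["name"]] = meta.get("description", "")
    return index


# Demo with a throwaway skill folder.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "demo"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: demo\ndescription: First line,\n  second line.\n---\n\n# Body\n",
        encoding="utf-8",
    )
    index = build_index(Path(tmp))
print(index)
```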

Think of it like an onboarding guide for a new team member — except the new team member is an AI agent that reads and follows instructions instantly.

The Electron Skill Stack

Here’s where it gets practical. There’s already a growing ecosystem of skills and subagents specifically for Electron development. Let’s walk through the ones worth knowing about — and how to install each one.

1. electron-scaffold

What it does: Scaffolds production-ready Electron apps with security hardening baked in from the start. It handles the architecture decisions (Electron Forge vs. Vite vs. electron-builder), sets up proper IPC patterns with contextBridge, configures CSP headers, enables context isolation, sets up auto-updates, integrates native menus, and generates the full project structure with TypeScript support.

Why it matters: This is the security-first scaffolding that most tutorials skip. It encodes the difference between a toy Electron app and one that’s ready for distribution.

How to install:

Using the Vercel skills CLI (works across Claude Code, Codex, Cursor, and others):

npx skills add chrisvoncsefalvay/claude-skills --skill electron-scaffold

Or manually: download from the claude-plugins.dev listing, extract the ZIP, and drop the folder into ~/.claude/skills/.

For Claude.ai users, go to claude.ai/settings/capabilities, find the Skills section, and upload the downloaded ZIP.

2. electron-pro (Subagent)

What it does: This isn’t a skill — it’s a full subagent. Think of it as a senior Electron developer persona with deep expertise in Electron 27+ and native OS integrations. It follows a phased approach: understanding your requirements, designing secure architecture, implementing with a full security checklist (context isolation, CSP, IPC validation, code signing), and packaging for multi-platform distribution.

Why it matters: It’s the difference between asking Claude to “make an Electron app” and having a dedicated Electron specialist with a checklist that covers everything from memory budgets to auto-update rollback strategies.

How to install:

Download the subagent file directly from VoltAgent’s repository and save it to your agents directory:

mkdir -p ~/.claude/agents
curl -fsSL -o ~/.claude/agents/electron-pro.md \
  https://raw.githubusercontent.com/VoltAgent/awesome-claude-code-subagents/main/categories/01-core-development/electron-pro.md

Or use the built-in agent installer in Claude Code by typing /agents and creating a new agent from the file.

3. Full-Stack Electron Skill (partme-ai)

What it does: A comprehensive Electron reference skill organized to mirror the official Electron documentation structure. It covers everything: main process, renderer process, IPC communication, BrowserWindow management, menus, tray icons, native integrations, packaging with ASAR, electron-builder configuration, code signing, auto-updates, debugging, memory profiling, crash reporting, and security best practices including sandboxing and CSP.

Why it matters: This is the one that turns Claude Code into something like having the entire Electron docs loaded as contextual intelligence. Instead of searching docs, Claude just knows the right patterns.

How to install:

Via the Vercel skills CLI:

npx skills add partme-ai/full-stack-skills --skill electron

Via LobeHub:

mkdir -p ~/.claude/skills/partme-ai-full-stack-skills-electron && \
curl -fsSL "https://market.lobehub.com/api/v1/skills/partme-ai-full-stack-skills-electron/download" \
  -o /tmp/electron-skill.zip && \
unzip -o /tmp/electron-skill.zip \
  -d ~/.claude/skills/partme-ai-full-stack-skills-electron

4. Electron’s Own CLAUDE.md

What it does: The Electron framework itself ships a CLAUDE.md in its repository. This teaches Claude Code the Electron project’s structure — where the C++ shell code lives, how TypeScript implementations map to API modules, how to work with the 159+ Chromium patches and 48+ Node.js patches, and the build workflow using @electron/build-tools. It even includes a dedicated “Electron Chromium Upgrade” skill for Chromium version bumps.

Why it matters: This is a real-world example of a major open source project using CLAUDE.md to encode institutional knowledge. If you’re contributing to Electron itself, or if you want inspiration for structuring your own project’s CLAUDE.md, this is the gold standard.

How to access: No installation needed — it’s in the Electron repo. But the pattern is what matters. Your own Electron app should have a CLAUDE.md at the project root that teaches Claude Code about your specific architecture, IPC channel naming conventions, and build setup.

5. Electron FSD + React 19

What it does: A specialized skill for building Electron apps using Feature-Sliced Design architecture combined with React 19 patterns. It enforces a clean separation of concerns across the three-process model (Main, Preload, Renderer) while implementing strict FSD layer responsibilities. Covers modern React patterns like the use() hook and useActionState.

Why it matters: If your Electron app is a React app (and let’s be honest, a lot of them are), this skill bridges the gap between “generic Electron best practices” and “how to actually structure a complex React-based desktop application.”

How to install:

Available on MCPMarket. Download the skill ZIP and extract it:

mkdir -p ~/.claude/skills/electron-fsd-development
# Extract the downloaded ZIP into the directory above

Or upload it directly as a skill in Claude.ai settings.

Building Your Own Electron Skills

The pre-built skills get you started, but the real compounding happens when you build your own. Here’s the thing — you already have the knowledge. It’s just locked in your head.

That IPC channel naming convention you use across every project? That’s a skill. Your electron-builder.yml that took you a weekend to get right? That’s a skill. The way you structure preload scripts for your team? Skill.

Here’s what a simple custom Electron skill looks like:

---
name: my-electron-conventions
description: Project conventions for Electron IPC channels,
  preload patterns, and build configuration. Use when creating
  new IPC handlers, preload scripts, or modifying build config.
---

# Electron Project Conventions

## IPC Channel Naming
- Use colon-separated namespaces: `app:get-version`, `file:open`
- Prefix with `dialog:` for user-facing dialogs
- Prefix with `store:` for persistent data operations

## Preload Script Pattern
- One preload file per window type
- Always use `contextBridge.exposeInMainWorld`
- Never expose raw `ipcRenderer`

## Build Configuration
- Target: DMG for macOS, NSIS for Windows, AppImage for Linux
- Always enable `hardenedRuntime` on macOS
- Auto-updater points to GitHub Releases

Save that to ~/.claude/skills/my-electron-conventions/SKILL.md and it’s active globally across all your projects. Or put it in your project’s .claude/skills/ directory to scope it to one repo.

Since this follows the Agent Skills open standard, it also works in Codex, Cursor, Gemini CLI, and anywhere else that supports the spec.
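
A convention in a skill is easier to trust when something checks it. As a small sketch, the script below flags IPC channel names that break the colon-separated rule from the sample skill. The regex is one reading of that rule (lowercase, hyphenated segments, at least one colon), not an Electron requirement, and the function name is hypothetical.

```python
import re
from typing import Iterable, List

# One or more "namespace:" prefixes followed by an action, all lowercase kebab-case.
CHANNEL_RE = re.compile(r"[a-z][a-z0-9-]*(:[a-z][a-z0-9-]*)+")


def invalid_channels(names: Iterable[str]) -> List[str]:
    """Return the channel names that do not follow the namespace:action pattern."""
    return [name for name in names if not CHANNEL_RE.fullmatch(name)]


channels = ["app:get-version", "file:open", "getVersion", "dialog:"]
print(invalid_channels(channels))
```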

What Happens When You Actually Use Them

Stephan Miller documented building an Electron writing app from scratch with Claude Code — 16 hours and $80 in API costs. His biggest lesson? Planning saves time. He had to stop and refactor his CLAUDE.md because the project outgrew his initial architecture.

Skills encode that planning. They front-load the decisions so you don’t have to make them again. With the Electron skills loaded, Claude Code doesn’t just generate code — it generates correct code with context isolation enabled, CSP headers configured, proper IPC patterns, and a project structure that scales.

This is the compound engineering flywheel in action. Project 1, you build everything from scratch and learn the hard way. By project 3, your skills are doing the heavy lifting. By project 5, you describe what you want and the system drafts the first 70% with security baked in. You refine, you polish, you add the creative spark.

The Meta Question: Should AI Kill Electron?

Drew Breunig wrote a post asking why Anthropic doesn’t use Claude to build a native desktop app instead. If coding agents are so good, why not generate native apps for each platform from a spec and test suite?

The answer is pragmatic. Agents excel at the first 90% of development, but that last 10% — edge cases, real-world testing, ongoing support — is still hard. And with three different native apps, your bug surface area triples. Electron still makes sense for most teams.

But here’s what skills change about the equation: they make Electron better. The security hardening that would normally be forgotten? A skill remembers it. The IPC patterns that would normally be sloppy? A skill enforces them. The packaging configuration that would normally be a weekend of trial and error? A skill has it pre-encoded.

Agent Skills don’t make Electron obsolete. They make Electron apps that feel like they were built by a team that actually cares about security and native integration.

Start the Flywheel

Here’s your homework. This week, install one of the Electron skills I listed above. Or better yet, write one. Take that electron-builder.yml you’ve tweaked fifty times. That preload script pattern you copy from project to project. That IPC naming convention that lives in your team’s heads.

Codify it. Make it a SKILL.md. Drop it in ~/.claude/skills/. Watch what happens on the next project.

If you want to get started quickly, here are all the install commands in one place:

# electron-scaffold (security-first scaffolding)
npx skills add chrisvoncsefalvay/claude-skills --skill electron-scaffold

# Full-Stack Electron reference (partme-ai)
npx skills add partme-ai/full-stack-skills --skill electron

# electron-pro subagent
mkdir -p ~/.claude/agents && curl -fsSL -o ~/.claude/agents/electron-pro.md \
  https://raw.githubusercontent.com/VoltAgent/awesome-claude-code-subagents/main/categories/01-core-development/electron-pro.md

# Your own custom skill
mkdir -p ~/.claude/skills/my-electron-conventions
# Then create SKILL.md with your conventions