A few weeks ago I realized something obvious. I'm paying for Claude Code and Codex subscriptions. But when I go to bed, those subscriptions sit completely idle. Eight hours of nothing. Every night.
So I built OvernightCoder. You give it a to-do list, go to sleep, and wake up to pull requests. It's been running on my projects for a while now, and it's changed how I think about backlogs entirely.
The idea is simple. You have Claude Code create a to-do file for itself based on what needs to happen next in your project. Then you point it at that file with the overnight-coder skill. It asks you two questions: whether you want PRs merged automatically or left for review, and whether tasks should run one at a time or in parallel. Then you go to sleep.
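A to-do file in this spirit might look like the following. This is a hypothetical example, not the exact format the skill expects:

```markdown
## Tonight's tasks

- [ ] Add a `GET /api/users/:id/sessions` endpoint that lists a user's active sessions
- [ ] Fix the pagination off-by-one when the page size evenly divides the total row count
- [ ] Return 422 with field-level errors when the signup handler receives a bad payload
```

The more each item reads like a small spec, with concrete endpoints, symptoms, and expected behavior, the better its odds of surviving the night.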
For each task, it creates an isolated branch in a separate git worktree so nothing conflicts. It writes tests first, then writes the code to make them pass. Then it sends the code to Codex for review. Codex pushes back, Claude fixes things, and they go back and forth up to nine review passes. Once the code is clean, it opens a pull request and moves on to the next task.
In the morning, you get a summary of what got done, what failed, and links to every PR.
The key design decision was using git worktrees instead of just branching. Each task gets its own full copy of the repo in a separate directory. That means tasks can't step on each other's files. If task three is rewriting the auth middleware while task seven is adding a new API endpoint, they're working in completely different folders on completely different branches.
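The isolation step boils down to one git command per task. Here's a minimal sketch; the branch and path naming scheme (`overnight/<task-id>`, a sibling `worktrees/` directory) is my assumption, not necessarily what the tool does:

```python
import subprocess
from pathlib import Path


def worktree_cmd(repo: Path, task_id: str) -> tuple[list[str], Path]:
    """Build the git command that gives one task its own branch and directory.

    Naming scheme here (branch `overnight/<id>`, path `../worktrees/<id>`)
    is hypothetical.
    """
    branch = f"overnight/{task_id}"
    path = repo.parent / "worktrees" / task_id
    # `git worktree add -b <branch> <path>` creates the branch and checks it
    # out into a separate directory in one step, so tasks never share files.
    cmd = ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
    return cmd, path


def create_task_worktree(repo: Path, task_id: str) -> Path:
    cmd, path = worktree_cmd(repo, task_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return path
```

When the task's PR is merged or abandoned, `git worktree remove <path>` cleans up the directory without touching the main checkout.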
This is what makes parallel mode possible. Independent tasks can actually run at the same time because there's no shared working directory. The grouper analyzes your codebase, figures out which tasks touch overlapping files, and batches them so parallel groups are truly independent.
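The batching logic can be sketched as a greedy first-fit over file sets. Assume (hypothetically) each task comes with the set of files it's expected to touch; the real grouper infers this from the codebase:

```python
def group_independent(tasks: dict[str, set[str]]) -> list[list[str]]:
    """Greedily batch tasks so no two tasks in the same group touch the
    same file. Each group can run fully in parallel; overlapping tasks
    land in different groups and run one after another.
    """
    groups: list[tuple[list[str], set[str]]] = []  # (task names, files claimed)
    for name, files in tasks.items():
        for members, claimed in groups:
            if not (claimed & files):  # no shared files: safe to run together
                members.append(name)
                claimed |= files       # mutates the shared set in place
                break
        else:
            groups.append(([name], set(files)))
    return [members for members, _ in groups]
```

So a task rewriting `auth.py` and a task adding `routes.py` share a group, while a second auth task waits for the next one.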
If you read my last post, you know I'm big on using Claude and Codex together. OvernightCoder bakes that workflow in. Every task goes through Codex review automatically. Claude writes the code, Codex reviews it, Claude fixes the feedback, Codex reviews again. Up to three outer cycles of three internal passes each.
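The review loop above, three outer cycles of three internal passes, can be sketched with the model calls stubbed out as callables. The interface is hypothetical; only the loop shape comes from the description:

```python
def cross_model_review(task, write_code, codex_review, apply_fixes,
                       outer_cycles=3, inner_passes=3):
    """Claude writes, Codex reviews, Claude fixes, up to 3 x 3 = 9 passes.

    Returns (code, clean): clean is True if a review pass came back with
    no findings, False if the review budget ran out first.
    """
    code = write_code(task)
    for _ in range(outer_cycles):
        for _ in range(inner_passes):
            findings = codex_review(code)
            if not findings:  # Codex is satisfied: stop early
                return code, True
            code = apply_fixes(code, findings)
    return code, False  # budget exhausted without a clean pass
```

A task that exits with `clean=False` is the kind of thing the morning summary flags for human attention rather than merging.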
This matters more when you're asleep than when you're awake. During the day, you're the safety net. You're reading diffs, catching things the AI missed. At night, nobody's watching. The cross-model review is the safety net instead. Two independently trained models checking each other's work is the closest thing to having a human in the loop when there isn't one.
The first surprise was how many tasks actually succeed. I expected maybe half to make it through cleanly. In practice, straightforward tasks (adding API endpoints, writing CRUD operations, fixing well-described bugs) work almost every time. The ones that fail tend to be vaguely described, or to require context that isn't in the codebase.
The second surprise was how good the PRs look. Because every task goes through TDD and then multiple rounds of cross-model review, the code that comes out the other end is often cleaner than what I'd write at 2am trying to push through a backlog. The test coverage is better too, since tests are written first by design.
The third surprise was how it changed my relationship with my backlog. I used to look at my to-do list with dread. Now I look at it as a queue. The more clearly I describe each task, the more likely it is to get done overnight. It's made me a better spec writer.
It's not magic. Tasks that require deep architectural judgment or cross-cutting changes still need a human. If you write vague tasks like "improve performance," you'll get vague results. The tool is only as good as your task descriptions.
Usage limits are real. On a Claude Max plan, long overnight runs can hit rate limits. I built a wrapper script that handles this by auto-restarting after a cooldown period, but it means some nights you get through fewer tasks than you'd like.
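The restart logic in a wrapper like that is simple. Here's a sketch with the runner and the sleep injected so it's testable; the `RateLimited` signal and the parameter names are assumptions, not my actual script:

```python
import time


class RateLimited(Exception):
    """Raised when the provider reports a usage limit (hypothetical signal)."""


def run_with_cooldown(run_task, cooldown_s=300, max_restarts=5, sleep=time.sleep):
    """Re-run a task runner after rate-limit errors, up to max_restarts times.

    Any other exception propagates immediately; so does RateLimited once
    the restart budget is spent.
    """
    for attempt in range(max_restarts + 1):
        try:
            return run_task()
        except RateLimited:
            if attempt == max_restarts:
                raise
            # Wait out the limit window, then pick up where we left off.
            sleep(cooldown_s)
```

The cost is exactly the one described above: every cooldown is wall-clock time the run isn't working through your queue.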
And you still need to review the PRs. Even with autonomous merge mode, I spend my first hour in the morning reading through what got merged. It's faster than writing the code myself, but it's not zero effort. Think of it less like delegation and more like code review as your primary job.
OvernightCoder started as a personal hack. But the more I use it, the more I think this is just what software development looks like now. You describe what you want built. The machines build it, review it, and ship it. You come back and make sure it's right.
The backlog isn't a source of stress anymore. It's a queue that drains itself. And every morning I wake up a little closer to done.