Every frontend codebase carries technical debt. Most teams treat it as a permanent tax — a thing you live with rather than something you fix. For a long time that was the right call. A serious refactor — change the folder structure, swap the state manager, replace the data-fetching layer, migrate the design system — used to mean three months of pain across hundreds of files, with a non-trivial chance of regressing the product in ways nobody could fully predict.

That math has changed. With AI-assisted refactoring, the same migration is a week of focused work and a much smaller blast radius. We have shipped refactors that touch 1,000+ files in a single PR. The trick is not the AI. The trick is the guardrails you put in place before you let the agent touch the code. Three of them, in order of importance.

Why frontend carries more debt than the backend

The reason most frontend code is messier than the backend it talks to is structural, not cultural. Backend services tend to live behind clear contracts — schemas, protobufs, OpenAPI definitions, database constraints — and there is almost always a senior engineer enforcing them. Frontend has historically been the part of the system where the contracts get loose:

  • Designers iterate faster than the schema keeps up.
  • Forms accept whatever the user types and hope for the best.
  • API responses get mapped into ad-hoc shapes inside React components.
  • "We'll add tests later" becomes "we never added tests."

The platforms reinforce this. On native mobile (Swift on iOS, Kotlin on Android), the compiler does a lot of the work. Models are typed. View hierarchies are typed. The IDE pushes you toward correctness. On the web, the default has been to ship JavaScript to the user as fast as possible and figure the rest out at runtime. TypeScript improved this, but TypeScript is opt-in by file, by line, and by team discipline. Slip once and the safety net has a hole in it.
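To make the "hole in the safety net" concrete, here is a minimal sketch with a hypothetical `Account` type. It compiles cleanly even under `strict: true`, because `JSON.parse` is typed as returning `any`, and `any` is assignable to anything:

```typescript
// Hypothetical hand-written type. The compiler trusts it; nothing checks it at runtime.
interface Account {
  plan: "free" | "pro";
  seats: number;
}

// JSON.parse returns `any`, so this compiles even under strict mode,
// although nothing has verified the shape of the payload.
function readAccount(json: string): Account {
  return JSON.parse(json);
}

// Downstream code trusts the type and does arithmetic on `seats`.
function monthlyCost(json: string, pricePerSeat: number): number {
  return readAccount(json).seats * pricePerSeat;
}

console.log(monthlyCost('{"plan":"pro","seats":5}', 12)); // 60

// Suppose the API renamed `seats` to `seatCount`. There is no compile error
// and no crash: undefined * 12 is NaN, and NaN flows quietly into the UI.
console.log(monthlyCost('{"plan":"pro","seatCount":5}', 12)); // NaN
```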

What changes with AI

Pre-AI, a 1,000-file refactor was an act of pure willpower. You would spend two weeks understanding the codebase, two weeks proposing a target shape, and then six weeks grinding through find-and-replace work that nobody enjoyed. The boring middle was where bugs were introduced — a missed call site, a stale import, a type that nominally matched but semantically did not.

AI handles that middle well. The model can read the whole codebase. It can find every call site of a deprecated function. It can rewrite imports across an entire directory tree. It can swap one CSS approach for another and keep the visual output stable. But it does this with no judgment about what your business actually requires — and that is where you need the guardrails.

The kind of migrations we are talking about:

  • Folder structure — feature-first → domain-first, or both → colocated by route.
  • State management — Redux → Zustand, or global store → server state with React Query plus a thin local store.
  • CSS approach — CSS modules → Tailwind, or styled-components → Vanilla Extract.
  • Data fetching — scattered fetch calls → a typed client with React Query / SWR.
  • Auth provider — Auth0 → Clerk, or self-hosted → Supabase Auth.

In each case, the surface area is wide but the logic is mechanical. Perfect AI territory — provided the codebase tells the agent what is safe to change.

That brings us to the three rules.

Rule 1: Validate every boundary with Zod

This one cannot be overstated. UI/UX issues surface during QA — a missing button, a wrong color, a misaligned input. They are obvious, and they get fixed. Data-shape regressions are different. A field renamed in the API, a nullable that became required, a string that became an enum — these do not crash. They silently produce wrong screens for some users, and the bug report arrives three weeks later as "the dashboard is empty for our biggest customer."

The fix is parsing, not typing. TypeScript is a compile-time hint. At runtime, your code believes whatever the network told it. Zod (or Valibot, or ArkType, or any of the parsing libraries in the same lineage) lets you assert the shape at the boundary.
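The difference between typing and parsing shows up as soon as a payload drifts. In this minimal sketch (the `Order` schema and the payload are hypothetical), a type assertion lets a string through where the type promises a number, while Zod's `safeParse` rejects it and names the field:

```typescript
import { z } from "zod";

// Hypothetical order schema; the type is derived from it.
const Order = z.object({
  id: z.string(),
  totalCents: z.number().int(),
  status: z.enum(["pending", "paid", "refunded"]),
});
type Order = z.infer<typeof Order>;

// Simulates a payload after the backend started sending totalCents as a string.
const payload: unknown = { id: "ord_1", totalCents: "1999", status: "paid" };

// A type assertion is a compile-time claim only. Nothing runs, so the
// string slips through while the type still says number.
const asserted = payload as Order;
console.log(typeof asserted.totalCents); // string

// Parsing checks the value and names the field that is wrong.
const parsed = Order.safeParse(payload);
if (!parsed.success) {
  console.log(parsed.error.issues.map((issue) => issue.path.join("."))); // [ 'totalCents' ]
}
```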

We use Zod everywhere data enters the frontend:

  • Form submissions.
  • API responses — most importantly.
  • Messages between micro-frontends in a federated architecture, even between modules of the same monorepo.
  • Local storage and IndexedDB reads.
  • URL parameters that drive state.

If you can name the boundary, you parse at it. The pattern:

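import { z } from "zod";

const UserResponse = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  plan: z.enum(["free", "pro", "team"]),
  trialEndsAt: z.coerce.date().nullable(),
});

type User = z.infer<typeof UserResponse>;

export async function getUser(id: string): Promise<User> {
  const res = await fetch(`/api/users/${encodeURIComponent(id)}`);
  // An error response is not a User: fail before trying to parse its body.
  if (!res.ok) throw new Error(`GET /api/users/${id} failed with status ${res.status}`);
  // res.json() is untyped; parse() validates it and throws a ZodError on mismatch.
  return UserResponse.parse(await res.json());
}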

This catches the bug at the moment the data enters the codebase, not three steps later in a component that never expected plan to hold a new value (say, "enterprise"). When you give an AI agent a codebase full of these parses, it can refactor the network layer aggressively without breaking the contracts your UI depends on. Without them, the agent has no signal about what is load-bearing and what is incidental.
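The same pattern applies at the other boundaries in the list above. A minimal sketch for two of them, URL search params and stored UI state; the schema names, the `prefs` key, and the `KeyValueStore` interface are all hypothetical:

```typescript
import { z } from "zod";

// URL search params arrive as strings, so coerce before constraining.
const DashboardParams = z.object({
  page: z.coerce.number().int().min(1),
  tab: z.enum(["overview", "billing"]),
});

function readDashboardParams(search: string) {
  // URLSearchParams ignores a leading "?"; every value it yields is a string.
  const raw = Object.fromEntries(new URLSearchParams(search));
  return DashboardParams.safeParse(raw);
}

// Stored UI state may have been written by an older build with another shape.
const StoredPrefs = z.object({ theme: z.enum(["light", "dark"]) });
type StoredPrefs = z.infer<typeof StoredPrefs>;

// Hypothetical minimal interface; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
}

function readPrefs(store: KeyValueStore): StoredPrefs | null {
  const raw = store.getItem("prefs");
  if (raw === null) return null; // nothing stored yet

  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }

  // In this sketch, a stale or corrupt entry is treated as "no prefs".
  const result = StoredPrefs.safeParse(json);
  return result.success ? result.data : null;
}
```

Falling back to "no prefs" is one reasonable choice for non-critical UI state. For data the UI actually depends on, the parse-at-the-boundary rule asks for a typed error instead.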

Claude rule: parse-at-the-boundary

A short rule to drop into .claude/rules/parse-at-the-boundary.md. Generic enough for any TypeScript frontend:

# Parse data at every system boundary

When data crosses into the frontend, define a Zod schema and parse
it before assigning to anything typed. Boundaries include:

- HTTP responses (REST, GraphQL, RPC)
- Messages between micro-frontends or workers
- Local storage, IndexedDB, cookies
- URL params, search params, route state
- Form submissions

Derive the TypeScript type from the schema with `z.infer`. Do not
write the type and the schema separately; they will drift.

If the parse fails, surface a typed error — do not return partial
data.

The rule takes a minute to read. The discipline pays back a hundred-fold during a refactor.
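The rule's last instruction, surface a typed error rather than partial data, can be as small as one error class and one helper. A minimal sketch; `BoundaryParseError`, `formatIssues`, and `parseAt` are hypothetical names, not part of Zod:

```typescript
import { z } from "zod";

function formatIssues(issues: z.ZodError["issues"]): string {
  return issues
    .map((issue) => `${issue.path.map(String).join(".") || "(root)"}: ${issue.message}`)
    .join("; ");
}

// Hypothetical error type: records where the bad data came from and
// keeps Zod's issues so callers can log or display them.
class BoundaryParseError extends Error {
  readonly source: string;
  readonly issues: z.ZodError["issues"];

  constructor(source: string, issues: z.ZodError["issues"]) {
    super(`Invalid data from ${source}: ${formatIssues(issues)}`);
    this.name = "BoundaryParseError";
    this.source = source;
    this.issues = issues;
  }
}

// Parse or throw: callers get either a fully valid value or a typed error,
// never a half-checked object.
function parseAt<S extends z.ZodType>(schema: S, data: unknown, source: string): z.infer<S> {
  const result = schema.safeParse(data);
  if (!result.success) throw new BoundaryParseError(source, result.error.issues);
  return result.data;
}

// Usage with a hypothetical account endpoint.
const Account = z.object({ plan: z.enum(["free", "pro", "team"]) });

try {
  parseAt(Account, { plan: "enterprise" }, "GET /api/account");
} catch (err) {
  if (err instanceof BoundaryParseError) {
    console.log(err.source); // GET /api/account
    console.log(err.issues[0]?.path.join(".")); // plan
  }
}
```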

Rule 2: Ironclad TypeScript

any is a backdoor. unknown is a backdoor that has been padlocked but not removed. A codebase that allows either widely cannot be safely refactored at scale — by humans or by AI. The compiler stops being a tool and becomes a suggestion.

Our defaults:

  • strict: true in tsconfig.json, which turns on noImplicitAny and strictNullChecks along with the rest of the strict family.
  • ESLint: @typescript-eslint/no-explicit-any set to error, with a narrow allowlist for genuinely unconstrained input.
  • unknown is acceptable at the boundary; it must be narrowed (via Zod, a type guard, or instanceof) before use anywhere else.
  • Types are derived from runtime sources where possible — Zod schemas, database row types (Drizzle / Prisma / Kysely), API contracts (OpenAPI generators).
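The third bullet, narrowing `unknown` before use, looks like this in practice. A minimal sketch; the `ApiProblem` shape and `describeFailure` are hypothetical. Under `strict: true`, `catch` variables are already typed `unknown` (via `useUnknownInCatchVariables`), so the pattern comes up constantly:

```typescript
// Hypothetical error payload shape an API might return.
interface ApiProblem {
  status: number;
  title: string;
}

// A user-defined type guard: it only returns true after checking every
// field the caller is going to read.
function isApiProblem(value: unknown): value is ApiProblem {
  return (
    typeof value === "object" &&
    value !== null &&
    "status" in value &&
    typeof value.status === "number" &&
    "title" in value &&
    typeof value.title === "string"
  );
}

// `unknown` in, narrowed before any property access.
function describeFailure(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (isApiProblem(err)) return `${err.status}: ${err.title}`;
  return "Unknown failure";
}

try {
  throw { status: 404, title: "Not Found" };
} catch (err) {
  // With strict mode, `err` is `unknown` here, so it must be narrowed.
  console.log(describeFailure(err)); // 404: Not Found
}
```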

The reason this matters for refactoring is that the agent can only reason about what the compiler can verify. If half your codebase uses any for "data we got from somewhere," the AI cannot tell whether a change is safe. It will either be conservative (slow, expensive in tokens) or wrong (worse).

When we inherit a codebase with widespread any, the first refactor is almost always a sweep to introduce schemas at the network layer and replace any with z.infer<typeof Schema>. Everything downstream gets cheaper after that.
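A before-and-after sketch of that sweep, using a hypothetical invoices endpoint. The "before" is shown as a comment; the "after" derives every type from one schema:

```typescript
import { z } from "zod";

// Before the sweep (illustrative):
//   async function getInvoices(): Promise<any[]> { ... }
//   const owed = invoices.reduce((sum: number, inv: any) => sum + inv.amount, 0);

// After: one schema at the network layer; every type is derived from it.
const Invoice = z.object({
  id: z.string(),
  amountCents: z.number().int().nonnegative(),
  paidAt: z.coerce.date().nullable(),
});
const InvoiceList = z.array(Invoice);
type Invoice = z.infer<typeof Invoice>;

// The only place raw JSON is touched; throws a ZodError on a bad payload.
function parseInvoices(json: unknown): Invoice[] {
  return InvoiceList.parse(json);
}

// Downstream code needs no annotations or casts. Renaming a field in the
// schema turns every stale access into a compile error.
function outstandingCents(invoices: Invoice[]): number {
  return invoices
    .filter((invoice) => invoice.paidAt === null)
    .reduce((sum, invoice) => sum + invoice.amountCents, 0);
}
```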

Claude rule: no escape hatches

# No `any`, narrow every `unknown`

- `any` is forbidden anywhere in `src/`. If you need an escape
  hatch, use `unknown` and narrow it with Zod, a type guard, or
  `instanceof` before any property access.
- Do not write an interface or type alias that duplicates a Zod
  schema. Use `z.infer<typeof Schema>` instead.
- When importing data from an external API, prefer a generated
  client (OpenAPI, GraphQL Codegen) over hand-written types. If a
  generated client is not available, parse with Zod at the fetch
  site.
- If you find yourself adding `// @ts-ignore`, stop. Surface the
  type error and either fix it or document in the PR why it cannot
  be fixed.

Rule 3: Tests are the contract

Refactoring without tests is hope. The biggest fear with any large refactor is the silent regression — the feature that "still works" because nobody noticed it broke. The job of tests, at refactor time, is to give the agent (and you) an honest answer to "did I break anything."

Order of preference, in our experience:

  1. End-to-end tests (Playwright, Cypress). These are the only tests that catch the kind of regression a refactor actually introduces — a route that no longer resolves, a form that no longer submits, a layout that broke on mobile. Slow to run, expensive to maintain, irreplaceable.
  2. Unit tests for non-trivial logic. Anything with branching, math, or non-trivial transformation deserves a unit test. Do not unit-test the render output of a purely presentational component — that is the snapshot's job.
  3. Snapshot tests for stable components. Cheap to write, useful for catching accidental UI changes. Almost worthless if nobody reads the diff. Worth it for design-system primitives and for outputs of complex utilities (markdown renderers, code highlighters, formatters).
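As an example of the "non-trivial logic" in item 2, here is a hypothetical helper that turns a nullable `trialEndsAt` date into days remaining, with one test per branch. To keep the sketch dependency-free it uses Node's built-in `node:test` runner and `node:assert`; with Vitest the same cases translate directly to `it` and `expect`:

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days left in a trial, rounded up so that
// "ends in 2.5 days" shows as 3. Returns null when there is no trial.
function trialDaysLeft(trialEndsAt: Date | null, now: Date): number | null {
  if (trialEndsAt === null) return null;
  const remainingMs = trialEndsAt.getTime() - now.getTime();
  if (remainingMs <= 0) return 0;
  return Math.ceil(remainingMs / MS_PER_DAY);
}

const now = new Date("2024-01-01T00:00:00Z");

test("returns null when the account has no trial", () => {
  assert.equal(trialDaysLeft(null, now), null);
});

test("returns 0 once the trial has ended", () => {
  assert.equal(trialDaysLeft(new Date("2023-12-31T00:00:00Z"), now), 0);
});

test("rounds a partial day up", () => {
  assert.equal(trialDaysLeft(new Date("2024-01-03T12:00:00Z"), now), 3);
});
```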

The single most useful rule we put in every .claude/rules directory is the one that requires tests on new code. Without it, the model will happily generate a component with no test and call it done. With it, the model writes a test in the same PR as the component. After a quarter of operation, the test coverage of new code converges to near-100%, even if the legacy coverage is still spotty.

Claude rule: tests ship with code

# Tests are part of the change

Every new component, hook, route, or utility ships with at least
one test in the same PR. The order of preference:

1. E2E test (Playwright) when the change is user-visible.
2. Unit test (Vitest / Jest) when the change is logic.
3. Snapshot test when the change is a design-system primitive or a
   pure formatter.

If you cannot test a change automatically, write down why in the PR
description.

Putting it together

Once these three rules are in place, the refactor itself is mechanical. You write a short brief — "move all data-fetching from local fetch calls into a typed client; preserve every existing call signature; keep the test suite green" — and let the agent work. The agent reads the codebase, finds every call site, rewrites them, runs the tests, and surfaces anything that did not pass.

The first hour of that work is more productive than two engineers grinding for a week. The remaining hours are spent reading the diff, watching the test suite, and pulling on threads where the agent guessed wrong. With the guardrails, "guessed wrong" looks like a TypeScript error, a failed parse at a boundary, or a red test — all recoverable. Without the guardrails, "guessed wrong" looks like a silent breaking change shipped to production.
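For the brief quoted above, the target shape might look like the minimal sketch below. Every name is hypothetical (`createClient`, `FetchLike`, `parseProject`, `getProject`, the base URL). The client accepts any parser function, so a Zod schema plugs in as `(data) => Schema.parse(data)`, and a pre-existing `getProject(id)` keeps its signature so no call site changes:

```typescript
// Any function that turns unknown input into T or throws.
// A Zod schema fits as (data) => Schema.parse(data).
type Parser<T> = (input: unknown) => T;

// Only the part of fetch this client needs; the global fetch satisfies it,
// and tests can pass a fake.
type FetchLike = (url: string) => Promise<Response>;

function createClient(baseUrl: string, fetchImpl: FetchLike = (url) => fetch(url)) {
  async function get<T>(path: string, parse: Parser<T>): Promise<T> {
    const res = await fetchImpl(`${baseUrl}${path}`);
    if (!res.ok) throw new Error(`GET ${path} failed with status ${res.status}`);
    return parse(await res.json());
  }
  return { get };
}

interface Project {
  id: string;
  name: string;
}

// Hand-written parser for the sketch; in a Zod codebase this would be a schema.
function parseProject(input: unknown): Project {
  if (
    typeof input === "object" &&
    input !== null &&
    "id" in input &&
    typeof input.id === "string" &&
    "name" in input &&
    typeof input.name === "string"
  ) {
    return { id: input.id, name: input.name };
  }
  throw new Error("Unexpected project payload");
}

const api = createClient("https://api.example.com");

// Before the refactor this function called fetch directly. Its signature is
// unchanged, so no call site needs to be touched.
function getProject(id: string): Promise<Project> {
  return api.get(`/projects/${encodeURIComponent(id)}`, parseProject);
}
```

Injecting `fetchImpl` is what lets the agent (and you) keep the test suite green without a network: every test can hand the client a fake.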

What this means for your codebase today

If you are sitting on a frontend that "needs a refactor" and you have been putting it off because the cost looked too high — that calculus is out of date. The cost dropped. What is still expensive is the prerequisite work: validation at boundaries, strict TypeScript, and a real test suite. None of that is glamorous, and none of it is the refactor. But once those three are in place, the refactor itself is the easy part.

Don't be lazy. The hard work is the guardrails. The refactor follows for free.

FAQ

Is AI-assisted refactoring safe for production code?

It is safe if and only if the codebase has guardrails that let the agent verify its own work: strict TypeScript, runtime validation at every external boundary (Zod or similar), and a test suite that covers the user-visible flows. Without those, AI-assisted refactoring is the same gamble as any large refactor: fast but unverifiable. With them, you can change a thousand files in a single PR and still verify the result.

What is the biggest risk in a large frontend refactor?

Silent data-shape regressions. UI bugs surface during QA. Data bugs — a field that quietly became nullable, a string that became an enum — slip past everything and show up later as wrong screens for specific customers. Parsing every external response with Zod is the cheapest defense.

Should we rewrite or refactor our frontend?

In almost every case, refactor. A rewrite discards the years of bug fixes and edge-case handling encoded in the existing code. See "When to Refactor vs. Rewrite" for the five-question framework we use to decide.

What is the role of TypeScript in AI-assisted refactoring?

The compiler is the agent's most important collaborator. Every any is a place where the model has to guess; every strict, derived type is a place where it can verify. A codebase with zero any and Zod-derived types can be refactored aggressively. A codebase that has opted out of the compiler cannot.

How many files can we safely change in one PR?

There is no hard number. We routinely ship 1,000+ file PRs when the change is mechanical and the test suite stays green. The right limit is set by the size of the diff a senior engineer can actually read and understand — not by tooling.

Do these rules apply outside of React?

Yes. The three rules — parse at boundaries, no any, tests ship with code — are framework-agnostic. We apply the same pattern in Vue, Svelte, SolidJS, and React Native codebases. The parsing library (Zod, Valibot) and the test runner (Vitest, Jest, Playwright) can vary; the discipline does not.


Kirin runs codebase audits and refactor engagements for teams sitting on frontends that have outgrown their original architecture. We embed for four to eight weeks, install the guardrails, and do the refactor alongside your team — not for them.