Kirin is a senior-only software engineering studio based in Munich, Germany: a small, highly specialized team of senior engineers and designers, with no juniors and no account managers. It delivers product engineering, design engineering, AI interfaces, audits, and build-for-scale work for founders and product teams who already know what to build.

How is Kirin different from a typical software agency?

Three differences. First, no juniors: every project is staffed entirely by senior people. Second, no account managers: the partners who pitch the work do the work. Third, one client at a time: Kirin embeds with a single team rather than juggling a portfolio.

How long does a project with Kirin take?

Product engineering and design engineering engagements run 8–16 weeks. Audit and refactor engagements run 4–8 weeks. Build-for-scale work varies with scope.

What is design engineering?

Design engineering is the last 20% of polish — motion, micro-interactions, density, fit-and-finish — written in production code rather than Figma. A frontend engineer implements specs; a design engineer makes design decisions in code.

What technologies does Kirin use?

React and TypeScript on the frontend, React Native for mobile, Node.js and Python on the backend, PostgreSQL for data, AWS for infrastructure. On the AI side, Kirin has shipped 11 production AI products against most major LLM providers and orchestration frameworks.

How do I start a project with Kirin?

Email hello@gokirin.com or submit the contact form at gokirin.com. Kirin replies within two business days.

Don't Be Lazy — Refactor Your Frontend, It's Easy Now

Frontend codebases carry more debt than backend ones for cultural, historical, and structural reasons. AI changes the math, but only with three guardrails in place: Zod at every boundary, ironclad TypeScript, and tests that ship with the code.

We just shipped a PR that touched 1,247 files. Data-fetching refactor — every fetch moved behind a typed client, the API layer rewritten, UI bit-identical. Clean diff, green CI, Friday deploy.

A year ago that PR would have been a quarter-long project. The math changed because the agent now does the mechanical middle: every call site, every import, every narrowing.

The agent isn't what makes this safe. Three guardrails are. In this order:

Zod at every boundary — runtime safety where the type system stops.
Ironclad TypeScript — no any, no escape hatches.
Tests that ship with the code — every PR, no exceptions.

Skip any of them and you're gambling. Have all three and the refactor is mostly boring.

Why your frontend is a mess

Three reasons. None of them your fault.

Cultural. The industry treated frontend as the easy part. "Just HTML." Juniors staffed it. A generation of frontend code was written by people learning to engineer software in the same codebase that ended up in production.

Historical. Every framework cycle left a fossil layer. Angular 1 → 2. Classes → hooks. Redux → Zustand → whatever's next. Every surviving codebase has all of them somewhere.

Structural. Backend has schemas, protobufs, OpenAPI, database constraints. Frontend had vibes. Forms accept whatever the user types, API responses get mapped into ad-hoc shapes, "we'll add tests later" becomes never. Swift and Kotlin are closer to backend on this — the compiler does the work the team didn't. The web has been opt-in safety, by file and by discipline.

That's why your frontend has years of mess nobody has touched.

What changes with AI

The middle of a refactor used to be the expensive part — finding every call site, rewriting imports, swapping a CSS approach while keeping the visual output identical. The agent does that work now.

What it does not do is judge what your business actually requires. That's the guardrails' job.

Rule 1: Parse every boundary with Zod

UI bugs surface in QA. A missing button, wrong color — obvious, fixed.

Data-shape regressions are different. A field renamed. A nullable that quietly became required. A string that became an enum. They don't crash. They produce wrong screens for some users, and the bug report shows up weeks later as "the dashboard is empty for our biggest customer."

TypeScript is a compile-time hint. At runtime your code trusts whatever the network said. Zod (or Valibot, ArkType, etc.) parses the shape at the boundary. If you can name the boundary, you parse at it:

HTTP responses (REST, GraphQL, RPC)
Messages between micro-frontends or workers
Local storage, IndexedDB, cookies
URL params, search params, route state
Form submissions

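```typescript
import { z } from "zod";

const UserResponse = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  plan: z.enum(["free", "pro", "team"]),
  trialEndsAt: z.coerce.date().nullable(),
});

// Derive the type from the schema so the two cannot drift.
type User = z.infer<typeof UserResponse>;

export async function getUser(id: string): Promise<User> {
  const res = await fetch(`/api/users/${encodeURIComponent(id)}`);
  if (!res.ok) throw new Error(`GET /api/users/${id} failed with ${res.status}`);
  // parse() throws a ZodError if the payload does not match the schema.
  return UserResponse.parse(await res.json());
}
```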

With every boundary parsed, the agent can hammer the network layer without breaking what your UI depends on. Without it, the agent has no signal for what's load-bearing.
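The same pattern covers the other boundaries in the list. The sketch below handles URL search params. It is written against the Zod 3 API and assumes a hypothetical list page with `page` and `plan` filters. Search params always arrive as strings, so `z.coerce.number()` converts `page`, and `.default(1)` applies only when the key is absent. A malformed value such as `page=abc` still fails the parse instead of silently becoming `NaN`.

```typescript
import { z } from "zod";

// Hypothetical query schema for a paginated list page.
const ListQuerySchema = z.object({
  // Search params are strings: coerce to a number, default only when absent.
  page: z.coerce.number().int().min(1).default(1),
  plan: z.enum(["free", "pro", "team"]).optional(),
});

type ListQuery = z.infer<typeof ListQuerySchema>;

function parseListQuery(search: string): ListQuery {
  const params = new URLSearchParams(search);
  const raw: Record<string, string> = {};
  // For repeated keys the last value wins.
  params.forEach((value, key) => {
    raw[key] = value;
  });
  // Throws a ZodError on e.g. ?page=abc or ?plan=enterprise.
  return ListQuerySchema.parse(raw);
}
```

Parsing the whole query object at once, rather than reading params ad hoc across components, gives the agent one place to see what the route actually accepts.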

.claude/rules/parse-at-boundary.md

```markdown
# Parse data at every system boundary

When data crosses into the frontend, define a Zod schema and parse
it before assigning to anything typed. Boundaries include:

- HTTP responses (REST, GraphQL, RPC)
- Messages between micro-frontends or workers
- Local storage, IndexedDB, cookies
- URL params, search params, route state
- Form submissions

Derive the TypeScript type from the schema with z.infer. Do not
write the type and the schema separately; they will drift.

If the parse fails, surface a typed error and never return partial
data.
```
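Here is a minimal sketch of what "surface a typed error" can look like. It uses Zod's `safeParse` with a hypothetical `BoundaryParseError` class and `ParseResult` type, neither of which is part of Zod. The caller gets either a fully valid `User` or an error naming the fields that failed, never a half-parsed object. The failing case in the test is the kind of data-shape regression Rule 1 opens with: the API starts sending a `plan` value the schema does not list.

```typescript
import { z } from "zod";

const UserResponse = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  plan: z.enum(["free", "pro", "team"]),
  trialEndsAt: z.coerce.date().nullable(),
});

type User = z.infer<typeof UserResponse>;

// Hypothetical error type: names the boundary and the paths that failed.
class BoundaryParseError extends Error {
  readonly boundary: string;
  readonly paths: string[];

  constructor(boundary: string, paths: string[]) {
    super(`Invalid data at ${boundary}: ${paths.join(", ")}`);
    this.name = "BoundaryParseError";
    this.boundary = boundary;
    this.paths = paths;
  }
}

type ParseResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: BoundaryParseError };

function parseUser(raw: unknown): ParseResult<User> {
  const result = UserResponse.safeParse(raw);
  if (result.success) return { ok: true, data: result.data };
  // All or nothing: no partially valid object ever reaches the UI.
  const paths = result.error.issues.map((issue) => issue.path.join("."));
  return { ok: false, error: new BoundaryParseError("GET /api/users/:id", paths) };
}
```

Returning a discriminated union instead of throwing forces every call site to handle the failure branch, which the compiler then checks for you.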

Rule 2: Ironclad TypeScript

any is a backdoor. unknown is a backdoor with a padlock you forgot to remove. A codebase that allows either widely cannot be refactored safely at scale — the compiler stops being a tool and becomes a suggestion.

Defaults we ship with:

strict: true. @typescript-eslint/no-explicit-any set to error.
unknown is fine at the boundary. Narrow it via Zod or a type guard before any property access.
Types come from runtime sources — Zod schemas, generated API clients, database row types (Drizzle / Prisma / Kysely). Never hand-written in parallel. Parallel types drift, every time.

The agent can only reason about what the compiler can verify. Half your code typed as any is half your code the agent will either touch conservatively or get wrong.

.claude/rules/ironclad-typescript.md

```markdown
# No `any`, narrow every `unknown`

- `any` is forbidden in `src/`. If you need an escape hatch, use
  `unknown` and narrow with Zod, a type guard, or `instanceof`
  before any property access.
- Do not write an interface or type alias that duplicates a Zod
  schema. Use `z.infer<typeof Schema>` instead.
- For external APIs, prefer a generated client (OpenAPI, GraphQL
  Codegen) over hand-written types. If no client is available,
  parse with Zod at the fetch site.
- No `// @ts-ignore` or `// @ts-expect-error` without a one-line
  comment explaining why.
```
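This sketch shows the non-Zod narrowing the rule allows, using only the language itself. Under `strict: true`, TypeScript types `catch` variables as `unknown` (the `useUnknownInCatchVariables` check, included in `strict` since TypeScript 4.4), so `err.message` does not compile until `err` is narrowed. `hasStringCode`, `describeError`, and `attempt` are hypothetical helpers. The `"code" in value` narrowing requires TypeScript 4.9 or later.

```typescript
// A user-defined type guard: when it returns true, the compiler treats
// `value` as `{ code: string }` in that branch.
function hasStringCode(value: unknown): value is { code: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    "code" in value &&
    typeof value.code === "string"
  );
}

// Narrow with instanceof, the guard above, or typeof before any property access.
function describeError(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (hasStringCode(err)) return `error code ${err.code}`;
  if (typeof err === "string") return err;
  return "unknown error";
}

function attempt(task: () => void): string {
  try {
    task();
    return "ok";
  } catch (err) {
    // `err` is `unknown` here under strict mode.
    return describeError(err);
  }
}
```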

Rule 3: Tests are the contract

Refactoring without tests is hope.

At refactor time, the job of tests is to answer one question honestly: did I break anything. Order of preference:

E2E (Playwright, Cypress). The only tests that catch the regressions a refactor actually causes — a broken route, a form that no longer submits, a layout that collapsed on mobile.
Unit, for non-trivial logic. Anything with branching, math, or non-trivial transformation. Not for presentational components.
Snapshot, for design-system primitives and pure formatters. Cheap to write; worthless if nobody reads the diff.
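To make "non-trivial logic" concrete, here is a hypothetical helper built on the `trialEndsAt` field from the Rule 1 schema. The test table has one row per branch, plus the boundary case where the trial ends exactly at `now`. It runs as plain TypeScript with no test framework. In Vitest or Jest, the same table maps onto `it.each`. Passing `now` in, rather than calling `new Date()` inside the function, is what keeps it testable.

```typescript
type TrialStatus =
  | { state: "none" }
  | { state: "active"; daysLeft: number }
  | { state: "expired" };

const DAY_MS = 24 * 60 * 60 * 1000;

function trialStatus(trialEndsAt: Date | null, now: Date): TrialStatus {
  if (trialEndsAt === null) return { state: "none" };
  const remainingMs = trialEndsAt.getTime() - now.getTime();
  if (remainingMs <= 0) return { state: "expired" };
  // Round up so a trial with one hour left reads as 1 day, not 0.
  return { state: "active", daysLeft: Math.ceil(remainingMs / DAY_MS) };
}

// Table-driven cases: one row per branch, plus the boundary at exactly NOW.
const NOW = new Date("2024-05-01T12:00:00Z");
const cases: Array<[string, Date | null, TrialStatus]> = [
  ["no trial", null, { state: "none" }],
  ["ends exactly now", NOW, { state: "expired" }],
  ["ended yesterday", new Date("2024-04-30T12:00:00Z"), { state: "expired" }],
  ["one hour left", new Date("2024-05-01T13:00:00Z"), { state: "active", daysLeft: 1 }],
  ["three days left", new Date("2024-05-04T12:00:00Z"), { state: "active", daysLeft: 3 }],
];

for (const [name, endsAt, expected] of cases) {
  const actual = JSON.stringify(trialStatus(endsAt, NOW));
  if (actual !== JSON.stringify(expected)) {
    throw new Error(`${name}: expected ${JSON.stringify(expected)}, got ${actual}`);
  }
}
```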

The rule below is the single most useful thing we put in every .claude/rules directory. Without it, the agent ships components with no tests. With it, the tests land in the same PR.

.claude/rules/tests-ship-with-code.md

```markdown
# Tests are part of the change

Every new component, hook, route, or utility ships with at least
one test in the same PR. Order of preference:

1. E2E (Playwright) when the change is user-visible.
2. Unit (Vitest / Jest) when the change is logic.
3. Snapshot when the change is a design-system primitive or pure
   formatter.

If you cannot test a change automatically, write down why in the
PR description.
```

What the guardrails don't catch

The three rules are pre-merge. They give you confidence in the shapes — types, parses, tests. They do not catch semantic drift under real load: an effect dep the agent reshuffled, an async ordering that compiles and tests green but races in production, a memoised selector whose new identity tanks render perf on a customer tier you don't have in your fixtures.
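The memoised-selector case is easy to reproduce. This sketch uses a hypothetical store shape and a hand-rolled `memoizeLast` helper, which follows the spirit of Reselect's `createSelector` but is not its implementation. Both selectors return equal values, so a test that compares values stays green. Only the memoised one returns the same array reference on repeated calls. Reference equality is what react-redux's `useSelector` compares by default and what `React.memo` compares per prop, so the unmemoised version re-renders its subscribers on every store update.

```typescript
// Hypothetical store shape, for illustration only.
type OrderRow = { id: string; status: "open" | "closed" };
type StoreState = { orders: OrderRow[] };

// Caches the result for the most recent argument (compared with ===).
function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let cached: { arg: A; result: R } | undefined;
  return (arg: A) => {
    if (cached === undefined || cached.arg !== arg) {
      cached = { arg, result: fn(arg) };
    }
    return cached.result;
  };
}

const filterOpen = memoizeLast((orders: OrderRow[]) =>
  orders.filter((o) => o.status === "open"),
);

// Before the refactor: same `orders` reference in, same array out.
const selectOpenMemo = (s: StoreState): OrderRow[] => filterOpen(s.orders);

// After a plausible refactor: same types, same values, new array every call.
const selectOpenNaive = (s: StoreState): OrderRow[] =>
  s.orders.filter((o) => o.status === "open");

const storeState: StoreState = {
  orders: [
    { id: "a", status: "open" },
    { id: "b", status: "closed" },
  ],
};
```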

The obvious answer — feature-flag the refactor and ramp it — usually isn't realistic at this scale. Flagging 1,000 files means wrapping whole modules in conditional imports, running two code paths in parallel, and accreting flag debt that outlives the refactor itself. For a feature, do it. For a sweeping refactor, the cost rarely pencils out. This is the honest weakness of the approach. Pretend otherwise and you'll meet it in production.

What works instead: a tight canary.

Ramp 1% → 10% → 50% → 100% over hours, not minutes. Each step has to outlast a normal traffic cycle or it tells you nothing.
Tighten error-rate, p95, and route-traversal alarms for the canary window. You're hunting behaviour change, not catastrophe — wide tolerances will hide everything.
Keep the previous build one click away. Revert first, diagnose second. A fast revert beats a clever flag.
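The ramp can be written down as a gate that the deploy pipeline checks at each step. In this sketch, the metric names, the tolerances, and the `minWindowHours` parameter are all hypothetical placeholders; tune them against your own baseline. "Route traversal" is read here as the share of sessions that reach the next route in a key flow. Any tripped alarm means roll back. A healthy canary advances only once its window has covered a full traffic cycle.

```typescript
// Hypothetical per-window metrics, measured for the stable build and the canary.
type Metrics = {
  errorRate: number; // failed requests / all requests, 0..1
  p95Ms: number; // 95th-percentile latency in milliseconds
  routeTraversal: number; // share of sessions reaching the next route, 0..1
};

type Decision = "advance" | "hold" | "rollback";

// Traffic percentages for each ramp step.
const STEPS = [1, 10, 50, 100];

// Deliberately tight, illustrative tolerances: we are hunting behaviour change.
const MAX_ERROR_RATE_DELTA = 0.002; // at most +0.2 percentage points
const MAX_P95_RATIO = 1.1; // at most 10% slower at p95
const MIN_TRAVERSAL_RATIO = 0.97; // at most 3% fewer sessions moving on

function evaluateCanary(
  baseline: Metrics,
  canary: Metrics,
  windowHours: number,
  minWindowHours: number,
): Decision {
  if (canary.errorRate - baseline.errorRate > MAX_ERROR_RATE_DELTA) return "rollback";
  if (canary.p95Ms > baseline.p95Ms * MAX_P95_RATIO) return "rollback";
  if (canary.routeTraversal < baseline.routeTraversal * MIN_TRAVERSAL_RATIO) return "rollback";
  // Healthy so far, but a window shorter than a traffic cycle proves nothing.
  return windowHours >= minWindowHours ? "advance" : "hold";
}

// The next traffic percentage, or the current one if already at 100%.
function nextStep(current: number): number {
  const i = STEPS.indexOf(current);
  return i >= 0 && i < STEPS.length - 1 ? STEPS[i + 1] ?? current : current;
}
```

The rollback checks run before the window check, so a bad canary is reverted as soon as an alarm trips, whether or not its window has elapsed.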

Guardrails make the refactor mechanical. The canary makes the deploy survivable. Different lines of defense.

TL;DR

Install the three guardrails. Write a one-paragraph brief ("move all data-fetching into a typed client, preserve call signatures, keep tests green"). Let the agent do the diff. Read it. Canary the deploy.

With guardrails, the agent's mistakes look like a type error, a failed parse, or a red test. Without them, they look like a silent breaking change in production.

Don't be lazy. The hard work is the guardrails. The refactor follows for free.

FAQ

Is AI-assisted refactoring safe for production code?

Yes, if the codebase gives the agent ways to verify its own work — strict TypeScript, Zod at every external boundary, and a test suite that covers user-visible flows. Without those, AI-assisted refactoring is the same gamble as any large refactor, just faster.

What's the biggest risk in a large frontend refactor?

Silent data-shape regressions. UI bugs get caught in QA. Data bugs — a nullable that quietly became required, an enum that grew a new value — slip through and surface later as wrong screens for specific customers.

Rewrite or refactor?

Almost always refactor. A rewrite throws away the years of bug fixes encoded in the existing code. See When to Refactor vs. Rewrite for the five-question framework.

How many files can we safely change in one PR?

No hard number. We routinely ship 1,000+ file PRs when the change is mechanical and the test suite is green. The right limit is what a senior engineer can actually read and review — not what the tooling allows.

Kirin runs codebase audits and refactor engagements for teams sitting on frontends that have outgrown their original architecture. We embed for four to eight weeks, install the guardrails, and do the refactor alongside your team.