I recently read Sean Goedecke's excellent piece on taste in software engineering, and it struck a chord. He's right that taste is about selecting the right engineering values for your project, not rigid adherence to "best practices." But I want to push further: taste isn't mysterious or unteachable. It's a skill you can deliberately develop.
What Taste Actually Means
Here's my working definition: taste is disciplined judgment about tradeoffs under uncertainty that reduces long-term cost and risk. Good taste identifies the right tradeoffs for the problem, then executes them effectively.
Good taste produces systems with a single unifying idea, deep modules, misuse-resistant APIs, and simple composition. These qualities are testable. Run these checks before you code:
- Conceptual integrity test: Can you explain it on one page?
- Depth test: Does each module hide more than it exposes?
- Misuse test: Can you make it do the wrong thing?
- Simplicity test: Are concepts untangled?
- Observability plan: Are key invariants measurable?
- Reversibility: Can you disable it without a deploy?
Taste as Economics
Good taste has a clear business case: it keeps the cost of change low and preserves option value. Jeff Bezos's Type-1 vs Type-2 decision framework applies directly here. Type-1 decisions are hard to reverse: database schemas, public APIs, data formats. Type-2 decisions are easy to undo: internal implementations, feature flags, tool choices.
Good taste means:
- Delaying Type-1 commitments until you have sufficient information
- Making Type-2 decisions reversible by default
- Keeping cheap options open (feature flags, abstraction seams, versioned APIs)
- Measuring the cost of change (how many files are touched per requirement?)
Delay Type-1 decisions until you've observed real usage patterns. Example test: "We won't version the public event schema until we've seen 3 external consumers and 2 months of stability."
When engineers with poor taste push for microservices on day one, they're making an expensive Type-1 decision with incomplete information. When engineers with good taste wrap a vendor API behind an internal interface, they're preserving optionality at low cost.
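Wrapping a vendor API can be sketched in a few lines. This is a hypothetical illustration (EmailPort, VendorEmailAdapter, and FakeEmailPort are invented names, not from any real SDK): the rest of the codebase depends only on the narrow internal port, so the vendor choice stays a cheap Type-2 decision.

```kotlin
// The only surface the rest of the codebase sees.
interface EmailPort {
    fun send(to: String, subject: String, body: String): Boolean
}

// The vendor is a hidden, reversible decision: swapping providers touches one file.
class VendorEmailAdapter(/* vendor client would be injected here */) : EmailPort {
    override fun send(to: String, subject: String, body: String): Boolean {
        // translate to the vendor's SDK calls here
        return true
    }
}

// A side benefit of the seam: tests never touch the network.
class FakeEmailPort : EmailPort {
    val sent = mutableListOf<Triple<String, String, String>>()
    override fun send(to: String, subject: String, body: String): Boolean {
        sent += Triple(to, subject, body)
        return true
    }
}
```

The fake also doubles as documentation of the port's contract: anything a caller can observe through `EmailPort` is everything it is allowed to depend on.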
Critical Foundation: Schemas and Events Are APIs
Schemas and events are public APIs; apply Hyrum's Law. JSON fields, Kafka topics, database columns, and protobuf messages all accumulate dependencies. Once external systems consume them, every observable behavior becomes a contract. Design them with the same discipline you'd use for code APIs: minimize the surface area, version changes carefully, and maintain compatibility.
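One compatible way to evolve such a schema is purely additive change. A hypothetical sketch (OrderPlaced and its fields are invented for illustration): the new field gets a default, so producers and consumers that predate it keep working without coordination.

```kotlin
// Hypothetical event payload. v1 consumers know only orderId and amountCents.
data class OrderPlaced(
    val orderId: String,
    val amountCents: Long,
    // Additive change: a new optional field with a default, so existing
    // producers and consumers keep working without a coordinated migration.
    val couponCode: String? = null,
)

// A tolerant reader copes with a missing optional field instead of failing.
fun describe(event: OrderPlaced): String = buildString {
    append("order ${event.orderId}: ${event.amountCents} cents")
    event.couponCode?.let { append(" (coupon $it)") }
}
```

Code written against v1 still compiles and runs unchanged; the new field only matters to consumers that opt in.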
The Five Anchors
1. Conceptual Integrity
Fred Brooks called conceptual integrity "the most important consideration in system design." A system built around one coherent set of ideas is easier to use, extend, and debug. This means saying no to features that don't fit the core concept, even when they're individually "good ideas."
Test it: Write a one-page design brief before you code. Include goals, non-goals, core concepts, and three usage examples. Share it with someone unfamiliar with your project. If they're confused, your design lacks clarity.
2. Deep Modules with Shallow Interfaces
John Ousterhout provides a useful metric: maximize functionality / interface complexity. Great modules hide substantial internal work behind a small, stable surface. This reduces cognitive load and limits the blast radius when things change.
Test it: For any module, list your public types and methods. Then list the significant internal decisions that module handles. If the first list is longer than the second, you have a shallow module. Either push decisions inward or simplify the interface.
I once worked on a codebase where unnecessary 5-layer abstractions increased our average PR touch count from 4 files to 13. No layer performed enough work to justify its existence.
3. API Discipline
Joshua Bloch's advice: good APIs are easy to learn, easy to use without documentation, and hard to misuse. Add Hyrum's Law (users will depend on every observable behavior) and you get a clear imperative: minimize what's observable and treat public behavior as permanent.
Public behavior is permanent. Prefer additive changes; version breaking changes carefully. Establish a clear deprecation policy, for example, 2 releases plus telemetry proving near-zero usage before removal. Use compatibility layers for schemas and events to bridge old and new consumers.
Test it: Before implementing, write code that uses your API. Can you make it do the wrong thing? Can you accidentally create an invalid state? If yes, redesign so these mistakes are impossible or caught at compile time.
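One way to pass this test is to make the wrong thing unrepresentable in the type system. A hypothetical sketch (OpenConnection, ClosedConnection, and withConnection are invented names): instead of a `closed` flag that callers can forget to check, the closed state simply has no read method, so misuse fails at compile time.

```kotlin
// Reading is only expressible on an open connection.
class OpenConnection internal constructor(private val lines: Iterator<String>) {
    fun readLine(): String? = if (lines.hasNext()) lines.next() else null
    fun close(): ClosedConnection = ClosedConnection
}

// Has no readLine(); calling it on a closed connection is a compile error,
// not a runtime exception.
object ClosedConnection

// Scoped access guarantees the connection is closed exactly once.
fun <T> withConnection(data: List<String>, block: (OpenConnection) -> T): T {
    val conn = OpenConnection(data.iterator())
    return try {
        block(conn)
    } finally {
        conn.close()
    }
}
```

The same pattern generalizes to builders that won't compile until required fields are set, and to state machines where each state exposes only its legal transitions.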
4. Simplicity Over Convenience
Rich Hickey draws a crucial distinction: simple means untangled concepts, while easy means familiar or convenient. We should favor simplicity first. Convenience can follow.
This approach is counterintuitive because convenient features feel productive in the moment. But tangled concepts compound over time. Every shortcut you take today becomes a design decision you must defend tomorrow.
Test it: When a design makes two concepts interdependent, ask how to separate them. If you can't separate them, you've found a seam where complexity will accumulate. If removing a helper function increases clarity, remove it.
5. Operational Taste
Good taste isn't limited to design; it's also operational. How does your code behave in production? Can you diagnose failures quickly? Can you roll back safely?
Operational taste shows up in:
- Observability: Are key invariants measurable? Do you emit counters, histograms, and traces at decision points?
- Failure modes: What happens under partial failure or slow dependencies? Do you have backpressure and timeouts?
- Idempotency: Can operations be safely retried? What's the scope of idempotency keys (per user? per operation?) and their retention period?
- Reversibility: Is there a feature flag or rollback path?
- PII hygiene: Structured logs and traces must exclude secrets and user identifiers unless hashed or tokenized. Redaction is part of good taste.
- Security invariants: Threat-model the feature. Prefer deny-by-default, least privilege, and signed/verified inputs for cross-service calls.
Test it: Walk through a failure scenario. Can you diagnose the issue from logs and metrics alone? Can you disable the feature without deploying?
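The idempotency bullet above can be made concrete. This is a minimal single-process sketch (IdempotencyCache is an invented name), assuming in-memory state; a real service would persist keys and handle concurrency, but the key ideas survive: scope the key to user and operation, and bound retention in time.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant

// Sketch: idempotency keys scoped per user+operation, retained for a bounded
// window. A duplicate key inside the window returns the cached result.
class IdempotencyCache<R>(
    private val retention: Duration = Duration.ofHours(24),
    private val clock: Clock = Clock.systemUTC(),
) {
    private data class Entry<T>(val result: T, val storedAt: Instant)
    private val entries = mutableMapOf<String, Entry<R>>()

    fun run(userId: String, operation: String, key: String, body: () -> R): R {
        // Scoping prevents collisions across users and across operations.
        val scopedKey = "$userId:$operation:$key"
        val now = clock.instant()
        entries[scopedKey]?.let { cached ->
            if (Duration.between(cached.storedAt, now) < retention) return cached.result
        }
        val result = body()
        entries[scopedKey] = Entry(result, now)
        return result
    }
}
```

Note that the answers to the scoping and retention questions are now explicit in the code, not tribal knowledge.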
A Practical Scorecard
Before you start coding, score your design on these nine items (0-2 each):
- One-page story: Can it be explained on one page? (0 = no, 1 = barely, 2 = clearly)
- Depth check: Does each module hide more than it exposes? (0 = shallow everywhere, 1 = mixed, 2 = consistently deep)
- Misuse resistance: Is it hard to do the wrong thing? (0 = easy to misuse, 1 = requires care, 2 = wrong things impossible)
- Simplicity: Are concepts untangled? (0 = tangled, 1 = mostly separate, 2 = cleanly separated)
- Information hiding: Are volatile decisions confined behind interfaces? (0 = leaked, 1 = partially hidden, 2 = well-hidden)
- Evolution path: Can you change behavior without breaking users? (0 = breaking, 1 = versioned, 2 = additive only)
- Reversibility: Can you undo or disable the change fast? (0 = requires deploy, 1 = requires restart, 2 = feature flag)
- Runtime clarity: Are failures diagnosable from logs/metrics/traces? (0 = opaque, 1 = partial, 2 = clear)
- Blast radius: If a requirement changes, what % of files/modules move? (0 = >30%, 1 = 10-30%, 2 = <10%)
Scoring ranges:
- 0–9: Redesign before coding
- 10–13: Proceed behind a feature flag; address weak items in the first iteration
- 14–18: Ship with confidence
Measuring blast radius: Implement a small, plausible change on a throwaway branch and count the touched files and modules. Track this number over time as a quality metric.
Worked Example: Good Score (16/18)
Feature: User authentication service
- One-page story: Yes, three core concepts (identity, session, permission) with clear examples. 2
- Depth: AuthService interface has 4 methods, hides 12 internal decisions (token format, crypto, session store, revocation). 2
- Misuse: Returns AuthResult sealed type; invalid tokens can't be constructed. 2
- Simplicity: Identity, session, and permission are separate value types. 2
- Information hiding: Token format and crypto are fully internal. 2
- Evolution: All methods accept an optional version param for future changes. 2
- Reversibility: Feature flag per auth provider. 2
- Runtime clarity: Emits auth_attempts, auth_failures_by_reason, and traces with user context (hashed). 2
- Blast radius: Adding OAuth was expected to touch only the AuthService impl and config, but also needed the gateway and 3 API endpoints. 0
Total: 16/18. Ship it, but document the gateway coupling for next time.
Worked Example: Bad Score (3/18)
Feature: Email notification system
- One-page story: Took two pages, unclear boundary with logging system. 0
- Depth: 15 public methods, minimal internal hiding. 0
- Misuse: Takes raw HTML strings, easy to inject malicious content or forget escaping. 0
- Simplicity: Email, logging, and formatting are tangled in one class. 0
- Information hiding: SMTP config leaked to all callers. 0
- Evolution: Hard-coded SMTP provider in 20 call sites. 0
- Reversibility: Requires full deploy to change providers. 1
- Runtime clarity: Good metrics on send rates, failures logged with context. 2
- Blast radius: Adding SMS required touching 18 of 20 call sites. 0
Total: 3/18. Stop; redesign before shipping. Separate concerns, hide the provider, and use typed templates.
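What "separate concerns, hide the provider, and use typed templates" could look like, as a hypothetical sketch (EmailTemplate, render, and EmailSender are invented names): templates are a closed set of types rather than raw HTML strings, escaping happens in exactly one place, and the SMTP provider hides behind one narrow interface.

```kotlin
// Typed templates: callers can't hand us raw HTML, so injection and
// forgotten escaping become impossible rather than merely discouraged.
sealed interface EmailTemplate {
    data class Welcome(val userName: String) : EmailTemplate
    data class PasswordReset(val resetToken: String) : EmailTemplate
}

// Rendering is a separate concern from sending; escaping lives here and only here.
fun render(template: EmailTemplate): String = when (template) {
    is EmailTemplate.Welcome -> "Welcome, ${escape(template.userName)}!"
    is EmailTemplate.PasswordReset -> "Reset link: ${escape(template.resetToken)}"
}

private fun escape(raw: String): String =
    raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

// The provider is a hidden decision behind this seam: one place to change,
// one place to put a feature flag.
interface EmailSender {
    fun send(to: String, template: EmailTemplate)
}
```

With this shape, "adding SMS" becomes a second sender implementation and a second template renderer, not a sweep across 18 call sites.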
A Kotlin Example: From Bad to Good
Instead of:
// Shallow: wide surface, primitive obsession
class Payments(private val http: HttpClient) {
    fun charge(userId: String, amount: Double, currency: String): Boolean = TODO("calls http directly")
    fun refund(userId: String, amount: Double, currency: String): Boolean = TODO("calls http directly")
}
Problems:
- String-typed parameters and Double rounding issues
- Boolean result loses information (why did it fail?)
- No idempotency support
- Easy to accidentally swap userId and currency parameters
- Impossible to test without making real HTTP calls
Do this:
import java.math.BigDecimal
import java.math.RoundingMode

@JvmInline value class UserId(val value: String)
@JvmInline value class ChargeId(val value: String)
@JvmInline value class RefundId(val value: String)

enum class Currency { USD, EUR, GBP }

data class Money private constructor(
    val amount: BigDecimal,
    val currency: Currency
) {
    companion object {
        // Simplified: these three currencies all use 2 minor units.
        private fun minorUnits(c: Currency) = 2
        private fun rounding(c: Currency) = RoundingMode.HALF_EVEN

        fun of(amount: BigDecimal, currency: Currency): Money =
            Money(amount.setScale(minorUnits(currency), rounding(currency)), currency)
    }

    fun compareToSameCurrency(other: Money): Int {
        require(currency == other.currency) { "Currency mismatch" }
        return amount.compareTo(other.amount)
    }
}

@JvmInline value class NonNegativeMoney private constructor(val value: Money) {
    companion object {
        fun of(amount: BigDecimal, currency: Currency): NonNegativeMoney {
            require(amount >= BigDecimal.ZERO) { "Amount must be non-negative" }
            return NonNegativeMoney(Money.of(amount, currency))
        }
    }
}

sealed interface ChargeResult {
    data class Success(val chargeId: ChargeId) : ChargeResult
    sealed interface Failure : ChargeResult {
        data object InsufficientFunds : Failure
        data object InvalidAmount : Failure
        data object RateLimited : Failure
    }
}

sealed interface RefundResult {
    data class Success(val refundId: RefundId) : RefundResult
    sealed interface Failure : RefundResult {
        data object ChargeNotFound : Failure
        data object AlreadyRefunded : Failure
        data object ExceedsChargeAmount : Failure
    }
}

interface Payments {
    /**
     * Charge a user's payment method.
     *
     * @param idempotencyKey Scoped per user+operation. Retained for 24 hours.
     *   Duplicate keys within the retention window return the cached result.
     */
    fun charge(
        user: UserId,
        amount: NonNegativeMoney,
        idempotencyKey: String
    ): ChargeResult

    /**
     * Refund a previous charge, fully or partially.
     *
     * @param idempotencyKey Scoped per charge. Retained for 24 hours.
     */
    fun refund(
        charge: ChargeId,
        amount: NonNegativeMoney,
        idempotencyKey: String
    ): RefundResult
}
Why it's better:
- Depth: Interface has only 2 methods but hides HTTP details, retry logic, FX conversion, scale/rounding rules, and duplicate detection. High functionality-to-interface ratio.
- Hard to misuse: Currency comparison is validated at runtime with clear errors (no silent bugs). Can't create negative money for charges/refunds. Can't forget to handle failures. Can't swap parameters (all have different types).
- Simple: Money, charges, and refunds are separate concepts. Currency math is internal. Signed Money is available for internal accounting; non-negative amounts are enforced at API boundaries.
- Testable: Implement Payments with in-memory state for tests; swap in the HTTP version for production.
- Observable: Return types tell you exactly what went wrong; you can log and create metrics at call sites based on the sealed type.
- Idempotent: Explicit idempotency keys (scoped and time-bounded) make retries safe.
Property test you could write:
@Test
fun `total refunded never exceeds charge amount`() = forAll { charge: NonNegativeMoney ->
    // InMemoryPayments, testUser, and randomKey() are test helpers.
    val payments = InMemoryPayments()
    val result = payments.charge(testUser, charge, randomKey())
    (result as? ChargeResult.Success)?.let { success ->
        val refund1 = payments.refund(success.chargeId, charge, randomKey())
        val refund2 = payments.refund(
            success.chargeId,
            NonNegativeMoney.of(BigDecimal.ONE, charge.value.currency),
            randomKey()
        )
        refund1 is RefundResult.Success &&
            refund2 is RefundResult.Failure.AlreadyRefunded
    } ?: true
}
Case Study: Refactoring a Metrics Pipeline
Context
We had a metrics collection service used by 13 teams. Each team posted JSON blobs with arbitrary structure. The service routed metrics to different backends (Datadog, CloudWatch, Prometheus) based on configuration files scattered across repositories.
Bad Outcome
- Adding a new backend required touching 15 configuration files
- Teams regularly sent malformed JSON that crashed workers
- No way to validate metrics locally before deployment
- Incidents took 2+ hours to diagnose (which team? which metric? which backend?)
- Mean time to add a metric: 3 days (PR + configuration + deployment + verification)
Over Q1, we tracked 10 incidents with MTTR averaging 2h07m.
The Redesign
We applied the five anchors:
1. Conceptual integrity: One core idea "type-safe metrics with declarative routing." We wrote it on one page, shared it with 5 teams, and revised it based on feedback.
2. Deep module:
- Before: 8 public configuration formats with a thin routing layer
- After: 1 public interface (MetricSchema) that hides routing, backend translation, validation, and retry logic
- Depth ratio improved from ~1:1 to ~8:1
3. Misuse resistance:
sealed class MetricValue {
    data class Counter(val value: Long) : MetricValue()
    data class Gauge(val value: Double) : MetricValue()
    data class Histogram(val values: List<Double>) : MetricValue()
}

data class MetricSchema(
    val name: String,
    val tags: Map<String, String>,
    val type: MetricValue
) {
    init {
        require(name.matches(Regex("[a-z0-9._]+"))) { "Invalid metric name" }
        require(tags.all { it.key.matches(Regex("[a-z0-9_]+")) }) { "Invalid tag key" }
    }
}
Now malformed metrics fail at compile time or during construction, not in production.
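A caller sees the construction-time check like this (the schema definitions are repeated here, slightly trimmed, so the snippet stands alone; the metric names are illustrative):

```kotlin
sealed class MetricValue {
    data class Counter(val value: Long) : MetricValue()
    data class Gauge(val value: Double) : MetricValue()
}

data class MetricSchema(
    val name: String,
    val tags: Map<String, String>,
    val type: MetricValue
) {
    init {
        require(name.matches(Regex("[a-z0-9._]+"))) { "Invalid metric name" }
        require(tags.all { it.key.matches(Regex("[a-z0-9_]+")) }) { "Invalid tag key" }
    }
}

// Valid schema: constructs fine.
val ok = MetricSchema("checkout.latency_ms", mapOf("team" to "payments"), MetricValue.Gauge(12.5))

// Invalid name: require(...) throws at construction, long before production.
val bad = runCatching {
    MetricSchema("Checkout Latency!", emptyMap(), MetricValue.Counter(1))
}
```

The failing case surfaces in the producing team's build or tests, with a message naming the bad field, instead of crashing a shared worker at 3 a.m.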
4. Simplicity: We separated three previously tangled concerns:
- Schema definition (types)
- Routing policy (which metrics → which backends)
- Backend protocol (how to translate)
Each lives in its own module with clear interfaces.
5. Operational taste:
- Added validation endpoint: POST schema, get validation result before deployment
- Feature flags per backend: can disable Datadog without redeploying
- Structured logs with team, metric_name, backend, error_type tags (PII-free)
- Emitted metrics_published, metrics_failed_by_reason, and backend_latency_p99 metrics
- Rollout plan: dark launch with 5% traffic for 1 week, ramped to 100% over 2 weeks, with kill switch in admin panel
Measurable Results
After rollout in Q2 (measured over 8 incidents):
- MTTR: 2h07m → 15m (structured logs made root cause obvious)
- Mean time to add metric: 3 days → 30 minutes (type-safe schemas, no configuration files)
- Malformed metric incidents: 2-3/week → 0 (caught at compile time)
- Backend changes: 15 files → 1 file (deep module design)
- Team satisfaction: 6.2 → 8.7 on internal survey
Cost: 2 engineers for 3 weeks. We achieved payback within 2 months based on incident reduction alone.
Building Taste Through Practice
Goedecke is right that taste develops through varied experience. But you can accelerate this development with deliberate practice:
1. Write One-Pagers First
Every single time. Include goals, non-goals, core concepts, and three usage examples. Get feedback before you code. If you can't fill one page clearly, you don't understand the problem yet.
2. Do Depth Walks
Pick a module and list what it exposes versus what it hides. Quantify the ratio: non-test lines of code behind an interface divided by public surface size. If you're not satisfied with the ratio, refactor until you are. Aim for at least 3:1, ideally 5:1 or higher.
3. Run Misuse Audits
Try to break your API in three different ways. Use static analysis, type-safe builders, and compile-time checks. If you can misuse the API, fix it so misuse becomes impossible.
4. Rehearse Changes
Pick a plausible future requirement. Create a throwaway branch and implement it. Count the files and modules touched. If it's more than 20%, you have shallow abstraction boundaries. Refactor to isolate the volatility.
5. Run Design Postmortems
After incidents or rewrites, add a standard section: "Which abstraction was too shallow? Which decision leaked?" Don't blame people, blame design choices that amplified complexity.
Where Bad Taste Comes From
Most poor taste stems from inflexibility. I've worked with engineers who evangelized formal methods for an internal dashboard, or insisted on microservices for a tool with three users. Their values didn't match the problem context.
The warning sign is "best practice" language. No engineering decision is a best practice in all contexts. If someone tells you "we should always do X," they're revealing that their taste has crystallized around one narrow set of values.
Interestingly, this means engineers with poor taste can be quite effective in narrow contexts where their preferences happen to fit. It's only when the project shifts or they move to a new domain that problems emerge.
Break glass for performance: Sometimes a shallow, specialized path is justified for performance-critical code. When you do this, measure the impact with microbenchmarks, document the reasoning, and fence it behind a narrow API. Make the escape hatch explicit.
Legacy reality: Many readers work in messy systems. You can introduce depth incrementally: pick one seam, wrap it with a clean interface, hide one volatile decision behind it, then measure whether fewer call sites change when the next requirement arrives. Repeat.
Team-Level Taste
Taste isn't limited to code. It also lives in review norms, documentation, and team interfaces.
Make Taste Artifacts Shared
- One-pager templates with examples in the team wiki
- Review checklists for misuse and depth in PR templates
- Design postmortem template with "which design choice amplified the problem?"
Track Design Debt Like Technical Debt
- Create tickets for shallow modules, leaked decisions, and missing feature flags
- Make the depth ratio visible in your team dashboard
- Budget time for depth refactoring, not just new features
Codify Taste in Guidelines
- "Public APIs are permanent; prefer additive changes"
- "Schemas and events are APIs; apply Hyrum's Law"
- "One concept per module; if it has 'and' in the name, split it"
How to Review for Taste
In code reviews and design documents, ask these questions in this order:
- What's the single idea a user must learn? If there are two ideas, split them. (Conceptual integrity)
- What does this module hide? If not much, deepen it. (Depth)
- How could a caller misuse this? Make misuse impossible. (API discipline)
- If we had 10x the users, what behavior would we be stuck with? Are we comfortable with that? (Hyrum's Law)
- Can a new teammate grasp this from a page and a few examples? If not, fix the narrative. (Readability)
- What happens when dependency X fails? Do we have a graceful degradation path? (Operational taste)
- Can we disable this without deploying? If not, why not? (Reversibility)
- What telemetry will tell us this is failing? Do we have the right metrics and traces?
- What's the kill switch? Can we roll back or disable quickly if things go wrong?
The Bottom Line
Sean's article makes the crucial point that taste is about selecting values, not following rigid rules. I want to add that taste is also testable, learnable, and has measurable economic consequences.
Use conceptual integrity as your compass, deep modules for leverage, API discipline for safety, simplicity as your default bias, and operational excellence for production resilience. Do this consistently and people will say you have good taste. But really, you'll just be practicing disciplined engineering.
The programs we write are meant to be read by people, then executed by machines, in that order. Good taste is what makes the first part possible while keeping the second part running smoothly.
Further Reading
- Brooks, Fred. The Mythical Man-Month (1975). Chapter on conceptual integrity.
- Parnas, David. "On the Criteria to Be Used in Decomposing Systems into Modules" (1972). The original argument for information hiding.
- Ousterhout, John. A Philosophy of Software Design (2018). Deep vs shallow modules, the functionality/interface metric.
- Bloch, Joshua. "How to Design a Good API and Why It Matters" (2006). API discipline, public APIs are forever.
- Hickey, Rich. "Simple Made Easy" (2011). Separating simple from easy.
- Bezos, Jeff. Amazon shareholder letters on Type-1 vs Type-2 decisions.
- Hyrum's Law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

