I recently read Sean Goedecke's excellent piece on taste in software engineering, and it struck a chord. He's right that taste is about selecting the right engineering values for your project, not rigid adherence to "best practices." But I want to push further: taste isn't mysterious or unteachable. It's a skill you can deliberately develop.
What Taste Actually Means
Here's my working definition: taste is disciplined judgment about tradeoffs under uncertainty that reduces long-term cost and risk. Good taste identifies the right tradeoffs for the problem, then executes them effectively.
Good taste produces systems with a single unifying idea, deep modules, misuse-resistant APIs, and simple composition. These qualities are testable. Run these checks before you code:
- Conceptual integrity test: Can you explain it on one page?
- Depth test: Does each module hide more than it exposes?
- Misuse test: Can you make it do the wrong thing?
- Simplicity test: Are concepts untangled?
- Observability plan: Are key invariants measurable?
- Reversibility: Can you disable it without a deploy?
Taste as Economics
Good taste has a clear business case: it keeps the cost of change low and preserves option value. Jeff Bezos's Type-1 vs Type-2 decision framework applies directly here. Type-1 decisions are hard to reverse: database schemas, public APIs, data formats. Type-2 decisions are easy to undo: internal implementations, feature flags, tool choices.
Good taste means:
- Delaying Type-1 commitments until you have sufficient information
- Making Type-2 decisions reversible by default
- Keeping cheap options open (feature flags, abstraction seams, versioned APIs)
- Measuring the cost of change (how many files are touched per requirement?)
Delay Type-1 decisions until you've observed real usage patterns. Example test: "We won't version the public event schema until we've seen 3 external consumers and 2 months of stability."
When engineers with poor taste push for microservices on day one, they're making an expensive Type-1 decision with incomplete information. When engineers with good taste wrap a vendor API behind an internal interface, they're preserving optionality at low cost.
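Wrapping a vendor API can be sketched in a few lines. This is a hypothetical illustration (EmailPort, VendorEmailAdapter, and FakeEmailPort are invented names, not from any real SDK): the rest of the codebase depends only on the narrow internal port, so the vendor choice stays a cheap Type-2 decision.

```kotlin
// The only surface the rest of the codebase sees.
interface EmailPort {
    fun send(to: String, subject: String, body: String): Boolean
}

// The vendor is a hidden, reversible decision: swapping providers touches one file.
class VendorEmailAdapter(/* vendor client would be injected here */) : EmailPort {
    override fun send(to: String, subject: String, body: String): Boolean {
        // translate to the vendor's SDK calls here
        return true
    }
}

// A side benefit of the seam: tests never touch the network.
class FakeEmailPort : EmailPort {
    val sent = mutableListOf<Triple<String, String, String>>()
    override fun send(to: String, subject: String, body: String): Boolean {
        sent += Triple(to, subject, body)
        return true
    }
}
```

The fake also doubles as documentation of the port's contract: anything a caller can observe through `EmailPort` is everything it is allowed to depend on.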
Critical Foundation: Schemas and Events Are APIs
Schemas and events are public APIs; apply Hyrum's Law. JSON fields, Kafka topics, database columns, and protobuf messages all accumulate dependencies. Once external systems consume them, every observable behavior becomes a contract. Design them with the same discipline you'd use for code APIs: minimize the surface area, version changes carefully, and maintain compatibility.
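One compatible way to evolve such a schema is purely additive change. A hypothetical sketch (OrderPlaced and its fields are invented for illustration): the new field gets a default, so producers and consumers that predate it keep working without coordination.

```kotlin
// Hypothetical event payload. v1 consumers know only orderId and amountCents.
data class OrderPlaced(
    val orderId: String,
    val amountCents: Long,
    // Additive change: a new optional field with a default, so existing
    // producers and consumers keep working without a coordinated migration.
    val couponCode: String? = null,
)

// A tolerant reader copes with a missing optional field instead of failing.
fun describe(event: OrderPlaced): String = buildString {
    append("order ${event.orderId}: ${event.amountCents} cents")
    event.couponCode?.let { append(" (coupon $it)") }
}
```

Code written against v1 still compiles and runs unchanged; the new field only matters to consumers that opt in.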
The Five Anchors
1. Conceptual Integrity
Fred Brooks called conceptual integrity "the most important consideration in system design." A system built around one coherent set of ideas is easier to use, extend, and debug. This means saying no to features that don't fit the core concept, even when they're individually "good ideas."
Test it: Write a one-page design brief before you code. Include goals, non-goals, core concepts, and three usage examples. Share it with someone unfamiliar with your project. If they're confused, your design lacks clarity.
2. Deep Modules with Shallow Interfaces
John Ousterhout provides a useful metric: maximize functionality / interface complexity. Great modules hide substantial internal work behind a small, stable surface. This reduces cognitive load and limits the blast radius when things change.
Test it: For any module, list your public types and methods. Then list the significant internal decisions that module handles. If the first list is longer than the second, you have a shallow module. Either push decisions inward or simplify the interface.
I once worked on a codebase where unnecessary 5-layer abstractions increased our average PR touch count from 4 files to 13. No layer performed enough work to justify its existence.
3. API Discipline
Joshua Bloch's advice: good APIs are easy to learn, easy to use without documentation, and hard to misuse. Add Hyrum's Law (users will depend on every observable behavior) and you get a clear imperative: minimize what's observable and treat public behavior as permanent.
Public behavior is permanent. Prefer additive changes; version breaking changes carefully. Establish a clear deprecation policy, for example, 2 releases plus telemetry proving near-zero usage before removal. Use compatibility layers for schemas and events to bridge old and new consumers.
Test it: Before implementing, write code that uses your API. Can you make it do the wrong thing? Can you accidentally create an invalid state? If yes, redesign so these mistakes are impossible or caught at compile time.
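One way to pass this test is to make the wrong thing unrepresentable in the type system. A hypothetical sketch (OpenConnection, ClosedConnection, and withConnection are invented names): instead of a `closed` flag that callers can forget to check, the closed state simply has no read method, so misuse fails at compile time.

```kotlin
// Reading is only expressible on an open connection.
class OpenConnection internal constructor(private val lines: Iterator<String>) {
    fun readLine(): String? = if (lines.hasNext()) lines.next() else null
    fun close(): ClosedConnection = ClosedConnection
}

// Has no readLine(); calling it on a closed connection is a compile error,
// not a runtime exception.
object ClosedConnection

// Scoped access guarantees the connection is closed exactly once.
fun <T> withConnection(data: List<String>, block: (OpenConnection) -> T): T {
    val conn = OpenConnection(data.iterator())
    return try {
        block(conn)
    } finally {
        conn.close()
    }
}
```

The same pattern generalizes to builders that won't compile until required fields are set, and to state machines where each state exposes only its legal transitions.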
4. Simplicity Over Convenience
Rich Hickey draws a crucial distinction: simple means untangled concepts, while easy means familiar or convenient. We should favor simplicity first. Convenience can follow.
This approach is counterintuitive because convenient features feel productive in the moment. But tangled concepts compound over time. Every shortcut you take today becomes a design decision you must defend tomorrow.
Test it: When a design makes two concepts interdependent, ask how to separate them. If you can't separate them, you've found a seam where complexity will accumulate. If removing a helper function increases clarity, remove it.
5. Operational Taste
Good taste isn't limited to design; it's also operational. How does your code behave in production? Can you diagnose failures quickly? Can you roll back safely?
Operational taste shows up in:
- Observability: Are key invariants measurable? Do you emit counters, histograms, and traces at decision points?
- Failure modes: What happens under partial failure or slow dependencies? Do you have backpressure and timeouts?
- Idempotency: Can operations be safely retried? What's the scope of idempotency keys (per user? per operation?) and their retention period?
- Reversibility: Is there a feature flag or rollback path?
- PII hygiene: Structured logs and traces must exclude secrets and user identifiers unless hashed or tokenized. Redaction is part of good taste.
- Security invariants: Threat-model the feature. Prefer deny-by-default, least privilege, and signed/verified inputs for cross-service calls.
Test it: Walk through a failure scenario. Can you diagnose the issue from logs and metrics alone? Can you disable the feature without deploying?
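The idempotency bullet above can be made concrete. This is a minimal single-process sketch (IdempotencyCache is an invented name), assuming in-memory state; a real service would persist keys and handle concurrency, but the key ideas survive: scope the key to user and operation, and bound retention in time.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant

// Sketch: idempotency keys scoped per user+operation, retained for a bounded
// window. A duplicate key inside the window returns the cached result.
class IdempotencyCache<R>(
    private val retention: Duration = Duration.ofHours(24),
    private val clock: Clock = Clock.systemUTC(),
) {
    private data class Entry<T>(val result: T, val storedAt: Instant)
    private val entries = mutableMapOf<String, Entry<R>>()

    fun run(userId: String, operation: String, key: String, body: () -> R): R {
        // Scoping prevents collisions across users and across operations.
        val scopedKey = "$userId:$operation:$key"
        val now = clock.instant()
        entries[scopedKey]?.let { cached ->
            if (Duration.between(cached.storedAt, now) < retention) return cached.result
        }
        val result = body()
        entries[scopedKey] = Entry(result, now)
        return result
    }
}
```

Note that the answers to the scoping and retention questions are now explicit in the code, not tribal knowledge.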
A Practical Scorecard
Before you start coding, score your design on these nine items (0-2 each):
- One-page story: Can it be explained on one page? (0 = no, 1 = barely, 2 = clearly)
- Depth check: Does each module hide more than it exposes? (0 = shallow everywhere, 1 = mixed, 2 = consistently deep)
- Misuse resistance: Is it hard to do the wrong thing? (0 = easy to misuse, 1 = requires care, 2 = wrong things impossible)
- Simplicity: Are concepts untangled? (0 = tangled, 1 = mostly separate, 2 = cleanly separated)
- Information hiding: Are volatile decisions confined behind interfaces? (0 = leaked, 1 = partially hidden, 2 = well-hidden)
- Evolution path: Can you change behavior without breaking users? (0 = breaking, 1 = versioned, 2 = additive only)
- Reversibility: Can you undo or disable the change fast? (0 = requires deploy, 1 = requires restart, 2 = feature flag)
- Runtime clarity: Are failures diagnosable from logs/metrics/traces? (0 = opaque, 1 = partial, 2 = clear)
- Blast radius: If a requirement changes, what % of files/modules move? (0 = >30%, 1 = 10-30%, 2 = <10%)
Scoring ranges:
- 0–9: Redesign before coding
- 10–13: Proceed behind a feature flag; address weak items in the first iteration
- 14–18: Ship with confidence
Measuring blast radius: Implement a small, plausible change on a throwaway branch and count the touched files and modules. Track this number over time as a quality metric.
Worked Example: Good Score (16/18)
Feature: User authentication service
- One-page story: Yes, three core concepts (identity, session, permission) with clear examples. 2
- Depth: AuthService interface has 4 methods, hides 12 internal decisions (token format, crypto, session store, revocation). 2
- Misuse: Returns AuthResult sealed type; invalid tokens can't be constructed. 2
- Simplicity: Identity, session, and permission are separate value types. 2
- Information hiding: Token format and crypto are fully internal. 2
- Evolution: All methods accept an optional version param for future changes. 2
- Reversibility: Feature flag per auth provider. 2
- Runtime clarity: Emits auth_attempts, auth_failures_by_reason, and traces with user context (hashed). 2
- Blast radius: Adding OAuth was expected to touch only the AuthService impl and config, but also needed the gateway and 3 API endpoints. 0
Total: 16/18. Ship it, but document the gateway coupling for next time.
Worked Example: Bad Score (3/18)
Feature: Email notification system
- One-page story: Took two pages, unclear boundary with logging system. 0
- Depth: 15 public methods, minimal internal hiding. 0
- Misuse: Takes raw HTML strings, easy to inject malicious content or forget escaping. 0
- Simplicity: Email, logging, and formatting are tangled in one class. 0
- Information hiding: SMTP config leaked to all callers. 0
- Evolution: Hard-coded SMTP provider in 20 call sites. 0
- Reversibility: Requires full deploy to change providers. 1
- Runtime clarity: Good metrics on send rates, failures logged with context. 2
- Blast radius: Adding SMS required touching 18 of 20 call sites. 0
Total: 3/18. Stop; redesign before shipping. Separate concerns, hide the provider, and use typed templates.
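What "separate concerns, hide the provider, and use typed templates" could look like, as a hypothetical sketch (EmailTemplate, render, and EmailSender are invented names): templates are a closed set of types rather than raw HTML strings, escaping happens in exactly one place, and the SMTP provider hides behind one narrow interface.

```kotlin
// Typed templates: callers can't hand us raw HTML, so injection and
// forgotten escaping become impossible rather than merely discouraged.
sealed interface EmailTemplate {
    data class Welcome(val userName: String) : EmailTemplate
    data class PasswordReset(val resetToken: String) : EmailTemplate
}

// Rendering is a separate concern from sending; escaping lives here and only here.
fun render(template: EmailTemplate): String = when (template) {
    is EmailTemplate.Welcome -> "Welcome, ${escape(template.userName)}!"
    is EmailTemplate.PasswordReset -> "Reset link: ${escape(template.resetToken)}"
}

private fun escape(raw: String): String =
    raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

// The provider is a hidden decision behind this seam: one place to change,
// one place to put a feature flag.
interface EmailSender {
    fun send(to: String, template: EmailTemplate)
}
```

With this shape, "adding SMS" becomes a second sender implementation and a second template renderer, not a sweep across 18 call sites.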
A Kotlin Example: From Bad to Good
Instead of:
// Shallow: wide surface, primitive obsession
class Payments(private val http: HttpClient) {
    fun charge(userId: String, amount: Double, currency: String): Boolean = TODO("calls http directly")
    fun refund(userId: String, amount: Double, currency: String): Boolean = TODO("calls http directly")
}
Problems:
- String-typed parameters and Double rounding issues
- Boolean result loses information (why did it fail?)
- No idempotency support
- Easy to accidentally swap userId and currency parameters
- Impossible to test without making real HTTP calls
Do this:
import java.math.BigDecimal
import java.math.RoundingMode

@JvmInline value class UserId(val value: String)
@JvmInline value class ChargeId(val value: String)
@JvmInline value class RefundId(val value: String)

enum class Currency { USD, EUR, GBP }

data class Money private constructor(
    val amount: BigDecimal,
    val currency: Currency
) {
    companion object {
        // Simplified: these three currencies all use 2 minor units.
        private fun minorUnits(c: Currency) = 2
        private fun rounding(c: Currency) = RoundingMode.HALF_EVEN

        fun of(amount: BigDecimal, currency: Currency): Money =
            Money(amount.setScale(minorUnits(currency), rounding(currency)), currency)
    }

    fun compareToSameCurrency(other: Money): Int {
        require(currency == other.currency) { "Currency mismatch" }
        return amount.compareTo(other.amount)
    }
}

@JvmInline value class NonNegativeMoney private constructor(val value: Money) {
    companion object {
        fun of(amount: BigDecimal, currency: Currency): NonNegativeMoney {
            require(amount >= BigDecimal.ZERO) { "Amount must be non-negative" }
            return NonNegativeMoney(Money.of(amount, currency))
        }
    }
}

sealed interface ChargeResult {
    data class Success(val chargeId: ChargeId) : ChargeResult
    sealed interface Failure : ChargeResult {
        data object InsufficientFunds : Failure
        data object InvalidAmount : Failure
        data object RateLimited : Failure
    }
}

sealed interface RefundResult {
    data class Success(val refundId: RefundId) : RefundResult
    sealed interface Failure : RefundResult {
        data object ChargeNotFound : Failure
        data object AlreadyRefunded : Failure
        data object ExceedsChargeAmount : Failure
    }
}

interface Payments {
    /**
     * Charge a user's payment method.
     *
     * @param idempotencyKey Scoped per user+operation. Retained for 24 hours.
     *   Duplicate keys within the retention window return the cached result.
     */
    fun charge(
        user: UserId,
        amount: NonNegativeMoney,
        idempotencyKey: String
    ): ChargeResult

    /**
     * Refund a previous charge, fully or partially.
     *
     * @param idempotencyKey Scoped per charge. Retained for 24 hours.
     */
    fun refund(
        charge: ChargeId,
        amount: NonNegativeMoney,
        idempotencyKey: String
    ): RefundResult
}
Why it's better:
- Depth: Interface has only 2 methods but hides HTTP details, retry logic, FX conversion, scale/rounding rules, and duplicate detection. High functionality-to-interface ratio.
- Hard to misuse: Currency comparison is validated at runtime with clear errors (no silent bugs). Can't create negative money for charges/refunds. Can't forget to handle failures. Can't swap parameters (all have different types).
- Simple: Money, charges, and refunds are separate concepts. Currency math is internal. Signed Money is available for internal accounting; non-negative amounts are enforced at API boundaries.
- Testable: Implement Payments with in-memory state for tests; swap in the HTTP version for production.
- Observable: Return types tell you exactly what went wrong; you can log and create metrics at call sites based on the sealed type.
- Idempotent: Explicit idempotency keys (scoped and time-bounded) make retries safe.
Property test you could write:
@Test
fun `total refunded never exceeds charge amount`() = forAll { charge: NonNegativeMoney ->
    // InMemoryPayments, testUser, and randomKey() are test helpers.
    val payments = InMemoryPayments()
    val result = payments.charge(testUser, charge, randomKey())
    (result as? ChargeResult.Success)?.let { success ->
        val refund1 = payments.refund(success.chargeId, charge, randomKey())
        val refund2 = payments.refund(
            success.chargeId,
            NonNegativeMoney.of(BigDecimal.ONE, charge.value.currency),
            randomKey()
        )
        refund1 is RefundResult.Success &&
            refund2 is RefundResult.Failure.AlreadyRefunded
    } ?: true
}
Case Study: Refactoring a Metrics Pipeline
Context
We had a metrics collection service used by 13 teams. Each team posted JSON blobs with arbitrary structure. The service routed metrics to different backends (Datadog, CloudWatch, Prometheus) based on configuration files scattered across repositories.
Bad Outcome
- Adding a new backend required touching 15 configuration files
- Teams regularly sent malformed JSON that crashed workers
- No way to validate metrics locally before deployment
- Incidents took 2+ hours to diagnose (which team? which metric? which backend?)
- Mean time to add a metric: 3 days (PR + configuration + deployment + verification)
Over Q1, we tracked 10 incidents with MTTR averaging 2h07m.
The Redesign
We applied the five anchors:
1. Conceptual integrity: One core idea "type-safe metrics with declarative routing." We wrote it on one page, shared it with 5 teams, and revised it based on feedback.
2. Deep module:
- Before: 8 public configuration formats with a thin routing layer
- After: 1 public interface (MetricSchema) that hides routing, backend translation, validation, and retry logic
- Depth ratio improved from ~1:1 to ~8:1
3. Misuse resistance:
sealed class MetricValue {
    data class Counter(val value: Long) : MetricValue()
    data class Gauge(val value: Double) : MetricValue()
    data class Histogram(val values: List<Double>) : MetricValue()
}

data class MetricSchema(
    val name: String,
    val tags: Map<String, String>,
    val type: MetricValue
) {
    init {
        require(name.matches(Regex("[a-z0-9._]+"))) { "Invalid metric name" }
        require(tags.all { it.key.matches(Regex("[a-z0-9_]+")) }) { "Invalid tag key" }
    }
}
Now malformed metrics fail at compile time or during construction, not in production.
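A caller sees the construction-time check like this (the schema definitions are repeated here, slightly trimmed, so the snippet stands alone; the metric names are illustrative):

```kotlin
sealed class MetricValue {
    data class Counter(val value: Long) : MetricValue()
    data class Gauge(val value: Double) : MetricValue()
}

data class MetricSchema(
    val name: String,
    val tags: Map<String, String>,
    val type: MetricValue
) {
    init {
        require(name.matches(Regex("[a-z0-9._]+"))) { "Invalid metric name" }
        require(tags.all { it.key.matches(Regex("[a-z0-9_]+")) }) { "Invalid tag key" }
    }
}

// Valid schema: constructs fine.
val ok = MetricSchema("checkout.latency_ms", mapOf("team" to "payments"), MetricValue.Gauge(12.5))

// Invalid name: require(...) throws at construction, long before production.
val bad = runCatching {
    MetricSchema("Checkout Latency!", emptyMap(), MetricValue.Counter(1))
}
```

The failing case surfaces in the producing team's build or tests, with a message naming the bad field, instead of crashing a shared worker at 3 a.m.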
4. Simplicity: We separated three previously tangled concerns:
- Schema definition (types)
- Routing policy (which metrics → which backends)
- Backend protocol (how to translate)
Each lives in its own module with clear interfaces.
5. Operational taste:
- Added validation endpoint: POST schema, get validation result before deployment
- Feature flags per backend: can disable Datadog without redeploying
- Structured logs with team, metric_name, backend, error_type tags (PII-free)
- Emitted metrics_published, metrics_failed_by_reason, and backend_latency_p99 metrics
- Rollout plan: dark launch with 5% traffic for 1 week, ramped to 100% over 2 weeks, with kill switch in admin panel
Measurable Results
After rollout in Q2 (measured over 8 incidents):
- MTTR: 2h07m → 15m (structured logs made root cause obvious)
- Mean time to add metric: 3 days → 30 minutes (type-safe schemas, no configuration files)
- Malformed metric incidents: 2-3/week → 0 (caught at compile time)
- Backend changes: 15 files → 1 file (deep module design)
- Team satisfaction: 6.2 → 8.7 on internal survey
Cost: 2 engineers for 3 weeks. We achieved payback within 2 months based on incident reduction alone.
Building Taste Through Practice
Goedecke is right that taste develops through varied experience. But you can accelerate this development with deliberate practice:
1. Write One-Pagers First
Every single time. Include goals, non-goals, core concepts, and three usage examples. Get feedback before you code. If you can't fill one page clearly, you don't understand the problem yet.
2. Do Depth Walks
Pick a module and list what it exposes versus what it hides. Quantify the ratio: non-test lines of code behind an interface divided by public surface size. If you're not satisfied with the ratio, refactor until you are. Aim for at least 3:1, ideally 5:1 or higher.
3. Run Misuse Audits
Try to break your API in three different ways. Use static analysis, type-safe builders, and compile-time checks. If you can misuse the API, fix it so misuse becomes impossible.
4. Rehearse Changes
Pick a plausible future requirement. Create a throwaway branch and implement it. Count the files and modules touched. If it's more than 20%, you have shallow abstraction boundaries. Refactor to isolate the volatility.
5. Run Design Postmortems
After incidents or rewrites, add a standard section: "Which abstraction was too shallow? Which decision leaked?" Don't blame people, blame design choices that amplified complexity.
Where Bad Taste Comes From
Most poor taste stems from inflexibility. I've worked with engineers who evangelized formal methods for an internal dashboard, or insisted on microservices for a tool with three users. Their values didn't match the problem context.
The warning sign is "best practice" language. No engineering decision is a best practice in all contexts. If someone tells you "we should always do X," they're revealing that their taste has crystallized around one narrow set of values.
Interestingly, this means engineers with poor taste can be quite effective in narrow contexts where their preferences happen to fit. It's only when the project shifts or they move to a new domain that problems emerge.
Break glass for performance: Sometimes a shallow, specialized path is justified for performance-critical code. When you do this, measure the impact with microbenchmarks, document the reasoning, and fence it behind a narrow API. Make the escape hatch explicit.
Legacy reality: Many readers work in messy systems. You can introduce depth incrementally: pick one seam, wrap it with a clean interface, hide one volatile decision behind it, then measure whether fewer call sites change when the next requirement arrives. Repeat.
Team-Level Taste
Taste isn't limited to code. It also lives in review norms, documentation, and team interfaces.
Make Taste Artifacts Shared
- One-pager templates with examples in the team wiki
- Review checklists for misuse and depth in PR templates
- Design postmortem template with "which design choice amplified the problem?"
Track Design Debt Like Technical Debt
- Create tickets for shallow modules, leaked decisions, and missing feature flags
- Make the depth ratio visible in your team dashboard
- Budget time for depth refactoring, not just new features
Codify Taste in Guidelines
- "Public APIs are permanent; prefer additive changes"
- "Schemas and events are APIs; apply Hyrum's Law"
- "One concept per module; if it has 'and' in the name, split it"
How to Review for Taste
In code reviews and design documents, ask these questions in this order:
- What's the single idea a user must learn? If there are two ideas, split them. (Conceptual integrity)
- What does this module hide? If not much, deepen it. (Depth)
- How could a caller misuse this? Make misuse impossible. (API discipline)
- If we had 10x the users, what behavior would we be stuck with? Are we comfortable with that? (Hyrum's Law)
- Can a new teammate grasp this from a page and a few examples? If not, fix the narrative. (Readability)
- What happens when dependency X fails? Do we have a graceful degradation path? (Operational taste)
- Can we disable this without deploying? If not, why not? (Reversibility)
- What telemetry will tell us this is failing? Do we have the right metrics and traces?
- What's the kill switch? Can we roll back or disable quickly if things go wrong?
The Bottom Line
Sean's article makes the crucial point that taste is about selecting values, not following rigid rules. I want to add that taste is also testable, learnable, and has measurable economic consequences.
Use conceptual integrity as your compass, deep modules for leverage, API discipline for safety, simplicity as your default bias, and operational excellence for production resilience. Do this consistently and people will say you have good taste. But really, you'll just be practicing disciplined engineering.
The programs we write are meant to be read by people, then executed by machines, in that order. Good taste is what makes the first part possible while keeping the second part running smoothly.
Further Reading
- Brooks, Fred. The Mythical Man-Month (1975). Chapter on conceptual integrity.
- Parnas, David. "On the Criteria to Be Used in Decomposing Systems into Modules" (1972). The original argument for information hiding.
- Ousterhout, John. A Philosophy of Software Design (2018). Deep vs shallow modules, the functionality/interface metric.
- Bloch, Joshua. "How to Design a Good API and Why It Matters" (2006). API discipline, public APIs are forever.
- Hickey, Rich. "Simple Made Easy" (2011). Separating simple from easy.
- Bezos, Jeff. Amazon shareholder letters on Type-1 vs Type-2 decisions.
- Hyrum's Law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

