Every test answers a question. The problem is that teams often ask the right question at the wrong layer.

A unit test that tries to prove “does checkout work?” is too high-level for its layer. A BDD scenario that tries to prove “does this regex validate email?” is too low-level for its layer. Both tests might pass. Neither is doing its job well.

That is how test suites become slow, fragile, and full of blind spots at the same time. Teams add more tests to compensate. The suite gets slower. Confidence does not improve much. Eventually someone decides the tests are too expensive to maintain, and coverage starts to erode.

The goal is not to maximize the number of tests. The goal is to put each test in the layer where it provides the most confidence for the least cost.

The Four Layers

Most teams work across four layers, whether they name them that way or not.

Unit tests check whether a piece of logic works in isolation.

Integration tests check whether components work together correctly.

End-to-end tests check whether a real user workflow works across the system.

BDD scenarios check whether the behavior matches what the team agreed should happen.

These layers are not interchangeable. Each one answers a different kind of question.

A Simple Rule

Use the cheapest layer that can answer the question with confidence.

If a unit test can give you the answer, start there. If the real risk is in the connection between components, move up to integration. If the risk is in a full workflow, use E2E. If the behavior needs to be reviewed and agreed on by people across roles, use BDD.

That rule handles most placement decisions surprisingly well.
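As a rough illustration, the rule can be written down as a small decision helper. This is only one reading of it: the `Risk` categories, the `place_test` function, and the choice to treat BDD as something added on top of an automation layer (rather than a fourth rung on the same ladder) are assumptions made for the sketch, not a standard API.

```python
from enum import Enum
from typing import List


class Risk(Enum):
    """Where the main risk of a behavior lives (hypothetical categories)."""
    LOGIC = "logic"          # a calculation, parser, or validation rule
    BOUNDARY = "boundary"    # a connection between components
    WORKFLOW = "workflow"    # a full user journey across the system


# Cheapest automation layer that can answer the question for each kind of risk.
AUTOMATION_LAYER = {
    Risk.LOGIC: "unit",
    Risk.BOUNDARY: "integration",
    Risk.WORKFLOW: "e2e",
}


def place_test(risk: Risk, needs_shared_agreement: bool) -> List[str]:
    """Return the layers that should own a behavior, most visible first."""
    layers = [AUTOMATION_LAYER[risk]]
    if needs_shared_agreement:
        # Assumption: BDD describes the behavior, the other layer automates it.
        layers.insert(0, "bdd")
    return layers
```

Under these assumptions, a coupon rule that stakeholders must sign off on comes out as `["bdd", "unit"]`, while an internal parser comes out as `["unit"]`.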

Where Each Layer Wins

Unit tests are best for internal logic: calculations, parsing, transformations, validation rules, branching logic, and edge cases. If the question is “does this function produce the right result for these inputs?”, a unit test is usually the answer.
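A minimal sketch of what that looks like, using a discount calculation as the piece of logic. The function name `apply_percentage_discount` and its rounding behavior are assumptions made for illustration, not code from any particular system.

```python
from decimal import Decimal


def apply_percentage_discount(total: Decimal, percent: int) -> Decimal:
    """Hypothetical discount rule: take `percent` off `total`, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    discounted = total * (Decimal(100) - percent) / Decimal(100)
    return discounted.quantize(Decimal("0.01"))


def test_twenty_percent_off_one_hundred():
    assert apply_percentage_discount(Decimal("100.00"), 20) == Decimal("80.00")


def test_zero_percent_leaves_total_unchanged():
    assert apply_percentage_discount(Decimal("59.90"), 0) == Decimal("59.90")


def test_percent_out_of_range_is_rejected():
    try:
        apply_percentage_discount(Decimal("10.00"), 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Nothing here touches a database, a network, or a browser, which is why edge cases like the out-of-range percentage are cheap to cover at this layer.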

Integration tests are best for boundaries: API-to-service calls, database queries, queue producers and consumers, external service adapters, and configuration wiring. The question is not whether the code compiles. The question is whether the parts still work when connected.
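As a sketch of a boundary test, the example below runs real SQL against an in-memory SQLite database instead of mocking the query. The `orders` schema and the `save_order` and `total_spent` helpers are hypothetical; a real project would more likely point the same kind of test at the database engine it uses in production.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for the example.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "total_cents INTEGER NOT NULL)"
    )


def save_order(conn: sqlite3.Connection, customer: str, total_cents: int) -> int:
    cur = conn.execute(
        "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
        (customer, total_cents),
    )
    conn.commit()
    return cur.lastrowid


def total_spent(conn: sqlite3.Connection, customer: str) -> int:
    # COALESCE makes a customer with no orders return 0 instead of NULL.
    row = conn.execute(
        "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


def test_total_spent_sums_only_that_customer():
    conn = sqlite3.connect(":memory:")
    try:
        create_schema(conn)
        save_order(conn, "ada", 8000)
        save_order(conn, "ada", 1250)
        save_order(conn, "bob", 500)
        assert total_spent(conn, "ada") == 9250
        assert total_spent(conn, "nobody") == 0
    finally:
        conn.close()
```

The value of the test is that the SQL, the schema, and the Python code all have to agree. A typo in a column name fails here, but would pass in a unit test that mocks the connection.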

E2E tests are best for a small number of critical workflows: checkout, sign-up, authentication, and other high-value journeys. They are expensive and fragile, so they should be used deliberately.

BDD scenarios are best for behavior that needs shared understanding: business rules, acceptance criteria, cross-team requirements, and features where misunderstanding is expensive. BDD is not defined by where it runs. It is defined by what it does: it turns expected behavior into something product, QA, and developers can all review.

Where Teams Go Wrong

The most common failure mode is not too few tests. It is the same intent duplicated across layers, often paid for at a more expensive layer than the question needs.

Three mistakes show up again and again:

Too much at the E2E layer. Teams push low-level checks into browser tests and pay the highest cost for the weakest signal.

Fake integration in unit tests. Teams mock everything and call it system coverage, even though nothing real is actually connected.

Technical detail in BDD. Teams write Gherkin for cache invalidation, parser behavior, helper methods, or retry loops that no stakeholder will ever read. For more on this failure mode, see when to skip BDD.
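The second mistake, fake integration, is easy to reproduce. In the sketch below, the hypothetical `is_registered` function calls a method that the hypothetical `UserRepository` does not actually have. A loose `Mock` accepts any call, so the test passes anyway. A mock built with `create_autospec` from the standard library rejects the call, which narrows the gap, although only a test against the real repository shows that the parts work when connected.

```python
from unittest.mock import Mock, create_autospec


class UserRepository:
    """Hypothetical repository; the real one would query a database."""

    def get_by_email(self, email: str):
        raise NotImplementedError("would query a real database")


def is_registered(repo, email: str) -> bool:
    # Bug for illustration: the real repository has get_by_email, not find_by_email.
    return repo.find_by_email(email) is not None


def test_loose_mock_hides_the_mismatch():
    repo = Mock()  # accepts any attribute, including ones that do not exist
    repo.find_by_email.return_value = {"email": "a@example.com"}
    assert is_registered(repo, "a@example.com") is True


def test_spec_mock_exposes_the_mismatch():
    repo = create_autospec(UserRepository, instance=True)
    try:
        is_registered(repo, "a@example.com")
    except AttributeError:
        pass  # the spec'd mock refuses the nonexistent method
    else:
        raise AssertionError("expected AttributeError for unknown method")
```

The first test is green and proves nothing about the connection between the service and the repository. That is the kind of coverage that should not be counted as integration.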

Where BDD Fits

BDD belongs wherever the behavior itself needs to be visible and agreed on.

Scenario: Apply percentage discount to cart total
  Given I have items in my cart totaling $100
  When I apply coupon "SAVE20"
  Then my cart total should be $80

That scenario expresses a business rule. It says what behavior matters.

The automation underneath could happen in different places: as a unit test against discount logic, as an integration test through the checkout API, or as an E2E test through the full UI. Those are separate decisions.
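To make the first of those options concrete, here is one way the scenario could be automated at the unit level, with the Given/When/Then steps kept as comments. The `Cart` class and the `COUPONS` table are hypothetical, and reading "SAVE20" as 20% off is an assumption taken from the scenario's numbers. A team using a BDD runner would bind the same steps through that tool instead of plain comments.

```python
from decimal import Decimal

# Hypothetical coupon table: code -> percentage discount.
COUPONS = {"SAVE20": 20}


class Cart:
    """Hypothetical cart holding a subtotal and an optional percentage discount."""

    def __init__(self, subtotal: Decimal) -> None:
        self.subtotal = subtotal
        self.discount_percent = 0

    def apply_coupon(self, code: str) -> None:
        if code not in COUPONS:
            raise ValueError(f"unknown coupon: {code}")
        self.discount_percent = COUPONS[code]

    @property
    def total(self) -> Decimal:
        return self.subtotal * (100 - self.discount_percent) / 100


def test_apply_percentage_discount_to_cart_total():
    # Given I have items in my cart totaling $100
    cart = Cart(Decimal("100"))
    # When I apply coupon "SAVE20"
    cart.apply_coupon("SAVE20")
    # Then my cart total should be $80
    assert cart.total == Decimal("80")
```

The scenario text stays the same if the team later decides to run it through the checkout API or the UI. Only the code behind the steps changes.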

The real question is not “should this be a BDD test?” The real question is “does this behavior need stakeholder visibility?” If yes, BDD is useful. If no, it may just be overhead. For a worked example of structuring BDD scenarios, see the 5-step decomposition framework.

Quick Reference

Feature              | Best layer        | Why
---------------------|-------------------|-------------------------------------------
Password validation  | Unit              | Internal logic, clear inputs and outputs
Discount calculation | Unit              | Math and business logic at code level
Checkout API         | Integration       | Boundary between services
CSV export           | Integration       | Boundary between app logic and output
Payment flow         | BDD + E2E         | User-visible, high-risk, needs agreement
User permissions     | BDD + integration | Business rules plus enforcement
Coupon application   | BDD + unit        | Visible behavior plus underlying rule logic

The phrase “best layer” matters. Some features deserve support from more than one layer. The point is to know which layer owns the main risk.

What Not to Force into BDD

BDD is usually the wrong home for:

  • pure implementation details
  • algorithmic correctness
  • visual layout precision
  • performance thresholds
  • infrastructure resilience
  • throwaway spikes and prototypes

Those still matter. They just need different tools. A performance regression might need a load test. A visual bug might need screenshot comparison. A parsing edge case might need a unit test.

Summary

A healthy suite does not force everything into one layer. It uses unit tests for internal logic, integration tests for boundaries, E2E tests for a few critical workflows, and BDD for behavior that needs agreement.

The goal is not to prove the same obvious thing three times. The goal is to place each assertion where it gives the most signal.

The discipline is not writing more tests. The discipline is writing each test at the layer where it earns its cost.