paper · 2026

Dissecting BERT Layers: FFN Dual Role, Separability-Guided Layer Skip, and Interpretable Classification via Charge-Flow Learning

Layer-level analysis of BERT on five GLUE tasks, applying a forward-primary learning framework. Three findings: (1) separability-guided layer skip with a compensation classifier achieves lossless compression on 3 of 5 GLUE tasks; (2) the FFN's role decomposes into ~92% structural work (norm normalization) and ~8% classification work, explaining why FFN removal hurts even when individual layers look classification-harmful; (3) 60–93% of misclassifications are high-confidence errors, so the BERT CLS vector itself is the fundamental limitation.

What this paper does

BERT (fine-tuned on GLUE) is treated as a black box for most operational purposes, but per-layer attribution reveals structure that’s missing from output-level analysis. This paper applies a forward-primary learning framework (River XAI / RX) to dissect BERT across five GLUE tasks, producing three findings that change how to think about BERT compression and interpretability.

Finding 1 — Separability-guided layer skip + compensation classifier

For each layer, we compute a separability measure on the CLS representation: between-class distance over within-class variance. Layers with low (or negative) Δ-separability, i.e., layers that add little to class separation or actively hurt it, are candidates for removal. The procedure: skip the harmful layer, then train a small compensation classifier (V6, 4–16 hidden nodes) on the intermediate output.
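The paper's exact separability metric is proprietary (see Status below). As an illustration of the described quantity, here is a minimal Fisher-style stand-in, assuming access to per-layer CLS vectors; the function name and formula are assumptions, not the paper's definition:

```python
import numpy as np

def separability(cls_vecs, labels):
    """Fisher-style ratio: between-class centroid distance over within-class spread.

    cls_vecs: (n_samples, hidden) CLS vectors at one layer.
    labels:   (n_samples,) integer class labels.
    Illustrative stand-in; the paper's exact metric is proprietary.
    """
    classes = np.unique(labels)
    means = {c: cls_vecs[labels == c].mean(axis=0) for c in classes}
    # Between-class: mean pairwise distance between class centroids.
    between = np.mean([np.linalg.norm(means[a] - means[b])
                       for i, a in enumerate(classes) for b in classes[i + 1:]])
    # Within-class: average distance of samples to their own centroid.
    within = np.mean([np.linalg.norm(cls_vecs[labels == c] - means[c], axis=1).mean()
                      for c in classes])
    return between / (within + 1e-8)

# Delta-separability: the change each layer contributes. Layers with a low
# or negative delta are skip candidates.
# seps = [separability(cls_by_layer[l], labels) for l in range(n_layers + 1)]
# deltas = np.diff(seps)
```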

Lossless track (6.5% compute reduction on 3 of 5 GLUE tasks):

Task     BERT     Compressed   Δ
SST-2    92.7%    92.4%        −0.3%
CoLA     80.5%    80.7%        +0.2%
RTE      68.3%    69.8%        +1.5%
MRPC     91.2%    90.2%        −1.0%
QNLI     92.2%    88.8%        −3.4%

Separability analysis identifies skip candidates a priori, before running any skip experiment; its predictions match the actual skip results on 4 of 5 tasks.
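The V6 classifier and its training procedure are proprietary; the following is a minimal stand-in sketch using HuggingFace transformers, with the stated 4–16 hidden-node width. The skip index is whatever layer the separability analysis flags; the names, width, and activation here are assumptions:

```python
import torch.nn as nn
from transformers import BertModel

def skip_layer(bert: BertModel, skip_idx: int) -> BertModel:
    """Drop one encoder layer in place (the layer flagged by low delta-separability)."""
    bert.encoder.layer = nn.ModuleList(
        layer for i, layer in enumerate(bert.encoder.layer) if i != skip_idx
    )
    bert.config.num_hidden_layers -= 1
    return bert

class CompensationClassifier(nn.Module):
    """Tiny head on the compressed model's CLS output; a generic stand-in for V6."""
    def __init__(self, hidden_size=768, width=8, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, width),   # 4-16 hidden nodes per the paper
            nn.Tanh(),
            nn.Linear(width, num_classes),
        )

    def forward(self, cls_vec):
        return self.net(cls_vec)
```

In this sketch the compressed encoder would stay frozen while the small head is fine-tuned on task labels; whether the paper's procedure does the same is not stated.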

Finding 2 — FFN’s dual role: structural vs classification

The conventional view of the FFN (Geva et al.'s "key-value memory") suggests the FFN does classification work. Decomposing the FFN's transformation into structural change (class-common) versus classification change (class-specific) reveals a different picture (a decomposition sketch follows the table):

Layer   Structural / classification ratio
L1      ~150× (almost pure format conversion)
L4      ~50×
L8      ~14×
L12     ~2× (structural ≈ classification)
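The exact decomposition is not published. One plausible reading, offered here as an assumption rather than the paper's formula, treats the class-common mean of the FFN's per-sample change as the structural part and per-class deviations from it as the classification part:

```python
import numpy as np

def ffn_role_ratio(delta, labels):
    """Ratio of structural (class-common) to classification (class-specific)
    magnitude in an FFN's per-sample change delta = FFN(h) - h.

    delta:  (n_samples, hidden) change the FFN applies to each sample.
    labels: (n_samples,) integer class labels.
    Assumed decomposition, consistent with but not identical to the paper's.
    """
    common = delta.mean(axis=0)                  # shared across all classes
    structural = np.linalg.norm(common)
    class_means = [delta[labels == c].mean(axis=0) for c in np.unique(labels)]
    # Classification part: how far each class's mean change departs from the
    # shared component.
    classification = np.mean([np.linalg.norm(m - common) for m in class_means])
    return structural / (classification + 1e-8)

# Under this reading, a ratio of ~150x at L1 would indicate an almost purely
# structural (format-conversion) FFN.
```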

Front-layer FFNs are ~92% structural — primarily norm normalization preparing inputs for the next attention layer. Back-layer FFNs blend the two roles.

This explains a paradox: individual layers' FFNs often hurt classification accuracy when measured naively (e.g., the L8 FFN drops SST-2 accuracy by 17%), yet removing them breaks downstream attention, because the next attention layer depends on the FFN's norm-normalized output format. The FFN's classification role can be analyzed away, but its structural role cannot be removed.

Finding 3 — Confident-wrong, not uncertain-wrong

Across the five tasks, 60–93% of misclassified samples are high-confidence errors, defined by a Q_out margin > 0.3 between the predicted (wrong) class and the correct class (a margin-computation sketch follows the table):

Task    Errors   High-confidence   Avg margin
SST-2   30       60%               0.39
CoLA    75       93%               0.72
MRPC    44       87%               0.61
QNLI    46       73%               0.60
RTE     75       87%               0.58
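Q_out is the framework's own output quantity and its exact definition is proprietary, so the following is a sketch of the margin statistic assuming per-class output scores; function and variable names are hypothetical:

```python
import numpy as np

def high_confidence_error_stats(q_out, labels, margin_thresh=0.3):
    """Among misclassified samples: the fraction whose wrong-class score beats
    the correct-class score by more than margin_thresh, and the average margin.

    q_out:  (n_samples, n_classes) output scores (stand-in for Q_out).
    labels: (n_samples,) integer class labels.
    """
    pred = q_out.argmax(axis=1)
    err = np.flatnonzero(pred != labels)        # indices of misclassified samples
    # Margin of the (wrong) predicted class over the correct class.
    margins = q_out[err, pred[err]] - q_out[err, labels[err]]
    return (margins > margin_thresh).mean(), margins.mean()
```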

When BERT is wrong, it is confidently wrong, not uncertain: the CLS vector itself points in the wrong direction, and no downstream compensation classifier can recover these samples. The fundamental limitation sits upstream of the classifier.

This reframes BERT’s failure modes: the bottleneck isn’t post-processing capacity but the representation itself.

Connections to other works

This is the BERT track of a broader framework that extends across architectures and domains:

  • paper9 — same framework on GPT-2 (decoder transformer), surgical routing correction
  • CheXNet compression — same framework on medical imaging (DenseNet121), Treatment Decision System

Forward-primary learning enables compression and interpretability as two faces of the same operation, across architecture types.

Verify

  • Zenodo — paper PDF + permanent DOI for citation

Status

Foundational layer-level analysis. Method specifics (separability metric definition, layer-selection automation, compensation classifier training procedure) are proprietary. Korean patent application covers the foundational learning framework.