paper · 2026

Dissecting BERT Layers: the FFN's Dual Role, Separability-Guided Layer Skipping, and Interpretable Classification via Charge-Flow Learning

Layer-by-layer analysis of BERT across five GLUE tasks, with three findings: (1) separability-guided layer skipping plus a compensation classifier achieves lossless compression on 3 of 5 tasks; (2) the FFN's role decomposes into 92% structural (norm normalization) and 8% classification-related, which explains why removing the FFN breaks the model even though individual layers' FFNs hurt classification; (3) 60–93% of errors are high-confidence errors, an intrinsic limitation of BERT's CLS vector itself.

What this paper does

BERT (fine-tuned on GLUE) is treated as a black box for most operational purposes, but per-layer attribution reveals structure that’s missing from output-level analysis. This paper applies a forward-primary learning framework (River XAI / RX) to dissect BERT across five GLUE tasks, producing three findings that change how to think about BERT compression and interpretability.

Finding 1 — Separability-guided layer skip + compensation classifier

For each layer, we compute a separability measure: between-class distance divided by within-class variance on the CLS representation. Layers with low Δ-separability are candidates for removal: skip the harmful layer and train a small compensation classifier (V6, 4–16 hidden nodes) on the intermediate output.
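The paper's exact separability metric is proprietary. Purely as an illustration, a Fisher-ratio-style version (mean between-class centroid distance over mean within-class spread, computed on per-layer CLS vectors) could look like:

```python
import numpy as np

def separability(cls_vecs, labels):
    """Fisher-ratio-style separability on CLS vectors: mean pairwise
    distance between class centroids, divided by the mean within-class
    distance to the centroid. Illustrative only; not the paper's metric.

    cls_vecs: (n_samples, hidden_size) CLS representations at one layer
    labels:   (n_samples,) integer class labels
    """
    classes = np.unique(labels)
    centroids = np.stack([cls_vecs[labels == c].mean(axis=0) for c in classes])
    # Between-class term: average distance between all centroid pairs.
    between = np.mean([
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes)) for j in range(i + 1, len(classes))
    ])
    # Within-class term: average distance of samples to their own centroid.
    within = np.mean([
        np.linalg.norm(cls_vecs[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    return between / within
```

Under this reading, a layer whose output CLS does not raise the ratio relative to its input (low or negative Δ-separability) would be a skip candidate.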

Lossless track (6.5% compute reduction on 3 of 5 GLUE tasks):

Task     BERT     Compressed   Δ
SST-2    92.7%    92.4%        −0.3%
CoLA     80.5%    80.7%        +0.2%
RTE      68.3%    69.8%        +1.5%
MRPC     91.2%    90.2%        −1.0%
QNLI     92.2%    88.8%        −3.4%

Separability analysis identifies skip candidates a priori, before running any skip experiment; its predictions match the actual skip results on 4 of 5 tasks.
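The compensation classifier is described only by its size (4–16 hidden nodes). A minimal sketch at that scale, assuming it reads the CLS vector at the skip point; the "V6" design and its training procedure are proprietary, so this shows the shape only, not the method:

```python
import numpy as np

rng = np.random.default_rng(0)

class CompensationClassifier:
    """Tiny MLP on the CLS vector taken at the skipped layer's input.
    Hypothetical sketch (forward pass only); the paper's V6 classifier
    and training procedure are proprietary."""

    def __init__(self, hidden_size=768, comp_hidden=8, num_classes=2):
        # 4-16 hidden nodes per the paper; 8 chosen arbitrarily here.
        self.W1 = rng.standard_normal((hidden_size, comp_hidden)) * 0.02
        self.b1 = np.zeros(comp_hidden)
        self.W2 = rng.standard_normal((comp_hidden, num_classes)) * 0.02
        self.b2 = np.zeros(num_classes)

    def forward(self, cls_vec):
        # Accepts a single vector (hidden_size,) or a batch (n, hidden_size).
        h = np.tanh(cls_vec @ self.W1 + self.b1)
        return h @ self.W2 + self.b2
```

The point of the sketch is the scale: a few hundred parameters downstream of the skip point, versus the millions in the skipped transformer layer.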

Finding 2 — FFN’s dual role: structural vs classification

The conventional view of FFN (Geva et al.: “key-value memory”) suggests FFN does classification work. Decomposing FFN’s transformation into structural change (class-common) versus classification change (class-specific) reveals a different picture:

Layer    Structural / classification ratio
L1       ~150× (almost pure format conversion)
L4       ~50×
L8       ~14×
L12      ~2× (structural ≈ classification)

Front-layer FFNs are ~92% structural — primarily norm normalization preparing inputs for the next attention layer. Back-layer FFNs blend the two roles.
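One way to operationalize the structural/classification split, assuming "structural" means the class-common component of the FFN's input-to-output change and "classification" means the class-specific residual around it (a sketch under that assumption, not the paper's definition):

```python
import numpy as np

def ffn_role_ratio(x_in, x_out, labels):
    """Ratio of the class-common (structural) component of the FFN's
    transformation to its class-specific (classification) component.
    Illustrative decomposition; not the paper's proprietary definition.

    x_in, x_out: (n_samples, hidden_size) FFN input/output representations
    labels:      (n_samples,) integer class labels
    """
    delta = x_out - x_in                     # what the FFN changed
    structural = delta.mean(axis=0)          # component shared by all classes
    classes = np.unique(labels)
    class_means = {c: delta[labels == c].mean(axis=0) for c in classes}
    # Classification component: how far each class's mean change
    # deviates from the shared (structural) change.
    classification = np.mean(
        [np.linalg.norm(class_means[c] - structural) for c in classes]
    )
    return np.linalg.norm(structural) / classification
```

With this decomposition, a front layer whose FFN applies nearly the same shift to every sample (format conversion) scores a large ratio, while a back layer whose shift depends on the class scores near 1.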

This explains a paradox: individual layer FFNs often hurt classification accuracy when measured naively (e.g., L8 FFN drops SST-2 by 17%), yet their removal breaks downstream attention because the next attention layer depends on the FFN’s norm-normalized output format. FFN can be analyzed away but not removed away.

Finding 3 — Confident-wrong, not uncertain-wrong

Of misclassified samples across the five tasks, 60–93% are high-confidence errors (Q_out margin > 0.3 between the wrong predicted class and the correct class):

Task     Errors   High-confidence   Avg margin
SST-2    30       60%               0.39
CoLA     75       93%               0.72
MRPC     44       87%               0.61
QNLI     46       73%               0.60
RTE      75       87%               0.58

When BERT is wrong, it is confidently wrong, not uncertain: the CLS vector itself points in the wrong direction, and no downstream compensation classifier can recover these samples. The fundamental limitation lies upstream of the classifier.

This reframes BERT’s failure modes: the bottleneck isn’t post-processing capacity but the representation itself.

Connections to other works

This is the BERT track of a broader framework that extends across architectures and domains:

  • paper9 — same framework on GPT-2 (decoder transformer), surgical routing correction
  • CheXNet compression — same framework on medical imaging (DenseNet121), Treatment Decision System

Forward-primary learning enables compression and interpretability as two faces of the same operation, across architecture types.

Verify

  • Zenodo — paper PDF + permanent DOI for citation

Status

Foundational layer-level analysis. Method specifics (separability metric definition, layer-selection automation, compensation classifier training procedure) are proprietary. Korean patent application covers the foundational learning framework.