paper · 2026

Dissecting BERT Layers: the FFN's Dual Role, Separability-Guided Layer Skipping, and Interpretable Classification via Charge-Flow Learning

Layer-by-layer analysis of BERT across five GLUE tasks, with three findings: (1) separability-guided layer skipping plus a compensation classifier achieves lossless compression on 3 of 5 tasks; (2) the FFN's role decomposes into 92% structural (norm normalization) and 8% classification-related, which explains why removing the FFN breaks the model even though individual layers' FFNs hurt classification; (3) 60–93% of errors are high-confidence errors, an intrinsic limitation of BERT's CLS vector itself.

What this paper does

BERT (fine-tuned on GLUE) is treated as a black box for most operational purposes, but per-layer attribution reveals structure that’s missing from output-level analysis. This paper applies a forward-primary learning framework (River XAI / RX) to dissect BERT across five GLUE tasks, producing three findings that change how to think about BERT compression and interpretability.

Finding 1 — Separability-guided layer skip + compensation classifier

For each layer, we compute a separability measure: between-class distance divided by within-class variance on the CLS representation. Layers with low Δ-separability are candidates for removal: skip the harmful layer and train a small compensation classifier (V6, 4–16 hidden nodes) on the intermediate output.
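The paper's exact separability metric is proprietary. Purely as an illustration, a Fisher-ratio-style version (mean between-class centroid distance over mean within-class spread, computed on per-layer CLS vectors) could look like:

```python
import numpy as np

def separability(cls_vecs, labels):
    """Fisher-ratio-style separability on CLS vectors: mean pairwise
    distance between class centroids, divided by the mean within-class
    distance to the centroid. Illustrative only; not the paper's metric.

    cls_vecs: (n_samples, hidden_size) CLS representations at one layer
    labels:   (n_samples,) integer class labels
    """
    classes = np.unique(labels)
    centroids = np.stack([cls_vecs[labels == c].mean(axis=0) for c in classes])
    # Between-class term: average distance between all centroid pairs.
    between = np.mean([
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes)) for j in range(i + 1, len(classes))
    ])
    # Within-class term: average distance of samples to their own centroid.
    within = np.mean([
        np.linalg.norm(cls_vecs[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    return between / within
```

Under this reading, a layer whose output CLS does not raise the ratio relative to its input (low or negative Δ-separability) would be a skip candidate.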

Lossless track (6.5% compute reduction on 3 of 5 GLUE tasks):

Task     BERT     Compressed   Δ
SST-2    92.7%    92.4%        −0.3%
CoLA     80.5%    80.7%        +0.2%
RTE      68.3%    69.8%        +1.5%
MRPC     91.2%    90.2%        −1.0%
QNLI     92.2%    88.8%        −3.4%

Separability analysis identifies skip candidates a priori, before running any skip experiment; its predictions match the actual skip results on 4 of 5 tasks.
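The compensation classifier is described only by its size (4–16 hidden nodes). A minimal sketch at that scale, assuming it reads the CLS vector at the skip point; the "V6" design and its training procedure are proprietary, so this shows the shape only, not the method:

```python
import numpy as np

rng = np.random.default_rng(0)

class CompensationClassifier:
    """Tiny MLP on the CLS vector taken at the skipped layer's input.
    Hypothetical sketch (forward pass only); the paper's V6 classifier
    and training procedure are proprietary."""

    def __init__(self, hidden_size=768, comp_hidden=8, num_classes=2):
        # 4-16 hidden nodes per the paper; 8 chosen arbitrarily here.
        self.W1 = rng.standard_normal((hidden_size, comp_hidden)) * 0.02
        self.b1 = np.zeros(comp_hidden)
        self.W2 = rng.standard_normal((comp_hidden, num_classes)) * 0.02
        self.b2 = np.zeros(num_classes)

    def forward(self, cls_vec):
        # Accepts a single vector (hidden_size,) or a batch (n, hidden_size).
        h = np.tanh(cls_vec @ self.W1 + self.b1)
        return h @ self.W2 + self.b2
```

The point of the sketch is the scale: a few hundred parameters downstream of the skip point, versus the millions in the skipped transformer layer.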

Finding 2 — FFN’s dual role: structural vs classification

The conventional view of FFN (Geva et al.: “key-value memory”) suggests FFN does classification work. Decomposing FFN’s transformation into structural change (class-common) versus classification change (class-specific) reveals a different picture:

Layer    Structural / classification ratio
L1       ~150× (almost pure format conversion)
L4       ~50×
L8       ~14×
L12      ~2× (structural ≈ classification)

Front-layer FFNs are ~92% structural — primarily norm normalization preparing inputs for the next attention layer. Back-layer FFNs blend the two roles.
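One way to operationalize the structural/classification split, assuming "structural" means the class-common component of the FFN's input-to-output change and "classification" means the class-specific residual around it (a sketch under that assumption, not the paper's definition):

```python
import numpy as np

def ffn_role_ratio(x_in, x_out, labels):
    """Ratio of the class-common (structural) component of the FFN's
    transformation to its class-specific (classification) component.
    Illustrative decomposition; not the paper's proprietary definition.

    x_in, x_out: (n_samples, hidden_size) FFN input/output representations
    labels:      (n_samples,) integer class labels
    """
    delta = x_out - x_in                     # what the FFN changed
    structural = delta.mean(axis=0)          # component shared by all classes
    classes = np.unique(labels)
    class_means = {c: delta[labels == c].mean(axis=0) for c in classes}
    # Classification component: how far each class's mean change
    # deviates from the shared (structural) change.
    classification = np.mean(
        [np.linalg.norm(class_means[c] - structural) for c in classes]
    )
    return np.linalg.norm(structural) / classification
```

With this decomposition, a front layer whose FFN applies nearly the same shift to every sample (format conversion) scores a large ratio, while a back layer whose shift depends on the class scores near 1.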

This explains a paradox: individual layer FFNs often hurt classification accuracy when measured naively (e.g., L8 FFN drops SST-2 by 17%), yet their removal breaks downstream attention because the next attention layer depends on the FFN’s norm-normalized output format. FFN can be analyzed away but not removed away.

Finding 3 — Confident-wrong, not uncertain-wrong

Of misclassified samples across the five tasks, 60–93% are high-confidence errors (Q_out margin > 0.3 between the wrong predicted class and the correct class):

Task     Errors   High-confidence   Avg margin
SST-2    30       60%               0.39
CoLA     75       93%               0.72
MRPC     44       87%               0.61
QNLI     46       73%               0.60
RTE      75       87%               0.58

When BERT is wrong, it is confidently wrong, not uncertain: the CLS vector itself points in the wrong direction, and no downstream compensation classifier can recover these samples. The fundamental limitation lies upstream of the classifier.

This reframes BERT’s failure modes: the bottleneck isn’t post-processing capacity but the representation itself.

Connections to other works

This is the BERT track of a broader framework that extends across architectures and domains:

  • paper9 — same framework on GPT-2 (decoder transformer), surgical routing correction
  • CheXNet compression — same framework on medical imaging (DenseNet121), Treatment Decision System

Forward-primary learning enables compression and interpretability as two faces of the same operation, across architecture types.

Verify

  • Zenodo — paper PDF + permanent DOI for citation

Status

Foundational layer-level analysis. Method specifics (separability metric definition, layer-selection automation, compensation classifier training procedure) are proprietary. Korean patent application covers the foundational learning framework.