paper · 2026
Lossless Mechanistic Compression and Surgical Correction of Medical Imaging Models
DenseNet121 chest X-ray classifier compressed 51.43% (6.97M → 3.38M parameters) at constant accuracy (AUROC +0.0003). Compression makes channel attribution atomic enough for surgical correction: 5-channel weight zeroing reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies. Polarized channels reframed as bipolar discriminative axes exploiting label mutual exclusivity (Jaccard < 0.1 in 89 of 100 polarized channels, zero architectural conflicts).
What this paper does
CheXNet (DenseNet121) is widely deployed for multi-label thoracic pathology classification, but its 6.97M parameters make it costly to serve and opaque to debug. This paper presents a unified framework that addresses three orthogonal challenges — footprint, opacity, and expensive retraining — in a single mechanistic system:
- Lossless compression — 51.43% parameter reduction (6.97M → 3.38M) via channel-wise sparsity-constrained weight reconstruction on NIH ChestX-ray14. Output equivalence to numerical precision (max |Δlogit| < 5×10⁻⁶). Latency reduced from 16.07 ms to 15.54 ms.
- Surgical correction — classifier-channel attribution plus selective weight zeroing. A 5-channel correction reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies (by construction — only one classifier row is modified).
- Cost-aware Treatment Decision System — routes each pathology issue to its cheapest effective intervention: threshold calibration, surgical correction, partial retraining, or augmentation. The empirical cost matrix revises initial estimates (notably: retrain_part cost 6 → 2, just 85 s).
- Clinical report auto-generation — channel-level evidence, Grad-CAM region mapping, mutual-exclusivity-based exclusion.
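The surgical-correction step above can be sketched as zeroing a handful of entries in a single classifier row. The row and channel indices below are hypothetical stand-ins for the paper's attributed channels; the final assertion shows why the other 13 rows, and hence their AUROC, are untouched by construction.

```python
import numpy as np

def surgical_correction(W, target_row, channels):
    """Zero the weights of selected channels in one classifier row.

    W          : (n_classes, n_channels) final-layer weight matrix
    target_row : index of the pathology being corrected
    channels   : channel indices attributed to the false positive
    """
    W_fixed = W.copy()
    W_fixed[target_row, channels] = 0.0
    return W_fixed

# Toy example: 14 pathologies x 1024 channels (DenseNet121 feature width).
rng = np.random.default_rng(0)
W = rng.normal(size=(14, 1024))
channels = [10, 42, 99, 512, 700]          # hypothetical attributed channels
W_fixed = surgical_correction(W, target_row=3, channels=channels)

# Only the target row changes, so the logits (and AUROC) of the other
# 13 pathologies are identical by construction.
assert np.array_equal(np.delete(W, 3, axis=0), np.delete(W_fixed, 3, axis=0))
```

The "exactly zero AUROC change" claim thus needs no empirical verification on the untouched pathologies: their classifier rows are bitwise identical.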
Reframing polysemanticity — mutual exclusivity exploitation
A classical concern about polysemantic channels (Olah et al.) is that a single channel responds to unrelated concepts. In the multi-label medical setting we find something qualitatively different: polarized channels (with both strong positive and strong negative weights across different pathologies) are not architectural conflicts. They are bipolar discriminative axes between mutually exclusive labels.
Jaccard-based legitimacy classification of 100 polarized classifier channels:
| Category | Count | Threshold |
|---|---|---|
| Perfect | 48 | J = 0 |
| Legitimate | 41 | J < 0.1 |
| Mixed | 11 | 0.1 ≤ J < 0.3 |
| Conflict | 0 | J ≥ 0.3 |
The model is efficient, not confused. Hernia and Cardiomegaly almost never co-occur (J = 0); encoding them on one bipolar channel saves capacity.
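The legitimacy classification above can be sketched directly from the table's Jaccard thresholds; the image-ID sets passed in are hypothetical stand-ins for the per-pole positive-label sets.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two label-positive image sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def classify_channel(pos_pole_images, neg_pole_images):
    """Classify a polarized channel by co-occurrence of its two poles.

    Arguments are the sets of image IDs positive for the pathologies on
    the channel's positive and negative poles (hypothetical inputs).
    """
    j = jaccard(pos_pole_images, neg_pole_images)
    if j == 0:
        return "perfect"       # e.g. Hernia vs Cardiomegaly
    if j < 0.1:
        return "legitimate"
    if j < 0.3:
        return "mixed"
    return "conflict"

# Disjoint poles -> a perfect bipolar axis.
print(classify_channel({1, 2, 3}, {7, 8}))  # -> perfect
```

Under this scheme a channel is a "conflict" only when its two poles frequently fire on the same images, which the paper reports occurs zero times in 100 polarized channels.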
Minimal retraining — when to use which tool
The empirical cost matrix (★: revised down from an initial estimate of 6):
| Treatment | Empirical cost | Time | Trainable params |
|---|---|---|---|
| No action | 0 | 0 | — |
| Threshold calibration | 1 | <1 s | — |
| Surgical correction | 2 | <5 s | 5–10 (zeroed) |
| Partial retraining | 2 ★ | 85 s | 18K (0.55%) |
| F re-optimization | 3 | minutes | layer-dependent |
| Data augmentation | 8 | hours | — |
| Full retraining | 10 | hours | 3.38M (100%) |
Classifier-only fine-tuning (18K params, 0.55%, 85 s) plus threshold calibration via Youden's J improves F1 by 1.6× and recall by 7× over the default threshold-0.5 evaluation. Adding deeper layers to the fine-tune set yields no measurable additional benefit on our test split.
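Threshold calibration via Youden's J amounts to picking the decision threshold that maximizes J = TPR − FPR on a validation split; a minimal NumPy sketch (a library ROC routine would work equally well):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Return the threshold maximizing Youden's J = TPR - FPR."""
    best_t, best_j = 0.5, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tn = np.sum(~pred & (y_true == 0))
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

# Toy separable scores: the calibrated threshold lands at 0.8,
# well away from the default 0.5.
t = youden_threshold(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9]))
print(t)  # -> 0.8
```

Calibrating per pathology matters in this setting because multi-label chest X-ray classifiers are trained on heavily imbalanced labels, so the 0.5 default is rarely near the ROC-optimal operating point.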
Verify
- Zenodo — paper PDF, permanent DOI for citation
- Model weights for the compressed backbone and the fine-tuned classifier are released alongside the paper
Status
Korean patent application pending; PCT international filing planned within the priority year. The author is actively seeking collaborators in medical imaging AI, model compression for edge deployment, and integrated diagnostic systems.