paper · 2026
Lossless Mechanistic Compression and Surgical Correction of Medical Imaging Models
DenseNet121 chest X-ray classifier compressed 51.43% (6.97M → 3.38M parameters) at constant accuracy (AUROC +0.0003). Compression makes channel attribution atomic enough for surgical correction: 5-channel weight zeroing reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies. Polarized channels reframed as bipolar discriminative axes exploiting label mutual exclusivity (Jaccard < 0.1 in 89 of 100 polarized channels, zero architectural conflicts).
What this paper does
CheXNet (DenseNet121) is widely deployed for multi-label thoracic pathology classification, but its 6.97M parameters make it costly to serve and opaque to debug. This paper presents a unified framework that addresses three orthogonal challenges — footprint, opacity, and expensive retraining — in a single mechanistic system:
- Lossless compression — 51.43% parameter reduction (6.97M → 3.38M) via channel-wise sparsity-constrained weight reconstruction on NIH ChestX-ray14. Output equivalence to numerical precision (max |Δlogit| < 5×10⁻⁶). Latency reduced from 16.07 ms to 15.54 ms.
- Surgical correction — classifier-channel attribution plus selective weight zeroing. A 5-channel correction reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies (by construction — only one classifier row is modified).
- Cost-aware Treatment Decision System — routes each pathology issue to its cheapest effective intervention: threshold calibration, surgical correction, partial retraining, or augmentation. The empirical cost matrix revises initial estimates (notably: retrain_part cost 6 → 2, just 85 s).
- Clinical report auto-generation — channel-level evidence, Grad-CAM region mapping, mutual-exclusivity-based exclusion.
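The surgical-correction step above can be sketched as zeroing a handful of entries in a single classifier row. The row and channel indices below are hypothetical stand-ins for the paper's attributed channels; the final assertion shows why the other 13 rows, and hence their AUROC, are untouched by construction.

```python
import numpy as np

def surgical_correction(W, target_row, channels):
    """Zero the weights of selected channels in one classifier row.

    W          : (n_classes, n_channels) final-layer weight matrix
    target_row : index of the pathology being corrected
    channels   : channel indices attributed to the false positive
    """
    W_fixed = W.copy()
    W_fixed[target_row, channels] = 0.0
    return W_fixed

# Toy example: 14 pathologies x 1024 channels (DenseNet121 feature width).
rng = np.random.default_rng(0)
W = rng.normal(size=(14, 1024))
channels = [10, 42, 99, 512, 700]          # hypothetical attributed channels
W_fixed = surgical_correction(W, target_row=3, channels=channels)

# Only the target row changes, so the logits (and AUROC) of the other
# 13 pathologies are identical by construction.
assert np.array_equal(np.delete(W, 3, axis=0), np.delete(W_fixed, 3, axis=0))
```

The "exactly zero AUROC change" claim thus needs no empirical verification on the untouched pathologies: their classifier rows are bitwise identical.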
Reframing polysemanticity — mutual exclusivity exploitation
A classical concern about polysemantic channels (Olah et al.) is that a single channel responds to unrelated concepts. In the multi-label medical setting we find something qualitatively different: polarized channels (with both strong positive and strong negative weights across different pathologies) are not architectural conflicts. They are bipolar discriminative axes between mutually exclusive labels.
Jaccard-based legitimacy classification of 100 polarized classifier channels:
| Category | Count | Threshold |
|---|---|---|
| Perfect | 48 | J = 0 |
| Legitimate | 41 | J < 0.1 |
| Mixed | 11 | 0.1 ≤ J < 0.3 |
| Conflict | 0 | J ≥ 0.3 |
The model is efficient, not confused. Hernia and Cardiomegaly almost never co-occur (J = 0); encoding them on one bipolar channel saves capacity.
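The legitimacy classification above can be sketched directly from the table's Jaccard thresholds; the image-ID sets passed in are hypothetical stand-ins for the per-pole positive-label sets.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two label-positive image sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def classify_channel(pos_pole_images, neg_pole_images):
    """Classify a polarized channel by co-occurrence of its two poles.

    Arguments are the sets of image IDs positive for the pathologies on
    the channel's positive and negative poles (hypothetical inputs).
    """
    j = jaccard(pos_pole_images, neg_pole_images)
    if j == 0:
        return "perfect"       # e.g. Hernia vs Cardiomegaly
    if j < 0.1:
        return "legitimate"
    if j < 0.3:
        return "mixed"
    return "conflict"

# Disjoint poles -> a perfect bipolar axis.
print(classify_channel({1, 2, 3}, {7, 8}))  # -> perfect
```

Under this scheme a channel is a "conflict" only when its two poles frequently fire on the same images, which the paper reports occurs zero times in 100 polarized channels.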
Minimal retraining — when to use which tool
The empirical cost matrix (★: revised down from an initial estimate of 6):
| Treatment | Empirical cost | Time | Trainable params |
|---|---|---|---|
| No action | 0 | 0 | — |
| Threshold calibration | 1 | <1 s | — |
| Surgical correction | 2 | <5 s | 5–10 (zeroed) |
| Partial retraining | 2 ★ | 85 s | 18K (0.55%) |
| F re-optimization | 3 | minutes | layer-dependent |
| Data augmentation | 8 | hours | — |
| Full retraining | 10 | hours | 3.38M (100%) |
Classifier-only fine-tuning (18K params, 0.55%, 85 s) plus threshold calibration via Youden's J improves F1 by 1.6× and recall by 7× over the default threshold-0.5 evaluation. Adding deeper layers to the fine-tune set yields no measurable additional benefit on our test split.
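Threshold calibration via Youden's J amounts to picking the decision threshold that maximizes J = TPR − FPR on a validation split; a minimal NumPy sketch (a library ROC routine would work equally well):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Return the threshold maximizing Youden's J = TPR - FPR."""
    best_t, best_j = 0.5, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tn = np.sum(~pred & (y_true == 0))
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

# Toy separable scores: the calibrated threshold lands at 0.8,
# well away from the default 0.5.
t = youden_threshold(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9]))
print(t)  # -> 0.8
```

Calibrating per pathology matters in this setting because multi-label chest X-ray classifiers are trained on heavily imbalanced labels, so the 0.5 default is rarely near the ROC-optimal operating point.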
Verify
- Zenodo — paper PDF, permanent DOI for citation
- Model weights for the compressed backbone and the fine-tuned classifier are released alongside the paper
Status
Korean patent application pending; PCT international filing planned within the priority year. The author is actively seeking collaborators in medical imaging AI, model compression for edge deployment, and integrated diagnostic systems.